Principal AI Taxonomist / Knowledge Engineer
Define the evidence rules that control what reaches the model — translate regulatory frameworks into the structured Kura data that makes our governance gates work.
Stack: PostgreSQL (Aurora / pgvector), TypeScript, text-embedding-3-large, LLM APIs
Why this role exists
Kenshiki verifies AI outputs before anyone can act on them. That verification depends on Kura — structured evidence corpora built from regulatory frameworks, standards, and authoritative sources. Somebody has to decide what goes in, what stays out, and how the boundaries between jurisdictions are drawn. That person is you. Vector similarity alone can't distinguish "physical access controls" from "logical access controls" — the words overlap, but the legal obligations don't. We solve this structurally: deterministic SIRE inclusion and exclusion rules enforced in the database before the model ever sees the text. You own those rules.
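To make the idea concrete, here is a minimal sketch of a deterministic inclusion/exclusion gate. All names (`EvidenceNode`, `SireRule`, `gate`) are hypothetical illustrations, not the actual schema; in production the rules would be enforced as constraints in PostgreSQL, not application code.

```typescript
// Illustrative sketch only -- names and shapes are invented for this posting.
interface EvidenceNode {
  id: string;
  jurisdiction: string;      // e.g. "EU"
  controlledTerms: string[]; // terms drawn from the controlled vocabulary
  text: string;
}

interface SireRule {
  include: Set<string>; // a node must carry at least one of these terms
  exclude: Set<string>; // any one of these terms disqualifies a node
}

// Deterministic gate: no similarity scores are consulted.
// The decision is purely structural, so it is auditable and repeatable.
function gate(nodes: EvidenceNode[], rule: SireRule): EvidenceNode[] {
  return nodes.filter(
    (n) =>
      n.controlledTerms.some((t) => rule.include.has(t)) &&
      !n.controlledTerms.some((t) => rule.exclude.has(t))
  );
}

// "physical access controls" vs "logical access controls": lexically
// near-identical, structurally disjoint under a controlled vocabulary.
const rule: SireRule = {
  include: new Set(["logical-access-control"]),
  exclude: new Set(["physical-access-control"]),
};

const nodes: EvidenceNode[] = [
  { id: "a", jurisdiction: "EU", controlledTerms: ["logical-access-control"], text: "..." },
  { id: "b", jurisdiction: "EU", controlledTerms: ["physical-access-control"], text: "..." },
];

console.log(gate(nodes, rule).map((n) => n.id)); // ["a"]
```

The point of the sketch: the model never gets the chance to confuse the two terms, because node "b" is removed before retrieval ever runs.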
What you'll do
- Define evidence boundaries. Translate messy regulatory frameworks (ISO 27001, SOC 2, EU AI Act) into strict, atomically chunked evidence nodes inside PostgreSQL. Establish the SIRE inclusion/exclusion vocabularies that control what evidence enters the retrieval pipeline. You don't just define what the model should know — you define the explicit boundaries of what it is forbidden from accessing or inferring.
- Map regulatory crosswalks. Build the structural relationships (supersedes, equivalent, overlaps) between international frameworks so the system can reason across jurisdictions without conflating them.
- Set chunking policy. Dictate how hierarchical compliance documents are parsed — Title, Chapter, Article — so text is chunked by jurisdictional identity, not arbitrary token counts.
- Audit pipeline output. Review the semantic mappings produced by our ingestion pipelines. If the automated output doesn't meet the standard, it doesn't ship. Authority is earned, not generated.
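The chunking and crosswalk bullets above can be sketched as data shapes. Again, everything here is a hypothetical illustration, assuming a chunk's identity is its hierarchical position (Title, Chapter, Article) and crosswalks use the three relations named above:

```typescript
// Hypothetical shapes for illustration; the real schema lives in PostgreSQL.

// A chunk's identity is its position in the document hierarchy,
// not a token offset.
interface ChunkId {
  framework: string; // e.g. "EU-AI-Act" (identifier invented for this sketch)
  title: string;
  chapter: string;
  article: string;
}

type CrosswalkRelation = "supersedes" | "equivalent" | "overlaps";

interface Crosswalk {
  from: ChunkId;
  to: ChunkId;
  relation: CrosswalkRelation;
}

// A stable, human-auditable key for a chunk.
function chunkKey(c: ChunkId): string {
  return `${c.framework}/${c.title}/${c.chapter}/${c.article}`;
}

// Crosswalks are directional: "supersedes" must never be read backwards.
// Only symmetric relations may be inverted; asymmetric ones return null.
function invert(relation: CrosswalkRelation): CrosswalkRelation | null {
  return relation === "supersedes" ? null : relation;
}

const xw: Crosswalk = {
  from: { framework: "EU-AI-Act", title: "III", chapter: "2", article: "9" },
  to: { framework: "ISO-27001", title: "A", chapter: "5", article: "15" },
  relation: "overlaps",
};

console.log(chunkKey(xw.from)); // "EU-AI-Act/III/2/9"
```

Encoding directionality in the relation type is what lets the system reason across jurisdictions without conflating them: an "overlaps" edge can be traversed either way, a "supersedes" edge cannot.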
What we're looking for
- Education: MLIS or equivalent, ideally with a focus on Knowledge Organization, Metadata Architecture, or Information Systems.
- Taxonomy experience: You've built controlled vocabularies, ontologies, or classification systems for large unstructured datasets.
- Database comfort: You don't need to write SQL, but you need to design the JSON schemas, indexing rules, and logical constraints that backend systems execute.
- IR literacy: You understand the difference between lexical search and dense vector search, and you know the mathematical limitations of both.
- Precision mindset: In compliance domains, a single false positive is a system failure. You treat that seriously.
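The lexical-versus-dense distinction in the IR-literacy bullet can be made concrete with a toy example. The embeddings below are invented three-dimensional vectors, not real text-embedding-3-large output; they only illustrate why both signals fail on the access-controls case:

```typescript
// Toy illustration -- vectors are invented, not model output.

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Lexical signal: fraction of shared tokens.
function lexicalOverlap(a: string, b: string): number {
  const ta = new Set(a.toLowerCase().split(/\s+/));
  const tb = new Set(b.toLowerCase().split(/\s+/));
  const shared = [...ta].filter((t) => tb.has(t)).length;
  return shared / Math.max(ta.size, tb.size);
}

// The two phrases share 2 of 3 tokens...
const overlap = lexicalOverlap("physical access controls", "logical access controls");
console.log(overlap.toFixed(2)); // "0.67"

// ...and plausible embeddings for them would sit close in vector space.
// Neither signal separates phrases whose legal obligations differ --
// which is why the boundary has to be structural, not statistical.
const physical = [0.9, 0.1, 0.2];
const logical = [0.85, 0.2, 0.25];
console.log(cosine(physical, logical) > 0.9); // true
```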
Why Kenshiki?
You'll apply classical Library and Information Science to a problem most AI companies don't even know they have. The rules you write become the structural enforcement layer — not a suggestion to the model, but a gate it cannot bypass.