OnticBeta

Foundation layer

Oracle Foundry

The foundation layer of governed inference.

Oracle Foundry transforms authoritative source documents into a queryable, tamper-evident knowledge base. Every downstream governance decision traces back to this layer.

Without this

Without tamper-evident source provenance, every downstream governance decision is an assertion without evidence. Claims can’t be verified, retrieval can’t be audited, and your gate has nothing to check against.

Position in the platform

System                    Layer
Oracle Foundry            Foundation
SIRE Crosswalk            Post-Foundry
Prompt Compiler           L0
Claim Ledger              L1-L4
Process Control System    Cross-layer
Forensics Lab             L5

Seven-stage foundry pipeline

1. Document ingestion: SHA-256 source hash, idempotent upsert, version-aware change detection, actor tracking
2. Section-aware chunking: split on heading boundaries, merge undersized chunks (<75 words), heading path metadata, content hashes
3. Cryptographic watermarking: per-chunk HMAC-SHA-256 signatures, self-contained verification without database access
4. Embedding: text-embedding-3-large (512d Matryoshka), lease-based claiming that prevents double-embedding, sovereignty attribution enforced by a database CHECK constraint
5. Corpus registration: tier classification, framework/industry/segment metadata, manifest with chunk count and content hash
6. SIRE identity metadata: Subject anchors the domain, Included enriches search, Excluded enforces boundaries, Relevant maps cross-framework topology
7. License and output policy: five-status enum (licensed/customer_provided/public_domain/synthetic/unknown) with deterministic Prompt Compiler instructions
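A minimal sketch of stage 2 (section-aware chunking), assuming Markdown-style `#` headings in the source text. The 75-word threshold comes from the stage description; the merge rule (an undersized chunk absorbs the following section, and an undersized final chunk is appended to its predecessor), the `Chunk` type, and the `chunk_by_headings` name are hypothetical illustrations, not the Foundry's actual implementation. A merged chunk keeps the heading path of its first section.

```python
import hashlib
import re
from dataclasses import dataclass

MIN_WORDS = 75  # undersized-chunk threshold from the stage description
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")  # assumption: Markdown-style headings


@dataclass
class Chunk:
    heading_path: list[str]  # e.g. ["Section", "Subsection"]
    text: str
    content_hash: str = ""


def _words(text: str) -> int:
    return len(text.split())


def chunk_by_headings(document: str) -> list[Chunk]:
    """Split on headings, merge chunks under MIN_WORDS, then hash each chunk."""
    path: list[str] = []
    sections: list[Chunk] = []
    body: list[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            sections.append(Chunk(heading_path=list(path), text=text))
        body.clear()

    for line in document.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this depth or deeper
            path.append(match.group(2))
        else:
            body.append(line)
    flush()

    merged: list[Chunk] = []
    for section in sections:
        if merged and _words(merged[-1].text) < MIN_WORDS:
            # The previous chunk is undersized: absorb this section into it.
            merged[-1].text += "\n\n" + section.text
        else:
            merged.append(section)
    if len(merged) > 1 and _words(merged[-1].text) < MIN_WORDS:
        # An undersized final chunk is appended to its predecessor instead.
        tail = merged.pop()
        merged[-1].text += "\n\n" + tail.text

    for chunk in merged:
        chunk.content_hash = hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()
    return merged
```

Because each content hash is computed after merging, the hash always describes exactly the text that will be embedded and watermarked.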

Canonical oracle frontmatter

Every oracle corpus carries a YAML frontmatter block that combines identity, provenance, SIRE authority metadata, and license enforcement policy into a single canonical schema.

corpus_id: iso-iec-27001-2022-v1
title: "ISO/IEC 27001:2022 — Information Security Management Systems"
tier: tier_2
version: 1
content_type: prose
frameworks: [ISO27001]
industries: [fintech, healthcare, saas, ecommerce, cloud, manufacturing, government]
segments: [enterprise, smb]
source_url: https://www.iso.org/standard/27001
source_publisher: "ISO/IEC Joint Technical Committee JTC 1, Subcommittee SC 27"
last_verified: "2026-02-28"
language: english

license:
  status: licensed
  notes: "ISO/IEC copyright. Reproduction restricted."
  output_policy: citation_only

fact_check:
  status: ai_parsed
  checked_at: "2026-02-28"
  checked_by: openrouter/anthropic/claude-sonnet-4-5

sire:
  subject: information_security_management
  included:
    - ISMS
    - risk assessment
    - risk treatment
    - Statement of Applicability
    - controls
    - Annex A
    - confidentiality
    - integrity
    - availability
  excluded:
    - PHI
    - covered entity
    - business associate
    - HIPAA
    - ePHI
    - data subject
    - personal data
    - controller
    - processor
    - GDPR
    - DPIA
  relevant:
    - ISO-27002:2022
    - NIST-CSF
    - SOC2:CC6
    - HITRUST-CSF
    - ISO-31000:2018

sire

  • subject (string)

    Domain label (lowercase snake_case). Identity anchor for this corpus.

  • included (string[])

    Editorial keywords inside this domain. Strengthens discovery — never enforces.

  • excluded (string[])

    Anti-keywords from other domains. The only deterministic enforcement gate at retrieval time.

  • relevant (string[])

    Cross-framework references for topological expansion. Discovery only, not enforcement.
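The field descriptions above separate discovery from enforcement. The sketch below shows one way that split could work, assuming corpus-level gating: an `excluded` term in the query removes the corpus from consideration outright, while `included` terms only reorder the survivors. The `eligible_corpora` helper, the whole-word matching rule, and the `hipaa-security-rule` corpus ID used in the example are hypothetical.

```python
import re


def _mentions(text: str, term: str) -> bool:
    """Case-insensitive whole-word (or whole-phrase) match."""
    pattern = r"(?<!\w)" + re.escape(term.lower()) + r"(?!\w)"
    return re.search(pattern, text.lower()) is not None


def eligible_corpora(query: str, corpora: list[dict]) -> list[str]:
    """Return corpus IDs allowed for this query, best discovery match first."""
    scored: list[tuple[int, str]] = []
    for corpus in corpora:
        sire = corpus["sire"]
        # Enforcement: any excluded term in the query removes the corpus outright.
        if any(_mentions(query, term) for term in sire["excluded"]):
            continue
        # Discovery: included terms change ordering only, never eligibility.
        boost = sum(_mentions(query, term) for term in sire["included"])
        scored.append((boost, corpus["corpus_id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [corpus_id for _, corpus_id in scored]
```

Whole-word matching matters for short anti-keywords: "PHI" should block a query about protected health information but not one that merely contains the word "graphical".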

license

  • status (enum)

    licensed | customer_provided | public_domain | synthetic | unknown

  • output_policy (enum)

    citation_only | full_text_permitted | unrestricted | attributed_reproduction | restricted_pending

  • notes (string)

    Human-readable constraint description for the source material.
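A minimal validation sketch for the `license` and `sire` blocks, taking the frontmatter as an already-parsed dict (for example, the output of a YAML loader). The allowed enum values are copied from the field reference above; the snake_case pattern, the rule that a term cannot be both included and excluded, and the `validate_frontmatter` name are assumptions for illustration.

```python
import re

LICENSE_STATUSES = ("licensed", "customer_provided", "public_domain", "synthetic", "unknown")
OUTPUT_POLICIES = (
    "citation_only", "full_text_permitted", "unrestricted",
    "attributed_reproduction", "restricted_pending",
)
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")  # assumed pattern


def validate_frontmatter(fm: dict) -> list[str]:
    """Return schema problems for the license and sire blocks (empty list = pass)."""
    errors: list[str] = []
    lic = fm.get("license") or {}
    if lic.get("status") not in LICENSE_STATUSES:
        errors.append(f"license.status not allowed: {lic.get('status')!r}")
    if lic.get("output_policy") not in OUTPUT_POLICIES:
        errors.append(f"license.output_policy not allowed: {lic.get('output_policy')!r}")

    sire = fm.get("sire") or {}
    subject = sire.get("subject")
    if not isinstance(subject, str) or not SNAKE_CASE.fullmatch(subject):
        errors.append(f"sire.subject must be lowercase snake_case: {subject!r}")

    terms: dict[str, list[str]] = {}
    for key in ("included", "excluded", "relevant"):
        value = sire.get(key, [])
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            terms[key] = value
        else:
            errors.append(f"sire.{key} must be a list of strings")

    # Assumption: a term may not appear in both included and excluded.
    if "included" in terms and "excluded" in terms:
        overlap = {t.lower() for t in terms["included"]} & {t.lower() for t in terms["excluded"]}
        if overlap:
            errors.append(f"terms both included and excluded: {sorted(overlap)}")
    return errors
```

Returning every problem at once, rather than raising on the first, lets a corpus engineer fix a frontmatter block in a single pass.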

Retrieval modes

Semantic search

Vector similarity retrieval over embedded chunks with metadata filters and thresholding.
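A brute-force sketch of semantic retrieval over in-memory vectors: exact-match metadata filters first, then cosine similarity with a minimum-score threshold. The helper names, the default `threshold` and `top_k`, and the chunk layout (`id`, `vector`, `meta`) are assumptions; a production store would typically use an approximate nearest-neighbour index instead of a linear scan. The `truncate_matryoshka` helper illustrates the general Matryoshka idea behind 512d embeddings: keep the leading dimensions and re-normalize to unit length.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def truncate_matryoshka(vec: list[float], dims: int = 512) -> list[float]:
    """Keep the leading `dims` components and re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


def semantic_search(query_vec: list[float], chunks: list[dict],
                    filters: Optional[dict] = None, threshold: float = 0.3,
                    top_k: int = 5) -> list[tuple[float, str]]:
    """Return (score, chunk_id) pairs, best first, above the threshold."""
    filters = filters or {}
    hits: list[tuple[float, str]] = []
    for chunk in chunks:
        # Metadata filter: every requested key must match exactly.
        if any(chunk["meta"].get(k) != v for k, v in filters.items()):
            continue
        score = cosine(query_vec, chunk["vector"])
        if score >= threshold:  # drop weak matches instead of padding top_k
            hits.append((score, chunk["id"]))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:top_k]
```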

Hybrid search

Weighted reciprocal-rank fusion combining vector relevance and full-text ranking.

Hybrid score formula

combined_score = semantic_weight * (1 / (20 + semantic_rank)) + (1 - semantic_weight) * (1 / (20 + text_rank))
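The formula can be computed directly from two ranked lists of chunk IDs. This sketch assumes ranks are 1-based and that a chunk missing from one list contributes nothing from that side; both are assumptions, as is the `hybrid_scores` name. Ties are broken by ID so the output order is deterministic.

```python
def hybrid_scores(semantic_ranked: list[str], text_ranked: list[str],
                  semantic_weight: float = 0.5, k: int = 20) -> list[tuple[str, float]]:
    """Weighted reciprocal-rank fusion: w/(k + r_sem) + (1 - w)/(k + r_text)."""
    sem_rank = {doc: i for i, doc in enumerate(semantic_ranked, start=1)}
    txt_rank = {doc: i for i, doc in enumerate(text_ranked, start=1)}
    scores: dict[str, float] = {}
    for doc in sem_rank.keys() | txt_rank.keys():
        score = 0.0
        if doc in sem_rank:  # absent from a list -> no contribution from that side
            score += semantic_weight * (1.0 / (k + sem_rank[doc]))
        if doc in txt_rank:
            score += (1.0 - semantic_weight) * (1.0 / (k + txt_rank[doc]))
        scores[doc] = score
    # Highest score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With `semantic_weight=0.7`, a chunk ranked first semantically and second by full text outranks one with the opposite ranks; at `0.3` the order flips. The constant 20 is smaller than the 60 often used for reciprocal-rank fusion, which widens the score gap between the top few ranks.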

Sovereignty and tamper evidence

  • Every embedding vector carries an immutable attribution chain: embedding authority, egress policy, and pipeline run attestation
  • Unattributed embeddings are structurally impossible: a database CHECK constraint enforces attribution, not application logic
  • Chunk watermarks use HMAC-SHA-256 signatures that verify without database access — the chunk proves its own provenance
  • Immutable event log records every embedding operation (success or failure) for audit and compliance reporting
  • Designed for air-gap and VPC deployment where data must never leave the customer's security perimeter
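A sketch of self-verifying chunk watermarks using Python's standard `hmac` and `hashlib` modules. Verification needs only the chunk and the signing key, so no database lookup is involved. Which fields are signed (`corpus_id`, `chunk_index`, `content_hash`), the JSON canonicalization, and the function names are assumptions; key management is out of scope here.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    # Sorted keys and compact separators give a stable byte string to sign.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def watermark_chunk(key: bytes, corpus_id: str, index: int, text: str) -> dict:
    """Attach a signed provenance record to a chunk of text."""
    payload = {
        "corpus_id": corpus_id,
        "chunk_index": index,
        "content_hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }
    signature = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"text": text, "watermark": {**payload, "sig": signature}}


def verify_chunk(key: bytes, chunk: dict) -> bool:
    """Check a chunk against its own embedded watermark; no database lookup."""
    mark = dict(chunk["watermark"])
    claimed_sig = mark.pop("sig")
    # The text must still hash to the recorded content hash...
    if hashlib.sha256(chunk["text"].encode("utf-8")).hexdigest() != mark["content_hash"]:
        return False
    # ...and the provenance fields must carry a valid signature under the key.
    expected = hmac.new(key, _canonical(mark), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, claimed_sig)
```

Editing the text, relabeling the chunk, or verifying with the wrong key all cause `verify_chunk` to return `False`. `hmac.compare_digest` avoids leaking timing information during the comparison.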

Manufacturing quality model

  • Stage-gated quality controls from source qualification to production corpus activation
  • Per-stage statistical process control (SPC) metrics: chunk variation, validation errors, embedding reliability
  • Production feedback metrics: retrieval hit rate, citation rate, gate pass rate, freshness age
  • Continuous-improvement loop: low-performing corpora are reworked, not ignored
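For the per-stage SPC metrics, a common starting point is a Shewhart individuals (X-mR) chart. The sketch below derives control limits from a baseline of pipeline runs and flags new runs that fall outside them. Tracking "chunk variation" as, say, mean words per chunk per run is an assumption, as are the function names; the 2.66 factor is the standard X-mR constant (3 / 1.128).

```python
from statistics import mean


def individuals_limits(baseline: list[float]) -> tuple[float, float, float]:
    """X-mR chart limits: center +/- 2.66 * average moving range."""
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = mean(moving_ranges)  # needs at least two baseline points
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


def out_of_control(values: list[float], limits: tuple[float, float, float]) -> list[int]:
    """Indices of runs outside the control limits (candidates for rework)."""
    lcl, _, ucl = limits
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]
```

For example, a baseline of `[200, 210, 190, 205, 195]` words per chunk gives limits of roughly 163.4 to 236.6, so later runs averaging 240 or 150 would be flagged for rework.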

Deployment profile

  • Works across Studio, Refinery, and Clean Room environments (runtime profiles: Lite, Standard, Full)
  • Supports customer-perimeter deployment (air-gap or VPC) for regulated workloads
  • Designed for customer-owned data boundary and provenance-preserving operation

See the full architecture

Oracle Foundry is the first layer in the six-system governed inference stack.

Who uses this

Operator

Corpus engineers

Data stewards who curate, version, and maintain authoritative source collections.

Consumer

Every downstream system

Prompt Compiler, Claim Ledger, SIRE Crosswalk, and Forensics Lab all consume oracle artifacts.