Provenance-Preserving Cache Chains for Small Language Models
How Merkle trees turn Reverse RAG into a verifiable, shared knowledge layer.
Reverse Retrieval Augmented Generation (Reverse RAG) showed that client-side context injection lets small 8B-parameter language models outperform much larger models on page-specific questions. Our reference inference endpoint runs Hermes 3, a NousResearch instruction fine-tune of Meta's Llama 3.1 8B, served freely at uncloseai.com. The client already holds the document. No vector database. No embedding pipeline. No retrieval failures.
Merkle Providence Reverse RAG extends this with a second insight: the client not only holds the document, it can remember what it already computed about that document, prove that memory came from an unmodified source, & share that verified knowledge with others on public or private chains.
A Merkle tree over document content chunks produces a root hash: a 32-byte fingerprint that changes when any byte of the document changes. Pairing this root hash with a question hash, a model identity, & a small set of governance dimensions yields a content-addressed cache key. A cache hit returns a previously computed answer with a Merkle proof of origin; a cache miss triggers fresh inference, classifies the result against its source, & extends the chain.
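The paragraph above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Aborist's actual code: the helper names (`merkle_root`, `cache_key`, `lookup_or_infer`), the leaf/node prefix bytes (borrowed from RFC 6962-style domain separation), the odd-node promotion rule, the JSON canonicalization, & the model profile string are all hypothetical choices made for the example. The sketch also leaves out the Merkle proof, the faithfulness classification, & the chain extension that a real cache miss would perform.

```python
import hashlib
import json
from typing import Callable


def _h(data: bytes) -> bytes:
    """SHA-256 digest: 32 bytes, matching the root size described above."""
    return hashlib.sha256(data).digest()


def merkle_root(chunks: list[bytes]) -> bytes:
    """Fold document chunks into a single 32-byte root hash.

    Assumptions for this sketch: leaves are hashed with a 0x00 prefix,
    interior nodes with a 0x01 prefix, and an unpaired node is promoted
    to the next level unchanged.
    """
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [_h(b"\x00" + chunk) for chunk in chunks]
    while len(level) > 1:
        nxt = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # promote the unpaired node
        level = nxt
    return level[0]


def cache_key(source_root: bytes, question: str, model_profile: str,
              governance: dict[str, str]) -> str:
    """Hash the key dimensions into one content-addressed cache key.

    The serialization (sorted, compact JSON) is an assumption; any
    deterministic encoding would serve the same purpose.
    """
    fields = {
        "source_root": source_root.hex(),
        "question_hash": hashlib.sha256(question.encode("utf-8")).hexdigest(),
        "model_profile_hash": hashlib.sha256(model_profile.encode("utf-8")).hexdigest(),
        "governance": governance,
    }
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def lookup_or_infer(cache: dict[str, str], key: str,
                    infer: Callable[[], str]) -> tuple[str, bool]:
    """Return (answer, hit). On a miss, run inference & store the result."""
    if key in cache:
        return cache[key], True
    answer = infer()
    cache[key] = answer
    return answer, False


# Usage: the same document, question, model & governance values always
# map to the same key, so the second lookup is served from the cache.
chunks = [b"The client already holds the document.", b"No vector database."]
root = merkle_root(chunks)
key = cache_key(root, "What does the client hold?",
                "hypothetical-model-profile", {"schema_version": "1"})
cache: dict[str, str] = {}
print(lookup_or_infer(cache, key, lambda: "The document."))
print(lookup_or_infer(cache, key, lambda: "never called on a hit"))
```

Because the root hash feeds the key, editing any byte of any chunk changes the root, changes the key, & forces a miss; stale answers are never served for a modified document.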
Our reference implementation is Aborist, a Python content-addressed document store that ingests entire corpora at once, computes Merkle commitments at scale, classifies each answer's faithfulness against its sources with a deterministic lexical verifier, & serves cached answers to any peer that produces a matching cache key. Aborist runs against hermes.ai.unturf.com by default & ships under AGPL-3.0-only at git.unturf.com/engineering/unturf/aborist.
Once published, this technique forces a reckoning: every RAG system that cannot prove provenance of its retrieved chunks operates on unverifiable context. Merkle Providence makes provenance a first-class primitive.
We call this stack Web 2.5: the existing web's documents & URIs, augmented with machine learning context injection & Merkle-verified answer chains, without requiring a blockchain, a new protocol, or replacement of any existing infrastructure. Web 2.5 runs on top of what already exists. Any page. Any browser. Any small model. Public or private chain. No permission required.
source_root, question_hash, model_profile_hash, conversation_hash, governance_policy_hash, schema_version, canonicalization_version, chunking_version. Each dimension partitions the cache namespace cleanly.
live / failed / stale / quarantined; falsify (audit-preserving) vs. burn (kindergarten leaf-delete).
POINTER-LINKED → ANCHOR-WARRANTED → EVIDENCE-WARRANTED → (ENTAILMENT-VERIFIED reserved), with UNGROUNDED below. Each rung names a strictly stronger property.
aborist reclassify promotes records up the ladder when new evidence arrives. The ratchet only goes one way: guess → warranted record.
Russell Ballestrini, David Wong, Riley Morgan. "Unturf Automated General Intelligence: Merkle Providence Reverse RAG." unfirehose.com, 2026. https://unfirehose.com/merkle-providence-reverse-rag.html
Source & build: git.unturf.com/.../whitepaper