
Karpathy's LLM Knowledge Base System: Full Breakdown of His CLAUDE.md Schema | WebEdge

Karpathy published an LLM Wiki Pattern — a methodology for building persistent, compounding knowledge bases with LLM agents. We break down the exact three-layer architecture, CLAUDE.md schema, and the three core operations: Ingest, Query, Lint.

11 April 2026 · 7 min read


WebEdge team

A log.md excerpt, with each operation recorded as a level-two header:

```
## [2026-04-05] query | Question Topic
## [2026-04-07] lint | Weekly health check
```

This format allows Unix tool processing: grep "^## \[" log.md | tail -5 retrieves the last 5 operations. The log is also how the LLM (and the user) understand the wiki's history and activity.
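The same retrieval can be done programmatically. A minimal Python sketch matching the grep pattern above (the in-memory sample stands in for a real log.md):

```python
import re

def last_operations(log_text: str, n: int = 5) -> list[str]:
    """Return the last n operation headers from a log.md string.

    Matches the same lines as the article's `grep "^## \\[" log.md`.
    """
    headers = re.findall(r"^## \[.*$", log_text, flags=re.MULTILINE)
    return headers[-n:]

# Demo on a tiny in-memory log; real usage would read open("log.md").read().
sample = (
    "## [2026-04-05] query | Question Topic\n"
    "Answered from a wiki page\n"
    "## [2026-04-07] lint | Weekly health check\n"
    "Stale claims flagged\n"
)
print(last_operations(sample))
```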

CLI Tooling and qmd

Karpathy recommends building CLI tools to help the LLM operate efficiently at scale. He identifies search as the most critical capability. His recommendation: qmd — a local markdown search engine with hybrid BM25/vector search and LLM re-ranking, all on-device. It provides both a CLI interface and an MCP server interface, allowing the LLM agent to invoke search directly during query processing.

Obsidian as the Front-End

The gist describes an Obsidian-native workflow for managing raw sources:

  • Obsidian Web Clipper — browser extension for one-click web-to-markdown capture
  • Image handling: configure Obsidian attachment folder to raw/assets/, bind "Download attachments for current file" hotkey (e.g. Ctrl+Shift+D) for local image storage. Note: LLMs must view images separately after reading the text file.
  • Obsidian Graph View — visualizes wiki connectivity, identifies hub pages and orphans
  • Marp — markdown slide deck format with Obsidian plugin support
  • Dataview — Obsidian plugin that queries page frontmatter to generate dynamic tables from YAML metadata
  • Git repository — the entire wiki is a version-controlled markdown repo with built-in collaboration
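Dataview's core idea — turning YAML frontmatter into tables — can be sketched in plain Python. The field names and page paths below are illustrative, not prescribed by the gist:

```python
def read_frontmatter(page: str) -> dict:
    """Parse simple `key: value` frontmatter delimited by --- lines."""
    lines = page.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter ends the frontmatter block
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

# Demo: two hypothetical wiki pages with illustrative metadata fields.
pages = {
    "wiki/competitor-a.md": "---\nstatus: live\nupdated: 2026-04-01\n---\n# Competitor A\n",
    "wiki/old-note.md": "---\nstatus: stale\nupdated: 2025-11-02\n---\n# Old note\n",
}
table = {name: read_frontmatter(text) for name, text in pages.items()}
print(table["wiki/old-note.md"]["status"])
```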

Why This Works (The Maintenance Argument)

Karpathy makes a sharp observation about why knowledge bases typically fail: not because reading is hard, but because maintenance is. Updating cross-references, keeping multiple pages consistent, flagging stale claims — these are exactly the tedious, systematic tasks that LLMs excel at and humans avoid.

He connects the pattern to Vannevar Bush's 1945 Memex concept — a vision of personal, curated knowledge with "associative trails between documents." Bush's vision failed because it lacked a solution to the maintenance problem. LLMs solve that problem.

The human's role in this system: curator, analyst, questioner. The LLM's role: everything else.

Enterprise Application with Claude API

The LLM Wiki pattern is the right conceptual foundation for enterprise knowledge management systems built on the Claude API. Direct applications:

  • Competitive intelligence: Ingest competitor press releases, product changelogs, analyst reports — the wiki maintains live entity pages per competitor, with the Lint operation flagging when claims become outdated
  • Internal expertise base: Every project, every decision, every lesson learned — structured, searchable, cross-referenced, and maintained automatically
  • Customer-facing agents: Support agents that don't just answer questions but file valuable Q&A pairs back into the knowledge base, compounding over time
  • Research synthesis: Analysts who need to track a domain over months rather than isolated sessions
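As one concrete shape the Lint operation could take, here is a sketch that flags pages whose `updated` frontmatter date has gone stale — the field name and threshold are assumptions for illustration, not part of Karpathy's schema:

```python
from datetime import date

def stale_pages(metadata: dict, today: date, max_age_days: int = 90) -> list:
    """Flag pages whose `updated` field is older than max_age_days.

    `metadata` maps page path -> frontmatter dict. `updated` is an
    illustrative field name holding an ISO date string.
    """
    stale = []
    for path, meta in metadata.items():
        updated = meta.get("updated")
        if updated is None:
            stale.append(path)  # no date recorded: treat as stale
            continue
        if (today - date.fromisoformat(updated)).days > max_age_days:
            stale.append(path)
    return stale

meta = {
    "wiki/competitor-a.md": {"updated": "2026-04-01"},
    "wiki/old-note.md": {"updated": "2025-11-02"},
}
print(stale_pages(meta, today=date(2026, 4, 11)))
```

A real Lint pass would go further (checking cross-references and claim consistency), but date-based staleness is the cheapest check to automate.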

The critical architectural insight: the CLAUDE.md schema is the variable. Karpathy is explicit: "The directory structure, the schema conventions, the page formats, the tooling — all of that will depend on your domain." Every element is optional and modular. The schema is co-developed with the LLM, tailored to the specific business domain.

This is exactly what we build at WebEdge — Claude API integrations where the schema evolves with the client's domain and the knowledge base compounds over time.

Summary

Karpathy's LLM Wiki pattern is one of the clearest and most practical methodological contributions to LLM agent design in 2026. It addresses a real architectural problem in traditional RAG and proposes a concrete, implementable solution.

Key components to remember: three-layer architecture (sources → wiki → schema), three operations (Ingest, Query, Lint), two special files (index.md, log.md), and CLAUDE.md as the operational configuration center that you co-develop with the LLM.

If you want to build a system like this for your business using the Claude API, reach out.



WebEdge

We specialise in building custom AI solutions, automation systems and web products for growth-oriented companies in Lithuania. GDPR-compliant, EU-hosted.

Get in touch

Ready to implement AI in your business?

Book a free 30-min call — we'll show you what to automate first in your business process.
