Build Your Own LLM Brain — Starter Kit

The "brain" pattern is the cheapest, fastest way to give yourself persistent memory across every Claude / ChatGPT / Gemini session you'll ever have. It is a folder of plain markdown files with one schema and two Python scripts. It works on Mac, Windows, and Linux. No Obsidian, no Notion, no SaaS bill. It is based on the production brain I run for my 4 SaaS products: 91 wiki pages, ~150 sources ingested over 6 months.

The pattern is Andrej Karpathy's LLM Wiki, refined for daily use.

What you get

A 4-layer personal knowledge base:

your-brain/
├── BRAIN.md                  ← the schema (rules for the other layers)
├── brain.config.json         ← per-machine paths (skills dir, etc.)
├── README.md                 ← human-facing intro
├── raw/                      ← immutable copies of every source you ingest
│   └── 2026-05-25-some-source.md
├── wiki/                     ← LLM-maintained markdown with [[wikilinks]]
│   ├── index.md              ← catalog of every page (read first on query)
│   ├── log.md                ← append-only operation log
│   ├── concepts/             ← abstract ideas, frameworks
│   ├── entities/             ← people, companies, products
│   ├── sources/              ← one page per ingested source
│   ├── synthesis/            ← cross-cutting insights
│   └── skills-pending/       ← drafts for new Claude skills (await approval)
└── scripts/
    ├── brain-lint.py         ← broken links, orphans, missing frontmatter
    └── brain-search.py       ← naive term-frequency search

The wiki is queryable by Claude in any session — point Claude at the folder and tell it to read BRAIN.md first.

Why this beats Notion / Obsidian / ChatGPT Memory

| Tool | Limitation | Brain solves it |
|---|---|---|
| Notion | Locked in proprietary DB, slow LLM access, no `[[wikilinks]]` semantics | Plain markdown, instant Claude access via folder read |
| Obsidian | Great viewer; weak LLM workflow; vault graph is decorative | Brain treats LLM as the writer/reader, not a human-only viewer |
| ChatGPT Memory | Black box, can't audit, can't share, fixed quota | Every fact has a source page + frontmatter you can grep |
| Apple Notes / scattered docs | No structure, no `[[links]]`, no quality gate | 6-criterion gate keeps signal high; lint catches broken links |
| Pinecone / vector DB | Right answer; wrong abstraction for solo operator | Markdown wins until you have >10K pages |

10-minute setup

Step 1 — Create the directory structure

# Mac / Linux
mkdir -p your-brain/{raw,wiki/{concepts,entities,sources,synthesis,skills-pending},scripts}

# Windows PowerShell
New-Item -ItemType Directory -Force -Path your-brain\raw, your-brain\wiki\concepts, your-brain\wiki\entities, your-brain\wiki\sources, your-brain\wiki\synthesis, your-brain\wiki\skills-pending, your-brain\scripts
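
If you would rather not maintain two shell variants, the same tree can be created from Python on any OS. This is a minimal sketch, not part of the original kit: the `scaffold` function and `SUBDIRS` tuple are hypothetical names, and the folder list is copied from the tree in "What you get".

```python
from pathlib import Path

# Sub-folders taken from the tree shown in "What you get".
SUBDIRS = (
    "raw",
    "wiki/concepts",
    "wiki/entities",
    "wiki/sources",
    "wiki/synthesis",
    "wiki/skills-pending",
    "scripts",
)


def scaffold(root):
    """Create the brain folder tree under `root`. Safe to run more than once."""
    root = Path(root)
    created = []
    for sub in SUBDIRS:
        path = root / sub
        # parents=True builds intermediate folders; exist_ok=True makes re-runs harmless.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `scaffold("your-brain")` from the parent folder should leave you with the same layout as the two commands above.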

Step 2 — Drop in the 4 starter files

your-brain/BRAIN.md

# BRAIN.md

Schema for the brain. Read this first, every operation.

## 1. Layers

The brain has four layers:

1. **`raw/`** — immutable copies of every source ever ingested. Never edit, never delete.
2. **`wiki/`** — LLM-maintained markdown pages with `[[page-slug]]` cross-references. Five page types: concepts, entities, sources, synthesis, skill drafts.
3. **`BRAIN.md`** (this file) — the schema. Rules for the other three layers.
4. **External skills folder** — read-mostly. Path comes from `brain.config.json` → `externalSkillsDir` (portable across machines). The brain proposes skill updates here, never silently edits. New skills are drafted in `wiki/skills-pending/` for approval before graduating.

**Cross-platform note:** all paths in scripts use `pathlib.Path`. The only OS-specific value is `externalSkillsDir` in `brain.config.json` — edit per machine.

## 2. Operations

### ingest

Triggered by "ingest this" / "add this to the brain":

1. Save the raw source verbatim to `raw/YYYY-MM-DD-<short-slug>.<ext>`. Never modify after.
2. Discuss with the user — takeaway, who it applies to, what the brain already knows.
3. Write or update relevant wiki pages. Cross-reference with `[[page-slug]]`.
4. Run the **Quality Gate** (§3) on every procedure.
5. Append a log entry to `wiki/log.md`.

### query

Triggered by "what does the brain know about X" / "check the brain":

1. Read `wiki/index.md` first.
2. Drill into 2–6 relevant pages.
3. Answer the question, citing pages with `[[page-slug]]`.
4. If synthesis is non-trivial + reusable, file under `wiki/synthesis/` and add to index.
5. Append a log entry.

### lint

Triggered by "lint the brain". Runs `python scripts/brain-lint.py`. Reports broken wikilinks, orphan pages, missing frontmatter.

## 3. Quality Gate

A procedure earns a brain page only if it scores ≥ 4 of 6:

1. **Reusable** — applies to more than one situation.
2. **Improves performance / workflow** — measurably, not stylistic.
3. **Concrete** — specific enough to act on.
4. **In-domain** — within the topics you actually work on (edit this list per your brain).
5. **Novel** — not already captured.
6. **Source-traceable** — links back to a specific section in `raw/`.

| Score | Action |
|---|---|
| 6/6 | Skill draft → `wiki/skills-pending/<slug>.md` (await approval). |
| 4–5/6 | Wiki page in concepts / entities / synthesis. |
| <4/6 | Mention in source page under `## Skipped (failed quality gate)`. |

## 4. Page anatomy

```markdown
---
type: concept              # concept | entity | source | synthesis | skill-draft | index | log
title: Display Title
created: YYYY-MM-DD
updated: YYYY-MM-DD
sources:
  - 2026-05-25-some-source.md
---

# Display Title

Body. Use [[page-slug]] for cross-references.

## Cross-references

- [[page-slug-a]] — why it relates
```

## 5. Naming

- Kebab-case filenames: `construction-draw-schedule.md`.
- Source pages: `YYYY-MM-DD-` prefix.
- Concept / entity / synthesis: no date prefix.
- Skill drafts: future skill name.

## 6. Log format

Append-only `wiki/log.md`. Format: `## [<date> <time>] <op> | "<title>"` followed by ≤ 8 bullets. `<op>` ∈ {init, ingest, query, lint, skill-promote, contradiction}.

## 7. Anti-patterns

Never:
- Write to the external skills folder for a new skill without approval. Drafts go to `wiki/skills-pending/` first.
- Silently overwrite a contradiction. Add a `> [!contradiction]` callout pointing to both sources, then ask.
- Ingest without saving the raw source first. No `raw/` → no ingest.
- Invent a procedure. Every procedure must trace to a specific section in `raw/`.
- Promote a wiki page to a skill without the Quality Gate (6/6).

your-brain/brain.config.json

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "Brain configuration — edit per machine.",
  "externalSkillsDir": "EDIT-ME",
  "externalSkillsDirNote": "OS-native path. Forward slashes on macOS/Linux (e.g. /Users/you/CoWork-Skills); double-backslash on Windows (e.g. C:\\\\Users\\\\you\\\\CoWork-Skills).",
  "rawDir": "raw",
  "wikiDir": "wiki",
  "scriptsDir": "scripts",
  "qualityGateThreshold": 4,
  "qualityGateSkillThreshold": 6,
  "orphanExempt": ["index", "log"],
  "requiredFrontmatter": ["type", "created", "updated"]
}
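
The two scripts in Step 3 hardcode their settings rather than reading this file. If you want them to honor it, a loader could look roughly like the sketch below. The function name `load_config` and the derived keys `wikiPath`, `rawPath`, and `skillsPath` are assumptions for illustration; only the JSON keys come from the config above.

```python
import json
from pathlib import Path

# Fallbacks mirror the values in the starter brain.config.json.
DEFAULTS = {
    "rawDir": "raw",
    "wikiDir": "wiki",
    "scriptsDir": "scripts",
    "qualityGateThreshold": 4,
    "qualityGateSkillThreshold": 6,
    "orphanExempt": ["index", "log"],
    "requiredFrontmatter": ["type", "created", "updated"],
}


def load_config(brain_root):
    """Merge brain.config.json over DEFAULTS and resolve folder paths."""
    brain_root = Path(brain_root)
    cfg = dict(DEFAULTS)
    cfg_path = brain_root / "brain.config.json"
    if cfg_path.is_file():
        cfg.update(json.loads(cfg_path.read_text(encoding="utf-8")))
    # Derived keys (hypothetical names) so callers do not rebuild paths themselves.
    cfg["wikiPath"] = brain_root / cfg["wikiDir"]
    cfg["rawPath"] = brain_root / cfg["rawDir"]
    # "EDIT-ME" is the placeholder from the starter file, so treat it as unset.
    skills = cfg.get("externalSkillsDir", "EDIT-ME")
    cfg["skillsPath"] = None if skills in ("", "EDIT-ME") else Path(skills)
    return cfg
```

A script could then use `load_config(Path(__file__).resolve().parent.parent)` and read `cfg["orphanExempt"]` instead of a module-level constant.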

your-brain/wiki/index.md

---
type: index
title: Wiki Index
created: 2026-05-25
updated: 2026-05-25
---

# Wiki Index

Catalog of every page. One line per page: `- [[slug]] — short summary`. Grouped by section. The query operation always reads this file first.

## Concepts
(none yet — ingest your first source)

## Entities
(none yet)

## Sources
(none yet)

## Synthesis
(none yet)

## Skills
(none yet)

your-brain/wiki/log.md

---
type: log
title: Operation Log
created: 2026-05-25
updated: 2026-05-25
---

# Operation Log

Append-only. Format: `## [<date> <time>] <op> | "<title>"` followed by ≤ 8 bullets.

## [2026-05-25 12:00] init | "brain initialized"

- created BRAIN.md, brain.config.json, scripts/, wiki/, raw/
- ready for first ingest
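
Claude normally writes these entries for you. If you ever append one from a script, the format above can be sketched as follows. The helper names `format_log_entry` and `append_log_entry` are hypothetical; the header layout, the allowed `<op>` values, and the 8-bullet cap come from BRAIN.md §6.

```python
from datetime import datetime
from pathlib import Path

# Allowed <op> values, as listed in BRAIN.md section 6.
VALID_OPS = {"init", "ingest", "query", "lint", "skill-promote", "contradiction"}


def format_log_entry(op, title, bullets, when=None):
    """Build one log entry: a '## [date time] op | "title"' header plus bullets."""
    if op not in VALID_OPS:
        raise ValueError(f"unknown op: {op}")
    if len(bullets) > 8:
        raise ValueError("log entries take at most 8 bullets")
    when = when or datetime.now()
    header = f'## [{when:%Y-%m-%d %H:%M}] {op} | "{title}"'
    lines = [header, ""] + [f"- {b}" for b in bullets]
    return "\n".join(lines) + "\n"


def append_log_entry(log_path, op, title, bullets, when=None):
    """Append an entry to the log without touching earlier content."""
    entry = format_log_entry(op, title, bullets, when)
    # Mode "a" only ever adds to the end, which keeps the log append-only.
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write("\n" + entry)
```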

Step 3 — Drop in the 2 Python scripts

Both are stdlib-only Python 3. Nothing to pip install.

your-brain/scripts/brain-lint.py

#!/usr/bin/env python3
"""brain-lint.py — Lint the wiki for broken links, orphans, missing frontmatter."""

import re
import sys
from pathlib import Path

WIKI_DIR = Path(__file__).resolve().parent.parent / "wiki"
REQUIRED_FRONTMATTER = ("type", "created", "updated")
ORPHAN_EXEMPT = {"index", "log"}

FENCED_CODE = re.compile(r"```.*?```", re.DOTALL)
INLINE_CODE = re.compile(r"`[^`\n]*`")
WIKILINK = re.compile(r"\[\[([^\[\]\|#]+?)(?:#[^\[\]\|]*)?(?:\|[^\[\]]*)?\]\]")
FRONTMATTER = re.compile(r"\A---\s*\n(.*?)\n---\s*\n", re.DOTALL)


def parse_frontmatter(content):
    m = FRONTMATTER.match(content)
    if not m:
        return None, content
    keys = set()
    for line in m.group(1).splitlines():
        if not line or line[0] in (" ", "\t", "-", "#"):
            continue
        if ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return keys, content[m.end():]


def strip_code(text):
    text = FENCED_CODE.sub("", text)
    text = INLINE_CODE.sub("", text)
    return text


def find_wikilinks(body):
    return [m.group(1).strip() for m in WIKILINK.finditer(strip_code(body))]


def main():
    if not WIKI_DIR.is_dir():
        print(f"ERROR: wiki dir not found: {WIKI_DIR}", file=sys.stderr)
        return 2

    md_files = sorted(WIKI_DIR.rglob("*.md"))
    slug_to_path = {f.stem: f for f in md_files}

    broken_links = []
    inbound = {slug: 0 for slug in slug_to_path}
    missing_fm = []

    for f in md_files:
        content = f.read_text(encoding="utf-8")
        keys, body = parse_frontmatter(content)
        if keys is None:
            missing_fm.append((f, list(REQUIRED_FRONTMATTER)))
        else:
            missing = [k for k in REQUIRED_FRONTMATTER if k not in keys]
            if missing:
                missing_fm.append((f, missing))

        for slug in find_wikilinks(body):
            if slug not in slug_to_path:
                broken_links.append((f, slug))
            elif slug != f.stem:
                inbound[slug] += 1

    orphans = sorted(
        (slug_to_path[s] for s in slug_to_path
         if inbound[s] == 0 and s not in ORPHAN_EXEMPT),
        key=lambda p: str(p),
    )

    print("=" * 60)
    print("BRAIN LINT REPORT")
    print("=" * 60)

    print("\n[1] Broken wikilinks")
    if broken_links:
        for src, slug in broken_links:
            print(f"  {src.relative_to(WIKI_DIR)} -> [[{slug}]]")
    else:
        print("  none")

    print("\n[2] Orphan pages (no inbound wikilinks)")
    if orphans:
        for p in orphans:
            print(f"  {p.relative_to(WIKI_DIR)}")
    else:
        print("  none")

    print("\n[3] Missing required frontmatter")
    if missing_fm:
        for p, missing in missing_fm:
            print(f"  {p.relative_to(WIKI_DIR)}: missing {', '.join(missing)}")
    else:
        print("  none")

    # Exit non-zero only on broken links; orphans and frontmatter gaps are report-only.
    return 1 if broken_links else 0


if __name__ == "__main__":
    sys.exit(main())

your-brain/scripts/brain-search.py

#!/usr/bin/env python3
"""brain-search.py — Naive term-frequency search across the wiki.

usage: python scripts/brain-search.py <query terms...>
"""

import re
import sys
from collections import Counter
from pathlib import Path

WIKI_DIR = Path(__file__).resolve().parent.parent / "wiki"
TOKEN = re.compile(r"[a-z0-9]{3,}")
WIKILINK = re.compile(r"\[\[[^\]]*\]\]")


def tokenize(text):
    text = WIKILINK.sub(" ", text.lower())
    return TOKEN.findall(text)


def main(argv):
    if len(argv) < 2:
        print("usage: brain-search.py <query terms>", file=sys.stderr)
        return 2

    query = tokenize(" ".join(argv[1:]))
    if not query:
        print("no usable query terms (need >= 3 chars each)", file=sys.stderr)
        return 2

    scored = []
    for f in sorted(WIKI_DIR.rglob("*.md")):
        text = f.read_text(encoding="utf-8")
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in query)
        # Title bonus: +5 for each query term that appears in the H1
        h1_match = re.search(r"^# (.+)$", text, re.MULTILINE)
        if h1_match:
            title_tokens = tokenize(h1_match.group(1))
            score += 5 * sum(1 for t in query if t in title_tokens)
        if score > 0:
            scored.append((score, f.relative_to(WIKI_DIR)))

    scored.sort(key=lambda x: (-x[0], str(x[1])))
    if not scored:
        print("(no matches)")
        return 0

    for score, path in scored[:10]:
        print(f"{score:>5}  {path}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))

Step 4 — Verify it works (Mac/Linux/Windows)

# Run lint — should print "BRAIN LINT REPORT" with "none" under every section
cd your-brain
python scripts/brain-lint.py

# Try a search — should return (no matches) on empty brain
python scripts/brain-search.py construction draws

Note: if python is not found, try python3 (common on macOS and Linux) or py (the Windows launcher), or install Python from python.org. Both scripts behave identically on all 3 OSes: paths go through pathlib and everything else is stdlib.
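
To see what the linter will count as a link target once you have real pages, it can help to run its link extraction on a sample string. The sketch below copies the patterns from brain-lint.py (the fence pattern is assembled from a `FENCE` variable only so that the example nests cleanly inside a code block); the sample text is made up.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example nests in a fence
FENCED_CODE = re.compile(FENCE + r".*?" + FENCE, re.DOTALL)
INLINE_CODE = re.compile(r"`[^`\n]*`")
WIKILINK = re.compile(r"\[\[([^\[\]\|#]+?)(?:#[^\[\]\|]*)?(?:\|[^\[\]]*)?\]\]")


def find_wikilinks(body):
    """Return link targets, ignoring anything inside fenced or inline code."""
    body = FENCED_CODE.sub("", body)
    body = INLINE_CODE.sub("", body)
    return [m.group(1).strip() for m in WIKILINK.finditer(body)]


sample = (
    "See [[page-slug]], [[page-slug|an alias]] and [[other-page#Heading]].\n"
    "Inline code like `[[not-a-link]]` is ignored.\n"
)
print(find_wikilinks(sample))
# -> ['page-slug', 'page-slug', 'other-page']
```

Aliases and heading anchors are dropped, so only the slug has to match a filename in wiki/. The placeholder `[[slug]]` in the starter index.md sits inside inline code, so it is not reported as broken.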

Step 5 — Tell Claude how to use it

Open Claude (Code, Desktop, or any chat client with file access to your-brain/) and paste:

I have a personal knowledge base at <path-to-your-brain>. Read `BRAIN.md` first,
then `wiki/index.md`. From now on, when I say:

- "ingest this" → run the ingest operation from BRAIN.md §2
- "what does the brain know about X" → run the query operation
- "lint the brain" → run scripts/brain-lint.py

Confirm you understand the schema, then wait for my first request.

Claude now has persistent memory in every session that can read the folder. In a fresh session, paste this prompt again so it rereads BRAIN.md and the index.

How to ingest your first source

In Claude, after pasting the bootstrap above:

Ingest this: <paste a transcript / article / video summary / your notes>

Claude will:

1. Save the raw source to raw/YYYY-MM-DD-<slug>.md
2. Ask you 2-3 questions to score the Quality Gate
3. Write a wiki page (concept / entity / synthesis based on the content type)
4. Append a log entry
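
The Quality Gate scoring that Claude applies during ingest is simple enough to write down. Here is a minimal sketch of the routing table from BRAIN.md §3, assuming each criterion is answered yes or no. The `route` function, the criterion keys, and the returned labels are hypothetical names; the thresholds (4 and 6) match brain.config.json.

```python
# One key per criterion in BRAIN.md section 3 (key names are illustrative).
CRITERIA = (
    "reusable",
    "improves_workflow",
    "concrete",
    "in_domain",
    "novel",
    "source_traceable",
)


def route(checks, page_threshold=4, skill_threshold=6):
    """Score a procedure and return (score, action) per the Quality Gate table."""
    unknown = set(checks) - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    # Missing criteria count as failed.
    score = sum(1 for name in CRITERIA if checks.get(name))
    if score >= skill_threshold:
        action = "skill-draft"   # goes to wiki/skills-pending/ and awaits approval
    elif score >= page_threshold:
        action = "wiki-page"     # concept, entity, or synthesis page
    else:
        action = "skipped"       # noted on the source page only
    return score, action
```

For example, a procedure that passes everything except "novel" scores 5 and becomes a wiki page, not a skill draft.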

A month from now, you can query it:

What does the brain know about <topic>?

Claude reads wiki/index.md, drills into 2-6 pages, cites sources, and synthesizes. Repeatable across every session, every chat tool, every device.

Real numbers from my brain

I've been running this pattern for 6 months. Current state:

- 91 wiki pages (concepts, entities, sources, synthesis)
- ~150 raw sources ingested (podcasts, articles, my own notes, transcripts)
- Lint clean across the entire wiki at any given time
- Used in every Claude Code session: when I start a new project, Claude reads my brain to learn my conventions, my Hormozi-style offer math preferences, my TJ Robertson SEO rules, etc.
- Total cost: $0/month (no SaaS, no DB, no vector store)
- Bus factor improvement: if my hard drive dies, the brain is in git. If I switch from Claude to Gemini tomorrow, the brain ports unchanged.

What I'd add as your brain grows

Once you have 30+ pages:

- Add a brain-export.py that concatenates the wiki into a single .md for LLM ingestion when starting fresh.
- Add tagging: append tags: [...] to frontmatter, then query by tag in search.
- Add a synthesis cron: once a week, run a Claude session that re-reads recent sources and writes a fresh synthesis page.
- Connect to your skill library: set externalSkillsDir in brain.config.json so the brain can propose new Claude skills (drafts go to wiki/skills-pending/, you approve, they graduate).
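
The first item, brain-export.py, is not included in this kit. One possible shape for it is sketched below, assuming you want index.md first and every page labelled with its relative path. The function name `export_wiki` and the `<!-- file: ... -->` separator are assumptions, not an established format.

```python
from pathlib import Path


def export_wiki(wiki_dir, out_path):
    """Concatenate every wiki page into one markdown file; return the page count."""
    wiki_dir = Path(wiki_dir)
    out_path = Path(out_path)

    def sort_key(page):
        rel = page.relative_to(wiki_dir).as_posix()
        # False sorts before True, so the top-level index.md comes first.
        return (rel != "index.md", rel)

    parts = []
    for page in sorted(wiki_dir.rglob("*.md"), key=sort_key):
        if page.resolve() == out_path.resolve():
            continue  # never fold a previous export back into itself
        rel = page.relative_to(wiki_dir).as_posix()
        body = page.read_text(encoding="utf-8").strip()
        # An HTML comment marks where each page starts without rendering as text.
        parts.append(f"<!-- file: {rel} -->\n\n{body}\n")

    out_path.write_text("\n".join(parts), encoding="utf-8")
    return len(parts)
```

Run from the brain root, `export_wiki("wiki", "brain-export.md")` would produce a single file you can paste or attach when a tool cannot read the folder directly.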

What this won't do

- Find facts you never ingested. Garbage in, no recall.
- Replace a vector DB at >10K pages. Use Pinecone/Chroma when you cross 10,000 pages or want semantic search.
- Run on its own. This is a tool you operate with Claude, not a daemon.

Going deeper

- Watch the channel for "I let Claude run my content channel for 24 hours": most of that workflow uses brain queries.
- Andrej Karpathy's original LLM Wiki gist: gist.github.com/karpathy/...
- Drop questions in DMs at @cory_salisbury, and tag me when you ingest your first source.

MIT licensed. Use it. Remix it. Ship something this week.

More starter skills

This pairs perfectly with the CLAUDE.md template (defines how Claude reads your brain on each session) and the 3 production prompts (what to run once the brain is loaded).
