Memory System

Every message, tool call, and event is persisted as append-only JSONL, one file per session (typically one per burst) under .annihilation/sessions/.

Session.Writer appends events atomically. Session.Reader replays them for analysis.
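
Session.Writer's internals aren't shown here; the append-only pattern it relies on can be sketched in a few lines. Module name, event shape, and the hand-rolled JSON encoding below are illustrative, not the real implementation:

```elixir
# Illustrative sketch of append-only JSONL persistence. The real
# Session.Writer/Reader may differ; a JSON library would replace
# the hand-rolled encode/decode here.
defmodule SessionSketch do
  # Append one event as a single JSON line; [:append] never rewrites old data.
  def append_event(path, event) do
    File.write!(path, encode(event) <> "\n", [:append])
  end

  # Replay: one decoded event per line, in write order.
  def replay(path) do
    path |> File.stream!() |> Enum.map(&decode/1)
  end

  defp encode(%{type: t, data: d}), do: ~s({"type":"#{t}","data":"#{d}"})

  defp decode(line) do
    [_, t, d] = Regex.run(~r/"type":"([^"]*)","data":"([^"]*)"/, line)
    %{type: t, data: d}
  end
end

path = Path.join(System.tmp_dir!(), "session_sketch.jsonl")
File.rm(path)
SessionSketch.append_event(path, %{type: "message", data: "hello"})
SessionSketch.append_event(path, %{type: "tool_call", data: "grep"})
events = SessionSketch.replay(path)
```

Because writes only ever append whole lines, a crash mid-session loses at most the final partial line; everything before it stays replayable.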

Sessions are searchable via Search.Index, a separate SQLite database (.annihilation/search.db) with an FTS5 virtual table using porter unicode61 tokenization:

# Search past sessions
Search.Index.search(project_root, "SSE parser edge case", limit: 10, agent: "coder")
# -> {:ok, [%{burst_id: "...", content_snippet: "...>>>SSE parser<<<...", ...}]}

Supports boolean operators (AND, OR, NOT), phrase matching ("exact phrase"), prefix matching (word*), and column-specific filtering (content:word).

After each burst, the reflection pipeline processes transcripts into structured summaries:

%Annihilation.Memory.DiaryEntry{
  id: "diary_42_12345",
  burst_id: "burst_20260227T103045_001",
  bead_id: "42",
  agent_template: "coder",
  accomplishments: ["Implemented phase transitions", "Added tests for all phases"],
  decisions: ["Used handle_continue pattern over gen_statem"],
  challenges: ["SSE parser edge case with split chunks"],
  key_learnings: ["Always buffer SSE until double newline"],
  tags: ["elixir", "genserver", "streaming"],
  status: :success, # :success | :failure | :mixed
  quality_score: 0.85,
  timestamp: "2026-02-27T10:45:00Z"
}

Diary entries are persisted as JSONL via Memory.DiaryWriter. They support tag matching, quality scoring, and success/failure filtering.
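
As an illustration of that filtering, a minimal in-memory sketch; the entry maps mirror the DiaryEntry fields above, while the real DiaryWriter reads entries back from JSONL:

```elixir
# Minimal in-memory sketch of diary filtering. Field names match the
# DiaryEntry struct above; the threshold values are illustrative.
entries = [
  %{id: "diary_42_12345", tags: ["elixir", "streaming"], status: :success, quality_score: 0.85},
  %{id: "diary_43_12346", tags: ["docs"], status: :failure, quality_score: 0.40}
]

# Tag matching, success/failure filtering, and a quality-score floor.
matching =
  entries
  |> Enum.filter(fn e -> "streaming" in e.tags end)
  |> Enum.filter(fn e -> e.status == :success end)
  |> Enum.filter(fn e -> e.quality_score >= 0.5 end)
```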

Layer 3: Procedural Memory (Playbook Rules)

Distilled from diary entries into reusable, confidence-scored rules:

%Annihilation.Memory.PlaybookRule{
  id: "rule_xyz",
  text: "Always buffer SSE chunks until \\n\\n delimiter before parsing",
  confidence: 0.75,
  maturity: :established, # :nascent | :established | :proven
  success_count: 5,
  failure_count: 0,
  anti_pattern: false,
  source_entries: ["diary_42_12345"],
  tags: ["streaming", "sse"],
  created_at: ~U[2026-02-27 10:50:00Z],
  last_applied_at: ~U[2026-02-27 12:00:00Z]
}

Playbook rules are stored in YAML for human readability:

  • Project: .annihilation/playbook.yaml (project-specific rules)
  • Global: ~/.annihilation/playbook.yaml (cross-project rules)

Atomic writes via temp file + rename prevent corruption. Project rules take precedence over global rules with the same ID via Playbook.load_merged/1.
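
The temp-file + rename pattern can be sketched directly (module name is illustrative):

```elixir
# Sketch of the temp-file + rename pattern used for atomic playbook writes.
# A rename within the same filesystem is atomic, so readers observe either
# the old file or the new one, never a partially written file.
defmodule AtomicWrite do
  def write(path, content) do
    tmp = path <> ".tmp"
    File.write!(tmp, content)
    File.rename!(tmp, path)
  end
end

path = Path.join(System.tmp_dir!(), "playbook_demo.yaml")
AtomicWrite.write(path, "rules: []\n")
```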

# Scoring formula
decayed_confidence = confidence * 0.5^(days_since_last_applied / 90)
# Feedback asymmetry
success_increment = +0.05
failure_decrement = -0.20 # 4x asymmetry -- one bug outweighs several successes

Rules lose half their confidence every 90 days of non-use. If last_applied_at is nil, created_at is used as the baseline.
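
A worked example of the decay formula, assuming a rule last applied 45 days ago:

```elixir
# Worked example of the decay formula above: half-life of 90 idle days.
confidence = 0.75
days_since_last_applied = 45
decayed = confidence * :math.pow(0.5, days_since_last_applied / 90)
# 0.75 * 0.5^0.5, roughly 0.530
```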

:nascent --> :established --> :proven

  • :nascent (new)
  • :established (confidence >= 0.5 AND 3+ applications)
  • :proven (confidence > 0.8 AND 10+ applications)

Promotion happens one level at a time. Demotion occurs when confidence falls below thresholds:

  • :proven -> :established if confidence < 0.5
  • :established -> :nascent if confidence < 0.3
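
The thresholds above can be sketched as guard clauses (module name and function shapes are illustrative):

```elixir
# Sketch of the promotion/demotion thresholds described above.
defmodule MaturitySketch do
  # Promotion: one level at a time, never :nascent straight to :proven.
  def promote(:nascent, conf, apps) when conf >= 0.5 and apps >= 3, do: :established
  def promote(:established, conf, apps) when conf > 0.8 and apps >= 10, do: :proven
  def promote(level, _conf, _apps), do: level

  # Demotion when confidence falls below the thresholds.
  def demote(:proven, conf) when conf < 0.5, do: :established
  def demote(:established, conf) when conf < 0.3, do: :nascent
  def demote(level, _conf), do: level
end

MaturitySketch.promote(:nascent, 0.9, 20)
# -> :established (one level at a time, even with :proven-grade stats)
```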

Confidence.sweep/2 runs periodic maintenance across the entire playbook:

{:ok, %{promoted: 2, demoted: 1, flagged: 3}} = Confidence.sweep(project_root)

Applies time decay, promotes/demotes based on thresholds, and flags demotion candidates (confidence < 0.2) and removal candidates (confidence < 0.1, more failures than successes).

When a rule accumulates too many failures (failure_count >= 3 AND failure_count > 2 * success_count), the system proposes inversion:
  1. AntiPattern.scan/1 identifies candidates across the playbook
  2. Each candidate gets an inversion proposal:
    • Original rule is deprecated (confidence set to 0.0)
    • Inverted rule created: "AVOID: {original rule} -- this pattern has caused repeated issues ({N} failures vs {M} successes)."
  3. Tether reviews proposals before they are applied
  4. AntiPattern.apply_inversions/2 deprecates originals and adds inversions

Anti-patterns are surfaced alongside positive rules so psychonauts know what NOT to do.
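
The trigger condition and proposal text can be sketched as follows (module name and plain-map rules are illustrative; the real AntiPattern API may differ):

```elixir
# Sketch of the inversion trigger and the proposal text described above.
defmodule AntiPatternSketch do
  # A rule qualifies once failures both reach 3 and outnumber 2x successes.
  def candidate?(%{failure_count: f, success_count: s}), do: f >= 3 and f > 2 * s

  # Proposal text for the inverted rule.
  def inversion_text(rule) do
    "AVOID: #{rule.text} -- this pattern has caused repeated issues " <>
      "(#{rule.failure_count} failures vs #{rule.success_count} successes)."
  end
end

rule = %{text: "Parse SSE chunks eagerly", failure_count: 4, success_count: 1}
```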

Before starting a task, Memory.Context queries the playbook for relevant rules and formats them for injection into the agent’s system prompt.

relevance = matching_tags / total_tags # 0.0 - 1.0
combined_score = decayed_confidence * relevance
# Anti-pattern boost: combined_score * 1.5

Rule tags are matched against bead labels and type (case-insensitive).
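
Putting the scoring formula together, a minimal sketch (the plain-map rule and the precomputed decayed_confidence field are assumptions for illustration):

```elixir
# Sketch of the selection score: case-insensitive tag overlap scaled by
# decayed confidence, with the 1.5x boost for anti-patterns.
defmodule ScoreSketch do
  def combined(rule, bead_tags) do
    bead = Enum.map(bead_tags, &String.downcase/1)
    matching = Enum.count(rule.tags, &(String.downcase(&1) in bead))
    relevance = matching / max(length(rule.tags), 1)
    score = rule.decayed_confidence * relevance
    if rule.anti_pattern, do: score * 1.5, else: score
  end
end

rule = %{tags: ["streaming", "sse"], decayed_confidence: 0.6, anti_pattern: false}
score = ScoreSketch.combined(rule, ["SSE", "elixir"])
# one of two tags matches: 0.6 * 0.5 = 0.3
```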

context = Memory.Context.select_rules(project_root, bead_labels, bead_type)
# -> %{
# rules: [%PlaybookRule{}, ...], # Positive rules, ranked by score
# anti_patterns: [%PlaybookRule{}, ...], # Things to avoid
# total_score: 2.45,
# token_count: 312
# }

Budget: max 500 tokens, max 10 rules. Rules below score 0.05 are excluded.
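
The budget can be sketched as a filter-and-accumulate pass (per-rule token counts here are illustrative inputs):

```elixir
# Sketch of the selection budget: drop rules below the 0.05 score floor,
# cap the count at 10, then stop once the 500-token budget is spent.
scored_rules = [
  %{text: "rule A", score: 0.9, tokens: 60},
  %{text: "rule B", score: 0.4, tokens: 200},
  %{text: "rule C", score: 0.04, tokens: 40} # below the 0.05 floor
]

selected =
  scored_rules
  |> Enum.filter(&(&1.score >= 0.05))
  |> Enum.sort_by(& &1.score, :desc)
  |> Enum.take(10)
  |> Enum.reduce_while({[], 0}, fn rule, {acc, tokens} ->
    if tokens + rule.tokens <= 500,
      do: {:cont, {[rule | acc], tokens + rule.tokens}},
      else: {:halt, {acc, tokens}}
  end)
  |> then(fn {acc, _tokens} -> Enum.reverse(acc) end)
```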

Memory.Context.format_for_prompt(context)
# ->
# ## Relevant Guidelines
#
# The following rules are based on past experience with similar tasks:
#
# 1. [ESTABLISHED] Always buffer SSE chunks until \n\n delimiter (confidence: 0.75)
# 2. [PROVEN] Use handle_continue for phase transitions (confidence: 0.92)
#
# ## Patterns to Avoid
#
# These patterns have caused problems in similar past work:
#
# 1. AVOID: Using gen_statem for simple state machines (confidence: 0.67)

Context.track_injection/3 records which rules were injected for each agent/bead combo (in ETS). After task completion, Context.record_outcome/4 applies success/failure feedback to all injected rules:

# After agent completes successfully
Memory.Context.record_outcome(project_root, agent_id, bead_id, :success)
# -> All injected rules get +0.05 confidence

Skills are reusable prompt/code templates stored at .annihilation/skills.yaml:

%Annihilation.Skills.Skill{
  id: "abc123",
  name: "GenServer Testing",
  description: "Pattern for testing GenServer state machines",
  template: "defmodule MyServerTest do\n use ExUnit.Case...",
  tags: ["testing", "genserver", "elixir"],
  success_count: 7,
  failure_count: 1,
  alpha: 8.0, # Beta distribution parameter (successes + 1)
  beta: 2.0, # Beta distribution parameter (failures + 1)
  created_by: "coder",
  created_at: ~U[2026-02-20 14:00:00Z]
}

Skills are ranked using Thompson sampling — a bandit algorithm that balances exploitation (proven skills) with exploration (new/uncertain skills):

# Non-deterministic: samples from each skill's Beta(alpha, beta) distribution
ranked = Skills.Ranking.rank(skills)
# Deterministic alternatives for debugging:
Skills.Ranking.rank_by_mean(skills) # Expected value ranking
Skills.Ranking.rank_by_ucb(skills) # Upper Confidence Bound ranking

New skills (alpha=1, beta=1, uniform prior) get natural exploration because their sampled values vary widely. Proven skills with many successes converge to consistently high samples.
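
A self-contained sketch of the sampling step, using the order-statistic method to draw from Beta(a, b) with integer parameters so no statistics library is needed; the real Skills.Ranking implementation may differ:

```elixir
# Sketch of Thompson sampling over Beta(alpha, beta) skill posteriors.
# Order-statistic trick: for integer a and b, the a-th smallest of
# a + b - 1 uniform samples is distributed as Beta(a, b).
defmodule ThompsonSketch do
  def sample_beta(a, b) do
    1..(trunc(a) + trunc(b) - 1)
    |> Enum.map(fn _ -> :rand.uniform() end)
    |> Enum.sort()
    |> Enum.at(trunc(a) - 1)
  end

  # Rank skills by one posterior draw each (non-deterministic, as noted above).
  def rank(skills) do
    Enum.sort_by(skills, fn s -> sample_beta(s.alpha, s.beta) end, :desc)
  end
end

skills = [
  %{name: "GenServer Testing", alpha: 8.0, beta: 2.0}, # proven, mean ~0.8
  %{name: "New Skill", alpha: 1.0, beta: 1.0}          # uniform prior
]
ranked = ThompsonSketch.rank(skills)
```

The proven skill usually samples near 0.8 and wins, but the uniform-prior skill occasionally draws higher, which is exactly the exploration behavior described above.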

  • search_skills — Find relevant skills by query, ranked by Thompson sampling
  • create_skill — Add a new skill to the catalog

Outcome feedback after a skill is used updates both the counters and the Beta parameters:

Skills.Ranking.record_feedback(project_root, skill_id, :success)
# -> Increments alpha by 1.0 (and success_count)
Skills.Ranking.record_feedback(project_root, skill_id, :failure)
# -> Increments beta by 1.0 (and failure_count)