Memory System

Every message, tool call, and event is persisted as append-only JSONL, one file per session (typically one per burst) under .annihilation/sessions/.

Session.Writer appends events atomically. Session.Reader replays them for analysis.
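
Session.Writer's internals aren't shown here; the append-only pattern it relies on can be sketched in a few lines. Module name, event shape, and the hand-rolled JSON encoding below are illustrative, not the real implementation:

```elixir
# Illustrative sketch of append-only JSONL persistence. The real
# Session.Writer/Reader may differ; a JSON library would replace
# the hand-rolled encode/decode here.
defmodule SessionSketch do
  # Append one event as a single JSON line; [:append] never rewrites old data.
  def append_event(path, event) do
    File.write!(path, encode(event) <> "\n", [:append])
  end

  # Replay: one decoded event per line, in write order.
  def replay(path) do
    path |> File.stream!() |> Enum.map(&decode/1)
  end

  defp encode(%{type: t, data: d}), do: ~s({"type":"#{t}","data":"#{d}"})

  defp decode(line) do
    [_, t, d] = Regex.run(~r/"type":"([^"]*)","data":"([^"]*)"/, line)
    %{type: t, data: d}
  end
end

path = Path.join(System.tmp_dir!(), "session_sketch.jsonl")
File.rm(path)
SessionSketch.append_event(path, %{type: "message", data: "hello"})
SessionSketch.append_event(path, %{type: "tool_call", data: "grep"})
events = SessionSketch.replay(path)
```

Because writes only ever append whole lines, a crash mid-session loses at most the final partial line; everything before it stays replayable.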

Sessions are searchable via Search.Index, a separate SQLite database (.annihilation/search.db) with an FTS5 virtual table using porter unicode61 tokenization:

# Search past sessions
Search.Index.search(project_root, "SSE parser edge case", limit: 10, agent: "coder")
# -> {:ok, [%{burst_id: "...", content_snippet: "...>>>SSE parser<<<...", ...}]}

Supports boolean operators (AND, OR, NOT), phrase matching ("exact phrase"), prefix matching (word*), and column-specific filtering (content:word).

After each burst, the reflection pipeline processes transcripts into structured summaries:

%Annihilation.Memory.DiaryEntry{
  id: "diary_42_12345",
  burst_id: "burst_20260227T103045_001",
  bead_id: "42",
  agent_template: "coder",
  accomplishments: ["Implemented phase transitions", "Added tests for all phases"],
  decisions: ["Used handle_continue pattern over gen_statem"],
  challenges: ["SSE parser edge case with split chunks"],
  key_learnings: ["Always buffer SSE until double newline"],
  tags: ["elixir", "genserver", "streaming"],
  status: :success, # :success | :failure | :mixed
  quality_score: 0.85,
  timestamp: "2026-02-27T10:45:00Z"
}

Diary entries are persisted as JSONL via Memory.DiaryWriter. They support tag matching, quality scoring, and success/failure filtering.
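
As an illustration of that filtering, a minimal in-memory sketch; the entry maps mirror the DiaryEntry fields above, while the real DiaryWriter reads entries back from JSONL:

```elixir
# Minimal in-memory sketch of diary filtering. Field names match the
# DiaryEntry struct above; the threshold values are illustrative.
entries = [
  %{id: "diary_42_12345", tags: ["elixir", "streaming"], status: :success, quality_score: 0.85},
  %{id: "diary_43_12346", tags: ["docs"], status: :failure, quality_score: 0.40}
]

# Tag matching, success/failure filtering, and a quality-score floor.
matching =
  entries
  |> Enum.filter(fn e -> "streaming" in e.tags end)
  |> Enum.filter(fn e -> e.status == :success end)
  |> Enum.filter(fn e -> e.quality_score >= 0.5 end)
```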

Layer 3: Procedural Memory (Playbook Rules)

Distilled from diary entries into reusable, confidence-scored rules:

%Annihilation.Memory.PlaybookRule{
  id: "rule_xyz",
  text: "Always buffer SSE chunks until \\n\\n delimiter before parsing",
  confidence: 0.75,
  maturity: :established, # :nascent | :established | :proven
  success_count: 5,
  failure_count: 0,
  anti_pattern: false,
  source_entries: ["diary_42_12345"],
  tags: ["streaming", "sse"],
  created_at: ~U[2026-02-27 10:50:00Z],
  last_applied_at: ~U[2026-02-27 12:00:00Z]
}

Playbook rules are stored in YAML for human readability:

  • Project: .annihilation/playbook.yaml (project-specific rules)
  • Global: ~/.annihilation/playbook.yaml (cross-project rules)

Atomic writes via temp file + rename prevent corruption. Project rules take precedence over global rules with the same ID via Playbook.load_merged/1.
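
The temp-file + rename pattern can be sketched directly (module name is illustrative):

```elixir
# Sketch of the temp-file + rename pattern used for atomic playbook writes.
# A rename within the same filesystem is atomic, so readers observe either
# the old file or the new one, never a partially written file.
defmodule AtomicWrite do
  def write(path, content) do
    tmp = path <> ".tmp"
    File.write!(tmp, content)
    File.rename!(tmp, path)
  end
end

path = Path.join(System.tmp_dir!(), "playbook_demo.yaml")
AtomicWrite.write(path, "rules: []\n")
```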

# Scoring formula
decayed_confidence = confidence * 0.5^(days_since_last_applied / 90)
# Feedback asymmetry
success_increment = +0.05
failure_decrement = -0.20 # 4x asymmetry -- one bug outweighs several successes

Rules lose half their confidence every 90 days of non-use. If last_applied_at is nil, created_at is used as the baseline.
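
A worked example of the decay formula, assuming a rule last applied 45 days ago:

```elixir
# Worked example of the decay formula above: half-life of 90 idle days.
confidence = 0.75
days_since_last_applied = 45
decayed = confidence * :math.pow(0.5, days_since_last_applied / 90)
# 0.75 * 0.5^0.5, roughly 0.530
```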

:nascent --> :established --> :proven

  • :nascent (new)
  • :established (confidence >= 0.5 AND 3+ applications)
  • :proven (confidence > 0.8 AND 10+ applications)

Promotion happens one level at a time. Demotion occurs when confidence falls below thresholds:

  • :proven -> :established if confidence < 0.5
  • :established -> :nascent if confidence < 0.3
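
The thresholds above can be sketched as guard clauses (module name and function shapes are illustrative):

```elixir
# Sketch of the promotion/demotion thresholds described above.
defmodule MaturitySketch do
  # Promotion: one level at a time, never :nascent straight to :proven.
  def promote(:nascent, conf, apps) when conf >= 0.5 and apps >= 3, do: :established
  def promote(:established, conf, apps) when conf > 0.8 and apps >= 10, do: :proven
  def promote(level, _conf, _apps), do: level

  # Demotion when confidence falls below the thresholds.
  def demote(:proven, conf) when conf < 0.5, do: :established
  def demote(:established, conf) when conf < 0.3, do: :nascent
  def demote(level, _conf), do: level
end

MaturitySketch.promote(:nascent, 0.9, 20)
# -> :established (one level at a time, even with :proven-grade stats)
```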

Confidence.sweep/2 runs periodic maintenance across the entire playbook:

{:ok, %{promoted: 2, demoted: 1, flagged: 3}} = Confidence.sweep(project_root)

Applies time decay, promotes/demotes based on thresholds, and flags demotion candidates (confidence < 0.2) and removal candidates (confidence < 0.1, more failures than successes).

When a rule accumulates too many failures (failure_count >= 3 AND failure_count > 2 * success_count), the system proposes inversion:
  1. AntiPattern.scan/1 identifies candidates across the playbook
  2. Each candidate gets an inversion proposal:
    • Original rule is deprecated (confidence set to 0.0)
    • Inverted rule created: "AVOID: {original rule} -- this pattern has caused repeated issues ({N} failures vs {M} successes)."
  3. Tether reviews proposals before they are applied
  4. AntiPattern.apply_inversions/2 deprecates originals and adds inversions

Anti-patterns are surfaced alongside positive rules so psychonauts know what NOT to do.
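
The trigger condition and proposal text can be sketched as follows (module name and plain-map rules are illustrative; the real AntiPattern API may differ):

```elixir
# Sketch of the inversion trigger and the proposal text described above.
defmodule AntiPatternSketch do
  # A rule qualifies once failures both reach 3 and outnumber 2x successes.
  def candidate?(%{failure_count: f, success_count: s}), do: f >= 3 and f > 2 * s

  # Proposal text for the inverted rule.
  def inversion_text(rule) do
    "AVOID: #{rule.text} -- this pattern has caused repeated issues " <>
      "(#{rule.failure_count} failures vs #{rule.success_count} successes)."
  end
end

rule = %{text: "Parse SSE chunks eagerly", failure_count: 4, success_count: 1}
```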

Before starting a task, Memory.Context queries the playbook for relevant rules and formats them for injection into the agent’s system prompt.

relevance = matching_tags / total_tags # 0.0 - 1.0
combined_score = decayed_confidence * relevance
# Anti-pattern boost: combined_score * 1.5

Rule tags are matched against bead labels and type (case-insensitive).
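
Putting the scoring formula together, a minimal sketch (the plain-map rule and the precomputed decayed_confidence field are assumptions for illustration):

```elixir
# Sketch of the selection score: case-insensitive tag overlap scaled by
# decayed confidence, with the 1.5x boost for anti-patterns.
defmodule ScoreSketch do
  def combined(rule, bead_tags) do
    bead = Enum.map(bead_tags, &String.downcase/1)
    matching = Enum.count(rule.tags, &(String.downcase(&1) in bead))
    relevance = matching / max(length(rule.tags), 1)
    score = rule.decayed_confidence * relevance
    if rule.anti_pattern, do: score * 1.5, else: score
  end
end

rule = %{tags: ["streaming", "sse"], decayed_confidence: 0.6, anti_pattern: false}
score = ScoreSketch.combined(rule, ["SSE", "elixir"])
# one of two tags matches: 0.6 * 0.5 = 0.3
```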

context = Memory.Context.select_rules(project_root, bead_labels, bead_type)
# -> %{
# rules: [%PlaybookRule{}, ...], # Positive rules, ranked by score
# anti_patterns: [%PlaybookRule{}, ...], # Things to avoid
# total_score: 2.45,
# token_count: 312
# }

Budget: max 500 tokens, max 10 rules. Rules below score 0.05 are excluded.
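
The budget can be sketched as a filter-and-accumulate pass (per-rule token counts here are illustrative inputs):

```elixir
# Sketch of the selection budget: drop rules below the 0.05 score floor,
# cap the count at 10, then stop once the 500-token budget is spent.
scored_rules = [
  %{text: "rule A", score: 0.9, tokens: 60},
  %{text: "rule B", score: 0.4, tokens: 200},
  %{text: "rule C", score: 0.04, tokens: 40} # below the 0.05 floor
]

selected =
  scored_rules
  |> Enum.filter(&(&1.score >= 0.05))
  |> Enum.sort_by(& &1.score, :desc)
  |> Enum.take(10)
  |> Enum.reduce_while({[], 0}, fn rule, {acc, tokens} ->
    if tokens + rule.tokens <= 500,
      do: {:cont, {[rule | acc], tokens + rule.tokens}},
      else: {:halt, {acc, tokens}}
  end)
  |> then(fn {acc, _tokens} -> Enum.reverse(acc) end)
```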

Memory.Context.format_for_prompt(context)
# ->
# ## Relevant Guidelines
#
# The following rules are based on past experience with similar tasks:
#
# 1. [ESTABLISHED] Always buffer SSE chunks until \n\n delimiter (confidence: 0.75)
# 2. [PROVEN] Use handle_continue for phase transitions (confidence: 0.92)
#
# ## Patterns to Avoid
#
# These patterns have caused problems in similar past work:
#
# 1. AVOID: Using gen_statem for simple state machines (confidence: 0.67)

Context.track_injection/3 records which rules were injected for each agent/bead combo (in ETS). After task completion, Context.record_outcome/4 applies success/failure feedback to all injected rules:

# After agent completes successfully
Memory.Context.record_outcome(project_root, agent_id, bead_id, :success)
# -> All injected rules get +0.05 confidence

Skills are reusable prompt/code templates stored at .annihilation/skills.yaml:

%Annihilation.Skills.Skill{
  id: "abc123",
  name: "GenServer Testing",
  description: "Pattern for testing GenServer state machines",
  template: "defmodule MyServerTest do\n use ExUnit.Case...",
  tags: ["testing", "genserver", "elixir"],
  success_count: 7,
  failure_count: 1,
  alpha: 8.0, # Beta distribution parameter (successes + 1)
  beta: 2.0, # Beta distribution parameter (failures + 1)
  created_by: "coder",
  created_at: ~U[2026-02-20 14:00:00Z]
}

Skills are ranked using Thompson sampling — a bandit algorithm that balances exploitation (proven skills) with exploration (new/uncertain skills):

# Non-deterministic: samples from each skill's Beta(alpha, beta) distribution
ranked = Skills.Ranking.rank(skills)
# Deterministic alternatives for debugging:
Skills.Ranking.rank_by_mean(skills) # Expected value ranking
Skills.Ranking.rank_by_ucb(skills) # Upper Confidence Bound ranking

New skills (alpha=1, beta=1, uniform prior) get natural exploration because their sampled values vary widely. Proven skills with many successes converge to consistently high samples.
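
A self-contained sketch of the sampling step, using the order-statistic method to draw from Beta(a, b) with integer parameters so no statistics library is needed; the real Skills.Ranking implementation may differ:

```elixir
# Sketch of Thompson sampling over Beta(alpha, beta) skill posteriors.
# Order-statistic trick: for integer a and b, the a-th smallest of
# a + b - 1 uniform samples is distributed as Beta(a, b).
defmodule ThompsonSketch do
  def sample_beta(a, b) do
    1..(trunc(a) + trunc(b) - 1)
    |> Enum.map(fn _ -> :rand.uniform() end)
    |> Enum.sort()
    |> Enum.at(trunc(a) - 1)
  end

  # Rank skills by one posterior draw each (non-deterministic, as noted above).
  def rank(skills) do
    Enum.sort_by(skills, fn s -> sample_beta(s.alpha, s.beta) end, :desc)
  end
end

skills = [
  %{name: "GenServer Testing", alpha: 8.0, beta: 2.0}, # proven, mean ~0.8
  %{name: "New Skill", alpha: 1.0, beta: 1.0}          # uniform prior
]
ranked = ThompsonSketch.rank(skills)
```

The proven skill usually samples near 0.8 and wins, but the uniform-prior skill occasionally draws higher, which is exactly the exploration behavior described above.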

  • search_skills — Find relevant skills by query, ranked by Thompson sampling
  • create_skill — Add a new skill to the catalog

Outcome feedback after a skill is used updates both the counters and the Beta parameters:

Skills.Ranking.record_feedback(project_root, skill_id, :success)
# -> Increments alpha by 1.0 (and success_count)
Skills.Ranking.record_feedback(project_root, skill_id, :failure)
# -> Increments beta by 1.0 (and failure_count)