Introduction
akh-medu is a neuro-symbolic AI engine that combines hyperdimensional computing (Vector Symbolic Architecture) with knowledge graphs and symbolic reasoning. It runs entirely on the CPU with no LLM dependency, no GPU requirement, and no external NLP models.
What It Does
akh-medu stores, reasons about, and discovers knowledge. You feed it facts (triples like "Dog is-a Mammal"), and it can:
- Infer new knowledge via spreading activation, backward chaining, and superposition reasoning
- Reason symbolically using e-graph rewrite rules (equality saturation)
- Search semantically using high-dimensional binary vectors (VSA)
- Act autonomously via an OODA-loop agent with 15 built-in tools
- Parse and generate natural language in 5 languages via a grammar framework
- Serve knowledge over REST and WebSocket APIs
Architecture
┌──────────────────────────────────────────────────────────┐
│ Engine API │
│ create_symbol · add_triple · infer · query · traverse │
├──────────┬──────────┬──────────┬──────────┬──────────────┤
│ VSA │Knowledge │Reasoning │Inference │ Agent │
│ Ops │ Graph │ (egg) │ Engine │ OODA Loop │
│ ─────────│──────────│──────────│──────────│──────────────│
│ HyperVec │petgraph │ rewrite │spreading │ 15 tools │
│ SIMD │oxigraph │ rules │backward │ planning │
│ ItemMem │SPARQL │ e-graphs │superpos. │ psyche │
├──────────┴──────────┴──────────┴──────────┴──────────────┤
│ Tiered Storage │
│ Hot (DashMap) · Warm (mmap) · Durable (redb) │
└──────────────────────────────────────────────────────────┘
Key Differentiators
- CPU-only: SIMD-accelerated hypervector operations (AVX2 where available, generic fallback everywhere else). No GPU, no CUDA, no model weights.
- No LLM dependency: All reasoning is algebraic (VSA bind/unbind, e-graph rewriting, graph traversal). Grammar-based NLP replaces transformer models.
- Hyperdimensional computing: 10,000-bit binary vectors encode symbols. Similarity is Hamming distance. Binding is XOR. Bundling is majority vote.
- Full provenance: Every derived fact has a traceable derivation chain recording exactly how it was produced.
- Multilingual: Grammar-based parsing and generation in English, Russian, Arabic, French, and Spanish with cross-language entity resolution.
How to Read This Book
- Getting Started walks you through building from source and your first knowledge session.
- Concepts explains the core data model and reasoning strategies.
- Agent covers the autonomous OODA-loop agent.
- Advanced dives into compartments, workspaces, grammars, and shared partitions.
- Server documents the REST and WebSocket APIs.
- Reference has the component status matrix and full CLI command reference.
Source Code
The source code is on GitHub: Toasterson/akh-medu
License
akh-medu is licensed under the GPLv3. For integration into proprietary applications, contact the author.
Installation
Prerequisites
- Rust toolchain: Edition 2024 (rustup update)
- Platform: Linux, macOS, or Windows (WSL recommended)
- RAM: 512 MB minimum, 2 GB recommended for large knowledge graphs
Build from Source
git clone https://github.com/akh-medu/akh-medu.git
cd akh-medu
# Core CLI binary
cargo build --release
# The binary is at target/release/akh-medu
Feature Flags
| Feature | Flag | What It Adds |
|---|---|---|
| Server | --features server | REST + WebSocket server binary (akh-medu-server) |
| WASM Tools | --features wasm-tools | Wasmtime runtime for WASM-based agent tools |
# Build with server support
cargo build --release --features server
# Build with everything
cargo build --release --features "server wasm-tools"
Binary Targets
| Binary | Path | Feature Gate |
|---|---|---|
akh-medu | src/main.rs | None (always built) |
akh-medu-server | src/bin/akh-medu-server.rs | server |
Initialize a Workspace
akh-medu uses XDG directory conventions for data, config, and state:
# Create the default workspace
akh-medu init
# Create a named workspace
akh-medu -w my-project init
This creates:
~/.config/akh-medu/
workspaces/default.toml # workspace config
~/.local/share/akh-medu/
workspaces/default/
kg/ # knowledge graph data
skills/ # activated skill packs
compartments/ # knowledge compartments
scratch/ # agent scratch space
~/.local/state/akh-medu/
sessions/default.bin # agent session state
Verify Installation
# Show engine info (in-memory, no persistence)
akh-medu info
# Show engine info with persistence
akh-medu -w default info
Run Tests
cargo test --lib
Quick Start Tutorial
This tutorial walks through a complete session: creating an engine, adding knowledge, querying it, running inference, and using the agent.
1. Initialize a Workspace
Start by creating a persistent workspace:
akh-medu init
All subsequent commands use the default workspace automatically. To use a
named workspace, add -w my-project to any command.
2. Bootstrap with Seed Packs
Seed packs load foundational knowledge. Three packs are bundled:
# See what's available
akh-medu seed list
# Apply the ontology (fundamental relations like is-a, has-part, causes)
akh-medu seed apply ontology
# Apply common-sense knowledge (animals, materials, spatial concepts)
akh-medu seed apply common-sense
# Check what's been applied
akh-medu seed status
Seeds are idempotent -- applying the same seed twice has no effect.
3. Ingest Your Own Knowledge
From the grammar parser
The fastest way to add knowledge is natural language:
# Parse a statement and ingest it into the KG
akh-medu grammar parse "Dogs are mammals" --ingest
akh-medu grammar parse "Cats are mammals" --ingest
akh-medu grammar parse "Mammals have warm blood" --ingest
akh-medu grammar parse "The heart is part of the circulatory system" --ingest
From a JSON file
For bulk ingestion, prepare a JSON file:
[
{"subject": "Earth", "predicate": "is-a", "object": "planet", "confidence": 1.0},
{"subject": "Earth", "predicate": "orbits", "object": "Sun", "confidence": 1.0},
{"subject": "Mars", "predicate": "is-a", "object": "planet", "confidence": 1.0},
{"subject": "Mars", "predicate": "orbits", "object": "Sun", "confidence": 1.0}
]
akh-medu ingest --file planets.json
From CSV
# Subject-Predicate-Object CSV
akh-medu ingest --file data.csv --format csv --csv-format spo
# Entity CSV (column headers become predicates)
akh-medu ingest --file entities.csv --format csv --csv-format entity
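For reference, a hedged sketch of the two layouts follows; the exact header conventions may differ, and the entity layout below assumes the first column holds the entity label.
spo layout, one triple per row:
Earth,is-a,planet
Earth,orbits,Sun
entity layout, one row per entity, with header columns becoming predicates:
name,is-a,orbits
Mars,planet,Sun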
From plain text
akh-medu ingest --file article.txt --format text --max-sentences 100
4. Query the Knowledge Graph
List symbols
# List all known symbols
akh-medu symbols list
# Show details for a specific symbol
akh-medu symbols show Dog
SPARQL queries
# Find all mammals
akh-medu sparql "SELECT ?s WHERE { ?s <https://akh-medu.dev/sym/is-a> <https://akh-medu.dev/sym/mammal> }"
# Or from a file
akh-medu sparql --file query.sparql
Graph traversal
# BFS from Dog, 2 hops deep
akh-medu traverse --seeds Dog --max-depth 2
# Only follow is-a edges
akh-medu traverse --seeds Dog --predicates is-a --max-depth 3
# Output as JSON
akh-medu traverse --seeds Dog --max-depth 2 --format json
Similarity search
# Find symbols similar to Dog via VSA
akh-medu search --symbol Dog --top-k 5
5. Run Inference
Spreading activation
Discover related knowledge by spreading activation from seed symbols:
# What's related to Dog and Cat?
akh-medu query --seeds "Dog,Cat" --depth 2 --top-k 10
Analogy
Compute "A is to B as C is to ?":
akh-medu analogy --a King --b Man --c Queen --top-k 5
Role-filler recovery
Find the object of a (subject, predicate) pair via VSA:
akh-medu filler --subject Dog --predicate is-a --top-k 5
Forward-chaining rules
Run e-graph rewrite rules to derive new facts:
akh-medu agent infer --max-iterations 10
6. Use the Agent
The autonomous agent uses an OODA loop (Observe-Orient-Decide-Act) with utility-based tool selection.
Single cycle
# Run one OODA cycle with a goal
akh-medu agent cycle --goal "Find what mammals eat"
Multi-cycle run
# Run until the goal is satisfied or 20 cycles pass
akh-medu agent run --goals "Discover properties of planets" --max-cycles 20
Interactive REPL
# Start the agent REPL
akh-medu agent repl
In the REPL, type goals in natural language. Commands:
- p or plan -- show the current plan
- r or reflect -- trigger reflection
- q or quit -- exit (session is auto-persisted)
Resume a session
# Pick up where you left off
akh-medu agent resume --max-cycles 50
7. Use the TUI
The unified TUI provides an interactive chat interface:
akh-medu chat
TUI commands (prefix with /):
- /help -- show available commands
- /grammar -- switch grammar archetype (narrative, formal, terse)
- /workspace -- show workspace info
- /goals -- list active goals
- /quit -- exit
Type natural language to set goals or ask questions. The agent runs OODA cycles automatically and synthesizes findings using the active grammar.
8. Export Data
# Export all symbols as JSON
akh-medu export symbols
# Export all triples
akh-medu export triples
# Export provenance chain for a symbol
akh-medu export provenance --symbol Dog
9. Graph Analytics
# Most connected symbols
akh-medu analytics degree --top-k 10
# PageRank importance
akh-medu analytics pagerank --top-k 10
# Strongly connected components
akh-medu analytics components
# Shortest path between two symbols
akh-medu analytics path --from Dog --to Cat
Using the Rust API
All CLI operations are available programmatically:
#![allow(unused)]
fn main() {
use akh_medu::engine::{Engine, EngineConfig};
use akh_medu::symbol::SymbolKind;
use akh_medu::graph::Triple;
use akh_medu::infer::InferenceQuery;

// Create an in-memory engine
let engine = Engine::new(EngineConfig::default())?;

// Create symbols
let dog = engine.create_symbol(SymbolKind::Entity, "Dog")?;
let mammal = engine.create_symbol(SymbolKind::Entity, "mammal")?;
let is_a = engine.create_symbol(SymbolKind::Relation, "is-a")?;

// Add a triple
engine.add_triple(Triple::new(dog.id, is_a.id, mammal.id, 0.95))?;

// Query: what is Dog?
let triples = engine.triples_from(dog.id);
for t in &triples {
    println!("{} -> {} -> {}",
        engine.resolve_label(t.subject),
        engine.resolve_label(t.predicate),
        engine.resolve_label(t.object));
}

// Run inference
let query = InferenceQuery::default()
    .with_seeds(vec![dog.id])
    .with_max_depth(2)
    .with_min_confidence(0.2);
let result = engine.infer(&query)?;
for (sym, conf) in &result.activations {
    println!("  {} (confidence {:.2})", engine.resolve_label(*sym), conf);
}
}
Next Steps
- Read Symbols and Triples for a deep dive into the data model.
- Read Inference for the three inference strategies.
- Read OODA Loop for how the agent works.
- Read Seed Packs for creating custom knowledge bundles.
Seed Packs
Seed packs are TOML-defined knowledge bundles that bootstrap workspaces with foundational triples.
Bundled Packs
Three packs are compiled into the binary:
| Pack | Triples | Description |
|---|---|---|
identity | ~20 | Core identity: who akh-medu is, its capabilities and components |
ontology | ~30 | Fundamental relations (is-a, has-part, causes, etc.) and category hierarchy |
common-sense | ~40 | Basic world knowledge: animals, materials, spatial/temporal concepts |
Usage
# List available seed packs
akh-medu seed list
# Apply a specific pack
akh-medu seed apply identity
# Check which seeds are applied to the current workspace
akh-medu seed status
Seeds are idempotent: applying the same seed twice has no effect. Applied seeds are tracked via the akh:seed-applied predicate in the knowledge graph.
Creating a Custom Seed Pack
Directory structure
my-seed/
seed.toml # manifest + inline triples
seed.toml format
[seed]
id = "my-custom-pack"
name = "My Custom Knowledge"
version = "1.0.0"
description = "Domain-specific knowledge for my project"
[[triples]]
subject = "rust"
predicate = "is-a"
object = "programming language"
confidence = 0.95
[[triples]]
subject = "cargo"
predicate = "is-a"
object = "build tool"
confidence = 0.9
[[triples]]
subject = "cargo"
predicate = "has-part"
object = "dependency resolver"
confidence = 0.85
Fields
| Field | Required | Description |
|---|---|---|
[seed].id | yes | Unique identifier for the pack |
[seed].name | yes | Human-readable name |
[seed].version | yes | Semantic version string |
[seed].description | yes | Brief description |
[[triples]].subject | yes | Subject entity label |
[[triples]].predicate | yes | Predicate relation label |
[[triples]].object | yes | Object entity label |
[[triples]].confidence | no | Confidence score 0.0-1.0 (default: 0.8) |
Installing a Seed Pack
Copy the seed directory to the seeds location:
cp -r my-seed/ ~/.local/share/akh-medu/seeds/my-seed/
The pack will appear in akh-medu seed list on next invocation.
Auto-Seeding on Workspace Creation
Workspace configs specify which seeds to apply on first initialization:
# ~/.config/akh-medu/workspaces/default.toml
name = "default"
seed_packs = ["identity", "ontology", "common-sense"]
When a workspace is created, the listed seeds are applied automatically.
Architecture
akh-medu is built on three complementary foundations: hyperdimensional computing (VSA), knowledge graphs, and symbolic reasoning. Each handles a different aspect of intelligence, and the engine unifies them under a single API.
The Three Pillars
1. Vector Symbolic Architecture (VSA)
VSA encodes symbols as high-dimensional binary vectors (default: 10,000 bits). These hypervectors support algebraic operations:
| Operation | Function | Effect |
|---|---|---|
| Bind (XOR) | bind(A, B) | Creates a composite dissimilar to both inputs. Reversible: unbind(bind(A, B), B) = A |
| Bundle (majority vote) | bundle(A, B, C) | Creates a vector similar to all inputs. Set-like union. |
| Permute (bit shift) | permute(A, n) | Preserves structure but shifts representation. Encodes sequence position. |
| Similarity (Hamming) | similarity(A, B) | 0.5 = random, 1.0 = identical. Measures semantic closeness. |
VSA provides:
- Fuzzy matching: Misspellings, spelling variants, and near-synonyms discovered automatically via similarity search.
- Analogy: "A is to B as C is to ?" computed as unbind(bind(A, B), C).
- Implicit knowledge: Relationships not explicit in the graph exist in the vector space and can be recovered via unbinding.
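For intuition, here is a toy sketch of the same algebra on 16-bit words. The real engine works on 10,000-bit, SIMD-backed hypervectors; the helper names below are illustrative, not the VsaOps API.

// Toy 16-bit stand-ins for the VSA operations (illustrative only).
fn bind(a: u16, b: u16) -> u16 { a ^ b }                       // XOR: reversible
fn similarity(a: u16, b: u16) -> f32 {                         // Hamming-based: 1.0 = identical
    1.0 - (a ^ b).count_ones() as f32 / 16.0
}
fn bundle(vs: &[u16]) -> u16 {                                 // bitwise majority vote
    let mut out = 0u16;
    for bit in 0..16 {
        let ones = vs.iter().filter(|v| (**v >> bit) & 1 == 1).count();
        if ones * 2 > vs.len() { out |= 1 << bit; }
    }
    out
}

fn main() {
    let (a, b, c) = (0b1010_1100_0011_0101u16, 0b0110_0110_1001_1010, 0b1111_0000_1111_0000);
    assert_eq!(bind(bind(a, b), b), a);                        // unbind(bind(A, B), B) = A
    let set = bundle(&[a, b, c]);
    println!("similarity(bundle, a) = {:.2}", similarity(set, a)); // similar to each input
}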
2. Knowledge Graph
A directed graph of (subject, predicate, object) triples with confidence
scores. Backed by two stores:
- petgraph: In-memory directed graph with dual indexing (symbol-to-node, node-to-symbol) for fast traversal.
- Oxigraph: Persistent RDF store with SPARQL query support and named graphs for compartment isolation.
The graph provides:
- Explicit knowledge: Direct lookup of facts.
- Traversal: BFS with depth, predicate, and confidence filtering.
- Analytics: Degree centrality, PageRank, strongly connected components.
3. Symbolic Reasoning (egg)
The egg e-graph library performs equality saturation -- applying rewrite
rules until no more simplifications are possible. This provides:
- Algebraic simplification: Expressions like unbind(bind(A, B), B) simplify to A.
- Forward-chaining inference: Rules derive new facts from existing ones.
- Verification: VSA-recovered inferences can be checked against the e-graph for mathematical consistency.
Subsystem Map
┌──────────────────────────────────────────────────────────┐
│ Engine API │
├──────────┬──────────┬──────────┬──────────┬──────────────┤
│ VSA │Knowledge │Reasoning │Inference │ Agent │
│ Ops │ Graph │ (egg) │ Engine │ OODA Loop │
├──────────┼──────────┼──────────┼──────────┼──────────────┤
│ HyperVec │petgraph │ rewrite │spreading │ 15 tools │
│ SIMD │oxigraph │ rules │backward │ planning │
│ ItemMem │SPARQL │ e-graphs │superpos. │ psyche │
├──────────┴──────────┴──────────┴──────────┴──────────────┤
│ Grammar · Skills · Provenance │
├──────────────────────────────────────────────────────────┤
│ Tiered Storage │
│ Hot (DashMap) · Warm (mmap) · Durable (redb) │
└──────────────────────────────────────────────────────────┘
Data Flow
A typical query flows through multiple subsystems:
- Input: Natural language parsed by the grammar framework into an AbsTree.
- Resolution: Entity names resolved to SymbolId values via the registry (exact match) or item memory (fuzzy VSA match).
- Graph lookup: Direct triples retrieved from the knowledge graph.
- Inference: Spreading activation discovers related symbols. VSA bind/unbind recovers implicit relationships in parallel.
- Reasoning: E-graph rules derive new facts and verify VSA results.
- Provenance: Every derivation step is recorded with full traceability.
- Output: Results linearized back to prose via the grammar framework.
Tiered Storage
Data flows through three tiers based on access pattern:
| Tier | Backend | Purpose | Access Speed |
|---|---|---|---|
| Hot | DashMap (concurrent HashMap) | Active working set | Nanoseconds |
| Warm | memmap2 (memory-mapped files) | Large read-heavy data | Microseconds |
| Durable | redb (ACID key-value store) | Persistent data that survives restarts | Milliseconds |
The TieredStore composes all three with automatic promotion: data accessed
from the durable tier is promoted to the hot tier for subsequent reads.
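A rough sketch of that promotion policy follows, with plain HashMaps standing in for DashMap, memmap2, and redb; this is the idea, not the TieredStore API.

use std::collections::HashMap;

// Two toy tiers with promote-on-read.
struct Tiered {
    hot: HashMap<String, Vec<u8>>,
    durable: HashMap<String, Vec<u8>>,
}

impl Tiered {
    fn get(&mut self, key: &str) -> Option<Vec<u8>> {
        if let Some(v) = self.hot.get(key) {
            return Some(v.clone());                    // fast path
        }
        let v = self.durable.get(key)?.clone();        // slow path
        self.hot.insert(key.to_string(), v.clone());   // promote for the next read
        Some(v)
    }
}

fn main() {
    let mut store = Tiered { hot: HashMap::new(), durable: HashMap::new() };
    store.durable.insert("dog".into(), b"mammal".to_vec());
    assert_eq!(store.get("dog"), Some(b"mammal".to_vec()));
    assert!(store.hot.contains_key("dog"));            // now served from the hot tier
}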
Configuration
The EngineConfig struct controls all subsystem parameters:
#![allow(unused)]
fn main() {
EngineConfig {
    dimension: Dimension::DEFAULT,   // 10,000 bits
    encoding: Encoding::Bipolar,     // +1/-1 interpretation
    data_dir: Some(path),            // None = memory-only
    max_memory_mb: 1024,             // hot tier budget
    max_symbols: 1_000_000,          // symbol registry limit
    language: Language::Auto,        // grammar language
}
}
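As a usage sketch (field names taken from the listing above; the path and limits are arbitrary), a persistent engine can be created by overriding a few defaults:

use akh_medu::engine::{Engine, EngineConfig};

fn main() {
    // Override only what differs from the defaults; everything else stays as shown above.
    let config = EngineConfig {
        data_dir: Some("/tmp/akh-demo".into()),   // enable on-disk persistence
        max_memory_mb: 256,                       // smaller hot-tier budget
        ..EngineConfig::default()
    };
    let engine = Engine::new(config).expect("engine should initialize");
    println!("triples: {}", engine.all_triples().len());
}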
Symbols and Triples
Symbols and triples are the fundamental data model in akh-medu. Every piece of knowledge is represented as symbols connected by triples.
Symbols
A symbol is a unique entity in the knowledge base. Each symbol has:
- SymbolId: A NonZeroU64 identifier, globally unique within an engine.
- SymbolKind: The type of thing the symbol represents.
- Label: A human-readable name (e.g., "Dog", "is-a", "mammal").
- Confidence: How certain we are this symbol is meaningful (0.0-1.0).
Symbol Kinds
| Kind | Purpose | Examples |
|---|---|---|
Entity | A concrete or abstract thing | "Dog", "Paris", "mitochondria" |
Relation | A relationship type | "is-a", "has-part", "causes" |
Concept | An abstract idea | "freedom", "recursion" |
Agent | An autonomous agent | The agent itself |
Creating Symbols
#![allow(unused)]
fn main() {
use akh_medu::symbol::SymbolKind;

// Create specific symbols
let dog = engine.create_symbol(SymbolKind::Entity, "Dog")?;
let is_a = engine.create_symbol(SymbolKind::Relation, "is-a")?;

// Resolve-or-create (idempotent)
let mammal = engine.resolve_or_create_entity("mammal")?;
let causes = engine.resolve_or_create_relation("causes")?;
}
Symbol Resolution
The engine resolves names to IDs through multiple strategies:
- Exact match: engine.lookup_symbol("Dog") -- direct registry lookup.
- ID parse: engine.resolve_symbol("42") -- tries parsing as a numeric ID.
- Fuzzy match: The lexer encodes unknown tokens as hypervectors and searches item memory for similar known symbols (threshold >= 0.60).
Hypervector Encoding
Every symbol is encoded as a 10,000-bit binary hypervector using deterministic
seeded random generation. The same SymbolId always produces the same vector:
SymbolId(42) -> deterministic random HyperVec (seeded with 42)
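A minimal sketch of that determinism, using std hashing in place of the engine's seeded PRNG (the real HyperVec is 10,000 bits; this toy produces 4 x 64 = 256):

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash (symbol_id, chunk_index) pairs into a fixed-size bit pattern.
fn encode(symbol_id: u64) -> [u64; 4] {
    let mut out = [0u64; 4];
    for (i, chunk) in out.iter_mut().enumerate() {
        let mut h = DefaultHasher::new();
        (symbol_id, i as u64).hash(&mut h);
        *chunk = h.finish();
    }
    out
}

fn main() {
    assert_eq!(encode(42), encode(42)); // same id -> same vector, every time
    assert_ne!(encode(42), encode(43)); // different ids -> (almost surely) different vectors
}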
Multi-word labels are encoded by bundling per-word vectors:
"big red dog" -> bundle(encode("big"), encode("red"), encode("dog"))
The resulting vector is similar to each component but identical to none, capturing set-like semantics.
Triples
A triple is a fact: (subject, predicate, object) with a confidence score.
#![allow(unused)]
fn main() {
use akh_medu::graph::Triple;

let triple = Triple::new(dog_id, is_a_id, mammal_id, 0.95);
engine.add_triple(triple)?;
}
Triple Fields
| Field | Type | Description |
|---|---|---|
subject | SymbolId | The entity being described |
predicate | SymbolId | The relationship type |
object | SymbolId | The related entity |
confidence | f32 | Certainty score (0.0-1.0) |
Confidence
Confidence scores flow through the system:
- Input confidence: Set when triples are created (default: 0.8 for seed packs, 0.90 for grammar-parsed facts).
- Propagation: During inference, confidence propagates multiplicatively: C(node) = C(parent) * edge_confidence.
- Fusion: When multiple paths reach the same node, confidences merge: C = max(C_graph, C_vsa).
Querying Triples
#![allow(unused)]
fn main() {
// All facts about Dog
let from_dog = engine.triples_from(dog_id);

// All facts pointing to mammal
let to_mammal = engine.triples_to(mammal_id);

// Check existence
let exists = engine.has_triple(dog_id, is_a_id, mammal_id);

// All triples in the KG
let all = engine.all_triples();
}
SPARQL
For complex queries, use SPARQL against the Oxigraph store:
akh-medu sparql "SELECT ?animal WHERE {
?animal <https://akh-medu.dev/sym/is-a> <https://akh-medu.dev/sym/mammal>
}"
Symbol IRIs follow the pattern https://akh-medu.dev/sym/{label}.
Provenance
Every triple and derived fact carries provenance -- a record of how it was created:
| Derivation Kind | Description |
|---|---|
Fact { source } | External input (ingested file, user assertion) |
InferenceRule { rule_id } | Derived by e-graph rewrite rule |
VsaBind { factors } | Created via VSA binding operation |
VsaUnbind { factors } | Recovered via VSA unbinding |
GraphEdge | Found via knowledge graph traversal |
AgentConsolidation | Created during agent memory consolidation |
AgentDecision | Created by agent tool execution |
CompartmentLoaded | Loaded from a compartment's triples file |
#![allow(unused)]
fn main() {
// Get the derivation trail for a symbol
let records = engine.provenance_of(dog_id)?;
for record in &records {
    println!("  derived via {:?} (confidence {:.2})",
        record.derivation_kind, record.confidence);
}
}
The Symbol Registry
The SymbolRegistry is a bidirectional map between labels and IDs, backed
by the tiered store:
- Labels are case-sensitive.
- Duplicate labels are rejected -- each label maps to exactly one ID.
- The AtomicSymbolAllocator generates IDs via atomic increment (lock-free).
Item Memory
The ItemMemory stores hypervector encodings for approximate nearest-neighbor
(ANN) search using the HNSW algorithm with Hamming distance:
#![allow(unused)]
fn main() {
// Find symbols similar to Dog
let results = engine.search_similar_to(dog_id, 5)?;
for r in &results {
    println!("  {} (similarity {:.2})", engine.resolve_label(r.id), r.similarity);
}
}
Search is sub-linear: O(log n) even with millions of vectors.
Inference Engine
The akh-medu inference engine discovers implicit knowledge from existing symbol associations. It combines three complementary strategies — spreading activation, backward chaining, and superposition reasoning — all operating on the same hypervector (VSA) substrate. Every inference produces a full provenance trail so results can be explained, audited, and verified.
Module: src/infer/ (4 files, ~1,500 lines, 17 tests)
Architecture Overview
┌─────────────────────────┐
│ Engine API │
│ infer() infer_analogy │
│ recover_filler() │
└─────────┬───────────────┘
│
┌───────────────┼───────────────┐
│ │ │
┌─────────▼──────┐ ┌─────▼──────┐ ┌──────▼──────────┐
│ Spreading │ │ Backward │ │ Superposition │
│ Activation │ │ Chaining │ │ Reasoning │
│ (engine.rs) │ │(backward.rs│ │(superposition.rs│
└───────┬────────┘ └─────┬──────┘ └───────┬─────────┘
│ │ │
┌───────▼────────────────▼────────────────▼─────────┐
│ Shared Infrastructure │
│ VsaOps · ItemMemory · KnowledgeGraph · Provenance│
└───────────────────────────────────────────────────┘
Each strategy accesses:
- VsaOps — bind, unbind, bundle, similarity (SIMD-accelerated)
- ItemMemory — HNSW-based approximate nearest neighbor search
- KnowledgeGraph — directed graph of (subject, predicate, object) triples
- ProvenanceLedger — persistent record of how each result was derived
Strategy 1: Spreading Activation
File: src/infer/engine.rs (623 lines, 10 tests)
The primary inference strategy. Starting from seed symbols, activation spreads outward along knowledge graph edges. At each hop, VSA bind/unbind recovery runs in parallel, catching implicit relationships that the graph edges alone would miss.
Algorithm
- Seed activation: Each seed symbol gets confidence 1.0. Their hypervectors are bundled into an initial interference pattern.
- Frontier expansion (repeated for max_depth iterations):
  - For each activated but unexpanded symbol, retrieve outgoing triples
  - Graph-direct activation: Activate the triple's object with confidence = parent_confidence * edge_confidence
  - VSA recovery: Compute unbind(subject_vec, predicate_vec), search item memory for the nearest match. If similarity >= threshold and the match differs from the graph-direct object, activate it too
  - Bundle newly activated vectors into the interference pattern
- E-graph verification (optional): For VSA-recovered results, build an egg expression (bind (bind from predicate) symbol) and check if the e-graph can simplify it. Non-simplifiable expressions get a 10% confidence penalty.
- Result assembly: Filter out seed symbols, sort by confidence, truncate to top_k.
Confidence Model
Confidence propagates multiplicatively along the graph path:
C(node) = C(parent) * edge_confidence
For VSA recovery, confidence is capped by both the graph path and the vector similarity:
C_vsa(node) = C(parent) * min(edge_confidence, similarity)
C(node) = max(C_graph, C_vsa)
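The arithmetic is easy to check by hand. A tiny worked example of the formulas above, with made-up numbers (plain arithmetic, not the engine API):

fn main() {
    // Two-hop path: edge confidences 0.9 then 0.8; VSA recovery on hop 2 has similarity 0.7.
    let c_seed: f32 = 1.0;
    let c_hop1 = c_seed * 0.9;                  // graph-direct, hop 1: 0.90
    let c_graph = c_hop1 * 0.8;                 // graph-direct, hop 2: 0.72
    let c_vsa = c_hop1 * f32::min(0.8, 0.7);    // VSA-capped,   hop 2: 0.63
    let c_final = c_graph.max(c_vsa);           // fused:               0.72
    assert!((c_final - 0.72).abs() < 1e-6);
}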
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
seeds | Vec<SymbolId> | [] | Starting symbols (required, non-empty) |
top_k | usize | 10 | Maximum results to return |
max_depth | usize | 1 | Number of expansion hops |
min_confidence | f32 | 0.1 | Discard activations below this |
min_similarity | f32 | 0.6 | VSA recovery similarity threshold |
verify_with_egraph | bool | false | Enable e-graph verification |
predicate_filter | Option<Vec<SymbolId>> | None | Only follow these predicates |
API
#![allow(unused)]
fn main() {
// Builder pattern for queries
let query = InferenceQuery::default()
    .with_seeds(vec![dog_id, cat_id])
    .with_max_depth(3)
    .with_min_confidence(0.2)
    .with_egraph_verification();

let result: InferenceResult = engine.infer(&query)?;

// result.activations: Vec<(SymbolId, f32)> — sorted by confidence
// result.pattern: Option<HyperVec> — combined interference pattern
// result.provenance: Vec<ProvenanceRecord> — full derivation trail
}
Additional Operations
Analogy — "A is to B as C is to ?":
#![allow(unused)]
fn main() {
// Computes bind(A, B) to capture the A→B relation,
// then unbind(relation, C) to recover D, and searches item memory.
let results: Vec<(SymbolId, f32)> = engine.infer_analogy(a, b, c, top_k)?;
}
Requires three distinct symbols. The relational vector bind(A, B) captures
the abstract relationship, which is then applied to C via unbind.
Role-filler recovery — "What is the object of (subject, predicate)?":
#![allow(unused)]
fn main() {
// unbind(subject_vec, predicate_vec) → search item memory
let fillers: Vec<(SymbolId, f32)> = engine.recover_filler(subject, predicate, 5)?;
}
Strategy 2: Backward Chaining
File: src/infer/backward.rs (245 lines, 3 tests)
Reasons from a goal backward to find supporting evidence. Given a target symbol, finds all triples where it appears as the object, then recursively finds support for each subject. This answers the question: "What evidence supports this conclusion?"
Algorithm
- Find all incoming triples (?, ?, goal) where the goal is the object
- For each triple, optionally verify via VSA: similarity(unbind(subject_vec, predicate_vec), goal_vec)
- Compute chain confidence: parent_confidence * edge_confidence * vsa_similarity
- Prune chains below min_confidence
- Recursively find support for each subject (up to max_depth)
- Record leaf chains (where no further support exists)
Types
#![allow(unused)]
fn main() {
pub struct BackwardChain {
    pub goal: SymbolId,                    // Target symbol
    pub supporting_triples: Vec<Triple>,   // Evidence chain
    pub confidence: f32,                   // Product of step confidences
    pub depth: usize,                      // Deepest step
}

pub struct BackwardConfig {
    pub max_depth: usize,      // Default: 3
    pub min_confidence: f32,   // Default: 0.1
    pub vsa_verify: bool,      // Default: true
}
}
API
#![allow(unused)]
fn main() {
use akh_medu::infer::backward::{infer_backward, BackwardConfig};

let chains = infer_backward(&engine, goal_symbol, &BackwardConfig::default())?;
for chain in &chains {
    println!("Support chain (confidence {:.2}, depth {}):", chain.confidence, chain.depth);
    for triple in &chain.supporting_triples {
        println!("  {} --{}--> {}", triple.subject, triple.predicate, triple.object);
    }
}
}
VSA Verification
When vsa_verify is enabled (default), each step in the chain is checked:
recovered = unbind(subject_vec, predicate_vec)
similarity = cosine(recovered, goal_vec)
This acts as a plausibility check — if the VSA substrate doesn't "agree" that the relationship holds, confidence is reduced proportionally.
Strategy 3: Superposition Reasoning
File: src/infer/superposition.rs (517 lines, 4 tests)
Implements "computing in superposition" — multiple competing hypotheses processed simultaneously in the same vector substrate. At branch points, hypotheses fork. Constructive interference merges similar hypotheses; destructive interference collapses contradicted ones.
Core Concept
Unlike spreading activation, which maintains a single global activation map, superposition maintains multiple independent hypotheses. Each hypothesis is a separate hypervector pattern with its own confidence and provenance. This enables the engine to explore contradictory interpretations in parallel and let the mathematics of interference determine the winner.
Algorithm
- Seed: Create initial hypothesis from bundled seed vectors (confidence 1.0)
- Expand (repeated for max_depth iterations):
  - For each hypothesis, expand each activated symbol's outgoing triples
  - At branch points (multiple outgoing edges), fork new hypotheses
  - Each fork bundles the parent's pattern with the new symbol's vector
- Constructive interference: Merge hypotheses whose patterns are similar (similarity > merge_threshold). Merged confidence uses noisy-OR: (C_a + C_b) * 0.6
- Destructive interference: Compare each hypothesis against the seed evidence pattern. Low similarity reduces confidence: interference = (similarity - 0.5) * 2.0. Negative interference decays confidence; hypotheses below min_confidence are pruned.
- Result: Return all surviving hypotheses sorted by confidence, with the dominant (highest-confidence) hypothesis highlighted.
Types
#![allow(unused)]
fn main() {
pub struct Hypothesis {
    pub pattern: HyperVec,                  // Superposition vector
    pub confidence: f32,                    // Current confidence
    pub provenance: Vec<ProvenanceRecord>,  // How this hypothesis formed
    pub activated: Vec<(SymbolId, f32)>,    // Symbols in this hypothesis
}

pub struct SuperpositionConfig {
    pub max_hypotheses: usize,   // Default: 8
    pub merge_threshold: f32,    // Default: 0.65
    pub min_confidence: f32,     // Default: 0.1
    pub max_depth: usize,        // Default: 3
}

pub struct SuperpositionResult {
    pub dominant: Option<Hypothesis>,   // Highest-confidence survivor
    pub hypotheses: Vec<Hypothesis>,    // All survivors, sorted
    pub merges: usize,                  // Number of constructive merges
    pub collapses: usize,               // Number of destructive collapses
}
}
API
#![allow(unused)]
fn main() {
use akh_medu::infer::superposition::{infer_with_superposition, SuperpositionConfig};

let config = SuperpositionConfig {
    max_hypotheses: 16,
    merge_threshold: 0.7,
    ..Default::default()
};

let result = infer_with_superposition(&[seed1, seed2], &engine, &config)?;

println!("Surviving hypotheses: {}", result.hypotheses.len());
println!("Merges: {}, Collapses: {}", result.merges, result.collapses);

if let Some(dominant) = &result.dominant {
    println!("Dominant hypothesis (confidence {:.2}):", dominant.confidence);
    for (sym, conf) in &dominant.activated {
        println!("  {:?} ({:.2})", sym, conf);
    }
}
}
State Management
SuperpositionState manages the hypothesis population:
| Method | Description |
|---|---|
fork() | Create new hypothesis from parent + new symbol |
merge_constructive() | Merge similar hypotheses (constructive interference) |
collapse_destructive() | Prune contradicted hypotheses (destructive interference) |
dominant() | Get highest-confidence hypothesis |
into_result() | Consume state into final SuperpositionResult |
Provenance
Every inference operation produces ProvenanceRecord entries that explain
exactly how each result was derived. Records are persisted to the provenance
ledger (redb) when available.
Derivation Kinds
| Kind | Description | Fields |
|---|---|---|
Seed | Starting point of inference | — |
GraphEdge | Activated via knowledge graph triple | from, predicate |
VsaRecovery | Recovered via VSA unbind + item memory search | from, predicate, similarity |
RuleInference | Derived by e-graph rewrite rule | rule_name, from_symbols |
FusedInference | Produced by confidence fusion in autonomous cycle | path_count, interference_score |
Each record also carries:
- derived_id: SymbolId — the symbol this record is about
- confidence: f32 — confidence at this derivation step
- depth: usize — how many hops from the seed
- sources: Vec<SymbolId> — symbols that contributed to this derivation
Integration Points
Engine (src/engine.rs)
The Engine type exposes three top-level inference methods:
#![allow(unused)]
fn main() {
impl Engine {
    /// Spreading activation with all active rules (built-in + skills).
    /// Automatically persists provenance to ledger.
    pub fn infer(&self, query: &InferenceQuery) -> AkhResult<InferenceResult>;

    /// Analogy: A:B :: C:?
    pub fn infer_analogy(&self, a: SymbolId, b: SymbolId, c: SymbolId, top_k: usize)
        -> AkhResult<Vec<(SymbolId, f32)>>;

    /// Role-filler recovery for (subject, predicate) → object.
    pub fn recover_filler(&self, subject: SymbolId, predicate: SymbolId, top_k: usize)
        -> AkhResult<Vec<(SymbolId, f32)>>;
}
}
Pipeline (src/pipeline/mod.rs)
The Infer stage runs spreading activation as part of the linear pipeline:
Retrieve → Infer → Reason → Extract
The infer stage accepts a query_template that is cloned and populated with
seeds from the retrieve stage's output. Custom depth and confidence can be
set via the template.
Autonomous Cycle (src/autonomous/integration.rs)
The autonomous cycle uses superposition reasoning as step 3:
- Forward-chaining rules (e-graph rewrite)
- Symbol grounding (re-encode all symbols into VSA)
- Superposition inference — forking hypotheses from seed symbols
- Confidence fusion — merge rule-derived and superposition-derived paths
- KG commit — insert high-confidence triples into the knowledge graph
Agent (src/agent/ooda.rs)
The agent's Orient phase runs spreading activation to find relevant knowledge for the current goal, feeding inferences into the Decide phase for tool selection.
CLI (src/main.rs)
# Spreading activation
akh-medu query --seeds "Dog,Cat" --depth 3 --top-k 20
# Analogy
akh-medu analogy --a "King" --b "Man" --c "Queen" --top-k 5
# Forward-chaining inference rules
akh-medu infer
# Pipeline with custom inference depth
akh-medu pipeline run --infer-depth 3 --stages retrieve,infer,reason
Error Handling
All inference errors are reported via InferError with miette diagnostics:
| Error | Code | Description |
|---|---|---|
NoSeeds | akh::infer::no_seeds | Inference query has empty seeds list |
InvalidAnalogy | akh::infer::analogy | Analogy requires 3 distinct symbols |
MaxDepthExceeded | akh::infer::depth | Inference depth limit reached |
VsaError | (transparent) | Underlying VSA operation failure |
Design Rationale
Why three strategies? Each addresses a different reasoning need:
- Spreading activation is the workhorse — fast, breadth-first, good for "what's related to X?" queries. It finds direct and indirect associations.
- Backward chaining answers "why?" questions — given a conclusion, find the evidence chain that supports it. Essential for explainability.
- Superposition handles ambiguity — when the graph has multiple contradictory paths, it explores them all simultaneously and lets constructive/destructive interference pick the winner.
Why VSA recovery alongside graph traversal? The knowledge graph captures
explicit relationships, but VSA encodes distributional similarity. A triple
(Dog, is-a, Mammal) is explicit in the graph, but the implicit analogy
"Dog is to Puppy as Cat is to Kitten" lives in the vector space. Running both
in parallel catches knowledge that either alone would miss.
Why e-graph verification? The egg e-graph engine provides algebraic
simplification of VSA expressions. If unbind(bind(A, B), B) doesn't simplify
to A, it suggests the recovery was noisy. This provides a cheap
mathematical sanity check on VSA-recovered inferences.
Tests
The inference module has 17 tests across the three strategy files:
Spreading Activation (engine.rs, 10 tests):
- infer_no_seeds_returns_error — empty seeds produce NoSeeds error
- single_hop_inference — one-step graph traversal finds correct target
- multi_hop_inference — depth 1 vs depth 2 reaches different nodes
- confidence_propagates_multiplicatively — confidence decays along path
- role_filler_recovery — unbind(subject, predicate) recovers filler
- analogy_inference — A:B::C:? returns results
- analogy_requires_three_distinct — duplicate symbols rejected
- provenance_records_generated — Seed and GraphEdge records present
- empty_graph_no_activations — isolated symbol produces empty results
- predicate_filter_respected — only specified predicates are followed
Backward Chaining (backward.rs, 3 tests):
- find_support_chain — Dog→Mammal→Animal finds multi-step evidence
- confidence_decreases_with_depth — deeper chains have lower confidence
- no_support_for_isolated_symbol — no incoming triples means no chains
Superposition (superposition.rs, 4 tests):
- fork_creates_multiple_hypotheses — branch points produce multiple hypotheses
- constructive_merge_combines_similar — identical patterns merge
- destructive_collapse_removes_contradicted — dissimilar patterns are pruned
- dominant_hypothesis_has_highest_confidence — dominant picks the right one
Reasoning
akh-medu uses the egg e-graph library for symbolic reasoning via equality
saturation. E-graphs efficiently represent equivalence classes of expressions,
and rewrite rules transform them until a fixed point is reached.
How E-Graphs Work
An e-graph (equality graph) stores expressions and their equivalences simultaneously. When a rewrite rule fires, the e-graph doesn't destructively replace -- it adds the new expression to the same equivalence class.
Before rule: bind(unbind(X, Y), Y) => X
E-graph: { bind(unbind(A, B), B) }
After rule:
E-graph: { bind(unbind(A, B), B), A } <- both in the same e-class
This means multiple rules can fire without conflict, and the optimal result is extracted after all rules have been applied.
AkhLang
akh-medu defines AkhLang, a domain-specific language for the e-graph:
#![allow(unused)]
fn main() {
define_language! {
    pub enum AkhLang {
        // VSA operations
        "bind" = Bind([Id; 2]),
        "unbind" = Unbind([Id; 2]),
        "bundle" = Bundle([Id; 2]),
        "permute" = Permute([Id; 2]),

        // Knowledge operations
        "triple" = Triple([Id; 3]),
        "sim" = Similarity([Id; 2]),

        // Leaf nodes
        Symbol(Symbol),
    }
}
}
Built-in Rewrite Rules
The engine ships with algebraic rules for VSA operations:
| Rule | Pattern | Result | Purpose |
|---|---|---|---|
| Bind-unbind cancel | unbind(bind(X, Y), Y) | X | VSA algebra: unbinding reverses binding |
| Unbind-bind cancel | bind(unbind(X, Y), Y) | X | Symmetric cancellation |
| Bind commutative | bind(X, Y) | bind(Y, X) | XOR is commutative |
| Bundle commutative | bundle(X, Y) | bundle(Y, X) | Majority vote is commutative |
| Bind self-inverse | bind(X, X) | identity | XOR with self = zero |
Skills can contribute additional rules that are loaded dynamically.
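For illustration, rules of this shape can be written with egg's rewrite! macro. The rule names and the () analysis below are placeholders; the engine's actual rule set comes from akh_medu::reason::built_in_rules().

use egg::{rewrite, Rewrite};
use akh_medu::reason::AkhLang;

// Illustrative sketch only -- the real rules live in the reason module.
fn sketch_rules() -> Vec<Rewrite<AkhLang, ()>> {
    vec![
        rewrite!("bind-unbind-cancel"; "(unbind (bind ?x ?y) ?y)" => "?x"),
        rewrite!("unbind-bind-cancel"; "(bind (unbind ?x ?y) ?y)" => "?x"),
        rewrite!("bind-commute";       "(bind ?x ?y)"             => "(bind ?y ?x)"),
        rewrite!("bundle-commute";     "(bundle ?x ?y)"           => "(bundle ?y ?x)"),
    ]
}

fn main() {
    println!("{} sketch rules", sketch_rules().len());
}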
Using the Reasoner
CLI
# Simplify an expression
akh-medu reason --expr "unbind(bind(Dog, is-a), is-a)"
# Output: Dog
# Verbose mode shows the e-graph state
akh-medu reason --expr "unbind(bind(Dog, is-a), is-a)" --verbose
Rust API
#![allow(unused)]
fn main() {
use egg::{rewrite, Runner, Extractor, AstSize};
use akh_medu::reason::AkhLang;

let rules = akh_medu::reason::built_in_rules();
let expr = "unbind(bind(Dog, is-a), is-a)".parse()?;

let runner = Runner::default()
    .with_expr(&expr)
    .run(&rules);

let extractor = Extractor::new(&runner.egraph, AstSize);
let (cost, best) = extractor.find_best(runner.roots[0]);

println!("Simplified to: {} (cost {})", best, cost);
}
Forward-Chaining Inference
The agent's infer_rules tool runs rewrite rules as forward-chaining
inference:
- Existing triples are encoded as e-graph expressions.
- Rewrite rules fire, potentially producing new triple expressions.
- New triples are extracted and committed to the knowledge graph.
akh-medu agent infer --max-iterations 10 --min-confidence 0.5
E-Graph Verification
During inference, VSA-recovered results can optionally be verified by the
e-graph. If unbind(bind(A, B), B) doesn't simplify to A, it suggests
the recovery was noisy and confidence is reduced by 10%.
This is controlled by InferenceQuery::with_egraph_verification():
#![allow(unused)]
fn main() {
let query = InferenceQuery::default()
    .with_seeds(vec![dog_id])
    .with_egraph_verification();
}
Skills and Custom Rules
Skill packs can contribute domain-specific rewrite rules:
# In a skill's rules.toml
[[rules]]
name = "transitive-is-a"
lhs = "(triple ?a is-a ?b) (triple ?b is-a ?c)"
rhs = "(triple ?a is-a ?c)"
When a skill is loaded, its rules are compiled into egg rewrites and added
to the engine's rule set. When the skill is unloaded, the rules are removed.
Design Rationale
Why e-graphs? Traditional rule engines apply rules destructively -- once a rule fires, the original expression is gone. E-graphs keep all intermediate forms, avoiding the phase-ordering problem where the order of rule application matters. With equality saturation, all rules fire simultaneously and the optimal result is extracted at the end.
Why algebraic verification? VSA recovery via unbind(subject, predicate)
is approximate -- the recovered vector is searched against item memory for
the nearest match. This can produce false positives, especially with high
symbol density. The e-graph provides a cheap mathematical sanity check:
if the algebraic identity doesn't hold, the recovery is suspect.
OODA Loop
The akh-medu agent operates on a continuous Observe-Orient-Decide-Act (OODA) loop. Each cycle gathers state, builds context, selects a tool, executes it, and evaluates progress toward goals.
Cycle Structure
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ Observe │ --> │ Orient │ --> │ Decide │ --> │ Act │
│ │ │ │ │ │ │ │
│ Goals │ │ KG ctx │ │ Tool │ │ Execute │
│ WM │ │ Infer │ │ select │ │ Eval │
│ Episodes│ │ Episodes│ │ Score │ │ Progress│
└─────────┘ └─────────┘ └─────────┘ └─────────┘
^ │
└───────────── next cycle ──────────────────────┘
Phase Details
Observe
Gathers the current state of the world:
- Active goals and their status (Active, Suspended, Completed, Failed)
- Working memory size and recent entries
- Recalled episodic memories relevant to current goals
The observation is recorded in working memory as a WorkingMemoryKind::Observation.
Orient
Builds context from available knowledge:
- Collects adjacent KG triples for each active goal's symbols
- Runs spreading-activation inference from goal-related symbols
- Incorporates knowledge from recalled episodic entries
- Measures memory pressure (ratio of WM entries to capacity)
Decide
Selects the best tool for the current situation using utility-based scoring:
total_score = base_score - recency_penalty + novelty_bonus
+ episodic_bonus + pressure_bonus + archetype_bonus
| Factor | Range | Purpose |
|---|---|---|
base_score | 0.0-1.0 | State-dependent score (e.g., kg_query scores high when KG has few triples) |
recency_penalty | -0.4 to 0.0 | Penalizes recently used tools (-0.4 last cycle, -0.2 two ago, -0.1 three ago) |
novelty_bonus | 0.0 or +0.15 | Rewards tools never used on this goal |
episodic_bonus | 0.0 or +0.2 | Rewards tools mentioned in recalled episodes |
pressure_bonus | 0.0 or +0.2 | Boosts consolidation tools when memory is nearly full |
archetype_bonus | -0.07 to +0.07 | Psyche archetype weight bias |
The decision includes a score breakdown string for transparency:
[score=0.85: base=0.80 recency=-0.00 novelty=+0.15 episodic=+0.00 pressure=+0.00 archetype=+0.030]
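A minimal sketch of that formula with illustrative numbers (the struct and tool names here are not the agent's real types):

struct ToolScore { base: f32, recency: f32, novelty: f32, episodic: f32, pressure: f32, archetype: f32 }

fn total(s: &ToolScore) -> f32 {
    s.base - s.recency + s.novelty + s.episodic + s.pressure + s.archetype
}

fn main() {
    let candidates = [
        ("kg_query",  ToolScore { base: 0.80, recency: 0.0, novelty: 0.15, episodic: 0.0, pressure: 0.0, archetype: 0.03 }),
        ("kg_mutate", ToolScore { base: 0.55, recency: 0.4, novelty: 0.00, episodic: 0.2, pressure: 0.0, archetype: 0.00 }),
    ];
    // Highest total wins: kg_query scores 0.98, kg_mutate 0.35.
    let best = candidates.iter()
        .max_by(|a, b| total(&a.1).partial_cmp(&total(&b.1)).unwrap())
        .unwrap().0;
    assert_eq!(best, "kg_query");
}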
Act
Executes the chosen tool and evaluates the outcome:
- Shadow check: If a psyche is configured, veto/bias patterns are checked before execution.
- Tool execution: The tool runs against the engine, producing a ToolOutput.
- Goal evaluation: Success criteria are checked against KG state:
  - Keywords from criteria are matched against non-metadata triples.
  - Tool output content is also checked for criteria matches.
- Progress assessment: Goal is marked Completed, Failed, or Advanced.
- Provenance: An AgentDecision provenance record is stored.
Goal Management
Goals are the agent's driving objectives. Each goal has:
#![allow(unused)]
fn main() {
Goal {
    symbol_id: SymbolId,        // KG entity for this goal
    description: String,        // What to achieve
    success_criteria: String,   // Evaluated against KG state
    priority: u8,               // 0-255 (higher = more important)
    status: GoalStatus,         // Active, Suspended, Completed, Failed
    stall_threshold: usize,     // Cycles without progress before decomposition
}
}
Stall Detection
The agent tracks per-goal progress:
- cycles_worked: Total OODA cycles spent on this goal.
- last_progress_cycle: Cycle number when last progress was made.
If cycles_worked - last_progress_cycle >= stall_threshold, the goal is
considered stalled and decomposition fires automatically.
Goal Decomposition
Stalled goals are split into sub-goals:
- Criteria are split on commas and "and" conjunctions.
- The parent goal is suspended.
- Child goals are created with Active status and linked via agent:parent_goal / agent:child_goal predicates.
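A sketch of the criteria-splitting rule above (illustrative helper, not the agent's code):

fn split_criteria(criteria: &str) -> Vec<String> {
    // Split on commas and the word "and", trimming empty fragments.
    criteria
        .split(',')
        .flat_map(|part| part.split(" and "))
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
        .collect()
}

fn main() {
    let subs = split_criteria("list all mammals, find what they eat and where they live");
    assert_eq!(subs, vec!["list all mammals", "find what they eat", "where they live"]);
}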
Working Memory
Working memory is the agent's ephemeral scratch space:
#![allow(unused)]
fn main() {
WorkingMemoryEntry {
    id: u64,
    content: String,            // Text representation
    symbols: Vec<SymbolId>,     // Linked entities
    kind: WorkingMemoryKind,    // Observation, Inference, Decision, Action, Reflection
    timestamp: u64,
    relevance: f32,             // 0.0-1.0
    source_cycle: u64,
    reference_count: u64,       // Incremented when consulted
}
}
Capacity is configurable (default: 100 entries). When full, the oldest low-relevance entries are evicted.
Episodic Memory
High-relevance working memory entries are consolidated into episodic memories -- persistent long-term records:
#![allow(unused)]
fn main() {
EpisodicEntry {
    timestamp: u64,
    goal: SymbolId,
    summary: String,
    learnings: Vec<SymbolId>,           // Symbols found relevant
    derivation_kind: DerivationKind,    // AgentConsolidation
}
}
Consolidation fires automatically when memory pressure exceeds 0.8, or
manually via agent consolidate.
Session Persistence
The agent's full state (working memory, cycle count, goals, plans, psyche) is serialized to the durable store on exit and restored on resume:
# Persists automatically on exit
akh-medu agent run --goals "..." --max-cycles 20
# Resume where you left off
akh-medu agent resume --max-cycles 50
CLI Commands
# Single cycle
akh-medu agent cycle --goal "Find mammals"
# Multi-cycle
akh-medu agent run --goals "Discover planet properties" --max-cycles 20
# Fresh start (ignores persisted session)
akh-medu agent run --goals "..." --max-cycles 10 --fresh
# Interactive REPL
akh-medu agent repl
# Resume persisted session
akh-medu agent resume
# Trigger consolidation
akh-medu agent consolidate
# Recall episodic memories
akh-medu agent recall --query "mammals" --top-k 5
Configuration
#![allow(unused)]
fn main() {
AgentConfig {
    working_memory_capacity: 100,                   // Max WM entries
    consolidation: ConsolidationConfig::default(),
    max_cycles: 1000,                               // Safety limit
    auto_consolidate: true,                         // Auto-fire when pressure > 0.8
    reflection: ReflectionConfig::default(),
    max_backtrack_attempts: 3,                      // Plan retries before giving up
}
}
Tools
The agent has 15 built-in tools organized into three categories: core knowledge tools, external interaction tools, and advanced tools.
Tool Architecture
Each tool implements the Tool trait:
#![allow(unused)]
fn main() {
pub trait Tool: Send + Sync {
    fn signature(&self) -> ToolSignature;
    fn execute(&self, engine: &Engine, input: ToolInput) -> AgentResult<ToolOutput>;
    fn manifest(&self) -> ToolManifest;
}
}
Tools produce a ToolOutput with:
- success: bool -- whether the operation succeeded
- result: String -- human-readable summary
- symbols_involved: Vec<SymbolId> -- entities touched (linked in working memory)
Core Knowledge Tools
kg_query
Query the knowledge graph by symbol, predicate, or direction.
| Parameter | Required | Description |
|---|---|---|
symbol | yes | Symbol name or ID to query |
predicate | no | Filter by predicate |
direction | no | outgoing (default) or incoming |
kg_mutate
Create new triples in the knowledge graph.
| Parameter | Required | Description |
|---|---|---|
subject | yes | Subject entity label |
predicate | yes | Relation label |
object | yes | Object entity label |
confidence | no | Confidence score (default: 0.8) |
memory_recall
Fetch episodic memories relevant to the current context.
| Parameter | Required | Description |
|---|---|---|
query_symbols | yes | Comma-separated symbol names to match against |
top_k | no | Maximum results (default: 5) |
reason
Simplify expressions via e-graph rewriting.
| Parameter | Required | Description |
|---|---|---|
expr | yes | Expression to simplify |
verbose | no | Show e-graph state |
similarity_search
Find similar symbols via VSA hypervector similarity.
| Parameter | Required | Description |
|---|---|---|
symbol | yes | Symbol name or ID |
top_k | no | Maximum results (default: 5) |
External Interaction Tools
file_io
Read and write files, sandboxed to the workspace's scratch directory.
| Parameter | Required | Description |
|---|---|---|
operation | yes | read or write |
path | yes | File path (relative to scratch dir) |
content | write only | Content to write |
Limits: 4 KB read truncation, 256 KB write limit.
http_fetch
Synchronous HTTP GET via ureq.
| Parameter | Required | Description |
|---|---|---|
url | yes | URL to fetch |
timeout_secs | no | Request timeout (default: 30) |
Limit: 256 KB response truncation.
shell_exec
Execute shell commands with poll-based timeout.
| Parameter | Required | Description |
|---|---|---|
command | yes | Shell command to run |
timeout_secs | no | Timeout (default: 30) |
Limits: 64 KB output, process killed on timeout.
user_interact
Prompt the user for input via stdout/stdin.
| Parameter | Required | Description |
|---|---|---|
prompt | yes | Question to display |
timeout_secs | no | Input timeout |
Advanced Tools
infer_rules
Run forward-chaining inference via e-graph rewrite rules.
| Parameter | Required | Description |
|---|---|---|
max_iterations | no | Rule application iterations (default: 10) |
min_confidence | no | Minimum confidence threshold (default: 0.5) |
gap_analysis
Discover knowledge gaps by analyzing the graph structure.
| Parameter | Required | Description |
|---|---|---|
goal | no | Focus analysis around a specific goal |
max_gaps | no | Maximum gaps to report (default: 10) |
csv_ingest
Ingest structured data from CSV files.
| Parameter | Required | Description |
|---|---|---|
path | yes | Path to CSV file |
format | no | spo (subject-predicate-object) or entity (header columns as predicates) |
text_ingest
Extract facts from natural language text using the grammar parser.
| Parameter | Required | Description |
|---|---|---|
text | yes | Text to parse, or file:/path to read from file |
max_sentences | no | Maximum sentences to process |
code_ingest
Parse Rust source code into knowledge graph entities.
| Parameter | Required | Description |
|---|---|---|
path | yes | Path to Rust source file or directory |
recursive | no | Recurse into subdirectories |
run_rules | no | Apply inference rules after ingestion |
enrich | no | Run semantic enrichment |
docgen
Generate documentation from code knowledge in the graph.
| Parameter | Required | Description |
|---|---|---|
target | yes | Symbol or path to document |
format | no | markdown, json, or both |
polish | no | Apply grammar polishing |
Danger Metadata
Each tool carries a ToolManifest with danger metadata:
#![allow(unused)]
fn main() {
DangerLevel::Safe      // No external I/O (kg_query, reason, etc.)
DangerLevel::Caution   // Read-only external access (http_fetch, file_io read)
DangerLevel::Danger    // Write/exec capability (shell_exec, file_io write)
}
Capabilities tracked:
- ReadKg, WriteKg -- knowledge graph operations
- FileIo -- filesystem access
- NetworkAccess -- HTTP requests
- ShellExec -- command execution
- UserInteraction -- stdin/stdout
The psyche system uses shadow_triggers from the manifest to
match veto patterns against tool usage.
Tool Selection
During the Decide phase, each tool receives a utility score:
total_score = base_score - recency_penalty + novelty_bonus
+ episodic_bonus + pressure_bonus + archetype_bonus
The tool with the highest score is selected. See OODA Loop for details on the scoring factors.
Custom Tools via Skills
Skill packs can register additional tools:
- CLI tools: JSON manifests describing shell commands.
- WASM tools: WebAssembly components (requires
wasm-toolsfeature).
# List all registered tools (built-in + skill-provided)
akh-medu agent tools
Listing Tools
akh-medu agent tools
This shows all registered tools with their names, descriptions, parameters, and danger levels.
Planning and Reflection
The agent generates multi-step plans for each goal and periodically reflects on its progress, adjusting priorities and strategies.
Planning
Before the OODA loop begins work on a goal, the agent generates a Plan --
an ordered sequence of tool calls designed to achieve the goal's success
criteria.
Plan Structure
#![allow(unused)]
fn main() {
Plan {
    goal_id: SymbolId,
    steps: Vec<PlanStep>,
    status: PlanStatus,     // Active, Completed, Failed, Superseded
    attempt: u32,           // Incremented on backtrack
    strategy: String,       // Summary of the approach
}

PlanStep {
    tool_name: String,
    tool_input: ToolInput,
    rationale: String,
    status: StepStatus,     // Pending, Active, Completed, Failed, Skipped
    index: usize,
}
}
Strategy Selection
The planner analyzes the goal description and success criteria using VSA-based semantic analysis. It measures interference between the goal text and five strategy patterns:
| Strategy | Keywords | When Selected |
|---|---|---|
| Knowledge | find, query, search, discover, explore, list, identify | Goal is about finding information |
| Reasoning | reason, infer, deduce, classify, analyze, why | Goal requires logical derivation |
| Creation | create, add, build, connect, link, store, write | Goal is about constructing knowledge |
| External | file, http, command, shell, fetch, download | Goal involves external data |
| Similarity | similar, like, related, compare, cluster | Goal is about finding relationships |
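As a rough illustration of keyword-driven selection: the real planner scores VSA interference between the goal text and per-strategy patterns, but the effect resembles counting keyword hits, as in this toy sketch (function and strategy names are illustrative).

fn pick_strategy(goal: &str) -> &'static str {
    let goal = goal.to_lowercase();
    // Keyword lists taken from the table above.
    let strategies: [(&str, &[&str]); 5] = [
        ("knowledge",  &["find", "query", "search", "discover", "explore", "list", "identify"]),
        ("reasoning",  &["reason", "infer", "deduce", "classify", "analyze", "why"]),
        ("creation",   &["create", "add", "build", "connect", "link", "store", "write"]),
        ("external",   &["file", "http", "command", "shell", "fetch", "download"]),
        ("similarity", &["similar", "like", "related", "compare", "cluster"]),
    ];
    strategies.iter()
        .max_by_key(|(_, kws)| kws.iter().filter(|k| goal.contains(**k)).count())
        .unwrap()
        .0
}

fn main() {
    assert_eq!(pick_strategy("Discover and list the properties of planets"), "knowledge");
}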
Alternating Strategies
To avoid getting stuck in local optima, the planner alternates between two meta-strategies based on the attempt number:
- Even attempts (0, 2, 4...): Explore-first -- knowledge gathering tools, then reasoning tools.
- Odd attempts (1, 3, 5...): Reason-first -- reasoning tools, then knowledge tools.
Backtracking
When a plan step fails:
- The plan is marked as PlanStatus::Failed.
- A new plan is generated with attempt += 1.
- The alternating strategy ensures a different approach.
- After max_backtrack_attempts (default: 3), the goal fails.
CLI
# Generate and display a plan for the current goal
akh-medu agent plan
# In the REPL
> p
> plan
Reflection
After every N cycles (configurable), the agent pauses to reflect on its performance.
What Reflection Examines
- Goal progress: Which goals advanced since the last reflection?
- Stagnation: Which goals have been stuck for many cycles?
- Tool effectiveness: Which tools produced useful results?
- Memory pressure: Is working memory getting full?
Adjustments
Reflection produces a list of Adjustment actions:
| Adjustment | Trigger | Effect |
|---|---|---|
| Boost priority | Goal made progress | Priority increased to keep momentum |
| Demote priority | Goal stagnant | Priority decreased to try other goals |
| Suggest decomposition | Goal stalled beyond threshold | Recommends splitting the goal |
| Trigger consolidation | Memory pressure high | Saves episodic memories, frees WM |
Psyche Evolution
If a psyche is configured, reflection also triggers
Psyche::evolve():
- Archetype weights adjust based on tool success rates.
- Shadow encounter counter grows individuation level.
- Dominant archetype is recalculated.
Configuration
#![allow(unused)]
fn main() {
ReflectionConfig {
    interval: usize,            // Reflect every N cycles (default: 5)
    min_cycles_worked: usize,   // Minimum work before reflecting
}
}
CLI
# Trigger reflection manually
akh-medu agent reflect
# In the REPL
> r
> reflect
Plan-Reflection Interaction
Plans and reflection work together:
- A plan is generated for a goal.
- The OODA loop executes plan steps cycle-by-cycle.
- If a step fails, backtracking generates an alternative plan.
- During periodic reflection, the agent assesses whether the current plan strategy is working.
- Reflection may boost or demote the goal, triggering plan regeneration on the next cycle.
Jungian Psyche
The psyche system models the agent's personality, ethical constraints, and behavioral tendencies using Carl Jung's analytical psychology. It replaces rigid rule-based ethics (Asimov's laws) with a dynamic, evolvable system that learns from experience.
The psyche is a special Core compartment that influences four aspects of
the OODA loop:
- Tool selection -- archetype weights bias which tools the agent prefers.
- Action gating -- shadow veto patterns block dangerous actions.
- Reflection -- the psyche evolves based on the agent's performance.
- Output style -- the persona's grammar preference controls narrative synthesis.
Structure
Psyche
├── Persona <- outward mask (communication style)
├── Shadow <- constrained anti-patterns (vetoes + biases)
├── ArchetypeWeights <- behavioral tendencies (tool selection bias)
└── SelfIntegration <- growth tracking (individuation)
Persona
The Persona controls how the agent communicates. It is the "mask" the agent presents to the user.
[persona]
name = "Scholar"
grammar_preference = "narrative" # or "formal", "terse", or a custom TOML path
traits = ["precise", "curious", "thorough"]
tone = ["clear", "methodical"]
When the agent synthesizes output (Agent::synthesize_findings()), it checks
the persona's grammar_preference and uses it as the default grammar
archetype. The grammar framework then structures the output accordingly:
| Grammar | Style |
|---|---|
"narrative" | Flowing prose with topic sentences and connecting phrases. |
"formal" | Structured sections with headers and bullet points. |
"terse" | Minimal output, facts only, no elaboration. |
"path/to/custom.toml" | User-defined grammar rules. |
Shadow
The Shadow represents the agent's ethical constraints -- things it must not do (vetoes) and things it should avoid (biases).
[shadow]
[[shadow.veto_patterns]]
name = "destructive_action"
triggers = ["delete all", "drop table", "rm -rf"]
severity = 1.0
explanation = "Destructive actions require explicit user confirmation."
[[shadow.bias_patterns]]
name = "repetitive_loop"
triggers = ["same tool", "repeated"]
severity = 0.3
explanation = "Detected repetitive pattern - try a different approach."
Two severity levels
| Level | Mechanism | Effect |
|---|---|---|
| Veto patterns | check_shadow_veto(action_desc) | Hard block. The action is not executed. A ShadowVeto provenance record is created and the shadow encounter counter increments (driving individuation). |
| Bias patterns | check_shadow_bias(action_desc) | Soft penalty. The cumulative severity from matched patterns is logged but the action proceeds. |
Trigger matching: Each trigger string is checked as a case-insensitive
substring against the action description ("tool={name} input={:?}").
Multiple triggers in a pattern are OR'd -- any match fires the pattern.
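As a rough illustration of this matching rule, here is a self-contained sketch. `VetoPattern` and `check_shadow_veto` are simplified stand-ins for the real Shadow types in `akh_medu::compartment`; only the case-insensitive substring check and the OR over triggers follow the description above.

```rust
// Hypothetical pattern type; the real Shadow struct lives in akh_medu::compartment.
struct VetoPattern {
    name: &'static str,
    triggers: &'static [&'static str],
    severity: f32,
}

/// Returns the first veto pattern whose triggers match the action description.
/// Matching is a case-insensitive substring check; triggers are OR'd.
fn check_shadow_veto<'a>(
    patterns: &'a [VetoPattern],
    action_desc: &str,
) -> Option<&'a VetoPattern> {
    let haystack = action_desc.to_lowercase();
    patterns.iter().find(|p| {
        p.triggers
            .iter()
            .any(|t| haystack.contains(&t.to_lowercase()))
    })
}

fn main() {
    let patterns = [VetoPattern {
        name: "destructive_action",
        triggers: &["delete all", "drop table", "rm -rf"],
        severity: 1.0,
    }];
    let desc = "tool=shell_exec input=\"rm -rf /tmp/cache\"";
    if let Some(p) = check_shadow_veto(&patterns, desc) {
        println!("vetoed by {} (severity {})", p.name, p.severity);
    }
}
```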
When a veto fires (during the Act phase):
- The action is blocked -- it never reaches the tool registry.
- A `DerivationKind::ShadowVeto` provenance record is stored.
- `psyche.record_shadow_encounter()` increments the shadow encounter counter.
- The cycle returns a `GoalProgress::Failed` with the veto reason.
- The agent adapts on the next cycle by choosing a different tool.
Archetypes
Four Jungian archetypes bias tool selection during the Decide phase:
| Archetype | Weight (default) | Preferred tools | Behavioral tendency |
|---|---|---|---|
| Sage | 0.7 | kg_query, infer_rules, reason, synthesize_triple | Seeks understanding and knowledge |
| Healer | 0.5 | gap_analysis, user_interact | Seeks missing knowledge and connection |
| Explorer | 0.5 | file_io, http_fetch, shell_exec | Seeks novelty and external data |
| Guardian | 0.4 | memory_recall, similarity_search | Seeks stability and consolidation |
Scoring formula: For each tool candidate, the archetype bonus is:
archetype_bonus = (archetype_weight - 0.5) * 0.15
Examples with default weights:
- `kg_query` (Sage, weight 0.7): bonus = +0.030
- `gap_analysis` (Healer, weight 0.5): bonus = 0.000
- `memory_recall` (Guardian, weight 0.4): bonus = -0.015
The bonus is added to the tool's total score alongside base_score,
recency_penalty, novelty_bonus, episodic_bonus, and pressure_bonus.
The effect is subtle per-cycle but cumulative over time.
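A minimal sketch of the bonus arithmetic, using the documented formula and the default weights. The tool-to-archetype pairing in `main` is illustrative; the full scorer also adds the base, recency, novelty, episodic, and pressure terms.

```rust
// Sketch of the documented archetype bonus: (weight - 0.5) * 0.15.
fn archetype_bonus(archetype_weight: f32) -> f32 {
    (archetype_weight - 0.5) * 0.15
}

fn main() {
    let candidates = [("kg_query", 0.7), ("gap_analysis", 0.5), ("memory_recall", 0.4)];
    for (tool, weight) in candidates {
        // base_score, recency_penalty, etc. would be added here in the real scorer.
        println!("{tool}: archetype bonus {:+.3}", archetype_bonus(weight));
    }
}
```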
Self-Integration (Individuation)
The Self is Jung's integrative center. In the agent, it tracks psychological growth through experience.
[self_integration]
individuation_level = 0.1 # [0.0, 1.0] - how integrated the psyche is
last_evolution_cycle = 0
shadow_encounters = 0 # times shadow patterns were triggered
rebalance_count = 0 # times archetypes were rebalanced
dominant_archetype = "sage" # derived from highest archetype weight
Individuation growth formula (applied during reflection):
growth = 0.01 * min(shadow_encounters, 5)
individuation_level = min(individuation_level + growth, 1.0)
Encountering and acknowledging the shadow (having actions vetoed, having goals abandoned) drives psychological growth -- mirroring Jung's concept that integrating the shadow is the path to individuation.
Psyche Evolution
Evolution happens automatically during the agent's periodic reflection cycle.
When reflect() is called, it passes the psyche to Psyche::evolve(), which:
- Rebalances archetypes based on tool effectiveness:
- Consistent success (>70% rate, 2+ uses): archetype weight +0.02
- Flagged ineffective (<30% rate, 2+ uses): archetype weight -0.02
- Weights clamped to [0.1, 0.95]
- Acknowledges shadow encounters: the encounter counter is incremented on goal abandonment.
- Grows individuation: `individuation_level += 0.01 * min(shadow_encounters, 5)`
- Updates dominant archetype: recomputed from current weights.
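A compact sketch of these adjustment rules, with `ToolStats` as a stand-in for whatever statistics the reflector actually tracks; the thresholds, clamp bounds, and growth formula follow the values documented above.

```rust
// Stand-in for per-tool effectiveness statistics gathered between reflections.
struct ToolStats {
    uses: u32,
    successes: u32,
}

fn rebalance(weight: f32, stats: &ToolStats) -> f32 {
    let mut w = weight;
    if stats.uses >= 2 {
        let rate = stats.successes as f32 / stats.uses as f32;
        if rate > 0.7 {
            w += 0.02; // consistently effective
        } else if rate < 0.3 {
            w -= 0.02; // flagged ineffective
        }
    }
    w.clamp(0.1, 0.95)
}

fn grow_individuation(level: f32, shadow_encounters: u32) -> f32 {
    let growth = 0.01 * shadow_encounters.min(5) as f32;
    (level + growth).min(1.0)
}

fn main() {
    let sage = rebalance(0.7, &ToolStats { uses: 4, successes: 4 });
    println!("sage weight after reflection: {sage:.2}");            // 0.72
    println!("individuation: {:.2}", grow_individuation(0.10, 3));  // 0.13
}
```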
OODA Loop Integration
| Phase | Psyche Influence |
|---|---|
| Observe | None |
| Orient | None |
| Decide | Archetype weights added as scoring bonus to tool candidates |
| Act | Shadow veto/bias check before tool execution |
| Reflect | Psyche::evolve() adjusts weights and grows individuation |
The score breakdown is visible in the decision reasoning:
[score=0.85: base=0.80 recency=-0.00 novelty=+0.15 episodic=+0.00 pressure=+0.00 archetype=+0.030]
Persistence
The psyche is persisted in two ways:
- Session persistence: `Agent::persist_session()` serializes the psyche to the durable store (redb). On `Agent::resume()`, the persisted psyche is restored, preserving evolved weights and individuation across sessions.
- Compartment file: `data/compartments/psyche/psyche.toml` is the initial default, read when the psyche compartment is first loaded.
Priority order on resume: persisted session > compartment manager > default.
Configuration
Changing the persona
[persona]
name = "Explorer"
grammar_preference = "terse"
traits = ["adventurous", "bold", "direct"]
tone = ["energetic", "concise"]
Adding shadow constraints
[[shadow.veto_patterns]]
name = "no_external_network"
triggers = ["http_fetch", "curl", "wget"]
severity = 1.0
explanation = "This agent is configured for offline-only operation."
[[shadow.bias_patterns]]
name = "prefer_internal_reasoning"
triggers = ["shell_exec", "file_io"]
severity = 0.2
explanation = "Prefer KG-based reasoning over external tool calls."
Tuning archetype weights
# More exploratory
[archetypes]
healer = 0.3
sage = 0.4
guardian = 0.3
explorer = 0.9
# More cautious
[archetypes]
healer = 0.6
sage = 0.5
guardian = 0.9
explorer = 0.2
Programmatic manipulation
```rust
use akh_medu::compartment::Psyche;

let mut agent = Agent::new(engine, config)?;

let mut psyche = Psyche::default();
psyche.persona.name = "Mentor".into();
psyche.persona.grammar_preference = "formal".into();
psyche.archetypes.healer = 0.8;

agent.set_psyche(psyche);

if let Some(p) = agent.psyche() {
    println!("Dominant archetype: {}", p.dominant_archetype());
    println!("Individuation: {:.2}", p.self_integration.individuation_level);
}
```
Design Rationale
Why Jung over Asimov? Asimov's three laws are rigid boolean constraints that don't adapt. Jung's model provides a spectrum:
- The Shadow has both hard vetoes and soft biases, configurable per deployment.
- Archetypes create behavioral tendencies that evolve through experience.
- Individuation means the agent's personality matures over time.
Why is the effect subtle? The archetype bonus formula
(weight - 0.5) * 0.15 produces a maximum swing of ~0.07 per tool. The
psyche should nudge behavior, not override situation-specific signals. Over
many cycles, the cumulative effect creates a recognizable personality.
Knowledge Compartments
The compartment system isolates knowledge by purpose. Instead of dumping all
triples into a single global graph, triples are tagged with a compartment_id
and stored in named graphs in Oxigraph. This makes knowledge portable,
removable, and scopable -- a skill's knowledge can be cleanly loaded and
unloaded without polluting the rest of the graph.
Compartment Kinds
| Kind | Description | Lifecycle |
|---|---|---|
| Core | Always-active modules (psyche, personality). | Loaded at engine startup, never unloaded during normal operation. |
| Skill | Travels with skill packs. | Loaded/unloaded when skills activate/deactivate. |
| Project | Scoped to a specific project. | Loaded when the user switches project context. |
Lifecycle States
Each compartment passes through three states:
Dormant --load()--> Loaded --activate()--> Active
   ^                   |                      |
   |                   |                      |
   +--- unload() ------+--- deactivate() -----+
- Dormant: Discovered on disk but not loaded. No triples in memory.
- Loaded: Triples are in the KG; queries can be scoped to this compartment.
- Active: Loaded AND actively influencing the OODA loop (e.g., psyche archetypes adjust tool scoring, shadow patterns gate actions).
On-Disk Layout
Compartments live under data/compartments/. Each compartment is a directory
containing at minimum a compartment.toml manifest:
data/compartments/
psyche/
compartment.toml # manifest (required)
psyche.toml # psyche-specific config (optional)
personality/
compartment.toml
my-skill/
compartment.toml
triples.json # knowledge triples to load (optional)
rules.toml # reasoning rules (optional)
Manifest Format
compartment.toml:
id = "psyche" # Unique identifier (required)
name = "Jungian Psyche" # Human-readable name (required)
kind = "Core" # "Core", "Skill", or "Project" (required)
description = "Agent psyche..." # Purpose description
triples_file = "triples.json" # Path to triples JSON (relative)
rules_file = "rules.toml" # Path to rules file
grammar_ref = "narrative" # Built-in grammar or custom TOML path
tags = ["core", "ethics"] # Domain tags for search
Triples JSON Format
When triples_file is specified, it should be a JSON array of triple objects:
[
{
"subject": "Sun",
"predicate": "has_type",
"object": "Star",
"confidence": 0.95
},
{
"subject": "Earth",
"predicate": "orbits",
"object": "Sun",
"confidence": 1.0
}
]
Each triple is loaded with compartment_id set to the compartment's id.
Symbols are resolved or created automatically via the engine.
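A sketch of what that loading step could look like, assuming the serde and serde_json crates; `RawTriple` and `TaggedTriple` are illustrative stand-ins for the engine's own triple types, not its real API.

```rust
// Sketch: deserialize a compartment's triples.json and tag each entry
// with the compartment's id before insertion.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct RawTriple {
    subject: String,
    predicate: String,
    object: String,
    confidence: f32,
}

#[derive(Debug)]
struct TaggedTriple {
    raw: RawTriple,
    compartment_id: Option<String>,
}

fn load_compartment_triples(
    json: &str,
    compartment_id: &str,
) -> serde_json::Result<Vec<TaggedTriple>> {
    let raw: Vec<RawTriple> = serde_json::from_str(json)?;
    Ok(raw
        .into_iter()
        .map(|raw| TaggedTriple {
            raw,
            compartment_id: Some(compartment_id.to_string()),
        })
        .collect())
}

fn main() -> serde_json::Result<()> {
    let json = r#"[{"subject":"Sun","predicate":"has_type","object":"Star","confidence":0.95}]"#;
    for t in load_compartment_triples(json, "astronomy")? {
        println!("{:?} tagged with {:?}", t.raw, t.compartment_id);
    }
    Ok(())
}
```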
Triple Tagging
Every Triple and EdgeData carries an optional compartment_id: Option<String>.
When a compartment loads triples, they are tagged with its ID. This enables:
- Scoped queries: `SparqlStore::query_in_graph(sparql, compartment_id)` injects a `FROM <graph_iri>` clause to restrict results.
- Clean removal: `SparqlStore::remove_graph(compartment_id)` drops all triples belonging to a compartment without touching others.
- Provenance: `DerivationKind::CompartmentLoaded { compartment_id, source_file }` tracks which compartment introduced a triple.
Named graph IRIs follow the pattern: https://akh-medu.dev/compartment/{id}.
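The scoping mechanics can be pictured with a small sketch. `scope_query` below is a naive illustration of the kind of `FROM` injection `SparqlStore::query_in_graph` performs, not its actual implementation; only the IRI pattern is taken from the documentation.

```rust
// Illustrative named-graph scoping: build the compartment IRI and inject a FROM clause.
fn compartment_graph_iri(id: &str) -> String {
    format!("https://akh-medu.dev/compartment/{id}")
}

fn scope_query(sparql: &str, compartment_id: Option<&str>) -> String {
    match compartment_id {
        // Naive injection for illustration: place FROM before the WHERE clause.
        Some(id) => sparql.replacen(
            "WHERE",
            &format!("FROM <{}> WHERE", compartment_graph_iri(id)),
            1,
        ),
        None => sparql.to_string(),
    }
}

fn main() {
    let q = scope_query("SELECT ?s ?p ?o WHERE { ?s ?p ?o }", Some("my-skill"));
    println!("{q}");
    // SELECT ?s ?p ?o FROM <https://akh-medu.dev/compartment/my-skill> WHERE { ?s ?p ?o }
}
```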
Usage
```rust
use akh_medu::compartment::{CompartmentManager, CompartmentKind};

let engine = Engine::new(config)?;

if let Some(cm) = engine.compartments() {
    // List discovered compartments.
    let cores = cm.compartments_by_kind(CompartmentKind::Core);

    // Load a compartment's triples into the KG.
    cm.load("my-skill", &engine)?;

    // Mark it as active (influences OODA loop).
    cm.activate("my-skill")?;

    // Query only this compartment's knowledge.
    if let Some(sparql) = engine.sparql() {
        let results = sparql.query_in_graph(
            "SELECT ?s ?p ?o WHERE { ?s ?p ?o }",
            Some("my-skill"),
        )?;
    }

    // Deactivate but keep triples loaded.
    cm.deactivate("my-skill")?;

    // Fully unload (remove triples from KG).
    cm.unload("my-skill", &engine)?;
}
```
Creating a New Compartment
1. Create a directory: `data/compartments/astronomy/`

2. Add `compartment.toml`:

   ```toml
   id = "astronomy"
   name = "Astronomy Knowledge"
   kind = "Skill"
   description = "Star catalogs and celestial mechanics"
   triples_file = "triples.json"
   tags = ["science", "astronomy"]
   ```

3. Add `triples.json`:

   ```json
   [
     {"subject": "Sun", "predicate": "has_type", "object": "G-type_star", "confidence": 1.0},
     {"subject": "Earth", "predicate": "orbits", "object": "Sun", "confidence": 1.0},
     {"subject": "Mars", "predicate": "orbits", "object": "Sun", "confidence": 1.0}
   ]
   ```

4. The engine auto-discovers it on startup. Or trigger manually:

   ```rust
   if let Some(cm) = engine.compartments() {
       cm.discover()?;
       cm.load("astronomy", &engine)?;
       cm.activate("astronomy")?;
   }
   ```
Error Handling
All compartment operations return CompartmentResult<T>. Errors include:
| Error | Code | When |
|---|---|---|
| `NotFound` | `akh::compartment::not_found` | ID not in registry. Run `discover()` first. |
| `AlreadyLoaded` | `akh::compartment::already_loaded` | Tried to load a non-Dormant compartment. |
| `InvalidManifest` | `akh::compartment::invalid_manifest` | Malformed `compartment.toml` or `triples.json`. |
| `Io` | `akh::compartment::io` | Filesystem error. |
| `KindMismatch` | `akh::compartment::kind_mismatch` | Wrong compartment kind for the operation. |
Provenance
Three provenance variants track compartment and psyche activity:
| Variant | Tag | Description |
|---|---|---|
| `CompartmentLoaded` | 15 | Records when triples are loaded from a compartment. |
| `ShadowVeto` | 16 | Records when a shadow pattern blocks an action. |
| `PsycheEvolution` | 17 | Records when the psyche auto-adjusts during reflection. |
Design Rationale
Without isolation, a skill's triples are indistinguishable from the rest of
the KG once loaded. You can't unload a skill without knowing exactly which
triples it introduced. Compartment tagging solves this -- compartment_id
on every triple makes removal clean and queries scopable.
Workspaces
Workspaces isolate engine instances with separate data, configuration, and agent sessions. Each workspace has its own knowledge graph, symbols, skills, and compartments.
XDG Directory Layout
akh-medu follows XDG Base Directory conventions:
~/.config/akh-medu/ # XDG_CONFIG_HOME
config.toml # global config
workspaces/
default.toml # per-workspace config
project-alpha.toml
~/.local/share/akh-medu/ # XDG_DATA_HOME
workspaces/
default/
kg/ # oxigraph, redb, hnsw data
skills/ # activated skill data
compartments/ # compartment data
scratch/ # agent scratch space
project-alpha/
kg/
...
seeds/ # installed seed packs
~/.local/state/akh-medu/ # XDG_STATE_HOME
sessions/
default.bin # agent session state
project-alpha.bin
Override any path via environment variables:
| Variable | Default | Description |
|---|---|---|
| `XDG_CONFIG_HOME` | `~/.config` | Configuration files |
| `XDG_DATA_HOME` | `~/.local/share` | Persistent data |
| `XDG_STATE_HOME` | `~/.local/state` | Session state |
Workspace Configuration
Each workspace has a TOML config file:
# ~/.config/akh-medu/workspaces/default.toml
name = "default"
dimension = 10000 # hypervector dimension
encoding = "bipolar" # encoding scheme
language = "auto" # default grammar language
max_memory_mb = 1024 # hot-tier memory budget
max_symbols = 1000000 # symbol registry limit
seed_packs = ["identity", "ontology", "common-sense"] # auto-applied on init
shared_partitions = [] # mounted shared partitions
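For programmatic access, the file maps naturally onto a serde struct. The following is a hypothetical mirror of the documented fields (assuming the toml and serde crates); the engine's real config type may differ.

```rust
// Hypothetical struct mirroring the documented workspace TOML fields.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct WorkspaceConfig {
    name: String,
    dimension: usize,
    encoding: String,
    language: String,
    max_memory_mb: u64,
    max_symbols: u64,
    #[serde(default)]
    seed_packs: Vec<String>,
    #[serde(default)]
    shared_partitions: Vec<String>,
}

fn main() -> Result<(), toml::de::Error> {
    let cfg: WorkspaceConfig = toml::from_str(
        r#"
        name = "default"
        dimension = 10000
        encoding = "bipolar"
        language = "auto"
        max_memory_mb = 1024
        max_symbols = 1000000
        seed_packs = ["identity", "ontology", "common-sense"]
        shared_partitions = []
        "#,
    )?;
    println!("{} ({} dims, {} MB hot tier)", cfg.name, cfg.dimension, cfg.max_memory_mb);
    Ok(())
}
```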
Managing Workspaces
CLI
# List all workspaces
akh-medu workspace list
# Create a new workspace
akh-medu workspace create my-project
# Show workspace info
akh-medu workspace info default
# Delete a workspace (removes all data)
akh-medu workspace delete my-project
Using a Specific Workspace
Pass -w or --workspace to any command:
# Initialize a named workspace
akh-medu -w my-project init
# Query in a specific workspace
akh-medu -w my-project query --seeds "Dog" --depth 2
# Run agent in a workspace
akh-medu -w my-project agent run --goals "..." --max-cycles 10
REST API
The server manages multiple workspaces simultaneously:
# List workspaces
curl http://localhost:8200/workspaces
# Create workspace
curl -X POST http://localhost:8200/workspaces/my-project
# Delete workspace
curl -X DELETE http://localhost:8200/workspaces/my-project
# Workspace status
curl http://localhost:8200/workspaces/default/status
Workspace Manager
The WorkspaceManager handles CRUD operations:
```rust
use akh_medu::workspace::WorkspaceManager;

let manager = WorkspaceManager::new(paths);

// Create a workspace
let ws_paths = manager.create(config)?;

// List all
let names = manager.list();

// Get config
let config = manager.info("default")?;

// Resolve paths
let paths = manager.resolve("default")?;

// Delete
manager.delete("my-project")?;
```
Auto-Seeding
When a workspace is initialized, the seed packs listed in its config are applied automatically:
seed_packs = ["identity", "ontology", "common-sense"]
This bootstraps the workspace with foundational knowledge on first creation. See Seed Packs for details.
Shared Partitions
Workspaces can mount shared partitions to access knowledge that lives outside any single workspace:
shared_partitions = ["shared-ontology", "company-knowledge"]
Shared partitions are read-write named graphs stored independently of any workspace, making them accessible from multiple workspaces simultaneously.
Grammars
The grammar framework is a bidirectional system for parsing natural language into structured data (abstract syntax trees) and linearizing structured data back into prose. It operates without any ML models -- all parsing and generation is rule-based.
Architecture
Prose Input --> Lexer --> Parser --> AbsTree --> ConcreteGrammar --> Styled Prose
                  |          |          |               ^                 ^
          SymbolRegistry    VSA     bridge.rs    GrammarRegistry   (formal/terse/
          (exact match)   (fuzzy)                                 narrative/custom)
The system has two layers:
- Abstract layer: Language-neutral `AbsTree` nodes representing entities, relations, triples, and sequences.
- Concrete layer: Grammar-specific linearization rules that turn `AbsTree` nodes into styled prose.
Abstract Syntax Trees
The AbsTree type represents parsed meaning:
| Variant | Description | Example |
|---|---|---|
| `Entity(String)` | A named thing | `Entity("Dog")` |
| `Relation(String)` | A relationship | `Relation("is-a")` |
| `Triple { subj, pred, obj }` | An RDF triple | `Triple(Entity("Dog"), Relation("is-a"), Entity("mammal"))` |
| `List(Vec<AbsTree>)` | Ordered collection | Multiple triples |
| `Sequence(Vec<AbsTree>)` | Narrative sequence | Story-like output |
| `Tag { tree, tag }` | Provenance/role tag | VSA role annotation |
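Shaped as a Rust enum, the variants above might look roughly like this. This is a sketch for orientation only, not the crate's exact definition.

```rust
// Hedged sketch of an enum shaped like the documented AbsTree variants.
#[derive(Debug, Clone)]
enum AbsTree {
    Entity(String),
    Relation(String),
    Triple {
        subj: Box<AbsTree>,
        pred: Box<AbsTree>,
        obj: Box<AbsTree>,
    },
    List(Vec<AbsTree>),
    Sequence(Vec<AbsTree>),
    Tag { tree: Box<AbsTree>, tag: String },
}

fn main() {
    // "Dog is-a mammal" as an abstract tree.
    let tree = AbsTree::Triple {
        subj: Box::new(AbsTree::Entity("Dog".into())),
        pred: Box::new(AbsTree::Relation("is-a".into())),
        obj: Box::new(AbsTree::Entity("mammal".into())),
    };
    println!("{tree:?}");
}
```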
Built-in Grammar Archetypes
Narrative
Flowing, story-like prose for interactive sessions:
The Dog has a relationship of type is-a with mammal.
Furthermore, the mammal possesses the property warm blood.
Formal
Structured, academic-style output:
## Dog
- is-a: mammal (confidence: 0.95)
- has-part: tail (confidence: 0.85)
Terse
Minimal output, facts only:
Dog is-a mammal [0.95]
mammal has warm blood [0.85]
Using the Grammar System
CLI
# Parse prose into abstract syntax
akh-medu grammar parse "Dogs are mammals"
# Parse and ingest into the knowledge graph
akh-medu grammar parse "Dogs are mammals" --ingest
# Linearize a triple back to prose
akh-medu grammar linearize --subject Dog --predicate is-a --object mammal
# Compare a triple against the KG
akh-medu grammar compare --subject Dog --predicate is-a --object mammal
# List available archetypes
akh-medu grammar list
# Load a custom grammar from TOML
akh-medu grammar load --file /path/to/grammar.toml
# Render an entity's KG neighborhood
akh-medu grammar render --entity Dog
In the TUI
Switch grammar archetypes with the /grammar command:
/grammar narrative
/grammar formal
/grammar terse
The active grammar controls how the agent formats its responses.
Custom Grammars
Implement the ConcreteGrammar trait:
```rust
pub trait ConcreteGrammar: Send + Sync {
    fn name(&self) -> &str;

    fn linearize(&self, tree: &AbsTree, ctx: &GrammarContext) -> GrammarResult<String>;

    fn parse(
        &self,
        input: &str,
        expected_cat: Option<&Cat>,
        ctx: &GrammarContext,
    ) -> GrammarResult<AbsTree>;
}
```
Or load a TOML-defined grammar at runtime:
akh-medu grammar load --file my-grammar.toml
Register custom grammars programmatically:
```rust
engine.grammar_registry().register(Box::new(MyGrammar));
engine.grammar_registry().set_default("my-grammar")?;
```
Multilingual Support
The grammar system supports five languages:
| Language | Code | Relational Patterns | Void Words |
|---|---|---|---|
| English | en | 21 | a, an, the |
| Russian | ru | 13 | (none) |
| Arabic | ar | 11 | al |
| French | fr | 16 | le, la, les, un, une, etc. |
| Spanish | es | 14 | el, la, los, las, un, una, etc. |
Language detection is automatic (script analysis + word frequency heuristics) or can be forced:
# Auto-detect
akh-medu grammar parse "Собаки являются млекопитающими"
# Force Russian
akh-medu --language ru grammar parse "Собаки являются млекопитающими"
All languages map to the same 9 canonical predicates: is-a, has-a,
contains, located-in, causes, part-of, composed-of, similar-to,
depends-on.
Lexer and Parser Pipeline
1. Lexer: Tokenizes input, strips void words, matches relational patterns. Unknown tokens are resolved via the symbol registry (exact match) or VSA item memory (fuzzy match, threshold >= 0.60).
2. Parser: Builds `AbsTree` from tokens. Detects intent (query, goal, statement) and categorizes grammatically.
3. Bridge: Converts between `AbsTree` and KG operations. `commit_abs_tree()` inserts parsed triples into the knowledge graph.
Agent Integration
The agent's persona controls the default grammar via grammar_preference:
[persona]
name = "Scholar"
grammar_preference = "narrative"
When Agent::synthesize_findings() runs, it uses the persona's preferred
grammar to linearize results. See Jungian Psyche for
details.
Shared Partitions
Partitions are named SPARQL graphs that can be shared across multiple workspaces. They provide a way to maintain common knowledge bases (ontologies, domain facts, company knowledge) independently of any single workspace.
Concepts
A partition is a named graph in the SPARQL store:
```rust
Partition {
    name: String,             // e.g., "ontology", "common-sense"
    graph_name: String,       // SPARQL IRI for the named graph
    source: PartitionSource,  // Local or Shared
}
```
Partition Sources
| Source | Description | Storage |
|---|---|---|
| `Local { workspace }` | Owned by a single workspace | Inside workspace data directory |
| `Shared { path }` | Independent of any workspace | Standalone directory |
Shared partitions live in a dedicated directory and are mounted by workspaces via configuration.
Creating Shared Partitions
Programmatic
```rust
use akh_medu::partition::PartitionManager;

let pm = PartitionManager::new(partitions_dir);

// Create a new shared partition
let partition = pm.create_shared("company-ontology")?;

// Insert triples into the partition
partition::insert_into_partition(
    &engine,
    triple,
    "company-ontology",
)?;
```
Via Workspace Config
Mount shared partitions in the workspace configuration:
# ~/.config/akh-medu/workspaces/default.toml
shared_partitions = ["company-ontology", "shared-common-sense"]
When the workspace is loaded, mounted partitions become queryable.
Querying Partitions
Query a specific partition's triples:
```rust
let results = partition::query_partition(
    &engine,
    "company-ontology",
    "?s ?p ?o",
)?;
```
This is equivalent to a SPARQL query with a FROM <partition_graph> clause,
restricting results to triples within that partition.
Partition Manager
The PartitionManager handles discovery and lifecycle:
```rust
let pm = PartitionManager::new(partitions_dir);

// Discover existing partitions on disk
let count = pm.discover()?;

// Register a partition
pm.register(partition)?;

// List all partitions
let names = pm.list();

// Get a specific partition
let p = pm.get("ontology")?;

// Remove a partition
pm.remove("old-partition")?;
```
Use Cases
- Shared ontologies: A common set of relations and categories used by all workspaces in an organization.
- Domain knowledge: Medical, legal, or engineering terminology shared across project workspaces.
- Cross-workspace inference: Triples in a shared partition are visible to inference and traversal in any workspace that mounts it.
Relationship to Compartments
Partitions and compartments both use SPARQL named graphs, but serve different purposes:
| Feature | Compartments | Partitions |
|---|---|---|
| Scope | Within a workspace | Across workspaces |
| Lifecycle | Load/activate/deactivate/unload | Mount/unmount |
| Influence on agent | Active compartments affect OODA loop | Passive data source |
| Typical use | Skills, psyche, project data | Shared ontologies, common facts |
akh-medu Server
Multi-workspace server hosting N engine instances with REST and WebSocket APIs.
Building
cargo build --release --features server --bin akh-medu-server
Running
# Default: 0.0.0.0:8200
akh-medu-server
# Custom bind/port via env vars
AKH_SERVER_BIND=127.0.0.1 AKH_SERVER_PORT=9000 akh-medu-server
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `AKH_SERVER_BIND` | `0.0.0.0` | Bind address |
| `AKH_SERVER_PORT` | `8200` | Listen port |
| `RUST_LOG` | `info` | Log level filter |
| `XDG_DATA_HOME` | `~/.local/share` | XDG data directory |
| `XDG_CONFIG_HOME` | `~/.config` | XDG config directory |
| `XDG_STATE_HOME` | `~/.local/state` | XDG state directory |
Directory Layout
~/.config/akh-medu/
config.toml # global config
workspaces/
default.toml # per-workspace config
~/.local/share/akh-medu/
workspaces/
default/
kg/ # oxigraph, redb, hnsw
skills/ # activated skill data
compartments/ # compartment data
scratch/ # agent scratch space
project-alpha/
kg/
...
seeds/ # installed seed packs
~/.local/state/akh-medu/
sessions/
default.bin # agent session state
REST API
Health
curl http://localhost:8200/health
Response:
{
"status": "ok",
"version": "0.1.0",
"workspaces_loaded": 2
}
List Workspaces
curl http://localhost:8200/workspaces
Response:
{
"workspaces": ["default", "project-alpha"]
}
Create Workspace
curl -X POST http://localhost:8200/workspaces/my-project
Response:
{
"name": "my-project",
"created": true
}
Delete Workspace
curl -X DELETE http://localhost:8200/workspaces/my-project
Response:
{
"deleted": "my-project"
}
Workspace Status
curl http://localhost:8200/workspaces/default/status
Response:
{
"name": "default",
"symbols": 142,
"triples": 89
}
Apply Seed Pack
curl -X POST http://localhost:8200/workspaces/default/seed/identity
Response:
{
"pack": "identity",
"triples_applied": 18,
"already_applied": false
}
Preprocess Text
curl -X POST http://localhost:8200/workspaces/default/preprocess \
-H 'Content-Type: application/json' \
-d '{"chunks": [{"id": "1", "text": "The Sun is a star."}]}'
List Equivalences
curl http://localhost:8200/workspaces/default/equivalences
Equivalence Stats
curl http://localhost:8200/workspaces/default/equivalences/stats
WebSocket Protocol
Connect to ws://localhost:8200/ws/{workspace} for a streaming TUI session.
Client Messages
Input (natural language):
{
"type": "input",
"text": "What is the Sun?"
}
Command:
{
"type": "command",
"text": "status"
}
Available commands: status, goals.
Server Messages
The server streams AkhMessage JSON objects back. Each message has a type field:
{"type": "fact", "text": "Sun is-a Star", "confidence": 0.95, "provenance": null}
{"type": "system", "text": "Connected to workspace \"default\"."}
{"type": "tool_result", "tool": "kg_query", "success": true, "output": "Found 3 triples."}
{"type": "goal_progress", "goal": "Explore Sun", "status": "Active", "detail": null}
{"type": "error", "code": "ws", "message": "workspace not found", "help": null}
Systemd Example
[Unit]
Description=akh-medu Knowledge Server
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/akh-medu-server
Environment=AKH_SERVER_BIND=127.0.0.1
Environment=AKH_SERVER_PORT=8200
Environment=RUST_LOG=info,egg=warn
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
akh-medu + Eleutherios Integration Guide
Overview
akh-medu serves as a multilingual pre-processor that sits between Eleutherios's document chunking stage and its LLM extraction pipeline. Where Eleutherios's 7-dimensional extraction is English-strong but degrades for Russian, Arabic, and historical mixed-language sources, akh-medu's GF-based grammar parser runs in sub-millisecond time per chunk and produces language-neutral structured data: entities, relations, and claims with confidence scores.
Documents
|
v
Eleutherios chunking (document -> text chunks)
|
v
akh-medu pre-processor (text -> entities + claims + AbsTrees)
|
v
Eleutherios LLM enrichment (Mistral Nemo 12B -> 7D extraction)
|
v
Neo4j / pgvector
The pre-processor gives Eleutherios cleaner starting data that reduces extraction noise, particularly for multilingual corpora (technical manuals, diplomatic correspondence, academic texts with mixed-language citations, etc.).
Table of Contents
- Quick Start
- Running with Eleutherios Docker
- Step-by-Step Tutorial
- What Meaning We Extract
- Available Languages
- Cross-Language Entity Resolution
- VSA Similarity Algorithms
- Extending with New Languages
- CLI Reference
- HTTP API Reference
- Integration Patterns
- Architecture Notes
- Relational Pattern Reference
Quick Start
Build
# Core binary (CLI pre-processor)
cargo build --release
# HTTP server (optional, for network integration)
cargo build --release --features server
CLI Pipeline (JSONL)
Eleutherios pipes chunked text through stdin and reads structured output from stdout:
# Auto-detect language per chunk
cat chunks.jsonl | ./target/release/akh-medu preprocess --format jsonl > structured.jsonl
# Force a specific language
cat russian_chunks.jsonl | ./target/release/akh-medu preprocess --format jsonl --language ru > structured.jsonl
Input format (one JSON object per line):
{"id": "doc1-p1", "text": "Protein folding is a fundamental process in molecular biology."}
{"id": "doc1-p2", "text": "Misfolded proteins are associated with neurodegenerative diseases."}
The id field is optional but recommended for traceability. The language field
is optional; omit it to auto-detect.
Output format (one JSON object per line):
{
"chunk_id": "doc1-p1",
"source_language": "en",
"detected_language_confidence": 0.80,
"entities": [
{
"name": "protein folding",
"entity_type": "CONCEPT",
"canonical_name": "protein folding",
"confidence": 0.83,
"aliases": [],
"source_language": "en"
}
],
"claims": [
{
"claim_text": "Protein folding is a fundamental process in molecular biology.",
"claim_type": "FACTUAL",
"confidence": 0.83,
"subject": "protein folding",
"predicate": "is-a",
"object": "fundamental process in molecular biology",
"source_language": "en"
}
],
"abs_trees": [...]
}
CLI Pipeline (JSON batch)
For batch processing, use --format json with a JSON array on stdin:
echo '[
{"id": "1", "text": "Gravity is a fundamental force of nature."},
{"id": "2", "text": "Гравитация является фундаментальной силой природы."}
]' | ./target/release/akh-medu preprocess --format json
Returns:
{
"results": [...],
"processing_time_ms": 0
}
HTTP Server
For network-accessible integration (e.g., Eleutherios calling akh-medu over HTTP):
# Start server on port 8200
./target/release/akh-medu-server
Endpoints summary:
| Method | Path | Description |
|---|---|---|
| `GET` | `/health` | Status, version, supported languages |
| `GET` | `/languages` | List languages with pattern counts |
| `POST` | `/preprocess` | Pre-process text chunks |
| `GET` | `/equivalences` | List all learned equivalences |
| `GET` | `/equivalences/stats` | Equivalence counts by source |
| `POST` | `/equivalences/learn` | Run learning strategies |
| `POST` | `/equivalences/import` | Import equivalences from JSON |
POST /preprocess:
curl -X POST http://localhost:8200/preprocess \
-H 'Content-Type: application/json' \
-d '{
"chunks": [
{"id": "1", "text": "The cell membrane contains phospholipids."},
{"id": "2", "text": "Клеточная мембрана содержит фосфолипиды."},
{"id": "3", "text": "La membrane cellulaire contient des phospholipides."}
]
}'
Running with Eleutherios Docker
This section covers running the full stack: Eleutherios Docker services, Ollama for LLM inference, and akh-medu as the pre-processing layer.
Prerequisites
| Component | Minimum Version | Purpose |
|---|---|---|
| Docker | 24+ | Runs Neo4j, PostgreSQL, and the Eleutherios API |
| Docker Compose | v2 | Orchestrates the services |
| Ollama | 0.5+ | Hosts LLM models (Mistral Nemo 12B, nomic-embed-text) |
| Rust toolchain | 1.85+ | Builds akh-medu |
| RAM | 16 GB (32 GB recommended) | Mistral Nemo 12B alone needs ~8.4 GB |
Step 1: Start Ollama on All Interfaces
Ollama must listen on 0.0.0.0 (not just 127.0.0.1) so Docker containers
can reach it via host.docker.internal:
# Start Ollama listening on all interfaces
OLLAMA_HOST=0.0.0.0:11434 ollama serve &
# Pull required models
ollama pull nomic-embed-text # Embeddings (274 MB)
ollama pull mistral-nemo:12b # Extraction LLM (7.1 GB)
# Verify
curl -s http://localhost:11434/api/tags | python3 -c "
import sys, json
for m in json.load(sys.stdin).get('models', []):
print(f' {m[\"name\"]}')
"
Common mistake: If Ollama is started without OLLAMA_HOST=0.0.0.0:11434,
Docker containers will get "connection refused" when calling the LLM. You can
verify the listening address with ss -tlnp | grep 11434 — it must show *:11434,
not 127.0.0.1:11434.
Step 2: Start Eleutherios Services
# Clone Eleutherios Docker
git clone https://github.com/Eleutherios-project/Eleutherios-docker.git
cd Eleutherios-docker
# Create data directories
mkdir -p data/inbox data/processed data/calibration_profiles
# Start services (Neo4j + PostgreSQL + API)
docker compose up -d
# First startup takes several minutes (demo data seeding).
# Watch progress:
docker compose logs -f api
The Eleutherios API (port 8001) runs a demo data import on first start
(SEED_ON_FIRST_RUN=true) with ~144K Cypher statements. This typically takes
5-15 minutes. Wait until the health check passes:
# Poll until healthy
until curl -sf http://localhost:8001/api/health/simple; do
echo "Waiting for Eleutherios API..."
sleep 10
done
echo "API is ready"
Step 3: Build and Start akh-medu
cd /path/to/akh-medu
# Build CLI and HTTP server
cargo build --release
cargo build --release --features server
# Start the pre-processing server
./target/release/akh-medu-server &
# Verify
curl -s http://localhost:8200/health
Step 4: Load Documents
Copy your PDF/EPUB corpus into the Eleutherios inbox:
# Copy files (do NOT symlink — Docker bind mounts can't follow host symlinks)
cp /path/to/your/corpus/*.pdf /path/to/Eleutherios-docker/data/inbox/
# Verify files are visible inside the container
curl -s http://localhost:8001/api/list-inbox-files | python3 -c "
import sys, json
d = json.load(sys.stdin)
print(f'{d[\"total_count\"]} files ({d[\"total_size_mb\"]:.0f} MB)')
for f in d['files'][:5]:
print(f' {f[\"filename\"]} ({f[\"size_mb\"]:.1f} MB)')
"
Important: Do not use symlinks into the Docker inbox directory. Docker bind mounts expose the directory to the container, but symlinks pointing to paths outside the mount will appear as broken links inside the container.
Step 5: Run the Pipeline
# Start the Eleutherios load pipeline
JOB_ID=$(curl -s -X POST http://localhost:8001/api/load-pipeline \
-H "Content-Type: application/json" \
-d '{
"type": "pdfs",
"path": "/app/data/inbox",
"selected_files": ["your-document.pdf"]
}' | python3 -c "import sys,json; print(json.load(sys.stdin)['job_id'])")
echo "Job started: $JOB_ID"
# Monitor progress
watch -n 10 "curl -s http://localhost:8001/api/load-status/$JOB_ID | python3 -c \"
import sys,json
d=json.load(sys.stdin)
print(f'Status: {d.get(\"status\")} | Progress: {d.get(\"progress_percent\",0)}%')
s=d.get('stats',{})
print(f'Entities: {s.get(\"entities\",0)} | Claims: {s.get(\"claims\",0)}')
\""
Step 6: Pre-Process with akh-medu
After Eleutherios completes Step 1 (chunking), retrieve the JSONL output and run it through akh-medu for immediate structural extraction:
# Copy the JSONL chunks from the container
docker cp aegis-api:/tmp/aegis_imports/${JOB_ID}_jsonl/combined_chunks.jsonl /tmp/chunks.jsonl
# Pre-process through akh-medu (CLI)
cat /tmp/chunks.jsonl | ./target/release/akh-medu preprocess --format jsonl > /tmp/structured.jsonl
# Or via HTTP (batched, for production)
python3 -c "
import json, urllib.request
chunks = []
with open('/tmp/chunks.jsonl') as f:
for line in f:
data = json.loads(line)
chunks.append({'id': data['metadata']['doc_id'], 'text': data['text']})
# Process in batches of 20
for i in range(0, len(chunks), 20):
batch = chunks[i:i+20]
payload = json.dumps({'chunks': batch}).encode()
req = urllib.request.Request(
'http://localhost:8200/preprocess',
data=payload,
headers={'Content-Type': 'application/json'}
)
resp = urllib.request.urlopen(req)
result = json.loads(resp.read())
for r in result['results']:
print(json.dumps(r))
" > /tmp/structured.jsonl
Benchmark: Real-World Performance
Tested with "Memories, Dreams, Reflections" by Carl Jung (1.8 MB PDF):
| Metric | akh-medu (grammar) | Eleutherios (Mistral Nemo 12B on CPU) |
|---|---|---|
| Chunks processed | 196 | 196 |
| Processing time | 0.8 seconds | Hours (CPU) / minutes (GPU) |
| Throughput | ~300 chunks/sec | ~0.3 chunks/sec (CPU) |
| Entities extracted | 2,943 | Requires LLM inference |
| Claims extracted | 1,474 | Requires LLM inference |
| GPU required | No | Strongly recommended |
akh-medu provides near-instant structural pre-extraction that complements the deeper but slower LLM-based extraction. High-confidence akh-medu claims can be ingested directly while the LLM pipeline runs.
Memory Requirements
Running the full stack simultaneously:
| Component | Memory Usage |
|---|---|
| Neo4j | ~2-3 GB |
| PostgreSQL + pgvector | ~0.5 GB |
| Eleutherios API | ~1-2 GB |
| Ollama (Mistral Nemo 12B) | ~8.4 GB |
| akh-medu server | ~50 MB |
| Total | ~12-14 GB |
If you hit "model requires more system memory" errors from Ollama, free memory
by dropping filesystem caches (sync && echo 3 | sudo tee /proc/sys/vm/drop_caches)
or by stopping unused services.
Step-by-Step Tutorial
This section walks through a complete end-to-end workflow that you can replicate with your own corpus.
1. Prepare Your Corpus
akh-medu expects text chunks as JSON objects. Each chunk has a text field and
an optional id and language field.
Create a sample corpus file (chunks.jsonl):
cat > /tmp/chunks.jsonl << 'EOF'
{"id": "intro-en-1", "text": "The mitochondria is a membrane-bound organelle found in eukaryotic cells."}
{"id": "intro-en-2", "text": "ATP synthesis depends on the electron transport chain."}
{"id": "intro-ru-1", "text": "Митохондрия является мембранным органоидом эукариотических клеток."}
{"id": "intro-fr-1", "text": "La mitochondrie est un organite présent dans les cellules eucaryotes."}
{"id": "intro-es-1", "text": "La mitocondria es un orgánulo presente en las células eucariotas."}
{"id": "intro-ar-1", "text": "الميتوكوندريا هي عضية موجودة في الخلايا حقيقية النواة."}
EOF
You can also convert existing documents. For PDF/EPUB corpora, use your preferred text extraction tool first:
# Example: extract text from PDFs, then chunk
# (Use your own extraction tool — pdftotext, Apache Tika, etc.)
pdftotext /path/to/your/document.pdf - | \
python3 -c "
import sys, json
text = sys.stdin.read()
# Simple paragraph-level chunking (replace with your own chunking strategy)
paragraphs = [p.strip() for p in text.split('\n\n') if p.strip()]
for i, para in enumerate(paragraphs):
print(json.dumps({'id': f'doc-p{i}', 'text': para}))
" > /tmp/chunks.jsonl
2. Pre-Process via CLI
# Build the binary
cargo build --release
# Run pre-processing with language auto-detection
cat /tmp/chunks.jsonl | ./target/release/akh-medu preprocess --format jsonl > /tmp/structured.jsonl
# Inspect the output
head -1 /tmp/structured.jsonl | python3 -m json.tool
Expected output (formatted for readability):
{
"chunk_id": "intro-en-1",
"source_language": "en",
"detected_language_confidence": 0.80,
"entities": [
{
"name": "mitochondria",
"entity_type": "CONCEPT",
"canonical_name": "mitochondria",
"confidence": 0.90,
"aliases": [],
"source_language": "en"
},
{
"name": "membrane-bound organelle",
"entity_type": "CONCEPT",
"canonical_name": "membrane-bound organelle",
"confidence": 0.90,
"aliases": [],
"source_language": "en"
}
],
"claims": [
{
"claim_text": "The mitochondria is a membrane-bound organelle found in eukaryotic cells.",
"claim_type": "FACTUAL",
"confidence": 0.90,
"subject": "mitochondria",
"predicate": "is-a",
"object": "membrane-bound organelle",
"source_language": "en"
}
],
"abs_trees": [...]
}
Force a specific language when auto-detection isn't reliable (short text, mixed-script content), or when the language is unsupported and English should serve as the structural fallback:
cat /tmp/german_chunks.jsonl | ./target/release/akh-medu preprocess --format jsonl --language en > /tmp/structured.jsonl
3. Pre-Process via HTTP
# Build and start the server
cargo build --release --features server
./target/release/akh-medu-server &
# Wait for startup
sleep 1
# Check health
curl -s http://localhost:8200/health | python3 -m json.tool
# Pre-process chunks
curl -s -X POST http://localhost:8200/preprocess \
-H 'Content-Type: application/json' \
-d '{
"chunks": [
{"id": "en-1", "text": "The cell membrane contains phospholipids."},
{"id": "ru-1", "text": "Клеточная мембрана содержит фосфолипиды."}
]
}' | python3 -m json.tool
# Check supported languages
curl -s http://localhost:8200/languages | python3 -m json.tool
4. Interpret the Output
Each output object maps directly to Eleutherios's data model:
| Output Field | Type | Purpose |
|---|---|---|
| `chunk_id` | string? | Ties back to the input chunk |
| `source_language` | string | BCP 47 code (en, ru, ar, fr, es) |
| `detected_language_confidence` | f32 | 0.0–1.0, how sure the detector is |
| `entities[].name` | string | Surface form of the entity |
| `entities[].entity_type` | string | CONCEPT or PLACE |
| `entities[].canonical_name` | string | Cross-language resolved name |
| `entities[].confidence` | f32 | Extraction confidence |
| `entities[].aliases` | string[] | Known surface variants |
| `entities[].source_language` | string | Language the entity was found in |
| `claims[].claim_text` | string | Original sentence |
| `claims[].claim_type` | string | FACTUAL, CAUSAL, SPATIAL, etc. |
| `claims[].subject` | string | Subject entity label |
| `claims[].predicate` | string | Canonical predicate (is-a, causes, etc.) |
| `claims[].object` | string | Object entity label |
| `claims[].confidence` | f32 | Extraction confidence |
| `abs_trees[]` | AbsTree | Raw abstract syntax trees (for advanced use) |
5. Feed into Eleutherios
Use the structured output to seed Eleutherios's extraction pipeline. The pre-extracted entities and claims give the LLM a head start:
```python
import json

# Read akh-medu output
with open("/tmp/structured.jsonl") as f:
    for line in f:
        result = json.loads(line)

        # Use source_language to route to language-specific models
        lang = result["source_language"]

        # Pre-extracted entities → seed Neo4j / entity resolution
        for entity in result["entities"]:
            # entity["canonical_name"] is already cross-language resolved
            upsert_entity(entity["canonical_name"], entity["entity_type"])

        # Pre-extracted claims → skip LLM for high-confidence relations
        for claim in result["claims"]:
            if claim["confidence"] >= 0.85:
                # High confidence: ingest directly
                insert_triple(claim["subject"], claim["predicate"], claim["object"])
            else:
                # Lower confidence: send to LLM for validation
                queue_for_llm_validation(claim)
```
What Meaning We Extract
Claim Types
The pre-processor classifies every extracted relation into a claim type that Eleutherios can use to route into the appropriate dimension of its extraction pipeline:
| Claim Type | Predicates | Example |
|---|---|---|
| FACTUAL | is-a, has-a, contains, implements, defines | "Gravity is a fundamental force" |
| CAUSAL | causes | "Deforestation causes soil erosion" |
| SPATIAL | located-in | "CERN is located in Geneva" |
| RELATIONAL | similar-to | "Graphene is similar to carbon nanotubes" |
| STRUCTURAL | part-of, composed-of | "The cortex is part of the brain" |
| DEPENDENCY | depends-on | "Photosynthesis depends on sunlight" |
Entity Types
| Type | Inferred When |
|---|---|
| CONCEPT | Default for most entities |
| PLACE | Object of located-in predicate |
Canonical Predicates
All languages map to the same 9 canonical predicates, ensuring Eleutherios receives uniform relation labels regardless of source language:
| Canonical | Meaning |
|---|---|
| `is-a` | Classification / type hierarchy |
| `has-a` | Possession / attribute |
| `contains` | Containment / composition |
| `located-in` | Spatial location |
| `causes` | Causation |
| `part-of` | Meronymy (part-whole) |
| `composed-of` | Material composition |
| `similar-to` | Similarity / analogy |
| `depends-on` | Dependency |
Available Languages
| Language | Code | Detection | Patterns | Void Words | Notes |
|---|---|---|---|---|---|
| English | en | Latin script + word frequency | 21 | a, an, the | Default fallback |
| Russian | ru | Cyrillic script (>0.95 conf) | 13 | (none) | No articles in Russian |
| Arabic | ar | Arabic script (>0.95 conf) | 11 | ال | RTL handled correctly |
| French | fr | Latin + diacritics (é, ç) + markers | 16 | le, la, les, un, une, des, du, de, d', l' | Accent-insensitive matching |
| Spanish | es | Latin + diacritics (ñ) + markers + ¿¡ | 14 | el, la, los, las, un, una, unos, unas, del, de, al | Inverted punctuation stripped |
| Auto | auto | Script analysis + heuristics | (selected per chunk) | (selected per chunk) | Default mode |
Language Detection
Detection uses a two-stage approach that requires no external NLP models:
Stage 1 — Script analysis (highest confidence):
| Script | Unicode Range | Detection | Confidence |
|---|---|---|---|
| Cyrillic | U+0400..U+052F | >50% of alphabetic chars | 0.70 + ratio×0.25 (max 0.95) |
| Arabic | U+0600..U+06FF, U+0750..U+077F, U+08A0..U+08FF | >50% of alphabetic chars | 0.70 + ratio×0.25 (max 0.95) |
| Latin | U+0041..U+024F | Proceeds to Stage 2 | — |
Stage 2 — Latin disambiguation (word frequency + diacritics):
Each Latin-script text is scored against word frequency markers:
| Language | Marker Words | Diacritical Boost |
|---|---|---|
| English | the, is, are, was, with, from, this, that, and, for, not, but, have, will, would, can, could, should, it, they, we, you, he, she (28) | — |
| French | le, la, les, des, est, dans, avec, une, sur, pour, pas, qui, que, sont, ont, fait, plus, mais, aussi, cette, nous, vous, ils, elles (24) | é, è, ê, ë, ç, à, ù, î, ô, œ → +2.0 |
| Spanish | el, los, las, está, tiene, por, para, pero, también, como, más, son, hay, ser, estar, muy, todo, puede, sobre, ese, esa, estos (26) | ñ, á, í, ó, ú, ü → +2.0; ¿ ¡ → +3.0 |
Winner is the language with the highest normalized score. Confidence ranges from 0.60–0.85.
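A simplified sketch of the two-stage decision, with abridged marker lists and only Cyrillic handled in stage 1; the real detector covers Arabic script, diacritical boosts, and full marker tables as documented above.

```rust
// Simplified two-stage language detection: script ratio, then Latin disambiguation.
fn detect_language(text: &str) -> (&'static str, f32) {
    let alphabetic: Vec<char> = text.chars().filter(|c| c.is_alphabetic()).collect();
    if alphabetic.is_empty() {
        return ("en", 0.5);
    }

    // Stage 1: script analysis (Cyrillic only in this sketch).
    let cyrillic = alphabetic
        .iter()
        .filter(|c| matches!(**c, '\u{0400}'..='\u{052F}'))
        .count();
    let ratio = cyrillic as f32 / alphabetic.len() as f32;
    if ratio > 0.5 {
        return ("ru", (0.70 + ratio * 0.25).min(0.95));
    }

    // Stage 2: Latin disambiguation via (abridged) marker words.
    let lower = text.to_lowercase();
    let fr_score = ["le", "la", "les", "est", "dans"]
        .iter()
        .filter(|w| lower.split_whitespace().any(|t| t == **w))
        .count();
    if fr_score >= 2 {
        ("fr", 0.70)
    } else {
        ("en", 0.70)
    }
}

fn main() {
    println!("{:?}", detect_language("Гравитация является силой природы"));
    println!("{:?}", detect_language("The dog is in the garden"));
}
```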
Mixed-Language Corpora
For documents that contain multiple languages (e.g., English text with Russian quotations or Arabic transliterations), use auto-detection per chunk rather than forcing a single language:
# Each chunk is detected independently — this is the default behavior
cat mixed_corpus.jsonl | ./target/release/akh-medu preprocess --format jsonl > structured.jsonl
You can also split mixed-language documents at the sentence level before chunking:
```python
# Pre-split mixed-language paragraphs into per-sentence chunks
import json

paragraph = "The experiment was conducted in Moscow. Результаты были опубликованы в журнале."

# Simple sentence-level splitting
sentences = [s.strip() for s in paragraph.replace('. ', '.\n').split('\n') if s.strip()]
for i, sentence in enumerate(sentences):
    print(json.dumps({"id": f"para1-s{i}", "text": sentence}))
```
akh-medu's internal detect_per_sentence() function handles this automatically when
processing through the grammar module.
Cross-Language Entity Resolution
When the same entity appears in different languages, the resolver unifies them under a canonical English label:
"Moscow" (EN) -> canonical: "Moscow"
"Москва" (RU) -> canonical: "Moscow", aliases: ["Москва"]
"Moscou" (FR) -> canonical: "Moscow", aliases: ["Moscou"]
"موسكو" (AR) -> canonical: "Moscow", aliases: ["موسكو"]
Static Equivalence Table
The compiled-in static table covers ~120 entries across categories:
| Category | Examples |
|---|---|
| Countries | Russia/Россия/Russie, China/Китай/Chine, France/Франция/Francia |
| Cities | Moscow/Москва/Moscou, Beijing/Пекин/Pékin, Paris/Париж |
| Organizations | NATO/НАТО/OTAN, United Nations/ООН/ONU |
| Common terms | mammal/млекопитающее/mammifère, government/правительство/gouvernement |
Dynamic Equivalence Learning
Beyond the static table, akh-medu can discover new equivalences dynamically using three learning strategies. Discovered mappings persist across sessions via the durable store (redb).
4-Tier Resolution Order
When resolving an entity, the resolver checks in this order:
- Runtime aliases — hot in-memory mappings added during the current session
- Learned equivalences — persisted mappings discovered by learning strategies
- Static equivalence table — ~120 hand-curated entries compiled into the binary
- Fallback — return the surface form unchanged
Learned equivalences override the static table, allowing domain-specific corrections.
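A minimal sketch of that lookup order, with plain HashMaps standing in for the runtime alias table, the persisted learned store, and the compiled-in static table; the real resolver also carries aliases and confidence per entry.

```rust
use std::collections::HashMap;

// Sketch of the documented 4-tier lookup order.
fn resolve<'a>(
    surface: &'a str,
    runtime_aliases: &'a HashMap<String, String>,
    learned: &'a HashMap<String, String>,
    static_table: &'a HashMap<String, String>,
) -> &'a str {
    runtime_aliases
        .get(surface)               // tier 1: runtime aliases
        .or_else(|| learned.get(surface))      // tier 2: learned equivalences
        .or_else(|| static_table.get(surface)) // tier 3: static table
        .map(String::as_str)
        .unwrap_or(surface)         // tier 4: fall back to the surface form
}

fn main() {
    let runtime = HashMap::new();
    let mut learned = HashMap::new();
    learned.insert("митохондрия".to_string(), "mitochondria".to_string());
    let mut static_table = HashMap::new();
    static_table.insert("Москва".to_string(), "Moscow".to_string());

    println!("{}", resolve("митохондрия", &runtime, &learned, &static_table)); // mitochondria
    println!("{}", resolve("Москва", &runtime, &learned, &static_table));      // Moscow
    println!("{}", resolve("unknown term", &runtime, &learned, &static_table)); // unchanged
}
```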
Strategy 1: KG Structural Fingerprints
Two entities in different languages that share identical relational patterns are likely the same concept.
How it works:
- For each unresolved entity `e`, collect its "relational fingerprint": the set of `(predicate, resolved_object)` tuples from the knowledge graph
- For each already-resolved entity `c`, collect its fingerprint too
- If `e` and `c` share >= 1 fingerprint tuple, propose `e -> c` with confidence based on the overlap ratio
Example: If the KG has ("собака", is-a, "млекопитающее") and
("Dog", is-a, "mammal"), and "млекопитающее" already resolves to "mammal",
then "собака" structurally maps to "Dog".
Strategy 2: VSA Similarity
Hypervector encodings capture distributional similarity. For Latin-script near-matches (programme/program, organisation/organization) and transliterated names, VSA similarity catches what string matching misses. See VSA Similarity Algorithms for a full explanation.
How it works:
- For each unresolved entity, encode its label as a hypervector
- Search item memory for the 5 nearest neighbors
- For results above the similarity threshold (>= 0.65), check if the matched symbol resolves to a known canonical
- If yes, propose the equivalence with confidence = similarity score
Strategy 3: Parallel Chunk Co-occurrence
When Eleutherios sends parallel translations of the same content, entities at corresponding positions are likely equivalent.
How it works:
- Group chunks by shared `chunk_id` prefix (e.g., `"doc1_en"` and `"doc1_ru"` share prefix `"doc1"`)
- For each group with different languages, align entities by extraction order
- For entities at the same index across languages, propose equivalence
Chunk ID convention: Use {document_id}_{language_code} format:
{"id": "report-ch3_en", "text": "The experiment was conducted in Moscow.", "language": "en"}
{"id": "report-ch3_ru", "text": "Эксперимент был проведён в Москве.", "language": "ru"}
Equivalence Sources
Each learned equivalence records how it was discovered:
| Source | Description |
|---|---|
| `Static` | From the compiled-in equivalence table |
| `KgStructural` | Discovered by matching KG relational fingerprints |
| `VsaSimilarity` | Discovered by hypervector distributional similarity |
| `CoOccurrence` | Discovered from parallel chunk position correlation |
| `Manual` | User-added via CLI or API import |
Managing Equivalences via CLI
# List all learned equivalences
./target/release/akh-medu equivalences list
# Show counts by source (kg-structural, vsa-similarity, co-occurrence, manual)
./target/release/akh-medu equivalences stats
# Run all learning strategies on current engine state
./target/release/akh-medu equivalences learn
# Export to JSON for manual curation
./target/release/akh-medu equivalences export > /tmp/equivalences.json
# Import curated equivalences
./target/release/akh-medu equivalences import < /tmp/equivalences.json
Managing Equivalences via HTTP
# List all learned equivalences
curl -s http://localhost:8200/equivalences | python3 -m json.tool
# Show statistics
curl -s http://localhost:8200/equivalences/stats | python3 -m json.tool
# => {"runtime_aliases": 0, "learned_total": 12, "kg_structural": 3, "vsa_similarity": 4, "co_occurrence": 2, "manual": 3}
# Trigger learning
curl -s -X POST http://localhost:8200/equivalences/learn | python3 -m json.tool
# => {"discovered": 5, "total_learned": 17}
# Import curated equivalences
curl -X POST http://localhost:8200/equivalences/import \
-H 'Content-Type: application/json' \
-d '[
{"canonical": "mitochondria", "surface": "митохондрия", "source_language": "ru", "confidence": 0.95, "source": "Manual"},
{"canonical": "cell membrane", "surface": "клеточная мембрана", "source_language": "ru", "confidence": 0.95, "source": "Manual"}
]'
# => {"imported": 2}
Seeding Domain-Specific Equivalences
For specialized corpora (medical terminology, legal terms, engineering jargon), seed the equivalence table with domain-specific terms before processing:
1. Create an equivalence file (/tmp/domain-terms.json):
[
{"canonical": "mitochondria", "surface": "митохондрия", "source_language": "ru", "confidence": 1.0, "source": "Manual"},
{"canonical": "mitochondria", "surface": "mitochondrie", "source_language": "fr", "confidence": 1.0, "source": "Manual"},
{"canonical": "mitochondria", "surface": "mitocondria", "source_language": "es", "confidence": 1.0, "source": "Manual"},
{"canonical": "photosynthesis", "surface": "фотосинтез", "source_language": "ru", "confidence": 1.0, "source": "Manual"},
{"canonical": "photosynthesis", "surface": "photosynthèse", "source_language": "fr", "confidence": 1.0, "source": "Manual"},
{"canonical": "photosynthesis", "surface": "fotosíntesis", "source_language": "es", "confidence": 1.0, "source": "Manual"},
{"canonical": "enzyme", "surface": "фермент", "source_language": "ru", "confidence": 1.0, "source": "Manual"},
{"canonical": "enzyme", "surface": "إنزيم", "source_language": "ar", "confidence": 1.0, "source": "Manual"}
]
2. Import before processing:
# Via CLI
./target/release/akh-medu equivalences import < /tmp/domain-terms.json
# Via HTTP
curl -X POST http://localhost:8200/equivalences/import \
-H 'Content-Type: application/json' \
-d @/tmp/domain-terms.json
3. Process your corpus — imported terms will be resolved:
echo '{"text": "Митохондрия является органоидом клетки."}' | \
./target/release/akh-medu preprocess --format jsonl
# entities[0].canonical_name will be "mitochondria" instead of "митохондрия"
4. Export/import workflow for iterative curation:
# Process a batch, let learning discover new equivalences
cat /tmp/corpus.jsonl | ./target/release/akh-medu preprocess --format jsonl > /dev/null
./target/release/akh-medu equivalences learn
# Export for review
./target/release/akh-medu equivalences export > /tmp/review.json
# Edit review.json manually (fix mistakes, add missing terms)
# Then re-import
./target/release/akh-medu equivalences import < /tmp/review.json
VSA Similarity Algorithms
akh-medu uses Vector Symbolic Architecture (VSA) — also known as Hyperdimensional Computing — to encode symbols, detect similar entities, and support fuzzy matching across the pipeline.
How Hypervectors Work
A hypervector is a high-dimensional binary vector (default: 10,000 dimensions) where each dimension is a single bit interpreted as +1 or -1 (bipolar encoding).
Key properties:
| Property | Description |
|---|---|
| Dimension | 10,000 bits by default (Dimension::DEFAULT). 1,000 bits for tests (Dimension::TEST). |
| Encoding | Bipolar: each bit is +1 or -1. Stored as packed bytes. |
| Random vectors | Two random hypervectors have ~0.5 similarity (uncorrelated). |
| Deterministic | The same symbol ID always produces the same hypervector. |
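The core similarity measure is easy to sketch over packed words; the real `HyperVec` stores packed bits and uses SIMD-accelerated popcounts where available, but the arithmetic is the same.

```rust
// Hamming-based similarity over packed 64-bit words: 1.0 - hamming / total_bits.
fn similarity(a: &[u64], b: &[u64]) -> f32 {
    assert_eq!(a.len(), b.len());
    let total_bits = (a.len() * 64) as f32;
    let hamming: u32 = a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum();
    1.0 - hamming as f32 / total_bits
}

fn main() {
    let a = vec![0xFFFF_0000_FFFF_0000u64; 4];
    let b = vec![0xFFFF_0000_FFFF_FFFFu64; 4];
    println!("identical: {}", similarity(&a, &a)); // 1.0
    println!("related:   {}", similarity(&a, &b)); // 0.75
}
```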
Encoding Strategies
akh-medu uses five encoding functions, each serving a different purpose:
1. Symbol Encoding (encode_symbol)
Maps a SymbolId to a deterministic hypervector using seeded random generation.
The same ID always produces the same vector:
SymbolId(42) → deterministic random HyperVec (seeded with 42)
SymbolId(43) → different deterministic random HyperVec (seeded with 43)
This is the base encoding — every symbol in the system gets one.
2. Token Encoding (encode_token)
Maps a text string to a hypervector by hashing it to a synthetic SymbolId:
"dog" → hash("dog") → synthetic SymbolId → deterministic HyperVec
"dogs" → hash("dogs") → different SymbolId → different HyperVec
Token encoding is deterministic: the same word always produces the same vector. However, "dog" and "dogs" produce unrelated vectors (no morphological awareness).
3. Label Encoding (encode_label)
For multi-word labels, encodes each word separately and bundles them:
"big red dog" → bundle(encode_token("big"), encode_token("red"), encode_token("dog"))
The resulting vector is similar to each component (similarity > 0.55) but identical to none. This captures set-like semantics: "big red dog" is similar to anything containing "big", "red", or "dog".
Single-word labels fall through to encode_token directly.
4. Role-Filler Encoding (encode_role_filler)
Encodes structured knowledge by binding a role vector with a filler vector:
bind(encode_symbol(color_id), encode_symbol(blue_id))
→ "the color is blue" as a single vector
Binding (XOR) produces a vector that is dissimilar to both inputs but can
be decoded: unbind(bound, color_id) ≈ blue_id.
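The bind/unbind round trip is easy to verify with plain XOR, independent of the engine's API. A short sketch, using stand-in vectors rather than real encode_symbol output:

fn xor(a: &[u64], b: &[u64]) -> Vec<u64> {
    a.iter().zip(b).map(|(x, y)| x ^ y).collect()
}

fn main() {
    // Stand-ins for encode_symbol(color_id) and encode_symbol(blue_id).
    let role: Vec<u64> = (0u64..157).map(|i| i.wrapping_mul(0x9E37_79B9_7F4A_7C15)).collect();
    let filler: Vec<u64> = (0u64..157).map(|i| (i + 1).wrapping_mul(0xC2B2_AE3D_27D4_EB4F)).collect();

    let bound = xor(&role, &filler);    // "the color is blue" as one vector
    let recovered = xor(&bound, &role); // unbind with the role vector

    // Exact here; in practice recovery is approximate once the bound
    // vector has been bundled with others, and item memory cleans up
    // the result by snapping it to the nearest stored vector.
    println!("recovered == filler: {}", recovered == filler); // true
}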
5. Sequence Encoding (encode_sequence)
Captures order information using permutation:
[A, B, C] → bundle(permute(A, 2), permute(B, 1), C)
Each element is shifted by its distance from the end, preserving positional
information. [A, B, C] produces a different vector than [C, B, A].
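Bundling and permutation can be sketched the same way. The example below is again a stand-in, not the engine's implementation: it shows that a majority-vote bundle stays similar to each component, and that permuting elements by position makes [A, B, C] and [C, B, A] produce different vectors.

const DIM: usize = 10_000;

// Same splitmix64-style generator as the earlier sketch, expanded to bools.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

fn encode(seed: u64) -> Vec<bool> {
    let mut state = seed;
    let mut bits = Vec::with_capacity(DIM);
    while bits.len() < DIM {
        let word = splitmix64(&mut state);
        for i in 0..64 {
            if bits.len() == DIM {
                break;
            }
            bits.push((word >> i) & 1 == 1);
        }
    }
    bits
}

// Fraction of matching positions (1.0 - normalized Hamming distance).
fn similarity(a: &[bool], b: &[bool]) -> f64 {
    a.iter().zip(b).filter(|(x, y)| x == y).count() as f64 / a.len() as f64
}

// Majority vote per dimension.
fn bundle(vecs: &[&[bool]]) -> Vec<bool> {
    (0..DIM).map(|i| vecs.iter().filter(|v| v[i]).count() * 2 > vecs.len()).collect()
}

// Cyclic shift by n positions; used to encode sequence position.
fn permute(v: &[bool], n: usize) -> Vec<bool> {
    let n = n % v.len();
    let mut out = v[n..].to_vec();
    out.extend_from_slice(&v[..n]);
    out
}

fn main() {
    let (a, b, c) = (encode(1), encode(2), encode(3));

    // The bundle stays similar to each of its components (~0.75 here).
    let set = bundle(&[&a[..], &b[..], &c[..]]);
    println!("bundle vs A: {:.2}", similarity(&set, &a));

    // [A, B, C] vs [C, B, A]: the shifts encode position, so the two
    // sequence vectors are clearly different (similarity well below 1.0).
    let abc = bundle(&[&permute(&a, 2)[..], &permute(&b, 1)[..], &c[..]]);
    let cba = bundle(&[&permute(&c, 2)[..], &permute(&b, 1)[..], &a[..]]);
    println!("ABC vs CBA: {:.2}", similarity(&abc, &cba));
}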
Similarity Search
Item Memory provides fast approximate nearest-neighbor (ANN) search using the HNSW algorithm with Hamming distance:
Query: encode_token("programme")
Search: item_memory.search(&query_vec, k=5)
Results: [
{ symbol: "program", similarity: 0.72 }, // near-match
{ symbol: "procedure", similarity: 0.53 }, // unrelated
{ symbol: "project", similarity: 0.51 }, // unrelated
]
The search is sub-linear — HNSW provides O(log n) search time even with millions of vectors.
How ANN search works internally:
- The query vector is encoded as packed u32 words for HNSW compatibility
- HNSW navigates its layered graph using Hamming distance (bitwise XOR + popcount)
- Raw Hamming distances are converted to similarity: similarity = 1.0 - (hamming / total_bits)
- Results are sorted by descending similarity
Similarity Thresholds
| Threshold | Used For | Meaning |
|---|---|---|
| ~0.50 | Random baseline | Two unrelated vectors |
| >= 0.60 | Fuzzy token resolution | Lexer resolves unknown tokens to known symbols |
| >= 0.65 | VSA equivalence learning | Dynamic equivalence strategy 2 threshold |
| >= 0.72 | High-confidence match | Near-certain the same entity (spelling variants) |
| 1.00 | Identity | Exact same vector |
How VSA Is Used in the Pipeline
VSA operates at three points in the akh-medu pipeline:
1. Lexer — Fuzzy Token Resolution
When the lexer encounters an unknown word, it encodes it as a hypervector and searches item memory for similar known symbols:
Input: "programm" (misspelling)
Lookup: registry.lookup("programm") → None
Fuzzy: item_memory.search(encode_token("programm"), k=3)
Match: "program" with similarity 0.68 → Resolution::Fuzzy
The threshold is 0.60 (DEFAULT_FUZZY_THRESHOLD). Below this, the token stays
Unresolved.
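The decision rule itself is simple. A hedged sketch of it, using simplified stand-in types rather than the lexer's real Resolution type or search signature:

// Sketch of the fuzzy-resolution rule. `Resolution` and the inputs are
// simplified stand-ins, not the lexer's actual types or signatures.
const DEFAULT_FUZZY_THRESHOLD: f64 = 0.60;

#[derive(Debug)]
enum Resolution {
    Exact(String),
    Fuzzy { symbol: String, similarity: f64 },
    Unresolved(String),
}

// `matches` stands in for the (symbol, similarity) list an item-memory
// search would return for the token's hypervector, best match first.
fn resolve(token: &str, exact: Option<String>, matches: &[(String, f64)]) -> Resolution {
    if let Some(symbol) = exact {
        return Resolution::Exact(symbol);
    }
    match matches.first() {
        Some((symbol, sim)) if *sim >= DEFAULT_FUZZY_THRESHOLD => Resolution::Fuzzy {
            symbol: symbol.clone(),
            similarity: *sim,
        },
        _ => Resolution::Unresolved(token.to_string()),
    }
}

fn main() {
    // "programm" misses the registry but lands near "program" in item memory.
    let r = resolve("programm", None, &[("program".to_string(), 0.68)]);
    println!("{r:?}"); // Fuzzy { symbol: "program", similarity: 0.68 }
}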
2. Dynamic Equivalence Learning — Strategy 2
When learn_equivalences() is called, the VSA strategy encodes unresolved
entity labels and searches for similar resolved entities:
Unresolved: "organisation" (British spelling)
Search: 5 nearest neighbors in item memory
Match: "organization" (American spelling), similarity 0.71
Result: LearnedEquivalence { surface: "organisation", canonical: "organization", confidence: 0.71 }
The threshold is 0.65 (higher than lexer fuzzy matching for cross-lingual safety).
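A sketch of this strategy's core check, with field names taken from the JSON shown in the equivalence examples (the real LearnedEquivalence type may differ):

// Sketch of the VSA strategy's threshold check; stand-in types only.
const VSA_EQUIVALENCE_THRESHOLD: f64 = 0.65;

#[derive(Debug)]
struct LearnedEquivalence {
    surface: String,
    canonical: String,
    confidence: f64,
}

// `neighbors`: resolved entities nearest to the unresolved label's
// hypervector, best match first.
fn learn_one(unresolved: &str, neighbors: &[(String, f64)]) -> Option<LearnedEquivalence> {
    let (canonical, sim) = neighbors.first()?;
    (*sim >= VSA_EQUIVALENCE_THRESHOLD).then(|| LearnedEquivalence {
        surface: unresolved.to_string(),
        canonical: canonical.clone(),
        confidence: *sim,
    })
}

fn main() {
    let eq = learn_one("organisation", &[("organization".to_string(), 0.71)]);
    println!("{eq:?}"); // Some(LearnedEquivalence { surface: "organisation", .. })
}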
3. Knowledge Graph — Similarity Queries
The engine exposes similarity search for ad-hoc queries:
# Find symbols similar to a given symbol
# (Used internally by the agent module and available via Engine API)
engine.search_similar_to(symbol_id, top_k=10)
Extending with New Languages
Adding a new language requires changes in three files and takes ~30 minutes. Here is the complete procedure using German as an example.
Step 1: Add the Language Variant
File: src/grammar/lexer.rs
Add the new variant to the Language enum:
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Default, Serialize, Deserialize)]
pub enum Language {
    English,
    Russian,
    Arabic,
    French,
    Spanish,
    German, // <-- add here
    #[default]
    Auto,
}
Update the three methods on Language:
impl Language {
    pub fn bcp47(&self) -> &'static str {
        match self {
            // ...existing arms...
            Language::German => "de",
        }
    }

    pub fn from_code(code: &str) -> Option<Self> {
        match code.to_lowercase().as_str() {
            // ...existing arms...
            "de" | "german" => Some(Language::German),
            _ => None,
        }
    }
}

impl std::fmt::Display for Language {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            // ...existing arms...
            Language::German => write!(f, "German"),
        }
    }
}
Step 2: Add the Language Lexicon
File: src/grammar/lexer.rs
A lexicon defines five components for each language:
| Component | Purpose | Example (German) |
|---|---|---|
| Void words | Semantically empty articles/determiners to strip | der, die, das, ein, eine |
| Relational patterns | Multi-word phrases that map to canonical predicates | "ist ein" → is-a |
| Question words | Trigger query parsing mode | was, wer, wo, wann |
| Goal verbs | Identify goal-setting input | finden, entdecken, erforschen |
| Commands | Special control patterns | help, status |
Add an arm to Lexicon::for_language() and create the lexicon constructor:
impl Lexicon {
    pub fn for_language(lang: Language) -> Self {
        match lang {
            // ...existing arms...
            Language::German => Self::default_german(),
        }
    }

    /// Build the German lexicon.
    pub fn default_german() -> Self {
        let void_words = vec![
            "der".into(), "die".into(), "das".into(),
            "ein".into(), "eine".into(), "eines".into(),
            "dem".into(), "den".into(), "des".into(),
        ];

        // Map German surface forms to canonical predicates.
        // IMPORTANT: Sort longest patterns first for greedy matching.
        let relational_patterns = vec![
            // 3-word patterns first
            rel("befindet sich in", "located-in", 0.90),
            rel("ist ähnlich wie", "similar-to", 0.85),
            rel("ist zusammengesetzt aus", "composed-of", 0.85),
            rel("ist Teil von", "part-of", 0.90),
            rel("hängt ab von", "depends-on", 0.85),
            // 2-word patterns
            rel("ist ein", "is-a", 0.90),
            rel("ist eine", "is-a", 0.90),
            rel("hat ein", "has-a", 0.85),
            rel("hat eine", "has-a", 0.85),
            // 1-word patterns last
            rel("enthält", "contains", 0.85),
            rel("verursacht", "causes", 0.85),
            rel("ist", "is-a", 0.80),
            rel("hat", "has-a", 0.80),
        ];

        let question_words = vec![
            "was".into(), "wer".into(), "wo".into(), "wann".into(),
            "wie".into(), "warum".into(), "welcher".into(), "welche".into(),
        ];

        let goal_verbs = vec![
            "finden".into(), "entdecken".into(), "erforschen".into(),
            "analysieren".into(), "bestimmen".into(), "identifizieren".into(),
        ];

        // Commands stay English (CLI is English)
        let commands = vec![
            ("help".into(), CommandKind::Help),
            ("?".into(), CommandKind::Help),
            ("status".into(), CommandKind::ShowStatus),
        ];

        Self { void_words, relational_patterns, question_words, goal_verbs, commands }
    }
}
Writing relational patterns — guidelines:
- Patterns are matched greedily, longest first. Always sort multi-word patterns before shorter ones (see the sketch after this list).
- Map every pattern to one of the 9 canonical predicates: is-a, has-a, contains, located-in, causes, part-of, composed-of, similar-to, depends-on.
- Assign confidence scores between 0.80 and 0.90:
  - 0.90 for unambiguous patterns ("ist ein" is always classification)
  - 0.85 for patterns with occasional false positives
  - 0.80 for very short/ambiguous patterns (bare "ist" could be copula or identity)
- Include accent-stripped variants for languages with diacritics (the existing French and Spanish lexicons do this).
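If you prefer not to maintain the ordering by hand, you can sort the pattern list by word count before building the lexicon. The RelationalPattern struct and rel helper below are stand-ins for whatever src/grammar/lexer.rs actually defines:

// Hedged sketch: derive the longest-first ordering instead of relying
// on hand-maintained order. `RelationalPattern` and `rel` are stand-ins.
#[derive(Debug)]
struct RelationalPattern {
    surface: String,
    canonical: &'static str,
    confidence: f64,
}

fn rel(surface: &str, canonical: &'static str, confidence: f64) -> RelationalPattern {
    RelationalPattern { surface: surface.to_string(), canonical, confidence }
}

fn main() {
    let mut patterns = vec![
        rel("ist", "is-a", 0.80),
        rel("ist ein", "is-a", 0.90),
        rel("befindet sich in", "located-in", 0.90),
    ];
    // Greedy matching needs multi-word patterns ahead of their prefixes.
    patterns.sort_by_key(|p| std::cmp::Reverse(p.surface.split_whitespace().count()));
    for p in &patterns {
        println!("{:<20} -> {} ({:.2})", p.surface, p.canonical, p.confidence);
    }
}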
Step 3: Add Detection Support
File: src/grammar/detect.rs
This step makes --language auto work for the new language.
For non-Latin scripts (e.g., Chinese, Japanese, Korean, Hindi), add a
codepoint range check in the detect_language() function:
// In the character-counting loop:
match c {
    // CJK Unified Ideographs
    '\u{4E00}'..='\u{9FFF}' => { cjk += 1; }
    // Devanagari
    '\u{0900}'..='\u{097F}' => { devanagari += 1; }
    // ...
}

// After the loop, add a block like the Cyrillic/Arabic ones:
if cjk_ratio > 0.5 {
    return DetectionResult {
        language: Language::Chinese,
        confidence: (0.70 + cjk_ratio * 0.25).min(0.95),
    };
}
For Latin-script languages (like German), add word frequency markers to
detect_latin_language():
const GERMAN_MARKERS: &[&str] = &[
    "der", "die", "das", "ist", "und", "ein", "eine", "nicht",
    "mit", "auf", "für", "von", "sich", "den", "dem", "auch",
    "werden", "haben", "sind", "wird", "kann", "nach", "über",
];
Then add a german_score accumulator alongside english_score, french_score,
spanish_score, and include German-specific diacritics (ä, ö, ü, ß):
let has_german_diacritics = lower.contains('ä')
    || lower.contains('ö')
    || lower.contains('ü')
    || lower.contains('ß');
if has_german_diacritics {
    german_score += 2.0;
}
Finally, include German in the winner selection:
let max_score = en_norm.max(fr_norm).max(es_norm).max(de_norm);
// ... pick the winner with the highest score
Step 4: Add Equivalences
File: src/grammar/equivalences.rs
Add the new language's surface forms to existing entries:
Equivalence {
    canonical: "Germany",
    aliases: &["Allemagne", "Alemania", "Германия", "ألمانيا", "Deutschland"], // <-- add "Deutschland"
},
And add new entries for language-specific terms:
Equivalence {
    canonical: "psyche",
    aliases: &["Psyche", "психика", "نفس", "psyché", "psique"],
},
You can also import equivalences at runtime instead of modifying the source:
echo '[
{"canonical": "Germany", "surface": "Deutschland", "source_language": "de", "confidence": 1.0, "source": "Manual"}
]' | ./target/release/akh-medu equivalences import
Step 5: Rebuild and Test
# Run tests
cargo test --lib
# Build release binary
cargo build --release
# Test with sample text (explicit language)
echo '{"text":"Der Archetyp ist ein universelles Muster."}' \
| ./target/release/akh-medu preprocess --format jsonl --language de
# Test auto-detection (if Step 3 was implemented)
echo '{"text":"Die Zelle enthält Mitochondrien und andere Organellen."}' \
| ./target/release/akh-medu preprocess --format jsonl
Checklist for New Languages
- Add variant to Language enum in lexer.rs
- Update bcp47(), from_code(), Display on Language
- Add arm to Lexicon::for_language()
- Create Lexicon::default_LANG() with:
  - Void words (articles, determiners)
  - Relational patterns (sorted longest-first, mapping to canonical predicates)
  - Question words
  - Goal verbs
  - Commands (usually keep English)
- (Optional) Add detection markers in detect.rs for Language::Auto support
- (Optional) Add cross-lingual entries in equivalences.rs
- Run cargo test --lib (must pass, zero warnings)
- Test with sample text: echo '{"text":"..."}' | akh-medu preprocess --format jsonl --language XX
CLI Reference
Global Options
| Option | Description | Default |
|---|---|---|
| --data-dir <PATH> | Data directory for persistent storage | None (memory-only) |
| --dimension <N> | Hypervector dimension | 10000 |
| --language <CODE> | Default language for parsing | auto |
Commands
init
Initialize a new akh-medu data directory:
./target/release/akh-medu --data-dir /tmp/akh-data init
preprocess
Pre-process text chunks from stdin:
# JSONL mode (streaming, one object per line)
cat chunks.jsonl | ./target/release/akh-medu preprocess --format jsonl
# JSON mode (batch, array on stdin)
cat chunks.json | ./target/release/akh-medu preprocess --format json
# With explicit language
cat chunks.jsonl | ./target/release/akh-medu preprocess --format jsonl --language ru
| Option | Description | Default |
|---|---|---|
| --format <jsonl|json> | Input/output format | jsonl |
| --language <CODE> | Override language detection | auto |
ingest
Ingest structured data into the knowledge graph:
# JSON triples
./target/release/akh-medu --data-dir /tmp/akh-data ingest --file /path/to/triples.json
# CSV (subject, predicate, object format)
./target/release/akh-medu --data-dir /tmp/akh-data ingest --file /path/to/data.csv --format csv --csv-format spo
# CSV (entity format: column headers are predicates)
./target/release/akh-medu --data-dir /tmp/akh-data ingest --file /path/to/data.csv --format csv --csv-format entity
# Plain text
./target/release/akh-medu --data-dir /tmp/akh-data ingest --file /path/to/text.txt --format text --max-sentences 100
equivalences
Manage cross-lingual entity equivalences:
./target/release/akh-medu equivalences list # Show all learned equivalences
./target/release/akh-medu equivalences stats # Counts by source
./target/release/akh-medu equivalences learn # Run learning strategies
./target/release/akh-medu equivalences export # Export to JSON (stdout)
./target/release/akh-medu equivalences import # Import from JSON (stdin)
grammar
Grammar system commands:
# List available grammar archetypes
./target/release/akh-medu grammar list
# Parse prose to abstract syntax
./target/release/akh-medu grammar parse "Dogs are mammals"
# Parse and ingest into knowledge graph
./target/release/akh-medu --data-dir /tmp/akh-data grammar parse "Dogs are mammals" --ingest
# Linearize a triple (generate prose from structured data)
./target/release/akh-medu grammar linearize --subject Dog --predicate is-a --object mammal
# Compare a triple to knowledge graph
./target/release/akh-medu --data-dir /tmp/akh-data grammar compare --subject Dog --predicate is-a --object mammal
# Load a custom TOML grammar
./target/release/akh-medu grammar load --file /path/to/grammar.toml
# Render an entity's knowledge graph neighborhood
./target/release/akh-medu --data-dir /tmp/akh-data grammar render --entity Dog
HTTP API Reference
Server: Listens on 0.0.0.0:8200
Build/Run:
cargo build --release --features server
./target/release/akh-medu-server
Environment:
- RUST_LOG — Logging level (default: info,egg=warn,hnsw_rs=warn)
GET /health
Health check.
Response:
{
"status": "ok",
"version": "0.1.0",
"supported_languages": ["en", "ru", "ar", "fr", "es", "auto"]
}
GET /languages
List supported languages with pattern counts.
Response:
{
"languages": [
{"code": "en", "name": "English", "pattern_count": 21},
{"code": "ru", "name": "Russian", "pattern_count": 13},
{"code": "ar", "name": "Arabic", "pattern_count": 11},
{"code": "fr", "name": "French", "pattern_count": 16},
{"code": "es", "name": "Spanish", "pattern_count": 14}
]
}
POST /preprocess
Pre-process text chunks.
Request:
{
"chunks": [
{"id": "optional-id", "text": "Text to process.", "language": "en"}
]
}
Fields: id and language are optional. Omit language for auto-detection.
Response:
{
"results": [
{
"chunk_id": "optional-id",
"source_language": "en",
"detected_language_confidence": 0.80,
"entities": [...],
"claims": [...],
"abs_trees": [...]
}
],
"processing_time_ms": 0
}
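Rust consumers can mirror this response shape with serde. The field types below are inferred from the example response rather than from the server's own structs, and the sketch assumes the serde (with derive) and serde_json crates:

use serde::Deserialize;

// Field types inferred from the example above; the server's real
// structs may be richer.
#[derive(Debug, Deserialize)]
struct PreprocessResponse {
    results: Vec<ChunkResult>,
    processing_time_ms: u64,
}

#[derive(Debug, Deserialize)]
struct ChunkResult {
    chunk_id: Option<String>,
    source_language: String,
    detected_language_confidence: f64,
    // Entities, claims, and parse trees are left untyped here; define
    // proper structs once you rely on specific fields.
    entities: Vec<serde_json::Value>,
    claims: Vec<serde_json::Value>,
    abs_trees: Vec<serde_json::Value>,
}

fn main() -> Result<(), serde_json::Error> {
    let body = r#"{
        "results": [{
            "chunk_id": "optional-id",
            "source_language": "en",
            "detected_language_confidence": 0.80,
            "entities": [], "claims": [], "abs_trees": []
        }],
        "processing_time_ms": 0
    }"#;
    let parsed: PreprocessResponse = serde_json::from_str(body)?;
    println!("{} result(s) in {} ms", parsed.results.len(), parsed.processing_time_ms);
    Ok(())
}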
GET /equivalences
List all learned equivalences.
Response:
[
{
"canonical": "Moscow",
"surface": "москва",
"source_language": "ru",
"confidence": 0.95,
"source": "Manual"
}
]
GET /equivalences/stats
Equivalence statistics by source.
Response:
{
"runtime_aliases": 0,
"learned_total": 12,
"kg_structural": 3,
"vsa_similarity": 4,
"co_occurrence": 2,
"manual": 3
}
POST /equivalences/learn
Trigger all learning strategies.
Response:
{
"discovered": 5,
"total_learned": 17
}
POST /equivalences/import
Bulk import equivalences.
Request: Array of LearnedEquivalence objects (see GET /equivalences for format).
Response:
{
"imported": 3
}
Integration Patterns
Python Integration
Three integration approaches, from simplest to most performant:
Subprocess (JSON batch)
Best for batch processing where latency per call isn't critical:
import subprocess
import json
import os
def preprocess_chunks(chunks: list[dict]) -> list[dict]:
"""Send chunks through akh-medu for multilingual pre-processing."""
input_json = json.dumps(chunks)
result = subprocess.run(
["./target/release/akh-medu", "preprocess", "--format", "json"],
input=input_json,
capture_output=True,
text=True,
env={**os.environ, "RUST_LOG": "error"},
)
if result.returncode != 0:
raise RuntimeError(f"akh-medu failed: {result.stderr}")
response = json.loads(result.stdout)
return response["results"]
# Usage
results = preprocess_chunks([
{"id": "1", "text": "The cell membrane contains phospholipids."},
{"id": "2", "text": "Клеточная мембрана содержит фосфолипиды."},
])
Subprocess (JSONL streaming)
Best for large corpora where you want to process chunks as they arrive:
import subprocess
import json
import os
def preprocess_stream(chunks):
"""Stream chunks through akh-medu one at a time."""
proc = subprocess.Popen(
["./target/release/akh-medu", "preprocess", "--format", "jsonl"],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
text=True,
env={**os.environ, "RUST_LOG": "error"},
)
for chunk in chunks:
proc.stdin.write(json.dumps(chunk) + "\n")
proc.stdin.flush()
line = proc.stdout.readline()
if line:
yield json.loads(line)
proc.stdin.close()
proc.wait()
# Usage
for result in preprocess_stream(chunks_iterator):
process_result(result)
HTTP Client
Best for long-running services where the akh-medu server stays up:
import requests
AKH_MEDU_URL = "http://localhost:8200"
def preprocess_http(chunks: list[dict]) -> list[dict]:
"""Call the akh-medu HTTP server."""
resp = requests.post(
f"{AKH_MEDU_URL}/preprocess",
json={"chunks": chunks},
timeout=30,
)
resp.raise_for_status()
return resp.json()["results"]
def learn_equivalences() -> dict:
"""Trigger equivalence learning."""
resp = requests.post(f"{AKH_MEDU_URL}/equivalences/learn", timeout=60)
resp.raise_for_status()
return resp.json()
def import_equivalences(equivs: list[dict]) -> int:
"""Import domain-specific equivalences."""
resp = requests.post(
f"{AKH_MEDU_URL}/equivalences/import",
json=equivs,
timeout=30,
)
resp.raise_for_status()
return resp.json()["imported"]
# Usage
results = preprocess_http([
{"id": "1", "text": "CERN is located in Geneva."},
{"id": "2", "text": "Le CERN se trouve dans Genève."},
])
Eleutherios Mapping
How akh-medu output maps to Eleutherios concepts:
| akh-medu Output | Eleutherios Use |
|---|---|
| source_language | Route to language-specific extraction models |
| entities[].canonical_name | Seed entity for Neo4j lookup/creation |
| entities[].entity_type | Map to Eleutherios entity taxonomy |
| claims[].predicate | Pre-classified relation (skip LLM for simple facts) |
| claims[].claim_type | Route to the appropriate extraction dimension |
| claims[].confidence | Weight against LLM confidence for ensemble scoring |
| abs_trees[] | Raw parse trees for custom post-processing |
Architecture Notes
Performance
- Grammar parsing runs in < 1ms per chunk — 196-chunk batch in 0.8s (~300 chunks/sec)
- Batches of 20 chunks process in ~55ms (measured with HTTP endpoint)
- No external NLP dependencies, no model loading, no GPU required
- The HTTP server handles concurrent requests via tokio with a RwLock<Engine>
- Memory footprint: ~50MB for the engine with default 10,000-dimension hypervectors
- HNSW ANN search: O(log n) for similarity queries
What the Pre-Processor Does NOT Do
- Coreference resolution: "He studied in Zurich" — "He" is not resolved
- Complex clause parsing: Subordinate clauses, relative clauses, passives
- Morphological analysis: No lemmatization (Russian "психики" stays inflected)
- Named Entity Recognition: Entity types are inferred from predicate context only
These are deliberate scope limits. Eleutherios's LLM pipeline handles them in its enrichment stage. The pre-processor's job is to give it a clean head start with the structural relations that grammar can catch deterministically.
The abs_trees Field
The output includes full AbsTree abstract syntax trees for consumers that want
the raw parse. This is useful for:
- Debugging parse quality
- Custom post-processing beyond the entity/claim extraction
- Feeding back into akh-medu's knowledge graph via Engine::commit_abs_tree()
Persistence
When started with --data-dir, the engine persists:
- Symbol registry (all known symbols and their metadata)
- Knowledge graph triples
- Learned equivalences (via the equiv: prefix in redb)
- Agent session state (working memory, cycle count)
Data is stored in a 3-tier architecture:
- Hot tier: In-memory DashMap for fast concurrent access
- Warm tier: Memory-mapped files for large read-heavy data
- Durable tier: redb (ACID transactions) for data that must survive restarts
Relational Pattern Reference
English (21 patterns)
| Pattern | Canonical | Confidence | Example |
|---|---|---|---|
| "is similar to" | similar-to | 0.85 | Graphene is similar to carbon nanotubes |
| "is located in" | located-in | 0.90 | CERN is located in Geneva |
| "is composed of" | composed-of | 0.85 | Water is composed of hydrogen and oxygen |
| "is part of" | part-of | 0.90 | The cortex is part of the brain |
| "is made of" | composed-of | 0.85 | Steel is made of iron and carbon |
| "depends on" | depends-on | 0.85 | Photosynthesis depends on sunlight |
| "belongs to" | part-of | 0.85 | This enzyme belongs to the kinase family |
| "is a" / "is an" | is-a | 0.90 | DNA is a nucleic acid |
| "are a" / "are an" | is-a | 0.85 | Mitochondria are a type of organelle |
| "has a" / "has an" | has-a | 0.85 | The cell has a nucleus |
| "have a" | has-a | 0.85 | Eukaryotes have a membrane-bound nucleus |
| "are" | is-a | 0.85 | Proteins are macromolecules |
| "has" / "have" | has-a | 0.85 | Enzymes have active sites |
| "contains" | contains | 0.85 | The nucleus contains chromosomes |
| "causes" | causes | 0.85 | Radiation causes DNA damage |
| "implements" | implements | 0.85 | (code domain) |
| "defines" | defines | 0.85 | (code domain) |
Russian (13 patterns)
| Pattern | Canonical | Example |
|---|---|---|
| "является частью" | part-of | Кора является частью головного мозга |
| "находится в" | located-in | Институт находится в Женеве |
| "состоит из" | composed-of | Вода состоит из водорода и кислорода |
| "зависит от" | depends-on | Фотосинтез зависит от солнечного света |
| "похож на" | similar-to | Графен похож на углеродные нанотрубки |
| "содержит в себе" | contains | Ядро содержит в себе хромосомы |
| "является" | is-a | ДНК является нуклеиновой кислотой |
| "имеет" | has-a | Клетка имеет ядро |
| "содержит" | contains | Ядро содержит хромосомы |
| "вызывает" | causes | Радиация вызывает повреждение ДНК |
| "определяет" | defines | (определения) |
| "реализует" | implements | (код) |
| "это" | is-a | Митохондрия это органоид клетки |
Arabic (11 patterns)
| Pattern | Canonical | Example |
|---|---|---|
| "يحتوي على" | contains | النواة يحتوي على الكروموسومات |
| "يقع في" | located-in | المعهد يقع في جنيف |
| "جزء من" | part-of | القشرة جزء من الدماغ |
| "يتكون من" | composed-of | الماء يتكون من الهيدروجين والأكسجين |
| "يعتمد على" | depends-on | التمثيل الضوئي يعتمد على ضوء الشمس |
| "هو" / "هي" | is-a | الحمض النووي هو حمض نووي |
| "لديه" / "لديها" | has-a | الخلية لديها نواة |
| "يسبب" | causes | الإشعاع يسبب تلف الحمض النووي |
| "يشبه" | similar-to | الجرافين يشبه أنابيب الكربون النانوية |
French (16 patterns)
| Pattern | Canonical | Example |
|---|---|---|
| "est similaire à" / "est similaire a" | similar-to | Le graphène est similaire aux nanotubes de carbone |
| "se trouve dans" | located-in | Le CERN se trouve dans Genève |
| "est composé de" / "est compose de" | composed-of | L'eau est composée d'hydrogène et d'oxygène |
| "fait partie de" | part-of | Le cortex fait partie du cerveau |
| "dépend de" / "depend de" | depends-on | La photosynthèse dépend de la lumière du soleil |
| "est un" / "est une" | is-a | L'ADN est un acide nucléique |
| "a un" / "a une" | has-a | La cellule a un noyau |
| "contient" | contains | Le noyau contient des chromosomes |
| "cause" | causes | Le rayonnement cause des dommages à l'ADN |
| "définit" / "definit" | defines | (définitions) |
Spanish (14 patterns)
| Pattern | Canonical | Example |
|---|---|---|
| "es similar a" | similar-to | El grafeno es similar a los nanotubos de carbono |
| "se encuentra en" | located-in | El CERN se encuentra en Ginebra |
| "está compuesto de" / "esta compuesto de" | composed-of | El agua está compuesta de hidrógeno y oxígeno |
| "es parte de" | part-of | El córtex es parte del cerebro |
| "depende de" | depends-on | La fotosíntesis depende de la luz solar |
| "es un" / "es una" | is-a | El ADN es un ácido nucleico |
| "tiene un" / "tiene una" | has-a | La célula tiene un núcleo |
| "contiene" | contains | El núcleo contiene cromosomas |
| "causa" | causes | La radiación causa daño al ADN |
| "tiene" | has-a | Los eucariotas tienen núcleo |
| "define" | defines | (definiciones) |
akh-medu Component Status
| Component | Status | Module | Crates Used | Notes |
|---|---|---|---|---|
| Error types | Implemented | error | miette, thiserror | Rich diagnostics with error codes and help text |
| Symbol system | Implemented | symbol | serde | SymbolId (NonZeroU64), SymbolKind, SymbolMeta, AtomicSymbolAllocator |
| SIMD kernels | Implemented | simd | (std::arch) | VsaKernel trait, Generic fallback, AVX2 acceleration |
| Memory store | Implemented | store::mem | dashmap | Hot-tier concurrent in-memory storage |
| Mmap store | Implemented | store::mmap | memmap2 | Warm-tier memory-mapped file storage with header/index |
| Durable store | Implemented | store::durable | redb | Cold-tier ACID key-value storage |
| Tiered store | Implemented | store::mod | — | Composes hot/warm/cold with auto-promotion |
| HyperVec | Implemented | vsa | — | Configurable-dimension hypervector type |
| VSA operations | Implemented | vsa::ops | — | bind, unbind, bundle, permute, similarity, cosine |
| Symbol encoding | Implemented | vsa::encode | rand | Deterministic symbol→vector, sequence, role-filler |
| Item memory | Implemented | vsa::item_memory | hnsw_rs, anndists, dashmap | ANN search with HNSW, concurrent access |
| Knowledge graph | Implemented | graph::index | petgraph, dashmap | In-memory digraph with dual indexing |
| SPARQL store | Implemented | graph::sparql | oxigraph | Persistent RDF with SPARQL queries |
| Graph traversal | Implemented | graph::traverse | — | BFS with depth/predicate/confidence filtering |
| Reasoning (egg) | Implemented | reason | egg | AkhLang, built-in rewrite rules, equality saturation |
| Engine facade | Implemented | engine | — | Top-level API, owns all subsystems |
| CLI | Implemented | main | clap, miette | init, ingest, query, info subcommands |
| Provenance | Implemented | provenance | redb, bincode | Persistent ledger with 4 redb tables, multi-index (derived/source/kind), batch ops |
| Inference | Implemented | infer | egg | Spreading activation, backward chaining, superposition reasoning, VSA recovery, e-graph verification |
| Pipeline | Implemented | pipeline | egg | Linear stage pipeline (Retrieve → Infer → Reason → Extract), built-in query/ingest pipelines |
| Skills | Implemented | skills | egg, serde_json | MoE-style skillpacks with Cold/Warm/Hot lifecycle, memory budgets, dynamic rule compilation |
| Graph analytics | Implemented | graph::analytics | petgraph | Degree centrality, PageRank, strongly connected components |
| Agent | Implemented | agent | — | OODA loop, 9 tools, planning/reflection, session persistence, REPL mode |
| Autonomous cycle | Implemented | autonomous | — | Symbol grounding, superposition inference, confidence fusion, KG commit |
CLI Reference
Global Options
| Option | Description | Default |
|---|---|---|
| --data-dir <PATH> | Override default XDG workspace path | XDG default |
| -w, --workspace <NAME> | Workspace name | default |
| --dimension <DIM> | Hypervector dimension | 10000 |
| --language <LANG> | Default parsing language | auto |
Commands
init
Initialize a new workspace with XDG directory structure.
akh-medu init
akh-medu -w my-project init
workspace
Manage workspaces.
akh-medu workspace list
akh-medu workspace create <NAME>
akh-medu workspace delete <NAME>
akh-medu workspace info <NAME>
seed
Manage seed packs.
akh-medu seed list # List available packs
akh-medu seed apply <PACK> # Apply a seed pack
akh-medu seed status # Show applied seeds
ingest
Load triples from files.
akh-medu ingest --file <PATH>
akh-medu ingest --file <PATH> --format csv --csv-format spo
akh-medu ingest --file <PATH> --format csv --csv-format entity
akh-medu ingest --file <PATH> --format text --max-sentences 100
| Option | Description | Default |
|---|---|---|
| --file <PATH> | Input file path | Required |
| --format <FMT> | json, csv, text | json |
| --csv-format <FMT> | spo or entity | spo |
| --max-sentences <N> | Max sentences for text format | unlimited |
bootstrap
Load bundled skills, run grounding, and run inference.
akh-medu bootstrap
query
Spreading-activation inference.
akh-medu query --seeds "Dog,Cat" --depth 2 --top-k 10
| Option | Description | Default |
|---|---|---|
| --seeds <LIST> | Comma-separated seed symbols | Required |
| --depth <N> | Expansion depth | 1 |
| --top-k <N> | Max results | 10 |
traverse
BFS graph traversal.
akh-medu traverse --seeds Dog --max-depth 2
akh-medu traverse --seeds Dog --predicates is-a --format json
| Option | Description | Default |
|---|---|---|
| --seeds <LIST> | Starting symbols | Required |
| --max-depth <N> | Traversal depth | 2 |
| --predicates <LIST> | Filter by predicates | All |
| --min-confidence <F> | Minimum edge confidence | 0.0 |
| --format <FMT> | text or json | text |
sparql
Run SPARQL queries.
akh-medu sparql "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
akh-medu sparql --file query.sparql
reason
Simplify expressions via e-graph rewriting.
akh-medu reason --expr "unbind(bind(Dog, is-a), is-a)"
akh-medu reason --expr "..." --verbose
search
Find similar symbols via VSA.
akh-medu search --symbol Dog --top-k 5
analogy
A:B :: C:? analogical reasoning.
akh-medu analogy --a King --b Man --c Queen --top-k 5
filler
Recover role-filler for (subject, predicate) pairs.
akh-medu filler --subject Dog --predicate is-a --top-k 5
info
Show engine statistics.
akh-medu info
symbols
List and inspect symbols.
akh-medu symbols list
akh-medu symbols show Dog
akh-medu symbols show 42
export
Export engine data.
akh-medu export symbols
akh-medu export triples
akh-medu export provenance --symbol Dog
skill
Manage skill packs.
akh-medu skill list
akh-medu skill load <NAME>
akh-medu skill unload <NAME>
akh-medu skill info <NAME>
akh-medu skill scaffold <NAME> # Create a new skill template
pipeline
Run processing pipelines.
akh-medu pipeline query --seeds "Dog"
akh-medu pipeline run --stages retrieve,infer,reason --infer-depth 3
analytics
Graph analytics.
akh-medu analytics degree --top-k 10
akh-medu analytics pagerank --top-k 10
akh-medu analytics components
akh-medu analytics path --from Dog --to Cat
render
Hieroglyphic notation rendering.
akh-medu render --entity Dog
akh-medu render --entity Dog --depth 3
akh-medu render --all
akh-medu render --legend
akh-medu render --no-color
grammar
Bidirectional grammar system.
akh-medu grammar list
akh-medu grammar parse "Dogs are mammals"
akh-medu grammar parse "Dogs are mammals" --ingest
akh-medu grammar linearize --subject Dog --predicate is-a --object mammal
akh-medu grammar compare --subject Dog --predicate is-a --object mammal
akh-medu grammar load --file grammar.toml
akh-medu grammar render --entity Dog
chat
Interactive TUI.
akh-medu chat
akh-medu chat --skill my-skill
akh-medu chat --headless # No TUI, plain text
preprocess
Pre-process text for the Eleutherios pipeline.
cat chunks.jsonl | akh-medu preprocess --format jsonl
cat chunks.json | akh-medu preprocess --format json
cat chunks.jsonl | akh-medu preprocess --format jsonl --language ru
equivalences
Cross-lingual equivalence mappings.
akh-medu equivalences list
akh-medu equivalences stats
akh-medu equivalences learn
akh-medu equivalences export > equivs.json
akh-medu equivalences import < equivs.json
code-ingest
Ingest Rust source code.
akh-medu code-ingest --path src/
akh-medu code-ingest --path src/ --recursive --run-rules --enrich
akh-medu code-ingest --path src/main.rs --max-files 50
enrich
Semantic enrichment on existing code knowledge.
akh-medu enrich
docgen
Generate documentation from code.
akh-medu docgen --target Engine --format markdown --output docs/
akh-medu docgen --target Engine --format json --polish
Agent Commands
All agent commands are subcommands of akh-medu agent.
agent cycle
Run one OODA cycle.
akh-medu agent cycle --goal "Find mammals"
akh-medu agent cycle --goal "..." --priority 200
agent run
Run until completion or max cycles.
akh-medu agent run --goals "Discover planets" --max-cycles 20
akh-medu agent run --goals "..." --fresh # Ignore persisted session
agent repl
Interactive agent REPL.
akh-medu agent repl
akh-medu agent repl --goals "Initial goal"
akh-medu agent repl --headless
REPL commands: p/plan, r/reflect, q/quit.
agent resume
Resume a persisted session.
akh-medu agent resume
akh-medu agent resume --max-cycles 50
agent chat
Agent chat mode.
akh-medu agent chat
akh-medu agent chat --max-cycles 10 --fresh --headless
agent tools
List registered tools.
akh-medu agent tools
agent consolidate
Trigger memory consolidation.
akh-medu agent consolidate
agent recall
Recall episodic memories.
akh-medu agent recall --query "mammals" --top-k 5
agent plan
Generate and display a goal plan.
akh-medu agent plan
agent reflect
Trigger reflection.
akh-medu agent reflect
agent infer
Run forward-chaining rules.
akh-medu agent infer --max-iterations 10 --min-confidence 0.5
agent gaps
Analyze knowledge gaps.
akh-medu agent gaps --goal "Explore biology" --max-gaps 10
agent schema
Discover schema patterns.
akh-medu agent schema