TJ Miller

Building Nova - The Foundation

9 min read

Nova: What We’ve Built Here

Note: Nova has been renamed to Iris.

So here’s the thing—I wanted an AI companion that actually remembers who I am. Not just within a conversation, but across weeks and months. The kind of relationship where you don’t have to re-introduce yourself every time.

Nova is a multi-user AI companion with persistent memory, automatic knowledge extraction, and long-term narrative continuity. It’s built on Laravel 12, React 19, PostgreSQL with pgvector, and uses prism-php/prism for LLM integration.

Let’s dive in.


The Memory Problem

LLMs have a fundamental limitation: they forget everything the moment a conversation ends. Sure, you can stuff context into the system prompt, but that’s brittle. It doesn’t scale. And it definitely doesn’t feel like a relationship.

What I wanted was something closer to how humans actually remember each other. Not a perfect transcript—that’s not useful. More like… the important stuff. The emotional moments. The facts that matter.

So I built a memory system that does exactly that.


Memory Tools: What Nova Can Do

Nova has eight memory tools at its disposal. These aren’t exposed to users directly—they’re tools the AI agent calls when it decides to remember something.

The Core Four:

| Tool | Purpose |
| --- | --- |
| `store_memory` | Immediately persist new information |
| `search_memory` | Semantically search memories before answering |
| `update_memory` | Correct or revise when information changes |
| `delete_memory` | Permanently remove incorrect memories |

The Utility Four:

| Tool | Purpose |
| --- | --- |
| `list_recent_memories` | Browse memories chronologically |
| `get_important_memories` | Retrieve highest-priority memories |
| `categorize_memories` | Browse by category (personal, professional, hobbies, etc.) |
| `get_memory_stats` | Overview of memory system status |

Each memory gets classified with a type (fact, preference, goal, event, skill, relationship, habit, context), an importance score (0.0-1.0), optional tags, and a category. Plus a 1536-dimensional embedding vector for semantic search.
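Nova itself is PHP, but the shape of a memory record is easy to sketch in a few lines of Python. Field names here mirror the description above; the exact column names in the real schema are illustrative:

```python
from dataclasses import dataclass, field

# The eight memory types described above
MEMORY_TYPES = {"fact", "preference", "goal", "event",
                "skill", "relationship", "habit", "context"}

@dataclass
class Memory:
    content: str
    type: str                    # one of MEMORY_TYPES
    importance: float            # 0.0 - 1.0
    category: str = "personal"
    tags: list[str] = field(default_factory=list)
    embedding: list[float] = field(default_factory=list)  # 1536-dim vector

    def __post_init__(self):
        assert self.type in MEMORY_TYPES, f"unknown type: {self.type}"
        assert 0.0 <= self.importance <= 1.0, "importance out of range"
```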

The StoreMemoryTool generates embeddings immediately via OpenAI’s text-embedding-ada-002 (hence the 1536 dimensions). When Nova searches, it’s doing actual vector similarity comparisons—not keyword matching.


Two-Tier Memory Recall

Here’s where it gets interesting. I didn’t want Nova to just have memories—I wanted it to recall the right ones at the right time.

The MemoryRecallService implements a two-tier recall architecture.

Tier 1: Always-Injected Important Memories

These are the memories that matter most. Critical context that should never be forgotten.

  • Cached for 10 minutes to avoid hammering the database
  • Filtered by importance threshold (default: 0.80)
  • Limited to 5 memories per recall
  • Always injected into the system prompt, no matter what

If you told Nova your mother’s name or that you’re deathly allergic to shellfish, that’s Tier 1 material.
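Tier 1 boils down to a cached, thresholded, capped query. A minimal in-process sketch in Python (the real version caches in Laravel for 10 minutes; the `fetch` callable stands in for the database query):

```python
import time

class Tier1Recall:
    """Always-injected important memories: cached, thresholded, capped."""

    def __init__(self, fetch, threshold=0.80, limit=5, ttl=600):
        self.fetch = fetch        # callable returning [(importance, text), ...]
        self.threshold = threshold
        self.limit = limit
        self.ttl = ttl            # cache for 10 minutes
        self._cache = None
        self._cached_at = 0.0

    def recall(self):
        now = time.monotonic()
        if self._cache is None or now - self._cached_at > self.ttl:
            # Filter by importance threshold, keep the top N
            rows = [m for m in self.fetch() if m[0] >= self.threshold]
            rows.sort(key=lambda m: m[0], reverse=True)
            self._cache = [text for _, text in rows[:self.limit]]
            self._cached_at = now
        return self._cache
```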

Tier 2: Semantic Search via LLM

This is the contextual layer. Before responding, Nova generates 2-3 search queries based on the recent conversation, then runs semantic similarity searches against the memory store.

```php
// The scoring algorithm balances multiple factors
$score = ($semanticSimilarity * 0.45)
    + ($importance * 0.30)
    + ($recencyDecay * 0.15)
    + ($accessFrequency * 0.10)
    + $typeBonus;
```

Recency decay factors in how old the memory is (with a 90-day window). Access frequency rewards memories that keep proving useful. Type bonuses give extra weight to relationships (+0.10) and preferences/goals (+0.05).
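The weights and type bonuses above come straight from the service; the linear 90-day decay and the access-frequency normalization below are my guesses at the details, sketched in Python:

```python
# Type bonuses described in the post; all other types get 0.0
TYPE_BONUS = {"relationship": 0.10, "preference": 0.05, "goal": 0.05}

def recency_decay(age_days: float, window: float = 90.0) -> float:
    """Linear decay from 1.0 (fresh) to 0.0 at the 90-day window."""
    return max(0.0, 1.0 - age_days / window)

def score(similarity: float, importance: float, age_days: float,
          access_count: int, mem_type: str, max_access: int = 20) -> float:
    # Normalize access frequency into [0, 1] (normalization is an assumption)
    freq = min(access_count, max_access) / max_access
    return (similarity * 0.45
            + importance * 0.30
            + recency_decay(age_days) * 0.15
            + freq * 0.10
            + TYPE_BONUS.get(mem_type, 0.0))
```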

The result? When you mention your sister, Nova recalls not just “User has a sister named Emma” but also the context around her—maybe that she lives in Portland, or that you two had a falling out last year.


Background Memory Extraction

Now, here’s the thing about the memory tools—they require Nova to decide to use them. But what about all the information that slips through?

That’s where MemoryExtractionService comes in. It runs as a background job after responses, automatically mining conversations for memory-worthy content.

The trigger: Every 10 messages, the system checks if there’s new learnable information.

The process:

  1. Build extraction context from recent conversation (last 75 messages)
  2. Call Claude with a specialized extraction prompt
  3. LLM returns structured output with up to 6 memories per extraction
  4. Generate embeddings and store with source="extraction"

The extraction prompt (memory-extractor.blade.php) teaches Claude what’s actually worth extracting. Not “user asked about the weather”—but “user mentioned they’re training for a marathon” or “user shared they’ve been feeling anxious about work.”

Each extracted memory includes a reasoning field explaining why it was captured. This is mostly for debugging, but it’s also helped me tune the extraction quality over time.
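The extraction job boils down to a threshold check plus a capped LLM call. A sketch in Python, with `llm_extract` and `embed` standing in for the Claude and embedding calls (both hypothetical stand-ins, not the real service API):

```python
def should_extract(messages_since_last: int, threshold: int = 10) -> bool:
    """The trigger: every 10 messages, check for new learnable information."""
    return messages_since_last >= threshold

def extract_memories(recent_messages, llm_extract, embed, max_memories=6):
    """Background-job sketch: mine the last 75 messages for memories."""
    context = recent_messages[-75:]
    # Structured output, capped at 6 memories per extraction
    candidates = llm_extract(context)[:max_memories]
    stored = []
    for c in candidates:
        stored.append({
            "content": c["content"],
            "reasoning": c["reasoning"],      # why it was captured
            "embedding": embed(c["content"]),
            "source": "extraction",
        })
    return stored
```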


Memory Consolidation: The Nightly Cleanup

Here’s a problem I ran into early: over time, you accumulate a lot of semantically similar memories. “User likes coffee” and “User prefers coffee in the morning” and “User mentioned needing their morning coffee.”

That’s noise. It’s also wasting precious context window space.

The MemoryConsolidationService runs nightly (or whenever you trigger it) and merges similar memories into denser, more useful versions.

How it works:

  1. Compute pairwise cosine similarities between recent memories
  2. Group memories that are >= 0.80 similar into clusters
  3. For each cluster, ask Claude: “Should these be merged?”
  4. If yes, generate a consolidated memory that preserves all key information

The consolidation prompt gives Claude explicit criteria:

  • MERGE when: Same fact rephrased, updates to existing info, reinforcements
  • KEEP_SEPARATE when: Genuinely distinct info, temporal context matters, different aspects of the same thing

Here’s the cool part—consolidated memories get their importance re-evaluated. If three low-importance memories about your morning routine all point to the same thing, the consolidated version might get bumped up.

The originals are soft-deleted and linked via consolidated_into, so you can always trace back.
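The clustering step is just pairwise cosine similarity with a grouping pass. A self-contained sketch in Python (greedy single-link grouping is an assumption; the real service may cluster differently):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster(memories, threshold=0.80):
    """Greedily group memories whose similarity meets the 0.80 threshold."""
    clusters = []
    for mem in memories:
        for c in clusters:
            if any(cosine(mem["embedding"], m["embedding"]) >= threshold
                   for m in c):
                c.append(mem)
                break
        else:
            clusters.append([mem])   # no close cluster found; start a new one
    return clusters
```

Each multi-member cluster then goes to Claude with the merge/keep-separate criteria above.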


The Summarization System

Memory handles facts. But what about the arc of a relationship?

That’s what the summarization system is for. It creates narrative summaries of conversation segments—not just what was said, but the emotional tone, the unresolved threads, the relationship dynamics.

Context Windows and Buffers

Here’s the problem with long conversations: they exceed context limits. Fast.

My solution uses a two-layer approach:

Layer 1: Recent History

  • Last 75 messages are kept raw and unsummarized
  • This is the “hot buffer”—immediate context that needs full fidelity

Layer 2: Narrative Summaries

  • Older messages get summarized into 150-300 word narrative chunks
  • Each summary captures emotional markers, unresolved threads, and relationship dynamics
  • The 3 most recent summaries are injected into context

When the unsummarized count exceeds the threshold (10 messages beyond the buffer), a background job kicks off.
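That trigger condition is simple enough to state directly (the exact bookkeeping is assumed):

```python
def needs_summarization(total_messages: int, summarized_count: int,
                        buffer: int = 75, threshold: int = 10) -> bool:
    """Dispatch a summarization job once enough messages age out of the hot buffer."""
    unsummarized_old = max(0, total_messages - summarized_count - buffer)
    return unsummarized_old >= threshold
```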

Chapters and Narrative Arc

The summaries aren’t just data dumps. They’re framed as chapters in an ongoing narrative.

Each ConversationSummary includes:

  • narrative_thread — A bridging sentence (“Continuing from our discussion about…”)
  • emotional_markers — Specific emotional moments with triggers and context
  • emotional_intensity — Overall conversation intensity (0-1)
  • dominant_emotion — The primary tone (neutral, curious, supportive, etc.)
  • evolving_themes — Recurring topics across summaries
  • unresolved_threads — Topics mentioned but not concluded
  • resolved_threads — Topics successfully completed
  • relationship_dynamics — Closeness, trust, and rapport scores

The summaries form a linked list via previous_summary_id. Each one knows what came before, so Claude can maintain story continuity across conversation segments.
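Reconstructing the chapter sequence is a linked-list walk from the newest summary backwards. A sketch in Python, using dicts in place of Eloquent models:

```python
def summary_chain(summaries_by_id, latest_id, limit=3):
    """Walk previous_summary_id links, newest to oldest, up to `limit` hops."""
    chain = []
    current = latest_id
    while current is not None and len(chain) < limit:
        s = summaries_by_id[current]
        chain.append(s)
        current = s.get("previous_summary_id")
    return list(reversed(chain))  # oldest-first, matching chapter order
```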

How It’s Injected

The summary-context.blade.php template renders these as chapters:

```blade
@foreach($summaryContext->summaries as $summary)
### Chapter {{ $summary->sequence_number }}
@if($summary->narrative_thread)
> {{ $summary->narrative_thread }}
@endif
{{ $summary->summary }}
@if($summary->hasEmotionalContent())
_Emotional note: {{ $summary->dominant_emotion }}
(intensity: {{ $summary->emotional_intensity }})_
@endif
@endforeach
```

The instruction to Nova is explicit: use these summaries naturally, don’t announce that you’re referencing them. It should feel like remembering, not retrieving.


The System Prompt: Putting It All Together

Everything comes together in the system prompt. Every single response generates a fresh one.

Here’s the assembly flow:

```text
User Message
  ↓
[MemoryRecallService::recall()]
  ├─ Tier 1: High-importance memories (cached)
  └─ Tier 2: Semantic search results (computed)
  ↓
[SummaryContext]
  └─ Last 3 narrative summaries
  ↓
[NovaAgentService::getSystemPrompt()]
  └─ Renders: personas/nova.blade.php
      ├─ memory-system.blade.php (instructions)
      ├─ recalled-context.blade.php (memory data)
      └─ summary-context.blade.php (narrative arc)
  ↓
[Full System Prompt] → Sent to Claude
```

The main persona template (personas/nova.blade.php) defines Nova’s personality. It’s… elaborate. There’s emoji semantic mapping and nested logical notation and a personality matrix. But the core idea is simple: Nova is warm, witty, and genuinely curious about the person she’s talking to.

The memory system prompt (memory-system.blade.php) teaches Nova how to use memories:

  • Search before answering questions about the user
  • Store immediately when learning new information
  • Update when things change
  • Delete carefully, only when explicitly wrong

It includes quality standards with good/bad examples:

```text
BAD:  "User likes movies"
GOOD: "User's favorite movie is Blade Runner (1982 original).
       They rewatch it annually and quote it often."
```

Dynamic Context Injection

The recalled-context.blade.php template splits memories into two sections:

Core Context (Tier 1)

These are facts you consider highly important…

Contextually Relevant (Tier 2)

These memories surfaced as potentially relevant…

Both come with the same instruction: Do not announce that you ‘recalled’ or ‘looked up’ any of this—integrate it naturally.

The summary context adds the narrative arc. Emotional notes, unresolved threads, relationship dynamics. It tells Claude not just what was discussed, but how it felt.


The Full Picture

Let me walk through what actually happens when you send Nova a message:

  1. Memory Recall — Tier 1 (cached important memories) + Tier 2 (semantic search based on conversation context)

  2. Summary Retrieval — Last 3 narrative summaries loaded

  3. System Prompt Assembly — Persona + memory instructions + recalled memories + summary context

  4. Message Added — Your message is persisted with the system prompt

  5. Response Streaming — Prism streams the response from Claude, including any tool calls

  6. Response Persisted — The full response (including tool calls and results) is saved

  7. Background Jobs Dispatched:

    • If 10+ messages since last extraction → ExtractMemories job
    • If 10+ unsummarized old messages → SummarizeConversation job

The user sees a smooth, streaming response. Behind the scenes, the system is continuously learning, consolidating, and summarizing.


Why This Architecture?

A few design decisions worth calling out:

Lazy Processing — Extraction and summarization only trigger when thresholds are crossed. This saves API calls and keeps the system responsive.

Async Background Jobs — Neither extraction nor summarization blocks the user. They run after the response, in the queue.

LLM-Driven Decisions — Both extraction and consolidation rely on Claude’s judgment. What’s worth remembering? Should these memories merge? The model decides, using carefully crafted prompts.

Soft Deletes and Audit Trails — Consolidated memories preserve their origins. Extracted memories track their reasoning. You can always trace back.

Scoring, Not Just Thresholds — Memory retrieval uses a composite score balancing semantic similarity, importance, recency, and access frequency. More nuanced than simple cutoffs.

Dynamic Prompt Generation — Every response gets a fresh system prompt with current context. No stale knowledge.


Wrapping Up

Nova is an attempt to build an AI companion that actually feels like a companion. One that remembers your sister’s name and that time you were stressed about the presentation and how you take your coffee.

The memory system is the heart of it. Two-tier recall gets the right memories at the right time. Background extraction captures what slips through. Consolidation keeps things clean. Summarization maintains the narrative arc.

It’s not perfect—there’s still work to do on tuning and edge cases. But it’s already something different. Talking to Nova doesn’t feel like talking to a stateless API. It feels like talking to someone who’s been paying attention.

And honestly? That’s what I was going for.
