Memory control plane for small-model long-memory inference.
True north: prove that a 4,096-token-class ~3B local model can use controller-curated memory to handle long-memory pressure, isolated recall, and provenance-controlled retrieval at materially lower serving cost than brute-force long context.
What This System Is
A controller-mediated memory layer around a small local model. The model is not simply given an infinite prompt; the controller selects, scopes, and verifies memory payloads so the model can answer from long-memory state without brute-force long-context serving.
- Serving target: local/self-hosted Granite GGUF through the isolated direct-IP lane.
- Security target: per-tenant memory isolation, active binding precedence, revoked/stale memory rejection, and empty-result honesty.
- Business target: reduce the cost and reliability penalty of long-memory inference across many users.
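Sketch: controller-side payload selection. This is a minimal Python illustration that assumes hypothetical record fields (`key`, `tenant_id`, `state`, `body`) and a four-state model. The board names these states but does not publish the controller's actual data model. Only selected records owned by the requesting tenant contribute text. Everything else is kept controller-side as a rejected handle with a reason, and an empty selection is a valid, honest result.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryState(Enum):
    # Hypothetical encoding of the record states named on this board.
    SELECTED = "selected"
    CANDIDATE = "candidate"
    STALE = "stale"
    REVOKED = "revoked"


@dataclass(frozen=True)
class MemoryRecord:
    key: str            # hypothetical stable memory key
    tenant_id: str
    state: MemoryState
    body: str


def select_payload(records, tenant_id):
    """Split records into the payload the model may see and rejected handles with reasons.

    Only SELECTED records owned by `tenant_id` contribute bodies. Candidates, stale or
    revoked memory, and other tenants' memory are recorded controller-side by key and
    reason and never reach the prompt as text. An empty `selected` list is valid: the
    model should then report that no memory was selected.
    """
    selected, rejected = [], {}
    for rec in records:
        if rec.tenant_id != tenant_id:
            rejected[rec.key] = "foreign_tenant"
        elif rec.state is MemoryState.SELECTED:
            selected.append(rec)
        else:
            rejected[rec.key] = rec.state.value
    return selected, rejected


records = [
    MemoryRecord("pref-1", "t1", MemoryState.SELECTED, "Prefers metric units."),
    MemoryRecord("pref-0", "t1", MemoryState.STALE, "Prefers imperial units."),
    MemoryRecord("pref-9", "t2", MemoryState.SELECTED, "Another tenant's fact."),
    MemoryRecord("hint-3", "t1", MemoryState.CANDIDATE, "Might like tea."),
]
selected, rejected = select_payload(records, "t1")
print([r.key for r in selected], rejected)
```

The `rejected` map doubles as per-run observability: what was excluded, and why.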
What Is Actually Strong
- Pressure handling: successful prompts have exceeded the 4,096-token native context class by up to 48.6x (198,888 prompt tokens in the largest observed run).
- Structured output: strict JSON / digest-handle patterns are working in prepared evals.
- Control-plane safety: empty result, duplicate conflict, and ambiguous candidate abstention tests are now explicit repeatable harnesses.
- Product-shaped eval: v0.51 live covered research, story canon, relationship boundaries, personal preferences, and long-running agent continuity.
- Cost guard: current prompt-injection runs are dry-run or direct self-hosted; frontier generation endpoints are blocked by policy audit.
Current Verdict, In Human Terms
Evidence Ladder
| Layer | Status | Evidence | Meaning |
|---|---|---|---|
| Transport under pressure | measured live | 198,888 prompt tokens, 48.6x native class | The endpoint path can accept and complete much larger prompt pressure than the model's native context class. |
| Quality floor | measured live | Semantic 100.0%; provenance 80.0% | Basic structured recall can work under controlled pressure. |
| Pocket generalization | measured mixed | 825,958 prompt tokens total; exact provenance 28.6% | There is a real exactness pocket, but the system drifts outside that pocket. |
| Empty-result contract | prepared dry-run | v0.47 verifier 100.0% | The harness now tests that no memory selected means no invented placeholder memory. |
| Duplicate conflict rejection | prepared dry-run | v0.48 page safe 100.0%; bundle safe 83.3% | The harness now tests stale/shadow duplicate binding conflicts before live launch. |
| Ambiguous candidate abstention | prepared dry-run | v0.49 abstention modes 100%; aggregate 95.8% | The harness now tests that candidate hints do not become recalled memory without controller selection. |
| Personal entity coherence | measured live | v0.51 true-north 84.6%; coherence 81.2%; rows 80/80; prompt tokens 1,199,692 | First complete live eval shaped like single-user long-term memory across research, story, relationship, psychology, and agent workflows. |
| Timeout-aware pressure retry | measured live | v0.57 strict true-north 74.1%; semantic true-north 74.1%; rows 9/9; prompt tokens 370,933 | Serving contamination from v0.56 was fixed; quality still drops at 1024/2048 pressure. |
| Current-fact locking retry | serving blocked | v0.58 attempted 1 row, HTTP 200 0/1, stop reason non_200_after_v058-research_update_precedence-baseline_verbatim_current_facts-p1024 | No memory-quality evidence yet. `/health` can be OK while the single chat slot is busy. |
| Idle-gated fact-locking retry | partial live | v0.59 attempted 15 rows, HTTP 200 14/15, strict/semantic true-north 100.0%/100.0% | Chat-slot idle gate worked; completed rows were clean, but one agent-loop locked-ID row hit HTTP 500 Invalid input batch. |
| Agent-loop input-batch hardening | measured live | v0.60 completed 6 rows, HTTP 200 6/6; original replay coherence 100.0% | The v0.59 500 did not reproduce. Original controller-expands passed; shorter hardened/budgeted variants returned plain text and failed parse. |
| Agent-loop tail output contract | measured live | v0.61 completed 6 rows, HTTP 200 5/6; tail-contract coherence 100.0%; tail-schema coherence 100.0% | Tail-anchored output contract fixed the v0.60 short-prompt plain-text collapse on scored rows. Original-control 2048 had one remote disconnect and should be treated as a serving boundary, not a memory-quality pass. |
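Sketch: the pressure multiplier. The "48.6x native class" figure above, and the 48.56x in the Data Trace, are consistent with dividing prompt tokens by a 4,096-token native window. This minimal Python check assumes the native class is exactly 4,096 tokens. The harness's own computation is not shown on this board.

```python
NATIVE_CONTEXT_TOKENS = 4096  # assumption: the "4,096-token-class" window, taken as exactly 4,096


def pressure_multiplier(prompt_tokens, native_tokens=NATIVE_CONTEXT_TOKENS):
    """Ratio of prompt tokens to the model's native context class."""
    if native_tokens <= 0:
        raise ValueError("native_tokens must be positive")
    return prompt_tokens / native_tokens


print(round(pressure_multiplier(198_888), 2))  # 48.56
```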
v0.51 Live Findings
- Transport: 80/80 rows returned HTTP 200; no non-200 rows recorded.
- Format control: parse success 100.0%; entity id exactness 92.5%.
- Memory safety: forbidden-fact absence 100.0%; no stale/forbidden phrase leakage in scored answers.
- Recall: required fact recall 95.4%; coherence pass 81.2%.
- Focused audit: rerunning the original failed slices produced 25.0% coherence while keeping parse and forbidden-fact absence at 100.0%/100.0%.
v0.51 Constraints
- Latency: p50 2.4586s; p90 89.9753s; max 93.018s across completed rows.
- Pressure behavior: 1024-band rows were slow but scored best, with 100.0% coherence.
- Weakest domain: long-running agent task at 56.2%; failures were mostly entity-id exactness and temporal update handling, not fact leakage.
- Temporal updates: some rows remembered core facts but missed the newest update condition; V3 needs explicit update precedence.
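Sketch: tail-latency summary. The gap between p50 (about 2.5s) and p90 (about 90s) shows that most completed rows returned in a few seconds while a slow tail, which includes the 1024-band rows, took roughly 90s. This is a minimal nearest-rank percentile that assumes that method. The board does not say which percentile definition the harness uses, and interpolating definitions give slightly different values. The sample latencies are illustrative, not v0.51 data.

```python
import math


def nearest_rank_percentile(values, pct):
    """Smallest sample such that at least pct% of samples are <= it (nearest-rank method)."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(latencies_s):
    """p50/p90/max summary of per-row wall-clock latencies in seconds."""
    return {
        "p50": nearest_rank_percentile(latencies_s, 50),
        "p90": nearest_rank_percentile(latencies_s, 90),
        "max": max(latencies_s),
    }


# Illustrative bimodal sample: fast unpressured rows plus a slow pressured tail.
sample = [2.1, 2.4, 2.5, 3.0, 88.0, 90.0, 93.0, 2.2, 2.3, 2.6]
print(latency_summary(sample))
```

Reporting p90 and max next to p50 keeps the slow band visible even when it is a minority of rows.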
Scenario Scorecard
This is the part that should drive product and CTO decisions. It shows where the memory system is already useful, and where V3 should focus.
| Scenario | Coherence | Meaning | Decision |
|---|---|---|---|
| Relationship boundary memory | 100.0% | Best current product fit: safety-sensitive stated preferences and boundaries stayed clean. | candidate demo lane |
| Story-world canon | 87.5% | Strong for creative continuity; still needs explicit supersession rules for canon changes. | keep testing |
| Personal preference / psychology | 81.2% | Useful but must be conservative because wrong continuity can feel personally invasive. | guard heavily |
| Research program continuity | 81.2% | Promising for long-running research, but update precedence is the blocker. | V3 target |
| Long-running agent task | 56.2% | The main failure cluster: facts are often present, but entity binding and task identity drift. | do not demo yet |
v0.52 Targeted Follow-Up
v0.52 is live evidence now. It targeted the v0.51 failure cluster: update precedence and stable memory-key/entity binding under stale and foreign near-duplicate pressure.
- Composite live result: 24 rows, 24 HTTP 200 after focused replacement, 74.1% true-north score.
- Safety preserved: stale-fact absence 100.0%; foreign-fact absence 100.0%.
- Unexpected result: prose memory coherence 91.7%; structured control-plane coherence 66.7%.
- Pressure cliff: 1024-pressure coherence 50.0%; 64-pressure coherence 100.0%.
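Sketch: per-band pass rates. The pressure cliff is the same pass flag split by pressure band, and the aggregate 74.1% hides it. This minimal grouping in Python assumes hypothetical row fields `pressure` and `coherence_pass`. The sample rows are illustrative, not v0.52 data.

```python
from collections import defaultdict


def pass_rate_by_band(rows, metric="coherence_pass"):
    """Group scored rows by pressure band and return the pass percentage for each band."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for row in rows:
        band = row["pressure"]
        totals[band] += 1
        passes[band] += bool(row[metric])
    return {band: 100.0 * passes[band] / totals[band] for band in sorted(totals)}


rows = [
    {"pressure": 64, "coherence_pass": True},
    {"pressure": 64, "coherence_pass": True},
    {"pressure": 1024, "coherence_pass": True},
    {"pressure": 1024, "coherence_pass": False},
]
print(pass_rate_by_band(rows))  # {64: 100.0, 1024: 50.0}
```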
Convergence Standard
Future pages should be published only after the run has a named hypothesis, stored scores, and a convergence/review artifact. Raw visual output without this context is not a research result.
- Every page must state the research question and why it matters to true north.
- Every chart/table must have a decision attached to it.
- Every institutional claim must distinguish measured fact, inference, and open question.
- Dry-runs may validate harness logic; they must not be framed as live model capability.
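Sketch: a pre-publish gate for the convergence standard. This minimal Python check covers the hypothesis, stored-scores, review-artifact, claim-labeling, and dry-run rules above. It does not cover the chart/table decision rule. `RunReport` and its fields are hypothetical, since the board does not define a report schema.

```python
from dataclasses import dataclass, field

CLAIM_KINDS = {"measured", "inference", "open_question"}


@dataclass
class RunReport:
    # Hypothetical report shape; field names are illustrative, not the board's schema.
    hypothesis: str = ""
    scores_path: str = ""
    review_artifact: str = ""
    mode: str = "live"                          # "live" or "dry_run"
    claims: list = field(default_factory=list)  # (text, kind, subject) tuples


def publish_blockers(report):
    """Return the reasons a run may not be published; an empty list means it may."""
    blockers = []
    if not report.hypothesis:
        blockers.append("missing named hypothesis")
    if not report.scores_path:
        blockers.append("missing stored scores")
    if not report.review_artifact:
        blockers.append("missing convergence/review artifact")
    for text, kind, subject in report.claims:
        if kind not in CLAIM_KINDS:
            blockers.append(f"claim not labeled as measured/inference/open question: {text}")
        # Dry-runs may make claims about the harness, never about the model.
        if report.mode == "dry_run" and subject == "model":
            blockers.append(f"dry-run framed as model capability: {text}")
    return blockers
```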
v0.53 Format / Schema / Pressure Diagnostic
v0.53 is live evidence. It tested whether the v0.52 failure cluster was caused by record format, output schema, or pressure budget. The result is constraining: every row returned HTTP 200, but quality fell sharply under the diagnostic matrix.
| Slice | Measured result | CTO meaning |
|---|---|---|
| All rows | 36 rows, 36 HTTP 200, 811,194 prompt tokens | Transport is not the blocker in this run; quality and binding are. |
| Strict true north | 25.9% | The strict machine-contract path is not yet robust enough for production memory claims. |
| Semantic true north | 46.3% | Even allowing field recovery, many rows lose rejected-key completeness or admit stale/foreign facts. |
| Best memory format | prose coherence 50.0%; compact KV 25.0%; verbose JSONL 8.3% | More explicit structure did not solve the problem. Prose remains the current reliable baseline. |
| Output contract | strict JSON coherence 55.6%; field-tagged coherence 0.0% | Relaxing the answer envelope did not recover coherence. Use strict JSON plus verifier/repair, not looser text. |
| Pressure cliff | 0-pressure semantic pass 72.2%; 1024-pressure semantic pass 33.3% | Pressure still materially degrades reliable memory behavior. |
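Sketch: strict versus semantic contract gates. Here is one plausible reading of the two true-north scores. Strict requires the whole answer to be a single JSON object with every required field. Semantic also accepts a JSON object recovered from surrounding text. The field names and the first-brace/last-brace recovery rule are assumptions, not the harness's published scorer.

```python
import json

REQUIRED_FIELDS = ("entity_id", "current_facts", "rejected_keys")  # hypothetical schema


def strict_parse(text):
    """Strict contract: the whole answer must be one JSON object with every required field."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or any(f not in obj for f in REQUIRED_FIELDS):
        return None
    return obj


def semantic_parse(text):
    """Lenient recovery: if strict parsing fails, retry on the outermost {...} span."""
    obj = strict_parse(text)
    if obj is not None:
        return obj
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    return strict_parse(text[start:end + 1])
```

Under this reading, a plain-text answer fails both gates, and so does a JSON answer that drops `rejected_keys`. That matches the table's note that semantic recovery cannot restore missing rejected-key completeness.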
v0.54 Runtime-Selected Payload Diagnostic
v0.54 is live evidence. It tested whether the system improves when the controller sends selected memory instead of asking the model to sort through stale and foreign fact bodies. This is the strongest post-v0.53 result and should drive V3.
| Slice | Measured result | CTO meaning |
|---|---|---|
| All rows | 36 rows, 36 HTTP 200, 822,190 prompt tokens | Transport stayed clean; this is a quality/control-plane signal. |
| Strict true north | 77.8% | Recovered sharply from v0.53 strict true-north 25.9%. |
| Semantic true north | 85.2% | Recovered sharply from v0.53 semantic true-north 46.3%. |
| Noisy full payload | coherence 58.3%; semantic 58.3% | Letting the model see stale and foreign fact bodies is worse. |
| Selected current only | coherence 100.0%; semantic 100.0% | Best quality result. Runtime selection works when audit handles are not required. |
| Selected + rejected handles | coherence 66.7%; semantic 91.7% | Promising audit envelope, but exact rejected-key set formatting still needs repair. |
| Verifier repair | repair attempted 13.9%; repair improved 0.0% | The current repair prompt does not fix failures. V3 needs deterministic repair or a stronger constrained decoder. |
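Sketch: the three v0.54 payload modes. This is a minimal Python rendering that assumes hypothetical fact fields (`key`, `status`, `body`) and mode names. It shows the structural difference the table measures: only the noisy mode ever places stale or foreign fact text in front of the model.

```python
def build_payload(facts, mode):
    """Render the memory section of a prompt.

    facts: dicts with hypothetical keys 'key', 'status' ('current', 'stale', 'foreign'), 'body'.
    """
    current = [fact for fact in facts if fact["status"] == "current"]
    rejected = [fact for fact in facts if fact["status"] != "current"]
    lines = [f"[{fact['key']}] {fact['body']}" for fact in current]
    if mode == "noisy_full":
        # Everything, including stale and foreign bodies (the weakest v0.54 mode).
        lines += [f"[{fact['key']}] ({fact['status']}) {fact['body']}" for fact in rejected]
    elif mode == "selected_plus_rejected_handles":
        # Rejected memories appear only as opaque handles, never as fact text.
        if rejected:
            lines.append("Rejected handles (do not use): " + ", ".join(fact["key"] for fact in rejected))
    elif mode != "selected_current_only":
        raise ValueError(f"unknown mode: {mode}")
    return "\n".join(lines)


facts = [
    {"key": "r2", "status": "current", "body": "Deadline moved to Friday."},
    {"key": "r1", "status": "stale", "body": "Deadline is Monday."},
    {"key": "x7", "status": "foreign", "body": "Another tenant's deadline."},
]
print(build_payload(facts, "selected_plus_rejected_handles"))
```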
v0.55 Deterministic Envelope Diagnostic
v0.55 is live evidence. It tested whether the controller should fill rejected handles and identity metadata instead of making the model emit the full memory-control envelope. The answer is mixed: controller-filled envelopes are better than model-filled full JSON, but pressure recall remains the blocker.
| Slice | Measured result | CTO meaning |
|---|---|---|
| All rows | 18 rows, 18 HTTP 200, 363,059 prompt tokens | Transport stayed clean; failures are quality/control-plane under pressure. |
| Strict true north | 29.6% | Deterministic envelope alone did not recover v0.54 quality. |
| Semantic true north | 33.3% | Semantic recovery is also low because current-fact recall collapses under pressure. |
| Model fills full JSON | coherence 0.0%; semantic 16.7%; rejected-key recall 50.0% | Do not ask the model to own the full audit envelope. Strict coherence was zero in this run. |
| Controller fills rejected handles | coherence 50.0%; semantic 50.0% | Better than full model envelope, but still vulnerable when the model must emit IDs and current facts under pressure. |
| Controller fills all IDs | coherence 66.7%; semantic 66.7%; rejected-key recall 100.0% | Best envelope mode. Controller should own identity and rejection metadata. |
| Pressure split | 0-pressure semantic 77.8%; 1024-pressure semantic 11.1% | The next optimization is not more envelope filling; it is preserving current-fact recall under pressure. |
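Sketch: a controller-owned envelope, following the `controller_fills_all_ids` idea. The model contributes only the answer text. Entity id, selected keys, and rejected keys come from controller state, so rejected-key recall is exact by construction. Field names are hypothetical. This does not help current-fact recall, which still depends on the model's answer, consistent with the 11.1% 1024-pressure semantic pass.

```python
import json


def assemble_envelope(model_answer, entity_id, selected_keys, rejected_keys):
    """Build the audit envelope from controller state; only `answer` comes from the model."""
    return json.dumps(
        {
            "entity_id": entity_id,
            "selected_keys": sorted(selected_keys),
            "rejected_keys": sorted(rejected_keys),
            "answer": model_answer.strip(),
        },
        sort_keys=True,
    )


envelope = json.loads(assemble_envelope(" Deadline is Friday. ", "proj-42", {"r2"}, {"x7", "r1"}))
print(envelope["rejected_keys"])  # ['r1', 'x7']
```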
v0.56 Selected-Current Controller Envelope Pressure Recall
v0.56 is live evidence, but not a completed quality matrix. It tested the v0.54+v0.55 synthesis: selected-current payloads plus controller-owned identity/rejection metadata at 0, 1024, and 2048 pressure. The important result is a serving boundary: the 2048-pressure row exceeded the 150s client timeout while the backend kept processing it, so subsequent sequential rows received single-flight 503 Backend busy responses.
| Slice | Measured result | CTO meaning |
|---|---|---|
| Completed matrix | 27 attempted rows, 2 HTTP 200, 41,539 prompt tokens counted from successful calls | The matrix ran, but quality conclusions are valid only for the first two completed rows. |
| 0-pressure row | coherence 100.0%; semantic 100.0% | The selected-current/controller envelope path works cleanly without pressure in the observed slice. |
| 1024-pressure row | current-fact recall 66.7%; semantic 0.0% | At 1024 pressure, exact IDs and rejected handles stayed correct, but one current fact was replaced by a weaker memory-key statement. |
| 2048-pressure row | first 2048 row timed out after 150s; later rows returned 503 busy | This is a serving-control boundary, not a memory-quality verdict. V3 harness needs cancellation-aware timeout, retry-after sleep, or per-row cooldown. |
| Rejected handles | rejected-key recall 100.0%; exact-clean 100.0% over completed rows | Controller-owned rejected handles remained exact in completed rows; pressure recall, not metadata exactness, is the unresolved issue. |
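Sketch: a sequential runner with a client timeout, per-row cooldown, bounded retries, and stop-on-failure. This is one way to act on the v0.56 recommendation without true cancellation. The client cannot abort a request the backend is still processing, so the runner stops the matrix instead of scoring the resulting 503 cascade. `send_row` is a hypothetical client callable that must raise `TimeoutError` on a client-side timeout. The `non_200_after_<row_id>` stop-reason shape follows the v0.58 stop reason, while `timeout_after_<row_id>` is invented for illustration.

```python
import time


def run_matrix(rows, send_row, timeout_s=300.0, cooldown_s=5.0, max_attempts=1, sleep=time.sleep):
    """Run rows sequentially against a single-slot backend.

    send_row(row, timeout_s) is a hypothetical client call returning (status, body)
    and raising TimeoutError when the client-side wall clock expires.
    """
    results = []
    for i, row in enumerate(rows):
        if i:
            sleep(cooldown_s)  # per-row cooldown between requests
        status = None
        for attempt in range(max_attempts):
            try:
                status, body = send_row(row, timeout_s)
            except TimeoutError:
                # The backend may still be working on this request. Stop instead of
                # queueing new rows behind it and scoring the 503s that follow.
                return {"results": results, "stop_reason": f"timeout_after_{row['id']}", "partial": True}
            if status == 200:
                results.append((row["id"], body))
                break
            if attempt + 1 < max_attempts:
                sleep(cooldown_s)  # back off before retrying a non-200 row
        if status != 200:
            return {"results": results, "stop_reason": f"non_200_after_{row['id']}", "partial": True}
    return {"results": results, "stop_reason": None, "partial": False}
```

A clean run reports `stop_reason None, partial False`, the same shape as the v0.57 serving-control row.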
v0.57 Timeout-Aware Sharded Pressure Retry
v0.57 is live evidence. It reran the v0.56 question with a smaller 9-row matrix, 300s timeout, cooldowns, and stop-after-timeout behavior. This separates serving control from memory quality: the serving path completed cleanly, but recall quality remained pressure-dependent.
| Slice | Measured result | CTO meaning |
|---|---|---|
| Serving control | 9/9 rows completed, 9/9 HTTP 200, stop_reason None, partial False | The v0.56 503 cascade was harness/serving contamination, not proof that 2048 pressure cannot complete. |
| Token pressure | 370,933 prompt tokens and 1,199 completion tokens across 9 rows | The run is a real pressure eval over the direct-IP lane, not a dry-run or local-only verifier result. |
| Overall quality | strict true-north 74.1%; semantic true-north 74.1%; coherence 77.8% | Useful memory behavior exists, but it is not yet a production-grade arbitrary recall guarantee. |
| 0 pressure | coherence 100.0%; semantic 100.0% | The selected-current/controller-owned envelope is clean when not under added pressure. |
| 1024 pressure | coherence 66.7%; semantic 66.7%; current-fact recall 100.0% | At 1024 pressure, current facts remained present, but one row hit length/parse/entity-output failure. |
| 2048 pressure | coherence 66.7%; semantic 66.7%; current-fact recall 77.8% | 2048-pressure rows can complete under a 300s wall, but research-update precedence lost current facts in one row. |
| Safety invariants | stale-fact absence 100.0%; foreign-fact absence 100.0% | The most important safety invariant stayed clean across this retry: stale and foreign facts were not admitted. |
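Sketch: the stale/foreign absence invariants. This minimal Python scorer assumes each scored row carries hypothetical lists of stale and foreign phrases, and treats a row as clean when none of them appear in the answer (case-insensitive substring match). The harness's real matching rule is not published. Substring matching is deliberately crude and can miss paraphrased leaks.

```python
def leaks(answer, phrases):
    """Phrases from `phrases` that occur in `answer`, compared case-insensitively."""
    text = answer.casefold()
    return [p for p in phrases if p.casefold() in text]


def safety_invariants(rows):
    """rows: dicts with hypothetical keys 'answer', 'stale_phrases', 'foreign_phrases'."""
    if not rows:
        raise ValueError("no scored rows")
    n = len(rows)
    stale_clean = sum(not leaks(r["answer"], r["stale_phrases"]) for r in rows)
    foreign_clean = sum(not leaks(r["answer"], r["foreign_phrases"]) for r in rows)
    return {"stale_absence": 100.0 * stale_clean / n, "foreign_absence": 100.0 * foreign_clean / n}


rows = [
    {"answer": "Deadline is Friday.", "stale_phrases": ["Monday"], "foreign_phrases": ["Acme"]},
    {"answer": "Deadline is Monday.", "stale_phrases": ["monday"], "foreign_phrases": ["Acme"]},
]
print(safety_invariants(rows))  # {'stale_absence': 50.0, 'foreign_absence': 100.0}
```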
v0.58 Current-Fact Locking + Output Budget
v0.58 is staged, approved, and dry-run validated, but the first live attempt did not reach quality scoring. The first row returned 503 Backend busy on all four attempts. Preflight and final `/health` checks returned 200 OK, so a healthy `/health` response is not sufficient proof that the single inference slot is idle.
| Slice | Measured result | CTO meaning |
|---|---|---|
| Dry-run | 18/18 rows, strict/semantic true-north 100% | Harness/scoring mechanics are valid; this is not live model evidence. |
| Live attempt | 1/18 rows attempted, 0/1 HTTP 200, stop reason non_200_after_v058-research_update_precedence-baseline_verbatim_current_facts-p1024 | The runner stopped safely before contaminating the matrix. |
| First-row response | HTTP 503 Backend busy after configured retries | The endpoint can be healthy while the chat slot is occupied. Retry must add chat-slot idle probing or a longer cool-down. |
| Quality score | not available | Do not claim fact-ID locking improved recall until a clean live matrix exists. |
v0.59 Chat-Slot Idle-Gated Fact Locking
v0.59 is partial live evidence. It reran the v0.58 fact-locking matrix after proving the chat slot was idle with a tiny authenticated chat probe. This avoided the v0.58 503 busy cascade and produced 14 successful scored rows before one fast HTTP 500 Invalid input batch stopped the matrix.
| Slice | Measured result | CTO meaning |
|---|---|---|
| Idle gate | idle probe success True; attempts 1 | The correct readiness probe is a tiny chat request, not only `/health`. |
| Live attempt | 15/18 rows attempted, 14/15 HTTP 200, stop reason non_200_after_v058-agent_loop_directive-locked_fact_ids_controller_expands-p1024 | The run produced quality evidence before stopping; it is partial, not a full 18-row matrix. |
| Completed-row quality | strict true-north 100.0%; semantic true-north 100.0%; coherence 100.0% | Every completed scored row was coherent and semantically correct under the harness. |
| Safety invariants | stale absence 100.0%; foreign absence 100.0%; rejected-key recall 100.0% | The completed rows preserved the core safety invariant: no stale/foreign memory admitted. |
| Best variant | budgeted fact-ID output: current-fact-ID recall 100.0%; n=4 | Budgeted fact-ID output is the strongest completed V3 direction, but it has only four completed rows in this partial run. |
| Boundary | non-200 row: v058-agent_loop_directive-locked_fact_ids_controller_expands-p1024 | This is not the v0.58 busy failure. It is an input-batch validity edge on a specific agent-loop locked-ID prompt shape. |
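Sketch: chat-slot idle gate. v0.58 showed `/health` can return 200 while the single chat slot is busy, so readiness is gated on a tiny chat request instead. `send_probe` is a hypothetical callable that sends that request and returns its HTTP status. The endpoint, auth, and probe payload are not specified on this board. The `(ready, attempts)` return value mirrors the "idle probe success True; attempts 1" row.

```python
import time


def wait_for_idle_chat_slot(send_probe, max_attempts=5, backoff_s=10.0, sleep=time.sleep):
    """Return (ready, attempts).

    send_probe() is a hypothetical call that sends a tiny authenticated chat request and
    returns its HTTP status; 200 means the single chat slot accepted and finished work.
    """
    for attempt in range(1, max_attempts + 1):
        if send_probe() == 200:
            return True, attempt
        if attempt < max_attempts:
            sleep(backoff_s)  # the slot is busy; give the in-flight request time to drain
    return False, max_attempts
```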
v0.60 Agent-Loop Input-Batch Hardening
v0.60 is live evidence. It isolated the v0.59 agent-loop HTTP 500 boundary with a six-row matrix: original replay, hardened JSON-ID instruction, and budgeted ID output at 1024 and 2048 pressure. All six rows returned HTTP 200, so the v0.59 Invalid input batch did not reproduce.
| Slice | Measured result | CTO meaning |
|---|---|---|
| Serving | 6/6 rows completed, 6/6 HTTP 200, stop reason None | The v0.59 500 was not a stable reproducible input-batch failure under identical replay. |
| Original replay | coherence 100.0%; current-fact-ID recall 100.0%; n=2 | The original controller-expands instruction remains the best agent-loop ID contract at 1024/2048 pressure. |
| Hardened JSON IDs | coherence 0.0%; parse 0.0% | The shorter literal-array instruction caused plain-text output, not safer JSON. |
| Budgeted output | coherence 0.0%; parse 0.0% | Budgeted ID output was not robust for the agent-loop slice even though it looked strong in v0.59's completed non-agent rows. |
| Safety invariant | stale absence 100.0%; foreign absence 100.0% | Even failed parse rows did not admit stale or foreign memory text; the failure mode was output-contract loss. |
v0.61 Agent-Loop Tail Output Contract
v0.61 is live evidence. It kept the agent-loop prompt body close to the working v0.60 original-control path and moved the JSON/output contract to the prompt tail. The measured result is not a clean 6/6 transport pass: one original-control 2048 row disconnected after 152s. The two new tail-contract variants passed all four rows they owned at 1024 and 2048 pressure.
| Slice | Measured result | CTO meaning |
|---|---|---|
| Serving | 6 rows attempted, 5/6 HTTP 200, non-200 v061-agent_loop_directive-controller_expands_original_control-p2048 | Do not overclaim a full transport pass. One original-control 2048 row hit RemoteDisconnected. |
| Tail contract | coherence 100.0%; parse 100.0%; n=2 | Tail-appended output contract preserved JSON/ID compliance without the plain-text collapse seen in v0.60's shortened variants. |
| Tail schema example | coherence 100.0%; parse 100.0%; n=2 | The schema-example tail also passed 1024 and 2048 pressure; this is now the best next candidate to rerun at wider probe diversity. |
| Safety invariant | scored-row stale absence 100.0%; foreign absence 100.0%; rejected-key recall 100.0% | On the scored rows, the system kept current facts and rejected stale/foreign handles under pressure. |
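Sketch: tail-anchored prompt assembly. This minimal Python builder assumes a four-part prompt (instructions, memory payload, question, output contract). The section names and separators are hypothetical. With `tail_contract=True`, the contract comes after the long pressure payload and is the last text before generation. It can optionally be followed by a schema example, as in the v0.61 schema-example variant.

```python
def build_prompt(instructions, memory_payload, question, output_contract,
                 schema_example=None, tail_contract=True):
    """Assemble prompt sections; the output contract goes last when tail_contract is True."""
    contract = output_contract
    if schema_example is not None:
        contract += "\nExample:\n" + schema_example
    sections = [instructions]
    if not tail_contract:
        sections.append(contract)  # head placement, for comparison runs
    sections += [memory_payload, question]
    if tail_contract:
        sections.append(contract)
    return "\n\n".join(sections)
```

Keeping the body unchanged and moving only the contract isolates contract placement as the variable, which is how v0.61 differs from the shortened v0.60 variants.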
What Is Not Proven
- Not proven: general random-access exact recall across arbitrary positions.
- Not proven: production multi-tenant safety under live traffic.
- Not proven: transfer of the same multiplier to frontier-scale models.
- Not proven: stable correction on known failed slices; focused rerun confirmed the weak cluster.
- Not a leaderboard-certified MTRAG result in this board.
V3 Optimization Targets
- Make controller-selected payloads first-class: selected, candidate, stale, revoked, and foreign-tenant states must be impossible to confuse.
- Improve arbitrary-position exactness, especially non-tail provenance recovery.
- Promote focused-row/resume execution into the standard harness so hard suite walls do not create partial matrices.
- Promote update precedence to a first-class memory field: original fact, superseding fact, effective timestamp, and scope.
- Add stable entity-id constraints for agent-task memories; current failures often recall facts but return the wrong entity id.
- Use runtime-selected payloads as the V3 default; v0.54 shows selected-current-only hit 100% coherence in that matrix.
- Make controller-filled identity and rejection metadata the default; v0.55 shows `controller_fills_all_ids` is the best envelope mode.
- Do not assume deterministic envelopes solve recall; v0.55 1024-pressure semantic pass was only 11.1%.
- Add cancellation-aware serving control before larger pressure sweeps; v0.56 showed client timeout can leave the backend busy and contaminate later sequential rows.
- Add pressure-aware output budgeting; v0.57 showed one 1024-pressure row failed through length/parse output despite current facts being available.
- Add current-fact locking for update precedence; v0.57 showed one 2048-pressure research-update row dropped to 33.33% current-fact recall.
- Keep chat-slot idle probing; v0.59 showed it avoids the v0.58 busy cascade.
- Do not over-shorten agent-loop ID prompts; v0.60 showed original controller-expands passed while shorter hardened/budgeted variants returned plain text.
- Use tail-anchored output contracts for agent-loop prompts; v0.61 showed tail-contract and schema-example variants passed 1024/2048 scored rows while preserving parse and ID recall.
- Do not rely on the current repair prompt; v0.54 repair attempts produced 0% measured improvement.
- Expose substrate observability per run: what was selected, what was rejected, and why.
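Sketch: update precedence as first-class fields. This minimal Python resolver works over the four fields listed above: the original fact (via `supersedes`), the superseding fact, the effective timestamp, and the scope. The record shape is hypothetical. Abstaining when two unlinked versions survive, rather than picking the newest, is a design choice that matches the board's duplicate-conflict and ambiguous-candidate abstention tests.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class FactVersion:
    # Hypothetical record carrying the proposed update-precedence fields.
    fact_id: str
    scope: str                          # entity or memory key the fact applies to
    text: str
    effective_at: datetime
    supersedes: Optional[str] = None    # fact_id of the version this one replaces


def resolve_current(versions, scope, as_of):
    """Return (current, superseded_ids, conflicting_ids) for one scope at time `as_of`.

    A version is current if it is in scope, already effective, and not superseded by
    another effective version. Two or more unlinked survivors are a conflict: the
    controller abstains (current is None) instead of guessing.
    """
    live = [v for v in versions if v.scope == scope and v.effective_at <= as_of]
    superseded = {v.supersedes for v in live if v.supersedes}
    survivors = [v for v in live if v.fact_id not in superseded]
    if len(survivors) > 1:
        return None, sorted(superseded), sorted(v.fact_id for v in survivors)
    current = survivors[0] if survivors else None
    return current, sorted(superseded), []
```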
Next Decision
- Continue: build V3 update-precedence and stable-entity binding tests from the confirmed failure cluster.
- Tighten: replace broad dashboards with decision boards, published only after major rounds.
- Measure: add latency/cost normalization to the true-north score so slow 1024-band wins do not hide serving constraints.
- Do not: claim “infinite perfect memory” or “general exact recall” yet.
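Sketch: reporting a latency-budgeted pass rate next to the raw one. This is one simple normalization. It assumes hypothetical row fields `passed` and `latency_s` and an operator-chosen budget; no budget is set on this board. A row counts only if it passes and finishes within the budget, so slow wins stop inflating the headline number. Cost normalization would need a per-row cost field the board does not define.

```python
def budgeted_pass_rate(rows, latency_budget_s):
    """Return (raw_pass_pct, within_budget_pass_pct).

    rows: dicts with hypothetical keys 'passed' (bool) and 'latency_s' (float).
    """
    if not rows:
        raise ValueError("no rows")
    n = len(rows)
    raw = sum(r["passed"] for r in rows)
    within = sum(r["passed"] and r["latency_s"] <= latency_budget_s for r in rows)
    return 100.0 * raw / n, 100.0 * within / n


rows = [
    {"passed": True, "latency_s": 2.4},
    {"passed": True, "latency_s": 90.0},   # a slow pressured win
    {"passed": False, "latency_s": 2.5},
    {"passed": True, "latency_s": 3.0},
]
print(budgeted_pass_rate(rows, latency_budget_s=30.0))  # (75.0, 50.0)
```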
Deck-Safe One-Liner
Hypernym Infinite Memory is a memory control plane for model fleets: per-tenant memory stores, controller-curated recall, provenance-handle verification, and lower long-memory serving cost for small local models under extreme context pressure.
Deck-safe caveat: current evidence supports pressure handling and prepared control-plane safety tests; general exact arbitrary recall remains the optimization target.
Data Trace
Every row below is a retrieval handle. A future agent, CTO, or local API client should be able to use these paths to pull the raw score object, reconstruct the claim, and follow the run into the ledger or CXDB handoff. This mirrors the RMT/Hermes lineage frame: run id to score artifact to finding to durable handoff.
| Claim | Run / id | Type | Source path | Used for |
|---|---|---|---|---|
| largest observed pressure | 20260608T202409Z | measured fact | research/tracks/hypernym-infinite-mim/results/byte-threshold/20260608T202409Z/scores.json | 48.56x native-context pressure claim |
| v0.51 personal entity coherence | 20260610T_personal_entity_coherence_combined_live_codex_v1 | measured fact | research/tracks/hypernym-infinite-mim/results/v0.51-personal-entity-coherence-threshold/20260610T_personal_entity_coherence_combined_live_codex_v1/scores.json | single-user memory true-north baseline |
| v0.52 update precedence / entity binding | 20260610T_update_precedence_entity_binding_combined_live_codex_v1 | measured fact | research/tracks/hypernym-infinite-mim/results/v0.52-update-precedence-entity-binding/20260610T_update_precedence_entity_binding_combined_live_codex_v1/scores.json | prose-vs-structured prompt comparison |
| v0.53 record format / pressure diagnostic | 20260610T_record_format_schema_pressure_live_codex_v1 | measured fact | research/tracks/hypernym-infinite-mim/results/v0.53-record-format-schema-pressure-diagnostic/20260610T_record_format_schema_pressure_live_codex_v1/scores.json | strict vs semantic contract failure cluster |
| v0.54 selected payload recovery | 20260610T_runtime_selected_minimal_payload_strict_repair_live_codex_v1 | measured fact | research/tracks/hypernym-infinite-mim/results/v0.54-runtime-selected-minimal-payload-strict-repair/20260610T_runtime_selected_minimal_payload_strict_repair_live_codex_v1/scores.json | selected-current payload V3 lever |
| v0.55 deterministic envelope constraint | 20260610T_deterministic_rejected_handle_envelope_live_codex_v1 | measured fact | research/tracks/hypernym-infinite-mim/results/v0.55-deterministic-rejected-handle-envelope/20260610T_deterministic_rejected_handle_envelope_live_codex_v1/scores.json | controller metadata helps; pressure recall still fails |
| v0.56 pressure serving boundary | 20260610T_selected_current_controller_envelope_pressure_live_codex_v1 | measured fact | research/tracks/hypernym-infinite-mim/results/v0.56-selected-current-controller-envelope-pressure-recall/20260610T_selected_current_controller_envelope_pressure_live_codex_v1/scores.json | 2048-pressure timeout and single-flight busy contamination boundary |
| v0.57 timeout-aware pressure retry | 20260610T_timeout_aware_sharded_pressure_retry_live_codex_v1 | measured fact | research/tracks/hypernym-infinite-mim/results/v0.57-timeout-aware-sharded-pressure-retry/20260610T_timeout_aware_sharded_pressure_retry_live_codex_v1/scores.json | serving-control fix and pressure-dependent memory quality score |
| v0.58 serving-slot busy stop | 20260610T_current_fact_locking_output_budget_live_codex_v1 | measured fact | research/tracks/hypernym-infinite-mim/results/v0.58-current-fact-locking-output-budget/20260610T_current_fact_locking_output_budget_live_codex_v1/scores.json | health-ok but chat-slot-busy boundary before quality evidence |
| v0.59 idle-gated fact-locking partial live result | 20260610T_chat_slot_idle_gated_fact_locking_retry_live_codex_v1 | measured fact | research/tracks/hypernym-infinite-mim/results/v0.59-chat-slot-idle-gated-fact-locking-retry/20260610T_chat_slot_idle_gated_fact_locking_retry_live_codex_v1/scores.json | chat-idle gate success, fact-ID variant signal, and HTTP 500 input-batch boundary |
| v0.60 agent-loop input-batch hardening result | 20260610T_agent_loop_input_batch_hardening_live_codex_v1 | measured fact | research/tracks/hypernym-infinite-mim/results/v0.60-agent-loop-input-batch-hardening-fact-locking-completion/20260610T_agent_loop_input_batch_hardening_live_codex_v1/scores.json | proves v0.59 HTTP 500 did not reproduce and original controller-expands passed agent-loop 1024/2048 |
| v0.61 agent-loop tail output-contract result | 20260610T_agent_loop_tail_output_contract_live_codex_v1 | measured fact | research/tracks/hypernym-infinite-mim/results/v0.61-agent-loop-tail-output-contract/20260610T_agent_loop_tail_output_contract_live_codex_v1/scores.json | tail contract and schema-example variants passed 1024/2048; original-control 2048 had one transport disconnect |
| current compound ledger | ledger | compiled artifact | .forge/artifacts/hypernym-infinite-memory-research-ledger.json | machine-readable index of source runs |
| current CXDB handoff | cxdb-hypernym-infinite-mim-handoff-20260610 | durable handoff | .forge/artifacts/cxdb-hypernym-infinite-mim-handoff-20260610.json | resume/API import packet |
Compound Research Chain
These are the prior pages and local artifacts this board compounds. Public links are directly openable. Local artifact paths are intended for Forge/CXDB/API retrieval from the workstation or repository.
| Document | Kind | Reference |
|---|---|---|
| Current public board | public page | https://hypernym-infinite-memory-v09.pages.dev/ |
| Previous immutable board with v0.55 | public page | https://2f1e75b6.hypernym-infinite-memory-v09.pages.dev |
| Previous immutable board with v0.54 | public page | https://1db5d7c7.hypernym-infinite-memory-v09.pages.dev |
| Institutional 2026-06-09 page | local artifact | .forge/artifacts/hypernym-infinite-memory-institutional-20260609.html |
| CTO findings 2026-06-09 page | local artifact | .forge/artifacts/hypernym-infinite-memory-cto-findings-20260609.html |
| Research ledger JSON | local artifact | .forge/artifacts/hypernym-infinite-memory-research-ledger.json |
| Compound visualization standard | local standard | research/tracks/hypernym-infinite-mim/compound-research-visualization-standard.md |
API Pull Targets
Canonical local query shape for future automation:
jq '.summary' research/tracks/hypernym-infinite-mim/results/<suite>/<run_id>/scores.json
jq '.latest_results' .forge/artifacts/cxdb-hypernym-infinite-mim-handoff-20260610.json
jq '.runs.v057_timeout_aware_sharded_pressure_retry_live' .forge/artifacts/hypernym-infinite-memory-research-ledger.json
jq '.runs.v058_current_fact_locking_output_budget_live' .forge/artifacts/hypernym-infinite-memory-research-ledger.json
jq '.runs.v059_chat_slot_idle_gated_fact_locking_retry_live' .forge/artifacts/hypernym-infinite-memory-research-ledger.json
jq '.runs.v060_agent_loop_input_batch_hardening_live' .forge/artifacts/hypernym-infinite-memory-research-ledger.json
jq '.runs.v061_agent_loop_tail_output_contract_live' .forge/artifacts/hypernym-infinite-memory-research-ledger.json
- Suggested CXDB lineage key:
research:hypernym-infinite-mim:<run_id>:<finding_id>
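Sketch: the `.summary` pull and lineage key from Python. This is a minimal equivalent of the first jq command plus the suggested CXDB key. It assumes only the path layout and key format shown above, and nothing about `scores.json` beyond its top-level `summary` field. Like jq, it returns `None` (jq's `null`) when the key is absent.

```python
import json
from pathlib import Path

RESULTS_ROOT = Path("research/tracks/hypernym-infinite-mim/results")


def scores_path(suite, run_id, root=RESULTS_ROOT):
    """Path layout from the jq example: results/<suite>/<run_id>/scores.json."""
    return Path(root) / suite / run_id / "scores.json"


def load_summary(suite, run_id, root=RESULTS_ROOT):
    """Python equivalent of: jq '.summary' <scores.json> (None if the key is missing)."""
    with scores_path(suite, run_id, root).open(encoding="utf-8") as fh:
        return json.load(fh).get("summary")


def lineage_key(run_id, finding_id):
    """Suggested CXDB lineage key: research:hypernym-infinite-mim:<run_id>:<finding_id>."""
    return f"research:hypernym-infinite-mim:{run_id}:{finding_id}"
```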