Project  ·  Technical Writeup  ·  2026

Diagnostic Engine: Registry-Driven Knee Symptom Reasoning

A bounded symptom reasoning service that turns a knee complaint into controlled evidence, asks one focused question at a time, and stops with a shortlist, fallback, or safety escalation.

Type: Systems build
Scope: Knee-only beta
Runtime: Serverless handlers + persistent sessions
Mode: Bounded reasoning, not diagnosis
53 symptom signals · 28 follow-up prompts · 3 bounded outcomes · 1 append-only ledger
// concept

Symptom checkers get more useful when the scope gets narrower

Diagnostic Engine is a deliberately bounded reasoning system for knee complaints. The goal is not to behave like a general medical chatbot, nor to present itself as a diagnostic tool. The useful boundary is smaller: accept an opening complaint in free text, convert only justified details into structured evidence, run a short round of targeted follow-up questions, and stop cleanly once the system has either earned a shortlist, failed to earn one, or hit a safety rule.

That scope is what makes the architecture interesting. Instead of relying on one large prompt to hold the entire reasoning process together, the project moves medical logic into registries, pushes free text through a constrained intake layer, and runs the rest of the loop on symptom keys, question definitions, and deterministic scoring rules.

Boundary: the system only gets to make narrow, auditable claims. Unknown evidence stays unresolved, safety rules sit outside ranking, and the engine is expected to stop rather than over-claim.
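To make the intake boundary concrete, here is a minimal sketch of the deterministic fallback path (all identifiers are hypothetical; per the architecture below, the real parser also has an LLM-assisted mode): only phrases with a known mapping become structured evidence, and everything else stays as leftovers instead of being guessed.

```typescript
// Hypothetical sketch of a deterministic fallback parser: only tokens with
// a known mapping become structured evidence; everything else is returned
// as "leftovers" rather than silently resolved.
type Evidence = { symptom: string; value: boolean; status: "inferred" };

// Assumed keyword table -- the real system maps into a 53-signal registry.
const KEYWORD_MAP: Record<string, string> = {
  pop: "audible_pop",
  swelled: "rapid_swelling",
  twisted: "twisting_injury",
};

function parseComplaint(text: string): { evidence: Evidence[]; leftovers: string[] } {
  const evidence: Evidence[] = [];
  const leftovers: string[] = [];
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    const symptom = KEYWORD_MAP[token];
    if (symptom) evidence.push({ symptom, value: true, status: "inferred" });
    else leftovers.push(token);
  }
  return { evidence, leftovers };
}
```

Note that evidence produced this way is tagged `inferred`, not `explicit`: the opening complaint is treated as provisional input that later answers can overwrite.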

// architecture

Complaint in, evidence mapped, one question out

The runtime follows a governed evidence loop rather than an open-ended chat loop. After the first complaint is parsed, the engine decides between safety escalation, candidate scoring, and the next single discriminating question. The important architectural move is that everything after intake runs on the same bounded symptom state instead of on fresh free-text interpretation each round.

+--------------------------------------------------------------+
|                        USER COMPLAINT                        |
|    "I twisted my knee, heard a pop, and it swelled fast"     |
+--------------------------------------------------------------+
                               |
                               v
+--------------------------------------------------------------+
|                        INTAKE PARSER                         |
|  - optional LLM-assisted parse                               |
|  - deterministic fallback parser                             |
|  - outputs symptom evidence + summary + leftovers            |
+--------------------------------------------------------------+
                               |
                               v
+--------------------------------------------------------------+
|                        SESSION STATE                         |
|  symptomState, parserOutput, round count, latest form        |
+--------------------------------------------------------------+
                               |
                +--------------+------------------+
                |                                 |
                v                                 v
+-------------------------------+   +-------------------------+
|         SAFETY RULES          |   |    CANDIDATE SCORER     |
| hot/red/fever, deformity,     |   | registry-driven fit     |
| severe trauma, weight-bearing |   | scoring by stage        |
+-------------------------------+   +-------------------------+
                |                                 |
                | escalated                       v
                |                   +-------------------------+
                |                   |    QUESTION SELECTOR    |
                |                   | choose limit: 1 next    |
                |                   | high-value prompt       |
                |                   +-------------------------+
                |                                 |
                v                                 v
+-------------------------------+   +-------------------------+
|       ESCALATION RESULT       |   |   FORM / SHORTLIST /    |
|    urgent in-person review    |   |        FALLBACK         |
+-------------------------------+   +-------------------------+
                |                                 |
                +--------------+------------------+
                               |
                               v
+--------------------------------------------------------------+
|                     LEDGER + PERSISTENCE                     |
|  append-only events, reusable sessions, reloadable state     |
+--------------------------------------------------------------+

// implementation

The system works because every later decision is anchored to structured evidence

The core design is not "ask an LLM what the complaint sounds like" over and over. The core design is to convert the opening story into controlled evidence once, then keep all later reasoning tied to the same bounded state. That makes the system auditable and also changes what a question is for. A question is not there to continue the conversation. It is there to confirm, negate, or separate concrete symptom signals that matter to the scorer.

symptom registry
  What it represents: Canonical symptom IDs, value types, categories, and scale labels.
  Why it exists: The system needs one stable vocabulary so every later step talks about the same evidence model.

question bank
  What it represents: Authored follow-up prompts, gating rules, and answer-to-symptom mappings.
  Why it exists: Questions become deterministic evidence updates rather than loose conversational turns.

disease definitions
  What it represents: Supports, anti-symptoms, contradiction logic, and stage-specific weighting.
  Why it exists: Fit scoring stays in explicit rules that can be inspected, edited, and benchmarked.

session symptom state
  What it represents: The live evidence map for the current interview.
  Why it exists: Free-text intake and form answers converge into one shared state instead of two competing interpretations.
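A sketch of what those registry surfaces could look like as types (shapes and field names are assumptions for illustration, not the project's actual schema). The key point is the last function: answering a question is a lookup into an authored mapping, not a fresh interpretation.

```typescript
// Hypothetical shapes for the registry surfaces described above.
type SymptomDef = {
  id: string;                      // canonical symptom ID
  valueType: "boolean" | "scale";  // how evidence is recorded
  category: string;
  scaleLabels?: string[];          // only for scale-typed symptoms
};

type QuestionDef = {
  id: string;
  prompt: string;
  gate?: { requires: string[] };   // only ask when these signals are unknown or weak
  answerMap: Record<string, { symptom: string; value: boolean | number }[]>;
};

// Answering becomes a deterministic evidence update, not a free-text turn.
function applyAnswer(
  q: QuestionDef,
  answer: string,
  state: Map<string, boolean | number>,
): Map<string, boolean | number> {
  const next = new Map(state);
  for (const update of q.answerMap[answer] ?? []) {
    next.set(update.symptom, update.value);
  }
  return next;
}
```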

This is what gives the system its shape. The intake parser can infer or tentatively map evidence, but it still has to land inside the same bounded symptom vocabulary as the rest of the engine. The scorer can only rank against what has been earned into that state. The selector can only ask about unknown or weakly-supported signals that exist in the registry. The whole loop stays narrow because every component is constrained by the same evidence model.

Evidence status is also part of the design. The session distinguishes between explicit evidence, inferred evidence, and low-confidence evidence. That matters because it lets the engine treat the opening complaint as useful but provisional, and lets later answers overwrite weaker assumptions with cleaner signals. In practice, that is what makes the follow-up rounds feel purposeful instead of repetitive.
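That status-aware merging can be sketched as a small ranking rule (the status names come from the paragraph above; the numeric ranks are an assumption):

```typescript
// Sketch of status-aware merging: stronger evidence overwrites weaker
// evidence, but weaker evidence never overwrites stronger.
type Status = "explicit" | "inferred" | "low_confidence";
type Entry = { value: boolean; status: Status };

const RANK: Record<Status, number> = { explicit: 2, inferred: 1, low_confidence: 0 };

function merge(state: Map<string, Entry>, symptom: string, incoming: Entry): Map<string, Entry> {
  const current = state.get(symptom);
  const next = new Map(state);
  if (!current || RANK[incoming.status] >= RANK[current.status]) {
    next.set(symptom, incoming);  // incoming is at least as strong: take it
  }
  return next;  // otherwise keep the existing, stronger entry
}
```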


// runtime

One complaint becomes one session, then one question at a time

Session start and answer submission both run through the same evaluation loop. The system parses or merges evidence, appends ledger events, checks safety, scores candidates, and either returns a single compiled question or one of the final outcomes. That matters because the engine never switches reasoning modes halfway through. The opening story and the later answers are both just ways of updating the same session state.
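The decision order of that shared loop can be sketched as a single function (names, signature, and the exact stopping conditions are hypothetical simplifications):

```typescript
// Sketch of the single evaluation loop: session start and answer
// submission both funnel into the same decision order.
type Outcome =
  | { kind: "escalation"; reason: string }
  | { kind: "question"; questionId: string }
  | { kind: "candidates"; shortlist: string[] }
  | { kind: "fallback" };

function evaluate(
  safetyReason: string | null,   // result of red-flag checks
  shortlist: string[],           // deterministic scorer output
  decisive: boolean,             // lead-margin check on the shortlist
  nextQuestionId: string | null, // selector output (null = nothing useful left)
): Outcome {
  // Safety sits outside ranking and is checked first.
  if (safetyReason) return { kind: "escalation", reason: safetyReason };
  // A decisive shortlist ends the loop early.
  if (decisive && shortlist.length > 0) return { kind: "candidates", shortlist };
  // Otherwise ask exactly one more question, if one is worth asking.
  if (nextQuestionId !== null) return { kind: "question", questionId: nextQuestionId };
  // Budget spent: show what was earned, or stop cautiously.
  return shortlist.length > 0 ? { kind: "candidates", shortlist } : { kind: "fallback" };
}
```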

01 · Start from complaint text

The intake layer maps only justified details into structured symptom evidence and produces a compact summary of what the system thinks it heard. Unclear facts are left unresolved instead of being silently guessed.

02 · Check safety before ranking

Fever with a hot or red knee, visible deformity, or major trauma can short-circuit the normal candidate loop and return an escalation message immediately.

03 · Score candidate nodes deterministically

Candidate fit depends on supports, anti-symptoms, contradictions, evidence status, and stage profiles defined in the registries. Given the same symptom state, the scorer returns the same result.

04 · Select the next useful question

The selector chooses the smallest next question set that best separates the remaining candidates. In the current engine, that means exactly one active question at a time, which keeps each round tied to one specific clarification goal.

05 · Stop with a bounded result

The loop ends with a shortlist, fallback, or escalation. The important product choice is that fallback is treated as a real answer state, not as something the system tries to smooth over with fake confidence.
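The deterministic scoring in step 03 can be illustrated with a toy fit function (weights, field names, and the disqualification rule are invented for the example; the real registries also carry evidence status and stage profiles):

```typescript
// Hypothetical fit scorer: supports add weight, anti-symptoms subtract,
// and a contradiction disqualifies the candidate outright.
type Disease = {
  id: string;
  supports: Record<string, number>;      // symptom -> weight when present
  antiSymptoms: Record<string, number>;  // symptom -> penalty when present
  contradictions: string[];              // symptoms that rule the candidate out
};

function scoreCandidate(d: Disease, state: Map<string, boolean>): number | null {
  for (const c of d.contradictions) {
    if (state.get(c) === true) return null;  // disqualified, not just penalized
  }
  let score = 0;
  for (const [sym, w] of Object.entries(d.supports)) {
    if (state.get(sym) === true) score += w;
  }
  for (const [sym, p] of Object.entries(d.antiSymptoms)) {
    if (state.get(sym) === true) score -= p;
  }
  return score;
}
```

Because the function reads only the symptom state and the registry entry, the same inputs always produce the same score, which is what makes the rules inspectable and benchmarkable.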

candidates
  When it appears: A confident shortlist exists and the lead is decisive enough, or no more useful questions remain.
  Why it matters: The engine only shows a shortlist once it has actually earned a bounded fit result.

fallback
  When it appears: Evidence stays too weak or ambiguous after the questioning budget is used up.
  Why it matters: The system would rather stop cautiously than pretend certainty it does not have.

escalation
  When it appears: Safety logic fires before or during the normal ranking loop.
  Why it matters: Urgent patterns are treated as urgent, not diluted into a candidate score.
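The "decisive enough" condition on the candidates outcome can be sketched as a simple margin rule (the margin value and the single-candidate handling are assumptions):

```typescript
// Hypothetical "decisive lead" rule: the top candidate must beat the
// runner-up by an absolute margin before a shortlist is shown.
function isDecisive(scores: { id: string; score: number }[], margin = 2): boolean {
  if (scores.length === 0) return false;
  const sorted = [...scores].sort((a, b) => b.score - a.score);
  if (sorted.length === 1) return sorted[0].score > 0;  // lone candidate: any positive fit
  return sorted[0].score - sorted[1].score >= margin;
}
```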

// finding

The most useful decision is treating safety as a separate system

The strongest architectural choice here is not the scorer itself. It is the decision to keep safety logic outside disease ranking. In a lot of lightweight diagnostic demos, urgency is just another score feature, which means the same surface is expected to do both differential reasoning and red-flag triage. This engine avoids that. A hot, red knee with fever does not need to win a candidate race before the system can say the user needs urgent in-person review.
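Keeping safety outside ranking can be as simple as a list of predicates evaluated before the scorer ever runs. In this sketch the rule names mirror the ones in the writeup, but the exact conditions and flag keys are assumptions:

```typescript
// Sketch of red-flag rules as plain predicates evaluated before ranking.
type Flags = Map<string, boolean>;

const SAFETY_RULES: { reason: string; fires: (f: Flags) => boolean }[] = [
  {
    reason: "possible septic joint",
    fires: (f) =>
      f.get("fever") === true &&
      (f.get("hot_knee") === true || f.get("red_knee") === true),
  },
  { reason: "visible deformity", fires: (f) => f.get("visible_deformity") === true },
  {
    reason: "severe trauma with inability to bear weight",
    fires: (f) => f.get("severe_trauma") === true && f.get("cannot_bear_weight") === true,
  },
];

function checkSafety(flags: Flags): string | null {
  for (const rule of SAFETY_RULES) {
    if (rule.fires(flags)) return rule.reason;  // short-circuits the candidate loop
  }
  return null;
}
```

A non-null result never competes with candidate scores; it replaces the whole ranking step with an escalation.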

The second useful decision is that the engine is allowed to fail cleanly. A shortlist is only one of the permitted endings. Fallback is equally part of the architecture. That keeps the system honest under weak evidence and prevents the scorer from being forced into fake confidence just because the product needs a satisfying ending.

ARCHITECTURAL TAKEAWAY

Governance gets more realistic when the system is allowed to stop early and stop cautiously

The engine is useful precisely because it does not try to behave like a universal medical intelligence. Its scope is narrow, the outcomes are bounded, and fallback is explicit. That creates a more honest product surface and a cleaner engineering surface at the same time.

The session ledger supports that honesty. The system can show what it parsed, what it asked, and where it stopped, instead of only exposing a polished final sentence with no audit path behind it.


// state

Sessions are durable enough to inspect, not just fresh-page demos

The project keeps an evolving session object with symptom state, parser output, candidate state, question log, and the latest compiled form. Ledger entries are appended for material transitions such as session creation, parse merge, answer recording, candidate flagging, fallback, and safety escalation.
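An append-only ledger of those transitions can be sketched as follows. The event names are taken from the list above; the in-memory array is a stand-in, since the hosted version persists rows to Supabase:

```typescript
// Sketch of an append-only ledger: events are only ever added, so the
// session history can be replayed or audited later.
type LedgerEvent = {
  seq: number;
  type:
    | "session_created"
    | "parse_merged"
    | "answer_recorded"
    | "candidate_flagged"
    | "fallback"
    | "safety_escalation";
  payload: unknown;
  at: string; // ISO timestamp
};

class Ledger {
  private events: LedgerEvent[] = [];

  append(type: LedgerEvent["type"], payload: unknown): LedgerEvent {
    const event: LedgerEvent = {
      seq: this.events.length + 1,
      type,
      payload,
      at: new Date().toISOString(),
    };
    this.events.push(event);
    return event;
  }

  // Read-only view; there is deliberately no update or delete.
  history(): readonly LedgerEvent[] {
    return this.events;
  }
}
```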

That persistence layer is what makes the app feel more like a small governed service than a front-end toy. The hosted deployment uses Supabase-backed sessions and ledger rows so the Vercel runtime can stay stateless while the interview state remains reloadable.


// scope

Current limits are part of the design, not an afterthought

This is still a small, intentionally narrow system. The current model covers ACL tear, meniscal tear, patellofemoral pain syndrome, and knee osteoarthritis. It is a governed knee-only reasoning loop, not a general diagnostic platform.

A few current constraints are especially worth being explicit about: