Smart pre-visit intake for doctors

Hackathon Method

FlareWise focuses on whether a language model understood messy health notes faithfully enough to produce a useful appointment summary.

Problem

People with chronic illness often have scattered symptom notes and short appointments. A model can help organize notes, but missed negation, invented symptoms, or wrong timing can change the story.

Model Approach

From-scratch TF-IDF + multinomial logistic regression classifiers (trained in JavaScript, with no Python and no external ML libraries) run per sentence on every intake. Sentences classified as intake-meta (for example "tried ibuprofen" or "doctor visit was Tuesday") are dropped before aggregation. The result is shown live during intake and is also passed into the language model prompt as a second opinion. The language model handles structured extraction, cautious generation, and a self-audit in a single pass.
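
As a rough illustration of how the local prediction could be handed to the language model as a second opinion, here is a minimal sketch. The function name `buildPrompt`, the shape of `localOpinion`, and the prompt wording are all assumptions made for this example, not the prototype's actual prompt.

```javascript
// Hypothetical prompt builder: embeds the local classifier's result and asks
// for extraction, a cautious summary, and a self-audit in one response.
// localOpinion is assumed to look like { domain: string, confidence: number in [0, 1] }.
function buildPrompt(note, localOpinion) {
  const confidencePct = Math.round(localOpinion.confidence * 100);
  return [
    "You are preparing a pre-visit summary from a patient's own notes.",
    `A local classifier suggests the domain "${localOpinion.domain}" (${confidencePct}% confidence).`,
    "Treat that as a second opinion, not as ground truth.",
    "Respond with JSON containing three keys:",
    '  "extraction": symptoms, negated symptoms, and timing, using only what the notes state;',
    '  "summary": a cautious summary that does not imply causation;',
    '  "audit": claims in the summary that the notes do not support.',
    "Patient notes:",
    note,
  ].join("\n");
}
```

Keeping all three tasks in one prompt mirrors the single-pass design described above; a multi-call variant would trade latency for easier debugging.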

Data Augmentation

The stress test creates noisy variants with typos, vague timing, missing punctuation, negation, contradictions, urgent language, and long paragraphs.
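
A few of these perturbations can be sketched as small deterministic functions. The function names and the specific rewrite rules below are hypothetical and only cover three of the listed noise types; the actual stress test may generate its variants differently.

```javascript
// Hypothetical perturbations for stress-testing note understanding.

// Remove sentence punctuation to simulate run-on notes.
function dropPunctuation(note) {
  return note.replace(/[.,;:!?]/g, "");
}

// Swap two adjacent characters at a given position to simulate a typo.
function swapTypo(note, index) {
  if (index < 0 || index >= note.length - 1) return note;
  const chars = [...note];
  [chars[index], chars[index + 1]] = [chars[index + 1], chars[index]];
  return chars.join("");
}

// Replace explicit durations such as "3 days ago" with a vague phrase.
function vagueTiming(note) {
  return note.replace(/\b\d+\s+(day|week|month)s?\s+ago\b/gi, "a while ago");
}

// Produce one labelled variant per perturbation so results can be grouped by noise type.
function makeVariants(note) {
  return [
    { kind: "no_punctuation", text: dropPunctuation(note) },
    { kind: "typo", text: swapTypo(note, 2) },
    { kind: "vague_timing", text: vagueTiming(note) },
  ];
}
```

Labelling each variant with its `kind` makes it possible to report metrics per noise type rather than as one blended score.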

Metrics

The prototype reports local classifier test accuracy, extraction F1, hallucination rate, negation accuracy, temporal accuracy, supported claims, missed details, and safety flags.
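
To make two of these metrics concrete, the sketch below computes extraction F1 and a hallucination rate from a gold list of symptoms and a predicted list. The definitions are assumptions for illustration: here a "hallucination" is any predicted item absent from the gold annotation, which may be stricter or looser than the evaluator the prototype uses.

```javascript
// Hypothetical scoring helper: compares extracted items against gold annotations.
function extractionScores(gold, predicted) {
  const goldSet = new Set(gold.map((s) => s.toLowerCase()));
  const predSet = new Set(predicted.map((s) => s.toLowerCase()));

  // Count predicted items that also appear in the gold annotation.
  let truePositives = 0;
  for (const item of predSet) {
    if (goldSet.has(item)) truePositives++;
  }

  const precision = predSet.size ? truePositives / predSet.size : 0;
  const recall = goldSet.size ? truePositives / goldSet.size : 0;
  const f1 = precision + recall ? (2 * precision * recall) / (precision + recall) : 0;

  // Share of predicted items with no support in the gold annotation.
  const hallucinationRate = predSet.size ? (predSet.size - truePositives) / predSet.size : 0;

  return { precision, recall, f1, hallucinationRate };
}
```

Under this definition the hallucination rate is simply one minus precision, so reporting both is only informative if the evaluator judges support against the raw note rather than against the gold list.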

Evaluation-Guided Improvements

Evaluation finding	Prototype change	Metric used
Negation-heavy notes can be misunderstood	Added explicit negated-symptom extraction and a negation error count in the evaluator	Negation accuracy
Summaries can imply causation without enough evidence	Added instructions to phrase patterns cautiously and evaluate unsupported causation	Hallucination rate and temporal accuracy
Urgent language should not depend only on generated text	Added a rule-based safety check before the model-generated summary is shown	Urgent risk terms
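
The rule-based safety check in the last row could look roughly like the sketch below, which scans the patient's raw note instead of the generated summary. The term list and function names are hypothetical examples, not the prototype's actual rules, and this simple matcher does not handle negation, so "no chest pain" would still be flagged.

```javascript
// Hypothetical urgent-term list; a real list would need clinical review.
const URGENT_TERMS = ["chest pain", "can't breathe", "shortness of breath", "fainted", "suicidal"];

// Return every urgent term that appears in the note (case-insensitive).
function findUrgentTerms(note) {
  // Normalise curly apostrophes so "can’t" matches "can't".
  const text = note.toLowerCase().replace(/’/g, "'");
  return URGENT_TERMS.filter((term) => text.includes(term));
}

// Run before the generated summary is shown, so the flag never depends on model output.
function safetyCheck(note) {
  const matches = findUrgentTerms(note);
  return { urgent: matches.length > 0, matches };
}
```

Erring toward flagging is a deliberate choice in this sketch: a false alarm costs the reader a moment, while a missed urgent phrase could hide something important.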

Trained Local Model

I trained two TF-IDF + multinomial logistic regression classifiers from scratch in JavaScript, with no Python, no scikit-learn, and no external ML libraries. Training data combines the public gretelai/symptom_to_diagnosis corpus with ~220 synthesised intake-meta sentences (assigned to a dedicated no_clear_domain class), ~300 multi-symptom intake-style examples, and hand-written domain anchors for under-represented presentations like hypertension and lower back pain. Features are unigrams and bigrams (so “chest pain” and “blurred vision” count as their own discriminative features) with L2-normalised TF-IDF weighting. The model is trained with SGD and L2 regularisation for 35 epochs.
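
The feature and training steps described above can be sketched in plain JavaScript as follows. This is a minimal illustration, not the project's code: the function names, the smoothed IDF formula, the learning rate, and the regularisation strength are assumptions, and only the 35 epochs come from the description. For simplicity, weight decay is applied only to features present in the current example.

```javascript
// Lowercase word tokens plus adjacent-word bigrams, so "chest pain" is its own feature.
function ngrams(text) {
  const words = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  const bigrams = words.slice(0, -1).map((w, i) => `${w} ${words[i + 1]}`);
  return [...words, ...bigrams];
}

// Learn a vocabulary and smoothed inverse document frequencies from training texts.
function fitTfidf(texts) {
  const docFreq = new Map();
  for (const text of texts) {
    for (const term of new Set(ngrams(text))) docFreq.set(term, (docFreq.get(term) || 0) + 1);
  }
  const index = new Map();
  const idf = [];
  for (const [term, df] of docFreq) {
    index.set(term, idf.length);
    idf.push(Math.log((1 + texts.length) / (1 + df)) + 1);
  }
  return { index, idf };
}

// Turn one text into a sparse, L2-normalised TF-IDF vector (feature id -> weight).
function transform(tfidf, text) {
  const weights = new Map();
  for (const term of ngrams(text)) {
    const id = tfidf.index.get(term);
    if (id !== undefined) weights.set(id, (weights.get(id) || 0) + tfidf.idf[id]);
  }
  const norm = Math.sqrt([...weights.values()].reduce((s, w) => s + w * w, 0)) || 1;
  for (const [id, w] of weights) weights.set(id, w / norm);
  return weights;
}

// Numerically stable softmax over an array of class scores.
function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Linear score per class for a sparse input vector.
function classScores(W, b, x) {
  return W.map((row, c) => {
    let s = b[c];
    for (const [id, v] of x) s += row[id] * v;
    return s;
  });
}

// Multinomial logistic regression trained with per-example SGD and L2 weight decay.
function trainSoftmax(X, y, numClasses, numFeatures, { epochs = 35, lr = 0.5, l2 = 1e-4 } = {}) {
  const W = Array.from({ length: numClasses }, () => new Float64Array(numFeatures));
  const b = new Float64Array(numClasses);
  for (let epoch = 0; epoch < epochs; epoch++) {
    X.forEach((x, i) => {
      const probs = softmax(classScores(W, b, x));
      for (let c = 0; c < numClasses; c++) {
        const err = probs[c] - (c === y[i] ? 1 : 0); // cross-entropy gradient w.r.t. score c
        for (const [id, v] of x) W[c][id] -= lr * (err * v + l2 * W[c][id]);
        b[c] -= lr * err;
      }
    });
  }
  return { W, b };
}

// Index of the most probable class for one sparse vector.
function predict(model, x) {
  const probs = softmax(classScores(model.W, model.b, x));
  return probs.indexOf(Math.max(...probs));
}
```

A sparse `Map` per example keeps each update proportional to the number of n-grams in the sentence instead of the vocabulary size, which matters when the whole pipeline runs in the browser.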

At inference, each intake note is split into sentences and each sentence is classified individually. Sentences predicted as no_clear_domain with high confidence (intake meta like “tried ibuprofen”) are dropped before the remaining sentences vote on a domain. The chosen prediction is fed live into the intake UI and is also passed into the language model prompt as a second opinion. Evaluation accuracy is ~98% on the held-out Gretel test split, ~100% on a held-out hand-built intake-style validation set (20 cases the model never saw in training), and ~91% for the priority classifier.
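
The split, filter, and vote steps can be sketched as below. The classifier is passed in as a function and is assumed to return `{ label, confidence }`; the 0.8 threshold, the confidence-weighted vote, and the fallback label are illustrative assumptions, since the text only says that high-confidence meta sentences are dropped and the rest vote.

```javascript
// Split on whitespace that follows sentence-ending punctuation; deliberately simple.
function splitSentences(note) {
  return note
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter(Boolean);
}

// classify(sentence) is a hypothetical stand-in for the local classifier.
function voteOnDomain(note, classify, metaThreshold = 0.8) {
  const votes = new Map();
  for (const sentence of splitSentences(note)) {
    const { label, confidence } = classify(sentence);
    // Drop confident intake-meta sentences so they cannot dilute the vote.
    if (label === "no_clear_domain" && confidence >= metaThreshold) continue;
    votes.set(label, (votes.get(label) || 0) + confidence);
  }
  // Fall back to no_clear_domain when every sentence was dropped.
  let best = { label: "no_clear_domain", score: 0 };
  for (const [label, score] of votes) {
    if (score > best.score) best = { label, score };
  }
  return best;
}
```

Weighting votes by confidence lets one clear sentence outweigh several ambiguous ones; a plain majority vote would be an equally valid reading of the description.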

Product Roadmap

The useful product is not only a one-time summary. The longer-term app should help a patient prepare before a visit, capture what the clinician said afterward, and monitor what happens during the days after a treatment change.

Phase	Current prototype	Coming next
Before appointment	Patient completes a pre-visit intake and generates a brief	Adaptive questions based on earlier answers, symptom history, pain, meds, and patient goals
During or after appointment	Doctor brief can be exported	Record clinician takeaways, prescriptions, next steps, and follow-up instructions
Between appointments	Single-run local result	Daily check-ins for symptoms, pain, side effects, and treatment response
Long-term use	Browser-local latest result	Patient-owned health timeline with flares, triggers, meds, and visit decisions

Juno-Style Feature Inspiration

Public Juno materials describe a chronic illness app with natural conversations, continuous symptom tracking, longitudinal context, pattern detection, biometrics, and appointment-ready reports. FlareWise uses that category as inspiration while focusing this prototype on reliable pre-visit intake.

Voice or text symptom check-ins

Longitudinal health profile from conversations and history

Pattern and trigger detection over weeks and months

Appointment-ready reports for doctors

Biometrics and wearable context

Personalized non-diagnostic guidance based on patient history

Audio Credits

Ambient rainfall loop: “Sound of light rainfall” from Wikimedia Commons, used under CC BY-SA 4.0. UI sounds are synthesized in-browser with the Web Audio API.

Pitch Summary

The project combines a locally trained NLP classifier with a general pre-trained language model that is adapted to chronic illness notes through task-specific schemas and evaluator prompts. Instead of treating the summary as automatically correct, the app checks for unsupported claims, missed details, negation errors, timing errors, and safety risk.

Developed by aher.dev