Smart pre-visit intake for doctors
Hackathon Method
FlareWise focuses on whether a language model understood messy health notes faithfully enough to produce a useful appointment summary.
Problem
People with chronic illness often have scattered symptom notes and short appointments. A model can help organize notes, but missed negation, invented symptoms, or wrong timing can change the story.
Model Approach
From-scratch TF-IDF + multinomial logistic regression classifiers (trained in JavaScript, with no Python and no external ML libraries) run on each sentence of every intake. Sentences classified as intake-meta ("tried ibuprofen", "doctor visit was Tuesday") are dropped before aggregation. The aggregated result is shown live during intake and passed into the language model prompt as a second opinion. The language model handles structured extraction, cautious generation, and a self-audit in a single pass.
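As a rough illustration of how the local prediction could reach the language model as a second opinion, here is a minimal sketch. The function name `buildPrompt`, the prompt wording, and the JSON keys are all hypothetical; the prototype's real prompt and schema are not shown in this README.

```javascript
// Hypothetical prompt builder; the keys below are illustrative, not the prototype's schema.
function buildPrompt(note, localPrediction) {
  // The classifier output is framed as a hint so the model can disagree with it.
  const hint = localPrediction
    ? `A local classifier suggests the domain "${localPrediction.label}" ` +
      `(confidence ${localPrediction.confidence.toFixed(2)}). ` +
      "Treat this as a second opinion, not as ground truth."
    : "No local classifier prediction is available.";
  return [
    "You are preparing a pre-visit summary from a patient's own notes.",
    hint,
    "Return JSON with these keys:",
    '  "symptoms": symptoms the patient reports, each with timing if stated',
    '  "negated_symptoms": symptoms the patient explicitly denies',
    '  "summary": a cautious summary that does not imply causation',
    '  "audit": claims in the summary that the notes do not directly support',
    "Patient notes:",
    note,
  ].join("\n");
}
```

Asking for extraction, summary, and audit in one response mirrors the single-pass design described above; splitting them into separate calls would be an equally reasonable variant.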
Data Augmentation
The stress test creates noisy variants with typos, vague timing, missing punctuation, negation, contradictions, urgent language, and long paragraphs.
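A minimal sketch of three of these perturbations (typos, missing punctuation, vague timing) is shown here. All helper names are hypothetical and the rules are deliberately simple; the prototype's actual stress-test generator is not reproduced in this README. A seeded random generator is assumed so that variants are reproducible between runs.

```javascript
// Small seeded PRNG (mulberry32-style) so the same seed yields the same variants.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Typo: swap two adjacent inner letters in one purely alphabetic word of length >= 4.
function addTypo(text, rng) {
  const words = text.split(" ");
  const candidates = words
    .map((w, i) => i)
    .filter((i) => /^[a-z]{4,}$/i.test(words[i]));
  if (candidates.length === 0) return text;
  const wi = candidates[Math.floor(rng() * candidates.length)];
  const w = words[wi];
  const ci = 1 + Math.floor(rng() * (w.length - 2)); // index in [1, length - 2]
  words[wi] = w.slice(0, ci) + w[ci + 1] + w[ci] + w.slice(ci + 2);
  return words.join(" ");
}

// Missing punctuation: strip common sentence punctuation.
function dropPunctuation(text) {
  return text.replace(/[.,;:!?]/g, "");
}

// Vague timing: replace a few explicit time phrases with fuzzier wording.
function blurTiming(text) {
  return text
    .replace(/\b(\d+|two|three|four|five) (days?|weeks?) ago\b/gi, "a while ago")
    .replace(/\byesterday\b/gi, "recently");
}

function makeVariants(note, seed = 1) {
  const rng = mulberry32(seed);
  return [
    { kind: "typo", text: addTypo(note, rng) },
    { kind: "no_punctuation", text: dropPunctuation(note) },
    { kind: "vague_timing", text: blurTiming(note) },
  ];
}
```

Each variant keeps a `kind` tag so that evaluation results can be broken down by perturbation type.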
Metrics
The prototype reports local classifier test accuracy, extraction F1, hallucination rate, negation accuracy, temporal accuracy, supported claims, missed details, and safety flags.
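To make two of these metrics concrete, the sketch below computes extraction F1 and a hallucination rate from a gold list and a predicted list of symptom strings. The definitions are assumptions for illustration: items are compared by exact match after lowercasing, and hallucination rate is taken to be the share of predicted items with no gold counterpart. The prototype's evaluator may define them differently.

```javascript
// Hypothetical helper names; matching is exact after trimming and lowercasing.
function normalize(item) {
  return item.trim().toLowerCase();
}

function extractionScores(gold, predicted) {
  const goldSet = new Set(gold.map(normalize));
  const predSet = new Set(predicted.map(normalize));

  // True positives: predicted items that also appear in the gold annotations.
  let tp = 0;
  for (const p of predSet) if (goldSet.has(p)) tp += 1;

  const precision = predSet.size ? tp / predSet.size : 0;
  const recall = goldSet.size ? tp / goldSet.size : 0;
  const f1 =
    precision + recall > 0 ? (2 * precision * recall) / (precision + recall) : 0;

  return {
    precision,
    recall,
    f1,
    // Assumed definition: fraction of predicted items not supported by gold.
    hallucinationRate: predSet.size ? (predSet.size - tp) / predSet.size : 0,
    // Gold items the extraction failed to surface ("missed details").
    missed: [...goldSet].filter((g) => !predSet.has(g)),
  };
}
```

For example, with gold items `fatigue`, `joint pain`, `nausea` and predicted items `Fatigue`, `joint pain`, `fever`, two of three predictions are supported, so precision, recall, and F1 are each 2/3, the hallucination rate is 1/3, and `nausea` is reported as missed.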
Evaluation-Guided Improvements
| Evaluation finding | Prototype change | Metric used |
|---|---|---|
| Negation-heavy notes can be misunderstood | Added explicit negated-symptom extraction and a negation error count in the evaluator | Negation accuracy |
| Summaries can imply causation without enough evidence | Added instructions to phrase patterns cautiously and evaluate unsupported causation | Hallucination rate and temporal accuracy |
| Urgent language should not depend only on generated text | Added a rule-based safety check before the model-generated summary is shown | Urgent risk terms |
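
The rule-based safety check and the negation handling in the table can be sketched together as follows. This is only an illustration under stated assumptions: the term list and negation cues are placeholders, not clinically reviewed vocabularies, and the names `URGENT_TERMS`, `NEGATION_CUES`, and `safetyCheck` are hypothetical.

```javascript
// Illustrative lists only; a real check would need clinically reviewed terms.
const URGENT_TERMS = ["chest pain", "shortness of breath", "fainted", "suicidal"];
const NEGATION_CUES = ["no", "not", "denies", "without", "never"];

function safetyCheck(note) {
  const flags = [];
  // Split into rough clauses so a negation cue only affects its own clause.
  const clauses = note.toLowerCase().split(/[.;!?\n]|,|\bbut\b/);
  for (const clause of clauses) {
    for (const term of URGENT_TERMS) {
      const idx = clause.indexOf(term);
      if (idx === -1) continue;
      // A term counts as negated when a cue word precedes it in the same clause.
      const before = clause.slice(0, idx).split(/\s+/).filter(Boolean);
      const negated = before.some((w) => NEGATION_CUES.includes(w));
      flags.push({ term, negated });
    }
  }
  // The note is urgent if at least one urgent term is asserted rather than denied.
  return { urgent: flags.some((f) => !f.negated), flags };
}
```

Because the check reads the patient's raw note rather than the generated summary, it still fires if the model omits or softens an urgent phrase. Negated matches are kept in `flags` so an evaluator can count negation errors separately.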
Trained Local Model
I trained two TF-IDF + multinomial logistic regression classifiers from scratch in JavaScript - no Python, no scikit-learn, no external ML libraries. Training data combines the public gretelai/symptom_to_diagnosis corpus with ~220 synthesised intake-meta sentences (assigned to a dedicated no_clear_domain class), ~300 multi-symptom intake-style examples, and hand-written domain anchors for under-represented presentations like hypertension and lower back pain. Features are unigrams and bigrams (so “chest pain” and “blurred vision” count as their own discriminative features) with L2-normalised TF-IDF weighting. The model is trained with SGD with L2 regularisation for 35 epochs.
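The pipeline described above can be sketched in a few dozen lines of plain JavaScript. This is a simplified illustration, not the prototype's training code: all function names are hypothetical, the smoothed IDF formula is one common variant chosen for the example, the learning rate and L2 strength are assumed values, and the bias term is omitted. Only the unigram and bigram features, the L2 normalisation, and the 35 epochs come from the description.

```javascript
// Lowercase word tokens plus adjacent-word bigrams such as "chest pain".
function ngrams(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  const bigrams = words.slice(1).map((w, i) => `${words[i]} ${w}`);
  return words.concat(bigrams);
}

// Learn the vocabulary and smoothed IDF weights (an assumed variant).
function fitTfidf(texts) {
  const docFreq = new Map();
  for (const text of texts) {
    for (const term of new Set(ngrams(text))) {
      docFreq.set(term, (docFreq.get(term) || 0) + 1);
    }
  }
  const vocab = new Map();
  const idf = [];
  for (const [term, df] of docFreq) {
    vocab.set(term, idf.length);
    idf.push(Math.log((1 + texts.length) / (1 + df)) + 1);
  }
  return { vocab, idf };
}

// Sparse L2-normalised TF-IDF vector: Map from feature index to weight.
function transform(tfidf, text) {
  const vec = new Map();
  for (const term of ngrams(text)) {
    const j = tfidf.vocab.get(term);
    if (j !== undefined) vec.set(j, (vec.get(j) || 0) + tfidf.idf[j]);
  }
  let norm = 0;
  for (const v of vec.values()) norm += v * v;
  norm = Math.sqrt(norm) || 1;
  for (const [j, v] of vec) vec.set(j, v / norm);
  return vec;
}

function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function classScores(weights, vec) {
  return weights.map((row) => {
    let s = 0;
    for (const [j, v] of vec) s += row[j] * v;
    return s;
  });
}

// Multinomial logistic regression trained with per-example SGD and L2 decay.
function train(texts, labels, { epochs = 35, lr = 0.5, l2 = 1e-4 } = {}) {
  const tfidf = fitTfidf(texts);
  const classes = [...new Set(labels)];
  const weights = classes.map(() => new Float64Array(tfidf.idf.length));
  const vectors = texts.map((t) => transform(tfidf, t));
  for (let epoch = 0; epoch < epochs; epoch++) {
    vectors.forEach((vec, i) => {
      const probs = softmax(classScores(weights, vec));
      classes.forEach((cls, c) => {
        const error = probs[c] - (labels[i] === cls ? 1 : 0);
        // Decay touches only the active features to keep each update sparse.
        for (const [j, v] of vec) {
          weights[c][j] -= lr * (error * v + l2 * weights[c][j]);
        }
      });
    });
  }
  return { tfidf, classes, weights };
}

function predict(model, text) {
  const vec = transform(model.tfidf, text);
  const probs = softmax(classScores(model.weights, vec));
  const best = probs.indexOf(Math.max(...probs));
  return { label: model.classes[best], confidence: probs[best] };
}
```

Calling `train(texts, labels)` with parallel arrays of sentences and labels returns a model object, and `predict(model, "chest pain")` returns a `{ label, confidence }` pair. A real run would also shuffle the examples each epoch and hold out a test split.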
At inference, each intake note is split into sentences, and each sentence is classified individually. Sentences predicted as no_clear_domain with high confidence (intake meta like “tried ibuprofen”) are dropped before the remaining sentences vote on a domain. The chosen prediction is fed live into the intake UI and passed into the language model prompt as a second opinion. Evaluation accuracy: ~98% on the held-out Gretel test split, ~100% on a held-out hand-built intake-style validation set (20 cases the model never saw in training), ~91% on the priority classifier.
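The inference steps above might look roughly like this. The sketch assumes a `classify` function that maps a sentence to `{ label, confidence }`; the 0.8 confidence threshold, the confidence-weighted vote, and the naive sentence splitter are all assumptions made for illustration, since the README does not specify them.

```javascript
// Naive splitter: breaks on ., !, ? and newlines. Abbreviations are not handled.
function splitSentences(note) {
  const parts = note.match(/[^.!?\n]+[.!?]*/g) || [];
  return parts.map((s) => s.trim()).filter(Boolean);
}

// `classify` is a stand-in for the trained per-sentence classifier.
function aggregateDomain(note, classify, metaThreshold = 0.8) {
  const votes = new Map();
  const dropped = [];
  for (const sentence of splitSentences(note)) {
    const { label, confidence } = classify(sentence);
    // Confident intake-meta sentences are removed before voting.
    if (label === "no_clear_domain" && confidence >= metaThreshold) {
      dropped.push(sentence);
      continue;
    }
    // Assumed voting rule: each sentence adds its confidence to its label.
    votes.set(label, (votes.get(label) || 0) + confidence);
  }
  let domain = "no_clear_domain";
  let best = 0;
  for (const [label, score] of votes) {
    if (score > best) {
      domain = label;
      best = score;
    }
  }
  return { domain, votes: Object.fromEntries(votes), dropped };
}
```

Returning the dropped sentences alongside the winning domain makes it easy to inspect which parts of a note were treated as intake meta. If every sentence is dropped, the result falls back to `no_clear_domain`.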
Product Roadmap
The useful product is not only a one-time summary. The longer-term app should help a patient prepare before a visit, capture what the clinician said afterward, and monitor what happens during the days after a treatment change.
| Phase | Current prototype | Coming next |
|---|---|---|
| Before appointment | Patient completes a pre-visit intake and generates a brief | Adaptive questions based on earlier answers, symptom history, pain, meds, and patient goals |
| During or after appointment | Doctor brief can be exported | Record clinician takeaways, prescriptions, next steps, and follow-up instructions |
| Between appointments | Single-run local result | Daily check-ins for symptoms, pain, side effects, and treatment response |
| Long-term use | Browser-local latest result | Patient-owned health timeline with flares, triggers, meds, and visit decisions |
Juno-Style Feature Inspiration
Public Juno materials describe a chronic illness app with natural conversations, continuous symptom tracking, longitudinal context, pattern detection, biometrics, and appointment-ready reports. FlareWise uses that category as inspiration while focusing this prototype on reliable pre-visit intake.
Audio Credits
Ambient rainfall loop: “Sound of light rainfall” from Wikimedia Commons, used under CC BY-SA 4.0. UI sounds are synthesized in-browser with the Web Audio API.
Pitch Summary
The project combines a trained local NLP classifier with a general pretrained language model, adapted to chronic-illness note understanding through task-specific schemas and evaluator prompts. Instead of treating the summary as automatically correct, the app checks unsupported claims, missed details, negation, timing, and safety risk.