FlareWise

Smart pre-visit intake for doctors

Hackathon Method

FlareWise focuses on whether a language model understands messy health notes faithfully enough to produce a useful appointment summary.

Problem

People with chronic illness often have scattered symptom notes and short appointments. A model can help organize notes, but missed negation, invented symptoms, or wrong timing can change the story.

Model Approach

Two from-scratch TF-IDF + multinomial logistic regression classifiers (trained in JavaScript, with no Python and no external ML libraries) run per sentence on every intake. Sentences classified as intake meta (for example "tried ibuprofen" or "doctor visit was Tuesday") are dropped before aggregation. The result is shown live during intake and is also passed into the language model prompt as a second opinion. The language model handles structured extraction, cautious generation, and a self-audit in a single pass.

Data Augmentation

The stress test creates noisy variants with typos, vague timing, missing punctuation, negation, contradictions, urgent language, and long paragraphs.
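
As a rough illustration of how such noisy variants could be generated, the sketch below applies three of the perturbations named above (typos, missing punctuation, vague timing) to a note. The function names, the typo rule, and the timing patterns are assumptions made for this example and are not taken from the FlareWise code.

```javascript
// Hypothetical perturbation helpers; names and rules are illustrative only.

// Simulate a typo by swapping the 2nd and 3rd characters of the longest word.
function addTypo(text) {
  const words = text.split(" ");
  let idx = 0;
  for (let i = 1; i < words.length; i++) {
    if (words[i].length > words[idx].length) idx = i;
  }
  const w = words[idx];
  if (w.length < 4) return text; // too short to scramble meaningfully
  words[idx] = w[0] + w[2] + w[1] + w.slice(3);
  return words.join(" ");
}

// Remove common sentence punctuation.
function stripPunctuation(text) {
  return text.replace(/[.,;:!?]/g, "");
}

// Replace a few specific time expressions with a vague one.
function vagueTiming(text) {
  const specific = /\b(yesterday|last night|this morning|\d+ (?:days?|weeks?) ago)\b/gi;
  return text.replace(specific, "a while ago");
}

// Build one labelled variant per perturbation.
function makeVariants(note) {
  return [
    { kind: "typo", text: addTypo(note) },
    { kind: "no_punctuation", text: stripPunctuation(note) },
    { kind: "vague_timing", text: vagueTiming(note) },
  ];
}
```

For the note "Headache started 3 days ago. No nausea.", the vague_timing variant reads "Headache started a while ago. No nausea.", which keeps the negation intact while removing the onset date the summary would otherwise rely on.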

Metrics

The prototype reports local classifier test accuracy, extraction F1, hallucination rate, negation accuracy, temporal accuracy, supported claims, missed details, and safety flags.
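
The exact metric definitions are not spelled out here, so the following is only one plausible reading: extraction F1 compares the set of extracted symptoms against a reference set, and hallucination rate is the share of extracted items that have no support in the reference. The function name and the set-based matching are assumptions.

```javascript
// Hypothetical scoring helper: compares extracted items with a reference list.
// Matching is exact after lowercasing, which is a simplifying assumption.
function extractionScores(predicted, gold) {
  const pred = new Set(predicted.map((s) => s.toLowerCase()));
  const ref = new Set(gold.map((s) => s.toLowerCase()));

  // True positives: extracted items that also appear in the reference.
  let tp = 0;
  for (const item of pred) {
    if (ref.has(item)) tp++;
  }

  const precision = pred.size ? tp / pred.size : 0;
  const recall = ref.size ? tp / ref.size : 0;
  const f1 =
    precision + recall > 0 ? (2 * precision * recall) / (precision + recall) : 0;

  // Share of extracted items with no support in the reference.
  const hallucinationRate = pred.size ? (pred.size - tp) / pred.size : 0;

  return { precision, recall, f1, hallucinationRate };
}

// Example: 2 of 3 extracted items are supported, 2 of 4 reference items found.
const scores = extractionScores(
  ["fatigue", "joint pain", "fever"],
  ["fatigue", "joint pain", "rash", "nausea"]
);
// scores.f1 is about 0.571 and scores.hallucinationRate is about 0.333
```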

Evaluation Guided Improvements

Evaluation finding | Prototype change | Metric used
Negation-heavy notes can be misunderstood | Added explicit negated symptom extraction and a negation error count in the evaluator | Negation accuracy
Summaries can imply causation without enough evidence | Added instructions to phrase patterns cautiously and evaluate unsupported causation | Hallucination rate and temporal accuracy
Urgent language should not depend only on generated text | Added a rule-based safety check before the model-generated summary is shown | Urgent risk terms
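
A rule-based safety check of the kind listed in the last row could look roughly like the sketch below. It runs on the raw note, independently of any generated text. The term list is a placeholder invented for this example, not the list FlareWise uses and not clinical guidance, and the plain substring match deliberately ignores negation.

```javascript
// Placeholder term list, invented for illustration only.
const URGENT_TERMS = [
  "chest pain",
  "shortness of breath",
  "fainted",
  "worst headache",
];

// Return every placeholder term that appears in the raw note text.
function findUrgentTerms(note) {
  const text = note.toLowerCase();
  return URGENT_TERMS.filter((term) => text.includes(term));
}

// Decide whether to show a safety flag before the summary is displayed.
function safetyCheck(note) {
  const matches = findUrgentTerms(note);
  return { flagged: matches.length > 0, matches };
}
```

Because the check is a simple substring match, "no chest pain" would still be flagged. For a safety flag that may be the preferred failure mode, since a false alarm costs less than a missed one.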

Trained Local Model

I trained two TF-IDF + multinomial logistic regression classifiers from scratch in JavaScript - no Python, no scikit-learn, no external ML libraries. Training data combines the public gretelai/symptom_to_diagnosis corpus with ~220 synthesised intake-meta sentences (assigned to a dedicated no_clear_domain class), ~300 multi-symptom intake-style examples, and hand-written domain anchors for under-represented presentations like hypertension and lower back pain. Features are unigrams and bigrams (so “chest pain” and “blurred vision” count as their own discriminative features) with L2-normalised TF-IDF weighting. Each model is trained with stochastic gradient descent (SGD) and L2 regularisation for 35 epochs.
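
To make the pipeline above concrete, here is a compact sketch of unigram + bigram TF-IDF features with L2 normalisation, followed by a softmax (multinomial logistic) classifier trained with plain SGD. It is not the FlareWise source: the function names, the smoothed IDF formula, the learning rate, and the regularisation strength are all assumptions. Only the 35 epochs come from the description above.

```javascript
// Illustrative sketch only: names, IDF formula, lr and l2 are assumptions.

// Lowercase and split into alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Unigrams plus adjacent-word bigrams, e.g. "chest", "pain", "chest pain".
function ngrams(tokens) {
  const out = [...tokens];
  for (let i = 0; i + 1 < tokens.length; i++) {
    out.push(tokens[i] + " " + tokens[i + 1]);
  }
  return out;
}

// Smoothed inverse document frequency learned from the training documents.
function fitIdf(docs) {
  const df = new Map();
  for (const doc of docs) {
    for (const term of new Set(ngrams(tokenize(doc)))) {
      df.set(term, (df.get(term) || 0) + 1);
    }
  }
  const idf = new Map();
  for (const [term, count] of df) {
    idf.set(term, Math.log((1 + docs.length) / (1 + count)) + 1);
  }
  return idf;
}

// Sparse TF-IDF vector scaled to unit L2 length; unseen terms are ignored.
function vectorize(doc, idf) {
  const vec = new Map();
  for (const term of ngrams(tokenize(doc))) {
    if (idf.has(term)) vec.set(term, (vec.get(term) || 0) + idf.get(term));
  }
  let norm = 0;
  for (const w of vec.values()) norm += w * w;
  norm = Math.sqrt(norm);
  for (const [term, w] of vec) vec.set(term, w / norm);
  return vec;
}

// Softmax over per-class linear scores (max subtracted for stability).
function predictProbs(vec, model) {
  const scores = model.weights.map((w, k) => {
    let s = model.bias[k];
    for (const [term, x] of vec) s += (w.get(term) || 0) * x;
    return s;
  });
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Per-example SGD on cross-entropy loss. L2 decay is applied only to the
// features present in the current example (a sparse shortcut, assumed here).
function train(vectors, labels, classes, { epochs = 35, lr = 0.5, l2 = 1e-4 } = {}) {
  const weights = classes.map(() => new Map());
  const bias = classes.map(() => 0);
  for (let epoch = 0; epoch < epochs; epoch++) {
    vectors.forEach((vec, i) => {
      const probs = predictProbs(vec, { weights, bias });
      classes.forEach((cls, k) => {
        const grad = probs[k] - (labels[i] === cls ? 1 : 0);
        bias[k] -= lr * grad;
        for (const [term, x] of vec) {
          const w = weights[k].get(term) || 0;
          weights[k].set(term, w - lr * (grad * x + l2 * w));
        }
      });
    });
  }
  return { weights, bias, classes };
}
```

A real training loop would normally shuffle examples each epoch and hold out a validation split. Both are omitted to keep the sketch short.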

At inference, each intake note is split into sentences and each sentence is classified individually. Sentences predicted as no_clear_domain with high confidence (intake meta like “tried ibuprofen”) are dropped before the remaining sentences vote on a domain. The chosen prediction is fed live into the intake UI and is also passed into the language model prompt as a second opinion. Evaluation accuracy: ~98% on the held-out Gretel test split, ~100% on a held-out hand-built intake-style validation set (20 cases the model never saw in training), and ~91% for the priority classifier.
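
The per-sentence filtering and voting step can be sketched as follows. The sentence classifier is passed in as a function, so the sketch does not depend on any particular model. The 0.8 confidence threshold, the confidence-weighted vote, and all function names are assumptions, since the exact rule is not given above.

```javascript
// Split a note into sentences on end punctuation or line breaks.
function splitSentences(note) {
  return note
    .split(/(?<=[.!?])\s+|\n+/)
    .map((s) => s.trim())
    .filter(Boolean);
}

// classifySentence is a hypothetical callback returning { label, confidence }.
// metaThreshold (assumed value) decides when a no_clear_domain sentence is dropped.
function classifyNote(note, classifySentence, metaThreshold = 0.8) {
  const votes = new Map();
  const dropped = [];

  for (const sentence of splitSentences(note)) {
    const { label, confidence } = classifySentence(sentence);
    if (label === "no_clear_domain" && confidence >= metaThreshold) {
      dropped.push(sentence); // intake meta: excluded from the vote
      continue;
    }
    // Confidence-weighted vote (an assumption; a plain count would also work).
    votes.set(label, (votes.get(label) || 0) + confidence);
  }

  let best = null;
  for (const [label, score] of votes) {
    if (best === null || score > best.score) best = { label, score };
  }
  return { domain: best ? best.label : "no_clear_domain", dropped };
}
```

If every sentence is dropped, the sketch falls back to no_clear_domain rather than guessing a domain.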

Product Roadmap

The useful product is not only a one-time summary. The longer-term app should help a patient prepare before a visit, capture what the clinician said afterward, and monitor what happens during the days after a treatment change.

Phase | Current prototype | Coming next
Before appointment | Patient completes a pre-visit intake and generates a brief | Adaptive questions based on earlier answers, symptom history, pain, meds, and patient goals
During or after appointment | Doctor brief can be exported | Record clinician takeaways, prescriptions, next steps, and follow-up instructions
Between appointments | Single-run local result | Daily check-ins for symptoms, pain, side effects, and treatment response
Long-term use | Browser-local latest result | Patient-owned health timeline with flares, triggers, meds, and visit decisions

Juno-Style Feature Inspiration

Public Juno materials describe a chronic illness app with natural conversations, continuous symptom tracking, longitudinal context, pattern detection, biometrics, and appointment-ready reports. FlareWise uses that category as inspiration while focusing this prototype on reliable pre-visit intake.

Voice or text symptom check-ins
Longitudinal health profile from conversations and history
Pattern and trigger detection over weeks and months
Appointment-ready reports for doctors
Biometrics and wearable context
Personalized non-diagnostic guidance based on patient history

Audio Credits

Ambient rainfall loop: “Sound of light rainfall” from Wikimedia Commons, used under CC BY-SA 4.0. UI sounds are synthesized in-browser with the Web Audio API.

Pitch Summary

The project combines a locally trained NLP classifier with a general pretrained language model that is adapted to chronic illness notes through task-specific schemas and evaluator prompts. Instead of treating the summary as automatically correct, the app checks for unsupported claims, missed details, negation errors, timing errors, and safety risk.

Developed by aher.dev