Smart pre-visit intake for doctors

Self-evaluation demo

Gemini generates its own synthetic notes (clean / typo / vague / negated / contradictory / urgent / long) and grades itself on each. This is a sanity-check demo, NOT an independent benchmark: real evaluation needs held-out, labelled data.

Test Type	Extraction F1	Hallucination Rate	Negation Accuracy	Temporal Accuracy	Note
Run the stress test to generate evaluation metrics.
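
The page does not say how the table's metrics are computed. As a rough sketch only, assuming extracted items are compared to a reference set by exact string match, Extraction F1 and Hallucination Rate could be defined as below. The function names and the set-based comparison are assumptions for illustration, not the demo's actual code.

```python
def extraction_f1(predicted: set, gold: set) -> float:
    """F1 between extracted items and reference items (exact match, assumed)."""
    if not predicted and not gold:
        return 1.0  # nothing to extract and nothing extracted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def hallucination_rate(predicted: set, gold: set) -> float:
    """Fraction of extracted items that have no match in the reference."""
    if not predicted:
        return 0.0
    return len(predicted - gold) / len(predicted)


# Hypothetical example: one invented item ("nausea"), one missed item ("fatigue").
predicted = {"cough", "headache", "nausea"}
gold = {"cough", "headache", "fatigue"}
f1 = extraction_f1(predicted, gold)
rate = hallucination_rate(predicted, gold)
```

Because the notes and the reference labels both come from the same model in this demo, scores computed this way say more about self-consistency than about accuracy.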

How evaluation improves model behavior

Negation stress

If negated findings such as "no fever" are missed, the schema and evaluator force negated symptoms to be tracked separately from present ones.
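
One way to picture this, as a minimal sketch rather than the demo's real schema: keep present and negated symptoms in separate fields, then score how many reference negations the extraction preserved. The class name, field names, and scoring rule here are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Extraction:
    # Hypothetical schema: present and negated findings live in separate fields.
    symptoms_present: set = field(default_factory=set)
    symptoms_negated: set = field(default_factory=set)


def negation_accuracy(pred: Extraction, gold: Extraction) -> float:
    """Share of reference negated findings that the prediction also marks negated."""
    if not gold.symptoms_negated:
        return 1.0
    kept = len(pred.symptoms_negated & gold.symptoms_negated)
    return kept / len(gold.symptoms_negated)


def flipped_negations(pred: Extraction, gold: Extraction) -> set:
    """Reference negated findings that the prediction wrongly reports as present."""
    return pred.symptoms_present & gold.symptoms_negated


# Note text (hypothetical): "Cough for 3 days. No fever. Denies chest pain."
gold = Extraction(symptoms_present={"cough"}, symptoms_negated={"fever", "chest pain"})
pred = Extraction(symptoms_present={"cough", "fever"}, symptoms_negated={"chest pain"})

accuracy = negation_accuracy(pred, gold)
flipped = flipped_negations(pred, gold)
```

Here the extraction kept one of two negations and turned "no fever" into a present symptom, which is the failure this stress test is meant to surface.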

Temporal stress

If a symptom began before a medication change, the evaluator flags any claim that the change caused it (unsupported causation) and any summary that gets the order of events wrong.
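
The time-order part of that check can be sketched as follows, assuming each event has already been extracted with a date. The `Event` type and `check_causal_claim` function are hypothetical names for illustration. The check only catches the impossible case where the effect predates the cause; correct ordering alone does not establish causation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Event:
    name: str
    when: date


def check_causal_claim(cause: Event, effect: Event) -> Optional[str]:
    """Return a flag message if 'cause led to effect' contradicts the timeline."""
    if effect.when < cause.when:
        return f"unsupported causation: '{effect.name}' predates '{cause.name}'"
    return None  # order is plausible; causation is still unproven


# Hypothetical timeline: the headache started before the dose was changed.
headache = Event("headache", date(2024, 3, 1))
dose_change = Event("medication change", date(2024, 3, 10))
dizziness = Event("dizziness", date(2024, 3, 12))

flag_headache = check_causal_claim(dose_change, headache)
flag_dizziness = check_causal_claim(dose_change, dizziness)
```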

Safety stress

If urgent terms appear, a rule-based safety layer flags them before any generated summary is trusted.
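
A rule-based layer of this kind might look like the sketch below: a fixed term list matched against the raw note, run before the generated summary is used. The term list, function names, and return shape are assumptions for illustration; the terms are examples only and are not a clinically reviewed list.

```python
import re

# Illustrative terms only (assumed); a real list would need clinical review.
URGENT_PATTERNS = [
    r"chest pain",
    r"shortness of breath",
    r"suicidal",
    r"worst headache",
]
URGENT_RE = re.compile(
    "|".join(f"(?:{p})" for p in URGENT_PATTERNS), re.IGNORECASE
)


def urgent_flags(note: str) -> list:
    """Return each urgent term found in the raw note, lowercased, in order."""
    return [m.group(0).lower() for m in URGENT_RE.finditer(note)]


def gate_summary(note: str, summary: str) -> dict:
    """Check the raw note with rules first, then attach the generated summary."""
    flags = urgent_flags(note)
    return {"urgent": bool(flags), "flags": flags, "summary": summary}


result = gate_summary(
    "Pt reports Chest Pain and shortness of breath since morning.",
    "Patient has mild respiratory complaints.",
)
```

The rules read the original note rather than the model output, so an urgent term is still flagged even when the summary omits or softens it. This sketch deliberately ignores negation, so "no chest pain" would also be flagged; over-flagging is the conservative direction for a safety check.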

Developed by aher.dev