Validation & Quality Assurance¶
The Validation tab helps you assess and improve the quality of your extracted timeline data. ChronoScope provides three validation approaches: Quick Checks, Gold Standard Comparison, and Duplicate Management.
Why Validation Matters¶
AI-powered extraction isn't perfect. LLMs can:

- Miss events mentioned in documents
- Extract irrelevant information as events
- Parse dates incorrectly
- Create duplicate entries from similar mentions
Validation helps you identify and fix these issues before using your timeline for important purposes like job applications or presentations.
Quick Validation Checks¶
What It Does¶
Quick validation performs automatic quality checks on your events without needing reference data. It identifies common extraction issues instantly.
Common Issues Detected¶
Missing Data:

- Events without titles (generic "Event from resume" titles)
- Missing start dates (rare, but possible)
- Empty descriptions
- No location specified
- No tags assigned

Low Confidence Scores:

- Events with confidence < 0.7 (often from fallback extraction)
- Generic event titles indicating poor extraction quality

Date Issues:

- Events using today's date instead of the actual date (extraction bug)
- Implausible date ranges (e.g., spanning 50+ years)
- End date before start date
How to Use:
- Go to the Validation tab (🔍)
- Review your events in the "Event Details" expander
- Check for:
    - Events with confidence < 0.7
    - Generic titles like "Event from resume"
    - Missing location/people/tags
- Edit or delete problematic events in the Data Table
Gold Standard Validation¶
What It Is¶
Gold Standard Validation compares your extracted events against a manually-curated reference timeline (the "gold standard"). This provides quantitative metrics on extraction quality.
Metrics Calculated:

- Precision - What percentage of extracted events are correct?
- Recall - What percentage of actual events were successfully extracted?
- F1-Score - Harmonic mean of precision and recall (overall quality)
How to Use Gold Standard Validation¶
Step 1: Create Gold Standard Data¶
Before you can validate, you need a gold standard file:
- Manually review your source documents
- List all actual events with accurate details
- Save as gold_standard_data.json in the project directory
Example format:
```json
[
  {
    "title": "Senior Software Engineer at TechCorp",
    "start_date": "2020-01-15",
    "end_date": "2023-03-30",
    "location": "San Francisco, CA",
    "tags": ["career", "tech"]
  },
  {
    "title": "M.S. Computer Science, Stanford",
    "start_date": "2018-09-01",
    "end_date": "2020-06-15",
    "tags": ["education"]
  }
]
```
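A malformed gold standard file is a common reason validation silently fails, so it is worth sanity-checking the file before running validation. This is a hedged sketch: the documentation doesn't specify which fields ChronoScope requires, so treating title and start_date as mandatory is an assumption based on the example above.

```python
import json
from datetime import datetime

REQUIRED_FIELDS = {"title", "start_date"}  # assumed minimum; adjust to your schema

def load_gold_standard(path: str) -> list[dict]:
    """Load and sanity-check gold_standard_data.json (illustrative helper)."""
    with open(path, encoding="utf-8") as f:
        events = json.load(f)
    for i, event in enumerate(events):
        missing = REQUIRED_FIELDS - event.keys()
        if missing:
            raise ValueError(f"event {i} is missing fields: {sorted(missing)}")
        # Dates in the example above use ISO format (YYYY-MM-DD)
        datetime.strptime(event["start_date"], "%Y-%m-%d")
    return events
```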
Step 2: Run Validation¶
- Go to the Validation tab (🔍)
- Ensure you have uploaded and extracted events
- Click "Run Validation" button
- Wait for comparison to complete (a few seconds)
Step 3: Interpret Results¶
Precision:

- 0.8 - 1.0 (80-100%) - Excellent! Most extracted events are correct
- 0.6 - 0.8 (60-80%) - Good, but some incorrect extractions
- 0.4 - 0.6 (40-60%) - Fair, many false positives
- < 0.4 (< 40%) - Poor, extraction needs improvement
Formula: Precision = Correct Extractions / Total Extracted
Example: 8 correct + 2 wrong = 8/10 = 0.80 precision
Recall:

- 0.8 - 1.0 (80-100%) - Excellent! Almost all events were found
- 0.6 - 0.8 (60-80%) - Good, most events captured
- 0.4 - 0.6 (40-60%) - Fair, many events missed
- < 0.4 (< 40%) - Poor, extraction is incomplete
Formula: Recall = Correct Extractions / Gold Standard Events
Example: 8 found + 2 missed = 8/10 = 0.80 recall
F1-Score:

- 0.8+ 🟢 Excellent - Ready for production use
- 0.6 - 0.8 🟡 Good - Minor improvements needed
- 0.4 - 0.6 🟠 Fair - Significant improvements needed
- < 0.4 🔴 Needs Work - Re-extract with better prompts/documents
Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall)
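The three formulas above reduce to a few lines of arithmetic. A minimal sketch, using the worked numbers from the examples:

```python
def validation_metrics(correct: int, extracted: int, gold: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts, guarding against division by zero."""
    precision = correct / extracted if extracted else 0.0
    recall = correct / gold if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1

# 8 correct out of 10 extracted, against 10 gold standard events:
# precision = 0.80, recall = 0.80, F1 = 0.80
```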
Understanding Validation Results¶
Matches Tab (✅)¶
Shows events successfully extracted and matched to gold standard.
What you'll see:

- Side-by-side comparison of extracted vs. gold standard
- Similarity score (how close the match is, 0-1)
- Matched fields (title, dates, location, tags)

When matches have low similarity:

- Check if dates differ slightly (extraction used approximate dates)
- Verify title/description wording variations
- Confirm location format differences
Missed Events Tab (⚠️)¶
Shows gold standard events that were not extracted.
Common causes:

- Event mentioned in the document but the LLM failed to recognize it
- Event description too vague or buried in text
- Fallback extraction missed the pattern
- Document section not processed (truncation at 16,000 chars)

How to fix:

- Re-extract with better document formatting
- Manually add missed events via the Data Table
- Check whether the event was mentioned in the truncated portion of a long document
- Consider a different document type classification
Extra Events Tab (➕)¶
Shows extracted events that don't match gold standard (false positives).
Common causes:

- Generic extraction of non-events (e.g., skill lists, references)
- Duplicate events with slight variations
- Incorrectly parsed sentences
- Fallback extraction creating noise

How to fix:

- Delete extra events in the Data Table
- Increase the minimum confidence filter
- Manually review and keep legitimate events not in the gold standard
- Re-extract with more specific prompts
Duplicate Detection & Management¶
Why Duplicates Occur¶
Documents often mention the same event multiple times:

- A resume lists a job in both "Experience" and "Summary"
- A cover letter references the same position as the resume
- Multiple documents describe the same achievement

ChronoScope's duplicate detector identifies these using:

- Title similarity - Fuzzy string matching
- Date overlap - Events with overlapping timeframes
- Location matching - The same place suggests the same event
How to Find Duplicates¶
- Go to Validation tab
- Scroll to "Event Management" section
- Click "Find Potential Duplicates"
- Review groups of similar events
Similarity threshold: The default is 0.7 (70% similar).

- Higher = stricter matching (fewer duplicates found)
- Lower = looser matching (more duplicates, some false positives)
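The exact matcher ChronoScope uses isn't specified here, but fuzzy title matching against a 0.7 threshold can be approximated with Python's standard library. This is a stand-in sketch, not the app's implementation:

```python
from difflib import SequenceMatcher
from itertools import combinations

def title_similarity(a: str, b: str) -> float:
    """Case-insensitive fuzzy match in [0, 1] (stand-in for the real matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_duplicate_pairs(titles: list[str], threshold: float = 0.7) -> list[tuple[int, int]]:
    """Return index pairs of titles at or above the similarity threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(titles), 2)
        if title_similarity(a, b) >= threshold
    ]
```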
Managing Duplicate Groups¶
When duplicates are found, you'll see expandable groups:
Group Display:

- Group 1: 3 similar events
    - Event A: "Software Engineer at Google"
    - Event B: "Software Engineer at Google Inc."
    - Event C: "SWE at Google"
Three Actions Available:
1. Merge Events¶
When to use: Events are truly duplicates, but contain complementary information.
How it works:

1. Review all events in the group
2. Select the "primary" event (best quality, most complete)
3. Click "Merge into Primary"
4. Secondary events' information merges into the primary:
    - Tags combined
    - People combined
    - Descriptions combined
    - Earliest start date used
    - Latest end date used
5. Secondary events are deleted
Result: One comprehensive event with all information retained.
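The merge rules (combine tags and people, keep the earliest start and latest end) can be sketched as below. Field names follow the gold-standard example and are assumptions about ChronoScope's internal event shape:

```python
def merge_into_primary(primary: dict, duplicates: list[dict]) -> dict:
    """Fold duplicate events into the primary event per the rules above (sketch)."""
    merged = dict(primary)
    for dup in duplicates:
        merged["tags"] = sorted(set(merged.get("tags", [])) | set(dup.get("tags", [])))
        merged["people"] = sorted(set(merged.get("people", [])) | set(dup.get("people", [])))
        # ISO date strings (YYYY-MM-DD) compare correctly as plain strings
        merged["start_date"] = min(merged["start_date"], dup["start_date"])  # earliest start
        merged["end_date"] = max(merged["end_date"], dup["end_date"])        # latest end
        if dup.get("description"):
            merged["description"] = (merged.get("description", "") + " " + dup["description"]).strip()
    return merged
```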
2. Delete Duplicates¶
When to use: Events are duplicates with no unique information.
How it works:

1. Select the primary event to keep
2. Click "Delete Duplicates"
3. Only the primary survives; the others are deleted
4. No information merging occurs
Result: Clean timeline with duplicates removed, single event kept.
3. Delete Entire Group¶
When to use: All events in group are errors/noise.
How it works:

1. Click "Delete Entire Group"
2. Confirm deletion
3. All events in the group are removed
Result: Group completely eliminated from timeline.
Validation Workflow¶
Recommended Process¶
1. Initial Upload & Extraction
    - Upload all documents
    - Extract events (allow both LLM and fallback)
    - Don't filter or delete anything yet
2. Quick Visual Check
    - Go to the Data Table
    - Sort by Confidence (ascending) to see the worst extractions first
    - Look for generic titles ("Event from resume"), confidence < 0.7, and today's date on historical events
3. Duplicate Detection
    - Go to the Validation tab
    - Click "Find Potential Duplicates"
    - Merge or delete obvious duplicates
    - Keep ambiguous cases for manual review
4. Gold Standard Comparison (Optional)
    - Create a gold standard file
    - Run validation
    - Review Missed Events - add manually if important
    - Review Extra Events - delete if noise
5. Iterative Improvement
    - Delete low-confidence events or re-extract documents
    - Manually edit event details for accuracy
    - Re-run validation to measure improvement
    - Repeat until F1-score > 0.8
Improving Validation Scores¶
If Precision is Low (Too Many Extra Events)¶
Solutions:

- Increase the minimum confidence filter (show only confident extractions)
- Delete events with generic titles
- Re-extract with more specific document classification
- Review and delete obviously incorrect events
If Recall is Low (Missing Events)¶
Solutions:

- Check truncation - long documents may be cut off at 16,000 chars
- Re-extract with the full document text
- Manually add missed events via the Data Table
- Check if events are mentioned in a format the LLM doesn't recognize
- Try uploading different versions of the document
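To check the truncation point yourself, compare the document's length against the 16,000-character limit mentioned above. The limit comes from this guide; the helper itself is illustrative:

```python
TRUNCATION_LIMIT = 16_000  # character cut-off mentioned in this guide

def truncation_report(text: str) -> str:
    """Report whether part of a document may fall past the extraction cut-off."""
    if len(text) <= TRUNCATION_LIMIT:
        return "OK: full document fits within the limit"
    overflow = len(text) - TRUNCATION_LIMIT
    return f"WARNING: {overflow} characters beyond the {TRUNCATION_LIMIT}-char limit may be ignored"
```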
If Both Precision and Recall are Low¶
Solutions:

- Restart the extraction process with better document preparation
- Ensure documents are text-extractable (not scanned images)
- Check that the OpenAI API key is working (not constantly falling back)
- Review extraction logs in Advanced Settings
Validation Best Practices¶
✅ Do:

- Validate after every major document upload
- Create a gold standard for your most important document (resume/CV)
- Review duplicates regularly (after adding 10+ events)
- Check confidence scores before important use (job applications)
- Delete or improve events with confidence < 0.6

❌ Don't:

- Skip validation on documents you'll use professionally
- Trust extraction blindly (always review)
- Ignore "Extra Events" (they may be important edge cases)
- Delete events just because they're not in the gold standard (the gold standard may be incomplete)
Troubleshooting¶
"No validation results appear"
Cause: Gold standard file missing or malformed.
Solution: Create gold_standard_data.json with proper format (see above).
"Precision is 0%"
Cause: No events matched gold standard at all.
Diagnosis:

- Check that date formats match (gold standard vs. extracted)
- Verify the title similarity threshold isn't too strict
- Ensure the gold standard events actually exist in the documents
"Everything is a duplicate"
Cause: The 0.7 (70%) similarity threshold is too loose for your data, so near-matches are grouped aggressively.

Solution: The threshold is currently hard-coded at 0.7 and can't be raised in the UI. Merge truly similar events and ignore false-positive groups.
Next Steps¶
After validating your timeline:
- Export to TimelineJS - Share your clean timeline
- Filtering Events - Focus on high-quality events
- Viewing Timeline - Explore your validated data
Back to Documentation Home