# ChronoScope - Personal Timeline Builder

## Project Overview
ChronoScope is an intelligent personal timeline builder that transforms documents into interactive, chronological visualizations. Using LLM-powered extraction and Streamlit for the interface, it automatically processes resumes, cover letters, and other documents to create comprehensive life timelines.
## Current Architecture

### Core Components
- **Data Models** (✅ REFACTORED: `chrono_scope/models/timeline_event.py`)
  - `TimelineEvent`: Core event model with temporal, location, and metadata fields
  - Supports date ranges, people tagging, confidence scoring, and priority levels
  - Enhanced methods: `duration_days()`, `to_dict()`, `is_ongoing()`, `overlaps_with()`
  - Monolith now imports from `chrono_scope.models` (Week 2 refactoring complete)
- **Document Processing** (`timeline-mvp-pipeline.py:136-282`)
  - `DocumentProcessor`: Context-aware document classification and event extraction
  - LLM integration via OpenAI API with fallback to rule-based extraction
  - Document type detection: resume, cover letter, personal statement, general
- **Data Storage** (✅ REFACTORED: `chrono_scope/services/timeline_store.py`)
  - `TimelineStore`: JSON-based persistence with backup/restore
  - Advanced filtering by date, people, tags, priority
  - Duplicate detection with combined similarity scoring (title + date)
  - Event CRUD operations and batch management
  - Monolith now imports from `chrono_scope.services` (Week 3 refactoring complete)
  - Test coverage: 28 comprehensive unit tests
- **Visualization** (`timeline-mvp-pipeline.py:356-473`)
  - `TimelineVisualizer`: Interactive Plotly-based timeline and matrix views
  - Gantt charts, priority matrices, and temporal distribution charts
- **Streamlit App** (`timeline-mvp-pipeline.py:475-652`)
  - Web interface with document upload, filtering, and visualization
  - Real-time processing with progress indicators
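The enhanced methods listed above suggest a small dataclass surface. A minimal, framework-free sketch of that shape (field names beyond those mentioned in this section are assumptions, not the refactored module's exact schema):

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import Optional

@dataclass
class TimelineEvent:
    title: str
    start_date: date
    end_date: Optional[date] = None  # None means the event is still ongoing
    people: list = field(default_factory=list)
    confidence: float = 1.0
    priority: int = 5

    def is_ongoing(self) -> bool:
        return self.end_date is None

    def duration_days(self) -> Optional[int]:
        if self.end_date is None:
            return None
        return (self.end_date - self.start_date).days

    def overlaps_with(self, other: "TimelineEvent") -> bool:
        # Treat a missing end date as open-ended so ongoing events still compare
        self_end = self.end_date or date.max
        other_end = other.end_date or date.max
        return self.start_date <= other_end and other.start_date <= self_end

    def to_dict(self) -> dict:
        d = asdict(self)
        d["start_date"] = self.start_date.isoformat()
        d["end_date"] = self.end_date.isoformat() if self.end_date else None
        return d
```

Treating a missing `end_date` as open-ended keeps `overlaps_with()` well-defined for current positions and other ongoing events.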
### Current Data State

- Storage: `data/timeline_events.json` contains extracted events (standardized path as of Week 1)
- User Notes: `data/user_notes.json` for stream-of-consciousness notes
- Events Include: Work positions, education, certifications, projects
- Test Coverage: 25 comprehensive unit tests for the TimelineEvent model (Week 2)
## Development Setup

### Environment

- Python 3.12 with virtual environment (`.venv/`)
- Streamlit configuration in `.streamlit/secrets.toml`
- OpenAI API integration for LLM processing
### Key Dependencies

- `streamlit`: Web interface framework
- `plotly`: Interactive visualization library
- `openai`: LLM integration for document processing
- `pandas`: Data manipulation
- `python-dateutil`: Flexible date parsing
## UI/UX Design Philosophy & Implementation Guidelines

### ✅ COMPLETED: Major UI/UX Overhaul (September 2025)

**Design Philosophy:** "Professional Simplicity with Progressive Disclosure"

The ChronoScope interface follows a carefully crafted design approach that prioritizes user experience while maintaining powerful functionality. The core principle is "Alert Fatigue Elimination" - transforming a debug-console-like interface into a professional timeline application.
### Key Design Principles Applied

1. **Information Hierarchy First**
    - Timeline content is the hero - prominently displayed above the fold
    - Secondary features are progressively disclosed through Advanced Settings
    - Visual weight decreases as functionality becomes more technical
2. **Alert Fatigue Prevention**
    - Reduced from 8+ alerts to 2-3 strategic notifications
    - System diagnostics hidden by default in collapsible sections
    - User control over notification preferences with dismissible warnings
3. **Professional Aesthetic Standards**
    - Gradient header design (#667eea to #764ba2) for brand identity
    - Consistent color palette with semantic meaning (success, warning, info)
    - Card-based layouts with subtle shadows (0 2px 4px rgba(0,0,0,0.1))
    - Rounded corners (0.5rem) for modern, friendly appearance
### CSS Design System Implementation

```css
/* Core Design Tokens Used */
--primary-gradient: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
--card-shadow: 0 2px 4px rgba(0,0,0,0.1);
--border-radius: 0.5rem;
--success-color: #d4edda;
--warning-color: #fff3cd;
--info-color: #d1ecf1;
```

**Component Patterns:**
- Metric Cards: White background, subtle borders, consistent padding (1rem)
- Section Dividers: Clean lines with proper spacing
- Interactive Elements: Hover transitions (transform: translateY(-1px))
- Typography: Clear hierarchy with appropriate line heights
### User Experience Patterns

1. **Progressive Disclosure**
    - Casual Users: See clean timeline interface with essential controls
    - Power Users: Toggle Advanced Settings for LLM transparency, system health
    - Expert Users: Expandable sections for debugging and performance monitoring
2. **Contextual Information**
    - Strategic Notifications: Only show critical or actionable alerts
    - Dismissible Warnings: User preferences persist across sessions
    - Empty States: Engaging graphics with clear next steps
3. **Visual Feedback**
    - Loading States: Clear progress indicators during processing
    - Success Patterns: Green alerts with checkmark icons
    - Error Handling: Contextual error messages with recovery suggestions
### Implementation Guidelines for Future Development

**ALWAYS Follow These UI Patterns:**

1. **Before Adding Any Alert/Notification:**
    - Ask: "Is this critical for the user's current task?"
    - Consider: "Can this be moved to Advanced Settings?"
    - Ensure: "Is this dismissible if it's informational?"
2. **For New Features:**
    - Start with the main use case - don't expose advanced options immediately
    - Use progressive disclosure - basic → intermediate → advanced
    - Maintain visual hierarchy - most important content gets most visual weight
3. **CSS Standards:**
    - Always use the established color palette - don't introduce new colors
    - Follow border-radius consistency (0.5rem for cards, 0.375rem for alerts)
    - Maintain shadow patterns for depth and visual hierarchy
    - Use hover transitions for interactive elements
4. **Information Architecture:**
    - Timeline content first - always prioritize the main functionality
    - Administrative features last - system health, debug info, advanced settings
    - Logical grouping - related features in same sidebar section
### Session State Management for UX

```python
# User Preference Patterns Implemented
if 'show_advanced_settings' not in st.session_state:
    st.session_state.show_advanced_settings = False
if 'show_langchain_warning' not in st.session_state:
    st.session_state.show_langchain_warning = not LANGCHAIN_AVAILABLE
```
**Key Learnings:**
- User preferences must persist within sessions
- Toggle states should be intuitive (show/hide patterns)
- Default to the simplest view for new users
### Measured Success Metrics
- 70% reduction in UI noise (from 8+ alerts to 2-3)
- Improved visual hierarchy with timeline content prominently featured
- Professional aesthetic matching modern SaaS application standards
- Maintained full functionality while dramatically improving clarity
### Anti-Patterns to Avoid

**❌ Never Do:**
- Show system health checks prominently unless there are critical issues
- Display technical debugging information in the main interface
- Use more than 3 alert boxes on a single view
- Introduce new colors without updating the design system
- Place advanced features above basic functionality

**✅ Always Do:**
- Prioritize timeline content in visual hierarchy
- Use collapsible sections for advanced features
- Maintain consistent spacing and visual patterns
- Test with both demo data and real user content
- Consider mobile responsiveness in layout decisions
### Design Process & Testing Methodology

**Testing Approach Used:**
1. Browser-Based Evaluation: Used Playwright automation to capture before/after screenshots
2. Interactive Testing: Tested all user flows (demo data loading, advanced settings, tab navigation)
3. Alert Audit: Systematically identified and categorized every alert/notification
4. Progressive Enhancement: Implemented changes incrementally and tested at each step

**Key Testing Insights:**
- Screenshot comparison revealed dramatic improvement in visual hierarchy
- User flow testing confirmed functionality preservation during UI changes
- Alert categorization identified 5 unnecessary system notifications
- Session state testing validated user preference persistence

**Files Generated During Testing:**
- `streamlit_app_current_state.png` - Original cluttered interface
- `improved_streamlit_app.png` - Clean interface with demo data
- `improved_app_with_advanced_settings.png` - Advanced mode demonstration

**Recommendation for Future UI Changes:** Always use browser automation testing to validate UI changes and capture visual evidence of improvements. The before/after screenshots provide clear justification for design decisions and help maintain design quality standards.
## User Data & Preferences Management

### User Notes Storage System ✅ IMPLEMENTED

**Implementation:** JSON-based persistent storage for stream-of-consciousness notes
#### Core Components

1. **`UserNotesStore` class** (`timeline-mvp-pipeline.py:837-925`)
    - JSON-based storage with backup/restore functionality
    - Multiple named notes support with metadata tracking
    - Character count and last-modified timestamps
    - Error handling with automatic backup recovery
2. **User Interface Integration**
    - Placement: Dedicated section below main tabs (not sidebar)
    - Features: Multiple named notes, manual save, delete functionality
    - User Experience: Persistent across sessions, clear metadata display
#### Storage Pattern

`user_notes.json` structure:

```json
{
  "notes": {
    "note_name": {
      "content": "User's stream-of-consciousness text...",
      "last_modified": "2025-09-15T10:30:00",
      "character_count": 150
    }
  },
  "last_updated": "2025-09-15T10:30:00",
  "version": "1.0"
}
```
#### UI/UX Patterns for User Data

1. **Progressive Note Management**
    - Empty State: Clear explanation and helpful placeholder text
    - Single Note: Simple interface focused on content creation
    - Multiple Notes: Dropdown selection with metadata display
2. **Data Persistence Standards**
    - Automatic Loading: Notes loaded on app startup via session state
    - Manual Save: User-controlled with clear feedback
    - Error Recovery: Backup system prevents data loss
3. **Visual Design Consistency**
    - Layout: 3:1 column ratio for content vs metadata
    - Buttons: Primary save button, secondary delete with confirmation
    - Feedback: Success/error messages with appropriate icons
### Session State Management Patterns ✅ ENHANCED

```python
# User Preference Patterns (Extended)
if 'show_advanced_settings' not in st.session_state:
    st.session_state.show_advanced_settings = False
if 'show_langchain_warning' not in st.session_state:
    st.session_state.show_langchain_warning = not LANGCHAIN_AVAILABLE
if 'notes_store' not in st.session_state:
    st.session_state.notes_store = UserNotesStore()
```
**Key Principles:**
- Lazy Initialization: Create storage objects only when needed
- Persistent State: User preferences survive page refreshes
- Clear Separation: Different storage systems for different data types
## Interactive Component Standards ✅ ENHANCED

### Tooltip Design Specifications

**Implementation:** Enhanced Plotly hover templates with professional formatting

#### Current Standards Applied
1. **Typography Hierarchy**
2. **Content Structure**
    - Visual Icons: Emojis for quick visual parsing (📅📍📝🏷️👥⭐🎯)
    - Bold Labels: Clear field identification
    - Truncated Content: Descriptions limited to 80-120 characters
    - Spacing: Double line breaks between title and content
3. **Information Priority**
    - Primary: Event title (largest, prominent color)
    - Secondary: Date, location, description (structured layout)
    - Tertiary: Tags, people, metadata (smaller but accessible)
#### Tooltip Best Practices

**✅ Do:**
- Use consistent icon vocabulary across all tooltips
- Truncate long descriptions with "..." indicator
- Show percentage format for confidence scores (85.3% vs 0.853)
- Include priority as "X/10" format for clarity
- Maintain visual hierarchy with font sizes and colors

**❌ Don't:**
- Exceed 3-4 lines of detailed information
- Use technical field names (use "Date" not "start_date")
- Include empty or null fields without handling
- Mix formatting styles across different chart types
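A hover-text builder that follows these rules might look like the sketch below (`build_hover_text` is a hypothetical helper, not the app's actual function; its output would be passed to Plotly's hover text):

```python
def build_hover_text(title: str, date_range: str, description: str,
                     confidence: float, priority: int, max_desc: int = 100) -> str:
    """Assemble tooltip HTML following the Do/Don't rules above."""
    # Truncate long descriptions with an explicit "..." indicator
    if len(description) > max_desc:
        description = description[:max_desc].rstrip() + "..."
    return (
        f"<b>{title}</b><br><br>"               # title, then double line break
        f"📅 Date: {date_range}<br>"
        f"📝 {description}<br>"
        f"🎯 Confidence: {confidence:.1%}<br>"  # 0.853 -> "85.3%"
        f"⭐ Priority: {priority}/10"
    )
```

Keeping the template a static string builder (no API calls, no per-hover computation) is what makes the performance standards below achievable.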
#### Performance Standards
- Hover Response: <200ms for tooltip appearance
- Content Loading: Tooltips should not trigger additional API calls
- Memory Usage: Tooltip templates should be static, not dynamic generation
## Current Development Phase: Transparency & Monitoring Focus

### Phase 1: Transparent Data Pipeline (In Progress)

**Core Philosophy:** Full visibility into the LLM extraction process, performance tracking, and user control over data quality.
#### 1.1 Essential UX/UI Features ✅ STARTED

- ✅ Clear Data Button: Implemented with confirmation dialog
- 🚧 Events Table View: Sortable, filterable table showing all extracted data
  - Columns: Title, Date Range, Confidence, Source Doc, Model Used, Processing Time
- 🚧 Event Merge Interface: Side-by-side comparison tool for duplicate events
- 🚧 LLM Transparency Panel:
  - Show actual prompt sent to LLM
  - Display model name, parameters (temperature, max_tokens)
  - Show raw LLM response vs parsed result
  - Processing time and cost tracking
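One way to back the planned transparency panel is to capture every extraction call in a small record. A sketch under assumed names (`LLMCallRecord` and `record_call` are illustrative; the real OpenAI request would live inside `call_fn`):

```python
import time
from dataclasses import dataclass

@dataclass
class LLMCallRecord:
    """Everything the transparency panel would display for one extraction call."""
    model: str
    prompt: str
    temperature: float
    max_tokens: int
    raw_response: str = ""
    elapsed_s: float = 0.0

def record_call(model, prompt, call_fn, temperature=0.0, max_tokens=1024):
    """Wrap an LLM call so prompt, parameters, raw output, and timing are all kept."""
    rec = LLMCallRecord(model, prompt, temperature, max_tokens)
    start = time.perf_counter()
    rec.raw_response = call_fn(prompt)  # call_fn wraps the actual API request
    rec.elapsed_s = time.perf_counter() - start
    return rec
```

Showing the stored prompt and raw response side by side with the parsed result is then a pure display problem for the panel.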
#### 1.2 Enhanced Error Handling & Feedback
- 🚧 Processing Status Dashboard: Real-time progress with detailed logging
- 🚧 Error Recovery Interface: Show failed extractions with retry options
- 🚧 Validation Warnings: Highlight suspicious dates, missing data, low confidence scores
- 🚧 Debug Mode Toggle: Advanced users can see full processing pipeline
#### 1.3 Performance Tracking Infrastructure

- 🚧 Document Type Performance Matrix: Track extraction success by document type
- 🚧 Model Performance Comparison: A/B test different models/prompts
- 🚧 Extraction Metrics Dashboard:
  - Events extracted per document
  - Average confidence scores
  - Processing time trends
  - Error rates by document type
- 🚧 Data Quality Reports:
  - Duplicate detection statistics
  - Date parsing accuracy
  - Missing field analysis
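The duplicate-detection statistics above build on the existing combined similarity scoring (title + date) described under Data Storage. A plausible stdlib-only sketch of that scoring, with the 70/30 weighting as an assumed parameter:

```python
from datetime import date
from difflib import SequenceMatcher

def duplicate_score(title_a: str, date_a: date, title_b: str, date_b: date,
                    title_weight: float = 0.7) -> float:
    """Combined similarity in [0, 1]: fuzzy title match plus date proximity."""
    title_sim = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
    gap_days = abs((date_a - date_b).days)
    # Same day -> 1.0; a year or more apart -> 0.0
    date_sim = max(0.0, 1.0 - gap_days / 365.0)
    return title_weight * title_sim + (1.0 - title_weight) * date_sim
```

Pairs scoring above a tuned threshold would then be surfaced as duplicate candidates in the report.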
### Phase 2: Beautiful Interactive Visualization (Planned)

#### 2.1 Enhanced Plotly Timeline Features
- 🔲 Multi-Track Timeline: Separate tracks for career, education, personal
- 🔲 Zoom & Pan Controls: Smooth navigation across time periods
- 🔲 Event Detail Popups: Rich hover cards with full context
- 🔲 Timeline Themes: Visual styles (professional, academic, personal)
- 🔲 Filtering Animations: Smooth transitions when applying filters
#### 2.2 Advanced Analytics Views

- 🔲 Performance Monitoring Charts:
  - Extraction accuracy over time
  - Processing time trends
  - Model performance comparison
- 🔲 Data Quality Visualization:
  - Confidence score distribution
  - Coverage gaps timeline
  - Duplicate detection network graph
### Phase 3: Export & Documentation (Planned)

#### 3.1 Multi-Format Export
- 🔲 PDF Reports: Professional timeline with metadata
- 🔲 Excel/CSV: Raw data with processing annotations
- 🔲 JSON: Full event data with provenance tracking
- 🔲 PNG/SVG: High-quality timeline images
- 🔲 HTML: Standalone interactive timeline
#### 3.2 Processing Documentation
- 🔲 Extraction Report: Per-document processing summary
- 🔲 Model Performance Report: Comparative analysis across runs
- 🔲 Data Lineage Export: Full traceability from source to timeline
## Medium-Term Enhancements

1. **Advanced Visualization Features**
    - Timeline clustering and grouping
    - Geographic visualization integration
    - Interactive timeline editing capabilities
    - Export functionality (PDF, PNG, etc.)
2. **Data Management Improvements**
    - Migration from JSON to proper database (SQLite/PostgreSQL)
    - Event relationship modeling
    - Version control for timeline changes
    - Backup and restore functionality
3. **AI/ML Enhancements**
    - Improved document classification
    - Event importance scoring
    - Timeline gap detection and suggestions
    - Smart event merging and deduplication
## Long-Term Vision

1. **Multi-Modal Input Support**
    - Email processing integration
    - Social media timeline import
    - Calendar and photo metadata extraction
    - Voice recording transcription and processing
2. **Advanced Analytics**
    - Life pattern recognition
    - Goal tracking and milestone prediction
    - Career trajectory analysis
    - Personal growth metrics
3. **Collaboration Features**
    - Shared timelines for teams/families
    - Permission-based access control
    - Real-time collaborative editing
    - Timeline merging and branching
## Visual Documentation Strategy

### Architecture Diagrams (Current Phase)
- Data Flow Diagram: Document → LLM → Validation → Storage → Visualization
- Class Relationship Diagram: Core components and their interactions
- State Management Flow: How data flows through Streamlit session state
- Error Handling Flowchart: Decision trees for recovery strategies
### Performance Monitoring Visuals (Next Phase)
- Extraction Pipeline Dashboard: Real-time processing metrics
- Model Performance Heatmap: Success rates by document type
- Data Quality Timeline: Track improvements over time
- User Journey Map: From upload to final timeline
### Technical Documentation (Future)
- API Specification Diagrams: Future integration points
- Database Schema Evolution: Migration from JSON to structured storage
- Component Architecture: Modular design for scaling
- Testing Strategy Flowchart: Coverage and validation approaches
## Development Commands

### Initial Setup

```bash
# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure OpenAI API key in .streamlit/secrets.toml
```
### Running the Application

```bash
# Start the Streamlit development server
streamlit run timeline-mvp-pipeline.py

# Run with debug mode
streamlit run timeline-mvp-pipeline.py --logger.level debug
```
### Testing & Quality Assurance

```bash
# Run unit tests (when implemented)
pytest tests/ -v

# Run tests with coverage
pytest tests/ --cov=. --cov-report=html

# Format code with Black
black timeline-mvp-pipeline.py

# Lint with flake8
flake8 timeline-mvp-pipeline.py --max-line-length=88

# Sort imports
isort timeline-mvp-pipeline.py

# Type checking with mypy
mypy timeline-mvp-pipeline.py
```
### Performance Monitoring Commands

```bash
# Generate performance monitoring dashboard
streamlit run timeline-mvp-pipeline.py --server.port 8502

# Export extraction performance logs
python -c "from timeline_store import export_performance_logs; export_performance_logs()"

# Generate visual documentation
python generate_docs.py --diagrams

# Test LangChain prompt templates
python test_prompts.py --validate-templates

# Export timeline data
python export_timeline.py --format pdf --output timeline_report.pdf
```
### LangChain Integration Commands

```bash
# Install LangChain dependencies
pip install langchain langchain-openai langchain-community

# Test prompt templates
python -m langchain.prompts.test timeline_prompts.yaml

# Monitor LLM calls and costs
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
```
## Development Workflow

- Main application file: `timeline-mvp-pipeline.py`
- Data storage: `data/timeline_events.json`
- Configuration: `.streamlit/secrets.toml`
- Test documents: `test-documents/`
- Performance logs: `logs/extraction_performance.json`
- Visual documentation: `docs/diagrams/`
## Technical Debt & Known Issues

1. **Code Organization**
    - Single-file application needs modularization
    - Missing error handling in several areas
    - Hard-coded configuration values
2. **Data Quality Issues**
    - Inconsistent date parsing in fallback mode
    - Missing end dates for most events
    - Limited event validation
3. **Performance Concerns**
    - Loading entire JSON file for each operation
    - No caching for expensive LLM calls
    - Synchronous document processing
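The missing LLM cache is cheap to add. Inside the Streamlit layer, `st.cache_data` would be the idiomatic tool; a framework-independent sketch using `functools.lru_cache` (the function body is a stand-in for the real extraction call, not the app's actual code):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def extract_events_cached(doc_text: str) -> str:
    """Stand-in for the expensive LLM extraction call (assumed signature).

    Identical document text hits the cache instead of re-calling the API.
    """
    return f"parsed:{len(doc_text)} chars"
```

Because `lru_cache` keys on the argument, re-uploading the same document costs nothing, and `extract_events_cached.cache_info()` exposes hit/miss counts that could feed the planned metrics dashboard.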
## Testing & Quality Assurance

### Current State
- No formal testing framework implemented
- Manual testing via Streamlit interface
- Demo data available for development testing
### Testing Framework Setup

```bash
# Install testing dependencies
pip install pytest pytest-cov black flake8 isort mypy

# Create requirements-dev.txt
echo "pytest>=7.0" >> requirements-dev.txt
echo "pytest-cov>=4.0" >> requirements-dev.txt
echo "black>=22.0" >> requirements-dev.txt
echo "flake8>=5.0" >> requirements-dev.txt
echo "isort>=5.0" >> requirements-dev.txt
echo "mypy>=1.0" >> requirements-dev.txt
```
### Test Structure

```
tests/
├── __init__.py
├── test_datetime_manager.py     # Date parsing utilities
├── test_document_processor.py   # LLM and fallback extraction
├── test_timeline_store.py       # Data persistence and filtering
├── test_timeline_visualizer.py  # Visualization components
└── fixtures/
    ├── sample_resume.txt
    ├── sample_cover_letter.txt
    └── expected_events.json
```
### Recommended Test Coverage

- Unit tests for `DateTimeManager` date parsing edge cases
- Integration tests for the document processing pipeline
- Data validation tests for the `TimelineEvent` model
- Storage persistence and filtering logic tests
- Mock LLM responses for consistent testing
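The last point deserves a concrete shape. A sketch of mocking the OpenAI client with `unittest.mock` (`extract_with_llm` and the model name are assumptions about the pipeline's call shape, not its actual code):

```python
from unittest.mock import MagicMock

def extract_with_llm(client, text: str) -> str:
    """Assumed shape of the extraction call inside DocumentProcessor."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": f"Extract timeline events:\n{text}"}],
    )
    return response.choices[0].message.content

# In a test, swap the real client for a mock that returns canned JSON
mock_client = MagicMock()
mock_client.chat.completions.create.return_value.choices = [
    MagicMock(message=MagicMock(content='[{"title": "Software Engineer"}]'))
]
result = extract_with_llm(mock_client, "resume text")
```

Canned responses make extraction tests deterministic and free, and the mock also records the exact prompt sent, which pairs well with the transparency work in Phase 1.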
## Streamlit-Specific Testing Methodology ✅ LEARNED FROM PRACTICE

### Key Insights from Row Deletion Feature Implementation (Sept 2025)

**The Challenge:** Streamlit apps have unique testing constraints that differ from traditional Python applications due to their reactive, state-driven nature.
#### 1. Isolated Component Testing Strategy

**Problem:** When `st.data_editor` wasn't displaying in the main app, traditional debugging failed to identify the issue.

**Solution:** Create minimal, isolated test files to verify components work independently:
```python
# test_data_editor.py - Minimal test to verify st.data_editor works
import streamlit as st
import pandas as pd

data = [{"Select": False, "Name": "Alice", "Age": 25}]
df = pd.DataFrame(data)
edited_df = st.data_editor(df, column_config={
    "Select": st.column_config.CheckboxColumn("Select", default=False)
})
```
**Key Learning:** Always test Streamlit components in isolation first, then integrate.
#### 2. Browser Automation Testing

**Problem:** Backend logs showed no errors, but the user reported "excessive API calls" during deletion.

**Solution:** Use Playwright browser automation to reproduce exact user interactions:
```javascript
// Test actual button clicks and observe backend behavior
await page.getByRole('button', { name: '🔍 Find Potential Duplicates' }).click()
await page.getByRole('button', { name: '🗑️ Delete Entire Group' }).click()
```

**Key Learning:** Streamlit UI bugs often only manifest through actual browser interactions, not programmatic testing.
#### 3. st.rerun() Performance Debugging

**Problem:** "Excessive API calls" turned out to be excessive app reruns, not actual API calls.

**Diagnosis Method:**
1. Monitor backend logs during user interactions
2. Look for repeated processing patterns:
```
INFO:🔍 FILTERING: Result - 8 events after filtering
INFO:🔍 FILTERING: Result - 8 events after filtering  # <- Duplicate processing
INFO:🔍 FILTERING: Result - 8 events after filtering
```

3. Audit `st.rerun()` calls in confirmation dialogs and button handlers

**Key Learning:** Every `st.rerun()` triggers full app re-execution. Use sparingly and strategically.
#### 4. Streamlit State Management Testing Patterns

**Best Practices Discovered:**

```python
# ✅ Good: Test session state initialization
if 'selected_event_ids' not in st.session_state:
    st.session_state.selected_event_ids = []

# ✅ Good: Test state updates without immediate rerun
st.session_state.show_confirmation = True
# ... other logic
if some_condition:
    st.rerun()  # Single rerun at end

# ❌ Bad: Multiple reruns in sequence
if st.button("Action"):
    st.session_state.state = "updated"
    st.rerun()  # Rerun 1
if condition:
    st.session_state.another_state = "changed"
    st.rerun()  # Rerun 2 - causes performance issues
```
#### 5. Multi-Modal Testing Approach

**Effective Testing Strategy:**

1. Unit Level: Test core functions (`delete_event()`, `delete_multiple_events()`) independently
2. Component Level: Test Streamlit widgets in isolation (`st.data_editor`, `st.button`)
3. Integration Level: Test widget interactions and state management
4. End-to-End Level: Use browser automation for full user workflows
5. Performance Level: Monitor backend logs for excessive processing
#### 6. Common Streamlit Pitfalls to Test For
- Layout Conflicts: Widgets not displaying due to container/column conflicts
- Session State Race Conditions: State updates happening in wrong order
- Rerun Loops: Infinite or excessive reruns due to poorly managed state
- Widget Key Conflicts: Duplicate widget keys causing state corruption
- Tab State Management: Widgets in different tabs interfering with each other
#### Testing Command Patterns

```bash
# Quick component test
streamlit run test_component.py --server.port 8503 --server.headless true

# Full app test with monitoring (tail the logs in a separate terminal)
streamlit run main_app.py --server.port 8502 --server.headless true

# Browser automation test
python test_browser_interactions.py
```
**Key Insight:** Streamlit testing requires both programmatic verification AND actual browser interaction testing to catch real-world issues.
## Contributing Guidelines

### Code Style

- Follow existing patterns in `timeline-mvp-pipeline.py`
- Use dataclasses for data models
- Maintain type hints throughout
- Document complex functions with docstrings
### Development Process
- Test changes with demo data first
- Verify date parsing accuracy
- Check visualization rendering
- Validate JSON storage integrity
## Future Considerations
- Scalability: Plan for migration to graph database (Neo4j) for complex relationships
- Privacy: Implement local-only processing options for sensitive documents
- Integration: Design APIs for third-party calendar and document systems
- Mobile: Consider responsive design for mobile timeline viewing
- Offline: Plan for offline-capable document processing