ChronoScope - Personal Timeline Builder

Project Overview

ChronoScope is an intelligent personal timeline builder that transforms documents into interactive, chronological visualizations. Using LLM-powered extraction and Streamlit for the interface, it automatically processes resumes, cover letters, and other documents to create comprehensive life timelines.

Current Architecture

Core Components

  1. Data Models (✅ REFACTORED: chrono_scope/models/timeline_event.py)
     • TimelineEvent: Core event model with temporal, location, and metadata fields
     • Supports date ranges, people tagging, confidence scoring, and priority levels
     • Enhanced methods: duration_days(), to_dict(), is_ongoing(), overlaps_with()
     • Monolith now imports from chrono_scope.models (Week 2 refactoring complete)

  2. Document Processing (timeline-mvp-pipeline.py:136-282)
     • DocumentProcessor: Context-aware document classification and event extraction
     • LLM integration via OpenAI API with fallback to rule-based extraction
     • Document type detection: resume, cover letter, personal statement, general

  3. Data Storage (✅ REFACTORED: chrono_scope/services/timeline_store.py)
     • TimelineStore: JSON-based persistence with backup/restore
     • Advanced filtering by date, people, tags, and priority
     • Duplicate detection with combined similarity scoring (title + date)
     • Event CRUD operations and batch management
     • Monolith now imports from chrono_scope.services (Week 3 refactoring complete)
     • Test coverage: 28 comprehensive unit tests

  4. Visualization (timeline-mvp-pipeline.py:356-473)
     • TimelineVisualizer: Interactive Plotly-based timeline and matrix views
     • Gantt charts, priority matrices, and temporal distribution charts

  5. Streamlit App (timeline-mvp-pipeline.py:475-652)
     • Web interface with document upload, filtering, and visualization
     • Real-time processing with progress indicators
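For orientation, the enhanced TimelineEvent methods might look roughly like this minimal sketch; the field names and defaults here are assumptions for illustration, not the real model in chrono_scope/models/timeline_event.py:

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import Optional

@dataclass
class TimelineEvent:
    """Illustrative sketch only; real field names may differ."""
    title: str
    start_date: date
    end_date: Optional[date] = None  # None marks an ongoing event
    location: str = ""
    people: list = field(default_factory=list)
    confidence: float = 1.0
    priority: int = 5

    def is_ongoing(self) -> bool:
        return self.end_date is None

    def duration_days(self) -> Optional[int]:
        # Undefined for ongoing events; callers must handle None
        if self.end_date is None:
            return None
        return (self.end_date - self.start_date).days

    def overlaps_with(self, other: "TimelineEvent") -> bool:
        # Treat ongoing events as extending to today
        this_end = self.end_date or date.today()
        other_end = other.end_date or date.today()
        return self.start_date <= other_end and other.start_date <= this_end

    def to_dict(self) -> dict:
        # Serialize dates to ISO strings for JSON storage
        d = asdict(self)
        d["start_date"] = self.start_date.isoformat()
        if self.end_date:
            d["end_date"] = self.end_date.isoformat()
        return d
```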

Current Data State

  • Storage: data/timeline_events.json contains extracted events (standardized path as of Week 1)
  • User Notes: data/user_notes.json for stream-of-consciousness notes
  • Events Include: Work positions, education, certifications, projects
  • Test Coverage: 25 comprehensive unit tests for TimelineEvent model (Week 2)
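The duplicate detection listed under Data Storage combines title and date similarity into one score. A hedged sketch of such scoring — the 0.7/0.3 weighting, `difflib` ratio, and 30-day date window are illustrative assumptions, not the shipped formula:

```python
from datetime import date
from difflib import SequenceMatcher

def similarity(title_a: str, title_b: str,
               date_a: date, date_b: date,
               title_weight: float = 0.7) -> float:
    """Blend fuzzy title similarity with date proximity (weights illustrative)."""
    title_score = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
    # Dates within ~30 days count as close; decay linearly to zero beyond that
    gap_days = abs((date_a - date_b).days)
    date_score = max(0.0, 1.0 - gap_days / 30)
    return title_weight * title_score + (1 - title_weight) * date_score

def is_duplicate(title_a: str, title_b: str,
                 date_a: date, date_b: date,
                 threshold: float = 0.8) -> bool:
    return similarity(title_a, title_b, date_a, date_b) >= threshold
```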

Development Setup

Environment

  • Python 3.12 with virtual environment (.venv/)
  • Streamlit configuration in .streamlit/secrets.toml
  • OpenAI API integration for LLM processing

Key Dependencies

  • streamlit: Web interface framework
  • plotly: Interactive visualization library
  • openai: LLM integration for document processing
  • pandas: Data manipulation
  • python-dateutil: Flexible date parsing

UI/UX Design Philosophy & Implementation Guidelines

COMPLETED: Major UI/UX Overhaul (September 2025)

Design Philosophy: "Professional Simplicity with Progressive Disclosure"

The ChronoScope interface follows a carefully crafted design approach that prioritizes user experience while maintaining powerful functionality. The core principle is "Alert Fatigue Elimination" - transforming a debug-console-like interface into a professional timeline application.

Key Design Principles Applied

  1. Information Hierarchy First
     • Timeline content is the hero - prominently displayed above the fold
     • Secondary features are progressively disclosed through Advanced Settings
     • Visual weight decreases as functionality becomes more technical

  2. Alert Fatigue Prevention
     • Reduced from 8+ alerts to 2-3 strategic notifications
     • System diagnostics hidden by default in collapsible sections
     • User control over notification preferences with dismissible warnings

  3. Professional Aesthetic Standards
     • Gradient header design (#667eea to #764ba2) for brand identity
     • Consistent color palette with semantic meaning (success, warning, info)
     • Card-based layouts with subtle shadows (0 2px 4px rgba(0,0,0,0.1))
     • Rounded corners (0.5rem) for modern, friendly appearance

CSS Design System Implementation

/* Core Design Tokens Used */
:root {
  --primary-gradient: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
  --card-shadow: 0 2px 4px rgba(0,0,0,0.1);
  --border-radius: 0.5rem;
  --success-color: #d4edda;
  --warning-color: #fff3cd;
  --info-color: #d1ecf1;
}

Component Patterns:
  • Metric Cards: White background, subtle borders, consistent padding (1rem)
  • Section Dividers: Clean lines with proper spacing
  • Interactive Elements: Hover transitions (transform: translateY(-1px))
  • Typography: Clear hierarchy with appropriate line heights

User Experience Patterns

  1. Progressive Disclosure
     • Casual Users: See a clean timeline interface with essential controls
     • Power Users: Toggle Advanced Settings for LLM transparency and system health
     • Expert Users: Expandable sections for debugging and performance monitoring

  2. Contextual Information
     • Strategic Notifications: Only show critical or actionable alerts
     • Dismissible Warnings: User preferences persist across sessions
     • Empty States: Engaging graphics with clear next steps

  3. Visual Feedback
     • Loading States: Clear progress indicators during processing
     • Success Patterns: Green alerts with checkmark icons
     • Error Handling: Contextual error messages with recovery suggestions

Implementation Guidelines for Future Development

ALWAYS Follow These UI Patterns:

  1. Before Adding Any Alert/Notification:
     • Ask: "Is this critical for the user's current task?"
     • Consider: "Can this be moved to Advanced Settings?"
     • Ensure: "Is this dismissible if it's informational?"

  2. For New Features:
     • Start with the main use case - don't expose advanced options immediately
     • Use progressive disclosure - basic → intermediate → advanced
     • Maintain visual hierarchy - the most important content gets the most visual weight

  3. CSS Standards:
     • Always use the established color palette - don't introduce new colors
     • Follow border-radius consistency (0.5rem for cards, 0.375rem for alerts)
     • Maintain shadow patterns for depth and visual hierarchy
     • Use hover transitions for interactive elements

  4. Information Architecture:
     • Timeline content first - always prioritize the main functionality
     • Administrative features last - system health, debug info, advanced settings
     • Logical grouping - related features in the same sidebar section
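The alert checklist reduces to a small predicate; the function name and its flags here are illustrative, not code from the app:

```python
def should_display_alert(critical: bool, actionable: bool,
                         dismissed: bool, advanced_mode: bool,
                         diagnostic: bool = False) -> bool:
    """Apply the checklist: critical or actionable alerts always show;
    diagnostics only appear in advanced mode; informational alerts
    respect the user's dismissal preference."""
    if critical or actionable:
        return True
    if diagnostic:
        return advanced_mode  # belongs under Advanced Settings
    return not dismissed  # informational: show until dismissed
```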

Session State Management for UX

# User Preference Patterns Implemented
if 'show_advanced_settings' not in st.session_state:
    st.session_state.show_advanced_settings = False
if 'show_langchain_warning' not in st.session_state:
    st.session_state.show_langchain_warning = not LANGCHAIN_AVAILABLE

Key Learnings:
  • User preferences must persist within sessions
  • Toggle states should be intuitive (show/hide patterns)
  • Default to the simplest view for new users

Measured Success Metrics

  • 70% reduction in UI noise (from 8+ alerts to 2-3)
  • Improved visual hierarchy with timeline content prominently featured
  • Professional aesthetic matching modern SaaS application standards
  • Maintained full functionality while dramatically improving clarity

Anti-Patterns to Avoid

Never Do:
  • Show system health checks prominently unless there are critical issues
  • Display technical debugging information in the main interface
  • Use more than 3 alert boxes on a single view
  • Introduce new colors without updating the design system
  • Place advanced features above basic functionality

Always Do:
  • Prioritize timeline content in the visual hierarchy
  • Use collapsible sections for advanced features
  • Maintain consistent spacing and visual patterns
  • Test with both demo data and real user content
  • Consider mobile responsiveness in layout decisions

Design Process & Testing Methodology

Testing Approach Used:
  1. Browser-Based Evaluation: Used Playwright automation to capture before/after screenshots
  2. Interactive Testing: Exercised all user flows (demo data loading, advanced settings, tab navigation)
  3. Alert Audit: Systematically identified and categorized every alert/notification
  4. Progressive Enhancement: Implemented changes incrementally and tested at each step

Key Testing Insights:
  • Screenshot comparison revealed dramatic improvement in visual hierarchy
  • User flow testing confirmed functionality preservation during UI changes
  • Alert categorization identified 5 unnecessary system notifications
  • Session state testing validated user preference persistence

Files Generated During Testing:
  • streamlit_app_current_state.png - Original cluttered interface
  • improved_streamlit_app.png - Clean interface with demo data
  • improved_app_with_advanced_settings.png - Advanced mode demonstration

Recommendation for Future UI Changes: Always use browser automation testing to validate UI changes and capture visual evidence of improvements. The before/after screenshots provide clear justification for design decisions and help maintain design quality standards.


User Data & Preferences Management

User Notes Storage System ✅ IMPLEMENTED

Implementation: JSON-based persistent storage for stream-of-consciousness notes

Core Components

  1. UserNotesStore Class (timeline-mvp-pipeline.py:837-925)
     • JSON-based storage with backup/restore functionality
     • Multiple named notes with metadata tracking
     • Character count and last-modified timestamps
     • Error handling with automatic backup recovery

  2. User Interface Integration
     • Placement: Dedicated section below main tabs (not sidebar)
     • Features: Multiple named notes, manual save, delete functionality
     • User Experience: Persistent across sessions, clear metadata display

Storage Pattern

# user_notes.json structure
{
  "notes": {
    "note_name": {
      "content": "User's stream-of-consciousness text...",
      "last_modified": "2025-09-15T10:30:00",
      "character_count": 150
    }
  },
  "last_updated": "2025-09-15T10:30:00",
  "version": "1.0"
}
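A minimal sketch of a store that reads and writes this schema; the method names mirror the save/delete features above, but the real UserNotesStore in timeline-mvp-pipeline.py may differ:

```python
import json
from datetime import datetime
from pathlib import Path

class UserNotesStore:
    """Minimal sketch matching the user_notes.json schema above."""

    def __init__(self, path: str = "data/user_notes.json"):
        self.path = Path(path)
        if self.path.exists():
            self.data = json.loads(self.path.read_text())
        else:
            self.data = {"notes": {}, "last_updated": None, "version": "1.0"}

    def save_note(self, name: str, content: str) -> None:
        now = datetime.now().isoformat(timespec="seconds")
        self.data["notes"][name] = {
            "content": content,
            "last_modified": now,
            "character_count": len(content),
        }
        self.data["last_updated"] = now
        self._write()

    def delete_note(self, name: str) -> bool:
        removed = self.data["notes"].pop(name, None) is not None
        if removed:
            self._write()
        return removed

    def _write(self) -> None:
        # Write a backup first so a failed write can be recovered
        if self.path.exists():
            self.path.with_suffix(".bak").write_text(self.path.read_text())
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.data, indent=2))
```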

UI/UX Patterns for User Data

  1. Progressive Note Management
     • Empty State: Clear explanation and helpful placeholder text
     • Single Note: Simple interface focused on content creation
     • Multiple Notes: Dropdown selection with metadata display

  2. Data Persistence Standards
     • Automatic Loading: Notes loaded on app startup via session state
     • Manual Save: User-controlled with clear feedback
     • Error Recovery: Backup system prevents data loss

  3. Visual Design Consistency
     • Layout: 3:1 column ratio for content vs metadata
     • Buttons: Primary save button, secondary delete with confirmation
     • Feedback: Success/error messages with appropriate icons

Session State Management Patterns ✅ ENHANCED

# User Preference Patterns (Extended)
if 'show_advanced_settings' not in st.session_state:
    st.session_state.show_advanced_settings = False
if 'show_langchain_warning' not in st.session_state:
    st.session_state.show_langchain_warning = not LANGCHAIN_AVAILABLE
if 'notes_store' not in st.session_state:
    st.session_state.notes_store = UserNotesStore()

Key Principles:
  • Lazy Initialization: Create storage objects only when needed
  • Persistent State: User preferences survive page refreshes
  • Clear Separation: Different storage systems for different data types

Interactive Component Standards ✅ ENHANCED

Tooltip Design Specifications

Implementation: Enhanced Plotly hover templates with professional formatting

Current Standards Applied:

  1. Typography Hierarchy

    /* Title: 16px, dark blue, bold */
    <b style='font-size:16px; color:#2c3e50'>%{name}</b>

    /* Body: 13px, structured with icons */
    <span style='font-size:13px'>
    📅 <b>Date:</b> %{x}<br>
    📍 <b>Location:</b> ...

  2. Content Structure
     • Visual Icons: Emojis for quick visual parsing (📅📍📝🏷️👥⭐🎯)
     • Bold Labels: Clear field identification
     • Truncated Content: Descriptions limited to 80-120 characters
     • Spacing: Double line breaks between title and content

  3. Information Priority
     • Primary: Event title (largest, prominent color)
     • Secondary: Date, location, description (structured layout)
     • Tertiary: Tags, people, metadata (smaller but accessible)
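These standards can be assembled into one static Plotly hovertemplate string in Python. The customdata slot assignments and helper names below are assumptions for illustration, not the app's actual chart code:

```python
def build_hovertemplate() -> str:
    """Assemble a static hovertemplate following the standards above.
    Assumes the trace packs title/location/description/priority into customdata."""
    return (
        "<b style='font-size:16px; color:#2c3e50'>%{customdata[0]}</b><br><br>"
        "<span style='font-size:13px'>"
        "📅 <b>Date:</b> %{x}<br>"
        "📍 <b>Location:</b> %{customdata[1]}<br>"
        "📝 <b>Description:</b> %{customdata[2]}<br>"
        "⭐ <b>Priority:</b> %{customdata[3]}/10"
        "</span><extra></extra>"  # <extra></extra> hides the trace-name box
    )

def truncate(description: str, limit: int = 100) -> str:
    """Clip long descriptions with an '...' indicator before templating."""
    if len(description) <= limit:
        return description
    return description[:limit].rstrip() + "..."

def format_confidence(score: float) -> str:
    """Show confidence as a percentage (85.3% rather than 0.853)."""
    return f"{score * 100:.1f}%"
```

Because the template is a static string, it also satisfies the performance standard of avoiding dynamic per-hover generation.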

Tooltip Best Practices

Do:
  • Use a consistent icon vocabulary across all tooltips
  • Truncate long descriptions with an "..." indicator
  • Show confidence scores as percentages (85.3% vs 0.853)
  • Format priority as "X/10" for clarity
  • Maintain visual hierarchy with font sizes and colors

Don't:
  • Exceed 3-4 lines of detailed information
  • Use technical field names (use "Date", not "start_date")
  • Include empty or null fields without handling
  • Mix formatting styles across different chart types

Performance Standards

  • Hover Response: <200ms for tooltip appearance
  • Content Loading: Tooltips should not trigger additional API calls
  • Memory Usage: Tooltip templates should be static, not dynamic generation

Current Development Phase: Transparency & Monitoring Focus

Phase 1: Transparent Data Pipeline (In Progress)

Core Philosophy: Full visibility into LLM extraction process, performance tracking, and user control over data quality.

1.1 Essential UX/UI Features ✅ STARTED

  • ✅ Clear Data Button: Implemented with confirmation dialog
  • 🚧 Events Table View: Sortable, filterable table showing all extracted data
     • Columns: Title, Date Range, Confidence, Source Doc, Model Used, Processing Time
  • 🚧 Event Merge Interface: Side-by-side comparison tool for duplicate events
  • 🚧 LLM Transparency Panel:
     • Show the actual prompt sent to the LLM
     • Display model name and parameters (temperature, max_tokens)
     • Show raw LLM response vs parsed result
     • Processing time and cost tracking

1.2 Enhanced Error Handling & Feedback

  • 🚧 Processing Status Dashboard: Real-time progress with detailed logging
  • 🚧 Error Recovery Interface: Show failed extractions with retry options
  • 🚧 Validation Warnings: Highlight suspicious dates, missing data, low confidence scores
  • 🚧 Debug Mode Toggle: Advanced users can see full processing pipeline

1.3 Performance Tracking Infrastructure

  • 🚧 Document Type Performance Matrix: Track extraction success by document type
  • 🚧 Model Performance Comparison: A/B test different models/prompts
  • 🚧 Extraction Metrics Dashboard:
     • Events extracted per document
     • Average confidence scores
     • Processing time trends
     • Error rates by document type
  • 🚧 Data Quality Reports:
     • Duplicate detection statistics
     • Date parsing accuracy
     • Missing field analysis

Phase 2: Beautiful Interactive Visualization (Planned)

2.1 Enhanced Plotly Timeline Features

  • 🔲 Multi-Track Timeline: Separate tracks for career, education, personal
  • 🔲 Zoom & Pan Controls: Smooth navigation across time periods
  • 🔲 Event Detail Popups: Rich hover cards with full context
  • 🔲 Timeline Themes: Visual styles (professional, academic, personal)
  • 🔲 Filtering Animations: Smooth transitions when applying filters

2.2 Advanced Analytics Views

  • 🔲 Performance Monitoring Charts:
     • Extraction accuracy over time
     • Processing time trends
     • Model performance comparison
  • 🔲 Data Quality Visualization:
     • Confidence score distribution
     • Coverage gaps timeline
     • Duplicate detection network graph

Phase 3: Export & Documentation (Planned)

3.1 Multi-Format Export

  • 🔲 PDF Reports: Professional timeline with metadata
  • 🔲 Excel/CSV: Raw data with processing annotations
  • 🔲 JSON: Full event data with provenance tracking
  • 🔲 PNG/SVG: High-quality timeline images
  • 🔲 HTML: Standalone interactive timeline

3.2 Processing Documentation

  • 🔲 Extraction Report: Per-document processing summary
  • 🔲 Model Performance Report: Comparative analysis across runs
  • 🔲 Data Lineage Export: Full traceability from source to timeline

Medium-Term Enhancements

  1. Advanced Visualization Features
     • Timeline clustering and grouping
     • Geographic visualization integration
     • Interactive timeline editing capabilities
     • Export functionality (PDF, PNG, etc.)

  2. Data Management Improvements
     • Migration from JSON to a proper database (SQLite/PostgreSQL)
     • Event relationship modeling
     • Version control for timeline changes
     • Backup and restore functionality

  3. AI/ML Enhancements
     • Improved document classification
     • Event importance scoring
     • Timeline gap detection and suggestions
     • Smart event merging and deduplication

Long-Term Vision

  1. Multi-Modal Input Support
     • Email processing integration
     • Social media timeline import
     • Calendar and photo metadata extraction
     • Voice recording transcription and processing

  2. Advanced Analytics
     • Life pattern recognition
     • Goal tracking and milestone prediction
     • Career trajectory analysis
     • Personal growth metrics

  3. Collaboration Features
     • Shared timelines for teams/families
     • Permission-based access control
     • Real-time collaborative editing
     • Timeline merging and branching

Visual Documentation Strategy

Architecture Diagrams (Current Phase)

  1. Data Flow Diagram: Document → LLM → Validation → Storage → Visualization
  2. Class Relationship Diagram: Core components and their interactions
  3. State Management Flow: How data flows through Streamlit session state
  4. Error Handling Flowchart: Decision trees for recovery strategies

Performance Monitoring Visuals (Next Phase)

  1. Extraction Pipeline Dashboard: Real-time processing metrics
  2. Model Performance Heatmap: Success rates by document type
  3. Data Quality Timeline: Track improvements over time
  4. User Journey Map: From upload to final timeline

Technical Documentation (Future)

  1. API Specification Diagrams: Future integration points
  2. Database Schema Evolution: Migration from JSON to structured storage
  3. Component Architecture: Modular design for scaling
  4. Testing Strategy Flowchart: Coverage and validation approaches

Development Commands

Initial Setup

# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure OpenAI API key in .streamlit/secrets.toml

Running the Application

# Start the Streamlit development server
streamlit run timeline-mvp-pipeline.py

# Run with debug mode
streamlit run timeline-mvp-pipeline.py --logger.level debug

Testing & Quality Assurance

# Run unit tests
pytest tests/ -v

# Run tests with coverage
pytest tests/ --cov=. --cov-report=html

# Format code with Black
black timeline-mvp-pipeline.py

# Lint with flake8
flake8 timeline-mvp-pipeline.py --max-line-length=88

# Sort imports
isort timeline-mvp-pipeline.py

# Type checking with mypy
mypy timeline-mvp-pipeline.py

Performance Monitoring Commands

# Generate performance monitoring dashboard
streamlit run timeline-mvp-pipeline.py --server.port 8502

# Export extraction performance logs
python -c "from timeline_store import export_performance_logs; export_performance_logs()"

# Generate visual documentation
python generate_docs.py --diagrams

# Test LangChain prompt templates
python test_prompts.py --validate-templates

# Export timeline data
python export_timeline.py --format pdf --output timeline_report.pdf

LangChain Integration Commands

# Install LangChain dependencies
pip install langchain langchain-openai langchain-community

# Validate prompt templates (planned; LangChain itself ships no template-test CLI)
python test_prompts.py --validate-templates

# Monitor LLM calls and costs
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"

Development Workflow

  • Main application file: timeline-mvp-pipeline.py
  • Data storage: data/timeline_events.json
  • Configuration: .streamlit/secrets.toml
  • Test documents: test-documents/
  • Performance logs: logs/extraction_performance.json
  • Visual documentation: docs/diagrams/

Technical Debt & Known Issues

  1. Code Organization
     • Single-file application needs modularization
     • Missing error handling in several areas
     • Hard-coded configuration values

  2. Data Quality Issues
     • Inconsistent date parsing in fallback mode
     • Missing end dates for most events
     • Limited event validation

  3. Performance Concerns
     • Loading the entire JSON file for each operation
     • No caching for expensive LLM calls
     • Synchronous document processing
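The missing LLM-call cache could be addressed with a small content-hash cache; this sketch (the class name, SHA-256 keying, and JSON file layout) is a proposal, not existing code:

```python
import hashlib
import json
from pathlib import Path

class ExtractionCache:
    """Cache LLM extraction results keyed by a hash of the document text,
    so re-uploading the same document skips the LLM call entirely."""

    def __init__(self, path):
        self.path = Path(path)
        self._data = json.loads(self.path.read_text()) if self.path.exists() else {}

    @staticmethod
    def key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str):
        """Return cached events for this document, or None on a miss."""
        return self._data.get(self.key(text))

    def put(self, text: str, events: list) -> None:
        self._data[self.key(text)] = events
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self._data, indent=2))
```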

Testing & Quality Assurance

Current State

  • Unit tests exist for the TimelineEvent model and TimelineStore (see Core Components); no formal framework yet covers the rest of the pipeline
  • Manual testing via Streamlit interface
  • Demo data available for development testing

Testing Framework Setup

# Install testing dependencies
pip install pytest pytest-cov black flake8 isort mypy

# Create requirements-dev.txt (cat > overwrites; the previous echo >> approach
# appended duplicate lines on reruns)
cat > requirements-dev.txt <<'EOF'
pytest>=7.0
pytest-cov>=4.0
black>=22.0
flake8>=5.0
isort>=5.0
mypy>=1.0
EOF

Test Structure

tests/
├── __init__.py
├── test_datetime_manager.py      # Date parsing utilities
├── test_document_processor.py    # LLM and fallback extraction
├── test_timeline_store.py        # Data persistence and filtering
├── test_timeline_visualizer.py   # Visualization components
└── fixtures/
    ├── sample_resume.txt
    ├── sample_cover_letter.txt
    └── expected_events.json
Planned coverage:
  • Unit tests for DateTimeManager date parsing edge cases
  • Integration tests for the document processing pipeline
  • Data validation tests for the TimelineEvent model
  • Storage persistence and filtering logic tests
  • Mock LLM responses for consistent testing
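The mock-LLM bullet can be wired up with unittest.mock; the extract_events wrapper below is an assumption about the DocumentProcessor interface, not its real signature:

```python
import json
from unittest.mock import MagicMock

def extract_events(text: str, client) -> list:
    """Hypothetical wrapper around the OpenAI chat API; the real
    DocumentProcessor interface may differ."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Extract timeline events: {text}"}],
    )
    return json.loads(resp.choices[0].message.content)

def make_mock_client(events: list) -> MagicMock:
    """Build a client whose chat completion always returns the given events,
    so extraction tests are deterministic and free of API calls."""
    client = MagicMock()
    message = MagicMock()
    message.content = json.dumps(events)
    client.chat.completions.create.return_value.choices = [MagicMock(message=message)]
    return client
```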

Streamlit-Specific Testing Methodology ✅ LEARNED FROM PRACTICE

Key Insights from Row Deletion Feature Implementation (Sept 2025)

The Challenge: Streamlit apps have unique testing constraints that differ from traditional Python applications due to their reactive, state-driven nature.

1. Isolated Component Testing Strategy

Problem: When st.data_editor wasn't displaying in the main app, traditional debugging failed to identify the issue.

Solution: Create minimal, isolated test files to verify components work independently:

# test_data_editor.py - Minimal test to verify st.data_editor works
import streamlit as st
import pandas as pd

data = [{"Select": False, "Name": "Alice", "Age": 25}]
df = pd.DataFrame(data)

edited_df = st.data_editor(df, column_config={
    "Select": st.column_config.CheckboxColumn("Select", default=False)
})

Key Learning: Always test Streamlit components in isolation first, then integrate.

2. Browser Automation Testing

Problem: Backend logs showed no errors, but user reported "excessive API calls" during deletion.

Solution: Use Playwright browser automation to reproduce exact user interactions:

// Playwright (Node): reproduce the actual button clicks and observe backend behavior
await page.getByRole('button', { name: '🔍 Find Potential Duplicates' }).click();
await page.getByRole('button', { name: '🗑️ Delete Entire Group' }).click();

Key Learning: Streamlit UI bugs often only manifest through actual browser interactions, not programmatic testing.

3. st.rerun() Performance Debugging

Problem: "Excessive API calls" turned out to be excessive app reruns, not actual API calls.

Diagnosis Method:
  1. Monitor backend logs during user interactions
  2. Look for repeated processing patterns:

     INFO:🔍 FILTERING: Result - 8 events after filtering
     INFO:🔍 FILTERING: Result - 8 events after filtering  # <- Duplicate processing
     INFO:🔍 FILTERING: Result - 8 events after filtering

  3. Trace st.rerun() calls in confirmation dialogs and button handlers

Key Learning: Every st.rerun() triggers full app re-execution. Use sparingly and strategically.

4. Streamlit State Management Testing Patterns

Best Practices Discovered:

# ✅ Good: Test session state initialization
if 'selected_event_ids' not in st.session_state:
    st.session_state.selected_event_ids = []

# ✅ Good: Test state updates without immediate rerun
st.session_state.show_confirmation = True
# ... other logic
if some_condition:
    st.rerun()  # Single rerun at end

# ❌ Bad: Multiple reruns in sequence
if st.button("Action"):
    st.session_state.state = "updated"
    st.rerun()  # Rerun 1
    if condition:
        st.session_state.another_state = "changed"
        st.rerun()  # Rerun 2 - causes performance issues

5. Multi-Modal Testing Approach

Effective Testing Strategy:

  1. Unit Level: Test core functions (delete_event(), delete_multiple_events()) independently
  2. Component Level: Test Streamlit widgets in isolation (st.data_editor, st.button)
  3. Integration Level: Test widget interactions and state management
  4. End-to-End Level: Use browser automation for full user workflows
  5. Performance Level: Monitor backend logs for excessive processing
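The unit level of this strategy can be exercised against an in-memory fake; the method names mirror delete_event()/delete_multiple_events() from the bullets above, but the real TimelineStore signatures may differ:

```python
class FakeTimelineStore:
    """In-memory stand-in for TimelineStore, used at the unit level
    (illustrative only; not the real class)."""

    def __init__(self, events: dict):
        self.events = dict(events)  # event_id -> event payload

    def delete_event(self, event_id: str) -> bool:
        return self.events.pop(event_id, None) is not None

    def delete_multiple_events(self, event_ids: list) -> int:
        # Returns how many events were actually removed
        return sum(self.delete_event(eid) for eid in event_ids)

def test_delete_multiple_events():
    store = FakeTimelineStore({"a": {}, "b": {}, "c": {}})
    assert store.delete_multiple_events(["a", "b", "missing"]) == 2
    assert list(store.events) == ["c"]
```

Component, integration, and end-to-end levels then layer the isolated widget tests and Playwright flows described above on top of these fast unit checks.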

6. Common Streamlit Pitfalls to Test For

  • Layout Conflicts: Widgets not displaying due to container/column conflicts
  • Session State Race Conditions: State updates happening in wrong order
  • Rerun Loops: Infinite or excessive reruns due to poorly managed state
  • Widget Key Conflicts: Duplicate widget keys causing state corruption
  • Tab State Management: Widgets in different tabs interfering with each other

Testing Command Patterns

# Quick component test
streamlit run test_component.py --server.port 8503 --server.headless true

# Full app test with monitoring
streamlit run main_app.py --server.port 8502 --server.headless true
# Monitor logs in separate terminal

# Browser automation test
python test_browser_interactions.py

Key Insight: Streamlit testing requires both programmatic verification AND actual browser interaction testing to catch real-world issues.

Contributing Guidelines

Code Style

  • Follow existing patterns in timeline-mvp-pipeline.py
  • Use dataclasses for data models
  • Maintain type hints throughout
  • Document complex functions with docstrings

Development Process

  1. Test changes with demo data first
  2. Verify date parsing accuracy
  3. Check visualization rendering
  4. Validate JSON storage integrity

Future Considerations

  • Scalability: Plan for migration to graph database (Neo4j) for complex relationships
  • Privacy: Implement local-only processing options for sensitive documents
  • Integration: Design APIs for third-party calendar and document systems
  • Mobile: Consider responsive design for mobile timeline viewing
  • Offline: Plan for offline-capable document processing