# Data Flow Architecture
Understanding how data moves through the Digital Memory Chest is crucial for both users and developers. This page details the complete data lifecycle from upload to story generation.
## Complete Data Flow

```mermaid
flowchart TB
    subgraph "Input Sources"
        USER[👤 User Upload]
        CONTRIB[👥 Contributor]
        API_INPUT[🔌 API Import]
    end

    subgraph "Ingestion Layer"
        VALIDATE[✅ Validation]
        METADATA[📋 Metadata Extraction]
        THUMBNAIL[🖼️ Thumbnail Generation]
    end

    subgraph "Processing Queue"
        QUEUE[📝 Processing Queue]
        BATCH[📦 Batch Processor]
    end

    subgraph "AI Pipeline"
        TRANSCRIBE[🎵 Audio Transcription]
        TAG_IMG[🏷️ Image Tagging]
        EXTRACT[📊 Feature Extraction]
        CONTENT_ANALYSIS[🔍 Content Analysis]
    end

    subgraph "Storage Systems"
        DATABASE[(🗄️ Database)]
        FILES[📁 File Storage]
        CACHE[⚡ Cache Layer]
    end

    subgraph "Story Generation"
        TIMELINE[📅 Timeline Builder]
        THEMES[🎨 Theme Extractor]
        NARRATIVE[📖 Story Writer]
        REVIEW[👁️ Content Review]
    end

    subgraph "Output & Sharing"
        MEMORIAL[🏛️ Memorial View]
        SHARE_TOKEN[🔗 Share Tokens]
        EXPORT[📤 Export Options]
    end

    USER --> VALIDATE
    CONTRIB --> VALIDATE
    API_INPUT --> VALIDATE
    VALIDATE --> METADATA
    METADATA --> THUMBNAIL
    THUMBNAIL --> QUEUE
    QUEUE --> BATCH
    BATCH --> TRANSCRIBE
    BATCH --> TAG_IMG
    BATCH --> EXTRACT
    BATCH --> CONTENT_ANALYSIS
    TRANSCRIBE --> DATABASE
    TAG_IMG --> DATABASE
    EXTRACT --> DATABASE
    CONTENT_ANALYSIS --> DATABASE
    METADATA --> FILES
    THUMBNAIL --> FILES
    DATABASE --> CACHE
    FILES --> CACHE
    DATABASE --> TIMELINE
    TIMELINE --> THEMES
    THEMES --> NARRATIVE
    NARRATIVE --> REVIEW
    REVIEW --> MEMORIAL
    MEMORIAL --> SHARE_TOKEN
    MEMORIAL --> EXPORT

    style USER fill:#e3f2fd
    style CONTRIB fill:#e8f5e8
    style API_INPUT fill:#fff3e0
    style DATABASE fill:#f3e5f5
    style FILES fill:#fce4ec
    style MEMORIAL fill:#e1f5fe
```
## Upload Process Details

### Phase 1: File Ingestion

```mermaid
sequenceDiagram
    participant User
    participant UI as Streamlit UI
    participant Validator
    participant Storage
    participant DB as Database

    User->>UI: Upload Files
    UI->>Validator: Validate Files

    alt Valid Files
        Validator->>Storage: Save Original Files
        Validator->>DB: Create Asset Records
        Validator->>UI: Success Response
        UI->>User: Upload Confirmed
    else Invalid Files
        Validator->>UI: Validation Error
        UI->>User: Error Message
    end
```
**Validation Checks:**

- File size limits (100 MB per file, 500 MB total)
- Supported file formats (images, videos, audio, text)
- Content scanning for inappropriate material
- Metadata extraction and validation
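These checks can be sketched as a single batch validator. The limits mirror the figures above; the allowed-suffix set and function name are illustrative assumptions, not the actual implementation.

```python
from pathlib import Path

# Limits from the validation checks above; the suffix list is a
# hypothetical stand-in for the real supported-format registry.
MAX_FILE_BYTES = 100 * 1024 * 1024   # 100 MB per file
MAX_TOTAL_BYTES = 500 * 1024 * 1024  # 500 MB per upload batch
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".mp4", ".mov",
                    ".mp3", ".wav", ".txt", ".pdf"}

def validate_batch(files: list[tuple[str, int]]) -> list[str]:
    """Return validation errors for a batch of (filename, size_bytes) pairs.

    An empty list means the batch passes size and format checks.
    """
    errors = []
    total = 0
    for name, size in files:
        total += size
        if size > MAX_FILE_BYTES:
            errors.append(f"{name}: exceeds 100 MB per-file limit")
        if Path(name).suffix.lower() not in ALLOWED_SUFFIXES:
            errors.append(f"{name}: unsupported format")
    if total > MAX_TOTAL_BYTES:
        errors.append("batch exceeds 500 MB total limit")
    return errors
```

Collecting all errors rather than failing on the first lets the UI report every problem in one round trip.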
### Phase 2: Metadata Extraction

```mermaid
graph LR
    FILE[Original File] --> EXIF[EXIF Data]
    FILE --> MIME[MIME Type]
    FILE --> SIZE[File Size]
    FILE --> HASH[Content Hash]

    EXIF --> META[Metadata Object]
    MIME --> META
    SIZE --> META
    HASH --> META
    META --> DB[(Database)]

    FILE --> THUMB[Thumbnail Generator]
    THUMB --> THUMB_FILE[Thumbnail File]
    THUMB_FILE --> STORAGE[File Storage]
```
**Extracted Information:**

- **Images:** EXIF data, GPS coordinates, camera settings, creation date
- **Videos:** Duration, resolution, codec, creation date
- **Audio:** Duration, format, sampling rate, embedded metadata
- **Documents:** Text content, creation date, author information
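The format-independent fields in the diagram (MIME type, size, content hash) can be gathered with the standard library alone; EXIF and duration parsing are layered on per media type. A minimal sketch, with a hypothetical function name:

```python
import hashlib
import mimetypes
from pathlib import Path

def extract_basic_metadata(path: str) -> dict:
    """Collect MIME type, size, and a SHA-256 content hash for a file.

    The content hash supports de-duplication; media-specific fields
    (EXIF, duration, codec) would be added by per-type extractors.
    """
    p = Path(path)
    sha256 = hashlib.sha256()
    with p.open("rb") as fh:
        # Hash in chunks so large videos don't need to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            sha256.update(chunk)
    mime, _ = mimetypes.guess_type(p.name)
    return {
        "filename": p.name,
        "mime_type": mime or "application/octet-stream",
        "size_bytes": p.stat().st_size,
        "content_hash": sha256.hexdigest(),
    }
```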
### Phase 3: AI Processing Pipeline

```mermaid
stateDiagram-v2
    [*] --> Queued
    Queued --> Processing
    Processing --> Transcribing: Audio/Video
    Processing --> Tagging: Images
    Processing --> Analyzing: All Types

    Transcribing --> Completed: Success
    Transcribing --> Failed: Error
    Tagging --> Completed: Success
    Tagging --> Failed: Error
    Analyzing --> Completed: Success
    Analyzing --> Failed: Error

    Failed --> Retry: Auto Retry
    Retry --> Processing: Attempt Again
    Retry --> Abandoned: Max Retries

    Completed --> [*]
    Abandoned --> [*]
```
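The Failed → Retry → Abandoned path can be sketched as a retry wrapper with exponential backoff. The function name, retry count, and delays are illustrative assumptions, not the production values:

```python
import time

def process_with_retry(task, max_retries: int = 3, base_delay: float = 0.5):
    """Run a processing task through the Queued/Processing/Failed states.

    Failed runs are retried with exponential backoff (0.5 s, 1 s, 2 s, ...);
    after max_retries the asset is marked abandoned instead of blocking
    the rest of the queue.
    """
    for attempt in range(max_retries + 1):
        try:
            return ("completed", task())
        except Exception as exc:
            if attempt == max_retries:
                return ("abandoned", str(exc))
            time.sleep(base_delay * (2 ** attempt))
```

Returning a terminal state rather than raising keeps the batch processor moving past one bad asset.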
## AI Processing Details

### Audio & Video Transcription

```mermaid
flowchart LR
    subgraph "Audio Processing"
        EXTRACT[Extract Audio Track]
        NORMALIZE[Normalize Audio]
        SEGMENT[Segment by Silence]
    end

    subgraph "Transcription Options"
        LOCAL[Local Whisper]
        OPENAI[OpenAI Whisper API]
        FALLBACK[Template Fallback]
    end

    subgraph "Post-Processing"
        TIMESTAMP[Add Timestamps]
        CONFIDENCE[Confidence Scoring]
        CLEANUP[Text Cleanup]
    end

    VIDEO[Video File] --> EXTRACT
    AUDIO[Audio File] --> EXTRACT
    EXTRACT --> NORMALIZE
    NORMALIZE --> SEGMENT
    SEGMENT --> LOCAL
    SEGMENT --> OPENAI
    LOCAL --> FALLBACK
    OPENAI --> FALLBACK
    LOCAL --> TIMESTAMP
    OPENAI --> TIMESTAMP
    FALLBACK --> TIMESTAMP
    TIMESTAMP --> CONFIDENCE
    CONFIDENCE --> CLEANUP
    CLEANUP --> DATABASE[(Transcript)]
```
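The fallback chain in the diagram (local Whisper, then the OpenAI API, then a template) amounts to trying backends in order. A sketch of that pattern, where the backends are hypothetical callables rather than real library calls:

```python
def transcribe_with_fallback(audio_path: str, backends: list) -> dict:
    """Try each (name, callable) transcription backend in order.

    If every backend fails, return a generic template so a memory is
    never left without any caption at all.
    """
    for name, backend in backends:
        try:
            return {"text": backend(audio_path), "source": name}
        except Exception:
            continue  # this backend failed; try the next one
    return {"text": "[Audio recording — transcription unavailable]",
            "source": "template_fallback"}
```

Recording which backend produced the text also feeds the confidence-scoring step, since local and API transcripts may merit different trust.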
### Image Tagging & Analysis

```mermaid
flowchart TB
    IMAGE[Original Image] --> RESIZE[Resize for Processing]
    RESIZE --> CLIP[CLIP Analysis]

    subgraph "CLIP Processing"
        FEATURES[Extract Features]
        CATEGORIES[Memorial Categories]
        SIMILARITY[Similarity Matching]
    end

    CLIP --> FEATURES
    FEATURES --> CATEGORIES
    CATEGORIES --> SIMILARITY

    subgraph "Memorial Categories"
        FAMILY[Family Gatherings]
        TRAVEL[Travel & Places]
        HOBBIES[Hobbies & Interests]
        CELEBRATION[Celebrations]
        NATURE[Nature & Outdoors]
        HOME[Home & Daily Life]
    end

    SIMILARITY --> FAMILY
    SIMILARITY --> TRAVEL
    SIMILARITY --> HOBBIES
    SIMILARITY --> CELEBRATION
    SIMILARITY --> NATURE
    SIMILARITY --> HOME

    FAMILY --> TAGS[(Image Tags)]
    TRAVEL --> TAGS
    HOBBIES --> TAGS
    CELEBRATION --> TAGS
    NATURE --> TAGS
    HOME --> TAGS
```
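The similarity-matching step boils down to cosine similarity between an image embedding and one text embedding per memorial category. A minimal sketch on plain Python lists — in the real pipeline both vectors would come from CLIP, and the threshold here is an illustrative assumption:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def tag_image(image_vec, category_vecs: dict, threshold: float = 0.25) -> list:
    """Assign every category whose embedding is close enough to the image,
    ordered from best match to worst."""
    scores = {name: cosine(image_vec, vec)
              for name, vec in category_vecs.items()}
    return sorted((n for n, s in scores.items() if s >= threshold),
                  key=lambda n: -scores[n])
```

Because CLIP is zero-shot, adding a new memorial category only requires embedding a new text prompt, not retraining.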
## Story Generation Workflow

```mermaid
sequenceDiagram
    participant User
    participant StoryGen as Story Generator
    participant Timeline as Timeline Builder
    participant Themes as Theme Extractor
    participant LLM as Language Model
    participant Review as Content Review
    participant DB as Database

    User->>StoryGen: Request Story Generation
    StoryGen->>DB: Fetch All Memories
    StoryGen->>Timeline: Build Chronology
    Timeline->>Timeline: Sort by Date
    Timeline->>Timeline: Group by Periods
    Timeline->>StoryGen: Timeline Structure

    StoryGen->>Themes: Extract Themes
    Themes->>Themes: Analyze Content
    Themes->>Themes: Identify Patterns
    Themes->>StoryGen: Theme Categories

    StoryGen->>LLM: Generate Narrative
    LLM->>LLM: Process Memories
    LLM->>LLM: Create Story Arc
    LLM->>StoryGen: Generated Story

    StoryGen->>Review: Content Review
    Review->>Review: Check Appropriateness
    Review->>Review: Validate Quality
    Review->>StoryGen: Approval

    StoryGen->>DB: Save Generated Story
    StoryGen->>User: Story Complete
```
### Timeline Construction
The timeline builder creates a chronological narrative structure:
```mermaid
gantt
    title Memory Timeline Construction
    dateFormat YYYY-MM-DD

    section Early Years
    Childhood Photos :active, early, 1950-01-01, 1960-12-31
    School Years :school, 1960-01-01, 1970-12-31

    section Adult Life
    Career Highlights :career, 1970-01-01, 2000-12-31
    Family Life :family, 1975-01-01, 2020-12-31

    section Later Years
    Retirement :retire, 2000-01-01, 2020-12-31
    Legacy Moments :legacy, 2015-01-01, 2023-12-31
```
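The "sort by date, group by periods" steps can be sketched as a simple bucketing pass. Grouping by decade is an illustrative assumption; the real builder would derive period boundaries from the life events themselves:

```python
from collections import defaultdict
from datetime import date

def build_timeline(memories: list) -> dict:
    """Sort (date, title) memories chronologically and bucket them
    into decades, the coarse 'periods' of the timeline above."""
    buckets = defaultdict(list)
    for d, title in sorted(memories):
        decade = (d.year // 10) * 10
        buckets[f"{decade}s"].append(title)
    return dict(buckets)
```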
### Theme Extraction Process

```mermaid
mindmap
  root((Memorial Themes))
    Family
      Relationships
      Traditions
      Gatherings
      Love
    Achievements
      Career
      Education
      Recognition
      Milestones
    Personality
      Humor
      Kindness
      Strength
      Passion
    Interests
      Hobbies
      Travel
      Learning
      Community
    Legacy
      Impact
      Values
      Inspiration
      Memory
```
## Data Security & Privacy Flow

```mermaid
flowchart TD
    subgraph "Data Ingestion Security"
        INPUT[User Input] --> SANITIZE[Input Sanitization]
        SANITIZE --> VALIDATE[Schema Validation]
        VALIDATE --> ENCRYPT[Encrypt at Rest]
    end

    subgraph "Processing Security"
        ENCRYPT --> TEMP[Temporary Processing]
        TEMP --> AUDIT[Audit Logging]
        AUDIT --> CLEAN[Secure Cleanup]
    end

    subgraph "Storage Security"
        CLEAN --> PARTITION[Data Partitioning]
        PARTITION --> BACKUP[Encrypted Backup]
        BACKUP --> RETENTION[Retention Policy]
    end

    subgraph "Access Control"
        TOKEN[Share Tokens] --> PERMISSION[Permission Check]
        PERMISSION --> RATE_LIMIT[Rate Limiting]
        RATE_LIMIT --> ACCESS[Controlled Access]
    end

    RETENTION --> TOKEN

    style INPUT fill:#ffebee
    style ENCRYPT fill:#e8f5e8
    style TOKEN fill:#e3f2fd
    style ACCESS fill:#f3e5f5
```
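The share-token node implies a token that is unguessable and expiring. A minimal sketch using the standard library — the field names and one-week default TTL are assumptions for illustration:

```python
import secrets
import time

def mint_share_token(memorial_id: str, ttl_seconds: int = 7 * 24 * 3600) -> dict:
    """Create a cryptographically random, expiring share-token record.

    The record is stored server-side; the shared URL carries only the
    opaque token, never the memorial's internal ID.
    """
    return {
        "token": secrets.token_urlsafe(32),  # ~43 URL-safe characters
        "memorial_id": memorial_id,
        "expires_at": time.time() + ttl_seconds,
    }

def token_valid(record: dict, now=None) -> bool:
    """Check the token against its expiry (the Permission Check step)."""
    current = now if now is not None else time.time()
    return current < record["expires_at"]
```

Using `secrets` rather than `random` matters here: share links are bearer credentials, so they must be unpredictable.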
## Performance Optimizations

### Caching Strategy

```mermaid
graph TB
    subgraph "Cache Layers"
        L1[L1: In-Memory Cache]
        L2[L2: Redis Cache]
        L3[L3: Database Cache]
        L4[L4: File System Cache]
    end

    REQUEST[User Request] --> L1
    L1 --> L2
    L2 --> L3
    L3 --> L4
    L4 --> DATABASE[(Primary Database)]

    L1 -.->|Cache Hit| RESPONSE[Fast Response]
    L2 -.->|Cache Hit| RESPONSE
    L3 -.->|Cache Hit| RESPONSE
    L4 -.->|Cache Hit| RESPONSE
    DATABASE --> RESPONSE
```
**Cache Policies:**

- **L1 (In-Memory):** Recently accessed stories and thumbnails (5 min TTL)
- **L2 (Redis):** Processed media metadata and search results (1 hour TTL)
- **L3 (Database):** Query result caching with smart invalidation
- **L4 (File System):** Thumbnail and preview caching (24 hour TTL)
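The L1 policy above amounts to an in-memory map with per-entry expiry. A minimal sketch (the class name is illustrative; a production L1 would also cap size and evict LRU entries):

```python
import time

class TTLCache:
    """In-memory cache with per-entry time-to-live, as in the L1 layer.

    A `get` on an expired key evicts it and returns None, so the caller
    naturally falls through to the next cache layer (L2).
    """
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}

    def set(self, key, value, now=None):
        current = now if now is not None else time.time()
        self._store[key] = (value, current + self.ttl)

    def get(self, key, now=None):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        current = now if now is not None else time.time()
        if current >= expires:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value
```

The optional `now` parameter keeps expiry logic deterministic in tests; callers normally omit it.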
## Implementation Notes
- All processing is asynchronous to prevent UI blocking
- Failed operations are automatically retried with exponential backoff
- Graceful degradation ensures core functionality works without AI services
- Comprehensive audit logging tracks all data modifications
## Next Steps
- Explore the Database Design for storage details
- Learn about the AI Pipeline implementation
- Review Storage Layer architecture