
AI Processing Pipeline

The AI pipeline is the heart of Digital Memory Chest's intelligent features, transforming raw memories into meaningful narratives while maintaining privacy and respect.

Pipeline Overview

graph TD
    subgraph "Input Processing"
        UPLOAD[File Upload] --> EXTRACT[Content Extraction]
        EXTRACT --> QUEUE[Processing Queue]
    end

    subgraph "Parallel AI Processing"
        QUEUE --> TRANSCRIBE[🎵 Audio Transcription]
        QUEUE --> TAG[🏷️ Image Classification] 
        QUEUE --> ANALYZE[📊 Content Analysis]
    end

    subgraph "Content Understanding"
        TRANSCRIBE --> NLP[🧠 Natural Language Processing]
        TAG --> SEMANTIC[🔍 Semantic Analysis]
        ANALYZE --> EMOTION[💭 Emotional Context]
    end

    subgraph "Story Generation"
        NLP --> TIMELINE[📅 Timeline Construction]
        SEMANTIC --> THEMES[🎨 Theme Extraction]
        EMOTION --> NARRATIVE[📖 Narrative Generation]

        TIMELINE --> STORY[📚 Story Assembly]
        THEMES --> STORY
        NARRATIVE --> STORY
    end

    subgraph "Quality & Safety"
        STORY --> REVIEW[👁️ Content Review]
        REVIEW --> MODERATE[🛡️ Content Moderation]
        MODERATE --> OUTPUT[✅ Final Output]
    end

    style TRANSCRIBE fill:#e3f2fd
    style TAG fill:#e8f5e8
    style ANALYZE fill:#fff3e0
    style STORY fill:#f3e5f5
    style OUTPUT fill:#e1f5fe


Audio & Video Transcription

Whisper Integration

The transcription service supports both local and cloud-based processing:

sequenceDiagram
    participant Upload as File Upload
    participant Extractor as Audio Extractor
    participant Whisper as Whisper Model
    participant OpenAI as OpenAI API
    participant Processor as Text Processor
    participant DB as Database

    Upload->>Extractor: Audio/Video File
    Extractor->>Extractor: Extract Audio Track
    Extractor->>Extractor: Normalize & Clean

    alt Local Processing Available
        Extractor->>Whisper: Process Locally
        Whisper->>Processor: Raw Transcript
    else Cloud Processing
        Extractor->>OpenAI: Send to API
        OpenAI->>Processor: Raw Transcript
    else Fallback Mode
        Extractor->>Processor: Metadata Only
    end

    Processor->>Processor: Add Timestamps
    Processor->>Processor: Clean Text
    Processor->>Processor: Extract Keywords
    Processor->>DB: Save Results
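
The local-first fallback shown in the diagram can be sketched as a simple chain: try each backend in order and degrade gracefully to metadata-only mode. This is an illustrative sketch, not the project's actual code; the backend functions are hypothetical stand-ins.

```python
# Illustrative sketch of the transcription fallback chain shown above.
# The backend functions are invented stand-ins, not real project APIs.

def transcribe_with_fallback(audio_path, backends):
    """Try each (name, fn) backend in order; fall back to metadata-only."""
    for name, fn in backends:
        try:
            return {"source": name, "transcript": fn(audio_path)}
        except Exception:
            continue  # backend unavailable -> try the next one
    # Fallback mode: no transcript, metadata only
    return {"source": "metadata-only", "transcript": None}

def local_whisper(path):   # stands in for a local Whisper model
    raise RuntimeError("local model not installed")

def cloud_api(path):       # stands in for the cloud API call
    return "hello from the cloud"

result = transcribe_with_fallback(
    "memory.mp3",
    [("local-whisper", local_whisper), ("openai-api", cloud_api)],
)
# result["source"] -> "openai-api" (local backend raised, cloud succeeded)
```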

Transcription Features

  • Pre-processing: Audio normalization and noise reduction
  • Multiple Models: Support for different Whisper model sizes
  • Language Detection: Automatic language identification
  • Confidence Scoring: Quality assessment for each segment
  • Local Processing: No data leaves your infrastructure
  • Temporary Files: Audio extracted to secure temporary storage
  • Secure Cleanup: All temporary files securely deleted
  • Optional Cloud: Choose between local and cloud processing
  • Timestamp Alignment: Precise timing for video synchronization
  • Speaker Identification: Basic speaker detection (when applicable)
  • Keyword Extraction: Important terms and phrases highlighted
  • Emotional Tone: Basic sentiment analysis of content
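
Keyword extraction from a transcript can be as simple as frequency counting over non-stopwords. A minimal sketch of that idea (the stopword list and scoring here are illustrative, not the pipeline's real implementation):

```python
# Toy frequency-based keyword extraction; stopword list is illustrative.
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "we", "it"}

def extract_keywords(transcript: str, top_n: int = 3):
    """Return the most frequent non-stopword terms in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]

keywords = extract_keywords(
    "Grandma loved the garden. Every summer the garden was full of roses, "
    "and Grandma would pick roses for the kitchen table."
)
# -> ['grandma', 'garden', 'roses']
```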

Image Understanding with CLIP

Zero-Shot Classification

flowchart LR
    subgraph "Image Input"
        IMG[Original Image]
        RESIZE[Resize & Normalize]
        TENSOR[Convert to Tensor]
    end

    subgraph "CLIP Model"
        ENCODER[Image Encoder]
        TEXT_ENCODER[Text Encoder]
        SIMILARITY[Similarity Computation]
    end

    subgraph "Memorial Categories"
        FAMILY["👨‍👩‍👧‍👦 Family Gatherings"]
        CELEBRATION["🎉 Celebrations"]
        TRAVEL["✈️ Travel & Places"]
        HOBBIES["🎨 Hobbies & Interests"]
        NATURE["🌲 Nature & Outdoors"]
        HOME["🏠 Home & Daily Life"]
        WORK["💼 Work & Career"]
        SPIRITUAL["🙏 Spiritual Moments"]
    end

    subgraph "Results"
        SCORES[Confidence Scores]
        TAGS[Generated Tags]
        METADATA[Image Metadata]
    end

    IMG --> RESIZE --> TENSOR
    TENSOR --> ENCODER

    FAMILY --> TEXT_ENCODER
    CELEBRATION --> TEXT_ENCODER
    TRAVEL --> TEXT_ENCODER
    HOBBIES --> TEXT_ENCODER
    NATURE --> TEXT_ENCODER
    HOME --> TEXT_ENCODER
    WORK --> TEXT_ENCODER
    SPIRITUAL --> TEXT_ENCODER

    ENCODER --> SIMILARITY
    TEXT_ENCODER --> SIMILARITY

    SIMILARITY --> SCORES
    SCORES --> TAGS
    TAGS --> METADATA

Context-Aware Tagging

Our image classification goes beyond simple object detection:

# Example memorial-appropriate categories
memorial_categories = [
    "a warm family gathering around a dinner table",
    "a joyful celebration with friends and loved ones", 
    "a peaceful moment in nature",
    "a cherished hobby or creative activity",
    "a meaningful travel experience",
    "a quiet moment of reflection",
    "a professional achievement or milestone",
    "a loving interaction between family members"
]

Benefits of This Approach:

  • Context-Sensitive: Categories designed specifically for memorial content
  • Respectful Descriptions: Language appropriate for sensitive memories
  • Nuanced Understanding: Captures emotional context, not just objects
  • Cultural Awareness: Recognizes diverse family structures and traditions
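
Under the hood, CLIP's zero-shot step reduces to comparing normalized image and text embeddings and softmaxing the similarities into confidence scores. A toy numeric sketch of that computation (tiny made-up vectors standing in for real CLIP outputs, no model involved):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def zero_shot_scores(image_emb, label_embs):
    """Softmax over image/label cosine similarities -> confidence per label."""
    sims = {label: cosine(image_emb, emb) for label, emb in label_embs.items()}
    z = sum(math.exp(s) for s in sims.values())
    return {label: math.exp(s) / z for label, s in sims.items()}

# Tiny made-up embeddings standing in for CLIP encoder outputs
image = [0.9, 0.1, 0.2]
labels = {
    "family gathering": [0.8, 0.2, 0.1],
    "travel":           [0.1, 0.9, 0.3],
}
scores = zero_shot_scores(image, labels)
best = max(scores, key=scores.get)  # -> "family gathering"
```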

Natural Language Processing

Content Analysis Pipeline

graph TB
    subgraph "Text Input Sources"
        TRANSCRIPT[Audio Transcripts]
        CAPTIONS[Image Captions]  
        NOTES[User Notes]
        METADATA[File Metadata]
    end

    subgraph "NLP Processing"
        TOKENIZE[Tokenization]
        NER[Named Entity Recognition]
        SENTIMENT[Sentiment Analysis]
        KEYWORDS[Keyword Extraction]
    end

    subgraph "Understanding Layers"
        PEOPLE[People & Relationships]
        PLACES[Places & Locations]
        EVENTS[Events & Occasions]
        EMOTIONS[Emotional Themes]
        TIME[Temporal Context]
    end

    subgraph "Knowledge Graph"
        RELATIONSHIPS[Relationship Mapping]
        TIMELINE[Timeline Construction]
        THEMES[Theme Identification]
    end

    TRANSCRIPT --> TOKENIZE
    CAPTIONS --> TOKENIZE
    NOTES --> TOKENIZE
    METADATA --> TOKENIZE

    TOKENIZE --> NER
    TOKENIZE --> SENTIMENT
    TOKENIZE --> KEYWORDS

    NER --> PEOPLE
    NER --> PLACES
    NER --> EVENTS

    SENTIMENT --> EMOTIONS
    KEYWORDS --> TIME

    PEOPLE --> RELATIONSHIPS
    PLACES --> TIMELINE
    EVENTS --> TIMELINE
    EMOTIONS --> THEMES
    TIME --> TIMELINE

    RELATIONSHIPS --> KNOWLEDGE[(Knowledge Graph)]
    TIMELINE --> KNOWLEDGE
    THEMES --> KNOWLEDGE

Entity Recognition

The NLP pipeline identifies and categorizes important entities:

| Entity Type | Examples | Use Case |
|---|---|---|
| People | Names, relationships, nicknames | Family tree construction |
| Places | Cities, landmarks, addresses | Geographic timeline |
| Events | Birthdays, weddings, graduations | Life milestone tracking |
| Dates | Years, seasons, holidays | Chronological ordering |
| Objects | Cars, homes, pets | Significant possessions |
| Activities | Hobbies, sports, work | Interest identification |
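
A production pipeline would use a trained NER model for this, but the categorization step itself can be illustrated with a toy gazetteer lookup (the entity lists below are invented for the example):

```python
# Toy gazetteer-based entity tagger illustrating the categorization step.
# Real pipelines use a trained NER model; these lists are invented examples.
GAZETTEER = {
    "People": {"Margaret", "Grandpa Joe"},
    "Places": {"Chicago", "Lake Michigan"},
    "Events": {"wedding", "graduation"},
}

def tag_entities(text: str):
    """Map each entity type to the known terms found in the text."""
    found = {}
    for entity_type, terms in GAZETTEER.items():
        hits = sorted(t for t in terms if t.lower() in text.lower())
        if hits:
            found[entity_type] = hits
    return found

entities = tag_entities("Margaret's wedding by Lake Michigan")
# -> {'People': ['Margaret'], 'Places': ['Lake Michigan'], 'Events': ['wedding']}
```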

Story Generation Architecture

Multi-Stage Generation Process

stateDiagram-v2
    [*] --> DataCollection
    DataCollection --> MemoryAnalysis
    MemoryAnalysis --> ThemeExtraction
    ThemeExtraction --> TimelineConstruction
    TimelineConstruction --> NarrativeGeneration
    NarrativeGeneration --> ContentReview
    ContentReview --> QualityCheck
    QualityCheck --> FinalStory

    QualityCheck --> NarrativeGeneration: Needs Improvement
    ContentReview --> ThemeExtraction: Adjust Themes

    FinalStory --> [*]
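
The staged flow with its feedback edges can be sketched as a loop over ordered stages, where a failing check rewinds execution to an earlier stage. This is an illustrative control-flow sketch; the stage functions and retry limit are invented:

```python
# Illustrative sketch of the multi-stage flow with feedback edges.
# Stage names mirror the diagram; the check logic is invented.
def run_story_pipeline(stages, max_revisions=3):
    """stages: ordered (name, fn) pairs; fn returns (ok, rewind_to_or_None)."""
    i, revisions, log = 0, 0, []
    while i < len(stages):
        name, fn = stages[i]
        log.append(name)
        ok, rewind_to = fn()
        if ok:
            i += 1
        elif revisions < max_revisions:
            revisions += 1
            i = next(j for j, (n, _) in enumerate(stages) if n == rewind_to)
        else:
            raise RuntimeError(f"gave up at stage {name}")
    return log

attempts = {"QualityCheck": 0}
def quality_check():
    # Fail once, then pass -- exercising the feedback edge
    attempts["QualityCheck"] += 1
    return (attempts["QualityCheck"] > 1, "NarrativeGeneration")

stages = [
    ("NarrativeGeneration", lambda: (True, None)),
    ("ContentReview",       lambda: (True, None)),
    ("QualityCheck",        quality_check),
]
log = run_story_pipeline(stages)
# -> the three stages run, QualityCheck rewinds once, then all pass
```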

Prompt Engineering

Our story generation uses carefully crafted prompts designed for memorial content:

story_generation:
  system_prompt: |
    You are a compassionate writer helping families create respectful digital memorials. 
    Your goal is to craft meaningful, accurate narratives that honor the person's memory
    while being sensitive to grief and loss.

  guidelines:
    - Use warm, respectful language throughout
    - Focus on positive memories and character traits
    - Include specific details from the provided memories
    - Maintain chronological coherence
    - Acknowledge the person's impact on others
    - End with messages of love and remembrance

  structure:
    - Opening: Brief introduction with key characteristics
    - Early Life: Formative experiences and relationships
    - Adult Years: Achievements, family, and passions
    - Character Portrait: Personality, values, and quirks
    - Legacy: How they touched others' lives
    - Closing: Celebration of their lasting impact
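
In practice a config like this gets folded into a chat-style request: the system prompt goes in as-is and the guidelines become part of the instruction text. A hedged sketch of that assembly step (the config dict mirrors a slice of the YAML above; the OpenAI-style message shape is an assumption, not necessarily the project's actual call):

```python
# Sketch: fold the prompt config above into a chat-style messages list.
# The exact request shape used by the project is an assumption here.
config = {
    "system_prompt": "You are a compassionate writer helping families create "
                     "respectful digital memorials.",
    "guidelines": [
        "Use warm, respectful language throughout",
        "Maintain chronological coherence",
    ],
}

def build_messages(config, memories):
    instructions = config["system_prompt"] + "\nGuidelines:\n" + "\n".join(
        f"- {g}" for g in config["guidelines"]
    )
    return [
        {"role": "system", "content": instructions},
        {"role": "user", "content": "Memories:\n" + "\n".join(memories)},
    ]

messages = build_messages(config, ["She taught piano for 40 years."])
```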

Quality Assurance

flowchart TD
    STORY[Generated Story] --> FACT_CHECK[Fact Verification]
    FACT_CHECK --> TONE_CHECK[Tone Analysis]
    TONE_CHECK --> COHERENCE[Narrative Coherence]
    COHERENCE --> SENSITIVITY[Sensitivity Review]

    subgraph "Automated Checks"
        FACT_CHECK --> DATES[Date Consistency]
        FACT_CHECK --> NAMES[Name Accuracy]  
        FACT_CHECK --> PLACES[Location Verification]
    end

    subgraph "Content Quality"
        TONE_CHECK --> APPROPRIATE[Appropriate Language]
        TONE_CHECK --> RESPECTFUL[Respectful Tone]
        COHERENCE --> FLOW[Narrative Flow]
        COHERENCE --> STRUCTURE[Story Structure]
    end

    subgraph "Human Review"
        SENSITIVITY --> GUIDELINES[Editorial Guidelines]
        SENSITIVITY --> CULTURAL[Cultural Sensitivity]
        SENSITIVITY --> GRIEF[Grief-Aware Language]
    end

    DATES --> PASS{Quality Check}
    NAMES --> PASS
    PLACES --> PASS
    APPROPRIATE --> PASS
    RESPECTFUL --> PASS
    FLOW --> PASS
    STRUCTURE --> PASS
    GUIDELINES --> PASS
    CULTURAL --> PASS
    GRIEF --> PASS

    PASS -->|Pass| APPROVE[Approved Story]
    PASS -->|Needs Work| REVISE[Revision Required]
    REVISE --> STORY
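
The pass/revise gate at the bottom of the diagram is essentially an all-checks-must-pass aggregation. A minimal sketch with invented check functions:

```python
# Minimal sketch of the quality gate: every named check must pass,
# otherwise the story goes back for revision. The checks are invented toys.
def run_quality_gate(story, checks):
    failures = [name for name, check in checks if not check(story)]
    return ("approved", []) if not failures else ("revise", failures)

checks = [
    ("date_consistency", lambda s: "1977" not in s),          # toy rule
    ("respectful_tone",  lambda s: "hated" not in s.lower()),  # toy rule
]

status, failed = run_quality_gate("She was born in 1947 and loved to sail.", checks)
# -> status == "approved", failed == []
```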

Privacy & Security Considerations

Local vs. Cloud Processing

graph LR
    subgraph "Local Processing (Default)"
        LOCAL_WHISPER[Whisper Model]
        LOCAL_CLIP[CLIP Model]
        LOCAL_NLP[Local NLP]
    end

    subgraph "Cloud Processing (Optional)"
        OPENAI_API[OpenAI API]
        ANTHROPIC_API[Anthropic API]
        CLOUD_STORAGE[Cloud Storage]
    end

    subgraph "Hybrid Approach"
        FALLBACK[Graceful Fallback]
        TEMPLATE[Template-Based Stories]
        CACHE[Local Caching]
    end

    FILES[User Files] --> LOCAL_WHISPER
    FILES --> LOCAL_CLIP

    LOCAL_WHISPER -.->|Optional| OPENAI_API
    LOCAL_CLIP --> LOCAL_NLP
    LOCAL_NLP -.->|Optional| ANTHROPIC_API

    OPENAI_API --> FALLBACK
    ANTHROPIC_API --> FALLBACK
    FALLBACK --> TEMPLATE

    LOCAL_WHISPER --> CACHE
    LOCAL_CLIP --> CACHE
    LOCAL_NLP --> CACHE

Data Protection Measures

  • All processed content encrypted in database
  • Temporary files use full disk encryption
  • Secure key management with rotation
  • TLS encryption for all API communications
  • Certificate pinning for external services
  • Secure token-based authentication
  • Isolated processing environments
  • Automatic cleanup of temporary data
  • Memory-safe processing pipelines
  • Share tokens instead of public identifiers
  • Time-limited access with revocation
  • Audit logging for all access attempts
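
Time-limited share tokens with revocation can be sketched with the standard library alone (`secrets` for an unguessable token, a timestamp for expiry). This is an illustrative design, not the project's actual token scheme:

```python
import secrets
import time

class ShareTokens:
    """Toy time-limited share tokens with revocation (illustrative only)."""
    def __init__(self):
        self._tokens = {}  # token -> (memorial_id, expires_at)

    def issue(self, memorial_id, ttl_seconds):
        token = secrets.token_urlsafe(32)  # unguessable public identifier
        self._tokens[token] = (memorial_id, time.time() + ttl_seconds)
        return token

    def resolve(self, token):
        entry = self._tokens.get(token)
        if entry is None:
            return None                    # unknown or revoked
        memorial_id, expires_at = entry
        if time.time() > expires_at:
            del self._tokens[token]        # expired: clean up
            return None
        return memorial_id

    def revoke(self, token):
        self._tokens.pop(token, None)

store = ShareTokens()
token = store.issue("memorial-42", ttl_seconds=3600)
assert store.resolve(token) == "memorial-42"
store.revoke(token)
assert store.resolve(token) is None
```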

Performance Optimization

Async Processing Architecture

# Simplified async processing example
import asyncio

async def process_media_async(asset_id: int):
    async with ProcessingSession() as session:
        # Run the independent AI tasks concurrently
        tasks = [
            transcribe_audio(asset_id),
            classify_image(asset_id),
            extract_metadata(asset_id),
        ]

        # Wait for all tasks; capture per-task exceptions instead of failing fast
        results = await asyncio.gather(*tasks, return_exceptions=True)

        # Persist the results for this asset
        await update_processing_results(asset_id, results)

Caching Strategy

| Cache Level | Content | TTL | Purpose |
|---|---|---|---|
| L1 (Memory) | Active processing results | 5 min | Immediate access |
| L2 (Redis) | AI model outputs | 1 hour | Cross-session sharing |
| L3 (Database) | Processed metadata | 24 hours | Persistent storage |
| L4 (File) | Generated thumbnails | 7 days | Bandwidth saving |
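
An in-process L1 layer like the one above is essentially a dict with expiry timestamps. A minimal TTL-cache sketch (the real L2 layer would use Redis; this in-memory version is illustrative):

```python
import time

class TTLCache:
    """Minimal in-memory TTL cache, illustrating the L1 layer above."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, time.time() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.time() > expires_at:
            del self._store[key]  # expired: evict lazily on read
            return default
        return value

cache = TTLCache(ttl_seconds=300)  # 5 min, matching the L1 row above
cache.set("asset:17:transcript", "partial transcript text")
assert cache.get("asset:17:transcript") == "partial transcript text"
```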

Privacy-First AI

All AI processing can run entirely locally, ensuring sensitive memories never leave your infrastructure while still providing powerful AI insights.

Next Steps