Data Import¶
Learn how to import your fitness data from various sources into the Fitness Dashboard.
Supported Data Sources¶
The dashboard is currently optimized for MapMyRun data but can be adapted for other fitness platforms with CSV export capabilities.
MapMyRun (Recommended)¶
MapMyRun provides comprehensive workout data that integrates seamlessly with the dashboard.
Exporting from MapMyRun¶
- Sign in to your MapMyRun account
- Navigate to MapMyRun Export
- Select date range for your export (or choose "All Time")
- Download the CSV file
Expected Data Format¶
Your MapMyRun export should include these columns:
| Column | Description | Example |
|---|---|---|
| Workout Id | Unique identifier | "2632022148" |
| Workout Date | Date and time | "2024-01-15 08:30:00" |
| Activity Type | Exercise category | "Running", "Cycling" |
| Total Calories | Energy burned | "450" |
| Distance (mi) | Miles covered | "3.2" |
| Duration | Time in seconds | "1800" |
| Avg Pace (min/mi) | Average pace | "8.5" |
| Max Pace (min/mi) | Best pace | "7.2" |
| Steps | Step count | "4200" |
| Reference | Link to workout | "https://..." |
Other Fitness Platforms¶
The dashboard can be adapted for other platforms with similar data structures:
- Strava: Export GPX/CSV data
- Garmin Connect: CSV export from activity history
- Fitbit: Data export via Fitbit API
- Apple Health: Health app data export
- Google Fit: Takeout data export
How Data Import Works¶
Understanding the complete journey from your fitness app to dashboard insights:
```mermaid
flowchart TD
    A[📱 Export CSV from<br/>MapMyRun] --> B{📋 Data Validation}
    B -->|✅ Valid| C[💾 Store in Database<br/>workout_summary table]
    B -->|❌ Invalid| D[🚨 Show Error<br/>& Validation Tips]
    C --> E[🔍 Analyze Features<br/>Pace, Distance, Duration]
    E --> F{🤖 Classification Process}
    F --> G[🏃 Real Run<br/>8-12 min/mile]
    F --> H[🚶 Walking<br/>20-28 min/mile]
    F --> I[🔄 Mixed Activity<br/>Variable pace]
    F --> J[⚠️ Outlier<br/>Unusual data]
    G --> K[📊 Available in Dashboard<br/>Charts, Trends, Insights]
    H --> K
    I --> K
    J --> K

    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style B fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style C fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style D fill:#ffebee,stroke:#d32f2f,stroke-width:2px
    style E fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style F fill:#e0f2f1,stroke:#00695c,stroke-width:2px
    style K fill:#e1f5fe,stroke:#0277bd,stroke-width:3px
```
Your data flows through validation, analysis, and classification before appearing in your dashboard. Each step ensures data quality and meaningful categorization.
Import Process¶
Step 1: Prepare Your Data File¶
- Download your fitness data as CSV
- Save the file in the `src/` directory of your project
- Note the exact filename for configuration
Step 2: Configure Data Source¶
Update your `pyproject.toml` file to reference your data file:
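The exact configuration keys depend on how the project wires things up; a minimal sketch, assuming a custom tool table with a data-file key (verify the section and key names against your existing `pyproject.toml`):

```toml
# Hypothetical section and key names; match whatever your
# pyproject.toml already defines for the data source
[tool.fitness-dashboard]
data_file = "src/workouts.csv"
```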
Step 3: Run Import Script¶
Execute the data import process:
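From the project root:

```bash
python src/update_db.py
```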
The script will:
- ✅ Read your CSV file
- ✅ Validate data format and structure
- ✅ Clean and normalize data
- ✅ Insert records into the database
- ✅ Report import statistics
Step 4: Verify Import¶
Check that your data was imported successfully:
- Launch the dashboard: `streamlit run src/streamlit_app.py`
- Navigate to Fitness Overview
- Run the query: `SELECT COUNT(*) FROM workout_summary;`
- Check the main dashboard for your workout data
Data Mapping and Transformation¶
Column Mapping¶
The import process maps CSV columns to database fields:
```python
# Example mapping (in update_db.py)
column_mapping = {
    'Workout Id': 'workout_id',
    'Workout Date': 'workout_date',
    'Activity Type': 'activity_type',
    'Total Calories': 'kcal_burned',
    'Distance (mi)': 'distance_mi',
    'Duration': 'duration_sec',
    'Avg Pace (min/mi)': 'avg_pace',
    'Max Pace (min/mi)': 'max_pace',
    'Steps': 'steps',
    'Reference': 'link'
}
```
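In pandas terms, applying the mapping amounts to a column rename; a sketch, assuming the script loads the CSV with pandas:

```python
import pandas as pd

df = pd.read_csv('src/your_data.csv')   # your exported CSV
df = df.rename(columns=column_mapping)  # column_mapping as defined above
```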
Data Cleaning¶
The import process includes automatic data cleaning:
- Date Parsing: Converts various date formats to MySQL DATETIME
- Numeric Validation: Ensures numeric fields contain valid numbers
- Text Normalization: Standardizes activity type names
- Duplicate Detection: Prevents importing duplicate workouts
- Missing Data Handling: Sets appropriate defaults for missing fields
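A rough pandas sketch of these steps (illustrative only; the real logic lives in `src/update_db.py` and may differ):

```python
import pandas as pd

# df uses database field names at this point (after the rename above)

# Date parsing: unparseable dates become NaT rather than raising
df['workout_date'] = pd.to_datetime(df['workout_date'], errors='coerce')

# Numeric validation: non-numeric values become NaN for later handling
numeric_cols = ['kcal_burned', 'distance_mi', 'duration_sec',
                'avg_pace', 'max_pace', 'steps']
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors='coerce')

# Text normalization: standardize activity type names
df['activity_type'] = df['activity_type'].str.strip().str.title()

# Duplicate detection: drop repeated workout IDs
df = df.drop_duplicates(subset='workout_id')

# Missing data handling: an example default for an optional field
df['steps'] = df['steps'].fillna(0)
```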
Custom Data Sources¶
Adapting for New Platforms¶
To support a new fitness platform:
- Analyze the CSV structure from your platform
- Update column mapping in `src/update_db.py`
- Modify data transformation logic if needed
- Test with a small data sample
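For example, a Garmin Connect export might map roughly like this (the source header names here are assumptions; check them against your actual file):

```python
# Hypothetical Garmin Connect column mapping; verify the header
# names in your export before adopting these
garmin_mapping = {
    'Activity ID': 'workout_id',
    'Date': 'workout_date',
    'Activity Type': 'activity_type',
    'Calories': 'kcal_burned',
    'Distance': 'distance_mi',
    'Time': 'duration_sec',
}
```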
Manual Data Entry¶
For manual data entry or custom tracking:
Create a CSV file with the required columns:
```csv
workout_id,workout_date,activity_type,kcal_burned,distance_mi,duration_sec,avg_pace,max_pace,steps,link
manual_001,2024-01-15 08:00:00,Running,300,2.5,1200,8.0,7.5,3000,
manual_002,2024-01-16 09:00:00,Cycling,400,10.0,2400,,,5000,
```
Batch Import Operations¶
Large Dataset Handling¶
For large datasets (1000+ workouts):
- Split large CSV files into smaller batches
- Import progressively to monitor progress
- Use database transactions for data integrity (see the sketch below)
- Monitor memory usage during import
```bash
# Example: Split large file into 500-row batches, each with the header
head -n 1 large_file.csv > header.csv
tail -n +2 large_file.csv | split -l 500 - batch_
for file in batch_*; do
    cat header.csv $file > import_$file.csv
done
```
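For the transaction point above, a pymysql sketch with assumed connection details and a trimmed column list (extend both to match your setup):

```python
import pymysql

# Sample rows; in practice, read these from one of the split batch files
batch_rows = [
    ('manual_001', '2024-01-15 08:00:00'),
    ('manual_002', '2024-01-16 09:00:00'),
]

# Connection parameters are placeholders; use your .env values
conn = pymysql.connect(host='localhost', user='user',
                       password='secret', database='fitness')
try:
    with conn.cursor() as cur:
        # Column list trimmed for brevity; include all required fields
        cur.executemany(
            "INSERT INTO workout_summary (workout_id, workout_date) VALUES (%s, %s)",
            batch_rows,
        )
    conn.commit()  # one transaction per batch keeps partial failures recoverable
except Exception:
    conn.rollback()
    raise
finally:
    conn.close()
```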
Incremental Updates¶
For regular data updates:
- Export only new workouts since last import
- Use date-based filtering in your export
- Run import script regularly (weekly/monthly)
- Verify no duplicate records are created (see the query below)
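A quick duplicate check after an incremental run:

```sql
-- Any workout_id appearing more than once indicates a duplicate import
SELECT workout_id, COUNT(*) AS n
FROM workout_summary
GROUP BY workout_id
HAVING n > 1;
```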
Data Quality and Validation¶
Pre-Import Validation¶
Before importing, validate your data:
```python
import pandas as pd

# Load and inspect data
df = pd.read_csv('your_data.csv')
print(f"Records: {len(df)}")
print(f"Date range: {df['Workout Date'].min()} to {df['Workout Date'].max()}")
print(f"Activities: {df['Activity Type'].unique()}")
print(f"Missing values: {df.isnull().sum()}")
```
Post-Import Verification¶
After import, verify data quality:
```sql
-- Check record count
SELECT COUNT(*) AS total_workouts FROM workout_summary;

-- Check date range
SELECT MIN(workout_date) AS earliest, MAX(workout_date) AS latest
FROM workout_summary;

-- Check activity distribution
SELECT activity_type, COUNT(*) AS count
FROM workout_summary
GROUP BY activity_type
ORDER BY count DESC;

-- Check for anomalies
SELECT * FROM workout_summary
WHERE distance_mi > 50 OR duration_sec > 14400; -- Potential data errors
```
Troubleshooting Import Issues¶
Common Problems¶
**File Not Found**

Error: `FileNotFoundError: No such file or directory`

Solutions:
- Verify the file path in `pyproject.toml`
- Ensure the CSV file is in the correct directory
- Check file permissions
**CSV Format Error**

Error: `pandas.errors.EmptyDataError` or parsing errors

Solutions:
- Check CSV file encoding (UTF-8 recommended)
- Verify column headers match the expected format
- Remove or escape special characters in data
**Database Connection Error**

Error: `pymysql.err.OperationalError`

Solutions:
- Verify the database is running
- Check credentials in the `.env` file
- Ensure the database and table exist
**Duplicate Key Error**

Error: `pymysql.err.IntegrityError: Duplicate entry`

Solutions:
- Check for duplicate workout IDs in the CSV
- Clear existing data if re-importing: `DELETE FROM workout_summary;`
- Implement upsert logic for updates (see the sketch below)
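In MySQL, upsert logic can be expressed with `INSERT ... ON DUPLICATE KEY UPDATE`; a sketch with a trimmed column list (extend it to your full schema):

```sql
INSERT INTO workout_summary (workout_id, workout_date, activity_type, kcal_burned)
VALUES ('2632022148', '2024-01-15 08:30:00', 'Running', 450)
ON DUPLICATE KEY UPDATE
    workout_date  = VALUES(workout_date),
    activity_type = VALUES(activity_type),
    kcal_burned   = VALUES(kcal_burned);
```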
Data Quality Issues¶
**Missing Data**

Symptoms: Empty fields or null values in the dashboard

Solutions:
- Review the CSV file for completeness
- Update the import script to handle missing values
- Set appropriate defaults for optional fields
**Incorrect Dates**

Symptoms: Workouts appearing in wrong time periods

Solutions:
- Verify the date format in the CSV matches parser expectations
- Check timezone handling in the import script
- Manually inspect problematic date entries (snippet below)
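A small pandas snippet for the last point, assuming the raw MapMyRun header names:

```python
import pandas as pd

df = pd.read_csv('your_data.csv')

# Dates that fail to parse become NaT; print them for manual review
parsed = pd.to_datetime(df['Workout Date'], errors='coerce')
print(df.loc[parsed.isna(), 'Workout Date'])
```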
Advanced Import Features¶
Automated Imports¶
Set up automated data imports:
```bash
#!/bin/bash
# auto_import.sh - Automated import script

# Download latest data (platform-specific)
# ... download logic ...

# Run import
cd /path/to/fitness-dashboard
python src/update_db.py

# Log results
echo "Import completed: $(date)" >> import.log
```
Schedule with cron:
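For example, a weekly run every Sunday at 06:00 (adjust the path and schedule to suit):

```bash
# crontab -e
0 6 * * 0 /path/to/fitness-dashboard/auto_import.sh
```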
API Integration¶
For real-time data integration (future enhancement):
```python
# Example: Strava API integration
import requests

def fetch_strava_activities(access_token):
    url = "https://www.strava.com/api/v3/athlete/activities"
    headers = {"Authorization": f"Bearer {access_token}"}
    response = requests.get(url, headers=headers)
    return response.json()
```
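A hypothetical call, assuming you have already obtained an OAuth access token from Strava:

```python
activities = fetch_strava_activities("YOUR_ACCESS_TOKEN")
print(f"Fetched {len(activities)} activities")
```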
Next Steps¶
After successfully importing your data:
- Explore Visualizations: Learn about Visualization Features
- Run Custom Analysis: Use the SQL Query Interface
- Set Up Regular Updates: Establish a routine import schedule
- Monitor Data Quality: Regularly validate imported data
For additional help, see the Troubleshooting Reference.