Data Import

Learn how to import your fitness data from various sources into the Fitness Dashboard.

Supported Data Sources

The dashboard is currently optimized for MapMyRun data but can be adapted for other fitness platforms with CSV export capabilities.

MapMyRun provides comprehensive workout data that integrates seamlessly with the dashboard.

Exporting from MapMyRun

  1. Sign in to your MapMyRun account
  2. Navigate to MapMyRun Export
  3. Select date range for your export (or choose "All Time")
  4. Download the CSV file

Expected Data Format

Your MapMyRun export should include these columns:

| Column | Description | Example |
|--------|-------------|---------|
| Workout Id | Unique identifier | "2632022148" |
| Workout Date | Date and time | "2024-01-15 08:30:00" |
| Activity Type | Exercise category | "Running", "Cycling" |
| Total Calories | Energy burned | "450" |
| Distance (mi) | Miles covered | "3.2" |
| Duration | Time in seconds | "1800" |
| Avg Pace (min/mi) | Average pace | "8.5" |
| Max Pace (min/mi) | Best pace | "7.2" |
| Steps | Step count | "4200" |
| Reference | Link to workout | "https://..." |
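
To confirm an export matches this layout before importing, a quick header check helps (a sketch; the filename is a placeholder):

import pandas as pd

EXPECTED_COLUMNS = {
    "Workout Id", "Workout Date", "Activity Type", "Total Calories",
    "Distance (mi)", "Duration", "Avg Pace (min/mi)", "Max Pace (min/mi)",
    "Steps", "Reference",
}

df = pd.read_csv("workout_history.csv")  # placeholder filename
missing = EXPECTED_COLUMNS - set(df.columns)
print(f"Missing columns: {sorted(missing)}" if missing else "All expected columns present.")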

Other Fitness Platforms

The dashboard can be adapted for other platforms with similar data structures:

  • Strava: Export GPX/CSV data
  • Garmin Connect: CSV export from activity history
  • Fitbit: Data export via Fitbit API
  • Apple Health: Health app data export
  • Google Fit: Takeout data export

How Data Import Works

Understanding the complete journey from your fitness app to dashboard insights:

flowchart TD
    A[📱 Export CSV from<br/>MapMyRun] --> B{📋 Data Validation}

    B -->|✅ Valid| C[💾 Store in Database<br/>workout_summary table]
    B -->|❌ Invalid| D[🚨 Show Error<br/>& Validation Tips]

    C --> E[🔍 Analyze Features<br/>Pace, Distance, Duration]
    E --> F{🤖 Classification Process}

    F --> G[🏃 Real Run<br/>8-12 min/mile]
    F --> H[🚶 Walking<br/>20-28 min/mile]
    F --> I[🔄 Mixed Activity<br/>Variable pace]
    F --> J[⚠️ Outlier<br/>Unusual data]

    G --> K[📊 Available in Dashboard<br/>Charts, Trends, Insights]
    H --> K
    I --> K
    J --> K

    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style B fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style C fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style D fill:#ffebee,stroke:#d32f2f,stroke-width:2px
    style E fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style F fill:#e0f2f1,stroke:#00695c,stroke-width:2px
    style K fill:#e1f5fe,stroke:#0277bd,stroke-width:3px

Your data flows through validation, analysis, and classification before appearing in your dashboard. Each step ensures data quality and meaningful categorization.
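
The pace thresholds in the flowchart drive the classification step. A minimal sketch of the idea (the function name and exact cutoffs are illustrative, not the dashboard's actual code):

def classify_workout(avg_pace_min_per_mi):
    """Bucket a workout by average pace (min/mile), mirroring the flowchart."""
    if avg_pace_min_per_mi is None:
        return "outlier"  # missing pace data is treated as unusual
    if 8 <= avg_pace_min_per_mi <= 12:
        return "real_run"  # typical running pace
    if 12 < avg_pace_min_per_mi < 20:
        return "mixed"  # variable pace between running and walking ranges
    if 20 <= avg_pace_min_per_mi <= 28:
        return "walking"
    return "outlier"  # implausibly fast or slow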

Import Process

Step 1: Prepare Your Data File

  1. Download your fitness data as CSV
  2. Save the file in the src/ directory of your project
  3. Note the exact filename for configuration

Step 2: Configure Data Source

Update your pyproject.toml file to reference your data file:

[tool.project]
input_filename = "your_workout_history.csv"
debug = true
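
One way the import script might read this setting (a sketch using the standard-library tomllib; the actual mechanism in update_db.py may differ):

import tomllib  # Python 3.11+; the third-party tomli package offers the same API on older versions

with open("pyproject.toml", "rb") as f:
    config = tomllib.load(f)

input_filename = config["tool"]["project"]["input_filename"]
debug = config["tool"]["project"].get("debug", False)
print(f"Importing from {input_filename} (debug={debug})")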

Step 3: Run Import Script

Execute the data import process:

cd /path/to/fitness-dashboard
python src/update_db.py

The script will:

  • ✅ Read your CSV file
  • ✅ Validate data format and structure
  • ✅ Clean and normalize data
  • ✅ Insert records into the database
  • ✅ Report import statistics

Step 4: Verify Import

Check that your data was imported successfully:

  1. Launch the dashboard: streamlit run src/streamlit_app.py
  2. Navigate to Fitness Overview
  3. Run query: SELECT COUNT(*) FROM workout_summary;
  4. Check the main dashboard for your workout data

Data Mapping and Transformation

Column Mapping

The import process maps CSV columns to database fields:

# Example mapping (in update_db.py)
column_mapping = {
    'Workout Id': 'workout_id',
    'Workout Date': 'workout_date', 
    'Activity Type': 'activity_type',
    'Total Calories': 'kcal_burned',
    'Distance (mi)': 'distance_mi',
    'Duration': 'duration_sec',
    'Avg Pace (min/mi)': 'avg_pace',
    'Max Pace (min/mi)': 'max_pace',
    'Steps': 'steps',
    'Reference': 'link'
}
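
With pandas, applying the mapping is a single rename (assuming the script loads the CSV into a DataFrame):

import pandas as pd

df = pd.read_csv("your_workout_history.csv")
df = df.rename(columns=column_mapping)  # column_mapping as defined above

After the rename, the DataFrame's columns match the workout_summary field names.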

Data Cleaning

The import process includes automatic data cleaning:

  • Date Parsing: Converts various date formats to MySQL DATETIME
  • Numeric Validation: Ensures numeric fields contain valid numbers
  • Text Normalization: Standardizes activity type names
  • Duplicate Detection: Prevents importing duplicate workouts
  • Missing Data Handling: Sets appropriate defaults for missing fields
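
A condensed sketch of what these steps might look like in pandas (illustrative; column names assume the mapped schema above, and the defaults are examples, not the script's actual choices):

import pandas as pd

def clean_workouts(df):
    # Date parsing: coerce unparseable dates to NaT for later inspection
    df["workout_date"] = pd.to_datetime(df["workout_date"], errors="coerce")
    # Numeric validation: non-numeric values become NaN
    for col in ["kcal_burned", "distance_mi", "duration_sec", "avg_pace", "max_pace", "steps"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # Text normalization: trim whitespace and standardize capitalization
    df["activity_type"] = df["activity_type"].str.strip().str.title()
    # Duplicate detection: keep the first occurrence of each workout ID
    df = df.drop_duplicates(subset="workout_id", keep="first")
    # Missing data handling: default step counts to 0
    df["steps"] = df["steps"].fillna(0).astype(int)
    return df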

Custom Data Sources

Adapting for New Platforms

To support a new fitness platform:

  1. Analyze the CSV structure from your platform
  2. Update column mapping in src/update_db.py
  3. Modify data transformation logic if needed
  4. Test with a small data sample
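
For example, a Garmin Connect export uses different headers, so the mapping would change along these lines (the Garmin column names below are hypothetical; verify them against your actual export):

# Hypothetical mapping for a Garmin Connect CSV -- check your export's real headers
garmin_column_mapping = {
    'Activity ID': 'workout_id',
    'Date': 'workout_date',
    'Activity Type': 'activity_type',
    'Calories': 'kcal_burned',
    'Distance': 'distance_mi',
    'Time': 'duration_sec',
    'Avg Pace': 'avg_pace',
    'Best Pace': 'max_pace',
    'Steps': 'steps',
}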

Manual Data Entry

For manual data entry or custom tracking:

Create a CSV file with the required columns:

workout_id,workout_date,activity_type,kcal_burned,distance_mi,duration_sec,avg_pace,max_pace,steps,link
manual_001,2024-01-15 08:00:00,Running,300,2.5,1200,8.0,7.5,3000,
manual_002,2024-01-16 09:00:00,Cycling,400,10.0,2400,,,5000,

Batch Import Operations

Large Dataset Handling

For large datasets (1000+ workouts):

  1. Split large CSV files into smaller batches
  2. Import progressively to monitor progress
  3. Use database transactions for data integrity
  4. Monitor memory usage during import

# Example: split a large CSV into 500-row batches, each with the original header
head -n 1 large_file.csv > header.csv
tail -n +2 large_file.csv | split -l 500 - batch_
for file in batch_*; do
    cat header.csv "$file" > "import_${file}.csv"
done
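
Alternatively, pandas can stream the file in chunks without splitting it on disk (a sketch; the chunk size is arbitrary):

import pandas as pd

for i, chunk in enumerate(pd.read_csv("large_file.csv", chunksize=500)):
    # process and insert each 500-row chunk inside its own transaction
    print(f"Chunk {i}: {len(chunk)} rows")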

Incremental Updates

For regular data updates:

  1. Export only new workouts since last import
  2. Use date-based filtering in your export
  3. Run import script regularly (weekly/monthly)
  4. Verify no duplicate records are created
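
One way to guard against re-importing old rows is to filter the CSV against the latest date already stored (a sketch; connection details are placeholders for your .env values):

import pandas as pd
import pymysql

conn = pymysql.connect(host="localhost", user="user", password="secret", database="fitness")  # placeholders
with conn.cursor() as cur:
    cur.execute("SELECT MAX(workout_date) FROM workout_summary")
    last_imported = cur.fetchone()[0]

df = pd.read_csv("new_workouts.csv", parse_dates=["Workout Date"])
new_rows = df[df["Workout Date"] > last_imported] if last_imported else df
print(f"{len(new_rows)} new workouts to import")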

Data Quality and Validation

Pre-Import Validation

Before importing, validate your data:

import pandas as pd

# Load and inspect data
df = pd.read_csv('your_data.csv')
print(f"Records: {len(df)}")
print(f"Date range: {df['Workout Date'].min()} to {df['Workout Date'].max()}")
print(f"Activities: {df['Activity Type'].unique()}")
print(f"Missing values: {df.isnull().sum()}")

Post-Import Verification

After import, verify data quality:

-- Check record count
SELECT COUNT(*) as total_workouts FROM workout_summary;

-- Check date range
SELECT MIN(workout_date) as earliest, MAX(workout_date) as latest 
FROM workout_summary;

-- Check activity distribution  
SELECT activity_type, COUNT(*) as count 
FROM workout_summary 
GROUP BY activity_type 
ORDER BY count DESC;

-- Check for anomalies
SELECT * FROM workout_summary 
WHERE distance_mi > 50 OR duration_sec > 14400; -- Potential data errors

Troubleshooting Import Issues

Common Problems

File Not Found

Error: FileNotFoundError: No such file or directory

Solutions:

  • Verify the file path in pyproject.toml
  • Ensure the CSV file is in the correct directory
  • Check file permissions

CSV Format Error

Error: pandas.errors.EmptyDataError or parsing errors

Solutions:

  • Check the CSV file encoding (UTF-8 recommended)
  • Verify column headers match the expected format
  • Remove or escape special characters in the data

Database Connection Error

Error: pymysql.err.OperationalError

Solutions:

  • Verify the database is running
  • Check credentials in the .env file
  • Ensure the database and table exist
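
To isolate the problem, a short connectivity test outside the import script can help (a sketch; credentials are placeholders for your .env values):

import pymysql

try:
    conn = pymysql.connect(host="localhost", user="user", password="secret", database="fitness")  # placeholders
    print("Connection OK")
    conn.close()
except pymysql.err.OperationalError as exc:
    print(f"Connection failed: {exc}")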

Duplicate Key Error

Error: pymysql.err.IntegrityError: Duplicate entry

Solutions:

  • Check for duplicate workout IDs in the CSV
  • Clear existing data if re-importing: DELETE FROM workout_summary;
  • Implement upsert logic for updates (see the sketch below)
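
For the upsert approach, MySQL's INSERT ... ON DUPLICATE KEY UPDATE avoids the error entirely (a sketch assuming an open pymysql connection conn and an abbreviated column list):

upsert_sql = """
    INSERT INTO workout_summary (workout_id, workout_date, activity_type, kcal_burned)
    VALUES (%s, %s, %s, %s)
    ON DUPLICATE KEY UPDATE
        workout_date = VALUES(workout_date),
        activity_type = VALUES(activity_type),
        kcal_burned = VALUES(kcal_burned)
"""
with conn.cursor() as cur:
    cur.executemany(upsert_sql, rows)  # rows: list of (id, date, type, kcal) tuples
conn.commit()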

Data Quality Issues

Missing Data

Symptoms: Empty fields or null values in dashboard

Solutions:

  • Review the CSV file for completeness
  • Update the import script to handle missing values
  • Set appropriate defaults for optional fields

Incorrect Dates

Symptoms: Workouts appearing in wrong time periods

Solutions:

  • Verify the date format in the CSV matches parser expectations
  • Check timezone handling in the import script
  • Manually inspect problematic date entries (see the sketch below)
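
To find the problematic entries, coercing the date column and inspecting the failures is a quick check (a sketch, assuming the raw MapMyRun headers):

import pandas as pd

df = pd.read_csv("your_data.csv")
parsed = pd.to_datetime(df["Workout Date"], errors="coerce")
bad = df[parsed.isna()]  # rows whose dates failed to parse
print(bad[["Workout Id", "Workout Date"]])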

Advanced Import Features

Automated Imports

Set up automated data imports:

#!/bin/bash
# auto_import.sh - Automated import script

# Download latest data (platform-specific)
# ... download logic ...

# Run import
cd /path/to/fitness-dashboard
python src/update_db.py

# Log results
echo "Import completed: $(date)" >> import.log

Schedule with cron:

# Run weekly on Sunday at 2 AM
0 2 * * 0 /path/to/auto_import.sh

API Integration

For real-time data integration (future enhancement):

# Example: Strava API integration
import requests

def fetch_strava_activities(access_token):
    """Fetch the authenticated athlete's recent activities from Strava."""
    url = "https://www.strava.com/api/v3/athlete/activities"
    headers = {"Authorization": f"Bearer {access_token}"}
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # surface HTTP errors instead of parsing bad JSON
    return response.json()

Next Steps

After successfully importing your data:

  1. Explore Visualizations: Learn about Visualization Features
  2. Run Custom Analysis: Use the SQL Query Interface
  3. Set Up Regular Updates: Establish a routine import schedule
  4. Monitor Data Quality: Regularly validate imported data

For additional help, see the Troubleshooting Reference.