Data Import¶
Learn how to import your fitness data from various sources into the Fitness Dashboard.
Supported Data Sources¶
The dashboard is currently optimized for MapMyRun data but can be adapted for other fitness platforms with CSV export capabilities.
MapMyRun (Recommended)¶
MapMyRun provides comprehensive workout data that integrates seamlessly with the dashboard.
Exporting from MapMyRun¶
- Sign in to your MapMyRun account
- Navigate to MapMyRun Export
- Select date range for your export (or choose "All Time")
- Download the CSV file
Expected Data Format¶
Your MapMyRun export should include these columns:
| Column | Description | Example |
|---|---|---|
| Workout Id | Unique identifier | "2632022148" |
| Workout Date | Date and time | "2024-01-15 08:30:00" |
| Activity Type | Exercise category | "Running", "Cycling" |
| Total Calories | Energy burned | "450" |
| Distance (mi) | Miles covered | "3.2" |
| Duration | Time in seconds | "1800" |
| Avg Pace (min/mi) | Average pace | "8.5" |
| Max Pace (min/mi) | Best pace | "7.2" |
| Steps | Step count | "4200" |
| Reference | Link to workout | "https://..." |
Other Fitness Platforms¶
The dashboard can be adapted for other platforms with similar data structures:
- Strava: Export GPX/CSV data
- Garmin Connect: CSV export from activity history
- Fitbit: Data export via Fitbit API
- Apple Health: Health app data export
- Google Fit: Takeout data export
How Data Import Works¶
Understanding the complete journey from your fitness app to dashboard insights:
```mermaid
flowchart TD
    A[📱 Export CSV from<br/>MapMyRun] --> B{📋 Data Validation}
    B -->|✅ Valid| C[💾 Store in Database<br/>workout_summary table]
    B -->|❌ Invalid| D[🚨 Show Error<br/>& Validation Tips]
    C --> E[🔍 Analyze Features<br/>Pace, Distance, Duration]
    E --> F{🤖 Classification Process}
    F --> G[🏃 Real Run<br/>8-12 min/mile]
    F --> H[🚶 Walking<br/>20-28 min/mile]
    F --> I[🔄 Mixed Activity<br/>Variable pace]
    F --> J[⚠️ Outlier<br/>Unusual data]
    G --> K[📊 Available in Dashboard<br/>Charts, Trends, Insights]
    H --> K
    I --> K
    J --> K

    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style B fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style C fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style D fill:#ffebee,stroke:#d32f2f,stroke-width:2px
    style E fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style F fill:#e0f2f1,stroke:#00695c,stroke-width:2px
    style K fill:#e1f5fe,stroke:#0277bd,stroke-width:3px
```
Your data flows through validation, analysis, and classification before appearing in your dashboard. Each step ensures data quality and meaningful categorization.
Import Process¶
Step 1: Prepare Your Data File¶
- Download your fitness data as CSV
- Save the file in the `src/` directory of your project
- Note the exact filename for configuration
Step 2: Configure Data Source¶
Update your `pyproject.toml` file to reference your data file:
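The exact configuration keys depend on how the project wires things up; a minimal sketch, assuming a custom tool table with a data-file key (verify the section and key names against your existing `pyproject.toml`):

```toml
# Hypothetical section and key names; match whatever your
# pyproject.toml already defines for the data source
[tool.fitness-dashboard]
data_file = "src/workouts.csv"
```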
Step 3: Run Import Script¶
Execute the data import process:
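From the project root:

```bash
python src/update_db.py
```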
The script will:
- ✅ Read your CSV file
- ✅ Validate data format and structure
- ✅ Clean and normalize data
- ✅ Insert records into the database
- ✅ Report import statistics
Step 4: Verify Import¶
Check that your data was imported successfully:
- Launch the dashboard: `streamlit run src/streamlit_app.py`
- Navigate to Fitness Overview
- Run the query: `SELECT COUNT(*) FROM workout_summary;`
- Check the main dashboard for your workout data
Data Mapping and Transformation¶
Column Mapping¶
The import process maps CSV columns to database fields:
```python
# Example mapping (in update_db.py)
column_mapping = {
    'Workout Id': 'workout_id',
    'Workout Date': 'workout_date',
    'Activity Type': 'activity_type',
    'Total Calories': 'kcal_burned',
    'Distance (mi)': 'distance_mi',
    'Duration': 'duration_sec',
    'Avg Pace (min/mi)': 'avg_pace',
    'Max Pace (min/mi)': 'max_pace',
    'Steps': 'steps',
    'Reference': 'link'
}
```
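In pandas terms, applying the mapping amounts to a column rename; a sketch, assuming the script loads the CSV with pandas:

```python
import pandas as pd

df = pd.read_csv('src/your_data.csv')   # your exported CSV
df = df.rename(columns=column_mapping)  # column_mapping as defined above
```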
Data Cleaning¶
The import process includes automatic data cleaning:
- Date Parsing: Converts various date formats to MySQL DATETIME
- Numeric Validation: Ensures numeric fields contain valid numbers
- Text Normalization: Standardizes activity type names
- Duplicate Detection: Prevents importing duplicate workouts
- Missing Data Handling: Sets appropriate defaults for missing fields
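A rough pandas sketch of these steps (illustrative only; the real logic lives in `src/update_db.py` and may differ):

```python
import pandas as pd

# df uses database field names at this point (after the rename above)

# Date parsing: unparseable dates become NaT rather than raising
df['workout_date'] = pd.to_datetime(df['workout_date'], errors='coerce')

# Numeric validation: non-numeric values become NaN for later handling
numeric_cols = ['kcal_burned', 'distance_mi', 'duration_sec',
                'avg_pace', 'max_pace', 'steps']
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors='coerce')

# Text normalization: standardize activity type names
df['activity_type'] = df['activity_type'].str.strip().str.title()

# Duplicate detection: drop repeated workout IDs
df = df.drop_duplicates(subset='workout_id')

# Missing data handling: an example default for an optional field
df['steps'] = df['steps'].fillna(0)
```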
Custom Data Sources¶
Adapting for New Platforms¶
To support a new fitness platform:
- Analyze the CSV structure from your platform
- Update column mapping in `src/update_db.py`
- Modify data transformation logic if needed
- Test with a small data sample
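For example, a Garmin Connect export might map roughly like this (the source header names here are assumptions; check them against your actual file):

```python
# Hypothetical Garmin Connect column mapping; verify the header
# names in your export before adopting these
garmin_mapping = {
    'Activity ID': 'workout_id',
    'Date': 'workout_date',
    'Activity Type': 'activity_type',
    'Calories': 'kcal_burned',
    'Distance': 'distance_mi',
    'Time': 'duration_sec',
}
```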
Manual Data Entry¶
For manual data entry or custom tracking:
Create a CSV file with the required columns:
```csv
workout_id,workout_date,activity_type,kcal_burned,distance_mi,duration_sec,avg_pace,max_pace,steps,link
manual_001,2024-01-15 08:00:00,Running,300,2.5,1200,8.0,7.5,3000,
manual_002,2024-01-16 09:00:00,Cycling,400,10.0,2400,,,5000,
```
Batch Import Operations¶
Large Dataset Handling¶
For large datasets (1000+ workouts):
- Split large CSV files into smaller batches
- Import progressively to monitor progress
- Use database transactions for data integrity (see the sketch below)
- Monitor memory usage during import
```bash
# Example: Split large file into 500-row batches, each with the header
head -n 1 large_file.csv > header.csv
tail -n +2 large_file.csv | split -l 500 - batch_
for file in batch_*; do
    cat header.csv $file > import_$file.csv
done
```
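For the transaction point above, a pymysql sketch with assumed connection details and a trimmed column list (extend both to match your setup):

```python
import pymysql

# Sample rows; in practice, read these from one of the split batch files
batch_rows = [
    ('manual_001', '2024-01-15 08:00:00'),
    ('manual_002', '2024-01-16 09:00:00'),
]

# Connection parameters are placeholders; use your .env values
conn = pymysql.connect(host='localhost', user='user',
                       password='secret', database='fitness')
try:
    with conn.cursor() as cur:
        # Column list trimmed for brevity; include all required fields
        cur.executemany(
            "INSERT INTO workout_summary (workout_id, workout_date) VALUES (%s, %s)",
            batch_rows,
        )
    conn.commit()  # one transaction per batch keeps partial failures recoverable
except Exception:
    conn.rollback()
    raise
finally:
    conn.close()
```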
Incremental Updates¶
For regular data updates:
- Export only new workouts since last import
- Use date-based filtering in your export
- Run import script regularly (weekly/monthly)
- Verify no duplicate records are created (see the query below)
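A quick duplicate check after an incremental run:

```sql
-- Any workout_id appearing more than once indicates a duplicate import
SELECT workout_id, COUNT(*) AS n
FROM workout_summary
GROUP BY workout_id
HAVING n > 1;
```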
Data Quality and Validation¶
Pre-Import Validation¶
Before importing, validate your data:
```python
import pandas as pd

# Load and inspect data
df = pd.read_csv('your_data.csv')
print(f"Records: {len(df)}")
print(f"Date range: {df['Workout Date'].min()} to {df['Workout Date'].max()}")
print(f"Activities: {df['Activity Type'].unique()}")
print(f"Missing values: {df.isnull().sum()}")
```
Post-Import Verification¶
After import, verify data quality:
```sql
-- Check record count
SELECT COUNT(*) AS total_workouts FROM workout_summary;

-- Check date range
SELECT MIN(workout_date) AS earliest, MAX(workout_date) AS latest
FROM workout_summary;

-- Check activity distribution
SELECT activity_type, COUNT(*) AS count
FROM workout_summary
GROUP BY activity_type
ORDER BY count DESC;

-- Check for anomalies
SELECT * FROM workout_summary
WHERE distance_mi > 50 OR duration_sec > 14400; -- Potential data errors
```
Troubleshooting Import Issues¶
Common Problems¶
**File Not Found**

Error: `FileNotFoundError: No such file or directory`

Solutions:
- Verify the file path in `pyproject.toml`
- Ensure the CSV file is in the correct directory
- Check file permissions
**CSV Format Error**

Error: `pandas.errors.EmptyDataError` or parsing errors

Solutions:
- Check CSV file encoding (UTF-8 recommended)
- Verify column headers match the expected format
- Remove or escape special characters in data
**Database Connection Error**

Error: `pymysql.err.OperationalError`

Solutions:
- Verify the database is running
- Check credentials in the `.env` file
- Ensure the database and table exist
**Duplicate Key Error**

Error: `pymysql.err.IntegrityError: Duplicate entry`

Solutions:
- Check for duplicate workout IDs in the CSV
- Clear existing data if re-importing: `DELETE FROM workout_summary;`
- Implement upsert logic for updates (see the sketch below)
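In MySQL, upsert logic can be expressed with `INSERT ... ON DUPLICATE KEY UPDATE`; a sketch with a trimmed column list (extend it to your full schema):

```sql
INSERT INTO workout_summary (workout_id, workout_date, activity_type, kcal_burned)
VALUES ('2632022148', '2024-01-15 08:30:00', 'Running', 450)
ON DUPLICATE KEY UPDATE
    workout_date  = VALUES(workout_date),
    activity_type = VALUES(activity_type),
    kcal_burned   = VALUES(kcal_burned);
```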
Data Quality Issues¶
**Missing Data**

Symptoms: Empty fields or null values in the dashboard

Solutions:
- Review the CSV file for completeness
- Update the import script to handle missing values
- Set appropriate defaults for optional fields
**Incorrect Dates**

Symptoms: Workouts appearing in wrong time periods

Solutions:
- Verify the date format in the CSV matches parser expectations
- Check timezone handling in the import script
- Manually inspect problematic date entries (snippet below)
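A small pandas snippet for the last point, assuming the raw MapMyRun header names:

```python
import pandas as pd

df = pd.read_csv('your_data.csv')

# Dates that fail to parse become NaT; print them for manual review
parsed = pd.to_datetime(df['Workout Date'], errors='coerce')
print(df.loc[parsed.isna(), 'Workout Date'])
```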
Advanced Import Features¶
Automated Imports¶
Set up automated data imports:
```bash
#!/bin/bash
# auto_import.sh - Automated import script

# Download latest data (platform-specific)
# ... download logic ...

# Run import
cd /path/to/fitness-dashboard
python src/update_db.py

# Log results
echo "Import completed: $(date)" >> import.log
```
Schedule with cron:
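For example, a weekly run every Sunday at 06:00 (adjust the path and schedule to suit):

```bash
# crontab -e
0 6 * * 0 /path/to/fitness-dashboard/auto_import.sh
```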
API Integration¶
For real-time data integration (future enhancement):
```python
# Example: Strava API integration
import requests

def fetch_strava_activities(access_token):
    url = "https://www.strava.com/api/v3/athlete/activities"
    headers = {"Authorization": f"Bearer {access_token}"}
    response = requests.get(url, headers=headers)
    return response.json()
```
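A hypothetical call, assuming you have already obtained an OAuth access token from Strava:

```python
activities = fetch_strava_activities("YOUR_ACCESS_TOKEN")
print(f"Fetched {len(activities)} activities")
```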
Next Steps¶
After successfully importing your data:
- Explore Visualizations: Learn about Visualization Features
- Run Custom Analysis: Use the SQL Query Interface
- Set Up Regular Updates: Establish a routine import schedule
- Monitor Data Quality: Regularly validate imported data
For additional help, see the Troubleshooting Reference.