Database Schema¶
This document provides detailed information about the Fitness Dashboard database schema, including table structures, relationships, and indexing strategies.
Overview¶
The Fitness Dashboard uses a MySQL database with a straightforward schema optimized for fitness tracking and analytics. The current implementation focuses on a single core table with plans for future expansion.
Database Information¶
- Database Name:
sweat
- Engine: MySQL 8.0+
- Character Set: utf8mb4
- Collation: utf8mb4_unicode_ci
Core Tables¶
workout_summary¶
The primary table storing all workout data imported from fitness platforms.
Table Definition¶
CREATE TABLE workout_summary (
workout_id VARCHAR(20) PRIMARY KEY,
workout_date DATETIME NOT NULL,
activity_type VARCHAR(50) NOT NULL,
kcal_burned BIGINT,
distance_mi FLOAT,
duration_sec FLOAT,
avg_pace FLOAT,
max_pace FLOAT,
steps BIGINT,
link VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Column Specifications¶
Column | Type | Null | Default | Description |
---|---|---|---|---|
workout_id |
VARCHAR(20) | NO | Unique identifier for the workout (Primary Key) | |
workout_date |
DATETIME | NO | Date and time when the workout occurred | |
activity_type |
VARCHAR(50) | NO | Type of physical activity (Running, Cycling, etc.) | |
kcal_burned |
BIGINT | YES | NULL | Total calories burned during the workout |
distance_mi |
FLOAT | YES | NULL | Distance covered in miles |
duration_sec |
FLOAT | YES | NULL | Workout duration in seconds |
avg_pace |
FLOAT | YES | NULL | Average pace in minutes per mile |
max_pace |
FLOAT | YES | NULL | Best/fastest pace achieved in minutes per mile |
steps |
BIGINT | YES | NULL | Total step count (if available) |
link |
VARCHAR(100) | YES | NULL | URL to original workout data |
created_at |
TIMESTAMP | NO | CURRENT_TIMESTAMP | Record creation timestamp |
updated_at |
TIMESTAMP | NO | CURRENT_TIMESTAMP | Last update timestamp |
Data Types and Constraints¶
Primary Key:
- workout_id
: Must be unique across all records
- Typically sourced from fitness platform's workout identifier
- Maximum 20 characters to accommodate various ID formats
Required Fields:
- workout_date
: Must be a valid datetime
- activity_type
: Cannot be empty, standardized activity names
Numeric Fields:
- kcal_burned
: BIGINT to handle high-calorie workouts
- distance_mi
: FLOAT with precision for decimal miles
- duration_sec
: FLOAT to handle fractional seconds
- pace fields
: FLOAT for precise pace calculations
- steps
: BIGINT for high step counts
Text Fields:
- activity_type
: Limited to 50 characters
- link
: Limited to 100 characters for URLs
Indexes¶
Performance Indexes¶
Indexes are crucial for query performance, especially with large datasets:
-- Primary index (automatic)
PRIMARY KEY (workout_id)
-- Date-based queries (most common)
CREATE INDEX idx_workout_date ON workout_summary(workout_date);
-- Activity filtering
CREATE INDEX idx_activity_type ON workout_summary(activity_type);
-- Composite index for common query patterns
CREATE INDEX idx_date_activity ON workout_summary(workout_date, activity_type);
-- Performance metrics queries
CREATE INDEX idx_distance_date ON workout_summary(distance_mi, workout_date);
CREATE INDEX idx_calories_date ON workout_summary(kcal_burned, workout_date);
Index Usage Patterns¶
Date Range Queries:
-- Uses idx_workout_date
SELECT * FROM workout_summary
WHERE workout_date >= '2024-01-01'
AND workout_date < '2024-02-01';
Activity Analysis:
-- Uses idx_date_activity composite index
SELECT activity_type, COUNT(*), AVG(distance_mi)
FROM workout_summary
WHERE workout_date >= '2024-01-01'
GROUP BY activity_type;
Performance Queries:
-- Uses idx_distance_date for efficient filtering
SELECT * FROM workout_summary
WHERE distance_mi > 5.0
AND workout_date >= '2024-01-01'
ORDER BY distance_mi DESC;
Data Validation Rules¶
Application-Level Constraints¶
The application enforces additional validation beyond database constraints:
Workout ID Validation¶
def validate_workout_id(workout_id: str) -> bool:
"""Validate workout ID format"""
return (
workout_id and
len(workout_id) <= 20 and
workout_id.isalnum()
)
Date Validation¶
def validate_workout_date(date: datetime) -> bool:
"""Validate workout date is reasonable"""
now = datetime.now()
min_date = datetime(2000, 1, 1) # Reasonable minimum
return min_date <= date <= now
Numeric Field Validation¶
def validate_numeric_fields(workout_data: dict) -> dict:
"""Validate and clean numeric fields"""
validations = {
'kcal_burned': lambda x: 0 <= x <= 10000, # Reasonable calorie range
'distance_mi': lambda x: 0 <= x <= 200, # Max reasonable distance
'duration_sec': lambda x: 60 <= x <= 86400, # 1 min to 24 hours
'avg_pace': lambda x: 3 <= x <= 30, # 3-30 min/mile reasonable
'max_pace': lambda x: 3 <= x <= 30, # Same range as avg_pace
'steps': lambda x: 0 <= x <= 100000 # Max reasonable steps
}
# Apply validations and clean data
return cleaned_data
Common Query Patterns¶
Performance Optimized Queries¶
Monthly Summaries¶
-- Optimized monthly aggregation
SELECT
DATE_FORMAT(workout_date, '%Y-%m') as month,
COUNT(*) as total_workouts,
ROUND(SUM(distance_mi), 2) as total_distance,
ROUND(SUM(kcal_burned)) as total_calories,
ROUND(AVG(duration_sec / 60), 1) as avg_duration_min
FROM workout_summary
WHERE workout_date >= DATE_SUB(CURDATE(), INTERVAL 12 MONTH)
GROUP BY DATE_FORMAT(workout_date, '%Y-%m')
ORDER BY month DESC;
Activity Performance Analysis¶
-- Activity-specific performance metrics
SELECT
activity_type,
COUNT(*) as workout_count,
ROUND(AVG(distance_mi), 2) as avg_distance,
ROUND(AVG(kcal_burned)) as avg_calories,
ROUND(AVG(avg_pace), 2) as avg_pace,
ROUND(MIN(avg_pace), 2) as best_pace
FROM workout_summary
WHERE workout_date >= DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
AND distance_mi > 0
AND avg_pace IS NOT NULL
GROUP BY activity_type
HAVING workout_count >= 5 -- Only activities with sufficient data
ORDER BY workout_count DESC;
Recent Activity Trends¶
-- Recent workout trends with rolling averages
SELECT
workout_date,
activity_type,
distance_mi,
kcal_burned,
AVG(distance_mi) OVER (
ORDER BY workout_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as rolling_avg_distance
FROM workout_summary
WHERE workout_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
ORDER BY workout_date DESC;
Data Import Schema Mapping¶
CSV to Database Mapping¶
The application maps CSV columns from fitness platforms to database fields:
CSV Column (MapMyRun) | Database Field | Transformation |
---|---|---|
Workout Id | workout_id | Direct mapping |
Workout Date | workout_date | Parse datetime string |
Activity Type | activity_type | Standardize activity names |
Total Calories | kcal_burned | Convert to integer |
Distance (mi) | distance_mi | Convert to float |
Duration | duration_sec | Convert time to seconds |
Avg Pace (min/mi) | avg_pace | Convert to float |
Max Pace (min/mi) | max_pace | Convert to float |
Steps | steps | Convert to integer |
Reference | link | Direct mapping |
Data Transformation Logic¶
def transform_csv_row(row: dict) -> dict:
"""Transform CSV row to database format"""
return {
'workout_id': row['Workout Id'],
'workout_date': parse_datetime(row['Workout Date']),
'activity_type': standardize_activity_type(row['Activity Type']),
'kcal_burned': safe_int_conversion(row.get('Total Calories')),
'distance_mi': safe_float_conversion(row.get('Distance (mi)')),
'duration_sec': parse_duration(row.get('Duration')),
'avg_pace': safe_float_conversion(row.get('Avg Pace (min/mi)')),
'max_pace': safe_float_conversion(row.get('Max Pace (min/mi)')),
'steps': safe_int_conversion(row.get('Steps')),
'link': row.get('Reference'),
}
Database Maintenance¶
Regular Maintenance Tasks¶
Data Cleanup¶
-- Remove duplicate workouts (keep most recent)
DELETE w1 FROM workout_summary w1
INNER JOIN workout_summary w2
WHERE w1.workout_id = w2.workout_id
AND w1.created_at < w2.created_at;
-- Clean invalid data
UPDATE workout_summary
SET distance_mi = NULL
WHERE distance_mi < 0 OR distance_mi > 200;
UPDATE workout_summary
SET kcal_burned = NULL
WHERE kcal_burned < 0 OR kcal_burned > 10000;
Index Maintenance¶
-- Analyze table statistics
ANALYZE TABLE workout_summary;
-- Optimize table structure
OPTIMIZE TABLE workout_summary;
-- Check table integrity
CHECK TABLE workout_summary;
-- Rebuild indexes if needed
ALTER TABLE workout_summary ENGINE=InnoDB;
Performance Monitoring¶
-- Check index usage
SELECT
TABLE_NAME,
INDEX_NAME,
CARDINALITY,
INDEX_TYPE
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'sweat'
AND TABLE_NAME = 'workout_summary';
-- Monitor slow queries (enable slow query log)
SELECT
query_time,
lock_time,
rows_sent,
rows_examined,
sql_text
FROM mysql.slow_log
WHERE start_time >= DATE_SUB(NOW(), INTERVAL 1 DAY);
Future Schema Evolution¶
Planned Enhancements¶
User Management¶
-- Future user table
CREATE TABLE users (
user_id UUID PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Add user reference to workouts
ALTER TABLE workout_summary
ADD COLUMN user_id UUID,
ADD FOREIGN KEY (user_id) REFERENCES users(user_id);
Goal Tracking¶
-- Goals and targets table
CREATE TABLE fitness_goals (
goal_id UUID PRIMARY KEY,
user_id UUID,
goal_type ENUM('distance', 'calories', 'workouts', 'pace'),
target_value FLOAT NOT NULL,
target_period ENUM('daily', 'weekly', 'monthly', 'yearly'),
start_date DATE NOT NULL,
end_date DATE,
is_active BOOLEAN DEFAULT TRUE,
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
Workout Details¶
-- Detailed workout metrics
CREATE TABLE workout_metrics (
metric_id UUID PRIMARY KEY,
workout_id VARCHAR(20),
metric_type VARCHAR(30) NOT NULL,
metric_value FLOAT NOT NULL,
metric_unit VARCHAR(10),
recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (workout_id) REFERENCES workout_summary(workout_id)
);
Migration Strategy¶
For schema changes, use versioned migration scripts:
-- migrations/001_add_user_support.sql
-- Migration: Add user support to workout_summary
-- Date: 2024-XX-XX
START TRANSACTION;
-- Add user_id column
ALTER TABLE workout_summary
ADD COLUMN user_id UUID DEFAULT NULL;
-- Create index for user queries
CREATE INDEX idx_user_workouts ON workout_summary(user_id, workout_date);
-- Update version
INSERT INTO schema_migrations (version, description, applied_at)
VALUES (1, 'Add user support', NOW());
COMMIT;
Backup and Recovery¶
Backup Strategy¶
#!/bin/bash
# backup_database.sh
DATE=$(date +"%Y%m%d_%H%M%S")
BACKUP_DIR="/path/to/backups"
DB_NAME="sweat"
# Full database backup
mysqldump -u fitness_user -p$MYSQL_PWD \
--single-transaction \
--routines \
--triggers \
$DB_NAME > $BACKUP_DIR/sweat_backup_$DATE.sql
# Compress backup
gzip $BACKUP_DIR/sweat_backup_$DATE.sql
# Clean old backups (keep 30 days)
find $BACKUP_DIR -name "sweat_backup_*.sql.gz" -mtime +30 -delete
Recovery Procedures¶
# Restore from backup
gunzip sweat_backup_20240115_120000.sql.gz
mysql -u fitness_user -p sweat < sweat_backup_20240115_120000.sql
# Verify restoration
mysql -u fitness_user -p -e "
SELECT COUNT(*) as total_workouts,
MIN(workout_date) as earliest,
MAX(workout_date) as latest
FROM sweat.workout_summary;"
For more information about database operations, see the API Reference.