IO Module
The io module provides utilities for parsing .ass (Advanced SubStation Alpha) subtitle files.
Overview
This module handles:
- Loading .ass files with proper encoding
- Extracting Dialogue: lines
- Parsing timestamps and text content
- Returning structured SubtitleEvent objects
Classes
AssReader
Minimal but robust ASS/SSA (.ass) Dialogue parser.
Correctness requirements:
- Only reads the [Events] section
- Respects the Format: field order
- Splits the Dialogue payload with maxsplit=(n-1) to avoid breaking on commas in Text
Source code in src/naruto_net/io/subtitles.py
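The three correctness requirements listed for AssReader can be illustrated with a small standalone sketch. It is not the code in src/naruto_net/io/subtitles.py: the function name `parse_events`, the sample script, and the `Default` style value are all hypothetical and exist only to show the technique.

```python
def parse_events(lines):
    """Yield one dict per Dialogue line, keyed by the Format: field names.

    Hypothetical helper for illustration; not part of naruto_net.
    """
    in_events = False
    fields = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Track section headers; only [Events] is parsed.
            in_events = line.lower() == "[events]"
            continue
        if not in_events:
            continue
        if line.startswith("Format:"):
            # Respect whatever field order this file declares.
            fields = [name.strip() for name in line[len("Format:"):].split(",")]
        elif line.startswith("Dialogue:") and fields:
            payload = line[len("Dialogue:"):].lstrip()
            # maxsplit = n - 1 keeps commas inside the last field (Text) intact.
            values = payload.split(",", len(fields) - 1)
            if len(values) == len(fields):
                yield dict(zip(fields, values))


# Illustrative input; the style name "Default" is made up for this example.
sample = """\
[Script Info]
Title: demo
[V4+ Styles]
Format: Name, Fontname
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:11:16.77,0:11:20.27,Default,,0,0,0,,So what should we do, Lady Hokage?
""".splitlines()

rows = list(parse_events(sample))
print(rows[0]["Text"])
```

The Format: line under [V4+ Styles] is ignored because the section check runs first, and the two commas in the dialogue text survive because the split stops after the ninth comma.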
SubtitleEvent (dataclass)
Source code in src/naruto_net/io/subtitles.py
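The field list of SubtitleEvent is not reproduced on this page. Judging only from the repr printed in the usage examples (start_time, end_time, text), a comparable dataclass might look like the sketch below. The class name `SubtitleEventSketch` and the type annotations are assumptions; the real class may define more fields.

```python
from dataclasses import dataclass


@dataclass
class SubtitleEventSketch:
    # Field names are inferred from the repr shown in the usage examples;
    # the types are an assumption.
    start_time: float  # seconds from the start of the episode
    end_time: float    # seconds from the start of the episode
    text: str


event = SubtitleEventSketch(676.77, 680.27, "So what should we do, Lady Hokage?")
# Round to hide floating-point noise in the subtraction.
duration = round(event.end_time - event.start_time, 2)
print(event, duration)
```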
Usage Examples
Basic Parsing
from naruto_net.io.subtitles import AssReader
# Load subtitle file
reader = AssReader('data/naruto-subtitle-files/episode_014.ass')
# Parse all dialogue events
events = reader.read_events()
# Inspect first event
print(events[0])
# SubtitleEvent(start_time=676.77, end_time=680.27, text="So what should we do, Lady Hokage?")
Filtering by Time Range
# Get events that start between 10:00 and 15:00
target_start = 10 * 60  # 10 minutes in seconds
target_end = 15 * 60  # 15 minutes in seconds
filtered = [
    e for e in events
    if target_start <= e.start_time <= target_end
]
print(f"Found {len(filtered)} events in time range")
Converting to DataFrame
import pandas as pd
# Convert to pandas for analysis
df = pd.DataFrame([
    {
        'start': e.start_time,
        'end': e.end_time,
        'duration': e.end_time - e.start_time,
        'text': e.text
    }
    for e in events
])
print(df.describe())
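File Format
.ass files follow the Advanced SubStation Alpha specification. Dialogue lines look like this (standard field order; fields this module does not parse are shown as placeholders):
Dialogue: <Layer>,0:11:16.77,0:11:20.27,<Style>,<Name>,<MarginL>,<MarginR>,<MarginV>,<Effect>,So what should we do, Lady Hokage?
Parsed fields:
- 0:11:16.77 → start_time (converted to seconds: 676.77)
- 0:11:20.27 → end_time (converted to seconds: 680.27)
- So what should we do, Lady Hokage? → text
Ignored fields: Layer, Style, margins, Effect (not needed for character detection)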
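The timestamp conversion quoted above (0:11:16.77 becoming 676.77 seconds) is plain arithmetic on the H:MM:SS.cc format. A minimal sketch follows; the function name `ass_time_to_seconds` is hypothetical and not taken from the module.

```python
def ass_time_to_seconds(stamp: str) -> float:
    """Convert an ASS timestamp such as '0:11:16.77' to seconds."""
    hours, minutes, seconds = stamp.strip().split(":")
    total = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    # ASS timestamps carry centiseconds, so two decimals are enough.
    return round(total, 2)


print(ass_time_to_seconds("0:11:16.77"))  # 0*3600 + 11*60 + 16.77
print(ass_time_to_seconds("0:11:20.27"))  # 0*3600 + 11*60 + 20.27
```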
Error Handling
Encoding Issues
If you encounter UnicodeDecodeError:
# Try UTF-8 with BOM
reader = AssReader('file.ass', encoding='utf-8-sig')
# Or auto-detect (requires the third-party chardet package)
import chardet
with open('file.ass', 'rb') as f:
    # detect() may report None as the encoding; fall back to UTF-8 in that case
    encoding = chardet.detect(f.read())['encoding'] or 'utf-8'
reader = AssReader('file.ass', encoding=encoding)
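If installing chardet is not an option, a standard-library fallback is to try a short list of candidate encodings in order. The sketch below assumes the candidates `utf-8-sig` and `cp1252`, with `latin-1` as a last resort that never fails; that list is a guess, not something the module prescribes, and `read_text_with_fallback` is a hypothetical helper.

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8-sig", "cp1252")):
    """Return (text, encoding_used) for the first candidate that decodes.

    The candidate list is an assumption; adjust it to the files you have.
    """
    data = Path(path).read_bytes()
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this cannot raise,
    # but the decoded text may be wrong for non-Latin scripts.
    return data.decode("latin-1"), "latin-1"
```

The returned encoding name can then be passed on, for example `AssReader('file.ass', encoding=enc)`.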
Malformed Lines
Invalid dialogue lines are skipped with a warning:
import logging
logging.basicConfig(level=logging.WARNING)
# Lines that don't match the Dialogue format are logged but not processed
reader = AssReader('messy_file.ass')
events = reader.read_events()
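How AssReader decides that a line is malformed is not documented here. One plausible check, shown as a sketch only, is to compare the number of comma-separated values with the number of declared fields and log a warning on mismatch. The helper `split_dialogue`, the logger name, and the default of 10 fields (the standard [Events] layout) are assumptions.

```python
import logging

logger = logging.getLogger("ass_sketch")  # hypothetical logger name


def split_dialogue(line, n_fields=10):
    """Return the field list for a Dialogue line, or None if it is unusable."""
    if not line.startswith("Dialogue:"):
        # Not a dialogue line at all (e.g. Comment:), so nothing to warn about.
        return None
    values = line[len("Dialogue:"):].strip().split(",", n_fields - 1)
    if len(values) != n_fields:
        logger.warning("Skipping malformed Dialogue line: %r", line)
        return None
    return values


good = "Dialogue: 0,0:11:16.77,0:11:20.27,Default,,0,0,0,,So what should we do, Lady Hokage?"
bad = "Dialogue: 0,0:00:01.00"  # too few fields, triggers the warning
print(split_dialogue(good)[-1])
print(split_dialogue(bad))
```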
Performance Notes
- Memory: All events are loaded into memory at once. For large files (>10 MB), consider streaming the file line by line.
- Speed: Parsing ~500 subtitle lines takes <50 ms on typical hardware.
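The streaming suggestion can be sketched with a generator that yields Dialogue lines one at a time instead of building a list. This is an illustration under assumptions, not an existing AssReader feature: `iter_dialogue_lines` is a hypothetical name and the default encoding is a guess.

```python
from typing import Iterator


def iter_dialogue_lines(path, encoding="utf-8-sig") -> Iterator[str]:
    """Lazily yield raw Dialogue lines from the [Events] section."""
    in_events = False
    with open(path, encoding=encoding) as handle:
        for raw in handle:  # file objects read one line at a time
            line = raw.strip()
            if line.startswith("["):
                # A new section header switches parsing on or off.
                in_events = line.lower() == "[events]"
            elif in_events and line.startswith("Dialogue:"):
                yield line
```

Because only one line is held at a time, memory use stays flat regardless of file size; the caller can parse each yielded line and discard it.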