API Reference
The naruto_net package provides a modular pipeline for extracting character co-appearance networks from subtitle files.
Package Structure
naruto_net/
├── io/ # Subtitle file parsing
├── normalize/ # Text cleaning and normalization
├── segment/ # Scene detection
├── detect/ # Character mention detection
├── build/ # Edge construction
└── qc/ # Quality control reports
Quick Links
- IO Module: Parse .ass subtitle files into structured events.
- Normalize Module: Clean ASS formatting tags and normalize text.
- Segment Module: Detect scene boundaries using timing gaps.
- Detect Module: Find character mentions via alias matching.
- Build Module: Construct co-appearance edges from scene presence.
- QC Module: Generate quality control and validation reports.
Installation
Install the package in editable mode before use. With pip, this is typically done from the repository root:
pip install -e .
This makes the naruto_net modules importable:
from naruto_net.io.subtitles import AssReader
from naruto_net.detect.mentions import detect_characters
Basic Usage Example
End-to-End Pipeline
from pathlib import Path
import pandas as pd
# 1. Parse subtitle file
from naruto_net.io.subtitles import AssReader
reader = AssReader('data/naruto-subtitle-files/episode_014.ass')
events = reader.read_events()
# 2. Normalize text
from naruto_net.normalize.ass_text import strip_ass_tags, normalize_newlines
for event in events:
    event.text = strip_ass_tags(event.text)
    event.text = normalize_newlines(event.text)
# 3. Segment into scenes
from naruto_net.segment.scenes import segment_scenes
scenes = segment_scenes(events, gap_threshold=3.0)
# 4. Detect character mentions
from naruto_net.detect.mentions import detect_characters, load_alias_dict
alias_dict = load_alias_dict('data/character_aliases.json')
mentions = []
for event in events:
    chars = detect_characters(event.text, alias_dict)
    mentions.extend(chars)
# 5. Build co-appearance edges
from naruto_net.build.edges import build_edges_from_scenes
edges = build_edges_from_scenes(scenes, mentions)
# 6. Export
edges_df = pd.DataFrame(edges)
edges_df.to_csv('data/processed/edges.csv', index=False)
Module Summaries
IO: Subtitle Parsing
Purpose: Load .ass files and extract dialogue events
Key classes:
- AssReader: Main parser class
- SubtitleEvent: Dataclass for (start_time, end_time, text)
Example:
from naruto_net.io.subtitles import AssReader
reader = AssReader('episode_001.ass')
events = reader.read_events()
print(f"Parsed {len(events)} dialogue lines")
# Output: Parsed 342 dialogue lines
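For readers unfamiliar with the format, the following is a minimal sketch of how a single Dialogue line could be turned into a (start_time, end_time, text) record. It is not the AssReader implementation: the names Event, parse_timestamp, and parse_dialogue_line are hypothetical, and the sketch assumes the default ten-field Dialogue layout (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text) instead of reading the file's Format line.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:  # hypothetical stand-in for SubtitleEvent
    start_time: float  # seconds
    end_time: float    # seconds
    text: str


def parse_timestamp(stamp: str) -> float:
    """Convert an ASS timestamp such as '0:01:02.50' to seconds."""
    hours, minutes, seconds = stamp.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_dialogue_line(line: str) -> Optional[Event]:
    """Parse one 'Dialogue:' line; return None for any other line."""
    if not line.startswith("Dialogue:"):
        return None
    # Assumed default field order. Text is the last field and may contain
    # commas, so split at most nine times.
    fields = line[len("Dialogue:"):].split(",", 9)
    if len(fields) < 10:
        return None
    return Event(
        start_time=parse_timestamp(fields[1]),
        end_time=parse_timestamp(fields[2]),
        text=fields[9].strip(),
    )


line = "Dialogue: 0,0:00:01.00,0:00:03.50,Default,,0,0,0,,Naruto, wait!"
event = parse_dialogue_line(line)
print(event)
```

Limiting the split keeps commas inside the dialogue text intact, which matters for lines such as the one above.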
Normalize: Text Cleaning
Purpose: Remove ASS formatting and clean text
Key functions:
- strip_ass_tags(text): Remove {\i1}, {\b1}, etc.
- normalize_newlines(text): Convert \N to space
- split_multi_speaker(text): Handle "A: ... B: ..." lines
Example:
from naruto_net.normalize.ass_text import strip_ass_tags
text = r"{\i1}Naruto!{\i0} You're late again!"
clean = strip_ass_tags(text)
print(clean)
# Output: "Naruto! You're late again!"
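As an illustration of what this kind of cleaning involves, here is a small regex-based sketch. The function names strip_tags_sketch and newlines_to_spaces_sketch are hypothetical and the package's own functions may behave differently. The sketch assumes that every override block is enclosed in braces and that both \N and \n mark line breaks.

```python
import re

# Override blocks are delimited by braces, e.g. {\i1} or {\pos(10,20)}.
ASS_TAG = re.compile(r"\{[^}]*\}")
# ASS line-break markers: a literal backslash followed by N or n.
ASS_BREAK = re.compile(r"\s*\\[Nn]\s*")


def strip_tags_sketch(text: str) -> str:
    """Drop every {...} override block."""
    return ASS_TAG.sub("", text)


def newlines_to_spaces_sketch(text: str) -> str:
    r"""Replace the ASS line-break markers \N and \n with a single space."""
    return ASS_BREAK.sub(" ", text).strip()


raw = r"{\i1}Naruto!{\i0}\NYou're late again!"
clean = newlines_to_spaces_sketch(strip_tags_sketch(raw))
print(clean)
# Output: Naruto! You're late again!
```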
Segment: Scene Detection
Purpose: Group dialogue into scenes using timing gaps
Key functions:
- segment_scenes(events, gap_threshold=3.0): Detect scene breaks
- Scene: Dataclass for scene metadata
Example:
from naruto_net.segment.scenes import segment_scenes
scenes = segment_scenes(events, gap_threshold=3.0)
print(f"Detected {len(scenes)} scenes")
# Output: Detected 42 scenes
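The idea behind gap-based segmentation can be sketched in a few lines. This is an illustration under stated assumptions, not the segment_scenes implementation: TimedLine and split_on_gaps are hypothetical names, times are assumed to be in seconds, and a new scene is assumed to start only when the silence is strictly longer than the threshold.

```python
from dataclasses import dataclass


@dataclass
class TimedLine:  # hypothetical event type, times in seconds
    start_time: float
    end_time: float
    text: str


def split_on_gaps(lines: list[TimedLine], gap_threshold: float = 3.0) -> list[list[TimedLine]]:
    """Group lines into scenes; a silence longer than gap_threshold starts a new scene."""
    scenes: list[list[TimedLine]] = []
    for line in sorted(lines, key=lambda item: item.start_time):
        # Gap between the end of the previous line and the start of this one.
        if scenes and line.start_time - scenes[-1][-1].end_time <= gap_threshold:
            scenes[-1].append(line)
        else:
            scenes.append([line])
    return scenes


lines = [
    TimedLine(0.0, 2.0, "Naruto!"),
    TimedLine(3.0, 5.0, "You're late again!"),
    TimedLine(10.0, 12.0, "Where is Sasuke?"),
    TimedLine(12.5, 14.0, "Training."),
]
scenes = split_on_gaps(lines, gap_threshold=3.0)
print([len(scene) for scene in scenes])
# Output: [2, 2]
```

A smaller threshold produces more, shorter scenes. Because edges are built from scene presence, the threshold also affects how many co-appearances are recorded.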
Detect: Character Mentions
Purpose: Find character names using alias matching
Key functions:
- detect_characters(text, alias_dict): Find mentions in text
- load_alias_dict(json_path): Load character aliases
- build_regex_patterns(alias_dict): Compile word-boundary patterns
Example:
from naruto_net.detect.mentions import detect_characters, load_alias_dict
alias_dict = load_alias_dict('character_aliases.json')
text = "I will avenge Pervy Sage!"
chars = detect_characters(text, alias_dict)
print(chars)
# Output: [{'character': 'Jiraiya', 'alias_matched': 'Pervy Sage', 'confidence': 0.8}]
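Word-boundary alias matching can be sketched with the standard re module as follows. The helper names compile_alias_patterns and find_mentions are hypothetical, and case-insensitive matching is an assumption. The confidence field is left out because the scoring rule is not described here.

```python
import re


def compile_alias_patterns(alias_dict: dict[str, list[str]]) -> dict[str, re.Pattern]:
    """Compile one case-insensitive, word-boundary pattern per canonical name."""
    patterns = {}
    for character, aliases in alias_dict.items():
        if not aliases:
            continue  # nothing to match for this character
        # Longest aliases first so a multi-word alias wins over a shorter overlap.
        ordered = sorted(aliases, key=len, reverse=True)
        alternation = "|".join(re.escape(alias) for alias in ordered)
        patterns[character] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def find_mentions(text: str, patterns: dict[str, re.Pattern]) -> list[dict]:
    """Return one record per alias match found in text."""
    found = []
    for character, pattern in patterns.items():
        for match in pattern.finditer(text):
            found.append({"character": character, "alias_matched": match.group(0)})
    return found


aliases = {"Jiraiya": ["Jiraiya", "Pervy Sage"], "Naruto Uzumaki": ["Naruto"]}
patterns = compile_alias_patterns(aliases)
print(find_mentions("I will avenge Pervy Sage!", patterns))
# Output: [{'character': 'Jiraiya', 'alias_matched': 'Pervy Sage'}]
```

The \b anchors keep an alias from matching inside a longer word, so "Naruto" does not fire on "Narutomaki".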
Build: Edge Construction
Purpose: Create co-appearance edges from scene presence
Key functions:
- build_edges_from_scenes(scenes, mentions): Construct edge list
- aggregate_edge_weights(edges): Sum weights for duplicate pairs
Example:
from naruto_net.build.edges import build_edges_from_scenes
edges = build_edges_from_scenes(scenes, mentions)
print(f"Created {len(edges)} edges")
# Output: Created 127 edges
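To make the notion of a co-appearance edge concrete, the sketch below counts, for each unordered pair of characters, the number of scenes in which both are present. It assumes the characters present in each scene are already known. The function name co_appearance_edges and the source, target, and weight keys are illustrative choices, not the package's documented output format.

```python
from collections import Counter
from itertools import combinations


def co_appearance_edges(scene_characters: list[set[str]]) -> list[dict]:
    """Count, for each unordered character pair, the scenes in which both appear."""
    weights: Counter = Counter()
    for characters in scene_characters:
        # Sorting makes (A, B) and (B, A) map to the same key.
        for source, target in combinations(sorted(characters), 2):
            weights[(source, target)] += 1
    return [
        {"source": source, "target": target, "weight": weight}
        for (source, target), weight in sorted(weights.items())
    ]


scene_sets = [
    {"Naruto", "Sasuke", "Sakura"},
    {"Naruto", "Sasuke"},
    {"Kakashi"},  # a lone character produces no edge
]
for edge in co_appearance_edges(scene_sets):
    print(edge)
# Output:
# {'source': 'Naruto', 'target': 'Sakura', 'weight': 1}
# {'source': 'Naruto', 'target': 'Sasuke', 'weight': 2}
# {'source': 'Sakura', 'target': 'Sasuke', 'weight': 1}
```

Accumulating counts in a Counter does in one pass what a separate step of summing weights for duplicate pairs would do.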
QC: Quality Control
Purpose: Generate validation and quality reports
Key functions:
- generate_episode_qc(episodes): Events parsed, scenes, characters per episode
- generate_alias_qc(mentions): Alias match frequency, shadowing detection
Example:
from naruto_net.qc.reports import generate_episode_qc
report = generate_episode_qc(episodes)
report.to_csv('data/reports/episode_qc.csv', index=False)
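The alias checks can be illustrated with a short sketch. It is not the generate_alias_qc implementation, and the function names alias_frequency and shadowed_aliases are hypothetical. "Shadowing" is taken here to mean that an alias of one character occurs inside a longer alias of a different character, which is one plausible reading of the term and should be treated as an assumption.

```python
from collections import Counter


def alias_frequency(mentions: list[dict]) -> Counter:
    """Count how often each (character, alias) pair was matched."""
    return Counter((m["character"], m["alias_matched"]) for m in mentions)


def shadowed_aliases(alias_dict: dict[str, list[str]]) -> list[tuple[str, str]]:
    """List (short, long) alias pairs from different characters where short occurs in long."""
    flat = [
        (alias.lower(), character)
        for character, aliases in alias_dict.items()
        for alias in aliases
    ]
    pairs = []
    for short, short_owner in flat:
        for long, long_owner in flat:
            if short_owner != long_owner and short != long and short in long:
                pairs.append((short, long))
    return sorted(pairs)


mentions = [
    {"character": "Jiraiya", "alias_matched": "Pervy Sage"},
    {"character": "Jiraiya", "alias_matched": "Pervy Sage"},
    {"character": "Naruto Uzumaki", "alias_matched": "Naruto"},
]
print(alias_frequency(mentions).most_common(1))
# Output: [(('Jiraiya', 'Pervy Sage'), 2)]

alias_dict = {
    "Jiraiya": ["Jiraiya", "Pervy Sage"],
    "Hagoromo": ["Sage", "Sage of Six Paths"],
}
print(shadowed_aliases(alias_dict))
# Output: [('sage', 'pervy sage')]
```

A pair flagged this way indicates that a line mentioning "Pervy Sage" could also be credited to the character who owns the shorter alias "Sage".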
Testing
The package includes tests for the parser, text cleaning, scene segmentation, and alias matching.
Test files:
- test_ass_reader_parsing.py: Parser correctness
- test_text_cleaning.py: ASS tag removal
- test_scene_segmentation.py: Scene boundary detection
- test_mentions_matching.py: Alias matching accuracy
Development
Adding New Modules
- Create module file in src/naruto_net/<category>/
- Add docstrings (Google style)
- Write tests in tests/
- Update this documentation
Code Style
- Docstrings: Google style (Args, Returns, Examples)
- Type hints: Use for function signatures
- Naming: snake_case for functions, PascalCase for classes
Example:
def detect_characters(text: str, alias_dict: dict) -> list[dict]:
    """Find character mentions in dialogue text.

    Args:
        text: Dialogue text to search
        alias_dict: Dictionary mapping canonical names to alias lists

    Returns:
        List of dicts with keys: character, alias_matched, confidence

    Examples:
        >>> alias_dict = {"Naruto Uzumaki": ["Naruto", "Hokage"]}
        >>> detect_characters("Where is Naruto?", alias_dict)
        [{'character': 'Naruto Uzumaki', 'alias_matched': 'Naruto', 'confidence': 1.0}]
    """
    # Implementation...
Contributing
See the GitHub repository for contribution guidelines.
Key areas for contribution:
- Performance optimization: Vectorize alias matching, parallel processing
- Feature additions: Support for .srt files, speaker attribution
- Documentation: More examples, tutorials
- Testing: Edge cases, integration tests