
API Reference

The naruto_net package provides a modular pipeline for extracting character co-appearance networks from subtitle files.


Package Structure

naruto_net/
├── io/          # Subtitle file parsing
├── normalize/   # Text cleaning and normalization
├── segment/     # Scene detection
├── detect/      # Character mention detection
├── build/       # Edge construction
└── qc/          # Quality control reports


Installation

From the repository root, install the package in editable mode before use:

pip install -e .

This makes the naruto_net modules importable:

from naruto_net.io.subtitles import AssReader
from naruto_net.detect.mentions import detect_characters

Basic Usage Example

End-to-End Pipeline

from pathlib import Path
import pandas as pd

# 1. Parse subtitle file
from naruto_net.io.subtitles import AssReader

reader = AssReader('data/naruto-subtitle-files/episode_014.ass')
events = reader.read_events()

# 2. Normalize text
from naruto_net.normalize.ass_text import strip_ass_tags, normalize_newlines

for event in events:
    event.text = strip_ass_tags(event.text)
    event.text = normalize_newlines(event.text)

# 3. Segment into scenes
from naruto_net.segment.scenes import segment_scenes

scenes = segment_scenes(events, gap_threshold=3.0)

# 4. Detect character mentions
from naruto_net.detect.mentions import detect_characters, load_alias_dict

alias_dict = load_alias_dict('data/character_aliases.json')
mentions = []
for event in events:
    chars = detect_characters(event.text, alias_dict)
    mentions.extend(chars)

# 5. Build co-appearance edges
from naruto_net.build.edges import build_edges_from_scenes

edges = build_edges_from_scenes(scenes, mentions)

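# 6. Export (create the output directory first if it does not exist yet)
out_dir = Path('data/processed')
out_dir.mkdir(parents=True, exist_ok=True)
edges_df = pd.DataFrame(edges)
edges_df.to_csv(out_dir / 'edges.csv', index=False)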

Module Summaries

IO: Subtitle Parsing

Purpose: Load .ass files and extract dialogue events

Key classes:

  • AssReader — Main parser class
  • SubtitleEvent — Dataclass for (start_time, end_time, text)

Example:

from naruto_net.io.subtitles import AssReader

reader = AssReader('episode_001.ass')
events = reader.read_events()

print(f"Parsed {len(events)} dialogue lines")
# Output: Parsed 342 dialogue lines
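An .ass file stores dialogue as comma-separated Dialogue: rows under an [Events] section. A Format: line in that section declares the column order. The sketch below shows one way to turn those rows into (start_time, end_time, text) records. It is not AssReader's implementation. The names Event, parse_ass_time and parse_dialogue_lines are hypothetical, and so is the choice to express times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for SubtitleEvent; times are in seconds (assumption)."""
    start_time: float
    end_time: float
    text: str


def parse_ass_time(stamp: str) -> float:
    """Convert an ASS timestamp such as '0:00:01.50' (H:MM:SS.cc) to seconds."""
    hours, minutes, seconds = stamp.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_dialogue_lines(lines: list[str]) -> list[Event]:
    """Extract Dialogue rows from the [Events] section of an .ass file."""
    events: list[Event] = []
    fields: list[str] = []
    in_events = False
    for raw in lines:
        line = raw.strip()
        # Section headers like [Script Info] or [Events] switch context.
        if line.startswith("[") and line.endswith("]"):
            in_events = line.lower() == "[events]"
            continue
        if not in_events:
            continue
        if line.startswith("Format:"):
            # Column order is declared per file, e.g. "Layer, Start, End, ..., Text".
            fields = [f.strip() for f in line[len("Format:"):].split(",")]
        elif line.startswith("Dialogue:") and fields:
            # Text is the last column and may contain commas,
            # so split at most len(fields) - 1 times.
            values = line[len("Dialogue:"):].split(",", len(fields) - 1)
            row = dict(zip(fields, (v.strip() for v in values)))
            events.append(Event(parse_ass_time(row["Start"]),
                                parse_ass_time(row["End"]),
                                row["Text"]))
    return events
```

Using a maxsplit of len(fields) - 1 is the key detail here. Subtitle text routinely contains commas, and it should not be broken into extra columns.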

Full IO Documentation


Normalize: Text Cleaning

Purpose: Remove ASS formatting and clean text

Key functions:

  • strip_ass_tags(text) — Remove {\i1}, {\b1}, etc.
  • normalize_newlines(text) — Convert \N to space
  • split_multi_speaker(text) — Handle "A: ... B: ..." lines

Example:

from naruto_net.normalize.ass_text import strip_ass_tags

text = r"{\i1}Naruto!{\i0} You're late again!"
clean = strip_ass_tags(text)
print(clean)
# Output: Naruto! You're late again!
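Both helpers can plausibly be built on regular expressions. ASS override tags live inside {...} blocks. Dialogue text also uses the escapes \N (hard line break), \n (soft line break) and \h (hard space). The stand-ins below, remove_override_blocks and replace_line_breaks, are a hypothetical sketch of that approach rather than the naruto_net source. Collapsing repeated whitespace into a single space is an extra assumption.

```python
import re

# Override blocks look like {\i1}, {\b1} or {\pos(320,50)}: drop anything in braces.
_TAG_RE = re.compile(r"\{[^}]*\}")
# A backslash followed by N, n or h is a line-break / hard-space escape in ASS text.
_BREAK_RE = re.compile(r"\\[Nnh]")
_SPACE_RE = re.compile(r"\s+")


def remove_override_blocks(text: str) -> str:
    """Remove {...} override blocks from ASS dialogue text."""
    return _TAG_RE.sub("", text)


def replace_line_breaks(text: str) -> str:
    """Turn ASS break/space escapes into spaces and collapse runs of whitespace."""
    return _SPACE_RE.sub(" ", _BREAK_RE.sub(" ", text)).strip()
```

For example, replace_line_breaks(remove_override_blocks(r"{\b1}Sasuke!{\b0}\NCome back!")) yields "Sasuke! Come back!".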

Full Normalize Documentation


Segment: Scene Detection

Purpose: Group dialogue into scenes using timing gaps

Key functions and classes:

  • segment_scenes(events, gap_threshold=3.0) — Detect scene breaks
  • Scene — Dataclass for scene metadata

Example:

from naruto_net.segment.scenes import segment_scenes

scenes = segment_scenes(events, gap_threshold=3.0)

print(f"Detected {len(scenes)} scenes")
# Output: Detected 42 scenes
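The gap rule can be sketched as follows. Walk the events in start-time order, and open a new scene whenever the silence before the next line exceeds gap_threshold seconds. This sketch makes a few assumptions: times are in seconds, and the gap is measured from the latest end time seen so far, so overlapping lines do not create false breaks. The real segment_scenes may define the gap differently or apply extra rules. Line, SceneSketch and segment_by_gap are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    start_time: float  # seconds (assumption)
    end_time: float    # seconds (assumption)
    text: str


@dataclass
class SceneSketch:
    scene_id: int
    events: list[Line] = field(default_factory=list)

    @property
    def start_time(self) -> float:
        return self.events[0].start_time

    @property
    def end_time(self) -> float:
        return max(e.end_time for e in self.events)


def segment_by_gap(events: list[Line], gap_threshold: float = 3.0) -> list[SceneSketch]:
    """Split events into scenes wherever the silent gap exceeds gap_threshold."""
    scenes: list[SceneSketch] = []
    last_end = float("-inf")
    for event in sorted(events, key=lambda e: e.start_time):
        # The first line, or any line after a long enough silence, starts a new scene.
        if not scenes or event.start_time - last_end > gap_threshold:
            scenes.append(SceneSketch(scene_id=len(scenes)))
        scenes[-1].events.append(event)
        # Keep the latest end time so overlapping lines don't look like gaps.
        last_end = max(last_end, event.end_time)
    return scenes
```

With lines at 0.0 to 2.0 s, 2.5 to 4.0 s, 10.0 to 11.0 s and 11.5 to 12.0 s, a 3.0 s threshold produces two scenes of two lines each. A 10.0 s threshold merges everything into one scene.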

Full Segment Documentation


Detect: Character Mentions

Purpose: Find character names using alias matching

Key functions:

  • detect_characters(text, alias_dict) — Find mentions in text
  • load_alias_dict(json_path) — Load character aliases
  • build_regex_patterns(alias_dict) — Compile word-boundary patterns

Example:

from naruto_net.detect.mentions import detect_characters, load_alias_dict

alias_dict = load_alias_dict('character_aliases.json')
text = "I will avenge Pervy Sage!"

chars = detect_characters(text, alias_dict)
print(chars)
# Output: [{'character': 'Jiraiya', 'alias_matched': 'Pervy Sage', 'confidence': 0.8}]
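Word-boundary alias matching, in the spirit of build_regex_patterns and detect_characters, can be sketched with a single compiled alternation. Aliases are tried longest first. At any position, a multi-word alias such as "Pervy Sage" then wins over a shorter alias it contains. Matching is case-insensitive. The helper names build_alias_pattern and find_mentions are hypothetical. Confidence scoring is left out because its rule is not documented here. If two characters share an alias, this sketch silently keeps the last one.

```python
import re


def build_alias_pattern(alias_dict: dict[str, list[str]]) -> tuple[re.Pattern, dict[str, str]]:
    """Compile one case-insensitive, word-bounded alternation over every alias."""
    alias_to_char: dict[str, str] = {}
    for character, aliases in alias_dict.items():
        for alias in aliases:
            alias_to_char[alias.lower()] = character  # later duplicates overwrite earlier ones
    # Longest first: Python's alternation takes the first alternative that matches.
    ordered = sorted(alias_to_char, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern, alias_to_char


def find_mentions(text: str, alias_dict: dict[str, list[str]]) -> list[dict]:
    """Return one record per non-overlapping alias match in text."""
    pattern, alias_to_char = build_alias_pattern(alias_dict)
    if not alias_to_char:
        return []  # an empty alternation would match the empty string everywhere
    return [
        {"character": alias_to_char[m.group(0).lower()], "alias_matched": m.group(0)}
        for m in pattern.finditer(text)
    ]
```

The word boundaries are what keep "Naruto" from matching inside a word like "Narutomaki".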

Full Detect Documentation


Build: Edge Construction

Purpose: Create co-appearance edges from scene presence

Key functions:

  • build_edges_from_scenes(scenes, mentions) — Construct edge list
  • aggregate_edge_weights(edges) — Sum weights for duplicate pairs

Example:

from naruto_net.build.edges import build_edges_from_scenes

edges = build_edges_from_scenes(scenes, mentions)

print(f"Created {len(edges)} edges")
# Output: Created 127 edges
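Conceptually, a co-appearance edge links every pair of distinct characters present in the same scene. The sketch below assumes mentions have already been grouped into one set of character names per scene, a representation chosen here for illustration. It counts a weight of 1 per shared scene and sums across scenes, similar in spirit to aggregate_edge_weights. The package's actual weighting may differ, and co_appearance_edges is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def co_appearance_edges(scene_characters: list[set[str]]) -> list[dict]:
    """Build weighted, undirected edges from per-scene character sets."""
    weights: Counter = Counter()
    for characters in scene_characters:
        # Sorting makes each pair canonical, so (A, B) and (B, A) are the same edge.
        for source, target in combinations(sorted(characters), 2):
            weights[(source, target)] += 1
    return [
        {"source": s, "target": t, "weight": w}
        for (s, t), w in sorted(weights.items())
    ]
```

The resulting list of dicts can be passed directly to pd.DataFrame, as in the export step of the end-to-end pipeline. Scenes with a single character contribute no edges.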

Full Build Documentation


QC: Quality Control

Purpose: Generate validation and quality reports

Key functions:

  • generate_episode_qc(episodes) — Per-episode counts of parsed events, scenes, and detected characters
  • generate_alias_qc(mentions) — Alias match frequencies and detection of shadowed aliases

Example:

from naruto_net.qc.reports import generate_episode_qc

report = generate_episode_qc(episodes)
report.to_csv('data/reports/episode_qc.csv', index=False)
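The alias report is described only briefly above, so the sketch below shows one possible reading of it under stated assumptions:

- Mentions are dicts with character and alias_matched keys, as in the Detect example.
- An alias is "shadowed" when it appears as a whole word inside a longer alias owned by a different character. Longest-first matching will never report the short alias in that context.
- A "collision" is an alias claimed by more than one character.

All function names here are hypothetical and are not the generate_alias_qc implementation.

```python
import re
from collections import Counter


def alias_frequencies(mentions: list[dict]) -> Counter:
    """Count how often each (character, alias_matched) pair occurs in the mentions."""
    return Counter((m["character"], m["alias_matched"]) for m in mentions)


def _alias_owners(alias_dict: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each lower-cased alias to the set of characters that claim it."""
    owners: dict[str, set[str]] = {}
    for character, aliases in alias_dict.items():
        for alias in aliases:
            owners.setdefault(alias.lower(), set()).add(character)
    return owners


def find_alias_collisions(alias_dict: dict[str, list[str]]) -> dict[str, list[str]]:
    """Aliases claimed by more than one character (ambiguous matches)."""
    owners = _alias_owners(alias_dict)
    return {alias: sorted(chars) for alias, chars in owners.items() if len(chars) > 1}


def find_shadowed_aliases(alias_dict: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """(short_alias, longer_alias, other_character) triples where a short alias
    appears as a whole word inside a longer alias owned by a different character."""
    owners = _alias_owners(alias_dict)
    found = []
    for short in owners:
        word = re.compile(r"\b" + re.escape(short) + r"\b")
        for longer in owners:
            if short != longer and word.search(longer):
                # Only flag characters that own the longer alias but not the short one.
                for other in sorted(owners[longer] - owners[short]):
                    found.append((short, longer, other))
    return sorted(found)
```

For instance, if "Sage" is an alias of the Sage of Six Paths, it is shadowed inside Jiraiya's "Pervy Sage". It is not flagged inside "Sage of Six Paths", because that alias belongs to the same character.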

Full QC Documentation


Testing

The package includes a pytest test suite. Run it from the repository root:

pytest tests/ -v

Test files:

  • test_ass_reader_parsing.py — Parser correctness
  • test_text_cleaning.py — ASS tag removal
  • test_scene_segmentation.py — Scene boundary detection
  • test_mentions_matching.py — Alias matching accuracy

Development

Adding New Modules

  1. Create the module file in src/naruto_net/<category>/
  2. Add docstrings (Google style)
  3. Write tests in tests/
  4. Update this documentation

Code Style

  • Docstrings: Google style (Args, Returns, Examples)
  • Type hints: Use for function signatures
  • Naming: snake_case for functions, PascalCase for classes

Example:

def detect_characters(text: str, alias_dict: dict) -> list[dict]:
    """Find character mentions in dialogue text.

    Args:
        text: Dialogue text to search
        alias_dict: Dictionary mapping canonical names to alias lists

    Returns:
        List of dicts with keys: character, alias_matched, confidence

    Examples:
        >>> alias_dict = {"Naruto Uzumaki": ["Naruto", "Hokage"]}
        >>> detect_characters("Where is Naruto?", alias_dict)
        [{'character': 'Naruto Uzumaki', 'alias_matched': 'Naruto', 'confidence': 1.0}]
    """
    # Implementation...

Contributing

See the GitHub repository for contribution guidelines.

Key areas for contribution:

  • Performance optimization: Vectorize alias matching, parallel processing
  • Feature additions: Support for .srt files, speaker attribution
  • Documentation: More examples, tutorials
  • Testing: Edge cases, integration tests