Troubleshooting

Common issues and solutions for working with the Naruto character network project.

Installation Issues

`ModuleNotFoundError: No module named 'naruto_net'`

Cause: The package wasn't installed in editable mode.

Solution:

cd /path/to/naruto-network-graph
pip install -e .

Verify installation:

python -c "from naruto_net.io.subtitles import AssReader; print('Success!')"
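If the import still fails, or succeeds but seems to run stale code, it helps to see which copy of `naruto_net` Python actually resolves. This is a minimal standard-library sketch; `locate_package` is a hypothetical helper, not part of the project:

```python
import importlib.util
from typing import Optional


def locate_package(name: str = "naruto_net") -> Optional[str]:
    """Return the file Python would import `name` from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    origin = locate_package()
    if origin is None:
        print("naruto_net is not importable: run `pip install -e .` from the project root")
    else:
        # With an editable install of a src/ layout, this should point into your checkout's src/
        print(f"naruto_net resolved to: {origin}")
```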

`pip install -r requirements.txt` fails with dependency conflicts

Cause: Conflicting package versions or Python version mismatch.

Solution 1: Use a fresh virtual environment

rm -rf .venv
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Solution 2: Install specific versions manually

pip install pandas==2.0.3
pip install pyarrow==13.0.0
pip install networkx==3.1

Tests fail with `FileNotFoundError` for fixtures

Cause: Running tests from the wrong directory (fixture paths are resolved relative to the current working directory).

Solution: Always run tests from the project root

cd /path/to/naruto-network-graph
pytest tests/ -v

Not from inside tests/:

cd tests
pytest .  # ❌ Will fail
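Alternatively, tests can be made independent of the working directory by resolving fixture paths relative to the test file itself. A sketch assuming fixtures live in a `fixtures/` folder next to the test modules; `fixtures_dir` is a hypothetical helper (a natural home for it would be `tests/conftest.py`):

```python
from pathlib import Path


def fixtures_dir(test_file: str) -> Path:
    """Return the fixtures/ directory that sits next to `test_file`.

    Pass `__file__` from a test module so the path no longer depends on
    the directory pytest was launched from.
    """
    return Path(test_file).resolve().parent / "fixtures"


# Usage inside a test module (hypothetical fixture file name):
# sample = fixtures_dir(__file__) / "sample_episode.ass"
```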

Subtitle Processing Issues

`UnicodeDecodeError` when parsing `.ass` files

Cause: Subtitle file uses non-UTF-8 encoding (common with older fansubs).

Solution 1: Specify encoding explicitly

Edit src/naruto_net/io/subtitles.py, line ~42:

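# 'utf-8-sig' reads UTF-8 and strips a byte-order mark if one is present.
# For a legacy (non-UTF-8) file, pass its real encoding instead, e.g. 'cp1252'.
with open(file_path, 'r', encoding='utf-8-sig') as f:
    ...  # existing parsing code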

Solution 2: Convert file to UTF-8

# Replace ISO-8859-1 with the file's actual source encoding
iconv -f ISO-8859-1 -t UTF-8 input.ass > output.ass

Solution 3: Auto-detect encoding (requires pip install chardet)

import chardet

with open('subtitle.ass', 'rb') as f:
    result = chardet.detect(f.read())

# detect() returns None for the encoding when it cannot decide
encoding = result['encoding'] or 'utf-8'

# Then use the detected encoding
with open('subtitle.ass', 'r', encoding=encoding) as f:
    text = f.read()
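When a batch of fansubs mixes encodings, a fallback chain avoids converting each file by hand. A minimal sketch, assuming the common cases are UTF-8 (with or without a BOM), UTF-16 with a BOM, and Windows-1252; `decode_subtitle_bytes` and `read_subtitle_text` are hypothetical helpers, and the fallback order is a judgment call rather than something the project prescribes:

```python
import codecs
from pathlib import Path
from typing import Tuple


def decode_subtitle_bytes(raw: bytes) -> Tuple[str, str]:
    """Decode subtitle bytes, returning (text, encoding_used)."""
    # UTF-16 files are identified by their byte-order mark; the codec strips it.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16"), "utf-16"
    # Strict UTF-8 rarely decodes non-UTF-8 bytes by accident, so try it first.
    # 'utf-8-sig' also strips a UTF-8 BOM if one is present.
    try:
        return raw.decode("utf-8-sig"), "utf-8-sig"
    except UnicodeDecodeError:
        pass
    # A common encoding for older Western fansubs.
    try:
        return raw.decode("cp1252"), "cp1252"
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this never raises,
        # but accented characters may come out wrong.
        return raw.decode("latin-1"), "latin-1"


def read_subtitle_text(path) -> Tuple[str, str]:
    """Read a subtitle file from disk and decode it with the fallback chain."""
    return decode_subtitle_bytes(Path(path).read_bytes())
```

Logging the returned encoding name makes it easy to spot files that only decoded via the lossy `latin-1` fallback.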

No characters detected despite valid subtitles

Cause 1: Character (or the alias used in the dialogue) is missing from the alias JSON

Solution: Add the missing character or alias to data/character_aliases.json

{
  "Kakashi Hatake": ["Kakashi", "Copy Ninja", "Kakashi-sensei"]
}

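Cause 2: Word boundary issues

Solution: Check whether the alias starts or ends with a non-word character

# Hyphens and other regex metacharacters inside an alias (e.g. "Nine-Tails")
# are already escaped in detect/mentions.py via re.escape()
# But \b only matches between a word and a non-word character, so an alias
# that starts or ends with punctuation may never match if wrapped in \b...\b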
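To check whether an alias can match at all, the matching can be reproduced in isolation. This is a minimal sketch, not the project's `detect/mentions.py`: `find_characters` is hypothetical, it assumes the alias JSON maps canonical names to alias lists as in the example above, and it uses `(?<!\w)` / `(?!\w)` lookarounds in place of `\b` so aliases that start or end with punctuation still match. Longer aliases are tried first so `Kakashi-sensei` is not cut short to `Kakashi`:

```python
import re
from typing import Dict, List, Set


def find_characters(text: str, alias_dict: Dict[str, List[str]]) -> Set[str]:
    """Return canonical names whose name or alias appears as a whole word in `text`."""
    # Map every alias (and the canonical name itself) back to the canonical name
    lookup = {alias: name for name, aliases in alias_dict.items() for alias in [name, *aliases]}
    if not lookup:
        return set()
    # Longest first so "Kakashi-sensei" wins over "Kakashi"; re.escape handles "-", ".", etc.
    alternatives = "|".join(re.escape(a) for a in sorted(lookup, key=len, reverse=True))
    # Lookarounds act like \b but also work at punctuation edges
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)")
    return {lookup[m.group(0)] for m in pattern.finditer(text)}
```

If an alias matches here but not in the pipeline, the difference is likely in how mentions.py builds its pattern (case handling, boundaries, or alias ordering).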

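Debug: Print all detected mentions

import json

from naruto_net.detect.mentions import detect_characters

# Assumes detect_characters() takes the alias mapping as loaded from the JSON file
with open('data/character_aliases.json', encoding='utf-8') as f:
    alias_dict = json.load(f)

text = "Kakashi-sensei, where's Naruto?"  # any dialogue line that should match
mentions = detect_characters(text, alias_dict)
print(f"Detected: {mentions}")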

Scene segmentation produces too many/few scenes

Cause: Gap threshold doesn't match subtitle pacing.

Solution: Adjust threshold in src/naruto_net/segment/scenes.py

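# Set one of the following.
# Default: a pause longer than 3.0 seconds starts a new scene
SCENE_GAP_THRESHOLD = 3.0

# For slower-paced shows: require longer pauses to split (fewer scenes)
SCENE_GAP_THRESHOLD = 5.0

# For fast-paced action: split on shorter pauses (more scenes)
SCENE_GAP_THRESHOLD = 2.0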
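For intuition about what the threshold controls, here is a minimal gap-based segmenter. It assumes scenes.py starts a new scene when the pause between the end of one subtitle line and the start of the next exceeds `SCENE_GAP_THRESHOLD`; `SubLine` and `segment_scenes` are hypothetical names, not the project's API:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubLine:
    start: float  # seconds
    end: float    # seconds
    text: str


def segment_scenes(lines: List[SubLine], gap_threshold: float = 3.0) -> List[List[SubLine]]:
    """Split subtitle lines into scenes wherever the silence exceeds gap_threshold."""
    scenes: List[List[SubLine]] = []
    current: List[SubLine] = []
    last_end: Optional[float] = None
    for ln in sorted(lines, key=lambda s: s.start):
        # A long enough pause since the latest-ending previous line closes the scene
        if last_end is not None and ln.start - last_end > gap_threshold:
            scenes.append(current)
            current = []
        current.append(ln)
        # Track the latest end time so overlapping lines don't create fake gaps
        last_end = ln.end if last_end is None else max(last_end, ln.end)
    if current:
        scenes.append(current)
    return scenes
```

With pauses of 0.5 s, 5 s and 0.5 s between four lines, a threshold of 3.0 yields two scenes and 6.0 yields one, which is why raising the threshold produces fewer scenes.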

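Debug: Inspect the gap distribution

import numpy as np

# Skip only scenes with no following gap (None); keep genuine zero-length gaps
gaps = [scene.gap_after for scene in scenes if scene.gap_after is not None]
print(f"Mean gap: {np.mean(gaps):.2f}s")
print(f"Median gap: {np.median(gaps):.2f}s")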

Neo4j Issues

Neo4j import script fails with "Invalid syntax"

Cause: Pasting multi-line Cypher with extra whitespace.

Solution: Use LOAD CSV or batching instead of raw paste

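Option 1: Load from CSV (recommended; file:/// URLs are read from the database's import directory)

LOAD CSV WITH HEADERS FROM 'file:///edges.csv' AS row
// MERGE each character once so repeated names don't create duplicate nodes
MERGE (a:Character {name: row.character_a})
MERGE (b:Character {name: row.character_b})
CREATE (a)-[:CONNECTED {
  weight: toInteger(row.weight),
  arc: row.arc
}]->(b);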

Option 2: Use UNWIND for batching

UNWIND [
  {a: 'Naruto', b: 'Sasuke', w: 5},
  {a: 'Naruto', b: 'Sakura', w: 3}
] AS edge
// MATCH skips rows whose Character nodes don't exist yet
MATCH (a:Character {name: edge.a})
MATCH (b:Character {name: edge.b})
CREATE (a)-[:CONNECTED {weight: edge.w}]->(b);
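When edges come from the Python pipeline, the UNWIND pattern can be driven from the official `neo4j` Python driver in fixed-size batches. A sketch assuming the 5.x driver (`pip install neo4j`), an edges CSV with `character_a`, `character_b` and integer `weight` columns as in Option 1, and hypothetical helper names; unlike the MATCH version above, it uses MERGE so missing Character nodes are created:

```python
import csv
from itertools import islice
from typing import Dict, Iterable, Iterator, List

EDGE_QUERY = """
UNWIND $batch AS edge
MERGE (a:Character {name: edge.a})
MERGE (b:Character {name: edge.b})
CREATE (a)-[:CONNECTED {weight: edge.w}]->(b)
"""


def chunked(rows: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def read_edges(path: str) -> Iterator[Dict]:
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            yield {"a": row["character_a"], "b": row["character_b"], "w": int(row["weight"])}


def load_edges(uri: str, auth: tuple, path: str, batch_size: int = 1000) -> None:
    from neo4j import GraphDatabase  # imported here so the helpers above work without the driver

    with GraphDatabase.driver(uri, auth=auth) as driver:
        for batch in chunked(read_edges(path), batch_size):
            # Each call runs one managed transaction with the batch as the $batch parameter
            driver.execute_query(EDGE_QUERY, batch=batch)
```

Usage, with placeholder connection details: `load_edges("neo4j://localhost:7687", ("neo4j", "your-password"), "data/processed/edges.csv")`.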

"Cannot merge node using null property value"

Cause: Missing required property in MERGE statement.

Solution: Make sure the MERGE key (character_id) is never null

// ❌ Fails: MERGE cannot match on a null property value
MERGE (c:Character {character_id: NULL})

// ✓ Correct: merge on the non-null key, then set the remaining properties
MERGE (c:Character {character_id: 1})
SET c.name = "Naruto Uzumaki"
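Null keys often come from blank cells in the exported CSV. A minimal sketch that filters them out in pandas before export, assuming the characters table has `character_id` and `name` columns; `drop_missing_keys` is a hypothetical helper:

```python
from typing import Sequence

import pandas as pd


def drop_missing_keys(df: pd.DataFrame, keys: Sequence[str] = ("character_id", "name")) -> pd.DataFrame:
    """Drop rows where any key column is null so MERGE never receives a null value."""
    missing = df[list(keys)].isna().any(axis=1)
    if missing.any():
        # Report what is being dropped instead of failing later inside Neo4j
        print(f"Dropping {int(missing.sum())} row(s) with a missing {' / '.join(keys)} value")
    return df.loc[~missing].copy()
```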

Neo4j query returns no results despite data being imported

Cause 1: Label mismatch (case-sensitive)

// ❌ Wrong label
MATCH (c:chuninexams) RETURN c  // Returns 0

// ✓ Correct
MATCH (c:ChūninExams) RETURN c  // Returns results

Cause 2: Relationship direction

// ❌ Only matches edges stored in this direction
MATCH (a)-[:CONNECTED]->(b) RETURN a, b

// ✓ Correct (undirected); each pair is returned twice, once per direction
MATCH (a)-[:CONNECTED]-(b) RETURN a, b

Debug: Check what exists

// Count all nodes
MATCH (n) RETURN labels(n), count(n);

// Count relationships by type
MATCH (a)-[r]->(b) RETURN type(r), count(r) LIMIT 10;
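It can also help to compute what the import should have produced and compare it with the counts above. A sketch assuming the edges CSV from the LOAD CSV example (`character_a`, `character_b` columns) and that Character nodes are merged on `name`; `expected_counts` is a hypothetical helper:

```python
import pandas as pd


def expected_counts(edges: pd.DataFrame) -> dict:
    """Distinct characters and total edges the import should create."""
    names = pd.concat([edges["character_a"], edges["character_b"]]).dropna().unique()
    return {"characters": len(names), "edges": len(edges)}


# Usage (path is an assumption):
# print(expected_counts(pd.read_csv("data/processed/edges.csv")))
```

If Neo4j reports far more Character nodes than expected, nodes were probably created once per CSV row instead of being merged.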

"Heap space" error when running large queries

Cause: Neo4j default memory limits too low.

Solution: Increase heap size in neo4j.conf

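# In Neo4j Desktop: Settings → Database Settings
# Neo4j 4.x setting names
dbms.memory.heap.initial_size=1G
dbms.memory.heap.max_size=4G
# Neo4j 5.x renamed them:
# server.memory.heap.initial_size=1G
# server.memory.heap.max_size=4G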

Alternative: Use LIMIT and batching

// Return a bounded result set instead of every node
MATCH (c:Character)
RETURN c
LIMIT 1000;

Pipeline Execution Issues

Script runs but produces no output files

Cause: Output directory doesn't exist.

Solution: Create directories first

mkdir -p data/intermediate
mkdir -p data/processed
mkdir -p data/reports

Or create them from the script:

import os

for directory in ('data/intermediate', 'data/processed', 'data/reports'):
    os.makedirs(directory, exist_ok=True)

"Permission denied" when writing output files

Cause: File is open in another program (Excel, Neo4j) or wrong permissions.

Solution 1: Close all programs using the file

Solution 2: Write to different filename

output_path = 'data/processed/edges_v2.csv'  # New file
df.to_csv(output_path, index=False)

Solution 3: Check file permissions

chmod 644 data/processed/edges.csv
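Solutions 1 and 2 can be combined so a locked file doesn't abort a long run. A minimal sketch; `to_csv_with_fallback` is a hypothetical helper that writes a timestamped copy next to the original when the target cannot be opened for writing:

```python
from datetime import datetime
from pathlib import Path

import pandas as pd


def to_csv_with_fallback(df: pd.DataFrame, path) -> Path:
    """Write df to `path`; if that file is locked or read-only, write a timestamped copy instead."""
    path = Path(path)
    try:
        df.to_csv(path, index=False)
        return path
    except PermissionError:
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        alt = path.with_name(f"{path.stem}_{stamp}{path.suffix}")
        df.to_csv(alt, index=False)
        print(f"Could not write {path} (open elsewhere or read-only); wrote {alt} instead")
        return alt
```

Returning the path actually written lets downstream steps read the right file.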

Pipeline runs extremely slowly

Cause: Processing too many episodes or inefficient I/O.

Solution 1: Process episodes in batches

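# Instead of all 500 episodes at once
episodes = list(range(1, 501))  # ❌ Slow, high memory

# Batch processing: 50 episodes at a time (process_episodes is your pipeline step)
LAST_EPISODE = 500
for batch_start in range(1, LAST_EPISODE + 1, 50):
    batch = list(range(batch_start, min(batch_start + 50, LAST_EPISODE + 1)))
    process_episodes(batch)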

Solution 2: Use Parquet instead of CSV for intermediate files

import pandas as pd  # to_parquet/read_parquet need a Parquet engine such as pyarrow

# Faster writes
df.to_parquet('data/intermediate/events.parquet')

# Faster reads
df = pd.read_parquet('data/intermediate/events.parquet')

Solution 3: Add progress bars to see which episodes are slow

from tqdm import tqdm

for episode in tqdm(episodes, desc="Processing"):
    process_episode(episode)
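Progress bars show overall throughput but not which stage is slow. A minimal sketch for accumulating wall-clock time per stage with the standard library; `timed` and `timings` are hypothetical names, and the stage functions in the usage comment are placeholders for your own pipeline steps:

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

timings: Dict[str, float] = {}


@contextmanager
def timed(stage: str) -> Iterator[None]:
    """Accumulate wall-clock seconds spent inside the block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


# Usage (stage functions are hypothetical):
# for episode in episodes:
#     with timed("parse"):
#         events = parse_episode(episode)
#     with timed("detect"):
#         mentions = detect_mentions(events)
# for stage, secs in sorted(timings.items(), key=lambda kv: -kv[1]):
#     print(f"{stage:>10}: {secs:.1f}s")
```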

Testing Issues

Tests pass locally but fail in CI

Cause: Path separators differ (Windows vs Linux).

Solution: Use pathlib.Path

from pathlib import Path

# ❌ Platform-specific
file_path = 'data/intermediate\\events.csv'

# ✓ Cross-platform
file_path = Path('data') / 'intermediate' / 'events.csv'

`AssertionError` in CSV comparison tests

Cause: Floating point precision or column order mismatch.

Solution: Use pandas.testing.assert_frame_equal with tolerances

import pandas as pd
from pandas.testing import assert_frame_equal

# ❌ Too strict
assert df.equals(expected_df)

# ✓ Tolerant comparison
assert_frame_equal(
    df,
    expected_df,
    check_dtype=False,   # Ignore int64 vs int32
    check_exact=False,   # Compare floats with a tolerance
    atol=1e-6,           # Absolute tolerance
    check_like=True,     # Ignore row/column order (aligns by labels)
)
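A small self-contained check of what those options tolerate: the frames below differ in column order and by binary floating-point rounding, yet compare equal. `frames_match` is a hypothetical wrapper for experimenting; in an actual pytest test, call `assert_frame_equal` directly so a failure prints the diff:

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_match(actual: pd.DataFrame, expected: pd.DataFrame, atol: float = 1e-6) -> bool:
    """True if frames match up to column/row order and a float tolerance."""
    try:
        assert_frame_equal(actual, expected, check_like=True, check_exact=False, atol=atol)
        return True
    except AssertionError:
        return False


# Same data, different column order, and 0.1 + 0.2 != 0.3 in binary floating point
actual = pd.DataFrame({"weight": [0.1 + 0.2], "character_a": ["Naruto"]})
expected = pd.DataFrame({"character_a": ["Naruto"], "weight": [0.3]})
print(frames_match(actual, expected))  # True
```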

Documentation (MkDocs) Issues

`mkdocs serve` fails with "Port 8000 already in use"

Cause: Another process is using the default port.

Solution: Use alternate port

mkdocs serve -a 127.0.0.1:8050

Find what's using port 8000:

# macOS/Linux
lsof -i :8000

# Windows
netstat -ano | findstr :8000
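A cross-platform alternative to lsof/netstat is to probe ports from Python. A minimal sketch; `port_in_use` and `first_free_port` are hypothetical helpers, and a port reported free can still be taken by another process before mkdocs binds it:

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0


def first_free_port(start: int = 8000, stop: int = 8100, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, stop) with no listener."""
    for port in range(start, stop):
        if not port_in_use(port, host):
            return port
    raise RuntimeError(f"No free port between {start} and {stop - 1}")


if __name__ == "__main__":
    print(f"mkdocs serve -a 127.0.0.1:{first_free_port()}")
```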

Mermaid diagrams don't render

Cause: pymdownx.superfences not configured for Mermaid.

Solution: Check mkdocs.yml has:

markdown_extensions:
  - pymdownx.superfences:
      custom_fences:
        - name: mermaid
          class: mermaid
          format: !!python/name:pymdownx.superfences.fence_code_format

`mkdocstrings` fails to find Python modules

Cause: src/ not in Python path.

Solution 1: Install package in editable mode

pip install -e .

Solution 2: Set PYTHONPATH

export PYTHONPATH="${PYTHONPATH}:${PWD}/src"
mkdocs serve

Solution 3: Configure mkdocstrings paths in mkdocs.yml

plugins:
  - mkdocstrings:
      handlers:
        python:
          paths: [src]  # Tell mkdocstrings where to find modules

Material theme not loading custom colors

Cause: CSS file not created or path mismatch.

Solution: Verify file exists

ls docs/stylesheets/naruto-theme.css

And that mkdocs.yml lists it (extra_css paths are relative to the docs/ directory):

extra_css:
  - stylesheets/naruto-theme.css

Performance Optimization

Speed up subtitle parsing

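# Use a generator to stream Dialogue lines instead of reading the whole file
# (parse_line is your per-line parser)
def parse_subtitles_streaming(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            if line.startswith('Dialogue:'):
                yield parse_line(line)

# Consume events one at a time instead of loading them all into memory
for event in parse_subtitles_streaming('large_file.ass'):
    handle_event(event)  # hypothetical per-event processing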
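The streaming generator relies on a `parse_line` function. A minimal sketch of one, assuming the standard ASS v4+ event layout (`Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text`); it is hypothetical and not necessarily how `AssReader` works. Because the Text field can itself contain commas, the line is split at most nine times:

```python
import re
from typing import NamedTuple


class DialogueEvent(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    speaker: str
    text: str


_OVERRIDE_TAGS = re.compile(r"\{[^}]*\}")  # styling blocks such as {\i1} or {\pos(10,20)}


def ass_time_to_seconds(stamp: str) -> float:
    """Convert an ASS timestamp like '0:01:02.50' (H:MM:SS.cc) to seconds."""
    hours, minutes, seconds = stamp.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_line(line: str) -> DialogueEvent:
    """Parse one 'Dialogue:' line from the [Events] section."""
    _, _, body = line.partition(":")
    fields = body.strip().split(",", 9)  # the 10th field (Text) may contain commas
    start, end, name, text = fields[1], fields[2], fields[4], fields[9]
    # Drop override tags and turn \N (hard) and \n (soft) line breaks into spaces
    text = _OVERRIDE_TAGS.sub("", text).replace("\\N", " ").replace("\\n", " ")
    return DialogueEvent(ass_time_to_seconds(start), ass_time_to_seconds(end), name.strip(), text.strip())
```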

Reduce Neo4j import time

// Use CREATE instead of MERGE when IDs are unique and the nodes don't exist yet
CREATE (c:Character {character_id: 1, name: "Naruto"})

// Batch imports with UNWIND
UNWIND $batch AS row
CREATE (c:Character {character_id: row.id, name: row.name})

Getting Help

If your issue isn't covered here:

Check existing issues: GitHub Issues
Search documentation: Use site search (top right)
Enable debug logging:

import logging
logging.basicConfig(level=logging.DEBUG)

Ask on GitHub Discussions: Discussions
Email: barbs@balex.com
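Root-level DEBUG logging can bury the project's messages under third-party output. A sketch that keeps other libraries at WARNING and enables DEBUG only for the `naruto_net` logger hierarchy; this only has an effect if the package logs through module-level loggers such as `logging.getLogger(__name__)`, which is an assumption:

```python
import logging

logging.basicConfig(
    level=logging.WARNING,  # third-party libraries stay quiet
    format="%(asctime)s %(levelname)-8s %(name)s: %(message)s",
)
# Loggers named "naruto_net.*" inherit this level unless they set their own
logging.getLogger("naruto_net").setLevel(logging.DEBUG)
```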

Reporting Bugs

When opening a GitHub issue, include:

Python version: python --version
Operating system: macOS, Windows, Linux
Error message: Full traceback
Steps to reproduce:

# Example
git clone https://github.com/dagny099/naruto-network-graph.git
cd naruto-network-graph
pip install -r requirements.txt
python scripts/00_ass_ingest_subset.py
# Error occurs here

Expected vs actual behavior
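To gather the version details in one go, a small standard-library script can print them for pasting into the issue. A sketch; `environment_report` is a hypothetical helper and the package list is an assumption based on the dependencies mentioned on this page:

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("pandas", "pyarrow", "networkx", "mkdocs")) -> dict:
    """Collect version details worth pasting into a bug report."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for pkg in packages:
        try:
            report[pkg] = version(pkg)
        except PackageNotFoundError:
            report[pkg] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```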