The fastest way to download playlists and sanitize metadata... yet.

Musicd - Automated Music Downloader

A restart-safe, Dockerized music downloader service that downloads audio from YouTube playlists and SoundCloud URLs, fingerprints tracks, resolves metadata via AcoustID and MusicBrainz, and organizes files in a clean structure.

Phase 4 Features (Domain-Aware Rate Limiter)

  • Per-domain concurrency limits (configurable per platform)
  • Per-domain request pacing (minimum interval between requests)
  • Dynamic backoff adjustment - increases on rate limits, decreases on success
  • Domain isolation - YouTube limits don't affect SoundCloud
  • In-memory operation - no external services or persistence
  • All Phase 1A, 2, and 3 features maintained

Phase 4 Rate Limiter

See PHASE4_RATE_LIMITER.md for implementation details.

Supported Domains:

Domain      Max Concurrent  Min Interval
youtube     2               2s
soundcloud  1               3s
generic     2               1s

Dynamic Adjustment:

  • On rate limit: Double interval (max 1 minute)
  • On 5 consecutive successes: Reduce by 10% (min config baseline)

Phase 3 Retry System (Inherited)

See PHASE3_RETRY_SYSTEM.md for implementation details.

Failure Classes:

Class            Retries  Backoff
network          5        exponential
rate_limit       5        exponential + jitter
temporary_api    3        exponential
corrupted_media  0        permanent
unavailable      0        permanent
invalid_url      0        permanent
unknown          2        fixed 1m

Backoff Formula: delay = 30s * (2 ^ retry_count), max 6 hours

Phase 2 Worker Pool (Inherited)

See PHASE2_SUMMARY.md for implementation details.

  • Concurrent Workers: Configurable from 1 to 5 workers (default: 2)
  • Atomic Claiming: SQL UPDATE...RETURNING ensures single claim
  • Per-Worker Resources: Individual temp dirs and DB connections
  • Worker IDs in Logs: [W1][INFO], [W2][OK], etc.
  • Bounded Concurrency: No unbounded goroutine spawning

Phase 1A Hardening (Inherited)

See PHASE1A_HARDENING.md for complete audit details.

  • Transaction Safety: Atomic DB updates with transaction wrapping
  • Timeout Protection: All external calls time out (HTTP 10s, fpcalc 15s)
  • Input Validation: Fingerprint length, duration limits, filename sanitization
  • Crash Recovery: Temp directory cleanup, WAL mode, job reset (preserves retry state)
  • Determinism: Sorted results, consistent ordering, no random selection
  • Path Security: Directory traversal prevention, reserved name filtering
  • Duplicate Detection: Size + SHA256 hash comparison for files <20MB

Requirements

  • Docker
  • Docker Compose (optional)

Quick Start

  1. Clone the repository and navigate to the project:

    cd musicd
    
  2. Create the config directory and configuration file:

    mkdir -p config
    cp config/example.playlists.toml config/playlists.toml
    
  3. Edit the configuration file:

    nano config/playlists.toml
    

    Add your AcoustID API key and playlist URLs:

    [general]
    acoustid_api_key = "YOUR_ACOUSTID_API_KEY_HERE"
    download_archive = "archive.txt"
    
    [[sources]]
    url = "https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID"
    
    [[sources]]
    url = "https://soundcloud.com/artist/track"
    
  4. Create the music output directory:

    mkdir -p music
    
  5. Build and run with Docker Compose:

    docker-compose up --build
    

    Or run with Docker directly:

    docker build -t musicd .
    docker run -d \
      -e PUID=1000 \
      -e PGID=1000 \
      -v $(pwd)/music:/music \
      -v $(pwd)/config:/config \
      musicd
    

Configuration

Environment Variables

  • PUID - User ID to run as (default: 0)
  • PGID - Group ID to run as (default: 0)

Volumes

  • /music - Output directory for downloaded music
  • /config - Configuration and database directory
    • playlists.toml - Playlist configuration
    • music.db - SQLite database
    • archive.txt - yt-dlp download archive

Config File Format (TOML)

[general]
acoustid_api_key = "YOUR_KEY"
download_archive = "/config/archive.txt"
workers = 3  # Number of concurrent workers (1-5, default: 2)

# Optional: Rate limiting per domain
[rate_limit.youtube]
max_concurrent = 2
min_interval = "2s"

[rate_limit.soundcloud]
max_concurrent = 1
min_interval = "3s"

[rate_limit.generic]
max_concurrent = 2
min_interval = "1s"

[[sources]]
url = "https://youtube.com/playlist?list=XXXX"

[[sources]]
url = "https://soundcloud.com/user/track"
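
For orientation, the file above could map onto Go types along these lines. This is a sketch, not the contents of config.go: the type names are hypothetical, the `toml` struct tags assume a TOML library that honours them (no parser is invoked here), and the validation rules beyond the documented worker range and duration format are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// General mirrors the [general] table.
type General struct {
	AcoustIDAPIKey  string `toml:"acoustid_api_key"`
	DownloadArchive string `toml:"download_archive"`
	Workers         int    `toml:"workers"`
}

// RateLimit mirrors one [rate_limit.<domain>] table.
type RateLimit struct {
	MaxConcurrent int    `toml:"max_concurrent"`
	MinInterval   string `toml:"min_interval"` // e.g. "2s"
}

// Source mirrors one [[sources]] entry.
type Source struct {
	URL string `toml:"url"`
}

// Config is the whole playlists.toml file.
type Config struct {
	General   General              `toml:"general"`
	RateLimit map[string]RateLimit `toml:"rate_limit"`
	Sources   []Source             `toml:"sources"`
}

// Validate checks the documented constraints after decoding.
func (c Config) Validate() error {
	// 0 means "not set", in which case the default of 2 would apply.
	if w := c.General.Workers; w != 0 && (w < 1 || w > 5) {
		return fmt.Errorf("general.workers must be between 1 and 5, got %d", w)
	}
	for name, rl := range c.RateLimit {
		if rl.MaxConcurrent < 1 {
			return fmt.Errorf("rate_limit.%s.max_concurrent must be at least 1", name)
		}
		if _, err := time.ParseDuration(rl.MinInterval); err != nil {
			return fmt.Errorf("rate_limit.%s.min_interval: %w", name, err)
		}
	}
	for i, s := range c.Sources {
		if s.URL == "" {
			return fmt.Errorf("sources[%d].url is empty", i)
		}
	}
	return nil
}
```

Duration strings such as "2s" and "3s" are in the format accepted by Go's time.ParseDuration, which is why the sketch keeps min_interval as a string and parses it during validation.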

How It Works

  1. Startup:

    • Load configuration from /config/playlists.toml
    • Initialize SQLite database at /config/music.db
    • Ensure database schema exists
  2. Source Scanning:

    • For each configured source URL, run yt-dlp --flat-playlist --dump-json
    • Extract video IDs and insert missing ones as pending in the database
  3. Processing Loop:

    • Continuously process pending or failed downloads
    • Never reprocess already processed tracks
    • Sleep 5 minutes between checks
  4. Download Process:

    • Download audio using yt-dlp --extract-audio --format bestaudio
    • Write to temp folder /tmp/downloads
    • Use --download-archive to prevent duplicates
  5. Metadata Resolution (in order):

    • Generate fingerprint using fpcalc
    • Lookup via AcoustID (requires score > 0.8)
    • Fetch details from MusicBrainz
    • Fallback to embedded tags via ffprobe
    • Final fallback: parse from filename
  6. File Organization:

    • Sanitize names (remove invalid chars, title case)
    • Store as /music/{artist}/{album}/{title}.{ext}
    • Handle duplicates by appending _(1), etc.
    • Atomic move operation

Logging

The application produces structured CLI logs with worker IDs:

[MAIN][INFO] Starting musicd...
[MAIN][OK]   Loaded config with 2 sources
[MAIN][OK]   Rate limiter configured: youtube=2/2s, soundcloud=1/3s, generic=2/1s
[W1][INFO] Worker started
[W2][INFO] Worker started
[W3][INFO] Worker started
[W1][INFO] Claimed job: abc123
[W1][INFO] Downloading: Artist - Title
[WAIT] youtube limiter 4.2s
[W1][OK]   Metadata resolved via AcoustID
[W1][OK]   Stored → /music/Artist/Album/Title.flac
[RATE] youtube backoff increased to 8s
[W1][INFO] [RETRY] network error → retry #2 in 2m
[W2][INFO] [DEAD] unavailable → marked permanent

Format: [WORKER][LEVEL] Message

  • WORKER: MAIN (main thread) or W1, W2, W3, etc. (workers)
  • LEVEL: INFO, OK, FAIL, WARN

Special Log Prefixes:

  • [WAIT] - Rate limiter wait time (only if > 3s)
  • [RATE] - Dynamic backoff adjustment
  • [RETRY] - Retry scheduled
  • [DEAD] - Permanent failure

Colors:

  • [INFO] - Dim gray
  • [OK] - Green
  • [FAIL] - Red
  • [WARN] - Orange
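
A line in this format can be produced with a single format string. The sketch below uses hypothetical names (formatLine, workerTag) and leaves out the colors; the padding width is inferred from the sample output above, where [OK] is followed by extra spaces so messages line up with [INFO].

```go
package main

import "fmt"

// workerTag returns the log tag for a worker: W1, W2, ...
func workerTag(id int) string {
	return fmt.Sprintf("W%d", id)
}

// formatLine renders "[WORKER][LEVEL] Message". The level tag is padded
// to six characters so that "[OK]" aligns with "[INFO]", "[FAIL]" and
// "[WARN]".
func formatLine(worker, level, msg string) string {
	return fmt.Sprintf("[%s]%-6s %s", worker, "["+level+"]", msg)
}
```

For example, formatLine(workerTag(1), "INFO", "Worker started") yields `[W1][INFO] Worker started`.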

Database Schema

CREATE TABLE downloads (
    id INTEGER PRIMARY KEY,
    source_url TEXT NOT NULL,
    original_id TEXT NOT NULL,
    status TEXT NOT NULL,         -- pending, downloaded, failed, processed
    file_path TEXT,
    artist TEXT,
    album TEXT,
    title TEXT,
    fingerprint TEXT,
    error TEXT,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(original_id)
);

Restart Safety

The service is designed to be fully restart-safe:

  • Duplicate Prevention: The database enforces a unique constraint on original_id
  • yt-dlp Archive: A separate archive file prevents re-downloading independently of the database
  • Idempotent: Can be safely restarted at any time
  • State Persistence: All state stored in SQLite database
  • Atomic Operations: File moves are atomic to prevent corruption

Development

Building Locally

cd musicd
go build -o musicd .

Running Tests

go test ./...

Getting an AcoustID API Key

  1. Go to https://acoustid.org/
  2. Register for an account
  3. Go to the "API Keys" section
  4. Generate a new API key
  5. Add it to your playlists.toml

Troubleshooting

Permission Issues

Make sure the PUID/PGID environment variables match your host user's IDs:

id -u  # Get your user ID
id -g  # Get your group ID

Then set them in compose.yml or in the docker run command.

Database Locked

If you see "database is locked" errors, ensure only one instance is running.

AcoustID Lookup Failing

  • Verify your API key is correct
  • Check network connectivity
  • Some tracks may not be in the AcoustID database

License

MIT License

Completed Phases

  • Phase 1: Core downloader with resume-safe pipeline
  • Phase 1A: Production hardening & correctness audit
  • Phase 2: Controlled parallel worker pool
  • Phase 3: Advanced retry & failure classification system
  • Phase 4: Domain-aware rate limiter system

Future Phases

  • Phase 5: Smarter duplicate detection via fingerprint matching
  • Phase 6: Dry-run mode and CLI commands