Musicd - Automated Music Downloader
A restart-safe, Dockerized music downloader service that downloads audio from YouTube playlists and SoundCloud URLs, fingerprints tracks, resolves metadata via AcoustID and MusicBrainz, and organizes files in a clean structure.
Phase 4 Features (Domain-Aware Rate Limiter)
- Per-domain concurrency limits (configurable per platform)
- Per-domain request pacing (minimum interval between requests)
- Dynamic backoff adjustment - increases on rate limits, decreases on success
- Domain isolation - YouTube limits don't affect SoundCloud
- In-memory operation - no external services or persistence
- All Phase 1A, 2, and 3 features maintained
Phase 4 Rate Limiter
See PHASE4_RATE_LIMITER.md for implementation details.
Supported Domains:
| Domain | Max Concurrent | Min Interval |
|---|---|---|
| youtube | 2 | 2s |
| soundcloud | 1 | 3s |
| generic | 2 | 1s |
Dynamic Adjustment:
- On rate limit: Double the interval (capped at 1 minute)
- On 5 consecutive successes: Reduce the interval by 10% (never below the configured baseline)
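The adjustment rules above can be sketched as a small state machine per domain. This is a minimal illustration, not the code in ratelimiter.go: the `domainPacer` type, its fields, and its method names are hypothetical, and resetting the success counter after each reduction is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// maxInterval is the documented ceiling for the pacing interval.
const maxInterval = time.Minute

// domainPacer tracks the pacing interval for one domain.
// Hypothetical type; names are not taken from the project source.
type domainPacer struct {
	baseline  time.Duration // configured min_interval
	current   time.Duration // interval currently enforced
	successes int           // consecutive successes since the last change
}

func newDomainPacer(baseline time.Duration) *domainPacer {
	return &domainPacer{baseline: baseline, current: baseline}
}

// onRateLimit doubles the interval, capped at one minute.
func (p *domainPacer) onRateLimit() {
	p.successes = 0
	p.current *= 2
	if p.current > maxInterval {
		p.current = maxInterval
	}
}

// onSuccess reduces the interval by 10% after every 5 consecutive
// successes, never going below the configured baseline.
func (p *domainPacer) onSuccess() {
	p.successes++
	if p.successes < 5 {
		return
	}
	p.successes = 0 // assumption: the streak restarts after each reduction
	p.current = p.current * 9 / 10
	if p.current < p.baseline {
		p.current = p.baseline
	}
}

func main() {
	p := newDomainPacer(2 * time.Second)
	p.onRateLimit()
	p.onRateLimit()
	fmt.Println("after two rate limits:", p.current)
	for i := 0; i < 5; i++ {
		p.onSuccess()
	}
	fmt.Println("after five successes:", p.current)
}
```

Keeping one pacer per domain is what gives the isolation described earlier: a rate limit on one platform changes only that platform's interval.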
Phase 3 Retry System (Inherited)
See PHASE3_RETRY_SYSTEM.md for implementation details.
Failure Classes:
| Class | Retries | Backoff |
|---|---|---|
| network | 5 | exponential |
| rate_limit | 5 | exponential + jitter |
| temporary_api | 3 | exponential |
| corrupted_media | 0 | permanent |
| unavailable | 0 | permanent |
| invalid_url | 0 | permanent |
| unknown | 2 | fixed 1m |
Backoff Formula: delay = 30s * (2 ^ retry_count), max 6 hours
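As a worked example of the formula, the sketch below computes the delay for a given retry count. The function and constant names are hypothetical. Jitter for the rate_limit class is left out because its range is not specified here.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseDelay = 30 * time.Second // documented base
	maxDelay  = 6 * time.Hour    // documented cap
)

// backoffDelay returns 30s * 2^retryCount, capped at 6 hours.
func backoffDelay(retryCount int) time.Duration {
	if retryCount < 0 {
		retryCount = 0
	}
	// 30s * 2^10 is already above 6h, so clamp early and avoid overflow.
	if retryCount >= 10 {
		return maxDelay
	}
	d := baseDelay << uint(retryCount)
	if d > maxDelay {
		return maxDelay
	}
	return d
}

func main() {
	for i := 0; i <= 5; i++ {
		fmt.Printf("retry_count=%d -> %v\n", i, backoffDelay(i))
	}
}
```

With these constants, a retry count of 2 gives a 2 minute delay, and the cap takes effect from a retry count of 10 onward.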
Phase 2 Worker Pool (Inherited)
See PHASE2_SUMMARY.md for implementation details.
- Concurrent Workers: Configurable 1-5 workers (default: 2, max: 5)
- Atomic Claiming: SQL `UPDATE...RETURNING` ensures single claim
- Per-Worker Resources: Individual temp dirs and DB connections
- Worker IDs in Logs: `[W1][INFO]`, `[W2][OK]`, etc.
- Bounded Concurrency: No unbounded goroutine spawning
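The bounded-concurrency idea can be illustrated with a fixed set of goroutines draining a shared queue. This sketch is not the code in worker.go. It uses a channel in place of the SQL `UPDATE...RETURNING` claim, since a channel also hands each job to exactly one receiver. The names `clampWorkers` and `runPool` are hypothetical, as is the handling of out-of-range worker counts.

```go
package main

import (
	"fmt"
	"sync"
)

// clampWorkers applies the documented bounds: default 2, range 1-5.
// Treating 0 as "unset" is an assumption.
func clampWorkers(n int) int {
	switch {
	case n == 0:
		return 2
	case n < 1:
		return 1
	case n > 5:
		return 5
	}
	return n
}

// runPool starts a fixed number of workers and returns which worker
// claimed each job. No goroutines are spawned per job.
func runPool(workers int, jobs []string) map[string]string {
	queue := make(chan string)
	claimed := make(map[string]string) // job ID -> worker ID
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 1; i <= clampWorkers(workers); i++ {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			// Each receive delivers a job to exactly one worker.
			for job := range queue {
				mu.Lock()
				claimed[job] = id
				mu.Unlock()
			}
		}(fmt.Sprintf("W%d", i))
	}

	for _, job := range jobs {
		queue <- job
	}
	close(queue)
	wg.Wait()
	return claimed
}

func main() {
	claimed := runPool(3, []string{"abc123", "def456", "ghi789", "jkl012"})
	fmt.Println("jobs claimed:", len(claimed))
}
```

In the real service the claim happens in the database, so it survives restarts. The channel here only shows the single-claim, fixed-worker shape.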
Phase 1A Hardening (Inherited)
See PHASE1A_HARDENING.md for complete audit details.
- Transaction Safety: Atomic DB updates with transaction wrapping
- Timeout Protection: All external calls time out (HTTP 10s, fpcalc 15s)
- Input Validation: Fingerprint length, duration limits, filename sanitization
- Crash Recovery: Temp directory cleanup, WAL mode, job reset (preserves retry state)
- Determinism: Sorted results, consistent ordering, no random selection
- Path Security: Directory traversal prevention, reserved name filtering
- Duplicate Detection: Size + SHA256 hash comparison for files <20MB
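To make the path-security and filename-sanitization points concrete, here is a minimal sketch of sanitizing each path component before joining it under the output root. The function names, the set of replaced characters, the reserved-name list, and the "Unknown" fallback are all assumptions, not the rules implemented in the project.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// reservedNames is an illustrative subset of device names that are
// unsafe as file names on some platforms (assumed list).
var reservedNames = map[string]bool{"CON": true, "PRN": true, "AUX": true, "NUL": true}

// sanitizeComponent makes one path component safe to join under a root.
func sanitizeComponent(name string) string {
	// Replace path separators, control characters, and characters
	// that are invalid on common filesystems.
	cleaned := strings.Map(func(r rune) rune {
		if r < 0x20 || strings.ContainsRune(`/\:*?"<>|`, r) {
			return '_'
		}
		return r
	}, name)
	// Trimming dots and spaces turns "." and ".." into an empty string.
	cleaned = strings.Trim(cleaned, " .")
	if cleaned == "" || reservedNames[strings.ToUpper(cleaned)] {
		return "Unknown"
	}
	return cleaned
}

// destPath builds root/artist/album/title.ext from sanitized parts.
func destPath(root, artist, album, title, ext string) string {
	return filepath.Join(root,
		sanitizeComponent(artist),
		sanitizeComponent(album),
		sanitizeComponent(title)+"."+ext)
}

func main() {
	fmt.Println(destPath("/music", "AC/DC", "..", "Hells Bells", "flac"))
}
```

Because separators are replaced before the components are joined, no metadata value can introduce an extra directory level or climb out of the root.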
Requirements
- Docker
- Docker Compose (optional)
Quick Start
- Clone the repository and navigate to the project:

      cd musicd

- Create the config directory and configuration file:

      mkdir -p config
      cp config/example.playlists.toml config/playlists.toml

- Edit the configuration file:

      nano config/playlists.toml

  Add your AcoustID API key and playlist URLs:

      [general]
      acoustid_api_key = "YOUR_ACOUSTID_API_KEY_HERE"
      download_archive = "archive.txt"

      [[sources]]
      url = "https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID"

      [[sources]]
      url = "https://soundcloud.com/artist/track"

- Create the music output directory:

      mkdir -p music

- Build and run with Docker Compose:

      docker-compose up --build

  Or run with Docker directly:

      docker build -t musicd .
      docker run -d \
        -e PUID=1000 \
        -e PGID=1000 \
        -v $(pwd)/music:/music \
        -v $(pwd)/config:/config \
        musicd
Configuration
Environment Variables
- `PUID` - User ID to run as (default: 0)
- `PGID` - Group ID to run as (default: 0)
Volumes
- `/music` - Output directory for downloaded music
- `/config` - Configuration and database directory
  - `playlists.toml` - Playlist configuration
  - `music.db` - SQLite database
  - `archive.txt` - yt-dlp download archive
Config File Format (TOML)
[general]
acoustid_api_key = "YOUR_KEY"
download_archive = "/config/archive.txt"
workers = 3 # Number of concurrent workers (1-5, default: 2)
# Optional: Rate limiting per domain
[rate_limit.youtube]
max_concurrent = 2
min_interval = "2s"
[rate_limit.soundcloud]
max_concurrent = 1
min_interval = "3s"
[rate_limit.generic]
max_concurrent = 2
min_interval = "1s"
[[sources]]
url = "https://youtube.com/playlist?list=XXXX"
[[sources]]
url = "https://soundcloud.com/user/track"
How It Works
- Startup:
  - Load configuration from `/config/playlists.toml`
  - Initialize SQLite database at `/config/music.db`
  - Ensure database schema exists
- Source Scanning:
  - For each configured source URL, run `yt-dlp --flat-playlist --dump-json`
  - Extract video IDs and insert missing ones as `pending` in the database
- Processing Loop:
  - Continuously process pending or failed downloads
  - Never reprocess already processed tracks
  - Sleep 5 minutes between checks
- Download Process:
  - Download audio using `yt-dlp --extract-audio --format bestaudio`
  - Write to temp folder `/tmp/downloads`
  - Use `--download-archive` to prevent duplicates
- Metadata Resolution (in order):
  - Generate fingerprint using `fpcalc`
  - Lookup via AcoustID (requires score > 0.8)
  - Fetch details from MusicBrainz
  - Fallback to embedded tags via ffprobe
  - Final fallback: parse from filename
- File Organization:
  - Sanitize names (remove invalid chars, title case)
  - Store as `/music/{artist}/{album}/{title}.{ext}`
  - Handle duplicates by appending `_(1)`, etc.
  - Atomic move operation
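The ordered metadata fallback described above can be sketched as a chain of resolvers tried in sequence, where the first one that succeeds wins. The types and function names below are hypothetical and the resolvers in `main` are stubs. Only the ordering and the score threshold of 0.8 come from this document.

```go
package main

import (
	"errors"
	"fmt"
)

// Metadata holds the fields used to organize a file.
type Metadata struct {
	Artist, Album, Title string
	Source               string // which resolver produced the result
}

// resolver is one step in the fallback chain (hypothetical type).
type resolver struct {
	name string
	fn   func(path string) (Metadata, error)
}

var errNoMatch = errors.New("no metadata match")

// minAcoustIDScore is the documented acceptance threshold.
const minAcoustIDScore = 0.8

// acceptScore reports whether an AcoustID score is strictly above 0.8.
func acceptScore(score float64) bool { return score > minAcoustIDScore }

// resolveMetadata tries each resolver in order and returns the first
// successful result, tagged with the resolver's name.
func resolveMetadata(path string, chain []resolver) (Metadata, error) {
	for _, r := range chain {
		md, err := r.fn(path)
		if err != nil {
			continue // fall through to the next resolver
		}
		md.Source = r.name
		return md, nil
	}
	return Metadata{}, errNoMatch
}

func main() {
	chain := []resolver{
		{"acoustid", func(string) (Metadata, error) {
			score := 0.55 // stubbed lookup result below the threshold
			if !acceptScore(score) {
				return Metadata{}, errNoMatch
			}
			return Metadata{Artist: "Artist", Title: "Title"}, nil
		}},
		{"ffprobe", func(string) (Metadata, error) {
			return Metadata{}, errNoMatch // stub: no embedded tags
		}},
		{"filename", func(string) (Metadata, error) {
			return Metadata{Artist: "Artist", Title: "Title"}, nil // stub
		}},
	}
	md, err := resolveMetadata("/tmp/downloads/Artist - Title.opus", chain)
	fmt.Println(md.Source, err)
}
```

Keeping the chain as data makes the order explicit and lets a new source be added without touching the loop.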
Logging
The application produces structured CLI logs with worker IDs:
[MAIN][INFO] Starting musicd...
[MAIN][OK] Loaded config with 2 sources
[MAIN][OK] Rate limiter configured: youtube=2/2s, soundcloud=1/3s, generic=2/1s
[W1][INFO] Worker started
[W2][INFO] Worker started
[W3][INFO] Worker started
[W1][INFO] Claimed job: abc123
[W1][INFO] Downloading: Artist - Title
[WAIT] youtube limiter 4.2s
[W1][OK] Metadata resolved via AcoustID
[W1][OK] Stored → /music/Artist/Album/Title.flac
[RATE] youtube backoff increased to 8s
[W1][INFO] [RETRY] network error → retry #2 in 2m
[W2][INFO] [DEAD] unavailable → marked permanent
Format: [WORKER][LEVEL] Message
- WORKER: MAIN (main thread) or W1, W2, W3, etc. (workers)
- LEVEL: INFO, OK, FAIL, WARN
Special Log Prefixes:
- `[WAIT]` - Rate limiter wait time (only if > 3s)
- `[RATE]` - Dynamic backoff adjustment
- `[RETRY]` - Retry scheduled
- `[DEAD]` - Permanent failure
Colors:
- `[INFO]` - Dim gray
- `[OK]` - Green
- `[FAIL]` - Red
- `[WARN]` - Orange
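The log format can be reproduced with a couple of small helpers. This is a sketch with hypothetical function names, not the code in logger.go. Colors are omitted because the exact escape codes are not given here, and mapping worker ID 0 to MAIN is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// workerTag returns "MAIN" for ID 0 and "W<n>" for worker n.
// Using 0 for the main thread is an assumption.
func workerTag(id int) string {
	if id == 0 {
		return "MAIN"
	}
	return fmt.Sprintf("W%d", id)
}

// formatLine renders a line as "[WORKER][LEVEL] Message".
func formatLine(worker, level, msg string) string {
	return fmt.Sprintf("[%s][%s] %s", worker, level, msg)
}

// shouldLogWait reports whether a limiter wait is long enough to log;
// the documented threshold is strictly more than 3 seconds.
func shouldLogWait(d time.Duration) bool {
	return d > 3*time.Second
}

func main() {
	fmt.Println(formatLine(workerTag(0), "INFO", "Starting musicd..."))
	fmt.Println(formatLine(workerTag(1), "OK", "Metadata resolved via AcoustID"))
	if wait := 4200 * time.Millisecond; shouldLogWait(wait) {
		fmt.Printf("[WAIT] youtube limiter %.1fs\n", wait.Seconds())
	}
}
```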
Database Schema
CREATE TABLE downloads (
id INTEGER PRIMARY KEY,
source_url TEXT NOT NULL,
original_id TEXT NOT NULL,
status TEXT NOT NULL, -- pending, downloaded, failed, processed
file_path TEXT,
artist TEXT,
album TEXT,
title TEXT,
fingerprint TEXT,
error TEXT,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
UNIQUE(original_id)
);
Restart Safety
The service is designed to be fully restart-safe:
- Duplicate Prevention: Database uses `original_id` as unique constraint
- yt-dlp Archive: Separate archive file prevents re-downloading even outside DB
- Idempotent: Can be safely restarted at any time
- State Persistence: All state stored in SQLite database
- Atomic Operations: File moves are atomic to prevent corruption
Development
Building Locally
cd musicd
go build -o musicd .
Running Tests
go test ./...
Getting an AcoustID API Key
- Go to https://acoustid.org/
- Register for an account
- Go to "API Keys" section
- Generate a new API key
- Add it to your `playlists.toml`
Troubleshooting
Permission Issues
Make sure the PUID/PGID environment variables match your host user's IDs:
id -u # Get your user ID
id -g # Get your group ID
Then set them in compose.yml or in your docker run command.
Database Locked
If you see "database is locked" errors, ensure only one instance is running.
AcoustID Lookup Failing
- Verify your API key is correct
- Check network connectivity
- Some tracks may not be in the AcoustID database
License
MIT License
Completed Phases
- ✅ Phase 1: Core downloader with resume-safe pipeline
- ✅ Phase 1A: Production hardening & correctness audit
- ✅ Phase 2: Controlled parallel worker pool
- ✅ Phase 3: Advanced retry & failure classification system
- ✅ Phase 4: Domain-aware rate limiter system
Future Phases
- Phase 5: Smarter duplicate detection via fingerprint matching
- Phase 6: Dry-run mode and CLI commands