Activists Downloaded 86 Million Audio Files from Spotify and Plan to Make Them Publicly Available

Pirate activists from Anna's Archive say they have scraped almost the entire music library of Spotify, the world's largest streaming service. The group claims to have collected metadata for 256 million tracks and downloaded 86 million audio files, totaling approximately 300 TB of data.

Anna's Archive is a metasearch engine for shadow libraries, launched in 2022 by an anonymous activist using the pseudonym Anna. The project emerged shortly after law enforcement attempted to shut down Z-Library. Anna's Archive aggregates content from Z-Library, Sci-Hub, Library Genesis (LibGen), Internet Archive, and other sources. The activists describe their work as "preserving humanity's knowledge and culture."

Members of Anna's Archive announced the creation of what they call the first-ever "preservation archive" for music. Per the activists, they recently discovered a method to mass-scrape Spotify and decided to use this capability to archive content.

"Some time ago, we found a way to scrape data from Spotify on a massive scale. We saw this as an opportunity to create a music archive focused primarily on content preservation," the group states in its blog. "Of course, Spotify doesn't have all the music in the world, but it's a great start."

Addressing Archive Shortcomings

The activists argue that all existing music collections, both physical and digital, have serious shortcomings: they focus primarily on popular artists, chase maximum sound quality (such as lossless FLAC), which inflates file sizes, and lack a centralized list of torrents. Anna's Archive designed this project to address these issues.

Anna's Archive typically focuses on books and scientific articles because text has the highest information density. However, the group's mission—preserving humanity's knowledge and culture—makes no distinction between media types. "Sometimes an opportunity arises to preserve non-text content. This is precisely such a case," the activists note.

The Scale of Data Collection

The resulting metadata dump contains information about 99.9% of all tracks on the platform—approximately 256 million songs. This makes it the largest publicly available music metadata database in the world. For comparison, competitors have between 50 and 150 million records, while MusicBrainz has only 5 million unique ISRC codes compared to Anna's Archive's 186 million.

The activists didn't stop at metadata: they also archived audio files for 86 million tracks. Although this represents only 37% of the songs available on Spotify, these tracks account for 99.6% of all plays on the platform. In other words, any given play on Spotify has a 99.6% chance of being a track that is in the archive.
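The gap between the share of tracks and the share of plays follows from how unevenly listening is distributed. The sketch below illustrates the arithmetic on a made-up ten-track catalog; the track IDs, play counts, and the `play_coverage` helper are hypothetical and are not taken from the Anna's Archive dataset.

```python
def play_coverage(play_counts, archived_ids):
    """Return (share of tracks archived, share of plays archived)."""
    total_tracks = len(play_counts)
    total_plays = sum(play_counts.values())
    archived = [track for track in play_counts if track in archived_ids]
    archived_plays = sum(play_counts[track] for track in archived)
    return len(archived) / total_tracks, archived_plays / total_plays


# Hypothetical catalog: a few hits, a long tail of rarely played tracks.
play_counts = {
    "a": 9000, "b": 900, "c": 60, "d": 30, "e": 10,
    "f": 0, "g": 0, "h": 0, "i": 0, "j": 0,
}
archived_ids = {"a", "b", "c"}  # only the three most played tracks

track_share, play_share = play_coverage(play_counts, archived_ids)
print(f"tracks archived: {track_share:.1%}, plays covered: {play_share:.1%}")
```

In this toy example, archiving 3 of 10 tracks covers 9,960 of 10,000 plays, which shows how a minority of a catalog can account for nearly all listening.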

Technical Approach

When prioritizing tracks, the group used Spotify's popularity metric, a numerical value from 0 to 100 calculated from play count and recency. Tracks with a popularity score above zero were preserved in the original Ogg Vorbis 160 kbps quality. Tracks with a score of zero were re-encoded to Ogg Opus at 75 kbps. The activists note that the difference will be imperceptible to most listeners while saving storage space.
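The popularity rule above can be expressed as a small decision function. This is a minimal sketch under stated assumptions, not the group's actual tooling: the `Track` class, the `storage_format` and `release_order` helpers, and the sample track IDs are all hypothetical.

```python
from dataclasses import dataclass

ORIGINAL = "Ogg Vorbis 160 kbps (original file, not re-encoded)"
REENCODED = "Ogg Opus 75 kbps (re-encoded)"


@dataclass(frozen=True)
class Track:
    track_id: str    # hypothetical identifier
    popularity: int  # Spotify popularity score, 0 to 100


def storage_format(track):
    """Pick the stored format from the popularity score, as the article describes."""
    if not 0 <= track.popularity <= 100:
        raise ValueError(f"popularity out of range: {track.popularity}")
    return ORIGINAL if track.popularity > 0 else REENCODED


def release_order(tracks):
    """Order tracks from most to least popular (ties keep their input order)."""
    return sorted(tracks, key=lambda t: t.popularity, reverse=True)


tracks = [Track("t1", 0), Track("t2", 87), Track("t3", 1)]
for track in release_order(tracks):
    print(track.track_id, track.popularity, storage_format(track))
```

Keeping the decision in one function makes the threshold easy to see: any score above zero keeps the original file, and only zero-popularity tracks are re-encoded.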

The entire archive is planned for distribution via torrents in the Anna's Archive Containers (AAC) format, the group's own standard for file distribution. The release is split into several stages: all collected metadata has already been published, and the group plans to follow it with the tracks themselves (in order of popularity, from most to least popular), additional metadata, album covers, and patches to restore the original files.

The activists added extensive metadata to each file, including track title, URL, ISRC code, UPC, album cover, loudness data (ReplayGain), and other information. The original Spotify files did not contain metadata, so the group embedded it into the Ogg files without re-encoding the audio.
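The article does not say which tagging scheme the group used. Ogg Vorbis and Ogg Opus files conventionally carry tags as Vorbis comments, a list of length-prefixed `NAME=value` strings stored in a header packet separate from the audio packets, which is why tags can be added without re-encoding. The sketch below serializes such a comment list; the `vorbis_comment_list` helper, the vendor string, and the field names and values are illustrative assumptions, and the codec-specific packet header and Ogg page framing are deliberately left out.

```python
import struct


def vorbis_comment_list(vendor, tags):
    """Serialize tags as a Vorbis comment list (vendor string plus NAME=value entries)."""

    def length_prefixed(text):
        data = text.encode("utf-8")
        # Each string is preceded by its byte length as a 32-bit little-endian integer.
        return struct.pack("<I", len(data)) + data

    entries = []
    for name, value in tags.items():
        # Field names are limited to ASCII 0x20..0x7D and may not contain '='.
        if not name or any(not 0x20 <= ord(ch) <= 0x7D or ch == "=" for ch in name):
            raise ValueError(f"invalid field name: {name!r}")
        entries.append(length_prefixed(f"{name}={value}"))

    return length_prefixed(vendor) + struct.pack("<I", len(entries)) + b"".join(entries)


# Hypothetical tag values, for illustration only.
payload = vorbis_comment_list(
    "example-tagger",
    {"TITLE": "Example Song", "ISRC": "XX0000000000", "REPLAYGAIN_TRACK_GAIN": "-6.50 dB"},
)
print(len(payload), "bytes of comment data")
```

In practice a tagging library would handle the surrounding packet and page structure; the point here is only that the tags are plain data sitting alongside the audio stream.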

Spotify's Response

Spotify representatives confirmed the data leak is real. The company emphasized it has already identified and blocked accounts engaged in illegal scraping and implemented new protective measures to prevent similar attacks in the future.

"Spotify identified and blocked fraudulent accounts that were used for illegal scraping. We have implemented new protective measures against such attacks and are actively monitoring suspicious activity. From day one, we have stood on the side of artists in the fight against piracy and are now actively collaborating with industry partners to protect content creators and defend their rights," said Spotify spokesperson Laura Batey.

For now, the archive is focused exclusively on content preservation and is only accessible via torrents. However, the group acknowledges that with sufficient interest, they may add the ability to download individual files directly through their website.