Pirate activists have scraped a vast amount of metadata from Spotify’s music library as of December 2025, according to a post on the Anna’s Archive blog. The archive is a “shadow library” that mostly preserves text-based media, such as scholarly journals. This time, however, the anonymous individuals behind it found a way to back up the majority of Spotify’s library.
According to the blog post, Spotify hosts around 256 million tracks. The archive contains metadata for 99.9% of them. It also holds 86 million audio files, which account for 99.6% of listens. As of December 21, only the metadata has been released, but there are plans to distribute the roughly 300 terabytes of data in stages.
In a statement via Billboard, Spotify indicated it is investigating the incident. “An investigation into unauthorized access identified that a third party scraped public metadata and used illicit tactics to circumvent DRM [digital rights management] to access some of the platform’s audio files,” said a spokesperson. “We are actively investigating the incident.”
Previously, the largest open music archive was MusicBrainz, containing roughly five million tracks. Anna’s Archive is on track to significantly surpass that if it does actually release the audio files.
Scraping Spotify for a Music Preservation Archive Sounds Good on Paper, but Poses Unique Challenges in the Age of AI
2025 has been the year of the Spotify boycott. More artists have been pulling their music from the platform every day since July. Additionally, several organizations called for protests against the company when Spotify Wrapped dropped in early December. With shady startup investments, the lowest royalty payments out of all platforms, ICE recruitment ads, rising monthly costs, and an inundation of AI slop, the general public is fed up with Spotify.
So, while Anna’s Archive didn’t scrape the entire Spotify music library, the fact that it did this at all puts tech-savvy listeners in a unique position. In theory, these files could be used to build an open-source streaming platform if someone had the right tools (and probably ironclad legal representation).
The blog post describes this as the world’s first completely open preservation archive of music. While it’s not a comprehensive archive of the history of recorded music, it’s still an impressive feat. With enough disk space, the post posits, anyone could mirror the archive themselves.
This collection aims to preserve the world’s musical heritage, which is admirable, given the general commodification of music through streaming and distribution monopolies. But there are also downsides to an open music archive like this. For one, illegally training AI just got way easier. Ethically dubious AI companies have already scraped the world’s most popular songs to train their slop machines. What’s stopping them now? There’s also copyright enforcement to contend with, which is notoriously and almost universally strict. But the cat is already out of the bag, so to speak, and cats tend not to go back in when forced.
The post With Spotify’s Library Plundered, the Door Is Open for Music Preservation, but Also for AI Companies appeared first on VICE.




