Right now AcousticBrainz stores summary information in PostgreSQL as jsonb data. In order to run algorithms on the audio we also need to store frame-level data. This poses several challenges, listed here: http://tickets.musicbrainz.org/browse/AB-101
The idea is to still generate aggregated summary data with Essentia's streaming_extractor_music, but to also generate raw frame-level data and submit the two together. PostgreSQL would still store the JSON summary data, while frame-level data would be stored in a faster and more space-efficient format (think protocol buffers). That data would then be saved to Google's Cloud Bigtable storage so it can be queried later for analysis. Bigtable is a key/value storage engine capable of supporting huge numbers of rows -> "These tables will grow at the rate of approximately 2 billion rows per day, which Cloud Bigtable can handle without difficulty." For comparison, we currently have data for roughly 3.5 million songs.
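To get a feel for the space savings, here is a minimal Python sketch comparing JSON against packed 32-bit floats, which is roughly what a protobuf repeated float field costs on the wire. The frame shape (13 MFCC coefficients per frame) is just an assumption for illustration:

```python
import array
import json
import random

# Hypothetical frame-level feature: 1,000 frames of 13 MFCC coefficients.
frames = [[random.random() for _ in range(13)] for _ in range(1000)]

# Size as JSON, the format we currently store in PostgreSQL.
json_size = len(json.dumps(frames).encode("utf-8"))

# Size as packed 32-bit floats, roughly what a protobuf repeated float
# field would take (protobuf adds only a small amount of framing overhead).
packed = array.array("f", (v for frame in frames for v in frame)).tobytes()

print(json_size, len(packed))  # the JSON encoding is several times larger
```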
Here is some info from their docs page:
- Each table has only one index, the row key. There are no secondary indices.
- Rows are sorted lexicographically by row key, from the lowest to the highest byte string.
- In general, keep all information for an entity in a single row. An entity that doesn't need atomic updates and reads can be split across multiple rows. Splitting across multiple rows is recommended if the entity data is large (hundreds of MB).
The lexicographically sorted keys and the ability to index a huge number of rows make Bigtable well suited to storing audio frame data. A similar example is given for server metrics data: https://cloud.google.com/bigtable/docs/schema-design-time-series#server_metrics
We can follow the guidelines for server metrics data when designing the Bigtable schema for AcousticBrainz data.
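By analogy with the metric#machine#timestamp keys from the server metrics example, here is a hedged Python sketch of one possible row key layout. The components (recording MBID, submission offset, starting frame index) and their zero-padded widths are assumptions to be discussed, not a final design:

```python
def frame_row_key(mbid: str, offset: int, frame_start: int) -> bytes:
    """Build a Bigtable row key for a block of frames.

    Zero-padding the numeric components makes lexicographic byte order
    match numeric order, which is what Bigtable sorts by. A prefix scan
    on the MBID then returns all frame blocks for a submission in order.
    """
    return f"{mbid}#{offset:03d}#{frame_start:012d}".encode("ascii")

# Example: the block starting at frame 0 of a recording's second submission.
key = frame_row_key("8f3471b5-7e6a-48da-86a9-c1c07a0f47ae", 1, 0)
# b'8f3471b5-7e6a-48da-86a9-c1c07a0f47ae#001#000000000000'
```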
Before we can accomplish this we need support for frame-level data in streaming_extractor_music. I'm under the impression that the tool needs more work before a new version can be released, so for this to be ready by the end of the summer we need coordination between the Essentia and AcousticBrainz web service developers. Designing the Bigtable schema and setting it up doesn't have to wait for the tool, though.
So the flow would go like this:
- Generate summary and frame-level JSON data with streaming_extractor_music
- Submit the data to the webservice
- Store the summary data in PostgreSQL, and the frame-level data in protobuf format both in local storage and in Bigtable
(It's not yet clear when the data should actually be converted to protobuf: maybe in streaming_extractor_music itself, maybe after upload. This needs to be brainstormed.)
- Efficiently fetch audio data frames from Bigtable (see the sketch after this list)
- Do machine-learning magic
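To make the fetch step concrete, here is a minimal sketch using the google-cloud-bigtable Python client. The project, instance, and table names, the column family layout, and the row key format from the earlier sketch are all assumptions; none of this infrastructure exists yet:

```python
from google.cloud import bigtable
from google.cloud.bigtable.row_set import RowSet

# Hypothetical names for the project, instance, and table.
client = bigtable.Client(project="acousticbrainz")
table = client.instance("ab-frames").table("frame_data")

# Fetch every frame block for one submission: because rows are sorted
# lexicographically, a key-range scan over the mbid#offset prefix
# streams the blocks back in frame order.
mbid, offset = "8f3471b5-7e6a-48da-86a9-c1c07a0f47ae", 1
prefix = f"{mbid}#{offset:03d}#".encode("ascii")

row_set = RowSet()
row_set.add_row_range_from_keys(
    start_key=prefix, end_key=prefix + b"\xff", end_inclusive=True
)

for row in table.read_rows(row_set=row_set):
    # Assumed layout: one serialized protobuf blob per row, stored in
    # column family "frames" under qualifier "data".
    blob = row.cells["frames"][b"data"][0].value
    # ... deserialize the blob and feed the frames to the ML pipeline
```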
Think of this as a working draft of a GSoC proposal, assuming everyone is on board with the idea.
Any input, or help expanding this idea further, would be appreciated.