ListenBrainz: A submission API compatible with Last.fm scrobblers

armalcolite · March 10, 2016, 9:46pm

(This idea is an extension of one of the ideas mentioned on the ideas page)

Nickname: armalcolite
IRC nick: armalcolite
Time Zone: UTC/GMT +5:30 hours
Email: pinkeshbadjatiya[at]gmail.com
GitHub: pinkeshbadjatiya (Pinkesh Badjatiya)
Recent Activity in LB: Fixes for LB-13, LB-52, LB-66, LB-68 and other bugs (Pull Requests)

Project Details

Brief Overview

ListenBrainz is in its infancy and needs a good amount of work to make it easy to use. The current way for users to submit their listens relies on web scraping (this will move to the API if my PR gets merged, hopefully), which is not reliable.

There are a lot of applications and open-source projects that can scrobble to Last.fm. To make them able to submit to LB, an API compatible with Last.fm's would be a good start. It would be an addition to the native LB API, giving it an additional upper hand.

Goals

A submission API compatible with Last.fm scrobblers
Right now ListenBrainz has its own API, documented in the ListenBrainz 0.1.0 documentation. The idea is to create a new web service, layered on top of that one, which speaks the Last.fm API so that it can be used as a proxy by existing Last.fm clients.

It would be a REST API similar to Last.fm's, exposing a minimal but exact set of functionality.

Test suite for the New API
It would include tests written with Python's unittest module to ensure the API does not break at any stage of development.
Documentation for New API
Documentation of how the API works, for ease of access and reference.
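A minimal sketch of what such a test could look like with Python's standard unittest module. The validator `validate_listen` and the rules it checks are hypothetical placeholders; the real suite would exercise the compat routes (for example through Flask's test client) rather than a standalone function.

```python
import unittest


def validate_listen(listen):
    """Hypothetical validator: return a list of problems found in one native LB listen."""
    metadata = listen.get("track_metadata")
    if not isinstance(metadata, dict):
        return ["track_metadata missing"]
    problems = []
    for field in ("artist_name", "track_name"):
        if not metadata.get(field):
            problems.append(field + " missing")
    ts = listen.get("listened_at")
    if ts is not None and (not isinstance(ts, int) or ts < 0):
        problems.append("listened_at must be a non-negative integer")
    return problems


class ValidateListenTest(unittest.TestCase):
    def test_valid_listen(self):
        listen = {"listened_at": 1443521965,
                  "track_metadata": {"artist_name": "Rick Astley",
                                     "track_name": "Never Gonna Give You Up"}}
        self.assertEqual(validate_listen(listen), [])

    def test_missing_track_name(self):
        listen = {"track_metadata": {"artist_name": "Rick Astley"}}
        self.assertIn("track_name missing", validate_listen(listen))

    def test_bad_timestamp(self):
        listen = {"listened_at": "yesterday",
                  "track_metadata": {"artist_name": "A", "track_name": "B"}}
        self.assertEqual(len(validate_listen(listen)), 1)
```

Saved as a module, it can be run with `python -m unittest <module name>`.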
Optional ideas
Export LB listens
Exporting one's personal listen history would be a good addition to the system. The format could be similar to the one the old version of Last.fm provided, or a new custom one; this requires consent from the mentors.
Import lastfm backup
The previous version of Last.fm allowed users to create a downloadable backup of their listens, so it would be a nice idea to let users import that previous listen data as well.
Test suite for native API
My understanding of the native API has grown much better with my recent patches to scraper.js (porting it to use the native API) and the export feature in LB. My idea is to create a test suite for the existing native API.
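To make the "new custom format" option for exporting listens concrete, here is a sketch of one possible export: a single JSON document holding the user's listens in the native LB listen shape, newest first. The function name `export_listens` and the document layout (`user_name`, `listen_count`, `listens`) are assumptions for discussion, not an agreed format.

```python
import json


def export_listens(user_name, listens):
    """Serialise a user's listens to a JSON string (hypothetical export format)."""
    document = {
        "user_name": user_name,
        "listen_count": len(listens),
        # Newest first, mirroring how a listen history is usually browsed.
        "listens": sorted(listens, key=lambda l: l["listened_at"], reverse=True),
    }
    return json.dumps(document, indent=2, ensure_ascii=False)
```

Keeping each listen in the same shape the native API accepts would let an exported file be re-imported later through the `import` listen type.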

Implementation Details

The current native API accepts a JSON payload in the following format. The listen type (listen_type) can be one of single, playing_now or import.

```
{
  "listen_type": "single",
  "payload": [
    {
      "listened_at": 1443521965,
      "track_metadata": {
        "additional_info": {
          "release_mbid": "bf9e91ea-8029-4a04-a26a-224e00a83266",
          "artist_mbids": [
            "db92a151-1ac2-438b-bc43-b82e149ddd50"
          ],
          "recording_mbid": "98255a8c-017a-4bc7-8dd6-1fa36124572b",
          "tags": [ "you", "just", "got", "rick rolled!"]
        },
        "artist_name": "Rick Astley",
        "track_name": "Never Gonna Give You Up",
        "release_name": "Whenever you need somebody"
      }
    }
  ]
}
```
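For reference, here is a sketch of how a client (or the compat layer) might wrap such listens into a native submission. It assumes the endpoint `https://api.listenbrainz.org/1/submit-listens` and an `Authorization: Token <user token>` header. It also encodes our reading of the native docs that `single` and `playing_now` submissions carry exactly one listen. All of these should be checked against the current LB documentation. `build_submit_request` is a hypothetical helper that only builds the request; `urllib.request.urlopen(req)` would send it.

```python
import json
import urllib.request

# Assumed endpoint; verify against the ListenBrainz API documentation.
SUBMIT_URL = "https://api.listenbrainz.org/1/submit-listens"


def build_submit_request(user_token, listens, listen_type="single"):
    """Build (but do not send) a native LB submission request."""
    if listen_type not in ("single", "playing_now", "import"):
        raise ValueError("unknown listen_type: " + listen_type)
    if listen_type != "import" and len(listens) != 1:
        raise ValueError(listen_type + " submissions carry exactly one listen")
    body = json.dumps({"listen_type": listen_type, "payload": listens}).encode("utf-8")
    return urllib.request.Request(
        SUBMIT_URL,
        data=body,
        headers={"Authorization": "Token " + user_token,
                 "Content-Type": "application/json"},
        method="POST",
    )
```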

The native API's routes are shown in Fig. 1.
Fig. 1: The current native API interaction to submit a listen received from a POST request.

The new addition to the native API would be a new module, api_compat.py, rooted at http://api.listenbrainz.org/1.0/compat/ and implemented as a Flask blueprint:

```
from flask import Blueprint
api_compat_bp = Blueprint('api_compat', __name__)
```

This would keep the existing systems intact and would not conflict with the existing implementation details.
The design of the new module would be as follows.

Fig. 2: The new addition to the native API in the LB system, and the new routes that need to be created.
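Last.fm API 2.0 clients send every call to a single root URL and name the call in a `method` parameter, so one way to structure the compat routes is a dispatch table keyed on that parameter. The handler names below are hypothetical stubs. Lowercasing the method name is a design choice of this sketch, not something taken from the Last.fm docs. Unknown methods get Last.fm error code 3 (Invalid Method).

```python
INVALID_METHOD = 3  # Last.fm error code 3: Invalid Method


def handle_scrobble(params):
    # Hypothetical stub: would translate params into native listens and submit them.
    return "scrobble"


def handle_now_playing(params):
    # Hypothetical stub: would submit a playing_now listen.
    return "now playing"


# Keys are lowercased so "track.updateNowPlaying" and "track.updatenowplaying" match.
HANDLERS = {
    "track.scrobble": handle_scrobble,
    "track.updatenowplaying": handle_now_playing,
}


def dispatch(params):
    """Route one compat API call to its handler based on the 'method' parameter."""
    handler = HANDLERS.get(params.get("method", "").lower())
    if handler is None:
        return {"error": INVALID_METHOD, "message": "Invalid Method"}
    return handler(params)
```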

The new API module would process the input, validate it, and then submit the appropriate messy_dict to external/messybrainz.py for further processing. It would use the existing infrastructure as its base and provide users with a completely new API (new as far as LB is concerned).
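This sketch of the translation step assumes the form parameters documented for track.scrobble: `artist[i]`, `track[i]`, `timestamp[i]`, optional `album[i]` and `mbid[i]`, with up to 50 scrobbles per batch. Two further choices are assumptions of this sketch: mapping Last.fm's track `mbid` to `recording_mbid`, and accepting un-indexed parameters for a single scrobble. `scrobbles_to_listens` is a hypothetical name. It produces the native listen format shown earlier rather than a messy_dict.

```python
import re

MAX_BATCH = 50  # Last.fm allows up to 50 scrobbles in one track.scrobble call
# Matches "artist[3]" as well as a bare "artist" (treated as index 0).
_FIELD = re.compile(r"^(artist|track|timestamp|album|mbid)(?:\[(\d+)\])?$")


def scrobbles_to_listens(params):
    """Turn flat track.scrobble form parameters into native LB listens."""
    by_index = {}
    for key, value in params.items():
        match = _FIELD.match(key)
        if match:
            index = int(match.group(2) or 0)
            by_index.setdefault(index, {})[match.group(1)] = value
    if not by_index:
        raise ValueError("no scrobbles in request")
    if len(by_index) > MAX_BATCH:
        raise ValueError("too many scrobbles in one batch")
    listens = []
    for index in sorted(by_index):
        s = by_index[index]
        if not all(k in s for k in ("artist", "track", "timestamp")):
            raise ValueError("scrobble %d is missing a required field" % index)
        metadata = {"artist_name": s["artist"], "track_name": s["track"]}
        if s.get("album"):
            metadata["release_name"] = s["album"]
        if s.get("mbid"):
            metadata["additional_info"] = {"recording_mbid": s["mbid"]}
        listens.append({"listened_at": int(s["timestamp"]), "track_metadata": metadata})
    return listens
```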

Using api_compat (the name is a placeholder) will require a token to prevent misuse of the service. Since the token is already provided to the user, the same token can be used for all write API calls. GET calls can still be made without a token.

I will implement at least two types of auth: desktop auth for sure, and either web auth or mobile auth (I am biased towards web auth). If time permits, I will implement all three modes of authentication.
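All three Last.fm auth flows (desktop, web and mobile) sign write calls with an `api_sig` parameter. This sketch follows the scheme in the Last.fm auth docs: sort parameters by name, concatenate each name with its value, append the shared secret, and take the MD5 hex digest, leaving out `format` and `callback`. How LB would issue API keys and shared secrets to clients is still open, so `shared_secret` is a placeholder. Both function names are hypothetical.

```python
import hashlib
import hmac


def lastfm_api_signature(params, shared_secret):
    """Compute a Last.fm-style api_sig for a set of call parameters."""
    # Sort by parameter name and join as <name><value>; format, callback
    # and the signature itself are not part of what gets signed.
    pieces = [name + str(value)
              for name, value in sorted(params.items())
              if name not in ("format", "callback", "api_sig")]
    return hashlib.md5(("".join(pieces) + shared_secret).encode("utf-8")).hexdigest()


def signature_is_valid(params, shared_secret):
    """Check the api_sig a client sent against the one we compute."""
    expected = lastfm_api_signature(params, shared_secret)
    supplied = str(params.get("api_sig", ""))
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8"))
```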

Part of the compatible API is already implemented in Jonty/scrobbleproxy on GitHub ("A Last.fm API 2.0 implementation that you do not want to use yet").
I can reuse its modules so that we don't duplicate work. This would give me more time to work on integrating it with LB.

Methods to implement:

Scrobble Types supported:

Scrobble now listening
Scrobble tracks

The following are the methods I will implement, in descending order of priority:

Method	Description	Documentation
track.scrobble	Scrobble tracks to listenbrainz.org in single or batch mode	http://www.last.fm/api/show/track.scrobble
track.updateNowPlaying	Update the now playing details to LB	http://www.last.fm/api/show/track.updateNowPlaying
XMLtoJSON	Convert a given XML response to JSON using the specified rules.	(Explained below)
track.getInfo	Get full metadata about a particular track using its MBID or the artist and track name.	http://www.last.fm/api/show/track.getInfo
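track.updateNowPlaying maps naturally onto the native `playing_now` listen type. This minimal sketch assumes the documented `artist`, `track` and optional `album` parameters. The function name is hypothetical, and it returns the native submission body rather than sending it. As we read the native docs, a playing_now listen carries no `listened_at`.

```python
def now_playing_to_submission(params):
    """Map track.updateNowPlaying parameters to a native playing_now submission body."""
    if not params.get("artist") or not params.get("track"):
        raise ValueError("artist and track are required")
    metadata = {"artist_name": params["artist"], "track_name": params["track"]}
    if params.get("album"):
        metadata["release_name"] = params["album"]
    # No listened_at: the track is still playing.
    return {"listen_type": "playing_now", "payload": [{"track_metadata": metadata}]}
```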

For each method, the Last.fm documentation describes the sample response type and the required arguments in more detail.
Appropriate verification and validation of the received inputs will be implemented to ensure the security of the system.

If time permits, I will work on the optional ideas and implement them after seeking approval from the mentors.

Error codes

The error codes also need to be consistent across the API. The full list of error codes to use is documented at http://www.last.fm/api/errorcodes
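This sketch shows a shared error helper, so that every method reports errors the same way in both output formats. The codes are a subset taken from the Last.fm error-code list. The short message strings and the XML shape (an `<lfm status="failed">` element wrapping an `<error code="N">` element) reflect our understanding of Last.fm's responses and should be checked against the docs before use. `error_response` is a hypothetical name.

```python
import json
from xml.sax.saxutils import escape

# Subset of Last.fm error codes; verify against http://www.last.fm/api/errorcodes
ERRORS = {
    3: "Invalid Method",
    4: "Authentication Failed",
    6: "Invalid parameters",
    9: "Invalid session key",
    10: "Invalid API Key",
    13: "Invalid method signature supplied",
    29: "Rate Limit Exceeded",
}


def error_response(code, fmt="xml"):
    """Render a Last.fm-style error body as JSON or XML (XML by default)."""
    message = ERRORS[code]
    if fmt == "json":
        return json.dumps({"error": code, "message": message})
    return ('<?xml version="1.0" encoding="utf-8"?>\n'
            '<lfm status="failed"><error code="%d">%s</error></lfm>'
            % (code, escape(message)))
```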

More Details

Since Last.fm uses both JSON and XML responses, the compat API must also support XML responses in addition to JSON to stay compatible and consistent with the rest of the Last.fm web services. Example XML response:

```
  <similartracks track="Believe" artist="Cher">
    <track>
      <name>Ray of Light</name>
      <mbid/>
      <match>10.95</match>
      <url>http://www.last.fm/music/Madonna/_/Ray+of+Light</url>
      <streamable fulltrack="0">1</streamable>
      <artist>
        <name>Madonna</name>
        <mbid>79239441-bfd5-4981-a70c-55c3f15c1287</mbid>
        <url>http://www.last.fm/music/Madonna</url>
      </artist>
      <image size="small">http://cdn.last.fm/coverart/50x50/1934.jpg</image>
      <image size="medium">http://cdn.last.fm/coverart/130x130/1934.jpg</image>
      <image size="large">http://cdn.last.fm/coverart/130x130/1934.jpg</image>
    </track>
    ...
  </similartracks>
```

The JSON format would be derived from the XML using the following rules, as stated in the Last.fm documentation:
- Attributes are expressed as string member values with the attribute name as key.
- Element child nodes are expressed as object member values with the node name as key.
- Text child nodes are expressed as string values, unless the element also contains attributes, in which case the text node is expressed as a string member value with the key #text.
- Repeated child nodes are grouped as an array member with the shared node name as key.
JSON errors are simpler and do not follow the above rules. For example:
```
{
    "error": 10,
    "message": "Invalid API Key"
}
```
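This sketch of the XMLtoJSON method applies the four rules above using Python's standard xml.etree.ElementTree. The sample response exposes one caveat: `<similartracks>` has a `track` attribute and `<track>` children, which collide under these rules. This sketch lets the child elements win, which is an arbitrary choice to revisit. Whitespace-only text is ignored, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_to_json(elem):
    """Convert one element following the Last.fm XML-to-JSON rules."""
    text = (elem.text or "").strip()
    # Text-only (or empty) element without attributes: a plain string value.
    if not elem.attrib and len(elem) == 0:
        return text
    # Attributes become string members keyed by attribute name.
    obj = dict(elem.attrib)
    # Remaining text (element has attributes or children) goes under "#text".
    if text:
        obj["#text"] = text
    # Group children by tag so repeated nodes become arrays.
    grouped = {}
    for child in elem:
        grouped.setdefault(child.tag, []).append(element_to_json(child))
    for tag, values in grouped.items():
        # On a name clash with an attribute, the child element wins.
        obj[tag] = values if len(values) > 1 else values[0]
    return obj


def xml_to_json(xml_string):
    """Convert a whole XML response, keyed by its root element name."""
    root = ET.fromstring(xml_string)
    return {root.tag: element_to_json(root)}
```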
All write requests will be POST, with form-urlencoded (UTF-8) parameters in the body of the request.
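In the Flask blueprint, `request.form` would normally do this decoding. The sketch below just spells out what the rule means using the standard library: decode the UTF-8 body and keep blank values. As a choice of this sketch, it rejects repeated parameter names rather than silently keeping one. `parse_form_body` is a hypothetical name.

```python
from urllib.parse import parse_qsl


def parse_form_body(body):
    """Decode a UTF-8, form-urlencoded POST body (bytes) into a flat dict."""
    params = {}
    for name, value in parse_qsl(body.decode("utf-8"), keep_blank_values=True):
        if name in params:
            raise ValueError("duplicate parameter: " + name)
        params[name] = value
    return params
```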

TimeLine

Community Bonding Period (April-22 – May-22)

Study the ListenBrainz codebase. Understand the interaction between the listenstore, Cassandra and Kafka, and get an understanding of how data is inserted into PostgreSQL and Cassandra.

Try to precisely define everything I have to code during the coding period. (I plan on submitting this before the coding period starts. This way my mentor and I will be on the same page in terms of design choices, which will help eliminate bad design choices early, before it is too late.)

List the exact methods to implement in the coming period.
Solve some more random bugs. This will help increase interaction with the mentors and the people associated with ListenBrainz, and also increase my knowledge of Cassandra and LB.

Quarter Term (May-19 – June 9)

Start coding. By the end of this period I want to have a simple version of the API working and be able to run it live (though it may miss some features).

Mid Term (June-9 – June-28)

Implement all the missing features. Write tests to check all the implemented features. 1st review from mentors.

Three Quarter Term (June-28 – July-21)

Start posting the proof-of-work application. Write more tests to ensure the API works properly. Finish the main API and start working on more features; maybe brainstorm on the optional ideas.

Pencils Down Date (July-21 – August-5)

Complete the coding part. Recheck the code and re-factor (if needed). Check for bugs. Write documentation.

Final Submission (August-16 – August-24)

Begin submitting the final code after the long coding period. A big relief!

After Google Summer of Code (August-24 – )

Stay associated with MetaBrainz. Work on other bugs. Pick up other major projects, like Picard. Be an active member of the community.

Week-by-week distribution of work

For the coding period (May-19 to August-24; 13 weeks), I have defined a more granular timeline, which will help me set short-term goals precisely, and see to it that I complete my work on time.

Week	Work
1	Set up working environment and start the initial setup and planning.
2	Read the lastfm documentation and get a clear understanding of its working.
3	Start coding. Implement the main methods, track.scrobble and track.updateNowPlaying
4	Complete the API token authentication. Continue coding. Check that it works, make some tweaks and continue coding.
5	Implement missing features (extend the API to include more features if possible). Track down and eliminate bugs. Start mild testing alongside coding.
6	Mid-term submission. Continue the previous week's work.
7	Consider the mentor's review, make the essential changes, do tweaks as required and continue coding.
8	Complete the basic API and test that it works. Start planning for the next week. If I am in a good position, start on some optional tasks (after consulting with my mentor).
9	CUSHION_WEEK - If I am lagging behind, catch up with the timeline. If not, try implementing the optional features.
10	Complete working on the API. Look for the bugs, and resolve them.
11	Start testing API. Do some robust testing. Optimise the API. Handle bugs, if any. Work on documentation and write test suite.
12	CUSHION_WEEK - To handle delays (if any) or work on optional features.
13	Wrap up. Do some final edits. Prepare for Final Submission

Expectations from Mentors

ListenBrainz, with its fairly complex codebase and use of multiple tools (like Cassandra, Kafka, MessyBrainz, ListenBrainz, etc.), is a unique challenge.
The mentors have been very supportive so far. I hope this continues.

I also hope my mentors will pull me back if I'm getting a little slow or straying from the goal. I believe code reading is a very good practice to avoid bugs, so it would be wonderful if my mentor could regularly look at the code I commit. Regular interaction also increases the chances of success, so I would like to bug my mentor regularly! And surely I am looking forward to contributing to AcousticBrainz as well.

About Me.

I am a second-year undergraduate student studying Computer Science and Engineering at the International Institute of Information Technology, Hyderabad, India. I started working on ListenBrainz a few days ago and have enjoyed it since. Cassandra and Kafka were completely new to me, but my interest helped me through the learning process.

This will be my first major open-source project, but I have sufficient programming experience.

Q. Tell us about the computer(s) you have available for working on your SoC project!
A. I will be using a laptop with a 3rd-gen Intel i3 and 4GB RAM.

Q. When did you first start programming?
A. My first encounter with Linux and bash scripting happened when I was in 10th grade, but my actual coding experience started in my first year at college.

Q. What aspects of the project you’re applying for (e.g., MusicBrainz, AcousticBrainz, etc.) interest you the most?
A. (ListenBrainz) The idea that the raw data being collected can be used to predict the listening patterns and habits of users. I am looking forward to contributing to this project, which is of keen interest to me.

Q. Have you ever used MusicBrainz to tag your files?
A. Yeah !! And it was nice.

Q. Have you contributed to other Open Source projects? If so, which projects and can we see some of your code? If you have not contributed to open source projects, do you have other code we can look at?
A. I don't have much history of open-source contributions, but in recent months I have started contributing to the Linux kernel and occasionally try to fix bugs. Apart from this, I do a lot of projects in my free time; most of the code is hosted on GitHub:
https://github.com/pinkeshbadjatiya.

Q. What sorts of programming projects have you done on your own time?
A. I believe in learning and exploring, so I spend most of my free time on a variety of interesting projects, such as website development, AI bots for Ultimate Tic-Tac-Toe, 2D/3D games, issue trackers, a scheduler for the xv6 operating system, a C shell, etc. Details of all my projects are in my CV.

Q. How much time do you have available, and how would you plan to use it?
A. 36 hours per week. (~6 hours per day, with one-day off)

Q. Do you plan to have a job or study during the summer in conjunction with Summer of Code?
A. No. I will treat Summer of Code as a full-time job for the whole summer. I will submit a full timeline of my work distribution for the entire summer period.

This is a draft application. Any suggestions/changes/corrections are appreciated.

Mineo · March 10, 2016, 10:25pm

The last.fm api has 3 different options for obtaining the session key that’s required for write API calls like track.scrobble. Which one of those will be implemented for this?

armalcolite · March 10, 2016, 10:32pm

LB already provides the authentication token (user token) on the import page. My idea is to reuse that token instead of generating a new one. If that does not suffice, then the method Last.fm uses to authenticate a "Desktop Application" is the one I am considering. Any suggestions regarding this are welcome.
Doc: http://www.last.fm/api/desktopauth

Freso · March 11, 2016, 12:16am

I think it’s fine to implement these methods in a Last.FM compatibility API layer, but for anything “new” we create, we should rely on our native ListenBrainz API. The Last.FM API layer should probably “just” be translating requests to LB API and be provided as a convenience to users using “old” software. We still want to promote using the LB API for new implementations as that allows us to get a lot more data from the data people send us.

armalcolite · March 11, 2016, 9:31am

Absolutely true.
These methods are only meant to ensure compatibility with existing applications that still rely on these Last.fm methods to display leaderboards. Extending the native API of LB is the main thought.

Freso · March 11, 2016, 9:43am

You will probably want to do that, then. If you’re going to be working with stuff using MBIDs, like ListenBrainz, you will want to get an understanding of the entity types, what information is or can be attached to them, how they relate/can relate to each other, etc. So get started:
https://wiki.musicbrainz.org/How_to_Contribute#Data

armalcolite · March 11, 2016, 9:45am

Seems I will have to start working on Windows. I am a Linux enthusiast, and the only time I start Windows is to play games or when I am too lazy to select Ubuntu from the dual-boot menu.

Freso · March 11, 2016, 10:00am

Why? Picard runs fine on Linux and is packaged for most distributions…

Deleted_Editor_1240840 · March 11, 2016, 10:06am

Can you link to that pull request, please?

While it would be nice to have these features, this goal seems to be quite different from the main one, which I assume is to work on the API. Can you come up with a schedule, as the Development/Summer of Code/Application Template page on the MusicBrainz Wiki mentions?

I see two problems with this:

Our “first” API is still unfinished, as far as I know. It doesn’t seem like a good idea to create another version.
This is not actually supposed to be v2, just an additional compatibility layer on top of what we already have.
Perhaps http://api.listenbrainz.org/compat/ or something similar?

You also mention tags. Is that something we actually need?

armalcolite · March 11, 2016, 11:15am

Updated the draft for the same.

I was not sure whether to present the complete timeline of my work in this draft, so I just stuck to the main idea. I will give the complete, detailed schedule of my work in the final application.

I am not clear on what is remaining in the current API. The basic functionality works fine. Can you give some idea of what more needs to be done, so that I can plan on working on it if possible?

Sure. v2 was just a placeholder name I was using to differentiate it from the native API. Implementation details may differ, subject to the mentors' approval.
[quote=“Gentlecat, post:9, topic:2708”]
You also mention tags. Is that something we actually need?
[/quote]

Existing apps may be using tags, though the chances are low. So, to avoid breaking them, I was considering implementing some tag methods. Suggestions about any other methods that need to be implemented are appreciated.

armalcolite · March 11, 2016, 11:27am

I remember that the last time I checked, I could not find it for Ubuntu. Maybe I read something wrong. Will surely try it.

alastairp · March 11, 2016, 12:20pm

Thanks for the proposal. I’m especially pleased to see the work you put into understanding the existing API and how ListenBrainz and MessyBrainz interact.

I see that others have given some good feedback, but there are a few other things that I wanted to add.

This is the most important part of this project. The idea is that I should be able to type in my ListenBrainz username and password into a last.fm scrobbler, and configure it to submit to listenbrainz instead of last.fm. One way that people did this after the launch of libre.fm (which also has a compatible API) was to change /etc/hosts to point last.fm to a different IP address. I think this would be a first step.

You should look at some existing compatibility code which we started for ListenBrainz: Jonty/scrobbleproxy on GitHub ("A Last.fm API 2.0 implementation that you do not want to use yet").

Good thinking. We should just call it lastfmcompat or similar. But this is a pretty low-level detail which we don’t need to worry about just yet.

Another consideration is that we could treat the last.fm compatibility layer as a completely separate server, which just translates last.fm API commands to ListenBrainz commands.
I think the commands to implement should start with the ones which the majority of scrobbling software supports. My guess is that this is probably just track.scrobble and track.updateNowPlaying, but you should test some of the more popular software to check what else they do.

Regarding your second idea, the chart of top tracks, etc.
This is something which we have a general idea of what we want to do, but have not had time yet to design or implement it.
Our current storage system uses cassandra, with a primary key on the userid and a timestamp. This way we can store a timeseries list of data submitted by a particular user, but this makes it difficult to get, for example, a list of people who submitted a particular track, or tracks for an artist.
Another problem is that Cassandra isn't great at getting counts of these kinds of items, and most people use an external counting system (perhaps Redis).

If you’re interested in doing this part of the project in addition to the submission API, I want to warn you that it won’t be as simple as just “get a list of counts for an artist and plot it”. Therefore you should look into this part of ListenBrainz as well and see if you can propose some concrete steps like you have for the API part. Ask me or ruaok in IRC if you have any other specific questions (We have worked together on most of the existing parts of LB).

Deleted_Editor_1240840 · March 11, 2016, 5:22pm

Where and how are you going to store them? We have tags in MusicBrainz, do we want to duplicate them?

armalcolite · March 11, 2016, 5:39pm

Thanks for pointing that out.
I had assumed LB stored everything it received, but when I cross-checked, I realized it ignores the tags and stores just the essential information. We could consider storing the tags, but MB already does, so duplicating that data is a bad idea. Redirecting the request to MB is another option, but I don't know if that is a good idea either. Any comments?

alastairp · March 14, 2016, 8:12pm

My recommendation is that we work first on a solid proposal for track submissions. If we need to support other endpoints for existing applications I think that we should first make them do nothing, and only look at adding support for them once we have track submission working.

armalcolite · March 14, 2016, 8:38pm

Sure, great idea. I will prioritize that.

rob · March 21, 2016, 12:45pm

I like your proposal, but I think you should remove the "Chart of top artists and tracks" optional item. As Alastair suggested, this is a big project, for which we are soliciting separate applications. I like your other optional suggestions, and instead of the charts one, you could consider adding an option for "export my listen history", so that users can back up their own listens.

I also wanted to echo what Gentlecat said: Your additions are not a new API, but an extension of the existing API, which does not require changing the version number of the existing API. And the existing API is “complete”, but the portion to add rate-limiting has not been finished yet, so we’ve not published the docs as official yet.

Has anyone suggested that you have a look at the rate-limiting branch? I keep failing to find the time to finish this branch and to officially release version 1 of the API.

armalcolite · March 21, 2016, 7:55pm

Thanks for your suggestions.

Made the necessary changes. I am very much interested in working with Cassandra and will surely go for it in the future.

Made the necessary changes.
[quote=“rob, post:17, topic:2708”]
And the existing API is “complete”, but the portion to add rate-limiting has not been finished yet, so we’ve not published the docs as official yet.

Has anyone suggested that you have a look at the rate-limiting branch? I keep failing to find the time to finish this branch and to officially release version 1 of the API.
[/quote]

I was not aware of rate-limiting. The work there seems interesting. I will try to get hold of things going on in that branch.

I think the community's feedback has improved the draft a lot, and it now seems to have reached a saturation point.
I will be posting a final proposal in a day or so, if there are no more suggestions.

alastairp · March 22, 2016, 11:14am

I think I suggested compat (for compatibility), not compact.

Please make sure you include the last.fm compatibility layer that I posted, as we have a partially-working implementation here we should make sure that we don’t duplicate work. Instead, part of the project can be making it more stable and integrating it into the main LB application:

Sorry - @rob asked me to point you to this, but I didn’t! Take a look at the branch and let us know if you have any questions.

armalcolite · March 22, 2016, 2:50pm

Made the necessary changes.
[quote=“alastairp, post:19, topic:2708”]
Please make sure you include the last.fm compatibility layer that I posted, as we have a partially-working implementation here we should make sure that we don’t duplicate work. Instead, part of the project can be making it more stable and integrating it into the main LB application:
[/quote]

Yeah. I had pinned that page for a long time. Added it in the proposal.