GSoC 2024: Add Syndication Feeds to ListenBrainz

ercd · March 15, 2024, 10:57am

GSoC 2024: Add Syndication Feeds to ListenBrainz

About Me

General Info

Nickname/Discord/IRC: ericd

Email: [redacted]

Github: ericd23

Timezone: CST UTC+8

My name is Eric Deng. I graduated from UESTC last year and will be attending McMaster University this fall, both majoring computer science. From now though this summer, I will have time dedicated to my family and open source software.
Music

I find myself mostly listening to Electronic/Jazz/RnB, here’s a few examples:
- At the Blue Note: The Complete Recordings - Keith Jarrett
- Random Access Memories - Daft Punk
- Subject - Dwele
I knew MetaBrainz from beets, which uses it to get metadata. I use beets to collect albums I like (downloaded using spotdl), to build a future-proof library. It works perfectly well, especially the metadata fetching part!

The thought of recording my listening history has been lingering around my mind for some time, and ListenBrain seems a good fit since it offers Spotify and Youtube integration. And I would really like to help improve it.
Tech

I’m interested Linux, system software, as well as full stack development. I’ve been a Linux user for 5 years. For the last 3 month I’m been learning distributed systems and having fun implementing Raft-related systems in Go. Last year, I interned at a SaaS startup doing LLM-related backend service, using both Nodejs and Python (Flask).

I used Python and Flask to quickly roll out a demo of a RAG application during my internship. Though I now don’t have access to the code repo, I could find some code snippets on my personal laptop to demonstrate its basic idea.

My recent effort on distributed system could be found here. You can also find some of my previous code here (I will try to add any other old projects I still have a local copy).

My OSS contribution is not large at this time, that’w why I thought this GSoC project could help kick start my OSS journey

Project Proposal

Synopsis

Project Length: 175 hours

Goals:

Add feed endpoints for user events, fresh releases, stats and stats art
Implement authentication mechanism for private feed
Add documentation for feeds

Deliverables

Feed generation (required)

We use Atom as the feed syndication format. It is simple and precisely defined XML. With Python’s built-in XML support, we first introduce a util class to work with Atom:

feed = AtomFeed(
    title = "Eric's Feed - ListenBrainz",
    link = "https://listenbrainz.org/feed/",
    updated = datetime.datetime.now(),
    id = gen_id(),
    author = listenbrainz_contacts(),
    entries = [
       AtomEntry(
         title = "Eric Listened to Part I - Keith Jarrett",
         id = gen_id(),
         link = "https://musicbrainz.org/recording/a82bc5dd-c72c-4a7f-90f1-ef12b2b48bc6",
         content = "Eric listened to Part I - Keith Jarrett. Click to the MB recording page.",
         updated = datetime.datetime.now(),
    ])

return feed.to_xml()

We can then subclass AtomFeed to add more abstraction. For example, with a class called UserEventsFeed, we can create a specific feed by simply filling a few fields:

feed = UserEventsFeed(user_name = "Eric", type = "listen")
return feed.to_xml()

Likewise, we can subclass AtomEntry to add convenience.

Authentication (required)

We introduce a new token to authenticate reads on feed endpoints. We don’s the user token because its privilege scope is too large and has write access.

We use a new table to store the feed token and add foreign key constraint:

CREATE TABLE user_syndication_feed (
  id           SERIAL,
  user_id      INTEGER NOT NULL, -- FK to "user".id
  feed_token   VARCHAR,
);

ALTER TABLE user_syndication_feed
    ADD CONSTRAINT user_syndication_feed_user_id_foreign_key
    FOREIGN KEY (user_id)
    REFERENCES "user" (id)
    ON DELETE CASCADE;

When a user tries to read on a private user-specific feed endpoints, we require the request to include a query string called code, with value md5(feed_endpoint_route + feed_token). For example, user timeline event is private to the user, so the request should look like this: /user/ericd/feed/events?code=0f820530128805ffc10351f22b5fd121.

We will provide an API to get the token so that frontend can construct a feed URL with correct code for users. Upon each request, backend does the hashing again and compare the result with code. If match, authentication succeeds.

At the setting page, we could let user reset the feed token:

Feed endpoints (required)

We can put feed endpoints in a separate blueprint:

atom_bp = Blueprint("atom", __name__)

and register it like a normal page under /atom :

app.register_blueprint(atom_bp, url_prefix='/atom')

Here’s a list of endpoints I’d like to add.

my feed

It will be served at /user/<user_name>/feed/events.

There are several event types:

class UserTimelineEventType(Enum):
    RECORDING_RECOMMENDATION = 'recording_recommendation'
    FOLLOW = 'follow'
    LISTEN = 'listen'
    NOTIFICATION = 'notification'
    RECORDING_PIN = 'recording_pin'
    CRITIQUEBRAINZ_REVIEW = 'critiquebrainz_review'
    PERSONAL_RECORDING_RECOMMENDATION = 'personal_recording_recommendation'

We can provide a query string called type to let users choose what types of event to show. It could be specified multiple times, like this listenbrainz.org/atom/user/ericd/feed/events?type=listen&type=recording_pin . By default, we enable all types.

And like API /1/user/(user_name)/feed/events, we will also support max_ts, min_ts and count parameters.

Here’s a pseudo Python code to demonstrate the idea and it will check code query string described in the authentication section:

@atom_bp.route('/user/<user_name>/feed/events', methods=['OPTIONS', 'GET'])
def events_feed(user_name: str):
    validate_code()
    _validate_get_endpoint_params()
    
    types = request.args.getlist("type")
    
    get_following_for_user()
    
    if "listen" in types:
        get_listen_events()
    if "follow" in types:
        get_follow_events()
    ...
		
    handle_hidden_events()
    sort_events_by_time()
    all_events = all_events[:count]

    ...

Then we can use all_events to construct the feed response.

for event in all_events:
    entry = UserEventEntry(type = event.type, ...)
    entries.append(entry)
feed = UserEventsFeed(user_name = "Eric", items = items)

Finally we use Flask’s make_response to fill the response content and set Content-Type to application/atom+xml .

response = make_response(atom.to_xml())
response.headers['Content-Type'] = 'application/atom+xml'
return response

On feed readers, each event will be shown as a feed item:

fresh releases

There are two fresh releases feeds, one for user and one for the whole site. The endpoints for them will be /user/<user_name>/fresh_releases and /explore/fresh_releases .

We only sort the release by release_date. We also allow users to choose if we should include past and future release through query string.

On feed readers, each release will be a feed item:

stats

There are various of statistical data on user’s stats page. I think it makes sense to include top artists, top release groups and top recording in our feed.

Their endpoint will be:
- /stats/user/<user_name>/artists
- /stats/user/<user_name>/release-groups
- /stats/user/<user_name>/recordings
The shares the same internal logic for the JSON API counterparts. All 3 lets users to specify range, offset and count. So a typical endpoint would be like this: /stats/user/ericd/artists?offset=0&range=this_week&count=10.

For range this_*, the stats are updated every day since the range hasn’t ended. So the feed should update once a day, each update aggregates more stats as we listen to music through time.

On feed readers, it looks like this:

If the range is not this_* , but week , month or year , we will make sure the stats only update when the specified time range has ended. week, month and year update once a week, once a month and once a year, respectively.

On feed readers, “top artists for last week” should look like this:

stats art creator

We will serve all art creator types that are available on Web UI and include the generated svg element in the feed content:
- /art/<custom_name>/<user_name>/<time_range>/<image_size>
- /art/grid-stats/<user_name>/<time_range>/<dimension>/<layout>/<image_size>

UI (required)

Basically we will have 2 types of feed button (thanks, aerozol!). One for page which has an extensive customization UI component and we place the button there:

and one for regular page where we put the button right at the corresponding component for the feed:

When user hover on the button, we could display tool tips to provide more information and helps on how to use our feed endpoints.

Documentation (required)

Feed endpoints are kind of like APIs in terms of how we handle them. Thus we will document them with in-code docstrings and also dedicated documentation pages.

Practical requirements

Hardware: ThinkPad X1 Carbon 6th, Intel Core i5-8350U, 16G RAM

Software: Arch Linux, VSCode

Timing

Before May 27 (official beginning of coding), I would like to discuss this project with possible people and contribute to other parts of ListenBrainz. I will also attend dev meeting on Monday to communicate progress and feedback.

July through August: I will be spending 20 to 30 hours a week on the project.

time frame	plan
week 1-2 (May 27 - June 9)	Implement Atom feed utility class. Make sure we can generate generic Atom feed which can be correctly rendered by-2 feed readers. (Milestone: feed generation)
week 3-4 (June 10 - June 23)	Add feed for user fresh releases and related UI work. End user tests.
week 5 (June 24 - 30)	Implement authentication for feed endpoint private to user. (Milestone: authentication)
week 6 (July 1 - 7)	Add feed for user timeline events, which should use previously implemented authentication scheme. End user tests.
week 7-8 (July 8 - 21)	Add feed for stats and art creator. End user tests. (Milestone: feed endpoints)
week 9-10 (July 22 - Aug 4)	Collect final community feedback and make improvements. Write documentation page.
week 11 (Aug 5 - 11)	Wrap up the project and prepare the final deliverable. Write the GSoC blog post.

edits(20240327): add project length; prefer Atom to RSS; use separate feed token; update UI mockup; add doc consideration.

UltimateRiff · March 15, 2024, 4:41pm

I think some sort of RSS feed for the art creator might be interesting? might be a bit complicated too, but my thought would be to feed say, albums on the floor for last week every week once it updates

for RSS feeds in general, one application I can think of is creating ListenBrainz widgets (ticket here) for your website. I’ve seen an upsurge of personal websites as of late, especially for music artists. I haven’t started building mine, but that could be a cool thing to add to it~

edit: also, I don’t know that there’s any sort of private feed you’d need to worry about? (not 100% on that, ask your mentor) ListenBrainz is very into open data, so I believe all user listen data is public, and the user token is mostly there to prevent others submitting listens for you (tho again, not 100% sure on that, I’m not a dev)

ercd · March 16, 2024, 4:20am

Both are great ideas! I’d like to see a quick wrap up of my previous listenings in RSS reader and be mindful about it. RSS widgets could really add to the social aspect of LB. Will keep them in mind.

Ah. Thank you for reminding me of the open nature about LB. Some URL are gated by user token, e.g. https://api.listenbrainz.org/1/user/ercd/feed/events. Browser and web UI do the auth for us for free. RSS reader may not be able to do that. So for most part of it, I’d say it’s to accommodate our server.

rob · March 18, 2024, 4:19pm

Hi!

Thanks for your proposal!

I think you should think up which endpoints would really benefit from the RSS treatment and then make your proposal more specific. Your proposal overall feels pretty good, but I feel some specifics about the proposed feeds are missing.

As for the schedule, if the last item is 4 weeks long, the project length is probably a bit too long and should be shortened. I think once you add more details about which RSS feeds should be updated, you should review your schedule again – it feels rather simple.

Also, lets wait for some feedback from @lucifer .

ercd · March 19, 2024, 5:05am

Thanks for the feedback! I was waiting for some inspirations from community to let me get down to more specifics and @UltimateRiff’s response definitely helps. Will add more detail ><

zabey · March 23, 2024, 9:26pm

I don’t know if this has been considered already, but you may want to consider using the Atom standard for feeds instead of (or even in addition to, by publishing the same content on each) an RSS feed. Here are some articles of people complaining about RSS: Atom vs. RSS, Atom > RSS, and heres a more technical comparison between the two: RSS 2.0 And Atom 1.0 Compared. In any case its my experience that 99.99% of all feed readers support both, so it isn’t a big deal either way.

lucifer · March 24, 2024, 9:01pm

Hi!

Thanks for the proposal. I will take a detailed look later again but some quick feedback until then:

You have not mentioned the project size.
The authentication should not use the user token as even with md5 there is a chance of brute forcing. We can use some other random generated identifier, maybe a new token and store it in a new table in the database.

lucifer · March 25, 2024, 8:21am

More feedback:

How will a user be able to customize the type of LB feed events in their RSS feed? Will there be some UI dialog on frontend to allow the users to do this? For fresh releases, I guess we can just take the current configured parameters on the page. Stats will have a RSS button for each entity and can pick up the current time range. Still, I suggest to chat with aerozol to figure how this will work.

What’s this feed about and how will it work?

ercd · March 25, 2024, 9:21am

Thanks for bringing this up! Will look into Atom.

ercd · March 25, 2024, 9:40am

Thanks for the feedback, lucifer.

I think it’s a 175-hour project. Will include this in the updated proposal.
We can add a field called feed_token to the user table and use that to authenticate feed endpoints with hash in the URL. What do you think?
You’re right. We should add an customization UI for the timeline event. Then feed configuration hooks into configured params on the page. I will check with @aerozol first.
stats art creator:

Basically we include the generated svg element in the feed content so that users can feed readers can renders the svg.

aerozol · March 26, 2024, 10:22am

A very cool project, I think this one will be fun. Especially the “firehose” high-volume feeds, and seeing what external tools people come up with using the feeds.

Don’t worry too much about the design and placement of the buttons - @mr_monkey and I can lead the UX.

I think we will have two types of feed buttons, one on explore pages:

(edit: ‘variable feed’ sounds kind of dumb huh, hopefully you or someone else comes up with something better, or maybe we don’t need the text)

And one on regular pages, where they may be some options as well:

These aren’t final, but an idea of what direction I can imagine this would go. My priorities are consistency, so these buttons can always be found in a similar place (it seems like in the center bottom would work well, and allow for extra text and options). And having them sit back more in the visual hierarchy - the people who want feeds will find them.

I hope that’s helpful. Please don’t spend time redoing your mockups to match mine, unless my notes prompt a mechanical/functional change for your proposal, that you need to show. We can tackle the visuals in greater detail if your project gets accepted.

P.S. As LB gets updated the RSS feeds will get out of date - you may want to add writing documentation to your plan, so as new elements (for instance, new types of feed notifications) are added, other contributors can update the feeds. And avoid breaking them.

ercd · March 26, 2024, 12:07pm

Thanks for the feedback and mockups! I will include your mockups and update the wordings a little bit as I believe it will make the proposal easier to understand.

Great point on documentations! We definitely need those.