[GSoC 2024]: Email service with internationalisation and MJML-based markup

Project summary

Title: Email service with internationalisation and MJML-based
markup
Proposed mentors: bitmap, reosarevok, yvanzo
Languages/skills: Rust, HTML & templating, Email
Estimated Project Length: 350 hours, including optional extensions
Expected Outcomes:

  • Mail service that sends multipart HTML and text emails

  • Mail service supports internationalisation

  • Mail service can send emails concurrently

Extension Objectives:

  • Mail service directly reads subscriptions from the main database

  • Mail service handles complex/long term sending errors, like dead
    addresses

  • Mail service is integrated with other MetaBrainz projects

Contact information

Personal Introduction

Hello! I’m Jade - or JadedBlueEyes in most places. I’m a long-time
programmer, and I’m currently in my first year studying Computer Science
with a year in Industry (BSc) at the University of Kent in England. I’ve
been a MusicBrainz user for a while now - I created my account
(Jellis16) back in 2020 because I wanted to add a few albums from my
collection that were missing. When I found that MetaBrainz were
participating in GSoC, I knew I wanted to do my project with you!

Proposed project

This project is starting from the MJML-based email project idea.
However, it is significantly increased in scope.

  • Currently, emails from the MusicBrainz server are generated using
    the Template Toolkit Perl library

  • The emails are only sent in plain text

  • The emails can only be sent in English

  • Limitations in the current system make it slow - it takes 4 hours
    a day to send subscription updates at the moment

The aim of this project is to implement a new mailing system which:

  • Allows sending both HTML-formatted and plain text multipart emails

  • Significantly improves performance

  • Enables translation of emails.

At a high level, it will achieve this by:

  • Writing a new mailing service in Rust

  • Integrating with existing translation infrastructure to allow translation of email templates

  • Using the MJML markup language (via mrml) to format emails

  • Using the html2text library to convert formatted emails into an appropriate textual format

  • Sending the resulting emails via MetaBrainz’s existing infrastructure (SMTP servers or local sendmail installation)

  • Allowing sending emails from the MusicBrainz server to the mailing service in bulk to take advantage of Rust’s parallelism.

  • Handling errors in mail sending, and retrying where appropriate.

  • Optionally, reading subscriptions directly from the Postgres database and sending them without the involvement of the MusicBrainz server at all.

For a single email, this might look like the following:

I have also made a mockup of what an email could look like, although the
final templates will be created together with aerozol.
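
As a rough illustration of the multipart output described in this section, the sketch below assembles a multipart/alternative message from an already-rendered text part and HTML part using only the standard library. The function name `build_multipart` and its parameters are hypothetical, the subject is assumed to be plain ASCII, and the caller is assumed to supply a boundary that does not occur in either part. A real service would more likely delegate this to a mail library.

```rust
/// Build a minimal multipart/alternative message (headers plus body).
/// Assumptions: `subject` is ASCII (no RFC 2047 encoding is done here) and
/// `boundary` does not appear anywhere in `text` or `html`.
fn build_multipart(subject: &str, text: &str, html: &str, boundary: &str) -> String {
    let mut msg = String::new();
    msg.push_str(&format!("Subject: {subject}\r\n"));
    msg.push_str("MIME-Version: 1.0\r\n");
    msg.push_str(&format!(
        "Content-Type: multipart/alternative; boundary=\"{boundary}\"\r\n\r\n"
    ));
    // Plain text goes first: in multipart/alternative, later parts are the
    // preferred ones, so clients that can render HTML pick the second part.
    for (content_type, body) in [("text/plain", text), ("text/html", html)] {
        msg.push_str(&format!("--{boundary}\r\n"));
        msg.push_str(&format!("Content-Type: {content_type}; charset=utf-8\r\n"));
        msg.push_str("Content-Transfer-Encoding: 8bit\r\n\r\n");
        msg.push_str(body);
        msg.push_str("\r\n");
    }
    // The closing delimiter has two trailing hyphens.
    msg.push_str(&format!("--{boundary}--\r\n"));
    msg
}
```

In the proposed pipeline, `html` would come from rendering the MJML template and `text` from converting that HTML to plain text.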

Timeline

  • Week 1

    • This week is mostly allocated to getting orientated!

    • This is time to get familiar with the existing codebase and
      expected dependencies, talk to everyone involved and make sure
      everything is set!

    • If I don’t need the full time, I can start on the project early

    • This is also when I will talk with aerozol about the intended
      design of the email templates.

  • Week 2

    • Implementing a minimum viable service

      • This will accept an HTTP JSON API call, select the correct
        template and produce a multipart email
    • I’ll be writing documentation, tests and logging throughout the
      project

  • Week 3/4

    • Integrate translation

    • This involves:

      • Syncing with Weblate

      • Selecting the correct language for the user

        • There is a pre-existing language preference stored for
          the UI, which can be used

          • It may be worth adding a separate option for the
            email language.
      • Inserting translation strings into the template

        • The existing translation infrastructure uses a custom
          perl-inspired template format, and allows a subset of
          HTML

        • This formatting library does not already exist in Rust,
          so some alternative should be used here

        • Fluent has been suggested, and there are other options available.
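
For the language selection and string insertion steps in Week 3/4, a minimal sketch of the idea could look like the following. It assumes in-memory catalogs keyed by locale, a fallback order of exact locale, then base language, then English, and a made-up `{name}` placeholder syntax. None of this is the Fluent or Weblate format, and the names `fallback_chain` and `translate` are hypothetical.

```rust
use std::collections::HashMap;

/// Candidate locales for a user preference, most specific first, ending in "en".
fn fallback_chain(preferred: &str) -> Vec<String> {
    let mut chain = vec![preferred.to_string()];
    // A regional locale such as "pt-BR" also tries its base language "pt".
    if let Some((base, _region)) = preferred.split_once('-') {
        chain.push(base.to_string());
    }
    if !chain.iter().any(|locale| locale == "en") {
        chain.push("en".to_string());
    }
    chain
}

/// Look up `key` in the first locale of the chain that has it, then replace
/// `{name}` placeholders (an assumed syntax) with the given arguments.
fn translate(
    catalogs: &HashMap<&str, HashMap<&str, &str>>,
    preferred: &str,
    key: &str,
    args: &[(&str, &str)],
) -> Option<String> {
    let template = fallback_chain(preferred)
        .iter()
        .find_map(|locale| catalogs.get(locale.as_str())?.get(key).copied())?;
    let mut out = template.to_string();
    for (name, value) in args {
        out = out.replace(&format!("{{{name}}}"), value);
    }
    Some(out)
}
```

With this fallback order, a string that has not been translated yet falls back to English instead of failing the whole email.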

  • Week 5

    • Template writing

      • I will write some draft versions of the most commonly used
        email templates

      • These are mainly for testing, finding issues with the
        translation setup, and informing the final designs

    • Text output finetuning

      • Prior testing shows that the default output of html2text
        isn’t as good as it could be

      • This is when I will fine-tune it to get acceptable output

      • This may involve implementing a custom Renderer

    • I will work with aerozol to start creating the final templates.

  • Week 6

    • Bulk sending

      • This involves adding an API method to send a batch of emails

      • These should then be sent concurrently

      • Once they are all sent, a response should be returned
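
The bulk sending flow in Week 6 (accept a batch, send concurrently, respond once everything has finished) can be sketched with scoped threads from the standard library. This is only an illustration: `send_one` is a hypothetical stand-in for real delivery, and one thread per email is a simplification. A real service would probably bound concurrency or use an async runtime.

```rust
use std::thread;

/// Outcome of one send attempt, reported back to the caller.
#[derive(Debug, PartialEq)]
enum SendResult {
    Sent,
    Failed(String),
}

/// Hypothetical stand-in for real delivery: rejects addresses without an '@'.
fn send_one(address: &str) -> SendResult {
    if address.contains('@') {
        SendResult::Sent
    } else {
        SendResult::Failed(format!("invalid address: {address}"))
    }
}

/// Send every email on its own thread and return the results, in input
/// order, only after all sends have finished.
fn send_batch(addresses: &[&str]) -> Vec<SendResult> {
    thread::scope(|scope| {
        // Spawn all workers first so the sends overlap.
        let handles: Vec<_> = addresses
            .iter()
            .map(|address| scope.spawn(move || send_one(address)))
            .collect();
        // Joining in spawn order keeps results aligned with the input.
        handles
            .into_iter()
            .map(|handle| {
                handle
                    .join()
                    .unwrap_or_else(|_| SendResult::Failed("worker panicked".into()))
            })
            .collect()
    })
}
```

Returning one result per input email lets the caller see exactly which messages failed without failing the whole batch.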

  • Week 7

    • Alternative API methods

      • From discussion on IRC about how to integrate the service with
        the server, an HTTP API over the network may not be the
        preferred final implementation.

      • The existing React renderer communicates over a local socket

      • This is when I’ll implement the final communication method -
        whether that is sockets or something else.
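
If the Week 7 communication method ends up being a local socket, the request handling could look roughly like this. The sketch is Unix-only and assumes a newline-delimited request with a one-line reply. That framing, the `ok` reply and the name `handle_request` are all assumptions for illustration, not the protocol the React renderer uses.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Read one newline-terminated request from the socket, acknowledge it with
/// a one-line reply, and return the request text without its line ending.
fn handle_request(stream: UnixStream) -> std::io::Result<String> {
    // Clone the handle so reading can be buffered while writes go
    // through the original stream.
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut request = String::new();
    reader.read_line(&mut request)?;

    let mut writer = stream;
    writer.write_all(b"ok\n")?;
    Ok(request.trim_end().to_string())
}
```

`UnixStream::pair()` gives two connected ends, which makes it possible to exercise a handler like this in tests without creating a socket file.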

  • Week 8

    • Integrating with monitoring infrastructure

    • MetaBrainz uses Sentry and Prometheus for monitoring and
      observability

    • This is when I will work on integrating those into the service.

    • This will involve hooking the Sentry SDK into existing logging
      code and writing metrics collectors.
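
For the metrics collectors mentioned in Week 8, a hand-rolled sketch of the idea is shown below: counters that the sending code increments, rendered in the Prometheus text exposition format. The metric names and the `MailMetrics` type are hypothetical, and a real integration would more likely use an existing Prometheus client library than format the output by hand.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Counters updated from the sending code paths; safe to share across threads.
#[derive(Default)]
struct MailMetrics {
    sent: AtomicU64,
    failed: AtomicU64,
}

impl MailMetrics {
    fn record_sent(&self) {
        self.sent.fetch_add(1, Ordering::Relaxed);
    }

    fn record_failed(&self) {
        self.failed.fetch_add(1, Ordering::Relaxed);
    }

    /// Render both counters in the Prometheus text exposition format.
    /// The metric names are made up for this example.
    fn render(&self) -> String {
        format!(
            "# HELP mail_sent_total Emails accepted by the mail server.\n\
             # TYPE mail_sent_total counter\n\
             mail_sent_total {}\n\
             # HELP mail_failed_total Emails that could not be sent.\n\
             # TYPE mail_failed_total counter\n\
             mail_failed_total {}\n",
            self.sent.load(Ordering::Relaxed),
            self.failed.load(Ordering::Relaxed),
        )
    }
}
```

The rendered string is what a `/metrics` style endpoint would return for Prometheus to scrape.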

  • Week 9

    • Reading subscriptions from the database

    • As an additional optimisation, the service could directly read
      user subscriptions from the database and send them itself.

    • This would add a fair amount of complexity to the service (and
      deploying it), so it’s an optional extension

  • Week 10

    • Integration and deployment

    • This should be when I start working on deploying the application
      into a production environment

    • At this point integration code should have been written, and it
      should be clear how the final application will be deployed

    • I expect it to be deployed either as a separate container or as
      a subprocess of the MusicBrainz server, depending on how
      communication is implemented

    • This should mostly just be ironing out issues.

  • Week 11/12

    • Breathe a sigh of relief that the main project is done! Or not,
      because there’s a fair chance something could take longer than
      expected and push the timeline back. This is some margin for
      that.

    • If everything is done on time and there are no issues, I can
      work on more complicated error handling at this point

      • For example, marking dead addresses and mail servers after
        some number of retries.
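
The dead-address idea in Week 11/12 can be sketched as a per-address failure counter. The threshold of three consecutive failures and the `DeliveryTracker` type are assumptions for illustration. A real implementation would also need to persist this state and distinguish temporary from permanent SMTP errors.

```rust
use std::collections::HashMap;

/// Assumed threshold: consecutive failures before an address counts as dead.
const MAX_FAILURES: u32 = 3;

/// Tracks consecutive delivery failures per address (in memory only).
#[derive(Default)]
struct DeliveryTracker {
    failures: HashMap<String, u32>,
}

impl DeliveryTracker {
    /// Record the outcome of one delivery attempt.
    /// A successful delivery resets the failure count for that address.
    fn record(&mut self, address: &str, delivered: bool) {
        if delivered {
            self.failures.remove(address);
        } else {
            *self.failures.entry(address.to_string()).or_insert(0) += 1;
        }
    }

    /// An address is dead once it has failed MAX_FAILURES times in a row.
    fn is_dead(&self, address: &str) -> bool {
        self.failures.get(address).copied().unwrap_or(0) >= MAX_FAILURES
    }
}
```

Before each send, the service would check `is_dead` and skip addresses that have repeatedly failed.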

Community affinities

What type of music do you listen to?
A lot of music! Some recent favourites:

What aspects of MusicBrainz/ListenBrainz/BookBrainz/Picard interest
you the most?

I really love the goal of collecting metadata for all the music in the
world! It’s both incredibly useful and incredibly impressive as a
technical and social feat. I created my MusicBrainz account back in
April 2020, and I’d been using Picard for a while before that for my
collection. I also find ListenBrainz fascinating - both in revealing my
own listening habits and opening up music recommendation algorithms,
which have mostly been the secret sauce of streaming services. I’ve
managed to accumulate more than 18,000 listens so far!

Have you ever used MusicBrainz Picard to tag your files or used any of
our projects in the past?

See above! :slight_smile:

Programming precedents

When did you first start programming?

It depends on what you count as programming! I started fiddling with
Scratch and stuff before I can remember, but I picked up steam when I
was a tween - making a website for a group I volunteered with at 11 and
releasing my first app around 12/13 to 2000+ daily users.

Have you contributed to other open source projects? If so, which
projects and can we see some of your code?

I’ve contributed to a few over the years, but never anything major! A
lot of what I’ve done is locked away in private repos, but I do have a
few public projects, too.

  • GitHub - JadedBlueEyes/fendapp -
    A calculator written in Rust using Freya/Dioxus and fend. This is
    sitting on my desktop in fairly regular use right now!

  • GitHub - JadedBlueEyes/bmc -
    A little learning project, this implements the Brookshear Machine
    Algorithm in Rust.

  • I’ve made small code contributions to projects ranging from Vite to
    Jekyll

If you have not contributed to open source projects, do you have other
code we can look at?

If you message me I’m fairly happy to show you most of what I’ve worked
on, from marketing websites to my coursework :slight_smile:

What sorts of programming projects have you done on your own time?

  • Quite a few websites, using a variety of tech (Svelte, Vue, Jekyll
    are the main names)

  • GitHub - JadedBlueEyes/fendapp -
    A desktop calculator in Rust

  • I’ve modified quite a few libraries for my own purposes

  • Quite a few learning projects - for example, an interactive
    visualisation of PageRank.

Practical requirements

What computer(s) do you have available for working on your SoC
project?

  • I’ve got a desktop, running Fedora.

  • I’ve got a laptop running Windows 11 Pro

  • And an Android phone

How much time do you have available per week, and how would you plan
to use it?

My university summer term, exam season, is from the 6th of May to the
14th of June. My final exam is on the 3rd of June. From that point
onwards, it will be my summer holidays. I expect the holidays to be free
of any other major commitments. I expect to be able to spend around 30
hours a week working on the project.

Welcome, @JadedBlueEyes. Thank you for your interest in doing GSoC with MusicBrainz! And thank you for your past contributions to the data.

I am just an ordinary contributor, not a decision-maker. But internationalisation is dear to me, so I am enthusiastic about this project. One extra goal to consider including: make it possible to send to globally inclusive email addresses, which are not limited to the Latin script. For instance, it should be possible for me to receive MusicBrainz emails at my address in Hindi, <जिम@डाटामेल.भारत>.

Actually delivering emails to globally-inclusive addresses requires more than just email-sending code. It matters that the SMTP servers support the extensions for internationalised email addresses. If they do, great. If they don’t, fixing that may be out of scope for a GSoC project. But at least, the project can ensure that it removes existing barriers to use of globally-inclusive email addresses, and does not build in new barriers.

One resource that might help in testing is a battery of email responders at various addresses, operated by the Universal Acceptance Steering Group (UASG). They are described in a document, UASG 004 Test Cases for UA Readiness Evaluation, and the associated UASG 004A plain text file.

I volunteer with UASG on evangelising this kind of email address inclusiveness, which is part of Universal Acceptance. I would be happy to consult on those aspects of the project.

Good luck!

Thank you @JadedBlueEyes for this proposition. I enjoyed discussing the general idea on the IRC channel #metabrainz beforehand. I kept nodding while reading your post. I’ll just make a few comments for clarification and anticipation.

To clarify, most of this time is used for querying the database. Rendering and sending the emails probably takes a very small fraction of this time, but we don’t have any metrics to say how much time it actually takes. However, it is certainly slower than MRML anyway.

It’s good timing to finalize this integration. To help with preparing it, the steps of interest (in the code workflow) for reporting errors and metrics can be identified and provisioned (with code or comment) as the project progresses.

This is something that you should plan with @aerozol. It will most likely be rather spread over weeks. In any case, it is nice to have taken it into account, and it is safe to have anticipated some margin even if spread over weeks.

Last, your timeline doesn’t mention tests and documentation, so I expect that these are made as you code, which would be ideal. And there is enough margin for making up for the oversights if needed. It would be great to mention those in your proposition though.

I would indeed add a bit of time, early on, to meet with me and talk about the template design @JadedBlueEyes. I think we will keep it simple, but the design may change how you approach other aspects of the project, and it gives me time to work through things on my end.

The other consideration is that there is no guarantee that I am available in a specific week, e.g. week 8. In any case, it’s often nice to have a quick chat and say hi before things get serious : )

Thank you all for the replies!

That is definitely something to consider! One of the great things about Rust is that everything’s UTF-8 by default. There might be something like Punycode for domain names to consider - and like you say, a lot depends on the SMTP server. I think MetaBrainz uses Exim, which isn’t the worst but is far from perfect.

Thank you for the resources!

That’s helpful to know! Maybe there won’t be quite the speedup I was originally expecting then :sweat_smile:! Parallelising the queries could still speed it up, though! I think the thing will be to measure and improve the slow points in the service wherever they come up.

Definitely! I’ll be writing logging code throughout the service with appropriate error levels. The Sentry SDK should just be able to hook into that. I haven’t ever written anything for Prometheus, but if I mark down where the key metrics are, I know there are plenty of libraries to help both measure and export the metrics.

You’re both right here. Talking about that and getting it started early on is better. If it works for you, in the first week we can talk about what you intend for the design, and start writing the templates around week 5, when all the templating/i18n infrastructure should be ready.

Yes! I much prefer writing tests as / just before I code, so I can use them to work against - similarly with documentation. And margin is super important!

Thank you again for all the feedback! I’ll edit in the changes this evening.

Hi!

We usually require applicants to contribute at least one fix to the project they are applying to (MusicBrainz in your case). I took a quick look at the proposal and did not see any links to one. I strongly recommend you to work on one to strengthen your application.

Regards.

I encourage you not to let Rust’s default UTF-8 string encoding make you complacent. The big obstacle to globally inclusive address support is generally not the shipping around of non-ASCII data.

The number one obstacle is usually some misguided validation step in the interaction which collects an email address from the user. Many apps attempt to use syntax checks, such as a regular expression, to decide if an address is valid. Syntax checks beyond the most basic are a doomed exercise: they will likely rule out valid globally-inclusive addresses, while letting through actually invalid addresses. The inclusive way to validate a user-supplied email address is to send a test message to it, and have the user click a link in the message. So a task for this globally-inclusive goal is to look at the email address collection workflow.

The number two obstacle is usually the sending server chain. That will involve examining the configuration of our SMTP server, and sending test messages to see if they get through.

The number three obstacle is often the display of email addresses. The important issue here is to treat the address like user free text, which could be in any language or script. Avoid punycode for domain names when displaying to the user: it is meaningless to them. Be sure that there are fonts available to cover a wide range of scripts.

Anyway, all of this is out of scope until you decide to add this goal. I just encourage you not to be complacent about it. There are reasons why globally inclusive email address support is rare, in a world where UTF-8 usage is common.

[Edited to improve the wording.]

Hi all! Apologies for the late reply, I didn’t see your replies. I’ve updated Discourse to send me emails now.

I plan to make some contributions in the next couple of weeks. I’ve been pretty busy this past month, but I’ve got the Easter break coming up, so I should be able to invest a lot more time then. Should I link contributions here when I make them?

I absolutely won’t! I’m all too aware of the problems of bad validation - the number of sites that have validation like ^[\w\-\.]+@([\w-]+\.)+[\w-]{2,}$ (which just straight up rejects non-ASCII addresses) is shocking. Sometimes, even ASCII email addresses that are slightly unusual get rejected (like my jade@ellis.link).
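
As a contrast to the regex quoted above, a deliberately minimal check might look like the sketch below. It only assumes that an address has a non-empty local part and a non-empty domain separated by an '@', and it leaves everything else to a confirmation email. The function name `looks_like_email` is hypothetical.

```rust
/// Deliberately minimal syntax check: a non-empty local part, an '@', and a
/// non-empty domain. It makes no assumption about script or character set;
/// real validation is left to sending a confirmation message.
fn looks_like_email(address: &str) -> bool {
    // Split on the last '@', since a quoted local part may itself contain one.
    match address.rsplit_once('@') {
        Some((local, domain)) => !local.is_empty() && !domain.is_empty(),
        None => false,
    }
}
```

A check like this accepts both jade@ellis.link and जिम@डाटामेल.भारत, while still rejecting input with no '@' at all.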

Thank you for the advice. I will be keeping it in mind as I write this service so that in the worst case it’s at least internationalisation ready, so that it does not hold back improvements in the interface and SMTP server.

In this specific case, the project is to create a new service, rather than contributing to the current repository. Some code is still expected though.

Yes, please, links would be helpful, show us the code! :slight_smile:

One way to start with some code is to create a playground where you can try implementing things that will be needed for your project. See GitHub - yellowHatpro/Rust-Playground: Learning Rust for example.

Additionally, as we know from experience that it can take some time, we encourage you to install a development setup of MusicBrainz Server already, so that you will be able to run our modified version of MusicBrainz Server with your email service later on.

As mentioned in today’s meeting, I’ve worked on this.

  • It is set up using the Axum web framework
  • Utoipa is used to automatically generate OpenAPI documentation
  • The templates can be previewed as both HTML and text
  • The process can also use externally managed TCP sockets, like those from systemd
  • The process gracefully finishes requests before shutting down when it receives SIGTERM
  • The templates are embedded in the binary for release builds, for ease of distribution
  • I have also created a Dockerfile that creates a small image (20 MB) for deployment

I’ve also set up a local copy of the MusicBrainz Server.
