GSoC 2024: Set up Bookbrainz for Internationalization - Tarun_0x0

GSoC 2024

Set up BookBrainz for Internationalization

Project information:

  • Proposed Mentors: monkey
  • Languages/skills: Javascript/Typescript
  • Estimated Project Length: 175 hours
  • Project Size: Medium
  • Expected outcomes: Full translation project and workflow set up, with as much as possible of the website text captured for translation

Contact information:

Synopsis:

BookBrainz holds metadata about books from all around the world, in many languages. It is not only a comprehensive database but also a vibrant community of book lovers from different linguistic and cultural backgrounds. However, despite the many languages represented within its user base, the website is currently available only in English.

Therefore, we need a way to internationalize BookBrainz and set up a community translation system. To internationalize BookBrainz, we will use an internationalization framework, i18next. To set up the translation system, we will use Weblate, an open-source web-based translation tool with version control.

This project will set up BookBrainz with i18next and Weblate, taking into account all the changes the current website needs to achieve that goal.

Why i18next?

You may ask why i18next. The main reasons are:

  • It’s a mature library: i18next is open source and fits most internationalization use cases.
  • It’s pretty rich: i18next goes beyond what other libraries do. It can split translations into multiple files, use plugins to detect languages, and cache and load translations to deliver localized content to users across the world.
  • Active community: i18next is well maintained. I have personally talked to the maintainers, and they are quick to respond to any query.

Implementation:

The implementation process is divided into the following parts:

  1. Setting up i18n
  2. Using the translations in frontend
  3. Working Prototype
  4. How will a user change the language ?
  5. Setting up new Weblate Project
  6. Continuous Translation Integration

Setting up i18n:

Server Side:

The first point to note is that BookBrainz is built with React as its frontend framework. It uses server-side rendering to render pages on the server before sending them to the client.

This approach leads to faster initial page loads because the server sends pre-rendered HTML to the client, reducing the time it takes for the user to see content. Similarly, we can use this setup to load the translations for the requested language on the server and send them to the client. Let’s see how.

Let’s say our translation files are located in ./src/public/locales.

Then we need to set up the configuration of our server-side i18n instance. It can be changed later if we need to address some special cases.

Details about every configuration option can be found in the i18n configuration docs, but I would like to give a high-level overview of some properties and how they are useful for us:

  • fallbackLng: specifies the fallback language to use if a translation is not available in the requested language.

  • backend.loadPath: tells the i18next-fs-backend plugin where to load the translation files from on the local file system.

  • preload: lists the languages whose translations should be loaded when the i18n instance is initialized. This matters on the server, because the translations must already be loaded before a page is rendered; preloading also avoids loading files again when a user switches languages.

  • load: decides which language codes to look up. For example, if the set language is en-US:
    • ‘all’ ⇒ [‘en-US’, ‘en’, ‘dev’]
    • ‘currentOnly’ ⇒ ‘en-US’
    • ‘languageOnly’ ⇒ ‘en’
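A minimal sketch of how the options above might be collected, written as a plain object so that it stays self-contained. The language list, the `translation` namespace, and the `{{lng}}/{{ns}}.json` file layout are assumptions made for illustration; in the real setup the object would be handed to i18next's `init` after registering the file-system backend. The `resolveLoadPath` helper is hypothetical and only shows which file a language maps to.

```javascript
// Sketch of the server-side i18next options discussed above.
// Assumed for illustration: the language list, the "translation" namespace,
// and the {{lng}}/{{ns}}.json layout under ./src/public/locales.
const supportedLanguages = ['en', 'es'];

const serverI18nOptions = {
  // Language used when a key is missing in the requested language.
  fallbackLng: 'en',
  // Load every supported language at start-up so rendering never waits on disk.
  preload: supportedLanguages,
  // For a request in 'en-US', only look up 'en'.
  load: 'languageOnly',
  ns: ['translation'],
  defaultNS: 'translation',
  backend: {
    // The file-system backend substitutes {{lng}} and {{ns}} when reading files.
    loadPath: './src/public/locales/{{lng}}/{{ns}}.json'
  }
};

// Hypothetical helper: shows which file a language/namespace pair maps to.
function resolveLoadPath(options, lng, ns) {
  return options.backend.loadPath
    .replace('{{lng}}', lng)
    .replace('{{ns}}', ns);
}

console.log(resolveLoadPath(serverI18nOptions, 'es', 'translation'));
```

With `load` set to `'languageOnly'`, one translation file per base language is enough; regional variants such as en-US would need `'all'` and their own files.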

After initializing the i18n instance with this configuration, we want to use it in every route. To do so, we are going to use a middleware that plays an important role in this project.

i18next-http-middleware:

  • Attaches the initialized i18n instance to the req object

  • Can be used to detect the requested language from the query string, cookies, session, or headers of the request

Now we can finally use our i18n instance in our routes, for example in our src/server/routes/index.js file.

In the route, we clone our i18n instance and pass it into our React tree through I18nextProvider.

The I18nextProvider component serves as a provider for internationalization (i18n) capabilities. It wraps the components that need i18n functionality, so that they can access the language resources and configuration settings provided by the i18nServer instance.

Now, on the server side, the only part left is to load the translations and send them to the client.

One way of doing that is to extract the translations and the initial language from the server i18n instance, write them as stringified values into global window variables in the pre-rendered HTML, and then read them on the client side.
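A rough sketch of that serialization step, assuming the translations have already been extracted from the server instance into a plain object. The function name and the global names `initialI18nStore` and `initialLanguage` are assumptions for illustration, not names taken from the BookBrainz codebase.

```javascript
// Sketch: serialize the translations and the initial language for the client.
// `initialI18nStore` and `initialLanguage` are assumed global variable names.
function renderI18nBootstrapScript(initialI18nStore, initialLanguage) {
  // Escape "<" so a translation containing "</script>" cannot end the tag early.
  const toSafeJson = (value) => JSON.stringify(value).replace(/</g, '\\u003c');
  return [
    '<script>',
    `window.initialI18nStore = ${toSafeJson(initialI18nStore)};`,
    `window.initialLanguage = ${toSafeJson(initialLanguage)};`,
    '</script>'
  ].join('\n');
}

// Example store in the shape { language: { namespace: { key: text } } }.
const exampleStore = {
  es: { translation: { welcome: 'Bienvenido a <b>BookBrainz</b>' } }
};

// The returned string would be placed into the pre-rendered HTML of the page.
console.log(renderI18nBootstrapScript(exampleStore, 'es'));
```

Escaping `<` is a design choice worth keeping: translations come from community contributors, so they should not be able to break out of the script tag.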

We are finished with setting up the i18n instance on the server side

Client Side:

In the previous part we set up our i18n instance and used it inside the routes on our server. Now the question arises: can we use the same instance for our client too? We can, but this can cause various problems, for example:

  • Server-side rendering requires preloading language resources, which may differ from how resources are managed on the client side. Using separate instances allows for tailored resource management strategies.
  • In scenarios where both server-side and client-side rendering occur simultaneously (e.g., server-side rendering followed by client-side hydration), ensuring proper synchronization of i18n resources and state between the server and the client can be challenging with a single instance.

So for our client we will initialize another instance, i18n-client, with a configuration more relevant to the client environment.

The src/i18n-client.js file has fewer properties than the i18n instance on the server, and some different ones. Again, the property definitions can be found in the i18n configuration docs.

The interesting part here is that by calling i18n.use(initReactI18next) we pass the i18n instance to react-i18next, which makes it available to all React components. It enables the hooks provided by react-i18next, such as useTranslation, which give React components easy access to i18n functionality.

The interpolation property allows integrating dynamic values into our translations and is currently set to the default settings for React.
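As a hedged sketch, the client-side options could look roughly like this. The fallback language and the decision to disable Suspense are assumptions for illustration; the object is kept free of imports so that it can be read on its own.

```javascript
// Sketch of the client-side options (values are assumptions for illustration).
const clientI18nOptions = {
  fallbackLng: 'en',
  interpolation: {
    // React escapes rendered strings itself, so i18next's own escaping is off.
    escapeValue: false
  },
  react: {
    // Translations arrive with the server-rendered page, so no Suspense fallback.
    useSuspense: false
  }
};

// In src/i18n-client.js this object would be passed along the lines of:
//   i18n.use(initReactI18next).init(clientI18nOptions);
console.log(JSON.stringify(clientI18nOptions));
```

Note that there is no backend or preload entry here: the client receives its translations from the server-rendered page instead of loading files itself.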

Now before hydrating our react component we will import our i18n-client instance and use a special hook that is specifically designed for server-side rendering (SSR) with React applications.

useSSR:

The useSSR hook in react-i18next helps with handing over i18n resources after SSR. It takes the translations and the initial language that the server embedded in the page and loads them into the client-side i18n instance, so the client can render in the same language as the server without fetching the translations again. This is important because during SSR there is no browser environment, such as the window object, so this state has to be passed to the client explicitly.

Have a look at this example in src/client/controllers/index.js file
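A small sketch of the hand-over on the client, assuming the server wrote the translations and language into the global variables `initialI18nStore` and `initialLanguage` (assumed names). The helper `readInitialI18nState` is hypothetical, and a stand-in object replaces the browser's `window` so the snippet can run outside a browser.

```javascript
// Read what the server embedded in the page; fall back to empty defaults.
function readInitialI18nState(globalObject) {
  return {
    initialI18nStore: globalObject.initialI18nStore || {},
    initialLanguage: globalObject.initialLanguage || 'en'
  };
}

// Stand-in for the browser's `window` after the server-rendered HTML has loaded.
const fakeWindow = {
  initialI18nStore: { es: { translation: { welcome: 'Bienvenido' } } },
  initialLanguage: 'es'
};

const initialState = readInitialI18nState(fakeWindow);
console.log(initialState.initialLanguage);

// Inside the root component, before rendering translated content:
//   useSSR(initialState.initialI18nStore, initialState.initialLanguage);
```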

Finally, our i18n setup is configured for server-side translation loading and ready to be used in the frontend to display translations. However, some adjustments are still needed to change languages properly and apply them across the entire website. We’ll address these shortly.

Using the translations in frontend

This part is fairly easy: all we have to do is use the functionality provided by react-i18next. The important pieces are the t function, the Trans component, and the useTranslation hook.

  • The useTranslation hook gets the t function and i18n instance inside your functional component.

  • The t function is the key utility for translation within React components. It translates a text string into the current language set in the i18n configuration.

  • The t function covers most of our use cases, but some cases, such as text that contains markup or React elements, need the Trans component.

Easy, right? Here’s a twist. BookBrainz uses some legacy React: some components are written as React classes, and some are functional components. While testing my approach to setting up i18n with BookBrainz, I found that i18n doesn’t work very well if we use it for both class components and functional components at the same time. It’s better to choose one.

This may be an extra task, but I suggest we convert the class components that need translation, not all of them, to functional components. Luckily, this is not a time-consuming task because, to my knowledge, all the BookBrainz entity pages are already written using functional components; only a few pages need these changes, for example the index page. This will allow us to use hooks in React and avoid some higher-order components provided by react-i18next, like withTranslation and withSSR.

After converting the index.js page in BookBrainz from a class component to a functional component, it can use the t function and the useTranslation hook.

That is it for using translations in the frontend. We can change the way we display translations as needed; for example, to display a dynamic value, we can add a placeholder to the translation and pass the value to the t function.

Translation Keys

Sample
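A minimal sketch of a translation key with a dynamic value. The key name `greeting` and the `interpolate` helper are assumptions: the helper is a simplified stand-in that mimics how `{{placeholders}}` are filled, not i18next's own implementation.

```javascript
// Translation key with a placeholder, as it might appear in a locale JSON file.
const en = { greeting: 'Welcome back, {{name}}!' };

// Simplified stand-in for i18next interpolation: fills {{placeholders}}
// from `values` and leaves unknown placeholders untouched.
function interpolate(template, values) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match
  );
}

// With react-i18next this would read: t('greeting', { name: 'Ada' })
console.log(interpolate(en.greeting, { name: 'Ada' })); // Welcome back, Ada!
```

Keeping the whole sentence in one key, with the value as a placeholder, lets translators move the name to wherever their language needs it.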

How will we handle grammatical adjustments in case of plurals?

Without pluralization, we might end up with awkward or grammatically incorrect translations. Pluralization support in internationalization (i18n) allows the application to adapt to the different grammatical rules for plural forms in different languages.

The problem is that plural forms depend on the language. In English, you typically use a singular form when the count is 1 (“1 message”) and a plural form otherwise (“2 messages”). Other languages have different rules. For example, Russian has three forms: singular, a plural for numbers ending in 2, 3, and 4 (except those ending in 12, 13, and 14), and a general plural form for other numbers (“1 сообщение”, “2 сообщения”, “5 сообщений”, etc.). Arabic has six plural forms. If we plan to add these languages to BookBrainz, we need a way to handle such cases in the translation files.

In i18next we can handle cases like this by combining pluralization with interpolation.

For example, each plural form gets its own translation key, which carries the count as an interpolated value.

These translations can then be used in our React component by passing the count to the t function.

That is a simple case in English, but what about languages with more plural forms, like Arabic, which has six? In that case we can give each plural form its own translation key.

Keys

Sample
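A sketch of plural keys for English and Arabic, assuming the suffix style of recent i18next versions, where the plural category (`one`, `few`, `other`, and so on) is appended to the key. The `translatePlural` helper is a simplified stand-in built on the standard `Intl.PluralRules` API, not i18next itself, and the Arabic values are placeholder labels rather than real translations.

```javascript
// Plural keys use the plural category as a suffix.
// The Arabic values are placeholder labels, not real translations.
const resources = {
  en: {
    message_one: '{{count}} message',
    message_other: '{{count}} messages'
  },
  ar: {
    message_zero: 'zero',
    message_one: 'singular',
    message_two: 'two',
    message_few: 'few',
    message_many: 'many',
    message_other: 'other'
  }
};

// Simplified stand-in for t(key, { count }): picks the key by plural category.
function translatePlural(lng, key, count) {
  const category = new Intl.PluralRules(lng).select(count);
  const table = resources[lng];
  // Fall back to the "other" form when a category has no dedicated key.
  const template = table[`${key}_${category}`] || table[`${key}_other`];
  return template.replace('{{count}}', String(count));
}

console.log(translatePlural('en', 'message', 1));
console.log(translatePlural('en', 'message', 2));
console.log(translatePlural('ar', 'message', 11));
```

In a component, the equivalent call would be along the lines of `t('message', { count: 2 })`, with i18next choosing the suffix.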

This may seem very redundant to write. It would be better if we could condense all of this into a single line in our translation keys.

We can: all we have to do is use the i18next interval plural post-processor plugin and format the above translation keys as intervals.
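To illustrate the interval format, here is a simplified reader for an interval key. The `resolveInterval` function is a hypothetical illustration of how such a string can be interpreted, not the plugin's code; in practice the plugin does this when the t function is called with its post-processor enabled.

```javascript
// Interval key in the "(range)[text];" format.
const intervalKey = '(1)[one item];(2-7)[a few items];(7-inf)[a lot of items];';

// Simplified reader for that format (illustration only, not the plugin's code):
// returns the text of the first interval that contains `count`.
function resolveInterval(spec, count) {
  const pattern = /\((\d+)(?:-(\d+|inf))?\)\[([^\]]*)\]/g;
  let match;
  while ((match = pattern.exec(spec)) !== null) {
    const from = Number(match[1]);
    let to = from; // "(1)" means exactly 1
    if (match[2] === 'inf') {
      to = Infinity;
    } else if (match[2] !== undefined) {
      to = Number(match[2]);
    }
    if (count >= from && count <= to) {
      return match[3];
    }
  }
  // No interval matched: a caller would fall back to the regular plural keys.
  return undefined;
}

console.log(resolveInterval(intervalKey, 1));
console.log(resolveInterval(intervalKey, 4));
console.log(resolveInterval(intervalKey, 50));
```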

(The code samples in this section are copied from the i18n documentation: Link to documentation)

Working Prototype

To demonstrate the method described above for setting up internationalization in Bookbrainz, I created a single-page prototype that adds support for English and Spanish languages on the homepage of Bookbrainz. Everything is set up as explained above, and the code can be found in the GitHub link provided. Please remember that this is just a single-page prototype, and setting up a multi-page setup still requires some adjustments, which will be discussed shortly.

Video Link : youtube link

Github Link : Github link to my branch

How will a user change the language ?

To change the language, the user needs a way to send a request containing the desired language (for example, in the query params) to the server routes. The routes will then extract the requested language code, load the translations for that language, and send them to the client, where they will be displayed as usual.

To let the user change the language, we can use a drop-down selector listing all the supported languages.

Something like in MusicBrainz

Let’s say the user wants to set the language to Spanish and sends a request to the server with the query param set. We then have to modify our existing route code to handle it.

Extracting the requested language from query params and changing the language accordingly

After updating the language, we’ll store the new preference in a cookie. This will enable i18n to detect the user’s preferred language and automatically adjust the website language during subsequent visits.
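The decision made in the route can be sketched as a small pure function. The supported languages, the `lng` query parameter, and the `i18next` cookie name are assumptions for illustration, and `resolveRequestLanguage` is a hypothetical helper rather than existing BookBrainz code.

```javascript
// Assumed for illustration: supported languages, the "lng" query parameter,
// and the "i18next" cookie name.
const SUPPORTED_LANGUAGES = ['en', 'es'];
const FALLBACK_LANGUAGE = 'en';

// Decide the language for a request: an explicit ?lng= wins, then the cookie.
// Also reports whether the cookie needs to be written.
function resolveRequestLanguage(query, cookies) {
  const isSupported = (lng) => SUPPORTED_LANGUAGES.includes(lng);
  if (isSupported(query.lng)) {
    return { language: query.lng, setCookie: query.lng !== cookies.i18next };
  }
  if (isSupported(cookies.i18next)) {
    return { language: cookies.i18next, setCookie: false };
  }
  return { language: FALLBACK_LANGUAGE, setCookie: false };
}

// In an Express route this could be used roughly as:
//   const { language, setCookie } = resolveRequestLanguage(req.query, req.cookies);
//   if (setCookie) { res.cookie('i18next', language); }
console.log(resolveRequestLanguage({ lng: 'es' }, {}));
```

Checking the requested code against a list of supported languages keeps arbitrary query values out of the cookie.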

Our i18n configuration also needs some changes so that it can detect the user’s preferred language from cookies.

Do you remember the middleware we used before, i18next-http-middleware? It also provides a plugin that detects the user’s language from the path, cookie, header, query params, or session of the request. Using this feature requires a small modification to our i18n server instance.
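A sketch of the detection settings that could be added to the server instance. The lookup order and the parameter and cookie names are assumptions chosen for illustration; the object would be passed as the `detection` entry of the server options once the middleware's language detector is registered.

```javascript
// Sketch of detection settings for the server instance (values are assumptions).
const detectionOptions = {
  // Sources are tried in this order until one yields a language.
  order: ['querystring', 'cookie', 'header'],
  // ?lng=es in the URL selects Spanish.
  lookupQuerystring: 'lng',
  // Name of the cookie that stores the user's choice.
  lookupCookie: 'i18next',
  // Persist the detected language in the cookie for later visits.
  caches: ['cookie']
};

// Added to the server options along the lines of:
//   { ...serverOptions, detection: detectionOptions }
console.log(detectionOptions.order.join(' > '));
```

Putting the query string first means an explicit choice from the language selector always overrides a stored cookie or the browser's Accept-Language header.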

The next time a user visits the site, it will load with the language that was set by the user during their previous visit.

Setting up new Weblate Project

Luckily for us, we already have a hosted Weblate server.

Add BookBrainz as a new project (the screenshots are from my local instance, for demonstration purposes).

Next, add components (folders) containing the translation files (.json files). Let’s say our translations are located in the path ./src/public/locales. We can name our component ‘locales’, but we can always structure our translation files into different components according to our needs. For example, we can have a separate component for server messages and another one for client-side strings.

While creating new components, Weblate provides us with loads of configuration options. For example:

  • we can choose a different git repo to push the translation changes to
  • or maybe a different branch
  • we can also define the file mask for our translation file path
  • how new translations should be handled
  • using a template for new translations

...and many more.

Our new component, with its supported languages, is now ready to be translated in our project.

The translator page will look something like this (looks familiar, right? It is the same as for MusicBrainz, but now also available for BookBrainz!)

Similarly, we can set up other components in our project, each containing their own translation files.

Continuous Translation Integration

Someone new to Weblate may wonder how the translations move from Weblate to our project after the translators have done their work. It’s quite simple and works just like GitHub: there, we make changes to the code, commit them, and push them upstream. Here, just replace the word “code” with “translations”: Weblate commits the translations and pushes them upstream, where an admin can review the changes and merge them into the source repository.

Timeline

  • 1 May - 26 May : Community bonding period
    • Continue reading the Weblate and i18n docs, exploring more options that fit our use case better
    • Have a closer look at MusicBrainz’s internationalization and translation
  • 27 May - 2 June : coding period ( Week 1 )
    • Setting up the initial translation folder with English as the initial language, also adding some translation keys for testing purposes
    • Setting up the i18next server and client instances
    • Refactoring a single React class component page into a functional component page
    • Testing the i18next server instance with a single route of the refactored page, fixing any bugs in the process
  • 3 June - 9 June : coding period ( Week 2 )
    • Setting up i18n on the server for multi-page access and adding the language changing mechanism
    • Testing the language changing mechanism on three or four pages, fixing any bugs in the process
  • 10 June - 16 June : coding period ( Week 3 )
    • Setting up the i18n instance with the remaining pages
  • 17 June - 30 June : coding period ( Week 4 , Week 5)
    • Converting the pages with React class components into functional components
    • Begin working on static client-side strings, page by page
    • Adding translation keys and their English translations, while also addressing cases of pluralization and gender
  • 1 July - 14 July : coding period ( Week 6 , Week 7 )
    • Begin working on strings we receive from the database, for example relationship-type descriptions
    • Concurrently converting pages to functional components if needed
  • 15 July - 21 July : coding period ( Week 8 )
    • Buffer week
  • 22 July - 28 July : Midterm evaluation
  • 29 July - 4 Aug : coding period ( Week 9 )
    • Setting up Weblate locally for testing purposes
    • Creating a new project in Weblate and adding the existing components
    • Configuring Weblate
    • Testing the Weblate Continuous Translation Integration with a dummy user on my local repo
    • If everything works fine, copying the Weblate project to MetaBrainz’s hosted Weblate server and testing it with a separate branch in the BookBrainz repo
  • 5 Aug - 11 Aug : coding period ( Week 10 )
    • Buffer week
  • 12 Aug - 18 Aug : coding period ( Week 11 )
    • Finish all started tasks and ensure that everything is in a working state
    • Write a Blog post about this project
  • Final submission and evaluation

About me

My name is Tarun Meena, and I am currently pursuing my Bachelor of Technology in Computer Science and Engineering from the National Institute of Technology Patna. I am currently in my pre-final year.

During my freshman year, I studied the C programming language. In my sophomore year, I completed courses in Java, Database Management Systems, Operating Systems, and Computer Networks & Their Protocols.

Although my college studies have played a vital role in developing a strong understanding of fundamental computer science concepts, most of my skills related to web development have been acquired through self-learning. I have been learning and practicing full-stack web development with JavaScript/TypeScript and their libraries for almost a year now. During this period, I have completed some small projects, which can be found on my GitHub profile.

As for my hobbies, there are not many, but some of them include weightlifting, playing soccer, and occasionally playing video games to unwind.

Open Source Contribution

It has not been long since I got to know about open source and how to contribute to these organizations. I started my open source journey with BookBrainz in December 2023. Since then I have learned a lot, from frontend to backend, as this was my first time looking at a large-scale application codebase.

During this period I also worked on some bugs in the bookbrainz-site repo. A link to my PRs is provided below.

My PRs : Pull requests · metabrainz/bookbrainz-site · GitHub

Other Information

  • What type of music do you listen to ?

I like slow instrumental jazz and calm old Hindi classics like Lag Ja Gale by Lata Mangeshkar and Kuch Na Kaho by Kumar Sanu. I also like listening to Ed Sheeran; his Thinking Out Loud is one of my favorites. The list goes on.

  • What aspects of MusicBrainz/ListenBrainz/BookBrainz/Picard interest you the most ?

Since I joined the Metabrainz community, most of my time has been spent on BookBrainz. As a developer, the aspect I love most is how well-organized the codebase is, with every part of the website linked to each other. Additionally, we have a super helpful and friendly community of other developers.

  • Have you ever used MusicBrainz Picard to tag your files or used any of our projects in the past?

Unfortunately, I haven’t had the opportunity to explore MusicBrainz Picard yet, but I’ve heard so much about it. I definitely want to give it a try.

  • When did you first start programming?

Even though I had been using computers since I was 10 years old, I wrote my first code in school when I was in 8th grade, back in 2016. It was in the C language. At that time, my interest in coding was not very strong, but gradually, as I learned more, it has been increasing ever since.

  • What sorts of programming projects have you done on your own time?

Coin-info: a website made with react (typescript) which gives you real time market information of Bitcoin

Live link : https://coin-master-ten.vercel.app/

Github: GitHub - Tarunmeena0901/KoinX

Coursii : a full stack application made with react , express and mongodb

Github: GitHub - Tarunmeena0901/Coursii: A website where user or admin login to his/her account and one can launch there own course or can join courses launched by others. made using react.js , recoil library , Express.js , Mongodb

  • What computer(s) do you have available for working on your SoC project?

I own an HP OMEN laptop with 16 GB RAM, a 1 TB SSD, a Ryzen 7 5800H, an RTX 3060 GPU (6 GB), and a 144 Hz display.

  • How much time do you have available per week, and how would you plan to use it?

During this project I will be on my long summer vacation with no other commitments, so I can easily devote up to 30 hours per week. I think this should suffice, as our main goal is a full translation project and workflow set up, with as much as possible of the website text captured for translation. However, I am ready to stretch the goals of this project and keep working on it after GSoC to increase the translation coverage, or maybe reach full internationalization.

Please feel free to point out any mistakes, suggest better ideas, or ask any questions about this proposal. I will be grateful for any feedback.
Thank you

Hello, @itachi_0x0 , welcome to BookBrainz, and thank you for your interest in contributing to BookBrainz through GSoC.

My name is Jim, and I am an ordinary MusicBrainz contributor, not a BookBrainz or GSoC authority. So my comments are just those of an interested user. I do, however, have some experience working in application internationalisation and translation.

I love that you are interested in improving the i18n for BookBrainz. However, I suggest you consider using the term “string translation” rather than “internationalization” for this project’s title.

To me, internationalisation (i18n) certainly includes the infrastructure for delivering translated strings to the UI and app logic. However, it also includes much more: locale-appropriate display of dates and numbers and currency, grammatical adjustments like plurals, translated place names, locale-related features, and cross-locale features.

An example of a locale-related feature might be book identifiers. ISBN numbers are widely used, but are there other identifiers which are in use in some places or in some other times? The infrastructure for allowing multiple identifiers, based on locale, is part of i18n.

An example of a cross-locale feature might be transcription of author and book names from the original script to the user’s script. A reader of Punjabi might want BookBrainz to display the author name 紫 式部 as ਮੂਰਾਸਾਕੀ ਸ਼ੀਕੀਬੂ. The infrastructure for that kind of transliteration is also part of i18n.

I don’t mean to criticise the proposal, just to suggest that you use a label appropriate to its scope. I hope these ideas are encouraging and helpful for you. Again, I am just another MusicBrainz contributor. If these ideas conflict with what you and BookBrainz need for a successful GSoC, feel free to disregard them. Best of luck!

Hi @Jim_DeLaHunt! I am very thankful that you shared and pointed this out. I agree with how you defined internationalization. The things that you mentioned, like:

However, it also includes much more: locale-appropriate display of dates, numbers, and currency; grammatical adjustments such as plurals; translated place names; locale-related features; and cross-locale features.

are indeed part of this project. Maybe I haven’t mentioned much about them in my proposal, but I do have these in mind because I thought the name ‘internationalization’ makes these things obvious to be included in the project.

Now that you have mentioned it, I think I should include a small section for each of them in the proposal.

Thank you.

Thanks for your submission @itachi_0x0 !

I like that you went into detail on the server-side rendering and cookie setting, but the proposal is missing some details on the string extraction aspect of the task. I think you may be underestimating how long that part will take, especially considering the plurals and other grammatical variations.

Regarding the timeline, I can see that you left yourself only two weeks to do the bulk of the work of extracting the text while rewriting components from class to functional.
That does not seem like a lot of time at all, especially considering the size of the project.
A quick calculation of 175 hours for the project over 11 weeks gives us 15 hours a week. 30 hours for all this seems short really.

On the other hand you have two weeks reserved for setting up Weblate locally, configuring it and testing integration, which seems on the surface like a few days’ work.

So some reorganization of the timeline at least is required, or the project size is wrong.
I think it would be better not to rewrite the components to functional for a 175 hours project. Basically doing two projects at the same time is going to make it harder for you to focus on the goal and harder for me to review PRs.

Regarding translation vs. internationalization, @Jim_DeLaHunt is totally right and I was using the wrong term when I wrote the brief for this suggested project.

If full internationalization is part of the project proposal too, then the project length is definitely too short!
If it is part of it, some mention of it would indeed be useful and some planned time in the timeline, although it seems late now to evaluate how much more work that would take.

I think you may be underestimating how long that part will take, especially considering the plurals and other grammatical variations.

You were right; I think I was underestimating this part. So, I gave it another thought and added some extra time for working on static and dynamic strings (for example, relationship-types descriptions) in my timeline.

On the other hand you have two weeks reserved for setting up Weblate locally, configuring it and testing integration, which seems on the surface like a few days’ work

At the time of testing my local Weblate instance, I thought that doing this for the whole project would take some time. That’s why I gave it an extra week, but now I think it should not take more than a week. So, I changed that in my timeline too.

I think it would be better not to rewrite the components to functional for a 175 hours project.

While it may seem like a separate undertaking, upon reviewing the codebase recently, I’ve found that only a handful of pages needed these modifications. Furthermore, I’ve already converted and tested numerous pages from class components to functional ones in my local environment for testing purposes. As a result, this task shouldn’t consume a significant portion of the project’s time. Moreover, I think transitioning to functional components instead of classes will only simplify things and reduce confusion.
But if this subtask seems infeasible for this project, I believe we can certainly find alternative methods to overcome this problem.

If full internationalization is part of the project proposal too, then the project length is definitely too short!

Surely, full internationalization of BookBrainz will take more time than a medium-sized project allows. I think that’s why in the expected outcome, we have written:

Full translation project and workflow set up, with as much of the website text captured for translation as possible.

So, we will try to translate as much text as possible for this project. However, I am ready to stretch the goals of this project and work on it further to increase the translation coverage in the upcoming post GSoC time.

I have made these changes in the PDF version of my GSoC proposal.