GSoC 2024
Set up BookBrainz for Internationalization
Project information:
- Proposed Mentors: monkey
- Languages/skills: Javascript/Typescript
- Estimated Project Length: 175 hours
- Project Size: Medium
- Expected outcomes: Full translation project and workflow set up, with as much as possible of the website text captured for translation
Contact information:
- Name: Tarun Meena
- Email: meenatarun0901@gmail.com
- Libera.Chat IRC: Tarun_0x0
- GitHub: meenatarun0901
- Twitter: Tarun_0x0
- Timezone: GMT+5:30 (India Standard Time)
Synopsis:
BookBrainz holds metadata about books from all around the world, in many languages. It is not only a comprehensive database but also a vibrant community of book lovers from different linguistic and cultural backgrounds. However, despite the many languages represented within its user base, the website is currently available only in English.
Therefore, we need a way to internationalize BookBrainz and set up a community translation system. For internationalization, we will use i18next, an internationalization (i18n) framework. For the translation system, we will use Weblate, an open-source web-based translation tool with version control integration.
This project will set up BookBrainz with i18next and Weblate, making all the changes to the current state of the website that are required to achieve that goal.
Why i18next?
You may ask why i18next. The main reasons are:
- It’s a mature library: i18next is open source and fits most internationalization use cases.
- It’s feature-rich: i18next offers more than basic string lookup. It can split translations into multiple files, detect the user’s language through plugins, and cache translations locally and load them on demand to deliver localized content to users across the world.
- Active community: i18next is well maintained. I have personally talked to the maintainers, and they are quick to respond to any query.
Implementation:
The implementation process is divided into the following parts:
- Setting up i18n
- Using the translations in frontend
- Working Prototype
- How will a user change the language?
- Setting up new Weblate Project
- Continuous Translation Integration
Setting up i18n:
Server Side:
The first point to note is that BookBrainz is constructed using React as its frontend framework. It uses server-side rendering to preload pages on the server before transmitting them to the client.
This approach leads to faster initial page loads because the server sends pre-rendered HTML to the client, reducing the time it takes for the user to see content. Similarly, we can use this setup to load the translations for the requested language on the server and send them to the client. Let’s see how.
Let’s say our translation files are located in ./src/public/locales.
Then we need to set up the configuration of our server-side i18n instance. It looks something like this for now, but can be changed later if we need to address special cases.
Details about every configuration option can be found in the i18n configuration docs, but I would like to give a high-level overview of some properties and how they are useful for us:
- fallbackLng: specifies the fallback language to use if a translation is not available in the requested language.
- backend.loadPath: tells the i18next-fs-backend plugin where to load the translation files from on the local file system.
- preload: an array of language codes whose translations are loaded when the i18n instance is initialized. Preloading on the server means the translations are already in memory when a request arrives, instead of being read from disk on the first request for each language.
- load: the strategy that defines which language codes to look up. For example, if the language is set to en-US: ‘all’ ⇒ [‘en-US’, ‘en’, ‘dev’], ‘currentOnly’ ⇒ ‘en-US’, ‘languageOnly’ ⇒ ‘en’.
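As a rough sketch, the server-side options described above might look like the following. The language list (`en`, `es`), the `translation` namespace and the `{{lng}}/{{ns}}.json` file layout under `./src/public/locales` are assumptions for illustration. The `lookupCodes` helper is hypothetical: it only mirrors the `load` example given above and is not part of i18next.

```javascript
// Hypothetical options for the server-side i18next instance.
// In the real setup this object would be passed to i18next's init() after
// registering the i18next-fs-backend plugin with use().
const serverOptions = {
  fallbackLng: 'en',
  supportedLngs: ['en', 'es'], // assumed language list
  preload: ['en', 'es'], // languages loaded into memory at start-up
  load: 'languageOnly',
  ns: ['translation'], // assumed namespace name
  defaultNS: 'translation',
  backend: {
    // {{lng}} and {{ns}} are placeholders filled in by the backend plugin
    loadPath: './src/public/locales/{{lng}}/{{ns}}.json',
  },
};

// Illustration only: which language codes each `load` strategy looks up,
// following the en-US example in the list above.
function lookupCodes(language, load, fallbackLng) {
  const languageOnly = language.split('-')[0];
  if (load === 'currentOnly') return [language];
  if (load === 'languageOnly') return [languageOnly];
  // 'all': the full code, the bare language and the fallback, without duplicates
  return [...new Set([language, languageOnly, fallbackLng])];
}

const codes = lookupCodes('en-US', 'all', 'dev'); // ['en-US', 'en', 'dev']
```

With `load: 'languageOnly'`, a request for `es-MX` would be served from the plain `es` files, which keeps the number of translation files small while the project starts out.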
After initializing the i18n instance with this configuration, we want to use it in every route. To do so, we are going to use a middleware, which plays an important role in this project.
i18next-http-middleware:
- Attaches the initialized i18n instance to the req object
- Can detect the requested language from the query string, cookies, session or headers of the request
Now we can finally use our i18n instance in our routes, for example in our src/server/routes/index.js file.
We can see here that we are cloning our i18n instance and then passing it into our React tree through I18nextProvider.
The I18nextProvider component serves as a provider for internationalization (i18n) capabilities. It wraps the components that require access to i18n functionality, ensuring that they can access the language resources and configuration settings of the i18nServer instance.
Now, the only part left on the server side is to load the translations and send them to the client.
One way of doing that is to extract the translations and the initial language from the server’s i18n instance, write them as stringified values into global window variables in the pre-rendered HTML, and then read them on the client side. Continuing in the same route, it looks like this:
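A minimal sketch of that hand-off, assuming the variable names `window.initialI18nStore` and `window.initialLanguage` (illustrative names) and a hypothetical helper `renderI18nBootstrap`. In the real route, the store and language would be read from the request’s i18n instance instead of being written by hand.

```javascript
// Build the <script> tag that hands the server's translations to the client.
// `store` maps language -> namespace -> key/value pairs.
function renderI18nBootstrap(store, language) {
  // Escape "<" so a translation containing "</script>" cannot end the tag early.
  const safe = (value) => JSON.stringify(value).replace(/</g, '\\u003c');
  return (
    '<script>' +
    `window.initialI18nStore = ${safe(store)};` +
    `window.initialLanguage = ${safe(language)};` +
    '</script>'
  );
}

// Example with a hand-written Spanish store (the key and text are made up).
const bootstrapTag = renderI18nBootstrap(
  { es: { translation: { welcome: 'Bienvenido a BookBrainz' } } },
  'es'
);
```

The escaping step matters because translations are contributed by the community: without it, a string containing `</script>` would break out of the tag in the pre-rendered page.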
We are finished with setting up the i18n instance on the server side
Client Side:
In the previous part we set up our i18n instance and used it inside the routes on our server. Now the question arises: can we use the same instance for our client too? We can, but this can cause various problems, for example:
- Server-side rendering requires preloading language resources, which may differ from how resources are managed on the client side. Using separate instances allows for tailored resource management strategies.
- In scenarios where both server-side and client-side rendering occur simultaneously (e.g., server-side rendering followed by client-side hydration), ensuring proper synchronization of i18n resources and state between the server and the client can be challenging with a single instance.
So for our client we will initialize another instance, i18n-client, with configuration more relevant to the client environment.
In this src/i18n-client.js file you can see that there are fewer, and some different, properties compared to the i18n instance on the server. Again, the property definitions can be found in the i18n configuration docs.
The interesting part here is that i18n.use(initReactI18next) passes the i18n instance to react-i18next, which makes it available to all React components. It enables the hooks and components provided by react-i18next, such as the useTranslation hook, which give React components easy access to i18n functionality.
The interpolation property controls how dynamic values are integrated into translations; it is currently set to the usual setting for React.
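For illustration, the client options could be as small as the sketch below. The exact values are assumptions, not the final configuration; the object would be passed to `init()` after `i18n.use(initReactI18next)`.

```javascript
// Hypothetical options for the client-side instance in src/i18n-client.js.
const clientOptions = {
  fallbackLng: 'en',
  defaultNS: 'translation', // assumed namespace name
  interpolation: {
    // React already escapes rendered strings, so i18next's own escaping is
    // switched off to avoid double-escaped values.
    escapeValue: false,
  },
  react: {
    // Assumption for this setup: translations arrive with the server-rendered
    // page, so Suspense-based loading is switched off.
    useSuspense: false,
  },
};
```

There is no `backend` or `preload` entry here because, in this approach, the client receives its translations from the server-rendered page.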
Before hydrating our React components, we will import our i18n-client instance and use a hook that is specifically designed for server-side rendering (SSR) with React applications.
useSSR:
The useSSR hook in react-i18next helps with managing i18n resources in an SSR setup. On the client, it takes the translations and initial language that the server embedded in the page and loads them into the client i18n instance, so that hydration renders the same strings the server did. This is important because the server has no access to the browser’s environment, such as the window object, and the client would otherwise start without any translations loaded.
Have a look at this example in the src/client/controllers/index.js file:
Finally, our i18n setup is configured for server-side translation loading and ready to be used in the frontend to display translations. However, some adjustments are still needed to change languages properly and apply them across the entire website. We’ll address these shortly.
Using the translations in frontend
This part is fairly easy: all we have to do is use the functionality provided by react-i18next. The most important pieces are the t function, the Trans component and the useTranslation hook.
- The useTranslation hook gives you the t function and the i18n instance inside a functional component.
- The t function is the key utility for translating text within React components. It translates a text string into the current language set in the i18n configuration.
- The t function covers most of our use cases, but cases where the translated text contains React elements (a link inside a sentence, for example) need the Trans component, like this one:
Easy, right? Here’s a twist. BookBrainz uses some legacy React: some of the components are React class components, and some are functional components. While testing my approach for setting up i18n with BookBrainz, I found that it doesn’t work very well to use i18n in both class components and functional components at the same time. It’s better to choose one.
This is an extra task, but I suggest we convert the class components that need translations (not all of them) to functional components. Luckily, this is not a time-consuming task because, to my knowledge, all the BookBrainz entity pages are already written as functional components; only a few pages need these changes, for example the index page. This will allow us to use hooks in React and avoid some of the higher-order components provided by react-i18next, like withTranslation and withSSR.
Here is an example of the t function and the useTranslation hook being used in the index.js page of BookBrainz after converting it from a class component to a functional component:
That is it for using translations in the frontend. We can change the way we display translations as needed; for example, to display a dynamic value, we can put a placeholder in the translation and pass the value to the t function like this:
Translation Keys
Sample
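To make the idea of a dynamic value concrete, here is a small stand-in for what happens to `{{...}}` placeholders. The key `welcomeUser` and the `interpolate` helper are made up for this sketch; in the real component the call would be `t('welcomeUser', { name })`.

```javascript
// Hypothetical translation key with a dynamic value.
const en = {
  welcomeUser: 'Welcome back, {{name}}!',
};

// Minimal stand-in for the placeholder replacement done by the t function.
// Unknown placeholders are left untouched so that missing values stay visible.
function interpolate(template, values) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
    key in values ? String(values[key]) : match
  );
}

const greeting = interpolate(en.welcomeUser, { name: 'Tarun' });
// greeting === 'Welcome back, Tarun!'
```

Keeping the whole sentence in one key, placeholder included, lets translators move the name to wherever it belongs in their language.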
How will we handle grammatical adjustments in case of plurals?
Without pluralization, we might end up with awkward or grammatically incorrect translations. Pluralization support in internationalization (i18n) produces more natural translations by letting the application follow the plural rules of each language.
The problem is that plural forms depend on the language. In English, you typically use the singular form when the count is 1 (“1 message”) and the plural form otherwise (“2 messages”). Other languages have different rules. In Russian, for example, there are three forms for whole numbers: one for numbers ending in 1 (except those ending in 11), one for numbers ending in 2, 3 or 4 (except those ending in 12, 13 or 14), and a general plural form for all other numbers (“1 сообщение”, “2 сообщения”, “5 сообщений”, etc.). Similarly, Arabic has six plural forms. So if we plan to add these languages to BookBrainz, we need a way to handle such cases in the translation files.
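These differences can be checked with the standard `Intl.PluralRules` API, which implements the CLDR plural categories. To my knowledge, current i18next versions use the same categories as suffixes for plural keys. The helper name `pluralCategory` is only for this sketch.

```javascript
// Plural category chosen for a count in a given language (CLDR rules via Intl).
function pluralCategory(language, count) {
  return new Intl.PluralRules(language).select(count);
}

const samples = {
  // English: singular for 1, plural otherwise
  en: [1, 2].map((n) => pluralCategory('en', n)),
  // Russian: three forms for whole numbers
  ru: [1, 2, 5].map((n) => pluralCategory('ru', n)),
  // Arabic: six forms
  ar: [0, 1, 2, 3, 11, 100].map((n) => pluralCategory('ar', n)),
};
// samples.en -> ['one', 'other']
// samples.ru -> ['one', 'few', 'many']
// samples.ar -> ['zero', 'one', 'two', 'few', 'many', 'other']
```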
In i18next we can handle cases like this by combining pluralization with interpolation.
For example:
Now these translations can be used in our React component:
That is a simple case in English, but what about languages with more plural forms, such as Arabic with its six? In that case we can format our translations something like this:
Keys
Sample
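The sketch below shows how one base key fans out into one entry per plural category, and how a count picks the right entry. It is a simplified stand-in written for this proposal, not i18next’s implementation. The key name `book` is made up, and the Arabic values are English placeholders, not real translations.

```javascript
// One entry per plural category, using the category name as the key suffix.
const resources = {
  en: {
    book_one: '{{count}} book',
    book_other: '{{count}} books',
  },
  ar: {
    // Placeholder texts only; a translator would supply the Arabic strings.
    book_zero: 'zero form ({{count}})',
    book_one: 'singular form ({{count}})',
    book_two: 'dual form ({{count}})',
    book_few: 'few form ({{count}})',
    book_many: 'many form ({{count}})',
    book_other: 'other form ({{count}})',
  },
};

// Choose the suffix from the language's plural rules, then fill in the count.
function translatePlural(language, key, count) {
  const category = new Intl.PluralRules(language).select(count);
  const table = resources[language];
  const template = table[`${key}_${category}`] ?? table[`${key}_other`];
  return template.replace(/\{\{count\}\}/g, String(count));
}

const oneBook = translatePlural('en', 'book', 1); // '1 book'
const threeBooks = translatePlural('en', 'book', 3); // '3 books'
```

In a component, the equivalent call would be `t('book', { count })`: the code always refers to the base key and the library selects the suffix.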
This may seem very redundant to write. It would be better if we could condense all of this into a single line in our translation keys.
We can: all we have to do is use the i18next interval plural post-processor plugin (i18next-intervalplural-postprocessor) and format the above translation keys something like this:
(The code samples in this section are copied from the i18n documentation. Link to documentation)
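To show how a single interval string can replace several plural keys, here is a simplified resolver that I wrote for illustration. The string follows the interval format as I understand it from the plugin’s documentation; `pickInterval` is a hypothetical helper and not the plugin’s code.

```javascript
// One translation value covering several count ranges.
const itemsInterval = '(1)[one item];(2-7)[a few items];(7-inf)[a lot of items];';

// Return the text of the first interval that contains `count`.
function pickInterval(intervalSpec, count) {
  // Matches "(n)[text]" and "(from-to)[text]", where "to" may be "inf".
  const pattern = /\((\d+)(?:-(\d+|inf))?\)\[([^\]]*)\]/g;
  for (const [, from, to, text] of intervalSpec.matchAll(pattern)) {
    const low = Number(from);
    const high = to === undefined ? low : to === 'inf' ? Infinity : Number(to);
    if (count >= low && count <= high) return text;
  }
  return undefined; // no interval matched
}

const few = pickInterval(itemsInterval, 4); // 'a few items'
const lots = pickInterval(itemsInterval, 100); // 'a lot of items'
```

When two intervals overlap, as at 7 in this string, this sketch returns the first match. How the plugin itself resolves overlaps would need to be checked against its documentation.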
Working Prototype
To demonstrate the method described above for setting up internationalization in BookBrainz, I created a single-page prototype that adds support for English and Spanish on the BookBrainz homepage. Everything is set up as explained above, and the code can be found at the GitHub link provided. Please remember that this is just a single-page prototype; a multi-page setup still requires some adjustments, which will be discussed shortly.
Video Link : youtube link
Github Link : Github link to my branch
How will a user change the language?
To change the language, the user needs a way to send a request containing the desired language (for example, in the query parameters) to the server routes. The route will extract the requested language code, load the translations for that language and send them to the client, where they will be displayed as usual.
To let the user change the language, we can use a drop-down selector listing all the supported languages,
something like the one in MusicBrainz.
Let’s say the user wants to set the language to Spanish and sends a request to the server with the query parameter set. We then have to modify our existing code something like this:
Extracting the requested language from the query parameters and changing the language accordingly:
After updating the language, we’ll store the new preference in a cookie. This will enable i18n to detect the user’s preferred language and automatically adjust the website language during subsequent visits.
Our i18n configuration also needs some changes so that it can detect the user’s preferred language from cookies.
Do you remember the middleware we used before, i18next-http-middleware? It also provides a plugin that detects the user’s language from the path, cookie, header, query parameters or session of the request. Using this feature requires a small modification to our i18n server instance:
The next time a user visits the site, it will load in the language that the user set during their previous visit.
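The detection order can be sketched as follows. The `detection` object uses option names from the middleware’s language detector as far as I know them (`lng` and `i18next` being its default query parameter and cookie names); the supported language list and the `detectLanguage` helper are assumptions that only imitate the behaviour for illustration.

```javascript
const SUPPORTED = ['en', 'es']; // assumed language list

// Detection settings that would be added to the server instance's options.
const detection = {
  order: ['querystring', 'cookie', 'header'],
  lookupQuerystring: 'lng',
  lookupCookie: 'i18next',
  caches: ['cookie'], // remember the detected language for the next visit
};

// Simplified imitation of the detector: first supported language wins.
// Quality values (q=...) in Accept-Language are ignored in this sketch.
function detectLanguage(req, fallback = 'en') {
  const fromHeader = (req.headers['accept-language'] || '')
    .split(',')
    .map((part) => part.split(';')[0].trim());
  const candidates = {
    querystring: [req.query[detection.lookupQuerystring]],
    cookie: [req.cookies[detection.lookupCookie]],
    header: fromHeader,
  };
  for (const source of detection.order) {
    for (const candidate of candidates[source]) {
      const base = candidate ? candidate.split('-')[0].toLowerCase() : '';
      if (SUPPORTED.includes(base)) return base;
    }
  }
  return fallback;
}

// A returning visitor: no query parameter, but a cookie from the last visit.
const returningVisitor = detectLanguage({
  query: {},
  cookies: { i18next: 'es' },
  headers: { 'accept-language': 'en-US,en;q=0.9' },
}); // 'es'
```

Putting the query string first lets the language selector override a stored cookie, while the cookie still wins over the browser’s Accept-Language header on later visits.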
Setting up new Weblate Project
Luckily for us, we already have a hosted Weblate server.
Add BookBrainz as a new project (the screenshots are from my local instance, for demonstration purposes).
Adding folders (components) containing the translation files (.json files): let’s say our translations are located in ./src/public/locales. We can name our component ‘locales’, but we can always structure our translation files into different components according to our needs. For example, we can have a separate component for server messages and another one for client-side strings.
While creating new components, Weblate provides us with lots of configuration options. For example:
- we can choose a different Git repository to push the translation changes to
- or maybe a different branch
- we can define the file mask for our translation file paths
- we can choose how new translations should be handled
- we can use a template for new translations
...and many more.
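As an illustration of the file mask option: Weblate replaces the `*` in the mask with the language code of each translation file. The mask below assumes one `translation.json` per language under `src/public/locales`, and `languageFromPath` is a hypothetical helper that shows the mapping.

```javascript
// Assumed file mask for a component holding the client-side strings.
const fileMask = 'src/public/locales/*/translation.json';

// Return the language code a path maps to under the mask, or null if the
// path does not belong to this component.
function languageFromPath(mask, path) {
  const [prefix, suffix] = mask.split('*');
  if (!path.startsWith(prefix) || !path.endsWith(suffix)) return null;
  const code = path.slice(prefix.length, path.length - suffix.length);
  return code.length > 0 && !code.includes('/') ? code : null;
}

const spanish = languageFromPath(fileMask, 'src/public/locales/es/translation.json'); // 'es'
```

With this layout, the English file `src/public/locales/en/translation.json` would be the natural choice for the component’s base (source) language file.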
Our new component, with its supported languages, is now ready to be translated in our project.
The translator page will look something like this (looks familiar, right? It may be the same as MusicBrainz, but it is now also available for BookBrainz!)
Similarly, we can set up other components in our project, each consisting of their own translation files.
Continuous Translation Integration
Someone new to Weblate may wonder how the translations move from Weblate to our project after the translators have done their work. It’s quite simple. Just as on GitHub we make changes to code, commit them and push them upstream, Weblate does the same with translations: it commits the translation changes and pushes them, and an admin can then review the changes and merge them into the source repository.
Timeline
- 1 May - 26 May: Community bonding period
  - Continue reading the Weblate and i18n docs, exploring options that fit our use case better
  - Have a closer look at MusicBrainz’s internationalization and translation
- 27 May - 2 June: coding period (Week 1)
  - Setting up the initial translation folder with English as the initial language, and adding some translation keys for testing purposes
  - Setting up the i18next server and client instances
  - Refactoring a single React class component page into a functional component page
  - Testing the i18next server instance with a single route of the refactored page, fixing any bugs found in the process
- 3 June - 9 June: coding period (Week 2)
  - Setting up i18n on the server for multi-page access and adding the language changing mechanism
  - Testing the language changing mechanism with the initial setup on three or four pages, fixing any bugs found in the process
- 10 June - 16 June: coding period (Week 3)
  - Setting up the i18n instance on the remaining pages
- 17 June - 30 June: coding period (Week 4, Week 5)
  - Converting the pages with React class components into functional components
  - Begin working on static client-side strings, page by page
  - Adding translation keys and their English translations, while also addressing cases of pluralization and gender
- 1 July - 14 July: coding period (Week 6, Week 7)
  - Begin working on strings we receive from the database, for example relationship-type descriptions
  - Concurrently converting pages to functional components if needed
- 15 July - 21 July: coding period (Week 8)
  - Buffer week
- 22 July - 28 July: Midterm evaluation
- 29 July - 4 Aug: coding period (Week 9)
  - Setting up Weblate locally for testing purposes
  - Creating a new project in Weblate and adding the existing component
  - Configuring Weblate
  - Testing the Weblate continuous translation integration by making a dummy user in my local repo
  - If everything works fine, copying the Weblate project to MetaBrainz’s hosted Weblate server and testing it with a separate branch in the BookBrainz repo
- 5 Aug - 11 Aug: coding period (Week 10)
  - Buffer week
- 12 Aug - 18 Aug: coding period (Week 11)
  - Finish all started tasks and ensure that everything is in a working state
  - Write a blog post about this project
  - Final submission and evaluation
About me
My name is Tarun Meena, and I am currently pursuing my Bachelor of Technology in Computer Science and Engineering from the National Institute of Technology Patna. I am currently in my pre-final year.
During my freshman year, I studied the C programming language. In my sophomore year, I completed courses in Java, Database Management Systems, Operating Systems, and Computer Networks & Their Protocols.
Although my college studies have played a vital role in developing a strong understanding of fundamental computer science concepts, most of my skills related to web development have been acquired through self-learning. I have been learning and practicing full-stack web development with JavaScript/TypeScript and their libraries for almost a year now. During this period, I have completed some small projects, which can be found on my GitHub profile.
As for my hobbies, there are not many, but some of them include weightlifting, playing soccer, and occasionally playing video games to unwind.
Open Source Contribution
It has not been long since I got to know about open source and how to contribute to these organizations. I started my open-source journey with BookBrainz in December 2023, and since then I have learned a lot, from frontend to backend, as this was my first time looking at a large-scale application codebase.
During this period I also worked on some bugs in the bookbrainz-site repo. A link to my PRs is provided below.
My PRs : Pull requests · metabrainz/bookbrainz-site · GitHub
Other Information
- What type of music do you listen to?
I like slow instrumental jazz and calm old Hindi classics like Lag Ja Gale by Lata Mangeshkar and Kuch Na Kaho by Kumar Sanu. I also like listening to Ed Sheeran; his Thinking Out Loud is one of my favorites. And the list goes on.
- What aspects of MusicBrainz/ListenBrainz/BookBrainz/Picard interest you the most?
Since I joined the MetaBrainz community, most of my time has been spent on BookBrainz. As a developer, the aspect I love most is how well-organized the codebase is, with every part of the website linked to each other. Additionally, we have a super helpful and friendly community of other developers.
- Have you ever used MusicBrainz Picard to tag your files or used any of our projects in the past?
Unfortunately, I haven’t had the opportunity to explore MusicBrainz Picard yet, but I’ve heard so much about it. I definitely want to give it a try.
- When did you first start programming?
Even though I had been using computers since I was 10 years old, I wrote my first code in school when I was in 8th grade, back in 2016. It was in the C language. At that time, my interest in coding was not very strong, but gradually, as I learned more, it has been increasing ever since.
- What sorts of programming projects have you done on your own time?
Coin-info: a website made with React (TypeScript) which gives you real-time market information about Bitcoin
Live link : https://coin-master-ten.vercel.app/
Github: GitHub - Tarunmeena0901/KoinX
Coursii: a full-stack application made with React, Express and MongoDB
- What computer(s) do you have available for working on your SoC project?
I own an HP OMEN laptop with 16 GB RAM, a 1 TB SSD, a Ryzen 7 5800H CPU, an RTX 3060 6 GB GPU and a 144 Hz display.
- How much time do you have available per week, and how would you plan to use it?
During this project I will be on my long summer vacation with no other commitments, so I can easily devote up to 30 hours per week. I think this should suffice, as our main goal is a full translation project and workflow setup, with as much of the website text as possible captured for translation. However, I am ready to stretch the goals of this project and keep working after GSoC to increase the translation coverage, or perhaps reach full internationalization.