Set up BookBrainz for Internationalization
Contact Information
- Name: Ashutosh Vishwakarma
- Nickname: Ashu
- IRC nick/Matrix handle: @ashutosh_things:matrix.org
- Email: ashutoshv9648@gmail.com
- GitHub: AshutoshThings (Ashutosh V)
- Time Zone: UTC +5:30
About Me
I am Ashutosh Vishwakarma, a second-year Computer Science undergraduate at Jaypee Institute of Information Technology, Noida, India. I love reading books and listening to music. BookBrainz stood out to me as the perfect project because it is the open Wikipedia of published literature. I am an active MetaBrainz contributor with merged PRs including BB-882, LB-1913, and LB-1893.
Timing Conflicts
I have a full summer break during the GSoC coding period, with no classes, exams, or other commitments, so I can dedicate 35-40 hours per week to the project.
Proposed Project
This proposal adds a complete, SSR-compatible internationalization system to BookBrainz using i18next + react-i18next and integrates it with the central MetaBrainz Weblate instance. It will deliver per-request server-side translation loading, consistent client hydration with no mismatch, a language switcher in the navbar, automated extraction of static UI strings, and automated extraction of database type strings using stable primary-key IDs, all without any schema changes. Dynamic language switching uses a bb_lang cookie + client-side changeLanguage. Initial languages: en + hi (with full support for complex grammar/plurals and ready for RTL languages), and the language support can be scaled up later.
1. Problem Statement
BookBrainz currently has no unified internationalization system. From exploring the codebase and building working PoCs, I identified these concrete issues:
1.1 Hardcoded UI Strings
Most React components render English strings directly. There is no standardized way to extract or translate them.
1.2 SSR Hydration Constraints
BookBrainz uses server-side rendering with hydration:
- Server calls ReactDOMServer.renderToString(<Layout><Page/></Layout>)
- Client reads #propsJSON and calls ReactDOM.hydrate()
If translations load asynchronously on the client, the server renders English while the client renders translated content later. This causes hydration mismatches and visible English content on hard refreshes (especially on multi-entry controller routes).
1.3 Database-Driven UI Text
A large part of the UI comes from PostgreSQL (RelationshipType, IdentifierType, FormatType, etc.). These strings are not translatable today and are not connected to any localization system.
1.4 No Translation Workflow
There is no extraction pipeline for developers and no integration with Weblate, so translators have no frictionless way to contribute.
2. Goals
- Introduce a consistent i18n architecture that works with the existing SSR pipeline (src/server/app.js → generateProps → renderToString → #propsJSON → hydrate).
- Translate both static UI strings and system-defined database content.
- Establish a continuous localization workflow via the central MetaBrainz Weblate instance.
- Keep the solution incremental, maintainable, and non-invasive.
3. The Existing Architecture
- Server: Route handler → generateProps(req, res, data) (src/server/helpers/index.ts) → ReactDOMServer.renderToString → target() template.
- Client: Reads #propsJSON → reconstructs the exact same React tree → ReactDOM.hydrate.
src/server/helpers/i18n.ts already contains parseAcceptLanguage and getAcceptedLanguageCodes. My proposal extends that file. Translations must be available synchronously on the server so the rendered HTML already contains the correct language. The same translation data must be injected into client props for consistent hydration.
4. Architectural Decisions
I built a working Proof of Concept directly on the BookBrainz repository. The PoC implements middleware-based language detection, per-request i18n instance creation, resource pre-loading, and a functional language switcher with zero hydration mismatch.
- Live PoC code: View on Gist
- Live PoC video: View on Google Drive
The PoC validated the full end-to-end flow (SSR correctness, dynamic switching, no hydration mismatch) but used a slightly duplicated instance-creation pattern for rapid prototyping. My final proposal refines this into a clean single-source-of-truth Express middleware design (eliminating duplication while preserving everything demonstrated in the PoC).
5. Proposed Architecture
5.1 Server-side i18n Pipeline
I will use a single source of truth for the i18n instance: the Express middleware. Files:
- src/common/helpers/i18n.ts (shared factory)
- src/server/middleware/i18n.ts (Express middleware)
- src/server/helpers/renderI18nPage.ts (reusable SSR wrapper)
Middleware (src/server/middleware/i18n.ts):
import i18next from 'i18next';
import i18nextMiddleware from 'i18next-http-middleware';
import FsBackend from 'i18next-fs-backend';

// Creates the i18next instance once and returns the Express handler
// that attaches req.i18n and req.t to every request.
export function i18nMiddleware() {
  const i18nInstance = i18next.createInstance();
  i18nInstance
    .use(FsBackend)
    .use(i18nextMiddleware.LanguageDetector)
    .init({
      fallbackLng: 'en',
      supportedLngs: ['en', 'hi'], // initial list; can be scaled via Weblate
      preload: ['en', 'hi'], // read both languages from disk at startup
      ns: ['common', 'pages', 'editor', 'entities', 'db_types'],
      defaultNS: 'common',
      backend: { loadPath: 'public/locales/{{lng}}/{{ns}}.json' },
      // The bb_lang cookie takes priority over the Accept-Language header
      detection: { order: ['cookie', 'header'], caches: ['cookie'], lookupCookie: 'bb_lang' }
    });
  return i18nextMiddleware.handle(i18nInstance);
}
This middleware is mounted early in src/server/app.js (after cookie-parser, before the routes), and renderI18nPage uses req.i18n directly. This removes the duplicated instance creation from the PoC and makes the middleware the single source of truth.
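To illustrate the detection order configured above (cookie first, then the Accept-Language header, then the fallback), here is a minimal, dependency-free sketch. It is not how i18next-http-middleware is implemented internally; the names resolveLanguage and parseAcceptLanguageHeader are hypothetical and exist only for this illustration.

```typescript
// Hypothetical illustration of the 'cookie' → 'header' → fallback order.
// The real detection is performed by i18next-http-middleware.
const SUPPORTED_LANGUAGES = ['en', 'hi'];
const FALLBACK_LANGUAGE = 'en';

// Parse an Accept-Language header into lowercase tags sorted by descending q-value.
function parseAcceptLanguageHeader(header: string): string[] {
  return header
    .split(',')
    .map((part, index) => {
      const pieces = part.trim().split(';');
      const tag = (pieces[0] || '').trim().toLowerCase();
      const qParam = pieces
        .slice(1)
        .map((p) => p.trim())
        .filter((p) => p.indexOf('q=') === 0)[0];
      const q = qParam === undefined ? 1 : Number(qParam.slice(2));
      return {code: tag, index, q: isNaN(q) ? 0 : q};
    })
    .filter((entry) => entry.code.length > 0 && entry.q > 0)
    .sort((a, b) => b.q - a.q || a.index - b.index)
    .map((entry) => entry.code);
}

// Return the first supported language found in the cookie, then in the header.
function resolveLanguage(
  cookieValue: string | undefined,
  acceptLanguage: string | undefined
): string {
  const candidates: string[] = [];
  if (cookieValue) {
    candidates.push(cookieValue.toLowerCase());
  }
  if (acceptLanguage) {
    candidates.push(...parseAcceptLanguageHeader(acceptLanguage));
  }
  for (const candidate of candidates) {
    // Try the full tag first ("hi-in"), then its primary subtag ("hi").
    const primary = candidate.split('-')[0] || candidate;
    if (SUPPORTED_LANGUAGES.indexOf(candidate) !== -1) {
      return candidate;
    }
    if (SUPPORTED_LANGUAGES.indexOf(primary) !== -1) {
      return primary;
    }
  }
  return FALLBACK_LANGUAGE;
}
```

With this ordering, a user who has picked Hindi in the switcher keeps seeing Hindi even if their browser advertises English first, which is the behaviour the bb_lang cookie is meant to provide.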
Client-side Hydration:
The PoC includes a client initialization helper (initClientI18n / hydrateWithI18n-style function) that uses the same i18nResources data injected via props. This ensures identical server/client output.
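The idea of injecting the same resources on both sides can be sketched without React or i18next. In this illustration, pickResources and translate are hypothetical helpers (they are not part of the PoC or of i18next); only the i18nResources prop name comes from the text above. The point is that the client looks up strings in exactly the data the server serialized, so both renders produce the same text.

```typescript
type NamespaceBundle = Record<string, string>;
type ResourceStore = Record<string, Record<string, NamespaceBundle>>;

// Server side: keep only the active language and the namespaces the page needs,
// so the serialized props stay small.
function pickResources(
  store: ResourceStore,
  language: string,
  namespaces: string[]
): ResourceStore {
  const picked: Record<string, NamespaceBundle> = {};
  const forLanguage = store[language] || {};
  for (const ns of namespaces) {
    picked[ns] = forLanguage[ns] || {};
  }
  return {[language]: picked};
}

// Minimal stand-in for t(): look the key up, fall back to the key itself.
function translate(
  resources: ResourceStore,
  language: string,
  ns: string,
  key: string
): string {
  const bundle = (resources[language] || {})[ns] || {};
  const value = bundle[key];
  return value === undefined ? key : value;
}

// Assumed sample data for the illustration.
const serverStore: ResourceStore = {
  en: {common: {home: 'Home'}, statistics: {title: 'Statistics'}},
  hi: {common: {home: 'मुखपृष्ठ'}, statistics: {title: 'आँकड़े'}}
};

// Server: build props and serialize them (this is what would end up in #propsJSON).
const serverProps = {
  i18nResources: pickResources(serverStore, 'hi', ['common']),
  language: 'hi'
};
const propsJSON = JSON.stringify(serverProps);

// Client: parse the same JSON and translate from it synchronously.
const clientProps = JSON.parse(propsJSON) as typeof serverProps;
const serverText = translate(serverProps.i18nResources, 'hi', 'common', 'home');
const clientText = translate(clientProps.i18nResources, 'hi', 'common', 'home');
```

Because clientText is derived from the serialized copy of the server's data, the two strings are identical and no asynchronous load is needed before hydration.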
Sample Namespaces:
| Namespace | Content | Used by |
|---|---|---|
| common | Navbar, footer, shared UI | Every page (Layout) |
| index | Homepage strings | Index page |
| statistics | Statistics tables & headings | Statistics page |
| entities | Entity display strings | All entity routes |
| db_types | Relationship Type, Identifier Type, etc. | Editors & forms |
5.2 Language Switching & RTL/BIDI
Navbar dropdown calls i18n.changeLanguage(lang), updates the bb_lang cookie, and sets document.documentElement.dir and lang. Note: after a client-side switch the current page HTML remains in the old language until the next navigation. SSR correctness is restored on the subsequent request via the cookie. I will replace directional CSS properties with logical ones (margin-inline-start, border-inline-end, text-align: start, etc.) in the existing stylesheet.
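The cookie and direction handling in the switcher can be sketched as two small pure functions. The names getTextDirection and buildLanguageCookie, the RTL language list, and the one-year cookie lifetime are assumptions made for this illustration; i18next also exposes a dir() helper that could be used instead of a hand-written list.

```typescript
// Assumed, non-exhaustive list of right-to-left languages for the illustration.
const RTL_LANGUAGES = ['ar', 'fa', 'he', 'ur'];

// Decide the value for document.documentElement.dir from a language code.
function getTextDirection(languageCode: string): 'ltr' | 'rtl' {
  const primary = languageCode.toLowerCase().split('-')[0] || '';
  return RTL_LANGUAGES.indexOf(primary) === -1 ? 'ltr' : 'rtl';
}

// Build the bb_lang cookie string that the server-side detector reads on the next request.
function buildLanguageCookie(languageCode: string, maxAgeDays = 365): string {
  const maxAgeSeconds = maxAgeDays * 24 * 60 * 60;
  return `bb_lang=${encodeURIComponent(languageCode)}; Path=/; Max-Age=${maxAgeSeconds}; SameSite=Lax`;
}
```

In the browser, the dropdown handler would assign the result of buildLanguageCookie(lang) to document.cookie, set document.documentElement.dir to getTextDirection(lang) and document.documentElement.lang to lang, and then call i18n.changeLanguage(lang).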
5.3 Interactive UI (Editors & Forms)
- I will use t() or <Trans> for form labels, placeholders, and validation messages.
- Relationship editor dropdowns will use DB translation keys.
- Error messages and inline validation will be localized.
- Backend boundary: All public APIs will remain English-only. Since translation is strictly a UI concern, no backend data model or API response changes will be made.
6. Database Translation Strategy
Database values cannot safely be used as translation keys. My approach is to use the database primary key (UUID) as the stable translation key, while storing the English name as the value.
Example (public/locales/en/db_types.json):
{
  "relationshipTypes": {
    "b84260f8-bd03-4c9f-9aeb-a2b1f86dcff5": "Author of",
    "d6b5fc42-4f36-47b7-bd28-204128038743": "Translator (Guest)"
  }
}
React usage:
t(`db_types:relationshipTypes.${type.id}`, { defaultValue: type.name })
Extraction script excerpt:
// Query the ORM models, use each row's UUID as the key,
// and write the result to public/locales/en/db_types.json
for (const row of results) {
  const key = row.id;
  enTranslations[key] = row.name;
}
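A slightly fuller sketch of the same idea, under stated assumptions: rows are already fetched and grouped by type (the ORM query is omitted), the output is nested per group to match the db_types.json example above, and keys are sorted so regenerated files produce stable diffs. The names TypeRow, buildDbTypeBundle, and translateDbType are hypothetical.

```typescript
interface TypeRow {
  id: string;
  name: string;
}
type DbTypeBundle = Record<string, Record<string, string>>;

// Build the nested { group: { id: englishName } } structure from already-fetched rows.
function buildDbTypeBundle(rowsByGroup: Record<string, TypeRow[]>): DbTypeBundle {
  const bundle: DbTypeBundle = {};
  for (const group of Object.keys(rowsByGroup).sort()) {
    const entries: Record<string, string> = {};
    // Sort a copy by id so the generated file does not change order between runs.
    const rows = (rowsByGroup[group] || [])
      .slice()
      .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
    for (const row of rows) {
      entries[row.id] = row.name;
    }
    bundle[group] = entries;
  }
  return bundle;
}

// Mirror of the defaultValue behaviour: use the translation if present,
// otherwise fall back to the English name stored in the database.
function translateDbType(bundle: DbTypeBundle, group: string, row: TypeRow): string {
  const translated = (bundle[group] || {})[row.id];
  return translated === undefined ? row.name : translated;
}

// Sample rows taken from the JSON example above.
const sampleBundle = buildDbTypeBundle({
  relationshipTypes: [
    {id: 'd6b5fc42-4f36-47b7-bd28-204128038743', name: 'Translator (Guest)'},
    {id: 'b84260f8-bd03-4c9f-9aeb-a2b1f86dcff5', name: 'Author of'}
  ]
});
const sampleJson = JSON.stringify(sampleBundle, null, 2);
```

The string in sampleJson is what the script would write to public/locales/en/db_types.json; a type added to the database after the last extraction still renders through the English fallback.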
7. Translation Workflow
- Developers write normal React code with t() or <Trans>.
- i18next-parser (pre-commit hook + GitHub Action on push to master) scans and updates public/locales/en/*.json.
- On merge, Weblate ingests the English files and opens PRs with translations. Translations are committed back via Weblate’s GitHub App.
Weblate Integration:
- I will register BookBrainz as a new project/component on the central MetaBrainz Weblate instance.
- File: public/locales/*/(*.json)
- Source language: English
- Translation files are JSON (i18next format)
- The GitHub App is already configured for other MetaBrainz projects, so I will use the same webhook flow.
- I will add a protected endpoint POST /admin/i18n/clear-cache (guarded by an X-Weblate-Token header matching the I18N_CACHE_SECRET env var) so Weblate can trigger a cache refresh after commits.
- I will configure Weblate to automatically create PRs against the master branch.
- If the Weblate registration is delayed, I will fall back to manual JSON commits for the first 4 weeks.
- I will also discuss the final file-format choice (JSON vs. .po, which MusicBrainz Server uses) with the mentor during the bonding period and adjust the extraction pipeline if .po is preferred for cross-project consistency.
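The guard on the cache-clear endpoint can be sketched as a pure function that an Express route handler would call. This is a sketch under assumptions: TranslationCache, handleClearCache, and constantTimeEquals are hypothetical names, the Express wiring is omitted, and a real implementation on Node would more likely use crypto.timingSafeEqual for the comparison.

```typescript
// Compare two strings without returning early on the first differing character.
function constantTimeEquals(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    // charCodeAt returns NaN past the end of the string; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// In-memory cache keyed by "language/namespace".
class TranslationCache {
  private bundles = new Map<string, Record<string, string>>();

  set(language: string, namespace: string, bundle: Record<string, string>): void {
    this.bundles.set(`${language}/${namespace}`, bundle);
  }

  get(language: string, namespace: string): Record<string, string> | undefined {
    return this.bundles.get(`${language}/${namespace}`);
  }

  // Drop everything and report how many bundles were removed.
  clear(): number {
    const count = this.bundles.size;
    this.bundles.clear();
    return count;
  }
}

// tokenHeader would come from the X-Weblate-Token header,
// secret from the I18N_CACHE_SECRET environment variable.
function handleClearCache(
  tokenHeader: string | undefined,
  secret: string | undefined,
  cache: TranslationCache
): {status: number; cleared: number} {
  if (!secret || !tokenHeader || !constantTimeEquals(tokenHeader, secret)) {
    return {status: 403, cleared: 0};
  }
  return {status: 200, cleared: cache.clear()};
}
```

Rejecting requests when the secret is unset means a misconfigured deployment fails closed rather than exposing an unauthenticated cache-clear endpoint.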
8. Risks & Mitigation
- SSR Performance & Cache: In-memory map per language + namespace (reloaded on process restart).
- Cache invalidation: Protected /admin/i18n/clear-cache endpoint.
- Hydration Mismatch: Identical resources injected via props + partialBundledLanguages: true.
- Translation Key Drift: CI validation on every build.
- Partial Translations: defaultValue fallback to English.
- UI Breakage (text expansion / RTL): CSS logical properties + German pseudo-locale stress testing.
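The key-drift check could look roughly like the following sketch, which compares the flattened key paths of the English source file with those of a translated file. The names flattenKeys and compareKeys are hypothetical, and reading the JSON files from disk is left out; a real CI step would also need to decide how to treat plural-suffixed keys, which this sketch does not handle.

```typescript
interface TranslationTree {
  [key: string]: string | TranslationTree;
}

// Flatten nested translation objects into sorted dot-separated key paths.
function flattenKeys(tree: TranslationTree, prefix = ''): string[] {
  const keys: string[] = [];
  for (const key of Object.keys(tree)) {
    const value = tree[key];
    const path = prefix ? `${prefix}.${key}` : key;
    if (typeof value === 'string') {
      keys.push(path);
    }
    else if (value !== undefined) {
      keys.push(...flattenKeys(value, path));
    }
  }
  return keys.sort();
}

// missing: keys in the source that the translation lacks (not yet translated).
// extra: keys in the translation that no longer exist in the source (stale).
function compareKeys(
  source: TranslationTree,
  target: TranslationTree
): {extra: string[]; missing: string[]} {
  const sourceKeys = flattenKeys(source);
  const targetKeys = flattenKeys(target);
  return {
    extra: targetKeys.filter((key) => sourceKeys.indexOf(key) === -1),
    missing: sourceKeys.filter((key) => targetKeys.indexOf(key) === -1)
  };
}
```

A CI job could fail the build when extra is non-empty (stale keys) while only reporting missing keys, since partial translations are expected and covered by the English fallback.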
9. Migration Strategy
I will incrementally migrate the existing routes to renderI18nPage. Old route handlers remain untouched until replaced.
Controllers/routes to migrate (core only):
- Index, Statistics, Static pages - Week 2
- Edition display - Week 6
- Author & Work display pages - Weeks 8-10
- Editor components (relationship editor, entity editor) - Weeks 5-7
Total core routes affected: 6 controllers + ~25 React components. All remaining routes (Publisher, Series, other entities, Search, Collections, Revisions, and advanced relationship editing) are explicit stretch goals.
10. Implementation Plan
| Week | Milestone | Deliverable | Files Changed |
|---|---|---|---|
| 1 | Core helpers & middleware | Add helpers from PoC + middleware. Mount in app.js. | 3 new files + app.js |
| 2 | renderI18nPage integration | Index, statistics, static pages. Verify hydration. | renderI18nPage.ts + 3 routes |
| 3 | Layout & Language Switcher | Navbar dropdown + cookie + RTL. | client/layout + CSS |
| 4 | Index + Statistics pages | Full translation. | locales/ + components |
| 5 | DB type extraction & CI | Script + GitHub Actions + pre-commit hook. | new script + workflow |
| 6 | One major entity (Edition) | Edition display page. | entity/edition.js |
| 7 | Weblate + extraction pipeline | i18next-parser + webhook. Weblate component registered + test PR merges cleanly on master. | config + GitHub App |
| 8-10 | Testing, plurals, RTL, core documentation | Hindi plurals, RTL testing, full documentation + Author & Work display pages fully translated. | docs/ + tests + entity/author.js + entity/work.js |
Everything after Week 10 (full Author/Work pages, advanced relationship editing, Search, Collections) will be an explicit stretch goal. If SSR mismatches appear, I will keep the original route handlers untouched while rolling out renderI18nPage incrementally.
11. Final Deliverables
- SSR-safe i18n integrated into the existing rendering pipeline.
- Working language switcher (dynamic + cookie).
- Automated static + DB-type extraction pipelines (CI + pre-commit).
- Weblate continuous localization loop (including cache-clear webhook).
- Core pages fully translated (Layout, Index, Statistics, Edition read-only).
- Complete developer and translator documentation.
12. Why This Approach
My proposed design uses a single i18n instance across the entire request, extends the existing src/server/helpers/i18n.ts, integrates into the SSR pipeline without schema changes, and uses stable UUID keys for DB strings. English remains the single source of truth.
Community Affinities
I love reading a mix of historical fiction, philosophical literature, and classic allegories. Some of my favorite books include:
- A Thousand Splendid Suns (Khaled Hosseini) - BBID: ad011741-edb6-45c8-b2e7-55af23fe1cc9
- Man’s Search for Meaning (Viktor Frankl) - BBID: 0e652b44-2624-48b7-b7fb-bfde6aca7b13
- Animal Farm (George Orwell) - BBID: 691ee8e0-01a8-4a6b-b17f-cf8a919f865b
- The Alchemist (Paulo Coelho) - BBID: 33288f9a-b82f-4153-911d-974ebcd33505
I’m also an avid music listener, and I often listen to artists like:
- Beach House (dream pop) - MBID: d5cc67b8-1cc4-453b-96e8-44487acdebea
- Foster the People (indie pop) - MBID: e0e1a584-dd0a-4bd1-88d1-c4c62895039d
Practical Requirements
I have an Apple MacBook Air M4 (16 GB, 256 GB). I will work 35-40 hours per week using the entire summer break exclusively for this project.
AI Usage Disclaimer: I used AI to correct grammatical errors, fix markdown formatting, and lightly rephrase passages to express my ideas more clearly.