My team and I were presented with an interesting challenge. We were in the process of changing the source of truth for all of our i18n translations from hard-coded JSON files to auto-generated JSON files pulled in from Phrase, a popular third-party internationalization service.
Here’s the challenge: we also decided to take this opportunity to completely rewrite the translation keys of our codebase to be cleaner and easier to reason about.
This task required some major teamwork and several of my co-devs took on different sides of this task. In this article I’ll mostly be talking about my work on this task and some of the clever automation that went into it.
But first, let me go over what went into this before it landed in my hands. Two of the developers on my team had the job of parsing and thinking through our existing translation schema of just under 500 translation keys. The decision was made to namespace the new translations roughly by the page each one is used on, then the section of the page, and so on if further nesting was necessary. This meant many translations would need to be duplicated, removed, or moved to fit the new schema.
I think they had the more difficult task, as it required a lot of conceptualization, verification, and slow progress as they worked through each translation by hand.
The original translation files may have looked something like this:
As you can see even from this small snippet, it’s lost a lot of standardization after years of incremental modifications from dozens of developers, sometimes on completely separate frontend teams.
With that said, at the end of the day the two developers produced a new, clean, understandable JSON that could be used as our starting base for updating our codebase and imports into Phrase.
Now it was time for me and the developer I was pairing with to take this new translation JSON and begin replacing every original translation key string with its new replacement counterpart.
A very simple, near-identical replacement would look something like this:
If you’re curious about the t() method coming off of the useTranslation() hook, check out the next-i18n documentation! In short, this is how we reference and access the translation text within the scope of the current language and country locale.
This task felt ripe for automation! Before doing any serious work, my co-dev and I did some digging into our codebase to identify the scope of our task as well as any “weird stuff” we might run into. Using some classic RegExing, we found that we had around 18,000 relevant files between two repos that we had to care about, as well as just under 750 translation usages scattered within those files.
We also discovered a slight issue: all of our original translation JSON files combined actually produced more like 700 different translations, not the 500 that the previous developers had worked from. Oops! It was alright, though: thanks to the standard the previous devs had created, it didn’t take long to create new namespaces for the missing translations.
With the initial scouting out of the way, the solution we came up with to automate our task was to write a JSON tree-walking script that would build the string path for every translation and tie it to its rendered English text. We’d do this for both translation files (the combined original translation JSON file and the newly created Phrase translation JSON file). After that, we’d use the rendered text like a join key between the original and the replacement keys.
As an example, each reference would look something like this:
Once we have this massive collection of matching counterparts, we can create another script that takes every original translation, recursively steps through the codebase directory by directory, and, any time it encounters a relevant file type, runs a find-replace regex to look for any use of the i18n t() method. If the original translation key is found inside a t() call, it’ll swap it with the replacement key!
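The find-replace itself can be sketched as a small pure function. This is a minimal illustration rather than the original script: the helper names `escapeRegExp` and `replaceKey` are hypothetical, and it assumes keys are passed to t() as plain string literals.

```javascript
// Escapes characters that have a special meaning inside a regular expression,
// so that the dots in a key such as "common.Cancel" match literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: rewrites t("original") calls to t("replacement"),
// leaving the quote style and any further arguments untouched.
function replaceKey(source, original, replacement) {
  const pattern = new RegExp(
    `(\\bt\\(\\s*)(['"\`])${escapeRegExp(original)}\\2`,
    "g"
  );
  // A function replacement avoids surprises if a key ever contains "$".
  return source.replace(
    pattern,
    (_match, open, quote) => `${open}${quote}${replacement}${quote}`
  );
}

const before = "<button>{t('common.Cancel')}</button>";
console.log(replaceKey(before, "common.Cancel", "Homepage.Feedback.Cancel"));
```

Requiring the closing quote right after the key keeps a key like common.Cancel from matching inside a longer key that merely starts with the same characters.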
To recap, the plan looked like this:
1: Write the script that will walk through our JSON files.
2: Use that script to generate the fully qualified key references for each of the translation files, and tie them together as original and replacement based on the rendered text.
3: Use the collection of original and replacement keys to globally walk through the codebase files, and find-replace each instance of an original key with its counterpart replacement.
1: JSON Tree Walking Script
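A minimal sketch of such a walker, assuming the translation files are plain nested objects whose leaves are strings. It uses ordinary recursion rather than any particular library, and the function name `flattenTranslations` and the sample keys are hypothetical.

```javascript
// Hypothetical helper: walks a nested translation object and returns a flat
// map of dot-separated key paths to their rendered text.
function flattenTranslations(node, prefix = "", out = {}) {
  for (const [key, value] of Object.entries(node)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (typeof value === "string") {
      out[path] = value; // leaf: a rendered translation
    } else if (value !== null && typeof value === "object") {
      flattenTranslations(value, path, out); // branch: keep descending
    }
  }
  return out;
}

// Illustrative input only; not the real translation file.
const sampleTranslations = {
  common: { Cancel: "Cancel" },
  accountDelete: { CancelText: "Cancel", Title: "Delete your account" },
};

console.log(flattenTranslations(sampleTranslations));
```

Flattening both files into the same path-to-text shape is what makes the later join on rendered text straightforward.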
2: Generate Joined Key References
For this step, I used the json-tree-walker tool to traverse both the original translations JSON file and the replacement JSON file. At each step, it’d append the current key to a string holding the path down the JSON to that point. When it hit a string value, it’d add that final key to the path string, then add the path to our collection of joined translations based on the rendered text of that translation. (See the previous code example for a reference of what that object looks like for each item.)
However, I ran into a tricky problem. It turns out that many of our rendered translation texts are reused in many places in both the original and the replacement JSON files. This is intentional: the word “cancel”, for example, is used in many different places on the website, so many different translation keys have “cancel” as their rendered text. That causes a problem when using rendered text as the joiner between original and replacement translation keys, so I had to add another step to our process.
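The join and the duplicate problem can be sketched together. This is a minimal illustration under assumptions, not the original script: it takes two already-flattened maps of key path to rendered text, and the names `groupByText` and `joinByText` as well as the sample keys are hypothetical.

```javascript
// Hypothetical helper: groups flattened keys by their rendered text.
function groupByText(flat) {
  const groups = new Map();
  for (const [key, text] of Object.entries(flat)) {
    if (!groups.has(text)) groups.set(text, []);
    groups.get(text).push(key);
  }
  return groups;
}

// Joins two flattened files on rendered text. One-to-one matches are resolved
// automatically; everything else (several candidates on either side, or no
// replacement at all) is set aside for a human to decide.
function joinByText(originalFlat, replacementFlat) {
  const originals = groupByText(originalFlat);
  const replacements = groupByText(replacementFlat);
  const matched = {};
  const ambiguous = [];
  for (const [text, originalKeys] of originals) {
    const replacementKeys = replacements.get(text) ?? [];
    if (originalKeys.length === 1 && replacementKeys.length === 1) {
      matched[originalKeys[0]] = replacementKeys[0];
    } else {
      ambiguous.push({ text, originalKeys, replacementKeys });
    }
  }
  return { matched, ambiguous };
}

// Illustrative data only.
const joined = joinByText(
  { "home.Title": "Welcome", "common.Cancel": "Cancel", "accountDelete.CancelText": "Cancel" },
  { "Homepage.Title": "Welcome", "Homepage.Feedback.Cancel": "Cancel" }
);
console.log(joined.matched, joined.ambiguous);
```

Here “Welcome” joins cleanly, while “Cancel” has two original keys competing for one replacement and ends up in the ambiguous list.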
2.5: Collect duplicate strings for manual intervention
In order to accomplish this, while walking the JSON I’d store all of the found keys into a temporary array for both the original and replacements. At this point the script wasn’t smart enough to decide which ones link to which, so I decided to implement a manual intervention system. I used the Inquirer tool to provide clean and easy CLI option selections. If a translation had more than 1 original translation OR more than 1 replacement translation, I’d loop through the originals, add a question for each original and the options would be all of the replacements.
Let’s say our original translations included accountDelete.CancelText and common.Cancel, and the replacement options included Homepage.Feedback.Cancel. If you’ve analyzed which should go with which, you’ll notice we have 3 distinct scenarios: the account deletion page (that’s an easy one), the feedback page, and a catch-all “common” section which is removed in the new Phrase translations. Kind of tricky to decide, huh? This would be one of many that would require investigation to know for certain. In the CLI, it’d present you with the value of accountDelete.CancelText and the replacement options, such as Homepage.Feedback.Cancel. After you selected an option, it’d do likewise for each remaining original key.
After the script has generated all of the automatic matches (where there was only a 1-to-1 match), generated all of the manual intervention questions, and after the developer manually selected all of the remaining matches, the script would create a new JSON file with a mapping of all original translation keys to their replacement counterparts. At this point, the hard work was mostly done!
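The question-building part of this step can be sketched as follows. It is a minimal illustration, not the original script: the helper names, the shape of the ambiguous groups, and the sample keys are assumptions. The question objects use Inquirer's list-question fields (`type`, `name`, `message`, `choices`), and the answers are simulated here so that the sketch runs without the library.

```javascript
// Hypothetical helper: turns ambiguous groups into list-style questions,
// one per original key, each offering every candidate replacement.
function buildQuestions(ambiguous) {
  const questions = [];
  const originalKeyByName = {};
  for (const group of ambiguous) {
    for (const originalKey of group.originalKeys) {
      // Index-based names: Inquirer reads dots in a name as a nested path
      // in the answers object, so the translation key itself is avoided.
      const name = `q${questions.length}`;
      originalKeyByName[name] = originalKey;
      questions.push({
        type: "list",
        name,
        message: `"${group.text}": which key replaces ${originalKey}?`,
        choices: group.replacementKeys,
      });
    }
  }
  return { questions, originalKeyByName };
}

// Combines the selected answers back into an original -> replacement mapping.
function answersToMapping(originalKeyByName, answers) {
  const mapping = {};
  for (const [name, originalKey] of Object.entries(originalKeyByName)) {
    mapping[originalKey] = answers[name];
  }
  return mapping;
}

// Illustrative data only; AccountDeletion.Cancel is a made-up replacement key.
const { questions, originalKeyByName } = buildQuestions([
  {
    text: "Cancel",
    originalKeys: ["accountDelete.CancelText", "common.Cancel"],
    replacementKeys: ["AccountDeletion.Cancel", "Homepage.Feedback.Cancel"],
  },
]);
// Simulated answers; with Inquirer they would come from inquirer.prompt(questions).
const simulatedAnswers = { q0: "AccountDeletion.Cancel", q1: "Homepage.Feedback.Cancel" };
console.log(answersToMapping(originalKeyByName, simulatedAnswers));
```

Merging this mapping with the automatic one-to-one matches gives the content of the final mapping file.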
As for what all of this could look like in code, here’s a snippet:
(Note: It’s not a small amount of code and it’s very congested, so take your time if you want to understand it. Otherwise feel free to skip this bit)
3: Walk through the codebase and replace every original key with its replacement counterpart
This step is pretty simple, but it entailed some tricky recursion on top of the file system. Saving the results from the previous step to a file, instead of immediately rolling into swapping translation keys, turned out to be a big benefit: we noticed a few errors and a few keys that were used in the codebase but weren’t in any original translation file (oops!). With this technique, we could easily modify the joined_translations.json file and know it’d be fixed for all future runs too!
After we were confident in our joined_translations.json file, I could pull it into a new script and it’d serve as the counterparts collection for every replacement we wanted to perform.
Just as a reminder, this is what our joined translations file would have looked like:
The new script would allow me to pass in the root path of the codebase I want to walk down and the path to the counterparts collection. That’s all the information it needs! From there, it could loop over every counterparts object, then recursively walk down the codebase. Every time we encounter a file, I’d have it open the file’s content, execute a find-replace, and then overwrite the file with the new content!
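The recursive walk can be sketched with Node's built-in fs and path modules. This is a minimal illustration under assumptions, not the original script: the helper name `walkAndTransform`, the list of relevant extensions, and the skipped directories are all hypothetical choices.

```javascript
const fs = require("fs");
const path = require("path");

// Assumed set of "relevant" file types; adjust to the codebase at hand.
const RELEVANT_EXTENSIONS = new Set([".js", ".jsx", ".ts", ".tsx"]);

// Hypothetical helper: recursively visits every relevant file under `dir`,
// applies `transform` to its content, and rewrites the file only if the
// content changed. Returns the number of files that were rewritten.
function walkAndTransform(dir, transform) {
  let changedFiles = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Assumption: dependency and VCS folders are never worth rewriting.
      if (entry.name === "node_modules" || entry.name === ".git") continue;
      changedFiles += walkAndTransform(fullPath, transform);
    } else if (entry.isFile() && RELEVANT_EXTENSIONS.has(path.extname(entry.name))) {
      const before = fs.readFileSync(fullPath, "utf8");
      const after = transform(before);
      if (after !== before) {
        fs.writeFileSync(fullPath, after);
        changedFiles += 1;
      }
    }
  }
  return changedFiles;
}
```

The `transform` argument is any function from file content to new content, so the find-replace logic stays separate from the file-system recursion.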
This script can run fairly quickly. For all ~18,000 relevant files, it would take around 180–250 seconds to execute. However, I could have made this run much faster by changing the order of the process. The mistake was that I had this script run through the codebase from top to bottom for every translation, meaning that if we had 500 translation counterparts, we walked through the entire codebase 500 times. Whoops! We could have fixed this by having it run through the codebase once and, every time it encountered a file, doing the find-replaces for every translation in that file all at once before moving on.
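One way to get that single pass is to match every t() call once and look the captured key up in the mapping, instead of building one regex per translation. This is a minimal sketch under assumptions: the helper name `buildSinglePassTransform` is hypothetical, and the mapping is taken to be a plain object of original key to replacement key.

```javascript
// Hypothetical helper: builds a function that applies every mapping to a
// file's content in one pass, so each file only has to be read once.
function buildSinglePassTransform(mapping) {
  const lookup = new Map(Object.entries(mapping));
  // Matches t('some.key'), t("some.key") or t(`some.key`) and captures the key.
  const anyKeyCall = /(\bt\(\s*)(['"`])([^'"`]+)\2/g;
  return (source) =>
    source.replace(anyKeyCall, (match, open, quote, key) =>
      lookup.has(key) ? `${open}${quote}${lookup.get(key)}${quote}` : match
    );
}

// Illustrative mapping only.
const transform = buildSinglePassTransform({
  "common.Cancel": "Homepage.Feedback.Cancel",
  "home.Title": "Homepage.Title",
});
console.log(transform("t('common.Cancel') + t(\"home.Title\") + t('unknown.Key')"));
```

Keys that are not in the mapping are left exactly as they were, which keeps non-standard usages available for the manual pass.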
The code for this step looked similar to this:
Everything worked smoothly overall! By the end of the process we had around 700 translations that changed within the codebase. The rest were special cases or non-standard usages of the t() method, which had to be fixed manually. In this codebase we also have lots of unit tests, so many of those had to be updated to handle the new translation keys.
By the end of the process we figure we saved many days of tedious, manual labor by automating as much of this process as possible. That’s certainly something I’m proud of!