Public-Data Translation Cache and Preserving Source Text
I separated UI strings, curated translations, machine translations, and Korean fallback
Hangangjari needs to show Korean public data in multiple languages. Translating only app buttons and tab names was not enough: what users actually read includes event names, facility names, notices, and crowding messages, all of which arrive from the data sources on every collection run.
Imagine a non-Korean speaker looking at Yeouido Park information. If the buttons are in English but event names are blank, facility names are awkwardly translated, or the original notice cannot be found, the whole app feels unreliable. This user never sees the translation pipeline. They only notice whether blanks appear and whether the original text still comes through naturally.
So Hangangjari separated fixed UI strings from public-data text that changes on every collection run. UI strings change with app releases, while public-data source text changes whenever sources are collected. Treating the two the same way made both harder to manage.
Two kinds of translation
| Area | Source | When it changes | Storage |
|---|---|---|---|
| App UI strings | String Catalog | App release | iOS resources |
| Public-data text | Korean source text | Source collection | Postgres |
| Curated/official translation | Human-reviewed file | Review time | JSON files |
| Machine translation | Korean source text (keyed by its hash) | Translation job after collection | Translation cache |
UI strings are key-first. Code contains keys rather than human-readable phrases, and actual text lives in the String Catalog.
Public data is the opposite. The source text comes first. If that text is lost, source verification, retranslation, and fallback to original text all become difficult.
In public data, the Korean original is not merely fallback text. It is the original record that can be checked later, while the translation is only the display value. Even if a translation is awkward or missing for some language, keeping the source text and source metadata lets me retranslate it or override it with a curated file.
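A minimal sketch of that separation, assuming a record type where the Korean source and its metadata are immutable and translations are a replaceable display layer. `PublicText`, `source_name`, `collected_at`, and `with_translation` are hypothetical names for illustration, not Hangangjari's actual schema.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class PublicText:
    source_ko: str     # Korean original, stored verbatim
    source_name: str   # hypothetical metadata: which dataset the text came from
    collected_at: str  # hypothetical metadata: ISO-8601 collection time
    translations: Dict[str, str] = field(default_factory=dict)  # display values only

    def with_translation(self, lang: str, text: str) -> "PublicText":
        # Return a copy with one display value added or replaced;
        # the source fields are carried over unchanged.
        return replace(self, translations={**self.translations, lang: text})
```

Retranslating or applying a curated override only ever produces a new display value; the source text stays available for verification.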
Translation looks for reviewed values first
Hangangjari’s public-data translation does not rely on a single source. It applies a priority order for each language and shows the most reliable value available.
flowchart TD
Request["Korean source text and display language"] --> Official{"Official translation exists?"}
Official -->|Yes| UseOfficial["Show official translation"]
Official -->|No| Curated{"Curated/edited translation exists?"}
Curated -->|Yes| UseCurated["Show curated translation"]
Curated -->|No| Machine{"Cache has<br/>this language?"}
Machine -->|Yes| UseMachine["Show cached translation"]
Machine -->|No| Korean["Show Korean source text"]
This order reflects how far each source can be trusted. Official and curated translations take priority over machine translation, and if machine translation is missing or risky, the display falls back to the Korean original.
Empty strings and raw keys are never shown on screen. Even when the Korean original is used as a fallback, it should look intentional to users.
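The lookup order in the flowchart can be sketched as a single function. This assumes each translation source is a mapping from Korean source text to language code to text (the real machine cache is keyed by a hash), and that a request for Korean itself is not counted as a fallback; `resolve` and `Resolved` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping

# Assumed layout: Korean source text -> language code -> translated text
Table = Mapping[str, Mapping[str, str]]


@dataclass(frozen=True)
class Resolved:
    text: str          # value to display
    origin: str        # "official", "curated", "machine", or "korean"
    is_fallback: bool  # True when the Korean original replaces a missing translation


def resolve(source_ko: str, lang: str,
            official: Table, curated: Table, machine: Table) -> Resolved:
    """Return the highest-priority non-empty value for one string in one language."""
    if lang == "ko":
        # Assumption: a Korean request simply shows the source, not as a fallback.
        return Resolved(source_ko, "korean", False)
    for origin, table in (("official", official), ("curated", curated), ("machine", machine)):
        value = (table.get(source_ko, {}).get(lang) or "").strip()
        if value:  # empty or whitespace-only values count as missing
            return Resolved(value, origin, False)
    # No usable translation: show the Korean original rather than a blank or a key.
    return Resolved(source_ko, "korean", True)
```

Because empty values are skipped rather than returned, a blank row in any source can never hide a lower-priority value or reach the screen.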
Translation cache fills only missing languages
The machine-translation cache uses a hash of the Korean source text as the key. If the same string appears in events, facilities, parking lots, or parks, it only needs to be translated once.
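A sketch of the cache key, assuming SHA-256 over the UTF-8 bytes of the text after Unicode NFC normalization and whitespace trimming. The post only says the key is a hash of the Korean source text; the algorithm and normalization steps here are my assumptions, and `cache_key` is a hypothetical name.

```python
import hashlib
import unicodedata


def cache_key(source_ko: str) -> str:
    """Hypothetical cache key for one Korean source string.

    NFC normalization makes composed and decomposed Hangul hash the same,
    and trimming ignores stray surrounding whitespace from upstream feeds.
    """
    normalized = unicodedata.normalize("NFC", source_ko).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

The same label appearing in events, facilities, and parks collapses to one key, so it is translated once:

```python
rows = [("event", "매점"), ("facility", "매점"), ("park", " 매점 ")]
assert len({cache_key(text) for _, text in rows}) == 1
```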
One cache row contains the original text, translated values by language, translation engine, success state, and created/updated times. What matters is not “is this row done?” but “which languages are missing?”
If a string has English and Japanese filled but Traditional Chinese missing, the next translation job should translate only the missing language.
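Finding the work for one cache row can be sketched as a per-language check instead of a per-row "done" flag. The target language set below is hypothetical, taken only from the languages mentioned above, and a value counts as missing when it is absent, `None`, or whitespace.

```python
from typing import List, Mapping, Optional, Sequence

# Hypothetical target set; the real list of supported languages may differ.
TARGET_LANGUAGES: Sequence[str] = ("en", "ja", "zh-Hant")


def missing_languages(translations: Mapping[str, Optional[str]],
                      targets: Sequence[str] = TARGET_LANGUAGES) -> List[str]:
    """Languages that still need a machine translation for one cache row."""
    return [lang for lang in targets if not (translations.get(lang) or "").strip()]
```

For the example in the text, a row with English and Japanese filled yields only `["zh-Hant"]`, so the next job sends just that language to the provider.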
sequenceDiagram
autonumber
participant Collector as Missing-translation collector
participant Store as Translation cache
participant Provider as Translation provider
Collector->>Store: Find Korean source text with missing languages
Store-->>Collector: Source text to translate
Collector->>Provider: Translate only missing languages
Provider-->>Collector: Translations by language
Collector->>Store: Store without deleting existing languages
This prevents one language’s failure from becoming a permanent blank. It fills only the missing parts without deleting curated values or other language values.
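The final "store without deleting" step can be sketched as a gap-filling merge. This assumes a cache row's translations are a plain language-to-text mapping; `merge_translations` is a hypothetical name. Existing non-empty values always win, and empty provider output is dropped so the language stays missing and is retried later instead of being stored as a blank.

```python
from typing import Dict, Mapping, Optional


def merge_translations(existing: Mapping[str, Optional[str]],
                       incoming: Mapping[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Fill only the gaps in a cache row with new provider output."""
    merged = dict(existing)  # copy; the caller's row is not mutated
    for lang, text in incoming.items():
        if not (text or "").strip():
            continue  # provider returned nothing usable for this language
        if (merged.get(lang) or "").strip():
            continue  # keep the value that is already there
        merged[lang] = text
    return merged
```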
Risky translations are not stored
Machine translation is not stored blindly. It needs minimum guardrails.
- Do not store the result if it is still in the source language (for example, the provider echoed the Korean text back unchanged).
- Leave very long source text as original instead of spending translation quota.
- Do not delete existing values when a language has no value.
- Proper nouns in curated/official files take priority over machine translation.
Machine translation cannot be guaranteed to sound natural. Instead, I avoid storing risky translations and keep a path for curated files to override them.
Proper nouns such as place names, facility names, and event names feel awkward immediately when translated badly. So there had to be a way for people to override and review values beyond the cache.
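The first three guardrails can be sketched as two checks. The 500-character limit and the "mostly Hangul" heuristic are hypothetical stand-ins, since the post does not state the real threshold or how "still in the source language" is detected. The proper-noun rule is not here: it is handled by the lookup order, where curated and official files are consulted before the cache.

```python
import re
from typing import Optional

MAX_SOURCE_CHARS = 500  # hypothetical limit; the real threshold is not stated
HANGUL_SYLLABLE = re.compile(r"[\uac00-\ud7a3]")


def should_translate(source_ko: str, max_chars: int = MAX_SOURCE_CHARS) -> bool:
    """Very long source text stays in Korean instead of spending translation quota."""
    return len(source_ko) <= max_chars


def accept_translation(source_ko: str, translated: Optional[str]) -> bool:
    """Return True only if a machine translation looks safe to store.

    Returning False means "write nothing": the existing value, if any, is kept.
    """
    if not translated or not translated.strip():
        return False  # no value for this language
    text = translated.strip()
    if text == source_ko.strip():
        return False  # provider echoed the source unchanged
    if len(HANGUL_SYLLABLE.findall(text)) * 2 > len(text):
        return False  # heuristic: output is still mostly Korean
    return True
```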
The API chooses display text
The client does not read the translation DB directly. The backend receives a language hint and returns display text in a LocalizedText shape.
flowchart TB
KRText["Korean source text"] --> Resolver["Choose translation"]
Assets["Official/curated translation files"] --> Resolver
Cache["Machine-translation cache"] --> Resolver
Resolver --> DTO["API display field"]
DTO --> Client["iOS app and widgets"]
This means iOS does not need to know the translation storage. The app only reads the display value chosen by the backend and whether source text was used as fallback.
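A sketch of the backend side of that contract, assuming the language hint is a BCP 47-style tag such as `ja-JP` and that `translations` already holds the per-language winner of the priority lookup. The field names of `LocalizedText` here are hypothetical; the post names the shape but not its fields. A missing hint is assumed to mean Korean, and an unsupported language is reported as a source fallback.

```python
from dataclasses import asdict, dataclass
from typing import Mapping, Optional, Sequence

SUPPORTED: Sequence[str] = ("ko", "en", "ja", "zh-Hant")  # hypothetical supported set


@dataclass(frozen=True)
class LocalizedText:
    text: str                 # value the client renders as-is
    language: str             # language of `text`
    is_source_fallback: bool  # True when Korean is shown in place of a requested translation


def pick_language(hint: Optional[str], supported: Sequence[str] = SUPPORTED) -> Optional[str]:
    """Map a hint like 'ja-JP' or 'zh-Hant-TW' to a supported code; None if unsupported."""
    if not hint:
        return "ko"  # assumption: no hint means the Korean default
    for code in supported:
        if hint == code or hint.startswith(code + "-"):
            return code
    return None


def to_display_field(source_ko: str, translations: Mapping[str, str],
                     hint: Optional[str]) -> dict:
    lang = pick_language(hint)
    if lang == "ko":
        return asdict(LocalizedText(source_ko, "ko", False))  # Korean was asked for
    value = (translations.get(lang) or "").strip() if lang else ""
    if value:
        return asdict(LocalizedText(value, lang, False))
    # Translation missing or language unsupported: show the original, flagged as fallback.
    return asdict(LocalizedText(source_ko, "ko", True))
```

The client only needs the three fields in the returned dictionary; it never learns which table or cache row the text came from.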
Korean source text stays until the end
The original text is both fallback when translation is missing and evidence that can be revisited later. In a public-data app, losing the original makes it hard to explain where text came from.
So public data preserves Korean source text and source metadata, while translation is treated as the display value. Translation failure must not become collection failure.
This separation also helped with later fixes. If a translation is strange, the display text can be corrected. If the source itself changed, the collection and normalization path should be checked. Keeping the original makes it possible to tell which of the two is the problem.
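One way to keep translation failure from becoming collection failure is to make them separate steps, as the table above already does with a translation job that runs after collection. In this sketch, with hypothetical names `collect` and `translate_pending` and a `translate` callable standing in for the provider, collection stores only source text and metadata, and a provider error on one row is logged and skipped without removing or invalidating any collected row.

```python
import logging
from typing import Any, Callable, Dict, Iterable, List

log = logging.getLogger("hypothetical.pipeline")


def collect(raw_items: Iterable[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Collection step: keep the Korean source text and its metadata, nothing else."""
    return [{"source_ko": it["text"], "source": it["source"], "translations": {}}
            for it in raw_items]


def translate_pending(rows: List[Dict[str, Any]],
                      translate: Callable[[str], Dict[str, str]]) -> int:
    """Separate translation step; returns how many rows were translated."""
    done = 0
    for row in rows:
        try:
            result = translate(row["source_ko"])
        except Exception as exc:  # e.g. provider outage, quota, timeout
            log.warning("translation skipped for %r: %s", row["source_ko"], exc)
            continue  # row keeps its Korean source and stays "missing"
        row["translations"] = {**row["translations"], **result}
        done += 1
    return done
```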
A structure that does not show blanks
UI strings and public-data translation had to be handled separately. Public-data source text remains the most reliable original, and curated/official translation files take priority over machine translation.
The translation cache had to track not just strings but which languages each string is still missing. When translation fails, the app intentionally shows the original instead of a key or blank. Not letting clients read translation storage directly also kept API responses simpler.
The lasting choice in multilingual handling was not translating every piece of text perfectly. It was preserving the original, leaving a path to replace translations with better ones, and making sure users do not see empty spaces.