Public-Data Translation Cache and Preserving Source Text

I separated UI strings, curated translations, machine translations, and Korean fallback

Hangangjari needs to show Korean public data in multiple languages. Translating only app buttons and tab names was not enough. What users actually see includes event names, facility names, notices, and crowding messages, and these arrive fresh from the sources on every collection run.

Imagine a non-Korean speaker looking at Yeouido Park information. If buttons are in English but event names are empty, facility names are awkwardly translated, or the original notice cannot be found, the whole app feels unstable. This user does not see the translation pipeline. They only see whether a blank appears and whether the original text is preserved naturally.

So Hangangjari separated fixed UI strings from public-data text that changes on every collection run. UI strings change with releases, while public-data source text changes whenever sources are collected. Treating them the same way made both sides harder to handle.

Two kinds of translation

| Area | Source | When it changes | Storage |
| --- | --- | --- | --- |
| App UI strings | String Catalog | App release | iOS resources |
| Public-data text | Korean source text | Source collection | Postgres |
| Curated/official translation | Human-reviewed file | Review time | JSON files |
| Machine translation | Korean source-text hash | Translation job after collection | Translation cache |

UI strings are key-first. Code contains keys rather than human-readable phrases, and actual text lives in the String Catalog.

Public data is the opposite. The source text comes first. If that text is lost, source verification, retranslation, and fallback to original text all become difficult.

In public data, the Korean original is not merely fallback text. It is closer to the original record that can be checked later. Translation is the display value. Even if a translation is awkward or missing for a language, keeping the source text and source metadata lets me retranslate it or override it with a curated file.
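As a rough illustration of that split, here is a minimal sketch in Python. The class names and fields (`SourceRecord`, `DisplayText`, `source_name`, `collected_at`) are hypothetical and not taken from Hangangjari's schema. The point is only that the source record is immutable while translations are replaceable display values.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class SourceRecord:
    """Korean text exactly as collected, plus where it came from (hypothetical fields)."""
    text_ko: str
    source_name: str    # which public dataset supplied the text
    collected_at: str   # ISO 8601 timestamp of the collection run


@dataclass
class DisplayText:
    """Display values layered on top of the source; they can be replaced at any time."""
    source: SourceRecord
    translations: Dict[str, str] = field(default_factory=dict)

    def replace_translation(self, lang: str, value: str) -> None:
        # Overwrites only the display value; the frozen source record is never touched.
        self.translations[lang] = value
```

Because `SourceRecord` is frozen, a bad translation can be swapped out without any risk of altering the record that source verification depends on.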

Translation looks for reviewed values first

Hangangjari’s public-data translation does not rely on a single source. For each language, it applies a priority order and shows the most reliable value available.

flowchart TD
  Request["Korean source text and display language"] --> Official{"Official translation exists?"}
  Official -->|Yes| UseOfficial["Show official translation"]
  Official -->|No| Curated{"Curated/edited translation exists?"}
  Curated -->|Yes| UseCurated["Show curated translation"]
  Curated -->|No| Machine{"Cache has<br/>this language?"}
  Machine -->|Yes| UseMachine["Show cached translation"]
  Machine -->|No| Korean["Show Korean source text"]

This order exists to protect trust in the app. Official or curated translations take priority over machine translation. If machine translation is missing or risky, the display falls back to the Korean original.

Empty strings or keys are not shown on screen. Even when the Korean original is used as fallback, it should look intentional to users.
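The priority order can be sketched as a small lookup function. This is an illustration under assumptions, not the actual resolver: the three stores are modeled as plain dictionaries keyed by language and Korean text (the real cache is keyed by a hash), and the names `resolve_display_text` and `Store` are made up for the example.

```python
from typing import Dict, Optional, Tuple

# language code -> {Korean source text -> translation}; a hypothetical in-memory shape
Store = Dict[str, Dict[str, str]]


def _lookup(store: Store, lang: str, text_ko: str) -> Optional[str]:
    value = store.get(lang, {}).get(text_ko)
    # Blank or whitespace-only values count as missing so they never reach the screen.
    return value if value and value.strip() else None


def resolve_display_text(
    text_ko: str,
    lang: str,
    official: Store,
    curated: Store,
    machine_cache: Store,
) -> Tuple[str, str]:
    """Return (display text, origin), trying the most trusted store first."""
    ordered = (("official", official), ("curated", curated), ("machine", machine_cache))
    for origin, store in ordered:
        value = _lookup(store, lang, text_ko)
        if value is not None:
            return value, origin
    # Nothing usable: show the Korean original rather than a key or an empty string.
    return text_ko, "source"
```

Returning the origin next to the text lets the caller tell a deliberate Korean fallback apart from a real translation.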

Translation cache fills only missing languages

The machine-translation cache uses a hash of the Korean source text as the key. If the same string appears in events, facilities, parking lots, or parks, it only needs to be translated once.
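A minimal sketch of deriving such a key follows. The post does not say which hash or normalization is used, so SHA-256, NFC normalization, and whitespace trimming are all assumptions here, and `cache_key` is a hypothetical name.

```python
import hashlib
import unicodedata


def cache_key(text_ko: str) -> str:
    """Derive a cache key from Korean source text (hash and normalization are assumptions)."""
    # Normalize so visually identical strings from different sources share one key.
    normalized = unicodedata.normalize("NFC", text_ko).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Normalizing before hashing matters for Hangul in particular, because the same syllables can arrive either precomposed or as decomposed jamo depending on the source system.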

One cache row contains the original text, translated values by language, translation engine, success state, and created/updated times. What matters is not “is this row done?” but “which languages are missing?”

If a string has English and Japanese filled but Traditional Chinese missing, the next translation job should translate only the missing language.
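The "which languages are missing?" question can be sketched as a per-row check. The target language list and the function name are hypothetical, since the post does not list the exact languages or codes the app supports.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical target languages; the real list is not given in this post.
TARGET_LANGS = ("en", "ja", "zh-Hant")


def missing_languages(
    translations: Dict[str, Optional[str]],
    targets: Sequence[str] = TARGET_LANGS,
) -> List[str]:
    """List the target languages that have no usable value in one cache row."""
    # None, empty, and whitespace-only values all count as missing.
    return [lang for lang in targets if not (translations.get(lang) or "").strip()]
```

For a row with English and Japanese filled, this returns only `["zh-Hant"]`, so the next job requests a single language instead of retranslating the whole row.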

sequenceDiagram
  autonumber
  participant Collector as Missing-translation collector
  participant Store as Translation cache
  participant Provider as Translation provider

  Collector->>Store: Find Korean source text with missing languages
  Store-->>Collector: Source text to translate
  Collector->>Provider: Translate only missing languages
  Provider-->>Collector: Translations by language
  Collector->>Store: Store without deleting existing languages

This prevents one language’s failure from becoming a permanent blank. It fills only the missing parts without deleting curated values or other language values.
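The "store without deleting" step might look like the following merge. This is a sketch with a hypothetical `merge_translations` helper, where a cache row is modeled as a plain dictionary of language code to translated text.

```python
from typing import Dict, Optional


def merge_translations(
    existing: Dict[str, str],
    incoming: Dict[str, Optional[str]],
) -> Dict[str, str]:
    """Fill gaps in a cache row without overwriting or deleting stored languages."""
    merged = dict(existing)  # copy, so the caller's row is not mutated
    for lang, value in incoming.items():
        if not value or not value.strip():
            continue  # a failed language stays missing and can be retried next run
        if (merged.get(lang) or "").strip():
            continue  # keep the value that is already stored
        merged[lang] = value
    return merged
```

A provider response that omits a language, or returns it empty, simply leaves that slot untouched, so a partial failure never erases work from an earlier run.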

Risky translations are not stored

Machine translation is not stored blindly. It needs minimum guardrails.

  • Do not store a result that comes back unchanged from the Korean source, that is, effectively untranslated.
  • Leave very long source text as the original instead of spending translation quota on it.
  • Do not delete existing values when a job returns no value for a language.
  • Proper nouns in curated/official files take priority over machine translation.

Machine translation cannot be guaranteed to sound natural. Instead, I avoid storing risky translations and keep a path for curated files to override them.
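The first two guardrails can be sketched as simple checks before and after calling the provider. The length cutoff, the Hangul-ratio heuristic, and the function names are assumptions made for illustration. The post only states the rules, not how they are measured.

```python
import re
from typing import Optional

MAX_SOURCE_CHARS = 500  # hypothetical cutoff; the real limit is not stated
HANGUL_SYLLABLE = re.compile(r"[\uac00-\ud7a3]")


def should_translate(text_ko: str) -> bool:
    """Skip very long source text instead of spending translation quota on it."""
    return len(text_ko) <= MAX_SOURCE_CHARS


def is_storable(text_ko: str, translated: Optional[str]) -> bool:
    """Reject results that are empty or still look like untranslated Korean."""
    if translated is None or not translated.strip():
        return False
    candidate = translated.strip()
    if candidate == text_ko.strip():
        return False  # the provider echoed the source back
    # Heuristic (an assumption): a result that is mostly Hangul was not really translated.
    visible = [ch for ch in candidate if not ch.isspace()]
    hangul = sum(1 for ch in visible if HANGUL_SYLLABLE.match(ch))
    return hangul / len(visible) <= 0.5
```

A rejected result is just not written, which leaves the language missing so that a later job or a curated file can fill it.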

Proper nouns such as place names, facility names, and event names feel awkward immediately when translated badly. So there had to be a way for people to override and review values beyond the cache.

The API chooses display text

The client does not read the translation DB directly. The backend receives a language hint and returns display text in a LocalizedText shape.

flowchart TB
  KRText["Korean source text"] --> Resolver["Choose translation"]
  Assets["Official/curated translation files"] --> Resolver
  Cache["Machine-translation cache"] --> Resolver
  Resolver --> DTO["API display field"]
  DTO --> Client["iOS app and widgets"]

This means iOS does not need to know the translation storage. The app only reads the display value chosen by the backend and whether source text was used as fallback.
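A response field along these lines could be sketched as follows. Only the name `LocalizedText` comes from the text above. The field names (`text`, `language`, `is_source_fallback`) and the helper functions are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalizedText:
    text: str                 # what the client should display
    language: str             # language of `text`, which may differ from the request
    is_source_fallback: bool  # True when the Korean original is shown instead


def to_localized_text(text_ko: str, requested_lang: str, translation: Optional[str]) -> LocalizedText:
    """Build the display field from an already-resolved translation, if any."""
    if translation and translation.strip():
        return LocalizedText(translation, requested_lang, False)
    return LocalizedText(text_ko, "ko", True)


def to_json(field: LocalizedText) -> str:
    # ensure_ascii=False keeps Korean readable in the payload instead of \u escapes.
    return json.dumps(asdict(field), ensure_ascii=False)
```

With a flag like `is_source_fallback`, the client can style the Korean original deliberately without knowing anything about files, caches, or priority rules.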

Korean source text stays until the end

The original text is both the fallback when a translation is missing and evidence that can be revisited later. In a public-data app, losing the original makes it hard to explain where text came from.

So public data preserves Korean source text and source metadata, while translation is treated as the display value. Translation failure must not become collection failure.
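One way to keep translation failure from turning into collection failure is to persist the source first and treat translation as a best-effort step. The sketch below assumes hypothetical `save_source` and `translate` callables and does not reflect the actual job structure.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("translation")


def collect_then_translate(
    texts_ko: List[str],
    save_source: Callable[[str], None],
    translate: Callable[[str], Dict[str, str]],
) -> Dict[str, Dict[str, str]]:
    """Persist every source text; a translation error only leaves that text untranslated."""
    results: Dict[str, Dict[str, str]] = {}
    for text in texts_ko:
        save_source(text)  # the source is stored before any translation is attempted
        try:
            results[text] = translate(text)
        except Exception:
            # Best effort: log, leave the languages missing, and continue collecting.
            logger.warning("translation failed; keeping source only", exc_info=True)
            results[text] = {}
    return results
```

An empty result for a text means all of its languages are still missing, so a later translation job can pick it up while the display falls back to Korean in the meantime.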

This separation also helped later fixes. If a translation is strange, the display text can be corrected. If the source itself changed, the collection/normalization path should be checked. Keeping the original lets the problem be split again.

A structure that does not show blanks

UI strings and public-data translation had to be handled separately. Public-data source text remains the most reliable original, and curated/official translation files take priority over machine translation.

The translation cache had to track not just whether a string had been translated, but which languages were still missing for each string. When translation fails, the app intentionally shows the original instead of a key or blank. Keeping clients away from direct translation-storage reads also made API responses simpler.

The lasting choice in multilingual handling was not translating every piece of text perfectly. It was preserving the original, leaving a path to replace translations with better ones, and making sure users do not see empty spaces.
