Running a Mini-PC Server Like a Real Service
A retrospective on separating user request paths, GitOps state, observability, and backups
Hangangjari’s backend runs on a mini PC. Written that way, it can sound like a hobby server. But once an App Store app starts calling that API, the story changes.
Users do not know where the server is. If parking information is stale or notifications arrive late, “it is a small personal project” is no excuse. That is the moment the app becomes hard to trust. This changed how I thought about the system.
“It runs” was not enough
At first I had a “keep it simple because it is a personal project” mindset. If the API and workers ran, DB and Redis were connected, and the app received responses, that looked sufficient.
Once users arrive, the question changes. It is not “does it run?” It is “when something breaks, where can I look first?”
flowchart LR
Client["iOS app / widget"] --> Public["User request path<br/>edge / ingress"]
Public --> API["API service"]
API --> Data["Postgres / Redis"]
Workers["Workers<br/>ingestion / forecast / push"] --> Data
Data --> API
Runtime["GitOps state"] --> API
Runtime --> Workers
Signals["Metrics / logs / dashboards"] --> API
Signals --> Workers
If any one of these falters, the app becomes slow or shows stale information. So even a mini PC has to be treated like a cloud VM once users depend on it. The server is small, but user-facing failures are not.
I separated user requests from management paths
The first rule I kept was not mixing the user request path with the management path.
The API used by app users should not sit behind the login-protected barrier that is meant only for management. Conversely, the path I use to operate the server should not be opened up the way the user API is.
I will not list specific configuration values here. The point is that user requests and operator access need to be treated as different kinds of paths.
This separation also helps during incidents. If a user request-path problem, an internal runtime problem, and an operator control-path problem are mixed together, the recovery order becomes unclear.
Server state was recorded in Git
Even in a small project, I needed to explain why the server is in its current state. So I put the desired server state in Git and made the runtime reconcile toward it.
flowchart LR
AppRepo["App repository"] --> CI["CI<br/>tests / build"]
CI --> Image["Versioned image"]
CI --> Desired["Desired deploy state"]
Desired --> Sync["GitOps sync"]
Sync --> Runtime["API / workers / data services"]
Runtime --> Smoke["Post-deploy smoke check"]
This paid off most when something went wrong.
- I can trace which image shipped.
- I can see whether the real server diverged from Git state.
- Deployment and post-deploy checks can be discussed in the same language.
- During incidents, I do not have to guess what was changed manually on the server.
GitOps does not solve everything. DB migrations, worker rollout order, operating-config changes, and external-source failures still need separate attention. But having the desired server state recorded reduces the incident surface.
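The "did the real server diverge from Git state" question can be sketched as a plain comparison between two mappings. This is a minimal illustration, not the sync tool's actual mechanism: the function name `find_drift`, the `Drift` record, and the service and image names in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Drift:
    service: str
    desired: Optional[str]  # image recorded in Git, None if not declared there
    running: Optional[str]  # image actually running, None if not running


def find_drift(desired: dict[str, str], running: dict[str, str]) -> list[Drift]:
    """Return every service whose running image differs from the one in Git.

    Both arguments map a service name to an image tag. A service present
    on only one side is reported with None on the other side.
    """
    drifts = []
    for service in sorted(desired.keys() | running.keys()):
        want = desired.get(service)
        have = running.get(service)
        if want != have:
            drifts.append(Drift(service, want, have))
    return drifts
```

Called with a desired state of `{"api": "api:1.4.2", "worker": "worker:1.4.2"}` and a running state where the worker is still on `worker:1.4.1`, this returns a single `Drift` for `worker`. An empty list means the server matches what Git says, which is the property that removes the guessing about manual changes during an incident.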
Dashboards started from the check order
Adding graphs and logs does not automatically improve operations. First I had to decide the order in which I would actually look.
The signals Hangangjari needs to keep watching are:
- Is the user API alive?
- Is the API alive but only one screen response failing?
- Are workers still collecting source data?
- Has the last success time exceeded the stale threshold?
- Did the cache hit/failure ratio change abnormally?
- Is the forecast run recent?
- Is the push outbox building up?
- Were backups created, and was restore verified?
A dashboard should gather these signals. Many graphs are of little use if they do not narrow down the candidate causes.
So the dashboard had to match the actual check order rather than show as many numbers as possible. It first asks whether the API is alive, then whether only one screen's response is failing, whether a worker stopped, whether a source is stale, and whether the cache is serving values in place of fresh data.
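The idea of a fixed check order with a stale threshold can be sketched in a few lines. This is only an illustration under assumptions: the helper names `is_stale` and `first_failing_check`, the check labels, and the 30-minute threshold are hypothetical and are not taken from Hangangjari's actual dashboards.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def is_stale(last_success: datetime, threshold: timedelta, now: datetime) -> bool:
    """True when the last successful run is older than the allowed threshold."""
    return now - last_success > threshold


def first_failing_check(checks: list[tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run checks in the given order and return the name of the first failure.

    Returns None when every check passes. Later checks are not evaluated
    once one fails, which mirrors looking at signals in a fixed order.
    """
    for name, check in checks:
        if not check():
            return name
    return None


# Example values (hypothetical): ingestion last succeeded 45 minutes ago,
# and anything older than 30 minutes counts as stale.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_ingest = now - timedelta(minutes=45)
checks = [
    ("user API alive", lambda: True),
    ("source data fresh", lambda: not is_stale(last_ingest, timedelta(minutes=30), now)),
    ("push outbox draining", lambda: True),
]
print(first_failing_check(checks))  # source data fresh
```

Keeping the order in a list makes it explicit and reviewable: changing what gets looked at first is a one-line change rather than a habit.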
When failure appears, I follow the path first
If I suspect the code the moment a problem appears, I lose time. In Hangangjari, I split incidents by following the path requests take.
flowchart TD
Symptom["User report or smoke failure"] --> Public{"User API healthy?"}
Public -->|No| Path{"Edge/ingress path issue?"}
Path -->|Yes| Network["Check user request path"]
Path -->|No| Runtime["Check runtime"]
Public -->|Yes| Feature{"Only one screen failing?"}
Feature -->|Yes| Data{"Source freshness or cache issue?"}
Data -->|Yes| Worker["Check workers / ingestion / cache"]
Data -->|No| API["Check API response / DB query"]
Feature -->|No| Client["Check app cache / widget snapshot"]
The important part is not the specific runtime method. It is which section to suspect, and in what order.
This order also makes real operations calmer. When an incident appears, the first impulse is to open the code. But a blocked user request path and stale source data require completely different responses. Following the path section by section at least reduces the time spent digging in the wrong place.
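The decision flow above can also be written down as a small function, which makes the order testable. This is a sketch that mirrors the flowchart's branches only; the function name `next_check` and its parameter names are hypothetical, and the answers to each question still come from a person or a probe.

```python
def next_check(
    api_healthy: bool,
    path_issue: bool = False,
    single_screen_failing: bool = False,
    freshness_or_cache_issue: bool = False,
) -> str:
    """Return which section to inspect first, following the triage flowchart."""
    if not api_healthy:
        # The user API is down: decide between the edge/ingress path and the runtime.
        return "user request path" if path_issue else "runtime"
    if not single_screen_failing:
        # The API is healthy and no single screen stands out: look at the client side.
        return "app cache / widget snapshot"
    if freshness_or_cache_issue:
        # One screen fails because its data is stale or the cache misbehaves.
        return "workers / ingestion / cache"
    # One screen fails while its data looks fresh: inspect the API response itself.
    return "API response / DB query"
```

For example, `next_check(api_healthy=True, single_screen_failing=True, freshness_or_cache_issue=True)` returns `"workers / ingestion / cache"`, so the code is not the first thing opened.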
Backups need restore, not just files
Having a backup file and being able to restore from it are different things. In Hangangjari, I split data by whether it can be rebuilt or must be recovered.
| Data | Character |
|---|---|
| Values that can be collected again from public sources | Rebuildable |
| User notification settings and push subscriptions | User data that must be recovered |
| App events and audit records | Evidence for operations review |
| Forecast/backtest history | Evidence for forecast review |
| Cache | Rebuildable |
Rebuildable cache and user-provided data must not be treated with the same weight. Restore drills are the minimum procedure for checking whether backups actually restore.
With this split, incident recovery has clearer priorities. The recovery order differs depending on whether a rebuildable value disappeared or user settings/evidence for operations disappeared.
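One way to make these priorities explicit is to tag each data set with its recovery class and sort by it. This is a minimal sketch: the data set keys and the names `Recovery`, `DATASETS`, and `recovery_order` are hypothetical labels for the rows of the table above, and treating user data and review evidence as one "must recover" class is a simplifying assumption.

```python
from enum import IntEnum


class Recovery(IntEnum):
    MUST_RECOVER = 0  # restore from backup first
    REBUILDABLE = 1   # can be collected or computed again


# Hypothetical keys standing in for the rows of the table.
DATASETS = {
    "public_source_values": Recovery.REBUILDABLE,
    "notification_settings": Recovery.MUST_RECOVER,
    "push_subscriptions": Recovery.MUST_RECOVER,
    "app_events_audit": Recovery.MUST_RECOVER,
    "forecast_history": Recovery.MUST_RECOVER,
    "cache": Recovery.REBUILDABLE,
}


def recovery_order(lost: list[str]) -> list[str]:
    """Sort lost data sets so must-recover data comes before rebuildable data."""
    return sorted(lost, key=lambda name: (DATASETS[name], name))


def needs_backup_restore(name: str) -> bool:
    """True when the data set cannot simply be rebuilt and must come from backup."""
    return DATASETS[name] is Recovery.MUST_RECOVER
```

With this, `recovery_order(["cache", "push_subscriptions", "forecast_history"])` puts `forecast_history` and `push_subscriptions` ahead of `cache`. The same mapping can drive which data sets a restore drill has to cover.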
What changed in the end
The biggest change was the attitude toward the server. In the past, it felt enough if API and workers were running. Now, when users see stale values, I first ask where the data stopped and which data can be rebuilt versus which data must be recovered.
Using a small server did not make operations easy. Because the server was small, the responsibilities I was carrying directly became more visible.