Why Parking Forecasts Need Confidence
I used p10, p50, p90, risk, and backtests so that a forecast does not look like the one correct answer
Hangangjari’s parking forecast is not a promise to predict the future exactly. It is a reference value that helps users ask, before leaving, “is this risky now?”
So before choosing the formula, I decided how the screen should speak.
Forecasts are shown separately from current values. They carry a confidence level. When the source data is stale, that fact is kept with the forecast as a reason. Backtests keep checking how far the forecast misses.
Users do not see the internals of the forecast. They see “is it worth going?”, “is this risky?”, or “is there not enough information?” So the calculation and the wording around the value had to be designed together. If a number goes straight onto the screen, the app cannot explain its uncertainty.
A forecast blends four kinds of evidence
The parking forecast combines four kinds of evidence.
| Evidence | Meaning |
|---|---|
| Recent status | Latest confirmed remaining spaces and capacity |
| Recent movement | Rate of change between recent confirmed values |
| Historical baseline | Historical mean and variance for the same horizon and time range |
| Recent-window statistics | Moving average, volatility, and trend correction for the recent window |
When a current value exists, recent movement is weighted strongly. The farther the horizon, the more weight shifts to the historical baseline. When no current value exists, the historical baseline or recent-window statistics are used instead. If there is no evidence, the forecast stays empty.
Weights and thresholds can change with use. What lasts longer is which evidence the model checks, and how carefully it speaks when the evidence is weak.
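A minimal sketch of that blending, under assumptions of my own: the `Evidence` fields, the `estimate_p50` function, the 30-minute half-life, and the rule "use the baseline first, then the recent-window mean" are illustrative, not Hangangjari's actual code. It only shows the shape described above: trend-driven at short horizons, baseline-driven at long ones, and empty when there is no evidence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    current_remaining: Optional[float]  # latest confirmed remaining spaces
    change_per_minute: Optional[float]  # recent movement between confirmed values
    baseline_mean: Optional[float]      # historical mean for this horizon and time range
    window_mean: Optional[float]        # moving average over the recent window


def clamp(value: float, capacity: float) -> float:
    """Keep an estimate between 0 and the lot capacity."""
    return max(0.0, min(capacity, value))


def estimate_p50(e: Evidence, horizon_min: float, capacity: float,
                 half_life_min: float = 30.0) -> Optional[float]:
    """Blend the available evidence into a p50 estimate, or None if there is none."""
    # Hypothetical fallback order: historical baseline first, then recent-window mean.
    fallback = e.baseline_mean if e.baseline_mean is not None else e.window_mean

    if e.current_remaining is None:
        # No current value: use the fallback, or leave the forecast empty.
        return None if fallback is None else clamp(fallback, capacity)

    # Project the current value forward using recent movement.
    slope = e.change_per_minute or 0.0
    projected = clamp(e.current_remaining + slope * horizon_min, capacity)
    if fallback is None:
        return projected

    # The projection's weight halves every `half_life_min` minutes of horizon,
    # so weight shifts toward the fallback as the horizon grows.
    w = 0.5 ** (horizon_min / half_life_min)
    return clamp(w * projected + (1.0 - w) * fallback, capacity)
```

The half-life is the kind of value expected to change with use. The part meant to last is the order of the checks: current value and movement first, baseline as the horizon grows, and `None` rather than a guess when nothing is available.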
flowchart LR
    Snapshots["Recent status"] --> Trend["Change per minute"]
    History["Historical baseline"] --> Estimate["p50 estimate"]
    Features["Recent-window statistics"] --> Estimate
    Trend --> Estimate
    Estimate --> Quantiles["p10 / p50 / p90"]
    Quantiles --> Risk["Risk level"]
    Quantiles --> Status["Expected crowding"]
I used range and risk instead of one number
If the forecast gives one number, users read it like a fact. So forecast values include p10, p50, and p90.
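- p50: the representative estimate of remaining spaces. The actual value is expected to land above or below it about equally often.
- p10: a conservative low estimate. The actual remaining spaces should fall below it only about 10% of the time.
- p90: an optimistic high estimate. The actual remaining spaces should exceed it only about 10% of the time.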
Risk is built from those three values. In parking, “is it risky on the low side?” matters more than “is there space on average?”
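As a sketch of how the three values and a risk level could fit together: the code below assumes the spread around p50 is roughly normal with a known standard deviation `sigma`, which is my simplification and not something the forecast is stated to do. The level names and the 5% "nearly full" threshold are hypothetical as well.

```python
from dataclasses import dataclass

Z_90 = 1.2816  # standard normal quantile for the 90th percentile


@dataclass
class Quantiles:
    p10: float
    p50: float
    p90: float


def _clamp(value: float, capacity: float) -> float:
    return max(0.0, min(capacity, value))


def to_quantiles(p50: float, sigma: float, capacity: float) -> Quantiles:
    """Derive p10 and p90 from p50 under a normal-spread assumption."""
    return Quantiles(
        p10=_clamp(p50 - Z_90 * sigma, capacity),
        p50=_clamp(p50, capacity),
        p90=_clamp(p50 + Z_90 * sigma, capacity),
    )


def risk_level(q: Quantiles, capacity: float, full_ratio: float = 0.05) -> str:
    """Classify risk from the low side first (hypothetical levels and threshold)."""
    threshold = capacity * full_ratio  # at or below this, the lot counts as nearly full
    if q.p90 <= threshold:
        return "critical"  # nearly full even under the optimistic view
    if q.p50 <= threshold:
        return "high"      # nearly full in the representative case
    if q.p10 <= threshold:
        return "medium"    # nearly full only under the conservative view
    return "low"
```

The ordering of the checks reflects the point above: a lot whose p50 looks comfortable can still be "medium" because its p10 is close to zero.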
Weak evidence lowers confidence
Confidence is not a score that makes a forecast look impressive. It tells both users and developers how carefully the forecast should be read.
For now, confidence follows these rules:
- Lower confidence as the horizon gets farther away.
- Lower confidence when the source is stale.
- Raise it slightly when historical samples are sufficient.
- Raise it slightly when recent-window samples are sufficient.
Reason codes are stored with it. For example, the record can show that the source was stale, the horizon was far, the historical baseline was used, or confidence was low.
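The rules above could be sketched roughly like this. Every number here (the 0.8 starting score, the penalties and bonuses, the 60-minute and 15-minute cutoffs, the sample minimums) and every reason-code name is a placeholder I chose for illustration. The source only states the direction of each adjustment.

```python
from typing import List, Tuple


def confidence(horizon_min: float, source_age_min: float,
               history_samples: int, window_samples: int,
               used_baseline: bool = False) -> Tuple[float, List[str]]:
    """Return a confidence score in [0, 1] and the reason codes behind it."""
    score = 0.8
    reasons: List[str] = []

    if horizon_min > 60:        # farther horizon lowers confidence
        score -= 0.2
        reasons.append("HORIZON_FAR")
    if source_age_min > 15:     # stale source lowers confidence
        score -= 0.3
        reasons.append("SOURCE_STALE")
    if history_samples >= 30:   # enough historical samples raises it slightly
        score += 0.05
    if window_samples >= 10:    # enough recent-window samples raises it slightly
        score += 0.05
    if used_baseline:           # record that the baseline stood in for a current value
        reasons.append("BASELINE_USED")

    score = round(max(0.0, min(1.0, score)), 2)
    if score < 0.5:
        reasons.append("LOW_CONFIDENCE")
    return score, reasons
```

Returning the reasons alongside the score keeps the two from drifting apart: whatever lowered the number is also what the record, and later the screen, can point to.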
flowchart TD
    Horizon["Forecast horizon"] --> Confidence
    Freshness["Source freshness"] --> Confidence
    Samples["Historical sample count"] --> Confidence
    Features["Recent-window sample count"] --> Confidence
    Confidence --> Reason["Reason codes"]
A forecast is stored as one generated bundle
Each forecast generation is stored as a run. A run includes model version, creation time, and forecast horizon. On success it records the number of stored rows and the observed time used as the base. On failure it closes the run as failed.
sequenceDiagram
    autonumber
    participant Job as Forecast job
    participant Repo as Forecast repository
    participant Gen as Forecast generator
    Job->>Repo: Create new forecast-run record
    Gen->>Repo: Read lots/history/baseline/statistics
    Gen->>Gen: Calculate values by horizon
    Gen->>Repo: Store forecast rows
    Gen->>Repo: Close run with base observed time
Runs make it possible to trace when the forecast currently visible through the API was calculated. They also keep results from different model versions from blending together.
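A run record along these lines might look like the sketch below. The `ForecastRun` fields, the status strings, and the `execute_run` helper are hypothetical names, and storage is left out entirely. The `generate` callback stands in for the generator and is assumed to return the stored row count and the base observed time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional, Tuple


@dataclass
class ForecastRun:
    model_version: str
    horizons_min: List[int]
    created_at: datetime
    status: str = "running"
    row_count: Optional[int] = None
    base_observed_at: Optional[datetime] = None
    error: Optional[str] = None


def execute_run(model_version: str, horizons_min: List[int],
                generate: Callable[[ForecastRun], Tuple[int, datetime]]) -> ForecastRun:
    """Open a run, call the generator, and close the run as succeeded or failed."""
    run = ForecastRun(model_version, list(horizons_min), datetime.now(timezone.utc))
    try:
        row_count, base_observed_at = generate(run)
    except Exception as exc:
        # On failure the run is still closed, so it never lingers as "running".
        run.status = "failed"
        run.error = str(exc)
        return run
    run.status = "succeeded"
    run.row_count = row_count
    run.base_observed_at = base_observed_at
    return run
```

Because every forecast row would belong to exactly one run, and each run names one model version, filtering by run is enough to keep versions apart.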
Forecast wording stays careful because of backtests
After launch, forecasts have to be compared with actual values. A Hangangjari backtest labels each past forecast with the confirmed value observed nearest its target arrival time, then calculates metrics per horizon.
The metrics are roughly:
- Sample count.
- Average error similar to MAE.
- Ratio of actual values inside the p10-p90 range.
- Ratio of samples with stale sources.
- Gate pass/fail.
flowchart LR
    ForecastRows["Past forecast rows"] --> Labels["Attach actual values near arrival time"]
    Labels --> Score["Compare forecast and actual value"]
    Score --> Metrics["Horizon/model metrics"]
    Metrics --> Gate["Quality gate"]
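The per-horizon metrics listed above can be sketched as follows. The `ScoredSample` and `HorizonMetrics` types and the gate thresholds are assumptions for illustration, and the samples are assumed to already carry their attached actual values. For reference, a well-calibrated p10-p90 range should contain the actual value about 80% of the time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredSample:
    p10: float
    p50: float
    p90: float
    actual: float        # confirmed value near the target arrival time
    source_stale: bool   # whether the forecast was built from a stale source


@dataclass
class HorizonMetrics:
    sample_count: int
    mae: Optional[float]
    coverage: Optional[float]     # share of actual values inside [p10, p90]
    stale_ratio: Optional[float]
    gate_passed: bool


def score_horizon(samples: List[ScoredSample], min_samples: int = 30,
                  max_mae: float = 15.0, min_coverage: float = 0.7) -> HorizonMetrics:
    """Compute backtest metrics for one horizon and apply a simple quality gate."""
    n = len(samples)
    if n == 0:
        return HorizonMetrics(0, None, None, None, False)

    mae = sum(abs(s.p50 - s.actual) for s in samples) / n
    coverage = sum(1 for s in samples if s.p10 <= s.actual <= s.p90) / n
    stale_ratio = sum(1 for s in samples if s.source_stale) / n

    # Hypothetical gate: enough samples, bounded error, and adequate coverage.
    gate_passed = n >= min_samples and mae <= max_mae and coverage >= min_coverage
    return HorizonMetrics(n, mae, coverage, stale_ratio, gate_passed)
```

A horizon with too few samples fails the gate in this sketch even if its error looks small, which matches the idea that thin evidence should make the wording more careful, not less.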
Backtests keep the app from exaggerating forecast language. They keep forecasts as reference values whose error and uncertainty are watched continuously, not as exact answers.
Backtests are both a tool for improving the formula and a safety rail for screen copy. If a horizon’s metrics are poor, that range can be described more carefully. If the app says a forecast is “accurate guidance” without validation, it may look good on the surface but users will lose trust quickly.
Forecasts became useful when they reduced overconfidence
On screen, forecast values had to be separated from current values. It was safer to show p10/p50/p90 and risk level together than to show only p50, and confidence had to be affected by horizon, source freshness, and sample count.
Reason codes became both a debugging aid and a basis for user-facing presentation. Runs and model versions are needed to reproduce and compare results, and without backtests there is no way to say whether the forecast improved.
“Parking forecast” sounds impressive as a name, but in practice its value came from reducing overconfidence. More than showing one more number, the app needed to say how far that number should be trusted.