Why Parking Forecasts Need Confidence

I used p10, p50, p90, risk, and backtests so forecasts do not look like one correct answer

Hangangjari’s parking forecast is not a promise to predict the future exactly. It is a reference value that helps users ask, before leaving, “is this risky now?”

So before choosing the formula, I decided how the screen should speak.

Forecasts are kept separate from current values. They carry a confidence level. When the source data is stale, that fact is kept as a reason. Backtests keep checking how far the forecasts miss.

Users do not see the internals of the forecast. They see “is it worth going?”, “is this risky?”, or “is there not enough information?” So the calculation and the wording around the value had to be designed together. If a number goes straight onto the screen, the app cannot explain its uncertainty.

A forecast blends four kinds of evidence

The parking forecast combines four kinds of evidence.

| Evidence | Meaning |
| --- | --- |
| Recent status | Latest confirmed remaining spaces and capacity |
| Recent movement | Rate of change between recent confirmed values |
| Historical baseline | Historical mean and variance for the same horizon and time range |
| Recent-window statistics | Moving average, volatility, and trend correction for the recent window |

When a current value exists, recent movement is weighted strongly. The farther the horizon, the more weight shifts to the historical baseline. When no current value exists, the historical baseline or recent-window statistics are used instead. If there is no evidence, the forecast stays empty.
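
The weighting described above can be sketched roughly as follows. This is a minimal illustration, not Hangangjari's actual formula: the function name `blend_p50`, its parameters, and the `half_weight_horizon_min` tuning value are all assumptions made for the example.

```python
from typing import Optional


def blend_p50(
    current: Optional[float],
    rate_per_min: Optional[float],
    baseline_mean: Optional[float],
    window_mean: Optional[float],
    horizon_min: int,
    capacity: int,
    half_weight_horizon_min: float = 30.0,  # hypothetical tuning value
) -> Optional[float]:
    """Blend the available evidence into a p50 estimate of remaining spaces."""
    # Used when there is no current value, and as the far-horizon anchor.
    fallback = baseline_mean if baseline_mean is not None else window_mean

    if current is None:
        # No current value: fall back to history or recent-window statistics.
        # With no evidence at all, the forecast stays empty (None).
        return None if fallback is None else _clamp(fallback, capacity)

    # Extrapolate recent movement from the latest confirmed value.
    trend = current + (rate_per_min or 0.0) * horizon_min
    if fallback is None:
        return _clamp(trend, capacity)

    # Trend weight is 1.0 at horizon 0 and shrinks as the horizon grows,
    # so weight shifts toward the historical side for distant horizons.
    w_trend = half_weight_horizon_min / (half_weight_horizon_min + horizon_min)
    return _clamp(w_trend * trend + (1.0 - w_trend) * fallback, capacity)


def _clamp(value: float, capacity: int) -> float:
    """Keep the estimate between 0 and the lot capacity."""
    return max(0.0, min(float(capacity), value))
```

With these assumed weights, a lot showing 40 spaces and losing 0.5 per minute, against a historical mean of 20, lands halfway between trend and baseline at a 30-minute horizon.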

Weights and thresholds can change with use. What lasts longer is which evidence the model checks, and how carefully it speaks when the evidence is weak.

flowchart LR
  Snapshots["Recent status"] --> Trend["Change per minute"]
  History["Historical baseline"] --> Estimate["p50 estimate"]
  Features["Recent-window statistics"] --> Estimate
  Trend --> Estimate
  Estimate --> Quantiles["p10 / p50 / p90"]
  Quantiles --> Risk["Risk level"]
  Quantiles --> Status["Expected crowding"]

I used range and risk instead of one number

If the forecast gives one number, users read it like a fact. So forecast values include p10, p50, and p90.

  • p50: representative expected remaining spaces.
  • p10: low remaining spaces under a conservative view.
  • p90: high remaining spaces under an optimistic view.

Risk is built from those three values. In parking, “is it risky on the low side?” matters more than “is there space on average?”
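
As a rough sketch of how a risk level could be derived from the three values, the function below looks at the low side first. The thresholds, the level names, and the `Quantiles` type are hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantiles:
    p10: float  # conservative (low) remaining spaces
    p50: float  # representative remaining spaces
    p90: float  # optimistic (high) remaining spaces


def risk_level(q: Quantiles, capacity: int) -> str:
    """Classify risk mainly from the low side, not from the average."""
    if capacity <= 0:
        return "unknown"
    low, mid, high = (v / capacity for v in (q.p10, q.p50, q.p90))

    # All thresholds below are hypothetical.
    if high < 0.10:
        # Even the optimistic view is nearly full.
        return "high"
    if low < 0.05:
        # The conservative view is nearly full; p50 decides how serious it is.
        return "high" if mid < 0.20 else "medium"
    return "low"
```

The point of the ordering is that a comfortable p50 cannot hide a bad p10: a lot with a p50 of 30 spaces but a p10 of 2 still comes out as "medium" here.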

Weak evidence lowers confidence

Confidence is not a score that makes a forecast look impressive. It tells both users and developers how carefully the forecast should be read.

For now, confidence follows these rules:

  • Lower confidence as the horizon gets farther away.
  • Lower confidence when the source is stale.
  • Raise it slightly when historical samples are sufficient.
  • Raise it slightly when recent-window samples are sufficient.

Reason codes are stored with it. For example, the record can show that the source was stale, the horizon was far, the historical baseline was used, or confidence was low.
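
One way to picture these rules is a small scoring function that returns the confidence together with its reason codes. Everything specific here is an assumption for illustration: the starting score, the penalty and bonus sizes, the cutoffs, and the reason code names are not taken from the real implementation.

```python
from typing import List, Tuple


def score_confidence(
    horizon_min: int,
    source_age_min: float,
    history_samples: int,
    window_samples: int,
    used_baseline_only: bool = False,
) -> Tuple[float, List[str]]:
    """Return (confidence in 0..1, reason codes). All constants are hypothetical."""
    confidence = 0.8
    reasons: List[str] = []

    # Farther horizon: lower confidence, capped so it cannot dominate.
    confidence -= min(0.3, horizon_min / 60 * 0.1)
    if horizon_min >= 60:
        reasons.append("HORIZON_FAR")

    # Stale source: lower confidence and record why.
    if source_age_min > 10:
        confidence -= 0.2
        reasons.append("SOURCE_STALE")

    # Enough samples: raise confidence slightly.
    if history_samples >= 30:
        confidence += 0.05
    if window_samples >= 6:
        confidence += 0.05

    if used_baseline_only:
        reasons.append("BASELINE_USED")

    confidence = max(0.0, min(1.0, confidence))
    if confidence < 0.5:
        reasons.append("LOW_CONFIDENCE")
    return round(confidence, 2), reasons
```

Returning the reasons alongside the score means the stored record can explain a low value without anyone re-running the calculation.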

flowchart TD
  Horizon["Forecast horizon"] --> Confidence
  Freshness["Source freshness"] --> Confidence
  Samples["Historical sample count"] --> Confidence
  Features["Recent-window sample count"] --> Confidence
  Confidence --> Reason["Reason codes"]

A forecast is stored as one generated bundle

Each forecast generation is stored as a run. A run includes model version, creation time, and forecast horizon. On success it records the number of stored rows and the observed time used as the base. On failure it closes the run as failed.

sequenceDiagram
  autonumber
  participant Job as Forecast job
  participant Repo as Forecast repository
  participant Gen as Forecast generator

  Job->>Repo: Create new forecast-run record
  Gen->>Repo: Read lots/history/baseline/statistics
  Gen->>Gen: Calculate values by horizon
  Gen->>Repo: Store forecast rows
  Gen->>Repo: Close run with base observed time

Runs make it possible to trace when the forecast currently visible through the API was calculated. They also keep results from different model versions from blending together.
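
A run record along these lines could look like the sketch below. The class name, field names, and status strings are illustrative assumptions, not the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass
class ForecastRun:
    """One forecast generation. Field names are illustrative, not the real schema."""

    model_version: str
    horizons_min: Tuple[int, ...]
    created_at: datetime
    status: str = "running"
    row_count: Optional[int] = None
    base_observed_at: Optional[datetime] = None
    error: Optional[str] = None

    def close_success(self, row_count: int, base_observed_at: datetime) -> None:
        """Record how many rows were stored and which observed time was the base."""
        self._require_open()
        self.status = "succeeded"
        self.row_count = row_count
        self.base_observed_at = base_observed_at

    def close_failed(self, error: str) -> None:
        """Close the run as failed so partial results are never mistaken for a full run."""
        self._require_open()
        self.status = "failed"
        self.error = error

    def _require_open(self) -> None:
        if self.status != "running":
            raise ValueError(f"run already closed as {self.status}")


# Example: open a run, then close it once rows are stored.
run = ForecastRun("v1", (15, 30, 60), datetime.now(timezone.utc))
run.close_success(row_count=36, base_observed_at=datetime.now(timezone.utc))
```

Refusing to close a run twice keeps a late failure from overwriting a run that already succeeded, and the reverse.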

Forecast wording stays careful because of backtests

After launch, forecasts have to be compared with actual values. Hangangjari’s backtests label each past forecast with an actual confirmed value observed near its target arrival time, then calculate metrics per horizon.
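
The labeling step can be sketched as a nearest-observation lookup with a tolerance window. The `attach_label` name, the `Observation` shape, and the 5-minute tolerance are assumptions for this example.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (observed_at, confirmed remaining spaces); a hypothetical shape for this sketch.
Observation = Tuple[datetime, int]


def attach_label(
    target_at: datetime,
    observations: List[Observation],
    tolerance: timedelta = timedelta(minutes=5),  # hypothetical window
) -> Optional[int]:
    """Return the confirmed value closest to target_at, or None if none is near enough."""
    best: Optional[Tuple[timedelta, int]] = None
    for observed_at, remaining in observations:
        gap = abs(observed_at - target_at)
        if gap <= tolerance and (best is None or gap < best[0]):
            best = (gap, remaining)
    return None if best is None else best[1]
```

Forecasts that get no label are left out of the metrics rather than scored against a value from the wrong time.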

The values are roughly:

  • Sample count.
  • Average error similar to MAE.
  • Ratio of actual values inside the p10-p90 range.
  • Ratio of samples with stale sources.
  • Gate pass/fail.

flowchart LR
  ForecastRows["Past forecast rows"] --> Labels["Attach actual values near arrival time"]
  Labels --> Score["Compare forecast and actual value"]
  Score --> Metrics["Horizon/model metrics"]
  Metrics --> Gate["Quality gate"]

Backtests keep the app from exaggerating forecast language. They keep forecasts as reference values whose error and uncertainty are watched continuously, not as exact answers.

Backtests are both a tool for improving the formula and a safety rail for screen copy. If a horizon’s metrics are poor, that range can be described more carefully. If the app says a forecast is “accurate guidance” without validation, it may look good on the surface but users will lose trust quickly.

Forecasts became useful when they reduced overconfidence

On screen, forecast values had to be separated from current values. It was safer to show p10/p50/p90 and risk level together than to show only p50, and confidence had to be affected by horizon, source freshness, and sample count.

Reason codes became both a debugging aid and a basis for user-facing presentation. Runs and model versions are needed to reproduce and compare results, and without backtests there is no way to say whether the forecast improved.

“Parking forecast” sounds impressive as a name, but in practice its value came from reducing overconfidence. More than showing one more number, the app needed to say how far that number should be trusted.
