Estimate vs. Actual — Closing the Feedback Loop — Project Management with Kavanah

An estimator only stays calibrated if it sees what actually happened. The feedback loop — capturing actuals, comparing them to estimates, surfacing the gap — is the difference between an estimator that gets sharper every quarter and one that drifts into noise. This lesson covers the loop, the metrics that surface it, and the social discipline that keeps it from turning into a stick.

Three ways Kavanah measures actuals

Kavanah collects actual time in three ways, in descending order of accuracy.

Explicit time entries. A member can start a timer on a task or log entries by hand. These are the gold standard: the time the member actually spent working on the task. The Time view (/time-productivity) is where members manage their entries; the AI agent will sometimes suggest 'should I start a timer on this?' when you switch context.

Inferred ranges. When no time entries exist, the estimator infers a range from the timestamps in the task's lifecycle: when it moved to 'in progress,' when it moved to 'review,' and when it moved to 'done.' This is rougher because the elapsed time between status changes includes breaks and parallel work, so it tends to overstate effort. Read it as an upper bound on the time spent, and as a reasonable fallback for the estimator's training data.
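As a rough illustration of the inference above, the sketch below derives a range from three lifecycle timestamps. The field names and the choice of bounds (time to review as the low end, time to done as the high end) are assumptions made for this example; the lesson does not specify how Kavanah computes the range.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass
class Lifecycle:
    # Hypothetical field names; Kavanah's real schema is not shown in this lesson.
    in_progress_at: datetime
    review_at: Optional[datetime]
    done_at: datetime


def inferred_range(lc: Lifecycle) -> Tuple[timedelta, timedelta]:
    """Return (low, high) elapsed time for a task with no time entries.

    high: 'in progress' to 'done' (includes review, breaks, parallel work).
    low:  'in progress' to 'review' when a review timestamp exists,
          otherwise the same as high.
    """
    high = lc.done_at - lc.in_progress_at
    low = (lc.review_at - lc.in_progress_at) if lc.review_at else high
    return low, high


task = Lifecycle(
    in_progress_at=datetime(2024, 1, 8, 9, 0),
    review_at=datetime(2024, 1, 8, 15, 0),
    done_at=datetime(2024, 1, 9, 11, 0),
)
low, high = inferred_range(task)
print(low, high)
```

Both ends are elapsed time, not effort, which is why a range like this carries less weight than an explicit entry.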

Calendar overlap. For meeting-heavy work (customer calls, design reviews), the estimator can correlate task progress with calendar events tagged with the task or project. This is the noisiest of the three but useful for tasks whose work is primarily synchronous.

The estimator weighs these three by reliability. A task with rich explicit entries dominates the training data; a task with only lifecycle timestamps contributes less. The point is that the system degrades gracefully when time-entry hygiene is partial, which it always is.
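One way to picture reliability weighting is a weighted average of actual-to-estimate ratios, where each sample's weight depends on how its actual was measured. This is a minimal sketch: the weight values and the Sample type are invented for illustration, and the lesson states only the ordering of the three sources, not the numbers or the formula Kavanah uses.

```python
from dataclasses import dataclass
from typing import List

# Illustrative weights only; the ordering comes from the lesson, the values do not.
SOURCE_WEIGHT = {"explicit": 1.0, "inferred": 0.4, "calendar": 0.2}


@dataclass
class Sample:
    estimate_hours: float
    actual_hours: float
    source: str  # "explicit", "inferred", or "calendar"


def weighted_mean_ratio(samples: List[Sample]) -> float:
    """Reliability-weighted mean of actual/estimate across samples."""
    total_weight = sum(SOURCE_WEIGHT[s.source] for s in samples)
    if total_weight == 0:
        raise ValueError("no samples to average")
    weighted_sum = sum(
        SOURCE_WEIGHT[s.source] * (s.actual_hours / s.estimate_hours)
        for s in samples
    )
    return weighted_sum / total_weight


samples = [
    Sample(estimate_hours=4, actual_hours=6, source="explicit"),   # ratio 1.5
    Sample(estimate_hours=4, actual_hours=12, source="inferred"),  # ratio 3.0
]
ratio = weighted_mean_ratio(samples)
print(round(ratio, 3))
```

With these assumed weights the noisy inferred sample pulls the result up less than it would in a plain average, which is the graceful degradation described above.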

Reading the variance

The /reports surface and the AI agent both expose variance views. Three angles are worth knowing.

By skill tag. The variance distribution per skill tells you which kinds of work the estimator is calibrated on and which it is still learning. Tags with a high mean absolute percentage error (MAPE) need either more shipped history or sharper descriptions on incoming tasks.

By member. Per-member variance tells you whose tasks the estimator is calibrated to. Big positive bias for a member (estimator consistently below their actual) might mean the member is being assigned work above their declared skill level; big negative bias might mean the member is faster than their declared skills suggest and might be ready for harder work.

By project. Per-project variance tells you which projects' work is well shaped and which is operating in poorly understood territory. A project with a wide and growing variance distribution is one where the team is still learning fast. That is good, but it means deadlines committed early should be revisited.
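The three angles above are the same calculation grouped by a different key. The sketch below computes MAPE and a signed bias per group, using the sign convention from the per-member paragraph (positive bias means the estimate was below the actual). The row layout and the function name are assumptions for this example, not the /reports data model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def variance_by(rows: List[dict], key: str) -> Dict[str, Tuple[float, float]]:
    """Group rows by `key` and return {group: (mape, bias)}.

    Each row needs 'estimate' and 'actual' (hours, actual > 0) plus the key field.
    mape: mean of |actual - estimate| / actual
    bias: mean of (actual - estimate) / actual; positive = estimates ran low
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    result = {}
    for name, items in groups.items():
        errors = [(r["actual"] - r["estimate"]) / r["actual"] for r in items]
        mape = sum(abs(e) for e in errors) / len(errors)
        bias = sum(errors) / len(errors)
        result[name] = (mape, bias)
    return result


rows = [
    {"skill": "backend", "member": "a", "estimate": 4, "actual": 5},
    {"skill": "backend", "member": "b", "estimate": 6, "actual": 5},
    {"skill": "design", "member": "a", "estimate": 2, "actual": 4},
]
by_skill = variance_by(rows, "skill")
by_member = variance_by(rows, "member")
print(by_skill)
```

In this toy data the backend tag has a nonzero MAPE but zero bias, while the design tag is biased low. The two numbers answer different questions, so both are worth reading.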

The variance is a diagnostic surface. It is not a leaderboard. Reading it as 'who is the most accurate estimator' is the wrong frame, because the dominant signal in individual variance is usually the kind of work, not the person.

The estimate-vs-actual retrospective, done well

Once a sprint or once a month, the team runs through the largest variances and asks a single question: what about the work did the estimator miss?

Good answers — 'we didn't account for the legal review,' 'the customer fixture data turned out to be wrong,' 'the third refactor was load-bearing' — produce concrete learnings that feed back into how the team writes descriptions and breaks down tasks. The estimator's metadata captures the learnings as project-specific multipliers, and the next round of estimates incorporates them.
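To make the idea of a project-specific multiplier concrete, here is a minimal sketch that derives one from a project's shipped history and applies it to a new raw estimate. Using the median of actual-to-estimate ratios is an assumption chosen for its robustness to outliers; the lesson does not say how Kavanah derives its multipliers, and both function names are hypothetical.

```python
from statistics import median
from typing import List, Tuple


def project_multiplier(history: List[Tuple[float, float]]) -> float:
    """history: (estimate_hours, actual_hours) pairs for shipped tasks.

    Returns the median actual/estimate ratio, or 1.0 with no usable history.
    """
    ratios = [actual / estimate for estimate, actual in history if estimate > 0]
    return median(ratios) if ratios else 1.0


def adjusted_estimate(raw_hours: float, multiplier: float) -> float:
    """Scale a raw estimate by the project's multiplier."""
    return raw_hours * multiplier


history = [(4, 6), (8, 10), (2, 4)]  # ratios: 1.5, 1.25, 2.0
m = project_multiplier(history)
print(m, adjusted_estimate(10, m))
```

A multiplier like this records that the project runs long, but not why. The retrospective answers are what explain it.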

Bad answers — 'X is a slow estimator,' 'Y always sandbags' — produce social cost without producing learnings. The retro discipline is to keep the question on the work, not the person. If a member's variance distribution is genuinely out of family — which is rare — that is a 1:1 conversation about whether the work being assigned is the right work, not a public retro.

Time-entry hygiene without bureaucracy

Demanding minute-by-minute time tracking corrodes culture and produces falsified data within weeks. The alternative is a light hygiene that captures enough signal to feed the estimator without being a burden.

Default: lifecycle timestamps. Every task moving through the pipeline produces enough signal for an inferred range. No member has to do anything.

Opt-in: timers on focused work. For tasks where the member wants a sharper number (perhaps because they want to compare to their estimate), starting a timer is one click. The AI agent often suggests starting one.

Default for billable work: explicit entries. Agency teams and consultancies need to bill, and billing accuracy demands explicit entries. The Time view supports this; the workflow is built around running a timer per active task and adjusting at end-of-day.

The principle: the more billable the work, the higher the bar for explicit entries; the more internal the work, the lower. Most teams sit in the middle and benefit from a small end-of-day ritual where each member confirms or adjusts the inferred ranges. Ten minutes a day produces a clean dataset and a good estimator.

Estimates, well-collected and well-fed, scale up into capacity planning. The next lesson covers the workspace-level rollup: how to plan a sprint, a quarter, and a year against a calibrated estimator.