Module 4 — Estimation and Time · Lesson 4.3
Estimate vs. Actual — Closing the Feedback Loop
How to make the gap visible without making it punitive
~11 min
What you'll learn
- Distinguish the three ways Kavanah measures actual time and when each is used
- Read the variance reports and identify systematic estimation errors
- Run a useful estimate-vs-actual retrospective without making it punitive
- Maintain time-entry hygiene without it becoming a bureaucratic burden
An estimator only stays calibrated if it sees what actually happened. The feedback loop — capturing actuals, comparing them to estimates, surfacing the gap — is the difference between an estimator that gets sharper every quarter and one that drifts into noise. This lesson covers the loop, the metrics that surface it, and the social discipline that keeps it from turning into a stick.
Three ways Kavanah measures actuals
Kavanah collects actual time in three ways, in descending order of accuracy.
Explicit time entries. A member can start a timer on a task or log entries by hand. These are the gold standard: time the member actually recorded against the task. The Time view (/time-productivity) is where members manage their entries; the AI agent will sometimes suggest 'should I start a timer on this?' when you switch context.
Inferred ranges. When no time entries exist, the estimator infers a range from the timestamps in the task's lifecycle: when it moved to 'in progress,' when it moved to 'review,' when it moved to 'done.' This is rougher because it spans elapsed wall-clock time, including breaks and parallel work, so it overstates hands-on effort. Even so, it is a reasonable upper bound for the estimator's training data.
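A minimal sketch of how an inferred range could be derived from lifecycle timestamps. The record layout and field names here are hypothetical, chosen for illustration; they are not Kavanah's schema.

```python
from datetime import datetime

# Hypothetical lifecycle record for one task (field names are assumptions).
lifecycle = {
    "in_progress": datetime(2024, 3, 4, 9, 0),
    "review": datetime(2024, 3, 5, 15, 30),
    "done": datetime(2024, 3, 6, 11, 0),
}

def inferred_hours(events):
    """Elapsed wall-clock hours per phase, from status-change timestamps."""
    work = (events["review"] - events["in_progress"]).total_seconds() / 3600
    review = (events["done"] - events["review"]).total_seconds() / 3600
    return {"work": work, "review": review, "total": work + review}

result = inferred_hours(lifecycle)
```

The span covers two nights as well as any breaks, which is why a figure like this is much coarser than a timer reading for the same task.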
Calendar overlap. For meeting-heavy work (customer calls, design reviews), the estimator can correlate task progress with calendar events tagged with the task or project. This is the noisiest of the three but useful for tasks whose work is primarily synchronous.
The estimator weighs these three by reliability. A task with rich explicit entries dominates the training data; a task with only lifecycle timestamps contributes less. The point is that the system degrades gracefully when time-entry hygiene is partial, which it always is.
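One way to picture the reliability weighting is a fallback chain: use the most reliable source a task has, and attach a weight to the resulting sample. The weights and the `Actuals` type below are assumptions for illustration; the lesson gives the ordering of the three sources, not numeric values.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed weights, for illustration only.
SOURCE_WEIGHT = {"explicit": 1.0, "lifecycle": 0.5, "calendar": 0.25}

@dataclass
class Actuals:
    explicit_hours: Optional[float] = None
    lifecycle_hours: Optional[float] = None
    calendar_hours: Optional[float] = None

def best_actual(a: Actuals) -> Optional[Tuple[float, str, float]]:
    """Return (hours, source, weight) from the most reliable source present."""
    candidates = [
        ("explicit", a.explicit_hours),
        ("lifecycle", a.lifecycle_hours),
        ("calendar", a.calendar_hours),
    ]
    for source, hours in candidates:
        if hours is not None:
            return hours, source, SOURCE_WEIGHT[source]
    return None  # no usable signal for this task

timed = best_actual(Actuals(explicit_hours=3.5, lifecycle_hours=50.0))
untimed = best_actual(Actuals(lifecycle_hours=50.0))
```

A task with a timer reading contributes at full weight, while a task with only lifecycle timestamps still contributes, just less.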
Reading the variance
The /reports surface and the AI agent both expose variance views. Three angles are worth knowing.
By skill tag. The variance distribution per skill tells you which kinds of work the estimator is calibrated on and which it is still learning. Tags with a high mean absolute percentage error (MAPE) need either more shipped history or sharper descriptions on incoming tasks.
By member. Per-member variance tells you whose tasks the estimator is calibrated to. Big positive bias for a member (estimator consistently below their actual) might mean the member is being assigned work above their declared skill level; big negative bias might mean the member is faster than their declared skills suggest and might be ready for harder work.
By project. Per-project variance tells you which projects' work is well-shaped and which is operating in poorly-understood territory. A project with a wide and growing variance distribution is a project where the team is learning fast, which is good but means deadlines committed early should be revisited.
The variance is a diagnostic surface. It is not a leaderboard. Reading it as 'who is the most accurate estimator' is the wrong frame, because the dominant signal in individual variance is usually the kind of work, not the person.
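For readers who want to see the arithmetic behind these views, here is a small sketch that computes MAPE and signed bias per skill tag. The task tuples are made-up sample data, and the sign convention (positive when the estimate was below the actual) follows the description above; how Kavanah computes these internally is not stated in this lesson.

```python
from collections import defaultdict

# Made-up sample: (skill_tag, estimated_hours, actual_hours)
tasks = [
    ("backend", 4.0, 5.0),
    ("backend", 8.0, 10.0),
    ("design", 2.0, 2.0),
    ("design", 4.0, 3.2),
]

def variance_by_tag(rows):
    """Per-tag MAPE and signed bias, both as percentages of actual time."""
    grouped = defaultdict(list)
    for tag, est, act in rows:
        # Positive error: the estimate came in below the actual.
        grouped[tag].append((act - est) / act)
    report = {}
    for tag, errs in grouped.items():
        report[tag] = {
            "mape": 100 * sum(abs(e) for e in errs) / len(errs),
            "bias": 100 * sum(errs) / len(errs),
        }
    return report

report = variance_by_tag(tasks)
```

In this sample the backend tag is off by about 20% and always in the same direction, which points to a systematic miss. The design tag shows a smaller error in the opposite direction.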
The estimate-vs-actual retrospective, done well
Once a sprint or once a month, the team runs through the largest variances and asks a single question: what about the work did the estimator miss?
Good answers — 'we didn't account for the legal review,' 'the customer fixture data turned out to be wrong,' 'the third refactor was load-bearing' — produce concrete learnings that feed back into how the team writes descriptions and breaks down tasks. The estimator's metadata captures the learnings as project-specific multipliers, and the next round of estimates incorporates them.
Bad answers — 'X is a slow estimator,' 'Y always sandbags' — produce social cost without producing learnings. The retro discipline is to keep the question on the work, not the person. If a member's variance distribution is genuinely out of family — which is rare — that is a 1:1 conversation about whether the work being assigned is the right work, not a public retro.
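The lesson does not specify how a project-specific multiplier is derived. As one plausible illustration, the sketch below takes the median ratio of actual to estimated hours on a project's completed tasks and clamps it to a sane range. The function name, the clamp bounds, and the sample history are all assumptions.

```python
from statistics import median

def project_multiplier(pairs, floor=0.5, cap=3.0):
    """Median actual/estimate ratio, clamped. pairs: list of (estimate, actual)."""
    ratios = [act / est for est, act in pairs if est > 0]
    if not ratios:
        return 1.0  # no history: leave estimates unchanged
    return max(floor, min(cap, median(ratios)))

# Made-up history for one project: (estimated_hours, actual_hours)
history = [(4.0, 6.0), (8.0, 10.0), (2.0, 3.0)]
multiplier = project_multiplier(history)

# Apply it to a fresh 6-hour estimate on the same project.
adjusted = 6.0 * multiplier
```

The median keeps a single runaway task from dominating the multiplier, which matters when a retro has already explained that task as a one-off.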
Time-entry hygiene without bureaucracy
Demanding minute-by-minute time tracking corrodes culture and produces falsified data within weeks. The alternative is a light hygiene that captures enough signal to feed the estimator without being a burden.
Default: lifecycle timestamps. Every task moving through the pipeline produces enough signal for an inferred range. No member has to do anything.
Opt-in: timers on focused work. For tasks where the member wants a sharper number (perhaps because they want to compare to their estimate), starting a timer is one click. The AI agent often suggests starting one.
Default for billable work: explicit entries. Agency teams and consultancies need to bill, and billing accuracy demands explicit entries. The Time view supports this; the workflow is built around running a timer per active task and adjusting at end-of-day.
The principle: the more billable the work, the higher the bar for explicit entries; the more internal the work, the lower. Most teams sit in the middle and benefit from a small end-of-day ritual where the member confirms or adjusts the inferred ranges. Ten minutes a day produces a clean dataset and a good estimator.
Close the loop this week
1. Confirm the current sprint's tasks have at least lifecycle timestamps. Spot-check a few; for billable work, confirm explicit entries.
2. Open /reports → variance by skill. Identify the skill with the highest MAPE. Is the work in that skill systematically described thinly, or tagged inconsistently?
3. Run a 20-minute estimate-vs-actual retro. Take the top 3 variances of the sprint. Ask: what did the estimator miss? Capture the answers as project-context notes.
4. Adopt a daily 10-minute end-of-day ritual. Each member confirms or adjusts the day's inferred ranges. The estimator's training data becomes meaningfully cleaner within a sprint.
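Picking the retro agenda in step 3 can be as simple as sorting the sprint's completed tasks by absolute percentage error. The sketch below assumes you have exported task IDs with their estimated and actual hours; the IDs and numbers are invented.

```python
def top_variances(tasks, n=3):
    """tasks: list of (task_id, estimate_hours, actual_hours), actual > 0."""
    return sorted(
        tasks,
        key=lambda t: abs(t[2] - t[1]) / t[2],  # absolute percentage error
        reverse=True,
    )[:n]

# Invented sprint export.
sprint = [
    ("T-101", 4.0, 4.5),
    ("T-102", 2.0, 8.0),
    ("T-103", 6.0, 3.0),
    ("T-104", 3.0, 3.0),
    ("T-105", 5.0, 10.0),
]

retro_agenda = [task_id for task_id, _, _ in top_variances(sprint)]
```

Both overruns and underruns make the list, since a task that finished in half the estimated time is as much of a calibration miss as one that took double.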
Feedback-loop health
- Time-entry coverage
  - Measures: Fraction of completed tasks with at least lifecycle timestamps; billable subset with explicit entries.
  - Healthy signal: Lifecycle: 100%. Billable explicit: above 90%.
- Variance trend
  - Measures: Rolling 4-week trend of MAPE across all completed tasks.
  - Healthy signal: Falling or stable. Rising means inputs are degrading.
- Learnings-to-multipliers conversion
  - Measures: Retro-identified estimation errors that become project-context multipliers in the estimator.
  - Healthy signal: At least one per sprint. Zero means retros are happening but not feeding back.
- Retro-induced description tightenings
  - Measures: Number of changes to task-description style or template that get adopted as a result of variance analysis.
  - Healthy signal: Periodic. The description style should evolve, not stay frozen.
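The first two health metrics are easy to compute from an export of completed tasks. The sketch below is one reading of them: the dictionary keys, the tolerance, and the rule of comparing the latest week against the mean of the earlier weeks in the window are all assumptions, not Kavanah's definitions.

```python
def coverage(tasks):
    """Return (lifecycle_fraction, billable_explicit_fraction)."""
    if not tasks:
        return 0.0, 0.0
    billable = [t for t in tasks if t["billable"]]
    lifecycle = sum(t["has_lifecycle"] for t in tasks) / len(tasks)
    # With no billable tasks there is nothing to miss, so report full coverage.
    explicit = (
        sum(t["has_explicit"] for t in billable) / len(billable)
        if billable else 1.0
    )
    return lifecycle, explicit

def mape_trend(weekly_mape, tolerance=1.0):
    """Classify the latest week against the earlier weeks of a 4-week window."""
    window = weekly_mape[-4:]
    if len(window) < 2:
        return "insufficient data"
    *prior, latest = window
    baseline = sum(prior) / len(prior)
    if latest > baseline + tolerance:
        return "rising"
    if latest < baseline - tolerance:
        return "falling"
    return "stable"

# Invented sample data.
completed = [
    {"has_lifecycle": True, "billable": True, "has_explicit": True},
    {"has_lifecycle": True, "billable": True, "has_explicit": False},
    {"has_lifecycle": True, "billable": False, "has_explicit": False},
    {"has_lifecycle": False, "billable": False, "has_explicit": False},
]
lifecycle_cov, explicit_cov = coverage(completed)
trend = mape_trend([30.0, 28.0, 25.0, 22.0])
```

Against the healthy signals listed above, this sample fails both coverage checks even though its MAPE trend is falling.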
Key takeaways
- Actuals come in three reliability tiers: explicit entries, lifecycle timestamps, calendar overlap.
- Variance is a diagnostic surface, not a leaderboard. Read it for kinds of work, not for people.
- A good retro asks what the estimator missed about the work; it does not name names.
- Time-entry hygiene is light by default and stricter for billable work.
Estimates, well-collected and well-fed, scale up into capacity planning. The next lesson covers the workspace-level rollup: how to plan a sprint, a quarter, and a year against a calibrated estimator.