Module 4 — Estimation and Time · Lesson 4.1
Why Estimates Are Wrong (and Why That's Not the Point)
The political life of a number, and what it is actually for
~10 min
What you'll learn
- Articulate why human estimates systematically miss (and why teams pretend otherwise)
- Distinguish a prediction from a commitment from a budget
- Use distribution-shaped estimates instead of point estimates
- Set the stage for the automated estimation lesson that follows
Every team's most painful weekly ritual is the estimation meeting. Half the room treats the number as a commitment; the other half treats it as a guess; nobody is treating it as what it actually is, which is a planning input. Before we get to how Kavanah generates estimates automatically, we have to talk about why human estimates are unreliable, why pretending otherwise is worse than admitting it, and what we actually want estimates to do for us.
Why human estimates miss
Three structural reasons.
The planning fallacy. Humans systematically underestimate task duration because they imagine the median case and forget the long tail. This bias is robust across domains and is not eliminated by experience. A senior developer estimates better than a junior one in absolute terms — but the senior developer's estimates are still biased low, just less so.
The political pricing. In any organization where estimates are treated as commitments, the smart move is to inflate them. The cost of overrunning a tight estimate is real; the cost of finishing early on a loose one is negligible. Over time, the estimates the system collects are not predictions at all — they are insurance bids.
The context starvation. The person estimating usually does not have the full context. They are looking at a one-line task description and reaching into memory for similar work. Their memory is selective; their similar work was usually slightly different. The estimate is biased by whichever similar task is most recent or most vivid.
These three biases pull in different directions and partly cancel, which is why team-level estimates tend to be a noisy average: wrong by 30–50% on individual tasks but roughly right at the project level when many tasks are summed.
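The cancellation effect can be illustrated with a small simulation. This is a minimal sketch, not a model of any real team: it assumes each task's error is independent and unbiased, the 40% per-task error and the 1 to 8 hour task sizes are made-up illustration values, and `relative_error_of_sum` is a hypothetical helper name.

```python
import random

def relative_error_of_sum(n_tasks, per_task_error=0.4, trials=2000, seed=0):
    """Average relative error of a project total when each task's actual
    duration deviates from its estimate by up to +/- per_task_error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_error = 0.0
    for _ in range(trials):
        estimated = 0.0
        actual = 0.0
        for _ in range(n_tasks):
            est = rng.uniform(1.0, 8.0)  # estimated hours (illustrative range)
            # ASSUMPTION: noise is independent and centered on zero.
            act = est * (1.0 + rng.uniform(-per_task_error, per_task_error))
            estimated += est
            actual += act
        total_error += abs(actual - estimated) / estimated
    return total_error / trials

single = relative_error_of_sum(1)    # one task on its own
project = relative_error_of_sum(50)  # fifty tasks summed
print(f"single task: {single:.0%}, 50-task project: {project:.0%}")
```

Only the independent noise averages out this way. A bias shared by every task, such as a uniform planning-fallacy shortfall, survives the sum unchanged, which is why the project total is "roughly right" rather than right.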
Prediction vs. commitment vs. budget
An estimate gets used three different ways, often by three different people simultaneously, often without anyone realizing the conflict.
As a prediction: 'this will probably take 3 hours.' The output is a best guess about reality. The right rebuttal is data ('similar tasks have taken 5 hours on average').
As a commitment: 'I will deliver this in 3 hours.' The output is a promise. The right rebuttal is push-back about scope or capacity ('I can't promise that without dropping X').
As a budget: 'we will spend at most 3 hours on this before reassessing.' The output is a constraint on the work, not on reality. The right rebuttal is about value ('is 3 hours the right ceiling given how much this matters?').
Most team conflict around estimates comes from one person using the number as a budget while another treats it as a commitment. The cure is to be explicit about which use the estimate is serving in any given moment. Kavanah surfaces all three: the agent's estimate is a prediction; the team's accepted estimate is a budget; the assigned member's stated estimate (if they choose to add one) is a soft commitment.
Distributions, not points
A single number is the wrong shape for an estimate. The right shape is a distribution: 'median 3 hours, p90 5 hours, p99 8 hours.' The shape captures something the point estimate hides — how uncertain the estimate is.
A task with median 3 / p90 4 is well-understood; the team can plan around it. A task with median 3 / p90 9 is poorly understood; the team should not plan around it without breaking it down further. Same median, completely different planning implication.
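The comparison above can be sketched in a few lines. This is an illustration under stated assumptions, not Kavanah's implementation: `summarize` and `needs_breakdown` are hypothetical helpers, and the 2.0 ratio threshold is an arbitrary cutoff chosen for the example.

```python
from statistics import median, quantiles

def summarize(hours):
    """Return (median, p90) for a list of historical durations in hours."""
    # quantiles(..., n=10) returns the 9 decile cut points; the last is p90.
    p90 = quantiles(hours, n=10, method="inclusive")[-1]
    return median(hours), p90

def needs_breakdown(med, p90, max_ratio=2.0):
    """Flag an estimate whose p90 exceeds max_ratio times its median.
    ASSUMPTION: 2.0 is an illustrative threshold, not a product default."""
    return p90 / med > max_ratio

print(needs_breakdown(3, 4))  # False: tight band, plan around it
print(needs_breakdown(3, 9))  # True: wide band, break the task down first
```

Using the ratio of p90 to median, rather than the absolute gap, keeps the check comparable between a 3-hour task and a 3-day task.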
Kavanah's task-intelligence layer produces distribution-shaped estimates by default. The UI displays the median prominently and the p90 alongside it as a confidence band. Where the band is wide, the manager has two good responses: tighten the description (which feeds the model more signal) or break the task into smaller pieces (which moves each piece into a tighter band).
The practical effect is that planning conversations stop being 'is your estimate right' and become 'is this work well-enough understood to plan around.' Those are different conversations and the second is more useful.
What estimates are actually for
Three uses justify the effort of producing estimates at all.
Sequencing. If task A is estimated 1 day and task B is estimated 5 days, that affects which goes first if both share an assignee or a dependency. The estimate does not need to be accurate; it just needs to be comparable.
Commitment to outside parties. If you tell a customer they will have the feature in two weeks, you need an estimate, however lossy, behind that promise. Kavanah's project-level estimates roll up from task estimates and surface in the Portfolio view.
Capacity planning. If your team has 40 person-hours this week and the proposed work is estimated at 60 person-hours, you need to know that before the week starts. The estimate is the input.
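The capacity check amounts to simple arithmetic. The sketch below assumes tasks arrive already sorted by priority and uses made-up task names and hours that sum to the 60 person-hours in the example; `plan_week` is a hypothetical helper, not a Kavanah function.

```python
def plan_week(tasks, capacity_hours):
    """Walk tasks in priority order and keep each one that still fits.
    tasks: list of (name, estimated_hours) pairs."""
    planned, deferred, used = [], [], 0.0
    for name, hours in tasks:
        if used + hours <= capacity_hours:
            planned.append(name)
            used += hours
        else:
            deferred.append(name)
    return planned, deferred, used

# Illustrative backlog: 60 estimated person-hours against 40 available.
backlog = [("A", 8), ("B", 12), ("C", 16), ("D", 10), ("E", 14)]
planned, deferred, used = plan_week(backlog, capacity_hours=40)
print(planned, deferred, used)
```

The estimates do not have to be precise for this to be useful. They only need to be good enough to show, before the week starts, that some of the proposed work will not fit.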
Notice what is missing from this list: performance evaluation. Estimates are not a fair input to whether someone is performing well. They are too noisy and too easily gamed. If you use them that way, the gaming will dominate the signal within a quarter. Module 6 covers what actually belongs in the people-metrics column.
Reset how your team thinks about estimates
1. Open a recent task with an estimate
Confirm the estimate is shown with a confidence band, not as a single number.
2. Find a task where the band was very wide
A wide band means the task was poorly understood. The right response was breakdown, not commitment.
3. Pick a current customer-visible deadline
Trace it back to the task estimates that support it. Is the rollup honest?
4. Have one explicit conversation
With your team, name which use of estimates applies in which context: prediction, commitment, or budget. Most conflict will dissolve.
Estimate-hygiene checks
- Estimate-to-actual ratio (mean and distribution)
  - Across recently completed tasks, the ratio of estimated time to actual time. Look at the distribution, not just the mean.
  - Healthy signal: Mean near 1.0; p90 under 2.0. A long right tail (ratios well above 1.0, where the estimate exceeded the actual) signals padding; a heavy left tail (ratios well below 1.0) signals underestimation.
- Confidence-band width by task class
  - Median width of the p10–p90 band, grouped by task type.
  - Healthy signal: Falling over time as the model learns. Stable wide bands point at unstable inputs (vague descriptions).
- Breakdown rate on wide-band tasks
  - Fraction of tasks with wide confidence bands that get broken into sub-tasks before being committed.
  - Healthy signal: Above 70%. The whole point of the band is to drive this decision.
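Two of these checks can be computed from a plain export of completed tasks. The sketch below assumes a hypothetical data shape (pairs of estimated and actual hours, plus a list of booleans for wide-band tasks); it is not tied to any Kavanah export format or API.

```python
from statistics import mean, quantiles

def ratio_stats(tasks):
    """tasks: list of (estimated_hours, actual_hours) pairs.
    Returns (mean, p90) of the estimated/actual ratio."""
    ratios = [est / act for est, act in tasks if act > 0]
    p90 = quantiles(ratios, n=10, method="inclusive")[-1]
    return mean(ratios), p90

def breakdown_rate(was_split):
    """was_split: one boolean per wide-band task, True if the task was
    broken into sub-tasks before being committed."""
    if not was_split:
        return 0.0
    return sum(was_split) / len(was_split)

# Illustrative data only.
completed = [(3, 4), (2, 2), (5, 3), (8, 10), (1, 1)]
avg, p90 = ratio_stats(completed)
rate = breakdown_rate([True, True, True, False])
print(f"mean ratio {avg:.2f}, p90 {p90:.2f}, breakdown rate {rate:.0%}")
```

Reporting the p90 next to the mean matters because a mean near 1.0 can hide a mix of heavily padded and heavily underestimated tasks.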
Key takeaways
- Human estimates miss for three structural reasons: planning fallacy, political pricing, context starvation.
- Three uses of estimates: prediction, commitment, budget. Most conflict comes from using the same number in two ways at once.
- Distributions beat point estimates because they surface uncertainty as a planning input.
- Estimates are not a fair performance metric. Use them for sequencing, outside commitment, and capacity planning.
With the framing right, the next lesson goes into Kavanah's actual estimation machinery — what signals it uses, how it produces a distribution, and how the manager interacts with it.