Module 4 — Estimation and Time · Lesson 4.2
Automated Time Estimation in Kavanah
How the model produces an estimate, what signals it uses, and how to make it sharper
~13 min
What you'll learn
- Trace how a task's estimate is produced, signal by signal
- Read the estimate's confidence band and use it to drive planning decisions
- Improve future estimates by tightening descriptions and tagging consistently
- Calibrate the model with the explicit override mechanism
Every task in Kavanah lands with an automated time estimate. The estimate is not magic and it is not opinion; it is the output of a small pipeline that you can inspect, override, and improve. This lesson is the full picture of how it works so you can use it intentionally.
The four signals
The estimator combines four inputs.
Skill tags. The canonical skill tags on the task are the strongest signal. Tasks with the same tags are presumed to draw on similar work; the estimator pulls the historical distribution of duration for that tag across the workspace as its base rate.
Member history. If the task already has an assignee, the estimator narrows the base rate to the assignee's own history with that tag. A member who has shipped many tasks in a tag has a tighter distribution than the workspace average; a member new to a tag has the workspace's wider distribution.
Project context. Tasks within the same project share infrastructure, codebase, and customer context. The estimator applies a project-specific multiplier — usually small — that accounts for things like 'this project's PR review cycle adds an hour to median.' This multiplier is learned automatically; you do not set it.
Description signals. The estimator runs a lightweight read of the task description for keywords that correlate with duration spread — 'investigate,' 'refactor,' 'spike,' 'experiment.' These markers widen the band rather than shift the median, because they are signals of uncertainty more than of effort.
These four signals compose into a distribution, of which the manager sees median and p90. The distribution is recomputed when any input changes — including when the assignee changes — so you can see the band shift in real time as you edit the task.
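As a rough illustration of how the four signals might compose, here is a minimal sketch. It is not Kavanah's actual pipeline: the function name, the `min_member_tasks` cutoff, and the `widen_per_marker` factor are all assumptions made for the example. It only mirrors the behavior described above: tags give the base rate, member history narrows it, the project multiplier shifts it, and description markers widen the band without moving the median.

```python
from statistics import median, quantiles

# Keywords named in the lesson as uncertainty markers.
UNCERTAINTY_MARKERS = ("investigate", "refactor", "spike", "experiment")

def estimate(tag_hours, member_hours, project_multiplier, description,
             min_member_tasks=10, widen_per_marker=1.25):
    """Return (median, p90) in hours from the four signals.

    tag_hours: workspace-wide historical durations for the task's skill tag
    member_hours: the assignee's own durations for that tag (may be empty)
    project_multiplier: learned per-project scale factor
    min_member_tasks, widen_per_marker: hypothetical tuning values
    """
    # Signals 1 and 2: use the assignee's history once there is enough of it,
    # otherwise fall back to the workspace base rate.
    history = member_hours if len(member_hours) >= min_member_tasks else tag_hours
    med = median(history)
    p90 = quantiles(history, n=10)[-1]  # last of the nine decile cut points

    # Signal 3: the project multiplier scales the whole distribution.
    med *= project_multiplier
    p90 *= project_multiplier

    # Signal 4: each uncertainty marker widens the band; the median stays put.
    text = description.lower()
    markers = sum(1 for word in UNCERTAINTY_MARKERS if word in text)
    p90 = med + (p90 - med) * widen_per_marker ** markers
    return med, p90
```

Calling `estimate` twice with the same history, once with "fix the 401 on login" and once with "investigate the 401 on login", returns the same median but a larger p90 for the second description.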
How the estimate is presented
Every task shows the estimate as 'median X (p90 Y),' with the band visualized as a small bar in the task detail view. The bar's color shifts from green (tight band) to amber (medium) to red (wide). The color is the at-a-glance signal; the numbers are the actionable signal.
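The presentation rule can be sketched in a few lines. The lesson does not state how band width maps to color, so the ratio thresholds below (`tight`, `wide`) and the hour unit are assumptions chosen only to make the example concrete.

```python
def band_color(median_h, p90_h, tight=1.5, wide=2.5):
    """Map band width to a color using the p90/median ratio.

    The thresholds are hypothetical; the real cutoffs are not documented here.
    """
    if median_h <= 0:
        raise ValueError("median must be positive")
    ratio = p90_h / median_h
    if ratio <= tight:
        return "green"
    if ratio <= wide:
        return "amber"
    return "red"

def format_estimate(median_h, p90_h):
    """Render the estimate in the 'median X (p90 Y)' form, assuming hours."""
    return f"median {median_h:g}h (p90 {p90_h:g}h)"
```

For example, `format_estimate(4, 6.5)` gives `median 4h (p90 6.5h)`, and under these assumed thresholds a 4h median with a 12h p90 would show red.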
The agent uses the same numbers when proposing tasks. When you ask the agent to plan a sprint, it sums estimates by assignee and shows you the total commit by person, with each person's individual confidence band rolled up appropriately.
The estimate is also visible in the Portfolio view at project level, where it rolls up across tasks. The project-level p90 is, in practice, the only outside-commitment number a manager should ever quote — quoting the project median is asking to overrun half the time.
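One reason to read the project-level p90 from the rollup rather than adding up task numbers yourself: summing task medians understates the total, and summing task p90s overstates it. The sketch below shows this with a small simulation. It assumes each task's duration is lognormal and that tasks are independent; both are modeling assumptions for the example, not a description of how Kavanah computes its rollup, and `rollup` is a hypothetical name.

```python
import math
import random
from statistics import NormalDist, median, quantiles

Z90 = NormalDist().inv_cdf(0.9)  # standard-normal 90th percentile, about 1.28

def rollup(tasks, draws=10_000, seed=0):
    """Simulate the total of several tasks.

    tasks: list of (median_hours, p90_hours) pairs with p90 >= median > 0.
    Returns (median, p90) of the simulated total.
    """
    rng = random.Random(seed)
    # Fit a lognormal to each task: median = exp(mu), p90 = exp(mu + Z90 * sigma).
    params = [(math.log(m), math.log(p / m) / Z90) for m, p in tasks]
    totals = [
        sum(rng.lognormvariate(mu, sigma) for mu, sigma in params)
        for _ in range(draws)
    ]
    return median(totals), quantiles(totals, n=10)[-1]
```

With ten tasks at median 4h and p90 8h, the simulated project median lands above the 40h sum of medians, and the simulated project p90 lands well below the 80h sum of p90s.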
How you make it sharper
The single most effective lever on estimate quality is the task description. A description that names the constraints, the acceptance criteria, and the relevant prior art tightens the band materially. A description that says 'fix login bug' has a wide band; a description that says 'fix the 401 returned by /auth/login when the user's session has a stale refresh token, repro in staging, fixture in test-session-stale.json' has a tight one.
The second lever is consistent tagging. Skill tags are how the estimator looks up the base rate; inconsistent tagging fragments the history. The auto-tagging from the AI agent is usually right; check it at task creation. Manually tagged tasks should use the canonical vocabulary, not free-form labels.
The third lever is honest closing. When a task ships, the actual time-to-complete becomes a new datapoint in the estimator's history. If the team consistently closes tasks without recording actual time (or, worse, closes batches in a single click), the estimator's training data is corrupted and the bands stop being meaningful. Lesson 4.3 covers the feedback loop in detail.
Manual overrides — when and how
The manager can override the estimate manually. The override carries metadata — who overrode, when, by how much, and with what reason. The estimator does not ignore the override; it treats it as a high-weight datapoint and incorporates it into the next computation.
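To make "high-weight datapoint" concrete, here is one possible reading: the override joins the history as a single observation that counts several times over when the median is recomputed. The `Override` record, its field names, and the weight of 5 are all hypothetical; the lesson only says that the override carries who, when, by how much, and why, and that it is weighted heavily.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Override:
    """Hypothetical override record mirroring the metadata the lesson lists."""
    overridden_by: str
    at: datetime
    model_hours: float
    override_hours: float
    reason: str

    @property
    def delta_hours(self):
        # "By how much": positive means the manager raised the estimate.
        return self.override_hours - self.model_hours

def weighted_median(points):
    """Lower weighted median of a non-empty list of (hours, weight) pairs."""
    ordered = sorted(points)
    half = sum(weight for _, weight in ordered) / 2
    running = 0.0
    for hours, weight in ordered:
        running += weight
        if running >= half:
            return hours

def median_with_override(history_hours, override, override_weight=5.0):
    """Recompute the median with the override counted as a heavy datapoint."""
    points = [(h, 1.0) for h in history_hours]
    points.append((override.override_hours, override_weight))
    return weighted_median(points)
```

With a history of 4, 5, 6, 7, and 8 hours, an override to 12 hours at weight 5 moves the median from 6 to 8 in this sketch: the override pulls the number without replacing it outright.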
The rule of thumb is to override when you have specific information the model does not. 'I happen to know the customer is going to send three rounds of comments on this' is information the model cannot see; an override is correct. 'I think the estimate is too high' without a reason is not information; estimates for similar tasks will drift back to the model's number within a few tasks, and the team will lose trust in the estimator.
Overrides also feed the calibration metric. If a member systematically overrides downward and the actuals come in close to the original model number, the model was right and the member's pattern is sandbagging in reverse. If downward overrides consistently match the actuals, the model is systematically high for that member, and the next quarterly retune will correct for it.
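The calibration check described above amounts to asking, for each downward override, whether the actual landed closer to the model's number or to the override. A minimal sketch follows; the function name, the record shape, and the 70% cutoff for calling a pattern decisive are assumptions.

```python
def override_calibration(records, decisive_share=0.7):
    """Classify one member's downward overrides.

    records: list of (model_hours, override_hours, actual_hours) tuples.
    decisive_share: hypothetical share of cases needed to call a pattern.
    """
    downward = [
        (model, override, actual)
        for model, override, actual in records
        if override < model
    ]
    if not downward:
        return "no downward overrides"
    # Count the cases where the actual ended up nearer the model's number.
    model_closer = sum(
        1 for model, override, actual in downward
        if abs(actual - model) < abs(actual - override)
    )
    share = model_closer / len(downward)
    if share >= decisive_share:
        return "model was right"
    if share <= 1 - decisive_share:
        return "model runs high for this member"
    return "inconclusive"
```

The first outcome corresponds to the reverse-sandbagging pattern; the second is the case the retune is meant to correct.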
The cold-start case
A new workspace has no history. The estimator starts with prior distributions inherited from across Kavanah and converges to the workspace's own pattern within roughly thirty completed tasks per skill. During the cold-start period, the bands are wider than they will eventually be, and the manager should treat estimates as more advisory than usual.
A new member is a partial cold-start. The estimator starts them on the workspace's distribution for the relevant tags and converges to their own pattern within ten to twenty shipped tasks per tag. This is why declared skills matter even in the absence of history: the declaration shifts the prior, so the cold-start estimates are already biased toward what the member self-reported.
A new project type — say, the team takes on a new vertical — is a third cold-start. Project-context multipliers reset for the new project and converge within a sprint or two of shipped work.
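All three cold-start cases share the same shape: start from an inherited prior and move toward local history as completed tasks accumulate. One simple way to picture it is a linear blend, sketched below. The linear ramp and the function name are assumptions; only the rough convergence counts (about thirty tasks per skill for a workspace, ten to twenty per tag for a member) come from the lesson.

```python
def blended_median(prior_median, own_median, n_completed, full_trust_at=30):
    """Blend an inherited prior with local history.

    n_completed: completed tasks available for this skill or tag.
    full_trust_at: count at which local history fully replaces the prior
                   (30 for a new workspace, perhaps 20 for a new member).
    """
    if n_completed <= 0 or own_median is None:
        return prior_median
    weight = min(1.0, n_completed / full_trust_at)
    return weight * own_median + (1 - weight) * prior_median
```

With a prior of 10h and a local median of 6h, this sketch returns 10h at zero tasks, 8h at fifteen, and 6h from thirty onward.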
Use the estimator deliberately on the next task
1. Pick the next task you are about to assign
Open its detail view. Look at the estimate, the band, and the band color.
2. If the band is wide, sharpen the description
Add acceptance criteria. Name the prior art. Reference the related task if there is one. Watch the band narrow.
3. Confirm the skill tags are canonical
If the auto-tagging picked 'frontend' but the work is design-systems, retag. The estimator's base rate changes accordingly.
4. Quote the project's p90, not the median, to outside stakeholders
If you are committing a date to a customer, use the p90 rollup from the project view. Roughly nine times in ten the project lands on or before that date, so you rarely have to apologize.
Estimator-quality metrics
- Mean Absolute Percentage Error (MAPE) on completed tasks
  - Average of |actual − estimated| / actual across the last 30 days of completed tasks.
  - Healthy signal: Under 30% after a quarter of history. Above 50% suggests inputs are too sparse.
- Band coverage
  - Fraction of completed tasks whose actual time fell inside the p10–p90 band.
  - Healthy signal: 80%. Lower means the band is too tight; higher means too loose.
- Manual-override rate
  - Fraction of tasks whose estimate was manually overridden before commit.
  - Healthy signal: 10–25%. Above 50% means the model is consistently off; below 5% may mean the team is not using the override at all.
- Description-to-band correlation
  - Diagnostic: across tasks, does longer/more-specific description correlate with tighter band?
  - Healthy signal: Positive. If flat, the team's descriptions are not carrying signal the estimator can use.
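The first three metrics are simple enough to compute by hand from an export of completed tasks. The sketch below follows the definitions given above; the `Completed` record and its field names are hypothetical, since the lesson does not describe an export format.

```python
from dataclasses import dataclass

@dataclass
class Completed:
    """Hypothetical record for one completed task, all durations in hours."""
    estimated: float   # the model's median at commit time
    p10: float
    p90: float
    actual: float
    overridden: bool

def mape(tasks):
    """Mean of |actual - estimated| / actual over a non-empty task list."""
    return sum(abs(t.actual - t.estimated) / t.actual for t in tasks) / len(tasks)

def band_coverage(tasks):
    """Fraction of tasks whose actual fell inside the p10 to p90 band."""
    return sum(t.p10 <= t.actual <= t.p90 for t in tasks) / len(tasks)

def override_rate(tasks):
    """Fraction of tasks whose estimate was manually overridden."""
    return sum(t.overridden for t in tasks) / len(tasks)
```

Run these over the last 30 days of completed tasks and compare against the healthy ranges listed above: MAPE under 0.30, coverage near 0.80, override rate between 0.10 and 0.25.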
Key takeaways
- The estimator combines four signals: skill tags, member history, project context, description markers.
- The estimate is a distribution; the manager sees median and p90 with a color-coded confidence band.
- The single largest lever on quality is the task description.
- Overrides should carry information, not opinion; the model learns from them.
- Cold-start at the workspace, member, and project type levels is expected and time-bounded.
The estimator only stays calibrated if reality flows back into it. The next lesson covers the feedback loop — how actuals get recorded, what to do with the gap, and how the model stays honest.