Automated Time Estimation in Kavanah — Project Management with Kavanah

How the model produces an estimate, what signals it uses, and how to make it sharper

Every task in Kavanah lands with an automated time estimate. The estimate is not magic and it is not opinion; it is the output of a small pipeline that you can inspect, override, and improve. This lesson lays out the full picture of how that pipeline works, so you can use it intentionally.

The four signals

The estimator combines four inputs.

Skill tags. The canonical skill tags on the task are the strongest signal. Tasks with the same tags are presumed to draw on similar work, so the estimator uses the workspace-wide historical distribution of durations for those tags as its base rate.

Member history. If the task already has an assignee, the estimator narrows the base rate to the assignee's own history with that tag. A member who has shipped many tasks with a tag has a tighter distribution than the workspace average; a member new to a tag falls back to the workspace's wider distribution.
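A minimal sketch of the base-rate lookup described above, assuming history is available as a flat list of (skill tag, assignee, hours) records. `base_rate`, `median_p90`, and the `MIN_MEMBER_SAMPLES` cutoff are hypothetical names for illustration, not Kavanah's API; the lesson does not say how much member history is needed before the estimator narrows.

```python
from __future__ import annotations

from statistics import quantiles

# Hypothetical history record: (skill_tag, assignee, hours_to_complete).
History = list[tuple[str, str, float]]

MIN_MEMBER_SAMPLES = 10  # assumed cutoff; not a documented Kavanah setting


def median_p90(durations: list[float]) -> tuple[float, float]:
    """Return (median, p90) of a list of durations."""
    # quantiles(n=10) yields the nine decile cut points: index 4 is p50, index 8 is p90.
    deciles = quantiles(durations, n=10, method="inclusive")
    return deciles[4], deciles[8]


def base_rate(history: History, tag: str, assignee: str | None = None) -> tuple[float, float]:
    """Workspace base rate for a tag, narrowed to the assignee when they have enough history."""
    if assignee is not None:
        own = [h for t, who, h in history if t == tag and who == assignee]
        if len(own) >= MIN_MEMBER_SAMPLES:
            return median_p90(own)
    # Unassigned task, or a member new to the tag: use the wider workspace distribution.
    workspace = [h for t, _, h in history if t == tag]
    return median_p90(workspace)
```

Gating the narrowing on a sample count is one simple way to get the fallback behavior the text describes; a real system could blend the two distributions instead of switching between them.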

Project context. Tasks within the same project share infrastructure, codebase, and customer context. The estimator applies a project-specific multiplier, usually small, that accounts for effects like 'this project's PR review cycle adds an hour to the median.' This multiplier is learned automatically; you do not set it.

Description signals. The estimator does a lightweight scan of the task description for keywords that correlate with duration spread: 'investigate,' 'refactor,' 'spike,' 'experiment.' These markers widen the band rather than shift the median, because they signal uncertainty more than effort.
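The project and description signals act on the band differently, which a short sketch makes concrete. It assumes the base rate arrives as a (median, p90) pair, that the project multiplier scales both numbers, and that each uncertainty marker stretches only the gap between median and p90. `adjust`, `UNCERTAINTY_STEMS`, and the 25% widening factor are hypothetical; the marker words come from the text, but the real list and weights are not documented here.

```python
import re

# Stems of the marker words from the text, so 'investigation' and 'refactoring' also match.
UNCERTAINTY_STEMS = ("investigat", "refactor", "spike", "experiment")
WIDEN_PER_MARKER = 0.25  # hypothetical: each distinct marker widens the median-to-p90 gap by 25%


def adjust(median: float, p90: float, project_multiplier: float, description: str) -> tuple[float, float]:
    """Apply the project multiplier to the whole band, then widen it for uncertainty markers."""
    # Project context scales effort, so it moves median and p90 together.
    median *= project_multiplier
    p90 *= project_multiplier
    # Description markers signal uncertainty: they stretch the tail but leave the median alone.
    words = re.findall(r"[a-z]+", description.lower())
    hits = sum(1 for stem in UNCERTAINTY_STEMS if any(w.startswith(stem) for w in words))
    p90 = median + (p90 - median) * (1 + WIDEN_PER_MARKER * hits)
    return median, p90
```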

These four signals combine into a duration distribution, of which the manager sees the median and the p90 (the 90th percentile). The distribution is recomputed whenever any input changes, including the assignee, so you can watch the band shift in real time as you edit the task.

How the estimate is presented

Every task shows the estimate as 'median X (p90 Y),' with the band visualized as a small bar in the task detail view. The bar's color shifts from green (tight band) to amber (medium) to red (wide). The color is the at-a-glance signal; the numbers are the actionable signal.
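One way to derive the color from the two numbers is to look at how far the p90 sits above the median. The sketch below assumes durations in hours and uses ratio cutoffs of 1.5 and 2.5; both the cutoffs and the function names are hypothetical, since the lesson does not say where green ends and amber begins.

```python
def band_color(median_hours: float, p90_hours: float) -> str:
    """Classify band width by the ratio p90 / median (cutoffs are assumed)."""
    if median_hours <= 0:
        raise ValueError("median must be positive")
    ratio = p90_hours / median_hours
    if ratio <= 1.5:
        return "green"  # tight band
    if ratio <= 2.5:
        return "amber"  # medium band
    return "red"  # wide band


def estimate_label(median_hours: float, p90_hours: float) -> str:
    """Render the estimate in the 'median X (p90 Y)' form shown on each task."""
    return f"median {median_hours:g}h (p90 {p90_hours:g}h)"
```

Using a ratio rather than an absolute gap keeps a one-hour task and a three-day task on the same scale.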

The agent uses the same numbers when proposing tasks. When you ask the agent to plan a sprint, it sums estimates by assignee and shows you each person's total commitment, with the individual task bands rolled up into a confidence band for that total.
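Rolling bands up is not the same as adding them: the p90 of a person's total is not the sum of their tasks' p90s. A minimal Monte Carlo sketch, assuming each task's duration is lognormal with the estimated median and p90 and that tasks are independent (neither assumption comes from the lesson), samples every task, sums per assignee, and reads quantiles off the totals. `rollup_by_assignee` and `lognormal_params` are hypothetical names.

```python
import math
import random
from collections import defaultdict
from statistics import NormalDist, quantiles

Z90 = NormalDist().inv_cdf(0.9)  # standard-normal 90th percentile, about 1.2816


def lognormal_params(median: float, p90: float) -> tuple[float, float]:
    """Fit (mu, sigma) of a lognormal whose median and p90 match the estimate."""
    mu = math.log(median)  # the median of a lognormal is exp(mu)
    sigma = math.log(p90 / median) / Z90  # its p90 is exp(mu + Z90 * sigma)
    return mu, sigma


def rollup_by_assignee(tasks, samples: int = 20_000, seed: int = 0) -> dict:
    """Return {assignee: (median, p90)} of each person's total hours.

    tasks: iterable of (assignee, median_hours, p90_hours); durations assumed independent.
    """
    rng = random.Random(seed)
    params_by_person = defaultdict(list)
    for who, med, p90 in tasks:
        params_by_person[who].append(lognormal_params(med, p90))

    result = {}
    for who, params in params_by_person.items():
        # Each sample is one simulated sprint: draw every task once and add them up.
        totals = [
            sum(rng.lognormvariate(mu, sigma) for mu, sigma in params)
            for _ in range(samples)
        ]
        deciles = quantiles(totals, n=10)
        result[who] = (deciles[4], deciles[8])
    return result
```

Under these assumptions, the total's p90 comes out below the sum of the task p90s, because independent tasks rarely all overrun at once, while the total's median lands slightly above the sum of the medians, because durations skew right. Tasks that share a blocker are correlated; this sketch ignores that, and it would widen the real band.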

The estimate is also visible in the Portfolio view at project level, where it rolls up across tasks. In practice, the project-level p90 is the only number a manager should quote as an external commitment; quoting the project median means overrunning about half the time.

How you make it sharper

The single most effective lever on estimate quality is the task description. A description that names the constraints, the acceptance criteria, and the relevant prior art tightens the band materially. A description that says 'fix login bug' has a wide band; a description that says 'fix the 401 returned by /auth/login when the user's session has a stale refresh token, repro in staging, fixture in test-session-stale.json' has a tight one.

The second lever is consistent tagging. Skill tags are how the estimator looks up the base rate; inconsistent tagging fragments the history. The auto-tagging from the AI agent is usually right; check it at task creation. Manually tagged tasks should use the canonical vocabulary, not free-form labels.
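Fragmentation is easy to see in code: if 'back-end', 'backend', and 'server' are stored as three labels, the estimator sees three thin histories instead of one. A minimal sketch of normalizing labels at creation time, assuming a hypothetical alias table (Kavanah's canonical vocabulary is not listed in this lesson), maps known variants to one canonical tag and sets aside anything unrecognized for review.

```python
# Hypothetical alias table: free-form labels on the left, canonical skill tags on the right.
CANONICAL_ALIASES = {
    "backend": "backend",
    "back-end": "backend",
    "server": "backend",
    "frontend": "frontend",
    "front end": "frontend",
    "ui": "frontend",
}


def normalize_tags(raw_tags):
    """Split raw labels into (canonical tags, unrecognized labels to review)."""
    canonical, unknown = set(), set()
    for raw in raw_tags:
        key = " ".join(raw.strip().lower().split())  # ignore case and extra whitespace
        if key in CANONICAL_ALIASES:
            canonical.add(CANONICAL_ALIASES[key])
        else:
            unknown.add(raw)
    return canonical, unknown
```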

The third lever is honest closing. When a task ships, the actual time-to-complete becomes a new datapoint in the estimator's history. If the team consistently closes tasks without recording actual time (or, worse, closes batches in a single click), the estimator's training data is corrupted and the bands stop being meaningful. Module 4.3 covers the feedback loop in detail.
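A sketch of the kind of filter that keeps such closings out of the training data, assuming each closed task carries who closed it, when, and an optional recorded actual. `ClosedTask`, `training_rows`, and the batch thresholds (five closes by one person within a minute) are hypothetical; the lesson does not say how Kavanah detects bulk closes or whether it excludes them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ClosedTask:
    task_id: str
    closed_by: str
    closed_at: datetime
    actual_hours: Optional[float]  # None when no actual time was recorded


BATCH_WINDOW = timedelta(seconds=60)  # assumed window for "closed in a single click"
BATCH_SIZE = 5  # assumed number of closes that counts as a batch


def is_batch_close(task: ClosedTask, closed: List[ClosedTask]) -> bool:
    """True if the same person closed BATCH_SIZE or more tasks within BATCH_WINDOW of this one."""
    nearby = sum(
        1
        for other in closed
        if other.closed_by == task.closed_by
        and abs(other.closed_at - task.closed_at) <= BATCH_WINDOW
    )
    return nearby >= BATCH_SIZE  # the count includes the task itself


def training_rows(closed: List[ClosedTask]) -> List[ClosedTask]:
    """Keep closings with a recorded actual that were not part of a bulk close."""
    return [c for c in closed if c.actual_hours is not None and not is_batch_close(c, closed)]
```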

Manual overrides — when and how

The manager can override the estimate manually. The override carries metadata — who overrode, when, by how much, and with what reason. The estimator does not ignore the override; it treats it as a high-weight datapoint and incorporates it into the next computation.
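A minimal sketch of an override record and of one way a high-weight datapoint could enter the next computation: as a weighted sample in a weighted median. The fields mirror the metadata listed above, but `Override`, `OVERRIDE_WEIGHT`, and the weighted-median approach are assumptions for illustration, not Kavanah's documented schema or method.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Override:
    """Override metadata as listed in the text; field names are illustrative."""
    who: str
    when: datetime
    model_hours: float
    override_hours: float
    reason: str

    @property
    def delta_hours(self) -> float:
        """How much the override moved the estimate (negative means downward)."""
        return self.override_hours - self.model_hours


OVERRIDE_WEIGHT = 3.0  # hypothetical: one override counts as three ordinary actuals


def weighted_median(points):
    """Lower weighted median of (value, weight) pairs."""
    ordered = sorted(points)
    total = sum(weight for _, weight in ordered)
    running = 0.0
    for value, weight in ordered:
        running += weight
        if running >= total / 2:
            return value
    raise ValueError("weighted_median needs at least one point")


def median_with_overrides(actual_hours, overrides):
    """Median duration where each override enters as a heavily weighted datapoint."""
    points = [(hours, 1.0) for hours in actual_hours]
    points += [(o.override_hours, OVERRIDE_WEIGHT) for o in overrides]
    return weighted_median(points)
```

With actuals of 3, 4, 4, 5, and 6 hours the median is 4; adding one 2-hour override at weight 3 pulls it to 3, without discarding the history.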

The rule of thumb is to override when you have specific information the model does not. 'I happen to know the customer is going to send three rounds of comments on this' is information the model cannot see; an override is correct. 'I think the estimate is too high' without a reason is not information; the override will drift back to the model's number within a few similar tasks and the team will lose trust in the estimator.

Overrides also feed the calibration metric. If a member systematically overrides downward and the actuals come in close to the original model number, the model was right and the member's pattern is sandbagging in reverse. If downward overrides consistently track the actuals, the model is systematically high for that member, and the next quarterly retune will correct it.
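The calibration check reduces to comparing two errors on the same set of tasks. A sketch, assuming each overridden task is available as a (model hours, override hours, actual hours) triple; `override_calibration` and its verdict strings are hypothetical, and the real metric may weight or aggregate differently.

```python
from statistics import mean


def override_calibration(rows):
    """rows: (model_hours, override_hours, actual_hours) for one member's overridden tasks."""
    model_error = mean(abs(actual - model) for model, _, actual in rows)
    override_error = mean(abs(actual - override) for _, override, actual in rows)
    model_bias = mean(model - actual for model, _, actual in rows)  # > 0: model ran high

    if model_error <= override_error:
        return "model was right; the overrides are the bias"
    if model_bias > 0:
        return "model runs high for this member; flag for retune"
    return "model runs low for this member; flag for retune"
```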

The cold-start case

A new workspace has no history. The estimator starts with prior distributions inherited from across Kavanah and converges to the workspace's own pattern within roughly thirty completed tasks per skill. During the cold-start period, the bands are wider than they will eventually be, and the manager should treat estimates as more advisory than usual.

A new member is a partial cold-start. The estimator starts them on the workspace's distribution for the relevant tags and converges to their own pattern within ten to twenty shipped tasks per tag. This is why declared skills matter even in the absence of history: the declaration shifts the prior, so the cold-start estimates are already biased toward what the member self-reported.
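Both cold-start cases follow the same pattern: start from a prior and let observed history take over as it accumulates. A common way to express that is shrinkage toward the prior with a pseudo-count, sketched here on medians only. The formula, the `prior_strength` of 10, and the function name are assumptions; with that setting, own history carries 75% of the weight after thirty tasks, which is one reading of "converges within roughly thirty," not a documented parameter.

```python
from statistics import median
from typing import List


def shrunk_median(prior_median: float, own_hours: List[float], prior_strength: float = 10.0) -> float:
    """Blend a prior median with the median of observed history.

    prior_strength acts as a pseudo-count: the prior counts like that many extra tasks.
    """
    n = len(own_hours)
    if n == 0:
        return prior_median  # pure cold start: nothing observed yet
    weight = n / (n + prior_strength)  # share of weight given to observed history
    return weight * median(own_hours) + (1 - weight) * prior_median
```

For a new member, `prior_median` would be the workspace figure for the tag, shifted by whatever the member declared; for a new workspace, it would come from the cross-Kavanah prior. A smaller `prior_strength` would model the faster ten-to-twenty-task convergence the text gives for members.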

A new project type — say, the team takes on a new vertical — is a third cold-start. Project-context multipliers reset for the new project and converge within a sprint or two of shipped work.

The estimator only stays calibrated if reality flows back into it. The next lesson covers the feedback loop — how actuals get recorded, what to do with the gap, and how the model stays honest.