Published 2026-05-24

DORA Metrics in a Weekend With Apache DevLake

Every engineering leader I talk to wants DORA metrics, and almost none of them have them. The four — deployment frequency, lead time for changes, change failure rate, and time to restore service — are the closest thing our industry has to a universal scorecard for delivery performance. The reason most teams don't measure them isn't that they don't care. It's that they assume it requires a big platform purchase or a quarter of engineering time. It doesn't. You can stand the whole thing up with open-source Apache DevLake over a weekend. Here's how, and what to watch out for.

What the four metrics actually tell you

Quick refresher, because the names get muddled:

Deployment frequency — how often you ship to production. A throughput signal.
Lead time for changes — how long from commit to running in prod. A speed signal.
Change failure rate — what fraction of deployments cause a failure needing remediation. A quality signal.
Time to restore service (MTTR) — how long to recover when something breaks. A resilience signal.

The genius of DORA is the pairing. Throughput and speed without the quality and resilience signals just measure how fast you're shipping bugs. You need all four, together, or you'll optimize the wrong thing. A team that ships ten times a day with a 30% change failure rate is not "elite"; it's a fire that ships fast.
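To make the four definitions concrete, here's a minimal Python sketch that computes them from a handful of made-up records. The `Deployment` and `Incident` shapes, the `dora_summary` helper, and the sample data are all hypothetical and are not DevLake's schema. I've also chosen medians and a seven-day window arbitrarily; your tooling may aggregate differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    first_commit_at: datetime  # earliest commit in the change
    deployed_at: datetime      # when the change reached production
    caused_incident: bool      # did this deploy need remediation?


@dataclass
class Incident:
    created_at: datetime
    resolved_at: Optional[datetime]  # None while the incident is still open


def dora_summary(deploys, incidents, window_days):
    """Compute the four metrics over one reporting window (illustrative only)."""
    # Open incidents have no restore time yet, so they are left out here.
    resolved = [i for i in incidents if i.resolved_at is not None]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(d.deployed_at - d.first_commit_at for d in deploys),
        "change_failure_rate": sum(d.caused_incident for d in deploys) / len(deploys),
        "median_time_to_restore": median(i.resolved_at - i.created_at for i in resolved),
    }


# Made-up sample data for one week.
t0 = datetime(2026, 5, 1, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(hours=4), False),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=10), True),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=6), False),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=2), False),
]
incidents = [
    Incident(t0 + timedelta(days=1, hours=11), t0 + timedelta(days=1, hours=12, minutes=30)),
    Incident(t0 + timedelta(days=5), None),  # still open
]

summary = dora_summary(deploys, incidents, window_days=7)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that the open incident silently drops out of the restore-time figure in this sketch. That's one reasonable choice, but it's exactly the kind of definition you want to make on purpose rather than by accident.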

Why DevLake

Apache DevLake is an open-source data platform that ingests from your dev tools — GitHub/GitLab, Jira, CI systems, PagerDuty — normalizes it into a common schema, and ships with DORA dashboards out of the box. It's a real Apache project, it's free, and it does the annoying part (connecting to a dozen APIs and reconciling their data models) for you. You self-host it; your data stays yours.

The weekend plan

Saturday morning: stand it up

DevLake runs as a set of containers. The fastest path is the official Docker Compose setup:

git clone https://github.com/apache/incubator-devlake.git
cd incubator-devlake
cp env.example .env
docker compose up -d

That brings up the DevLake backend, a database, and a Grafana instance preloaded with the DORA dashboards. Give it a few minutes, then open the Config UI on its mapped port. You're now staring at an empty but working DORA platform. That's the part people think takes a quarter. It took a coffee.
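If you'd rather not refresh the browser while the containers start, a small script can poll until the port accepts connections. This is a rough sketch: `wait_for_port` is a hypothetical helper of mine, and the port in the usage comment is an assumption, so check the ports section of your own compose file.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 300.0, interval_s: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect only proves something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False


# Example usage (4000 is an assumed Config UI port, not a confirmed default):
#   if wait_for_port("localhost", 4000):
#       print("Config UI port is open")
```

An open port is a coarse signal. The service behind it may still be running migrations, so treat a `True` here as "worth trying the browser now", not as a health check.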

Saturday afternoon: connect your sources

This is where the real work — and the real judgment — lives. DevLake needs to know:

Where deployments come from. Connect your CI/CD (GitHub Actions, GitLab CI, Jenkins, etc.). DevLake needs to identify which pipeline runs are production deployments, not every build. This is the single most important configuration decision you'll make.
Where code changes come from. Connect GitHub/GitLab so it can compute lead time from first commit to deploy.
Where incidents come from. Connect PagerDuty, or use Jira/GitHub issues tagged as incidents, so it can compute change failure rate and MTTR.

Create a connection for each, add a project, and run a collection. DevLake pulls the historical data and builds the metrics.

Sunday: tune the definitions until the numbers are honest

Here's the part the quickstarts skip and the part that actually matters. The default definitions will be wrong for you, and a wrong DORA dashboard is worse than none — because people will trust it.

The big ones to get right:

What counts as a "deployment"? If DevLake treats every CI run as a deployment, your deployment frequency is fiction. Scope it to the pipeline/job that actually promotes to prod. In DevLake you do this with deployment patterns — a regex or step name that identifies real production deploys.
What counts as a "failure"/"incident"? Change failure rate is only meaningful if your incident definition is consistent. If one team logs a Jira incident for every hiccup but another team only logs SEV1s, you can't compare them. Pick a definition and apply it everywhere.
When does the clock start for lead time? First commit on the branch? PR open? DevLake defaults to first commit; make sure that matches how your team actually works, or your lead time will look artificially long for teams that branch early.
When does MTTR's clock start and stop? Incident created to incident resolved — and make sure people actually resolve incidents in the tool, or your MTTR trends toward infinity because tickets never get closed.
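The deployment question is the easiest one to see in code. Here's a small Python sketch of pattern-based scoping, using invented job names and an invented regex. `PROD_DEPLOY_PATTERN`, `production_deploys`, and the sample runs are hypothetical; this shows the idea, not how DevLake implements it. Counting only successful runs is also my assumption, so decide that deliberately for your own setup.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical pattern: only jobs whose name mentions a prod deploy count.
PROD_DEPLOY_PATTERN = re.compile(r"deploy.*prod", re.IGNORECASE)

# Made-up CI job runs: (job name, run date, succeeded)
runs = [
    ("build", date(2026, 5, 18), True),
    ("unit-tests", date(2026, 5, 18), True),
    ("deploy-to-staging", date(2026, 5, 18), True),
    ("deploy-to-prod", date(2026, 5, 18), True),
    ("deploy-to-prod", date(2026, 5, 19), False),
    ("Deploy Production", date(2026, 5, 20), True),
]


def production_deploys(job_runs):
    """Keep only successful runs whose job name matches the prod pattern."""
    return [r for r in job_runs if r[2] and PROD_DEPLOY_PATTERN.search(r[0])]


naive_count = len(runs)          # what you get if every CI run is a "deployment"
real = production_deploys(runs)  # what actually reached production
per_day = Counter(run_date for _, run_date, _ in real)

print(f"naive: {naive_count}, scoped: {len(real)}")
for day, count in sorted(per_day.items()):
    print(day, count)
```

On this sample the naive count is three times the scoped one, which is the "deployment frequency is fiction" problem in miniature. Test your pattern against real job names before trusting it; a staging job with "prod" somewhere in its name will slip through a loose regex.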

Spend Sunday cross-checking a handful of the computed metrics against reality. Pick three recent deploys you remember and confirm DevLake's lead time matches what actually happened. If it doesn't, your definitions are off. Iterate until the dashboard tells the truth. This honesty pass is the difference between a metrics platform people use and one they quietly ignore after week two.
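The cross-check itself can be as simple as a dictionary comparison. In this sketch, `expected` is what you remember, `reported` is what you read off the dashboard by hand, and the release IDs, durations, and one-hour tolerance are all invented for illustration.

```python
from datetime import timedelta


def lead_time_mismatches(expected, reported, tolerance=timedelta(hours=1)):
    """Return deploys that are missing from `reported` or differ by more than `tolerance`."""
    mismatches = {}
    for deploy_id, want in expected.items():
        got = reported.get(deploy_id)
        if got is None or abs(got - want) > tolerance:
            mismatches[deploy_id] = (want, got)
    return mismatches


# What you remember about three recent deploys (made-up values).
expected = {
    "release-101": timedelta(hours=6),
    "release-102": timedelta(days=2),
    "release-103": timedelta(hours=3),
}
# What the dashboard shows for the same deploys (made-up values).
reported = {
    "release-101": timedelta(hours=6, minutes=20),
    "release-102": timedelta(days=9),
}

mismatches = lead_time_mismatches(expected, reported)
for deploy_id, (want, got) in mismatches.items():
    print(f"{deploy_id}: expected {want}, dashboard says {got}")
```

The two kinds of mismatch point at different definitions. A deploy that's missing entirely usually means the deployment pattern isn't matching it; a lead time that's far too long usually means the clock is starting earlier than your team thinks it does.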

The trap to avoid

Once you have the numbers, the temptation is to turn them into targets for individuals or to rank teams. Don't. DORA metrics are system metrics — they describe the delivery system, not the people in it. The moment "deployment frequency" becomes someone's performance goal, they'll game it (hello, ten trivial deploys a day) and you'll have destroyed the signal. Use them to find bottlenecks and track whether your improvements are working, not to grade humans. This is the same blameless principle I apply to postmortems: measure the system, fix the system.

Where this leaves you

By Sunday night you have live deployment frequency, lead time, change failure rate, and MTTR, on dashboards, from your real tools, for $0 in licensing and one weekend of effort. That's a genuinely strong position. From there the work is continuous: keep the definitions honest, watch the trends, and use them to argue for the investments (better CI, smaller batches, faster rollback) that actually move the numbers.

If you'd rather not spend the weekend wiring it up — or you want the metrics hosted, maintained, and watched without standing up your own DevLake stack — that's exactly what I'm building over at dora.statusowl.io. Same open-source-grounded metrics, none of the Sunday-afternoon definition-tuning. Either way: get the four signals in front of your team. You can't improve what you refuse to measure.
