Building an AI Code Review Extension for Azure DevOps
Most AI code review tools target GitHub. If you're on Azure DevOps, you're stuck with either expensive enterprise products or nothing. So I built one — a pipeline task that sends your PR diff to Claude and posts actionable findings as PR comments.
The key insight: confidence scoring. Without it, AI code review is a firehose of false positives. With a threshold (default 80/100), only high-confidence findings make it through.
How It Works
```
PR Created/Updated
        │
        ▼
Pipeline Trigger
        │
        ▼
Fetch PR Diff (ADO REST API)
        │
        ▼
Send to Claude (Anthropic API)
        │
        ▼
Score Findings (0-100 confidence)
        │
        ▼
Filter (>= threshold)
        │
        ▼
Post PR Comment Thread
```
The task runs as a standard Azure Pipelines step. It uses System.AccessToken to read the PR diff and post comments — no extra service connections needed beyond the Anthropic API key.
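As a minimal sketch of that auth setup (the helper name is mine, not the extension's actual code), the task reads the token from the environment variable the pipeline maps in and builds a bearer header for every ADO REST call:

```typescript
// Sketch: build the auth header for ADO REST calls from the
// pipeline-provided token. SYSTEM_ACCESSTOKEN is mapped from
// $(System.AccessToken) in the pipeline YAML.
function adoAuthHeader(): Record<string, string> {
  const token = process.env.SYSTEM_ACCESSTOKEN;
  if (!token) {
    throw new Error(
      "SYSTEM_ACCESSTOKEN not set - map $(System.AccessToken) in the pipeline step"
    );
  }
  return { Authorization: `Bearer ${token}` };
}
```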
The Review Prompt
The prompt engineering matters more than the code. Here's what I focused on:
What to flag:
- Real bugs: null pointers, off-by-one, race conditions, resource leaks
- Security: injection, auth bypass, data exposure, hardcoded secrets
- Logic errors: incorrect conditions, wrong variables, missing edge cases
- Missing error handling: uncaught exceptions, missing null checks
What to ignore:
- Pre-existing issues not introduced in this PR
- Style/formatting (linters handle this)
- Pedantic nitpicks
- General code quality suggestions
This distinction is critical. Without explicit ignore rules, Claude will happily flag every minor style issue in the diff, drowning real bugs in noise.
Confidence Scoring
Each finding gets scored 0-100:
| Range | Meaning |
|---|---|
| 0-25 | Likely false positive |
| 26-50 | Might be real but uncertain |
| 51-75 | Probably real, minor impact |
| 76-90 | Confident, real and important |
| 91-100 | Certain, critical issue |
The default threshold is 80. In testing, this filters out ~60-70% of findings while keeping everything actionable. You can tune it — lower for thorough reviews, higher for less noise.
Implementation
The task is TypeScript running on Node 20. Three main functions:
1. Fetching the Diff
Azure DevOps doesn't have a single "give me the diff" endpoint. You need to:
- Get PR iterations (to find the latest push)
- Get changed files from that iteration
- Fetch file content for each changed file
```typescript
const iterRes = await fetch(
  `${orgUrl}/${project}/_apis/git/repositories/${repoId}/pullRequests/${prId}/iterations?api-version=7.1`,
  { headers: { Authorization: `Bearer ${token}` } }
);
const iterData = await iterRes.json();
// The last iteration corresponds to the most recent push.
const lastIteration = iterData.value[iterData.value.length - 1].id;
```
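The second step pulls the changed files for that iteration from the ADO Git REST API. A sketch, assuming the same `orgUrl`/`project`/`repoId`/`prId` variables as above; `buildChangesUrl` and `fetchChangedPaths` are hypothetical helpers, factored out so the URL construction is testable:

```typescript
// Sketch: list the files changed in a PR iteration.
function buildChangesUrl(orgUrl: string, project: string, repoId: string,
                         prId: number, iterationId: number): string {
  return `${orgUrl}/${project}/_apis/git/repositories/${repoId}` +
    `/pullRequests/${prId}/iterations/${iterationId}/changes?api-version=7.1`;
}

async function fetchChangedPaths(orgUrl: string, project: string, repoId: string,
                                 prId: number, iterationId: number,
                                 token: string): Promise<string[]> {
  const res = await fetch(buildChangesUrl(orgUrl, project, repoId, prId, iterationId), {
    headers: { Authorization: `Bearer ${token}` },
  });
  const data = await res.json();
  // Each change entry carries an item.path plus a changeType (add/edit/delete).
  return data.changeEntries.map((c: { item: { path: string } }) => c.item.path);
}
```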
2. Claude Review
Single API call with structured JSON output:
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  system: reviewPrompt,
  messages: [{ role: "user", content: `Review this PR diff:\n\n${diff}` }],
});
```
The response comes back as JSON with findings, each scored for confidence. Parse, filter by threshold, done.
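A sketch of that parse-and-filter step; the `Finding` shape and `filterFindings` helper are illustrative, not the extension's actual code:

```typescript
interface Finding {
  file: string;
  line: number;
  confidence: number; // 0-100, per the scoring rubric above
  message: string;
}

// Sketch: parse the JSON findings out of Claude's text response and
// drop anything below the confidence threshold (default 80).
function filterFindings(responseText: string, threshold = 80): Finding[] {
  const parsed = JSON.parse(responseText) as { findings: Finding[] };
  return parsed.findings.filter(f => f.confidence >= threshold);
}
```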
3. Posting Comments
Post a single thread on the PR with all findings:
```typescript
const body = {
  comments: [{
    parentCommentId: 0,
    content: formattedReview,
    commentType: 1, // 1 = text comment
  }],
  status: 4, // closed - informational, not blocking
};
```
Status 4 (closed) means the comment thread doesn't block the PR. It's informational — the team decides what to act on.
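The POST itself goes to the PR threads endpoint of the ADO Git REST API. A sketch, with `buildThreadRequest` as a hypothetical helper factored out so the URL and payload are testable:

```typescript
// Sketch: post the review as a single closed thread on the PR.
function buildThreadRequest(orgUrl: string, project: string, repoId: string,
                            prId: number, formattedReview: string) {
  return {
    url: `${orgUrl}/${project}/_apis/git/repositories/${repoId}` +
      `/pullRequests/${prId}/threads?api-version=7.1`,
    body: {
      comments: [{ parentCommentId: 0, content: formattedReview, commentType: 1 }],
      status: 4, // closed - informational, not blocking
    },
  };
}

async function postReview(orgUrl: string, project: string, repoId: string,
                          prId: number, review: string, token: string): Promise<void> {
  const { url, body } = buildThreadRequest(orgUrl, project, repoId, prId, review);
  await fetch(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
}
```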
Pipeline Setup
```yaml
trigger: none

pr:
  branches:
    include:
      - main
      - develop

pool:
  vmImage: 'ubuntu-latest'

steps:
  - task: ClaudeCodeReview@1
    inputs:
      anthropicApiKey: $(ANTHROPIC_API_KEY)
    env:
      SYSTEM_ACCESSTOKEN: $(System.AccessToken)
```
Store the API key in a variable group (Pipelines > Library) and mark it as secret. Note that for Azure Repos Git the `pr:` YAML trigger is ignored; configure a build validation branch policy on the target branches to run this pipeline on PRs.
Cost
Claude Sonnet 4 pricing: $3/1M input tokens, $15/1M output tokens.
A typical PR is 2-5K input tokens and 500-1K output. That's roughly $0.01-0.03 per review, so even a team running a dozen reviews a day stays well under **$0.50/day**.
The model is a task input, so you can swap to Haiku for cheaper reviews or Opus for deeper analysis.
What I'd Add Next
Multi-agent review — The GitHub version of this concept uses 4 parallel agents: two for guideline compliance, one for bugs, one for git blame context. Each reviews independently, then findings are merged and deduplicated. More expensive per review (~4x) but catches more.
Inline comments — Currently posts a single thread. Azure DevOps supports inline comments on specific lines — would make findings easier to act on.
CLAUDE.md awareness — Read project-level coding guidelines and check compliance, similar to how the original plugin works.
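The merge step for the multi-agent idea could be sketched like this (all names hypothetical): run the agents in parallel, then keep the highest-confidence finding per location.

```typescript
interface Finding { file: string; line: number; confidence: number; message: string; }

// Sketch: merge findings from parallel review agents, deduplicating
// by (file, line) and keeping the highest-confidence finding for each.
function mergeFindings(agentResults: Finding[][]): Finding[] {
  const best = new Map<string, Finding>();
  for (const f of agentResults.flat()) {
    const key = `${f.file}:${f.line}`;
    const existing = best.get(key);
    if (!existing || f.confidence > existing.confidence) best.set(key, f);
  }
  return [...best.values()];
}
```

With `Promise.all` over four independent Claude calls feeding `mergeFindings`, each agent stays focused on one concern while the team sees a single deduplicated list.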
Try It
The extension is on the Azure DevOps Marketplace. Install it, add your API key, and add the task to your PR pipeline. That's it.
This is my second Azure DevOps marketplace extension. The first — DevOps Impact Metrics — tracks commits, PRs, and work items across repos with a visual dashboard. It's hit 61 installs since launch, which validated the idea that there's demand for better ADO tooling.
Source code: GitHub

