Open Source · Apache 2.0 · 419/419 Tests Passing

Turn an alert into
an investigation
in 60 seconds.

Tarka converts Prometheus/Alertmanager alerts into structured triage reports — evidence, confidence-ranked hypotheses, and copy-paste-ready commands. No guessing. Just signal.

console.tarka.local
Tarka SRE Console — Case Inbox
419/419
Tests passing
27+
Diagnostic modules
10+
Alert families
50+
Built-in PromQL queries
< 60s
Time to first triage
The Problem

Most alerts still require a senior engineer to decode them.

Every on-call rotation has a few people who carry the tribal knowledge. They know which PromQL to run, which namespace to check, and which logs actually matter. Everyone else waits.

The result: slow MTTA, senior burnout, and incidents that drag on because the first responder isn't sure what "normal" looks like.

Tarka encodes the first 60 seconds of every investigation so every engineer on your team can respond with confidence — not just the one who's been there since day one.

Before Tarka
Slack: "hey @senior-sre can you look at this?"
10 browser tabs — Grafana, k9s, Kibana, Alertmanager…
"Is this the same as last week's thing?"
15 minutes to find the relevant PromQL query
Postmortem: "root cause: unknown"
After Tarka
Alert fires → structured report in < 60s
Evidence already gathered: K8s state, metrics, logs
Similar past incidents surfaced from case memory
Copy-paste kubectl and PromQL commands ready to run
Explicit when evidence is missing — never guesses
How It Works

A structured investigation pipeline, every time.

The same 11-stage pipeline runs whether you invoke Tarka from CLI, a webhook, or the web UI.

01
Alert Ingestion

Prometheus/Alertmanager alert arrives via CLI, webhook POST, or web UI. Labels extracted, target inferred automatically.

02
Multi-Source Evidence

Best-effort reads from Prometheus, Kubernetes API, and logs. Missing sources noted explicitly, never faked.

03
27+ Diagnostics

Crash loops, OOM kills, image pull failures, CPU saturation, change correlation. Each produces confidence-scored hypotheses.

04
Actionable Report

One-line verdict: label + why + next. Evidence-backed bullets, ranked hypotheses, 3–7 copy-paste commands.

triage-report — KubePodCrashLooping / prod / my-app
## Triage
scope: pod
discriminator: CrashLoopBackOff
impact: pod unavailable (3 restarts in 10m)
## Why
- Container exiting with code 1 (OOMKilled: no)
- Last 3 log lines before exit: FATAL: database connection refused
- No recent rollout detected (last deploy: 6 days ago)
- Liveness probe failing: /healthz → 502 (upstream timeout)
## Next
kubectl logs -n prod my-app --previous --tail=50
kubectl describe pod -n prod my-app
kubectl get events -n prod --sort-by=lastTimestamp
Features

Everything you need for confident triage.

01
100% Read-Only

Never mutates cluster state. Every operation is a read — Prometheus queries, kubectl get, log fetches. Safe to run during a live incident.

02
Works Without LLM

Deterministic base triage runs without any AI model. Add Vertex AI or Claude only when you want richer narrative enrichment.

03
Honest About Unknowns

When evidence is missing, Tarka says so explicitly. Scenarios A–D describe exactly what's blocked. No hallucinated root causes.

04
Copy-Paste Friendly

Every report ends with PromQL-first, kubectl-second commands you can run immediately. Designed for responders who need to act.

05
Case Memory

Stores every investigation in PostgreSQL. Surfaces similar past incidents during triage. Skills extracted from resolved cases inject relevant suggestions.

06
3 Deployment Modes

CLI for laptop investigations. Webhook mode for in-cluster automation. React console for team-wide visibility. One codebase, three surfaces.

See It in Action

A triage console built for on-call engineers.

Tarka Case Inbox

Case Inbox — all firing alerts in one place, scored by impact and confidence.

Integrations

Plugs into your existing stack.

Required: Prometheus/Alertmanager. Everything else is optional and degrades gracefully.

Core

Prometheus Alertmanager Kubernetes

Optional

Loki VictoriaLogs AWS CloudTrail GitHub Slack Vertex AI Anthropic Claude PostgreSQL NATS JetStream

When optional sources are unavailable, Tarka records the gap and continues. The report explicitly states what's missing.

Deployment

Run it your way.

mode: cli

Local Investigation

Run on your laptop in 5 minutes. Point at any Prometheus-compatible endpoint. No infrastructure required.

$ poetry install
$ python main.py --list-alerts
$ python main.py --alert 0
→ Quickstart Guide
mode: web-ui

Team Console

React-based case browser. See all investigations, drill into reports, chat with the agent, and review historical patterns across your team.

Tarka Web UI
→ See Screenshots
Get Started

First investigation
in 5 minutes.

No vendor signup. No SaaS. Self-hosted, open source, Apache 2.0.

→ Read the Quickstart View on GitHub
$ git clone https://github.com/tarkyaio/tarka.git && cd tarka && poetry install