Observability Needs an Architecture Reset

AI agents are rewriting how observability data gets consumed. The architecture that worked for human-scale dashboards can’t keep up. It’s time to start over.

Legacy Architecture
Legacy architecture: monolithic clusters where ingest, query, and storage are coupled in every node.
  • Coupled layers: Ingest, query, and storage fight for the same resources on every node.
  • Data compromise: Costs force downsampling, sampling, and short retention.
  • Provisioned for peak: Fixed clusters sized for worst-case load, idle the rest of the time.
  • SSD-bound: Local disks limit capacity and need 3x replication for durability.
  • Ops-heavy: Capacity planning, rebalancing, and scaling are a constant burden.

OLAP Modernized. Observability Didn't.

Every other corner of data infrastructure moved on a generation ago. Observability is the last holdout.

                   Analytics / OLAP               Observability
Storage format     Columnar                       Row-based, disk-heavy
Data layer         Object storage (S3)            Local SSDs, 3x replicated
Architecture       Separated compute & storage    Ingest / query / storage coupled
Compute model      Elastic, scales to demand      Fixed clusters, provisioned for peak
Ops burden         Serverless, zero-ops           Clusters, capacity planning, rebalancing

Telemetry volumes are compounding. AI systems are generating and consuming more observability data than ever. Meanwhile, data warehouses separated compute from storage years ago. OLAP engines run on object storage. Lakehouses decoupled ingest from query a generation ago.

Observability is still coupling everything into monolithic clusters backed by local SSDs, replicated three ways for durability, provisioned for peak load around the clock. The result: overprovisioned clusters, query timeouts under load, painful scaling, a constant ops burden, and an ever-growing bill that forces you to downsample or drop data just to stay within budget.

Agents Change Everything

AI transformed code authoring almost overnight. The same shift is coming to every observability activity — and the infrastructure isn't ready.

                     Human-Driven                             Agent-Driven
Debugging            Hours of manual triage per incident      Parallel hypothesis testing, faster resolution
Perf testing         Periodic, bounded by human capacity      Continuous on every deploy and config change
Release validation   Manual checklists, partial coverage      Automated against full baseline, every PR
Instrumentation      Added once, updated when things break    Gaps detected and filled automatically
Anomalies            Static thresholds, constant tuning       Self-learning patterns, zero manual tuning

Copilot and Cursor proved that AI could transform code authoring. The same transformation is coming to every observability activity that was previously bounded by human time — debugging incidents, running performance benchmarks, qualifying releases, instrumenting services.

Agents don't just automate existing workflows. They create entirely new ones that humans never had time for: continuous regression detection across every metric, proactive root cause analysis before alerts fire, automated telemetry gap analysis across hundreds of services.

The infrastructure implications are massive. Each of these activities generates streams of queries — exploratory, bursty, touching data that hasn't been accessed recently. Traditional architectures were sized for human-scale query patterns. Agent-scale observability demands sub-second latency, orders of magnitude more queries, and access to the complete, unsampled dataset.

What If You Started From Scratch?

If you were designing an observability backend today, with no legacy baggage, what would it look like?

Object Storage for Everything

Infinite capacity, 11 nines of durability, a fraction of the cost of SSDs. No replication needed — S3 handles it.

On-Demand Compute

Compute scales out instantly for heavy dashboards or incident investigations, and scales to zero when nobody’s querying. No fixed clusters. No idle tax.

No Over-Provisioning

Pay for the work you actually do, not for capacity sitting idle waiting for a spike that may never come.

Zero Ops

No clusters to manage. No disks to monitor. No rebalancing, no capacity planning, no 3 am pages about the observability system itself.

High Resolution

No metrics downsampling. No trace sampling. Full log retention. Storage is cheap enough that there’s no reason to throw any signal away.

Open Interfaces

Standard query and ingest protocols. Your existing dashboards, alerts, and instrumentation should just work.

Architecture, from First Principles

Oodle isn’t a lift-and-shift of an on-prem database into the cloud. Every layer was designed for object storage and on-demand compute from day one.

Oodle Architecture Overview

Custom Columnar Format — 600x Compression

A purpose-designed storage format for observability data. 600x compression means each S3 GET returns 600x more useful data per byte transferred — the key insight that makes sub-second queries on object storage possible.
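Oodle's format itself isn't public, but the core idea behind columnar compression of telemetry can be sketched in a few lines: store each column separately and delta-encode it, so fixed scrape intervals and slowly-changing values collapse into runs that a general-purpose compressor shrinks dramatically. The numbers below are illustrative only, not the 600x figure:

```python
import struct
import zlib

def delta_encode(values):
    """Replace each value with its difference from the previous one.
    Regularly scraped telemetry (fixed intervals, slowly-changing
    gauges) turns into long runs of small, repeated numbers."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]

# One hour of a metric scraped every 15 seconds (240 samples):
timestamps = [1_700_000_000 + 15 * i for i in range(240)]
cpu_pct = [42] * 239 + [97]  # flat gauge with one spike at the end

# Row layout: interleaved (timestamp, value) records, then compressed.
rows = b"".join(struct.pack("<qq", t, v) for t, v in zip(timestamps, cpu_pct))

# Columnar layout: each column delta-encoded separately, then compressed.
cols = b"".join(
    struct.pack("<q", d)
    for column in (timestamps, cpu_pct)
    for d in delta_encode(column)
)

row_size = len(zlib.compress(rows, 9))
col_size = len(zlib.compress(cols, 9))
print(row_size, col_size)  # the columnar layout compresses far smaller
```

The same raw bytes, laid out by column and delta-encoded, compress to a fraction of the row-oriented size, and a smaller compressed object means each S3 GET returns more useful data per byte.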

Separated Compute & Storage — Necessary, Not Sufficient

Ingest, storage, and query are fully decoupled. But separation alone is not enough — if your query layer is still a fixed-size cluster, a sudden heavy query can still crash the system. To truly scale, you need elastic compute.

Serverless Query Engine

Query compute spins up on demand, fans out in parallel, and releases resources when done. A sudden heavy query gets its own compute automatically — no cluster to crash, no capacity ceiling. Customers report 2–10x faster queries.
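The fan-out pattern can be sketched minimally, with a thread pool standing in for ephemeral serverless workers and an in-memory dict standing in for hourly partitions on object storage (all names here are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "object storage": one partition of (timestamp, value) samples per hour.
PARTITIONS = {
    hour: [(hour * 3600 + i, (hour * 7 + i) % 100) for i in range(3600)]
    for hour in range(24)
}

def scan_partition(hour, predicate):
    """Stand-in for one ephemeral worker: fetch a single partition,
    filter it, and return a partial aggregate (count, sum)."""
    matched = [v for _, v in PARTITIONS[hour] if predicate(v)]
    return len(matched), sum(matched)

def fan_out_query(hours, predicate, max_workers=8):
    """Scan partitions in parallel, then merge the partial aggregates.
    In a serverless engine each scan would get its own short-lived
    compute unit; the thread pool models that parallelism."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = pool.map(lambda h: scan_partition(h, predicate), hours)
        counts, sums = zip(*partials)
    total = sum(counts)
    return total, (sum(sums) / total if total else 0.0)

count, avg = fan_out_query(range(24), lambda v: v >= 90)
print(count, round(avg, 2))  # 8640 94.5
```

The merge step is the only serial work; everything expensive happens in the parallel scans, which is what lets query latency stay flat as the data volume grows.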

Intelligent Caching

Hot data and frequently-accessed results are cached in memory, warming automatically from usage patterns. Dashboard refreshes and alert evaluations hit cache directly — single-digit millisecond latency, zero compute invocations.
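A minimal sketch of the result-caching idea, assuming a TTL keyed on the query and its time range (the cache policy and key shape here are illustrative, not Oodle's actual implementation):

```python
import time

class QueryResultCache:
    """Minimal TTL cache for query results, keyed by (query, start, end).
    Repeated dashboard refreshes and alert evaluations over the same
    window return the cached result instead of invoking compute."""
    def __init__(self, ttl_seconds=30.0):
        self.ttl = ttl_seconds
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute_fn, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        result = compute_fn()
        self._entries[key] = (now, result)
        return result

cache = QueryResultCache(ttl_seconds=30.0)
key = ("avg(cpu_usage)", 1_700_000_000, 1_700_003_600)

r1 = cache.get_or_compute(key, lambda: 0.42, now=0.0)   # cold: computes
r2 = cache.get_or_compute(key, lambda: 0.99, now=10.0)  # within TTL: cached
r3 = cache.get_or_compute(key, lambda: 0.55, now=45.0)  # expired: recomputes
print(r1, r2, r3, cache.hits, cache.misses)  # 0.42 0.42 0.55 1 2
```

A dashboard refreshing every 10 seconds against a 30-second TTL invokes compute once per window; every other refresh is a memory lookup.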

No Global Index

No massive inverted indexes consuming RAM. Lightweight, purpose-built metadata optimized for observability query patterns. Any tag, any cardinality — no performance cliff.

From Dashboards to Conversations

The interface for observability is changing. The architecture has to change with it.

Cursor proved that a sidebar conversation can replace complex IDE workflows. The same shift is happening in observability. Instead of building dashboards and writing queries, you describe what you're looking for and the system investigates.

Oodle's AI assistant works as a Cursor-like sidebar, inside Slack, and as an embedded experience in your existing tools. Ask it about an alert. Ask follow-up questions. It navigates across your metrics, logs, and traces to surface what matters.

Why did latency spike at 3 am?

Correlates metrics, logs, and traces across services to pinpoint root cause.

Is this deploy safe to ship?

Compares key metrics against historical baselines and surfaces anomalies.

Which services are missing trace instrumentation?

Analyzes telemetry coverage and identifies gaps across your stack.

What changed between yesterday and today?

Diffs metric patterns, error rates, and deployment events automatically.

Every question triggers a chain of exploratory queries — the kind of bursty, high-volume workload that dashboard-era architectures were never built for. A single conversation can touch weeks of high-cardinality data across metrics, logs, and traces simultaneously.

This is why architecture matters for AI. Conversational observability isn't a feature you bolt onto a legacy backend. It requires sub-second query latency, isolated compute so one investigation doesn't impact others, and full-resolution data so the AI never hits a gap in the record. The architecture is the AI strategy.

New Capabilities Unlocked

High Resolution Data

Full fidelity across every signal. With 600x compression on S3, there is no architectural reason to throw data away. High resolution isn’t a premium feature — it’s the baseline.

  • No metrics downsampling. A 30-second CPU spike that caused a cascade of timeouts won’t disappear into a 5-minute average that looks perfectly normal.
  • No trace sampling. Every trace is captured. An agent investigating an incident can follow the exact request that failed — not a statistical sample of requests that didn’t.
  • Full log retention. No dropping logs to control costs. The context you need for root cause analysis is always there when you need it.
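The downsampling point is simple arithmetic. A hypothetical 5-minute window sampled every 10 seconds shows how a 30-second spike vanishes into the average:

```python
# 5 minutes of CPU utilisation sampled every 10 seconds (30 samples):
# steady 20% load, with a 30-second spike to 100% (3 samples).
samples = [20.0] * 12 + [100.0] * 3 + [20.0] * 15

five_minute_average = sum(samples) / len(samples)
print(five_minute_average)  # 28.0 -- looks like mild, healthy load
print(max(samples))         # 100.0 -- the spike only full resolution keeps
```

A 5-minute rollup reports 28% utilisation for a window that contained 30 seconds of full saturation; only the raw samples preserve the event that actually caused the timeouts.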

Humans couldn’t consume all this data. Agents can.

AI Agents Can Query Freely

Dashboards load instantly. Range queries over weeks of high-cardinality data come back in milliseconds. An AI agent investigating an incident can fire hundreds of exploratory queries without crashing the system. Each query gets its own isolated compute, so a heavy query never impacts anything else running at the same time. Traditional systems force you to rate-limit agents or risk taking down dashboards for everyone. Elastic compute makes that trade-off disappear.

Ingest and Query Are Fully Isolated

A traffic spike that doubles your ingest volume has zero impact on query performance. A heavy dashboard refresh doesn’t slow down data ingestion. In traditional systems, ingest and query compete for the same CPU, memory, and disk I/O — so a surge in one degrades the other. With fully separated paths, each scales independently without interference.

Cost Efficient by Design

No idle compute running 24/7. No 3x SSD replication for durability — S3 handles that natively. Object storage pricing for your data. You pay for the queries you actually run, not for capacity sitting idle. No more choosing between sampling traces or blowing your budget — the architecture is cheap enough that you never have to trade visibility for cost.

Long Retention

Keep months or years of full-resolution data without breaking the bank. AI agents investigating an incident can compare current behavior against historical baselines from weeks or months ago — catching slow-burn regressions and seasonal patterns that short retention windows would miss entirely.

Zero Ops

No clusters to manage. No capacity planning. No rebalancing. No disks to monitor. No 3 am pages about the observability system itself. Infrastructure that manages itself so your team works on the product, not the plumbing.

Open Standards. No Lock-In.

Proprietary query languages and closed formats create switching costs by design. Open standards remove them.

Standard Query Protocols

PromQL for metrics. LogQL for logs. No proprietary query language to learn, no vendor-specific syntax to migrate away from.
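For instance, queries a team already runs against Prometheus or Loki should work as-is; the service and label names below are illustrative:

```
# PromQL: p99 request latency per service over the last 5 minutes
histogram_quantile(0.99,
  sum by (le, service) (rate(http_request_duration_seconds_bucket[5m])))

# LogQL: error rate in one service's logs over the last 5 minutes
sum(rate({service="checkout"} |= "error" [5m]))
```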

Standard Ingest Protocols

OpenTelemetry (OTLP), Prometheus remote write, and common log formats. Your existing instrumentation just works.

Your Dashboards Work

Grafana dashboards, existing alerts, and recording rules carry over without changes. No rewrite required.

Easy Way In, Easy Way Out

No proprietary agent format. No data held hostage. If you ever want to leave, your data and queries are already in standard formats.

Own Your Data

Flexible deployment models to match your security, compliance, and data residency requirements. Your observability data is yours.

SaaS (Easiest)

Fully managed by Oodle. Zero infrastructure to run. Zero ops. Start ingesting in minutes. We handle everything — storage, compute, upgrades, scaling.

Best for teams that want to focus entirely on their product.

Bring Your Own Bucket (BYO-B)

Oodle runs as a managed service, but all your observability data is stored in your own S3 bucket. You always have full access to your raw data — even if you stop using Oodle.

Best for teams that need data ownership with zero ops.

Bring Your Own Cloud (BYO-C)

Oodle runs entirely within your AWS account. Your data never leaves your VPC. Full control over networking, encryption, and access policies. Meets the strictest compliance requirements.

Best for regulated industries and strict data residency needs.

Complete Observability at 1/5th the Cost

Go live in 15 minutes. No clusters to manage. No vendor lock-in.

  • 5x lower cost
  • < 3s p99 query latency
  • 15 minutes to go live
  • Zero ops overhead