Parallel Execution Infrastructure

The Spark for AI agents.

Multi-agent systems break at scale. KiloAgents is the runtime that makes parallel AI agent workflows actually work.

Backed by Y Combinator S26

The Problem

The multi-agent scaling wall

1.724x

Coordination overhead grows super-linearly

Google Research
3–4

Maximum effective agents before degradation

Industry benchmarks
15x

Token explosion in multi-agent systems

Stanford HAI
43%

of teams cite inter-agent communication as their #1 latency source

Developer survey, 2025
The Insight

We've seen this before

MapReduce (2004)               Agent Frameworks (2025)
Rigid two-stage pipelines      Rigid orchestrator-worker patterns
Disk-based, no shared state    No shared context between agents
No execution optimization      No workflow optimization

Then Spark arrived.

Now KiloAgents.

The Five Primitives

How it works

01

Versioned Context Snapshots

Immutable, tracked context objects — like RDDs for agent state. Full lineage tracking so you can trace any decision back to its source.
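The idea above can be sketched in a few lines. This is an illustrative model, not KiloAgents' actual API: the `ContextSnapshot` name, fields, and methods are all assumptions. Each update produces a new immutable version that points at its parent, so lineage is just a walk up the chain.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class ContextSnapshot:
    """Hypothetical immutable, versioned context object (RDD-style)."""
    version: int
    data: dict                               # treated as read-only payload
    parent: "ContextSnapshot | None" = None  # previous version, if any
    op: str = "init"                         # operation that produced this version

    def update(self, op: str, **changes: Any) -> "ContextSnapshot":
        """Produce a new snapshot; the old one is never mutated."""
        merged = {**self.data, **changes}
        return ContextSnapshot(self.version + 1, merged, parent=self, op=op)

    def lineage(self) -> list[str]:
        """Trace any state back to its source, newest first."""
        node, chain = self, []
        while node is not None:
            chain.append(f"v{node.version}:{node.op}")
            node = node.parent
        return chain

root = ContextSnapshot(0, {"query": "q"})
s1 = root.update("retrieve", docs=3)
s2 = s1.update("summarize", summary="...")
print(s2.lineage())  # ['v2:summarize', 'v1:retrieve', 'v0:init']
```

Because snapshots are frozen, two agents can read the same version concurrently without coordination; a write simply forks a new version.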

02

Task Dependency DAG

Automatic parallelism detection. We build the graph, identify what can run concurrently, and sequence dependent tasks correctly.
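"Identify what can run concurrently" reduces to leveling the dependency graph. Here is a minimal sketch of that step (task names and the `parallel_waves` helper are hypothetical): tasks whose dependencies are all satisfied form one parallel "wave", and waves execute in sequence.

```python
def parallel_waves(deps: dict[str, set[str]]) -> list[set[str]]:
    """Group tasks into waves that can run concurrently.
    deps maps each task to the set of tasks it depends on."""
    remaining = {t: set(d) for t, d in deps.items()}
    done: set[str] = set()
    waves: list[set[str]] = []
    while remaining:
        # Every task whose dependencies are already done is ready now.
        ready = {t for t, d in remaining.items() if d <= done}
        if not ready:
            raise ValueError("cycle detected in task graph")
        waves.append(ready)
        done |= ready
        for t in ready:
            del remaining[t]
    return waves

deps = {
    "fetch_a": set(), "fetch_b": set(),   # independent: run in parallel
    "analyze": {"fetch_a", "fetch_b"},    # waits for both fetches
    "report": {"analyze"},
}
waves = parallel_waves(deps)
# three waves: both fetches together, then analyze, then report
```

The same leveling also catches cycles, which would otherwise deadlock an orchestrator.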

03

Workflow Optimizer

The missing piece. No agent framework has this. We analyze the full task graph and eliminate redundant operations before any agent starts working.
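One concrete form of "eliminate redundant operations" is deduplicating identical tasks before execution, analogous to common-subexpression elimination. The sketch below is an assumption about one such pass, not KiloAgents' optimizer itself; `dedupe_tasks` and the task tuples are illustrative.

```python
def dedupe_tasks(tasks: list[tuple[str, str, tuple]]) -> dict[str, str]:
    """tasks: (task_id, op, args) triples. Returns task_id -> canonical
    task_id, so identical (op, args) pairs execute once and share results."""
    canonical: dict[tuple, str] = {}
    alias: dict[str, str] = {}
    for task_id, op, args in tasks:
        key = (op, args)
        if key in canonical:
            alias[task_id] = canonical[key]   # redundant: reuse earlier result
        else:
            canonical[key] = task_id
            alias[task_id] = task_id
    return alias

tasks = [
    ("t1", "search", ("spark history",)),
    ("t2", "search", ("spark history",)),   # duplicate of t1
    ("t3", "summarize", ("t1",)),
]
print(dedupe_tasks(tasks))  # {'t1': 't1', 't2': 't1', 't3': 't3'}
```

Running this before dispatch means the duplicate search never spends tokens: downstream tasks aliased to `t1` simply read its cached output.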

04

Shared Fast-Access Context Layer

Tiered context (working, session, memory) with compiled views. Each agent gets exactly the relevant slice — 7.8x speedup through intelligent cache sharing.
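The tiered lookup described above can be sketched as a three-level store with promotion on read, so hot keys migrate to the fastest tier. The class name, tier names' behavior, and promotion policy here are assumptions for illustration only.

```python
class TieredContext:
    """Hypothetical tiered context store: working (fast) -> session -> memory."""

    def __init__(self) -> None:
        self.tiers: dict[str, dict] = {"working": {}, "session": {}, "memory": {}}
        self.order = ["working", "session", "memory"]  # lookup order, fastest first

    def put(self, key: str, value, tier: str = "memory") -> None:
        self.tiers[tier][key] = value

    def get(self, key: str):
        for i, tier in enumerate(self.order):
            if key in self.tiers[tier]:
                value = self.tiers[tier][key]
                if i > 0:
                    # Promote a hit from a slower tier into the working set,
                    # so repeated reads by other agents stay cheap.
                    self.tiers["working"][key] = value
                return value
        raise KeyError(key)

ctx = TieredContext()
ctx.put("user_goal", "summarize repo")   # lands in the slow "memory" tier
ctx.get("user_goal")                     # first read promotes it to "working"
```

Sharing one such store across agents is what makes cache reuse possible: each agent reads its slice, and promotions made by one agent speed up the next.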

05

Lineage-Based Fault Tolerance

When an agent fails, recompute only the affected downstream tasks. Not everything. Just like Spark recomputes lost RDD partitions.
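Selective recomputation falls out of the same dependency graph: mark the failed task dirty and propagate dirtiness to its transitive dependents. The helper below is a sketch under that assumption; everything upstream or unrelated keeps its cached result.

```python
def affected_tasks(deps: dict[str, set[str]], failed: str) -> set[str]:
    """deps maps each task to the tasks it depends on. Returns the failed
    task plus every transitive dependent that must be recomputed."""
    dirty = {failed}
    changed = True
    while changed:                        # propagate to a fixed point
        changed = False
        for task, parents in deps.items():
            if task not in dirty and parents & dirty:
                dirty.add(task)
                changed = True
    return dirty

deps = {
    "fetch_a": set(), "fetch_b": set(),
    "analyze": {"fetch_a", "fetch_b"},
    "report": {"analyze"},
}
redo = affected_tasks(deps, "fetch_b")
# fetch_b, analyze, and report are recomputed; fetch_a's result is reused
```

This mirrors how Spark recomputes only the lost RDD partitions from lineage rather than rerunning the whole job.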

Benchmarks

Performance at scale

             Without KiloAgents    With KiloAgents
2 agents     15% overhead          12% overhead
4 agents     38% overhead          22% overhead
8 agents     72% overhead          30% overhead
16 agents    95% overhead          38% overhead
32 agents    100% overhead         44% overhead

Benchmark: Multi-agent research + synthesis pipeline, measured on 8-core cloud instance

73%

Token reduction

7.8x

Latency improvement

32+

Effective agent scaling

System Design

Architecture

Your Agent Code       LangGraph · CrewAI · OpenAI SDK · Raw API
        ↓
KiloAgents Runtime    DAG Scheduler · Context Layer · Optimizer
        ↓
Models                Claude · GPT · Gemini · Local Models
Quanlai Li

Founder & CEO

I built BIDMach at Berkeley — GPU-accelerated ML that outperformed 50-node Spark clusters. Now I'm applying the same systems thinking to AI agents.

UC Berkeley · 7,000+ commits/year with AI agents

The runtime layer is missing.
We're building it.

Open source core. Managed cloud coming soon.