OpenSpace: The Self-Evolving AI Agent Engine That Cuts Costs by 45.9% and Earns 4.2x More
Fact-Checked Deep Dive | 1.5k GitHub Stars | GDPVal Benchmark Verified
When you deploy an AI agent today—whether it's Claude Code, Cursor, OpenClaw, or nanobot—you're deploying a stateless worker. Every task starts from zero. Every mistake gets repeated. Every successful pattern evaporates into the void once the session ends.
OpenSpace changes this. Developed by HKUDS (HKU Data Science Lab), OpenSpace is a self-evolving skill engine that transforms AI agents from disposable tools into learning systems that accumulate expertise, share knowledge across agents, and deliver measurable economic returns.
The numbers from the GDPVal benchmark are striking:
| Metric | OpenSpace Performance | Baseline (ClawWork) |
|---|---|---|
| Value Capture | 72.8% ($11,484 / $15,764) | 17.4% |
| Average Quality | 70.8% (+30pp) | 40.8% |
| Token Efficiency | −45.9% (Phase 2 vs Phase 1) | N/A |
| Income Multiple | 4.2x higher earnings | Baseline |
These aren't synthetic benchmarks. GDPVal evaluates 220 real-world professional tasks across 44 occupations—the same work that generates actual GDP. We're talking payroll calculators from union contracts, tax returns from scattered PDFs, legal memoranda on California privacy regulations.
Let's dig into what makes OpenSpace different, verify the claims, and understand when this architecture matters for your agentic workflows.
The Problem: Why Today's AI Agents Never Learn
Current AI agents suffer from three fundamental weaknesses:
❌ Massive Token Waste
Every task requires reasoning from scratch. Need to parse a CSV file? The agent burns tokens rediscovering pandas.read_csv() parameters. Need to generate a PDF report? It relearns reportlab syntax every single time. There's no memory of successful patterns.
❌ Repeated Costly Failures
Agent A spends 2,000 tokens figuring out that a specific API requires pagination. Agent B, working on the same problem five minutes later, burns the same 2,000 tokens making the same mistakes. Knowledge doesn't transfer.
❌ Skills Degrade Silently
You write a skill that calls the Stripe API. Stripe updates their endpoints. Your skill breaks—not with a clear error, but with subtle data corruption. No monitoring, no auto-repair, no version tracking.
OpenSpace's thesis: Skills should be living entities that auto-repair, improve through usage, and share learnings across the entire agent network.
What Is OpenSpace? Three Superpowers for AI Agents
OpenSpace plugs into any agent that supports the SKILL.md format (Claude Code, Codex, OpenClaw, nanobot, Cursor) and adds three core capabilities:
🧬 1. Self-Evolution
Skills that learn and improve automatically through three mechanisms:
- AUTO-FIX: When a skill breaks (API changes, dependency errors), OpenSpace detects the failure and generates a fix. The repaired skill becomes a new version.
- AUTO-IMPROVE: Successful execution patterns get captured and optimized. If a skill works but uses 800 tokens, OpenSpace tries to distill it to 400 tokens.
- AUTO-LEARN: When an agent completes a novel task successfully, the workflow gets captured as a reusable skill—no manual coding required.
- Quality Monitoring: Tracks error rates, execution success, and token consumption across all tasks. Skills with high failure rates get flagged for review.
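The dispatch between these mechanisms can be pictured as a simple decision rule over a skill's execution record. The sketch below is illustrative only: `SkillRecord`, `choose_evolution_mode`, and the token-budget threshold are hypothetical names, not OpenSpace's actual API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SkillRecord:
    name: str
    version: str
    error_rate: float   # fraction of recent runs that failed
    avg_tokens: int     # average tokens per successful run

def choose_evolution_mode(record: Optional[SkillRecord],
                          run_succeeded: bool,
                          token_budget: int = 500) -> str:
    """Decide which evolution path a finished task run should trigger."""
    if record is None:
        # No existing skill matched this task: capture it if the run worked
        return "CAPTURED" if run_succeeded else "NONE"
    if not run_succeeded:
        return "FIX"        # existing skill failed: auto-repair it
    if record.avg_tokens > token_budget:
        return "DERIVED"    # works but costly: distill a cheaper version
    return "NONE"

# A working but token-hungry skill gets queued for distillation
skill = SkillRecord("data-validation-csv", "1.0.0", error_rate=0.02, avg_tokens=1200)
print(choose_evolution_mode(skill, run_succeeded=True))  # DERIVED
```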
🌐 2. Collective Agent Intelligence
Turn individual agents into a shared brain:
- Shared Evolution: One agent's improvement becomes every agent's upgrade. If Agent A evolves a skill for parsing complex PDFs, Agent B instantly benefits.
- Network Effects: More agents → richer data → faster evolution for everyone.
- Access Control: Choose public, private, or team-only access for each skill.
- Cloud Community: Browse and download evolved skills at open-space.cloud.
💰 3. Token Efficiency
Stop repeating work. Start reusing solutions:
- Cold Start → Warm Rerun: First execution of a task type builds the skill. Subsequent similar tasks reuse the evolved skill, dramatically reducing token consumption.
- Small Updates Only: Fix what's broken, don't rebuild everything.
- Measured Savings: 45.9% average token reduction across 50 professional tasks in GDPVal benchmark.
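The cold-start vs. warm-rerun comparison is a straightforward ratio. The token counts below are hypothetical placeholders chosen to illustrate the arithmetic; only the 45.9% figure comes from the benchmark.

```python
def token_reduction(cold_tokens: int, warm_tokens: int) -> float:
    """Fractional reduction in tokens between Phase 1 (cold) and Phase 2 (warm)."""
    return 1.0 - warm_tokens / cold_tokens

# e.g. 10,000 tokens cold vs 5,410 warm reproduces the reported 45.9% reduction
print(f"{token_reduction(10_000, 5_410):.1%}")  # 45.9%
```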
GDPVal Benchmark Results: Real Economic Impact
GDPVal is a benchmark dataset containing 220 real-world professional tasks covering 44 occupations, evaluated using actual economic value as the standard. OpenSpace was tested on 50 tasks across 6 industries in a two-phase design:
- Phase 1 (Cold Start): Execute all 50 tasks sequentially with no prior skills
- Phase 2 (Warm Rerun): Re-execute the same 50 tasks with the evolved skill database from Phase 1
Overall Results
| Metric | OpenSpace (Qwen 3.5-Plus) | ClawWork Baseline (Same LLM) |
|---|---|---|
| Value Captured | $11,484 / $15,764 (72.8%) | ~$2,743 (17.4%) |
| Quality Score | 70.8% average | 40.8% best agent |
| Token Reduction | −45.9% (Phase 2 vs Phase 1) | N/A |
| Income Multiple | 4.2x higher | Baseline |
Important: Both OpenSpace and the ClawWork baseline used the same backbone LLM (Qwen 3.5-Plus). The performance difference comes purely from skill evolution, not model capabilities.
Breakdown by Category
| Category | Tasks | Quality Δ (Phase 1 → Phase 2) | Token Δ | Why It Matters |
|---|---|---|---|---|
| Documents & Correspondence | 7 | 71% → 74% (+3.3pp) | −56% | California privacy law memoranda, surveillance reports. The document-gen-fallback skill family evolved through 13 versions. |
| Compliance & Forms | 11 | 51% → 70% (+18.5pp) | −51% | Tax returns from 15 PDFs, pharmacy compliance checklists. PDF skill chain evolves once, all form tasks reuse it. |
| Media Production | 3 | 53% → 58% (+5.8pp) | −46% | Audio/video via ffmpeg. Evolved skills encode working codec flags, eliminating sandbox trial-and-error. |
| Engineering | 4 | 70% → 78% (+8.7pp) | −43% | Technical specifications, CAD file processing. Reusable engineering calculation patterns. |
| Data Analysis | 14 | 68% → 75% (+7pp) | −42% | CSV analysis, statistical reports. Pandas patterns captured and reused. |
| Research & Writing | 11 | 65% → 72% (+7pp) | −38% | Market research, technical documentation. |
Every category improved—no exceptions.
How Self-Evolution Actually Works: FIX, DERIVED, CAPTURED
OpenSpace implements three distinct evolution modes:
1. FIX Mode (Auto-Repair)
Trigger: Skill execution fails with a specific error type.
Example: A skill calls stripe.Customer.create() but Stripe updated the API to make email a required field.
```
Execution Error: Missing required field 'email'
→ AUTO-FIX triggered
→ Skill updated: adds email parameter validation
→ New version: data-validation-csv v1.1.0
```
The fixed skill is stored as a new version, preserving the lineage. You can trace exactly when and why a skill evolved.
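A plausible versioning rule consistent with the examples in this section (FIX bumps the minor version, DERIVED bumps the major) can be sketched as follows. The function names and the lineage record layout are assumptions, not OpenSpace's real schema.

```python
def bump_version(version: str, mode: str) -> str:
    """Illustrative versioning rule: FIX bumps minor, DERIVED bumps major."""
    major, minor, patch = (int(p) for p in version.split("."))
    if mode == "FIX":
        return f"{major}.{minor + 1}.0"
    if mode == "DERIVED":
        return f"{major + 1}.0.0"
    return version

lineage = []  # append-only history, so every evolution step stays traceable

def record_fix(name: str, version: str, error: str) -> str:
    """Store a repaired skill as a new version, preserving the lineage."""
    new_version = bump_version(version, "FIX")
    lineage.append({"skill": name, "from": version, "to": new_version, "reason": error})
    return new_version

print(record_fix("data-validation-csv", "1.0.0", "Missing required field 'email'"))  # 1.1.0
```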
2. DERIVED Mode (Optimization)
Trigger: Successful execution with opportunity for improvement.
Example: A skill works but uses 1,200 tokens. OpenSpace analyzes the execution trace and creates a distilled version:
```
Original: 1,200 tokens, 8 steps
Derived: 650 tokens, 5 steps (same output quality)
→ Skill marked as v2.0 (optimized)
```
3. CAPTURED Mode (New Skill Creation)
Trigger: Novel task completed successfully without existing skill.
Example: Agent builds a monitoring dashboard with 20+ panels. The entire workflow gets captured as a reusable skill:
```markdown
---
name: monitoring-dashboard-builder
description: Creates live monitoring dashboards with 20+ panels
target: docker, prometheus, grafana
---
# Workflow captured from successful execution
1. Scan running containers
2. Extract metrics endpoints
3. Generate Grafana datasource configs
4. Create dashboard JSON with 20 panels
5. Deploy and validate
```
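Serializing a captured workflow into a SKILL.md file like the one above is mechanically simple. This is a minimal sketch: the `capture_skill` helper and its layout mirror the frontmatter example in this section but are not OpenSpace's actual writer.

```python
from pathlib import Path

def capture_skill(name: str, description: str, target: str,
                  steps: list, out_dir: Path) -> Path:
    """Write a SKILL.md file with YAML frontmatter and a numbered workflow."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, 1))
    text = (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        f"target: {target}\n"
        "---\n"
        "# Workflow captured from successful execution\n"
        f"{numbered}\n"
    )
    path = out_dir / name / "SKILL.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return path
```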
Skill Storage: SQLite + SKILL.md
OpenSpace stores skills in two formats:
- SQLite Database: Metadata, execution history, performance metrics, evolution lineage
- SKILL.md Files: Human-readable skill definitions with instructions, code snippets, and triggers
You can inspect the database directly:
```
$ sqlite3 /path/to/workspace/.openspace/openspace.db
sqlite> SELECT name, version, origin, execution_count FROM skills ORDER BY execution_count DESC;
```
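The same inspection works from Python's built-in sqlite3 module. The table and column names below follow the query above; the actual OpenSpace schema may differ.

```python
import sqlite3

def top_skills(db_path: str, limit: int = 10):
    """Return the most-executed skills, ordered by usage count descending."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT name, version, origin, execution_count "
            "FROM skills ORDER BY execution_count DESC LIMIT ?",
            (limit,),
        ).fetchall()
    return rows
```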
Collective Intelligence: One Agent Learns, All Benefit
This is where OpenSpace gets interesting for teams and production systems.
Cloud Community: open-space.cloud
Register at open-space.cloud to access:
- Public Skills: Browse 165+ evolved skills from the GDPVal benchmark
- Skill Lineage: See how skills evolved (e.g., document-gen-fallback has 13 versions)
- Upload/Download: Share your team's evolved skills or download community skills
- Access Control: Mark skills as public, private, or team-only
Real-World Impact
Imagine your team has 10 agents running in production:
- Without OpenSpace: Each agent independently discovers (and forgets) solutions. Agent #3 figures out the Stripe API pagination. Agent #7 burns tokens rediscovering it.
- With OpenSpace: Agent #3's discovery becomes a skill. Agents #4-#10 instantly benefit. Next week, Agent #7 encounters a new edge case, fixes the skill, and everyone upgrades.
Network Effect Formula: More agents → More executions → More evolution data → Better skills → Lower costs → More agents.
Case Study: My Daily Monitor - 20+ Panels, Zero Human Code
The OpenSpace team showcased a personal behavior monitoring system built entirely by an agent:
- 20+ Live Dashboard Panels: Processes, servers, terminals, news, markets, messages, schedules
- 60+ Skills Evolved: All created autonomously through OpenSpace execution
- Zero Human-Written Code: The agent developed the entire system end-to-end
This isn't a static dashboard. It includes a built-in AI agent that can:
- Answer questions about your processes
- Provide analysis of system metrics
- Execute tasks (restart services, deploy updates, send alerts)
Why This Matters: Traditional agent development requires humans to write skills, test them, deploy them. OpenSpace demonstrates that agents can autonomously develop complex systems, evolving skills as they encounter challenges.
Integration: Plug Into Claude Code, Cursor, OpenClaw
OpenSpace works with any agent that supports the SKILL.md format. Here's how to integrate:
Step 1: Install OpenSpace
```shell
git clone https://github.com/HKUDS/OpenSpace.git
cd OpenSpace
pip install -e .
```
Pro Tip: Skip the 50MB assets/ folder for faster cloning:
```shell
git clone --filter=blob:none --sparse https://github.com/HKUDS/OpenSpace.git
cd OpenSpace
# Exclusion patterns like '!assets/' require non-cone mode
git sparse-checkout set --no-cone '/*' '!assets/'
pip install -e .
```
Step 2: Add to Your Agent's MCP Config
For agents that support MCP (Model Context Protocol):
```json
{
  "mcpServers": {
    "openspace": {
      "command": "openspace-mcp",
      "toolTimeout": 600,
      "env": {
        "OPENSPACE_HOST_SKILL_DIRS": "/path/to/your/agent/skills",
        "OPENSPACE_WORKSPACE": "/path/to/OpenSpace",
        "OPENSPACE_API_KEY": "sk-xxx (optional, for cloud)"
      }
    }
  }
}
```
Step 3: Copy Core Skills
```shell
cp -r OpenSpace/openspace/host_skills/delegate-task/ /path/to/your/agent/skills/
cp -r OpenSpace/openspace/host_skills/skill-discovery/ /path/to/your/agent/skills/
```
These two skills teach your agent when and how to use OpenSpace—no additional prompting needed.
Step 4: (Optional) Enable Cloud Community
Register at open-space.cloud to get an OPENSPACE_API_KEY, then add it to your config. Without it, all local capabilities work normally.
When to Use OpenSpace (and When Not To)
✅ Use OpenSpace When:
| Use Case | Why |
|---|---|
| High-volume repetitive tasks | Token savings compound quickly (45.9% reduction) |
| Multi-agent teams | Collective intelligence amplifies value |
| Long-running production systems | Skills improve over time, costs decrease |
| Complex workflows with failure modes | AUTO-FIX catches and repairs breaking changes |
| Cost-sensitive deployments | 4.2x income improvement changes unit economics |
❌ Skip OpenSpace When:
| Use Case | Why |
|---|---|
| One-off experimental tasks | Overhead outweighs benefits |
| Simple, stateless queries | No reusable patterns to capture |
| Tight latency requirements | Skill search adds ~100-300ms overhead |
| Highly specialized domains | Community skills may not apply |
The Bottom Line: Economic Viability for AI Agents
OpenSpace addresses the fundamental economic problem of AI agents: costs scale linearly with task complexity because every task starts from zero.
By treating skills as living entities that auto-repair, improve, and share knowledge, OpenSpace flips this model:
- Costs decrease over time as skills evolve and reuse increases
- Quality improves as successful patterns get captured and optimized
- Failures become rare as AUTO-FIX catches breaking changes
- Network effects kick in as more agents contribute to the shared skill pool
The Numbers Don't Lie
- 72.8% value capture on real professional work
- 45.9% token reduction through skill reuse
- 4.2x higher earnings with the same backbone LLM
- 1.5k GitHub stars and growing (not 2.6K as some sources claim—fact-checked)
For teams running AI agents in production, OpenSpace isn't just a nice-to-have. It's the difference between agents that burn money and agents that generate profit.
Ready to try it?
- GitHub: HKUDS/OpenSpace
- Cloud Community: open-space.cloud
- Documentation: See openspace/host_skills/README.md for integration guides
The era of stateless, forgetful AI agents is ending. Welcome to self-evolving systems that learn from every task, share knowledge across the network, and deliver measurable economic returns.
Fact-Checked Sources:
- GitHub Repository: https://github.com/HKUDS/OpenSpace (1.5k stars, 168 forks)
- GDPVal Benchmark: https://openreview.net/forum?id=hcuEdq6eKD
- MarkTechPost Tutorial: https://www.marktechpost.com/2026/03/24/a-coding-implementation-to-design-self-evolving-skill-engine-with-openspace-for-skill-learning-token-efficiency-and-collective-intelligence/
- Dev|Journal Analysis: https://earezki.com/ai-news/2026-03-24-a-coding-implementation-to-design-self-evolving-skill-engine-with-openspace-for-skill-learning-token-efficiency-and-collective-intelligence/