Quenda: A Lightweight, Layered Agent Framework for Python

Date: 2026-07-05 Project: github.com/xvshiting/quenda

When I first started building LLM-powered applications, I kept running into the same problem: agent frameworks either handed me a black box that was impossible to reason about, or they dumped so many primitives on my desk that wiring them together took longer than the actual product. I wanted something in between — a framework small enough to hold in my head, but structured enough to scale from a one-shot script to a persistent coding assistant. That’s why I built Quenda.

1. The Design Goal: Small Surface, Strict Layers

Quenda’s entire public SDK reduces to four concepts: Agent, Session, @tool, and a provider registry. Everything else is an implementation detail. The reason the surface stays small is a strict four-layer architecture:

Interface → Host → Runtime → Kernel
Layer Responsibility
Kernel Synchronous model-tool loop. No knowledge of agents, sessions, users.
Runtime Async Agent/Session/Run lifecycle, event emission, context mgmt.
Host Persistence, identity, permissions, instruction composition, skills.
Interface Event rendering, user interaction, REPL.

Each layer depends only on the layer inside it. The innermost Kernel is a pure function of (messages, tools, model) → events, which means you can test the entire core with fake models and never touch the network. This is the property I cared most about: tests that don’t flake on rate limits or model downtime.

2. The Kernel: A Testable Model-Tool Loop

Most agent loops interleave concerns — they fetch instructions, persist state, render UI, and call the model all in the same function. Quenda’s Kernel refuses to do any of that. It only knows how to:

  1. Send the current messages + tools to a model
  2. Dispatch any tool calls the model returns
  3. Loop until the model produces a final message

Because nothing else leaks in, replacing the model with a deterministic fake turns the loop into a pure state machine. This makes regression tests instant and behavior changes auditable.

3. The Runtime: Async Sessions and Events

On top of the Kernel, the Runtime adds the async lifecycle: Agent, Session, and Run. A Session is a resumable conversation; a Run is a single send() invocation within that session. Every step of the run emits structured events — tool-started, tool-finished, model-responding, context-compressed — so any Interface can render progress without coupling to internals.

from quenda import Agent, tool
from quenda.providers import get_provider_registry
from quenda.tools import get_core_tools
import asyncio

@tool
def calculate(expression: str) -> float:
    """Safely evaluate a math expression."""
    import ast
    node = ast.parse(expression, mode='eval')
    return eval(compile(node, '<string>', 'eval'), {"__builtins__": {}}, {})

model = get_provider_registry().get_model("deepseek", "deepseek-v4-flash")

agent = Agent(
    name="assistant",
    system_prompt="You are a helpful assistant.",
    tools=[calculate, *get_core_tools(".")],
    model=model,
)

async def main():
    session = agent.open_session()
    result = await session.send("What is 15% of 847?")
    print(result)

asyncio.run(main())

That’s a complete agent. No boilerplate, no framework-specific DSL, no implicit globals.

4. Providers: 26 Behind One Registry

A common pain point is vendor lock-in. Quenda ships with 26 built-in providers covering 300+ models — OpenAI, Anthropic, DeepSeek, DashScope, Moonshot, OpenRouter, Ollama, and more — all behind one get_provider_registry() and one ModelSpec interface. Switching models mid-session is a single call:

/model deepseek/deepseek-v4-flash

Adding your own provider takes five lines — point it at your base URL, declare the API flavor (openai-completions or otherwise), and register it.

5. Tools: Workspace-Scoped by Default

get_core_tools(workspace) returns nine essential tools:

Tool Capability
list_files Browse directories (ls, find, tree)
search_text Search file contents (grep, rg)
read_file View files with line ranges
write_file Create or overwrite files
apply_patch Apply targeted text patches
run_shell Execute shell commands (filtered)
execute_python Run Python in a sandbox
request_interaction Ask the user for input
request_skill_activation Ask to activate a skill

None of them reach outside the workspace root, and the shell/Python tools enforce command filtering and import restrictions. Security lives in the code path, not in a checklist.

6. Skills: Composable Capability Packages

The newest addition is the Skills framework, introduced in the 2026-06 release. A skill is a package of instructions, resources, and optional tools that an agent can discover and activate on demand. Think of them as plug-in competencies — a “git-workflow” skill, a “code-review” skill, a “thesis-formatting” skill — that composition into a system prompt without bloating the base context.

Skills compose cleanly with context compression (also new in 2026-06): when the context grows large, Quenda summarizes earlier turns automatically, and the /compress command lets you trigger it manually. The agent stays responsive even in long sessions.

7. Quenda Code: A Coding Agent That Eats Its Own Dog Food

The flagship application is Quenda Code, an AI coding agent that runs in the terminal:

pip install quenda quenda-code
quenda code

It reads your codebase, writes code, runs commands, and helps you ship. What makes it a useful stress test of the framework is that it exercises every layer — Kernel math, Runtime session persistence, Host instruction composition + skills, and the Interface REPL — in a real workflow. If the SDK has an awkward corner, the coding agent finds it first.

A typical session:

> read the main entry point and explain how it works

I'll read the main entry point...
[Reads src/quenda/cli.py]
The entry point is `cli.py:main()` ...

> add a --version flag to the CLI

[Applies patch to cli.py]
Done. Added `--version` flag that prints the version and exits.

> run the tests
[Runs pytest]
All 42 tests passed.

REPL Commands

Command Description
/help Show available commands
/mode [code\|architect\|chat] Switch interaction mode
/model <provider>/<model> Switch model mid-session
/skill list List available skills
/skill activate <name> Activate a skill
/compress Manually compress context
/status Show session and token info
/reset Clear conversation history

8. What I Learned Building It

A few things crystallized over the course of the project:

  1. Layering is a forcing function for testability. Once the Kernel had no I/O, the rest of the system got cheap to test almost for free.
  2. Tools should be policies, not magic. run_shell filters commands; execute_python restricts imports. Putting that logic in the tool — rather than hoping the model behaves — is the only thing that scales.
  3. Providers are a registry, not a class hierarchy. Modeling every vendor behind one ModelSpec removed a whole category of accidental complexity.
  4. Events beat callbacks. Emitting structured events from the Runtime let me swap interfaces (CLI, HTTP, test harness) without touching the agent.

9. Roadmap

The 2026-06 release added skills, context compression, interaction requests, and command extensions. What’s next on my list:

  • Skill marketplace — share + discover community skills
  • Multi-agent orchestration — first-class agent-to-agent messaging built on the same event stream
  • Fine-tuned draft models — speculative-decoding-style acceleration for agent loops
  • More providers — push past 26 toward a self-describing provider spec

10. Get Started

pip install quenda quenda-code   # CLI coding agent
pip install quenda                # SDK only

Requires Python 3.12+. Zero required runtime dependencies.

  • Repository: github.com/xvshiting/quenda
  • SDK Tutorials: 8 chapters covering agents, tools, providers, sessions, and events
  • CLI Tutorials: 5 chapters on Quenda Code
  • Architecture Decisions: ADR records in docs/decisions/

Quenda is intentionally small. If you’ve been looking for an agent framework that fits in your head but doesn’t disappear when the workflow gets serious, give it a try.