SuperClaw – Open-Source Framework to Red-Team AI Agents for Security Testing

Superagentic AI has released SuperClaw, an open-source, pre-deployment security testing framework built specifically for autonomous AI coding agents.

Announced in late 2025, SuperClaw addresses a growing blind spot in enterprise AI adoption: agents are routinely deployed with broad tool access and high privileges, yet most organizations skip structured security validation entirely before going live.

The core concern driving SuperClaw’s development is straightforward. Autonomous AI agents reason dynamically over time, make decisions based on accumulated context, and adapt their behavior, breaking the assumptions of every traditional security scanner built for static, deterministic software. SuperClaw exists to test how an agent behaves under adversarial conditions, not just how it is configured.

How SuperClaw Works

SuperClaw performs scenario-driven, behavior-first security evaluations against real agents in controlled environments.

It generates adversarial scenarios using its built-in Bloom scenario engine, executes them against a live or mock agent target, captures full evidence including tool calls and output artifacts, and then scores results against explicit behavior contracts structured specifications that define intent, success criteria, and mitigation guidance for each security property.

The framework supports five core attack techniques out of the box: prompt injection (direct and indirect), encoding obfuscation (Base64, hex, Unicode, typoglycemia), jailbreaks (DAN, role-play, grandmother bypasses), tool-policy bypass via alias confusion, and multi-turn escalation across conversation turns.

Security behaviors under evaluation span critical risks like prompt-injection resistance and sandbox isolation, high-severity concerns such as tool-policy enforcement and cross-session boundary integrity, and medium-severity issues like configuration drift detection and ACP protocol security.

Attack technique	Description	What it tests in agents
prompt-injection	Malicious prompts try to override system or developer instructions and hijack the agent’s decision-making.	Whether the agent can detect and reject injected instructions instead of following untrusted user or content-sourced prompts. genai.
encoding	Uses Base64, hex, Unicode tricks, or typoglycemia-style obfuscation to hide malicious intent inside seemingly innocuous text.	Whether the agent (and its filters) can spot and refuse encoded payloads instead of decoding and executing or forwarding them blindly.
jailbreak	Techniques such as DAN-style prompts, role-play, emotional pressure, or “ignore previous rules” patterns that bypass guardrails.	How resilient the agent is to safety bypass attempts that target its refusal policies and content filters.
tool-bypass	Exploits tool aliases, ambiguous descriptions, or weak policies to get the agent to call powerful tools in unintended ways.	Whether the agent follows strict allow/deny rules for tools, and if it can resist being tricked into dangerous tool usage.
multi-turn	Gradual, multi-step conversations that escalate from benign queries to malicious objectives over several turns.	How the agent manages long-context interactions, remembers earlier instructions, and maintains safety over time instead of only per-message.

Reports are generated in HTML for human review, JSON for automation pipelines, or SARIF format for direct integration with GitHub Code Scanning and CI/CD workflows.

SuperClaw also integrates with CodeOptiX, Superagentic AI’s multi-modal code evaluation engine, enabling combined security and optimization assessments in a single pipeline.

SuperClaw ships with strict built-in guardrails. By default, it operates in local-only mode, blocking any remote targets to prevent accidental or unauthorized use. Connecting to remote agents requires a valid SUPERCLAW_AUTH_TOKEN password obtained from the target system’s administrator.

The project also explicitly requires written authorization before any test is run, and stresses that automated findings are signals to verify manually, not proof of exploitation.

SuperClaw is available now on GitHub under the Apache 2.0 license and is installable via pip install superclaw. It is part of the broader Superagentic AI ecosystem alongside SuperQE and CodeOptiX, targeting development teams that need production-grade agent security before deployment.

Follow us on Google News, LinkedIn, and X for daily cybersecurity updates. Contact us to feature your stories.

The post SuperClaw – Open-Source Framework to Red-Team AI Agents for Security Testing appeared first on Cyber Security News.

Kaynak: Cyber Security News

Yayin Tarihi: 21.02.2026 15:36

Görüntülenme: 1

SuperClaw – Open-Source Framework to Red-Team AI Agents for Security Testing

How SuperClaw Works

Benzer Yazılar

Hackers Leveraging Multiple AI Services to Compromise 600+ FortiGate Devices

Cloudflare Down – 6 Hour of Massive Global Service Outage Cause Customers Unreachable From the Internet

Birden Fazla Bilgisayar Korsanı Grubu, API anahtarını Çalmak ve Kötü Amaçlı Yazılım Dağıtmak için OpenClaw Örneklerinden Yararlanır

Bir yanıt yazın Yanıtı iptal et