AI Security Agent

Your code has secrets.
Helios finds them.

The first security agent that thinks like a hacker. It autonomously finds, exploits, and fixes vulnerabilities, with two AI agents battling over your code in real time.


helios — arena-demo

Arena

Adversarial Agents Arena

Two Claude AI instances with opposing objectives battle over your codebase in real time. One attacks. One defends. You watch.

Red Agent // Attacker

[RED] Scanning target for attack vectors...
[RED] Found: SQL injection in /api/login
[RED] Severity: CRITICAL
[RED] Crafting auth bypass payload...
[RED] Executing: ' OR 1=1 --
[RED] Exploit successful. Admin access gained.
[RED] Scanning for XSS in /search...
[RED] Found: Reflected XSS via q= param
[RED] SQLi blocked. Adapting strategy...
[RED] Chaining XSS + SSRF for lateral move
[RED] Generating custom exploit script...
[RED] Swarm agent 2 targeting /api/upload
[RED] Path traversal: ../../etc/passwd
[RED] File read successful. 3 chains complete.
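The login bypass in the Red Agent log, and the parameterized-query patch that later blocks it, can be illustrated with a minimal sketch. It uses an in-memory SQLite database; the `users` table, the credentials, and the `login_vulnerable` / `login_patched` functions are hypothetical stand-ins, not Helios code or the application under test.

```python
import sqlite3

PAYLOAD = "' OR 1=1 --"  # the payload shown in the Red Agent log


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES (?, ?)", ("admin", "s3cret"))
    return conn


def login_vulnerable(conn, name, password):
    """Builds SQL by string interpolation, so input can rewrite the query."""
    sql = (
        "SELECT name FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(sql).fetchone() is not None


def login_patched(conn, name, password):
    """Uses bound parameters, so input is always treated as data."""
    sql = "SELECT name FROM users WHERE name = ? AND password = ?"
    return conn.execute(sql, (name, password)).fetchone() is not None
```

Passing `PAYLOAD` as the user name turns the vulnerable query's `WHERE` clause into `name = '' OR 1=1` and comments out the password check, so the login succeeds without credentials. The patched version compares the payload as a literal string and rejects it.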

Blue Agent // Defender

[BLUE] Monitoring container logs...
[BLUE] Anomalous query on /api/login
[BLUE] Threat: SQL injection attempt
[BLUE] Deploying parameterized query patch
[BLUE] Patch applied. SQLi vector closed.
[BLUE] Detected script injection in /search
[BLUE] Applying output encoding...
[BLUE] WAF rule deployed: block XSS payloads
[BLUE] XSS vector neutralized.
[BLUE] Suspicious file read on /api/upload
[BLUE] Patching path traversal...
[BLUE] Input sanitization deployed
[BLUE] Hardening CORS and CSP headers
[BLUE] Container hardened. Monitoring...
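Two of the defenses named in the Blue Agent log, output encoding for the reflected XSS and a path check for the traversal, can be sketched with the Python standard library. The function names `safe_join` and `render_search_heading` are hypothetical; this shows the general technique, not the patches Helios generates.

```python
import html
import os


def safe_join(base_dir: str, user_path: str) -> str:
    """Resolve user_path under base_dir, rejecting anything that escapes it."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_path))
    # After symlinks and ".." are resolved, the target must still sit under base.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return target


def render_search_heading(query: str) -> str:
    """Encode the q= parameter before reflecting it into HTML."""
    return f"<h1>Results for {html.escape(query)}</h1>"
```

With these in place, a request for `../../etc/passwd` raises `ValueError` instead of reading the file, and a `<script>` tag in the search query is rendered as inert text.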

Live Battle Score -- Inkscape Web Test

Red Agent: 14 Blue Agent: 8

Capabilities

What Helios does

Not a wrapper around an API call. Deep AI-powered security analysis with real exploitation and automated remediation.

Autonomous Scanner

Detects SQL injection, XSS, command injection, SSRF, path traversal, hardcoded secrets, vulnerable dependencies, and misconfigurations.
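As one illustration of the hardcoded-secrets check, here is a minimal regex-based sketch. The rule names and patterns are assumptions chosen for the example; Helios describes its analysis as AI-driven, and production scanners use much larger rule sets.

```python
import re

# Illustrative patterns only (hypothetical rule set).
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "hardcoded_password": re.compile(
        r"(?i)\b(?:password|passwd|secret)\s*=\s*['\"][^'\"]{4,}['\"]"
    ),
}


def find_secrets(source: str):
    """Return (line_number, rule_name) for each line matching a pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings
```

Pattern matching of this kind is fast but context-blind, which is the gap the contextual analysis described on this page is meant to close.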

Sandbox Exploitation

Docker-based sandbox runs your app in isolation. Real exploits execute against live containers with captured proof-of-concept output.
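A sandbox of this kind typically starts the target container with reduced privileges and resource limits. The sketch below only builds a `docker run` command line; the function name, the default network name `helios-sandbox`, and the limits are assumptions, not Helios's actual configuration.

```python
def sandbox_run_command(image, network="helios-sandbox",
                        memory="512m", pids_limit=256):
    """Build (but do not execute) a `docker run` command with tightened defaults."""
    return [
        "docker", "run",
        "--rm",                                 # remove the container on exit
        "--detach",                             # keep it running for the exploit phase
        "--network", network,                   # hypothetical pre-created network
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block setuid privilege escalation
        "--memory", memory,                     # cap memory usage
        "--pids-limit", str(pids_limit),        # cap the number of processes
        image,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Creating the network beforehand with `docker network create --internal helios-sandbox` would keep exploit traffic from reaching anything outside the sandbox.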

Supply Chain Analysis

Maps dependency relationships across multiple repos. Identifies transitive attack vectors -- vulnerabilities in package A that become exploitable through package B.
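Finding a transitive attack vector comes down to a graph search: given a dependency graph, find the chain that links a project to a vulnerable package it never imports directly. A minimal breadth-first sketch follows; the graph shape and the package names in the usage note are hypothetical.

```python
from collections import deque


def paths_to_vulnerable(graph, root, vulnerable):
    """Breadth-first search for the shortest dependency chain from root
    to each vulnerable package reachable from it."""
    chains = {}
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in vulnerable:
            chains[node] = path
        for dep in graph.get(node, []):
            if dep not in seen:  # also guards against dependency cycles
                seen.add(dep)
                queue.append(path + [dep])
    return chains
```

For a graph such as `{"web-app": ["cms"], "cms": ["image-lib"]}` with `image-lib` marked vulnerable, the function returns the chain `web-app`, `cms`, `image-lib`, showing which intermediate package exposes the application.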

AI-Generated Exploits

Not predefined payloads. The Red Agent analyzes your specific codebase and crafts bespoke attack scripts that adapt when defenses change.

Auto-Fix and Patch

Generates code patches for every finding. Creates GitLab issues with severity labels and opens merge requests with verified fixes.

GitLab Native

Clone repos via GitLab API. Create issues, open fix branches, submit merge requests. Drop-in CI/CD pipeline template for any project.
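The issue and merge request flow can be sketched as below. It assumes `project` behaves like a python-gitlab `Project` object, whose `issues`, `branches`, and `mergerequests` managers each accept a dict in `create()`. The `finding` fields, the `severity::` label scheme, and the `helios/fix-` branch prefix are hypothetical, and committing the patch to the branch is omitted.

```python
def issue_payload(finding):
    """Map a finding (hypothetical dict shape) to GitLab issue fields."""
    return {
        "title": f"[{finding['severity'].upper()}] {finding['title']}",
        "description": finding["details"],
        "labels": [f"severity::{finding['severity']}", "security"],
    }


def report_and_open_fix(project, finding, target_branch="main"):
    """Create an issue, a fix branch, and a merge request for one finding."""
    issue = project.issues.create(issue_payload(finding))
    branch = f"helios/fix-{issue.iid}"
    project.branches.create({"branch": branch, "ref": target_branch})
    # (Committing the generated patch to `branch` would happen here.)
    merge_request = project.mergerequests.create({
        "source_branch": branch,
        "target_branch": target_branch,
        "title": f"Fix: {finding['title']}",
        "description": f"Closes #{issue.iid}",
    })
    return issue, merge_request
```

Because the function depends only on the `create()` calls, it can be exercised against a stub object before being pointed at a real project obtained with `gitlab.Gitlab(...).projects.get(...)`.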

Process

How it works

Claude Agent SDK with custom MCP tools. Docker sandbox for safe exploitation. Concurrent adversarial agents via anyio task groups.

Scan

AI analyzes your codebase with static analysis, secret detection, dependency checking, and configuration auditing.

Exploit

Docker sandbox spins up your app. AI generates custom exploit scripts and executes them against the live container.

Battle

Red Agent attacks while Blue Agent defends in real time. Concurrent agents compete with live scoring.
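The concurrency pattern behind the battle can be sketched with two coroutines that share state and a score. Helios is described as using anyio task groups; this sketch substitutes the standard library's `asyncio.gather` to stay dependency-free, and the agents are trivial stand-ins with no AI involved. All names and the scoring rules are assumptions.

```python
import asyncio


async def red_agent(vectors, patched, score, events, rounds=2):
    """Try each attack vector; an attempt succeeds only while unpatched."""
    for _ in range(rounds):
        for vector in vectors:
            if vector in patched:
                score["blocked"] += 1
            else:
                score["red"] += 1
                await events.put(vector)  # the exploit shows up in the "logs"
            await asyncio.sleep(0)        # yield so the defender can react
    await events.put(None)                # signal the end of the battle


async def blue_agent(patched, score, events):
    """Patch every vector seen in the event stream, scoring each once."""
    while (vector := await events.get()) is not None:
        if vector not in patched:
            patched.add(vector)
            score["blue"] += 1


async def run_arena(vectors):
    """Run both agents concurrently and return the final score and patches."""
    patched, events = set(), asyncio.Queue()
    score = {"red": 0, "blue": 0, "blocked": 0}
    await asyncio.gather(
        red_agent(vectors, patched, score, events),
        blue_agent(patched, score, events),
    )
    return score, patched
```

Calling `asyncio.run(run_arena(["sqli", "xss", "path-traversal"]))` runs one battle. The queue plays the role of the container logs: the defender only learns about a vector after the attacker has used it, so repeat attempts are the ones that get blocked.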

Fix

Auto-generates patches, creates GitLab issues with severity labels, and opens merge requests with verified fixes.

Architecture

Under the hood

Built on Claude Agent SDK with Model Context Protocol tools. Each agent gets a tailored security toolset.

Claude Agent SDK
  |
  +-- Red Agent (attacker)
  |     +-- Scanner (static analysis, secrets, deps, config audit)
  |     +-- Exploiter (SQLi, XSS, SSRF, auth bypass, path traversal)
  |     +-- Custom Exploit Generator (AI writes bespoke attack scripts)
  |     +-- Supply Chain Analyzer (cross-repo transitive vectors)
  |
  +-- Blue Agent (defender)
  |     +-- Log Monitor (real-time container log analysis)
  |     +-- Patch Deployer (live patching of running containers)
  |     +-- WAF Rule Engine (runtime input validation)
  |     +-- Patch Verifier (confirms fixes block attack vectors)
  |
  +-- Arena Orchestrator
        +-- Docker Sandbox (isolated exploitation environment)
        +-- Score Tracker (Red exploits vs Blue patches)
        +-- Rich Terminal Display (split-panel live output)
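The split of tools between the two agents in the tree above can be written down as plain data. This is a sketch only: the `AgentSpec` class and the tool identifiers are hypothetical names derived from the tree, and how the tools are actually registered with the Claude Agent SDK as MCP tools is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical description of one agent and the tools it may call."""
    name: str
    objective: str
    tools: tuple


RED = AgentSpec(
    name="red",
    objective="find and exploit vulnerabilities",
    tools=("scanner", "exploiter", "custom_exploit_generator",
           "supply_chain_analyzer"),
)

BLUE = AgentSpec(
    name="blue",
    objective="detect attacks and patch the running container",
    tools=("log_monitor", "patch_deployer", "waf_rule_engine",
           "patch_verifier"),
)


def is_allowed(agent: AgentSpec, tool: str) -> bool:
    """An agent may only call tools from its own toolset."""
    return tool in agent.tools
```

Keeping the toolsets disjoint means the attacker cannot deploy patches and the defender cannot launch exploits, which keeps the two objectives cleanly opposed.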

Comparison

Why Helios

Most security tools scan for patterns. Helios thinks like a hacker.

Capability	Traditional Scanners	Helios
Detection	Pattern matching	AI-powered contextual analysis
Verification	"You might be vulnerable"	"I just exploited this. Here's proof."
Exploitation	None	Sandbox with proof-of-concept
Defense Testing	None	AI vs AI adversarial battle
Supply Chain	Single repo	Cross-repository analysis
Custom Exploits	Predefined payloads	AI generates bespoke exploits
Cost	$15-50K / pentest	Your existing Claude subscription

Results

Real-world testing

Tested against production GitLab-hosted projects. Every finding is verified with a proof of concept in the Docker sandbox.

Vulnerabilities: found in Inkscape Web

Critical: exploited in sandbox

Attack Vectors: transitive supply chain

Verified: % with PoC output

Test Target: Inkscape Web (GitLab-hosted) -- Supply chain analysis across the Inkscape Web and django-cms dependency trees. Arena swarm mode: Red Agent 14 vs Blue Agent 8.

Stack

Built with

Built for the GitLab AI Hackathon with Claude Agent SDK and the Model Context Protocol.

Claude Agent SDK

AI reasoning and MCP tool orchestration

Docker

Sandboxed exploitation environment

python-gitlab

GitLab API integration

Rich

Terminal UI and arena display

Click

CLI framework

anyio

Async concurrency for parallel agents
