What happens when OpenClaw agents attack each other

1 points by udit_50 3 hours ago

We ran a live adversarial security test between two autonomous AI agents built on OpenClaw.

One agent acted as a red team attacker. One agent acted as a standard defensive agent.

No humans were involved once the session started. The agents communicated directly over webhooks with real credentials and tooling access.

The goal was to test three risk dimensions that tend to break autonomous systems in practice: access, exposure, and agency.

The attacker first attempted classic social engineering. It offered a “helpful” security pipeline that hid a remote code execution payload and requested credentials. The defending agent correctly identified the intent and blocked execution.

The attacker then pivoted to an indirect attack. Instead of asking the agent to run code, it asked the agent to review a JSON document with hidden shell expansion variables embedded in metadata. This payload was delivered successfully and is still under analysis.

The main takeaway is that direct attacks are relatively easy to defend against. Indirect execution paths through documents, templates, and memory are much harder.

This report is not a claim of safety. It is an observability exercise intended to surface real failure modes in agent-to-agent interaction, which we expect to become common as autonomous systems are deployed more widely.

Full report here: https://gobrane.com/observing-adversarial-ai-lessons-from-a-live-openclaw-agent-security-audit/

Happy to answer technical questions about the setup, methodology, or findings.