AI Security · 8 min read

Feb 24, 2026

The Enterprise Evaluation Framework for OpenClaw

Wenxi Huang

What Is OpenClaw and Why Did 160,000 Developers Drop What They're Doing?

OpenClaw is an open-source, autonomous AI agent that connects large language models to your computer, your accounts, and your tools. It reads your email, writes code, manages files, browses the web, and executes shell commands, all without asking for permission at every step.

Created by Austrian developer Peter Steinberger and released in November 2025, OpenClaw attracted over 150,000 GitHub stars and 2 million visitors in a single week after going viral in January 2026. By mid-February, Steinberger had joined OpenAI to lead their "next generation of personal agents."

The security numbers are a bit less impressive.

Within the first 24 hours of scanning, Bitsight's STRIKE team identified over 40,000 exposed OpenClaw instances. Bitdefender later counted 135,000+ publicly accessible instances, many over unencrypted HTTP. SecurityScorecard found that 63% of observed deployments were vulnerable, with 12,812 instances exploitable via remote code execution.

A security audit identified 512 vulnerabilities, 8 classified as critical. A one-click RCE vulnerability (CVE-2026-25253) with a CVSS score of 8.8 affects versions before 2026.1.29. And a supply chain campaign called ClawHavoc poisoned the skill marketplace with over 1,184 malicious packages.

Meanwhile, Token found that 1 in 5 of their customers had deployed OpenClaw without IT approval. Meta, Google, Microsoft, and Amazon have all banned it from corporate hardware.

Every major security vendor has published an advisory. Not one has published a structured evaluation framework.

This article introduces the CLAW-10 Enterprise Readiness Matrix: a vendor-neutral, 10-dimension scoring system that any security team can use to evaluate OpenClaw (or any autonomous AI agent) against enterprise requirements.

Timeline of OpenClaw's growth from its November 2025 release through major security incidents and corporate bans in early 2026.

The CLAW-10 Enterprise Readiness Matrix

Most security advisories on OpenClaw can be summarized as "don't touch it." Autonomous AI agents may not be ready for enterprise deployment today, but security teams still need tools to track how the risk posture of these agents evolves.

The CLAW-10 Matrix evaluates autonomous AI agents across 10 dimensions that matter to enterprise buyers. Each dimension receives a score from 1 (missing or minimal) to 5 (best-in-class), based on publicly documented evidence. We treat a score of 4 as the baseline enterprise-ready threshold; the scoring summary lists the threshold applied to each dimension, which ranges from 3 to 4.5.

Here's how OpenClaw scores today.

Radar chart of OpenClaw's CLAW-10 scores across 10 security dimensions, all well below the enterprise-ready threshold of 4.

Scoring Summary

#	Dimension	OpenClaw Score	Enterprise Threshold	Verdict
1	Identity & Authentication	1 / 5	4	Critical gap
2	Authorization & Access Control	1 / 5	4.5	Critical gap
3	Audit Logging & Observability	2 / 5	4.5	Major gap
4	Data Isolation & Residency	1 / 5	4	Critical gap
5	Execution Sandboxing	1 / 5	4.5	Critical gap
6	Compliance Certifications	1 / 5	4	Critical gap
7	Supply Chain Security	1 / 5	4	Critical gap
8	Network Exposure & Attack Surface	2 / 5	4	Major gap
9	Privilege Model	1 / 5	4	Critical gap
10	Vendor Support & SLAs	1 / 5	3	Major gap
	Composite Score	1.2 / 5	4.0	Not enterprise-ready
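The table does not state how the composite is derived. As a minimal sketch, assuming it is the unweighted mean of the ten dimension scores, the arithmetic below reproduces the 1.2 figure. The dictionary and function names are illustrative only.

```python
from statistics import mean

# Scores copied from the Scoring Summary table (dimension -> OpenClaw score).
SCORES = {
    "Identity & Authentication": 1,
    "Authorization & Access Control": 1,
    "Audit Logging & Observability": 2,
    "Data Isolation & Residency": 1,
    "Execution Sandboxing": 1,
    "Compliance Certifications": 1,
    "Supply Chain Security": 1,
    "Network Exposure & Attack Surface": 2,
    "Privilege Model": 1,
    "Vendor Support & SLAs": 1,
}


def composite(scores):
    """Unweighted mean of all dimension scores, rounded to one decimal.

    Assumption: the article's composite is a plain average.
    """
    return round(mean(scores.values()), 1)


print(composite(SCORES))  # 12 points over 10 dimensions -> 1.2
```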

Dimension 1: Identity & Authentication (Score: 1/5)

OpenClaw has no built-in SSO, SAML, or OIDC integration. There's no multi-factor authentication. Users authenticate with personal credentials that the agent then inherits directly.

Microsoft's security team explicitly states OpenClaw requires "dedicated non-privileged credentials" for evaluation, implicitly acknowledging that the default credential model is inadequate.

What enterprise-grade looks like: SSO integration via OIDC/SAML, MFA enforcement, session management with token rotation, and identity federation with existing directory services.

Dimension 2: Authorization & Access Control (Score: 1/5)

There's no role-based access control (RBAC). No attribute-based access control (ABAC). No permission boundaries whatsoever.

When you grant OpenClaw your credentials, it inherits all your permissions with none of your judgment. CrowdStrike describes the result: "A compromised agent can use its legitimate tool access to move laterally across systems; shell access becomes the attacker's shell, API keys become the attacker's API keys."

What enterprise-grade looks like: RBAC with least-privilege defaults, per-task permission scoping, human-in-the-loop approval for sensitive operations, and permission inheritance controls.
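To make "least-privilege defaults" and "human-in-the-loop approval" concrete, here is a minimal sketch of a deny-by-default authorization check. The action names, the `Role` type, and the `authorize` function are hypothetical and are not part of OpenClaw or any specific product.

```python
from dataclasses import dataclass, field

# Hypothetical action names that should always require a human sign-off.
SENSITIVE_ACTIONS = frozenset({"shell.exec", "email.send"})


@dataclass(frozen=True)
class Role:
    name: str
    # Empty by default: a new role can do nothing until actions are granted.
    allowed: frozenset = field(default_factory=frozenset)


def authorize(role, action, approved_by_human=False):
    """Return True only if the role grants the action.

    Sensitive actions additionally require explicit human approval.
    """
    if action not in role.allowed:
        return False  # deny by default
    if action in SENSITIVE_ACTIONS and not approved_by_human:
        return False  # granted, but still gated on a human
    return True


reader = Role("reader", frozenset({"file.read"}))
print(authorize(reader, "file.read"))   # granted and not sensitive
print(authorize(reader, "shell.exec"))  # never granted to this role
```

The point of the design is that the agent never inherits a user's full permission set; each role starts empty and every grant is explicit.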

Dimension 3: Audit Logging & Observability (Score: 2/5)

OpenClaw provides basic conversation logging, but it falls short of enterprise audit requirements. There's no centralized, tamper-evident audit trail. No integration with SIEM or SOAR platforms. No structured event logging that maps to compliance frameworks.

The score of 2 reflects that some activity is recorded locally, but the logs lack the immutability, completeness, and exportability that compliance teams require.

What enterprise-grade looks like: Structured, tamper-evident audit logs with SIEM integration, real-time alerting on anomalous agent behavior, and compliance-mapped event taxonomy.
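One common way to make a log tamper-evident is a hash chain, where each entry commits to the hash of the previous one. The sketch below illustrates that idea under simple assumptions (JSON-serializable events, an in-memory list). It is not how OpenClaw logs today, and the event fields are made up for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, event):
    # Canonical JSON (sorted keys) so the same event always hashes the same.
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log, event):
    """Append an event whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _digest(prev, event)})


def verify(log):
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

A chain like this only detects tampering if the latest hash is also stored somewhere the agent cannot write, such as a SIEM or a write-once store. Otherwise an attacker could rewrite the entire chain.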

Dimension 4: Data Isolation & Residency (Score: 1/5)

OpenClaw stores API keys and session data in plaintext files. Bitdefender documented active exploitation campaigns targeting ~/.clawdbot/.env containing plain-text API keys for OpenAI, Anthropic, and AWS.

There's no tenant isolation, no data residency controls, and no encryption at rest for sensitive configuration. Kaspersky confirmed thousands of exposed control panels leaking API keys and private messages.

What enterprise-grade looks like: Encrypted credential storage, tenant-level data isolation, configurable data residency, and secrets management integration (HashiCorp Vault, AWS Secrets Manager).
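As a rough illustration of how a security team might flag plaintext credentials in `.env`-style files, the sketch below lists variable names that look like secrets and carry a literal value. The name pattern and the sample variables are assumptions chosen for illustration; a real scanner would use a maintained ruleset.

```python
import re

# Assumption: names ending in these words usually hold credentials.
SECRET_NAME = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD)$", re.IGNORECASE)


def find_plaintext_secrets(env_text):
    """Return names in .env-style text that appear to hold a literal secret."""
    hits = []
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        name, value = line.split("=", 1)
        name = name.strip()
        value = value.strip().strip("'\"")
        if SECRET_NAME.search(name) and value:
            hits.append(name)
    return hits


sample = "OPENAI_API_KEY=sk-example\nLOG_LEVEL=debug\n"
print(find_plaintext_secrets(sample))
```

Any hit is a candidate for migration into a secrets manager, with the file holding only a reference rather than the value itself.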

Dimension 5: Execution Sandboxing (Score: 1/5)

OpenClaw executes with the full privileges of the host user. There's no container isolation, no syscall filtering, no filesystem restrictions.

Microsoft calls it "untrusted code execution with persistent credentials" and recommends deployment only in a "fully isolated VM." Sophos concludes it "can only be run 'safely' in a disposable sandbox with no access to sensitive data."

What enterprise-grade looks like: Containerized or VM-isolated execution, syscall filtering (seccomp/AppArmor), ephemeral runtime environments, and resource quotas.
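As one possible shape for containerized, ephemeral execution, the sketch below builds a `docker run` argument list with restrictive options. The helper name, image name, and the specific limits are assumptions for illustration. The code only constructs the command and does not run it.

```python
def sandboxed_docker_argv(image, command):
    """Build a `docker run` argv with restrictive, ephemeral settings."""
    return [
        "docker", "run",
        "--rm",                                 # ephemeral: remove container on exit
        "--network", "none",                    # no network access at all
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--memory", "512m",                     # resource quota: memory
        "--pids-limit", "128",                  # resource quota: process count
        "--user", "65534:65534",                # run as an unprivileged uid:gid
        image,
        *command,
    ]


# Hypothetical image name, for illustration only.
argv = sandboxed_docker_argv("example/agent-runtime:latest", ["python", "task.py"])
print(" ".join(argv))
```

An agent that must call an LLM API cannot run with `--network none`. In practice that option would be replaced by a network with an egress allowlist, which is outside the scope of this sketch.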

Dimension 6: Compliance Certifications (Score: 1/5)

OpenClaw holds no SOC 2, HIPAA, GDPR, ISO 27001, or FedRAMP certifications. There's no compliance documentation, no data processing agreements, and no published security practices.

For regulated industries (finance, healthcare, government), this is a non-starter without significant compensating controls.

What enterprise-grade looks like: SOC 2 Type II certification at minimum, with industry-specific certifications (HIPAA BAA, GDPR DPA) and published security whitepapers.

Dimension 7: Supply Chain Security (Score: 1/5)

The ClawHavoc campaign revealed the depth of the problem. The skill marketplace required only a GitHub account at least one week old: no static analysis, no code review, no signing. A single author published 677 malicious packages.

Cisco's Skill Scanner reported 9 security findings, including 2 critical and 5 high-severity issues. One skill silently executed curl commands that sent data to an external server without the user's knowledge.

What enterprise-grade looks like: Signed and verified extensions, automated security scanning, curated and audited marketplace, and supply chain attestation (SLSA/SBOM).
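Full signing and attestation need a PKI, but the simplest building block is refusing to load an extension whose bytes do not match a pinned digest. The sketch below shows that check. The function name is hypothetical, and digest pinning proves integrity only, not who published the package.

```python
import hashlib
import hmac


def verify_skill(package_bytes, pinned_sha256):
    """Return True only if the package matches the pinned SHA-256 hex digest."""
    actual = hashlib.sha256(package_bytes).hexdigest()
    # Constant-time comparison; digests are hex strings (ASCII).
    return hmac.compare_digest(actual, pinned_sha256.lower())


package = b"print('hello from a skill')"
pin = hashlib.sha256(package).hexdigest()  # recorded at review time
print(verify_skill(package, pin))               # unchanged package
print(verify_skill(package + b"# extra", pin))  # modified after review
```

A reviewed allowlist of digests would have blocked silently updated packages, although it does nothing against a package that was malicious when first reviewed.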

Dimension 8: Network Exposure & Attack Surface (Score: 2/5)

OpenClaw listens on port 18789 by default. While it can be configured for localhost-only binding, the default configuration has led to massive exposure. The CVE-2026-25253 vulnerability works even on localhost-bound instances by pivoting through the victim's browser.

The score of 2 reflects that network binding is configurable, but the defaults are dangerous and the attack surface through browser-based pivots remains.

What enterprise-grade looks like: Secure defaults (localhost-only with authentication), TLS enforcement, API gateway integration, and network segmentation support.
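A configuration audit can start with something as small as checking whether a service's bind address is loopback-only. The helper below is a hypothetical sketch using Python's standard `ipaddress` module. Treating the literal name `localhost` as loopback is an assumption that depends on the host's resolver configuration.

```python
import ipaddress


def is_loopback_only(bind_host):
    """Return True if the bind address is reachable only from the local machine."""
    if bind_host == "localhost":
        return True  # assumption: resolves to a loopback address on this host
    try:
        return ipaddress.ip_address(bind_host).is_loopback
    except ValueError:
        return False  # hostnames and malformed values are treated as exposed


for host in ("127.0.0.1", "::1", "0.0.0.0"):
    print(host, is_loopback_only(host))
```

As CVE-2026-25253 shows, passing this check is necessary but not sufficient, because a browser on the same machine can still reach a loopback-bound service.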

Dimension 9: Privilege Model (Score: 1/5)

OpenClaw operates on an ambient authority model. The agent inherits the full permissions of the user who launched it. There's no concept of least privilege, scoped tokens, or task-specific authorization.

Sophos calls this the "lethal trifecta": access to private data, ability to communicate externally, and ability to process untrusted content. Combined with an LLM that can't distinguish between user instructions and injected prompts, this model is architecturally incompatible with enterprise security requirements.

What enterprise-grade looks like: Capability-based security model, scoped tokens per task, principle of least privilege enforced at the platform level, and cryptographic permission boundaries.
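To illustrate what "scoped tokens per task" could mean in practice, the sketch below issues a short-lived, HMAC-signed token that names the scopes a single task may use, then checks it before an action runs. The token format, function names, and scope strings are invented for this example. A production system would more likely use an established standard such as signed JWTs.

```python
import base64
import hashlib
import hmac
import json
import time


def issue_token(secret, task_id, scopes, ttl_seconds, now=None):
    """Create a signed token limited to one task, a scope set, and a lifetime."""
    now = time.time() if now is None else now
    claims = {"task": task_id, "scopes": sorted(scopes), "exp": now + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    sig = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode("ascii") + "." + sig


def check_token(secret, token, required_scope, now=None):
    """Return True only for an untampered, unexpired token holding the scope."""
    now = time.time() if now is None else now
    try:
        encoded, sig = token.split(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode("utf-8"))
    except ValueError:
        return False  # malformed token
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return False  # signature mismatch: forged or altered
    claims = json.loads(payload)
    return required_scope in claims["scopes"] and now < claims["exp"]
```

Unlike ambient authority, a token like this expires on its own and cannot be stretched to cover actions outside the task it was issued for.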

Dimension 10: Vendor Support & SLAs (Score: 1/5)

OpenClaw is a community project. The original creator has left to join OpenAI. There's no commercial entity providing support, no SLAs, no incident response team, and no guaranteed maintenance schedule.

For enterprise deployments, vendor accountability is non-negotiable. When a security incident occurs at 2 AM, you need someone to call.

What enterprise-grade looks like: Dedicated support with SLAs, published incident response procedures, regular security updates with defined cadence, and a commercial entity that can sign contracts.

CLAW-10 gap analysis showing OpenClaw's current maturity scores against enterprise-required thresholds across all 10 dimensions.

How to Use the CLAW-10 Matrix

The CLAW-10 Matrix isn't a pass/fail test. It's a structured conversation tool for security teams evaluating any autonomous AI agent.

Step 1: Weight the dimensions. Not every dimension carries equal weight for every organization. A healthcare company will weight Compliance Certifications and Data Isolation higher. A startup may accept more risk on Vendor Support if the open-source community is strong.

Step 2: Set your thresholds. Define the minimum acceptable score for each dimension. Most enterprises will require a 4 or higher on Identity, Authorization, and Execution Sandboxing before approving any deployment.

Step 3: Identify compensating controls. A low score doesn't automatically mean "reject." It means "what compensating controls would bring this dimension to an acceptable level, and what does that cost?" Microsoft's guidance to deploy in a fully isolated VM with dedicated credentials is a compensating control, but it adds operational complexity and cost.

Step 4: Calculate total cost of readiness. Sum the cost of the compensating controls needed to close every gap. If the total exceeds the value of the use case, the evaluation is complete and the answer is no.
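The four steps can be sketched as three small calculations: a weighted composite, a per-dimension gap against your thresholds, and a cost comparison. The scores and thresholds below come from the scoring summary. The weights, cost figures, and function names are hypothetical placeholders that each organization would replace with its own.

```python
def weighted_composite(scores, weights):
    """Step 1: weighted average of dimension scores."""
    total_weight = sum(weights[d] for d in scores)
    return sum(scores[d] * weights[d] for d in scores) / total_weight


def gaps(scores, thresholds):
    """Step 2: points missing per dimension that falls below its threshold."""
    return {d: thresholds[d] - s for d, s in scores.items() if s < thresholds[d]}


def worth_pursuing(control_costs, use_case_value):
    """Steps 3 and 4: compare total compensating-control cost to use-case value."""
    return sum(control_costs.values()) <= use_case_value


scores = {"Identity": 1, "Authorization": 1, "Audit Logging": 2}
thresholds = {"Identity": 4, "Authorization": 4.5, "Audit Logging": 4.5}
weights = {"Identity": 2, "Authorization": 1, "Audit Logging": 1}  # hypothetical
costs = {"Identity": 40_000, "Authorization": 60_000, "Audit Logging": 25_000}  # hypothetical

print(weighted_composite(scores, weights))
print(gaps(scores, thresholds))
print(worth_pursuing(costs, use_case_value=100_000))
```

Only three dimensions are shown to keep the example short. The same functions apply unchanged to all ten.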

For OpenClaw specifically, the composite score of 1.2 out of 5 means significant compensating controls across nearly every dimension. Unless your organization has deep security engineering capacity and a specific use case that justifies the investment, OpenClaw isn't enterprise-ready in its current form.

Flowchart for the CLAW-10 evaluation process, from initial scoring through compensating controls to a deploy, defer, or reject decision.

What Enterprise-Grade AI Agents Actually Require

The CLAW-10 gaps aren't unique to OpenClaw. They reflect a broader industry challenge: most autonomous AI agent frameworks were built for individual developers, not enterprise security teams.

The architectural pattern that closes these gaps is well understood:

Sandboxed execution that isolates agent actions from the host environment
Role-based access control that scopes agent permissions to the minimum required for each task
Immutable audit logging that records every action for compliance and incident response
Data isolation that respects existing access controls and residency requirements
Verified supply chains with signed extensions and automated security scanning

These aren't aspirational features. They're baseline requirements for any tool that accesses enterprise data and executes actions autonomously.

Onyx, for example, is building sandboxed AI agents with these enterprise controls: SOC 2 Type II certified, self-hostable, with SSO/RBAC and permission-aware data access built in from the start.

If you're evaluating an AI agent platform, start with the CLAW-10 dimensions and require a score of 4 or higher on Identity, Authorization, and Execution Sandboxing before any deployment reaches production data.

This article is the first of a 4-part series on enterprise AI agent security. It was last updated on February 24, 2026.