Prove Infrastructure Resilience — Without Production Risk
Prove your system's availability ceiling mathematically — without touching production.
Existing chaos tools inject real faults into your infrastructure.
FaultRay uses pure mathematical simulation.
FaultRay addresses the real operational gaps that keep engineering teams up at night.
Manual, person-dependent infrastructure management
FaultRay automates topology scanning and resilience analysis — no tribal knowledge required.
IaC (Terraform) not yet adopted
FaultRay auto-scans your existing AWS / GCP / Azure infrastructure without requiring any IaC setup.
Post-IPO availability & audit requirements
Prove DORA / SOC 2 compliance automatically with machine-generated evidence trails.
Ad-hoc incident response with no runbooks
FaultRay auto-generates runbooks and remediation scripts from simulation findings.
AI governance readiness not started
Automated METI / ISO 42001 compliance checks with audit-ready reports.
Every feature maps to a concrete business outcome — not a checklist.
Network, process, resource, dependency, latency, blast radius, SLA contract validation, and dozens more — powered by Monte Carlo, Markov chains, and queuing theory.
From single-node failures to cascading multi-region outages. Every scenario is generated from your topology YAML — over 2,000 unique scenarios for a typical 10-component topology.
The only tool that decomposes your availability ceiling into five independent layers: Hardware, Software, Theoretical, Operational, and External SLA.
Claude-driven root cause analysis and actionable improvement recommendations ranked by impact and cost.
Generate audit-ready Digital Operational Resilience Act reports with evidence trails and risk assessments.
Automatically incorporate CVE data and NVD feeds to simulate vulnerability-triggered cascading failures.
Live performance metrics, trace correlation, and anomaly detection — integrated directly into your resilience dashboard with 35+ monitoring views.
Automated compliance checks against Japan's METI AI guidelines and ISO 42001 requirements. Prove responsible AI deployment with audit-ready evidence.
Auto-generate runbooks, remediation scripts, and Terraform patches from simulation findings. Reduce mean time to repair from hours to minutes.
Full-featured web dashboard with topology editor, scenario explorer, N-layer drill-down, heatmap, DORA reports, and executive summaries.
The only chaos engineering tool that models AI agents (LLM endpoints, tool services, orchestrators) as first-class failure nodes. Simulate how infrastructure outages cascade into hallucinations before they hit production.
Trace how infrastructure failures (database down, cache miss) cascade into agent hallucinations. Expose silent degradation that looks healthy but produces wrong results.
Three pillars for agent resilience: simulate chaos scenarios, assess deployment risk with blast-radius analysis, and generate monitoring rules automatically.
Model AI Agents, LLM Endpoints, Tool Services, and Agent Orchestrators as first-class nodes in your dependency graph alongside traditional infrastructure.
Hallucination, context overflow, LLM rate limiting, token exhaustion, tool failure, agent loops, prompt injection, confidence miscalibration, CoT collapse, and output amplification.
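The cascade and blast-radius ideas above can be illustrated with a small graph traversal. This is only a sketch under stated assumptions: the component names and the `DEPENDS_ON` map are hypothetical, and the breadth-first walk is a generic technique, not a description of FaultRay's internal engines.

```python
from collections import deque

# Hypothetical dependency map: each component lists what it depends on.
DEPENDS_ON = {
    "support-agent": ["llm-endpoint", "search-tool"],
    "search-tool": ["cache", "database"],
    "llm-endpoint": [],
    "cache": [],
    "database": [],
}


def blast_radius(depends_on, failed):
    """Return every component that transitively depends on `failed`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    affected, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# A database outage reaches the agent through the tool service it relies on.
print(sorted(blast_radius(DEPENDS_ON, "database")))
```

In this toy graph a database failure never touches the agent directly, yet the agent still ends up in the affected set, which is the kind of silent degradation path the features above describe.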
$ faultray agent assess infra.yaml
Agent Risk Assessment
support-agent Risk: 4.2/10 (MEDIUM) Blast radius: 3 components
Recommendations: Add fallback LLM, enable hallucination circuit breaker
$ faultray agent scenarios infra.yaml
Generated 12 agent-specific chaos scenarios
$ faultray agent monitor infra.yaml
14 monitoring rules generated (context_window, hallucination_rate, ...)
FaultRay takes a fundamentally different approach
| | FaultRay (Recommended) | Gremlin | Steadybit | AWS FIS |
|---|---|---|---|---|
| Approach | Mathematical Simulation | Real Fault Injection | Real Fault Injection | Real Fault Injection |
| Production Risk | Zero | High | Medium | High |
| Setup Time | 5 minutes | Days | Hours | Hours |
| Scenarios | 2,000+ auto-generated | Manual configuration | Template-based | AWS services only |
| Availability Proof | N-Layer Mathematical | No | No | No |
| AI Agent Modeling | 10-mode taxonomy | No | No | No |
| Starting Cost | Free / OSS | $10,000+/yr | $5,000+/yr | Pay per use |
The only tool that decomposes your availability ceiling into five independent constraint layers
Hardware: constrained by physical components such as disk MTBF, network gear, power systems, and failover promotion time
Software: your actual ceiling, set by deploy pipelines, config errors, dependency failures, and human error rate
Theoretical: the irreducible physical noise floor from network packet loss, GC pauses, and kernel scheduling jitter
Operational: incident response time, on-call coverage, runbook completeness, and automation level
External SLA: the hard ceiling imposed by third-party service availability (AWS, GCP, Stripe, etc.)
A_system = min(A_hw, A_sw, A_theoretical, A_ops, A_external)
Most teams chase hardware nines while their software layer caps availability at 4 nines. FaultRay reveals exactly where your bottleneck lives so you invest in the right layer.
The N-Layer model is extensible: add Geographic, Economic, or custom domain-specific constraint layers to match your organization's unique availability boundaries.
faultray analyze --topology infra.yaml --output n-layer
When a simulation predicts availability exceeding any layer ceiling, FaultRay flags it and identifies the binding constraint layer as the target for infrastructure improvement.
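As a rough illustration of the min-of-layers formula and the ceiling check described above, here is a minimal Python sketch. The layer values are made-up examples, and `binding_layer` and `check_prediction` are hypothetical helper names, not part of the FaultRay CLI or API.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def binding_layer(ceilings):
    """Return (layer_name, ceiling) for the lowest ceiling, i.e. A_system."""
    name = min(ceilings, key=ceilings.get)
    return name, ceilings[name]


def check_prediction(predicted, ceilings):
    """Return the binding layer name if `predicted` exceeds A_system, else None."""
    name, ceiling = binding_layer(ceilings)
    return name if predicted > ceiling else None


# Illustrative ceilings only; these numbers are not FaultRay output.
ceilings = {
    "hardware": 0.999999,
    "software": 0.9999,
    "theoretical": 0.9999999,
    "operational": 0.99995,
    "external_sla": 0.99999,
}

name, a_system = binding_layer(ceilings)
downtime = (1 - a_system) * MINUTES_PER_YEAR
print(f"A_system = {a_system} (bound by {name}), about {downtime:.1f} min/year of downtime")

# A simulated availability above the ceiling is flagged with the layer to fix.
print(check_prediction(0.99999, ceilings))
```

With these example numbers the software layer is the binding constraint, so extra hardware redundancy would not raise the overall ceiling.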
From zero to availability proof in 3 steps
$ pip install faultray
topology:
  name: my-saas-platform
  regions:
    - name: us-east-1
      zones: [a, b, c]
  services:
    - name: api-gateway
      replicas: 3
      dependencies: [auth, database]
$ faultray run --topology infra.yaml
Running 2,048 scenarios across 100+ engines...
Completed in 8.3s | Pass: 2,043 | Fail: 5
$ faultray report --format html
Report saved: report.html
$ faultray dashboard
Dashboard running at http://localhost:8550
Why resilience engineering is urgent right now
DORA Report 2024
“Top performers deploy 4× faster and have 10× lower change failure rates than low performers”
FaultRay helps you measure and close that gap — before auditors ask.
IBM Cost of a Data Breach 2024
“Average cost of a data breach reached $4.88M in 2024 — highest ever recorded”
Infrastructure failures add to incident cost and exposure. Simulate before production breaks.
EU Digital Operational Resilience Act (DORA)
“DORA became mandatory for EU financial entities in January 2025 — non-compliance = up to 1% annual revenue”
FaultRay generates Article 25-compliant ICT resilience test reports in one click.
See FaultRay in action with your own infrastructure. Our team will walk you through a live simulation in 30 minutes.
Different roles, same outcome: infrastructure you can prove is reliable.
SRE / Platform Engineer
"We don't know our blast radius until production explodes."
Map every failure path before it happens. Generate SLA evidence your leadership will trust.
Start Free →
Engineering Manager
"Proving reliability to the board takes weeks of manual work."
1-click DORA compliance reports. Board-ready resilience dashboards in minutes, not weeks.
See a Demo →
CISO / CTO
"How do I know our vendors won't take us down?"
Supply chain risk simulation. Third-party failure impact quantified before contract signing.
Get a Quote →
Estimate your annual savings with FaultRay
Formula: Annual loss = Monthly revenue × (incident hours / 720) × incidents per month × 12, where 720 is the number of hours in a 30-day month. FaultRay effect = 70% reduction in annual loss.
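The formula can be worked through in a few lines of Python. This is a sketch, not the calculator's actual code: the function names are hypothetical, the input figures are invented for illustration, and "incidents" is read as incidents per month, which is an assumption based on the × 12 factor.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 hours, as used in the formula


def annual_loss(monthly_revenue, incident_hours, incidents_per_month):
    """Revenue lost to incidents over 12 months."""
    loss_per_incident = monthly_revenue * (incident_hours / HOURS_PER_MONTH)
    return loss_per_incident * incidents_per_month * 12


def annual_savings(monthly_revenue, incident_hours, incidents_per_month, reduction=0.70):
    """Savings if the loss falls by `reduction` (70% is the figure stated above)."""
    return annual_loss(monthly_revenue, incident_hours, incidents_per_month) * reduction


# Example inputs (invented): 720,000 monthly revenue, one 2-hour incident per month.
loss = annual_loss(720_000, 2, 1)
savings = annual_savings(720_000, 2, 1)
print(f"Annual loss: {loss:,.0f}  Estimated savings: {savings:,.0f}")
```

For these inputs the loss works out to roughly 24,000 per year and the estimated savings to roughly 16,800.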
Start free. Scale as you grow.
Perfect for individual engineers exploring chaos engineering.
For teams that need DORA compliance reports and higher limits.
For enterprises needing unlimited access, SSO, and dedicated support.
| Feature | Free | Pro | Business |
|---|---|---|---|
| Simulations / month | 5 | 100 | Unlimited |
| Components | 5 | 50 | Unlimited |
| Simulation engines | 100+ | 100+ | 100+ |
| N-Layer Model | | | |
| DORA report export | PDF + API | | |
| Insurance API | | | |
| AI-powered analysis | | | |
| Custom SSO / SAML | | | |
| Support | Community | Email (24h) | Dedicated (1h) |