QA Agent Team
An autonomous QA pipeline for Salesforce implementations. It pulls user stories from Jira, generates test cases with Claude, executes them in Salesforce via Playwright, verifies the results through both the UI and Salesforce MCP, and posts results to Google Sheets and Jira, all from a single command.
- Timeline: 3 months initial build, ongoing iteration
- Role: Sole architect and developer, from scoping through shipping to production maintenance
- Outcome: 80% cycle time reduction
What wasn't working.
QA on multi-client Salesforce implementations was a bottleneck. Every sprint, QA engineers spent 20–30 hours manually running through user stories across multiple orgs: logging in, navigating, verifying each acceptance criterion by hand, screenshotting failures, and writing up results in spreadsheets and Jira comments. By the end of each sprint, QA was the blocker, and release cycles were slipping.
How we built it.
We built an autonomous QA pipeline driven by Claude Code. The system pulls user stories from Jira, generates executable Playwright test cases from the acceptance criteria, runs them in a sandbox Salesforce org, captures screenshots and DOM state on failures, verifies through both the UI and Salesforce MCP (so both user-facing behavior and record-level state are validated), and posts structured results back to Google Sheets and Jira. A dashboard UI lets QA leads run sprint-level test suites with one command, retest only what failed, and drill into failure screenshots.
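The flow described above can be sketched as a chain of pluggable stages. This is a minimal illustration under stated assumptions, not the production code: `TestCase`, `TestResult`, `run_pipeline`, and the stage callables are hypothetical names, the Jira, Playwright, Salesforce MCP, and Sheets integrations are replaced by stubs, and the Claude test-generation step is reduced to "one case per acceptance criterion". The one idea it tries to show is that a case passes only when the UI check and the record-level check both pass.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestCase:
    story_key: str   # Jira issue key, e.g. "QA-101" (hypothetical)
    criterion: str   # one acceptance criterion from the story


@dataclass
class TestResult:
    case: TestCase
    ui_ok: bool      # user-facing behavior looked right in the browser
    record_ok: bool  # record-level state looked right in the org

    @property
    def passed(self) -> bool:
        # A case passes only when both verification channels agree.
        return self.ui_ok and self.record_ok


def run_pipeline(
    fetch_stories: Callable[[], dict[str, list[str]]],
    check_ui: Callable[[TestCase], bool],
    check_record: Callable[[TestCase], bool],
    report: Callable[[list[TestResult]], None],
) -> list[TestResult]:
    """Fetch stories, expand them into cases, verify each case twice, report."""
    results = []
    for story_key, criteria in fetch_stories().items():
        for criterion in criteria:
            case = TestCase(story_key, criterion)
            results.append(TestResult(case, check_ui(case), check_record(case)))
    report(results)
    return results


# Stub stages standing in for Jira, Playwright, Salesforce MCP, and Sheets.
def demo_stories() -> dict[str, list[str]]:
    return {"QA-101": ["Owner can edit Stage", "Amount is required"]}


demo_results = run_pipeline(
    fetch_stories=demo_stories,
    check_ui=lambda case: True,
    # Pretend the record-level check fails for the "Amount" criterion.
    check_record=lambda case: "Amount" not in case.criterion,
    report=lambda results: None,
)
```

In this stubbed run the second case looks fine in the UI but fails the record-level check, so it is reported as failed. That is the kind of discrepancy dual verification is meant to surface.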
What shipped.
- QA cycle time reduced by 80% (from ~30 hours per sprint to ~6 hours of human review)
- Failure sub-classification (permission_denied, selector_timeout, login_failure, etc.) routes bugs to the right developer automatically
- Dashboard + selective retest eliminates the "rerun everything" pattern — only failed tests get rerun
- Tiered guardian system with retry logic catches 90% of flaky failures before they reach human review
- Multi-client architecture supports multiple Salesforce orgs, Jira projects, and test user personas from one codebase
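To make the classification, retry, and selective-retest ideas concrete, here is a rough sketch. The category names come from the list above; everything else is an assumption for illustration: the regex patterns, the choice of which categories count as retryable, and the helper names `classify`, `run_with_retry`, and `failed_only`. It is a single simplified retry loop, not the tiered guardian system itself.

```python
import re
from typing import Callable, Optional

# Category names are from the write-up; the patterns are invented examples.
RULES = [
    ("permission_denied", re.compile(r"insufficient (access|privileges)", re.I)),
    ("selector_timeout", re.compile(r"timeout .* waiting for (selector|locator)", re.I)),
    ("login_failure", re.compile(r"login (failed|error)", re.I)),
]
# Assumption: only these categories are treated as flaky and worth retrying.
RETRYABLE = {"selector_timeout", "login_failure"}


def classify(error_message: str) -> str:
    """Map a raw error message to a failure sub-category."""
    for name, pattern in RULES:
        if pattern.search(error_message):
            return name
    return "unclassified"


def run_with_retry(test: Callable[[], Optional[str]], max_attempts: int = 3) -> Optional[str]:
    """Run a test that returns None on success or an error message on failure.

    Returns None if any attempt passed, otherwise the last failure category.
    """
    category = None
    for _ in range(max_attempts):
        error = test()
        if error is None:
            return None
        category = classify(error)
        if category not in RETRYABLE:
            break  # a real defect, so retrying would only hide it
    return category


def failed_only(last_run: dict[str, Optional[str]]) -> list[str]:
    """Pick the test IDs that still need a rerun."""
    return [test_id for test_id, category in last_run.items() if category is not None]


# Demo: one test that times out once and then passes, one genuine permission failure.
attempts = {"count": 0}


def flaky_test() -> Optional[str]:
    attempts["count"] += 1
    if attempts["count"] == 1:
        return "Timeout 30000ms exceeded waiting for selector"
    return None


last_run = {
    "TC-1": run_with_retry(flaky_test),
    "TC-2": run_with_retry(lambda: "Insufficient privileges to edit record"),
}
retest_queue = failed_only(last_run)
```

Here the timeout is absorbed by the retry and never reaches a reviewer, while the permission failure is classified on the first attempt and is the only entry left in `retest_queue`.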