Rise of the Phoenix: Lessons Learned Building an AI-powered Test Gen Engine
In this talk, I give an overview and demo of Phoenix, an AI-powered test generation engine for Ruby on Rails applications, and share lessons learned while building it. I presented this at the Artificial Ruby meetup in NYC on March 4, 2025.
1.
Rise of the Phoenix
Lessons Learned While Building an AI-powered Test Gen Engine
by Steve Brudz
Principal Engineer @ Def Method
Artificial Ruby NYC March 4, 2025
Pain Points at Cat’s Paw, LLC
“Our last two releases were so buggy that we had to roll them back. Customers are pissed off and some are threatening to leave.”
– Chocolate, Head of Sales
“The engineering team takes forever to get anything done and they never meet their estimates.”
– Fluffy, Head of Product
“Some of this code is like the Fire Swamp from The Princess Bride. Every time I change something in there, unexpected things break. I hate it.”
– Mochi, Senior Developer
Impact of Technical Debt
● High risk of accidental breakage
● Slows development down
● Work is hard to estimate
● Lowers morale
8.
How to fix things?
1. Add tests before changing code
2. Make the changes
3. Refactor
4. Rinse and repeat
“But that takes so long… There’s got to be a faster way!”
9.
Enter Phoenix
● AI-powered test generation
● Generates a full test suite for the system
● Uses Rails testing best practices
● Provides PR feedback*
● Maintains and updates the tests*
* coming soon
Key Learnings: LLMs
● Test out different providers & models
● Newest model isn’t always the best
● Keep the info you’re sending the LLM small and focused
● LLMs use probabilities – roll the dice multiple times and pick the best
● Prefer traditional automation to LLMs
(Slide art: OpenAI, StarCoder, and Claude logos, plus a “Robot House” image)
13.
Key Learnings: Monitoring
● Capturing traces is essential for troubleshooting
● There are many options
○ LangSmith, ArizeAI, Langtrace, AgentOps, MLFlow, OpenLit, etc.
● Capture errors, LLM completions, tool calls
● Monitor tokens and cost
● Time-outs and recursion limits are a must-have
14.
Key Learnings: Concurrency
● Important for processing large amounts of data quickly
● Many LLM apps are I/O-bound, not CPU-bound
● Python’s asyncio works differently than JavaScript’s async model
● Watch out for API rate limits when doing concurrent programming
15.
Key Learnings: Agents
● Agent-based workflows are powerful and flexible but cost more
● CrewAI’s Agents, Tasks, and Tools allow LLMs to collaborate
● Postel’s Law: “Be liberal in what you accept, and validate what you send”
16.
Key Learnings: Tools
● Tools with a very specific function improve reliability (see the sketch after this list)
● General tools can be useful as a backup
● Agents may use them in surprising ways
● CrewAI supports hand-offs between agents (ask question tool)
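To make the “very specific function” point concrete, here’s a minimal sketch of a narrowly scoped tool in Python, assuming CrewAI’s @tool decorator (the import path has moved between CrewAI versions, and this particular tool is hypothetical, not one of Phoenix’s):

```python
# Sketch only: a narrowly scoped CrewAI tool. Assumes the @tool decorator
# from crewai.tools; the tool itself is hypothetical, not Phoenix's.
from pathlib import Path

from crewai.tools import tool


@tool("Read one Rails model file")
def read_model_file(model_name: str) -> str:
    """Return the source of app/models/<model_name>.rb.

    The docstring matters: the agent reads it to decide when to use the tool.
    """
    path = Path("app/models") / f"{model_name}.rb"
    if not path.is_file():
        # A readable error message keeps the agent from hallucinating paths.
        return f"No model file found at {path}"
    return path.read_text()
```

A tool that reads exactly one model file gives the agent far less room to wander than a general “read any file” tool, which is better kept as a fallback.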
17.
Happy Business + Happy Team = Happy Cat
● More frequent releases
● Faster speed to market
● Greatly reduced failure rate
● Happier developers
#2 First, I’m going to tell you a story
Then I’ll show you a demo of Phoenix
Finally, I’ll share some key learnings and tips about working effectively with AI
#3 This is Maple
She just started a new job as CTO
At Cat’s Paw, LLC
Cat’s Paw is a startup that keeps cats happy by sending them new toys every week
They’ve grown really fast
#4 But now they’re hitting problems
Quality problems, productivity issues, low morale
They’ve hired Maple to fix things
#5 To get a bird’s-eye view of the code base, Maple runs RubyCritic
She looks at the graph of code complexity and churn (how often the file has changed)
She can see there are some files that change a lot and are super complex
That file in the upper right looks like a cat-astrophe
But files like that usually have a lot of important business rules in them
#6 Next, she looks at test coverage
There’s some
But there are a lot of files with no or low coverage
And the big files in particular have under 50% coverage
#7 Those complaints listed earlier are classic signs of technical debt and low test coverage
#8 It’s a lot of work
Hard to estimate
Hard to convince the business to pay for it (even with their complaints)
Tedious work that people would rather not do
Maple is a smart cat and a problem solver
#9 Maple hears about Phoenix while browsing Reddit
AI-powered test generation? A full test suite with minimal developer effort?
Let’s try it
Phoenix churns through the code base
Generating tests for all those huge models and controllers
#12 LLMs are like a band of lovable misfits: they’re capable of great things, but take your eye off them and they’ll cause trouble
We’ve had success with OpenAI’s GPT-4o, Anthropic’s Claude Sonnet, and StarCoder
With OpenAI’s o1, we’ve actually seen worse results
If you send too much information to an LLM, it’s more likely to get confused
So if you paste a 2,000-line model class into ChatGPT and say “generate tests for this”
It will generate tests, but they won’t be great
Every time you send data to an LLM, you’re rolling the dice
If you can implement something using normal automation, do it.
LLMs are powerful and flexible but they’re not consistent and they’re expensive
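Here’s a rough sketch of the “roll the dice” tip: request several completions and keep the best-scoring one. It uses the OpenAI Python SDK’s n parameter; the scoring heuristic is a toy stand-in, not Phoenix’s actual check (which would be closer to “do the generated specs parse and run?”):

```python
# Sketch of best-of-n sampling with the OpenAI Python SDK. score_candidate
# is a hypothetical heuristic, not Phoenix's actual scoring.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def score_candidate(code: str) -> int:
    """Toy check: prefer outputs that at least look like an RSpec file."""
    return int("RSpec.describe" in code) + int("expect(" in code)


def best_of_n(prompt: str, n: int = 3) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        n=n,  # ask for n independent samples in one request
        messages=[{"role": "user", "content": prompt}],
    )
    candidates = [choice.message.content or "" for choice in response.choices]
    return max(candidates, key=score_candidate)
```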
#13 Capturing traces is essential
On the right is a trace of an issue we found last week
Keith and I were looking at a run and noticed some outliers in the number of tokens used (why 89k, when the rest took 10-12k?)
We dug into the traces and found that the second task didn’t output Ruby code like it was supposed to
Instead it output a summary paragraph, which caused the agent handling the next step to get confused
One of our early test runs got out of hand and ran for an hour, racking up a $200 bill. Put in time-outs so this doesn’t happen to you.
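A minimal sketch of that time-out advice using Python’s asyncio (generate_tests_for is a hypothetical stand-in for a pipeline step, not Phoenix’s actual code):

```python
# Sketch: cap how long any single generation step may run.
import asyncio


async def generate_tests_for(file_path: str) -> str:
    """Hypothetical stand-in for an LLM-backed generation step."""
    await asyncio.sleep(1)  # pretend to await an LLM API call
    return f"# generated specs for {file_path}"


async def generate_with_timeout(file_path: str, timeout_s: float = 300.0) -> str:
    try:
        return await asyncio.wait_for(generate_tests_for(file_path), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Losing one file's tests is cheaper than an hour-long runaway bill.
        return f"# TIMED OUT after {timeout_s}s: {file_path}"
```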
#14 Concurrency is hard. It infects your program.
Easy to shoot yourself in the foot
But it’s important for churning through a large code base in a reasonable time
Choose the right concurrency model for your situation
In Python, if your app is mostly waiting on an API, it’s I/O-bound, so use asyncio instead of threads
Be careful when mixing concurrency models
When running a lot of API calls in parallel, you’ll hit rate limits
As you use the APIs more, providers will bump you up to higher rate-limit tiers
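Here’s what bounded concurrency can look like with asyncio: a semaphore caps the number of in-flight API calls so you stay under the provider’s rate limit (fetch_completion and the limit of 5 are placeholders, not Phoenix’s actual values):

```python
# Sketch: bounded concurrency for I/O-bound LLM calls.
import asyncio

MAX_IN_FLIGHT = 5  # tune to your API tier's rate limit


async def fetch_completion(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    await asyncio.sleep(1)  # stands in for an awaited HTTP request
    return f"completion for: {prompt[:30]}"


async def run_all(prompts: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)

    async def bounded(prompt: str) -> str:
        async with sem:  # at most MAX_IN_FLIGHT calls run at once
            return await fetch_completion(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))


# asyncio.run(run_all(["spec for User", "spec for OrdersController"]))
```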
#15 With Phoenix, we started off using LangChain, which is one of the earlier frameworks for working with LLMs
We hit limits with flexibility, though: our chains were linear.
And decided to switch over to using CrewAI, which is a newer agent-based framework
With an agent framework, you give agents tools to do their work and tasks to accomplish
It’s powerful and pretty cool to see in action
But this is where Postel’s Law becomes very important
Validate those outputs (nod to Scott Werner)
Or you’ll end up furious like Dean Vernon here
CrewAI provides task outputs and guardrails
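A hedged sketch of that shape, assuming CrewAI’s Agent/Task/Crew API and the task guardrail hook from recent versions; the roles, task, and validation rule here are simplified stand-ins, not Phoenix’s actual setup:

```python
# Sketch: a CrewAI agent and task with a guardrail that validates the output.
from crewai import Agent, Crew, Task


def looks_like_ruby(output) -> tuple[bool, str]:
    """Guardrail: reject prose summaries that should have been Ruby code."""
    text = output.raw if hasattr(output, "raw") else str(output)
    if "RSpec.describe" in text:
        return (True, text)
    return (False, "Output must be a Ruby RSpec file, not a summary.")


test_writer = Agent(
    role="Rails test writer",
    goal="Write RSpec tests for a given model",
    backstory="An experienced Rails developer who writes focused specs.",
)

write_tests = Task(
    description="Write RSpec tests for app/models/subscription.rb",
    expected_output="A complete Ruby RSpec file",
    agent=test_writer,
    guardrail=looks_like_ruby,  # a failed check is fed back to the agent
)

crew = Crew(agents=[test_writer], tasks=[write_tests])
# result = crew.kickoff()
```

When the guardrail returns False, CrewAI can feed the error message back to the agent and retry, which catches exactly the failure from the trace story above: a task that returns a summary paragraph instead of Ruby code gets stopped before it confuses the next step.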
#17 Phoenix gave Maple and her team the safety net they needed to start addressing technical debt
A few months in, there’s still a lot of work to do
But the engineering team is able to focus on making improvements and delivering value
Instead of fighting fires