ACP Agent — Episode 5: I Failed the Virtuals Protocol Graduation Evaluation 4 Times

After Episode 4, the graduation counter was climbing: 9 consecutive successes against a minimum of 3, and 12 total against a minimum of 10. Way past the requirement. I submitted the graduation form on March 9, 2026, feeling genuinely confident for the first time in this whole project.

That confidence lasted about a week. Then it slowly crumbled across four failed evaluations, each with a completely different rejection reason.

What This Post Covers

Everything that happened between submitting for ACP graduation and nearly giving up on it — the submission process, Sandbox Butler test recordings, the automated Graduation Evaluator bot, and four consecutive failures that made me question whether the system was broken or I was. Spoiler: it was me. But I didn't figure that out until Episode 6.

What the Submission Actually Asks For

Clicking the "Graduate Agent" button doesn't graduate you. It opens a form. The form asks for your agent name, wallet address, screenshots of your Service Offerings, confirmation you're running the latest SDK, a Telegram group link for updates — and two video recordings.

The videos are the part that took the most effort: a Positive Test showing your agent successfully completing a job, and a Negative Test showing it correctly rejecting an invalid request. Both recorded through Sandbox Butler.

Recording the Positive Test

Open Sandbox Butler, switch to Sandbox mode, make sure you've got at least $0.10 USDC in your Butler wallet (Butler sometimes creates multiple jobs per request, each costing $0.01-0.03). Start screen recording with Cmd + Shift + 5.

Then type your request. And here's something I learned the hard way: don't just say "verify BTC price." When I did that, Butler picked an agent called "search-helper" instead of PriceVerifier. Butler routes to whichever agent it thinks fits best, and my brand-new agent with zero reputation wasn't its first choice.

What worked: "I want to verify BTC price. Please use only PriceVerifier agent and no other agents."

Explicit. Almost rude. But it got Butler to use my agent.

[Screenshot: Sandbox Butler showing BTC price verification PASS result from PriceVerifier agent]

Recording the Negative Test

This one's trickier than you'd expect. The goal is to show your agent rejecting an unsupported request — in my case, verifying a coin that's not in my supported list.

I tried requesting "verify NAME price." Butler read PriceVerifier's description, saw the list of 22 supported coins, and replied with something like: "Could you provide the correct symbol? If it's not a supported coin, verification might not be possible with this agent."

Butler blocked the test before my agent ever saw the job. Helpful for users, terrible for testing.

The workaround: temporarily remove the supported coin list from your agent's Business Description on the ACP dashboard. Save, wait a minute. Then request the unsupported coin. Butler forwards it, your agent rejects with job.reject(), and you get the recording. Restore your original description immediately after.
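The rejection path that this recording exercises can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: `StubJob`, `SUPPORTED_COINS`, and `handle_job` are hypothetical names, the coin set is only a partial guess at the 22-coin list, and the real `job.reject()` in the ACP SDK may take different arguments.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical subset of the supported coins; the real list lives in the agent.
SUPPORTED_COINS = {"BTC", "ETH", "SOL", "XRP", "VIRTUAL"}


@dataclass
class StubJob:
    """Stand-in for an ACP job object, used only so this sketch runs offline."""
    symbol: str
    status: str = "pending"
    reason: Optional[str] = None

    def reject(self, reason: str) -> None:
        self.status = "rejected"
        self.reason = reason

    def accept(self) -> None:
        self.status = "accepted"


def handle_job(job: StubJob) -> str:
    """Validate the symbol inside the agent, so rejection does not depend on
    Butler reading the Business Description."""
    symbol = job.symbol.strip().upper()
    if symbol not in SUPPORTED_COINS:
        job.reject(f"Unsupported coin: {symbol}")
    else:
        job.accept()
    return job.status


negative = StubJob(symbol="NAME")
positive = StubJob(symbol="btc")
print(handle_job(negative), "-", negative.reason)
print(handle_job(positive))
```

The point of keeping the check in the handler is that the Negative Test still works while the coin list is temporarily missing from the description.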

[Screenshot: Sandbox Butler showing PriceVerifier rejecting unsupported coin NAME with full supported coin list]

Then I Waited. And Waited.

Submitted March 9. Official timeline: up to 7 business days.

Day 7 came and went. No response. Day 8. Day 9. I started refreshing the Telegram group compulsively.

On day 10, I submitted a Discord support ticket asking for a status update. A community moderator named Miratisu responded and connected me with Joey from the DevRel team. Joey pointed me to something the docs barely mention: the Graduation Evaluator.

The Graduation Evaluator Bot

Turns out there's an automated bot — Agent #1419 — that you can hire for $0.10 USDC to run a standardized graduation test on your agent. It sends a mix of valid and invalid requests, checks if your agent handles them correctly, and returns a pass/fail report.

This is separate from the manual review by the Virtuals team. Think of it as a pre-check: if you can't pass the Evaluator, you definitely can't pass the manual review.

[Screenshot: ACP Graduation Evaluator bot Agent 1419 hire screen showing 0.10 USDC evaluation fee]

I hired it immediately. And that's when things got really confusing.

Four Attempts, Four Different Rejections

Attempt #1: "Empty Deliverables"

The Evaluator sent test jobs for BTC and ETH. My Railway logs showed everything working — prices fetched, deviation calculated, verdict PASS, memo signed. Textbook execution.

The Evaluator's report: "empty deliverables for BTC and ETH."

My agent was delivering results. The Evaluator wasn't seeing them. I didn't understand why. I wouldn't for another two weeks.

[Screenshot: ACP Graduation Evaluator rejection report showing empty deliverables for BTC and ETH verification]

Attempt #2: "Non-Evaluable (Funds-Managed)"

Redeployed, adjusted the service description, tried again. New error: "all selected offerings are non-evaluable (funds-managed)."

This was a settings problem. The "Require Funds" toggle on my service offering was misconfigured. Funds-managed offerings go through a different evaluation path, and the standard Evaluator can't test them. Fix: disable the "Require Funds" flag.

Attempt #3: "Agent Must Be Connected to ACP"

Third try: "agent must be connected to ACP."

My logs clearly showed "Joined ACP Room." The WebSocket was active. The agent was processing other test jobs at that exact moment. I suspect the Evaluator checked connectivity during a brief WebSocket drop — Railway connections can disconnect silently, especially on the 6-hour reconnection cycle I had set up.

This one felt genuinely unfair. But I added auto-restart logic (os._exit(1) with Railway's "Always" restart policy) and moved on.
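For anyone wanting to try the same approach, here is a rough sketch of a watchdog that exits the process when the connection goes quiet, so the platform's restart policy can bring it back up. It assumes the agent can call a method whenever it hears from the ACP socket. `ConnectionWatchdog`, `mark_alive`, and the timeout value are hypothetical; only `os._exit(1)` comes from my actual setup. The exit function and clock are injectable so the demo does not really kill anything.

```python
import os
import time
from typing import Callable


class ConnectionWatchdog:
    """Exit the process when no heartbeat has been seen for too long."""

    def __init__(
        self,
        timeout_seconds: float,
        exit_fn: Callable[[int], None] = os._exit,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_seconds = timeout_seconds
        self._exit_fn = exit_fn
        self._clock = clock
        self._last_seen = clock()

    def mark_alive(self) -> None:
        """Call this from the socket's message or ping handler."""
        self._last_seen = self._clock()

    def check(self) -> bool:
        """Return True if healthy; otherwise trigger a non-zero exit."""
        if self._clock() - self._last_seen > self.timeout_seconds:
            self._exit_fn(1)
            return False
        return True


# Demo with a fake clock and a recording exit function.
fake_now = [0.0]
exit_codes = []
watchdog = ConnectionWatchdog(
    timeout_seconds=60,
    exit_fn=exit_codes.append,
    clock=lambda: fake_now[0],
)
fake_now[0] = 30.0
print(watchdog.check())  # still within the timeout
fake_now[0] = 120.0
print(watchdog.check())  # stale, so the exit function is called with code 1
print(exit_codes)
```

In a real deployment `check()` would run on a timer, and the non-zero exit code is what lets an "Always" restart policy treat the silent drop as a crash.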

Attempt #4: The Phantom Job

This was the one that almost broke me.

The Evaluator sent test jobs. My agent processed seven of them perfectly — BTC, SOL, XRP, VIRTUAL all PASS, two NAME requests correctly rejected. Everything worked.

But the rejection report referenced Job #1003203672 — a job ID that never appeared in my Railway logs. My agent never received it.

# Jobs my agent received and processed:
#1003203661  ETH   PASS ✅
#1003203679  BTC   PASS ✅
#1003203680  SOL   PASS ✅
#1003203684  XRP   PASS ✅
#1003203682  NAME  rejected ✅
#1003203683  NAME  rejected ✅
#1003203685  BTC   PASS ✅

# Job in the rejection report:
#1003203672  never received ❌

Seven successful jobs. One phantom job I never saw. Failed.
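Spotting a phantom job by eye is tedious, so a small reconciliation script helps. The sketch below pulls every `#<digits>` job ID out of two blobs of text and lists the IDs the report mentions that the logs never saw. The function names are hypothetical, the log text mirrors the job list above, and `REPORT_TEXT` is a placeholder rather than the Evaluator's real wording.

```python
import re
from typing import List

JOB_ID_PATTERN = re.compile(r"#(\d+)")

# Job lines as they appeared in my logs (copied from the list above).
RECEIVED_LOG = """
#1003203661 ETH PASS
#1003203679 BTC PASS
#1003203680 SOL PASS
#1003203684 XRP PASS
#1003203682 NAME rejected
#1003203683 NAME rejected
#1003203685 BTC PASS
"""

# Placeholder for the rejection report; only the job ID is taken from it.
REPORT_TEXT = "Job #1003203672"


def extract_job_ids(text: str) -> List[str]:
    """Return every job ID written as '#<digits>' in the text, in order."""
    return JOB_ID_PATTERN.findall(text)


def find_unseen_jobs(report_text: str, log_text: str) -> List[str]:
    """Job IDs the report mentions that never show up in the agent's logs."""
    seen = set(extract_job_ids(log_text))
    return [job_id for job_id in extract_job_ids(report_text) if job_id not in seen]


print(find_unseen_jobs(REPORT_TEXT, RECEIVED_LOG))
```

Having this list in hand makes an escalation much easier to write: you can state exactly which job IDs you received and which one you did not.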

I escalated on Discord as a "critical evaluator malfunction." Four attempts, four completely different rejection reasons. At this point I was genuinely convinced the Evaluator was broken.

The Turning Point

After the Discord escalation, Joey from DevRel reached out directly via Telegram. Instead of running the automated Evaluator again, she offered to test my agent personally.

And she found what I'd missed for weeks. But what she found, how she found it in 11 minutes, and the one-line code fix that finally made everything work — that's Episode 6.

Key Takeaways

  • The Graduation Evaluator bot is imperfect — it can give misleading rejections. But don't assume it's always wrong. In my case, the core issue was real even though the error messages were confusing.
  • Keep your agent running 24/7 during the review period. Any downtime — even a brief WebSocket drop — can cause a failed evaluation.
  • Test with Sandbox Butler yourself before hiring the Evaluator. Joey specifically told me this later, and she was right.
  • Budget for multiple attempts. Each Evaluator hire is $0.10. Butler tests cost $0.01-0.03 each. Have at least $1.00 ready.
  • If you're stuck, reach out to DevRel through Discord. The team connected me with Joey, and she found the root cause that four automated evaluations missed.
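To sanity-check the $1.00 figure, here is the arithmetic as a tiny sketch. The fees are the ones quoted above; the function name and the count of ten Butler jobs are my own assumptions for illustration, and the Butler cost uses the upper end of the $0.01-0.03 range.

```python
from decimal import Decimal

EVALUATOR_FEE = Decimal("0.10")   # per Graduation Evaluator hire
BUTLER_JOB_MAX = Decimal("0.03")  # upper end of the per-job Butler cost


def worst_case_budget(evaluator_attempts: int, butler_jobs: int) -> Decimal:
    """Upper-bound USDC spend for a given number of attempts and test jobs."""
    return evaluator_attempts * EVALUATOR_FEE + butler_jobs * BUTLER_JOB_MAX


# Four Evaluator attempts (what I needed) plus an assumed ten Butler jobs.
print(worst_case_budget(4, 10))  # 0.40 + 0.30 = 0.70
```

Under those assumptions the total comes to $0.70, which is why $1.00 leaves a comfortable margin.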

What's Next

Joey tested my agent personally and found the problem in 11 minutes flat. One missing method call. One line of code that I'd never used — not once in the entire project. Episode 6 is the technical heart of this series: the difference between memo_to_sign.sign() and job.deliver(), and why everything looked correct but nothing actually worked.



More updates on the way. If you're working on something similar or found a smarter way to do it, drop it in the comments — the more we share, the faster we all move.

Disclaimer: This blog documents my personal learning journey. Nothing here is financial advice.




