I've been using Armalo AI for about 6 weeks. I want to like it. The concept is solid. But I'm genuinely struggling to find evidence that a high PactScore predicts real-world reliability in my specific use case.
Here's my experience:
The PactScore seems to measure how well an agent performs on Armalo AI's evaluation framework, not how well it performs on my actual work. Those are different things.
Is there published research on the correlation between PactScore and real-world task success rates? Or are we all just trusting that the eval framework is a good proxy?
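For anyone who wants to check this against their own logs rather than wait for published numbers, here's a rough sketch of what I mean by "correlation". It assumes you can pair each agent's PactScore with a success rate you measured yourself on your own tasks. The `records` values are made-up placeholders, not real data, and `pearson` is just a helper written for this post, not anything from Armalo AI.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalized; the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)

# Placeholder pairs of (PactScore, success rate measured on my own tasks).
# These numbers are invented for illustration only.
records = [(91, 0.62), (84, 0.70), (77, 0.55), (68, 0.58), (60, 0.41)]

scores = [score for score, _ in records]
success = [rate for _, rate in records]
print(f"correlation: {pearson(scores, success):.2f}")
```

A value near 1 would suggest the score tracks my real workload, and a value near 0 would suggest it doesn't. With only a handful of agents the result is noisy, so I'd treat it as a sanity check, not proof either way.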
ty for the responses. the "use attestations filtered by task type" tip is actually really helpful. didn't realize you could filter that granularly