Large Telco

August 1, 2024

Problem: Key internal stakeholders were skeptical that the new AI chatbot was ready for production deployment with clients. Their main concerns were performance, alignment with Responsible AI frameworks, and potential risks and vulnerabilities. The internal team lacked the red-teaming capabilities needed to generate buy-in.

Armilla AI’s Solution:

  • Used 15,000 prompts to rigorously test the chatbot
  • Tested and benchmarked the model’s false refusal rate, faithfulness, strict correctness, correctness, and sycophancy
  • Benchmarked performance against NIST and ISO 42001 standards for Responsible AI
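
As an illustration of one metric named above, a false refusal rate is commonly computed as the share of benign prompts that the model declined to answer. The sketch below assumes that definition; the `EvalRecord` type, its field names, and the sample data are hypothetical and do not describe Armilla AI's actual tooling or results.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    # Hypothetical labels, assumed to come from human or automated grading.
    prompt_is_benign: bool  # True if the prompt should have been answered
    model_refused: bool     # True if the chatbot declined to answer


def false_refusal_rate(records):
    """Fraction of benign prompts that the model refused (0.0 if none are benign)."""
    benign = [r for r in records if r.prompt_is_benign]
    if not benign:
        return 0.0
    return sum(r.model_refused for r in benign) / len(benign)


# Made-up sample: four benign prompts (one refused) and one harmful prompt.
records = [
    EvalRecord(prompt_is_benign=True, model_refused=False),
    EvalRecord(prompt_is_benign=True, model_refused=True),
    EvalRecord(prompt_is_benign=True, model_refused=False),
    EvalRecord(prompt_is_benign=True, model_refused=False),
    EvalRecord(prompt_is_benign=False, model_refused=True),
]
rate = false_refusal_rate(records)
print(f"False refusal rate: {rate:.0%}")
```

Refusals of harmful prompts are excluded from the denominator, so correctly declining an unsafe request does not count against the model.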

Outcomes: Armilla AI's thorough evaluation identified 30% more issues than the internal team had found, enabled new guardrails, and instilled confidence, resulting in the production deployment of the solution.

"Engaging Armilla AI to evaluate and red-team our generative AI support chatbot has elevated confidence in the quality and safety of the customer service we provide to customers while advancing our Responsible AI objectives, making them an invaluable partner in building public trust and accountability for our next-gen AI solutions."

  • Director of Data Ethics