From 10 test calls to 52,000 in production

How we deployed AI voice support for Self Financial across ~75,000 monthly calls, 24 hours a day, in under six months. The system now processes ~2,400 calls daily, including write actions that close accounts and replace cards.

Production calls

~52k

22 days of 24/7 (4/21--5/12)

Accuracy

~99%+

Voice pipeline + unsupported facts

Performance

~95%

From 50% in Dec 2025

Live traffic

100%

24/7 since Apr 20

1 About Self Financial

Self helps millions of Americans build credit and savings through products designed for people underserved by traditional banking. Headquartered in Austin, TX.

Credit Builder Account (CBA)

A savings-backed installment loan that reports to all three credit bureaus. Customers make fixed monthly payments; at maturity, they receive their savings minus fees. Self's flagship product.

Self Visa Credit Card (SCC)

A secured credit card funded by the customer's CBA savings. Designed as a next step after building initial credit history through the CBA.

Unsecured Credit Card (UCC)

An unsecured card for customers who have graduated from the secured card. Represents the final step in Self's credit-building ladder.

Monthly call volume

~75k

~2,500/day weekdays, ~1,400 weekends

Ticketing system

Salesforce

Via SAIS integration layer

Voice platform

Twilio

IVR + Flex queue

"We want to completely replace the existing support program and build AI that can actually resolve issues, not just route them."

2 What we built

Not a chatbot layer on top of an IVR. A fully integrated system that reads accounts, takes actions, and handles regulated financial conversations end-to-end.

Live workflows

~35

Every major support topic

API integrations

~18

Direct SAIS + Salesforce read/write

Knowledge articles

~250

Curated + daily-scraped FAQs

Account lookup

Balance, status, payment history, payout info across CBA and SCC products via SAIS APIs

Write actions

CBA account closure, card replacement, Salesforce case creation -- all executed via API

Knowledge base

~40 curated articles plus ~210 daily-scraped FAQs for comprehensive coverage

3 The quality curve

23 test batches plus 21 days of production data, December 2025 through May 2026. The performance pass rate nearly doubled, from 50% to 94%; the accuracy pass rate converged on 99%+.

50%

Dec 23 performance

→

94%

Apr 21 performance

Performance pass rate -- Dec 2025 to May 2026

Checks covered: agent repetition, dead air, workflow execution, and coverage. Data points are test batches through 4/21, then production samples through 5/11.

Accuracy pass rate -- Dec 2025 to May 2026

Checks covered: voice pipeline errors and unsupported facts. Both are zero-tolerance checks.

4 Scaling with confidence

We started with 10 calls. Each batch grew as confidence increased. By April, we were testing thousands of calls per batch while maintaining quality.

Test volume over time

Calls reviewed per batch -- reflecting growing confidence in the system

Response time pass rate

Agent response time check -- after manual review overrides for false positives

5 The launch

From first test call to 100% of traffic in under five months.

Sep 2025

Project kickoff

Initial scoping, SAIS API integration design, workflow architecture

Dec 2025

First test calls

10-call batch, 50% performance pass rate. Baseline established.

Jan 2026

API integrations complete

CBA Payouts, CBA Payments, SCC Payouts live. KB scraper running daily with ~210 FAQs.

Mar 2026

Quality crosses 95%

Performance pass rate hits 97% on 3/27 batch. Scale testing begins at 200+ calls.

Apr 9, 2026

1,000-call validation

Largest structured test batch. 94.9% performance, 99.9% accuracy. Green light for full traffic.

Apr 13, 2026

100% of voice traffic

All voice calls routed through Lorikeet. Business hours initially.

Apr 20, 2026

24/7 operation

Extended to round-the-clock coverage. Kill switch in place for emergency failover.

Apr 21, 2026

3,558-call batch

Full 24-hour production test. 94% performance, 99.9% accuracy at true production scale.

Now

Phase 2: Pushing deflection

52k+ production calls processed. Driving AI resolution toward a 40-60% target. Expanding write actions, reducing premature escalations.

6 What got us here

Six operational practices that made the difference between a pilot and a production deployment.

Step 1

Define what "good" means upfront

We split quality into two distinct tracks: accuracy (unsupported facts, voice pipeline errors) with zero tolerance, and performance (response time, agent repetition, workflow execution) with separate thresholds. This gave us a shared language with Self for what to fix vs what to watch.
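The two-track split can be sketched roughly as follows. This is a minimal illustration, assuming each reviewed call records which named checks it failed. The check names come from the text above; the `CallReview` structure, the `triage_batch` helper, and the 0.95 performance threshold are hypothetical and not Lorikeet's actual configuration.

```python
from dataclasses import dataclass

# Check names are taken from the text; grouping them into sets like this is an assumption.
ACCURACY_CHECKS = {"unsupported_facts", "voice_pipeline"}
PERFORMANCE_CHECKS = {"response_time", "agent_repetition", "workflow_execution"}


@dataclass
class CallReview:
    call_id: str
    failed_checks: set[str]  # names of the checks this call failed


def pass_rate(reviews: list[CallReview], checks: set[str]) -> float:
    """Share of calls that failed none of the given checks."""
    if not reviews:
        raise ValueError("no reviews in batch")
    passed = sum(1 for r in reviews if not (r.failed_checks & checks))
    return passed / len(reviews)


def triage_batch(reviews: list[CallReview], performance_threshold: float = 0.95) -> dict:
    """Accuracy is zero tolerance (any failure goes to 'fix');
    performance is thresholded (a low batch rate sets 'watch')."""
    performance = pass_rate(reviews, PERFORMANCE_CHECKS)
    return {
        "accuracy_pass_rate": pass_rate(reviews, ACCURACY_CHECKS),
        "performance_pass_rate": performance,
        "fix": [r.call_id for r in reviews if r.failed_checks & ACCURACY_CHECKS],
        "watch": performance < performance_threshold,
    }
```

Keeping the two tracks in separate result fields means a single accuracy failure is never averaged away by an otherwise strong batch.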

Step 2

Accountability gates before every scale-up

EPD and FD both had to sign off before the next live batch. No unilateral "ship it". Each gate reviewed the previous batch results, open issues, and risk. This forced honest conversations about readiness instead of optimistic timelines.

Step 3

Map the full issue cycle, not just the filing

Every issue followed a transparency chain: discovery, root cause, fix, then validation that the fix actually worked. Filing a ticket did not count as fixing the problem. This closed the loop between QA findings and production improvements.
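One way to picture the chain is as an ordered set of stages that cannot be skipped. The sketch below is an illustration only: the stage names follow the text, while the `Stage` enum and both helper functions are hypothetical.

```python
from enum import IntEnum


class Stage(IntEnum):
    DISCOVERED = 1
    ROOT_CAUSED = 2
    FIXED = 3
    VALIDATED = 4


def advance(current: Stage) -> Stage:
    """Move an issue to the next stage; stages cannot be skipped."""
    if current is Stage.VALIDATED:
        raise ValueError("issue is already validated")
    return Stage(current + 1)


def is_resolved(stage: Stage) -> bool:
    """An issue counts as resolved only once the fix has been validated."""
    return stage is Stage.VALIDATED
```

Under this model an issue in the `FIXED` stage is still open, which matches the point that shipping a fix is not the same as proving it worked.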

Step 4

Output transforms (the breakthrough)

Adding data field definitions and clear instructions for the LLM on how to interpret API responses was the single biggest quality lever. Raw API data is ambiguous. Telling the model what each field means and how to present it to customers eliminated an entire class of hallucination.
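As a rough illustration of the idea, the sketch below attaches a label, an interpretation note, and a formatter to each field before the response reaches the model. The field names (`bal_cents`, `status`, `next_due`), the notes, and the `transform_output` helper are all hypothetical; the real SAIS response schema is not shown here.

```python
def cents_to_dollars(value: int) -> str:
    """Format an integer number of cents as a dollar amount."""
    return f"${value / 100:,.2f}"


# Hypothetical definitions: field -> (label, note for the model, formatter).
FIELD_DEFINITIONS = {
    "bal_cents": ("Current balance", "in US dollars", cents_to_dollars),
    "status": ("Account status", "open, closed, or delinquent", str),
    "next_due": ("Next payment due date", "ISO date; say it as month and day", str),
}


def transform_output(raw: dict) -> str:
    """Render a raw API response as labelled lines with interpretation notes."""
    lines = []
    for key, value in raw.items():
        if key not in FIELD_DEFINITIONS:
            # Fields without a definition are dropped so the model cannot guess at them.
            continue
        label, note, fmt = FIELD_DEFINITIONS[key]
        lines.append(f"{label}: {fmt(value)} ({note})")
    return "\n".join(lines)
```

For example, `transform_output({"bal_cents": 12345, "internal_flag": 7})` returns `Current balance: $123.45 (in US dollars)`; the undefined `internal_flag` never reaches the model.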

Step 5

Lorikeet-owned QA, subscriber audits a subset

We own the QA process and review every batch. Self independently audits a subset. This flips the typical vendor model: the subscriber validates QA rather than driving it. When Self's audit of 113 tickets found 3 accuracy issues we had missed (plus 3 minor performance issues), it built confidence rather than eroding it.

Step 6

Canonical simulation set for regression testing

We built a simulation suite covering ~80% of Self's business logic. Every workflow change runs against it before going live. This catches regressions before they reach production and gives us the confidence to ship changes quickly without breaking what already works.
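A regression run of this kind might look like the sketch below: a fixed set of simulated customer requests, each with an expected workflow, replayed against a candidate build. The three cases, the workflow names, and `toy_router` are invented for illustration; the real suite and agent interface are not described in this document.

```python
from typing import Callable

# Hypothetical simulation cases: (customer utterance, expected workflow name).
SIMULATIONS = [
    ("I lost my card and need a new one", "card_replacement"),
    ("Please close my Credit Builder Account", "cba_closure"),
    ("When will I get my payout?", "cba_payout_status"),
]


def run_regression(route: Callable[[str], str], cases=SIMULATIONS) -> list[tuple[str, str, str]]:
    """Return (utterance, expected, actual) for every case the candidate gets wrong."""
    failures = []
    for utterance, expected in cases:
        actual = route(utterance)
        if actual != expected:
            failures.append((utterance, expected, actual))
    return failures


def toy_router(utterance: str) -> str:
    """Keyword stand-in for the real agent, used only to make the sketch runnable."""
    text = utterance.lower()
    if "card" in text:
        return "card_replacement"
    if "close" in text:
        return "cba_closure"
    return "cba_payout_status"
```

A change is allowed to ship only when `run_regression` returns an empty list for the candidate build.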

7 Quality infrastructure

Financial services means zero tolerance for hallucinated data. We built monitoring at every layer.

Daily pulse monitoring

Automated 4-day rolling metrics posted to Slack with statistical significance testing
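The sketch below shows one plausible shape for such a check: sum pass counts over a 4-day window, then compare the latest window with the previous one. The pooled two-proportion z-test is an assumed choice of significance test, since the method actually used is not stated, and posting to Slack is left out.

```python
import math


def rolling_pass_rate(daily: list[tuple[int, int]], window: int = 4) -> tuple[int, int]:
    """Sum (passed, total) over the most recent `window` days."""
    recent = daily[-window:]
    return sum(p for p, _ in recent), sum(t for _, t in recent)


def two_proportion_p_value(pass_a: int, n_a: int, pass_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two pass rates (pooled z-test)."""
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # both windows are at exactly 0% or 100%, so there is no difference
    z = (pass_a / n_a - pass_b / n_b) / se
    # P(|Z| > |z|) for a standard normal, via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))
```

Given eight days of `(passed, total)` pairs in `daily`, the previous window is `rolling_pass_rate(daily[:4])`, the current one is `rolling_pass_rate(daily)`, and a small p-value marks the change as worth investigating rather than noise.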

Automated feedback triage

Bad-rated tickets auto-diagnosed, classified, and routed to Linear with deduplication
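Deduplication in a triage step like this can be sketched as grouping by a stable fingerprint. The example assumes diagnosis and classification have already run upstream and left `workflow` and `category` fields on each ticket; those field names, the fingerprint scheme, and the output shape are hypothetical, and no Linear API call is made.

```python
import hashlib
from collections import defaultdict


def fingerprint(workflow: str, category: str) -> str:
    """Stable key so tickets with the same diagnosis collapse into one issue."""
    return hashlib.sha256(f"{workflow}|{category}".encode()).hexdigest()[:12]


def triage(bad_tickets: list[dict]) -> list[dict]:
    """Group bad-rated tickets by fingerprint; each group becomes one issue to file."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for ticket in bad_tickets:
        groups[fingerprint(ticket["workflow"], ticket["category"])].append(ticket)
    return [
        {
            "fingerprint": key,
            "title": f"{tickets[0]['category']} in {tickets[0]['workflow']}",
            "ticket_ids": [t["id"] for t in tickets],
        }
        for key, tickets in groups.items()
    ]
```

Two bad ratings with the same workflow and category produce one issue listing both ticket IDs, instead of two near-identical issues.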

Self QA audit alignment

Self independently audited 113 tickets and found 3 accuracy issues and 3 minor performance issues that our own QA had not flagged.

Kill switch

Instant escalation number for emergency failover to the Twilio Flex human-agent queue
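Conceptually, the kill switch is a single flag consulted before each call is routed. The sketch below is an assumption about its shape, not the deployed implementation: `RoutingConfig`, `destination_for`, and the placeholder phone numbers are hypothetical, and no Twilio API is called.

```python
from dataclasses import dataclass


@dataclass
class RoutingConfig:
    ai_agent_number: str
    human_queue_number: str  # escalation number for the human agent queue
    kill_switch_on: bool = False


def destination_for(config: RoutingConfig) -> str:
    """Return where the next inbound call should be sent."""
    if config.kill_switch_on:
        return config.human_queue_number
    return config.ai_agent_number


# Placeholder numbers for illustration only.
config = RoutingConfig(ai_agent_number="+15550100", human_queue_number="+15550199")
```

Flipping `config.kill_switch_on` to `True` sends every subsequent call to the human queue without any change to the workflows themselves.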