From hours to outcomes: rebuilding customer support for the AI era
The Uncomfortable Secret of Every BPO in the World
Every Business Process Outsourcer (BPO) has a dirty secret, and it is one that nobody wants to say out loud during a Quarterly Business Review: their business model depends on the exact thing their customers are trying to eliminate.
Repetition.
A BPO sells hours. Hours are billable when an agent is typing. Agents type when customers have questions. And customers have the same questions over and over again: "Where is my order?", "How do I return this?", "Why was I charged twice?". The industry has spent twenty years optimising a machine whose fuel is the very repetition its clients are trying to eliminate.
You can bolt Artificial Intelligence onto that machine. Most of the industry is doing exactly that: a copilot here, a suggested reply there, a shiny AI-powered slide in the sales deck. It does not change the physics of the situation. The agent still types. The clock still ticks. The incentives stay broken.
At Onepilot we believe the answer is fundamentally different. We are not bolting AI onto a BPO. We are rebuilding the entire structure around a different idea: one where AI is not a feature, but the default execution layer.
This post is about the principles behind that choice.
The Repetition Economy
Look at any support queue in e-commerce, fintech, or marketplaces, and you will see roughly the same shape. The workload follows a predictable pyramid, and that pyramid shows exactly where resources are misallocated today.
Customer Support Volume Distribution
| Task Category | Volume (%) | Role |
|---|---|---|
| Repetitive / Policy-driven | 70% | AI Execution |
| Complex Playbooks | 25% | AI Orchestration |
| Novel / Emotional | 5% | Human Judgement |
The top 70% is the part that should never have been a job in the first place. A human reading the order status from one screen and typing it into another is not customer service; it is data movement with a friendly tone. It is a waste of human potential and a source of massive operational inefficiency.
The middle 25% is where most "AI for support" tools quietly fail. They can draft a reply; they cannot execute. They will tell you to issue a refund, but they will not actually issue it. They cannot read the order, check the warehouse, cross-reference the return policy, decide whether to approve, process it in the ticketing system, and notify the customer, all in one coherent action.
The bottom 5% is the genuinely hard, emotional, and novel cases where humans earn their salary. That is also where a BPO’s best agents are wasted today, because they are spending 80% of their shift drowning in the top 70%.
The whole game is moving work down the pyramid. AI eats the top. Humans keep getting better at the bottom. The Onepilot approach turns the operator into a traffic controller, deciding in real time, conversation by conversation, which work goes where. Everything else in this post is downstream of that idea.
[Figure: a representative trace of a duplicate-charge ticket at Onepilot. The example is constructed for this post; the pipeline, sequence, and event types are accurate to what runs in production.]
AI-Layered vs. AI-Native
The standard playbook right now is what we call AI-layered:
Take an existing ticket system.
Bolt on a Large Language Model (LLM) wrapper.
Add a suggested reply button.
Call it AI-powered.
This fails for a specific reason: the surrounding system was built to be driven by humans. The workflows, the state machines, the audit trails, the permission models: they were all designed around the concept of "agent clicks button". Drop an LLM into that environment and it can observe, but it can barely act. And when it does act, nobody trusts it, because there is no coherent layer tracking what it did, why it did it, and what happened next.
AI-native is a different thing entirely. The difference is not which model you use. It is not how clever the prompt is. It is a handful of design principles that, taken together, produce a completely different product.
Here are the five that matter most.
Principle 1: The LLM is the Engine, Not the Decision-Maker
This is the single most important idea, and it is the one most products get wrong.
Text-first is the default mode for LLM products. You give the model a big prompt, some tools, and hope for the best. The model decides when to call what, what to say, when to stop, and what to promise. This is great for demos and absolutely terrible in production, because when the same customer question leads to different actions on Tuesday and Wednesday, you do not have a product. You have a lottery.
An AI-native system inverts the relationship. The operational logic, "if the order shipped more than 14 days ago, offer a refund; otherwise, offer a reroute", lives in structured, versioned playbooks defined by humans who understand the business. The LLM’s job is not to decide what should happen. Its job is to interpret messy user input, navigate the playbook, pick the right tool, and write the reply in the brand’s specific voice.
That shift from LLM as brain to LLM as engine is what makes every other property of the system possible: determinism, auditability, safety, cost predictability, version control, and testability. None of those is achievable when the model is freelancing.
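To make the inversion concrete, here is a minimal Python sketch of what "playbook as versioned data, LLM as engine" can look like. The schema and names are illustrative assumptions, not Onepilot's actual implementation; only the 14-day rule comes from the example above:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative only: not Onepilot's real playbook schema.
# The business rule lives in plain, versioned data; the LLM never decides it.

@dataclass(frozen=True)
class PlaybookStep:
    condition: str   # human-readable guard, evaluated deterministically
    action: str      # tool to invoke, e.g. "offer_refund"

SHIPPING_PLAYBOOK_V3 = [
    PlaybookStep("order shipped more than 14 days ago", "offer_refund"),
    PlaybookStep("order shipped within the last 14 days", "offer_reroute"),
]

def next_action(shipped_on: date, today: date) -> str:
    """Deterministic: the same inputs always select the same step."""
    if today - shipped_on > timedelta(days=14):
        return SHIPPING_PLAYBOOK_V3[0].action
    return SHIPPING_PLAYBOOK_V3[1].action

# The LLM's only jobs: extract `shipped_on` from a messy customer message,
# then phrase the outcome of next_action() in the brand's voice.
```

The point of the structure is that Tuesday and Wednesday get the same answer: swapping the model or rewriting the prompt never changes what the system decides, only how it reads and speaks.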
Principle 2: Actions Over Answers
An AI that drafts a refund reply is a demo. An AI that issues the refund is a product.
Most of the industry is still living on the draft-reply side of that line. Suggested responses, macro helpers, and AI-assisted inboxes are genuinely useful, but they are also a local maximum: they cannot take you past the ceiling where a human is still in the loop for every single ticket.
The path past that ceiling runs through actions. This means:
Reading the order data directly from the database.
Cancelling the subscription in the billing platform.
Issuing the refund via the payment gateway.
Updating the customer record in the CRM.
Scheduling the reshipment in the warehouse management system.
Notifying the carrier and logging the resolution.
Every one of those is an API call into a tool you already own: Shopify, Zendesk, Recharge, Klaviyo, HubSpot, the ERP, or the OMS. An AI-native CX platform is, in a very real sense, a coordination layer across all of them. The intelligence is not in the replies, it is in the orchestration.
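As a rough illustration, that action layer can be modelled as explicit, typed calls into the tools the brand already owns. Everything below (the `PaymentGateway` and `Crm` protocols, the method names) is a hypothetical sketch, not any vendor's real SDK:

```python
from typing import Protocol

# Hypothetical interfaces: these stand in for whatever the brand already runs
# (payment gateway, CRM, ticketing). They are not real SDK signatures.

class PaymentGateway(Protocol):
    def refund(self, charge_id: str, amount_cents: int) -> str: ...

class Crm(Protocol):
    def log_resolution(self, customer_id: str, note: str) -> None: ...

def resolve_duplicate_charge(payments: PaymentGateway, crm: Crm,
                             customer_id: str, charge_id: str,
                             amount_cents: int) -> str:
    """One coherent action, not a drafted reply: refund, then record it."""
    refund_id = payments.refund(charge_id, amount_cents)
    crm.log_resolution(customer_id, f"duplicate charge refunded ({refund_id})")
    return refund_id
```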
All of that also means the platform has to stay open. The strongest feedback we hear from enterprise customers is: "Do not be another closed-loop vendor." They already have dedicated tools for Quality Assurance, workforce management, and knowledge bases. Our job is not to replace any of that; it is to orchestrate across it. An AI-native operator that refuses to talk to your existing stack is not AI-native. It is merely a walled garden with a new coat of paint.
Principle 3: Conversations are State Machines, Not Chat Logs
A support conversation is not just a chat. It is a state. Consider this flow:
Customer opens a ticket with an order number.
AI fetches order status.
Customer asks "Can I change the delivery address?"
AI checks whether the order has shipped.
It has: AI offers a reroute or a return-and-rebuy.
Customer picks reroute.
AI calls the carrier.
Customer says "Oh and also, my other order..."
Anyone who has tried to handle this as a growing prompt with message history knows it falls apart within three turns. The context balloons, the model forgets earlier decisions, the tool calls get redone, and the customer gets asked the same question twice.
An AI-native platform treats every conversation as a durable, replayable state, with full memory of what has been asked, what has been tried, which tools have returned what, and which decisions have already been made. If you restart the service or scale the worker pool, the conversation picks up exactly where it left off.
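A minimal sketch of that idea: the conversation is an append-only event log that can be serialised, restored, and checked before any tool call is repeated. The event names and storage format here are illustrative assumptions, not the production design:

```python
import json
from dataclasses import dataclass, field

# Sketch of conversation-as-state: an append-only event log that survives
# restarts and prevents repeated work. Event names are illustrative.

@dataclass
class Conversation:
    events: list[dict] = field(default_factory=list)

    def record(self, kind: str, **data) -> None:
        self.events.append({"kind": kind, **data})

    def already_done(self, kind: str) -> bool:
        """Guard against redoing tool calls or re-asking answered questions."""
        return any(e["kind"] == kind for e in self.events)

    def dump(self) -> str:
        return json.dumps(self.events)      # persist to durable storage

    @classmethod
    def load(cls, raw: str) -> "Conversation":
        return cls(events=json.loads(raw))  # pick up exactly where it left off

convo = Conversation()
convo.record("order_status_fetched", order="A123", status="shipped")
restored = Conversation.load(convo.dump())
assert restored.already_done("order_status_fetched")  # no duplicate fetch
```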
This is also where the uncomfortable, honest truth about LLMs hides: they are non-deterministic, and they will occasionally make the wrong call. When that happens, you want to be able to replay the exact state that led to the bad decision and see where the system went sideways. You cannot do that if your conversation is just a prompt. You can do it if your conversation is a state.
Principle 4: Measure Process, Not Intelligence
Most "AI for support" dashboards show metrics like "% auto-resolved" and call it a day. Those numbers are mostly vanity. They do not tell you why a conversation went off the rails, which business situation is creating friction, or what to change to make it better.
The Core Metric: We do not measure AI intelligence. We measure process performance.
AI-native measurement starts from a different question: what outcome is the customer actually looking for, and how is the system doing at delivering it?
Not: "Did the AI answer?"
But: "Did the refund actually process?"
Not: "Was the intent identified?"
But: "Did the customer come back thirty minutes later with the same issue?"
We focus on outcomes, not intents. Quality scores must combine resolution, effort, trust, and experience, not a single AI resolution rate headline.
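One concrete example of an outcome metric, sketched in Python: the share of closed tickets where the same customer came back with the same issue inside a thirty-minute window. The ticket fields are assumed for the example, not a real schema:

```python
from datetime import timedelta

# Outcome-metric sketch. Assumed ticket shape:
# {"customer_id": str, "issue": str, "opened_at": datetime,
#  "closed_at": datetime | None}

def reopen_rate(tickets: list[dict],
                window: timedelta = timedelta(minutes=30)) -> float:
    """Fraction of closed tickets reopened by the same customer, same issue."""
    closed = [t for t in tickets if t["closed_at"] is not None]
    if not closed:
        return 0.0
    reopened = sum(
        1
        for t in closed
        if any(
            u["customer_id"] == t["customer_id"]
            and u["issue"] == t["issue"]
            and t["closed_at"] < u["opened_at"] <= t["closed_at"] + window
            for u in tickets
        )
    )
    return reopened / len(closed)
```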
Furthermore, the platform must be honest about what it cannot resolve. Conversations that do not match any playbook are data, the raw material for the next generation of automation. Clustering them, surfacing emerging themes, and flagging friction in workflows that are technically correct but operationally failing is the real value. The AI that gets smarter over time is not the model itself: it is the system built around the model.
Principle 5: Humans are an Upgrade, Not a Patch
This is the part that differentiates an AI-native operator from a pure-play software tool.
Onepilot has thousands of real, trained agents worldwide. They are not a fallback for when the bot gives up. They are the other half of the same system. For every message, the platform chooses one of four paths:
Autonomous: The AI answers directly and resolves the issue.
Augmented: The AI drafts, a human reviews, and a human sends.
Collaborative: The AI hands over entirely with a full context briefing.
Human-Led: A human owns the ticket end-to-end (e.g., VIPs, crises, legal-sensitive cases).
The ratio shifts over time. A new customer or a new playbook results in a lower confidence score, leading to more human intervention. As the playbook matures and the process stabilises, the system moves towards higher autonomy. The system measures itself and tunes these thresholds conversation by conversation. Our operations team and our AI team are the same team: they ship together, they measure together, and neither can succeed without the other.
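A simplified sketch of that routing decision, assuming a scalar confidence score per message and tunable thresholds (the numbers below are placeholders, not production values):

```python
# Routing sketch: one of the four paths above, per message.
# Thresholds are illustrative and would be tuned conversation by conversation.

AUTONOMOUS, AUGMENTED, COLLABORATIVE, HUMAN_LED = range(4)

def route(confidence: float, is_vip: bool, is_sensitive: bool,
          auto_threshold: float = 0.95, draft_threshold: float = 0.80) -> int:
    if is_vip or is_sensitive:
        return HUMAN_LED          # VIPs, crises, legal: a human owns it end-to-end
    if confidence >= auto_threshold:
        return AUTONOMOUS         # the AI resolves directly
    if confidence >= draft_threshold:
        return AUGMENTED          # the AI drafts, a human reviews and sends
    return COLLABORATIVE          # full handover with a context briefing
```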
This is why our Customer Satisfaction (CSAT) scores do not drop as we automate; they climb. We are not firing humans and hoping the bot is good enough. We are using humans for the work where humans are irreplaceable, and letting the machine handle the rest.
Governance, Traceability, Trust
A lot of what separates production-grade AI from a demo isn't the AI at all. It's the governance layer around it. That layer is the foundation trust gets built on, and in enterprise CX, without trust, there is no product.
The first layer is traceability. Every conversation the system handles produces a complete, queryable audit trail: which workflow was chosen and why, which tools were called with which parameters, what each tool returned, how the final reply was composed, which knowledge passages were used to ground it, and what each step cost. Not as a byproduct but as the default. If a reply goes wrong, you can replay the state that produced it. If a refund was issued, you can see the chain of decisions that led there. If a customer asks "why did your AI say X last Tuesday?", you have a real answer, not a shrug.
The second layer is governance. Every tool the AI can invoke is explicit and revocable. Every playbook is versioned: you always know what logic was live for a given conversation, even months later. Sensitive actions can be gated by authority rules ("no refunds over €500 without human approval", "no account changes for unverified customers"), restricted to specific customer segments, or run in shadow mode while they're being validated. The humans responsible for the operation can turn capabilities on and off without shipping code. Nothing the system does is invisible to the team running it.
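As an illustration, an authority rule like "no refunds over €500 without human approval" can live as explicit, revocable data rather than model behaviour. This is a sketch of the pattern, not the platform's actual rule engine:

```python
# Authority-rule sketch: explicit data the operations team can change
# without shipping code. Not the platform's real rule engine.

REFUND_APPROVAL_CEILING_CENTS = 50_000  # €500

def refund_requires_human(amount_cents: int, customer_verified: bool) -> bool:
    if not customer_verified:
        return True   # unverified customers always route to a human
    return amount_cents > REFUND_APPROVAL_CEILING_CENTS
```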
The third layer is safety. Per-customer data isolation. Identity verification before sensitive actions. Rate limits so a logic loop can't escalate into a financial crisis. Content filtering. Anti-injection defenses so a crafted user message can't override the system's instructions. Honest escalation when a tool call fails: no pretending a refund went through when the API returned a 500. These aren't features; they're preconditions.
The fourth layer is trust at the contract level. ISO 27001 and ISO 18295 certifications. A public trust center where the controls, sub-processors, and incident-response commitments are laid out, not buried in a sales PDF. Data residency options, so a French insurer can keep everything on European infrastructure, a US retail brand can run on a global model, and a regulated healthcare client can run on a private endpoint: all on the same platform. Customer-owned data, exportable and deletable on request.
None of this is glamorous. All of it matters. The fancy AI research papers are fun to read, but the wins in production come from the unfancy parts, and the systems that take governance seriously are the only ones that survive contact with an enterprise procurement team.
The Paradox of Choice for the Modern Buyer
If you are running Customer Experience (CX) at a fast-growing company today, you likely face one of two problems:
Option A: The Legacy BPO
The service is cheap, but quality is uneven. Their AI strategy is a single slide in a pitch deck, and you have zero visibility into why a customer received a nonsensical reply last Tuesday. You are essentially paying for headcount, not results.
Option B: The Shiny AI Tool
You bought a software-only solution and spent three months trying to configure it. It can answer "Where is my order?" but breaks the moment a customer asks for a complex exchange involving a promotional code and a change of address. You still have to manage the tool, which becomes a full-time job in itself.
What we offer is neither.
We provide an operator that takes full ownership of the queue, much like a traditional BPO, but built on a platform where AI is the default engine. It is a human layer where the humans are an upgrade, not a patch. And it remains open at both ends: your tools remain your tools, and your data remains your data.
Where the Industry Goes Next
The old BPO business model monetised repetition. It thrived on inefficiency because inefficiency was billable. The next generation of CX operators will monetise outcomes.
That shift from billing by the hour to being paid for the resolution changes everything about how a support organisation is designed, measured, and run. Your partner's incentives finally align with your own. AI stops being a cost centre to justify and becomes a lever to pull for performance. The humans in the loop get more meaningful work, not less of it.
At Onepilot, this shift isn't a pivot. It's the model we've been running since day one. We've always billed per ticket resolved, not per hour worked, which means we've always been paid for outcomes, and we've never had a reason to protect repetition. Every minute we save on a ticket is a minute we don't bill for. Every question we can automate is one we, by design, want to automate.
That's why the AI era doesn't force us to reinvent our commercial model. It accelerates it. The incentives that made us good at running human-operated support (solve it faster, solve it better, don't pad the hours) are the exact same incentives that make us good at running AI-native support.
We know which side of that shift we're building on, because we've been on this side the whole time.
Onepilot runs 24/7 in 35+ languages, serving brands in commerce, fintech, and regulated industries across Europe and North America. If you are tired of paying for the same questions to be answered for the thousandth time, it might be time to move beyond the traditional BPO model.


