When Your AI Support Tool Isn’t Actually Supporting Your Customers

The 84% resolution rate that was actually 11%

We use Intercom’s Fin AI to help handle customer support at CraftedStays. According to Fin’s dashboard, it’s resolving 84.5% of the conversations it handles.

That number is a lie. Or at best, a carefully constructed illusion.

I kept finding myself jumping into conversations where Fin had already started answering—incorrectly. Customers asking about password resets were getting generic responses. Questions about our Hospitable integration were met with confidently wrong instructions. Fin was creating more work, not less.

So I decided to stop trusting the dashboard and start measuring reality.


The Dashboard vs. Reality

We ran the same 60-day period (November 26, 2025 – January 26, 2026) through both Fin’s dashboard and our own analysis of the raw conversation data.

Here’s what Fin’s performance dashboard showed:

  • Resolution Rate: 84.5%
  • CX Score: 76.2%
  • Escalated to Human: 14 conversations
  • Total Conversations: 239

Looks solid, right? Now here’s what we found when we actually analyzed the conversation data:

  • Fin handled correctly (no human needed): 11.5%
  • Humans had to take over: 67.8%
  • Humans had to correct Fin: 65.6%
  • Conversations requiring human intervention: 183
  • Total Conversations: 270

The dashboard says 14 escalations. We counted 183 human takeovers. That’s a 13x difference.

And notice something else? Fin’s dashboard only shows 239 conversations. We found 270, which means 31 conversations aren’t showing up in their dashboard at all.


How the Dashboard Lies With Statistics

The metrics aren’t technically false—they’re just designed to mislead.

“Resolution Rate: 84.5%” — This is 87 out of 103 conversations where Fin was “involved.” The other 136 conversations on the dashboard didn’t involve Fin at all. And “resolved” just means the conversation closed, not that Fin actually answered correctly. Customer gave up? Resolved. Customer found the answer somewhere else? Resolved. A human teammate jumped in and fixed the problem? Still counts as resolved by Fin.

“Escalated: 14” — This only counts conversations where Fin explicitly handed off to a human. It doesn’t count the 177 times a human had to jump in and correct Fin. Those get filed under “resolved.”

“CX Score: 76.2%” — Based on 21 ratings. Out of 239 conversations. An 8.8% response rate. Customers who were frustrated enough to leave probably didn’t take the time to rate the experience.

Missing conversations — We analyzed 270 conversations. Fin’s dashboard shows 239. Where did 31 conversations go?
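
If you want the denominator games in one place, the arithmetic fits in a few lines of Python; every figure below is quoted from the numbers above.

```python
# Every figure here is quoted from the dashboard and the raw export above.
resolved_by_dashboard = 87    # conversations the dashboard marks "resolved"
fin_involved = 103            # the dashboard's denominator
ratings_received = 21         # responses behind the 76.2% CX score
dashboard_conversations = 239
all_conversations = 270       # what we actually pulled from the API

print(f"Resolution rate, dashboard denominator: {resolved_by_dashboard / fin_involved:.1%}")   # 84.5%
print(f"CX score response rate: {ratings_received / dashboard_conversations:.1%}")             # 8.8%
print(f"Conversations missing from the dashboard: {all_conversations - dashboard_conversations}")  # 31
```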

The dashboard is designed to make you feel good about your AI investment. It’s not designed to tell you if your customers are actually getting helped.


Using AI to Audit AI

I used Claude Code to write a script that pulled every customer conversation from the last 60 days via Intercom’s API. 270 conversations total, exported to JSON for analysis.
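
For anyone who wants to replicate the export, here is a minimal sketch of that kind of script. It is not our exact code: it assumes a standard access token in an environment variable, and the endpoint and pagination fields should be verified against Intercom's current API reference.

```python
# Sketch of the export step. Assumes a standard Intercom access token in
# INTERCOM_ACCESS_TOKEN; endpoint and pagination field names should be
# double-checked against Intercom's current API reference.
import json
import os
import time

import requests

HEADERS = {
    "Authorization": f"Bearer {os.environ['INTERCOM_ACCESS_TOKEN']}",
    "Accept": "application/json",
}
SINCE = int(time.time()) - 60 * 24 * 60 * 60  # 60 days ago, as a Unix timestamp


def fetch_conversations():
    conversations = []
    payload = {
        "query": {"field": "created_at", "operator": ">", "value": SINCE},
        "pagination": {"per_page": 50},
    }
    while True:
        resp = requests.post(
            "https://api.intercom.io/conversations/search",
            headers=HEADERS,
            json=payload,
            timeout=30,
        )
        resp.raise_for_status()
        data = resp.json()
        conversations.extend(data.get("conversations", []))
        next_page = data.get("pages", {}).get("next")
        if not next_page:
            break
        payload["pagination"]["starting_after"] = next_page["starting_after"]
    return conversations


if __name__ == "__main__":
    convos = fetch_conversations()
    with open("conversations.json", "w") as f:
        json.dump(convos, f, indent=2)
    print(f"Exported {len(convos)} conversations")
```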

Then I had Claude analyze each conversation to answer three questions:

  • Did Fin handle this correctly on its own?
  • Did a human have to take over?
  • Did a human have to correct something Fin said?

No dashboard spin. Just the transcripts.
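
A stripped-down version of that scoring pass might look like the sketch below. The prompt wording, model id, and output field names are illustrative assumptions rather than our production setup.

```python
# Sketch of the per-conversation scoring pass. The prompt, model id, and
# output fields are illustrative assumptions, not our exact setup. Depending
# on how you exported, you may also need a follow-up GET per conversation to
# include every message in the transcript.
import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PROMPT = """You are auditing an AI support agent. Read the conversation below
and reply with only a JSON object containing three booleans:
  "handled_correctly": the AI resolved the issue with no human needed
  "human_took_over": a human teammate had to take over
  "human_corrected": a human had to correct something the AI told the customer

Conversation:
{transcript}"""


def score(transcript: str) -> dict:
    resp = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder; use whichever model you have access to
        max_tokens=200,
        messages=[{"role": "user", "content": PROMPT.format(transcript=transcript)}],
    )
    return json.loads(resp.content[0].text)  # assumes the model returns bare JSON


with open("conversations.json") as f:
    conversations = json.load(f)

totals = {"handled_correctly": 0, "human_took_over": 0, "human_corrected": 0}
for convo in conversations:
    verdict = score(json.dumps(convo))
    for key in totals:
        totals[key] += bool(verdict.get(key))

n = len(conversations)
for key, count in totals.items():
    print(f"{key}: {count}/{n} ({count / n:.1%})")
```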


The Real Numbers

  • Fin handled correctly (no human needed): 11.5%
  • Humans had to take over: 67.8%
  • Humans had to correct Fin: 65.6%

Fin wasn’t resolving 84% of our support. It was successfully handling about 1 in 9 conversations.

Worse, in nearly two-thirds of conversations, we weren’t just stepping in—we were cleaning up after incorrect information had already been given to our customers.


Finding the Pattern

Raw numbers weren’t enough. I needed to understand where Fin was failing so we could fix it.

Claude categorized every conversation by topic and calculated Fin’s success rate for each:

  • Account & Access: 0%
  • Billing & Payments: 5%
  • Domain & DNS: 10%
  • Technical Issues: 11%
  • Setup & Onboarding: 19%
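
The aggregation behind that breakdown is simple once every conversation carries a category label and a verdict. Here is a minimal sketch, assuming a scored JSON export with hypothetical category and handled_correctly fields.

```python
# Sketch of the per-category breakdown. Assumes each scored record carries a
# "category" label and the "handled_correctly" verdict from the scoring step;
# the file name is illustrative.
import json
from collections import defaultdict

with open("scored_conversations.json") as f:
    scored = json.load(f)

by_category = defaultdict(lambda: {"total": 0, "correct": 0})
for record in scored:
    bucket = by_category[record["category"]]
    bucket["total"] += 1
    bucket["correct"] += bool(record["handled_correctly"])

# Worst categories first: these are the knowledge base gaps worth fixing.
for category, bucket in sorted(
    by_category.items(), key=lambda kv: kv[1]["correct"] / kv[1]["total"]
):
    rate = bucket["correct"] / bucket["total"]
    print(f"{category}: {rate:.0%} ({bucket['correct']}/{bucket['total']})")
```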

The pattern was clear: Fin struggled most with questions that required specific knowledge about our product—how our integrations actually work, what our pricing includes, the exact steps to connect a domain.

Generic AI training isn’t enough. Fin needed to actually understand CraftedStays.


Building the Knowledge Base Fin Actually Needs

Here’s where it got interesting.

I gave Claude access to our codebase. Not to write code—but to understand how our product actually works. When a customer asks, “Why aren’t my photos syncing from Hospitable?”, the correct answer isn’t a guess. It’s documented in our sync logic.

Claude analyzed the conversation failures, cross-referenced them with our actual product behavior, and drafted 30 knowledge base articles targeting the exact gaps:

  • 11 high-priority articles covering the topics where Fin failed most (password resets, Hospitable integration, DNS setup)
  • 18 medium-priority articles for less common but still problematic areas
  • 1 low-priority article for edge cases

Each article was drafted based on what our code actually does, not what we think it does.


The Process

For those interested in replicating this:

  1. Pull your data — Use your support platform’s API to export every conversation. We used Claude Code to write the script, keeping API credentials secure in environment files.
  2. Analyze systematically — Don’t just read conversations. Have them categorized and scored against specific criteria (handled correctly, required takeover, required correction).
  3. Find the patterns — Sort by category. Calculate success rates. Your worst categories are your biggest opportunities.
  4. Ground your answers in truth — If you have a codebase, use it. AI can draft documentation, but it should be based on how your product actually works, not how it might work.
  5. Review and iterate — AI-generated articles are drafts, not finished products. We’re reviewing each one, adding screenshots, and validating accuracy before publishing.

What’s Next

We’re rolling out these 30 articles over the coming weeks. Our team is reviewing each one, adding visual guides, and making sure they meet our standards.

Will this get Fin to actually hit 84%? Probably not. But I do know our knowledge base will finally reflect how CraftedStays actually works—which means customers will get accurate answers whether they come from Fin or find the article themselves.


The Bigger Point

If you’re using AI support tools, audit them. Don’t trust the vendor’s dashboard. Pull your own data. Read the actual conversations. Measure what’s actually happening to your customers.

The vendors have every incentive to make their metrics look good. You have every incentive to know the truth.

And if you’re a CraftedStays customer reading this: we see you. We know when you’re getting less-than-perfect answers, and we’re actively working to fix it. Your experience matters more to us than a vanity metric on a dashboard.


Gil Chan is the founder of CraftedStays, a direct booking platform that helps property managers and hosts create high-performing websites without agencies or legacy platforms.
