CASE STUDY
How a B2B company built a chatbot where prospective customers get answers grounded in what real users actually said — quoted, attributed, and backed by evidence.
The company helps SaaS businesses collect and leverage customer testimonials (video transcripts, case studies, promotional materials). Its core product turns these into marketing assets. But testimonials have a second, harder use: they're social proof. A prospective buyer evaluating a SaaS product doesn't want to read the vendor's feature page. They want to know what a VP of Sales at a similar company actually experienced.
The new product: a chatbot where prospective customers ask questions about a SaaS product and get answers grounded in what real users have said. Not generated marketing language. Real quotes, real experiences, real authority.
It looks like a standard RAG problem. It isn't.
Standard RAG retrieves documents that were written to answer questions — knowledge bases, documentation, FAQs. Testimonials weren't written to answer anything. They're people talking about their experience in their own words, in their own structure, about whatever they felt was important.
CHALLENGE 1
A prospect asks "how long does onboarding take?" but the relevant testimonial says "we were fully running within the month." Naive embedding similarity alone misses too many matches like this.
CHALLENGE 2
"Onboarding typically takes three weeks" is just an AI making claims. "A VP of Engineering said his team was fully running within three weeks" is social proof. That difference is the entire product.
CHALLENGE 3
Hundreds of testimonials across multiple client companies. Some contradict each other. The system must select material that fits the arc of the conversation, not just the last query.
The semantic gap between how prospects ask questions and how customers describe experiences is the core retrieval challenge.
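A common way to narrow this gap is two-stage retrieval: a cheap embedding search tuned for recall, followed by a more expensive reranker that reads the query and each passage together. The sketch below is a minimal, dependency-free illustration under stated assumptions, not the company's code: embeddings are precomputed, `rerank_score` stands in for whatever scorer is used (for example a cross-encoder), and the names `Chunk`, `retrieve_and_rerank`, `recall_k`, and `final_k` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Chunk:
    # Hypothetical record: chunk text plus a precomputed embedding vector.
    text: str
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_and_rerank(query: str, query_embedding: Sequence[float],
                        chunks: List[Chunk],
                        rerank_score: Callable[[str, str], float],
                        recall_k: int = 50, final_k: int = 5) -> List[Chunk]:
    # Stage 1: cheap, recall-oriented candidate generation by embedding similarity.
    candidates = sorted(chunks, key=lambda c: cosine(query_embedding, c.embedding),
                        reverse=True)[:recall_k]
    # Stage 2: a costlier scorer (hypothetical callable) reads query and chunk
    # together and reorders the candidates; only the top final_k are kept.
    reranked = sorted(candidates, key=lambda c: rerank_score(query, c.text),
                      reverse=True)
    return reranked[:final_k]
```

Keeping `recall_k` generous matters: the reranker can only reorder what stage 1 hands it, so a customer phrasing that embeds poorly still needs to make the candidate list.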
The retrieval pipeline required more work than the generation layer, and it had three parts. First, pre-processing the testimonial corpus: cleaning transcripts, segmenting long case studies into retrievable chunks that preserve context, and normalizing the differences between a polished case study and a raw video transcript. Second, retrieval with reranking, to close the semantic gap between prospect language and customer language. Third, conversation-aware context, so the system selects material that fits the arc of what the prospect is exploring, not just the last question they typed.
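Segmentation that preserves context can be sketched as follows, assuming each testimonial arrives with speaker role, company profile, and source type metadata (hypothetical fields). Transcripts get a light cleanup, text is split into overlapping sentence windows, and each chunk is prefixed with its attribution so that a chunk retrieved on its own still says who said it. The window size, overlap, and filler-word regex are illustrative assumptions, not known production settings.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    # Hypothetical chunk record; the metadata fields are assumptions.
    text: str          # what gets embedded and retrieved (attribution prefix + body)
    speaker_role: str  # e.g. "VP of Sales"
    company: str       # e.g. "Series B company"
    source_type: str   # e.g. "video_transcript" or "case_study"


FILLERS = re.compile(r"\b(?:um+|uh+)\b,?\s*", re.IGNORECASE)


def clean_transcript(raw: str) -> str:
    # Light cleanup only: drop filler words and collapse whitespace.
    return re.sub(r"\s+", " ", FILLERS.sub("", raw)).strip()


def segment(raw: str, speaker_role: str, company: str, source_type: str,
            window: int = 3, overlap: int = 1) -> List[Segment]:
    """Split a testimonial into overlapping sentence windows with attribution."""
    if not 0 <= overlap < window:
        raise ValueError("need 0 <= overlap < window")
    text = clean_transcript(raw) if source_type == "video_transcript" else raw.strip()
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    if not sentences:
        return []
    header = f"[{speaker_role}, {company}, {source_type}] "
    step = window - overlap
    segments = []
    # Consecutive windows share `overlap` sentences, so text near a window
    # boundary keeps some of its surrounding context in both chunks.
    for start in range(0, max(len(sentences) - overlap, 1), step):
        body = " ".join(sentences[start:start + window])
        segments.append(Segment(header + body, speaker_role, company, source_type))
    return segments
```

Putting the attribution into the embedded text, not only into metadata, is a design choice: it lets role-specific questions ("what did engineering leaders say?") match on the prefix as well.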
The generation layer sits on top of this and does three things. It synthesizes across multiple pieces of source material when a single testimonial doesn't fully answer the question. It quotes directly when a customer said something that carries authority — and attributes it properly ("a VP of Sales at a Series B company" rather than inventing a name or dropping the attribution). And it refuses to answer when the testimonial corpus doesn't contain relevant evidence.
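The refusal rule and the attribution format can both be enforced before the model is ever called. A minimal sketch, assuming reranker scores in [0, 1] and an arbitrary `min_score` cutoff of 0.5 (a value to tune against benchmarks, not a known setting): if nothing clears the bar, the function returns `None` and the caller sends a fixed refusal; otherwise it builds a prompt that contains only labeled, attributed evidence. `Evidence`, `build_grounded_prompt`, and the prompt wording are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Evidence:
    # Hypothetical record for one retrieved testimonial excerpt.
    quote: str
    speaker_role: str  # e.g. "VP of Sales"
    company: str       # e.g. "a Series B company"
    score: float       # reranker relevance, assumed to lie in [0, 1]


REFUSAL = "None of the customer testimonials address this question, so I won't guess."


def attribution(e: Evidence) -> str:
    # Attribute by role and company profile; never invent a personal name.
    article = "an" if e.speaker_role[:1].lower() in "aeiou" else "a"
    return f"{article} {e.speaker_role} at {e.company}"


def build_grounded_prompt(question: str, evidence: List[Evidence],
                          min_score: float = 0.5,
                          max_items: int = 4) -> Optional[str]:
    """Return a generator prompt, or None when the evidence is insufficient."""
    usable = sorted((e for e in evidence if e.score >= min_score),
                    key=lambda e: e.score, reverse=True)[:max_items]
    if not usable:
        return None  # caller replies with REFUSAL instead of calling the model
    lines = [f'[{i}] {attribution(e)}: "{e.quote}"' for i, e in enumerate(usable, 1)]
    return (
        "Answer the prospect's question using ONLY the testimonials below.\n"
        "Quote directly where a customer's own words carry weight, attribute each\n"
        "quote using its label, and combine items if no single one is complete.\n"
        "If the testimonials do not answer the question, say so.\n\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )
```

The caller answers with `REFUSAL` whenever the function returns `None`, so an empty or weak evidence set never reaches the model at all.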
It's not a helpful AI assistant. It's a social proof engine. If there's no proof, it says so.
This last part is critical. The moment the chatbot starts generating answers that aren't grounded in real testimonials, the entire product breaks.
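One inexpensive safeguard, offered here as an assumption rather than a description of the production system, is to check every quoted span in a generated answer against the retrieved sources and reject the answer if any quote cannot be found verbatim. The function name `unsupported_quotes` is hypothetical. Matching is case-insensitive and whitespace-normalized, and it only inspects text inside double quotes, so paraphrased claims would need a separate check.

```python
import re
from typing import List


def _normalize(s: str) -> str:
    # Case-fold, unify curly apostrophes, and collapse whitespace.
    s = s.replace("\u2019", "'").replace("\u2018", "'")
    return re.sub(r"\s+", " ", s).strip().lower()


def unsupported_quotes(answer: str, sources: List[str]) -> List[str]:
    """Return quoted spans in `answer` that appear in none of the sources."""
    # Matches straight or curly double quotes around a span.
    quoted = re.findall(r'["\u201c]([^"\u201d]+)["\u201d]', answer)
    normalized_sources = [_normalize(src) for src in sources]
    return [q for q in quoted
            if not any(_normalize(q) in src for src in normalized_sources)]
```

If the returned list is non-empty, the answer can be regenerated or replaced with the refusal message rather than shown to the prospect.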
LAYER 1
Retrieval pipeline
Corpus cleanup, segmentation, reranking, conversation-aware selection.
LAYER 2
Generation layer
Synthesize. Quote with attribution. Refuse when evidence is missing.
LAYER 3
Benchmarks
Prospect questions mapped to the testimonials they should retrieve and the expected response quality, covering every question type.
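A benchmark of this shape can be scored with a small harness. The sketch assumes each case lists the testimonial chunk IDs it should surface, with an empty set meaning the correct behavior is to refuse, and that `cited_ids` returns the IDs of the evidence the system actually cites (an empty list when it refuses). It reports recall@k for answerable questions and refusal accuracy for unanswerable ones, averaged per question type. Response quality (quoting, attribution) would need a separate grader; all names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class BenchmarkCase:
    question: str
    question_type: str  # e.g. "onboarding", "integrations", "out_of_scope"
    # IDs of testimonial chunks the system should cite; empty means it should refuse.
    expected_ids: Set[str] = field(default_factory=set)


def run_benchmark(cases: List[BenchmarkCase],
                  cited_ids: Callable[[str], List[str]],
                  k: int = 5) -> Dict[str, float]:
    """Mean score per question type: recall@k if answerable, refusal hit otherwise."""
    per_type: Dict[str, List[float]] = {}
    for case in cases:
        got = cited_ids(case.question)[:k]  # empty list means the system refused
        if case.expected_ids:
            score = len(case.expected_ids & set(got)) / len(case.expected_ids)
        else:
            score = 1.0 if not got else 0.0
        per_type.setdefault(case.question_type, []).append(score)
    return {qtype: sum(s) / len(s) for qtype, s in per_type.items()}
```

Scoring per question type, rather than one global number, keeps a strong onboarding score from hiding a weak refusal rate.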
The product is live, serving real prospective customers across multiple SaaS client companies. Prospects ask questions and get answers backed by what actual users said — quoted when appropriate, attributed with enough authority to matter, grounded in evidence rather than generated from the model's training data.
The underlying insight is about what RAG is actually for. Most RAG implementations treat it as "search your documents and summarize." This one treats it as an evidence system. The model isn't the authority. The testimonials are. The model's job is to find the right evidence, present it credibly, and stay silent when the evidence doesn't exist.
Don't let the model be the expert. Let it find the expert.
If your AI system works in demos but not reliably in production, let's talk.