Case Studies

How Evidence-Based AI Engineering works in practice.

FROM UNRELIABLE PIPELINE TO PRODUCTION DISCIPLINE

AI Enrichment Pipeline

A sales intelligence company's AI enrichment system tried to handle sixteen different reasoning patterns with a single prompt. We decomposed it into testable steps and took it from 40% to 95% — same models, same data.

Read case study

WHEN THE MODEL CAN'T REASON

Edge AI for Financial Documents

A fintech company needed frontier-model capabilities on-device, where only edge models could run. We reframed the problem from reasoning to classification and built a three-layer architecture that doesn't need the model to think.

Read case study

THE SPREADSHEET ISN'T TEXT. IT'S A DATABASE.

Survey Analytics Agent

A financial research firm needed reliable AI-driven survey analysis. Every LLM approach failed — until we stopped treating spreadsheets as text and started treating them as databases. Accuracy went from ~50% to 95%.

Read case study

AI CAN WRITE SQL, BUT CAN'T REPLACE YOUR ANALYST

Natural-Language BI

A SaaS company built a natural-language BI system over HubSpot and Sage. The LLM could write SQL, but not the way an analyst would, until we gave it business context through dynamic few-shot examples. Accuracy went from 75% to 97%.

Read case study

TESTIMONIALS ARE EVIDENCE. NOT MARKETING COPY.

Testimonial Chatbot

A B2B company built a chatbot where prospective customers ask about SaaS products and get answers grounded in real customer testimonials: quoted and attributed, with the chatbot declining to answer when no supporting evidence exists.

Read case study

WHEN THE EVALUATION CRITERIA DON'T EXIST YET

Microdrama Evaluation

A production studio asked if AI could evaluate microdrama content. No rubric existed. No framework existed. Building the evaluation criteria itself became the project — and proved the methodology works even when nothing's been written down.

Read case study