Fast Code AI Chronicles — Chapter 2: The Messy Data Problem

How We Used Open-Source LLMs to Standardise 50,000 Retail Product Names a Day — and Automated 68% of the Work.


Ever tried buying… "Coc Col 12pk"?

Confusing, right?

A computer will file it, store it, list it, and serve it to a customer searching for Coca-Cola Classic 12-Pack — and completely miss the match.

That's not a made-up example. That's real data. From a real store. Sitting in a real database. Powering a real on-demand delivery marketplace.

Now imagine millions of these. Every product in every supermarket, supplied by a different manufacturer, each with its own creative interpretation of what a product name should look like.

Welcome to the world of retail product data. And welcome to Chapter 2.

The Problem

The job of an on-demand delivery platform is to ensure that every product listed has a clean, standardised name — the kind of name a customer can actually search for and recognise.

However, they don't control the data. Manufacturers do. And manufacturers, it turns out, are spectacularly bad at naming their own products consistently.

Here's what the raw data actually looked like:

  • "Coc Col 12pk" → Coca-Cola Classic 12-Pack
  • "BUD LT 24PK BTLS" → Bud Light 24-Pack Bottles
  • "HNKN 6PK BTL" → Heineken 6-Pack Bottles

To give you a real sense of the challenge, here's a simplified record exactly as it arrived — and what we needed the AI to output:

Raw Input (Messy Manufacturer Data)

{
  "Manufacturer_Category_1": "Beer",
  "Manufacturer_Category_2": "Craft Beer",
  "Brand_Path": "Stoneface Brewing Co",
  "Manufacturer_Item_Name": "STONEFACE MOZ ACCALYPSE DDH",
  "Manufacturer_Size": "4 pk",
  "upc": 636251776120
}

Expected Output (Standardised for E-commerce)

{
  "Standardized_Product_Name": "Stoneface Brewing Co Mozaccalypse Double Dry Hopped IPA",
  "Category": "Alcohol > Beer > Craft Beer",
  "Size": "16 oz x 4 ct"
}

Look at that input. "STONEFACE MOZ ACCALYPSE DDH" needs to become "Stoneface Brewing Co Mozaccalypse Double Dry Hopped IPA." The abbreviation "DDH" needs to be expanded. The misspelling needs to be corrected. The size "4 pk" needs to become "16 oz x 4 ct", which means the individual container size isn't even in the raw data and needs to be looked up.

Spelling errors. Cryptic abbreviations. Missing details. No standard format. And this wasn't a few hundred records. This was millions.
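In the actual pipeline, the reasoning LLM handles this expansion in context. But the deterministic core of the idea can be sketched as a token-level lookup over known abbreviations — the map below is illustrative, not the production vocabulary:

```python
import re

# Illustrative abbreviation map -- the real expansions were discovered
# through prompt iteration, not hard-coded like this
ABBREVIATIONS = {
    "pk": "Pack",
    "btl": "Bottle",
    "btls": "Bottles",
    "lt": "Light",
    "ddh": "Double Dry Hopped",
}

def expand_abbreviations(raw_name: str) -> str:
    """Split fused tokens like '12pk' into '12' + 'pk', then expand known codes."""
    tokens = re.findall(r"\d+|[A-Za-z]+", raw_name)
    return " ".join(ABBREVIATIONS.get(t.lower(), t) for t in tokens)
```

So "Coc Col 12pk" becomes "Coc Col 12 Pack" — but the misspelled brand is untouched, which is exactly why a lookup table alone was never going to be enough.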

The Human Bottleneck

Before us, every single record was cleaned by hand.

A person would look at a garbled product name, cross-reference it against manufacturer databases, Google the UPC code if needed, and manually rewrite it into the correct format.

The numbers:

  • 100 records per day per person
  • 10% error rate — meaning roughly 1 in 10 still had mistakes after human review

At the scale an on-demand delivery platform operates, this was unsustainable. You can't throw more people at a data problem that grows faster than your headcount. And every error that slips through means a customer can't find what they're looking for — which means lost sales for the retailer.

They needed a fundamentally different approach.

That's when they called us.

Why This Problem Is Deceptively Hard

On the surface, "clean up a product name" sounds like a simple text transformation. Run it through an LLM, get a clean name back, move on.

It's not.

Here's what makes it genuinely difficult:

The rules don't exist upfront. A standardised product name needs to follow a consistent format and include all relevant information. But which details matter depends on the product category. Beverages need container type (Bottle, Can). Other categories have their own requirements. These rules weren't documented anywhere. They had to be discovered iteratively, painfully, by evaluating model outputs against client expectations and working backwards to figure out what the "right" format actually looked like.

Information is scattered across multiple sources. The manufacturer record might say "Coc Col 12pk", which tells you almost nothing useful. To construct the full, standardised name, we had to pull from three different sources:

  1. Manufacturer product details — including product descriptions, images, and UPC codes — often incomplete or inconsistent
  2. UPC-based Google searches — to find supplementary product details when the manufacturer data wasn't enough
  3. Product images — to visually identify things like container type (bottle, can, pouch, box) that weren't mentioned in the text data at all
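A minimal sketch of how those three sources might be merged into a single context object for the reasoning model — here `upc_lookup` and `image_attrs` are hypothetical stand-ins for the UPC web-search and vision steps, not real APIs:

```python
def build_llm_context(record, upc_lookup, image_attrs):
    """Assemble the evidence the reasoning LLM sees for one product record.

    upc_lookup(upc)  -- hypothetical helper wrapping the UPC-based web search
    image_attrs(upc) -- hypothetical helper wrapping the vision pipeline
    """
    return {
        "manufacturer": {
            "name": record.get("Manufacturer_Item_Name"),
            "brand": record.get("Brand_Path"),
            "size": record.get("Manufacturer_Size"),
            "categories": [record.get("Manufacturer_Category_1"),
                           record.get("Manufacturer_Category_2")],
        },
        "upc_search": upc_lookup(record["upc"]),
        "image": image_attrs(record["upc"]),
    }
```

The point of the structure: the model never has to guess which source a detail came from, and missing fields stay visibly missing rather than silently dropped.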

Edge cases are everywhere. Multi-packs. Seasonal variants. Regional naming differences. Promotional packaging. Not every rule applies to every product, and the long tail of edge cases in retail data is enormous.

The Solution: A Multi-Model Reasoning Pipeline

We didn't build a simple prompt-and-respond system. We built a reasoning pipeline that had multiple models working together, cross-checking each other, and flagging uncertainty for human review.

And we built it entirely on open-source models.

The Model Stack

  • DeepSeek R1 — Constructs the standardised product name from raw data.
  • Nemotron Ultra 70B — Verifies the output and flags inconsistencies.
  • Qwen QwQ 32B — Cross-validates and handles edge cases.

How The Multi-Model Verification Works

Rather than trusting any single model's output, we built a triangulation system. All three reasoning models process the same record simultaneously, and their outputs are compared:

  • All 3 match → High Confidence (auto-approved)
  • 2 out of 3 match → Medium Confidence
  • All 3 different → Low Confidence (flagged for human review)

This is the key insight: instead of trying to make one model perfect, we used multiple models to keep each other honest. It's the same principle behind peer-reviewed research — consensus is more reliable than confidence.
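The mechanics of that triangulation can be sketched in a few lines, assuming outputs are compared after light normalisation (the production comparison was likely more forgiving than exact string matching):

```python
from collections import Counter

def triangulate(outputs):
    """Compare three model outputs; return (consensus_name, confidence)."""
    # Case- and whitespace-insensitive comparison before voting
    normed = [" ".join(o.split()).lower() for o in outputs]
    top, count = Counter(normed).most_common(1)[0]
    if count == 3:
        return outputs[normed.index(top)], "high"    # auto-approved
    if count == 2:
        return outputs[normed.index(top)], "medium"
    return None, "low"                               # flagged for human review
```

For example, two models saying "Bud Light 24-Pack Bottles" and one saying "Budweiser 24-Pack" lands in the medium bucket, while three different answers go straight to a human.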

The Vision Component

Some product attributes, especially container type, can't reliably be extracted from text: some records include that detail, but plenty don't.

So we processed manufacturer product images through a computer vision pipeline to identify container types visually: Bottle, Can, and other packaging formats. That information was then fed into the reasoning LLM's context so it could construct a complete, accurate product name.
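One way that feed-in might look, assuming a vision classifier that returns a (label, score) pair over known container types — the classifier interface and the confidence threshold here are both assumptions, not the production values:

```python
CONTAINER_TYPES = ["Bottle", "Can", "Pouch", "Box"]

def enrich_with_vision(context, image, classify):
    """Add a visually-detected container type to the LLM's context.

    classify(image) is a hypothetical vision-model call returning
    (label, score), with label drawn from CONTAINER_TYPES.
    """
    label, score = classify(image)
    if score >= 0.8:  # assumed threshold: only inject confident detections
        context["container_type"] = label
    return context
```

Low-confidence detections are simply omitted, so the reasoning model falls back on text evidence rather than being misled by a shaky guess.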

The Results

Here's the before and after:

  • 100 → 50,000 records processed per day. That's a 500x increase.
  • 10% → 6% error rate. That's 40% fewer mistakes.
  • 100% → 32% of records needing human review. 68% now run fully automated.
  • Scalability? Unlimited. Any volume. On demand.

A few things worth calling out: 68% full automation means more than two out of every three records go from raw manufacturer data to standardised product name with zero human involvement. The remaining 32% are flagged because the multi-model verification found disagreement, which is exactly the kind of record you want a human to look at.

What We Learned

Building this system taught us more about working with LLMs than any tutorial or benchmark ever could. A few takeaways:

The prompt is the product. In systems where the rules are complex and constantly evolving, prompt engineering isn't a one-time setup. It's a continuous refinement process that demands advanced prompt engineering and deep domain understanding. The rules weren't just written, they were discovered, tested, broken, rewritten, and tested again.

Multi-model verification beats single-model confidence. Don't try to make one model perfect. Run the same task through multiple models and compare their outputs. When they agree, confidence is high. When they disagree, you've automatically identified the records that need human attention. It's the difference between hoping for accuracy and engineering for it.

Infrastructure flexibility matters as much as model quality. The move from fixed Azure VMs to RunPod's pay-per-minute billing didn't make our models smarter. But it made our solution economically viable at scale. Choose infrastructure that matches your usage pattern, not just your compute needs.

What's Next

This project taught us that the real challenge in AI isn't the models — it's the systems around them. The data pipelines. The verification patterns. The prompt evolution. The infrastructure decisions.

We went deep on the engineering craft behind this project — the mega-prompt failures, the debugging tricks, the confidence detection hacks. That's coming in Chapter 2.5: 6 Hard Lessons from an LLM Data Pipeline.

If you're building with LLMs and want to skip some of the mistakes we made, that one's for you.

And if you're sitting on messy data at scale — whether it's product names, medical records, financial documents, or anything else — this pattern of multi-model reasoning, iterative prompt engineering, and smart infrastructure is remarkably versatile.

Messy data? We don't clean it. We solve it.
— Fast Code AI
