Operations · 9 min read

Customers never send a part number:
how to quote from photo, VIN, and WhatsApp audio

In auto parts, less than 15% of quotes arrive with an exact part number. The rest arrives as a photo of the broken piece, the vehicle VIN, an audio describing "the noise when braking," or a screenshot of another quote. That translation is done today by a human rep — and it costs time, errors, and lost sales. Here's how AI processes each format, what changes operationally, and where it still needs human intervention. Applies to retailers, wholesalers, distributors, and importers across Mexico, Colombia, Argentina, Chile, and Peru.

Victoria · Quoting digital collaborator

Suplifai · Published May 23, 2026

What the customer actually sends

The customer isn't a technical expert. They know the problem — their car isn't braking well, the truck has a strange noise, a part broke — and they send whatever they have on hand. They almost never send the data the catalog needs.

In auto parts, less than 15% of quotes arrive with the right part number (OEM or aftermarket). The other 85% comes in formats that a human rep has to translate into something searchable. That translation is invisible work that nobody measures but everybody pays for.

The bottleneck in quoting isn't the number of quotes. It's that every quote starts with an input that has to be interpreted before it can be quoted.

The 4 unstructured input formats

Four formats cover virtually all quotes arriving via WhatsApp in auto parts. Proportions vary by business type (the corner retailer gets more photos, the wholesaler more VINs and numbers), but the mix always includes all four.

Format 1
Photo of the part
~40% of auto parts quotes
Real example
[Photo attached: a wheel bearing with grease and dust, still mounted on the car, no visible label] "How much for this?"
The most common format. The customer snaps a photo of the car or of the part they removed; it is often dirty, poorly lit, or shows the part still installed. The part-number label is rarely visible. The photo may show the broken part, the replacement they want, or the spot where the part goes.
Human rep: spends 3-8 minutes identifying the part, asking for more angles, trying to read marks or numbers. Often ends up asking for the VIN to confirm the variant.
How AI processes it: vision models (multimodal) recognize the part, identify distinguishing features (shape, relative size, connections), read the part number if visible, and propose identification with a confidence level. If confidence is low, they ask for complementary info before quoting — instead of guessing.
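The "ask instead of guessing" step can be sketched as a simple confidence gate. This is a minimal illustration under stated assumptions, not a description of any particular product: the `PartGuess` structure, the 0.80 cutoff, and the action names are hypothetical, and the vision model that would produce the guess is out of scope here.

```python
from dataclasses import dataclass

# Hypothetical cutoff; in practice it would be tuned against real return data.
CONFIDENCE_THRESHOLD = 0.80


@dataclass
class PartGuess:
    """Assumed output of a vision model: a part label plus a 0-1 confidence."""
    part_name: str
    confidence: float


def next_action(guess: PartGuess) -> str:
    """Quote when the model is confident; otherwise ask the customer for more."""
    if guess.confidence >= CONFIDENCE_THRESHOLD:
        return "quote"
    return "ask_for_more_info"


print(next_action(PartGuess("wheel bearing", 0.92)))  # quote
print(next_action(PartGuess("wheel bearing", 0.41)))  # ask_for_more_info
```

The point of the design is that a low-confidence guess never turns into a quote; the cost of one extra question is traded against the cost of a return.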
Format 2
Vehicle VIN
~25% of quotes (climbs to ~40% in distributors and wholesalers)
Real example
"I need front brake pads, my VIN is 3VW1K7AJ8FM263451"
The VIN is 17 characters with international standard encoding (ISO 3779). It lets you identify the vehicle's make, model, year, engine, and submodel, and from that filter which parts apply. The customer copies it from the vehicle documents, sometimes incomplete or with transcription errors (I written in place of 1, O in place of 0). It's one of the cleanest inputs when it arrives intact, and one of the most frustrating when it doesn't.
Human rep: pastes the VIN into catalog software (one click when it works, multiple clicks when the lookup fails due to a transcription error), validates make/model, searches applications, sends the quote. Takes 2-5 minutes when everything works.
How AI processes it: validates the VIN (length, check digit, prohibited characters), decodes it against databases (NHTSA + regional decoders for local brands), retrieves make/model/year/engine, filters the catalog to compatible parts, and quotes. If the VIN has an obvious transcription error (I instead of 1), it suggests a correction instead of failing the lookup.
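The validation step (length, prohibited characters, check digit) can be sketched in a few lines. This is a minimal sketch, not a full decoder: the function names and the returned status strings are hypothetical, and the check-digit rule used is the North American one (position 9, weighted sum modulo 11). Vehicles built for markets that do not use a check digit would need softer handling of a mismatch. The sample VIN is a widely published test value, not a customer vehicle.

```python
import re

# Letter values used by the North American check-digit scheme.
TRANSLITERATION = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
# Position weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]

# I, O and Q are never valid in a VIN, so they are likely transcription slips.
LOOKALIKES = str.maketrans({"I": "1", "O": "0", "Q": "0"})


def check_digit(vin: str) -> str:
    """Compute the expected character at position 9 for a 17-character VIN."""
    total = sum(TRANSLITERATION[c] * w for c, w in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def validate_vin(raw: str) -> dict:
    """Return a status dict: ok, suggest_correction, or invalid (with reason)."""
    vin = re.sub(r"[\s-]", "", raw).upper()
    if len(vin) != 17:
        return {"status": "invalid", "reason": "length", "vin": vin}
    if re.search(r"[^A-Z0-9]", vin):
        return {"status": "invalid", "reason": "characters", "vin": vin}
    if re.search(r"[IOQ]", vin):
        # Propose the most likely fix instead of failing outright.
        fixed = vin.translate(LOOKALIKES)
        return {"status": "suggest_correction", "reason": "prohibited_characters", "vin": fixed}
    if check_digit(vin) != vin[8]:
        return {"status": "invalid", "reason": "check_digit", "vin": vin}
    return {"status": "ok", "reason": None, "vin": vin}


print(validate_vin("1M8GDM9AXKP042788")["status"])  # ok
print(validate_vin("1M8GDM9AXKPO42788")["status"])  # suggest_correction
```

Only a VIN that passes these checks would go on to a decoder lookup; the suggested correction is meant to be confirmed with the customer rather than applied silently.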
Format 3
WhatsApp audio or descriptive text
~15% of quotes
Real example (audio)
"Look, this is the bearing that goes next to the wheel, the front one on the driver's side, on my old 2008 Tsuru, this one makes noise when I brake..."
Conversational description is the most information-rich format, because the customer gives the whole context, and at the same time the hardest to process. It mixes regional slang ("balero" in MX, "rodamiento" in CO, "ruleman" in AR), informal vehicle references, and often includes the symptom ("makes noise when braking") that helps confirm the part.
Human rep: mentally transcribes, translates slang to technical terms, infers model if the customer was vague, searches the catalog. When it's a long audio (>30 sec), often needs to listen 2-3 times. Takes 4-7 minutes.
How AI processes it: transcribes the audio (models like Whisper handle LATAM accents with good accuracy), interprets regional slang contextually, identifies the vehicle mentioned, infers the part using both the name and the described symptom. If anything is ambiguous, it asks before quoting.
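Once the audio is transcribed, the slang step can be illustrated with a plain glossary lookup. This is a deliberately simple sketch: the glossary entries, the canonical names, and the function names are assumptions made for the example, and a real system would interpret terms in context rather than word by word. It does not handle plurals or multi-word expressions.

```python
import re
import unicodedata
from typing import List

# Illustrative glossary only; a production glossary would be built per market.
SLANG_TO_CANONICAL = {
    "balero": "bearing",
    "rodamiento": "bearing",
    "ruleman": "bearing",
    "mofle": "exhaust",
    "escape": "exhaust",
}


def strip_accents(text: str) -> str:
    """Remove diacritics so that 'rulemán' and 'ruleman' match the same key."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def canonical_parts(transcript: str) -> List[str]:
    """Return canonical part names found in a transcript, in order, no repeats."""
    words = re.findall(r"[a-z]+", strip_accents(transcript.lower()))
    found: List[str] = []
    for word in words:
        part = SLANG_TO_CANONICAL.get(word)
        if part and part not in found:
            found.append(part)
    return found


print(canonical_parts("Necesito el rulemán delantero y un mofle para mi Tsuru"))
# ['bearing', 'exhaust']
```

An empty result is the signal to ask the customer a clarifying question instead of quoting.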
Format 4
Screenshot
~20% of quotes (rises in professional buyers and large workshops)
Real example
[Screenshot of a quote from another supplier] "Can you match this price or beat it?"
The most common screenshots are: quote from another supplier (customer wants to match price), manufacturer catalog page (looking for part number confirmation), Google search result, or screenshot of the customer's ERP. The customer sends it because they already did the search work and want to speed things up.
Human rep: reads the screenshot, copies the data into the system by hand, validates against their own catalog, adjusts price or substitutes brand. Takes 3-6 minutes. If it's a competitor screenshot, often has to make a pricing decision on the spot.
How AI processes it: OCR + structural parsing extracts part numbers, prices, and mentioned brands. If it's a competitor quote, cross-references against own catalog and offers the direct part or a substitute. If it's a manufacturer page, validates the part number and runs the search.
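The structural-parsing step after OCR can be sketched with regular expressions. Everything specific here is an assumption for illustration: the brand list, the part-number pattern (a token of five or more characters mixing letters and digits), the Mexican-style price format ($1,420.50), and the sample part numbers, which are made up. Real supplier quotes vary far more than this.

```python
import re
from dataclasses import dataclass
from typing import Optional

KNOWN_BRANDS = {"BOSCH", "BREMBO", "MONROE"}  # illustrative list

# Price such as $850.00 or $1,420.50 (comma thousands, dot decimals).
PRICE_RE = re.compile(r"\$\s?(\d[\d,]*(?:\.\d{1,2})?)")
# Token of 5+ characters that contains at least one digit and one letter.
PART_NUMBER_RE = re.compile(r"\b(?=[A-Z0-9-]*\d)(?=[A-Z0-9-]*[A-Z])[A-Z0-9-]{5,}\b")


@dataclass
class QuoteLine:
    part_number: Optional[str]
    brand: Optional[str]
    price: Optional[float]


def parse_ocr_line(line: str) -> QuoteLine:
    """Extract part number, brand, and price from one line of OCR text."""
    text = line.upper()
    part = PART_NUMBER_RE.search(text)
    price = PRICE_RE.search(text)
    brand = next((w for w in re.findall(r"[A-Z]+", text) if w in KNOWN_BRANDS), None)
    return QuoteLine(
        part_number=part.group(0) if part else None,
        brand=brand,
        price=float(price.group(1).replace(",", "")) if price else None,
    )


print(parse_ocr_line("Balatas delanteras Bosch BP1234 $850.00"))
```

Fields that come back as `None` mark what still has to be confirmed before the line can be cross-referenced against the catalog.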

The hidden cost of human translation

The work of translating an unstructured format into a searchable input has three costs that don't show up on the balance sheet:

1. Time: 30-50% of every quote

Taking the per-format ranges together (3-8 min for photo, 2-5 for VIN, 4-7 for audio, 3-6 for screenshot), translating the input takes roughly 3-7 minutes per request before the rep can start quoting. As a share of total quoting time (which rarely drops below 10 minutes), that's 30-50%. With 1,500 quotes/month and a 5-minute average, that's 7,500 minutes, or 125 hours/month of invisible work, equivalent to about 0.7 FTE just interpreting inputs.
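The arithmetic behind those figures is easy to rerun with your own volumes. The quote count and the 5-minute average come from the example above; the 176 working hours per month (22 days of 8 hours) is an assumption introduced here to turn hours into FTE, and the function name is hypothetical.

```python
def translation_cost(quotes_per_month: int, minutes_per_quote: float,
                     fte_hours_per_month: float = 176.0) -> tuple:
    """Return (hours per month, FTE) spent interpreting inputs before quoting."""
    hours = quotes_per_month * minutes_per_quote / 60
    return hours, hours / fte_hours_per_month


hours, fte = translation_cost(1_500, 5)
print(f"{hours:.0f} hours/month, {fte:.2f} FTE")  # 125 hours/month, 0.71 FTE
```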

2. Errors: half of returns come from bad identification

When the rep guesses wrong from an ambiguous photo or audio, the customer receives the wrong part. Misidentification returns account for 40-60% of all returns in the sector. Each return costs logistics, team time, and, most expensive of all, customer trust.

3. Loss: when the rep asks for more info, the customer leaves

If the rep can't identify the part from what the customer sent, they ask for more (another photo, the VIN, make/model). That question consumes the customer's time — and in auto parts, where the customer sent the same request to multiple suppliers in parallel, it's often enough to lose the sale to whoever identified the part on the first try.

Human translation of the input isn't just a capacity bottleneck. It's the main source of identification errors and of quotes lost to friction.

What changes operationally when AI processes it

Replacing human translation with automated processing isn't just "faster". It changes 4 operational metrics at once:

A typical distributor with 1,500 monthly quotes sees, in the first 60 days, response time dropping from ~25 minutes to <1 minute, identification-error returns falling from 6% to 2-3%, and quote-to-order conversion rising 5-12 percentage points.

What AI still doesn't solve well

In the interest of technical honesty: there are 4 scenarios where AI fails or requires human intervention, and they are worth knowing before implementing:

1. Unusable photo

When the photo is completely blurry, too dark, or shows something irrelevant (the customer's hand, part of the workshop floor), AI can't infer the part. It asks for another photo. If the customer can't send one, it escalates to a human.

2. Incomplete, mistyped, or nonexistent VIN

VINs issued before 1981 don't follow the standard 17-character format, rare imported vehicles may not be in the databases, and transcription errors are sometimes unrecoverable without a photo of the document. In those cases AI asks for a photo of the VIN or escalates.

3. Very local or ambiguous slang

Sector vocabulary varies notably across LATAM markets. In Mexico a customer asks for a balero for a Tsuru or a mofle for their Chevy; in Colombia the same bearing is a rodamiento, requested for a Renault Logan; in Argentina it's a ruleman for a VW Gol or Renault 12. In Chile the slang is similar to Colombia's, but the Hyundai Accent and Nissan V16 dominate; in Peru the mix combines MX and CO terms with high Asian-brand penetration. "The one above the driveshaft" can be any of several parts depending on the vehicle. AI asks before quoting the wrong part, but that question adds friction that in some cases costs the sale. Mitigation: train the digital collaborator with vocabulary specific to the market it operates in (an MX collaborator shouldn't assume it knows Argentine lunfardo).

4. Custom or very low-volume vehicles

Modified vehicles, grey imports, and public-service vehicles with special parts are all cases where the standard catalog doesn't apply. They're escalated to a human with all the context AI has already gathered (photo, VIN, description).

In practice, a well-implemented AI handles 70-85% of quotes end-to-end and routes the rest to the human team. The difference from the status quo: when it routes, it routes with full context. The rep doesn't start from scratch.

How to start processing unstructured inputs

Recommended order to incorporate this type of processing:

Frequently asked questions

What percentage of quotes arrives with an exact part number?

In the auto parts sector across LATAM, less than 15% of quotes arrive with a direct, correct part number (OEM or aftermarket). The rest arrives as a photo (~40%), vehicle VIN (~25%), audio or descriptive text in regional slang (~15%), or screenshot (~5-20% depending on customer type). The human rep has to translate all those formats before being able to quote.

Can AI quote from a blurry or low-quality photo?

It depends on the level of degradation. Current vision AI (multimodal models) works with imperfect photos (dirty, partially mounted, poorly lit) and still identifies the part with reasonable accuracy if there are distinguishing features. When the photo is unreadable, it asks the customer for more information instead of guessing. That reduces identification-error returns.

How does AI process the vehicle VIN?

The VIN is 17 characters with international standard encoding. AI validates it (length, check digit, prohibited characters) and decodes it against databases (NHTSA + regional decoders for local brands). It retrieves make, model, year, engine, and submodel, and filters the catalog to parts compatible with that specific vehicle.

Does it work with audio in Mexican, Colombian, or Argentine regional slang?

Current models transcribe LATAM accents with good accuracy and handle common regional auto parts slang (balero/rodamiento, chumacera/cojinete, mofle/escape, etc.). When there's a very local or ambiguous term, AI asks for clarification before quoting. Accuracy improves when the digital collaborator is trained with vocabulary specific to the market it operates in.

When do I need human intervention on a quote like this?

Four scenarios: 1) unusable photo the customer can't improve; 2) very custom part or rare imported vehicle outside the standard catalog; 3) special pricing decisions (discount, credit terms); 4) escalation when the customer wants to talk to a person. AI handles 70-85% of quotes end-to-end and routes the rest to the human team with all the context already gathered.

Want to see it quote from a photo?

Live demo with your inputs

30 minutes. We send Victoria a real photo, a VIN, an audio from your day-to-day — and watch how she quotes in real time.
