# Deck 20 · Product Matching — Alternatives, Substitutions, Environmental

**Status:** Draft (not yet built)
**Saved:** 2026-06-28 by Jeff (verbal)
**Owner:** Plex / OMX Commercial + Master Data
**Phasing:** Internal first (3 use cases), External second

---

## One-line thesis

**Match every product against every other — competitor SKU, off-range SKU, supplier SKU, environmental alternative — so the customer-facing team can answer "what's a better option?" in seconds and OMX captures the lower-cost or higher-margin sale.** Win-win: customer pays less or chooses better, OMX wins margin or relationship.

## The wedge — why now

- Three internal pains today, all reaching for the same engine:
  1. **Lower-cost alternatives** — "we sell the same thing cheaper" → customer saves, OMX wins switching
  2. **Substitutions** — when a SKU is out of stock (OOS) or being discontinued, offer the closest equivalent → keeps the sale
  3. **Environmental alternatives** — sustainability lever for any RFP, government, or sector that asks
- All three are matching at heart — semantic + attribute + price comparison across SKUs
- Plex-CI gives us the competitor data; AI embeddings give us the matching tech; PDX gives us the canonical product master (Deck 16)
- External use ("show customer the match") comes later — internal first

## Three internal use cases

| Use case | Who uses | What it answers |
|---|---|---|
| **Lower-cost alternative** | AM, Customer Service, Sales-Ops | "Same job, OMX SKU is $X cheaper" — switching narrative |
| **Substitution** | Customer Service, Sales-Ops, Quoting | "That SKU is OOS; here's the closest equivalent we DO have" — saves the sale |
| **Environmental alternative** | RFP team, sustainability-minded AMs | "OMX SKU is more sustainable than what you're buying" — wins RFP / pre-empts ESG ask |
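
One way to read the table above is that the three use cases are different filters and sort orders over the same candidate list from the matching engine. The sketch below illustrates that idea only; the `Candidate` fields, SKUs, prices and certifications are all hypothetical, not taken from PDX or Plex-CI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    sku: str
    similarity: float                      # 0..1 score from the shared matching engine
    price: float                           # unit price
    in_stock: bool
    certifications: frozenset = frozenset()


def lower_cost(cands, current_price):
    """Candidates cheaper than the customer's current price, cheapest first."""
    return sorted((c for c in cands if c.price < current_price), key=lambda c: c.price)


def substitution(cands):
    """In-stock candidates only, closest match first."""
    return sorted((c for c in cands if c.in_stock), key=lambda c: c.similarity, reverse=True)


def environmental(cands, current_certs):
    """Candidates holding at least one certification the current product lacks."""
    greener = [c for c in cands if c.certifications - current_certs]
    return sorted(greener, key=lambda c: c.similarity, reverse=True)


# Illustrative data only; SKUs, prices and certifications are made up.
candidates = [
    Candidate("OMX-1001", 0.93, 18.50, True, frozenset({"FSC"})),
    Candidate("OMX-1002", 0.88, 21.00, False),
    Candidate("OMX-1003", 0.81, 16.90, True),
]

print([c.sku for c in lower_cost(candidates, current_price=20.00)])
print([c.sku for c in substitution(candidates)])
print([c.sku for c in environmental(candidates, current_certs=frozenset())])
```

Keeping the engine shared and the use cases thin means a data-quality improvement in one place lifts all three surfaces.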

## Then external (Phase 2)

- Customer-facing on web + Ask Max — "you bought X; here's the better option"
- Powered by same matching engine

## What this deck covers

1. **The matching engine** — embedding-based (AI) + attribute-based (structured) + price comparison
2. **The three internal use cases** — depth on each
3. **Plex-CI feed** — competitor SKU → OMX SKU; the data spine
4. **PDX integration** — canonical OMX master (Deck 16); enriched attributes feed matching quality
5. **Sustainability data** — recycled content, certifications (FSC, EnergyStar, EWG), carbon proxy
6. **Confidence scoring** — every match cites confidence + evidence
7. **External phase** — same engine, customer-facing UI
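
Items 1 and 6 above can be sketched together: three signals (embedding similarity, attribute agreement, price closeness) fused into one confidence value, with the matched attributes kept as evidence. This is a minimal illustration under stated assumptions; the weights, field names and the linear fusion are placeholders, not a decided design.

```python
from math import sqrt

# Assumed weights for illustration; real values would be tuned on labelled SKU pairs.
WEIGHTS = {"semantic": 0.50, "attribute": 0.35, "price": 0.15}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def attribute_overlap(a, b):
    """Share of attributes present on both products whose values agree, plus the agreeing keys."""
    shared = sorted(set(a) & set(b))
    matched = [k for k in shared if a[k] == b[k]]
    return (len(matched) / len(shared) if shared else 0.0), matched


def price_closeness(p_src, p_cand):
    """1.0 for equal prices, falling linearly to 0.0 as the relative gap approaches 100%."""
    top = max(p_src, p_cand)
    return 1.0 if top == 0 else max(0.0, 1.0 - abs(p_src - p_cand) / top)


def match(src, cand):
    """Fuse the three signals into one confidence value and keep the evidence."""
    semantic = max(0.0, cosine(src["vec"], cand["vec"]))
    attribute, matched = attribute_overlap(src["attrs"], cand["attrs"])
    price = price_closeness(src["price"], cand["price"])
    confidence = (WEIGHTS["semantic"] * semantic
                  + WEIGHTS["attribute"] * attribute
                  + WEIGHTS["price"] * price)
    return {"sku": cand["sku"], "confidence": round(confidence, 3),
            "evidence": [f"{k} match" for k in matched]}


# Hypothetical competitor SKU and OMX candidate with toy 2-d "embeddings".
src = {"sku": "COMP-77", "vec": [1.0, 0.0], "price": 10.0,
       "attrs": {"brand": "X", "pack_size": 500}}
cand = {"sku": "OMX-1001", "vec": [1.0, 0.0], "price": 8.0,
        "attrs": {"brand": "X", "pack_size": 250}}
result = match(src, cand)
print(result)
```

A linear blend is the simplest option to explain on a slide; a learned ranker could replace it later without changing the output shape (confidence plus evidence).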

## What this deck explicitly does NOT do

- Not catalog management (Deck 05 / Deck 16 own)
- Not competitor pricing (Deck 13 — Plex-CI owns the data)
- Not the AI conversational interface (Deck 17 — though it consumes matching results)

---

## Problem framing (what's broken)

- **OOS = lost sale** — customer service rep can't find the equivalent fast enough; customer goes elsewhere
- **Off-range guess-work** — quoting team manually proposes alternatives based on hunch
- **No sustainability angle** — when the RFP asks "what's your greener option", the answer is bespoke each time
- **Switching is anecdotal** — Stage 3 (Deck 09) needs "here's the OMX SKU that matches your old supplier" — no engine today
- **Margin opportunity missed** — no system suggests the higher-margin alternative when both meet customer need

## Benefits (the value story)

| Lever | Mechanism | Sizing approach |
|---|---|---|
| **Substitution sale rescue** | OOS → equivalent → kept sale | Each rescued sale = full revenue; sizing needs OOS frequency |
| **Switching conversion** | "Match my old supplier" engine in Stage 3 (Deck 09) | Direct enabler — sizing in Deck 09 |
| **RFP win rate** | Sustainability + cost alternatives baked in (Deck 19) | Win rate lift |
| **Margin uplift** | Smart suggestion = higher-margin alt where customer-need-equivalent | Working hypothesis: 1-3% category margin lift; needs a baseline to validate |
| **Quoting speed** | Engine vs manual hunt | Quote turnaround ↓ |
| **External UI value** | Customer-facing alt suggestions | Conversion lift in Deck 08 funnel |
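
The margin-uplift lever depends on how suggestions are ordered once several candidates meet the customer's need. A hedged sketch of one option, a single blend weight between customer saving and OMX margin, follows; the `alpha` parameter, the `min_confidence` equivalence cut-off and the sample prices and costs are all assumptions for illustration.

```python
def rank_hybrid(cands, current_price, alpha=0.5, min_confidence=0.8):
    """Rank need-equivalent candidates by a blend of customer saving and OMX margin.

    alpha=1.0 ranks purely by customer saving, alpha=0.0 purely by OMX margin.
    Assumes current_price and every candidate price are positive.
    """
    eligible = [c for c in cands if c["confidence"] >= min_confidence]

    def score(c):
        saving = (current_price - c["price"]) / current_price  # customer benefit, as a fraction
        margin = (c["price"] - c["cost"]) / c["price"]         # OMX gross margin, as a fraction
        return alpha * saving + (1 - alpha) * margin

    return sorted(eligible, key=score, reverse=True)


# Made-up candidates: A is cheaper for the customer, B carries more margin,
# C is cheap but not a confident enough match to count as equivalent.
cands = [
    {"sku": "A", "price": 9.0, "cost": 7.0, "confidence": 0.90},
    {"sku": "B", "price": 10.0, "cost": 6.0, "confidence": 0.85},
    {"sku": "C", "price": 5.0, "cost": 4.0, "confidence": 0.50},
]

print([c["sku"] for c in rank_hybrid(cands, current_price=10.0, alpha=1.0)])
print([c["sku"] for c in rank_hybrid(cands, current_price=10.0, alpha=0.0)])
```

Exposing the trade-off as one number makes the customer-best versus OMX-best choice an explicit, reviewable setting rather than something buried in the ranking logic.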

---

## Layout candidates from the gold standard

- **Cover** — customer-service rep at desk; chat from customer "this is OOS, what now?" + system showing 3 matched alternatives with confidence scores
- **Problem vector grid (4-6)**: OOS-lost-sale / Off-range-guess / Sustainability-gap / Switching-no-engine / Margin-miss
- **Three use cases — side by side** — Lower-cost / Substitution / Environmental
- **The matching engine** — embedding + attribute + price diagram
- **Confidence scoring visual** — match quality bar with citation pop-out
- **Plex-CI + PDX feed diagram** — how the data spine works
- **Sustainability data sources** — FSC, EnergyStar, EWG, recycled-content tags
- **Internal-then-external roadmap** — 3 internal use cases (Year 1) → web + Ask Max external (Year 2)
- **The ask** — embedding model + attribute pipeline + UI for 3 internal surfaces

---

## Open questions to resolve

1. **AI model** — proprietary embeddings (OpenAI, Cohere, Voyage AI) or open-source? Anthropic does not offer its own embedding model and points to Voyage AI. Cost per SKU pair
2. **Confidence threshold** — what's the auto-suggest threshold vs human-review threshold?
3. **Sustainability data licensing** — FSC, EnergyStar are public; commercial sustainability databases (EWG, Sustainalytics) may need licence
4. **Margin-aware ranking** — should suggestions be ranked by customer-best, OMX-best, or hybrid?
5. **External UI confidence-presentation** — show confidence to customer or just present?
6. **Connection to IBP WS5** — IBP-PROGRAM.md explicitly includes "Product Matching" under Competitive Intelligence WS5; this deck IS that workstream component
7. **Connection to Deck 16 (Product Data Enhancement)** — matching quality = data quality; sequencing matters
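
Question 2 above amounts to two cut-offs on the confidence value. A minimal sketch of that routing is below; the threshold values `0.85` and `0.60` are placeholders chosen for illustration, since the real values are exactly what this question needs to settle.

```python
AUTO_SUGGEST_MIN = 0.85   # placeholder: at or above this, show the match without review
HUMAN_REVIEW_MIN = 0.60   # placeholder: at or above this, queue the match for a person


def route(confidence):
    """Map a match confidence (0..1) to a handling lane."""
    if confidence >= AUTO_SUGGEST_MIN:
        return "auto_suggest"
    if confidence >= HUMAN_REVIEW_MIN:
        return "human_review"
    return "discard"


for value in (0.87, 0.70, 0.40):
    print(value, route(value))
```

Reviewed matches can be logged with their outcome so the two thresholds are later set from observed accept and reject rates rather than guesswork.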

## Audience

**Primary:** Chief Commercial Officer + Customer Service Director + Master Data lead.
**Secondary:** Sales-Ops + Sustainability lead + RFP team (Deck 19).
**Tertiary:** Digital — Phase 2 external surface owner.

## Reference

- Memory: **IBP-PROGRAM.md** — WS5 Competitive Intelligence explicitly lists "Product/price matching" as in-progress; this deck IS that
- Memory: **Plex-CI** — competitor SKU data feed; 545k URLs, 471k changes captured
- Memory: **Deck 16 PDX** — canonical OMX product master that matching reads against
- Memory: **Deck 09 Switching** — substitution engine is core to switching narrative
- Memory: **Deck 19 RFP Platform** — alternatives are RFP-response ammo
- Sustainability frameworks: FSC, EnergyStar, EWG, Sustainalytics, recycled-content claim verification

---

## Research deepening (background-agent, 2026-06-28)

### Product-matching technology stack (verified 2026)

| Component | Option | Cost | Strengths | Trade-offs | Source |
|---|---|---|---|---|---|
| **Embeddings — proprietary** | OpenAI `text-embedding-3-large` (3072d) | USD 0.13 / M tokens | Strong general performance; reliable API; ~99.5% uptime | Vendor lock-in; sends product strings off-shore | https://platform.openai.com/docs/guides/embeddings |
| | OpenAI `text-embedding-3-small` (1536d) | USD 0.02 / M tokens | ~6.5x cheaper; ~96% of large's MTEB score | Slightly weaker on long/technical SKU strings | (same) |
| | Cohere `embed-v4` (1024d) | USD 0.12 / M tokens | Multilingual (handy for Māori product names); good retrieval | Smaller ecosystem | https://cohere.com/embeddings |
| | Voyage AI `voyage-3` (1024d) | USD 0.06 / M tokens | Top of MTEB retail leaderboard mid-2025 | Smaller vendor | https://docs.voyageai.com/docs/embeddings |
| **Embeddings — open-source** | `sentence-transformers/all-mpnet-base-v2` | Self-host (compute only) | Free; mature; 768d output (the lighter `all-MiniLM-L6-v2` is 384d) | Slightly lower retrieval quality than proprietary | https://huggingface.co/sentence-transformers |
| | `BAAI/bge-large-en-v1.5` / `bge-m3` | Self-host | Top open MTEB scores; bge-m3 is multilingual | Needs GPU at scale | https://huggingface.co/BAAI/bge-m3 |
| | `intfloat/e5-mistral-7b-instruct` | Self-host (heavy) | Best-in-class open model | 7B params — needs serious GPU | https://huggingface.co/intfloat |
| **Vector store** | pgvector (Postgres) | Already-in-stack | Zero new infra; works to ~10M vectors comfortably | Tuning needed past 10M | https://github.com/pgvector/pgvector |
| | Pinecone | USD 70/mo starter → USD 5-30k/yr | Managed, fast | Vendor cost | https://www.pinecone.io/pricing/ |
| | Qdrant / Weaviate / Milvus | Self-host or cloud | Open-source; battle-tested | Ops overhead | https://qdrant.tech ; https://weaviate.io |
| **Reranker** | Cohere `rerank-v3.5` | USD 2 / 1k searches | Big lift over pure-vector for product matching | API call cost | https://cohere.com/rerank |
| | Voyage `rerank-2` | USD 0.05 / M tokens (token-based, not per search) | Cheaper alternative | Newer | https://docs.voyageai.com/docs/reranker |
| **Specialised matching** | **Product matching transformer (e-commerce specific)** — `pim-cosine` / Walmart/Amazon-published matching models | Self-host | Trained on retail SKU pairs | Limited availability publicly | Multiple HuggingFace community models |
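
The components in the table above usually combine as a two-stage pipeline: a cheap vector search produces a shortlist, then a more expensive reranker orders it. The sketch below shows that shape in plain Python, under loud assumptions: the in-memory dictionary stands in for a vector store such as pgvector, the word-overlap scorer stands in for a hosted reranker, and the vectors and product titles are toy data.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec, index, k):
    """Stage 1: top-k SKUs by cosine similarity (stand-in for a vector-store query)."""
    ranked = sorted(index, key=lambda sku: cosine(query_vec, index[sku]), reverse=True)
    return ranked[:k]


def word_overlap(query_text, doc_text):
    """Stage 2 stand-in: Jaccard overlap of lowercase words (a real reranker is a model call)."""
    q, d = set(query_text.lower().split()), set(doc_text.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def match(query_vec, query_text, index, titles, k=50, top_n=3):
    """Shortlist with vectors, then reorder the shortlist with the reranker score."""
    shortlist = retrieve(query_vec, index, k)
    reranked = sorted(shortlist, key=lambda sku: word_overlap(query_text, titles[sku]),
                      reverse=True)
    return reranked[:top_n]


# Toy data: 2-d vectors and titles are invented for the example.
index = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}
titles = {"A": "black toner cartridge high yield",
          "B": "black toner cartridge",
          "C": "copy paper a4"}

print(retrieve([1.0, 0.0], index, k=2))
print(match([1.0, 0.0], "black toner cartridge", index, titles, k=2))
```

The reranker only ever sees `k` candidates per query, which is what keeps its per-search cost bounded regardless of catalogue size.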

**OMX-fit recommendation:** Hybrid — OpenAI `text-embedding-3-small` for cost (under USD 1 to embed a 50k-SKU catalogue once at USD 0.02 / M tokens, assuming a few hundred tokens per SKU) + pgvector on existing Postgres + Cohere `rerank-v3.5` on top-50 candidates for the final ordering. Total run cost ~USD 200-400/mo for the volumes implied by Plex-CI's 545k URL / 471k change capture; at USD 2 / 1k searches that budget covers roughly 100-200k reranked searches per month.
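
The cost figures above can be sanity-checked with a few lines of arithmetic using the list prices from the table. The token count per SKU (`200`) and the monthly search volume (`150_000`) are assumptions for illustration and should be replaced with measured values; under these assumptions the one-off embedding cost is well below a dollar and reranking dominates the monthly bill.

```python
def embedding_cost_usd(n_items, tokens_per_item, usd_per_million_tokens):
    """One-off cost to embed n_items texts of a given average token length."""
    return n_items * tokens_per_item / 1_000_000 * usd_per_million_tokens


def rerank_cost_usd(n_searches, usd_per_thousand_searches):
    """Cost of reranking when the vendor bills per 1k searches."""
    return n_searches / 1_000 * usd_per_thousand_searches


TOKENS_PER_SKU = 200          # assumption: title + short description + key attributes
SEARCHES_PER_MONTH = 150_000  # assumption: placeholder volume, not a measured figure

catalogue_once = embedding_cost_usd(50_000, TOKENS_PER_SKU, 0.02)    # text-embedding-3-small
plex_ci_once = embedding_cost_usd(545_000, TOKENS_PER_SKU, 0.02)     # one text per captured URL
rerank_monthly = rerank_cost_usd(SEARCHES_PER_MONTH, 2.00)           # rerank-v3.5 list price

print(f"50k-SKU catalogue, one-off: USD {catalogue_once:.2f}")
print(f"545k Plex-CI URLs, one-off: USD {plex_ci_once:.2f}")
print(f"Reranking per month:        USD {rerank_monthly:.2f}")
```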

### Sustainability data sources

| Source | Coverage | NZ relevance | Licence model | URL |
|---|---|---|---|---|
| **FSC (Forest Stewardship Council)** | Paper, wood, packaging — chain-of-custody certs | High (large share of OMX paper, furniture) | Public certificate lookup; bulk data via licensed reseller | https://info.fsc.org/certificate.php |
| **PEFC** | Forest cert alternative to FSC | NZ Plantation Forest Cert mapped to PEFC | Public lookup | https://www.pefc.org |
| **EnergyStar (US EPA)** | Electronics, appliances; product lookup API | Mid (US labels appear on many imports) | Public dataset/API | https://www.energystar.gov/productfinder/ |
| **EWG Skin Deep + EWG Verified** | Cleaning + personal-care chemical safety | Indirect — relevant for OMX cleaning + breakroom range | Public lookup; bulk licence needed | https://www.ewg.org/skindeep/ |
| **Sustainalytics (Morningstar)** | ESG risk scores for parent companies (not SKUs) | High for supplier-level claims | Commercial — typically USD 25-100k/yr | https://www.sustainalytics.com |
| **CDP (Carbon Disclosure Project)** | Supplier emissions disclosure | Used by Fonterra, Air NZ, NZ Govt supplier code | Free for queries; supplier engagement programme paid | https://www.cdp.net |
| **GHG Protocol Scope 3 / I-RECs** | Carbon accounting standard (Scope 3); I-RECs are renewable-energy certificates | NZ aligned | Open | https://ghgprotocol.org |
| **GREENGUARD (UL)** | Indoor air quality / furniture VOCs | Mid — relevant for furniture range | Public cert search | https://spot.ul.com |
| **EPEAT** | Electronics environmental rating | Mid — IT hardware | Public | https://www.epeat.net |
| **Rainforest Alliance / Fairtrade** | Coffee / tea / cocoa (breakroom) | Mid | Public | https://www.rainforest-alliance.org |
| **B Corp register** | Company-level certification | Held by NZ-founded brands including Ethique and Allbirds (Allbirds is US-headquartered) | Public | https://www.bcorporation.net |

**NZ-specific sustainability sources (critical for OMX):**

| Source | What it covers | Source |
|---|---|---|
| **Toitū Envirocare** | NZ's national carbon + enviromark certification scheme; NZ Govt supplier preference | https://www.toitu.co.nz |
| **Environmental Choice NZ (ECNZ)** | NZ Govt-endorsed ecolabel (Type I ISO 14024), since rebranded as Eco Choice Aotearoa; 30+ product categories incl. paper, cleaners, IT | https://www.environmentalchoice.org.nz |
| **NZ Govt Procurement Sustainability Rules (Rule 18-20)** | Mandates broader outcomes incl. reduced emissions in govt RFPs | https://www.procurement.govt.nz/broader-outcomes/ |
| **Climate-related Disclosures regime (XRB)** | Mandatory for ~200 large NZ entities from 2024; supplier reporting cascading | https://www.xrb.govt.nz/standards/climate-related-disclosures/ |
| **Sustainable Business Network (SBN)** | NZ business sustainability hub; member directory | https://sustainable.org.nz |
| **NZ Plastics Pact / Soft Plastic Scheme** | Packaging compliance | https://plasticspact.org.nz |

**Carbon proxy approach:** When SKU-level Scope 3 data is unavailable, use life-cycle assessment (LCA) data blended with sector emission factors from `https://environment.govt.nz/publications/measuring-emissions-detailed-guide/`. Candidate LCA sources: the ecoinvent database (paid, USD ~6k/yr for commercial use) or datasets loaded into openLCA (free, open-source LCA software that supports both free and paid databases).
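
The proxy approach can be sketched as a fallback order: use SKU-level LCA data where it exists, otherwise estimate from a sector emission factor, and always record which method produced the number. Everything named below is hypothetical: the field names, the weight-based proxy and especially the factor values, which are placeholders and not figures from the Ministry for the Environment guide or any LCA database.

```python
# Placeholder factors in kg CO2e per kg of product. NOT real values.
SECTOR_FACTORS = {"paper": 2.0, "furniture": 4.0}


def carbon_estimate(sku, sector_factors):
    """Return (kg CO2e or None, method) so every figure carries its provenance."""
    if sku.get("lca_kgco2e") is not None:
        return sku["lca_kgco2e"], "sku_lca"            # best case: product-level LCA value
    factor = sector_factors.get(sku["category"])
    if factor is None:
        return None, "no_data"                         # do not guess when no factor exists
    return sku["weight_kg"] * factor, "sector_proxy"   # fallback: weight x sector factor


# Made-up SKUs covering the three paths.
with_lca = {"sku": "OMX-1", "category": "paper", "weight_kg": 2.5, "lca_kgco2e": 3.1}
proxy_only = {"sku": "OMX-2", "category": "paper", "weight_kg": 2.5}
unknown = {"sku": "OMX-3", "category": "toner", "weight_kg": 0.8}

for item in (with_lca, proxy_only, unknown):
    print(item["sku"], carbon_estimate(item, SECTOR_FACTORS))
```

Carrying the method alongside the value lets the match card distinguish a certified figure from a proxy, which matters when the number ends up in an RFP response.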

### Sources cited (PR-013)

- https://platform.openai.com/docs/guides/embeddings
- https://cohere.com/embeddings
- https://docs.voyageai.com/docs/embeddings
- https://huggingface.co/BAAI/bge-m3
- https://github.com/pgvector/pgvector
- https://www.pinecone.io/pricing/
- https://info.fsc.org/certificate.php
- https://www.energystar.gov/productfinder/
- https://www.ewg.org/skindeep/
- https://www.sustainalytics.com
- https://www.cdp.net
- https://www.toitu.co.nz
- https://www.environmentalchoice.org.nz
- https://www.procurement.govt.nz/broader-outcomes/
- https://www.xrb.govt.nz/standards/climate-related-disclosures/
- https://environment.govt.nz/publications/measuring-emissions-detailed-guide/

---

## Vectors + visuals

### Lucide icon choices (Ask Max set compatible)
- **Lower-cost alternative:** `tag` / `arrow-down-circle` / `piggy-bank`
- **Substitution:** `replace` / `shuffle` / `arrow-right-left`
- **Environmental:** `leaf` / `recycle` / `sprout`
- **Matching engine:** `git-compare-arrows` / `link-2` / `puzzle`
- **Embeddings / AI:** `brain-circuit` / `binary` / `sparkles`
- **Confidence score:** `gauge` / `badge-check` / `bar-chart-3`
- **Plex-CI feed:** `radar` / `satellite-dish` / `download-cloud`
- **PDX integration:** `package-2` / `boxes` / `database`
- **External Ask Max surface:** `message-circle-question` (Ask Max icon family)
- **Sustainability certifications:** `award` / `shield-check` / `leaf`

### Image concepts (NZ context, 4-6)
1. **Cover** — Customer-service rep at desk; chat window from a Christchurch SME customer ("the toner cart we usually buy is OOS"), system showing three matched alternatives with green/amber confidence bars. Subtle NZ context: rep wears OMX polo, NZ Post tracking tab open in browser background.
2. **Three-use-case triptych** — left tile: shopping-cart with dollar-down arrow (lower cost); centre tile: two products with swap arrow (substitution); right tile: leaf + product (environmental). Uniform layout, Lucide icons large.
3. **Matching engine diagram** — three input streams (embedding text similarity / structured attributes / price tag) → fusion box → ranked match list with confidence chips. Use `git-compare-arrows` at the fusion node.
4. **Confidence score widget** — close-up of a single match card: SKU image, confidence 87%, evidence chips ("brand match", "pack-size match", "spec match", "FSC equivalent"), cite link.
5. **Sustainability data stack** — pyramid: bottom = Toitū / ECNZ / FSC (NZ + global certs); middle = supplier ESG (Sustainalytics / CDP); top = product-level carbon proxy. NZ flag motif in Toitū tile.
6. **Phase-2 external surface (Ask Max)** — phone mock: customer browsing OMX web; "Better option?" Lucide `sparkles` button; result panel with "Save $4.20 + FSC-certified" pill. Anchors the internal→external narrative.
