Summer Millers Ltd — AI Warehouse Ops - Stock Counting

01

Executive Summary

What we propose, at a glance

Proposal v1.0

What this section is

Think of this proposal as your map for the AI counting project. Each section that follows answers one specific question — what problem we're solving, what the system does, how long it takes, what you receive, and what's not in scope.

Three pillars

Image-based

Front + Top photo per stack. Fastest to ship.

Video-based

Short pan + multi-frame voting. Higher robustness.

One AI engine

Same backend, dashboard, audit trail. Both modes share every layer except the camera.

02

Problem Statement

Where manual counting breaks today

6 Pain Points

Why this proposal exists

Manual stock counting is slow, easy to get wrong, and leaves no proof. Below are the six pain points this system is designed to address.

Manual dependency

Every audit cycle requires physical counting. Bottlenecks at peak.

5–15% error rate

Dense stacking + occlusion + tall heights drive routine miscounts.

Audit discrepancies

Reconciliation delays + financial & compliance exposure.

Further constraints

No visual evidence

Manual counts leave no photographic proof to defend.

Doesn't scale

Manual effort scales linearly with warehouse count. Not viable.

Access constraints

Narrow gaps between stacks make side-face photography impractical.

03

Five-layer stack, one engine

Image & video share every layer except the camera

Architecture

Five-stage assembly line

The system runs every time an operator counts a stack. Whether you choose image or video capture, only the first stage changes — the other four are identical.

1

Capture

PWA · no app install

2

Quality Gate

Blur · light · framing

3

AI Counting

YOLO + SAHI

4

Review

Deviation check

5

Record

Audit trail · 7yr

04

Capture — Image vs Video

Pick per warehouse. Same AI engine. Two ways to feed it.

Choice

Option A · Image

Two photos per stack (Front + Top). Fastest to ship. Best for stable lighting and accessible stacks. ~30s per stack end-to-end.

Step 1

Open URL · pick stack

Step 2

Capture FRONT photo

Step 3

Capture TOP photo

Steps 4–5

Quality gate → Upload → AI count

Option B · Video

Short 10s pan across stack face. ~300 frames → top 5 → voted count. Higher robustness for tall stacks, partial occlusion, low light.

Step 1

Open URL · pick stack

Step 2

Slow pan across stack face

Step 3

Live overlay guides operator

Steps 4–5

Top 5 frames auto-selected → multi-frame voted count

Hybrid is allowed. Default a warehouse to image, switch to video for tall stacks or monthly audits. Mode is logged per session. Different warehouses can use different modes — no need to standardise across the network.

05/06

Image & Video Pipelines

Front + Top pipeline · Pan, filter, vote

Detail

05 · Image Flow — Front + Top

Two photos in, audit-grade count out. The AI multiplies front (rows × cols) by top (depth). Most sessions complete in well under a minute.

Counting Formula

TOTAL = FRONT (rows × cols) × TOP (depth)
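As a rough illustration of the formula above, the arithmetic can be sketched in a few lines of Python. The function name `stack_total` and its parameters are hypothetical, chosen only for this example; in the real flow, rows, cols, and depth would come from the detector output (or from a manual depth entry).

```python
def stack_total(rows: int, cols: int, depth: int) -> int:
    """TOTAL = FRONT (rows x cols) x TOP (depth).

    Hypothetical helper for illustration: the inputs are assumed to be
    already-extracted grid dimensions, not raw detections.
    """
    if min(rows, cols, depth) < 0:
        raise ValueError("rows, cols and depth must be non-negative")
    return rows * cols * depth


# Example: a front face of 8 rows x 5 columns, 4 units deep.
print(stack_total(rows=8, cols=5, depth=4))  # 160
```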

Edge cases

Top not feasible

Operator enters depth manually. Logged as "manual depth entry".

Front blocked

Two partial fronts captured; system stitches and re-runs inference.

Fallen units

Operator enters adjustment with reason code. Both stacks reconcile to net zero.

06 · Video Flow — Frame Funnel

Video replaces two photos with a short slow pan. ~300 raw frames → on-device scoring → top 5 frames → independent AI counts → median voting → 1 final number.

300

Raw

60

Scored

5

Picked

1

Count
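The last two steps of the funnel (pick the best frames, then vote) can be sketched as below. This is a minimal illustration, not the production selector: the function names, the single combined score per frame, and the sample numbers are all assumptions made for the example.

```python
from statistics import median


def pick_top_frames(scores: dict[int, float], k: int = 5) -> list[int]:
    """Return the indices of the k highest-scoring frames, best first.

    `scores` maps frame index -> combined quality score (hypothetical;
    how the individual signals are weighted is not specified here).
    """
    return sorted(scores, key=lambda i: scores[i], reverse=True)[:k]


def voted_count(frame_counts: list[int]) -> int:
    """Median vote over independent per-frame counts.

    With an odd number of frames the median is one of the observed
    counts, so a single bad frame cannot drag the result.
    """
    if not frame_counts:
        raise ValueError("need at least one frame count")
    return round(median(frame_counts))


# Illustrative values only.
scores = {0: 0.20, 5: 0.90, 10: 0.70, 15: 0.95, 20: 0.40, 25: 0.80, 30: 0.85}
print(pick_top_frames(scores))                  # [15, 5, 30, 25, 10]
print(voted_count([158, 160, 160, 161, 143]))   # 160
```

The median is used rather than the mean so that an outlier frame (143 in the example) does not shift the final number.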

Frame scoring signals

Blur · 5 ms

Laplacian variance

Brightness · 2 ms

HSV V-channel mean

Stability · 10 ms

DeviceMotion magnitude

In-frame · 80 ms

ONNX.js small detector
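Two of the signals above, blur and brightness, are simple enough to sketch directly. The following is a pure-Python illustration of the maths only; the proposal describes on-device scoring in the browser, so this is not the shipped code, and the function names and list-of-lists image format are assumptions for the example. The stability and in-frame signals depend on device sensors and a detector model and are not shown.

```python
from statistics import fmean, pvariance


def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Higher values mean more edge detail, i.e. a sharper frame.
    `gray` is a row-major grayscale image (assumed format).
    """
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def v_channel_mean(rgb: list[list[tuple[int, int, int]]]) -> float:
    """Mean of the HSV V channel, where V = max(R, G, B), scaled to 0..1."""
    return fmean(max(px) / 255 for row in rgb for px in row)


flat = [[128.0] * 4 for _ in range(4)]
checker = [[255.0 * ((x + y) % 2) for x in range(4)] for y in range(4)]
print(laplacian_variance(flat))                      # 0.0
print(laplacian_variance(checker) > 0)               # True
print(v_channel_mean([[(255, 0, 0), (0, 0, 0)]]))    # 0.5
```

A quality gate would then compare each value with a threshold tuned on real captures; no threshold values are implied here.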

07

AI Model — YOLO + SAHI

Why fine-tuning is mandatory (zero-shot = 0.5% detection)

CV Engine

The brain of the system

We use a fine-tuned object-detection model (YOLO family). SAHI slices each image into overlapping tiles and runs the detector on every tile, so dense, repetitive stacks can be counted reliably. The model is trained on photos of YOUR own stocks.

YOLO + SAHI

YOLOv11-L default. SAHI slices dense stacks into tiles. 50–100 ms / image on CPU.

"stacking_unit" label

Every detected object labelled generically. Same model extends to new stocks.

Fine-tune mandatory

Pre-trained models detect <1% of objects in dense stacks. Tested and confirmed.
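To make the tiling idea concrete, here is a minimal sketch of how overlapping tile boxes might be computed for one image. It is a hand-rolled illustration of the slicing step only and does not use the SAHI library's own API; the function name, the 640-pixel tile size, and the 20% overlap are assumptions for the example. Running the detector per tile and merging duplicate boxes across tile borders are not shown.

```python
def tile_boxes(width: int, height: int, tile: int = 640,
               overlap: float = 0.2) -> list[tuple[int, int, int, int]]:
    """Return (x0, y0, x1, y1) boxes that cover the image with overlapping tiles.

    Hypothetical helper: tile size and overlap are illustrative defaults.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, round(tile * (1 - overlap)))

    def starts(size: int) -> list[int]:
        if size <= tile:
            return [0]
        s = list(range(0, size - tile, step))
        s.append(size - tile)  # final tile sits flush with the image edge
        return s

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


boxes = tile_boxes(1280, 640)
print(len(boxes))   # 3
print(boxes[-1])    # (640, 0, 1280, 640)
```

The overlap matters because a unit cut in half by a tile border would otherwise be missed or counted twice; overlapping tiles give each unit at least one tile in which it appears whole, provided it is smaller than the overlap.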

ThirdEye capabilities

Computer vision · Fine-tuning pipelines · SAHI tiling · Edge inference · MLOps and drift detection

Empirical test

Pre-trained YOLO-World (3.8M general images) ran zero-shot on warehouse stacks — detected 0.5% of objects. Fine-tuning on Summer Millers' own data is not optional.

Accuracy potential

Target accuracy

93%

Zero-shot baseline

<1%

08

Architecture & Stack Options

All viable options on the table — recommendations marked. Final stack decided together.

Options-First

Logical architecture (cloud-agnostic)

Nothing is locked in. Every layer below has multiple viable options. Our badges mark our default recommendation — but the final selection is decided together with Summer Millers based on existing infra, budget, and compliance.

Deployment Target

AWS

Mumbai region · widest service catalog · mature ML tooling.

★ Recommended

Azure

India regions · strong enterprise integration · GPU availability.

Google Cloud

Mumbai region · Vertex AI for managed ML pipelines.

On-Premise

Full data sovereignty · ideal if Summer Millers has existing infra.

Inference Compute

★ Recommended

CPU Instance

~$150–300/mo. Sufficient with ONNX + SAHI for Summer Millers' scale.

GPU Instance

~$400–900/mo. Required only at >100-warehouse scale or for a sub-second SLA.

Serverless

Pay-per-inference. Good for bursty audit-cycle traffic; cold starts are a trade-off.

Detection Model

★ Recommended

YOLOv11 + SAHI

Best dense-detection balance · open weights · 50–100 ms / image on CPU.

RT-DETR

Transformer-based · stronger on occluded objects · slightly slower.

YOLOv8 / EfficientDet

Mature alternatives · evaluated in PoC bench if needed.

Capture Frontend

★ Recommended

PWA (React / Next.js)

Zero install · works on Android & iOS · easiest rollout.

Native iOS

Required only if iOS Safari APIs limit the PWA video flow.

Native Android

Optional · camera API parity with PWA — rarely needed.

09

Three-tier gate — trust but verify

AI count is never silently changed

Deviation Logic

Trust but verify

Every AI count is sanity-checked against your book stock from the ERP. The AI count is never silently overwritten — even when it disagrees with book stock, the original AI number is what goes into the audit log.

Deviation Formula

| AI Count − Reference | ÷ Reference

< 5%

Auto-Approve

Count locked. No human touch.

5–15%

Admin Review

Flagged · admin approves or rejects.

> 15%

Mandatory Recount

Operator must retake · supervisor notified.

Thresholds calibrated per warehouse during pilot, based on real (AI count, reference) pair distribution. Adjustable in admin config without redeploy.
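The three-tier gate can be sketched as a small function. This is an illustration under stated assumptions: the names `GateThresholds`, `deviation`, and `gate` are hypothetical, the defaults mirror the 5% and 15% bands above, and the boundary handling (exactly 5% and exactly 15% both fall into admin review) is one reasonable reading of the bands rather than a confirmed rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateThresholds:
    """Per-warehouse thresholds (hypothetical config shape)."""
    auto_approve_below: float = 0.05
    recount_above: float = 0.15


def deviation(ai_count: int, reference: int) -> float:
    """|AI count - reference| / reference."""
    if reference <= 0:
        raise ValueError("reference (book stock) must be positive")
    return abs(ai_count - reference) / reference


def gate(ai_count: int, reference: int,
         t: GateThresholds = GateThresholds()) -> str:
    """Classify a count; the AI count itself is never modified here."""
    d = deviation(ai_count, reference)
    if d < t.auto_approve_below:
        return "auto_approve"
    if d <= t.recount_above:
        return "admin_review"
    return "mandatory_recount"


print(gate(98, 100))    # auto_approve       (2% deviation)
print(gate(110, 100))   # admin_review       (10% deviation)
print(gate(80, 100))    # mandatory_recount  (20% deviation)
```

Keeping the thresholds in a separate object is what would allow them to be changed per warehouse without touching the gate logic.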

10

PoC → MVP — 15 weeks

5 phases · image first, video added in P3

Timeline

PoC phases

Five phases across 15 weeks of PoC. Every phase has clear sign-off criteria before the next starts — no skip-aheads.

P0

SETUP

Wks 1–2

Warehouse visit · annotation pipeline · cloud + dev env

P1

FIRST COMMODITY

Wks 3–6

500–1k images · train YOLO · image flow live · 20 field sessions

P2

EXPAND

Wks 7–10

10+ stocks · model reuse · Dashboard v1

P3

VIDEO + TAG

Wks 11–13

Video PWA · frame funnel · deviation gate live

P4

UAT → MVP

Wks 14–15

UAT with Summer Millers · sign-off + handover

Post-PoC rollout

Pilot · Months 4–6

~10 warehouses · sub-2s latency · accuracy ≥ 93% maintained.

Regional · Months 7–12

~50 warehouses · multi-AZ · read replicas for analytics.

Full Scale · Year 2+

100+ warehouses · auto-scale fleet · optional build-operate-transfer (BOT) handover.

11

What this proposal does NOT cover

Boundary honesty — avoid surprises later

Boundary

Explicitly out of scope

Six things we explicitly do NOT cover in this PoC. If any turn out to be necessary, they can be scoped as separate work — but they're not bundled into the price.

✕ Out of scope

Single front image counting. Not audit-grade. Front + Top (or Front + manual depth) required.

✕ Out of scope

Zero-shot on new stocks. Each new stock needs training data and a cycle.

✕ Out of scope

Irregular / loose / bulk storage. Universal formula assumes repeating stacking units.

✕ Out of scope

Replacing ERP / WMS. Records counts + evidence. Doesn't trigger stock movement or procurement.

✕ Out of scope

Hardware procurement. Phones, tripods, lighting upgrades, connectivity — not in scope.

✕ Out of scope

Long-term model governance. Covered by a separate annual maintenance contract (AMC), not this PoC SOW.

12

What you receive

Per-stock · platform · documentation · IP transfer

Deliverables

Per stock + platform

Per Stock

Trained YOLO model + metrics · annotated dataset · accuracy benchmarking report · weights + config + scripts

Platform

PWA (image + video flows) · Backend API + inference service · Admin dashboard + audit trail · Docker deployment config · Source code + runbook

Documentation + IP

Documentation

System architecture · API reference · annotation standards · MLOps pipeline · operator quick-reference

IP Transfer: Upon final payment, all custom code · AI/ML models · training datasets · prompts · configurations transfer to Summer Millers Ltd. 24-month non-compete in milling / grain processing / stock warehousing.

13

Risk & Mitigation

Each known risk has a defined response

Risk

No surprises

Every project has risks. Below are the ones we already see today, with a specific mitigation paired against each. If new risks surface during PoC, they're added to this list with a mitigation designed before they become problems.

⚠ Risk

Lighting variability across warehouses degrades accuracy. → Collect training data under all observed lighting in PoC; on-device gate rejects under/over-exposed captures.

⚠ Risk

Unreliable network at remote warehouses. → Offline-first PWA queues locally + syncs on reconnect. Pre-test connectivity at commissioning.

⚠ Risk

iOS Safari gaps in video mode (focus lock, MediaRecorder format, background-tab kills). → Profile phone fleet at PoC Week 2. Plan thin native iOS wrapper if non-trivial iOS share.

⚠ Risk

Model drift over time (new suppliers, packaging changes, seasons). → Automated drift detection on rolling 7-day metrics. Targeted fine-tune on drifted data only.

⚠ Risk

Operator resistance. → Show-not-tell: side-by-side AI vs manual for first 20 sessions per warehouse.

⚠ Risk

Cloud lock-in concern. → Architecture is provider-agnostic. Every component maps to AWS / Azure / GCP / on-prem. Migration is weeks, not months.

14

Team Composition

Indicative — refined per final scope

Team

Who delivers this

Indicative composition for the PoC — sizes flex slightly based on final scope and how soon the MVP rollout begins running in parallel with the back half of PoC.

EM / Scrum Master

Part-time across engagement

AI/ML Lead

Model design + accuracy owner

AI Engineers × 2

Annotation · training · eval

Backend Engineer

FastAPI · deviation · audit

Frontend Engineer

PWA + dashboard

DevOps Engineer

Cloud · CI/CD · monitoring

QA Engineer

Field validation · UAT

Embedded with you

Twice-weekly standups

15

Charges

Pricing & commercial terms

Commercial

Pricing

Final charges will be confirmed once the project specifications are complete. All pricing will be provided in a formal quotation following the scoping exercise.

Next step: Once Summer Millers Ltd confirms the final project scope — warehouse count, stock types, capture mode, deployment target, and rollout timeline — ThirdEye Data will issue a detailed, itemised cost proposal covering PoC, MVP rollout, and ongoing support options.

16

Post-delivery support

Free window + ticket-based

Support

Free support window

2 weeks of free bug-fix support immediately after delivery. Excludes new development, integrations, or feature additions.

2 weeks free · Bug fixes only

Ongoing ticket-based

After the free window. Per-ticket basis with agreed SLAs. New feature work and integrations are separately scoped.

SLA-bound · Per request

17

Live Demo — See the product in action

Simulated end-to-end flow · intermediate stages shown for transparency

Interactive

Interactive walkthrough

A live simulation of what an operator and admin would experience. Numbers and visuals are placeholders. Intermediate model views (quality scoring, SAHI tile slicing, multi-frame voting) are shown here to demonstrate the engine's thinking — in production, the operator sees CAPTURE → COUNT → APPROVE only. Click any step or hit Auto-play.


Working demo available on request. A deployed version at a similar reference site can be walked through live on a call. The simulation above is a faithful representation of the operator + admin experience — same screens, same flow, placeholder values.

18

Closing note

Ready to start immediately on contract approval

Closing

ThirdEye Data × Summer Millers Ltd

Thank you,
Summer Millers.

We can mobilise the team within a week of sign-off and have first stock data collection underway in Week 1. Every architecture decision in this proposal is open for discussion — we recommend, you decide.

Talk to ThirdEye Data →