Case Studies

How Innoviti engineered a 60% TCO reduction in field support operations with custom multimodal AI

Background

Scaling field service operations for India’s largest offline payment network

Innoviti processes over ₹80,000 crore annually for 50,000+ merchants across India. Supporting enterprise retailers like Reliance Retail, Shoppers Stop, and DMart requires managing thousands of physical point-of-sale (PoS) electronic data capture (EDC) terminals across 2,000 cities. This hardware footprint demands continuous field maintenance, because terminal downtime immediately stops merchant revenue.

Innoviti dispatches field agents daily to resolve hardware issues, perform audits, and deploy updates. But closing a support ticket isn’t a single check. Every job completion has three distinct dimensions:

Performance: was the payment system actually made functional, so transactions can flow again?
Governance: did the field agent really visit the store? This connects directly to merchant billing, fraud prevention, and the audit trail for a distributed workforce, and is captured through a physical job sheet that must be stamped and signed by the store manager.
Brand: are the brand elements on the EDC terminal (stickers, signage, accessories) intact and correctly displayed?

Performance can be digitally validated. Governance and brand cannot. They require visual verification. To close each ticket, agents submit a complex mix of diagnostic logs, handwritten notes, store signboard photos, selfies, product sticker images, accessory shots, and the manually filled, stamped job sheet. Triangulating these visual proofs against the digital signals in real time, while catching both false positives and false negatives, is what drives ROI from the entire support investment.

Manual triangulation of these proof-of-work artefacts created a severe operational bottleneck. A large quality assurance team had to evaluate multiple transaction diagnostics and multiple photo images for every single ticket, and they could not move fast enough to match daily retail volumes. Innoviti was processing more than 7,000 service ticket logs every day, and the queue kept growing.

Innoviti needed a system that could extract data from these unstructured artefacts, verify completed work across all three dimensions, and scale with its operations.

The Opportunity

Why general-purpose cloud APIs couldn’t take this to production

Innoviti began with generative AI experiments on a general-purpose cloud provider. The early tests confirmed that AI could review and validate unstructured field data, but moving from prototype to production exposed the limitations of shared API ecosystems.

The shared environments operated as a black box. They denied Innoviti the root-level infrastructure control required to fine-tune the model, troubleshoot errors, manage memory utilization, and guarantee the sub-30-second latency that real-time field operations demand. Beyond performance, the variable token economics of general-purpose cloud platforms made scaling financially unviable. There was a separate compliance dimension to address as well, given the payment data involved.

To transition from prototype to production, Innoviti required infrastructure that could deliver four things:

Direct control over the serving stack. Root-level access to adjust hardware settings, view infrastructure logs, optimize performance for retail uptime requirements, and diagnose system errors during outages. Without this, latency spikes and inference errors could not be investigated.
Sustainable unit economics. A transparent, predictable cost structure that wouldn’t break as token consumption surged with daily audit volume. Variable token pricing made cost per ticket climb with scale, which is exactly the wrong direction for an automation play.
Deterministic performance. Dedicated infrastructure that processes every field report in under 30 seconds, regardless of network traffic or other tenants on the platform. Enterprise retail workflows cannot tolerate the variable latency typical of shared-resource APIs.
Visibility into model decision-making. Transparent access to model architecture and the freedom to modify weights, integrate proprietary training data, and disable unnecessary thinking modes. This was essential for tuning the system to recognize sector-specific attributes (retail store names, logos, stamps, payment hardware) and for maintaining trust in the verification process.
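
One way to read the unit-economics requirement above: under per-token billing, spend grows in step with ticket volume and never amortises, whereas a fixed bill for dedicated capacity is spread over more tickets as volume grows. The sketch below illustrates that comparison only. The function names and parameters are hypothetical, no actual prices from either platform are implied, and it assumes the dedicated capacity is large enough to absorb the volume being compared.

```python
def token_priced_cost_per_ticket(tokens_per_ticket: float,
                                 price_per_1k_tokens: float) -> float:
    """Per-token billing: every ticket costs the same, however many are processed."""
    return tokens_per_ticket / 1000.0 * price_per_1k_tokens


def dedicated_cost_per_ticket(fixed_monthly_cost: float,
                              tickets_per_month: float) -> float:
    """Dedicated capacity: one fixed bill spread across all tickets processed."""
    if tickets_per_month <= 0:
        raise ValueError("tickets_per_month must be positive")
    return fixed_monthly_cost / tickets_per_month


def break_even_tickets(fixed_monthly_cost: float,
                       tokens_per_ticket: float,
                       price_per_1k_tokens: float) -> float:
    """Monthly volume above which dedicated capacity is cheaper per ticket."""
    per_ticket = token_priced_cost_per_ticket(tokens_per_ticket, price_per_1k_tokens)
    return fixed_monthly_cost / per_ticket
```

Above the break-even volume, each additional ticket lowers the dedicated cost per ticket, while the token-priced cost per ticket stays flat.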

The Solution

Building a specialized AI layer to parse unstructured field reports

Innoviti developed a specialized AI engine designed to function as a structured audit layer for its distributed retail network. The system interprets the complex, often messy data generated during field interventions and converts it into verifiable operational records.

The architecture consists of two primary components:

Field Data Extraction Service.

A specialized service that parses unstructured diagnostic logs, handwritten agent notes, and terminal telemetry into structured data points.

Work Done Verification Engine.

An automated audit service that cross-references extracted field data against internal terminal health metrics, governance signals, and brand verification rules to confirm the successful resolution of hardware tickets.
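
As a rough illustration of how the verification step could combine the three dimensions of job completion, the sketch below evaluates performance, governance, and brand separately and closes a ticket only when all three pass. It is a minimal sketch, not Innoviti's implementation: the `TicketEvidence` fields and the `verify_ticket` function are hypothetical names standing in for whatever the extraction service actually emits.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TicketEvidence:
    """Hypothetical structured output of the extraction step for one ticket."""
    test_transaction_ok: bool      # performance: digital signal from the terminal
    job_sheet_stamped: bool        # governance: store stamp found on the job sheet
    job_sheet_signed: bool         # governance: store manager signature found
    signboard_matches_store: bool  # governance: signboard photo matches the merchant record
    brand_stickers_intact: bool    # brand: stickers and signage visible and undamaged
    accessories_present: bool      # brand: expected accessories visible in the photos


def verify_ticket(evidence: TicketEvidence) -> Dict[str, bool]:
    """Evaluate each dimension on its own, then close only if all three pass."""
    result = {
        "performance": evidence.test_transaction_ok,
        "governance": (
            evidence.job_sheet_stamped
            and evidence.job_sheet_signed
            and evidence.signboard_matches_store
        ),
        "brand": evidence.brand_stickers_intact and evidence.accessories_present,
    }
    # Computed from the three dimension results before the key is added.
    result["close_ticket"] = all(result.values())
    return result
```

Keeping the per-dimension results in the output, rather than a single pass/fail flag, lets a reviewer see which proof failed when a ticket is held back.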

The technical foundation is a custom Qwen 3.0 VL model, fine-tuned to recognize the specific technical vocabulary, hardware error codes, store signboards, product stickers, accessories, stamped job sheets, and maintenance patterns unique to the retail payment industry.

To transition these services from experimental code to a reliable production environment, Innoviti migrated its workloads to a dedicated inference stack on Neysa Velocis.

“You cannot run a real time service operations setup on infrastructure you cannot see. General purpose cloud APIs kept us locked out of the serving stack, making it impossible to diagnose latency spikes, manage cost and truly control our model’s behaviour.”

– Girish Varadarajan, Chief Data and AI Officer, Innoviti Technologies Ltd

The Neysa Partnership

Building a dedicated low-latency AI stack for vision processing at scale

The partnership focused on architecting a low-latency inference layer on Neysa Velocis to support the high-throughput requirements of the Qwen 3.0 VL model. This allowed Innoviti to move its complex multimodal workflows, which analyze both physical hardware photos and technical text logs, into a secure, high-concurrency environment serving 50 parallel LLM inference requests.

Neysa’s engineering team worked alongside Innoviti to harden the stack for production:

Multimodal inference optimization. Neysa tuned the orchestration layer to handle Qwen 3.0 VL’s large context windows, processing hardware images, governance proofs, and diagnostic text in a single pass with optimal quantization.
White-box control and root observability. Root-level access to the serving stack let Innoviti’s team monitor memory utilization in real time, calibrate hardware settings directly, and lock in a fine-tuned LLM setup without the migration churn of shared APIs.
Deterministic latency and sovereign control. Dedicated AI compute pods process every field report in under 30 seconds, with the data residency and infrastructure isolation payment workloads require.
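
The concurrency and latency figures above (50 parallel requests, 30 seconds per report) suggest a client-side pattern along the following lines. This is a sketch under stated assumptions, not the Neysa or Innoviti orchestration layer: `process_reports` and the `infer` callable are hypothetical, and `infer` stands in for whatever call reaches the model endpoint.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

MAX_PARALLEL = 50        # parallel inference requests, as stated in the text
DEADLINE_SECONDS = 30.0  # per-report latency budget, as stated in the text


async def process_reports(
    reports: List[str],
    infer: Callable[[str], Awaitable[str]],
    max_parallel: int = MAX_PARALLEL,
    deadline: float = DEADLINE_SECONDS,
) -> List[Optional[str]]:
    """Run `infer` on every report with bounded concurrency.

    Results come back in input order; None marks a report whose inference
    call exceeded the deadline. The deadline covers the inference call only,
    not time spent waiting for a free slot.
    """
    gate = asyncio.Semaphore(max_parallel)

    async def one(report: str) -> Optional[str]:
        async with gate:  # at most `max_parallel` calls in flight
            try:
                return await asyncio.wait_for(infer(report), timeout=deadline)
            except asyncio.TimeoutError:
                return None  # caller can route this ticket to manual review

    return await asyncio.gather(*(one(r) for r in reports))
```

Returning None for a missed deadline, instead of raising, keeps one slow report from failing the whole batch.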

“Neysa provided the goldilocks zone we were looking for. They gave us the exact balance of root level control, dedicated compute, and hands on engineering support to get our system ready for production.”

– Girish Varadarajan, Chief Data and AI Officer, Innoviti Technologies Ltd

The Impact

Sustaining production-scale AI across a 50,000 merchant retail network

The migration to Neysa Velocis transformed Innoviti’s infrastructure from an operational constraint into a commercial advantage.

60% reduction in total cost of ownership.

A dedicated resource model removed the scalability penalty of variable token pricing, and automating verification freed manual review teams for higher-value customer support work.

7,000+ service ticket logs processed daily.

Production-scale AI sustained across 50,000+ merchant stores in 2,000 cities, absorbing the daily maintenance surge without backlog.

Deterministic sub-30-second latency.

Vision-heavy field proofs and technical logs process fast enough to close tickets while agents are still on site, not after they’ve moved on.

96% automated verification accuracy with zero false positives.

Audit quality that matches or exceeds manual human review across all three dimensions of job completion – performance, governance, and brand.

Take the next step

Ready to move your enterprise AI use cases to production? Contact our team today to request a technical demonstration and explore our managed inference solutions.
