Scalable Computer Vision in Enterprises: Deployment, Limitations and Future Directions.

Denis PINCHUK

doi:10.37547/tajet/Volume07Issue07-03

Authors

Denis PINCHUK
Senior Data Science Engineer at The Walt Disney Company, USA

DOI:

https://doi.org/10.37547/tajet/Volume07Issue07-03

Keywords:

computer vision artificial intelligence neural networks enterprise workflows business process optimization

Abstract

Computer vision (CV) is increasingly embedded in enterprise workflows. This article presents a comprehensive analysis of how CV systems are being used to automate complex visual tasks, replace repetitive labor, and enhance decision-making in different industries at scale. Special attention is given to the key determinants of CV effectiveness and operational challenges companies face when implementing the technology. The author notes that treating computer vision not as a static tool but as an evolving infrastructure, organizations can unlock substantial value while preparing for the next generation of AI-driven optimization.

The American Journal of Engineering and Technology

18

https://www.theamericanjournals.com/index.php/tajet

TYPE

Original Research

PAGE NO.

18-23

DOI

10.37547/tajet/Volume07Issue07-03

OPEN ACCESS

SUBMITED

18 June 2025

ACCEPTED

29 June 2025

PUBLISHED

04 July 2025

VOLUME

Vol.07 Issue 07 2025

CITATION

Denis PINCHUK. (2025). Scalable Computer Vision in Enterprises:
Deployment, Limitations and Future Directions. The American Journal of
Engineering and Technology, 7(07).
https://doi.org/10.37547/tajet/Volume07Issue07-03

COPYRIGHT

© 2025 Original content from this work may be used under the terms
of the creative commons attributes 4.0 License.

Scalable Computer Vision
in Enterprises:
Deployment, Limitations
and Future Directions.

Denis PINCHUK

Senior Data Science Engineer at The Walt Disney
Company, USA

Abstract:

Computer vision (CV) is increasingly

embedded in enterprise workflows. This article presents
a comprehensive analysis of how CV systems are being
used to automate complex visual tasks, replace
repetitive labor, and enhance decision-making in
different industries at scale. Special attention is given to
the key determinants of CV effectiveness and
operational

challenges

companies

face

when

implementing the technology. The author notes that
treating computer vision not as a static tool but as an
evolving infrastructure, organizations can unlock
substantial value while preparing for the next
generation of AI-driven optimization.

Keywords:

computer vision, artificial intelligence, neural

networks, enterprise workflows, business process
optimization.

INTRODUCTION

Computer vision (CV), a subfield of artificial intelligence
focused on enabling machines to interpret and process
visual information, has transitioned from experimental
research into widespread industrial application. As
enterprises increasingly seek to automate visually
intensive tasks, CV systems are being integrated into
core business workflows across diverse sectors,
including manufacturing, retail, logistics, insurance,
agriculture, and public safety. These systems promise
not only operational efficiency and cost savings, but also
new levels of scalability. However, real-world
deployments reveal a complex landscape marked by
significant technical, infrastructural, and organizational
challenges.

This article provides a comprehensive overview of how
computer vision is currently used to optimize business

The American Journal of Engineering and Technology

19

https://www.theamericanjournals.com/index.php/tajet

processes, with an emphasis on practical use cases,
deployment barriers, and emerging trends. It draws on
both peer-reviewed research and first-hand industry
experience to examine the determinants of system
effectiveness,

including

data

quality,

model

architecture, computational constraints, and integration
strategies. The analysis highlights the need to treat CV
not as a static automation tool, but as a continuously
evolving component of enterprise infrastructure

—

one

that demands adaptive pipelines, robust evaluation
metrics, and long-term cross-functional collaboration.
By critically evaluating existing applications and future
directions, this paper aims to contribute to a deeper
understanding of the conditions under which computer
vision can deliver sustained value in real-world
enterprise environments.

Where Computer Vision Optimizes Business Processes

Computer vision replaces humans in tasks that require
constant visual attention

—

where people are prone to

fatigue, distraction, or inconsistency. The aim isn’t

simply to automate, but to do so with greater precision,
scale, and reliability.

In manufacturing, computer vision systems are widely
used to detect defects and monitor production lines.
They can spot inconsistencies in materials or shapes in
milliseconds. Research shows that modern CV solutions
can achieve 97% inspection accuracy [1]. Companies like
Siemens already have AI-driven quality control systems
designed to identify anomalies and defects across
different industries. One example is the automotive
industry, where manufacturers must avoid scratches,
dents, poor welds, and defective electronic components
[2].

In retail, computer vision delivers measurable
improvements. One common use is shelf monitoring:
instead of relying on employees to walk around the
store checking stock, cameras and vision algorithms
detect when a product is running low or misplaced and
automatically signal staff. This streamlines restocking,
cuts labor costs, and helps avoid lost sales due to empty
shelves [3]. A more analytical application involves
optimizing product placement. The system tracks how
customers interact with displays and uses that data to
determine which shelf arrangements maximize
purchase behavior. Though this is a different business

goal

—

one focused on logistics, the other on psychology

and sales

—

both rely on nearly identical CV technology.

In logistics and warehousing there are CV applications

like package sorting or movement tracking. In Amazon’s

fulfillment centers, for instance, an AI-powered trio of
robotic arms sort, stack, and consolidate millions of
items and customer orders. The latest version of one of
them called Sparrow uses advanced computer vision to
handle over 200 million unique products of all different
shapes, sizes, and weights [4]. One of the recent studies
shows that a computer vision platform can lead to as
much as a 45% reduction in the time required for
inventory counting and a 9% increase in inventory
accuracy [5].

In insurance, companies use computer vision to assess
vehicle damage from photos submitted after an
accident, reducing the need for physical inspections and
speeding up claims processing. Tractable, for example,
offers a CV-based platform that enables insurers to
process claims up to ten times faster than traditional
methods, with damage detection and cost estimation
models trained on millions of data-rich images [6].

And in agriculture, large-scale farms employ drone
footage and satellite imagery enhanced by computer
vision models to monitor crop health, detect pest
infestations, and predict yield. These models can analyze
patterns in plant coloration or canopy size to trigger
alerts or guide precision interventions. For instance,
leveraging AI and CV, John Deere is building AI-equipped
robot sprayers reducing herbicide usage by only
targeting weeds [7].

Security and surveillance represent another widespread
application. Large companies, airports, and factories use
CV to detect unauthorized access, track movement
patterns, or recognize suspicious behavior. These
systems analyze dozens of camera feeds in real time and
can flag anomalies before human operators notice

them. Research shows the AI сamera market is expected

to generate $35,5 billion in sales by 2034 as smart
surveillance and analytics gain popularity [8].

An illustrative example from my professional experience
involves the deployment of computer vision systems at
Walt Disney amusement parks to ensure that park
visitors remain secure. We use a network of installed

The American Journal of Engineering and Technology

20

https://www.theamericanjournals.com/index.php/tajet

cameras and custom-built CV algorithms to monitor
guest behavior in real time. These models are trained to
identify potentially dangerous situations

—

like someone

standing up on a ride or entering a restricted area

—

and

automatically notify operators. Previously, this kind of
monitoring required a person to watch 10 or 12 screens
at once, which is not only exhausting but also error-
prone. Our system can process all video feeds
simultaneously and with greater consistency, allowing us
to detect risks faster and more reliably.

Key Determinants of CV System Effectiveness

The success of a CV solution in optimizing business
processes depends on several interrelated factors. The
first and arguably most important component is data
quality. If the raw images used for training or prediction
are poor, there is little that even the advanced model
can salvage. Lighting conditions, resolution, occlusions,
and changes in the physical environment all introduce

variance. On our own project, for example, we’ve had to

continually monitor how changes in the park
environment

—

such as the installation of a new ride

that casts shadows in different areas

—

affect model

performance. Even if the camera itself hasn’t moved, the
new visual context can alter what the model “sees.” This

underscores the importance not only of high-quality
data at the training stage, but also of consistent, well-
understood inputs at inference time.

Beyond quality, data preprocessing and balancing are
also critical. In real-world scenarios, it's common for
datasets to contain far more examples of "normal"
conditions than of rare or hazardous events. Without
deliberate augmentation, rebalancing, or synthetic data
generation, the model may struggle to generalize
properly

—

especially in safety-critical use cases where

false negatives are unacceptable. As research has
shown, imbalanced training data can lead to
performance degradation in CV models, particularly in
anomaly classification tasks [9].

Another major factor is model architecture. Selecting
the right neural network structure depends not just on
accuracy, but also on computational efficiency,
deployment scalability, and hardware compatibility. In

our project, we’ve recently begun the proces

s of

migrating from one model to a more modern one

—

not

because the old one failed, but because it couldn’t scale

as efficiently meeting real-time requirements across
video streams. This kind of transition is resource-
intensive. Even with an established annotation pipeline
and pre-balanced datasets, adopting a new model often

means rebuilding much of the system infrastructure. It’s

effectively a new project, albeit one built on familiar
tracks.

That’s why model selection is both a technical and a

business decision. On the one hand, models trained
several years ago may still perform reasonably well, but
newer models released are often more adaptive, faster,
and easier to scale. Where an older model may require
extensive tuning to support inference from hundreds of
video streams, newer transformer-based or quantized
CNN architectures can handle such demands natively
and with far lower latency. In some cases, the primary

motivation for switching isn’

t accuracy at all

—

it’s cost.

If a modern model can generate predictions with 50%
less GPU load or achieve true real-time inference, the
operational savings across data centers can quickly
justify the upfront migration costs.

From a business process standpoint, companies must
strike a balance between locking themselves into a long-

term model (creating a “one

-

way door”) and preparing

for regular evolution. Ideally, a model pipeline should be
designed with modularity in mind

—

allowing retraining,

adaptation, or substitution of core components without
rebuilding everything from scratch. This flexibility is
especially important in rapidly evolving environments
where camera views, lighting, or visual targets may shift
over time.

Ultimately, the effectiveness of a CV system is not static.
It is shaped by a continuous feedback loop between
data, model, environment, and business constraints.
The companies that benefit most from computer vision
are those that treat it not as a static tool, but as a living
infrastructure

—

designed to evolve alongside

technology, context, and operational needs.

Implementation Challenges and Common Pitfalls

Despite the growing maturity of computer vision
technology, many implementation efforts still fall short
due to recurring and often underestimated challenges.
One of the most common problems is the lack of high-
quality training data discussed earlier. Computer vision
is data-hungry, and while organizations may have access

The American Journal of Engineering and Technology

21

https://www.theamericanjournals.com/index.php/tajet

to large volumes of images or video, these assets are
often unusable without proper annotation.

Sometimes data is simply missing

—

there aren’t enough

examples of edge cases or rare classes. Other times, the
annotations are too crude to be useful; for example,
drawing bounding boxes around objects may work for
simple detection tasks, but segmentation problems
require pixel-level precision. Without careful annotation
strategies, models end up overfitting to narrow patterns

—

essentially memorizing what an object “usually looks

like” rather than learning how to generalize to new

contexts.

To mitigate this, teams often turn to techniques like data
augmentation

—

flipping, cropping, rotating, or adding

noise to existing samples to artificially expand the
dataset. More advanced strategies include synthetic
data generation, where entire scenes are created
digitally to simulate rare or difficult-to-capture
conditions. Research has shown that training with high-
quality synthetic data, especially when combined with
domain adaptation techniques, can yield performance
close to real-data-trained models [10]. Recent advances
in self-supervised learning also opens possibilities for
models to bootstrap their own learning without fully
labeled data [11]. However, these methods require
expertise to implement effectively, and poor execution
can lead to models that perform well in simulation but
break down in the real world.

Another major barrier is the computational cost. Even
when the model performs well during testing, it may
require expensive GPU clusters to run at scale or meet
real-time demands. Many state-of-the-art vision models
are extremely resource-intensive, requiring high-
performance GPUs, expensive cloud infrastructure, and
significant energy consumption [12]. Updating to more
efficient architectures or shifting to edge-computing
strategies can reduce hardware load

—

but such

transitions must be weighed against engineering
overhead and financial cost.

Technical infrastructure often becomes a limiting factor.
Unlike typical web applications, computer vision
requires tight integration between hardware (cameras,
network interfaces, storage systems) and software.
Bandwidth constraints, latency, and storage limitations
become serious concerns, especially when dealing with

high-resolution live video. Organizations may also lack
the backend maturity to support continuous
deployment (CI/CD) pipelines or robust monitoring
systems to detect model drift

—

where prediction

accuracy degrades as real-world conditions shift over
time.

A further challenge, often overlooked, is organizational
and operational misalignment. Business leaders may
evaluate new CV deployments not against a defined set
of key performance indicators (KPIs), but against legacy
manual processes

—

often without establishing success

criteria or appropriate benchmarks. Such misalignment
can lead to unrealistic expectations, misinterpretation
of results, and the premature abandonment of
promising technologies.

I encountered this firsthand while working with the
Florida Department of Transportation. The goal was to
monitor vehicle counts and classify traffic composition
on public roads to inform infrastructure planning and
safety initiatives. Traditionally, the department relied on
inductive loops

—

physical sensors embedded in

roadways

—

to collect such data. Seeking a more

scalable solution, they aimed to repurpose existing
surveillance cameras for this task.

I was tasked with designing and deploying a CV-based
system capable of analyzing live video streams of
relatively low quality. The model needed to classify
vehicles in real time by type (e.g., passenger car, truck).
I selected YOLO (You Only Look Once), an object
detection architecture recognized for its high processing
speed [13]. While this choice involved a minor trade-off
in accuracy compared to more computationally

intensive models, YOLO’s real

-time performance was

essential, as legal constraints prohibited the department
from storing any footage.

The resulting system achieved 97% classification
accuracy

—

surpassing the 90% accuracy of the legacy

inductive loop system. Yet even this comparison
undersells the advantages of the CV-based approach.
Unlike physical sensors, which must be installed
individually across thousands of roadways, CV models
can be deployed across existing camera networks with
minimal marginal cost. Moreover, such systems are
inherently more maintainable: they do not degrade
physically, require no manual recalibration, and offer

The American Journal of Engineering and Technology

22

https://www.theamericanjournals.com/index.php/tajet

centralized scalability. The department gained a more
reliable, flexible, and cost-efficient source of traffic
intelligence

—

one that fundamentally changed the

nature of their data infrastructure.

This case underscores why the evaluation of CV systems
requires a shift in mindset. Traditional performance
metrics such as accuracy or cost must be contextualized
within

broader

considerations

—

deployment

scalability, operational resilience, and long-term
maintainability. In many instances, success cannot be
determined through a direct comparison to legacy
systems, but rather through a holistic assessment of

how well the technology aligns with the organization’s

evolving goals and constraints.

CONCLUSION AND FUTURE RESEARCH DIRECTIONS

Despite its immense potential, computer vision remains
a field where technical success and business value do
not always align seamlessly. Many organizations
underestimate the complexity of implementation.
Challenges

like

data

imbalance

and

shifting

environments can derail promising pilot projects if not
addressed proactively. Additionally, success metrics
often remain misaligned across teams, and traditional
benchmarks fail to capture the strengths of scalable CV
deployments. This calls for a mindset shift: evaluating
computer vision not merely as an automation upgrade,
but as an evolving system that demands infrastructure
thinking, adaptive strategies, and long-term cross-
functional collaboration.

Looking ahead, the future of enterprise computer vision
lies in greater modularity, improved model efficiency,
and tighter integration with self-learning systems. As
new architectures like transformer-based CV models
mature and edge computing becomes more accessible,
companies will be able to deploy vision solutions more
flexibly and cost-effectively.

A particularly promising direction is the integration of
large language models with CV systems, enabling more
context-aware, multimodal AI solutions. This fusion
allows enterprises not only to "see" but also to
"understand" visual input in more complex operational
contexts

—

such as interpreting surveillance footage

alongside written reports or automating inspection
workflows using both image data and text-based
specifications. Simultaneously, advances in synthetic

data generation and self-supervised learning are
reducing dependency on fully labeled datasets, making
it easier to scale in data-constrained domains.

Ultimately, organizations that treat computer vision as a
strategic capability

—

not a plug-and-play tool

—

will be

better positioned to harness its full transformative
power across industries.

REFERENCES

1.

El Melhaoui Ouafae,

Islam El Melhaoui, Faouaz

Jeffali, Sara Said, S Elouaham, Integration of
artificial intelligence algorithms for defect
detection and shape recognition in mechanical
quality

control

//

Interactions,

2024.

DOI: 

10.1007/s10751-024-02235-y

2.

Siemens, AI

‑

based quality inspection, URL:

https://www.siemens.com/global/en/products/a
utomation/topic-areas/industrial-ai/usecases/ai-
based-quality-inspection.html

3.

Rocco Pietrini, Marina Paolanti, Adriano Mancini,
Emanuele Frontoni, Primo Zingaretti, A deep
learning-based system for shelf visual monitoring
// Expert Systems with Applications, 2024.

DOI: 

10.1016/j.eswa.2024.124635

4.

Amazon, Amazon fulfillment center robotics and
AI,

2024.

URL:

https://www.aboutamazon.com/news/operation
s/amazon-fulfillment-center-robotics-ai

5.

William

Villegas-Ch,

Alexandra

Maldonado

Navarro, Santiago Sanchez-Viteri, Optimization of
inventory management through computer vision
and machine learning technologies // Intelligent
Systems

with

Applications,

2024.

DOI:

10.1016/j.iswa.2024.200438

6.

InsurTech Digital, Insurance claims AI unicorn

Tractable closes $65 M Series E, 2023. URL:

https://insurtechdigital.com/articles/insurance-
claims-ai-unicorn-tractable-closes-65m-series-e

7.

Chris Padwick, Director of Machine Learning and
Computer Vision, John Deere, Transforming
Agriculture with AI and Computer Vision, 2024.
URL:

h ttps://www.nvidia.com/en-us/on-

demand/session/gtc24-s63033/

The American Journal of Engineering and Technology

23

https://www.theamericanjournals.com/index.php/tajet

8.

Transparency

Market

Research,

Artificial

Intelligence (AI) Camera Market to Generate US$
35,5 Billion in Sales by 2034 as Smart Surveillance
and Analytics Gain Popularity, 2025.

9.

Manisha Saini, Seba Susan, Tackling class
imbalance in computer vision: a contemporary
review // Artificial Intelligence Review, 2023. DOI:

10.1007/s10462-023-10557-6

10.

Lukas Malte Kemeter, Rasmus Hvingelby, Paulina
Sierak, Tobias Schön, Bishwajit Gosswami,
Towards Reducing Data Acquisition and Labeling
for Defect Detection using Simulated Data, 2024.
arXiv:2406.19175v1 [cs.LG]

11.

Zhong Hong, Exploring self

‑

supervised learning:

training without labeled data, Medium, 2023. URL:

https://medium.com/@zhonghong9998/explorin
g-self-supervised-learning-training-without-
labeled-data-6e1a47dc5876

12.

AI Stack, How Much GPU Resources Are Required
for AI Development and ML Model Training, 2024.
URL:

https://ai-stack.ai/en/ai-model-training-gpu-

resource

13.

Momina Liaqat Ali, Zhou Zhang, The YOLO
Framework: A Comprehensive Review of
Evolution, Applications, and Benchmarks in Object
Detection // Computers, 2024, 13(12), 336. DOI:

10.3390/computers13120336

References

El Melhaoui Ouafae, Islam El Melhaoui, Faouaz Jeffali, Sara Said, S Elouaham, Integration of artificial intelligence algorithms for defect detection and shape recognition in mechanical quality control // Interactions, 2024. DOI: 10.1007/s10751-024-02235-y

Siemens, AI‑based quality inspection, URL: https://www.siemens.com/global/en/products/automation/topic-areas/industrial-ai/usecases/ai-based-quality-inspection.html

Rocco Pietrini, Marina Paolanti, Adriano Mancini, Emanuele Frontoni, Primo Zingaretti, A deep learning-based system for shelf visual monitoring // Expert Systems with Applications, 2024. DOI: 10.1016/j.eswa.2024.124635

Amazon, Amazon fulfillment center robotics and AI, 2024. URL: https://www.aboutamazon.com/news/operations/amazon-fulfillment-center-robotics-ai

William Villegas-Ch, Alexandra Maldonado Navarro, Santiago Sanchez-Viteri, Optimization of inventory management through computer vision and machine learning technologies // Intelligent Systems with Applications, 2024. DOI: 10.1016/j.iswa.2024.200438

InsurTech Digital, Insurance claims AI unicorn Tractable closes $65 M Series E, 2023. URL: https://insurtechdigital.com/articles/insurance-claims-ai-unicorn-tractable-closes-65m-series-e

Chris Padwick, Director of Machine Learning and Computer Vision, John Deere, Transforming Agriculture with AI and Computer Vision, 2024. URL: https://www.nvidia.com/en-us/on-demand/session/gtc24-s63033/

Transparency Market Research, Artificial Intelligence (AI) Camera Market to Generate US$ 35,5 Billion in Sales by 2034 as Smart Surveillance and Analytics Gain Popularity, 2025.

Manisha Saini, Seba Susan, Tackling class imbalance in computer vision: a contemporary review // Artificial Intelligence Review, 2023. DOI: 10.1007/s10462-023-10557-6

Lukas Malte Kemeter, Rasmus Hvingelby, Paulina Sierak, Tobias Schön, Bishwajit Gosswami, Towards Reducing Data Acquisition and Labeling for Defect Detection using Simulated Data, 2024. arXiv:2406.19175v1 [cs.LG]

Zhong Hong, Exploring self‑supervised learning: training without labeled data, Medium, 2023. URL: https://medium.com/@zhonghong9998/exploring-self-supervised-learning-training-without-labeled-data-6e1a47dc5876

AI Stack, How Much GPU Resources Are Required for AI Development and ML Model Training, 2024. URL: https://ai-stack.ai/en/ai-model-training-gpu-resource

Momina Liaqat Ali, Zhou Zhang, The YOLO Framework: A Comprehensive Review of Evolution, Applications, and Benchmarks in Object Detection // Computers, 2024, 13(12), 336. DOI: 10.3390/computers13120336

Scalable Computer Vision in Enterprises: Deployment, Limitations and Future Directions.

Authors

DOI:

Keywords:

Abstract

References

Categories

Information

Issue

Section

Downloads

How to Cite

License