The American Journal of Engineering and Technology
88
https://www.theamericanjournals.com/index.php/tajet
TYPE
Original Research
PAGE NO.
88-100
10.37547/tajet/Volume07Issue05-07
OPEN ACCESS
SUBMITED
19 March 2025
ACCEPTED
24 April 2025
PUBLISHED
12 May 2025
VOLUME
Vol.07 Issue 05 2025
CITATION
Anna Deviatko. (2025). Evolution of Automated Testing Methods Using
Machine Learning. The American Journal of Engineering and Technology,
7(05), 88
–
100. https://doi.org/10.37547/tajet/Volume07Issue05-07
COPYRIGHT
© 2025 Original content from this work may be used under the terms
of the creative commons attributes 4.0 License.
Evolution of Automated
Testing Methods Using
Machine Learning
Sr. QA engineer, PGA Tour Ponte Vedra, USA
Abstract:
program testing is crucial for guaranteeing
program dependability, but it has historically included
a lot of manual labor, which restricts coverage and
raises expenses. By creating and selecting test cases,
anticipating defect-prone locations, and evaluating test
results,
machine
learning
(ML)-driven
testing
approaches automate and improve traditional software
testing. This study examines the development of these
techniques. Significant enhancements are provided by
ML-driven techniques, such as early fault detection,
shorter testing times, and increased test coverage. The
paper offers a thorough synthesis of current
developments, contrasting ML-based testing with
conventional methods in a number of areas, including
efficacy and efficiency in defect identification. It also
highlights important research gaps, talks about real-
world
implementation
issues,
and
looks
at
multidisciplinary
uses
of
machine
learning
technologies, such as deep learning and reinforcement
learning. The paper concludes by highlighting machine
learning's revolutionary influence on software testing
procedures and projecting a time when testing will
become more independent, flexible, and incorporated
into ongoing software development processes.
Keywords:
adaptive leadership, organizational change,
crisis management, employee retention, workplace
innovation, participative change management, hybrid
work, leadership adaptability, career adaptability,
strategic agility.
Introduction:
For software systems to be reliable and
of high quality, software testing is essential.
Conventional testing methods can need a large amount
of manual labor to create and run test cases, which can
The American Journal of Engineering and Technology
89
https://www.theamericanjournals.com/index.php/tajet
take a lot of time and could not provide thorough
coverage. The discipline of ML-driven (or "intelligent")
software testing emerged as a result of researchers and
practitioners using machine learning (ML) and artificial
intelligence (AI) more and more during the past ten
years to improve and automate software testing
procedures. By automatically creating and prioritizing
test cases, anticipating defect-prone regions, and
analyzing test results, machine learning (ML)-driven
testing approaches can find flaws that manual methods
would overlook while also cutting down on testing time
and expense. For instance, ML-based test prioritization
can direct execution toward the most failure-prone
tests first, while ML-based test generation can get
greater code coverage than manual testing. These
benefits establish ML-driven testing as a revolutionary
development of conventional methods.
Traditional manual and heuristic-based testing
procedures frequently fail to retain efficacy and
efficiency in the increasingly complex and fast-paced
world of current software development, which makes
this topic crucial. Manual testing gets more expensive,
time-consuming, and prone to human error as software
systems get more complex. With its capacity to
automate intricate decision-making procedures and
analyze enormous volumes of data, machine learning
presents a viable answer to these problems.
Organizations can increase test coverage, identify
errors sooner, shorten testing times, and improve
software quality and dependability by incorporating
machine learning (ML) into testing procedures.
This review offers a unique synthesis of a broad range
of recent scholarly research to give a thorough
overview of how machine learning techniques
—
from
conventional supervised learning to state-of-the-art
deep learning and reinforcement learning methods
—
are actively reshaping software testing processes,
whereas previous literature has primarily concentrated
on isolated applications or particular methodologies.
This article's contribution is its comprehensive
comparison of ML-driven testing techniques with
conventional testing methods, outlining their distinct
benefits, drawbacks, and useful implications for
software engineering processes. By highlighting
important research gaps, illustrating thematic links
through bibliometric studies, and talking about the
useful application of ML technologies in actual testing
situations, it significantly deepens our understanding of
the area. Additionally, the paper offers an integrated
viewpoint on how various machine learning methods
come together to tackle challenging testing problems
by examining interdisciplinary intersections like
computer vision and natural language processing.
Finally, by clearly illustrating the timeline of machine
learning's revolutionary influence on software testing
and defining prospective research avenues crucial for
ongoing
innovation
in
intelligent
testing
methodologies, this review provides insightful
information to scholars, practitioners, and industry
stakeholders.
METHODS AND MATERIALS
In order to contextualize and examine the
incorporation of machine learning into software
testing, this review consults a variety of scholarly and
peer-reviewed sources. The progress of testing
activities like test generation, defect detection, and test
suite optimization using ML techniques is highlighted in
Ajorloo et al.'s (2024) systematic analysis of machine
learning approaches in software testing [1]. In support
of this, Boukhlif et al. (2023) provide a bibliometric
analysis of intelligent software testing during the
previous ten years, mapping research trends,
important contributors, and developing themes in the
area with the aid of visualization tools [2]. The
potential of hybrid AI approaches in addressing
particular testing difficulties is demonstrated by
Esnaashari and Damia's (2021) targeted application of
genetic algorithms and reinforcement learning for
automated test data production [3].
One of the earliest investigations of AI in software
testing is provided by Last, Kandel and Bunke (2004),
who provide a fundamental viewpoint on AI
approaches such as fuzzy logic and neural networks in
the testing setting [4]. Mehmood et al. (2024), who
examine test suite optimization using a variety of
machine learning techniques, provide a thorough and
current summary of recent developments [5]. A
detailed bibliometric analysis and systematic literature
review are provided by Obreja et al. (2024), who map
the conceptual framework of innovation in AI research
and emphasize multidisciplinary links and trends
influencing the incorporation of AI in innovation
contexts [6]. In their empirical investigation of practical
ML testing procedures, Openja et al. (2024) examine
The American Journal of Engineering and Technology
90
https://www.theamericanjournals.com/index.php/tajet
how various ML software systems handle attributes
including security, fairness, and correctness [7]. Similar
to this, Pan et al. (2022) examine ML-based methods for
prioritizing and choosing test cases, identifying patterns
and gaps in the use of ML models to expedite regression
testing [8].
Sebastian et al. (2024) demonstrate the emergence of
data-driven optimization strategies by conducting a
thorough mapping analysis of unsupervised machine
learning techniques used for test suite reduction in
order to highlight real-world applications [9]. Wahono
(2015) offers domain-wide and historical insights, and
his systematic literature assessment on defect
prediction offers preliminary evidence for the
predictive utility of machine learning in software
reliability [10]. Finally, the paper by Zardari et al. (2022)
put current research orientations into context and
pinpoint areas that need more attention [11].
The materials analyzed cover a range of ML-enhanced
testing operations. The literature based on major
themes: (4) Oracle and result analysis (using ML to
detect anomalies or verify outputs, including
metamorphic testing and log analysis); (3) Defect
prediction and fault localization (training models to
predict likely defect locations or failure-prone
components); (2) Test suite optimization (including test
case prioritization, selection, and reduction using ML
techniques); and (3) Test input generation (using ML to
create test cases or test data automatically). Within
each category, the comparison baseline (manual
testing, random testing, or other conventional
automation) and the types of machine learning (ML)
methodologies utilized (e.g., traditional supervised
learning, deep learning (DL), reinforcement learning
(RL), or hybrid techniques) are noted.
RESULTS AND DISCUSSION
Applying AI methods to software testing is not a
completely novel concept. Researchers suggested
automating some aspects of the testing process as early
as the 2000s by utilizing techniques like fuzzy logic,
neural networks, and AI planning. An early book by Last
et al. (2004), for instance, gathered a variety of AI
techniques in software testing, including case-based
reasoning for identifying dangerous code modules and
neural networks for creating test cases [4]. A corpus of
research on machine learning-based defect prediction
surfaced during the 2010s. In order to forecast which
components are likely to contain bugs, machine
learning classifiers (such as decision trees, Bayesian
networks, and support vector machines) are trained
using historical code and bug data. Although it focuses
on QA analytics rather than test automation
specifically, rigorous studies of software defect
prediction by 2015 have established it as an effective
application of machine learning in software engineering
[10]. The use of search-based algorithms (genetic
algorithms, etc.) for test generation was another
forerunner of contemporary ML-driven testing. Search-
based testing (SBST), which is frequently regarded as a
component of artificial intelligence (AI) in testing,
automatically creates test inputs by optimizing a fitness
function (such code coverage). Although not all SBST
methods use machine learning, they all adhere to the
principles of informed search and automation.
Interest in AI-driven testing increased between 2016
and 2020. According to Zardari et al.'s thorough
bibliometric analysis, there was an increase in
publications between 2016 and 2019, reaching a high in
2019, and then seeing a minor decline in 2020
–
2021
(partly due to COVID-19 disruptions) [11]. Specialized
subfields emerged around this time, most notably test
case
prioritization
utilizing
machine
learning.
Continuous integration made test selection and
prioritizing a priority since ML models can forecast
which test cases are most likely to identify errors
following a code change as test suites get bigger.
Researchers have been depending more and more on
machine learning (ML) methods in recent years to
accomplish efficient test case selection and
prioritization (ML-based TSP), according to Pan et al.
(2022) [8]. More quickly than traditional regression
testing (e.g., running tests in fixed or random order),
machine learning (ML) models can learn to rank test
cases in an order that finds flaws by incorporating
information from multiple sources (code changes,
previous failures, coverage, etc.). Numerous
investigations conducted between 2018 and 2020
proved the feasibility of this strategy, frequently
employing test execution data from the past to develop
failure prediction models. -prone evaluations.
The distribution of 1225 articles on innovation and
artificial intelligence (AI) across the social sciences is
displayed in Figure 1, which makes it evident that
The American Journal of Engineering and Technology
91
https://www.theamericanjournals.com/index.php/tajet
scholarly interest is on the rise.
Figure 1. Analysis of research interest in AI by Obreja et al. [6]
Following a period of little activity, there was a
noticeable uptick beginning in 2018, with publications
increasing dramatically from 60 to 350 by 2023. This
quick expansion indicates increased interdisciplinary
research and use of AI approaches in a variety of
subjects, reflecting a rising understanding of AI's
revolutionary potential across disciplines. The dramatic
rise highlights how AI has progressed from specialized
study to broad scholarly recognition, emphasizing
important research topics and the increasing use of AI-
driven approaches to tackle challenging innovation
problems. The figure demonstrates this academic
momentum and suggests that AI will continue to grow
and be more deeply integrated into future research
projects.
The close ties between researchers, geographical areas,
and new subjects in intelligent software testing are a
major finding of the most recent bibliometric
investigation. As illustrated in Figure 2, for example,
the Three-Field Plot graphically depicts the connections
among nations, authors, and document title keywords.
The American Journal of Engineering and Technology
92
https://www.theamericanjournals.com/index.php/tajet
Figure 2. Three-Field Plot by Boukhlif et al. [2]
Each of the three fields is represented by a colored
rectangle in this picture. The thickness of the lines
connecting them shows how strong these links are,
while the height of the rectangle represents the total
weight of their connections. Notably, the plot
emphasizes
how
significant
contributions
are
concentrated in nations like the USA, China, and India,
which are closely related to the main study issues
represented in the keywords of the document title.
This implies that these areas not only produce a lot, but
also have similar research interests that spur
advancements in ML-driven testing techniques.
Automated test generation using machine learning is
another field that developed during this period. The
late 2010s saw a greater usage of machine learning for
test generation, whereas earlier attempts relied on
evolutionary algorithms. For instance, researchers
looked into using training models to direct fuzz testing
or produce inputs that meet complex constraints.
Although early results were limited, the idea of
employing neural networks to generate test cases has
been proposed by 2019. Additionally, around this time,
there was an increase in interest in using natural
language processing (NLP) in testing, such as parsing
requirements or user stories to create test cases or
examining system logs to look for anomalies. NLP-
assisted software testing techniques were laid out by
Garousi et al. (2020), showing how ML (especially NLP)
intersects with testing chores including requirements-
based testing and test oracle development.
The discipline has developed quickly in recent years,
embracing cutting-edge machine learning techniques.
The American Journal of Engineering and Technology
93
https://www.theamericanjournals.com/index.php/tajet
According to Ajorloo et al. (2024), a wide range of
machine learning techniques are currently being used
in various testing phases, and as the field develops,
comprehending these techniques and their difficulties
has grown more difficult [1]. A few key trends can be
used to describe the current situation. First, supervised
machine learning models trained on data (such as code
metrics and previous test outcomes) are used in many
testing applications. For example, classification or
regression models are used to forecast the priority of a
test or if a certain code commit will damage the build
(continuous integration testing). According to a
thorough analysis of test suite optimization methods,
Mehmood et al. (2024) find that while deep learning
and hybrid approaches are becoming more popular,
they are still less common than standard supervised
machine learning techniques (such as random forests
and support vector machines) [5]. This implies that
tabular data and classical machine learning are
sufficient and have been used the most thus far for
many problems (such ranking test cases or forecasting
flaws).
Second, deep learning has begun to become a
significant factor, particularly for activities that use
complicated data (such as textual specifications, GUI
pictures, or computer source code). The ability of
neural networks
—
including sophisticated architectures
like transformers
—
to automatically extract features
from unprocessed inputs is useful for creating test
cases or spotting errors with subtle patterns. One study
claims that neural networks have surpassed more
established approaches like Bayesian networks in
popularity as one of the most widely used methods for
classification tasks in testing (such as determining if a
test will fail). Two prominent examples are A3Test
(2023) and AthenaTest (2020), which frame the
creation of unit tests as a neural machine translation
problem
—
that is, converting code into tests. A3Test, a
deep learning-based method, has shown increased
correctness in generated tests by supplementing neural
test production with learnt assertions. Deep models
may handle richer input data (such source code tokens
or GUI pixel data) to produce or assess tests in ways
that previous approaches could not, which signifies a
considerable shift in capacity from simple ML models.
A Sankey diagram showing the relationship between
the types of machine learning systems and the
attributes they examine is shown in Figure 3.
The American Journal of Engineering and Technology
94
https://www.theamericanjournals.com/index.php/tajet
Figure 3. Sankey diagram by Openja et al. [7]
A startling imbalance in actual ML testing procedures
is seen in the diagram. Correctness dominates the flow
of testing efforts and is by far the most tested attribute
across all domains, including recommendation,
computer vision (CV), natural language processing
(NLP), and application platforms. Critical attributes like
Adversarial Perturbation, Security & Privacy, and Bias &
Fairness, on the other hand, are examined far less
frequently and are limited to a small number of system
kinds. For example, recommendation and CV systems
pay some attention to bias and fairness, but security
and privacy are rarely evaluated. The article's assertion
that existing techniques are still narrowly focused
despite the incorporation of machine learning into
contemporary testing workflows is supported by this
figure. It draws attention to the necessity of more
comprehensive, property-aware testing techniques
that take into account the particular difficulties
presented by ML systems, particularly when they are
used in fields with higher stakes and greater sensitivity.
Third, mixing machine learning with other methods is
becoming more and more popular. Some strategies
combine several ML models in an ensemble or combine
ML with search-based techniques to create learning-
optimization hybrids. To help with test suite reduction,
Tahvili et al. (2020), for instance, clustered similar test
cases using Word2Vec embeddings, an unsupervised
machine learning technique for text, in conjunction
with clustering. In test generation research, hybrid
approaches are also used, where a standard algorithm
may be guided by a learning component. Though they
were less numerous (about 8 out of 43) than studies
that only used ML or DL, the IEEE Access study on test
suite optimization acknowledges the existence of
"hybrid" category studies, which combine ML and other
AI techniques. Using reinforcement learning to direct a
search-based test generator is an effective hybrid
example; the search algorithm creates candidate
inputs, and RL determines which regions of the input
space to concentrate on. These pairings can take
advantage of the advantages of several methods.
Fourth,
evaluating
machine
learning
models
The American Journal of Engineering and Technology
95
https://www.theamericanjournals.com/index.php/tajet
themselves
—
often referred to as ML testing or AI
software testing
—
is a parallel topic that is outside the
purview of this essay but is nevertheless noteworthy.
The community has also been interested in how to
systematically test AI systems (e.g., testing neural
networks for safety), even if our focus is on utilizing ML
to test software. It's interesting to note that there is
some cross-pollination: methods such as metamorphic
testing, which was initially created to test non-
deterministic or "non-testable" programs, are now
applied to machine learning models, and concepts from
that field (such as adversarial input generation) are
incorporated into the creation of general tests for
conventional software. Although the primary focus of
this paper is machine learning (ML) as a tool for
software testing, the increasing demand for AI system
verification provides more motivation for automated,
intelligent testing techniques.
Defect prediction, basic categorization, or clustering for
test jobs are examples of early applications of machine
learning (ML) that have evolved into more complex,
learning-enabled automation across the testing
lifecycle. As the subject matures, there is also a
noticeable rise in interdisciplinarity, as methods from
computer vision, time-series analysis, natural language
processing, etc., are being applied to software testing
issues. Whether these ML-driven techniques truly
perform better than conventional testing techniques is
a key topic. According to the research, ML-based
testing can greatly increase productivity and the
efficacy of flaw detection in a variety of situations.
However, each strategy has a different level of
improvement and a different set of circumstances in
which it appears.
Compared to manual test design or basic random
testing, automated test generation using machine
learning (ML) seeks to improve coverage and identify
more flaws. Positive results have been documented in
studies. For example, by learning program behavior,
ML models have been used to produce test inputs that
achieve high path coverage. In addition to lowering the
overall testing effort, an industry case study on AI-
enhanced testing tools demonstrated notable
increases in test coverage and error detection rates.
Similarly, studies using the NEATEST method, which
combines neuroevolution with deep learning for game
testing, showed a 93% decrease in search time and
greater branch coverage (81.3% vs. 75.9%) when
compared to a non-ML baseline. These findings suggest
that ML can produce more efficient test cases more
quickly than conventional techniques when used
appropriately. It is important to remember that
identifying bugs is the ultimate goal; coverage is not the
only statistic. Certain tests generated by machine
learning may improve coverage without discovering
new flaws (particularly if the ML optimizes coverage
directly). Therefore, direct fault-finding ability is also
measured in recent works. The deep learning strategy
was able to automatically identify some regression
errors in the AthenaTest and A3Test studies for unit
tests, but human-written tests or heuristic techniques
like EvoSuite still discovered some errors that the ML
missed. All things considered, machine learning (ML)-
driven test generation enhances traditional testing by
rapidly addressing simple scenarios and regressions,
freeing up human testers to concentrate on intricate,
scenario-based testing where insight is required.
ML has demonstrated distinct benefits in the areas of
test suite optimization and prioritizing. Conventional
test priority frequently depends on human judgment or
basic heuristics (such as executing tests that cover
updated code first). ML-based prioritizing can identify
which tests are most likely to show failures based on
past data. According to empirical assessments,
machine learning techniques can perform better than
traditional methods like random ordering or entire
coverage. For instance, a research cited by Esnaashari
& Damia used reinforcement learning to prioritize
regression tests and outperformed coverage-based
sorting in terms of early fault detection rates [3]. Using
a learning-to-rank model for test cases, Zhang et al.
(2022) observed gains of several percentage points in
the Average Percentage of Faults Detected (APFD)
measure
compared
to
standard
techniques.
Development teams now receive feedback on failures
more quickly thanks to these enhancements, which
speeds up continuous integration cycles.
Adaptability is a noteworthy benefit of machine
learning in this field. While a hard-coded heuristic may
become less effective, an ML model can retrain on new
data and modify the prioritizing method as the
software and test features change. A limitation noted
in certain research, though, is that in order for ML
models to train efficiently, a sizable history of test
The American Journal of Engineering and Technology
96
https://www.theamericanjournals.com/index.php/tajet
results is necessary. Traditional approaches might be
more dependable in projects with sparse signals or very
infrequent failures. In actuality, a mixed strategy might
be applied (for example, start with basic principles and
then progressively switch to an ML model as data
accumulates).
A more analytics-focused application of machine
learning is defect prediction and fault localization,
which has a direct influence on testing by directing
where additional testing should be conducted. ML
classifiers may predict defect-prone components with a
reasonable level of accuracy, as demonstrated by
numerous
research
(typically
assessed
by
precision/recall or AUC). This has been advanced by
contemporary deep learning techniques (such as code
embeddings), which can occasionally detect intricate
bug patterns. ML-based defect prediction might focus
testing on modules that are statistically likely to fail, as
opposed to expert intuition or traditional static
analysis. In reality, businesses have employed these
models to effectively distribute testing resources
(Google's "Testing Reliability Predictor," for instance,
uses machine learning to determine how much testing
should be done on a particular code update). Here,
comparative
efficiency
is
frequently
context-
dependent; for example, an ML model may not perform
better than a straightforward heuristic in projects with
noisy data. However, ML forecasts have outperformed
random or simplistic approaches in large systems with
a long history, increasing the rate of fault identification
before release.
ML can occasionally help determine if a test succeeds
or fails (the oracle problem). ML-based vision
algorithms, for example, can identify visual
abnormalities in GUI testing that a conventional script
might overlook. Algorithms for anomaly identification
(typically unsupervised machine learning) in log
analysis highlight odd patterns that might point to
malfunctions. These ML-driven oracles can detect
subtle failures more easily than if exhaustive assertions
or filters were written by hand. Research conducted on
production systems between 2021 and 2023 has
demonstrated that machine learning-based anomaly
detectors can identify problems like memory leaks or
performance regressions before human operators do.
The false positive rate is a problem, too, as ML oracles
might sound the alarm for harmless abnormalities;
therefore, they need to be adjusted or used in
conjunction with conventional tests to prevent noise.
In many instances, ML-driven testing has shown gains
in efficacy (raising coverage and faults identified) and
efficiency (cutting testing time and effort) across
several domains. An important benefit in agile and
DevOps environments is the ability of intelligent testing
tools to run continuously and adapt. It's critical to
temper enthusiasm with real-world factors discussed in
the literature. First and foremost, machine learning
techniques require data
—
either runtime data for
unsupervised anomaly detection or training data for
supervised approaches. The ML technique may
perform poorly if a project has insufficient data or data
of poor quality (such as an untrustworthy test result
history). This cold-start issue is not present in
traditional testing. Second, software engineers might
be hesitant to depend on "black-box" machine learning
judgments for crucial testing assignments. For
example, there needs to be assurance that an ML
model isn't missing a bug if it deprioritizes some tests.
In order to provide light on the reasoning behind a
model's testing choices, researchers have started
investigating explainable AI methodologies. Teams
may employ ML recommendations in advisory mode
rather than entirely automatic, at least initially, in
situations when trust is low. Third, ML models
themselves need to be maintained, which includes
managing concept drift and upgrading as software
changes. In contrast to standard scripts, this adds
overhead. But a lot of contemporary ML-based testing
solutions are made to learn or update automatically,
which minimizes the need for manual retraining.
In conclusion, research indicates that ML techniques
can produce outcomes that are on par with or better
than traditional methods while requiring less human
intervention, especially when it comes to regression
testing and repetitive activities. In exploratory and
scenario-based testing, where human ingenuity and
domain expertise are essential, traditional testing still
performs exceptionally well. The overall research
narrative is headed towards supporting and enhancing
human testers rather than replacing them. ML-driven
automation eliminates time-consuming testing tasks
and identifies possible trouble spots, freeing up testers
to focus on more sophisticated behavior verification
and higher-level test design. Teams that successfully
The American Journal of Engineering and Technology
97
https://www.theamericanjournals.com/index.php/tajet
use AI into testing view it as a means of increasing test
productivity (commonly referred to as "augmented
testing") rather than as a completely hands-off testing
solution,
demonstrating
this
complementary
connection in industry acceptance.
The field of ML-driven testing is still developing, and
there are a few noteworthy new trends. Software
testing now has more options thanks to the
development of strong LLMs like GPT-3/4. These
models can produce code, including possible test cases
or test oracles, from natural language prompts since
they have been trained on extensive code corpora. In
certain situations, like GUI or game testing,
reinforcement learning (RL) has been used to teach an
agent how to use an application and identify bugs. The
program being tested is treated as an environment by
RL-based test agents, who then learn events (action
sequences) that maximize a reward (such as bug
discovery or coverage). This method is comparable to
an independent exploratory tester. The application of
RL algorithms for complex systems is increasing as
computer resources and algorithms get better (e.g.,
testing self-driving car software by learning sequences
of sensor inputs that trigger misbehavior). The special
benefit of reinforcement learning is its capacity to find
unknown unknowns, or unexpected sequences that
result in failures, that a deterministic script might not
attempt. A deep Q-learning agent was utilized in a
recent study by Al-Sabbagh et al. (2022) to prioritize
which tests to execute in continuous integration. In
some cases, the adaptive technique outperformed
static ordering. More RL integration with testing
frameworks is to be expected, potentially leading to
"smart test bots" that may be used to continuously
investigate a system in the field while learning from
each execution.
Although supervised learning has received a lot of
attention, unsupervised methods are essential in
situations when data labeling is challenging (e.g.,
differentiating between failing and passing conditions).
K-means and other clustering algorithms have been
effectively used to decrease test suites by combining
duplicate tests or to group comparable failure reports.
In a systematic mapping analysis, Sebastian et al. (2024)
emphasized how often K-means clustering is in test
suite reduction research, which frequently starts with
coverage metrics [9]. Using unsupervised machine
learning (ML) to address the test oracle problem is the
trend here. For instance, anomaly detection (clustering
or autoencoders) on program execution traces is being
used to identify deviations without the need for explicit
"failure" labels. There are also plans for self-supervised
learning, in which the system generates surrogate
labels from data. For example, it may generate
plausible invariants from execution data and then flag
violations as possible flaws without the need for
manual labeling. As these methods develop, they may
significantly lessen the requirement to preserve
predicted test results, enabling more flexible problem
identification.
The method of testing programs by comparing input-
output relationships (instead of outputs against a
predetermined predicted value) is known as
metamorphic testing. A synergy is emerging between
metamorphic testing and machine learning (ML): ML
models may be tested using metamorphic testing, and
ML can be used to discover metamorphic relations from
data. In order to automate the development of
metamorphic test cases, recent studies have started
utilizing machine learning (ML) to anticipate which
metamorphic relations hold for a given function. This is
particularly helpful for AI programs and other
algorithms without obvious oracles. We anticipate
greater effort to integrate these concepts and
successfully use machine learning to identify SUT
features that can function as pseudo-oracles.
Adding ML-driven testing tools to development
workflows is a useful trend. Both open-source and
commercial tools are becoming available (e.g.,
Microsoft's IntelliTest with some ML methods,
Launchable's predictive test selection, etc.). Although
many organizations have expressed enthusiasm in
using AI for testing, a 2022 quality assessment pointed
out that the promise has not yet been widely achieved
in practice. Industry reports on adoption have been
conflicting. Adoption is anticipated to quicken, though,
as tools continue to advance and encouraging case
studies are being documented. In the upcoming years,
CI/CD systems may incorporate intelligent testing
dashboards that optimize test execution schedule,
continuously assess test efficacy, and recommend new
tests using machine learning.
The dependability of ML itself is essential as it is
incorporated into the testing process. A flawed
The American Journal of Engineering and Technology
98
https://www.theamericanjournals.com/index.php/tajet
machine learning model could miss important tests or
produce inaccurate results. Thus, the question of how
to evaluate the testing tools is becoming more and
more important. Validating ML models used in safety-
critical testing has been mentioned in research from
2023 (e.g., making sure an ML-based test selector
doesn't systematically overlook a class of tests, which
could be perceived as a bias). The NIST AI risk
framework and other initiatives place a strong
emphasis on the thorough assessment of AI
components in software processes. When testing
procedures are created to assess how well ML performs
in testing, a feedback loop may be created (for
example, seeding known flaws and observing if the ML-
driven process discovers them). The implementation of
AI-driven quality assurance in businesses is probably
going to include this meta-testing of AI systems as a
routine practice.
When combined, these patterns point to a future
where intelligent agents will increasingly support
software testing. As routine creation and execution
tasks become more automated, the human tester's
function will become more focused on oversight,
strategy, and interpretation. With machine learning
(ML) modifying tests as software changes, testing may
turn into an ongoing, independent process that
operates in the background. This has significant
ramifications: testing may stop being a phase and
instead become an ongoing service associated with
software development.
CONCLUSION
Automated software testing techniques have advanced
significantly as a result of machine learning. The area
has advanced from early attempts to use AI for defect
prediction and test generation to a wide range of ML-
driven techniques that are used in many parts of the
testing lifecycle. Our analysis of current research from
2021 to 2025 demonstrates that ML-driven testing
techniques have advanced to the point where they can
provide real advantages in practice: they can decrease
the amount of manual labor required to create and
choose tests, increase test coverage, and identify flaws
that conventional techniques might overlook. By
learning from previous project data, supervised
learning models
—
in particular
—
have become widely
used in tasks such as test case prioritization, where they
consistently
outperform
simple
heuristics.
Simultaneously, the frontier of research is expanding
into massive language models, deep learning, and
reinforcement learning, creating new opportunities to
automate intricate testing situations and even produce
test logic that is human-like.
According to a comparative analysis, ML-driven testing
has significant benefits when used properly but is not a
panacea. Although traditional testing methods, which
are based on human knowledge and simple automation
scripts, are frequently transparent and dependable,
they are unable to keep up with the complexity and
quick speed of contemporary software development.
These issues are resolved by ML-driven approaches,
which use learning to manage scale and complexity.
For instance, as was mentioned, an ML-based test
prioritizer can quickly and meaningfully rearrange
thousands of tests following each code change,
something that is impossible to accomplish by hand for
every commit. Similarly, whereas a human test suite
may not keep up with changes, tools that use machine
learning to produce test cases may quickly adjust to
new features or patterns in the code. Studies have
quantitatively shown gains like increased fault
detection rates early in testing cycles or notable time
savings (around 20
–
30% in some industrial trials) as a
result of more intelligent test selection.
However, there are obstacles and ramifications for
software engineering if ML-driven testing is to be
successfully implemented in practice. As part of their
QA infrastructure, teams need to get or hone the skills
necessary to administer and maintain ML components.
This entails not just training models at first but also
tracking their performance over time, much like one
would track automation failures or test flaws.
Important data concerns also include the necessity of
gathering and selecting high-quality training data (such
as test results, code modifications, and coverage data)
to feed the algorithms. Setting up this data pipeline is
a requirement that costs money in many businesses.
The change in test engineers' roles and culture is
another impact. The function of a tester increasingly
takes on the responsibilities of a test strategy and
machine learning supervisor when routine test design
becomes automated. Test engineers might need to
comprehend how the machine learning models decide,
analyze their results, and step in when needed (for
example, overriding a model's choices if it's missing
The American Journal of Engineering and Technology
99
https://www.theamericanjournals.com/index.php/tajet
something crucial). To effectively use AI-assisted tools,
organizations may need to teach test teams on
fundamental data science or machine learning ideas.
Positively, removing humans from mundane duties
might improve the testing process by allowing testers
to concentrate on intricate, innovative scenarios while
leaving the repetitive tasks to the robots.
The incorporation of machine learning (ML) into testing
necessitates strong validation of these AI components
from a quality and safety perspective. Evidence that an
AI-driven testing technique is at least as effective as
conventional approaches and does not increase
unacceptable risk (for example, by overlooking errors)
may be needed in safety-critical areas (financial,
healthcare, automotive, etc.). Guidelines for applying
machine learning (ML) to the verification and validation
process will probably be incorporated into software
regulatory frameworks and standards, which are just
starting to take AI into account. It makes sense to
anticipate the creation of best practices or standards
for "ML in QA" in order to guarantee that the
advantages
are
realized
without
sacrificing
dependability.
In the future, it appears that automated testing
incorporating machine learning will continue to
advance at an accelerated rate. A future where testing
adjusts to software changes on its own is hinted at by
emerging techniques like generative testing, which
uses generative models to construct complex test
scenarios, and self-healing tests, which automatically
update themselves as code changes and use machine
learning to modify expected outcomes. An AI agent
that observes every commit, determines how to test it
on its own, learns from the results, and even enhances
the test suite by adding additional tests it finds
beneficial might greatly support the DevOps concept of
continuous testing. With the current direction of
research, such a concept is becoming more and more
attainable.
In the end, automated testing has evolved from a
primarily human scripting endeavor to a data-driven,
intelligent process thanks to machine learning.
Although there have been some new difficulties along
the way, testing has clearly improved in terms of
efficiency and efficacy. ML-driven testing is in a strong
condition right now: methods for many common
testing requirements are available and verified, and
new tools are being developed to make them
accessible. A major theme for the future will be
handling this complexity in a morally sound manner as
the software being tested and the testing methods
grow increasingly complicated (with AI on both sides).
AI engineering concepts must be incorporated into
software engineering as a discipline in order to ensure
quality. The combination of human and machine
intelligence in testing has the potential to produce
software with much greater quality in less time.
Software teams can more successfully integrate
machine learning into their testing procedures and
produce more dependable software systems with
confidence by taking note of the limitations and
achievements covered in this article.
REFERENCES
Ajorloo, S., Jamarani, A., Kashfi, M., Kashani, M. H., &
Najafizadeh, A. (2024). A Systematic Review of Machine
Learning Methods in Software Testing. Applied Soft
Computing,
162,
111805.
https://doi.org/10.1016/j.asoc.2024.111805
Boukhlif, M., Hanine, M., & Kharmoum, N. (2023). A
Decade of Intelligent Software Testing Research: A
Bibliometric Analysis. Electronics, 12(9), 2109.
https://doi.org/10.3390/electronics12092109
Esnaashari, M., & Damia, A. H. (2021). Automation of
Software Test Data Generation Using Genetic
Algorithm and Reinforcement Learning. Expert Systems
with
Applications,
183,
115446.
https://doi.org/10.1016/j.eswa.2021.115446
Last, M., Kandel, A., Bunke, H. (2004). Artificial
Intelligence Methods in Software Testing Series in
Machine Perception and Artificial Intelligence, Volume
56, 2004. World Scientific Publishing Co.
Mehmood, A., Ilyas, Q. M., Ahmad, M., & Shi, Z. (2024).
Test Suite Optimization Using Machine Learning
Techniques: A Comprehensive Study. IEEE Access, 12,
168645
–
168671.
https://doi.org/10.1109/ACCESS.2024.3490453
Obreja, D. M., Rughiniș, R., & Rosner, D. (2024).
Mapping the conceptual structure of innovation in
artificial intelligence research: A bibliometric analysis
and systematic literature review. Journal of Innovation
&
Knowledge,
9(1),
100465.
The American Journal of Engineering and Technology
100
https://www.theamericanjournals.com/index.php/tajet
https://doi.org/10.1016/j.jik.2024.100465
Openja, M., Khomh, F., Foundjem, A., Jiang, Z. M., Abidi,
M., & Hassan, A. E. (2024). An empirical study of testing
machine learning in the wild. ACM Transactions on
Software Engineering and Methodology, 34(1), 1-63.
Pan, R., Ghaleb, T. A., Bagherzadeh, M., & Briand, L.
(2022). Test Case Selection and Prioritization Using
Machine Learning: A Systematic Literature Review.
Empirical
Software
Engineering,
27,
81.
https://doi.org/10.1007/s10664-021-10066-6
Sebastian, A., Naseem, H., & Catal, C. (2024).
Unsupervised Machine Learning Approaches for Test
Suite Reduction: A Systematic Mapping Study. Applied
Artificial
Intelligence,
38(4),
310
–
330.
https://doi.org/10.1080/08839514.2024.2322336
Wahono, R. S. (2015). A systematic literature review of
software defect prediction. Journal of software
engineering, 1(1), 1-16.
Zardari, S., Alam, S., Al Salem, H. A., Al Reshan, M. S.,
Shaikh, A., Rehman, M. M. u., & Mouratidis, H. (2022).
A Comprehensive Bibliometric Assessment on Software
Testing (2016
–
2021). Electronics, 11(13), 1984.
https://doi.org/10.3390/electronics11131984
