Wrapper Algorithm for Multi-Instance Learning: Early Results

Abstract

Multi-instance learning is a generalization of supervised learning in which each example is represented by a labeled bag composed of a set of instances. Several multi-instance learning methods transform each bag into a single instance and then apply standard supervised learning methods. This paper presents a new multi-instance learning method that transforms the multi-instance data and is inspired by text mining. The proposed method transforms the multi-instance data into a traditional attribute-value representation by creating a corpus of documents formed from artificial words, in order to reduce the loss of information during the transformation. In addition, the proposed method was evaluated empirically on nine multi-instance data sets against two learning methods that also transform multi-instance data into a traditional attribute-value representation. The empirical study showed that, in terms of classification accuracy, the proposed method is competitive with the learning methods used in the comparison.

Source type: Conference
Quintero-Domínguez L., Antón Vargas J., & Pérez Madrigal S. (2025). Wrapper algorithm for multi-instance learning: early results. Digital Technologies and Law, 1(5), 14–20. Retrieved from https://inlibrary.uz/index.php/digteclaw/article/view/136488
L. Quintero-Domínguez, University of Sancti Spíritus «José Martí Pérez»
PhD, Associate Professor
J. Antón Vargas, University of Sancti Spíritus «José Martí Pérez»
MSc, Assistant Professor
S. Pérez Madrigal, University of Sancti Spíritus «José Martí Pérez»
Eng., Instructor

Digital technologies and law



L. A. Quintero-Domínguez,

PhD, Associate Professor,

University of Sancti Spíritus «José Martí Pérez»

J. A. Antón Vargas,

MSc, Assistant Professor,

University of Sancti Spíritus «José Martí Pérez»

S. Pérez Madrigal,

Eng., Instructor,

University of Sancti Spíritus «José Martí Pérez»

WRAPPER ALGORITHM FOR MULTI-INSTANCE LEARNING: EARLY RESULTS

Abstract. Multi-instance learning is a generalization of supervised learning in which each example is represented by a labeled bag composed of a set of instances. Several multi-instance learning methods transform each bag into a single instance and then apply standard supervised learning methods. This paper presents a new multi-instance learning method that transforms the multi-instance data and is inspired by text mining. The proposed method transforms the multi-instance data into a traditional attribute-value representation by creating a corpus of documents formed by artificial words, in order to reduce the loss of information during the transformation process. In addition, the proposed method was empirically evaluated on nine multi-instance datasets against two learning methods that also transform the multi-instance data into a traditional attribute-value representation. The empirical study indicates that, in terms of classification accuracy, the proposed method is competitive with the learning methods used in the comparison.

Keywords: Multi-instance learning, Bag-of-words, Wrapper method, data, algorithm, learning, learning methods


Introduction. Multi-instance learning is a generalization of standard propositional learning, also called attribute-value learning. In standard learning, an example is represented by a fixed-size vector of attribute-value pairs that has a class label associated with it. In multi-instance learning, an example is represented by a bag of attribute-value vectors, and the class label is associated with the whole bag.

Multi-instance learning has attracted increasing interest primarily because of the wide variety of real-world problems that can be modeled quite naturally as multi-instance problems. These problems include text classification [1], image retrieval and classification [3, 4], prediction of pharmacological activity [3], web index page recommendation [5] and prediction of academic performance [6].

Since the introduction of multi-instance learning, the number of multi-instance classification methods has grown considerably. Many authors have proposed categories to try to capture the distinctive features of these methods [1]. Recently, [5] proposed three main categories:

Instance-based methods: algorithms in which the learning process occurs at the instance level.

Bag-based methods: classifiers that work directly in the bag space.

Mapping-based methods (wrappers): classifiers that apply a transformation to the data of the multi-instance problem so that traditional supervised learning algorithms can be applied to obtain the solution.

Some methods in the wrapper category transform multi-instance problems into traditional learning problems by replacing each bag with an attribute vector consisting of summary statistics derived from the instances in the bag. These methods can lose information when transforming the original multi-instance problems, which affects classification effectiveness.
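The information loss mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the summary statistic is the per-attribute arithmetic mean; the function name `summarize_bag` and the toy bags are hypothetical and are not taken from the paper.

```python
from statistics import mean

def summarize_bag(bag):
    """Collapse a bag (list of equal-length numeric instances)
    into a single vector holding the mean of each attribute."""
    return [mean(column) for column in zip(*bag)]

# Two clearly different toy bags over a single attribute.
spread_bag = [[0.0], [4.0]]   # instances far apart
tight_bag = [[2.0], [2.0]]    # identical instances

# Both bags collapse to the same vector, so a classifier trained on
# the summarized data cannot tell them apart.
print(summarize_bag(spread_bag), summarize_bag(tight_bag))
```

Any other single summary statistic would exhibit the same kind of collision for suitably chosen bags; the mean is used here only because it is easy to check by hand.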

Here we present a multi-instance learning method belonging to the category of mapping-based ones. The proposed method, called MIBoW, is inspired by text mining techniques and other fields where bag-of-words representation has been used [1–8]. MIBoW aims to achieve a reduction of information loss during the transformation of multi-instance data into a traditional attribute-value representation. MIBoW can be seen as a transformation of the multi-instance dataset into a corpus of documents, where each bag becomes a document described by a set of artificial words that will be the attributes in the transformed dataset.

This paper presents the initial experimental evaluation performed to assess the effectiveness of the proposed method, in which nine datasets and two mapping-based multi-instance learning methods were used. The experimental results indicate that the proposed method is competitive with the learning methods used.

Methodology. This section gives a brief introduction to multi-instance learning, presents the proposed method and describes the experimental study conducted.

Multi-instance classification. In multi-instance classification, a training example is a bag that contains multiple instances described by attribute-value vectors and has a single class label associated with it. Formally, an example is a pair $(X, y)$, where $X = \{x_1, \ldots, x_T\} \in \mathbb{N}^{\mathcal{X}}$ is a multiset (bag) of $T$ instances and $y \in Y$ is the class label of the bag. A bag is defined as a multiset $X \in \mathbb{N}^{\mathcal{X}}$ because multiple copies of the same instance may be included in a bag. The instances $x_i \in \mathcal{X}$ $(i = 1, \ldots, T)$ are vectors of the $m$-dimensional space formed by the Cartesian product of the $m$ attributes describing the instances, and $Y$ is the set of class labels. The multi-instance classification task is to find a function $H : \mathbb{N}^{\mathcal{X}} \to Y$ that, learned from a training set $D = \{(X_1, y_1), \ldots, (X_T, y_T)\}$, makes it possible to predict the class of a previously unseen example.

Multi-instance classification methods generally assume the existence of some relationship between the instances and the class label of the bag. This relationship is referred to as the multi-instance hypothesis. A variety of multi-instance hypotheses have been introduced as new solution methods for multi-instance problems have been developed [9]. The first hypothesis employed to define multi-instance learning was the standard hypothesis [10].

The standard hypothesis states that a bag is positive if and only if it contains at least one positive instance. That is, if the bag is negative, all of its instances are negative; if the bag is positive, at least one of its instances is positive. Formally, given a function $h$ capable of estimating the class label of an instance, the standard hypothesis can be described as:

$$H(X) = \bigvee_{x_i \in X} h(x_i)$$
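The standard hypothesis amounts to a logical OR over instance-level predictions. The sketch below shows this with Python's built-in `any`; the instance-level classifier `h` used here is a hypothetical threshold rule chosen only to make the example runnable, not a classifier from the paper.

```python
def classify_bag(bag, h):
    """Standard multi-instance hypothesis: the bag is positive
    if and only if h labels at least one instance as positive."""
    return any(h(x) for x in bag)

def h(instance):
    """Hypothetical instance-level classifier: positive when the
    first attribute exceeds 0.5 (an assumption for illustration)."""
    return instance[0] > 0.5

positive_bag = [[0.1], [0.9], [0.2]]   # one positive instance is enough
negative_bag = [[0.1], [0.3]]          # every instance is negative

print(classify_bag(positive_bag, h), classify_bag(negative_bag, h))
```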

MIBoW Method. This section describes the main steps of the MIBoW method.

First, the multi-instance dataset is transformed into a corpus of documents represented in the Bag-of-Words (BoW) format. Each bag of instances is transformed into a textual document described by artificial words, which are constructed by combining each attribute name with its value: [attribute name]_[attribute value]. The attributes need to be discretized beforehand, so that numerical values do not generate an excessive number of artificial words. The proposed method goes through each instance of the bag and generates an artificial word for each of its attributes; together, these words form the document corresponding to the bag.
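The word-generation step can be sketched as follows. This is a minimal illustration that assumes already-discretized attribute values; the function name `bag_to_document`, the attribute names and the toy values are hypothetical, and only the `[attribute name]_[attribute value]` pattern comes from the text.

```python
def bag_to_document(bag, attribute_names):
    """Turn one bag into a 'document': a list of artificial words,
    one per (instance, attribute) pair, in the form name_value."""
    words = []
    for instance in bag:
        for name, value in zip(attribute_names, instance):
            words.append(f"{name}_{value}")
    return words

attribute_names = ["size", "color"]               # hypothetical attributes
bag = [["low", "red"], ["high", "red"]]           # two discretized instances

print(bag_to_document(bag, attribute_names))
# ['size_low', 'color_red', 'size_high', 'color_red']
```

Repeated words are kept on purpose, since the next step uses their frequency.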



Then, attribute-value vectors are constructed to represent the bags of the original multi-instance representation. To do this, each of the artificial words generated in the corpus of documents is treated as an attribute in the new representation. The value associated with each document (example) for a word (attribute) is the frequency with which that word occurs in the document. The document is then associated with the same class label as the bag it represents.
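A minimal sketch of this frequency-based representation is given below. It assumes each document is already a list of artificial words; the function name `documents_to_vectors` and the alphabetical ordering of the vocabulary are illustrative choices, not details stated in the paper.

```python
from collections import Counter

def documents_to_vectors(documents):
    """Build a vocabulary from all documents and represent each
    document as a vector of word frequencies over that vocabulary."""
    vocabulary = sorted({word for doc in documents for word in doc})
    vectors = []
    for doc in documents:
        counts = Counter(doc)  # missing words count as 0
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

documents = [
    ["size_low", "color_red", "size_high", "color_red"],  # document for bag 1
    ["size_low", "color_blue"],                           # document for bag 2
]
vocabulary, vectors = documents_to_vectors(documents)
print(vocabulary)  # ['color_blue', 'color_red', 'size_high', 'size_low']
print(vectors)     # [[0, 2, 1, 1], [1, 0, 0, 1]]
```

In this sketch the vocabulary is fixed by the documents passed in, so words that appear only in a later, unseen bag would have no column; how such words are handled is not specified in the text.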

Finally, after the multi-instance dataset has been transformed into the new attribute-value representation, a traditional learning algorithm is trained to obtain a model capable of classifying a previously unseen bag once that bag has been transformed into the same representation.

Experimental study setup. This section presents the initial experimental study conducted to evaluate the effectiveness of the proposed method. For this purpose, nine multi-instance data sets were used, which are described in Table 1. For comparison, in addition to the proposed algorithm, two multi-instance learning methods were employed, which, like MIBoW, perform a transformation of the data to a traditional attribute-value representation: SimpleMI [11] and MIWrapper [12].

Table 1. Characteristics of the data sets used in the experimentation

Dataset        Attributes   Positive bags   Negative bags   Total bags
AntDrugs5               5             198             202          400
Atoms                  10             125              63          188
Chains                 24             125              63          188
Corel01vs02             9             100             100          200
Corel02vs03             9             100             100          200
Corel03vs04             9             100             100          200
Corel04vs05             9             100             100          200
EastWest               24              10              10           20
TREC9Sel-1            299             200             200          400

The proposed method, as well as SimpleMI and MIWrapper, transforms the multi-instance datasets to an attribute-value representation and then uses traditional classification methods. For this reason, the experimental comparison was performed using the base classifiers RandomForest and SMO.

The Weka tool was used to perform the experimental evaluation, and classification accuracy was the measure used to assess the effectiveness of the methods. In addition, the data sets were discretized by dividing the range of each attribute into 10 intervals of equal length. The learning methods were run with Weka's default parameter values.
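Equal-width discretization into 10 intervals can be sketched as below. This is a plain-Python illustration of the idea, not Weka's own filter; the function name `equal_width_bins`, the handling of constant attributes and the choice to place the maximum value in the last interval are assumptions made for the sketch.

```python
def equal_width_bins(values, n_bins=10):
    """Map each numeric value to the index (0 .. n_bins - 1) of the
    equal-length interval of [min, max] that contains it."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: every value falls in the same interval.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land on index n_bins, so clamp it
    # into the last interval.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

print(equal_width_bins([0.0, 0.5, 5.0, 9.99, 10.0]))
# [0, 0, 5, 9, 9]
```

The resulting interval indices can then serve as the attribute values in artificial words of the form [attribute name]_[attribute value].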

Results and discussion. As mentioned above, the experimental study compared the proposed MIBoW method with the SimpleMI and MIWrapper methods. These methods have to be used in combination with a traditional classification algorithm, as they transform the multi-instance data to a traditional attribute-value representation. RandomForest and SMO were used for this purpose.



Table 2. Experimental evaluation results: classification accuracy, % (RF = RandomForest)

               RandomForest                               SMO
Dataset        MIBoW-RF   SimpleMI-RF   MIWrapper-RF   MIBoW-SMO   SimpleMI-SMO   MIWrapper-SMO
AntDrugs5         72.25         58.50          71.75       69.25          60.00           71.25
Atoms             79.71         66.49          66.49       68.71          66.49           66.49
Chains            89.36         73.42          77.63       84.56          77.19           69.62
Corel01vs02       84.50         70.00          86.00       87.00          72.50           82.50
Corel02vs03       81.00         75.00          80.00       84.00          74.00           71.50
Corel03vs04       95.50         90.00          88.50       94.00          90.00           74.00
Corel04vs05      100.00         99.00          95.50      100.00          98.50           90.00
EastWest          80.00         70.00          65.00       70.00          80.00           60.00
TREC9Sel-1        73.50         50.25          78.50       82.50          50.25           75.00

Table 2 shows the results of the experimental evaluation in terms of classification accuracy. Analyzing the combinations with RandomForest, it can be seen that MIBoW obtains the best classification accuracy in seven out of nine data sets. Additionally, to test whether these differences are statistically significant, statistical tests were performed following the methodology proposed by [10, 11] for comparing several classifiers on several data sets. Figure 1 shows the comparison between the combinations with RandomForest using Friedman's test and Shaffer's procedure for the post hoc analysis with $\alpha = 0.05$. The figure shows that the combination with MIBoW is significantly superior to those with SimpleMI and MIWrapper.

Figure 1. Comparison using RandomForest
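The ranking behind Friedman's test can be sketched from the RandomForest columns of Table 2. The code below computes the average rank of each method (rank 1 is the highest accuracy, ties share the mean rank) and the basic Friedman chi-square statistic. It is a simplified illustration: it applies no tie correction and does not implement Shaffer's post hoc procedure, and the function names are hypothetical.

```python
def average_ranks(rows):
    """rows: one list of accuracies per data set, one column per method.
    Returns the average rank of each method (1 = best, ties averaged)."""
    k = len(rows[0])
    totals = [0.0] * k
    for row in rows:
        for j, value in enumerate(row):
            higher = sum(1 for other in row if other > value)
            tied = sum(1 for other in row if other == value)  # includes itself
            totals[j] += higher + (tied + 1) / 2
    return [total / len(rows) for total in totals]

def friedman_statistic(rows):
    """Basic Friedman chi-square statistic (no tie correction)."""
    n, k = len(rows), len(rows[0])
    ranks = average_ranks(rows)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)

# Columns: MIBoW-RF, SimpleMI-RF, MIWrapper-RF (values from Table 2).
rf_results = [
    [72.25, 58.50, 71.75], [79.71, 66.49, 66.49], [89.36, 73.42, 77.63],
    [84.50, 70.00, 86.00], [81.00, 75.00, 80.00], [95.50, 90.00, 88.50],
    [100.00, 99.00, 95.50], [80.00, 70.00, 65.00], [73.50, 50.25, 78.50],
]
print(average_ranks(rf_results))
# The chi-square critical value for 2 degrees of freedom at alpha = 0.05 is 5.991.
print(friedman_statistic(rf_results) > 5.991)
```

With these values MIBoW has the lowest (best) average rank, which is consistent with the ranking described in the text; the pairwise significance claims still depend on the post hoc procedure, which this sketch leaves out.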

Analyzing the combinations with SMO, it can be seen that MIBoW also obtains the best classification accuracy in seven of the nine data sets. As with the RandomForest combinations, to test whether these differences are statistically significant we used Friedman's test and Shaffer's procedure for the post hoc analysis with $\alpha = 0.05$. Figure 2 shows that the combination with MIBoW obtained first place in the Friedman ranking and is significantly superior to those with SimpleMI and MIWrapper.



Figure 2. Comparison using SMO

Conclusion. In this paper, a new mapping-based multi-instance learning method, called MIBoW, is presented. The proposed method is inspired by text mining techniques, in particular the Bag-of-Words representation. MIBoW transforms multi-instance data into a traditional attribute-value representation by creating a corpus of documents consisting of artificial words, in order to reduce information loss during the transformation process. The experimental study conducted indicates that, in terms of classification accuracy, the proposed method is superior to the two other methods evaluated (SimpleMI and MIWrapper), which also transform multi-instance data into an attribute-value representation.

As future work, we plan to explore the effect of using word weighting methods typical of text mining, such as TF-IDF. In addition, we intend to extend the experimental study with other learning methods and multi-instance datasets, to explore in more detail the advantages and possible limitations of MIBoW.

References

1. Amores J. Multiple instance classification: Review, taxonomy and comparative study // Artificial Intelligence. 2013. Vol. 201. Pp. 81–105.

2. Chen Y., Bi J., Wang J. Z. MILES: Multiple-instance learning via embedded instance selection // IEEE Transactions on Pattern Analysis and Machine Intelligence. 2006. Vol. 28. Pp. 1931–1947.

3. Demšar J. Statistical Comparisons of Classifiers over Multiple Data Sets // Journal of Machine Learning Research. 2006. Vol. 7. Pp. 1–30.

4. Dietterich T. G., Lathrop R. H., Lozano-Pérez T. Solving the multiple instance problem with axis-parallel rectangles // Artificial Intelligence. 1997. Vol. 89. Pp. 31–71.

5. García S., Herrera F. An Extension on «Statistical Comparisons of Classifiers over Multiple Data Sets» for all Pairwise Comparisons // Journal of Machine Learning Research. 2008. Vol. 9. Pp. 2677–2694.

6. Melki G., Cano A., Ventura S. MIRSVM: Multi-instance support vector machine with bag representatives // Pattern Recognition. 2018. Vol. 79. Pp. 228–241.

7. Peng X., Wang L., Wang X., Qiao Y. Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice // Computer Vision and Image Understanding. 2016. Vol. 150. Pp. 109–125.

8. Quintero-Domínguez L. A., Morell C., Ventura S. WordificationMI: multirelational data mining through multiple-instance propositionalization // Progress in Artificial Intelligence. 2019. Vol. 8. Pp. 375–387.



9. Quintero-Domínguez L. A., Morell C., Ventura S. A propositionalization method of multi-relational data based on Grammar-Guided Genetic Programming // Expert Systems with Applications. 2021. Vol. 168. Art. 114263.

10. Sánchez Tarragó D., Cornelis C., Bello R., Herrera F. A multi-instance learning wrapper based on the Rocchio classifier for web index recommendation // Knowledge-Based Systems. 2014. Vol. 59. Pp. 173–181.

11. Zafra A., Ventura S. Multi-instance genetic programming for predicting student performance in web based educational environments // Applied Soft Computing. 2012. Vol. 12. Pp. 2693–2706.

