Authors

  • Zarnigor Fayzullayeva
    Associate Professor, TATU named after Muhammad al-Xorazmiy

DOI:

https://doi.org/10.71337/inlibrary.uz.scin.132867

Abstract

Today there is a vast amount of information in every sphere of human life, especially on the internet. Important information can be reviewed and extracted from an existing body of data by generating a summary. As the volume of text continues to grow, managing this growth by introducing smarter and improved summarization solutions has become a burden on the research community. As data grows in both speed and volume, extracting the necessary information from large text documents becomes considerably harder.



ILM-FAN VA INNOVATSIYA

ILMIY-AMALIY KONFERENSIYASI

in-academy.uz/index.php/si


ALGORITHM OF THE MASTAT SYSTEM BASED ON TRAINING A DEEP NEURAL NETWORK

Fayzullayeva Zarnigor Inatillayevna

Associate Professor, TATU named after Muhammad al-Xorazmiy. Zarnigor18z02@gmail.com

https://doi.org/10.5281/zenodo.16760069

Today there is a vast amount of information in every sphere of human life, especially on the internet. Important information can be reviewed and extracted from an existing body of data by generating a summary. As the volume of text continues to grow, managing this growth by introducing smarter and improved summarization solutions has become a burden on the research community. As data grows in both speed and volume, extracting the necessary information from large text documents becomes considerably harder. Although this information is needed for many different purposes, using it effectively requires obtaining it quickly. In such cases, automatic summarization tools are regarded as the primary solution. For this reason, demand for new approaches to automatic text summarization (matnni avtomatik umumlashtirish, MASTAT) keeps growing in the research field. MASTAT is software that produces a shorter text from an original text document without losing the document's most important content.

Deep learning is a branch of machine learning inspired by how the human brain works. It is called "deep" because it uses neurons arranged in several layers, each with different weights, that help in making decisions. Deep learning can be divided into two main stages: training and inference.
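The two stages named above can be illustrated with a minimal sketch. It assumes a tiny two-layer network with sigmoid units, a squared-error loss, and a toy OR dataset; the names `forward`, `train`, and `mean_loss` are hypothetical and chosen only for this illustration, not taken from the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Inference stage: input -> hidden layer -> output in (0, 1)."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, output


def mean_loss(params, data):
    """Mean squared error of the network over a labelled dataset."""
    return sum((forward(params, x)[1] - y) ** 2 for x, y in data) / len(data)


def train(data, hidden_size=3, epochs=2000, lr=0.5, seed=0):
    """Training stage: adjust the layer weights by gradient descent."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden_size)]
    b1 = [0.0] * hidden_size
    w2 = [rng.uniform(-1, 1) for _ in range(hidden_size)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward((w1, b1, w2, b2), x)
            # Backpropagate the squared error through both layers.
            d_out = 2 * (out - y) * out * (1 - out)
            d_hidden = [d_out * w2[j] * hidden[j] * (1 - hidden[j])
                        for j in range(hidden_size)]
            for j in range(hidden_size):
                w2[j] -= lr * d_out * hidden[j]
                b1[j] -= lr * d_hidden[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_hidden[j] * x[i]
            b2 -= lr * d_out
    return w1, b1, w2, b2


# Toy dataset (logical OR), used only to exercise the two stages.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
params = train(data)                   # training
_, prob = forward(params, [1, 0])      # inference on one input
```

Training is the expensive stage that changes the weights; inference only runs the forward pass with the weights held fixed.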

In recent years, several deep learning models have been proposed for text summarization tasks. BERT (Devlin, Chang, Lee, & Toutanova, 2018) is a neural network pre-trained for natural language tasks; with its Transformer-based architecture it achieved state-of-the-art results on various NLP benchmarks. BERT proved highly effective at analyzing the semantic and contextual information of the input text. RoBERTa (Liu et al., 2019) is an improved version of BERT that enhances the pre-training process by training on longer text sequences, using dynamic masking, and removing the next sentence prediction objective. RoBERTa proved more effective than BERT on text summarization tasks, produced high-quality summaries, and achieved state-of-the-art results on several NLP benchmarks. Building on the excellent performance of BERT and RoBERTa on NLP tasks, many extensions of these models have been developed. BertSum (Liu, 2019) aims to improve extractive summarization using the pre-trained BERT language model. This approach involves fine-tuning BERT on a large text dataset, generating summaries, and testing them with different classifiers. BertSum achieved good results using a sentence-level Transformer layer.

In extractive summarization, extracting whole sentences can sometimes bring in unnecessary and redundant information. Zhou et al. (Zhou, Wei, & Zhou, 2020) propose an extractive summarization method in which sub-sentences are used as the extraction unit. They study the effect of different levels of granularity (word, sentence, and paragraph) on sentence selection in extractive summarization, and evaluate performance on several benchmark datasets using a BERT-based model. Their results showed that summarization at the sentence level is more effective than summarization at the word or paragraph level.
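To make the granularity levels discussed above concrete, the following sketch splits a text into paragraphs, sentences, sub-sentences, and words. The regular-expression rules are simplifying assumptions for illustration (for example, sub-sentences are approximated by splitting on commas, semicolons, and colons); they are not the segmentation used by Zhou et al., and the function names are hypothetical.

```python
import re


def paragraphs(text):
    """Split on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def sentences(text):
    """Split after sentence-final punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sub_sentences(sentence):
    """Rough clause-like units: split on commas, semicolons, and colons."""
    return [c.strip() for c in re.split(r"[,;:]\s*", sentence) if c.strip()]


def words(text):
    """Alphanumeric word tokens."""
    return re.findall(r"\w+", text)


text = (
    "BERT reads text in both directions. "
    "It was trained on large corpora, and it transfers well.\n\n"
    "Summaries should be short."
)
units = {
    "paragraph": paragraphs(text),
    "sentence": sentences(text),
    "sub-sentence": [c for s in sentences(text) for c in sub_sentences(s)],
    "word": words(text),
}
```

An extractive summarizer would then score and select units from one of these lists; finer units allow redundant parts of a sentence to be dropped, at the cost of less self-contained output.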

In this approach, the text summarization process is examined using BERT together with deep neural networks. BERT is Transformer-based and learns the semantic and contextual information of a text by reading it bidirectionally. It is trained in a self-supervised way through a masking task (masked language modeling): some words are hidden, and the model tries to identify the hidden words from their context.

Here, the masked word is the one the model tries to identify on the basis of the other words.
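The masking step described above can be sketched as follows. This is a simplified illustration: each token is hidden independently with probability 0.15 and always replaced by a `[MASK]` placeholder, whereas BERT's actual procedure also sometimes keeps the original token or substitutes a random one. The function name `mask_tokens` and its parameters are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hide a random subset of tokens.

    Returns the masked sequence and a dict mapping each masked position
    to the original token the model would have to predict.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for pos, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[pos] = tok
        else:
            masked.append(tok)
    return masked, targets


tokens = "the model tries to identify the hidden words from context".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3, seed=1)
```

Calling `mask_tokens` with a different seed on every pass over the data yields a different mask each time, which is the idea behind the dynamic masking mentioned for RoBERTa.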

Deep neural networks (in particular RNN and LSTM) are used to analyze the text; their main purpose is to optimize summarization. An Encoder-Decoder architecture is used here, in which the Encoder encodes the text and the Decoder generates the summary. For each word $x_i$, the encoder produces a new representation $h_i$ according to the formula:

$$h_i = f_{encoder}(x_1, x_2, \ldots, x_n)$$
Here $f_{encoder}$ is the function obtained from the Transformer or RNN layers of the encoder network. To obtain the overall summary, we use the following formula:

$$\mathrm{Summary} = \{\, s_i \mid \mathrm{score}(s_i) \,\}$$

Using the formula above, a short and meaningful result is obtained from a large volume of data.
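The two formulas above can be sketched end to end as an encode, score, and select pipeline. Everything specific in this sketch is an assumption made for illustration: a TF-IDF vector per sentence stands in for the encoder output, score(s_i) is taken to be cosine similarity to the document centroid, and the selection rule keeps the `k` highest-scoring sentences. The paper does not specify these choices, and in the described system BERT embeddings would replace the toy encoder.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    return re.findall(r"\w+", sentence.lower())


def encode(sentences):
    """Toy stand-in for f_encoder: one TF-IDF vector per sentence.

    Each vector depends on the whole document through the document
    frequencies, loosely mirroring h_i = f_encoder(x_1, ..., x_n).
    """
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(term for d in docs for term in d)
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in d.items()}
        for d in docs
    ]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def score(vectors):
    """Assumed scoring rule: similarity of each sentence to the centroid."""
    centroid = Counter()
    for v in vectors:
        for t, w in v.items():
            centroid[t] += w
    return [cosine(v, centroid) for v in vectors]


def summarize(sentences, k=2):
    """Keep the k best-scoring sentences, in their original order."""
    scores = score(encode(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


document = [
    "BERT encodes text with a Transformer.",
    "The encoder produces a representation for every word.",
    "The weather was pleasant yesterday.",
    "The decoder generates the summary from the encoder representation.",
]
summary = summarize(document, k=2)
```

Returning the selected sentences in document order keeps the extract readable; a threshold on the score could be used instead of a fixed `k`.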
Integrating BERT with deep neural networks makes it possible to achieve high performance in text summarization. BERT's ability to capture semantic and contextual information allows more accurate summaries to be produced through deep analysis of the text. The deep neural network (encoder-decoder) uses the text representations (embeddings) produced by BERT to optimize the summarization process. In addition, the attention mechanism lets the model focus on the most important parts of the text, resulting in high-quality summaries. This approach provides a significant gain in effectiveness over traditional methods.
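The attention mechanism mentioned above can be illustrated with a minimal sketch of scaled dot-product attention, a common formulation that the paper does not itself specify. The decoder state is treated as a query over the encoder representations; the function name `attend` and the example vectors are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Weight each encoder state by its relevance to the query.

    Returns the attention weights (which sum to 1) and the context
    vector, i.e. the weighted average of the encoder states.
    """
    d = len(query)
    scores = [
        sum(q * h for q, h in zip(query, state)) / math.sqrt(d)
        for state in encoder_states
    ]
    weights = softmax(scores)
    context = [
        sum(w * state[j] for w, state in zip(weights, encoder_states))
        for j in range(d)
    ]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend([2.0, 0.0], encoder_states)
```

The state most aligned with the query receives the largest weight, which is how the decoder concentrates on the most relevant parts of the input when producing each part of the summary.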


References

Gambhir, M.; Gupta, V. Recent automatic text summarization techniques: A survey. Artif. Intell. Rev. 2017, 47, 1–66.

Gupta, S.; Gupta, S.K. Abstractive summarization: An overview of the state of the art. Expert Syst. Appl. 2019, 121, 49–65.

Thu, H.N.T.; Huu, Q.N.; Ngoc, T.N.T. A supervised learning method combine with dimensionality reduction in Vietnamese text summarization. In Proceedings of the 2013 Computing, Communications and IT Applications Conference (ComComAp), Hong Kong, China, 1–4 April 2013; pp. 69–73.

Abuobieda, A.; Salim, N.; Kumar, Y.J.; Osman, A.H. Opposition differential evolution based method for text summarization. In Proceedings of the Asian Conference on Intelligent Information and Database Systems, Kuala Lumpur, Malaysia, 18–20 March 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 487–496.

Kabeer, R.; Idicula, S.M. Text summarization for Malayalam documents—An experience. In Proceedings of the 2014 International Conference on Data Science & Engineering (ICDSE), Chicago, IL, USA, 31 March–4 April 2014; pp. 145–150.

Hong, K.; Nenkova, A. Improving the estimation of word importance for news multi-document summarization. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden, 26–30 April 2014; pp. 712–721.

Fattah, M.A. A hybrid machine learning model for multi-document summarization. Appl. Intell. 2014, 40, 592–600.

M. Hu and B. Liu, “Mining and summarizing customer reviews,” in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 168–177, ACM, 2004.

L. Zhuang, F. Jing, and X.-Y. Zhu, “Movie review mining and summarization,” in Proceedings of the 15th ACM international conference on Information and knowledge management, pp. 43–50, ACM, 2006.

I. F. Moawad and M. Aref, “Semantic graph reduction approach for abstractive text summarization,” in 2012 Seventh International Conference on Computer Engineering & Systems (ICCES), pp. 132–138, IEEE, 2012.

R. Nallapati, B. Zhou, C. Gulcehre, B. Xiang, et al., “Abstractive text summarization using sequence-to-sequence rnns and beyond,” arXiv preprint arXiv:1602.06023, 2016.

A. M. Rush, S. Chopra, and J. Weston, “A neural attention model for abstractive sentence summarization,” arXiv preprint arXiv:1509.00685, 2015.