Digital technologies and law
DIGITAL TECHNOLOGIES AND LAW
S. M. Díaz,
student,
University of Sancti Spíritus «José Martí Pérez»
WEB SCANNER: A VULNERABILITY DETECTION TOOL
TO SCAN A WEBSITE GIVEN ITS URL
Abstract.
Information is a very important asset for people and organizations, so protecting it has
become a priority for everyone. Unfortunately, there is no single formula that guarantees
complete protection of information. With that in mind, software is needed that helps
cybersecurity specialists and facilitates their work so that they can provide the best
possible protection of data. Objective: to develop a system that can scan a given website
URL and provide the relevant information about the site. Methods: scientific observation,
document analysis, survey and interview were used as methods, yielding a proposal
susceptible to scientific verification and validation. Results: the implementation of
a website that contains a vulnerability detection system (web scanner), which can scan
a website given its URL. Conclusions: the implementation of this system for information
security at the University of Sancti Spíritus «José Martí Pérez» (UNISS) is valued as
positive support for security against phishing attacks, in a single repository.
Keywords: cybersecurity, information, malware, phishing attacks, social engineering,
tool, vulnerability analysis, web scanner
WEB SCANNER: A VULNERABILITY DETECTION TOOL
FOR SCANNING A WEBSITE BY ITS URL
Abstract.
Information is a very important asset for people and organizations, so protecting it has
become a priority for everyone; unfortunately, there is no single formula that ensures
complete and absolute protection of information. This creates the need for software that
helps cybersecurity specialists and eases their work so that they can provide the best
possible data protection. The aim of the study was to develop a system capable of
scanning a given website URL and providing all the necessary information about it. From
a scientific point of view, scientific observation, document analysis, questionnaires and
interviews are considered as methods, giving rise to a proposal open to scientific
verification and validation. As a result, a website was created that contains
a vulnerability detection system (web scanner) able to scan a website by its URL. The
implementation of this system for information protection at the University of Sancti
Spíritus «José Martí Pérez» is assessed as effective support for protection against
phishing attacks in a single repository.
Keywords: cybersecurity, information, malware, phishing attacks, social engineering,
tool, vulnerability analysis, web scanner
Introduction.
Cybersecurity is one of the leading niches of information technol-
ogy. It refers to the tools, frameworks, techniques, and practices implemented to ensure
the security of computing, information, and other systems and their users.
Cybersecurity covers the broad range of technical, organizational and governance
issues that must be considered to protect networked information systems against acci-
dental and deliberate threats. It goes well beyond the details of encryption, firewalls,
anti-virus software, and similar technical security tools. This breadth is captured in the
widely used International Telecommunication Union (ITU) definition [7].
The importance of cybersecurity has increased as so many government, business,
and day-to-day activities around the world have moved online. But especially in emerging
economies, “[m]any organizations digitizing their activities lack organizational,
technological, human resources and other fundamental ingredients needed to secure
their system, which is the key for the long-term success” [2, 7].
With the dawn of the World Wide Web, installing antivirus software became necessary
to protect computers from attacks. Even though destructive attacks back then were
not as well known as they are today, the history of cybersecurity threats has kept pace
with the advancement of information technology.
Cybercrime has changed substantially since computers were first connected to the internet
and began exchanging messages. Although the level of risk is substantially higher
now than it was back then, computer users have been understandably concerned about
these threats for a long time.
Although the Internet has positively affected people’s lives, negative issues related
to its use have also emerged. Cases of cyberbullying, online fraud, racial abuse,
pornography and gambling have increased tremendously due to the lack of awareness and
of self-protection mechanisms among Internet users to keep themselves from becoming
victims of these acts. Past research has revealed that the level of awareness
among Internet users is still low or moderate. One of the vital measures to be taken is
to cultivate knowledge and awareness among Internet users from an early age, i.e., among
young children. Young children specifically need to be educated to operate safely
in cyberspace and to protect themselves in the process [6].
Cybercrime against children and adolescents is certainly a concern for parents, as
they sometimes do not realize their child is a victim of cybercrime. Many parents are
unaware of the activities their children perform in cyberspace. Some children are bul-
lied through comments and insults; they may also be intimidated, harassed, abused or
sexually exploited [6].
Cyber risks could change as technology develops. Cybercriminals are always de-
veloping new ways to access systems and steal data.
Therefore, an educated workforce is essential to building trustworthy systems. Yet
questions about what should be taught, and how, are being ignored by many of the university
faculty who teach cybersecurity courses, which is a problematic situation [2].
Unfortunately, cybercriminals, or “hackers”, are often one step ahead of cybersecurity
specialists in the sense that they are constantly developing ways to get past the
obstacles put in place by those specialists, as well as new tools
to violate information security policies and measures.
The word «hacker» conjures the image of someone with ill intent toward individu-
als, websites, and company information systems. The prevailing theory is they look for
ways to mine company data and destroy or change customer information. Those types
of «bad guys» certainly exist – the cybersecurity industry calls them Black Hats, but in
reality, they are not the only hackers lurking in cyberspace.
Over time, cybersecurity specialists came up with a way to get one step ahead of
the hackers, called “ethical hacking”: hacking a system ethically in order to learn its
vulnerabilities.
There is a technique in ethical hacking called “thinking like a hacker”: to learn how
to defend a system, one first needs to learn how to attack it. The best practice for
achieving the best possible security is to think as a hacker would and to ask the
question “What would the hacker do?”.
Hackers can be sorted into various categories; the most popular ones are “Black Hat
Hackers” and “White Hat Hackers”:
Black Hat hackers are criminals who break into computer networks with malicious
intent. They may also release malware that destroys files, holds computers hostage, or
steals passwords, credit card numbers, and other personal information.
A white hat hacker, or ethical hacker, is an individual who uses hacking skills to
identify security vulnerabilities in hardware, software or networks. However, unlike
black hat hackers, white hat hackers respect the rule of law as it applies to hacking.
With the rapidly increasing prominence of information technology in recent decades,
various types of security incidents, such as unauthorized access, distributed denial
of service (DDoS), malware attacks, zero-day attacks, data breaches, and social
engineering or phishing, have increased at an exponential rate in the last decade [1].
Social engineering attacks aim at tricking individuals or enterprises into performing
actions that benefit attackers or into providing them with sensitive data such as
social security numbers, health records, and passwords.
Social engineering attacks are multifaceted and include physical, social and tech-
nical aspects, which are used in different stages of the actual attack [3].
Physical approaches:
As the name implies, physical approaches are those where the attacker performs
some form of physical action in order to gather information on a future victim. This can
range from personal information (such as social security number, date of birth) to valid
credentials for a computer system [4].
Social approaches:
The most important aspect of successful social engineering attacks is the social
approach. Here, attackers rely on socio-psychological techniques such as Cialdini’s
principles of persuasion to manipulate their victims [4].
Technical approaches:
Technical attacks are mainly carried out over the Internet. Granger notes that the
Internet is especially interesting for social engineers to harvest passwords, as users often
use the same (simple) passwords for different accounts [5].
Social engineering is one of the biggest challenges facing network security be-
cause it exploits the natural human tendency to trust [3].
Although social engineering is a single technique, it covers a very large number of
attacks. Among the best known and most widely used are phishing attacks, in which
attackers attempt to trick users into doing ‘the wrong thing’, such as clicking a bad
link that downloads malware or directs them to a dodgy website.
It is thought that the first phishing attacks happened in the mid-1990s, when a group
of hackers posed as employees of AOL (America Online) and used instant messaging
and email to steal users’ passwords and hijack their accounts.
The focus of this paper is the utility that the web scanner will provide to
the users and cybersecurity specialists of the University of Sancti Spíritus “José Martí
Pérez”, and how it can help them avoid becoming victims of a phishing attack, since the
tool will show all the information related to a website or a URL.
Theoretical framework
Advancements in digital communication technology have made communication between
humans more accessible and instant. However, personal and sensitive information may be
available online through social networks and online services that lack the security
measures to protect this information. Communication systems are vulnerable and can
easily be penetrated by malicious users [2]. The last few years have seen a rise
in the frequency with which people conduct meaningful transactions online,
from making simple purchases to paying bills and banking, and even to getting a mortgage
or car loan or paying their taxes. This rise in online transactions has unfortunately
been accompanied by a rise in attacks [4, 8].
This paper deals with a piece of software under development called a web scanner,
which, as the name indicates, scans a website given its URL. The project started
with a thesis on cybersecurity management at the University of Sancti Spíritus “José
Martí Pérez” for the department of cyber and information security.
Importance of the web scanner
Organizations, institutions, universities and companies typically have a cybersecurity
department that keeps all information, whether personal or work related, safe and
protected. Cybersecurity specialists therefore need tools and software to facilitate
their work.
The University of Sancti Spíritus “José Martí Pérez” is no exception: its
cybersecurity department needs software that makes control and monitoring easier.
At this university, most of the attacks suffered are categorized as social
engineering, specifically phishing attacks against members of all the faculties:
students, teachers and employees.
Software like the web scanner proposed in this paper is therefore a very helpful tool
for the department and for the members of the university: whenever a suspicious email
containing a URL is sent to any member, the URL can be scanned to find out what it
hides. This makes the web scanner a very important addition to the department, helping
it limit phishing attacks in the university.
Methods.
The methodology used made it possible to obtain a flexible proposal as an alternative
solution, susceptible to scientific verification. The following scientific research
methods were used for this paper:
From a theoretical point of view:
Historical-logical analysis that allowed the study of the ways in which the stan-
dards and norms of cybersecurity have evolved.
Analytical-synthetic analysis, which made it possible to study the main cybersecu-
rity systems, as well as vulnerability detection ways and systems.
From an empirical point of view:
Observation, which guided the study of the state of the art, allowing a systemic,
selective and objective analysis of the main systems that can currently carry out
vulnerability detection.
Unstructured interview, which was applied with the intention of obtaining information
regarding vulnerability detection processes, as well as expert criteria on the
subject matter.
Results.
Processing the results obtained by applying the methods described made it possible to
identify the methodologies and tools for developing the vulnerability detection system
(Web Scanner) for cybersecurity at the University of
Sancti Spíritus «José Martí Pérez», which are presented next.
For the development of the web scanner, the following technologies were chosen:
Python: Python is a programming language widely used in web applications, software
development, data science, and machine learning (ML). One of the reasons Python was
chosen is that it offers many libraries for cybersecurity development.
Linux OS: Linux is a Unix-like, open source and community-developed operating
system (OS) for computers, servers, mainframes, mobile devices and embedded devic-
es. It is supported on almost every major computer platform, including x86, ARM and
SPARC, making it one of the most widely supported operating systems.
The system works on a given URL: once the URL has been provided, the system scans it
and shows the following results:
IP address: An IP address is an Internet Protocol address. Essentially, it is a numeric
value assigned to a network device and used to identify and locate that device.
IP addresses are assigned to every type of network device.
An IPv4 address consists of a series of four numbers (each between 0 and 255) separated
by periods. For example, an IP address might look like this: 192.168.0.1.
There are two types of IP addresses: IPv4 and IPv6. IPv4 addresses are the older
and more common type, consisting of 32 bits (4 bytes) and allowing for
approximately 4.3 billion unique addresses. IPv6 addresses are the newer type,
consisting of 128 bits (16 bytes) and allowing for approximately 3.4 × 10^38 unique
addresses, a virtually unlimited number.
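The address counts follow directly from the bit widths (2^32 and 2^128). As a minimal
sketch, not taken from the system described in this paper, the standard Python
ipaddress module can be used to tell the two versions apart. The function names and
the sample addresses are illustrative assumptions (2001:db8::1 belongs to the IPv6
range reserved for documentation).

```python
import ipaddress


def address_space(bits: int) -> int:
    """Number of distinct addresses representable with the given bit width."""
    return 2 ** bits


def describe(address: str) -> str:
    """Return a short description of an IP address given as a string."""
    ip = ipaddress.ip_address(address)  # raises ValueError for invalid input
    return f"{ip} is IPv{ip.version} ({ip.max_prefixlen} bits)"


if __name__ == "__main__":
    print(address_space(32))          # 4294967296, about 4.3 billion
    print(f"{address_space(128):.1e}")  # about 3.4e+38
    print(describe("192.168.0.1"))
    print(describe("2001:db8::1"))
```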
IP addresses are used for a variety of purposes, including:
Identifying devices on a network: IP addresses are used to uniquely identify devic-
es on a network, such as computers, smartphones, servers, printers, and routers.
Routing internet traffic: routers use IP addresses to route internet traffic between
devices on different networks. When a device sends data to another device over the
internet, the data is divided into packets and sent to the destination device’s IP address.
Geo-location: IP addresses can be used to determine the geographic location of
a device. This information can be used for various purposes, such as targeted advertising
or content delivery.
Network security: IP addresses are used in network security to identify potential
threats, such as spam, malware, or unauthorized access attempts. By analyzing the IP
addresses of incoming network traffic, security professionals can identify and block
potential threats.
Network administration: network administrators use IP addresses to manage and configure
network devices, such as routers and switches. By assigning a unique IP address
to each device, administrators can manage the devices remotely and troubleshoot
network issues.
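How a scanner might obtain the IP address behind a URL can be sketched with the Python
standard library. This is an illustration under stated assumptions rather than the
implementation used in the web scanner: the function names hostname_from_url, resolve
and classify are hypothetical, and resolve needs working DNS access.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def hostname_from_url(url: str) -> str:
    """Extract the host part of a URL; raise ValueError if there is none."""
    host = urlparse(url).hostname  # lower-cased, without port or credentials
    if not host:
        raise ValueError(f"no hostname in URL: {url!r}")
    return host


def resolve(host: str) -> list[str]:
    """Resolve a host name to its unique IP addresses (requires DNS access)."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def classify(address: str) -> str:
    """Label an address as loopback, private, public or other."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:  # checked first: loopback addresses also count as private
        return "loopback"
    if ip.is_private:
        return "private"
    if ip.is_global:
        return "public"
    return "other"
```

A call such as resolve(hostname_from_url("https://example.com/login")) would return
the addresses currently published for that host. A link in an email that resolves to
a private or loopback address is unusual and may deserve a closer look.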
Domain name: A domain name is a string of text that maps to an alphanumeric IP
address, used to access a website from client software. In plain English, a domain name
is the text that a user types into a browser window to reach a particular website.
Domain names are made up of two or more parts, separated by periods. The rightmost
part of the domain name is called the top-level domain (TLD) and identifies the
type of organization or country associated with the domain. For example, «.com» is
a common TLD that stands for «commercial», while «.org» stands for «organization».
Other common TLDs include «.net», «.edu», and «.gov».
The part of the domain name to the left of the TLD is known as the second-level
domain (SLD) and is chosen by the website owner. For example, in the domain name
«google.com», «google» is the SLD and «.com» is the TLD.
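The split into SLD and TLD can be illustrated with a short Python sketch. It is
deliberately naive and assumes a single-label TLD; suffixes made of several labels
(for example «.co.uk») would require a list of public suffixes, which is outside the
scope of this illustration. The function name split_domain is hypothetical.

```python
def split_domain(domain: str) -> tuple[str, str]:
    """Naively split a domain name into (second-level domain, top-level domain)."""
    labels = domain.strip().rstrip(".").lower().split(".")
    if len(labels) < 2 or not all(labels):
        raise ValueError(f"expected at least two non-empty labels: {domain!r}")
    # The rightmost label is the TLD; the label to its left is the SLD.
    return labels[-2], labels[-1]


if __name__ == "__main__":
    print(split_domain("google.com"))
    print(split_domain("www.example.org"))
```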
Domain names are used for a variety of purposes, including:
Identifying websites: Domain names are used as a human-readable and memorable
way to identify websites on the internet. Instead of having to remember a website’s
IP address (a string of numbers), users can simply type in the website’s domain name to
access it.
Branding: Domain names are often used as part of a company’s branding strategy,
helping to create a memorable and recognizable online presence.
Email: Domain names are also used to identify email addresses. For example, the
email address «john.doe@example.com» is associated with the domain name
«example.com».
SEO: Domain names can also affect a website’s search engine optimization (SEO)
by influencing its relevance and authority in search engine rankings.
Reselling: Domain names can be bought and sold like other forms of digital assets,
and some people make a business out of buying and selling domain names.
Nmap scan: Nmap, short for Network Mapper, is a free and open source tool used
for vulnerability checking, port scanning and, of course, network mapping.
Nmap is one of the most popular network scanning tools available and is
widely used by system administrators, security professionals, and network engineers.
Nmap works by sending specially crafted packets to target hosts and analyzing
their responses. The tool can be used to perform a variety of tasks, such as:
Host discovery: identifying hosts that are active on a network
Port scanning: identifying open ports on a target host
Service and version detection: identifying the services running on open ports and
their version numbers
Operating system detection: identifying the operating system running on a target host
Vulnerability scanning: identifying potential vulnerabilities on a target host
Nmap provides a variety of options and parameters that allow users to customize
their scans and obtain more detailed results. It also includes a scripting engine
that allows users to write custom scripts to perform more advanced scans and tasks.
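One possible way to drive Nmap from Python is to call the nmap command-line program
through the subprocess module. The sketch below is an assumption about how such an
integration could look, not the code of the web scanner: the function names are
hypothetical, and only two documented Nmap options are used, -sV (service and version
detection) and -p (port list). It requires nmap to be installed, and scans should be
run only against hosts one is authorized to test.

```python
import shutil
import subprocess


def build_nmap_command(target, ports=None, detect_versions=True):
    """Build the argument list for an nmap run against one target."""
    if not target or target.startswith("-"):
        # Refuse values that nmap would interpret as an option.
        raise ValueError(f"invalid target: {target!r}")
    cmd = ["nmap"]
    if detect_versions:
        cmd.append("-sV")  # service and version detection
    if ports:
        cmd += ["-p", ",".join(str(p) for p in ports)]  # restrict to these ports
    cmd.append(target)
    return cmd


def run_nmap(target, ports=None, timeout=300):
    """Run nmap and return its textual report (requires nmap on PATH)."""
    if shutil.which("nmap") is None:
        raise RuntimeError("nmap is not installed or not on PATH")
    result = subprocess.run(
        build_nmap_command(target, ports),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Passing the arguments as a list, without a shell, keeps a URL taken from a suspicious
email from being interpreted as shell syntax.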
Robots.txt: The robots.txt file is a text file that is placed in the root directory of
a website to provide instructions to web robots (also known as web crawlers or spiders)
on how to crawl and index the website’s pages. The robots.txt file is an optional file, and
not all websites have one.
The main purposes of the robots.txt file are:
To control which pages on the website should be crawled and indexed by search
engines: The robots.txt file can be used to prevent search engines from indexing certain
pages on a website. This can be useful if there are pages on the website that the website
owner does not want to appear in search engine results (such as test pages or pages with
sensitive information).
To prevent web crawlers from overloading the website: Web crawlers can consume
a lot of bandwidth and resources, which can slow down or overload a website. The ro-
bots.txt file can be used to limit the frequency and depth of crawling by web robots to
prevent this from happening.
To protect private or confidential information: The robots.txt file can be used to
prevent web robots from crawling and indexing pages that contain private or confiden-
tial information (such as login pages or user profiles).
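The effect of such rules can be checked with the urllib.robotparser module from the
Python standard library. The following is a minimal sketch: the sample file content
and the helper names load_rules and disallowed_paths are illustrative assumptions.
Since robots.txt is only a request to well-behaved crawlers, the listed paths remain
reachable by anyone who asks for them, which is why a scanner may want to report them.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content used only for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /login
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def disallowed_paths(robots_txt: str) -> list[str]:
    """List every path named in a Disallow rule, for any user agent."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    rules = load_rules(SAMPLE_ROBOTS_TXT)
    print(rules.can_fetch("*", "https://example.com/admin/users"))
    print(rules.can_fetch("*", "https://example.com/index.html"))
    print(disallowed_paths(SAMPLE_ROBOTS_TXT))
```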
Whois scan: A WHOIS search will provide information regarding a domain name,
such as example.com. It may include information, such as domain ownership, where
and when registered, expiration date, and the name servers assigned to the domain.
WHOIS scan is a type of network reconnaissance that involves querying a WHOIS
database to obtain information about a domain name or IP address. The WHOIS data-
base contains information about the ownership and administrative contacts for a par-
ticular domain name or IP address, as well as other details such as the registration and
expiration dates.
WHOIS scans can be performed using various online tools and services, as well as
with command-line utilities such as whois on Linux/Unix operating systems.
Some common use cases for WHOIS scans are:
Domain name registration information: WHOIS scans can be used to obtain infor-
mation about the owner of a domain name, as well as the organization responsible for
registering and managing the domain. This information can be useful for investigative
purposes or for contacting the domain owner or administrator.
IP address ownership information: WHOIS scans can also be used to obtain infor-
mation about the owner of an IP address block. This information can be useful for identi-
fying the organization responsible for a particular network or for investigating potential
abuse or security incidents.
Network reconnaissance: WHOIS scans can be used as part of a larger network re-
connaissance effort to gather information about a target organization’s infrastructure and
digital footprint. The information obtained from WHOIS scans can be used to identify
potential attack vectors or vulnerabilities.
Brand protection: WHOIS scans can be used by organizations to monitor and pro-
tect their brand names and trademarks. By regularly querying the WHOIS database for
domain names containing their brand names or trademarks, organizations can identify
potential cases of cybersquatting or trademark infringement.
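A WHOIS lookup can also be sketched directly in Python. The WHOIS protocol (RFC 3912)
is a plain-text exchange over TCP port 43: the client sends the query followed by
a line break and reads until the server closes the connection. The code below is an
illustration under assumptions, not the scanner's implementation: the function names
are hypothetical, whois.iana.org is used as a default starting server, and the
"key: value" parsing is a simplification, because the response layout differs
between registries.

```python
import socket


def whois_query(domain: str, server: str = "whois.iana.org",
                timeout: float = 10.0) -> str:
    """Send one WHOIS query and return the raw text reply (needs network)."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("idna") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text: str) -> dict[str, list[str]]:
    """Collect 'key: value' lines; a key may occur several times."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        if line.startswith(("%", "#")) or ":" not in line:
            continue  # skip comments and free-text lines
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key and value:
            fields.setdefault(key, []).append(value)
    return fields
```

For example, parse_whois(whois_query("example.com")) would return whatever fields the
queried server reports; a very recent registration date on a domain that imitates
a known brand is one signal a specialist might look for.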
Once all this information about a given URL is known, identifying a phishing attack
becomes much easier, which makes the web scanner a very useful and important tool for
the department.
Conclusions.
Cybersecurity is nowadays one of the most important subjects around the world. Since
so much information, whether personal, professional or work related, is online,
security systems that prevent the loss of that information are highly important and
necessary.
Web scanners are tools that have been around for years and do a great job of keeping
potential victims of phishing attacks from falling into the trap, which makes them
necessary and important tools for every institution, organization, university and company.
With the help of the web scanner, the number of victims of social engineering phishing
attacks at the University of Sancti Spíritus “José Martí Pérez” is expected to
decrease.
References
1. Ahsan M., Nygard K. E., Gomes R., Chowdhury M. M., Rifat N., Connolly J. F.
Cybersecurity Threats and Their Mitigation Approaches Using Machine Learning -
A Review // Journal of Cybersecurity and Privacy. 2022. № 2. Pp. 527–555.
2. Gromova E. A., Petrenko S. A. Quantum Law: The Beginning // Journal of
Digital Technologies and Law. 2023. Vol. 1(1). Pp. 62–88.
3. Krombholz K., Hobel H., Huber M., Weippl E. Advanced social engineering
attacks // Journal of Information Security and Applications. 2015. № 22. Pp. 113–122.
4. Rahman N. A., Sairi I. H., Zizi N. A., Khalid F. The Importance of Cybersecurity
Education in School // International Journal of Information and Education Technology.
2020. № 10. Pp. 378–382.
5. Ramzan Z. Phishing Attacks and Countermeasures // Handbook of Information
and Communication Security / P. Stavroulakis & M. Stamp (Eds.). 2010. Pp. 433–448.
6. Salahdine F., Kaabouch N. Social Engineering Attacks: A Survey // Future
Internet. 2019. № 11. Pp. 89–92.
7. Schneider F. B. Cybersecurity Education in Universities // IEEE Security &
Privacy. 2013. Vol. 11. Pp. 3–4.
8. Veale M., Brown I. Cybersecurity // Internet Policy Review. 2020. Vol. 9.
Pp. 1–22.
L. A. Quintero-Domínguez,
PhD, Associate Professor,
University of Sancti Spíritus «José Martí Pérez»
J. A. Antón Vargas,
MSc, Assistant Professor,
University of Sancti Spíritus «José Martí Pérez»
S. Pérez Madrigal,
Eng., Instructor,
University of Sancti Spíritus «José Martí Pérez»
WRAPPER ALGORITHM FOR MULTI-INSTANCE LEARNING:
EARLY RESULTS
Abstract.
Multi-instance learning is a generalization of supervised learning, where
each example is represented by a labeled bag composed of a set of instances. Several
multi-instance learning methods transform each bag into a single instance and then
apply standard supervised learning methods. This paper presents a new multi-instance
learning method that transforms the multi-instance data and is inspired by text mining.
The proposed method transforms the multi-instance data into a traditional attribute-
value representation by creating a corpus of documents formed by artificial words
to reduce the loss of information during the transformation process. In addition, the
proposed method was empirically evaluated using nine multi-instance datasets and two
learning methods that transform the multi-instance data into a traditional attribute-value
representation. The empirical study indicates that, in terms of classification accuracy,
the proposed method is competitive with the learning methods used in the comparison.
Keywords: Multi-instance learning, Bag-of-words, Wrapper method, data, algorithm,
learning, learning methods
WRAPPER ALGORITHM FOR MULTI-INSTANCE LEARNING:
EARLY RESULTS
Abstract.
Multi-instance learning is a generalization of supervised learning in which each
example is represented by a labeled bag consisting of a set of instances. Some
multi-instance learning methods transform each bag into a single instance and then
apply standard
