Recognition system of biometric identification of a person’s face based on the anti-spoofing model

Saida Beknazarova; Mekhriban Jaumitbaeva

Рақамли тeхнологиялар соҳасидаги ҳуқуқбузарликларга қарши курашиш ҳамда

ахборот хавфсизлигини таъминлашнинг ташкилий

-

ҳуқуқий масалалари

289

Бекназарова

Саида

Сафибуллаевна

Тошкент

ахборот

технологиялари

университети,

т.ф.д.,

профессор

Жаумитбаева

Мехрибан

Караматдин

қизи

Тошкент

давлат

юридик

университетининг

Ихтисослаштирилган

филиали

ходими

ANTI-SPOOFING

МОДЕЛИ

АСОСИДА

ИНСОН

ЮЗИНИ

БИОМЕТРИК

ИДЕНТИФИКАЦИЯЛАШНИ

АНИҚЛАШ

ТИЗИМИ

Бекназарова

Саида

Сафибуллаевна

Профессор,

доктор

технических

наук,

Ташкентский

университет

информационных

технологий

Жаумитбаева

Мехрибан

Караматдин

қизи

Сотрудник

Специализированного

филиала,

Ташкентский

государственный

юридический

университет

СИСТЕМА

БИОМЕТРИЧЕСКОЙ

ИДЕНТИФИКАЦИИ

ЛИЦА

ЧЕЛОВЕКА

НА

ОСНОВЕ

МОДЕЛИ

ANTI-SPOOFING

Beknazarova Saida

Tashkent University of Information Technologies doctor of technical science,

professor

Jaumitbaeva Mekhriban

Staff member Specialized branch of Tashkent State Law University

RECOGNITION SYSTEM OF BIOMETRIC IDENTIFICATION

OF A PERSON

’

S FACE BASED ON THE ANTI-SPOOFING MODEL

Abstract:

Biometric identification is often referred to as pure or real

authentication, since it uses not a virtual, but a biometric attribute that is actually
relevant to a person. Passwords can be stolen, spied on, forgotten, keys can be forged.
But the unique characteristics of the person himself are much more difficult to fake
and lose. This can be fingerprints, voice, drawing of blood vessels of the retina, gait,
etc. Face recognition looks like a very promising direction for use in the mobile
sector. If everyone includes long been used to using fingerprints, and technologies
for working with voice are gradually and rather predictably developing, then with
face identification the situation is rather unusual and worthy of a small excursion
into the history of the issue.

Keywords:

recognition systems, technology of biometric identification,

person

’

s face, the anti-spoofing model, authentication

Рақамли тeхнологиялар соҳасидаги ҳуқуқбузарликларга қарши курашиш ҳамда

ахборот хавфсизлигини таъминлашнинг ташкилий

-

ҳуқуқий масалалари

290

Today

’

s detection systems show huge accuracy. With the advent of large

datasets and complex architectures, it has become possible to achieve face

recognition accuracy of up to 0.000001 (one error in a million!), And they are

already suitable for portability to mobile platforms. Their vulnerability became

the bottleneck.

In order to personify another person in our technical reality, and not in the

film, masks are most often used. They also attempt to fool the computer system

by presenting someone else instead of their face. Masks come in very different

qualities, from a printed photo of another person held in front of their face to

highly complex 3D heated masks. Masks can either be presented separately in the

form of a sheet or screen, or worn on the head.

The availability of such vulnerabilities is really hazardous for banking or

government systems of user authentication by face, where an intruder

’

s

penetration entails significant losses.

We can to call an attempt to deceive the identification system by presenting

it with a fake biometric parameter a spoofing attack.

And, we will call the complex of protective measures to resist such

deception anti-spoofing. It can be implemented in the form of a variety of

technologies and algorithms built into the pipeline of the identification system.

The ISO offers a somewhat extended set of terminology, with terms such as

presentation attack

–

attempts to force the system to incorrectly identify a user

or to allow him to avoid identification by displaying a picture, recorded video, and

so on. Normal (Bona Fide)

–

corresponds to the normal algorithm of the system,

that is, everything that is NOT an attack. Presentation attack instrument means a

means of performing an attack, for example, an artificially manufactured part of

the div. And finally, Presentation attack detection

–

automated means of

detecting such attacks. However, the standards themselves are still in

development, so it is impossible to talk about any well-established concepts.

Terminology in Russian is almost completely absent.

To determine the quality of the system, the HTER metric (Half-Total Error

Rate) is often used, which is calculated as the sum of the coefficients of

erroneously allowed identifications (FAR-False Acceptance Rate) and

erroneously prohibited identifications (FRR

–

False Rejection Rate) divided by in

half. HTER = (FAR + FRR) / 2.

It is worth saying that in biometric systems, the most attention is usually

paid to FAR, in order to do everything possible to prevent an attacker from

entering the system. And they are making good progress in this (remember one

millionth from the beginning of the article?) The flip side is the inevitable increase

in FRR

–

the number of ordinary users mistakenly classified as intruders. If this

can be sacrificed for government, defense and other similar systems, then mobile

technologies, working with their huge scale, variety of subscriber devices and, in

general, user-perspective oriented, are very sensitive to any factors that can force

users to refuse services. If you want to reduce the number of phones smashed

against the wall after the tenth consecutive denial of identification, the FRR is

worth looking into!

Рақамли тeхнологиялар соҳасидаги ҳуқуқбузарликларга қарши курашиш ҳамда

ахборот хавфсизлигини таъминлашнинг ташкилий

-

ҳуқуқий масалалари

291

The most popular means of deception are masks. Nothing is more obvious

than putting on another person

’

s mask and presenting the face to an identification

system (often referred to as a Mask attack).

You can also print a photo of yourself or someone else on a sheet of paper

and bring it to the camera (let

’

s agree to call this type of attack Printed attack).

Slightly more complex is the Replay attack, when the system is presented

with the screen of another device, on which a previously recorded video with
another person is played. The complexity of execution is compensated by the high
efficiency of such an attack, since control systems often use signs based on the
analysis of time sequences, for example, tracking blinking, micro movements of
the head, the presence of facial expressions, breathing, and so on. All this can be
easily reproduced on video.

Both types of attacks have a number of characteristic features that allow

them to be detected, and thus distinguish a tablet screen or sheet of paper from a
real person.

Let

’

s summarize the characteristic features that make it possible to identify

these two types of attacks in the table:

Printed attack

Replay attack

Decreased image texture quality when printed

Moire

Halftone transmission artifacts when printing on a

printer

Reflections (glare)

Mechanical print artifacts (horizontal lines)

Flat picture (no depth)

Lack of local movement (eg, blinking)

Image borders may be visible

One of the oldest approaches (works of 2007, 2008) is based on the

detection of human blinking by analyzing the image using a mask. The point is to
build some kind of binary classifier that allows you to select images with open and
closed eyes in a sequence of frames. This can be the analysis of the video stream
using landmark detection, or using some simple neural network. And today this
method is most often used; the user is prompted to perform some sequence of
actions: shake his head, wink, smile, and so on. If the sequence is random, it is not
easy for an attacker to prepare for it in advance. Unfortunately, for an honest user,
this quest is also not always surmountable, and engagement drops sharply.

Рақамли тeхнологиялар соҳасидаги ҳуқуқбузарликларга қарши курашиш ҳамда

ахборот хавфсизлигини таъминлашнинг ташкилий

-

ҳуқуқий масалалари

292

You can also use the features of picture quality degradation when printing

or displaying on the screen. Most likely, even some local patterns, albeit elusive to
the eye, will be found in the image. This can be done, for example, by counting
local binary patterns (LBP, local binary pattern) for different areas of the face
after extracting it from the frame. The described system can be considered the
founder of the whole direction of face anti-spoofing algorithms based on image
analysis. In a nutshell, when calculating the LBP, each pixel of the image, eight of
its neighbors are sequentially taken and their intensities are compared. If the
intensity is greater than the central pixel, one is assigned, if less

–

zero. Thus, an

8-bit sequence is obtained for each pixel. The obtained sequences are used to
construct a pixel-by-pixel histogram, which is fed to the input of the SVM
classifier.

Local binary patterns, histogram and SVM.

The HTER efficiency indicator is

“

as much

”

as 15%, which means that a

significant part of attackers overcome the protection without much effort,
although it should be admitted that many are eliminated. The algorithm was
tested on the IDIAP Replay-Attack dataset, which is composed of 1200 short
videos of 50 respondents and three types of attacks

–

printed attack, mobile

attack, high-definition attack.

Рақамли тeхнологиялар соҳасидаги ҳуқуқбузарликларга қарши курашиш ҳамда

ахборот хавфсизлигини таъминлашнинг ташкилий

-

ҳуқуқий масалалари

293

The ideas of image texture analysis were continued. In 2015, Bukinafit

developed an algorithm for alternatively dividing the image into channels, in
addition to the traditional RGB, for the results of which local binary patterns were
again calculated, which, as in the previous method, were fed to the input of the
SVN classifier. The accuracy of HTER, calculated on the CASIA and Replay-Attack
datasets, was an impressive 3% at that time.

HTER 2.9%. (2015) Boulkenafet Z. et al. Face Anti-Spoofing Based on

Color-Texture Analysis

To detect attempts to present a photo, the logical solution was to try to

analyze not one image, but their sequence taken from the video stream. For
example, Anzhos and his colleagues proposed to extract features from the optical
stream on adjacent pairs of frames, to feed it to the input of a binary classifier and
average the results. The approach proved to be quite effective, showing an HTER
of 1.52% on their own dataset.

The method of tracking movements looks interesting, which is somewhat

outside of the generally accepted approaches. Since in 2013 the principle

“feed

a

raw image to the input of the convolutional network and adjust the mesh layers
until the

result”

was not common for modern projects in the field of deep learning,

Bharadwaj consistently applied more complex preliminary transformations.
In particular, he applied the Eulerian video magnification algorithm known from
the work of scientists from MIT, which was successfully used to analyze color
changes in the skin depending on the pulse. Replaced LBP with HOOF (histograms

Рақамли тeхнологиялар соҳасидаги ҳуқуқбузарликларга қарши курашиш ҳамда

ахборот хавфсизлигини таъминлашнинг ташкилий

-

ҳуқуқий масалалари

294

of optical flow directions), having correctly noted that as soon as we want to track
movements, and features we need appropriate, and not just texture analysis. The
same SVM, traditional at that time, was used as a classifier. The algorithm showed
extremely impressive results on the Print Attack (0%) and Replay Attack (1.25%)
datasets.

The

“

first sign

”

can be considered the method of analyzing depth maps in

separate areas (

“

patches

”

) of the image. Obviously, the depth map is a very good

indication of the plane in which the image is located. If only because the image on
a sheet of paper has no

“

depth

”

by definition.

Unfortunately, the availability of a large number of excellent frameworks

for deep learning has led to the emergence of a huge number of developers who
are trying to solve the face anti-spoofing problem head-on in a familiar way of
ensembling neural networks. Usually it looks like a stack of feature maps at the
outputs of several networks, pre-trained on some widespread dataset, which is
fed to a binary classifier.

In general, it is worth concluding that to date, quite a lot of works have been

published, which, on the whole, demonstrate good results, and which are united
by only one small

“

but

”

. All of these results are demonstrated within one specific

dataset!

The situation is aggravated by the limitedness of the available data sets and,

for example, on the notorious Replay-Attack, no one can be surprised by HTER
0%. All this leads to the emergence of very complex architectures, for example,
these, with the use of various tricky features, auxiliary algorithms collected in a
stack, with several classifiers, the results of which are averaged, and so on

…

As a

result, the authors get HTER = 0.04%!

This suggests that the face anti-spoofing task has been solved within a

specific dataset. Let

’

s summarize in a table various modern methods based on

neural networks. As it is easy to see, the "benchmark results" were achieved by a
variety of methods that just emerged in the inquisitive minds of developers.

Unfortunately, the good picture of the struggle for tenths of a percent is

violated by the same

“

small

”

factor. If you try to train a neural network on one

dataset, and apply it on another, the results will turn out to be... not so optimistic.
Even worse, attempts to apply classifiers in real life leave no hope at all.

Рақамли тeхнологиялар соҳасидаги ҳуқуқбузарликларга қарши курашиш ҳамда

ахборот хавфсизлигини таъминлашнинг ташкилий

-

ҳуқуқий масалалари

295

For example, let

’

s take the data from 2015, where the metric of its quality

was used to determine the authenticity of the presented image.

In other words, the algorithm trained on Idiap data, and applied on MSU,

will give a true positive detection rate of 90.5%, and if we do the opposite (train
on MSU, and check on Idiap), then only 47.2 will be correctly determined. % (!)
For other combinations, the situation worsens even more, and, for example, if you
train the algorithm on MSU and check it on CASIA, the TPR will be 10.8%! This
means that a huge number of honest users were mistakenly ranked among the
attackers, which cannot but depress. Even cross-database training could not
change the situation, which seems to be a quite reasonable way out.

In 2017, at the University of Oulu in Finland, a competition was held on its

own new dataset with quite interesting protocols focused specifically on the use
in the field of mobile applications.

–

Protocol 1: There is a difference in lighting and background. The datasets

are recorded in different locations and differ in background and lighting.

–

Protocol 2: Various models of printers and screens are used for attacks.

So, the test dataset uses a technique that is not found in the training dataset.

–

Protocol 3: Interchangeability of sensors. Videos of the real user and the

attacks are recorded on five different smartphones and used in the training
dataset. To check the algorithm, a video from another smartphone is used, which
is not included in the training set.

–

Protocol 4: Includes all of the above factors.

The results were quite unexpected. As in any competition, there was no time

to come up with brilliant ideas, so almost all participants took familiar
architectures and modified them by fine-tuning, working with features and trying
to somehow use other datasets for training. The prize solution showed an error
on the fourth, most difficult protocol, about 10%.

REFERENCE:

1.

Bezrukov B.N. Specification of video monitoring of broadcast television

images, Materials of the HAT International Congress, Moscow, 2002.

–

C. 215

–

216.

2.

Vorobel R.A. Image contrast improvement using a modified method of

lump stretching. Selection and processing of information / R.A. Vorobel, I.M.
Journal.

–

M.: 2000,

–

№14

(90),

–

C. 116

–

121.

3.

Gonzalez R., Woods R. Digital image processing / Pereyev. from English

–

M.: Technosphere, 2006.

–

1070.

4.

Beknazarova S., Mukhamadiyev A.Sh. Jaumitbayeva M.K. Processing

color images, brightness and color conversion // International Conference
on Information Science and Communications Technologies ICISCT 2019
Applications, Trends and Opportunities. Tashkent

–

2019.

Recognition system of biometric identification of a person’s face based on the anti-spoofing model

Keywords:

Abstract

Similar Articles

References