INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE
ISSN: 2692-5206, Impact Factor: 12,23
American Academic publishers, volume 05, issue 08,2025
Journal:
https://www.academicpublishers.org/journals/index.php/ijai
PARALLELIZATION OF FAST HAAR TRANSFORM ALGORITHMS ON DUAL-CORE SPECIALIZED PROCESSORS
Ibragimov Sanjarbek Salijanovich
Andijan State Technical Institute, associate professor, PhD
e-mail:
Annotation:
This article addresses the parallelization of digital signal processing methods based on the
Haar transform, using Andrews' algorithm, on dual-core specialized processors. The study
develops mechanisms for parallel execution of the fast Haar transform algorithm that make
effective use of the hierarchical memory model, DMA controller, and core architecture of the
Blackfin ADSP-BF561 processor. Algorithmic experiments were conducted in C++ on the
VisualDSP++ platform using piecewise-polynomial basis functions (constant, linear,
quadratic). Practical tests compared the performance of the single-core ADSP-BF533 and
dual-core ADSP-BF561 processors and yielded acceleration coefficients ranging from 1.10 to
1.80, increasing with the array size. The results demonstrate that multi-core computing
significantly improves signal processing efficiency and reduces processing time.
Keywords:
Fast Haar Transform (FHT), Andrews Algorithm, Parallel Computation, Dual-Core
Processor, Blackfin ADSP-BF561, Digital Signal Processing (DSP), Piecewise Polynomial
Basis Functions, VisualDSP++ Development Environment, Acceleration Coefficient.
INTRODUCTION
The Blackfin ADSP-BF561 is a symmetric dual-core specialized processor [2,8,9]. It
employs a hierarchical three-level memory architecture. The first-level (L1) memory operates
at the core clock frequency, although its capacity is relatively limited. Each Blackfin core
is equipped with 100 KB of L1 memory, which includes the following components [4]:
16 KB instruction memory (SRAM/cache)
16 KB instruction memory (SRAM)
32 KB data memory (SRAM/cache)
32 KB data memory (SRAM)
4 KB scratchpad memory (fast SRAM)
The second-level (L2) memory consists of a 128 KB SRAM that is integrated with the
core. The L2 memory is shared between both cores and can store common instructions and data.
To optimize data exchange between L1 and L2 memory, the processor architecture includes a
dedicated four-channel internal Direct Memory Access (DMA) controller.
Figure 1. Functional Block Diagram of the ADSP-BF561 Processor
In the hierarchical memory architecture of the Blackfin processor, the third-level (L3)
memory refers to external memory. The external memory space can map up to four SDRAM
banks ranging from 16 MB to 512 MB, as well as four asynchronous memory banks (such as
ROM, SRAM, EEPROM, or Flash), each with a capacity of up to 64 MB [1,6].
The Blackfin processor instruction set is optimized so that the most frequently used
instructions are encoded as 16-bit opcodes, which improves code density. Digital Signal
Processing (DSP) instructions are implemented as multifunctional operations encoded in
32-bit format. The internal bus architecture and computational units are designed so that
each processor core can execute multiple operations per cycle.
On the dual-core Blackfin ADSP-BF561 processor, practical tasks are solved with parallel
computing algorithms in which the computational work is distributed across the two cores
[2,6-9].
RESEARCH METHODOLOGY
A parallel computation algorithm based on the Fast Haar Transform (FHT) was
developed for digital processing of one-dimensional geophysical signals on dual-core
specialized processors of the Blackfin ADSP-BF561 architecture.
According to the Fast Haar Transform method proposed by Andrews, in each iteration,
the array X is formed by computing the sums of consecutive pairs from the previous X array,
while the array C contains the differences of the same consecutive pairs.
$$X_i = X_{2i} + X_{2i+1}, \qquad C_{i + N/2^{j}} = X_{2i} - X_{2i+1}. \tag{1}$$
The sum of the X values computed in the final iteration is equal to the following
expression:
$$C_0 = X_0 + X_1 \tag{2}$$
Here, $i = 0, \ldots, N/2 - 1$ and $j = 1, \ldots, k$. The number of iterations is
determined by the formula $k = \log_2 N$.
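The recurrences (1) and (2) can be sketched as a short sequential C++ routine. This is a minimal illustration and not the code used in the study: the function name `fastHaarAndrews` is hypothetical, the transform is left unnormalized exactly as in (1), and the input length is assumed to be a power of two. The loop variable `half` plays the role of $N/2^{j}$, so the pair index in iteration $j$ runs only over the $N/2^{j}$ sums that remain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the article): unnormalized fast Haar
// transform following equations (1) and (2). Assumes N = 2^k samples.
std::vector<double> fastHaarAndrews(std::vector<double> x) {
    const std::size_t n = x.size();
    assert((n & (n - 1)) == 0 && "length must be a power of two");
    if (n == 0) return {};

    std::vector<double> c(n, 0.0);
    // half = N / 2^j is the number of pairs processed in iteration j.
    for (std::size_t half = n / 2; half >= 1; half /= 2) {
        for (std::size_t i = 0; i < half; ++i) {
            const double a = x[2 * i];
            const double b = x[2 * i + 1];
            x[i] = a + b;          // X_i = X_{2i} + X_{2i+1}
            c[i + half] = a - b;   // C_{i + N/2^j} = X_{2i} - X_{2i+1}
        }
    }
    c[0] = x[0];  // equation (2): the sum left after the final iteration
    return c;
}
```

For the input {1, 2, 3, 4}, the first iteration produces the sums {3, 7} and C_2 = C_3 = -1, and the second produces C_1 = -4 and C_0 = 10. Overwriting `x` in place is safe here only because the pairs are visited in ascending order by a single thread.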
The algorithm of the parallel computation process of the Fast Haar Transform based on
Andrews’ method on a dual-core specialized processor is presented in Figure 2.
Figure 2. Parallel Computation Algorithm for the Andrews-Based Fast Haar Transform on
the Dual-Core Blackfin ADSP-BF561 Processor
Based on this algorithm, the processes of inputting and outputting array values are
executed sequentially on Core A. The computation of the Fast Haar Transform (FHT) is evenly
distributed between Core A and Core B. Specifically, for each iteration, the first half of the
array elements is processed by Core A, while the second half is processed by Core B, enabling
parallel execution. Initially, the program code on Core A is launched, and Core B is activated
using the adi_core_b_enable() function.
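The per-iteration split described above can be sketched on a desktop compiler as follows. This is an illustration under stated assumptions, not the VisualDSP++ implementation: two `std::thread` workers stand in for Core A and Core B, the names `haarPairs` and `fastHaarTwoWorkers` are hypothetical, and the join at the end of each iteration replaces whatever inter-core synchronization the real program uses. The sums are written to a second buffer, because with an in-place update the worker handling the second half would overwrite elements that the first worker still has to read.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: one worker's share of an iteration, pairs [begin, end).
static void haarPairs(const std::vector<double>& src, std::vector<double>& dst,
                      std::vector<double>& c, std::size_t half,
                      std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        dst[i] = src[2 * i] + src[2 * i + 1];       // sums for the next iteration
        c[i + half] = src[2 * i] - src[2 * i + 1];  // Haar coefficients
    }
}

// Hypothetical sketch: std::thread emulates the second core (assumption).
std::vector<double> fastHaarTwoWorkers(std::vector<double> x) {
    const std::size_t n = x.size();
    assert((n & (n - 1)) == 0 && "length must be a power of two");
    if (n == 0) return {};

    std::vector<double> c(n, 0.0);
    std::vector<double> next(n / 2);  // second buffer, avoids read/write overlap
    for (std::size_t half = n / 2; half >= 1; half /= 2) {
        const std::size_t mid = half / 2;
        // "Core B": second half of the pairs of this iteration.
        std::thread coreB([&] { haarPairs(x, next, c, half, mid, half); });
        // "Core A": first half of the pairs, on the calling thread.
        haarPairs(x, next, c, half, 0, mid);
        coreB.join();   // both halves must finish before the next iteration
        x.swap(next);   // the sums become the input of the next iteration
    }
    c[0] = x[0];
    return c;
}
```

The two workers write to disjoint elements of `next` and `c`, so no locking is needed inside an iteration. In the last iteration only one pair remains, so one worker is idle; this is one reason why the measured gain for small arrays stays well below 2.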
Figure 3. Block Diagram of the Parallel Computation Algorithm for Piecewise-Polynomial
Bases on a Dual-Core Specialized Processor
The parallel computation algorithm of Fast Haar Transforms (FHT) using piecewise-
polynomial bases on dual-core processors is presented in Figure 3.
RESULTS
Fast Haar Transforms based on Andrews' method were implemented with piecewise-
polynomial bases on the ADSP-BF533 (single-core) and ADSP-BF561 (dual-core) specialized
processors of the Blackfin family, using the C++ programming language within the
VisualDSP++ development environment. The processors selected for the computations had
the same clock frequency: the ADSP-BF533 core operated at 600 MHz, and each core of the
ADSP-BF561 also operated at 600 MHz.
To evaluate the computation time of Fast Haar Transforms based on piecewise-
polynomial bases on these processors, experiments were conducted using the analytical
function $y = e^{x}$ as input data [3,7]. The results of the study are presented in Table 1.
Table 1.
Comparison of Computation Times for Fast Haar Transforms Based on Piecewise-
Polynomial Bases on Single-Core and Dual-Core Processors
BF533 (s) and BF561 (s): computation time in seconds on the ADSP-BF533 and ADSP-BF561;
Coeff.: acceleration coefficient.

N      Piecewise constant              Piecewise linear                Piecewise quadratic
       BF533 (s)  BF561 (s)  Coeff.    BF533 (s)  BF561 (s)  Coeff.    BF533 (s)  BF561 (s)  Coeff.
16     0.00110    0.00096    1.14      0.00110    0.00100    1.10      0.00112    0.00097    1.15
32     0.00385    0.00297    1.30      0.00382    0.00295    1.30      0.00386    0.00278    1.39
64     0.01429    0.00950    1.50      0.01423    0.00994    1.43      0.01432    0.00914    1.57
128    0.05498    0.03414    1.61      0.05471    0.03567    1.53      0.05502    0.03322    1.66
256    0.21560    0.12723    1.69      0.21426    0.13206    1.62      0.21544    0.12534    1.72
512    0.85368    0.49602    1.72      0.84398    0.50379    1.68      0.85341    0.48413    1.76
1024   3.38911    1.94438    1.74      3.35060    1.93456    1.73      3.39144    1.90746    1.78
2048   13.45478   7.66087    1.76      13.30188   7.52543    1.77      13.49794   7.53445    1.79
4096   53.41548   30.26043   1.77      52.80848   29.72544   1.78      53.85680   29.89671   1.80
In this table, N is the number of array elements.
The Acceleration Coefficient [3,7,9] is calculated using the following formula:
$$\text{Acceleration Coefficient} = \frac{T_{\text{single-core}}}{T_{\text{dual-core}}}$$
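As a small worked instance of this formula, the ratio can be computed directly from a pair of measured times. The function name `accelerationCoefficient` is hypothetical and serves only as an illustration.

```cpp
#include <cassert>

// Hypothetical helper: ratio of single-core to dual-core computation time.
double accelerationCoefficient(double tSingleCore, double tDualCore) {
    assert(tDualCore > 0.0 && "dual-core time must be positive");
    return tSingleCore / tDualCore;
}
```

With the piecewise-quadratic times for N = 4096 from Table 1, `accelerationCoefficient(53.85680, 29.89671)` evaluates to approximately 1.80, in agreement with the tabulated value.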
CONCLUSION
As seen from Table 1, the acceleration coefficient grows with the array size, from 1.10
to 1.15 at N = 16 up to 1.77 to 1.80 at N = 4096. For large arrays, the dual-core processor
therefore requires approximately 1.8 times less computation time than the single-core
processor. This indicates that parallelizing the computations across both cores significantly
reduces processing time and improves overall efficiency compared to single-core execution.
One of the main advantages of developing parallel programs for dual-core processors on
the VisualDSP++ platform is the ability to create separate lightweight program modules for
each core and for the shared memory. This allows the two cores to perform different
computations at the same time. In addition, because the L2 memory is shared, both cores can
access common data and functions concurrently, which further increases the potential for
parallel processing.
REFERENCES:
1. Andreichenko D.K., Veliev V.M., Eroftiev A.A., Portenko M.S. Theoretical Foundations of
Parallel Programming. Saratov, 2015. 282 p. (in Russian)
2. Valpa O.D. Development of Devices Based on Analog Devices Digital Signal Processors
Using VisualDSP++. Moscow: Goryachaya Liniya-Telecom, 2007. 270 p. (in Russian)
3. Voevodin V.V. Parallel Computing. St. Petersburg: BHV-Petersburg, 2002. 608 p. (in
Russian)
4. Gergel V.P. High-Performance Computing for Multi-Core Multiprocessor Systems.
Textbook. Nizhny Novgorod: N.I. Lobachevsky NNGU Publishing House, 2010. 420 p. (in
Russian)
5. Zaynidinov H.N., Usmonov B.Sh. Architecture of Computers and Computer Systems.
Tashkent: Tashkent University of Information Technologies named after Muhammad
al-Khwarizmi, 2020. 640 p. ISBN 987-9943-5805-5-8 (in Russian)
6. ADSP-BF561 Blackfin® Processor Hardware Reference. Analog Devices, Inc., 2013.
7. Deergha Rao K., Swamy M.N.S. Digital Signal Processing: Theory and Practice. Springer
Nature Singapore Pte Ltd., 2018. 799 p.
8. Zaynidinov H.N., Ibragimov S.S., Tojiboyev G‘.O. Comparative Analysis of the
Architecture of Dual-Core Blackfin Digital Signal Processors. International Conference on
Information Science and Communications Technologies: Applications, Trends and
Opportunities (ICISCT 2021), November 3-5, 2021. http://www.icisct2021.org/
https://ieeexplore.ieee.org/document/9670135 (SCOPUS). pp. 1-5
9. Zaynidinov H.N., Ibragimov S.S., Tojiboyev G‘.O., Nurmurodov J.N. Efficiency of
Parallelization of Haar Fast Transform Algorithm in Dual-Core Digital Signal Processors.
2021 8th International Conference on Computer and Communication Engineering (ICCCE),
22-23 June 2021, Kuala Lumpur, Malaysia (SCOPUS). pp. 7-12
