FPGA-SoC Based Object Tracking Algorithms: A Literature Review

Systems for object detection and tracking are becoming increasingly important in practical applications. Many research and development groups are working to improve the performance of such systems, and numerous methods have been proposed. At the same time, computer vision is increasingly implemented on reconfigurable and embedded systems. The purpose of this study is to present past and recent research in the field of visual tracking systems built on FPGA and FPGA-SoC platforms. The study includes a brief description of several popular algorithms, covering their main characteristics and the fields in which each is preferred. Resource utilization was also considered, to identify the most and least used resources when implementing different algorithms. The study found that flip-flops (FF) and lookup tables (LUT) are the most heavily used resources, while BRAM, DSP blocks, and multipliers are the limiting ones.


INTRODUCTION
Moving object tracking is now employed in a wide range of applications, for example military, driverless vehicles, and industrial systems. The need for visual object tracking arises from the lack of alternative tracking systems that work under certain special cases. Electromagnetic and sonic waves have been used for many years to detect and track moving objects. Unfortunately, such systems are inefficient, especially at short distances and in harsh environments. In addition, drone detection and tracking has become of interest to many researchers because radar and other traditional tracking systems fail at this task; from this point of view, visual tracking has become the focus of many research groups. In general, visual tracking systems depend on sensors (such as a digital camera) as well as hardware and software algorithms.
Object tracking is a central task in computer vision systems. The availability of high-powered computers and high-quality, low-cost video cameras, together with the growing need for automated video analysis, has generated a great deal of interest in object tracking algorithms [1]. When tracking a moving object, estimating the region of the moving object can greatly reduce exhaustive searching and make the tracking system faster [2]. In practice, no ideal system structure exists that handles all kinds of problems across different background models. To obtain a practical implementation of such a system, trade-offs must usually be made between the robustness of the system and its performance in terms of resolution, frame rate, and so on. The main bottleneck of many image processing systems is memory usage [3].
Many different approaches have been proposed over the last 40 years to determine the optical flow, a technique for determining the motion of objects in a video by measuring the change in pixel intensities over a set of frames [4]. Optical flow is incorporated into computer vision systems that carry out tasks like object detection and tracking [5].
The implementation of algorithms on FPGAs offers a good balance between development flexibility, algorithm testing and reconfigurability, real-time performance, and cost [6] [7]. A SoC enables the integration of hardware (HW) acceleration and software (SW) libraries into a single, small device. As a result, this technique enables reductions in both size and power consumption [8] [9].
Typically, algorithms are not ideal for hardware implementation as such, so the amount of logic should be kept to a minimum. Reducing the total number of logic elements also lowers power dissipation, which matters in robotic and mobile applications. Adapting an algorithm to hardware implementation may reduce its accuracy; this is acceptable in applications where high accuracy is not critical. Many of the systems introduced in this review are suited to adapting the algorithm entirely to the FPGA, but others need a combined hardware/software implementation [1].
Tracking an object is challenging, and it becomes even more challenging when it must be done on an embedded device with restricted resources. The first design choice is the hardware/software partitioning: software has the advantages of flexibility and faster design, while hardware offers high throughput [10]. Pre-filtering and feature detection are performed at the pixel level by the FPGA [11].
This paper covers research papers published from 2009 onward. All nineteen chosen papers used FPGA or FPGA-SoC platforms to process image or video signals. The papers were selected for their implementation of object detection and tracking algorithms on different devices. For most of the papers, the review focuses on the algorithm used, image size, frame rate (throughput), FPGA device, resource utilization, and power consumption. This paper is organized in five sections. After this introduction, an overview of commonly used algorithms in visual systems is presented in section 2. The literature survey is given in section 3. A summary of the important parameters gained from the survey is outlined in section 4. Finally, the discussion and conclusions are found in section 5.

OBJECT DETECTION AND TRACKING ALGORITHMS
To recognize an object from a set of images, a method or algorithm is needed to pick out key features [12]. Various algorithms have been put forth to increase the efficiency of the tracking process. However, no algorithm has yet been created that functions well in all environmental situations [13]. Some algorithms may not work properly when the camera recording the video moves, while others may fail under intense lighting. Because of the loss of information when projecting a 3-D environment onto a 2-D image, real-time processing constraints, noise in the images, changes in scene illumination, complex object shapes, motion, and partial or complete occlusion, the tracking process is in and of itself an extremely complex task [14] [15].
Many algorithms have been proposed in the field of visual object tracking. Some of them are used as a preprocessing stage in a visual tracking system, which includes detecting the objects, and others are used for tracking objects [16]. Figure 1 summarizes the most common algorithms used in visual detection and tracking systems. These algorithms can be split into two groups: detection algorithms and tracking algorithms. A short description of the commonly used algorithms is presented in the next subsections.

Object detection algorithms
The first step, or preprocessing stage, in a surveillance system is object detection. A short description of the common detection algorithms shown in figure 1 is presented here:

a-Background Subtraction (BGS)
This algorithm detects the actual background and extracts objects that do not belong to it. To distinguish between moving and stationary objects, the BGS algorithm requires three sequential frames and a reference image of a stationary background [17] [18]. In order to perform automatic threshold selection, subtraction, and pixel-wise classification, the background subtraction model based on Horprasert acquires a reference image to model the background of the scene. This algorithm works correctly only when the camera is fixed [15] [19].
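As a minimal illustration of the reference-image subtraction step, the following Python sketch flags pixels that deviate from a stationary background; the frames, image size, and threshold are illustrative assumptions, not values taken from the reviewed papers.

```python
# Minimal background subtraction sketch: compare each pixel of a
# grayscale frame (2-D list of 8-bit values) against a fixed
# reference background image. The threshold of 30 is hypothetical.

def subtract_background(frame, reference, threshold=30):
    """Return a binary foreground mask: 1 where the pixel differs
    from the reference background by more than `threshold`."""
    return [
        [1 if abs(p - r) > threshold else 0
         for p, r in zip(frame_row, ref_row)]
        for frame_row, ref_row in zip(frame, reference)
    ]

reference = [[10, 10, 10],
             [10, 10, 10],
             [10, 10, 10]]
frame     = [[10, 200, 10],
             [10, 210, 10],
             [10, 10, 10]]

mask = subtract_background(frame, reference)
# The two bright pixels are flagged as foreground.
```

In hardware the same comparison maps naturally to a pixel-wise pipeline, one comparator per clock cycle.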

b-Mean-Shift algorithm
Due to the unsupervised nature of this algorithm, it can be used in a variety of autonomous applications where no input parameters are provided by the user. The main challenge is its computational complexity, which scales poorly with both the number of pixels N and the number of iterations k, as O(kN^2) [20]. Parallel processing and pixel pipelining on the FPGA reduce the computational burden enough for real-time application systems. Although it is effective when the camera is moving, this algorithm requires that there be no occlusion [15].
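The core mean-shift iteration can be illustrated on 1-D data: the estimate repeatedly moves to the mean of the samples inside a kernel window until it settles on a local density mode. The bandwidth and sample values below are illustrative, not from any reviewed implementation.

```python
# Toy 1-D mean-shift iteration with a flat kernel of radius `bandwidth`.
# Assumes the starting point has at least one sample inside its window.

def mean_shift_1d(samples, start, bandwidth=2.0, max_iter=50, tol=1e-6):
    x = float(start)
    for _ in range(max_iter):
        window = [s for s in samples if abs(s - x) <= bandwidth]
        new_x = sum(window) / len(window)   # mean of samples in the window
        if abs(new_x - x) < tol:            # converged on a density mode
            break
        x = new_x
    return x

# Two clusters around 1 and 10; starting near the first cluster
# converges to its mode.
samples = [0.8, 1.0, 1.2, 9.8, 10.0, 10.2]
mode = mean_shift_1d(samples, start=0.0)
```

The O(kN^2) cost mentioned above comes from the window search over all N samples at every one of the k iterations; FPGA pipelining parallelizes exactly that inner loop.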

c-The Camshift algorithm
This algorithm is based on the Mean-Shift algorithm, improved to feed the object's color information into the Mean-Shift iteration [21]. The method is unaffected by changes in object shape. It can effectively address partial occlusion and object deformation with high operating efficiency. The limitation is that the histogram of the target image records the probability of each color appearing, so the algorithm requires the object to be manually specified before tracking can start [2].

d-Gaussian mixture model GMM algorithm
This probabilistic algorithm is especially well suited to detecting moving objects in multimodal backgrounds containing objects with repetitive motion, such as waves, moving leaves, and flickering lights. The GMM algorithm also performs well in the presence of illumination changes [22] [23].
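A per-pixel mixture update in the spirit of the GMM background model can be sketched as follows; the learning rate, matching threshold, and initial variance are illustrative assumptions rather than values from any reviewed implementation.

```python
import math

# Simplified per-pixel mixture-of-Gaussians update: each pixel keeps a
# few (weight, mean, variance) components; a matching component is
# updated, otherwise the weakest component is replaced by a new one.

def update_pixel_gmm(components, value, alpha=0.05, match_sigma=2.5):
    matched = False
    for c in components:
        if not matched and abs(value - c["mean"]) <= match_sigma * math.sqrt(c["var"]):
            matched = True
            c["weight"] = (1 - alpha) * c["weight"] + alpha
            c["mean"] += alpha * (value - c["mean"])
            c["var"] += alpha * ((value - c["mean"]) ** 2 - c["var"])
        else:
            c["weight"] *= (1 - alpha)      # decay non-matching components
    if not matched:
        weakest = min(components, key=lambda c: c["weight"])
        weakest.update(weight=alpha, mean=float(value), var=100.0)
    total = sum(c["weight"] for c in components)
    for c in components:                    # renormalize the weights
        c["weight"] /= total
    return matched

pixel = [{"weight": 0.9, "mean": 10.0, "var": 16.0},
         {"weight": 0.1, "mean": 200.0, "var": 16.0}]
# A value near the dominant background component matches it ...
assert update_pixel_gmm(pixel, 12)
# ... while an outlier replaces the weakest component (a foreground hint).
assert not update_pixel_gmm(pixel, 120)
```

Hardware implementations such as the ones surveyed below keep these few parameters per pixel in memory and update them in a pipeline, which is where the memory-bandwidth bottleneck arises.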

e-Harris corner (HC) detection algorithm
It offers enormous parallel processing potential, pixel-wise operation, and noise immunity. It performs well when detecting L-shaped corners [11].

f-Canny algorithm
This is an optimal edge detection technique that provides good detection, clear response, and good localization. It is suitable for implementation in a pipelined parallel architecture on the FPGA [24] [25].

g-Principal Component Analysis (PCA) algorithm
PCA is used in many different disciplines, such as artificial vision, power electronics, and statistics. For image processing, the PCA technique reduces redundant information (retaining only essential information) from the initial variables and evaluates the degree of similarity between two or more images by analyzing only the fundamental features present in the transformed space [26].

h-Convolutional Neural Network (CNN) algorithm
A CNN is a type of multi-layered neural network, and CNNs have been rising to prominence in computer vision. Their major benefit is self-learning: the more images they are exposed to, the better they become at classifying objects [12] [27]. However, CNNs demand a tremendous amount of computing power during both training and deployment. Because of the potential trade-off between power consumption and reconfigurability, FPGA-based CNN accelerators have received a lot of research attention [28].

Object tracking algorithms
Motion can be determined in a video sequence by subtracting two consecutive frames. This shows where movement (change) has occurred but does not provide much information about its direction or speed. More sophisticated vision systems require this knowledge, and so the idea of optical flow was introduced [29] [5]. Optical flow is a vector field that represents the motion of a target or an object in an image sequence (video) [30]. There are various methods for optical flow computation:

A)-Horn-Schunck (HS) algorithm
Horn-Schunck is a globally regularized algorithm that obtains a globally optimized solution through iterative calculation, so the characteristics of the entire image are considered; in other words, the optical flow should be uniformly smooth over the whole picture. As a result, the calculated flow at each pixel is similar to that in its small neighborhood [5] [30].
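A toy 1-D version of the Horn-Schunck iteration illustrates the global-smoothness idea: each flow value is pulled toward the average of its neighbours (the smoothness term) and corrected by the brightness-constancy residual (the data term). The regularization weight, iteration count, and test signal are illustrative assumptions.

```python
# 1-D Horn-Schunck-style iteration (a simplified model of the 2-D
# algorithm): Jacobi updates combining a neighbourhood average with the
# brightness-constancy correction Ix*u + It = 0.

def horn_schunck_1d(frame1, frame2, alpha=1.0, n_iter=100):
    n = len(frame1)
    # Spatial and temporal derivatives (forward differences, clamped).
    Ix = [frame1[min(i + 1, n - 1)] - frame1[i] for i in range(n)]
    It = [frame2[i] - frame1[i] for i in range(n)]
    u = [0.0] * n
    for _ in range(n_iter):
        # Neighbourhood average of the current flow estimate.
        ubar = [(u[max(i - 1, 0)] + u[min(i + 1, n - 1)]) / 2
                for i in range(n)]
        u = [ubar[i] - Ix[i] * (Ix[i] * ubar[i] + It[i]) / (alpha + Ix[i] ** 2)
             for i in range(n)]
    return u

# A linear ramp shifted right by one pixel: the true flow is +1 everywhere.
frame1 = [float(x) for x in range(10)]
frame2 = [x - 1.0 for x in frame1]
flow = horn_schunck_1d(frame1, frame2)
```

The iterative, whole-image nature of this update is what makes HS expensive in hardware compared with the purely local Lucas-Kanade method described next.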

B)-Lucas-Kanade (LK) algorithm
This is also known as the gradient-based optical flow estimation algorithm. Rather than seeking a global minimum over the entire image, the LK algorithm considers a small neighborhood of each pixel. It is based on the observation that a pixel moves in the same way as its nearest neighbors; thus, unlike the HS algorithm, the assumption introduced by the LK algorithm only needs to be satisfied locally [5]. Embedded hardware can implement the Lucas-Kanade algorithm successfully [11]. When there is significant motion between consecutive frames, this algorithm's accuracy declines; the solution is to use a high-frame-rate scheme [31] [32] [33].

C)-Kalman filter algorithm
When measurement values are uncertain, the Kalman filter, a minimum mean-square error estimator, provides the best estimate of a linear dynamic system model, including an object's position and velocity, from noisy measurements [34]. Object tracking and motion detection in dynamically positioned vehicles are two typical applications of the Kalman filter [35].
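A 1-D constant-velocity Kalman filter illustrates the predict/update cycle used in such trackers; the state is [position, velocity], only the position is measured, and the noise parameters are illustrative choices, not those of the reviewed designs.

```python
# One-dimensional constant-velocity Kalman filter sketch.

def kalman_step(x, v, P, z, dt=1.0, q=1e-3, r=1.0):
    """One predict/update cycle; P is the 2x2 covariance (nested list)."""
    # Predict: x' = x + v*dt, v' = v, with F = [[1, dt], [0, 1]].
    x, v = x + v * dt, v
    p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q
    # Update with measurement z of the position (H = [1, 0]).
    s = p00 + r                 # innovation covariance
    k0, k1 = p00 / s, p10 / s   # Kalman gain
    y = z - x                   # innovation
    x, v = x + k0 * y, v + k1 * y
    P = [[(1 - k0) * p00, (1 - k0) * p01],
         [p10 - k1 * p00, p11 - k1 * p01]]
    return x, v, P

# Track an object moving one unit per frame from position measurements.
x, v, P = 0.0, 0.0, [[1.0, 0.0], [0.0, 1.0]]
for z in [1.0, 2.0, 3.0, 4.0, 5.0]:
    x, v, P = kalman_step(x, v, P, z)
# The estimates converge toward the true position (5) and velocity (1).
```

Each cycle is a fixed set of multiply-accumulate operations, which is why Kalman trackers map well onto DSP blocks in the SoC implementations surveyed below.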

LITERATURE SURVEY
In this section, the research papers highlighted in this review are separated into two groups: the first group implemented their proposed systems using only an FPGA, while the second implemented theirs on FPGA-SoC platforms.

Research work based on FPGA platforms
Hongtu Jiang et al. [3] proposed a dedicated hardware architecture for real-time segmentation at VGA resolution and 25 frames per second. The authors presented an FPGA platform with a number of memory access reduction schemes, which together reduce memory bandwidth by more than 70%. Real-time segmentation performance was achieved on video sequences using three Gaussian distributions per pixel. The Gaussian parameters are stored in off-chip DDR SDRAM. Hardware complexity was reduced by updating only one Gaussian parameter at a time. The proposed hardware's bottleneck is memory usage.
A complete implementation of the PCA algorithm on reconfigurable hardware (FPGA) was presented by I. Bravo et al. [26] to detect new objects in a scene. Different stages of the PCA algorithm's traditionally sequential execution were parallelized, and the entire system was implemented on the FPGA. The computation time of each algorithm stage is also stated. A 128 MB SDRAM memory bank, external to the FPGA, was included in the system.
The lane detection and tracking procedures were created and implemented by Marzotto et al. [7] in a single FPGA device. The suggested system architecture consists of self-contained logic modules that do not require the assistance of programmable microcontrollers, DSP processors, or external memories, because every module was fully implemented inside the FPGA. The authors clearly described the steps of the pre-processing pipeline. The suggested system is flexible enough to adapt to different road conditions without pre-setting. The tracking algorithm is made up of three separate Kalman filters (KF), each working on three different parameters. The entire FPGA system implementation is based on the Xilinx System Generator for DSP. Because the system functionality utilizes only about 30% of the Spartan-3A's hardware resources, an FPGA with fewer resources could be used.
F. Barranco et al. [31] implemented the optical flow core and the multi-scale extension in high-level Handel-C. The PCI interfaces, off-chip memory, and memory controller unit (MCU) are all implemented in the RTL language VHDL. The authors described the system architecture, the pipelined stage scheme, and the primary hardware resources, and demonstrated that the highest clock frequency was achieved with minimal resource use. Two different abstraction levels were used in the hardware implementation. On the device used, the mono-scale implementation employed 10%-15% of the resources, whereas the multi-scale version used about 60%, while its frame rate decreased to about a tenth of the mono-scale approach. The authors increased the precision of the results by about three times.
In Ref. [32], I. Ishii et al. used an enhanced gradient-based algorithm based on the LK method, which can adaptively choose an artificially variable frame rate in accordance with the amplitude of the estimated optical flow (OF), to detect it precisely for objects moving at both high and low speeds in the same grayscale image (10 bits per pixel). The optical flows were estimated at 1000 f/s for each of the 1024 block regions of 32x32 pixels in a 1024x1024-pixel image. Software on a PC carried out the block-level computation of the 1024 blocks. The authors employ two FPGAs: one for processing and displaying images, and the other for implementing user algorithms in hardware.
M. Genovese et al. [36] proposed an FPGA implementation of the GMM algorithm from the open-source computer vision library OpenCV, originally developed by Intel. The algorithm was synthesized and implemented on a variety of platforms, including the Altera Stratix-IV FPGA and the Xilinx Virtex-6, Virtex-5, and Spartan-3 FPGAs. The circuit uses three Gaussian distributions per pixel when processing grayscale videos, and it processes 13 parameters for each pixel: the luminance value and the 12 Gaussian parameters of the pixel's statistical model. The highest frame rate and operating frequency were achieved on the Altera Stratix-IV platform.
A pipelined, parallel optical flow implementation developed by G. K. Gultekin et al. [37] significantly boosts system throughput by using multiple clock domains. The memory interface circuit operates at a higher clock rate than the calculation modules, which helps remove the design's memory bottleneck. The authors approximated the division operation by dividing by powers of two. They also compared the proposed hardware implementation against a PC implementation of the same algorithm: the FPGA implementation ran about 146 times faster, and its power consumption of 844.38 mW is about 1/40 of that of a 1.66 GHz personal computer processor. The 200 MHz (SSRAM controller) and 50 MHz (optical flow controller) clocks are produced by an internal phase-locked loop (PLL) circuit.
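The divide-by-powers-of-two trick can be sketched as follows: division by an arbitrary constant is replaced by a multiplication with a precomputed reciprocal followed by a right shift, which maps to cheap shift/multiply hardware. The scale factor of 2^16 is an illustrative choice, not the one used in the paper.

```python
# Approximate integer division by a constant using a fixed-point
# reciprocal and a right shift. Because the reciprocal is truncated,
# the result can occasionally be off by one.

def approx_divide(x, d, shift=16):
    """Approximate x // d with one multiply and one shift."""
    reciprocal = (1 << shift) // d      # computed once, offline
    return (x * reciprocal) >> shift    # runtime: multiply + shift

q_approx = approx_divide(1000, 7)
q_exact = 1000 // 7
# Both give 142 here.
```

In an FPGA the runtime part costs a single multiplier (or a few shift/adds) instead of a full divider circuit.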
In Ref. [11], M. Tomasi et al. proposed an FPGA+DSP system that outperformed ARM+DSP and DSP-only configurations by about 20 and 3 times, respectively. On the FPGA, the authors used a fine-grain pipeline to implement Harris corner detection at a data rate of 60 megabytes per second (MBps). To achieve a total frame rate of 160 fps for VGA images, the DSP simultaneously receives and tracks the features that the FPGA has detected. For comparison, the performance was split into detection and tracking. An IP core for Harris corner detection was created; the detection algorithm ran on the FPGA, while the DSP board, which serves as the FPGA's coprocessor, ran the LK algorithm to track the detected feature points. The authors did not describe the hardware architecture design of the algorithms; instead, they presented an analysis of the processing times and hardware performance of the three architectures used: FPGA, DSP, and ARM. The FPGA proved fastest, at 4.9 ms for VGA images, compared with 138 ms for the ARM and 10.6 ms for the DSP.
In their hardware architecture, H.-S. Seong et al. [33] suggested storing the input image after Gaussian filtering instead of storing the original input image. To reduce external memory access to a quarter of the original data, the Gaussian-filtered image was downsampled in both the horizontal and vertical directions; this 2:1 subsampling in both directions reduces total memory access by 75%. The external bandwidth was reduced with only a slight increase in hardware resources. Instead of using multipliers, which consume more time and resources, the authors used simplified Gaussian coefficients that require only shift and add operations. The authors described the suggested hardware organization of the LK algorithm design in great detail.
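The multiplier-free filtering idea can be sketched with a [1, 2, 1]/4 binomial kernel, whose coefficients and normalizer are powers of two, so every multiply becomes a shift; the kernel choice is illustrative, not the exact coefficient set of the paper.

```python
# 3-tap Gaussian-like smoothing using only shifts and adds:
# out[i] = (row[i-1] + 2*row[i] + row[i+1]) / 4, with the multiply by 2
# done as << 1 and the divide by 4 as >> 2 (edges are clamped).

def gaussian_3tap_shift_add(row):
    n = len(row)
    out = []
    for i in range(n):
        left = row[max(i - 1, 0)]
        mid = row[i]
        right = row[min(i + 1, n - 1)]
        out.append((left + (mid << 1) + right) >> 2)
    return out

smoothed = gaussian_3tap_shift_add([0, 0, 100, 0, 0])
# The impulse is spread as [0, 25, 50, 25, 0].
```

In hardware this costs two adders and fixed wiring, with no multiplier or divider at all.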
S. Sajjanar et al. [15] presented a complete system comprising various submodules: the controller, storage, display, and camera capture modules. The RTL block ports of each module are clearly explained. The system used three memory modules: a VGA display module, a frame buffer module, and a background memory module. The incoming and reference frames were stored in the first two, and the resultant frame in the third. The camera was configured to output data in YUV form, which represents each pixel using 24 bits (eight bits each for Y, U, and V). In later stages, only the 8 bits corresponding to Y, which denotes the image's grayscale, are used.
A. Arif et al. [17] described an implementation of an algorithm to process traffic camera image sequences in real time. The algorithm requires four frames (images) as input: the frame being studied, the frame before it, the frame after it, and the reference stationary background. The algorithm compares the corresponding pixels from three subsequent frames to determine a weighted difference. When the difference is 0, the corresponding pixel has not moved at all, and there is no need to update the reference background. The algorithms are written in OpenCL. The authors compared the power consumption of the background subtraction and Lucas-Kanade algorithms on the FPGA, CPU, and GPU platforms. The FPGA's computational power and energy efficiency make it an excellent candidate for applications that call for intensive data processing, particularly in real time.
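A hypothetical sketch of the three-frame comparison described above: a per-pixel weighted difference over the previous, current, and next frames, where zero means the pixel is static and the reference background can be left unchanged. The weights are assumptions, not values taken from the paper.

```python
# Per-pixel weighted difference across three consecutive frames;
# a result of 0 marks a static pixel (no background update needed).
# The weights w1 and w2 are hypothetical.

def motion_measure(prev_f, curr_f, next_f, w1=1, w2=1):
    return [
        [w1 * abs(c - p) + w2 * abs(n - c)
         for p, c, n in zip(pr, cr, nr)]
        for pr, cr, nr in zip(prev_f, curr_f, next_f)
    ]

prev_f = [[10, 10], [10, 10]]
curr_f = [[10, 80], [10, 10]]
next_f = [[10, 10], [10, 10]]
m = motion_measure(prev_f, curr_f, next_f)
# Only the pixel that changed gets a non-zero measure.
```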
P. Hobden et al. [12] provide a method to overcome limited floating-point resources while maintaining real-time operation. The proposed method includes two modules: tracking, and detection of unmanned aerial vehicles (UAVs) using neural networks (NNs). The tracking module used a background-differencing algorithm, while the UAV detection used a modified CNN algorithm. The authors compared MATLAB and Xilinx Deep Learning Processor Unit (DPU) implementations on the UltraScale ZCU102 against their model, using a data set of the same images.

Research work based on FPGA-SoC platforms
U. Ali and M. B. Malik [21] presented a hardware/software co-design architecture for the well-known kernel-based mean-shift tracking algorithm. The target's color histogram was used as the tracking feature, and the target was located in subsequent images by maximizing the statistical match of the color distributions. By localizing targets using gradient-based iterative search instead of exhaustive search, the system was able to track multiple targets at frame rates of up to hundreds of frames per second. The authors discussed how long various tasks took to complete. All computationally intensive tasks were mapped to hardware to achieve the maximum frame rate, while the prediction filter and the main tracking loop update were implemented in software to simplify target initialization by taking inputs from the user.
R. Rodriguez-Gomez et al. [19] proposed an FPGA-based embedded architecture with low degradation that can extract the background in resource-limited environments. The architecture includes a MicroBlaze processor, which was employed to create the reference background model and to update it over time. Using fixed-point operations, a dedicated hardware module carried out the subsequent stages of subtraction and pixel-by-pixel classification. The authors ran the system modules in different clock frequency domains, but to simplify the FPGA clock supply networks, the suggested BGS IP core ran at the same frequency as the MicroBlaze and the system buses. The architecture's hardware complexity was greatly reduced during the training phase by precalculating and storing a number of constants, and division operations are avoided by using multiplications instead, which use less hardware. The basic Horprasert model was enhanced to handle shadows effectively, which represents a significant improvement in common scenes.
S. Guo et al. [38] used the processing system (PS) to run a Gaussian background model-based detection program that detects moving objects entering the field of view. The accelerator portion of the Gaussian background model was programmed into the reconfigurable area of the programmable logic (PL, the FPGA subsystem). The compressive tracking (CT) algorithm was employed for object tracking. After being converted to 320x240 8-bit grayscale, the acquired images are saved in the image preprocessing module using DDR3 RAM. The authors thoroughly explained the diagrams of the background model and the tracking model, and presented a performance comparison with and without the hardware accelerator: the average tracking frame rate of the proposed system was 9.48 times faster than the ARM processor-based pure software solution. The proposed system had some unavoidable issues: the detection component only processes videos with a static background, and the drift issue restricted the tracking component, which ultimately caused the tracking operation to fail.
Because of the stability of the Mixture of Gaussians (MoG) algorithm for background subtraction, G. Conti et al. [8] selected it for hardware acceleration on the FPGA. The robustness of the Kalman filter, which is based on a statistical model, led to its implementation on the PC for tracking. The authors measured the speed difference between grayscale and RGB versions of the algorithm. Although the RGB version was more accurate, it was less efficient than the grayscale version, since more pixels can be stored simultaneously in FPGA memory using grayscale; overall, the RGB version offers the balance between accuracy and processing time. J. G. Pandey et al. [39] used the kernel-based mean-shift algorithm for tracking moving objects and gave a detailed explanation of its FPGA implementation, including the circuits utilized as intellectual property (IP) cores in the framework. The computation of the Bhattacharyya coefficient, the mean-shift vector, and the associated circuits was implemented using fixed-point binary logarithmic and antilogarithmic units. The tracking algorithm was initially developed in C, tested on a number of saved video files, and then implemented in RTL-level VHDL code.
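The logarithmic-unit idea can be modelled with Mitchell's approximation, log2(2^k * (1+f)) ≈ k + f, which turns multiplication into addition in the log domain; this floating-point Python model illustrates the principle only, not the fixed-point circuit of the paper.

```python
import math

# Mitchell-style logarithmic multiplication: convert operands to
# approximate log2, add, and convert back. The worst-case relative
# error of this scheme is about 11%.

def mitchell_log2(x):
    k = math.floor(math.log2(x))     # position of the leading one
    f = x / (1 << k) - 1             # mantissa fraction in [0, 1)
    return k + f                     # approximation of log2(x)

def mitchell_antilog2(y):
    k = math.floor(y)
    f = y - k
    return (1 << k) * (1 + f)        # approximation of 2**y

def approx_multiply(a, b):
    """Multiply two positive numbers via addition in the log domain."""
    return mitchell_antilog2(mitchell_log2(a) + mitchell_log2(b))

exact = 13 * 9
approx = approx_multiply(13, 9)   # 112.0 vs the exact 117
```

The appeal in hardware is that the log and antilog conversions need only a priority encoder and shifts, so the costly multipliers in histogram-similarity measures like the Bhattacharyya coefficient are replaced by adders.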
For the tracking and detection of multiple objects, P. Babu et al. [35] presented a multi-dimensional Kalman filter (MDKF) for linear systems with updated state vector and covariance equations. They also showed a hardware implementation of the multi-dimensional Kalman filter on the Zynq SoC with efficient resource utilization. The MDKF tracking algorithm's performance and accuracy were estimated on various benchmark datasets. The only resource limitation was the usage of DSP blocks in the Zynq SoC, which may run out as the number of states for measuring uncertainties increases.
K. Blachut et al. [5] implemented the multi-scale gradient-based Lucas-Kanade and Horn-Schunck algorithms on a ZCU platform with a Zynq UltraScale+ MPSoC FPGA, processing different numbers of pixels concurrently depending on the scale and without using additional external memory to store temporary values. Look-up tables (LUT) with precalculated values were created for each potential input vector. The four-pixels-per-clock data format of the 4K video stream allowed the authors to lower the clock frequency needed for real-time processing to 150 MHz. P. Hobden et al. [12] presented a way to deal with scarce floating-point resources while preserving real-time operation. The solution consists of modules for neural-network-based UAV detection and for tracking of unmanned aerial vehicles (UAVs). While the UAV detection used a modified CNN algorithm, the tracking module used a background-differencing algorithm. The CNN provided a feedback path to verify that the tracker had locked onto the right object and not a wrong one. To best allocate hardware resources on the PL unit, the authors implemented a few layers on the Advanced RISC Machines (ARM) PS core of the Zynq, and they carried out the training in MATLAB on a PC.

SUMMARY OF THE REVIEWED PAPERS
In this section, the most important factors from the preceding research papers, including the chosen algorithm, the chosen platform, the frame rate, the image resolution, the operating frequency, and the power consumption, are collected in table 1 to simplify the performance comparison for the reader, while table 2 shows the resource utilization figures and the percentage of used resources relative to the total for the papers that report detailed results. It is clear that multipliers are the least used resources compared with the others because they drain FPGA resources.
Table 1 shows a clear trade-off between the captured image (target) size and the frame rate. Another observation from the comparison is that the authors of refs. [33] [36] implemented the same algorithm on different platforms, while the authors of ref. [5] implemented two different algorithms on the same device, as detailed in table 2. Across the selected papers, most hardware implementations of detection and tracking algorithms are based on FPGA devices alone; one paper employed an FPGA-DSP platform, and the remainder employed the popular recent hardware/software co-design approach.
The information extracted from the reviewed papers about the resource counts and utilization percentages for slices, FF, BRAM, DSP, LUT, and multipliers is collected in table 2. Although the algorithms and applications presented in these papers differ, the total resources across all the devices used were computed and presented in figure 2. The percentage of each resource type relative to the total resources (regardless of type) was calculated and plotted to give an overview of the most used resources in such applications, as an indication for researchers and practitioners. Resources not specified by the researchers are not included in figure 2, so as to keep the approximation as reasonable as possible. Another comparison, presented in figure 3, shows the percentage of each resource type used relative to the total available number of that same resource.
Figures 2 and 3 show that BRAM, DSP blocks, and multipliers are the limiting FPGA resources. Multipliers are the least used compared with the others because they are resource-consuming. Flip-flops (FF) are the most used resources and the most numerous in the fabric, with slices coming second after FF.

DISCUSSION AND CONCLUSIONS
In this review of common detection and tracking algorithms in surveillance systems, some of the reviewed papers proposed the hardware implementation of one or two algorithms on an FPGA platform alone, while other articles used hardware/software co-design (also called a hybrid system). From figure 2 it is clear that the flip-flop (FF) is the most used FPGA resource; the next most used logic cell is the look-up table (LUT). On the other hand, multipliers are rarely used because they consume more device area and computation time. When memory usage is limited, using grayscale allows more pixels to be stored in FPGA memory, and the processing speed can be increased to obtain a high frame rate.
Although using the RGB version reduces the frame rate compared with grayscale, it gives more accurate detection and tracking results; the RGB version is the solution for complex environments.
By using LUTs for constrained input ranges, hardware resources can be used more sparingly and redundant data storage can be avoided. In addition, the highest clock frequencies are achieved when fewer resources are used.
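The lookup-table strategy can be sketched as follows: for an input constrained to a small range (8-bit pixels here), an expensive function is precomputed once into a table, so the runtime cost is a single indexed read, which is exactly what FPGA block RAM or LUTs provide. Gamma correction is an illustrative choice of function, not one taken from the reviewed papers.

```python
# Precomputed gamma-correction table for 8-bit pixel values:
# the power function is evaluated once offline; at runtime each
# pixel costs only one table lookup.

GAMMA_LUT = [round(255 * (i / 255) ** 0.5) for i in range(256)]  # built offline

def gamma_correct(pixel):
    return GAMMA_LUT[pixel]   # runtime: one table read, no arithmetic
```

The same pattern applies to any bounded-domain function (square roots, reciprocals, trigonometry) that would otherwise need multipliers or iterative circuits.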
When an algorithm has a computationally intensive task, that task should be implemented in hardware to achieve the highest throughput (frame rate). However, if the algorithm needs to interface with the user, it is worthwhile to implement this portion in software, because that facilitates target initialization by accepting user input. The computational power and energy efficiency of FPGAs make them excellent candidates for applications that require intensive data processing, particularly in real time.
Other researchers precompute parameters that would otherwise consume many resources, save them in lookup tables, and then use them during operation.
In comparison with more conventional approaches, the CNN algorithm is an appropriate choice for devices with limited DSP slices, which are otherwise used for floating-point implementation.
In some applications it is possible to simplify the hardware implementation so that shift and add operations can be used instead of multipliers, which cost more resources and time.
The co-design approach was used to reduce the amount of hardware needed. Additionally, by optimizing the code at key points and interfaces, high-level languages like Impulse C together with RTL descriptions written in VHDL enable a leaner implementation and the achievement of high performance.

Fig. 2:
The percentage ratio of each resource's utilization to the sum of the total numbers of all resource types.

Fig. 3:
The normalized percentage ratio for each resource's utilization.

Table 1:
Performance comparison of FPGA-based implementations of different visual algorithms from recent and earlier literature. (-) indicates a value that was not specified.

Table 2:
Resource usage of the overall designs on the FPGA devices.