Design and FPGA Implementation of Two-Dimensional Discrete Wavelet Transform Architectures Using Raster-Scan Method

Jassim M. Abdul-Jabbar, Zahraa Talal Abed Al-Mokhtar
Dept. of Computer Eng., College of Eng., University of Mosul, Mosul, Iraq
drjssm@gmail.com, zah_ta84@yahoo.com

Abstract
In this paper, an FPGA implementation of a 2-dimensional discrete wavelet transform (2-D DWT) is proposed. Given any hardware architecture of a one-dimensional (1-D) wavelet transform filter, the corresponding two-dimensional architecture is constructed efficiently by using the raster-scan image method. The proposed method is based on the lifting scheme architecture. The resulting architectures are simple, modular and regular for the computation of one-level or multilevel 2-D DWT. These architectures perform both low-pass and high-pass filtering with multiplierless coefficient calculation. In addition, they require only a small on-chip area when downloaded onto the FPGA board (Spartan-3E). The proposed 2-D architecture consists of an external memory, a row 1-D arithmetic module, a column 1-D arithmetic module and an internal memory unit. The row and column 1-D arithmetic units are designed using biorthogonal filters (5/3 and 9/7).


Introduction
The discrete wavelet transform (DWT) has been developed as an effective tool for multiresolution analysis since its first presentation by Mallat [1], owing to its good time-frequency characteristics. The DWT is widely used for signal and image analysis. The direct implementation of the DWT usually requires heavy arithmetic computation because it is essentially a two-channel filter bank (i.e., the convolution method). The lifting scheme is a more recent, efficient way to reduce the arithmetic complexity and to provide an in-place implementation [2]. For the 2-D DWT, there are many VLSI architectures [3]-[7]. The direct 2-D DWT architecture performs the 1-D DWT on every row and then on every column of the image. However, this architecture needs a frame buffer, as shown in Fig. 1, which is usually off-chip, to store the intermediate data [8]. The external memory access consumes the most power in a 2-D DWT hardware implementation. The systolic architecture may be preferred because it requires less external memory access, but it requires internal line buffers and processing elements, which increase the die area and the control complexity [6]. In this paper, a simple method is proposed to implement separable 2-D raster-scan image architectures that solve the problem of internal memory access.
The basic idea is to use two data buffers to store the approximation and detail coefficients. These architectures are flexible and can be used with many kinds of filters. Based on the lifting scheme, a 1-D arithmetic module is used to build the 2-D DWT for one-level and multilevel transformation. The 2-D DWT architectures are designed to perform the one-level and multilevel transforms using the biorthogonal 5/3 and 9/7 filters.
The rest of this paper is organized as follows. The lifting scheme analysis is given in section 2. The 1-D arithmetic modules for the 5/3 and 9/7 filters are presented in section 3. In section 4, the proposed architectures are explained. The simulation results and hardware requirements are illustrated in sections 5 and 6, respectively. The comparison of the proposed architectures with other architectures from recent studies is given in section 7. Finally, section 8 concludes this paper.

Lifting-Based DWT
The lifting scheme consists of three steps, as shown in Fig. 2 [9]:
1- Split step: the input signal is separated into even and odd samples.
2- Predict step: the even samples are multiplied by the time-domain equivalent of t(z) and then added to the odd samples.
3- Update step: the predicted samples are multiplied by the time-domain equivalent of s(z) and then added to the even samples.
The basic principle of the lifting scheme is to factorize the polyphase matrix of the wavelet filter into a sequence of alternating upper and lower triangular matrices and a diagonal matrix [9]. The corresponding polyphase matrices are defined by the following matrix equation [4]:
where $\tilde{P}(z)$ is defined as the polyphase matrix of the decomposition stage.
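The three steps can be sketched in a few lines of Python. This is only a minimal illustration, not the hardware described in this paper: the function names, the mirrored samples at the borders, and the example taps t(z) = -(1 + z)/2 and s(z) = (1 + z^-1)/4 (a common 5/3-style choice) are assumptions made for the example.

```python
def lifting_forward(x, predict, update):
    """One lifting stage on an even-length list: split, predict, update."""
    even, odd = x[0::2], x[1::2]                                   # split
    detail = [o + predict(even, i) for i, o in enumerate(odd)]     # predict
    approx = [e + update(detail, i) for i, e in enumerate(even)]   # update
    return approx, detail


def lifting_inverse(approx, detail, predict, update):
    """Undo the stage by running the steps backwards with flipped signs."""
    even = [a - update(detail, i) for i, a in enumerate(approx)]
    odd = [d - predict(even, i) for i, d in enumerate(detail)]
    x = [0] * (len(even) + len(odd))
    x[0::2], x[1::2] = even, odd                                   # merge
    return x


# Assumed example taps (5/3-style), with mirrored samples at the borders.
def predict_53(even, i):
    right = even[i + 1] if i + 1 < len(even) else even[i]
    return -0.5 * (even[i] + right)


def update_53(detail, i):
    left = detail[i - 1] if i > 0 else detail[0]
    return 0.25 * (left + detail[i])


signal = [3, 7, 1, 8, 2, 9, 4, 6]
approx, detail = lifting_forward(signal, predict_53, update_53)
restored = lifting_inverse(approx, detail, predict_53, update_53)
```

Because the inverse simply subtracts what the forward stage added, `restored` equals `signal` for any choice of predict and update functions, which is the in-place property that makes the lifting scheme attractive for hardware.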
The above process can be summarized by the following lifting factorization, where the polyphase matrix of the decomposition stage is defined as

The above equation can be rewritten as

Construction of the 9/7 wavelet filter
It is necessary to point out that filters with arbitrary symmetric coefficients h(k) = h(-k) and g(k) = g(-k) [11] need not be wavelet filters. In fact, in order to achieve the 9/7 wavelet filter, one filter may be calculated using the vanishing moments that satisfy M = 2 and M = 4, and the other can be calculated using h(k). Based on the vanishing-moment conditions and the normalization conditions h(1) = 2 and g(1) = 1, and comparing (15) and (16), the different coefficients can be evaluated as

From (16) and (17), symmetric biorthogonal 9/7 filters can be obtained with a free variable t. In other words, for any real parameter t, (16) and (17) can be used for signal decomposition, where they give the low-pass and high-pass filters for analysis and synthesis, respectively. In order to achieve the 9/7 symmetric biorthogonal wavelet, Daubechies' theorem can be employed to determine the interval in which the parameter t can lie. Firstly, the 9-tap filter h9(z) and the 7-tap filter are denoted as in [11]. Taking t = 1.25, the coefficients of the 9/7 wavelet filter are given by

{h(0), h(1), h(2), h(3), h(4)} = (1/10) × {190/16, 86/16, -24/16, -6/16, 9/16}    (20)
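The coefficients in (20) can be checked against the normalization condition with a short exact-arithmetic sketch. It takes h(2) and h(3) as negative, which is the sign pattern under which the tap sum equals 2 and the alternating tap sum vanishes; that sign choice, the symmetric extension h(-k) = h(k), and the helper names are assumptions of this illustration.

```python
from fractions import Fraction

# h(0)..h(4) for t = 1.25, taken as (1/10) x {190, 86, -24, -6, 9} / 16.
# The negative signs on h(2) and h(3) are an assumption of this sketch.
H_HALF = [Fraction(n, 160) for n in (190, 86, -24, -6, 9)]


def gain_at_dc(h_half):
    """Sum of all taps of the symmetric filter, i.e. H(z) at z = 1."""
    return h_half[0] + 2 * sum(h_half[1:])


def gain_at_nyquist(h_half):
    """Alternating-sign sum of all taps, i.e. H(z) at z = -1."""
    return h_half[0] + 2 * sum((-1) ** k * h for k, h in enumerate(h_half) if k > 0)


dc = gain_at_dc(H_HALF)            # normalization check: equals 2
nyquist = gain_at_nyquist(H_HALF)  # low-pass check: equals 0
```

All numerators are small integers over a power-of-two denominator (apart from the common factor 1/10), which is what makes a shift-and-add realization of the filter plausible.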

1-D arithmetic module
The proposed architecture contains the two-dimensional forward discrete wavelet transform (2-D FDWT) and the two-dimensional inverse discrete wavelet transform (2-D IDWT). It operates in row-column fashion on the N×N image by using biorthogonal filters (5/3 and 9/7). Such an architecture consists of a row 1-D DWT arithmetic module, a memory unit, and a column 1-D DWT arithmetic module. Note that the forward discrete wavelet transform (FDWT) and the inverse discrete wavelet transform (IDWT) are symmetrical. In the remaining parts of this paper, all the details are discussed in terms of discrete wavelet transform modeling using the 5/3 and 9/7 filters.

1-D DWT arithmetic module for 5/3 filter
The 1-D DWT module of the lifting scheme can be naturally pipelined, as shown in Fig. 3 [13].
Since such a module uses multipliers and adders, another 1-D DWT arithmetic module based on a pipelined shift-integer adder is proposed, as shown in Fig. 4. The pipelined structure reduces the area cost of the architecture compared with the multiplier-based arithmetic module and shortens the worst-case delay path between registers.
The proposed 1-D DWT arithmetic module for the 5/3 filter consists of two stages: the first stage calculates the detail coefficient and the second stage calculates the approximation coefficient, as shown in Fig. 5. Each stage has three input signals and two data output signals.
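The two-stage, multiplier-free structure can be modeled in software as follows. This is a sketch under stated assumptions rather than the circuit of Fig. 5: it uses the widely used reversible integer 5/3 lifting equations (divisions by 2 and 4 done as right shifts, with a rounding offset of 2 in the update), mirrored border samples, and hypothetical function names; the rounding used in the proposed module may differ.

```python
def predict_stage(x_even, x_odd, x_even_next):
    """Stage 1: three inputs, two outputs (pass-through even sample, detail)."""
    detail = x_odd - ((x_even + x_even_next) >> 1)   # shift replaces divide by 2
    return x_even, detail


def update_stage(d_prev, x_even, d_cur):
    """Stage 2: three inputs, two outputs (approximation, pass-through detail)."""
    approx = x_even + ((d_prev + d_cur + 2) >> 2)    # shift replaces divide by 4
    return approx, d_cur


def dwt53_row(x):
    """Integer 5/3 transform of one even-length row using only adds and shifts."""
    even, odd = x[0::2], x[1::2]
    n = len(even)
    details = []
    for i in range(n):
        nxt = even[i + 1] if i + 1 < n else even[i]  # mirror at the right border
        details.append(predict_stage(even[i], odd[i], nxt)[1])
    approx = []
    for i in range(n):
        prev = details[i - 1] if i > 0 else details[0]  # mirror at the left border
        approx.append(update_stage(prev, even[i], details[i])[0])
    return approx, details


def idwt53_row(approx, details):
    """Exact inverse: undo the update, then undo the predict."""
    n = len(approx)
    even = [approx[i] - ((details[max(i - 1, 0)] + details[i] + 2) >> 2)
            for i in range(n)]
    odd = [details[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1)
           for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x


row = [10, 12, 14, 16, 18, 20, 22, 24]
low, high = dwt53_row(row)   # low = [10, 14, 18, 23], high = [0, 0, 0, 2]
```

The detail coefficients of the linear ramp are zero except at the mirrored border, and the integer arithmetic is exactly invertible, so no precision is lost by avoiding multipliers in this filter.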

1-D DWT arithmetic module for 9/7 filter
In this paper, the 9/7 filters are designed as a 4-stage pipelined shift-integer adder without any multipliers, as shown in Fig. 6. Based on equation (9), the 9/7 filter can be designed as two blocks, where each block is a 5/3 filter pipelined shift-integer adder module, with a simple change in block 1, as shown in Fig. 7.
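The idea of building the 9/7 filter from two cascaded predict/update blocks can be sketched as below. The sketch shows the structure only. The coefficients are shift-and-add approximations of the commonly quoted lifting constants of the standard 9/7 filter (about -1.586, -0.053, 0.883 and 0.444), not the coefficients of the proposed t = 1.25 filter, which are not listed here; the final scaling step is omitted, and all names are hypothetical.

```python
def shift_add(x, terms):
    """Scale x by sum(sign * 2**-shift) using only shifts and additions."""
    return sum(sign * (x >> shift) for sign, shift in terms)


# Assumed dyadic approximations of the standard 9/7 lifting constants.
ALPHA = [(-1, 0), (-1, 1), (-1, 4), (-1, 6)]   # about -1.578
BETA = [(-1, 4), (1, 7)]                       # about -0.055
GAMMA = [(1, 0), (-1, 3), (1, 7)]              # about 0.883
DELTA = [(1, 1), (-1, 4)]                      # about 0.438


def lift_block(even, odd, p_terms, u_terms):
    """One 5/3-like block: a predict stage followed by an update stage."""
    n = len(even)
    odd = [odd[i] + shift_add(even[i] + even[min(i + 1, n - 1)], p_terms)
           for i in range(n)]
    even = [even[i] + shift_add(odd[max(i - 1, 0)] + odd[i], u_terms)
            for i in range(n)]
    return even, odd


def unlift_block(even, odd, p_terms, u_terms):
    """Inverse of lift_block: undo the update, then undo the predict."""
    n = len(even)
    even = [even[i] - shift_add(odd[max(i - 1, 0)] + odd[i], u_terms)
            for i in range(n)]
    odd = [odd[i] - shift_add(even[i] + even[min(i + 1, n - 1)], p_terms)
           for i in range(n)]
    return even, odd


def dwt97_structure(x):
    """Four lifting stages arranged as two cascaded blocks."""
    even, odd = x[0::2], x[1::2]
    even, odd = lift_block(even, odd, ALPHA, BETA)     # block 1
    even, odd = lift_block(even, odd, GAMMA, DELTA)    # block 2
    return even, odd


def idwt97_structure(even, odd):
    """Run the two blocks backwards and interleave the samples."""
    even, odd = unlift_block(even, odd, GAMMA, DELTA)
    even, odd = unlift_block(even, odd, ALPHA, BETA)
    x = [0] * (2 * len(even))
    x[0::2], x[1::2] = even, odd
    return x


samples = [12, 5, 40, 33, 7, 21, 99, 64]
low, high = dwt97_structure(samples)
rebuilt = idwt97_structure(low, high)
```

Since every stage only adds a shifted combination of the other channel, the cascade stays exactly invertible on integers whatever shift patterns are chosen, so the coefficient tables can be swapped without touching the block structure.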

The Proposed FPGA Implementation

The proposed 1-level architectures
The proposed architecture performs the 2-D DWT on the input signal (image) in row-column fashion, so three arithmetic modules are required to calculate the lifting scheme. One of these arithmetic modules calculates the 1-D DWT for each row of the image, and the others calculate the 1-D DWT for each column of the image. As shown in Fig. 9, the input data (an N×N image) must be read in raster-scan order (i.e., row by row) from the external memory. The first row of the image is read from the external memory and stored in memory unit 1 (MEM1). After that, the row 1-D arithmetic module of the 5/3 filter (block 1) reads three input signals from MEM1, calculates the approximation and detail coefficients, and stores the results in MEM2. These operations are repeated until the last pixel of the first row. The second row of the image is then stored in MEM1, and the row 1-D arithmetic module of the 5/3 filter performs the same operations to calculate the approximation and detail coefficients of the second row, which are stored in MEM3, as shown in Fig. 9.
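The row-column data flow can be modeled functionally as follows. This is a software sketch, not the buffering scheme of Fig. 9: rows are consumed one at a time in raster order, one row routine plays the role of the row module, and the column routine is applied twice (once to the low half and once to the high half), mirroring the use of three arithmetic modules. The integer 5/3 equations, the sub-band naming and all function names are assumptions of the example.

```python
def dwt53_1d(x):
    """Integer 5/3 lifting on an even-length sequence (adds and shifts only)."""
    even, odd = x[0::2], x[1::2]
    n = len(even)
    detail = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1)
              for i in range(n)]
    approx = [even[i] + ((detail[max(i - 1, 0)] + detail[i] + 2) >> 2)
              for i in range(n)]
    return approx, detail


def transform_columns(half):
    """Apply the 1-D transform down every column of a row-transformed half."""
    pairs = [dwt53_1d(list(col)) for col in zip(*half)]
    low = [list(r) for r in zip(*(p[0] for p in pairs))]
    high = [list(r) for r in zip(*(p[1] for p in pairs))]
    return low, high


def dwt2_raster(image):
    """One-level 2-D DWT of an N x N image (N even), rows read in raster order."""
    low_rows, high_rows = [], []
    for row in image:                       # raster scan: one row at a time
        approx, detail = dwt53_1d(row)      # row module
        low_rows.append(approx)
        high_rows.append(detail)
    ll, lh = transform_columns(low_rows)    # column module on the low half
    hl, hh = transform_columns(high_rows)   # column module on the high half
    return ll, lh, hl, hh


flat = [[7] * 4 for _ in range(4)]
ll, lh, hl, hh = dwt2_raster(flat)
```

For a constant image all the energy ends up in the LL sub-band and the three detail sub-bands are zero, which is a convenient sanity check before moving to real images.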

The proposed multilevel architectures
The proposed 1-level architectures can be easily extended to multilevel architectures by using one of the following two methods.
Method 1: The proposed 1-level architecture is used and the LL sub-band image is stored in an external memory. When the first level finishes, the same processing element (the proposed 1-level architecture) reads the LL sub-band image back. This process requires the external memory, which increases the time delay and the number of external memory accesses, but the die area needed to build this method is small.
Method 2: The proposed 1-level architecture is instantiated n times, where n is the number of levels. In this paper, the architectures are designed via this method to perform the 3-level 2-D DWT. This method needs a larger die area on the FPGA than method 1, but it reduces the external memory access, and the outputs from level 2 and level 3 can be used immediately. The multilevel architecture for method 2 is shown in Fig. 10.
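The chaining of 1-level units in method 2 can be sketched as a loop in which each level receives the LL sub-band of the previous one. To keep the example short, the 1-level unit is passed in as a function, and a simple 2×2 block transform is used as a stand-in for the proposed 5/3 or 9/7 unit; the stand-in and all names are assumptions of the sketch.

```python
def block_level(img):
    """Stand-in 1-level unit: maps an image to four half-size sub-bands."""
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(img), 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for c in range(0, len(img[r]), 2):
            a, b = img[r][c], img[r][c + 1]
            p, q = img[r + 1][c], img[r + 1][c + 1]
            row_ll.append((a + b + p + q) >> 2)   # local average
            row_lh.append((a + b - p - q) >> 1)   # vertical difference
            row_hl.append((a - b + p - q) >> 1)   # horizontal difference
            row_hh.append(a - b - p + q)          # diagonal difference
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def multilevel_dwt(image, levels, one_level):
    """Chain `levels` 1-level units; each one is fed the previous LL sub-band."""
    details = []
    ll = image
    for _ in range(levels):
        ll, lh, hl, hh = one_level(ll)
        details.append((lh, hl, hh))   # available as soon as the level finishes
    return ll, details


image = [[5] * 8 for _ in range(8)]
final_ll, detail_bands = multilevel_dwt(image, 3, block_level)
```

In hardware the three units of method 2 run concurrently, whereas method 1 corresponds to reusing a single unit and writing each LL sub-band back to external memory between passes.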

Simulation Results
In order to determine the number of bits required to represent each signal, two lengths must be determined: the length of the real (integer) part (m1) and the length of the mantissa (fractional) part (m2), as shown in Fig. 11. The number of bits for the mantissa representation turns out to be the critical parameter for the design. In the proposed architectures, a Matlab code is run on a set of standard images to find the minimum values of m1 and m2 that give reasonably high image quality, as shown in Table 1. From Table 1, the real part consists of 8 bits in both the 5/3 and the 9/7 filter architectures, while the mantissa part is 4 bits in the 5/3 filter architectures and 9 bits in the 9/7 filter architectures.

Fig. 10 The proposed multilevel architecture.
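The search for the smallest mantissa length can be sketched as follows. This is not the Matlab code used for Table 1: it quantizes a list of values to a given number of fractional bits, measures the SNR against the unquantized values, and returns the first bit count that meets a target. The truncating quantizer, the SNR definition, the target value and the function names are assumptions of the example.

```python
import math


def quantize(value, frac_bits):
    """Truncate a value to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return math.floor(value * scale) / scale


def snr_db(reference, test):
    """Signal-to-noise ratio in dB between a reference and a test sequence."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def min_frac_bits(values, target_db, max_bits=16):
    """Smallest m2 whose quantization SNR reaches target_db, or None."""
    for m2 in range(max_bits + 1):
        quantized = [quantize(v, m2) for v in values]
        if snr_db(values, quantized) >= target_db:
            return m2
    return None


coeffs = [0.25, 0.5, 0.75]
needed = min_frac_bits(coeffs, target_db=200.0)   # these values need 2 bits
```

In practice the SNR would be measured on the reconstructed image rather than on individual coefficients, and the target would be set by the image quality that is considered acceptable.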

Hardware Implementation
After achieving satisfactory results from the Matlab simulation in the previous section, we proceed to the next stage, where the codes are translated into VHDL. Then, the VHDL codes are synthesized using the synthesis tool, which produces a gate-level architecture for VLSI implementation. Finally, the design is downloaded onto the FPGA board (Spartan-3E) for functional verification. The synthesis results are presented in Table 2.

A Comparative Study
The proposed design can implement a 2-D multilevel lifting-scheme discrete wavelet transform concurrently. It does not use any external memory to store the intermediate results, which avoids the delay caused by memory access. As illustrated in Table 3, the 1-level raster-scan 2-D DWT architecture for the 5/3 filter requires 12 adders and 9 shift registers, without any multiplier. The multilevel (3-level) architecture requires 36 adders and 27 shift registers. For the 9/7 architectures, the 1-level raster-scan 2-D DWT requires 36 adders and 54 shift registers, while the multilevel architecture requires 108 adders and 162 shift registers.

Table 3: The total number of adders and shift registers required for the design of the proposed architectures.
It should be noted that the direct structure uses 4 multipliers and 8 adders [12], while the flipping structure reduces the critical path by releasing the major computation path, without any hardware overhead [10]. The internal memory size in both the direct architecture and the flipping architecture is equal to the image size (N×N). The proposed architectures need an internal memory smaller than the image size, as shown in Table 4. This reduces the complexity of the system because the proposed architectures do not need a memory control unit.

Conclusions
This paper has presented high-performance, low-memory raster-scan architectures for the 2-D lifting-scheme DWT with the 5/3 and 9/7 filters, obtained by merging the predict and update steps into a single step. The proposed architectures use 3 arithmetic modules in the 1-level architectures and 9 arithmetic modules in the multilevel architectures, without any multipliers. The proposed scheme depends on lifting scheme theory because it is a desirable scheme from the point of view of hardware performance and high throughput. The proposed 1-level and multilevel 2-D DWTs have been designed, implemented and tested. The comparison shows that the proposed architectures are also preferable from the point of view of chip utilization.

Table 1: The most proper SNR values to determine the total number of bits.