First Edition
U. Chuks 6/1/2010
Copyright © 2010 by U. Chuks
Cover design by U. Chuks Book design by U. Chuks
All rights reserved.
No part of this book may be reproduced in any form or by any electronic or mechanical means including information storage and retrieval systems, without permission in writing from the author. The only exception is by a reviewer, who may quote short excerpts in a review.
U. Chuks Visit my page at http://www.lulu.com/spotlight/Debarge
Contents
Preface
Chapter 1 Introduction
1.1 Overview of Digital Image Processing
1.1.1 Application Areas
1.2 Digital Image Filtering
1.2.1 Frequency Domain
1.2.2 Spatial Domain
1.3 VHDL Development Environment
1.3.1 Creating a new project in ModelSim
1.3.2 Creating a new project in Xilinx ISE
1.3.3 Image file data in VHDL image processing
1.3.4 Notes on VHDL for Image Processing
References
Chapter 2 Spatial Filter Hardware Architectures
2.1 Linear Filter Architectures
2.1.1 Generic Filter architecture
2.1.2 Separable Filter architecture
2.1.3 Symmetric Filter Kernel architecture
2.1.4 Quadrant Symmetric Filter architecture
2.2 Non-linear Filter Architectures
Summary
References
Chapter 3 Image Reconstruction
3.1 Image Demosaicking
3.2 VHDL implementation
3.2.1 Image Selection
Summary
References
Chapter 4 Image Enhancement
4.1 Point-based Enhancement
4.1.1 Logarithm Transform
4.1.2 Gamma Correction
4.1.3 Histogram Clipping
4.2 Local/neighbourhood enhancement
4.2.1 Unsharp Masking
4.2.2 Logarithmic local adaptive enhancement
4.3 Global/Frequency Domain Enhancement
4.3.1 Homomorphic filter
4.4 VHDL implementation
Summary
References
Chapter 5 Image Edge Detection and Smoothing
5.1 Image edge detection kernels
5.1.1 Sobel edge filter
5.1.2 Prewitt edge filter
5.1.3 High Pass Filter
5.2 Image Smoothing Filters
5.2.1 Mean/Averaging filter
5.2.2 Gaussian Lowpass filter
Summary
References
Chapter 6 Colour Image Conversion
6.1 Additive colour spaces
6.2 Subtractive Colour spaces
6.3 Video Colour spaces
6.4 Non-linear/non-trivial colour spaces
Summary
References
Circuit Schematics
Creating Projects/Files in VHDL Environment
VHDL Code
Index
Preface
The relative dearth of books regarding the know-how involved in implementing several algorithms in hardware was the motivating factor in writing this book, which was written for those with a prior understanding of image processing fundamentals who may or may not be familiar with programming environments such as MATLAB and VHDL. Thus, the subject is addressed very early on, bypassing the fundamental theories of image processing, which are better covered in several contemporary books given in the references sections in the chapters of this book.
By delving into the architectural design and implications of the chosen algorithms, the user is familiarized with the necessary tools to realize an algorithm from theory to software to designing hardware architectures.
Though the book does not discuss the vast theoretical mathematical processes underlying image processing, it is hoped that, by providing working examples of actual VHDL and MATLAB code and simulation results of the software, the concepts of practical image processing can be appreciated. This first edition attempts to provide a working aid to readers who wish to use the VHDL hardware description language for implementing image processing algorithms from software.
Chapter 1
Introduction
Digital image processing is an extremely broad and ever-expanding discipline as more applications, techniques and products utilize digital image capture in some form or other. From industrial processes like manufacturing to consumer devices like video games and cameras, image processing chips and algorithms have become ubiquitous in everyday life.
1.1 Overview of Digital Image Processing
Image processing can be performed in certain domains using:
Point (pixel-by-pixel) processing operations.
Local/neighbourhood/window mask operations.
Global processing operations.
A list of the areas of digital image processing includes but is not limited to:
Image Acquisition and Reconstruction
Image Enhancement
Image Restoration
Geometric Transformations and Image Registration
Colour Image Processing
Image Compression
Morphological Image Processing
Image Segmentation
Object and Pattern Recognition
For the purposes of this book we shall focus on the areas of Image Reconstruction, Enhancement and Colour Image Processing and the VHDL implementation of selected algorithms from these areas.
1.1.1 Application Areas
Image Reconstruction and Enhancement techniques are used in digital cameras, photography, TV and computer vision chips. Colour Image and Video Enhancement is used in digital video, photography, medical imaging, remote sensing and forensic investigation. Colour Image processing involves colour segmentation, detection, recognition and feature extraction.
1.2 Digital Image Filtering
Digital image filtering is a very powerful and vital area of image processing. Convolution is the fundamental mathematical operation that underpins the process, which makes filtering one of the most important and most studied topics in digital signal and image processing. Digital image filtering can be performed in the Frequency, Spatial or Wavelet domain. Operating in any of these domains requires a domain transformation, that is, changing the representation of a signal or image into a form in which it is easier to visualize and/or modify the particular aspect of the signal one wishes to analyze, observe or improve upon.
1.2.1 Frequency Domain
Filtering in the frequency domain involves transforming an image into a representation of its spectral components and then using a frequency filter to modify the image by passing particular frequency components and suppressing or eliminating unwanted ones. This frequency transform can be the Fourier Transform or the Cosine Transform; other frequency transforms exist in the literature, but these are the most popular. The (Discrete) Fourier Transform (DFT) is another core component in digital image processing and signal analysis. The transform is built on the premise that complex signals can be formed by combining fundamental, basic signals spectrally. For a discrete image function, f(x, y), of M × N dimensions with spatial coordinates x and y, the DFT is given as:
$$F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j 2\pi (ux/M + vy/N)}$$ (1.2.1-1)
And its inverse transform back to the spatial domain is:
$$f(x, y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v)\, e^{j 2\pi (ux/M + vy/N)}$$ (1.2.1-2)
where F(u, v) is the discrete image function in the frequency domain with frequency coordinates u and v, and j is the imaginary unit. The basic steps involved in frequency domain processing are shown in Figure 1.2.1(i).
[Block diagram: Pre-processing -> Fourier Transform -> Frequency Domain Filter -> Inverse Fourier Transform -> Post-processing]
Figure 1.2.1(i) - Fundamental steps of frequency domain filtering
The frequency domain is more intuitive due to the transformation of the spatial image information to frequency-dependent information. The frequency transformation makes it easier to analyze image features across a range of frequencies. Figure 1.2.1(ii) illustrates the frequency transformation of the spatial information inherent in an image.
Figure 1.2.1(ii) – (a) Image in spatial domain (b) Image in frequency domain
1.2.2 Spatial Domain
Spatial domain processing operates on signals in two dimensional space or higher, e.g. grayscale, colour and MRI images. Spatial domain image processing can be point-based, neighbourhood/kernel/mask or global processing operations. Spatial domain mask filtering involves convolving a small spatial filter kernel or mask around a local region of the image, performing the task repeatedly until the entire image is processed. Linear spatial filtering processes each pixel as a linear combination of the surrounding, adjacent neighbourhood pixels, while non-linear spatial filtering uses statistical, set theory or logical if-else operations to process each pixel in an image. Examples include the median and variance filters used in image restoration. Figure 1.2.2(i) shows the basics of spatial domain processing, where I_i(x, y) is the input image and I_o(x, y) is the processed output image.
[Block diagram: I_i(x, y) -> Preprocessing -> Filter Function -> Postprocessing -> I_o(x, y)]
Figure 1.2.2(i) - Basic steps in spatial domain filtering
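As a compact statement of the linear neighbourhood operation described above, the filter function for a linear spatial filter can be written as a weighted sum over a small window. This is a sketch in notation introduced here for illustration: the kernel w(s, t) and the half-widths a and b are assumed symbols, not taken from the figure.

```latex
I_o(x, y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t)\, I_i(x - s,\; y - t)
```

For a 3 x 3 kernel, a = b = 1, so each output pixel requires nine multiplications and eight additions. This fixed, small amount of work per pixel is part of what makes the operation attractive for hardware.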
Spatial domain filtering is highly favoured in hardware image processing filtering implementations due to the practical feasibility of employing it in real-time industrial processes. Figure 1.2.2(ii) shows the plots of a frequency response of the filter and the spatial domain equivalent for high and low pass filters.
Figure 1.2.2(ii) – Low-pass filter in the (a) frequency domain (b) spatial domain and High-pass filter in the (c) frequency domain (d) spatial domain
This gives an idea of the span of the spatial domain filter kernels relative to their frequency domain counterpart. Since a lot of the algorithms in this book involve spatial domain filtering techniques and their implementation in hardware description languages (HDLs), emphasis will be placed on spatial domain processing throughout the book.
1.3 VHDL Development Environment
VHDL is one of the languages for describing the behaviour of digital hardware devices and highly complex circuits such as FPGAs, ASICs and CPLDs. In other words, it is a hardware description language (HDL); the other commonly used HDL is Verilog, while VHDL itself borrows much of its syntax from the Ada programming language. VHDL is preferred here because it is an open, standardized language with a lot of user input and support helping to improve and develop the language further. There have been several revisions of VHDL since its inception in the 1980s, each with slightly different syntax rules. Tools for hardware development with VHDL include popular software such as ModelSim for simulation, and the Xilinx ISE tools and Leonardo Spectrum for complete circuit design and development. With software environments like MathWorks MATLAB and Microsoft Visual Studio, image processing algorithms and theory can now be much more easily implemented and verified in software before being rolled out into physical, digital hardware. For the purposes of this book, we will be using the Xilinx software and the ModelSim version for Xilinx devices.
1.3.1 Creating a new project in ModelSim
Before proceeding, ModelSim software from Mentor Graphics must be installed and enabled. A free version of ModelSim can be downloaded from sites such as the Xilinx website. The one used for this example is a much earlier version of ModelSim (version 6.0a) tailored for Xilinx devices. Once ModelSim is installed, run it and a window like the one in Figure 1.3.1(i) should appear.
Figure 1.3.1(i) – ModelSim starting window
Close the welcome page, click on File and select New -> Project as shown in Figure 1.3.1(ii). Click on the Project option and a dialog box appears as shown in Figure 1.3.1(iii). You can then enter the project name. However, we first select an appropriate location to store all project files in order to keep the work folder organized. Thus, click on Browse and the dialog box shown in Figure 1.3.1(iv) appears. Now we can navigate to an appropriate folder or create one if it doesn't exist. In this case, a previously created folder called 'colour space converters' is used to store the project files. Clicking 'OK' returns us to the 'Create a New Project' dialog box, where we name the project 'Colour space converters' and click 'OK'.
Figure 1.3.1(ii) – Creating a new project in ModelSim
A small window appears for us to add a new or existing file as shown in Appendix B, Figure B1. Since we would like to add a new file for illustrative purposes, we create a file called 'example_file' as in Figure B3 and it appears in the workspace on the left hand side as depicted in Figure B4. Then we add existing files by clicking 'Add Existing File', navigating to the relevant files and selecting them as shown in Figure B5. They now appear alongside the newly created file as shown in Figure B6. The rest of the process is easy to follow. For further instruction on doing this, refer to Appendix B or the Xilinx sources listed at the end of the chapter. Now these files can be compiled before simulation as shown in the subsequent figures. Successful compilation is indicated by messages in green, while failed compilation messages are in red and indicate the errors and their locations, as in most debugging editors for software development. Any errors are located and corrected and the files recompiled until there are no more syntax errors.
Figure 1.3.1(iii) – Creating a new project
Once there are no more errors, the simulation of the files can begin. Clicking on the simulation tab will open up a window to select the files to be simulated. However, you must create a test bench file for simulation before running any simulation. A test bench file is simply a test file to evaluate your designed system to verify its correct functionality. You can choose to add several more windows to view the ports and signals in your design.
Figure 1.3.1(iv) – Changing directory for new project
The newly created file is empty upon inspection, so we have to add some code to it. We start by including the standard IEEE libraries needed, as shown in Figure 1.3.1(v), at the top of the blank file.
    library IEEE;
    use IEEE.std_logic_1164.all;
    use IEEE.std_logic_arith.all;
Figure 1.3.1(v) – Adding libraries
The “IEEE.std_logic_1164” and “IEEE.std_logic_arith” packages provide the standard logic types and arithmetic on those types, respectively, and are the minimum needed for the designs in this book. Note that std_logic_arith is a vendor (Synopsys) package that most tools compile into the IEEE library; IEEE.numeric_std is the IEEE-standardized alternative. With that done, the next step is to add the architecture of the system we would like to describe in this example file. The block diagram for the design we are going to implement in VHDL is shown in Figure 1.3.1(vi).
[Block diagram: block example_file with inputs clk, rst and input_port, and output output_port]
Figure 1.3.1(vi) – Top level system description of example_file
This leads to the top level architecture description in VHDL code shown in Figure 1.3.1(vii).
    ----TOP SYSTEM LEVEL DESCRIPTION----
    entity example_file is
      port ( ---the collection of all input and output ports in top level
        Clk         : in  std_logic; ---clock for synchronization
        rst         : in  std_logic; ---reset signals for new data
        input_port  : in  bit;       ---input port
        output_port : out bit        ---output port
      );
    end example_file;
Figure 1.3.1(vii) – VHDL code for black box description of example_file
The code in Figure 1.3.1(vii) is the textual or code description of the black box diagram shown in Figure 1.3.1(vi). The next step is to detail the actual operation of the system and the relationship between the input and output ports. This operation is shown in the VHDL code in Figure 1.3.1(viii).
    ---architecture and behaviour of TOP SYSTEM LEVEL DESCRIPTION in more detail
    architecture behaviour of example_file is
      ---list signals which connect input to output ports here
      ---for example
      signal intermediate_port : bit := '0'; --initialize to zero
    begin ---start
      process(clk, rst) --process which is triggered by clock or reset pin
      begin
        if rst = '0' then --reset all output ports (reset is active low)
          intermediate_port <= '0'; --initialize
          output_port <= '0'; --initialize
        elsif clk'event and clk = '1' then --operate on rising edge of clock
          intermediate_port <= not(input_port); --logical inverter
          output_port <= intermediate_port or input_port; --logical or operation
        end if;
      end process; --self-explanatory
    end behaviour; --end of architectural behaviour
Figure 1.3.1(viii) – VHDL code for operation of example_file
The first line of code in Figure 1.3.1(viii) defines the beginning of the behavioural level of the architecture. The next line defines a signal or wire that will be used in connecting the input port to the output port. It has been defined as a single bit and initialized to zero. The next line indicates the beginning of a triggered process that responds to both the clock and reset signals. The if…then…elsif…then statements indicate what actions and statements to trigger when the stated conditions are met. The actual logical operation starts at the rising edge of the clock: the signal takes the inverted value of the input port, while the output port takes the logical 'or' of the signal and the input port. Because signal assignments inside a clocked process only take effect when the process suspends, the 'or' uses the value the signal held before the clock edge, that is, the inverted input from the previous cycle. Though this is an elaborate circuit design for a simple inverter-based operation, it was added to illustrate several aspects that will be recurring themes throughout the work discussed in the book.
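Since a test bench is required before any simulation can be run, a minimal sketch of one for example_file is given here. It is an illustration rather than the test bench used in the book: the entity name example_file_tb, the 10 ns clock period and the stimulus sequence are all assumptions, and it presumes example_file has been compiled into the default work library.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- A test bench has no ports: it only wraps the design under test.
entity example_file_tb is
end example_file_tb;

architecture sim of example_file_tb is
  constant CLK_PERIOD : time := 10 ns;    -- assumed clock period

  signal clk         : std_logic := '0';
  signal rst         : std_logic := '0';  -- reset is active low in example_file
  signal input_port  : bit := '0';
  signal output_port : bit;
  signal done        : boolean := false;  -- stops the clock at the end
begin
  -- Instantiate the design under test (direct entity instantiation, VHDL-93)
  dut : entity work.example_file
    port map (
      clk         => clk,
      rst         => rst,
      input_port  => input_port,
      output_port => output_port
    );

  -- Free-running clock, halted once the stimulus has finished
  clk_gen : process
  begin
    while not done loop
      clk <= '0';
      wait for CLK_PERIOD / 2;
      clk <= '1';
      wait for CLK_PERIOD / 2;
    end loop;
    wait;
  end process;

  stimulus : process
  begin
    rst <= '0';                  -- hold the design in reset
    wait for 2 * CLK_PERIOD;
    rst <= '1';                  -- release reset

    input_port <= '1';
    wait for 2 * CLK_PERIOD;
    input_port <= '0';
    wait for 2 * CLK_PERIOD;

    -- With the input held at '0' for two rising edges, intermediate_port
    -- has become '1', so output_port = '1' or '0' = '1'.
    assert output_port = '1'
      report "unexpected output_port value" severity error;

    done <= true;
    wait;
  end process;
end sim;
```

Compiling this file alongside example_file and simulating example_file_tb lets the clk, rst, input_port and output_port signals be added to the wave window. The one-cycle delay between intermediate_port and output_port is then visible directly.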
1.3.2 Creating a new project in Xilinx ISE
Like the ModelSim software, software for evaluating VHDL designs in FPGA devices can be downloaded for free, for example Leonardo Spectrum for Altera and Actel FPGAs or the Xilinx Project Navigator software from Xilinx. The Xilinx ISE version used in this book is 7.1. Once the software has been fully installed, opening the program brings up a welcome screen, just as when we launched ModelSim. Creating a project in the Xilinx ISE is similar to the process in ModelSim; however, one has to select the specific FPGA device onto which the design is to be loaded. This is because the design must be mapped onto a physical device, and the ISE software comprises specialized, complicated algorithms that emulate the actual hardware device to ensure that the design is safe and error-free before being downloaded to an actual device. This avoids costly errors and damage to the device from incorrectly routed pins when designing for large and expensive devices like ASICs. A brief introduction to creating a project in Xilinx is shown in Figures 1.3.2(i) – 1.3.2(iv).
Figure 1.3.2(i) – Opening the Xilinx Project Navigator
We then click 'OK' on the welcome dialog box to access the project workspace. Then click on File, select New Project as shown in Figure 1.3.2(ii) and enter a new name for the project as shown in Figure 1.3.2(iii). Then click 'Next' and the next window, shown in Figure 1.3.2(iv), prompts you to select the FPGA hardware device family your final design is going to be implemented in. We select the Xilinx Spartan 3 FPGA chip, which is indicated by the chip number xc3s200, with package ft256 and speed grade -4. This device will be referred to as 3s200ft256-4 in the Project Navigator. We leave all the other options as they are, since we will be using the ModelSim simulator and the VHDL language for most of the work and only implementing the final design after correct simulation and verification. Depending on the device you are implementing your design on, the device family name will be different. However, the price of using the free software is that you do not have access to every FPGA device in every device family in the software's database, and for unsupported devices you will not be able to generate a programming file to be downloaded to an actual FPGA. The design process from theoretical algorithm description to circuit development and flashing to an FPGA device is a non-linear exercise, as the design may need to be optimized and/or modified depending on the design constraints of the project.
Figure 1.3.2(ii) – Creating a new project in Xilinx Project Navigator
Figure 1.3.2(iii) – Creating a new project name
Figure 1.3.2(iv) – Selecting a Xilinx FPGA target device
Clicking Next to the next set of options allows you to add HDL source files, similar to ModelSim. The user can add them from here or just click through to create the project and then add the files manually like in ModelSim.
1.3.3 Image file data in VHDL image processing
Figure 1.3.3 shows an image in the form of a text file, which will be read using the textio package in VHDL. A software program was written to convert image files to text in order to process them. The images can be converted to any numerical type, including binary or hexadecimal (to save space). Integers were chosen for easy readability and debugging and for illustration of the concepts. After processing, another software program is used to convert the text files back to images to be viewed. Writing MATLAB code is the easiest and quickest way of doing this when working with VHDL. MATLAB also enables fast and easy prototyping of algorithms without re-inventing the wheel and being forced to write each and every function needed to perform standard operations, especially image processing algorithms. This was why it was chosen over the .NET environment.
Coding in VHDL is a very different experience from coding in MATLAB, C++ or JAVA, since it describes hardware circuits, which have to be designed as circuits rather than simply as software programs. VHDL makes it much easier to describe highly complex circuits that would be impractical to design with basic logic gates, and the synthesis tool infers the underlying logic from the operations you describe in the code. In a sense, it is similar to the Unified Modeling Language (UML) used to design and model large and complex object-oriented software algorithms and systems in software engineering. SIMULINK in MATLAB is also similar to this, and new tools have been developed to allow designers with little to no knowledge of VHDL to work with MATLAB and VHDL code. However, the costs of these tools are quite prohibitive for the average designer with a small budget. FPGA system development requires a reasonable amount of financial investment, and the actual prototype hardware chip cost can be quite considerable in addition to the software tools needed to support the hardware.
Thus, with these free tools and a little time spent on learning VHDL, designing new systems becomes much more fulfilling and gives the coder the chance to really learn how the code and the system they are trying to build is going to work on a macro and micro level. Also, extensive periods spent debugging VHDL code will definitely make the coder a much better programmer because of the experience.
Figure 1.3.3 – image as a text file to be read into VHDL testbench
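To make the textio step concrete, the following is a minimal, simulation-only sketch of a test bench process that reads such a text file. It assumes one integer pixel value per line and 8-bit pixels. The file name image_in.txt, the entity name image_reader_tb and the signal names are hypothetical, and the actual file layout used in the book may differ.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use std.textio.all;

entity image_reader_tb is
end image_reader_tb;

architecture sim of image_reader_tb is
  -- Hypothetical file name; assumes one integer pixel value per line
  constant IMAGE_FILE : string := "image_in.txt";

  signal clk         : std_logic := '0';
  signal pixel       : integer range 0 to 255 := 0;  -- assumes 8-bit pixels
  signal pixel_valid : std_logic := '0';
begin
  read_pixels : process
    file     img_file : text;
    variable img_line : line;
    variable pix      : integer;
    variable ok       : boolean;
  begin
    file_open(img_file, IMAGE_FILE, read_mode);

    while not endfile(img_file) loop
      readline(img_file, img_line);
      read(img_line, pix, ok);     -- ok is false if no integer could be read
      if ok then
        -- Present one pixel per clock cycle; a value outside 0..255
        -- would raise a range error in the simulator.
        pixel       <= pix;
        pixel_valid <= '1';
        clk <= '0'; wait for 5 ns;
        clk <= '1'; wait for 5 ns;
      end if;
    end loop;

    pixel_valid <= '0';
    file_close(img_file);
    wait;                          -- end of stimulus
  end process;
end sim;
```

File access of this kind is for simulation only and is not synthesizable. In a complete test bench, pixel and pixel_valid would drive the design under test, and a matching process would write the results to an output text file using writeline and write.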
1.3.4 Notes on VHDL for Image Processing
Most users of this book have probably had some exposure to programming or have at least heard of programming languages and packages like C++, JAVA, C, C#, Visual Basic, MATLAB, etc. But fewer people are aware of HDLs like VHDL and Verilog, which make it much easier to design larger and more complex circuits for digital hardware chips like ASICs, FPGAs and CPLDs used in highly sophisticated systems and devices.
When using high-level languages like C# and MATLAB, writing programs to perform mathematical tasks and operations is much easier, and users can make use of existing libraries to build larger scale systems that perform more complex mathematical computations without thinking much about them. However, with languages like VHDL, performing certain mathematical computations like statistical calculations or even divisions requires careful system design and planning if the end product is to be a fully synthesizable circuit for downloading to an FPGA. In other words, floating point calculation in VHDL for FPGAs is a painful and difficult task for the uninitiated and for those without developer and design resources. Some hardware vendors have developed their own specialized floating point cores, but these come at a premium cost and are not for the average hardware design hobbyist. Floating point calculations take up a lot of system resources, as do operations like division, especially when the divisor is not a power of 2. Thus, most experienced designers prefer to work with fixed-point mathematical calculations.
For example, if we choose to write a program to calculate the logarithm, cosine or exponential of signal values, this is usually taken care of in a software implementation by calling a log, cosine or exponential function from the inbuilt library without even being aware of the algorithm behind the function. This is not the case with VHDL or hardware implementation. Though VHDL has libraries for all these non-linear functions, it is vital to note that the freely available functions are not synthesizable. This means that they cannot be realized in digital hardware, and thus hardware design engineers must devise efficient architectures for these algorithms or purchase hardware IP cores developed by FPGA vendors before they can implement them on an FPGA.
The first obvious route to building these types of functions is to create a look-up-table (LUT) consisting of pre-calculated entries in addressable memory (ROM), which can then be accessed for a defined range of values. However, the size of the LUT can expand to unmanageable proportions and render the entire system inefficient, cumbersome and wasteful. A better approach would involve a mixture of some pre-computed values and the calculation of other values to reduce the memory size and increase efficiency. The LUT is thus a constantly recurring theme in hardware design for systems that perform intensive mathematical computation and signal processing. Usually, when a non-linear component is an essential part of an algorithm, the LUT becomes one way of implementing that crucial part of the algorithm; otherwise an alternative algorithm may have to be devised in accordance with error trade-off curves. This is the standard theme of research papers and journals on digital logic circuits.
Newer and more expensive FPGAs now have a processor core built into them, giving the designer the flexibility of apportioning soft computing tasks to the processor on the FPGA while devoting more appropriate device resources to architectural demands. However, the other challenge of real-time reconfigurable computing, and of linking the processor and the logic fabric of the system to work in tandem, then comes into play.
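The look-up-table approach described above can be sketched as a small ROM. The example below is an illustration under stated assumptions, not a design from the book: the entity name log2_lut, the 4-bit input, the 6-bit output and the scaling of round(16 * log2(n)) are all chosen here to keep the table short. It uses IEEE.numeric_std for the type conversions rather than std_logic_arith.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity log2_lut is
  port (
    clk  : in  std_logic;
    addr : in  std_logic_vector(3 downto 0);  -- input value n, 0 to 15
    data : out std_logic_vector(5 downto 0)   -- round(16 * log2(n))
  );
end log2_lut;

architecture rtl of log2_lut is
  type rom_type is array (0 to 15) of integer range 0 to 63;

  -- Pre-computed entries: round(16 * log2(n)) for n = 1 to 15.
  -- Entry 0 is set to 0 by convention, since log2(0) is undefined.
  constant ROM : rom_type := (
     0,  0, 16, 25, 32, 37, 41, 45,
    48, 51, 53, 55, 57, 59, 61, 63
  );
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Registered read: the result appears one clock cycle after addr
      data <= std_logic_vector(
                to_unsigned(ROM(to_integer(unsigned(addr))), data'length));
    end if;
  end process;
end rtl;
```

With a 4-bit address the table has only 16 entries, but each extra input bit doubles its size, which is the growth problem noted above. An 8-bit pixel input would already need 256 entries, and wider fixed-point inputs quickly become impractical without combining the table with some computation.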
Most of the images used in this book are well known in the image processing community and were obtained from the University of Southern California Signal and Image Processing Institute website, with others taken from relevant research papers and online repositories.
References
[1] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2 ed.: Prentice Hall, 2002.
[2] R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image Processing Using MATLAB: Prentice Hall, 2004.
[3] W. K. Pratt, Digital Image Processing, 4 ed.: Wiley-Interscience, 2007.
[4] U. Nnolim, "FPGA Architectures for Logarithmic Colour Image Processing", Ph.D. thesis, University of Kent at Canterbury, Canterbury-Kent, 2009.
[5] MathWorks, "Image Processing Toolbox 6 User's Guide for use with MATLAB," The Mathworks, 2008, pp. 285 - 288.
[6] Mathworks, "Designing Linear Filters in the Frequency Domain," in Image Processing Toolbox for use with MATLAB, T. Mathworks, Ed.: The Mathworks, 2008.
[7] Mathworks, "Filter Design Toolbox 4.5," 2009.
[8] Weber, "The USC-SIPI Image Database," University of Southern California Signal and Image Processing Institute (USC-SIPI), 1981.
[9] Zuloaga, J. L. Martín, U. Bidarte, and J. A. Ezquerra, "VHDL test bench for digital image processing systems using a new image format."
[10] Cyliax, "The FPGA Tour: Learning the ropes," in Circuit Cellar online, 1999.
[11] T. Johnston, K. T. Gribbon, and D. G. Bailey, "Implementing Image Processing Algorithms on FPGAs," in Proceedings of the Eleventh Electronics New Zealand Conference (ENZCon'04), Palmerston North, 2004, pp. 118 - 123.
[12] EETimes, "PLDs/FPGAs," 2009.
[13] Digilent, "http://www.digilentinc.com," 2009.
[14] E. R. Davies, Machine Vision: Theory, Algorithms, Practicalities, 3rd ed.: Morgan Kaufmann Publishers, 2005.
[15] Xilinx, "XST User Guide": http://www.xilinx.com, 2008.
[16] www.xilinx.com, "FPGA Design Flow Overview (ISE Help)." vol. 2008: Xilinx, 2005.
24
Chapter 2
Spatial Filter Hardware Architectures
Prior to the implementation of the various filters, it is necessary to lay the groundwork for the design of spatial filter hardware architectures in VHDL.
2.1 Linear Filter Architectures
Using spatial filter kernels for image filtering in hardware systems is a standard route for many hardware design engineers. As a result, various spatial domain architectures appear in company technical reports, academic journals and conference papers dedicated to FPGA-based image processing. This is not surprising given the myriad of image processing applications that incorporate image filtering techniques. Such applications include, but are not limited to, image contrast enhancement/sharpening, demosaicking, restoration/noise removal/deblurring, edge detection, pattern recognition, segmentation and inpainting. Several authors have published papers on spatial filtering hardware architectures for FPGA platforms, which either perform these tasks directly or serve as add-ons to more complex and sophisticated processing operations.
A sample of application areas in industrial processes includes the detection of structural defects in manufactured products, where real-time imaging and edge detection techniques are used to remove damaged products from the assembly line. Though frequency (Fourier Transform) domain filtering may be faster for larger images and optical processes, spatial filtering with relatively small kernels makes several of these processes feasible for physical, real-time applications and reduces computational costs and resources in FPGA digital hardware systems. Figure 2.1(i) shows one of the essential components of a spatial domain filter: a window generator for a 5×5 kernel, used to evaluate the local region of the image.
Line In 1 -> FF -> FF -> FF -> FF -> FF -> Line Out 1
Line In 2 -> FF -> FF -> FF -> FF -> FF -> Line Out 2
Line In 3 -> FF -> FF -> FF -> FF -> FF -> Line Out 3
Line In 4 -> FF -> FF -> FF -> FF -> FF -> Line Out 4
Line In 5 -> FF -> FF -> FF -> FF -> FF -> Line Out 5
Figure 2.1(i) – 5×5 window generator hardware architecture
The boxes represent the flip flops (FF) or delay elements, with each box providing one delay. In digital signal processing notation, a flip flop is represented in the z-domain by $z^{-1}$ and in the discrete time domain as $x[n-1]$, where x would be the delayed signal. The data comes in from the left hand side of the unit and each line is delayed by 5 cycles. For a 3×3 kernel, there would be three lines and each would be delayed by 3 cycles. Figure 2.1(ii) shows the line buffer array unit, which consists of long shift registers composed of several flip flops. Each line buffer is set to the length of one row of the image. Thus, for a 128×128 greyscale image with 8 bits per pixel, each line buffer would be 128 pixels wide and 8 bits deep.
[Figure 2.1(ii) diagram: Data_in; Line Buffer1 to Line Buffer5; outputs Line out1 to Line out5]
Figure 2.1(ii) – Line buffer array hardware architecture
The rest of the architecture would include adders, dividers, and multipliers or look-up tables. These are not shown as they are much easier to understand and implement. The main components of the spatial domain architectures are the window generator and line delay elements. The delay elements can be built from first-in first-out (FIFO) or shift register components for the line buffers.
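The interaction between the line buffers and the window generator can be checked with a small behavioural model before any hardware is described. The following Python sketch is an illustration only, not the VHDL design: the class name WindowGenerator, its clock method, the use of one line buffer per window row (following the five buffers drawn in Figure 2.1(ii)) and the zero initial state are all assumptions made here.

```python
from collections import deque


class WindowGenerator:
    """Behavioural model of cascaded line buffers feeding a k x k flip-flop grid."""

    def __init__(self, row_length, k):
        # k cascaded line buffers, each one image row long, initially zero.
        self.line_buffers = [
            deque([0] * row_length, maxlen=row_length) for _ in range(k)
        ]
        # k rows of k flip flops (the window grid), initially zero.
        self.window = [deque([0] * k, maxlen=k) for _ in range(k)]

    def clock(self, pixel):
        """Shift one pixel in and return the current window as a list of rows."""
        value = pixel
        for line_buffer, taps in zip(self.line_buffers, self.window):
            delayed = line_buffer[0]    # oldest sample leaves the line buffer
            line_buffer.append(value)   # newest sample enters, oldest is dropped
            taps.append(delayed)        # line output shifts into its flip-flop row
            value = delayed             # and also feeds the next line buffer
        return [list(taps) for taps in self.window]


# Stream the first 21 pixels of an image that is 6 pixels wide, numbered 1, 2, 3, ...
generator = WindowGenerator(row_length=6, k=3)
for pixel in range(1, 22):
    window = generator.clock(pixel)
# window == [[13, 14, 15], [7, 8, 9], [1, 2, 3]]
```

In this model the first window row comes from the first line buffer and so holds the most recent image row, while the last window row holds the oldest. The three rows 1 to 3, 7 to 9 and 13 to 15 are the top-left 3×3 neighbourhood of the 6-pixel-wide image, which is the local region the filter would evaluate.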
The architecture of the processing elements is heavily determined by the mathematical properties of the filter kernels. For instance, the symmetric or separable nature of certain kernels is exploited in the hardware design to reduce multiply-accumulate operations. There are mainly three kinds of filter kernels, namely symmetric, separable-symmetric, and non-separable, non-symmetric kernels. To understand the need for this distinction, it is necessary to discuss the growth in mathematical operations of image processing algorithms implemented in digital hardware.
2.1.1 Generic Filter architecture
In the standard spatial filter architecture, the filter kernel is implemented exactly as defined: each coefficient of the kernel has its own dedicated multiplier and corresponding image window value. This architecture is therefore flexible for a given kernel size, and any combination of coefficient values can be loaded into it without modifying the architecture in any way. However, it is inefficient when a set of coefficients in the filter have the same value, and the redundancy grows as the number of matching coefficients increases. It also becomes computationally complex as the filter kernel size increases, since more processing elements are needed to perform the full operation on a similarly sized image window. The utility of the filter is limited to small kernel sizes ranging from 3×3 to about 9×9. Beyond this, the definition and instantiation of the architecture and its coefficients become unwieldy, especially in the hardware description languages used to program the hardware devices. Figure 2.1.1 depicts an example of a generic 5×5 filter kernel architecture.
[Figure 2.1.1 diagram: Data_in feeds a chain of line buffers and a 5×5 grid of flip flops (FF); each flip flop output is multiplied (×) by one of the coefficients c0 to c24, and each row of five products is summed (∑) to a Data_out]
Figure 2.1.1 – Generic 5×5 spatial filter hardware architecture
The 25 filter coefficients range from c0 to c24 and are multiplied with the values stored in the window generator grid made up of flip flops (FF). These coefficients are weights, which determine the extent of the contribution of the image pixels to the final convolution output. The partial products are then summed in the adder blocks. Not shown in the diagram is another adder block that sums the five sums of products. The final sum is divided by a constant value, which is usually chosen to be a power of 2 so that the division reduces to a bit shift, as is good digital design practice.
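The data path just described (one multiplier per coefficient, one adder per row, a final adder and a power-of-2 division) can be sketched as a behavioural model. This is a minimal illustration under assumptions made here: the function name generic_filter_output, the integer binomial kernel built from the taps 1, 4, 6, 4, 1 (whose coefficients sum to 256) and the shift of 8 bits are example choices, not values taken from the text.

```python
def generic_filter_output(window, coefficients, shift):
    """One output pixel of a generic K x K filter.

    window, coefficients: K x K lists of integers.
    shift: the final sum is divided by 2**shift.
    """
    row_sums = []
    for window_row, coefficient_row in zip(window, coefficients):
        # One dedicated multiplier per coefficient, then one adder per row.
        products = [w * c for w, c in zip(window_row, coefficient_row)]
        row_sums.append(sum(products))
    total = sum(row_sums)   # final adder combining the row sums
    return total >> shift   # division by a power of 2 is a right shift


# Example coefficients c0..c24: outer product of the taps [1, 4, 6, 4, 1].
taps = [1, 4, 6, 4, 1]
kernel = [[a * b for b in taps] for a in taps]

# A flat image region passes through unchanged because the weights sum to 2**8.
flat_window = [[100] * 5 for _ in range(5)]
flat_result = generic_filter_output(flat_window, kernel, shift=8)
```

Every one of the 25 products is computed even though many coefficients in this kernel repeat, which is the redundancy that the separable and symmetric architectures set out to remove.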
2.1.2 Separable Filter architecture
Separable filter kernel architectures are much more computationally efficient where applicable. However, they are more suited to low-pass filtering using Gaussian kernels (which have the separability property). The architecture reduces a two-dimensional N × N filter kernel to two one-dimensional filters of length N. Thus a one-dimensional convolution (which is much simpler than a 2-D convolution) is performed along one dimension, followed by a second one-dimensional convolution along the other. The savings in multiply-accumulate operations that result from the reduced number of processing elements are best appreciated when designing very large filter kernels. Since spatial domain convolution is already more computationally efficient for small filter kernel sizes, separable spatial filter kernels increase this efficiency further, especially relative to large kernels built with the generic filter architecture. Figure 2.1.2 depicts an example of a separable filter kernel architecture for a 5×5 spatial filter, whose 25 coefficients are reduced to 5 because the row and column filter coefficients are the same, one 1-D filter being the transpose of the other.
Figure 2.1.2 – Separable 5×5 spatial filter hardware architecture
Observing the diagram in Figure 2.1.2, it can be seen that the number of processing elements and filter coefficients is dramatically reduced in this filter architecture. For example, the 25 coefficients of the generic filter architecture are reduced to just 5 coefficients, which are reused.
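The equivalence between a separable 2-D kernel and two 1-D passes can be verified numerically. The sketch below is an illustration under assumptions made here: the helper names, the taps 1, 4, 6, 4, 1 and the small synthetic test image are example choices, and only the valid (fully overlapping) region of the image is computed.

```python
def filter_rows(image, taps):
    """Filter every row with a 1-D kernel (valid region only)."""
    n = len(taps)
    return [
        [sum(row[x + i] * taps[i] for i in range(n)) for x in range(len(row) - n + 1)]
        for row in image
    ]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def separable_filter(image, taps):
    """Row pass, then the same 1-D kernel applied down the columns."""
    row_pass = filter_rows(image, taps)
    return transpose(filter_rows(transpose(row_pass), taps))


def full_2d_filter(image, taps):
    """Reference result using the full N x N kernel (outer product of taps)."""
    n = len(taps)
    kernel = [[a * b for b in taps] for a in taps]
    height, width = len(image), len(image[0])
    return [
        [
            sum(image[y + j][x + i] * kernel[j][i] for j in range(n) for i in range(n))
            for x in range(width - n + 1)
        ]
        for y in range(height - n + 1)
    ]


taps = [1, 4, 6, 4, 1]
image = [[(3 * y + 5 * x * x) % 17 for x in range(7)] for y in range(6)]
same_result = separable_filter(image, taps) == full_2d_filter(image, taps)
```

Each output pixel of the two-pass version costs 2N multiplications (10 for N = 5) instead of the N × N (25) needed by the full kernel, while the results are identical for integer data.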
2.1.3 Symmetric Filter Kernel architecture
Symmetric filter kernel architectures are more suited to high-pass and high-frequency emphasis (boost filtering) operations with equal weights. They reduce the number of processing elements and hence the number of multiply-accumulate operations. The pixels in the image window that share the same coefficient value are first added together, and the sum is then multiplied once by that coefficient. Figure 2.1.3(i) shows a Gaussian symmetric high-pass filter generated using the windowing method, while Figure 2.1.3(ii) depicts an example of a symmetric filter kernel architecture.
Figure 2.1.3(i) – Frequency domain response of symmetric Gaussian high-pass filter obtained from spatial domain symmetric Gaussian with windowing method
Figure 2.1.3(ii) – 5 x 5 symmetric spatial filter hardware architecture
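The pre-add idea behind the symmetric architecture (add the pixels that share a coefficient, then multiply once) can be sketched in software. The example below is a hedged illustration: the function name symmetric_filter_output and the 3×3 high-pass kernel with a centre weight of 8 and surrounding weights of -1 are assumptions chosen here, and the grouping is done at run time for brevity, whereas in hardware it would be fixed by the wiring of the adders.

```python
from collections import defaultdict


def symmetric_filter_output(window, kernel):
    """Pre-add pixels that share a coefficient, then multiply once per group."""
    group_sums = defaultdict(int)
    for window_row, kernel_row in zip(window, kernel):
        for pixel, coefficient in zip(window_row, kernel_row):
            group_sums[coefficient] += pixel   # adders come first
    multiplications = len(group_sums)          # one multiplier per distinct weight
    output = sum(coefficient * total for coefficient, total in group_sums.items())
    return output, multiplications


high_pass = [[-1, -1, -1],
             [-1,  8, -1],
             [-1, -1, -1]]
window = [[10, 10, 10],
          [10, 50, 10],
          [10, 10, 10]]
output, multiplications = symmetric_filter_output(window, high_pass)
# output == 320 using 2 multiplications instead of 9
```

The result matches the direct multiply-accumulate over all nine positions, but the eight equal outer weights share a single multiplier.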
2.1.4 Quadrant Symmetric Filter architecture
The quadrant symmetric filter kernel is defined by one quadrant (a quarter) of a circularly symmetric filter kernel, which is rotated through 360 degrees to give the full kernel. The hardware architecture is very efficient since it occupies a quarter of the space normally used for a full filter kernel.
To summarize the discussion of spatial filter hardware architectures, it is necessary to compare the savings in hardware resources that result from reduced multiply-accumulate operations. For an N × N spatial filter kernel, N × N multiplications and (N × N) - 1 additions are required. For example, for a 3×3 filter, 9 multiplications and 8 additions are needed for each output pixel calculation, while for a 9×9 filter, 81 multiplications and 80 additions are needed per output pixel. Since multiplications are costly in terms of hardware, designs are geared towards reducing the number of multiplication operations or eliminating them entirely. Table 2.1.4 gives a summary of the number of multiplication and addition operations per image pixel required for varying filter kernel sizes using different filter architectures.
Kernel size   */pixel (GFKA)   +/pixel (GFKA)   */pixel (SFKA)   +/pixel (SFKA)   */pixel (Sym FKA)   +/pixel (Sym FKA)
3×3           9                8                6                4                4/3                 8
5×5           25               24               10               8                6/5                 24
7×7           49               48               14               12               8/7                 48
9×9           81               80               18               16               10/9                80
13×13         169              168              26               24               14/13               168
27×27         729              728              54               52               28/27               728
31×31         961              960              62               60               32/31               960
Table 2.1.4 – MAC operations and filter kernel size and type
KEY
*/pixel – Multiplications per pixel
+/pixel – Additions per pixel
GFKA – Generic Filter Kernel Architecture
SFKA – Separable Filter Kernel Architecture
Sym FKA – Circular Symmetric Filter Kernel Architecture
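The entries of Table 2.1.4 follow simple patterns in N, which a short script can reproduce. In this sketch the generic counts (N × N and N × N - 1) come from the text; the separable counts (2N and 2(N - 1)) and the symmetric counts ((N + 1)/N and N × N - 1) are patterns read off the table values here, not formulas stated in the text, and the function name mac_counts is hypothetical.

```python
from fractions import Fraction


def mac_counts(n):
    """(multiplications, additions) per pixel for an n x n kernel."""
    return {
        "GFKA": (n * n, n * n - 1),                    # stated in the text
        "SFKA": (2 * n, 2 * (n - 1)),                  # pattern seen in the table
        "Sym FKA": (Fraction(n + 1, n), n * n - 1),    # pattern seen in the table
    }


for size in (3, 5, 7, 9, 13, 27, 31):
    counts = mac_counts(size)
    print(size, counts["GFKA"], counts["SFKA"], counts["Sym FKA"])
```

For a 31×31 kernel this gives 961 multiplications per pixel for the generic architecture against 62 for the separable one, which shows how quickly the gap widens with kernel size.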
2.2 Non-linear Filter Architectures
The nature of non-linear filter architectures is more complex than that of linear filters and depends on the algorithm or order statistics used in the algorithm. Since most of the algorithms covered in this book involve linear filtering, we focus more on linear spatial domain filtering.
Summary
In this section, we discussed several linear spatial filter hardware architectures used for implementing algorithms in FPGAs using VHDL, and analyzed the cost savings of each architecture with regard to the use of processing elements in hardware.
References
U. Nnolim, "FPGA Architectures for Logarithmic Colour Image Processing," Ph.D. thesis, University of Kent at Canterbury, Canterbury-Kent, 2009.
Cyliax, "The FPGA Tour: Learning the ropes," in Circuit Cellar online, 1999.
E. Nelson, "Implementation of Image Processing Algorithms on FPGA Hardware," in Department of Electrical Engineering. vol. Master of Science Nashville, TN: Vanderbilt University, 2000, p. 86.
T. Johnston, K. T. Gribbon, and D. G. Bailey, "Implementing Image Processing Algorithms on FPGAs," in Proceedings of the Eleventh Electronics New Zealand Conference (ENZCon'04), Palmerston North, 2004, pp. 118-123.
S. Saponara, L. Fanucci, S. Marsi, G. Ramponi, D. Kammler, and E. M. Witte, "Application-Specific Instruction-Set Processor for Retinex-Like Image and Video Processing," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 54, pp. 596-600, July 2007.
EETimes, "PLDs/FPGAs," 2009.
Google, "Google Directory," in Manufacturers, 2009.
Digilent, "http://www.digilentinc.com," 2009.
E. R. Davies, Machine Vision: Theory, Algorithms, Practicalities, 3rd ed.: Morgan Kaufmann Publishers, 2005.
Xilinx, "XST User Guide": http://www.xilinx.com, 2008.
www.xilinx.com, "FPGA Design Flow Overview (ISE Help)," vol. 2008: Xilinx, 2005.
Mathworks, "Designing Linear Filters in the Frequency Domain," in Image Processing Toolbox for use with MATLAB, T. Mathworks, Ed.: The Mathworks, 2008.
Mathworks, "Filter Design Toolbox 4.5," 2009.
Chapter 3
Image Reconstruction
The four stages of image retrieval, from camera sensor acquisition to display device, are demosaicking, white/colour balancing, gamma correction and histogram clipping. The stage of interest in this chapter is demosaicking, and the VHDL implementation of the demosaicking algorithm will also be described. The steps of colour image acquisition from the colour filter array are shown in Figure 3.
Demosaicking -> Colour Balancing -> Gamma Correction -> Histogram Clipping
Figure 3 – Image acquisition process from camera sensor
3.1 Image Demosaicking
The process of demosaicking attempts to reconstruct a full colour image from the incompletely sampled colour data of an image sensor overlaid with a colour filter array (CFA), using interpolation techniques. The Bayer array is the most common type of colour filter array used in colour sampling for image acquisition. The other methods of colour image sampling are the Tri-filter and the Foveon sensor. References to these other methods are listed at the end of the chapter. Before we delve deeper into the mechanics of demosaicking, it is necessary to describe the Bayer filter array. This grid system involves a CCD or CMOS sensor chip with M columns and N rows. A colour filter is attached to the sensor in a certain pattern. For example, the colour filters could be arranged as shown by the Bayer Colour Filter Array architecture in Figure 3.1(i).
G R G R G
B G B G B
G R G R G
B G B G B
G R G R G
Figure 3.1(i) – Bayer Colour Filter Array configuration
Here R, G and B stand for the red, green and blue colour filters respectively, and the sensor chip produces an M × N array. There are two green pixels for every red and blue pixel in a 2×2 grid because CFAs are designed to suit the human eye's greater sensitivity to green light. The demosaicking process involves splitting the sampled colour image into its separate colour channels and filtering each channel with an interpolating filter. The final convolution results from each channel are recombined to produce the demosaicked image.
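The split, filter and recombine steps can be sketched as a behavioural model. This is a minimal illustration under assumptions made here: the colour layout follows Figure 3.1(i) (green at the top-left corner, red to its right, blue below it), the two 3×3 kernels are the widely used bilinear interpolation kernels rather than kernels given in this section, the borders are zero padded, and all function names are hypothetical.

```python
GREEN_KERNEL = [[0, 1, 0], [1, 4, 1], [0, 1, 0]]      # bilinear, divided by 4
RED_BLUE_KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]   # bilinear, divided by 4


def bayer_colour(y, x):
    """Colour of the filter at row y, column x for the layout in Figure 3.1(i)."""
    if (y + x) % 2 == 0:
        return "G"
    return "R" if y % 2 == 0 else "B"


def split_channels(mosaic):
    """Split a Bayer mosaic into three sparse planes (zero where unsampled)."""
    planes = {"R": [], "G": [], "B": []}
    for y, row in enumerate(mosaic):
        for name in planes:
            planes[name].append(
                [value if bayer_colour(y, x) == name else 0
                 for x, value in enumerate(row)]
            )
    return planes


def interpolate(plane, kernel):
    """3x3 filtering with zero padding; the sum is divided by 4."""
    height, width = len(plane), len(plane[0])
    output = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            for j in range(3):
                for i in range(3):
                    yy, xx = y + j - 1, x + i - 1
                    if 0 <= yy < height and 0 <= xx < width:
                        total += plane[yy][xx] * kernel[j][i]
            output[y][x] = total // 4
    return output


def demosaic(mosaic):
    planes = split_channels(mosaic)
    red = interpolate(planes["R"], RED_BLUE_KERNEL)
    green = interpolate(planes["G"], GREEN_KERNEL)
    blue = interpolate(planes["B"], RED_BLUE_KERNEL)
    # Recombine the three channels into (R, G, B) triples per pixel.
    return [list(zip(r, g, b)) for r, g, b in zip(red, green, blue)]


# A scene of uniform colour (R=200, G=100, B=50) sampled through the CFA.
scene = {"R": 200, "G": 100, "B": 50}
mosaic = [[scene[bayer_colour(y, x)] for x in range(5)] for y in range(5)]
image = demosaic(mosaic)
```

For this uniform scene every interior pixel of the result is restored to (200, 100, 50); the border pixels are darker because of the zero padding, which a real design would handle with a dedicated border policy.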