Proceedings of the National Conference on
Research and Development in Hardware & Systems
(CSI-RDHS 2008)
June 20-21, 2008, Kolkata, India
FPGA: An Efficient And Promising Platform For Real-Time Image Processing Applications
Sparsh Mittal, Saket Gupta and S. Dasgupta
E&C Department, Indian Institute of Technology Roorkee, sparsuch@iitr.ernet.in
E&C Department, Indian Institute of Technology Roorkee, freshuce@iitr.ernet.in
E&C Department, Indian Institute of Technology Roorkee, sudebfec@iitr.ernet.in
Abstract
Digital image processing (DIP) is an ever-growing area with a variety of applications, including medicine, video surveillance, and many more. To implement the upcoming sophisticated DIP algorithms and to process the large amounts of data captured from sources such as satellites or medical instruments, intelligent high-speed real-time systems have become imperative. Image processing algorithms implemented in hardware (instead of software) have recently emerged as the most viable solution for improving the performance of image processing systems. This paper reviews the relative merits of FPGAs over software and DSPs as a platform for implementing DIP applications. Our goal is to familiarize application programmers with the state of the art in compiling high-level programs to FPGAs, and to survey the relevant research on FPGAs. The outstanding features that FPGAs offer, such as optimization, high computational density and low cost, make them an increasingly preferred choice of experts in the image processing field today.
Keywords: Digital Image Processing (DIP), FPGA, Hardware Description Language (HDL), PC
1. Introduction
Digital image processing [1] is an ever-growing area with a variety of applications in different fields. As image sizes and bit depths grow larger, software has become less useful in the video-processing realm. Generally, even specialized image processing programs running on PCs cannot adequately process large amounts of high-resolution streaming data, since PC processors are designed for general use and hence cannot efficiently implement many current sophisticated DIP algorithms. Also, to process the large amounts of data captured from satellites and ground-based detection systems, or 3D data from medical instruments, intelligent high-speed real-time systems that can process data before passing it to the human analyst have become imperative. These requirements call for a system with high performance, flexibility, easy upgradability, low development cost, and a migration path to lower cost as the application matures and volume increases.
Recently, image processing algorithms implemented in hardware have emerged as the most viable solution for improving the performance of image processing systems.
The introduction of reconfigurable devices and system-level hardware programming languages has further accelerated the hardware design of DIP systems. FPGAs are often used as implementation platforms for real-time image processing applications. A Field Programmable Gate Array (FPGA) is a programmable (or reconfigurable) device [2] whose final logic structure can be configured directly by the end user. An FPGA consists of an array of uncommitted elements that can be programmed and interconnected (configured) according to a user’s specification in a virtually limitless number of ways. Being reprogrammable and easily upgradable, an FPGA offers a compromise between the flexibility of general-purpose processors and the hardware-based speed of ASICs.
In this paper we survey the implementation of image processing applications on FPGAs, with an emphasis on the salient features of FPGAs. The rest of the paper is organized as follows. Section 2 highlights the limitations of other implementation alternatives and sets the stage for explaining the advantages of FPGAs. In Section 3 we evaluate FPGAs on several relevant parameters. Section 4 summarizes prior research on the FPGA implementation of image processing algorithms. Finally, Section 5 concludes the work and gives directions for future work.
2. Setting the stage: Limitations of other platforms
2.1 Drawbacks of DSPs and ASICs
System architecture choices for hardware implementation include standard-cell ASICs, ASSPs, and programmable solutions such as digital signal processors (DSPs) or media processors and FPGAs. Each of these has advantages and disadvantages. ASSPs are inflexible, expensive, and time-consuming to develop. Full-custom ASIC design offers the highest performance, but the design cannot be changed once fabricated, so any hardware design error that survives to fabrication wastes the entire product. DSPs are specialized microprocessors, typically programmed in C, or in assembly code for improved performance. They are well suited to extremely complex, math-intensive tasks such as image processing. Knowledge of hardware design is still required, but the learning curve is much gentler than for the other design choices. However, powerful DSPs are costly, and the corresponding software applications may not match the performance of hardware.
The reconfigurable computing technology in FPGAs [3], along with many of their other features, makes them ideally suited to real-time video processing. Hardware design techniques such as parallelism and pipelining can be exploited on an FPGA [4], which is not possible in dedicated DSP designs. The primary reason most engineers choose an FPGA over a DSP is the MIPS requirement of the application [5]. The need for high performance rules out processor-only architectures: a state-of-the-art DSP running at 1 GHz cannot perform H.264 HD decoding or H.264 HD encoding. FPGAs are the only programmable solutions able to tackle this problem.
2.2 Drawbacks of PCs
Software implementation of most image processing algorithms has several limitations that make real-time performance difficult to achieve. Complex operations have to be realized by a long sequence of simple operations, which can only be executed serially, and the range of available operations is limited to common basic operations. The constraint of real-time processing introduces a number of additional complications, such as limited memory bandwidth, resource conflicts, and the need for pipelining.
The CPU is also burdened with additional tasks, such as OS requests and user interaction, which is a major drawback in the context of real-time processing. At real-time video rates of 25 frames per second, a single operation performed on each color component of every pixel of a 768 by 576 color image (a PAL frame) equates to about 33 million operations per second, excluding the overhead of storing and retrieving pixel values. Many image-processing applications require several operations to be performed on each pixel, resulting in an even larger number of operations per second. As a result, it is difficult to meet hard real-time requirements in software [3].
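The 33 million figure follows from simple arithmetic. The short Python sketch below reproduces it under the assumption that one operation is applied to each of the three color components of every pixel; the ten-operations-per-sample case in the last line is a hypothetical workload, not a number taken from the text.

```python
# Operation rate for real-time PAL video (figures from Section 2.2).
WIDTH, HEIGHT = 768, 576   # PAL frame size in pixels
FPS = 25                   # real-time video rate in frames per second
CHANNELS = 3               # assumption: one operation per color component


def ops_per_second(ops_per_sample: int = 1) -> int:
    """Operations per second needed to apply `ops_per_sample` operations to
    every color component of every pixel, excluding memory traffic."""
    return WIDTH * HEIGHT * CHANNELS * FPS * ops_per_sample


pixels_per_second = WIDTH * HEIGHT * FPS
print(pixels_per_second)   # 11059200
print(ops_per_second())    # 33177600, i.e. about 33 million
print(ops_per_second(10))  # 331776000 for a hypothetical 10-operation chain
```

The required rate grows linearly with the number of operations per pixel, which illustrates how quickly a multi-operator pipeline can outgrow a general-purpose CPU that must also service the operating system.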
2.3 The advantages of FPGAs
The salient features that make FPGAs faster than conventional general-purpose hardware such as Pentium processors are their greater I/O bandwidth to local memory, pipelining, parallelism and the availability of optimizing compilers. Complex tasks involving multiple image operators run much faster on FPGAs than on Pentiums; in fact, [6] reports an 800-fold speed-up on an FPGA using SA-C. There are several reasons for such large speed-ups of FPGAs over PCs (see Section 3). In comparison to an FPGA, a processor such as the Pentium effectively runs at memory speed, not at cache speed, so despite its much higher clock frequency and its cache memory it responds much more slowly than a comparable FPGA. The clock frequency of such a processor can be raised to some extent to increase performance or to meet the data rate required for the image data, but raising it beyond certain limits causes system-level and board-level issues that become a bottleneck in the design. However, the difference in speed also depends on the particular application involved [6]. Complex image processing applications do enough processing per pixel to be compute-bound rather than I/O-bound; here, FPGAs dramatically outperform Pentiums, by factors of up to 800. Simpler image processing operators tend to be I/O-bound. In these cases FPGAs still outperform Pentiums because of their greater I/O capabilities, but by smaller margins (factors of 10 or less).
3. Evaluation of FPGAs as Platform for Developing DIP Applications
3.1 Advantages of FPGAs
Many advantages of FPGAs make them a preferred choice of implementation in the DIP realm. Based on our survey, the most significant of these are as follows:
1. A characteristic of many image-processing methods is the repeated, iterative processing of data sets, such as the four stages of the Canny edge detector, which require multiple passes over the image. These steps, which have to be performed sequentially on a general-purpose computer, can be fused into a single pass on an FPGA, whose structure is able to exploit spatial and temporal parallelism. An FPGA can process multiple image windows in parallel, and multiple operations within one window also in parallel.
2. By employing optimization techniques such as loop fusion and loop unrolling, many redundant operations can be avoided, giving more efficient use of FPGA resources and faster implementations.
3. FPGAs are capable of parallel I/O, which allows them to read (from memory), process, and write (to memory) simultaneously. Many operations, such as convolution and square-root computation, can be executed much faster by using pipelining and parallelism.
4. All of the logic in an FPGA can be rewired, or reconfigured, with a different design as often as the designer likes. This type of architecture allows a large variety of logic designs (dependent on the device’s resources), which can be swapped for a new design as soon as the device is reprogrammed.
5. FPGAs provide the flexibility to reprogram and upgrade to new standards [7]. Easy upgradability ensures that FPGA solutions can evolve quickly with no risk of obsolescence.
6. The reusability and efficiency of hardware implemented on an FPGA are especially useful in developing image processing IP (intellectual property) blocks, as they allow a system that is efficient in terms of both cost and performance. The possibility of quickly integrating IP blocks without modification or a repeated verification cycle [7] simplifies debugging and thus greatly reduces time-to-market.
7. Because of the FPGA’s LUT-based architecture, some convolution masks can be implemented very efficiently using constant-coefficient multipliers (KCMs) [8].
8. High computational density in FPGAs, together with low development cost, allows even the lowest-volume consumer markets to bear the development costs of FPGAs. In fact, compared to ASICs, FPGAs are especially useful in lower-volume applications. As the authors of [9] report, with low-cost FPGAs, high-definition solutions can now be implemented for less than US$1.00 per 1,000 logic elements (LEs).
9. The wide range of FPGAs available from vendors such as Xilinx and Altera meets the performance requirements of many applications, such as display products [10].
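Item 7 can be illustrated with a small software model of a KCM. In the Python sketch below, the coefficient K = 13 and the 4-bit digit size are hypothetical choices. The products of K with every possible 4-bit digit are precomputed into a 16-entry table (the role a LUT would play in hardware), so multiplying an input by K reduces to table lookups, shifts and additions. This is a simplified illustration of the principle, not the design used in [8].

```python
# Constant-coefficient multiplier (KCM) model: x * K is assembled from a
# precomputed table instead of a general-purpose multiplier.
K = 13            # hypothetical constant mask coefficient
DIGIT_BITS = 4    # assumption: 4-bit digits, i.e. 16-entry lookup tables
DIGIT_MASK = (1 << DIGIT_BITS) - 1

# K * d for every possible 4-bit digit d; in hardware, LUTs hold these values.
KCM_TABLE = [K * d for d in range(1 << DIGIT_BITS)]


def kcm_multiply(x: int, width: int = 8) -> int:
    """Multiply an unsigned `width`-bit input by the constant K using only
    table lookups, shifts and additions."""
    if not 0 <= x < (1 << width):
        raise ValueError("input out of range")
    result = 0
    for shift in range(0, width, DIGIT_BITS):
        digit = (x >> shift) & DIGIT_MASK    # next 4-bit slice of x
        result += KCM_TABLE[digit] << shift  # weighted partial product
    return result


print(kcm_multiply(200))  # 2600
```

A convolution mask with fixed coefficients would need one such multiplier per tap, followed by an adder tree to sum the partial results.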
3.2 Limitations of FPGAs
On the other hand, FPGAs have the following limitations for image processing applications.
1. Currently there are many overheads in FPGA design. These include the data transfer time, i.e. the time required to move data between the host and the reconfigurable processor in either direction, and the time for reconfiguration.
2. FPGAs are an excellent choice only for algorithms that do not use floating-point or complex mathematics. Division, direct multiplication, etc. are complex and expensive on an FPGA. Hence, designers have to reformulate their algorithms to avoid complex mathematics (e.g. implementing a divide by 8 using bit shifting instead of a divide by 9).
3. Current FPGAs cannot be reconfigured quickly, and the process of modifying or combining FPGA circuits is laborious.
4. The size of memory that can be implemented using standard logic cells on an FPGA is limited, as implementing memory is an inefficient use of FPGA resources.
5. Routines in which complex tasks cannot be broken down into simpler tasks must be processed in a more serial fashion, which does not use an FPGA efficiently.
6. Hardware offers much greater speed than a software implementation, but at the price of the increased development time inherent in creating a hardware design. Most software designers are familiar with C, but to develop a hardware system one must either learn a hardware description language such as VHDL or Verilog, or use a software-to-hardware conversion scheme, such as Streams-C [11], which converts C code to VHDL, or MATCH [12], which converts MATLAB code to VHDL.
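The divide-by-8 example in item 2 can be made concrete for a 3x3 mean filter on 8-bit pixels; this setting is our own assumption. The Python sketch compares exact division by 9 with two hardware-friendly alternatives. The first is a 3-bit right shift, which divides by 8 and is only appropriate if the algorithm is reformulated to sum 8 values (here, hypothetically, the 8 neighbours excluding the centre pixel). The second is a multiplication by the constant 7282 followed by a 16-bit shift; this reciprocal-multiply technique is added for comparison and is not described in the text.

```python
# Normalising the sum of a 3x3 window of 8-bit pixels (sum range 0..2295).


def mean9_exact(s: int) -> int:
    """Reference: true integer division by 9 (needs a divider in hardware)."""
    return s // 9


def mean8_shift(s: int) -> int:
    """Divide by 8 with a 3-bit right shift. Assumes the algorithm was
    reformulated to sum 8 values, e.g. the neighbours without the centre."""
    return s >> 3


def mean9_reciprocal(s: int) -> int:
    """Divide by 9 with one constant multiply and a shift. 7282 / 2**16 is
    slightly above 1/9, which gives exactly floor(s / 9) for 0 <= s < 32768."""
    return (s * 7282) >> 16


window = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
s = sum(sum(row) for row in window)           # 450
print(mean9_exact(s), mean9_reciprocal(s))    # 50 50
print(mean8_shift(s - window[1][1]))          # 50 (centre pixel excluded)
```

The shift changes the algorithm itself, whereas the constant multiply keeps the original divide-by-9 result; the latter could in turn be built as a KCM.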
4. Implementation of DIP applications on FPGAs
Much research has recently been done on using FPGAs as a development platform for DIP algorithms. Here we present the related work in the area. The authors of [6] have developed a high-level language (called SA-C) for expressing DIP algorithms, and an optimizing compiler that compiles high-level SA-C programs to run on FPGAs. SA-C is a single-assignment dialect of the C programming language designed to exploit many features of FPGAs [13, 14]. To compare the performance of FPGAs and Pentium processors, they ran SA-C programs compiled to a Xilinx FPGA against equivalent programs running on an 800 MHz Pentium III. For 8 common DIP routines implemented on both platforms, the FPGA offers speed-ups of 8 to 800 times over the Pentium. The experimental results and the analysis of issues such as pipelining, parallelism, optimizations, memory and I/O bring out many prominent features of FPGAs relevant to the image processing realm. In [15], performance figures are presented for several image-processing routines written in SA-C, such as the Gaussian, max and Laplace filters.
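Loop fusion, listed among the optimizations in Section 3.1, is the kind of transformation an optimizing compiler such as the one in [6] can exploit. The Python sketch below is not SA-C and does not reproduce any routine benchmarked in [6]; the gradient and threshold operators and the function names are hypothetical. It shows the same computation written as two passes over the image and as one fused pass.

```python
# Loop fusion: a gradient pass followed by a threshold pass, written as two
# loops with an intermediate image and as a single fused loop.
from typing import List

Image = List[List[int]]


def two_pass(img: Image, t: int) -> Image:
    """Pass 1 stores a gradient image; pass 2 reads it back and thresholds it."""
    grad = [[abs(row[x + 1] - row[x - 1]) for x in range(1, len(row) - 1)]
            for row in img]
    return [[255 if g > t else 0 for g in row] for row in grad]


def fused(img: Image, t: int) -> Image:
    """Same result in one pass: each gradient value is thresholded as soon
    as it is computed, so no intermediate image is written or re-read."""
    out = []
    for row in img:
        out_row = []
        for x in range(1, len(row) - 1):
            g = abs(row[x + 1] - row[x - 1])  # horizontal central difference
            out_row.append(255 if g > t else 0)
        out.append(out_row)
    return out


img = [[0, 0, 100, 100, 100], [50, 50, 50, 200, 200]]
print(fused(img, 60))  # [[255, 255, 0], [0, 255, 255]]
```

On an FPGA, the body of the fused loop would typically map to a single pipeline, so the intermediate image never has to be stored in memory.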
The authors of [16] present a pipelined architecture for image processing algorithms such as the median filter, basic morphological operators, convolution and edge detection, implemented on an FPGA. The hardware modeling is done in the Handel-C language. Moreover, in [17] the performance and efficiency of Handel-C on image processing algorithms are compared at the simulation level with another C-based system-level language, SystemC, and at the synthesis level with the industry-standard hardware description language (HDL) Verilog. Comparison parameters at the simulation level include man-hours for implementation, compile time and lines of code. Comparison parameters at the synthesis level include the logic resources required, the maximum frequency of operation and the execution time.
The author of [18] implemented the rank order filter, erosion, dilation, opening, closing and convolution algorithms using VHDL and MATLAB on two FPGA platforms, and integrated the FPGA algorithms into a modeling environment called ACS.
The authors of [19] address the issue of mapping algorithms to hardware. They present general techniques, such as lookup tables and raster-based methods, for dealing with expressions that map inefficiently to hardware. They discuss the effects of timing, bandwidth and resource constraints under different processing modes of the system, and the means to deal with them. The efficient mapping of three types of general operation, namely point, window and global operations, is discussed in relation to these hardware constraints.
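The raster-based approach to window operations mentioned above can be sketched as follows. The Python model streams pixels in raster order and keeps only the two previous image rows, the role played by line buffers in a hardware pipeline, so each 3x3 window becomes available as soon as its last pixel arrives. The function names and the choice of a 3x3 box sum as the window operation are hypothetical illustrations, not taken from [19].

```python
# Streaming 3x3 window operation using two line buffers (raster-scan input).
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

Window = List[List[int]]


def stream_windows(pixels: Iterable[int], width: int) -> Iterator[Tuple[int, int, Window]]:
    """Read pixels one at a time in raster order and yield (row, col, window)
    for every complete 3x3 window, where (row, col) is the window centre.
    Only the two previous rows are buffered, never the whole frame."""
    line_buffers: Deque[List[int]] = deque(maxlen=2)  # two latest full rows
    current: List[int] = []                           # row being received
    row = 0
    for p in pixels:
        current.append(p)
        col = len(current) - 1
        if len(line_buffers) == 2 and col >= 2:
            older, newer = line_buffers
            window = [older[col - 2:col + 1],
                      newer[col - 2:col + 1],
                      current[col - 2:col + 1]]
            yield row - 1, col - 1, window
        if len(current) == width:
            line_buffers.append(current)  # oldest row drops out automatically
            current = []
            row += 1


def box_sum(image: List[List[int]]) -> List[List[int]]:
    """Example window operation: the 3x3 sum at every interior pixel."""
    h, w = len(image), len(image[0])
    out = [[0] * (w - 2) for _ in range(h - 2)]
    flat = (p for r in image for p in r)  # simulate a raster-order stream
    for r, c, win in stream_windows(flat, w):
        out[r - 1][c - 1] = sum(sum(line) for line in win)
    return out


print(box_sum([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]))  # [[54, 63]]
```

Because every pixel is read exactly once, the same structure also shows why several window operators can be chained on an FPGA without extra passes over the frame.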
The authors of [3] report the speed-ups that FPGAs offer for image processing methods (such as image denoising and restoration, segmentation, and morphological shape recovery) on 2D and 3D images. In computer vision and image processing, FPGAs have already been used to accelerate real-time point tracking [20], stereo vision [21], color-based object detection [22], and video and image compression [23] (see also [17]). Crookes et al. presented an FPGA implementation of image filtering to increase speed [24, 25]. The authors of [26] combined three 2-input bubble sorters to obtain a three-input sorter and implemented it on an FPGA. This sorter yields the maximum, middle and minimum values and hence can be used to realize 2-D sorting.
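The sorter described in [26] can be modeled in software. In the Python sketch below, a three-input sorter is built from three 2-input compare-and-swap stages arranged as a bubble sort, and the sorter is then reused to compute the median of a 3x3 window by sorting the columns and combining the column results. The column-then-combine arrangement is a well-known median network used here for illustration; it is not necessarily the exact architecture of [26], and all function names are hypothetical.

```python
# Three-input sorter from 2-input compare-and-swap elements, and a 3x3
# median built only from such sorters (one way to realise 2-D sorting).
from typing import List, Tuple


def compare_swap(a: int, b: int) -> Tuple[int, int]:
    """2-input sorting element: returns (smaller, larger)."""
    return (a, b) if a <= b else (b, a)


def sort3(a: int, b: int, c: int) -> Tuple[int, int, int]:
    """Bubble sort of three inputs using three compare-and-swap stages."""
    a, b = compare_swap(a, b)
    b, c = compare_swap(b, c)  # c now holds the maximum
    a, b = compare_swap(a, b)  # a now holds the minimum
    return a, b, c             # (minimum, middle, maximum)


def median9(window: List[List[int]]) -> int:
    """Median of a 3x3 window: sort each column, then take the middle value of
    (max of column minima, median of column middles, min of column maxima)."""
    cols = [sort3(window[0][j], window[1][j], window[2][j]) for j in range(3)]
    lows, mids, highs = zip(*cols)
    return sort3(sort3(*lows)[2], sort3(*mids)[1], sort3(*highs)[0])[1]


print(median9([[7, 1, 9], [3, 5, 2], [8, 6, 4]]))  # 5
```

In hardware, the three column sorters operate in parallel, and the final stage needs only three more sorters, which is what makes this structure attractive for real-time median filtering.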
5. Conclusion and Future work
With a multi-billion-dollar annual market, FPGA speeds and capacities have kept pace with or exceeded Moore’s law for the last several years. This survey clearly demonstrates the outstanding features of FPGAs that make them a very promising choice for researchers in the field. FPGAs are a great fit for video and image processing applications such as broadcast infrastructure, medical imaging, HD videoconferencing, video surveillance, and military imaging.
Greater future potential lies in integrating FPGAs on-chip with the main processor, giving the benefit of general-purpose acceleration without the communication bottleneck created by placing the FPGA in a co-processor. At present, applications written directly in VHDL are more efficient (albeit more difficult to develop) than those compiled from high-level languages, but we expect future compiler improvements to narrow this gap.
The disadvantages of FPGAs also need to be addressed to make them more efficient and useful.
6. References
[1] Castleman, K. R. Digital Image Processing, Upper Saddle River, New Jersey: Prentice-Hall, 1996.
[2] S. D. Brown, R. J. Francis, J. Rose, and Z. G. Vranesic, Field-Programmable Gate Arrays, 1992.
[3] S. Klupsch, et al., “Real Time Image Processing based on Reconfigurable Hardware Acceleration.” Available: www.mpi-inf.mpg.de/~strzodka/papers/public/KlErHu_02fpga.pdf
[4] Digital Video & Image Processing: Xilinx Solutions for Broadcast Chain, Xilinx Ltd., 2002.
[5] Telikepalli, A., Fiset, E., “Platform FPGA design for high-performance DSP.” Available: http://www.lyrtech.com/DSP-development/technical_lib/form1_wp.php
[6] B. A. Draper, et al., “Accelerated Image Processing on FPGAs,” IEEE Transactions on Image Processing, vol. 12, no. 12, pp. 1543-1551, Dec. 2003.
[7] R. V. Shah, “Image Processing Applications on New Generation FPGAs,” eInfochips Ltd., March 7, 2006.
[8] Chi-Jeng Chang, Pei-Yung Hsiao, Zen-Yi Huang, “Integrated Operation of Image Capturing and Processing in FPGA,” International Journal of Computer Science and Network Security, vol. 6, no. 1, pp. 173-180.
[9] “Video and Image Processing Design Using FPGAs,” Altera white paper. Available: www.altera.com/literature/wp/wp-ideo0306.pdf
[10] M. Tusch, “High-Performance Image Processing on FPGAs.” Available: www.xilinx.com/publications/xcellonline/xcell_57/xc_pdf/p042-044_57-apical.pdf
[11] M. Gokhale, “Stream oriented FPGA computing in Streams-C,” in Proc. IEEE Symp. Field-Programmable Custom Computing Machines, Napa, CA, 2000.
[12] P. Banerjee, “A MATLAB compiler for distributed, heterogeneous, reconfigurable computing systems,” in Proc. IEEE Symp. Field-Programmable Custom Computing Machines, Napa, CA, 2000.
[13] A. P. W. Böhm, et al., “Mapping a single assignment programming language to reconfigurable systems,” Supercomputing, vol. 21, pp. 117–130, 2002.
[14] http://www.cs.colostate.edu/~cameron/
[15] Bruce Draper, Walid Najjar, Wim Böhm, Jeff Hammes, Bob Rinker, Charlie Ross, Monica Chawathe, José Bins, “Compiling and Optimizing Image Processing Algorithms for FPGA’s.”
[16] Daggu V. R. and Venkatesan M., “Design and Implementation of an Efficient Reconfigurable Architecture for Image Processing Algorithms using Handel-C.” Available: www.uweb.ucsb.edu/~shahnam/CannyAlgorithmImplementation.pdf
[17] Daggu Venkateshwar Rao, et al., “Implementation and Evaluation of Image Processing Algorithms on Reconfigurable Architecture using C-based Hardware Descriptive Languages.” Available: www.gbspublisher.com/ijtacs/1002.pdf
[18] Nelson, A., Implementation of Image Processing Algorithms on FPGA Hardware, Masters Thesis, Graduate School of Vanderbilt University, 2000.
[19] Johnston, C. T., Gribbon, K. T., and Bailey, D. G., “Implementing Image Processing Algorithms on FPGAs,” Proceedings of the Eleventh Electronics New Zealand Conference, Palmerston North, New Zealand, pp. 118-123, Nov. 2004.
[20] A. Benedetti and P. Perona, “Real-time 2-D feature detection on a reconfigurable computer,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Santa Barbara, CA, 1998.
[21] J. Woodfill and B. v. Herzen, “Real-Time stereo vision on the PARTS reconfigurable computer,” in Proc. IEEE Symp. Field-Programmable Custom Computing Machines, Napa, CA, 1997.
[22] D. Benitez and J. Cabrera, “Reactive computer vision system with reconfigurable architecture,” in Proc. Int. Conf. Vision Systems, Las Palmas de Gran Canaria, 1999.
[23] R. W. Hartenstein, J. Becker, R. Kress, H. Reinig, and K. Schmidt, “A reconfigurable machine for applications in image and video compression,” in Proc. Conf. Compression Technologies and Standards for Image and Video Compression, Amsterdam, The Netherlands, 1995.
[24] Crookes D. et al., “Design and implementation of a high level programming environment for FPGA-based image processing,” IEE Proceedings: Vision, Image and Signal Processing, vol. 147, no. 4, pp. 377-384, Aug. 2000.
[25] Bouridane A., Crookes D., Donachy P., Alotaibi K., and Benkrid K., “A high level FPGA-based abstract machine for image processing,” Journal of Systems Architecture, vol. 45, no. 10, pp. 809-824, April 1999.
[26] Maheshwari R., Rao S.S.S.P., and Poonacha P.G., “FPGA implementation of median filter,” Tenth International Conference on VLSI Design, pp. 523-524, June 1997.