A Performance Study of Applying CUDA-Enabled GPU in Polar Hough Transform for Lines


JOURNAL OF COMPUTING, VOLUME 4, ISSUE 4, APRIL 2012, ISSN 2151-9617 https://sites.google.com/site/journalofcomputing WWW.JOURNALOFCOMPUTING.ORG


Ghaith Makey, Kanj Al-Shufi and Mustafa Sayem El-Daher

Abstract — With the advent of modern GPGPUs, which can be used efficiently in general-purpose applications, the use of the multithreaded cores and high memory bandwidth of CUDA-enabled GPGPUs in digital image processing and feature extraction has been raised to a new level. This paper uses NVIDIA's CUDA language to calculate the polar Hough transform for lines, an important method for image feature extraction; a performance study of this implementation and a comparison with sequential CPU computation are included. The study was carried out on a GPGPU that is inexpensive and available to research laboratories in developing countries.

Index Terms — Image feature extraction, Parallel computing, Graphics processors, Performance analysis


1 INTRODUCTION

In order to meet demands from the 3D graphics industry, GPUs have developed distinctive power in parallel computation and relatively huge memory bandwidth [1]. These properties have recently inspired the use of the GPU as a general-purpose computing unit, and many applications can benefit from these GPGPU properties. Two of the important applications that have driven the adoption of GPGPUs are image processing and feature extraction. In this context, this paper studies the efficiency of using an NVIDIA CUDA-enabled GPGPU to calculate the polar Hough transform for lines, a key technique in image processing and pattern recognition [2].

2 RELATED WORKS

Considerable interest has lately been given to the general-purpose computing abilities of CUDA-enabled GPUs, and a large number of applications have been accelerated by being modified to run on these GPUs. In 2008, Shuai Che et al. presented a performance study of general-purpose applications on graphics processors using CUDA, in which several general applications were run on both GPU and CPU and their performance was analyzed [3].


Ghaith Makey is a PhD student at The Higher Institute of Laser Research and Applications, Damascus University. K. Al-Shufi is an associate professor at Damascus University and vice dean of The Higher Institute of Laser Research and Applications, Damascus, Syria. M. Sayem El-Daher is an associate professor of computational physics at the physics department, Damascus University, Damascus, Syria.

3 CUDA-C LANGUAGE

CUDA is an abbreviation of Compute Unified Device Architecture; it is a general-purpose parallel computing architecture that leverages the parallel compute engine in NVIDIA GPGPUs to solve many complex computational problems more efficiently than on a CPU [1]. CUDA's functions are easy to learn for those with a strong background in both C and GPGPU architecture. For more information about CUDA, see reference [1]; many tutorials, learning media and documentation can be found on the NVIDIA CUDA website or in reference [4].

4 HOUGH TRANSFORMS

4.1 Overview
The Hough Transform (Hough, 1962) is a key method of image feature extraction by shape matching, and it has been used widely to extract the basic shapes in images. The method has proven efficient but still has high computational requirements, and here lies the benefit of using GPGPU power to meet those requirements [2].

4.2 Hough transform for lines
The Hough transform for lines converts all the points of a line into a specific point in an accumulator space. Suppose we have a line with the following equation in the Cartesian space:

y = mx + c    (1)


where m is the line slope and c is the intercept with the y-axis. This equation can be rearranged as:

Ay + Bx + 1 = 0    (2)

where A = -1/c and B = m/c. Since each pair (A, B) defines a line in the Cartesian space, the x and y coordinates of each point on a line define a line in the parametric space of A and B, and all the lines in the (A, B) parametric space that come from the points of a single line in the (x, y) Cartesian space intersect at the point corresponding to the slope and intercept of that line [2]. In practice, however, it is more convenient to use the polar parameters (ρ, θ) instead of the Cartesian parameters (m, c), for several reasons: m becomes infinite for vertical lines, and c may take values over a huge range [2]. In this case the line equation is written as:

ρ = x cos(θ) + y sin(θ)    (3)

where ρ is the normal distance from the origin to the line and θ is the angle of this normal.

Fig. 2. Polar Hough transform (b) for an image with two lines (a).

5 HARDWARE SPECIFICATIONS AND PROGRAMMING STEPS

5.1 Hardware specifications
The platform is Windows 7 based. The CPU used is an AMD Phenom™ 9850 processor at 2.51 GHz. The GPGPU used is an NVIDIA GeForce 9400 GT, which was available for less than $40 at the time of this work. The Compute Capability of this GPGPU is 1.1, which is almost the minimum a GPU can provide (though, unlike Compute Capability 1.0 GPUs, it supports integer atomic functions operating on 32-bit words in global memory). All the baseline methods used on the CPU are sequential.

5.2 CPU based functions

5.2.1 Matlab function
To calculate the polar Hough transform for lines we used the Matlab function shown in Fig. 3.

Fig. 1. Polar form of a line.

In this case we can guarantee that the θ value lies between 0 and 180 degrees and the ρ value lies between 0 and ρmax, where:

ρmax = √(w² + h²)    (4)

where w is the width of the input image and h is its height.

Fig. 3. Matlab function to calculate Polar Hough Transform for lines.

where: the 2D array (image) represents the input image; the 2D array (hough) holds the result (all its elements are set to zero before the function is applied); the integer variables (rows) and (columns) are the dimensions of the array (image); and the variable rmax is calculated by (4). The version of Matlab used in this work is 7.8.


5.2.2 C function

Fig. 4. C function to calculate Polar Hough Transform for lines.

The input variable definitions for this function are analogous to those of the Matlab code; the image and hough arrays are passed as pointers. This code was compiled with Microsoft Visual Studio 6.

5.3 GPU based kernel


6 RESULTS AND DISCUSSION

We compared the performance of the sequential CPU Matlab and C functions and the parallel CUDA kernel on 10 images of dimensions 256x256. The difference between the ten images is the dark-pixel capacity of each, which represents the data in these images. These ten images were run through the sequential functions and the kernel, and the time was measured using the Matlab 7.8 Profiler for the Matlab function, the Visual Studio C++ 6.0 Profiler for the C function, and the CUDA Visual Profiler for the CUDA kernel. The speedups were calculated and the resulting charts are drawn in Fig. 6. For an image with 60% dark-pixel capacity, the speedup is about 12 times for the GPU versus sequential C on a single CPU, and more than 23 times for the GPU versus sequential Matlab on a single CPU; these results were obtained with relatively low parallelism (just 180 threads) on almost the cheapest CUDA-enabled GPU on the market. A dark-pixel capacity of more than 3% is required before the GPU starts giving a speedup, because when only a poor parallel workload is available, the strong sequential speed of the CPU dominates.

Fig. 5. CUDA Kernel to calculate Polar Hough Transform for lines.

The input variable definitions for this kernel are analogous to those of the Matlab code; the d_image and d_hough arrays are passed as pointers in device memory, and stages of copying data between host memory and device memory are required before and after calling the kernel. Thread parallelism is applied here to the parameter theta, to ensure that each thread writes to a separate location of the d_hough array. Thus this kernel needs only 16 blocks of 16 threads each, whatever the dimensions of the input image are. When we instead applied thread parallelism to the x and y parameters, we had to use atomic functions to avoid conflicts when writing to device memory; however, the serialization these atomic functions impose when many threads access the same memory address was so slow on our GPU that no speedup was achieved. We could not use shared memory, because each thread would have required more shared memory than our GPGPU could provide. The atomic function atomicAdd was used to increment the points of the accumulator space because it is faster (by up to 20%) than using, for example, the (++) operator.

Fig. 6. The speedup for GPU over C on a single CPU (a) and the speedup for GPU over Matlab on a single CPU (b).


7 CONCLUSIONS

In this work a brief introduction to CUDA was given and a demonstration of the polar Hough transform for lines was provided. The polar Hough transform for lines was written to be executed on the GPU using the CUDA language and on the CPU as sequential baselines in both Matlab and C; speedups were then calculated for each code over different images, and the results were shown and discussed. These results show that CUDA-enabled GPGPUs can really carry a normal PC's computational speed to a new level: in this work a good speedup was achieved with a cheap GPGPU against a fairly powerful processor. We expect that with a stronger GPGPU, better performance could be obtained, especially with GPGPUs of Compute Capability 2.0.

REFERENCES
[1] NVIDIA CUDA™ Programming Guide 3.0, 2010.
[2] Mark S. Nixon and Alberto S. Aguado, Feature Extraction and Image Processing, 2002.
[3] Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer and Kevin Skadron, "A performance study of general-purpose applications on graphics processors using CUDA," J. Parallel Distrib. Comput. 68 (2008) 1370–1380.
[4] Jason Sanders and Edward Kandrot, CUDA by Example: An Introduction to General-Purpose GPU Programming, Addison-Wesley, 2011.

Ghaith Makey received a BS in Electronics Engineering in 2005 and an MSc in Laser Sciences and Applications in 2009. He is currently a PhD student at HILRA, Damascus, Syria. Major interests: GPGPUs, image processing, digital holography, spatial light modulators.

K. Al-Shufi holds a PhD in physics and is an associate professor of physics at Damascus University and vice dean of the Higher Institute for Laser Research and Applications.

M. Sayem El-Daher holds a PhD in physics and is an associate professor at the physics department, Damascus University. His area of research is computational physics.

