
DIGITAL IMAGE PROCESSING
Image:
A digital image is a computer file that contains graphical information instead
of text or a program. Pixels are the basic building blocks of all digital images.
Pixels are small adjoining squares in a matrix across the length and width of your
digital image. They are so small that you don’t see the actual pixels when the
image is on your computer monitor.
Pixels are monochromatic. Each pixel is a single solid color blended from some
combination of the three primary colors of Red, Green, and Blue, so every
pixel has a RED component, a GREEN component and a BLUE component. The
physical dimensions of a digital image are measured in pixels and are commonly
called the pixel or image resolution. Pixels are scalable to different physical sizes on
your computer monitor or on a photo print. However, all of the pixels in any
particular digital image are the same size. Pixels as represented in a printed photo
become round, slightly overlapping dots.

Pixel Values: As shown in this bitonal image, each pixel is assigned a tonal value,
in this example 0 for black and 1 for white.

PIXEL DIMENSIONS are the horizontal and vertical measurements of an image
expressed in pixels. The pixel dimensions may be determined by multiplying both
the width and the height by the dpi. A digital camera will also have pixel
dimensions, expressed as the number of pixels horizontally and vertically that
define its resolution (e.g., 2,048 by 3,072). The dpi achieved can be calculated by
dividing each pixel dimension by the corresponding physical dimension of the
document.

Example:

Fig: An 8" x 10" document that is scanned at 300 dpi has the pixel dimensions of
2,400 pixels (8" x 300 dpi) by 3,000 pixels (10" x 300 dpi).
Images in MATLAB:
The basic data structure in MATLAB is the array, an ordered set of real or
complex elements. This object is naturally suited to the representation of images,
real-valued ordered sets of color or intensity data.
MATLAB stores most images as two-dimensional arrays (i.e., matrices), in which
each element of the matrix corresponds to a single pixel in the displayed image.
(Pixel is derived from picture element and usually denotes a single dot on a
computer display.)
For example, an image composed of 200 rows and 300 columns of different
colored dots would be stored in MATLAB as a 200-by-300 matrix. Some images,
such as color images, require a three-dimensional array, where the first plane in the

third dimension represents the red pixel intensities, the second plane represents the
green pixel intensities, and the third plane represents the blue pixel intensities. This
convention makes working with images in MATLAB similar to working with any
other type of matrix data, and makes the full power of MATLAB available for
image processing applications.
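As a minimal sketch of this convention, the following MATLAB lines build a 200-by-300 grayscale matrix and a 200-by-300-by-3 truecolor array (the sizes and pixel values are purely illustrative):

% A grayscale image is an ordinary matrix; a color image adds a third dimension.
I   = zeros(200, 300, 'uint8');     % grayscale: one intensity value per pixel
RGB = zeros(200, 300, 3, 'uint8');  % color: red, green and blue planes
I(50, 75)      = 255;               % set one pixel to white
RGB(50, 75, :) = [255 0 0];         % set one pixel to pure red
size(I)                             % returns 200 300
size(RGB)                           % returns 200 300 3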
IMAGE REPRESENTATION:
An image is stored as a matrix using standard Matlab matrix conventions. There
are four basic types of images supported by Matlab:
1. Binary images
2. Intensity images
3. RGB images
4. Indexed images
Binary Images:
In a binary image, each pixel assumes one of only two discrete values: 1 or 0. A
binary image is stored as a logical array. By convention, this documentation uses
the variable name BW to refer to binary images.
The following figure shows a binary image with a close-up view of some of the
pixel values.

Fig: Pixel Values in a Binary Image
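A binary image can be produced, for example, by thresholding an intensity image; a minimal sketch (the threshold of 128 is illustrative, not prescribed by the text) is:

I  = uint8(randi([0 255], 4, 4));   % small example intensity image
BW = I > 128;                       % comparison yields a logical array of 0s and 1s
class(BW)                           % returns 'logical'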
Grayscale Images:
A grayscale image (also called gray-scale, gray scale, or gray-level) is a data
matrix whose values represent intensities within some range. MATLAB stores a
grayscale image as an individual matrix, with each element of the matrix
corresponding to one image pixel. By convention, this documentation uses the
variable name I to refer to grayscale images.
The matrix can be of class uint8, uint16, int16, single, or double. While grayscale
images are rarely saved with a color map, MATLAB uses a color map to display
them.
For a matrix of class single or double, using the default grayscale color map, the
intensity 0 represents black and the intensity 1 represents white. For a matrix of

type uint8, uint16, or int16, the intensity intmin(class(I)) represents black and the
intensity intmax(class(I)) represents white.
The figure below depicts a grayscale image of class double.

Fig: Pixel Values in a Grayscale Image Define Gray Levels
Color Images:
A color image is an image in which each pixel is specified by three values —
one each for the red, green, and blue components of the pixel's color. MATLAB
stores color images as an m-by-n-by-3 data array that defines red, green, and blue
color components for each individual pixel. Color images do not use a color map.
The color of each pixel is determined by the combination of the red, green, and
blue intensities stored in each color plane at the pixel's location.

Graphics file formats store color images as 24-bit images, where the red, green,
and blue components are 8 bits each. This yields a potential of 16 million colors.
The precision with which a real-life image can be replicated has led to the
commonly used term truecolor image.
A color array can be of class uint8, uint16, single, or double. In a color array of
class single or double, each color component is a value between 0 and 1. A pixel
whose color components are (0, 0, 0) is displayed as black, and a pixel whose color
components are (1, 1, 1) is displayed as white. The three color components for
each pixel are stored along the third dimension of the data array. For example, the
red, green, and blue color components of the pixel (10,5) are stored in
RGB(10,5,1), RGB(10,5,2), and RGB(10,5,3), respectively.
The following figure depicts a color image of class double.

Fig: Color Planes of a True color Image
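A short sketch of this indexing convention, using an example array of class double with values in [0, 1]:

RGB = rand(20, 20, 3);              % example truecolor image
r = RGB(10, 5, 1);                  % red component of pixel (10,5)
g = RGB(10, 5, 2);                  % green component
b = RGB(10, 5, 3);                  % blue component
squeeze(RGB(10, 5, :))'             % the three components as a 1-by-3 vector [r g b]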

Indexed Images:
An indexed image consists of an array and a colormap matrix. The pixel values in
the array are direct indices into a colormap. By convention, this documentation
uses the variable name X to refer to the array and map to refer to the colormap.
The colormap matrix is an m-by-3 array of class double containing floating-point
values in the range [0, 1]. Each row of map specifies the red, green, and blue
components of a single color. An indexed image uses direct mapping of pixel
values to colormap values. The color of each image pixel is determined by using
the corresponding value of X as an index into map.
A colormap is often stored with an indexed image and is automatically loaded with
the image when you use the imread function. After you read the image and the
colormap into the MATLAB workspace as separate variables, you must keep track
of the association between the image and colormap. However, you are not limited
to using the default colormap; you can use any colormap that you choose.
The relationship between the values in the image matrix and the colormap depends
on the class of the image matrix. If the image matrix is of class single or double, it
normally contains integer values 1 through p, where p is the length of the
colormap. The value 1 points to the first row in the colormap, the value 2 points to
the second row, and so on. If the image matrix is of class logical, uint8 or uint16,
the value 0 points to the first row in the colormap, the value 1 points to the second
row, and so on.

The following figure illustrates the structure of an indexed image. In the figure, the
image matrix is of class double, so the value 5 points to the fifth row of the
colormap.

Fig: Pixel Values Index to Colormap Entries in Indexed Images
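As a sketch of reading and displaying an indexed image (the file name here is hypothetical):

[X, map] = imread('example_indexed.png');  % map is a p-by-3 colormap with values in [0,1]
image(X); colormap(map)                    % display X using the stored colormap
RGB = ind2rgb(X, map);                     % expand to a truecolor array if needed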

Digital Image File Types:
The 5 most common digital image file types are as follows:
1. JPEG is a compressed file format that supports 24 bit color (millions of colors).
This is the best format for photographs to be shown on the web or as email
attachments. This is because the color informational bits in the computer file are
compressed (reduced) and download times are minimized.

2. GIF is an uncompressed file format that supports only 256 distinct colors. Best
used with web clip art and logo type images. GIF is not suitable for photographs
because of its limited color support.
3. TIFF is an uncompressed file format with 24 or 48 bit color support.
Uncompressed means that all of the color information from your scanner or digital
camera for each individual pixel is preserved when you save as TIFF. TIFF is the
best format for saving digital images that you will want to print. TIFF supports
embedded file information, including exact color space, output profile information
and EXIF data. There is a lossless compression for TIFF called LZW. LZW is
much like 'zipping' the image file because there is no quality loss. An LZW TIFF
decompresses (opens) with all of the original pixel information unaltered.
4. BMP is a Windows (only) operating system uncompressed file format that
supports 24 bit color. BMP does not support embedded information like EXIF,
calibrated color space and output profiles. Avoid using BMP for photographs
because it produces approximately the same file sizes as TIFF without any of the
advantages of TIFF.
5. Camera RAW is a lossless compressed file format that is proprietary for each
digital camera manufacturer and model. A camera RAW file contains the 'raw' data
from the camera's imaging sensor. Some image editing programs have their own
version of RAW too. However, camera RAW is the most common type of RAW
file. The advantage of camera RAW is that it contains the full range of color
information from the sensor. This means the RAW file contains 12 to 14 bits of
color information for each pixel. If you shoot JPEG, you only get 8 bits of color for
each pixel. These extra color bits make shooting camera RAW much like shooting

negative film. You have a little more latitude in setting your exposure and a slightly
wider dynamic range.
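As a rough sketch of saving an image in some of these formats with imwrite (the file names and quality settings are illustrative):

RGB = uint8(255 * rand(100, 100, 3));            % example 24-bit color image
imwrite(RGB, 'photo.jpg', 'Quality', 90);        % lossy JPEG, suited to web and email
imwrite(RGB, 'scan.tif', 'Compression', 'lzw');  % TIFF with lossless LZW compression
imwrite(RGB, 'image.bmp');                       % uncompressed Windows bitmap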

Image Coordinate Systems:
Pixel Coordinates
Generally, the most convenient method for expressing locations in an image is to
use pixel coordinates. In this coordinate system, the image is treated as a grid of
discrete elements, ordered from top to bottom and left to right, as illustrated by the
following figure.

Fig: The Pixel Coordinate System
For pixel coordinates, the first component r (the row) increases downward, while
the second component c (the column) increases to the right. Pixel coordinates are
integer values and range between 1 and the length of the row or column.
There is a one-to-one correspondence between pixel coordinates and the
coordinates MATLAB uses for matrix subscripting. This correspondence makes the
relationship between an image's data matrix and the way the image is displayed

easy to understand. For example, the data for the pixel in the fifth row, second
column is stored in the matrix element (5, 2). You use normal MATLAB matrix
subscripting to access values of individual pixels.

For example, the MATLAB code
I(2, 15)
returns the value of the pixel at row 2, column 15 of the image I.
Spatial Coordinates:
In the pixel coordinate system, a pixel is treated as a discrete unit, uniquely
identified by a single coordinate pair, such as (5, 2). From this perspective, a
location such as (5.3, 2.2) is not meaningful.
At times, however, it is useful to think of a pixel as a square patch. From this
perspective, a location such as (5.3, 2.2) is meaningful, and is distinct from (5, 2).
In this spatial coordinate system, locations in an image are positions on a plane,
and they are described in terms of x and y (not r and c as in the pixel coordinate
system).
The following figure illustrates the spatial coordinate system used for images.
Notice that y increases downward.

Fig: The Spatial Coordinate System

This spatial coordinate system corresponds closely to the pixel coordinate system
in many ways. For example, the spatial coordinates of the center point of any pixel
are identical to the pixel coordinates for that pixel.
There are some important differences, however. In pixel coordinates, the upper left
corner of an image is (1,1), while in spatial coordinates, this location by default is
(0.5,0.5). This difference is due to the pixel coordinate system's being discrete,
while the spatial coordinate system is continuous. Also, the upper left corner is
always (1,1) in pixel coordinates, but you can specify a nondefault origin for the
spatial coordinate system.
Another potentially confusing difference is largely a matter of convention: the
order of the horizontal and vertical components is reversed in the notation for these
two systems. As mentioned earlier, pixel coordinates are expressed as (r, c), while
spatial coordinates are expressed as (x, y). In the reference pages, when the syntax
for a function uses r and c, it refers to the pixel coordinate system. When the syntax
uses x and y, it refers to the spatial coordinate system.
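A minimal sketch of a nondefault spatial coordinate system, using imagesc from base MATLAB (the x range 0..100 and y range 0..50 are illustrative):

I = magic(8);                    % example image data
imagesc([0 100], [0 50], I)      % columns span x = 0..100, rows span y = 0..50
colormap(gray); axis image       % note that y still increases downward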

Digital Image Processing:
Digital image processing is the use of computer algorithms to perform image
processing on digital images. As a subfield of digital signal processing, digital
image processing has many advantages over analog image processing; it allows a
much wider range of algorithms to be applied to the input data, and can avoid
problems such as the build-up of noise and signal distortion during processing.
Image digitization:
An image captured by a sensor is expressed as a continuous function f(x,y)
of two co-ordinates in the plane. Image digitization means that the function f(x,y)
is sampled into a matrix with M rows and N columns. The image quantization
assigns to each continuous sample an integer value. The continuous range of the
image function f(x,y) is split into K intervals. The finer the sampling (i.e., the
larger M and N) and the quantization (the larger K), the better the approximation of
the continuous image function f(x,y).
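A small worked sketch of uniform quantization into K gray levels, assuming the sampled image f is of class double with values in [0, 1]:

f = rand(4, 4);          % example sampled image (M = N = 4)
K = 8;                   % number of quantization intervals
q = floor(f * K);        % map [0,1) onto the integers 0..K-1
q(q == K) = K - 1;       % keep the value 1 inside the top interval
fq = (q + 0.5) / K;      % reconstruct each sample with its interval midpoint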
Image Pre-processing:
Pre-processing is a common name for operations with images at the lowest
level of abstraction -- both input and output are intensity images. These iconic
images are of the same kind as the original data captured by the sensor, with an
intensity image usually represented by a matrix of image function values
(brightness). The aim of pre-processing is an improvement of the image data that
suppresses unwanted distortions or enhances some image features important for
further processing. Four categories of image pre-processing methods according to

the size of the pixel neighborhood that is used for the calculation of new pixel
brightness:
o Pixel brightness transformations.
o Geometric transformations.
o Pre-processing methods that use a local neighborhood of the
processed pixel.
o Image restoration that requires knowledge about the entire image.
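As a minimal sketch of the third category above, the following lines compute each new pixel value from its local 3-by-3 neighborhood (simple mean smoothing); the image and kernel are illustrative:

I = rand(64, 64);               % example intensity image of class double
h = ones(3, 3) / 9;             % 3-by-3 averaging kernel
Ismooth = conv2(I, h, 'same');  % each output pixel is the mean of its neighborhood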
Image Segmentation:
Image segmentation is one of the most important steps leading to the analysis of
processed image data. Its main goal is to divide an image into parts that have a
strong correlation with objects or areas of the real world contained in the
image. There are two kinds of segmentation:
1. Complete segmentation: This results in a set of disjoint regions uniquely
corresponding with objects in the input image. Cooperation with higher
processing levels which use specific knowledge of the problem domain is
necessary.
2. Partial segmentation: Regions do not correspond directly with
image objects. The image is divided into separate regions that are
homogeneous with respect to a chosen property such as brightness, color,
reflectivity, texture, etc. In a complex scene, a set of possibly overlapping
homogeneous regions may result. The partially segmented image must
then be subjected to further processing, and the final image segmentation
may be found with the help of higher-level information.

Segmentation methods can be divided into three groups according to the dominant
features they employ:
1. Global knowledge about an image or its part, where the knowledge is
usually represented by a histogram of image features;
2. Edge-based segmentations; and
3. Region-based segmentations.
Image enhancement:
The aim of image enhancement is to improve the interpretability or perception of
information in images for human viewers, or to provide `better' input for other
automated image processing techniques. Image enhancement techniques can be
divided into two broad categories:
1. Spatial domain methods, which operate directly on pixels, and
2. Frequency domain methods, which operate on the Fourier transform of an
image.
Unfortunately, there is no general theory for determining what `good’ image
enhancement is when it comes to human perception. If it looks good, it is good!
However, when image enhancement techniques are used as pre-processing tools for
other image processing techniques, then quantitative measures can determine
which techniques are most appropriate.
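A brief sketch of one method from each category, assuming the Image Processing Toolbox (for histeq and the sample image pout.tif) and a grayscale image of class uint8; the cutoff radius of 30 is illustrative:

I   = imread('pout.tif');                     % sample grayscale image
Ieq = histeq(I);                              % spatial domain: histogram equalization

F   = fftshift(fft2(im2double(I)));           % frequency domain: Fourier transform
[M, N] = size(I);
[u, v] = meshgrid(1:N, 1:M);
D   = hypot(u - N/2, v - M/2);                % distance from the center of the spectrum
Ilp = real(ifft2(ifftshift(F .* (D < 30))));  % keep only low frequencies (ideal low-pass)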

Theory
Color Space:
A color model is an abstract mathematical model describing the way colors can be
represented as tuples of numbers, typically as three or four values or color
components (e.g. RGB and CMYK are color models). However, a color model with
no associated mapping function to an absolute color space is a more or less
arbitrary color system with no connection to any globally understood system of
color interpretation.
Adding a certain mapping function between the color model and a certain
reference color space results in a definite "footprint" within the reference color
space. This "footprint" is known as a gamut, and, in combination with the color
model, defines a new color space. For example, Adobe RGB and sRGB are two
different absolute color spaces, both based on the RGB model.
In the most generic sense of the definition above, color spaces can be defined
without the use of a color model. These spaces, such as Pantone, are in effect a
given set of names or numbers which are defined by the existence of a
corresponding set of physical color swatches. This article focuses on the
mathematical model concept.
A wide range of colors can be created by the primary colors of pigment
(cyan (C), magenta (M), yellow (Y), and black (K)). Those colors then define a
specific color space. To create a three-dimensional representation of a color space,
we can assign the amount of magenta color to the representation's X axis, the

amount of cyan to its Y axis, and the amount of yellow to its Z axis. The resulting
3-D space provides a unique position for every possible color that can be created
by combining those three pigments.
However, this is not the only possible color space. For instance, when colors are
displayed on a computer monitor, they are usually defined in the RGB
(red, green and blue) color space. This is another way of making nearly the same
colors (limited by the reproduction medium, such as the phosphor (CRT) or filters
and backlight (LCD)), and red, green and blue can be considered as the X, Y and Z
axes. Another way of making the same colors is to use their Hue (X axis), their
Saturation (Y axis), and their brightness Value (Z axis). This is called the HSV
color space. Many color spaces can be represented as three-dimensional (X,Y,Z)
values in this manner, but some have more, or fewer dimensions, and some, such
as Pantone, cannot be represented in this way at all.
RGB Color Space:
An RGB color space is any additive color space based on the RGB color model. A
particular RGB color space is defined by the three chromaticities of the red, green,
and blue additive primaries, and can produce any chromaticity within the triangle
defined by those primary colors. The complete specification of an RGB color space
also requires a white point chromaticity and a gamma correction curve.
An RGB color space can be easily understood by thinking of it as "all possible
colors" that can be made from three colourants for red, green and blue. Imagine,
for example, shining three lights together onto a white wall: one red light, one
green light, and one blue light, each with dimmer switches. If only the red light is

on, the wall will look red. If only the green light is on, the wall will look green. If
the red and green lights are on together, the wall will look yellow. Dim the red light
and the wall will become more of a yellow-green. Dim the green light instead, and
the wall will become more orange. Bringing up the blue light a bit will cause the
orange to become less saturated and more whitish. In all, each setting of the three
dimmer switches will produce a different result, either in color or in brightness or
both. The set of all possible results is the gamut defined by those particular color
lamps. Swap the red lamp for one of a different brand that is slightly more orange,
and there will be a slightly different, and more limited, gamut, since the set of all
colors that can be produced with the three lights will be changed.
An LCD display can be thought of as a grid of thousands of little red, green, and
blue lamps, each with their own dimmer switch. The gamut of the display will
depend on the three colors used for the red, green and blue lights. A wide-gamut
display will have very saturated, "pure" light colors, and thus be able to display
very saturated, deep colors.
Applications:
RGB is a convenient color model for computer graphics because the human visual
system works in a way that is similar — though not quite identical — to an RGB
color space. The most commonly used RGB color spaces are sRGB and Adobe
RGB (which has a significantly larger gamut). Adobe has recently developed
another color space called Adobe Wide Gamut RGB, which is even larger, at the
expense of gamut density.

As of 2007, sRGB is by far the most commonly used RGB color space, particularly
in consumer grade digital cameras, HD video cameras, and computer monitors.
HDTVs use a similar space, commonly called Rec. 709, sharing the sRGB
primaries. The sRGB space is considered adequate for most consumer applications.
Having all devices use the same color space is convenient in that an image does not
need to be converted from one color space to another before being displayed.
However, sRGB's limited gamut leaves out many highly saturated colors that can
be produced by printers or in film, and thus is not ideal for some high quality
applications. The wider gamut Adobe RGB is being built into more medium-grade
digital cameras, and is favored by many professional graphic artists for its
larger gamut.

HSV Color space:
HSV is one of the most common cylindrical-coordinate representations of points in
an RGB color model, which rearranges the geometry of RGB in an attempt to be
more intuitive and perceptually relevant than the Cartesian (cube) representation.
It was developed in the 1970s for computer graphics applications, and is used for
color pickers, in color-modification tools in image editing software, and less
commonly for image analysis and computer vision.
HSV stands for hue, saturation, and value, and is also often called HSB (B for
brightness). In the cylinder, the angle around the central vertical axis corresponds
to "hue", the distance from the axis corresponds to "saturation", and the distance
along the axis corresponds to "lightness", "value" or "brightness". Because HSV is a
simple transformation of device-dependent RGB models, the physical colors it
defines depend on the colors of the red, green, and blue primaries of the device or
of the particular RGB space, and on the gamma correction used to represent the
amounts of those primaries. Each unique RGB device therefore has a unique HSV
space to accompany it; HSV values describe a different color for each basis RGB
space.
HSV representations are widely used in computer graphics, and they are often more
convenient than RGB, but they are also criticized for not adequately separating
color-making attributes and for their lack of perceptual uniformity.
Basic Principle:
HSV has a cylindrical geometry, as shown in the figure below (fig. 2), with hue, the
angular dimension, starting at the red primary at 0°, passing through
the green primary at 120° and the blue primary at 240°, and then wrapping back to
red at 360°.

In each geometry, the central vertical axis comprises the neutral, achromatic,
or gray colors, ranging from black at lightness 0 or value 0, the bottom, to white at

lightness 1 or value 1, the top. In both geometries, the additive primary
and secondary colors – red, yellow, green, cyan, blue, and magenta – and linear
mixtures between adjacent pairs of them, sometimes called pure colors, are
arranged around the outside edge of the cylinder with saturation 1; in HSV these
have value 1. In HSV, mixing these pure colors with white – producing so-called
tints – reduces saturation, while mixing them with black – producing shades –
leaves saturation unchanged. In HSL, both tints and shades have full saturation, and
only mixtures with both black and white – called tones – have saturation less than 1.
Because these definitions of saturation – in which very dark (in both models) or
very light (in HSL) near-neutral colors are considered fully saturated – conflict
with the intuitive notion of color purity, often a conic or biconic solid is drawn
instead (fig. 2 shown below), with what this article calls chroma as its radial
dimension, instead of saturation. Confusingly, such diagrams usually label this
radial dimension "saturation", blurring or erasing the distinction between
saturation and chroma.

As described below, computing chroma is a helpful step in the derivation of each
model. Because such an intermediate model – with dimensions hue, chroma, and

HSV value or HSL lightness – takes the shape of a cone or bicone, HSV is often
called the "hexcone model" while HSL is often called the "bi-hexcone model".
Use In Image Analysis:
The HSV color model is often used in computer vision and image analysis for
feature detection or image segmentation. The applications of such tools include
object detection, for instance in robot vision; object recognition, for instance of
faces, text, or license plates; content-based image retrieval; and analysis of
medical images.
For the most part, computer vision algorithms used on color images are straightforward extensions of algorithms designed for grayscale images, for instance k-means or fuzzy clustering of pixel colors, or Canny edge detection. At the simplest,
each color component is separately passed through the same algorithm. It is
important, therefore, that the features of interest can be distinguished in the color
dimensions used. Because the R, G, and B components of an object’s color in a
digital image are all correlated with the amount of light hitting the object, and
therefore with each other, image descriptions in terms of those components make
object discrimination difficult. Descriptions in terms of hue/lightness/chroma or
hue/lightness/saturation are often more relevant.
Starting in the late 1970s, transformations like HSV or HSI were used as a
compromise between effectiveness for segmentation and computational complexity.
They can be thought of as similar in approach and intent to the neural processing
used by human color vision, without agreeing in particulars: if the goal is object
detection, roughly separating hue, lightness, and chroma or saturation is effective,
but there is no particular reason to strictly mimic human color response. The HSV
color model is widely used because its performance compares favorably with more
complex models and its computational simplicity remains compelling.
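A short sketch of a hue-based segmentation in MATLAB (assuming the sample image peppers.png is available; the hue and saturation thresholds are illustrative):

RGB  = im2double(imread('peppers.png'));
HSV  = rgb2hsv(RGB);                     % H, S and V planes, each in [0,1]
H    = HSV(:, :, 1);
S    = HSV(:, :, 2);
mask = (H < 0.05 | H > 0.95) & S > 0.3;  % roughly "reddish and reasonably saturated"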

YCbCr Color model:
YCbCr or Y′CbCr, sometimes written YCBCR or Y′CBCR, is a family of color spaces
used as a part of the color image pipeline in video and digital photography systems.
Y′ is the luma component and CB and CR are the blue-difference and red-difference
chroma components. Y′ (with prime) is distinguished from Y, which is luminance,
meaning that light intensity is non-linearly encoded using gamma correction.
Y′CbCr is not an absolute color space; it is a way of encoding RGB information.
The actual color displayed depends on the actual RGB primaries used to display
the signal. Therefore a value expressed as Y′CbCr is predictable only if standard
RGB primary chromaticities are used.
Cathode ray tube displays are driven by red, green, and blue voltage signals, but
these RGB signals are not efficient as a representation for storage and
transmission, since they have a lot of redundancy.
YCbCr and Y′CbCr are a practical approximation to color processing and
perceptual uniformity, where the primary colors corresponding roughly to red,
green and blue are processed into perceptually meaningful information. By doing
this, subsequent image/video processing, transmission and storage can do

operations and introduce errors in perceptually meaningful ways. Y′CbCr is used to
separate out a luma signal (Y′) that can be stored with high resolution or
transmitted at high bandwidth, and two chroma components (CB and CR) that can
be bandwidth-reduced, subsampled, compressed, or otherwise treated separately
for improved system efficiency.
One practical example would be decreasing the bandwidth or resolution allocated
to "color" compared to "black and white", since humans are more sensitive to the
black-and-white information.
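A minimal sketch of this separation, assuming the Image Processing Toolbox (rgb2ycbcr, imresize) and the sample image peppers.png:

RGB = imread('peppers.png');
YCC = rgb2ycbcr(RGB);               % Y' in plane 1, Cb in plane 2, Cr in plane 3
Y   = YCC(:, :, 1);                 % luma kept at full resolution
Cb  = imresize(YCC(:, :, 2), 0.5);  % chroma planes bandwidth-reduced by subsampling
Cr  = imresize(YCC(:, :, 3), 0.5);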

Implementation:
Color Based segmentation:
Assuming that a person framed in any random photograph is not an attendee at the
Renaissance Fair or Mardi Gras, it can be assumed that the face is not white, green,
red, or any unnatural color of that nature. While different ethnic groups have
different levels of melanin and pigmentation, the range of colors that human facial
skin takes on is clearly a subspace of the total color space. With the assumption of
a typical photographic scenario, it would be clearly wise to take advantage of face-color correlations to limit our face search to areas of an input image that have at
least the correct color components. In pursuing this goal, we looked at three color
spaces that have been reported to be useful in the literature, HSV and YCrCb
spaces, as well as the more commonly seen RGB space. Below we will briefly

describe what we found and how that knowledge was used in our system. The
result of this study is the construction of hyperplanes in the various color spaces
that may be used to separate colors. While elegant techniques like FLD and SVD,
etc. may be used to optimally construct the hyperplanes, we built ours more ad hoc
by varying the parameters of the separating lines and planes that we eventually
used.
A. HSV Color Space
While RGB may be the most commonly used basis for color descriptions, it has the
negative aspect that each of the coordinates (red, green, and blue) is subject to
luminance effects from the lighting intensity of the environment, an aspect which
does not necessarily provide relevant information about whether a particular image
”patch” is skin or not skin. The HSV color space, however, is much more intuitive
and provides color information in a manner more in line how humans think of
colors and how artists typically mix colors. ”Hue” describes the basic pure color of
the image, ”saturation” gives the manner by which this pure color (hue) is diluted
by white light, and ”Value” provides an achromatic notion of the intensity of the
color. It is the first two, H and S, that will provide us with useful discriminating
information regarding skin. Using the reference images (truth images) provided by
the teaching staff, we were able to plot the H,S, and V values for face and non-face
pixels and try to detect any useful trends. The results of this may be viewed in
figure 2. From those results it is seen that the H values tend to occupy very narrow
ranges towards both the bottom and top of its possible values. This is the most
noticeable trend and was used by us to derive the following rule used in our face
skin detection block:

19 < H < 240 ⇒ Not Skin;
and otherwise we assume that it is skin. By applying a mask based on this rule to
our sample image in figure 1, we have the remaining pixels seen in figure 3.
B. YCbCr Color Space
Similarly, we analyzed the YCbCr color space for any trends that we could take
advantage of to remove areas that are likely to not be skin. Relevant plots may be
viewed in 4. After experimenting with various thresholds, we found that the best
results were found by using the following rule:
102 < Cb < 128 ⇒ Skin;
and otherwise assume that it is NOT skin and may be removed from further
consideration.
C. RGB Color Space
Let’s not be too hard on our good friend the RGB color space...she still has some
useful things to offer us to take
advantage of in our project. While RGB doesn’t decouple the effects of luminance,
a drawback that we noted earlier, it is still able to perhaps allow us to remove
certain colors that are clearly out of the range of what normal skin color is. Please
refer to figure 6.
From studying and experimenting with various thresholds in RGB space, we found
that the following rule worked well in removing some unnecessary pixels:
0.836G - 14 < B < 0.836G + 44 ⇒ Skin
and
0.79G - 67 < B < 0.78G + 42 ⇒ Skin;
with other pixels being labeled as non-face and removed.
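A sketch of the three rules above combined into a single skin mask. The input file name is hypothetical, scaling H to a 0..255 range (to match the quoted thresholds) is an assumption about the convention used, and the Image Processing Toolbox is assumed for rgb2ycbcr and im2uint8:

RGB = im2double(imread('group_photo.jpg'));   % input photograph (hypothetical file)
HSV = rgb2hsv(RGB);
H   = 255 * HSV(:, :, 1);                     % hue rescaled to 0..255 (assumed scale)
YCC = rgb2ycbcr(im2uint8(RGB));
Cb  = double(YCC(:, :, 2));
G   = 255 * RGB(:, :, 2);
B   = 255 * RGB(:, :, 3);

skinHSV = ~(H > 19 & H < 240);                          % 19 < H < 240  =>  not skin
skinYCC = Cb > 102 & Cb < 128;                          % 102 < Cb < 128  =>  skin
skinRGB = (B > 0.836*G - 14) & (B < 0.836*G + 44) & ...
          (B > 0.79*G - 67)  & (B < 0.78*G + 42);       % RGB rules  =>  skin
skinMask = skinHSV & skinYCC & skinRGB;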

Lower Plane Masking:
While in general it would destroy the generality of a detector, in our case we
believe that it is reasonable to take advantage of a priori knowledge of where faces
are most likely to be, and not to be, in order to remove "noise". We observed in the
training images that no faces ever appeared in the lower third of the image field.
With very high probability, the same will be true in the scenarios where our system
will be used (i.e. the testing images), since we know that the conditions in which
the pictures were taken are identical. Hence, we removed the lower portion
of the image from consideration to remove the possibility of false alarms
originating from this region.
Morphological processing:
A. Applying the Open Operation
At this stage in the flow of our detector (figure 9) we have successfully removed
the vast majority of the original pixels from consideration, but we still see little
specks throughout the masked image. Although the image will subsequently be sent
through a matched filter, where the specks would be averaged out of consideration
and hence could be left in and simply ignored, it is preferable to remove them now
in order to speed future processing (i.e. the matched filter needn't perform any
wasteful calculations at these pixels). Hence the open (erode then dilate) operation
was performed using a 3x3 window of all 1s. The result of applying this additional step
is in figure 10. It is seen that the open operation has resulted in a huge
reduction in the number of small "noisy" specks.
B. Removal of Small Blobs and Grayscale Transformation
By "blobs", we simply mean the connected groups of pixels that remain at this
stage. Here we may apply a little additional knowledge

about the way the picture was taken...we know that the subjects in the photos were
standing relatively closely to one another and hence should have head sizes
(measured by number of pixels) that are relatively similar. The largest blobs should
be these heads and blobs considerably smaller than the larger blobs may be safely
assumed to be more ”noise”. In the particular sample image that we’ve been
looking at, the sizes of the blobs from figure 10 were measured and ranked. The
ranked sizes of the 195 remaining blobs are seen in figure 11. By removing blobs
that are below a given threshold size we can remove even more additional noise.
After experimenting with the given image studied in this report as well as the other
provided images, we found that a pixel size of 200 was a good threshold value.
Hence our blob size rule is:
Blob Size < 200 ⇒ Non-face Blob;
and hence such blobs may be removed. Finally,
we found that after this stage in our processing all the color information that
could be used within the level of sophistication feasible for this project had been
used, and that subsequent stages could be done in grayscale without any performance
degradation, but with the additional benefit of a faster system that need only
operate in one of the original three dimensions. Hence we now transform our
image to grayscale. This provides us with our final pre-processed image, which
may be seen in figure 12. It is important to note at this point one of the main
problems that we faced in this project. Note that the faces are retained at this stage
in the processing, but unfortunately we have been unable to resolve them into
separate blobs. Were the subjects standing with sufficient separation to allow this, we
could do almost all of our necessary face detection just by working with blobs and
their size statistics, etc. However, because the students in the photos are in very
close clusters, multiple faces have been grouped in single blobs. This leads to

complications that the template matching methodology (in our case at least) is
unable to cleanly resolve in some situations.
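A sketch of the cleanup stages described above (lower-third masking, the 3x3 open, removal of blobs below 200 pixels, and conversion to grayscale), assuming the Image Processing Toolbox and the skinMask and RGB variables from the earlier color-segmentation sketch:

[M, N] = size(skinMask);
skinMask(round(2*M/3):M, :) = false;      % discard the lower third of the image field

skinMask = imopen(skinMask, ones(3, 3));  % open = erode then dilate, 3x3 window of 1s
skinMask = bwareaopen(skinMask, 200);     % drop connected blobs smaller than 200 pixels

gray = rgb2gray(RGB);                     % remaining stages operate in grayscale
gray(~skinMask) = 0;                      % masked grayscale image for the matched filter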
MATCHED FILTERING (TEMPLATE MATCHING)
A. Template Design
The first task in doing template matching is to determine what template to use.
Intuitively, it seemed reasonable to us that the best template to use would be one
derived by somehow averaging some images of the students in the training
images that would likely be in the testing images. We would like to find a good
subset of the faces found in the training images that are clear, straight, and
representative of typical lighting/environmental conditions. It is also important that
these images be properly aligned and scaled with respect to one another. To this
end, we spent considerable time manually segmenting, selecting, and aligning face
photos. In the end we chose 30 face images, which may be seen in figure 13. In
order to have the template reflect the shape of the faces it is trying to detect, rather
than their particular coloring, etc. we applied histogram equalization to each image
and removed the means. This resulted in figure 14. Our final template is a result of
adding together the 30 face images in figure 14, giving us figure 15. The actual
template used in the matched filtering started at 30x30 pixels; by resizing this
template, its size was changed to cover different possible scalings in our test
image.
B. Application of Template for Face Detection
Our basic algorithm at this stage may be summarized as follows:

1) Resize the image through appropriate filtering and subsampling so that the
smallest head in the resulting image is likely to be no smaller than the initial size of
our template, 30x30 pixels
2) Convolve the masked grayscale image (figure 12) with the template. Normalize
the output by the energy in the template.
3) Look for peaks in the resulting output and compare them to a given range of
thresholds.
4) Any pixels that fall within the threshold range are deemed to be faces and are
marked as such. In order to help prevent the occurrence of false detections and
multiple detections of the same face, we subsequently mask out the pixels in the
reference grayscale image (figure 12) with a small vertical rectangle of a size
comparable to the template and large enough to cover most of the detected head
and neck regions.
5) The threshold range is reduced to a preset lower limit. Apply another stage of
convolving. If the lower limit is already reached, proceed to the next step below.
6) In order to detect larger scale faces, the template is enlarged and the thresholds
are reset to the upper limit. We again go through the convolution, detection,
and threshold reduction steps.
7) If an upper scale limit is reached, quit.
For an example of what the results are of a typical application of the template
matching step, please refer to figures 16 and 17. We see that there are peaks at the
locations of the faces, but that, due to the proximity of the faces, the peaks are
closely located in space. As mentioned in the steps of our algorithm, when a given
peak was determined to be a face pixel by virtue of falling into the threshold
interval, we then remove any pixels that have a high likelihood of being associated
with that single face from the masked image. An example of the masked image that
results from this may be seen in figure 18. It is worth mentioning that we found we
were able to detect tilted faces with reasonable reliability without necessarily
having to rotate the template through a range of different angles. It is worth asking
what sorts of results we would have received had we used a
template with a shape similar to that of a face...perhaps a Gaussian shaped
template. It leads to the question of how much of our results are due to "face
detection" as opposed to just "pixel energy detection". These are questions that we
hadn't sufficient time to investigate, although the question is a relevant one in
interpreting our results. Without additional experimental data, we will have to
refrain from further comment on this point.
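A rough sketch of the template-matching core, using normalized cross-correlation (normxcorr2 from the Image Processing Toolbox) in place of the normalized convolution described above; the file names and the threshold of 0.6 are hypothetical:

template = im2double(imread('face_template.png'));  % 30x30 averaged face template (hypothetical)
gray     = im2double(imread('masked_gray.png'));    % pre-processed grayscale image (hypothetical)

score  = normxcorr2(template, gray);     % peaks where the image resembles the template
thr    = 0.6;                            % detection threshold (assumed value)
[r, c] = find(score > thr);              % candidate locations in padded correlation coordinates
r = r - size(template, 1) + 1;           % shift to the top-left corner in image coordinates
c = c - size(template, 2) + 1;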

Software Used:

INTRODUCTION TO MATLAB
MATLAB is a high-performance language for technical computing. It
integrates computation, visualization and programming in an easy-to-use
environment. MATLAB stands for matrix laboratory. It was written originally to
provide easy access to matrix software developed by the LINPACK (linear system
package) and EISPACK (eigensystem package) projects. MATLAB is therefore
built on a foundation of sophisticated matrix software in which the basic element is
a matrix that does not require predimensioning.
Typical uses of MATLAB:
1. Math and computation
2. Algorithm development
3. Data acquisition
4. Data analysis, exploration and visualization
5. Scientific and engineering graphics

The main features of MATLAB:
1. Advanced algorithms for high-performance numerical computation, especially
in the field of matrix algebra
2. A large collection of predefined mathematical functions and the ability to
define one's own functions
3. Two- and three-dimensional graphics for plotting and displaying data
4. A complete online help system
5. Powerful, matrix- or vector-oriented high-level programming language for
individual applications
6. Toolboxes available for solving advanced problems in several application
areas

Fig: Components of the MATLAB system: the programming language (built-in and
user-written functions); graphics (2-D graphics, 3-D graphics, color and lighting,
animation); computation (linear algebra, signal processing, quadrature, etc.); the
external interface (interface with C and FORTRAN programs); and toolboxes
(signal processing, image processing, control systems, neural networks,
communications, robust control, statistics).
Features and capabilities of MATLAB
The MATLAB System
The MATLAB system consists of five main parts:
Development Environment

This is the set of tools and facilities that help you use MATLAB functions and
files. Many of these tools are graphical user interfaces. It includes the MATLAB
desktop and Command Window, a command history, an editor and debugger, and
browsers for viewing help, the workspace, files, and the search path.
The MATLAB Mathematical Function Library
This is a vast collection of computational algorithms ranging from elementary
functions, like sum, sine, cosine, and complex arithmetic, to more sophisticated
functions like matrix inverse, matrix eigenvalues, Bessel functions, and fast
Fourier transforms.
The MATLAB Language
This is a high-level matrix/array language with control flow statements, functions,
data structures, input/output, and object-oriented programming features. It allows
both "programming in the small" to rapidly create quick and dirty throw-away
programs, and "programming in the large" to create large and complex application
programs.
Graphics
MATLAB has extensive facilities for displaying vectors and matrices as graphs, as
well as annotating and printing these graphs. It includes high-level functions for
two-dimensional and three-dimensional data visualization, video processing,
animation, and presentation graphics. It also includes low-level functions that
allow you to fully customize the appearance of graphics as well as to build
complete graphical user interfaces in your MATLAB applications.

The MATLAB Application Program Interface (API)
This is a library that allows you to write C and Fortran programs that interact with
MATLAB. It includes facilities for calling routines from MATLAB (dynamic
linking), calling MATLAB as a computational engine, and for reading and writing
MAT-files.

Starting MATLAB
On Windows platforms, start MATLAB by double-clicking the MATLAB shortcut
icon on your Windows desktop. On UNIX platforms, start MATLAB by typing
matlab at the operating system prompt. You can customize MATLAB startup. For
example, you can change the directory in which MATLAB starts or automatically
execute MATLAB statements in a script file named startup.m.
MATLAB Desktop
When you start MATLAB, the MATLAB desktop appears, containing tools
(graphical user interfaces) for managing files, variables, and applications
associated with MATLAB. The following illustration shows the default desktop.
You can customize the arrangement of tools and documents to suit your needs. For
more information, see the documentation on the desktop tools.

MATLAB WORKING ENVIRONMENT:
MATLAB DESKTOP:- Matlab Desktop is the main Matlab application window.
The desktop contains five sub windows: the command window, the workspace
browser, the current directory window, the command history window, and one or
more figure windows, which are shown only when the user displays a graphic.
The command window is where the user types MATLAB commands and
expressions at the prompt (>>) and where the output of those commands is
displayed. MATLAB defines the workspace as the set of variables that the user
creates in a work session. The workspace browser shows these variables and some
information about them. Double-clicking on a variable in the workspace browser
launches the Array Editor, which can be used to obtain information about, and in
some instances edit, certain properties of the variable.
The Current Directory tab above the workspace tab shows the contents of the
current directory, whose path is shown in the current directory window. For
example, in the Windows operating system the path might be as follows:
C:\MATLAB\Work, indicating that the directory "work" is a subdirectory of the main
directory "MATLAB", which is installed in drive C. Clicking on the
arrow in the current directory window shows a list of recently used paths. Clicking
on the button to the right of the window allows the user to change the current
directory.

MATLAB uses a search path to find M-files and other MATLAB-related
files, which are organized in directories in the computer file system. Any file run in
MATLAB must reside in the current directory or in a directory that is on the search
path. By default, the files supplied with MATLAB and MathWorks toolboxes are
included in the search path. The easiest way to see which directories are on the
search path, or to add or modify the search path, is to select Set Path from the File
menu on the desktop, and then use the Set Path dialog box. It is good practice to
add any commonly used directories to the search path to avoid repeatedly having to
change the current directory.
The Command History Window contains a record of the commands a user
has entered in the command window, including both current and previous
MATLAB sessions. Previously entered MATLAB commands can be selected and
re-executed from the Command History window by right-clicking on a command
or sequence of commands. This action launches a menu from which to select
various options in addition to executing the commands. This is a useful feature
when experimenting with various commands in a work session.
Using the MATLAB Editor to create M-Files:
The MATLAB editor is both a text editor specialized for creating M-files and a
graphical MATLAB debugger. The editor can appear in a window by itself, or it
can be a sub window in the desktop. M-files are denoted by the extension .m, as in
pixelup.m. The MATLAB editor window has numerous pull-down menus for tasks
such as saving, viewing, and debugging files. Because it performs some simple

checks and also uses color to differentiate between various elements of code, this
text editor is recommended as the tool of choice for writing and editing M-functions.
To open the editor, type edit at the prompt; typing edit filename opens the M-file
filename.m in an editor window, ready for editing. As noted earlier, the file must be
in the current directory, or in a directory in the search path.

Getting Help:
The principal way to get help online is to use the MATLAB help browser,
opened as a separate window either by clicking on the question mark symbol (?) on
the desktop toolbar, or by typing help browser at the prompt in the command
window. The Help Browser is a web browser integrated into the MATLAB desktop
that displays Hypertext Markup Language (HTML) documents. The Help
Browser consists of two panes, the help navigator pane, used to find information,
and the display pane, used to view the information. Self-explanatory tabs on the
navigator pane are used to perform a search.
For example, help on a specific function is obtained by selecting the search tab,
selecting Function Name as the Search Type, and then typing in the function name
in the Search for field. It is good practice to open the Help Browser at the
beginning of a MATLAB session to have help readily available during code
development or other MATLAB task.
Another way to obtain help for a specific function is by typing doc followed by
the function name at the command prompt. For example, typing doc format

displays documentation for the function called format in the display pane of the
Help Browser. This command opens the browser if it is not already open.
M-functions have two types of information that can be displayed by the user. The
first is called the H1 line, which contains the function name and a one-line
description. The second is a block of explanation called the Help text block. Typing
help at the prompt followed by a function name displays both the H1 line and the
Help text for that function in the command window. Occasionally, this
information can be more up to date than the documentation of the M-function in
question. Typing lookfor followed by a keyword displays all the H1 lines that
contain that keyword.
This function is useful when looking for a particular topic without knowing
the names of applicable functions. For example, typing lookfor edge at the prompt
displays the H1 lines containing that keyword. Because the H1 line contains the
function name, it then becomes possible to look at specific functions using the
other help methods. Typing lookfor edge -all at the prompt displays the H1 line of
all functions that contain the word edge in either the H1 line or the Help text block.
Words that contain the characters edge also are detected. For example, the H1 line
of a function containing the word polyedge in the H1 line or Help text would also
be displayed.

Saving and Retrieving a Work Session:
There are several ways to save and load an entire work session or selected
workspace variables in MATLAB. The simplest is as follows.

To save the entire workspace, simply right-click on any blank space in the
workspace Browser window and select Save Workspace As from the menu that
appears. This opens a directory window that allows naming the file and selecting
any folder in the system in which to save it. Then simply click Save. To save a
selected variable from the workspace, select the variable with a left click and then
right-click on the highlighted area. Then select Save Selection As from the menu
that appears. This again opens a window from which a folder can be selected to
save the variable.
To select multiple variables, use shift-click or control-click in the familiar
manner, and then use the procedure just described for a single variable. All files are
saved in the double-precision, binary format with the extension .mat. These saved
files commonly are referred to as MAT-files.
For example, a session named, say, mywork_2003_02_10 would appear as the
MAT-file mywork_2003_02_10.mat when saved. Similarly, a saved video called
final_video will appear when saved as final_video.mat.
To load saved workspaces and/or variables, left-click on the folder icon on the
toolbar of the Workspace Browser window. This causes a window to open from
which a folder containing the desired MAT-file can be selected; selecting Open then
causes the contents of the file to be restored in the Workspace Browser window.
It is possible to achieve the same results described in the preceding paragraphs by
typing save and load at the prompt, with the appropriate file names and path
information. This approach is not as convenient, but it is used when formats other
than those available in the menu method are required.
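A minimal sketch of the command-line equivalent using save and load (variable and file names are illustrative):

A = magic(4);
save('mywork_2003_02_10.mat')       % save the entire workspace to a MAT-file
save('one_variable.mat', 'A')       % save only the variable A
clear                               % remove all variables from the workspace
load('mywork_2003_02_10.mat')       % restore the saved variables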
