
Non-Parametric Histogram Based Skin Modeling For Skin Detection

Shruthi M.L.J., Harsha B.K.


Asst. Prof., Dept. of Electronics and Communication, CMRIT, Bangalore, India.
(shruthimlc@gmail.com, harsha405@gmail.com)

Abstract - Digital image processing is a rapidly evolving field with increasing applications in science and engineering. Image processing has the potential to develop the ultimate device that can perform the visual functions of all living beings. One application of image processing is skin detection, the process of finding skin-colored pixels and regions in an image. Skin color arises from melanin and hemoglobin, but many other objects in the world are easily confused with skin: certain types of wood, copper, sand, as well as clothes, often have skin-like colors. There is therefore a need to properly formulate the skin detector, overcoming the difficulties that arise in detecting skin in an image. This paper presents a technique for skin detection.
Keywords - Bayesian classifier, skin probability, skin likelihood, thresholding, non-parametric, histogram.

I. INTRODUCTION
The advancements of image processing warrant a new set of algorithms to cater to modern-day challenges. To do so, various attributes of the image need to be considered. Skin color has proved to be a useful and robust cue for face detection [7][8][9], lips and face real-time tracking [11], target detection, detection of intruders at a border, and motion analysis.
Hitherto, skin color modeling has been developed by considering a large set of images on which a parametric histogram or explicitly defined skin regions are employed. Algorithms based on these methods suffer from an inability to adapt to a changing set of images. Many systems used for detecting people in user interface or video conferencing applications have employed skin color models. Histogram models are employed by Schiele and Waibel [1]; Kjeldsen and Kender model skin color as a single Gaussian [2], while Jebara et al. [3][10] employ a mixture density. The comparative performance of different skin chrominance models and chrominance spaces for the automatic detection of human faces in color images is studied in [6]. An elliptical boundary model is employed in [12]. Chinese skin detection is performed in [13], and a fusion approach to skin detection is employed in [14]. In all of these systems, the color model is trained on a small number of example images taken under a representative set of illumination conditions. Most of these works do not use non-skin models. These color models are effective in the context of a larger system, but they do not address the question of building a global skin model which can be applied to a larger set of images.
This paper proposes the construction of a statistical color model from a data set of unprecedented size: the model
includes nearly one billion labeled training pixels obtained from random crawls of the World Wide Web. From this data, a generic color model as well as separate skin and non-skin models are constructed. The histogram density estimated from this data is used to design a skin pixel classifier to detect skin.
II. SKIN DETECTION BASICS
Skin detection is defined as the process of detecting skin-colored pixels and regions in an image or video. Skin color arises from melanin and hemoglobin, but many other objects in the world are easily confused with skin: certain types of wood, copper, sand, as well as clothes, often have skin-like colors. There is therefore a need to properly formulate the skin detector, overcoming the difficulties that arise in detecting skin in an image.
2.1 A Framework for Skin Detection
The skin detection process consists of two phases: a training phase and a detection phase. Training a skin detector involves three basic steps:
- Collecting a database of skin patches from different images. Such a database typically contains skin-colored patches from a variety of people under different illumination conditions.
- Choosing a suitable color space.
- Learning the parameters of a skin classifier.
Given a trained skin detector, identifying skin pixels in a given image or video frame involves:
- Converting the image into the same color space that was used in the training phase.
- Classifying each pixel using the skin classifier as either skin or non-skin.
In any given color space, skin color occupies a part of the space, which might be a compact or a large region; such a region is usually called the skin color cluster. Skin classification is a one-class or two-class classification problem: a given pixel is classified and labeled as skin or non-skin given a model of the skin color cluster in a given color space. In the context of skin classification, true positives are skin pixels that the classifier correctly labels as skin, and true negatives are non-skin pixels that the classifier correctly labels as non-skin. Any classifier makes errors: it can wrongly label a non-skin pixel as skin or a skin pixel as non-skin. The former type of error is referred to as false positives (false detections) while the latter is false negatives. A good classifier should have low false positive and false negative rates. As in any classification problem, there is a
tradeoff between false positives and false negatives. The looser the class boundary, the fewer the false negatives and the more the false positives; the tighter the class boundary, the more the false negatives and the fewer the false positives. The same applies to skin detection.
This makes the choice of the color space extremely
important in skin detection. The color needs to be
represented in a color space where the skin class is most
compact in order to be able to tightly model the skin class.
The choice of the color space directly affects the kind of
classifier that should be used.
2.2 Skin Color model
A human skin color model is used to decide whether a color is skin or non-skin. The major requirements of a skin color model are listed below.
Very low false rejection rate at a low false detection rate: Color identification is the first step in skin detection. It is therefore imperative that almost all skin colors are detected while keeping the false detection rate low.
Detection of different skin color types: There are many skin color types, ranging from whitish and yellowish to blackish and brownish, which must all be classified into one class, skin color.
Ambiguity between skin and non-skin colors: Many objects in the environment have the same color as skin. In such instances even the human eye cannot determine whether a particular color comes from a skin or a non-skin region. An effective skin color model should handle this ambiguity between skin and non-skin colors.
Robustness to variation in lighting conditions: Skin color can appear different under different lighting. It is impractical to construct a skin color model that works under all possible lighting conditions; however, a good skin color model should exhibit some robustness to variations in lighting.
Once the skin color model is built, the next question is which skin modeling method should be used to classify a pixel as skin or non-skin with an acceptably low error rate. The next section describes the skin color modeling method used in the detector.
2.3 Skin Modeling
The final goal of skin color detection is to build a decision
rule that will discriminate between skin and non-skin
pixels. This is usually accomplished by introducing a metric which measures the distance of the pixel color to skin color. The type of this metric is defined by the skin color modeling method.
In this work, a non-parametric skin modeling method is used to estimate the skin color distribution from the training data without deriving an explicit model of skin color. A Bayes classifier is therefore used.
Skin detection is formulated as a standard two-class classification problem. Taking a color vector c as input, the detector produces a continuous output, the skin likelihood value (skin-ness), usually normalized to [0, 1], and finally produces a binary output (skin map), 1 for skin and 0 for non-skin, obtained by thresholding the skin likelihood image. This pixel-based skin detector works by sequentially and independently analyzing the color of each image pixel and labeling the pixel as skin or non-skin.
The following section gives the block diagram view of the
whole skin detector implementation.
2.4 Skin detector model
Fig. 1 Block diagram of the skin detector: input 2-D image -> skin probability -> skin likelihood -> thresholding -> skin map

The process followed in the implementation of the skin detector is shown in Fig. 1. The first step is to acquire an RGB test image in which the presence of skin is to be detected. The skin probability of each pixel is then calculated using the Bayes classifier. Next, the skin likelihood of each pixel is computed with respect to the trained set of images, yielding the skin likelihood image. This image is thresholded to obtain the skin map, a binary image in which 1 is assigned to skin pixels and 0 to non-skin pixels. The subsequent sections provide a detailed description of each step involved in the process of skin detection.
2.5 Skin probability computation
The human skin color model used in this work is based on
the Bayesian decision rule. The Bayesian model can be
described as follows. Let c be a color vector in a given
color space. Let P(c|skin) and P(c|non-skin) be the class-conditional probability density functions (pdfs) of the skin color and non-skin color classes, respectively. The color c is classified as skin color if:

P(c|skin) / P(c|non-skin) >= θ    (1)

The left term of (1) is known as the likelihood ratio, and the threshold θ is given by:

θ = P(non-skin) / P(skin)    (2)

The histogram technique is employed to estimate the class-conditional pdfs of skin and non-skin colors. This technique is viable here because the dimension of the feature vector c is low (at most 3) and a large set of skin and non-skin colors can be collected. It can be described as follows. From a set of labeled skin and non-skin pixels, two histograms Hskin(c) and Hnon-skin(c) are obtained, which are the counts of skin and non-skin pixels having value c, respectively. The class-conditional pdf values are estimated by simply normalizing the
histograms. These values are then used in (1) to discriminate between skin and non-skin colors.
The skin probability calculation is followed by the computation of the skin likelihood of each pixel in the image, which is detailed in the next section.
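As an illustration of the decision rule in (1) and (2) applied to the normalized histograms, the following is a minimal Python/NumPy sketch; the function and variable names are illustrative assumptions, not code from the paper.

import numpy as np

def make_pdf(histogram):
    """Normalize raw bin counts into a class-conditional pdf estimate."""
    return histogram / histogram.sum()

def is_skin(c, H_skin, H_nonskin, prior_skin):
    """Likelihood-ratio test of (1) with the threshold theta of (2).

    c          : (r, g, b) tuple of integer bin indices
    H_skin     : count array of skin pixels per color bin
    H_nonskin  : count array of non-skin pixels per color bin
    prior_skin : P(skin), e.g. the fraction of labeled pixels that are skin
    """
    p_c_skin = make_pdf(H_skin)[c]           # P(c | skin)
    p_c_nonskin = make_pdf(H_nonskin)[c]     # P(c | non-skin)
    theta = (1.0 - prior_skin) / prior_skin  # threshold of (2)
    if p_c_nonskin == 0:
        return p_c_skin > 0                  # color seen only as skin in training
    return (p_c_skin / p_c_nonskin) >= theta

In practice the normalized histograms would be precomputed once rather than inside every call; the sketch keeps the two steps together only to mirror the description above.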
2.6 Skin Likelihood Computation
For a particular histogram bin (i.e. pixel color), the log likelihood of it being skin is calculated using (3):

Skin likelihood = log(H(r,g,b) / h(r,g,b))    (3)

where H(r,g,b) is the skin histogram and h(r,g,b) is the non-skin histogram. For a test image, the log likelihood of each pixel is computed using (3) to obtain the skin likelihood image.
The skin likelihood image is then thresholded to decide whether each pixel is skin or non-skin: non-skin pixels are assigned the value 0 and skin pixels the value 1. The resulting binary image is called the skin map.
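A minimal Python/NumPy sketch of (3) applied to a whole image follows (illustrative names; the small constant added to both histograms is an assumption to avoid log(0) for colors never seen in training, not something specified in the paper). With the threshold fixed at zero, as in Section IV, a pixel is labeled skin whenever its skin count exceeds its non-skin count.

import numpy as np

def skin_likelihood_image(image_rgb, H, h, eps=1.0):
    """Per-pixel log likelihood of (3): log(H(r,g,b) / h(r,g,b)).

    image_rgb : (rows, cols, 3) uint8 RGB image
    H, h      : skin and non-skin histograms (raw counts), indexed by (r, g, b)
    eps       : small count added to both histograms so that empty bins do not
                produce log(0) or division by zero (an implementation choice)
    """
    r = image_rgb[..., 0].astype(np.intp)
    g = image_rgb[..., 1].astype(np.intp)
    b = image_rgb[..., 2].astype(np.intp)
    return np.log((H[r, g, b] + eps) / (h[r, g, b] + eps))

def skin_map(likelihood_image, threshold=0.0):
    """Threshold the skin likelihood image: 1 for skin, 0 for non-skin."""
    return (likelihood_image > threshold).astype(np.uint8)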
III. NON-PARAMETRIC HISTOGRAM BASED SKIN MODELING

A central task in visual learning is the construction of statistical models of image appearance from pixel data. A solution consists of a representation of image appearance, a learning algorithm, and a source of training images. When the amount of available training data is small, sophisticated learning algorithms may be required to interpolate between samples. However, as a result of the World Wide Web, the vision community today has access to image libraries of unprecedented size and richness. These large data sets can support simple, computationally efficient learning algorithms. However, a data set such as web images constitutes a biased sample from the space of possible imagery. The following section describes the
construction and visualization of histogram color models.
3.1 Histogram Color Models
There are two issues that must be addressed in building a
color histogram model: the choice of color space and the
size of the histogram, which is measured by the number of
bins per color channel. The color images fit naturally into
a 24 bit color representation, since high quality color
images require 24 bits and images with coarser color
resolutions can be mapped into it. In contrast, the size of
the histogram depends upon the task. The starting point
for color analysis is the direct construction of a histogram
color model in 24 bit RGB color space. Such a model has
a size of 256 bins per color channel, which corresponds to more than 16.7 million (256^3) bins, each mapped to a specific (R, G, B) color triple.
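As a sketch of how such a histogram can be accumulated from labeled training pixels, the following Python/NumPy listing is illustrative only (the paper does not give an implementation); lowering bins_per_channel yields the coarser samplings discussed later in the paper.

import numpy as np

def build_rgb_histogram(pixels, bins_per_channel=256):
    """Accumulate an RGB color histogram from labeled training pixels.

    pixels           : (N, 3) uint8 array of (R, G, B) values
    bins_per_channel : 256 gives one bin per 24-bit color triple;
                       smaller values give coarser samplings
    """
    width = 256 // bins_per_channel          # color values mapped to one bin
    idx = pixels.astype(np.intp) // width    # per-channel bin index
    hist = np.zeros((bins_per_channel,) * 3, dtype=np.int64)
    # np.add.at accumulates correctly even when the same bin repeats.
    np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return hist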
The dataset for the experiments described in this paper was obtained by a large crawl of the web, which produced about 3 million images (including icons and graphics). A smaller set of images was randomly sampled from this large set and cleared of all icons and graphics by hand. This produced a set of 18,696 photographs, which was then manually separated into a set of 9731 images containing skin and 8965 images not containing any skin. This is a dataset of nearly 2 billion pixels, two orders of magnitude more data than the number of degrees of freedom in a histogram model of size 256. A subset of 13,640 photos is used to build specialized skin and non-skin color models. The regions of skin in 4675 skin images were segmented by hand; this set, in conjunction with the 8965 non-skin images, gives a total of nearly 1 billion labeled pixels.
3.2 General Color Model
It is necessary to first learn a general color model using a histogram of size 256 in RGB space [14]. Each of the three histogram dimensions is divided into 256 bins, and each bin stores an integer counting the number of times that color value occurred in the entire database of images. The pixels in the 18,696 photograph dataset were used to populate the histogram, and the histogram counts were converted into a discrete probability distribution. To visualize the probability distribution, a software tool was developed for viewing the histogram as a 3-D model in which each bin is rendered as a cube whose size is proportional to the number of counts it contains. The color of each cube corresponds to the smallest RGB triple which is mapped to that bin in the histogram. Fig. 2(a) shows a sample view of the histogram produced by the tool. This rendering uses a perspective projection with a viewing direction along the green-magenta axis, which joins the corners (0, 255, 0) and (255, 0, 255) of the color space. The viewpoint was chosen to orient the gray line horizontally; the gray line is the projection of the gray axis, which connects the black (0, 0, 0) and white (255, 255, 255) corners of the cube.
The histogram in Fig. 2(a) is of size 8 and only shows bins with counts greater than 336,818. Down-sampling and thresholding the full-size model make the global structure of the distribution more visible, and by examining the 3-D histogram from several angles its overall shape can be inferred.
Another visualization of the model can be obtained by computing its marginal distribution along a viewing direction and plotting the resulting 2-D density function as a surface. Fig. 2(b) shows the marginal distribution that results from integrating the 3-D histogram along the same green-magenta axis used in Fig. 2(a). The positions of the black-red and black-green axes under projection are also shown. The density is concentrated along a ridge which follows the gray line from black to white. White has the highest likelihood, followed closely by black.
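The marginal densities behind Fig. 2(b) and 2(d) are obtained by integrating the 3-D histogram along a viewing direction. The figures integrate along diagonal axes of the color cube (green-magenta and gray), which requires projecting the cube first; the basic marginalization step itself is just a sum along one axis, as in this small illustrative sketch (not the paper's code).

import numpy as np

def marginal(prob, axis=2):
    """Integrate a discrete 3-D color distribution along one color axis,
    giving a 2-D density that can be plotted as a surface or contour map.

    prob : (B, B, B) array summing to 1
    axis : 0, 1 or 2 for the R, G or B axis respectively
    """
    return prob.sum(axis=axis)   # shape (B, B); still sums to 1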
Additional information about the shape of the surface in Fig. 2(b) can be obtained by plotting its equiprobability contours, shown in Fig. 2(c); they were obtained with the contour function in Matlab. It is useful to compare Fig. 2(c) with Fig. 2(a) as they are drawn from
the same viewpoint. This plot reinforces the conclusion that the density is concentrated around the gray line and is more sharply peaked at white than at black. An intriguing feature of this plot is the bias in the distribution towards red. This bias is clearly visible in Fig. 2(d), which shows the contours produced by a different marginal density, obtained by integrating along the gray axis. The distribution shows a marked asymmetry with respect to the axis of projection, which is oriented at approximately 30 degrees to the red line in Fig. 2.

Fig. 2 Four visualizations of a full color RGB histogram model constructed from nearly 2 billion web image pixels. (a) 2-D rendering of the 3-D histogram. (b) Surface plot of the marginal density formed by integrating the model along the green-magenta viewing direction in (a). (c) Equiprobability contours of the surface plot in (b). (d) Contour plot for an integration of (a) along the gray axis.

In summary, the generic color model built from web images has three properties:
1. Most colors fall on or near the gray line.
2. Black and white are by far the most frequent colors, with white occurring slightly more often.
3. There is a marked skew in the distribution toward the red corner of the color cube.
A generic color model can be specialized to describe particular classes of objects if labels are available for the training pixels.
3.3 Skin and Non-skin Color Models
The color of skin in images depends primarily on the concentration of hemoglobin and melanin and on the conditions of illumination. It is well known that the hue of skin is roughly invariant across different ethnic groups after the illuminant has been discounted, because differences in the concentration of pigments primarily affect the saturation of skin color, not the hue. Unfortunately, we do not know the illumination conditions in an arbitrary image, and so the variation in skin colors is much less constrained in practice. This is particularly true for web images captured under a wide variety of conditions. However, given a large collection of labeled training pixels, a model can still be formed from the distribution of skin and non-skin colors in un-normalized color space.
Skin and non-skin histogram models are constructed using the 13,640 photo dataset. The skin pixels in the 4675 images containing skin were labeled manually and placed into the skin histogram; the 8965 images that did not contain skin were placed into the non-skin histogram. Given skin and non-skin histogram models, a skin pixel classifier can be constructed. Such a classifier could be extremely useful in two contexts. First, for applications such as the detection and recognition of faces and figures, skin is a useful low-level cue that can be used to focus attention on the most relevant portions of an image. A second role for skin pixel detection is in image indexing and retrieval, where the presence of skin pixels in a photo is an attribute that could support queries or categorization.
The key step in skin pixel classification is the computation of P(skin|rgb), which is given by Bayes rule as in (4):

P(skin|rgb) = P(rgb|skin) P(skin) / [P(rgb|skin) P(skin) + P(rgb|non-skin) P(non-skin)]    (4)

A particular RGB value is labeled skin if the condition in (5) is satisfied:

P(skin|rgb) >= Θ    (5)

where Θ is a threshold. P(skin) and P(non-skin) are the prior probabilities of any color value being skin or non-skin, respectively. Since P(skin) + P(non-skin) = 1, only one of these priors needs to be specified. One reasonable choice for the prior probability of skin is the ratio of the total number of skin pixels in the histograms to the total number of pixels.
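A hedged Python/NumPy sketch of (4) and (5) follows (illustrative names; it assumes, as the text suggests, that P(skin) is taken as the fraction of labeled training pixels that are skin, and the default threshold value is an assumption).

import numpy as np

def posterior_skin(H_skin, H_nonskin, prior_skin=None):
    """P(skin | rgb) for every color bin, following Bayes rule (4).

    H_skin, H_nonskin : count arrays over the RGB bins
    prior_skin        : P(skin); if None, the ratio of skin pixels to all
                        labeled pixels is used, as suggested in the text
    """
    if prior_skin is None:
        prior_skin = H_skin.sum() / (H_skin.sum() + H_nonskin.sum())
    p_rgb_skin = H_skin / H_skin.sum()            # P(rgb | skin)
    p_rgb_nonskin = H_nonskin / H_nonskin.sum()   # P(rgb | non-skin)
    num = p_rgb_skin * prior_skin
    den = num + p_rgb_nonskin * (1.0 - prior_skin)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(den > 0, num / den, 0.0)  # 0 for bins never observed

def is_skin_rgb(posterior, rgb_bin, theta=0.5):
    """Label an RGB bin as skin when P(skin | rgb) >= theta, as in (5)."""
    return posterior[rgb_bin] >= theta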
The use of color spaces other than RGB (such as YUV or
HSV) will not improve the performance of the skin
detector. Detector performance depends entirely on the
amount of overlap between the skin and non-skin samples.
Colors which occur in both classes with comparable
frequencies cannot be classified reliably. No fixed global
transformation between color spaces can affect this
overlap. On the other hand, color normalization which
adjusts the colors in an image based on its global
properties could be beneficial in separating skin colors
from non-skin colors.
3.4 Histogram-based Skin Classifier
A series of experiments were conducted with histogram
color models using the skin classifier defined by (3). For
these experiments, the collection of photos was divided
into separate training and testing sets. Skin and non-skin
color models were constructed from a 6822 photo training
set. In this case there were 4483 training photos which
formed the non-skin color model and 2339 training photos
which formed the skin color model. From a 6818 photo testing set (4482 non-skin and 2336 skin photos), two populations of labeled skin and non-skin pixels were
obtained, which were used to test the classifier performance.
Classifier performance can be quantified by computing
the ROC curve which measures the threshold-dependent
trade-off between misses and false detections [5]. In
addition to the threshold setting, classifier performance is
also a function of the size of the histogram (number of
bins) in the color models. Too few bins results in poor
accuracy while too many bins lead to over-fitting.
Fig. 3 shows the family of ROC curves produced as the size of the histogram varies from 256 bins/channel down to 16. The axis labeled "Probability of correct detection" gives the fraction of skin pixels that were classified correctly, while "Probability of false detection" gives the fraction of non-skin pixels mistakenly classified as skin. These curves were computed from the test data. Histogram size 32 gave the best performance, superior to the size 256 model at the larger false detection rates and slightly better than the size 16 model.

Fig. 3 ROC curves for the skin detector as a function of histogram size.
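Such ROC curves can be traced by sweeping the decision threshold over the classifier outputs of the labeled test pixels; a minimal illustrative Python/NumPy sketch (not the paper's code) is given below.

import numpy as np

def roc_curve(skin_scores, nonskin_scores, num_points=100):
    """Trade-off between correct and false detections as the threshold varies.

    skin_scores    : classifier outputs (e.g. log likelihood ratios) for
                     pixels that are truly skin
    nonskin_scores : outputs for pixels that are truly non-skin
    Returns arrays (false_detection_rate, correct_detection_rate).
    """
    lo = min(skin_scores.min(), nonskin_scores.min())
    hi = max(skin_scores.max(), nonskin_scores.max())
    thresholds = np.linspace(lo, hi, num_points)
    tpr = [(skin_scores >= t).mean() for t in thresholds]     # correct detections
    fpr = [(nonskin_scores >= t).mean() for t in thresholds]  # false detections
    return np.asarray(fpr), np.asarray(tpr)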
In addition to histogram size, classifier performance is also affected by the amount of training data. The performance of the skin classifier was therefore tested as the amount of training data was increased. A 256^3 histogram model was used for these tests. To do this, the list of skin and non-skin images in the training set was divided into chunks containing approximately 2.5 million skin pixels and 28 million non-skin pixels. In each iteration, one such chunk of new skin and non-skin pixels was added to the evolving training set. As more data is added, performance on the training set decreases because the overlap between skin and non-skin data increases. Performance on the test set improves because the test and training distributions become more similar as the amount of training data increases. Performance on both training and test sets converges relatively quickly. During this research, photos selected at random from a larger set were added to the model until the ROC curves had converged; the final total of 13,640 photos corresponds to this stopping point.
In the final histogram experiment, the performance of models trained on a small set of data sampled according to the distribution of skin and non-skin colors in the full training set was tested. 387,172 skin pixels and 4,261,703 non-skin pixels (1% of the training data) were sampled and histogram models were built from these samples. Histograms with different numbers of bins were tried in order to find the optimal histogram size. The resulting model is almost as good as the histogram model using the full training set. This demonstrates that while a large data set is necessary to capture the underlying distribution of skin and non-skin colors, it is sufficient to train models on a smaller set of samples.
There are two clear advantages of the non-parametric methods: they are fast in training and usage, and they are theoretically independent of the shape of the skin distribution. The disadvantages are the large storage space required and the inability to interpolate or generalize the training data. If, for example, RGB is quantized to 8 bits per color, an array of 2^24 elements is needed to store the skin probabilities. To reduce the amount of memory needed and to account for possibly sparse training data, coarser color space samplings are used - 128x128x128, 64x64x64 and 32x32x32. The evaluation of different RGB samplings has shown that 32x32x32 gives the best performance.
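To make the storage argument concrete, a small illustrative sketch (not from the paper): quantizing each channel to 32 bins reduces the lookup table from 2^24 = 16,777,216 entries to 32^3 = 32,768, and the per-pixel lookup reduces to an integer division.

import numpy as np

BINS = 32                  # the 32x32x32 sampling reported to perform best
WIDTH = 256 // BINS        # 8 consecutive color values share one bin

def quantize(image_rgb):
    """Map 24-bit RGB pixels to bin indices of a 32x32x32 histogram."""
    return image_rgb.astype(np.intp) // WIDTH

full_size = 256 ** 3       # 16,777,216 bins at 8 bits per channel
coarse_size = BINS ** 3    # 32,768 bins after quantization
print("storage reduced by a factor of", full_size // coarse_size)   # 512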
IV. RESULTS AND DISCUSSION
The implementation of skin detection involves two main steps: skin likelihood computation and thresholding. The skin-ness of a given image is calculated with the threshold fixed at zero. Fig. 4 shows an input image which, when subjected to the designed system, yields the results shown in the figures that follow.

The analysis of the GUI is discussed through the following cases:
Case 1: When the input image button is clicked, Fig. 4 appears. Select the folder where the images are stored.
Case 2: When the skin likelihood button is clicked, Fig. 5 appears.
Case 3: When the skin map button is clicked, Fig. 6 appears.
The skin-ness detected for different sets of images is tabulated in Table I using (6):

S = log(H(r,g,b) / h(r,g,b))    (6)

Table I clearly shows that the designed skin detector is image independent and can adapt to a large set of images. The results demonstrate the performance of the classifier: the false detection rate is low, indicating good performance of the detector.

Fig. 4 Input image
Fig. 5 Skin likelihood
Fig. 6 Skin map

V. CONCLUSION AND FUTURE SCOPE


The skin detector is a powerful preprocessing technique that enables accurate high-level image processing. Skin detection adapts to a changing environment, thereby paving the way for the design of a robust system.
Any classifier has its own limitations: the performance of the designed skin detector fades if the input image contains shadows. One possible solution would be to employ IR and visual imagery together.
TABLE I
SKIN-NESS MEASURE FOR DIFFERENT SETS OF IMAGES

Sl. No. | Image                            | Minimum | Maximum
1       | Group (more than 20)             | -15.79  | 6.10
2       | Single (clear)                   | -15.79  | 7.49
3       | Double (two in daylight)         | -14.67  | 5.46
4       | Blur image of camouflage (group) | -14.67  | 2.87
5       | Blur single camouflage           | -14.32  | 1.25
6       | Group camouflage (flight)        | -13.29  | 5.51
7       | Burka image                      | -15.79  | 7.49
8       | Single camouflage (clear)        | -15.79  | 5.92
9       | Single (in the room)             | -14.32  | 7.49

REFERENCES

[1] B. Schiele and A. Waibel, "Gaze tracking based on face-color," in Proceedings of the International Workshop on Automatic Face- and Gesture-Recognition, Zurich, Switzerland, June 26-28, 1995, pp. 344-349.
[2] R. Kjeldsen and J. Kender, "Finding skin in color images," in Proceedings of the International Conference on Automatic Face and Gesture Recognition, Killington, VT, October 14-16, 1996, pp. 312-317.
[3] T. S. Jebara and A. Pentland, "Parameterized structure from motion for 3D adaptive feedback tracking of faces," in Proc. Computer Vision and Pattern Recognition, San Juan, Puerto Rico, June 17-19, 1997, pp. 144-150.
[4] B. Menser and M. Wien, "Segmentation and tracking of facial regions in color image sequences," in Proc. SPIE Visual Communications and Image Processing, 2000, pp. 731-740.
[5] M. J. Jones and J. M. Rehg, "Statistical color models with application to skin detection," in Proc. CVPR '99, vol. 1, 1999, pp. 274-280.
[6] J.-C. Terrillon, M. N. Shirazi, H. Fukamachi, and S. Akamatsu, "Comparative performance of different skin chrominance models and chrominance spaces for the automatic detection of human faces in color images," in Proc. of the International Conference on Face and Gesture Recognition, 2000, pp. 54-61.
[7] R.-L. Hsu, M. Abdel-Mottaleb, and A. K. Jain, "Face detection in color images," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696-706, 2002.
[8] M.-H. Yang and N. Ahuja, "Detecting human faces in color images," in International Conference on Image Processing (ICIP), vol. 1, 1998, pp. 127-130.
[9] E. Saber and A. Tekalp, "Frontal-view face detection and facial feature extraction using color, shape and symmetry based cost functions," Pattern Recognition Letters, vol. 9, pp. 669-680, 1998.
[10] S. McKenna, S. Gong, and Y. Raja, "Modelling facial colour and identity with Gaussian mixtures," Pattern Recognition, vol. 31, no. 12, pp. 1883-1892, 1998.
[11] N. Oliver, A. Pentland, and F. Berard, "LAFTER: Lips and face real time tracker," in Proc. Computer Vision and Pattern Recognition, 1997, pp. 123-129.
[12] J. Y. Lee and S. I. Yoo, "An elliptical boundary model for skin color detection," in Proc. of the 2002 International Conference on Imaging Science, Systems, and Technology, 2002.
[13] W. Xiong and Q. Li, "Chinese skin detection in different color spaces," in Wireless Communications & Signal Processing (WCSP), 2012 International Conference on, 2012, pp. 1-5.
[14] W. R. Tan, C. S. Chan, P. Yogarajah, and J. Condell, "A fusion approach for efficient human skin detection," IEEE Transactions on Industrial Informatics, vol. 8, pp. 138-147, 2012.