
GRAPHICAL PRESENTATIONS 1 (REPRESENTATION OF CATEGORICAL DATA)

Applied Statistics and Computing Lab Indian School of Business


Learning Goals
Why do we use graphs?
Basic graphs for categorical data


Why use graphs?


A market survey firm is conducting a survey on the popularity of different makes of car in the USA. To do this, it visits various car showrooms and lists the makes of car in each showroom. Suppose it obtains the following data on the makes of 28 cars in one such showroom:

Buick, Cadillac, Buick, Chevrolet, Buick, Buick, Buick, Pontiac, Cadillac, Chevrolet, SAAB, SAAB, SAAB, Cadillac, Chevrolet, Pontiac, Buick, Buick, Buick, SAAB, Cadillac, SAAB, Pontiac, SAAB, Chevrolet, Buick, SAAB, Cadillac.

Relevant questions:
Which is the most common make of car in the showroom?
Which is the least common make of car in the showroom?

How do we answer these questions?
Probably the first thing you do is count the observations for each make of car and note them down.
Once you note them down in tabular form, you have a frequency table.
A frequency table for categorical data is a table that displays the possible categories along with the associated frequencies or relative frequencies.

Table 1: Frequency Table


Make         Frequency
Buick        9
Cadillac     5
Chevrolet    4
Pontiac      3
SAAB         7

At one glance, the answers are there! Buick is the most common make in that showroom and Pontiac is the least common.
A frequency table is the simplest representation of data in tabular form.
The same data can be represented pictorially in a number of ways!
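The counting step can be sketched in R as follows. This is only an illustration using base R's table() and prop.table(); the variable names (car_makes, freq, rel_freq, most_common, least_common) are assumptions made for this sketch and do not come from the original slides.

```r
# Makes of the 28 cars observed in the showroom (data from the example)
car_makes <- c("Buick", "Cadillac", "Buick", "Chevrolet", "Buick", "Buick",
               "Buick", "Pontiac", "Cadillac", "Chevrolet", "SAAB", "SAAB",
               "SAAB", "Cadillac", "Chevrolet", "Pontiac", "Buick", "Buick",
               "Buick", "SAAB", "Cadillac", "SAAB", "Pontiac", "SAAB",
               "Chevrolet", "Buick", "SAAB", "Cadillac")

# Frequency table: number of cars of each make
freq <- table(car_makes)
print(freq)

# Relative frequencies: share of the 28 cars accounted for by each make
rel_freq <- prop.table(freq)
print(round(rel_freq, 2))

# Most and least common makes, read off the frequency table
most_common  <- names(freq)[which.max(freq)]
least_common <- names(freq)[which.min(freq)]
print(c(most = most_common, least = least_common))
```

The same frequencies could also be passed to barplot(freq) to obtain a simple bar chart of the makes.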

Different types of graphs for representing categorical data


Graphs for presenting categorical data:
Frequency Table
Bar Chart (including Multiple Bar and Divided or Segmented Bar)
Pie Chart


Why different graphs? Illustration through examples

Consider the following frequency table of the population of Andhra Pradesh in 2011:

Table 2: Population of Andhra Pradesh in 2011

Category        Population in 2011
Rural Male      28219760
Rural Female    28092028
Urban Male      14290121
Urban Female    14063624


Bar Chart
A bar chart is a graph of the frequency distribution of categorical data.
Each category in the frequency distribution is represented by a bar or rectangle.
The bars have equal width, so the height (and hence the area) of each bar is proportional to the corresponding frequency.
A bar chart may be vertical or horizontal. The example used here is vertical.
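As a small illustration of the vertical and horizontal variants, the sketch below draws both for the 2011 data using base R's barplot(). The names pop2011 and mids are assumptions made for this example; the horiz and las arguments are standard barplot() and par() options.

```r
# Population of Andhra Pradesh in 2011 by category (figures from Table 2)
pop2011 <- c(RuralMale = 28219760, RuralFemale = 28092028,
             UrbanMale = 14290121, UrbanFemale = 14063624)

# Vertical bar chart (the default): bar height encodes the frequency
barplot(pop2011 / 1e6,
        ylab = "Population (millions)",
        main = "AP population in 2011")

# Horizontal bar chart: same data, bar length encodes the frequency.
# barplot() returns the bar midpoints, which we keep in 'mids'.
mids <- barplot(pop2011 / 1e6, horiz = TRUE, las = 1,
                xlab = "Population (millions)",
                main = "AP population in 2011")
```

A horizontal layout tends to be easier to read when the category labels are long.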

Inferences
For policy reasons, one is interested in the composition of the population of Andhra Pradesh.
From a bar chart, we can read off the most and least frequently occurring categories: rural male has the highest count and urban female the lowest.
However, one may also be interested in the relative share of each category rather than the absolute figure.
Visually, the rural male and rural female bars seem almost equal in the bar chart, and so do the urban male and urban female bars.
To analyze their relative shares, we look at a pie chart.

Pie Chart

A circle is used to represent the whole data set.
Slices of the pie represent the possible categories.
The area of the slice for a particular category is proportional to the corresponding relative frequency.
When to use: categorical data with a relatively small number of categories. If there are many categories, merge a few of them into one.
Most useful for illustrating the proportions of the whole data set accounted for by the various categories.
Criticism: not an effective visual display if one doesn't specify the percentages.
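The relative frequencies behind the pie slices can be computed directly. The following is a minimal sketch for the 2011 figures; the names pop2011, share and lbls are chosen here for illustration.

```r
# Population of Andhra Pradesh in 2011 by category (figures from Table 2)
pop2011 <- c(RuralMale = 28219760, RuralFemale = 28092028,
             UrbanMale = 14290121, UrbanFemale = 14063624)

# Relative frequency of each category, as a percentage of the total
share <- round(100 * pop2011 / sum(pop2011), 1)
print(share)

# Attach the percentages to the labels, addressing the criticism that
# a pie chart without percentages is hard to read
lbls <- paste0(names(pop2011), " ", share, "%")
pie(pop2011, labels = lbls, main = "AP population in 2011")
```

Printing the percentages makes visible the small differences between rural male and rural female, and between urban male and urban female, that are hard to judge by eye.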

Multiple Bar Graph


Suppose the government of AP is interested in population control. It wants to know how the population in each category changed in the decade from 2001 to 2011, so that it knows which section to target.
We have a bar plot and a pie chart of the population in 2011. Draw the same for 2001. Can we compare them graphically and readily answer the question? (Check!)
A pie chart and a bar chart allow comparison between categories within a year, but not across years.
To facilitate such a comparison, a multiple bar graph is used.

Category        Population in 2011    Population in 2001
Rural Male      28219760              27937204
Rural Female    28092028              27463863
Urban Male      14290121              10590209
Urban Female    14063624              10218731

Multiple Bar Graph


What is a multiple bar graph?

A variant of the bar diagram.
Used to compare two or more series of data on the same variable, or to show different components of an item.
Several sets of bars are drawn so that the bars for a particular period (here 2001 and 2011) or related phenomenon are placed together, with a uniform gap between any two sets of bars.

Inferences:

The most marked increase in population is in the urban female category, followed closely by urban male.
There is only a very slight increase in the rural male and rural female categories.
Possible cause: rural to urban migration. This needs investigation.
Since every category increased from 2001 to 2011, the total population also increased over this period.
However, we cannot visualize exactly how much the total population increased from 2001 to 2011.
Also, there is no information about the relative contribution of each category to the total.
For this, use a segmented (stacked) bar plot.
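The inferences above can be checked numerically. The sketch below computes the absolute and percentage change per category; the names increase and pct_increase are assumptions made for this illustration, and the columns are ordered chronologically here for readability.

```r
# Population by category and year (figures from the table above)
APPopulation <- cbind("2001" = c(27937204, 27463863, 10590209, 10218731),
                      "2011" = c(28219760, 28092028, 14290121, 14063624))
rownames(APPopulation) <- c("RuralMale", "RuralFemale",
                            "UrbanMale", "UrbanFemale")

# Absolute increase in each category from 2001 to 2011
increase <- APPopulation[, "2011"] - APPopulation[, "2001"]

# Percentage increase relative to the 2001 population
pct_increase <- round(100 * increase / APPopulation[, "2001"], 1)

print(cbind(increase, pct_increase))
```

The two urban categories show increases of several million each, while the rural categories grow by well under a million, which is consistent with what the multiple bar graph shows.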


Segmented Bar Graph: Year wise


A segmented bar graph uses a rectangular bar rather than a circle to represent the entire data set.
The bar is then divided into segments, with different segments representing different categories.
The size of the segment for a particular category is proportional to the relative frequency for that category.

Inferences
Gives an idea of the total increase in population from 2001 to 2011.
Also shows the relative share of each category in each year.
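The quantities that the year-wise segmented bar graph displays (the total per year and the share of each category within a year) can be sketched in R as follows. The names totals, growth and shares are illustrative assumptions; colSums() and prop.table() are base R functions.

```r
# Population by category and year (figures from the slides)
APPopulation <- cbind("2001" = c(27937204, 27463863, 10590209, 10218731),
                      "2011" = c(28219760, 28092028, 14290121, 14063624))
rownames(APPopulation) <- c("RuralMale", "RuralFemale",
                            "UrbanMale", "UrbanFemale")

# Total population in each year: the full height of each stacked bar
totals <- colSums(APPopulation)
print(totals)

# Increase in total population from 2001 to 2011
growth <- totals[["2011"]] - totals[["2001"]]
print(growth)

# Share of each category within each year, in percent (columns sum to 100)
shares <- round(100 * prop.table(APPopulation, margin = 2), 1)
print(shares)
```

Here margin = 2 asks prop.table() for proportions within each column, that is, within each year.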

Segmented Bar Graph: Category-wise


Suppose we want to compare the categories (rural male vs. rural female, urban male vs. urban female) in both years.
Simple trick: switch the rows and columns in your data so that you get a category-wise segmented bar graph.

Rural male is the most dominant category, followed by rural female.
From 2001 to 2011, there has been a relative increase in the urban male and urban female populations (relative to rural male and rural female).
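One way to switch rows and columns is base R's t(), which transposes a matrix. The sketch below is an illustration under that assumption (the slides' own code builds the transposed matrix by hand); by_category is a name chosen for this example.

```r
# Population by category (rows) and year (columns)
APPopulation <- cbind("2001" = c(27937204, 27463863, 10590209, 10218731),
                      "2011" = c(28219760, 28092028, 14290121, 14063624))
rownames(APPopulation) <- c("RuralMale", "RuralFemale",
                            "UrbanMale", "UrbanFemale")

# Transpose: rows become years, columns become categories
by_category <- t(APPopulation)
print(by_category)

# barplot() draws one stacked bar per column, so each bar is now a
# category and its segments are the two years
barplot(by_category / 1e6,
        legend.text = rownames(by_category),
        col = c("red", "bisque"),
        xlab = "Categories", ylab = "Population (millions)",
        main = "Distribution of population by category")
```

Because barplot() stacks the rows of a matrix within each column, transposing the data is enough to change what each bar represents.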


Pictorial Presentation or Table?


The objective of a frequency table is to provide insights about the data that cannot be obtained quickly by looking only at the original data, as in the cars example. There, a pictorial presentation of the data gives no extra information.

However, a pictorial presentation is sometimes necessary to depict features of the data that are not readily readable from a table (refer to Table 2, slide 11). Recall some of the questions we have asked:
What is the relative share of urban male, urban female, rural male and rural female in the population of Andhra Pradesh in 2011?
By how much did the total population increase from 2001 to 2011?
What were the combined urban male, urban female, rural male and rural female populations for the years 2001 and 2011?

You cannot readily answer such questions by looking at just the frequency table, which only gives the frequency of each category. However, you will be able to answer questions like:
Which is the most frequently occurring category in 2001 and in 2011?
Was there an increase in the frequency of urban male, urban female, rural male and rural female from 2001 to 2011?

R-Codes
# Creating data: rows are categories, columns are years
APPopulation = cbind(c(28219760,28092028,14290121,14063624),
                     c(27937204,27463863,10590209,10218731))
rownames(APPopulation) = c("RuralMale","RuralFemale","UrbanMale","UrbanFemale")
colnames(APPopulation) = c("2011","2001")

# Bar plot of the 2011 population
colors = c("red", "bisque", "darkslategray", "violet")
barplot(APPopulation[,"2011"]/1000000, col=colors)
title(main="Barplot of AP Population in 2011 (in millions)")

# Multiple bar graph
# Rows of A are the years (2001, 2011); columns are the categories in
# reverse order (UrbanFemale, UrbanMale, RuralFemale, RuralMale)
A = matrix(
  c(10218731,10590209,27463863,27937204,
    14063624,14290121,28092028,28219760),  # the data elements
  nrow=2,        # number of rows
  ncol=4,        # number of columns
  byrow=TRUE)    # fill matrix by rows
colors = c("red", "bisque")
barplot(A/1000000, names.arg=rev(rownames(APPopulation)),
        legend.text=c(2001,2011), beside=TRUE,
        main="Distribution of population by category",
        xlab="Categories", ylab="population, in millions",
        ylim=c(0,80), col=colors)


R-Codes (Continued)
# Segmented bar graph (yearwise): one stacked bar per year
colors = c("red", "bisque", "darkslategray", "violet")  # one color per category
barplot(APPopulation/1000000,
        main="Distribution of population by category yearwise",
        xlab="Year", ylab="population, in millions",
        col=colors, legend.text=rownames(APPopulation))

# Segmented bar graph (categorywise): one stacked bar per category
colors = c("red", "bisque")
A = matrix(
  c(10218731,10590209,27463863,27937204,
    14063624,14290121,28092028,28219760),  # the data elements
  nrow=2, ncol=4, byrow=TRUE)
barplot(A/1000000, names.arg=rev(rownames(APPopulation)),
        legend.text=c(2001,2011), beside=FALSE,
        main="Distribution of population by category",
        xlab="Categories", ylab="population, in millions",
        ylim=c(0,90), col=colors)

# Pie chart of the 2011 population
colors = c("red", "bisque", "darkslategray", "violet")
slices <- c(28219760,28092028,14290121,14063624)  # 2011 figures
lbls <- c("RuralMale","RuralFemale","UrbanMale","UrbanFemale")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct)          # add percents to labels
lbls <- paste(lbls, "%", sep="")  # add % to labels
pie(slices, labels=lbls, col=colors,
    main="Pie Chart of AP Population in 2011")


Thank you

