
Business Analytics

Instructor : Daniyal Nawaz


Lecture # 03

Descriptive Statistics

Creating Distributions from Data

Cumulative Distributions
• Cumulative frequency distribution: A variation
of the frequency distribution that provides
another tabular summary of quantitative data
– Uses the number of classes, class widths, and class
limits developed for the frequency distribution
– Shows the number of data items with values less
than or equal to the upper class limit of each class
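
A minimal Python sketch of this idea follows. The audit-time values and class limits are hypothetical (they are not taken from the lecture's data file); the function simply counts, for each class, the items less than or equal to its upper class limit.

```python
# Hypothetical audit times in days (illustrative values only).
audit_times = [11, 13, 16, 18, 18, 21, 22, 25, 27, 31]

# Hypothetical upper class limits, reused from the frequency distribution.
upper_limits = [14, 19, 24, 29, 34]


def cumulative_frequencies(data, limits):
    """Count the items less than or equal to each upper class limit."""
    return [sum(1 for x in data if x <= limit) for limit in limits]


cum_freq = cumulative_frequencies(audit_times, upper_limits)
n = len(audit_times)

for limit, freq in zip(upper_limits, cum_freq):
    cum_rel = freq / n       # cumulative relative frequency
    cum_pct = 100 * cum_rel  # cumulative percent frequency
    print(f"<= {limit}: {freq:2d}  {cum_rel:.2f}  {cum_pct:.0f}%")
```

The last class always has a cumulative frequency equal to the number of observations, which is a quick check on the table.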

Cumulative Frequency, Cumulative Relative
Frequency, and Cumulative Percent Frequency
Distributions for the Audit Time Data

Sorting and Filtering Data in Excel
Conditional Formatting of Data in Excel

MODIFYING DATA IN EXCEL


Filtering and sorting exercise

PRACTICE

Modifying Data in Excel

Sorting and Filtering Data in Excel


• To sort the automobiles by March 2010 sales:
– Step 1: Select cells A1:F21
– Step 2: Click the Data tab in the Ribbon
– Step 3: Click Sort in the Sort & Filter group
– Step 4: Select the check box for My data has headers
– Step 5: In the first Sort by dropdown menu, select
Sales (March 2010)
– Step 6: In the Order dropdown menu, select Largest
to Smallest
– Step 7: Click OK
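
For comparison, the same sort can be sketched in Python. The rows below are hypothetical stand-ins for the worksheet (model names and sales figures are invented), and the key name `sales_march_2010` is an assumption made for this example.

```python
# Hypothetical rows standing in for the worksheet data (values invented).
cars = [
    {"model": "Model A", "manufacturer": "Toyota", "sales_march_2010": 29623},
    {"model": "Model B", "manufacturer": "Honda", "sales_march_2010": 33334},
    {"model": "Model C", "manufacturer": "Ford", "sales_march_2010": 19500},
]

# Equivalent of Sort by "Sales (March 2010)" with Order "Largest to Smallest".
sorted_cars = sorted(cars, key=lambda row: row["sales_march_2010"], reverse=True)

for row in sorted_cars:
    print(row["model"], row["manufacturer"], row["sales_march_2010"])
```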

Top-Selling Automobiles Data Sorted by March 2010 Sales

Modifying Data in Excel
Sorting and Filtering Data in Excel
• Using Excel’s Filter function to see the sales of models made by Toyota

– Step 1: Select cells A1:F21


– Step 2: Click the Data tab in the Ribbon
– Step 3: Click Filter in the Sort & Filter group
– Step 4: Click on the Filter Arrow in column B, next to Manufacturer
– Step 5: If all choices are checked, you can easily deselect all choices by
unchecking (Select All). Then select only the check box for Toyota.
– Step 6: Click OK
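
The filter above can be sketched in Python as a simple selection on the Manufacturer column. The rows are hypothetical (invented model names and figures), not the lecture's data.

```python
# Hypothetical rows standing in for the worksheet data (values invented).
cars = [
    {"model": "Model A", "manufacturer": "Toyota", "sales_march_2010": 29623},
    {"model": "Model B", "manufacturer": "Honda", "sales_march_2010": 33334},
    {"model": "Model C", "manufacturer": "Toyota", "sales_march_2010": 19500},
]

# Equivalent of leaving only the "Toyota" check box selected in the filter.
toyota_only = [row for row in cars if row["manufacturer"] == "Toyota"]

for row in toyota_only:
    print(row["model"], row["sales_march_2010"])
```

As in Excel, the original rows are left untouched; the filter only controls which rows are shown.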

Using Excel’s Sort Function to Sort the Top-Selling Automobiles Data

Top Selling Automobiles Data Filtered to Show Only Automobiles
Manufactured by Toyota

Modifying Data in Excel
Conditional Formatting of Data in Excel
• Makes it easy to identify data that satisfy certain conditions
in a data set
• To identify the automobile models in Table 2.2 for which
sales had decreased from March 2010 to March 2011:
– Step 1: Starting with the original data shown in Figure 2.3, select
cells F1:F21
– Step 2: Click on the Home tab in the Ribbon
– Step 3: Click Conditional Formatting in the Styles group
– Step 4: Select Highlight Cells Rules, and click Less Than from the
dropdown menu
– Step 5: Enter 0% in the Format cells that are LESS THAN: box
– Step 6: Click OK
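
The rule applied above amounts to flagging every row whose percent change is below zero. A minimal Python sketch, assuming column F holds the percent change in sales from March 2010 to March 2011 (the rows and key names below are hypothetical):

```python
# Hypothetical rows (values invented); sales are unit counts.
cars = [
    {"model": "Model A", "sales_2010": 20000, "sales_2011": 25000},
    {"model": "Model B", "sales_2010": 30000, "sales_2011": 27000},
    {"model": "Model C", "sales_2010": 15000, "sales_2011": 15000},
]


def percent_change(row):
    """Percent change in sales from March 2010 to March 2011, as a fraction."""
    return (row["sales_2011"] - row["sales_2010"]) / row["sales_2010"]


# Equivalent of "Format cells that are LESS THAN: 0%".
declining = [row["model"] for row in cars if percent_change(row) < 0]
print(declining)
```

A model with unchanged sales is not flagged, because the condition is strictly less than 0%.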

Using Conditional Formatting in Excel to Highlight Automobiles with
Declining Sales from March 2010

Using Conditional Formatting in Excel to Generate Data Bars for the Top-Selling Automobiles Data

Modifying Data in Excel

• The Quick Analysis button appears just outside the bottom-right
corner of a group of selected cells
• Provides shortcuts for Conditional Formatting, adding
Data Bars, etc.

Creating a Frequency Distribution for
Soft Drinks Data in Excel

Using Excel to Generate a Frequency Distribution for
Audit Times Data

Histogram
Histograms can be created in Excel using the Data
Analysis ToolPak.
The steps to create a histogram in Excel are as follows:
Step 1. Click the DATA tab in the Ribbon
Step 2. Click Data Analysis in the Analysis group
Step 3. When the Data Analysis dialog box opens, choose
Histogram from the list of Analysis Tools, and click OK
Step 4. When the Histogram dialog box opens:
– In the Input Range: box, enter A2:D6
– In the Bin Range: box, enter A10:A14
– Under Output Options:, select New Worksheet Ply:
– Select the check box for Chart Output
– Click OK
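
The counting behind the histogram can be sketched in Python. This is an approximation of what the Histogram tool does, not its implementation: each value is counted in the first bin whose limit it does not exceed, and anything above the last bin goes into a "More" bucket. The data and bin limits below are hypothetical.

```python
from bisect import bisect_left

# Hypothetical data and bin upper limits (not the lecture's worksheet values).
data = [11, 13, 16, 18, 18, 21, 22, 25, 27, 31]
bins = [14, 19, 24, 29, 34]

# One counter per bin, plus a final "More" bucket for values above the last bin.
counts = [0] * (len(bins) + 1)
for x in data:
    # bisect_left returns the index of the first bin limit >= x.
    counts[bisect_left(bins, x)] += 1

for limit, count in zip(bins, counts):
    print(f"<= {limit}: {'#' * count} ({count})")
print(f"More: {counts[-1]}")
```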

Figure 2.13: Creating a Histogram for the Audit Time Data Using
Data Analysis ToolPak in Excel

Figure 2.14: Completed Histogram for the Audit Time Data Using
Data Analysis ToolPak in Excel

Measures of Location

1. Mean (Arithmetic Mean):
• The most commonly used measure of location
is the mean (arithmetic mean), or average
value, for a variable.
• If the data are for a sample (typically the
case), the mean is denoted by x̄.
• The population mean is computed in the
same manner, but is denoted by the Greek letter
μ.
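
As a quick illustration, the sample mean can be computed in Python for the sample of five class sizes used elsewhere in this lecture (32, 42, 46, 46, 54). The variable names are chosen for this sketch only.

```python
from statistics import mean

# Sample of five college class sizes used in this lecture.
class_sizes = [32, 42, 46, 46, 54]

# Sample mean: the sum of the observations divided by the sample size n.
x_bar = sum(class_sizes) / len(class_sizes)

print(x_bar)              # computed by hand
print(mean(class_sizes))  # same value from the standard library
```

Both lines give a mean class size of 44 students.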
Sample mean

Example

Solution

Measures of Location (Median)

2. Median
• The median is another measure of central location: it is the
value in the middle when the data are arranged in
ascending order (smallest to largest value).
• With an odd number of observations, the
median is the middle value.
• An even number of observations has no single
middle value; in that case, the median is the average
of the two middle values.

• Let us apply this definition to compute the
median class size for a sample of five college
classes.
• Arranging the data in ascending order provides
the following list:
32 42 46 46 54
• Because n = 5 is odd, the median is the middle
value. Thus, the median class size is 46 students.

Median

• Suppose we also compute the median value
for the 12 home sales in Table 2.9. We first
arrange the data in ascending order.

108,000 138,000 138,000 142,000
186,000 199,500 208,000 254,000
254,000 257,500 298,000 456,250

• Because n = 12 is even, the median is the
average of the middle two values, 199,500
and 208,000: (199,500 + 208,000)/2 = 203,750.
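
Both median calculations can be checked with a short Python sketch, using the class sizes and home sales prices listed in this lecture. The helper name `median_of` is hypothetical and mirrors the odd/even rule stated earlier.

```python
from statistics import median

class_sizes = [32, 42, 46, 46, 54]
home_prices = [108000, 138000, 138000, 142000, 186000, 199500,
               208000, 254000, 254000, 257500, 298000, 456250]


def median_of(values):
    """Middle value for odd n; average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of(class_sizes))  # odd n: the middle value
print(median_of(home_prices))  # even n: average of the 6th and 7th values
print(median(home_prices))     # standard-library equivalent
```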

Measures of Location (Mode)
• A third measure of location, the mode, is the
value that occurs most frequently in a dataset.
To illustrate the identification of the mode,
consider the sample of five class sizes.
32 42 46 46 54
• The only value that occurs more than once is
46. Because this value, occurring with a
frequency of 2, has the greatest frequency, it
is the mode.
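
A small Python sketch of the same identification, counting how often each class size occurs and picking the most frequent one:

```python
from collections import Counter
from statistics import mode

class_sizes = [32, 42, 46, 46, 54]

# Count occurrences of each value and take the most frequent one.
value, frequency = Counter(class_sizes).most_common(1)[0]

print(value, frequency)   # the mode and how many times it occurs
print(mode(class_sizes))  # standard-library equivalent
```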

Measures of Location (Geometric Mean)

• The geometric mean is a measure of location
that is calculated by finding the nth root of the
product of n values. The general formula for
the sample geometric mean, denoted x̄g,
follows:
x̄g = [(x1)(x2)···(xn)]^(1/n)

• The geometric mean is often used in analyzing
growth rates in financial data. In these types
of situations, the arithmetic mean or average
value will provide misleading results.

• To illustrate the use of the geometric mean, consider Table
2.10, which shows the percentage annual returns, or growth
rates, for a mutual fund over the past 10 years.

• With a return of -22.1% in year 1, $100 invested in the
fund at the beginning of year 1 would be worth
$100 - 0.221($100) = $100(1 - 0.221)
= $100(0.779) = $77.90
at the end of year 1.
• We refer to 0.779 as the growth factor for
year 1 in Table 2.10.
• We can compute the balance at the end of
year 1 by multiplying the value invested in the
fund at the beginning of year 1 by the growth
factor for year 1: $100(0.779) = $77.90.
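• Multiplying by the growth factor for each of the 10 years
gives the balance at the end of year 10:
$100[(0.779)(1.287)(1.109)(1.049)(1.158)(1.055)
(0.630)(1.265)(1.151)(1.021)]
= $100(1.3345) = $133.45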
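
The calculation above, and the geometric mean it leads to, can be reproduced with a short Python sketch. The growth factors are the ten values shown in the product; the variable names are chosen for this example.

```python
import math
from statistics import geometric_mean

# Growth factors for years 1 to 10, as listed in the product above.
growth_factors = [0.779, 1.287, 1.109, 1.049, 1.158,
                  1.055, 0.630, 1.265, 1.151, 1.021]

# Balance after 10 years for an initial $100 investment.
total_growth = math.prod(growth_factors)
ending_balance = 100 * total_growth

# Geometric mean: the nth root of the product of the n growth factors.
x_g = total_growth ** (1 / len(growth_factors))

print(f"Ending balance: ${ending_balance:.2f}")
print(f"Geometric mean growth factor: {x_g:.4f}")
print(f"Standard-library check: {geometric_mean(growth_factors):.4f}")
```

The geometric mean growth factor is roughly 1.029, that is, an average growth of about 2.9% per year. Growing $100 at that constant rate for 10 years reproduces the ending balance, which the arithmetic mean of the annual returns would not.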

Measures of Variability
In addition to measures of location, it is often desirable to
consider measures of variability.
For example, suppose that you are considering two financial funds.
Both funds require a $1,000 annual investment.
• Fund A has paid out exactly $1,100 each year for an
initial $1,000 investment.
• Fund B has had many different payouts, but the mean
payout over the previous 20 years is also $1,100.
Would you consider the payouts of Fund A and Fund B
to be equivalent? Clearly, the answer is no.
The difference between the two funds is due to variability.
• Figure 2.18 shows a histogram for the payouts
received from Funds A and B.
• Although the mean payout is the same for the two
funds, their histograms differ in that the
payouts associated with Fund B have greater
variability.
• Fund B's payouts are sometimes considerably
larger than the mean, and sometimes
considerably smaller.

Range

• The simplest measure of variability is the
range. The range can be found by subtracting
the smallest value from the largest value in a
data set.
• For example, refer to the data on home sales prices
in Table 2.9. The largest home sales price is
$456,250, and the smallest is $108,000.
• The range is $456,250 - $108,000 = $348,250.

• In Excel, with the 12 home sales prices in cells B2:B13,
the range is computed as =MAX(B2:B13)-MIN(B2:B13)
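
The same calculation in Python, using the 12 home sales prices listed earlier in this lecture:

```python
# Home sales prices from Table 2.9, as listed earlier in this lecture.
home_prices = [108000, 138000, 138000, 142000, 186000, 199500,
               208000, 254000, 254000, 257500, 298000, 456250]

# Range: largest value minus smallest value.
price_range = max(home_prices) - min(home_prices)

print(f"Range: ${price_range:,}")
```

Because the range depends only on the two extreme values, a single unusually expensive home changes it substantially.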

Variance

• The variance is a measure of variability that
utilizes all the data. The variance is based on
the deviation about the mean, which is the
difference between the value of each
observation (xi) and the mean (x̄ for a sample,
μ for a population).

Standard Deviation

• The standard deviation is defined to be
the positive square root of the variance.
We use s to denote the sample standard
deviation and σ to denote the population
standard deviation.

Table 2.12

The sample variance for the sample of class sizes in five college
classes is s^2 = 64.
Thus, the sample standard deviation is s = √64 = 8.
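
These two values can be verified with a short Python sketch. It assumes the usual sample variance with divisor n - 1 (the slide's formula is not reproduced in this text), and checks the result against the standard library.

```python
import math
from statistics import stdev, variance

class_sizes = [32, 42, 46, 46, 54]
n = len(class_sizes)
x_bar = sum(class_sizes) / n

# Squared deviations about the mean, summed and divided by n - 1.
squared_deviations = [(x - x_bar) ** 2 for x in class_sizes]
s_squared = sum(squared_deviations) / (n - 1)

# Standard deviation: positive square root of the variance.
s = math.sqrt(s_squared)

print(s_squared, s)
print(variance(class_sizes), stdev(class_sizes))  # standard-library check
```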

Coefficient of Variation

• In some situations we may be interested in a
descriptive statistic that indicates how large
the standard deviation is relative to the mean.
This measure is called the coefficient of
variation and is usually expressed as a
percentage.

• For the class size data, we found a sample mean of 44 and a sample
standard deviation of 8.
• The coefficient of variation is (8/44 × 100)% =
18.2%.
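
A minimal Python sketch of this calculation, recomputing the mean and standard deviation from the class size sample rather than hard-coding them:

```python
from statistics import mean, stdev

class_sizes = [32, 42, 46, 46, 54]

x_bar = mean(class_sizes)  # sample mean
s = stdev(class_sizes)     # sample standard deviation

# Coefficient of variation: standard deviation as a percentage of the mean.
cv_percent = s / x_bar * 100

print(f"Coefficient of variation: {cv_percent:.1f}%")
```

Because it is a ratio, the coefficient of variation is useful for comparing variability across variables measured on different scales.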
