
Hello Everyone!

I hope you have had a good day and are ready for a good seminar
tonight. We have a lot to talk about tonight.

You will like the number of useful tips I will share with you tonight. You can
use them directly in your assignments, or use them to understand the concepts
better. Before I get to them, I need to make a couple of statements.

Everybody should have installed Data Analysis Plus and activated Data Analysis
by now. If you have not, please do it as soon as possible. If you have tried the option
and it didn't work, then most likely you need to uninstall the MS Office that is
installed on your computer and reinstall it in "complete" install mode. If you still
have problems, please contact Student Services. They deal with similar
problems every term and can fix these issues.

There should be a link called Net Tutor on your home page that allows you to call a
professional tutoring center and get help with the course material. Take advantage of
this free service as much as possible. Of course, I am also available during our office
hours (Sundays from 9:00 pm to 11:00 pm on AIM) and via email. Office hours are
a good chance for one-on-one tutoring. A few of you have already tried the office
hours and found them useful.

The seminar system keeps a list of attendees for each seminar and *stamps* the time
each student logs in and logs out. So, please try to be on time and stay until the
end of the seminar so I can give you full credit for it. I do not want you to lose any
seminar points.
Thank you.

Tonight, I will discuss some concepts that students usually have questions about.
Please feel free to ask questions during my lecture. You can ask any questions that
are not related to the topic after the talk. This way we get a lot more
done.

And a request to those of you who have taken Statistics before taking this class: I
have to work with *all* of you to make sure we are all on the same page. So, please
be patient with the class if some of the concepts we discuss are not very
challenging for you yet. Believe me, it gets challenging enough for everyone after
Reading Week, as we will mainly cover concepts that are not in our Statistics
course curriculum.

I know the areas that are usually a little challenging for some students in this class. I
posted a message about summation notation on Doc Sharing earlier. If we are
not comfortable enough with summation notation, we will have difficulty following
the book's work or finding the correct answer for measures such as the
standard deviation.

It is going to be with us all the way to the end of the term. Do you want me to work
out an example of summation notation, or are you OK with it?
Let me give you an example. This example is not in the book, but I sent it to you
earlier in the week. Suppose you have a table of x and y values as below:

Let X be: 1, 2, 3, 4, 5, 6 and Y be: 6, 1, 9, 5, 17, 12. So, the ordered pairs (x, y) are
(1,6), (2,1), (3,9), (4,5), (5,17), and (6,12). So, every X value is associated with a Y
value.

Is it clear so far?

So, Σx = 1 + 2 + 3 + 4 + 5 + 6 = 21. This means the sum of all x values.

Similarly, Σy = 6 + 1 + 9 + 5 + 17 + 12 = 50. So, the sum of all y values is 50.

Σxy = sum of the products of each x value and its y value
= (1)(6) + (2)(1) + (3)(9) + (4)(5) + (5)(17) + (6)(12) = 212.

Similarly, Σx^2 = sum of the squared x values = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2
= 91, where "^" means "raised to the power of," so x^2 is x taken to the second
power (for example: 2^3 = 2 * 2 * 2 = 8).
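If you would like to check these sums outside Excel, here is a minimal Python sketch (Python is not part of our course; it is just another way to see the arithmetic). The variable names such as sum_xy are my own labels for Σx, Σy, Σxy, and Σx^2.

```python
# x and y values from the summation notation example
x = [1, 2, 3, 4, 5, 6]
y = [6, 1, 9, 5, 17, 12]

sum_x = sum(x)                                 # Σx: add all x values
sum_y = sum(y)                                 # Σy: add all y values
sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # Σxy: multiply each pair, then add
sum_x2 = sum(xi ** 2 for xi in x)              # Σx^2: square each x, then add

print(sum_x, sum_y, sum_xy, sum_x2)  # prints: 21 50 212 91
```

Each line mirrors one of the summations above: zip pairs every x value with its y value, exactly like the ordered pairs (1,6), (2,1), and so on.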

How was the summation notation discussion? Was it straightforward for you?

You will be using Mean Absolute Deviation (MAD) in your assignments this week.
MAD (page 87) is the sum of the absolute differences between each value and the
mean, divided by the total number of data values. Let me give you an example. Let's
say you have the values 3, 4, and 5. Here is the calculation of MAD:

We have 3 numbers, and the mean of these three numbers is (3 + 4 + 5) / 3 = 12 / 3 = 4.

And remember that the absolute value of a number is the distance of that number
from zero (from algebra), so | 4 | = 4 and similarly | -4 | = 4.

MAD = [ | 3 - 4 | + | 4 - 4 | + | 5 - 4 | ] /3

= [ | -1 | + | 0|+ |1|] /3

= (1 + 0 + 1) / 3 = 2/3 = 0.66667.
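Here is the same MAD calculation as a short Python sketch. The function name mean_absolute_deviation is hypothetical (my own choice); it simply follows the definition above: find the mean, take the absolute difference of each value from it, and average those differences.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    n = len(values)
    mean = sum(values) / n
    # add up |x - mean| for every value, then divide by the number of values
    return sum(abs(v - mean) for v in values) / n

mad = mean_absolute_deviation([3, 4, 5])
print(round(mad, 5))  # prints: 0.66667
```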

You can also use Excel to do it for you. This is not in the book but do not worry I will
not charge you extra! Here is how to do it. Please do this after our seminar. I
promise that it will work.

Enter 3, 4, and 5 (remember that this is just a simple example and these numbers
are chosen arbitrarily) into cells A1, A2, and A3. Now, click on an empty cell
anywhere, click on fx on the formula bar, select "Statistical" or "All" on the left side,
and then select AVEDEV (the top one) on the right side. Click OK, enter A1:A3 as
the range of numbers, and click OK again. The MAD will be shown in the cell you chose.
You will get the same answer: 0.666667

Wasn't it easy?
For standard deviation calculations, using Excel is always the best option if you
can. I will give you the Excel formulas at the end of this seminar. You can also use
the Descriptive Statistics option of Data Analysis.

Simply open your dataset in an Excel worksheet, highlight it, go to "Tools" in Excel,
then go to Data Analysis and select Descriptive Statistics. Then enter the range of
your data (let's say you are doing it for our 3 numbers 3, 4, and 5 that are in A1,
A2, and A3). So, here you enter A1:A3 in Input Range and check "Summary
statistics" and "Confidence Level for Mean". Your descriptive statistics will open
with lots of information in them. This is very useful.
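For comparison, here is a rough Python sketch of several rows you will see in the Descriptive Statistics output for 3, 4, and 5. It uses Python's built-in statistics module; the row labels are my own, modeled on Excel's row names, and the sketch leaves out mode, kurtosis, skewness, and the confidence level.

```python
import math
import statistics

data = [3, 4, 5]  # the three numbers in A1:A3

summary = {
    "Mean": statistics.mean(data),
    # standard error of the mean = sample standard deviation / sqrt(n)
    "Standard Error": statistics.stdev(data) / math.sqrt(len(data)),
    "Median": statistics.median(data),
    "Standard Deviation": statistics.stdev(data),  # sample SD, like Excel's STDEV
    "Sample Variance": statistics.variance(data),
    "Range": max(data) - min(data),
    "Minimum": min(data),
    "Maximum": max(data),
    "Sum": sum(data),
    "Count": len(data),
}

for label, value in summary.items():
    print(f"{label}: {value}")
```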

But it is good to know the manual way so we can follow the book's discussions, like the
one on page 89. I will wait a little while you get your book, if you want.

The -deviation- is the difference between each value and the mean. For instance,
in the book's example on page 89, the sum of the x values is 100 (already calculated
there), and therefore the mean of x is 100 / 5, or 20. So, the deviation of the second
x value (21) from the mean of the x values is 21 - 20 = 1. -Standard Deviation- has a
different meaning from deviation: it measures the variability, or spread, among all of
the x values.
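The book's page 89 data is not repeated here, so this sketch uses our three numbers 3, 4, and 5 instead. It lists each deviation from the mean and then combines the deviations into the sample standard deviation, the square root of Σ(x - mean)^2 / (n - 1), which is the same formula Excel's STDEV uses.

```python
import math

data = [3, 4, 5]
n = len(data)
mean = sum(data) / n  # 4.0

# deviation = value - mean, one deviation per data value
deviations = [x - mean for x in data]

# sample standard deviation: square the deviations, add them up,
# divide by n - 1, then take the square root
std_dev = math.sqrt(sum(d ** 2 for d in deviations) / (n - 1))

print(deviations, std_dev)  # prints: [-1.0, 0.0, 1.0] 1.0
```

Notice that the deviations add up to zero, which is why we square them before adding.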

On page 95, the book talks about a bell-shaped distribution. We will call it the Normal
distribution and will cover it soon. On page 96, the book says 68 percent of the data
falls within one standard deviation of the mean, 95 percent falls within two standard
deviations of the mean, and almost all (99.7 percent) of the data falls within three
standard deviations of the mean.

The book does not explain this much further on that page because we cover it in
detail later. So, I am going to give you a numerical example to make it more
understandable. It is from chapter 7, but it fits what we need now. We will come
back to chapter 3 after this example.

This is a numerical example and helps you visually see what is going on. Visual
presentation is very helpful in statistics when it comes to normal curve and
calculations. It also helps you with your assignment for chapter 3.

I want you to take a look at page 246 for now. As you see in the bottom normal
graph, the mean is 130 and the standard deviation is 30. So one standard deviation
around the mean runs from 130 - 30 to 130 + 30, or from 100 to 160. So, we can say
68% of the data are between 100 and 160 (whatever these numbers represent).

Do not read too much into this. It is supposed to be easy. I gave you this example so
you can visually see what the empirical rule of 68%, 95%, and 99.7% means. As you
see, two standard deviations around the mean is:
130–(2*30) and 130+(2*30)

= (130-60) and (130+60) which is between 70 and 190.

So, we say 95% of the values are between 70 and 190. These rules, along with the
99.7% rule, are characteristics of a normal distribution that we will revisit in more
detail in chapter 7.
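To see the empirical rule with numbers, this Python sketch builds the 1, 2, and 3 standard deviation intervals for a mean of 130 and a standard deviation of 30, then compares the rule's percentages with exact normal-curve areas from NormalDist in Python's statistics module (assuming Python 3.8 or later, where NormalDist exists).

```python
from statistics import NormalDist  # available in Python 3.8 and later

mean, sd = 130, 30  # mean and standard deviation from the page 246 graph
dist = NormalDist(mu=mean, sigma=sd)

intervals = []
for k, rule in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = mean - k * sd, mean + k * sd
    # exact area under the normal curve between low and high
    exact = dist.cdf(high) - dist.cdf(low)
    intervals.append((k, low, high, exact))
    print(f"within {k} SD: {low} to {high}, rule: {rule}, exact: {exact:.1%}")
```

The exact areas come out to about 68.3%, 95.4%, and 99.7%, which is why the empirical rule uses those rounded percentages.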

Now, what if the question was: What percent of scores are less than 100?

Here is one way to find out…

First we check how many standard deviations 100 is from the mean.
Since the mean is 130 and the standard deviation is 30, 100 is one standard deviation
below the mean (130 - 30 = 100).

Now, we know one standard deviation AROUND the mean covers 68% of the data.
So, the two corner areas beyond one standard deviation (the left corner, below 100,
and the right corner under the curve, above 160) add up to 100% - 68% = 32%.

So, if we are looking for the percentage less than 100, we are only talking about the left
side, so we divide 32% by 2 to find the left corner percentage (the two corners are the
same size because the curve is symmetric). That is our answer: about 16% of scores
are below 100.

Similarly, if the question was what percent of scores are more than 160, we say the
answer is 16%. The reason is just like what we discussed here. The only difference
is that this time we are looking at right corner which is the same area as the left
corner under the curve.
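The corner reasoning above can be written as a few lines of Python. The 16% comes straight from the empirical rule; the NormalDist comparison is an extra (assuming Python 3.8 or later) that shows the exact normal area, which is slightly smaller because 68% is itself a rounded figure.

```python
from statistics import NormalDist

mean, sd = 130, 30
within_one_sd = 0.68               # empirical rule: 100 to 160
both_corners = 1 - within_one_sd   # 0.32 in total lies outside 100 to 160
left_corner = both_corners / 2     # symmetric curve, so half of it is below 100

# exact normal areas, for comparison with the rule of thumb
dist = NormalDist(mean, sd)
exact_below_100 = dist.cdf(100)
exact_above_160 = 1 - dist.cdf(160)

print(f"rule: {left_corner:.0%}, exact below 100: {exact_below_100:.1%}, "
      f"exact above 160: {exact_above_160:.1%}")
```

Both exact corners come out to about 15.9%, matching the idea that the left and right corners are the same size.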

Does it make sense?

Here is another example (we are going back to chapter 3 now).

Let us say test scores in a class are normally distributed with a mean of
80 and a standard deviation of 5 (we will get to more details in the Normal
Distribution chapter).

Since the mean is 80, the standard deviation is 5, and the data are normally
distributed, we can say 99.7 percent of students got scores between 65 and 95.
Here is the calculation ("*" is the multiplication symbol here):

80 – (3 * 5) and 80 + (3 * 5) =
(80 - 15) and (80 + 15), which is between 65 and 95.

Similarly, 68 percent of students got scores between 75 and 85. Again, the
calculation is 80 – (1 * 5) and 80 + (1 * 5), which is between 75 and 85.

Now, what if the question was: What percent of scores are less than 75?

Here is one way to find out…


First we check how many standard deviations 75 is from the mean. Since the
mean is 80 and the standard deviation is 5, 75 is one standard deviation below
the mean. Now, we know one standard deviation AROUND the mean covers 68% of
the data. So, the two corner areas beyond one standard deviation (the left corner and
the right corner under the curve) add up to 100% - 68% = 32%.

So, if we are looking for the percentage less than 75, we are only talking about the left
side, so we divide 32% by 2 to find the left corner percentage. That is our answer:
about 16% of scores are below 75.

Similarly, if the question was what percent of scores are more than 85, we say the
answer is 16%. The reason is just like what we discussed here. The only difference
is that this time we are looking at right corner which is the same area as the left
corner under the curve.

Now, what if we want to find the percentage of scores less than or equal to 85?

In this case, since we know 68% falls between 75 and 85 (the one standard deviation
rule) and 16% is less than 75, the percentage of scores less than 85 is 68% + 16% =
84%. This is one way to do it. Another way is as follows:

We know one standard deviation around the mean covers 68% of the data, so 68% of
test scores are between 75 and 85. Now, by symmetry, the area between the mean
and 85 is half of 68%, or 34%. If we add the area to the left of the mean (which is
50%) to this 34%, we get 84%! So, either way we get the same answer. You can
extend this to 2 standard deviation or 3 standard deviation problems.
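Here are both ways of getting the 84% as a small Python sketch, using only the empirical rule percentages from above (no tables or libraries). The names way1 and way2 are just labels for the two methods.

```python
# empirical-rule shares for test scores with mean 80 and standard deviation 5
within_one_sd = 0.68                  # between 75 and 85
below_75 = (1 - within_one_sd) / 2    # left corner: 16%

# Way 1: left corner plus the middle 68%
way1 = below_75 + within_one_sd

# Way 2: everything left of the mean (50%) plus half of the middle 68%
way2 = 0.50 + within_one_sd / 2

print(f"{way1:.0%} {way2:.0%}")  # prints: 84% 84%
```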

Now, what if the test scores of students were NOT bell-shaped and were highly
skewed? What do we use then? We cannot use the empirical rule in this case.

Again, suppose the test scores of a class are highly skewed, the mean of the test
scores is 80, and the standard deviation of the test scores is 5. And suppose we want to
know the percent of test scores that are between 70 and 90.

Since we are told that the distribution of test scores is highly skewed, we need to use
Chebyshev's theorem (page 93). Since 70 and 90 are exactly TWO standard deviations
below and above the mean (80 - (2*5) and 80 + (2*5)), K is equal to 2 in Chebyshev's
theorem (because it is TWO standard deviations around the mean).

Chebyshev's formula is 1 – [1 / (K)^2]

If we plug the value of K into the formula, we get: 1 – [1 / (2^2)] = 1 – 1/4 = 3/4,
or 75%.

Therefore, we say the percent of test scores between 70 and 90 is a
MINIMUM of 75%. That means it is 75% or possibly more.
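Chebyshev's formula is easy to wrap in a small function. This is a minimal Python sketch: the name chebyshev_minimum is hypothetical, and it assumes the interval is centered on the mean, as in our 70 to 90 example. Rejecting K of 1 or less is my own design choice, because the formula gives 0 or a negative number there, which tells us nothing.

```python
def chebyshev_minimum(k):
    """Minimum fraction of any dataset within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's formula is only informative for k > 1")
    return 1 - 1 / k ** 2

mean, sd = 80, 5
low, high = 70, 90            # assumed symmetric around the mean
k = (high - mean) / sd        # 90 is 2 standard deviations above 80

print(f"k = {k}, at least {chebyshev_minimum(k):.0%}")  # prints: k = 2.0, at least 75%
```

Compare this with a normal distribution, where about 95% would fall within two standard deviations: Chebyshev's 75% is a guaranteed floor for any shape, so it is more cautious.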
You can use "Descriptive Statistics" under Data Analysis to get Mean, Median, Q1,
Q3, and other values of any dataset. It is a very useful feature. Don't forget to use
it…

Now, let's talk about the Coefficient of Variation. Let's say we have two datasets with
the following values. The first dataset is 1, 3, 5 and the second dataset is 121,
123, 125. The mean of the first dataset is 3 and the mean of the second dataset
is 123. If you calculate the standard deviation for both of these datasets, you get the
same value, because the values in each dataset are the same distance from their own
mean (just calculate it to see they are the same).

But can we say that the variability of these two datasets is the same as well?
The answer is no. The values in both datasets are two units apart, but the values
themselves are much smaller in the first dataset (1, 3, and 5). So, relatively, the first
dataset is more spread out compared to the second dataset (1, 3, 5 vs. 121, 123,
and 125).

A two-unit difference between the values 121, 123, and 125 is a lot less significant than
a two-unit difference between 1, 3, and 5. So, CV [CV = (standard deviation / mean
value) * 100] is a better measure of variability when we compare two or more
datasets, because it divides the standard deviation by the mean value. It gives a
better measure of relative variability.
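You can verify the two datasets with this Python sketch. It uses the sample standard deviation (like Excel's STDEV) from Python's statistics module; coefficient_of_variation is a hypothetical helper name that applies the CV formula above.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent: (sample standard deviation / mean) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

first = [1, 3, 5]
second = [121, 123, 125]

for data in (first, second):
    print(data, statistics.stdev(data), round(coefficient_of_variation(data), 2))
```

Both standard deviations come out to 2.0, but the CV is about 66.67% for 1, 3, 5 and about 1.63% for 121, 123, 125, which matches the point that the first dataset is relatively much more spread out.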

To use the following Statistical commands in Excel you need to open a blank Excel
document first and click on a cell to make it active. Then, go to Insert on the Menu
bar and then go to Function. Then, click on Statistical on the left side box.

You can also get help from the Help menu in Excel.

Do not forget to start with "=" when you enter a function in Excel.

To find the average value:

If A1:A5 is named Scores and contains the numbers 10, 7, 9, 27, and 2, then:

=AVERAGE(A1:A5) equals 11. This is the mean of the numbers.

Suppose the sample values (1345, 1301, 1368, 1322, 1310, 1370, 1318, 1350,
1303, 1299) are stored in A1:A10, respectively. STDEV treats them as a sample and
estimates the standard deviation. So, =STDEV(A1:A10) equals 27.46.

Also, do not forget about Descriptive Statistics in Excel (Data Analysis). It is a
very powerful tool, and it gives you the mean, standard deviation, mode, and
several other statistics.

The book talks briefly about correlation this week, but we will discuss it in more
depth in a few weeks.
