Prof. Aryee
Why Statistics
Any field of study that collects data, summarizes and describes the collected information, and interprets and draws valid conclusions from it is a candidate for statistical application.
7/3/2013
Statistics provides us with the tools to analyze data. Whether we want to detect differences between groups of people, events, or activities, reorganize data to identify hidden patterns, or create models for predicting the outcomes of future events, statistics provides a variety of tools to achieve our goals.
The following are some reasons for studying statistics. Statistics gives us a clearer understanding of the world around us. It provides the methods and techniques for developing knowledge and for learning from information, thus forming the basis for thinking and planning ahead.
Statistics allows us to formulate questions that can be addressed with data, and it provides the methods needed to adequately describe, summarize, analyze, and interpret the data and to draw valid conclusions from them that answer those questions.
Proper use of statistics helps us to critically interpret and evaluate claims and to make informed decisions in the face of uncertainty. The tools of statistics are widely employed in many fields of study, including business, communication, science, and law.
Statistics involves:
a) Designing the data collection process and experiments,
b) Preparing the collected data for analysis and to aid understanding,
c) Analyzing and drawing conclusions from data, and
d) Making estimates and predictions from data.
Statistics, What Is It?
What is Descriptive Statistics
In descriptive statistics, the aim is to describe what is going on within the data, that is, what the collected data actually show. There is no intention to draw conclusions that extend beyond the data actually collected.
Descriptive statistics includes:
a) Numerical counts or frequencies,
b) Construction of tables and graphs,
c) Computation of descriptive measures such as averages, percentages, and percentiles, and
d) Computation of variability measures such as range, variance, and standard deviation.
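The descriptive measures listed above can be computed with Python's standard `statistics` module. The following is a minimal sketch on a small, made-up set of exam scores (the data and the variable names are hypothetical, chosen only for illustration). Note that `statistics.variance` and `statistics.stdev` use the sample formulas, which divide by n - 1.

```python
import statistics

# Hypothetical exam scores for a class of 10 students (illustrative data only)
scores = [62, 71, 71, 78, 80, 84, 85, 88, 92, 99]

mean = statistics.mean(scores)                 # average
median = statistics.median(scores)             # middle value
data_range = max(scores) - min(scores)         # range = largest - smallest
variance = statistics.variance(scores)         # sample variance (n - 1 denominator)
std_dev = statistics.stdev(scores)             # sample standard deviation
quartiles = statistics.quantiles(scores, n=4)  # 25th, 50th and 75th percentiles

# Percentage of students scoring 80 or above
pct_80_plus = 100 * sum(s >= 80 for s in scores) / len(scores)

print(f"mean = {mean}, median = {median}, range = {data_range}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
print(f"quartiles = {quartiles}, scoring 80+ = {pct_80_plus}%")
```

Every value here summarizes only these ten scores; nothing is claimed about students outside the data, which is what keeps this descriptive rather than inferential.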
Inferential Statistics
In inferential statistics, data analysis is directed toward generalizing, summarizing, predicting, and drawing valid conclusions about a larger set of data from which the given sample was collected and of which the sample forms just a part.
When we use statistical methods to draw conclusions, make estimates and predictions, and generalize about an entire set of data by studying only part of it, we are dealing with inferential statistics. Inferential statistics allows us to use information from a smaller group to make inferences about the larger group from which the smaller group was taken.
How many children die each year from child abuse? Based on data reported by Child Protective Services (CPS) agencies in 2001, an estimated 2,000 children nationwide died as a result of abuse or neglect. Based on this number, five to six children die each day as a result of child abuse or neglect (2,000 ÷ 365 ≈ 5.5).
Source: http://www.preventchildabuse.com/abuse.htm
Answer: The population of interest is all first-year Seton Hall University students.
Statistics, what is a population and a sample
A sample
When a population is inaccessible or unavailable (because of time or cost constraints), or when obtaining the complete set is impractical or impossible, we draw a sample. A sample is a collection of some, but not all, of the elements of the population; thus, a sample is a subset of the population. It is usually selected to represent the population from which it was drawn.
It is important to note that different samples may give us different portions of the same population. As a result, if we already know the results from one sample and then draw a second sample from the same population, we should not expect the second sample to be an exact replica of the first. The difference between two or more samples drawn from the same population is called sampling variation or sampling error. Sampling variation decreases as the sample size increases.
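The claim that sampling variation shrinks as the sample grows can be checked by simulation. A minimal sketch, assuming a hypothetical population of 10,000 ages (generated at random, not real data): it draws many random samples of each size and measures how much the sample means spread out from one sample to the next.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 ages between 18 and 80 (illustrative only)
population = [random.randint(18, 80) for _ in range(10_000)]


def spread_of_sample_means(sample_size, repetitions=500):
    """Draw many random samples of the given size and return the standard
    deviation of their means, a simple measure of sampling variation."""
    means = [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(repetitions)
    ]
    return statistics.stdev(means)


for n in (10, 100, 1000):
    print(f"sample size {n}: spread of sample means = {spread_of_sample_means(n):.2f}")
```

Each individual sample mean differs from the others (sampling variation), but the spread for samples of 1,000 should come out far smaller than for samples of 10.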
A Random Sample
The sample taken must be based on a selection technique called random sampling. To use this technique, each member of the population must have an equal chance of being selected. A sample resulting from a random sampling technique is called a random sample. A random sample is one in which every possible subset of a specified size drawn from the population has an equal probability of being selected. We can use a table of random numbers to select a random sample.
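Instead of a printed table of random numbers, a pseudo-random number generator can do the selection. A minimal sketch, assuming a hypothetical sampling frame of 500 student ID numbers: Python's `random.sample` draws without replacement, so every subset of the requested size is equally likely to be chosen.

```python
import random

# Hypothetical sampling frame: student ID numbers 1 to 500 (illustrative only)
population_ids = list(range(1, 501))

# Seeded generator; it plays the role of a table of random numbers
rng = random.Random(2013)

# Simple random sample of 25 IDs, drawn without replacement:
# every subset of size 25 has the same chance of being selected.
sample_ids = rng.sample(population_ids, k=25)
print(sorted(sample_ids))
```

Fixing the seed makes the selection reproducible, much as recording the starting row of a random number table would.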
What is a parameter
A parameter is a numerical descriptive measure of a population. It is usually a single value computed by using all the values in the entire population.
A study in which all members of the population are included is called a census.
Example of a parameter
In an English class of 40 students, 24 had participated in the English as a Second Language Program, which provides coursework for comprehensive language development for students from non-English-speaking countries. The statement "60% of the students in this English class had participated in the English as a Second Language Program" (24 out of 40) is a descriptive statement. The population is the 40 students in this English class. The 60% represents a parameter of interest.
What is a statistic
A statistic is a numerical descriptive measure of a sample. It is usually a single numerical value computed by using only the sample data, and not the entire population.
Most statistical investigations seek the values of population parameters that are of interest to the investigator. If the population is not readily available, or obtaining the complete set is impractical or impossible, we draw a sample and compute the necessary descriptive statistics. We then make statistical inferences about the population parameters using the computed sample statistics.
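The difference between a parameter and a statistic can be made concrete with the English class example. A minimal sketch: the population is the 40 students (24 ESL participants), so the parameter is computed from all of them; the sample of 10 students is a hypothetical addition used only to show how a statistic is computed from part of the population.

```python
import random

# The English class from the example: 40 students, 24 of whom took part in
# the ESL program (True) and 16 who did not (False).
population = [True] * 24 + [False] * 16

# Parameter: computed from every member of the population (a census).
parameter = sum(population) / len(population)  # 24 / 40 = 0.6

# Statistic: computed from a random sample only (a hypothetical sample of 10).
rng = random.Random(7)
sample = rng.sample(population, k=10)
statistic = sum(sample) / len(sample)

print(f"parameter = {parameter:.2f}, statistic = {statistic:.2f}")
```

The parameter is fixed at 0.6, while the statistic depends on which 10 students happen to be drawn; using that statistic to estimate the parameter is a statistical inference.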
What is a variable
A variable is usually a common characteristic that an investigation focuses on after all the units of analysis in the population or sample underlying the study have been identified. A variable can also be thought of as a characteristic of the units of analysis under investigation that varies from one unit to another, taking on different values, categories, or attributes. A variable tells us which particular characteristic is being studied or is of interest to the researcher. Researchers focus on the empirical measurement of this characteristic.
Values of a variable
It is easy to confuse a variable's name with the different categories or attributes that the variable consists of, which are called the variable's values. For example, gender is a variable consisting of two categories, namely male and female. In this example, male and female are values we use to distinguish different people; the name of the variable, however, is gender.
A variable may consist of two or more values. Suppose a question on a survey asks each person to choose the response that best reflects their marital status: Are you Married, Widowed, Divorced, Separated, or Never Married? In this case, the name of the variable is marital status. The five different categories (Married, Widowed, Divorced, Separated, and Never Married) are the values of the variable.
Some variables, such as height, weight, and age, may take on many values. Others, such as gender, may take on just a few values. Regardless of how many values a variable may take on, you can usually determine the name of the variable by asking the question "What is this individual's ______?" For example, "What is this individual's weight?" So the name of the variable is weight. Suppose the answer is 120 pounds. Then the value of this variable is 120 pounds. In this case, 120 pounds is just one of the many values of the variable named weight.
Domestic Violence: Battered women who live in poverty are often forced to choose between abusive relationships and homelessness. In a study of 777 homeless parents (the majority of whom were mothers) in ten U.S. cities, 22% said they had left their last place of residence because of domestic violence (Homes for the Homeless, 1998).
What is the population of interest in this study? Answer: The population of interest is all homeless parents in the ten U.S. cities.
What is the variable of interest being studied? Answer: The variable of interest is each homeless parent's response as to whether or not they left their last place of residence because of domestic violence.
What is the size of the sample used? Answer: The sample size is 777 homeless parents (the majority of whom were mothers) in the ten U.S. cities.
Question 1: In comparing individuals, the mean number of hours spent watching TV will be higher among newspaper readers than among nonreaders.
Answer: Independent variable: whether or not an individual reads newspapers. Dependent variable: number of hours spent watching TV.
Question 2: In comparing candidates campaigning for elections, those who spend more money on their campaigns are more likely to win than those who spend less money on their campaigns.
Answer: Independent variable: amount of money spent on the campaign. Dependent variable: whether or not the candidate won the election.
Question 3: In comparing students, those who arrive late to class are more likely to receive poor grades than those who arrive on time.
Answer: Independent variable: whether or not the student arrived late. Dependent variable: whether or not the student received a poor grade.
What is a Hypothesis?
When researchers propose an explanation in answer to a "why" question, the explanation must be stated in such a way that it can be tested with empirical data. A hypothesis, therefore, is a testable statement about the empirical relationship between an independent variable and a dependent variable (or between cause and effect).
For example, we might formulate the hypothesis that students from richer communities have higher SAT scores than those from poorer communities.
There are scientific procedures that must be followed to determine whether or not a hypothesis is incorrect. To make this determination, researchers describe a set of conditions under which the hypothesis would be rejected. To test hypotheses, we use empirical comparison. For example, using empirical data, we can compare the incomes of people with less education to the incomes of people with more education. We will learn more about the set of procedures for determining whether or not a hypothesis is incorrect.
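The education and income comparison described above can be sketched as a comparison of group means. A minimal sketch, assuming small hypothetical income figures (invented for illustration, in thousands of dollars) and a two-category education variable: it summarizes the dependent variable separately for each category of the independent variable. Deciding whether an observed difference is large enough to reject a hypothesis requires the formal testing procedures mentioned above, which this sketch does not perform.

```python
import statistics

# Hypothetical annual incomes in thousands of dollars (invented for illustration).
incomes = {
    "less education": [28, 31, 35, 30, 26, 33],
    "more education": [45, 52, 39, 48, 55, 41],
}

# Empirical comparison: summarize the dependent variable (income) separately
# for each category of the independent variable (education).
group_means = {group: statistics.mean(values) for group, values in incomes.items()}
difference = group_means["more education"] - group_means["less education"]

for group, m in group_means.items():
    print(f"{group}: mean income = {m:.1f}")
print(f"difference in means = {difference:.1f}")
```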
Writing a Hypothesis
After we have determined the two variables whose relationship we are trying to examine, we can state our hypothesis by linking one category of the independent variable with a category of the dependent variable and describing their relationship as a "more likely" or "less likely" type of relationship. We can use the following format: In comparing [name of the units of analysis], those who are/those having [one category of the independent variable] are more likely to [one category of the dependent variable being considered] than those who are/those having [a different category of the independent variable, the one with the lowest percentage].
For example, for attitudes toward handgun permits, we can make a statement such as: In comparing individuals, those who are women will be more likely to favor handgun permits than those who are men.
Examples
A Gallup Youth Poll was conducted at a certain university to determine the topics that students most want to discuss with their parents. The findings show that 55% would like to discuss their family's financial situation more, 35% would like to talk about school, and 10% would like to talk about religion. The survey was based on a sample of 500 students.
Question 1: What is the population of interest in this study? Answer: The population of interest is all students attending that particular university.
Question 2: What is the variable of interest being studied? Answer: The variable of interest is the topic that students most want to discuss with their parents.
Question 3: What are the values of this variable of interest? Answer: The values of this variable are: discussing the family's financial situation, talking about school, and talking about religion.
Question 4: What is the size of the sample used? Answer: The sample size is 500 students.
Question 6: What descriptive statistics were used in this study?
Answer: Of the 500 students, 55% would like to discuss their family's financial situation more, 35% would like to talk about school, and 10% would like to talk about religion.
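The reported percentages can be turned back into counts of students, which is a useful check that a frequency table is consistent. A minimal sketch, assuming the reported percentages are exact rather than rounded (the poll summary does not say which).

```python
# Reported percentages from the Gallup Youth Poll example (sample of 500 students).
sample_size = 500
percentages = {
    "family financial situation": 55,
    "school": 35,
    "religion": 10,
}

# Convert each percentage back to a number of students (integer arithmetic,
# valid only if the percentages are exact).
counts = {topic: sample_size * pct // 100 for topic, pct in percentages.items()}

for topic, count in counts.items():
    print(f"{topic}: {count} students ({percentages[topic]}%)")
```

Because the three categories account for every student, the counts should add up to the sample size of 500.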
Question 7: What statistical inference could be made from this study? Answer: An estimated 55% of all students at that university would like to discuss their family's financial situation more; that is, a majority of all students would. Very few students (an estimated 10%) would like to talk about religion.
(Hint: use your creativity to come up with explanations for why a majority of all students would like to discuss their family's financial situation more but very few would like to talk about religion.)
Answer: In comparing students, those who are more concerned about their own financial aid eligibility would like to discuss their family's financial situation more, while those who are less concerned about their own financial aid eligibility would like to talk about religion.
Question 11: For your hypothesis, what are the independent and dependent variables? Answer: Independent variable: degree of concern about the student's own financial aid eligibility. Dependent variable: the topic that students most want to discuss with their parents.
Question 12: For your hypothesis, list possible control variables. Answer: Family income.
Control Variables
In conducting research, the independent variable may not be the only variable that affects the dependent variable. Similarly, other variables could influence the relationship among the variables under study. If such variables are not controlled, they can confuse the interpretation of the research results. Variables that are not among the variables under investigation but could influence the relationship among them if not controlled are called control variables. Control variables must be held constant, or prevented from varying; otherwise, they can confuse the interpretation of the research results. Control variables are important because they limit the focus of the research to specific subgroups.
Suppose that in a study we are interested in understanding why some people perform much better academically than others. A researcher may propose that class attendance plays a causal role. This may lead to the statement that students who attend class more are more likely, on average, to perform better academically than students who attend class less. In this case, the variable we want to explain (the dependent variable) is academic performance, and the variable that represents the causal factor in the explanation (the independent variable) is class attendance. However, factors other than class attendance might also affect academic performance. For instance, the number of hours of study, the student's age, class participation, and social responsibilities might also influence academic performance. As a result, the researcher can limit the study to a certain age group, or to a particular level of class participation.
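Limiting a study to one age group at a time is one way of holding a control variable constant. A minimal sketch, assuming hypothetical student records (all values invented) and simplifying class attendance to two categories ("attends often" or not), with GPA standing in for academic performance: the comparison between attendance categories is made separately within each age group, so age cannot account for the difference observed inside a group.

```python
import statistics

# Hypothetical student records: (age group, attends class often?, GPA).
# All values are invented for illustration.
students = [
    ("18-22", True, 3.4), ("18-22", True, 3.6),
    ("18-22", False, 2.9), ("18-22", False, 3.0),
    ("23+", True, 3.8), ("23+", True, 3.5),
    ("23+", False, 3.3), ("23+", False, 3.1),
]


def mean_gpa(records, attends_often):
    """Mean GPA of the records in the given attendance category."""
    return statistics.mean(gpa for _, often, gpa in records if often == attends_often)


# Hold the control variable (age group) constant by restricting the analysis
# to one subgroup at a time, then compare attendance categories within it.
for group in ("18-22", "23+"):
    subgroup = [r for r in students if r[0] == group]
    high = mean_gpa(subgroup, True)
    low = mean_gpa(subgroup, False)
    print(f"age {group}: attends often = {high:.2f}, attends less = {low:.2f}")
```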