Purpose:
The objective of this project is to give you the opportunity to use some of the statistical techniques that you
have learned in this course for exploring a real data set.
You may work individually or in groups of no more than three students. Your group members can be from
different tutorial sections in our class. If you are working in a group, please think of creating a team name
and note that you will only submit on Quercus one report (filled-in template and StatCrunch outputs) on or
before the project’s due date. The ways in which your analysis of this data will be assessed are described on
page 7 of this document.
Context of Data:
The Organisation for Economic Co-operation and Development (OECD) gathers various information
regarding OECD countries and their partners in order to promote policies that aim to improve the economic
and social well-being of people around the world (http://www.oecd.org/about/).
This agency collects quantitative information on many domains and makes the collected data available for
public use (e.g., researchers) so that interested individuals can further investigate relationships among a set
of variables. A domain named “Social Protection and Well-being” includes a yearly data collection, the
“Better Life Index”, from OECD countries. This information can be retrieved from: http://stats.oecd.org
From the “Better Life Index” (BLI), the most recent data published in 2019 but collected in 2018, we will
analyze a quantitative variable named “Life Satisfaction”. Information regarding this variable can be
retrieved from:
http://www.oecd.org/statistics/OECD-Better-Life-Index-definitions-2019.pdf#_ga=2.145820212.1027110605.1559482147-696144184.1473183978
(note that this definition-document is posted on our Quercus page in the module Data Analysis Project).
This variable “considers people's evaluation of their life as a whole. It is a weighted-sum of different
response categories based on people's rates of their current life relative to the best and worst possible lives
for them on a scale from 0 to 10, using the Cantril Ladder (known also as the "Self-Anchoring Striving
Scale")” (BLI, 2019). OECD indicates that they obtained this information based on a certain Poll.
Let us recap the variables of interest in our data analysis:
1. Mean value of people’s life satisfaction
2. Gender of the respondents identified as Male, Female
I recommend that you read about this data here:
http://www.oecdbetterlifeindex.org/#/11111111111
Also, click on “Life Satisfaction” in the right-hand side menu to be directed to another web page:
http://www.oecdbetterlifeindex.org/topics/life-satisfaction/
Scroll down that page; you can click on each country to read about its life satisfaction score.
StatCrunch Activity:
1. Understanding and comparing distributions of life satisfaction scores for males and for females in the
36 OECD countries.
2. Describing the distribution of differences between females’ and males’ life satisfaction scores in the 36
OECD countries.
3. Examining the relationship between males’ and females’ life satisfaction scores in the 36 OECD
countries. We aim to predict males’ life satisfaction scores from females’ life satisfaction scores.
Overview of Steps:
1. Save the following two data files on your computer (e.g., My Document folder):
Data Set 1_Life Satisfaction_BLI2019.csv
Data Set 2_Life Satisfaction_BLI2019.csv
2. Open each of the saved files (above). Add your last name or team name to the following variable names:
In “Data Set 1_Life Satisfaction_BLI2019.csv”, modify the variable name “Life Satisfaction”
In “Data Set 2_Life Satisfaction_BLI2019.csv”, modify the variable names:
“Life Satisfaction_Female”, “Life Satisfaction_Male”, “Diff Life Satisfaction”
3. Re-save the two files whose column headings you have just modified with your last name
or your team name.
4. Follow the steps below to produce your StatCrunch outputs for the analysis of life satisfactions for
males and females. Work on the related questions on pages 5 and 6 of this document.
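If you prefer to rename the column headings with a script rather than by hand (Steps 2 and 3 above), the following is a minimal Python sketch. It assumes the name is appended with an underscore (e.g., “Life Satisfaction_TeamAlpha”); that naming pattern, the team name “TeamAlpha”, the “Country” column, and the helper function are illustrative assumptions, not requirements stated in this handout.

```python
import csv
import io


def add_suffix_to_headers(csv_text, columns, suffix):
    """Return CSV text with `_suffix` appended to the named header columns."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    new_header = [f"{name}_{suffix}" if name in columns else name for name in header]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(new_header)
    writer.writerows(body)
    return out.getvalue()


# Hypothetical one-row excerpt; the layout of the real file may differ.
sample = (
    "Country,Life Satisfaction_Female,Life Satisfaction_Male,Diff Life Satisfaction\n"
    "A,7.0,6.8,0.2\n"
)
renamed = add_suffix_to_headers(
    sample,
    {"Life Satisfaction_Female", "Life Satisfaction_Male", "Diff Life Satisfaction"},
    "TeamAlpha",  # hypothetical team name
)
print(renamed.splitlines()[0])
```

To apply this to a saved file, read its contents into a string, pass the string to the function, and write the result back under the same file name.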
2. In the next horizontal menu bar, click on Data > Load > From file > On my computer
3. In the screen that opens, under “Load data from my computer”, click on “Choose File”
From your computer, find the data file that you saved from Quercus (and modified/re-saved):
“Data Set 1_Life Satisfaction_BLI2019.csv” and select/open it (to be loaded).
At the bottom of the current StatCrunch screen, click on “Load File”
4. Obtain summary statistics for each distribution of life satisfaction scores for males and for females:
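To see what the grouped summary statistics represent, here is a small Python sketch that computes them by gender. It assumes Data Set 1 is stacked, with one gender label and one score per row; that layout and all the scores below are made up for illustration and are not the OECD values. StatCrunch may use a different quartile rule, so its Q1 and Q3 can differ slightly from these.

```python
import statistics

# Made-up scores for illustration only; these are NOT the OECD values.
rows = [
    ("Female", 6.0), ("Female", 7.0), ("Female", 7.5), ("Female", 5.5),
    ("Male", 6.2), ("Male", 6.8), ("Male", 7.4), ("Male", 5.6),
]


def summarize(values):
    """Return n, mean, sample SD, and the five-number summary."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values),
    }


# Group the scores by gender, then summarize each group separately.
by_gender = {}
for gender, score in rows:
    by_gender.setdefault(gender, []).append(score)

summary = {g: summarize(v) for g, v in by_gender.items()}
for g, s in summary.items():
    print(g, {k: round(x, 3) for k, x in s.items()})
```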
2. In the next horizontal menu bar, click on Data > Load > From file > On my computer
3. In the screen that opens, under “Load data from my computer”, click on “Choose File”
From your computer, find the data file that you saved from Quercus (and modified/re-saved):
“Data Set 2_Life Satisfaction_BLI2019.csv” and select/open it (to be loaded).
At the bottom of the current StatCrunch screen, click on “Load File”
4. Obtain summary statistics for the distribution of differences in females’ and males’ life satisfaction scores:
5. Obtain a boxplot of the distribution of differences between females’ and males’ life satisfaction scores:
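The quantities behind a boxplot of differences can be sketched in Python as follows. The direction of the difference (female minus male) is an assumption, as are the paired scores, which are made up and are not the OECD values. The 1.5 × IQR fences are the usual boxplot rule for flagging outliers; StatCrunch's quartile method may give slightly different fences.

```python
import statistics

# Made-up paired scores (female, male) for illustration; NOT the OECD values.
pairs = [(7.0, 6.8), (6.5, 6.5), (5.9, 6.0), (7.4, 7.1), (6.2, 6.1), (5.0, 6.0)]

# Assumed direction: difference = female score - male score.
diffs = [round(f - m, 2) for f, m in pairs]

# Quartiles and interquartile range of the differences.
q1, median, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [d for d in diffs if d < lower_fence or d > upper_fence]

print("five-number summary:", min(diffs), q1, median, q3, max(diffs))
print("outliers:", outliers)
```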
2. Conduct a regression analysis: Predict males’ life satisfaction scores from females’ life satisfaction scores.
Copy & paste your plot into a Word document that you are preparing as your StatCrunch outputs.
Click on Graph > Boxplot
Under Select column(s), choose: “Residuals”
Under Options, select: “Draw boxes horizontally”
Click on “Compute”.
Copy & paste your plot into a Word document that you are preparing as your StatCrunch outputs.
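The “Residuals” column plotted above comes from the fitted regression line. As a rough illustration of the arithmetic, the Python sketch below fits a least-squares line predicting male scores from female scores and computes the residuals (observed minus fitted). The scores are made up for illustration and are not the OECD values.

```python
# Made-up paired scores for illustration; NOT the OECD values.
female = [5.0, 6.0, 7.0, 8.0]  # predictor (x)
male = [5.2, 5.9, 7.1, 7.8]    # response (y)

n = len(female)
mean_x = sum(female) / n
mean_y = sum(male) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in female)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(female, male))

# Least-squares estimates of slope and intercept.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed male score - predicted male score.
fitted = [intercept + slope * x for x in female]
residuals = [y - f for y, f in zip(male, fitted)]

print(f"male-hat = {intercept:.2f} + {slope:.2f} * female")
print("residuals:", [round(e, 2) for e in residuals])
```

A negative residual means the observed male score is lower than the line predicts from the female score; a positive residual means it is higher.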
Related Questions
Part 3. Predict males’ life satisfaction scores from females’ life satisfaction scores.
1. Use the scatterplot of males’ life satisfaction scores versus females’ life satisfaction scores to describe
the relationship.
2. What is the estimated correlation coefficient? Interpret this value.
3. If we examined only those countries with life satisfaction scores above 7 for both genders, what
would happen to the correlation? Discuss why that would happen.
4. Fit a linear regression model relating males’ life satisfaction scores to females’ life satisfaction scores.
That is, fit a straight line for predicting males’ life satisfaction scores from females’ life satisfaction scores.
What is the equation of the regression line?
5. What does the regression line tell us in the context of this study?
6. What does the slope of regression line mean in the context of this study?
7. Note that the slope of the line does not differ much from 1.00. What would a slope of 1.0 indicate about
the nature of the relationship? If we fitted a model with the slope fixed at 1.00, what prediction equation
would you expect to get? (Hint: Refer to the summary statistics for males and females. Find the
mean life satisfaction scores for males and for females to answer this question.)
8. Can we interpret the value of the y-intercept in the regression equation at all? Justify your answer.
9. What is the standard deviation of residuals? Interpret this value in the context of this problem.
10. Use the plots of residuals to assess the overall adequacy of the linear regression model fit to this data. State
the assumption(s) about the residuals that each constructed plot checks and determine whether the
assumption(s) is/are met.
11. In which country or countries do the male respondents have “somewhat unusually” low life satisfaction
scores in relation to the female respondents, according to the regression model? Moreover, in which
country or countries do the female respondents have “somewhat unusually” low life satisfaction scores in
relation to the male respondents, according to the regression model? Give the residual(s) to support and
justify your argument.
12. Give and interpret the R² value in the context of this study.
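Several of the quantities asked about above can be computed directly, which may help when checking your StatCrunch output. The Python sketch below is illustrative only: the scores are made up and are not the OECD values, and the residual standard deviation is assumed to use the n − 2 divisor. It computes the correlation r, R² (which equals r² in simple linear regression), the correlation after restricting to scores above 7, the standard deviation of residuals, and the intercept obtained when the slope is fixed at 1.

```python
import math

# Made-up paired scores for illustration; NOT the OECD values.
female = [5.0, 5.5, 6.0, 6.5, 7.1, 7.3, 7.5]
male = [5.1, 5.4, 6.2, 6.4, 7.3, 7.2, 7.4]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r_full = pearson_r(female, male)
r_squared = r_full ** 2  # in simple linear regression, R^2 equals r^2

# Restrict to pairs where both scores exceed 7 and recompute r.
high = [(x, y) for x, y in zip(female, male) if x > 7 and y > 7]
r_high = pearson_r([x for x, _ in high], [y for _, y in high])

# Least-squares line and the SD of its residuals (assumed n - 2 divisor).
n = len(female)
mx, my = sum(female) / n, sum(male) / n
sxx = sum((x - mx) ** 2 for x in female)
slope = sum((x - mx) * (y - my) for x, y in zip(female, male)) / sxx
intercept = my - slope * mx
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(female, male))
resid_sd = math.sqrt(sse / (n - 2))

# With the slope fixed at 1, least squares gives intercept = mean(y) - mean(x).
intercept_fixed = my - mx

print(f"r = {r_full:.3f}, R^2 = {r_squared:.3f}, r (scores > 7) = {r_high:.3f}")
print(f"residual SD = {resid_sd:.3f}, fixed-slope intercept = {intercept_fixed:.3f}")
```

In this made-up example the correlation drops when only the high-scoring pairs are kept, which is the typical effect of restricting the range of the data; whether and how much it drops in the real data is for you to examine.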
Marked by TA:
Comment (if any):