
CHILD MORTALITY REDUCTION: A CLOSER VIEW

Lusi Yang
University of Toronto
Abstract

Although the number of under-five deaths worldwide has declined from 12.7 million

in 1990 to 5.9 million in 2015, many children are still dying in the poor and unsanitary regions

of Africa. This paper had two aims: to study the associations between the response variable,

under-five mortality rate and the predictors, which are selected explanatory factors of child

mortality; and to assess the full multiple linear model by using Box-Cox transformation for

the skewed response variable. The associations were studied based on the results of the

multiple linear regression both before and after the Box-Cox transformation. Numerous

sources such as the WHO have confirmed the associations of the transformed model. To

justify the adequacy of the Box-Cox transformation, histograms of the residuals and the

normal Q-Q plots of both models were plotted. The transformed model satisfied not only

the normality-of-residuals assumption of linear regression but also its other assumptions,

including a linear relationship between the dependent and

independent variables and homoscedasticity. The Box-Cox transformed model also had a

lower AIC than the non-transformed model. Thus, the Box-Cox transformation can be a tool for

correcting the non-normality of residuals and fulfilling other linear regression assumptions.

Child Mortality Reduction: A Closer View
Introduction

Children are key to shaping future demographics. Thus, for the well-being of children and the stability of the international economy, world leaders must set goals to reduce the child mortality rate. According to the World Health Organization (WHO), the number of under-five deaths worldwide declined from 12.7 million in 1990 to 5.9 million in 2015, which is about 16,000 deaths every day compared with 35,000 in 1990 (WHO, 2016). According to the United Nations Children's Fund (UNICEF), children in Sub-Saharan Africa suffer the highest under-five mortality rate in the world (UNICEF, 2016). This paper therefore takes a closer look at under-five mortality rates around the world through a statistical lens. The study aimed to investigate the associations between the dependent variable, the under-five mortality rate, and the selected explanatory factors as independent variables using multiple linear regression, and to assess the full multiple linear model by applying a Box-Cox transformation to the skewed response variable.

Material and Methods

The Sample and the Quality of the Data

The independent variables were selected based on studies published by organizations such as the WHO, the Alan Guttmacher Institute (AGI), Our World in Data, and the United Nations (UN). For instance, a report on family planning by the AGI listed several explanatory factors for infant mortality, including births to adolescents, closely spaced births, high fertility rates, less-educated women, and lack of government spending on health (AGI, 2002). Moreover, the WHO (2016) listed overcrowded conditions, unsafe drinking water and food, and poor hygiene practices as major explanatory factors of under-five mortality. This paper examines whether these associations are also supported from a statistical perspective.

In this study, the data were gathered from Gapminder, which compiles them from various sources including the UN, the World Bank, the WHO, the OECD, and UNICEF. There were 14 independent variables and 1 dependent variable for 181 nations. These countries, out of a total of 196 nations, were assumed to be representative of the world. However, there are several concerns about the dataset. The first is that many developing nations did not have the resources to gather reliable data; thus, this study must assume that the data contain no measurement errors. The second is that, since some countries did not have data for certain variables, missing data are expected in the dataset. The last limitation is that not all explanatory factors associated with the under-five mortality rate can be included, because doing so would introduce multicollinearity.
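The effect of missing data on the usable sample can be checked before fitting. The following is a minimal R sketch on a toy data frame; the values and the column names (y, x1, x2) are hypothetical and are not taken from the study's dataset. It counts missing values per variable and shows that lm(), under its default na.action, uses only the fully observed rows.

```r
# Toy data frame with gaps (hypothetical values, not the study's data)
df <- data.frame(
  y  = c(1.2, 2.3, NA, 4.1, 5.0, 6.2),
  x1 = c(1, 2, 3, NA, 5, 6),
  x2 = c(2, 1, 4, 3, 6, 5)
)

# Missing values per variable and number of fully observed rows
na_per_var <- colSums(is.na(df))
n_complete <- sum(complete.cases(df))

# lm() drops incomplete rows under the default na.action (na.omit)
fit <- lm(y ~ x1 + x2, data = df)

print(na_per_var)
print(c(rows = nrow(df), complete = n_complete, used = nobs(fit)))
```

Comparing nrow() with nobs() in this way makes explicit how many countries actually enter a regression once incomplete rows are dropped.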

Figure 1. Histograms of the Variables

Figure 1 presents histograms of the 15 variables in this study of the under-five mortality rate. Refer to Appendix B for the definition of each variable. The dependent variable is right-skewed, which calls for a transformation.
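Skewness can also be quantified rather than judged from a histogram alone. The sketch below defines a moment-based sample skewness in base R; the helper name skewness and the simulated vectors are illustrative assumptions, not part of the original analysis. A clearly positive value indicates right skew.

```r
# Moment-based sample skewness; missing values are dropped first
skewness <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

set.seed(5)
y_skewed <- rexp(500)          # hypothetical right-skewed variable
y_sym    <- c(1, 2, 3, 4, 5)   # symmetric toy vector

print(skewness(y_skewed))  # positive for a right-skewed sample
print(skewness(y_sym))     # zero for a symmetric sample
```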

The Method: Multiple Linear Regression Model

Suppose there are $m$ selected explanatory factors to be analyzed through a multiple regression. Let the explanatory variables be $X_1, X_2, \ldots, X_m$, and let the response variable be $y$. The multiple linear regression in matrix form can be set up as

$$y = X\beta + \varepsilon, \tag{1}$$

where $X$ is an $n \times (m+1)$ design matrix that includes the intercept column, $n$ is the number of observations, and $m$ is the number of independent variables; $\beta$ is an $(m+1) \times 1$ vector of coefficients we want to estimate; $\varepsilon$ is an $n \times 1$ error vector; and $y$ is an $n \times 1$ vector of observations of the dependent variable. The coefficients $\beta$ are estimated by minimizing the sum of squared residuals. A multiple linear regression can be estimated through lm() in R. A linear regression model assumes a linear relationship between the response and the predictors, normality of the residuals, and homoscedasticity. To study the association between the $i$th independent variable $X_i$ and the dependent variable $y$, there are three possible cases for the $i$th coefficient: $\beta_i = 0$ means there is no linear association between $y$ and $X_i$; $\beta_i > 0$ means there is a positive linear association between $y$ and $X_i$; and $\beta_i < 0$ means there is a negative linear association between $y$ and $X_i$.
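As a minimal sketch of the regression set-up in equation (1), the R snippet below fits a multiple linear regression with lm() on simulated data and checks that the estimates agree with the least-squares solution of the normal equations. The data, the variable names (x1, x2), and the true coefficients are illustrative assumptions, not the study's dataset.

```r
# Simulated data (hypothetical; not the study's dataset)
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.8 * x2 + rnorm(n)

# Fit the model by ordinary least squares
fit <- lm(y ~ x1 + x2)

# Same estimates from the normal equations: beta_hat = (X'X)^{-1} X'y
X        <- cbind(1, x1, x2)   # n x (m+1) design matrix with intercept column
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# The sign of each slope gives the direction of the linear association
print(coef(fit))
print(sign(coef(fit))[-1])
```

Here the positive slope on x1 and the negative slope on x2 correspond to the positive and negative association cases described above.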

To ensure that the errors are i.i.d. normally distributed, Box and Cox developed a

method for choosing the best transformation from a set of power transformations to

correct the violation. The power transformations can be defined as follows:

Let $\lambda \in \mathbb{R}$, then

$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\ \log y, & \lambda = 0. \end{cases} \tag{2}$$

A value of $\lambda$ that maximizes the log likelihood or minimizes the sum of squared residuals would be most appropriate. There are several steps involved to find the optimal $\lambda$.

Step 1. Load the R package MASS and apply its boxcox function to the R lm object. The boxcox function also displays the log-likelihood versus $\lambda$ plot, which can be used to visually determine the $\lambda$ that maximizes the log likelihood.

Step 2. Type in lambda<-bc$x[which.max(bc$y)] to find the $\lambda$ that maximizes the log likelihood. Then type in lambda, and the optimal $\lambda$ will be displayed.

Step 3. After finding $\lambda$, apply it as the exponent of the response variable and run the lm function again.

Additional steps to assess the adequacy of the Box-Cox transformation:

Step 4. Use the diagnostic plot in R, plot(fit1), and then examine the Normal Q-Q plot to see

if the residuals are normally distributed.

Step 5. Compare the AIC of the two models to see the improvement after transformation.

Step 6. Compare the before and after transformation diagnostic plots in R.
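The steps above can be sketched end to end on simulated data. This is an illustration under stated assumptions, not the study's analysis: the data frame dat, the grid of candidate values, and the true exponent of 1/3 are all hypothetical. The model is fitted with y = TRUE and qr = TRUE so that boxcox() can use the lm object directly, and, as in Step 3, the response is simply raised to the selected power.

```r
library(MASS)  # provides boxcox()

# Simulated right-skewed, strictly positive response (hypothetical data)
set.seed(42)
n   <- 200
x   <- runif(n, 1, 10)
dat <- data.frame(x = x, y = (1 + 0.5 * x + rnorm(n, sd = 0.3))^3)

# Step 1: profile log-likelihood over a grid of candidate exponents
fit0 <- lm(y ~ x, data = dat, y = TRUE, qr = TRUE)
bc   <- boxcox(fit0, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Step 2: exponent that maximizes the log-likelihood
lambda <- bc$x[which.max(bc$y)]
print(lambda)

# Step 3: refit with the response raised to the selected power
fit1 <- lm(I(y^lambda) ~ x, data = dat)

# Steps 4 to 6: compare normal Q-Q plots before and after
op <- par(mfrow = c(1, 2))
qqnorm(rstandard(fit0), main = "Before"); qqline(rstandard(fit0))
qqnorm(rstandard(fit1), main = "After");  qqline(rstandard(fit1))
par(op)
```

Because the simulated response is linear on the cube-root scale, the selected exponent should land near 1/3, which gives a quick sanity check that the grid search behaves as expected.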

Results

The goals of this study were to identify the associations between the dependent and

independent variables and to assess the full multiple linear model by using Box-Cox

transformation for the response variable. The results of the multiple linear regression model

before and after the transformation are presented in Table 1. The resulting $\lambda$ that maximizes

the log likelihood in R for the Box-Cox transformation is 0.3838384.

             Before Box-Cox Transformation            After Box-Cox Transformation
Coefficient  β-Coefficient (95% CI)          p-Value  β-Coefficient (95% CI)              p-Value
intercept    166.574 (110.811, 222.337)      0        8.923 (7.240, 10.606)               0
fer          7.643 (3.549, 11.738)           0.0003   0.271 (0.1478, 0.395)               0
teenfer      0.118 (-0.007, 0.243)           0.064    0.0024 (-0.0014, 0.006)             0.2136
gdppc        0.000147 (-0.0002, 0.0005)      0.441    -0.000008 (-0.00002, 0.000003)      0.1642
gvt          0.00065 (-0.003, 0.004)         0.735    -0.00004 (-0.00015, 0.00007)        0.4998
agr          0.222 (-0.200, 0.644)           0.298    0.004 (-0.009, 0.0165)              0.5570
sch          -0.579 (-1.964, 0.807)          0.409    -0.0426 (-0.084, -0.0008)           0.0458
water        -0.310 (-0.643, 0.023)          0.068    -0.008 (-0.0176, 0.002)             0.1370
san          -0.019 (-0.235, 0.198)          0.865    -0.003 (-0.009, 0.004)              0.3872
le           -1.826 (-2.432, -1.220)         0        -0.069 (-0.088, -0.051)             0
pop          0.0 (-)                         0.490    0.0 (-)                             0.3117

Table 1. Regression results before and after the Box-Cox transformation

Before the transformation, fertility (fer), teen fertility (teenfer), GDP per capita

(gdppc), government spending (gvt), and agriculture (agr) had a positive association with

under-five mortality rate; school (sch), water, sanitation (san), and life expectancy (le) had a

negative association with under-five mortality rate; and population (pop) had no linear association

with under-five mortality rate. Based on the p-values, only fertility and life expectancy were

statistically significant. On the other hand, after the Box-Cox transformation, both GDP per

capita and government spending changed signs, so the associations became negative. Based

on the p-values, fertility, school, and life expectancy were statistically significant.

To check whether the transformation improved the normality of the errors, we

compared the histograms and normal Q-Q plots of the standardized residuals before and after the transformation.

Figure 2. Histogram and Normal Q-Q plot of standardized residuals before the transformation

The above histogram of standardized residuals before transformation suggests that the

residuals are not normally distributed because there are several extreme positive and

negative residuals. The corresponding Normal Q-Q plot also suggests residuals are not

normally distributed because of the very high and very low points (outliers) relative to the

linear trend. The overall AIC for this model was 908.59.
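There is more than one way to standardize residuals in R, and the choice affects what the histogram and Q-Q plot show. The sketch below, on simulated data with hypothetical names, contrasts centring and scaling the raw residuals with scale() (the approach used in Appendix C) with the internally studentized residuals returned by rstandard(), which also account for each observation's leverage.

```r
# Simulated data (hypothetical; not the study's dataset)
set.seed(7)
n   <- 80
x   <- rexp(n)
y   <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

# (a) Raw residuals centred and scaled to unit standard deviation
r_scaled <- as.vector(scale(residuals(fit)))

# (b) Internally studentized residuals: e_i / (s * sqrt(1 - h_ii))
r_std    <- rstandard(fit)
h        <- hatvalues(fit)          # leverages h_ii
s        <- summary(fit)$sigma      # residual standard error
r_manual <- residuals(fit) / (s * sqrt(1 - h))

# Histogram and normal Q-Q plot of the standardized residuals
op <- par(mfrow = c(1, 2))
hist(r_std, xlab = "Standardized Residuals",
     main = "Histogram of Standardized Residuals")
qqnorm(r_std); qqline(r_std)
par(op)
```

The two versions are usually close, but they can differ noticeably for high-leverage observations, which are exactly the points of interest when looking for outliers.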

Figure 3. Histogram and Normal Q-Q plot of standardized residuals after the Box-Cox transformation

Since point 4 (Angola) and point 144 (Sierra Leone) were outliers, they were removed

from the dataset. In Figure 3, the histogram after Box-Cox transformation looks more

normal than the histogram before the transformation. The corresponding Normal Q-Q plot

also looks straighter, although there are still some deviations at the tails. The overall AIC

after the transformation was 110.98. Based on the diagnostic plots of the second row in

Figure 4, all other linear regression assumptions including linear relationship and

homoscedasticity have been met.

Figure 4. The Diagnostic Plots of the model before and after the Box-Cox transformation

Discussion

The positive associations between the under-five mortality rate and fertility, teen fertility, and agriculture in the Box-Cox transformed model are consistent with numerous studies (AGI, 2002; BBC, 2013; WHO, 2017; Roser, 2016). The positive association between the under-five mortality rate and agriculture is of particular interest. In agriculture-dependent economies, hazardous pesticides and child labour pose serious risks to children (BOI, 2015; WHO, 2017). The negative associations between the under-five mortality rate and GDP per capita, government spending, school, water, sanitation, and life expectancy are also consistent with several studies (AGI, 2002; Günther & Fink, 2011; O'Hare, 2013; Page et al., 2014; Zeltner et al., 2015; WHO, 2017). Thus, the coefficient signs estimated by the Box-Cox model agree with the literature.

In addition, the normality of the residuals improved after the Box-Cox transformation. The histogram in Figure 3 is closer to normal than the histogram in Figure 2. Moreover, the Normal Q-Q plot identified the outliers of the dataset and became more linear, even though it still shows some outlying observations. The AIC of the before-transformation model was 908.59 and the AIC of the after-transformation model was 110.98, roughly an eightfold reduction. Because the two AIC values are computed on different response scales, this comparison is indicative rather than exact.
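AIC values from a model for y and a model for a power of y refer to likelihoods on different scales. One common way to put them on the same footing is to add the log-Jacobian of the transformation to the log-likelihood of the transformed model. The R sketch below illustrates this on simulated data; the data, the fixed exponent of 1/3, and all object names are assumptions made for the example, and this adjustment is not part of the original analysis.

```r
# Simulated skewed response (hypothetical; not the study's dataset)
set.seed(3)
n      <- 150
x      <- runif(n, 1, 10)
y      <- (1 + 0.5 * x + rnorm(n, sd = 0.3))^3
lambda <- 1/3   # assumed known here for simplicity

fit_raw <- lm(y ~ x)
fit_bc  <- lm(I(y^lambda) ~ x)

# Log-Jacobian of g(y) = y^lambda: sum of log(lambda * y^(lambda - 1))
log_jac <- sum(log(lambda) + (lambda - 1) * log(y))

aic_raw      <- AIC(fit_raw)
aic_bc_naive <- AIC(fit_bc)                 # likelihood on the scale of y^lambda
aic_bc_adj   <- AIC(fit_bc) - 2 * log_jac   # likelihood on the scale of y

print(c(raw = aic_raw, naive = aic_bc_naive, adjusted = aic_bc_adj))
```

In this simulation the adjusted AIC still favours the transformed model, but by a much smaller margin than the naive comparison suggests, which is why ratios of unadjusted AIC values should be read with caution.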

After the transformation, the other assumptions of linear regression were also better satisfied. In the second row of Figure 4, the Scale-Location plot shows that the residuals are randomly spread out, which satisfies the homoscedasticity assumption; the Residuals vs Leverage plot indicates that there are no influential cases because all points lie within the Cook's distance lines; and the Residuals vs Fitted plot shows that the residuals are equally spread around the horizontal line, so the linear relationship between the dependent and independent variables is satisfied. Thus, the transformation of the response variable was effective.
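The visual checks described above can be complemented with simple numerical summaries. The following R sketch, on simulated data with hypothetical names, lists observations whose Cook's distance exceeds 0.5 (the inner contour drawn by default on R's Residuals vs Leverage plot) and runs an informal auxiliary regression of squared residuals on fitted values as a rough look at non-constant variance. It is an illustration only and not a procedure used in the study.

```r
# Simulated data (hypothetical; not the study's dataset)
set.seed(11)
n   <- 60
x   <- rnorm(n)
y   <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

# Influence: Cook's distance for every observation
cd          <- cooks.distance(fit)
influential <- which(cd > 0.5)   # 0.5 is the inner default contour in plot.lm
print(max(cd))
print(influential)

# Informal look at heteroscedasticity: do squared residuals trend with fitted values?
e2  <- residuals(fit)^2
fv  <- fitted(fit)
aux <- lm(e2 ~ fv)
print(summary(aux)$coefficients)
```

A slope near zero in the auxiliary regression is consistent with a flat Scale-Location plot, while a clearly non-zero slope would point to variance that changes with the fitted values.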

Conclusion

Based on this study, the associations between the child mortality rate and the chosen explanatory factors agree with previous studies. Thus, policy makers should shape their policies in the directions of the associations found in this study to reduce the child mortality rate. For the skewed response variable, we applied the Box-Cox transformation to improve the normality of the residuals. The transformation was adequate because it improved not just the normality of the residuals but also the other multiple linear regression assumptions: a linear relationship between the response and predictor variables, and homoscedasticity. As a result, this study shows that the Box-Cox transformation can be a tool for correcting the non-normality of residuals and fulfilling other linear regression assumptions.

Appendices

Appendix A: References

AGI (2002). Family Planning Can Reduce High Infant Mortality Levels. (2016). Retrieved
December 26, 2016, from https://www.guttmacher.org/report/family-planning
can-reduce-highinfant-mortality-levels

Ezeh, O. K., Agho, K. E., Dibley, M. J., Hall, J., & Page, A. N. (2014, September). The Impact of
Water and Sanitation on Childhood Mortality in Nigeria: Evidence from
Demographic and Health Surveys, 2003–2013. Retrieved April 26, 2017, from
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4199018/

Findings on the Worst Forms of Child Labor - Côte d'Ivoire. (2016, December 07). Retrieved
April 26, 2017, from https://www.dol.gov/agencies/ilab/resources/reports/child
labor/c%C3%B4te-dIvoire

Gapminder: Unveiling the beauty of statistics for a fact. Retrieved December 26, 2016,
from https://www.gapminder.org/

Günther, I., & Fink, G. (2011). Water and Sanitation to Reduce Child Mortality: The Impact and
Cost of Water and Sanitation Infrastructure (Rep.). Washington D.C.: The World
Bank.

Maruthappu, M., Ng, K. Y., Williams, C., Atun, R., & Zeltner, T. (2015, April 01). Government
Health Care Spending and Child Mortality. Retrieved April 26, 2017, from
http://pediatrics.aappublications.org/content/135/4/e887

O'Hare, B., Makuta, I., Chiwaula, L., & Bar-Zeev, N. (2013, October). Income and child
mortality in developing countries: a systematic review and meta-analysis.
Retrieved April 26, 2017, from
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3791093/

Roser, M. (2016). Child Mortality. Retrieved December 26, 2016, from
https://ourworldindata.org/child-mortality/

The cost of a polluted environment: 1.7 million child deaths a year, says WHO. (2017).
Retrieved April 26, 2017, from
http://www.who.int/mediacentre/news/releases/2017/pollution-child-death/en/

Under-Five Mortality. (2016, October). Retrieved April 26, 2017, from
https://data.unicef.org/topic/child-survival/under-five-mortality/

WHO. (2016, September). Children: reducing mortality. Retrieved December 26, 2016, from
http://www.who.int/mediacentre/factsheets/fs178/en/

Young mothers 'risk factor for early childhood death' (2013, September 30). Retrieved April
26, 2017, from http://www.bbc.com/news/health-24296960

Appendix B: The Variables

Variable: Y
Name: Under-five mortality rate (per 1,000 live births)
Definition: The probability that a child born in a specific year will die before reaching the age of five if subject to current age-specific mortality rates. Expressed as a rate per 1,000 live births.
Source: www.childmortality.org, www.mortality.org

Variable: X1
Name: Children per woman (total fertility)
Definition: Total fertility rate. The number of children that would be born to each woman with prevailing age-specific fertility rates.
Source: UN data (most observations after 1950)

Variable: X2
Name: Teen fertility rate (births per 1,000 women ages 15-19)
Definition: Teen fertility rate is the number of births per 1,000 women ages 15-19. World Bank staff estimates from various sources including census reports, the United Nations Population Division's World Population Prospects, national statistical offices, household surveys conducted by national agencies, and Macro International. Gapminder has added historical series for United States, Sweden and Algeria.
Source: World Bank data and the WPP data from UN

Variable: X3
Name: Income per capita (GDP/cap, PPP$ inflation-adjusted)
Definition: Gross Domestic Product per capita by Purchasing Power Parities (in international dollars, fixed 2011 prices). Inflation and differences in the cost of living between countries have been taken into account.
Source: Compiled by Mattias Lindgren, Gapminder

Variable: X4
Name: Per capita government expenditure on health at average exchange rate (US$)
Definition: Per capita general government expenditure on health expressed at average exchange rate for that year in US dollars. Current prices.
Source: World Health Organization, http://www.who.int

Variable: X5
Name: Agriculture, value added (% of GDP)
Definition: Agriculture corresponds to ISIC divisions 1-5 and includes forestry, hunting, and fishing, as well as cultivation of crops and livestock production.
Source: World Bank National Accounts Data, and OECD National Accounts Data, http://data.worldbank.org/indicator

Variable: X6
Name: Mean years in school, women ages 15-44
Definition: The average number of years of school attended by all people in the age and gender group specified, including primary, secondary and tertiary education.
Source: Institute for Health Metrics and Evaluation, http://www.healthdata.org/

Variable: X7
Name: Improved water source, overall %
Definition: The percentage of the total population who use any of the following types of water supply for drinking: piped water into dwelling, plot or yard; public tap/standpipe; borehole/tube well; protected dug well; protected spring; rainwater collection and bottled water.
Source: The United Nations site for the MDG Indicators, http://mdgs.un.org/unsd/mdg/Data.aspx

Variable: X8
Name: Improved sanitation, overall %
Definition: Access to improved sanitation facilities refers to the percentage of the population with at least adequate access to excreta disposal facilities that can effectively prevent human, animal, and insect contact with excreta. Improved facilities range from simple but protected pit latrines to flush toilets with a sewerage connection. To be effective, facilities must be correctly constructed and properly maintained.
Source: World Development Indicators, http://data.worldbank.org/indicator/SH.STA.ACSN

Variable: X9
Name: Life expectancy
Definition: The average number of years a newborn child would live if current mortality patterns were to stay the same.
Source: Various sources

Variable: X10
Name: Total population
Definition: Total number of population of both sexes; data after 2010 is based on the medium estimates from the UN population division.
Source: Mattias Lindgren, Gapminder

Variable: X11
Name: DTP3 immunized, % of one-year-olds
Definition: One-year-olds immunized with three doses of diphtheria tetanus toxoid and pertussis (DTP3) (%).
Source: UNICEF and WHO, https://www.unicef.org/

Variable: X12
Name: Contraceptive use, % of women ages 15-49
Definition: Contraceptive prevalence rate is the percentage of women who are practicing, or whose sexual partners are practicing, any form of contraception. It is usually measured for married women ages 15-49 only.
Source: World Bank, http://data.worldbank.org/indicator

Variable: X13
Name: CO2 per capita (metric tons per person)
Definition: Carbon dioxide emissions from the burning of fossil fuels.
Source: Gapminder

Variable: X14
Name: Pneumonia deaths in newborn, per 1,000 births
Definition: Pneumonia deaths in newborn (per 1,000 births).
Source: Gapminder

Appendix C: The Code

# Load the full dataset; columns are accessed explicitly via study$..., so attach() is not needed
study = read.csv("/Users/Lusi/Desktop/study.csv")
# Rename Variables
mor = study$underFiveMortality
fer = study$totalFertilityRate
teenfer = study$teenFertility
gdppc = study$GDPPerCapita
gvt = study$govtExOnHealthPerCapita
agr = study$agriPercentGDP
sch = study$womenMeanYearsInSchool
water = study$improvedDrinkingWaterSourcesInPercentage
san = study$improvedSanitationFacilitiesInPercentage
le = study$lifeExpectancy
pop = study$totalPopulation
dtp3 = study$DTP3ImmunizedInPercentage
contra = study$contraceptivePrevalenceInPercentage
co2 = study$CO2
pne = study$pneumoniaDeathsInNewborns

# Study the distribution of each variable


par(mfrow=c(3,5))
hist(mor)
hist(fer)
hist(teenfer)
hist(gdppc)
hist(gvt)
hist(agr)
hist(sch)
hist(water)
hist(san)

hist(le)
hist(pop)
hist(dtp3)
hist(contra)
hist(co2)
hist(pne)

# Full Regression Model


par(mfrow=c(2,2))
fit = lm(mor~fer+teenfer+gdppc+gvt+agr+sch+water+san+le+pop, data=study)
plot(fit)
# Centre and scale the residuals; a separate name avoids masking the residuals() function
std_res = scale(residuals(fit))
hist(std_res,xlab="Standardized Residuals", main="Histogram of Standardized Residuals")
AIC(fit)

# Based on the original dataset, we removed outliers points 4 Angola and 144 Sierra Leone
study1 = read.csv("/Users/Lusi/Desktop/study1.csv")
# Re-create the short variable names from the reduced dataset
mor = study1$underFiveMortality
fer = study1$totalFertilityRate
teenfer = study1$teenFertility
gdppc = study1$GDPPerCapita
gvt = study1$govtExOnHealthPerCapita
agr = study1$agriPercentGDP
sch = study1$womenMeanYearsInSchool
water = study1$improvedDrinkingWaterSourcesInPercentage
san = study1$improvedSanitationFacilitiesInPercentage
le = study1$lifeExpectancy
pop = study1$totalPopulation
dtp3 = study1$DTP3ImmunizedInPercentage
contra = study1$contraceptivePrevalenceInPercentage
co2 = study1$CO2
pne = study1$pneumoniaDeathsInNewborns

# Apply the Box-Cox transformation to make variances of the error terms more constant
library(MASS)
bc = boxcox(mor~fer+teenfer+gdppc+gvt+agr+sch+water+san+le+pop,na.action=na.exclude)
which.max(bc$y)
lambda<-bc$x[which.max(bc$y)]
lambda
# Use the estimated exponent (0.3838384 for this dataset) instead of a hard-coded value
fit1 = lm(mor^lambda~fer+teenfer+gdppc+gvt+agr+sch+water+san+le+pop)
par(mfrow=c(1,3))
plot(fit1)

# Centre and scale the residuals of the transformed model
std_res1 = scale(residuals(fit1))
hist(std_res1,xlab="Standardized Residuals", main="Histogram of Standardized Residuals")
AIC(fit1)
