Q.M. Assignment 1 Updated

Quantitative Methods Assignment 1
MIM 14 F1
Exercise 1
www.exploreiceland.is has committed to display 650.000 ads. Traffic to the website
is estimated to be normally distributed with a mean of 850.000 viewers and a standard
deviation of 150.000.
a)
What is the probability of delivering the promised 650.000 impressions?
The problem consists of obtaining the probability of showing the 650.000 ads when traffic follows a normal distribution. Any normal distribution can be expressed as follows:
$X \sim N(\mu, \sigma)$
Where:
$\mu$ is the mean or average.
$\sigma$ is the standard deviation.
Consequently, our distribution is:
$X \sim N(850.000,\ 150.000)$
Figure 1 Gaussian distribution
The question can be rewritten in a more mathematical style, as we are looking for a probability:
$P(X \geq 650.000)$
This means that we are looking for the probability that X is over 650.000 or, in other words, 1 minus the probability that X is less than 650.000, a value that lies below the mean of the traffic.
Z is obtained using the following equation, which normalizes any Gaussian distribution to N(0,1):
$Z = \frac{X - \mu}{\sigma}$
Where:
X is our study starting point: 650.000 impressions.
$\mu$ is the mean: 850.000 impressions.
$\sigma$ is the standard deviation: 150.000 impressions.
Using these values in the equation for Z, we obtain:
$Z = \frac{650.000 - 850.000}{150.000} = -1.33$
Graphically, this probability is represented in the Gauss curve as shown in the following figure:
The shaded area of the image shows the probability of showing less than 650.000 ads:
$P(X < 650.000) = P(Z < -1.33)$
In order to obtain that probability, we must look up the obtained Z in the table of the normalized Gaussian distribution N(0,1):
Figure 2 Normally distributed data
Using Z = -1.33 as input, there is a probability of 0.0918 = 9.18% of showing less than 650.000 impressions:
$P(Z < -1.33) = 0.0918$
To obtain the probability of showing more than the required 650.000 ads, we have to subtract that value from the total 100% as follows:
$P(X \geq 650.000) = 1 - 0.0918 = 0.9082$
Figure 3 Normalized chart for N(0,1)
The probability of delivering the promised 650.000 impressions is 90.82%
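As a cross-check of the table lookup, the calculation above can be sketched in Python using the standard library's `statistics.NormalDist`. The variable names are illustrative, and Z is rounded to two decimals only to mimic reading a printed Z table.

```python
from statistics import NormalDist

mean_traffic = 850_000   # mean of the traffic distribution
sd_traffic = 150_000     # standard deviation of the traffic
promised = 650_000       # impressions committed

# Standardize and round to two decimals, as when reading a printed Z table.
z = round((promised - mean_traffic) / sd_traffic, 2)

p_below = NormalDist().cdf(z)   # P(Z < z): falling short of the promise
p_deliver = 1 - p_below         # P(X >= promised)

print(f"Z = {z:.2f}, P(below) = {p_below:.4f}, P(deliver) = {p_deliver:.4f}")
```

Without rounding Z, the exact probability is marginally higher (about 90.9%), because the unrounded Z is -1.3333 rather than -1.33.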
b)
How many impressions should be published to be able to guarantee a 95% service level?
In this case, we look for the Z value that leaves a probability of 5% in the lower tail and then, using the equation for Z, we calculate the corresponding number of impressions:
Looking in the normal table for P(Z) = 5%, Z can be obtained:
As 0.05 is located between the table entries for Z = -1.64 and Z = -1.65, it is recommended to make a linear interpolation to obtain Z:
Figure 3 Normal distributed data

Figure 5 Linear interpolation
Figure 4 Normalized chart for N (0,1)
Using this result in the Z equation we can calculate the number of impressions required:
$X = \mu + Z\sigma$
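Interpolating halfway between 1.64 and 1.65 gives Z = -1.645 in the lower tail, so X = 850.000 - 1.645 × 150.000 = 603.250.
The number of impressions required to guarantee a 95% service level is 603.250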
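The same quantile can be obtained without a printed table. The sketch below assumes Python's standard `statistics.NormalDist` and asks directly for the 5th percentile of the traffic distribution. The result can differ slightly from a figure obtained by interpolating a printed table.

```python
from statistics import NormalDist

traffic = NormalDist(mu=850_000, sigma=150_000)

# 95% service level: the promised volume must be reached with probability
# 0.95, so we need the 5th percentile of the traffic distribution.
z = NormalDist().inv_cdf(0.05)
impressions = traffic.inv_cdf(0.05)

print(f"Z = {z:.4f}, impressions = {impressions:,.0f}")
```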
Exercise 2
A sample of 36 weekly observations has a mean of 0.005 and a standard deviation of 0.02.
a)
Calculate a 95% C.I. for the mean weekly return.
To build a 95% Confidence Interval, we must first assume that the sample mean is normally distributed. In order to do so, we can rely on the CENTRAL LIMIT THEOREM:
It is safe to use the normal distribution if the sample is reasonably large (30 or more).
As n increases, the distribution of the sample mean approaches a normal distribution.
Sample size is 36, so it is safe to adopt a normal distribution. Hence, a 95% C.I. can be built using the equation below:
$\bar{x} \pm 1.96 \frac{s}{\sqrt{n}} = 0.005 \pm 1.96 \cdot \frac{0.02}{\sqrt{36}} = 0.005 \pm 0.0065$
95% C.I. = (-0.0015, 0.0115)
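The interval for the mean weekly return can be reproduced with a few lines of Python. This is a minimal sketch: the function name `mean_confidence_interval` is hypothetical, and it assumes the z value 1.96 for a 95% confidence level.

```python
from math import sqrt

def mean_confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) of a z-based confidence interval for the mean."""
    margin = z * sd / sqrt(n)   # margin of error
    return mean - margin, mean + margin

lower, upper = mean_confidence_interval(mean=0.005, sd=0.02, n=36)
print(f"95% C.I. = ({lower:.4f}, {upper:.4f})")
```

The interval includes zero, so at this confidence level the sample alone does not rule out a mean weekly return of zero.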
b)
Size of the sample if the maximum error admitted is 0.004?
We have to calculate a new 95% C.I. knowing what the new range is:
$1.96 \frac{s}{\sqrt{n}} \leq 0.004$
This means that the margin of error of the interval must not exceed 0.004. Knowing this, we can calculate the new value required for the size of the sample:
$n \geq \left(\frac{1.96 \cdot 0.02}{0.004}\right)^2$

n = 96.04, which is rounded up to 97
The size of the sample required is 97
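The sample-size calculation can be sketched as follows. The helper name `required_sample_size` is hypothetical, and the result is rounded up because a fractional observation is not possible.

```python
from math import ceil

def required_sample_size(sd, max_error, z=1.96):
    """Smallest n for which z * sd / sqrt(n) does not exceed max_error."""
    return ceil((z * sd / max_error) ** 2)

n = required_sample_size(sd=0.02, max_error=0.004)
print(n)
```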
c)
Do we need to assume that the weekly return follows a normal distribution?
No. As explained in a), because the sample contains more than 30 observations, we can rely on the CENTRAL LIMIT THEOREM and treat the sample mean as approximately normally distributed, whatever the distribution of the weekly returns. Without this theorem, we would have had to assume that the weekly returns themselves are normally distributed in order to build a 95% C.I. for the mean weekly return.

Exercise 3
A random sample of 256 observations. Sample mean is 35.420 and the sample
standard deviation is 2.050.
a)
What is the estimated mean income for the population?
Again, as the sample consists of 256 random observations, we can use the CENTRAL LIMIT THEOREM and take the sample mean as the point estimate of the population mean.
Estimated mean income for the population is 35.420.
b)
95% C.I. (rounded to the nearest 10) for the estimate of the mean income.
Assuming, as said in a), a normal distribution, we can calculate a 95% C.I. using the equation below:
$\bar{x} \pm 1.96 \frac{s}{\sqrt{n}}$
Once rounded, the 95% confidence interval is:
(35.170, 35.670)
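The arithmetic behind the rounded interval can be written out step by step. This is a worked sketch that assumes the z value 1.96 for a 95% confidence level. Figures are written without thousands separators to avoid confusion with decimals.

```latex
\begin{aligned}
\bar{x} \pm 1.96\,\frac{s}{\sqrt{n}}
  &= 35420 \pm 1.96 \cdot \frac{2050}{\sqrt{256}} \\
  &= 35420 \pm 1.96 \cdot 128.125 \\
  &= 35420 \pm 251.125 \\
  &\Rightarrow (35168.875,\ 35671.125) \approx (35170,\ 35670)
\end{aligned}
```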
c)
Interpret the meaning of the confidence interval.
It means that we are 95% confident that the true mean income is between 35.170 and 35.670. In other words, the procedure used to build this interval captures the true mean income 95% of the time. Hence, if we were to repeat the sampling 100 times and build a 95% confidence interval each time, we would expect about 95 of the 100 intervals to include the true mean.
Exercise 4
Two models, X and Y, are used to forecast the probability that a drug development project is going to be successful. The probability of success is said to be related to the amount spent on R&D. Model Y also takes the number of scientists in the project into account.
a)
Can model X be used to predict project success?
In order to analyze any regression model we must follow the steps below:
1)
Scatterplot and correlation:
As we do not have the original data from which the model has been created, we cannot draw a scatterplot. However, we can see that the correlation value is 0.63, which means that the dependent and independent variables are positively correlated.

2)
Model's significance:
The objective is to check whether the relationship captured by the model could have arisen by chance. For this purpose we will do the following hypothesis testing:
Null hypothesis H0 : m = 0
Alternative hypothesis HA : m ≠ 0
The objective is to reject the null hypothesis so we can be confident that the model has not arisen by chance. This will be analyzed by checking the following three elements:
Figure 6 Hypothesis testing
T-Stat: It is a ratio that tells us how many standard errors the regression coefficient is from zero. This can be calculated using the following formula:
$t = \frac{|m - m_0|}{SE(m)}$
Using $m_0 = 0$ (as the objective is to know how far our slope is from this value), we obtain the t-stat for our model. We can be confident that this difference is significant if the t-stat is > 2.
P-Value: It is the probability of seeing a sample with at least as much evidence in favor of the alternative hypothesis as the sample actually observed, assuming that the null hypothesis is true. The smaller the p-value, the more evidence there is in favor of the alternative hypothesis. As we will use a 95% C.I., we need a p-value below 5%.
C.I.: We also need to check that the 95% confidence interval for the coefficient of the variable analyzed does not contain zero.
To sum up, in order to check the model's significance, we must check the following values:
T-Stat > 2
P-Value < 5%
C.I. 95% does not include 0
We can follow the above steps to analyze any given variable. To check the model's significance, we apply them to Model X:
R&D              Model's value   Pass/Fail
T-stat (>2)      2.7352          PASS
P-value (<5%)    0.0194          PASS
Lower 95%        0.0001
Upper 95%        0.0013          PASS
Hence, we can be confident that the model is statistically significant.
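The three checks can be expressed as a small helper. This is a sketch only: the function name `is_significant` is hypothetical, the thresholds are the rules of thumb used in this assignment, and the input values are the R&D figures for Model X as we read them from the regression output.

```python
def is_significant(t_stat, p_value, ci_lower, ci_upper):
    """Apply the three rule-of-thumb checks to one regression coefficient."""
    checks = {
        "t_stat": abs(t_stat) > 2,
        "p_value": p_value < 0.05,
        "ci_excludes_zero": not (ci_lower <= 0 <= ci_upper),
    }
    return all(checks.values()), checks

# R&D coefficient of Model X, as read from the regression output.
ok, detail = is_significant(2.7352, 0.0194, 0.0001, 0.0013)
print(ok, detail)
```

A coefficient whose interval straddles zero, for example `is_significant(0.5, 0.6, -0.01, 0.02)`, fails all three checks.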

3)
Model's quality:
With this third step, we will check whether the model is good enough to predict the response of the dependent variable and, therefore, to predict the success of the development project. The following values must be analyzed:
Adjusted R2: It is a measure that adjusts R2 (the percentage of variation of the dependent variable explained by the model) for the number of explanatory variables in the equation.
Standard Error of Estimate: It is essentially the standard deviation of the residuals.
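For reference, the usual definition of adjusted R2 is sketched below. The symbols n (number of observations) and k (number of explanatory variables) are introduced here for illustration and are not values taken from the assignment.

```latex
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}
```

Adding an explanatory variable raises k, so adjusted R2 only increases when the new variable improves the fit by more than would be expected by chance.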
Going back to Model X, these are the results given by Excel:
- Adjusted R2 = 35%
This means that, once adjusted for the number of explanatory variables, the percentage of variation of the dependent variable explained by the model is only 35%, which, in our view, is not enough.
- Standard error of estimate = 0.19.
We should compare this value with the range of values of our original data but, as a first opinion, we think that this value is too high.
However, these two conclusions cannot be used as a final decision to qualify the model. In order to do so, we should build a 95% C.I. for the model's forecast and see how accurate it is:
*This interval is obtained in part c) of the exercise:
95% C.I. = (17.59% , 92.07%)
As we can see, the confidence interval is too wide (74.48%). Therefore, we conclude that:
Correlation does not imply causation.
Model X should not be used to predict the success of the development project, as its quality is not good enough.
b)
Evaluate model Y
In order to evaluate model Y we have to follow the same steps used for model X. The only difference is that model Y includes two explanatory variables in the regression analysis.
1)
Scatterplot and correlation:
Again, we cannot draw a scatterplot as we do not know the original values, but we have the correlation analysis in the assignment:
Both R&D spending and No. of scientists show a strong positive correlation with the % of success. Furthermore, the two independent variables are correlated with one another. This is called multicollinearity, and will be explained in the next steps.
2)
Model's significance:

Following the same plan as we did for model X, we need to check:
T-Stat > 2
P-Value < 5%
C.I. 95% does not include 0
R&D              Model's value   Pass/Fail
T-stat (>2)      4.5454          PASS
P-value (<5%)    0.0003          PASS
Lower 95%        0.0005
Upper 95%        0.0014          PASS
Scientists       Model's value   Pass/Fail
T-stat (>2)      0.0961          FAIL
P-value (<5%)    0.9246          FAIL
Lower 95%        -0.0101
Upper 95%        0.011           FAIL
This model presents a peculiarity: one of the independent variables is not significant for the regression analysis. This is due to the multicollinearity mentioned above, which means that the variable No. of scientists does not improve the model's quality.
3)
Model's quality:
We check the quality of model Y as we did for model X:
In this case, the quality is much better:
- Adjusted R2 = 93%
This means that, once adjusted for the number of explanatory variables, the percentage of variation of the dependent variable explained by the model is 93%.
- Standard error of estimate = 0.07.
We should compare this value with the range of values of our original data but, as a first opinion, we think that this value is very good.
However, these two conclusions cannot be used as a final decision to qualify the model. In order to do so, we should build a 95% C.I. for the model's forecast and see how accurate it is:
*This interval is obtained in part c) of the exercise:
95% C.I. = (33.6% , 61.4%)
As we can see, the C.I. is much narrower than that of model X (27.8%). This means that the model is more accurate.

As a conclusion for model Y we can say that:
Model Y is statistically significant and, according to the analysis, it is very accurate.
It can be improved by removing the variable No. of scientists, as it does not improve the significance of the model.
c)
Write both model X and Y regression equations and forecast the % of success for
both companies when 540 have been invested in R&D and 21 scientists have
been assigned to the project. Also determine the 95% C.I.
Both model X and model Y use a linear equation to create the regression model. For a single explanatory variable, this straight line can be expressed using the following equation:
$y = m x + c$
Where:
m is the coefficient or slope.
c is the intercept.
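Therefore, the equations for each model are:
Model X:
$\text{Success} = m \cdot \text{R\&D} + c$
When R&D spending is 540:
Success = 54.83%
Model Y:
$\text{Success} = m_1 \cdot \text{R\&D} + m_2 \cdot \text{Scientists} + c$
When R&D spending is 540 and No. of scientists is 21:
Success = 47.68%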

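We will also create a 95% C.I. for each model:
The formula for the 95% C.I. is:
$\hat{y} \pm 1.96 \cdot SE$
where $\hat{y}$ is the forecast and SE is the standard error of estimate.
Model X:
95% C.I. = (17.59% , 92.07%)
Model Y:
95% C.I. = (33.6% , 61.4%)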
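The Model X interval can be reproduced from its forecast and standard error of estimate. The sketch below assumes the interval is built as forecast ± 1.96 × standard error of estimate, a simplification that ignores the uncertainty in the estimated coefficients. The function name `forecast_interval` is hypothetical.

```python
def forecast_interval(forecast, std_error, z=1.96):
    """Approximate 95% interval: forecast +/- z * standard error of estimate."""
    margin = z * std_error
    return forecast - margin, forecast + margin

# Model X: forecast of 54.83% with a standard error of estimate of 0.19.
lower, upper = forecast_interval(0.5483, 0.19)
print(f"95% C.I. = ({lower:.2%}, {upper:.2%})")
```

The same function applies to Model Y by passing its forecast and its standard error of estimate.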
Exercise 5
A scatterplot of spending on alcohol vs tobacco has been produced for 11 regions.
a)
Does alcohol seem to be a significant predictor of tobacco spending?
According to the graph on the left we can see that, in the regions observed, the more alcohol is consumed, the more money is spent on tobacco. Also, if we analyze the graph on the right, a straight line can be drawn in the scatterplot, implying a positive correlation:
Correlation for the first 10 regions = 0.784. Strong positive correlation.
Correlation for all eleven regions = 0.223. Weak positive correlation.
Apparently, taking into account the data provided, we could say that alcohol spending causes tobacco spending. However, correlation does not imply causation, so it may be that, once the regression analysis is done, we cannot find any significant relationship between the two.
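The effect of a single outlier on the correlation coefficient can be illustrated with a short sketch. The spending figures below are hypothetical and are NOT the assignment's data. They only show how one atypical region can pull a strong correlation down.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical spending figures, not the assignment's data.
alcohol = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5]
tobacco = [2.8, 3.0, 3.3, 3.5, 3.9, 4.1]
outlier = (4.0, 4.6)  # low alcohol spending, high tobacco spending

r_without = pearson(alcohol, tobacco)
r_with = pearson(alcohol + [outlier[0]], tobacco + [outlier[1]])
print(f"without outlier: {r_without:.3f}, with outlier: {r_with:.3f}")
```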
b)
Should we discard the outlier?
Outliers are observations that have extreme values relative to other observations made under the same conditions.
Normally, we should not discard any outlier unless we are completely sure that it is not significant for our model.
In this case, we recommend discarding the outlier, as it is only one region out of the 11 observed and, instead of being significant for our model, we think that it will be detrimental to it because it does not represent the actual behavior of the population. We could make a guess to explain this unusual value: maybe it is an area located far away from the other ten, or maybe it has demographic differences.
Even though we recommend discarding the outlier, we would also try both models and see which one fits reality better.