
Quantitative Methods Assignment 1

MIM 14 F1
Exercise 1
www.exploreiceland.is has committed to displaying 650.000 ads. Traffic to the website
is estimated to be normally distributed with a mean of 850.000 viewers and a standard
deviation of 150.000.
a)

What is the probability of delivering the promised 650.000 impressions?

The problem consists of obtaining the probability of showing the 650.000 ads when the
traffic follows a normal distribution. Any normal distribution can be expressed as follows:

$X \sim N(\mu, \sigma)$

Where:

μ is the mean or average.
σ is the standard deviation.

Consequently, our distribution is:

$X \sim N(850\,000,\ 150\,000)$
Figure 1 Gaussian distribution

The question can be rewritten in a more mathematical style, as we are looking for a
probability:

$P(X \geq 650\,000)$

This means that we are looking for the probability that X is above 650.000 or, in other
words, 1 minus the probability that X is below 650.000, a value that lies below the mean
traffic.
Z is obtained using the following equation, which normalizes any Gaussian distribution to N(0,1):

$Z = \frac{X - \mu}{\sigma}$
Where:

X is our study starting point: 650.000 impressions.
μ is the mean: 850.000 impressions.
σ is the standard deviation: 150.000 impressions.

Using these values in the equation for Z, we obtain:

$Z = \frac{650\,000 - 850\,000}{150\,000} = -1.33$
Graphically, this probability is represented in the Gauss curve as shown in the following
figure:

The shaded area of the image shows the probability of showing fewer than 650.000 ads:

$P(X < 650\,000) = P(Z < -1.33)$
In order to obtain that probability, we look up the obtained Z in the table of the standard
normal distribution N(0,1):
Figure 2 Normal distributed data

Using Z = -1.33 as input, the probability of showing fewer than 650.000 impressions is
0.0918 = 9.18%:

$P(X < 650\,000) = P(Z < -1.33) = 0.0918$

To obtain the probability of showing more than the required 650.000 ads, we subtract that
value from the total (100%) as follows:

$P(X \geq 650\,000) = 1 - P(X < 650\,000) = 1 - 0.0918 = 0.9082$

Figure 3 Normalized chart for N (0,1)

The probability of delivering the promised 650.000 impressions is 90.82%.
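As a cross-check of the table lookup, the same probability can be computed with Python's standard library. This is a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the constant names are our own. With the unrounded Z = -1.333 the result is about 90.88%, while rounding Z to -1.33, as when reading the table, reproduces 90.82%.

```python
from statistics import NormalDist

MEAN_TRAFFIC = 850_000  # mean number of viewers
SD_TRAFFIC = 150_000    # standard deviation of viewers
PROMISED = 650_000      # impressions committed to the advertiser

traffic = NormalDist(mu=MEAN_TRAFFIC, sigma=SD_TRAFFIC)

# Standardize: Z = (X - mu) / sigma
z = (PROMISED - MEAN_TRAFFIC) / SD_TRAFFIC

# Lower tail P(X < PROMISED); delivering means landing in the complement
p_short = traffic.cdf(PROMISED)
p_deliver = 1 - p_short

# Same calculation with Z rounded to two decimals, as when reading the table
p_deliver_table = 1 - NormalDist().cdf(round(z, 2))

print(f"z = {z:.4f}")
print(f"P(deliver) = {p_deliver:.4f} with exact z, {p_deliver_table:.4f} with rounded z")
```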

b)

How many impressions should be published to be able to guarantee a 95% service level?

In this case, we look for the number of impressions x that the traffic falls below with a
probability of only 5%, and then, using the equation for Z, we calculate that number of
impressions:

$P(X < x) = 0.05$

Looking in the normal table for P(Z) = 5%, Z can be obtained:
As 0.05 lies between the table values for Z = -1.64 and
Z = -1.65, it is recommended to make a linear
interpolation to obtain Z:
Figure 3 Normal distributed data


Figure 5 Linear interpolation

Figure 4 Normalized chart for N (0,1)

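$Z = -1.64 + \frac{0.0505 - 0.05}{0.0505 - 0.0495}\,\bigl(-1.65 - (-1.64)\bigr) = -1.645$

Using this result in the Z equation, we can calculate the number of impressions required:

$x = \mu + Z\sigma = 850\,000 - 1.645 \cdot 150\,000 = 603\,250$

The number of impressions required to guarantee a 95% service level is 603.250.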
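The interpolation and the final calculation can be sketched in Python as follows. The table probabilities 0.0505 and 0.0495 for Z = -1.64 and Z = -1.65 are standard four-decimal N(0,1) values assumed here, since Figure 5 is not reproduced in the text; `interpolate_z` is a hypothetical helper, and `NormalDist.inv_cdf` (Python 3.8+) gives the exact quantile for comparison.

```python
from statistics import NormalDist

MEAN_TRAFFIC = 850_000
SD_TRAFFIC = 150_000
SERVICE_LEVEL = 0.95

def interpolate_z(p, z_a, p_a, z_b, p_b):
    """Linearly interpolate Z between two table rows (z_a, p_a) and (z_b, p_b)."""
    return z_a + (p - p_a) * (z_b - z_a) / (p_b - p_a)

# Standard N(0,1) table rows bracketing a lower-tail probability of 5%
z_interp = interpolate_z(1 - SERVICE_LEVEL, -1.64, 0.0505, -1.65, 0.0495)

# Exact quantile for comparison
z_exact = NormalDist().inv_cdf(1 - SERVICE_LEVEL)

# x = mu + Z * sigma: traffic exceeds x with 95% probability
x_interp = MEAN_TRAFFIC + z_interp * SD_TRAFFIC
x_exact = MEAN_TRAFFIC + z_exact * SD_TRAFFIC

print(f"Z: {z_interp:.4f} (interpolated), {z_exact:.4f} (exact)")
print(f"impressions: {x_interp:.0f} (interpolated), {x_exact:.0f} (exact)")
```

The exact quantile (Z ≈ -1.6449) shifts the answer by only about 20 impressions, so linear interpolation in the table is adequate here.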

Exercise 2
A sample of 36 weekly observations has a mean of 0.005 and a standard deviation of
0.02.
a)

Calculate a 95% C.I. for the mean weekly return.

To build a 95% Confidence Interval, we must first be able to assume that the sample mean is
normally distributed. In order to do so, we can rely on the CENTRAL LIMIT THEOREM:

It is safe to use the normal distribution if the sample is reasonably large (30 or more).

As n increases, the distribution of the sample mean approaches a normal distribution:

$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

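The sample size is 36, so it is safe to adopt a normal distribution. Hence, a 95% C.I. can be
built using the equation below:

$\bar{x} \pm 1.96\,\frac{s}{\sqrt{n}} = 0.005 \pm 1.96 \cdot \frac{0.02}{\sqrt{36}} = 0.005 \pm 0.0065 = (-0.0015,\ 0.0115)$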
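The interval can be computed with a few lines of Python. This is a minimal sketch; `mean_ci` is a hypothetical helper, and z = 1.96 is the two-sided 95% value of N(0,1) used throughout the assignment.

```python
from math import sqrt

def mean_ci(mean, sd, n, z=1.96):
    """Large-sample confidence interval for a mean: mean +/- z * sd / sqrt(n)."""
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

low, high = mean_ci(mean=0.005, sd=0.02, n=36)
print(f"95% C.I. for the mean weekly return: ({low:.4f}, {high:.4f})")
```

It returns approximately (-0.0015, 0.0115), an interval that includes zero.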

b)

Size of the sample if the maximum error admitted is 0.004?

We have to build a new 95% C.I. whose margin of error does not exceed 0.004:

$1.96\,\frac{s}{\sqrt{n}} \leq 0.004$

This means that the half-width of the interval must be at most 0.004. Knowing this, we
can calculate the new value required for the size of the sample:

$n \geq \left(\frac{1.96 \cdot 0.02}{0.004}\right)^2 = 96.04$


n = 96.04, rounded up to 97
The size of the sample required is 97
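Rounding up matters here: n must be a whole number and the margin must not exceed 0.004, so 96.04 becomes 97. A minimal sketch of the calculation, with `required_sample_size` as a hypothetical helper name:

```python
from math import ceil

def required_sample_size(sd, max_error, z=1.96):
    """Smallest whole n such that z * sd / sqrt(n) <= max_error."""
    return ceil((z * sd / max_error) ** 2)

n = required_sample_size(sd=0.02, max_error=0.004)
print(n)
```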

c)

Do we need to assume that the weekly return follows a normal distribution?

No. As explained in a), since the sample size is greater than 30, we can use the CENTRAL
LIMIT THEOREM and assume that the sample mean is approximately normally distributed,
whatever the distribution of the individual weekly returns. Without this theorem, we would
have needed to assume that the weekly returns themselves are normal in order to build a
95% C.I. to estimate the mean weekly return.

Exercise 3
A random sample of 256 observations is taken. The sample mean is 35.420 and the sample
standard deviation is 2.050.
a)

What is the estimated mean income for the population?

Again, as the sample consists of 256 random observations, the sample mean is an unbiased
estimator of the population mean, so we use it as our point estimate (and, by the CENTRAL
LIMIT THEOREM, it is approximately normally distributed around the population mean).
Estimated mean income for the population is 35.420.

b)

95% C.I. (rounded to the nearest 10) for the estimate of the mean income.

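Assuming, as said in a), that the sample mean is normally distributed, we can calculate a
95% C.I. using the equation below:

$\bar{x} \pm 1.96\,\frac{s}{\sqrt{n}} = 35\,420 \pm 1.96 \cdot \frac{2\,050}{\sqrt{256}} = 35\,420 \pm 251.1$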

Once rounded, the 95% confidence interval is:

$(35\,170,\ 35\,670)$
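The same large-sample formula, applied to the income data and rounded to the nearest 10 as the question asks, can be sketched as follows. The variable names are illustrative; Python's built-in `round` with a negative `ndigits` rounds to tens.

```python
from math import sqrt

mean_income = 35_420
sd_income = 2_050
n = 256

standard_error = sd_income / sqrt(n)   # 2050 / 16 = 128.125
margin = 1.96 * standard_error         # about 251.1

low, high = mean_income - margin, mean_income + margin
# round() with ndigits=-1 rounds to the nearest 10
low_10, high_10 = round(low, -1), round(high, -1)
print(f"unrounded: ({low:.1f}, {high:.1f}); rounded: ({low_10:.0f}, {high_10:.0f})")
```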

c)

Interpret the meaning of the confidence interval.

It means that we are 95% confident that the population mean income is between 35.170
and 35.670. Strictly speaking, the 95% describes the procedure rather than this particular
interval: if we were to repeat the sampling 100 times and build a 95% confidence interval
each time, about 95 of the 100 intervals would include the true population mean.
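The repeated-sampling interpretation can be illustrated with a small simulation. This sketch assumes, purely for illustration, a normal population whose true mean and standard deviation equal the sample values (35.420 and 2.050); it draws a thousand samples of 256 observations, builds a 95% C.I. from each, and counts how many contain the true mean. The count should be close to 95%, with the exact figure depending on the random seed.

```python
import random
from math import sqrt

# Assumption: hypothetical normal population with the sample's mean and SD.
# The real income distribution is unknown.
TRUE_MEAN = 35_420
TRUE_SD = 2_050
N = 256             # observations per sample, as in the exercise
REPETITIONS = 1_000

def interval_covers_mean(rng):
    """Draw one sample, build a 95% C.I. and report whether it contains TRUE_MEAN."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    m = sum(sample) / N
    s = sqrt(sum((x - m) ** 2 for x in sample) / (N - 1))  # sample SD
    margin = 1.96 * s / sqrt(N)
    return m - margin <= TRUE_MEAN <= m + margin

rng = random.Random(42)  # fixed seed so the run is reproducible
hits = sum(interval_covers_mean(rng) for _ in range(REPETITIONS))
coverage = hits / REPETITIONS
print(f"{hits} of {REPETITIONS} intervals contain the true mean ({coverage:.1%})")
```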

Exercise 4
Two models, X and Y, are used to forecast the probability that a drug development
project will be successful. The probability of success is said to be related to the
spending on R&D. Model Y also takes the number of scientists in the project into account.
a)

Can model X be used to predict project success?

In order to analyze any regression model, we follow the steps below:
1)

Scatterplot and correlation:

As we don't have the original data from which the model has been created, we cannot draw
a scatterplot. However, we can see that the correlation value is 0.63, which means
that the dependent and independent variables have a positive correlation.

2)

Model's significance:

The objective is to check whether the relationship found by the model could simply be due
to chance. For this purpose, we carry out the following hypothesis test:

Null hypothesis H0: m = 0
Alternative hypothesis HA: m ≠ 0

The objective is to reject the null hypothesis so that we can be confident that the
relationship found by the model is not due to chance. This is analyzed by checking the
following three elements:

Figure 6 Hypothesis testing

T-Stat: It is a ratio that tells us how many standard errors the regression
coefficient is from zero. It can be calculated using the following formula:

$t = \frac{|\hat{m} - m_0|}{SE(\hat{m})}$

Using m_0 = 0 (as the objective is to know how far our slope is from this value), we
obtain the t-stat for our model. We can be reasonably sure that this difference is
significant if the t-stat is > 2.

P-Value is the probability, assuming the null hypothesis is true, of seeing a sample with at
least as much evidence in favor of the alternative hypothesis as the sample actually
observed. The smaller the p-value, the more evidence there is in favor of the alternative
hypothesis. As we use a 95% confidence level, we need a p-value below 5%.

We also need to check that the confidence interval for the variable analyzed
does not contain zero.

To sum up, in order to check the model's significance, we must check the following
values:
T-Stat > 2
P-Value < 5%
C.I. 95% does not include 0
We can follow the above steps to analyze a given variable. To check the model's
significance, we apply them to Model X:

R&D                  Model's value        Pass/Fail
T-stat (>2)          2.7352               PASS
P-value (<5%)        0.0194               PASS
Lower / Upper 95%    0.0001 / 0.0013      PASS

Hence, we can state that the model is statistically significant.
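The three checks can be written as a small function. This is a sketch of the rule of thumb used in this assignment, not a general testing procedure; `significance_checks` is a hypothetical name, and it compares the absolute value of the t-stat with 2 so that negative slopes are handled too. The inputs are the Model X values from the table above.

```python
def significance_checks(t_stat, p_value, ci_low, ci_high):
    """Rule-of-thumb significance checks for one regression coefficient."""
    return {
        "t-stat > 2": abs(t_stat) > 2,        # |t| so negative slopes are handled too
        "p-value < 5%": p_value < 0.05,
        "95% C.I. excludes 0": not (ci_low <= 0 <= ci_high),
    }

# R&D coefficient of Model X
checks = significance_checks(t_stat=2.7352, p_value=0.0194, ci_low=0.0001, ci_high=0.0013)
for name, passed in checks.items():
    print(f"{name:<22}{'PASS' if passed else 'FAIL'}")
print("Model X is significant:", all(checks.values()))
```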

3)

Model's quality:

In this step, we check whether the model is good enough to predict the response of the
dependent variable and, therefore, the success of the development project. The following
values must be analyzed:

Adjusted R²: a measure that adjusts R² (R² is the percentage of variation of the
dependent variable explained by the model) for the number of explanatory
variables in the equation.

Standard Error of Estimate: essentially the standard deviation of the residuals.
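Both measures follow standard formulas: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of explanatory variables, and the standard error of estimate is the square root of SSE/(n - k - 1), the quantity labelled "Standard Error" in Excel's regression output. The sketch below uses hypothetical numbers (R² = 0.40, n = 13 and five made-up residuals), not the assignment's data.

```python
from math import sqrt

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def standard_error_of_estimate(residuals, k):
    """sqrt(SSE / (n - k - 1)): roughly the typical size of a residual."""
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    return sqrt(sse / (n - k - 1))

# Hypothetical inputs for illustration, not the assignment's data
print(round(adjusted_r2(r2=0.40, n=13, k=1), 3))  # one explanatory variable
print(round(adjusted_r2(r2=0.40, n=13, k=2), 3))  # same R^2, extra variable penalized
print(round(standard_error_of_estimate([0.1, -0.2, 0.05, 0.15, -0.1], k=1), 4))
```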

Going back to Model X, these are the results given by Excel:

- Adjusted R² = 35%

This means that, once adjusted for the number of explanatory variables, the percentage of
variation of the dependent variable explained by the model is only 35%, which, in our
opinion, is not enough.

- Standard error of estimate = 0.19.

We should compare this value with the range of values of our original data but, at
first sight, we think that this value is too high.
However, these two conclusions cannot be used as a final decision to qualify the model. In
order to do so, we should build a 95% C.I. for the model and see how accurate it is:
*This interval is obtained in part c) of the exercise:
95% C.I. = (17.59%, 92.07%)
As we can see, the confidence interval is too wide (74.48%). Therefore, we conclude that:

Correlation does not imply causation


Model X should not be used to predict the success of the development project as its
quality is not good enough.

b)

Evaluate model Y

In order to evaluate model Y, we follow the same steps used for model X. The only
difference is that model Y includes two explanatory variables in the regression
analysis.
1)

Scatterplot and correlation:

Again, we cannot draw a scatterplot as we don't know the original values, but we have the
correlation analysis in the assignment:
Both R&D spending and No. of scientists
show a strong positive correlation with the % of
success. Furthermore, the two independent
variables are correlated with one another. This is
called multicollinearity, and it will be explained in
the next steps.
2)

Model's significance:

Following the same plan as we did for model X, we need to check:
T-Stat > 2
P-Value < 5%
C.I. 95% does not include 0

R&D                  Model's value        Pass/Fail
T-stat (>2)          4.5454               PASS
P-value (<5%)        0.0003               PASS
Lower / Upper 95%    0.0005 / 0.0014      PASS

Scientists           Model's value        Pass/Fail
T-stat (>2)          0.0961               FAIL
P-value (<5%)        0.9246               FAIL
Lower / Upper 95%    -0.0101 / 0.011      FAIL

This model presents a peculiarity: one of the independent variables is not significant for
the regression analysis. This is due to the multicollinearity mentioned above: since No. of
scientists is correlated with R&D spending, it adds little information of its own, which
means that it does not improve the model's quality.
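The multicollinearity argument can be made concrete with the variance inflation factor (VIF), a standard diagnostic that the assignment does not itself compute. With exactly two explanatory variables, VIF = 1/(1 - r²), where r is the correlation between them; it tells how many times the variance of each slope estimate is inflated compared with uncorrelated predictors. The correlations below are hypothetical, since the actual value between R&D spending and No. of scientists is not reproduced in this text; `vif_two_predictors` is an illustrative name.

```python
from math import sqrt

def vif_two_predictors(r12):
    """VIF for either slope when the model has exactly two predictors correlated at r12."""
    return 1 / (1 - r12 ** 2)

# Hypothetical correlations between R&D spending and No. of scientists
for r in (0.0, 0.5, 0.9, 0.95):
    vif = vif_two_predictors(r)
    # The standard error of a slope grows with the square root of the VIF
    print(f"r = {r:.2f}: VIF = {vif:.2f}, standard error x {sqrt(vif):.2f}")
```

A larger standard error shrinks the t-stat and widens the confidence interval, which is how a correlated second predictor can fail all three checks.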

3)

Model's quality:

We check the quality of model Y as we did for model X.

In this case, the quality is much better:

- Adjusted R² = 93%

This means that, once adjusted for the number of explanatory variables, the percentage of
variation of the dependent variable explained by the model is 93%.

- Standard error of estimate = 0.07.

We should compare this value with the range of values of our original data but, at
first sight, we think that this value is very good.
However, these two conclusions cannot be used as a final decision to qualify the model. In
order to do so, we should build a 95% C.I. for the model and see how accurate it is:
*This interval is obtained in part c) of the exercise:
95% C.I. = (33.6%, 61.4%)
As we can see, the C.I. is much narrower (27.8%) than that of model X. This means that the model is accurate.

As a conclusion for model Y we can say that:

Model Y is statistically significant, and it is very accurate according to the analysis.
It can be improved by removing the variable No. of scientists, as it does not
improve the significance of the model.

c)

Write both model X and Y regression equations and forecast the % of success for
both companies when 540 has been invested in R&D and 21 scientists have
been assigned to the project. Also determine the 95% C.I.

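Both model X and model Y use a linear equation to create the regression model. With one
explanatory variable, this straight line can be expressed as:

$y = m x + c$

(model Y simply has one slope per explanatory variable: $y = m_1 x_1 + m_2 x_2 + c$)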

Where:

m is the coefficient or slope.


c is the intercept.

Therefore, the equations for each model are:

Model X:

When R&D spending is 540:

Success = 54.83%

Model Y:

When R&D spending is 540 and No. of scientists is 21:

Success = 47.68%

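We will also create a 95% C.I. for each model. The formula for the 95% C.I. is:

$\hat{y} \pm 1.96\, s_e$

where $\hat{y}$ is the forecast and $s_e$ is the standard error of estimate.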

Model X:

95% C.I. = (17.59%, 92.07%)

Model Y:

95% C.I. = (33.6%, 61.4%)
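The interval formula used in this part (forecast plus or minus 1.96 standard errors of estimate) can be sketched as below. It is a rough approximation: a full prediction interval would use a t critical value and also account for the uncertainty of the estimated coefficients, which makes it wider. `approx_prediction_interval` is a hypothetical name; the inputs are Model X's forecast (54.83%) and standard error of estimate (0.19).

```python
def approx_prediction_interval(y_hat, s_e, z=1.96):
    """Rough 95% interval: forecast +/- z standard errors of estimate."""
    return y_hat - z * s_e, y_hat + z * s_e

low, high = approx_prediction_interval(y_hat=0.5483, s_e=0.19)  # Model X
print(f"Model X: ({low:.2%}, {high:.2%}), width {high - low:.2%}")
```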

Exercise 5
A scatterplot of spending on alcohol vs tobacco has been drawn for 11 regions.
a)

Does alcohol seem to be a significant predictor of tobacco spending?

According to the graph on the left, in the first 10 regions, the more is spent on alcohol,
the more money is spent on tobacco. Also, if we analyze the right graph, a straight line can
be drawn through the scatterplot, implying a positive correlation:

Correlation for the first 10 regions = 0.784: strong positive correlation.
Correlation for all eleven regions = 0.223: weak positive correlation.

Apparently, taking into account the data provided, we could say that alcohol spending
causes tobacco spending. However, correlation does not imply causation, so it may turn out,
once the regression analysis is done, that there is no significant relationship between the two.
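The effect of a single outlier on the correlation coefficient can be illustrated with made-up numbers, since the regional spending data is not reproduced in the text. In this sketch, ten hypothetical regions follow a clear upward trend and an eleventh combines high alcohol spending with low tobacco spending; `pearson_r` is a plain implementation of Pearson's correlation coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical spending data: ten regions on an upward trend...
alcohol = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
tobacco = [2, 3, 3, 5, 5, 6, 7, 7, 9, 10]
r_without = pearson_r(alcohol, tobacco)

# ...plus one region with high alcohol but low tobacco spending (the outlier)
r_with = pearson_r(alcohol + [12], tobacco + [1])
print(f"r without outlier = {r_without:.3f}, with outlier = {r_with:.3f}")
```

With these made-up values the single extra point pulls r down from about 0.98 to about 0.43, the same kind of drop seen between 0.784 and 0.223 in the exercise.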

b)

Should we discard the outlier?

Outliers are observations that have extreme values relative to other observations made
under the same conditions.
Normally, we should not discard any outlier unless we are completely sure that it is not
significant for our model.
In this case, we recommend discarding the outlier, as it is only one region out of
the 11 observed and, instead of being significant for our model, we think that it would be
detrimental to it, since it does not represent the actual behavior of the population. We could
guess at an explanation for this unusual value: maybe it is an area located far away from
the other ten, or maybe it has demographic differences.
However, even though we recommend discarding the outlier,
we would also try both models and see which one fits reality better.
