This Reading Assignment examines how data in a sample can be collected and then

used to provide information on the wider population. Many of the examples are

concerned with the mean of the sample being used to estimate the population

mean, a practice often used in finance. The Central Limit Theorem allows us

to make probability statements about a population mean based on sample data. It

is imperative that you understand the concept and calculation of confidence

intervals for the population mean and when to use the z-statistic or t-statistic.

Sampling

There are different ways of selecting a sample from a population. The basic type of sample is a simple

random sample. In this sample each item or person in the population has an equal probability of

being included.

To select a simple random sample, put each member of the population in a sequence and identify each

member by a number. Then use random number tables to select as many numbers as the sample size

requires, and match these numbers to the members of the population to identify the sample.
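
The numbering procedure above can be sketched in Python. This is only an illustration: the population of 100 labelled members, the function name `simple_random_sample` and the seed are assumptions, and Python's pseudo-random generator stands in for the random number tables.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw a simple random sample without replacement."""
    rng = random.Random(seed)
    # Number each member 1..N, as in the procedure described in the text.
    numbered = {number: member for number, member in enumerate(population, start=1)}
    # Pick distinct numbers at random; this plays the role of the random number table.
    chosen_numbers = rng.sample(sorted(numbered), sample_size)
    # Match the chosen numbers back to the members of the population.
    return [numbered[number] for number in chosen_numbers]


# Hypothetical population of 100 members, used only for illustration.
population = [f"member_{i}" for i in range(1, 101)]
srs = simple_random_sample(population, sample_size=10, seed=42)
print(srs)
```

Every member has the same chance of being chosen because each number is equally likely to be drawn.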

When it is not practical to assign a number to each item in a population then we might use

systematic random sampling. In this case the items are arranged and then every nth item is

included in the sample. This assumes that there is no pattern to the way that the items are arranged.

A chocolate bar manufacturer selects every 100th chocolate bar coming off a conveyor belt for

inclusion in a sample to test the weights of chocolate bars being produced.
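
A minimal sketch of systematic sampling for the chocolate bar example follows. The run of 1,000 bars and the function name `systematic_sample` are assumptions made for illustration; bars are represented simply by their position on the conveyor belt.

```python
def systematic_sample(items, step, start=0):
    """Return every `step`-th item, beginning at index `start`."""
    if step <= 0:
        raise ValueError("step must be positive")
    return items[start::step]


# Hypothetical production run: bars numbered 1 to 1,000 in the order produced.
bars = list(range(1, 1001))
# start=99 is the index of the 100th bar, so this selects bars 100, 200, ..., 1000.
selected_bars = systematic_sample(bars, step=100, start=99)
print(selected_bars)
```

If the production process had a cycle of exactly 100 bars, this sample would see the same point of the cycle every time, which is why the method assumes no pattern in the ordering.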

Although a random sample will reflect the characteristics of the population in an unbiased way, there

is likely to be a difference between the estimate from the sample and the actual population

characteristic.

Sampling error is defined as the difference between the observed value of a sample statistic and the

quantity that it is being used to estimate from the population.

A chocolate bar manufacturer selects every 100th chocolate bar coming off a conveyor belt for

inclusion in a sample to test the weights of chocolate bars being produced. The mean weight of

chocolate bars in the sample is 105 grams, the mean population weight is 100 grams, and sampling

error is therefore 5 grams.

A sampling distribution of a statistic is the distribution of all possible distinct values that the

statistic can assume when samples of the same size are randomly taken from the population.

For example the sampling distribution of the sample mean is the distribution of all possible

sample means of a given sample size and the probability of occurrence of each sample mean.

The four employees of a firm have worked for the firm for 3, 7, 8 and 12 years. To calculate the

sampling distribution of the sample mean for samples of two workers calculate the means for all

possible samples:

Employees in Sample (years worked)    Sample Mean (in years)
3 and 7                               5.0
3 and 8                               5.5
3 and 12                              7.5
7 and 8                               7.5
7 and 12                              9.5
8 and 12                              10.0

The sampling distribution of the sample mean is therefore:

Sample Mean (in years)    Probability
5.0                       0.167
5.5                       0.167
7.5                       0.333
9.5                       0.167
10.0                      0.167

We can see from the example that the mean of the sample mean is the same as the population mean,

and the standard deviation of the distribution of sample mean is less than that of the population.
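
The employee example can be reproduced by enumerating every sample of two workers. The sketch below uses only the four values given in the text; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import pstdev

years = [3, 7, 8, 12]

# Mean of every possible sample of two employees (six samples in total).
sample_means = [sum(pair) / 2 for pair in combinations(years, 2)]

# Sampling distribution: probability of each distinct sample mean.
counts = Counter(sample_means)
distribution = {m: Fraction(c, len(sample_means)) for m, c in sorted(counts.items())}

population_mean = sum(years) / len(years)
mean_of_sample_means = sum(sample_means) / len(sample_means)

print(distribution)
print(population_mean, mean_of_sample_means)
# Population standard deviation versus standard deviation of the sample means.
print(pstdev(years), pstdev(sample_means))
```

Both means come out at 7.5 years, while the standard deviation of the sample means (about 1.85) is smaller than that of the population (about 3.20).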

Another method of taking a sample is stratified random sampling. In this case we divide the

population into subgroups (or strata) and select a sample from each subgroup. In a proportional

sample, the share of the sample drawn from each subgroup equals that subgroup's share of the

total population.

If we wish to study the usage of cars by a population of car owners we might decide to divide car

owners into three subgroups by age as shown below.

[Table: car owners by age group. Only one row survives: 25 to 55 years, 60%, 1,200.]

The number from each group selected for the sample is based on the percentage of car owners in that

group.
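
Proportional allocation can be sketched as below. The strata shares for the youngest and oldest groups and the total sample size of 2,000 are hypothetical, chosen only so that the arithmetic is easy to follow; only the 60% share for the 25 to 55 group comes from the text.

```python
def proportional_allocation(strata_shares, total_sample_size):
    """Allocate a sample across strata in proportion to each stratum's population share."""
    if abs(sum(strata_shares.values()) - 1.0) > 1e-9:
        raise ValueError("strata shares must sum to 1")
    return {
        name: round(share * total_sample_size)
        for name, share in strata_shares.items()
    }


# Hypothetical shares of car owners by age group (only the 60% figure is from the text).
shares = {"under 25": 0.20, "25 to 55": 0.60, "over 55": 0.20}
allocation = proportional_allocation(shares, total_sample_size=2000)
print(allocation)
```

With other shares or sample sizes the rounded counts may not add up exactly to the total, so a real allocation would need a rule for distributing the remainder.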

1. Time-series data

Time-series data is a sequence of returns collected at discrete and equally spaced time

intervals, for example historic monthly stock returns.

2. Cross-sectional data

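Cross-sectional data is a set of observations on some characteristic of different entities, such as

companies, at a single point in time. Last year's closing prices for stocks that trade on the

NYSE are an example of cross-sectional data.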

For a population with a mean of μ and a variance of σ², the sampling distribution of the sample

mean (x̄) of all possible samples of size n will be approximately normally distributed with a mean μ

and variance σ²/n (assuming n is large, say 30 or over).

To summarize:

• Even if the distribution of the population is not normal the sampling distribution of the

sample mean, x̄, is approximately a normal distribution.

• The mean of the distribution of x̄ will be equal to the mean of the population.

• The variance of the distribution of x̄ will be equal to the variance of the population divided by

the sample size.
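
The three points above can be checked with a small simulation. This is a sketch under stated assumptions: the population is taken to be exponential with rate 1 (mean 1, variance 1, clearly not normal), and the sample size of 36, the 5,000 repetitions and the seed are arbitrary choices.

```python
import random
from statistics import pvariance

rng = random.Random(0)
n = 36              # sample size (assumed; "large" in the sense of 30 or over)
num_samples = 5000  # number of repeated samples (assumed)

# Draw repeated samples from a skewed population and record each sample mean.
clt_sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = sum(clt_sample_means) / num_samples
variance_of_means = pvariance(clt_sample_means)

print(f"mean of sample means:     {mean_of_means:.3f}  (population mean is 1)")
print(f"variance of sample means: {variance_of_means:.4f} (sigma^2 / n is {1 / n:.4f})")
```

The simulated values should land close to 1 and 1/36, although they will not match exactly because the simulation itself is a sample.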

The standard error of the sample mean is:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \qquad (1)$$

This is the standard deviation of the sampling distribution of the sample mean.

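If the population standard deviation (σ) is not known, then we can use the sample standard deviation,

s, to estimate the standard error, which is then denoted by:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} \qquad (2)$$

where:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \qquad (3)$$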

If the standard deviation of a population is 10 and a sample of 49 items is taken from the population

then the standard error of the sample mean is:

$$\sigma_{\bar{x}} = \frac{10}{\sqrt{49}} = \frac{10}{7} \approx 1.43$$
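
The same calculation, 10 divided by the square root of 49, can be written as a small helper. The function name `standard_error` is illustrative.

```python
import math


def standard_error(std_dev, n):
    """Standard error of the sample mean: standard deviation divided by sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return std_dev / math.sqrt(n)


se = standard_error(10, 49)
print(round(se, 2))  # 1.43
```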

Estimating a Population Parameter

The formulae that we use to calculate a sample statistic are estimators. The particular value that we

calculate using an estimator is an estimate.

A point estimate is a single estimate calculated from a sample which is used to estimate the

population parameter. An example of this would be a sample mean being calculated as a point

estimate of the population mean.

Another approach is to make an interval estimate of the parameter; this means we find an interval

that will include the population parameter with a certain level of probability. This is a confidence

interval.

An estimator should ideally have three properties:

1. Unbiased - the expected value (the mean of its sampling distribution) is the same as the

parameter it is intended to estimate.

2. Efficient - there is no other unbiased estimate of the same parameter with a sampling

distribution of smaller variance.

3. Consistent - the probability of accurate estimates increases as the sample size increases.

Confidence Intervals

A confidence interval is an interval within which the population parameter lies with a specified

probability (1 - α). This probability is the degree of confidence, and the interval is called the

100(1 - α)% confidence interval for the parameter.

The end points of the interval are called the lower and upper confidence limits.

A 95% confidence interval can be interpreted by considering the case when we take a large number of

samples from the population and construct a confidence interval for each sample. We expect 95% of

these confidence intervals to include the population mean. Following on, we can say that we are 95%

confident that a single confidence interval includes the population mean.

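A confidence interval takes the form:

point estimate ± reliability factor × standard error    (4)

where

point estimate = a point estimate of the parameter (a value of a sample statistic)

reliability factor = a number based on the assumed distribution of the point estimate and the degree of confidence for the interval

standard error = standard error of the sample statistic providing the point estimate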

Consider the case where we are estimating the population mean and taking a sample from a normally

distributed population with known variance. The confidence interval is given by:

$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \qquad (5)$$

where

x̄ = sample mean

σ = population standard deviation

n = sample size

zα/2 = reliability factor, the point where α/2 of the probability is in the right tail

• z0.05 is used for 90% confidence intervals, since it is when 5% of the probability is in the top

right tail and 5% in the bottom left tail. z0.05 is 1.645.

• 90% of the sample means will be within 1.645 standard errors of the population mean.

• 95% of the sample means will be within 1.960 standard errors of the population mean.

• 99% of the sample means will be within 2.575 standard errors of the population mean.
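
The reliability factors quoted above can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the function name `reliability_factor` is illustrative.

```python
from statistics import NormalDist


def reliability_factor(confidence):
    """Return z such that alpha/2 of the probability lies in the right tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {reliability_factor(level):.3f}")
```

This gives 1.645, 1.960 and 2.576. The last value differs slightly from the 2.575 used in the text, which reflects rounding in printed tables.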

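For any distribution if we do not know the variance, and it is a large sample, we can use

$$\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}} \qquad (6)$$

where

x̄ = sample mean

s = sample standard deviation

n = sample size

Therefore:

• The 90% confidence interval for the mean is x̄ ± 1.645 s/√n

• The 95% confidence interval for the mean is x̄ ± 1.960 s/√n

• The 99% confidence interval for the mean is x̄ ± 2.575 s/√n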

A sample of 81 observations is taken from a normal population. The sample mean is 20 and the

standard deviation is 3. The 90% confidence interval is:

$$20 \pm 1.645 \times \frac{3}{\sqrt{81}} = 20 \pm 0.55$$

This means we can be 90% confident that the population mean lies between 19.45 and 20.55.
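
The interval in this example can be checked numerically. The helper name `z_confidence_interval` is illustrative, and the reliability factor 1.645 is passed in directly rather than looked up.

```python
import math


def z_confidence_interval(sample_mean, sample_std, n, z):
    """Confidence interval for the mean: sample_mean +/- z * sample_std / sqrt(n)."""
    margin = z * sample_std / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


lower, upper = z_confidence_interval(sample_mean=20, sample_std=3, n=81, z=1.645)
print(f"{lower:.2f} to {upper:.2f}")  # 19.45 to 20.55
```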

Student's t -Distribution

An alternative method for constructing confidence intervals is to use the t-distribution. It is a more

conservative method, giving wider intervals, and ideally is used in all cases even when it is a large

sample. However, when it is a small sample (fewer than 30 observations) and we do not know the

population variance, it is essential to use the t-distribution approach.

The t-distribution is a symmetrical probability distribution defined by a single parameter, the number

of degrees of freedom (df).

The t-statistic, which follows a t-distribution with a mean of 0 and (n - 1) degrees of freedom, is given by:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \qquad (7)$$

It is not normal since there are two random variables, the sample mean and standard deviation.

However as the number of degrees of freedom increases the t-distribution approaches the normal

distribution, as shown below:

Confidence Intervals for the Population Mean

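If we are considering a population with unknown variance and either the sample is large, or the sample

is small but the population is normally (or approximately normally) distributed, then the confidence

interval for the population mean is given by:

$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \qquad (8)$$

where the number of degrees of freedom for tα/2 is (n - 1), with a sample size of n.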

In order to answer hypothesis questions you may be required to read t-distribution tables to find the

critical value of t. We show an excerpt from the tables below. Note that these are for one-tailed tests,

so for α = 0.05 then p = 0.05, whereas for a two-tailed test you would need to use p = 0.025, which

is half the significance level.

For example, to find the critical t-value with 5 degrees of freedom and α = 0.05 and a one-tailed test

the critical t-value would be 2.015. For a two-tailed test (p = 0.025) it would be 2.571.

An investor is looking at the quarterly returns from a mutual fund portfolio which are assumed to be

normally distributed and have a mean of 3% and a sample standard deviation of 2%. He looks at 3

years' data and wishes to compute the 95% confidence interval. Since the sample is small he uses

Equation (8).

He will need to use t-distribution tables to look up t0.025 for 11 degrees of freedom (since the sample

size is 12); this is 2.201. The confidence interval is:

$$3\% \pm 2.201 \times \frac{2\%}{\sqrt{12}} = 3\% \pm 1.27\%$$

or 1.73% to 4.27%. The investor can be confident, at the 95% level, that this range includes the population mean.
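
The investor's interval can be computed as follows. The critical value 2.201 is taken from the text rather than calculated, and the helper name `t_confidence_interval` is illustrative. Returns are expressed in percentage points.

```python
import math


def t_confidence_interval(sample_mean, sample_std, n, t_critical):
    """Confidence interval for the mean using a t critical value with n - 1 degrees of freedom."""
    margin = t_critical * sample_std / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# 12 quarterly returns (3 years), mean 3%, sample standard deviation 2%.
low, high = t_confidence_interval(sample_mean=3.0, sample_std=2.0, n=12, t_critical=2.201)
print(f"{low:.2f}% to {high:.2f}%")  # 1.73% to 4.27%
```

Using the z value of 1.960 instead would give a narrower interval, which understates the uncertainty for a sample this small.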

In summary, the table below shows which statistic to use for different samples.

Distribution    Variance    Small sample (n < 30)    Large sample (n ≥ 30)
Normal          Known       z                        z
Normal          Unknown     t                        z or t

If a larger sample size is taken then the confidence interval will be narrower, because the standard

error is lower. As you would expect, a larger sample gives more precise results.

Data-Snooping Bias

This is the bias that occurs if you use the empirical results of other analysts' research, or focus on

patterns that may have been identified by other research. Ideally you would study new data but

unfortunately this may not be practical in financial markets where much of the research is based on

historic data.

Data-Mining Bias

This is when forecasting models are derived from searching through historic data for patterns/trading

rules. The problems occur when a large number of models are tested but only the successful ones

reported.
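
The effect can be illustrated with a simulation in which no trading rule has any real edge. Everything here is hypothetical: 200 rules, 36 periods each, and returns drawn as pure noise with a mean of 0% and a standard deviation of 5% per period.

```python
import random

rng = random.Random(1)
num_rules = 200    # hypothetical number of trading rules tested
num_periods = 36   # hypothetical number of periods per back-test


def average_return(rng, periods):
    """Average return of one rule whose per-period returns are pure noise."""
    return sum(rng.gauss(0.0, 5.0) for _ in range(periods)) / periods


rule_results = [average_return(rng, num_periods) for _ in range(num_rules)]
best_rule = max(rule_results)
average_rule = sum(rule_results) / num_rules

print(f"best rule:              {best_rule:.2f}% per period")
print(f"average over all rules: {average_rule:.2f}% per period")
```

The best rule looks profitable even though every rule is noise, so reporting only the winner overstates what the data supports.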

Sample Selection Bias

This occurs when certain data is excluded from the analysis, possibly because the data was not

available.

Survivorship Bias

This is one type of sample selection bias, which occurs when companies that have gone bankrupt, or

funds or portfolios that have been liquidated, are not included in the analysis.

Look-Ahead Bias

This is when a test uses information that was not available at the test date. An example of this is

when the success of valuation ratios is considered but not all investors may have had access to the

accounting data incorporated in the valuation ratio at the test date.

Time-Period Bias

This is when the test period used does not match the conclusion being drawn; for example, short-term

data is applied to provide long-term forecasts.
