
Q:1

Definition of Median
Median is the middle data value of an ordered data set.

More About Median

If there are two middle values, then the median is the mean of the two numbers.

There will be two middle values when the number of values in the data set is
even.

Examples of Median

12, 23, 8, 46, 5, 42, 19

Arranged in order, the values are 5, 8, 12, 19, 23, 42, 46. The median of the above
data set is the middle (fourth) value, 19.
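
The definition above can be sketched in a few lines of Python. The helper name `median` is our own choice for illustration; it sorts the values, then returns the middle one, or the mean of the two middle ones when the number of values is even.

```python
def median(values):
    """Return the middle value of the data once it is sorted."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value.
        return ordered[mid]
    # Even count: take the mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [12, 23, 8, 46, 5, 42, 19]
print(median(data))         # 19
print(median(data + [50]))  # 21.0, the mean of 19 and 23
```

The second call adds an eighth value (50, chosen arbitrarily) to show the even case.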

2
Measures of statistical dispersion

A measure of statistical dispersion is a nonnegative real number that is zero if all the data are the
same and increases as the data become more diverse.
Most measures of dispersion have the same units as the quantity being measured. In other words,
if the measurements are in metres or seconds, so is the measure of dispersion.

3
The standard deviation is the square root of the variance. The standard
deviation is expressed in the same units as the mean, whereas the variance is
expressed in squared units. For describing a distribution you can use either, so
long as you are clear about which one you are using.
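
A minimal sketch of the relationship between variance and standard deviation follows. It assumes the population form of the variance (dividing by the number of values); the sample form divides by n - 1 instead. The data values and the function name `population_variance` are hypothetical.

```python
import math


def population_variance(values):
    """Mean squared deviation from the mean (population form, divides by N)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


# Hypothetical measurements in metres.
lengths_m = [2, 4, 4, 4, 5, 5, 7, 9]

variance = population_variance(lengths_m)  # in squared metres
std_dev = math.sqrt(variance)              # back in metres

print(variance)  # 4.0
print(std_dev)   # 2.0

# A data set with no variation has zero dispersion.
print(population_variance([5, 5, 5, 5]))  # 0.0
```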
4

Addition Rule 2:

When two events, A and B, are non-mutually exclusive, the probability that
A or B will occur is:
P(A or B) = P(A) + P(B) - P(A and B)

In the rule above, P(A and B) refers to the overlap of the two events. Let's apply this rule to
some other experiments.

Experiment 5: In a math class of 30 students, 17 are boys and 13 are girls. On a unit
test, 4 boys and 5 girls made an A grade. If a student is chosen at random from the
class, what is the probability of choosing a girl or an A student?

Probabilities: P(girl or A) = P(girl) + P(A) - P(girl and A)
= 13/30 + 9/30 - 5/30
= 17/30
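
The calculation in Experiment 5 can be checked with exact fractions. This is only a sketch; the variable names are ours, and the counts come straight from the problem statement.

```python
from fractions import Fraction

total_students = 30
girls = 13
a_students = 4 + 5   # 4 boys and 5 girls made an A
girls_with_a = 5     # the overlap: students who are both a girl and an A student

p_girl = Fraction(girls, total_students)
p_a = Fraction(a_students, total_students)
p_girl_and_a = Fraction(girls_with_a, total_students)

# Addition rule for events that are not mutually exclusive.
p_girl_or_a = p_girl + p_a - p_girl_and_a
print(p_girl_or_a)  # 17/30

# Cross-check by direct counting: all 13 girls plus the 4 boys with an A.
direct_count = Fraction(13 + 4, total_students)
print(direct_count == p_girl_or_a)  # True
```

Subtracting P(girl and A) matters because the 5 girls with an A would otherwise be counted twice.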

5
A continuous random variable is a random variable whose set of possible values
(known as the range) is infinite and uncountable. Probabilities for a continuous
random variable X are defined as areas under the curve of its probability density
function (PDF). Thus, only ranges of values can have a nonzero probability; any
single value has probability zero.
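
To illustrate "probability as area", the sketch below approximates the area under a PDF with a midpoint Riemann sum and compares it with the difference of the cumulative distribution function. The standard normal distribution and the interval [0, 1] are chosen only for illustration, and `area_under_pdf` is a hypothetical helper name.

```python
from statistics import NormalDist

X = NormalDist(mu=0, sigma=1)  # standard normal, picked as an example


def area_under_pdf(dist, a, b, steps=10_000):
    """Approximate the area under dist.pdf between a and b (midpoint rule)."""
    width = (b - a) / steps
    heights = (dist.pdf(a + (i + 0.5) * width) for i in range(steps))
    return sum(heights) * width


area = area_under_pdf(X, 0, 1)
exact = X.cdf(1) - X.cdf(0)
print(round(area, 4), round(exact, 4))  # both about 0.3413

# A single point is an interval of zero width, so its probability is zero.
print(area_under_pdf(X, 0.5, 0.5))  # 0.0
```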
6

Mean and Variance of the Binomial Distribution


The binomial distribution for a random variable X with parameters n and p represents the sum of
n independent variables Z which may assume the values 0 or 1. If the probability that each Z
variable assumes the value 1 is equal to p, then the mean of each variable is equal to
1*p + 0*(1-p) = p, and the variance is equal to p(1-p). By the addition properties for
independent random variables, the mean and variance of the binomial distribution are equal to
the sum of the means and variances of the n independent Z variables, so

$\mu_X = np$ and $\sigma_X^2 = np(1-p)$.

These definitions are intuitively logical. Imagine, for example, 8 flips of a coin. If the coin is fair,
then p = 0.5. One would expect the mean number of heads to be half the flips, or np = 8*0.5 = 4.
The variance is equal to np(1-p) = 8*0.5*0.5 = 2.
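
The coin-flip figures can be checked numerically by building the binomial probabilities and computing the mean and variance directly from their definitions. This is a sketch, not a derivation; `binomial_pmf` is our own helper name.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 8, 0.5  # 8 flips of a fair coin
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance from the definition of expectation.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(mean, n * p)                # 4.0 4.0
print(variance, n * p * (1 - p))  # 2.0 2.0
```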

8
Normal distributions are defined by two parameters, the mean ($\mu$) and the
standard deviation ($\sigma$). Approximately 68% of the area of a normal distribution
is within one standard deviation of the mean. Approximately 95% of the area of a
normal distribution is within two standard deviations of the mean.
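
These percentages can be reproduced with Python's standard library. The sketch below uses `statistics.NormalDist`; the mean of 100 and standard deviation of 15 in the second pair of calls are arbitrary example values, included to show that the areas do not depend on the parameters.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Area of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


print(round(area_within(1), 4))  # 0.6827
print(round(area_within(2), 4))  # 0.9545

# The same areas hold for any mean and standard deviation.
print(round(area_within(1, mu=100, sigma=15), 4))  # 0.6827
print(round(area_within(2, mu=100, sigma=15), 4))  # 0.9545
```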
9

More Properties of Sampling Distributions


1. The overall shape of the distribution is symmetric and approximately normal.
2. There are no outliers or other important deviations from the overall pattern.
3. The center of the distribution is very close to the true population mean.
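
A small simulation can make these properties concrete. The sketch below assumes a hypothetical skewed population (exponential with mean 10), a sample size of 50, and 2,000 repeated samples; none of these values come from the notes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population: exponential values with mean 10.
population = [random.expovariate(1 / 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# Draw many samples and record the mean of each one.
sample_size = 50
sample_means = [
    statistics.fmean(random.sample(population, sample_size))
    for _ in range(2_000)
]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

print(population_mean)  # true population mean
print(center)           # center of the sampling distribution, close to the line above
print(spread)           # roughly population SD / sqrt(sample_size)
```

Plotting a histogram of `sample_means` would show the roughly symmetric, bell-shaped pattern described in property 1, even though the population itself is skewed.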

10
In statistics, interval estimation is the use of sample data to calculate an
interval of possible (or probable) values of an unknown population parameter, in
contrast to point estimation, which gives a single number.

11

Constructing a Confidence Interval for $\mu$


Let's review some of the symbols and equations that we learned in previous lessons:

Sample size: $n$
Population mean: $\mu = \frac{\sum X}{N}$
Sample mean: $\bar{x} = \frac{\sum x}{n}$
Standard error of $\bar{x}$: $SE(\bar{x}) = \frac{s}{\sqrt{n}}$
Multiplier: $t^*$
Degrees of freedom (one group): $df = n - 1$

Recall from earlier in this lesson, the general form for a confidence interval is

point estimate $\pm$ (multiplier)(standard error)

For a population mean, the point estimate is $\bar{x}$, the standard error is
$SE(\bar{x})$, and the multiplier is $t^*$. When we put these together, the formula
for a confidence interval for a population mean is

Confidence Interval for a Population Mean

$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$
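
As a worked sketch of this formula, the function below takes the t multiplier as an input, because Python's standard library has no t distribution. The sample values (n = 25, sample mean 62.0, sample standard deviation 2.0) are hypothetical; the multiplier 2.064 is the usual t-table value for 95% confidence with df = 24.

```python
import math


def confidence_interval(sample_mean, sample_sd, n, t_multiplier):
    """Point estimate plus or minus multiplier times standard error."""
    standard_error = sample_sd / math.sqrt(n)
    margin = t_multiplier * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample: n = 25, so df = n - 1 = 24.
n = 25
t_star = 2.064  # t-table value for 95% confidence, df = 24

low, high = confidence_interval(sample_mean=62.0, sample_sd=2.0, n=n, t_multiplier=t_star)
print(round(low, 4), round(high, 4))  # 61.1744 62.8256
```

Here the standard error is 2.0 / 5 = 0.4, so the margin of error is 2.064 * 0.4 = 0.8256.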

12

A simple hypothesis is one in which all parameters of the distribution are specified. For example,
if the heights of college students are normally distributed with $\sigma^2 = 4$, and we
hypothesize that the mean is, say, 62, that is $H_0: \mu = 62$, we have stated a simple
hypothesis, as the mean and variance together specify a normal distribution completely. A simple
hypothesis, in general, states that $\theta = \theta_0$, where $\theta_0$ is the specified value
of a parameter $\theta$ ($\theta$ may represent $\mu$, $p$, $\mu_1 - \mu_2$, etc.).
A hypothesis which is not simple (i.e. in which not all of the parameters are specified) is called a
composite hypothesis. For instance, if we hypothesize that $H_0: \mu > 62$ (and $\sigma^2 = 4$),
or $H_0: \mu = 62$ and $\sigma^2 < 4$, the hypothesis becomes a composite hypothesis because
we cannot know the exact distribution of the population in either case. The conditions
$\mu > 62$ and $\sigma^2 < 4$ each allow more than one value, so no specific value is being
assigned. The general form of a composite hypothesis is $\theta \le \theta_0$ or
$\theta \ge \theta_0$, that is, the parameter $\theta$ does not exceed or does not fall short of
a specified value $\theta_0$. The concept of simple and composite
hypotheses applies to both null hypothesis and alternative hypothesis.
13

The null hypothesis is rejected if the p-value is less than a predetermined level, $\alpha$.
This level $\alpha$ is called the significance level, and is the probability of rejecting the
null hypothesis given that it is true (a type I error). It is usually set at or below 5%.
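
The decision rule can be sketched with a one-sample z-test, which is our choice of example and assumes the population standard deviation is known. All numbers are hypothetical: a null mean of 62, a known standard deviation of 2, a sample of 25, and an observed sample mean of 62.9.

```python
from statistics import NormalDist


def two_sided_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for a one-sample z-test with known sigma."""
    z = (sample_mean - mu0) / (sigma / n**0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05  # significance level, fixed before looking at the data

p_value = two_sided_p_value(sample_mean=62.9, mu0=62, sigma=2, n=25)
reject_null = p_value < alpha

print(round(p_value, 4))  # about 0.0244 (z = 2.25)
print(reject_null)        # True
```

With a stricter level such as 0.01, the same p-value would not lead to rejection.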
14
Type I and type II errors

In statistical hypothesis testing, a type I error is the
incorrect rejection of a true null hypothesis (a "false positive"), while a type II error
is incorrectly retaining a false null hypothesis (a "false negative").
15
In statistics, simple linear regression is a linear regression model with a single
explanatory variable.[1][2][3][4] That is, it concerns two-dimensional sample points with
one independent variable and one dependent variable (conventionally, the x and y
coordinates in a Cartesian coordinate system) and finds a linear function (a
non-vertical straight line) that, as accurately as possible, predicts the dependent
variable values as a function of the independent variable. The adjective simple
refers to the fact that the outcome variable is related to a single predictor.
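
A minimal sketch of fitting such a line follows. It assumes ordinary least squares as the meaning of "as accurately as possible", which is the usual criterion but is not stated in the text above. The data points and the name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample points lying close to y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # 1.99 0.05

# Predict the dependent variable for a new x value.
print(round(intercept + slope * 6, 2))  # 11.99
```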
16
Multiple linear regression (MLR) is a statistical technique that uses several
explanatory variables to predict the outcome of a response variable. The goal of
MLR is to model the relationship between the explanatory and response variables.
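
As a rough sketch of how such a model can be fitted, the code below solves the least-squares normal equations in pure Python for two explanatory variables. The data are hypothetical and generated exactly from y = 1 + 2*x1 + 3*x2, so the fit should recover those coefficients. The function names are ours; in practice a numerical library would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations act on both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_mlr(rows, ys):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: two explanatory variables, one response.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefficients = fit_mlr(rows, ys)
print([round(c, 6) for c in coefficients])  # close to [1.0, 2.0, 3.0]
```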
