
Statistics: final exam

Mean square error: point estimate of the residual variance
Standard error of estimate: point estimate of the residual standard deviation
Raw data: data collected in original form
Frequency distribution: organisation of raw data in table form, using classes and frequencies
Class limits: the smallest and largest data values that can belong to a class
Frequency: the number of values in a specific class of the distribution
Boundaries: used to separate the classes so that there are no gaps in the frequency distribution
Class width: the difference between the lower limits of two consecutive classes
Class midpoint: the arithmetic mean of its upper and lower class limits
Class limits:
1. Stated: the class intervals of a frequency distribution when stated as discrete categories
2. True: classes whose upper true limit is the same as the lower true limit of the next class
3. True or real: the class intervals of a frequency distribution when stated as continuous categories
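
The definitions above (class limits, class width, class midpoint, frequency) can be illustrated with a small sketch. The raw data, the starting limit and the class width are hypothetical values chosen for the example, and the upper limit is computed on the assumption that the data are whole numbers.

```python
def frequency_distribution(data, lower, width, n_classes):
    """Build a frequency table with stated class limits (integer data assumed)."""
    table = []
    for i in range(n_classes):
        lo = lower + i * width      # lower class limit
        hi = lo + width - 1         # upper class limit
        midpoint = (lo + hi) / 2    # arithmetic mean of the two class limits
        freq = sum(lo <= x <= hi for x in data)  # number of values in the class
        table.append((lo, hi, midpoint, freq))
    return table

raw_data = [12, 15, 21, 22, 25, 28, 31, 33, 34, 39]  # hypothetical raw data
table = frequency_distribution(raw_data, lower=10, width=10, n_classes=3)
for lo, hi, midpoint, freq in table:
    print(f"{lo}-{hi}  midpoint={midpoint}  frequency={freq}")
```

The class width of 10 is the difference between consecutive lower limits (10, 20, 30), matching the definition of class width.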

Measures of central tendency: mean, median, mode, midrange, midhinge
Measures of variability: range, interquartile range, variance, standard deviation, coefficient of
variation, coefficient of quartile deviation
Shape: skewness, kurtosis
Sample statistics: the measures when they are computed for data from a sample
Population parameters: the measures when they are computed for data from a population

Mean: the average of all the data values
Weighted mean: the mean computed by giving each data value a weight that reflects its
importance
Trimmed mean: the mean of the data remaining after a percentage of the smallest and largest values
is removed, a better indicator of the central location when extreme values are present
Merits: the mean is well understood by most people, its computation is easy
Demerits: sensitive to extreme values, so it may not reflect the actual central tendency
Harmonic mean: one of several kinds of average (computed for ungrouped or grouped data)
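
A minimal sketch of the four means defined above, using Python's standard `statistics` module. The data, the weights and the helper `trimmed_mean` are hypothetical and serve only as illustration.

```python
import statistics

data = [2, 4, 4, 5, 100]   # hypothetical values; 100 is an extreme value

# Arithmetic mean: pulled upward by the extreme value.
mean = statistics.mean(data)

# Weighted mean: each value is multiplied by a weight reflecting its importance.
weights = [1, 2, 2, 1, 0]  # hypothetical weights
weighted_mean = sum(w * x for w, x in zip(weights, data)) / sum(weights)

def trimmed_mean(values, k):
    """Mean of the data remaining after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    return statistics.mean(ordered[k:len(ordered) - k])

trimmed = trimmed_mean(data, 1)                 # mean of 4, 4, 5
harmonic = statistics.harmonic_mean([40, 60])   # reciprocal of the mean of reciprocals

print(mean, weighted_mean, trimmed, harmonic)
```

Comparing `mean` with `trimmed` shows the demerit listed above: a single extreme value moves the mean far from where most of the data lie.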

Properties of the arithmetic mean:
1. Every set of interval-level and ratio-level data has a mean
2. All the values are included in computing the mean
3. A set of data has a unique mean
4. The mean is affected by unusually small or large data values
5. The arithmetic mean is the only measure of central tendency where the sum of the deviations of
each value from the mean is zero

Median: the value in the middle when the data items are arranged in ascending order; it divides the
data into two halves so that the number of items below it is the same as the number of items
above it
Odd number of items: the median is the value in the middle
Even number of items: the median is the mean of the two middle items
Merits: widely used measure of central tendency, not influenced by extreme values
Demerits: when the number of items is small the median may not be representative, because it is a
positional average
Use: the median is appropriate when the distribution is highly skewed
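
The odd and even cases can be checked with a short sketch based on `statistics.median`; the item lists are hypothetical.

```python
import statistics

odd_items = [7, 1, 5]        # sorted: 1, 5, 7
even_items = [7, 1, 5, 3]    # sorted: 1, 3, 5, 7

median_odd = statistics.median(odd_items)     # the middle value
median_even = statistics.median(even_items)   # mean of the two middle values

# Replacing the largest item with an extreme value leaves the median unchanged.
median_with_extreme = statistics.median([7000, 1, 5])

print(median_odd, median_even, median_with_extreme)
```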

Mode: the value that occurs with the greatest frequency, the most frequent value in the set of numbers
Bimodal: data with exactly two modes
Multimodal: data with more than two modes
Merits: quick and easy to determine, an actual value of the data, not affected by extreme scores,
represents the most typical value in the distribution
Demerits: sometimes not very informative, can change dramatically from sample to sample, may not be
uniquely defined
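
A small sketch of unimodal and bimodal data using `statistics.multimode` (available from Python 3.8), which returns every value that shares the highest frequency. The data sets are hypothetical.

```python
import statistics

unimodal = [1, 2, 2, 3, 4]
bimodal = [1, 2, 2, 3, 3, 4]

modes_unimodal = statistics.multimode(unimodal)   # one most frequent value
modes_bimodal = statistics.multimode(bimodal)     # exactly two modes

print(modes_unimodal, modes_bimodal)
```

The second result shows why the mode may not be uniquely defined.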

Mode vs. Median vs. Mean
Mode: used for nominal scales
Median: used for highly skewed distributions of interval data
Mean: the most stable measure if the distribution is reasonably symmetric

Percentile: provides information about how the data are spread over the interval from the smallest
value to the largest value
pth percentile: a value such that at least p percent of the items take on this value or less, and at least
(100-p) percent of the items take on this value or more
Quartiles: specific percentiles
first quartile: 25th percentile
second quartile: 50th percentile (median)
third quartile: 75th percentile
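
Several conventions exist for computing percentiles. The sketch below assumes one common textbook rule that is consistent with the definition above: compute the position i = (p/100)n; if i is not a whole number, round it up and take the value at that position; if it is a whole number, average the values at positions i and i+1. The function name and the data are hypothetical, and p is assumed to be a whole number between 1 and 99.

```python
def percentile(data, p):
    """pth percentile by the position rule i = (p/100) * n (p an integer, 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be between 0 and 100 (exclusive)")
    ordered = sorted(data)
    whole, remainder = divmod(p * len(ordered), 100)   # integer arithmetic avoids rounding issues
    if remainder:
        # i is not a whole number: round up; 1-based position whole+1 is index whole
        return ordered[whole]
    # i is a whole number: average the values at 1-based positions i and i+1
    return (ordered[whole - 1] + ordered[whole]) / 2

data = [3, 5, 7, 8, 9, 11, 13, 15, 18, 20, 22, 25]   # hypothetical data, n = 12
q1 = percentile(data, 25)       # first quartile
q2 = percentile(data, 50)       # second quartile (median)
q3 = percentile(data, 75)       # third quartile
p85 = percentile(data, 85)

print(q1, q2, q3, p85)
```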

Range: the difference between the largest and smallest data values, the simplest measure of
variability
Merits: simple to compute and understand, gives a quick answer
Demerits: not reliable because it is affected by extreme items, too indefinite for practical use

Midrange: the arithmetic mean of the maximum and minimum values in a data set, a measure of central
tendency
Interquartile range: the difference between the third quartile and the first quartile, not affected by
extreme values

Midhinge: the arithmetic mean of the first quartile and the third quartile, not affected by extreme
values
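
The range, midrange, interquartile range and midhinge can be compared on a small example. This is a sketch with hypothetical data; the quartiles come from `statistics.quantiles` with `method="inclusive"`, which is one of several quartile conventions and may differ slightly from a hand calculation.

```python
import statistics

def spread_summary(data):
    """Return (range, midrange, interquartile range, midhinge) for a data set."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    data_range = max(data) - min(data)
    midrange = (max(data) + min(data)) / 2
    iqr = q3 - q1
    midhinge = (q1 + q3) / 2
    return data_range, midrange, iqr, midhinge

plain = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_extreme = [1, 2, 3, 4, 5, 6, 7, 8, 900]   # largest value replaced by an extreme one

print(spread_summary(plain))
print(spread_summary(with_extreme))
```

The range and midrange change sharply when the extreme value is introduced, while the interquartile range and midhinge stay the same.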

Variance: a measure of variability that utilizes all the data, the average of the squared differences
between each data value and the mean
Standard deviation: the positive square root of the variance; a small SD indicates a high degree of
uniformity of the observations
Coefficient of variation: indicates how large the standard deviation is in relation to the
mean, a measure of relative dispersion, used to compare two or more groups of data; weakness: undefined
if the mean is zero, and not meaningful if the data take negative values
Merits: rigidly defined, definite value, widely used and the most appropriate method for comparing
variability
Demerits: not easy to understand, difficult to calculate
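
A brief sketch of the three measures with hypothetical data. It shows both the population form (dividing by N, the plain average of squared differences) and the sample form (dividing by n-1), since the notes do not say which is intended.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5

pop_variance = statistics.pvariance(data)     # sum of squared deviations / N
sample_variance = statistics.variance(data)   # sum of squared deviations / (n - 1)
pop_sd = statistics.pstdev(data)              # positive square root of the variance

# Coefficient of variation in percent; undefined when the mean is zero.
mean = statistics.mean(data)
cv_percent = pop_sd / mean * 100

print(pop_variance, sample_variance, pop_sd, cv_percent)
```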

Z-score: often called the standardized value, it denotes the number of standard deviations a data value
lies from the mean
Negative: z less than zero, the data value is less than the sample mean
Positive: z greater than zero, the data value is greater than the sample mean
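
As a minimal sketch, a z-score is computed as (value - sample mean) / sample standard deviation. The function name and the data are hypothetical.

```python
import statistics

def z_scores(data):
    """Standardize each value: (x - sample mean) / sample standard deviation."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)   # sample standard deviation (n - 1 in the denominator)
    return [(x - mean) / sd for x in data]

scores = z_scores([2, 4, 6])   # mean = 4, sample SD = 2
print(scores)
```

The value below the mean gets a negative z-score, the value equal to the mean gets zero, and the value above the mean gets a positive z-score.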

Coefficient of quartile deviation: indicates how large the quartile deviation is in relation to the median
Merits: easy to calculate, simple to understand
Demerits: not based on all observations, affected by sampling fluctuations, necessary to arrange the
data in ascending order

Skewness: an important measure of the shape of a distribution, a measure of symmetry
sk = 0 -> the mean and the median are equal
sk < 0 -> the mean is less than the median
sk > 0 -> the mean is greater than the median
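
The notes do not say which skewness formula is meant. The sketch below assumes Pearson's second skewness coefficient, 3 * (mean - median) / standard deviation, because its sign depends directly on whether the mean lies below or above the median. The data sets are hypothetical.

```python
import statistics

def pearson_skewness(data):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sample SD."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)
    return 3 * (mean - median) / sd

symmetric = [1, 2, 3, 4, 5]         # mean = median
right_skewed = [1, 2, 2, 3, 12]     # mean > median
left_skewed = [1, 10, 11, 11, 12]   # mean < median

for name, data in [("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)]:
    print(name, round(pearson_skewness(data), 3))
```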

Chebyshev's theorem: permits us to make statements about the percentage of items that must be
within a specified number of standard deviations of the mean

Empirical rule: used for data having a bell-shaped distribution to determine the percentage of items
that must be within a specified number of standard deviations of the mean
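
For concreteness: Chebyshev's theorem states that at least 1 - 1/k^2 of the items lie within k standard deviations of the mean (for k > 1), whatever the shape of the distribution, while the empirical rule gives approximately 68%, 95% and 99.7% for k = 1, 2, 3 when the data are bell-shaped. The sketch below uses hypothetical data and function names to check the Chebyshev bound.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum proportion of items within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k ** 2

# Approximate proportions from the empirical rule (bell-shaped data only).
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def proportion_within(data, k):
    """Observed proportion of items within k population SDs of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5, SD = 2
print(chebyshev_lower_bound(2), proportion_within(data, 2), EMPIRICAL_RULE[2])
```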

Outlier: an unusually small or unusually large value in a data set

Time series: a sequence of observations obtained over time; observations are taken at equally spaced
time intervals: yearly, quarterly, monthly, ...
- can be decomposed into components: trend, seasonal, cyclical, irregular
- goals: modeling, characterization, forecasting
- can be a stock series (values at a point in time) or a flow series (values over a given period)

Index number: a number that expresses changes in a variable or group of variables with respect to
time, geographic location or other characteristics such as income or profession

Simple index numbers: the calculations involve a single variable
Aggregate index numbers: the calculations involve a group of variables

Ways of computing index numbers:
1. The fixed base method: a base year is selected and all subsequent changes are calculated with
respect to this base
2. The chain base method: changes are calculated against the value in the period immediately before
Chain index numbers: calculating fixed-base index numbers from link relatives, with the 1st year = 100
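
The two methods, and the chaining of link relatives back into a fixed-base series, can be sketched as follows. The price series is hypothetical.

```python
prices = [50, 55, 66, 60]   # hypothetical prices for four consecutive years

# Fixed base method: every year is compared with the first (base) year.
fixed_base = [p / prices[0] * 100 for p in prices]

# Chain base method: every year is compared with the year immediately before
# (these values are the link relatives).
link_relatives = [cur / prev * 100 for prev, cur in zip(prices, prices[1:])]

# Chain index numbers: rebuild the fixed-base series from the link relatives,
# starting from 100 for the first year.
chain_index = [100.0]
for link in link_relatives:
    chain_index.append(chain_index[-1] * link / 100)

print([round(v, 2) for v in fixed_base])
print([round(v, 2) for v in link_relatives])
print([round(v, 2) for v in chain_index])
```

Up to floating-point rounding, the chained series reproduces the fixed-base series.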

Composite index numbers: numbers that reflect the average relative change in a group of variables
compared to a base; the present period is called the current period and is often denoted by a
symbol, and the period in the past is called the base period
Most commonly used composite index numbers:
1. Price index numbers
2. Quantity index numbers
3. Value index numbers

Aggregate price index: can be obtained by simply summing the prices of several items or by calculating
the average of the prices

Problems in index number construction:
1. selection of the base year
2. selection of commodities
3. allocation of the weights
4. type of formula
5. the data for index numbers
- the purpose should be defined before the construction

Laspeyres method
Merits: easy to calculate, has an intuitive meaning, changes in the index number can be explained by
changes in the prices
Demerits: tends to overstate inflation, no substitution in the sense of economic theory is allowed

Paasche's method
Merits: the weights are continuously updated and always reflect current buying habits
Demerits: quantity data for the current year are needed, prices have to be recomputed each year

Fisher's ideal index number: the geometric mean of the Laspeyres and Paasche indices, not widely
used because of its limitations
Merits: uses the geometric mean, the best average for the construction of index numbers
Demerits: practical limitation of collecting data, it is not easy to envisage what it refers to
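
The notes list merits and demerits without the formulas. As a sketch, the standard price-index forms are used here: Laspeyres weights prices by base-period quantities, Paasche weights them by current-period quantities, and Fisher takes the geometric mean of the two. The prices and quantities are hypothetical.

```python
import math

# Hypothetical basket of two items.
p0 = [2, 5]    # base-period prices
q0 = [10, 4]   # base-period quantities
p1 = [3, 6]    # current-period prices
q1 = [8, 5]    # current-period quantities

def weighted_sum(prices, quantities):
    """Total cost of buying the given quantities at the given prices."""
    return sum(p * q for p, q in zip(prices, quantities))

# Laspeyres: the base-period basket valued at current vs. base prices.
laspeyres = weighted_sum(p1, q0) / weighted_sum(p0, q0) * 100

# Paasche: the current-period basket valued at current vs. base prices.
paasche = weighted_sum(p1, q1) / weighted_sum(p0, q1) * 100

# Fisher's ideal index: geometric mean of the two.
fisher = math.sqrt(laspeyres * paasche)

print(round(laspeyres, 2), round(paasche, 2), round(fisher, 2))
```

In this example the Laspeyres index is the higher of the two, in line with the demerit listed above that it tends to overstate inflation.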

Limitations of index numbers:
1. Loss of information, one figure represents a mass of data
2. Consumption patterns change over time and index values might not be representative of the actual pattern
3. Lack of adequate and accurate data
4. An index number constructed for one purpose cannot be used for other purposes
5. It is an average and it has all the advantages and disadvantages of an average

Geometric mean: the average of link relatives, the mean percentage of change, used in series behaving
approximately according to a geometric progression, such as population growth, financial
investment, savings; it cannot be calculated when any value is zero
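
A short sketch of the geometric mean as the average of link relatives, using `statistics.geometric_mean` (available from Python 3.8). The savings balances are hypothetical.

```python
import statistics

balances = [100, 110, 121]   # hypothetical savings balance in three consecutive years

# Link relatives expressed as growth factors (current value / previous value).
growth_factors = [cur / prev for prev, cur in zip(balances, balances[1:])]

# Geometric mean of the growth factors = mean growth factor per period.
mean_growth_factor = statistics.geometric_mean(growth_factors)
mean_percentage_change = (mean_growth_factor - 1) * 100

print(round(mean_growth_factor, 4), round(mean_percentage_change, 2))
```

The same mean growth factor can be obtained from the first and last values alone, as (last / first) raised to the power 1 / (number of periods).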

Forecasting: the underlying premise of time series models is that the historical pattern of the
dependent variable can be used as the basis for developing forecasts
Types of forecasting models: naive model, moving average model
Forecasting errors: MAPE (mean absolute percentage error), MSE (mean squared error), RMSE
(root mean squared error)
Naive model forecasting: assumes that things will remain the same and that whatever happened last
time will happen again this time, a simple and efficient model

Moving average forecasting: the average of the previous observations of the time series; reduces random
variation and ignores complex relationships in the data, useful in relatively stable series without
a seasonal component
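
The two forecasting models and the three error measures can be sketched on a hypothetical series. The function names and the window length of 3 are assumptions made for the example.

```python
import math

def mse(actual, forecast):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(mse(actual, forecast))

def mape(actual, forecast):
    """Mean absolute percentage error (actual values must be non-zero)."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual) * 100

def naive_forecasts(series):
    """Forecast for each period = the value observed in the previous period."""
    return series[:-1]   # forecasts for series[1:]

def moving_average_forecasts(series, window):
    """Forecast for each period = mean of the previous `window` observations."""
    return [sum(series[t - window:t]) / window for t in range(window, len(series))]

series = [10, 12, 11, 13, 12, 14]   # hypothetical, relatively stable series

naive = naive_forecasts(series)
ma3 = moving_average_forecasts(series, 3)   # forecasts for series[3:]

naive_mse = mse(series[1:], naive)
ma3_mse = mse(series[3:], ma3)
print("naive:", naive_mse, rmse(series[1:], naive), mape(series[1:], naive))
print("moving average:", ma3_mse, rmse(series[3:], ma3), mape(series[3:], ma3))
```

Note that the two models are scored on different numbers of periods here, because the moving average needs three observations before it can produce its first forecast.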

Types of regression models:
Simple: linear and non-linear
Multiple: linear and non-linear

Simple linear regression model: the equation that describes how y (the dependent variable) is related to
x and an error term

Least squares method: a procedure for using sample data to find the estimated regression equation
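
For simple linear regression, the least squares estimates have a closed form: the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept makes the line pass through the point of means. The sketch below uses hypothetical data; the names `b0` and `b1` are chosen for the example.

```python
def least_squares(x, y):
    """Return (b0, b1) for the estimated regression equation y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable

b0, b1 = least_squares(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} * x")
```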

Coefficient of determination: the proportion of the total variation in the dependent variable that is
explained or accounted for by the variation in the independent variable; it ranges from 0 to 1 and is
the square of the coefficient of correlation

Coefficient of correlation: a measure of the strength of the linear relationship between two
variables; negative values indicate an inverse relationship and positive values indicate a direct one; a
correlation greater than 0.8 (in absolute value) is generally described as strong, and a correlation
smaller than 0.5 is generally described as weak
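
The sketch below ties these two coefficients to the mean square error and standard error of estimate defined at the start of the notes. It uses hypothetical data and assumes a simple linear regression, where the mean square error is the residual sum of squares divided by n - 2.

```python
import math

x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)   # total variation in y
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Coefficient of correlation and its square, the coefficient of determination.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Residual variation around the least squares line.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

mean_square_error = sse / (n - 2)        # point estimate of the residual variance
std_error_of_estimate = math.sqrt(mean_square_error)

print(round(r, 4), round(r_squared, 4))
print(round(mean_square_error, 4), round(std_error_of_estimate, 4))
```

In this example the explained share of variation, 1 - SSE / total variation, equals the squared correlation, as the definition of the coefficient of determination states.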

Assumptions about the regression model error terms:
1. Mean zero: the mean of the error terms is equal to 0
2. Constant variance: the variance of the error terms is the same for all values of x
3. Normality: the error terms follow a normal distribution for all values of x
4. Independence: the values of the error terms are statistically independent of each other
