
DESCRIPTIVE STATISTICS

SESSIONS 3 AND 4
HOMEWORK
2.31
2.71
2.89


[Concept map: statistical thinking]
Statistical thinking; process improvement
Enumerative study; analytical study
Population; sample; parameters; statistics
Descriptive statistics: data collection, data presentation, data characterization
Role of computer packages
[Chart: presentation of numerical data]
Raw data, by form of data:
Ungrouped: ordered array; stem-and-leaf display
Grouped: frequency distribution; relative frequency distribution; percentage distribution; cumulative relative frequency distribution; cumulative percentage distribution; histogram; polygon; ogive
Time ordered (no/yes): digidot plot
[Chart: properties of numerical data]
Central tendency: mean, median, mode, midrange, midhinge
Variation: range, interquartile range, variance, standard deviation, coefficient of variation
Shape: box-and-whisker plot
[Chart: presentation of categorical data, by number of variables]
1 variable: summary table; bar chart; pie chart; dot chart; Pareto diagram
2 variables: cross tabulations (contingency tables) based on total 100%, row 100%, or column 100%
More than 2 variables: supertable

Summary Measures
Central tendency: mean, median, mode, quartile, geometric mean
Variation: range, variance, standard deviation, coefficient of variation
Measures of Central Tendency
Mean (sample): \( \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \)
Mean (population): \( \mu = \frac{\sum_{i=1}^{N} X_i}{N} \)
Median
Mode
Geometric mean: \( \bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n} \)
Mean (Arithmetic Mean)
Mean (arithmetic mean) of a data set
Sample mean (\(n\) = sample size):
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]
Population mean (\(N\) = population size):
\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N} \]
Mean (Arithmetic Mean)
(continued)
The most common measure of central tendency
Affected by extreme values (outliers)
[Figure: two dot plots; on the 0-10 axis Mean = 5, on the axis extended to 14 Mean = 6]
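The effect of an outlier on the mean can be illustrated with a minimal sketch. The data points plotted on the slide are not recoverable, so the two lists below are hypothetical values chosen only to reproduce the means 5 and 6 shown in the figure.

```python
from statistics import mean

# Hypothetical data: chosen only so that the means match the figure (5 and 6).
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]   # same data, largest value replaced by an outlier

print(mean(without_outlier))  # 5
print(mean(with_outlier))     # 6
```

Replacing a single value by an extreme one shifts the mean by a full unit, even though four of the five observations are unchanged.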
Mean (Arithmetic Mean)
(continued)
Approximating the arithmetic mean
Used when the raw data are not available
\[ \bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} \]
\(n\) = sample size
\(c\) = number of classes in the frequency distribution
\(m_j\) = midpoint of the \(j\)th class
\(f_j\) = frequency of the \(j\)th class
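As a sketch of the approximation, the function below weights each class midpoint by its class frequency and divides by the sample size. The function name `grouped_mean` and the three-class frequency distribution are illustrative assumptions, not taken from the slides.

```python
def grouped_mean(midpoints, frequencies):
    """Approximate the mean from class midpoints m_j and class frequencies f_j."""
    if len(midpoints) != len(frequencies):
        raise ValueError("one frequency is required per class midpoint")
    n = sum(frequencies)  # sample size = total of the class frequencies
    return sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Hypothetical frequency distribution: classes 10-20, 20-30, 30-40.
midpoints = [15, 25, 35]
frequencies = [3, 5, 2]

print(grouped_mean(midpoints, frequencies))  # 24.0
```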

Median
Robust measure of central tendency
Not affected by extreme values
In an ordered array, the median is the "middle" number
If n or N is odd, the median is the middle number
If n or N is even, the median is the average of the 2 middle numbers
[Figure: two dot plots, one on a 0-10 axis and one on an axis extended to 14; Median = 5 in both]
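The odd and even cases can be sketched in a few lines. The first two lists are hypothetical values (the plotted points are not recoverable) that differ only in one extreme value, which leaves the median unchanged.

```python
def median(values):
    """Middle value of the ordered data; average of the 2 middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the middle number
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average of the 2 middle numbers

print(median([1, 3, 5, 7, 9]))    # 5
print(median([1, 3, 5, 7, 14]))   # 5  (the outlier does not move the median)
print(median([11, 12, 13, 16]))   # 12.5
```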
Mode
A measure of central tendency
Value that occurs most often
Not affected by extreme values
There may be no mode
There may be several modes
Used for either numerical or categorical data
[Figure: dot plot on a 0-14 axis, Mode = 9; dot plot on a 0-6 axis, no mode]
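A small sketch of the three situations listed above (one mode, several modes, no mode). The helper name `modes`, the sample data, and the convention that data with no repeated value has no mode are assumptions made for illustration.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest count; an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 3, 9, 9, 9, 12]))                        # [9]
print(modes([0, 1, 2, 3, 4, 5, 6]))                      # []
print(modes(["red", "blue", "red", "blue", "green"]))    # ['blue', 'red']
```

The last call shows that the mode also works for categorical values, where a mean or median would not be defined.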
Geometric Mean
Useful for measuring the rate of change of a variable over time
\[ \bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n} \]
Geometric mean rate of return
Measures the status of an investment over time
\[ \bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1 \]

Example
An investment of $100,000 declined to $50,000 at the end of year 1 and rebounded to $100,000 at the end of year 2:
\[ R_1 = -0.5 \ (\text{or } -50\%), \qquad R_2 = 1 \ (\text{or } 100\%) \]
Average rate of return:
\[ \bar{R} = \frac{(-0.5) + (1)}{2} = 0.25 \ (\text{or } 25\%) \]
Geometric mean rate of return:
\[ \bar{R}_G = [(1 + (-0.5)) \times (1 + (1))]^{1/2} - 1 = [(0.5) \times (2)]^{1/2} - 1 = 1^{1/2} - 1 = 0 \ (\text{or } 0\%) \]
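The two rates of return in this example can be checked with a short sketch. The function names are illustrative; the returns are the ones from the example (a 50% loss followed by a 100% gain).

```python
import math

def arithmetic_mean_return(returns):
    """Plain average of the per-period rates of return."""
    return sum(returns) / len(returns)

def geometric_mean_return(returns):
    """n-th root of the compounded growth factor, minus 1 (assumes every R_i > -1)."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

returns = [-0.5, 1.0]  # year 1: -50%, year 2: +100%

print(arithmetic_mean_return(returns))  # 0.25
print(geometric_mean_return(returns))   # 0.0
```

The investment ends where it started, which the geometric mean rate of 0% reflects and the arithmetic average of 25% does not.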

Quartiles
Split ordered data into 4 quarters
[Figure: ordered data divided into four 25% segments by Q1, Q2 and Q3]
Position of the i-th quartile:
\[ \text{position of } Q_i = \frac{i(n+1)}{4} \]
Q1 and Q3 are measures of noncentral location
Q2 = median, a measure of central tendency
Data in ordered array: 11 12 13 16 16 17 18 21 22
\[ \text{Position of } Q_1 = \frac{1(9+1)}{4} = 2.5, \qquad Q_1 = \frac{12 + 13}{2} = 12.5 \]
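The positioning rule i(n+1)/4 can be sketched as follows. The slide only shows the case where the position ends in .5 (average the two neighbouring values); rounding any other fractional position to the nearest integer is an assumed convention, and the function name `quartile` is illustrative.

```python
import math

def quartile(values, i):
    """i-th quartile (i = 1, 2, 3) using the position rule i(n+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2 or i not in (1, 2, 3):
        raise ValueError("need at least 2 observations and i in {1, 2, 3}")
    position = i * (n + 1) / 4          # 1-based position in the ordered array
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]       # whole position: take that value
    if fraction == 0.5:
        return (ordered[lower - 1] + ordered[lower]) / 2  # halfway: average neighbours
    return ordered[round(position) - 1]  # assumption: round other positions

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(quartile(data, 1))  # 12.5
print(quartile(data, 2))  # 16
print(quartile(data, 3))  # 19.5
```

Statistical packages use several different quartile conventions, so results from other software may differ slightly from this rule.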
Measures of Variation
[Chart: variation]
Range; interquartile range
Variance: population variance, sample variance
Standard deviation: population standard deviation, sample standard deviation
Coefficient of variation
Range
Measure of variation
Difference between the largest and the smallest observations:
\[ \text{Range} = X_{\text{largest}} - X_{\text{smallest}} \]
Ignores how the data are distributed
[Figure: two dot plots on a 7-12 axis; Range = 12 - 7 = 5 in both]
Interquartile Range
Measure of variation
Difference between the third and first quartiles
Not affected by extreme values
Data in ordered array: 11 12 13 16 16 17 17 18 21
\[ \text{Interquartile range} = Q_3 - Q_1 = 17.5 - 12.5 = 5 \]
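A minimal sketch that reproduces the interquartile range for this ordered array and contrasts it with the range. The helper `quartile_value` applies the position rule i(n+1)/4; rounding positions that do not end in .5 is an assumed convention.

```python
def quartile_value(ordered, i):
    """Value at 1-based position i(n+1)/4 of already sorted data."""
    position = i * (len(ordered) + 1) / 4
    lower = int(position)
    if position == lower:
        return ordered[lower - 1]
    if position - lower == 0.5:
        return (ordered[lower - 1] + ordered[lower]) / 2
    return ordered[round(position) - 1]  # assumption: round other positions

data = sorted([11, 12, 13, 16, 16, 17, 17, 18, 21])

data_range = data[-1] - data[0]                          # largest - smallest
iqr = quartile_value(data, 3) - quartile_value(data, 1)  # Q3 - Q1

print(data_range)  # 10
print(iqr)         # 5.0
```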
Variance
Important measure of variation
Shows variation about the mean
Sample variance:
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]
Population variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]
Standard Deviation
Most important measure of variation
Shows variation about the mean
Has the same units as the original data
Sample standard deviation:
\[ S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \]
Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]
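The difference between the sample and population formulas is only the divisor (n - 1 versus N), which the sketch below makes explicit. The function names and the eight data values are hypothetical, chosen so that the population results come out as round numbers.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    big_n = len(values)
    mu = sum(values) / big_n
    return sum((x - mu) ** 2 for x in values) / big_n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

print(population_variance(data))                       # 4.0
print(math.sqrt(population_variance(data)))            # 2.0
print(round(sample_variance(data), 4))                 # 4.5714
print(round(math.sqrt(sample_variance(data)), 4))      # 2.1381
```

The standard deviation is the square root of the variance, so it is expressed in the same units as the data.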

Standard Deviation
(continued)
Approximating the Standard Deviation
Used when the raw data are not available and the only source of data is a frequency distribution
\[ S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}} \]
\(n\) = sample size
\(c\) = number of classes in the frequency distribution
\(m_j\) = midpoint of the \(j\)th class
\(f_j\) = frequency of the \(j\)th class
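A sketch of this approximation: the grouped mean is computed first, then each squared deviation of a class midpoint from that mean is weighted by the class frequency. The function name and the three-class frequency distribution are hypothetical.

```python
import math

def grouped_std(midpoints, frequencies):
    """Approximate the sample standard deviation from a frequency distribution."""
    n = sum(frequencies)
    if n < 2:
        raise ValueError("at least 2 observations are required")
    x_bar = sum(m * f for m, f in zip(midpoints, frequencies)) / n  # grouped mean
    squared = sum((m - x_bar) ** 2 * f for m, f in zip(midpoints, frequencies))
    return math.sqrt(squared / (n - 1))

# Hypothetical frequency distribution: classes 10-20, 20-30, 30-40.
midpoints = [15, 25, 35]
frequencies = [3, 5, 2]

print(round(grouped_std(midpoints, frequencies), 2))  # 7.38
```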

Comparing Standard Deviations
[Figure: three dot plots on an 11-21 axis]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57
Coefficient of Variation
Measure of Relative Variation
Always in Percentage (%)
Shows Variation Relative to the Mean
Used to Compare Two or More Sets of Data Measured in Different Units
\[ CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\% \]
Sensitive to Outliers
Comparing Coefficient of Variation
Stock A:
Average price last year = $50
Standard deviation = $2
Stock B:
Average price last year = $100
Standard deviation = $5
Coefficient of Variation:
Stock A:
\[ CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \left( \frac{\$2}{\$50} \right) \cdot 100\% = 4\% \]
Stock B:
\[ CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \left( \frac{\$5}{\$100} \right) \cdot 100\% = 5\% \]
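The stock comparison can be reproduced with a short sketch; the function name is illustrative and the inputs are the means and standard deviations given for Stock A and Stock B.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(2, 50)    # Stock A: S = $2, mean = $50
cv_b = coefficient_of_variation(5, 100)   # Stock B: S = $5, mean = $100

print(cv_a)  # 4.0
print(cv_b)  # 5.0
```

Stock B has the larger coefficient of variation, so its price varies more relative to its average price than Stock A's does.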
Shape of a Distribution
Describes How Data are Distributed
Measures of Shape
Symmetric or skewed
Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean
Exploratory Data Analysis
Box-and-Whisker
Graphical display of data using 5-number summary
[Figure: box-and-whisker plot on a 4-12 axis marking \(X_{\text{smallest}}\), \(Q_1\), Median (\(Q_2\)), \(Q_3\) and \(X_{\text{largest}}\)]
Distribution Shape & Box-and-Whisker
[Figure: three box-and-whisker plots, each marking Q1, Q2 and Q3, for left-skewed, symmetric and right-skewed distributions]
The Empirical Rule
For Most Data Sets (Those That Are Approximately Bell-Shaped), Roughly 68% of the
Observations Fall Within 1 Standard Deviation
Around the Mean
Roughly 95% of the Observations Fall Within 2
Standard Deviations Around the Mean
Roughly 99.7% of the Observations Fall Within 3
Standard Deviations Around the Mean

The Bienaymé-Chebyshev Rule
The Percentage of Observations Contained Within
Distances of k Standard Deviations Around the Mean
Must Be at Least
\[ \left( 1 - \frac{1}{k^2} \right) \cdot 100\%, \qquad k > 1 \]
Applies regardless of the shape of the data set
At least 75% of the observations must be contained within
distances of 2 standard deviations around the mean
At least 88.89% of the observations must be contained within
distances of 3 standard deviations around the mean
At least 93.75% of the observations must be contained within
distances of 4 standard deviations around the mean
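The three percentages quoted above follow directly from the bound, as this small sketch shows. The function name is illustrative.

```python
def chebyshev_min_percent(k):
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return (1 - 1 / k**2) * 100

for k in (2, 3, 4):
    print(k, round(chebyshev_min_percent(k), 2))
# 2 75.0
# 3 88.89
# 4 93.75
```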
Coefficient of Correlation
Measures the Strength of the Linear
Relationship between 2 Quantitative
Variables
\[ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \]
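A direct transcription of the formula for r into code may help when checking hand calculations. The function name and the small data sets are hypothetical, chosen to give a perfect positive, a perfect negative, and an intermediate linear relationship.

```python
import math

def correlation(xs, ys):
    """Coefficient of correlation r between paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(correlation(x, [2, 4, 6, 8, 10]))   # 1.0
print(correlation(x, [10, 8, 6, 4, 2]))   # -1.0
print(correlation(x, [2, 1, 4, 3, 5]))    # 0.8
```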


Features of Correlation
Coefficient
Unit Free
Ranges between -1 and 1
The Closer to -1, the Stronger the Negative Linear
Relationship
The Closer to 1, the Stronger the Positive Linear
Relationship
The Closer to 0, the Weaker Any Linear
Relationship
Scatter Plots of Data with
Various Correlation Coefficients
[Figure: five scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = .6 and r = 1]
Pitfalls in Numerical Descriptive
Measures and Ethical Issues
Data Analysis is Objective
Should report the summary measures that best meet the
assumptions about the data set
Data Interpretation is Subjective
Should be done in a fair, neutral and clear manner
Ethical Issues
Should document both good and bad results
Presentation should be fair, objective and neutral
Should not use inappropriate summary measures to distort
the facts
Summary of Sessions 3 and 4
Described Measures of Central Tendency
Mean, Median, Mode, Geometric Mean
Use of Quartiles
Described Measures of Variation
Range, Interquartile Range, Variance and Standard Deviation, Coefficient of Variation
Graphical Representations of the Distribution
Symmetry, Skewness, Use of Box-and-Whisker Plots
[Callout: moment function, expresses the amount of movement]
[Callout: expresses position]
Summary
(continued)
Description of the Empirical Rule and the Chebyshev Inequality
Discussion of the Coefficient of Correlation
Pitfalls in Descriptive Statistics and Ethical Issues
