
Brief stats outline

Distributions
• To find outliers: 1.5 × IQR rule (flag values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR)
• When describing: Shape, center, spread
◦ Left skewed: mean < median
◦ Right skewed: mean > median
◦ Standard deviation & mean for symmetrical distributions
◦ 5 number summary for skewed distributions
• Always plot your data
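The 1.5 × IQR rule can be sketched in Python as follows. This is a minimal illustration rather than part of the original outline: the function name `iqr_outliers` and the data are made up, and the quartiles come from the standard library's "inclusive" method, which is only one of several quartile conventions (a textbook or calculator may give slightly different values).

```python
import statistics

def iqr_outliers(data):
    """Return the values flagged as outliers by the 1.5 x IQR rule."""
    # quantiles(n=4) returns [Q1, median, Q3]; "inclusive" is an assumed
    # convention, other quartile methods can shift the fences slightly.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

scores = [2, 4, 5, 5, 6, 7, 8, 30]  # made-up data
print(iqr_outliers(scores))
```

Flagged values still deserve a look at the plot before they are dropped or explained.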
Trend lines
• y = a + b*x
• Transformations
◦ Adding a constant: adds that constant to the mean, doesn't change σ
◦ Exponential (y = a*b^x): plot log(y) vs. x
◦ Power (y = a*x^p): plot log(y) vs. log(x)
• ŷ gives the predicted value
• Define your variables
• Residuals
◦ Make a residual plot and make sure that you don't find a pattern
◦ residual = y_observed - y_predicted (that is, y - ŷ)
◦ r^2: the fraction of the variation in y that the line explains (how much better ŷ predicts than the mean of y)
▪ “(___)% of the variation in (response variable) is explained by the least-squares
regression line”
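As a rough sketch of the trend-line ideas above, the following Python computes a least-squares line, its residuals, and r^2 by hand, then shows how taking log(y) straightens exponential data. The helper names (`least_squares`, `r_squared`) and all data values are made up for illustration.

```python
import math
import statistics

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def r_squared(xs, ys, a, b):
    """Fraction of the variation in y explained by the line."""
    y_bar = statistics.fmean(ys)
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

xs = [1, 2, 3, 4, 5]                # explanatory variable (made-up)
ys = [2.1, 3.9, 6.2, 7.8, 10.1]     # response variable (made-up)
a, b = least_squares(xs, ys)

# residual = observed y minus predicted y; plot these against x
# and look for a pattern.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b, r_squared(xs, ys, a, b))

# Exponential data y = 3 * 2^x becomes linear after taking log(y).
exp_ys = [3 * 2 ** x for x in xs]
log_a, log_b = least_squares(xs, [math.log10(y) for y in exp_ys])
# 10**log_a and 10**log_b recover the original a and b of the exponential.
print(10 ** log_a, 10 ** log_b)
```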
Density curves
• Area of 1
Normal distributions
• Use z-scores
• 68%, 95%, 99.7% rule
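A short Python check of z-scores and the 68%, 95%, 99.7% rule, using `statistics.NormalDist` from the standard library. The helper names `z_score` and `area_within` and the example numbers are assumptions made for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def area_within(k):
    """Area under the standard normal curve within k sd of the mean."""
    standard = NormalDist()  # mean 0, standard deviation 1
    return standard.cdf(k) - standard.cdf(-k)

print(z_score(130, 100, 15))  # made-up value, mean, and sd
for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

The three printed areas are the exact values that the rule rounds to 68%, 95%, and 99.7%.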
2 variable relationships
• Direction, form, strength
◦ Direction: positive association, negative association
◦ Form: linear, curved?
◦ Strength: how closely the points follow the form (small residuals mean a strong relationship)
• 2 way table (for categorical variables)
◦ Marginal distribution (of the column variable): Add up each column to get the column
totals and compare them to each other, as percents of the grand total
◦ Conditional distribution (of 1 column): Compare each cell in that column, as a percent
of that column's total
◦ Simpson's paradox: An association or comparison that holds for all of several groups can
reverse direction when the data are combined to form a single group
• Establishing causation
◦ Don't assume causation
◦ Possible explanations for an observed association: causation, common response, confounding
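The two-way table calculations and Simpson's paradox can be illustrated with a small Python sketch. Every count below is hypothetical, chosen only to make the arithmetic easy to follow, and the function names are made up.

```python
# Hypothetical two-way table: rows are class year, columns are a yes/no answer.
table = {
    "freshman":  {"yes": 30, "no": 20},
    "sophomore": {"yes": 10, "no": 40},
}

def marginal_of_columns(table):
    """Column totals as fractions of the grand total."""
    grand_total = sum(sum(row.values()) for row in table.values())
    columns = next(iter(table.values())).keys()
    return {c: sum(row[c] for row in table.values()) / grand_total
            for c in columns}

def conditional_given_column(table, column):
    """Each cell of one column as a fraction of that column's total."""
    column_total = sum(row[column] for row in table.values())
    return {name: row[column] / column_total for name, row in table.items()}

print(marginal_of_columns(table))
print(conditional_given_column(table, "yes"))

# Simpson's paradox with hypothetical (successes, total) counts:
# treatment A does better inside each group, but worse once combined.
groups = {
    "group 1": {"A": (9, 10), "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

def combined_rate(groups, treatment):
    successes = sum(g[treatment][0] for g in groups.values())
    total = sum(g[treatment][1] for g in groups.values())
    return successes / total

for name, g in groups.items():
    print(name, g["A"][0] / g["A"][1], g["B"][0] / g["B"][1])
print("combined", combined_rate(groups, "A"), combined_rate(groups, "B"))
```

The reversal happens because each treatment was mostly used in a different group, so the group sizes act as a lurking variable.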
Samples
• Bad sampling techniques:
◦ voluntary response sampling
◦ Convenience sampling
• Good sampling techniques
◦ SRS (simple random sample)
▪ Gives each individual, and each possible sample of the same size, an equal chance of being chosen
▪ Steps
• Label each individual with a number
• Use Table B (the table of random digits)
• Stopping rule
• Identify sample: convert the numbers into the individuals
◦ Stratified Random Sampling
▪ SRS's of subgroups
◦ Systematic Random Sampling
▪ every kth individual
◦ Cluster Sampling
▪ Randomly choose groups. Use everyone in those groups
◦ Bias
▪ Undercoverage
▪ Nonresponse
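The four random sampling designs listed above can be sketched in Python. Note the substitution: this uses the standard library's pseudo-random generator in place of a printed random digit table, and the function names, the `seed` parameter, and the populations are all made up for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """SRS: every group of n individuals is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate SRS inside each subgroup (stratum)."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def systematic_sample(population, k, seed=None):
    """Pick a random starting point, then take every kth individual."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole groups and use everyone in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Step 1 of an SRS: label each individual with a number.
population = list(range(1, 101))
print(simple_random_sample(population, 10, seed=1))
print(systematic_sample(population, 10, seed=1))
```

Passing a seed makes the draw repeatable, which helps when the selection has to be documented.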
Experiments
• Observational study: do not influence responses
• Experiment: give individuals a treatment
• Control, replication, randomization
• Block Design: when you know about a confounding variable
◦ Matched-pairs design: Compare each subject to itself (the subject gets both treatments) or to a closely matched partner
• Drawing
◦ Always say how many subjects for each step
◦ Random Allocation, Group, Treatment, Compare
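Random allocation, the first step of the design diagram, might look like this in Python. This is a sketch under assumptions: the function name `randomly_allocate`, the treatment labels, and the 20 numbered subjects are hypothetical.

```python
import random

def randomly_allocate(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out evenly across treatments."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# 20 hypothetical subjects, labeled 1 to 20, split into two groups of 10.
groups = randomly_allocate(range(1, 21), ["treatment", "control"], seed=0)
for name, members in groups.items():
    print(name, len(members), sorted(members))
```

Printing the group sizes mirrors the advice to state how many subjects are in each step of the diagram.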
Probability
• Satisfy 2 requirements:
◦ 0≤P(A)≤1
◦ ∑P_i = 1 (the probabilities of all outcomes add up to 1)
• Disjoint: The events have no outcomes in common
• Independent: Knowing that one event occurred doesn't change the probability of the other
• Use Venn diagrams or tree diagrams
• Law of Large Numbers, but no “Law of small numbers”
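To make the difference between disjoint and independent concrete, here is a small Python sketch that lists every outcome of two fair dice and computes exact probabilities. The events chosen are made up for illustration, and the checks rely on two standard rules that the outline does not state: P(B or C) = P(B) + P(C) for disjoint events, and P(A and B) = P(A) * P(B) for independent events.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event, given as a true/false test on an outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def first_even(o):
    return o[0] % 2 == 0

def sum_is_7(o):
    return o[0] + o[1] == 7

def sum_is_2(o):
    return o[0] + o[1] == 2

# Disjoint: "sum is 7" and "sum is 2" share no outcomes.
both = prob(lambda o: sum_is_7(o) and sum_is_2(o))
either = prob(lambda o: sum_is_7(o) or sum_is_2(o))
print(both, either, prob(sum_is_7) + prob(sum_is_2))

# Independent: knowing the first die is even does not change P(sum is 7).
joint = prob(lambda o: first_even(o) and sum_is_7(o))
print(joint, prob(first_even) * prob(sum_is_7))
```

Disjoint events with nonzero probability are never independent, since knowing one occurred means the other did not.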
Random Variables
• Discrete: countable number of possible values
• Continuous: All values inside an interval
◦ Use density curves
• Transformations
◦ μ_{a+bX} = a + b*μ_X
◦ μ_{X±Y} = μ_X ± μ_Y
◦ σ^2_{a+bX} = b^2*σ^2_X
◦ σ^2_{X±Y} = σ^2_X + σ^2_Y (only when X and Y are independent)
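The rules for means and variances can be checked exactly on small discrete distributions. The sketch below uses made-up distributions for X and Y, assumes X and Y are independent when combining them, and uses helper names (`linear`, `combine`) invented for this example.

```python
from fractions import Fraction as F
from itertools import product

def mean(dist):
    return sum(value * p for value, p in dist.items())

def variance(dist):
    mu = mean(dist)
    return sum((value - mu) ** 2 * p for value, p in dist.items())

def linear(dist, a, b):
    """Distribution of a + b*X."""
    out = {}
    for value, p in dist.items():
        out[a + b * value] = out.get(a + b * value, 0) + p
    return out

def combine(dist_x, dist_y, sign=1):
    """Distribution of X + Y (sign=1) or X - Y (sign=-1), assuming independence."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        out[x + sign * y] = out.get(x + sign * y, 0) + px * py
    return out

X = {1: F(1, 4), 2: F(1, 2), 3: F(1, 4)}   # made-up distribution
Y = {0: F(1, 2), 4: F(1, 2)}               # made-up distribution

print(mean(linear(X, 3, 2)), 3 + 2 * mean(X))
print(variance(linear(X, 3, 2)), 2 ** 2 * variance(X))
print(variance(combine(X, Y, -1)), variance(X) + variance(Y))
```

Note that the variances add even for X - Y; only the means subtract.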
Binomial Distributions
• Conditions
◦ Each observation a “success” or “failure”
◦ There is a fixed number n of observations
◦ The n observations are independent
◦ The probability of success, p, is the same for each observation
• X = number of successes
• Abbreviation: B(n,p)
• Probability
◦ pdf: probability of one value, P(X = k)
◦ cdf: cumulative probability, P(X ≤ k); subtract two cdf values to get an interval of X's
◦ Normal Approximations
▪ Conditions: np≥10 and n(1-p)≥10
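A minimal Python sketch of the binomial pdf and cdf, plus the check for the Normal approximation. The function names are made up (they loosely mirror calculator commands), the example values of n and p are arbitrary, and the approximation uses the standard binomial mean np and standard deviation sqrt(np(1-p)), which the outline does not state.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pdf(n, p, k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(n, p, k):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pdf(n, p, i) for i in range(k + 1))

def normal_approx_ok(n, p):
    """Rule of thumb from the outline: np >= 10 and n(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

n, p = 100, 0.5  # arbitrary example values
exact = binom_cdf(n, p, 55)
# Normal curve with the same mean and standard deviation as B(n, p).
approx = NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(55)
print(normal_approx_ok(n, p), exact, approx)

# An interval, P(45 <= X <= 55), as a difference of two cdf values.
print(binom_cdf(n, p, 55) - binom_cdf(n, p, 44))
```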
Geometric Distributions
• Conditions
◦ Same as binomial, except the variable of interest is the number of trials required to obtain
the first success
▪ Number of trials is not fixed
• P(X = n) = (1-p)^{n-1} * p
• P(X > n) = (1-p)^n
• μ_X = 1/p
• σ^2_X = (1-p)/p^2
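The geometric formulas translate directly into Python. In this sketch the function names and the value p = 0.25 are made up; note that (1-p)/p^2 is the variance of a geometric variable, so the standard deviation is its square root.

```python
from math import sqrt

def geom_pdf(p, n):
    """P(X = n): n - 1 failures followed by the first success on trial n."""
    return (1 - p) ** (n - 1) * p

def geom_tail(p, n):
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n

def geom_mean(p):
    return 1 / p

def geom_variance(p):
    return (1 - p) / p ** 2

p = 0.25  # arbitrary probability of success
print(geom_pdf(p, 3), geom_tail(p, 3))
print(geom_mean(p), geom_variance(p), sqrt(geom_variance(p)))
```

As a sanity check, P(X = 1) + ... + P(X = n) + P(X > n) adds up to 1 for any n.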
