10.5 Swan (1986) gives the following data from a study of infant respiratory disease. Each cell
of the table shows the number out of so many observed children who developed bronchitis
or pneumonia in their first year of life, classified by sex and type of feeding (with the risk
in parentheses).
The major question of interest is whether the risk of illness is affected by the type of feeding.
Also, is the risk the same for both sexes and, if there are differences between the feeding
groups, are they the same for boys and girls?
(i) Fit all possible linear logistic regression models to the data. Use your results to answer
all the preceding questions through significance testing. Summarize your findings using
odds ratios with 95 % confidence intervals.
Three different models can be fitted to look for these differences: one that includes
only type of feeding, one that includes only sex, and one that includes both covariates.
The LOGISTIC Procedure
Model Information
Data Set ADE.SIRS
Response Variable (Events) illness
Response Variable (Trials) total
Model binary logit
Optimization Technique Fisher's scoring
Response Profile
Ordered Total
Value Binary Outcome Frequency
1 Event 238
2 Nonevent 1836
Model fitted on type of feeding, where feeding 1 = Bottle only, feeding 2 = Breast +
supplement and feeding 3 = Breast only.
In this fitted model the estimated odds ratio is Ψ̂_feeding,13 = 1.967, with level 3
(Breast only) as the reference level. That is, the odds of a respiratory illness for
Bottle only infants are about 1.97 times the odds for Breast only infants. The effect
is significant, since the 95 % confidence interval (1.458, 2.654) does not contain 1.
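The odds ratio and interval quoted above can be cross-checked by hand from the counts in the data listing, pooled over sex. The following is a minimal Python sketch, not the SAS procedure used in this solution; the function name `odds_ratio_wald` is illustrative, and it assumes the usual Wald interval on the log odds ratio scale.

```python
import math

# Counts (ill, total) by feeding type, pooled over sex (boys + girls),
# taken from the data lines of the SAS program.
bottle_ill, bottle_total = 77 + 48, 458 + 384
breast_ill, breast_total = 47 + 31, 494 + 464


def odds_ratio_wald(a, n1, c, n0, z=1.96):
    """Odds ratio of group 1 versus group 0 with a Wald interval on the log scale."""
    b, d = n1 - a, n0 - c  # children without illness in each group
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


or_hat, lower, upper = odds_ratio_wald(bottle_ill, bottle_total,
                                       breast_ill, breast_total)
print(f"OR = {or_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

For a model with a single categorical factor, this 2 x 2 calculation should agree with the logistic regression estimate, which is why it reproduces 1.967 and (1.458, 2.654).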
Model Information
Data Set ADE.SIRS
Response Variable (Events) illness
Response Variable (Trials) total
Model binary logit
Optimization Technique Fisher's scoring
Number of Observations Read 6
Number of Observations Used 6
Sum of Frequencies Read 2074
Sum of Frequencies Used 2074
Response Profile
Ordered Total
Value Binary Outcome Frequency
1 Event 238
This model, with no interaction between Sex and Feeding, gives the adjusted estimate
for each variable, with the following odds ratios and confidence intervals:
Comparison Odds ratio 95 % CI
Boys : Girls 1.37 (1.04, 1.80)
Bottle : Breast 1.95 (1.45, 2.51)
Mixed : Breast 1.64 (1.08, 2.05)
(ii) Fit the model with explanatory variables sex and type of feeding (but no interaction).
Calculate the residuals, deviance residuals and standardised deviance residuals and
comment on the results.
After fitting the model with PROC LOGISTIC, the residuals were obtained from PROC
GENMOD using the output option, which gives the following table:
Sex Feed Resid Dev Resid St Dev Resid
Boy Bottle 0.8742 0.1096 0.2462
Boy Mixed -2.1175 -0.5052 -0.8579
Boy Breast 1.2433 0.1922 0.3670
Girl Bottle -0.8742 -0.1342 -0.2473
Girl Mixed 2.1175 0.5896 0.8279
Girl Breast -1.2433 -0.2284 -0.3707
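The residual table can be reproduced outside SAS by fitting the same main-effects model directly. The sketch below is a minimal Python illustration under stated assumptions: sex = 1 is taken to be boys, girls and Breast only are used as reference levels, the model is fitted by Newton-Raphson, and the standardised deviance residual is taken as the deviance residual divided by sqrt(1 - leverage). The helper names (`solve`, `probabilities`, `information`) are hypothetical.

```python
import math

# Rows: (ill, total, boy, bottle, mixed), in the same order as the table above.
data = [
    (77, 458, 1, 1, 0), (19, 147, 1, 0, 1), (47, 494, 1, 0, 0),
    (48, 384, 0, 1, 0), (16, 127, 0, 0, 1), (31, 464, 0, 0, 0),
]
ys = [row[0] for row in data]
ns = [row[1] for row in data]
X = [[1.0, *row[2:]] for row in data]  # intercept + three dummy variables
K = 4


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def probabilities(beta):
    return [1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, x)))) for x in X]


def information(p):
    return [[sum(x[j] * x[l] * n * pi * (1 - pi) for x, n, pi in zip(X, ns, p))
             for l in range(K)] for j in range(K)]


beta = [0.0] * K
for _ in range(25):  # Newton-Raphson iterations
    p = probabilities(beta)
    score = [sum(x[j] * (y - n * pi) for x, y, n, pi in zip(X, ys, ns, p))
             for j in range(K)]
    step = solve(information(p), score)
    beta = [b + s for b, s in zip(beta, step)]
    if max(abs(s) for s in step) < 1e-10:
        break

p = probabilities(beta)
info = information(p)
mu = [n * pi for n, pi in zip(ns, p)]   # fitted numbers of ill children
raw = [y - m for y, m in zip(ys, mu)]   # raw residuals (count scale)
dev = [math.copysign(math.sqrt(max(0.0, 2 * (y * math.log(y / m)
       + (n - y) * math.log((n - y) / (n - m))))), y - m)
       for y, n, m in zip(ys, ns, mu)]  # deviance residuals
lev = [n * pi * (1 - pi) * sum(a * b for a, b in zip(x, solve(info, x)))
       for x, n, pi in zip(X, ns, p)]   # leverages (hat-matrix diagonal)
std_dev = [d / math.sqrt(1 - h) for d, h in zip(dev, lev)]

for r, d, s in zip(raw, dev, std_dev):
    print(f"{r:8.4f} {d:8.4f} {s:8.4f}")
```

Because the fitted model reproduces the sex and feeding margins exactly, the raw residuals for boys and girls within each feeding group are equal in size and opposite in sign, which is the pattern visible in the table.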
SAS program
data ADE.SIRS;
input illness total sex$ feeding$;
cards;
77 458 1 1
19 147 1 2
47 494 1 3
48 384 2 1
16 127 2 2
31 464 2 3
;
run;
10.10 Repeat the analysis of Exercise 6.1, the unmatched case–control study of oral contraceptive
use and breast cancer, using logistic regression modelling. Compare results.
6.1 In a case–control study of the use of oral contraceptives (OCs) and breast cancer in
New Zealand, Paul et al. (1986) identified cases over a 2-year period from the National
Cancer Registry and controls by random selection from electoral rolls. The following
data were compiled.
Used OCs? Cases Controls
Yes 310 708
No 123 189
Total 433 897
(i) Estimate the odds ratio for breast cancer, OC users versus nonusers. Specify a
95 % confidence interval for the true odds ratio.
After fitting the logistic regression model [model cases/total = UsedOC] in SAS, the
estimated UsedOC effect is −0.3963, which gives an odds ratio Ψ̂ = e^(−0.3963) = 0.673
with 95 % confidence interval (0.517, 0.876). The negative estimate (odds ratio below 1)
and an interval that excludes 1 indicate a significant effect: the odds of breast cancer
among OC users are 0.673 times those of nonusers, a reduction of about 33 %. In these
data, OC use therefore appears as a protective factor against breast cancer.
Equivalently, women who did not use OCs have odds of breast cancer Ψ̂ = 1/0.673 = 1.49
times those of women who did.
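As a cross-check on the logistic regression output, the same odds ratio can be computed directly from the 2 x 2 table of Exercise 6.1. This is a minimal Python sketch assuming a Wald interval on the log scale; the variable names are illustrative.

```python
import math

# 2 x 2 table from Exercise 6.1 (Paul et al., 1986).
cases_yes, controls_yes = 310, 708   # used OCs
cases_no, controls_no = 123, 189     # never used OCs

# Odds ratio for breast cancer, OC users versus nonusers.
or_users = (cases_yes * controls_no) / (controls_yes * cases_no)
log_or = math.log(or_users)
se_log = math.sqrt(1 / cases_yes + 1 / controls_yes
                   + 1 / cases_no + 1 / controls_no)
lower = math.exp(log_or - 1.96 * se_log)
upper = math.exp(log_or + 1.96 * se_log)

# Inverting the ratio gives nonusers versus users.
or_nonusers = 1 / or_users
# Relative reduction in the odds (not in the probability) for users.
odds_reduction_pct = (1 - or_users) * 100

print(f"log OR = {log_or:.4f}, OR = {or_users:.3f}, "
      f"95% CI = ({lower:.3f}, {upper:.3f})")
print(f"nonusers vs users: {or_nonusers:.2f}; "
      f"odds reduction for users: {odds_reduction_pct:.1f}%")
```

The percentage is a reduction in the odds of being a case, which is the quantity a case-control design can estimate.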
The LOGISTIC Procedure
Model Information
Data Set ADE.OC
Response Variable (Events) cases
Response Variable (Trials) total
Model binary logit
Optimization Technique Fisher's scoring
Response Profile
Ordered Total
Value Binary Outcome Frequency
1 Event 433
2 Nonevent 897
10.13 Refer to the venous thromboembolism matched case–control study of Exercise 6.12.
In a matched case–control study of venous thromboembolism (VTE) and use of hormone replacement
therapy (HRT), Daly et al. (1996) screened women aged 45–64 years admitted to hospitals in the Oxford
Regional Health Authority (UK) with a suspected diagnosis of VTE. From these, 103 cases of idiopathic
VTE were recruited. Each case was individually matched with up to two hospital controls with diagnoses
judged to be unrelated to HRT use, such as diseases of the eyes, ears or skin. Matching criteria were 5-year
age group, district of admission and date of admission (between 2 weeks before and 4 months after the
admission date of the corresponding case). Altogether there were 178 controls. The data are available from
the web site for this book (Appendix A).
Confirm the following summary table:
(i) Use logistic regression to repeat the analysis of Exercise 6.12. Compare results. Using
this summary table,
a. Test for no association between hormone replacement therapy (HRT) use and venous
thromboembolism.
b. Estimate the odds ratio, and find the associated 95 % confidence interval, for HRT users versus
nonusers.
Repeating the analysis of Exercise 6.12 with logistic regression, including only the
variable HRT as in that exercise, gives the following results:
The test of association between HRT and VTE (significance of the HRT factor) gave a
Wald chi-square statistic of 11.9516 with p-value = 0.0005. The null hypothesis is
therefore rejected: at the 5 % significance level there is a statistical association
between HRT use and the occurrence of VTE.
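The Wald statistic can be recovered approximately from the HRT estimate (1.0957) and standard error (0.3170) printed in the SAS output for this model. The sketch below is a minimal Python check; because it starts from rounded values it will not match the printed 11.9516 to every decimal, and the confidence interval is the usual Wald interval, which is assumed rather than taken from the output.

```python
import math

# HRT coefficient and standard error as printed (rounded) in the SAS output.
beta_hat, se = 1.0957, 0.3170

z = beta_hat / se
wald_chi2 = z ** 2
# With 1 degree of freedom, P(chi-square > z^2) = P(|Z| > |z|) = erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))

# Odds ratio for HRT users versus nonusers, with a Wald 95% interval.
or_hat = math.exp(beta_hat)
lower = math.exp(beta_hat - 1.96 * se)
upper = math.exp(beta_hat + 1.96 * se)

print(f"Wald chi-square = {wald_chi2:.3f}, p = {p_value:.4f}")
print(f"OR = {or_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```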
The LOGISTIC Procedure
Conditional Analysis
Model Information
Data Set ADE.VTE
Response Variable CC
Number of Response Levels 2
Number of Strata 103
Model binary logit
Optimization Technique Newton-Raphson ridge
Response Profile
Ordered Total
Value CC Frequency
1 0 178
2 1 103
Strata Summary
CC
Response Number of
Pattern 0 1 Strata Frequency
1 1 1 28 56
2 2 1 75 225
Model Fit Statistics
Criterion Without Covariates With Covariates
SC 203.608 196.105
-2 Log L 203.608 190.467
Analysis of Conditional Maximum Likelihood Estimates
Parameter DF Estimate Standard Error Wald Chi-Square Pr > ChiSq
HRT 1 1.0957 0.3170 11.9516 0.0005
This is the SAS output from which the preceding results were taken. They are very similar
to those obtained in Exercise 6.12 (Ψ = 3.00, 95 % CI (1.61, 5.59)), where an association
between HRT and VTE was also found. Both methods therefore lead to the same
conclusion, with logistic regression being the more accurate and more detailed of the two.
(ii) The associated dataset also includes data on body mass index (BMI), a potential
confounding factor in the relationship between HRT and venous thromboembolism.
Model Information
Data Set ADE.VTE
Response Variable CC
Number of Response Levels 2
Number of Strata 103
Number of Uninformative Strata 3
Frequency Uninformative 4
Model binary logit
Optimization Technique Newton-Raphson ridge
Number of Observations Read 281
Number of Observations Used 278
Number of Observations Informative 274
Response Profile
Ordered Value CC Total Frequency
Note: 3 observations were deleted due to missing values for the response, explanatory, or strata variables.
Strata Summary
CC
Response Number of
Pattern 0 1 Strata Frequency
1 0 1 2 2
2 1 1 26 52
3 2 0 1 2
4 2 1 74 222
With BMI included as a covariate alongside HRT, the odds ratio for HRT is Ψ = 3.039,
95 % CI (1.607, 5.748), so BMI does not act as a confounder in the relationship between
HRT and VTE. The change from the results reported above is minimal, almost imperceptible.
In addition, the odds ratio for BMI itself is Ψ = 1.056, which leads to the conclusion
that BMI does not alter the HRT-associated risk of VTE.
SAS program
10.6 Saetta et al. (1991) carried out a prospective, single-blind experiment to determine whether
gastric content is forced into the small bowel when gastric-emptying procedures are employed
with people who have poisoned themselves. Each of 60 subjects was asked to swallow
20 barium-impregnated polythene pellets. Of the 60, 20 received a gastric lavage, 20 received
induced emesis and 20 (controls) received no gastric decontamination. The number of
residual pellets, counted by x-ray, in the intestine after ingestion for each subject was, for
the induced emesis group:
0, 15, 2, 0, 0, 15, 1, 16, 0, 1, 1, 0, 6, 0, 0, 1, 0, 16, 7, 11
for the gastric lavage group:
9, 3, 4, 15, 3, 5, 0, 0, 2, 11, 0, 0, 0, 0, 7, 5, 9, 0, 0, 0
and for the control group:
0, 9, 0, 0, 4, 5, 0, 0, 13, 0, 0, 12, 0, 0, 1, 0, 4, 4, 6, 7
Considering these data:
However, an ANOVA was also explored to see what is happening "inside" the structure
of these "lazy" data.
The mean rank for each group is 28.9 for control, 32.125 for emesis and 30.475 for lavage,
with a standard deviation of 60.75 for each rank sum under the null hypothesis.
The NPAR1WAY Procedure
Kruskal-Wallis Test
Chi-Square DF Pr > ChiSq
0.3758 2 0.8287
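The Kruskal-Wallis statistic above can be reproduced from the pellet counts in the exercise statement. The following minimal Python sketch uses midranks for tied values and the standard tie correction; the variable names are illustrative, and the p-value uses the closed form of the chi-square tail for 2 degrees of freedom.

```python
import math
from collections import Counter

groups = {
    "emesis": [0, 15, 2, 0, 0, 15, 1, 16, 0, 1, 1, 0, 6, 0, 0, 1, 0, 16, 7, 11],
    "lavage": [9, 3, 4, 15, 3, 5, 0, 0, 2, 11, 0, 0, 0, 0, 7, 5, 9, 0, 0, 0],
    "control": [0, 9, 0, 0, 4, 5, 0, 0, 13, 0, 0, 12, 0, 0, 1, 0, 4, 4, 6, 7],
}

pooled = [x for g in groups.values() for x in g]
n_total = len(pooled)

# Midranks: tied values share the average of the ranks they occupy.
counts = Counter(pooled)
midrank, start = {}, 1
for value in sorted(counts):
    t = counts[value]
    midrank[value] = start + (t - 1) / 2
    start += t

mean_rank = {name: sum(midrank[x] for x in g) / len(g)
             for name, g in groups.items()}

# Kruskal-Wallis H before the tie correction.
h_raw = 12 / (n_total * (n_total + 1)) * sum(
    len(g) * (mean_rank[name] - (n_total + 1) / 2) ** 2
    for name, g in groups.items())

# Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over groups of tied values.
tie_correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n_total ** 3 - n_total)
h = h_raw / tie_correction

# Chi-square upper tail with 2 degrees of freedom is exp(-x / 2).
p_value = math.exp(-h / 2)

print(mean_rank)
print(f"H = {h:.4f}, p = {p_value:.4f}")
```

With 27 of the 60 observations equal to zero, the tie correction is not negligible here: it moves the statistic from about 0.34 to 0.3758.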
The SAS program and its output, which support the results discussed above for the
ANOVA implementation, are presented below:
SAS program
%web_drop_table(ADE.GE);
%web_open_table(ADE.GE);
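/* One-way ANOVA of residual pellets (respel) by treatment group (gastric) */
proc glm data=ADE.GE order=data plots=diagnostics;
class gastric;
model respel=gastric;
lsmeans gastric / pdiff cl; /* pairwise differences with 95% confidence limits */
means gastric / hovtest; /* group means and homogeneity-of-variance test */
run;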
/* Kruskal-Wallis test, with a Monte Carlo estimate of the exact p-value */
proc npar1way data=ADE.GE wilcoxon;
class gastric;
exact wilcoxon / mc;
var respel;
run;
The UNIVARIATE Procedure
Variable: respel
Moments
N 60 Sum Weights 60
Mean 3.83333333 Sum Observations 230
Std Deviation 5.02929272 Variance 25.2937853
Skewness 1.2014062 Kurtosis 0.24103148
Uncorrected SS 2374 Corrected SS 1492.33333
Coeff Variation 131.198941 Std Error Mean 0.6492789
Quantiles (Definition 5)
Level Quantile
100% Max 16.0
99% 16.0
95% 15.0
90% 12.5
75% Q3 6.5
50% Median 1.0
25% Q1 0.0
10% 0.0
5% 0.0
1% 0.0
0% Min 0.0
Extreme Observations
Lowest Highest
Value Obs Value Obs
0 49 15 37
0 48 15 38
0 47 15 60
0 46 16 39
0 45 16 40
The GLM Procedure
Least Squares Means
Least Squares Means for Effect gastric
i j Difference Between Means 95% Confidence Limits for LSMean(i)-LSMean(j)
1 2 -1.350000 -4.569169 1.869169
1 3 -0.400000 -3.619169 2.819169
2 3 0.950000 -2.269169 4.169169
Note: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.
respel
Level of
gastric N Mean Std Dev
control 20 3.25000000 4.24108973
emesis 20 4.60000000 6.29452561
lavage 20 3.65000000 4.46359544
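The group means and standard deviations in this table, and the differences between means reported under Least Squares Means, can be checked from the raw pellet counts. The sketch below is a minimal Python version of the one-way ANOVA arithmetic; the F statistic and the standard error of a difference are computed here from the data and are not copied from the SAS output, and the variable names are illustrative.

```python
import math
import statistics

groups = {
    "control": [0, 9, 0, 0, 4, 5, 0, 0, 13, 0, 0, 12, 0, 0, 1, 0, 4, 4, 6, 7],
    "emesis": [0, 15, 2, 0, 0, 15, 1, 16, 0, 1, 1, 0, 6, 0, 0, 1, 0, 16, 7, 11],
    "lavage": [9, 3, 4, 15, 3, 5, 0, 0, 2, 11, 0, 0, 0, 0, 7, 5, 9, 0, 0, 0],
}

means = {name: statistics.mean(g) for name, g in groups.items()}
sds = {name: statistics.stdev(g) for name, g in groups.items()}

pooled = [x for g in groups.values() for x in g]
grand_mean = statistics.mean(pooled)

# Sums of squares for the one-way layout.
ss_between = sum(len(g) * (means[name] - grand_mean) ** 2
                 for name, g in groups.items())
ss_within = sum((x - means[name]) ** 2
                for name, g in groups.items() for x in g)
df_between = len(groups) - 1
df_within = len(pooled) - len(groups)

mse = ss_within / df_within
f_stat = (ss_between / df_between) / mse

# Standard error of a difference between two group means (20 subjects each).
se_diff = math.sqrt(mse * (1 / 20 + 1 / 20))
diff_control_emesis = means["control"] - means["emesis"]

for name in groups:
    print(f"{name:8s} mean = {means[name]:.2f}  sd = {sds[name]:.4f}")
print(f"F = {f_stat:.4f} on ({df_between}, {df_within}) df")
print(f"control - emesis = {diff_control_emesis:.2f}, SE = {se_diff:.4f}")
```

Dividing the half-width of the reported confidence limits (about 3.219) by this standard error gives roughly 2.00, which is consistent with a t critical value on 57 degrees of freedom.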