Analysis of Depression Data

STA 138

December 4, 2014

Final Project

Introduction: The data that I will be analyzing for this report deals with whether a patient is diagnosed or not diagnosed with depression in a visit during one year of care. There are many ways in which a patient can be diagnosed with depression, so there are many more variables not taken into account in this model that may affect the results greatly, but for the sake of this project I will try to predict whether a patient will be diagnosed with depression using stepwise logistic regression. The variables for this data are as follows:

DAV: Diagnosis of depression in any visit during one year of care
PCS: Physical component of SF-36 measuring health
MCS: Mental component of SF-36 measuring health
BECK: The Beck depression score of the patient
PGEND: Gender of the patient
AGE: Age of the patient
EDUCAT: Years of formal schooling

The response variable is DAV. The explanatory variables are PCS, MCS, BECK, PGEND (which indicates the gender), AGE (which indicates the age), and EDUCAT (which gives the number of years of formal schooling).

Materials and Methods: For this project, I will be testing to see if we can predict whether a patient will be diagnosed with depression based on the variables above, and pick the best model using stepwise logistic regression. 400 patients were randomly selected from primary care facilities, and the above 7 variables were recorded for each patient.

SAS Code and Results:

To read in the data and format:

The very first thing I did was to check that the model with main effects did not include any multicollinearity. I did this with the following code:

From the output, it is clear that there is no multicollinearity in the model, since the variance inflation factor for each of the variables is much lower than 10.
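The SAS code itself is not reproduced in this copy. As an illustrative sketch of the same screen, the variance inflation factor for each predictor can be computed as 1/(1 - R²) from regressing that predictor on the others; the matrix below is simulated stand-in data, not the report's 400 patients.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X: 1 / (1 - R^2_j),
    where R^2_j comes from regressing column j on the remaining columns."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))   # stand-ins for PCS, MCS, BECK, PGEND, AGE, EDUCAT
print(np.all(vif(X) < 10))      # the "much lower than 10" rule of thumb used above
```

With independent simulated predictors every VIF sits near 1, which is what "no multicollinearity" looks like under this rule of thumb.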

Next, I want to create the logistic model. I will first show the model with only the main effects (no interactions) that was chosen by the stepwise logistic regression in SAS, though this is not the model that I will use. The code that was used to obtain the best model through forward stepwise regression was:

So the best model picked by SAS, with no interactions, is:

log(π̂/(1−π̂)) = −2.3093 − 0.0470·MCS + 0.0721·BECK − 0.6633·PGEND + 0.1785·EDUCAT
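The forward stepwise procedure above can be sketched in Python: at each step, add the candidate variable that most improves AIC, and stop when no addition helps. Everything here is a stand-in, including a small Newton-Raphson logistic fitter, simulated data, and assumed "true" effects on MCS and BECK; it mirrors the procedure, not the report's actual SAS run.

```python
import numpy as np

def fit_logistic(X, y, iters=50):
    """Newton-Raphson fit of a logistic regression; returns (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return beta, np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def aic(X, y):
    _, ll = fit_logistic(X, y)
    return 2 * X.shape[1] - 2 * ll

rng = np.random.default_rng(1)
n = 400
cols = ["PCS", "MCS", "BECK", "PGEND", "AGE", "EDUCAT"]
data = {c: rng.normal(size=n) for c in cols}
eta = -0.5 - 0.6 * data["MCS"] + 0.8 * data["BECK"]   # assumed true effects
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

def design(selected):
    return np.column_stack([np.ones(n)] + [data[c] for c in selected])

# Forward stepwise selection by AIC
selected, remaining = [], list(cols)
while remaining:
    best = min(remaining, key=lambda c: aic(design(selected + [c]), y))
    if aic(design(selected + [best]), y) < aic(design(selected), y):
        selected.append(best)
        remaining.remove(best)
    else:
        break
print(sorted(selected))
```

On this simulated data the two variables with real effects are picked up, possibly along with a noise variable or two, which is exactly the kind of pruning SAS's stepwise SELECTION performs.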

I chose to use a model similar to the one above, with two extra interaction terms. I chose these interaction terms because I believe, firstly, that the interaction between PGEND and EDUCAT can help with the prediction of depression, since education's effect may differ depending on the gender of the patient. Secondly, the interaction between PGEND and BECK, I believe, may help with the prediction of depression because the Beck depression score's effect may differ depending on gender as well. So the SAS code for the model described above is:

and the partial output from this to obtain the model is:

So the model is:

log(π̂/(1−π̂)) = −2.7921 − 0.0487·MCS + 0.2151·EDUCAT + 0.0813·BECK + 1.3659·PGEND

An explanation of the variables used for my final model is as follows: The intercept β0 is −2.7921, which applies when all of the other parameters are equal to zero. The slope estimate for MCS is −0.0487, which means that when MCS increases by one unit, the odds of the patient being diagnosed with depression are multiplied by e^(−0.0487) = 0.9524. For EDUCAT, when a patient completes one extra year of formal schooling, the odds of the patient being diagnosed with depression are multiplied by e^(0.2151) = 1.24. For BECK, when a patient's Beck depression score increases by one unit, the odds of that patient being diagnosed with depression are multiplied by e^(0.0813) = 1.085. For PGEND, we can say that the odds of a male patient being diagnosed with depression are e^(1.3659) = 3.92 times the odds of a female patient being diagnosed with depression.
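The odds-ratio arithmetic above is just exponentiating each slope estimate, OR = e^β. Checking the report's numbers directly:

```python
import math

# Slope estimates quoted in the report's final model
coefs = {"MCS": -0.0487, "EDUCAT": 0.2151, "BECK": 0.0813, "PGEND": 1.3659}
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}
for name, orr in odds_ratios.items():
    print(f"{name}: odds ratio = {orr:.4f}")
```

A negative slope (MCS) gives an odds ratio below 1, i.e. higher mental-health scores are associated with lower odds of a depression diagnosis; the other three multiply the odds upward.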

SAS reported 95% Wald confidence intervals for the variables described above. For MCS, we can be 95% confident that when MCS increases by one unit, the odds of that person being diagnosed with depression will decrease by between 2.6% and 6.3%. For EDUCAT, we can be 95% confident that with one year of additional formal education, the odds of that person being diagnosed with depression will increase by between 5.9% and 34.9%. For BECK, we can be 95% confident that when the Beck depression score increases by one unit, the odds of that person being diagnosed with depression will increase by between 1.0% and 14.3%. For PGEND, since the 95% confidence interval contains 1, it is not statistically significant.

-Residual Analysis:

It is clear from the chart to the right that there are many outliers, which have a Pearson and deviance residual over 2.0 in absolute value, so they are influencing the coefficients and the goodness of fit. After looking at the data output (which is not displayed because it is too big), I can see that the following observations have a Pearson and deviance residual over 2.0 in absolute value: observations 22, 115, 173, 194, 255, 260, 286, 316, 323, 325, 333, 353, and 368. These observation numbers correspond to the output from SAS, with observation 1 being the header. So if I were to adjust the observations to match exactly the observations from the data, they would be the observation numbers listed above minus 1: 21, 114, 172, 193, 254, 259, 287, 315, 322, 324, 332, 352, and 367. Out of these adjusted observations, the 5 observations with the highest Pearson and deviance residuals are (listed with Pearson residual, deviance residual): 193 (5.47, 2.62), 259 (3.54, 2.28), 315 (4.20, 2.42), 324 (4.31, 2.44), and 352 (5.75, 2.65).
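The residual screen above can be sketched directly from the definitions: the Pearson residual standardizes y − p̂ by its binomial standard deviation, and the deviance residual is the signed square root of each observation's deviance contribution. The fitted probabilities below are simulated stand-ins, not the report's model output.

```python
import numpy as np

def logistic_residuals(y, p):
    """Pearson and deviance residuals for a fitted logistic model."""
    pearson = (y - p) / np.sqrt(p * (1 - p))
    # deviance residual: sign(y - p) * sqrt(-2 * per-observation log-likelihood)
    ll_i = y * np.log(p) + (1 - y) * np.log(1 - p)
    deviance = np.sign(y - p) * np.sqrt(-2.0 * ll_i)
    return pearson, deviance

rng = np.random.default_rng(2)
p = rng.uniform(0.05, 0.95, size=400)   # stand-in fitted probabilities
y = rng.binomial(1, p)
pearson, deviance = logistic_residuals(y, p)

# Flag observations exceeding 2.0 in absolute value on both criteria
flagged = np.where((np.abs(pearson) > 2) & (np.abs(deviance) > 2))[0]
print(len(flagged), "observations flagged")
```

As in the report, flagged cases are typically events the model gave a very small fitted probability (or non-events given a very large one).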

-Influential Observations

Looking at the hat matrix diagonal column (what we were told to do in class) from the SAS output, it is clear, after careful inspection, that there is really only one influential observation, which is not even listed among the residual outliers: observation 378, with a hat matrix diagonal equal to 0.0892, much higher than any of the others (the next highest is 0.02). With a hat matrix diagonal this high, this observation is affecting the parameter estimates.
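For a logistic model the hat matrix diagonals come from the weighted least-squares form of the fit: h_i = w_i · x_i' (X'WX)⁻¹ x_i with w_i = p̂_i(1 − p̂_i). A sketch on simulated data with assumed coefficients (not the report's fit):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([-0.5, 0.8, -0.6])            # assumed fitted coefficients
p = 1.0 / (1.0 + np.exp(-X @ beta))
w = p * (1 - p)                               # IRLS weights

# h_i = w_i * x_i' (X'WX)^{-1} x_i, the GLM hat-matrix diagonal
XtWX_inv = np.linalg.inv(X.T @ (X * w[:, None]))
h = w * np.einsum("ij,jk,ik->i", X, XtWX_inv, X)

print(round(h.sum()))   # the diagonals sum to the number of parameters
print(h.max())          # an unusually large h_i marks an influential point
```

Because the diagonals sum to the number of parameters, a value like the report's 0.0892 against a next-highest of 0.02 really does stand out: that single case carries several times its share of the fit.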

-Goodness of Fit

The percent concordant is 76.5 and the percent discordant is 23.1. This is relatively good, with Somers' D, Gamma, and c being relatively high (0.535, 0.537, and 0.767 respectively). This means that (using Somers' D) there is a 53.5% excess of concordant pairs (or agreement) with the model that we have selected. This isn't an excellent number, but it still implies that there is some association. So we can conclude with Somers' D that the difference between the percent concordant and the percent discordant is 53.5%, which means our model is doing an okay job at predicting. We could do a similar analysis for Gamma and say that, since it is positive and relatively large (0.537), there is some association.

With the lowest AIC (305.201), I chose to work with the best model chosen by SAS with stepwise regression, over the model with only the intercept (AIC: 353.736) and over the best model with the interaction terms (AIC: 308.519). With the Hosmer and Lemeshow goodness-of-fit test in SAS, we can see that the χ² statistic is χ² = 7.4172 with 8 degrees of freedom and p-value = 0.4924. Since we have such a large p-value in this case, we fail to reject H0: the model fits the data well, and conclude that the model IS a good fit for the data.
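The Hosmer and Lemeshow test used above groups observations by fitted probability (conventionally g = 10 groups, giving g − 2 degrees of freedom, matching the 8 df quoted) and compares observed versus expected event counts in each group. A minimal sketch on simulated, well-calibrated data:

```python
import numpy as np
from scipy import stats

def hosmer_lemeshow(y, p, g=10):
    """Hosmer-Lemeshow chi-square over g groups of sorted fitted probabilities."""
    order = np.argsort(p)
    chi2 = 0.0
    for idx in np.array_split(order, g):
        n_k = len(idx)
        obs, exp = y[idx].sum(), p[idx].sum()
        # both cells of each group's 2x1 table: events and non-events
        chi2 += (obs - exp) ** 2 / exp + ((n_k - obs) - (n_k - exp)) ** 2 / (n_k - exp)
    return chi2, stats.chi2.sf(chi2, g - 2)

rng = np.random.default_rng(4)
p = rng.uniform(0.05, 0.95, size=400)   # stand-in fitted probabilities
y = rng.binomial(1, p)                  # outcomes calibrated to p by construction
chi2, pval = hosmer_lemeshow(y, p)
print(f"chi2 = {chi2:.4f}, p = {pval:.4f}")
```

Because the simulated outcomes really are generated from the fitted probabilities, the p-value is usually large here, the same "fail to reject H0: the model fits" reading the report applies to its p = 0.4924.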

Next, let's look at the maximum likelihood estimates table from above for the best-fit model, so we can test the βs to see if they are statistically significant. To start, let's take a look at β0. To test, we have H0: β0 = 0 against Ha: β0 ≠ 0. Our Wald chi-square statistic is 3.897, with p-value = 0.0484 (p-value < .05), so we can conclude that β0 is statistically significant.

Let's take a look at β1. To test, we have H0: β1 = 0 against Ha: β1 ≠ 0. The Wald chi-square statistic is 9.773, with p-value = 0.0018 (p-value < .05), so we can conclude that β1 is statistically significant.

Let's take a look at β2. To test, we have H0: β2 = 0 against Ha: β2 ≠ 0. The Wald chi-square statistic is 5.2214, with p-value = 0.0223 (p-value < .05), so we can conclude that β2 is statistically significant.

Let's take a look at β3. To test, we have H0: β3 = 0 against Ha: β3 ≠ 0. The Wald chi-square statistic is 3.8280, with p-value = 0.0504 (p-value > .05), so we fail to reject H0: β3 = 0, and β3 is NOT statistically significant. Just because it fails the test, though, does not mean it should not be included in the model. It is right on the edge of being statistically significant and it can play a part in predicting one's depression diagnosis, so I believe it should stay in the model.

Lastly, let's take a look at β4. To test, we have H0: β4 = 0 against Ha: β4 ≠ 0. The Wald chi-square statistic is 8.36009, with p-value = 0.0038 (p-value < .05), so we can conclude that β4 is statistically significant.
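Each Wald test above uses the fact that (β̂/SE)² follows a chi-square distribution with 1 degree of freedom under H0, so the quoted p-values can be recovered directly from the statistics:

```python
from scipy import stats

# Wald chi-square statistics quoted in the report, beta_0 through beta_4
wald_stats = {"b0": 3.897, "b1": 9.773, "b2": 5.2214, "b3": 3.8280, "b4": 8.36009}
pvals = {k: stats.chi2.sf(w, df=1) for k, w in wald_stats.items()}
for k, pv in pvals.items():
    print(f"{k}: p = {pv:.4f}")
```

Running this reproduces the report's p-values (0.0484, 0.0018, 0.0223, 0.0504, 0.0038), including the borderline 0.0504 for β3.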

-Goodness-of-link function

This was done with the code:

Output:

The estimate for the new variable (linkf) is 1.4111, with a Wald chi-square statistic of 0.4469 and p-value = 0.5038. We can conclude, since p-value > .05, that this variable is statistically insignificant, so the link function is appropriate.

Conclusion and Discussion: A logistic regression model was fit to the data using stepwise logistic regression. I found that we can be 95% confident that when MCS increases by one unit, the odds of that person being diagnosed with depression will decrease by between 2.6% and 6.3%. For EDUCAT, we can be 95% confident that with one year of additional formal education, the odds of that person being diagnosed with depression will increase by between 5.9% and 34.9%. For BECK, we can be 95% confident that when the Beck depression score increases by one unit, the odds of that person being diagnosed with depression will increase by between 1.0% and 14.3%. And for PGEND, since the 95% confidence interval contains 1, it is not statistically significant.

I also did a residual analysis on the data and searched for influential observations. I found there was an influential observation that was not included among the residual outliers: observation 378. Even though there were several residual outliers and an influential observation, the model was still found to be a good fit for the data, as determined by the Hosmer and Lemeshow goodness-of-fit test.

As a result of the statements above, I was able to conclude that the variables BECK, EDUCAT, MCS, and PGEND are all associated with the depression diagnosis of a patient. Based on the result, we cannot reject the null hypothesis that the model fits the data; yet I would be more comfortable with a model that provides more support of fit. Therefore, I recommend researching additional covariates in order to make more reliable predictions, such as how many people the patient talks to on a daily basis, whether the patient has a hobby, and so on.
