Part II
UserID: ciser
Password: download
6. Do Files: Creating Do Files
• Do-files allow commands to be saved and executed in “batch” form.
• We will use the Stata do-file editor to write do-files.
• Can also use WordPad or Notepad:
  – Save as “Text Document” with extension “.do” (instead of “.txt”). Allows larger files than do-file editor.
• To open do-file editor click Window → Do-File Editor or click the Do-File Editor icon on the toolbar.

7. Do Files: Executing Do Files
• To run do-file in Stata, type:
    do dofilename
• Can “comment out” lines by preceding with * or by enclosing text within /* and */.
• Can save the contents of the Review window as a do-file by right-clicking on window and selecting “Save Review Contents”.
• Note: a blank line must be included at the end of a WordPad do-file (otherwise last line will not run).
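
As a rough illustration of the two slides above, a do-file might look like the sketch below. The file name example.do is hypothetical, and the auto dataset (shipped with Stata and loaded with sysuse) is assumed only so that the file runs without the workshop data.

```stata
* example.do: a minimal do-file sketch (hypothetical file name)
sysuse auto, clear        // assumption: use the auto dataset shipped with Stata

* summarize price         <- this line is commented out and will not run

/* Everything between these markers
   is ignored as well */
summarize mpg weight

```

Saved as example.do in the working directory, it would be run by typing do example in the Command window.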
EXERCISE 1 (cont.)
12. Generating Simple Statistics
• Use tabulate to calculate the number of years of high unemployment, highun, by decade, i.e. 1940s, 1950s (Slide 9)
• Use table command to produce a table that shows the mean of realgdp and the sum of persbankrupt by decade, rows, and highun, columns (Slide 10)
• Save the do-file using any name you wish (Slide 6)

13. Data Management: Sorting Data
• sort puts the observations in dataset in a specific order:
    sort varlist
• Some procedures require file to be sorted before they can be executed, e.g. merge.
• You can sort a file based on more than one variable.
• Example:
    sort mpg weight
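
To see the effect of sorting on more than one variable, the example can be tried on the auto dataset shipped with Stata. This is an assumption about which dataset the mpg and weight example refers to, and mean_weight is a hypothetical variable name. The last command shows one procedure that relies on the data being sorted.

```stata
sysuse auto, clear

* Sort by mpg first; ties in mpg are ordered by weight
sort mpg weight
list make mpg weight in 1/5

* by-group commands need the data sorted by the grouping variable
by mpg: egen mean_weight = mean(weight)
```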
18. Combining Datasets: Appending (Example)

data1
id   Name   Age  Source
001  Alice  21   wave1
002  Mary   20   wave1
003  David  23   wave1

data2
id   Name   Source
004  Jim    wave2
005  Linda  wave2

Example:
    use data1,clear
    append using data2
    save data3

data3
id   Name   Age  Source
001  Alice  21   wave1
002  Mary   20   wave1
003  David  23   wave1
004  Jim    .    wave2
005  Linda  .    wave2

19. Combining Datasets: Merging
• To join corresponding observations from a Stata dataset with those in the dataset in memory, type:
    merge [varlist] using filename [, options]
• A match merge joins observations with common values of varlist, which must be present in both datasets.
• The update option updates missing values of same-named variables in master with values from the using dataset.
• The update option with replace replaces all values of same-named variables in master with nonmissing values from the using dataset.
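
The update option can be sketched as follows. The sketch assumes Stata 11 or later, where the merge type (here 1:1) is written before the key variable; older versions use the form shown on the slide and need both files sorted first. The ages for ids 004 and 005 are invented for illustration and do not come from the slides.

```stata
* Using dataset: ages for two ids (made-up values)
clear
input str3 id age
"004" 25
"005" 22
end
tempfile ages
save `ages'

* Master dataset: age is missing for ids 004 and 005
clear
input str3 id str5 name age
"003" "David" 23
"004" "Jim"   .
"005" "Linda" .
end

* Fill missing ages in the master from the using dataset
merge 1:1 id using `ages', update
tab _merge
list
```

With update, _merge can take the values 4 (matched, missing value updated) and 5 (matched, conflicting nonmissing values) in addition to 1, 2 and 3.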
24. Combining Datasets: Merging (cont.)
• To form all pairwise combinations of observations between two datasets, within groups defined by varlist, type:
    joinby [varlist] using filename
• Unlike merge, joinby can handle “many-to-many” merges.

EXERCISE 2
25. Combining Datasets: Merging
• Open "State admission data.dta" - the using dataset - and sort by state, then save (Part I and Slide 13)
• Open "Stata course data 3.dta" - the master dataset - and sort by state (Part I and Slide 13)
• Merge with "State admission data.dta" using state as the match variable (Slide 19)
• Type tabulate _merge to check the results of the merge. Think about what the values of this variable indicate. (Slide 23)
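
Returning to joinby from Slide 24, a small sketch of a many-to-many join is given below. The state, city and year values are invented for illustration, and the temporary file name is hypothetical.

```stata
* Using dataset: two cities for the same state (made-up values)
clear
input str2 state str10 city
"NY" "Albany"
"NY" "Buffalo"
end
tempfile cities
save `cities'

* Master dataset: two years for the same state
clear
input str2 state year
"NY" 2000
"NY" 2001
end

* Every master row is paired with every using row that shares the same state
joinby state using `cities'
list
```

Each of the two master observations is paired with each of the two using observations, giving four rows.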
EXERCISE 3
30. Collapsing
• Collapse the dataset "Combined state data.dta" by region to produce a dataset containing the means of inc and unemplrate (Slide 27)
• Which region has historically had the lowest unemployment rate?
• Which region has historically had the highest income level?
• Do not save the new dataset. Instead, use outsheet to export the new dataset into an Excel file named "collapse.xls"

31. Panel Data Analysis: What is Panel Data?
• Panel data generally refer to the repeated observation of a set of fixed entities at fixed intervals of time (also known as longitudinal data).
• Stata is particularly good at arranging and analyzing panel data.
• Stata refers to two panel display formats:
  – Wide form: useful for display purposes and often the form in which data are obtained.
  – Long form: needed for regressions etc.
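
The difference between the wide and long forms can be seen by converting a small panel back and forth with reshape. The unemployment figures below are invented for illustration.

```stata
* A tiny panel in long form: one row per state-year (made-up values)
clear
input str2 state year unemplrate
"CA" 2000 4.9
"CA" 2001 5.4
"NY" 2000 4.5
"NY" 2001 4.9
end

* Long to wide: one row per state, one column per year
reshape wide unemplrate, i(state) j(year)
list

* Wide back to long
reshape long unemplrate, i(state) j(year)
list
```

In wide form the variables are named unemplrate2000 and unemplrate2001; reshape long recovers the year from those suffixes.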
36. Panel Data Analysis: Dummy Variables (cont.)

group
1
3
2
1
2

    tab group, gen(g)

group  g1  g2  g3
1      1   0   0
3      0   0   1
2      0   1   0
1      1   0   0
2      0   1   0

37. Panel Data Analysis: Lag Variables
• Assuming the data are in chronological order, lags can be created with:
    gen lagname = varname[_n-1]
• Similarly _n+1 gives lead.
• Care must be taken with panel data (in long form) so that first observation in each state etc. has a missing value. Use, for example:
    use sp500, clear
    gen lag_vol=volume[_n-1]
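
For panel data in long form, one way to make sure the first observation of each entity gets a missing lag is to build the lag within groups using bysort. This is a sketch with invented values, and lag_unemplrate is a hypothetical variable name.

```stata
clear
input str2 state year unemplrate
"CA" 2000 4.9
"CA" 2001 5.4
"NY" 2000 4.5
"NY" 2001 4.9
end

* Within each state (sorted by year), _n restarts at 1, so the
* first year of every state has no previous row and gets a missing lag
bysort state (year): gen lag_unemplrate = unemplrate[_n-1]
list, sepby(state)
```

Without the by prefix, the first NY observation would wrongly take its lag from the last CA observation.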
42. Regression Analysis: Regression Basics
• To perform a linear regression of depvar on varlist, type:
    regress depvar [varlist] [if exp] [in range] [, noconstant]
• depvar is the dependent variable.
• varlist is the set of independent variables (regressors).
• By default Stata includes a constant. The noconstant option excludes it.
• Example:
    regress mpg weight length

43. Regression Analysis: Regression Basics (cont.)
• Some more sophisticated estimators:
  – Logit
      logit depvar varlist
  – Probit
      probit depvar varlist
  – Panel regression
      xtreg depvar varlist
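
The commands on these two slides can be tried on the auto dataset shipped with Stata. This is an assumption made so the sketch runs without the workshop files; foreign is a 0/1 variable in that dataset.

```stata
sysuse auto, clear

* Linear regression with a constant (the default)
regress mpg weight length

* Wald test that the coefficient on weight is zero
test weight

* Same regression without the constant
regress mpg weight length, noconstant

* Binary outcome models
logit foreign mpg weight
probit foreign mpg weight
```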
EXERCISE 5 (cont.)
48. Simple Regression Analysis
• Run a linear regression explaining realgdpcap in terms of unemplrate, persbankrupt and treasurybillrate. Which are significant regressors (at the 5% level)? (Slide 42)
• Save the dataset, clear the memory and open the dataset “Combined state data.dta”. (Part I)
• Create a set of 51 dummy variables for the states, using: tab stateabb, gen(st). (Slide 35)

EXERCISE 5 (cont.)
49. Simple Regression Analysis
• Regress inc on unemplrate and the set of dummy variables without a constant (Slide 42), using:
    regress inc unemplrate st1-st50 ,nocons
• Perform a test of the null hypothesis that the coefficient on unemplrate is zero. (Slide 45)
variables, type:
    scatter varlist
    line varlist
[Figure residue: example plots; recoverable axis titles are "Mileage (mpg)" and "mean of price"]
54. Graphical Data Exploration: Scatter Plot
    scatter mpg weight
[Figure: scatter plot of Mileage (mpg) against Weight (lbs.)]

55. Graphical Data Exploration: Linear Prediction Plot
    twoway (scatter mpg weight) (lfit mpg weight)
[Figure: scatter plot of Mileage (mpg) against Weight (lbs.) with fitted line]
ylabel
– Changing the markers used: msymbol
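
The plotting commands and options named above can be combined in one short sketch. It assumes the auto dataset shipped with Stata; the marker style and axis label values are arbitrary choices for illustration.

```stata
sysuse auto, clear

* Scatter plot with hollow-circle markers, a fitted line,
* and y-axis labels at 10, 20, 30 and 40
twoway (scatter mpg weight, msymbol(Oh)) (lfit mpg weight), ylabel(10(10)40)
```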
60. Graphical Data Exploration: Saving Graphs
• Graphs are not saved by log files (separate windows).
• Select File → Save Graph.
• To insert in a Word document etc., select Edit → Copy and then paste into Word document. This can be resized but is not interactive (unlike Excel charts etc.).

EXERCISE 6 (cont.)
61. Graphs
• Open the dataset “Combined state data.dta”. (Part I)
• Create a dataset of time-averaged data using the collapse command. Specifically, create a dataset, by stateabb, containing the means of inc and unemplrate (Slide 28)
• Create a scatterplot of inc against unemplrate using stateabb as the marker. (Slides 54 & 57)
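
Graphs can also be saved from a do-file instead of through the menus described on Slide 60. The sketch below assumes the auto dataset shipped with Stata, and the file names are hypothetical. PNG export may not be available in every Stata installation (for example console-only versions).

```stata
sysuse auto, clear
scatter mpg weight

* Save in Stata's own graph format (can be reopened with graph use)
graph save "mpg_weight.gph", replace

* Export as an image for use in Word etc.
graph export "mpg_weight.png", replace
```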
Exercise 1
cd c:\stataworkshop
use "US economic data", clear
gen yearstr = string(year)
gen decade = substr(yearstr,3,1)
* gen decade = substr(string(year),3,1)
tab decade highun
table decade highun, contents(mean realgdp sum persbankrupt)

Exercise 2
use "State admission data", clear
sort state
save "State admission data", replace
use "Stata course data 3", clear
sort state
merge state using "State admission data"
tab _merge
replace region="South" if stateabb=="DC"
save "Combined state data"

Exercise 3
use "Combined state data",clear
collapse (mean) inc unemplrate, by(region)
outsheet using "collapse.xls"

Exercise 4
use "Combined state data",clear
keep state year unemplrate
reshape wide unemplrate, i(state) j(year)
reshape long unemplrate, i(state) j(year)
66. Solutions (cont.)
Exercise 5
use "US economic data",clear
gen realgdpcap=1000000000*realgdp/population
pwcorr realgdpcap unemplrate, sig
Exercise 6
use "Combined state data", clear
collapse (mean) inc unemplrate, by(stateabb)
scatter inc unemplrate, mlabel(stateabb)