
Math 127B, Basic R, R commands

Mrinal Raghupathi
This is a list of the commands we use most often in R. The list is not comprehensive, but it covers
most of the things we need.
1 Making lists, data frames
Basic list creation Make a list with c() and assign it to a name: x <- c(1,2,3)
Sequence seq(start, stop, step) creates the list of numbers beginning at start, in steps of
size step until it reaches stop.
> x <- seq(3,15,4)
> x
[1] 3 7 11 15
Repetition rep(number, times) produces a list with number repeated times times.
> x <- rep(5,7)
> x
[1] 5 5 5 5 5 5 5
Dollar The $ sign can be used to extract a property of an object, for example a component of
the value returned by the lm command, or a column of a data frame.
Data frame data.frame() can be used to make a data frame. Supply the names of the columns
and the corresponding values with options of the form name = value.
> men <- c(45, 86, 12)
> women <- c(30, 105, 21)
> both <- data.frame(gents = men, ladies = women)
> both
  gents ladies
1    45     30
2    86    105
3    12     21
> both$ladies
[1]  30 105  21
You can assign a list as a new column of the data frame by using the $ operation and assigning
the list to it.
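As a minimal sketch of adding a column, the code below rebuilds the data frame from above, copies it, and attaches a new column with $. The names both2 and total are made up for this illustration; the copy is only there so that both itself stays unchanged for the later examples.

```r
men <- c(45, 86, 12)
women <- c(30, 105, 21)
both <- data.frame(gents = men, ladies = women)

# Work on a copy (hypothetical name both2) so that both is left as it was
both2 <- both

# Assigning a list to both2$total creates a new column called total
both2$total <- both2$gents + both2$ladies
both2
```

The new list must have one entry per row of the data frame.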
2 Statistics
Frequency tables table(x) produces the frequencies for entries in x.
> grades <- c("a", "b", "a", "b", "a", "a", "a", "a", "b")
> length(grades)
[1] 9
> table(grades)
grades
a b
6 3
Using two lists you can get the cross tabulation. For example, here are the subject areas in
which the grades were received.
> subjects <- c("math", "math", "physics", "physics", "math",
+ "math", "physics", "physics", "math")
> table(subjects)
subjects
   math physics
      5       4
> table(grades, subjects)
      subjects
grades math physics
     a    3       3
     b    2       1
Correlation cor(x,y) computes the correlation between x and y.
Linear regression lm(y ~ x) returns a lot of information about the regression line of y on x.
The most useful piece is fitted.values, which is a list of the predicted values of y
based on the regression line.
> x <- c(1,2,3,4,5)
> y <- c(1,3,2,5,4)
> cor(x,y)
[1] 0.8
> r <- lm(y ~ x)
> r$fitted.values
  1   2   3   4   5
1.4 2.2 3.0 3.8 4.6
Mean mean(x) computes the mean or average of the list.
> x <- c(1,3,7,11,9.4,-45)
> mean(x)
[1] -2.266667
Standard Deviation sd(x) computes the standard deviation of the list x. It computes what the
book refers to as SD+.
> x <- c(1,3,7,11,9.4,-45)
> sd(x)
[1] 21.27220
Maximum and minimum The maximum and minimum values in a list can be produced with
max(x) and min(x).
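For illustration, here are both commands applied to the same list used in the mean and standard deviation examples.

```r
x <- c(1, 3, 7, 11, 9.4, -45)

largest <- max(x)    # the largest entry, 11
smallest <- min(x)   # the smallest entry, -45

largest
smallest
```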
3 Reading and writing data
Table read.table(filename, header, col.names) reads data from a file that is tab or space
separated. The option header indicates whether the file contains a header row,
and col.names can be used to give names to the columns.
> colnames <- c("brandname", "nicotine", "tar",
+ "weight", "carbon.monoxide")
> cig <- read.table("cigarettes.dat", col.names = colnames)
> cig[1:5, ]
      brandname nicotine  tar weight carbon.monoxide
1        Alpine     14.1 0.86 0.9853            13.6
2 Benson&Hedges     16.0 1.06 1.0938            16.6
3    BullDurham     29.8 2.03 1.1650            23.5
4   CamelLights      8.0 0.67 0.9280            10.2
5       Carlton      4.1 0.40 0.9462             5.4
CSV read.csv(filename, header, col.names) reads data from a file that is comma separated.
The option header indicates whether the file contains a header row, and
col.names can be used to give names to the columns. By default the header option is TRUE,
which is common for csv files.
> al <- read.csv("al_salaries_2003.csv")
> al[1:5, ]
              Team          Player   Salary Position
1 New York Yankees   Acevedo, Juan   900000  Pitcher
2 New York Yankees Anderson, Jason   300000  Pitcher
3 New York Yankees  Clemens, Roger 10100000  Pitcher
4 New York Yankees Contreras, Jose  5500000  Pitcher
5 New York Yankees  Flaherty, John   750000  Catcher
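When a csv file has no header row, header = FALSE and col.names can be used together. The sketch below is self-contained: it first writes a tiny made-up file to a temporary location, so the file contents, the path variable and the column names student and score are all assumptions for this illustration.

```r
# Write a small made-up csv file with no header row
path <- tempfile(fileext = ".csv")
writeLines(c("Alice,90", "Bob,85", "Carol,77"), path)

# The file has no header row, so say so and supply the column names
scores <- read.csv(path, header = FALSE, col.names = c("student", "score"))
scores
```

Leaving header at its default of TRUE here would treat the first row of data as the column names.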
4 Plotting
Histogram hist(x) draws a histogram of x. This command accepts the following options:
breaks number of bins or breaks into which to divide the data.
col color of the bars.
xlab and ylab labels for the horizontal and vertical axes.
main title for the plot
border color of the lines around the bars.
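The options above can be combined in one call. This is only a sketch with made-up data; the list scores, the colors and the labels are assumptions chosen for the example.

```r
# Made-up exam scores for illustration
scores <- c(55, 62, 68, 71, 73, 75, 78, 80, 84, 91)

# breaks is treated as a suggestion for the number of bins
hist(scores, breaks = 5,
     col = "lightblue", border = "darkblue",
     xlab = "Score", ylab = "Count",
     main = "Exam scores")
```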
Scatter plots plot(x,y) draws a scatter plot of y against x.
Points You can add points to a scatter plot by using the command points(x,y). The plotting
commands accept the following options:
col the color of the marker.
pch the shape of the marker. pch = 3 produces little plus signs, pch = 16 produces
dots.
Lines lines(x,y) draws lines connecting the specified points, in order.
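One way these three commands fit together is to plot the data from the regression example and then add the predicted values on top. The colors are arbitrary choices for this sketch.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 3, 2, 5, 4)
r <- lm(y ~ x)

# Observed values as dots
plot(x, y, pch = 16, col = "blue")

# Predicted values as little plus signs, added to the same plot
points(x, r$fitted.values, pch = 3, col = "red")

# Connect the predicted values, in order, to show the regression line
lines(x, r$fitted.values, col = "red")
```

plot starts a fresh plot, while points and lines only add to the plot that is already there.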
New plot dev.new() creates a new blank plotting surface in case you want to draw plots side by
side.
Save to PDF dev.copy2pdf(file = filename) produces a PDF copy of the current plot and saves it to
the file specified by filename.
5 Indexing, slicing, ordering
Slice You can slice a list, i.e., extract some values, as follows. If x is a list and i is a list of
whole numbers, then x[i] will extract the entries corresponding to i from the list x. A colon
can be used to produce a sequence.
> x <- c(2,3,5,7,11,13)
> i <- c(2,3,5)
> x[i]
[1] 3 5 11
> x[1:3]
[1] 2 3 5
The same process can be used on a data frame. Since a data frame is essentially two dimensional
we need to decide whether we want columns or rows. To slice and extract some rows we
can use x[i,]. To get columns we can use x[,i]. Here is the data frame both from above.
> both[,1]
[1] 45 86 12
> both$gents
[1] 45 86 12
> both[c(1,3),]
  gents ladies
1    45     30
3    12     21
> both[2:3,]
  gents ladies
2    86    105
3    12     21
Order order(x) returns a list of indices that tell us how to reorder the list x so that it is
sorted in ascending order.
> x <- c(0,10,5)
> order(x)
[1] 1 3 2
The above code tells us that the smallest number is the first one, the third number is the next
smallest, and the second number is the largest, so that x[1] <= x[3] <= x[2]. We can use this to sort
a list or data frame.
> both[order(both$gents),]
  gents ladies
3    12     21
1    45     30
2    86    105
The order can be reversed with a minus sign.
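A short sketch of the minus sign in use: putting it in front of the list inside order gives the indices for descending order. The data frame is rebuilt here so the example runs on its own.

```r
x <- c(0, 10, 5)

# Indices that put x in descending order: 2 3 1
order(-x)

# The list itself, largest first: 10 5 0
x[order(-x)]

# The same idea on a data frame, largest value of gents first
both <- data.frame(gents = c(45, 86, 12), ladies = c(30, 105, 21))
both[order(-both$gents), ]
```

This works for lists of numbers, since the minus sign negates each entry before ordering.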
Test There are 6 basic tests on a list.
== equality
!= not equal
<, >, <=, >= less than, greater than, less than or equal to, greater than or equal to.
> x <- c(2,3,5,7,11,13)
> x
[1] 2 3 5 7 11 13
> x == 4
[1] FALSE FALSE FALSE FALSE FALSE FALSE
> x > 5
[1] FALSE FALSE FALSE TRUE TRUE TRUE
> x >= 5
[1] FALSE FALSE TRUE TRUE TRUE TRUE
> x != 3
[1] TRUE FALSE TRUE TRUE TRUE TRUE
All return a list of TRUE and FALSE values that can be used to extract values. You can flip TRUE and
FALSE with an exclamation point !.
> tf <-c(T,F,T,T)
> tf
[1] TRUE FALSE TRUE TRUE
> !tf
[1] FALSE TRUE FALSE FALSE
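To illustrate extracting values with a test, the sketch below puts the TRUE and FALSE list inside the square brackets; only the entries where the test is TRUE are kept. The data frame is rebuilt here so the example runs on its own.

```r
x <- c(2, 3, 5, 7, 11, 13)

# Keep the entries that are greater than 5: 7 11 13
x[x > 5]

# Flip the test with ! to keep the other entries: 2 3 5
x[!(x > 5)]

# The same idea picks out rows of a data frame
both <- data.frame(gents = c(45, 86, 12), ladies = c(30, 105, 21))
both[both$gents > 20, ]
```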
6 Tests of significance
chi-square chisq.test(x, p, rescale.p) does a chi-square test. There are three variations.
chisq.test(x) carries out the goodness-of-fit test with the assumption that all
observations should be equally likely.
chisq.test(x, p) carries out the goodness-of-fit test with the expected probabilities given by
p. If you specify expected frequencies or percentages instead, then use rescale.p = TRUE.
If x is a data frame, or a cross tabulation, then the command chisq.test(x) carries
out a test of independence.
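The three variations might look as follows. All the counts here are made up for illustration: rolls is a hypothetical record of 60 rolls of a die, and counts is a hypothetical 2 by 2 cross tabulation.

```r
# Made-up counts of each face in 60 rolls of a die
rolls <- c(8, 12, 9, 11, 6, 14)

# Variation 1: all six faces assumed equally likely
t1 <- chisq.test(rolls)

# Variation 2: expected probabilities given explicitly (they must add up to 1)
t2 <- chisq.test(rolls, p = rep(1/6, 6))

# Expected frequencies instead of probabilities, so ask R to rescale them
t3 <- chisq.test(rolls, p = rep(10, 6), rescale.p = TRUE)

# Variation 3: a made-up 2 by 2 cross tabulation gives a test of independence
counts <- matrix(c(30, 20, 25, 25), nrow = 2)
t4 <- chisq.test(counts)

t1$statistic   # the chi-square statistic
t1$p.value     # the P-value
```

The first three calls describe the same expected counts in different ways, so they give the same statistic. As with lm, the $ sign extracts pieces of the result.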
