Programming in R
Author: Thomas Girke, UC Riverside
Contents
1 Introduction
2 R Basics
3 Code Editors for R
4 Integrating R with Vim and Tmux
5 Finding Help
6 Control Structures
6.1 Conditional Executions
6.1.1 Comparison Operators
6.1.2 Logical Operators
6.1.3 If Statements
6.1.4 Ifelse Statements
6.2 Loops
6.2.1 For Loop
6.2.2 While Loop
6.2.3 Apply Loop Family
6.2.3.1 For Two-Dimensional Data Sets: apply
6.2.3.2 For Ragged Arrays: tapply
6.2.3.3 For Vectors and Lists: lapply and sapply
6.2.4 Other Loops
http://manuals.bioinformatics.ucr.edu/home/programming-in-r 1/39
2/5/2016 R Programming - Manuals
6.2.5 Improving Speed Performance of Loops
7 Functions
8 Useful Utilities
8.1 Debugging Utilities
8.2 Regular Expressions
8.3 Interpreting Character String as Expression
8.4 Time, Date and Sleep
8.5 Calling External Software with System Command
8.6 Miscellaneous Utilities
9 Running R Programs
10 Object-Oriented Programming (OOP)
10.1 Define S4 Classes
10.2 Assign Generics and Methods
11 Building R Packages
12 Reproducible Research by Integrating R with Latex or Markdown
13 R Programming Exercises
13.1 Exercise Slides
13.2 Sample Scripts
13.2.1 Batch Operations on Many Files
13.2.2 Large-scale Array Analysis
13.2.3 Graphical Procedures: Feature Map Example
13.2.4 Sequence Analysis Utilities
13.2.5 Pattern Matching and Positional Parsing of Sequences
13.2.6 Identify Over-Represented Strings in Sequence Sets
13.2.7 Translate DNA into Protein
13.2.8 Subsetting of Structure Definition Files (SDF)
13.2.9 Managing Latex BibTeX Databases
13.2.10 Loan Payments and Amortization Tables
13.2.11 Course Assignment: GC Content, Reverse & Complement
14 Translation of this Page
Introduction
General Overview
One of the main attractions of using the R (http://cran.at.r-project.org) environment is the ease with which users can write their own
programs and custom functions. The R programming syntax is extremely easy to learn, even for users with no previous programming
experience. Once the basic R programming control structures are understood, users can use the R language as a powerful
environment to perform complex custom analyses of almost any type of data.
Format of this Manual
In this manual all commands are given in code boxes, where the R code is printed in black, the comment text in blue and the output
generated by R in green. All comments/explanations start with the standard comment sign '#' to prevent them from being interpreted by
R as commands. This way the content in the code boxes can be pasted with their comment text into the R console to evaluate their
utility. Occasionally, several commands are printed on one line and separated by a semicolon ';'. Commands starting with a '$' sign
need to be executed from a Unix or Linux shell. Windows users can simply ignore them.
R Basics
The R & BioConductor manual provides a general introduction to the usage of the R environment and its basic command syntax.
Code Editors for R
Several excellent code editors are available that provide functionalities like R syntax highlighting, auto code indenting and utilities to send
code/functions to the R console.
Basic code editors provided by R GUIs
RStudio: GUI-based IDE for R
Vim-R-Tmux: R working environment based on vim and tmux
Emacs (ESS add-on package)
gedit and Rgedit
RKWard
Eclipse
Tinn-R
Notepad++ (NppToR)
[Figures: Programming in R using Vim or Emacs; Programming in R using RStudio]
Integrating R with Vim and Tmux
Users interested in integrating R with vim and tmux may want to
consult the Vim-R-Tmux configuration page.
Finding Help
Reference list on R programming (selection)
R Programming for Bioinformatics, by Robert Gentleman
Advanced R, by Hadley Wickham
S Programming, by W. N. Venables and B. D. Ripley
Programming with Data, by John M. Chambers
R Help & R Coding Conventions, Henrik Bengtsson, Lund University
Programming in R (Vincent Zoonekynd)
Peter's R Programming Pages, University of Warwick
Rtips, Paul Johnson, University of Kansas
R for Programmers, Norm Matloff, UC Davis
High-Performance R, Dirk Eddelbuettel tutorial presented at useR! 2008
C/C++ level programming for R, Gopi Goswami
Control Structures
Conditional Executions
Comparison Operators
equal: ==
not equal: !=
greater/less than: > <
greater/less than or equal: >= <=
Logical Operators
and: &
or: |
not: !
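A short sketch of how these operators behave; the vector x is an arbitrary example. Both the comparison and the logical operators work element-wise and return logical vectors of the same length as their input.

```r
x <- 1:6
eq  <- x == 3            # FALSE FALSE TRUE FALSE FALSE FALSE
rng <- x >= 4 & x < 6    # and: TRUE only where both conditions hold
out <- x < 2 | x > 5     # or: TRUE where at least one condition holds
neg <- !(x > 3)          # not: negates each element
which(rng)               # positions where the combined condition is TRUE: 4 5
```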
If Statements
If statements operate on length-one logical vectors.
Syntax
if(condition) {
statements
} else {
statements
}
Example
if(1==0) {
print(1)
} else {
print(2)
}
[1] 2
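Avoid inserting a newline between '}' and 'else'. At the top level (e.g. in the R console), R treats the if statement as complete at the closing brace and then reports an error for the unexpected 'else'.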
Ifelse Statements
Ifelse statements operate element-wise on vectors of any length.
Syntax
ifelse(test, true_value, false_value)
Example
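A minimal sketch of ifelse, using an arbitrary example vector x: the test is evaluated for every element, and the result takes the corresponding element from the second argument where the test is TRUE and from the third argument where it is FALSE.

```r
x <- 1:10
res <- ifelse(x < 5 | x > 8, x, 0)   # keep x where the test is TRUE, 0 elsewhere
res                                   # 1 2 3 4 0 0 0 0 9 10
```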
Loops
The most commonly used loop structures in R are for, while and apply loops. Less common are repeat loops. The break statement is
used to exit a loop, and next halts the processing of the current iteration and advances the looping index.
For Loop
For loops are controlled by a looping vector. In every iteration of the loop one value in the looping vector is assigned to a variable that can be
used in the statements of the body of the loop. Usually, the number of loop iterations is defined by the number of values stored in the looping
vector and they are processed in the same order as they are stored in the looping vector.
Syntax
for(variable in sequence) {
statements
}
Example
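A basic sketch of a for loop, chosen here for illustration: it computes the mean of the first three columns for each row of the built-in iris data. The result vector is pre-allocated and filled by index, and the looping vector comes from seq_along().

```r
mydf <- iris
myve <- numeric(nrow(mydf))                    # pre-allocate one slot per row
for (i in seq_along(mydf[, 1])) {
  myve[i] <- mean(unlist(mydf[i, 1:3]))        # mean of columns 1 to 3 of row i
}
head(myve)
```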
Example: condition
x <- 1:10
z <- NULL
for(i in seq(along=x)) {
if(x[i] < 5) {
z <- c(z, x[i] - 1)
} else {
z <- c(z, x[i] / x[i])
}
}
z
[1] 0 1 2 3 1 1 1 1 1 1
Example: stop on condition and print error message
x <- 1:10
z <- NULL
for(i in seq(along=x)) {
if (x[i]<5) {
z <- c(z,x[i]-1)
} else {
stop("values need to be <5")
}
}
Error: values need to be <5
z
[1] 0 1 2 3
While Loop
Similar to the for loop, but the iterations are controlled by a conditional statement.
Syntax
while(condition) statements
Example
z <- 0
while(z < 5) {
z <- z + 2
print(z)
}
[1] 2
[1] 4
[1] 6
Apply Loop Family
For Two-Dimensional Data Sets: apply
Syntax
apply(X, MARGIN, FUN, ...)
X: array, matrix or data.frame; MARGIN: 1 for rows, 2 for columns, c(1,2) for both; FUN: the function to apply; ...: optional arguments
passed on to FUN
Example
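A small sketch of apply on an arbitrary 3 x 4 example matrix: MARGIN=1 applies the function to each row, MARGIN=2 to each column, and any additional argument (here probs for quantile) is passed on to FUN.

```r
mymat <- matrix(1:12, nrow = 3, ncol = 4)       # columns: 1:3, 4:6, 7:9, 10:12
rs <- apply(mymat, 1, sum)                      # row sums: 22 26 30
cm <- apply(mymat, 2, mean)                     # column means: 2 5 8 11
rq <- apply(mymat, 1, quantile, probs = 0.5)    # row medians via an extra argument: 5.5 6.5 7.5
```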
For Ragged Arrays: tapply
Applies a function to subsets of a vector that are defined by a grouping factor; the subsets can differ in length (ragged array).
Syntax
tapply(vector, factor, FUN)
Example
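A sketch of tapply with two illustrative inputs: the iris data grouped by the factor Species, and a small made-up vector whose groups have different sizes (a ragged array).

```r
## Mean sepal length for each species
sl_mean <- tapply(iris$Sepal.Length, iris$Species, mean)   # 5.006 5.936 6.588

## Ragged groups of sizes 3, 2 and 1
values  <- c(2, 4, 6, 1, 3, 10)
groups  <- factor(c("a", "a", "a", "b", "b", "c"))
grp_sum <- tapply(values, groups, sum)                     # a: 12, b: 4, c: 10
```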
For Vectors and Lists: lapply and sapply
Both apply a function to vector or list objects. The function lapply returns a list, while sapply attempts to return the simplest data object,
such as vector or matrix instead of list.
Syntax
lapply(X, FUN)
sapply(X, FUN)
Example
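A short sketch contrasting the two functions on an arbitrary example list: lapply always returns a list, while sapply simplifies length-one results to a named vector and equal-length results to a matrix.

```r
mylist <- list(a = 1:10, b = c(2.5, 3.5), c = seq(0, 1, by = 0.25))
l <- lapply(mylist, mean)    # list with one mean per element
s <- sapply(mylist, mean)    # named numeric vector: a 5.5, b 3, c 0.5
r <- sapply(mylist, range)   # length-2 results become a 2 x 3 matrix
```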
Other Loops
Repeat Loop
Syntax
repeat statements
The loop is repeated until a break statement is executed. This means the loop body needs a second statement that tests whether or not to break out of the loop.
Example
z <- 0
repeat {
z <- z + 1
print(z)
if(z > 100) break # exits the loop once z exceeds 100
}
Improving Speed Performance of Loops
Looping over very large data sets can become slow in R. However, this limitation can be overcome by eliminating certain operations in loops
or avoiding loops over the data intensive dimension in an object altogether. The latter can be achieved by performing mainly vectortovecor or
matrixtomatrix computations which run often over 100 times faster than the corresponding for() or apply() loops in R. For this purpose,
one can make use of the existing speedoptimized R functions (e.g.: rowSums, rowMeans, table, tabulate) or one can design custom
functions that avoid expensive R loops by using vector or matrixbased approaches. Alternatively, one can write programs that will perform all
time consuming computations on the Clevel.
(1) Speed comparison of for loops with an append versus an inject step:
(2) Speed comparison of apply loop versus rowMeans for computing the mean for each row in a large matrix:
(3) Speed comparison of apply loop versus vectorized approach for computing the standard deviation of each row:
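The code for comparisons (1) to (3) is not preserved in this copy. The following is a minimal sketch of the same three comparisons; the loop length n and the 10,000 x 10 random matrix are arbitrary choices, and the timings reported by system.time depend on the machine, so no output is shown. Each pair computes the same result, which is what makes the comparison fair.

```r
## (1) Growing a vector by appending vs. injecting into a pre-allocated vector
n <- 1e4
system.time({ z <- NULL; for (i in 1:n) z <- c(z, i^2) })          # append
system.time({ z2 <- numeric(n); for (i in 1:n) z2[i] <- i^2 })     # inject

## (2) apply loop vs. rowMeans for the mean of each row
myMA <- matrix(rnorm(1e5), nrow = 1e4, ncol = 10)
system.time(m1 <- apply(myMA, 1, mean))
system.time(m2 <- rowMeans(myMA))

## (3) apply loop vs. vectorized standard deviation of each row
system.time(s1 <- apply(myMA, 1, sd))
## myMA - rowMeans(myMA) subtracts each row's mean because the vector is recycled down the columns
system.time(s2 <- sqrt(rowSums((myMA - rowMeans(myMA))^2) / (ncol(myMA) - 1)))
```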
(4) Example for computing the mean for any custom selection of columns without compromising the speed performance:
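## Sample matrix (assumed here, since its definition is not shown; values are random, so results differ from those printed below)
myMA <- matrix(rnorm(100000), 10000, 10, dimnames=list(1:10000, paste("C", 1:10, sep="")))
## In the following the columns are named according to their selection in myList
myList <- tapply(colnames(myMA), c(1,1,1,2,2,2,3,3,4,4), list)
myMAmean <- sapply(myList, function(x) rowMeans(myMA[,x]))
colnames(myMAmean) <- sapply(myList, paste, collapse="_")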
myMAmean[1:4,]
C1_C2_C3 C4_C5_C6 C7_C8 C9_C10
1 0.0676799 -0.2860392 0.09651984 -0.7898946
2 -0.6120203 -0.7185961 0.91621371 1.1778427
3 0.2960446 -0.2454476 -1.18768621 0.9019590
4 0.9733695 -0.6242547 0.95078869 -0.7245792
## Alternative to achieve the same result with similar performance, but in a much less elegant way
myselect <- c(1,1,1,2,2,2,3,3,4,4) # The columns are named according to the selection stored in myselect
myList <- tapply(seq(along=myMA[1,]), myselect, function(x) paste("myMA[ ,", x, "]", sep=""))
myList <- sapply(myList, function(x) paste("(", paste(x, collapse=" + "),")/", length(x)))
myMAmean <- sapply(myList, function(x) eval(parse(text=x)))
colnames(myMAmean) <- tapply(colnames(myMA), myselect, paste, collapse="_")
myMAmean[1:4,]
C1_C2_C3 C4_C5_C6 C7_C8 C9_C10
1 0.0676799 -0.2860392 0.09651984 -0.7898946
2 -0.6120203 -0.7185961 0.91621371 1.1778427
3 0.2960446 -0.2454476 -1.18768621 0.9019590
4 0.9733695 -0.6242547 0.95078869 -0.7245792
Functions
A very useful feature of the R environment is the possibility to expand existing functions and to easily write custom functions. In fact, most of
the R software can be viewed as a series of R functions.
Syntax to define functions
myfct <- function(arg1, arg2, ...) {
function_body
}
Syntax to call functions
myfct(arg1=..., arg2=...)
Syntax Rules for Functions
General
Functions are defined by (1) assignment with the keyword function, (2) the declaration of arguments/variables (arg1, arg2,
...) and (3) the definition of operations (function_body) that perform computations on the provided arguments. A function
name needs to be assigned to call the function (see below).
Naming
Function names can be almost anything. However, the usage of names of existing functions should be avoided.
Arguments
It is often useful to provide default values for arguments (e.g. arg1=1:10). This way they don't need to be provided in a function
call. The argument list can also be left empty (myfct <- function() { fct_body }) when a function is expected to always return
the same value(s). The argument '...' can be used to allow one function to pass on argument settings to another.
Function body
The actual expressions (commands/operations) are defined in the function body which should be enclosed by braces. The
individual commands are separated by semicolons or new lines (preferred).
Calling functions
Functions are called by their name followed by parentheses containing possible argument values. Empty parentheses after the
function name will result in an error message when a function requires certain arguments to be provided by the user. The
function name alone will print the definition of a function.
Scope
Variables created inside a function exist only for the lifetime of a function call. Thus, they are not accessible outside of the function.
To force variables in functions to exist globally, one can use the special assignment operator '<<-'. If a variable created inside a function
has the same name as a global variable, the global variable is masked only within the function.
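A minimal sketch of these scoping rules; the names x, y_local, counter and f are illustrative. Strictly speaking, '<<-' assigns to the variable in the first enclosing environment where it already exists, and in the global environment when no such variable is found, which is the case here.

```r
x <- 1                        # global variable
f <- function() {
  x <- 100                    # local x masks the global x only inside f
  y_local <- x + 1            # exists only while f is running
  counter <<- 5               # no 'counter' in enclosing environments, so it is created globally
  y_local
}
f()                           # returns 101
x                             # the global x is still 1
exists("y_local")             # FALSE: the local variable is gone after the call
counter                       # 5
```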
Example: Function basics
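A sketch of a basic function definition and call; the function myfct and its arguments x1 and x2 are illustrative. It shows calling by argument name, calling by position, and printing the definition by typing the name alone.

```r
myfct <- function(x1, x2) {
  z1 <- x1 / x1               # 1 for any non-zero x1
  z2 <- x2 * x2
  myvec <- c(z1, z2)
  return(myvec)
}
myfct                         # the function name alone prints its definition
myfct(x1 = 2, x2 = 5)         # arguments given by name: 1 25
myfct(2, 5)                   # arguments matched by position: same result
```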
Example: Function with optional arguments
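A sketch of default and optional arguments; myfct2, opt_arg and mymean are hypothetical names. missing() tests whether an argument was supplied in the call, and the second function shows how '...' passes extra settings (here na.rm) on to another function.

```r
myfct2 <- function(x1 = 5, opt_arg) {
  if (missing(opt_arg)) {     # opt_arg was not supplied in the call
    z1 <- 1:10
  } else {
    z1 <- opt_arg
  }
  return(z1 / x1)
}
myfct2()                             # x1 defaults to 5, opt_arg falls back to 1:10
myfct2(x1 = 2, opt_arg = 30:20)      # both arguments supplied

mymean <- function(x, ...) mean(x, ...)   # '...' hands extra settings on to mean()
mymean(c(1, 2, NA, 4), na.rm = TRUE)      # 7/3
```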
Control utilities for functions: return, warning and stop
Return
The evaluation flow of a function may be terminated at any stage with the return function. This is often used in combination
with conditional evaluations.
Stop
To stop the action of a function and print an error message, one can use the stop function.
Warning
To print a warning message in unexpected situations without aborting the evaluation flow of a function, one can use the function
warning("...").
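A sketch combining the three control utilities in one illustrative function, mysqrt: stop aborts with an error, return exits early, and warning reports a problem while the evaluation continues. try() is used so the error example does not halt a script.

```r
mysqrt <- function(x) {
  if (!is.numeric(x)) stop("x needs to be numeric")      # abort with an error message
  if (length(x) == 0) return(numeric(0))                  # early exit with return()
  if (any(x < 0)) {
    warning("negative values were set to NA")             # warn, but keep evaluating
    x[x < 0] <- NA
  }
  sqrt(x)
}
mysqrt(c(4, 9))          # 2 3
mysqrt(c(-1, 4))         # NA 2, with a warning
try(mysqrt("a"))         # prints the error message instead of stopping the script
```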
Useful Utilities
Debugging Utilities
Several debugging utilities are available for R. The most important utilities are: traceback(), browser(),
options(error=recover), options(error=NULL) and debug(). The Debugging in R page provides an overview of the available
resources.
Regular Expressions
R's regular expression utilities work similarly to those in other languages. To learn how to use them in R, one can consult the main help page on this
topic with ?regexp. The following gives a few basic examples.
The grep function can be used for finding patterns in strings, here the letter A in the vector month.name.
month.name[grep("A", month.name)]
[1] "April" "August"
Example for using regular expressions to substitute a pattern by another one using the sub/gsub function with a back reference. Remember:
single escapes '\' need to be double escaped '\\' in R.
Example for split and paste functions
x <- gsub("(a)", "\\1_", month.name[1], perl=TRUE) # performs substitution with back reference which inserts in this example a '_' character
x
[1] "Ja_nua_ry"
strsplit(x, "_") # splits string on inserted character from above
[[1]]
[1] "Ja" "nua" "ry"
paste(rev(unlist(strsplit(x, NULL))), collapse="") # reverses character string by splitting first all characters into vector fields and then collapsing them with paste
[1] "yr_aun_aJ"
Example for importing specific lines in a file with a regular expression. The following example demonstrates the retrieval of specific lines from
an external file with a regular expression. First, an external file is created with the cat function, all lines of this file are imported into a vector
with readLines, the specific elements (lines) are then retrieved with the grep function, and the resulting lines are split into vector fields with
strsplit.
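The code for this example is not preserved here; the following is a minimal sketch of the four steps, assuming a temporary file and made-up content (a month number and name per line, separated by a tab).

```r
myfile <- tempfile(fileext = ".txt")                              # temporary file instead of a fixed name
cat(paste0(1:12, "\t", month.name, "\n"), file = myfile, sep = "")  # step 1: create the file with cat
lines  <- readLines(myfile)                                       # step 2: import all lines into a vector
jlines <- lines[grep("\tJ", lines)]                               # step 3: keep lines whose month starts with J
fields <- strsplit(jlines, "\t")                                  # step 4: split each line into fields
sapply(fields, `[`, 2)                                            # "January" "June" "July"
```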
Interpreting Character String as Expression
Example
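A minimal sketch with made-up strings: parse(text=...) turns a character string into an R expression, and eval() evaluates it, including assignments.

```r
mycode <- "mean(c(2, 4, 9))"           # R code stored as a character string
myexpr <- parse(text = mycode)         # convert the string into an expression
eval(myexpr)                           # evaluates to 5
eval(parse(text = "y <- 1:3"))         # assignments work too: creates y
y
```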
Time, Date and Sleep
Example
system.time(ls()) # returns CPU (and other) times that an expression used, here ls()
user system elapsed
0 0 0
date() # returns the current system date and time
[1] "Wed Dec 11 15:31:17 2012"
Sys.sleep(1) # pause execution of R expressions for a given number of seconds (e.g. in loop)
Calling External Software with System Command
The system command allows one to call any command-line software from within R on Linux, UNIX and OS X systems.
system("...") # provide under '...' command to run external software e.g. Perl, Python, C++ programs
Related utilities on Windows operating systems
x <- shell("dir", intern=TRUE) # lists the contents of the current working directory and assigns the output to x
shell.exec("C:/Documents and Settings/Administrator/Desktop/my_file.txt") # opens file with its associated program
Miscellaneous Utilities
(1) Batch import and export of many files.
In the following example all file names ending with *.txt in the current directory are first assigned to a list (the '$' sign is used to anchor the
match to the end of a string). Second, the files are imported one by one using a for loop where the original names are assigned to the
generated data frames with the assign function. Consult help with ?read.table to understand arguments row.names=1 and
comment.char = "A". Third, the data frames are exported using their names for file naming and appending the extension *.out.
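The code for this example is not preserved here. The sketch below follows the three steps described above, with assumptions: two small tab-delimited example files are first created in a temporary directory (instead of the current working directory), so the snippet can run anywhere.

```r
## Setup (assumed): two example *.txt files in a temporary directory
mydir <- tempfile("batch_"); dir.create(mydir)
for (f in c("a.txt", "b.txt")) {
  write.table(data.frame(x = 1:3, y = 4:6, row.names = paste0("r", 1:3)),
              file.path(mydir, f), quote = FALSE, sep = "\t", col.names = NA)
}
## Step 1: collect the file names; '$' anchors the match to the end of the name
files <- list.files(mydir, pattern = "txt$")
## Step 2: import each file and name the data frame after its file
for (f in files) {
  df <- read.table(file.path(mydir, f), header = TRUE, row.names = 1,
                   comment.char = "A", sep = "\t")
  assign(f, df)
}
## Step 3: export each data frame, appending the extension .out to its name
for (f in files) {
  write.table(get(f), file.path(mydir, paste0(f, ".out")), quote = FALSE,
              sep = "\t", col.names = NA)
}
list.files(mydir)
```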
(2) Running Web Applications (basics on designing web client/crawling/scraping scripts in R)
Example for obtaining MW values for peptide sequences from the ExPASy pI/MW tool web page.
Running R Programs
(1) Executing an R script from the R console
source("my_script.R")
(2.1) Syntax for running R programs from the command line. To execute the script directly (./my_script.R), the first line of my_script.R needs to contain the following statement:
#!/usr/bin/env Rscript
$ Rscript my_script.R # or just ./my_script.R after making the file executable with 'chmod +x my_script.R'
All commands starting with a '$' sign need to be executed from a Unix or Linux shell.
(2.2) Alternatively, one can use the following syntax to run R programs in BATCH mode from the command line.
$ R CMD BATCH [options] my_script.R [outfile]
The output file lists the commands from the script file and their outputs. If no outfile is specified, the name of the script file is used with
the extension .Rout appended (here my_script.Rout). To stop all the usual R command line information from being written to the outfile, add this as the first line to the my_script.R
file: options(echo=FALSE). If the command is run like this, R CMD BATCH --no-save my_script.R, then nothing will be saved in the
.RData file, which can often get very large. More on this can be found on the help pages: $ R CMD BATCH --help or ?BATCH.
(2.3) Another alternative for running R programs as silently as possible.
$ R --slave < my_script.R
The argument --slave (named --no-echo in R 4.0 and later) makes R run as 'quietly' as possible.
(3) Passing Command-Line Arguments to R Programs
Create an R script, here named test.R, like this one:
######################
myarg <- commandArgs(trailingOnly=TRUE) # returns only the arguments given after the script name
print(iris[1:as.numeric(myarg[1]), ])
######################
Then run it from the command line like this:
$ Rscript test.R 10
In the given example the number 10 is passed on from the command line as an argument to the R script, which uses it to print to STDOUT
the first 10 rows of the iris sample data. If several arguments are provided, each one becomes a separate element of the character vector
returned by commandArgs(trailingOnly=TRUE). Since all arguments arrive as character strings, numeric values need to be converted, e.g. with as.numeric.
(4) Submitting R script to a Linux cluster via Torque
Create the following shell script my_script.sh
#################################
#!/bin/bash
cd $PBS_O_WORKDIR
R CMD BATCH --no-save my_script.R
#################################
This script doesn't need to have executable permissions. Use the following qsub command to send this shell script to the Linux cluster from
the directory where the R script my_script.R is located. To utilize several CPUs on the Linux cluster, one can divide the input data into
several smaller subsets and execute for each subset a separate process from a dedicated directory.
$ qsub my_script.sh
Here is a short R script that generates the required files and directories automatically and submits the jobs to the nodes: submit2cluster.R. For
more details, see also this 'Tutorial on Parallel Programming in R' by Hanna Sevcikova
(5) Submitting jobs to Torque or any other queuing/scheduling system via the BatchJobs package. This package provides one of the most
advanced resources for submitting jobs to queuing systems from within R. A related package is BiocParallel from Bioconductor which
extends many functionalities of BatchJobs to genome data analysis. Useful documentation for BatchJobs: Technical Report, GitHub page,
Slide Show, Config samples.
library(BatchJobs)
loadConfig(conffile = ".BatchJobs.R")
## Loads configuration file. Here .BatchJobs.R containing just this line:
## cluster.functions <- makeClusterFunctionsTorque("torque.tmpl")
## The template file torque.tmpl is expected to be in the current working
## directory. It can be downloaded from here:
## https://github.com/tudo-r/BatchJobs/blob/master/examples/cfTorque/simple.tmpl
getConfig() # Returns BatchJobs configuration settings
reg <- makeRegistry(id="BatchJobTest", work.dir="results")
## Constructs a registry object. Output files from R will be stored under directory "results",
## while the standard objects from BatchJobs will be stored in the directory "BatchJobTest-files".
print(reg)
## Some test function
f <- function(x) {
system("ls -al >> test.txt")
x
}
## Adds jobs to registry object (here reg)
ids <- batchMap(reg, fun=f, 1:10)
print(ids)
showStatus(reg)
## Submit jobs or chunks of jobs to batch system via cluster function
done <- submitJobs(reg, resources=list(walltime=3600, nodes="1:ppn=4", memory="4gb"))
## Load results from BatchJobTest-files/jobs/01/1-result.RData
Object-Oriented Programming (OOP)
R supports two systems for object-oriented programming (OOP): an older S3 system and a more recently introduced S4 system. The latter is
more formal, supports multiple inheritance, multiple dispatch and introspection. Many of these features are not available in the older S3
system. In general, the OOP approach taken by R is to separate the class specifications from the specifications of generic functions (function-centric
system). The following introduction is restricted to the S4 system since it is nowadays the preferred OOP method for R. More
information about OOP in R can be found in the following introductions: Vincent Zoonekynd's introduction to S3 Classes, S4 Classes in 15
pages, Christophe Genolini's S4 Intro, The R.oo package, BioC Course: Advanced R for Bioinformatics, Programming with R by John
Chambers and R Programming for Bioinformatics by Robert Gentleman.
Define S4 Classes
Class: the name of the class
representation: the slots that the new class should have and/or other classes that this class extends.
prototype: an object providing default data for the slots.
contains: the classes that this class extends.
validity: a function to check the validity of objects from this class.
access, version: control arguments included for compatibility with S-Plus.
where: the environment to use to store or remove the definition as meta data.
(B) The function new creates an instance of a class (here myclass)
Class: the name of the class
...: Data to include in the new object with arguments according to slots in class definition.
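The class definition and instantiation code is not preserved here. A minimal sketch, assuming a class named myclass with a single slot a of type "matrix" (the slot name and type are illustrative choices consistent with the later examples):

```r
## (A) Class definition with setClass()
setClass(Class = "myclass",
         representation = representation(a = "matrix"),
         prototype = prototype(a = matrix(numeric(0), 0, 0)),   # default: empty matrix
         validity = function(object) TRUE)                       # placeholder validity check
## (B) Create an instance with new(); slot 'a' receives a 10 x 5 matrix
myobj <- new("myclass", a = matrix(1:50, 10, 5,
                                   dimnames = list(letters[1:10], letters[1:5])))
myobj
```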
(C) A more generic way of creating class instances is to define an initialization method (details below)
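A sketch of an initialization method, assuming the same illustrative one-slot class. The rule applied here (adding default row and column names when none are supplied) is a made-up example; callNextMethod() hands the final slot filling over to the default initialize method.

```r
setClass("myclass", representation(a = "matrix"))
setMethod("initialize", "myclass", function(.Object, ..., a = matrix(numeric(0), 0, 0)) {
  ## Example rule (assumed): add letter names when the matrix has none
  if (length(a) > 0 && is.null(dimnames(a))) {
    dimnames(a) <- list(letters[seq_len(nrow(a))], letters[seq_len(ncol(a))])
  }
  callNextMethod(.Object, ..., a = a)   # default method fills the slots
})
myobj <- new("myclass", a = matrix(1:4, 2, 2))
myobj@a
```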
(D) Usage and helper functions
myobj@a # The '@' extracts the contents of a slot. Usage should be limited to internal functions!
initialize(.Object=myobj, a=as.matrix(cars[1:3,])) # Creates a new S4 object from an old one.
# removeClass("myclass") # Removes the class definition from the current session; does not apply to associated methods.
(E) Inheritance: allows one to define new classes that inherit all properties (e.g. data slots, methods) from their existing parent classes
The argument contains allows one to extend existing classes; this propagates all slots of parent classes.
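A sketch of inheritance with illustrative class names myclass1, myclass2 and myclass3: the third class extends the first two through the contains argument and therefore carries all four slots.

```r
setClass("myclass1", representation(a = "character", b = "character"))
setClass("myclass2", representation(c = "numeric", d = "numeric"))
setClass("myclass3", contains = c("myclass1", "myclass2"))   # inherits slots a, b, c and d
myobj3 <- new("myclass3", a = letters[1:4], b = letters[1:4],
              c = c(1, 2, 3, 4), d = c(4, 3, 2, 1))
getClass("myclass1")    # lists the known subclasses, here myclass3
getClass("myclass3")    # lists all slots and the extended classes
```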
(F) Coerce objects to another class
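A sketch of a coercion method with setAs(), assuming the illustrative one-slot class from above; once defined, as(object, "character") dispatches to it.

```r
setClass("myclass", representation(a = "matrix"))
myobj <- new("myclass", a = matrix(1:6, 2, 3))
setAs(from = "myclass", to = "character",
      def = function(from) as.character(from@a))   # flatten the matrix to strings
as(myobj, "character")                             # "1" "2" "3" "4" "5" "6"
```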
(G) Virtual classes are constructs for which no instances will be or can be created. They are used to link together classes which may have
distinct representations (e.g. classes that cannot inherit from each other) but for which one wants to provide similar functionality. Often it is desired to
create a virtual class and to then have several other classes extend it. Virtual classes can be defined by leaving out the representation
argument or including the class VIRTUAL:
setClass("myVclass")
setClass("myVclass", representation(a = "character", "VIRTUAL"))
(H) Functions to introspect classes
getClass("myclass")
getSlots("myclass")
slotNames("myclass")
extends("myclass2")
Assign Generics and Methods
Assign generics and methods with setGeneric() and setMethod()
(A) Accessor function (to avoid usage of '@')
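A sketch of an accessor, assuming the illustrative one-slot class myclass and a hypothetical accessor name acc: setGeneric() creates the generic, and setMethod() provides its implementation for myclass.

```r
setClass("myclass", representation(a = "matrix"))
setGeneric(name = "acc", def = function(obj) standardGeneric("acc"))
setMethod(f = "acc", signature = "myclass", definition = function(obj) obj@a)
myobj <- new("myclass", a = matrix(1:4, 2, 2))
acc(myobj)     # returns the matrix in slot 'a' without using '@'
```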
(B.1) Replacement method using custom accessor function (acc<-)
(B.2) Replacement method using "[" operator ([<-)
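A sketch of both replacement methods, assuming the illustrative one-slot class and the hypothetical accessor name acc. The "[<-" operator is already generic, so only the custom "acc<-" needs setGeneric(). After the replacements, the first row of slot a reads 999 999 21 31 41.

```r
setClass("myclass", representation(a = "matrix"))
myobj <- new("myclass", a = matrix(0, 2, 2))

## (B.1) generic and replacement method for the custom accessor
setGeneric(name = "acc<-", def = function(obj, value) standardGeneric("acc<-"))
setReplaceMethod(f = "acc", signature = "myclass", definition = function(obj, value) {
  obj@a <- value            # replace the whole slot
  obj
})
acc(myobj) <- matrix(1:50, 10, 5, dimnames = list(letters[1:10], letters[1:5]))

## (B.2) replacement method for the "[" operator
setReplaceMethod(f = "[", signature = "myclass", definition = function(x, i, j, ..., value) {
  x@a[i, j] <- value        # replace selected values inside slot 'a'
  x
})
myobj[1, 1:2] <- 999        # first two values of row 'a' become 999
myobj@a[1:2, ]
```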
(C) Define behavior of "[" subsetting operator (no generic required!)
setMethod(f="[", signature="myclass",
definition=function(x, i, j, ..., drop) {
x@a <- x@a[i,j]
return(x)
})
myobj[1:2,] # Standard subsetting works now on new class
An object of class "myclass"
Slot "a":
a b c d e
a 999 999 21 31 41
b 2 12 22 32 42
...
(D) Define print behavior
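A sketch of a custom print behavior for the illustrative one-slot class: S4 objects are printed by the show method, so defining show for myclass changes what auto-printing displays. The summary line and the three-row preview are arbitrary choices.

```r
setClass("myclass", representation(a = "matrix"))
setMethod(f = "show", signature = "myclass", definition = function(object) {
  cat("An instance of class", class(object), "with", nrow(object@a),
      "rows and", ncol(object@a), "columns\n")
  print(head(object@a, 3))          # preview only the first three rows
  invisible(NULL)
})
myobj <- new("myclass", a = matrix(1:50, 10, 5))
myobj                               # auto-printing calls the show method
```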
(E) Define a data specific function (here randomize row order)
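A sketch of a data-specific generic, named randomize to match the method lookups shown later in this section; its implementation here (permuting the rows of slot a with sample()) is an assumption for illustration.

```r
setClass("myclass", representation(a = "matrix"))
setGeneric(name = "randomize", def = function(obj) standardGeneric("randomize"))
setMethod(f = "randomize", signature = "myclass", definition = function(obj) {
  obj@a <- obj@a[sample(nrow(obj@a)), , drop = FALSE]   # random row order
  obj
})
myobj <- new("myclass", a = matrix(1:50, 10, 5,
                                   dimnames = list(letters[1:10], letters[1:5])))
set.seed(1)
randomize(myobj)@a[1:3, ]
```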
(F) Define a graphical plotting function and allow user to access it with generic plot function
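A sketch of a plotting method, assuming the illustrative one-slot class: setMethod("plot", ...) turns the base plot function into an S4 generic (R reports this when the method is defined), so users can call plot() on the object. Drawing grouped bars with barplot() is an arbitrary choice.

```r
setClass("myclass", representation(a = "matrix"))
setMethod(f = "plot", signature = "myclass", definition = function(x, y, ...) {
  barplot(x@a, beside = TRUE, ...)    # one group of bars per column of slot 'a'
})
myobj <- new("myclass", a = matrix(1:50, 10, 5,
                                   dimnames = list(letters[1:10], letters[1:5])))
plot(myobj)
```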
(G) Functions to inspect methods
showMethods(class="myclass")
findMethods("randomize")
getMethod("randomize", signature="myclass")
existsMethod("randomize", signature="myclass")
Building R Packages
To get familiar with the structure, building and submission process of R packages, users should carefully read the documentation on this topic
available on these sites:
Writing R Extensions, R web site
R Packages, by Hadley Wickham
R Package Primer, by Karl Broman
Package Guidelines, Bioconductor
Advanced R Programming Class, Bioconductor
Short Overview of Package Building Process
(A) Automatic package building with the package.skeleton function:
Note: this is an optional but very convenient function to get started with a new package. The given example will create a directory named
mypackage containing the skeleton of the package for all functions, methods and classes defined in the R script(s) passed on to the
code_files argument. The basic structure of the package directory is described here. The package directory will also contain a file named
'Read-and-delete-me' with the following instructions for completing the package:
Edit the help file skeletons in man, possibly combining help files for multiple functions.
Edit the exports in NAMESPACE, and add necessary imports.
Put any C/C++/Fortran code in src.
If you have compiled code, add a useDynLib() directive to NAMESPACE.
Run R CMD build to build the package tarball.
Run R CMD check to check the package tarball.
Read Writing R Extensions for more information.
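The package.skeleton call itself is not preserved here. A minimal sketch, assuming a throw-away script containing one hypothetical function (myfct) and a temporary output directory instead of the working directory:

```r
## Example R script with one function (assumed content)
myscript <- tempfile(fileext = ".R")
writeLines("myfct <- function(x) x^2", myscript)
pkgdir <- tempdir()
## Build the skeleton of 'mypackage' around the functions defined in the script
package.skeleton(name = "mypackage", path = pkgdir, code_files = myscript, force = TRUE)
list.files(file.path(pkgdir, "mypackage"))   # includes DESCRIPTION, NAMESPACE, R and man
```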
(B) Once a package skeleton is available one can build the package from the command line (Linux/OS X):
$ R CMD build mypackage
$ R CMD check mypackage_1.0.tar.gz
(C) Install package from source:
Linux:
install.packages("mypackage_1.0.tar.gz", repos=NULL)
OS X:
install.packages("mypackage_1.0.tar.gz", repos=NULL, type="source")
Windows requires a zip archive for installing R packages, which can be most conveniently created from the commandline (Linux/OS X) by
installing the package in a local directory (here tempdir) and then creating a zip archive from the installed package directory:
$ mkdir tempdir
$ R CMD INSTALL -l tempdir mypackage_1.0.tar.gz
$ cd tempdir
$ zip -r mypackage mypackage
## The resulting mypackage.zip archive can be installed under Windows like this:
install.packages("mypackage.zip", repos=NULL)
This procedure only works for packages which do not rely on compiled code (C/C++). Instructions to fully build an R package under Windows
can be found here and here.
(D) Maintain/expand an existing package:
Add new functions, methods and classes to the script files in the ./R directory in your package
Add their names to the NAMESPACE file of the package
Additional *.Rd help templates can be generated with the prompt*() functions like this:
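A minimal sketch of generating such a template with prompt() for a hypothetical function myfct, written to a temporary file; S4 classes and methods have the related promptClass() and promptMethods() functions in the methods package.

```r
myfct <- function(x, y = 2) x^y                 # hypothetical example function
rdfile <- file.path(tempdir(), "myfct.Rd")
prompt(myfct, filename = rdfile)                # writes an *.Rd help template for myfct
```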
The resulting *.Rd help files can be edited in a text editor and properly rendered and viewed from within R like this:
library(tools)
Rd2txt("./mypackage/man/myfct.Rd") # renders *.Rd files as they look in final help pages
checkRd("./mypackage/man/myfct.Rd") # checks *.Rd help file for problems
(E) Submit package to a public repository
The best way of sharing an R package with the community is to submit it to one of the main R package repositories, such as CRAN or
Bioconductor. The details about the submission process are given on the corresponding repository submission pages:
Submitting to Bioconductor (guidelines, submission, svn control, build/checks release, build/checks devel)
Submitting to CRAN
Reproducible Research by Integrating R with Latex or Markdown
See Sweave/Stangle sections of Slide Show for this manual
Sweave Manual
R Markdown
knitr manual
RStudio's Rpubs page
R Programming Exercises
Exercise Slides
[ Slides ] [ Exercises ] [ Additional Exercises ]
Download one of the above exercise files, then start editing this R source file with a programming text editor, such as Vim, Emacs or one of the
R GUI text editors. Here is the HTML version of the code with syntax coloring.
Sample Scripts
Batch Operations on Many Files
Large-scale Array Analysis
Sample script to perform large-scale expression array analysis with complex queries: lsArray.R. To demo what the script does, run it like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/lsArray.R")
Graphical Procedures: Feature Map Example
Script to plot feature maps of genes or chromosomes: featureMap.R. To demo what the script does, run it like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/featureMap.txt")
Sequence Analysis Utilities
Includes sequence batch import, subsetting, pattern matching, AA Composition, NEEDLE, PHYLIP, etc. The script 'sequenceAnalysis.R'
demonstrates how R can be used as a powerful tool for managing and analyzing large sets of biological sequences. This example also shows
how easy it is to integrate R with the EMBOSS project or other external programs. The script provides the following functionality:
Batch sequence import into R data frame
Motif searching with hit statistics
Analysis of sequence composition
All-against-all sequence comparisons
Generation of phylogenetic trees
To demonstrate the utilities of the script, users can simply execute it from R with the following source command:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/sequenceAnalysis.txt")
Pattern Matching and Positional Parsing of Sequences
Functions for importing sequences into R, retrieving reverse and complement of nucleotide sequences, pattern searching, positional parsing
and exporting search results in HTML format: patternSearch.R. To demo what the script does, run it like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/patternSearch.R")
Identify Over-Represented Strings in Sequence Sets
Functions for finding over-represented words in sets of DNA, RNA or protein sequences: wordFinder.R. To demo what the script does, run it
like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/wordFinder.R")
Translate DNA into Protein
Script 'translateDNA.R' for translating NT sequences into AA sequences (requires a codon table). To demo what the script does, run it like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/translateDNA.R")
Subsetting of Structure Definition Files (SDF)
Script for importing and subsetting SDF files: sdfSubset.R. To demo what the script does, run it like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/sdfSubset.R")
Managing Latex BibTeX Databases
Script for importing BibTeX databases into R, retrieving the individual references with a full-text search function and viewing the results in R or
in HubMed: BibTex.R. To demo what the script does, run it like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/BibTex.R")
Loan Payments and Amortization Tables
This script calculates monthly and annual mortgage or loan payments, generates amortization tables and plots the results: mortgage.R. To
demo what the script does, run it like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/mortgage.R")
Course Assignment: GC Content, Reverse & Complement
Apply the above information to write a function that calculates for a set of DNA sequences their GC content and generates their
reverse and complement. Here are some useful commands that can be incorporated in this function:
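The original list of commands is not preserved here. The following sketch shows some commands that could be incorporated into such a function; the two sequences are made up, and chartr() as used here only handles uppercase A, C, G and T.

```r
myseq <- c(seq1 = "ATGCGGCTAA", seq2 = "GGCCATGC")   # example DNA sequences (made up)
## GC content: fraction of G and C characters in each sequence
gc_content <- sapply(strsplit(myseq, ""), function(x) mean(x %in% c("G", "C")))
## Reverse: split into characters, reverse them, paste them back together
rev_seq <- sapply(strsplit(myseq, ""), function(x) paste(rev(x), collapse = ""))
## Complement: translate each base with chartr()
comp_seq <- chartr("ATGC", "TACG", myseq)
## Reverse complement: complement of the reversed sequence
revcomp <- chartr("ATGC", "TACG", rev_seq)
gc_content   # seq1 0.5, seq2 0.75
```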
Translation of this Page
Serbo-Croatian version translated by Jovana Milutinovich