Documente Academic
Documente Profesional
Documente Cultură
INSTALL R
You first need to install R on your computer.
1. Go to http://cran.r-project.org
Here is a screenshot of what youll see:
Install R Studio
Next you should install R Studio, which is an integrated development environment (IDE)
for the R programming language.
If youre a veteran programmer, youll know what an IDE is, but if youre new to
programming, think of an IDE as an application that allows you to write, execute, and
administer your code, as well as manage your data and the outputs of your analysis all in
one place.
1. Go to http://www.rstudio.com/products/rstudio/download/
2. Follow the instructions for installation for your particular platform (i.e., Mac,
Windows, Linux)
Install Packages
Next, you want to install packages. Packages are extensions of the R programming
language. R has pre-built functions and datasets that come with the stock R installation,
but sometimes you want to do things that are outside of Rs basic function set. In this
case, you can extend R with new functions by installing packages.
The primary package that we need to install is called GGPlot2. GGPlot2 is a graphics
package that makes data visualization extremely easy.
Data Visualization is the best skill area to start with for a couple of reasons. First, its easy
to get started. Its easy to find data sets that are ready to be visualized. Second, data
visualization is a quick win. If you can learn visualization fairly rapidly. Third, its
ubiquitous in analytics; whether youre creating a report, doing exploratory analysis, or
creating a machine learning algorithm (which you will need to present to managers,
coworkers, business partners, etc) youll need to use visualization as an exploratory or
communication tool. Lastly, in the analytics world, few people are actually good at
visualization. Truthfully, its one of the most necessary skills, but few analysts are truly
exceptional at it (so if you can master it early, youll be able to stand out).
To get you started, Im going to teach you GGPlot2, the best tool for data visualization.
GGPlot2 is excellent because its highly structured. I wont go into detail here, but Ill
note that the structure of GGPlot2 will help you learn how to think about visualizing data.
Once you learn the syntax as well as the overall conceptual framework, creating
insightful visualizations becomes much, much easier.
(Note: the following shows how this is done on a Mac. The basic procedures should be
very similar on Windows or Linux.)
1. Open R Studio
2.
1. Bar chart
2. Scatter plot
I want to point out that while these are fairly basic visualizations, when we take a closer
look, they show something important.
Each one is made up of basic components - namely bars, lines, and points respectively.
These components (which we will later refer to as geoms) are building blocks of more
complex visualizations.
So, dont be deceived. Yes, these are somewhat basic. But if you can master these (and
some other) basic building blocks, youll later be able to create much more complex,
insightful, and valuable visualizations.
Bar Chart
Here is the code for a simple bar chart:
ggplot(data=diamonds, aes(x=diamonds$cut)) +
geom_bar()
Scatter Plot
ggplot(data=diamonds, aes(x = carat, y = price)) +
geom_point()
Closing Notes
There you have it. These are the basic charts you need to know to get started analyzing
your data. To be clear, there are many variations on each of the above three charts (such
as the stacked bar chart), however, many of those variations can be created with just
one or two extra lines of code, added to the code above. That is, these charts form the
basis for more complex variations of bar, line, and scatter plots. Moreover, the principles
you learn by mastering these basic charts will enable you to start producing much more
complicated plots, much easier and faster than you probably thought possible.
Want to learn more? Im going to continue where this PDF ends on the blog.
Keep reading at sharpsightlabs.com and Ill teach you everything you need to know to get
started - and eventually master - analyzing your data.