
When Should I Use Python vs. R?

Python and R are both great programming languages for data science and
analytics. Since they’re open-source, they’re free to download for everyone,
unlike commercial tools like SAS and SPSS. Find out their strengths and
weaknesses and figure out which is better for your specific use cases.

Purpose
Either language is suitable for almost any data science task, from data manipulation
and automation to ad-hoc analysis and exploring datasets. Users may leverage both
languages for different purposes, e.g., conducting early-stage data analysis and
exploration in R, then switching to Python when it’s time to ship some data products.

Choosing Python vs. R


It’s up to the individual data scientist or data analyst to choose the language that best fits
their unique needs. The following questions may help with that decision.

1 Which language do your colleagues use?


The benefits of being able to share code with your colleagues and maintaining a simpler
software stack outweigh any benefits of one language over another.

2 What problems do you want to solve and what tasks do you need to accomplish?

3 What are the net costs of learning a language?


It will take time to learn a new system that is better aligned with the problem you want to
solve, but staying with the system you know may not be a fit for that problem.

4 What are the commonly used tool(s) in your field?

Who It’s Used By


Python:
Python is used by programmers that want to delve into data analysis or apply
statistical techniques, and by developers and programmers that turn to data science.

Python is a production-ready language, meaning it has the capacity to be a single
tool that integrates with every part of your workflow!

R:
R has been used primarily in academics and research and is great for exploratory data
analysis. In recent years, enterprise usage has rapidly expanded.

R is used by statisticians, engineers, and scientists without computer programming
skills. It’s popular in academia, finance, pharmaceuticals, media, and marketing.

Usability
Python:
People with a software engineering background may find Python comes more
naturally to them than R.

Coding and debugging is easy because of the simple syntax.

The indentation of code affects its meaning.

Any piece of functionality is typically written the same way with Python.

R:
If you have no coding experience, then R may be easier to learn.

Statistical models can be written with only a few lines.

The same piece of functionality can be written in several ways with R.
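
As a minimal sketch of the point that indentation affects meaning, the two functions
below differ only in how far one line is indented. The function names and the sample
values are hypothetical and chosen purely for illustration.

```python
def total_inside_loop(values):
    """Double the running total on every iteration."""
    total = 0
    for v in values:
        total += v
        total *= 2  # indented under the loop: runs once per element
    return total


def total_after_loop(values):
    """Double the total once, after all values are summed."""
    total = 0
    for v in values:
        total += v
    total *= 2  # dedented: runs a single time after the loop ends
    return total


print(total_inside_loop([1, 2, 3]))  # 22
print(total_after_loop([1, 2, 3]))   # 12
```

The same statement gives different results depending only on its indentation level,
which is why mixed or accidental indentation is worth checking when debugging.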

Ecosystem
Python:
Python has a robust ecosystem and is commonly considered one of the easier
programming languages to read and learn. Its programming syntax is simple and its
commands mimic the English language.

E.g. print("Hello world!")

Python code is syntactically clear and elegant, easily interpretable, and easy to
type.

It’s great for building data science pipelines and machine learning products
integrated with web frameworks at scale. But watch out for dependencies and
installing Python libraries!

The Python Package Index (PyPI) and Anaconda are repositories that host Python
libraries. Users can contribute to these repositories, but it’s a bit complicated
in practice to do so.

R:
R has a rich ecosystem of cutting-edge interface packages available to communicate
between open-source languages. This allows users to string their workflows together,
which is especially useful for data analysis.

Packages are available at:
Comprehensive R Archive Network (CRAN): “Task Views” page lists a wide range of
tasks for which R packages are available and users can easily contribute to.
Bioconductor: Open source software for bioinformatics
GitHub: Web-based Git repository hosting service

Search through these sources easily with Rdocumentation.

Packages are collections of R functions, data, and compiled code. They can be
installed in R with one line.
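
The caution about dependencies and installing Python libraries can be made concrete
with a small check. The sketch below uses the standard library module
importlib.metadata (Python 3.8 or later) to report which required packages are
missing from the current environment. The helper names and the package names in the
usage lines are hypothetical, and this is one possible approach rather than a
recommended workflow.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def missing_packages(package_names):
    """List the packages from package_names that are not installed."""
    return [name for name in package_names if installed_version(name) is None]


# Hypothetical requirement list for illustration only.
required = ["pandas", "hypothetical-package-that-does-not-exist-12345"]
print(missing_packages(required))
```

Running a check like this at the start of a pipeline surfaces a missing dependency
early instead of partway through a job.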

Flexibility
Python:
Python is flexible for creating something that has never been done before.
Developers can also use it for scripting websites or other applications.

R:
It’s easy to use complex functions in R. All kinds of statistical tests and models
are readily available and easily used.

Ease of Learning
Python:
Python’s focus on readability and simplicity means its learning curve is relatively
linear and smooth.

Python is considered a good language for beginner programmers.

R:
R is easier to learn when you start out, but the intricacies of advanced
functionalities make it more difficult to develop expertise.

R is not hard for experienced programmers to learn.

Advantages
Python:
General-purpose programming languages are useful beyond just data analysis.

Has gained popularity for its code readability, speed, and many functionalities.

Great for mathematical computation and learning how algorithms work.

Has high ease of deployment and reproducibility.

R:
Widely considered the best tool for making beautiful graphs and visualizations.

Has many functionalities for data analysis.

Great for statistical analysis.

Built around a command line, but the majority of R users work inside of RStudio,
an environment that includes a data editor, debugging support, and a window to
hold graphics as well.

Disadvantages
Python:
Python doesn’t have as many specialized statistics libraries as R, and there are no
module replacements for the hundreds of essential R packages.

Python requires rigorous testing as errors show up at runtime.

Visualizations are more convoluted in Python than in R, and results are not as
eye-pleasing or informative.

Python packages for data visualization:
seaborn: Library based on Matplotlib
Bokeh: Interactive visualization library
Pygal: Create dynamic SVG charts

R:
For people with no software engineering experience, base R can be more difficult to
learn because it was developed by statisticians, not to make coding easier.
But R has a set of packages known as the Tidyverse, which provides powerful yet
easy-to-learn tools for importing, manipulating, visualizing, and reporting on data.

Finding the right packages to use in R may be time consuming.

There are many dependencies between R libraries.

R can be considered slow if code is written poorly.

Not as popular as Python for deep learning and NLP.
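
The Python point above, that errors show up at runtime, can be illustrated with a
short sketch. The function below is accepted by the interpreter without complaint,
and its problems only appear when it is called with particular inputs. The names
mean_length and runtime_error_name are hypothetical helpers written for this
illustration.

```python
def mean_length(words):
    """Return the average length of the given words."""
    return sum(len(w) for w in words) / len(words)


def runtime_error_name(func, *args):
    """Call func(*args) and return the raised exception's name, or None."""
    try:
        func(*args)
    except Exception as exc:
        return type(exc).__name__
    return None


print(runtime_error_name(mean_length, ["ab", "abcd"]))  # None
print(runtime_error_name(mean_length, []))              # ZeroDivisionError
print(runtime_error_name(mean_length, [1, 2]))          # TypeError
```

Nothing flags the empty-list or wrong-type cases before execution, which is why
tests that exercise such inputs matter for Python code headed to production.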

Use Case: Data Analysis

Usage
Python:
Python is generally used when the data analysis tasks need to be integrated with web
apps or if statistics code needs to be incorporated into a production database.

Since it’s a full-fledged programming language, Python is a good tool to implement
algorithms for use in production.

R:
R is mainly used when the data analysis tasks require standalone computing or
analysis on individual servers.

For exploratory work, R is easier for beginners. Statistical models can be written
with a few lines of code.
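
To illustrate what integrating an analysis task with a web app can look like, here
is a minimal sketch that computes a small summary and serializes it as JSON, the
format a web endpoint would typically return. The function name analysis_payload
and the chosen fields are assumptions made for this example. No particular web
framework is implied.

```python
import json
import statistics


def analysis_payload(values):
    """Summarize values and serialize the result as a JSON string."""
    summary = {
        "count": len(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }
    return json.dumps(summary)


# A web handler could return this string as the response body.
print(analysis_payload([1, 2, 3]))
```

Because the analysis and the serialization live in the same language as the
application code, no hand-off between tools is needed.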

Data Handling Capabilities

Python:
Python requires users to install packages for data analysis, and these packages have
greatly improved in recent years.

NumPy and pandas, among others, are popular for data analysis.

R:
R is great for data analysis because of its huge number of packages, readily usable
tests, and the advantage of using formulas.

It can handle basic data analysis without needing to install packages. Big datasets
require the use of packages such as data.table and dplyr.
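
As a rough illustration of why Python users install packages for data handling, the
sketch below computes a grouped mean by hand using only the standard library. The
function name mean_by_group and the sample rows are hypothetical. With pandas
installed, the same result would normally come from a single groupby(...).mean()
call on a DataFrame.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(rows, key, value):
    """Return the mean of row[value] for each distinct row[key]."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {group: mean(vals) for group, vals in groups.items()}


# Hypothetical sample data for illustration only.
rows = [
    {"species": "a", "length": 1.0},
    {"species": "a", "length": 3.0},
    {"species": "b", "length": 5.0},
]
print(mean_by_group(rows, "species", "length"))  # {'a': 2.0, 'b': 5.0}
```

Hand-written loops like this work for small inputs, but dedicated packages add
labeled columns, missing-value handling, and vectorized speed.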

Getting Started

IDE

Python:
There are many Python IDEs to choose from which drastically reduce the overhead of
organizing code, output, and notes files. Jupyter Notebooks and Spyder are popular,
and Jupyter Lab is gaining traction.
Tip: Also try Rodeo, the “data science IDE for Python.”

R:
RStudio is the most popular R IDE. It’s available in two formats: RStudio Desktop for
running locally as a regular desktop application and RStudio Server for access via
web browser while running on a remote Linux server.

Popular Libraries and Packages

Python:
pandas to easily manipulate data
SciPy and NumPy for scientific computing
Scikit-learn for machine learning
Matplotlib and seaborn to make graphics
statsmodels to explore data, estimate statistical models, and perform statistical
tests and unit tests

R:
dplyr, tidyr and data.table to easily manipulate data
stringr to manipulate strings
zoo to work with regular and irregular time series
ggplot2 to visualize data
caret for machine learning

“R is currently head-and-shoulders above Python for data analysis, but I remain
convinced that Python can catch up, easily and quickly.”
JAN GALKOWSKI, COMPUTATIONAL ENGINEER

Support and Communities

Python:
DataCamp Slack Community
Stack Overflow
Reddit Python
PyLadies
pydata
pystatsmodels
numpy-discussion and scipy-user

R:
DataCamp Slack Community
Stack Overflow
Reddit rstats
Rdocumentation
R-help
ROpenSci
Jumping Rivers list of local R User Groups

Trends and Highlights


Popularity Rankings, Python vs. R

Year    Python    R
2016    #3        #5
2017    #1        #6
2018    #1        #7
2019    #1        #5

Source: IEEE Spectrum, Sep 2019

Popularity on Stack Overflow, Python vs. R

[Figure: line chart of the percentage of Stack Overflow questions per month for
Python and R, by year from 2009 to 2017; y-axis from 0.00% to 9.00%.]

Source: DZone

Python, R, both, or other platforms for data science

[Figure: bar chart of the share of respondents using Python, R, both, or other
platforms in 2016 and in 2017; y-axis from 0% to 70%. Bar labels, in extraction
order: 42%, 41%, 34%, 36%, 16%, 12%, 11%, 9%.]

Source: DZone

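SAS, R or Python preference by years of experience

[Figure: bar chart of SAS, R, and Python preference for respondents with 0-5 years,
6-15 years, and 16+ years of experience; y-axis from 0% to 60%. Bar labels, in
extraction order: 48%, 47%, 38%, 36%, 33%, 31%, 26%, 27%, 14%.]

Source: Burtch Works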

SAS, R or Python preference by industry

[Figure: bar chart of SAS, R, and Python preference by industry: Tech/Telecom,
Consulting, Other Corporations, Adv/Marketing Services, Financial Services,
Retail/CPG, and Healthcare/Pharma; y-axis from 0% to 50%.]

Source: Burtch Works

User loyalty, Python vs. R

74% of R users remain loyal to R

91% of Python users remain loyal to Python

10% switch from R to Python

5% switch from Python to R

Source: KDnuggets polls 2016

Learn Python and R online with DataCamp

The skills people and businesses need to succeed are changing. No matter where you
are in your career or what field you work in, you will need to understand the
language of data. With DataCamp, you learn data science today and apply it tomorrow.

