If you program enough, it can change the way you look at the world
6/13/2016
Figures 1 and 5 from Fourment and Gillings (2008) illustrate the tradeoff between lines of code written and the
speed at which a global alignment program runs to completion. Notice that the compiled and semi-compiled
languages run much faster, but can take more lines of code to write. The semi-compiled languages (C# and Java),
however, do not necessarily take more lines (though see notes on Perl below).
Fourment, M. and Gillings, M.R. 2008. A comparison of common programming languages used in bioinformatics.
BMC Bioinformatics 9: 82.
See also:
Dudley, J.T. and Butte, A.J. 2009. A quick guide for developing effective bioinformatics programming skills. PLoS
Computational Biology 5(12).
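For a sense of what a global alignment program has to compute, here is a minimal Python sketch of the standard Needleman-Wunsch scoring recurrence with a linear gap penalty. It is not the code benchmarked by Fourment and Gillings; the function name and the scoring values (match, mismatch, gap) are assumptions chosen only for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b.

    Hypothetical scoring scheme: +1 match, -1 mismatch, -2 per gap position.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Column 0: aligning a[:i] against an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("ACGT", "AGT"))         # 1
```

The nested loop is where interpreted languages tend to lose time relative to compiled ones, since every cell of the table costs several interpreter-level operations.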
10 Comments
4 years ago
I use Java, and highly recommend it. There are a variety of approaches to creating loops and conditional statements, it is relatively
straightforward to read and write files, and it is very easy to create and manage arrays. Another important advantage is that there are a huge
number of examples and code snippets scattered across the internet for beginners and advanced users alike. Programming for either
command-line executables or GUIs is also easy and intuitive. I think Java has also been designed to be very good at
preventing run-time errors. I tried picking up C and Java simultaneously and found it much easier to lay out a program in Java than in C. I
use R sometimes, but I mostly use Java because it's easier to interact with files and to write much more complex code. Well, those are
my thoughts. I do hope to get into Python sometime soon.
Tim Vines
4 years ago
Does Mathematica count as a language or a program? The lab I did my PhD in was all M'ca, all the time. As a low-level user, I found it had a
much shallower learning curve than R, especially when it came to manipulating lists (R is a complete jerk with lists).
Mark Christie
4 years ago
I still haven't completely mastered the power of lists in R. My understanding is that they can hold many different data types (data.frames,
vectors, matrices, etc.) and so would be useful in large projects. Any avid R list users care to comment?
unionx
3 years ago
Java is good, but I don't recommend it for bioinformatics tasks. The JVM takes a long time to start, and numerical computation in Java is
not as good as in Python or R.
3 years ago
Although I do not use Java myself, one interesting thing I have noticed is that the people running it on our cluster are almost
always (1) running it in parallel and (2) using very few computational resources. My guess is that initial
development in R or Python is a good idea, but that moving to Java or C might be worthwhile when you start scaling
up your applications.
3 years ago
I know some bioinfo guys who just write batch scripts to do some calculations. I am not sure whether they need to build
an online service. Yes, Java is very good for online services, and I use Clojure for that.
http://www.molecularecologist.com/2012/11/a-comparison-of-bioinformatics-programming-languages/
Jon Puritz
3 years ago
I still rely on others to actually write the heavy-duty analysis code, but I find Bash incredibly easy and useful for analysis pipelines. I
highly recommend that every bioinformatician be familiar with what Bash and baseline Unix commands can do for data
manipulation.
Eric Thomas
3 years ago
I used to wave the Java and C++ flags high, but with solid libraries like Biopython and SciPy it's hard to justify the time you would
need to replicate a lot of this in Java. Python is just quick and can handle most things you need. The built-in GUI toolkit (tkinter) doesn't
have as much going for it as Java's, but it usually more than fills the needs of a basic program. After doing this for about a year and a half, I
have all but fully converted to Python.
Matt
2 years ago
For use-once research code that filters or formats data, anyone using C or Java doesn't value their own time and likely just doesn't know
any other languages. I know all of the languages mentioned in this article with the exception of C#, which I have only played with
because portability matters to me. Each has its place, but for day-to-day data munging only Python and Perl are viable options,
with Perl genuinely nicer in syntax for shell-script activities (e.g. no significant whitespace in if statements). For stats and plots, R or
Python with statsmodels and matplotlib are both great. Personally I try to stick more with Python because it's just a lot less clunky in
syntax IMHO and more useful to know if you ever decide to leave science.
If you wish to write a large-scale application that others will contribute to and that is highly algorithmic rather than focused on data
processing, Java and Python are equally viable. If you want something that's solving some serious problems, maybe in a large
combinatorial space, you want to be using C with MPI or OpenMP at a minimum; if you didn't already know this then you aren't solving
really serious combinatorial problems! Most of us aren't, unless we are dealing with short-read assembly or phylogenetic tree search.
The most valuable thing is your own time, and line count isn't a great measure of productivity. In Java you copy and paste the same 50-100
lines every time, so once you have some of your own libs written it's not too bad.
Ultimately, for really simple stuff Perl cannot be beaten, since you can write it inline in the shell: perl -lane 'print $F[2]*$F[4]' <
input.tsv. For this sort of task (pulling out the third and fifth columns of a table and multiplying them, or running any other function,
such as a sequence comparison), Perl mastery cannot be beaten. Converting a history of one-liners like this into a full script at a later
date doesn't take much effort either.
The real issue is that I haven't met a single person in bioinfo who wasn't comp-sci trained get beyond 'intermediate' in
any single language. The OP suggests you need a lot of time to learn many languages. That isn't true: after a deep understanding of two
languages with varied syntax, it gets very easy to learn more. If you know only one language (other than perhaps C) it's not possible to be a
true master, since you lack understanding of how things might be working underneath the high-level syntax. Like Java if you don't …
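For readers who do not use Perl, the one-liner in the comment above can be approximated in Python. This is a minimal sketch, not a drop-in replacement: the function name multiply_columns is hypothetical, columns are zero-indexed like Perl's @F, and skipping lines with too few fields is an assumption (Perl would instead treat the missing fields as zero).

```python
def multiply_columns(lines, first=2, second=4):
    """Yield the product of two whitespace-separated columns for each line.

    Column indices are zero-based, so the defaults pick the third and
    fifth columns, mirroring $F[2] and $F[4] in the Perl one-liner.
    """
    for line in lines:
        fields = line.split()  # like perl -a: split on runs of whitespace
        if len(fields) <= max(first, second):
            continue  # assumption: silently skip short or blank lines
        yield float(fields[first]) * float(fields[second])


# Small in-memory example; the rows are made up for illustration.
sample = ["a b 2 c 3.5", "d e 4 f 0.5"]
for product in multiply_columns(sample):
    print(product)  # prints 7.0, then 2.0
```

To process a real file the same way as the shell version, pass an open file handle or sys.stdin in place of sample. Note that a non-numeric field, such as a header row, raises ValueError here, whereas Perl would quietly treat it as zero.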
Edward Kirton
2 years ago
I've been working in bioinformatics for over a decade and have used a dozen languages over the years, including the ones discussed
above. The bread-and-butter coding of bioinformaticians is writing scripts that wrap powerful third-party programs and
manipulate files, often to create a pipeline (usually on the cluster). For this, the best are Perl, Python, Bash, and maybe C#. Each has
pros and cons. Start with one of these and learn good coding practices (e.g. use of version-controlled repositories with Git, good
documentation habits, test-driven development, agile project management, etc.). Which to learn? I recommend you use whichever one
you can get good coaching on. Do you have someone at work who is willing to answer questions, do code reviews, and do pair
programming with you?
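The wrapper-script style of pipeline described in the comment above might look roughly like this in Python. It is a minimal sketch under stated assumptions: run_step and run_pipeline are hypothetical helper names, each external program is assumed to read its input on stdin and write its result to stdout, and intermediate files are simply named step1.out, step2.out, and so on.

```python
import subprocess
from pathlib import Path


def run_step(cmd, infile, outfile):
    """Run one external command, feeding infile on stdin and saving stdout.

    check=True makes a non-zero exit status raise CalledProcessError,
    so a failed step stops the pipeline instead of passing bad data on.
    """
    with open(infile, "rb") as src, open(outfile, "wb") as dst:
        subprocess.run(cmd, stdin=src, stdout=dst, check=True)
    return Path(outfile)


def run_pipeline(steps, infile, workdir):
    """Run commands in order; each step reads the previous step's output."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    current = Path(infile)
    for number, cmd in enumerate(steps, start=1):
        current = run_step(cmd, current, workdir / f"step{number}.out")
    return current  # path to the final output file
```

For example, on a system with the usual Unix tools on the PATH, run_pipeline([["sort"], ["uniq", "-c"]], "ids.txt", "work") would sort a file of identifiers and then count duplicates, leaving the result in work/step2.out. Passing each command as a list of arguments avoids shell quoting problems, and keeping every intermediate file makes it easier to see which step went wrong.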