
DATABASE TECHNOLOGIES IN BIOINFORMATICS
GLEB SKLYR
THE PROBLEM

Bioinformatics research produces highly irregular and unstructured data
Example: gene EGFR
THE PROBLEM

Emerging technologies allow data to be generated faster, more cheaply, and in larger quantities
Example:

Gebelhoff, Robert. "Sequencing the genome creates so much data we don't know what to do with it." The Washington Post. WP Company, 07 July 2015. Web. 01 May 2017.
THE PROBLEM

Bioinformatics data is generated globally and is stored and processed at multiple sites around the world
Each research center and university has its own data storage solution, and many different centralized repositories exist
Examples:
THE PROBLEM

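Additionally, data analysis algorithms are computationally expensive
Examples:
- Pairwise sequence alignment by dynamic programming (which BLAST approximates heuristically): O(NM) for sequences of lengths N and M
- Multiple Sequence Alignment: exponential in the number of sequences when solved exactly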

Most algorithms use heuristic approaches
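To make the O(NM) cost concrete, here is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). It is not taken from the slides: the function name and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions. Every cell of an N by M table is filled once, which is where the quadratic cost comes from and why heuristic tools are preferred for large databases.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of strings a and b.

    Scoring values are illustrative assumptions, not values from the slides.
    Time is O(N*M); memory is O(M) because only two rows are kept.
    """
    n, m = len(a), len(b)
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        # Column 0: aligning a[:i] against an empty prefix of b costs i gaps.
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            substitute = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            delete = prev[j] + gap      # gap in b
            insert = curr[j - 1] + gap  # gap in a
            curr[j] = max(substitute, delete, insert)
        prev = curr
    return prev[m]


identical = global_alignment_score("GATTACA", "GATTACA")
one_gap = global_alignment_score("ACGT", "AGT")
```

Doubling both sequence lengths roughly quadruples the work, which becomes prohibitive when a query is compared against every sequence in a large repository.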


MOTIVATION

Understand the secret of life: how biology works
Replicate biological processes
Cure disease
Much more
MOTIVATION

Every paper repeats the same 3 points: data is unstructured, scattered, and growing fast (the "data tsunami")
The field faces many problems that individual companies do not, which makes it unique
What solutions exist? What solutions are proposed?
As a database administrator/designer, how can you alleviate the hard work that goes into bioinformatics?
EXISTING WORK - XML IN RDBMS
EXISTING WORK - ORACLE RDBMS
Offers an XML data type
Provides data mining libraries
Continuously adapts to industry standards
ACID: Atomicity, Consistency, Isolation, Durability
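As an illustration of the atomicity part of ACID, the sketch below stores XML annotations as text in a relational table and shows a failed batch being rolled back as a whole. It uses Python's built-in sqlite3 module purely as a stand-in for a relational database; it is not Oracle, it does not use Oracle's XML data type, and the table, column names, and XML snippets are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for an RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene (symbol TEXT PRIMARY KEY, annotation_xml TEXT NOT NULL)"
)


def insert_batch(conn, rows):
    """Insert all rows or none of them; return True on success."""
    try:
        # The connection context manager commits on success
        # and rolls back if an exception is raised.
        with conn:
            conn.executemany("INSERT INTO gene VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


# Hypothetical XML payloads, stored as plain text.
ok = insert_batch(conn, [
    ("EGFR", "<gene symbol='EGFR'/>"),
    ("TP53", "<gene symbol='TP53'/>"),
])
# The second row repeats a primary key, so the whole batch is rejected,
# including the valid BRCA1 row that was inserted just before the failure.
bad = insert_batch(conn, [
    ("BRCA1", "<gene symbol='BRCA1'/>"),
    ("EGFR", "<gene symbol='EGFR'/>"),
])
count = conn.execute("SELECT COUNT(*) FROM gene").fetchone()[0]
```

After both calls the table still holds only the two rows from the first batch, which is the all-or-nothing behavior the Atomicity guarantee describes.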
PROBLEM

Relational databases are constrained by schema and relationships: every row in a table has the same columns, and foreign key constraints must hold
Performance degrades as schema complexity, data volume, and data distribution increase
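The two constraints named above can be seen in a small sketch. It again uses Python's built-in sqlite3 module as a generic relational stand-in; the tables, columns, and the "pathway" attribute are hypothetical and chosen only for illustration. A record carrying an attribute the schema does not know is rejected, and so is a record pointing at a parent row that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE gene (symbol TEXT PRIMARY KEY, chromosome TEXT)")
conn.execute(
    "CREATE TABLE variant ("
    " id INTEGER PRIMARY KEY,"
    " gene_symbol TEXT NOT NULL REFERENCES gene(symbol),"
    " description TEXT)"
)
conn.execute("INSERT INTO gene VALUES ('EGFR', '7')")


def try_sql(sql, params=()):
    """Run one statement; return 'ok' or the name of the raised error class."""
    try:
        with conn:
            conn.execute(sql, params)
        return "ok"
    except sqlite3.Error as exc:
        return type(exc).__name__


# Fits the schema and references an existing gene.
valid = try_sql(
    "INSERT INTO variant (gene_symbol, description) VALUES (?, ?)",
    ("EGFR", "illustrative entry"),
)
# Carries an attribute ('pathway') that the fixed schema does not define.
extra_column = try_sql(
    "INSERT INTO gene (symbol, chromosome, pathway) VALUES (?, ?, ?)",
    ("TP53", "17", "illustrative"),
)
# References a gene that is not present in the parent table.
dangling = try_sql(
    "INSERT INTO variant (gene_symbol, description) VALUES (?, ?)",
    ("UNKNOWN", "illustrative entry"),
)
```

Accepting the extra attribute would require a schema migration first, which is the rigidity that becomes costly when records are as irregular as bioinformatics data.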
SOLUTION - NOSQL SYSTEMS
Are not restricted by schema or relationships
Designed with performance in mind
Designed with data distribution in mind
Highly scalable
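One way to picture "designed with data distribution in mind" is hash partitioning: each record key is hashed to decide which node stores it, so no central index is needed to find a record. The sketch below is a simplified assumption, not the mechanism of any particular product; the node names and key format are hypothetical. Systems such as Cassandra use consistent hashing over a token ring rather than the plain modulo used here, so that adding a node moves only a fraction of the keys.

```python
import hashlib
from collections import Counter


def node_for(key, nodes):
    """Pick the node responsible for a key by hashing the key.

    Simplified modulo placement (assumption); real systems typically use
    consistent hashing so that resizing the cluster moves fewer keys.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


# Hypothetical cluster and record keys.
nodes = ["node-a", "node-b", "node-c"]
keys = [f"sequence-{i}" for i in range(1000)]

placement = {key: node_for(key, nodes) for key in keys}
load = Counter(placement.values())
```

Because placement depends only on the key, any client can compute where a record lives, and the load spreads roughly evenly across nodes as more keys are added.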
SOLUTIONS - MONGODB
UNSTRUCTURED DATA
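The document model can be sketched without a database server. The example below uses plain Python dictionaries and JSON as a stand-in for a document store; it does not call the MongoDB API, and the records, field names, and the find helper are hypothetical. The point is that records in the same collection may have different fields and nesting, with no schema change in between.

```python
import json

# Three records with different shapes in one collection (illustrative data).
collection = [
    {"_id": "EGFR", "type": "gene", "chromosome": "7"},
    {
        "_id": "sim-001",
        "type": "simulation",
        "trajectory": {"frames": 3, "units": "ns"},
    },
    {"_id": "note-1", "type": "annotation", "gene": "EGFR", "tags": ["example"]},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria.

    Documents that lack a requested field simply do not match.
    """
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


genes = find(collection, type="gene")
about_egfr = find(collection, gene="EGFR")
# Documents serialize to JSON and back without loss.
round_trip = json.loads(json.dumps(collection))
```

A query on a field that only some documents carry returns just those documents, which is how a document store copes with the irregular records described earlier.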
SOLUTIONS - CASSANDRA
FOR COMPUTATIONALLY INTENSIVE DATA
CASE STUDY - BIGNASIM
CONCLUSION

NoSQL technologies are the future of bioinformatics
In a field of unstructured, distributed, and rapidly growing data, it is important to be able to pick the right system for your application
BIBLIOGRAPHY
Blackwell, Bruce, and Siva Ravada. "Oracle's technology for bioinformatics and future directions." ACM Digital
Library. Australian Computer Society, Inc., n.d. Web. 03 May 2017.
Alger, Abdullah. "Redis and MongoDB in the biomedical domain." Compose Articles. Compose Articles, 03 Feb. 2017.
Web. 03 May 2017.
Aniceto, Rodrigo, Rene Xavier, Maristela Holanda, Maria Emilia Walter, and Sergio Lifschitz. "Genomic data
persistency on a NoSQL database system." 2014 IEEE International Conference on Bioinformatics and Biomedicine
(BIBM) (2014): n. pag. Web.
Gebelhoff, Robert. "Sequencing the genome creates so much data we don't know what to do with it." The Washington
Post. WP Company, 07 July 2015. Web. 01 May 2017.
Guimaraes, Valeria, Fernanda Hondo, Rodrigo Almeida, Harley Vera, Maristela Holanda, Aleteia Araujo, Maria
Emilia Walter, and Sergio Lifschitz. "A study of genomic data provenance in NoSQL document-oriented database
systems." 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2015): n. pag. Web.
Hospital, Adam, Pau Andrio, Cesare Cugnasco, Laia Codo, Yolanda Becerra, Pablo D. Dans, Federica Battistini, Jordi
Torres, Ramón Goñi, Modesto Orozco, and Josep Ll. Gelpí. "BIGNASim: a NoSQL database structure and analysis
portal for nucleic acids simulation data." Nucleic Acids Research 44.D1 (2015): n. pag. Web.
Lima, Iasmini, Matheus Oliveira, Diego Kieckbusch, Maristela Holanda, Maria Emilia M. T. Walter, Aleteia Araujo,
Marcio Victorino, Waldeyr M. C. Silva, and Sergio Lifschitz. "An evaluation of data replication for bioinformatics
workflows on NoSQL systems." 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2016):
n. pag. Web.
Stromback, Lena, and Juliana Freire. "XML Management for Bioinformatics Applications." Computing in Science &
QUESTIONS
