
Katie Gao

Research Plan/Project Summary

A. The advent of the big data era in functional genomics has created opportunities to study
cancer through a new lens. Despite a wealth of literature on identifying gene signatures
for individual types of cancer, little has been reported on comparing gene signatures
across multiple cancers to identify potential common-cancer gene signatures.
This research aims to identify gene pathways shared by multiple cancers through machine
learning analysis. The findings could be important for future cancer research, because
common-cancer signatures could make diagnostic cancer screening more timely and
accurate.

B. The hypothesis is that common gene signatures can diagnose multiple cancers. The
existence of many single-cancer signatures, together with recent discoveries of multi-cancer
biomarkers, suggests that this hypothesis can be supported. Machine learning is applied
to find an effective method for discovering such signatures.

C. *Procedures for data collection:


1. Download dataset from NCBI-GEO database.
a. Platform: Affymetrix GeneChip Human Genome U133 Plus 2.0 Array
b. Filter: (gds[filter]) AND homo sapiens AND GPL570 AND cancer
2. Create an ExpressionSet R object containing the phenotype data and the gene expression matrix.
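Step 1 can also be scripted against NCBI's E-utilities instead of the GEO web interface. The sketch below only builds an `esearch` request URL for the GEO DataSets database (`db=gds`) from the plan's filter string; it does not send the request or download any series files. The helper name `build_gds_search_url` and the `retmax` default are illustrative assumptions.

```python
from urllib.parse import urlencode

# Base endpoint of NCBI's E-utilities search service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Search string taken from the plan's filter (curated GEO DataSets, human, GPL570, cancer).
GDS_QUERY = "(gds[filter]) AND homo sapiens AND GPL570 AND cancer"


def build_gds_search_url(term: str, retmax: int = 500) -> str:
    """Return an esearch URL for the GEO DataSets (gds) database.

    retmax (hypothetical default) caps how many record IDs the search returns.
    """
    params = {"db": "gds", "term": term, "retmax": retmax}
    # urlencode percent-encodes brackets/parentheses and turns spaces into '+'.
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_gds_search_url(GDS_QUERY))
```

The IDs returned by such a search identify GEO DataSet records; fetching the matching expression files would be a separate step.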
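Step 2 uses Bioconductor's ExpressionSet in R, which pairs an expression matrix (accessed with `exprs()`) with per-sample phenotype data (accessed with `pData()`). As an illustration of what that object must guarantee, here is a minimal Python analogue, assuming expression values arrive as a probes-by-samples matrix and phenotype labels as a per-sample mapping. The class `ExpressionData` and its fields are hypothetical, not part of any library; the point is the invariant that phenotype labels and matrix columns describe the same samples in the same order.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ExpressionData:
    """Hypothetical Python analogue of an R ExpressionSet (not a real library class)."""

    exprs: np.ndarray        # expression matrix, shape (n_probes, n_samples)
    probe_ids: list[str]     # row labels, e.g. GPL570 probe set IDs
    sample_ids: list[str]    # column labels, e.g. GSM sample accessions
    phenotype: dict[str, str] = field(default_factory=dict)  # sample ID -> phenotype label

    def __post_init__(self) -> None:
        # Matrix dimensions must agree with the row and column labels.
        n_probes, n_samples = self.exprs.shape
        if len(self.probe_ids) != n_probes:
            raise ValueError("probe_ids length does not match the number of matrix rows")
        if len(self.sample_ids) != n_samples:
            raise ValueError("sample_ids length does not match the number of matrix columns")
        # Every sample (column) must have phenotype data.
        missing = [s for s in self.sample_ids if s not in self.phenotype]
        if missing:
            raise ValueError(f"samples without phenotype data: {missing}")

    def labels(self) -> list[str]:
        """Phenotype labels ordered like the matrix columns."""
        return [self.phenotype[s] for s in self.sample_ids]


# Toy example with made-up values (two probes, three samples).
example = ExpressionData(
    exprs=np.array([[7.1, 9.4, 6.8], [3.2, 3.0, 5.5]]),
    probe_ids=["1007_s_at", "1053_at"],
    sample_ids=["S1", "S2", "S3"],
    phenotype={"S1": "tumor", "S2": "normal", "S3": "tumor"},
)
print(example.labels())  # ['tumor', 'normal', 'tumor']
```

Validating alignment when the object is built means a mislabeled or missing sample fails early, before it can silently corrupt the classification step.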
*Risk and Safety: Because all data collection and analysis is performed on a computer,
there are no safety risks.
*Data Analysis:
1. PAM (Prediction Analysis of Microarrays) with cross-validation to identify gene signatures
that classify samples.
2. Apply each gene signature to all other datasets to evaluate its predictive accuracy.
3. Gene enrichment analysis.
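PAM is the nearest shrunken centroid method of Tibshirani et al. (2002), available in R as the `pamr` package. The NumPy sketch below re-implements the core idea for illustration only, not as the project's code: each class centroid's standardized difference from the overall centroid is soft-thresholded by a shrinkage amount `delta`, and genes that survive in any class form the signature. `delta` is picked by stratified k-fold cross-validation (analysis step 1), then the model is scored on a second dataset (step 2). The function names, fold count, class-frequency priors, and tie-breaking rule are assumptions. It also assumes `X` is samples-by-genes (the transpose of the ExpressionSet layout), that both datasets share the same GPL570 probes in the same order with comparable normalization, and that labels are harmonized across cancers (for example "tumor" vs "normal").

```python
import numpy as np


def fit_shrunken_centroids(X, y, delta):
    """Nearest shrunken centroids, after Tibshirani et al. (2002).

    X: (n_samples, n_genes) expression matrix; y: class labels; delta: shrinkage amount.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    n, K = len(y), len(classes)
    overall = X.mean(axis=0)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])
    # Pooled within-class standard deviation of each gene.
    resid = np.concatenate([X[y == k] - c for k, c in zip(classes, centroids)])
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - K))
    s0 = np.median(s)  # added to s so genes with tiny variance do not dominate
    counts = np.array([np.sum(y == k) for k in classes])
    # Standard-error factor for (class centroid - overall centroid).
    m = np.sqrt(1.0 / counts - 1.0 / n)[:, None]
    d = (centroids - overall) / (m * (s + s0))
    # Soft-threshold: shrink each standardized difference toward zero by delta.
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)
    return {
        "classes": classes,
        "centroids": overall + m * (s + s0) * d_shrunk,
        "scale": s + s0,
        "log_prior": np.log(counts / n),  # class frequencies used as priors
        "signature": np.any(d_shrunk != 0, axis=0),  # genes surviving shrinkage
    }


def predict(model, X):
    """Assign each sample to the class with the smallest discriminant score."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - model["centroids"][None, :, :]
    scores = ((diff / model["scale"]) ** 2).sum(axis=2) - 2 * model["log_prior"]
    return model["classes"][np.argmin(scores, axis=1)]


def choose_delta(X, y, deltas, n_folds=5, seed=0):
    """Return the delta with the lowest stratified k-fold CV error (ties -> larger delta)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    folds = np.empty(len(y), dtype=int)
    for k in np.unique(y):
        idx = np.flatnonzero(y == k)
        folds[idx] = rng.permutation(len(idx)) % n_folds  # spread each class over folds
    errors = []
    for delta in deltas:
        wrong = 0
        for f in range(n_folds):
            train, test = folds != f, folds == f
            model = fit_shrunken_centroids(X[train], y[train], delta)
            wrong += int(np.sum(predict(model, X[test]) != y[test]))
        errors.append(wrong / len(y))
    best = min(errors)
    return max(d for d, e in zip(deltas, errors) if e == best)


def cross_dataset_accuracy(X_train, y_train, X_other, y_other, deltas):
    """Train on one dataset (delta chosen by CV) and score the model on another."""
    delta = choose_delta(X_train, y_train, deltas)
    model = fit_shrunken_centroids(X_train, y_train, delta)
    accuracy = float(np.mean(predict(model, X_other) == np.asarray(y_other)))
    return accuracy, model["signature"]
```

Breaking ties toward the larger `delta` favors the smaller signature among equally accurate models, which suits the goal of a compact common-cancer signature.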
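The plan does not name a specific enrichment method. One common choice, assumed here, is over-representation analysis: a one-sided hypergeometric test of whether a pathway's genes appear in the signature more often than chance would predict. The sketch assumes probe IDs have already been mapped to gene symbols and that pathway gene sets come from an annotation source such as KEGG or Gene Ontology; the function name and toy gene names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(signature, pathway, universe):
    """One-sided hypergeometric p-value for signature/pathway overlap.

    signature: genes selected by the classifier
    pathway:   genes annotated to one pathway or gene set
    universe:  all genes measured on the array (the background)
    Returns P(overlap >= observed) under random sampling without replacement.
    """
    universe = set(universe)
    sig = set(signature) & universe
    path = set(pathway) & universe
    N, K, n = len(universe), len(path), len(sig)
    k = len(sig & path)
    # Sum the probabilities of every overlap at least as large as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Toy example: 100 background genes, a 10-gene pathway, and a 5-gene signature
# lying entirely inside the pathway (all names are made up).
background = [f"G{i}" for i in range(100)]
p = hypergeom_enrichment_p(background[:5], background[:10], background)
print(p)  # about 3.35e-06 (exactly 252 / 75287520)
```

Because many pathways are usually tested at once, the resulting p-values would need a multiple-testing correction such as Benjamini-Hochberg.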

D. Bibliography
Buness, A., Ruschhaupt, M., Kuner, R., & Tresch, A. (2009). Classification across gene
expression microarray studies. BMC Bioinformatics, 10, 453.
http://doi.org/10.1186/1471-2105-10-453

Kourou, K., Exarchos, T. P., Exarchos, K. P., Karamouzis, M. V., & Fotiadis, D. I.
(2015). Machine learning applications in cancer prognosis and prediction. Computational
and Structural Biotechnology Journal, 13, 8–17.
http://doi.org/10.1016/j.csbj.2014.11.005

Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2002). Diagnosis of multiple cancer types
by shrunken centroids of gene expression. Proceedings of the National Academy of
Sciences of the United States of America, 99(10), 6567–6572.
http://doi.org/10.1073/pnas.082099299

Touw, W. G., Bayjanov, J. R., Overmars, L., Backus, L., Boekhorst, J., Wels, M., & van Hijum,
S. A. F. T. (2013). Data mining in the Life Sciences with Random Forest: a walk in the
park or lost in the jungle? Briefings in Bioinformatics, 14(3), 315–326.
http://doi.org/10.1093/bib/bbs034
