
Deep Learning

Machine learning (ML) is a basic form of artificial intelligence in which an algorithm is able to
recognize and predict patterns in a set of data. Deep learning (DL) is a subtype of ML that uses
multi-layered deep neural networks (DNNs) to recognize images, sounds and text when fed data (1).
Deep learning can conduct multifactorial analyses as well as dimensionality reduction, which makes
it a favourable approach for identifying the HLA alleles that are linked with SJS/TEN (2).

Figure 1: The evolution of neural networks with the introduction of deep learning. Adapted from name (3).

DL can be broadly categorized into supervised and unsupervised learning. In supervised learning, a
data set is labelled with the correct classifications and then used to train the algorithm.
Alternatively, in unsupervised DL, the algorithm is allowed to learn hidden and inherent patterns
within a dataset by self-organizing (4). An example of an unsupervised method used in a genomic
context is Principal Component Analysis (PCA), which is a classical technique rather than a deep
neural network. It conducts dimensionality reduction on genotype data and can reveal previously
unknown relationships between individuals (5). So far, unsupervised learning has been the preferred
method for genomic research, although supervised DL is becoming a reality due to the ability of deep
learning models to process and encode genomic data whilst maintaining salient features (6).
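
The dimensionality reduction that PCA performs on genotype data can be illustrated with a minimal
sketch. The genotype matrix below is hypothetical (six individuals, four variants, coded as
minor-allele counts), and the leading principal axis is estimated by power iteration purely to keep
the example dependency-free; this is an assumption made for illustration, not the procedure used in
the cited study (5).

```python
import math

def center_columns(rows):
    """Subtract each column mean so every variant has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def project(rows, axis):
    """Score of each individual along the given axis."""
    return [sum(x * a for x, a in zip(row, axis)) for row in rows]

def first_principal_component(rows, iterations=100):
    """Estimate the leading principal axis by power iteration on X^T X."""
    centered = center_columns(rows)
    n_features = len(centered[0])
    # Deterministic, asymmetric start; it must not be orthogonal to the answer.
    axis = [float(j + 1) for j in range(n_features)]
    for _ in range(iterations):
        scores = project(centered, axis)
        # Multiply by X^T: combine the rows, weighted by their current scores.
        axis = [sum(s * row[j] for s, row in zip(scores, centered))
                for j in range(n_features)]
        norm = math.sqrt(sum(v * v for v in axis))
        axis = [v / norm for v in axis]
    return axis, project(centered, axis)

# Hypothetical genotypes: rows are individuals, columns are variants (0, 1 or 2 copies).
genotypes = [
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [0, 0, 2, 1],
    [2, 2, 0, 0],
    [2, 1, 0, 0],
    [2, 2, 0, 1],
]
axis, scores = first_principal_component(genotypes)
for individual, score in enumerate(scores):
    print(individual, round(score, 3))
```

In this toy data the first three individuals receive scores of one sign and the last three receive
scores of the opposite sign, so a single coordinate separates the two groups. The overall sign of a
principal axis is arbitrary.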

Several types of neural network are used for deep learning. One main type is the feedforward neural
network (FNN). FNNs are trained to map a fixed-size input (for example, an image) to a fixed-size
output (for example, the probabilities of each image category). Convolutional Neural Networks (CNNs)
are a subtype of FNN designed to process data that come as multiple arrays; for example, the colour
channels of an image, each a 2D array of pixel intensities, can be fed to a CNN. CNNs are easier to
train and generalize better than networks with full connectivity between adjacent layers. So far,
CNNs have been successfully used for document reading and speech recognition. Recurrent Neural
Networks (RNNs) are another class; they process one element of a sequence at a time whilst retaining
information about the past elements of the sequence. Despite being difficult to train, RNNs have
been able to successfully predict the next items in a sequence (2).
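
The difference between a feedforward pass and a recurrent pass can be sketched in a few lines. All
weights below are hypothetical values chosen by hand, and the layer sizes are assumptions made for
illustration; no training is performed, so this shows only how data flow through each kind of
network.

```python
import math

def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """One fully connected layer; weights[i] holds the input weights of unit i."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def feedforward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Fixed-size input -> ReLU hidden layer -> class probabilities."""
    hidden = [max(0.0, h) for h in dense(inputs, hidden_w, hidden_b)]
    return softmax(dense(hidden, out_w, out_b))

def rnn_run(sequence, w_in, w_state, bias):
    """Read a sequence of scalars one element at a time, carrying a hidden state."""
    state = [0.0] * len(bias)
    for x in sequence:
        state = [
            math.tanh(w_in[i] * x
                      + sum(w * s for w, s in zip(w_state[i], state))
                      + bias[i])
            for i in range(len(bias))
        ]
    return state

# Hypothetical feedforward weights: 3 inputs, 2 hidden units, 2 output classes.
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
out_w = [[1.0, -1.0], [-1.0, 1.0]]
out_b = [0.0, 0.0]
probs = feedforward([1.0, 0.5, -1.0], hidden_w, hidden_b, out_w, out_b)

# Hypothetical recurrent weights: scalar input, 2 hidden units.
w_in = [0.5, -0.5]
w_state = [[0.1, 0.2], [0.3, -0.1]]
bias = [0.0, 0.0]
state_a = rnn_run([1.0, 0.0], w_in, w_state, bias)
state_b = rnn_run([0.0, 1.0], w_in, w_state, bias)
```

The feedforward network always returns one probability per class for an input of fixed length,
whereas the recurrent network accepts a sequence of any length and its final state depends on the
order of the elements, which is why `state_a` and `state_b` differ.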

Three subsets of data are used for DL. Training datasets are used to learn the parameters of the
model. Validation datasets are used to choose the most appropriate model. Test datasets are used to
estimate generalization performance. The algorithm needs to be optimized so that it can analyse an
unfamiliar dataset effectively. To achieve this, a compromise needs to be reached between the amount
of data fed to the model and the flexibility of the model. If the model is too simple, it will fail
to discover patterns within the data (underfitting). Conversely, an overly complex model will
overfit, fitting noise in the training data as false patterns and failing to generalize (7).
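
The roles of the three subsets, and the trade-off between a model that is too simple and one that is
too flexible, can be sketched as follows. Everything here is an assumption made for illustration: the
data are synthetic (a noisy sine curve), the 60/20/20 split is arbitrary, and the "model" is a simple
piecewise-constant predictor whose number of bins stands in for model flexibility.

```python
import math
import random

def split_dataset(items, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle once, then cut into training, validation and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def bin_index(x, n_bins):
    """Bin that x (in [0, 1)) falls into."""
    return min(int(x * n_bins), n_bins - 1)

def fit_bins(train, n_bins):
    """Predict the mean training target within each bin of x."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in train:
        b = bin_index(x, n_bins)
        sums[b] += y
        counts[b] += 1
    overall = sum(y for _, y in train) / len(train)
    # Empty bins fall back to the overall training mean.
    return [s / c if c else overall for s, c in zip(sums, counts)]

def mean_squared_error(model, data):
    n_bins = len(model)
    return sum((model[bin_index(x, n_bins)] - y) ** 2 for x, y in data) / len(data)

# Synthetic data: a sine curve plus Gaussian noise.
rng = random.Random(1)
points = []
for _ in range(200):
    x = rng.random()
    points.append((x, math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3)))

train, val, test = split_dataset(points)

candidates = [1, 10, 100]  # too simple, moderate, very flexible
models = {n: fit_bins(train, n) for n in candidates}
train_error = {n: mean_squared_error(models[n], train) for n in candidates}
val_error = {n: mean_squared_error(models[n], val) for n in candidates}

# The validation set chooses the model; the test set is touched only once, at the end.
best = min(candidates, key=lambda n: val_error[n])
test_error = mean_squared_error(models[best], test)
print(train_error, val_error, best, test_error)
```

Training error can only fall as bins are added, because each finer model can fit the training points
more closely. Validation error does not share this guarantee, which is why it, and not training
error, is used to choose between models.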

DL is especially important for genomic analysis due to the size and diversity of the data collected
from the human genome. Convolutional Neural Networks (CNNs), which are a type of deep learning
model, allow large sections of genetic data to be reduced while identifying the important sections
of the input data (8). Supervised DL has provided the most accurate results in genomic analyses so
far (9). However, issues remain, such as adapting DL to interpret the genome within the medical
paradigm, overcoming biases in training sets and interpreting predictions (10).
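
How a convolutional layer can shrink a sequence while flagging its informative regions can be
sketched with a single filter followed by max pooling. The DNA sequence, the "TATA" filter and the
pooling width below are all hypothetical choices made for illustration; in a trained CNN the filter
weights would be learned from data rather than set by hand.

```python
BASES = "ACGT"

def one_hot(sequence):
    """Encode each nucleotide as a 4-element binary vector."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in sequence]

def conv1d(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence, valid positions only."""
    width = len(kernel)
    outputs = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            total += sum(k * x for k, x in zip(kernel[offset], encoded[start + offset]))
        outputs.append(total)
    return outputs

def max_pool(values, size):
    """Keep the strongest response in each non-overlapping window."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]

# Hypothetical filter: responds most strongly where the sequence reads "TATA".
motif_filter = one_hot("TATA")
sequence = "GGCTATAAGGCC"

activations = conv1d(one_hot(sequence), motif_filter)
pooled = max_pool(activations, 3)
print(activations)
print(pooled)
```

The filter response peaks at position 3, where the motif begins, and pooling reduces the nine
activations to three values while preserving that peak.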


Figure 2: An example of dimensionality reduction conducted by an autoencoder (11).

This research project aims to create an autoencoder that can accurately compress multidimensional
data in order to identify the HLA alleles that are associated with CBZ-SJS/TEN. A similar approach
was used in a recent study to predict the genes associated with cancer tumours, potentially
identifying the activated pathways and suitable treatments. To accomplish this, a variational
autoencoder (VAE) was trained on data obtained from The Cancer Genome Atlas. In addition to
compressing data, VAEs can be used to analyse latent spaces to identify treatments, biological
pathways and cancer states (12). Although this approach is useful and innovative, this project does
not require a VAE, as only the identification of the culprit alleles is required. A recent study
attempted to accomplish this by using a stacked autoencoder to graphically classify DNA sequences of
HLA-A alleles. The autoencoder was able to compress a dataset with 3288 dimensions to 2. The
compressed data were presented as a graphical projection on a 2-dimensional feature space, showing
the autoencoder's ability to accurately categorise the allele subtypes. In addition to the binary
numerical vector graph, a document vector graph was produced (11).
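
The idea of compressing binary sequence vectors to two dimensions can be sketched with a very small
autoencoder. This is not the architecture of the cited study (11), which used a stacked autoencoder;
the version below has a single linear encoder and decoder, is trained by plain stochastic gradient
descent, and uses hypothetical six-nucleotide sequences. The one-hot encoding (four binary entries
per nucleotide) is also an assumption about how a "binary numerical vector" might be built.

```python
import random

BASES = "ACGT"

def one_hot(sequence):
    """Flatten a DNA string into a binary vector, 4 entries per nucleotide."""
    return [1.0 if base == b else 0.0 for base in sequence for b in BASES]

def encode(x, enc):
    """Compress an input vector to the low-dimensional code."""
    return [sum(w * v for w, v in zip(row, x)) for row in enc]

def decode(z, dec):
    """Reconstruct the input vector from its code."""
    return [sum(w * c for w, c in zip(row, z)) for row in dec]

def reconstruction_error(data, enc, dec):
    """Mean squared reconstruction error per sequence."""
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, enc), dec)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)

def train_autoencoder(data, code_size=2, epochs=300, lr=0.01, seed=0):
    """Linear autoencoder trained by stochastic gradient descent."""
    rng = random.Random(seed)
    n = len(data[0])
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(code_size)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(code_size)] for _ in range(n)]
    for _ in range(epochs):
        for x in data:
            z = encode(x, enc)
            err = [a - b for a, b in zip(decode(z, dec), x)]  # reconstruction minus input
            # Gradient with respect to the code, computed before the decoder changes.
            grad_z = [sum(err[i] * dec[i][k] for i in range(n)) for k in range(code_size)]
            for i in range(n):
                for k in range(code_size):
                    dec[i][k] -= lr * err[i] * z[k]
            for k in range(code_size):
                for j in range(n):
                    enc[k][j] -= lr * grad_z[k] * x[j]
    return enc, dec

# Hypothetical sequences forming two loose groups.
sequences = ["ACGTAC", "ACGTAA", "ACGTCC", "TTGACG", "TTGACA", "TTGCCG"]
data = [one_hot(s) for s in sequences]

enc0, dec0 = train_autoencoder(data, epochs=0)  # untrained weights, same seed
enc, dec = train_autoencoder(data)
error_before = reconstruction_error(data, enc0, dec0)
error_after = reconstruction_error(data, enc, dec)

codes = [encode(x, enc) for x in data]  # one 2-D point per sequence
for s, code in zip(sequences, codes):
    print(s, [round(c, 3) for c in code])
```

Each 24-dimensional input is reduced to a 2-dimensional code that can be plotted directly, and the
fall in reconstruction error after training indicates that the two coordinates retain part of the
structure of the input. A stacked, non-linear autoencoder would follow the same encode and decode
pattern with additional layers.
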
Figure 3: Graphical representation of 2 dimensional HLA-A allele categorization, conducted by autoencoder (9).

Figure 4: Graphical representation of histogram based document vector analysis. Possesses greater clarity than Figure 3 (9).

1. Yu D, Deng L. Deep Learning and Its Applications to Signal and Information Processing
[Exploratory DSP]. IEEE Signal Processing Magazine. 2011;28(1):145-54.
2. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436-44.
3. Waldrop MM. News Feature: What are the limits of deep learning? Proceedings of the
National Academy of Sciences. 2019;116(4):1074-7.
4. Sathya R, Abraham A. Comparison of Supervised and Unsupervised Learning Algorithms for
Pattern Classification. 2013;2(2).
5. Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, et al. Genes mirror geography
within Europe. Nature. 2008;456(7218):98-101.
6. Schrider DR, Kern AD. Supervised Machine Learning for Population Genetics: A New
Paradigm. Trends in Genetics. 2018;34(4):301-12.
7. Zou J, Huss M, Abid A, Mohammadi P, Torkamani A, Telenti A. A primer on deep learning in
genomics. Nature Genetics. 2019;51(1):12-8.
8. Graham B. Fractional Max-Pooling. 2014.
9. Gupta A, Zou J. Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for
Optimizing Protein Functions. 2018.
10. Ghorbani A, Abid A, Zou J. Interpretation of Neural Networks is Fragile. 2017.
11. Miyake J, Kaneshita Y, Asatani S, Tagawa S, Niioka H, Hirano T. Graphical classification of
DNA sequences of HLA alleles by deep learning. Human Cell. 2018;31(2):102-5.
12. Way GP, Greene CS. Extracting a biologically relevant latent space from cancer
transcriptomes with variational autoencoders. Pac Symp Biocomput. 2018;23:80-91.
