Using Deep Learning to detect DNA-regulatory elements

0. Authors

0.1 Corresponding Authors:


  • Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning (DeepBind; Alipanahi et al., Nature Biotechnology, 2015)


  • Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks (Kelley et al., Genome Research, 2016)

0.2 Important contributors to Deep Learning:

The "big three" behind the development of deep-learning theory: Geoffrey Hinton, Yann LeCun, and Yoshua Bengio.

Geoffrey Hinton:


  • Made landmark contributions twice: helping popularize backpropagation in the 1980s, and reviving deep networks with deep belief nets in 2006.


by Ran Bi, NYU
http://www.kdnuggets.com/2014/10/deep-learning-make-machine-learning-algorithms-obsolete.html

1. A brief introduction to DL

1.1. Data visualization: from PCA to t-SNE

Example from Colah's blog post Visualizing MNIST: An Exploration of Dimensionality Reduction:

  • PCA performs well overall, but poorly on some fine details, e.g. separating the digits 4, 7, and 9.
    [Figure: Visualizing MNIST with PCA]
  • t-SNE performs much better, separating the digit clusters cleanly.
    [Figure: Visualizing MNIST with t-SNE]
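
A minimal sketch of this comparison, using scikit-learn's small 8x8 digits dataset as a stand-in for MNIST (the dataset choice and plotting details are mine, not from the blog post):

```python
# Compare PCA and t-SNE embeddings of handwritten digits.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)           # 1797 samples, 64 features

X_pca = PCA(n_components=2).fit_transform(X)  # linear projection
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, emb, title in [(axes[0], X_pca, "PCA"), (axes[1], X_tsne, "t-SNE")]:
    ax.scatter(emb[:, 0], emb[:, 1], c=y, cmap="tab10", s=5)
    ax.set_title(title)
plt.show()
```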

1.2. t-SNE was invented during the development of deep-learning theory


by Laurens van der Maaten (with Geoffrey Hinton), JMLR 2008:
https://lvdmaaten.github.io/publications/papers/JMLR_2008.pdf

1.3. Deep Learning includes:

  • Multi-layer networks (the inception of deep learning)
    • More layers.
    • Using GPUs (graphics cards) to run huge numbers of operations in parallel.
  • Convolutional Neural Networks (CNN)
  • Recurrent Neural Networks (RNN)
    • Time-series information (video, audio).
    • LSTM units make training over long sequences tractable.
  • Reinforcement learning networks
    • Value network: evaluates ("feels") the current state of the environment.
    • Policy network: decides the best action to take.
    • Examples: AlphaGo, self-driving cars.

1.4 Deep Learning ABC:

  • Three basic kinds of calculation: multiplication, addition, and (nonlinear) transformation. A minimal sketch follows the citation below.


from the Udacity Deep Learning course:
https://cn.udacity.com/course/deep-learning–ud730/
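
A single network layer is exactly these three operations. A minimal NumPy sketch (sizes and values are illustrative):

```python
# One fully connected layer = multiply, add, transform.
import numpy as np

def layer(x, W, b):
    z = W @ x + b               # multiplication (W @ x) and addition (+ b)
    return np.maximum(z, 0.0)   # transformation: ReLU non-linearity

rng = np.random.default_rng(0)
x = rng.normal(size=4)          # 4 input features
W = rng.normal(size=(3, 4))     # weights for 3 output units
b = np.zeros(3)                 # biases
print(layer(x, W, b))           # 3 activations; negatives clipped to 0
```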

  • For example: Colah's blog post Neural Networks, Manifolds, and Topology

    • The transforming process: [animation]

    • The key point is training the parameters that the data are multiplied by, added to, and so on. Bad parameters give bad results: [animation]

    • More dimensions are required for complex datasets. For example, it is hard to separate these points in a 2D graph: [animation] However, after adding one dimension, the two groups can be separated by a plane instead of a line: [figure] A minimal sketch of this trick follows.
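
A minimal sketch of the dimension-lifting trick, with the extra coordinate hand-picked for illustration (a real network would learn such a transformation):

```python
# Two concentric rings: no straight line separates them in 2D,
# but lifting to 3D with z = x^2 + y^2 makes a flat plane sufficient.
import numpy as np

rng = np.random.default_rng(0)
n = 200
theta = rng.uniform(0.0, 2.0 * np.pi, n)
radius = np.where(np.arange(n) < n // 2, 1.0, 3.0)  # inner ring / outer ring
x, y = radius * np.cos(theta), radius * np.sin(theta)
labels = (radius > 2.0).astype(int)

z = x**2 + y**2                 # the added dimension
pred = (z > 4.0).astype(int)    # the plane z = 4 separates the two rings
print("accuracy:", (pred == labels).mean())  # 1.0
```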

1.5 Convolution Kernel

A convolution kernel is a small weight matrix that slides across the input; at each position, the overlapping values are multiplied by the kernel weights and summed.

from Apple's iOS Developer Guide (vImage Convolution Operations):
https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
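
A minimal NumPy sketch of that sliding-window computation (the loop-based implementation is for clarity, not speed):

```python
# Slide a 3x3 kernel across a grayscale image (valid positions only).
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # multiply the kernel with the overlapping patch, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(0).random((5, 5))
laplacian = np.array([[0, -1,  0],
                      [-1, 4, -1],
                      [0, -1,  0]])   # a classic edge-detection kernel
print(conv2d(image, laplacian))       # a 3x3 feature map
```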

1.6 Deep Learning frameworks

| Framework  | Core Programming Language     | Interfaces from Other Languages | Programming Paradigm | Wrappers                  |
|------------|-------------------------------|---------------------------------|----------------------|---------------------------|
| Caffe      | C++/CUDA                      | Python, Matlab                  | Imperative           | -                         |
| TensorFlow | C++/CUDA                      | Python                          | Declarative          | Pretty Tensor, Keras      |
| Theano     | Python (compiled to C++/CUDA) | -                               | Declarative          | Keras, Lasagne, or Blocks |
| Torch7     | LuaJIT (with C/CUDA backend)  | C                               | Imperative           | -                         |

TensorFlow: Biology’s Gateway to Deep Learning?
http://www.cell.com/cell-systems/pdf/S2405-4712(16)00010-7.pdf
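
As an illustration of the declarative paradigm, a minimal Keras (TensorFlow backend) sketch of a small sequence-classification CNN; the architecture is illustrative, not taken from either paper:

```python
# A small 1D CNN declared in Keras (TensorFlow backend).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(600, 4)),                           # 600 bp one-hot DNA sequence
    layers.Conv1D(32, kernel_size=19, activation="relu"),  # 32 motif scanners
    layers.GlobalMaxPooling1D(),                           # strongest match per filter
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),                 # e.g. accessible / not accessible
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```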

2. Predicting the sequence specificities of DNA- and RNA-binding proteins

2.1 Using published data to train the model

2.2. The structure of the neural network:

[Figure: overall structure of the network]

Let’s see it in detail:

[Figure: the four stages, conv, rectify, pool, neural network, in detail]

2.2.1 conv

e.g. using a motif detector of length 3 to convolve the one-hot-encoded input sequence ATGG:

[Figure: convolving ATGG with a length-3 motif filter]
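
A minimal NumPy sketch of this convolution step; the one-hot encoding is standard, but the filter weights here are invented for illustration:

```python
# One-hot encode ATGG and slide a single length-3 motif filter along it.
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x

seq = one_hot("ATGG")             # shape (4, 4): positions x bases
motif = np.array([                # invented weights; one row per motif position
    [0.9, 0.0, 0.0, 0.1],         # position 1 prefers A
    [0.0, 0.1, 0.0, 0.8],         # position 2 prefers T
    [0.0, 0.1, 0.9, 0.0],         # position 3 prefers G
])

# Two valid placements of a length-3 filter on a length-4 sequence:
scores = [float(np.sum(seq[i:i + 3] * motif)) for i in range(len(seq) - 3 + 1)]
print(scores)   # the first window, "ATG", matches the motif best
```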

2.2.2 rectify

[Figure: rectification]

Vanessa’s blog
https://imiloainf.wordpress.com/2013/11/06/rectifier-nonlinearities/
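
Rectification simply thresholds the convolution scores at zero; the leaky variant discussed in the blog post above keeps a small slope for negative inputs:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)                 # negatives become 0

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)      # negatives are scaled, not zeroed

scores = np.array([-2.0, 0.5, 3.1])
print(relu(scores))        # [0.   0.5  3.1]
print(leaky_relu(scores))  # [-0.02  0.5   3.1]
```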

2.2.3 pooling

[Figure: max pooling]

Deeplearning4j (Deep Learning for Java):
http://deeplearning4j.org/convolutionalnets.html
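
Pooling then keeps only the strongest responses, e.g. taking the maximum score of each motif detector across all sequence positions. A minimal sketch with invented scores:

```python
import numpy as np

# Invented rectified scores from 3 motif detectors over 6 sequence positions.
scores = np.array([[0.2, 2.6, 1.0, 0.1, 0.0, 0.3],
                   [0.0, 0.1, 0.4, 3.1, 0.2, 0.0],
                   [1.2, 0.3, 0.0, 0.2, 0.1, 0.9]])

max_pooled = scores.max(axis=1)   # best match per detector, anywhere in the sequence
print(max_pooled)                 # [2.6 3.1 1.2]
```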

2.2.4 neural network

  • Fully connected layer
    • Multiplication + addition + transformation; outputs are finally scaled to sum to one (softmax).

[Figure: a fully connected neural network]

Visual Studio Magazine:
https://visualstudiomagazine.com/articles/2014/11/01/~/media/ECG/visualstudiomagazine/Images/2014/11/1114vsm_mccaffreyfig2.ashx

  • The calculation process, end to end (a minimal sketch follows):
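
A minimal NumPy sketch of this final stage (weights and inputs are illustrative):

```python
# Fully connected layer, then softmax so the outputs sum to one.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
features = np.array([2.6, 3.1, 1.2])   # e.g. max-pooled motif scores
W = rng.normal(size=(2, 3))            # 2 output classes (e.g. bound / not bound)
b = np.zeros(2)

logits = W @ features + b              # multiplication + addition
probs = softmax(logits)                # transformation
print(probs, probs.sum())              # probabilities summing to 1.0
```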

2.3. Optimizing parameters to get the best performance:

[Figure: the calibrate / train / test pipeline]

  • Calibrate: use 3-fold cross-validation (3x CV) to evaluate 30 candidate groups of hyper-parameters and select the best one.
  • Train: repeat the training process several times with the best hyper-parameters.
  • Test: evaluate the best parameters on held-out data never used during training.
  • Store the final parameters so that new data can be scored without retraining.
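
The same calibrate/train/test pattern in a minimal scikit-learn sketch, with logistic regression standing in for the network and an invented parameter distribution:

```python
# Calibrate with 3-fold CV over 30 candidate parameter groups, then test on held-out data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": np.logspace(-3, 3, 100)},  # candidate regularization strengths
    n_iter=30,   # Calibrate: 30 parameter groups ...
    cv=3,        # ... each scored by 3-fold cross-validation
    random_state=0,
)
search.fit(X_train, y_train)   # Train: refit on the full training set with the best group

print("best parameters:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))  # Test: data never used in training
```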

2.4. Quantitative performance on various types of held-out experimental test data.

2.4.1 DNA binding

Performance is better than existing methods on both in vitro microarray data and in vivo ChIP data.

[Figure: performance on DNA-binding data]

2.4.2 RNAcompete microarray

Better than previous methods.

[Figure: performance on RNAcompete data]

Checking performance at the level of individual factors:

[Figure: per-factor performance]

2.4.3 Using all peaks rather than only the top 500 peaks gives better results

[Figure: all peaks vs top-500 peaks]

2.5 Potentially disease-causing genomic variants

[Figure: mutation maps for disease-associated variants]

  • A disrupted SP1 binding site in the LDL-R promoter that leads to familial hypercholesterolemia
  • A gained GATA1 binding site that disrupts the original globin cluster promoters.
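
Such mutation effects can be estimated by scoring the reference and mutated sequences and taking the difference (in silico mutagenesis). A minimal sketch with a toy stand-in for a trained model; the sequence and all helper names here are hypothetical:

```python
# Score a variant as the change in predicted binding between the
# reference and alternate alleles (in silico mutagenesis).
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x

def variant_effect(ref_seq, pos, alt_base, score_fn):
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return score_fn(one_hot(alt_seq)) - score_fn(one_hot(ref_seq))

# Toy stand-in for a trained model: total response of one length-4 motif filter.
motif = np.random.default_rng(0).random((4, 4))
toy_model = lambda x: float(sum(np.sum(x[i:i + 4] * motif) for i in range(len(x) - 3)))

# Hypothetical GC-rich site with a G>T change at position 4:
print(variant_effect("GGGCGGGGC", pos=4, alt_base="T", score_fn=toy_model))
# A negative score change suggests the variant weakens binding.
```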

2.6 RNA-binding proteins' preference for upstream and downstream information

  • Exons known to be downregulated by Nova had higher Nova scores in their upstream introns, and exons known to be upregulated by Nova had higher Nova scores in their downstream introns.

[Figure: Nova scores in flanking introns]

  • TIA has been shown to upregulate exons when bound to the downstream intron

[Figure: TIA scores in flanking introns]

2.7 What do the motifs in the convolution kernels look like after training?

  • Compared with known motif databases (DNA: JASPAR; RNA: CISBP-RNA):

    [Figure: learned motifs aligned to database motifs]

3. Learning the regulatory code of the accessible genome with deep CNNs

  • A deep-learning structure very similar to the DeepBind article above.
  • Source code is available.

3.1 Data source

  • ENCODE Project Consortium + Roadmap Epigenomics Project: BED files of peaks from 164 samples.
  • hg19 genome sequence.

| X                          | Y                 |
|----------------------------|-------------------|
| 200 million × (600 bp × 4) | 200 million × 164 |

e.g., Y looks like:

[Figure: example rows of the binary target matrix Y]
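
A minimal sketch of how X and Y can be constructed (shapes are scaled down; the real dataset has millions of 600 bp sites across 164 cell types):

```python
# Build X (one-hot sequences) and Y (binary accessibility across cell types).
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    x = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x

rng = np.random.default_rng(0)
n_sites, seq_len, n_cells = 5, 600, 164   # 5 sites here; millions in the real data

seqs = ["".join(rng.choice(list(BASES), size=seq_len)) for _ in range(n_sites)]
X = np.stack([one_hot(s) for s in seqs])            # shape (5, 600, 4)
Y = rng.integers(0, 2, size=(n_sites, n_cells))     # 1 = peak (accessible) in that sample

print(X.shape, Y.shape)   # (5, 600, 4) (5, 164)
```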

3.2 The structure of the neural network

  • A similar structure to DeepBind

[Figure: Basset architecture overview]

  • More layers

[Figure: Basset layer configuration]

This architecture was selected with Spearmint, a Bayesian hyper-parameter optimization tool.

3.2.1: SGD

Stochastic gradient descent divides the training samples into many small mini-batches and updates the parameters after each mini-batch, which speeds up training.

[Animation: behavior of gradient-descent variants]

Sebastian Ruder’s blog:
http://sebastianruder.com/optimizing-gradient-descent/
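
A minimal NumPy sketch of mini-batch SGD on a least-squares problem (the problem and learning rate are illustrative):

```python
# Mini-batch SGD on a least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)

w, lr, batch_size = np.zeros(3), 0.1, 32
for epoch in range(20):
    order = rng.permutation(len(X))                 # shuffle every epoch
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]     # one small subset of the data
        grad = 2 * X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)
        w -= lr * grad                              # update after every mini-batch
print(w)   # close to [1.0, -2.0, 0.5]
```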

3.2.2: Batch Normalization (BN)

Definition (Ioffe & Szegedy, 2015). Input: values of $x$ over a mini-batch $\mathcal{B} = \{x_{1 \dots m}\}$; parameters to be learned: $\gamma, \beta$. Output: $\{ y_i = \mathrm{BN}_{\gamma,\beta}(x_i) \}$.

$$\mu_{\mathcal{B}} = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad \sigma^2_{\mathcal{B}} = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_{\mathcal{B}})^2,$$

$$\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma^2_{\mathcal{B}} + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta \equiv \mathrm{BN}_{\gamma,\beta}(x_i).$$

Yes, BN is essentially a z-score with a learned scale and shift: it re-centers the values flowing through the network, which speeds up optimization and also tends to improve accuracy.
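
A minimal NumPy sketch of the training-time forward pass (inference additionally uses running averages of the batch statistics, omitted here):

```python
# Batch normalization: training-time forward pass over one mini-batch.
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                    # mini-batch mean, per feature
    var = x.var(axis=0)                    # mini-batch variance, per feature
    x_hat = (x - mu) / np.sqrt(var + eps)  # the z-score step
    return gamma * x_hat + beta            # learned scale and shift

x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(32, 4))
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(6))   # ~0 per feature
print(y.std(axis=0).round(3))    # ~1 per feature
```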

3.2.3: Dropout

Randomly drop a subset of nodes at each training step so the network cannot rely on any single node; this improves the robustness of the network and reduces overfitting.

[Figure: a network before and after dropout]

Srivastava et al., Journal of Machine Learning Research 15 (2014), 1929-1958:
http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf
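
A minimal NumPy sketch of the "inverted dropout" formulation, where survivors are rescaled at training time so that nothing changes at test time:

```python
# Inverted dropout on a layer's activations (training time only).
import numpy as np

def dropout(activations, keep_prob, rng):
    mask = rng.random(activations.shape) < keep_prob  # keep each node with prob keep_prob
    # Rescale survivors so the expected activation is unchanged;
    # at test time the layer is then used as-is, with no dropout.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = rng.random(10)
print(dropout(a, keep_prob=0.5, rng=rng))  # roughly half the entries zeroed
```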

3.3 Basset accurately predicts cell-specific DNA accessibility

[Figure: predicted vs measured accessibility]

  • Better than previous methods.
  • AUC differs between cell types.

3.4 The convolution kernels

[Figure: analysis of the learned kernels (panels A and C)]

For panel A:

  • The x axis shows each kernel's information content (its divergence from a uniform background).
  • The y axis ("influence") reflects how much the accessibility predictions change over all cells.

  • High-influence but unannotated kernels include CpG and ATAT boxes.
  • 45% of the kernels could be annotated against known motifs.

For panel C:

  • The kernels show cell-type-specific influence patterns.

3.5 Accessibility and binding sites

[Figure: accessibility around AP-1 binding sites]

  • AP-1 complex members include JUN and JUND.
  • The open region includes JUN/JUND peaks.
  • Basset's results show that a mutation in the FOS (AP-1) motif leads to loss of accessibility.

[Figure: conservation vs predicted signal]

  • Conservation also correlates with the signal.

3.6 Using GWAS data to validate the model

[Figure: Basset scores for GWAS SNPs]

  • Basset scores for general GWAS SNPs vs likely-causal GWAS SNPs.

[Figure: the rs4409785 locus]

  • Basset reports an 85% probability that the rs4409785 T>C variant is causal for vitiligo.
  • The variant opens the DNA so that CTCF can bind.

[Figure: ENCODE CTCF raw reads at rs4409785]

  • ENCODE CTCF data: raw reads at rs4409785.
  • 21 of 88 samples had sequence coverage here; 11 of them have peaks, and almost all covered samples carry the T>C mutation.

4. Deep Learning needs GPUs

| Type               | Tesla K20m GPU | Mac Intel i7 CPU |
|--------------------|----------------|------------------|
| Seeded single-task | 18 m           | 6 h 37 m         |
| Full multi-task    | 85 h           | -                |

Author: Boqiang Hu. Comments and discussion are welcome.
Please credit the source when reposting: [20160808 Journal Club] Using Deep Learning to study the gene-regulatory elements.