SeqPlots - a fast interactive web tool for visualizing next generation sequencing signals along genomic features.

Przemyslaw Stempor @ Graduate Seminar Series, 17 November 2015

The BIG question!

What are the relationships between chromatin features, underlying DNA sequence and gene regulation?

Source: http://www.cliffsnotes.com/assets/24452.jpg

Every BIG question needs small tools.

ChIP-seq
RNA-seq
DNase-seq
MNase-seq
ATAC-seq

1. DNA library

2. Short reads

3. Genomic position

Technical problem – a typical experiment produces tens of millions of such positions across hundreds of millions to billions of possible locations (base pairs) in the genome!

Solutions:

Shrink/simplify the data so they are small enough for us to understand (e.g. peak calls, unsupervised machine learning)

Use data visualization to make the original data comprehensible to us

Scientific data visualization – is it important?

Data visualization is a prevalent approach in science

Shaded matrix display from Loua (1873).

Since the advent of sequencing techniques there have been great advances in methods specific to this field
Helps us to better understand the data and find the patterns that might be lost due to shrinkage/simplification
Great for exploratory data analyses
Very useful for results presentation

We can visualize reads directly, but converting them to read coverage is usually more useful

Source: http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html
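Converting reads to coverage can be sketched in a few lines. This is a minimal illustration, not the bedtools or SeqPlots implementation: it assumes reads are given as 0-based, half-open (start, end) intervals on a single chromosome, and the function name `read_coverage` is hypothetical.

```python
import numpy as np

def read_coverage(reads, chrom_length):
    """Per-base coverage from (start, end) reads (0-based, half-open)."""
    # Difference array: +1 where a read starts, -1 just past where it ends
    diff = np.zeros(chrom_length + 1, dtype=np.int64)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    # The running sum turns start/end events into per-base read counts
    return np.cumsum(diff[:-1])

# Toy example: three overlapping reads on an 8 bp "chromosome"
reads = [(0, 4), (2, 6), (3, 5)]
coverage = read_coverage(reads, chrom_length=8)
print(coverage.tolist())  # [1, 1, 2, 3, 2, 1, 0, 0]
```

The difference-array approach touches each read only twice, so the cost grows with the number of reads plus the chromosome length rather than with their product.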

*-seq data visualization:
global approaches.

Figure: Circos plot

*-seq data visualization:
genome browsers.

UCSC Genome Browser

IGV (Broad Institute)

Biodalliance (Thomas Down) - live

*-seq data visualization:
multiple parts of genome, using pre-defined genomic features

Figure: average signal plot

Figure: heatmap

Command line tools, e.g. ngsplot

Tools on Galaxy platform: deepTools, Cistrome, etc.
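Both plot types start from the same data structure: a matrix with one row per genomic feature and one column per position relative to the feature anchor. The sketch below is only an illustration under simplifying assumptions (a single chromosome, no strand handling, no binning); `signal_matrix` is a hypothetical helper, not part of ngsplot, deepTools or SeqPlots.

```python
import numpy as np

def signal_matrix(coverage, anchors, flank):
    """One row per feature: coverage in [anchor - flank, anchor + flank)."""
    rows = [
        coverage[a - flank:a + flank]
        for a in anchors
        # Skip features whose window would run off the chromosome
        if a - flank >= 0 and a + flank <= len(coverage)
    ]
    return np.array(rows, dtype=float)

# Toy coverage track with two features anchored at positions 3 and 10
coverage = np.array([0, 1, 2, 5, 2, 1, 0, 0, 1, 3, 6, 3, 1, 0], dtype=float)
matrix = signal_matrix(coverage, anchors=[3, 10], flank=2)

# Average plot: mean signal at each position across all features
average_profile = matrix.mean(axis=0)

# Heatmap: the same matrix, rows sorted by total signal (strongest first)
order = np.argsort(-matrix.sum(axis=1))
heatmap_rows = matrix[order]
```

The average plot collapses the matrix to one line, while the heatmap keeps every feature visible, which is why the two are usually shown together.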

Why do we need yet another visualization tool?

Existing solutions did not meet our requirements:

Custom scripts and programming language libraries allow running things in batch, but are too complicated for users without IT expertise
Even with good training, these tools require a lot of time to code
Galaxy/Cistrome was too slow and not configurable enough (plus a data privacy problem!)

I want to take the best of both worlds: combine the intuitiveness and interactivity of genome browsers with the visualization power of plotting 1000s of genomic features at once.

Goal: fast, intuitive software for exploratory data analyses!

SeqPlots is this software!

We developed a highly configurable, GUI-operated web application for rapidly generating sets of publication-quality linear plots and heatmaps.

See SeqPlots in action in the movie...

Quick explanation of the example at hand

Files - signal profiles from ChIP-seq experiments:

H3K4me3 (marks active promoters)
H3K36me3 (marks transcribed regions of active genes)

Files - genomic features:

C. elegans transcription start sites (TSS), divided into 5 expression bins based on RNA-seq data
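How the genes were split into bins is not stated here. One plausible approach, shown as a sketch under that assumption, is to rank genes by expression and cut the ranking into five equal-sized groups; `expression_bins` and the example values are hypothetical.

```python
import numpy as np

def expression_bins(expression, n_bins=5):
    """Assign each gene to a quantile bin: 0 = lowest, n_bins - 1 = highest."""
    expression = np.asarray(expression, dtype=float)
    # Double argsort gives each gene its rank; ranking (rather than cutting
    # on raw values) keeps bins equal-sized even when many values are tied
    ranks = np.argsort(np.argsort(expression, kind="stable"), kind="stable")
    return ranks * n_bins // len(expression)

# Made-up expression values for ten genes
expression = [0.0, 12.5, 3.1, 250.0, 48.0, 0.4, 7.7, 95.0, 1.2, 20.0]
bins = expression_bins(expression)
print(bins.tolist())  # [0, 2, 1, 4, 3, 0, 2, 4, 1, 3]
```

Each bin can then be written out as its own feature file, so that every bin gets a separate line in the average plot.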

Tasks:

Compare histone marks between highly and lowly expressed genes.
Check if CpG (CG-dinucleotide) occupancy is higher at transcription start sites (TSS) relative to the local neighborhood.
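The second task boils down to counting CG dinucleotides at each position of equal-length sequence windows centered on the TSS. A minimal sketch, assuming the windows have already been extracted as strings; `cpg_profile` and the toy sequences are hypothetical, and this is not necessarily how SeqPlots computes motif density.

```python
def cpg_profile(sequences):
    """Fraction of sequences with a CG dinucleotide starting at each position."""
    # All windows are assumed to have the same length
    length = len(sequences[0]) - 1
    counts = [0] * length
    for seq in sequences:
        seq = seq.upper()
        for i in range(length):
            if seq[i:i + 2] == "CG":
                counts[i] += 1
    return [c / len(sequences) for c in counts]

# Toy 8 bp windows; real windows would span e.g. 1 kb around each TSS
windows = ["ATCGCGTA", "TTACGGAT", "GGGCGCCA", "ATATATAT"]
profile = cpg_profile(windows)
print(profile)  # [0.0, 0.0, 0.25, 0.5, 0.25, 0.0, 0.0]
```

A peak in the middle of the profile relative to its flanks would indicate higher CpG occupancy at the TSS than in the local neighborhood.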

The app is available as:

R package from Bioconductor
Mac OS X app
Server deployment with Shiny Server
Web service (shinyapps.io, Amazon EC2, etc.)

How to get SeqPlots:

Official documentation with installation instructions:
http://przemol.github.io/seqplots
Bioconductor:
http://bioconductor.org/packages/seqplots
GitHub:
https://github.com/Przemol/seqplots

Thank you!