The function runs BEAD algorithm on aligned files. It requires short read alignment in BAM format, input track (either single experiment BAM or summed input created using sumBAMinputs function), mappability track in BigWig format (see details) and reference genome in FASTA fomat or as BSgenome genomic package.

beads(experiment, control, mappability, genome, uniq = TRUE, insert = 200L,
  mapq_cutoff = 10L, export = "BEADS", rdata = FALSE, export_er = TRUE,
  quickMap = TRUE, ...)

# S4 method for BamFile,BamFile,BigWigFile,ANY
beads(experiment, control,
  mappability, genome, uniq = TRUE, insert = 200L, mapq_cutoff = 10L,
  export = "BEADS", rdata = FALSE, export_er = TRUE, quickMap = TRUE,
  ...)

# S4 method for BamFile,BigWigFile,BigWigFile,ANY
beads(experiment, control,
  mappability, genome, uniq = TRUE, insert = 200L, mapq_cutoff = 10L,
  export = "BEADS", rdata = FALSE, export_er = TRUE, quickMap = TRUE,
  ...)

# S4 method for character,character,character,character
beads(experiment, control,
  mappability, genome, uniq = TRUE, insert = 200L, mapq_cutoff = 10L,
  export = "BEADS", rdata = FALSE, export_er = TRUE, quickMap = TRUE,
  ...)

Arguments

experiment

The path to experiment (sample) alignment file in BAM format or BamFile class.

control

The path to control (input) alignment file in BAM format or summed input file in BigWiggle format (accepts BamFile and BigWigFile classes as well).

mappability

The path to mappability track in BigWiggle format (accepts BigWigFile class as well).

genome

The path reference genome FASTA or UCSC identifier for installed BSgenome packages e.g. "hg19" for human

uniq

If TRUE the alignment will be uniqued, i.e. only one of non-unique reads will be used.

insert

The expected insert size in base pairs.

mapq_cutoff

The cutoff parameter used to filter BAM alignments for low mapping quality reads.

export

The character vector of BW tracks to be exported.

rdata

If TRUE all data will be exported as R binaries in addition to BigWiggle tracks.

export_er

If TRUE the enriched regions will be exported to BED format.

quickMap

If TRUE the quick mappability processing be used, otherwise the mappability track will be processed by running mean smoothing.

...

parameters passed by reference to rbeads internal functions.

Value

Named BigWigFileList containing exported BW file connections.

Details

Mappability/alignability tracks gives numeric score for level of reference sequence uniqueness. Short reads cannot be confidently aligned to non-unique sequences, so BEADS masks the out. The pre-calculated tracks for many species can be found in genome databases, e.g. following link gives the collection of human tracks for reference genome GRCh37/hg19: http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=340327143&g=wgEncodeMapability.

For other species mappability track can be easily calculated from reference FASTA file using GEM-mappability software (http://www.ncbi.nlm.nih.gov/pubmed/22276185). The GEM library binaries are available on SourceForge: (http://sourceforge.net/projects/gemlibrary/files/gem-library/), while the reference genome FASTA file can be obtained from UCSC: (http://hgdownload.soe.ucsc.edu/downloads.html). The following example illustrates the procedure for C. elegans reference genome (36bp read length and 8 parallel threads set in options): gem-indexer -i ce10.fa -o ce10 -T 8 gem-mappability -I ce10.gem -l 36 -o ce10 -T 8 gem-2-wig -I ce10.gem -i ce10.map -o ce10 wigToBigWig ce10.wig ce10.chrom.sizes ce10.bw

By default only BEADS normalized track is exported. The export files can be control by export parameter, which is the character vector containing following values:

'BEADS'

fully normalized files, i.e. GC correction, mappability correction, division by input and scaling by median

'GCandMap'

GC correction and mappability correction

'GCcorected'

GC correction

'readsCoverage'

raw reads coverage, after extending to insert length, filtering /codemapq_cutoff and uniquing if uniq

'control_GCandMap'

GC normalized and mappability corrected input, only works if input is a BAM file

'control_GCcorected'

GC normalized, only works if input is a BAM file

'control_readsCoverage'

input raw reads coverage, after extending to insert length, filtering mapq_cutoff and uniquing if uniq, only works if input is a BAM file

For example: beads(sample_bam, input_bam, map, fa, export=c('control_readsCoverage', 'control_GCcorected', 'control_GCandMap', 'readsCoverage', 'GCcorected', 'GCandMap', 'BEADS'))

Methods (by class)

  • experiment = BamFile,control = BamFile,mappability = BigWigFile,genome = ANY: Method for signature experiment='BamFile', control='BamFile', mappability='BigWigFile', genome='ANY'

  • experiment = BamFile,control = BigWigFile,mappability = BigWigFile,genome = ANY: Method for signature experiment='BamFile', control='BigWigFile', mappability='BigWigFile', genome='ANY'

  • experiment = character,control = character,mappability = character,genome = character: Method for signature experiment='character', control='character', mappability='character', genome='character'

References

http://beads.sourceforge.net/ http://www.ncbi.nlm.nih.gov/pubmed/21646344

Examples

# Get the paths of example files sample_bam <- system.file("extdata", "GSM1208360_chrI_100Kb_q5_sample.bam", package="rbeads") input_bam <- system.file("extdata", "Input_fE3_AA169.bam", package="rbeads") SummedInput_bw <- system.file("extdata", "Ce10_HiSeqFRMInput_UNIQ_bin25bp_chrI_100Kb_sample.bw", package="rbeads") map_bw <- system.file("extdata", "ce10_mappability_chrI_100Kb_sample.bw", package="rbeads") ref_fa <- system.file("extdata", "ce10_chrI_100Kb_sample.fa", package="rbeads") # Set the directory where the output files will be crated setwd(tempdir()) # Run BEADS for BAM input file beads(sample_bam, input_bam, map_bw, ref_fa)
#> Importing refference genome...
#> Importing mappability [binning]...
#> Importing input...
#> Reading alignment file [/Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam]... <0.04s> #> Processing alignment file [uniq=TRUE]:... <0.059s>
#> INFO: 22409 out of 27847 (80.47%) reads mapped uniquely with 10 quality cutoff: [All=27847, Mapped=27847, Qmapped=27807, Quniq=22409].
#> Calculate input GCcontent.... <0.614s> #> Sample genome for GCcontent.... <0.005s> #> Calculate histograms for genomic (a) nad and sample (b) GC content...
#> <0.016s> #> Calculate GC weighting vector...
#> <0.015s> #> Assign GC score to every read... <0s> #> Prepare GCscore-enriched GRanges... <0.007s> #> Calculate GC weighted coverage... <0.018s> #> Mappability correction... <0.01s>
#> INFO: Total 3% of genome in masked.
#> Reading alignment file [/Users/przemol/code/rbeads/inst/extdata/GSM1208360_chrI_100Kb_q5_sample.bam]... <0.025s> #> Processing alignment file [uniq=TRUE]:... <0.035s>
#> INFO: 8216 out of 10373 (79.21%) reads mapped uniquely with 10 quality cutoff: [All=10373, Mapped=10373, Qmapped=10346, Quniq=8216].
#> Calculate coverage... <0.009s>
#> INFO: a = 18
#> Calling enriched regions... <0.021s> #> Mask out reads in enriched regions... <0.009s>
#> INFO: percentage of reads in non-enriched regions: 57.17%
#> Calculate input GCcontent.... <0.433s> #> Sample genome for GCcontent.... <0.015s> #> Calculate histograms for genomic (a) nad and sample (b) GC content...
#> <0.029s> #> Calculate GC weighting vector...
#> <0.014s> #> Assign GC score to every read... <0s> #> Prepare GCscore-enriched GRanges... <0.003s> #> Calculate GC weighted coverage... <0.013s> #> Mappability correction... <0.01s>
#> INFO: Total 3% of genome in masked.
#> DivStep: Divideing... <0.017s> #> DivStep [median ]: Scaling... <0.022s>
#> INFO: BEADS score scaling coefficient = 0.297
#> Exporing ER...
#> Exporing BigWig tracks...
#> Exporting to BigWiggle: GSM1208360_chrI_100Kb_q5_sample_BEADS_linear_1bp.bw
# Run BEADS for SummedInput (BigWig) input file beads(sample_bam, SummedInput_bw, map_bw, ref_fa)
#> Importing refference genome...
#> Importing mappability [binning]...
#> Importing input...
#> Reading alignment file [/Users/przemol/code/rbeads/inst/extdata/GSM1208360_chrI_100Kb_q5_sample.bam]... <0.018s> #> Processing alignment file [uniq=TRUE]:... <0.032s>
#> INFO: 8216 out of 10373 (79.21%) reads mapped uniquely with 10 quality cutoff: [All=10373, Mapped=10373, Qmapped=10346, Quniq=8216].
#> Calculate coverage... <0.009s>
#> INFO: a = 18
#> Calling enriched regions... <0.02s> #> Mask out reads in enriched regions... <0.01s>
#> INFO: percentage of reads in non-enriched regions: 57.17%
#> Calculate input GCcontent.... <0.128s> #> Sample genome for GCcontent.... <0.014s> #> Calculate histograms for genomic (a) nad and sample (b) GC content...
#> <0.021s> #> Calculate GC weighting vector...
#> <0.016s> #> Assign GC score to every read... <0s> #> Prepare GCscore-enriched GRanges... <0.004s> #> Calculate GC weighted coverage... <0.013s> #> Mappability correction... <0.01s>
#> INFO: Total 3% of genome in masked.
#> DivStep: Divideing... <0.011s> #> DivStep [median ]: Scaling... <0.014s>
#> INFO: BEADS score scaling coefficient = 0.081
#> Exporing ER...
#> Exporing BigWig tracks...
#> Exporting to BigWiggle: GSM1208360_chrI_100Kb_q5_sample_BEADS_linear_1bp.bw
# NOT RUN { ## Run BEADS for BSgenome package, the reference genome package have to be installed prior to running this example # source("http://bioconductor.org/biocLite.R") # biocLite("BSgenome.Celegans.UCSC.ce10") # library(BSgenome.Celegans.UCSC.ce10) beads(sample_bam, SummedInput_bw, map_bw, genome='ce10') ## Run BEADS for all BAM files in the directory #lapply(dir(pattern='bam$'), beads, control=input, mappability=map_bw, genome=ref_fa) # }
# NOT RUN { ## In parallel: # library(parallel) # mclapply(dir(pattern='.bam$'), beads, control=input, mappability=map_bw, genome='ce10', mc.cores=parallel::detectCores()) # }