The function creates the BigWig formatted summed input from aligned input experiments in BAM format. The first two step of BEADS normalization, i.e. GC normalization and mappability correction are performed independently on all input experiments. Further, the obtain signals are summed. The tracks are normalized for tag count, so each experiment contribute in equally to summed input. It is important to select only good quality run for summed input preparation. FastQC is a good program for sequencing experiment quality assessment. It is important to create the summed input with the same parameters as will be used for future BEADS runs.

sumBAMinputs(bam.controls = dir(pattern = "bam$"), bw.mappability, genome,
  out_name = paste0("Summed_", length(bam.controls), "_experiments"),
  uniq = TRUE, insert = 200L, mapq_cutoff = 10L, quickMap = TRUE,
  bin = 25L)

Arguments

bam.controls

The character vector containing paths to control (input) alignment files in BAM format

bw.mappability

The path to mappability track in BigWiggle format (accepts BigWigFile cless as well)

genome

The path reffrence genome FASTA or UCSC identifier fo installed BSgenome packages e.g. "hg19" for human

out_name

The prefix for exported BigWiggle, full name out_name_SummedInput_linear_binbp.bw

uniq

If TRUE the alignemnt will be uniqued, i.e. only one of non-unique reads will be used

insert

The expected insert size in base pairs.

mapq_cutoff

The cutoff parameter used to filter BAM alignments for low mapping quality reads.

quickMap

If TRUE the quick mappability processing be used, otherwise the mappability track will be processed by running mean smoothing

bin

The desired binning window. 1L disables binning. It is recommended to set it >1L, as it greatly reduces the size of the output BW files and speeds up further BEADS runs.

Value

BigWigFile class containing connection to summed input BigWig file.

References

http://beads.sourceforge.net/ http://www.ncbi.nlm.nih.gov/pubmed/21646344

Examples

# Get the paths of example files input_bam <- system.file("extdata", "Input_fE3_AA169.bam", package="rbeads") map_bw <- system.file("extdata", "ce10_mappability_chrI_100Kb_sample.bw", package="rbeads") ref_fa <- system.file("extdata", "ce10_chrI_100Kb_sample.fa", package="rbeads") # Set the directory where the output files will be crated setwd(tempdir()) # Calculate summed input; in this example we will pretend that we have 3 test input files, # in real applications these should be different experiments out1 <- sumBAMinputs(c(input_bam, input_bam, input_bam), map_bw, ref_fa)
#> Importing refference genome...
#> Importing mappability [binning]...
#> 1) Processing: /Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam
#> Importing input...
#> Reading alignment file [/Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam]... <0.025s> #> Processing alignment file [uniq=TRUE]:... <0.084s>
#> INFO: 22409 out of 27847 (80.47%) reads mapped uniquely with 10 quality cutoff: [All=27847, Mapped=27847, Qmapped=27807, Quniq=22409].
#> Calculate input GCcontent.... <0.888s> #> Sample genome for GCcontent.... <0.006s> #> Calculate histograms for genomic (a) nad and sample (b) GC content...
#> <0.013s> #> Calculate GC weighting vector...
#> <0.015s> #> Assign GC score to every read... <0.001s> #> Prepare GCscore-enriched GRanges... <0.007s> #> Calculate GC weighted coverage... <0.028s> #> Mappability correction... <0.016s>
#> INFO: Total 3% of genome in masked.
#> 2) Processing: /Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam
#> Importing input...
#> Reading alignment file [/Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam]... <0.031s> #> Processing alignment file [uniq=TRUE]:... <0.074s>
#> INFO: 22409 out of 27847 (80.47%) reads mapped uniquely with 10 quality cutoff: [All=27847, Mapped=27847, Qmapped=27807, Quniq=22409].
#> Calculate input GCcontent.... <0.784s> #> Sample genome for GCcontent.... <0.006s> #> Calculate histograms for genomic (a) nad and sample (b) GC content...
#> <0.02s> #> Calculate GC weighting vector...
#> <0.015s> #> Assign GC score to every read... <0s> #> Prepare GCscore-enriched GRanges... <0.007s> #> Calculate GC weighted coverage... <0.017s> #> Mappability correction... <0.01s>
#> INFO: Total 3% of genome in masked.
#> 3) Processing: /Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam
#> Importing input...
#> Reading alignment file [/Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam]... <0.029s> #> Processing alignment file [uniq=TRUE]:... <0.058s>
#> INFO: 22409 out of 27847 (80.47%) reads mapped uniquely with 10 quality cutoff: [All=27847, Mapped=27847, Qmapped=27807, Quniq=22409].
#> Calculate input GCcontent.... <0.633s> #> Sample genome for GCcontent.... <0.006s> #> Calculate histograms for genomic (a) nad and sample (b) GC content...
#> <0.017s> #> Calculate GC weighting vector...
#> <0.016s> #> Assign GC score to every read... <0s> #> Prepare GCscore-enriched GRanges... <0.007s> #> Calculate GC weighted coverage... <0.019s> #> Mappability correction... <0.012s>
#> INFO: Total 3% of genome in masked.
#> Summing /Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam [weight=1]...
#> Summing /Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam [weight=1]...
#> Summing /Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam [weight=1]...
#> Exporting to BigWiggle: Summed_3_experiments_SummedInput_linear_25bp.bw
#> #> Summed inpud exported to BigWiggle file: Summed_3_experiments_SummedInput_linear_25bp.bw
# Sanity check: The mean signal ratio between previous example (using 3 identical inputs) # and single normalized input should be very close to 3 # (limited by binning function and BigWig summary numerical precision) out2 <- sumBAMinputs(input_bam, map_bw, ref_fa)
#> Importing refference genome...
#> Importing mappability [binning]...
#> 1) Processing: /Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam
#> Importing input...
#> Reading alignment file [/Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam]... <0.026s> #> Processing alignment file [uniq=TRUE]:... <0.051s>
#> INFO: 22409 out of 27847 (80.47%) reads mapped uniquely with 10 quality cutoff: [All=27847, Mapped=27847, Qmapped=27807, Quniq=22409].
#> Calculate input GCcontent.... <1.269s> #> Sample genome for GCcontent.... <0.008s> #> Calculate histograms for genomic (a) nad and sample (b) GC content...
#> <0.053s> #> Calculate GC weighting vector...
#> <0.017s> #> Assign GC score to every read... <0s> #> Prepare GCscore-enriched GRanges... <0.008s> #> Calculate GC weighted coverage... <0.022s> #> Mappability correction... <0.015s>
#> INFO: Total 3% of genome in masked.
#> Summing /Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam [weight=1]...
#> Exporting to BigWiggle: Summed_1_experiments_SummedInput_linear_25bp.bw
#> #> Summed inpud exported to BigWiggle file: Summed_1_experiments_SummedInput_linear_25bp.bw
rtracklayer::summary(out1)[[1]]$score / rtracklayer::summary(out2)[[1]]$score
#> [1] 2.999847