The function creates the BigWig formatted summed input from aligned input experiments in BAM format. The first two step of BEADS normalization, i.e. GC normalization and mappability correction are performed independently on all input experiments. Further, the obtain signals are summed. The tracks are normalized for tag count, so each experiment contribute in equally to summed input. It is important to select only good quality run for summed input preparation. FastQC is a good program for sequencing experiment quality assessment. It is important to create the summed input with the same parameters as will be used for future BEADS runs.
sumBAMinputs(bam.controls = dir(pattern = "bam$"), bw.mappability, genome, out_name = paste0("Summed_", length(bam.controls), "_experiments"), uniq = TRUE, insert = 200L, mapq_cutoff = 10L, quickMap = TRUE, bin = 25L)
bam.controls | The character vector containing paths to control (input) alignment files in BAM format |
---|---|
bw.mappability | The path to mappability track in BigWiggle format (accepts |
genome | The path reffrence genome FASTA or UCSC identifier fo installed |
out_name | The prefix for exported BigWiggle, full name |
uniq | If TRUE the alignemnt will be uniqued, i.e. only one of non-unique reads will be used |
insert | The expected insert size in base pairs. |
mapq_cutoff | The cutoff parameter used to filter BAM alignments for low mapping quality reads. |
quickMap | If TRUE the quick mappability processing be used, otherwise the mappability track will be processed by running mean smoothing |
bin | The desired binning window. 1L disables binning. It is recommended to set it >1L, as it greatly reduces the size of the output BW files and speeds up further BEADS runs. |
BigWigFile
class containing connection to summed input BigWig file.
http://beads.sourceforge.net/ http://www.ncbi.nlm.nih.gov/pubmed/21646344
# Get the paths of example files input_bam <- system.file("extdata", "Input_fE3_AA169.bam", package="rbeads") map_bw <- system.file("extdata", "ce10_mappability_chrI_100Kb_sample.bw", package="rbeads") ref_fa <- system.file("extdata", "ce10_chrI_100Kb_sample.fa", package="rbeads") # Set the directory where the output files will be crated setwd(tempdir()) # Calculate summed input; in this example we will pretend that we have 3 test input files, # in real applications these should be different experiments out1 <- sumBAMinputs(c(input_bam, input_bam, input_bam), map_bw, ref_fa)#>#>#>#>#> Reading alignment file [/Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam]... <0.025s> #> Processing alignment file [uniq=TRUE]:... <0.084s>#>#> Calculate input GCcontent.... <0.888s> #> Sample genome for GCcontent.... <0.006s> #> Calculate histograms for genomic (a) nad and sample (b) GC content...#> <0.013s> #> Calculate GC weighting vector...#> <0.015s> #> Assign GC score to every read... <0.001s> #> Prepare GCscore-enriched GRanges... <0.007s> #> Calculate GC weighted coverage... <0.028s> #> Mappability correction... <0.016s>#>#>#>#> Reading alignment file [/Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam]... <0.031s> #> Processing alignment file [uniq=TRUE]:... <0.074s>#>#> Calculate input GCcontent.... <0.784s> #> Sample genome for GCcontent.... <0.006s> #> Calculate histograms for genomic (a) nad and sample (b) GC content...#> <0.02s> #> Calculate GC weighting vector...#> <0.015s> #> Assign GC score to every read... <0s> #> Prepare GCscore-enriched GRanges... <0.007s> #> Calculate GC weighted coverage... <0.017s> #> Mappability correction... <0.01s>#>#>#>#> Reading alignment file [/Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam]... <0.029s> #> Processing alignment file [uniq=TRUE]:... <0.058s>#>#> Calculate input GCcontent.... <0.633s> #> Sample genome for GCcontent.... <0.006s> #> Calculate histograms for genomic (a) nad and sample (b) GC content...#> <0.017s> #> Calculate GC weighting vector...#> <0.016s> #> Assign GC score to every read... <0s> #> Prepare GCscore-enriched GRanges... <0.007s> #> Calculate GC weighted coverage... <0.019s> #> Mappability correction... <0.012s>#>#>#>#>#>#> #># Sanity check: The mean signal ratio between previous example (using 3 identical inputs) # and single normalized input should be very close to 3 # (limited by binning function and BigWig summary numerical precision) out2 <- sumBAMinputs(input_bam, map_bw, ref_fa)#>#>#>#>#> Reading alignment file [/Users/przemol/code/rbeads/inst/extdata/Input_fE3_AA169.bam]... <0.026s> #> Processing alignment file [uniq=TRUE]:... <0.051s>#>#> Calculate input GCcontent.... <1.269s> #> Sample genome for GCcontent.... <0.008s> #> Calculate histograms for genomic (a) nad and sample (b) GC content...#> <0.053s> #> Calculate GC weighting vector...#> <0.017s> #> Assign GC score to every read... <0s> #> Prepare GCscore-enriched GRanges... <0.008s> #> Calculate GC weighted coverage... <0.022s> #> Mappability correction... <0.015s>#>#>#>#> #>rtracklayer::summary(out1)[[1]]$score / rtracklayer::summary(out2)[[1]]$score#> [1] 2.999847