[Bioc-sig-seq] ChIPpeakAnno fails to interpret standard chromosome name and strandedness

Tue Feb 9 18:22:03 CET 2010

Hi Ivan,

Thank you very much for your valuable suggestion and for the examples.

Both examples should work now. The strand can be given as "+"/"-" or 1/-1,
and the chromosome as "chr1" or "1".
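If you are still on ChIPpeakAnno_1.2.2, one possible workaround is to convert
standard names into the forms the working example uses before calling
annotatePeakInBatch. Below is a minimal base-R sketch. It assumes, based on
Ivan's report, that 1.2.2 only handles bare chromosome names such as "1" and
numeric strands 1/-1. normalizeChrom and normalizeStrand are hypothetical
helpers and are not part of the package.

```r
# Hypothetical helpers (not part of ChIPpeakAnno). They assume the older
# release only accepts bare chromosome names and numeric 1/-1 strands.
normalizeChrom <- function(chrom) {
  # Drop a leading "chr" prefix, case-insensitively: "chr1" -> "1"
  sub("^chr", "", as.character(chrom), ignore.case = TRUE)
}

normalizeStrand <- function(strand) {
  # Map "+"/"-" (and 1/-1, given as text or numbers) to 1L/-1L.
  # Any other value, e.g. "*", becomes NA.
  s <- as.character(strand)
  out <- rep(NA_integer_, length(s))
  out[s %in% c("+", "1")] <- 1L
  out[s %in% c("-", "-1")] <- -1L
  out
}
```

For example, pass space = normalizeChrom(c("chr1", "chr2")) and
strand = normalizeStrand(c("+", "-")) when you build the RangedData objects.
With 1.2.3 this conversion should no longer be needed.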

You can get the fixed code from svn (ChIPpeakAnno_1.2.3) or wait for it to
be posted.
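To check whether an installed copy already includes the fix, compare its
version against 1.2.3. The sketch below uses base R and assumes, as stated
above, that the fix first appears in 1.2.3. hasNamingFix is a hypothetical
helper name.

```r
# Hypothetical helper: TRUE if the given version is at least 1.2.3.
# package_version() compares each field numerically rather than as text,
# so "1.10.0" is correctly treated as newer than "1.2.3".
hasNamingFix <- function(version) {
  package_version(as.character(version)) >= "1.2.3"
}
```

Usage: hasNamingFix(packageVersion("ChIPpeakAnno")).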

Best regards,

Julie

*******************************************
Lihua Julie Zhu, Ph.D
Research Associate Professor
Program Gene Function and Expression
University of Massachusetts Medical School
364 Plantation Street, Room 613
Worcester, MA 01605
508-856-5256
http://www.umassmed.edu/pgfe/faculty/zhu.cfm

On 2/9/10 1:12 AM, "Ivan Gregoretti" <ivangreg at gmail.com> wrote:

> Hello everybody
> 
> The package ChIPpeakAnno comes with a couple example RangedData sets.
> 
> With those toy sets you can familiarise yourself with the package. It works.
> 
> Now, if you redefine the sets so that space '1' becomes 'chr1' or
> strand '1' becomes '+', the functions do not work.
> 
> Notice that 'chr1' and strand '+' are standard nomenclature; space '1' and
> strand '1' are not.
> 
> I tried to fix it myself but failed.
> 
> Can anybody help?
> 
> Thanks,
> 
> Ivan
> 
> 
> ## These are the examples that work ##
> library(ChIPpeakAnno)
> myPeak1 = RangedData(IRanges(start = c(967654, 2010897, 2496704, 3075869,
>                                        3123260, 3857501, 201089),
>                              end = c(967754, 2010997, 2496804, 3075969,
>                                      3123360, 3857601, 201089),
>                              names = c("Site1", "Site2", "Site3", "Site4",
>                                        "Site5", "Site6", "site7")),
>                      space = c("1", "2", "3", "4", "5", "6", "2"))
> 
> TFbindingSites = RangedData(IRanges(start = c(967659, 2010898, 2496700,
>                                               3075866, 3123260, 3857500,
>                                               96765, 201089, 249670,
>                                               307586, 312326, 385750),
>                                     end = c(967869, 2011108, 2496920,
>                                             3076166, 3123470, 3857780,
>                                             96985, 201299, 249890,
>                                             307796, 312586, 385960),
>                                     names = c("t1", "t2", "t3", "t4",
>                                               "t5", "t6", "t7", "t8",
>                                               "t9", "t10", "t11", "t12")),
>                             space = c("1", "2", "3", "4", "5", "6",
>                                       "1", "2", "3", "4", "5", "6"),
>                             strand = c(1, 1, 1, 1, 1, 1,
>                                        -1, -1, -1, -1, -1, -1))
> annotatedPeak2 = annotatePeakInBatch(myPeak1, AnnotationData = TFbindingSites)
> 
> 
> ## These are the same examples with standard naming, and they do not work ##
> myPeak1 = RangedData(IRanges(start = c(967654, 2010897, 2496704, 3075869,
>                                        3123260, 3857501, 201089),
>                              end = c(967754, 2010997, 2496804, 3075969,
>                                      3123360, 3857601, 201089),
>                              names = c("Site1", "Site2", "Site3", "Site4",
>                                        "Site5", "Site6", "site7")),
>                      space = c("chr1", "chr2", "chr3", "chr4", "chr5",
>                                "chr6", "chr2"))
> 
> TFbindingSites = RangedData(IRanges(start = c(967659, 2010898, 2496700,
>                                               3075866, 3123260, 3857500,
>                                               96765, 201089, 249670,
>                                               307586, 312326, 385750),
>                                     end = c(967869, 2011108, 2496920,
>                                             3076166, 3123470, 3857780,
>                                             96985, 201299, 249890,
>                                             307796, 312586, 385960),
>                                     names = c("t1", "t2", "t3", "t4",
>                                               "t5", "t6", "t7", "t8",
>                                               "t9", "t10", "t11", "t12")),
>                             space = c("chr1", "chr2", "chr3", "chr4",
>                                       "chr5", "chr6", "chr1", "chr2",
>                                       "chr3", "chr4", "chr5", "chr6"),
>                             strand = c("+", "+", "+", "+", "+", "+",
>                                        "-", "-", "-", "-", "-", "-"))
> annotatedPeak2 = annotatePeakInBatch(myPeak1, AnnotationData = TFbindingSites)
> Error in fix.by(by.x, x) : 'by' must specify valid column(s)
> 
> 
>> sessionInfo()
> R version 2.10.0 (2009-10-26)
> x86_64-redhat-linux-gnu
> 
> locale:
>  [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=C
>  [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=en_US
>  [7] LC_PAPER=en_US       LC_NAME=C            LC_ADDRESS=C
> [10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
>  [1] ChIPpeakAnno_1.2.2                  org.Hs.eg.db_2.3.6
>  [3] GO.db_2.3.5                         RSQLite_0.8-2
>  [5] DBI_0.2-5                           AnnotationDbi_1.8.1
>  [7] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.14.2
>  [9] Biostrings_2.14.12                  IRanges_1.4.10
> [11] multtest_2.2.0                      Biobase_2.6.1
> [13] biomaRt_2.2.0
> 
> loaded via a namespace (and not attached):
> [1] MASS_7.3-3      RCurl_1.3-1     XML_2.6-0       splines_2.10.0
> [5] survival_2.35-8
> 
> 
> Ivan Gregoretti, PhD
> National Institute of Diabetes and Digestive and Kidney Diseases
> National Institutes of Health
> 5 Memorial Dr, Building 5, Room 205.
> Bethesda, MD 20892. USA.
> Phone: 1-301-496-1592
> Fax: 1-301-496-9878
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>