[Bioc-sig-seq] ChIPpeakAnno fails to interpret standard chromosome name and strandedness

Tue Feb 9 07:12:32 CET 2010

Hello everybody

The package ChIPpeakAnno comes with a couple of example RangedData sets.

With those toy sets you can familiarise yourself with the package, and they work as expected.

Now, if you redefine the sets so that space '1' becomes 'chr1' or
strand '1' becomes '+', annotatePeakInBatch fails with the error shown
at the end of the second example below.

Note that 'chr1' and strand '+' are standard nomenclature; '1' and '1' are not.

I tried to fix it myself but failed.

Can anybody help?

Thanks,

Ivan

## These are the examples that work ##
library(ChIPpeakAnno)

# Peaks: space given as "1".."6" (no "chr" prefix)
myPeak1 = RangedData(
    IRanges(start = c(967654, 2010897, 2496704, 3075869, 3123260, 3857501, 201089),
            end   = c(967754, 2010997, 2496804, 3075969, 3123360, 3857601, 201089),
            names = c("Site1", "Site2", "Site3", "Site4", "Site5", "Site6", "site7")),
    space = c("1", "2", "3", "4", "5", "6", "2"))

# Annotation: same space labels, strand given as 1 / -1
TFbindingSites = RangedData(
    IRanges(start = c(967659, 2010898, 2496700, 3075866, 3123260, 3857500,
                      96765, 201089, 249670, 307586, 312326, 385750),
            end   = c(967869, 2011108, 2496920, 3076166, 3123470, 3857780,
                      96985, 201299, 249890, 307796, 312586, 385960),
            names = c("t1", "t2", "t3", "t4", "t5", "t6",
                      "t7", "t8", "t9", "t10", "t11", "t12")),
    space = c("1", "2", "3", "4", "5", "6", "1", "2", "3", "4", "5", "6"),
    strand = c(1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1))

annotatedPeak2 = annotatePeakInBatch(myPeak1, AnnotationData = TFbindingSites)

## These are the same examples with standard names; they do not work ##
myPeak1 = RangedData(
    IRanges(start = c(967654, 2010897, 2496704, 3075869, 3123260, 3857501, 201089),
            end   = c(967754, 2010997, 2496804, 3075969, 3123360, 3857601, 201089),
            names = c("Site1", "Site2", "Site3", "Site4", "Site5", "Site6", "site7")),
    space = c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr2"))

TFbindingSites = RangedData(
    IRanges(start = c(967659, 2010898, 2496700, 3075866, 3123260, 3857500,
                      96765, 201089, 249670, 307586, 312326, 385750),
            end   = c(967869, 2011108, 2496920, 3076166, 3123470, 3857780,
                      96985, 201299, 249890, 307796, 312586, 385960),
            names = c("t1", "t2", "t3", "t4", "t5", "t6",
                      "t7", "t8", "t9", "t10", "t11", "t12")),
    space = c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6",
              "chr1", "chr2", "chr3", "chr4", "chr5", "chr6"),
    strand = c("+", "+", "+", "+", "+", "+", "-", "-", "-", "-", "-", "-"))

annotatedPeak2 = annotatePeakInBatch(myPeak1, AnnotationData = TFbindingSites)
Error in fix.by(by.x, x) : 'by' must specify valid column(s)
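
A possible stopgap, which I have not verified against ChIPpeakAnno 1.2.2, is to translate the standard labels back to the toy-set convention before building the RangedData objects. The sketch below uses only base R; the helper names (stripChrPrefix, addChrPrefix, strandToNumeric) are hypothetical and not part of ChIPpeakAnno, and it assumes that the labels are the only thing the function objects to.

```r
# Hypothetical helpers (NOT part of ChIPpeakAnno): translate between the
# standard labels ("chr1", "+"/"-") and the toy-set labels ("1", 1/-1).

# "chr1" -> "1"; labels without the prefix are returned unchanged
stripChrPrefix <- function(x) sub("^chr", "", as.character(x))

# "1" -> "chr1"; labels that already carry the prefix are not doubled
addChrPrefix <- function(x) paste("chr", stripChrPrefix(x), sep = "")

# "+" -> 1, "-" -> -1; numeric 1/-1 pass through, anything else becomes NA
strandToNumeric <- function(s) {
    s <- as.character(s)
    out <- rep(NA_real_, length(s))
    out[s %in% c("+", "1")] <- 1
    out[s %in% c("-", "-1")] <- -1
    out
}

# Example: labels as they appear in the failing data set
spaceStd  <- c("chr1", "chr2", "chr3")
strandStd <- c("+", "+", "-")

spaceToy  <- stripChrPrefix(spaceStd)    # pass as space = ... to RangedData
strandToy <- strandToNumeric(strandStd)  # pass as strand = ... to RangedData
```

The converted vectors would replace the space and strand arguments in the second example, and addChrPrefix could restore the standard chromosome names on the result afterwards. This only hides the problem; it does not explain why the standard names are rejected.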

> sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-redhat-linux-gnu

locale:
 [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=C
 [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=en_US
 [7] LC_PAPER=en_US       LC_NAME=C            LC_ADDRESS=C
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] ChIPpeakAnno_1.2.2                  org.Hs.eg.db_2.3.6
 [3] GO.db_2.3.5                         RSQLite_0.8-2
 [5] DBI_0.2-5                           AnnotationDbi_1.8.1
 [7] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.14.2
 [9] Biostrings_2.14.12                  IRanges_1.4.10
[11] multtest_2.2.0                      Biobase_2.6.1
[13] biomaRt_2.2.0

loaded via a namespace (and not attached):
[1] MASS_7.3-3      RCurl_1.3-1     XML_2.6-0       splines_2.10.0
[5] survival_2.35-8

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878