[BioC] New package to identify differentially expressed genes from RNA-seq data
Ulrike Goebel
ugoebel at mpiz-koeln.mpg.de
Fri Oct 16 09:04:42 CEST 2009
Hi Likun,
Likun Wang wrote:
> Dear Ulrike,
>
> Are there invalid values in your gene expression file?
That was it! Some very high expression values contained commas, introduced by
Excel as thousands separators.
After removing the commas, it works fine.
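
For anyone hitting the same error, here is a minimal sketch in base R of how a comma used as a thousands separator makes a column non-numeric, and one way to find and clean such values. The file contents below are made up for illustration (only the gene IDs follow the pattern of my input), and the code is not part of DEGseq; it assumes a reasonably recent R in which read.table() accepts a text= argument.

```r
# Hypothetical tab-separated input: one value in the third column carries
# an Excel-style thousands separator.
lines <- c("SGN-U573325\t6.17\t1,619.72",
           "SGN-U591447\t0.77\t101.16",
           "SGN-U592038\t6.27\t37.8")

rt <- read.table(text = lines, header = FALSE, sep = "\t",
                 stringsAsFactors = FALSE)

# Column 2 parses as numeric; column 3 does not, because "1,619.72"
# cannot be converted, so the whole column is read as character.
is.numeric(rt[, 2])
is.numeric(rt[, 3])

# Locate the offending rows: non-missing values that fail numeric conversion.
bad <- !is.na(rt[, 3]) & is.na(suppressWarnings(as.numeric(rt[, 3])))
rt[bad, ]

# Strip the commas and convert the column to numeric.
rt[, 3] <- as.numeric(gsub(",", "", rt[, 3], fixed = TRUE))
sum(rt[, 3])
```

Checking the columns this way before calling DEGexp() shows which rows need fixing, rather than only that something in the file is not numeric.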
Seems to be a nice package !
Best, Ulrike
> Look at the following example. All the values should be numeric.
>
> > file1 <- "./test.txt"
> > rt1 <- read.table(file1, header=FALSE,sep="\t")
> > head(rt1,n=4)
> V1 V2 V3
> 1 SGN-U573325 6.17 STRING1
> 2 SGN-U591447 0.77 <NA>
> 3 SGN-U592038 6.27 OK
> 4 SGN-U573325 6.17 619.72
> > rt1[,2]
> [1] 6.17 0.77 6.27 6.17
> > rt1[,3]
> [1] STRING1 <NA> OK 619.72
> Levels: 619.72 OK STRING1
> > mode(rt1[,2])
> [1] "numeric"
> > mode(rt1[,3])
> [1] "numeric" # Misleading: a factor's mode is "numeric". We do not want this
> > mode(as(rt1[,2], "matrix"))
> [1] "numeric"
> > mode(as(rt1[,3], "matrix"))
> [1] "character" # This exposes the non-numeric values. We want this
>
> Please contact me anytime if this problem is not fixed.
> Thanks.
> Best regards.
> ---------
> Likun
>
> 2009/10/15 Ulrike Goebel <ugoebel at mpiz-koeln.mpg.de>
>
> Dear Likun,
>
> I am not sure whether the following is a problem of your package,
> or my input ..
>
> I wanted to compare two samples with a single replicate each,
> using DEGseq(method="MARS").
>
> The input file simply looks like this:
> SGN-U573325 6.17 619.72
> SGN-U591447 0.77 101.16
> SGN-U592038 6.27 37.8
> ...
> (The fields are tab-separated)
>
> >DEGexp(geneExpFile1=my_infile,expCol1=2,
> geneExpFile2=my_infile,expCol2=3,
> groupLabel1="condition1",groupLabel2="condition2",
> method="MARS",
> sep="\t",
> header=FALSE
> )
> Please wait...
> Error in sum(exp_values) : invalid 'type' (character) of argument
>
> I traced this back by calling the routine in debug mode:
>
> debug: rt1 <- read.table(geneExpFile1, header = header, sep = sep)
> Browse[2]>head(rt1,n=2)
> V1 V2 V3
> 1 SGN-U573325 6.17 619.72
> 2 SGN-U591447 0.77 101.16
> Browse[2]> mode(rt1[,expCol1[i]])
> [1] "numeric"
>
> Browse[2]>
> debug: exp_values <- as(rt1[expCol1[i]], "matrix")
> Browse[2]> mode(exp_values)
> [1] "character"
>
> I am not sure whether you have a reason to extract the columns
> using "rt1[expCol1[i]]" rather
> than "rt1[,expCol1[i]]" ? The latter *is* numeric ...
>
> Best regards
>
> Ulrike
>
> > sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-08-01 r49053)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] tcltk stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] DEGseq_0.99.0 samr_1.26 impute_1.18.1 qvalue_1.19.1
>
> loaded via a namespace (and not attached):
> [1] tools_2.10.0
>
> Likun Wang wrote:
>
> You can find it at
> http://www.bioconductor.org/packages/2.5/bioc/html/DEGseq.html.
> Thanks for your attention, contact me anytime.
>
> 2009/10/15 Naomi Altman <naomi at stat.psu.edu>
>
> I could not find this package on bioconductor.org. Thanks to rules about
> software downloads here, it will take a while for me to get R 2.10.0, and I
> would like to have a look at the documentation in the meantime. Where could
> I find it?
>
> Thanks,
> Naomi
>
>
> At 09:15 AM 10/14/2009, Likun Wang wrote:
>
> Hi all,
> We present a new R package, DEGseq, for identifying differentially
> expressed genes from RNA-seq data. The input of DEGseq is uniquely mapped
> reads from RNA-seq data with a gene annotation of the corresponding genome,
> or gene (or transcript isoform) expression values provided by other
> programs. The output of DEGseq includes a text file and an XHTML summary
> page. The text file contains the expression values for the samples, a
> P-value and two kinds of Q-values for each gene to denote its expression
> difference between libraries. Two novel MA-plot based methods along with
> some existing methods have been integrated into it.
>
> You may access it through the commands:
> > source("http://bioconductor.org/biocLite.R") # R >= 2.10.0
> > biocLite("DEGseq")
>
> Comments, questions, etc, are all welcome.
> Best regards
> Likun
>
>
>
> Naomi S. Altman                 814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics             814-863-7114 (fax)
> Penn State University           814-865-1348 (Statistics)
> University Park, PA 16802-2111
>
> --
> Dr. Ulrike Goebel
> Bioinformatics Support
> Max-Planck Institute for Plant Breeding Research
> Carl-von-Linne Weg 10
> 50829 Cologne
> Germany
> +49(0) 221 5062 121
>
> --
> Likun Wang
> MOE Key Laboratory of Bioinformatics and Bioinformatics Div,
> TNLIST / Department of Automation, Tsinghua University,
> Beijing 100084, China
> Tel: +86-10-62794294
> Fax: +86-10-62786911
> Email: wang.likun at gmail.com
--
Dr. Ulrike Goebel
Bioinformatics Support
Max-Planck Institute for Plant Breeding Research
Carl-von-Linne Weg 10
50829 Cologne
Germany
+49(0) 221 5062 121