[Bioc-sig-seq] ShortRead support for "id", "paired read number" and "multiplex index" when reading an Illumina export file
Martin Morgan
mtmorgan at fhcrc.org
Wed Feb 24 15:19:48 CET 2010
Hi Nicolas --
These sound like very useful additions, and I'll try to incorporate
them over the next day or so.
Thank you very much for the contribution!
Martin
On 02/24/2010 02:55 AM, Nicolas Delhomme wrote:
> Hi Martin, everyone,
>
> I've been looking forward to doing this for a long time, and I finally
> found the time. I dove into the ShortRead C code to add some
> functionality for loading Illumina export files. I've added an option
> to the readAligned method, specifically for the type "SolexaExport",
> that retrieves, in addition to the default information, the multiplex
> barcode and the paired read number (the 6th and 7th columns of the
> export file, which were ignored so far). Additionally, using this
> option creates the sequence identifier (i.e. the one you get in a
> fastq file extracted from an export file) and populates the id slot
> of the AlignedRead object.
>
> I've attached a diff of my local working copy against revision 44842
> of ShortRead (the current one, as of this morning), two example export
> files (one from a single-end (SE) and one from a paired-end (PE)
> sequencing experiment) and a small R script showing the modified usage.
>
> I think that these functionalities are very interesting for people, like
> me, who have to analyze PE, multiplexed data, and I'd be glad if they
> got integrated.
>
> Finally, I'm far from being a C expert, so you might wish (or need) to
> optimize what I've written.
>
> Best,
>
> ---------------------------------------------------------------
> Nicolas Delhomme
>
> High Throughput Functional Genomics Center
>
> European Molecular Biology Laboratory
>
> Tel: +49 6221 387 8426
> Email: nicolas.delhomme at embl.de
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
> ---------------------------------------------------------------
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
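
Note: the export-file handling described in the quoted message can be
illustrated with a small standalone sketch. This is not the ShortRead
patch (the attached diff is not reproduced in this post); it is a Python
illustration under stated assumptions. It assumes a tab-separated export
line whose first ten fields are machine, run, lane, tile, x, y, multiplex
index, read number, sequence and quality, which would put the index and
read number at zero-based positions 6 and 7. It also assumes the
fastq-style identifier has the form machine_run:lane:tile:x:y#index/read.
The names parse_export_line and ExportRead are hypothetical.

```python
from typing import NamedTuple


class ExportRead(NamedTuple):
    """Subset of one export-file record (hypothetical container)."""
    read_id: str
    multiplex_index: str
    paired_read_number: str
    sequence: str
    quality: str


def parse_export_line(line: str) -> ExportRead:
    """Parse one tab-separated export line.

    Assumed layout of the first ten fields (an assumption, not taken
    from the ShortRead sources): machine, run, lane, tile, x, y,
    multiplex index, read number, sequence, quality.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 fields, got {len(fields)}")

    (machine, run, lane, tile, x, y,
     index, read_number, sequence, quality) = fields[:10]

    # Assumed identifier form: machine_run:lane:tile:x:y#index/read
    read_id = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}"
    # Assumption: an empty read-number field means no "/N" suffix.
    if read_number:
        read_id += f"/{read_number}"

    return ExportRead(read_id, index, read_number, sequence, quality)


if __name__ == "__main__":
    example = "\t".join(
        ["HWUSI-EAS100R", "6", "1", "73", "941", "1973",
         "ACGTAC", "1", "GATTACA", "IIIIIII"]
    )
    print(parse_export_line(example).read_id)
```

Keeping the index and read number as separate fields, rather than only
embedding them in the identifier, makes it straightforward to split
reads by barcode or by mate afterwards.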