[Bioc-sig-seq] Input from multiple Solexa runs

ig2ar-saf2 at yahoo.co.uk
Thu Apr 23 23:05:57 CEST 2009


Thank you Deepayan.

What package provides combineLaneReads()?

Ivan





----- Original Message ----
From: Deepayan Sarkar <deepayan.sarkar at gmail.com>
To: ig2ar-saf2 at yahoo.co.uk
Cc: bioc-sig-sequencing at r-project.org
Sent: Thursday, 23 April, 2009 16:37:57
Subject: Re: [Bioc-sig-seq] Input from multiple Solexa runs

On Thu, Apr 23, 2009 at 12:24 PM,  <ig2ar-saf2 at yahoo.co.uk> wrote:
>
> Hello,
>
> How do you pool together reads from multiple Solexa runs?
>
> Specifically:
>
> Say that the structure of my hard drive looks like this
>
> /experiment01/GERALD_analysis01/(here all 8 lanes)
> /experiment01/GERALD_analysis02/(here all 8 lanes)
> /experiment02/GERALD_analysis01/(here all 8 lanes)
>
> Now say that my control1 lanes are in
> experiment01, analysis01, lanes 1 and 2
> experiment01, analysis02, lanes 3 and 4
> (all four lanes are biologically identical)
>
> Now say that my treatment1 is in
> experiment02, analysis01, lane 5 and 7
>
> Now say that I plan to read all Reads with a filter instance called myFilter.
>
> The question:
>
> How do I collect all that information into a single GenomeDataList object where I can call its GenomeData objects like this
> myGenomeDataList$control1
> myGenomeDataList$treatment1
> ?
>
> Please try to answer the specific example because there is no alternative documentation.

The function you are looking for is 'combineLaneReads'. If x is a
GenomeDataList, then

combineLaneReads(x)

will give you a GenomeData object combining all reads in x.
Additionally, you can treat a GenomeDataList object like a list, in
the sense that you can subset it using [, combine several using c(),
and change their names using names()<-.
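
As a rough illustration of that list-like behaviour, the sketch below uses
plain named R lists as stand-ins for GenomeDataList objects. That is an
assumption made so the example runs without any Bioconductor package; the
variable names and lane contents are made up. The `[`, c() and names()<-
idioms are the ones described above.

```r
# Plain lists standing in for two per-run GenomeDataList objects
# (hypothetical data; real objects would hold reads, not toy integers).
run_a <- list("1" = c(10L, 20L), "2" = c(30L), "3" = c(40L, 50L))
run_b <- list("3" = c(60L), "4" = c(70L, 80L))

# Subset lanes by name with `[`; the result is still a list.
picked_a <- run_a[c("1", "2")]
picked_b <- run_b[c("3", "4")]

# Combine the selected lanes from both runs with c().
pooled <- c(picked_a, picked_b)

# Rename the lanes with names()<-.
names(pooled) <- c("a_lane1", "a_lane2", "b_lane3", "b_lane4")

print(names(pooled))
print(length(pooled))
```

Note that single-bracket subsetting keeps the container type, which is why
the subsets can be passed straight to c().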


So, let's say your per-run data are in variables called

expt1_analysis1
expt1_analysis2
expt2_analysis1

which are all GenomeDataList objects with names "1", "2", ..., "8" (I
assume you know how to do this, as we discussed it a few days back).
Then, the usage would be:

control1 <- combineLaneReads(c(expt1_analysis1[c("1", "2")],
                               expt1_analysis2[c("3", "4")]))
treatment1 <- combineLaneReads(expt2_analysis1[c("5", "7")])

(these would both be GenomeData objects). To combine them into a
GenomeDataList, simply do

myGenomeDataList <- GenomeDataList(list(control1 = control1,
                                        treatment1 = treatment1))

or combining the two steps:

myGenomeDataList <-
    GenomeDataList(list(control1 =
                            combineLaneReads(c(expt1_analysis1[c("1", "2")],
                                               expt1_analysis2[c("3", "4")])),
                        treatment1 =
                            combineLaneReads(expt2_analysis1[c("5", "7")])))
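
To make the shape of the whole workflow concrete without real data, here is
a toy version in base R. Everything in it is an assumption for illustration:
each lane is a plain named list mapping chromosome name to read start
positions, and pool_lanes() is a hypothetical stand-in that only mimics the
pooling idea (it is not how combineLaneReads is implemented, and it ignores
strand). The final object is an ordinary list rather than a GenomeDataList.

```r
# Hypothetical stand-in for combineLaneReads(): pools read start positions
# chromosome by chromosome across a list of lanes.
pool_lanes <- function(lanes) {
  chroms <- unique(unlist(lapply(lanes, names)))
  pooled <- lapply(chroms, function(chr) {
    # A lane without this chromosome contributes NULL, which unlist() drops.
    sort(unlist(lapply(lanes, function(lane) lane[[chr]]), use.names = FALSE))
  })
  names(pooled) <- chroms
  pooled
}

# Toy per-run data, keyed by lane number (made-up positions).
expt1_analysis1 <- list("1" = list(chr1 = c(5L, 100L), chr2 = 7L),
                        "2" = list(chr1 = 50L),
                        "3" = list(chr1 = 1L))
expt1_analysis2 <- list("3" = list(chr1 = 20L, chr2 = 3L),
                        "4" = list(chr2 = 9L))
expt2_analysis1 <- list("5" = list(chr1 = 11L),
                        "7" = list(chr1 = 2L, chr2 = 8L))

# Same selection pattern as above: pick lanes by name, combine, then pool.
control1 <- pool_lanes(c(expt1_analysis1[c("1", "2")],
                         expt1_analysis2[c("3", "4")]))
treatment1 <- pool_lanes(expt2_analysis1[c("5", "7")])

# Named container, so samples can be accessed with $.
my_data <- list(control1 = control1, treatment1 = treatment1)
print(my_data$control1$chr1)
print(my_data$treatment1$chr2)
```

Lane "3" of the first analysis is deliberately left out of control1, to show
that only the named lanes are pooled.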


-Deepayan






