[Bioc-sig-seq] Reducing Solexa's export.txt in preparation for a ChIP-seq analysis.

Robert Gentleman rgentlem at fhcrc.org
Thu Mar 19 15:30:06 CET 2009


Hi Ivan,
  You are skipping one part of the pipeline: using the ShortRead package to
read in your data and perform some sort of QA.  The output will be the aligned
reads, but you should take the diagnostics seriously.  We find lots of problems
that need to be caught early so that the downstream analyses are reasonable.

  As for how one justifies discarding duplicate reads, why not ask it the
other way around?  How does one justify keeping them?  In either case, one
thing to do is to try to decide whether those duplicate reads represent
biological replicates (i.e. the same piece of DNA was selected twice) or are
more likely to represent PCR artifacts.  If the former, I would keep them; if
the latter, I would discard them.  For the example given, it is the latter.
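
  To make the reduction step concrete, here is a minimal sketch of the idea in
Python (not the ShortRead API, and not necessarily how the workshop data were
prepared).  It assumes the aligned reads have already been parsed into
(chromosome, position, strand, alignment score) tuples.  The function name
reduce_reads, the min_score parameter and its default, and the tuple layout are
all hypothetical choices for illustration.  A "duplicate" is taken here to mean
a read with the same chromosome, reported position and strand as one already
seen, which is one common definition but not the only one.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (chromosome, position, strand, alignment score).
Read = Tuple[str, int, str, int]


def reduce_reads(reads: Iterable[Read], min_score: int = 10) -> List[Tuple[str, int, str]]:
    """Reduce aligned reads to unique (chromosome, position, strand) triples.

    Reads scoring below min_score are dropped, and reads sharing the same
    chromosome, position and strand are collapsed into a single entry.
    """
    unique_starts = set()
    for chrom, pos, strand, score in reads:
        if score < min_score:
            continue  # quality score cutoff
        unique_starts.add((chrom, pos, strand))  # the set drops duplicates
    return sorted(unique_starts)


reads = [
    ("chr1", 100, "+", 40),
    ("chr1", 100, "+", 38),  # same start and strand as above: a duplicate
    ("chr1", 100, "-", 40),  # same position, opposite strand: kept
    ("chr1", 250, "+", 5),   # below the score cutoff: dropped
    ("chr2", 75, "-", 30),
]
reduced = reduce_reads(reads)
print(reduced)
```

  Note that collapsing duplicates this way discards them unconditionally.  If
you decide that your duplicates are more likely to be biological than PCR
artifacts, you would skip that step and keep only the score filter.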

 best wishes
   Robert


ig2ar-saf2 at yahoo.co.uk wrote:
> Hello,
> 
> In preparation for analysing my own ChIP-seq data, I am trying to follow the steps described in this sample workflow:
> 
> http://www.bioconductor.org/workshops/2008/SeattleNov08/ChIP-seq/workflow.pdf
> 
> The document starts by loading data that has been "reduced to a set of alignment start positions (including orientation)".
> 
> Can somebody elaborate on that a little bit or, ideally, show it with one example?
> 
> Also, as part of the reduction, the procedure "removed all duplicate reads and applied a quality score cutoff". The score cutoff is fine but how is removing duplicates justified?
> 
> Thank you,
> 
> Ivan
> 
> 
> 
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


