[BioC] Motif search -- access to JASPAR, MotIV package, more TF-PWM relationships?
Paul Shannon
pshannon at fhcrc.org
Wed Apr 25 05:02:07 CEST 2012
Hi Julie,
FlyFactorSurvey looks great. Would that we had such a resource (curated, current, and growing) for all organisms!
A few questions, if I may:
1) What role with respect to FlyFactorSurvey do you picture us taking here at BioC? How can we help?
2) Your website (http://pgfe.umassmed.edu/TFDBS) recommends meme and TOMTOM for motif comparison. Do you use them yourself? If so, can you tell us about their strengths and weaknesses? How do they compare to clover? (http://zlab.bu.edu/clover/)
In that same spirit -- trying to find out more about this topic -- here are some more questions:
3) The JASPAR database (http://jaspar.genereg.net/html/DOWNLOAD) seems to be
mostly unchanged since 2009. Does anyone know their update policy?
4) Is TRANSFAC only for license holders?
5) Are there any other organism-specific gems like FlyFactorSurvey to be discovered out on the web?
Thanks!
- Paul
On Apr 24, 2012, at 3:16 PM, Zhu, Lihua (Julie) wrote:
> Paul,
>
> Thanks so much for the comprehensive summary of existing capability of Bioc
> and other resources for motif discovery and matching!
>
> Here is my response to your great initiative to collect use cases and open
> data resources.
>
> Here is an open data source for Drosophila which we developed:
> http://pgfe.umassmed.edu/TFDBS/
> http://nar.oxfordjournals.org/content/early/2010/11/19/nar.gkq858.full
>
> As you pointed out, there are several excellent Bioconductor packages
> available for the two common cases of motif problems, i.e., de novo motif
> discovery and motif matching to known motifs. It would be useful to have
> more motif databases available for motif comparison programs such as MotIV.
> In addition, we use clover to search for known motifs in a given set of
> sequences.
>
> Many thanks for sharing your insights!
>
> Best regards,
>
> Julie
>
>
> On 4/24/12 3:02 PM, "Paul Shannon" <pshannon at fhcrc.org> wrote:
>
>> The recent flurry of interest in sequence motifs here on the bioc list
>> suggests to us that maybe we at Bioconductor could strengthen our
>> infrastructure for this kind of work. If this work interests you -- either as
>> a package creator, or as a package user -- please suggest ideas or use cases.
>> What do you need? I will collect and collate the responses. We hope to
>> identify places where Bioc can help out.
>>
>> For background: we already have a number of packages (rGADEM, MotIV, cosmo,
>> BCRANK, motifRG) which address, with different strengths, what I believe to be
>> the two aspects of the motif problem:
>>
>> 1) Detecting enriched motifs in DNA sequence, or in ChIP-seq data (rGADEM,
>> cosmo, motifRG, BCRANK)
>> 2) Predicting the sequence motifs which bind to these enriched motifs, and
>> what binding molecules they belong to (MotIV)
>>
>> In the past, a lot of sequence motif/binding work has addressed the search for
>> transcription factor binding sites and their cognate transcription factors.
>> miRNAs, phosphorylation and methylation all pose related problems. Is there
>> support which we can practically offer here as well?
>>
>> In addition to Bioc packages, there are of course many worthwhile websites and
>> external tools: JASPAR, meme, STAMP (and TRANSFAC, for those with a license).
>> Nooshin mentioned the Arabidopsis-specific 'AthaMap' (http://www.athamap.de).
>> Are there other open-source data repositories like this for other organisms?
>> C. elegans, as Julie requested?
>>
>> Questions, suggestions, use cases and data sources are all welcome.
>>
>> Thanks!
>>
>> - Paul
>>
>>
>>
>>
>> On Apr 24, 2012, at 10:47 AM, Zhu, Lihua (Julie) wrote:
>>
>>> Eloi,
>>>
>>> I would like to use MotIV for a C. elegans dataset. What data source would
>>> you recommend for matchMotif? Many thanks for your help!
>>>
>>> Best regards,
>>>
>>> Julie
>>>
>>>
>>> On 4/24/12 1:28 PM, "Mercier Eloi" <emercier at chibi.ubc.ca> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am one of the developers of MotIV. I will be happy to help you if you
>>>> have any questions regarding the package.
>>>>
>>>> First, I want to mention that in the PLoS ONE paper, we used PICS,
>>>> rGADEM and MotIV as a pipeline, but MotIV can be used as a stand-alone
>>>> tool. Some of the advanced functions won't be available, though.
>>>>
>>>> Since the PWMs in MotIV correspond to human TFs, you may have to use your
>>>> own list of PWMs. What MotIV needs is a simple list of matrices
>>>> (head(jaspar) to view the format).
>>>> Jaspar's PWMs can be easily downloaded but it seems it only contains ~20
>>>> motifs. On the other hand, AthaMap has more motifs but I did not manage
>>>> to find an easy way to get them. Another place to look at is the AGRIS
>>>> website (http://arabidopsis.med.ohio-state.edu/downloads.html).
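As an illustration of the "simple list of matrices" mentioned above, here is a minimal base-R sketch. It assumes each motif is a 4-row numeric matrix (rows A, C, G, T; one column per motif position) whose columns sum to 1. That layout is an assumption and should be checked against head(jaspar) before relying on it. The motif name hypotheticalTF, the counts, and the helper countsToPwm are invented for illustration and are not part of MotIV.

```r
# Hypothetical helper (not part of MotIV): turn a 4 x n matrix of
# nucleotide counts into column-wise frequencies.
countsToPwm <- function(counts, pseudocount = 0) {
  stopifnot(is.matrix(counts), nrow(counts) == 4)
  counts <- counts + pseudocount
  # Divide every column by its total so that each column sums to 1.
  pwm <- sweep(counts, 2, colSums(counts), "/")
  rownames(pwm) <- c("A", "C", "G", "T")
  pwm
}

# Made-up counts for a 5-position motif. matrix() fills column by column,
# so each group of four numbers below is one position (A, C, G, T).
counts <- matrix(c( 8, 1,  0, 1,
                    0, 9,  1, 0,
                    0, 0, 10, 0,
                    2, 2,  2, 4,
                   10, 0,  0, 0),
                 nrow = 4)

# A named list of matrices: the assumed shape of a custom motif database.
myDatabase <- list(hypotheticalTF = countsToPwm(counts))

str(myDatabase)
```

A list built this way may be usable in place of the bundled jaspar list, but whether MotIV also needs matching score tables for a custom database is not covered here; browseVignettes('MotIV') is the place to check.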
>>>>
>>>> If you're only interested in the identification of the motifs and do not
>>>> want to do further analysis with R, I recommend looking at
>>>> http://www.benoslab.pitt.edu/stamp for the identification of your motifs.
>>>>
>>>> Regards,
>>>>
>>>> Eloi Mercier
>>>>
>>>>
>>>> On 12-04-24 07:36 AM, nooshin wrote:
>>>>> Thanks a lot for your suggestion. I will for sure have a look and inform
>>>>> you.
>>>>> Bests,
>>>>> Nooshin
>>>>>
>>>>>
>>>>> On 04/24/2012 04:15 PM, Tim Triche, Jr. wrote:
>>>>>> Ah, I see. GSL is a useful library to have installed regardless.
>>>>>> Hope things work out. I found your exchanges with Paul to be useful
>>>>>> reading, but obviously I was not reading closely enough, since Paul
>>>>>> started off his code sample with biocLite('MotIV'). Oops :-o
>>>>>>
>>>>>> Here is a paper that I found interesting, which does go into some
>>>>>> detail towards a "bulk" approach, from Gottardo's group:
>>>>>>
>>>>>> http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0016432
>>>>
>>>>>> Perhaps it will be useful to you as well, would be curious to hear if so.
>>>>>>
>>>>>> --t
>>>>>>
>>>>>> On Tue, Apr 24, 2012 at 7:00 AM, nooshin <n_omranian at yahoo.com> wrote:
>>>>>>
>>>>>>
>>>>>> Thanks, it has already been solved. It needs the GSL package, which
>>>>>> is a bit problematic, but I have sorted that out.
>>>>>>
>>>>>> But it includes only 5 matrices for Arabidopsis, both on the
>>>>>> webpage and in the package!
>>>>>> I'm downloading them manually from AthaMap!
>>>>>>
>>>>>> Thanks again; I will keep waiting for the 'bulk' approach.
>>>>>>
>>>>>> Bests,
>>>>>> Nooshin
>>>>>>
>>>>>>
>>>>>> On 04/24/2012 03:16 PM, Tim Triche, Jr. wrote:
>>>>>>> source("http://bioconductor.org/biocLite.R")
>>>>>>> biocLite("MotIV")
>>>>>>>
>>>>>>> ought to do the trick for you
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 24, 2012 at 1:01 AM, nooshin <n_omranian at yahoo.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Paul,
>>>>>>>
>>>>>>> Thanks a lot.
>>>>>>> I forgot to include bioc, since I only replied to you (not to
>>>>>>> all).
>>>>>>>
>>>>>>> I can't install the MotIV package to check. I searched Google but
>>>>>>> couldn't find a solution. Do you have any suggestions for
>>>>>>> installing this package?
>>>>>>>
>>>>>>> Bests,
>>>>>>> Nooshin
>>>>>>>
>>>>>>> On 04/23/2012 06:35 PM, Paul Shannon wrote:
>>>>>>>> (redirecting this back to the Bioc list...)
>>>>>>>>
>>>>>>>> Hi Nooshin,
>>>>>>>>
>>>>>>>> The 'bulk' approach is not quite so ready as I predicted.
>>>>>>>> I might have something by the end of the week.
>>>>>>>>
>>>>>>>> As for mapping between PWMs and TFs, I have most often done
>>>>>>>> this with 'tom-tom' from the meme website.
>>>>>>>>
>>>>>>>> But I just discovered what looks like a good -- maybe
>>>>>>>> better -- approach: the Bioconductor MotIV package, which
>>>>>>>> includes a 2010 version of JASPAR.
>>>>>>>> Try this:
>>>>>>>>
>>>>>>>> source("http://bioconductor.org/biocLite.R")
>>>>>>>>
>>>>>>>> biocLite ('MotIV')
>>>>>>>> library (MotIV);
>>>>>>>> browseVignettes ('MotIV')
>>>>>>>>
>>>>>>>> The jaspar data in this package has 130 TF-PWM mappings,
>>>>>>>> which appear to be human. More must be known, and publicly
>>>>>>>> available. The JASPAR website has a 'JASPAR CORE Plantae'
>>>>>>>> data set that
>>>>>>>> - is probably what you are interested in
>>>>>>>> - might be downloadable, and convertible to the form
>>>>>>>> MotIV wants.
>>>>>>>>
>>>>>>>> Perhaps other readers of the list have other suggestions.
>>>>>>>>
>>>>>>>> If you have any questions on this, please include 'BioC' in
>>>>>>>> your reply, so that we can all get better at this!
>>>>>>>>
>>>>>>>> - Paul
>>>>>>>>
>>>>>>>>
>>>>>>>> On Apr 23, 2012, at 6:53 AM, nooshin wrote:
>>>>>>>>
>>>>>>>>> Hi Paul,
>>>>>>>>>
>>>>>>>>> Many thanks for your comprehensive information and code!
>>>>>>>>> I have a question about obtaining PWMs. How and where can I
>>>>>>>>> download the matrices for all TFs that have a PWM available?
>>>>>>>>> I need them only for Arabidopsis thaliana.
>>>>>>>>> Is there an R package to which I can give a TF and receive
>>>>>>>>> its PWM, or an online database from which I can download
>>>>>>>>> them? Since Friday I have been struggling to find these
>>>>>>>>> matrices for different A. thaliana TFs. It would be great
>>>>>>>>> if you could help me get them.
>>>>>>>>>
>>>>>>>>>> If you want to do this in bulk, Hervé has some lovely
>>>>>>>>>> code to make that efficient.
>>>>>>>>> Also can I have this? :)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks a lot in advance.
>>>>>>>>> Best regards,
>>>>>>>>> Nooshin
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives:
>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> /A model is a lie that helps you see the truth./
>>>>>>> /
>>>>>>> /
>>>>>>> Howard Skipper
>>>>>>> <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>