[BioC] pairwiseAlignment of PDB files to canonical protein structure
Gregory Ryslik
rsaber at comcast.net
Sun Jun 3 21:15:19 CEST 2012
Hi Everyone,
I am new to this list so please forgive me if I miss something. Over the
past few weeks, I have been attempting to match the positions provided
by the PDB to the canonical protein structure. For instance, if a pdb
file puts a CA Leucine residue at position 5, that does not mean that
position 5 in the canonical protein structure (as shown by uniprot or
other databases) is a Leucine, because PDB residue numbering does not
necessarily follow the canonical numbering. Using CIF files from the PDB
I am able to reconstruct the canonical numbering for about 70% of all
files.
For the structures that my algorithm fails to process, I would like to
align the residues I pull from the CIF file with the canonical structure
instead. To do this, I am using the pairwiseAlignment function in the
Biostrings package. The function seems to work very well, but I am new
to alignment and am wondering which parameters are best suited to my
problem.
Suppose I have the canonical protein sequence in "canonical.protein" and
the CIF sequence that I pull from the PDB in "protein.extracted". I then
run "pairwiseAlignment(pattern = canonical.protein, subject =
protein.extracted)" with the default settings for all other parameters.
If anyone has done something similar, could you point me to parameter
values that work well, especially for gapOpening, gapExtension, and the
like?
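To make the question concrete, here is a minimal sketch of the kind of call
I mean, with the parameters in question written out explicitly. The two
sequences are made-up placeholders, and the alignment type, substitution
matrix and gap penalties are illustrative values only, not settings I know
to be good. The sign convention for the gap penalties has differed between
Biostrings releases, so please check ?pairwiseAlignment for your version.

```r
library(Biostrings)
# Assumption: in recent Bioconductor releases pairwiseAlignment() is
# provided by the pwalign package, so load it when it is installed.
if (requireNamespace("pwalign", quietly = TRUE)) library(pwalign)

# Hypothetical sequences, for illustration only.
canonical.protein <- AAString("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
protein.extracted <- AAString("AYIAKQRQISFVKSHFSRQ")

# Illustrative settings, not recommendations:
#  - "local-global" aligns the whole subject (the PDB fragment) against
#    a consecutive stretch of the pattern (the canonical sequence)
#  - BLOSUM62 is one of the substitution matrices shipped with the package
#  - gap penalties are written here as positive costs
aln <- pairwiseAlignment(pattern = canonical.protein,
                         subject = protein.extracted,
                         type = "local-global",
                         substitutionMatrix = "BLOSUM62",
                         gapOpening = 10,
                         gapExtension = 0.5)

# Alignment score, and the canonical positions spanned by the fragment.
score(aln)
start(pattern(aln))
end(pattern(aln))
```

The start and end of the aligned pattern are what I would then use to
translate the PDB residue positions into canonical positions.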
Thank you for your help,
Greg