[BioC] snpStats, read.long, alleles in two columns
Liz Hare
doggene at earthlink.net
Wed Mar 7 16:16:25 CET 2012
Hello,
I am trying to read an Illumina final format .txt file (tab-delimited)
into snpStats. The file contains 4 columns: snp, sample, allele 1, and
allele 2. Some sample lines:
BICF2G630100019 04-0677/J279 C C
BICF2G630100032 04-0677/J279 T T
BICF2G630100034 04-0677/J279 G G
BICF2G630100043 04-0677/J279 A A
BICF2G630100054 04-0677/J279 T T
BICF2G630100063 04-0677/J279 T C
BICF2G630100075 04-0677/J279 T T
BICF2G63010009 04-0677/J279 G G
BICF2G630100090 04-0677/J279 C C
I can't figure out from the documentation or vignette on data input how
to specify that the alleles are in two columns.
This doesn't work:
> CanineHD <- read.long(file="filename",
+ fields=c(snp=1, sample=2, genotype=3, genotype=4),
+ verbose=TRUE)
Data to be read from the file filename
No confidence thresholds specified
Genotype read as a single field of two characters (which specify the
alleles)
Initial scan of file
First sample: 04-0677/J279
First snp: BICF2G630100019
Last snp: YNp1-608
Last sample: 10-1160
96x173662 matrix to be read
Reading genotypes from file
20% 40% 60% 80% 100%
.........|.........|.........|.........|.........|
-Error in read.long(file = "filename", :
at line 1: C (expecting a 2-character genotype field)
In addition: Warning message:
closing unused connection 3 (filename)
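The message suggests that field 3 of the first line holds a single allele, not a two-character genotype. A quick base R check on the first sample line (copied from above, assuming tabs are the delimiter) seems to confirm this:

```r
# First data line from the file, with tab-separated fields (assumed).
first_line <- "BICF2G630100019\t04-0677/J279\tC\tC"

# Split on literal tabs and inspect the fields.
fields <- strsplit(first_line, "\t", fixed = TRUE)[[1]]
n_fields <- length(fields)        # number of fields on the line
genotype_width <- nchar(fields[3]) # width of what read.long takes as genotype

print(n_fields)
print(genotype_width)
```

So field 3 on its own is one character wide, which would explain why a two-character genotype field is not found.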
So I tried:
> CanineHD <- read.long(file="filename",
+ fields=c(snp=1, sample=2, genotype=3),
+ gcodes="\t", codes="nucleotide", verbose=TRUE)
Error in read.long(file = "filename", :
unused argument(s) (codes = "nucleotide")
Then I tried the split argument:
> CanineHD <- read.long(file="filename",
+ fields=c(snp=1, sample=2, genotype=3),
+ split="\t", verbose=TRUE)
Data to be read from the file filename
No confidence thresholds specified
Genotype read as a single field of two characters (which specify the
alleles)
Initial scan of file
First sample: 04-0677/J279
First snp: BICF2G630100019
Last snp: YNp1-608
Last sample: 10-1160
96x173662 matrix to be read
Reading genotypes from file
20% 40% 60% 80% 100%
.........|.........|.........|.........|.........|
-Error in read.long(file = "filename", :
at line 1: C (expecting a 2-character genotype field)
In addition: Warning message:
closing unused connection 12 (filename)
Is there a keyword for alleles rather than genotypes? I tried
substituting the word 'allele' but didn't get anywhere. I suspect I'm
not understanding something in the Details section of the documentation.
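One workaround I can think of, if read.long really does need the genotype in a single field, is to paste the two allele columns together before reading. Below is a minimal sketch in base R. The helper merge_allele_columns is hypothetical (not part of snpStats), and it assumes the file has no header lines and exactly four tab-delimited columns. I have not checked how no-calls would be handled.

```r
# Hypothetical helper, not part of snpStats: paste the two allele columns
# (fields 3 and 4) into one two-character genotype field.
merge_allele_columns <- function(infile, outfile) {
  # colClasses = "character" keeps an allele such as "T" from being
  # converted to the logical value TRUE.
  calls <- read.table(infile, sep = "\t", header = FALSE, quote = "",
                      comment.char = "", colClasses = "character",
                      col.names = c("snp", "sample", "allele1", "allele2"))
  calls$genotype <- paste0(calls$allele1, calls$allele2)
  write.table(calls[, c("snp", "sample", "genotype")], file = outfile,
              sep = "\t", quote = FALSE, row.names = FALSE,
              col.names = FALSE)
  invisible(outfile)
}

# Small demonstration on three of the sample lines above.
infile <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c("BICF2G630100019\t04-0677/J279\tC\tC",
             "BICF2G630100054\t04-0677/J279\tT\tT",
             "BICF2G630100063\t04-0677/J279\tT\tC"), infile)
merge_allele_columns(infile, outfile)
merged <- readLines(outfile)
print(merged)

# The merged file could then be passed to the same call as before
# (untested here, since it needs snpStats and the full data file):
# CanineHD <- read.long(file = outfile,
#                       fields = c(snp = 1, sample = 2, genotype = 3),
#                       verbose = TRUE)
```

For a file of this size (96 samples by 173662 SNPs) read.table may be slow, so this is only meant as a stopgap until I find out whether read.long can take the alleles from two columns directly.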
Thanks,
Liz
--
Liz Hare PhD
Dog Genetics LLC
doggene at earthlink.net
http://www.doggenetics.com