[R] Errors melt()ing data...
Neil Shephard
nshephard at gmail.com
Thu Feb 28 12:42:14 CET 2008
Hi,
I'm trying to melt() some data for subsequent cast()ing and am
encountering errors.
The overall process requires a couple of cast()s and melt()s.
########Start Session 1##########
## I have the data in a (fully) melted format and can cast it fine...
> norm1[1:10,]
   Pool       SNP Sample.Name variable       value
1     1 rs1045485      CA0092 Height.1 0.003488853
2     1 rs1045485      CA0142 Height.2 0.333274200
3     1 rs1045485      CO0007 Height.2 0.396250961
4     1 rs1045485      CA0047 Height.2 0.535686831
5     1 rs1045485      CO0149 Height.2 0.296611673
6     1 rs1045485      CA0106 Height.2 0.786115546
7     1 rs1045485      CO0191 Height.1 0.669268523
8     1 rs1045485      CA0097 Height.2 0.609603217
9     1 rs1045485      CA0076 Height.1 0.004257584
10    1 rs1045485      CO0017 Height.2 0.589261427
## This gets the data
> t.norm1 <- cast(norm1, Sample.Name + SNP + Pool ~ variable, sum)
> t.norm1[1:10,]
   Sample.Name        SNP Pool    Height.1  Height.2
1       CA0001  rs1045485    1 0.003311454 0.4789782
2       CA0001  rs1045487    1 0.001818583 0.5089827
3       CA0001 rs11212570    1 0.006078444 0.4496129
4       CA0001 rs13010627    1 0.008753049 0.5424499
5       CA0001    rs13113    1 0.186821486 0.2294912
6       CA0001 rs13402616    1 0.012030235 0.4161610
7       CA0001   rs170548    1 0.002425579 0.3111907
8       CA0001 rs17503908    1 0.002179705 0.3063292
9       CA0001  rs1799794    1 0.003632984 0.5049848
10      CA0001  rs1799796    1 0.389774160 0.0000000
## I now melt it and cast again to the desired format
> t <- melt(t.norm1, id = c("Sample.Name", "SNP"))
> cast.height.norm1 <- cast(t, SNP ~ Sample.Name + variable, sum)
> cast.height.norm1[1:10,1:5]
          SNP CA0001_Height.1 CA0001_Height.2 CA0002_Height.1 CA0002_Height.2
1   rs1045485     0.003311454       0.4789782     0.401218142     0.343031163
2   rs1045487     0.001818583       0.5089827     0.007329439     0.453102612
3  rs11212570     0.006078444       0.4496129     0.015164118     0.434320814
4  rs13010627     0.008753049       0.5424499     0.013440474     0.463863778
5     rs13113     0.186821486       0.2294912     0.224865477     0.272916077
6  rs13402616     0.012030235       0.4161610     0.191099755     0.285744704
7    rs170548     0.002425579       0.3111907     0.365986770     0.240187431
8  rs17503908     0.002179705       0.3063292     0.011100347     0.232259627
9   rs1799794     0.003632984       0.5049848     0.430635350     0.008364312
10  rs1799796     0.389774160       0.0000000     0.173564141     0.235928006
########Finish Session 1##########
This is the format that I'm aiming for, and everything has worked fine so far.
However, I wish to derive two transformed variables (polar.1 and
polar.2) based on each row of t.norm1 and then melt() and cast() the
data into the same desired format.
########Start Session 2##########
## Now generate polar co-ordinates
t.norm1$polar.1 <- log10(sqrt(t.norm1$Height.1^2 + t.norm1$Height.2^2))
t.norm1$polar.2 <- atan((t.norm1$Height.2 / t.norm1$Height.1))
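As a sanity check on the two transformations, here is a minimal base R sketch on a toy data frame. The values in `toy` are made up for illustration and are not the real data. It shows what the formulas return when one or both heights are zero, and includes atan2() as a possible alternative (an assumption, not part of the original workflow) that takes the two components separately instead of dividing them.

```r
# Toy values standing in for Height.1 / Height.2 (hypothetical, not the real data)
toy <- data.frame(Height.1 = c(0.3, 0.0, 0.0),
                  Height.2 = c(0.4, 0.5, 0.0))

# Same transformations as applied to t.norm1
toy$polar.1 <- log10(sqrt(toy$Height.1^2 + toy$Height.2^2))
toy$polar.2 <- atan(toy$Height.2 / toy$Height.1)

# atan2() takes y and x as separate arguments, so no division is performed
toy$polar.2.alt <- atan2(toy$Height.2, toy$Height.1)

# Inspect the rows where a height is zero
print(toy)
```

Rows where both heights are zero give -Inf for polar.1 and NaN for polar.2, which may be worth bearing in mind before aggregating with sum.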
## And melt the polar data (ready for casting)
> t <- melt(subset(t.norm1, select= c("Sample.Name", "SNP", "Pool", "polar.1", "polar.2")), id=c("Sample.Name", "SNP"))
Error in if (!missing(id.var) && !(id.var %in% varnames)) { :
missing value where TRUE/FALSE needed
> traceback()
4: melt_check(data, id.var, measure.var)
3: melt.data.frame(as.data.frame(data), id = attr(data, "idvars"))
2: melt.cast_df(subset(t.norm1, select = c("Sample.Name", "SNP",
       "Pool", "polar.1", "polar.2")), id = c("Sample.Name", "SNP"),
       measure = c("polar.1", "polar.2"))
1: melt(subset(t.norm1, select = c("Sample.Name", "SNP", "Pool",
       "polar.1", "polar.2")), id = c("Sample.Name", "SNP"),
       measure = c("polar.1", "polar.2"))
########Finish Session 2##########
As far as I can tell the error is occurring within melt_check(), where
there is a check on whether id.var has been supplied and whether it
exists within the data frame's names. Both should be satisfied, since
the subset() call works fine on its own...
########Start Session 3##########
> test <- subset(t.norm1, select= c("Sample.Name", "SNP", "Pool", "polar.1", "polar.2"))
> names(test)
[1] "Sample.Name" "SNP" "Pool" "polar.1" "polar.2"
########Finish Session 3##########
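names() only shows the column names, whereas frame 3 of the traceback passes id = attr(data, "idvars") rather than the id given in the call. A rough base R sketch of one thing that might be worth checking follows: whether that attribute survives column selection. The object `fake` is a hypothetical stand-in for a cast() result. The class name and attribute name are copied from the traceback, and nothing here uses the reshape package itself.

```r
# Hypothetical stand-in for a cast() result: a data frame with an extra class
# and an "idvars" attribute (names taken from the traceback, not from reshape)
fake <- data.frame(Sample.Name = c("CA0001", "CA0002"),
                   SNP = c("rs1", "rs1"),
                   Height.1 = c(0.1, 0.2))
attr(fake, "idvars") <- c("Sample.Name", "SNP")
class(fake) <- c("cast_df", "data.frame")

# Column selection via subset(), as in Session 2
sub <- subset(fake, select = c("Sample.Name", "SNP"))

names(sub)                 # column names look fine
class(sub)                 # check whether the extra class is still attached
attr(fake, "idvars")       # attribute on the original object
attr(sub, "idvars")        # attribute after column selection
```

If the same inspection on the real t.norm1 and test objects shows the attribute missing from test, that would fit with melt.cast_df() handing an empty id on to melt_check().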
What I find particularly strange is that there isn't really any
difference between
########Session 1
> t <- melt(t.norm1, id = c("Sample.Name", "SNP"))
....and
########Session 2
t <- melt(subset(t.norm1, select= c("Sample.Name", "SNP", "Pool",
"polar.1", "polar.2")), id=c("Sample.Name", "SNP"))
...since I've done nothing to alter the "Sample.Name" and "SNP"
columns. All that's changing is the names of the two columns that are
the measure.var, which in this instance is everything that's not defined
as being an id.var in the call to melt().
If anyone can provide any insight to what I'm doing wrong I'd be very grateful.
Thanks,
Neil
--
Email - nshephard at gmail.com / n.shephard at sheffield.ac.uk