[R] How to replace a column in a data frame with another one with a different size
R. Michael Weylandt
michael.weylandt at gmail.com
Sun Jul 8 19:52:52 CEST 2012
On Sun, Jul 8, 2012 at 12:22 PM, Stathis Kamperis <ekamperi at gmail.com> wrote:
> 2012/7/8 Michael Weylandt <michael.weylandt at gmail.com>:
>>
>>
>> On Jul 8, 2012, at 9:31 AM, Stathis Kamperis <ekamperi at gmail.com> wrote:
>>
>>> Hello everyone,
>>>
>>> I have a dataframe with 1 column and I'd like to replace that column
>>> with a moving average.
>>> Example:
>>>
>>>> library('zoo')
>>>> mydat <- seq_len(10)
>>>> mydat
>>> [1] 1 2 3 4 5 6 7 8 9 10
>>>> df <- data.frame("V1" = mydat)
>>>> df
>>> V1
>>> 1 1
>>> 2 2
>>> 3 3
>>> 4 4
>>> 5 5
>>> 6 6
>>> 7 7
>>> 8 8
>>> 9 9
>>> 10 10
>>>> df[df$V1 <- rollapply(df$V1, 3, mean)]
>>> Error in `$<-.data.frame`(`*tmp*`, "V1", value = c(2, 3, 4, 5, 6, 7, 8, :
>>> replacement has 8 rows, data has 10
>>>>
>>>
>>
>> I'm not sure you need the outer df[...] -- I think you just want
>>
>> df$V1 <- rollapply(df$V1,3,mean)
>>
>> However, this will still give you the error message you're seeing because rollapply() only returns 8 values here (you don't get the "endpoints" by default). To get the right number of rows, you want
>>
>> rollapply(df$V1, 3, mean, fill = NA) # Change NA if desired
>>
>> which will put NA's on each end and give you a length 10 result, as needed.
>>
>
> Thanks Michael (and arun@)!
>
> If I would do that, then (in my particular case), I'd need to
> eliminate NA's, with something like:
> df$V1 <- df$V1[!is.na(df$V1)]
>
> which would still fail with the same error message :-P
You're getting tripped up (again) by trying to sub-assign something
that's too small.

df is a rectangular array of data: on the RHS of that expression you
select a subset of V1 (say 8 elements) and then tell R to replace the
10-row V1 column with those 8 elements. That can't be done within the
fixed rectangular structure, hence the error message.
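
To see the mismatch concretely, here is a minimal sketch. It assumes the
column already holds the NA-padded moving average (the values are typed
in by hand so that zoo isn't needed), and the names short and res are
only for illustration:

```r
# Hand-built stand-in for the NA-padded moving average column
df <- data.frame(V1 = c(NA, 2:9, NA))

short <- df$V1[!is.na(df$V1)]   # drop the NAs: only 8 values remain
length(short)                   # 8
nrow(df)                        # 10

# Putting 8 values into a 10-row column is refused; capture the message
res <- tryCatch({
  df$V1 <- short
  "ok"
}, error = function(e) conditionMessage(e))
res                             # "replacement has 8 rows, data has 10"
```

The column can't shrink on its own; the whole data frame has to lose
rows together.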
What you want to do is something like this:

df[!is.na(df$V1), ]

Let's walk through that:

df$V1           -- take the V1 column of df
is.na()         -- get a logical vector saying where the NAs are
!is.na()        -- identify the rows where there _aren't_ NAs
df[ !is.na(), ] -- (the important one) take the rows of df (all
                   columns) where there aren't NAs
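
The same steps can be sketched one at a time. The intermediate names
(col, miss, keep, out) are made up for illustration, and the data frame
is again a hand-built stand-in for the NA-padded column. One caveat
worth hedging: with a single-column data frame, row indexing drops down
to a plain vector unless you ask for drop = FALSE.

```r
df <- data.frame(V1 = c(NA, 2:9, NA))

col  <- df$V1        # the V1 column as a plain vector
miss <- is.na(col)   # TRUE where a value is missing
keep <- !miss        # TRUE for the rows we want to keep

# Take the kept rows and all columns; drop = FALSE keeps the result a
# data frame even though there is only one column
out <- df[keep, , drop = FALSE]

is.data.frame(out)          # TRUE
is.data.frame(df[keep, ])   # FALSE: one column collapses to a vector
```

For a data frame with two or more columns, df[keep, ] already stays a
data frame, so drop = FALSE only matters in the one-column case.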
What you might be wanting to do is

df <- df[!is.na(df$V1), ]

This is much better than what you are trying to do: you work on the
whole array at once and trust R to keep it all together, rather than
trying to manipulate slices individually.
But even more idiomatic would be to index with complete.cases():

df[complete.cases(df), ]
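
As a rough sketch of that route: complete.cases() on its own only
returns a logical vector with one entry per row, so it is used as a row
index. na.omit() is a closely related base R shortcut that isn't
mentioned above. The names ok, clean and clean2 are illustrative, and
the data are the same hand-built stand-in:

```r
df <- data.frame(V1 = c(NA, 2:9, NA))

ok <- complete.cases(df)           # TRUE for rows with no NA in any column
clean <- df[ok, , drop = FALSE]    # keep those rows, stay a data frame

clean2 <- na.omit(df)              # drops incomplete rows in one call

nrow(clean)                        # 8
nrow(clean2)                       # 8
```

Both look at every column at once, which is why they scale better than
testing one column at a time.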
Take a look at some introductory material and try to wrap your head
around indexing rows and columns together: it's a fantastic
paradigm and will be of much more use to you in the long run than
trying to work on individual columns for subsetting/data-cleaning.
Best,
Michael
>
> Regards,
> Stathis
>
>> Best,
>> Michael
>>
>>> I could use a temporary variable to store the results of rollapply()
>>> and then reconstruct the data frame, but I was wondering if there is a
>>> one-liner that can achieve the same thing.
>>>
>>> Best regards,
>>> Stathis
>>>
>>> P.S. If you don't mind, cc me at your reply because I'm not subscribed
>>> to the list (but I will check the archive anyway).
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.