[R] Conditional Weighted Average (ddply or any other function)
Punit Anand
anandpunit at gmail.com
Fri Mar 1 20:33:17 CET 2013
oops - it should be MKT, I have been playing with a number of data
sets simultaneously.
ddply (dataread , .(Sector, FISCALYEAR), summarise,WROE=wavg(ROE, MKT)))
On Fri, Mar 1, 2013 at 2:23 PM, John Kane <jrkrideau at inbox.com> wrote:
> Okay I got the data but you seem to have an undefined variable in wavg.
> You write :
> ddply (dataread , .(Sector, FISCALYEAR), summarise,WROE=wavg(ROE, MKTCAP)))
>
> There is no MKTCAP in the data.frame. Also there is one too many ) in the equation: I think you mean :
> ddply (dataread , .(Sector, FISCALYEAR), summarise,WROE=wavg(ROE, MKTCAP))
>
> Have you left out a equation that calculates MKTCAP?
>
> John Kane
> Kingston ON Canada
>
>
>> -----Original Message-----
>> From: anandpunit at gmail.com
>> Sent: Fri, 1 Mar 2013 13:53:44 -0500
>> To: jrkrideau at inbox.com
>> Subject: Re: [R] Conditional Weighted Average (ddply or any other
>> function)
>>
>> Hi John,
>>
>> The sample size is huge involving 10,000 + firms. I have put a
>> representative sample using dput ( Name, ticker and country have been
>> changed so that firms cannot be identified due to proprietary data
>> set, also EPS is not required and removed from the dataset)
>>
>> structure(list(NAME = structure(c(8L, 8L, 8L, 8L, 8L, 8L, 7L,
>> 7L, 7L, 7L, 7L, 7L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
>> 6L, 6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CCC", "CTAX",
>> "INN", "NOB", "SH", "SZ", "WASH", "WILLSON"), class = "factor"),
>> Ticker = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L,
>> 8L, 8L, 8L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
>> 6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CC13",
>> "CT56", "INN12", "NB12", "SH12", "SZ12", "W12", "W15"), class =
>> "factor"),
>> Industry = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
>> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
>> 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L,
>> 5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Commercial &
>> Professional Serv",
>> "Energy", "Media", "Retail", "Transportation"), class = "factor"),
>> Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Consumer
>> Discretionary",
>> "Energy", "Industrials"), class = "factor"), Country =
>> structure(c(4L,
>> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L,
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L), .Label = c("Brazil", "China", "India", "UK"), class =
>> "factor"),
>> FISCALYEAR = structure(c(3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L,
>> 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 3L, 2L, 1L, 4L,
>> 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L,
>> 2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L), .Label = c("FY-1",
>> "FY-2", "FY-3", "FY0", "FY1", "FY2"), class = "factor"),
>> ROE = c(0.026, 0.0656, 0.1621, 0.1885, 0.1968, 0.2126, 0.0207,
>> 0.0319, 0.0963, 0.0431, 0.066, 0.066, 0.0707, 0.0797, 0.0781,
>> 0.078, 0.098, 0.126, 0.0352, 0.2625, 0.3714, 0.2929, 0.3133,
>> 0.2509, 0.2398, 0.2779, 0.1109, 0.0509, 0.069, 0.1017, 0.1298,
>> 0.5842, 0.3953, 0.4429, 0.3616, 0.26, 0.2, 0.4472, 0.2912,
>> 0.21, 0.2849, 0.3553, 0.4347, 0.3289, 0.3846, 0.2643, 0.0458,
>> 0.1265, 0.28), MKT = c(2919236084, 836858582, 2015182617,
>> 3399344971, 4324821777, 4324821777, 7619453125, 3579844727,
>> 4132238281, 3712239990, 2879757813, 2879757813, 1525237793,
>> 700357605, 1814942993, 1858225342, 1242890503, 1242890503,
>> 1879700000, 557093400, 224900300, 1634700000, 1443200000,
>> 3582664735, 3582664735, 5830366211, 10660833984, 9024061523,
>> 7628660645, 9154108398, 9154108398, 7064532227, 1804380005,
>> 6331067871, 10445639648, 9153587891, 9153587891, 6231200000,
>> 4.078e+09, 10107500000, 12460300000, 17800051556, 17800051556,
>> 513478700, 260993500, 882575400, 1.151e+09, 855938413, 855938413
>> )), .Names = c("NAME", "Ticker", "Industry", "Sector", "Country",
>> "FISCALYEAR", "ROE", "MKT"), class = "data.frame", row.names = c(NA,
>> -49L))
>>
>> Thanks,
>> Punit
>>
>>
>>
>> On Fri, Mar 1, 2013 at 12:51 PM, John Kane <jrkrideau at inbox.com> wrote:
>>> See below
>>>
>>>
>>>> -----Original Message-----
>>>> From: anandpunit at gmail.com
>>>> Sent: Fri, 1 Mar 2013 12:36:53 -0500
>>>> To: jrkrideau at inbox.com
>>>> Subject: Re: [R] Conditional Weighted Average (ddply or any other
>>>> function)
>>>>
>>>> Hi John,
>>>>
>>>> I was using symbols, Column ROE, EPS, MKTCAP are numeric, Name,
>>>> Ticker, Sector, Country, FISCALYEAR or Year are character strings.
>>>>
>>>> and column "Year" is referring to "FISCALYEAR"
>>>>
>>> Definitely a no-no in R-help. :) We really need some representative
>>> sample data to play with. See
>>> https://github.com/hadley/devtools/wiki/Reproducibility for some general
>>> pointers on how to compose a good question. The fact that you included
>>> the code you are using was excellent but without some data it is rather
>>> useless.
>>>
>>> The easiest way to supply data is to use the dput() function. Example
>>> with your file named "testfile":
>>> dput(testfile)
>>> Then copy the output and paste into your email. This is what I did with
>>> your data that I pasted into my email . I added the dat1 <- to it.
>>>
>>> For large data sets, you can just supply a representative sample.
>>> Usually, dput(head(testfile, 100)) will be sufficient.
>>>
>>> I hope this is of some help.
>>>
>>>
>>>>
>>>> On Fri, Mar 1, 2013 at 12:31 PM, John Kane <jrkrideau at inbox.com> wrote:
>>>>> It is not at all clear what you are doing. You state that the data
>>>>> set
>>>>> you are using is what I have called dat1 : see dput form below.
>>>>>
>>>>> As far as I can see there is no numerical value in there.
>>>>>
>>>>> ##===========data set in dput form================#
>>>>> dat1 <- structure(list(Name = c("N1", "N1", "N1", "N1", "N1", "N1",
>>>>> "N1",
>>>>> "N2", "N2", "N2", "N2", "N2", "N2", "N2"), Ticker = c("T1",
>>>>> "T1",
>>>>> "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2", "T2", "T2",
>>>>> "T2",
>>>>> "T2"), Sector = c("S1", "S1", "S1", "S1", "S1", "S1", "S1",
>>>>> "S2",
>>>>> "S2", "S2", "S2", "S2", "S2", "S2"), Industry = c("I1",
>>>>> "I1",
>>>>> "I1", "I1", "I1", "I1", "I1", "I2", "I2", "I2", "I2", "I2", "I2",
>>>>> "I2"), Country = c("C1", "C1", "C1", "C1", "C1", "C1", "C1",
>>>>> "C2", "C2", "C2", "C2", "C2", "C2", "C2"), Year = c("FY-4",
>>>>> "FY-3",
>>>>> "FY-2", "FY-1", "FY0", "FY1", "FY2", "FY-4", "FY-3", "FY-2",
>>>>> "FY-2", "FY0", "FY2", "FY2"), ROE = c("ROE11", "ROE12",
>>>>> "ROE13",
>>>>> "ROE14", "ROE15", "ROE16", "ROE17", "ROE21", "ROE22",
>>>>> "ROE23",
>>>>> "ROE24", "ROE25", "ROE26", "ROE27"), EPS = c("EPS11",
>>>>> "EPS12",
>>>>> "EPS13", "EPS14", "EPS15", "EPS16", "EPS17", "EPS21",
>>>>> "EPS22",
>>>>> "EPS23", "EPS24", "EPS25", "EPS26", "EPS27"), MKTCAP =
>>>>> c("MKT11",
>>>>> "MKT12", "MKT13", "MKT14", "MKT15", "MKT16", "MKT17",
>>>>> "MKT21",
>>>>> "MKT22", "MKT23", "MKT24", "MKT25", "MKT26", "MKT27")),
>>>>> .Names
>>>>> = c("Name",
>>>>> "Ticker", "Sector", "Industry", "Country", "Year", "ROE",
>>>>> "EPS",
>>>>> "MKTCAP"), class = "data.frame", row.names = c(NA, -14L))
>>>>> ## =================end of dataset==================#
>>>>>
>>>>> There is no FISCALYEAR variable that you specifed below
>>>>>
>>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise, > WROE=wavg(ROE,
>>>>>> MKTCAP)))
>>>>>
>>>>> I think we need a bit more information.
>>>>>
>>>>> John Kane
>>>>> Kingston ON Canada
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: anandpunit at gmail.com
>>>>>> Sent: Fri, 1 Mar 2013 11:01:42 -0500
>>>>>> To: r-help at r-project.org
>>>>>> Subject: [R] Conditional Weighted Average (ddply or any other
>>>>>> function)
>>>>>>
>>>>>> Hello R community,
>>>>>>
>>>>>> I am computing weighted average statistic by using ddply function:
>>>>>>
>>>>>> My data set is:
>>>>>> N1 T1 S1 I1 C1 FY-4 ROE11 EPS11 MKT11
>>>>>> N1 T1 S1 I1 C1 FY-3 ROE12 EPS12 MKT12
>>>>>> N1 T1 S1 I1 C1 FY-2 ROE13 EPS13 MKT13
>>>>>> N1 T1 S1 I1 C1 FY-1 ROE14 EPS14 MKT14
>>>>>> N1 T1 S1 I1 C1 FY0 ROE15 EPS15 MKT15
>>>>>> N1 T1 S1 I1 C1 FY1 ROE16 EPS16 MKT16
>>>>>> N1 T1 S1 I1 C1 FY2 ROE17 EPS17 MKT17
>>>>>> N2 T2 S2 I2 C2 FY-4 ROE21 EPS21 MKT21
>>>>>> N2 T2 S2 I2 C2 FY-3 ROE22 EPS22 MKT22
>>>>>> N2 T2 S2 I2 C2 FY-2 ROE23 EPS23 MKT23
>>>>>> N2 T2 S2 I2 C2 FY-2 ROE24 EPS24 MKT24
>>>>>> N2 T2 S2 I2 C2 FY0 ROE25 EPS25 MKT25
>>>>>> N2 T2 S2 I2 C2 FY2 ROE26 EPS26 MKT26
>>>>>> N2 T2 S2 I2 C2 FY2 ROE27 EPS27 MKT27
>>>>>>
>>>>>> with colnames:
>>>>>> (Name,Ticker,Sector,Industry,Country,Year,ROE,EPS,MKTCAP)
>>>>>>
>>>>>> I want to compute
>>>>>> 1) Weighted ROE based on Sector and Fiscal Year.
>>>>>> For firm N1 of Sector S1 and Fiscalyear FY-3 weight is
>>>>>> MKT1 / SUM(MKT, where Sector = S1, Fiscalyear FY-3)
>>>>>>
>>>>>> 2) Weighted ROE based on Country and Fiscal Year.
>>>>>> For firm N1 of Country C1 and Fiscalyear FY-3 weight is
>>>>>> MKT1 / SUM(MKT, where Country = C1, Fiscalyear FY-3)
>>>>>>
>>>>>> 3) Weighted ROE based on Country, Sector and Fiscal Year.
>>>>>> For firm N1 of Country C1, Sector S1 and Fiscalyear FY-3
>>>>>> weight is MKT1 / SUM(MKT, where Country = C1, Sector = S1, Fiscalyear
>>>>>> FY-3)
>>>>>>
>>>>>> 4) Weighted ROE based on Country, Industry and Fiscal Year.
>>>>>> For firm N1 of Country C1, Industry I1 and Fiscalyear FY-3
>>>>>> weight is MKT1 / SUM(MKT, where Country = C1, Industry = I1,
>>>>>> Fiscalyear
>>>>>> FY-3)
>>>>>>
>>>>>>
>>>>>> I tried using ddply function:
>>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise, WROE=wavg(ROE,
>>>>>> MKTCAP)))
>>>>>>
>>>>>> where wavg <- function(x, wt) x %*% wt/sum(wt)
>>>>>> but this doesn't give me the right answer.
>>>>>>
>>>>>> I could try subseting the data into different sectors and compute the
>>>>>> weighted average which doesn't look like an elegant solution and
>>>>>> would
>>>>>> defeat the purpose of ddply
>>>>>>
>>>>>> I coudn't think of properly using melt and cast functions to solve
>>>>>> this issue. Any help will be highly appreciated.
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> Punit
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>> ____________________________________________________________
>>>>> GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more at
>>>>> http://www.inbox.com/smileys
>>>>> Works with AIM®, MSN® Messenger, Yahoo!® Messenger, ICQ®, Google Talk™
>>>>> and most webmails
>>>>>
>>>>>
>>>
>>> ____________________________________________________________
>>> FREE ONLINE PHOTOSHARING - Share your photos online with your friends
>>> and family!
>>> Visit http://www.inbox.com/photosharing to find out more!
>>>
>>>
>
> ____________________________________________________________
> FREE 3D MARINE AQUARIUM SCREENSAVER - Watch dolphins, sharks & orcas on your desktop!
> Check it out at http://www.inbox.com/marineaquarium
>
>
More information about the R-help
mailing list