[R] interpreting profiling output

Tue Jul 27 04:43:54 CEST 2004

Luke Tierney <luke at stat.uiowa.edu> writes:

> On Sun, 25 Jul 2004, Peter Dalgaard wrote:
>
>> Kasper Daniel Hansen <k.hansen at biostat.ku.dk> writes:
>> 
>> > I have some trouble interpreting the output from profiling. I have
>> > read the help pages Rprof, summaryRprof and consult the R extensions
>> > manual, but I still have problems understanding the output.
>> > 
>> > Basically the output consist of self.time and total.time. I have the
>> > understanding that total.time is the time spent in a given function
>> > including any subcalls or child functions or whatever the technical
>> > term for that activity is. In contrasts self.time is the time spent in
>> > the function excluding subcalls.
>> > 
>> > Now, in my understanding basically everything in R is functions. I
>> > would then guess that for almost any function (except the "atomic
>> > ones") the self.time would be very small as it would spend most of its
>> > time calling other functions (again, since almost everything is a
>> > function). So how do R determine when a subfunction is called?
>> ...brutal snippage...
>> > I guess some of the answers would be clear if I had a firm grasp of
>> > the inner workings of R :)
>> 
>> In a word, yes... 
>> 
>> The profiling keeps track of R's *context stack* which is not quite
>> the same as function calls. Essentially, it only counts R functions that
>> are actually written in R, but not .Internal, .Primitive, etc. So
>> "self" counts the amount of time that a function was at the top of the
>> stack. This is done by a periodic poll which dumps out the context
>> stack at regular intervals. Seeing cases where the self percentages
>> don't add to 100% is, I believe, simply due to truncation of the tails
>> -- that is, there is a large number of different functions which each
>> are counted a few times, and these are not shown in the summary output.
>> 
>> [Sorry about the late reply, but I was out of town, and noone seems to
>> have answered this.]
>
> Just one additional clarification: There are two flavors of
> .Primitive, 'builtin' and 'special'.  You can tell which is which
> using typeof().  Builtins are things like arithmetic operations and
> specials include control constructs like `for`.  At present profiling
> does record builtins but not specials.  Recording specials is
> something we would like to do eventually but it requires more changes
> to the call stack (and dealing with more interections with those
> changes) so it hasn't happened yet. .Internal is a special, so no
> .Internal calls are recorded.
>
> luke
>
> -- 
> Luke Tierney
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>    Actuarial Science
> 241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

Thanks to both of you for considering my question. I still have some
problems however. Consider the nnet.default function which uses the .C
interface. As 
> typeof(.C)
[1] "builtin"
I would guess - from my current understanding - that the time spent in
the C functions of nn.default are added to its self.time. Now look at
the output

> prof02.out$by.self[1:10,]
                  self.time self.pct total.time total.pct
nnet.default          33.32     11.2    2186.48      92.1

So nnet.default only used 33.3 on its C calls, but 2186 in total
time. From this I would guess that nnet.default - in my case - spends
its majority of time setting up the fit and postprocessing it (as this
is done by R functions which does not add to its self time). The
actual fit (which is done by the C functions) only takes very little
time. 

Looking further down the table I see

                  self.time self.pct total.time total.pct
as.integer            22.28      7.5      30.92       1.3

as.interger is basically a .Internal call - which ought not to be
recorded. But why does it then have such a "high" self.time?

And finally, if a program (such as mine) have a high discrepancy
between self.time and sampling.time, does this imply that the far
majority of time is spent in .Primitives which are of type special?
-- 
Kasper Daniel Hansen, Research Assistant
Department of Biostatistics, University of Copenhagen