Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.This text has appeared in an edited form in Linux Magazine magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
| Download this listing! | ||
Linux Magazine Column 75 (Nov 2005)
[Suggested title: ``Profiling in Template Toolkit via overriding'']
I've talked a lot about Andy Wardley's Template Toolkit in this column
over the past year or so. This isn't surprising, since I've been
doing quite a bit of work lately for my clients (such as
www.geekcruises.com) building web sites using Template Toolkit. Of
course, ``work'' means ``solving problems'', and some of the problems
having me a bit stumped until that ``a-ha!'' moment comes along.
For example, I recently completed a major phase of the deliverable web site, at least in terms of functionality. But as Neil (the customer) started playing with it, one of his first comments was ``this part is too slow, and I spend all my time on this screen''. That wasn't a show-stopping comment, because I had deliberately chosen to program the application with flexibility and ease-of-implementation first, knowing that I'd need to make a performance pass later to optimize anything that was too slow. Failing to do so would have put me in the realm of ``premature optimization'': making decisions about performance without actually understanding where the bottlenecks might eventually be.
The next step to keep Neil paying my paycheck was to make that
particular operation faster. ``No problem'', I said to Neil, ``because
I'll just trot out Devel::DProf to find the hotspots and slow
spots, and optimize those!'' But, alas, adding Devel::DProf caused
the program to core dump. Ugh. Even moving the code to a different
architecture (from my MacOSX development platform to the OpenBSD
deployment platform) still caused it to coredump. This was not
looking good.
After fretting a bit, and asking online in all the usual places for
some help, I decided that I still needed to know where the program was
spending its time. I suspected that 90% or more of the time was in
the ``rendering'' phase of CGI::Prototype, so I could accept a set of
timings on a per-template basis as a first-order diagnostic tool.
Template Toolkit doesn't have any profiling tools ``out of the
box''. But this is Perl, and thus the code is open. Also, Andy
designed Template to be easily subclassed to alter portions of the
behavior. I spent about an hour tracing the flow of a template being
processed, and decided that my per-template statistics could be
obtained by noting the execution time of each invocation of the
process method of Template::Context. The first parameter to
process seemed to be the same kind of value as was handed in as the
first parameter of the process method of the top-level Template
object, or the template handed to the PROCESS or INCLUDE or WRAPPER
Template directives. Thus, I could use that value to aggregate results
over different invocations of the same template.
My strategy was simple. I'd create a subclass of
Template::Context, called My::Template::Context. I'd override
the process method to ``start the timer'' by noting the wall clock
time (via time), and the four CPU times (user CPU, system CPU,
child user CPU, and child system CPU) accumulated thus far in the
process (via times). Then I'd call the ``real'' process method to
do the actual work, capturing its return value. When the method
returned, I'd subtract the initial timer values from the new values,
and that would give me the amount of time spent processing this
template. Finally, I'd tell Template to use my new class instead
of the original class by setting $Template::Config::CONTEXT to my
class name (another nice configuration hook provided by Andy).
Or at least it looked like it'd be simple. But then I realized that
what I really wanted was time spent only in that template, not in
all the templates that it calls. This is important to me because I'm
also interested in the immediate cost of all the callbacks made by a
template, which I could measure only indirectly. (To measure those
more directly, I would have had to perform a similar trick with
Template::Stash, and that looked even hairier than this.)
So, I decided that I'd simply maintain a stack of calls into
process, and on each return, compute the deltas for this call, but
also subtract that value from each of the parent calls. After all, if
the time was being charged to this invocation, it's no longer
affecting the parent. To do that, I push forward the starting time
counters of every parent call, pretending they started later. It all
works out in the end, although I got the math wrong at least twice
while I was hacking.
And the result of that is in [listing one, below], which I will now describe in my traditional manner.
Line 1 defines the beginning of my module. Obviously, in a
mod_perl enviroment, the name My:: is not a good choice. But
we're still in CGI at this point, so I didn't have trouble with this
for now.
Line 2 both pulls in the Template::Context module, and declares the
current module to be a derived class by setting @ISA automatically.
Line 4 declares the @stack of times for each nested invocation.
Initially, the empty stack shows that we're at the top level. As each
new invocation is entered, a new arrayref is pushed onto the stack.
When the stack again becomes empty, we're at the top-level, and it's
time to dump the results (simply to STDERR and thus into my web
log).
Line 5 accumulates the results. The key is the template name, with the value being an arrayref of the accumulated sums of the times. This hash is reset as each top-level is reached.
Lines 7 to 47 define the process method that we're overriding
from Template::Context.
Line 8 pulls out the $self object from the arguments.
Lines 10 to 13 attempt to figure out the name for this template. The
first parameter to process can either be a template name, or a
Template::Document if it's a BLOCK inside another template, or even
a coderef. Luckily, I didn't have any coderefs in my application, but
I still had to do something sensible with BLOCK values. So, line
11 sees if this is an object that inherits from Template::Document.
If it does, then we can ask it its name with the name method.
Another way that might have worked just as well would be:
my $template = $_[0];
eval { $template = $template->name };
This simpler version tries the name method, and if it doesn't work,
leaves the template name alone. Again, I coded what worked first, and
moved on.
Line 15 captures the initial wall clock time and the CPU times into a 5 element list, then pushes that as an arrayref onto the stack. This starts our timer.
Lines 17 to 19 call SUPER::process on the arguments to do the job.
However, I was unsure of the invocation context of process, so I
had to be careful to pass that same context on to the super call. If
we're in a list context, my @return ... is already good. If not,
then the scalar prefix ensures the proper context. I'm not paying
attention to the void context here, as it would have complicated the
code even further.
Now that we've done the real work, it's time to figure out how long
that took. Line 21 pops the stack, retrieving the five starting
values. These might have been altered by nested calls to process:
we'll get to that in a moment.
Line 22 subtracts the old values from the current values, using
a crazy little idiom I invented a while back. The map executes the
block five times. Each value is subtracted from the next element
shifted from @delta_times, creating the delta value. But the
map accumulates these five deltas, which then get assigned... right
back into @delta_times. OK, slightly obfuscated, admittedly. But
I've used it enough times now that it begins to feel automatically
like the right thing to reach for.
Lines 23 through 28 process each of the five deltas (wall clock, and
the four CPU times). First, line 24 adds them into the appropriate
accumulator, keyed by the name in $template. Then, the loop from
line 25 to 27 pushes each of the ``start time'' counters forward by that
same amount in the stack, thus discounting that amount of time from
those accumulations. Trickiest part of the code, I'd say. But now
that I got it, it looks straightforward.
Line 29 accumulates a count of the calls, just so I can see if something was called once or a hundred times. In general, you worry about total CPU time used, not the per-usage CPU time, but having the call count might be useful to understand ``big O'' scaling issues to determine an asymptotic upper bound.
If we're at the top of the stack, it's time to show how much the hit
costs us. Lines 31 to 43 dump out the totals. The dump goes out to
STDERR so that I see it on my web error log, and I'm using
printf to try to make it at least slightly pretty. Line 42 clears
out the totals (the stack is already clear) so that a new invocation
starts it all over again.
Line 46 returns an appropriate value to the caller: if we're in a
scalar context, only $return[0] has the proper value. Otherwise,
we use the entire list.
Finally, line 49 tells Template that this is the kind of
Template::Context that we want in our program. Line 51 provides
the traditional true value for the require embedded in the invoking
use.
Speaking of which, using this module is now as simple as putting a few extra lines at the top of my CGI script:
#!/usr/bin/perl
use lib "/my/local/lib"; # location of this module use My::Template::Context; # turn on logging
use My::App; # CGI::Prototype My::App->activate;
Now, as this CGI program is invoked, I get extra lines in my web log showing exactly where my time is being spent. Yeay!
Within a few hours, I was able to cut the CPU time by over 70%, and Neil was happy, and continues to employ me to this day. Happy endings. Until next time, enjoy!
Listing
=1= package My::Template::Context;
=2= use base qw(Template::Context);
=3=
=4= my @stack;
=5= my %totals;
=6=
=7= sub process {
=8= my $self = shift;
=9=
=10= my $template = $_[0];
=11= if (UNIVERSAL::isa($template, "Template::Document")) {
=12= $template = $template->name || $template;
=13= }
=14=
=15= push @stack, [time, times];
=16=
=17= my @return = wantarray ?
=18= $self->SUPER::process(@_) :
=19= scalar $self->SUPER::process(@_);
=20=
=21= my @delta_times = @{pop @stack};
=22= @delta_times = map { $_ - shift @delta_times } time, times;
=23= for (0..$#delta_times) {
=24= $totals{$template}[$_] += $delta_times[$_];
=25= for my $parent (@stack) {
=26= $parent->[$_] += $delta_times[$_] if @stack; # parent adjust
=27= }
=28= }
=29= $totals{$template}[5] ++; # count of calls
=30=
=31= unless (@stack) {
=32= ## top level again, time to display results
=33= print STDERR "-- $template at ". localtime, ":\n";
=34= printf STDERR "%3s %3s %6s %6s %6s %6s %s\n",
=35= qw(cnt clk user sys cuser csys template);
=36= for my $template (sort keys %totals) {
=37= my @values = @{$totals{$template}};
=38= printf STDERR "%3d %3d %6.2f %6.2f %6.2f %6.2f %s\n",
=39= $values[5], @values[0..4], $template;
=40= }
=41= print STDERR "-- end\n";
=42= %totals = (); # clear out results
=43= }
=44=
=45= # return value from process:
=46= wantarray ? @return : $return[0];
=47= }
=48=
=49= $Template::Config::CONTEXT = __PACKAGE__;
=50=
=51= 1;
Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.
Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.

