This is the second in a series of articles which form one Perl programmer's
response to the book, Design Patterns (also known as the Gang of Four
book or simply as GoF, because four authors wrote it).
People into patterns often talk about how knowing patterns makes describing
designs easier. The parenthetical comments in the last sentence show how
Perl takes this to new heights by including the patterns internally.
This article continues my treatment by considering patterns which rely
on data containers and/or code references (which are also called callbacks).
Before showing the patterns let me explain these terms.
I use data containers to mean any reference that holds a data structure.
Arrays and hashes are common data containers, but hashes of lists of hashes
storing things are more interesting. Careful use of these structure
containers can often eliminate the need for objects.
Here's a concrete example. Suppose I want a phone list. I might use
a container like this:
my $phone_list = {
'phil' => [
{ type => 'home', number => '555-0001' },
{ type => 'pager', number => '555-1000' },
],
'frank' => [
{ type => 'cell', number => '555-9012' },
{ type => 'pager', number => '555-5678' },
{ type => 'home', number => '555-1234' },
],
};
This container is housed in a hash. Its keys are names; its values are
phone numbers. The numbers are listed in the order the person would like
them used. (Call Frank on his cell phone first, then try his pager. If
all else fails, use his home phone.)
To use this structure I might do the following:
#!/usr/bin/perl
use strict; use warnings;
my $phone_list = {
'Phil' => [
{ type => 'home', number => '555-0001' },
{ type => 'pager', number => '555-1000' },
],
'Frank' => [
{ type => 'cell', number => '555-9012' },
{ type => 'pager', number => '555-5678' },
{ type => 'home', number => '555-1234' },
],
};
my $person = shift or die "usage: $0 person\n";
foreach my $number (@{$phone_list->{$person}}) {
print "$number->{type} $number->{number}\n";
}
Here, the user supplies the name of the person he or she wants to reach
as a command line argument which I store as $person. I then loop through
all the phone numbers for that person, printing the type and number.
Of course, in practice your data would live outside your script. The
example just shows what one data container can hold.
If you need to use a structure made of data nodes, you can often avoid
the need for a node object by using a data container instead. Object
Oriented programming proponents would probably want me to make an object
for each person. In that object they might even want me to store an
object for each phone type in some cumbersome list container. My advice:
don't give in to pedants. Even in Java, I can build a structure like
the one above (though not as easily). Doing so is often wise. Objects
work better in more complex situations.
A code reference is like any reference in Perl, but what it points to
is a subroutine you can call. For instance, I could write:
my $doubler = sub { return 2 * $_[0]; };
Then later in my program I would call that routine as:
my $doubled = &$doubler(5); # $doubled is now 10
This example is contrived. But it lets you see the basic syntax of code
references. If you assign a sub to a variable, you receive a code reference
by the grace of Perl. To call the sub stored in the reference put an
& in front of the variable which stores it. This is like we do for other
references, as in this standard hash walker:
foreach my $key (keys %$hash_reference) { ... }
The & is the sigil (or funny character) for subroutines, just like @
and % are the sigils for arrays and hashes.
Many patterns in GoF and outside it can be implemented well in Perl with
code references. Languages which don't provide code references are missing
an important type.
Having explained these tools, I'm ready to show you some patterns which
use them.
When you want to select from a series of choices for how something should
be done, you need a strategy scheme. For example, you might want to
sort based on a comparison function. Each time you sort, you should
be able to specify the order strategy.
Since Perl has code references, we can easily implement the strategy
pattern without bloating our code base with a proliferation of classes
whose sole purpose is to provide one function.
Here's an example with the built-in sort:
sort { lc($a) cmp lc($b) } @items
This sorts without regard to case. Notice how sort is receiving the
function directly in the call. Though we could do this for our own
functions, it is more common to take a reference to the function as
a required positional parameter.
Suppose, for example, that we want to list all files in the current
directory, or any of its subdirectories, with some property. There
are two pieces to this task: (1) Scan down the directory tree for
all the entries, and (2) Test each file to see if it meets the criterion.
Ideally we would like to separate these tasks so we can reuse them
independently (for instance scanning a directory tree is more common than
any particular criterion). We will make the criterion a strategy executed
by the directory scanner.
#!/usr/bin/perl
use strict; use warnings;
my @files = find_files(\&is_hidden, ".");
local $" = "\n";
print "@files\n";
sub is_hidden {
my $file = shift;
$file =~ s!.*/!!;
return 0 if ($file =~ /^\./);
return 1;
}
sub find_files {
my $callback = shift;
my $path = shift;
my @retval;
push @retval, $path if &$callback($path);
return @retval unless (-d $path);
# identify children
opendir DIR, $path or return;
my @files = readdir DIR;
closedir DIR;
# visit each child
foreach my $file (@files) {
next if ($file =~ /^\.\.?$/); # skip . and ..
push @retval, find_files("$path/$file", $callback);
}
return @retval;
}
To understand this example, start with the initial call to find_files.
It passes two arguments. The first is a code reference. Note the syntax.
As I pointed out in the introduction, to let Perl know I mean a subroutine,
I put the & sigil in front of is_hidden. To make a reference to that
routine (instead of calling it immediately), I put a backslash in
front, just as I would to take any other kind of reference.
When I use the callback in find_files, $callback has the reference to
the code. To dereference it I put the & sigil in front of it.
The find_files subroutine takes a path where the search begins and a
code reference called $callback. At each invocation, it stores the path
in the return list, if callback returns true for that path. This allows
you to reuse find_files for many applications, changing only the callback
subroutine to change the outcome. This is the strategy pattern, but
without the hassle of subclassing the find_files abstract base class
and overriding the criterion method.
In find_files, I use recursion to descend the directory tree and its
subtrees. First, I call the callback to see if the current path should
go into the output. Then the real routine begins. What the callback
does makes no difference to this routine. Any true or false value is OK
with find_files.
The recursion stops if the file is not a directory. At that point the
list is immediately returned. (It could be empty or have the current path
in it, depending on the callback's return value.) Otherwise, all the files and
subdirectories in the current path are read into @files. Each of those
entries is scanned by the recursive call to find_files (unless the file
is . or .., which would create endless recursion). Whatever the recursive
call to find_files returns, it is pushed onto the end of the final output.
When all children have been visited, @result is returned to the caller.
The CPAN module File::Find robustly solves the problem approached quickly
in my example above. It relies on exactly this kind of function callback.
The Strategy Pattern uses a callback to perform a single task that varies
from use to use. The next pattern uses a series of callbacks to implement
the steps of an algorithm.
In some calculations the steps are known, but what the steps do is not.
For example, computing charges for a rental might involve three steps:
-
Calculate amount due from rates.
-
Calculate taxes.
-
Add these together.
Yet different rentals might have different schemes for calculating the amount
due from rates, and different jurisdictions usually have different tax schemes.
A template method can implement the outline, deferring to callers for
the individual schemes.
package Calc;
use strict; use warnings;
sub calculate {
my $class = shift; # discarded
my $data = shift;
my $rate_calc = shift; # a code ref
my $tax_calc = shift; # also a code ref
my $rate = &$rate_calc($data);
my $taxes = &$tax_calc($data, $rate);
my $answer = $rate + $taxes;
}
Here the caller supplies a data reference (probably to a hash or object)
together with two code references which are used as callbacks. Each
callback must expect the data reference as its first parameter. The tax_calc
code reference also receives the amount due from the rate calculator.
This allows it to use a percentage of the amount together with information in
the data reference.
A caller might look like this:
#!/usr/bin/perl
use strict; use warnings;
use Calc;
my $rental = {
days_used => 5,
day_rate => 19.95,
tax_rate => .13,
};
my $amount_owed = Calc->calculate($rental, \&rate_calc, \&taxes);
print "You owe $amount_owed\n";
sub rate_calc {
my $data = shift;
return $data->{days_used} * $data->{day_rate};
}
sub taxes {
my $data = shift; # discarded
my $subtotal = shift;
return $data->{tax_rate} * $subtotal;
}
I made this contrived caller so you can see the calling sequence. The
data here is a simple hash. To save exporting from Calc, I made calculate
a class method, so I call it through its class. In the call, I pass a
reference to my data hash and references to the two calculation routines.
This can be made more complex if you like. One could even make a full-blown
class hierarchy of calculators, allowing callers to select the one they want.
This example is about as simple as I could make the template method pattern.
Another approach to templates is to have the caller place methods in
the template package. This approach amounts to an implementation of
mixins a la Ruby. Here's a sample that is more object oriented.
package Calc;
sub calculate {
my $self = shift;
my $rate = $self->calculate_rate();
my $tax = $self->calculate_tax($rate);
return $rate + $tax;
}
1;
The whole module is really only the template method. To use it, you
have to code calculate_rate and calculate_tax methods, or your script
will die. Here's a particular implementation of the scheme:
package CalcDaily;
package Calc;
use strict; use warnings;
sub new {
my $class = shift;
my $self = {
days_used => shift,
day_rate => shift,
tax_rate => shift,
};
return bless $self, $class;
}
sub calculate_rate {
my $data = shift;
return $data->{days_used} * $data->{day_rate};
}
sub calculate_tax {
my $data = shift; # discarded
my $subtotal = shift;
return $data->{tax_rate} * $subtotal;
}
1;
Note that I added a constructor and two methods to the Calc package
in a different source file. This is perfectly legal and occasionally useful.
By doing this, the template is totally isolated. It doesn't even know
what sort of data will be stored in the objects of its own type. That
does mean that only one Calc subtype can be used at a time. If that's
a problem for you, do the standard thing: have Calc call methods on objects
in some separate hierarchy.
There are two package statements at the top of the file, this is on purpose.
The first one tells people (and crawlers) that this is the CalcDaily
package which rightfully belongs in CalcDaily.pm, not the original Calc, which belongs in Calc.pm.
Finally, here's the caller, which is only slightly modified:
#!/usr/bin/perl
use strict; use warnings;
use Calc;
use CalcDaily;
my $rental = Calc->new(5, 19.95, .13);
my $amount_owed = $rental->calculate();
print "You owe $amount_owed\n";
This technique is similar to the one used in the debugger architecture
for Perl. To make my own debugger, I need a name for it. I might choose
PhilDebug.pm. Then I have to make a file with that name in a Devel
directory which is in my @INC list. The first line in the
file should be (but doesn't have to be):
package Devel::PhilDebug;
This allows the CPAN indexer to properly catalog my module.
The base package for debuggers is fixed as DB. Perl expects to call
the DB function in that package. So all together it might look something
like this:
package Devel::PhilDebug;
package DB;
sub DB {
my @info = caller(0);
print "@info\n";
}
1;
Any script will use this debugger if it is invoked as:
perl -d:PhilDebug script
Each time the debugger notices that a new statement is about to start, it
first calls DB::DB. This is a very powerful example of plug-and-play.
It is not usually wise to pollute foreign classes with your own code.
Yet, Perl permits this, because it is sometimes highly useful. There
seems to be a theme here:
Don't rule out dangerous things. Just avoid them,
unless you have a good reason to use them.
The Strategy and Template patterns use code references to allow the caller
to adjust the behavior of an algorithm. The template I showed used a data
container to hold rental information. The next pattern makes more use
of data containers.
Many structures external to your program should be represented with
composites (like trees or the data container in the introduction) inside
your program. There are two fundamentally different ways to represent
these structures. For an object-oriented way to compose such structures
see the Composite Pattern in GoF (which I will discuss in my next article).
Here we'll look at how to build a composite structure in a hash of hashes.
You might rather build the objected-oriented version. Which you choose
should depend on the complexity of the data and the methods to act on it.
If data and methods are simple, you should probably use the hash structure.
It will be faster, have built-in support, and be more familiar to Perl
programmers who might need to maintain your code. If the complexities are
large, you should use full-blown objects. They make the structure easier
to understand for object-oriented programmers and provide more code-based
documentation than simple hashes.
So, hashes are superior structures for simple to moderately
complex data. To see how to build a hash structure consider an example:
visualizing an outline. For simplicity, I'll represent the outline
purely through indentation (not with Roman or other numerals). Here's an
example outline:
Grocery Store
Milk
Juice
Butcher
Thin sliced ham
Chuck roast
Cheese
Cleaners
Home Center
Door
Lock
Shims
This outline describes a theoretical shopping trip. I want to represent
it internally in my program so I can play with it. (One of my favorite
games is turning outlines into pictures, see below.)
Instead of a full-blown object, I'll use a little hash-based data container
for each node in the tree. Each node will keep track of three things:
-
Name
-
Level
-
Children (a list of other nodes)
To keep track of who is a child of whom, I'll use a stack of these nodes.
The node on the top of the stack is usually the parent of the next line of
input. To show my method, I'll intersperse comments with the script. At
the bottom of this section the script appears in one piece.
#!/usr/bin/perl
use strict; use warnings;
These lines are always a good idea.
my $root = {
name => "ROOT",
level => -1,
children => [],
};
This is the root node. It's a hash reference containing the three keys
mentioned earlier. The root node is special. Since it isn't in the file,
I give it an artificial name and a level that is lower than anyone else's.
(In a moment, we will see that levels in the input will be zero or positive.)
Initially the list of children is empty.
my @stack;
push @stack, $root;
The stack will keep track of the ancestry of each new node. For starters
it needs the root node, which won't ever be popped, because it is an ancestor
of all the nodes.
while (<>) {
/^(\s*)(.*)/;
my $indentation = length $1 if defined ($1);
my $name = $2;
To read the file, I chose a magic while. For each line there will be two
parts: the indentation (the leading spaces) and the name (the rest of the
line). The regular expression captures any leading space into $1 and
everything else (except the new line) into $2. The length of the
indentation is the important part, the bigger this is the more ancestors
the node has. Lines starting at the margin have an indentation of 0
(which is why the ROOT has a level of -1).
while ($indentation <= $stack[-1]{level}) {
pop @stack;
}
This loop handles ancestry. It pops the stack, until the node on top of
the stack is the parent of the new node. Think of an example.
When Home Center comes along, Cleaners and ROOT are on the stack.
Home Center's level is 0 (it's at the margin), so is Cleaners'. Thus,
Cleaners is popped (since 0 <= 0). Then only ROOT remains, so popping
stops (0 is not <= -1).
my $node = {
name => $name,
level => $indentation,
children => [],
};
This builds a new node for the current line. It's name and level are set.
We haven't seen any children yet, but I make room for them in an empty list.
push @{$stack[-1]{children}}, $node;
This line adds the new node to its parent's list of children. Remember
that the parent is sitting on top of the stack. The top of the stack is
$stack[-1] or the last element in the array.
push @stack, $node;
}
This pushes the new node onto the stack, in case it has children. The
closing brace ends the magic while loop. For simplicity, I chose to
display the output with Data::Dumper:
use Data::Dumper; print Dumper($root);
Running this shows the tree (sideways) on standard out.
Here's the whole code without interruption:
#!/usr/bin/perl
use strict; use warnings;
my $root = {
name => "ROOT",
level => -1,
children => [],
};
my @stack;
push @stack, $root;
while (<>) {
/^(\s*)(.*)/;
my $indentation = length $1;
my $name = $2;
while ($indentation <= $stack[-1]{level}) {
pop @stack;
}
my $node = {
name => $name,
level => $indentation,
children => [],
};
push @{$stack[-1]{children}}, $node;
push @stack, $node;
}
use Data::Dumper; print Dumper($root);
I promised to explain how structures like the one above can be turned into
pictures. The CPAN module UML::Sequence builds a structure similar to the
one shown here. It then uses that to generate a UML Sequence diagram of
the steps in SVG (Scalable Vector Graphics) format. That format can
be converted with standard tools like Batik to PNG or JPEG. In practice
the outlines which I turn into pictures represent call sequences for
programs. Perl can even generate the outline by running the program.
See UML::Sequence for more details.
When you have some interesting structured input, a builder might help
make a good internal structure. One high value builder is XML::DOM.
Another with a slightly different approach is XML::Twig. It is not
coincidental that XML parsers are really builders, as XML files are
non-binary trees.
If you haven't looked in GoF yet, start with the interpreter pattern.
Laughter is good for the soul. The person who taught me patterns in
Java did not even know why this pattern would not work in practice.
He had heard it was somewhat slow, but he wasn't sure. Well I'm sure.
Luckily for us, Perl has alternatives. These range from quick and dirty
to full blown. Here's the litany covered with examples below:
-
split
-
eval'ing Perl code
-
Config::Auto
-
Parse::RecDescent
Since we already have a language we like (that's Perl for those who haven't
been paying attention), interpreting is limited to small languages that
do something for us. Usually these turn out to be configuration files,
so I will focus on those. (See the builder section above if a tree can
represent your data file.)
The easiest route involves split. Suppose I have a config file which
uses variable=value settings. Comments and blanks should be ignored,
all other lines should have a variable, value pair. That's easy:
sub parse_config {
my $file = shift;
my %answer;
open CONFIG, "$file" or die "Couldn't read config file $file: $!\n";
while (<CONFIG>) {
next if (/^#|^\s*$/); # skip blanks and comments
my ($variable, $value) = split /=/;
$answer{$variable} = $value;
}
close CONFIG;
return %answer;
}
This subroutine expects a config file name. It opens and reads that file.
Inside the magic while loop the regex rejects lines which start with '#'
and those which contain only whitespace. All other lines are split on '='.
The variables become keys in the %answer hash. When all the lines are read,
the caller gets the hash back.
You could go much further along these lines, but see below for those who've
gone before you (see especially Config::Auto).
My current favorite way to bring configuration information into a Perl program
is to specify the config file in Perl. So, I might have a config file like
this:
our $db_name = "projectdb";
our $db_pass = "my_special_password_no_one_will_think_of";
our %personal = (
name => "Phil Crow",
address => "philcrow2000@yahoo.com",
);
To use this in a Perl program all I have to do is eval it:
...
open CONFIG, "config.txt" or die "couldn't...\n";
my $config = join "", <CONFIG>;
close CONFIG;
eval $config;
die "Couldn't eval your config: $@\n" if $@;
...
To read the file, I open it, then use join to put the angle read operator
in list context. This lets me bring the whole file into a scalar. Once
it's in (and the file is closed for tidiness), I just eval the string I read.
I need to check $@ to make sure the file was good Perl. After that, I'm
ready to use the values just as if they appeared in the program originally.
If you're too lazy to write your own config handler, or if you have lots
of configs outside your control, Config::Auto may be for you. Basically,
it takes a file and guesses how to turn it into a config hash. (It
can even guess the name of your config file). Using it is easy (if it works):
#!/usr/bin/perl
use strict; use warnings;
use Config::Auto;
my $config = Config::Auto::parse("your.config");
...
What ends up in $config depends on what your config file looks like (shock).
For files which use variable=value pairs, you get what you expect, which
is exactly what the first example above generates for the same input. It
is possible to specify a config file that Config::Auto cannot understand
(shock and amazement).
If the file you need to parse is complex, consider Parse::RecDescent.
It implements a clever top/down parser scheme. To use it, you specify
a grammar. (You remember grammars, don't you? If not, see below.) It
builds a parser from your grammar. You feed text to the parser. It does
whatever the grammar specifies in its actions.
To give you a feel for how this works, I'll parse small Roman numerals.
The program below takes numbers from the keyboard and translates them
from Roman numerals to decimal integers, so XXIX becomes 29.
#!/usr/bin/perl
use strict; use warnings;
use Parse::RecDescent;
my $grammar = q{
Numeral : TenList FiveList OneList /\Z/
{ $item[1] + $item[2] + $item[3]; }
| /quit/i { exit(0); }
| <error>
TenList : Ten(3) { 30 }
| Ten(2) OptionalNine { 20 + $item[2] }
| Ten OptionalNine { 10 + $item[2] }
| OptionalNine { $item[1] }
OptionalNine : One Ten { 9 }
| { 0 }
FiveList : One Five { 4 }
| Five { 5 }
| { 0 }
OneList : /(I{0,3})/i { length $1 }
Ten : /X/i
Five : /V/i
One : /I/i
};
my $parse = new Parse::RecDescent($grammar);
while (<>) {
chomp;
my $value = $parse->Numeral($_);
print ``value: $value\n'';
}
As you can see $grammar takes up most of the space in this program.
The rest is pretty simple. Once I receive the parser from the
Parse::RecDescent constructor, I just call its Numeral method repeatedly.
So what does the grammar mean? Let's start at the top. Grammars are
built from rules. The rule for a Numeral (the Roman kind) says:
A Numeral takes the form of one of these choices
a TenList then a FiveList then a OneList then the end of the string
OR
the word quit in any case (not a Numeral, but a way to quit)
OR
anything else, which is an error
We'll see what TenList and its friends are shortly. The code after
the first choice is called an action. If the rule matches a possibility,
it performs that possibility's action. So if a valid Numeral is seen,
the action is executed. This particular action adds up the values
TenList, FiveList, and OneList have accumulated. The items are numbered
starting with 1, so TenList's value is in $item[1], etc.
How does TenList get a value? Well, when Numeral starts matching, it looks
first for a valid TenList. There are four choices:
A TenList takes the form of one of these choices
three Tens
OR
two Tens then an OptionalNine
OR
a Ten then an OptionalNine
OR
an OptionalNine
These choices are tried in order. A Ten is simply an upper- or lower-case
X (see the Ten rule). The result of an action is the result of its last
statement. So, if there are three tens, the TenList returns 30. If there
are two tens, it returns 20 plus whatever OptionalNine returned.
The Roman numeral IX is our 9. I call this an OptionalNine. (The names
are completely arbitrary.) So after zero, one, or two X's, there can be
an IX which adds 9 to the total. If there is no IX, the OptionalNine will
match the empty rule. That consumes no text from the input and returns
zero according to its action.
Roman numerals are a lot more complex than my little grammar can handle.
For starters, by my calendar, we're now in the year MMIII. There are no
M's in my grammar. Further, some Romans thought that IIIIII was perfectly
valid. In my grammar three is the limit for all repetitions, and only I
and X can repeat. Further, reductions can only take one away. So, IIX
is not eight, it's invalid. This grammar can recognize any normalized
Roman numeral up to 38. Feel free to expand it.
Parse::RecDescent is not as fast as a yacc-generated parser, but it is
easier to use. See the documentation in the distribution for more
information, especially the tutorial which originally appeared in The
Perl Journal.
If you look at what's inside the parser (say with Data::Dumper) you might
think this actually implements the interpreter pattern. After all, it
makes a tree of objects from the grammar. Look closer and you will see
the key difference. All of the objects in the tree are members of
classes of like Parse::RecDescent::Action, which were written by Damian
Conway when he wrote the module. In the GoF interpreter pattern we are
expected to build a class for each non-terminal in the grammar (above
those classes would be Numeral, ReducedTen, etc.). Thus, the tree node
types are different for each grammar.
This difference has two implications: (1) it makes the RecDescent parser
generator simpler and (2) it's result faster.
In this installment we have seen how to use code references to implement
the Strategy and Template Method patterns. We even saw how to force
our code into someone else's class. Builder turns text into an internal
structure, which most Interpreters also do. Those
structures can often be simple combinations of hashes, lists, and scalars.
If what you need to read is simpler, use split or Config::Auto.
If it is more complex, use Parse::RecDescent. If that won't do it fast
enough, you might need one of the yaccs.
Next time I will look at patterns which actually rely on objects.