
Introducing XML::SAX::Machines, Part One
Introduction
In recent columns we have seen that SAX provides a modular way to generate and filter XML content. For those just learning how SAX works, though, the task of hooking up to the correct parser-generator-driver and building chains of filters can be tricky. More experienced SAX users may have a clearer picture of how to proceed, but they often find that initializing complex filter chains is tedious and lends itself to lots of duplicated code.
Consider the following simple filter chain script:
use XML::SAX::ParserFactory;
use XML::SAX::Writer;
use My::SAXFilter::One;
use My::SAXFilter::Two;
use My::SAXFilter::Three;

my $writer  = XML::SAX::Writer->new();
my $filter3 = My::SAXFilter::Three->new( Handler => $writer );
my $filter2 = My::SAXFilter::Two->new( Handler => $filter3 );
my $filter1 = My::SAXFilter::One->new( Handler => $filter2 );
my $parser  = XML::SAX::ParserFactory->parser( Handler => $filter1 );

$parser->parse_uri( $xml_file );
Not too bad for this tiny example, perhaps, but imagine
how it might look in a complex system with 10 or 15 filters all
doing their part. Also, new SAX users often stumble over the fact
that the handler chain must be built in reverse order
($filter3 has to be initialized before
$filter2 so it can be passed in as that filter's handler object,
for example). Yet another potential weakness in this script is that
the filters in the chain are hard-coded from the start. While it is
possible to make some aspects more flexible, adding the ability to
have a dynamic list of filters only adds to the complexity of the
script.
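To make that last point concrete, here is a minimal sketch of what a hand-rolled dynamic chain might look like. The two My::Demo::* filters are hypothetical stand-ins defined inline so the script is self-contained; they are not part of any distribution. The list is written in processing order and then walked backwards so that each filter can be given the one after it as its Handler.

```perl
use strict;
use warnings;
use XML::SAX::ParserFactory;
use XML::SAX::Writer;

# Hypothetical demo filters standing in for My::SAXFilter::*.
{
    package My::Demo::Upcase;
    use base 'XML::SAX::Base';

    # Upper-case character data, then forward the event downstream.
    sub characters {
        my ( $self, $data ) = @_;
        my %copy = ( %$data, Data => uc $data->{Data} );
        $self->SUPER::characters( \%copy );
    }
}
{
    package My::Demo::Passthrough;
    use base 'XML::SAX::Base';    # inherits default forwarding for all events
}

# The list is in processing order; it could come from anywhere at run time.
my @filter_list = ( 'My::Demo::Upcase', 'My::Demo::Passthrough' );

my $output  = '';
my $handler = XML::SAX::Writer->new( Output => \$output );

# Build the chain back to front: each new filter wraps the current head.
for my $class ( reverse @filter_list ) {
    $handler = $class->new( Handler => $handler );
}

my $parser = XML::SAX::ParserFactory->parser( Handler => $handler );
$parser->parse_string('<doc>hello</doc>');
print $output, "\n";
```

The loop itself is short, but the script still has to create the writer, reverse the list, and wire up the parser by hand in every program that needs a chain.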
Barrie Slaymaker's outstanding new
XML::SAX::Machines addresses both the complexity and the
tedium of creating SAX systems. Compare the following snippet to the
one above.
use XML::SAX::Machines qw( :all );
my $machine = Pipeline(
"My::SAXFilter::One",
"My::SAXFilter::Two",
"My::SAXFilter::Three",
\*STDOUT
);
$machine->parse_uri( $xml_file );
This version is less verbose and more intuitive (note that the chain is declared in processing order). Perhaps most importantly, making the filter chain dynamic is as simple as creating a list of strings containing module names:
my $machine = Pipeline(
@filter_list,
\*STDOUT
);
Here, @filter_list is built dynamically elsewhere in
the application.
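One way the list might be assembled is from a run-time setting. The sketch below assumes a comma-separated string of class names given on the command line; the helper name parse_filter_spec, the script name, and the string format are all illustrative assumptions, not part of XML::SAX::Machines. Because the names end up being loaded as modules, the helper rejects anything that does not look like a plain Perl package name.

```perl
use strict;
use warnings;
use XML::SAX::Machines qw( Pipeline );

# Hypothetical helper: turn "My::SAXFilter::One, My::SAXFilter::Two" into a
# list of class names, dying on anything that is not a plain package name.
sub parse_filter_spec {
    my ($spec) = @_;
    my @names = grep { length }
                map  { my $n = $_; $n =~ s/^\s+|\s+$//g; $n }
                split /,/, $spec;
    for my $name (@names) {
        die "Bad filter name: '$name'\n"
            unless $name =~ /\A[A-Za-z_]\w*(?:::\w+)*\z/;
    }
    return @names;
}

# Assumed invocation:
#   perl run_filters.pl input.xml "My::SAXFilter::One,My::SAXFilter::Two"
if ( @ARGV == 2 ) {
    my ( $xml_file, $spec ) = @ARGV;
    my @filter_list = parse_filter_spec($spec);

    my $machine = Pipeline( @filter_list, \*STDOUT );
    $machine->parse_uri($xml_file);
}
```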
The story does not end there,
however. XML::SAX::Machines and its associated
Machine classes provide a small host of options for
building easy-to-maintain SAX-based XML processing systems. Over the
next two months we will be looking at this inventive distribution,
beginning with this month's introduction.
Machine Types
XML::SAX::Machines is a high-level wrapper class that
allows its various Machine classes (which may also be
used as standalone libraries) to be easily chained together to
create complex SAX filtering
systems. XML::SAX::Machines currently installs and
knows about several Machines by default.
Pipeline
Implemented by XML::SAX::Pipeline, a
Pipeline provides a way to set up a linear series of
filters (or other Machines) that works like the traditional
hand-rolled SAX filter chain that we looked at in the introduction.
That is, the events fired go directly to the next filter or handler
on the chain with no intervention.
my $machine = Pipeline(
"My::SAXFilter::One",
"My::SAXFilter::Two",
"My::SAXFilter::Three",
\*STDOUT
);
In this example, the three filter classes are fired in linear order with the
results of My::SAXFilter::One being sent to My::SAXFilter::Two
and so on.
Manifold
Manifold Machines provide a way to create multi-pass
filters. The events are cached at the beginning of the
Manifold's run and duplicate copies of that event
stream are sent through the filters one by one and recompiled into a
single document upon completion. It is implemented by
XML::SAX::Manifold.
my $machine = Pipeline(
Manifold(
"My::SAXFilter::A",
"My::SAXFilter::B",
"My::SAXFilter::C",
),
\*STDOUT
);
Here, events fired during parsing are buffered and sent directly to each of the three filters (in order) and the output of each of the filters is merged into a single stream before being handed off to the Writer class.
Tap
Implemented by XML::SAX::Tap, a Tap offers
a way to insert a class that examines one or more SAX events, but in
no way alters the data passed to the next filter or handler. This
can be extremely useful for cases where you want to examine the
result of a given filter or other Machine part for debugging
purposes. The handler that you use for your Tap need
not forward the events as a typical filter would since the same
events will also be sent to the next handler in the chain as if the
Tap did not exist. For example:
my $machine = Pipeline(
"My::SAXFilter::One",
"My::SAXFilter::Two",
Tap(
"My::SAXDumper"
),
"My::SAXFilter::Three",
\*STDOUT
);
In this case, we have taken the Pipeline from above and
added a Tap to send events fired by
My::SAXFilter::Two to our SAXDumper for debugging.
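The article does not show My::SAXDumper itself, so the following is only a guess at what such a debugging handler might look like. It subclasses XML::SAX::Base, reports a few events to STDERR, and deliberately never calls SUPER::, so it forwards nothing. The Seen list is an extra added here so that the result can be inspected from code. To be named in a Tap as above, the package would presumably live in its own My/SAXDumper.pm file; here it is defined inline and driven directly by a parser so that the sketch runs on its own.

```perl
use strict;
use warnings;
use XML::SAX::ParserFactory;

{
    package My::SAXDumper;
    use base 'XML::SAX::Base';

    # Print a one-line description of an event and remember it.
    sub _report {
        my ( $self, $line ) = @_;
        push @{ $self->{Seen} ||= [] }, $line;
        print STDERR "$line\n";
    }

    # None of these callbacks call SUPER::, so no event is passed on.
    sub start_element {
        my ( $self, $el ) = @_;
        $self->_report("start_element: $el->{Name}");
    }

    sub end_element {
        my ( $self, $el ) = @_;
        $self->_report("end_element: $el->{Name}");
    }

    sub characters {
        my ( $self, $data ) = @_;
        # Skip whitespace-only text to keep the dump readable.
        $self->_report("characters: '$data->{Data}'")
            if $data->{Data} =~ /\S/;
    }
}

# Standalone check: feed the dumper straight from a parser.
my $dumper = My::SAXDumper->new;
my $parser = XML::SAX::ParserFactory->parser( Handler => $dumper );
$parser->parse_string('<doc><item>text</item></doc>');
```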
ByRecord
ByRecord carves up record-oriented XML documents and
sends each record through each filter in the ByRecord
machine as a separate event stream delimited by
start_document and end_document
events. All other events (data outside of the records) are forwarded
appropriately to the downstream filter or handler. It is implemented
by XML::SAX::ByRecord:
my $machine = Pipeline(
ByRecord(
"My::RecordFilter::One",
"My::RecordFilter::Two",
),
"My::SAXFilter::One",
"My::SAXFilter::Two",
"My::SAXFilter::Three",
\*STDOUT
);
In this case, we have taken the Pipeline from above and
added a ByRecord Machine to process the record-oriented
parts of the document before beginning the rest of the
Pipeline chain.
Now that we have an idea of the various Machines that are currently available, let's get straight to this month's code example.

