Planet Perl

November 28, 2009

Dave Rolsky — What's the Point of Markdent?

Markdent is my new event-driven Markdown parser toolkit, but why should you care?

First, let's talk about Markdown. Markdown is yet another wiki-esque format for marking up plain text. What makes Markdown stand out is it's emphasis on usability and "natural" usage. It's syntax is based on things people have been doing to "mark up" plain text email for years.

For example, if you wanted to list some items in a plain text email, you'd wite something like:

* List item 1
* List item 2
* List item 3

Well, this is how it works in Markdown too. Want to emphasize some text? *Wrap it in asterisks* or _underscores_.

So why do you need an event-driven parser toolkit for dealing with Markdown? CPAN already has several modules for dealing with Markdown, most notably Text::Markdown.

The problem with Text::Markdown is that all you can do with it is generate HTML, but there's so much more you could do with a Markdown document.

If you're using Markdown for an application (like a wiki), you may need to generate slightly different HTML for different users. For example, maybe logged-in users see documents differently.

But what if you want to cache parsing in order to speed things up? If you're going straight from Markdown to HTML, you'd need to cache the resulting HTML for each type of user (or even for each individual user in the worst case).

With Markdent, you can cache an intermediate representation of the document as a stream of events. You can then replay this stream back to the HTML generator as needed.

What's the Impact of Caching?

Here's a benchmark comparing three approaches.

Use Markdent to parse the document and generate HTML from scratch each time.
Use Text::Markdown
Use Markdent to parse the document once, then use Storable to store the event stream. When generating HTML, thaw the event stream and replay it back to the HTML generator.

	Rate	parse from scratch	Text::Markdown	replay from captured events
parse from scratch	1.07/s	--	-67%	-83%
Text::Markdown	3.22/s	202%	--	-48%
replay from captured events	6.13/s	475%	91%	--

This benchmark is included in the Markdent distro. One feature to note about this benchmark is that it parses 23 documents from the mdtest test suite. Those documents are mostly pretty short.

If I benchmark just the largest document in mdtest, the numbers change a bit:

	Rate	parse from scratch	Text::Markdown	replay from captured events
parse from scratch	2.32/s	--	-58%	-84%
Text::Markdown	5.52/s	138%	--	-63%
replay from captured events	14.8/s	538%	168%	--

Markdent probably speeds up on large documents because each new parse requires constructing a number of objects. With 23 documents we construct those objects 23 times. When we parse one document the actual speed of parsing becomes more important, as does the speed of not parsing.

What Else?

But there's more to Markdent than caching. One feature that a lot of wikis have is "backlinks", which is a list of pages linking to the current page. With Markdent, you can write a handler that only looks at links. You can use this to capture all the links and generate your backlink list.

How about a full text search engine? Maybe you'd like to give a little more weight to titles than other text. You can write a handler which collects title text and body text separately, then feed that into your full text search tool.

There's a theme here, which is that Markdent makes document analysis much easier.

That's not all you can do. What about a Markdown-to-Textile converter? How about a Markdown-to-Markdown converter for canonicalization?

Because Markdent is modular and pluggable, if you can think of it, you can probably do it.

I haven't even touched on extending the parser itself. That's still a much rougher area, but it's not that hard. The Markdent distro includes an implementation of a dialect called "Theory", based on some Markdown extension proposals by David Wheeler.

This dialect is implemented by subclassing the Standard dialect parser classes, and providing some additional event classes to represent table elements.

I hope that other people will pick up on Markdent and write their own dialects and handlers. Imagine a rich ecosystem of tools for Markdown comparable to what's available for XML or HTML. This would make an already useful markup language even more useful.

by Dave Rolsky at November 28, 2009 17:20 UTC

Marcus Ramberg — Burning the candle at both ends.

In addition to recent fatherhood and doing my own startup, I am now the leader of the Oslo Perl Mongers. Look forward to tech talks from Oslo.pm in the months to come. :)

(permalink)

by marcus at November 28, 2009 09:17 UTC

November 27, 2009

Dave Rolsky — Want Good Tools? Break Your Problems Down

I've been working a new a project recently, Markdent, an event-driven Markdown parser toolkit.

Why? Because the existing Perl Markdown tools just aren't flexible enough. They bundle up Markdown parsing with HTML conversion all in one API, and I need to do more than convert to HTML.

This sort of inflexibility is quite common when I look at CPAN libraries. Looking back at the Perl DateTime Project, one of my big problems with all the other date/time modules on CPAN was their lack of flexibility. If I could have added good time zone handling to an existing project way back then, I probably would have, but I couldn't, and the Perl DateTime Project was born.

If there is one point I would hammer home to all module authors, it would be "solve small problems". I think that the failure to do this is what leads to the inflexibility and tight coupling I see in so many CPAN distributions.

For example, I imagine that in the date/time world some people thought "I need a bunch of date math functions" or "I need to parse lots of possible date/time strings". Those are good problems to solve, but by going straight there you lose any hope of a good API.

Similarly, with Markdown parsers, I imagine that someone though "I'd like to convert Markdown to HTML", so they wrote a module that does just that.

I can't really fault their goal-focused attitudes. Personally, I sometimes find myself getting lost in digressions. For example, I'm currently writing a webapp with the goal of exploring techniques I want to use in another webapp!

But there's a lot to be said for not going straight to your goal. I'm a big fan of breaking a problem down into smaller pieces and solving each piece separately.

For example, when it comes to Markdown, there are several distinct steps on the way from Markdown to HTML. First, we need to be able to parse Markdown. Parsing Markdown is a step of its own. Then we need to take the results of parsing and turn it into HTML.

If we think of the problem as consisting of these pieces, a clear and flexible design emerges. We need a tool for parsing Markdown (a parser). Separately, we need a tool for converting parse results to HTML (a converter or parse result handler).

Now we need a way to connect these pieces. In the case of Markdent, the connection is an event-driven API where each event is an object and the event receiver conforms to a known API.

It's easy to put these two things together and make a nice simple Markdown-to-HTML converter.

But since I took the time to break the problem down, you can also do other things with this tool. For example, I can do something else with our parse results, like capture all the links or cache the intermediate result of the parsing (an event stream).

And since the HTML generator is a small piece, I can also reuse that. Now that I've cached our event stream, I can pull it from the cache later and use it to generate HTML without re-parsing the document. In the case of Markdent, using a cached parse result to generate HTML was about six times faster in my benchmarks!

Because Markdent has small pieces, there are all sorts of interesting ways to reuse them. How about a Markdown-to-Textile converter? Or how about adding a filter which doesn't allow any raw HTML?

We've all heard that loose coupling makes good APIs. But just saying that doesn't really help you understand how to achieve loose coupling. Loose coupling comes from breaking a big problem down into small independent problems.

As you solve each problem, think about how those solutions will communicate. Design a simple API or communications protocol. You'll know the API is simple enough if you can imagine easily swapping out each piece of the problem with another API-conformant piece. A loosely coupled API is one that makes replacing one end of the API easy.

And best of all, when you break problems down into loosely coupled pieces, you'll make it much easier for others to contribute to and extend your tools. Moose is a great example of this. It's fancy sugar layer exists on top of loosely coupled units known as the metaclass protocol. By separating the sugar from the underlying pieces, we've enabled others to create a huge number of Moose extensions.

The same goes for the Perl DateTime Project. I wrote the core pieces, but there have been many, many great contributions. This wealth of extensions wouldn't be possible without the loosely coupled core pieces and a well-defined API for communicating between components.

by Dave Rolsky at November 27, 2009 22:09 UTC

Yuval Kogman — The timing of values in imperative APIs

Option configuration is a classic example of when I prefer a purely functional approach. This post is not about broken semantics, but rather about the tension between ease of implementation and ease of use.

Given Perl's imperative heritage, many modules default to imperative option specification. This means that the choice of one behavior over another is represented by an action (setting the option), instead of a value.

Actions are far more complicated than values. For starters, they are part of an ordered sequence. Secondly, it's hard to know what the complete set of choices is, and it's hard to correlate between choices. And of course the actual values must still be moved around.

A simple example is Perl's built in import mechanism.

When you use a module, you are providing a list of arguments that passed to two optional method calls on the module being loaded, import and VERSION.

Most people know that this:

use Foo;

Is pretty much the same as this:

BEGIN {
    require Foo;
    Foo->import();
}

There's also a secondary syntax, which allows you to specify a version:

use Foo 0.13 qw(foo bar);

The effect is the same as:

BEGIN {
    require Foo;
    Foo->VERSION(0.13);
    Foo->import(qw(foo bar));
}

UNIVERSAL::VERSION is pretty simple, it looks at the version number and compares it with $Foo::VERSION and then complains loudly if $Foo::VERSION isn't recent enough.

But what if we wanted to do something more interesting, for instance adapt the exported symbols to be compatible with a certain API version?

This is precisely why VERSION is an overridable class method, but this flexibility is still very far from ideal.

my $import_version;

sub VERSION {
    my ( $class, $version ) = @_;

    # first verify that we are recent enough
    $class->SUPER::VERSION($version);

    # stash the value that the user specified
    $import_version = $version;
}

sub import {
    my ( $class, @import ) = @_;

    # get the stashed value
    my $version = $import_version;

    # clear it so it doesn't affect subsequent imports
    undef $import_version;

    ... # use $version and @imports to set things up correctly
}

This is a shitty solution because really all we want is a simple value, but we have to juggle it around using a shared variable.

Since the semantics of import would have been made more complex by adding this rather esoteric feature, the API was made imperative instead, to allow things to be optional.

But the above code is not only ugly, it's also broken. Consider this case:

package Evil;
use Foo 0.13 (); # require Foo; Foo->VERSION;

package Innocent;
use Foo qw(foo bar); # require Foo; Foo->import;

In the above code, Evil is causing $import_version to be set, but import is never called. The next invocation of import comes from a completely unrelated consumer, but $import_version never got cleared.

We can't use local to keep $import_version properly scoped (it'd be cleared before import is called). The best solution I can come up with is to key it in a hash by caller(), which at least prevents pollution. This is something every implementation of VERSION that wants to pass the version to import must do to be robust.

However, even if we isolate consumers from each other, the nonsensical usage use Foo 0.13 () which asks for a versioned API and then proceeds to import nothing, still can't be detected by Foo.

We have 3 * 2 = 6 different code paths^[1] for the different variants of use Foo, one of which doesn't even make sense (VERSION but no import), two of which have an explicit stateful dependency between two parts of the code paths (VERSION followed by import, in two variants), and two of which have an implicit stateful dependency (import without VERSION should get undef in $import_version). This sort of combinatorial complexity places the burden of ensuring correctness on the implementors of the API, instead of the designer of the API.

It seems that the original design goal was to minimize the complexity of the most common case (use Foo, no VERSION, and import called with no arguments), but it really makes things difficult for the non default case, somewhat defeating the point of making it extensible in the first place (what good is an extensible API if nobody actually uses it to its full potential).

In such cases my goal is often to avoid fragmenting the data as much as possible. If the version was an argument to import which defaulted to undef people would complain, but that's just because import uses positional arguments. Unfortunately you don't really see this argument passing style in the Perl core:

sub import {
    my ( $class, %args ) = @_;

    if ( exists $args{version} ) {
        ...
    }
    ... $args{import_list};
}

This keeps the values together in both space and time. The closest thing I can recall from core Perl is something like $AUTOLOAD. $AUTOLOAD does not address space fragmentation (an argument is being passed using a a variable instead of an argument), but it at leasts solves the fragmentation in time, the variable is reliably set just before the AUTOLOAD routine is invoked.

Note that if import worked like this it would still be far from pure, it mutates the symbol table of its caller, but the actual computation of the symbols to export can and should be side effect free, and if the version were specified in this way that would have been easier.

This is related to the distinction between intention and algorithm. Think of it this way: when you say use Foo 0.13 qw(foo bar), do you intend to import a specific version of the API, or do you intend to call a method to set the version of the API and then call a method to import the API? The declarative syntax has a close affinity to the intent. On the other hand, looking at it from the perspective of Foo, where the intent is to export a specific version of the API, the code structure does not reflect that at all.

Ovid wrote about a similar issue with Test::Builder, where a procedural approach was taken (diagnosis output is treated as "extra" stuff, not really a part of a test case's data).

Moose also suffers from this issue in its sugar layer. When a Moose class is declared the class definition is modified step by step, causing load time performance issues, order sensitivity (often you need to include a role after declaring an attribute for required method validation), etc.

Lastly, PSGI's raison d'etre is that the CGI interface is based on stateful values (%ENV, globally filehandles). The gist of the PSGI spec is encapsulating those values into explicit arguments, without needing to imperatively monkeypatch global state.

I think the reason we tend to default to imperative configuration is out of a short sighted laziness^[2]. It seems like it's easier to be imperative, when you are thinking about usage. For instance, creating a data type to encapsulate arguments is tedius. Dealing with optional vs. required arguments manually is even more so. Simply forcing the user to specify everything is not very Perlish. This is where the tension lies.

The best compromise I've found is a multilayered approach. At the foundation I provide a low level, explicit API where all of the options are required all at once, and cannot be changed afterwords. This keeps the combinatorial complexity down and lets me do more complicated validation of dependent options. On top of that I can easily build a convenience layer which accumulates options from an imperative API and then provides them to the low level API all at once.

This was not done in Moose because at the time we did not know to detect the end of a .pm file, so we couldn't know when the declaration was finished^[3].

Going back to VERSION and import, this approach would involve capturing the values as best we in a thin import (the sugar layer), and passing them onwards together to some underlying implementation that doesn't need to worry about the details of collecting those values.

In my opinion most of the time an API doesn't actually merit a convenience wrapper, but if it does then it's easy to develop one. Building on a more verbose but ultimately simpler foundation usually makes it much easier to write something that is correct, robust, and reusable. More importantly, the implementation is also easier to modify or even just replace (using polymorphism), since all the stateful dependencies are encapsulated by a dumb sugar layer.

Secondly, when the sugar layer is getting in the way, it can just be ignored. Instead of needing to hack around something, you just need to be a little more verbose.

Lastly, I'd also like to cite the Unix philosophy, another strong influence on Perl: do one thing, and do it well^[4]. The anti pattern is creating one thing that provides two features: a shitty convenience layer and a limited solution to the original problem. Dealing with each concern separately helps to focus on doing the important part, and of course doing it well ;-)

This post's subject matter is obviously related to another procedural anti-pattern ($foo->do_work; my $results = $foo->results vs my $results = $foo->do_work). I'll rant about that one in a later post.

^[1]

use Foo;
use Foo 0.13;
use Foo qw(foo bar);
use Foo 0.13 qw(Foo Bar);
use Foo ();
use Foo 0.13 ();

and this doesn't even account for manual invocation of those methods, e.g. from delegating import routines.

^[2] This is the wrong kind of laziness, the virtuous laziness is long term

^[3] Now we have B::Hooks::EndOfScope

^[4] Perl itself does many things, but it is intended to let you write things that do one thing well (originally scripts, though nowadays I would say the CPAN is a much better example)

by nothingmuch (nothingmuch@woobling.org) at November 27, 2009 02:02 UTC

November 26, 2009

David Golden — Too many CPAN.pm config options?

Over the years, CPAN.pm has accumulated a staggering number of configuration options. Recently, an otherwise expert Perl programmer (that I won’t embarrass by name) asked on IRC for the name of an option rather than reading the (admittedly long) manual. This reminded me that many Perl users may not know about a handy feature that has been in the CPAN shell since 2006:

cpan> o conf /MATCH/
cpan> o conf init /MATCH/

The first form will show all options matching a pattern. The second will re-run interactive configuration with explanatory paragraphs for all options matching a pattern.

Thanks to these features, I rarely actually read the manual. As long as I have a rough idea what the option name might be, I just search for it. And I can always fall back on “o conf” to see all options and then “o conf init NAME” to read the paragraph about any particular one.

by dagolden at November 26, 2009 14:07 UTC

Adam Kennedy — Padre Standalone 0.50 for Mac (Experimental)

Hurray for conferences!

I'm happy to report that, after bitching in my Padre talk at OSDC.AU that Mac people was unusually incapable of building software, Byron Ellicot from APNIC secretly took up the challenge and has build a Padre Standalone DMG installer in a marathon overnight session involving hand-editing XS and some Objective C hacking.

It should be available shortly at http://ali.as/Padre.dmg (file size is 60,632,004)

by Alias at November 26, 2009 08:52 UTC

November 25, 2009

Perl NOC Log — perl.org outage

One of our remaining single-point-of-failure-servers is doing it's duty demonstrating to us why we really should get rid of the last single-point-of-failures. Our console server is down, too (arrgh!), so no ETA yet but we'll get it fixed as soon as possible.

Update 8.30 PST - the troubled server is coming back now and we'll prioritize our plans to get things off this box. (It is among other things our shared NFS server from back at a time when saving a couple of gigabytes was worth the pain of NFS; obviously that's not true anymore).

by Ask Bj?rn Hansen at November 25, 2009 16:22 UTC

Gabor Szabo — Context Sensitive Help using Padre, the Perl IDE

This is the first screencast I made. It took me I think 6 hours to create a version that does not totally suck. The screencast shows how Padre can provide help on the various keywords and variables of Perl 5, PIR of Parrot and Perl 6.

by Gabor Szabo at November 25, 2009 08:44 UTC

Perl Buzz — Christmas brings the RJBS Advent Calendar

By Ricardo Signes

Back when I first started learning Perl 5, I was excited to find the Perl Advent Calendar. It was a series of 24 or so short articles about useful Perl modules or techniques, with one new entry each day leading up to Christmas. A few years later, the Catalyst crew started the Catalyst Advent Calendar. I always liked the Perl Advent Calendars, and kept meaning to contribute. Every time, though there were too many things I'd want to write about -- and mostly they were my own code, so I felt sort of smarmy and self-promoting and never did it.

Finally, though, I'm glad to say I have tackled those feelings. I will not shy away from showing off my own code, and I will not worry about having to choose just one thing. This year, I will publish the RJBS Advent Calendar, 24+ full days of cool, useful, or stupid code that I have written and given as a gift to the rest of the CPAN community.

I've had a lot of fun working on this project, and it's helped me find and fix a number of little bugs or imperfections in the software I'll be talking about.

The first door opens in seven days. I hope it's as fun to read as it was to write. No returns will be accepted. Approximate actual cash value: $0.02

Ricardo Signes has written tons of modules on the CPAN, including Dist::Zilla, the heir apparent to Module::Starter. He is also a total sweetheart, and has a fuzzy head.

by Andy Lester at November 25, 2009 04:10 UTC

November 24, 2009

Gabor Szabo — Beautiful Perl - creating charts and graphs

Yesterday I wanted to create a graph of 4 series of numbers along an x axis representing time. I know some GD related modules can do this so I wondered over to search.cpan.org and looked at some of the modules. The problem I encountered is lack of examples and lack of display of results.

Of course search.cpan is just a search engine that should display textual results but did not know where should I go to find a few charts created by a Perl module so I can see which module can generate the stuff I want to create and how to use it.

As I know the PDL people generate all kinds of 2D and 3D graphs I sent and e-mail to that list. While waiting for a reply the bright new world created a spark and I asked the same on Twitter as well. (BTW you can follow me on twitter it started to be useful to me!). I got a reply within seconds on Twitter and within minutes on the PDL mailing list pointing to two different solutions.

PLplot is a multilingual tool for creating charts and Chart::Clicker is Perl only tool for simple but very nice charts. For the latter I even found the slides of a talk Cory, the author gave: Data Visualization with Chart::Clicker. Way nicer than any of my slides.

I am pointing out these as they and the way they are displayed are approaching what I wanted to see and what I think can be very good for the Perl community.

I wanted to see a set of charts or graphs along with the data set and the script that generated those charts. Visual examples of the end results make a huge difference in convincing me that Perl - and the particular module - is nice. Having them with full code example and the data set that generated them will helping me pick the tool that fits my needs and will help me actually solve the problem at hand.

I'd really like to thank those who pointed out these examples and to the people who created the examples. I wish their work was easier to locate. Maybe such examples could be linked from www.perl.org or even embedded there.

Of course not every module on CPAN can have a nice visual output but it would be really nice if those that can have such output would be presented with nice examples.

For a totally unrelated example we can look at the home page of SDL.

It now has an embedded video of the Bouncy game. So visitors of the site can see a nice site and an example of a project that was created by SDL.

by Gabor Szabo at November 24, 2009 12:53 UTC

Curtis Poe — How To Become a Millionaire with Parrot and COBOL

Do a little research into COBOL and a few interesting things jump out at you. Some of this information is from Gartner Group and the rest can easily be verified by doing even a brief survey of the field. Taking the following bits of information:

75% of the world's business data passes through COBOL (Gartner Group estimate)
There is possibly up to a fifth of a trillion lines of COBOL code out there (Gartner again)
People are still writing COBOL constantly, but usually on existing systems.
The industry is struggling to find new COBOL programmers because few young programmers love the thought of maintaining decades-old enterprise systems where all data is global and GOTO is often the first choice in flow control.
Many companies want to move from COBOL, but can't do so easily because too much code is written in COBOL (and the source is often lost).

People really, really underestimate these problems. For example, I've seen several companies express a desire to move away from Perl but find out they can't because they don't realize quite how reliant on the language they are. Now imagine a multi-national corporation with several million lines of COBOL code. What are they going to do?

COBOL salaries, from what I've seen, are trending upwards. Older programmers are sometimes being enticed out of retirement to maintain legacy systems (this is rather hit or miss as there appears to still be some age discrimination here). There are companies out there offering software to allow COBOL programmers to write NetBeans, integrate with .NET code or simply translate the COBOL into other languages (the latter appears to have mostly been a disaster, but I don't have enough hard data on this).

So let's summarize the above:

Trillions of dollars flow through COBOL.
Trillions of dollars flow through systems that businesses want to replace.
Current mitigation strategies involve supplementing COBOL, not replacing it.

You see the issue here? There's a fortune to be made for the people who figure out how to turn this trick. My thought is to not write supplementary tools for COBOL. It's to write a COBOL compiler on top of Parrot. Imagine coming across the following COBOL[1]:

000510 MAIN-PARA. 000520 OPEN INPUT IN-FILE 000530 OUTPUT OUT-FILE 000535 000540 PERFORM UNTIL END-OF-FILE 000550 ADD 10 TO LINE-NUMBER 000560 READ IN-FILE AT END 000570 MOVE 'Y' TO EOF-FLAG 000580 NOT AT END 000590 IF (CHAR-1 = '*') 000600 OR (CHAR-1 = '/') 000610 OR (CHAR-1 = '-') THEN 000620 MOVE LINE-CODE-IN TO L-COMMENT 000630 MOVE LINE-NUMBER TO L-NUM-COM 000640 WRITE LINE-CODE-OUT FROM NUMBER-COMMENT 000660 ELSE 000670 MOVE LINE-CODE-IN TO L-CODE 000680 MOVE LINE-NUMBER TO L-NUM-CODE 000690 WRITE LINE-CODE-OUT FROM NUMBER-CODE 000720 END-IF 000730 END-READ 000740 INITIALIZE NUMBER-CODE NUMBER-COMMENT 000750 END-PERFORM

With Parrot and a COBOL compiler, you could allow a more modern langauge (say, Rakudo) to be embedded:

000510 MAIN-PARA. 000520 OPEN INPUT IN-FILE 000530 OUTPUT OUT-FILE 000535 000540+Rakudo my $line_num = 0; while <C:IN-FILE> { $line_num += 10; my $c_area = /^[-*/]/ ?? '' !! ' '; # is this a comment? print C:OUT_FILE sprintf "%06d$c_area%-100s" => $lin_num, $line; } 000550

Now this example isn't the greatest (but being able to declare the variables next to where they're used is a huge win), but imagine working with free-form text. I once took a huge bit of COBOL translating CSV data to a fixed-width format and got it down to 10 lines of Perl (with error checking). With this strategy, you could gradually migrate away from COBOL by embedding a modern language directly inside the COBOL instead of keeping the COBOL and wrapping modern tools around it.

I'm surprised I've never seen this approach before. It really shouldn't be too hard. (If anyone wants to pay me a bazillion dollars to do this, let me know :)

1. If you look carefully at the COBOL and the Perl 6, you have no way of knowing if they're functionally equivalent due to how COBOL variables are declared. In fact, if you don't know COBOL, you might be misled into thinking that the COBOL code can't possibly be correct (look at the variable names), but it is.

by Ovid at November 24, 2009 11:27 UTC

Ricardo Signes — the rjbs advent calendar

I've had a lot of fun working on this project, and it's helped me find and fix a number of little bugs or imperfections in the software I'll be talking about.

The first door opens in seven days. I hope it's as fun to read as it was to write. No returns will be accepted. Approximate actual cash value: $0.02

by rjbs at November 24, 2009 05:22 UTC

November 23, 2009

Adam Kennedy — DBD::SQLite 1.27 now (finally) released

http://svn.ali.as/cpan/releases/DBD-SQLite-1.27.tar.gz

I'm happy to announce the release of DBD::SQLite 1.27, the first production release since the original sprint in March/April.

This is a massive release for the DBD::SQLite team. Since the last release, we've seen around 150-200 commits and 7 developer releases. All of the changes in this release can be credited to (in commit quantity order) ISHIGAKI, DAMI, DUNCAND, VLYON, and a larger cast of bug submitters and patch submitters, and myself merely acting as release manager and resident devil's advocate.

Major features include:

- SQLite release upgraded to 3.6.20

- Hugely improved test suite, and a major increase in reliability (as measured by CPAN Testers). This should allow DBD::SQLite to finally escape it's shameful position in the FAIL 100 list.

- Foreign key constraints are now supported and enforceable. In line with SQLite itself, we have them disabled by default. A pragma (detailed in the SQLite documentation) is used to turn them on.

- Huge refactoring of the internals to match DBI's preferred style, the func calls are now deprecated, replaced by first class sqlite_* methods.

- Online Backup support, allowing the copying of SQLite databases while they are live.

- SQLite includes syntactic support for an infix operator 'REGEXP'. This is now implemented using Perl regular expressions.

- Support for loading external sqlite3 extensions

- Support for fine-grained query security via sqlite_set_authorizer hooks.

- Support for commit, rollback and update hooks

While we've tried to keep compatibility wherever possible, if you do some more unusual things with SQLite you may see problems or regressions.

For example, my Xtract tool was using the column_info method to discover types in SQLite interfaces. The column_info method now requires an explicit schema, which broke my code.

In most cases, we hope that any changes you will need to make are minor and will be more correct (i.e. closer to the standard DBI usage).

This release is recommended for all users, and (pending a unexpected late bug) is expected to be the stable release for some time (probably until the next SQLite release they define as a recommended upgrade).

by Alias at November 23, 2009 23:53 UTC

Gabor Szabo — How to help people make money using Perl?

I know some of the hard-core Perl people will think about me as a heretic when I mention the M word (either money or marketing) but in the end you do need to feed your cat or dog, not to mention your kids, don't you?

As Peter Shangov pointed out one of the (or the main?) reason so many companies are using Java or Microsoft technologies for writing applications is because both of those worlds help many people to make money and thus these people are promoting the respective technologies.

We in the Perl community are also promoting our technology but in a grass-root way, usually far from the high-level decision makers. We do it mostly because we believe in the technology, we know it can solve the problems at hand in a good way and we like our language.

In the end we do quite similar things just in different channels.

So I have been wondering how could we help people make (more) money using Perl? Let's take a look at who is using Perl and what each of them needs?

There are the system administrators who tweak Unix and Linux machines and to some small extent Windows machines who sometimes need to automate things. They promote Perl by solving tasks and then showing to their piers and superiors how it was done using Perl. How much time and money they saved for the organization etc. The health of Perl does not have a direct impact on their salary but it does effect their quality of life. If they did not have perl on the system or if perl was banned they would enjoy their work less. On the other hand perl makes them more productive than others who don't know perl - or don't know it well enough - and thus they are more valuable to the organizations. Some of them can translate this to higher salary.

The fact that Perl is on every system helps them get their job done. The lack of many preinstalled CPAN modules and the relative hardship to distribute them to all the system hinders their job. If it was easier to distribute their code along with various dependencies, their life would be easier and their value would be higher.

Their salary won't change in large percentages along the level of acceptance of Perl but it will be easier for them to find a better jobs if their knowladge of Perl had a higher valuation among managers.

Perl being in a better position would also make more sysadmins learn Perl so the total value generated will be higher.

For these people what the Perl community can do is to strengthen the view of Perl as an important tool for system administrators. If possible we should let managers understand that sysadmins who know Perl are more valuable. For this some kind of an evaluation system might help but certification is not likely to happen in the Perl community so we should think of some other way to show a potential employer that someone knows perl at an adequate level. (References?)

What else do you think would help system administrators have a better position to earn more money or find better jobs?

by Gabor Szabo at November 23, 2009 20:51 UTC

use.perl — Perl 5.11.2

acme writes: The streets were pretty quiet, which was nice. They're always quiet here at that time: you have to be wearing a black jacket to be out on the streets between seven and nine in the evening, and not many people in the area have black jackets. It's just one of those things. I currently live in Colour Neighbourhood, which is for people who are heavily into colour. All the streets and buildings are set for instant colourmatch: as you walk down the road they change hue to offset whatever you're wearing. When the streets are busy it's kind of intense, and anyone prone to epileptic seizures isn't allowed to live in the Neighbourhood, however much they're into colour. - Michael Marshall Smith, "Only Forward" It gives me great pleasure to announce the release of Perl 5.11.2. This is the third DEVELOPMENT release in the 5.11.x series leading to a stable release of Perl 5.12.0. You can find a list of high-profile changes in this release in the file "perl5112delta.pod" inside the distribution. You can download the 5.11.2 release from: http://search.cpan.org/~lbrocard/perl-5.11.2/ The release's SHA1 signatures are: 2988906609ab7eb00453615e420e47ec410e0077 perl-5.11.2.tar.gz 0014442fdd0492444e1102e1a80089b6a4649682 perl-5.11.2.tar.bz2

use.perl — Rakudo Perl 6 development release #23 (

Announce: Rakudo Perl 6 development release #23 ("Lisbon")On behalf of the Rakudo development team, I'm pleased to announce theNovember 2009 development release of Rakudo Perl #23 "Lisbon".Rakudo is an implementation of Perl 6 on the Parrot Virtual Machine(see http://www.parrot.org). The tarball for the November 2009 releaseis available from http://github.com/rakudo/rakudo/downloadsDue to the continued rapid pace of Rakudo development and the frequentaddition of new Perl 6 features and bugfixes, we recommend building Rakudofrom the latest source, available from the main repository at github.More details are available at http://rakudo.org/how-to-get-rakudo.Rakudo Perl follows a monthly release cycle, with each release codenamed after a Perl Mongers group. The November 2009 release is codenamed "Lisbon" for Lisbon.pm, who did a marvellous job arranging thisyear's YAPC::EU.Shortly after the October 2009 (#22) release, the Rakudo teambegan a new branch of Rakudo development ("ng") that refactorsthe grammar to much more closely align with STD.pm as well asupdate some core features that have been difficult to achievein the master branch [1, 2]. Most of our effort for the past monthhas been in this new branch, but as of the release date the newversion had not sufficiently progressed to be the release copy.We expect to have the new version in place in the December 2009 release.This release of Rakudo requires Parrot 1.8.0. One must stillperform "make install" in the Rakudo directory before the "perl6"executable will run anywhere other than the Rakudo build directory.For the latest information on building and using Rakudo Perl, see thereadme file section titled "Building and invoking Rakudo".Some of the specific changes and improvements occuring with thisrelease include:* Rakudo is now passing 32,753 spectests, an increase of 171 passing tests since the October 2009 release. With this release Rakudo is now passing 85.5% of the available spectest suite.* As mentioned above, most development effort for Rakudo in November has taken place in the "ng" branch, and will likely be reflected in the December 2009 release.* Rakudo now supports unpacking of arrays, hashes and objects in signatures* Rakudo has been updated to use Parrot's new internal calling conventions, resulting in a slight performance increase.The development team thanks all of our contributors and sponsors formaking Rakudo Perl possible. If you would like to contribute,see http://rakudo.org/how-to-help , ask on the perl6-compiler@perl.orgmailing list, or ask on IRC #perl6 on freenode.The next release of Rakudo (#24) is scheduled for December 17, 2009.A list of the other planned release dates and codenames for 2009 isavailable in the "docs/release_guide.pod" file. In general, Rakudodevelopment releases are scheduled to occur two days after eachParrot monthly release. Parrot releases the third Tuesday of each month.Have fun![1] http://use.perl.org/~pmichaud/journal/39779[2] http://use.perl.org/~pmichaud/journal/39874

Curtis Poe — How Not To Advertise Your Job Service

From the front page of getcoboljobs.com:

"I've used the service for some weeks now and even though I haven't had an offer from anyone yet, I'll keep using the service. Thanks for providing it." - Michael Mayes, Mainframe Developer Analyst

Yeah, that would make me rush right out and sign up! (Does just getting close to COBOL make you stupid?)

by Ovid at November 23, 2009 13:56 UTC

Curtis Poe — Hash Objects and AutoInflation

Until the issues are sorted out with blogs.perl.org, I'll still be posting here for a while (probably cross-posting for a while after, too).

Recently I've wanted a simple hash object:

my $object = Hash::Object->new({ id => $some_id, this => 'that', name => 'john', }); print $object->name; # john

The problem is that current implementations I've found have limitations. One uses AUTOLOAD so any method name becomes valid. Another doesn't do that, but it also does nothing else. I'd like to be able to autoinflate an object such that if I call an unknown method, it inflates the hash object into a full-blown object and redispatches to that object, only displaying a "Can't locate object method" error if the "inflated" version of that object doesn't provide the requested method.

my $object = Hash::Object->new( data => { id => $some_id, this => 'that', name => 'john', }, inflate => sub { Customer->find({ id => shift->id }) } ); print $object->name; # inflates to full Customer object and redispatches method print $object->fullname;

The reason for this is that we sometimes have very expensive objects, but we only need one or two pieces of data from them. It would be nice to return a "proxy" object, containing the data we probably want, but will transparently work like the real thing if needed.

This is on the CPAN, yes? What's it called?

by Ovid at November 23, 2009 11:40 UTC

November 22, 2009

David Golden — Extending the timeline for CPAN Meta Spec revisions

When I set out the time line for revisions to the CPAN Meta Spec, I was anticipating closing the public comments at the end of October and finalizing the spec at the end of November.

Fortunately for Perl, but unfortunately for me, our new pumpking, Jesse Vincent, announced on Halloween a Perl core feature freeze three weeks later (i.e. this weekend). Getting Module::Build and other bits of the toolchain ready for that has sucked up the time I was hoping to use to synthesize the Meta Spec patches for consistency and get the working group to discuss them. (I would expect that members of the working group are similarly distracted with a last-minute surge prior to the freeze.)

Therefore, I’m extending the timeline for the CPAN Meta Spec revisions by one month, and will be aiming to distribute a draft to the working group by the end of November and to finalize the new spec by Jan 1.

by dagolden at November 22, 2009 16:27 UTC

November 21, 2009

Yuval Kogman — Restricted Perl

zby's comments on my last post got me thinking. There are many features in Perl that we no longer use, or that are considered arcane or bad style, or even features we could simply live without. However, if they were removed, lots of code would break. So we keep those features, and we keep writing new code that uses them.

Suppose there was a pragma, similar to no indirect in that it restricts existing language features, and similar strict in that it lets you opt out of unrelated discouraged behaviors.

I think this would be an interesting baby step towards solving some of the problems that plague Perl code today:

Features that are often misused and need lots of critique.
Language features that are hard to change in the interpreter's implementation, limiting the revisions we can make to Perl 5.
Code that will be hard to translate to Perl 6, for no good reason.

On top of that one could implement several different defaults sets of feature-restricted Perl (sort of like Modern::Perl).

Instead of designing some sort of restricted subset of Perl 5 from the bottom up, several competing subsets could be developed organically, and if memory serves me right that is something we do quite well in our community =)

So anyway, what are some things that you could easily live without in Perl? What things would you be willing to sacrifice if it meant you could trade them off for other advantages? Which features would you rather disallow as part of a coding standard?

My take

Here are some ideas. They are split up into categories which are loosely related, but don't necessarily go hand in hand (some of them even contradict slightly).

They are all of a reasonable complexity to implement, either validating something or removing a language feature in a lexical scope.

It's important to remember that these can be opted out of selectively, when you need them, just like you can say no warnings 'uninitialized' when stringifying undef is something you intentionally allowed.

Restrictions that would facilitate static modularity

The first four restrictions make it possible to treat .pm files as standalone, cacheable compilation units. The fifth also allows for static linkage (no need to actually invoke import when evaluating a use statement), since the semantics of import are statically known. This could help alleviate startup time problems with Perl code, per complicit compilation unit (without needing to solve the problem as a whole by crippling the adhoc nature of Perl's compile time everywhere).

Disallow recursive require.
Disallow modification to a package's symbol table after its package declaration goes out of scope.
Restrict a file to to only one package (which must match the .pm file name).
Disallow modification of other packages other than the currently declared one.
Restrict the implementation of import to a statically known one.
Disallow access to external symbols that are not bound at compile time (e.g. variables from other packages, subroutines which weren't predeclared (fully qualified is OK).

Restrictions that allow easier encapsulation of side effects

These restrictions address pollution of state between unrelated bits of code that have interacting dynamic scopes.

Disallow modification of any global variables that control IO behavior, such as $/, $|, etc, as well as code that depends on them. IO::Handle would have to be augmented a bit to allow per handle equivalents, but it's most of the way there.
Disallow such variables completely, instead requiring a trusted wrapper for open that sets them at construction time and leaves them immutable thereafter.
Disallow /g matches on anything other than private lexicals (sets pos)
Disallow $SIG{__WARN__}, $SIG{__DIE__}, and $^S
Disallow eval (instead, use trusted code that gets local $@ right)
Disallow use of global variables altogether. For instance, instead of $! you'd rely on autodie, for @ARGV handling you'd use MooseX::Getopt or App::Cmd.
Disallow mutation through references (only private lexical variables can be modified directly, and complex data structures are therefore immutable after being constructed). This has far reaching implications for object encapsulation, too.

Restrictions that would encourage immutable data.

These restrictions alleviate some of the mutation centric limitations of the SV structure, that make lightweight concurrency impossible without protecting every variable access with a mutex. This would also allow aggressive COW.

Only allow assignment to a variable at its declaration site. This only applies to lexicals.
Allow only a single assignment to an SV (by reference or directly. Once an SV is given a value it becomes readonly)
Disallow assignment modification of external variables (non lexicals, and closure captures). This is a weaker guarantee than the previous one (which is also much harder to enforce), but with similar implications (all assignment is guaranteed to have side effects that outlive its lexical scope)

Since many of the string operations in Perl are mutating, purely functional variants should be introduced (most likely as wrappers).

Implicit mutations (such as the upgrading of an SV due to numification) typically results in a copy, so multithreaded access to immutable SVs could either pessimize the caching or just use a spinlock on upgrades.

Restrictions that would facilitate functional programming optimizations

These restrictions would allow representing simplified optrees in more advanced intermediate forms, allowing for interesting optimization transformations.

Disallow void context expressions
...except for variable declarations (with the afore mentioned single use restrictions, this effectively makes every my $x = ... into a let style binding)
Allow only a single compound statement per subroutine, apart from let bindings (that evaluates to the return value). This special cases if blocks to be treated as a compound statement due to the way implicit return values work in Perl.
Disallow opcodes with non local side effects (including calls to non-verified subroutines) for purely functional code.

This is perhaps the most limiting set of restrictions. This essentially lets you embed lambda calculus type ASTs natively in Perl. Alternative representations for this subset of Perl could allow lisp style macros and other interesting compile time transformations, without the difficulty of making that alternative AST feature complete for all of Perl's semantics.

Restrictions that facilitate static binding of OO code

Perl's OO is always late bound, but most OO systems can actually be described statically. These restrictions would allow you to opt in for static binding of OO dispatch for a given hierarchy, in specific lexical scopes. This is a little more complicated than just lexical restrictions on features, since metadata about the classes must be recorded as well.

Only allow blessing into a class derived from the current package
Enforce my Class $var, including static validation of method calls
Disallow introduction of additional classes at runtime (per class hierarchy or alltogether)
Based on the previous two restrictions, validate method call sites on typed variable invocants as static subroutine calls (with several target routines, instead of one)
Similar the immutable references restriction above, disallow dereferencing of any blessed reference whose class is not derived from the current package.

Restrictions that are easy to opt in to in most code (opting out only as necessary)

These features are subject to lots of criticism, and their usage tends to be discouraged. They're still useful, but in an ideal world they would probably be implemented as CPAN modules.

Disallow formats
Disallow $[
Disallow tying and usage of tied variables
Disallow overloading (declaration of overloads, as well as their use)

A note about implementation

Most of these features can be implemented in terms of opcheck functions possibly coupled scrubbing triggered by and end of scope hook. Some of them are static checks at use time. A few others require more drastic measures. For related modules see indirect, Safe, Sys::Protect, and Devel::TypeCheck to name but a few

I also see a niche for modules that implement alternatives to built in features, disabling the core feature and providing a better alternative that replaces it instead of coexisting with it. This is the next step in exploratory language evolution as led by Devel::Declare.

The difficulty of modernizing Perl 5's internals is the overwhelming amount of orthogonal concerns whenever you try to implement something. Instead of trying to take care of these problems we could make it possible for the user to promise they won't be an issue. It's not ideal, but it's better than nothing at all.

The distant future

If this sort of opt-out framework turns out to be successful, there's no reason why use 5.20.0 couldn't disable some of the more regrettable features by default, so that you have to explicitly ask for them instead. This effectively makes Perl's cost model per-per-use, instead of always pay.

This would also increase the likelihood that people stop using such features in new code, and therefore the decision making aspects of the feature deprecation process would be easier to reason about.

Secondly, and perhaps more importantly, it would be possible to try for alternative implementations of Perl 5 with shorter termed deliverables.

Compiling a restricted subset of Perl to other languages (for instance client side JavaScript, different bytecodes, adding JIT support, etc) is a much easier task than implementing the language as a whole. If more feature restricted Perl code would be written and released on the CPAN, investments in such projects would be able to produce useful results sooner, and have clearer indications of progress.

by nothingmuch (nothingmuch@woobling.org) at November 21, 2009 21:47 UTC

Leon Brocard — Perl 5.11.2

The streets were pretty quiet, which was nice. They're always quiet here
at that time: you have to be wearing a black jacket to be out on the
streets between seven and nine in the evening, and not many people in the
area have black jackets. It's just one of those things. I currently live
in Colour Neighbourhood, which is for people who are heavily into colour.
All the streets and buildings are set for instant colourmatch: as you
walk down the road they change hue to offset whatever you're wearing.
When the streets are busy it's kind of intense, and anyone prone to
epileptic seizures isn't allowed to live in the Neighbourhood, however
much they're into colour.
- Michael Marshall Smith, "Only Forward"

It gives me great pleasure to announce the release of Perl 5.11.2.

This is the third DEVELOPMENT release in the 5.11.x series leading to a stable release of Perl 5.12.0. You can find a list of high-profile changes in this release in the file "perl5112delta.pod" inside the distribution.

You can download the 5.11.2 release from:

http://search.cpan.org/~lbrocard/perl-5.11.2/

The release's SHA1 signatures are:

2988906609ab7eb00453615e420e47ec410e0077 perl-5.11.2.tar.gz

0014442fdd0492444e1102e1a80089b6a4649682 perl-5.11.2.tar.bz2

We welcome your feedback on this release. If you discover issues with Perl 5.11.2, please use the 'perlbug' tool included in this distribution to report them. If Perl 5.11.2 works well for you, please use the 'perlthanks' tool included with this distribution to tell the all-volunteer development team how much you appreciate their work.

If you write software in Perl, it is particularly important that you test your software against development releases. While we strive to maintain source compatibility with prior stable versions of Perl wherever possible, it is always possible that a well-intentioned change can have unexpected consequences. If you spot a change in a development version which breaks your code, it's much more likely that we will be able to fix it before the next stable release. If you only test your code against stable releases of Perl, it may not be possible to undo a backwards-incompatible change which breaks your code.

Notable changes in this release:

It is now possible to overload the C operator
Extension modules can now cleanly hook into the Perl parser to define new kinds of keyword-headed expression and compound statement
The lowest layers of the lexer and parts of the pad system now have C APIs available to XS extensions
Use of C<:=> to mean an empty attribute list is now deprecated

Perl 5.11.2 represents approximately 3 weeks development since Perl 5.11.1 and contains 29,992 lines of changes across 458 files from 38 authors and committers:

Abhijit Menon-Sen, Abigail, Ben Morrow, Bo Borgerson, Brad Gilbert, Bram, Chris Williams, Craig A. Berry, Daniel Frederick Crisman, Dave Rolsky, David E. Wheeler, David Golden, Eric Brine, Father Chrysostomos, Frank Wiegand, Gerard Goossen, Gisle Aas, Graham Barr, Harmen, H.Merijn Brand, Jan Dubois, Jerry D. Hedden, Jesse Vincent, Karl Williamson, Kevin Ryde, Leon Brocard, Nicholas Clark, Paul Marquess, Philippe Bruhat, Rafael Garcia-Suarez, Sisyphus, Steffen Mueller, Steve Hay, Steve Peters, Vincent Pit, Yuval Kogman, Yves Orton, and Zefram.

Many of the changes included in this version originated in the CPAN modules included in Perl's core. We're grateful to the entire CPAN community for helping Perl to flourish.

Jesse Vincent or a delegate will release Perl 5.11.3 on December 20, 2009. Ricardo Signes will release Perl 5.11.4 on January 20, 2010. Steve Hay will release Perl 5.11.5 on February 20, 2010.

Regards, Léon

by acme at November 21, 2009 08:56 UTC

November 20, 2009

Gabor Szabo — FOSDEM application dead-line in 2 days!

As mentioned earlier in ( Perl on FOSDEM ) I would like to have some interesting Perl content on FOSDEM. I have submitted a request for a Perl stand and should receive a reply in 10 days but there are a few other interesting possibilities. Most notably there ara talks in the Main track and there are Lightning talks thought on FOSDEM they take 15 minutes.

The dead-line to submit a proposal to the Main track is on 22 November, in 2 days from now. It is time to fill the form if you'd like to talk about your stuff in-front of the FOSDEM attendees.

by Gabor Szabo at November 20, 2009 20:12 UTC

Perl Buzz — Perl gets modern community blogging platform at blogs.perl.org

In a move of unparalleled beauty, Dave Cross and Aaron Crane have announced blogs.perl.org, a modern blogging platform for the Perl community.

Go look. Enjoy the non-ugly color scheme. Marvel at the code syntax highlighting and ability to embed images. Navigate posts using thoughtful categories.

A million thanks to Dave and Aaron for putting this together, and to Six Apart for the design. Links to feeds will be going up here on Perlbuzz as soon as I have time.

by Andy Lester at November 20, 2009 17:11 UTC

Dave Cross — My Sekrit Project

It's quite possible that over the last year or so you've heard me muttering about a secret project that I've been working on. Well, this morning I can finally reveal what it is.

Do you ever wish that the Perl community had a centralised blogging site where anyone could set up a blog for free? Something, perhaps, that allows you to use modern blogging features like images in your posts or tags.

If you've ever wished for something like that, then can I suggest that you take a look at blogs.perl.org. I think it might be just what you're looking for.

The site is built using Movable Type and we were lucky enough to get some people from Six Apart to build it for us. I'd like to particularly thank Steve Cook of their professional services division who has done a lot of the actual work. Thanks also go to Jeremy King who designed the site and David Jacobs who is their manager and allowed them to work on the project on company time.

From the Perl community I need to thanks Aaron Crane who is hosting the site and Curtis Poe and Aristotle Pagaltzis who have both been involved in planning this project. Many other people have given invaluable advice or have been early testers of the site. Thanks to everyone who has been involved.

All that remains now is for you to try it out. You should regard it as a beta test version, so some of you will find problems. When you do, please just let me know and we'll fix them.

I hope you enjoy the site and find it useful.

Update: Yes, there seems to be one quite glaring problem with it. That's the web server errors that are generated occasionally when someone tries to log in (or out). Seems to be a resource allocation issue with the server. We're looking into it. Please bear with me.

by Dave Cross at November 20, 2009 09:50 UTC

Curtis Poe — blogs.perl.org

It's official! We've a new blogging site available for the Perl community. Dave Cross has now made the announcement. I've a blog post over there explaining a few things about it. It's still alpha and there will be bugs, so keep that in mind. That being said, play around and have fun.

by Ovid at November 20, 2009 09:03 UTC

November 19, 2009

Dave Cross — London Perl Workshop

The London Perl Workshop is getting closer. It's on Saturday 5th December at the University of Westminster's Cavendish Street Campus (the same place it's been for the last few years).

The schedule was announced a couple of days ago and, at always, it looks like a great line-up. I'm particularly pleased to see that Tatsuhiko Miyagawa will be there talking about Plack and PSGI? (although, slightly less pleased to see that it clashes with a presentation that I'm involved with).

I'm going to be involved in a few things at the workshop. They've invited me to give the keynote again, so I'm giving a talk called The "M" Word?. Later in the day I'm giving a two hour tutorial called The Professional Programmer which will discuss some of the practicalities of working in the IT industry (this is largely aimed at the university's students but others will also be welcome). Finally, towards the end of the day, I'll be speaking alongside Matt Trout, Curtis Poe and Ed Freyfogle on a panel called Skills in the Workplace.

The LPW is always a great day. I hope you'll come along and join in. Oh, and even if you can't make it you can pretend you were there by buying one of the workshop t-shirts.

by Dave Cross at November 19, 2009 13:36 UTC

Yuval Kogman — Functional programming and unreasonable expectations

<record type="broken">I'm a big fan of purely functional programming</record>.

Another reason I like it so much is that purely functional software tends to be more reliable. Joe Armstrong of Erlang fame makes that point in an excellent talk much better than I could ever hope to.

However, one aspect he doesn't really highlight is that reliability is not only good for keeping your system running, it also makes it easier to program.

When a function is pure it is guaranteed to be isolated from other parts of the program. This separation is makes it much easier to change the code in one place without breaking anything unrelated.

Embracing this style of programming has had one huge drawback though: it utterly ruined my expectations of non functional code.

In imperative languages it's all too easy to add unstated assumptions about global state. When violated, these assumptions then manifest in very ugly and surprising ways (typically data corruption).

A good example is reentrancy (or rather the lack thereof) in old style C code. Reentrant code can be freely used in multiple threads, from inside signal handlers, etc. Conversely, non-reentrant routines may only be executed once at a given point in time. Lack of foresight in early C code meant that lots of code had to be converted to be reentrant later on. Since unstated assumptions are by definition hidden this can be a difficult and error prone task.

The specific disappointment that triggered this post is Perl's regular expression engine.

Let's say we're parsing some digits from a string and we want to create a SomeObject with those digits. Easy peasy:

$string =~ m/(\d+)/;
push @results, SomeObject->new( value => $1 );

Encapsulating that match into a resuable regex is a little harder though. Where does the post processing code go? Which capture variable does it use? Isolation would have been nice. The following example might work, but it's totally wrong:

my $match_digits = qr/(\d+)/;

my $other_match = qr{ ... $match_digits ... }x;

$string =~ $other_match;
push @results, SomeObject->new( value => $1 ); # FIXME makes no sense

Fortunately Perl's regex engine has a pretty awesome feature that let you run code during a match. This is very useful for constructing data from intermittent match results without having to think about nested captures, especially since the $^N variable conveniently contains the result of the last capture.

Not worrying about nested captures is important when you're combining arbitrary patterns into larger ones. There's no reliable way to know where the capture result ends up so it's easiest to process it as soon as it's available.

qr{
    (\d+) # match some digits

    (?{
        # use the previous capture to produce a more useful result
        my $obj = SomeObject->new( value => $^N );

        # local allows backtracking to undo the effects of this block
        # this would have been much simpler if there was a purely
        # functional way to accumulate arbitrary values from regexes
        local @results = @results, $obj;
    })
}x;

Even though this is pretty finicky it still goes a long way. With this feature you can create regexes that also encapsulate the necessary post processing, while still remaining reusable.

Here is a hypothetical the definition of SomeObject:

package SomeObject;
use Moose;

has value => (
    isa => "Int",
    is  => "ro",
);

Constructing SomeObject is a purely functional operation: it has no side effects, and only returns a new object.

The only problem is that the above code is totally broken. It works, but only some of the time. The breakage is pretty random.

Did you spot the bug yet? No? But it's oh so obvious! Look inside Moose::Util::TypeConstraints::OptimizedConstraints and you will find the offending code:

sub Int { defined($_[0]) && !ref($_[0]) && $_[0] =~ /^-?[0-9]+$/ }

The constructor Moose generated for SomeObject is in fact not purely functional at all; though seemingly well behaved, in addition to returning an object it also the side effect of shitting all over the regexp engine's internal data structures, causing random values to be occasionally assigned to $^N (but only if invoked from inside a (?{ }) block during a match). You can probably imagine what a great time I had finding that bug.

What makes me sad is that the Int validation routine appears purely functional. It takes a value and then without modifying anything merely checks that it's defined, that it's not a reference, and that its stringified form contains only digits, returning a truth value as a result. All of the inputs and all of the outputs are clear, and therefore it seems only logical that this should be freely reusable.

When I came crying to #p5p it turned out that this is actually a known issue. I guess I simply shouldn't have expected the regexp engine to do such things, after all it has a very long history and these sorts of problems are somewhat typical of C code.

If the regexp engine was reentrant the what I tried to do would have just worked. Reentrancy guarantees one level of arbitrary combinations of code (the bit of reentrant code can be arbitrarily combined with itself). Unfortunately it seems very few people are actually in a position to fix it.

Purely functional code goes one step further. You can reliably mix and match any bit of code with any other bit of code, combining them in new ways, never having to expect failure. The price you have to pay is moving many more parameters around, but this is exactly what is necessary to make the boundaries well defined: all interaction between components is explicit.

When old code gets reused it will inevitably get prodded in ways that the original author did not think of. Functional code has a much better chance of not needing to be reimplemented, because the implementation is kept isolated from the usage context.

In short, every time you write dysfunctional code god kills a code reuse. Please, think of the code reuse!

by nothingmuch (nothingmuch@woobling.org) at November 19, 2009 02:01 UTC

Adam Kennedy — The unfortunate demise of the plan

Over the last year, I've seen a disturbing trend on the part of some of Perl testing thought-leaders (ugh... what's a better word for this... Testerati?) to demonise the testing plan.

I thought I'd take a moment to come to the defense of a plan (and why I don't like done_testing() except in a very specific situation).

When I wrote my first CPAN module, and as a result discovered Perl testing, the thing that impressed me most of all was not the syntax, or the fact that tests were just programs (both of which I like).

It was the testing plan that I found to be a stroke of brilliance.

Even though it can be a little annoying to maintain the number (although updating this number sounds like a good feature for an editor to implement) the plan catches two major problems simultaneously.

1. It detects ending, aborting, dieing and crashing tests, even if the crash is instantaneous with no evidence of it happening.

2. It detects running too many tests, too few tests, bad skip blocks, and other soft failures, preventing the need to write tons of explicit list length tests.

It also prevents the need to say when you are done, reducing the size of your test code.

For example, look at the following two code blocks, both of which are equivalent.

This is the new no plan done_testing way.
use Test::More;

my @list = list(); is( scalar(@list), 2, 'Found 2 members' );

foreach ( @list ) { ok( $_, 'List member ok' ); }

done_testing();
And this is the original way.
use Test::More tests => 2;

foreach ( list() ) { ok( $_, 'List member ok' ); }

The difference is stark. In the original, I know I'm doing 2 tests so I don't need to test for the size of the list from list(). If list returns 3 things, or 1 thing, the test will fail.

There is one clear use case for done_testing (and btw, who decided that done_testing was a great name for a function, what ELSE are you going to be done with? Surely just done() is good enough) and that is when the number of elements returned by list() is unknowable.

Even in this case, I still prefer explicitly telling Test::More that I have NFI how many tests will run. That at least gives it some certainty that I have actually started testing.

The new paradigm of not using a plan is far far messier, for no obvious benefit I can see and open up new areas with bugs can creep in far too easily.

by Alias at November 19, 2009 00:07 UTC

November 18, 2009

Gabor Szabo — Padre supporting technologies in addition to Perl

Looking at the results of the 2nd editor poll it seems that most of what perl developers do is web programming. This is probably just a side effect of both the very limited reach of the poll - mostly people reading my blog in one way or another - and the fact that not many other technologies were listed.

My hope with the poll was to create a short lits of technologies we should focus on when thinking about improving Padre but after seeing the results and reading the comments, the poll was not really asking the right question.

Of course we would like to provide syntax highlighting to many languages and actually Padre already does in some level as we get it "free" from using Scintilla. We also want to make it easy to write your own syntax highlighter.

This is not the level of support I am thinking of. Hence the question of popularity of other technologies among Perl users is less important. What would be more important to know is what are the technologies people use that they need help with? For example I am not too familiar with CSS so when I write a CSS file it would be great to be able to fetch the list of attributes and possible values and even have some explanation next to it. Same with Javascript an JQuery and with PDL, the Perl Data Language

Just as we provide context sensitive help for both Perl 5 and Perl 6 and soon for PIR files as well, it would be great to have such help for langugages we use less often.

If you are interested in helping in either of these extension, the Padre web site have directions how to reach us.

by Gabor Szabo at November 18, 2009 23:14 UTC

David Golden — Perl 6 is Perl 5++

I’m amused reading masak and mst take stock of the Perl 5 vs. Perl 6 issue/debate/brouhaha/whatever. I think mst gets it slightly closer to right (thanks to his refreshing directness), but both miss what I think is an obvious analogy:

Perl 6 is to Perl 5 as C++ is to C

Mostly familiar syntax? Check. Easy to move from one to the other and back again? Check. C++ didn’t turn out to be the successor to C any more than I think Perl 6 is the successor to Perl 5.

If Larry had just called it Perl 5++, everyone would have gotten the joke up front and the whole successor meme would never have gotten in the way of things.

by dagolden at November 18, 2009 03:01 UTC

November 17, 2009

Perl Buzz — Perlbuzz news roundup for 2009-11-17

These links are collected from the Perlbuzz Twitter feed. If you have suggestions for news bits, please mail me at andy@perlbuzz.com.

Pod::Simple 3.09 hits the CPAN (justatheory.com)
Strawberry Perl and the nightmare of installing Padre (use.perl.org)
A busy month for masak in Perl 6 (use.perl.org)
A productive week in Rakudo-land (use.perl.org)
Perl one-liners explained part III: Calculations (catonmat.net)
Handy one-liner to lowercase all filenames in a directory: ls | perl -lne'$x=lc;print qq{mv $_ $x}' | sh -x
Use CPAN's toolchain to improve your code (use.perl.org)
Future Perl snuck up on me (headrattle.blogspot.com)
Find the stupid bug in my progress indicator: say "$n so far" if ( $n % 100000 )";
I maeked u a shell: lolshell, written in Perl 6 (theintersect.org)
The horrible bug your command line Perl program probably has (perlbuzz.com)
Frozen Perl 2010 looking for speakers (news.perlfoundation.org)
apache2rest is a new framework for REST APIs under mod_perl2 (code.google.com)
Putting MySQL on a ramdisk to speed up tests (use.perl.org)
Generating Feedburner graphs (catonmat.net)

by Andy Lester at November 17, 2009 20:34 UTC

Yuval Kogman — Scoping of the current package

The one thing that I almost always notice when playing around in non Perl languages is how well Perl handles scoping.

There is one place in which Perl got it totally wrong though.

The value of the current package is lexically scoped:

package Foo;

{
    package Bar;
    say __PACKAGE__; # prints Bar
}

say __PACKAGE__; # prints Foo

However, the notion of the current package during compilation is dynamically scoped, even between files:

# Foo.pm:

package Foo;
use Bar;

# Bar.pm:

say __PACKAGE__; # prints Foo

In other words, if you don't declare a package at the top of the .pm file before doing anything, you are risking polluting the namespace of the module that called you. What's worse is that it can be unpredictable, only the first module to load Bar will leak into Bar.pm, so this could amount to serious debugging headaches.

Consider the following:

# Foo.pm:

package Foo;
use Moose;

use Bar;

sub foo { ... }

Now suppose a subsequent version of Bar is rewritten using MooseX::Declare:

use MooseX::Declare;

class Bar {
    ...
}

Guess which package the class keyword was exported to?

But maybe Bar was tidy and used namespace::clean; instead of making $foo_object->class suddenly start working, $foo_object->meta would suddenly stop working. And all this without a single change to Foo.pm.

Now imagine what would happen if Foo did require Bar instead of use…

Anyway, I think the point was made, always declare your package upfront or you risk pooping on your caller. Anything you do before an explicit package declaration is in no man's land.

I'm pretty sure a future version of MooseX::Declare will contain a specific workaround for this, but I still think it's a good habit to always start every file with a package declaration, even if it's made redundant a few lines down.

by nothingmuch (nothingmuch@woobling.org) at November 17, 2009 12:04 UTC

Curtis Poe — Why no plan for nested TAP?

At least one person has objected quite strongly to my patch for Test::Builder which adds an implicit done_testing() to subtests with no plan. I was kind of surprised with the vehemence of the discussion, so I want to clear this up.Consider the following test subtests:

subtest 'with plan' => sub { plan => 2; is $this, $that, 'this is that'; is $foo, $bar, 'foo is bar'; }; subtest 'without plan' => sub { is $this, $that, 'this is that'; is $foo, $bar, 'foo is bar'; };

Now imagine that this is a test suite you're currently working on and you keep adding tests to each. It will be annoying to keep updating the plan and sometimes get spurious test failures when your code is fine. The "without plan" subtest is much easier to use. If you want plans in your subtests, fine! Use them. There's nothing stopping you, but if you forget, I've added a safety net. Don't sweat it and have fun.

Unlike with the overall test program, you know exactly when that subtest ends. The plan buys you virtually nothing. In a subtest, instead of an annoyance which actually helps, it's an annoyance which hinders. Heck, for my top level tests, I'm thinking about writing a vim macro which automatically inserts this:

use Test::Most; END { done_testing() }

Plans are almost not needed any more (this isn't entirely true, but for subtests, why bother?). I'm hard-pressed to believe that they now add enough value to overcome the annoyance of using them.

by Ovid at November 17, 2009 11:24 UTC

Kirrily Robert — Defining openness: open source, open data, open APIs, open communities, and more

A couple of weeks ago I was in Florida giving a talk on Open Source, Open Data in which I tried to describe what open data was. In preparation for that talk, I went looking for definitions of “open” as it applied to either field, and found myself drawing on the following documents:

The four software freedoms
This definition of free cultural works, which is based on the four software freedoms
The OSI’s Open Source Definition, and
The Open Knowledge Definition which seems to be largely based on the open source definition.

In the end I structured my talk around the four freedoms because, let’s face it, they’re snappier — but this is all just background.

In any case, I’ve started to collect articles that talk about openness, and in the last couple of weeks I’ve seen a burst of them. Perhaps I’m just hyper-aware at the moment, or maybe we’re going through a phase of introspection about the whole idea. In any case, I thought I’d post a round-up of recent posts on describing, defining, and measuring openness for software, data, APIs, and the communities and processes that surround them.

From the OpenGeoData blog, The Cake Test for determining whether geodata is truly open:

What is the Cake Test? Easy: A set of geodata, or a map, is libre only if somebody can give you a cake with that map on top, as a present.

Cakes are empirical proof that most the data in most SDIs cannot be used freely, because of the licensing terms of the SDIs. And they are an empirical proof that attendants to the latest spanish SDI conference could taste themselves.

Louis Gray, The blurry picture of open APIs, standards, data ownership:

Following the much-discussed news of Facebook debuting its “Open Graph API” on Wednesday, I traded a few e-mails with a few respected tech-minded developers, and found, unsurprisingly, that not everyone believes Facebook is fully “open”. In fact, it’s believed some companies are playing fast and loose with terms that should be better understood.

To quickly summarize the discussion, there are essentially three major ways to bucket “open” APIs…

open access

APIs that leverage open standards

open standard APIs like OpenSocial, OpenID, PubSubHubbub, AtomPub and others

In short, you have “open but we control the process”, “standing on the backs of open” and “truly open”, if this opinion is accepted. The developer adds, “In short, the first two mean nothing, the last one actually fits the dictionary definition. The Web is built on open standard APIs and protocols.”

Simon Phipps, A software freedom scorecard (video from a talk at the South Tyrol Free Software Conference last week) describes why an OSI-approved license isn’t enough to guarantee software freedom, and describes a number of indicators you can use to quantify the freedom of a given piece of software.

Matt Zimmerman, Open vs. open vs. open: a model for public collaboration describes three axes of openness for open source projects:

Open (available)

In order for your project to be open to anyone, they need to be able to find out about it and experience it for themselves.

Open (transparent)

The next flavor of openness is about transparency. This means enabling people to find out about what is happening in your project.

Open (for participation)

The third type of openness is open participation. This builds on transparency by creating a feedback loop: people observe activity in your project, react to it, and then actually change its course.

Finally, Melissa Draper posted about Open community, pointing out external commentary and even criticism is a natural part of having an open (transparent – to use mdz’s term) community.

(Note: Some blockquoted sections above have been edited for length.)

Got any other good links — especially recent ones — on the topic? I’m sure I’ve missed some.

by Skud at November 17, 2009 06:26 UTC

Oct	NOV	Jan
	28
2008	2009	2010

Colophon

Perl blogs

Planetarium