Editor's note: Perl guru Tom Christiansen created and maintains a list of 44 recipes for working with Unicode in Perl 5. Perl.com is pleased to serialize this list over the coming weeks.
℞ 44: PROGRAM: Demo of Unicode collation and printing
The past several weeks of Unicode recipes have explained how Unicode works and shown how to use it in your programs. If you've gone through those recipes, you now understand more about Unicode than most programmers.
How about putting everything together?
Here's a full program that combines locale-sensitive sorting, Unicode casing, and column-aware padding for text in which some characters occupy zero or two columns rather than exactly one. When run, the program produces this nicely aligned output (though the quality of the alignment depends on the quality of your Unicode font, of course):
Crème Brûlée....... €2.00
Éclair............. €1.60
Fideuà............. €4.20
Hamburger.......... €6.00
Jamón Serrano...... €4.45
Linguiça........... €7.00
Pâté............... €4.15
Pears.............. €2.00
Pêches............. €2.25
Smørbrød........... €5.75
Spätzle............ €5.50
Xoriço............. €3.00
Γύρος.............. €6.50
막걸리............. €4.00
おもち............. ?2.65
お好み焼き......... ?8.00
シュークリーム..... ?1.85
寿司............... ?9.99
包子............... ?7.50
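The order of that listing is not what a plain code point comparison would give: Pâté comes before Pears, which comes before Pêches. As a minimal sketch of why, the snippet below compares Perl's built-in sort with the default collator from the core Unicode::Collate module. It deliberately leaves out the locale tailoring that the full program gets from Unicode::Collate::Locale, and the helper names `codepoint_order` and `collated_order` are made up for this illustration.

```perl
use v5.14;
use utf8;
use strict;
use warnings;
use open qw(:std :utf8);
use Unicode::Collate;    # core module; no locale tailoring in this sketch

# Hypothetical helper: built-in sort compares strings code point by code point.
sub codepoint_order {
    my @sorted = sort @_;
    return @sorted;
}

# Hypothetical helper: the default Unicode Collation Algorithm compares
# base letters first and treats accents as a secondary difference.
sub collated_order {
    my @sorted = Unicode::Collate->new->sort(@_);
    return @sorted;
}

my @items = ("pêches", "pears", "pâté");

say "code point: ", join(" ", codepoint_order(@items));
say "collated:   ", join(" ", collated_order(@items));
```

With these three items, the code point sort puts "pears" first because a plain "e" has a lower code point than "â" or "ê", while the collator yields pâté, pears, pêches, matching the menu.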
Here's the full program, tested on v5.14:
#!/usr/bin/env perl
# umenu - demo sorting and printing of Unicode food
#
# (obligatory and increasingly long preamble)
#
use utf8;
use v5.14; # for locale sorting and unicode_strings
use strict;
use warnings;
use warnings qw(FATAL utf8); # fatalize encoding faults
use open qw(:std :utf8); # undeclared streams in UTF-8
use charnames qw(:full :short); # unneeded in v5.16
# std modules
use Unicode::Normalize; # std perl distro as of v5.8
use List::Util qw(max); # std perl distro as of v5.10
use Unicode::Collate::Locale; # std perl distro as of v5.14
# cpan modules
use Unicode::GCString; # from CPAN
# forward defs
sub pad($$$);
sub colwidth(_);
sub entitle(_);
my %price = (
"γύρος" => 6.50, # gyros, Greek
"pears" => 2.00, # like um, pears
"linguiça" => 7.00, # spicy sausage, Portuguese
"xoriço" => 3.00, # chorizo sausage, Catalan
"hamburger" => 6.00, # burgermeister meisterburger
"éclair" => 1.60, # dessert, French
"smørbrød" => 5.75, # sandwiches, Norwegian
"spätzle" => 5.50, # Bayerisch noodles, little sparrows
"包子" => 7.50, # bao1 zi5, steamed pork buns, Mandarin
"jamón serrano" => 4.45, # country ham, Spanish
"pêches" => 2.25, # peaches, French
"シュークリーム" => 1.85, # cream-filled pastry like éclair, Japanese
"막걸리" => 4.00, # makgeolli, Korean rice wine
"寿司" => 9.99, # sushi, Japanese
"おもち" => 2.65, # omochi, rice cakes, Japanese
"crème brûlée" => 2.00, # tasty broiled cream, French
"fideuà" => 4.20, # more noodles, Valencian (Catalan=fideuada)
"pâté" => 4.15, # gooseliver paste, French
"お好み焼き" => 8.00, # okonomiyaki, Japanese
);
# find the widest allowed width for the name column
my $width = 5 + max map { colwidth } keys %price;
# So the Asian stuff comes out in an order that someone
# who reads those scripts won't freak out over; the
# CJK stuff will be in JIS X 0208 order that way.
my $coll = Unicode::Collate::Locale->new( locale => "ja" );
for my $item ($coll->sort(keys %price)) {
print pad(entitle($item), $width, ".");
printf " €%.2f\n", $price{$item};
}
sub pad($$$) {
my($str, $width, $padchar) = @_;
return $str . ($padchar x ($width - colwidth($str)));
}
sub colwidth(_) {
my($str) = @_;
return Unicode::GCString->new($str)->columns;
}
sub entitle(_) {
my($str) = @_;
$str =~ s{ (?=\pL)(\S) (\S*) }
{ ucfirst($1) . lc($2) }xge;
return $str;
}
Simple enough, isn't it? Put together, everything just works nicely.
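To see why the program pads by columns rather than by `length`, here is a rough sketch that uses only core regex properties. It rests on a simplifying assumption: nonspacing and enclosing marks take zero columns, East Asian Wide and Fullwidth characters take two, and everything else takes one. The helper name `approx_columns` is hypothetical; this is an approximation for illustration, not a replacement for what Unicode::GCString computes.

```perl
use v5.14;
use utf8;
use strict;
use warnings;

# Hypothetical helper, not part of the program above: a rough column count
# under the simplifying assumption described in the lead-in.
sub approx_columns {
    my ($str) = @_;
    my $cols = 0;
    for my $ch (split //, $str) {
        if ($ch =~ /\p{Mn}|\p{Me}/) {
            next;          # combining marks occupy no column of their own
        }
        elsif ($ch =~ /\p{East_Asian_Width=Wide}|\p{East_Asian_Width=Fullwidth}/) {
            $cols += 2;    # wide characters occupy two columns
        }
        else {
            $cols += 1;    # everything else is counted as one column
        }
    }
    return $cols;
}

# "e\x{301}clair" spells "éclair" with a separate combining acute accent.
for my $s ("pears", "e\x{301}clair", "寿司") {
    printf "%d code points, about %d columns\n", length($s), approx_columns($s);
}
```

For the decomposed "éclair", `length` counts seven code points but only six columns are occupied, and the two-character "寿司" fills four columns. Padding with `length` would therefore misalign exactly those rows.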




