Editor's note: Perl guru Tom Christiansen created and maintains a list of 44 recipes for working with Unicode in Perl 5. Perl.com is pleased to serialize this list over the coming weeks.
℞ 34: Unicode column-width for printing
Perl's printf, sprintf, and format assume that every codepoint occupies exactly one print column, but many codepoints occupy 0 columns (combining marks, for example) or 2 (East Asian wide characters, for example). If you use any of these builtins to align text, Perl's idea of a string's width may not match the number of columns the string actually occupies when displayed.
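The mismatch can be seen with core modules alone. The following is a minimal sketch, not part of the original recipe; the variable names are illustrative. It formats the same word in composed (NFC) and decomposed (NFD) form with sprintf. Both results contain ten codepoints, but the NFD one includes a zero-width combining mark, so on most terminals its closing bar should land one column to the left.

```perl
use strict;
use warnings;
use utf8;                              # this source file contains literal UTF-8
use Unicode::Normalize qw(NFC NFD);    # core module

binmode(STDOUT, ":encoding(UTF-8)");

# The same word in composed (NFC) and decomposed (NFD) form.
my $nfc = NFC("crème");   # è is a single codepoint
my $nfd = NFD("crème");   # è is "e" followed by a combining grave accent

# sprintf pads by codepoint count, so both cells are 10 codepoints long.
my $nfc_cell = sprintf("%-10s", $nfc);
my $nfd_cell = sprintf("%-10s", $nfd);

# The NFD cell got one space fewer, because the combining mark was
# counted as if it took up a column of its own.
print "$nfc_cell|\n";
print "$nfd_cell|\n";
printf "codepoints: NFC=%d NFD=%d\n", length($nfc), length($nfd);
```

Counting codepoints with length() is exactly what causes the misalignment; a width-aware count is needed instead.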
The Unicode::GCString module's columns() method considers the width of each codepoint and returns the number of columns the string will occupy. Use this to determine the display width of a Unicode string.
To show that normalization makes no difference to the number of columns of a string, we print out both forms:
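# cpan -i Unicode::GCString
use v5.14;                             # enables say and strict
use utf8;                              # the word list below is literal UTF-8
use Unicode::GCString;
use Unicode::Normalize;

binmode(STDOUT, ":encoding(UTF-8)");   # emit UTF-8 rather than raw codepoints

my @words = qw/crème brûlée/;
# Expand each word into its composed (NFC) and decomposed (NFD) forms.
@words = map { NFC($_), NFD($_) } @words;

for my $str (@words) {
    my $gcs  = Unicode::GCString->new($str);
    my $cols = $gcs->columns;          # display width, not codepoint count
    my $pad  = " " x (10 - $cols);     # pad out to 10 columns
    say $str, $pad, " |";
}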
This generates the following output, which shows that the padding is correct regardless of normalization:
crème      |
crème      |
brûlée     |
brûlée     |



