PMotD: Text::Balanced

dnb — Fri, 31 Jul 2009 05:26:59 +0000

If you’ve read Jeffrey Friedl’s Mastering Regular Expressions¹, you’ll recall that one of the things that is a lot harder than it looks to get right with regexps is extracting the content of a delimited string, especially when there are distinct opening and closing delimiters².

Text::Balanced, originally by Damian Conway and now maintained by Adam Kennedy, makes short work of this extraction for:

strings delimited by the same single character
strings delimited by brackets of some sort (parens, square brackets, etc.)
strings delimited by XML or other kinds of tags
strings delimited by Perl quoting operators
code blocks

Each kind of delimited string has its own function, e.g. extract_bracketed(). For the most part, they all work like this:

my ($extracted,$remainder) = 
        extract_something($extract_from, $delimiter, $opt_prefix);

Here extract_something() stands for one of the extraction functions. It takes the string to extract the data from, the delimiter that surrounds the data in question, and an optional prefix. That last argument reveals a part of Text::Balanced that tends to confuse people, so we’ll look at it more closely in a second. In list context, the extraction functions return the first extracted piece of data and what was left over; in scalar context, they return just the extracted data (with, alas, another twist).
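As a concrete instance of that calling pattern, here is a minimal sketch using extract_bracketed() in list context. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical input: the bracketed data sits at the very start of the string.
my $text = '(a (nested) group) and the rest';

# List context: we get back the extracted piece and the leftover text,
# and $text itself is left unchanged.
my ($extracted, $remainder) = extract_bracketed($text, '()');

print "extracted: [$extracted]\n";   # extracted: [(a (nested) group)]
print "remainder: [$remainder]\n";   # remainder: [ and the rest]
```

Note that the nested parentheses are handled for you, which is exactly the part that is painful to get right with a hand-rolled regexp.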

There are two things about this module that tend to trip up newcomers:

By default (i.e. without the optional third argument), the extraction functions all expect the data they are going to extract to be found either right at the beginning of the string or right at the position in the string that the last extraction left off. If you expect it to skip over non-delimited data that isn’t just whitespace, you will have to provide that third $opt_prefix argument.
When called in a scalar context, the extraction functions eat the text. Extracted strings are removed from the input text. Fans of functional or immutable data structure programming will not be pleased.
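Both gotchas are easy to see in a short sketch. The strings and the prefix pattern below are assumptions chosen for illustration; the prefix is an ordinary regex describing what may be skipped before the opening delimiter.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed extract_delimited);

# Gotcha 1: by default the delimited data must start right away
# (only leading whitespace is skipped).
my $text = 'skip this (found it) tail';

my ($miss) = extract_bracketed($text, '()');
my $found  = (defined($miss) && length($miss)) ? $miss : 'nothing';
print "without prefix: $found\n";    # without prefix: nothing

# A prefix pattern lets the function skip leading non-delimited text.
my ($hit, $rest) = extract_bracketed($text, '()', qr/[^(]*/);
print "with prefix: $hit\n";         # with prefix: (found it)

# Gotcha 2: in scalar context the extracted string is removed from the input.
my $input  = q{'quoted' and more};
my $quoted = extract_delimited($input, q{'});
print "quoted: $quoted\n";           # quoted: 'quoted'
print "input is now: [$input]\n";    # input is now: [ and more]
```

If you need the original string afterwards, call the function in list context or work on a copy.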

If you do want to do split()-on-steroids kind of stuff, Text::Balanced also offers an extract_multiple() function that takes a list of extraction functions, each of which gets run over the string, and returns what they collectively find. Text::Balanced can also make Friedl proud: its gen_delimited_pat() function generates optimized regular expressions for delimited strings that you can use with a minimum of head scratching.
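A rough sketch of both features follows. The input strings are invented for illustration. The final true argument to extract_multiple() asks it to drop text that none of the extractors recognize, and gen_delimited_pat() is the function that builds the Friedl-style pattern (here for quoted strings).

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_multiple extract_bracketed gen_delimited_pat);

# Pull out every (...) or [...] group and discard whatever lies between them.
my $text   = 'a (b c) d [e f] g';
my @groups = extract_multiple(
    $text,
    [
        sub { extract_bracketed($_[0], '()') },
        sub { extract_bracketed($_[0], '[]') },
    ],
    undef,    # no limit on the number of fields
    1,        # skip text that no extractor matched
);
print join(' | ', @groups), "\n";     # (b c) | [e f]

# Build a regex that matches single- or double-quoted strings,
# then use it like any other pattern.
my $pat     = gen_delimited_pat(q{'"});
my $line    = q{say "hello world" then 'bye'};
my @strings = $line =~ /($pat)/g;
print join(' | ', @strings), "\n";    # "hello world" | 'bye'
```

Leaving off the last argument to extract_multiple() would also return the unmatched text as fields, which is closer to what split() does.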

If you need something to extract data from between matched delimiters of almost any sort, this module will be spot on for you and will do its job like a laser beam.

¹ And if you haven’t, you really should. It is a bit light on plot and character development, but it is definitely required reading.
² The other being validating email addresses, but that’s a post for another day.

PMotD: Devel::Deprecate

dnb — Tue, 28 Jul 2009 03:51:29 +0000

Today’s interesting Perl module of the Day: Devel::Deprecate by Curtis “Ovid” Poe.

Back in 2000, I started teaching a class called “Perl Saves the Day: Writing Small Perl Programs to Get Out of Big SysAdmin Pinches” which was essentially a class about Perl hacks and how to use them responsibly in System Administration. One of my slides was “How to Get Rid of Hacks,” which started with:

It can be hard. You may have to wait for the rewrite.
Step one: remember you did it.

Go back and document the code after the crisis.

Send yourself mail.

Set up an AT job.

Put it in your calendar.

Now there’s an even cooler way: Devel::Deprecate. Devel::Deprecate provides a deprecate() function that lets you write code like this (to quote the doc):

deprecate (
    reason => 'Please use the set_name() method for setting names',
    warn   => '2008-11-01',    # also accepts DateTime objects
    die    => '2009-01-01',    # two month deprecation period
);

deprecate() only comes into play when the code is run during a test (which you are writing, right?); outside of a test it never does anything. Each time it is run under a test, it produces output that is crystal clear:

# DEPRECATION WARNING
#
# Package:     Our::Customer
# File:        lib/Our/Customer.pm
# Line:        58
# Subroutine:  Our::Customer::name
#
# Reason:      Please use the set_name() method for setting names
#
# This warning becomes FATAL on (2009-01-01)

After the due date, it blows up just as promised with an equally verbose message. Very cool.

perl – The Otter Book
