-*- mode: text; mode: outline-minor -*-

* Steve's Place - Perl Tutorial

** Lesson 1

*** Warning Pragma

Warnings can be enabled lexically with `use warnings', or globally with the -w
switch, given either on the command line or in the shebang line.

*** Escape Sequences in Double-Quoted Strings

These are useful within double-quoted strings:

\n - newline
\t - tab
\r - carriage return
\f - form feed
\\ - literal backslash
\" - literal double quote
\a - alarm character
\cC - control character C-c
\e - escape character
\b - backspace character
\033 - escape in octal
\x7f - DEL in hexadecimal
\x{263a} - Unicode (smiley)
\N{DEVANAGARI OM} - with 'use charnames', a named Unicode character

\u - Force next character to uppercase ("titlecase" in Unicode)
\l - Force next character to lowercase
\U - Force all following characters to uppercase
\L - Force all following characters to lowercase
\Q - Backslash all following non-word characters (anything but letters, digits, and _)
\E - End \U, \L, or \Q

*** Escape Sequences in Single-Quoted Strings

The only escapes available in single-quoted strings are \\ and \'.
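A quick sketch putting the two quoting styles and the case-changing escapes side by side. The variable names are made up for the demo.

```perl
use strict;
use warnings;

my $name = 'world';

# Double quotes interpolate variables and expand escape sequences.
my $double = "Hello, $name!\n";

# Single quotes are literal, apart from \\ and \'.
my $single = 'Hello, $name!\n';
my $quoted = 'It\'s a back\\slash';

# \U uppercases until \E; \u uppercases just the next character.
my $shout = "\U$name\E is \uquiet";

print $double, $single, "\n", $quoted, "\n", $shout, "\n";
```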

*** Number Clarification

Adding underscores to a number literal does nothing to change the value; it
is useful, however, for delimiting segments of a number (e.g., 123_456).

*** Barewords

These work for backwards-compatibility reasons, but are extremely poor style
except in a few limited contexts (such as hash construction).

*** Filehandles

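The STDIN filehandle is opened by the Perl interpreter itself. The angle-bracket
(readline) operator applied to it, as in <STDIN>, behaves according to context.
In scalar context it reads one record (that is, up to and including the input
record separator, $/, which is a newline by default) and returns it. Ex:
"Foo\n". In list context it reads all of the remaining records and returns
them as a list, each still ending with the separator. (The bare <>, the
'diamond' operator proper, reads from the files named in @ARGV, or from STDIN
if there are none.)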
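A minimal sketch of the two contexts. It assumes an in-memory filehandle is an acceptable stand-in for STDIN, so that the example runs without typing anything.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for STDIN so the example is repeatable.
my $text = "first\nsecond\nthird\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $line = <$fh>;   # scalar context: one record, "first\n"
my @rest = <$fh>;   # list context: every remaining record
close $fh;

print "scalar context got: $line";
print "list context got ", scalar(@rest), " records\n";
```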

*** Scalars

$name = <STDIN>;

Remember, $ for $calar. A $calar holds a $ingle value, such as a string,
number (integer or floating point), or reference.

*** Undefining Variables

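`undef' sets a variable's value to undef and frees the memory its old value
held. The variable itself stays declared, so this does not remove the name from
the symbol table or scratchpad. (Objects need no special care here: Perl calls
a class's DESTROY method automatically once the last reference to an instance
goes away.)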

*** chomp

The chomp function removes a trailing input record separator ($/, a newline by
default) from a scalar variable, if one is present, and returns the number of
characters removed.

*** Function Calling

The parentheses surrounding a parameter list in Perl may be omitted for
built-in and predeclared functions, except where they are needed for
precedence or clarity. Ex: chomp $name;

*** Variable Interpolation in Strings

print "$name likes beans.\n" is way more convenient than print $name, " likes
beans.\n" (print accepts a list of arguments). Note, again, that this
interpolation only occurs in double-quoted strings. It also occurs for
individual hash and array elements. Ex:

print "The address of $name is $addresses{$name}\n";

** Lesson 2

*** Typing In Perl

Perl can be considered a weakly-typed language, in most senses. One may freely
assign a variety of values to a scalar at any time in the compilation and
execution of a Perl program, and coercion applies in a manner similar to other
languages. However, each symbol-table entry (a typeglob) has several distinct
slots, one per fundamental data type, so $foo, @foo, %foo, and &foo can all
hold values independently. (Lexical variables have no typeglob; $x, @x, and %x
are simply separate entries in the scratchpad, but the effect is the same.) In
this way, Perl retains some aspects of strong typing, though it hides it from
the user, reducing any practical import this feature might otherwise have.

**** Scalar Numbers

These are best specified without quotes, although one can certainly use an
expression like:

my $foo = "22";

and then:

print "The answer to the Question is: ", $foo + 20;

As in Tcl, the string is converted to a number at runtime, when it is used in
numeric context.

*** Arrays and Lists in Perl

The @ sign indicates an @rray, a mutable sequence of scalars in the array
compartment of a symbol table entry, whose members the programmer may access
in any order, at any time, with a numerical index. Ex:

@array = ( 12.5, 'plop', "some tabs\t\t\t\t", $chocolate, 56, "C" );

**** List Constructors

The parentheses represent a _list constructor_, whose value is assigned to the
array compartment of `array' in the symbol table. List constructors may
contain literal scalars, scalar variables, and even function calls, when
qualified with calling parentheses. Ex: caller().

List constructors may also include other lists, which are flattened into the
enclosing list rather than nested. Ex:

my @bits = ( 'hello, sailor',
             1,
             qw( mung adzuki haricot ),
             "dal\n",
             "\t",
             $thing );
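A minimal sketch confirming the flattening, assuming a made-up value for $thing. The qw() list contributes three separate elements, not one nested element.

```perl
use strict;
use warnings;

my $thing = 'kidney';    # stand-in value for $thing

my @bits = ( 'hello, sailor',
             1,
             qw( mung adzuki haricot ),
             "dal\n",
             "\t",
             $thing );

# The qw() list is flattened in place, so there are 8 elements, not 6.
my $count = @bits;
print "count: $count; third element: $bits[2]\n";
```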

**** The Difference Between Arrays and Lists

The official Perl documentation does draw a distinction between arrays and
lists (which is why `wantarray' should have been named `wantlist', according
to perldoc -f wantarray). A list is any transient sequence of scalars; the
readline operator applied to a filehandle may return one in list context, for
instance. An array is a variable (package or lexical) that stores a list. That
is all.

**** Array and List Indexing

For arrays, one replaces the @ of the variable name with a dollar sign (a
single element is a scalar) and appends an index in square brackets (C style;
counting starts at zero). The index may be any expression that evaluates to a
number, such as a function call ($array[FindRightIndex()]) or arithmetic
expression ($array[42 + $offset]), but it is usually a plain integer literal.
Ex: $array[8]. One may also index a literal list in a like manner, by wrapping
it in parentheses. Ex:

my $bean = (qw/adzuki haricot mung/)[1];   # 'haricot'

Negative indices to an array or list start at the end of the list and move
backwards through the list. Ex:

my @beans = qw/adzuki haricot mung/;
print "$beans[-1]\n";

Slices do accept negative indices (@beans[-2, -1] works), but there is no
open-ended range like Python's beans[1:]; a range such as 1 .. -1 is simply
empty, so write @beans[1 .. $#beans] instead.
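A small sketch of the slicing rules just described, using a made-up four-bean array.

```perl
use strict;
use warnings;

my @beans = qw/adzuki haricot mung pinto/;

my @last_two = @beans[-2, -1];         # negative indices work in slices
my @tail     = @beans[1 .. $#beans];   # the equivalent of Python's beans[1:]
my @empty    = @beans[1 .. -1];        # 1 .. -1 is an empty range

print "@last_two | @tail | ", scalar(@empty), "\n";
```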

***** Slicing Arrays

One can access an arbitrary selection of elements from an array or list by
specifying a list of indices between the square brackets (any old list, as per
the rules above in Arrays and Lists in Perl), and prefacing the variable with
a @, as multiple values are involved. Ex:

my @cake = qw/flour eggs milk sugar butter sultanas water/;
my @SliceOfCake = @cake[3, 6];
print "@SliceOfCake\n";

****** l-value Usage

Perl does some pretty clever, unexpected things with l-values, and its usage
of slices in that context is one of them. Ex:

my @array = qw/hello everybody I'm Dr. Nick Riviera/;
print "@array\n";
@array[3..5] = qw/a complete charlatan/;
print "@array\n";

**** Array Interpolation in Double-Quoted Strings

As seen in the above examples, interpolating an array or array slice into a
double-quoted string joins its elements with the value of $" (the list
separator), which is a single space by default. Do not forget this!!
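A minimal sketch of the separator at work. Localizing $" inside a bare block changes it only for that block.

```perl
use strict;
use warnings;

my @beans = qw/adzuki haricot mung/;

my $default = "@beans";    # elements joined with $", a single space by default

my $custom;
{
    local $" = ', ';       # change the separator for this block only
    $custom = "@beans";
}

print "$default\n$custom\n";
```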

**** Shell-Style Boundary Indicator

$#foo yields the last index in an array. Ex:

my @foo = qw/foo bar baz/;
print $#foo; # 2

***** lvalue Usage

$#foo = expression-evaluating-to-integer can increase the size of an array to
preempt later allocation for a large array, or decrease the size of an array.
Ex:

my @foo = qw/this array is rather much too long/;
$#foo = 2;

print "@foo"; # 'this array is'

*** Semantics: Difference between Perl and perl

Perl: the abstract specification of the Perl programming language. Perl is
ideally equal on all platforms, although the details of perl (see below)
sometimes differ due to the peculiarities of the underlying system.

perl: the name of the program that executes a Perl program. This traditionally
refers to a Unix instance of such a program. 

These are in fact quite distinct.

*** Perl Hashes

These are also known as 'associative arrays' or, in programming languages such
as Python and Visual Basic (makes the sign of Our Ford), as dictionaries, and
occupy their own space in a symbol table entry, signified syntactically by the
percent sign, perhaps suggesting the relation between a key and value. Ex:

my %hash = ( foo => "bar", baz => "quux" );

**** Hash Constructors

These are a peculiar brand of list constructors, which consist essentially of
pairs of scalar values, where the former constitutes a key, and the latter a
value. The difference in semantics is imposed only by the assignment itself,
and not by any characteristic of the constructor; it is fundamentally no
different from any other list constructor.

***** The Fat Comma

`=>' (the 'fat comma') can be used to separate a key and a value. It has the
advantage of clarifying the nature of the hash constructor, and it quotes a
bareword on its left (if it looks like an identifier), so the programmer may
write keys without quotes. There is nothing that prevents its use in regular
list construction; if qw// did not exist, one could use:

my @array = ( foo => bar => baz => );

to emulate it.
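A short sketch showing both points: the fat comma's auto-quoting, and the fact that a hash constructor is just an ordinary list. The sample keys and values are arbitrary.

```perl
use strict;
use warnings;

# The fat comma quotes an identifier-like bareword on its left,
# so this works even under strict.
my @array = ( foo => bar => baz => );

# A hash constructor is just a list: these two hashes hold the same pairs.
my %with_fat   = ( foo => 'bar', baz => 'quux' );
my %with_comma = ( 'foo', 'bar', 'baz', 'quux' );

print "@array\n";
```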

**** Hash Indexing

To access a single element, hashes, like arrays (the other mutable named
type), are prefaced with a dollar sign, but followed by a key in curly braces.
The key may be any expression that evaluates to a usable scalar, although
strings are the most common keys. In particular, these strings may be written
as barewords within the braces, even with the warnings and strict pragmata
enabled. Ex:

print "$hash{key}\n";

***** Hash Slicing

my %trees = ( apple => "Malus",
              pear => "Pyrus",
              plum => "Prunus",
              oak => "Quercus",
              ash => "Fraxinus",
              yew => "Taxus" );

print "@trees{qw/apple pear plum/}\n";

Notice that this syntax corresponds to that of array slicing.

****** lvalue Usage

Again, there exists the same parallelism; using %trees as above, this
changes the Latin classifications of the apple and pear to Hindi names for the
actual fruit of those trees.

@trees{qw/apple pear/} = qw/seba naashapatii/;
print "@trees{qw/apple pear/}\n";

**** Hash Evaluation in List Context

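In list context a hash flattens into a list of its key/value pairs. This makes
possible an awesome idiom. Ex:

%by_name = reverse %by_address;     # Invert the hash

This only works cleanly if the values are unique, though; duplicate values
collapse into a single key, and which of the original keys survives is
unpredictable.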

**** Hash Evaluation in Scalar Context

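In perls older than 5.26, this yields a string representing a fraction whose
denominator is the number of allocated buckets and whose numerator is the
number of used buckets (or a false value if the hash is empty). From 5.26 on,
it simply returns the number of keys; for a portable count, use
`scalar keys %hash'.

print scalar %hash; # Something like 42/64 on older perls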

*** Whitespace Processing in Perl

Where whitespace is needed to distinguish tokens, any amount or kind is
usable, in the tradition of Algol. Observe the hash construction of %trees for
such an example. The trade-off is the need for an explicit statement
terminator, which can be damned annoying if one is not used to it.

*** Independent Availability of Symbol Table Entries

Remember, $beans is completely different from @beans! Think MacLISP or Common
LISP.

*** Escaping Variable Syntax

As with other escape sequences, \@ and \$ are useful for escaping what would
otherwise be interpolated as scalars, arrays, slices, or individual elements
of the mutable data types in double-quoted strings. Ex:

print "\@beans contains ", scalar @beans, " members, @beans.\n";

*** Scalar Context of Arrays

An array used in scalar context will evaluate to an integer equal to the
number of elements in the array. This does not carry over to literal lists
such as those created by qw//: in scalar context the comma operator yields the
last element instead, not a count. Function calls, on the other hand, may
behave however they choose; `caller', for example, returns only the package
name of the calling code in scalar context. The behavior of a function in
scalar context depends on its use of Perl's context detection (wantarray for
user functions), and does not necessarily conform to the standard used for
arrays.

**** Examples of Scalar Context

***** Assignment to Scalar Variables

my $number = @things;

$number will hold the number of elements in @things. Note that this is
different from:

my ($number) = @things;

...or, if $number has already been declared, simply:

($number) = @things;

both of which are list assignments, assigning the first value in @things to
$number. (More on this later.)
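A minimal sketch of the difference, using a made-up @things. Concatenation is included as one more place where an array quietly ends up in scalar context.

```perl
use strict;
use warnings;

my @things = qw/apple banana cherry/;

my $number  = @things;   # scalar assignment: the element count, 3
my ($first) = @things;   # list assignment: the first element, 'apple'

# Concatenation also imposes scalar context on the array.
my $message = "I have " . @things . " things";

print "$number, $first; $message\n";
```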

***** Explicit Context with scalar

The scalar function is useful for imposing scalar context where it cannot be
implied (such as above). There is no corresponding function for imposing list
context, because Perl automatically does it where necessary, such as in the
argument list of print. Here I will use the example presented in Escaping
Variable Syntax again:

print "\@beans contains ", scalar @beans, " members, @beans.\n";

Using a bare @beans in place of scalar @beans would pass the elements
themselves, printed with nothing between them (e.g., adzukiharicotmung).

*** C-Style Numerical for Loops

**** Basic Form

The C-style for control structure proceeds as follows:

for (initialization; test; increment) {
  body statements...
}

A more specific example:

for ( my $i = 0; $i < scalar @beans; ++$i ) {
  print "$name likes $beans[$i] beans.\n";
}

The initialization statement will traditionally (even idiomatically) establish
a variable for looping, which, by an accompanying tradition, is usually $i.
The test expression is then evaluated. If it is true, Perl executes the block
of statements given to the control structure; otherwise control passes to the
statement following the loop. Upon completing execution of the block, the
third expression, responsible for incrementing the loop variable, is
evaluated, and the test is tried again. (Although one could make all three do
whatever the hell you want, which is occasionally useful.) Here we use the
pre-increment operator, as opposed to the post-increment operator, although it
really does not matter in this context. It is important, however, to realize
the difference between the two of them.

Note that traditional for loops cannot be rewritten as Anglified statement
modifiers, as generic for loops can (e.g., print for qw/foo bar baz/;).

**** Truth and Falsehood

The above description of the `for' loop raises the interesting question of
what value exactly will pass the second 'test' clause. In Boolean context,
these values are false: undef (such as undefined variables, functions that
return undef, etc.), the empty string, the number zero, or the string "0".
Everything else is true (including references, and strings such as "0.0" and
"00").

The Fourfold Truth of Perl:

Any string is true except for "" and "0".
Any number is true except for 0.
Any reference is true.
Any undefined value is false.
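A quick sketch that runs a handful of sample values through Boolean context. The string "0.0" is the classic trap: it is not "0", so it is true.

```perl
use strict;
use warnings;

# undef, 0, '' and '0' are false; everything else, including '0.0',
# '00', a lone space, and any reference, is true.
my @samples  = ( undef, 0, 1, '', '0', '0.0', '00', ' ', 'abc', [] );
my @verdicts = map { $_ ? 'true' : 'false' } @samples;

print join( ',', @verdicts ), "\n";
```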

*** Useful Operators

**** Pre- and Post-Increment

You'll never guess what expressions like --$i and $i-- do.
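All right, a tiny sketch anyway, since the difference matters whenever the value of the expression is used.

```perl
use strict;
use warnings;

my $i   = 5;
my $pre = ++$i;    # increment first, then use the new value: 6

my $j    = 5;
my $post = $j++;   # use the old value, then increment: 5

print "pre=$pre i=$i post=$post j=$j\n";
```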

**** Arithmetic Operators

+, -, /, and * should be fairly obvious. ** is the less obvious exponentiation
operator, and % takes the modulus (remainder) of a division. x repeats a
string (e.g., print 'hello' x 3), and . concatenates two strings (e.g., print
'foo' . 'bar';)

**** Arithmetical Assignment Operators

$i += 2;  # Add two to the value of $i
$i **= 3; # Cube $i
$i x= 3;  # Repeat the string in $i three times

Etc...

*** Return Values of Assignments

An assignment returns the variable that was assigned to, as an lvalue (not a
copy of its value), which can be useful for shortening statements. Ex:

chomp( my $name = <STDIN> );

This assigns an input record from the predefined STDIN filehandle to the
lexically scoped variable $name, and then chomps $name itself.
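A minimal sketch of the idiom, assuming an in-memory filehandle in place of STDIN. It also shows the related copy-then-substitute trick, which relies on the same property.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for STDIN here.
my $input = "Steve\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

# The assignment yields $name itself, so chomp trims the variable.
chomp( my $name = <$fh> );
close $fh;

# Same trick: copy, then substitute on the copy only.
my $original = 'mung beans';
( my $copy = $original ) =~ s/mung/adzuki/;

print "$name | $original | $copy\n";
```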

*** Some Quoting Operators

**** The 'Quote Words' Operator

The quote words operator, `qw', constructs a list consisting of the
whitespace-separated words between its opening and closing delimiters, which
may be a pair of identical non-alphanumeric characters (traditionally the
forward slash) or a natural pair of brackets. Ex (all equivalent):

my @list = qw/foo bar baz/;
@list = qw#foo bar baz#;
@list = qw^foo bar baz^;
@list = qw{foo bar baz};

Strings within a quote words construct are considered singly-quoted; there is
no variable substitution or expansion of escape sequences. 

**** Generic Forms of Single and Double Quoting

q and qq provide generic single and double quoting, which relieve the
programmer of the tedious escape sequences otherwise needed to include the
quote characters themselves in a string.

***** Escaping in Quoting Operators

If the terminating character itself must appear inside one of these quoting
operators, it can be escaped with a backslash, as with other similar
characters. Ex:

my $answer = 42;
print qq:The answer to the Question is\: $answer\n:;

*** The Generic `for' Loop

**** Basic Form

for VARIABLE (LIST) {
  body statements...
}

A more specific example; the loop variable may be declared with `my' right in
the loop header, but the parentheses around the list are always required:

foreach my $bean (@beans) {
  print "$name likes $bean beans.\n";
}

`for' and `foreach' are interchangeable keywords; use whichever reads better.

***** Introducing $_

The declaration of a variable may be omitted, in which case each element of
the list is aliased, in turn, to the default variable, $_. Likewise, many
functions will operate on this common default variable if no arguments are
given. Ex:

for (qw/foo bar baz/) {
  print;
}

will just print out each member. Another example:

foreach (@beans) {
  print "$name likes $_ beans.\n";
}

**** Statement Modifier Form

The generic `for' can even be used as a statement modifier, when assuming the
default variable of $_. Ex:

print for qw/foo bar baz/;
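A minimal sketch of what "aliased" means in practice: the loop variable (named or $_) is the element itself, not a copy, so assigning to it changes the array.

```perl
use strict;
use warnings;

my @beans = qw/adzuki haricot mung/;

# The loop variable is an alias for each element, not a copy,
# so assigning to it changes @beans itself.
for my $bean (@beans) {
    $bean = ucfirst $bean;
}

# Statement-modifier form: $_ is aliased to each element in turn.
my @lengths;
push @lengths, length($_) for @beans;

print "@beans / @lengths\n";
```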

*** Perl Variable Names

User variables should generally begin with an alphabetical character, followed
by any combination of alphanumeric characters and the underscore. However,
there are a number of influential 'punctuation' variables that affect the
fundamental operation of the Perl program, such as the default variable $_ and
$/, the input record separator. If the utf8 pragma is in effect (so that the
source code itself is read as UTF-8), Unicode letters and digits are also
valid in variable names ('alpha'numeric is too broad where one might be using
Katakana or pre-composed Hangul jamo).

** Lesson 3

*** Lexical Scope

The keyword `my' declares a lexically scoped variable. You know how this
works; Perl even has closures: an anonymous subroutine keeps access to the
lexical variables that were in scope where it was defined, even after that
scope has been exited.
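The classic closure demo, as a sketch. make_counter is a hypothetical helper, not anything built in; each call creates a fresh lexical $count that only the returned subroutine can still reach.

```perl
use strict;
use warnings;

# make_counter is a hypothetical helper for this demonstration.
sub make_counter {
    my ($count) = @_;
    $count = 0 unless defined $count;   # a fresh lexical on every call
    return sub { return $count++ };     # the anonymous sub captures it
}

my $c1 = make_counter();
my $c2 = make_counter(10);

# Each closure keeps its own $count alive after make_counter has returned.
my @seen = ( $c1->(), $c1->(), $c2->(), $c1->() );
print "@seen\n";
```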

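Lexical bindings established in the openings of control and iteration
structures extend to their bodies. Ex:

while ( defined( my $tree = <> ) ) {
  chomp $tree;
  # Stuff involving $tree...
}

(Testing the result of chomp instead would be a mistake: chomp returns the
number of characters removed, not the line, so a final line with no trailing
newline would end the loop early.)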

Presumably, this applies for `local' variables as well, but I've always been a
fan of lexical scope myself.

An interesting variation on lexical scoping is `our'. `our' actually creates a
package variable with a lexical alias. While accessible to other packages
through full qualification (e.g., $Foo::Bar) and present in the symbol table,
it acts like a lexical variable within the current block. This is especially
useful for use under the `strict' pragma, where one may not reference global
variables without a package name.

*** `strict' Pragma

Unless it's a one-off script or -e expression, you want this. The `strict'
pragma disallows use of global variables without package qualification (unless
declared by `our'), symbolic references, and stray barewords: nasty shit that
you should not use on a regular basis.

*** C-Style `while' Loops

As in C, `while' loops repeatedly execute the contents of a block as long as
a condition remains true. ('Execute' is a better word than 'evaluate', because
`while' loops cannot return any value.) The condition is always evaluated
before the block; if it is false the first time, the block never runs at all.
Ex:

while ( my $type = pop @peas ) {
  print "$type peas are ", flavour( $type ), ".\n";
}

This will run until there are no more elements in @peas, because the value of
an assignment is the assigned variable itself. The final `pop', on the
now-empty array, assigns undef to $type, which causes the while loop to
terminate. (So would any false element, such as "0" or "", which is worth
bearing in mind.) It is interesting to note that `while' (unlike `foreach')
does not implicitly localize any variables, hence the use of `my' here. If,
for example, you need to iterate over a filehandle (see below) while
preserving the global value of `$_', you might do something like this:

while (defined( local $_ = <$fh> ) ) {
  ...
}
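A minimal sketch of the false-element trap in the `pop' loop above, with made-up pea names. Testing `defined' instead of truth visits every element.

```perl
use strict;
use warnings;

# A plain truth test stops at the first false element ("0" here).
my @peas = qw/sugar snap 0 marrowfat/;
my @truthy;
while ( my $type = pop @peas ) {
    push @truthy, $type;
}

# Testing definedness visits every element, "0" included.
my @peas2 = qw/sugar snap 0 marrowfat/;
my @all;
while ( defined( my $type = pop @peas2 ) ) {
    push @all, $type;
}

print "@truthy | @all\n";
```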

**** Statement Modifier Form

This is pretty routine by now, I would imagine. But ex:

print while (<STDIN>);

***** Cautionary Statement on the Use of `my'

The use of `my' in the control structure of a statement modifier is undefined,
such as in:

print $type while my $type = pop @peas;

...so don't do it. (It doesn't work at all in perl 5.8.6 for i486-linux.)

**** Negated `while' Loop

`until', as a control structure with a block or as a statement modifier, does
the inverse of `while'; it is one of Perl's many convenient syntactosaccharides.

**** `continue' Block

After the natural execution of a `while', `foreach', or raw block in a looping
statement, or following a `next' statement (see below), any `continue' block
specified will be executed. The canonical example of this is the behavior of
the `-p' switch:

LINE: while (<>) {
    ...             # your program goes here
} continue {
    print or die "-p destination: $!\n";
}

Note that lexical variables declared in the `while' condition are also in
scope in the `continue' block; variables declared inside the main block are
not.
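A small sketch with made-up input lines showing that `next' skips the rest of the body but not the `continue' block.

```perl
use strict;
use warnings;

my @lines = ( "keep me\n", "# a comment\n", "keep me too\n" );
my $seen  = 0;
my @kept;

for my $line (@lines) {
    next if $line =~ /^#/;   # skip comment lines...
    push @kept, $line;
} continue {
    $seen++;                 # ...but this still runs for every line
}

print "saw $seen lines, kept ", scalar(@kept), "\n";
```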

*** Raw Blocks

A lone block delimited by curly braces is semantically equivalent to a loop
that executes once. `my' and `local' definitions will limit their scope to
such a block, and it is possible to use `continue' blocks and all that other
good shit you would expect with a `while' or `foreach'. Ex:

SWITCH: {
  if (/^abc/) { $abc = 1; last SWITCH; }
  if (/^def/) { $def = 1; last SWITCH; }
  if (/^xyz/) { $xyz = 1; last SWITCH; }
  $nothing = 1;
}

*** Other Array Operators

push ARRAY LIST - add LIST to the end of ARRAY
pop ARRAY - remove and return the last element of ARRAY
shift ARRAY - remove and return the first element of ARRAY
unshift ARRAY LIST - bung LIST values onto the beginning of ARRAY, maintaining
specified order
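A quick sketch of all four at both ends of a made-up queue. Note that unshift keeps LIST in the order given rather than pushing each element on in turn.

```perl
use strict;
use warnings;

my @queue = qw/b c/;

push @queue, 'd', 'e';      # b c d e
unshift @queue, 'y', 'z';   # y z b c d e  (LIST order is kept)
my $last  = pop @queue;     # 'e'
my $first = shift @queue;   # 'y'

print "$first $last / @queue\n";
```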

**** splice

`splice' gets its own section. This is an extremely flexible, extremely ugly
general Swiss Army knife operator for arrays. The full form of `splice'
proceeds so: splice ARRAY, OFFSET, LENGTH, LIST.

`splice' removes LENGTH elements starting at index OFFSET, and replaces them
with the elements of LIST; the array will grow or shrink as necessary.

If OFFSET is negative, it counts that many elements back from the end of the
array. If LENGTH is negative, the removed region spans from OFFSET up to and
including the element immediately before the last abs(LENGTH) elements.

If LENGTH is omitted altogether, remove everything from OFFSET onward. If LIST
is omitted, simply remove the elements designated by OFFSET and LENGTH.
Lastly, if ARRAY (the only required argument) is the only one given, wipe the
whole array (the same as $#foo = -1).

`splice' returns the elements removed in list context, the last element
removed in scalar context, or undef if none were removed.
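A minimal sketch exercising the cases above on fresh copies of the same six-letter array, so that each result can be read on its own.

```perl
use strict;
use warnings;

# Replace two elements with three; the array grows to fit.
my @a = qw/a b c d e f/;
my @removed = splice( @a, 1, 2, qw/X Y Z/ );   # removes b c

# Negative LENGTH: from OFFSET onward, but keep the last two elements.
my @b = qw/a b c d e f/;
my @middle = splice( @b, 1, -2 );              # removes b c d

# Negative OFFSET, no LENGTH: remove the last two elements.
my @c = qw/a b c d e f/;
my @tail = splice( @c, -2 );                   # removes e f

# Scalar context: only the last removed element comes back.
my @d = qw/a b c d e f/;
my $last_removed = splice( @d, 0, 3 );         # 'c'

print "@a | @b | @c | $last_removed\n";
```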

***** Equivalences from the Official Documentation

The following equivalences hold (assuming "$[ == 0 and $#a >= $i"; $[, the
long-deprecated array-base variable, should always be 0 in modern Perl):

push(@a,$x,$y)      splice(@a,@a,0,$x,$y)
pop(@a)             splice(@a,-1)
shift(@a)           splice(@a,0,1)
unshift(@a,$x,$y)   splice(@a,0,0,$x,$y)
$a[$i] = $y         splice(@a,$i,1,$y)

**** reverse

`reverse', in list context, returns a reversed list of its arguments. In
scalar context, it gets a little trickier: it concatenates the elements of the
list into a single string and returns that string reversed. With no arguments
in scalar context, it reverses the current value of $_, whatever that may be.
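A short sketch of the three behaviors. The context comes from what the result is assigned to, which is also why print reverse 'hello' prints "hello": print supplies list context.

```perl
use strict;
use warnings;

my @backwards = reverse qw/one two three/;   # list context: three two one
my $olleh     = reverse 'hello';             # scalar context: 'olleh'
my $joined    = reverse( 'ab', 'cd' );       # concatenates, then reverses: 'dcba'

# No arguments in scalar context: reverse $_.
my $from_default;
for ('mung') {
    $from_default = reverse;                 # 'gnum'
}

print "@backwards | $olleh | $joined | $from_default\n";
```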

*** Subroutines

**** Basic Form

Subroutines or functions (technically, subroutines produce only side effects,
but Perl does not seem to give a shit about this convention) are defined like
so:

sub foo {
  body statements...
}

in the most basic form. Prototyping and attributes are matters I'll get to
later on. Oh, and the 'magic character' for functions is '&'. This is useful
for, say, symbol table aliasing such as seen here:

local *foo = \&bar;
foo();   # now calls bar

...or for references (see below).

"(Often a function without an explicit return statement is called a
subroutine, but there's really no difference from Perl's perspective.)"

  -- bad-ass Perl takes a shit on convention

**** Calling Subroutines

Perl subroutines are called using the Hinarita (lesser method),
foo(bar, baz), assuming everything is normal (you haven't overridden a
built-in function, etc.). Note that this isn't as good as the Maharita used by
Lisp, (foo bar baz), but we still love Perl anyway.

***** List Ambiguity

Sometimes, one wishes to distinguish a plain list from a list of function
arguments. The unary `+' is useful here. Ex:

# Without the plus, this would be parsed as (print(1, 2, 3)), 4, 5; the
# 4 and 5 would be evaluated and thrown away, which is not what we want.
print +(1, 2, 3), 4, 5;

**** Return Values

One can return a value or values explicitly using `return' as with C, but the
value of the last expression evaluated in the block also works. I like using
`return' just for clarity, myself.

**** Subroutine Arguments

@_, the array of arguments that makes variadic calling possible, plays for
subroutines the role that $_ plays elsewhere: it is the default operand of
functions like shift inside a subroutine. Each call gets a fresh @_, so a
callee does not see its caller's arguments, unless it is called with the
special form &foo; (ampersand, no parentheses), which hands the caller's
current @_ to the callee as-is. Without prototypes, function calls flatten all
scalar, array, and hash arguments into this same uniformly one-dimensional
array. The elements of @_ are magic aliases to the original arguments; in this
way, Perl subroutines can exhibit call-by-reference behavior. Ex:

sub foo {
  $_[0] = "hanumizzle";
}

my @foo = qw/foo bar baz/;
foo(@foo);

print "@foo";   # prints: hanumizzle bar baz

But this is unusual behavior.
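A minimal sketch of the &foo; form described above. count_args and outer are hypothetical helpers; the point is that the ampersand call without parentheses sees the caller's @_.

```perl
use strict;
use warnings;

sub count_args { return scalar @_ }   # hypothetical helper

sub outer {
    my $fresh  = count_args();   # ordinary call: new, empty @_
    my $shared = &count_args;    # &name; form: reuses outer's @_
    return ( $fresh, $shared );
}

my ( $fresh, $shared ) = outer(qw/a b c/);
print "fresh: $fresh, shared: $shared\n";
```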

***** Idiomatic Operations on Arguments

****** shift

`shift' with no argument operates on @_ inside a subroutine (or format, but we
hateses formats), and on @ARGV outside of one. Ex:

my $first = shift;
my $second = shift;

Use of `shift' on @_ will *not* modify the aliased parameters themselves,
which is probably what you want.

****** Named Parameters

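Perl 5 traditionally has no formal named parameters (subroutine signatures,
experimental from 5.20 and stable as of 5.36, now offer formal positional
ones), but it's pretty damned easy to emulate them with such statements as
these: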

# Assign first element of @_ to lexical $foo; do not change @_. Note this is
# different from my $foo = @_, which puts @_ in scalar context.
my ($foo) = @_;

# Assign a couple named parameters; ignore rest
my ($foo, $bar, $baz) = @_;

# Assign some named parameters, and the remainder are bunged into @quux
# greedily (think `&rest' in some Lisp dialects, such as Emacs Lisp).
my ($foo, $bar, $baz, @quux) = @_;
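A sketch putting these patterns to work, plus one more common idiom: treating @_ as key/value pairs for genuinely named arguments with defaults. describe and cook are hypothetical subroutines invented for the illustration.

```perl
use strict;
use warnings;

# describe and cook are hypothetical subroutines for illustration.
sub describe {
    my ( $name, $colour, @extras ) = @_;   # named copies; @extras is greedy
    my $text = "$name is $colour";
    $text .= " (@extras)" if @extras;
    return $text;
}

# A common emulation of true named arguments: treat @_ as key/value pairs,
# with defaults listed first so the caller's pairs override them.
sub cook {
    my %args = ( beans => 'mung', minutes => 20, @_ );
    return "$args{beans} for $args{minutes} minutes";
}

my $plain  = describe( 'mung', 'green' );
my $fancy  = describe( 'adzuki', 'red', 'sweet', 'small' );
my $dinner = cook( minutes => 45 );

print "$plain\n$fancy\n$dinner\n";
```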

*** C-Style if Control Structures

**** Basic Form

if (yada) { foo } # You get the idea by now

`if' can be used as a statement modifier, but perhaps even better is this
idiom:

yada and foo;

...or:

yada and do { foo; bar; baz };

...for a block. This uses the way-low precedence Boolean operator `and' as a
sort of jury-rigged control structure, as `and' (or `&&' for that matter) will
not evaluate the second expression at all if the first is false, that being a
waste of time.

**** Negated if Control Structure

`unless' does guess-what.

**** Control Transfer

Any number of `elsif' clauses and `else' will inherit control from the `if' in
the event that its conditional does not evaluate to a true value. Ex:

if ( THIS_IS_TRUE ) {
  DO_THIS_THING;
} elsif ( THIS_OTHER_THING_IS_TRUE ) {
  DO_THIS_OTHER_THING;
} else {
  DO_THE_DEFAULT_THING;
}

*** Equality Comparison Operators

Without overloading or anything tricky like that, these operators evaluate
both of their operands as scalars.

If a comparison operator evaluates true, the specific value yielded is 1; for
false, it is a special value that is "" as a string and 0 as a number (not
undef).
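A quick sketch inspecting what the comparison operators actually return.

```perl
use strict;
use warnings;

my $true  = ( 1 == 1 );
my $false = ( 1 == 2 );

# $false is defined: the empty string as a string, 0 as a number,
# and it produces no "isn't numeric" warning when used as a number.
printf "true=%s false='%s' defined=%d numeric=%d\n",
    $true, $false, defined($false) ? 1 : 0, $false + 0;
```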

**** Strings

`eq': compare two strings; return true if equal
`ne': not equal
`gt': greater than
`ge': greater than or equal to
`lt': less than
`le': less than or equal to

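The 'inequalities' compare strings character by character, by code point
(ASCII order for plain ASCII text, so "Z" lt "a" is true), rather than in true
phone book order; under `use locale', the locale's collation is used instead.
The same code-point rule applies to Unicode, so whether Devanagari vocalic r
sorts before mor maa depends only on their code points; for linguistically
correct ordering, see the Unicode::Collate module.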

**** Numbers

These numerical equality comparison operators are direct analogs to the string
operators listed:

`==': compare two numbers; return true if equal
`!=': not equal
`>' : greater than
`>=': greater than or equal to
`<' : less than
`<=': less than or equal to

**** The Difference?

`"2" eq "2.0"' is false. `"2" == "2.0"' is true.
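A minimal sketch of the difference, including the spot where it bites most often: sort compares as strings unless told otherwise.

```perl
use strict;
use warnings;

my $same_str = ( "2"  eq "2.0" ) ? 1 : 0;   # 0: different characters
my $same_num = ( "2"  == "2.0" ) ? 1 : 0;   # 1: same numeric value
my $str_lt   = ( "10" lt "9" )   ? 1 : 0;   # 1: '1' sorts before '9'
my $num_lt   = ( 10   <  9 )     ? 1 : 0;   # 0

# sort compares as strings unless told otherwise.
my @as_strings = sort qw/10 9 100/;                # 10 100 9
my @as_numbers = sort { $a <=> $b } qw/10 9 100/;  # 9 10 100

print "$same_str $same_num $str_lt $num_lt | @as_strings | @as_numbers\n";
```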

*** `switch' Statements in Perl

**** Emulation with Generic `for' Loop

for ($variable_to_test) {
  if    (/pat1/)  { }     # do something
  elsif (/pat2/)  { }     # do something else
  elsif (/pat3/)  { }     # do something else
  else            { }     # default
}

for ( $arg ) {
  /^quit$/ && do { exit(0) ; } ;
  /^help$/ && do { system( "perldoc $0") };
}

**** use Switch;

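I'm feeling lazy; I'll just toss out a few examples. (Note that Switch is a
source filter, and it has not shipped with Perl since 5.14; it must be
installed from CPAN.)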

use Switch;

switch ($val) {
  case 1          { print "number 1" }
  case "a"        { print "string a" }
  case [1..10,42] { print "number in list" }
  case (@array)   { print "number in list" }
  case /\w+/      { print "pattern" }
  case qr/\w+/    { print "pattern" }
  case (%hash)    { print "entry in hash" }
  case (\%hash)   { print "entry in hash" }
  case (\&sub)    { print "arg to subroutine" }
  else            { print "previous case not true" }
}

--

use Switch;

# AND LATER...

%special = ( woohoo => 1,  d'oh => 1 );

while (<>) {
  chomp;
  switch ($_) {

    case (%special) { print "homer\n"; }      # if $special{$_}
    case /[a-z]/i   { print "alpha\n"; }      # if $_ =~ /[a-z]/i
    case [1..9]     { print "small num\n"; }  # if $_ in [1..9]

    case { $_[0] >= 10 } {                    # if $_ >= 10
      my $age = <>;
      switch (sub{ $_[0] < $age } ) {

        case 20  { print "teens\n"; }         # if 20 < $age
        case 30  { print "twenties\n"; }      # if 30 < $age
        else     { print "history\n"; }
      }
    }

    print "must be punctuation\n" case /\W/;  # if $_ =~ /\W/
  }
}

/Most/ of this is pretty intuitive. See the official documentation for full
details.

** Lesson 4

*** More on Hashes (and Arrays)

**** Adding New Elements to a Hash

$foo{key} = ScalarValue;

Just use a qualified hash element as an lvalue; it's that simple.

**** Existence Checking 

Here, you gotta be careful.

***** The Intuitive and Possibly the Wrong Method

`exists' may seem like the way to check for the existence of a member within a
hash or array, but it might not do exactly what you want. From the manual:

"Given an expression that specifies a hash element or array element, returns
true if the specified element in the hash or array has ever been initialized,
even if the corresponding value is undefined. The element is not autovivified
if it doesn't exist."

So `exists' is a bit like `intern-soft' for Lisp users. Unfortunately, it
evaluates true if the specified element has been initialized, even though its
value may currently be undefined. If the element has been removed from the
array or hash with splice, $#foo = ..., or delete, `exists' returns false.
Ex:

# Each line prints "no"
my @foo = qw/foo bar baz/; $#foo = 1; print exists $foo[2] ? "yes\n" : "no\n";
my %foo = ( foo => 42, bar => 69 ); delete $foo{bar}; print exists $foo{bar} ? "yes\n" : "no\n";

***** What You Probably Want

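`defined' tests whether a value is anything other than undef. A missing
element is never defined, so `defined' is false both for elements that don't
exist and for elements that hold undef. In most cases, this is the behavior
you want.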
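A small sketch comparing the two tests on three made-up keys: one with a value, one holding undef, and one that was never there.

```perl
use strict;
use warnings;

my %h = ( present => 1, empty => undef );

# For each key: E if it exists, D if its value is defined.
my @report;
for my $key (qw/present empty missing/) {
    my $e = exists  $h{$key} ? 'E' : '-';
    my $d = defined $h{$key} ? 'D' : '-';
    push @report, "$key:$e$d";
}

print "@report\n";
```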

**** Removing Elements

`delete' will remove an individual element or a slice from a hash or array.
`defined' tests on these elements will return false, unless they are defined
anew, of course.

`delete' returns a list equal in length to the number of /attempted/
deletions. Each member is a former value in the operand array or hash, or
`undef' for cases where deletion wasn't successful (key didn't exist).
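A minimal sketch of deleting a hash slice, with one key that isn't there.

```perl
use strict;
use warnings;

my %h = ( a => 1, b => 2, c => 3 );

# Deleting a slice returns one value per attempted deletion,
# with undef where the key was not there.
my @gone = delete @h{qw/a c x/};

my $left = join ',', sort keys %h;
print scalar(@gone), " attempted; left: $left\n";
```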

**** Iterating over Elements

***** `each'

#!/usr/bin/perl -w
my %bits = ( soy => 'sauce', sesame => 'oil', garlic => 'clove' );
while ( my ( $key, $value ) = each %bits ) {
  print "$key has value $value\n";
}

`each' is one method for iterating over the elements of a hash. Perl hashes
maintain iterators which traverse the contents in an apparently arbitrary
order, but certainly visit each key/value pair exactly once. When called in
list context, `each' returns the next (or first) key and value pair in this
sequence. In scalar context, it returns only the key. Upon reaching the end of
this sequence, `each' returns an empty list in list context or undef in scalar
context (doesn't really matter much what it is, does it?), which makes this
useful for `while' loops. The next call to `each' after that starts a fresh
pass, which proceeds exactly like the one before. Think of it almost like a
Mahayuga. The iterator will also be reset by calls to `keys' or `values'.

One caveat arises here: don't add or delete elements from the hash during
iteration with `each', excepting cases where one deletes the last element
returned by `each'. Ex:

while (($key, $value) = each %hash) {
  print $key, "\n";
  delete $hash{$key};   # This is safe
}

***** `keys'

In list context, `keys' returns a copy of the keys of a hash (modifying this
list will not affect the hash itself), in an 'order' identical to that used by
`each'. (It may be useful to sort this list!) In scalar context, it returns the
number of keys in the hash, and in void context it resets the iterator, with
no additional overhead. An example of `keys':

my @keys = keys %ENV;
my @values = values %ENV;

while (@keys) {
    print pop(@keys), '=', pop(@values), "\n";
}

When used as an lvalue, keys increases the number of buckets allocated for the
operand hash (it is not possible to decrease the value, so don't worry about
it). This can be used to preempt inefficiency for what you anticipate to be a
large hash, and is therefore best used before the hash holds any values. Ex:

keys %foo = 1000; # Perl rounds up to the next 2 ** x; this is really 1024

while (<>) {
  # Load up %foo...
}

***** `values'

In list context, `values' returns all the values of the hash, in the same
order established by the iterator `each' uses; in scalar context, it returns
the number of values; and in void context, it resets the iterator.

The list returned aliases its members to the actual values, enabling one to do
some awesome shiznit:

for (values %hash) {
  s/foo/bar/g;
} # modifies %hash values

*** Various List Properties

**** Multiple Definitions with my

my ( $one, $two, $three ) = ( 1, 2, 'three' );

And the obligatory caveat:

"Note the brackets around the ($one, $two, $three). You need these to make perl
realise it's a list, just as when you create arrays. If you miss them off,
perl will try to evaluate $one, $two and $three separately (i.e. in scalar
context), and therefore come up with the last thing it evaluated, which is
$three. It will then do exactly the same to the other side, come up with
'three', then go " $three = 'three' ", and nothing else. $one and $two will
never be assigned anything. You need brackets to force list context, in the
same way as you sometimes need scalar to force scalar context."

**** Swapping Variables

($y, $x) = ($x, $y); # That simple

**** Generating Ranges

`0..5' and like forms generate a list of numbers between their endpoints,
inclusive. These are often used for slicing lists in sequential blocks. Ex:

my @foo = 1..5;       # Usable outside array indices
my @bar = @foo[2..4]; # Expands to @foo[2,3,4];
# Repeat a command
print "Ah, ah, ah! You didn't say the magic word!\n" for 1..1000;

**** Assigning the Remainder of a List to an Array

# @baz will hold 'Python' and 'Scheme'
my ($foo, $bar, @baz) = ('Common Lisp', qw(Perl Python Scheme) );

# Slices restrict the greediness of this phenomenon. The array must be
# declared first; my cannot 'declare' slices.
(@foo[0..5], $bar) = qw/The quick brown turd jumps over the lazy fox/;

*** `sort'

With one argument, a list, `sort' returns an ASCIIbetical sorting of its
contents (actually, it also respects locale collation if `use locale' is in
effect). One may specify a block or subname (a reference or a string naming a
function) before the list to influence the sort. `sort' sets the package
variables $a and $b to the two values the sorting algorithm is currently
comparing, then calls this block or function (unless the function is
prototyped ($$), in which case the two values are passed in @_ as usual). The
return value of this piece of code should be:

a positive number (conventionally 1), indicating that $a will come after $b
0, indicating equal status, or
a negative number (conventionally -1), indicating that $a will come before $b

MahaEx:

# sort lexically
@articles = sort @files;

# same thing, but with explicit sort routine
@articles = sort {$a cmp $b} @files;

# now case-insensitively
@articles = sort {uc($a) cmp uc($b)} @files;

# same thing in reversed order
@articles = sort {$b cmp $a} @files;

# sort numerically ascending
@articles = sort {$a <=> $b} @files;

# sort numerically descending
@articles = sort {$b <=> $a} @files;

# this sorts the %age hash by value instead of key
# using an in-line function
@eldest = sort { $age{$b} <=> $age{$a} } keys %age;

# sort using explicit subroutine name
sub byage {
  $age{$a} <=> $age{$b};  # presuming numeric
}
@sortedclass = sort byage @class;

The two operators of exceptional utility in the examples are:

**** `cmp' and `<=>'

`cmp' returns 1 if its first operand is ASCIIbetically greater than the
second, -1 if it is less, and 0 if they are equal. `<=>' does the same for
numbers (does conversions and everything). The default implicit block is
therefore sort { $a cmp $b } @list.
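This sketch shows two things. First, the default string comparison puts numbers in a surprising order. Second, `<=>' and `cmp' chain naturally with `or', because a tie returns 0, which is false. The people records are hypothetical.

```perl
use strict;
use warnings;

# Default sort compares strings, so numbers come out in "ASCIIbetical" order
my @default = sort qw(10 9 100);                 # (10, 100, 9)
my @numeric = sort { $a <=> $b } qw(10 9 100);   # (9, 10, 100)

# Hypothetical records: sort by age, then by name when ages tie.
# <=> returns 0 for equal ages, so `or' falls through to the cmp.
my @people = (
    { name => 'Sita',      age => 30 },
    { name => 'Rama',      age => 30 },
    { name => 'Lakshmana', age => 25 },
);
my @by_age_then_name =
    sort { $a->{age} <=> $b->{age} or $a->{name} cmp $b->{name} } @people;

print join( ' ', map { $_->{name} } @by_age_then_name ), "\n";
```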

*** Symbol Tables and Packages

As you know well by now, all variables reside either in the symbol tables of
Perl's namespaces (which are called 'packages'), or in scratchpads associated
with a certain lexical scope. The use of lexical scope is fairly
straightforward, but packages merit some additional description. In fact,
there are really no such things as 'global variables', just package variables.
`main' is the default package name, and is the closest equivalent to a
'global' namespace Perl has to offer. An unqualified name that isn't bound
lexically refers to the /current/ package, with one exception: certain special
identifiers (STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, SIG, and the
punctuation variables) are always forced into `main'. One can also specify an
explicit package name by qualifying the variable name with its package (e.g.,
@So::Long::And::Thanks::For::AllTheFish).

It is also possible to switch to a different current package with such
statements as:

package Foo; # Current package is `Foo'
# One can impose hierarchy with double colons. This does not mean a search for 
# $Foo::Bar::Baz::suxorz will default to $Foo::Bar::suxorz, but one may
# traverse the hierarchy with the symbol name hashes (see below), and this
# scheme still offers useful organization for OO modules, for instance.
package Foo::Bar::Baz;
package main; # Revert to `main' package

All code Perl compiles will fall within this current package by default (that
is, unless another explicit package name is specified). The declaration lasts
through to the end of the current unit of compilation, that is, the enclosing
block, file, or eval STRING, or until another such declaration supersedes it.

The special literal `__PACKAGE__' evaluates to a string containing the name of
the current package.
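Here is a small sketch of the rules above. `Foo' and its contents are hypothetical. A `package' statement inside a bare block ends with that block, `__PACKAGE__' reports the current package, and special names like @ARGV still resolve to `main' from inside another package.

```perl
use strict;
use warnings;

{
    package Foo;                         # current package is Foo until the block ends
    our $name = 'Foo';                   # really $Foo::name
    sub whoami { return __PACKAGE__ }    # __PACKAGE__ is 'Foo' here

    # Special names such as @ARGV, %ENV and STDIN are always forced into main
    our $argv_is_mains = ( \@ARGV == \@main::ARGV );
}

# Back in main: the package declaration ended with the enclosing block
print __PACKAGE__, "\n";      # main
print $Foo::name, "\n";       # Foo
print Foo::whoami(), "\n";    # Foo
```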

**** Typeglobs

Typeglobs represent individual symbol table entries, and it is of technical
import to note that they are a huge pain in the ass. As scalars have the magic
character of the dollar sign, and arrays, the commercial 'at' sign, typeglobs
have the asterisk. Perhaps this is not a coincidence; they are the equivalent
of C pointers: confusing, loaded with semantics that appear mysterious at
first, and usually remain mysterious afterwards. They will happily be removed
with the Messianic coming of Perl 6.

***** Aliasing

One of the most basic (and indeed, one of the most common) uses of typeglobs
is to alias one symbol to another; these symbols may even span across
packages. Ex:

# Under `use strict', `local *glob' is fine (strict vars ignores typeglobs),
# but $glob itself would still need a declaration such as `our $glob' or a
# package qualifier. Hmm...dumb... so strict is simply switched off here.

no strict;

$Foo::bar = "baz";
{ 
  local *glob = *Foo::bar;
  print $glob;
}

Assigning a glob to a glob aliases a symbol table entry. All references to any
compartment of the aliased glob refer back to the original location; so
`$glob' is really the value of `$Foo::bar'; `%glob' is likewise `%Foo::bar',
and so forth. This is the basic principle underlying the Export mechanism.

Notice as well that this does not work with lexical variables, as they do not
live in any symbol table. Therefore, statements such as `my *foo;' will not
work.
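This sketch shows that a glob-to-glob alias covers every slot, not just the scalar. It declares the alias names with `our' rather than turning strict off. The package `Foo' and the names are hypothetical.

```perl
use strict;
use warnings;

# Declaring the names with `our' satisfies strict vars; the alias itself is
# made at run time by the glob assignment below
our ( $glob, @glob );

$Foo::bar = 'baz';
@Foo::bar = ( 1, 2, 3 );

{
    local *glob = *Foo::bar;   # every slot of *glob now shares *Foo::bar's
    $glob = 'changed';         # writes through to $Foo::bar
    push @glob, 4;             # ...and to @Foo::bar
}

# Leaving the block restores the old *glob, but the writes to Foo::bar stick
print "$Foo::bar @Foo::bar\n";   # changed 1 2 3 4
```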

****** Limited Aliasing

One can restrict aliasing to a given type using statements of the form:

local *foo = \$bar; # or just `*foo'; Emacs outlining doesn't like it!!

$foo now points to $bar, but @foo does /not/ point to @bar. (I think this
syntax makes little sense, too.)

***** Defining Constants

``Another use of symbol tables is for making "constant" scalars.

  *PI = \3.14159265358979;

Now you cannot alter $PI, which is probably a good thing all in all. This
isn't the same as a constant subroutine, which is subject to optimization at
compile-time. A constant subroutine is one prototyped to take no arguments
and to return a constant expression. See perlsub for details on these. The
"use constant" pragma is a convenient shorthand for these.''
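This quick check confirms the claim: assigning to the read-only $PI dies with a "Modification of a read-only value attempted" error, which `eval' catches. The `our' line is only there to keep strict vars happy.

```perl
use strict;
use warnings;

our $PI;                        # declare the name for strict vars
*PI = \3.14159265358979;        # the SCALAR slot now holds a read-only constant

my $ok = eval { $PI = 3; 1 };   # try to clobber it
print $ok ? "changed!\n" : "refused: $@";
```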

***** Storing Typeglobs in Scalars

This is of limited use these days, but I'll cover it for old times' sake.
Before Perl 5.6, it was necessary to do things like this to limit a filehandle
to a given lexical or dynamic scope:

sub newopen {
  my $path = shift;
  local *FH;          # not my() nor our()
  open(FH, $path) or return undef;
  return *FH;         # not \*FH!
}
$fh = newopen('/etc/passwd');

Or:

my $fh = do { local *FH };
open $fh, "/etc/passwd" or die "Can't open /etc/passwd: $!";

Typeglobs can be stored in scalars, effectively limiting their scope to that
of the container variable. The syntax is a little different when dereferencing
them, though:

my $foo = *main::bar; # $foo can be lexical! It's only /holding/ a glob
$main::bar = "Frobnitz";
# The inner $ fetches the glob stored in $foo; the outer $ dereferences that
# glob's scalar slot, i.e. $main::bar
print $$foo;

**** Symbol Table Hashes

Perl exposes the symbol table of each package (where package variables live;
lexicals never appear there) as a hash visible to the user; changes in these
hashes reflect in the symbol table itself. Such a hash is addressed with the
package name followed by two colons. Ex:

# Shows all the variables in the `main' package. Because `main' is the default
# package for variable look-up, it is also possible to use `%::'.
print for keys %main::;

 */
 *stderr
 *utf8::
 *"
 *CORE::
 *DynaLoader::
 *stdout
 *attributes::
 *
 *stdin
 *ARGV
 *INC
 *_<-e
 *ENV
 *Regexp::
 *UNIVERSAL::
 *$
 *_<perlio.c
 *main::
 *-
 *_<perlmain.c
 *PerlIO::
 *_<universal.c
 *0
 *@
 *_<xsutils.c
 *STDOUT
 *IO::
 *
 *_
 *+
 *STDERR
 *Internals::
 *STDIN
 *DB::
 *<none>::

Note that `main::' is a key in `%main::'. It's turtles all the way down, so be
careful if you write code to traverse the symbol table. The values of such a
hash are 'typeglobs':

print for values %main::;

 *main::/
 *main::stderr
 *main::utf8::
 *main::"
 *main::CORE::
 *main::DynaLoader::
 *main::stdout
 *main::attributes::
 *main::
 *main::stdin
 *main::ARGV
 *main::INC
 *main::_<-e
 *main::ENV
 *main::Regexp::
 *main::UNIVERSAL::
 *main::$
 *main::_<perlio.c
 *main::main::
 *main::-
 *main::_<perlmain.c
 *main::PerlIO::
 *main::_<universal.c
 *main::0
 *main::
 *main::@
 *main::_<xsutils.c
 *main::STDOUT
 *main::IO::
 *main::
 *main::_
 *main::+
 *main::STDERR
 *main::Internals::
 *main::STDIN
 *main::DB::
 *main::<none>::

An even better example:

#!/usr/bin/perl -w
# use strict;

# define some things
$pibble = 2;
@foo = ( 1, 4 );
$foo = 'bar';
%foo = ( key => 'value' );
%bits = ( me => 'tired' );

sub my_sort { return ( $a cmp $b ) }

print "This program contains...\n";

while ( my ( $key, $value ) = each %main:: ) {
# iterate over the key/value pairs of the symbol table hash
  local *symbol = $value;
  # this assigns the value from the symbol table to a typeglob
  # these lines look to see if the typeglob contains 
  # a $, %, @ or & definition
  if ( defined $symbol ) {
    print "a scalar called \$$key\n";
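    # \$$key is just an escaped $
    # followed by the contents of variable $key
  }

  # `defined @array' and `defined %hash' are deprecated (and fatal since
  # Perl 5.22), so just test whether the aggregate is non-empty
  if ( @symbol ) {
    print "an array called \@$key\n";
  }

  if ( %symbol ) {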
    print "a hash called \%$key\n";
  }

  if ( defined &symbol ) {
    print "a subroutine called $key\n";
  }
}

a hash called %ENV
a scalar called $pibble
a scalar called $_
a hash called %UNIVERSAL::
a scalar called $foo
an array called @foo
a hash called %foo
a scalar called $$
...
 
In fact, these two statements are nearly identical:

local *sym = *main::variable;
local *sym = $main::{"variable"};

...except that the first is more efficient, as it is evaluated at compile
time, rather than run time. It will also /create/ the original glob if
necessary. I am under the impression that using the symbol table hashes is
best for traversal and other purposes not attainable through the normal
syntax.
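Traversal is indeed where the symbol table hashes shine. This sketch walks a hypothetical package's stash and keeps the names that have a subroutine defined. The name lookup is a symbolic reference, so strict refs is relaxed inside a small `do' block.

```perl
use strict;
use warnings;

# A hypothetical package to inspect
{
    package Hypothetical;
    sub alpha { 'a' }
    sub beta  { 'b' }
    our $gamma = 'not a sub';
}

# Walk %Hypothetical:: and keep the names whose CODE slot is defined
my @subs = do {
    no strict 'refs';
    sort grep { defined &{"Hypothetical::$_"} } keys %Hypothetical::;
};

print "@subs\n";   # alpha beta
```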

** Lesson 5

*** `open'

**** Basic Form

open FILEHANDLE, FILE

...where FILEHANDLE is just that, and FILE is an expression evaluating to a
string, which in turn indicates a filename. (I suppose FILEHANDLE could be an
expression as well, but I've never seen it done.) By default, Perl opens the
file for reading only.

Actually, as of Perl 5.6, the 'filehandle' can be a scalar; Perl will
autovivify it with a reference to an anonymous typeglob, greatly unifying
management of files with the normal scoping protocol. (Filehandles /as such/
live in their own global namespace, a rough edge Perl 6 will sand down.)

Because Perl is implemented on diverse platforms, the designers settled on the
standard of directories and files delimited with Unix-style forward slashes;
this is portable across operating systems, although there remains the issue of
the drive letter. It is still possible to do something like:
"C:\\autoexec.bat", which may be necessary if you intend to pass the filename
to an external utility. It is probably best to retain forward slashes until
conversion is necessary and use the substitution operator: s#/#\\#g.

Once you have finished with a filehandle, close it with `close', intuitively
enough.

**** `open' Modes

Anyone who is familiar with Unix shell programming will feel right at home. Ex:

open OUTPUT, ">C:/copied.bat" or die "Can't open C:/copied.bat for writing $!\n";
open READ, "<C:/autoexec.bat"; # explicit < for reading
open READ, "<", "C:/autoexec.bat"; # three argument version is safer
open WRITE, ">C:/autoexec.bat"; # open for writing with >
open WRITE, ">", "C:/autoexec.bat";
open APPEND, ">>C:/autoexec.bat"; # open for appending with >>
open APPEND, ">>", "C:/autoexec.bat";
open READ, "C:/autoexec.bat"; # perl will assume you mean 'for reading' otherwise

# Magical piping fun!
open(PRINTER, "| lpr -Plp1")    || die "can't run lpr: $!";
print PRINTER "stuff\n";
close(PRINTER)                  || die "can't close lpr: $!";
open(NET, "netstat -i -n |")    || die "can't fork netstat: $!";
while (<NET>) { }               # do something with input
close(NET)                      || die "can't close netstat: $!";

The reason the three-argument version is considered safer is that
user-specified filenames cannot override the mode. For instance:

open my $fh, scalar <STDIN>;
print $fh "Foo!";

...and the user types '>foo'. Well, that wipes out whatever was in `foo'
earlier. With the three-argument form, on the other hand, the worst you will
end up with is a file literally named '>foo'...
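Here is a sketch of the safe case. The program fixes the mode, so a "hostile" name is just a name. It works inside a temporary directory made by File::Temp, so nothing real gets touched, and it assumes a Unix-like filesystem that allows `>' in filenames.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Work in a throwaway directory so nothing real gets clobbered
my $start = getcwd();
my $dir   = tempdir( CLEANUP => 1 );
chdir $dir or die "Can't chdir to $dir: $!";

# Pretend this name came from the user, mode characters and all
my $name = '>foo';

# Three-argument open: the program fixes the mode, so the name is taken literally
open my $fh, '>', $name or die "Can't write '$name': $!";
print {$fh} "Foo!\n";
close $fh or die "Can't close '$name': $!";

my $made_literal = -e '>foo';    # true: a file really named ">foo"
my $made_foo     = -e 'foo';     # false: no plain "foo" was touched
chdir $start or die "Can't chdir back to $start: $!";

print $made_literal ? "created '>foo'\n"  : "no '>foo'\n";
print $made_foo     ? "clobbered 'foo'\n" : "'foo' untouched\n";
```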

***** Filehandle Names

By convention, bareword filehandles are written in capitals, usually `FH' and
`LOG', though CamelCase would still distinguish them from Perl's lowercase
keywords, as Perl is fully case-sensitive. This unofficial standard applies,
of course, only if you are not using IO::Handle objects or anonymous glob
references, available even in only semi-recent versions of Perl 5.

The filehandles STDIN, STDOUT, and STDERR are predefined, and I'm sure you can
guess what they mean. They can be closed and redefined as you please. Ex:

close STDERR;
open STDERR, ">>errors.log";

*** Exception Handling Idiom

open my $file, "C:/autoexec.bat" or die "Can't open C:/autoexec.bat for reading $!";

**** Basic Principle

`or' is the ultra-low-precedence Boolean `or' operator. Because it
short-circuits, it can serve as a terse control structure: if `open'
succeeds, it returns a true value, and Perl will not waste any time evaluating
the right-hand side. If it does not succeed, Perl moves on to evaluate the
right-hand side, `die', which (outside an `eval') terminates the program. In
void context the Boolean result itself is thrown away, so the statement is
used purely for this control flow, a great substitute for a full-fledged
if/else. The low precedence matters: it makes `or' apply to the whole `open'
call, whereas `||' would bind to the last argument instead.
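This sketch shows why `or' rather than `||' is the idiom. The path is hypothetical and assumed not to exist. Each attempt runs inside `eval', so we can see whether `die' was ever reached.

```perl
use strict;
use warnings;

my $missing = '/no/such/dir/file.txt';   # hypothetical path, assumed not to exist

# Buggy: || binds tighter than the comma, so it applies to $missing alone.
# $missing is a true string, so die is never reached and the failure goes unnoticed.
my $buggy = eval { open( my $fh, '<', $missing || die "unreachable\n" ); 1 };

# Correct: `or' binds more loosely than the whole open(...) call
my $correct = eval { open( my $fh, '<', $missing ) or die "Can't open $missing: $!\n"; 1 };

print $buggy   ? "buggy version carried on\n"   : "buggy version died\n";
print $correct ? "correct version carried on\n" : "correct version died: $@";
```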

**** `die'

The manual does such an excellent job of explaining this that I won't even
bother to paraphrase it:

``Outside an "eval", prints the value of LIST to "STDERR" and exits with the
current value of $! (errno). If $! is 0, exits with the value of "($? >> 8)"
(backtick `command` status). If "($? >> 8)" is 0, exits with 255. Inside an
"eval()," the error message is stuffed into $@ and the "eval" is terminated
with the undefined value. This makes "die" the way to raise an exception.''
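This sketch illustrates the `eval' half of that description. It also shows the detail behind the hint below: a message without a trailing newline gets " at FILE line N." appended.

```perl
use strict;
use warnings;

# Inside eval, die doesn't exit: the message goes into $@ and eval returns undef
my $result = eval { die "boom\n"; 1 };
print defined $result ? "no error\n" : "caught: $@";    # caught: boom

# Without a trailing newline, die appends " at FILE line N."
eval { die "no newline" };
my $located = $@;
print $located;
```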

`warn' does much the same thing, but does not end the program, and is
therefore not responsible for setting an exit value and all that other jazz.

***** Useful Advice

``Hint: sometimes appending ", stopped" to your message will cause it to
make better sense when the string "at foo line 123" is appended. Suppose you
are running script "canasta".''

Beautiful.

**** $!

This is the magic punctuation variable that signifies either an error message
or just an `errno' from a system or library call, depending on whether it is
used as a string or a number. It is /only/ useful immediately after an
unsuccessful system call. Ex:

if (open(FH, $filename)) {
    # Here $! is meaningless.
    ...
} else {
    # ONLY here is $! meaningful.
    ...
    # Already here $! might be meaningless.
}

# Since here we might have either success or failure,
# here $! is meaningless.

*** The Angle Operator

Internally, this is the `readline' function, which is also visible to the
user. In scalar context, `readline' or the angle operator (more formally known
as the line input operator) takes an expression that somehow evaluates to a
filehandle and reads a 'line' from it, that is, up to and including $/, the
input record separator. More specifically, the expression representing a
filehandle can evaluate to a glob, such as `*STDIN' (this is actually only
useful with the `readline' function proper), filehandle (just `STDIN', for
instance), or a reference that indicates a filehandle indirectly (such as
`$fh'). In list context, this slurps the rest of the file and returns it as a
list of lines. If $/ is undefined, the whole file is read in as a single
'line'. Lastly, upon reaching EOF, `readline' returns `undef' in scalar
context (and an empty list in list context).

**** Line Input Operator in `while' Constructs

while (<$fh>) {
  ...
}

is shorthand for:

while (defined($_ = <$fh>)) {
  ...
}

``The $_ variable is not implicitly localized. You'll have to put a
"local $_;" before the loop if you want that to happen.'' Ex:

while (local $_ = <STDIN>) {
  # Perl will still automatically use a `defined' test; the method of
  # assignment is not important, just that there is exactly one expression
  # that assigns a line from a filehandle to `$_'
}

Note that Perl uses the `defined' test to avoid premature breaking when a line
contains, for example, "0" or nothing ("").
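This sketch makes that concrete. It uses an in-memory filehandle (opening a reference to a scalar, available since Perl 5.8) whose last line is a lone "0". The implicit `defined' test reads it. A hand-rolled truth test stops one line early.

```perl
use strict;
use warnings;

my $data = "first\n0";    # the last "line" is 0 with no trailing newline

# The implicit defined() test keeps going through the final "0"
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my @with_defined;
while (<$fh>) { push @with_defined, $_ }

# A plain truth test stops early, because "0" is false
open my $fh2, '<', \$data or die "Can't open in-memory file: $!";
my @truthy;
while (1) {
    my $line = <$fh2>;
    last unless $line;
    push @truthy, $line;
}

print scalar(@with_defined), " lines vs ", scalar(@truthy), "\n";   # 2 lines vs 1
```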

**** Use of the Null Filehandle

The null filehandle, affectionately referred to as the 'diamond operator',
abstracts files given on the command line into a uniform stream, emulating the
behavior of `sed' and `awk'. More accurately, it looks at the current value of
@ARGV when first invoked. If there are no values in @ARGV, it sets the value
of $ARGV[0] to '-' (standard input), effectively acting like <STDIN>.

*** `print' and Explicit Filehandles

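print FH LIST. No comma between the FH and the LIST. Without an explicit
filehandle, `print' writes to the currently selected output handle, which is
STDOUT unless changed with `select'; so by default it runs like
`print STDOUT "foo $bar baz..."'.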
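Here is a sketch of both ways to aim `print' somewhere other than STDOUT. It writes to an in-memory handle so the result can be inspected. The braces around a lexical handle make the no-comma syntax unambiguous, and `select' changes the default handle until it is restored.

```perl
use strict;
use warnings;

my $buffer = '';
open my $out, '>', \$buffer or die "Can't open in-memory file: $!";

print {$out} "explicit handle\n";   # braces mark $out as the filehandle (no comma!)

my $previous = select $out;         # make $out the default output handle
print "default handle\n";           # plain print now goes to $out
select $previous;                   # restore the old default (normally STDOUT)

close $out or die "Can't close: $!";
print $buffer;
```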

*** `system'

From the manual, which again does such an excellent job of explaining things
that I don't even care to write it in my own words:

``Does exactly the same thing as "exec LIST", except that a fork is done
first, and the parent process waits for the child process to complete. Note
that argument processing varies depending on the number of arguments. If there
is more than one argument in LIST, or if LIST is an array with more than one
value, starts the program given by the first element of the list with
arguments given by the rest of the list. If there is only one scalar argument,
the argument is checked for shell metacharacters, and if there are any, the
entire argument is passed to the system's command shell for parsing (this is
"/bin/sh -c" on Unix platforms, but varies on other platforms). If there are
no shell metacharacters in the argument, it is split into words and passed
directly to "execvp", which is more efficient.''

`system' is a very unusual call, because its return value, which is the wait
status as reported by `wait(2)' (also stored in $?), or -1 if the command
could not be started, does not correspond to Perl's notion of true and false.
This 16-bit value contains the actual exit code in the high byte (0 for normal
execution), and in the low byte the number of the signal, if any, that killed
the process, plus a core-dump flag (again, 0 for normal execution). In effect,
normal execution is a false value according to Perl! So clauses like this will
/not/ work correctly:

system "ls" or die "ls not found!";

...and you want this instead:

system("ls") == 0 or die "ls failed: $?";
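This sketch decodes the status word. It uses $^X (the path of the running perl) as a portable child process that exits with a known code, and the list form of `system', so no shell is involved.

```perl
use strict;
use warnings;

# The child is just this same perl, exiting with status 3
my $status = system( $^X, '-e', 'exit 3' );

if ( $status == -1 ) {
    print "failed to start: $!\n";
}
elsif ( $status & 127 ) {
    printf "killed by signal %d\n", $status & 127;
}
else {
    printf "exited with %d\n", $status >> 8;   # exited with 3
}

# The idiomatic success check compares against zero instead of relying on truth
system( $^X, '-e', 'exit 0' ) == 0 or die "command failed: $?";
```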

*** $^O

This variable is a string that tells you what Perl thinks your operating
system is. You type its name as two characters, a caret and a capital O, not
as a literal Control-O. Ex:

print $^O;
linux

*** Backticks

Backticks in Perl act an awful lot like they do in the shell; Perl executes
the command with the shell, so that wildcards, shell variables and the like
will be recognized, and then returns the output. More specifically, the
command is treated as a double-quoted string, allowing the interpolation of
Perl variables and escape sequences. In scalar context, backticks capture the
command's output into a single string; in list context, into a list split on
the current value of $/ (although no chomping is done).

Backticks only capture standard output, so you'll have to use shell
redirection operators to discard or capture standard error. Ex:

my @full = qx#ls foo 2>&1#;   # stdout and stderr lines, newlines intact
chomp @full;
print "got: $_\n" for @full;

The backtick operator also has the generic `qx' form, which behaves otherwise
like the other quoting operators. Use of ' as a delimiter suppresses Perl's
double-quoted interpretation of the string. Ex:

$perl_info  = qx(ps $$);            # that's Perl's $$
$shell_info = qx'ps $$';            # that's the new shell's $$

*** Directory Handling

**** `chdir'

`chdir' changes the working directory to its single string argument; if none
is given, it tries $ENV{HOME} instead. `getcwd' in POSIX or Cwd will return
the current directory. Cwd's `cwd' effectively does the same, but may call the
external `pwd' command to do it.

**** `opendir'

`opendir' behaves in a manner very similar to `open'. Directory handles live
in a kind of namespace of their own altogether, so one can have a filehandle
/and/ a dirhandle named FOOBAR. Here again, one can (and probably should) use
autovivified scalars.

**** `readdir'

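This acts like `readline' on a dirhandle. It is not, however, usable with the
angle brackets, unfortunately. Before Perl 5.12, which made a bare
`while (readdir $dir)' test for definedness and assign to $_ implicitly, this
forced us to use ugly constructs like this: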

while (defined($_ = readdir $dir)) {
  ...
}

**** `rewinddir'

Sets the position at the beginning of the dirhandle. No caveats!
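This sketch strings `opendir', `readdir', `rewinddir' and `closedir' together. It works on a temporary directory from File::Temp, and the file names are made up. The entry count assumes a Unix-like system, where `.' and `..' are listed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A throwaway directory with two empty files in it
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(a.txt b.txt)) {
    open my $fh, '>', "$dir/$name" or die "Can't create $dir/$name: $!";
    close $fh;
}

opendir my $dh, $dir or die "Can't opendir $dir: $!";

# Explicit defined() so that an entry named "0" wouldn't end the loop
my @names;
while ( defined( my $entry = readdir $dh ) ) {
    next if $entry eq '.' or $entry eq '..';
    push @names, $entry;
}

rewinddir $dh;                  # back to the start of the listing
my $total = () = readdir $dh;   # list context: every entry, . and .. included
closedir $dh;

print join( ', ', sort @names ), " ($total entries in all)\n";
```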

*** Argument Processing

As you have seen before, `<>' will iterate over the files named in `@ARGV'.
`@ARGV' is the array containing the command-line arguments given to your
script (not perl's own switches, nor the script name; $0 holds the name of the
script itself). When reading from <>, $ARGV, the scalar, holds the current
filename.
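This sketch fills @ARGV by hand, standing in for real command-line arguments. It then lets `<>' read through the named files in turn while $ARGV tracks which one is current. The two input files are throwaway temporaries with made-up contents.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Two throwaway input files
my $dir = tempdir( CLEANUP => 1 );
my %contents = ( one => "alpha\nbeta\n", two => "gamma\n" );
for my $name ( sort keys %contents ) {
    open my $fh, '>', "$dir/$name" or die "Can't write $dir/$name: $!";
    print {$fh} $contents{$name};
    close $fh or die "Can't close $dir/$name: $!";
}

# Stand-in for real command-line arguments
@ARGV = map { "$dir/$_" } sort keys %contents;

my @seen;
while ( my $line = <> ) {
    chomp $line;
    push @seen, "$ARGV: $line";    # $ARGV is the file currently being read
}

print "$_\n" for @seen;
print "script: $0\n";              # $0 is the script name, not an argument
```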

*** File Test Operators

These are essentially identical to the shell file test operators, and are
documented under perldoc -f -X. A few commonly used ones (or at least ones I
find myself using often):

-e: file exists
-f: file is a plain file
-d: file is a directory
-l: file is a symbolic link
-x: file is executable
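Here is a short sketch using three of these tests on a temporary directory, a plain file inside it, and a path that doesn't exist. The `describe' helper is hypothetical. File tests bind more tightly than `?:', so the chained ternary needs no parentheses.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/plain.txt";
open my $fh, '>', $file or die "Can't create $file: $!";
close $fh;

# Hypothetical helper: classify a path with -d, -f and -e
sub describe {
    my $path = shift;
    return -d $path ? 'directory'
         : -f $path ? 'plain file'
         : -e $path ? 'something else'
         :            'missing';
}

printf "%-12s %s\n", describe($_), $_ for $dir, $file, "$dir/nope";
```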

*** perldoc

Perl has its own documentation format, POD (Plain Old Documentation), which is
way more spiffy and modern than the archaic troff, and displays in diverse
media; it even supports hyperlinking. The premier reader for POD
documentation, perldoc, does not support this feature, but is awfully handy
anyway. Ex:

perldoc -f sort # Info on the `sort' function; # comments work for shell, too!
perldoc perl    # Directory of information
perldoc POSIX   # POSIX module documentation

** Lesson 6

*** Regular Expressions

I already have the basics down, having worked with sed, grep, locate, and
(gasp!) awk for years. All I need to document are Perl-specific elements.

**** `m//'

***** Basic Form

In its most terse expression, the match operator need not bind to any
expression, nor include the initial 'm'. Here is one such instance of the
match operator:

/^http:/ and print while (<>);

As you might have already guessed, an unqualified pair of forward slashes can
delimit a pattern, and this pattern will act upon $_ by default. You will also
notice that this basic form of pattern matching returns true or false
depending on the success of the match.

The pattern match operator may be written out in full, with the leading `m',
as with other quote-like operators; as in other cases, this lets you pick a
different delimiter, which is useful when the pattern itself contains slashes:

m{^/opt/plt} and print while (<>);

As with `qx', patterns given to `m//' will be treated as though they were
double-quoted (variables interpolate) unless ' is used as a delimiter. A
related operator, `qr' (quote regex), compiles a pattern into a regex object
for repeated use; the object can be interpolated wherever `m//' or the
pattern-matching left side of `s///' expects a pattern. Ex:

my $pat = qr/Hanumizzle/;
my $text = "Hanumizzle soared through the clouds to distant Lanka.";
$text =~ s/$pat/Hanuman/g;

`m//', or just `//', is shorthand for the last successfully-matched pattern.
This may even be used with the substitution operator (see below), such as in
this snippet:

# inspired by :1,$g/fred/s//WILMA/
while (<>) {
    m?(fred)?    && s//WILMA $1 WILMA/;
    m?(barney)?  && s//BETTY $1 BETTY/;
    m?(homer)?   && s//MARGE $1 MARGE/;
} continue {
    print "$ARGV $.: $_";
    close ARGV  if eof();           # reset $.
    reset       if eof();           # reset ?pat?
}

***** `??'

...is what I said the first time I saw this. If the match operator is written
`m??', the regex matches only once, and does not match again until a call to
`reset' (no arguments). (The bare `??' form without the `m' was deprecated,
and as of Perl 5.22 it is gone.) Programming Perl gives an excellent example
of its use:

open DICT, "/usr/dict/words" or die "Can't open words: $!\n";
while (<DICT>) {
  $first = $1 if m?(^neur.*)?;
  $last  = $1 if /(^neur.*)/;
}

print $first,"\n";          # prints "neurad"
print $last,"\n";           # prints "neurypnology"

***** `m//' Specific Modifiers

g: in list context, repeat the match until the string is consumed, then return
   a list of all matches (or of all captured groups, if the pattern has any);
   in scalar context, do progressive matching, updating pos() for the string
   and allowing use of the `\G' metacharacter, which anchors where the last
   match left off, allowing context-sensitive matching and other tricks
c: do not reset pos() after failed match

Ex of `/g':

my $x = "cat dog house";
while ($x =~ /(\w+)/g) {
  print "Word is $1, ends at position ", pos $x, "\n";
}

prints

    Word is cat, ends at position 3
    Word is dog, ends at position 7
    Word is house, ends at position 13
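This sketch covers the other faces of `/g'. In list context it returns whole matches, or all captures flattened. In scalar context, combined with `\G' and `/c', it drives a tiny tokenizer that resumes exactly where the previous match stopped. The input strings and token names are made up.

```perl
use strict;
use warnings;

# List context, no capturing groups: every whole match
my @words = "cat dog house" =~ /\w+/g;            # (cat, dog, house)

# List context with groups: all captures, flattened, so pairs fill a hash
my %pairs = "a=1, b=2, c=3" =~ /(\w+)=(\d+)/g;    # (a => 1, b => 2, c => 3)

# Scalar context with \G and /c: resume where the last match left off,
# and keep pos() when an alternative fails
my $src = "12+34";
pos($src) = 0;
my @tokens;
while ( pos($src) < length $src ) {
    if    ( $src =~ /\G(\d+)/gc ) { push @tokens, "NUM($1)" }
    elsif ( $src =~ /\G(\+)/gc )  { push @tokens, "OP($1)" }
    else  { die "unexpected character at position ", pos($src), "\n" }
}
print "@tokens\n";    # NUM(12) OP(+) NUM(34)
```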

**** Binding

Any expression evaluating to a string can be bound to the match operator with
`=~'; as a match does not modify the operand string, this may be something
like an anonymous string returned from a function. `s///' and `tr///' bind the
same way, but since they modify their operand, it must be something
assignable, such as a variable. The syntax is nearly identical to the shell
operator of the same function:

print $test if $test =~ /pattern/;

There exists a convenient inverted form, `!~', which is a little easier than
writing the not (!) operator before the pattern.

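Finally, it is important to observe that the binding operators, `=~' and
`!~', rank higher in precedence than, say, assignment, so copying a string and
then modifying the copy takes parentheses:

($foo = $bar) =~ s/foo/bar/g;

...which assigns the value of $bar to $foo, then binds the substitution
operator to the new value of $foo (remember, a parenthesized assignment yields
the variable itself). Without the parentheses, the substitution would modify
$bar, and $foo would receive the number of substitutions.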
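This sketch shows both versions side by side, so the precedence difference is visible in the results. The strings are arbitrary.

```perl
use strict;
use warnings;

my $bar = "foo food";

# Parenthesized assignment first, then the substitution works on the copy
( my $foo = $bar ) =~ s/foo/bar/g;    # $foo is "bar bard"; $bar is untouched

# Without parentheses, =~ binds first: $other is modified in place and
# $count receives the number of substitutions instead of a string
my $other = "foo food";
my $count = $other =~ s/foo/bar/g;    # $count is 2, $other is "bar bard"

print "$bar | $foo | $other | $count\n";
```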

**** `s///'

`s///' is the substitution operator, akin to the command of the same name in
`sed' and `vi'. However, one vital difference that trips up users of more
traditional utilities is that the right-hand side of the operation should use
the $1..$n variables instead of backreferences, because it is technically
outside of the pattern match proper. (Remember, the replacement half of
`s///' is treated as a double-quoted string, with all that implies.) The
substitution operator harbors the significant advantage of total flexibility
in quoting; if the pattern uses bracketing delimiters, the replacement can even
use a different style. Ex:

$foo =~ s{/usr/bin}#/usr/local/bin#g

The return value of the substitution operator does not rely on context; it is
the number of times the action succeeded, or false if there was no match. This
operator is most commonly used for its side effects, though.

***** `s///' Specific Modifiers

g: replace pattern globally throughout a string
e: evaluate right-hand side in replacement

#!/usr/bin/perl -w
use strict;

my $string = "2 3 4 5 6";
$string =~ s/ (\d+) / 2 * $1 /xge; # double every number you match

print $string;

4 6 8 10 12

**** Common Modifiers

x: ignore whitespace that is not backslashed or in a character class, permits
   comments as well
i: case-insensitive searching (not valid for most writing systems :)
m: interpret operand string as multi-lined; `^' and `$' will anchor at
   newlines within the string; \A and \Z still match the absolute ends  (`$/'
   apparently does not apply!!)
s: `.' matches newline as well; `^' and `$' keep their default meaning (the
   absolute beginning and end of the string)
o: compile the regex only once, even if it interpolates variables

`s' and `m' used together do not refer to sexual perversions, but instead to a
hybrid behavior where `.' behaves as in `s', and `^' and `$' may anchor at the
absolute extremities of the string /or/ at newlines.

$x = "There once was a girl\nWho programmed in Perl\n";

$x =~ /^Who/;   # doesn't match, "Who" not at start of string
$x =~ /^Who/s;  # doesn't match, "Who" not at start of string
$x =~ /^Who/m;  # matches, "Who" at start of second line
$x =~ /^Who/sm; # matches, "Who" at start of second line

$x =~ /girl.Who/;   # doesn't match, "." doesn't match "\n"
$x =~ /girl.Who/s;  # matches, "." matches "\n"
$x =~ /girl.Who/m;  # doesn't match, "." doesn't match "\n"
$x =~ /girl.Who/sm; # matches, "." matches "\n"

**** Character Classes

\w: word characters, that is letters, underscores, and numbers, admitting the
influence of `use locale' and Unicode support
\d: digit characters (under Unicode rules, this includes other scripts' digits,
such as Devanagari ek, do, tin, etc.; the /a modifier restricts it to 0-9)
\s: whitespace

Capitalizing the letter will invert the class (e.g., \D matches all
non-digits).
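Here is a quick check of the Unicode behavior of `\d'. The string spells forty-two (४२) with DEVANAGARI DIGIT FOUR and DEVANAGARI DIGIT TWO. The /a modifier used for comparison needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Forty-two written with DEVANAGARI DIGIT FOUR and DEVANAGARI DIGIT TWO
my $devanagari = "\x{096A}\x{0968}";

my $unicode_rules = $devanagari =~ /^\d+$/  ? 1 : 0;   # 1: \d matches any Unicode digit
my $ascii_only    = $devanagari =~ /^\d+$/a ? 1 : 0;   # 0: /a restricts \d to [0-9]

print "default: $unicode_rules, /a: $ascii_only\n";
```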

**** Capturing

...can be a bit of a pain in the ass, until you know just what to do.
Submatches in Perl are very simply achieved with bare parentheses, no
backslashes needed. There are really three ways they are useful:

***** Submatch Variables

Perl will expose submatches through the scalar variables $1..$n; this can go
into the double digits and beyond. ($0, as you know, is reserved for the
script name.) These remain available until the next successful match
overwrites them. One of my favorite idioms goes:

perl -wnle 'print $1 if /foo bar (baz)/;'

...which prints the captured text if and only if the match occurred. This
essentially works like `grep' on crack, when exploiting the full power of Perl
regexps.

***** List Context

In list context, the match returns its captured groups (or, with /g, all of
its matches) in a flat list. There is (presently) no way to build complex
`thingies' from matches, but this is in the works for Perl 6. The only other
problem with this method is making sure the match *is in* list context:

my @content = split /, /, ($tag =~ m{content=.*?"(.*?)"}s); # No
my @content = split /, /, ($tag =~ m{content=.*?"(.*?)"}s)[0]; # Yes

The problem with the first statement is that split expects a string after the
pattern, putting the match in scalar context. This will just give me `1',
which is not what I want. The second form preempts this faulty interpretation
by taking a list slice of the match and selecting its first element, which
forces the match into list context, effectively 'beating Perl to the punch'.
(Is it possible to defeat Perl?)
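A runnable sketch of both list-context tricks, using a made-up tag string in place of whatever $tag held above: direct list assignment of the captures, the (...)[0] slice feeding split, and a /g match returning every capture.

```perl
use strict;
use warnings;

# Made-up tag, just for illustration.
my $tag = '<meta name="keywords" content="perl, regex, split">';

# List assignment puts the match in list context: ($1, $2) come back directly.
my ($name, $content) = $tag =~ /name="(.*?)" content="(.*?)"/;

# The list-slice trick: (...)[0] forces list context on the match, so split
# receives the captured string rather than the success flag "1".
my @keywords = split /, /, ($tag =~ m{content="(.*?)"})[0];

# With /g, a list-context match returns the captures from every match.
my @attrs = $tag =~ /(\w+)="/g;    # ("name", "content")

print "$name: @keywords\n";        # keywords: perl regex split
```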

***** Backreferences

These are for use within a pattern, actually (\1, \2, ...), although Perl will
begrudgingly allow you to use them in the replacement half of a `s///', as
with `sed' and other inferior programs (with `use warnings' on, Perl will
complain that "\1 better written as $1"). You are really better off doing
this, though:

my $name = "Hanumizzle Vanara";
$name =~ s/(\w+) (\w+)/$2, $1/;
print $name;   # prints "Vanara, Hanumizzle"
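And here's a backreference doing its real job, inside the pattern. This sketch (sample sentence made up) finds words accidentally typed twice, then removes the duplicates using $1 in the replacement.

```perl
use strict;
use warnings;

my $text = "This is the the example with a doubled doubled word.";

# \1 inside the pattern means "whatever group 1 just matched", so this
# finds a word immediately repeated after some whitespace.
my @doubled;
while ($text =~ /\b(\w+)\s+\1\b/g) {
    push @doubled, $1;
}

# In the replacement half, use $1 rather than \1.
(my $fixed = $text) =~ s/\b(\w+)\s+\1\b/$1/g;

print "@doubled\n";    # the doubled
print "$fixed\n";      # This is the example with a doubled word.
```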

*** The Ternary Operator

This acts the same as in C and, being an ordinary expression, can be used
anywhere a value is expected, including `return' statements. (C allows this
too: `return x ? a : b;' is perfectly standard, so GCC should accept it.)

# Same as C, really
return /\w/ ? "OK" : "Does not contain any word characters!";

This is still a tiny, crippled subset of fully first class control structures
available in Lisp and Scheme, but let's pretend I didn't say that.
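A minimal runnable version of the `return' example, wrapped in a hypothetical `check' sub (the original tests $_; this one takes an argument). The second, equally hypothetical, `sign' sub shows chained ternaries reading like an if/elsif/else.

```perl
use strict;
use warnings;

# check is a hypothetical wrapper around the return statement above.
sub check {
    my ($s) = @_;
    return $s =~ /\w/ ? "OK" : "Does not contain any word characters!";
}

# sign is hypothetical too: nested ternaries, evaluated left to right.
sub sign {
    my $n = shift;
    return $n < 0 ? -1 : $n > 0 ? 1 : 0;
}

print check("abc"), "\n";    # OK
print check("!!!"), "\n";    # Does not contain any word characters!
```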

*** `split'

`split' takes zero, one, two (or three, but this third, which limits the
degree of splittage, is pretty arcane) arguments. The first argument, if any,
is the pattern on which to split, and the second is the operand string.
`split' will remove all instances of this pattern from the string and produce
a list of the values in between the matches; by default, `split' retains empty
leading fields, while omitting empty trailing fields. In scalar context,
`split' returns the number of fields so produced; Perls before 5.12 also
bunged the list into `@_' as a side effect. Because Perl uses `@_' for
function arguments, that side effect was understandably deprecated (and
produced a warning), and it is advised that you use `split' in list context,
which returns all the fields. (Besides, functional programming is cooler.)

If no operand string is given, `split' will follow good Perl protocol and
default to `$_'. If no pattern is given, `split' splits on whitespace,
omitting leading and trailing whitespace beforehand; this is frequently
desirable.

``A pattern matching the null string (not to be confused with a null pattern
"//", which is just one member of the set of patterns matching a null string)
will split the value of EXPR into separate characters at each point it matches
that way. For example:

print join(':', split(/ */, 'hi there'));

produces the output 'h:i:t:h:e:r:e'.

Using the empty pattern "//" specifically matches the null string, and is not
to be confused with the use of "//" to mean "the last successful pattern
match".''
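A quick tour of the behaviors above on made-up strings: leading empties kept, trailing empties dropped (unless a negative LIMIT is given), a positive LIMIT, the whitespace split (spelled ' ', which behaves like the no-pattern default), and the null pattern. `join' is used to make the empty fields visible.

```perl
use strict;
use warnings;

my $csv = ",a,,b,,";

my @kept  = split /,/, $csv;                 # ("", "a", "", "b"): trailing empties dropped
my @all   = split /,/, $csv, -1;             # negative LIMIT keeps them: 6 fields
my @two   = split /,/, "a,b,c,d", 2;         # LIMIT 2: ("a", "b,c,d")
my @ws    = split ' ', "  hello   world  ";  # awk-style: ("hello", "world")
my @chars = split //, "hi";                  # null pattern: ("h", "i")

print join("|", @kept), "\n";                # |a||b
print join(":", @chars), "\n";               # h:i
```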

*** `join'

`join' returns a string consisting of all arguments following the first
argument, joined by the first argument itself. It's that simple.

*** `grep'

I'll just pilfer the description from the documentation again:

This is similar in spirit to, but not the same as, grep(1) and its relatives.
In particular, it is not limited to using regular expressions.

Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to
each element) and returns the list value consisting of those elements for
which the expression evaluated to true. In scalar context, returns the number
of times the expression was true.

@foo = grep(!/^#/, @bar);    # weed out comments

or equivalently,

@foo = grep {!/^#/} @bar;    # weed out comments

Note that $_ is an alias to the list value, so it can be used to modify the
elements of the LIST. While this is useful and supported, it can cause bizarre
results if the elements of LIST are not variables. Similarly, grep returns
aliases into the original list, much as a for loop's index variable aliases
the list elements. That is, modifying an element of a list returned by grep
(for example, in a "foreach", "map" or another "grep") actually modifies the
element in the original list. This is usually something to be avoided when
writing clear code.
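A sketch of grep in both contexts, plus the aliasing behavior the manual warns about, on a made-up list of source lines. The last statement really does change @lines, which is exactly the kind of thing to keep out of clear code.

```perl
use strict;
use warnings;

my @lines = ("# comment", "code();", "", "# another", "more();");

# List context: the elements for which the block is true.
my @code = grep { !/^#/ && length($_) } @lines;   # ("code();", "more();")

# Scalar context: how many elements the block was true for.
my $count = grep { /^#/ } @lines;                 # 2

# grep returns aliases, so this strips "# " from the comment lines
# in @lines itself, not from copies.
s/^# // for grep { /^#/ } @lines;

print "$_\n" for @lines;
```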

*** `map'

Likewise for `map':

Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to
each element) and returns the list value composed of the results of each such
evaluation. In scalar context, returns the total number of elements so
generated. Evaluates BLOCK or EXPR in list context, so each element of
LIST may produce zero, one, or more elements in the returned value.

@chars = map(chr, @nums);

translates a list of numbers to the corresponding characters. And

%hash = map { getkey($_) => $_ } @array;

is just a funny way to write

%hash = ();
foreach $_ (@array) {
  $hash{getkey($_)} = $_;
}

Note that $_ is an alias to the list value, so it can be used to modify the
elements of the LIST. While this is useful and supported, it can cause bizarre
results if the elements of LIST are not variables. Using a regular "foreach"
loop for this purpose would be clearer in most cases. See also "grep" for an
array composed of those items of the original list for which the BLOCK or EXPR
evaluates to true.

"{" starts both hash references and blocks, so "map { ..." could be either the
start of map BLOCK LIST or map EXPR, LIST. Because perl doesn't look ahead for
the closing "}" it has to take a guess at which it's dealing with based on what it
finds just after the "{". Usually it gets it right, but if it doesn't it won't
realize something is wrong until it gets to the "}" and encounters the missing
(or unexpected) comma. The syntax error will be reported close to the "}" but
you'll need to change something near the "{" such as using a unary "+" to give
perl some help:

%hash = map {  "\L$_", 1  } @array  # perl guesses EXPR.  wrong
%hash = map { +"\L$_", 1  } @array  # perl guesses BLOCK. right
%hash = map { ("\L$_", 1) } @array  # this also works
%hash = map {  lc($_), 1  } @array  # as does this.
%hash = map +( lc($_), 1 ), @array  # this is EXPR and works!

%hash = map  ( lc($_), 1 ), @array  # evaluates to (1, @array)

or to force an anon hash constructor use "+{"

@hashes = map +{ lc($_), 1 }, @array # EXPR, so needs , at end

and you get a list of anonymous hashes, each with only one entry.
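A sketch of map producing one, two, and zero-or-one results per element, plus the `+{' form from the manual, on a made-up word list. Explicit `$_' arguments are used throughout to keep the BLOCK/EXPR guessing out of the picture.

```perl
use strict;
use warnings;

my @words = qw/Apple banana Cherry/;

# One result per element.
my @lengths = map { length($_) } @words;                  # (5, 6, 6)

# Two results per element: key/value pairs, ready for a hash.
my %seen = map { lc($_) => 1 } @words;                    # apple, banana, cherry

# Zero or one result per element: map can filter as it transforms.
my @long = map { length($_) > 5 ? uc($_) : () } @words;   # ("BANANA", "CHERRY")

# Unary plus makes each { } an anonymous hash, so this is map EXPR, LIST.
my @records = map +{ name => $_, len => length($_) }, @words;

print "$records[1]{name} has $records[1]{len} letters\n";  # banana has 6 letters
```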

** Lesson 7

*** References

When it came to handling complex, organized data, Perl 4 was a big pain in the
ass. Huge pain in the ass. As you know, the only valid datum type for arrays
and hashes is a scalar. This may be a string, integer, or even floating point,
but it does not solve the problem of hierarchical data. It is possible to do
symbolic evaluation of names (as in Vim scripting), but that's a desperate
hack. Lastly, there still existed the Perl-specific problem of array and hash
flattening in subroutine calls (or anywhere for that matter, but subroutines
in particular), which made it practically impossible to pass multiple mutable
data types into a function. Perl 5 introduced 'hard references', which equate
roughly to C pointers, though Perl safeguards the user from nasty segfaults
(woo hoo!) and various other memory violations associated with, say, C and
C++.

**** Forming References

There are essentially two ways one can create a hard reference. The first is
the unary backslash (outside a string, of course), which acts rather like the
& 'address of' operator in C. One may apply it to a single variable, or a
single anonymous datum, or even a list of either type, freely combined. All
these statements are valid:

my $foo = \7;
my $bar = \@baz;
my @quux = \(qw/foo bar baz/, $this, $that); # Even this

The backslash operator can even be nested, though I'm not sure how this is
useful:

my $foo = \\"foo";
print $$$foo; # Dereference a reference to a reference!
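A runnable version of the statements above, with the $this/$that variables declared (their values are made up). It also shows that a reference taken through `\( ... )' really points at the original variable.

```perl
use strict;
use warnings;

my ($this, $that) = ("x", "y");
my @baz = (1, 2, 3);

my $num_ref = \7;                  # reference to a constant (read-only)
my $arr_ref = \@baz;               # reference to a named array
my @refs    = \($this, $that);     # backslash distributes over the list
${ $refs[0] } = "changed";         # writes through to $this

my $ref_ref = \\"foo";             # a reference to a reference
print $$$ref_ref, "\n";            # foo
```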

References have a cryptic 'print syntax' all their own, which, when one thinks
about it, resembles such elements of Lisp systems as:

#<hash-table 'eql nil 0/65 0x93f86c8> ;; What's with the 28 bits?

A Perl reference looks like this when printed: 'HASH(0x812f1c8)', indicating
its type and memory address. (The mystery of the 28 bits is simply that
leading zeros are not printed: this is the ordinary 32-bit address
0x0812f1c8.) Perl reference /values/ are inherently bound to the type of their
referents, although a given scalar variable can be switched to hold a new
reference at any time, of course.

It is interesting to note that Perl's reference-counting garbage collector
will preserve data which are still referenced, even after the variables that
named them leave scope. Ex:

my $foo = eval { my @foo = qw/foo bar baz/; \@foo };

Or better:

my $foo = eval { [ qw/foo bar baz/ ] }; # See below

***** Constructors

There exist two constructors for the effective creation of anonymous arrays
and hashes. These two are distinct (unlike regular list and hash constructors)
because references appear to be typed in order to record important details
about their semantics. Anyway, these are: `[ ]' and `{ }'. You can ensure that
`{ }' is interpreted as a hashref constructor with the unary plus. Ex:

my $foo = [ qw/foo bar baz/ ];
my $bar = { foo => "bar", baz => "quux" };
sub hashem { +{ @_ } }  # unary plus: return a hashref, don't run a block

***** The Lambda Nature

Perl can /even/ form references to anonymous subroutines when using `sub'
forms without a name, just a block.

$coderef = sub { print "Boink!\n" }; # Make sure you include the semicolon!

These anonymous subroutines act as closures with respect to the surrounding
lexical environment. You know what this means already, so no need for an
example. Needless to say, this is way awesome.
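For anyone who wants one anyway, here is the classic counter sketch. The `make_counter' name is hypothetical; each call creates a fresh lexical, and the returned anonymous sub keeps its own copy alive.

```perl
use strict;
use warnings;

# make_counter is a hypothetical helper: every call creates a fresh lexical
# $count, and the anonymous sub it returns keeps that variable alive.
sub make_counter {
    my $count = @_ ? shift : 0;
    return sub { return $count++ };
}

my $c1 = make_counter(10);
my $c2 = make_counter();

my @from_c1 = ($c1->(), $c1->(), $c1->());   # (10, 11, 12)
my $from_c2 = $c2->();                       # 0: $c2 closes over its own $count

print "@from_c1 / $from_c2\n";
```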

**** Dereferencing

***** Basic Form

For simple scalar variables, it is simple enough to prepend (and append, if
necessary) the syntax used for normal variable access. MahaEx:

$foo         = "three humps";
$scalarref   = \$foo;         # $scalarref is now a reference to $foo
$camel_model = $$scalarref;   # $camel_model is now "three humps"

$bar = $$scalarref;

push(@$arrayref, $filename);
$$arrayref[0] = "January";            # Set the first element of @$arrayref
@$arrayref[4..6] = qw/May June July/; # Set several elements of @$arrayref

%$hashref = (KEY => "RING", BIRD => "SING");  # Initialize whole hash
$$hashref{KEY} = "VALUE";                     # Set one key/value pair
@$hashref{"KEY1","KEY2"} = ("VAL1","VAL2");   # Set two more pairs

&$coderef(1,2,3);

print $handleref "output\n";

***** The Block Method

For instances where it is not possible to dereference a simple variable
(e.g., dereferencing function return values directly or dereferencing hash or
array elements), one may dereference the value of BLOCK, where BLOCK evaluates
to a valid reference.

# %dispatch is a hash of coderefs. This looks up $index in %dispatch and calls
# the resulting coderef with the arguments 1, 2, and 3.
&{ $dispatch{$index} }(1, 2, 3);
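A self-contained sketch of the block form: a made-up %dispatch table of coderefs, plus a hypothetical `get_list' sub whose return value is dereferenced directly.

```perl
use strict;
use warnings;

# A made-up dispatch table: names mapped to code references.
my %dispatch = (
    add => sub { my $sum = 0; $sum += $_ for @_; $sum },
    max => sub { my $m = shift; for (@_) { $m = $_ if $_ > $m } $m },
);

# get_list is hypothetical: it returns an array reference.
sub get_list { return [qw/a b c/] }

my $index   = "add";
my $total   = &{ $dispatch{$index} }(1, 2, 3);   # 6
my $biggest = &{ $dispatch{max} }(4, 9, 2);      # 9
my @letters = @{ get_list() };                   # dereference a return value

print "$total $biggest @letters\n";              # 6 9 a b c
```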

***** The Arrow Operator

For individual elements of hash and array refs (Why not slices? Don't ask me.)
and coderefs, it is possible to use this syntactosaccharide to obviate the
need for the block and magic characters, which can get really tedious for
complex thingies. MahaEx:

$  $arrayref  [2] = "Dorian";         #1
${ $arrayref }[2] = "Dorian";         #2
   $arrayref->[2] = "Dorian";         #3

$  $hashref  {KEY} = "F#major";       #1
${ $hashref }{KEY} = "F#major";       #2
   $hashref->{KEY} = "F#major";       #3

&  $coderef  (Presto => 192);         #1
&{ $coderef }(Presto => 192);         #2
   $coderef->(Presto => 192);         #3

One may even nest such operations, as the arrow operator associates left to
right:

print $array[3]->{"English"}->[0];

The arrow is optional between brackets or braces, or between a closing bracket
or brace and a parenthesis for an indirect function call, so it is possible,
for instance, to rewrite the above expression like so:

print $array[3]{"English"}[0];

Likewise:

$dispatch{$index}(1, 2, 3);
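A runnable sketch of the nested access above, building a made-up structure first so nothing depends on autovivification. It reads the same element with arrows, with the block form, and with the arrows elided; note that whole-array access still needs `@{ }'.

```perl
use strict;
use warnings;

# A small nested structure built with the anonymous constructors.
my @array;
$array[3] = { English => [qw/January February/], French => ["janvier"] };

my $via_arrows = $array[3]->{"English"}->[0];        # "January"
my $via_blocks = ${ ${ $array[3] }{"English"} }[0];  # same thing, the long way
my $shortcut   = $array[3]{"English"}[1];            # arrows between subscripts omitted

my $hashref = $array[3];
my @months  = @{ $hashref->{English} };              # whole array: still @{ }

print "$via_arrows $shortcut @months\n";
```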

***** Autovivification

In cases where access to a non-existent referent is attempted in an lvalue
context, Perl will automatically complete the pathway. This, for instance,
works:

my @array;
$array[3]->{"English"}->[0] = "January";

Note that this is a little different:

my $foo = "baz"; # $foo is /already/ defined
$$foo = "bar";   # Attempt to autovivify defined value
print $foo;      # 'baz'
print $$foo;     # 'bar': WTF? Reference and ordinary scalar at once?
print $baz;      # 'bar' Oh...

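If the value is already defined (and `strict refs' is not in effect), the
'$foo' in '$$foo' will be treated as a symbolic reference, and the assignment
will go to the package variable '$baz' (symbolic references never reach `my'
variables). Under `use strict', this is a fatal error instead.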

In an rvalue context, the effect is a little different. For instance:

print $array[3]{"English"}[0];

...where the requisite elements are not defined, only autovivifies the thingie
up to $array[3]{"English"}. This is because everything before the final index
0 must be assigned to in order to complete the path to that index
(essentially, they are lvalues when viewed that way).

This use of thingie autovivification in an rvalue context is faulty and may be
removed in a future version of Perl.
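A sketch of the rvalue surprise using `exists', on empty made-up hashes: merely testing a deep key creates the intermediate levels, while checking one level at a time does not.

```perl
use strict;
use warnings;

my %h;
# Testing a deep element still creates the intermediate levels...
my $found    = exists $h{a}{b}{c};    # false, but...
my $vivified = exists $h{a}{b};       # ...$h{a}{b} now exists, as {}

# Checking each level first avoids the side effect.
my %g;
my $safe = exists $g{a} && exists $g{a}{b} && exists $g{a}{b}{c};

print "keys in %h: ", join(",", keys %h), "; keys in %g: ", scalar(keys %g), "\n";
```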

**** `ref'

`ref' determines the type of an expression which evaluates to a reference,
returning that type in the form of an uppercase string, or the empty string if
the value was not a reference at all. Built-in types include:

SCALAR
ARRAY
HASH
CODE (function or closure)
GLOB
REF (another reference)
LVALUE (a reference to an lvalue that isn't a plain variable, e.g. \substr($s, 0, 1))
IO (the IO handle associated with files and directories)

Use of `ref' on an instance reference will yield the name of its class. (See
OOP below.)
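A sketch that asks `ref' about one of each kind of thing; `Some::Class' is a made-up package name used only to show what a blessed reference reports.

```perl
use strict;
use warnings;

my $str = "hello";
my %types = (
    scalar => ref(\$str),                    # "SCALAR"
    array  => ref([]),                       # "ARRAY"
    hash   => ref({}),                       # "HASH"
    code   => ref(sub {}),                   # "CODE"
    ref    => ref(\\$str),                   # "REF"
    glob   => ref(\*STDOUT),                 # "GLOB"
    lvalue => ref(\substr($str, 0, 1)),      # "LVALUE"
    plain  => ref("not a reference"),        # "" (false)
    object => ref(bless {}, "Some::Class"),  # "Some::Class"
);

printf "%-6s => '%s'\n", $_, $types{$_} for sort keys %types;
```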

*** `eval' and `do'

**** `eval'

`eval' is one function useful for execution of arbitrary code within its own
environment. While not as flexible as the read-eval-print loop in LISP or
Forth's own peculiar brand of reflexive examination (although the modules in
the `B' hierarchy might help here), it remains pretty handy in instances where
it is necessary to diverge from the normal, static execution of code.

The first form of `eval' accepts an expression evaluating (in scalar context)
to a string, which contains Perl expressions. If omitted, this defaults to
`$_'. The string is parsed and executed at runtime, in the lexical context of
the current Perl program. This is useful for instances where parsing and
execution is to be delayed until a certain point: in particular, `use'
statements you want to defer to runtime (normally `use' happens at compile
time, because it is wrapped in an implicit `BEGIN { }' block).

`eval BLOCK', on the other hand, is primarily useful for catching exceptions.
The block is parsed and compiled once along with the surrounding program. If
any expressions contained in the block raise a runtime error, `$@' will store
the error, which will be available outside the block for further action
(compare to `eval STRING', which can trap compile-time errors). Other error
variables (such as $!) will likewise be available.

From the manual:

``In both forms, the value returned is the value of the last expression
evaluated inside the mini-program; a return statement may be also used, just
as with subroutines. The expression providing the return value is evaluated in
void, scalar, or list context, depending on the context of the eval itself.
See "wantarray" for more on how the evaluation context can be determined.''
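A sketch of both forms. `safe_divide' is a hypothetical helper that dies on bad input; eval BLOCK traps that runtime error, and eval STRING traps even a syntax error in the string.

```perl
use strict;
use warnings;

# safe_divide is a hypothetical helper that dies on bad input.
sub safe_divide {
    my ($x, $y) = @_;
    die "division by zero\n" if $y == 0;
    return $x / $y;
}

# eval BLOCK: compiled along with the program; runtime errors land in $@.
my $ok  = eval { safe_divide(10, 2) };   # 5
my $bad = eval { safe_divide(1, 0) };    # undef
my $err = $@;                            # "division by zero\n"

# eval STRING: compiled at run time, so even a syntax error is trapped.
my $sum        = eval "2 + 3";           # 5
my $broken     = eval "2 +";             # undef
my $syntax_err = $@;                     # a "syntax error at (eval ...)" message

print "caught: $err";
```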

**** `do'

`do BLOCK' executes a block and returns the value of its last expression, much
like `eval BLOCK', with one important difference: `do' does not trap
exceptions, so a `die' inside the block propagates as usual. However, when
given an expression representing a filename, `do' will evaluate the contents
thereof, with several conveniences for the programmer: Perl will check @INC
and update %INC if the file is found, attribute errors to the file if
necessary, and separate the execution from the current lexical scope (compare
with `eval').

From the manual:

``...It's the same [as `eval'], however, in that it does reparse the file
every time you call it, so you probably don't want to do this inside a loop.

If "do" cannot read the file, it returns undef and sets $! to the error. If
"do" can read the file but cannot compile it, it returns undef and sets an
error message in $@. If the file is successfully compiled, "do" returns the
value of the last expression evaluated.

Note that inclusion of library modules is better done with the "use" and
"require" operators, which also do automatic error checking and raise an
exception if there's a problem.''

One can also use forms like `do { } while' and `do { } until', which execute
the block at least once, whereas `while' evaluates the condition first and may
preempt executing the loop altogether.
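A sketch of the `do' behaviors in this section: the value of a block, an exception passing straight through `do' (only the outer eval catches it), and `do { } while' running its body once even when the condition starts out false.

```perl
use strict;
use warnings;

# do BLOCK yields the value of its last expression...
my $value = do { my $n = 6; $n * 7 };            # 42

# ...but does not trap exceptions: the die escapes the do and is
# only caught by the surrounding eval.
my $survived = eval { do { die "boom\n" }; 1 };  # undef
my $err = $@;                                    # "boom\n"

# do { } while tests its condition afterwards, so its body runs once even
# though the condition is false from the start; a plain while never runs.
my $flag = 0;
my ($do_runs, $while_runs) = (0, 0);
do { $do_runs++ } while ($flag);
while ($flag) { $while_runs++ }

print "$value $do_runs $while_runs\n";           # 42 1 0
```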

Control statements like `next' ***DO NOT*** work in `do { }'.

``Here's how a block can be used to let loop-control operators work with a do{}
construct. To next or redo a do, put a bare block inside:

do {{
  next if $x == $y;
  # do something here
}} until $x++ > $z;

For last, you have to be more elaborate:

{
  do {
    last if $x == $y ** 2;
    # do something here
  } while $x++ <= $z;
}

And if you want both loop controls available, you'll have to put a label on
those blocks so you can tell them apart:

DO_LAST: {
           do {
DO_NEXT:        {
                  next DO_NEXT if $x == $y;
                  last DO_LAST if $x == $y ** 2;
                  # do something here
                }
              } while $x++ <= $z;
         }

But certainly by that point (if not before), you'd be better off using an
ordinary infinite loop with last at the end:

for (;;) {
  next if $x == $y;
  last if $x == $y ** 2;
  # do something here
  last unless $x++ <= $z;
}''

*** `sleep'

``sleep EXPR
  sleep

Causes the script to sleep for EXPR seconds, or forever if no EXPR. May be
interrupted if the process receives a signal such as "SIGALRM". Returns the
number of seconds actually slept. You probably cannot mix "alarm" and "sleep"
calls, because "sleep" is often implemented using "alarm".

On some older systems, it may sleep up to a full second less than what you
requested, depending on how it counts seconds. Most modern systems always
sleep the full amount. They may appear to sleep longer than that, however,
because your process might not be scheduled right away in a busy multitasking
system.

For delays of finer granularity than one second, you may use Perl's "syscall"
interface to access setitimer(2) if your system supports it, or else see
"select" above. The Time::HiRes module (from CPAN, and starting from Perl 5.8
part of the standard distribution) may also help.

See also the POSIX module's "pause" function.''
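A sketch of a sub-second pause with Time::HiRes, which the manual mentions; both `Time::HiRes::time' and `Time::HiRes::sleep' accept and return fractional seconds. The quarter-second delay is arbitrary.

```perl
use strict;
use warnings;
use Time::HiRes ();    # core module since Perl 5.8

my $t0 = Time::HiRes::time();          # floating-point seconds
Time::HiRes::sleep(0.25);              # fractional sleep, unlike the builtin
my $elapsed = Time::HiRes::time() - $t0;

printf "slept about %.2f seconds\n", $elapsed;
```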

*** `localtime'

``localtime EXPR

Converts a time as returned by the time function to a 9-element list with the
time analyzed for the local time zone. Typically used as follows:

#  0    1    2     3     4    5     6     7     8
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
                                            localtime(time);

All list elements are numeric, and come straight out of the C `struct tm'.
$sec, $min, and $hour are the seconds, minutes, and hours of the specified
time. $mday is the day of the month, and $mon is the month itself, in the
range 0..11 with 0 indicating January and 11 indicating December. $year is the
number of years since 1900. That is, $year is 123 in year 2023. $wday is the
day of the week, with 0 indicating Sunday and 3 indicating Wednesday. $yday
is the day of the year, in the range 0..364 (or 0..365 in leap years.) $isdst
is true if the specified time occurs during daylight savings time, false
otherwise.

Note that the $year element is not simply the last two digits of the year. If
you assume it is, then you create non-Y2K-compliant programs--and you wouldn't
want to do that, would you?

The proper way to get a complete 4-digit year is simply:

$year += 1900;

And to get the last two digits of the year (e.g., '01' in 2001) do:

$year = sprintf("%02d", $year % 100);

If EXPR is omitted, "localtime()" uses the current time ("localtime(time)").

In scalar context, "localtime()" returns the ctime(3) value:

$now_string = localtime;  # e.g., "Thu Oct 13 04:54:34 1994"

This scalar value is not locale dependent but is a Perl builtin. For GMT
instead of local time use the "gmtime" builtin. See also the "Time::Local"
module (to convert the second, minutes, hours, ... back to the integer value
returned by time()), and the POSIX module's strftime(3) and mktime(3)
functions.

To get somewhat similar but locale dependent date strings, set up your locale
environment variables appropriately (please see perllocale) and try for
example:

    use POSIX qw(strftime);
    $now_string = strftime "%a %b %e %H:%M:%S %Y", localtime;
    # or for GMT formatted appropriately for your locale:
    $now_string = strftime "%a %b %e %H:%M:%S %Y", gmtime;

Note that the %a and %b, the short forms of the day of the week and the month
of the year, may not necessarily be three characters wide.
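Putting the year and month offsets to work: `ymd' below is a hypothetical helper that formats an epoch time as YYYY-MM-DD. It uses gmtime so the result doesn't depend on the local time zone; localtime returns the same nine-element list, just for the local zone.

```perl
use strict;
use warnings;

# ymd is a hypothetical helper: format an epoch time as YYYY-MM-DD.
sub ymd {
    my ($mday, $mon, $year) = (gmtime(shift))[3, 4, 5];
    # $year counts from 1900 and $mon from 0, hence the adjustments.
    return sprintf "%04d-%02d-%02d", $year + 1900, $mon + 1, $mday;
}

print ymd(0), "\n";               # 1970-01-01
print ymd(1_000_000_000), "\n";   # 2001-09-09
```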

*** Divergent Control Statements

**** `next'

`next' will move to the next iteration of the innermost loop, or to the next
iteration of the loop labelled LABEL if given such a label (LABEL is a
bareword). Ex:

OUTER: for my $wid (@ary1) {
INNER:   for my $jet (@ary2) {
           next OUTER if $wid > $jet;
           $wid += $jet;
         }
       }

**** `last'

`last' will terminate execution of either the innermost loop or that loop
specified by LABEL. Ex:

LINE: while (<STDIN>) {
        last LINE if /^$/;      # exit when done with header
        ...
      }
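A complete sketch using both labelled forms on made-up data: look for the first pair (one number from each list) that hits a target sum. Because @ys is sorted ascending, an overshoot means the rest of the inner loop is pointless, so `next OUTER' moves on; a hit uses `last OUTER' to leave both loops.

```perl
use strict;
use warnings;

my @xs     = (1, 8, 4);
my @ys     = (2, 3, 5);    # sorted ascending
my $target = 9;
my (@pair, @skipped);

OUTER: for my $x (@xs) {
    for my $y (@ys) {
        if ($x + $y > $target) {
            push @skipped, $x;
            next OUTER;        # abandon the inner loop, advance $x
        }
        if ($x + $y == $target) {
            @pair = ($x, $y);
            last OUTER;        # leave both loops at once
        }
    }
}

print "pair: @pair; skipped: @skipped\n";   # pair: 4 5; skipped: 8
```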

**** `redo'

``The "redo" command restarts the loop block without evaluating the
conditional again. The "continue" block, if any, is not executed. This command
is normally used by programs that want to lie to themselves about what was
just input.

For example, when processing a file like /etc/termcap. If your input lines
might end in backslashes to indicate continuation, you want to skip ahead
and get the next record.
    
while (<>) {
  chomp;
  if (s/\\$//) {
    $_ .= <>;
    redo unless eof();
  }
  # now process $_
}

which is Perl short-hand for the more explicitly written version:
    
LINE: while (defined($line = <ARGV>)) {
  chomp($line);
  if ($line =~ s/\\$//) {
    $line .= <ARGV>;
    redo LINE unless eof(); # not eof(ARGV)!
  }
  # now process $line
}''

*** `printf' and `sprintf'

I'll get around to documenting these when I see the need.
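In the meantime, a quick sketch of the format specifiers that come up most (they follow C's printf closely); the values are arbitrary.

```perl
use strict;
use warnings;

my $name  = sprintf "%-6s|", "ab";          # left-justify in 6 columns: "ab    |"
my $price = sprintf "%8.2f", 3.14159;       # width 8, 2 decimals: "    3.14"
my $bases = sprintf "%x %o %b", 255, 8, 5;  # hex, octal, binary: "ff 10 101"
my $pad   = sprintf "%03d", 7;              # zero-padded: "007"

printf "%s [%s]\n", $pad, $bases;           # printf = print + sprintf
```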

** Lesson 8

*** Modules

Modules are the fundamental unit of reuse in Perl, and packages are their
vehicle. They exist in two basic forms (there is, after all, more than one way
to do it), traditional and object-oriented. Traditional modules define
subroutines and variables which the user may import. The canonical mechanism
used here, `Exporter', actually relies on inheritance in Perl's object system!
Object-oriented modules, on the other hand, usually leave nothing to be
imported, preferring instead to use instance construction and method calls.
Methods should /never/ be exported. Perl does not suggest that you buy into
any narrow-minded dogma about which to use. Horses for courses.

Furthermore, there exists a selection of modules that influence the execution
of Perl itself, called /pragmatic modules/, or /pragmata/, if you like Latin.
`warnings' and `strict' are perhaps the best known; other important pragmata
are `constant', `fields', and `diagnostics'. (Note that they all begin with a
lowercase letter.)

The central repository for all ilk of Perl module is CPAN (Comprehensive Perl
Archive Network). CPAN hosts a main site, cpan.org, with mirrors around the
world. The organization supplies documentation and Perl itself, as well as
other useful utilities, but CPAN is best known for the plethora of modules it
hosts.

**** Using Modules (Or, `use'ing Modules)

***** `require'

****** Version Request

First of all, `require' can be used to ensure that perl meets a certain
version or greater. The recommended form of such a statement is: `require
5.006', where `5.006' is a simple numeric argument which is compared to `$]'
(the version of the running perl). This argument, of course, should
technically contain only one decimal point, but recent versions of Perl will
rewrite `5.8.6' as `5.008006'.

****** Module Request

`require' may also request modules from the directories listed in the `@INC'
array, which is defined by Perl and catalogs directories to be searched for
modules. The semantics vary subtly, depending on the argument, however. The
first and most common use is with a bareword argument. `require' will
translate double colons into the platform-specific directory delimiter (`/'
here), append `.pm', and search through `@INC' for the first matching module
file, which it will then execute. So, for example, if `@INC' is:

/usr/lib/perl5/5.8.6/i486-linux /usr/lib/perl5/5.8.6
/usr/lib/perl5/site_perl/5.8.6/i486-linux /usr/lib/perl5/site_perl/5.8.6
/usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl .

...and I do: `require Foo::Bar', `require' might find
`/usr/lib/perl5/site_perl/Foo/Bar.pm', which it will then execute and load
into the current Perl interpreter. Actually, I lied: if `require' first finds
a byte-compiled file generated by `B::Bytecode' (ending in `.pmc') whose
modification timestamp is no older than that of any matching `.pm', `require'
will load this instead. So, it's an awful lot like loading byte-compiled Emacs
LISP files (even better, actually), substituting `@INC' for all instances of
`load-path'. `require' will then update `%INC', a hash containing the names of
all files thus far `require'd in this manner. All files loaded by `require'
must return a true value. The traditional implementation is to add `1;' at the
end of package files. Ex:

  package Foo;
  
  use base qw/Exporter/;
  our @EXPORT_OK = qw/foo bar baz/;
  
  sub foo {
    ...
  }
  
  sub bar {
    ...
  }
  
  sub baz {
    ...
  }

  1; # Stuff after `__END__' not evaluated

  __END__

  ...Pod documentation goes here...

As I said, there is another form of `require', although it seems to have
fallen out of use generally, and we rarely ever see it. The argument is any
expression that evaluates to a file name (but not a bareword!), such as
"Foo/Bar.pm", including the extension. Double colons will not be translated
into directory separators. Otherwise, this form acts the same as the above.
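Where this form still earns its keep is loading a module whose name is only known at run time. A sketch, with List::Util standing in as a convenient core module: since no `::' translation happens, the script builds the file name itself.

```perl
use strict;
use warnings;

# The module name is only known at run time here.
my $module = "List::Util";

# This form of require takes a file name, so do the :: -> / translation
# (and add .pm) by hand.
(my $file = "$module.pm") =~ s{::}{/}g;   # "List/Util.pm"
require $file;

my $max = List::Util::max(3, 9, 4);       # 9
print "loaded $INC{$file}\n";
```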

***** `use'

`use' is essentially a shorthand form of:

  BEGIN { require Module; Module->import; }

`use' will include a `Module' according to the rules described above and then
call the `import' method in the same package (this is usually inherited from
`Exporter' as we'll see below). `BEGIN' blocks all run as soon as they are
compiled, ensuring no possible delay in the execution of statements which
may be vital to the program. `use' may also require a certain version of Perl
as with `require'; the difference is timing: `use VERSION' is checked at
compile time, `require VERSION' at run time. (Well, there is one more: since
5.10, `use 5.010' and later also enable the matching feature bundle, and from
5.12 on they turn on `strict' as well.)
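A side-by-side sketch, with List::Util as an arbitrary core module: the first line is ordinary `use', the second spells out roughly what `use' does behind the scenes.

```perl
use strict;
use warnings;

use List::Util qw(max);                                   # the usual way
BEGIN { require List::Util; List::Util->import(qw(min)); } # what use does

my $hi = max(3, 9, 4);   # 9
my $lo = min(3, 9, 4);   # 3
print "$lo..$hi\n";
```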

****** Importing Symbols

See below for all details. Atoms beginning with a colon (kinda like Common
LISP keywords, eh?) refer to `%EXPORT_TAGS':

``use Bestiary;                    # Import @EXPORT symbols
  use Bestiary ();                 # Import nothing
  use Bestiary qw(ram @llama);     # Import the ram function and @llama array
  use Bestiary qw(:camelids);      # Import $camel and @llama
  use Bestiary qw(:DEFAULT);       # Import @EXPORT symbols
  use Bestiary qw(/am/);           # Import $camel, @llama, and ram
  use Bestiary qw(/^\$/);          # Import all scalars
  use Bestiary qw(:critters !ram); # Import the critters, but exclude ram
  use Bestiary qw(:critters !:camelids);
                                   # Import critters, but no camelids''

``Because this is a wide-open interface, pragmas (compiler directives) are
also implemented this way. Currently implemented pragmas are:

  use constant;
  use diagnostics;
  use integer;
  use sigtrap  qw(SEGV BUS);
  use strict   qw(subs vars refs);
  use subs     qw(afunc blurfl);
  use warnings qw(all);
  use sort     qw(stable _quicksort _mergesort);''

Most pragmata are lexically scoped, so they may be, for instance, enabled or
disabled in a limited manner within a bare block:

  standard code...

  { 
    no strict;
    ...tricky shit...
  }

  ...more standard code 

****** Not Using Modules

``There's a corresponding "no" command that unimports meanings imported by
"use", i.e., it calls "unimport Module LIST" instead of "import".

  no integer;
  no strict 'refs';
  no warnings;''

***** Changing `@INC'

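When changing the library search path, it is essential that the change happen
at compile time, before the `use' statements that depend on it are processed:

  BEGIN { push @INC, '/foo/bar/baz/baz-0.3'; }

is better written as:

  use lib '/foo/bar/baz/baz-0.3';

which also puts the directory at the front of `@INC' (so it wins over the
system directories) and adds its architecture-specific subdirectory, if one
exists.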

**** Writing Modules

***** `package' Declaration

In order for modules to work according to the file system metaphor described
above, it is necessary for the actual file representing the module to begin
with a `package' declaration corresponding to the path. So, with a module
named '/usr/lib/perl5/5.8.6/Tie/RefHash.pm', I want to start with `package
Tie::RefHash;'. This is a real module, and the inquiring mind will notice that
this module contains a second package declaration way down at the bottom,
`package Tie::RefHash::Nestable'. That's OK; as long as the essential portion
of the module falls under the same package name as the path relative to the
members of `@INC', that works. Note that you can't `use
Tie::RefHash::Nestable' though, although the contents of that package will be
available after `use Tie::RefHash'. Remember that packages inherently have
nothing to do with the compilation unit of the file, and vice versa. What Perl
/does/ with them ties the two together in some ways. Realize the difference,
but embrace the idiom.

***** Useful Pragmata

As always, you want `warnings' and `strict'. It may also be useful to `use
/version/' where /version/ is some version of Perl supporting a minimum of
functionality for the module. `use 5.004;', for instance, will forbid
compilation of the module if the version of Perl is less than 5.004, which was
a very significant maintenance release.

***** `Exporter'

For object-oriented modules, this can pretty much be ignored; some details may
remain important. However, for traditional modules, this is considered a
canonical part of the package itself. One usually sees an `Exporter' stanza
like this:

  require Exporter;
  our @ISA = qw/Exporter/; # Or `use base qw/Exporter/;'
  our @EXPORT    = qw($camel %wolf ram);              # Export by default
  our @EXPORT_OK = qw(leopard @llama $emu);           # Export by request
  our %EXPORT_TAGS = (                                # Export as group
                       camelids => [qw($camel @llama)],
                       critters => [qw(ram $camel %wolf)],
                     );

****** `require Exporter'

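Remember, one does not (traditionally) `use Exporter'. The use of, well, `use'
implies the use of `import'. But the relation between `Exporter' and the
client module rests on the use of inheritance, not importing. Anyway, this
executes the module and prepares you for the next step. (Modern versions of
`Exporter' also support `use Exporter qw(import);', which imports Exporter's
`import' method directly instead of inheriting it.)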

****** `our @ISA = qw/Exporter/;'

This may be rewritten using the pragma `use base qw/Exporter/;', which
essentially does the same thing, but is 13373r. This tells `use' to search for
an `import' method in the `Exporter' namespace. (The package itself should not
provide such a method!) This will use the functionality of `import' in
`Exporter' for programs or packages using the current module. Eh? The
terminology varies depending on what is being used wrt what. It just happened
that way.

****** Exported Variables

Now, explaining this mess:

  our @EXPORT    = qw($camel %wolf ram);              # Export by default
  our @EXPORT_OK = qw(leopard @llama $emu);           # Export by request
  our %EXPORT_TAGS = (                                # Export as group
                       camelids => [qw($camel @llama)],
                       critters => [qw(ram $camel %wolf)],
                     );

`@EXPORT' is an array of symbols which one wishes to import into the calling
package by default. When not qualified with a type sigil, `Exporter' will
assume you are naming a function (e.g., &ram and &leopard will be available).
Likewise, `@EXPORT_OK' lists symbols which you may import by request. Finally,
`%EXPORT_TAGS' defines sets of symbols which you may import collectively by
way of the tag (key).

``Since the symbols listed within %EXPORT_TAGS must also appear in either
@EXPORT or @EXPORT_OK, the Exporter provides two functions to let you add
those tagged sets of symbols:

  %EXPORT_TAGS = (foo => [qw(aa bb cc)], bar => [qw(aa cc dd)]);
  
  Exporter::export_tags('foo');     # add aa, bb and cc to @EXPORT
  Exporter::export_ok_tags('bar');  # add aa, cc and dd to @EXPORT_OK''

There is also `@EXPORT_FAIL', which lists symbols whose import /may not/ be
permitted. Any attempt to import symbols enumerated in `@EXPORT_FAIL' will be
trapped by the `export_fail' method in the /exporting/ package:

``If a module attempts to import any of these symbols the Exporter will give
the module an opportunity to handle the situation before generating an error.
The Exporter will call an export_fail method with a list of the failed
symbols:

  @failed_symbols = $module_name->export_fail(@failed_symbols);

If the export_fail method returns an empty list then no error is recorded and
all the requested symbols are exported. If the returned list is not empty then
an error is generated for each symbol and the export fails. The Exporter
provides a default export_fail method which simply returns the list
unchanged.''
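
A sketch of `@EXPORT_FAIL' at work, assuming a hypothetical package `Risky'
whose `export_fail' method consults a hypothetical switch, `$Risky::ALLOW'.
Note that a symbol listed in `@EXPORT_FAIL' must still be exportable (here via
`@EXPORT_OK'); the hook is consulted only for such symbols that are actually
requested:

```perl
use strict;
use warnings;

package Risky;

BEGIN {
  require Exporter;
  our @ISA         = qw(Exporter);
  our @EXPORT_OK   = qw(safe dangerous);
  our @EXPORT_FAIL = qw(dangerous);   # requesting this triggers export_fail
}

our $ALLOW = 0;                       # hypothetical switch read by export_fail

sub safe      { return 'ok' }
sub dangerous { return 'boom' }

# Return whichever requested symbols should still count as failures.
sub export_fail {
  my ($class, @failed) = @_;
  return $ALLOW ? () : @failed;
}

package main;

Risky->import('safe');                # not in @EXPORT_FAIL: no hook involved

my $refused = !eval { Risky->import('dangerous'); 1 };
print "refused while \$ALLOW is false\n" if $refused;

$Risky::ALLOW = 1;
Risky->import('dangerous');           # export_fail now returns an empty list
print dangerous(), "\n";
```

While the switch is off, Exporter warns that the symbol "is not implemented"
and then dies, which the `eval' catches; nothing is imported by that call.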

*** Plain Old Documentation

Perl's Plain Old Documentation is a minimal markup language for documenting
Perl programs and modules, although nothing prevents you from using it for
general purposes; for one thing, its syntax is far less arcane than that of
nroff, the primary medium for Unix manual pages. POD translators transform POD
source into other formats, such as HTML, man pages, or plain text. Pure POD
can express nearly any strictly textual, one-column document, while the
`=for' and `=begin'/`=end' directives, when used with a POD translator such as
pod2html, extend POD's abilities to those of the target medium. Indeed, even
such a significant book as Programming Perl was written chiefly in POD!
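
A minimal sketch of one translator in action: the core Pod::Text module (the
engine behind pod2text) rendering a small, hypothetical POD document held in a
string. `output_string' and `parse_string_document' are methods Pod::Text
inherits from Pod::Simple; the exact layout of the output (indentation, how
I<> is shown) may vary between Pod::Text versions:

```perl
use strict;
use warnings;
use Pod::Text;

# A small, hypothetical POD document held in a string.
my $pod = <<'END_POD';
=head1 NAME

Widget - a hypothetical module

=head1 DESCRIPTION

Widgets are I<very> useful.

=cut
END_POD

my $parser = Pod::Text->new(width => 60);
my $text;
$parser->output_string(\$text);      # collect output in $text instead of STDOUT
$parser->parse_string_document($pod);
print $text;
```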

Because POD documentation usually lives in a Perl source file, Perl provides
for their coexistence. In particular, the Perl parser skips POD blocks
wherever a new statement could begin (a line starting with `=' and a command
name begins one; `=cut' ends it), and a standard practice places a module's
complete POD documentation after the special `__END__' token signifying the
end of the source.
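
A minimal sketch of that coexistence: perl skips everything from the first
`=head1' through `=cut' below and runs the surrounding code. The function
`greet' is hypothetical, and the POD is interleaved here only so that code
follows it; placing it after `__END__' would work just as well:

```perl
use strict;
use warnings;

=head1 NAME

greet - a hypothetical function documented in place

=head1 DESCRIPTION

Returns a greeting for the name given as its only argument.

=cut

sub greet { return "Hello, $_[0]!" }

print greet('POD'), "\n";    # the POD block above never reached the compiler
```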

The fundamental unit of POD documentation is the _paragraph_, which is
delimited from surrounding paragraphs by blank lines. In turn, there are three
basic kinds of paragraphs: ordinary, comparable to `<p>'; verbatim (indented,
and reproduced exactly), much like `<pre>'; and command (beginning with `='),
which impose various changes to blocks of text depending on the specific
directive.

**** Commands

***** `=headn'

The first of these command directives you are likely to see is `=headn', where
`n' is a number from 1 to 4, indicating the depth of the heading level. Ex:
`=head2 Object Attributes'. All commands accepting arguments use a similar
prefix notation: the command name first, then its argument.

***** `=over', `=item' and `=back'

The `=over' command indents the document and maintains this state until
`=back' has been issued. This is primarily useful for producing lists with
`=item' and for indenting groups of paragraphs.

``The indentlevel option to "=over" indicates how far over to indent,
generally in ems (where one em is the width of an "M" in the document's base
font) or roughly comparable units; if there is no indentlevel option, it
defaults to four. (And some formatters may just ignore whatever indentlevel
you provide.)''

``Don't use "=item"s outside of an "=over" ... "=back" region.

The first thing after the "=over" command should be an "=item", unless there
aren't going to be any items at all in this "=over" ... "=back" region.

Don't put "=headn" commands inside an "=over" ... "=back" region.

And perhaps most importantly, keep the items consistent: either use "=item *"
for all of them, to produce bullets; or use "=item 1.", "=item 2.", etc., to
produce numbered lists; or use "=item foo", "=item bar", etc. -- namely,
things that look nothing like bullets or numbers.

If you start with bullets or numbers, stick with them, as formatters use the
first "=item" type to decide how to format the list.''

***** `=cut'

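`=cut' ends a POD block and returns the parser to Perl code. Like any command,
it must start at the beginning of a line and, being a command paragraph,
should be separated from the preceding paragraph by a blank line; by
convention, a blank line follows it as well.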

***** `=pod'

`=pod' explicitly begins (or continues) a POD block, which is useful when the
block does not otherwise begin with a command directive. As perlpod puts it:

``The "=pod" command by itself doesn't do much of anything, but it signals to
Perl (and Pod formatters) that a Pod block starts here. A Pod block starts
with any command paragraph, so a "=pod" command is usually used just when you
want to start a Pod block with an ordinary paragraph or a verbatim paragraph.
For example:''
  
  =item stuff()
  
  This function does stuff.
  
  =cut
  
  sub stuff {
    ...
  }
  
  =pod
  
  Remember to check its return value, as in:
  
    stuff() || die "Couldn't do stuff!";
  
  =cut

***** `=begin', `=end', and `=for'

"=begin formatname"
"=end formatname"
"=for formatname text..."

For, begin, and end will let you have regions of text/code/data that are not
generally interpreted as normal Pod text, but are passed directly to
particular formatters, or are otherwise special. A formatter that can use that
format will use the region; otherwise it will be completely ignored.

A command "=begin formatname", some paragraphs, and a command "=end
formatname", mean that the text/data in between is meant for formatters that
understand the special format called formatname. For example,
      
  =begin html
  
  <hr> <img src="thang.png">
  <p> This is a raw HTML paragraph </p>
  
  =end html
    
The command "=for formatname text..." specifies that the remainder of just
this paragraph (starting right after formatname) is in that special format.
      
  =for html <hr> <img src="thang.png">
  <p> This is a raw HTML paragraph </p>
    
This means the same thing as the above "=begin html" ... "=end html" region.
    
That is, with "=for", you can have only one paragraph's worth of text (i.e.,
the text in "=for targetname text..."), but with "=begin targetname" ...
"=end targetname", you can have any amount of stuff in between. (Note that
there still must be a blank line after the "=begin" command and a blank line
before the "=end" command.)

Here are some examples of how to use these:

  =begin html
  
  <br>Figure 1.<br><IMG SRC="figure1.png"><br>
  
  =end html
  
  =begin text
  
    ---------------
    |  foo        |
    |        bar  |
    ---------------
  
  ^^^^ Figure 1. ^^^^
  
  =end text

Some format names that formatters currently are known to accept include
"roff", "man", "latex", "tex", "text", and "html". (Some formatters will treat
some of these as synonyms.)

A format name of "comment" is common for just making notes (presumably to
yourself) that won't appear in any formatted version of the Pod document:

  =for comment
  Make sure that all the available options are documented!

Some formatnames will require a leading colon (as in "=for :formatname", or
"=begin :formatname" ... "=end :formatname"), to signal that the text is not
raw data, but instead is Pod text (i.e., possibly containing formatting codes)
that's just not for normal formatting (e.g., may not be a normal-use
paragraph, but might be for formatting as a footnote).
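
A sketch of that behavior with Pod::Text, which claims neither the `html' nor
the `comment' format, so only the ordinary paragraph survives into the
plain-text output. The document text is hypothetical:

```perl
use strict;
use warnings;
use Pod::Text;

my $pod = <<'END_POD';
=pod

This ordinary paragraph appears in every format.

=begin html

<p>Only an HTML formatter will use this.</p>

=end html

=for comment
A note to self that no formatter shows.

=cut
END_POD

my $parser = Pod::Text->new;
my $text;
$parser->output_string(\$text);
$parser->parse_string_document($pod);
print $text;    # just the ordinary paragraph
```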

***** `=encoding'

"=encoding encodingname"

This command is used for declaring the encoding of a document. Most users
won't need this; but if your encoding isn't US-ASCII or Latin-1, then put a
"=encoding encodingname" command early in the document so that pod formatters
will know how to decode the document. For encodingname, use a name recognized
by the Encode::Supported module. Examples:
             
  =encoding utf8
               
  =encoding koi8-r
  
  =encoding ShiftJIS
  
  =encoding big5

**** Caveat on Whitespace

And don't forget, when using any command, that the command lasts up until the
end of its paragraph, not its line. So in the examples below, you can see that
every command needs the blank line after it, to end its paragraph.

Some examples of lists include:

  =over
  
  =item *
  
  First item
  
  =item *
  
  Second item
  
  =back
  
  =over
  
  =item Foo()
  
  Description of Foo function
  
  =item Bar()
  
  Description of Bar function
  
  =back

**** Interior Sequences

``In ordinary paragraphs and in some command paragraphs, various formatting
codes (a.k.a. "interior sequences") can be used.''

***** `I<text>'

"I<text>" -- italic text

Used for emphasis ("be I<careful!>") and parameters ("redo I<LABEL>")

***** `B<text>'

"B<text>" -- bold text

Used for switches ("perl's B<-n> switch"), programs ("some systems provide
a B<chfn> for that"), emphasis ("be B<careful!>"), and so on ("and that
feature is known as B<autovivification>").

***** `C<code>'

"C<code>" -- code text

Renders code in a typewriter font, or gives some other indication that this
represents program text ("C<gmtime($^T)>") or some other form of computerese
("C<drwxr-xr-x>").

***** `L<name>'

"L<name>" -- a hyperlink

There are various syntaxes, listed below. In the syntaxes given, "text",
"name", and "section" cannot contain the characters '/' and '|'; and any '<'
or '>' should be matched.
           
  "L<name>"

Link to a Perl manual page (e.g., "L<Net::Ping>"). Note that "name" should not
contain spaces. This syntax is also occasionally used for references to UNIX
man pages, as in "L<crontab(5)>".
           
  "L<name/"sec">" or "L<name/sec>"
               
Link to a section in another manual page. E.g., "L<perlsyn/"For Loops">"
           
  "L</"sec">" or "L</sec>" or "L<"sec">"
               
Link to a section in this manual page. E.g., "L</"Object Methods">"
           
A section is started by the named heading or item. For example,
"L<perlvar/$.>" or "L<perlvar/"$.">" both link to the section started by
"=item $." in perlvar. And "L<perlsyn/For Loops>" or "L<perlsyn/"For
Loops">" both link to the section started by "=head2 For Loops" in perlsyn.

To control what text is used for display, you use "L<text|...>", as in:

  "L<text|name>"

Link this text to that manual page. E.g., "L<Perl Error Messages|perldiag>"

  "L<text|name/"sec">" or "L<text|name/sec>"

Link this text to that section in that manual page. E.g., "L<SWITCH
statements|perlsyn/"Basic BLOCKs and Switch Statements">"

  "L<text|/"sec">" or "L<text|/sec>" or "L<text|"sec">"

Link this text to that section in this manual page. E.g., "L<the various
attributes|/"Member Data">"

Or you can link to a web page:

  "L<scheme:...>"

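Links to an absolute URL. For example, "L<http://www.perl.org/>". Older
versions of perlpod state that there is no corresponding "L<text|scheme:...>"
syntax; current perlpod does allow it (e.g., "L<The Perl Home
Page|http://www.perl.org/>"), although older formatters may not support it.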

***** `E<escape>'

"E<escape>" -- a character escape

Very similar to HTML/XML "&foo;" "entity references":

  "E<lt>" -- a literal < (less than)
  
  "E<gt>" -- a literal > (greater than)
  
  "E<verbar>" -- a literal | (vertical bar)
  
  "E<sol>" -- a literal / (solidus)

The above four are optional except in other formatting codes, notably
"L<...>", and when preceded by a capital letter.

  "E<htmlname>"

Some non-numeric HTML entity name, such as "E<eacute>", meaning the same thing
as "&eacute;" in HTML -- i.e., a lowercase e with an acute (/-shaped) accent.

  "E<number>"

The ASCII/Latin-1/Unicode character with that number. A leading "0x" means
that number is hex, as in "E<0x201E>". A leading "0" means that number is
octal, as in "E<075>". Otherwise number is interpreted as being in decimal, as
in "E<181>".

Note that older Pod formatters might not recognize octal or hex numeric
escapes, and that many formatters cannot reliably render characters above 255.
(Some formatters may even have to use compromised renderings of Latin-1
characters, like rendering "E<eacute>" as just a plain "e".)

***** `F<filename>'

"F<filename>" -- used for filenames

Typically displayed in italics. Example: "F<.cshrc>"

***** `S<text>'

"S<text>" -- text contains non-breaking spaces

This means that the words in text should not be broken across lines. Example:
"S<$x ? $y : $z>".

***** `X<topic name>'

"X<topic name>" -- an index entry

This is ignored by most formatters, but some may use it for building indexes.
It always renders as empty-string. Example: "X<absolutizing relative URLs>"

***** `Z<>'

"Z<>" -- a null (zero-effect) formatting code

This is rarely used. It's one way to get around using an E<...> code
sometimes. For example, instead of "NE<lt>3" (for "N<3") you could write
"NZ<><3" (the "Z<>" breaks up the "N" and the "<" so they can't be
considered part of a (fictitious) "N<...>" code).

***** Multiple Angle Brackets

Most of the time, you will need only a single set of angle brackets to delimit
the beginning and end of formatting codes. However, sometimes you will want to
put a real right angle bracket (a greater-than sign, '>') inside of a
formatting code. This is particularly common when using a formatting code to
provide a different font-type for a snippet of code. As with all things in
Perl, there is more than one way to do it. One way is to simply escape the
closing bracket using an "E" code:

  C<$a E<lt>=E<gt> $b>

This will produce: "$a <=> $b"

A more readable, and perhaps more "plain" way is to use an alternate set of
delimiters that doesn't require a single ">" to be escaped. With the Pod
formatters that are standard starting with perl5.5.660, doubled angle
brackets ("<<" and ">>") may be used if and only if there is whitespace right
after the opening delimiter and whitespace right before the closing
delimiter! For example, the following will do the trick:

  C<< $a <=> $b >>

In fact, you can use as many repeated angle-brackets as you like so long as
you have the same number of them in the opening and closing delimiters, and
make sure that whitespace immediately follows the last '<' of the opening
delimiter, and immediately precedes the first '>' of the closing delimiter.
(The whitespace is ignored.) So the following will also work:

  C<<< $a <=> $b >>>
  C<<<<  $a <=> $b     >>>>

And they all mean exactly the same as this:

  C<$a E<lt>=E<gt> $b>

As a further example, this means that if you wanted to put these bits of code
in "C" (code) style:

  open(X, ">>thing.dat") || die $!
  $foo->bar();

you could do it like so:

  C<<< open(X, ">>thing.dat") || die $! >>>
  C<< $foo->bar(); >>

which is presumably easier to read than the old way:

  C<open(X, "E<gt>E<gt>thing.dat") || die $!>
  C<$foo-E<gt>bar();>

This is currently supported by pod2text (Pod::Text), pod2man (Pod::Man), and
any other pod2xxx or Pod::Xxxx translators that use Pod::Parser 1.093 or
later, or Pod::Tree 1.02 or later.
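
A quick way to confirm that the escaped and doubled-bracket spellings are
equivalent, sketched with Pod::Text; the helper `render' is hypothetical, and
the exact quoting Pod::Text puts around code text varies between versions:

```perl
use strict;
use warnings;
use Pod::Text;

# Hypothetical helper: render one POD paragraph to plain text with Pod::Text.
sub render {
  my ($paragraph) = @_;
  my $parser = Pod::Text->new;       # a fresh parser for each document
  my $text;
  $parser->output_string(\$text);
  $parser->parse_string_document("=pod\n\n$paragraph\n\n=cut\n");
  return $text;
}

my $escaped = render('C<$a E<lt>=E<gt> $b>');
my $doubled = render('C<< $a <=> $b >>');

print $escaped;
print 'same rendering: ', ($escaped eq $doubled ? 'yes' : 'no'), "\n";
```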

*** Object Oriented Programming

Ever since people began to write significantly complex programs in high-level
programming languages, there has always been some organizing facility
introducing necessary structure and abstraction to the elements of the program
and their flow. Procedures are very often the right answer to this problem;
however, the advent of systems representing complex, persistent data
structures stimulated the development of object-oriented programming,
pioneered by Simula and popularized by the revolutionary Smalltalk language
developed at Xerox PARC. Object-oriented programming, or OOP, revolves around
data: the fundamental units are the _class_, a description of an abstract
type; _methods_ and sometimes _attributes_, functions and qualities associated
with a class, respectively; and _instances_, data bearing all the features of
their class, yet retaining individuality through their own independently
mutable parts. Furthermore, OOP ardently promotes encapsulation, the total
separation of an object's implementation from its interface. All actions
respecting objects are discharged through this interface, usually by way of
the _methods_ mentioned above. OOP is neither the panacea nor the menace that
some people make it out to be. It is, however, frequently an appropriate
paradigm for the processes managed by computing in modern times.

**** OOP in Perl

Perl builds its object system out of packages, which already provide
separate namespaces (one feature vital to the implementation of an object
system), and adds a few more basic semantics to them to form classes. It's
that simple.

***** Method Invocation

In particular, Perl introduces the notion of the 'method invocation'. There
are two syntactic forms of method calling, and in turn two significant
semantic contexts in which method calls may happen, but in any case, a method
call passes the function an extra first argument, the _invocant_: either the
name of the package through which it was called or an instance of it. In the
latter case, methods may manipulate this instance data, thereby projecting an
opaque interface for its use. This is all it takes to promote a boring old
function call to a _method_.

****** Invocation Variants

******* Class Invocants

  package Foo;
  
  sub bar {
    print "@_\n";
  }
  
  package main;
  
  Foo::bar("Bar");   # Procedural invocation
  Foo->bar("Bar");   # Method invocation (arrow form)
  bar Foo ("Bar");   # Method invocation (indirect object form)

This will print `Bar', followed by `Foo Bar' twice. The infix arrow operator,
when used between a class (or package) name and a method, performs class
invocation of a method, passing the class name, `Foo', as an extra initial
argument. Furthermore, either operand may be an expression that symbolically
evaluates to an appropriate value, although I have thus far found that a
simple scalar variable is the only form of method name that will not
terminally confuse the parser. Ex:

  $method = "summon";
  $mage = Wizard->$method("Gandalf");  # Invoke Wizard->summon

This is not a breach of `use strict qw/refs/'!
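
A small, runnable sketch of both operands coming from variables, using a
hypothetical `Wizard' class whose `summon' method merely reports its
arguments. The last call shows a related form documented in perlobj: a code
reference after the arrow, called with the invocant as its first argument:

```perl
use strict;
use warnings;

# Hypothetical class used only for this illustration.
package Wizard;

sub summon {
  my ($class, $name) = @_;
  return "$class summons $name";
}

package main;

my $class  = 'Wizard';
my $method = 'summon';

# Both the class and the method name come from variables; strict refs is happy.
my $mage = $class->$method('Gandalf');    # same as Wizard->summon('Gandalf')
print "$mage\n";

# A code reference may also follow the arrow; the invocant is still passed first.
my $code  = \&Wizard::summon;
my $other = Wizard->$code('Saruman');
print "$other\n";
```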

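One may also use the method-invocant-list construct (known popularly as the
indirect object form), in which the list is optional. Current perlobj advises
avoiding this form outside of filehandle calls, since Perl can misparse it.
Ex: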

  $mage = summon Wizard "Gandalf";
  $nemesis = summon Balrog home => "Moria", weapon => "whip";
  move $nemesis "bridge";
  speak $mage "You cannot pass";
  break $staff;               # safer to use: break $staff ();

``The indirect object form even permits you to specify the INVOCANT as a BLOCK
that evaluates to an object (reference) or class (package). This lets you
combine those two invocations into one statement this way:

  speak { summon Wizard "Gandalf" } "friend";''

******* Instance Invocants

In Perl, instances are references on which the `bless' operator has conferred
special status. For the sake of continuity, I will not digress
immediately into the details of generating and handling instances, but suffice
it to say that instances are references with a little extra qualification
indicating the package of origin. In this case, the implicit initial argument
/is/ that reference, whose contents only methods may (or should, anyway)
manipulate directly. Ex:

  package Widget;
  
  sub new { return bless {}, $_[0] } # Ignore the meaning of this for now
  sub get { return $_[0]->{$_[1]} }
  sub set { $_[0]->{$_[1]} = $_[2]; return undef; }
  
  package main;
  
  my $foo = new Widget;
  $foo->set("foo", "bar");
  print $foo->get("foo");
  print ref $foo;

...which will print `bar', then `Widget'.

****** Resolution of Method Names

As mentioned earlier, symbol evaluation in method calls does not conflict with
'use strict qw/refs/', as both the package name and method are symbolically
evaluated at runtime anyway. Moreover, Perl may only determine the package of
a method at runtime, due to the complications imposed by various features of
the object system, such as inheritance and the class of the invocant. One
important implication is that prototypes are ignored on method calls, since
prototypes are applied at compile time. The exact behavior of a method may not
be pinned down during compilation due to inheritance, covered below.

******* Inheritance

The package variable `@ISA' lists the classes from which the current class
'inherits' methods (its immediate superclasses). Inheritance is the process of deriving
methods from superclasses in lieu of an explicitly specified method for a
given class, thereby eliminating redundancy in the definition of class
methods. To clarify the meaning of this mechanism, imagine a system of classes
that embodies the entire terrestrial biological spectrum, organized
taxonomically. The `Cnidaria' class inherits from `Radiata', `Radiata' from
`Animalia', and so forth. `Cnidaria' judiciously chooses not to implement
methods general to all classes that inherit from `Radiata', namely, those that
introduce radial growth patterns! Usually, the instance schema defined by a
parent of class X is a subset of the definition formed by X. As we have seen
before, `our @ISA = qw/Exporter/', or simply `use base qw/Exporter/', allows a
traditional module to inherit the `import' method from Exporter. (Which is
counterintuitive, in a way.)

To illustrate a more specific example, say that there exist two classes,
`Dan', and `Dan::Tranh'. A `Dan' is any chordophone, and a `Dan::Tranh' is a
particular type of chordophone, namely, a plucked zither. This class
is not wholly distinct from `Dan', bearing only a superset of its
functionality, so we can use the same constructor:

  package Dan::Tranh;
  
  use strict;
  use warnings;
  use Dan;
  
  our @ISA = qw/Dan/;
  
  ...

When one instantiates the `Dan::Tranh' class, the newly created instance will
inherit the `new' method from `Dan', if none be given in the `Dan::Tranh'
class itself. Perl will call it thusly: Dan::new("Dan::Tranh", ...), so that,
if the constructor be properly designed (see below), the instance will still
be blessed into the `Dan::Tranh' class.

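If `@ISA' names more than one superclass, the search for inherited methods
will traverse the first immediate superclass in the list and all of its
ancestors before moving on to the next; we say the search is 'depth-first'.
(This is Perl's default; since Perl 5.10, the `mro' pragma, as in
`use mro qw/c3/;', can select C3 ordering instead.)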
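
A sketch of the default depth-first order using hypothetical classes `Base',
`Left', `Right', and `Child'. `Child' inherits from `Left' and then `Right';
only `Base' (reached through `Left') and `Right' define `who', and the search
finds `Base::who' before it ever looks at `Right':

```perl
use strict;
use warnings;

# Hypothetical classes: Child inherits from Left and Right, Left from Base.
package Base;  sub who { return 'Base' }
package Left;  our @ISA = ('Base');
package Right; sub who { return 'Right' }
package Child; our @ISA = ('Left', 'Right');

package main;

# Depth-first search: Child, Left, Base (Left's parent), and only then Right.
my $found = Child->who;
print "Child->who found $found\n";
```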

******** `SUPER'

Within a method, the `SUPER' pseudo-class refers to the superclasses of the
package in which that method was compiled, whether the invocant is an instance
or a class name. Say that a `Tune' method exists in `Dan', with which the
unique `Tune' method in `Dan::Tranh' should share some commonalities. Ex:

  package Dan::Tranh;   # SUPER:: is resolved relative to this package
  use Carp;             # for croak
  
  sub Tune {
    my $self = shift;
    ref $self or croak 'Tune method for instance use only!';
    my $tuned;
  
    until ($tuned) {
      $self->SUPER::Tune;     # Pluck the strings for sound; general method in `Dan'
      $tuned = $self->Adjust; # The Adjust method is unique to `Dan::Tranh'
    }
  }

It is important to note that `SUPER' refers to the superclasses defined by
`@ISA' in the _current package_, not the superclasses of the invocant! Imagine
there are two classes, `Foo' and `Bar', and the latter inherits method `Frob'
from the former. `Foo', in turn, inherits from `Quux', which has its own
definition of `Frob'. If `SUPER' referred to the superclasses of the invocant,
calling `$self->SUPER::Frob' in `Foo::Frob', where `$self' is an instance of
`Bar', would simply recurse instead of calling `Quux::Frob'!
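
The `Foo'/`Bar'/`Quux' scenario as a runnable sketch (the constructor and the
return values are hypothetical): because `SUPER' is resolved against
`@Foo::ISA', calling `Frob' on a `Bar' instance reaches `Quux::Frob' rather
than recursing:

```perl
use strict;
use warnings;

package Quux;
sub new  { my $class = shift; return bless {}, $class }
sub Frob { return 'Quux::Frob' }

package Foo;
our @ISA = ('Quux');

sub Frob {
  my $self = shift;
  # SUPER:: is resolved against @Foo::ISA (the package this sub was
  # compiled in), not against the class of $self.
  return 'Foo::Frob > ' . $self->SUPER::Frob();
}

package Bar;
our @ISA = ('Foo');   # Bar defines no Frob of its own

package main;

my $bar = Bar->new;   # Quux::new, but blessed into Bar
print $bar->Frob, "\n";
```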

It may seem like a good idea to use `SUPER' in constructors, instantiating a
superclass and then 'reblessing' the instance into the new class, but it is
safer not to do so, as the inheriting class is now responsible for
cleaning up after its parent. (That's what nursing homes are for!!) Here be
dragons. If you must do this, do it when you are sure there is no special
procedure for destruction in the superclass. In general, it is better to use
some form of aggregation in the instance.

******** `UNIVERSAL'

`UNIVERSAL' is the absolute superclass. The coming Mahapralaya will incur the
dissolution of `UNIVERSAL', followed by the advent of a new Perl object
system. This class defines a few essential utility methods useful for general
reflection into the object system (you may add your own as well!). These are:

********* `isa'

"isa" returns true if its object is blessed into "CLASS" or a subclass of
"CLASS".

You can also call "UNIVERSAL::isa" as a subroutine with two arguments. The
first does not need to be an object or even a reference. This allows you to
check what a reference points to, or whether something is a reference of a
given type. Example:
               
  if(UNIVERSAL::isa($ref, 'ARRAY')) {
      #...
  }
           
To determine if a reference is a blessed object, you can write
               
  print "It's an object\n" if UNIVERSAL::isa($val, 'UNIVERSAL');

********* `can'

"can" checks to see if its object has a method called "METHOD". If it does,
then a reference to the sub is returned; if it does not, then undef is
returned.

"UNIVERSAL::can" can also be called as a subroutine with two arguments. It'll
always return undef if its first argument isn't an object or a class name. So
here's another way to check if a reference is a blessed object

  print "It's still an object\n" if UNIVERSAL::can($val, 'can');

You can also use the "blessed" function of Scalar::Util:

  use Scalar::Util 'blessed';

  my $blessing = blessed $suspected_object;

"blessed" returns the name of the package the argument has been blessed into,
or "undef".
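
A short sketch tying `isa', `can', and `blessed' together, with hypothetical
`Animal' and `Dog' classes:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical classes for illustration.
package Animal;
sub new   { my $class = shift; return bless {}, $class }
sub speak { return '...' }

package Dog;
our @ISA = ('Animal');
sub speak { return 'Woof' }

package main;

my $dog = Dog->new;

print "a Dog isa Animal\n" if $dog->isa('Animal');   # inherited
print "a Dog isa Dog\n"    if $dog->isa('Dog');      # the class itself counts

# can() returns a code reference (or undef), which may be called directly.
if (my $speak = $dog->can('speak')) {
  print $speak->($dog), "\n";                        # runs Dog::speak
}

print 'blessed into ', blessed($dog), "\n";
print "a plain hashref is not blessed\n" unless defined blessed({});
```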

********* `VERSION'

"VERSION" returns the version number of the class (package). If the NEED
argument is given then it will check that the current version (as defined by
the $VERSION variable in the given package) is not less than NEED; it will die
if this is not the case. This method is normally called as a class method.
This method is called automatically by the "VERSION" form of "use":

  use A 1.2 qw(some imported subs);
  # implies:
  A->VERSION(1.2);
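
A sketch of `VERSION' called directly, with a hypothetical class `Foo' whose
`$VERSION' is 1.5; the `eval' traps the error raised when the requested
version is higher than the one available:

```perl
use strict;
use warnings;

package Foo;            # hypothetical class
our $VERSION = '1.5';

package main;

print 'Foo version: ', Foo->VERSION, "\n";

Foo->VERSION(1.2);                        # passes: 1.5 is not less than 1.2

my $ok  = eval { Foo->VERSION(2); 1 };    # dies: 1.5 is less than 2
my $err = $@;
print "too old: $err" unless $ok;
```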

****** Private Methods

I'll get to this when I get to this.

****** `->' Associativity

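Here's a handy trick: because `->' associates left to right, method calls can
be chained, as in `Wizard->summon("Gandalf")->speak("friend");', provided each
method returns an invocant for the next call (here, `summon' must return an
instance).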
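
A minimal sketch of a class that supports that chain, assuming
(hypothetically) that `summon' is a constructor returning the new instance and
that `speak' returns a string:

```perl
use strict;
use warnings;

# Hypothetical class: summon is a constructor that returns the new instance,
# so a further method can be called on its result.
package Wizard;

sub summon {
  my ($class, $name) = @_;
  return bless { name => $name }, $class;
}

sub speak {
  my ($self, $word) = @_;
  return "$self->{name} says '$word'";
}

package main;

# Parsed as (Wizard->summon('Gandalf'))->speak('friend')
my $line = Wizard->summon('Gandalf')->speak('friend');
print "$line\n";
```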

***** Instance Construction

Recall that an instance is a reference qualified with a class designation. But
how does this happen to begin with? In lieu of symbol exportation, most
classes will furnish a constructor method, which is (usually) called with a
class invocant and returns an instance. Customarily, the constructor is called
`new', although one could call it anything valid. Compare this to `argc' and
`argv' in C, which may certainly be written:

  int main (int foo, char **bar) {
    ...
  }

(Just don't shit on convention for the sake of shitting on convention.) One
might write such a constructor as this: 

  package Foo::Bar;
  
  use strict;
  use warnings;
  use Carp;
  
  sub new {
    # Get the invocant
    my $self = shift;
  
    # `new' should not be called on instances!!
    ref $self and croak 'Not an instance method!';
    
    # Create the instance hashref and fill in some qualities 
    my $instance = {}; # Most common idiom for instance data
    $instance->{This} = That(42);
    $instance->{FillIn} = SomeMoreWidgets('goose');
  
    # Return the blessed hashref
    return bless $instance;
  }

The first line, `my $self = shift', pervades Perl OOP code of all kinds. It
simply bungs the invocant, whether this is the class name or an instance, into
a lexical variable called `$self', so that the remaining arguments may be
processed uniformly. (You probably don't want to use the invocant in a
`foreach' loop!) The next active line checks if `$self' is an instance and
quits with an error if it is. (We'll get to the Carp module, which
performs sophisticated contextual error reporting.) Some programmers actually
write constructors that exhibit polymorphism by /copying/ the instance when
passed one, so this is not mandatory behavior. The next lines create a hashref
which will become the new instance, and fill in some default data. Hashrefs
are the most common instance template in Perl, because their associative
nature is greatly flexible and naturally corresponds to the concept of an
entity with a list of qualities and their names. However, the referent may be
anything; `IO::Handle', in particular, simply uses an anonymous typeglob. But
the crux of the constructor is the invocation of `bless', without which this
would not work at all! It takes one or two arguments, the reference to be
promoted and a class into which to bless it. The second, as you can see, is
optional, and defaults to the current package.

When used in a class where inheritance (see above) can be expected, it is
crucial to use the two-argument form of `bless' in constructors, passing the
class name supplied through invocation of the method. Do not hardcode class
names, ever!

  sub new {
    my $self = shift;
    ...
    return bless $instance, $self; # Right
  }
  
  sub new {
    my $self = shift;
    ...
    return bless $instance, 'Foo'; # WRONG
  }
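
A runnable sketch of the difference, reusing the `Dan' and `Dan::Tranh' names
from the inheritance discussion; the attribute handling and the
`new_hardcoded' method are hypothetical. Both constructors are inherited by
`Dan::Tranh', but only the two-argument `bless' with the invocant yields a
`Dan::Tranh':

```perl
use strict;
use warnings;

package Dan;

sub new {
  my $class    = shift;
  my $instance = { @_ };            # hypothetical: attributes passed as pairs
  return bless $instance, $class;   # Right: bless into the invocant's class
}

sub new_hardcoded {
  my $class = shift;
  return bless {}, 'Dan';           # WRONG: ignores the invocant
}

package Dan::Tranh;
our @ISA = ('Dan');                 # inherits both constructors

package main;

my $right = Dan::Tranh->new(owner => 'Lan');
my $wrong = Dan::Tranh->new_hardcoded;

print ref($right), "\n";            # Dan::Tranh
print ref($wrong), "\n";            # Dan
```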

And one other important observation from `perlobj':

``A clarification: Perl objects are blessed. References are not. Objects know
which package they belong to. References do not. The bless() function uses the
reference to find the object. Consider the following example:

  $a = {};
  $b = $a;
  bless $a, 'BLAH';
  print "\$b is a ", ref($b), "\n";

This reports $b as being a BLAH, so obviously bless() operated on the object
and not on the reference.''

****** Handling Global Data

One axiom of object-oriented design is that the object negotiates all
transactions. For this reason, it is essential that any access to global data
occurs through a reference in the definition of an instance. Of course, the
constructor is responsible for establishing such a reference. Ex:

  package Bar;
  
  our %fizzle = ( 'Password' => 'XYZZY' );
  
  sub new {
    my $type = shift;
    my $self = {};
    $self->{'fizzle'} = \%fizzle;
    return bless $self, $type;
  }

This reference to `%fizzle' preempts a number of nasty ambiguities involving
scoping. In particular, imagine that `Bar' inherits from `Foo'. To what would
an inherited method refer in naming `%fizzle' directly? Most likely a package
variable in `Foo', or something else you don't want. Without this needed
degree of indirection, global data is very hard to use safely in objects.

***** Internal Method Usage

Object-oriented programming scales better when methods themselves make use of
the encapsulation their class provides. Ex:

  sub hairball {
    my ( $self ) = @_;
    my $vomit = $self->feed();
    $self->feed( "empty" );
    return $vomit;
  }

...is better form than:

  sub hairball {
    my ( $self ) = @_;
    my $vomit = $self->{ stomach };
    $self->{ stomach } = "empty";
    return $vomit;
  }

The former method protects the class from changes in its own implementation!

***** Destructors

Destructor methods are automatically called when Perl decommissions an object,
and must be named `DESTROY'. In most cases, Perl's GC obviates any need for
defining a destructor in the first place. However, consider the common
facility of `census taking' in objects. Here, it is necessary to include a
reference to a global `$census' (see above). The destructor, then, might
decrement this census:

  sub DESTROY {
    my $self = shift;
    --${ $self->{Census} };
  }
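
A sketch of census taking along these lines, with a hypothetical class
`Critter': each instance holds a reference to the package variable `$census',
the constructor increments it, and `DESTROY' decrements it when the object is
freed. Since Perl frees lexically held objects when their last reference goes
away, leaving the block runs both destructors:

```perl
use strict;
use warnings;

# Hypothetical class that counts its live instances.
package Critter;

our $census = 0;                        # global head count

sub new {
  my $class = shift;
  my $self  = { Census => \$census };   # reach the global through a reference
  ++${ $self->{Census} };
  return bless $self, $class;
}

sub DESTROY {
  my $self = shift;
  --${ $self->{Census} };
}

package main;

{
  my $x = Critter->new;
  my $y = Critter->new;
  print "inside block: $Critter::census\n";   # 2
}
print "after block: $Critter::census\n";      # 0: both objects destroyed
```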