-*- mode: text; mode: outline-minor -*-

* Steve's Place - Perl Tutorial

** Lesson 1

*** Warning Pragma
Warnings can be enabled with the `use warnings' statement or with the -w switch, given either on the command line or in the shebang line.

*** Escape Sequences in Double-Quoted Strings
These are useful within double-quoted strings:

    \n                - newline
    \t                - tab
    \r                - carriage return
    \f                - form feed
    \\                - literal backslash
    \"                - literal double quote
    \a                - alarm character (bell)
    \cC               - control character C-c
    \e                - escape character
    \b                - backspace character
    \033              - escape in octal
    \x7f              - DEL in hexadecimal
    \x{263a}          - Unicode (smiley)
    \N{DEVANAGARI OM} - with `use charnames', a named Unicode character
    \u                - force next character to uppercase ("titlecase" in Unicode)
    \l                - force next character to lowercase
    \U                - force all following characters to uppercase
    \L                - force all following characters to lowercase
    \Q                - backslash all following nonalphanumeric characters
    \E                - end \U, \L, or \Q

*** Escape Sequences in Single-Quoted Strings
The only escapes available in single-quoted strings are \\ and \'.

*** Number Clarification
Adding underscores to a number literal does nothing to change the value; it is useful, however, for delimiting segments of a number (e.g., 123_456).

*** Barewords
These work for backwards-compatibility reasons, but are extremely poor style except in a few limited contexts (such as hash construction).

*** Filehandles
The STDIN filehandle is defined by the Perl interpreter itself. Applying the diamond (angle-bracket) operator to a filehandle in scalar context (e.g., <STDIN>) reads a line (or rather, reads up to and including the input record separator) and returns it, e.g., "Foo\n". In list context, it reads all remaining records, splitting on the input record separator, and returns them as a list.

*** Scalars
    $name = <STDIN>;

Remember, $ for $calar. A $calar holds a $ingle value, such as a string, a number (integer or floating point), or a reference.
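The filehandle and scalar notes above can be tried without typing at a terminal. This is a minimal sketch that assumes an in-memory filehandle (opened on a reference to a string) as a stand-in for STDIN; $input, $fh, and the other names are purely illustrative:

```perl
use strict;
use warnings;

# Hypothetical input: an in-memory filehandle stands in for STDIN,
# so the example runs without anyone typing at a terminal.
my $input = "Foo\nBar\nBaz\n";
open my $fh, '<', \$input or die "Cannot open in-memory filehandle: $!";

# Scalar context: one record, up to and including the "\n" separator.
my $first = <$fh>;

# List context: all remaining records, one per element.
my @rest = <$fh>;
close $fh;

# A scalar holds exactly one value: a string, a number, or a reference.
my $number = 123_456;    # underscores are purely visual
my $ref    = \@rest;     # a reference to the array of remaining lines

print "first record: $first";
print "records left: ", scalar(@$ref), "\n";
print "number:       $number\n";
```

Note that $first still carries its trailing newline; that is what chomp is for.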

*** Undefining Variables
`undef' undefines the value of a variable and frees the memory associated with it (e.g., `undef $foo' or `undef @foo'), but the variable itself stays in the symbol table or scratchpad. (Objects need no manual cleanup: Perl calls any DESTROY method automatically once the last reference to an instance goes away.)

*** chomp
The chomp function removes a trailing input record separator ($/, normally "\n") from a string variable, if one is present.

*** Function Calling
The parentheses surrounding a parameter list in Perl may usually be omitted, except where they are needed for clarity or to resolve ambiguity. Ex:

    chomp $name;

*** Variable Interpolation in Strings
    print "$name likes beans.\n";

is way more convenient than

    print $name, " likes beans.\n";

(The latter works because print accepts a list of arguments.) Note that, again, interpolation occurs only in double-quoted strings. It also works for individual hash and array elements. Ex:

    print "The address of $name is $addresses{$name}\n";

** Lesson 2

*** Typing in Perl
Perl can be considered a weakly typed language, in most senses. One may freely assign a variety of values to a scalar at any time during the compilation and execution of a Perl program, and coercion applies in a manner similar to other languages. However, any given entry in the symbol table (a typeglob) has several distinct compartments (scalar, array, hash, subroutine, filehandle, and so on), each of which may independently hold a value of its own type; lexical variables with different sigils ($x, @x, %x) are likewise separate. In this way, Perl retains some aspects of strong typing, though it hides them from the user, reducing any practical import this feature might otherwise have.

**** Scalar Numbers
These are best specified without quotes, although one can certainly use an expression like:

    my $foo = "22";

and then:

    print "The answer to the Question is: ", $foo + 20, "\n";

As in Tcl, the string is converted to the appropriate number at runtime, when it is used in numeric context.

*** Arrays and Lists in Perl
The @ sign indicates an @rray: a mutable sequence of scalars in the array compartment of a symbol table entry, whose members the programmer may access in any order, at any time, with a numerical index.
Ex:

    @array = ( 12.5, 'plop', "some tabs\t\t\t\t", $chocolate, 56, "C" );

**** List Constructors
The parentheses represent a _list constructor_, whose value is assigned to the array compartment of `array' in the symbol table. List constructors may contain literal values, scalar variables, and even function calls, when qualified with calling parentheses (e.g., caller()). List constructions may also include other nested list constructions, which are flattened into the enclosing list. Ex:

    my @bits = ( 'hello, sailor', 1, qw( mung adzuki haricot ), "dal\n", "\t", $thing );

**** The Difference Between Arrays and Lists
The official Perl documentation does draw a distinction between arrays and lists (which is why `wantarray' should have been named `wantlist', according to perldoc -f wantarray). A list is any sequence of scalars; reading from a filehandle in list context produces one, for instance. An array is a variable that holds a list. That is all.

**** Array and List Indexing
For arrays, one qualifies the variable name with a dollar sign before, and a square-bracketed index (C style; starting at zero) after. The index may be any expression that evaluates to a number, such as a function call ($array[FindRightIndex()]) or an arithmetic expression ($array[42 + $offset]), but it is usually a plain integer. Ex: $array[8]. One may also index a literal list in a like manner by wrapping it in parentheses. Ex:

    my $second = ( qw/foo bar baz/ )[1];    # 'bar'

Negative indices to an array or list start at the end of the list and move backwards through it. Ex:

    my @beans = qw/adzuki haricot mung/;
    print "$beans[-1]\n";    # mung

Slices accept negative indices too (@beans[-2, -1]), but ranges do not wrap around as in Python: @beans[1..-1] is empty, because 1..-1 is an empty range. Write @beans[1..$#beans] instead.

***** Slicing Arrays
One can access an arbitrary selection of elements from an array or list by specifying a list of indices between the square brackets (any old list, as per the rules above in Arrays and Lists in Perl), and prefacing the variable with a @, as multiple values are involved.
Ex:

    my @cake = qw/flour eggs milk sugar butter sultanas water/;
    my @SliceOfCake = @cake[3, 6];
    print "@SliceOfCake\n";    # sugar water

****** l-value Usage
Perl does some pretty clever, unexpected things with l-values, and its usage of slices in that context is one of them. Ex:

    my @array = qw/hello everybody I'm Dr. Nick Riviera/;
    print "@array\n";    # hello everybody I'm Dr. Nick Riviera
    @array[3..5] = qw/a complete charlatan/;
    print "@array\n";    # hello everybody I'm a complete charlatan

**** Array Interpolation in Double-Quoted Strings
As seen in the above examples, interpolating an array or array slice into a double-quoted string puts a single space (the value of $") between the elements. Do not forget this!!

**** Shell-Style Boundary Indicator
$#foo yields the last index in an array. Ex:

    my @foo = qw/foo bar baz/;
    print $#foo;    # 2

***** lvalue Usage
Assigning an integer to $#foo can increase the size of an array (to preempt repeated allocation for a large array) or decrease it. Ex:

    my @foo = qw/this array is rather much too long/;
    $#foo = 2;
    print "@foo";    # this array is

*** Semantics: Difference between Perl and perl
Perl: the Perl programming language itself, in the abstract. Perl is ideally the same on all platforms, although the details of perl (see below) sometimes differ due to the peculiarities of the underlying system.

perl: the name of the program that executes a Perl program. This traditionally refers to a Unix instance of such a program.

These are in fact quite distinct.

*** Perl Hashes
These are also known as 'associative arrays' or, in programming languages such as Python and Visual Basic (makes the sign of Our Ford), as dictionaries. They occupy their own compartment in a symbol table entry and are signified syntactically by the percent sign, perhaps suggesting the relation between a key and a value. Ex:

    my %hash = ( foo => "bar", baz => "quux" );

**** Hash Constructors
These are a peculiar brand of list constructors, which consist essentially of pairs of scalar values, where the former constitutes a key, and the latter a value.
The difference in semantics is imposed only by the assignment itself, and not by any characteristic of the constructor; it is fundamentally no different from any other list constructor.

***** The Fat Comma
`=>' can be used to separate a key and a value. It has the advantage of clarifying the nature of the hash constructor, and it automatically quotes a bareword identifier on its left, so such keys need no quotes. There is nothing that prevents its use in regular list construction; if qw// did not exist, one could emulate it with:

    my @array = ( foo => bar => baz => );

**** Hash Indexing
As with arrays, the other mutable named type, an individual hash element is prefaced with a dollar sign, but it is followed by an index (the key) in curly braces. The index may be any expression that evaluates to a usable scalar; it is converted to a string, so strings are the most common index. In particular, simple identifier keys may be written as barewords within the braces, even with the warnings and strict pragmata enabled. Ex:

    print "$hash{key}\n";

***** Hash Slicing
    my %trees = (
        apple => "Malus",
        pear  => "Pyrus",
        plum  => "Prunus",
        oak   => "Quercus",
        ash   => "Fraxinus",
        yew   => "Taxus"
    );
    print "@trees{qw/apple pear plum/}\n";    # Malus Pyrus Prunus

Notice that this syntax corresponds to that of array slicing.

****** lvalue Usage
Again, the same parallelism exists; using %trees as above, this changes the Latin classification to Hindi names of the actual fruit of the trees:

    @trees{qw/apple pear/} = qw/seba naashapatii/;
    print "@trees{qw/apple pear/}\n";    # seba naashapatii

**** Hash Evaluation in List Context
This flattens a hash back into a list of key/value pairs, like the list it was constructed from (though in no particular order). Ex, this awesome idiom:

    %by_name = reverse %by_address;    # Invert the hash

This only works cleanly if the values are unique, though; otherwise, duplicate values collapse into a single key and some pairs are lost.

**** Hash Evaluation in Scalar Context
Before perl 5.26, this yields a string representing a fraction whose numerator is the number of used buckets and whose denominator is the number of allocated buckets. Since perl 5.26, it simply yields the number of keys.

    print scalar %hash;    # Something like 42/64 before perl 5.26; the key count afterwards

*** Whitespace Processing in Perl
Where whitespace is needed to distinguish tokens, any amount or kind is usable, in the tradition of Algol. Observe the hash construction of %trees for such an example. The trade-off is the need for an explicit statement terminator, which can be damned annoying if one is not used to it.

*** Independent Availability of Symbol Table Entries
Remember, $beans is completely different from @beans! Think MacLISP or Common LISP.

*** Escaping Variable Syntax
As with other escape sequences, \@ and \$ are useful for escaping what would otherwise be interpolated in double-quoted strings as scalars, arrays, slices, or individual elements of the mutable data types. Ex:

    print "\@beans contains ", scalar @beans, " members, @beans.\n";

*** Scalar Context of Arrays
An array used in scalar context evaluates to an integer equal to the number of elements in the array. This does not carry over to literal lists such as those created by qw//: in scalar context, a comma-separated list yields its last element, not a count. Scalar context is, however, useful with functions such as caller, which in scalar context returns just the package name of the calling code. What a function returns in scalar context is up to the function itself (a user-defined subroutine can check wantarray), and it need not follow the element-count convention used by arrays.

**** Examples of Scalar Context

***** Assignment to Scalar Variables
    my $number = @things;

$number will hold the number of elements in @things. Note that this is different from:

    my ($number) = @things;

...or, if $number has already been declared, simply:

    ($number) = @things;

which both retain list context, assigning the first value in @things to $number. (More on this later.)

***** Explicit Context with scalar
The scalar function is useful for imposing scalar context where it cannot be implied, such as within the argument list of print.
There is no corresponding function for imposing list context, because Perl supplies list context wherever a list is expected, such as in the arguments to print. Here I will use the example presented in Escaping Variable Syntax again:

    print "\@beans contains ", scalar @beans, " members, @beans.\n";

Without scalar, @beans (outside the quotes) would be evaluated in list context and its elements printed munged together (e.g., adzukiharicotmung).

*** C-Style Numerical for Loops

**** Basic Form
The C-style for control structure proceeds as follows:

    for (initialization; test; increment) {
        body statements...
    }

A more specific example:

    for ( my $i = 0; $i < scalar @beans; ++$i ) {
        print "$name likes $beans[$i] beans.\n";
    }

The initialization statement will traditionally (even idiomatically) establish a variable for looping, which, in an accompanying tradition, is usually $i. The test statement is then evaluated. If it is true, Perl proceeds to execute the block of statements given to the control structure; otherwise control passes to the statement following the loop. Upon completing execution of the block, Perl executes the third statement, responsible for incrementing the loop variable, and then evaluates the test again. (Although all three can do whatever the hell you want, which is occasionally useful.)

Here we use the pre-increment operator, as opposed to the post-increment operator, although it really does not matter in this context. It is important, however, to realize the difference between the two: ++$i increments $i and yields the new value, while $i++ yields the old value and then increments.

Note that C-style for loops cannot be rewritten as Anglified statement modifiers, as generic for loops can (e.g., print for qw/foo bar baz/;).

**** Truth and Falsehood
The above description of the `for' loop raises the interesting question of what value exactly will pass the second 'test' clause. In Boolean context, these values are false: undef (such as undefined variables, functions that return undef, etc.), the empty string, the number zero, and the string "0". Everything else is true (including references).
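A quick way to check these rules is to test a handful of values directly. This is a minimal sketch; the labels and the %is_true hash are just illustrative names:

```perl
use strict;
use warnings;

# Each pair is a label and a value to test in Boolean context.
my @cases = (
    [ 'undef'        => undef ],
    [ 'empty string' => ''    ],
    [ 'number 0'     => 0     ],
    [ 'string "0"'   => '0'   ],
    [ 'string "0.0"' => '0.0' ],   # true: only "" and "0" are false strings
    [ 'string "00"'  => '00'  ],
    [ 'string " "'   => ' '   ],
    [ 'array ref'    => []    ],   # any reference is true, even to an empty array
);

my %is_true;
for my $case (@cases) {
    my ( $label, $value ) = @$case;
    $is_true{$label} = $value ? 1 : 0;
    printf "%-14s => %s\n", $label, $is_true{$label} ? 'true' : 'false';
}
```

The "0.0" and "00" cases are the usual surprises: they look like zero, but as strings they are neither "" nor "0", so they are true.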

The Fourfold Truth of Perl:

    Any string is true except for "" and "0".
    Any number is true except for 0.
    Any reference is true.
    Any undefined value is false.

*** Useful Operators

**** Pre- and Post-Increment
You'll never guess what expressions like --$i and $i-- do.

**** Arithmetic Operators
+, -, /, and * should be fairly obvious. ** is the less obvious exponentiation operator, and % takes the modulus (remainder) of a division. Two string operators round things out: x repeats a string (e.g., print 'hello' x 3;), and . concatenates two strings (e.g., print 'foo' . 'bar';).

**** Arithmetical Assignment Operators
    $i += 2;     # Add two to the value of $i
    $i **= 3;    # Cube $i
    $i x= 3;     # Repeat the string in $i three times

Etc...

*** Return Values of Assignments
An assignment returns the variable that was assigned to, as an lvalue (not merely a copy of its value), which can be useful for shortening statements. Ex:

    chomp( my $name = <STDIN> );

This assigns an input record from the predefined STDIN filehandle to the lexically scoped variable $name, and then chomps $name itself.

*** Some Quoting Operators

**** The 'Quote Words' Operator
The quote words operator, `qw', constructs a list consisting of the whitespace-delimited words between its opening and closing delimiters, which may be a pair of identical non-alphanumeric characters (traditionally the forward slash) or a natural bracketing pair such as {}. Ex:

    my @list = qw/foo bar baz/;
    @list = qw#foo bar baz#;
    @list = qw^foo bar baz^;
    @list = qw{foo bar baz};    # All four build the same list

Strings within a quote words construct are considered singly-quoted; there is no variable substitution or expansion of escape sequences.

**** Generic Forms of Single and Double Quoting
q and qq provide generic single and double quoting, which relieve the programmer of the tedious escape sequences needed to include the quote characters themselves in a string.

***** Escaping in Quoting Operators
If a terminating character must be used with one of these quoting operators, it can be escaped with a backslash, as with other special characters.
Ex:

    my $answer = 42;
    print qq:The answer to the Question is\: $answer\n:;

*** The Generic `for' Loop

**** Basic Form
    for my $var (LIST) {
        body statements...
    }

A more specific example; note that the parentheses around the list are required, even when it is a single array like @beans:

    foreach my $bean (@beans) {
        print "$name likes $bean beans.\n";
    }

`for' and `foreach' are synonyms; use whichever reads better. The loop variable is an alias to each element in turn, and it is local to the loop.

***** Introducing $_
The declaration of a variable may be omitted, in which case the default variable, $_, is aliased to each element of the list in turn. Likewise, many functions operate on this common default variable if no arguments are given. Ex:

    for (qw/foo bar baz/) {
        print;
    }

will just print out each member (run together, since no newlines are added). Another example:

    foreach (@beans) {
        print "$name likes $_ beans.\n";
    }

**** Statement Modifier Form
The generic `for' can even be used as a statement modifier, when assuming the default variable of $_. Ex:

    print for qw/foo bar baz/;

*** Perl Variable Names
User variables should generally begin with a letter or an underscore, followed by any combination of alphanumeric characters and underscores. However, there are a number of influential 'punctuation' variables that affect the fundamental operation of the Perl program, such as the default variable $_ and $/, the input record separator. If the utf8 pragma is in effect, Unicode characters which represent elements of human speech and numbers are also valid in variable names ('alpha'numeric is too broad where one might be using Katakana or pre-composed Hangul jamo).

** Lesson 3

*** Lexical Scope
The keyword `my' declares a lexically scoped variable. You know how this works; Perl even has closures and 'deep binding' (a closure keeps access to the lexical variables that were in scope where it was defined), etc. Lexical bindings established in the openings of control and iteration structures extend to their bodies. Ex:

    while ( defined( my $tree = <> ) ) {
        chomp $tree;
        # Stuff involving $tree...
    }

Presumably, this applies for `local' variables as well, but I've always been a fan of lexical scope myself.

An interesting variation on lexical scoping is `our'. `our' actually creates a package variable with a lexical alias: the variable is accessible to other packages through full qualification (e.g., $Foo::Bar) and present in the symbol table, but within the current block its unqualified name can be used like a lexical variable. This is especially useful under the `strict' pragma, where one may not reference global variables without a package name.

*** `strict' Pragma
Unless it's a one-off script or -e expression, you want this. The `strict' pragma disallows use of global variables without package qualification (unless declared by `our'), symbolic references, and barewords used as strings: nasty shit that you should not use on a regular basis.

*** C-Style `while' Loops
As in C, `while' loops repeatedly execute the contents of a block as long as a condition remains true. ('Execute' is a better word than 'evaluate', because `while' loops cannot return any value.) The evaluation of the condition always precedes the block; if it is false the first time, the block never runs at all. Ex:

    while ( my $type = pop @peas ) {
        print "$type peas are ", flavour( $type ), ".\n";
    }

This will run until there are no more elements in @peas, because the return value of an assignment is the variable itself. The final `pop' on the empty array returns undef, which is assigned to $type and ends the loop. (It would also stop early on an element that is itself false, such as "0" or "", so wrap the assignment in defined() when that matters.)

It is interesting to note that `while' (unlike `foreach') does not implicitly localize any variables, hence the use of `my' here. If, for example, you need to iterate over a filehandle (see below) while preserving the global value of $_, you might do something like this:

    while ( defined( local $_ = <$fh> ) ) { ... }

**** Statement Modifier Form
This is pretty routine by now, I would imagine.
But ex:

    print while (<STDIN>);

***** Cautionary Statement on the Use of `my'
The behavior of `my' in the condition of a statement modifier is undefined, such as in:

    print $type while my $type = pop @peas;

...so don't do it. (It doesn't work at all in perl 5.8.6 for i486-linux.)

**** Negated `while' Loop
`until', as a control structure with a block or as a statement modifier, does the inverse of `while'; it is one of Perl's many convenient syntactosaccharides.

**** `continue' Block
After each normal pass through the body of a `while', `foreach', or raw block, or following a `next' statement (see below), any `continue' block specified will be executed just before the next iteration begins. The canonical example of this is the behavior of the `-p' switch:

    LINE:
    while (<>) {
        ...    # your program goes here
    } continue {
        print or die "-p destination: $!\n";
    }

Note that lexical variables declared in the `while' condition are also in scope in the `continue' block (those declared inside the main block are not).

*** Raw Blocks
A lone block delimited by curly braces is semantically equivalent to a loop that executes once. `my' and `local' declarations will limit their scope to such a block, and it is possible to use `last', `next', `continue' blocks, and all that other good shit you would expect with a `while' or `foreach'. Ex:

    SWITCH: {
        if (/^abc/) { $abc = 1; last SWITCH; }
        if (/^def/) { $def = 1; last SWITCH; }
        if (/^xyz/) { $xyz = 1; last SWITCH; }
        $nothing = 1;
    }

*** Other Array Operators
    push ARRAY, LIST     - add LIST to the end of ARRAY
    pop ARRAY            - remove and return the last element of ARRAY
    shift ARRAY          - remove and return the first element of ARRAY
    unshift ARRAY, LIST  - bung LIST values onto the beginning of ARRAY, maintaining their order

**** splice
`splice' gets its own section. This is an extremely flexible, extremely ugly general Swiss Army knife operator for arrays. The full form of `splice' proceeds so: splice ARRAY, OFFSET, LENGTH, LIST. `splice' removes LENGTH elements starting at index OFFSET (unless either is negative; see below), and replaces them with the elements of LIST; the array will grow or shrink as necessary.
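A minimal sketch of the full four-argument form, using a made-up @beans array; the special cases for negative or omitted arguments are left out here:

```perl
use strict;
use warnings;

my @beans = qw/adzuki haricot mung pinto lima/;

# splice ARRAY, OFFSET, LENGTH, LIST:
# starting at index 1, remove 2 elements and insert 3 in their place.
my @removed = splice @beans, 1, 2, qw/black kidney navy/;

print "removed: @removed\n";               # removed: haricot mung
print "now:     @beans\n";                 # now:     adzuki black kidney navy pinto lima
print "size:    ", scalar(@beans), "\n";   # size:    6 (the array grew by one)
```

Because LIST is longer than LENGTH, the array grows; a shorter LIST would shrink it instead.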

If OFFSET is negative, the offset is counted that many elements back from the end of the array. If LENGTH is negative, `splice' removes everything from OFFSET onward except the last abs(LENGTH) elements. If LENGTH is omitted altogether, it removes everything from OFFSET onward. If LIST is omitted, it simply removes the elements designated by OFFSET and LENGTH. Lastly, if no parameters but ARRAY (the only required one) are specified, it wipes the whole array (the same as $#foo = -1). `splice' returns the elements removed in list context, the last element removed in scalar context, or undef if none were removed.

***** Equivalences from the Official Documentation
The following equivalences hold (assuming $[ == 0 and $#a >= $i):

    push(@a,$x,$y)       splice(@a,@a,0,$x,$y)
    pop(@a)              splice(@a,-1)
    shift(@a)            splice(@a,0,1)
    unshift(@a,$x,$y)    splice(@a,0,0,$x,$y)
    $a[$i] = $y          splice(@a,$i,1,$y)

**** reverse
`reverse', in list context, returns its arguments as a list in reverse order. In scalar context, it gets a little trickier: it concatenates its arguments into a single string and returns that string reversed. In scalar context with no arguments, it reverses the current value of $_, whatever that may be.

*** Subroutines

**** Basic Form
Subroutines or functions (technically, subroutines produce only side effects, but Perl does not seem to give a shit about this convention) are defined like so, in the most basic form:

    sub foo {
        body statements...
    }

Prototyping and attributes are matters I'll get to later on. Oh, and the 'magic character' for functions is '&'.
This is useful for, say, symbol table aliasing such as seen here:

    local *foo = \&bar;
    foo();    # Now calls bar()

...or for references (see below).

    "(Often a function without an explicit return statement is called a subroutine, but there's really no difference from Perl's perspective.)"
        -- bad-ass Perl takes a shit on convention

**** Calling Subroutines
Perl subroutines are called using the Hinarita (lesser method), foo($bar, $baz), assuming everything is normal (you haven't overridden a built-in function, etc.). Note that this isn't as good as the Maharita used by Lisp, (foo bar baz), but we still love Perl anyway.

***** List Ambiguity
Sometimes, one wishes to distinguish a plain anonymous list from a list of function parameters. The unary `+' is useful here. Ex:

    # Without the plus, this would be parsed as (print(1, 2, 3)), 4, 5;
    # the 4 and 5 would be silently discarded, which is not what we want.
    print +(1, 2, 3), 4, 5;

**** Return Values
One can return a value or values explicitly using `return', as in C, but the value of the last expression evaluated in the block also works. I like using `return' just for clarity, myself.

**** Subroutine Arguments
@_, the array of arguments that makes every Perl subroutine variadic, is the array counterpart of $_: inside a subroutine, functions like shift and pop operate on it by default. Each call gets its own @_, so a callee does not see its caller's @_, unless it is called with the form &foo; (ampersand, no parentheses), in which case it reuses the caller's current @_. Without prototypes, function calls flatten all scalar, array, and hash arguments into this same uniformly one-dimensional array. The elements of @_ are magic aliases to the original arguments; in this way, Perl subroutines can exhibit call-by-reference behavior. Ex:

    sub foo { $_[0] = "hanumizzle"; }
    my @foo = qw/foo bar baz/;
    foo(@foo);
    print "@foo";    # hanumizzle bar baz

But relying on this is unusual behavior.

***** Idiomatic Operations on Arguments

****** shift
`shift' will assume @_ as the object of operation in a subroutine (or format, but we hateses formats), and @ARGV elsewhere.
Ex:

    my $first  = shift;
    my $second = shift;

Use of `shift' on @_ will *not* modify the aliased parameters themselves, which is probably what you want.

****** Named Parameters
Perl 5 has traditionally had no formal parameter lists (subroutine signatures arrived experimentally in 5.20 and became stable in 5.36), but it's pretty damned easy to emulate them with such statements as these (alternatives; use one per subroutine):

    # Assign the first element of @_ to lexical $foo; do not change @_.
    # Note this is different from my $foo = @_, which puts @_ in scalar context.
    my ($foo) = @_;

    # Assign a couple of named parameters; ignore the rest.
    my ($foo, $bar, $baz) = @_;

    # Assign some named parameters, and the remainder are bunged into @quux
    # greedily (think `&rest' in some Lisp dialects, such as Emacs Lisp).
    my ($foo, $bar, $baz, @quux) = @_;

*** C-Style if Control Structures

**** Basic Form
    if (yada) { foo }    # You get the idea by now

`if' can be used as a statement modifier, but perhaps even better is this idiom:

    yada and foo;

...or:

    yada and do { foo; bar; baz };

...for a block. This uses the way-low-precedence Boolean operator `and' as a sort of jury-rigged control structure, as `and' (or `&&', for that matter) will not evaluate the second expression at all if the first is false, that being a waste of time.

**** Negated if Control Structure
`unless' does guess-what.

**** Control Transfer
Any number of `elsif' clauses, and an optional final `else', take over from the `if' in the event that the preceding conditions do not evaluate to a true value. Ex:

    if ( THIS_IS_TRUE ) {
        DO_THIS_THING;
    }
    elsif ( THIS_OTHER_THING_IS_TRUE ) {
        DO_THIS_OTHER_THING;
    }
    else {
        DO_THE_DEFAULT_THING;
    }

*** Equality Comparison Operators
Without overloading or anything tricky like that, these operators are defined only on scalars, apparently. If an equality comparison operator evaluates true, the specific value yielded is 1; for false, it is the empty string (not undef), a special value that also reads as 0 in numeric context without a warning.
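To see these values for yourself, here is a small sketch that captures the result of a true and a false comparison (the variable names are illustrative):

```perl
use strict;
use warnings;

my $true_result  = ( 2 == 2 );
my $false_result = ( 2 == 3 );

print "true:  '$true_result'\n";     # true:  '1'
print "false: '$false_result'\n";    # false: ''
print "false is defined\n" if defined $false_result;

# The false value is also a clean 0 in numeric context: no warning here.
print "false + 0 = ", $false_result + 0, "\n";    # false + 0 = 0
```

Since the false value is defined, a `defined' test cannot tell a failed comparison from a successful one; test the value itself for truth instead.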

**** Strings
    `eq': compare two strings; return true if equal
    `ne': not equal
    `gt': greater than
    `ge': greater than or equal to
    `lt': less than
    `le': less than or equal to

The 'inequalities' compare strings character by character, by code point (so "Z" lt "a"), unless `use locale' is in effect. For Unicode text this means code point order rather than any language's dictionary order; the Unicode::Collate module handles the latter.

**** Numbers
These numerical comparison operators are direct analogs to the string operators listed above:

    `==': compare two numbers; return true if equal
    `!=': not equal
    `>' : greater than
    `>=': greater than or equal to
    `<' : less than
    `<=': less than or equal to

**** The Difference?
`"2" eq "2.0"' is false. `"2" == "2.0"' is true.

*** `switch' Statements in Perl

**** Emulation with Generic `for' Loop
    for ($variable_to_test) {
        if    (/pat1/) { }    # do something
        elsif (/pat2/) { }    # do something else
        elsif (/pat3/) { }    # do something else
        else           { }    # default
    }

    for ( $arg ) {
        /^quit$/ && do { exit(0); };
        /^help$/ && do { system("perldoc $0"); };
    }

**** use Switch;
I'm feeling lazy; I'll just toss out a few examples. (Switch is a source filter; it was removed from the core distribution in perl 5.14 and is now available only from CPAN.)

    use Switch;

    switch ($val) {
        case 1          { print "number 1" }
        case "a"        { print "string a" }
        case [1..10,42] { print "number in list" }
        case (@array)   { print "number in list" }
        case /\w+/      { print "pattern" }
        case qr/\w+/    { print "pattern" }
        case (%hash)    { print "entry in hash" }
        case (\%hash)   { print "entry in hash" }
        case (\&sub)    { print "arg to subroutine" }
        else            { print "previous case not true" }
    }

    --

    use Switch;

    # AND LATER...
    %special = ( woohoo => 1, "d'oh" => 1 );

    while (<>) {
        switch ($_) {
            case (%special) { print "homer\n"; }        # if $special{$_}
            case /[a-z]/i   { print "alpha\n"; }        # if $_ =~ /[a-z]/i
            case [1..9]     { print "small num\n"; }    # if $_ in [1..9]
            case { $_[0] >= 10 } {                      # if $_ >= 10
                my $age = <>;
                switch (sub{ $_[0] < $age } ) {
                    case 20 { print "teens\n"; }        # if 20 < $age
                    case 30 { print "twenties\n"; }     # if 30 < $age
                    else    { print "history\n"; }
                }
            }
            print "must be punctuation\n" case /\W/;    # if $_ =~ /\W/
        }
    }

/Most/ of this is pretty intuitive. See the official documentation for full details.

** Lesson 4

*** More on Hashes (and Arrays)

**** Adding New Elements to a Hash
    $foo{key} = $value;

Just use a qualified hash element as an lvalue; it's that simple.

**** Existence Checking
Here, you gotta be careful.

***** The Intuitive and Possibly the Wrong Method
`exists' may seem like the way to check for the existence of a member within a hash or array, but it might not do exactly what you want. From the manual:

    "Given an expression that specifies a hash element or array element, returns true if the specified element in the hash or array has ever been initialized, even if the corresponding value is undefined. The element is not autovivified if it doesn't exist."

So `exists' is a bit like `intern-soft' for Lisp users. Unfortunately, it evaluates true if the specified element has been initialized, even though its value may currently be undefined. If the element is removed from its array or hash with splice, by assigning to $#foo, or with delete, `exists' returns false. Ex:

    my @foo = qw/foo bar baz/;
    $#foo = 1;
    print exists $foo[2] ? "yes\n" : "no\n";     # no

    my %foo = ( foo => 42, bar => 69 );
    delete $foo{bar};
    print exists $foo{bar} ? "yes\n" : "no\n";   # no

***** What You Probably Want
`defined' tests that a value is not `undef': it returns false both for missing elements and for elements that exist but hold undef. In most cases, this is the behavior you want.

**** Removing Elements
`delete' will remove an individual element or a slice from a hash or array.
`defined' tests on these elements will return false, unless they are defined anew, of course. `delete' returns a list equal in length to the number of /attempted/ deletions. Each member is a former value from the operand array or hash, or `undef' for cases where deletion wasn't successful (the key didn't exist).

**** Iterating over Elements

***** `each'
    #!/usr/bin/perl -w
    my %bits = ( soy => 'sauce', sesame => 'oil', garlic => 'clove' );
    while ( my ( $key, $value ) = each %bits ) {
        print "$key has value $value\n";
    }

`each' is one method for iterating over the elements of a hash. Every hash has a single internal iterator, which traverses its contents in an ostensibly arbitrary order, but certainly visits each key/value pair exactly once. When called in list context, `each' returns the next (or first) key/value pair in this sequence. In scalar context, it returns only the key. Upon reaching the end of the sequence, `each' returns an empty list in list context (so the list assignment above yields 0, a false value) or undef in scalar context, which makes it useful for `while' loops. The next call to `each' will begin a new pass, which proceeds exactly like the one before. Think of it almost like a Mahayuga. The iterator will also be reset by calls to `keys' or `values'.

One caveat arises here: don't add or delete elements from the hash during iteration with `each', except to delete the element most recently returned by `each'. Ex:

    while (($key, $value) = each %hash) {
        print $key, "\n";
        delete $hash{$key};    # This is safe
    }

***** `keys'
In list context, `keys' returns copies of the keys of a hash (modifying this list will not affect the hash itself), in an 'order' identical to that used by `each'. (It may be useful to sort this list!) In scalar context, it returns the number of elements in the hash, and in void context, it resets the iterator, with no additional overhead.
An example of `keys':

    my @keys   = keys %ENV;
    my @values = values %ENV;
    while (@keys) {
        print pop(@keys), '=', pop(@values), "\n";
    }

When used as an lvalue, `keys' increases the number of buckets allocated for
the operand hash (it is not possible to decrease the number this way, so
don't worry about it). This can preempt inefficiency for what you anticipate
to be a large hash, and is therefore best done before the hash holds any
values. Ex:

    keys %foo = 1000;  # Perl rounds up to the next power of 2; this is really 1024
    while (<>) {
        # Load up %foo...
    }

***** `values'
In list context, `values' returns all the values of the hash in the same
order established by the iterator `each' uses; in scalar context, it returns
the number of elements; and in void context, it resets the iterator. The list
returned aliases its members to the actual values, enabling one to do some
awesome shiznit:

    for (values %hash) { s/foo/bar/g }   # modifies %hash values

*** Various List Properties
**** Multiple Definitions with my

    my ( $one, $two, $three ) = ( 1, 2, 'three' );

And the obligatory caveat: "Note the brackets around the ($one, $two, $three).
You need these to make perl realise it's a list, just as when you create
arrays. If you miss them off, perl will try to evaluate $one, $two and $three
separately (i.e. in scalar context), and therefore come up with the last thing
it evaluated, which is $three. It will then do exactly the same to the other
side, come up with 'three', then go " $three = 'three' ", and nothing else.
$one and $two will never be assigned anything. You need brackets to force list
context, in the same way as you sometimes need scalar to force scalar
context."
**** Swapping Variables

    ($y, $x) = ($x, $y);   # That simple

**** Generating Ranges
`0..5' and like forms generate a list of numbers between their endpoints,
inclusive. These are often used for slicing lists in sequential blocks.
Ex:

    my @foo = 1..5;          # Usable outside array indices
    my @bar = @foo[2..4];    # Expands to @foo[2,3,4];

    # Repeat a command
    print "Ah, ah, ah! You didn't say the magic word!\n" for 1..1000;

**** Assigning the Remainder of a List to an Array

    # @baz will hold 'Python' and 'Scheme'
    my ($foo, $bar, @baz) = ('Common Lisp', qw(Perl Python Scheme));

    # Slices restrict the greediness of this phenomenon. The array must be
    # declared first; my cannot 'declare' slices. Here $bar gets 'over', and
    # the rest of the list is discarded.
    (@foo[0..5], $bar) = qw/The quick brown turd jumps over the lazy fox/;

*** `sort'
With one argument, a list, `sort' returns an ASCIIbetical sorting of its
contents (actually, it also respects locale collation if `use locale' is in
effect). One may specify a block or subname (a reference or a string naming a
function) before the list to influence the sort. `sort' sets the package
variables $a and $b for this block or function to whichever two values the
sorting algorithm is currently comparing (unless the function is prototyped
($$), in which case the two values are passed in @_ as usual).
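A minimal sketch of the three ways to hand `sort' a comparison (an inline
block, a named subroutine using $a and $b, and a ($$)-prototyped subroutine
that receives its operands in @_). The %age data is made up for illustration.

```perl
use strict;
use warnings;

my %age = ( alice => 31, bob => 25, carol => 31, dave => 40 );   # made-up data

# Block form: sort sets the package variables $a and $b for each comparison;
# ties on age fall back to comparing the names
my @by_age = sort { $age{$a} <=> $age{$b} || $a cmp $b } keys %age;

# Named subroutine: same $a/$b convention, here reversing the string order
sub by_name_desc { $b cmp $a }
my @names_desc = sort by_name_desc keys %age;

# With a ($$) prototype, the two values arrive in @_ rather than in $a/$b
sub by_length ($$) {
    my ( $x, $y ) = @_;
    return length($x) <=> length($y) || $x cmp $y;
}
my @by_len = sort by_length keys %age;

print "@by_age\n@names_desc\n@by_len\n";
```

The `||' fallback works because `<=>' and `cmp' return 0 for a tie, so the
second comparison only matters when the first one cannot decide.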
The sort block or function should return:

     1, indicating that $a will come after $b in the list,
     0, indicating equal status, or
    -1, indicating that $a will come before $b in the list.

(Strictly, any number greater than or less than 0 will do.)

MahaEx:

    # sort lexically
    @articles = sort @files;

    # same thing, but with explicit sort routine
    @articles = sort {$a cmp $b} @files;

    # now case-insensitively
    @articles = sort {uc($a) cmp uc($b)} @files;

    # same thing in reversed order
    @articles = sort {$b cmp $a} @files;

    # sort numerically ascending
    @articles = sort {$a <=> $b} @files;

    # sort numerically descending
    @articles = sort {$b <=> $a} @files;

    # this sorts the %age hash by value instead of key
    # using an in-line function
    @eldest = sort { $age{$b} <=> $age{$a} } keys %age;

    # sort using explicit subroutine name
    sub byage {
        $age{$a} <=> $age{$b};   # presuming numeric
    }
    @sortedclass = sort byage @class;

The two operators of exceptional utility in the examples are:
**** `cmp' and `<=>'
`cmp' returns 1 if its first operand is ASCIIbetically greater than the
second, -1 if it is less, and 0 if they are equal. `<=>' does the same for
numbers (converting strings to numbers as needed). The default implicit block
is therefore `sort { $a cmp $b } @list'.
*** Symbol Tables and Packages
As you know well by now, all variables reside either in the symbol tables of
Perl's namespaces (which are called 'packages'), or in scratchpads associated
with a certain lexical scope. The use of lexical scope is fairly
straightforward, but packages merit some additional description.

In fact, there are really no such things as 'global variables', just package
variables. `main' is the default package name, and is the closest equivalent
to a 'global' namespace Perl has to offer. Unqualified names are looked up in
the current package, not in `main'; the exception is a handful of special
identifiers (STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, SIG, and the
punctuation variables such as $_) that are always forced into `main'. One can
also specify an explicit package name by qualifying the variable name with its
package (e.g., @So::Long::And::Thanks::For::AllTheFish).
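A small sketch of qualified names versus current-package lookup. It borrows
the `package' statement (described next) to show that an unqualified name
inside another package does not fall back on `main'. The package name `Fish'
and the variables are hypothetical, chosen only for the demonstration.

```perl
use strict;
use warnings;

# Fully qualified names never need declaring, even under strict
@So::Long::And::Thanks::For::AllTheFish = ( 'dolphins', 'mice' );
$main::answer = 42;

# $::answer is shorthand for $main::answer
my $from_main = $::answer;

my $from_fish;
{
    package Fish;           # hypothetical package, scoped to this block
    our $answer;            # declares $Fish::answer, not $main::answer
    $from_fish = defined $answer ? $answer : 'undef';   # no fallback to main
}

my $fish_count = scalar @So::Long::And::Thanks::For::AllTheFish;
print "main=$from_main fish=$from_fish count=$fish_count\n";
```

The lexical $from_fish is visible inside the block regardless of the package
switch, because `my' variables live in scratchpads, not in any package.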
It is also possible to switch to a different current package with statements
such as:

    package Foo;             # Current package is `Foo'

    # One can impose hierarchy with double colons. This does not mean a search for
    # $Foo::Bar::Baz::suxorz will default to $Foo::Bar::suxorz, but one may
    # traverse the hierarchy with the symbol table hashes (see below), and this
    # scheme still offers useful organization for OO modules, for instance.
    package Foo::Bar::Baz;

    package main;            # Revert to `main' package

Every unqualified, non-lexical identifier Perl compiles after such a
declaration belongs to this current package (that is, if no other explicit
package name is given). The declaration lasts until the end of the enclosing
block, file, or eval STRING, or until another `package' declaration
supersedes it. The special literal `__PACKAGE__' evaluates to a string
containing the name of the current package.
**** Typeglobs
Typeglobs represent individual symbol table entries, and it is of technical
import to note that they are a huge pain in the ass. As scalars have the
magic character of the dollar sign, and arrays the commercial 'at' sign,
typeglobs have the asterisk. Perhaps this is not a coincidence; they are the
equivalent of C pointers: confusing, loaded with semantics that appear
mysterious at first, and usually remain mysterious afterwards. They will
happily be removed with the Messianic coming of Perl 6.
***** Aliasing
One of the most basic (and indeed, one of the most common) uses of typeglobs
is to alias one symbol to another; these symbols may even span across
packages. Ex:

    # Under `use strict', the unqualified $glob trips strict vars; you would
    # need `our $glob;' or the full name $main::glob. Switching strict off
    # keeps the example short.
    no strict;
    $Foo::bar = "baz";
    {
        local *glob = *Foo::bar;
        print $glob;             # prints "baz"
    }

Assigning a glob to a glob aliases a symbol table entry.
All references to any compartment of the aliased glob refer back to the
original location; so `$glob' is really the value of `$Foo::bar', `%glob' is
likewise `%Foo::bar', and so forth. This is the basic principle underlying the
Exporter mechanism. Notice as well that this does not work with lexical
variables, as they do not live in any symbol table. Therefore, statements such
as `my *foo;' will not work.
****** Limited Aliasing
One can restrict aliasing to a given type using statements of the form:

    local *foo = \$bar;   # or just `*foo'; Emacs outlining doesn't like it!!

$foo now points to $bar, but @foo does /not/ point to @bar. (I think this
syntax makes little sense, too.)
***** Defining Constants
``Another use of symbol tables is for making "constant" scalars.

    *PI = \3.14159265358979;

Now you cannot alter $PI, which is probably a good thing all in all. This
isn't the same as a constant subroutine, which is subject to optimization at
compile-time. A constant subroutine is one prototyped to take no arguments and
to return a constant expression. See perlsub for details on these. The "use
constant" pragma is a convenient shorthand for these.''
***** Storing Typeglobs in Scalars
This is of limited use these days, but I'll cover it for old time's sake.
Before Perl 5.6, it was necessary to do things like this to limit a filehandle
to a given lexical or dynamic scope:

    sub newopen {
        my $path = shift;
        local *FH;                   # not my() nor our()
        open(FH, $path) or return undef;
        return *FH;                  # not \*FH!
    }
    $fh = newopen('/etc/passwd');

Or:

    my $fh = do { local *FH };
    open $fh, "/etc/passwd" or die "Can't open /etc/passwd: $!";

Typeglobs can be stored in scalars, effectively limiting their scope to that
of the container variable. The syntax is a little different when
dereferencing them, though:

    my $foo = *main::bar;    # $foo can be lexical!
                             # It's only /holding/ a glob
    $main::bar = "Frobnitz";
    # The inner $foo fetches the glob held in the lexical; the outer $ then
    # picks out that glob's scalar compartment, i.e. $main::bar
    print $$foo;             # prints "Frobnitz"

**** Symbol Table Hashes
Perl exposes the symbol tables of package variables as hashes visible to the
user; changes in these hashes are reflected in the symbol table itself. Such a
hash is addressed with the package name followed by two colons. Ex:

    # Shows all the variables in the `main' package. Because `main' is the
    # default package, it is also possible to use `%::'.
    print for keys %main::;

    */ *stderr *utf8:: *" *CORE:: *DynaLoader:: *stdout *attributes:: * *stdin
    *ARGV *INC *_<-e *ENV *Regexp:: *UNIVERSAL:: *$ *_::

Note that `main::' is a key in `%main::'. It's turtles all the way down, so be
careful if you write code to traverse the symbol table. The values of such a
hash are 'typeglobs':

    print for values %main::;

    *main::/ *main::stderr *main::utf8:: *main::" *main::CORE::
    *main::DynaLoader:: *main::stdout *main::attributes:: *main:: *main::stdin
    *main::ARGV *main::INC *main::_<-e *main::ENV *main::Regexp::
    *main::UNIVERSAL:: *main::$ *main::_::

An even better example:

    #!/usr/bin/perl -w
    # use strict;

    # define some things
    $pibble = 2;
    @foo    = ( 1, 4 );
    $foo    = 'bar';
    %foo    = ( key => 'value' );
    %bits   = ( me => 'tired' );
    sub my_sort { return ( $a cmp $b ) }

    print "This program contains...\n";
    while ( my ( $key, $value ) = each %main:: ) {
        # iterate over the key/value pairs of the symbol table hash
        local *symbol = $value;    # alias the typeglob from the symbol table

        # these lines look to see if the typeglob contains
        # a $, @, % or & definition
        if ( defined $symbol ) {
            print "a scalar called \$$key\n";   # \$$key is an escaped $
                                                # followed by the value of $key
        }
        if (@symbol) {       # `defined @array' is a fatal error since Perl 5.22
            print "an array called \@$key\n";
        }
        if (%symbol) {       # likewise `defined %hash'
            print "a hash called \%$key\n";
        }
        if ( defined &symbol ) {
            print "a subroutine called $key\n";
        }
    }

    a hash called %ENV
    a scalar called $pibble
    a scalar called $_
    a hash called %UNIVERSAL::
    a scalar called $foo
    an array called @foo
    a hash called %foo
    a scalar called $$
    ...

In fact, these two statements are nearly identical:

    local *sym = *main::variable;
    local *sym = $main::{"variable"};

...except that the first is more efficient, as it is resolved at compile time
rather than at run time. It will also /create/ the original glob if
necessary. I am under the impression that using the symbol table hashes is
best for traversal and other purposes not attainable through the normal
syntax.

** Lesson 5
*** `open'
**** Basic Form

    open FILEHANDLE, FILE

...where FILEHANDLE is just that, and FILE is an expression evaluating to a
string, which in turn indicates a filename. (I suppose FILEHANDLE could be an
expression as well, but I've never seen it done.) Perl will assume by default
that the file is to be opened for reading only.

Actually, as of Perl 5.6, the 'filehandle' can be an undefined scalar; Perl
will autovivify it with a reference to an anonymous typeglob, greatly unifying
management of files with the normal scoping protocol. (Bareword filehandles
/as such/ are package symbols rather than lexicals, a rough edge Perl 6 will
sand down.)

Because Perl is implemented on diverse platforms, the designers settled on the
standard of directories and files delimited with Unix-style forward slashes;
this is portable across operating systems, although there remains the issue
of the drive letter. It is still possible to write something like
"C:\\autoexec.bat", which may be necessary if you intend to pass the filename
to an external utility. It is probably best to keep forward slashes until
conversion is necessary, and then use the substitution operator: s#/#\\#g.

Once you have finished with a filehandle, close it with `close', intuitively
enough.
**** `open' Modes
Anyone who is familiar with Unix shell programming will feel right at home.
Ex:

    open OUTPUT, ">C:/copied.bat"
        or die "Can't open C:/copied.bat for writing: $!\n";

    # open for writing with >
    open WRITE, ">C:/autoexec.bat";
    open WRITE, ">", "C:/autoexec.bat";      # three-argument form

    # open for appending with >>
    open APPEND, ">>C:/autoexec.bat";
    open APPEND, ">>", "C:/autoexec.bat";

    # perl will assume you mean 'for reading' otherwise
    open READ, "C:/autoexec.bat";
    open READ, "<", "C:/autoexec.bat";

    # Magical piping fun!
    open(PRINTER, "| lpr -Plp1") || die "can't run lpr: $!";
    print PRINTER "stuff\n";
    close(PRINTER) || die "can't close lpr: $!";

    open(NET, "netstat -i -n |") || die "can't fork netstat: $!";
    while (<NET>) { }          # do something with input
    close(NET) || die "can't close netstat: $!";

The three-argument version is considered safer because user-supplied
filenames cannot override the mode. For instance:

    open my $fh, scalar <STDIN>;
    print $fh "Foo!";

...and the user types '>foo'. Well, that wipes out whatever was in `foo'
earlier. With the three-argument form, on the other hand, you merely end up
with a file literally named '>foo'...
***** Filehandle Names
By convention, bareword filehandles are written in capitals, usually `FH' and
`LOG', though CamelCase would still distinguish them from Perl keywords, as
Perl is fully case-sensitive. This unofficial standard applies, of course,
only if you are not using IO::Handle objects or anonymous glob references,
available even in only semi-recent versions of Perl 5.

The filehandles STDIN, STDOUT, and STDERR are predefined, and I'm sure you can
guess what they mean. They can be closed and redefined as you please. Ex:

    close STDERR;
    open STDERR, ">>errors.log";

*** Exception Handling Idiom

    open my $file, "C:/autoexec.bat"
        or die "Can't open C:/autoexec.bat for reading: $!";

**** Basic Principle
`or' is the ultra-low-precedence Boolean `or' operator; it is traditionally
used for control flow, where it serves as a terse if/else structure. If `open'
succeeds, it will return a true value, and Perl will not waste any time
executing the opposing clause (short-circuit).
If it does not succeed, Perl moves on to evaluate the right-hand side, `die',
which in effect terminates the program. In void context, the `or' expression
is useless as a logical statement, but it makes a great substitute for a
full-fledged control statement.
**** `die'
The manual does such an excellent job of explaining this that I won't even
bother to paraphrase it:

    ``Outside an "eval", prints the value of LIST to "STDERR" and exits with
    the current value of $! (errno). If $! is 0, exits with the value of
    "($? >> 8)" (backtick `command` status). If "($? >> 8)" is 0, exits with
    255. Inside an "eval()," the error message is stuffed into $@ and the
    "eval" is terminated with the undefined value. This makes "die" the way
    to raise an exception.''

`warn' does much the same thing, but does not end the program, and is
therefore not responsible for setting an exit value and all that other jazz.
***** Useful Advice

    ``Hint: sometimes appending ", stopped" to your message will cause it to
    make better sense when the string "at foo line 123" is appended. Suppose
    you are running script "canasta".''

Beautiful.
**** $!
This is the magic punctuation variable that signifies either an error message
or just an `errno' from a system or library call, depending on whether it is
used as a string or a number. It is /only/ meaningful immediately after a
failed system call. Ex:

    if (open(FH, $filename)) {
        # Here $! is meaningless.
        ...
    } else {
        # ONLY here is $! meaningful.
        ...
        # Already here $! might be meaningless.
    }
    # Since here we might have either success or failure,
    # here $! is meaningless.

*** The Angle Operator
Internally, this is the `readline' function, which is also visible to the
user. In scalar context, `readline' or the angle operator (more formally known
as the line input operator) takes an expression that somehow evaluates to a
filehandle and reads a 'line' from it, that is, up to and including $/, the
input record separator.
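To see how $/ shapes what counts as a 'line', here is a small sketch that
reads the same text three ways. It assumes an in-memory filehandle (opening a
reference to a scalar, supported since Perl 5.8) as a stand-in for a real
file; the text itself is made up.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\n\ngamma\n";    # made-up file contents

# An in-memory filehandle (open on a scalar reference) stands in for a file
open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my $first = <$fh>;        # scalar context: one record, "alpha\n" with its $/
my @rest  = <$fh>;        # list context: every remaining line
close $fh;

# Paragraph mode: $/ = '' treats runs of blank lines as the terminator
my @paras = do {
    open my $p, '<', \$text or die "Can't open in-memory file: $!";
    local $/ = '';
    <$p>;
};

# Slurp mode: with $/ undefined, the whole file is one "line"
my $all = do {
    open my $s, '<', \$text or die "Can't open in-memory file: $!";
    local $/;
    <$s>;
};

print scalar(@rest), ' ', scalar(@paras), ' ', length($all), "\n";
```

Localizing $/ inside a `do' block keeps the change from leaking into the rest
of the program; the read happens before the block exits and restores $/.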
More specifically, the expression representing a filehandle can evaluate to a
glob, such as `*STDIN' (this is actually only useful with the `readline'
function proper), a filehandle (just `STDIN', for instance), or a scalar that
holds a filehandle indirectly (such as `$fh'). In list context, the operator
reads all the remaining lines and returns them as a list. If $/ is undefined,
the whole file is read in as a single 'line'. Lastly, upon reaching EOF,
`readline' returns `undef' (or an empty list in list context).
**** Line Input Operator in `while' Constructs

    while (<$fh>) { ... }

is shorthand for:

    while (defined($_ = <$fh>)) { ... }

``The $_ variable is not implicitly localized. You'll have to put a "local
$_;" before the loop if you want that to happen.'' Ex:

    while (local $_ = <$fh>) {
        # Perl will still automatically use a `defined' test; the method of
        # assignment is not important, just that there is exactly one
        # expression that assigns a line from a filehandle to `$_'
    }

Note that Perl uses the `defined' test to avoid premature breaking when a line
contains, for example, "0" or nothing ("").
**** Use of the Null Filehandle
The null filehandle, affectionately referred to as the 'diamond operator',
abstracts files given on the command line into a uniform stream, emulating the
behavior of `sed' and `awk'. More accurately, it looks at the current value of
@ARGV when first invoked. If there are no values in @ARGV, it sets $ARGV[0] to
'-' (standard input), effectively acting like <STDIN>.
*** `print' and Explicit Filehandles
print FH LIST. No comma between the FH and the LIST. Without a filehandle,
`print' writes to the currently selected output handle, which is STDOUT
unless you have changed it with `select'; so by default it runs like this:
`print STDOUT "foo $bar baz..."'
*** `system'
From the manual, which again does such an excellent job of explaining things
that I don't even care to write it in my own words:

    ``Does exactly the same thing as "exec LIST", except that a fork is done
    first, and the parent process waits for the child process to complete.
    Note that argument processing varies depending on the number of
    arguments.
    If there is more than one argument in LIST, or if LIST is an array with
    more than one value, starts the program given by the first element of the
    list with arguments given by the rest of the list. If there is only one
    scalar argument, the argument is checked for shell metacharacters, and if
    there are any, the entire argument is passed to the system's command shell
    for parsing (this is "/bin/sh -c" on Unix platforms, but varies on other
    platforms). If there are no shell metacharacters in the argument, it is
    split into words and passed directly to "execvp", which is more
    efficient.''

`system' is a very unusual call, because its return value (the wait status,
as returned by `wait(2)', or -1 if the program could not be started at all)
does not follow Perl's usual convention of true for success. This 16-bit
value holds the actual exit code in the high byte (0 for normal execution),
and in the low byte the number of the signal, if any, that killed the process,
plus a flag for a core dump (again, 0 for normal execution). In effect, normal
execution is a false value according to Perl! So clauses like this will /not/
work correctly:

    system "ls" or die "ls not found!";

Compare against zero instead; on failure, $? holds the same wait status:

    system("ls") == 0 or die "ls failed: $?";

*** $^O
This variable is a string that tells you what Perl thinks your operating
system is. You type its name as two characters, a caret and a capital O, not
as a literal C-o. Ex:

    print $^O;

    linux

*** Backticks
Backticks in Perl act an awful lot like they do in the shell; Perl executes
the command with the shell, so that wildcards, variables and the like will be
recognized, and then returns the output. More specifically, the command is
treated as a double-quoted string, allowing the interpolation of variables and
escape sequences. In scalar context, backticks capture the output of the
command in a single string; in list context, they return a list split on the
current value of $/ (although no chomping is done). Backticks only capture
standard output, so you'll have to use shell redirection to discard or capture
standard error.
Ex:

    my @full = qx#ls foo 2>&1#;    # the shell merges stderr into stdout
    chomp @full;                   # list context does not chomp for you
    print "$_\n" for @full;

The backtick operator also has the generic `qx' form, which otherwise behaves
like the other quoting operators. Using ' as the delimiter suppresses Perl's
double-quote interpolation of the string. Ex:

    $perl_info  = qx(ps $$);   # that's Perl's $$
    $shell_info = qx'ps $$';   # that's the new shell's $$

*** Directory Handling
**** `chdir'
`chdir' changes the directory to its single string argument; if none is
given, it tries $ENV{HOME} instead. `getcwd' from POSIX or Cwd will return the
current directory. `cwd' (from Cwd) effectively does the same, but may call
the external `pwd' command to do it.
**** `opendir'
`opendir' behaves in a manner very similar to `open'. Directory handles live
in a kind of namespace of their own altogether, so one can have a filehandle
/and/ a dirhandle named FOOBAR. Here again, one can (and probably should) use
autovivified scalars.
**** `readdir'
This acts like `readline' on a dirhandle. It is not, however, usable with the
angle brackets, unfortunately, forcing us to use ugly constructs like this:

    while (defined($_ = readdir $dir)) { ... }

As of Perl 5.12, a bare `while (readdir $dir) { ... }' performs the same
implicit assignment to $_ and the same `defined' test.
**** `rewinddir'
Sets the position at the beginning of the dirhandle. No caveats!
*** Argument Processing
As you have seen before, `<>' will iterate over the files named in `@ARGV'.
`@ARGV' is the array containing the command-line arguments given to the
script (perl's own switches are not included; $0 holds the name of the script
itself). When reading from <>, $ARGV, the scalar, holds the current filename.
*** File Test Operators
These are essentially identical to the shell file test operators, and are
documented under perldoc -f -X. A few commonly used ones (or at least ones I
find myself using often):

    -e: file exists
    -f: file is a plain file
    -d: file is a directory
    -l: file is a symbolic link
    -x: file is executable

*** perldoc
Perl has its own documentation format, POD (Plain Old Documentation), which is
way more spiffy and modern than the archaic troff, and displays in diverse
media; it even supports hyperlinking.
The premier reader for POD documentation, perldoc, does not support this
feature, but is awfully handy anyway. Ex:

    perldoc -f sort    # Info on the `sort' function;
                       # comments work for shell, too!
    perldoc perl       # Directory of information
    perldoc POSIX      # POSIX module documentation

** Lesson 6
*** Regular Expressions
I already have the basics down, having worked with sed, grep, locate, and
(gasp!) awk for years. All I need to document are Perl-specific elements.
**** `m//'
***** Basic Form
In its tersest form, the match operator need not be bound to any expression,
nor include the initial 'm'. Here is one such instance of the match operator:

    /^http:/ and print while (<>);

As you might have already guessed, an unqualified pair of forward slashes can
delimit a pattern, and this pattern will act upon $_ by default. You will also
notice that this basic form of pattern matching returns true or false
depending on the success of the match.

The match operator may also be written out in full with its `m', as with the
other quote-like operators; this is useful where a different delimiter is
desirable:

    m{^/opt/plt} and print while (<>);

As with `qx', strings given to `m//' will be treated as though they were
double-quoted unless ' is used as a delimiter. One may also precompile a
pattern with the generic form `qr' (quote regex) and interpolate the result
into `m//' or into the pattern side of `s///', which is handy for repeated
use. Ex:

    my $pat  = qr/Hanumizzle/;
    my $text = "Hanumizzle soared through the clouds to distant Lanka.";
    $text =~ s/$pat/Hanuman/g;

An empty pattern, `m//' or just `//', is shorthand for the last successfully
matched pattern. This may even be used with the substitution operator (see
below), such as in this snippet:

    # inspired by :1,$g/fred/s//WILMA/
    while (<>) {
        m?(fred)?    && s//WILMA $1 WILMA/;
        m?(barney)?  && s//BETTY $1 BETTY/;
        m?(homer)?   && s//MARGE $1 MARGE/;
    } continue {
        print "$ARGV $.: $_";
        close ARGV if eof;     # reset $.
        reset if eof;          # reset ?pat?
    }

***** `??'
...is what I said the first time I saw this. If the match operator is written
`m??' (or, in Perls older than 5.22, simply `??'), the regex matches only
once, and does not match again until a call to `reset' (no arguments).
Programming Perl gives an excellent example of its use:

    open DICT, "/usr/dict/words" or die "Can't open words: $!\n";
    while (<DICT>) {
        $first = $1 if m?(^neur.*)?;
        $last  = $1 if /(^neur.*)/;
    }
    print $first, "\n";    # prints "neurad"
    print $last, "\n";     # prints "neurypnology"

***** `m//' Specific Modifiers
g: in list context, repeat the match until the string is consumed, then
   return a list of everything captured (or of the whole matches, if the
   pattern has no parentheses); in scalar context, do progressive matching,
   updating pos() for the string and allowing use of the `\G' metacharacter,
   which anchors where the last match left off, enabling context-sensitive
   matching and other tricks
c: do not reset pos() after a failed /g match

Ex of `/g':

    my $x = "cat dog house";
    while ($x =~ /(\w+)/g) {
        print "Word is $1, ends at position ", pos $x, "\n";
    }

prints

    Word is cat, ends at position 3
    Word is dog, ends at position 7
    Word is house, ends at position 13

**** Binding
Any expression evaluating to a string can be bound to the match operator (as
well as to tr/// and s///, although those need a modifiable target); as match
does not modify the operand string (compare with s///), this may be something
like an anonymous string returned from a function. The syntax is nearly
identical to the shell operator of the same function:

    print $test if $test =~ /pattern/;

There exists a convenient inverted form, `!~', which is a little easier than
writing the not (!) operator before the expression. Finally, it is important
to observe that the binding operators, `=~' and `!~', rank higher in
precedence than, say, assignment, hence the parentheses in idioms like these:

    ($foo = $bar) =~ s/foo/bar/g;

...which assigns the value of $bar to $foo, then binds the substitution
operator to the new value of $foo (remember, an assignment yields the variable
itself as an lvalue).
**** `s///'
`s///' is the substitution operator, akin to the operator of the same name in
`sed' and `vi'.
However, one vital difference that trips up users of more traditional
utilities is that the right-hand side of the operation should use the $1..$n
variables instead of backreferences, because it is technically outside of the
pattern match proper. (Remember, the replacement side of `s///' is treated as
a double-quoted string, with all that implies.)

The substitution operator harbors the significant advantage of total
flexibility in quoting; you can even mix styles. Ex:

    $foo =~ s{/usr/bin}#/usr/local/bin#g;

The return value of the substitution operator does not depend on context; it
is the number of substitutions made, or false if there was no match. This
operator is most commonly used for its side effects, though.
***** `s///' Specific Modifiers
g: replace pattern globally throughout a string
e: evaluate the right-hand side as a Perl expression

    #!/usr/bin/perl -w
    use strict;
    my $string = "2 3 4 5 6";
    $string =~ s/ (\d+) / 2 * $1 /xge;   # double every number you match
    print $string;

    4 6 8 10 12

**** Common Modifiers
x: ignore whitespace that is not backslashed or in a character class, and
   permit comments as well
i: case-insensitive matching (not meaningful for most writing systems :)
m: treat the operand string as multi-line; `^' and `$' will also anchor at
   newlines within the string; \A and \z still match only the absolute ends
   (\Z also matches before a final newline; `$/' apparently does not apply!!)
s: `.' matches newline as well; `^' and `$' keep their default meaning, the
   beginning and end of the string
o: compile the regex only once, even if it interpolates variables that later
   change

`s' and `m' used concurrently do not refer to sexual perversions, but instead
to a hybrid behavior where `.' behaves as in `s', and `^' and `$' may match at
the absolute extremities of the string /or/ at internal newlines.
    $x = "There once was a girl\nWho programmed in Perl\n";

    $x =~ /^Who/;         # doesn't match, "Who" not at start of string
    $x =~ /^Who/s;        # doesn't match, "Who" not at start of string
    $x =~ /^Who/m;        # matches, "Who" at start of second line
    $x =~ /^Who/sm;       # matches, "Who" at start of second line

    $x =~ /girl.Who/;     # doesn't match, "." doesn't match "\n"
    $x =~ /girl.Who/s;    # matches, "." matches "\n"
    $x =~ /girl.Who/m;    # doesn't match, "." doesn't match "\n"
    $x =~ /girl.Who/sm;   # matches, "." matches "\n"

**** Character Classes
\w: word characters, that is, letters, underscores, and digits, subject to
    the influence of `use locale' and Unicode support
\d: digit characters (for Unicode strings, this includes the decimal digits
    of other scripts, such as the Devanagari ek, do, tin, etc., unless the /a
    modifier is in effect)
\s: whitespace

Capitalizing the letter will invert the class (e.g., \D matches all
non-digits).
**** Capturing
...can be a bit of a pain in the ass, until you know just what to do.
Submatches in Perl are very simply achieved with bare parentheses, no
backslashes needed. There are really three ways they are useful:
***** Submatch Variables
Perl will expose submatches through the scalar variables $1..$n; this can go
into the double digits and beyond. ($0, as you know, is reserved for the
script name.) These remain available until the next successful match
overwrites them (a failed match leaves them alone). One of my favorite idioms
goes:

    perl -wnle 'print $1 if /foo bar (baz)/;'

...which prints the submatch if and only if the match occurred. This
essentially works like `grep' on crack, when exploiting the full power of Perl
regexps.
***** List Context
In list context, the match will return all the submatches in a flat list.
There is (presently) no way to build complex `thingies' from matches, but this
is in the works for Perl 6.
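A short sketch of captures in list context next to the same kind of match in
scalar context, plus the common "countof" idiom (assigning to an empty list
to force list context). The date and key=value strings are made-up inputs.

```perl
use strict;
use warnings;

my $date  = "2006-05-17";              # made-up inputs
my $pairs = "a=1, b=2, c=3";

# A list assignment puts the match in list context: captures come back in order
my ( $year, $month, $day ) = $date =~ /^(\d+)-(\d+)-(\d+)$/;

# With /g in list context, the captures from every match come back in one
# flat list ('a', '1', 'b', '2', 'c', '3'), which makes a tidy hash
my %value = $pairs =~ /(\w+)=(\d+)/g;

# In scalar context, a match only reports success or failure
my $ok = $date =~ /^(\d+)-/ ? 1 : 0;

# The "countof" idiom: force list context with () =, then take the count
my $count = () = $date =~ /\d+/g;

print "$year/$month/$day b=$value{b} ok=$ok count=$count\n";
```

The flat list is exactly the limitation mentioned above: /g hands back one
long list, so any structure (here, pairs) has to be rebuilt by the receiver.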
The only other problem with this method is making sure the match *is in* list
context:

    my @content = split /, /, ($tag =~ m{content=.*?"(.*?)"}s);     # No
    my @content = split /, /, ($tag =~ m{content=.*?"(.*?)"}s)[0];  # Yes

The problem with the first statement is that split expects a string after the
pattern, putting the match in scalar context. This will just give me `1',
which is not what I want. The second form preempts this faulty interpretation
by selecting the first element of the anonymous list returned by the match,
effectively 'beating Perl to the punch'. (Is it possible to defeat Perl?)
***** Backreferences
These are for use within a pattern, actually, although Perl will begrudgingly
allow you to use them in the replacement half of a `s///', as with `sed' and
other inferior programs (with warnings on, it complains that "\1 better
written as $1"). You are really better off doing this, though:

    my $name = "Hanumizzle Vanara";
    $name =~ s/(\w+) (\w+)/$2, $1/;
    print $name;

    Vanara, Hanumizzle

*** The Ternary Operator
This acts the same as in C, and can of course be used in `return' statements
(in C, too, `return x ? a : b;' is perfectly standard):

    # The ternary is C's; the bare regex test on $_ is the Perl part
    return /\w/ ? "OK" : "Does not contain any word characters!";

This is still a tiny, crippled subset of the fully first-class control
structures available in Lisp and Scheme, but let's pretend I didn't say that.
*** `split'
`split' takes zero, one, two (or three, but this third, which limits the
degree of splittage, is pretty arcane) arguments. The first argument, if any,
is the pattern on which to split, and the second is the operand string.
`split' will remove all instances of this pattern from the string and produce
a list of the values in between the matches; by default, `split' retains empty
leading fields, while omitting empty trailing fields. In scalar context,
`split' returns the number of fields so produced; old Perls also bunged the
list into `@_'.
Because Perl uses `@_' for function arguments, that side effect clobbered
subroutine arguments, so it was deprecated (with a warning) and finally
removed in Perl 5.12; in any case, use `split' in list context, which returns
all the fields. (Besides, functional programming is cooler.) If no operand
string is given, `split' will follow good Perl protocol and default to `$_'.
If no pattern is given, `split' splits on whitespace, omitting leading and
trailing whitespace beforehand; this is frequently desirable.

    ``A pattern matching the null string (not to be confused with a null
    pattern "//", which is just one member of the set of patterns matching a
    null string) will split the value of EXPR into separate characters at
    each point it matches that way. For example:

        print join(':', split(/ */, 'hi there'));

    produces the output 'h:i:t:h:e:r:e'. Using the empty pattern "//"
    specifically matches the null string, and is not to be confused with the
    use of "//" to mean "the last successful pattern match".''
*** `join'
`join' returns a string consisting of all arguments following the first
argument, joined by the first argument itself. That simple.
*** `grep'
I'll just pilfer the description from the documentation again:

    This is similar in spirit to, but not the same as, grep(1) and its
    relatives. In particular, it is not limited to using regular expressions.

    Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_
    to each element) and returns the list value consisting of those elements
    for which the expression evaluated to true. In scalar context, returns
    the number of times the expression was true.

        @foo = grep(!/^#/, @bar);    # weed out comments

    or equivalently,

        @foo = grep {!/^#/} @bar;    # weed out comments

    Note that $_ is an alias to the list value, so it can be used to modify
    the elements of the LIST. While this is useful and supported, it can
    cause bizarre results if the elements of LIST are not variables.
Similarly, grep returns aliases into the original list, much as a for loop's index variable aliases the list elements. That is, modifying an element of a list returned by grep (for example, in a "foreach", "map" or another "grep") actually modifies the element in the original list. This is usually something to be avoided when writing clear code.

*** `map'

Likewise for `map':

Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element) and returns the list value composed of the results of each such evaluation. In scalar context, returns the total number of elements so generated. Evaluates BLOCK or EXPR in list context, so each element of LIST may produce zero, one, or more elements in the returned value.

    @chars = map(chr, @nums);

translates a list of numbers to the corresponding characters. And

    %hash = map { getkey($_) => $_ } @array;

is just a funny way to write

    %hash = ();
    foreach $_ (@array) {
        $hash{getkey($_)} = $_;
    }

Note that $_ is an alias to the list value, so it can be used to modify the elements of the LIST. While this is useful and supported, it can cause bizarre results if the elements of LIST are not variables. Using a regular "foreach" loop for this purpose would be clearer in most cases. See also "grep" for an array composed of those items of the original list for which the BLOCK or EXPR evaluates to true.

"{" starts both hash references and blocks, so "map { ..." could be either the start of map BLOCK LIST or map EXPR, LIST. Because perl doesn't look ahead for the closing "}" it has to take a guess at which it's dealing with based on what it finds just after the "{". Usually it gets it right, but if it doesn't it won't realize something is wrong until it gets to the "}" and encounters the missing (or unexpected) comma.
The syntax error will be reported close to the "}" but you'll need to change something near the "{" such as using a unary "+" to give perl some help:

    %hash = map {  "\L$_", 1  } @array   # perl guesses EXPR. wrong
    %hash = map { +"\L$_", 1  } @array   # perl guesses BLOCK. right
    %hash = map { ("\L$_", 1) } @array   # this also works
    %hash = map {  lc($_), 1  } @array   # as does this.
    %hash = map +( lc($_), 1 ), @array   # this is EXPR and works!
    %hash = map  ( lc($_), 1 ), @array   # evaluates to (1, @array)

or to force an anon hash constructor use "+{":

    @hashes = map +{ lc($_), 1 }, @array   # EXPR, so needs , at end

and you get list of anonymous hashes each with only 1 entry.

** Lesson 7

*** References

When it came to handling complex, organized data, Perl 4 was a big pain in the ass. Huge pain in the ass. As you know, the only valid datum type for arrays and hashes is a scalar. This may be a string, integer, or even floating point number, but it does not solve the problem of hierarchical data. It is possible to do symbolic evaluation of names (as in Vim scripting), but that's a desperate hack. Lastly, there still existed the Perl-specific problem of arrays and hashes being flattened into one big list when passed to subroutines (or anywhere else a list is built, but subroutines in particular). This made it practically impossible to pass more than one array or hash intact into a function.

Perl 5 introduced 'hard references', which equate roughly to C pointers, though Perl safeguards the user from nasty segfaults (woo hoo!) and various other memory violations associated with, say, C and C++.

**** Forming References

There are essentially two ways one can create a hard reference. The first is the unary backslash (outside a string, of course), which acts rather like the & 'address of' operator in C. One may apply it to a single variable, a single literal value, or even a list of either type, freely combined.
All these statements are valid:

    my $foo  = \7;
    my $bar  = \@baz;
    my @quux = \(qw/foo bar baz/, $this, $that);   # Even this: a list of references

The backslash operator can even be nested, though I'm not sure how this is useful:

    my $foo = \\"foo";
    print $$$foo;    # Dereference a reference to a reference!

References have a cryptic 'print syntax' all their own, which, when one thinks about it, resembles the unreadable `#<...>' print syntax of Lisp systems. A Perl reference looks like this when printed: 'HASH(0x812f1c8)', indicating its type and memory address. The address only looks like 28 bits because leading zeros are not printed; 0x812f1c8 is really the 32-bit address 0x0812f1c8.

Perl reference /values/ are inherently bound to the type of their referents, although a given scalar variable can be switched to hold a new reference at any time, of course.

It is interesting to note that Perl's reference counting will preserve data which have become referents, even after the variable that originally named them goes out of scope. Ex:

    my $foo = eval { my @foo = qw/foo bar baz/; \@foo };

Or better:

    my $foo = eval { [ qw/foo bar baz/ ] };    # See below

***** Constructors

There exist two constructors for the convenient creation of anonymous arrays and hashes. These two are distinct (unlike regular list and hash constructors) because references are typed by their referents, so Perl must know which kind of thingie to create. Anyway, these are: `[ ]' and `{ }'. Where `{' might be mistaken for the start of a block, you can ensure that it is interpreted as a hashref constructor with a unary plus, `+{ }'. Ex:

    my $foo = [ qw/foo bar baz/ ];
    my $bar = { foo => "bar", baz => "quux" };

***** The Lambda Nature

Perl can /even/ form references to anonymous subroutines when using `sub' forms without a name, just a block.

    $coderef = sub { print "Boink!\n" };    # Make sure you include the semicolon!

These anonymous subroutines act as closures with respect to the surrounding lexical environment. You know what this means already, so no need for an example. Needless to say, this is way awesome.
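For anyone who does want an example, here is a minimal sketch in core Perl. `make_counter' is a hypothetical helper invented for this illustration. Each call creates a fresh lexical `$count', and the anonymous sub it returns keeps that variable alive after `make_counter' itself has returned.

```perl
use strict;
use warnings;

# make_counter is a hypothetical helper: every call creates a fresh
# lexical $count and returns an anonymous sub that closes over it.
sub make_counter {
    my $count = @_ ? shift : 0;    # optional starting value, default 0
    return sub { return $count++ };
}

my $c1 = make_counter();
my $c2 = make_counter(10);

print $c1->(), "\n";    # 0
print $c1->(), "\n";    # 1
print $c2->(), "\n";    # 10: $c2 has its own private $count
print $c1->(), "\n";    # 2: $c1 was not disturbed by $c2
```

Returning the closure is what keeps each `$count' alive. Reference counting, described above, frees it only once no sub refers to it any longer.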
**** Dereferencing

***** Basic Form

When the reference is held in a simple scalar variable, it is simple enough to prepend the appropriate sigil to that variable (and append a subscript, if necessary), using the same syntax as for normal variable access. MahaEx:

    $foo = "three humps";
    $scalarref = \$foo;            # $scalarref is now a reference to $foo
    $camel_model = $$scalarref;    # $camel_model is now "three humps"

    $bar = $$scalarref;
    push(@$arrayref, $filename);
    $$arrayref[0] = "January";                     # Set the first element of @$arrayref
    @$arrayref[4..6] = qw/May June July/;          # Set several elements of @$arrayref
    %$hashref = (KEY => "RING", BIRD => "SING");   # Initialize whole hash
    $$hashref{KEY} = "VALUE";                      # Set one key/value pair
    @$hashref{"KEY1","KEY2"} = ("VAL1","VAL2");    # Set two more pairs
    &$coderef(1,2,3);
    print $handleref "output\n";

***** The Block Method

For instances where the reference is not held in a simple scalar variable (e.g., dereferencing function return values directly, or dereferencing references stored in hash or array elements), one may dereference the value of a BLOCK, where BLOCK evaluates to a reference.

    # %dispatch is a hash of coderefs. This looks up $index in %dispatch
    # and calls the resulting sub with the arguments 1, 2, and 3.
    &{ $dispatch{$index} }(1, 2, 3);

***** The Arrow Operator

For individual elements of hash and array refs (Why not slices? Don't ask me.) and for coderefs, it is possible to use this syntactosaccharide to obviate the need for the block and magic characters, which can get really tedious for complex thingies.
MahaEx:

    $ $arrayref [2] = "Dorian";       #1
    ${ $arrayref }[2] = "Dorian";     #2
    $arrayref->[2] = "Dorian";        #3

    $ $hashref {KEY} = "F#major";     #1
    ${ $hashref }{KEY} = "F#major";   #2
    $hashref->{KEY} = "F#major";      #3

    & $coderef (Presto => 192);       #1
    &{ $coderef }(Presto => 192);     #2
    $coderef->(Presto => 192);        #3

One may even nest such operations, as the arrow operator associates left to right:

    print $array[3]->{"English"}->[0];

The arrow is optional between brackets or braces, or between a closing bracket or brace and a parenthesis for an indirect function call, so it is possible, for instance, to rewrite the above expression like so:

    print $array[3]{"English"}[0];

Likewise:

    $dispatch{$index}(1, 2, 3);

***** Autovivification

When an undefined value is dereferenced in an lvalue context, Perl will automatically create the missing references to complete the pathway. This, for instance, works:

    my @array;
    $array[3]->{"English"}->[0] = "January";

Note that this is a little different:

    my $foo = "baz";    # $foo is /already/ defined
    $$foo = "bar";      # Attempt to autovivify defined value
    print $foo;         # 'baz'
    print $$foo;        # 'bar': WTF? Reference and ordinary scalar at once?
    print $baz;         # 'bar' Oh...

If the value is already defined (and is not a reference), no autovivification happens. Instead, the '$foo' in '$$foo' is treated as a /symbolic/ reference: the string 'baz' names the package variable '$baz', and the assignment goes there. Under `use strict' (specifically, `strict refs') this code dies with a run-time error instead, which is one more good reason to use `strict'.

In an rvalue context, the effect is a little different. For instance:

    print $array[3]{"English"}[0];

...where the requisite elements are not defined, only autovivifies the thingie up to $array[3]{"English"}. This is because everything before the final index 0 must exist in order to complete the path to that index (essentially, those intermediate elements are lvalues when viewed that way). This rvalue autovivification is widely regarded as a misfeature that may be removed in a future version of Perl, but Perl 5 still does it.

**** `ref'

`ref' takes an expression which evaluates to a reference and returns the type of the referent as an uppercase string, or the empty string if the value is not a reference at all.
Built-in types include:

    SCALAR
    ARRAY
    HASH
    CODE      (function or closure)
    GLOB
    REF       (another reference)
    LVALUE    (a reference to an lvalue expression, e.g. \substr($str, 0, 1))
    IO        (the IO handle associated with files and directories)

Use of `ref' on a reference to an object (a blessed reference) will yield the name of its class. (See OOP below.)

*** `eval' and `do'

**** `eval'

`eval' is one function useful for execution of arbitrary code within its own environment. While not as flexible as the read-eval-print loop in LISP or Forth's own peculiar brand of reflexive examination (although the modules in the `B' hierarchy might help here), it remains pretty handy in instances where it is necessary to diverge from the normal, static execution of code.

The first form of `eval' accepts an expression evaluating (in scalar context) to a string, which contains Perl code. If omitted, this defaults to `$_'. The string is parsed and executed at runtime, in the lexical context of the current Perl program. This is useful for instances where parsing and execution is to be delayed until a certain point. A particular case is `use' statements: these normally run at compile time, because they are surrounded by an implicit `BEGIN { }' block, and wrapping one in a string `eval' postpones it to run time.

`eval BLOCK', on the other hand, is primarily useful for catching exceptions. The block is parsed and compiled once along with the surrounding program. If any expression contained in the block raises a runtime error, `$@' will store the error, which will be available outside the block for further action. Compare `eval STRING', which can also trap compile-time errors in the string, since that string is only compiled at run time. Other error variables (such as $!) will likewise be available.

From the manual: ``In both forms, the value returned is the value of the last expression evaluated inside the mini-program; a return statement may be also used, just as with subroutines. The expression providing the return value is evaluated in void, scalar, or list context, depending on the context of the eval itself.
See "wantarray" for more on how the evaluation context can be determined.''

**** `do'

`do BLOCK' executes the block and returns the value of its last expression, much like `eval BLOCK', with one important difference: `do' does not trap exceptions, so a `die' inside the block propagates outward as usual. However, when given an expression representing a filename, `do' will evaluate the contents thereof, with several conveniences for the programmer. Perl will search @INC and update %INC if the file is found, and will attribute errors to the file if necessary. It will also run the file in its own lexical scope, so the file cannot see the caller's lexical variables (compare with `eval STRING', which can).

From the manual: ``...It's the same [as `eval'], however, in that it does reparse the file every time you call it, so you probably don't want to do this inside a loop.

If "do" cannot read the file, it returns undef and sets $! to the error. If "do" can read the file but cannot compile it, it returns undef and sets an error message in $@. If the file is successfully compiled, "do" returns the value of the last expression evaluated.

Note that inclusion of library modules is better done with the "use" and "require" operators, which also do automatic error checking and raise an exception if there's a problem.''

One can also use forms like `do { } while' and `do { } until', which execute the block at least once, whereas `while' evaluates the condition first and may preempt executing the loop altogether. Because `do { }' is not a real loop, loop control statements like `next', `last', and `redo' ***DO NOT*** work in it.

``Here's how a block can be used to let loop-control operators work with a do{} construct.
To next or redo a do, put a bare block inside:

    do {{
        next if $x == $y;
        # do something here
    }} until $x++ > $z;

For last, you have to be more elaborate:

    {
        do {
            last if $x == $y ** 2;
            # do something here
        } while $x++ <= $z;
    }

And if you want both loop controls available, you'll have to put a label on those blocks so you can tell them apart:

    DO_LAST: {
        do {
            DO_NEXT: {
                next DO_NEXT if $x == $y;
                last DO_LAST if $x == $y ** 2;
                # do something here
            }
        } while $x++ <= $z;
    }

But certainly by that point (if not before), you'd be better off using an ordinary infinite loop with last at the end:

    for (;;) {
        next if $x == $y;
        last if $x == $y ** 2;
        # do something here
        last unless $x++ <= $z;
    }''

*** `sleep'

``sleep EXPR
sleep

Causes the script to sleep for EXPR seconds, or forever if no EXPR. May be interrupted if the process receives a signal such as "SIGALRM". Returns the number of seconds actually slept.

You probably cannot mix "alarm" and "sleep" calls, because "sleep" is often implemented using "alarm".

On some older systems, it may sleep up to a full second less than what you requested, depending on how it counts seconds. Most modern systems always sleep the full amount. They may appear to sleep longer than that, however, because your process might not be scheduled right away in a busy multitasking system.

For delays of finer granularity than one second, you may use Perl's "syscall" interface to access setitimer(2) if your system supports it, or else see "select" above. The Time::HiRes module (from CPAN, and starting from Perl 5.8 part of the standard distribution) may also help.

See also the POSIX module's "pause" function.''

*** `localtime'

``localtime EXPR

Converts a time as returned by the time function to a 9-element list with the time analyzed for the local time zone. Typically used as follows:

    # 0    1    2     3     4    5     6     7     8
    ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);

All list elements are numeric, and come straight out of the C `struct tm'.
$sec, $min, and $hour are the seconds, minutes, and hours of the specified time.

$mday is the day of the month, and $mon is the month itself, in the range 0..11 with 0 indicating January and 11 indicating December.

$year is the number of years since 1900. That is, $year is 123 in year 2023.

$wday is the day of the week, with 0 indicating Sunday and 3 indicating Wednesday.

$yday is the day of the year, in the range 0..364 (or 0..365 in leap years.)

$isdst is true if the specified time occurs during daylight savings time, false otherwise.

Note that the $year element is not simply the last two digits of the year. If you assume it is, then you create non-Y2K-compliant programs--and you wouldn't want to do that, would you?

The proper way to get a complete 4-digit year is simply:

    $year += 1900;

And to get the last two digits of the year (e.g., '01' in 2001) do:

    $year = sprintf("%02d", $year % 100);

If EXPR is omitted, "localtime()" uses the current time ("localtime(time)").

In scalar context, "localtime()" returns the ctime(3) value:

    $now_string = localtime;  # e.g., "Thu Oct 13 04:54:34 1994"

This scalar value is not locale dependent but is a Perl builtin. For GMT instead of local time use the "gmtime" builtin. See also the "Time::Local" module (to convert the second, minutes, hours, ... back to the integer value returned by time()), and the POSIX module's strftime(3) and mktime(3) functions.

To get somewhat similar but locale dependent date strings, set up your locale environment variables appropriately (please see perllocale) and try for example:

    use POSIX qw(strftime);
    $now_string = strftime "%a %b %e %H:%M:%S %Y", localtime;
    # or for GMT formatted appropriately for your locale:
    $now_string = strftime "%a %b %e %H:%M:%S %Y", gmtime;

Note that the %a and %b, the short forms of the day of the week and the month of the year, may not necessarily be three characters wide.''
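The following is a small sketch applying the `$year' and `$mon' corrections described above. It uses `gmtime(0)' (the Unix epoch, in UTC) instead of `localtime(time)' only so that the output is predictable. The same unpacking works for `localtime'.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Unpack the nine-element list for the Unix epoch in UTC.
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = gmtime(0);

# $year counts from 1900 and $mon from 0, so correct both before display.
my $iso = sprintf "%04d-%02d-%02d %02d:%02d:%02d",
    $year + 1900, $mon + 1, $mday, $hour, $min, $sec;
print "$iso\n";    # 1970-01-01 00:00:00

# strftime takes the same raw list and applies those corrections itself.
print strftime("%Y-%m-%d %H:%M:%S", gmtime(0)), "\n";    # 1970-01-01 00:00:00

# Scalar context yields the ctime(3)-style string instead.
my $stamp = gmtime(0);
print "$stamp\n";    # Thu Jan  1 00:00:00 1970
```

Hand-rolled `sprintf' gives full control over the layout. `strftime' is shorter, and its `%a'/`%b' names follow the current locale, as the quoted documentation warns.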
*** Divergent Control Statements

**** `next'

`next' will move to the next iteration of the innermost loop, or to the next iteration of the loop labelled LABEL if given such a label (LABEL is a bareword). Ex:

    OUTER: for my $wid (@ary1) {
    INNER:   for my $jet (@ary2) {
                 next OUTER if $wid > $jet;
                 $wid += $jet;
             }
           }

**** `last'

`last' will terminate execution of either the innermost loop or the loop specified by LABEL. Ex:

    LINE: while (<STDIN>) {
        last LINE if /^$/;    # exit when done with header
        ...
    }

**** `redo'

``The "redo" command restarts the loop block without evaluating the conditional again. The "continue" block, if any, is not executed. This command is normally used by programs that want to lie to themselves about what was just input.

For example, when processing a file like /etc/termcap. If your input lines might end in backslashes to indicate continuation, you want to skip ahead and get the next record.

    while (<>) {
        chomp;
        if (s/\\$//) {
            $_ .= <>;
            redo unless eof();
        }
        # now process $_
    }

which is Perl short-hand for the more explicitly written version:

    LINE: while (defined($line = <ARGV>)) {
        chomp($line);
        if ($line =~ s/\\$//) {
            $line .= <ARGV>;
            redo LINE unless eof(); # not eof(ARGV)!
        }
        # now process $line
    }''

*** `printf' and `sprintf'

I'll get around to documenting these when I see the need.

** Lesson 8

*** Modules

Modules are the fundamental unit of reuse in Perl, and packages are their vehicle. They exist in two basic forms (there is, after all, more than one way to do it): traditional and object-oriented. Traditional modules define subroutines and variables which the user may import. The canonical mechanism used here, `Exporter', actually relies on inheritance in Perl's object system! Object-oriented modules, on the other hand, usually leave nothing to be imported, preferring instead to use instance construction and method calls. Methods should /never/ be exported. Perl does not ask you to buy into any narrow-minded dogma about which to use. Horses for courses.
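Here is a minimal sketch of the two styles side by side in one file. The package names `Shout' and `Shouter' are hypothetical. The traditional package is called by its fully qualified name instead of being imported, purely to keep the example self-contained; a real traditional module would live in its own file and use `Exporter'.

```perl
use strict;
use warnings;

# Traditional style (hypothetical package): plain subroutines, normally
# exported to callers via Exporter; here called by its full name.
package Shout;
sub shout { return uc(shift) . "!" }

# Object-oriented style (hypothetical package): a constructor plus
# methods; nothing is exported.
package Shouter;
sub new {
    my ($class, %args) = @_;
    my $self = { suffix => defined $args{suffix} ? $args{suffix} : "!" };
    return bless $self, $class;
}
sub shout {
    my ($self, $text) = @_;
    return uc($text) . $self->{suffix};
}

package main;

print Shout::shout("hello"), "\n";    # HELLO!
my $s = Shouter->new(suffix => "!!!");
print $s->shout("hello"), "\n";       # HELLO!!!
```

Both packages define a sub named `shout' without conflict, because each lives in its own namespace. The method call finds `Shouter::shout' through the object's class.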
Furthermore, there exists a selection of modules that influence the execution of Perl itself, called /pragmatic modules/, or /pragmata/, if you like Latin. `warnings' and `strict' are perhaps the best known; other important pragmata are `constant', `fields', and `diagnostics'. (Note that, by convention, their names all begin with a lowercase letter.)

The central repository for every ilk of Perl module is CPAN (the Comprehensive Perl Archive Network). CPAN hosts a main site, cpan.org, with mirrors around the world. The network supplies documentation and Perl itself, as well as other useful utilities, but CPAN is best known for the plethora of modules it hosts.

**** Using Modules (Or, `use'ing Modules)

***** `require'

****** Version Request

First of all, `require' can be used to ensure that perl meets a certain version or greater. The recommended form of such a statement is `require 5.006', where `5.006' is a simple numeric argument which is compared to `$]', the version of the running interpreter. Such a number should contain only one decimal point. A literal with two or more dots, such as `5.8.6', is a version string (v-string), which Perl compares as if it were `5.008006'.

****** Module Request

`require' may also request modules from the directories listed in the `@INC' array, which is defined by Perl and catalogs directories to be searched for modules. The semantics vary subtly, depending on the argument, however.

The first and most common use is with a bareword argument. `require' will translate double colons into the platform-specific directory delimiter (`/' here), append `.pm', and search through `@INC' for the first matching module file, which it will then execute. So, for example, if `@INC' is:

    /usr/lib/perl5/5.8.6/i486-linux
    /usr/lib/perl5/5.8.6
    /usr/lib/perl5/site_perl/5.8.6/i486-linux
    /usr/lib/perl5/site_perl/5.8.6
    /usr/lib/perl5/site_perl/5.8.5
    /usr/lib/perl5/site_perl
    .

...and I do `require Foo::Bar', `require' might find `/usr/lib/perl5/site_perl/Foo/Bar.pm', which it will then execute and load into the current Perl interpreter.
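The following sketch shows the bareword-to-file-name translation described above. `module_to_file' is a hypothetical helper, not part of Perl; `List::Util' is a real core module used only as something to load. Passing a string to `require' (rather than a bareword) skips the translation and treats the string as a file name relative to the `@INC' directories.

```perl
use strict;
use warnings;

# module_to_file is a hypothetical helper mimicking what `require Foo::Bar'
# does to a bareword: turn :: into / and append .pm.
sub module_to_file {
    my ($module) = @_;
    (my $file = $module) =~ s{::}{/}g;
    return "$file.pm";
}

print module_to_file("Foo::Bar"), "\n";    # Foo/Bar.pm

# A string argument is taken as a file name, so this is the run-time
# equivalent of `require List::Util'.
my $file = module_to_file("List::Util");
require $file;

# %INC now maps the relative file name to the path where it was found.
print "$file loaded from $INC{$file}\n";
print List::Util::max(3, 9, 4), "\n";      # 9
```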
Actually, I lied: if `require' first finds a byte-compiled file generated by `B::Bytecode' (ending in `.pmc') whose modification timestamp is no older than that of any matching `.pm', `require' will load this instead. So, it's an awful lot like loading byte-compiled Emacs LISP files (even better, actually), substituting `@INC' for all instances of `load-path'.

`require' will then update `%INC', a hash mapping the names of all files thus far `require'd in this manner to the paths where they were found.

All files loaded by `require' must return a true value. The traditional implementation is to add `1;' at the end of package files. Ex:

    package Foo;
    use base qw/Exporter/;
    our @EXPORT_OK = qw/foo bar baz/;

    sub foo { ... }
    sub bar { ... }
    sub baz { ... }

    1;

    # Stuff after `__END__' is not evaluated
    __END__

    ...Pod documentation goes here...

As I said, there is another form of `require', although it seems to have fallen out of use generally, and we rarely ever see it. The argument is any expression (but not a bareword!) that evaluates to a file name rather than a package name, including the `.pm' extension, e.g. "Foo/Bar.pm". Double colons will not be expanded. Otherwise, this form acts the same as the above.

***** `use'

`use' is essentially a shorthand form of:

    BEGIN { require Module; Module->import( LIST ); }

`use' will include a `Module' according to the rules described above and then call the `import' method in the same package with the remaining arguments, if any. This method is usually inherited from `Exporter', as we'll see below. `BEGIN' blocks all run as soon as they have been compiled, ensuring no possible delay in the execution of statements which may be vital to the program.

`use' may also require a certain version of Perl as with `require'. The difference is that `use VERSION' is checked at compile time, before the rest of the file runs, whereas `require VERSION' is checked only when execution reaches it. (After all, you can't import symbols from the numerical representation of a version of Perl, can you?)

****** Importing Symbols

See below for all details. Atoms beginning with a colon (kinda like Common LISP keywords, eh?)
refer to `%EXPORT_TAGS':

``
    use Bestiary;                          # Import @EXPORT symbols
    use Bestiary ();                       # Import nothing
    use Bestiary qw(ram @llama);           # Import the ram function and @llama array
    use Bestiary qw(:camelids);            # Import $camel and @llama
    use Bestiary qw(:DEFAULT);             # Import @EXPORT symbols
    use Bestiary qw(/am/);                 # Import $camel, @llama, and ram
    use Bestiary qw(/^\$/);                # Import all scalars
    use Bestiary qw(:critters !ram);       # Import the critters, but exclude ram
    use Bestiary qw(:critters !:camelids); # Import critters, but no camelids
''

``Because this is a wide-open interface, pragmas (compiler directives) are also implemented this way. Currently implemented pragmas are:

    use constant;
    use diagnostics;
    use integer;
    use sigtrap  qw(SEGV BUS);
    use strict   qw(subs vars refs);
    use subs     qw(afunc blurfl);
    use warnings qw(all);
    use sort     qw(stable _quicksort _mergesort);''

Most pragmata are lexically scoped, so they may be, for instance, enabled or disabled in a limited manner within a bare block:

    ...standard code...
    {
        no strict;
        ...tricky shit...
    }
    ...more standard code...

****** Not Using Modules

``There's a corresponding "no" command that unimports meanings imported by "use", i.e., it calls "unimport Module LIST" instead of "import".

    no integer;
    no strict 'refs';
    no warnings;''

***** Changing `@INC'

When changing the library search path, the change must happen at compile time, before the `use' statements that depend on it are processed. Hence the `BEGIN' block in:

    BEGIN { push @INC, '/foo/bar/baz/baz-0.3'; }

This is better written as:

    use lib '/foo/bar/baz/baz-0.3';

`use lib' also runs at compile time, but it adds the directory to the /front/ of `@INC' (like `unshift'), so modules found there take precedence over installed copies.

**** Writing Modules

***** `package' Declaration

In order for modules to work according to the file system metaphor described above, it is necessary for the actual file representing the module to begin with a `package' declaration corresponding to the path. So, with a module named '/usr/lib/perl5/5.8.6/Tie/RefHash.pm', I want to start with `package Tie::RefHash;'.
This is a real module, and the inquiring mind will notice that this module contains a second package declaration way down at the bottom, `package Tie::RefHash::Nestable'. That's OK; as long as the essential portion of the module falls under the same package name as the path relative to the members of `@INC', that works. Note that you can't `use Tie::RefHash::Nestable', though, because there is no `Tie/RefHash/Nestable.pm' file for `require' to find. The contents of that package will nonetheless be available after `use Tie::RefHash'.

Remember that packages inherently have nothing to do with the compilation unit of the file, and vice versa. What Perl /does/ with them ties the two together in some ways. Realize the difference, but embrace the idiom.

***** Useful Pragmata

As always, you want `warnings' and `strict'. It may also be useful to `use /version/' where /version/ is some version of Perl supporting a minimum of functionality for the module. `use 5.004;', for instance, will forbid compilation of the module if the version of Perl is less than 5.004, which was a very significant maintenance release.

***** `Exporter'

For object-oriented modules, this can pretty much be ignored, although some details may remain important. For traditional modules, however, this is considered a canonical part of the package itself. One usually sees an `Exporter' stanza like this:

    require Exporter;
    our @ISA = qw/Exporter/;    # Or `use base qw/Exporter/;'

    our @EXPORT      = qw($camel %wolf ram);       # Export by default
    our @EXPORT_OK   = qw(leopard @llama $emu);    # Export by request
    our %EXPORT_TAGS = (                           # Export as group
        camelids => [qw($camel @llama)],
        critters => [qw(ram $camel %wolf)],
    );

****** `require Exporter'

Remember, one traditionally does not `use Exporter'. The use of, well, `use' implies the use of `import'. But the relation between `Exporter' and the client module rests on the use of inheritance, not importing. (Newer releases of `Exporter' also support `use Exporter qw(import);', which imports `Exporter''s `import' method directly instead of inheriting it.) Anyway, this executes `Exporter.pm' and prepares you for the next step.
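The stanza above can be exercised in a single file with a sketch like the following. The cut-down `Bestiary' package borrows its names from the documentation example quoted earlier but is otherwise hypothetical. Setting `$INC{"Bestiary.pm"}' by hand is a trick to make `use' believe the module was already loaded from disk.

```perl
use strict;
use warnings;

# Hypothetical, cut-down Bestiary module defined inline for the demo.
# Normally this would live in its own Bestiary.pm somewhere in @INC.
BEGIN {
    package Bestiary;
    require Exporter;
    our @ISA         = qw(Exporter);
    our @EXPORT      = qw(ram);             # exported by default
    our @EXPORT_OK   = qw(llama $camel);    # exported on request
    our %EXPORT_TAGS = (camelids => [qw(llama $camel)]);

    our $camel = "one hump";
    sub ram   { return "baa" }
    sub llama { return "hum" }

    # Mark the module as already loaded so `use' will not search @INC.
    $INC{"Bestiary.pm"} = __FILE__;
}

# Same as BEGIN { require Bestiary; Bestiary->import(...) }:
# :DEFAULT brings in @EXPORT, :camelids brings in llama and $camel.
use Bestiary qw(:DEFAULT :camelids);

print ram(), "\n";      # baa
print llama(), "\n";    # hum
print "$camel\n";       # one hump
```

Because `Bestiary' inherits from `Exporter', the `import' call made by `use' is resolved to `Exporter::import'. The imported `$camel' is an alias for `$Bestiary::camel', which is why `strict vars' accepts it without a declaration.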
****** `our @ISA = qw/Exporter/;'

This may be rewritten using the pragma `use base qw/Exporter/;', which essentially does the same thing, but is 13373r. Setting `@ISA' makes the current package inherit from `Exporter'. When `use' then calls `Module->import', Perl's method lookup finds the `import' method in the `Exporter' namespace. (The package itself should therefore not provide such a method!) Programs or packages using the current module thus get the functionality of `Exporter''s `import'. Eh? The terminology varies depending on what is being used wrt what. It just happened that way.

****** Exported Variables

Now, explaining this mess:

    our @EXPORT      = qw($camel %wolf ram);       # Export by default
    our @EXPORT_OK   = qw(leopard @llama $emu);    # Export by request
    our %EXPORT_TAGS = (                           # Export as group
        camelids => [qw($camel @llama)],
        critters => [qw(ram $camel %wolf)],
    );

`@EXPORT' is an array of symbols which one wishes to export into the calling package by default. When not qualified with a type sigil, `Exporter' will assume you are naming a function (e.g., &ram and &leopard will be available). Likewise, `@EXPORT_OK' lists symbols which the caller may import by request. `%EXPORT_TAGS' defines sets of symbols which the caller may import collectively by way of the tag (key).

``Since the symbols listed within %EXPORT_TAGS must also appear in either @EXPORT or @EXPORT_OK, the Exporter provides two functions to let you add those tagged sets of symbols:

    %EXPORT_TAGS = (foo => [qw(aa bb cc)], bar => [qw(aa cc dd)]);

    Exporter::export_tags('foo');     # add aa, bb and cc to @EXPORT
    Exporter::export_ok_tags('bar');  # add aa, cc and dd to @EXPORT_OK''

Finally, `@EXPORT_FAIL' lists symbols which /may not/ be imported. Any attempt to import symbols enumerated in `@EXPORT_FAIL' will be trapped by the `export_fail' method in the /exporting/ package:

``If a module attempts to import any of these symbols the Exporter will give the module an opportunity to handle the situation before generating an error.
The Exporter will call an export_fail method with a list of the failed symbols:

    @failed_symbols = $module_name->export_fail(@failed_symbols);

If the export_fail method returns an empty list then no error is recorded and all the requested symbols are exported. If the returned list is not empty then an error is generated for each symbol and the export fails. The Exporter provides a default export_fail method which simply returns the list unchanged.''

*** Plain Old Documentation

Perl's Plain Old Documentation is a minimal markup language for documenting Perl programs and modules, although there is nothing to prevent you from using it for general purposes; for one, its syntax is way less arcane than that of nroff, the primary medium for Unix manual pages. POD translators (pod2html, pod2man, pod2text, and friends) exist to transform the POD source into other media, such as HTML, manual pages, or plain text. Pure POD can express nearly any strictly textual, one-column document. The `=for' directive (and its multi-paragraph sibling, `=begin'/`=end'), when used with a POD translator such as pod2html, extends POD's abilities to those of the target medium. Indeed, even such a significant book as Programming Perl was written chiefly in POD!

Because POD documentation is most likely found in a Perl source file, the Perl specification provides for its coexistence. In particular, Perl will ignore extents of POD documentation anywhere a new statement could begin in the source code. A standard practice is to put complete POD documentation for a module after the special `__END__' token signifying the end of the source.

The fundamental unit of POD documentation is the _paragraph_, which is delimited from surrounding paragraphs by blank lines. In turn, there are three basic kinds of paragraphs: ordinary, comparable to `<p>'; verbatim, much like `<pre>'; and command, which impose various changes to blocks of text depending on the specific directive.

**** Commands

***** `=headn'

The first of these command directives you are likely to see is `=headn', where `n' is a number from 1 to 4, indicating the depth of the heading level. Ex: `=head2 Object Attributes'. All command paragraphs accepting arguments use the same prefix notation: the command first, then its arguments on the same line.

***** `=over', `=item' and `=back'

The `=over' instruction indents the document and maintains this state until `=back' has been issued. This is primarily useful for producing lists with `=item' and indenting groups of paragraphs.

``The indentlevel option to "=over" indicates how far over to indent, generally in ems (where one em is the width of an "M" in the document's base font) or roughly comparable units; if there is no indentlevel option, it defaults to four. (And some formatters may just ignore whatever indentlevel you provide.)''

``Don't use "=item"s outside of an "=over" ... "=back" region. The first thing after the "=over" command should be an "=item", unless there aren't going to be any items at all in this "=over" ... "=back" region. Don't put "=headn" commands inside an "=over" ... "=back" region. And perhaps most importantly, keep the items consistent: either use "=item *" for all of them, to produce bullets; or use "=item 1.", "=item 2.", etc., to produce numbered lists; or use "=item foo", "=item bar", etc. -- namely, things that look nothing like bullets or numbers. If you start with bullets or numbers, stick with them, as formatters use the first "=item" type to decide how to format the list.''

***** `=cut'

`=cut' terminates a POD block and, like every command paragraph, must be set off by blank lines: one before it and one after it.

***** `=pod'

`=pod' explicitly begins (or continues) a POD document if a given stanza does not begin with a command directive.
The "=pod" command by itself doesn't do much of anything, but it signals to Perl (and Pod formatters) that a Pod block starts here. A Pod block starts with any command paragraph, so a "=pod" command is usually used just when you want to start a Pod block with an ordinary paragraph or a verbatim paragraph. For example:

  =item stuff()

  This function does stuff.

  =cut

  sub stuff {
    ...
  }

  =pod

  Remember to check its return value, as in:

    stuff() || die "Couldn't do stuff!";

  =cut

***** `=begin', `=end', and `=for'

  =begin formatname
  =end formatname
  =for formatname text...

For, begin, and end will let you have regions of text/code/data that are not generally interpreted as normal Pod text, but are passed directly to particular formatters, or are otherwise special. A formatter that can use that format will use the region; otherwise it will be completely ignored.

A command "=begin formatname", some paragraphs, and a command "=end formatname" mean that the text/data in between is meant for formatters that understand the special format called formatname. For example,

  =begin html

  This is a raw HTML paragraph

  =end html

The command "=for formatname text..." specifies that the remainder of just this paragraph (starting right after formatname) is in that special format.

  =for html
  This is a raw HTML paragraph

This means the same thing as the above "=begin html" ... "=end html" region. That is, with "=for", you can have only one paragraph's worth of text (i.e., the text in "=for targetname text..."), but with "=begin targetname" ... "=end targetname", you can have any amount of stuff in between. (Note that there still must be a blank line after the "=begin" command and a blank line before the "=end" command.) Here are some examples of how to use these:

  =begin html

  Figure 1.

  =end html

  =begin text

    ---------------
    |  foo        |
    |        bar  |
    ---------------

  ^^^^ Figure 1. ^^^^

  =end text

Some format names that formatters currently are known to accept include "roff", "man", "latex", "tex", "text", and "html". (Some formatters will treat some of these as synonyms.)

A format name of "comment" is common for just making notes (presumably to yourself) that won't appear in any formatted version of the Pod document:

  =for comment
  Make sure that all the available options are documented!

Some format names will require a leading colon (as in "=for :formatname", or "=begin :formatname" ... "=end :formatname"), to signal that the text is not raw data, but instead is Pod text (i.e., possibly containing formatting codes) that's just not for normal formatting (e.g., may not be a normal-use paragraph, but might be for formatting as a footnote).

***** `=encoding'

  =encoding encodingname

This command is used for declaring the encoding of a document. Most users won't need this; but if your encoding isn't US-ASCII or Latin-1, then put a "=encoding encodingname" command early in the document so that Pod formatters will know how to decode the document. For encodingname, use a name recognized by the Encode::Supported module. Examples:

  =encoding utf8

  =encoding koi8-r

  =encoding ShiftJIS

  =encoding big5

**** Caveat on Whitespace

And don't forget, when using any command, that the command lasts up until the end of its paragraph, not its line. So in the examples below, you can see that every command needs the blank line after it, to end its paragraph.

Some examples of lists include:

  =over

  =item *

  First item

  =item *

  Second item

  =back

  =over

  =item Foo()

  Description of Foo function

  =item Bar()

  Description of Bar function

  =back

**** Interior Sequences

``In ordinary paragraphs and in some command paragraphs, various formatting codes (a.k.a. "interior sequences") can be used.''

***** `I'

"I" -- italic text. Used for emphasis (""be I"") and parameters (""redo I