2.3 Terms

Now that we've talked about the kinds of data you can represent in Perl, we'd like to introduce you to the various kinds of terms you can use to pull that data into expressions. We'll use the technical term term when we want to talk in terms of these syntactic units. (Hmm, this could get confusing.) The first terms we'll talk about are variables.

Variables

There are variable types corresponding to each of the three data types we mentioned. Each of these is introduced (grammatically speaking) by what we call a "funny character". Scalar variables are always named with an initial $, even when referring to a scalar that is part of an array or hash. It works a bit like the English word "the". Thus, we have:

Construct	Meaning
`$days`	Simple scalar value `$days`
`$days[28]`	29th element of array `@days`
`$days{'Feb'}`	"`Feb`" value from hash `%days`
`$#days`	Last index of array `@days`
`$days->[28]`	29th element of array pointed to by reference `$days`

Entire arrays or array slices (and also slices of hashes) are named with @, which works much like the words "these" or "those":

Construct	Meaning
`@days`	Same as `($days[0], $days[1],... $days[n])`
`@days[3..5]`	Same as `($days[3], $days[4], $days[5])`
`@days[3..5]`	Same as `@days[3,4,5]`
`@days{'Jan','Feb'}`	Same as `($days{'Jan'},$days{'Feb'})`

Entire hashes are named by %:

Construct	Meaning
`%days`	`(Jan => 31, Feb => $leap ? 29 : 28, ...)`

Any of these nine constructs may serve as an lvalue, that is, they specify a location that you could assign a value to, among other things.[2]

[2] Assignment itself is an lvalue in certain contexts--see examples under s///, tr///, chop, and chomp in Chapter 3, Functions.
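As a rough sketch of the constructs tabulated above, the following reads and assigns elements, slices, and whole aggregates. The month data is invented for illustration, and the variables are declared with my only so that the fragment runs under use strict.

```perl
use strict;
use warnings;

# Illustrative data only; the variable names follow the tables above.
my @days = (1 .. 31);                 # $days[0] is 1, $days[30] is 31
my %days = (Jan => 31, Feb => 28);

my $elem   = $days[28];               # 29th element of @days
my $last   = $#days;                  # last index of @days
my @slice  = @days[3 .. 5];           # same as ($days[3], $days[4], $days[5])
my @months = @days{'Jan', 'Feb'};     # hash slice: ($days{'Jan'}, $days{'Feb'})

# Each construct is also an lvalue, so it can be assigned to.
$days{'Feb'} = 29;                    # one hash element
@days[0, 1]  = (100, 200);            # an array slice

my $ref = \@days;                     # reference to @days
print "$elem $last @slice @months $ref->[28]\n";
```

Note that @days and %days coexist here without trouble, and that the slices are named with @ even though one of them draws its values from a hash.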

In addition, subroutine calls are named with an initial &, although this is optional when it's otherwise unambiguous (just as "do" is often redundant in English). Symbol table entries can be named with an initial *, but you don't really care about that yet.

Every variable type has its own namespace. You can, without fear of conflict, use the same name for a scalar variable, an array, or a hash (or, for that matter, a filehandle, a subroutine name, a label, or your pet llama). This means that $foo and @foo are two different variables. It also means that $foo[1] is an element of @foo, not a part of $foo. This may seem a bit weird, but that's okay, because it is weird.
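A small sketch may make the separate namespaces less weird. The name foo and its contents are placeholders chosen for this example.

```perl
use strict;
use warnings;

# Three unrelated variables that happen to share the name "foo".
my $foo = 'scalar';
my @foo = ('first', 'second');
my %foo = (key => 'value');

# $foo[1] is an element of @foo; $foo{key} is an element of %foo.
# Neither has anything to do with the scalar $foo.
print "$foo / $foo[1] / $foo{key}\n";
```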

Since variable names always start with $, @, or %, the reserved words can't conflict with variable names. But they can conflict with nonvariable identifiers, such as labels and filehandles, which don't have an initial funny character. Since reserved words are always entirely lowercase, we recommend that you pick label and filehandle names that do not appear all in lowercase. For example, you could say open(LOG,'logfile') rather than the regrettable open(log,'logfile').[3] Using uppercase filehandles also improves readability and protects you from conflict with future reserved words.

[3] Regrettable because log is a predefined function returning the base e logarithm of its argument, or of $_ if its argument is missing, as it is in this case.

Case is significant--FOO, Foo and foo are all different names. Names that start with a letter or underscore may be of any length (well, 255 characters, at least) and may contain letters, digits, and underscores. Names that start with a digit may only contain more digits. Names that start with anything else are limited to that one character (like $? or $$), and generally have a predefined significance to Perl. For example, just as in the Bourne shell, $$ is the current process ID and $? the exit status of your last child process.

Sometimes you want to name something indirectly. It is possible to replace an alphanumeric name with an expression that returns a reference to the actual variable (see Chapter 4, References and Nested Data Structures).

Scalar Values

Whether it's named directly or indirectly, or is just a temporary value on a stack, a scalar always contains a single value. This value may be a number,[4] a string,[5] or a reference to another piece of data. (Or there may be no value at all, in which case the scalar is said to be undefined.) While we might speak of a scalar as "containing" a number or a string, scalars are essentially typeless; there's no way to declare a scalar to be of type "number" or "string". Perl converts between the various subtypes as needed, so you can treat a number as a string or a string as a number, and Perl will do the Right Thing.[6]

[4] Perl stores numbers as signed integers if possible, or as double-precision floating-point values in the machine's native format otherwise. Floating-point values are not infinitely precise. This is very important to remember, since comparisons like (10/3 == 1/3*10) tend to fail mysteriously.
[5] Perl stores strings as sequences of bytes, with no arbitrary constraints on length or content. In human terms, you don't have to decide in advance how long your strings are going to get, and you can include any characters including null characters within your string.
[6] To convert from string to number, Perl uses C's atof (3) function. To convert from number to string, it does the equivalent of an sprintf (3) with a format of "%.14g" on most machines.
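The automatic conversions, and the floating-point caveat from the footnote, might be sketched as follows. The tolerance of 1e-9 is an arbitrary assumption for this example, not anything Perl prescribes.

```perl
use strict;
use warnings;

# A numeric string behaves as a number, and a number as a string.
my $sum    = "10" + 5;        # numeric context: 15
my $digits = length(12345);   # string context: the number has 5 characters

# Floating-point results that are mathematically equal may still differ
# in their last bits, so compare against a tolerance instead of using ==.
my $x = 10 / 3;
my $y = 1 / 3 * 10;
my $close = abs($x - $y) < 1e-9 ? 1 : 0;

print "$sum $digits $close\n";
```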

While strings and numbers are interchangeable for nearly all intents and purposes, references are a bit different. They're strongly typed, uncastable[7] pointers with built-in reference-counting and destructor invocation. You can use them to create complex data types, including user-defined objects. But they're still scalars, for all that. See Chapter 4, References and Nested Data Structures for more on references.

[7] By which we mean that you can't, for instance, convert a reference to an array into a reference to a hash. References are not castable to other pointer types. However, if you use a reference as a number or a string, you will get a numeric or string value, which is guaranteed to retain the uniqueness of the reference even though the "referenceness" of the value is lost when the value is copied from the real reference. You can compare such values or test whether they are defined. But you can't do much else with the values, since there's no way to convert numbers or strings into references. In general this is not a problem, since Perl doesn't force you to do pointer arithmetic--or even allow it.
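What the footnote says about using a reference as a string or number can be tried out with a sketch like this one. The variable names are invented, and the address printed will differ from run to run.

```perl
use strict;
use warnings;

my @list  = (1, 2, 3);
my $ref   = \@list;
my $same  = \@list;           # a second reference to the same array
my $other = [1, 2, 3];        # a reference to a different array

my $as_string = "$ref";       # something like ARRAY(0x...); the address varies
my $as_number = $ref + 0;     # the same address as a plain number

# Numeric comparison tells whether two references point at the same thing.
my $identical = ($ref == $same)  ? 1 : 0;
my $distinct  = ($ref != $other) ? 1 : 0;

print "$as_string $identical $distinct\n";
```

The string and the number are ordinary scalars: ref($as_string) returns a false value, and there is no way to turn either one back into a reference.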

Numeric literals

Numeric literals are specified in any of several customary[8] floating point or integer formats:

[8] Customary in UNIX culture, that is. If you're from a different culture, welcome to ours!

12345               # integer
12345.67            # floating point
6.02E23             # scientific notation
0xffff              # hexadecimal
0377                # octal
4_294_967_296       # underscore for legibility

Since Perl uses the comma as a list separator, you cannot use it to delimit the triples in a large number. To improve legibility, Perl does allow you to use an underscore character instead. The underscore only works within literal numbers specified in your program, not for strings functioning as numbers or data read from somewhere else. Similarly, the leading 0x for hex and 0 for octal work only for literals. The automatic conversion of a string to a number does not recognize these prefixes--you must do an explicit conversion[9] with the oct function (which works for hex-looking data, too, as it happens).

[9] Sometimes people think Perl should convert all incoming data for them. But there are far too many decimal numbers with leading zeroes in the world to make Perl do this automatically. For example, the zip code for O'Reilly & Associates' office in Cambridge, MA is 02140. The postmaster would get upset if your mailing label program turned 02140 into 1120 decimal.
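The difference between literals and strings that merely look like literals can be seen in a short sketch. The values are arbitrary, and the numeric warning is switched off around one line only because that line is deliberately feeding Perl a non-numeric string.

```perl
use strict;
use warnings;

my $literal = 0377;               # octal literal: 255
my $grouped = 4_294_967_296;      # underscores are ignored in literals

my $plain;
{
    no warnings 'numeric';        # the next line would otherwise warn
    $plain = "0x1f" + 0;          # no prefix handling: conversion stops at the "x"
}
my $from_oct = oct("0377");       # explicit conversion: 255
my $from_hex = oct("0x1f");       # oct also understands a 0x prefix: 31

my $zip = "02140" + 0;            # a string is always read as decimal: 2140

print "$literal $grouped $plain $from_oct $from_hex $zip\n";
```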

String literals

String literals are usually delimited by either single or double quotes. They work much like UNIX shell quotes: double-quoted string literals are subject to backslash and variable interpolation; single-quoted strings are not (except for \' and \\, so that you can put single quotes and backslashes into single-quoted strings).

You can also embed newlines directly in your strings; that is, they can begin and end on different lines. This is nice for many reasons, but it also means that if you forget a trailing quote, the error will not be reported until Perl finds another line containing the quote character, which may be much further on in the script. Fortunately, this usually causes an immediate syntax error on the same line, and Perl is then smart enough to warn you that you might have a runaway string.

Note that a single-quoted string must be separated from a preceding word by a space, since a single quote is a valid (though deprecated) character in an identifier; see Chapter 5, Packages, Modules, and Object Classes.

With double-quoted strings, the usual C-style backslash rules apply for inserting characters such as newline, tab, and so on. You may also specify characters in octal and hexadecimal, or as control characters:

Code	Meaning
`\n`	Newline
`\r`	Carriage return
`\t`	Horizontal tab
`\f`	Form feed
`\b`	Backspace
`\a`	Alert (bell)
`\e`	ESC character
`\033`	ESC in octal
`\x7f`	DEL in hexadecimal
`\cC`	Control-C

In addition, there are escape sequences to modify the case of subsequent characters, as with the substitution operator in the vi editor:

Code	Meaning
`\u`	Force next character to uppercase.
`\l`	Force next character to lowercase.
`\U`	Force all following characters to uppercase.
`\L`	Force all following characters to lowercase.
`\Q`	Backslash all following non-alphanumeric characters.
`\E`	End `\U`, `\L`, or `\Q`.
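A few of the case-modifying escapes in action, as a sketch with made-up strings:

```perl
use strict;
use warnings;

my $name = 'camel';

my $capital = "\u$name";              # "Camel"
my $shout   = "\U$name\E and llama";  # "CAMEL and llama"
my $title   = "\u\LPERL BOOK\E";      # "Perl book"
my $quoted  = "\Qa.b*c\E";            # "a\.b\*c"

print "$capital|$shout|$title|$quoted\n";
```

The \Q form is mostly useful when a string is about to be interpolated into a pattern and its punctuation should be matched literally.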

Besides the backslash escapes listed above, double-quoted strings are subject to variable interpolation of scalar and list values. This means that you can insert the values of certain variables directly into a string literal. It's really just a handy form of string concatenation. Variable interpolation may only be done for scalar variables, entire arrays (but not hashes), single elements from an array or hash, or slices (multiple subscripts) of an array or hash. In other words, you may only interpolate expressions that begin with $ or @, because those are the two characters (along with backslash) that the string parser looks for.[10] Although a complete hash specified with a % may not be interpolated into the string, single hash values and hash slices are okay, because they begin with $ and @ respectively.

[10] Inside strings a literal @ that is not part of an array or slice identifier must be escaped with a backslash (\@), or else a compilation error will result. See Chapter 9, Diagnostic Messages.

The following code segment prints out: "The price is $100."

$Price = '$100';                    # not interpolated
print "The price is $Price.\n";     # interpolated

As in some shells, you can put braces around the identifier to distinguish it from following alphanumerics: "How ${verb}able!". In fact, an identifier within such braces is forced to be a string, as is any single identifier within a hash subscript. For example:

$days{'Feb'}

can be written as:

$days{Feb}

and the quotes will be assumed automatically. But anything more complicated in the subscript will be interpreted as an expression.

Apart from the subscripts of interpolated array and hash variables, there are no multiple levels of interpolation. In particular, contrary to the expectations of shell programmers, backquotes do not interpolate within double quotes, nor do single quotes impede evaluation of variables when used within double quotes.
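The interpolation rules just described might be exercised like this. The variables and their contents are invented for the example.

```perl
use strict;
use warnings;

my $verb = 'read';
my %days = (Feb => 28);
my @list = (10, 20, 30);

my $braced = "How ${verb}able!";              # braces end the name early
my $elems  = "Feb has $days{Feb} days; second is $list[1]";
my $inner  = "It's '$verb', not `$verb`";     # inner quotes are just characters

print "$braced\n$elems\n$inner\n";
```

Inside the double quotes, the single quotes do not stop $verb from being interpolated, and the backquotes do not run any command.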

Pick your own quotes

While we usually think of quotes as literal values, in Perl they function more like operators, providing various kinds of interpolating and pattern matching capabilities. Perl provides the customary quote characters for these behaviors, but also provides a way for you to choose your quote character for any of them.

Customary	Generic	Meaning	Interpolates
`''`	`q//`	Literal	No
`""`	`qq//`	Literal	Yes
``	`qx//`	Command	Yes
`()`	`qw//`	Word list	No
`//`	`m//`	Pattern match	Yes
`s///`	`s///`	Substitution	Yes
`y///`	`tr///`	Translation	No

Some of these are simply forms of "syntactic sugar" to let you avoid putting too many backslashes into quoted strings. Any non-alphanumeric, non-whitespace delimiter can be used in place of /.[11] If the delimiters are single quotes, no variable interpolation is done on the pattern. If the opening delimiter is a parenthesis, bracket, brace, or angle bracket, the closing delimiter will be the matching construct. (Embedded occurrences of the delimiters must match in pairs.) Examples:

[11] In particular, the newline and space characters are not allowed as delimiters. (Ancient versions of Perl allowed this.)

$single = q!I said, "You said, 'She said it.'"!;
$double = qq(Can't we get some "good" $variable?);
$chunk_of_code = q {
    if ($condition) {
        print "Gotcha!";
    }
};

Finally, for two-string constructs like s/// and tr///, if the first pair of quotes is a bracketing pair, then the second part gets its own starting quote character, which needn't be the same as the first pair. So you can write things like s{foo}(bar) or tr[a-z][A-Z]. Whitespace is allowed between the two inner quote characters, so you could even write that last one as:

tr [a-z]
   [A-Z];
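As a further sketch of why one might pick a different delimiter, consider a string full of slashes. The path is a placeholder.

```perl
use strict;
use warnings;

my $path = '/usr/local/bin';

# Choosing another delimiter avoids backslashing every slash.
(my $moved = $path) =~ s{/usr/local}(/opt);

# tr with bracketing delimiters for both halves.
(my $upper = $path) =~ tr[a-z][A-Z];

my @words = qw(alpha beta gamma);     # same as ('alpha', 'beta', 'gamma')

print "$moved $upper @words\n";
```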

Or leave the quotes out entirely

A word that has no other interpretation in the grammar will be treated as if it were a quoted string. These are known as barewords.[12] For example:

[12] As with filehandles and labels, a bareword that consists entirely of lowercase letters risks conflict with future reserved words. If you use the -w switch, Perl will warn you about barewords.

@days = (Mon,Tue,Wed,Thu,Fri);
print STDOUT hello, ' ', world, "\n";

sets the array @days to the short form of the weekdays and prints hello world followed by a newline on STDOUT. If you leave the filehandle out, Perl tries to interpret hello as a filehandle, resulting in a syntax error. Because this is so error-prone, some people may wish to outlaw barewords entirely. If you say:

use strict 'subs';

then any bareword that would not be interpreted as a subroutine call produces a compile-time error instead. The restriction lasts to the end of the enclosing block. An inner block may countermand this by saying:

no strict 'subs';

Note that the bare identifiers in constructs like:

"${verb}able"
$days{Feb}

are not considered barewords, since they're allowed by explicit rule rather than by having "no other interpretation in the grammar".

Interpolating array values

Array variables are interpolated into double-quoted strings by joining all the elements of the array with the delimiter specified in the $" variable[13] (which is a space by default). The following are equivalent:

[13] $LIST_SEPARATOR if you use the English library module. See Chapter 7, The Standard Perl Library.

$temp = join($",@ARGV);
print $temp;
print "@ARGV";
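If a space is not the separator you want, $" can be changed temporarily. This is a sketch with an invented array; local confines the change to the enclosing block.

```perl
use strict;
use warnings;

my @fruit = ('apple', 'banana', 'plum');

my $default = "@fruit";               # joined with a single space

my $custom;
{
    local $" = ', ';                  # change the separator for this block only
    $custom = "@fruit";
}

print "$default\n$custom\n";
```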

Within search patterns (which also undergo double-quotish interpolation) there is a bad ambiguity: Is /$foo[bar]/ to be interpreted as /${foo}[bar]/ (where [bar] is a character class for the regular expression) or as /${foo[bar]}/ (where [bar] is the subscript to array @foo)? If @foo doesn't otherwise exist, then it's obviously a character class. If @foo exists, Perl takes a good guess about [bar], and is almost always right.[14] If it does guess wrong, or if you're just plain paranoid, you can force the correct interpretation with braces as above. Even if you're merely prudent, it's probably not a bad idea.

[14] The guesser is too boring to describe in full, but basically takes a weighted average of all the things that look like character classes (a-z, \w, initial ^) versus things that look like expressions (variables or reserved words).
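The two braced forms can be compared side by side. In this sketch $foo and @foo are placeholders, deliberately given the same name so that the ambiguity is real.

```perl
use strict;
use warnings;

my $foo = 'x';
my @foo = ('hello');

# Braces around the name only: [bar] is a character class.
my $class = ('xa' =~ /^${foo}[bar]$/) ? 1 : 0;

# Braces around name and subscript: element 0 of @foo.
my $element = ('hello' =~ /^${foo[0]}$/) ? 1 : 0;

print "$class $element\n";
```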

"Here" documents

A line-oriented form of quoting is based on the shell's here-document syntax.[15] Following a << you specify a string to terminate the quoted material, and all lines following the current line down to the terminating string are quoted. The terminating string may be either an identifier (a word), or some quoted text. If quoted, the type of quote you use determines the treatment of the text, just as in regular quoting. An unquoted identifier works like double quotes. There must be no space between the << and the identifier. (If you insert a space, it will be treated as a null identifier, which is valid but deprecated, and matches the first blank line--see the first Hurrah! example below.) The terminating string must appear by itself (unquoted and with no surrounding whitespace) on the terminating line.

[15] It's line-oriented in the sense that delimiters are lines rather than characters. The starting delimiter is the current line, and the terminating delimiter is a line consisting of the string you specify.

    print <<EOF;    # same as earlier example    
The price is $Price.
EOF
    print <<"EOF";  # same as above, with explicit quotes
The price is $Price.
EOF
    print <<'EOF';    # single-quoted quote
All things (e.g. a camel's journey through
A needle's eye) are possible, it's true.
But picture how the camel feels, squeezed out
In one long bloody thread, from tail to snout.
                                -- C.S. Lewis
EOF
    print << x 10;    # print next line 10 times
The camels are coming!  Hurrah!  Hurrah!
    print <<"" x 10;  # the preferred way to write that
The camels are coming!  Hurrah!  Hurrah!
    print <<`EOC`;    # execute commands
echo hi there
echo lo there
EOC
    print <<"dromedary", <<"camelid"; # you can stack them
I said bactrian.
dromedary
She said llama.
camelid

Just don't forget that you have to put a semicolon on the end to finish the statement, as Perl doesn't know you're not going to try to do this:

print <<ABC
179231
ABC
    + 20;   # prints 179251

Other literal tokens

Two special literals are __LINE__ and __FILE__, which represent the current line number and filename at that point in your program. They may only be used as separate tokens; they will not be interpolated into strings. In addition, the token __END__ may be used to indicate the logical end of the script before the actual end of file. Any following text is ignored, but may be read via the DATA filehandle.

The __DATA__ token functions similarly to the __END__ token, but opens the DATA filehandle within the current package's namespace, so that required files can each have their own DATA filehandles open simultaneously. For more information, see Chapter 5, Packages, Modules, and Object Classes.

Context

Until now we've seen a number of terms that can produce scalar values. Before we can discuss terms further, though, we must come to terms with the notion of context.

Scalar and list context

Every operation[16] that you invoke in a Perl script is evaluated in a specific context, and how that operation behaves may depend on the requirements of that context. There are two major contexts: scalar and list. For example, assignment to a scalar variable evaluates the right-hand side in a scalar context, while assignment to an array or a hash (or slice of either) evaluates the right-hand side in a list context. Assignment to a list of scalars would also provide a list context to the right-hand side.

[16] Here we use the term "operation" loosely to mean either an operator or a term. The two concepts fuzz into each other when you start talking about functions that parse like terms but look like unary operators.

You will be miserable until you learn the difference between scalar and list context, because certain operators know which context they are in, and return lists in contexts wanting a list, and scalar values in contexts wanting a scalar. (If this is true of an operation, it will be mentioned in the documentation for that operation.) In computer lingo, the functions are overloaded on the type of their return value. But it's a very simple kind of overloading, based only on the distinction between singular and plural values, and nothing else.

Other operations supply the list contexts to their operands, and you can tell which ones they are because they all have LIST in their syntactic descriptions. Generally it's quite intuitive.[17] If necessary, you can force a scalar context in the middle of a LIST by using the scalar function. (Perl provides no way to force a list context in a scalar context, because anywhere you would want a list context it's already provided by the LIST of some controlling function.)

[17] Note, however, that the list context of a LIST can propagate down through subroutine calls, so it's not always obvious by inspection whether a given simple statement is going to be evaluated in a scalar or list context. The program can find out its context within a subroutine by using the wantarray function.
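One way to watch context at work is a subroutine that reports how it was called. The subroutine name context is hypothetical; wantarray and scalar are the built-in functions mentioned above.

```perl
use strict;
use warnings;

# A hypothetical subroutine that reports the context it was called in.
sub context {
    return wantarray ? 'list' : 'scalar';
}

my $scalar  = context();              # assignment to a scalar: scalar context
my ($first) = context();              # assignment to a list: list context
my @all     = (context(), context()); # elements of a LIST: list context
my @forced  = (scalar(context()));    # scalar() forces scalar context in a LIST

print "$scalar $first @all @forced\n";
```

The parentheses around $first are what make the second assignment a list assignment, even though only one variable is involved.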

Scalar context can be further classified into string context, numeric context, and don't-care context. Unlike the scalar versus list distinction we just made, operations never know which scalar context they're in. They simply return whatever kind of scalar value they want to, and let Perl translate numbers to strings in string context, and strings to numbers in numeric context. Some scalar contexts don't care whether a string or number is returned, so no conversion will happen. (This happens, for example, when you are assigning the value to another variable. The new variable just takes on the same subtype as the old value.)

Boolean context

One special scalar context is called Boolean context. Boolean context is simply any place where an expression is being evaluated to see whether it's true or false. We sometimes write true and false when we mean the technical definition that Perl uses: a scalar value is true if it is not the null string or the number 0 (or its string equivalent, "0"). References are always true.

A Boolean context is a don't-care context in the sense that it never causes any conversions to happen (at least, no conversions beyond what scalar context would impose).

We said that a null string is false, but there are actually two varieties of null scalars: defined and undefined. Boolean context doesn't distinguish between defined and undefined scalars. Undefined null scalars are returned when there is no real value for something, such as when there was an error, or at end of file, or when you refer to an uninitialized variable or element of an array. An undefined null scalar may become defined the first time you use it as if it were defined, but prior to that you can use the defined operator to determine whether the value is defined or not. (The return value of defined is always defined, but not always true.)
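The rules for truth and definedness can be checked against a handful of sample values. This is a sketch; the list of values is chosen only to hit the edge cases.

```perl
use strict;
use warnings;

# Classify each value by truth and definedness.
my @report;
for my $value ('', 0, '0', undef, '0.0', '00', ' ', 'a', []) {
    my $truth   = $value ? 'true' : 'false';
    my $defined = defined($value) ? 'defined' : 'undefined';
    push @report, "$truth/$defined";
}

print join(' ', @report), "\n";
```

Only the first four values are false, and only undef is undefined. Strings such as '0.0' and '00' are true because they are neither the null string nor exactly "0".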

Void context

Another peculiar kind of scalar context is the void context. This context not only doesn't care what the return value is, it doesn't even want a return value. From the standpoint of how functions work, it's no different from an ordinary scalar context. But if you use the -w command-line switch, the Perl compiler will warn you if you use an expression with no side effects in a place that doesn't want a value, such as in a statement that doesn't return a value. For example, if you use a string as a statement:

"Camel Lot";

you may get a warning like this:

Useless use of a constant in void context at myprog line 123.

Interpolative context

We mentioned that double-quoted literal strings do backslash interpretation and variable interpolation, but the interpolative context (often called "double-quote context") applies to more than just double-quoted strings. Some other double-quotish constructs are the generalized backtick operator qx//, the pattern match operator m//, and the substitution operator s///. In fact, the substitution operator does interpolation on its left side before doing a pattern match, and then does interpolation on its right side each time the left side matches.

The interpolative context only happens inside quotes, or things that work like quotes, so perhaps it's not fair to call it a context in the same sense as scalar and list context. (Then again, maybe it is.)

List Values and Arrays

Now that we've talked about context, we can talk about list values, and how they behave in context. List values are denoted by separating individual values by commas (and enclosing the list in parentheses where precedence requires it):

(LIST)

In a list context, the value of the list literal is all the values of the list in order. In a scalar context, the value of a list literal is the value of the final element, as with the C comma operator, which always throws away the value on the left and returns the value on the right. (In terms of what we discussed earlier, the left side of the comma operator provides a void context.) For example:

@stuff = ("one", "two", "three");

assigns the entire list value to array @stuff, but:

$stuff = ("one", "two", "three");

assigns only the value three to variable $stuff. The comma operator knows whether it is in a scalar or a list context. An actual array variable also knows its context. In a list context, it would return its entire contents, but in a scalar context it returns only the length of the array (which works out nicely if you mention the array in a conditional). The following assigns to $stuff the value 3:

@stuff = ("one", "two", "three");
$stuff = @stuff;      # $stuff gets 3, not "three"

Until now we've pretended that LISTs are just lists of literals. But in fact, any expressions that return values may be used within lists. The values so used may either be scalar values or list values. LISTs do automatic interpolation of sublists. That is, when a LIST is evaluated, each element of the list is evaluated in a list context, and the resulting list value is interpolated into LIST just as if each individual element were a member of LIST. Thus arrays lose their identity in a LIST. The list:

(@foo,@bar,&SomeSub)

contains all the elements of @foo, followed by all the elements of @bar, followed by all the elements returned by the subroutine named SomeSub when it's called in a list context. You can use a reference to an array if you do not want it to interpolate. See Chapter 4, References and Nested Data Structures, yet again.
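A sketch of the difference between interpolating arrays and keeping them apart with references follows. The subroutine some_sub is a hypothetical stand-in for SomeSub, and the data is invented.

```perl
use strict;
use warnings;

my @foo = (1, 2);
my @bar = (3, 4);
sub some_sub { return ('a', 'b') }     # hypothetical stand-in for SomeSub

my @flat   = (@foo, @bar, some_sub()); # six elements; the arrays lose their identity
my @nested = (\@foo, \@bar);           # two elements, each a reference

my $flat_count   = @flat;
my $nested_count = @nested;

print "$flat_count $nested_count $nested[1][0]\n";
```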

The null list is represented by (). Interpolating it in a list has no effect. Thus, ((),(),()) is equivalent to (). Similarly, interpolating an array with no elements is the same as if no array had been interpolated at that point.

You may place an optional comma at the end of any list value. This makes it easy to come back later and add more elements.

@numbers = (
    1,
    2,
    3,
);

Another way to specify a literal list is with the qw (quote words) syntax we mentioned earlier. This construct is equivalent to splitting a single-quoted string on whitespace. For example:

@foo = qw(
    apple       banana      carambola
    coconut     guava       kumquat
    mandarin    nectarine   peach
    pear        persimmon   plum
);

(Note that those parentheses are behaving as quote characters, not ordinary parentheses. We could just as easily have picked angle brackets or braces or slashes.)

A list value may also be subscripted like a normal array. You must put the list in parentheses (real ones) to avoid ambiguity. Examples:

# Stat returns list value.
$modification_time = (stat($file))[9];
# SYNTAX ERROR HERE.
$modification_time = stat($file)[9];  # OOPS, FORGOT PARENS
# Find a hex digit.
$hexdigit = ('a','b','c','d','e','f')[$digit-10];
# A "reverse comma operator".
return (pop(@foo),pop(@foo))[0];

Lists may be assigned to if and only if each element of the list is legal to assign to:

($a, $b, $c) = (1, 2, 3);
($map{red}, $map{green}, $map{blue}) = (0x00f, 0x0f0, 0xf00);

List assignment in a scalar context returns the number of elements produced by the expression on the right side of the assignment:

$x = ( ($foo,$bar) = (7,7,7) );       # set $x to 3, not 2
$x = ( ($foo,$bar) = f() );           # set $x to f()'s return count

This is handy when you want to do a list assignment in a Boolean context, since most list functions return a null list when finished, which when assigned produces a 0, which is interpreted as false. The final list element may be an array or a hash:

($a, $b, @rest) = split;
my ($a, $b, %rest) = @arg_list;

You can actually put an array or hash anywhere in the list you assign to, but the first one in the list will soak up all the values, and anything after it will get an undefined value. This may be useful in a local or my, where you probably want the arrays initialized to be empty anyway.
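These rules for list assignment can be seen together in a short sketch. The subroutine three is hypothetical, and the final line uses the common empty-list idiom, which the text above does not mention but which follows from the same rule about counting the right side.

```perl
use strict;
use warnings;

sub three { return (7, 7, 7) }        # hypothetical subroutine returning a list

# The count comes from the right side, even though only two values are kept.
my $count = (my ($foo, $bar) = three());

# An array in the assignment list soaks up everything that remains.
my ($head, @rest, $never) = (1, 2, 3, 4);

# Assigning to an empty list still counts the values on the right.
my $n = () = three();

print "$count $head @rest $n\n";
```

Here $never is left undefined, since @rest has already taken everything after the first value.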

You may find the number of elements in the array @days by evaluating @days in a scalar context, such as:

@days + 0;      # implicitly force @days into a scalar context
scalar(@days)   # explicitly force @days into a scalar context

Note that this only works for arrays. It does not work for list values in general. A comma-separated list evaluated in a scalar context will return the last value, like the C comma operator.

Closely related to the scalar evaluation of @days is $#days. This will return the subscript of the last element of the array, or one less than the length, since there is (ordinarily) a 0th element.[18] Assigning to $#days changes the length of the array. Shortening an array by this method destroys intervening values. You can gain some measure of efficiency by pre-extending an array that is going to get big. (You can also extend an array by assigning to an element that is off the end of the array.) You can truncate an array down to nothing by assigning the null list () to it.[19] The following two statements are equivalent:

[18] For historical reasons, the special variable $[ can be used to change the array base. Its use is not recommended, however. In fact, this is the last we'll even mention it. Just don't use it.
[19] In the current version of Perl, re-extending a truncated array does not recover the values in the array. (It did in earlier versions.)

@whatever = ();
$#whatever = -1;

And the following is always true:[20]

[20] Unless you've diddled the deprecated $[ variable. Er, this is the last time we'll mention it . . .

scalar(@whatever) == $#whatever + 1;
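The relationship between an array's length and its last subscript can be traced step by step. This is a sketch with an invented array.

```perl
use strict;
use warnings;

my @days = qw(Mon Tue Wed Thu Fri);

my $length = @days;                   # 5
my $last   = $#days;                  # 4

$#days = 1;                           # truncate to ('Mon', 'Tue')
my $short = "@days";

$days[3] = 'Thu';                     # assigning past the end extends the array
my $gap_defined = defined($days[2]) ? 1 : 0;   # the skipped element is undefined

$#days = -1;                          # same as @days = ()
my $empty = @days;

print "$length $last [$short] $gap_defined $empty\n";
```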

Hashes (Associative Arrays)

As we indicated previously, a hash is just a funny kind of array in which you look values up using key strings instead of numbers. It defines associations between keys and values, so hashes are often called associative arrays.

There really isn't any such thing as a hash literal in Perl, but if you assign an ordinary list to a hash, each pair of values in the list will be taken to indicate one key/value association:

%map = ('red',0x00f,'green',0x0f0,'blue',0xf00);

This has the same effect as:

%map = ();            # clear the hash first
$map{red}   = 0x00f;
$map{green} = 0x0f0;
$map{blue}  = 0xf00;

It is often more readable to use the => operator between key/value pairs. The => operator is just a synonym for a comma, but it's more visually distinctive, and it also quotes any bare identifiers to the left of it (just like the identifiers in braces above), which makes it nice for initializing hash variables:

%map = (
    red   => 0x00f,
    green => 0x0f0,
    blue  => 0xf00,
);

or for initializing anonymous hash references to be used as records:

$rec = {
    witch => 'Mable the Merciless',
    cat   => 'Fluffy the Ferocious',
    date  => '10/31/1776',
};

or for using call-by-named-parameter to invoke complicated functions:

$field = $query->radio_group( 
                    NAME      => 'group_name',
                    VALUES    => ['eenie','meenie','minie'],
                    DEFAULT   => 'meenie',
                    LINEBREAK => 'true',
                    LABELS    => \%labels,
                );

But we're getting ahead of ourselves. Back to hashes.

You can use a hash variable (%hash) in a list context, in which case it interpolates all the key/value pairs into the list. But just because the hash was initialized in a particular order doesn't mean that the values come back in that order. Hashes are implemented internally using hash tables for speedy lookup, which means that the order in which entries are stored is dependent on the nature of the hash function used to calculate positions in the hash table, and not on anything interesting. So the entries come back in a seemingly random order. (The two elements of each key/value pair come out in the right order, of course.) For examples of how to arrange for an output ordering, see the keys entry in Chapter 3, Functions, or the DB_BTREE description in the DB_File documentation in Chapter 7, The Standard Perl Library.
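As a small sketch of one way to impose an ordering, you can sort the keys yourself before walking the hash. The variable names @by_name and @by_value are ours:

```perl
use strict;
use warnings;

my %map = (
    red   => 0x00f,
    green => 0x0f0,
    blue  => 0xf00,
);

# Order the keys alphabetically, or by their associated values.
my @by_name  = sort keys %map;
my @by_value = sort { $map{$a} <=> $map{$b} } keys %map;

# Print the pairs in ascending order of value.
printf "%-5s 0x%03x\n", $_, $map{$_} for @by_value;
```

Either list gives you a repeatable traversal order regardless of how the hash happens to store its entries.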

If you evaluate a hash variable in a scalar context, it returns a value that is true if and only if the hash contains any key/value pairs. (If there are any key/value pairs, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash. This is pretty much only useful to find out whether Perl's (compiled in) hashing algorithm is performing poorly on your data set. For example, you stick 10,000 things in a hash, but evaluating %HASH in scalar context reveals "1/8", which means only one out of eight buckets has been touched, and presumably that one bucket contains all 10,000 of your items. This isn't supposed to happen.)
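In practice the scalar value of a hash is mostly used as an emptiness test. The sketch below relies only on its truth, not on the exact string returned, since that string is version-dependent (to our knowledge, Perl 5.26 and later return the number of keys rather than a bucket ratio). To count the pairs portably, evaluate keys in a scalar context instead:

```perl
use strict;
use warnings;

my %map = (red => 0x00f, green => 0x0f0);
my %empty;

# Truth test: true if and only if the hash holds at least one pair.
my $has_pairs = %map   ? 'yes' : 'no';
my $is_empty  = %empty ? 'no'  : 'yes';

# keys in a scalar context returns the number of key/value pairs.
my $pair_count = keys %map;

print "has_pairs=$has_pairs is_empty=$is_empty pairs=$pair_count\n";
```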

Typeglobs and Filehandles

Perl uses an internal type called a typeglob to hold an entire symbol table entry. The type prefix of a typeglob is a *, because it represents all types. This used to be the preferred way to pass arrays and hashes by reference into a function, but now that we have real references, this mechanism is seldom needed.

Typeglobs (or references thereto) are still used for passing or storing filehandles. If you want to save away a filehandle, do it this way:

$fh = *STDOUT;

or perhaps as a real reference, like this:

$fh = \*STDOUT;

This is also the way to create a local filehandle. For example:

sub newopen {
    my $path = shift;
    local *FH;  # not my!
    open (FH, $path) || return undef;
    return *FH;
}
$fh = newopen('/etc/passwd');

See the open entry in Chapter 3, Functions and the FileHandle module in Chapter 7, The Standard Perl Library, for how to generate new filehandles.
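For comparison, later Perl releases (5.6 onward, as far as we know) let open create a filehandle in a lexical scalar, which avoids the typeglob entirely. The following is a sketch under that assumption; newopen_lexical is a hypothetical name, and the temporary file merely stands in for a real path such as /etc/passwd:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical counterpart to newopen() that returns a lexical handle.
sub newopen_lexical {
    my $path = shift;
    open(my $fh, '<', $path) or return;   # undef in scalar context on failure
    return $fh;
}

# Create a small file to read back (illustrative data only).
my ($tmp, $name) = tempfile(UNLINK => 1);
print {$tmp} "root:x:0:0\n";
close($tmp) or die "Can't close $name: $!";

my $fh = newopen_lexical($name) or die "Can't open $name: $!";
my $first = <$fh>;
close($fh);

print $first;
```

The handle is closed automatically when the last reference to it goes away, so there is no package-level name to localize.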

But the main use of typeglobs nowadays is to alias one symbol table entry to another symbol table entry. If you say:

*foo = *bar;

it makes everything named "foo" a synonym for every corresponding thing named "bar". You can alias just one of the variables in a typeglob by assigning a reference instead:

*foo = \$bar;

makes $foo an alias for $bar, but doesn't make @foo an alias for @bar, or %foo an alias for %bar. Aliasing variables like this may seem like a silly thing to want to do, but it turns out that the entire module export/import mechanism is built around this feature, since there's nothing that says the symbol you're aliasing has to be in your namespace. See Chapter 4, References and Nested Data Structures and Chapter 5, Packages, Modules, and Object Classes for more discussion on typeglobs.
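Here is a minimal sketch of the single-slot aliasing just described. It declares the package variables with our so that it runs under use strict; the values are illustrative:

```perl
use strict;
use warnings;

our ($foo, $bar, @foo, @bar);

$bar = 'original';
@bar = (1, 2, 3);

*foo = \$bar;          # alias only the scalar slot of the typeglob

$foo = 'changed';      # writes through the alias to $bar

my $bar_now   = $bar;            # 'changed'
my $foo_elems = scalar(@foo);    # @foo is untouched and still empty

print "bar=$bar_now foo_elems=$foo_elems\n";
```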

Input Operators

There are several input operators we'll discuss here because they parse as terms. In fact, sometimes we call them pseudo-literals because they act like quoted strings in many ways. (Output operators like print parse as list operators and are discussed in Chapter 3, Functions.)

Command input (backtick) operator

First of all, we have the command input operator, also known as the backticks operator, because it looks like this:

$info = `finger $user`;

A string enclosed by backticks (grave accents) first undergoes variable interpolation just like a double-quoted string. The result of that is then interpreted as a command by the shell, and the output of that command becomes the value of the pseudo-literal. (This is modeled after a similar operator in some of the UNIX shells.) In scalar context, a single string consisting of all the output is returned. In list context, a list of values is returned, one for each line of output. (You can set $/ to use a different line terminator.)

The command is executed each time the pseudo-literal is evaluated. The numeric status value of the command is saved in $? (see the section "Special Variables" later in this chapter for the interpretation of $?). Unlike the csh version of this command, no translation is done on the return data--newlines remain newlines. Unlike any of the shells, single quotes do not hide variable names in the command from interpretation. To pass a $ through to the shell you need to hide it with a backslash. The $user in our example above is interpolated by Perl, not by the shell. (Because the command undergoes shell processing, see Chapter 6, Social Engineering, for security concerns.)

The generalized form of backticks is qx// (for "quoted execution"), but the operator works exactly the same way as ordinary backticks. You just get to pick your quote characters.
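The following sketch shows qx// in scalar and list context and how to recover the exit status from $?. To stay reasonably portable it runs the current Perl interpreter (found in $^X) as the command, which assumes that path contains no spaces or shell metacharacters; the last line additionally assumes a Unix-like shell:

```perl
use strict;
use warnings;

my $perl = $^X;   # path of the running perl, interpolated by Perl below

# Scalar context: all of the output as one string.
my $text = qx{$perl -le "print for 1..3"};

# List context: one element per line of output.
my @lines = qx{$perl -le "print for 1..3"};

# The high byte of $? holds the command's exit status.
my $exit = $? >> 8;

# A backslash hides the $ from Perl so that the shell sees $HOME.
my $home = qx{echo \$HOME};

print "lines=", scalar(@lines), " exit=$exit\n";
```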

Line input (angle) operator

The most heavily used input operator is the line input operator, also known as the angle operator. Evaluating a filehandle in angle brackets (<STDIN>, for example) yields the next line from the associated file. (The newline is included, so according to Perl's criteria for truth, a freshly input line is always true, up until end of file, at which point an undefined value is returned, which is false.) Ordinarily you would assign the input value to a variable, but there is one situation where an automatic assignment happens. If and only if the line input operator is the only thing inside the conditional of a while loop, the value is automatically assigned to the special variable $_. The assigned value is then tested to see whether it is defined. (This may seem like an odd thing to you, but you'll use the construct in almost every Perl script you write.) Anyway, the following lines are equivalent to each other:

while (defined($_ = <STDIN>)) { print $_; }   # the long way
while (<STDIN>) { print; }                    # the short way
for (;<STDIN>;) { print; }                    # while loop in disguise
print $_ while defined($_ = <STDIN>);         # long statement modifier
print while <STDIN>;                          # short statement modifier

Remember that this special magic requires a while loop. If you use the input operator anywhere else, you must assign the result explicitly if you want to keep the value:

if (<STDIN>)      { print; }   # WRONG, prints old value of $_
if ($_ = <STDIN>) { print; }   # okay
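One caveat about the explicit form: outside a while loop the assigned value is tested for truth, not for definedness, so a final line consisting of just 0 with no trailing newline counts as false. The sketch below compares the two tests. It reads from an in-memory filehandle (which assumes Perl 5.8 or later), and count_lines is a hypothetical helper:

```perl
use strict;
use warnings;

# Count the lines read before the chosen test fails.
sub count_lines {
    my ($data, $use_defined) = @_;
    open(my $in, '<', \$data) or die "Can't open in-memory file: $!";
    my $n = 0;
    while (1) {
        my $line = <$in>;
        last unless $use_defined ? defined($line) : $line;
        $n++;
    }
    close($in);
    return $n;
}

my $input = "first\n0";   # the last line is "0" with no newline

my $with_truth   = count_lines($input, 0);   # stops early at "0"
my $with_defined = count_lines($input, 1);   # sees both lines

print "truth=$with_truth defined=$with_defined\n";
```

Wrapping the assignment in defined() is therefore the safer habit when you are not relying on the while magic.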

The filehandles STDIN, STDOUT, and STDERR are predefined and pre-opened.[21] Additional filehandles may be created with the open function. See the open entry in Chapter 3, Functions for details on this. Some object modules also create object references that can be used as filehandles. See the FileHandle module in Chapter 7, The Standard Perl Library.

[21] The filehandles stdin, stdout, and stderr will also work except in packages, where they would be interpreted as local identifiers rather than global. They're only there for compatibility with very old scripts, so use the uppercase versions.

In the while loops above, we were evaluating the line input operator in a scalar context, so it returned each line separately. However, if you use it in a list context, a list consisting of all the remaining input lines is returned, one line per list element. It's easy to make a large data space this way, so use this feature with care:

$one_line = <MYFILE>;   # Get first line.
@all_lines = <MYFILE>;  # Get the rest of the lines.

There is no while magic associated with the list form of the input operator, because the condition of a while loop is always a scalar context (as is any conditional).

Using the null filehandle within the angle operator is special and can be used to emulate the command-line behavior of typical UNIX filter programs such as sed and awk. When you read lines from <>, it magically gives you all the lines from all the files mentioned on the command line. If no files were mentioned, it gives you standard input instead, so your program is easy to insert into the middle of a pipeline of processes.

Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is null, $ARGV[0] is set to "-", which when opened gives you standard input. The @ARGV array is then processed as a list of filenames. The loop:

while (<>) {
    ...                     # code for each line
}

is equivalent to the following Perl-like pseudocode:

@ARGV = ('-') unless @ARGV;
while ($ARGV = shift) {
    open(ARGV, $ARGV) or warn "Can't open $ARGV: $!\n";
    while (<ARGV>) {
        ...         # code for each line
    }
}

except that it isn't so cumbersome to say, and will actually work. It really does shift array @ARGV and put the current filename into variable $ARGV. It also uses filehandle ARGV internally--<> is just a synonym for <ARGV>, which is magical. (The pseudocode above doesn't work because it treats <ARGV> as non-magical.)

You can modify @ARGV before the first <> as long as the array ends up containing the list of filenames you really want. Line numbers ($.) continue as if the input were one big happy file. (But see the example under eof for how to reset line numbers on each file.)
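As a sketch of this behavior, the following program sets @ARGV itself and lets <> walk through the files. The two temporary files and their contents are illustrative stand-ins for real command-line arguments:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small input files.
my @names;
for my $text ("a\nb\n", "c\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close($fh) or die "Can't close $name: $!";
    push @names, $name;
}

@ARGV = @names;            # must be set before the first <>

my @report;
while (<>) {
    chomp;
    push @report, "$.:$_"; # $. keeps counting across the file boundary
}

print "$_\n" for @report;
```

By the time the loop finishes, @ARGV has been shifted empty, which is worth remembering if you need the filenames afterward.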

If you want to set @ARGV to your own list of files, go right ahead. If you want to pass switches into your script, you can use one of the Getopts modules or put a loop on the front like this:

while ($_ = $ARGV[0], /^-/) {
    shift;
    last if /^--$/;
    if (/^-D(.*)/) { $debug = $1 }
    if (/^-v/)     { $verbose++  }
    ...             # other switches
}
while (<>) {
    ...             # code for each line
}
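For comparison, here is a sketch of roughly the same switch handling done with the standard Getopt::Std module. The simulated command line is an assumption for illustration; note also that getopts sets a flag such as -v to 1 rather than counting repetitions as the hand-written loop does:

```perl
use strict;
use warnings;
use Getopt::Std;

# Simulated command line (illustrative only).
@ARGV = ('-Dtrace', '-v', '--', 'input.txt');

my %opt;
# 'D:' means -D takes an argument; 'v' is a plain flag.
getopts('D:v', \%opt) or die "Usage: $0 [-Dlevel] [-v] [files]\n";

my $debug   = $opt{D};
my $verbose = $opt{v} ? 1 : 0;

# The switches and the -- marker are gone; only filenames remain for <>.
print "debug=$debug verbose=$verbose files=@ARGV\n";
```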

The <> symbol will return false only once. If you call it again after this it will assume you are processing another @ARGV list, and if you haven't set @ARGV, it will input from STDIN.

If the string inside the angle brackets is a scalar variable (for example, <$foo>), then that variable contains the name of the filehandle to input from, or a reference to the same. For example:

$fh = \*STDIN;
$line = <$fh>;

Filename globbing operator

You might wonder what happens to a line input operator if you put something fancier inside the angle brackets. What happens is that it mutates into a different operator. If the string inside the angle brackets is anything other than a filehandle name or a scalar variable (even if there are just extra spaces), it is interpreted as a filename pattern to be "globbed".[22] The pattern is matched against the files in the current directory (or the directory specified as part of the glob pattern), and the filenames so matched are returned by the operator. As with line input, the names are returned one at a time in scalar context, or all at once in list context. In fact, the latter usage is more prevalent. You generally see things like:

[22] This has nothing to do with the previously mentioned typeglobs, other than that they both use the * character in a wildcard fashion. The * character has the nickname "glob" when used like this. With typeglobs you're globbing symbols with the same name from the symbol table. With a filename glob, you're doing wildcard matching on the filenames in a directory, just as the various shells do.

my @files = <*.html>;

As with other kinds of pseudo-literals, one level of variable interpolation is done first, but you can't say <$foo> because that's an indirect filehandle as explained earlier. (In older versions of Perl, programmers would insert braces to force interpretation as a filename glob: <${foo}>. These days, it's considered cleaner to call the internal function directly as glob($foo), which is probably the right way to have invented it in the first place.)

Whether you use the glob function or the old angle-bracket form, the globbing operator also does while magic like the line input operator, and assigns the result to $_. For example:

while (<*.c>) {
    chmod 0644, $_;
}

is equivalent to:

open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
while (<FOO>) {
    chop;
    chmod 0644, $_;
}

In fact, it's currently implemented that way, more or less. (Which means it will not work on filenames with spaces in them unless you have csh(1) on your machine.) Of course, the shortest way to do the above is:

chmod 0644, <*.c>;

Because globbing invokes a subshell, it's often faster to call readdir yourself and just do your own grep on the filenames. Furthermore, due to its current implementation of using a shell, the glob routine may get "Arg list too long" errors (unless you've installed tcsh(1) as /bin/csh).
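A sketch of the readdir-and-grep alternative follows. The temporary directory and the filenames in it are made up for the demonstration. (As far as we know, Perl 5.6 and later implement glob internally through the File::Glob module rather than a subshell, so the speed argument applies mainly to older releases.)

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a scratch directory with a few empty files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(main.c util.c notes.txt)) {
    open(my $out, '>', "$dir/$name") or die "Can't create $dir/$name: $!";
    close($out) or die "Can't close $dir/$name: $!";
}

# Read the directory ourselves and keep only the names ending in .c.
opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
my @c_files = sort grep { /\.c\z/ } readdir($dh);
closedir($dh);

# readdir returns bare names, so prepend the directory before using them.
my $changed = chmod 0644, map { "$dir/$_" } @c_files;

print "changed $changed files: @c_files\n";
```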

A glob evaluates its (embedded) argument only when it is starting a new list. All values must be read before it will start over. In a list context this isn't important, because you automatically get them all anyway. In a scalar context, however, the operator returns the next value each time it is called, or a false value if you've just run out. Again, false is returned only once. So if you're expecting a single value from a glob, it is much better to say:

($file) = <blurch*>;  # list context

than to say:

$file = <blurch*>;    # scalar context

because the former slurps all the matched filenames and resets the operator, while the latter will alternate between returning a filename and returning false.
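The alternation is easy to see in a small sketch. The scratch directory holds exactly one matching file, and the scalar-context glob is called repeatedly from the same place in the code. We assume the temporary directory path contains no spaces or wildcard characters:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
open(my $out, '>', "$dir/blurch.txt") or die "Can't create file: $!";
close($out) or die "Can't close file: $!";

# List context: all matches are read, so this always gets the first name.
my ($file) = glob("$dir/blurch*");

# Scalar context at one call site: a name, then undef, then it starts over.
my @results;
for my $try (1 .. 4) {
    my $name = glob("$dir/blurch*");
    push @results, defined $name ? 'name' : 'undef';
}

print "@results\n";
```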

If you're trying to do variable interpolation, it's definitely better to use the glob operator, because the older notation can cause people to become confused with the indirect filehandle notation. But with things like this, it begins to become apparent that the borderline between terms and operators is a bit mushy:

@files = glob("$dir/*.[ch]");   # call glob as function
@files = glob $some_pattern;    # call glob as operator

We left the parentheses off of the second example to illustrate that glob can be used as a unary operator; that is, a prefix operator that takes a single argument. The glob operator is an example of a named unary operator, which is just one of the kinds of operators we'll talk about in the section "Operators" later in this chapter. But first we're going to talk about pattern matching operations, which also parse like terms but operate like operators.

