
Sunday, June 6, 2010

PERL

Perl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier.[1][2] Since then, it has undergone many changes and revisions and become widely popular amongst programmers. Larry Wall continues to oversee development of the core language, and its upcoming version, Perl 6.

Perl borrows features from other programming languages including C, shell scripting (sh), AWK, and sed.[3] The language provides powerful text processing facilities without the arbitrary data length limits of many contemporary Unix tools,[4] facilitating easy manipulation of text files. It is also used for graphics programming, system administration, network programming, applications that require database access and CGI programming on the Web. Perl is nicknamed "the Swiss Army chainsaw of programming languages" due to its flexibility and adaptability.[5]




Design

The design of Perl can be understood as a response to three broad trends in the computer industry: falling hardware costs, rising labor costs, and improvements in compiler technology. Many earlier computer languages, such as Fortran and C, were designed to make efficient use of expensive computer hardware. In contrast, Perl is designed to make efficient use of expensive computer programmers.

Perl has many features that ease the programmer's task at the expense of greater CPU and memory requirements. These include automatic memory management; dynamic typing; strings, lists, and hashes; regular expressions; introspection; and an eval() function.

Wall was trained as a linguist, and the design of Perl is very much informed by linguistic principles. Examples include Huffman coding (common constructions should be short), good end-weighting (the important information should come first), and a large collection of language primitives. Perl favors language constructs that are concise and natural for humans to read and write, even where they complicate the Perl interpreter.

Perl syntax reflects the idea that "things that are different should look different." For example, scalars, arrays, and hashes have different leading sigils. Array indices and hash keys use different kinds of braces. Strings and regular expressions have different standard delimiters. This approach can be contrasted with languages such as Lisp, where the same S-expression construct and basic syntax are used for many different purposes.

Perl does not enforce any particular programming paradigm (procedural, object-oriented, functional, and others) or even require the programmer to choose among them.

There is a broad practical bent to both the Perl language and the community and culture that surround it. The preface to Programming Perl begins, "Perl is a language for getting your job done." One consequence of this is that Perl is not a tidy language. It includes many features, tolerates exceptions to its rules, and employs heuristics to resolve syntactical ambiguities. Because of the forgiving nature of the compiler, bugs can sometimes be hard to find. Discussing the variant behaviour of built-in functions in list and scalar contexts, the perlfunc(1) manual page says, "In general, they do what you want, unless you want consistency."

In addition to Larry Wall's two well-known slogans, "There's more than one way to do it" and "Easy things should be easy and hard things should be possible", Perl has several mottos that convey aspects of its design and use, including "Perl: the Swiss Army Chainsaw of Programming Languages" and "No unnecessary limits". Perl has also been called "The Duct Tape of the Internet".[34]

No written specification or standard for the Perl language exists for Perl versions through Perl 5, and there are no plans to create one for the current version of Perl. There has been only one implementation of the interpreter, and the language has evolved along with it. That interpreter, together with its functional tests, stands as a de facto specification of the language. Perl 6, however, started with a specification,[35] and several projects[36] aim to implement some or all of the specification.

Applications

Perl has many and varied applications, compounded by the availability of many standard and third-party modules.

Perl has been used since the early days of the Web to write CGI scripts. It is known as one of "the three Ps" (along with Python and PHP), the most popular dynamic languages for writing Web applications. It is also an integral component of the popular LAMP solution stack for web development. Large projects written in Perl include cPanel, Slash, Bugzilla, RT, TWiki, and Movable Type. Many high-traffic websites use Perl extensively. Examples include Amazon.com, bbc.co.uk, Priceline.com, Craigslist, IMDb,[37] LiveJournal, Slashdot and Ticketmaster.

Perl is often used as a glue language, tying together systems and interfaces that were not specifically designed to interoperate, and for "data munging,"[38] that is, converting or processing large amounts of data for tasks such as creating reports. In fact, these strengths are intimately linked. The combination makes Perl a popular all-purpose language for system administrators, particularly because short programs can be entered and run on a single command line.

With a degree of care, Perl code can be made portable across Windows and Unix. Portable Perl code is often used by suppliers of software (both COTS and bespoke) to simplify packaging and maintenance of software build and deployment scripts.

Graphical user interfaces (GUIs) may be developed using Perl. For example, Perl/Tk is commonly used to enable user interaction with Perl scripts. Such interaction may be synchronous or asynchronous using callbacks to update the GUI. For more information about the technologies involved, see Tk, Tcl, WxPerl and Prima Perl.

Perl is also widely used in finance and bioinformatics, where it is valued for rapid application development and deployment and for its capability to handle large data sets.

Implementation

Perl is implemented as a core interpreter, written in C, together with a large collection of modules, written in Perl and C. The source distribution is, as of 2009, 13.5 MB when packaged in a tar file and compressed.[39] The interpreter is 150,000 lines of C code and compiles to a 1 MB executable on typical machine architectures. Alternatively, the interpreter can be compiled to a link library and embedded in other programs. There are nearly 500 modules in the distribution, comprising 200,000 lines of Perl and an additional 350,000 lines of C code. (Much of the C code in the modules consists of character-encoding tables.)

The interpreter has an object-oriented architecture. All of the elements of the Perl language—scalars, arrays, hashes, coderefs, file handles—are represented in the interpreter by C structs. Operations on these structs are defined by a large collection of macros, typedefs, and functions; these constitute the Perl C API. The Perl API can be bewildering to the uninitiated, but its entry points follow a consistent naming scheme, which provides guidance to those who use it.

The life of a Perl interpreter divides broadly into a compile phase and a run phase.[40] In Perl, the phases are the major stages in the interpreter's life cycle. Each interpreter goes through each phase only once, and the phases follow in a fixed sequence.

Most of what happens in Perl's compile phase is compilation, and most of what happens in Perl's run phase is execution, but there are significant exceptions. Perl makes important use of its capability to execute Perl code during the compile phase. Perl will also delay compilation into the run phase. The terms that indicate the kind of processing that is actually occurring at any moment are compile time and run time. Perl is in compile time at most points during the compile phase, but compile time may also be entered during the run phase. The compile time for code in a string argument passed to the eval built-in occurs during the run phase. Perl is often in run time during the compile phase and spends most of the run phase in run time. Code in BEGIN blocks executes at run time but in the compile phase.
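The distinction between phases and times can be made visible with a small script. The following is a minimal sketch; the array name @order and the messages are illustrative and do not come from the Perl documentation. The BEGIN block runs during the compile phase, before the first ordinary statement, while the code passed to eval as a string is compiled only when the run phase reaches it.

```perl
use strict;
use warnings;

# @order is a hypothetical package variable used to record execution order.
our @order;

# Ordinary statement: executed in the run phase.
push @order, 'main code (run phase)';

# BEGIN block: executed as soon as it has been compiled, that is, in the
# compile phase, even though it appears after the statement above.
BEGIN { push @order, 'BEGIN block (compile phase)' }

# String eval: this code is compiled and then executed during the run phase.
eval q{ push @order, 'string eval (compiled in run phase)'; 1 }
    or die $@;

print "$_\n" for @order;
```

The BEGIN entry is printed first, although it is the second statement in the file.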

At compile time, the interpreter parses Perl code into a syntax tree. At run time, it executes the program by walking the tree. Text is parsed only once, and the syntax tree is subject to optimization before it is executed, so that execution is relatively efficient. Compile-time optimizations on the syntax tree include constant folding and context propagation, but peephole optimization is also performed.

Perl has a Turing-complete grammar because parsing can be affected by run-time code executed during the compile phase.[41] Therefore, Perl cannot be parsed by a straight Lex/Yacc lexer/parser combination. Instead, the interpreter implements its own lexer, which coordinates with a modified GNU bison parser to resolve ambiguities in the language.

It is often said that "Only perl can parse Perl," meaning that only the Perl interpreter (perl) can parse the Perl language (Perl), but even this is not, in general, true. Because the Perl interpreter can simulate a Turing machine during its compile phase, it would need to decide the Halting Problem in order to complete parsing in every case. It's a long-standing result that the Halting Problem is undecidable, and therefore not even perl can always parse Perl. Perl makes the unusual choice of giving the user access to its full programming power in its own compile phase. The cost in terms of theoretical purity is high, but practical inconvenience seems to be rare.

Other programs that undertake to parse Perl, such as source-code analyzers and auto-indenters, have to contend not only with ambiguous syntactic constructs but also with the undecidability of Perl parsing in the general case. Adam Kennedy's PPI project focused on parsing Perl code as a document (retaining its integrity as a document), instead of parsing Perl as executable code (which not even Perl itself can always do). It was Kennedy who first conjectured that "parsing Perl suffers from the 'Halting Problem'",[42] and this was later proved.[43]

Perl is distributed with some 120,000 functional tests. These run as part of the normal build process and extensively exercise the interpreter and its core modules. Perl developers rely on the functional tests to ensure that changes to the interpreter do not introduce bugs; conversely, Perl users who see that the interpreter passes its functional tests on their system can have a high degree of confidence that it is working properly.

Maintenance of the Perl interpreter has become increasingly difficult over the years. The code base has been in continuous development since 1994. The code has been optimized for performance at the expense of simplicity, clarity, and strong internal interfaces. New features have been added, yet virtually complete backward compatibility with earlier versions is maintained. Major releases of Perl were formerly coordinated by Perl pumpkings,[44] who handled integrating patch submissions and bug fixes; the project has since changed to a rotating, monthly release cycle. Development discussion takes place via the perl5-porters mailing list. As of Perl 5.11, development efforts have included refactoring certain core modules, known as 'dual-lifed' modules, out of the Perl core[45] to help alleviate some of these problems.

Availability

Perl is free software and is licensed under both the Artistic License and the GNU General Public License. Distributions are available for most operating systems. It is particularly prevalent on Unix and Unix-like systems, but it has been ported to most modern (and many obsolete) platforms. With only six reported exceptions, Perl can be compiled from source code on all Unix-like, POSIX-compliant, or otherwise-Unix-compatible platforms.[46] However, this is rarely necessary, because Perl is included in the default installation of many popular operating systems.

Because of unusual changes required for the Mac OS Classic environment, a special port called MacPerl was shipped independently.[47]

The Comprehensive Perl Archive Network (CPAN) carries a complete list of supported platforms with links to the distributions available on each.[48] CPAN is also the source for publicly available Perl modules that are not part of the core Perl distribution.

Windows

Users of Microsoft Windows typically install one of the native binary distributions of Perl for Win32,[49] most commonly Strawberry Perl or ActivePerl. Compiling Perl from source code under Windows is possible, but most installations lack the requisite C compiler and build tools. This also makes it difficult to install modules from the CPAN, particularly those that are partially written in C. Users of the ActivePerl binary distribution are, therefore, dependent on the repackaged modules provided in ActiveState’s module repository, which are precompiled and can be installed with PPM. Limited resources to maintain this repository have been cause for various long-standing problems.[50][51]

Strawberry Perl[52] is an open source distribution for Windows. It has had regular, quarterly releases since January 2008, including new modules as feedback and requests come in. Strawberry Perl aims to be able to install modules like standard Perl distributions on other platforms, including compiling XS modules. It was started partly to address the flaws in ActiveState's distribution and to resolve other problems of Perl on the Windows platform.

A community project,[53] a website for "all things Windows and Perl," was launched by Adam Kennedy on behalf of The Perl Foundation in June 2006. A major aim of this project is to provide production-quality alternative Perl distributions that include an embedded C compiler and build tools, so as to enable Windows users to install modules directly from the CPAN. Related research and experimental work was done in the Vanilla Perl distribution.[54]

The Cygwin emulation layer is another popular way of running Perl under Windows. Cygwin provides a Unix-like environment on Windows, and both perl and cpan are conveniently available as standard pre-compiled packages in the Cygwin setup program. Because Cygwin also includes gcc, compiling Perl from source is also possible.

Language structure

In Perl, the minimal Hello world program may be written as follows:

print "Hello, world!\n";

This prints the string Hello, world! and a newline, symbolically expressed by an n character whose interpretation is altered by the preceding escape character (a backslash).

The canonical form of the program is slightly more verbose:

#!/usr/bin/perl

print "Hello, world!\n";

The hash mark character introduces a comment in Perl, which runs up to the end of the line of code and is ignored by the compiler. The comment used here is of a special kind: it’s called the shebang line. This tells Unix-like operating systems where to find the Perl interpreter, making it possible to invoke the program without explicitly mentioning perl. (Note that, on Microsoft Windows systems, Perl programs are typically invoked by associating the .pl extension with the Perl interpreter. In order to deal with such circumstances, perl detects the shebang line and parses it for switches;[55] therefore, it is not strictly true that the shebang line is ignored by the compiler.)

The second line in the canonical form includes a semicolon, which is used to separate statements in Perl. With only a single statement in a block or file, a separator is unnecessary, so it can be omitted from the minimal form of the program—or more generally from the final statement in any block or file. The canonical form includes it because it is common to terminate every statement even when it is unnecessary to do so, as this makes editing easier: code can be added to, or moved away from, the end of a block or file without having to adjust semicolons.

Version 5.10 of Perl introduces a say function that implicitly appends a newline character to its output, making the minimal "Hello world" program even shorter:

use 5.010; # must be present to import the new 5.10 functions, notice that it is 5.010 not 5.10

say 'Hello, world!'

Data types

Perl has a number of fundamental data types. The most commonly used and discussed are scalars, arrays, hashes, filehandles, and subroutines:

Type               Sigil  Example      Description
Scalar             $      $foo         A single value; it may be a number, a string, a file handle, or a reference.
Array              @      @foo         An ordered collection of scalars.
Associative array  %      %foo         A map from strings to scalars; the strings are called keys, and the scalars are called values. Also known as a hash.
File handle        none   $foo or FOO  A map to a file, device, pipe, or scalar that is open for reading, writing, or both.
Subroutine         &      &foo         A piece of code that may be passed arguments, be executed, and return data.
Typeglob           *      *foo         The symbol table entry for all types with the name 'foo'.

Scalar values

String values (literals) must be enclosed by quotes. Enclosing a string in double quotes allows the values of variables whose names appear in the string to automatically replace the variable name (or be interpolated) in the string. Enclosing a string in single quotes prevents variable interpolation. If $name is "Jim", print("My name is $name") will print "My name is Jim", but print('My name is $name') will print "My name is $name".

To include a double quotation mark in a string, precede it with a backslash or enclose the string in single quotes. To include a single quotation mark, precede it with a backslash or enclose the string in double quotes. Strings can also be quoted with the q and qq quote-like operators. 'this' is identical to q(this) and "$this" is identical to qq($this).
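The quoting rules above can be sketched as follows. The variable names are illustrative only; the final line prints "same" because q() behaves like single quotes and qq() like double quotes.

```perl
use strict;
use warnings;

my $name = 'Jim';

my $interpolated = "My name is $name";     # double quotes interpolate $name
my $literal      = 'My name is $name';     # single quotes do not
my $escaped      = "She said \"hi\"";      # backslash-escaped double quotes
my $q_form       = q(My name is $name);    # q() behaves like single quotes
my $qq_form      = qq(My name is $name);   # qq() behaves like double quotes

print "$interpolated\n";
print "$literal\n";
print "$escaped\n";
print "same\n" if $q_form eq $literal and $qq_form eq $interpolated;
```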

Finally, multiline strings can be defined using here documents:

$multilined_string = <<EOF;
This is my multilined string
note that I am terminating it with the word "EOF".
EOF

Numbers (numeric constants) do not require quotation. Perl will convert numbers into strings and vice versa depending on the context in which they are used. When strings are converted into numbers, trailing non-numeric parts of the strings are discarded. If no leading part of a string is numeric, the string will be converted to the number 0. In the following example, the strings $n and $m are treated as numbers. This code prints the number '5'. The values of the variables remain the same. Note that in Perl, + is always the numeric addition operator. The string concatenation operator is the period.

$n = '3 apples';

$m = '2 oranges';
print $n + $m;
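A slightly extended sketch of the same conversion rules, contrasting numeric addition with string concatenation. The additional variable names are illustrative. Under "use warnings" such conversions normally emit an "isn't numeric" warning, which is silenced here on purpose.

```perl
use strict;
use warnings;
no warnings 'numeric';   # silence "isn't numeric" warnings for this demonstration

my $n = '3 apples';
my $m = '2 oranges';

my $sum    = $n + $m;        # numeric context: 3 + 2
my $joined = $n . $m;        # string context: concatenation with the period
my $zero   = 'apples' + 0;   # no leading numeric part, so the string counts as 0

print "$sum\n";      # 5
print "$joined\n";   # 3 apples2 oranges
print "$zero\n";     # 0
```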

Functions are provided for the rounding of fractional values to integer values: int chops off the fractional part, rounding towards zero; POSIX::ceil and POSIX::floor always round up and always round down, respectively. The number-to-string conversion of printf "%f" or sprintf "%f" rounds half to even (bankers' rounding).

Perl also has a boolean context that it uses in evaluating conditional statements. The following values all evaluate as false in Perl:

$false = 0; # the number zero

$false = 0.0; # the number zero as a float
$false = 0b0; # the number zero in binary
$false = 0x0; # the number zero in hexadecimal
$false = '0'; # the string zero
$false = ""; # the empty string
$false = undef; # the return value from undef
$false = 2-3+1; # computes to 0 which is converted to "0" so it is false

All other (non-zero evaluating) values evaluate to true. This includes the odd self-describing literal string of "0 but true", which in fact is 0 as a number, but true when used as a boolean. All non-numeric strings also have this property, but this particular string is truncated by Perl without a numeric warning. A less explicit but more conceptually portable version of this string is '0E0' or '0e0', which does not rely on characters being evaluated as 0, because '0E0' is literally zero times ten to the power zero.

Evaluated boolean expressions are also scalar values. The documentation does not promise which particular value of true or false is returned. Many boolean operators return 1 for true and the empty-string for false. The defined() function determines whether a variable has any value set. In the above examples, defined($false) is true for every value except undef.

If either 1 or 0 is specifically needed, an explicit conversion can be done using the conditional operator:

my $real_result = $boolean_result ? 1 : 0;
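The truth rules can be checked directly. The following sketch tests a handful of values and normalizes each result to 1 or 0 with the conditional operator; the labels and the hash %is_true are illustrative. It also includes the string '0.0', which is not in the list above: as a string it is neither empty nor exactly '0', so it evaluates as true even though the number 0.0 is false.

```perl
use strict;
use warnings;

# Pairs of [label, value]; the labels are only for display.
my @candidates = (
    [ 'number 0'     => 0 ],
    [ 'string "0"'   => '0' ],
    [ 'empty string' => '' ],
    [ 'undef'        => undef ],
    [ 'string "0.0"' => '0.0' ],
    [ 'string "0E0"' => '0E0' ],
    [ '"0 but true"' => '0 but true' ],
);

my %is_true;
for my $pair (@candidates) {
    my ($label, $value) = @$pair;
    $is_true{$label} = $value ? 1 : 0;   # normalize to 1 or 0
    printf "%-14s is %s\n", $label, $is_true{$label} ? 'true' : 'false';
}
```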

Array values

An array value (or list) is specified by listing its elements, separated by commas, enclosed by parentheses (at least where required by operator precedence).

@scores = (32, 45, 16, 5);

The qw() quote-like operator allows the definition of a list of strings without typing of quotes and commas. Almost any delimiter can be used instead of parentheses. The following lines are equivalent:

@names = ('Billy', 'Joe', 'Jim-Bob');

@names = qw(Billy Joe Jim-Bob);

The split function returns a list of strings, which are split from a string expression using a delimiter string or regular expression.

@scores = split(',', '32,45,16,5');

Individual elements of a list are accessed by providing a numerical index in square brackets. The scalar sigil must be used. Sublists (array slices) can also be specified, using a range or list of numeric indices in brackets. The array sigil is used in this case. For example, if @month holds the month names starting with "January", $month[3] is "April" (the first element of an array has index 0), and @month[4..6] is ("May", "June", "July").
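Element access and slicing can be sketched as follows, assuming an array @month that holds the twelve month names in calendar order (the contents are an assumption made for this example). Indices start at 0.

```perl
use strict;
use warnings;

my @month = qw(January February March April May June
               July August September October November December);

my $fourth = $month[3];        # scalar sigil for a single element
my @spring = @month[4..6];     # array sigil for a slice given by a range
my @ends   = @month[0, 11];    # a slice given by a list of indices

print "$fourth\n";     # April
print "@spring\n";     # May June July
print "@ends\n";       # January December
```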

Hash values

A hash (or associative array) may be initialized from a list of key/value pairs. If the keys are separated from the values with the => operator (sometimes called a fat comma), rather than a comma, they may be unquoted (barewords). The following lines are equivalent:

%favorite = ('joe', "red", 'sam', "blue");

%favorite = (joe => 'red', sam => 'blue');

Individual values in a hash are accessed by providing the corresponding key, in curly braces. The $ sigil identifies the accessed element as a scalar. For example, $favorite{joe} equals 'red'. A hash can also be initialized by setting its values individually:

$favorite{joe}   = 'red';

$favorite{sam} = 'blue';
$favorite{oscar} = 'green';

Multiple elements may be accessed using the @ sigil instead (identifying the result as a list). For example, @favorite{'joe', 'sam'} equals ('red', 'blue').
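Put together, the hash operations described in this section look roughly like this. The sketch reuses the %favorite example; the variable names $one, @two, and $known are illustrative.

```perl
use strict;
use warnings;

my %favorite = (joe => 'red', sam => 'blue');   # fat comma quotes the keys
$favorite{oscar} = 'green';                     # add one more pair

my $one   = $favorite{joe};             # $ sigil: a single value
my @two   = @favorite{'joe', 'sam'};    # @ sigil: a hash slice, returns a list
my $known = exists $favorite{oscar};    # true if the key is present

print "$one\n";    # red
print "@two\n";    # red blue
print "oscar is known\n" if $known;
```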

File Handles

File Handles provide read and write access to resources. These are most often files on disk, but can also be a device, a pipe, or even a scalar value.

Originally, File Handles could only be created with package variables, using the ALL_CAPS convention to distinguish them from other variables. Perl 5.6 and newer also accept a scalar variable, which will be set (autovivified) to a reference to an anonymous file handle, in place of a named file handle. Using the ALL_CAPS method for file handles is considered deprecated by the community.[56]
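A minimal sketch of the scalar-variable style follows. To stay self-contained, it opens the handles on in-memory scalars rather than files on disk; this form of open is available from Perl 5.8 onward. The variable names and the sample text are illustrative.

```perl
use strict;
use warnings;

# Reading: open a lexical handle on a scalar that holds the "file" contents.
my $text = "first line\nsecond line\n";
open(my $in, '<', \$text) or die "Cannot open in-memory file for reading: $!";
my @lines = <$in>;       # list context reads all remaining lines
close $in;
chomp @lines;            # strip the trailing newlines

# Writing: print through a lexical handle into another scalar.
my $buffer = '';
open(my $out, '>', \$buffer) or die "Cannot open in-memory file for writing: $!";
print {$out} "written via a lexical handle\n";
close $out;

print scalar(@lines), " lines read\n";
print $buffer;
```

Replacing \$text with a file name gives the usual on-disk form of the same idiom.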

Typeglob values

A typeglob value is a symbol table entry. The main use of typeglobs is creating symbol table aliases. For example:

*PI = \3.141592653; # creating constant scalar $PI

*this = *that; # creating aliases for all data types 'this' to all data types 'that'

Array functions

The number of elements in an array can be determined either by evaluating the array in scalar context or with the help of the $# sigil. The latter gives the index of the last element in the array, not the number of elements. The expressions scalar(@array) and ($#array + 1) are equivalent.
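As a short illustration, the two ways of measuring an array can be compared directly. The sketch reuses the @scores example; the other names are illustrative.

```perl
use strict;
use warnings;

my @scores = (32, 45, 16, 5);

my $count      = scalar(@scores);   # explicit scalar context: number of elements
my $implicit   = @scores;           # assigning to a scalar also gives the count
my $last_index = $#scores;          # index of the last element

print "$count elements, last index $last_index\n";   # 4 elements, last index 3
```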

Hash functions

There are a few functions that operate on entire hashes. The keys function takes a hash and returns the list of its keys. Similarly, the values function returns a hash's values. Note that the keys and values are returned in a consistent but arbitrary order. The each function iterates over a hash one key/value pair at a time:

# Every call to each returns the next key/value pair.
# All values will be eventually returned, but their order
# cannot be predicted.
while (($name, $address) = each %addressbook) {
    print "$name lives at $address\n";
}

# Similar to the above, but sorted alphabetically
foreach my $next_name (sort keys %addressbook) {
    print "$next_name lives at $addressbook{$next_name}\n";
}

Control structures

Perl has several kinds of control structures.

It has block-oriented control structures, similar to those in the C, JavaScript, and Java programming languages. Conditions are surrounded by parentheses, and controlled blocks are surrounded by braces:

label while ( cond ) { ... }

label while ( cond ) { ... } continue { ... }
label for ( init-expr ; cond-expr ; incr-expr ) { ... }
label foreach var ( list ) { ... }
label foreach var ( list ) { ... } continue { ... }
if ( cond ) { ... }
if ( cond ) { ... } else { ... }
if ( cond ) { ... } elsif ( cond ) { ... } else { ... }

Where only a single statement is being controlled, statement modifiers provide a more-concise syntax:

statement if cond ;

statement unless cond ;
statement while cond ;
statement until cond ;
statement foreach list ;

Short-circuit logical operators are commonly used to affect control flow at the expression level:

expr and expr

expr && expr
expr or expr
expr || expr

(The "and" and "or" operators are similar to && and || but have lower precedence, which makes it easier to use them to control entire statements.)

The flow control keywords next (corresponding to C's continue), last (corresponding to C's break), return, and redo are expressions, so they can be used with short-circuit operators.
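The following sketch combines statement modifiers, the low-precedence "and" operator, and the next and last keywords in a single loop. The input values and the array @kept are illustrative.

```perl
use strict;
use warnings;

my @input = (3, -1, 8, 0, 12, 5);
my @kept;

for my $n (@input) {
    next if $n < 0;         # statement modifier: skip negative values
    $n == 0 and next;       # short-circuit form: skip zero as well
    last unless $n < 10;    # stop at the first value of 10 or more
    push @kept, $n;
}

print "@kept\n";   # 3 8
```

The loop ends at 12, so the trailing 5 is never examined.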

Perl also has two implicit looping constructs, each of which has two forms:

results = grep { ... } list

results = grep expr, list
results = map { ... } list
results = map expr, list

grep returns all elements of list for which the controlled block or expression evaluates to true. map evaluates the controlled block or expression for each element of list and returns a list of the resulting values. These constructs enable a simple functional programming style.
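Both forms of grep and map can be sketched as follows. The data reuses the @scores example; the result names are illustrative.

```perl
use strict;
use warnings;

my @scores = (32, 45, 16, 5);

my @high    = grep { $_ > 20 } @scores;   # block form: keep elements above 20
my @odd     = grep $_ % 2, @scores;       # expression form: keep odd elements
my @doubled = map { $_ * 2 } @scores;     # block form: transform every element
my @labels  = map "score=$_", @high;      # expression form

print "@high\n";      # 32 45
print "@odd\n";       # 45 5
print "@doubled\n";   # 64 90 32 10
print "@labels\n";    # score=32 score=45
```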

Up until the 5.10.0 release, there was no switch statement in Perl 5. From 5.10.0 onward, a multi-way branch statement called given/when is available, which takes the following form:

use v5.10; # must be present to import the new 5.10 functions

given ( expr ) { when ( cond ) { ... } default { ... } }

Syntactically, this structure behaves similarly to switch statements found in other languages, but with a few important differences. The largest is that, unlike switch/case structures, given/when statements break execution after the first successful branch rather than waiting for explicitly defined break commands. Conversely, an explicit continue is necessary to emulate the fall-through behavior of switch.

For those not using Perl 5.10, the Perl documentation describes a half-dozen ways to achieve the same effect by using other control structures. There is also a Switch module, which provides functionality modeled on the forthcoming Perl 6 re-design. It is implemented using a source filter, so its use is unofficially discouraged.[57]

Perl includes a goto label statement, but it is rarely used. Situations where a goto is called for in other languages don't occur as often in Perl because of its breadth of flow control options.

There is also a goto &sub statement that performs a tail call. It terminates the current subroutine and immediately calls the specified sub. This is used in situations where a caller can perform more-efficient stack management than Perl itself (typically because no change to the current stack is required), and in deep recursion, tail calling can have substantial positive impact on performance because it avoids the overhead of scope/stack management on return.
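A minimal sketch of a tail call with goto &sub follows. The subroutine name sum_to and its accumulator argument are hypothetical, chosen only for this example. Each step replaces @_ with the arguments for the next call and then transfers control, so the call stack does not grow and no deep-recursion warning is triggered.

```perl
use strict;
use warnings;

# Sum the integers 1..$n using an accumulator and a tail call.
sub sum_to {
    my ($n, $acc) = @_;
    $acc = 0 unless defined $acc;
    return $acc if $n == 0;

    @_ = ($n - 1, $acc + $n);   # arguments for the next call
    goto &sum_to;               # replaces the current call instead of nesting
}

my $total = sum_to(10_000);
print "$total\n";   # 50005000
```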

Subroutines

Subroutines are defined with the sub keyword and are invoked simply by naming them. If the subroutine in question has not yet been declared, invocation requires either parentheses after the function name or an ampersand (&) before it. But using & without parentheses will also implicitly pass the arguments of the current subroutine to the one called, and using & with parentheses will bypass prototypes.

# Calling a subroutine


# Parentheses are required here if the subroutine is defined later in the code
foo();
&foo; # (this also works, but has other consequences regarding arguments passed to the subroutine)

# Defining a subroutine
sub foo { ... }

foo; # Here parentheses are not required

A list of arguments may be provided after the subroutine name. Arguments may be scalars, lists, or hashes.

foo $x, @y, %z;

The parameters to a subroutine do not need to be declared as to either number or type; in fact, they may vary from call to call. Any validation of parameters must be performed explicitly inside the subroutine.

Arrays are expanded to their elements; hashes are expanded to a list of key/value pairs; and the whole lot is passed into the subroutine as one flat list of scalars.

Whatever arguments are passed are available to the subroutine in the special array @_. The elements of @_ are aliases for the actual arguments; changing an element of @_ changes the corresponding argument.

Elements of @_ may be accessed by subscripting it in the usual way.

$_[0], $_[1]

However, the resulting code can be difficult to read, and the parameters have pass-by-reference semantics, which may be undesirable.

One common idiom is to assign @_ to a list of named variables.

 my ($x, $y, $z) = @_;

This provides mnemonic parameter names and implements pass-by-value semantics. The my keyword indicates that the following variables are lexically scoped to the containing block.
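The difference between working on @_ directly and copying it into lexical variables can be sketched as follows. The subroutine names double_in_place and double_copy are hypothetical.

```perl
use strict;
use warnings;

# Modifies the caller's variable through the element of @_.
sub double_in_place {
    $_[0] *= 2;
    return;
}

# Copies the argument first, so the caller's variable is left untouched.
sub double_copy {
    my ($x) = @_;
    $x *= 2;
    return $x;
}

my $changed = 21;
double_in_place($changed);            # $changed is now 42

my $original = 10;
my $result   = double_copy($original);   # $original is still 10

print "$changed $original $result\n";    # 42 10 20
```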

Another idiom is to shift parameters off of @_. This is especially common when the subroutine takes only one argument or for handling the $self argument in object-oriented modules.

my $x = shift;

Subroutines may assign @_ to a hash to simulate named arguments; this is recommended in Perl Best Practices for subroutines that are likely to ever have more than three parameters.[58]

sub function1 {
    my %args = @_;
    print "'x' argument was '$args{x}'\n";
}
function1( x => 23 );

Subroutines may return values.

return 42, $x, @y, %z;

If the subroutine does not exit via a return statement, then it returns the last expression evaluated within the subroutine body. Arrays and hashes in the return value are expanded to lists of scalars, just as they are for arguments.

The returned expression is evaluated in the calling context of the subroutine; this can surprise the unwary.

sub list { (4, 5, 6) }

sub array { @x = (4, 5, 6); @x }

$x = list; # returns 6 - last element of list
$x = array; # returns 3 - number of elements in list
@x = list; # returns (4, 5, 6)
@x = array; # returns (4, 5, 6)

A subroutine can discover its calling context with the wantarray function.

sub either {
    return wantarray ? (1, 2) : 'Oranges';
}

$x = either; # returns "Oranges"
@x = either; # returns (1, 2)

Regular expressions

The Perl language includes a specialized syntax for writing regular expressions (RE, or regexes), and the interpreter contains an engine for matching strings to regular expressions. The regular-expression engine uses a backtracking algorithm, extending its capabilities from simple pattern matching to string capture and substitution. The regular-expression engine is derived from regex written by Henry Spencer.

The Perl regular-expression syntax was originally taken from Unix Version 8 regular expressions. However, it diverged before the first release of Perl and has since grown to include far more features. Many other languages and applications, such as PHP, Ruby, Java, Microsoft's .NET Framework,[59] and the Apache HTTP server, are now adopting Perl-compatible regular expressions over POSIX regular expressions.

Regular-expression syntax is extremely compact, owing to history. The first regular-expression dialects were only slightly more expressive than globs, and the syntax was designed so that an expression would resemble the text that it matches.[citation needed] This meant using no more than a single punctuation character or a pair of delimiting characters to express the few supported assertions. Over time, the expressiveness of regular expressions grew tremendously, but the syntax design was never revised and continues to rely on punctuation. As a result, regular expressions can be cryptic and extremely dense.

Uses

The m// (match) operator introduces a regular-expression match. (If it is delimited by slashes, as in all of the examples here, then the leading m may be omitted for brevity. If the m is present, other delimiters can be used in place of slashes.) In the simplest case, an expression such as

$x =~ /abc/;

evaluates to true if and only if the string $x matches the regular expression abc.
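Alternative delimiters are mostly useful when the pattern itself contains slashes. The sketch below is illustrative only; the path and variable names are made up:

```perl
use strict;
use warnings;

my $path = '/usr/local/bin/perl';

# With slashes as delimiters, every slash in the pattern must be escaped.
my $with_slashes = $path =~ /\/usr\/local\// ? 1 : 0;

# With the leading m, other delimiters avoid the escaping.
my $with_braces = $path =~ m{/usr/local/} ? 1 : 0;
my $with_bang   = $path =~ m!/bin/perl!   ? 1 : 0;

print "$with_slashes $with_braces $with_bang\n";   # 1 1 1
```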

The s/// (substitute) operator, on the other hand, specifies a search-and-replace operation:

$x =~ s/abc/aBc/; # upcase the b

Another use of regular expressions is to specify delimiters for the split function:

@words = split /,/, $line;

The split function creates a list of the parts of the string that are separated by matches of the regular expression. In this example, a line is divided into a list of its comma-separated parts, and this list is then assigned to the @words array.
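A runnable sketch of the same idea, with sample input invented for illustration. The second call shows that the delimiter is a full regular expression, not just a literal character:

```perl
use strict;
use warnings;

my $line  = 'apple,banana,cherry';
my @words = split /,/, $line;          # ('apple', 'banana', 'cherry')

# The delimiter can be any pattern: here, a comma with optional
# whitespace on either side.
my @trimmed = split /\s*,\s*/, 'red , green,blue';

print scalar(@words), ": @words\n";    # 3: apple banana cherry
print "@trimmed\n";                    # red green blue
```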

Syntax

Modifiers

Perl regular expressions can take modifiers. These are single-letter suffixes that modify the meaning of the expression:

$x =~ /abc/i; # case-insensitive pattern match

$x =~ s/abc/aBc/g; # global search and replace
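The effect of these two modifiers can be seen on a sample string (chosen here purely for illustration) by comparing a substitution with and without /g:

```perl
use strict;
use warnings;

my $x = 'ABC abc abc';

# /i ignores case, so the leading 'ABC' already matches.
my $matched = $x =~ /abc/i ? 1 : 0;

# Copy $x, then substitute in the copy so the original stays unchanged.
(my $once = $x) =~ s/abc/aBc/;    # only the first lowercase 'abc' changes
(my $all  = $x) =~ s/abc/aBc/g;   # every lowercase 'abc' changes

print "$matched\n";   # 1
print "$once\n";      # ABC aBc abc
print "$all\n";       # ABC aBc aBc
```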

Because the compact syntax of regular expressions can make them dense and cryptic, the /x modifier was added in Perl to help programmers write more legible regular expressions. It allows programmers to place whitespace and comments inside regular expressions:

$x =~ /
    a # match 'a'
    . # followed by any character
    c # then followed by the 'c' character
/x;

Capturing

Portions of a regular expression may be enclosed in parentheses; the corresponding portions of a matching string are captured. Captured strings are assigned to the sequential built-in variables $1, $2, $3, ..., and, in list context, a list of the captured strings is returned as the value of the match.

$x =~ /a(.)c/; # capture the character between 'a' and 'c'

Captured strings $1, $2, $3, ... can be used later in the code.
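Both ways of reading captures can be sketched as follows. The date string and variable names are assumptions made for the example:

```perl
use strict;
use warnings;

my $date = '2010-06-06';

# In list context the match returns the captured strings.
my ($year, $month, $day) = $date =~ /(\d{4})-(\d{2})-(\d{2})/;

# The same captures are available as $1, $2, $3 after a successful match.
my $from_vars = '';
if ($date =~ /(\d{4})-(\d{2})-(\d{2})/) {
    $from_vars = "$3/$2/$1";
}

print "$year $month $day\n";   # 2010 06 06
print "$from_vars\n";          # 06/06/2010
```

Testing the match in a condition before using $1, $2, $3 matters, because a failed match leaves them holding values from an earlier successful match.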

Perl regular expressions also allow built-in or user-defined functions to be applied to the captured match, by using the /e modifier:

$x = "Oranges";

$x =~ s/(ge)/uc($1)/e; # OranGEs
$x .= $1; # append the string captured by the previous statement to $x: OranGEsge
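The replacement expression can equally call a user-defined function. In this sketch the helper double is hypothetical, and /e is combined with /g so that every match is rewritten:

```perl
use strict;
use warnings;

# Hypothetical helper used as the replacement expression.
sub double { return 2 * $_[0] }

my $text = 'width=10 height=25';

# /e evaluates the replacement as Perl code; /g applies it to every match.
$text =~ s/(\d+)/double($1)/ge;

print "$text\n";   # width=20 height=50
```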

Objects

There are many ways to write object-oriented code in Perl. The most basic is using "blessed" references.
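A minimal sketch of a class built on a blessed hash reference follows. The Point class, its fields, and its methods are assumptions made for illustration; it uses only core Perl (5.10 or later, for the // operator):

```perl
use strict;
use warnings;

package Point;

# Constructor: bless a hash reference into the class.
sub new {
    my ($class, %args) = @_;
    my $self = { x => $args{x} // 0, y => $args{y} // 0 };
    return bless $self, $class;
}

# Accessors read the underlying hash directly.
sub x { $_[0]{x} }
sub y { $_[0]{y} }

# Reset both coordinates to zero.
sub clear {
    my ($self) = @_;
    @{$self}{qw(x y)} = (0, 0);
    return $self;
}

package main;

my $p = Point->new(x => 3, y => 4);
my $before = $p->x . ',' . $p->y;     # 3,4
$p->clear;
my $after  = $p->x . ',' . $p->y;     # 0,0

print ref($p), " $before $after\n";   # Point 3,4 0,0
```

The object is an ordinary hash reference; bless only records which package to search when a method is called on it.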

Many modern Perl applications use the Moose object system. Moose is built on top of Class::MOP, a meta-object protocol, providing complete introspection for all Moose-using classes. Thus you can ask classes about their attributes, parents, children, methods, etc. using a simple API.

Moose classes:

  • A class has zero or more attributes.
  • A class has zero or more methods.
  • A class has zero or more superclasses (aka parent classes). A class inherits from its superclass(es).
  • A class does zero or more roles, which add pre-defined functionality to the class without subclassing.
  • A class has a constructor and a destructor.
  • A class has a metaclass.
  • A class has zero or more method modifiers. These modifiers can apply to its own methods, methods that are inherited from its ancestors, or methods that are provided by roles.

Moose roles:

  • A role is something that a class does, somewhat like mixins or interfaces in other object-oriented programming languages. Unlike mixins and interfaces, roles can be applied to individual object instances.
  • A role has zero or more attributes.
  • A role has zero or more methods.
  • A role has zero or more method modifiers.
  • A role has zero or more required methods.

Examples

An example of a class written using the MooseX::Declare[60] extension to Moose:

use MooseX::Declare;

class Point3D extends Point {
    has 'z' => (isa => 'Num', is => 'rw');

    after clear {
        $self->z(0);
    }

    method set_to (Num $x, Num $y, Num $z) {
        $self->x($x);
        $self->y($y);
        $self->z($z);
    }
}

This is a class named Point3D that extends another class named Point, which is explained in the Moose examples. It adds to its base class a new attribute z, redefines the method set_to, and extends the method clear.

Database interfaces

Perl is widely favored for database applications. Its text-handling facilities are useful for generating SQL queries; arrays, hashes, and automatic memory management make it easy to collect and process the returned data.

In early versions of Perl, database interfaces were created by relinking the interpreter with a client-side database library. This was sufficiently difficult that it was done for only a few of the most important and most widely used databases, and it restricted the resulting perl executable to using just one database interface at a time.

In Perl 5, database interfaces are implemented by Perl DBI modules. The DBI (Database Interface) module presents a single, database-independent interface to Perl applications, while the DBD (Database Driver) modules handle the details of accessing some 50 different databases; there are DBD drivers for most ANSI SQL databases.

DBI provides caching for database handles and queries, which can greatly improve performance in long-lived execution environments such as mod_perl,[61] helping high-volume systems avert load spikes as in the Slashdot effect.

In modern Perl applications, especially those written using Web application frameworks such as Catalyst, the DBI module is often used indirectly via object-relational mappers such as DBIx::Class, Class::DBI, or Rose::DB::Object, which generate SQL queries and handle data transparently to the application author.
