How to Use...

Just work through from start to finish. All you need is a Win32 PC and curiosity. Oh, and some time. When you finish, please send me a critique. In fact, send one even if you don't finish. I appreciate all input, especially error checks ! And money cheques.

Conventions :

Sometimes you'll need to type something in on the command line. These commands will be in green, for example :
perl changeworld.pl parm1 datafile.txt
Code that you should load into your editor and run is in blue (don't run this now, it's just an example):
```
while (<DATFILE>) {
        printf "%2s : %s", $., $_;      # line number, then the line itself
}
```
When functions are referred to in the text, their names are highlighted in red. For example, the split function.

All the code examples have been tested, and you can just cut'n'paste (brave statement). I haven't listed the output of each example. You need to run it and see for yourself. Consider this course interactive.

What you need to know

You need to be able to differentiate between a PC and a toaster. No programming experience is necessary. You do need to understand the basics of PC operation. If you don't understand what directories and files are then you'll find this difficult. You might find it difficult even if you do :-)

Use of this document

If you want to translate this, use it for your intranet, mirror it or otherwise use it please email me. I'm agreeable to most proposals provided I know about them - which means you can get the latest versions. Remember this document is copyrighted.

--

Robert Pepper
mailto:Robert@netcat.co.uk
http://www.netcat.co.uk/rob/perl/win32perltut.html

What is Perl ?

Perl is a programming language. Perl stands for Practical Extraction and Report Language. You'll notice people refer to 'perl' and "Perl". "Perl" is the programming language as a whole - 'perl' is the name of the core executable. Some of Perl's many strengths are :

Speed of development. You edit a text file, and just run it. You can develop programs very quickly like this. No separate compiler needed. I find Perl runs a program quicker than Java, and that's before you even count the complete modify-compile-run-oh-no-forgot-that-semicolon sequence.
Power. Perl's regular expressions are some of the best available. You can work with objects, sockets...everything a systems administrator could want. And that's just the standard distribution. Add the wealth of modules available on CPAN and you have it all. Don't equate scripting languages with toy languages.
Usability. All that power and capability can be learnt in easy stages. If you can write a batch file you can program Perl. You don't have to learn object oriented programming, but you can write OO programs in Perl. If autoincrementing nonexistent variables scares you, make perl refuse to let you. There is always more than one way to do it in Perl. You decide your style of programming, and Perl will accommodate you.
Portability. On the Superhighway to the Portability Panacea, Perl's Porsche powers past Java's jaded jalopy. Many people develop Perl scripts on NT, or Win95, then just FTP them to a Unix server where they run. No modification necessary.
Editing tools. You don't need the latest Integrated Development Environment for Perl. You can develop Perl scripts with any text editor. Notepad, vi, MS Word 97, or even direct off the console. Of course, you can make things easy and use one of the many freeware or shareware programmer's file editors.
Price. Yes, 0 guilders, pounds, dmarks, dollars or whatever. And the peer to peer support is also free, and often far better than you'd ever get by paying some company to answer the phone and tell you to do what you just tried several times already, then look up the same reference books you already own.

What is Win32 ?

Win32 refers to Microsoft Windows 32-bit operating systems. At the time of writing that's Windows 95 and Windows NT. Win32 does not mean Windows 3.11 running Win32s.

What is Perl for Win32 ?

Microsoft decided Perl would be a Good Thing to have on Win32. See, they're not so bad. So they employed Hip Communications to port Perl to the Win32 platform (port in this sense doesn't mean a shipyard or a drink - it means taking the source code for perl and changing it so it runs on Windows).

The main perl developer at Hip, Dick Hardt, later left the company and formed ActiveWare Internet Corp. Dick took Perl for Win32 with him, and continued development. In August 1997, ActiveWare changed their name to ActiveState Tool Corp. "Perl for Win32" is a trademark of ActiveState Tool Corp. However, the last release of the base (native) version will now compile directly for Win32. The latest native version at the time of writing is perl 5.004.

The ActiveState version includes some additional modules and features, not least of which are Perl for ISAPI (perlis.dll) and PerlScript. These don't work (yet) with the native version. But they will soon because ActiveState is merging their version with the native version. This should happen.....Some Time Soon...when Perl 5.005 is released.

Perl is developed on the latest NT platform, and may or may not work on older versions of NT. The latest version of Perl will run on the latest version of Windows 95 (or 98 when it is released). Be aware that some things which work under Windows NT don't work under Windows 95 because Win95 just doesn't have the functionality. In the same way, some Perl features you can use under Unix either don't work, or work differently compared to the Win32 platform. Check the documentation !

What can you do with Perl ?

Just two popular examples :

The Internet

Go surf. Notice how many websites have dynamic pages with .pl or similar as the filename extension ? That's Perl. It is the most popular language for CGI programming for many reasons, most of which are mentioned above. In fact, there are a great many more dynamic pages written with perl that may not have a .pl extension. Perl has spread across the Internet.

Systems Administration

If you are an NT sysadmin, chances are you aren't used to programming. In which case, the advantages of Perl may not be clear. Do you need it ? Is it worth it ?

After you read this tutorial you will know more than enough to start using Perl productively. You really need very little knowledge to save time. Imagine driving a car for years, then realising it has five gears, not four. That's the sort of improvement learning Perl means. When you are proficient, you find the difference like realising the same car has a reverse gear and you don't have to push it backwards. Perl means you can be lazier. Lazy sysadmins are good sysadmins, as I keep telling my boss. You'll never touch a batch file again !

Support

There are six mailing lists for Perl for Win32. Read all about them on http://www.activestate.com/. Make sure you read the charter too. Many people put time and effort into the creation of those lists, so don't insult us by ignoring the guidelines. Anyone with an interest in Perl for Win32 should be subscribing to at least one of these lists. The charter also lists useful sites and newsgroups.

Setup

Three stages:

Get the software
Install it
Run a test Script

1. Getting the Software

An old version of Perl for Win32 is included with the Windows NT Resource Kit. Please don't use it. It is out of date. Follow the steps below to get a newer version.

The basic Perl for Win32 distribution kit is about 1.5Mb. This comprises more than 250 files - the basic perl.exe interpreter, library modules (useful addons), documentation etc. Download times are about twice as long as for a 750Kb file. :-)

You might wish to create a root directory for the perl installation. The perl installation contains more than 250 files and it has its own directory structure. This tutorial will assume you are using c:\perl as your perl installation directory.

You can use FTP, HTTP (that's your web browser) or email.

Which file ?

You'll find three binaries for download. Don't worry about PerlScript or PerlIS. These are special versions of Perl for some web servers. If you run Microsoft Internet Information Server, they will be of use but you are strongly advised to work with perl a while before you start trying Perl for ISAPI (perlis.dll) or PerlScript.

As of the time of writing the latest build is 315, so the file to get is pw32i315.exe.

Make sure you do not download pw32axxx.exe. This is for an Alpha machine and will not work on an Intel PC. If you are an Alpha user you knew that already :-)

HTTP (Web browsers)

Go to http://www.activestate.com/ and follow the Download link.

FTP

ftp://ftp.linux.ActiveState.com/pub/Perl-Win32/Release

Email

Send an email to ListManager@ActiveState.com with the following commands in the body of the message:

GET perl-win32-announce Pw32ixxx.exe

where xxx is whatever build number we are up to at the time. Remember it is a 1.5Mb file, which is quite a large attachment. It is possible it won't make it to your machine for this reason. If you are using a company email account your friendly systems administrator would probably appreciate you discussing this with him (or her - that's the last politically correct statement for a while).

2. Installation

So you now have pw32ixxx.exe and it is in c:\perl or whatever directory you are using. Installation is easy. We'll use a command prompt as you will be working with the command prompt later. If the phrase 'command prompt' mystifies you, then doubleclick the MS-DOS icon and you'll see one. Looks like this : c:> If you can't find the icon, click Start, then Run, then if you are running Win95 type command.com and hit Enter. If you are running NT, type cmd.exe and hit Enter.

1. Switch to c:\perl
2. Run the install program thus : pw32ixxx.exe (of course, xxx is the latest build number...:-)
3. The install program will offer to unzip into c:\perl. If you are not using c:\perl as your perl installation directory, change the path. Leave both checkboxes, about overwriting and when done, checked.
4. You'll see a command window. If you have followed these instructions perl has indeed been unpacked into its final destination directory, so you can just respond Y.
5. Allow the search path to be modified.
6. Associate perl with .pl ? If you do this you can run a perl script just by doubleclicking it. Personally I prefer doubleclicking to start a text editor and load the script, so I always answer no to this and run scripts from the command line. So politely refuse the kind request and answer N. If you do decide to associate perl.exe with .pl, change the mapping so perl.exe accepts several parameters.
7. If you are running IIS you'll see a message about I/O redirection. Just say Y. It is a Good Thing. Trust me.
8. For the path to take effect, log off and back on as the installer says if you are using NT, or reboot if you are using Win95.
9. After Step 8, start your command prompt again and run this : perl -v. You should see the version number of Perl displayed. Remember this for when you ask questions on discussion groups.

If you didn't see the version number, perl.exe is not in the path. Review the steps above carefully.

3. Testing - Your First Perl Script

Assuming all has gone to plan, now create your first Perl script. I recommend creating a new directory for your perl scripts, separate from your data files and the perl installation. For example c:\scripts\, which is what I'll assume you are using in this tutorial.

Start up whatever text editor you're going to hack Perl with. Notepad.Exe is just fine. Type in the following :

print "My first Perl script\n";

and save it to c:\scripts\myfirst.pl. You don't need to exit Notepad - keep it open, as we'll be making changes very soon. Switch to your command prompt, and change to the directory. Execute the script : perl myfirst.pl and you'll see the output. Welcome to the world of Perl ! See what I mean about it being easy to start ? However, it is difficult to finish with Perl once you begin :-)

Now we need to analyse what's going on here a little. First note that the line ends with a semicolon ; . Almost all lines in Perl end with semicolons. Also note the \n . This is the code to tell Perl to output a newline. If that's not clear, delete the \n from the program and run it again :

print "My first Perl script";

NB - almost every Perl book is written for UN*X, which is a problem for Win32. This leads to scripts like :

#!c:/perl/perl.exe

print "I'm a cool Perl hacker\n";

The function of the 'shebang' line is to tell the shell how to execute the file. Under Unix, this makes sense. Under Win32, the system must already know how to execute the file before it is loaded so the line is not needed.

The line is not completely ignored, though, as it is searched for any switches you may have given Perl (for example -w to turn on warnings). Still, you don't need it. You may also choose to add the line so your scripts run directly on Unix without modification, as Unix boxes probably do need it. Anyway, on with the lesson.

Variables

So Perl is working, and you are working with Perl. Now for something more interesting than simple printing. Variables. Let's take simple scalar variables first. A scalar variable is a single value. Like $var=10 which sets the variable $var to the value of 10. Later, we'll look at lists like arrays and hashes, where @var refers to more than one value. Scalar is Singular.

If you've learnt any JavaScript or BASIC you may be surprised by $var=10. With those languages, if you want to assign the value 10 to a variable called var you'd write var=10.

Not so in Perl. This is a Feature. All variables are prefixed with a symbol such as $ @ % . This has certain advantages, like making programs easier to read. You can see where the variables are quite easily. And not only that, what sort of variable it is. The human language German has a similar principle (except nouns are capitalised, not prefixed with $ and Perl is easier to pronounce). You'll agree later....

Anyway, more hands-on. Time to try some variables :

$string="perl";
$num=20;
print "The string is $string and the number is $num\n";

A closer look...notice you don't have to say what type of variable you are declaring. In other languages you need to say if the variable is a string, array, or whatever. You might even have to declare what type of number it is. If you know any Java you'll have been saying things like int var=10 which defines the variable var as an integer, with the value 10. Yes, there are different types but you don't need to know about them with Perl. Typecasting ? That's not politically correct any more !

If you didn't already know, a tiny little comma out of place can lead to completely unexpected results. If the above code didn't work, you haven't typed it in exactly as you should have done. Those are double quotes " " , not singles ' ' .

Also notice the way the variables are used in the string. Sticking variables inside of strings has a technical term - "variable interpolation". Now, if we didn't have the handy $ prefix we'd have to do something like the example below, which is pseudocode. Pseudocode is code to demonstrate a concept, not designed to be run. Like certain Microsoft software.

print "The string is ".string." and the number is ".num."\n";

which is much more work. Convinced about those prefixes yet ?

Single quotes have their use. Try this :

$string="perl";
$num=20;
print "Doubles: The string is $string and the number is $num\n";
print 'Singles: The string is $string and the number is $num\n';

Double quotes allow the aforementioned variable interpolation. Single quotes do not. Both have their uses as you will see later, depending on whether you wish to interpolate anything.

More on Variables

If you want to add 1 to a variable you can, logically, do this : $num=$num+1 . There is a shorter way to do this, which is $num++ . This is an autoincrement. Guess what this is : $num-- . Yes, an autodecrement.

This example illustrates the above :

$num=10;
print "\$num is $num\n";

$num++;
print "\$num is $num\n";

$num--;
print "\$num is $num\n";

$num+=3;
print "\$num is $num\n";

The last example demonstrates that it doesn't have to be just 1 you add/decrease by.
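The same shortcut works for subtraction. Here is a minimal sketch, using a made-up variable called $total, that mixes += and -= with the autoincrement and autodecrement from above:

```perl
$total = 10;

$total += 5;    # same as $total = $total + 5, so $total is 15
$total -= 3;    # same as $total = $total - 3, so $total is 12
$total++;       # add one, so $total is 13
$total--;       # take one away again, so $total is 12

print "\$total is $total\n";
```

Change the amounts and run it again until the pattern is obvious.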

There's something else new in the code above. The \ . You can see what this does - it 'escapes' the special meaning of $ . That means just the $ symbol is printed instead of it referring to a variable. Actually \ has a deeper meaning - it escapes all of Perl's special characters, not just $ . Also, it turns some non-special characters into something special. Like what ? Like n . Add the magic \ and the humble 'n' becomes the mighty NewLine ! The \ character can also escape itself. So if you want to print a single \ try :

print "the MS-DOS path is c:\\scripts\\";

Oh, '\' is also used for other things like references. But that's not even covered here.
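To see a few more escapes side by side, here is a small sketch. The variable names ($price, $literal, $path) and the email address are invented for illustration; the escapes themselves ( \$ \@ \" \t \\ ) are standard Perl.

```perl
$price   = 20;
$literal = "\$price";       # holds the characters $price, not the value 20
$path    = "c:\\perl";      # two backslashes in the code, one in the string

print "Escaped: $literal, not escaped: $price\n";
print "The path is $path\n";
print "An at sign needs escaping too: user\@example.com\n";
print "Quotes inside quotes: \"quoted\"\n";
print "A tab\tbetween two words\n";
```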

There is a technical term for these 'special characters' such as @ $ %. They are called metacharacters. Perl uses plenty of metacharacters. You'll be using all sorts of obscure characters in your Perl hacking career. This has earned perl a reputation for being difficult to understand. That's true, but once you learn the character meanings reading perl code becomes much easier precisely because of all these strange characters.

Perl uses so many weird characters that sometimes the same character has two or more meanings, depending on its context. As an example, the humble dot . can join two variables together, act as a wildcard or become a range operator if there are two of them together. If this sounds crazy, think about the English language. What do the following mean to you ?

MEAN
POLISH
LIKE

Mean, in one context, is a word used to describe the purpose of something. It is also another word for average. Furthermore, it describes a nasty person, or a person who doesn't like spending money, and is used in slang to refer to something impressive and good. Polish, when capitalised, can mean either pertaining to the country Poland, or the act of making something shiny. And 'like' can mean similar to, or affection for.

So, when you speak or write English (think of two, to and too) you know what these words mean by their context. It is exactly the same way with Perl. Just don't assume a given metacharacter always means what you first thought it did.
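Here is a rough sketch of the three lives of the dot mentioned above. It borrows an array and a pattern match, neither of which has been explained at this point, so treat it as a preview; the variable names are made up.

```perl
$first  = "Perl";
$second = "Win32";

# 1. One dot between two values joins them together.
$joined = $first . " for " . $second;
print "$joined\n";

# 2. Two dots together make the range operator.
@digits = (1 .. 5);
print "@digits\n";

# 3. Inside a pattern, a dot is a wildcard for any one character.
if ("cat" =~ /c.t/) {
        print "c.t matches cat\n";
}
```

Same character, three meanings, and the context tells Perl (and you) which one applies.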

Finally, try this :

$string="perl";
$num=20;
$mx=3;

print "The string is $string and the number is $num\n";

$num*=$mx;
$string++;
print "The string is $string and the number is $num\n";

Note the easy shortcut *= meaning 'multiply $num by $mx' or, $num=$num*$mx . Of course Perl supports the usual + - * / ** % operators. The last two are exponentiation (to the power of) and modulus (remainder of x divided by y). Also note the way you can increment a string ! Is this language flexible or what ?
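If exponentiation and modulus are new to you, this short sketch may help. It also shows that incrementing a string carries over from one letter to the next, much as 19 becomes 20. The variable names are invented for the example.

```perl
$power     = 2 ** 10;   # 2 to the power of 10
$remainder = 17 % 5;    # what is left over when 17 is divided by 5

print "2 ** 10 is $power\n";        # 1024
print "17 % 5 is $remainder\n";     # 2

$word = "az";
$word++;                            # the z wraps round and the a moves up
print "az becomes $word\n";         # ba

$word = "Zz";
$word++;                            # both letters wrap, so a new one is added
print "Zz becomes $word\n";         # AAa
```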

More on the print function

The print function is a list operator. That means it accepts a list of things to print, separated by commas. As an example :

print "a doublequoted string ", $var, 'a variable called var', $num,"\n";

Of course, you could just put all the above inside a single doublequoted string :

print "a doublequoted string $var a variable called var $num \n";

to achieve the same effect. The advantage of using the print function in list context is that expressions are evaluated before being printed. For example, try this :

$var="Perl";
$num=10;
print "Two \$nums are $num * 2 and adding one to \$var makes $var++\n";
print "Two \$nums are ", $num * 2," and adding one to \$var makes ", $var++,"\n";

You might have been slightly surprised by the result of that last experiment. In particular, what happened to our variable $var ? It should have been incremented by one, resulting in Perm. The reason being that 'm' is the next letter after 'l' :-)

Actually, it was incremented by 1. We are postincrementing the variable with $var++ , rather than preincrementing it.

The difference is that with postincrements, the value of the variable is returned, then the operation is performed on it. So in the example above, the current value of $var was returned to the print function, then 1 was added. You can prove this to yourself by adding the line

print "\$var is now $var\n";

to the end of the example above.

If we want the operation to be performed on $var before the value is returned to the print function, then preincrement is the way to go. ++$var will do the trick.
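A side-by-side sketch of the two, with made-up variable names, storing what each form hands back so you can inspect it afterwards:

```perl
$count = 5;

$seen_post = $count++;   # post: $seen_post gets 5, then $count becomes 6
$seen_pre  = ++$count;   # pre: $count becomes 7 first, then $seen_pre gets 7

print "post returned $seen_post, pre returned $seen_pre, \$count is now $count\n";
```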

Subroutines - A First Look

Let's take another look at the example we used to show how the autoincrement system works. Messy, isn't it ? This is Batch File Writing Mentality. Notice how we use exactly the same code four times. Why not just put it in a subroutine ?

$num=10;                # sets $num to 10
&print_results;         # prints variable $num

$num++;
&print_results;

$num*=3;
&print_results;

$num/=3;
&print_results;

sub print_results {
        print "\$num is $num\n";
}

Easier and neater. The subroutine can go anywhere in your script, at the beginning, end, middle...makes no difference. Personally I put all mine at the bottom and reserve the top part for setting variables and main program flow.

A subroutine is defined by starting with sub then the name. After that you need a curly left bracket { , then all the code for your subroutine. Finish it off with a closing brace } . The area between the two braces is called a block. Remember this. There are such things as anonymous subroutines but not here. Everything here has a name.

Subroutines are usually called by prefixing their name with & , like so &print_results; . In most circumstances you can forget the & prefix but it is wise to leave it for the time being to avoid confusion.
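For the curious, here is a small sketch of both calling styles. The subroutine name say_hello and the counter $calls are invented for this example. Without the & you should add empty parens, so perl knows it is a subroutine call even though the definition comes further down the file.

```perl
&say_hello;      # the & form used in this tutorial
say_hello();     # the same call without & , with empty parens instead

sub say_hello {
        $calls++;       # count how many times we have been called
        print "Hello from the subroutine, call number $calls\n";
}
```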

If you are worrying about variable visibility, don't. All the variables we are using so far are visible everywhere. You can restrict visibility quite easily, but that's not important right now. If you weren't worrying about variable visibility, please don't start. (paranoid ?)

Notice a # crept in there. That's a comment. Everything after a # is ignored. You can't continue it onto a newline however, so if your comment won't fit on one line start a new one with # . There are ways to create Plain Old Documentation (POD) and more ways to comment but they are not detailed here.

Comparisons

An if statement is simple. if day is Sunday, lie in bed . A simple test, with two outcomes. Perl conversion (don't run this) :

if ($day eq "sunday") {
        &lie_in_bed;
}

You already know that &lie_in_bed is a call to a subroutine. We assume $day is set earlier in the program. If $day is not equal to "sunday" then &lie_in_bed is not executed (pity). You don't need to say anything else. Try this :

$day="sunday";

if ($day eq "sunday") {
        print "Zzzzz....\n";
}

Note the syntax. The if statement requires something to test for Truth. This expression must be in (parens), then you have the braces to form a block.

There are many Perl constructs which test for Truth. Some are if, while, unless . So it is important you know what truth is, as defined by Perl, not your tax forms. Here are the three main rules :

Any string is true except for "" and "0".
Any number is true except for 0.
Any undefined value is false.

Some example code to illustrate the point :

&isit;                   # $test1 is at this moment undefined

$test1="hello";         # a string, not equal to "" or "0"
&isit;

$test1=0.0;             # $test1 is now a number, effectively 0
&isit;

$test1="0.0";           # $test1 is a string, but NOT effectively 0 !
&isit;

sub isit {
        if ($test1) {                           # tests $test1 for truth or not
                print "$test1 is true\n";
        } else {                                # else statement if it is not true
                print "$test1 is false\n";
        }
}

The first test fails because $test1 is undefined. This means it has not been created by assigning a value to it. So according to Rule 3 it is false. The last two tests are interesting. Of course, 0.0 is the same as 0 in a numeric context, so it is false. But the string "0.0" is neither "" nor "0", so according to Rule 1 it is true.
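Here is a slightly longer sketch that runs the three rules over more values. The names ($label, $value, $true_count, check) are invented for the example, and undef, which makes a variable undefined again, has not been introduced in this tutorial.

```perl
$label = 'the empty string';    $value = "";    &check;   # false
$label = 'the string "0"';      $value = "0";   &check;   # false
$label = 'the string "00"';     $value = "00";  &check;   # true
$label = 'a single space';      $value = " ";   &check;   # true
$label = 'the number 0';        $value = 0;     &check;   # false
$label = 'the number 0.0';      $value = 0.0;   &check;   # false
$label = 'an undefined value';  undef $value;   &check;   # false

sub check {
        if ($value) {
                print "$label is true\n";
                $true_count++;          # keep a tally of the true ones
        } else {
                print "$label is false\n";
        }
}
```

Only two of the seven come out true, and both are strings that merely look empty or zero-like.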

So here we are testing single variables. What's more useful is testing the result of an expression. For example, this is an expression : $x * 2 and so is this $day eq "Sunday" . It is the end result of these expressions that is evaluated for truth.

Another example :

if (5 - 5) {
        print "Testnum is true\n";
} else {
        print "Testnum is false\n";
}

$day="Sunday";

$y=($day eq "Sunday");
$x=($day eq "Monday");

print "\$x is $x and \$y is $y\n";

The first test fails because 5-5 of course is 0, which is false.

Next, we compare the variable $day to two different strings. The result of the comparison is stored in a variable.

The first test returns the value 1, which is true. The second test doesn't seem to return anything (actually it returns ""), which is false.

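The parens make it obvious that Perl evaluates the comparison first, then assigns the result to the variable. Try it without the parens : because eq binds more tightly than = you get the same result, but the parens are easier on the reader.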

Now pay close attention, otherwise you'll end up posting an annoying question somewhere. The symbol = is an assignment operator, not a comparison operator. Therefore :

if ($x = 10) is always true, because the assignment hands back the value it assigned, 10, and 10 is true.
if ($x == 10) compares the two values, which might not be equal.
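A short sketch of the trap. The third test assigns 0 rather than 10, to show that the result of the test is simply the truth of whatever value was assigned:

```perl
$x = 5;

if ($x == 10) {                 # comparison: 5 is not 10, so this is skipped
        print "never printed\n";
}

if ($x = 10) {                  # assignment: $x becomes 10, and 10 is true
        print "\$x is now $x\n";
}

if ($x = 0) {                   # assignment: $x becomes 0, and 0 is false
        print "never printed either\n";
} else {
        print "\$x is now $x, which is false\n";
}
```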

There are two types of comparison operator - numeric and string. You've already seen two, == and eq. Run this :

$foo=291;
$bar=30;

if ($foo < $bar) { 
        print "$foo is less than $bar (numeric)\n"; 
}

if ($foo lt $bar) { 
        print "$foo is less than $bar (string)\n"; 
}

Alphabetically, that is in a string context, 291 comes before 30. It is actually decided by the ASCII value, but alphabetically is close enough. Change the numbers around a little. Notice how Perl doesn't care whether it uses a string comparison operator on a numeric value, or vice versa. This is typical of Perl's flexibility. Bondage and discipline are alien concepts to Perl. This flexibility does have a drawback. If you're on a programming precipice, threatening suicide by jumping off, Perl won't talk you out of your decision but will provide several ways of jumping, stepping or falling to your doom while silently watching your early conclusion. So be careful.

The Perl Motto is : "There is More Than One Way to Do It" or TIMTOWTDI. Pronounced Tim-Toady. This tutorial doesn't try and mention all possible ways of doing everything. Write your Perl programs the way you want to.

The rest of the operators are :

Comparison	Numeric	String
Equal	==	eq
Not equal	!=	ne
Greater than	>	gt
Less than	<	lt
Greater than or equal to	>=	ge
Less than or equal to	<=	le

Just remember :

if you are testing a value as a string there should be only letters in your comparison operator.
if you are testing a value as a number there should only be non-alpha characters in your comparison operator.
note 'as a' above. You can test numbers as strings and vice versa. Perl never complains, unless you have asked for warnings with -w.
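Picking the wrong column of that table can bite. In this sketch (the variable names are invented) two quite different words are declared equal by == , because in a numeric context neither looks like a number and both count as 0 :

```perl
$a_word = "apple";
$b_word = "zebra";

if ($a_word == $b_word) {       # numeric test: both words count as 0
        print "== thinks $a_word and $b_word are equal\n";
        $numeric_equal = 1;
}

if ($a_word eq $b_word) {       # string test: the words differ
        print "eq thinks they are equal\n";
} else {
        print "eq knows they are different\n";
        $string_differs = 1;
}

if ("10" == "10.0") { print "as numbers, 10 and 10.0 are equal\n"; }
if ("10" ne "10.0") { print "as strings, they are not\n"; }
```

Run it once as it is, then again with perl -w to see the warnings about non-numeric values.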

More about if statements. Run this :

$age=25;
$max=30;

if ($age > $max) {
        print "Too old !\n";
} else {
        print "Young person !\n";
}

It is easy to see what else does. If the expression is false then whatever is in the else block is evaluated (or carried out, executed, whatever term you choose to use). Simple. But what if you want another test ? Perl can do that too.

$age=25;
$max=30;
$min=18;

if ($age > $max) {
        print "Too old !\n";
} elsif ($age < $min) { 
        print "Too young !\n"; 
} else { 
        print "Just right !\n"; 
}

If the first test fails, the second is evaluated. This carries on until there are no more elsif statements, or an else statement is reached. An else statement is optional, and no elsif statements should come after it.

There is a big difference between the above example and the one below:

if ($age > $max) {
        print "Too old !\n";
} 

if ($age < $min) {
        print "Too young !\n";
}

If you run it, it will return the same result - in this case. However, it is Bad Programming Practice. In this case we are testing a number, but suppose we were testing a string to see if it contained R or S. It is possible that a string could contain both R and S. So it would pass both 'if' tests. Using an elsif avoids this. As soon as one test is true, no more elsif tests are evaluated and the else block is skipped.
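The R and S case might look like the sketch below. It uses a pattern match ( =~ ), which has not been covered at this point, and the counters $separate and $chained are invented purely to make the difference visible.

```perl
$string = "ROSES";      # contains both R and S

# Two separate if statements: both tests run, so both messages appear.
if ($string =~ /R/) { print "Separate ifs: found R\n"; $separate++; }
if ($string =~ /S/) { print "Separate ifs: found S\n"; $separate++; }

# if with elsif: the second test is skipped once the first succeeds.
if ($string =~ /R/) {
        print "elsif chain: found R\n";
        $chained++;
} elsif ($string =~ /S/) {
        print "elsif chain: found S\n";
        $chained++;
}

print "Separate ifs matched $separate times, the chain matched $chained time\n";
```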

You don't need to take up a whole three lines :

print "Too old\n" if     $age > $max;
print "Too old\n" unless $age <= $max;

I added some whitespace there for aesthetic beauty. There are other operators that you can use instead of if and unless , but that's for later on.

User Input

Sometimes you have to interact with the user. It is a pain, but sometimes necessary, especially for the live ones. To ask for input and do something with it try this :

print "Please tell me your name :";
$name=<STDIN>;
print "Thanks for making me happy $name !\n"

New things to learn here. Firstly, <STDIN> . STDIN is where all information normally comes from. You could say it is the standard source for input. Guess what STDIN stands for :-)

In this case it is input from the keyboard. Also, the angle brackets <> read from a filehandle. Filehandles are what you use to interact with things such as files, socket connections and more.

So we are reading from the STDIN filehandle. The value is assigned to $name and printed. Any idea why the ! ends up on a new line ?

As you pressed Enter, you of course included a newline with your name. The easy way to get rid of it is to chop it like so :

print "Please tell me your name :";
$name=<STDIN>;
chop $name;
print "Thanks for making me happy $name !\n"

and that works as it should. The chop function removes the last character of whatever it is given to chop, in this case removing the newline for us. In fact, that can be shortened :

print "Please tell me your name :";
chop ($name=<STDIN>);
print "Thanks for making me happy $name !"

The parentheses ( ) force chop to act on the result of what is inside them. So $name=<STDIN> is evaluated first, then the result from that, which is $name , is chopped.

Chopping is dangerous, as my friend One Hand Harold will tell you. Everyone is concerned about safety these days, and your perl code should be no exception. Rather than just remove the last character regardless of whatever it is, you can remove the last character only if it is a newline with chomp :

chomp ($name=<STDIN>);

At this point the perl gurus are screaming "I found an error !". Well, chomp doesn't always remove the last character if it is a newline but if it doesn't, you have set a special variable, namely $/ , to something different. I presume that if you do set $/ you know what it does. It is explained later in this very document. Of course, being a good pupil, you wouldn't experiment with the unknown, blindly changing things just for the hell of it.

If you don't, you'll never learn anything useful :-)
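You can compare the two functions without typing anything at the keyboard. In this sketch the four variables are made up, and $/ is left at its default of a newline.

```perl
$name_a = "Harold\n";   # as typed at the keyboard, newline included
$name_b = "Harold";     # no newline on the end

chop $name_a;           # removes the newline, leaving Harold
chop $name_b;           # removes the d, leaving Harol
print "chop gives $name_a and $name_b\n";

$name_c = "Harold\n";
$name_d = "Harold";

chomp $name_c;          # removes the newline, leaving Harold
chomp $name_d;          # no newline to remove, so still Harold
print "chomp gives $name_c and $name_d\n";
```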

Arrays

Perl has two types of array, associative arrays (hashes) and arrays. Both types are lists. A list is just a collection of variables referred to as the collection, not as individual elements.

You can think of Perl's lists as a herd of animals. List context refers to the entire herd, scalar context refers to a single element. A list is a herd of variables. The variables don't have to be all of the same type - you might have a herd of ten sheep, three lions and two wolves. It would probably be just three lions and 1.5 wolves before long, but bear with me. In the same way, you might have a Perl list of three scalar variables, two array elements and ten hash elements.

Certain types of lists are known by certain names. Just as a herd of sheep is called a flock, a herd of lions is called a pride and a herd of wolves is called a pack, some types of Perl list have special names.

For example, an array is an ordered list of scalar variables. This list can be referred to as a whole, or you can refer to individual elements in the list. The program below defines an array, called @names . It puts five values into the array.

@names=("Muriel","Gavin","Susanne","Sarah","Anna");

print "The elements of \@names are @names\n";
print "The first element is $names[0] \n";
print "The third element is $names[2] \n";
print 'There are ',scalar(@names)," elements in the array\n";

Firstly, notice how we define @names . As it is in a list context, we are using parens. Each value is comma separated, which is Perl's default list delimiter. The double quotes are not necessary, but as these are string values it makes it easier to read and change later on.

Next, notice how we print it. Simply refer to it as a whole, that is in list context. List context means referring to more than one element of a list at a time. The code print @names; will work perfectly well too. But....

I usually learn something about Perl every time I work with it. When running a course, a student taught me this trick which he had discovered :

@names=("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");

print @names;
print "\n";
print "@names";

When a list is placed inside doublequotes, it is space delimited when interpolated. Useful.

If we want to do anything with the array as a list, that is doing something with all the values at once, refer to the array as @array . That's important. The @ prefix is also used when you want to refer to more than one element, but not the entire array. That's called a slice . Cake analogies are appropriate, and somewhat tastier. Pie analogies are probably healthier but equally accurate.

Arrays are not much use unless we can get to individual elements. Firstly, we are dealing with a single element of the list, so we cannot use @ which refers to multiple elements of the array. It is a single, scalar variable, so $ is used. Secondly, we must specify which element we want. That's easy - $array[0] for the first, $array[1] for the second and so forth. Array indexes start at 0, unless you do something which is so highly deprecated ('deprecated' means allowed, usually for backwards compatibility, but disapproved of because there are better ways) I'm not even going to mention it.

Finally, we force what is normally list context (more than one element) into scalar context (single element) to give us the number of elements in the array. Without the scalar , print would be handed the whole array and would print every element, just as print @names; does.
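A small sketch may make the two contexts easier to see. It reuses the @names array from the example; the $count variable is made up for this illustration.

```perl
@names=("Muriel","Gavin","Susanne","Sarah","Anna");

# scalar() forces scalar context, so we get the number of elements.
print 'With scalar    : ', scalar(@names), "\n";

# Without it, print receives the array in list context and prints every element.
print 'Without scalar : ', @names, "\n";

# Assigning an array to a scalar variable also gives scalar context.
$count = @names;
print "Count is $count\n";
```

Run it and compare the first two lines of output.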

Please understand this :

$myvar="scalar variable";
@myvar=("one","element","of","an","array","called","myvar");

print $myvar;        # refers to the contents of a scalar variable called myvar
print $myvar[1];     # refers to the second element of the array myvar
print @myvar;        # refers to all the elements of array myvar

The two variables $myvar and @myvar are not, in any way, related. Not even distantly. Technically, they are in different namespaces.

Going back to the animal analogy, it is like having a dog named 'Myvar' and a goldfish called 'Myvar'. You'll never get the two mixed up because when you call 'Myvar !!!!' or open a can of dog food the 'Myvar' dog will come running and goldfish won't. Now, you couldn't have two dogs called 'Myvar' and in the same way you can't have two Perl variables in the same namespace called 'Myvar'.

More on Arrays

The element number can be a variable.

print "Enter a number :";
chomp ($x=<STDIN>);

@names=("Muriel","Gavin","Susanne","Sarah","Anna");

print "You requested element $x who is $names[$x]\n";

print "The index number of the last element is $#names \n";

This is useful. Notice the last line of the example. It returns the index number of the last element. Of course you could always just do this $last=scalar(@names)-1; but this is more efficient. It is an easy way to get the last element, as follows :

print "Enter a number :";
chomp ($x=<STDIN>);

@names=("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");

print "The first two elements are @names[0,1]\n";
print "The first three elements are @names[0..2]\n";
print "You requested element $x who is $names[$x]\n";
print "The elements before and after are : @names[$x-1,$x+1]\n";
print "The first, second, third and fifth elements are @names[0..2,4]\n";

print "The last element is $names[$#names]\n";

It looks complex, but it is not. Really. Notice you can have multiple values separated by a comma. As many as you like, in whatever order. The range operator .. gives you everything between and including the values. And finally look at how we print the last element - remember $#names gives us a number ? Simply enclose it inside square brackets and you have the last element.

Do also note that because element accesses such as [0,1] are more than one variable, we cannot use the scalar prefix, namely the $ symbol. We are accessing the array in list context, so we use the @ symbol. Doesn't matter that it is not the entire array. Remember, accessing more than one element of an array but not the entire array is called a slice. I won't go over the food analogies again.
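As a slice is a list, you can do more with it than print it. The sketch below assigns slices to new arrays; @first_three and @picked are names invented for this illustration.

```perl
@names=("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");

@first_three = @names[0..2];          # a slice is a list, so it can fill an array
@picked      = @names[4,0,$#names];   # any order you like, and $#names works too

print "First three : @first_three\n";
print "Picked      : @picked\n";
print "The second slice has ",scalar(@picked)," elements\n";
```

The original @names is untouched by all this - a slice copies the values out.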

Foreach

All well and good, but what if we want to load each element of the array in turn ? Well, we could build a for loop like this :

@names=("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");

for ($x=0; $x <= $#names; $x++) {
        print "$names[$x]\n"; 
}

which sets $x to 0, runs the loop once, then adds one to $x , checks it is less than or equal to $#names , if so carries on. By the way, that was your introduction to for loops. Just to go into a little detail there, the for loop has three parts to it :

Initialisation
Test Condition
Modification

In this case, the variable $x is initialised to 0. It is immediately tested to see if it is smaller than, or equal to $#names . If that is true, then the block is executed once. Critically, if it is not true the block is not executed at all. We'll discuss that later.

Once the block has been executed, the modification expression is evaluated. That's $x++ . Then, the test condition is checked again to see if the block should be executed or not.
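If the order of events is hard to picture, here is the same loop written out longhand as a while loop. This is only a sketch to show when each of the three parts happens, not something you would normally write.

```perl
@names=("Muriel","Gavin","Susanne","Sarah");

$x=0;                          # 1. initialisation, done once
while ($x <= $#names) {        # 2. test condition, checked before every pass
        print "$names[$x]\n";
        $x++;                  # 3. modification, done after every pass
}

print "The loop stopped when \$x reached $x\n";
```

Note that $x ends up one higher than $#names , because the test has to fail before the loop stops.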

There is another version :

for $x (0 .. $#names) {
        print "$names[$x]\n";
}

which takes advantage of the range operator .. (two dots together). This simply gives $x the value of 0, then increments $x by 1 until it is equal to $#names .

For true beauty we must use foreach :

foreach $person (@names) {
        print "$person";
}

This goes through each element ('iterates', another good technical word to use) of @names , and assigns each element in turn to the variable $person . Then you can do what you like with the variable. Much easier. You can use for $person (@names) { if you want. Makes no difference at all.

In fact, that gets shorter. And now I need to introduce you to $_ , which is the Default Input and Pattern Searching Variable.

foreach (@names) {
        print "$_";
}

If you don't specify a variable to put each element into, $_ is used instead as it is the default for this operation, and many, many others in Perl. Including the print function :

foreach (@names) {
        print ;
}

As we haven't supplied any arguments to print , $_ is printed as default. You'll be seeing a lot of $_ in Perl. Actually, that statement is not exactly true. You will be seeing a lot of places where $_ is used, but quite often when it is used, it is not actually written. In the above example, you don't actually see $_ but you know it is there.

Changing the Elements

So we have @names . We want to change it. Run this :

print "Enter a name :";
chomp ($x=<STDIN>);

@names=("Muriel","Gavin","Susanne","Sarah");

print "@names\n";

push (@names, $x);

print "@names\n";

Fairly self explanatory. The push function just adds a value on to the end of the array. Of course, Perl being Perl, it doesn't have to be just the one value:

print "Enter a name :";
chomp ($x=<STDIN>);

@names=("Muriel","Gavin","Susanne","Sarah");
@cities=("Brussels","Hamburg","London","Breda");

print "@names\n";

push (@names, $x, 10, @cities[2..4]);

print "@names\n";

This is worth looking at in more detail. It appears there is no fifth element of @cities , as referred to by @cities[2..4] .

Perl hands one over anyway, in the shape of an undefined value. Add this to the end of the example :

print "There are ",scalar(@names)," elements in \@names\n";

There appear to be 8 elements in @names . However, we have just proved there are in fact 9. The reason there are 9 is that we referred to a nonexistent element of @cities , and Perl has quite happily extended @names to suit, filling the gap with an undefined value. The array @cities remains unchanged. Try popping the array if you don't believe me.
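Here is a cut-down sketch of the same effect, with shorter arrays so the counting is easier. It uses defined , which is explained a little further on, to show that the extra element is really there but has no value.

```perl
@names =("Muriel","Gavin");
@cities=("Brussels","Hamburg","London","Breda");

push (@names, @cities[2..4]);     # index 4 does not exist in @cities

print "Names now has ",scalar(@names)," elements\n";
print "Cities still has ",scalar(@cities)," elements\n";

if (defined $names[$#names]) {
        print "The last element of \@names is defined\n";
} else {
        print "The last element of \@names is there, but undefined\n";
}
```

Two names plus three slice elements makes five, even though only four of them have anything in them.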

So that's push . Now for some...

Jiggerypokery with Arrays

@names=("Muriel","Gavin","Susanne","Sarah");
@cities=("Brussels","Hamburg","London","Breda");

&look;

$last=pop(@names);
unshift (@cities, $last);

&look;

sub look {
        print "Names : @names\n";
        print "Cities: @cities\n";
}

Now we have two arrays. The pop function removes the last element of an array and returns it, which means you can do something like assign the returned value to a variable. The unshift function adds a value to the beginning of the array. Hope you didn't forget that &subroutinename calls a subroutine.

push	Adds value to the end of the array
pop	Removes and returns value from end of array
shift	Removes and returns value from beginning of array
unshift	Adds value to the beginning of array
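The four functions in the table can be tried together in a few lines. This is just a sketch with a made-up @letters array; the comments show what the array holds after each step.

```perl
@letters=("b","c","d");

unshift (@letters, "a");      # a b c d
push    (@letters, "e");      # a b c d e
$first = shift (@letters);    # takes "a" off the front, leaving b c d e
$last  = pop   (@letters);    # takes "e" off the end, leaving b c d

print "Removed $first and $last, leaving @letters\n";
```

So push and pop work on the end of the array, while unshift and shift do the same jobs at the beginning.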

Now, accessing other elements of arrays. May I present the splice function ?

Splice

@names=("Muriel","Sarah","Susanne","Gavin");

&look;

@middle=splice (@names, 1, 2);

&look;

sub look {
        print "Names : @names\n";
        print "The Splice Girls are: @middle\n";
}

The first argument for splice is an array. The second is the offset. The offset is the index number of the list element to begin splicing at. In this case it is 1. Then comes the number of elements to remove, which is sensibly 1 or more (2 in this case). You can set it to 0 and perl, in true perl style, won't complain. Setting it to 0 is handy because splice can add elements to the middle of an array, and if you don't want any deleted 0 is the number to use. Like so :

@names=("Muriel","Gavin","Susanne","Sarah");
@cities=("Brussels","Hamburg","London","Breda");

&look;

splice (@names, 1, 0, @cities[1..3]);

&look;

sub look {
        print "Names : @names\n";
        print "Cities: @cities\n";
}

Notice how the assignment to @middle has gone - it is no longer relevant. This is not to say nothing is returned from the function. It still returns the list of elements it removed, but as we asked it to remove none, that list is empty and there is nothing worth keeping.
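To check what splice hands back in each case, you could try something like this. It is only a sketch; @removed is a name made up for the illustration.

```perl
@names=("Muriel","Gavin","Susanne","Sarah");

@removed = splice (@names, 1, 2);              # takes out Gavin and Susanne
print "Removed ",scalar(@removed)," : @removed\n";
print "Left    : @names\n";

@removed = splice (@names, 1, 0, "Anna");      # removes nothing, inserts Anna
print "Removed ",scalar(@removed)," this time\n";
print "Now     : @names\n";
```

The second splice changes @names but leaves @removed empty.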

Splice is also the way to delete elements from an array. In fact, a discussion of :

Deleting Variables

is in order. Suppose we want to delete Hamburg from the following array. How do we do it ? Perhaps :

@cities=("Brussels","Hamburg","London","Breda");

&look;

$cities[1]="";

&look;

sub look {
        print "Cities: ",scalar(@cities), ": @cities\n";
}

would be appropriate. Certainly Hamburg is removed. Shame, such a great lake. But note, the array element still exists. There are still four elements in @cities. So what we need is the appropriate splice function, which removes the element entirely.

splice (@cities, 1, 1);

Now that's all well and good for arrays. What about ordinary variables, such as these :

$car ="Porsche 911";
$aircraft="G-BBNX";

&look;

$car="";

&look;

sub look {
        print "Car :$car: Aircraft:$aircraft:\n";
        print "Aircraft exists !\n" if $aircraft;
        print "Car exists !\n" if $car;
}

It looks like we have deleted the $car variable. Pity. But think about it. It is not deleted, it is just set to the null string "". As you recall (hopefully) from previous ramblings, the null string evaluates to false so the if test fails.

Just because something is false doesn't mean to say it doesn't exist. A wig is false hair, but a wig exists. Your variable is still there. Perl does have a function to test if something exists. Existence, in Perl terms, means defined. So :

print "Car is defined !\n" if defined $car;

will evaluate to true, as the $car variable does in fact exist.

This begs the question of how to really wipe variables from the face of the earth, or at least your Perl script. Simple.

$car     ="Porsche 911";
$aircraft="G-BBNX";

&look;

undef $car; # this undefines $car

&look;

sub look {
        print "Car :$car: Aircraft:$aircraft:\n";
        print "Aircraft exists !\n"  if $aircraft;
        print "Car exists !\n"       if defined $car;
}

This variable $car is eradicated, deleted, killed, destroyed.
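To keep the three states apart (true, false but defined, and undefined), here is a short sketch that walks $car through all of them. The numbered labels are just for this illustration.

```perl
$car="Porsche 911";                       # a true value
print "1: true\n"    if $car;
print "1: defined\n" if defined $car;

$car="";                                  # the null string: false, but defined
print "2: true\n"    if $car;
print "2: defined\n" if defined $car;

undef $car;                               # now false and undefined as well
print "3: true\n"    if $car;
print "3: defined\n" if defined $car;
```

Count the lines of output for each number and you have the whole story.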

And now for something completely different....

Regular Expressions

Or regex for short. These can be a little intimidating. But I'll bet you have already used some regex in your computing life so far. Have you ever said "I'll have any German beer" ? That's a regex which will match a Grolsch or Becks, but not a Budweiser, orange juice or cheese toastie. What about dir *.txt ? That's the same idea too (strictly a wildcard, a simpler cousin of the regular expression), listing any files ending in .txt.

Perl's regex often look like this :

$name=~/piper/

That is saying "If 'piper' is inside $name, then True."

The regular expression itself is between / / slashes, and the =~ operator assigns the target for the search.

An example is called for. Run this, and answer it with 'the faq'. Then try 'my tealeaves' and see what happens.

print "What do you read before joining any Perl discussion ? ";
chomp ($_=<STDIN>);

print "Your answer was : $_\n";

if ($_=~/the faq/) {
        print "Right !  Join up !\n";
} else {
        print "Begone, vile creature !\n";
}

So here $_ is searched for 'the faq'. Guess what we don't need ! The =~ . This works just as well:

if (/the faq/) {

because if you don't specify a variable, then perl searches $_ by default. In this particular case, it would be better to use

 if ($_ eq "the faq") {

as we are testing for exact matches.

But what if someone enters 'The FAQ' ? It fails, because the regex is case sensitive. We can easily fix that :

if (/the faq/i) {

with the /i switch, which specifies case-insensitivity. Now it works for all variations, such as "the Faq" and "the FAQ".

Now you can appreciate why a regular expression is better in this situation than a simple test using eq . As the regex searches one string for another string, a response of "I would read the FAQ first !" will also work, because "the FAQ" will match the regex.

Study this example just to clarify the above. Tabs and spaces have been added for aesthetic beauty :

$_="perl for Win32";                            # sets the string to be searched

if ($_=~/perl/) { print "Found perl\n" };       # is 'perl' inside $_ ?  $_ is "perl for Win32".
if (/perl/)     { print "Found perl\n" };       # same as the regex above.  Don't need the =~ as we are testing $_
if (/PeRl/)     { print "Found PeRl\n" };       # this will fail because of case sensitivity
if (/er/)       { print "Found er\n" };         # this will work, because there is an 'er' in 'perl'
if (/n3/)       { print "Found n3\n" };         # this will work, because there is an 'n3' in 'Win32'
if (/win32/)    { print "Found win32\n" };      # this will fail because of case sensitivity
if (/win32/i)   { print "Found win32 (i)\n" };  # this will *work* because of case insensitivity (note the /i)

print "Found!\n"  if      / /;                  # another way of doing it, this time looking for a space

print "Found!!\n" unless $_!~/ /;               # both these are the same, but reversing the logic with unless and !
print "Found!!\n" unless    !/ /;               # don't do this, it will always never not confuse nobody :-)
                                                # the ~ stays the same, but = is changed to ! (negation)

$find=32;                                       # Create some variables to search for
$find2=" for ";                                 # some spaces in the variable too

if (/$find/)  { print "Found '$find'\n" };      # you can search for variables like numbers
if (/$find2/) { print "Found '$find2'\n" };     # and of course strings !

print "Found $find2\n" if /$find2/;           # different way to do the above

As you can see from the last example, you can embed a variable in the regex too. Regular expressions could fill entire books (and they have done, see the book critiques at http://www.perl.com/) but here are some useful tricks:

@names=qw(Karlson Carleon Karla Carla Karin Carina Needanotherword);

foreach (@names) {                      # sets each element of @names to $_ in turn
        if (/[KC]arl/) {                # this line will be changed a few times in the examples below
                print "Match !  $_\n";
        } else {
                print "Sorry.   $_\n";
        }
}

This time @names is initialised using whitespace as a delimiter instead of a comma. qw refers to 'quote words', which means split the list by words. A word ends with whitespace (like tabs, spaces, newlines etc).

The square brackets enclose single characters to be matched. Here either Karl or Carl must be in each element for it to match. It doesn't have to be two characters, and you can use more than one set. Change line 4 (the if line) in the above program to :

if (/[KCZ]arl[sa]/) {

matches if something begins with K, C, or Z, then arl, then either s or a. It does not match KCZarl. Negation is possible too, so try this :

if (/[KCZ]arl[^sa]/) {

which returns things beginning with K, C or Z, then arl, and then anything EXCEPT s or a. The caret ^ has to be the first character, otherwise it doesn't work as the negation. Having said [ ] defines single characters only, I should mention that these two are the same :

/[abcdeZ]arl/;
/[a-eZ]arl/;

If you use a hyphen then you get the list of characters including the start and finish characters. And if you want to match a special character (metacharacter), you must escape it :

/[\-K]arl/;

matches Karl or -arl. Although the \- character is represented by two characters, it is just the one character to match.

If you want to match at the end of the line, make sure a $ is the last character in the regex. This one pulls out all those names ending in a. Slot it into the example above :

if (/a$/) {

And there is a corresponding character, the caret ^ , which in this context matches at the beginning of the string. Yes, the caret also negates a character class like this [^KCZ]arl but in this case it anchors the match to the beginning of the string.

if (/n/i)  {
if (/^n/i) {

The first one is true if the word contains an 'n' anywhere in it. The second specifies that the 'n' must be at the beginning of the string to be matched. Use this anchor where you can, because it makes the whole regex faster, and safer if you know what the first character must be.
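To compare the anchored and unanchored versions side by side, here is a sketch that counts how many of the names each regex matches. The three counter variables are made up for this illustration.

```perl
@names=qw(Karlson Carleon Karla Carla Karin Carina Needanotherword);

foreach (@names) {
        if (/n/i)  { $anywhere++ }     # an 'n' anywhere in the word
        if (/^n/i) { $at_start++ }     # an 'n' as the very first character
        if (/a$/)  { $at_end++   }     # an 'a' as the very last character
}

print "Contain an n    : $anywhere\n";
print "Start with an n : $at_start\n";
print "End with an a   : $at_end\n";
```

The anchored versions match far fewer names, which is the whole point of anchoring.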

If you want to negate the entire regex change =~ to !~ (Remember ! means 'not'.)

if ($_ !~/[KC]arl/) {

Of course, as we are testing $_ this works too :

if (!/[KC]arl/) {

Returning the Match

Now things get interesting. What if we want to pull something out of a string ? So far all we have done is test for truth, that is say yea or nay if a string matches, but not return what we found. Run this :

$_='My email address is <Robert@NetCat.co.uk>.';

/(<robert\@netcat.co.uk>)/i;

print "Found it ! $1\n";

Firstly, note the single quotes when $_ is assigned. If there were double quotes, we'd need \@ instead of @ . Remember, double quotes "" allow variable interpolation, so Perl looks for an array called @NetCat which does not exist.

Secondly, look at the parens around the entire regex. If you use parens, a side effect is that the first match is put into a variable called $1 . We'll get to the main effect later. The second match goes into $2 and so on. Also note that the @ has been escaped as \@ , so perl doesn't think it is an array. Remember \ either escapes a special character, or gives a special meaning. Think of it as Superman's telephone box. Imagine Clark Kent walking around with his magic partner Back Slash.

Notice how we specify in the regex case-insensitivity with /i and the regex returns the case-sensitive string - that is, exactly what it found.

Try the regex without parens. Then try this one :

/<(robert)\@netcat.co.uk>/i;

You can put the parens anywhere. More or less. Now, run this :

$_='My email address is <Robert@NetCat.co.uk>.';

/<(robert)\@(netcat.co.uk)>/i;

print "Found it ! $1 at $2\n";

See, you can have more than one ! Look at the above regex. Looks easy now, don't you think ? What about five minutes ago ? It would have looked like a typing mistake ! Well, there are some hairier regex to come, but you'll have a good barber.
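One habit worth forming early: $1 and $2 are only set by a successful match, so it is safer to test the match before using them. The sketch below tries the same regex on two strings, with the dots escaped so they match only real dots. The variables $user and $host are made up for this illustration.

```perl
foreach ('My email address is <Robert@NetCat.co.uk>.', 'No address here.') {
        if (/<(robert)\@(netcat\.co\.uk)>/i) {
                $user=$1;               # copy the captures while they are fresh
                $host=$2;
                print "Found it ! $user at $host\n";
        } else {
                print "No match in :$_:\n";
        }
}
```

Copying $1 and $2 into variables of your own straight away means a later regex can't change them under your feet.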

What if we didn't know what the email address was going to be ?

$_='My email address is <webslave@work.com>.';

print "Found it ! :$1:" if /(<.*>)/i;

When you see an if statement like this, read it right to left. The print statement is only executed if the code on the right of the expression is true.

We'll discuss this. Firstly, we have the opening parens ( . So everything from ( to ) will be put into $1 if the match is successful. Then the first character of what we are searching for, < . Then we have a dot, or period . . For this regex, we can assume . matches any character at all.

So we are now matching < followed by any character. The * means 0 or more of the previous character. The regex finishes by requiring > .

This is important. Get the basics right and all regex are easy (I read somewhere once). An example best illustrates the point. Slot this regex in instead :

$_='My email address is <webslave@work.com>.';

print "Found it ! :$1:" if /(<*>)/i;

What's happening here ?

The regex starts, logically, at the start of the string. The first thing it finds is not < , but a nothing in between the start of the string and the 'M' from 'My email...'. Does this match ?

As we are looking for "0 or more" < , we can certainly say that there are 0 < at the start of the string. So the match is, so far, successful. We have dealt with <* .

However, the next item to match is > . Unfortunately, the next item in the string is 'M', from 'My email...'. The match fails at this point. Sure, it matched <* without any problem, but the complete match has to work.

The only two characters that can match successfully at this point are < or > . The < matches because it falls into the '0 or more' specified with * , and > will match because it is the next character specified in the regex.

'M' is neither of them, so it fails.

Quick clarification - the regex cannot successfully match < , then skip on ahead until it matches > . The characters in between < and > also need to match the regex, and they don't in this case.

All is not lost. Regexes are hardy little beasts and don't give up easily. An attempt is made to match the regex wherever possible. The regex system keeps trying the match at every possible place in the string, working towards the end.

Let's look at the match when it reaches the 'm' in 'work.com'.

Again, we have here 0 < . So the match works as before. After success on <* the next character is analysed - it is a > , so the match is successful.

That's * explained. Just to consolidate, a quick look at :

$_='My email address is <webslave@work.com>.';
print "Match 1 worked :$1:" if /(<*)/i;

$_='<My email address is <webslave@work.com>.';
print "Match 2 worked :$1:" if /(<*)/i;

$_='My email address is <webslave@work.com<<<<>.';
print "Match 3 worked :$1:" if /(<*>)/i;

Match 1 is true. It doesn't return anything, but it is true because there are 0 < at the very start of the string.

Match 2 works. After the 0 < at the start of the string, there is 1 < so the regex can match that too.

Match 3 works. After failing on the first < , it jumps to the second. After that, there are plenty more to match right up until the required ending.

Glad you followed that. Now, pay even closer attention ! Concentrate fully on the task at hand ! This should be straightforward now :

$_='HTML <I>munging</I> time !.';

/<I>(.*)<\/I>/i;

print "Found it ! $1\n";

Pretty much the same as the above, except the parens are moved so we return what's only inside the tags, not including the tags themselves. Also note how / is escaped like so : \/ otherwise Perl thinks that's the end of the regex.

Now, suppose we change $_ to :

$_='HTML <I>munging</I> time is here <I>again</I> !.';

and run it again. Interesting effect, eh ? This is known as Greedy Matching. What happens is that when Perl finds the initial match, that is <I> , it jumps right to the end of the string and works back from there to find a match, so the longest string matches. This is fine unless you want the shortest string. And there is a solution :

/<I>(.*?)<\/I>/i;

Just add a question mark and Perl does stingy matching. No nationalistic jokes. I have Dutch and Scottish friends I don't want to offend.
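If you want the two behaviours side by side, this sketch runs both versions of the regex against the same string. $greedy and $stingy are names made up for the illustration.

```perl
$_='HTML <I>munging</I> time is here <I>again</I> !.';

/<I>(.*)<\/I>/i;         # greedy: .* grabs as much as it can
$greedy=$1;

/<I>(.*?)<\/I>/i;        # stingy: .*? grabs as little as it can
$stingy=$1;

print "Greedy : :$greedy:\n";
print "Stingy : :$stingy:\n";
```

Same string, same tags, and the only difference between the two regexes is the one question mark.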

Suppose we didn't know what HTML tag we had to match ? It could be B, I, EM or whatever, and we want everything that is in between. Well, HTML container tags like B and EM have end tags which are the same as the start tag, except for the / . So what we could do is :

find out what is inside < >
search for exactly the same tag, but with the closing /
return whatever is in between.

Can this be done ? Of course. This is perl, all things are possible. Now, remember the side effect of parens. I promise I'll explain the primary effect at some point. If whatever is in (parens) matches, the result is stored in a variable called $1 . So we can use <(.*?)> which will find us < then as many anythings (the . and * ) up to the next, not last > (the ? forces stingy matching).

The result is stored in $1 because we used parens. Next, we need everything up to the closing tag. That's easy : (.*?) matches everything up until the next character or set of characters. And how exactly do we define where to stop ?

We can use $1 even in the same regex it was found in. However, it is not referred to within a regex as $1 , but \1 .

So we want to match </$1> which in perl code is <\/\1> . The / must be escaped because otherwise it would end the regex, and 1 is escaped so it refers to $1 instead of matching the number 1.

Still here ? This is what it looks like :

$_='HTML <I>munging</I> time is here <I>again</I> !.';
/<(.*?)>(.*?)<\/\1>/i;

print "Found it ! $2\n";
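To convince yourself that \1 really does insist on the same tag, try the regex on a few different strings. This is a sketch only; the three test strings and the @tags array are made up for the illustration, and the last string deliberately has tags that don't pair up.

```perl
foreach ('HTML <I>munging</I> time',
         'What <EM>fun</EM> !',
         'A <B>mismatch</I> here') {
        if (/<(.*?)>(.*?)<\/\1>/i) {
                push (@tags, $1);       # remember which tags we found
                print "Tag :$1: contains :$2:\n";
        } else {
                print "No matching pair of tags in :$_:\n";
        }
}
```

The first two strings match whatever the tag happens to be, while the third fails because no closing tag agrees with an opening one.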

If you want to know how to return all the matches above, read on. But before that - How to Avoid Making Mountains while Escaping Special Characters.

You want to match this : http://language.perl.com/faq/ . That's a real (useful) URL by the way. Hint. To match it, you need to do this :

/http:\/\/language\.perl\.com\/faq\//;

which should make the awful metaphor above clearer, if not funnier. The slash, / , is not normally a metacharacter but as it is being used for the regular expression delimiters, it needs to be escaped. We already know that . is special.

Fortunately for our eyes, Perl allows you to pick your delimiter if you prefix it with 'm' as this example shows. We'll use a # :

m#http://language\.perl\.com/faq/#;

Which is a huge improvement, as we change / to # . We can go further with readability by quoting everything :

m#\Qhttp://language.perl.com/faq/\E#;

The \Q escapes everything up until \E or the regex delimiter (so we don't really need the \E above). In this case # will not be escaped, as it delimits the regex.
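Where \Q really earns its keep is when the thing to match is sitting in a variable. The sketch below is an illustration with made-up variables ( $url and $decoy ): it shows that without \Q the dots in the URL match any character at all, so a lookalike string gets through.

```perl
$url   ='http://language.perl.com/faq/';
$_     ='The FAQ lives at http://language.perl.com/faq/ these days.';
$decoy ='Not at http://languageXperlYcom/faq/ though.';

print "Quoted regex matches the real URL\n"  if /\Q$url\E/;
print "Unquoted regex matches the decoy\n"   if $decoy =~ /$url/;
print "Quoted regex matches the decoy\n"     if $decoy =~ /\Q$url\E/;
```

Only two of the three lines should appear when you run it.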

Someone once posted a question about this to the Perl-Win32-Users mailing list and I was so intrigued about this apparently undocumented trick I spent the next twenty minutes figuring it out by trial and error, and posted a reply. Next day I found lots of messages telling the poster to read the manual because it was clearly documented. <face colour='red' intensity='high'> My excuse was I didn't have the docs to hand....moral of the story - RTFM and RTF FAQs !

Substitution

Suppose you want to replace bits of a string. For example, 'us' with 'them'.

$_='Us ? The bus usually waits for us, unless the driver forgets us.';

print "$_\n";

s/Us/them/;   # operates on $_, otherwise you need $foo=~s/Us/them/;

print "$_\n";

What happens here is that the string 'Us' is searched for, and when a match is found it is replaced with the right side of the expression, in this case 'them'. Simple.

You'll notice that only one substitution was made. To match globally use /g which runs through the entire string, changing wherever it can. Try:

s/Us/them/g;

which fails to change anything more. This is because regexes are, by default, case-sensitive, and the rest are all lower case. So:

s/us/them/ig;

would be a better bet. Now, everything is changed. A little too much, but one problem at a time. Everything you have learned about regex so far can be used with s/// , like parens, character classes [ ] , greedy and stingy matching and much more. Deleting things is easy too. Just specify nothing as the replacement, like so s/Us//; .

So we can use some of that knowledge to fix this problem. We need to make sure that a space precedes the 'us'. What about :

s/ us/them/g;

A small improvement. The first 'Us' is now no longer changed, but one problem at a time ! We'll first consider the problem of the regex changing 'usually' and other words with 'us' in them.

What we are looking for is a space, then 'us', then a comma, period or space. We know how to specify one of a number of options - the character class.

s/ us[. ,]/them/g;

Another tiny step. Unfortunately, that step wasn't really in the right direction, more on the slippery slope to Poor Programming Practice. Why ? Because we are limiting ourselves. Suppose someone wrote ' send it to us; when we get it'.

You can't think of all the possible permutations. It is often easier, and safer, to simply state what must not follow the match. In this case, it can be anything except a letter. We can define that as a-z. So we can add that to the regex.

s/ us[^a-z]/ them/g;

The caret ^ negates the character class, and a-z represents every letter from a to z inclusive. A space has been added to the substitution part - as the original space was matched, it should be replaced to maintain readability.

What would be more useful is to use a-zA-Z instead, which is what you need whenever the regex is case-sensitive (that is, has no /i ). As a-zA-Z is such a common construct, Perl provides an easy shorthand :

s/ us[^\w]/ them/g;

The \w construct actually means 'word' - equivalent to a-zA-Z_0-9 . So we'll use that instead.

To negate any construct, simply capitalise it :

s/ us[\W]/ them/g;

and of course we don't need the negating caret now. In fact, we don't even need the character class !

s/ us\W/ them/g;

So far, so good. Matching the first 'us' is going to be difficult though. Fortunately, there is an easy solution. We've seen Perl's definition of a word - \w . Between each word is a boundary. You can match this with \b :

s/\bus\W/ them/ig;

(that's \b followed by 'us', not 'bus' :-)

Now, we require a word boundary before 'us', and the /i is back so that the capital in 'Us' doesn't matter. As there is a 'nothing' at the start of the string, we have a match. There is a space after the first 'Us', so the match is successful. You might notice an extra space has crept in - that's the space we added earlier. The match doesn't include the space any more - it matches on the word boundary, that is just before the word begins. The space doesn't count.

Did you notice the final period and the comma are replaced ? They are part of the match - it is the \W that matches them. We can't avoid that. We can however put back that part of the match.

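s/\bus(\W)/them$1/ig;

We start with capturing whatever the \W matches, using parens. Then, we add it to the replacement string. The capture is of course in $1 . Inside the regex itself we would refer to it as \1 , but the replacement side is not a regex, it is treated like a double quoted string, so there we use $1 . (Perl will accept \1 in the replacement too, but its documentation recommends $1 .)

The final problem is of course capitalising the replacement string when appropriate. Well, I have to leave something as an exercise for the reader :-)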
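To see how far we have come, here is a sketch that runs three of the stages against a fresh copy of the string each time. The $original and $step variables are made up for this illustration, and the last stage uses $1 in the replacement, with /i so that the leading 'Us' is caught.

```perl
$original='Us ? The bus usually waits for us, unless the driver forgets us.';

$_=$original;  s/us/them/ig;             $step1=$_;   # changes far too much
$_=$original;  s/ us[^a-z]/ them/g;      $step2=$_;   # eats the punctuation
$_=$original;  s/\bus(\W)/them$1/ig;     $step3=$_;   # puts the punctuation back

print "1: $step1\n";
print "2: $step2\n";
print "3: $step3\n";
```

Compare the three lines of output with the original and you can see what each refinement bought us.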

There are several more constructs. We'll take a quick look at \d which means anything that is a digit, that is 0-9 . First we'll use the negated form, \D , which is anything except 0-9 :

print "Enter a number :";
chop ($input=<STDIN>);

if ($input=~/\D/) {
        print "Not a number !!!!\n";
} else {
        print 'Your answer is ',$input x 3,"\n";

}

this checks that there are no non-number characters in $input . It's not perfect because it'll choke on decimal points, but it's just an example. Writing your own number-checker is actually quite difficult, but it is an interesting exercise. Try it, and see how accurate yours is.
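As a starting point for that exercise, here is a minimal sketch of a slightly stricter checker. It assumes a 'number' means an optional sign, some digits and an optional decimal part. The sub name looks_like_simple_number is made up for this illustration, and exponents such as 1e5 are deliberately not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text is an optionally signed
# integer or decimal such as 42, -7 or 3.14 .
sub looks_like_simple_number {
        my ($text) = @_;
        return $text =~ /^[+-]?\d+(\.\d+)?$/ ? 1 : 0;
}

foreach my $try ('42', '-7', '3.14', '3.', 'abc', '12a') {
        my $verdict = looks_like_simple_number($try) ? 'a number' : 'not a number';
        print "'$try' is $verdict\n";
}
```

The ^ and $ anchors are what make this a check of the whole input rather than a search for a number somewhere inside it.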

I hope you trusted me and typed the above in exactly as it is shown (or pasted it), because the x is not a mistake, it is a feature. If you were too smart and changed it to a * or something, change it back and see what it does.

Of course, there is another way to do it :

unless ($input=~/\D/) {
        print 'Your answer is ',$input x 3,"\n";
} else {
        print "Not a number !!!!\n";
}

which reverses the logic with an unless statement.

More Matching

Assume we have :

$_='HTML <I>munging</I> time is here <I>again</I> !.';

and we want to find all the italic words. We know that /g will match globally, so surely this will work :

$_='HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';

$match=/<i>(.*?)<\/i>/ig;

print "$match\n";

except it returns 1, and there were definitely two matches. The match operator returns true or false, not the number of matches. So you can test it for truth with functions like if , while and unless . Incidentally, the s/// operator does return the number of substitutions.

To return what is matched, you need to supply a list.

($match) = /<i>(.*?)<\/i>/i;

which handily puts the first match into $match . The parens force a list context in this case. There is just the one element in the list, but it is still a list. What gets assigned to the list is whatever the parens in the regex capture. Try adding another variable :

$_='HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';

($word1, $word2) = /<i>(.*?)<\/i>/ig;

print "Word 1 is $word1 and Word 2 is $word2\n";

In the example above notice /g has been added so a global match is done - this means perl carries on matching even after it finds the first match. Of course, you might not know how many matches there will be, so you can just use an array (or other type of list) :

$_='HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';

@words = /<i>(.*?)<\/i>/ig;

foreach $word (@words) {
        print "Found $word\n";
}

and @words will be grown to the appropriate size for the matches. You really can supply what you like to be assigned to :

($word1, @words[2..3], $last) = /<i>(.*?)<\/i>/ig;

you'll need more italics for that last one to work. It was only a demonstration.

There is another trick worth knowing. Because a regex returns true each time it matches, we can test that and do something every time it returns true. The ideal function is while which means 'do something as long as the condition I'm testing is true'. In this case, we'll print out the match every time it is true.

$_='HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';

while (/<(.*?)>(.*?)<\/\1>/g) {
        print "Found the HTML tag $1 which has $2 inside\n";
}

So the while operator runs the regex, and if it is true, carries out the statements inside the block.

Try running the program above without the /g . Notice how it loops forever ? That's because the expression always evaluates to true. By using the /g we force the match to move on until it eventually fails.

Now we know this, an easy way to find the number of matches is :

$_='HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';

$found++ while /<i>.*?<\/i>/ig;

print "Found $found matches\n";

You don't need braces in this case as nothing apart from the expression to be evaluated follows the while function.
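The same count can be had without a loop. This is only a sketch, assuming the same sample string as above: a match with /g in list context returns every match, and assigning that list to an empty list, then to a scalar, gives the number of elements.

```perl
use strict;
use warnings;

my $html = 'HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';

# List context: /g returns everything the parens captured.
my @italics = $html =~ /<i>(.*?)<\/i>/ig;

# The "empty list" assignment counts the matches without keeping them.
my $count = () = $html =~ /<i>.*?<\/i>/ig;

print "Found $count matches: @italics\n";
```

Whether this reads better than the while version is a matter of taste; the loop is arguably easier to follow.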

Parentheses Again

The real use for them. Precedence. Try this :

$_='One word sentences ? Eliminate. Avoid cliches like the plague.  They are old hat.';

while (/o(rd|ne|ld)/gi) {
        print "Matched $1\n";
}

Firstly, notice the subtle introduction of the or operator, in this case | , the pipe. What I really want to explain however, is that this regex matches o followed by rd, ne or ld. Without the parens it would be /ord|ne|ld/ which is definitely not what we want. That matches just plain ord, or ne or ld.

Finally, take a look at this :

$_='I am sleepy....zzzz....DING ! Wake Up!';

if (/(z{5})/) {
        print "Matched $1\n";
} else {
        print "Match failed\n";
}

The braces { } specify how many of the preceding character to match. So z{2} matches exactly two 'z's and so on. Change z{5} to z{4} and see how it works. And there's more...

/z{3}/	3 z only
/z{3,}/	At least 3 z
/z{1,3}/	1 to 3 z
/z{4,8}/	4 to 8 z

To any of the above you may suffix a question mark, the effect of which is demonstrated thus :

print "How many letters do you want to match ? ";
chomp($num=<STDIN>);

# we assign and print in one smooth move
print $_="The lowest form of wit is indeed sarcasm, I don't think.\n";

print "Matched \\w{$num,} : $1 \n"  if /(\w{$num,})/;

print "Matched \\w{$num,}?: $1 \n"  if /(\w{$num,}?)/;

The first match is 'match any word (that's a-zA-Z0-9_) equal to or longer than $num characters, and return it.' So if you enter 4, then 'lowest' is returned. The word 'The' doesn't match.

The second match is exactly the same, but the ? forces a minimal match, so only the first $num characters of the word are returned.

Just to clear this up, amend the program thus :

print "\nMatched \\w{$num,} :";
print "$1 " while /(\w{$num,})/g;

print "\nMatched \\w{$num,}? :";
print "$1 " while /(\w{$num,}?)/g;

Note the addition of /g . Try it without - notice how the match never moves on ?
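If you'd rather see the two side by side without typing a number in, here is a small sketch with $num fixed at 4 (an arbitrary choice for the illustration). The variable names are mine, not part of the program above.

```perl
use strict;
use warnings;

my $text = "The lowest form of wit is indeed sarcasm, I don't think.";
my $num  = 4;

my ($greedy)  = $text =~ /(\w{$num,})/;    # as many word characters as possible
my ($minimal) = $text =~ /(\w{$num,}?)/;   # stops as soon as $num are matched

print "Greedy  : $greedy\n";
print "Minimal : $minimal\n";
```

Both matches start at the same place, the first run of at least four word characters; only the point where they stop differs.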

And now on the Regex Programme Today, we have guest stars Prematch, Postmatch and Match. All of whom are going to slow our entire programme down, but are useful anyway :

$_='I am sleepy....snore....DING ! Wake Up!';

/snore/;        # look, no parens !

print "Postmatch: $'\n";
print "Prematch: $`\n";
print "Match: $&\n";

If you are wondering what the difference between match and using parens is you should remember that you can move the parens around, but you can't vary what $& and its ilk return. Also, using any of the above three variables does slow your entire program, whereas using parens will just slow the particular regex you use them for. However, once you've used one of the three matches you might as well use them all over the place as you've paid the speed penalty. Use parens where possible.
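To show what 'use parens' might look like in practice, here is one possible sketch that gets the same three pieces using captures only. The variable names $before, $matched and $after are invented for the example, and it assumes the text contains no newlines (the dot doesn't match those by default).

```perl
use strict;
use warnings;

my $text = 'I am sleepy....snore....DING ! Wake Up!';

# Three captures stand in for prematch, match and postmatch.
my ($before, $matched, $after);
if ($text =~ /^(.*?)(snore)(.*)$/) {
        ($before, $matched, $after) = ($1, $2, $3);
        print "Prematch: $before\n";
        print "Match: $matched\n";
        print "Postmatch: $after\n";
}
```

The minimal .*? at the front matters: it makes sure the first 'snore' in the string is the one that ends up in $2.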

RHS Expressions

RHS means Right Hand Side. Suppose we have an HTML file, which contains :

<FONT SIZE=2> <FONT SIZE=4> <FONT SIZE=6>

and we wish to double the size of each font so 2 becomes 4 and 4 becomes 8 etc. What about :

$data="<FONT SIZE=2> <FONT SIZE=4> <FONT SIZE=6>";

print "$data\n";

$data=~s/(size=)(\d)/\1\2 * 2/ig;

print "$data\n";

which doesn't really work out. What this does is match size=x, where x is any digit. The first match, size=, goes into $1 and the second match, whatever the digit is, goes into $2 . The second part of the regex simply prints $1 and $2 (referred to as \1 and \2 ), and attempts to multiply $2 by 2. Remember /i means case insensitive matching.

What we need to do is evaluate the right hand side of the regex as an expression - that is not just print out what it says, but actually evaluate it. That means work it through, not blindly treat it as string. Perl can do this :

$data=~s/(size=)(\d)/$1.($2 * 2)/eig;

A little explanation....the LHS is the same as before. We add /e so Perl evaluates the RHS as an expression. So we need to change \1 into $1 and so on. The parens are there to ensure that $2 * 2 is evaluated, then joined to $1 . And that's it ! Here's another example, which is a little more cunning :

$red="960000";
$white="FFFFFF";
$yellow="FFFF33";

$data='<FONT COLOR=yellow> <FONT COLOR=white> <FONT COLOR=red>';

print "$data\n";

$data=~s/(color=)(\w+)/$1.${$2}/eig;

print "$data\n";

This one is interesting because it refers to a variable of the same name as the replaced string. The { } are needed around $2 to force Perl to realise that you mean a scalar variable with the name of whatever $2 contains. The \w matches a single word character, and + means 'one or more of' the character immediately before it - in this case, \w .

Of course, this regex does not consider quoted parameters to HTML tags but I have to leave something as an exercise for the reader...
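Looking a variable up by name works, but it does mean any word in the data can reach any variable in your program. One alternative, sketched here, is to keep the colours in a hash (hashes are covered later in this course) and look them up there. The hash name %colour, the colour values and the extra 'puce' tag are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Colour values are illustrative only.
my %colour = (
        red    => '960000',
        white  => 'FFFFFF',
        yellow => 'FFFF33',
);

my $data = '<FONT COLOR=yellow> <FONT COLOR=white> <FONT COLOR=red> <FONT COLOR=puce>';

# Replace a name only when the hash knows it; otherwise leave it alone.
$data =~ s/(color=)(\w+)/$1 . (exists $colour{lc($2)} ? $colour{lc($2)} : $2)/eig;

print "$data\n";
```

Unknown names such as puce pass through untouched instead of quietly turning into an empty string.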

It is even possible to have more than one /e . For example:

$data='important perl names are $names';
$names="Camel, Llama, ActiveState, Perl";

print "$data\n";

$data=~s/(\$[a-zA-Z]+)/$1/ee;

print "$data\n";

This is very useful. Notice that \w is not used. This is because \w will match [a-zA-Z_0-9] and Perl variables may not start with a number, because $1 , $2 etc are of course reserved for use by regex. You could write a more complicated regex to more precisely match variables, but that's a start.

Split and Join

While you are in the regex mood, a quick look at split and join . Destruction is always easier (just ask your car mechanic), so let's start with split :

$_='Piper:PA-28:Archer:OO-ROB:Antwerp';

@details=split /:/, $_;

foreach (@details) {
        print "$_\n";
}

Here split is given two arguments. The first one is a regex specifying what to split on. The next is what to split. Actually, I could leave $_ out because as usual it is the default if nothing is specified.

The assignment can either be a scalar variable or a list like an array (or hash, but at this time 'hash' to you means what you think the Dutch do or a silly drinking event spoilt by some running). If it's a scalar variable you get the number of elements the split has splut. Should that be 'the split has splittered' or 'the split has splat'. Hmmm. Probably 'the split has split'. You know what I mean. I think I just generated a Fatal Error in English.dll. Whoops. In any case, splitting to a scalar variable is not always a Good Thing, as we'll see later.

If the assignment is an array, then as you can see in the above example the array is created with the relevant elements in order. You can also assign to scalars, for example :

$_='Piper:PA-28:Archer:OO-ROB:Antwerp';

($maker,$model,$name,$reg,$location) = split /:/, $_;
(@aircraft[0..1],$aname,@regdetails) = split /:/, $_;

$number=split /:/ ;             # not bothering with the $_ at the end, as it is the default

print "Using the first 'split'\n";
print "$reg is a $maker $model $name based in $location\n";
print "There are $number details available on this aircraft\n\n";

print "Using the second 'split'\n";
print "You can find $regdetails[0], an $aircraft[1], $regdetails[1]\n";

This demonstrates that a list can be a list of scalar variables (which is basically what an array is anyway), and that you can easily see how many elements the expression can be split into.

The example below adds a third parameter to split, which is how many elements you want returned. If you don't want the extra stuff at the end, pop it.

$_='Piper:PA-28:Archer:OO-ROB:Antwerp';

@details=split /:/, $_, 3;

foreach (@details) {
        print "$_\n";
}

In the example below we split on whitespace. Whitespace, in perl terms, is a space, tab, newline, formfeed or carriage return. Instead of writing a space and \t\n\f\r for each of the above, you can simply use \s , or the negated version \S which means anything except whitespace. Think of whitespace as anything you know is there, but you can't see.

The whitespace split is specially optimised for speed. I've used spaces, double spaces, a tab and a newline in the list below. Also note the + , which means one or more of the preceding character, so it will split on any combination of whitespace. And I think the final split is useful to know. The split function does not return the delimiter, so in this case the whitespace will not be returned.

$_='Piper       PA-28  Archer           OO-ROB
Antwerp';

@details=split /\s+/, $_;

foreach (@details) {
        print "$_\n";
}

@chars=split //, $details[0];

foreach $char (@chars) {
        print "$char !\n";
}

The following question has come up at least three times in the Perl-Win32-Users mailing list. Can you answer it ?

"My data is delimited by |, for example :
name|age|sex|height|
Why doesn't
@array=split /|/, $line;
work ?"

Why indeed. If you don't already know the answer, some simple troubleshooting steps can be applied. First, create a sample program and run it.

$line='name|age|sex|height';

@array=split /|/,$line;

print join "\n",@array;

The effect is to split each character. The | is returned too. As it is the delimiter, it should be ignored, not returned.

At this point you should be thinking 'metacharacter'. A little research (looking at the documentation) will reveal that | is indeed a metacharacter, which means 'or'. So, in effect, the regex /|/ means 'nothing, or nothing'. The split is therefore performed on 'nothings', and there are 'nothings' in between each character. The solution is easy :

/\|/
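A quick way to convince yourself is to run both versions and count the pieces. This is just a sketch; the array names @wrong and @right are mine.

```perl
use strict;
use warnings;

my $line = 'name|age|sex|height';

my @wrong = split /|/,  $line;   # splits between every character
my @right = split /\|/, $line;   # splits on a literal pipe

print scalar(@wrong), " pieces without the backslash\n";
print scalar(@right), " pieces with it: @right\n";
```

Without the backslash you get one piece per character of the line, pipes included; with it you get the four fields.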

So that's the fun stuff, destruction. Now to put it back together again with join .

What Humpty Dumpty needs : Join

$w1="Mission critical ?";
$w2="Internet ready modems !";
$w3="J(insert your cool phrase here)";  # anything prefixed by 'J' is now cool ;-)
$w4="y2k compatible.";
$w5="We know the Web.";
$w6="...the leading product in an emerging market.";

$cool=join ' ', $w1,$w2,$w3,$w4,$w5,$w6;

print $cool;

Join takes a 'glue' operator, which is not a regular expression. It can be a scalar variable however. In this case it is a space. Then it takes a list, which can either be a list of scalar variables, an array or whatever as long as it's a list. And you can see what the result is. You could assign it to an array, but you'd end up with everything in the first element of the array.

The example below adds an array into the list, and demonstrates use of a variable as the delimiter.

$w1="Mission critical ?";
$w2="Internet ready modems !";
$w3="J(insert your cool phrase here)";  # anything prefixed by 'J' is now cool ;-)
$w4="y2k approved, tested and safe !";
$w5="We know the Web.";
$w6="...the leading product in an emerging market.";
@morecool=("networkable","compatible");

$sep=" ";

$cool=join $sep, $w1,$w2,$w3,@morecool,$w4,$w5,$w6;

print $cool;

Aren't you wishing you could mix and match randomly so you too could get a job marketing vapourware ? Heh.

@cool=("networkable","compatible","Mission critical ?","Internet ready modems !",
"J(insert your cool phrase here)","y2k approved, tested and safe !",
"We know the Web.","...the leading product in an emerging market.");
srand;

print "How many phrases would you like ?";
while (1) {
        chop ($input=<STDIN>);
        if ($input <= @cool && $input > 0) {
                last;
        }
        print 'Wrong.  Try again !';
}

for (1..$input) {
        $index=int(rand @cool);
        print "$cool[$index] ";
        splice @cool, $index, 1;
}

A few things to explain. Firstly, while (1) { . We want an everlasting loop, and this is one way to do it. 1 is always true, so round it goes. We could test $input directly, but that wouldn't allow last to be demonstrated.

Everlasting loops aren't useful unless you are a politician being interviewed. We need to break out at some point. This is done by the last function. When $input is between 1 and the number of elements in @cool then out we go. (You can also break out to labels, in case you were wondering. And break out in a sweat. Don't start now if you weren't.)

The srand operator initialises the random number generator. Works ok for us, but CGI programmers should think of something different because their programs are so frequently run (they hope :-).

rand generates a random number between 0 and 1, or 0 and a number it is given. In this case, the number of elements of @cool . The int function makes sure it is an integer, that is no messy bits after the decimal point.

The splice function removes the printed element from the array so it won't appear again. Don't want to stress the point.
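The rand, int and splice combination is handy enough to wrap up for reuse. Here is a minimal sketch; the sub name pick_phrases is made up for this example, and it works on a copy so the original array is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: return $want distinct, randomly chosen items.
sub pick_phrases {
        my ($want, @pool) = @_;          # @pool is a copy, the caller's array is untouched
        my @picked;
        while ($want-- > 0 && @pool) {
                my $index = int(rand @pool);        # 0 .. last index of what is left
                push @picked, splice(@pool, $index, 1);
        }
        return @picked;
}

my @cool = ("networkable", "compatible", "Mission critical ?", "We know the Web.");
my @chosen = pick_phrases(3, @cool);
print join(' ', @chosen), "\n";
```

Asking for more phrases than there are simply returns them all, in random order, rather than looping forever.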

Another Join Type Operator

There is another joining operator, this time the humble dot, or period : . . This concatenates (joins) variables :

$x="Hello";
$y=" World";
$z="\n";

print "$x\n";           # print $x and a newline

$prt=$x.$y.$z;          # make a new var $prt out of $x, $y and $z

print $prt;

$x.=$y." again ".$z;    # add stuff to $x

print $x;

Files

Perl is very good at handling files. Create, in your perl scripts directory c:\scripts, a file called stuff.txt. Copy the following into it :

The Main Perl Newsgroup:comp.lang.perl.misc
The Perl FAQ:http://www.perl.com/faq/
Where to download perl:http://www.activestate.com/

Now, to open and do things with this file. First, we must open the file and assign it to a filehandle. All operations will be done on the file via the filehandle. Earlier, we used <STDIN> as a filehandle - we read from it.

$stuff="c:\scripts\stuff.txt";

open STUFF, $stuff;

while (<STUFF>) {
        print "Line number $. is : $_";
}

What this script does is fail. What it should do is open the file defined in $stuff , assign it to the filehandle STUFF and then, while there are still lines left in the file, print the line number $. and the current line.

It fails. That's not so bad, everything fails sometimes. What is unforgivable is NOT CHECKING THE ERROR CODE !

This is a better line:

open STUFF, $stuff or die "Cannot open $stuff for read :$!";

If the open operation fails, the or means that the code on the RHS (right hand side) is evaluated. Perl dies. This means it exits the script with a message and tells you the line number at which it died. The error code is in $! , which we print.

Always check your return codes !

The problem should now be apparent. The backslashes, being escape characters, are not displayed. There are two ways to fix this :

Escape the backslashes, like so $stuff="c:\\scripts\\stuff.txt";
Convert backslashes into forward slashes : $stuff="c:/scripts/stuff.txt";

The forward slashes are the preferred option, even under Win32, because you can then port the script direct to Unix or other platforms (assuming you don't use drive letters), and it is less typing. If you wish to use Perl to start external processes then you must use the \\ method, but this variable will be used only in a Perl program, not as a parameter to start an external program. Changing the $stuff variable results in a working script. Always check your return codes !

$stuff="c:/scripts/stuff.txt";

open STUFF, $stuff or die "Cannot open $stuff for read :$!";


while (<STUFF>) {
        print "Line $. is : $_";
}

A little more detail on what is happening here. The file is opened for read. You can append and write too. You don't have to use a variable, but I always do because it is then easy to insert into the or die section, and easy to change later on. Hardcoding things is not the best way to write a maintainable and flexible program. Just ask the Year 2000 people about code that lived a little longer than the authors imagined :-).

open STUFF, "c:/scripts/stuff.txt" or die "Cannot open stuff.txt for read :$!";

is just as good but more work if you want to change anything.

The line input operator (that's the angle brackets <> ) reads from the beginning of the file up until and including the first newline. The read data goes into $_ , and you can do what you want with it there. On the next iteration of the loop data is read from where the last read left off, up to the next newline. And so on until there is no more data. When that happens the condition is false and the loop terminates. That's the default behaviour, but we can change this.

This means that you can open a 200Mb file in perl and run through it without having to load the entire file into memory. 200Mb of memory is quite a bit. If you really want to load the entire 200Mb file into one variable, Perl lets you. Limits are not the Perl Way.

The special variable $. is the current line number, starting at 1.
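Here is a self-contained sketch of the read loop that you can run without creating stuff.txt first. It assumes the standard File::Temp module is available to make a scratch file; the file contents and the @seen array are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small scratch file so the example is self-contained.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "first\nsecond\nthird\n";
close $tmp or die "Cannot close $path :$!";

open STUFF, $path or die "Cannot open $path for read :$!";

my @seen;
while (<STUFF>) {
        chomp;                          # drop the trailing newline
        push @seen, "$. : $_";          # $. is the current line number
}
close STUFF;

print "$_\n" foreach @seen;
```

Only one line is held in $_ at any moment, which is why the same loop copes with a very large file.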

As usual, there is a quicker way to do the previous program.

$STUFF="c:/scripts/stuff.txt";

open STUFF or die "Cannot open $STUFF for read :$!";

while (<STUFF>) {
        print "Line $. is : $_";
}

and as that saves a little bit of typing I tend to use it. Reduces the possibility for error too. In fact, that entire program could be compressed further, but that's for later.

Writing to a File

$out="c:/scripts/out.txt";

open OUT, ">$out" or die "Cannot open $out for write :$!";

for $i (1..10) {
        print OUT "$i : The time is now : ",scalar(localtime),"\n";
}

Note the addition of > to the filename. This opens it for writing. If we want to print to the filehandle, we now just specify the filehandle name. Filehandles don't have to be capitalised, but it is wise. All Perl functions are lowercase, and Perl is case-sensitive. So if you choose uppercase names they are guaranteed not to conflict with current or future function words.

And a neat way to grab the date sneaked in there too. More on dates later.

$out="c:/scripts/out.txt";

&printfile;

open OUT, ">>$out" or die "Cannot open $out for append :$!";

print OUT 'The time is now : ',scalar(localtime),"\n";

close OUT;

&printfile;

sub printfile {
        open IN, $out or die "Cannot open $out for read :$!";
        while (<IN>) {
                print;
        }
        close IN;
}

This script demonstrates subroutines again, and how to append to a file, that is write additional data at the end. The close function is introduced here. This, well, closes a filehandle. You don't have to close a filehandle - just leave it open until the script finishes, or the next open command to the same filehandle will close it for you.

Perl has a special array called @ARGV . This is the list of arguments passed along with the script name on the command line. Run the following perl script as :

perl myscript.pl hello world how are you


foreach (@ARGV) {
        print "$_\n";
}

Another useful way to get parameters into a program - this time without user input. The relevance to filehandles is as follows. Run the following perl script as :

perl myscript.pl stuff.txt out.txt

while (<>) {
        print;
}

Short and sweet ? If you don't specify anything in the angle brackets, whatever is in @ARGV is used instead. And after it finishes with the first file, it will carry on with the next and so on. You'll need to remove non-file elements from @ARGV before you use this.

It can be shorter still :

perl myscript.pl stuff.txt out.txt

print while <>;

Read it right to left. It is possible to shorten it even further !

perl myscript.pl stuff.txt out.txt

print <>;

This takes a little explanation. As you know, many things in Perl, including filehandles, can be evaluated in list or scalar context. The result that is returned depends on the context.

If a filehandle is evaluated in scalar context, it returns the next line of whatever file it is reading from. If it is evaluated in list context, it returns a list, the elements of which are the remaining lines of the files it is reading from.

The

print

function is a list operator, and therefore evaluates everything it is given in list context. As the filehandle is evaluated in list context, it is given a list !

Who said short is sweet ? The shortest scripts are not usually the easiest to understand, and not even always the quickest.

Modifying a File

One of the most frequent Perl tasks is to open a file, make some changes and write it back to the original filename. You already have enough knowledge to do this. The steps are :

Make a backup copy of the file
Open the file for read
Open a new temporary file for write
Go through the read file, and write it and any changes to the temp file
When finished, close both files
Delete the original file
Rename the temp file to the original filename
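The steps above can be sketched as follows. This is one possible long-hand version, not the only way to do it: it assumes the standard File::Copy and File::Temp modules, and the scratch directory and file names are made up so the example doesn't touch your real files.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

# Scratch directory and file names are made up for this sketch.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/out.txt";
my $temp = "$file.tmp";

open OUT, ">$file" or die "Cannot open $file for write :$!";
print OUT "Hello World\n";
close OUT;

copy($file, "$file.bk")  or die "Cannot back up $file :$!";        # 1. backup
open IN,  $file          or die "Cannot open $file for read :$!";  # 2. read
open TMP, ">$temp"       or die "Cannot open $temp for write :$!"; # 3. temp file
while (<IN>) {
        tr/A-Z/a-z/;                                               # 4. make the change
        print TMP;
}
close IN;                                                          # 5. close both
close TMP;
unlink $file             or die "Cannot delete $file :$!";         # 6. delete original
rename $temp, $file      or die "Cannot rename $temp :$!";         # 7. rename temp
```

Every step that can fail is checked, which is most of the typing.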

Phew. Perl of course has a much easier way. Make sure you have data in c:\scripts\out.txt then run this:

@ARGV="c:/scripts/out.txt";

$^I=".bk";              # let the magic begin

while (<>) {
        tr/A-Z/a-z/;    # another new function sneaked in
        print;          # this goes to the temp filehandle, ARGVOUT, not STDOUT as usual, so don't mess with it !
}

Now take a look at out.txt . Notice how all capital letters have been transliterated into lowercase. This is the tr operator at work, which is more efficient than regex for changing single characters. You should also have an out.txt.bk file. And finally, notice the way @ARGV has been created. You don't have to create it from the command line arguments - it is an array just like any other.

Finally, what if your input file doesn't look like this :

Beer
Wine
Pizza
Catfood

which is nicely delimited with a newline each time, but like this :

shorts
t-shirt
blouse

pizza
beer
wine
catfood

Viz
Private Eye
The Independent
Byte

toothpaste
soap
towel

which is delimited by TWO newlines, not one. Now, if you want each set of items as elements in an array you'll have to do something like this:

$SHOP="shop.txt";
$x=0;

open SHOP or die "Can't open $SHOP for read: $!\n";

while (<SHOP>) {
        if (/^\n/) {            # does line begin with newline ?
                $x++;           # if so, increment $x.  Rest of if statement not executed.
        } else {
                $list[$x].=$_;  # glue $_ on the end of whatever is in $list[$x], using a .
        }               
}

foreach (@list) {
        print "Items are:\n$_\n\n";
}

which works, but there is a much easier way to do it. You knew I was going to say that.

$SHOP="shop.txt";
$/="\n\n";

open SHOP or die "Can't open $SHOP for read: $!\n";

while (<SHOP>) {
        push (@list, $_);
}

foreach (@list) {
        print "Items are:\n$_\n\n";
}

The $/ variable is a special variable (it even looks special). It is the Default Input Record Separator. Remember the operation of the angle brackets being to read a file in up until the next newline ? Time to come clean. What the angle brackets actually do is read up until whatever $/ is set to. It is set to a newline by default.

So if we set it to two newlines, as above, then it reads up until it finds two consecutive newlines, then puts the data into $_ . This makes the program a lot shorter and quicker. You can set $/ to just about anything, not just a newline. If you want to hack this list for example:

Tea:Beer:Wine:Pizza:Catfood:Coffee:Chicken:Salmon:Icecream

you could just leave $/ as a newline and slurp it into memory in one go, but imagine the above item is a list of clothes that your girlfriend wants to buy or a list of clothes your boyfriend should have thrown away by now. Either are going to be really big files, and you might not want to read it all into memory in one go. So set

$/=":";

and all will be well. There are also read and seek functions, but they aren't covered here. Those are useful for files where you read in a precise number of bytes.
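One thing to watch is that $/ is global, so changing it affects every read that follows. A common way round that, sketched below, is to change it with local inside a block. To keep the example self-contained it reads from a string rather than shop.txt, which assumes Perl 5.8 or later; the string contents are made up.

```perl
use strict;
use warnings;

my $shopping = "shorts\nt-shirt\n\npizza\nbeer\n\nsoap\ntowel\n";

my @list;
{
        local $/ = "\n\n";              # changed only inside this block
        # Reading from a string instead of a file (needs Perl 5.8 or later)
        open my $shop, '<', \$shopping or die "Cannot read the string :$!";
        @list = <$shop>;
        close $shop;
}

print "There are ", scalar(@list), " groups\n";
print "Items are:\n$_\n" foreach @list;
```

Once the block ends, $/ is back to a newline and the rest of the program reads line by line as usual.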

We'll go back to the last example for a moment. It is useful to know how to read just one line (well, up to $/ ) at a time :

$SHOP="shop.txt";
$/="\n\n";

open SHOP or die "Can't open $SHOP for read: $!\n";

$clothes=<SHOP>;        # everything up until the first occurrence of $/ into $clothes

$food=<SHOP>;   # everything from first occurrence of $/ to the second into $food

print "We need...\n",$clothes,"...and\n",$food;

And now we know that, there is an even quicker way to achieve the aim of the original program :

$SHOP="shop.txt";
$/="\n\n";

open SHOP or die "Can't open $SHOP for read: $!\n";

@list=<SHOP>;   # dumps *all* of $SHOP into @list, not just one line.

foreach (@list) {
        print "Items are:\n$_\n\n";
}

and you don't need to grab it all :

@list[0..2]=<SHOP>;

The whole file is still read, but only the first three records are kept. We haven't mentioned list context for a while. Whether the line input operator <> returns a single value or a list depends on the context you use it in. When you supply @xxxxx then this must be a list. If you supply $xxxxx then that's a scalar variable. You can force it into list context by using parens.

The two lines below are provided so you can paste them into the above program. They demonstrate how parens force list context. Remember to replace the foreach with something that prints the variables.

($first, $second) = <SHOP>;
$first,  $second  = <SHOP>;

Associative Arrays

Very, very useful. First, a quick recap on arrays. Arrays are an ordered list of scalar variables, which you access by their index number starting at 0. Arrays always stay in the same order. Hashes are a list of scalars, but instead of being accessed by index number, they are accessed by a key. The tables below illustrate the point:

@myarray
Index No.	Value
0	The Netherlands
1	Belgium
2	Germany
3	Monaco
4	Spain

%myhash
Key	Value
NL	The Netherlands
BE	Belgium
DE	Germany
MC	Monaco
ES	Spain

So if we want 'Belgium' from @myarray and also from %myhash , it'll be:

print "$myarray[1]";
print "$myhash{'BE'}";

Notice that the $ prefix is used, because it is a scalar variable. Despite the fact it is part of a list, it is still a scalar variable. The hash syntax is simply to use braces { } instead of square brackets.

So why use hashes ? When you want to look something up by a keyword. Suppose we wanted to create a program which returns the name of the country when given a country code. We'd input ES, and the program would come back with Spain.

You could do it with arrays. It would be messy however. One possible approach :

create @country, and give it values such as 'ES,Spain'
Iterate over the entire array and
split each element of the array, and check the first result to see if it matches the input
If so, return the index

@countries=('NL,The Netherlands','BE,Belgium','DE,Germany','MC,Monaco','ES,Spain');

print "Enter the country car code:";
chop ($find=<STDIN>);

foreach (@countries) {
        ($code,$name)=split /,/;
        if ($find=~/$code/i) {
                print "$name has the code $code\n";
        }
}

Complex and slow. We could also store a reference to another array in each element of @countries , but that is not efficient. Whatever way we choose, you still need to search the whole thing. And what if @countries is a big array ? See how much easier a hash is :

%countries=('NL','The Netherlands','BE','Belgium','DE','Germany','MC','Monaco','ES','Spain');

print "Enter the country car code:";
chop ($find=<STDIN>);

$find=~tr/a-z/A-Z/;
print "$countries{$find} has the code $find\n";

Very easy. All we need to do is make sure everything is in uppercase with tr and we are there. Notice the way %countries is defined - exactly the same as a normal array, except that the values are put into the hash in key/value pairs.

So why use arrays ? One excellent reason is because when an array is created, its variables stay in the same order you created them in. With a hash, perl reorders elements for quick access. Add

print %countries;

to the end of that program above and run it. See what I mean ? No recognisable order at all. If you were writing code that stored a list of variables over time and you wanted it back in the order you found it in, don't use a hash.

Finally, you should know that each key of a hash must be unique. Stands to reason, if you think about it. You are accessing the hash via keys, so how can you have two keys named 'NL' or something ? If you do define a certain key twice, the second value overwrites the first. This is a feature, and useful. The values of a hash can be duplicates, but never the keys.
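A tiny sketch makes the overwriting visible. The duplicate 'NL' entry is put there on purpose for the illustration.

```perl
use strict;
use warnings;

my %countries = ('NL', 'The Netherlands', 'BE', 'Belgium', 'NL', 'Holland');

print "NL is $countries{NL}\n";                     # the later value wins
print "There are ", scalar(keys %countries), " keys\n";
```

Six items went in, but only two keys come out, because the second 'NL' replaced the first.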

If you want to assign to a hash, there is of course no concept of push , pop and splice etc. Instead :

Assigning	`$countries{PT}='Portugal';`
Deleting	`delete $countries{NL};`

Accessing Your Hash

Assuming you keep the same %countries hash as above, here are some useful ways to access it :

All the keys	`print keys %countries;`
All the values	`print values %countries;`
A Slice of Hash :-)	`print @countries{'NL','BE'};`
How many elements ?	`print scalar(keys %countries);`
Does the key exist ?	`print "It's there !\n" if exists $countries{'NL'};`

Well, that last one is not an access but useful anyway.

You may have noticed that keys and values return a list. And we can iterate over a list, using foreach :

foreach (keys %countries) {
        print "The key $_ contains $countries{$_}\n";
}

which is useful. Note how any list can be fed to foreach , and off it goes. As usual, there is another way to do the above:

while (($code,$name)=each %countries) {
        print "The key $code contains $name\n";
}

The each function returns each key/value pair of the hash, and is slightly faster. In this example we assign them to a list (you spotted the parens ?) and away we go. Eventually there are no more pairs, which returns false to the while loop and it stops.

Sorting

If I was reading this I'd be wondering about sorting. Wonder no more, and behold :

foreach (sort keys %countries) {
        print "The key $_ contains $countries{$_}\n";
}

Spot the difference. Yes, sort crept in there. If you want the list sorted backwards, some cunning is called for. This is suitably foxy:

foreach (reverse sort keys %countries) {
        print "The key $_ contains $countries{$_}\n";
}

Perl is just so difficult at times, don't you think ? This works because :

keys returns a list
sort expects a list - gets one from keys , and sorts it
reverse also expects a list, gets one and returns it in reverse order
then the whole list is foreach 'd over.

This is a quick example to make sure the meaning of reverse is clear :

print "Enter string to be reversed :";
$input=<STDIN>;

@letters=split //,$input;       # splits on the 'nothings' in between each character of $input

print join ":", @letters;       # joins all elements of @letters with ':', prints it
print reverse   @letters;       # prints all of @letters, but sdrawkcab )-:

Perl's list operators can just feed directly to each other, saving many lines of code but also decreasing readability to those that aren't Perl-literate :

print "Enter string to be reversed :";
print join ":",reverse split //,$_=<STDIN>;

This section is about sorting, so enough of reverse . Time to go forwards instead.

That's easy alphabetical sorting by the keys. If you had a hash of international access numbers like this one :

%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');

foreach (sort keys %countries) {
        print "The key $_ contains $countries{$_}\n";
}

You might want to sort numerically. In that case, you need to understand how Perl's sort function works.

The sort function compares two variables, $a and $b . They must be called $a and $b otherwise it won't work. One chap published a book with stolen code, and he changed $a and $b to $x and $y. He obviously didn't test the program because it would have failed and he would have noticed. And this book was really published ! Don't believe everything you read in books - but web tutorials are always 100% truthful :-)

Back to sorting. $a and $b are compared, and the result is :

1 if $a is greater than $b
-1 if $b is greater than $a
0 if $a and $b are equal

So as long as the sort operator gets one of those three values back it is happy. This means we can write our own sort routines, and feed them to sort. For example, we know the default sort is alphabetical. But if we write this :

%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');

foreach (sort supersort keys %countries) {
        print "$_ $countries{$_}\n";
}

sub supersort {
        if ($a > $b) {
                return 1;
        } elsif ($a < $b) { 
                return -1;
        } else { 
                return 0; 
        }
}

then it works correctly. Of course, there is an easier way. The 'spaceship' operator <=> . It does exactly what the supersort subroutine does, namely return 1, -1 or 0 depending on the comparison of two given values.

So we can write the above much more easily as :

%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');

foreach (sort { $a <=> $b } keys %countries) {
        print "$_ $countries{$_}\n";
}

Notice the { } braces, which define the contents as the subroutine sort must use. Pretty short subroutine. There is a companion operator to <=> , namely cmp which does exactly the same thing but of course compares the values as strings, not numbers. Remember if you are comparing numbers, your comparison operator should contain non-alphas, if you are comparing strings the operator should contain alphas only.
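The difference between the two operators is easiest to see side by side. A small sketch, assuming nothing more than a made-up @numbers list holding the access codes used above :

```perl
use strict;
use warnings;

my @numbers = (976, 52, 212, 64, 33);

my @as_strings = sort { $a cmp $b } @numbers;   # compares character by character
my @as_numbers = sort { $a <=> $b } @numbers;   # compares numeric value

print "cmp : @as_strings\n";    # cmp : 212 33 52 64 976
print "<=> : @as_numbers\n";    # <=> : 33 52 64 212 976
```

With cmp , 212 comes first simply because the character '2' sorts before '3', which is exactly what the default sort does.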

Anyway, you now have enough knowledge to sort a hash by value instead of keys. Suppose your pointy haired manager bounced up to you and demanded a hash sorted by value ? What would you do ? OK, what should you do ?

Well, we could just sort the values.

foreach (sort values %countries) {

But Pointy Hair wants the keys too. And if you have a value you can't find the key.

So we have to iterate over the keys. But just because we are iterating over the keys doesn't mean to say we have to hand the keys over to sort . What about :

%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');

foreach (sort { $countries{$a} cmp $countries{$b} } keys %countries) {
        print "$_ $countries{$_}\n";
}

beautifully simple. If you want a reverse sort swap $a and $b .
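As a sketch of the swap just described, this sorts the same hash by country name in descending order. The array name @by_name_desc is an assumption made for the example :

```perl
use strict;
use warnings;

my %countries = ('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');

# $b on the left and $a on the right turns the comparison round,
# so the names come out from Z towards A.
my @by_name_desc = sort { $countries{$b} cmp $countries{$a} } keys %countries;

foreach my $code (@by_name_desc) {
    print "$code $countries{$code}\n";
}
```

The list being sorted is still the keys; only the comparison looks at the values.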

You can sort several lists at the same time :

%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');
@nations=qw(China Hungary Japan Canada Fiji);

@sorted= sort values %countries, @nations;

foreach (@nations, values %countries) {
        print "$_\n";
}

print "#----\n";

foreach (@sorted) {
        print "$_\n";
}

This sorts @nations and the values from %countries into a new array. The example also demonstrates that you can foreach over more than one list value - each list is processed in turn.

Grep and Map

Grep

If you want to search a list, and create another list of things you found, grep is one solution. This is an example, which also demonstrates join again :

@stuff=qw(flying gliding skiing dancing parties racing);        # quote-worded list

@new = grep /ing/, @stuff;              # searches for anything with 'ing' in it

print join ":",@stuff,"\n";             # first makes one string out of the elements of @stuff, joined
                                        # with ':' , then prints it, then prints \n

print join ":",@new,"\n";

Remember qw means 'quote words', so word boundaries are used as delimiters instead. The grep function must be fed a list on the right hand side. On the left side, you may assign the results to a list or a scalar variable. The list gives you each actual element, and the scalar gives you the number of matches found :

@stuff=qw(flying gliding skiing dancing parties racing);

$new = grep /ing/, @stuff;

print join ":",@stuff,"\n";

print "Found $new elements of \@stuff which matched\n";

If you decide to modify the elements on their way through grep , you actually modify the original list.

@stuff=qw(flying gliding skiing dancing parties racing);

@new = grep s/ing//, @stuff;

print join ":",@stuff,"\n";
print join ":",@new,"\n";

To determine what actually matches you can either use an expression or a block. Up to now we've been using expressions, but when things become more complicated use a block :

@stuff=qw(flying gliding skiing dancing parties racing);

@new = grep { s/ing// if /^[gsp]/ } @stuff;

print join ":",@stuff,"\n";
print join ":",@new,"\n";

Try removing the braces and you'll get an error. Notice that the comma before the list has gone. It is now obvious where the expression ends, as it is inside a block delimited with { } . The regex says if the element begins with g, s or p, then remove ing. The result is only assigned to @new if the expression is completely true - 'parties' does begin with p, so that works, but s/ing// fails so the overall result is false, and the value is not assigned to @new .
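If all you want is to pick elements out, keep the block free of anything that changes $_ . A minimal sketch, using a made-up pair of tests (starts with g, s or p and ends in 'ing') that only look and never touch :

```perl
use strict;
use warnings;

my @stuff = qw(flying gliding skiing dancing parties racing);

# Both regexes only inspect $_ , so @stuff comes through unchanged.
my @new = grep { /^[gsp]/ && /ing$/ } @stuff;

print join(":", @stuff), "\n";  # flying:gliding:skiing:dancing:parties:racing
print join(":", @new), "\n";    # gliding:skiing
```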

Map

Map works the same way as grep , in that they both iterate over a list, and return a list. There are two important differences however :

grep returns the value of everything it evaluates to be true
map returns the results of everything it evaluates

As usual, an example will assist the penny in dropping, clear the fog and turn on the light (if not make my metaphors easier to understand) :

@stuff=qw(flying gliding skiing dancing parties racing);

print join ":",@stuff,"\n";

@mapped  = map  /ing/, @stuff;
@grepped = grep /ing/, @stuff;

print join ":",@stuff,"\n";
print join ":",@mapped,"\n";
print join ":",@grepped,"\n";

You can see that @mapped is just a list of 1s. This is the result of map - in every case the expression /ing/ is successful, except for 'parties'. Notice that nothing at all is returned in this case: a failed match in list context gives an empty list, so @mapped ends up with five elements, not six. Contrast this action with the grep function, which returns the actual value, and only if it is true. Try this :

@letters=(a,b,c,d,e);

@ords=map ord, @letters;
print join ":",@ords,"\n";

@chrs=map chr, @ords;   
print join ":",@chrs,"\n";

This uses the ord function to change each letter into its ASCII equivalent, then the chr function to convert the ASCII numbers back to characters. If you change map to grep in the example above, you can see that nothing appears to happen. What is happening is that grep is trying the expression on each element, and if it succeeds (is true) it returns the element, not the result. The expression succeeds for each element, so each element is returned in turn. Another example :

@stuff=qw(flying gliding skiing dancing parties racing);

print join ":",@stuff,"\n";

@mapped  = map  { s/(^[gsp])/$1 x 2/e } @stuff;
@grepped = grep { s/(^[gsp])/$1 x 2/e } @stuff;

print join ":",@stuff,"\n";
print join ":",@mapped,"\n";
print join ":",@grepped,"\n";

Recapping on regex, what that does is match any element beginning with g, s or p, and replace that first letter with two copies of itself. The caret ^ forces a match at the beginning of the string, the [square brackets] denote a character class, the parens capture the letter into $1 , and /e forces Perl to evaluate the RHS as an expression.

The output from this is a mixture of 1 and nothing for map , and a three-element array called @grepped from grep . Yet another example :

@mapped  = map  { chop } @stuff;
@grepped = grep { chop } @stuff;

The chop function removes the last character from a string, and returns it. So that's what you get back from map, the result of the expression. The grep function gives you the mangled remains of the original value.

Finally, you can write your own functions :

@stuff=qw(flying gliding skiing dancing parties racing);

print join ":",@stuff,"\n";

@mapped  = map  { &isit } @stuff;
@grepped = grep { &isit } @stuff;

print join ":",@mapped,"\n";
print join ":",@grepped,"\n";

sub isit {
        ($word)=/(^.*)ing/;

        if (length $word == 3) {
                return "ok";
        } else {
                return 0;
        }
}

The subroutine isit first grabs everything up until 'ing', puts it into $word , then returns 'ok' if there are three characters in $word . If not, it returns the false value 0. You can make these subroutines (think of them as functions) as complex as you like.

Sometimes it is very useful to have map return the actual value, rather than the result. The answer is easy, but not obvious. Remember that map , like a subroutine, returns the result of the last expression evaluated. First, the problem once more :

@grepstuff=@mapstuff=qw(flying gliding skiing dancing parties racing);

print join " ",map  { s/(^[gsp])/$1 x 2/e } @mapstuff;
print "\n";
print join " ",grep { s/(^[gsp])/$1 x 2/e } @grepstuff;

Now, make sure $_ is the last thing evaluated :

@grepstuff=@mapstuff=qw(flying gliding skiing dancing parties racing);

print join " ",map  { s/(^[gsp])/$1 x 2/e;$_} @mapstuff;
print "\n";
print join " ",grep { s/(^[gsp])/$1 x 2/e } @grepstuff;

and there you have it. Now that you understand that, you can go and impress your friends.
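If you want the changed values back from map but would rather leave the original list alone, one possible approach is to copy $_ first. This is a sketch, not the only way to do it; the names $copy and @doubled are invented for the example :

```perl
use strict;
use warnings;

my @stuff = qw(flying gliding skiing dancing parties racing);

my @doubled = map {
    my $copy = $_;                      # work on a copy, not on the original element
    $copy =~ s/^([gsp])/$1 x 2/e;       # double a leading g, s or p
    $copy;                              # last thing evaluated, so this is what map returns
} @stuff;

print join(" ", @stuff), "\n";          # flying gliding skiing dancing parties racing
print join(" ", @doubled), "\n";        # flying ggliding sskiing dancing pparties racing
```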

External Commands

Perl can start external commands. There are four main ways to do this :

system
exec
Command Input, also known as `backticks`
Piping data from a process

We'll compare system and exec first.

Exec

Poor old exec is broken on Perl for Win32. What it should do is stop running your Perl script and start running whatever you tell it to. If it can't start the external process, it should return with an error code. This doesn't work properly under Perl for Win32. The exec function does work properly on the standard Perl distribution.

System

This runs an external command for you, then carries on with the script. It always returns, and the value it returns goes into $? . This means you can test to see if the program worked. Actually you are testing to see if it could be started; what the program does when it runs is outside your control if you use system . This demonstrates system in action. Run the 'vol' command from a command prompt first if you are not familiar with it. Then run the 'vole' command. I'm assuming you have no cute furry executables called vole on your system, or at least in the path. If you do have an executable called 'vole', be creative and change it.

system("vole");

print "\n\nResult: $?\n\n";

system("vol");

print "\n\nResult: $?\n\n";

As you can see, a successful system call returns 0. An unsuccessful one returns a value which you need to divide by 256 to get the real return value. Also notice you can see the output. And because system returns, the code after the first system call is executed. Not so with exec , which will terminate your perl script if it is successful.
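Here is a sketch of picking the real exit code out of $? . So that it runs anywhere, it assumes nothing about your system except perl itself: $^X holds the path of the perl that is running the script, and the child simply exits with the made-up code 3 :

```perl
use strict;
use warnings;

# Run a second perl whose only job is to exit with code 3.
system($^X, "-e", "exit 3");

my $raw  = $?;          # the value exactly as perl stores it
my $code = $? >> 8;     # shifting right 8 bits is dividing by 256 and dropping the remainder

print "raw status $raw, exit code $code\n";   # raw status 768, exit code 3
```

Passing the command as a list rather than one string means no command shell gets involved, so there is no quoting to worry about.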

Backticks

These `` are different again to system and exec . They also start external processes, but return the output of the process. You can then do whatever you like with the output. If you aren't sure where backticks are on your keyboard, try the top left, just left of the 1 key. Often around there. Don't confuse single quotes '' with backticks `` .

$volume=`vol`;

print "The contents of the variable \$volume are:\n\n";

print $volume;

print "\nWe shall regexise this variable thus :\n\n";

$volume=~m#Volume in drive \w is (.*)#;

print "$1\n";

As you can see here, the Win32 vol command is executed. We just print it out, escaping the $ in the variable name. Then a simple regex, using # as a delimiter just in case you'd forgotten delimiters don't have to be / .

Before you get carried away with creating elaborate scripts based on the output from NT's net commands, note there are plenty of excellent modules out there which do a very good job of this sort of thing, and that any form of external process call slows your script. Also note there are plenty of built in functions such as readdir which can be used instead of `dir` . You should use Perl functions where possible rather than calling external programs because Perl's functions are :

portable (usually, but there are exceptions). This means you can write a script on your Mac PowerBook, test it on an NT box and then use it live on your Unix box without modifying a single line of code.
faster, as every external process significantly slows your program
don't usually require regexing to find the result you want
don't rely on output in a particular format, which might be changed in the next version of your OS or application
are more likely to be understood by a Perl programmer - for example, $files=`ls`; on a Unix box means little to someone that doesn't know that ls is the Unix command for listing files, as dir is in Windows.

Don't start using backticks all over the place when system will do. You might get a very large amount of output back which you don't need, and will consequently slurp lots of memory. Just use them when you actually want the output of the command.
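A minimal sketch of capturing output with backticks and checking how the command finished. It assumes an echo command is available, which is the case at both the Windows command prompt and a Unix shell :

```perl
use strict;
use warnings;

my $output = `echo hello`;      # run the command, keep whatever it prints
my $status = $? >> 8;           # backticks set $? just as system does

chomp $output;                  # drop the trailing newline
print "The command said '$output' and exited with $status\n";
```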

Opening a Process

The problem with backticks is that you have to wait for the entire process to complete, then analyse the entire output. This is a big problem if you have a lot of output or slow processes. For example, the DOS command tree. If you aren't familiar with this, run it and look at the output.

We can open a process, and pipe data in via a filehandle in exactly the same way you would read a file.

open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";

while (<TRIN>) {
        print "$. $_";
}

Note the | which denotes that data is to be piped from the specified process. You can also pipe data to a process by using | as the first character.

As usual, $. is the line number. What we can do now is terminate our tree early. Environmentally unsound, but efficient.

while (<TRIN>) {
        printf "%3s $_", $.;
        last if $. == 10;
}

As soon as $. hits 10 we shut the process off by exiting the loop. Easy. You might notice the presence of a new keyword - printf . It works like print , but formats the string before printing. The formatting is controlled by such parameters as %3s , which means "pad out to a total of three spaces". After the doublequoted string comes whatever you want to be printed in the format specified. Some examples follow. Just uncomment each line in turn to see what it does.

$windir=$ENV{'WINDIR'};         # yes, you can access the environment variables !

$x=0;

# whoops, another new function
opendir WDIR, "$windir" or die "Can't open $windir !!! Panic : $!";

while ($file= readdir WDIR) {
        next if $file=~/^\./;           # try commenting this line to see why it is there

        $age= -M "$windir/$file";       # -M returns the age in days
        $age=~s/(\d*\.\d{3}).*/$1/;     # hmmmmm

        #### %4.4d - must take up 4 columns, and pad with 0s to make up space
        ####         and minimum width is also 4
        #### %10s  - must take up 10 columns, pad with spaces
        # printf "%4.4d %10s %45s \n", $x, $age, $file;

        #### %-10s - left justify
        # printf "%4.4d %-10s %-45s \n", $x, $age, $file;

        ####  %10.3d - use 10 columns, pad with 0s if fewer than 3 digits
        # printf "%4.4d %10.3d %45s \n", $x, $age, $file;

        $x++;

        last if $x==15;                 # we don't want to go through all the files :-)
}

There are some intentionally new functions there. When you start hacking Perl (actually, you already started if you have worked through this far) you'll see a lot of example code. Try and understand the above, then read the explanation below.

Firstly, all environment variables can be accessed and set via Perl. They are in the %ENV hash. If you aren't sure what environment variables are, refer to your friendly Microsoft documentation or books. The best known environment variable is path, and you can see its value and that of all other environment variables by simply typing set at your command prompt.
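A short sketch of reading and setting %ENV . The variable name TUTORIAL_DEMO is made up for the example; a value set like this is seen by your script and any programs it starts, not by the command prompt you launched it from :

```perl
use strict;
use warnings;

$ENV{TUTORIAL_DEMO} = "hello";  # hypothetical name, set for this script and its children only

my $value = $ENV{TUTORIAL_DEMO};
my @names = sort keys %ENV;     # every variable name, in alphabetical order
my $count = scalar @names;

print "TUTORIAL_DEMO is $value\n";
print "There are $count environment variables, starting with :\n";

my $shown = 0;
foreach my $name (@names) {
    last if $shown++ == 3;      # the first three are plenty
    print "  $name=$ENV{$name}\n";
}
```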

Secondly, opendir . Similar to open but opens a directory, not a file. Usually, you want to read from the directory, so readdir is useful. There is no while (<WDIR>) { construct.

The regex /^\./ bounces out invalid entries before we bother doing any processing on them. Good programming practice. What it matches is "anything that begins with '.'". The caret anchors the match to the beginning of the string, and as . is a metacharacter it has to be escaped.

Perl has several tests to apply on files. The -M test returns the age in days. See the documentation for similar tests. Note that the calls to readdir return just the file, not the complete pathname. As you were careful to use a variable for the directory to be opened, it is no trouble to glue it together by using doublequotes.
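File tests are easier to grasp with a few in one place. This sketch lists the current directory rather than $windir so that it runs anywhere, and uses -d (is it a directory ?), -s (size in bytes) and -M ; the variable names are assumptions :

```perl
use strict;
use warnings;

my $dir = ".";                  # current directory, an assumption so the example runs anywhere

opendir(my $handle, $dir) or die "Can't open $dir : $!";
my @entries = grep { !/^\./ } readdir $handle;  # skip entries starting with '.'
closedir $handle;

my $is_dir = -d $dir ? 1 : 0;   # -d is true for a directory
print "$dir is a directory\n" if $is_dir;

foreach my $entry (sort @entries) {
    my $path = "$dir/$entry";   # readdir gives the name only, so glue the directory on
    my $age  = -M $path;        # age in days, undefined if the file has vanished
    next unless defined $age;

    my $kind = -d $path ? "dir" : "file";
    my $size = -s $path || 0;   # -s gives nothing useful for an empty file, so fall back to 0

    printf "%-4s %10d bytes %10.3f days  %s\n", $kind, $size, $age, $entry;
}
```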

Try commenting out $age=~s/(\d*\.\d{3}).*/$1/ and note the size of $age . It could do with a trim. Just for regex practice, we make it a little smaller. What the regex does is :

start capturing with (
look for 0 or more digits \d*
then a .(escaped)
followed by three digits \d{3}
and that's all we want to capture so the parens are closed. )
Finally, everything else in the string is matched .* where . is any character (almost) and * 0 or more. This is pretty much guaranteed to match to the end of the line
Having matched the entire string (and put part of it into $1 by using parens) we simply replace the string with what we have matched.

Easy !

Mention should also be made of sprintf , which is exactly like printf except it doesn't print. You just use it to format strings, which you can do something with later. For example :

open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";

while (<TRIN>) {
        $line= sprintf "%3s $_", $.;
        print $line;
        last if $. == 10;
}
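A few sprintf calls make the format codes easier to compare. This is only a sketch with made-up values; note that %.3f is a different way of trimming a number such as $age to three decimal places, and that it rounds rather than chops :

```perl
use strict;
use warnings;

my $padded_number = sprintf "%4.4d", 7;         # "0007"
my $right_aligned = sprintf "%10s|", "abc";     # "       abc|"
my $left_aligned  = sprintf "%-10s|", "abc";    # "abc       |"
my $three_places  = sprintf "%.3f", 12.34567;   # "12.346"

print "$padded_number\n$right_aligned\n$left_aligned\n$three_places\n";
```

The | is not part of the format, it is just there so you can see where the padding stops.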

Oneliners

You'll have noticed Perl packs a lot of power into a small amount of code. You can feed Perl code directly on the command line. Try this :

perl -e"for (55..75) { print chr($_) }"

The -e switch tells Perl that a command is following. The command must be enclosed in doublequotes, not singles as on Unix. The command itself in this case simply prints the ASCII characters for the codes 55 to 75 inclusive.

This is a simple find routine. As it uses a regex, it is infinitely superior to NT's findstr .

perl -e"while (<>) {print if /^[bv]/i}" shop.txt

Remember, the while (<>) construct will open whatever is in @ARGV . In this case, we have supplied shop.txt so it is opened and we print lines that begin with either 'b' or 'v'.

That can be made shorter. Run perl -h and you'll see a whole list of switches. The one we'll use now is -n , which puts a while (<>) {     } loop around whatever code you supply with -e . So :

perl -ne"print if /^[bv]/i" shop.txt

which does exactly the same as the previous program, but uses the -n switch. A slightly more sophisticated version :

perl -ne"print \"$ARGV : $.\n\" if /^[bv]/i" shop.txt

which demonstrates that doublequotes must be escaped.

If you don't remember $^I then please review the section on Files before proceeding. When you're ready, copy shop.txt to shop2.txt .

perl -i.bk -ne"printf \"%4s : $_\",$." shop2.txt

The -i switch primes the inplace edit operator. We still need -n .

If you had a typical quoted email message such as :

>> this is what was said
>> blah blah
> blaaaaahhh

The new text

and you wanted to remove the >, then :

perl -i.bk -pe"s/^>//" email.txt

does the trick. Notice that the regex anchors the match to the start of the string with the caret. What is new is the use of -p , which does exactly the same thing as -n except that it adds a print statement too.

Some other useful oneliners - a calculator and an ASCII number lookup :

perl -e"print 50/200+2"
perl -e"for (50..90) { print chr($_) }"

There are plenty more. Send me your favourites ! Finally, a useful tip - to avoid escaping doublequotes, try this :

perl -e"for $i (50..90) { print chr($i),qq| is $i\n| }"

Whatever follows qq is used as a delimiter. I learnt this from the Perl-Win32-Users mailing list (see top) - I think it was Lennart Borgman who pointed it out. He also mentioned that you don't need the closing doublequote. Saves a little typing.

Subroutines and Parameters

We want a subroutine to calculate how long it will take us to drive a given distance.

($speed,$distance)=@ARGV;

&calcspeed;

print $time;

sub calcspeed {
        $time=$distance / $speed;
        $time=int($time*60);
}

Execute it thus :

perl calcspeed.pl 130 120

This works. As you remember ;-) @ARGV contains the command line arguments, which are assigned to variables. Then we call the subroutine and print the result. The int function returns an integer, that is without those messy digits after the decimal point.

This is a little inflexible. Suppose we wanted to also print times for 10kmph (I prefer kms to miles) above and below the speed given. Or, we allowed six parameters for three sets of distance/time calculations. Or there was some other change. This is one solution to the first problem :

($speed,$distance)=@ARGV;

&calcspeed;
print "$time\n";

$speed=$speed+10;
&calcspeed;
print "$time\n";

$speed=$speed-20;
&calcspeed;
print "$time\n";

sub calcspeed {
        $time=$distance / $speed;
        $time=int($time*60);
}

That's an appalling bit of code. What would be really useful is if we could pass the subroutine parameters to act on. We can.

($speed,$distance)=@ARGV;

&calcspeed($speed,$distance);
print "$time\n";

&calcspeed($speed+=10,$distance);
print "$time\n";

&calcspeed($speed-=20,$distance);
print "$time\n";

sub calcspeed {
        ($speed,$distance)=@_;
        $time=$distance / $speed;
        $time=int($time*60);
}

First change - the shorter version of $x=$x+$y , that is $x+=$y . Secondly, and more importantly, we are now passing parameters to our subroutine via the parens. The parameters are comma delimited.

To use those parameters, analyse the @_ array. As you can see, we just assign the contents to two variables. These happen to have the same name as the originals.

Subroutines are just functions which are user-defined. I'll mix and match the terms. So you don't usually have to use the & prefix. Sometimes it is necessary if there is any ambiguity. You can also just print the result of a function, or assign it to a variable or do any other operation on it.

($speed,$distance)=@ARGV;

print calcspeed($speed,$distance),"\n";

print calcspeed($speed+=10,$distance),"\n";

print calcspeed($speed-=20,$distance),"\n";

sub calcspeed {
        ($speed,$distance)=@_;
        $time=$distance / $speed;
        $time=int($time*60);
}

Great. Now let's worry about fuel economy. On second thoughts, we'll just calculate it as an interesting exercise and not worry about it. Modify the function :

sub calcspeed {
        ($speed,$distance)=@_;
        $time=$distance / $speed;
        $time=int($time*60);
        $litres=int($distance / (15 / ($speed/100)));
}

All that does is work out a rough fuel consumption, which gets worse the faster you drive. Or better, if you take Texaco's viewpoint. The problem we have is that the time is now lost.

This is because any function returns the value of the last expression evaluated. In this case, the last expression was the calculation for $litres . We can override this.

sub calcspeed {
        ($speed,$distance)=@_;
        $time=$distance / $speed;
        $time=int($time*60);
        $litres=int($distance / (15 / ($speed/100)));
        return ($time,$litres);
}

like so. Now, both values are returned. Progress. Or is it ?

You might be wondering what the point of all this is. Why bother passing parameters around ? Well, for short and small programs sometimes it is not worth it. But work through these examples and you'll see :

($speed,$distance)=@ARGV;

($time,$fuel)=calcspeed($speed,$distance);
print "At $speed, it takes $time minutes to travel $distance kms, using $fuel litres\n";

@result=calcspeed($speed+=10,$distance);
print "At $speed, it takes $result[0] minutes to travel $distance kms, using $result[1] litres\n";

($time,$fuel)=calcspeed($speed-=20,$distance);
print "At $speed, it takes $time minutes to travel $distance kms, using $fuel litres\n";

sub calcspeed {
        ($speed,$distance)=@_;
        $speed=70 if $speed < 70;
        $time=$distance / $speed;
        $time=int($time*60);
        $litres=int($distance / (15 / ($speed/100)));
        return ($time,$litres);
}

Here we demonstrate that you can just assign the results of a function to variables. You already knew that, but a demonstration doesn't hurt. Unless you happen to be working for an electric chair manufacturer.

The important part is the sneaky modification of $speed . Being good citizens, we have decided that if $speed is less than 70, it should become 70.

Unfortunately, this returns some rather spurious results : the function overwrites the very same $speed that the main program goes on to use.

So what can we do ? We could assign $speed and $distance to new variable names. That would fix it. However, imagine a very large program. Can you really keep track of all those variable names ? What about a module, which is a plug-in bit of Perl code to extend your program's functionality ? Suppose the module programmer used a $speed variable too - it would stomp all over your $speed and your program would break.

What we need is a little privacy. Departing from the main program for just a moment (but we will return) :

($speed,$distance)=@ARGV;

print "1. ## Speed is $speed, distance is $distance\n";

&change;

print "3. ## Speed is $speed, distance is $distance\n";

sub change {
        $speed*=2;
        $distance/=10;
        print "2. ** Speed is $speed, distance is $distance\n";
}

No surprises here. Our two variables are duly changed. Now, try this :

sub change {
        my ($speed,$distance);
        $speed*=2;
        $distance/=10;
        print "2. ** Speed is $speed, distance is $distance\n";
}

Print 2 now shows 0. This is because we have used my to declare the variables. The new $speed and $distance start out empty, so the arithmetic on them gives 0, and the variables outside are left untouched. The variables now exist only inside the block they are declared in. A block is delimited by { } braces. As you can see, the variables inside the block even have the same name as the variables outside the block. It doesn't matter.

Now we can use whatever variable names we like inside our function, safe in the knowledge that we won't stomp over any variables of the same name. Another advantage is that my variables are faster than global (non-my) variables.

Knowing this, a simple change can be made to the original program :

sub calcspeed {
        my ($speed,$distance)=@_;       # spot the difference
        my ($time,$litres);
        $speed=70 if $speed < 70;
        $time=$distance / $speed;
        $time=int($time*60);
        $litres=int($distance / (15 / ($speed/100)));
        return ($time,$litres);
}

This is looking a little more professional now. We have declared all the variables we are going to use with my , and sleep easily at night knowing they are protected in their own little world, the boundaries of which are the braces. After that, they cease to exist. Their scope has been restricted. This is what scoping variables is all about. Variables declared with my are known as lexically scoped.
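Putting it together, this is a sketch of how the whole thing might look with every variable declared. The use strict line, which makes Perl insist on declarations, and the starting values 60 and 120 are assumptions added for the example :

```perl
use strict;
use warnings;

sub calcspeed {
    my ($speed, $distance) = @_;        # private copies of the parameters
    $speed = 70 if $speed < 70;         # only the private $speed changes

    my $time   = int($distance / $speed * 60);
    my $litres = int($distance / (15 / ($speed / 100)));

    return ($time, $litres);
}

my $speed    = 60;
my $distance = 120;

my ($time, $fuel) = calcspeed($speed, $distance);

print "The caller's speed is still $speed\n";           # still 60
print "It takes $time minutes, using $fuel litres\n";   # 102 minutes, 5 litres
```

The function worked with 70, yet the $speed in the main program never noticed.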

There is much, much more to scoping than the above. It is however important to understand just one more concept :

($arg1,$arg2,$arg3)=@ARGV;

print "\nOutput Field Separator is :$,:\n";
print "OUTSIDE\t",$arg1,$arg2,$arg3,"\n";
&change;


$,="_";
print "\nOutput Field Separator is :$,:\n";
print "OUTSIDE\t",$arg1,$arg2,$arg3,"\n";
&change;

sub change {
        print "BLOCK\t",$arg1,$arg2,$arg3,"\n";
}

which should be executed something like this :

perl test.pl sarcasm is the lowest form of wit

The special variable $, defines what Perl should print in between the items of a list it is given. By default, it is nothing. We can assign to it quite easily.

This causes a small problem with our happy world of lexically scoped my variables. The problem being that if we want to use $, in our own little subroutine, we can, but only as long as we accept the value that it is set to. This won't work, but try it :-)

sub change {
        my $,="!-!";
        print "BLOCK\t",$arg1,$arg2,$arg3,"\n";
}

So what can we do ? One solution is to assign the current value of $, to another variable, change $, , and make sure we change it back when the function returns. That's messy, extra work and prone to errors. This is a better solution :

sub change {
        local $,="!-!";
        print "BLOCK\t",$arg1,$arg2,$arg3,"\n";
}

The local function makes a copy of the given variable, which can be modified as you please inside your block. Outside the block, $, still has its original value. Again, this is scoping, but this form of scoping is called dynamic scoping.
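To watch the value come and go, try this sketch. The helper current_separator is hypothetical, invented purely to report what $, holds at the moment it is called :

```perl
use strict;
use warnings;

$, = "_";                       # set the output field separator for the whole program

sub current_separator {         # hypothetical helper: reports $, as it is right now
    return $,;
}

sub change {
    local $, = "!-!";           # temporary value, put back when change() finishes
    return current_separator(); # subroutines called from here see the temporary value
}

my $inside  = change();
my $outside = current_separator();

print "Inside change() it was $inside\n";       # Inside change() it was !-!
print "Back outside it is $outside\n";          # Back outside it is _
```

That a subroutine called from inside change sees the temporary value is what makes this scoping dynamic rather than lexical.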

So when should you use local instead of my ? Not often, is the short answer. Personally I only tend to use it for changing special variables like $, . The problem with local is that it makes your subroutine dependent on variable names outside its control. We could rewrite the speed/fuel/time program to use local instead, but what if we decided to change the name of the $speed variable to $kmph ? That would break the calcspeed function.

Write your subroutines as black boxes. They should accept input, return output and not be dependent on anything else in the program. Use my to lexically scope parameters, local where you have to, and use an explicit return instead of the default. Believe me, it'll save some trouble.

Note - when you are a proficient perl professional, you'll notice a simplification or two in this section. The problem is that this is a difficult concept to explain, and littering the text with precise clarifications about such things as closures, packages and evals will just complicate things further for no real gain.

Perl tutorial for windows ver 2.2