How to Use...
Just work through from start to
finish. All you need is a Win32 PC and curiosity. Oh, and some time.
When you finish, please send me a critique. In fact, send one even if
you don't finish. I appreciate all input, especially error
checks ! And money cheques.
Conventions :
Sometimes you'll need to type
something in on the command line. These commands will be in green,
for example :
perl changeworld.pl parm1 datafile.txt
Code that you should load into
your editor and run is in blue (don't run this now, it's just an
example):
while (<DATFILE>) {
printf "%2s : $_",$.;
}
When functions are referred to
in the text, their names are highlighted in red. For example, the
split function.
All the code examples have been
tested, and you can just cut'n'paste (brave statement). I haven't
listed the output of each example. You need to run it and see for
yourself. Consider this course interactive.
What you need to know
You need to be able to
differentiate between a PC and a toaster. No programming experience
is necessary. You do need to understand the basics of PC operation.
If you don't understand what directories and files are then you'll
find this difficult. You might find it difficult even if you do :-)
Use of this document
If you want to translate this, use
it for your intranet, mirror it or otherwise use it please email me.
I'm agreeable to most proposals provided I know about them - which
means you can get the latest versions. Remember this document is
copyrighted.
What is Perl ?
Perl is a programming language.
Perl stands for Practical Extraction and Report Language. You'll
notice people refer to 'perl' and "Perl". "Perl"
is the programming language as a whole - 'perl' is the name of the
core executable. Some of Perl's many strengths are :
Speed of development.
You edit a text file, and just run it. You can develop programs very
quickly like this. No separate compiler needed. I find Perl runs a
program quicker than Java, let alone compare the complete
modify-compile-run-oh-no-forgot-that-semicolon sequence.
Power. Perl's regular
expressions are some of the best available. You can work with
objects, sockets...everything a systems administrator could want.
And that's just the standard distribution. Add the wealth of modules
available on CPAN and you have it all. Don't equate scripting
languages with toy languages.
Usability. All that
power and capability can be learnt in easy stages. If you can write
a batch file you can program Perl. You don't have to learn object
oriented programming, but you can write OO programs in Perl. If
autoincrementing nonexistent variables scares you, make perl refuse
to let you. There is always more than one way to do it in Perl. You
decide your style of programming, and Perl will accommodate you.
Portability. On the
Superhighway to the Portability Panacea, Perl's Porsche powers past
Java's jaded jalopy. Many people develop Perl scripts on NT, or
Win95, then just FTP them to a Unix server where they run. No
modification necessary.
Editing tools. You don't
need the latest Integrated Development Environment for Perl. You can
develop Perl scripts with any text editor. Notepad, vi, MS Word 97,
or even direct off the console. Of course, you can make things easy
and use one of the many freeware or shareware programmer's file
editors.
Price. Yes, 0 guilders,
pounds, dmarks, dollars or whatever. And the peer to peer support is
also free, and often far better than you'd ever get by paying some
company to answer the phone and tell you to do what you just tried
several times already, then look up the same reference books you
already own.
What is Win32 ?
Win32 refers to Microsoft Windows
32-bit operating systems. At the time of writing that's Windows 95
and Windows NT. Win32 does not mean Windows 3.11 running
Win32s.
What is Perl for Win32 ?
Microsoft decided Perl would be a
Good Thing to have on Win32. See, they're not so bad. So they
employed Hip Communications to port Perl to the Win32 platform (port
in this sense doesn't mean a shipyard or a drink - it means taking
the source code for perl and changing it so it runs on Windows).
The main perl developer at Hip,
Dick Hardt, later left the company and formed ActiveWare Internet
Corp. Dick took Perl for Win32 with him, and continued development.
In August 1997, ActiveWare changed their name to ActiveState Tool
Corp. "Perl for Win32" is a trademark of ActiveState Tool
Corp. However, the last release of the base (native) version will now
compile directly for Win32. The latest native version at the time of
writing is perl 5.004.
The ActiveState version includes
some additional modules and features, not least of which are Perl for
ISAPI (perlis.dll) and PerlScript. These don't work (yet) with the
native version. But they will soon because ActiveState is merging
their version with the native version. This should happen.....Some
Time Soon...when Perl 5.005 is released.
Perl is developed on the latest NT
platform, and may or may not work on older versions of NT. The latest
version of Perl will run on the latest version of Windows 95 (or 98
when it is released). Be aware that some things which work under
Windows NT don't work under Windows 95 because Win95 just doesn't
have the functionality. In the same way, some Perl features you can
use under Unix either don't work, or work differently compared to the
Win32 platform. Check the documentation !
What can you do with Perl ?
Just two popular examples :
The Internet
Go surf. Notice how many websites
have dynamic pages with .pl or similar as the filename extension ?
That's Perl. It is the most popular language for CGI programming for
many reasons, most of which are mentioned above. In fact, there are a
great many more dynamic pages written with perl that may not have a
.pl extension. Perl has spread across the Internet.
Systems Administration
If you are an NT sysadmin, chances
are you aren't used to programming. In which case, the advantages of
Perl may not be clear. Do you need it ? Is it worth it ?
After you read this tutorial you
will know more than enough to start using Perl productively. You
really need very little knowledge to save time. Imagine driving a car
for years, then realising it has five gears, not four. That's the
sort of improvement learning Perl means. When you are proficient, you
find the difference like realising the same car has a reverse gear and
you don't have to push it backwards. Perl means you can be lazier.
Lazy sysadmins are good sysadmins, as I keep telling my boss. You'll
never touch a batch file again !
Support
There are six mailing lists for
Perl for Win32. Read all about them on
http://www.activestate.com/.
Make sure you read the charter too. Many people put time and
effort into the creation of those lists, so don't insult us by
ignoring the guidelines. Anyone with an interest in Perl for Win32
should be subscribing to at least one of these lists. The charter
also lists useful sites and newsgroups.
Setup
Three stages:
Get the software
Install it
Run a test Script
1. Getting the Software
An old version of Perl for Win32 is
included with the Windows NT Resource Kit. Please don't use it. It is
out of date. Follow the steps below to get a newer version.
The basic Perl for Win32
distribution kit is about 1.5Mb. This comprises more than 250
files - the basic perl.exe interpreter, library modules (useful
addons), documentation etc. Download times are about twice as long as
for a 750Kb file. :-)
You might wish to create a root
directory for the perl installation. The perl installation contains
more than 250 files and it has its own directory structure. This
tutorial will assume you are using c:\perl as your perl
installation directory.
You can use FTP, HTTP (that's your
web browser) or email.
Which file ?
You'll find three binaries for
download. Don't worry about PerlScript or PerlIS. These are special
versions of Perl for some web servers. If you run Microsoft Internet
Information Server, they will be of use but you are strongly
advised to work with perl a while before you start trying Perl
for ISAPI (perlis.dll) or PerlScript.
As of the time of writing the
latest build is 315, so the file to get is pw32i315.exe.
Make sure you do not download
pw32axxx.exe. This is for an Alpha machine and will not work on
an Intel PC. If you are an Alpha user you knew that already :-)
HTTP (Web browsers)
FTP
ftp://ftp.linux.ActiveState.com/pub/Perl-Win32/Release
Email
GET perl-win32-announce Pw32ixxx.exe
where xxx is whatever
build number we are up to at the time. Remember it is a 1.5Mb file,
which is quite a large attachment. It is possible it won't make it to
your machine for this reason. If you are using a company email
account your friendly systems administrator would probably appreciate
you discussing this with him (or her - that's the last politically
correct statement for a while).
2. Installation
So you now have pw32ixxx.exe and it
is in c:\perl or whatever directory you are using.
Installation is easy. We'll use a command prompt as you will be
working with the command prompt later. If the phrase 'command prompt'
mystifies you, then doubleclick the MS-DOS icon and you'll see one.
Looks like this : c:> If you can't find the icon, click
Start, then Run, then if you are running Win95 type command.com
and hit Enter. If you are running NT, type cmd.exe
and hit Enter.
1. Switch to c:\perl
2. Run the install program thus :
pw32ixxx.exe
(of course, xxx is the latest build number...:-)
3. The install program will offer to unzip into c:\perl . If you
are not using c:\perl as your perl installation directory, change
the path. Leave both checkboxes about overwriting and when done
checked.
4. You'll see a command window. If you have followed these
instructions perl has indeed been unpacked into its final
destination directory, so you can just respond Y.
5. Allow the search path to be modified.
6. Associate perl with .pl ? If you do this you can run a perl
script just by doubleclicking it. Personally I prefer doubleclicking
to start a text editor and load the script, so I always answer no to
this and run scripts from the command line. So politely refuse the
kind request and answer N. If you do decide to associate perl.exe
with .pl, change the mapping so perl.exe accepts several parameters.
7. If you are running IIS you'll see a message about I/O
redirection. Just say Y. It is a Good Thing. Trust me.
8. If using NT, you'll need to logon/off as it says for the path to
take effect, or reboot Win95.
9. After Step 8, start your command prompt again and run this :
perl -v
You should see the version number of Perl displayed. Remember this
for when you ask questions on discussion groups.
10. If you didn't see the version number, perl.exe is not in the
path. Review the steps above carefully.
3. Testing - Your First Perl Script
Assuming all has gone to plan, now
create your first Perl script. I recommend creating a new directory
for your perl scripts, separate from your data files and the perl
installation. For example c:\scripts\, which is what I'll
assume you are using in this tutorial.
Start up whatever text editor
you're going to hack Perl with. Notepad.Exe is just fine. Type in the
following :
print "My first Perl script\n";
and save it to c:\scripts\myfirst.pl . You don't need to
exit Notepad - keep it open, as we'll be making changes very soon.
Switch to your command prompt, and change to the directory. Execute
the script : perl myfirst.pl
and you'll see the output. Welcome to the world of Perl ! See what I
mean about it being easy to start ? However, it is difficult to
finish with Perl once you begin :-)
Now we need to analyse what's going
on here a little. First note that the line ends with a semicolon ; .
Almost all lines in Perl end with semicolons. Also
note the \n . This is the code
to tell Perl to output a newline. If that's not clear, delete the \n
from the program and run it again :
print "My first Perl script";
NB - almost every Perl book is written for UN*X, which is a problem
for Win32. This leads to scripts like :
#!c:/perl/perl.exe
print "I'm a cool Perl hacker\n";
The function of the 'shebang' line is to tell the shell how to
execute the file. Under Unix, this makes sense. Under Win32, the
system must already know how to execute the file before it is loaded
so the line is not needed.
The line is not completely ignored, though, as it is searched for
any switches you may have given Perl (for example -w to
turn on warnings). Even so, you don't need it. You may also choose to
add the line so your scripts run directly on Unix without
modification, as Unix boxes probably do need it. Anyway, on
with the lesson.
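As a rough sketch of the switch-scanning just described, the script below carries -w on its first line. The script name and both variable names are made up for this illustration. The only assumption is that perl picks up switches from a first line that mentions perl, as explained above.

```perl
#!perl -w
# warnme.pl - a made-up name for this example.
# perl scans the first line for switches, so -w turns on warnings
# even though Win32 itself never looks at the line.
$total = $count + 1;          # $count was never given a value
print "total is $total\n";    # still runs, but perl grumbles about $count
```

Run it with perl warnme.pl, then delete the -w and run it again to see what the warnings were telling you.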
Variables
So Perl is working, and you are
working with Perl. Now for something more interesting than simple
printing. Variables. Let's take simple scalar variables first. A
scalar variable is a single value. Like $var=10
which sets the variable $var to
the value of 10. Later, we'll look at lists like arrays and hashes,
where @var refers to more
than one value. Scalar is Singular.
If you've learnt any JavaScript or
BASIC you'd be surprised by $var=10 .
With those languages, if you want to assign the value 10 to a
variable called var you'd write var=10 .
Not so in Perl. This is a Feature.
All variables are prefixed with a symbol such as $ @ % .
This has certain advantages, like making programs
easier to read. You can see where the variables are quite easily. And
not only that, what sort of variable it is. The human language German
has a similar principle (except nouns are capitalised, not prefixed
with $ , and Perl is easier
to pronounce). You'll agree later....
Anyway, more hands-on. Time to try
some variables :
$string="perl";
$num=20;
print "The string is $string and the number is $num\n";
A closer look...notice you don't have to say what type of variable
you are declaring. In other languages you need to say if the variable
is a string, array, or whatever. You might even have to declare what
type of number it is. If you know any Java you'd be saying things
like int var=10 which defines the variable var as an
integer, with the value 10. Yes, there are different types but you
don't need to know about them with Perl. Typecasting ? That's not
politically correct any more !
If you didn't already know, a tiny
little comma out of place can lead to completely unexpected results.
If the above code didn't work, you haven't typed it in exactly as you
should have done. Those are double quotes " " , not singles ' ' .
Also notice the way the variables
are used in the string. Sticking variables inside of strings has a
technical term - "variable interpolation". Now, if
we didn't have the handy $ prefix
we'd have to do something like the example below, which is
pseudocode. Pseudocode is code to demonstrate a concept, not designed
to be run. Like certain Microsoft software.
print "The string is ".string." and the number is ".num."\n";
which is much more work. Convinced
about those prefixes yet ?
Single quotes have their use. Try
this :
$string="perl";
$num=20;
print "Doubles: The string is $string and the number is $num\n";
print 'Singles: The string is $string and the number is $num\n';
Double quotes allow the aforementioned variable interpolation. Single
quotes do not. Both have their uses as you will see later, depending
on whether you wish to interpolate anything.
More on Variables
If you want to add 1 to a variable
you can, logically, do this : $num=$num+1 .
There is a shorter way to do this, which is $num++ .
This is an autoincrement. Guess what this is : $num-- .
Yes, an autodecrement.
This example illustrates the above :
$num=10;
print "\$num is $num\n";
$num++;
print "\$num is $num\n";
$num--;
print "\$num is $num\n";
$num+=3;
print "\$num is $num\n";
The last example demonstrates that it doesn't have to be just 1 you
add or subtract.
There's something else new in the
code above. The \ . You can
see what this does - it 'escapes' the special meaning of $ .
That means just the $ symbol is printed instead of it referring to a
variable. Actually \ has a
deeper meaning - it escapes all of Perl's special characters, not
just $ . Also, it turns
some non-special characters into something special. Like what ? Like
n . Add the magic \
and the humble 'n' becomes the mighty NewLine ! The \
character can also escape itself. So if you want to
print a single \ try :
print "the MS-DOS path is c:\\scripts\\";
Oh, '\' is also used for other things like references. But that's not
even covered here.
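Here is a small sketch of \ escaping characters other than $ inside double quotes. The address is made up (example.com is used purely for illustration), and so are the variable names.

```perl
# user@example.com is a made-up address for this example.
$address = "user\@example.com";    # \@ stops perl looking for an array called @example
print "Mail me at $address\n";
print "She said \"hello\"\n";      # \" puts a double quote inside a double-quoted string
```

Try removing the \ before the @ and see what perl has to say about it.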
There is a technical term for these
'special characters' such as @ $ % .
They are called metacharacters. Perl uses plenty of
metacharacters. You'll be using all sorts of obscure characters in
your Perl hacking career. This has earned perl a reputation for being
difficult to understand. That's true, but once you learn the
character meanings reading perl code becomes much easier precisely
because of all these strange characters.
Perl uses so many weird characters
that sometimes the same character has two or more meanings, depending
on its context. As an example, the humble dot .
can join two variables together, act as a wildcard or
become a range operator if there are two of them together. If this
sounds crazy, think about the English language. What do the following
mean to you ?
Mean, in one context, is a word
used to describe the purpose of something. It is also another word
for average. Furthermore, it describes a nasty person, or a person
who doesn't like spending money, and is used in slang to refer to
something impressive and good. Polish, when capitalised, can either
mean pertaining to the country Poland, or the act of making something
shiny. And 'like' can mean similar to, or affection for.
So, when you speak or write English
(think of two, to and too) you know what these words mean by their
context. It is exactly the same way with Perl. Just don't assume a
given metacharacter always means what you first thought it did.
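To make the dot's three jobs concrete, here is a minimal sketch. The variable names are invented for the example, and the pattern match on the third line uses regular expressions, which this part of the tutorial has not explained, so treat that line as a preview only.

```perl
$first  = "Perl";
$second = "Win32";
$joined = $first . " for " . $second;        # one dot joins strings together
@range  = (1..5);                            # two dots make a range
$hit    = ("cat" =~ /c.t/) ? "yes" : "no";   # inside a pattern, a dot matches any character
print "$joined\n";
print "@range\n";
print "Does c.t match cat ? $hit\n";
```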
Finally, try this :
$string="perl";
$num=20;
$mx=3;
print "The string is $string and the number is $num\n";
$num*=$mx;
$string++;
print "The string is $string and the number is $num\n";
Note the easy shortcut *= meaning
'multiply $num by $mx' or, $num=$num*$mx .
Of course Perl supports the usual + - * / ** %
operators. The last two are exponentiation
(to the power of) and modulus (remainder of x divided by y). Also
note the way you can increment a string ! Is this language flexible
or what ?
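If exponentiation, modulus and string increments sound abstract, this short sketch may help. The values and variable names are chosen only for illustration.

```perl
$base  = 2;
$power = $base ** 10;        # exponentiation: 2 to the power of 10
$left  = 17 % 5;             # modulus: the remainder of 17 divided by 5
print "2 ** 10 is $power\n";     # 1024
print "17 % 5 is $left\n";       # 2

$id = "az";
$id++;                       # the z wraps round to a and carries, like 19 becoming 20
print "az becomes $id\n";    # ba
$id = "a9";
$id++;
print "a9 becomes $id\n";    # b0
```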
More on the print function
The print
function is a list operator. That means it accepts a
list of things to print, separated by commas. As an example :
print "a doublequoted string ", $var, 'a variable called var', $num,"\n";
Of course, you could just put all the above inside a single doublequoted
string :
print "a doublequoted string $var a variable called var $num \n";
to achieve the same effect. The advantage of giving the print
function a list is that expressions are
evaluated before being printed. For example, try this :
$var="Perl";
$num=10;
print "Two \$nums are $num * 2 and adding one to \$var makes $var++\n";
print "Two \$nums are ", $num * 2," and adding one to \$var makes ", $var++,"\n";
You might have been slightly surprised by the result of that last
experiment. In particular, what happened to our variable $var ?
It should have been incremented by one, resulting in
Perm . The reason being that 'm' is the next letter after
'l' :-)
Actually, it was incremented
by 1. We are postincrementing the variable ( $var++ ),
rather than preincrementing it.
The difference is that with
postincrements, the value of the variable is returned, then the
operation is performed on it. So in the example above, the current
value of $var was returned
to the print function, then
1 was added. You can prove this to yourself by adding the line
print "\$var is now $var\n";
to the end of the example above.
If we want the operation to be
performed on $var before the
value is returned to the print function, then preincrement is the way
to go. ++$var will do the
trick.
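The difference is easiest to see side by side. A minimal sketch, using a made-up variable $count :

```perl
$count = 5;
print "postincrement returns ", $count++, "\n";    # prints 5, then $count becomes 6
print "afterwards \$count is $count\n";            # 6
print "preincrement returns ", ++$count, "\n";     # $count becomes 7 first, then 7 is printed
```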
Subroutines - A First Look
Let's take another look at the
example we used to show how the autoincrement system works. Messy,
isn't it ? This is Batch File Writing Mentality. Notice how we use
exactly the same code four times. Why not just put it in a subroutine
?
$num=10; # sets $num to 10
&print_results; # prints variable $num
$num++;
&print_results;
$num*=3;
&print_results;
$num/=3;
&print_results;
sub print_results {
print "\$num is $num\n";
}
Easier and neater. The subroutine can go anywhere in your script, at
the beginning, end, middle...makes no difference. Personally I put
all mine at the bottom and reserve the top part for setting variables
and main program flow.
A subroutine is defined by starting
with sub then the name.
After that you need a curly left bracket { ,
then all the code for your subroutine. Finish it off
with a closing brace } .
The area between the two braces is called a block. Remember
this. There are such things as anonymous subroutines but not here.
Everything here has a name.
Subroutines are usually called by
prefixing their name with & ,
like so &print_results; .
In most circumstances you can forget the &
prefix but it is wise to leave it for the time being to
avoid confusion.
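For the curious, here is a small sketch of both calling styles. The subroutine name print_num and the value 42 are invented for this example.

```perl
$num = 42;
&print_num;       # the style used in this tutorial
print_num();      # the same call without the & - note the parens
sub print_num {
    print "\$num is $num\n";
}
```

Both calls print the same line. The rest of this tutorial sticks with the & form.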
If you are worrying about variable
visibility, don't. All the variables we are using so far are visible
everywhere. You can restrict visibility quite easily, but that's not
important right now. If you weren't worrying about variable
visibility, please don't start. (paranoid ?)
Notice a # crept in there. That's a comment. Everything after a #
is ignored. You can't continue it onto a newline
however, so if your comment won't fit on one line start a new one
with # . There are ways to
create Plain Old Documentation (POD) and more ways to comment but
they are not detailed here.
Comparisons
An if statement is simple. if day is Sunday, lie in
bed . A simple test, with two outcomes. Perl conversion (don't
run this) :
if ($day eq "sunday") {
&lie_in_bed;
}
You already know that &lie_in_bed is
a call to a subroutine. We assume $day is
set earlier in the program. If $day is
not equal to "sunday", &lie_in_bed is
not executed (pity). You don't need to say anything else. Try this :
$day="sunday";
if ($day eq "sunday") {
print "Zzzzz....\n";
}
Note the syntax. The if statement
requires something to test for Truth. This expression must be in
(parens), then you have the braces to form a block.
There are many Perl constructs which
test for Truth. Some are if, while, unless .
So it is important you know what truth is, as
defined by Perl, not your tax forms. Here are the three main rules :
1. Any string is true except for "" and "0" .
2. Any number is true except for 0 .
3. Any undefined value is false.
Some example code to illustrate the
point :
&isit; # $test1 is at this moment undefined
$test1="hello"; # a string, not equal to "" or "0"
&isit;
$test1=0.0; # $test1 is now a number, effectively 0
&isit;
$test1="0.0"; # $test1 is a string, but NOT effectively 0 !
&isit;
sub isit {
if ($test1) { # tests $test1 for truth or not
print "$test1 is true\n";
} else { # else statement if it is not true
print "$test1 is false\n";
}
}
The first test fails because $test1 is
undefined. This means it has not been created by assigning a value to
it. So according to Rule 3 it is false. The last two tests are
interesting. Of course, 0.0 is the same as 0 in a numeric
context. But it is not the same as 0 in a string context, so
it is true.
So here we are testing single
variables. What's more useful is testing the result of an expression.
For example, this is an expression : $x * 2
and so is this : $day eq "Sunday" .
It is the end result of these
expressions that is evaluated for truth.
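If you want to poke at the three rules a bit more, the sketch below runs a handful of values through the same kind of test. The list of values is my own choice, and it uses foreach, which simply visits each element of a list in turn.

```perl
@values = ("hello", "", "0", "00", "0.0", " ", 0, 1, -1);
foreach $value (@values) {            # visit each value in turn
    if ($value) {
        print "'$value' is true\n";
    } else {
        print "'$value' is false\n";
    }
}
```

Only "", "0" and 0 come out false. Note that "00" and a single space are true, because as strings they are neither "" nor "0".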
Another example :
if (5 - 5) {
print "Testnum is true\n";
} else {
print "Testnum is false\n";
}
$day="Sunday";
$y=($day eq "Sunday");
$x=($day eq "Monday");
print "\$x is $x and \$y is $y\n";
The first test fails because 5-5 of course is 0, which is false.
Next, we compare the variable $day
to two different strings. The result of each comparison
is stored in a variable.
The first comparison (stored in $y) returns the value 1,
which is true. The second (stored in $x) doesn't seem to return
anything (actually it returns ""), which is false.
The parens make the order explicit : Perl
evaluates the comparison first, then assigns the result to the
variable. Try it without the parens. You get the same answer, because
eq binds more tightly than = .
Now pay close attention, otherwise
you'll end up posting an annoying question somewhere. The symbol =
is an assignment operator, not
a comparison operator. Therefore :
if ($x = 10)
is always true, because the value assigned to $x , which is 10,
is itself true.
if ($x == 10)
compares the two values, which might
not be equal.
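A short sketch of the trap, with a made-up starting value for $x :

```perl
$x = 5;
if ($x == 10) {
    print "== says \$x equals 10\n";
} else {
    print "== says \$x is $x, not 10\n";       # this branch runs
}
if ($x = 10) {                                 # oops : assignment, not comparison
    print "= has quietly changed \$x to $x\n";
}
if ($x = 0) {                                  # assigns 0, and 0 is false
    print "never printed\n";
} else {
    print "assigning 0 makes the test false\n";
}
```

So the test is really on the value that was assigned, which is why assigning 10 is always true and assigning 0 never is.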
There are two types of comparison
operator - numeric and string. You've already seen two,
== and eq .
Run this :
$foo=291;
$bar=30;
if ($foo < $bar) {
print "$foo is less than $bar (numeric)\n";
}
if ($foo lt $bar) {
print "$foo is less than $bar (string)\n";
}
Alphabetically, that is in a string context, 291 comes before 30. It
is actually decided by the ASCII value, but alphabetically is close
enough. Change the numbers around a little. Notice how Perl doesn't
care whether it uses a string comparison operator on a numeric value,
or vice versa. This is typical of Perl's flexibility. Bondage
and discipline are alien concepts to Perl. This flexibility does have
a drawback. If you're on a programming precipice, threatening suicide
by jumping off, Perl won't talk you out of your decision but will
provide several ways of jumping, stepping or falling to your doom
while silently watching your early conclusion. So be careful.
The Perl Motto is : "There
is More Than One Way to Do It" or TIMTOWTDI. Pronounced
Tim-Toady. This tutorial doesn't try and mention all possible ways of
doing everything. Write your Perl programs the way you want to.
The rest of the operators are :
Comparison | Numeric | String
Equal | == | eq
Not equal | != | ne
Greater than | > | gt
Less than | < | lt
Greater than or equal to | >= | ge
Less than or equal to | <= | le
Just remember :
if you are testing a value as
a string there should be only letters in your comparison
operator.
if you are testing a value as
a number there should only be non-alpha characters in
your comparison operator.
note 'as a' above. You can
test numbers as strings and vice versa. Perl never complains.
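Here is a sketch of why picking the right family of operator matters. All the variable names and values are invented for the example.

```perl
$ten       = "10";
$ten_point = "10.0";
$same_number = ($ten == $ten_point);    # true : as numbers both are 10
$same_string = ($ten eq $ten_point);    # false : as strings they differ
$first  = "abc";
$second = "xyz";
$oops   = ($first == $second);          # true ! as numbers both count as 0
if ($same_number) { print "10 == 10.0\n"; }
if ($same_string) { print "10 eq 10.0\n"; }     # never printed
if ($oops)        { print "abc == xyz, because neither looks like a number\n"; }
```

Perl doesn't complain about the last comparison unless you have warnings switched on, so the mistake is easy to miss.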
More about if statements. Run this
:
$age=25;
$max=30;
if ($age > $max) {
print "Too old !\n";
} else {
print "Young person !\n";
}
It is easy to see what else does.
If the expression is false then whatever is in the else
block is evaluated (or carried out, executed, whatever
term you choose to use). Simple. But what if you want another test ?
Perl can do that too.
$age=25;
$max=30;
$min=18;
if ($age > $max) {
print "Too old !\n";
} elsif ($age < $min) {
print "Too young !\n";
} else {
print "Just right !\n";
}
If the first test fails, the second is evaluated. This carries on
until there are no more elsif statements,
or an else statement is
reached. An else statement
is optional, and no elsif statements
should come after it.
There is a big difference between
the above example and the one below:
if ($age > $max) {
print "Too old !\n";
}
if ($age < $min) {
print "Too young !\n";
}
If you run it, it will return the same result - in this case.
However, it is Bad Programming Practice. In this case we are testing
a number, but suppose we were testing a string to see if it contained
R or S. It is possible that a string could contain both R and
S. So it would pass both 'if' tests. Using an elsif
avoids this. As soon as the first statement is true, no
more elsif statements (and
no else statement) are
executed.
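The R and S case can be sketched like this. The variable $code and its value are made up, and the letters are found with index, which returns the position of one string inside another, or -1 if it isn't there.

```perl
$code = "RS";                          # contains both letters
if (index($code, "R") >= 0) {
    print "two ifs : found R\n";
}
if (index($code, "S") >= 0) {
    print "two ifs : found S\n";       # also runs - both tests pass
}
if (index($code, "R") >= 0) {
    print "if/elsif : found R\n";
} elsif (index($code, "S") >= 0) {
    print "if/elsif : found S\n";      # skipped, because the first test was true
}
```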
You don't need to take up a whole
three lines :
print "Too old\n" if     $age > $max;
print "Too old\n" unless $age < $max;
I added some whitespace there for aesthetic beauty. There are other
operators that you can use instead of if
and unless ,
but that's for later on.
User Input
Sometimes you have to interact with
the user. It is a pain, but sometimes necessary, especially for the
live ones. To ask for input and do something with it try this :
print "Please tell me your name :";
$name=<STDIN>;
print "Thanks for making me happy $name !\n";
New things to learn here. Firstly, <STDIN> .
STDIN is where all information normally comes from.
You could say it is the standard source for input. Guess what STDIN
stands for :-)
In this case it is input from the
keyboard. Also, the angle brackets <>
read from a filehandle. Filehandles are what you use to
interact with things such as files, socket connections and more.
So we are reading from the STDIN
filehandle. The value is assigned to $name
and printed. Any idea why the ! ends up on a new line ?
As you pressed Enter, you of course
included a newline with your name. The easy way to get rid of it is
to chop it like so :
print "Please tell me your name :";
$name=<STDIN>;
chop $name;
print "Thanks for making me happy $name !\n";
and that works as it should. The chop
function removes the last character of whatever it is
given to chop, in this case removing the newline for us. In fact,
that can be shortened :
print "Please tell me your name :";
chop ($name=<STDIN>);
print "Thanks for making me happy $name !";
The parentheses ( ) force
chop to act on the result
of what is inside them. So $name=<STDIN>
is evaluated first, then the result from that, which is
$name , is chopped.
Chopping is dangerous, as my friend
One Hand Harold will tell you. Everyone is concerned about safety
these days, and your perl code should be no exception. Rather than
just remove the last character regardless of whatever it is, you can
remove the last character only if it is a newline with chomp :
chomp ($name=<STDIN>);
At this point the perl gurus are screaming "I found an error !".
Well, chomp doesn't always remove the last character if it is a
newline but if it doesn't, you have set a special variable, namely $/ ,
to something different. I presume that if you do set
$/ you know what it does.
It is explained later in this very document. Of course, being a good
pupil, you wouldn't experiment with the unknown, blindly changing
things just for the hell of it.
If you don't, you'll never learn
anything useful :-)
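To see why chomp is the safer of the two, here is a sketch that skips the keyboard and uses two made-up strings, one with a trailing newline and one without.

```perl
$typed  = "Harold\n";     # what <STDIN> hands you : the name plus a newline
$pasted = "Harold";       # the same name with no newline on the end
chop($chopped_typed   = $typed);      # newline removed
chop($chopped_pasted  = $pasted);     # oops, the d is removed
chomp($chomped_typed  = $typed);      # newline removed
chomp($chomped_pasted = $pasted);     # nothing to remove, so left alone
print "chop  : [$chopped_typed] [$chopped_pasted]\n";
print "chomp : [$chomped_typed] [$chomped_pasted]\n";
```

This assumes $/ has been left at its default setting.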
Arrays
Perl has two types of array,
associative arrays (hashes) and arrays. Both types are lists. A list
is just a collection of variables referred to as the collection, not
as individual elements.
You can think of Perl's lists as a
herd of animals. List context refers to the entire herd, scalar
context refers to a single element. A list is a herd of variables.
The variables don't have to be all of the same type - you might have
a herd of ten sheep, three lions and two wolves. It would probably be
just three lions and 1.5 wolves before long, but bear with me. In the
same way, you might have a Perl list of three scalar variables, two
array elements and ten hash elements.
Certain types of lists are known by
certain names. Just as a herd of sheep is called a flock, a herd of
lions is called a pride and a herd of wolves is called a pack, some
types of Perl list have special names.
For example, an array is an ordered
list of scalar variables. This list can be referred to as a
whole, or you can refer to individual elements in the list. The
program below defines an array, called @names .
It puts five values into the array.
@names=("Muriel","Gavin","Susanne","Sarah","Anna");
print "The elements of \@names are @names\n";
print "The first element is $names[0] \n";
print "The third element is $names[2] \n";
print 'There are ',scalar(@names)," elements in the array\n";
Firstly, notice how we define @names .
As it is in a list context, we are using parens. Each value is
comma separated, which is Perl's default list delimiter. The
double quotes are not necessary, but as these are string values it
makes it easier to read and change later on.
Next, notice how we print it.
Simply refer to it as a whole, that is in list context. List
context means referring to more than one element of a list at a time.
The code print @names; will
work perfectly well too. But....
I usually learn something about
Perl every time I work with it. When running a course, a student
taught me this trick which he had discovered :
@names=("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");
print @names;
print "\n";
print "@names";
When a list is placed inside double quotes, it is space delimited when
interpolated. Useful.
If we want to do anything with the
array as a list, that is doing something with all the values at once,
refer to the array as @array.
That's important. The @ prefix
is also used when you want to refer to more than one element, but not
the entire array. That's called a slice. Cake analogies are
appropriate, and somewhat tastier. Pie analogies are probably
healthier but equally accurate.
Arrays are not much use unless we
can get to individual elements. Firstly, we are dealing with a single
element of the list, so we cannot use @, which refers to multiple
elements of the array. It
is a single, scalar variable, so $ is used. Secondly, we must specify
which element we want. That's easy - $array[0] for the first,
$array[1] for the second and so forth. Array indexes start at 0,
unless you do something which is so highly deprecated ('deprecated'
means allowed, usually for backwards compatibility, but disapproved
of because there are better ways) I'm not even going to mention it.
Finally, we force what is normally
list context (more than one element) into scalar context (single
element) to give us the number of elements in the array. Without the
scalar, the last line would print the elements themselves, much as
the second line of the program does.
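To see the two contexts side by side, here is a small sketch in the same style as the program above. The variable names $count and $first are made up for the illustration. Assigning an array to a plain scalar also gives scalar context, so you don't always need to write scalar.

```perl
@names=("Muriel","Gavin","Susanne","Sarah","Anna");
$count=@names;      # scalar on the left: scalar context, so we get the count
($first)=@names;    # parens on the left: list context, so we get the first element
print "Count is $count\n";
print "First is $first\n";
print 'No scalar   : ',@names,"\n";           # list context: the elements, run together
print 'With scalar : ',scalar(@names),"\n";   # scalar context: the count
```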
Please understand this :
$myvar="scalar variable";
@myvar=("one","element","of","an","array","called","myvar");
print $myvar; # refers to the contents of a scalar variable called myvar
print $myvar[1]; # refers to the second element of the array myvar
print @myvar; # refers to all the elements of array myvar
The two variables $myvar and @myvar are not, in any way,
related. Not even distantly. Technically, they are in different
namespaces.
Going back to the animal analogy,
it is like having a dog named 'Myvar' and a goldfish called 'Myvar'.
You'll never get the two mixed up because when you call 'Myvar !!!!'
or open a can of dog food the 'Myvar' dog will come running and
the goldfish won't. Now, you couldn't have two dogs called 'Myvar' and in
the same way you can't have two Perl variables in the same namespace
called 'Myvar'.
More on Arrays
The element number can be a
variable.
print "Enter a number :";
chomp ($x=<STDIN>);
@names=("Muriel","Gavin","Susanne","Sarah","Anna");
print "You requested element $x who is $names[$x]\n";
print "The index number of the last element is $#names \n";
This is useful. Notice the last line of the example. It returns the
index number of the last element. Of course you could always just do
this $last=scalar(@names)-1; but
this is more efficient. It is an easy way to get the last element, as
follows :
print "Enter a number :";
chomp ($x=<STDIN>);
@names=("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");
print "The first two elements are @names[0,1]\n";
print "The first three elements are @names[0..2]\n";
print "You requested element $x who is $names[$x]\n";
print "The elements before and after are : @names[$x-1,$x+1]\n";
print "The first, second, third and fifth elements are @names[0..2,4]\n";
print "The last element is $names[$#names]\n";
It looks complex, but it is not. Really. Notice you can have multiple
values separated by a comma. As many as you like, in whatever order.
The range operator .. gives
you everything between and including the values. And finally look at
how we print the last element - remember $#names gives us a number ?
Simply enclose it inside square
brackets and you have the last element.
Do also note that because element
accesses such as [0,1] are
more than one variable, we cannot use the scalar prefix, namely the $
symbol. We are accessing the array in list context,
so we use the @ symbol.
Doesn't matter that it is not the entire array. Remember, accessing
more than one element of an array but not the entire array is called
a slice. I won't go over the food analogies again.
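One more way to reach the end of an array, which isn't used in the examples above: Perl also accepts negative index numbers, which count backwards from the last element. A minimal sketch, with $last and @lasttwo as made-up names:

```perl
@names=("Muriel","Gavin","Susanne","Sarah","Anna");
$last=$names[-1];           # the same element as $names[$#names]
@lasttwo=@names[-2,-1];     # a slice, counted from the end
print "The last element is $last\n";
print "The last two elements are @lasttwo\n";
```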
Foreach
All well and good, but what if we
want to load each element of the array in turn ? Well, we could build
a for loop like this :
@names=("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");
for ($x=0; $x <= $#names; $x++) {
print "$names[$x]\n";
}
which sets $x to 0, runs
the loop once, then adds one to $x,
checks it is less than or equal to $#names, and
if so carries on. By the way, that was your introduction to for
loops. Just to go into a little detail there, the for
loop has three parts to it :
Initialisation
Test Condition
Modification
In this case, the variable $x
is initialised to 0. It is immediately tested to see if
it is smaller than, or equal to, $#names.
If that is true, then the block is executed once. Critically, if it
is not true the block is not executed at all. We'll
discuss that later.
Once the block has been executed,
the modification expression is evaluated. That's $x++. Then, the test
condition is checked again to see if the
block should be executed or not.
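If you want to see the "not executed at all" case for yourself, here is a small sketch using an empty array. The names @empty and $runs are invented for the illustration. With no elements, $#empty is -1, so the very first test fails.

```perl
@empty=();              # no elements, so $#empty is -1
$runs=0;
for ($x=0; $x <= $#empty; $x++) {
    $runs++;            # never reached: 0 <= -1 is false on the first test
}
print "The block ran $runs times\n";
```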
There is another version :
for $x (0 .. $#names) {
print "$names[$x]\n";
}
which takes advantage of the range operator ..
(two dots together). This simply gives $x
the value of 0, then increments $x
by 1 until it is equal to $#names.
For true beauty we must use foreach.
foreach $person (@names) {
print "$person";
}
This goes through each element ('iterates', another good technical
word to use) of @names,
and assigns each element in turn to the variable $person. Then you
can do what you like with the variable. Much
easier. You can use for $person (@names) {
if you want. Makes no difference at all.
In fact, that gets shorter. And now
I need to introduce you to $_,
which is the Default Input and Pattern Searching Variable.
foreach (@names) {
print "$_";
}
If you don't specify a variable to put each element into, $_
is used instead as it is the default for this
operation, and many, many others in Perl. Including the print
function :
foreach (@names) {
print ;
}
As we haven't supplied any arguments to print, $_ is
printed as default. You'll be seeing a lot of $_
in Perl. Actually, that statement is not exactly true.
You will be seeing a lot of places where $_
is used, but quite often when it is used, it is not
actually written. In the above example, you don't actually see $_
but you know it is there.
Changing the Elements
So we have @names. We want to change it. Run this :
print "Enter a name :";
chomp ($x=<STDIN>);
@names=("Muriel","Gavin","Susanne","Sarah");
print "@names\n";
push (@names, $x);
print "@names\n";
Fairly self-explanatory. The push
function just adds a value on to the end of the array.
Of course, Perl being Perl, it doesn't have to be just the one value:
print "Enter a name :";
chomp ($x=<STDIN>);
@names=("Muriel","Gavin","Susanne","Sarah");
@cities=("Brussels","Hamburg","London","Breda");
print "@names\n";
push (@names, $x, 10, @cities[2..4]);
print "@names\n";
This is worth looking at in more detail. There is no fifth
element of @cities, yet we asked for one with @cities[2..4].
Perl doesn't complain. The slice still hands back three values, and
the third one is simply undefined.
Add this to the end of the example :
print "There are ",scalar(@names)," elements in \@names\n";
There appear to be 8 elements in @names when it is printed.
However, we have just proved there are in fact 9. The reason there
are 9 is that we referred to a nonexistent element of @cities, and
Perl has quite happily added an undefined value to @names
to suit. The array @cities remains unchanged. Try popping
the array if you don't believe me.
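Here is a sketch that makes the invisible ninth element visible. It uses the defined function, which is explained further on in this document. "Paul" stands in for the name you would have typed, and $i is just a loop counter.

```perl
@names=("Muriel","Gavin","Susanne","Sarah");
@cities=("Brussels","Hamburg","London","Breda");
push (@names, "Paul", 10, @cities[2..4]);    # "Paul" replaces the typed-in name
print "There are ",scalar(@names)," elements in \@names\n";
print "There are ",scalar(@cities)," elements in \@cities\n";
foreach $i (0 .. $#names) {
    if (defined $names[$i]) {
        print "$i : $names[$i]\n";
    } else {
        print "$i : (undefined)\n";          # the element the slice invented
    }
}
```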
So that's push. Now for some...
Jiggerypokery with Arrays
@names=("Muriel","Gavin","Susanne","Sarah");
@cities=("Brussels","Hamburg","London","Breda");
&look;
$last=pop(@names);
unshift (@cities, $last);
&look;
sub look {
print "Names : @names\n";
print "Cities: @cities\n";
}
Now we have two arrays. The pop function
removes the last element of an array and returns it, which means you
can do something like assign the returned value to a variable. The
unshift function adds a
value to the beginning of the array. Hope you didn't forget that
&subroutinename calls a
subroutine.
Function | What it does
push     | Adds value to the end of the array
pop      | Removes and returns value from end of array
shift    | Removes and returns value from beginning of array
unshift  | Adds value to the beginning of array
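The four functions in the table can be tried together in one short sketch. The array @queue and its letters are made up for the purpose.

```perl
@queue=("b","c");
push (@queue, "d");         # add to the end          : b c d
unshift (@queue, "a");      # add to the beginning    : a b c d
$first=shift(@queue);       # remove from beginning   : b c d, returns a
$last=pop(@queue);          # remove from the end     : b c,   returns d
print "Removed $first and $last, leaving @queue\n";
```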
Now, accessing other elements of
arrays. May I present the splice function ?
Splice
@names=("Muriel","Sarah","Susanne","Gavin");
&look;
@middle=splice (@names, 1, 2);
&look;
sub look {
print "Names : @names\n";
print "The Splice Girls are: @middle\n";
}
The first argument for splice is
an array. The second is the offset. The offset is the index number
of the list element to begin splicing at. In this case it is 1. Then
comes the number of elements to remove, which is 2 here and is
sensibly 1 or more.
You can set it to 0 and perl, in true perl style, won't
complain. Setting it to 0 is handy because splice
can add elements to the middle of an array, and if you
don't want any deleted 0 is the number to use. Like so :
@names=("Muriel","Gavin","Susanne","Sarah");
@cities=("Brussels","Hamburg","London","Breda");
&look;
splice (@names, 1, 0, @cities[1..3]);
&look;
sub look {
print "Names : @names\n";
print "Cities: @cities\n";
}
Notice how the assignment to @middle has
gone - it is no longer relevant. This is not to say nothing is
returned from the function. splice always returns a list of the
elements it removed, but as we asked it to remove none, that list
is empty.
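A short sketch of what splice hands back in each case. The names "Paul" and "Trish" and the array @removed are only there for the illustration.

```perl
@names=("Muriel","Gavin","Susanne","Sarah");
@removed=splice (@names, 1, 0, "Paul");      # insert only, nothing removed
print "Removed ",scalar(@removed)," elements, names are now @names\n";
@removed=splice (@names, 1, 2, "Trish");     # remove two, put one in their place
print "Removed @removed, names are now @names\n";
```

The second call shows that removing and inserting can be done in one go.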
Splice is also the way to delete
elements from an array. In fact, a discussion of :
Deleting Variables
is in order. Suppose we want to
delete Hamburg from the following array. How do we do it ? Perhaps :
@cities=("Brussels","Hamburg","London","Breda");
&look;
$cities[1]="";
&look;
sub look {
print "Cities: ",scalar(@cities), ": @cities\n";
}
would be appropriate. Certainly Hamburg is removed. Shame, such a
great lake. But note, the array element still exists. There are still
four elements in @cities.
So what we need is the appropriate splice
function, which removes the element entirely.
splice (@cities, 1, 1);
Now that's all well and good for arrays. What about ordinary
variables, such as these :
$car ="Porsche 911";
$aircraft="G-BBNX";
&look;
$car="";
&look;
sub look {
print "Car :$car: Aircraft:$aircraft:\n";
print "Aircraft exists !\n" if $aircraft;
print "Car exists !\n" if $car;
}
It looks like we have deleted the $car
variable. Pity. But think about it. It is not deleted,
it is just set to the null string "". As you recall
(hopefully) from previous ramblings, the null string evaluates to
false so the if test fails.
Just because something is false
doesn't mean to say it doesn't exist. A wig is false hair, but a wig
exists. Your variable is still there. Perl does have a function to
test if something exists. Existence, in Perl terms, means defined. So :
print "Car is defined !\n" if defined $car;
will evaluate to true, as the $car
variable does in fact exist.
This begs the question of how to
really wipe variables from the face of the earth, or at least your
Perl script. Simple.
$car ="Porsche 911";
$aircraft="G-BBNX";
&look;
undef $car; # this undefines $car
&look;
sub look {
print "Car :$car: Aircraft:$aircraft:\n";
print "Aircraft exists !\n" if $aircraft;
print "Car exists !\n" if defined $car;
}
This variable $car is
eradicated, deleted, killed, destroyed.
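To keep false and undefined apart, here is a sketch which runs four values through both tests. The array names @values and @report are invented, and the four values are just convenient examples.

```perl
@values=("Porsche 911", "", 0, undef);
foreach $value (@values) {
    $truth="false";
    $truth="true" if $value;             # the ordinary truth test
    $state="undefined";
    $state="defined" if defined $value;  # the existence test
    push (@report, "$truth and $state");
}
print "$_\n" foreach @report;
```

Only the first value is true, but only the last one is undefined.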
And now for something completely
different....
Regular Expressions
Or regex for short. These
can be a little intimidating. But I'll bet you have already used some
regex in your computing life so far. Have you ever said "I'll
have any German beer ?" That's a regex which will match a
Grolsch or Becks, but not a Budweiser, orange juice or cheese
toastie. What about dir *.txt ? That's a pattern match
too (strictly a wildcard rather than a regular expression, but the
idea is the same), listing any files ending in .txt.
Perl's regex often look like this :
$name=~/piper/
That is saying "If 'piper' is inside $name,
then True."
The regular expression itself is
between / / slashes, and
the =~ operator binds the regex to the
target for the search.
An example is called for. Run this,
and answer it with 'the faq'. Then try 'my tealeaves' and see what
happens.
print "What do you read before joining any Perl discussion ? ";
chomp ($_=<STDIN>);
print "Your answer was : $_\n";
if ($_=~/the faq/) {
print "Right ! Join up !\n";
} else {
print "Begone, vile creature !\n";
}
So here $_ is searched for
'the faq'. Guess what we don't need ! The =~. This works just as well:
if (/the faq/) {
because if you don't specify a variable, then perl searches $_
by default. In this particular case, it would be better
to use
if ($_ eq "the faq") {
as we are testing for exact matches.
But what if someone enters 'The
FAQ' ? It fails, because the regex is case sensitive. We can easily
fix that :
if (/the faq/i) {
with the /i switch, which
specifies case-insensitivity. Now it works for all variations, such as
"the Faq" and "the FAQ".
Now you can appreciate why a
regular expression is better in this situation than a simple test
using eq. As the regex
searches one string for another string, a response of "I would
read the FAQ first !" will also work, because "the FAQ"
will match the regex.
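The difference between the two tests is easy to check. In this sketch the answer is set in the program rather than typed in, and $exact and $found are made-up flag variables.

```perl
$answer="I would read the FAQ first !";
$exact=0;
$found=0;
$exact=1 if $answer eq "the faq";      # the whole string must be identical
$found=1 if $answer=~/the faq/i;       # 'the faq' anywhere, in any case
print "eq matched: $exact, regex matched: $found\n";
```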
Study this example just to clarify
the above. Tabs and spaces have been added for aesthetic beauty :
$_="perl for Win32"; # sets the string to be searched
if ($_=~/perl/) { print "Found perl\n" }; # is 'perl' inside $_ ? $_ is "perl for Win32".
if (/perl/) { print "Found perl\n" }; # same as the regex above. Don't need the =~ as we are testing $_
if (/PeRl/) { print "Found PeRl\n" }; # this will fail because of case sensitivity
if (/er/) { print "Found er\n" }; # this will work, because there is an 'er' in 'perl'
if (/n3/) { print "Found n3\n" }; # this will work, because there is an 'n3' in 'Win32'
if (/win32/) { print "Found win32\n" }; # this will fail because of case sensitivity
if (/win32/i) { print "Found win32 (i)\n" }; # this will *work* because of case insensitivity (note the /i)
print "Found!\n" if / /; # another way of doing it, this time looking for a space
print "Found!!\n" unless $_!~/ /; # both these are the same, but reversing the logic with unless and !
print "Found!!\n" unless !/ /; # don't do this, it will always never not confuse nobody :-)
# the ~ stays the same, but = is changed to ! (negation)
$find=32; # Create some variables to search for
$find2=" for "; # some spaces in the variable too
if (/$find/) { print "Found '$find'\n" }; # you can search for variables like numbers
if (/$find2/) { print "Found '$find2'\n" }; # and of course strings !
print "Found $find2\n" if /$find2/; # different way to do the above
As you can see from the last example, you can embed a variable in the
regex too. Regular expressions could fill entire books (and they have
done, see the book critiques at http://www.perl.com/) but here are
some useful tricks:
@names=qw(Karlson Carleon Karla Carla Karin Carina Needanotherword);
foreach (@names) { # sets each element of @names to $_ in turn
if (/[KC]arl/) { # this line will be changed a few times in the examples below
print "Match ! $_\n";
} else {
print "Sorry. $_\n";
}
}
This time @names is
initialised using whitespace as a delimiter instead of a comma. qw
refers to 'quote words', which means split the list by
words. A word ends with whitespace (like tabs, spaces, newlines etc).
The square brackets enclose single
characters to be matched. Here either Karl or Carl must
be in each element. It doesn't have to be two characters, and you can
use more than one set. Change the if line in the above program to :
if (/[KCZ]arl[sa]/) {
which matches if something contains K, C, or Z, then arl, then either s
or a. It does not match KCZarl. Negation is possible too, so
try this :
if (/[KCZ]arl[^sa]/) {
which returns things containing K, C or Z,
then arl, and then anything EXCEPT s or a. There does have to be
a character there, though, so a plain Karl would not match. The
caret ^ has to be the first
character, otherwise it doesn't work as the negation. Having said [ ]
defines single characters only, I should mention that
these two are the same :
/[abcdeZ]arl/;
/[a-eZ]arl/;
If you use a hyphen then you get the list of characters including the
start and finish characters. And if you want to match a special
character (metacharacter), you must escape it :
/[\-K]arl/;
matches Karl or -arl. Although the -
character is represented by two characters, it
is just the one character to match.
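Rather than editing the if line over and over, this sketch tries three of the character classes in one pass and collects the names that match each. The arrays @plain, @with and @without are made up for the illustration.

```perl
@names=qw(Karlson Carleon Karla Carla Karin Carina Needanotherword);
foreach (@names) {
    push (@plain,   $_) if /[KC]arl/;        # K or C, then arl
    push (@with,    $_) if /[KC]arl[sa]/;    # ... followed by s or a
    push (@without, $_) if /[KC]arl[^sa]/;   # ... followed by anything but s or a
}
print '[KC]arl      : ', "@plain\n";
print '[KC]arl[sa]  : ', "@with\n";
print '[KC]arl[^sa] : ', "@without\n";
```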
If you want to match at the end of
the line, make sure a $ is
the last character in the regex. This one pulls out all those names
ending in a. Slot it into the example above :
if (/a$/) {
And there is a corresponding character, the caret ^, which in this
context matches at the beginning
of the string. Yes, the caret also negates a character class like
this [^KCZ]arl but in this
case it anchors the match to the beginning of the string.
if (/n/i) {
if (/^n/i) {
The first one is true if the word contains an 'n' anywhere in it. The
second specifies that the 'n' must be at the beginning of the string
to be matched. Use this anchor where you can, because it makes the
whole regex faster, and safer if you know what the first character
must be.
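The same collecting trick shows both anchors at work on the list of names. Again the arrays @anywhere, @start and @end are just names chosen for this sketch.

```perl
@names=qw(Karlson Carleon Karla Carla Karin Carina Needanotherword);
foreach (@names) {
    push (@anywhere, $_) if /n/i;     # an 'n' anywhere in the name
    push (@start,    $_) if /^n/i;    # an 'n' at the very beginning
    push (@end,      $_) if /a$/;     # an 'a' at the very end
}
print "n anywhere : @anywhere\n";
print "n at start : @start\n";
print "a at end   : @end\n";
```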
If you want to negate the entire
regex change =~ to !~
(Remember ! means 'not'.)
if ($_ !~/[KC]arl/) {
Of course, as we are testing $_ this
works too :
if (!/[KC]arl/) {
Returning the Match
Now things get interesting. What if
we want to pull something out of a string ? So far all we have done is
test for truth, that is say yea or nay if a string matches, but not
return what we found. Run this :
$_='My email address is <Robert@NetCat.co.uk>.';
/(<robert\@netcat.co.uk>)/i;
print "Found it ! $1\n";
Firstly, note the single quotes when $_
is assigned. If there were double quotes, we'd need \@ instead of @.
Remember, double quotes "" allow
variable interpolation, so Perl would look for an array called @NetCat
which does not exist.
Secondly, look at the parens around
the entire regex. If you use parens, a side effect is that the first
match is put into a variable called $1.
We'll get to the main effect later. The second match goes into $2
and so on. Also note that the \@
has been escaped, so perl doesn't think it is an array.
Remember \ either escapes a
special character, or gives a special meaning. Think of it as
Superman's telephone box. Imagine Clark Kent walking around with
his magic partner Back Slash.
Notice how we specify
case-insensitivity in the regex with /i and
the regex returns the string in its original case - that is, exactly
what it found.
Try the regex without parens. Then
try this one :
/<(robert)\@netcat.co.uk>/i;
You can put the parens anywhere. More or less. Now, run this :
$_='My email address is <Robert@NetCat.co.uk>.';
/<(robert)\@(netcat.co.uk)>/i;
print "Found it ! $1 at $2\n";
See, you can have more than one ! Look at the above regex. Looks easy
now, don't you think ? What about five minutes ago ? It would have
looked like a typing mistake ! Well, there are some hairier regex to
come, but you'll have a good barber.
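One habit worth picking up early: $1 and $2 are only set by a successful match, and after a failed match they still hold whatever the last successful one left behind. So test the match first and copy the values straight away. A sketch, with $user and $host as made-up names:

```perl
$_='My email address is <Robert@NetCat.co.uk>.';
if (/<(robert)\@(netcat.co.uk)>/i) {
    ($user, $host)=($1, $2);        # copy them before another match changes them
    print "Found it ! $user at $host\n";
} else {
    print "No address found\n";     # $1 and $2 cannot be trusted here
}
```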
What if we didn't know what the
email address was going to be ?
$_='My email address is <webslave@work.com>.';
print "Found it ! :$1:" if /(<.*>)/i;
When you see an if statement
like this, read it right to left. The print
statement is only executed if code on the right of the
expression is true.
We'll discuss this. Firstly, we
have the opening parens (.
So everything from ( to ) will be put into $1 if
the match is successful. Then the first character of what we are
searching for, <. Then
we have a dot, or period.
For this regex, we can assume . matches
any character at all.
So we are now matching <
followed by any character. The *
means 0 or more of the previous character. The regex
finishes by requiring >.
This is important. Get the basics
right and all regex are easy (I read somewhere once). An example best
illustrates the point. Slot this regex in instead :
$_='My email address is <webslave@work.com>.';
print "Found it ! :$1:" if /(<*>)/i;
What's happening here ?
The regex starts, logically, at the
start of the string. The first thing it finds is not <, but a nothing
in between the start of the string and
the 'M' from 'My email...'. Does this match ?
As we are looking for "0 or
more" <, we can
certainly say that there are 0 < at
the start of the string. So the match is, so far, successful. We have
dealt with <*.
However, the next item to match is >. Unfortunately, the
next item in the string is 'M', from 'My email...'. The match
fails at this point. Sure, it matched <*
without any problem, but the complete match has
to work.
The only two characters that can
match successfully at this point are < or >. The < matches because it
falls into the '0 or more' specified with *, and > will
match because it is the next character specified in the regex.
'M' is neither of them, so it
fails.
Quick clarification - the regex
cannot successfully match <,
then skip on ahead until it matches >. The characters in between
< and > also need to match the regex, and they
don't in this case.
All is not lost. Regexes are hardy
little beasts and don't give up easily. An attempt is made to match
the regex wherever possible. The regex system keeps trying the match
at every possible place in the string, working towards the end.
Let's look at the match when it
reaches the point just after the 'm' in 'work.com'.
Again, we have here 0 <. So the match works as before. After success
on <* the next character is analysed - it is a >, so the match is
successful.
That's * explained. Just to consolidate, a quick look at :
$_='My email address is <webslave@work.com>.';
print "Match 1 worked :$1:" if /(<*)/i;
$_='<My email address is <webslave@work.com>.';
print "Match 2 worked :$1:" if /(<*)/i;
$_='My email address is <webslave@work.com<<<<>.';
print "Match 3 worked :$1:" if /(<*>)/i;
Match 1 is true. It doesn't return anything, but it is true because
there are 0 < at the
very start of the string.
Match 2 works. After the 0 <
at the start of the string, there is 1 <
so the regex can match that too.
Match 3 works. After failing on
the first <, it jumps
to the second. After that, there are plenty more to match right up
until the required ending.
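If you would rather see exactly what each of the three matches captured, this sketch copies $1 into a variable after each one. The names $match1 to $match3 are made up.

```perl
$_='My email address is <webslave@work.com>.';
/(<*)/;   $match1=$1;     # nothing at all: 0 < at the very start still counts
$_='<My email address is <webslave@work.com>.';
/(<*)/;   $match2=$1;     # the single < at the start
$_='My email address is <webslave@work.com<<<<>.';
/(<*>)/;  $match3=$1;     # the run of < followed by >
print "Match 1 :$match1:\n";
print "Match 2 :$match2:\n";
print "Match 3 :$match3:\n";
```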
Glad you followed that. Now, pay
even closer attention ! Concentrate fully on the task at hand ! This
should be straightforward now :
$_='HTML <I>munging</I> time !.';
/<I>(.*)<\/I>/i;
print "Found it ! $1\n";
Pretty much the same as the above, except the parens are moved so we
return what's only inside the tags, not including the tags
themselves. Also note how / is
escaped like so : \/ otherwise
Perl thinks that's the end of the regex.
Now, suppose we change $_ to :
$_='HTML <I>munging</I> time is here <I>again</I> !.';
and run it again. Interesting effect, eh ? This is known as Greedy
Matching. What happens is that when Perl finds the initial match, that
is <I>, it jumps right
to the end of the string and works back from there to find a match,
so the longest string matches. This is fine unless you want the
shortest string. And there is a solution :
/<I>(.*?)<\/I>/i;
Just add a question mark and Perl does stingy matching. No
nationalistic jokes. I have Dutch and Scottish friends I don't want
to offend.
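The two sorts of matching can be compared directly on the same string. In this sketch $greedy and $stingy are made-up names for the two captures.

```perl
$_='HTML <I>munging</I> time is here <I>again</I> !.';
/<I>(.*)<\/I>/i;    $greedy=$1;     # as much as possible
/<I>(.*?)<\/I>/i;   $stingy=$1;     # as little as possible
print "Greedy : $greedy\n";
print "Stingy : $stingy\n";
```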
Suppose we didn't know what HTML
tag we had to match ? It could be B, I, EM or whatever, and we want
everything that is in between. Well, HTML container tags like B and
EM have end tags which are the same as the start tag, except for the
/ . So what we could do is :
find out what is inside < >
search for exactly the same tag, but with the closing /
return whatever is in between.
Can this be done ? Of course. This
is perl, all things are possible. Now, remember the side effect of
parens. I promise I'll explain the primary effect at some point. If
whatever is in (parens) matches, the result is stored in a variable
called $1. So we can use <(.*?)> which will
find us < then as many
anythings (the . and *) up to the next, not last, >
(the ? forces stingy matching).
The result is stored in $1
because we used parens. Next, we need everything up to
the closing tag. That's easy : (.*?)
matches everything up until the next character or set
of characters. And how exactly do we define where to stop ?
We can use $1
even in the same regex it was found in. However, it is
not referred to within a regex as $1, but \1.
So we want to match </$1>
which in perl code is <\/\1>. The / must
be escaped because it would otherwise end the regex, and 1
is escaped so it refers to $1
instead of matching the number 1.
Still here ? This is what it looks
like :
$_='HTML <I>munging</I> time is here <I>again</I> !.';
/<(.*?)>(.*?)<\/\1>/i;
print "Found it ! $2\n";
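To convince yourself that \1 really does insist on the same tag, try the regex on a string where the first pair of tags doesn't match up. The string and the names $tag and $inside are invented for this sketch.

```perl
$_='Some <B>bold</I> text and <EM>fun</EM> !';
if (/<(.*?)>(.*?)<\/\1>/) {
    ($tag, $inside)=($1, $2);
    print "Found the tag $tag with $inside inside\n";
}
```

The mismatched <B>...</I> pair is passed over, and the regex settles on the EM tags.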
If you want to know how to return all the matches above, read on. But
before that - How to Avoid Making Mountains while Escaping Special
Characters.
You want to match this : http://language.perl.com/faq/ .
That's a real (useful)
URL by the way. Hint. To match it, you need to do this :
/http:\/\/language\.perl\.com\/faq\//;
which should make the awful metaphor above clearer, if not funnier.
The slash, /, is not
normally a metacharacter but as it is being used for the regular
expression delimiters, it needs to be escaped. We already know that .
is special.
Fortunately for our eyes, Perl
allows you to pick your delimiter if you prefix it with 'm' as this
example shows. We'll use a # :
m#http://language\.perl\.com/faq/#;
Which is a huge improvement, as we change / to #. We can
go further with readability by quoting everything :
m#\Qhttp://language.perl.com/faq/\E#;
The \Q escapes everything
up until \E or the regex
delimiter (so we don't really need the \E above). In this case #
will not be escaped, as it delimits the regex.
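\Q is at its most useful when the thing to match is sitting in a variable. In this sketch the second test string, with an X where the dot should be, is invented to show the difference, and the flag names are made up too.

```perl
$url='http://language.perl.com/faq/';
$good='See http://language.perl.com/faq/ for details';
$bad ='See http://languageXperl.com/faq/ for details';
$loose=0; $strict_good=0; $strict_bad=0;
$loose=1       if $bad =~m#$url#;         # unquoted: each . matches any character
$strict_good=1 if $good=~m#\Q$url\E#;     # quoted: each . is a literal dot
$strict_bad=1  if $bad =~m#\Q$url\E#;
print "Unquoted matched the wrong URL : $loose\n";
print "Quoted matched the right URL   : $strict_good\n";
print "Quoted matched the wrong URL   : $strict_bad\n";
```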
Someone once posted a question
about this to the Perl-Win32-Users mailing list and I was so
intrigued about this apparently undocumented trick I spent the next
twenty minutes figuring it out by trial and error, and posted a
reply. Next day I found lots of messages telling the poster to read
the manual because it was clearly documented. <face colour='red'
intensity='high'> My excuse was I didn't have the docs to
hand....moral of the story - RTFM and RTF FAQs !
Substitution
Suppose you want to replace bits of
a string. For example, 'us' with 'them'.
$_='Us ? The bus usually waits for us, unless the driver forgets us.';
print "$_\n";
s/Us/them/; # operates on $_, otherwise you need $foo=~s/Us/them/;
print "$_\n";
What happens here is that the string 'Us' is searched for, and when a
match is found it is replaced with the right side of the expression,
in this case 'them'. Simple.
You'll notice that only one
substitution was made. To match globally use /g
which runs through the entire string, changing wherever
it can. Try:
s/Us/them/g;
which fails to change anything more. This is because regexes are, by
default, case-sensitive, and the other occurrences are all in lower
case. So:
s/us/them/ig;
would be a better bet. Now, everything is changed. A little too much,
but one problem at a time. Everything you have learnt about regex so
far can be used with s///,
like parens, character classes [ ],
greedy and stingy matching and much more. Deleting things is easy
too. Just specify nothing as the replacement, like so
s/Us//; .
So we can use some of that
knowledge to fix this problem. We need to make sure that a space
precedes the 'us'. What about :
s/ us/them/g;
A small improvement. The first 'Us' is now no longer changed, but
one problem at a time ! We'll first consider the problem of the regex
changing 'usually' and other words with 'us' in them.
What we are looking for is a space,
then 'us', then a comma, period or space. We know how to specify one
of a number of options - the character class.
s/ us[. ,]/them/g;
Another tiny step. Unfortunately, that step wasn't really in the right
direction, more on the slippery slope to Poor Programming Practice.
Why ? Because we are limiting ourselves. Suppose someone wrote ' send
it to us; when we get it'.
You can't think of all the possible
permutations. It is often easier, and safer, to simply state what
must not follow the match. In this case, it can be anything
except a letter. We can define that as a-z. So we can add that to the
regex.
s/ us[^a-z]/ them/g;
The caret ^ negates the
character class, and a-z represents
every letter from a to z inclusive. A space has been added to the
substitution part - as the original space was matched, it should be
replaced to maintain readability.
What would be more useful is to use
a-zA-Z instead. As we are no longer
using /i we need
that. As a-zA-Z is such a
common construct, Perl provides an easy shorthand :
s/ us[^\w]/ them/g;
The \w construct actually
means 'word' - equivalent to a-zA-Z_0-9.
So we'll use that instead.
To negate any construct, simply
capitalise it :
s/ us[\W]/ them/g;
and of course we don't need the negating caret now. In fact, we don't
even need the character class !
s/ us\W/ them/g;
So far, so good. Matching the first 'Us' is going to be difficult
though. Fortunately, there is an easy solution. We've seen Perl's
definition of a word - \w.
Between each word is a boundary. You can match this with \b.
The /i switch goes back on too, so that 'Us' matches as well as 'us'.
s/\bus\W/ them/ig;
(that's \b followed by
'us', not 'bus' :-)
Now, we require a word boundary before 'us'.
As there is a 'nothing' at the start of the string, we have a match.
There is a space after the first 'Us', so the match is successful.
You might notice an extra space has crept in - that's the space we
added earlier. The match doesn't include the space any more - it
matches on the word boundary, that is just before the word begins.
The space doesn't count.
Did you notice the final period and
the comma are replaced ? They are part of the match - it is the \W
that matches them. We can't avoid that. We can however
put back that part of the match.
s/\bus(\W)/them$1/ig;
We start with capturing whatever the \W
matches, using parens. Then, we add it to the
replacement string. The capture is of course in $1.
Inside the regex itself we would refer to it as \1, but the
replacement string is not part of the regex, so $1 is the right name
there (\1 is tolerated in the replacement, but perl -w will grumble).
The final problem is of course
capitalising the replacement string when appropriate. Well, I have to
leave something as an exercise for the reader :-)
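Leaving the capitalisation aside, here is a sketch which puts the first attempt and the last one side by side on the same string. The names $original, $first and $final are made up for the comparison.

```perl
$original='Us ? The bus usually waits for us, unless the driver forgets us.';
$_=$original;
s/us/them/ig;               # the first attempt: hits 'bus' and 'usually' too
$first=$_;
$_=$original;
s/\bus(\W)/them$1/ig;       # whole word only, punctuation put back
$final=$_;
print "First try : $first\n";
print "Final try : $final\n";
```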
There are several more constructs.
We'll take a quick look at \d which
means anything that is a digit, that is 0-9. First we'll use the
negated form, \D, which is anything except 0-9 :
print "Enter a number :";
chomp ($input=<STDIN>);
if ($input=~/\D/) {
print "Not a number !!!!\n";
} else {
print 'Your answer is ',$input x 3,"\n";
}
This checks that there are no non-number characters in $input.
It's not perfect because it'll choke on decimal
points, but it's just an example. Writing your own number-checker is
actually quite difficult, but it is an interesting exercise. Try it,
and see how accurate yours is.
I hope you trusted me and typed the
above in exactly as it is shown (or pasted it), because the x
is not a mistake, it is a feature. If you were too
smart and changed it to a * or
something change it back and see what it does.
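If you would rather not type numbers in to find out, this sketch shows the two operators next to each other. The value 7 and the names $repeated and $product are chosen only for the illustration.

```perl
$input=7;
$repeated=$input x 3;       # x repeats the string three times
$product =$input * 3;       # * does the arithmetic
print "x gives $repeated\n";
print "* gives $product\n";
```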
Of course, there is another way to
do it :
unless ($input=~/\D/) {
print 'Your answer is ',$input x 3,"\n";
} else {
print "Not a number !!!!\n";
}
which reverses the logic with an unless statement.
More Matching
Assume we have :
$_='HTML <I>munging</I> time is here <I>again</I> !.';
and we want to find all the italic words. We know that /g
will match globally, so surely this will work :
$_='HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';
$match=/<i>(.*?)<\/i>/ig;
print "$match\n";
except it returns 1, and there were definitely two matches. In scalar
context the match operator returns true or false, not the number of
matches. So you can
test it for truth with statements like if, while, unless.
Incidentally, the s///
operator does return the number of substitutions.
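A quick sketch of that return value, reusing the bus string from the substitution section. $changed is a made-up name.

```perl
$_='Us ? The bus usually waits for us, unless the driver forgets us.';
$changed=s/us/them/ig;      # the return value is the number of substitutions
print "Made $changed substitutions\n";
```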
To return what is matched, you need
to supply a list.
($match) = /<i>(.*?)<\/i>/i;
which handily puts the first match into $match. Note that this is a
plain = assignment, not =~, as the regex is searching $_.
The parens around $match force a list context in this case. There
is just the one element in the list, but it is still a list.
Whatever the parens in the regex capture is assigned to the list.
Try adding another variable :
$_='HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';
($word1, $word2) = /<i>(.*?)<\/i>/ig;
print "Word 1 is $word1 and Word 2 is $word2\n";
In the example above notice /g has
been added so a global match is done - this means perl
carries on matching even after it finds the first match. Of course,
you might not know how many matches there will be, so you can just
use an array (or other type of list) :
$_='HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';
@words = /<i>(.*?)<\/i>/ig;
foreach $word (@words) {
print "Found $word\n";
}
and @words will be grown to
the appropriate size for the matches. You really can supply what you
like to be assigned to :
($word1, @words[2..3], $last) = /<i>(.*?)<\/i>/ig;
you'll need more italics for that last one to work. It was only a
demonstration.
There is another trick worth
knowing. Because a regex returns true each time it matches, we can
test that and do something every time it returns true. The ideal
function is while, which
means 'do something as long as the condition I'm testing is true'. In
this case, we'll print out the match every time it is true.
$_='HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';
while (/<(.*?)>(.*?)<\/\1>/g) {
print "Found the HTML tag $1 which has $2 inside\n";
}
So the while operator runs the regex, and if it is true, carries out
the statements inside the block.
Try running the program above
without the /g
. Notice how
it loops forever ? That's because the expression always evaluates to
true. By using the /g
we
force the match to move on until it eventually fails.
Now we know this, an easy way to
find the number of matches is :
$_='HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';
$found++ while /<i>.*?<\/i>/ig;
print "Found $found matches\n";
You don't need braces in this case as nothing apart from the
expression to be evaluated follows the while
function.
Parentheses Again
The real use for them. Precedence.
Try this :
$_='One word sentences ? Eliminate. Avoid cliches like the plague. They are old hat.';
while (/o(rd|ne|ld)/gi) {
print "Matched $1\n";
}
Firstly, notice the subtle introduction of the or
operator, in this case |
,
the pipe. What I really want to explain however, is that this regex
matches o followed by rd, ne or ld. Without the parens it would be
/ord|ne|ld/
which is
definitely not what we want. That matches just plain ord, or ne or ld.
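To see the two side by side, here is a sketch that collects the matches from each version into an array. It uses (?: ), which groups exactly like ordinary parens but doesn't capture anything. That construct isn't covered in this section, so treat it as a convenience for this example only.

```perl
use strict;
use warnings;

my $text = 'One word sentences ? Eliminate. Avoid cliches like the plague. They are old hat.';

# Grouped: an 'o' followed by one of rd, ne or ld.
# With no capturing parens, /g in list context returns the whole matches.
my @grouped = $text =~ /o(?:rd|ne|ld)/gi;

# Ungrouped: 'ord', or 'ne', or 'ld', each on its own.
my @ungrouped = $text =~ /ord|ne|ld/gi;

print "Grouped:   @grouped\n";
print "Ungrouped: @ungrouped\n";
```

The ungrouped version finds the same places in the text, but only 'ord' still needs its leading o.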
Finally, take a look at this :
$_='I am sleepy....zzzz....DING ! Wake Up!';
if (/(z{5})/) {
print "Matched $1\n";
} else {
print "Match failed\n";
}
The braces { }
specify how
many of the preceding character to match. So z{2}
matches exactly two 'z's and so on. Change z{5}
to z{4}
and
see how it works. And there's more...
/z{3}/    | 3 z only
/z{3,}/   | At least 3 z
/z{1,3}/  | 1 to 3 z
/z{4,8}/  | 4 to 8 z
To any of the above you may suffix
a question mark, the effect of which is demonstrated thus :
print "How many letters do you want to match ? ";
chomp($num=<STDIN>);
# we assign and print in one smooth move
print $_="The lowest form of wit is indeed sarcasm, I don't think.\n";
print "Matched \\w{$num,} : $1 \n" if /(\w{$num,})/;
print "Matched \\w{$num,?}: $1 \n" if /(\w{$num,}?)/;
The first match is 'match any word (that's a-zA-Z0-9_)
equal to or longer than $num characters,
and return it.' So if you enter 4, then 'lowest' is returned. The
word 'The' doesn't match.
The second match is exactly the
same, but the ?
forces a
minimal match, so only the part actually matched is returned.
Just to clear this up, amend the
program thus :
print "\nMatched \\w{$num,} :";
print "$1 " while /(\w{$num,})/g;
print "\nMatched \\w{$num,?} :";
print "$1 " while /(\w{$num,}?)/g;
Note the addition of /g
.
Try it without - notice how the match never moves on ?
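If the word example is still a bit murky, the same effect shows up more starkly on a run of identical letters. This is a minimal sketch with a made-up string containing six z's.

```perl
use strict;
use warnings;

my $text = "I am sleepy....zzzzzz....DING !";

# Without the question mark the quantifier grabs as much as it can.
my ($greedy)  = $text =~ /(z{2,})/;

# With it, the quantifier stops as soon as the minimum is satisfied.
my ($minimal) = $text =~ /(z{2,}?)/;

print "Greedy : $greedy\n";
print "Minimal: $minimal\n";
```

Both regexes match at the same place. The only difference is how far each one carries on once it has started.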
And now on the Regex Programme
Today, we have guest stars Prematch, Postmatch and Match. All of whom
are going to slow our entire programme down, but are useful anyway :
$_='I am sleepy....snore....DING ! Wake Up!';
/snore/; # look, no parens !
print "Postmatch: $'\n";
print "Prematch: $`\n";
print "Match: $&\n";
If you are wondering what the difference between match and using
parens is you should remember that you can move the parens around,
but you can't vary what $& and
its ilk return. Also, using any of the above three variables does
slow your entire program, whereas using parens will just slow the
particular regex you use them for. However, once you've used one of
the three matches you might as well use them all over the place as
you've paid the speed penalty. Use parens where possible.
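As a rough sketch of what 'use parens' can look like, the three pieces can be captured in one go. The variable names here are invented for the example, and it assumes the string has no newlines in it, because a dot won't match one.

```perl
use strict;
use warnings;

my $text = 'I am sleepy....snore....DING ! Wake Up!';

# Three sets of parens: everything before, the match itself, everything after.
# The .*? is minimal so that it stops at the first 'snore'.
my ($before, $match, $after) = $text =~ /^(.*?)(snore)(.*)$/;

print "Prematch: $before\n";
print "Match: $match\n";
print "Postmatch: $after\n";
```

It is more typing than the special variables, but only this one regex pays for it.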
RHS Expressions
RHS means Right Hand Side. Suppose
we have an HTML file, which contains :
<FONT SIZE=2> <FONT SIZE=4> <FONT SIZE=6>
and we wish to double the size of each font so 2 becomes 4 and 4
becomes 8 etc. What about :
$data="<FONT SIZE=2> <FONT SIZE=4> <FONT SIZE=6>";
print "$data\n";
$data=~s/(size=)(\d)/\1\2 * 2/ig;
print "$data\n";
which doesn't really work out. What this does is match size=x
,
where x
is any digit. The first match, size=
,
goes into $1
and the second
match, whatever the digit is, goes into $2
. The second part of the regex simply prints $1
and $2
(referred
to as \1
and \2
), and attempts to multiply $2
by 2. Remember /i means
case insensitive matching.
What we need to do is evaluate the
right hand side of the regex as an expression - that is not just
print out what it says, but actually evaluate it. That means work it
through, not blindly treat it as string. Perl can do this :
$data=~s/(size=)(\d)/$1.($2 * 2)/eig;
A little explanation....the LHS is the same as before. We add /e
so Perl evaluates the RHS as an expression. So we need
to change \1
into $1
and so on. The parens are there to ensure that $2
* 2
is evaluated, then joined to $1
. And that's it ! Here's another example, which is a
little more cunning :
$red="96000";
$white="FFFFF";
$yellow="FFFF33";
$data='<FONT COLOR=yellow> <FONT COLOR=white> <FONT COLOR=red>';
print "$data\n";
$data=~s/(color=)(\w+)/$1.${$2}/eig;
print "$data\n";
This one is interesting because it refers to a variable of the same
name as the replaced string. The { }
are
needed around $2
to force
Perl to realise that you mean a scalar variable with the name of
whatever $2 contains. The
\w
matches a single word
character, +
means 'one or
more of' the character immediately before it. In this case, \w
.
Of course, this regex does not
consider quoted parameters to HTML tags but I have to leave something
as an exercise for the reader...
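Since /e turns the right hand side into ordinary Perl code, it isn't limited to arithmetic or variable lookups. It can call a function as well. A minimal sketch, using the built-in ucfirst and a made-up list of names:

```perl
use strict;
use warnings;

my $names = "camel, llama, alpaca";

# With /e the replacement is evaluated as code for every match.
# ucfirst returns its argument with the first letter in uppercase.
$names =~ s/(\w+)/ucfirst($1)/ge;

print "$names\n";
```

Without the /e you would get the literal text ucfirst(camel) and so on, which is rarely what anybody wants.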
It is even possible to have more
than one /e
. For example:
$data='important perl names are $names';
$names="Camel, Llama, ActiveState, Perl";
print "$data\n";
$data=~s/(\$[a-zA-Z]+)/$1/ee;
print "$data\n";
This is very useful. Notice that \w
is
not used. This is because \w
will
match [a-zA-Z_0-9]
and Perl
variables may not start with a number. This is because $1,
$2
etc are of course reserved for use by regex. You
could write a more complicated regex to more precisely match
variables, but that's a start.
Split and Join
While you are in the regex mood, a
quick look at split
and
join
. Destruction is
always easier (just ask your car mechanic), so let's start with split
.
$_='Piper:PA-28:Archer:OO-ROB:Antwerp';
@details=split /:/, $_;
foreach (@details) {
print "$_\n";
}
Here split is given
two arguments. The first one is a regex specifying what to split on.
The next is what to split. Actually, I could leave $_
out because as usual it is the default if nothing is
specified.
The assignment can either be a
scalar variable or a list like an array (or hash, but at this time
'hash' to you means what you think the Dutch do or a silly drinking
event spoilt by some running). If it's a scalar variable you get the
number of elements the split has splut. Should that be 'the split has
splittered' or 'the split has splat'. Hmmm. Probably 'the split has
split'. You know what I mean. I think I just generated a Fatal Error
in English.dll. Whoops. In any case, splitting to a scalar variable
is not always a Good Thing, as we'll see later.
If the assignment is an array, then
as you can see in the above example the array is created with the
relevant elements in order. You can also assign to scalars, for
example :
$_='Piper:PA-28:Archer:OO-ROB:Antwerp';
($maker,$model,$name,$reg,$location) = split /:/, $_;
(@aircraft[0..1],$aname,@regdetails) = split /:/, $_;
$number=split /:/ ; # not bothering with the $_ at the end, as it is the default
print "Using the first 'split'\n";
print "$reg is a $maker $model $name based in $location\n";
print "There are $number details available on this aircraft\n\n";
print "Using the second 'split'\n";
print "You can find $regdetails[0], an $aircraft[1], $regdetails[1]\n";
This demonstrates that a list can be a list of scalar variables
(which is basically what an array is anyway), and that you can easily
see how many elements the expression can be split into.
The example below adds a third
parameter to split, which is the maximum number of elements you want
returned. If you don't want the extra stuff at the end, pop
it.
$_='Piper:PA-28:Archer:OO-ROB:Antwerp';
@details=split /:/, $_, 3;
foreach (@details) {
print "$_\n";
}
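To make it clearer what 'the extra stuff at the end' is, here is a sketch that looks at the last element and then pops it. The names $record and $rest are invented for the illustration.

```perl
use strict;
use warnings;

my $record = 'Piper:PA-28:Archer:OO-ROB:Antwerp';

# Ask for at most three elements. The third one holds everything
# that was left over, delimiters and all.
my @details = split /:/, $record, 3;
print scalar(@details), " elements\n";

# pop removes and returns the last element of the array.
my $rest = pop @details;
print "Left over: $rest\n";
print "Kept: @details\n";
```

So the limit doesn't throw anything away. It just stops splitting once it has enough pieces.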
In the example below we split
on
whitespace. Whitespace, in perl terms, is a space, tab,
newline, formfeed or carriage return. Instead of writing \t\n\f\r
for each of the above, you can simply use \s
, or the negated version \S
which means anything except whitespace. Think of
whitespace as anything you know is there, but you can't see.
The whitespace split
is specially optimised for speed. I've used spaces,
double spaces, a tab and a newline in the list below. Also note the +
, which means one or more of the preceding character,
so it will split
on any
combination of whitespace. And I think the final split
is useful to know. The split
function does not return the delimiter, so in this case
the whitespace will not be returned.
$_='Piper PA-28 Archer OO-ROB
Antwerp';
@details=split /\s+/, $_;
foreach (@details) {
print "$_\n";
}
@chars=split //, $details[0];
foreach $char (@chars) {
print "$char !\n";
}
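One wrinkle with the whitespace split above is what happens when the data starts with whitespace. The sketch below compares /\s+/ with the special case of splitting on a single-space string, which is described in the split documentation. The sample line is made up.

```perl
use strict;
use warnings;

my $line = "  Piper PA-28\tArcher";   # note the two leading spaces

# The regex sees a delimiter right at the start, so the first field is empty.
my @by_regex = split /\s+/, $line;

# A string of one space is a special case: it splits on whitespace
# like /\s+/ does, but skips any leading whitespace first.
my @by_string = split ' ', $line;

print scalar(@by_regex),  " fields with /\\s+/\n";
print scalar(@by_string), " fields with ' '\n";
```

If your input might be indented, the single-space form saves you from an empty first element.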
The following question has come up at least three times in the
Perl-Win32-Users mailing list. Can you answer it ?
"My data is delimited by |, for example :
name|age|sex|height|
Why doesn't
@array=split /|/, $line;
work ?"
Why indeed. If you don't already know the answer, some simple
troubleshooting steps can be applied. First, create a sample program
and run it.
$line='name|age|sex|height';
@array=split /|/,$line;
print join "\n",@array;
The effect is to split
each
character. The | is
returned. As it is the delimiter, |
should be ignored, not returned.
At this point you should be
thinking 'metacharacter'. A little research (looking at the
documentation) will reveal that |
is
indeed a metacharacter, which means 'or'. So, in effect, the regex
/|/
means 'nothing, or
nothing'. The split
is
therefore performed on 'nothings', and there are 'nothings' in
between each character. The solution is easy ; /\|/
.
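Here is the fix as a runnable sketch, plus one variation for when the delimiter lives in a variable. The \Q and \E pair escapes every metacharacter between them. It isn't covered in this section, and $delim is an invented name.

```perl
use strict;
use warnings;

my $line = 'name|age|sex|height';

# The backslash turns | from 'or' into a plain character.
my @fields = split /\|/, $line;
print join("\n", @fields), "\n";

# With the delimiter in a variable, \Q...\E does the escaping for you.
my $delim = '|';
my @again = split /\Q$delim\E/, $line;
print scalar(@again), " fields\n";
```

Either way the delimiter is now treated as data to look for, not as an instruction to the regex engine.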
So that's the fun stuff,
destruction. Now to put it back together again with join
.
What Humpty Dumpty needs : Join
$w1="Mission critical ?";
$w2="Internet ready modems !";
$w3="J(insert your cool phrase here)"; # anything prefixed by 'J' is now cool ;-)
$w4="y2k compatible.";
$w5="We know the Web.";
$w6="...the leading product in an emerging market.";
$cool=join ' ', $w1,$w2,$w3,$w4,$w5,$w6;
print $cool;
Join takes a 'glue' operator, which is not a regular
expression. It can be a scalar variable however. In this case it is a
space. Then it takes a list, which can either be a list of scalar
variables, an array or whatever as long as it's a list. And you can
see what the result is. You could assign it to an array, but you'd
end up with everything in the first element of the array.
The example below adds an array
into the list, and demonstrates use of a variable as the delimiter.
$w1="Mission critical ?";
$w2="Internet ready modems !";
$w3="J(insert your cool phrase here)"; # anything prefixed by 'J' is now cool ;-)
$w4="y2k approved, tested and safe !";
$w5="We know the Web.";
$w6="...the leading product in an emerging market.";
@morecool=("networkable","compatible");
$sep=" ";
$cool=join $sep, $w1,$w2,$w3,@morecool,$w4,$w5,$w6;
print $cool;
Aren't you wishing you could mix and match randomly so you too could
get a job marketing vapourware ? Heh.
@cool=("networkable","compatible","Mission critical ?","Internet ready modems !",
"J(insert your cool phrase here)","y2k approved, tested and safe !",
"We know the Web.","...the leading product in an emerging market.");
srand;
print "How many phrases would you like ?";
while (1) {
chop ($input=<STDIN>);
if ($input <= @cool && $input > 0) {
last;
}
print 'Wrong. Try again !';
}
for (1..$input) {
$index=int(rand @cool);
print "$cool[$index] ";
splice @cool, $index, 1;
}
A few things to explain. Firstly, while
(1) {
. We want an everlasting loop, and this is one way
to do it. 1 is always true, so round it goes. We could test $input
directly, but that wouldn't allow last
to be demonstrated.
Everlasting loops aren't useful
unless you are a politician being interviewed. We need to break out
at some point. This is done by the last
function. When $input
is
between 1 and the number of elements in @cool
then out we go. (You can also break out to labels, in
case you were wondering. And break out in a sweat. Don't start now if
you weren't.)
The srand
operator initialises the random number generator. Works
ok for us, but CGI programmers should think of something different
because their programs are so frequently run (they hope :-).
rand
generates a random number between 0 and 1, or 0 and a
number it is given. In this case, the number of elements of @cool
. The int
function
makes sure it is an integer, that is no messy bits after the decimal
point.
The splice
function removes the printed element from the array so
it won't appear again. Don't want to stress the point.
Another Join Type Operator
There is another joining operator,
this time the humble dot, or period : .
This concatenates (joins) variables :
$x="Hello";
$y=" World";
$z="\n";
print "$x\n"; # print $x and a newline
$prt=$x.$y.$z; # make a new var $prt out of $x, $y and $z
print $prt;
$x.=$y." again ".$z; # add stuff to $x
print $x;
Files
Perl is very good at handling
files. Create, in your perl scripts directory c:\scripts
,
a file called stuff.txt
. Copy the following into it :
The Main Perl Newsgroup:comp.lang.perl.misc
The Perl FAQ:http://www.perl.com/faq/
Where to download perl:http://www.activestate.com/
Now, to open and do things with this file. First, we must open the
file and assign it to a filehandle. All operations will be
done on the file via the filehandle. Earlier, we used <STDIN>
as a filehandle - we read from it.
$stuff="c:\scripts\stuff.txt";
open STUFF, $stuff;
while (<STUFF>) {
print "Line number $. is : $_";
}
What this script does is fail. What it should do is open the
file defined in $stuff
,
assign it to the filehandle STUFF
and
then, while there are still lines left in the file, print the line
number $.
and the current
line.
It fails. That's not so bad,
everything fails sometimes. What is unforgivable is NOT CHECKING THE
ERROR CODE !
This is a better line:
open STUFF, $stuff or die "Cannot open $stuff for read :$!";
If the open
operation
fails, the or
means that
the code on the RHS (right hand side) is evaluated. Perl dies. This
means it exits the script with a message and tells you the line number at
which it died. The error code is in $!
,
which we print.
Always check your return codes !
The problem should now be apparent.
The backslashes, being escape characters, are not displayed. There
are two ways to fix this :
Escape the backslashes, like
so $stuff="c:\\scripts\\stuff.txt";
Convert backslashes into
forward slashes : $stuff="c:/scripts/stuff.txt";
The forward slashes are the
preferred option, even under Win32, because you can then port the
script direct to Unix or other platforms (assuming you don't use
drive letters), and it is less typing. If you wish to use Perl to
start external processes then you must use the \\
method, but this variable will be used only in a Perl
program, not as a parameter to start an external program. Changing
the $stuff
variable results
in a working script. Always check your return codes !
$stuff="c:/scripts/stuff.txt";
open STUFF, $stuff or die "Cannot open $stuff for read :$!";
while (<STUFF>) {
print "Line $. is : $_";
}
A little more detail on what is happening here. The file is opened
for read. You can append and write too. You don't have to use
a variable, but I always do because it is then easy to insert into the or die
section, and it is easy to change later on. Hardcoding things is not the best
way to write a maintainable and flexible program. Just ask the Year
2000 people about code that lived a little longer than the authors
imagined :-).
open STUFF, "c:/scripts/stuff.txt" or die "Cannot open stuff.txt for read :$!";
is just as good but more work if you want to change anything.
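Perl versions from 5.6 onwards also accept a three-argument form of open, with the filehandle held in an ordinary scalar variable. This course sticks to the bareword style, so take the following as a sketch of the alternative, not a replacement for it. It writes its own throwaway file using the File::Temp module so that it runs anywhere, and the names $fh and $lines are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small temporary file to read back.
my ($tmp, $stuff) = tempfile();
print $tmp "The Perl FAQ:http://www.perl.com/faq/\n";
print $tmp "Where to download perl:http://www.activestate.com/\n";
close $tmp;

# Three arguments: filehandle, mode ('<' means read), filename.
open(my $fh, '<', $stuff) or die "Cannot open $stuff for read :$!";
my $lines = 0;
while (<$fh>) {
    print "Line $. is : $_";
    $lines++;
}
close $fh;
unlink $stuff;   # tidy up the temporary file
```

Keeping the mode apart from the filename means an odd character in the filename can't be mistaken for part of the mode. The return code is still checked, of course.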
The line input operator (that's the
angle brackets <>) reads
from the beginning of the file up until and including the first
newline. The read data goes into $_
,
and you can do what you want with it there. On the next iteration of
the loop data is read from where the last read left off, up to the
next newline. And so on until there is no more data. When that
happens the condition is false and the loop terminates. That's the
default behaviour, but we can change this.
This means that you can open a
200Mb file in perl and run through it without having to load the
entire file into memory. 200Mb of memory is quite a bit. If you
really want to load the entire 200Mb file into one variable, Perl
lets you. Limits are not the Perl Way.
The special variable $.
is the current line number, starting at 1.
As usual, there is a quicker way to
do the previous program.
$STUFF="c:/scripts/stuff.txt";
open STUFF or die "Cannot open $STUFF for read :$!";
while (<STUFF>) {
print "Line $. is : $_";
}
and as that saves a little bit of typing I tend to use it. Reduces
the possibility for error too. In fact, that entire program could be
compressed further, but that's for later.
Writing to a File
$out="c:/scripts/out.txt";
open OUT, ">$out" or die "Cannot open $out for write :$!";
for $i (1..10) {
print OUT "$i : The time is now : ",scalar(localtime),"\n";
}
Note the addition of >
to
the filename. This opens it for writing. If we want to print to the
filehandle, we now just specify the filehandle name. Filehandles
don't have to be capitalised, but it is wise. All Perl functions are
lowercase, and Perl is case-sensitive. So if you choose uppercase
names they are guaranteed not to conflict with current or future
function words.
And a neat way to grab the date
sneaked in there too. More on dates later.
$out="c:/scripts/out.txt";
&printfile;
open OUT, ">>$out" or die "Cannot open $out for append :$!";
print OUT 'The time is now : ',scalar(localtime),"\n";
close OUT;
&printfile;
sub printfile {
open IN, $out or die "Cannot open $out for read :$!";
while (<IN>) {
print;
}
close IN;
}
This script demonstrates subroutines again, and how to append to a
file, that is write additional data at the end. The close
function is introduced here. This, well, closes a
filehandle. You don't have to close a filehandle - just leave it open
until the script finishes, or the next open command to the same
filehandle will close it for you.
Perl has a special array called
@ARGV
. This is the list of
arguments passed along with the script name on the command line. Run
the following perl script as :
perl myscript.pl hello world how are you
foreach (@ARGV) {
print "$_\n";
}
Another useful way to get parameters into a program - this time
without user input. The relevance to filehandles is as follows. Run
the following perl script as :
perl myscript.pl stuff.txt out.txt
while (<>) {
print;
}
Short and sweet ? If you don't specify anything in the angle
brackets, whatever is in @ARGV
is
used instead. And after it finishes with the first file, it will
carry on with the next and so on. You'll need to remove non-file
elements from @ARGV
before
you use this.
It can be shorter still :
perl myscript.pl stuff.txt out.txt
print while <>
Read it right to left. It is possible to shorten it even further !
perl myscript.pl stuff.txt out.txt
print <>;
This takes a little explanation. As you know, many things in Perl,
including filehandles, can be evaluated in list or scalar context.
The result that is returned depends on the context.
If a filehandle is evaluated in
scalar context, it returns the first line of whatever file it is
reading from. If it is evaluated in list context, it returns a list,
the elements of which are the lines of the files it is reading from.
The print
function is a list operator, and therefore evaluates
everything it is given in list context. As the filehandle is
evaluated in list context, it is given a list !
Who said short is sweet ? The
shortest scripts are not usually the easiest to understand, and not
even always the quickest.
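The scalar versus list behaviour is easy to check for yourself. The sketch below reads from a string instead of a real file, by opening a reference to the string (available from Perl 5.8). That trick isn't part of this course, it just keeps the example self-contained.

```perl
use strict;
use warnings;

my $data = "Beer\nWine\nPizza\nCatfood\n";

# Opening a reference to a string gives a filehandle that reads from memory.
open(my $fh, '<', \$data) or die "Cannot open in-memory file :$!";

my $first = <$fh>;     # scalar context: just one line
my @rest  = <$fh>;     # list context: every line that is left
close $fh;

print "First line: $first";
print "Then ", scalar(@rest), " more lines\n";
```

Note that the list read carries on from where the scalar read stopped. It doesn't start again at the top.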
Modifying a File
One of the most frequent Perl tasks
is to open a file, make some changes and write it back to the
original filename. You already have enough knowledge to do this. The
steps are :
Make a backup copy of the file
Open the file for read
Open a new temporary file for
write
Go through the read file, and
write it and any changes to the temp file
When finished, close both
files
Delete the original file
Rename the temp file to the
original filename
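The steps above can be sketched as follows. This is only one way to do it by hand. It assumes the core modules File::Copy and File::Temp, works in a temporary directory so no real data is touched, and uses a trivial substitution as the 'change'.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

# Set up a scratch file to work on.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/out.txt";
open(my $setup, '>', $file) or die "Cannot create $file :$!";
print $setup "The Time Is NOW\n";
close $setup;

my $temp = "$file.tmp";

copy($file, "$file.bk")   or die "Cannot back up $file :$!";        # backup copy
open(my $in,  '<', $file) or die "Cannot open $file for read :$!";  # original, for read
open(my $out, '>', $temp) or die "Cannot open $temp for write :$!"; # temp file, for write
while (<$in>) {
    s/NOW/later/;          # the change we want to make
    print $out $_;         # write the line, changed or not, to the temp file
}
close $in;                 # close both files
close $out;
unlink $file         or die "Cannot delete $file :$!";              # delete the original
rename($temp, $file) or die "Cannot rename $temp :$!";              # temp becomes original
```

Every step that can fail has its return code checked, which is a good part of why it is so long.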
Phew. Perl of course has a much
easier way. Make sure you have data in c:\scripts\out.txt
then run this:
@ARGV="c:/scripts/out.txt";
$^I=".bk"; # let the magic begin
while (<>) {
tr/A-Z/a-z/; # another new function sneaked in
print; # this goes to the temp filehandle, ARGVOUT, not STDOUT as usual, so don't mess with it !
}
Now take a look at out.txt
.
Notice how all capital letters have been transliterated into
lowercase. This is the tr
operator
at work, which is more efficient than regex for changing single
characters. You should also have an out.txt.bk
file. And finally, notice the way @ARGV
has been created. You don't have to create it from the
command line arguments - it is an array just like any other.
Finally, what if your input file
doesn't look like this :
Beer
Wine
Pizza
Catfood
which is nicely delimited with a newline each time, but like this :
shorts
t-shirt
blouse

pizza
beer
wine
catfood

Viz
Private Eye
The Independent
Byte

toothpaste
soap
towel
which is delimited by TWO newlines, not one. Now, if you want each
set of items as elements in an array you'll have to do something like
this:
$SHOP="shop.txt";
$x=0;
open SHOP or die "Can't open $SHOP for read: $!\n";
while (<SHOP>) {
if (/^\n/) { # does line begin with newline ?
$x++; # if so, increment $x. Rest of if statement not executed.
} else {
$list[$x].=$_; # glue $_ on the end of whatever is in $list[$x], using a .
}
}
foreach (@list) {
print "Items are:\n$_\n\n";
}
which works, but there is a much easier way to do it. You knew I was
going to say that.
$SHOP="shop.txt";
$/="\n\n";
open SHOP or die "Can't open $SHOP for read: $!\n";
while (<SHOP>) {
push (@list, $_);
}
foreach (@list) {
print "Items are:\n$_\n\n";
}
The $/ variable is a
special variable (it even looks special). It is the Default Input
Record Separator. Remember the operation of the angle brackets
being to read a file in up until the next newline ? Time to come
clean. What the angle brackets actually do is read up until whatever
$/ is set to. It is set to
a newline by default.
So if we set it to two newlines, as
above, then it reads up until it finds two consecutive newlines, then
puts the data into $_. This
makes the program a lot shorter and quicker. You can set $/
to just about anything, not just a newline. If you want
to hack this list for example:
Tea:Beer:Wine:Pizza:Catfood:Coffee:Chicken:Salmon:Icecream
you could just leave $/ as
a newline and slurp it into memory in one go, but imagine the above
item is a list of clothes that your girlfriend wants to buy or a list
of clothes your boyfriend should have thrown away by now. Either is
going to be a really big file, and you might not want to read it all
into memory in one go. So set $/=":";
and all will be well. There are also read
and seek functions,
but they aren't covered here. Those are useful for files where you
read in a precise number of bytes.
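A minimal sketch of the colon-delimited read, assuming a shortened list held in a string instead of a big file. The in-memory filehandle and the use of local to limit the change to $/ are conveniences for this example, not something covered in the text.

```perl
use strict;
use warnings;

my $data = "Tea:Beer:Wine:Pizza:Catfood";
open(my $fh, '<', \$data) or die "Cannot open in-memory file :$!";

my @items;
{
    local $/ = ":";      # records now end at a colon, until this block ends
    while (<$fh>) {
        chomp;           # chomp removes a trailing $/, so the colon goes
        push @items, $_;
    }
}
close $fh;

print "Read ", scalar(@items), " items: @items\n";
```

Each trip round the loop only has one item in memory, however long the list is.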
We'll go back to the last example
for a moment. It is useful to know how to read just one line (well,
up to $/
) at a time :
$SHOP="shop.txt";
$/="\n\n";
open SHOP or die "Can't open $SHOP for read: $!\n";
$clothes=<SHOP>; # everything up until the first occurrence of $/ into $clothes
$food=<SHOP>; # everything from first occurrence of $/ to the second into $food
print "We need...\n",$clothes,"...and\n",$food;
And now we know that, there is an even quicker way to achieve the aim
of the original program :
$SHOP="shop.txt";
$/="\n\n";
open SHOP or die "Can't open $SHOP for read: $!\n";
@list=<SHOP>; # dumps *all* of $SHOP into @list, not just one line.
foreach (@list) {
print "Items are:\n$_\n\n";
}
and you don't need to grab it all :
@list[0..2]=<SHOP>
. We haven't mentioned list context for a while. Whether the line
input operator <> returns
a single value or a list depends on the context you use it in. When
you supply @xxxxx
then this
must be a list. If you supply $xxxxx
then
that's a scalar variable. You can force it into list context by using
parens.
The two lines below are provided so
you can paste them into the above program. They demonstrate how
parens force list context. Remember to replace the foreach
with something that prints the variables.
($first, $second) = <SHOP>;
$first, $second = <SHOP>;
Associative Arrays
Very, very useful. First, a quick
recap on arrays. Arrays are an ordered list of scalar variables,
which you access by their index number starting at 0. Arrays
always stay in the same order. Hashes are a list of scalars, but
instead of being accessed by index number, they are accessed by a
key. The tables below illustrate the point:
@myarray
Index No. | Value
0         | The Netherlands
1         | Belgium
2         | Germany
3         | Monaco
4         | Spain

%myhash
Key | Value
NL  | The Netherlands
BE  | Belgium
DE  | Germany
MC  | Monaco
ES  | Spain
So if we want 'Belgium' from
@myarray
and also from
%myhash
, it'll be:
print "$myarray[1]";
print "$myhash{'BE'}";
Notice that the $
prefix is
used, because it is a scalar variable. Despite the fact it is part of
a list, it is still a scalar variable. The hash syntax is simply to
use braces { }
instead of
square brackets.
So why use hashes ? When you want
to look something up by a keyword. Suppose we wanted to create a
program which returns the name of the country when given a country
code. We'd input ES, and the program would come back with Spain.
You could do it with arrays. It
would be messy however. One possible approach :
create @countries, and give it values such as 'ES,Spain'
Iterate over the entire array and split each element of the array, and check the first result
to see if it matches the input
If so, return the index
@countries=('NL,The Netherlands','BE,Belgium','DE,Germany','MC,Monaco','ES,Spain');
print "Enter the country car code:";
chop ($find=<STDIN>);
foreach (@countries) {
($code,$name)=split /,/;
if ($find=~/$code/i) {
print "$name has the code $code\n";
}
}
Complex and slow. We could also store a reference to another array in
each element of @countries
,
but that is not efficient. Whatever way we choose, you still need to
search the whole thing. And what if @countries
is a big array ? See how much easier a hash is :
%countries=('NL','The Netherlands','BE','Belgium','DE','Germany','MC','Monaco','ES','Spain');
print "Enter the country car code:";
chop ($find=<STDIN>);
$find=~tr/a-z/A-Z/;
print "$countries{$find} has the code $find\n";
Very easy. All we need to do is make sure everything is in uppercase
with tr
and we are there.
Notice the way %countries
is
defined - exactly the same as a normal array, except that the values
are put into the hash in key/value pairs.
So why use arrays ? One excellent
reason is because when an array is created, its variables stay in the
same order you created them in. With a hash, perl reorders elements
for quick access. Add print %countries;
to the end of that program above and run it. See what I
mean ? No recognisable order at all. If you were writing code that
stored a list of variables over time and you wanted it back in the
order you found it in, don't use a hash.
Finally, you should know that each
key of a hash must be unique. Stands to reason, if you think
about it. You are accessing the hash via keys, so how can you have
two keys named 'NL' or something ? If you do define a certain key
twice, the second value overwrites the first. This is a feature, and
useful. The values of a hash can be duplicates, but never the keys.
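A quick sketch of the overwriting in action. 'Holland' is a made-up first value for the NL key, so that you can see which one survives.

```perl
use strict;
use warnings;

# NL appears twice in the list. The later value wins.
my %countries = ('NL', 'Holland', 'BE', 'Belgium', 'NL', 'The Netherlands');

print "NL is $countries{'NL'}\n";

# Assigning to an existing key later on overwrites in just the same way.
$countries{'BE'} = 'Belgie';
print "BE is $countries{'BE'}\n";
```

No error and no warning, which is handy when it's what you meant and a nuisance when it isn't.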
If you want to assign to a hash,
there is of course no concept of push
,
pop
and splice
etc. Instead :
Assigning | $countries{PT}='Portugal';
Deleting  | delete $countries{NL};
Accessing Your Hash
Assuming you keep the same
%countries
hash as above,
here are some useful ways to access it :
All the keys         | print keys %countries;
All the values       | print values %countries;
A Slice of Hash :-)  | print @countries{'NL','BE'};
How many elements ?  | print scalar(keys %countries);
Does the key exist ? | print "It's there !\n" if exists $countries{'NL'};
Well, that last one is not an
access but useful anyway.
You may have noticed that keys
and values
return
a list. And we can iterate over a list, using foreach
:
foreach (keys %countries) {
print "The key $_ contains $countries{$_}\n";
}
which is useful. Note how any list can be fed to foreach
, and off it goes. As usual, there is another way to do
the above:
while (($code,$name)=each %countries) {
print "The key $code contains $name\n";
}
The each
function returns
each key/value pair of the hash, and is slightly faster. In this
example we assign them to a list (you spotted the parens ?) and away
we go. Eventually there are no more pairs, which returns false to the
while
loop and it stops.
Sorting
If I was reading this I'd be
wondering about sorting. Wonder no more, and behold :
foreach (sort keys %countries) {
print "The key $_ contains $countries{$_}\n";
}
Spot the difference. Yes, sort
crept
in there. If you want the list sorted backwards, some cunning is
called for. This is suitably foxy:
foreach (reverse sort keys %countries) {
print "The key $_ contains $countries{$_}\n";
}
Perl is just so difficult at times, don't you think ? This works
because :
keys
returns a list
sort
expects a list - gets one from keys ,
and sorts it
reverse
also expects a list, gets one and returns it reversed
then the whole list is foreach
'd over.
This is a quick example to make
sure the meaning of reverse is clear :
print "Enter string to be reversed :";
$input=<STDIN>;
@letters=split //,$input; # splits on the 'nothings' in between each character of $input
print join ":", @letters; # joins all elements of @letters with a colon, prints it
print reverse @letters; # prints all of @letters, but sdrawkcab )-:
Perl's list operators can just feed directly to each other, saving
many lines of code but also decreasing readability to those that
aren't Perl-literate :
print "Enter string to be reversed :";
print join ":",reverse split //,$_=<STDIN>;
This section is about sorting, so enough of reverse
. Time to go forwards instead.
That's easy alphabetical sorting by
the keys. If you had a hash of international access numbers like this
one :
%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');
foreach (sort keys %countries) {
print "The key $_ contains $countries{$_}\n";
}
You might want to sort numerically. In that case, you need to
understand how Perl's sort function
works.
The sort function compares two
variables, $a and $b
. They must be called $a and $b
otherwise it won't work. One chap published a book with stolen
code, and he changed $a and $b
to $x and $y. He obviously didn't test the program because it
would have failed and he would have noticed. And this book was really
published ! Don't believe everything you read in books - but web
tutorials are always 100% truthful :-)
Back to sorting. $a and $b are compared, and the result is :
1 if $a is greater than $b
-1 if $a is less than $b
0 if $a and $b are equal
So as long as the sort operator
gets one of those three values back it is happy. This means we can
write our own sort routines, and feed them to sort. For example, we
know the default sort is alphabetical. But if we write this :
%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');
foreach (sort supersort keys %countries) {
print "$_ $countries{$_}\n";
}
sub supersort {
if ($a > $b) {
return 1;
} elsif ($a < $b) {
return -1;
} else {
return 0;
}
}
then the keys are sorted numerically. Of course, there is an easier
way. The 'spaceship' operator <=> .
It does exactly what the supersort subroutine does, namely return 1,
-1 or 0 depending on the comparison of two given values.
So we can write the above much more
easily as :
%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');
foreach (sort { $a <=> $b } keys %countries) {
print "$_ $countries{$_}\n";
}
Notice the { } braces, which define the contents as the subroutine
sort must use. Pretty short subroutine. There is a companion operator
to <=> , namely cmp , which does exactly the same thing but of course
compares the values as strings, not numbers. Remember if you are
comparing numbers, your comparison operator should contain
non-alphas, if you are comparing strings the operator should contain
alphas only.
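To see the two operators side by side, here is a small sketch using
the same access codes as a plain list. The array names are made up
for the example :

```perl
@codes=(976,52,212,64,33);

@default=sort @codes;                  # no routine given: compared as strings
@strings=sort { $a cmp $b } @codes;    # the same thing, spelt out
@numbers=sort { $a <=> $b } @codes;    # compared as numbers

print "default : @default\n";          # 212 33 52 64 976
print "cmp     : @strings\n";          # 212 33 52 64 976
print "<=>     : @numbers\n";          # 33 52 64 212 976
```

As strings, '212' comes before '33' because '2' sorts before '3',
which is exactly the problem the numeric sort fixes.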
Anyway, you now have enough
knowledge to sort a hash by value instead of keys. Suppose your
pointy haired manager bounced up to you and demanded a hash sorted by
value ? What would you do ? OK, what should you do ?
Well, we could just sort the
values.
foreach (sort values %countries) {
But Pointy Hair wants the keys too. And if you have a value you can't
find the key.
So we have to iterate over the keys. But just because we are
iterating over the keys doesn't mean to say sort has to compare the
keys themselves. What about :
%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');
foreach (sort { $countries{$a} cmp $countries{$b} } keys %countries) {
print "$_ $countries{$_}\n";
}
Beautifully simple. If you want a reverse sort, swap $a and $b .
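The same idea stretches to more than one comparison. The following is
a sketch only, with a made-up %stock hash: it sorts by value with the
biggest first ($b before $a), and where two values are equal the
comparison returns 0, so the or lets a second comparison on the keys
decide :

```perl
%stock=('apples',12,'pears',7,'plums',12,'figs',3);

# biggest value first; ties are settled alphabetically by key
@ordered=sort { $stock{$b} <=> $stock{$a} or $a cmp $b } keys %stock;

foreach (@ordered) {
    print "$_ $stock{$_}\n";
}
```

This prints apples, plums, pears and figs in that order. Without the
second comparison, apples and plums could come out either way round.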
You can sort several lists at the
same time :
%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');
@nations=qw(China Hungary Japan Canada Fiji);
@sorted= sort values %countries, @nations;
foreach (@nations, values %countries) {
print "$_\n";
}
print "#----\n";
foreach (@sorted) {
print "$_\n";
}
This sorts @nations
and the
values from %countries
into
a new array. The example also demonstrates that you can foreach
over more than one list value - each list is processed in turn.
Grep and Map
Grep
If you want to search a list, and
create another list of things you found, grep
is one solution. This is an example, which also
demonstrates join
again :
@stuff=qw(flying gliding skiing dancing parties racing); # quote-worded list
@new = grep /ing/, @stuff; # searches for anything with 'ing' in it
print join ":",@stuff,"\n"; # first makes one string out of the elements of @stuff, joined
# with ':' , then prints it, then prints \n
print join ":",@new,"\n";
Remember qw means 'quote words', so whitespace is used as the
delimiter instead of quotes and commas. The grep function must be fed
a list on the right hand side. On the left side, you may assign the
results to a list or a scalar variable. The list gives you each actual
element, and the scalar gives you the number of matches found :
@stuff=qw(flying gliding skiing dancing parties racing);
$new = grep /ing/, @stuff;
print join ":",@stuff,"\n";
print "Found $new elements of \@stuff which matched\n";
If you decide to modify the elements on their way through grep
, you actually modify the original list.
@stuff=qw(flying gliding skiing dancing parties racing);
@new = grep s/ing//, @stuff;
print join ":",@stuff,"\n";
print join ":",@new,"\n";
To determine what actually matches you can either use an expression
or a block. Up to now we've been using expressions, but when things
become more complicated use a block :
@stuff=qw(flying gliding skiing dancing parties racing);
@new = grep { s/ing// if /^[gsp]/ } @stuff;
print join ":",@stuff,"\n";
print join ":",@new,"\n";
Try removing the braces and you'll get an error. Notice that the
comma before the list has gone. It is now obvious where the
expression ends, as it is inside a block delimited with { } . The
regex says if the element begins with g, s or p, then remove ing. The
result is only assigned to @new
if
the expression is completely true - 'parties' does begin with p, so
that works, but s/ing//
fails
so the overall result is false, and the value is not assigned to @new
.
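A block doesn't have to change anything, of course. As a minimal
sketch (the variable names @long and $count are just for this
example), here are two blocks that only test each element, one
assigned to a list and one to a scalar :

```perl
@stuff=qw(flying gliding skiing dancing parties racing);

@long=grep { length($_) > 6 } @stuff;          # block form, no comma before the list
$count=grep { /ing$/ and /^[gsp]/ } @stuff;    # scalar on the left, so we get a count

print join(":",@stuff),"\n";   # untouched, because nothing in the blocks changes $_
print join(":",@long),"\n";    # gliding:dancing:parties
print "$count\n";              # 2
```

Only gliding and skiing both end in 'ing' and begin with g, s or p,
hence the count of 2.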
Map
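Map works the same way as grep
, in that they both iterate over a list, and return a
list. There are two important differences however :
grep returns the original element, and only if the expression is true
map returns the result of the expression, whatever that result is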
As usual, an example will assist
the penny in dropping, clear the fog and turn on the light (if not
make my metaphors easier to understand) :
@stuff=qw(flying gliding skiing dancing parties racing);
print join ":",@stuff,"\n";
@mapped = map /ing/, @stuff;
@grepped = grep /ing/, @stuff;
print join ":",@stuff,"\n";
print join ":",@mapped,"\n";
print join ":",@grepped,"\n";
You can see that @mapped is just a list of 1s. This is the result of
map - in every case the expression /ing/ is successful, except for
'parties'. A failed match returns an empty list in list context, so
'parties' adds nothing at all to @mapped , which is why it ends up one
element shorter than @stuff . Contrast this action with the grep
function, which returns the actual value, and only if it is true. Try
this :
@letters=qw(a b c d e);
@ords=map ord, @letters;
print join ":",@ords,"\n";
@chrs=map chr, @ords;
print join ":",@chrs,"\n";
This uses the ord function to change each letter into its ASCII
equivalent, then the chr function to convert the ASCII numbers back to
characters. If you change map to grep in the example above, you can
see that nothing appears to happen. What is happening is that grep
is trying the expression on each element, and if it
succeeds (is true) it returns the element, not the result. The
expression succeeds for each element, so each element is returned in
turn. Another example :
@stuff=qw(flying gliding skiing dancing parties racing);
print join ":",@stuff,"\n";
@mapped = map { s/(^[gsp])/$1 x 2/e } @stuff;
@grepped = grep { s/(^[gsp])/$1 x 2/e } @stuff;
print join ":",@stuff,"\n";
print join ":",@mapped,"\n";
print join ":",@grepped,"\n";
Recapping on regex, what that does is match any element beginning
with g, s or p, and replace that first letter with two copies of
itself. The caret ^ forces a match at the beginning of the string, the
[square brackets] denote a character class, and /e forces Perl
to evaluate the RHS as an expression.
The output from this is a mixture
of 1 and nothing for map
,
and a three-element array called @grepped
from grep. Yet another example :
@mapped = map { chop } @stuff;
@grepped = grep { chop } @stuff;
The chop
function removes
the last character from a string, and returns it. So that's what you
get back from map, the result of the expression. The grep
function gives you the mangled remains of the original value.
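Because map hands back whatever the expression produces, the
expression is free to produce more than one value per element. A
small sketch, assuming a hash called %length that is not part of the
examples above: each word yields two values, and assigning the
resulting list to a hash pairs them up as key and value.

```perl
@stuff=qw(flying gliding skiing dancing parties racing);

# every element produces two values: the word and its length
%length=map { ($_, length $_) } @stuff;

foreach (sort keys %length) {
    print "$_ has $length{$_} letters\n";
}
```

The parens inside the block help Perl see that the braces are a block
of code and not the start of an anonymous hash.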
Finally, you can write your own
functions :
@stuff=qw(flying gliding skiing dancing parties racing);
print join ":",@stuff,"\n";
@mapped = map { &isit } @stuff;
@grepped = grep { &isit } @stuff;
print join ":",@mapped,"\n";
print join ":",@grepped,"\n";
sub isit {
($word)=/(^.*)ing/;
if (length $word == 3) {
return "ok";
} else {
return 0;
}
}
The subroutine isit first grabs everything up until 'ing', puts it
into $word , then returns 'ok' if there are three characters
in $word . If not, it
returns the false value 0. You can make these subroutines (think of
them as functions) as complex as you like.
Sometimes it is very useful to have map return the actual
value, rather than the result. The answer is easy, but not obvious.
map , like a subroutine, returns the result of the last expression
evaluated. What if the last expression was, very simply, $_ ? First,
the substitution on its own :
@grepstuff=@mapstuff=qw(flying gliding skiing dancing parties racing);
print join " ",map { s/(^[gsp])/$1 x 2/e } @mapstuff;
print "\n";
print join " ",grep { s/(^[gsp])/$1 x 2/e } @grepstuff;
Now, make sure $_
is the
last thing evaluated :
@grepstuff=@mapstuff=qw(flying gliding skiing dancing parties racing);
print join " ",map { s/(^[gsp])/$1 x 2/e;$_} @mapstuff;
print "\n";
print join " ",grep { s/(^[gsp])/$1 x 2/e } @grepstuff;
and there you have it. Now you understand that you can go and impress
your friends.
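One catch with the version above is that @mapstuff itself is changed
by the substitution. If you want the originals left alone, a possible
approach (a sketch, with $copy and @doubled being names invented here)
is to work on a copy inside the block and make the copy the last thing
evaluated :

```perl
@stuff=qw(flying gliding skiing dancing parties racing);

# copy $_, change the copy, and return the copy
@doubled=map { $copy=$_; $copy=~s/(^[gsp])/$1 x 2/e; $copy } @stuff;

print join(" ",@stuff),"\n";     # flying gliding skiing dancing parties racing
print join(" ",@doubled),"\n";   # flying ggliding sskiing dancing pparties racing
```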
External Commands
Perl can start external commands.
There are four main ways to do this :
system
exec
backticks
opening a process with open
We'll compare system and exec first.
Exec
Poor old exec
is broken on Perl for Win32. What it should do is stop
running your Perl script and start running whatever you tell it to.
If it can't start the external process, it should return with an
error code. This doesn't work properly under Perl for Win32. The exec
function does work properly on the standard Perl
distribution.
System
This runs an external command for
you, then carries on with the script. It always returns, and the
value it returns goes into $?
.
This means you can test to see if the program worked. Actually you
are testing to see if it could be started, what the program does when
it runs is outside your control if you use system
. This demonstrates system
in action. Run the 'vol' command from a command prompt
first if you are not familiar with it. Then run the 'vole' command.
I'm assuming you have no cute furry executables called vole on your
system, or at least in the path. If you do have an executable called
'vole', be creative and change it.
system("vole");
print "\n\nResult: $?\n\n";
system("vol");
print "\n\nResult: $?\n\n";
As you can see, a successful system call returns 0. An unsuccessful
one returns a value which you need to divide by 256 to get the real
return value. Also notice you can see the output. And because system
returns, the code after the first system call is executed. Not so
with exec , which will terminate your perl script if it is successful.
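The divide-by-256 step is usually written as a shift of eight bits.
This is a sketch rather than the only way to do it: to avoid depending
on any particular external program it uses the special variable $^X ,
which holds the path of the perl that is running the script, and asks
that perl to exit with the arbitrary code 3.

```perl
# run another perl which does nothing but exit with code 3
system($^X, "-e", "exit 3");

$raw=$?;            # the value as system leaves it
$code=$? >> 8;      # same as dividing by 256 and dropping the remainder

print "Raw value of \$? : $raw\n";
print "Exit code      : $code\n";
```

Giving system a list of arguments, as here, also means the program
name and its parameters don't have to be quoted for the command
shell.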
Backticks
These ``
are different again to system and exec. They also start
external processes, but return the output of the process. You
can then do whatever you like with the output. If you aren't sure
where backticks are on your keyboard, try the top left, just left of
the 1 key. Often around there. Don't confuse single quotes ''
with backticks ``
.
$volume=`vol`;
print "The contents of the variable \$volume are:\n\n";
print $volume;
print "\nWe shall regexise this variable thus :\n\n";
$volume=~m#Volume in drive \w is (.*)#;
print "$1\n";
As you can see here, the Win32 vol command is executed. We just print
it out, escaping the $
in
the variable name. Then a simple regex, using # as a delimiter just
in case you'd forgotten delimiters don't have to be / .
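Backticks also behave differently depending on what is on the left.
Assign to a scalar, as above, and you get all the output in one
string. Assign to an array and you get one element per line. A hedged
sketch: it runs perl itself via $^X instead of vol so that it works
anywhere, and the doublequotes around $^X assume a command shell that
accepts them.

```perl
# list context: one element per line of output
@lines=`"$^X" -le "print for 1..3"`;
chomp @lines;                 # strip the newline from every element
$howmany=@lines;              # an array in scalar context gives its size

print "Got $howmany lines: @lines\n";
```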
Before you get carried away with
creating elaborate scripts based on the output from NT's net
commands, note there are plenty of excellent modules
out there which do a very good job of this sort of thing, and that
any form of external process call slows your script. Also note there
are plenty of built in functions such as readdir
which can be used instead of `dir`
. You should use Perl functions where possible
rather than calling external programs because Perl's functions
are :
portable (usually, but there
are exceptions). This means you can write a script on your Mac
PowerBook, test it on an NT box and then use it live on your Unix
box without modifying a single line of code.
faster, as every external
process significantly slows your program
don't usually require regexing
to find the result you want
don't rely on output in a
particular format, which might be changed in the next version of
your OS or application
are more likely to be
understood by a Perl programmer - for example, $files=`ls`;
on a Unix box means little to someone that doesn't
know that ls
is the Unix command for listing files, as
dir
is in Windows.
Don't start using backticks all over the place when system will do.
You might get a very large amount of output back which you don't
need, and will consequently slurp lots of memory. Just use them when
you actually want to look at the output.
Opening a Process
The problem with backticks is that you have to wait for the entire
process to complete, then analyse the entire output. This is a big
problem if you have a lot of output or slow processes. For example,
the DOS command tree .
If you aren't familiar with this, run it and look at the output.
We can open a process, and pipe
data in via a filehandle in exactly the same way you would read a
file.
open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";
while (<TRIN>) {
print "$. $_";
}
Note the | which denotes that data is to be piped from the specified
process. You can also pipe data to a process by using |
as the first character.
As usual, $. is the line number. What we can do now is terminate our
tree early. Environmentally unsound, but efficient.
while (<TRIN>) {
printf "%3s $_", $.;
last if $. == 10;
}
As soon as $. hits 10 we shut the process off by exiting the loop.
Easy. You might notice the presence of a new keyword - printf .
It works like print , but formats the string before printing. The
formatting is controlled by such parameters as %3s , which means "pad
with spaces to a minimum width of three characters". After the
doublequoted string comes whatever you want to be printed in the
format specified. Some examples follow. Just uncomment each line in
turn to see what it does.
$windir=$ENV{'WINDIR'}; # yes, you can access the environment variables !
$x=0;
# whoops, another new function
opendir WDIR, "$windir" or die "Can't open $windir !!! Panic : $!";
while ($file= readdir WDIR) {
next if $file=~/^\./; # try commenting this line to see why it is there
$age= -M "$windir/$file"; # -M returns the age in days
$age=~s/(\d*\.\d{3}).*/$1/; # hmmmmm
#### %4.4d - must take up 4 columns, and pad with 0s to make up space
#### and minimum width is also 4
#### %10s - must take up 10 columns, pad with spaces
# printf "%4.4d %10s %45s \n", $x, $age, $file;
#### %-10s - left justify
# printf "%4.4d %-10s %-45s \n", $x, $age, $file;
#### %10.3d - use 10 columns, pad with 0s if fewer than 3 digits
# printf "%4.4d %10.3d %45s \n", $x, $age, $file;
$x++;
last if $x==15; # we don't want to go through all the files :-)
}
There are some intentionally new functions there. When you start
hacking Perl (actually, you already started if you have worked
through this far) you'll see a lot of example code. Try and
understand the above, then read the explanation below.
Firstly, all environment variables can be accessed and set via Perl.
They are in the %ENV hash. If you aren't sure what environment
variables are, refer to your friendly Microsoft documentation or
books. The best known environment variable is path , and you can see
its value and that of all other environment variables by simply
typing set at your command prompt.
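Since %ENV is an ordinary hash, everything you already know about
hashes applies to it. A minimal sketch follows; TUTORIAL_DEMO is a
made-up variable name used purely for illustration.

```perl
# set a variable: it is visible to this script and to any program it starts
$ENV{'TUTORIAL_DEMO'}="hello";
print "TUTORIAL_DEMO is $ENV{'TUTORIAL_DEMO'}\n";

# list every variable, much like typing set at the prompt
foreach (sort keys %ENV) {
    print "$_=$ENV{$_}\n";
}
```

Changes made this way last only as long as the script runs. They
don't alter the environment of the command prompt you started it
from.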
Secondly, opendir
. Similar to open
but
opens a directory, not a file. Usually, you want to read from the
directory, so readdir
is
useful. There is no while (<WDIR>)
{
construct.
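There is a shortcut though. In list context readdir returns every
remaining entry at once, so the whole directory can go straight into
an array. This is a sketch which assumes the current directory "." is
readable, and the handle name HERE is invented for the example :

```perl
opendir HERE, "." or die "Can't open the current directory : $!";
@entries=grep { !/^\./ } readdir HERE;    # drop ., .. and anything else starting with a dot
closedir HERE;

$total=@entries;
print "$total entries\n";
foreach (sort @entries) {
    print "$_\n";
}
```

This is the sort of job that might otherwise tempt you into `dir` and
a regex.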
The regex /^\./
bounces out invalid entries before we bother to do any
processing on them. Good programming practice. What it matches is
"anything that begins with '.'". The caret anchors the
match to the beginning of the string, and as .
is a metacharacter it has to be escaped.
Perl has several tests to apply on
files. The -M
test returns
the age in days. See the documentation for similar tests. Note that
the calls to readdir
return
just the file, not the complete pathname. As you were careful to use
a variable for the directory to be opened, it is no trouble to glue
it together by using doublequotes.
Try commenting out $age=~s/(\d*\.\d{3}).*/$1/ and note the size of
$age . It could do with a trim. Just for regex practice, we make it a
little smaller. What the regex does is :
start capturing with (
look for 0 or more digits \d*
then a . (escaped)
followed by three digits \d{3}
and that's all we want to capture so the parens are closed. )
Finally, everything else in the string is matched .* where . is any
character (almost) and * 0 or more. This is pretty much guaranteed
to match to the end of the line
Having matched the entire string (and put part of it into $1 by
using parens) we simply replace the string with what we have
captured.
Easy !
Mention should also be made of
sprintf
, which is exactly
like printf
except it
doesn't print. You just use it to format strings, which you can do
something with later. For example :
open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";
while (<TRIN>) {
$line= sprintf "%3s $_", $.;
print $line;
last if $. == 10;
}
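Because sprintf hands the formatted string back instead of printing
it, it is handy for seeing exactly what a format does. The sketch
below uses sample values chosen for illustration, with | added to show
where the padding ends. It also compares the regex trim used earlier
with a %.3f format, which is one alternative and not necessarily the
better one.

```perl
$padded=sprintf "%3s|", 7;          # right-justified in 3 columns
$left=sprintf "%-5s|", "ab";        # left-justified in 5 columns
$zeros=sprintf "%4.4d", 7;          # at least 4 digits, filled with 0s
print "$padded\n$left\n$zeros\n";

# two ways to trim a number to three decimal places
$age=12.3456789;
($chopped=$age)=~s/(\d*\.\d{3}).*/$1/;   # the regex simply cuts the extra digits off
$rounded=sprintf "%.3f", $age;           # sprintf rounds instead
print "$chopped\n$rounded\n";            # 12.345 then 12.346
```

Whether you want cutting or rounding depends on what the number is
for.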
Oneliners
You'll have noticed Perl packs a
lot of power into a small amount of code. You can feed Perl code
directly on the command line. Try this :
perl -e"for (55..75) { print chr($_) }"
The -e switch tells Perl that a command is following. On Win32 the
command must be enclosed in doublequotes, not singles as on Unix. The
command itself in this case simply prints the characters whose ASCII
codes are 55 to 75 inclusive.
This is a simple find routine. As
it uses a regex, it is infinitely superior to NT's findstr
.
perl -e"while (<>) {print if /^[bv]/i}" shop.txt
Remember, the while (<>)
construct
will open whatever is in @ARGV
.
In this case, we have supplied shop.txt
so it is opened
and we print lines that begin with either 'b' or 'v'.
That can be made shorter. Run perl
-h
and you'll see a whole list of switches. The one
we'll use now is -n
, which
puts a while (<>) { }
loop around whatever code you supply with -e
. So :
perl -ne"print if /^[bv]/i" shop.txt
which does exactly the same as the previous program, but uses the -n
switch. A slightly more sophisticated version :
perl -ne"print \"$ARGV : $.\n\" if /^[bv]/i" shop.txt
which demonstrates that doublequotes must be escaped.
If you don't remember $^I
then please review the section on Files before
proceeding. When you're ready, copy shop.txt
to
shop2.txt
.
perl -i.bk -ne"printf \"%4s : $_\",$." shop2.txt
The -i
switch primes the
inplace edit operator. We still need -n
.
If you had a typical quoted email
message such as :
>> this is what was said
>> blah blah
> blaaaaahhh
The new text
and you wanted to remove the >
, then :
perl -i.bk -pe"s/^>//" email.txt
does the trick. Notice that the regex anchors the match to the start
of the string with the caret. What is new is the use of -p
, which does exactly the same thing as -n
except that it adds a print
statement too.
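If the switches feel like magic, it may help to see roughly what -p
amounts to when written out as a script. This is a sketch under some
assumptions: instead of a real email.txt it reads from a string
through a filehandle (the three-argument open with \$text does that),
and it collects the lines in $result where -p would print them.

```perl
# stand-in for email.txt, so the example is self-contained
$text=">> this is what was said\n> blaaaaahhh\nThe new text\n";
open MAIL, "<", \$text or die "Can't open the in-memory file : $!";

$result="";
while (<MAIL>) {          # this loop is what -n supplies
    s/^>//;               # this is the code given with -e
    $result.=$_;          # -p would print here, we collect the line instead
}
close MAIL;
print $result;
```

Notice that only the first > on each line goes, so the doubly quoted
line still has one left. Changing the regex to s/^>+// would remove
the lot.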
Some other useful oneliners - a
calculator and an ASCII number lookup :
perl -e"print 50/200+2"
perl -e"for (50..90) { print chr($_) }"
There are plenty more. Send me your favourites ! Finally, a useful
tip - to avoid escaping doublequotes, try this :
perl -e"for $i (50..90) { print chr($i),qq| is $i\n| }"
Whatever follows qq
is used
as a delimiter. I learnt this from the Perl-Win32-Users mailing list
(see top) - I think it was Lennart Borgman who pointed it out. He
also mentioned that you don't need the closing doublequote. Saves a
little typing.
Subroutines and
Parameters
We want a subroutine to calculate
how long it will take us to drive a given distance.
($speed,$distance)=@ARGV;
&calcspeed;
print $time;
sub calcspeed {
$time=$distance / $speed;
$time=int($time*60);
}
Execute it thus :
perl calcspeed.pl 130 120
This works. As you remember ;-) @ARGV
contains the command line arguments, which are assigned
to variables. Then we call the subroutine and print the result. The
int
function returns an
integer, that is without those messy digits after the decimal point.
This is a little inflexible.
Suppose we wanted to also print times for 10kmph (I prefer kms to
miles) above and below the speed given. Or, we allowed six parameters
for three sets of distance/time calculations. Or there was some other
change. This is one solution to the first problem :
($speed,$distance)=@ARGV;
&calcspeed;
print "$time\n";
$speed=$speed+10;
&calcspeed;
print "$time\n";
$speed=$speed-20;
&calcspeed;
print "$time\n";
sub calcspeed {
$time=$distance / $speed;
$time=int($time*60);
}
That's an appalling bit of code. What would be really useful is if we
could pass the subroutine parameters to act on. We can.
($speed,$distance)=@ARGV;
&calcspeed($speed,$distance);
print "$time\n";
&calcspeed($speed+=10,$distance);
print "$time\n";
&calcspeed($speed-=20,$distance);
print "$time\n";
sub calcspeed {
($speed,$distance)=@_;
$time=$distance / $speed;
$time=int($time*60);
}
First change - the shorter version of $x=$x+$y
, that is $x+=$y
.
Secondly, and more importantly, we are now passing parameters to our
subroutine via the parens. The parameters are comma delimited.
To use those parameters, analyse
the @_
array. As you can
see, we just assign the contents to two variables. These happen to
have the same name as the originals.
Subroutines are just functions
which are user-defined. I'll mix and match the terms. So you don't
usually have to use the &
prefix.
Sometimes it is necessary if there is any ambiguity. You can also
just print the result of a function, or assign it to a variable or do
any other operation on it.
($speed,$distance)=@ARGV;
print calcspeed($speed,$distance),"\n";
print calcspeed($speed+=10,$distance),"\n";
print calcspeed($speed-=20,$distance),"\n";
sub calcspeed {
($speed,$distance)=@_;
$time=$distance / $speed;
$time=int($time*60);
}
Great. Now let's worry about fuel economy. On second thoughts, we'll
just calculate it as an interesting exercise and not worry about it.
Modify the function :
sub calcspeed {
($speed,$distance)=@_;
$time=$distance / $speed;
$time=int($time*60);
$litres=int($distance / (15 / ($speed/100)));
}
All that does is work out a rough fuel consumption, which gets worse
the faster you drive. Or better, if you take Texaco's viewpoint. The
problem we have is that the time is now lost.
This is because any function
returns the value of the last expression evaluated. In
this case, the last expression was the calculation for $litres.
We can override this.
sub calcspeed {
($speed,$distance)=@_;
$time=$distance / $speed;
$time=int($time*60);
$litres=int($distance / (15 / ($speed/100)));
return ($time,$litres);
}
like so. Now, both values are returned. Progress. Or is it ?
You might be wondering what the
point of all this is. Why bother passing parameters around ? Well,
for short and small programs sometimes it is not worth it. But work
through these examples and you'll see :
($speed,$distance)=@ARGV;
($time,$fuel)=calcspeed($speed,$distance);
print "At $speed, it takes $time minutes to travel $distance kms, using $fuel litres\n";
@result=calcspeed($speed+=10,$distance);
print "At $speed, it takes $result[0] minutes to travel $distance kms, using $result[1] litres\n";
($time,$fuel)=calcspeed($speed-=20,$distance);
print "At $speed, it takes $time minutes to travel $distance kms, using $fuel litres\n";
sub calcspeed {
($speed,$distance)=@_;
$speed=70 if $speed < 70;
$time=$distance / $speed;
$time=int($time*60);
$litres=int($distance / (15 / ($speed/100)));
return ($time,$litres);
}
Here we demonstrate that you can just assign the results of a
function to variables. You already knew that, but a demonstration
doesn't hurt. Unless you happen to be working for an electric chair
manufacturer.
The important part is the sneaky
modification of $speed
.
Being good citizens, we have decided that if $speed
is less than 70, it should become 70.
Unfortunately, this returns some
rather spurious results.
So what can we do ? We could assign
$speed
and $distance
to new variable names. That would fix it. However,
imagine a very large program. Can you really keep track of all those
variable names ? What about a module, which is a plug-in bit of Perl
code to extend your program's functionality ? Suppose the module
programmer used a $speed
variable
too - it would stomp all over your $speed
and your program would break.
What we need is a little privacy.
Departing from the main program for just a moment (but we will
return) :
($speed,$distance)=@ARGV;
print "1. ## Speed is $speed, distance is $distance\n";
&change;
print "3. ## Speed is $speed, distance is $distance\n";
sub change {
$speed*=2;
$distance/=10;
print "2. ** Speed is $speed, distance is $distance\n";
}
No surprises here. Our two variables are duly changed. Now, try this
:
sub change {
my ($speed,$distance);
$speed*=2;
$distance/=10;
print "2. ** Speed is $speed, distance is $distance\n";
}
Print 2 now shows 0. This is because we have used my
to declare the variables. The variables now exist only
inside the block they are declared in. A block is delimited by {
}
braces. As you can see, the variables inside the
block even have the same name as the variables outside the block. It
doesn't matter.
Now we can use whatever variable
names we like inside our function, safe in the knowledge that we
won't stomp over any variables of the same name. Another advantage is
that my
variables are
faster than global (non-my) variables.
Knowing this, a simple change can
be made to the original program :
sub calcspeed {
my ($speed,$distance)=@_; # spot the difference
my ($time,$litres);
$speed=70 if $speed < 70;
$time=$distance / $speed;
$time=int($time*60);
$litres=int($distance / (15 / ($speed/100)));
return ($time,$litres);
}
This is looking a little more professional now. We have declared all
the variables we are going to use with my
, and sleep easily at night knowing they are protected
in their own little world, the boundaries of which are the braces.
After that, they cease to exist. Their scope has been
restricted. This is what scoping variables is all about. Variables
declared with my
are known
as lexically scoped.
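Any pair of braces will do, not just those of a subroutine. A minimal
sketch, with variable names invented for the purpose, showing where a
my variable can and can't be seen :

```perl
$speed=100;                 # a global, visible everywhere

{
    my $speed=50;           # a different variable that happens to share the name
    $inner=$speed;          # sees the my variable: 50
    {
        $innermost=$speed;  # blocks nested inside see it too: still 50
    }
}

$outer=$speed;              # the block has ended, so this is the global again: 100
print "$inner $innermost $outer\n";
```

This prints 50 50 100. The global $speed was never touched.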
There is much, much more to scoping
than the above. It is however important to understand just one more
concept :
($arg1,$arg2,$arg3)=@ARGV;
print "\nOutput Field Separator is :$,:\n";
print "OUTSIDE\t",$arg1,$arg2,$arg3,"\n";
&change;
$,="_";
print "\nOutput Field Separator is :$,:\n";
print "OUTSIDE\t",$arg1,$arg2,$arg3,"\n";
&change;
sub change {
print "BLOCK\t",$arg1,$arg2,$arg3,"\n";
}
which should be executed something like this :
perl test.pl sarcasm is the lowest form of wit
The special variable $, defines what Perl should print in between
the items of a list it is given to print. By default, it
is nothing. We can assign to it quite easily.
This causes a small problem with
our happy world of lexically scoped my
variables. The problem being that if we want to use $,
in our own little subroutine, we can, but only as long
as we accept the value that it is set to. This won't work, but try it
:-)
sub change {
my $,="!-!";
print "BLOCK\t",$arg1,$arg2,$arg3,"\n";
}
So what can we do ? One solution is to assign the current value of $,
to another variable, change $,
, and make sure we change it back when the function
returns. That's messy, extra work and prone to errors. This is a
better solution :
sub change {
local $,="!-!";
print "BLOCK\t",$arg1,$arg2,$arg3,"\n";
}
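The local keyword gives the variable a temporary value, which can be
modified as you please inside your block, and which is also seen by
any subroutine called from inside the block. When the block ends the
old value is put back, so outside the block $,
still has its original value. Again, this is scoping,
but this form of scoping is called dynamic scoping.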
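The 'dynamic' part is easiest to see with two subroutines. In this
sketch (show and the OUT filehandle are inventions for the example,
and the output is collected in $buffer just so it can be printed in
one go) the subroutine show never mentions local, yet it picks up the
temporary value when it is called from inside change :

```perl
$buffer="";
open OUT, ">", \$buffer or die "Can't open the in-memory file : $!";

sub show {
    print OUT "a","b","\n";     # uses whatever $, holds at the moment of the call
}

sub change {
    local $,="!-!";             # temporary value, undone when change returns
    show();                     # called from in here, so show sees !-!
}

show();       # nothing between the items
change();     # !-! between the items
show();       # back to the original value
close OUT;
print $buffer;
```

The three lines of output are ab, then a!-!b!-!, then ab again. A my
variable could never leak into show like that, which is the
difference between the two kinds of scoping.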
So when should you use local
instead of my
?
Not often, is the short answer. Personally I only tend to use it for
changing special variables like $,
.
The problem with local is that it makes your subroutine dependent on
variable names outside its control. We could rewrite the
speed/fuel/time program to use local instead, but what if
we decided to change the name of the $speed variable to $kmph ?
That would break the calcspeed function.
Write your subroutines as black
boxes. They should accept input, return output and not be dependent
on anything else in the program. Use my
to lexically scope parameters, local
where you have to, and use an explicit return
instead of the default. Believe me, it'll save some
trouble.
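Putting that advice together, a black box version of calcspeed might
look something like this. It is a sketch, not the final word: the
formulas are the ones used earlier, the sample figures are arbitrary,
and the point is only that the caller's $speed is left alone.

```perl
sub calcspeed {
    my ($speed,$distance)=@_;                          # private copies of the parameters
    my $time=int($distance / $speed * 60);             # minutes
    my $litres=int($distance / (15 / ($speed/100)));   # same rough formula as before
    return ($time,$litres);                            # explicit return
}

$speed=120;                                  # the caller's own variable
($time,$fuel)=calcspeed($speed+10, 120);
print "At ",$speed+10,", it takes $time minutes, using $fuel litres\n";
print "The caller's \$speed is still $speed\n";
```

Nothing inside the subroutine refers to a variable it didn't create
itself, so it can be pasted into another program without surprises.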
Note - when you are a
proficient perl professional, you'll notice a simplification or two
in this section. The problem is that this is a difficult concept to
explain, and littering the text with precise clarifications about
such things as closures, packages and evals will just complicate
things further for no real gain.