
Languages of the Internet: Perl and HTML

It is an interesting phenomenon that most computer scientists go through a five-year period early in their careers when they think that computers and programming are really, really neat. They will kill enormous amounts of time customizing their personal computers to get everything to work just right and learning the arcane details of the latest programming languages.

Students in this obsession phase are a joy to have around, largely because their professors have long since left it. These days, I am much more excited about finding interesting things to do with computers (like predicting jai-alai matches) than about dealing with an upgrade from Windows 98 to Windows 2000. Fortunately, Dario did windows, and a whole lot more.

Dario was particularly eager to learn the behind-the-scenes language which makes the Internet go, a programming language called Perl. Perl is not as much a reflection of hot new technology as it is a manifestation of old ideas freshly applicable to today's problems.  

Although the explosive growth of the Internet has clearly been the most exciting recent development in computer technology, the dirty truth is that it really doesn't require very much computing to make the Internet work. Throughout most of the information age, computers spent the bulk of their time crunching numbers (like predicting the weather) or in business data processing (doing things like payroll and accounting). Most applications ran on expensive mainframe computers that kept busy round the clock and charged users for every minute of computer time.

Fast forward to today. Now millions of desks across the nation contain personal computers, each of which is vastly more powerful than the ``big iron'' of yesteryear. And what do we do with the billions of instructions per second that we have at our disposal? We run increasingly elaborate screen-saving programs whose shimmering images decorate our desks as they protect the phosphors on our monitors.

The truth is that the Internet is really about communication, not computation. Although the World-Wide Web has been dubbed the ``World-Wide Wait'' because of sluggish response times, the primary source of these delays is not insufficient processing power, but too many people trying to use too few dedicated telephone lines, all at the same time.

An embarrassingly high percentage of the computing tasks associated with the World-Wide Web are basic bookkeeping and simple text reformatting. Perl is a language designed to make writing these conversion tasks as simple and painless as possible. Depending upon whom you believe, Perl is an acronym for either ``Practical Extraction and Report Language'' or ``Pathologically Eclectic Rubbish Lister''. The goal of its creator, Larry Wall, was to ``make the easy jobs easy, without making the hard jobs impossible''.

Perl programs are not particularly efficient, but they are particularly short. They are designed to be written quickly, plugged in place, and forgotten. No one would think of building a Monte Carlo simulation to simulate a million jai-alai games in Perl, because such high-performance number-crunching jobs must be carefully written to utilize the machine efficiently. Perl is for those quick-and-dirty, hit-and-run reformatting tasks which help programmers untangle the Web.

One of the common text processing tasks in which Perl scripts are used is preparing WWW pages on demand from databases. Look up your favorite book (ideally, look up my book) on Amazon.com or some other on-line book dealer and you will see a customized page with the title and publisher, a picture of the cover, reader-supplied reviews, even the current rank on the company's bestsellers list. This WWW page was not written by a person, but by a computer program which extracts the relevant information from the database and adds formatting commands to make it look right on the reader's screen.
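
To make the idea of wrapping database fields in formatting commands concrete, here is a minimal sketch. It is written in Python rather than Perl, and it is only an illustration: the record, its field names, and the function `render_book_page` are hypothetical, the rank value is invented, and nothing here reflects how any real bookseller builds its pages.

```python
from html import escape

# Hypothetical record standing in for one row of a bookseller's database.
# The field names and the rank value are invented for illustration.
book = {
    "title": "Calculated Bets",
    "publisher": "Cambridge University Press",
    "cover": "cover.gif",
    "rank": 12345,
}


def render_book_page(record):
    """Wrap the fields of one database record in HTML formatting commands."""
    lines = [
        "<HTML>",
        "<HEAD><TITLE>{}</TITLE></HEAD>".format(escape(record["title"])),
        "<BODY>",
        "<H1>{}</H1>".format(escape(record["title"])),
        '<IMG SRC="{}">'.format(escape(record["cover"], quote=True)),
        "<P>Publisher: {}</P>".format(escape(record["publisher"])),
        "<P>Sales rank: {}</P>".format(record["rank"]),
        "</BODY>",
        "</HTML>",
    ]
    return "\n".join(lines)


page = render_book_page(book)
print(page)
```

The program does no real computing. It copies text from one place to another and adds angle brackets along the way, which is exactly the kind of bookkeeping job described above.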

A second language of the Internet is HTML, an acronym for the ``Hypertext Markup Language''. HTML is the language in which all WWW pages are written, i.e. the text spit out by Amazon.com's Perl programs. It really isn't a computer programming language at all, since you can't write a program in HTML to do anything. HTML provides a medium for an author (or computer) to specify what a WWW page should look like to the reader.  

As we saw, Milford's schedule and results files were presented as unexciting-to-read but simple-to-parse text files. Dania Jai-Alai was more ambitious, and used HTML formatting to present its results and schedule files. The following portion of a Dania schedule file illustrates HTML:

<HTML>
<HEAD>
<TITLE>Entries Shell</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFff" TEXT = "#000000" LINK ="#FF0000" VLINK="#0f4504">
<font color="#ff0000">
<center><img src="botlogo.gif"></center>
ENTRIES DANIA JAI ALAI AFTERNOON 07/19/98 14
GAMES</font>
 
<table cellpadding="15" align="top">
<tr align=left valign="top">
<td>
 
<!--column 1 entries -->
 
<table valign="top">
<tr valign=top align=left>
<td><font color="#ff0000">GAME 1 - Spec 7 -Tri,DD<br></font>
<font>1 Mouhica-Oyhara<br>
2 Blanco-Verge<br>
3 Scotty-Zuri<br>
4 Arecha-Inigo<br>
5 Rocha III-Ondo<br>
6 Aymar-Eneko<br>
7 Laucirica II-Bilbao<br>
8 Andonegui-Homero<br>
SUBS: Burgo-Ulises</font></td>
</tr>

The formatting commands of HTML appear within the angle brackets, such as <TITLE>. This portion starts by presenting the title of this page and then specifies the color of both the background and the text (the actual colors are described by ``names'' like #ff0000). It then specifies that a picture named ``botlogo'' should be inserted, neatly centered in the middle of the line. The schedule of each game is formatted as a table, with each row presenting the post number and the two members of each doubles team.

This HTML formatting may seem ungainly, but you weren't intended to read it - your WWW browser was. It would be tedious for a person to write all those formatting commands each day, but that was done by a Perl program, not a person. As is the case with Amazon.com, these WWW pages are produced by formatting the information in a database using a straightforward computer program. Because a computer program writes the actual HTML files, we can rely on the format to be the same day to day, without any typing or formatting errors.

My student Dario did not have access to the fronton's private database containing the unformatted schedule and result information. However, he did have access to these HTML pages. By writing his own Perl program, he could carefully strip away all that formatting the fronton's program had diligently inserted. He could take the remaining data and format it just as we did the Milford data, enabling us to add it each day to our library of jai-alai scores. Once we had amassed enough data to work with, our fun could really begin.
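
The stripping step can be sketched in a few lines. What follows is not Dario's program, which was written in Perl and is not reproduced here. It is a Python illustration that assumes the page looks like the Dania excerpt shown earlier; the function names `strip_tags` and `parse_entries` are made up for this example.

```python
import re

# A few lines copied from the Dania schedule excerpt shown earlier.
sample = """<!--column 1 entries -->
<td><font color="#ff0000">GAME 1 - Spec 7 -Tri,DD<br></font>
<font>1 Mouhica-Oyhara<br>
2 Blanco-Verge<br>
5 Rocha III-Ondo<br>
SUBS: Burgo-Ulises</font></td>"""


def strip_tags(html):
    """Delete everything between angle brackets, leaving only the text."""
    return re.sub(r"<[^>]*>", "", html)


def parse_entries(html):
    """Return (post number, first player, second player) for each team line.

    Assumes a team line is a post number, a space, and two names joined by
    a hyphen, as in the excerpt. Other lines (game headers, SUBS) are skipped.
    """
    entries = []
    for line in strip_tags(html).splitlines():
        match = re.match(r"\s*(\d+)\s+(.+)-(.+?)\s*$", line)
        if match:
            post = int(match.group(1))
            entries.append((post, match.group(2), match.group(3)))
    return entries


for entry in parse_entries(sample):
    print(entry)
```

Because a program generates the page, the same pattern can be expected to keep matching from one day to the next. A pattern this simple would fail on a player whose own name contains a hyphen, so a real scraper would need to check for such cases.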

Any discussion of the languages of the Internet would be incomplete without mentioning Java. At the risk of slightly oversimplifying things, Java is a programming language which is used to write programs which will run on somebody else's machine, typically using an Internet browser.  

For example, suppose I want to put on the WWW a facility enabling you to calculate the amount of money you will pay each month if you take out a mortgage. I could create a WWW page which would prompt you to type in the interest rate, loan amount, and term of the loan, then calculate the number on my machine, and send this number to you on your machine. Alternatively, I could write a little program in Java which my machine could give your machine, which when run on your machine prompts you for the relevant numbers and does the calculation there. This second arrangement is better for me, in that it reduces the amount of interaction on my machine, and also better for you, since I don't need to know how much money you are thinking of embezzling from the bank.
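
The calculation itself is tiny, whichever machine runs it. The sketch below uses Python rather than Java and assumes the standard fixed-rate amortization formula with monthly compounding; the function name `monthly_payment` and the loan figures in the example are hypothetical.

```python
def monthly_payment(principal, annual_rate_percent, years):
    """Monthly payment on a fixed-rate loan, assuming monthly compounding.

    Uses the standard amortization formula
        payment = P * r / (1 - (1 + r) ** -n)
    where r is the monthly interest rate and n is the number of payments.
    """
    months = years * 12
    monthly_rate = annual_rate_percent / 100.0 / 12.0
    if monthly_rate == 0:
        # With no interest, the principal is simply split evenly.
        return principal / months
    return principal * monthly_rate / (1.0 - (1.0 + monthly_rate) ** -months)


# Example with invented figures: $100,000 borrowed at 6% for 30 years.
payment = monthly_payment(100000, 6.0, 30)
print("Monthly payment: ${:.2f}".format(payment))
```

Whether this arithmetic runs on my machine or yours changes nothing about the answer. It changes only who gets to see the numbers you typed in.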

We don't use Java anywhere in our system because there is no program we want to run on somebody else's machine, and because no fronton's WWW site provides a program that we want to run (as opposed to data which we want to read). Still, Java is a good thing. In fact, it is such a good thing that Microsoft devoted considerable energy and resources trying to kill it.



I hope you have enjoyed this excerpt from Calculated Bets: Computers, Gambling, and Mathematical Modeling to Win!, by Steven Skiena, copublished by Cambridge University Press and the Mathematical Association of America.

This is a book about a gambling system that works. It tells the story of how the author used computer simulation and mathematical modeling techniques to predict the outcome of jai-alai matches and bet on them successfully -- increasing his initial stake by over 500% in one year! His method can work for anyone: at the end of the book he tells the best way to watch jai-alai, and how to bet on it. With humor and enthusiasm, Skiena details a life-long fascination with the computer prediction of sporting events. Along the way, he discusses other gambling systems, both successful and unsuccessful, for such games as lotto, roulette, blackjack, and the stock market. Indeed, he shows how his jai-alai system functions just like a miniature stock trading system.

Do you want to learn about program trading systems, the future of Internet gambling, and the real reason brokerage houses don't offer mutual funds that invest at racetracks and frontons? How mathematical models are used in political polling? The difference between correlation and causation? If you are curious about gambling and mathematics, odds are this is the book for you!

This book is available in both hardcover and paperback.



Steve Skiena
2001-06-04