How to Distribute Daily News Updates to a Web Site

A Paper Presented to the Southern Association of Agricultural Scientists
Agricultural Communications Section
Orlando, FL
February 2002

Scott Janish
Communications Specialist
Texas A&M University

Background

The news team of the Texas A&M Agriculture Program has been distributing news stories via email and on the web since 1994. Each news story had to be converted to HTML before it could be posted on the web. The writers did not have time both to write the stories and to learn the technology needed to put them on the web, so a student worker fluent in HTML was hired. The job soon became too much even for a student working 20 hours a week, so a new process had to be developed. The original agnews web site was hosted on a Macintosh server, which did not allow the type of programming required, so the news team made the switch to Linux.

Method

The Linux/UNIX Operating System

Linux is a UNIX-like operating system that was originally designed to run on the Intel x86 architecture, more commonly referred to as a PC. The advantage of using Linux on a PC is that it provides the power of UNIX, a true multitasking operating system. Another advantage is that Linux does not require the purchase of expensive software: commercial UNIX can be quite costly, while Linux can be obtained at no cost. Most UNIX operating systems also require expensive server hardware designed specifically to run UNIX. Linux runs well on old 486 hardware and has extended the life of many old PCs. It can even run on Macintosh hardware and on the expensive hardware designed to run the expensive UNIX systems. With no budget, the news team took this low-cost approach. Since then the server has been upgraded to a Pentium II.

The only other aspect of Linux important to this paper is the directory structure. The structure is similar to that of a computer running DOS, Windows, or the Macintosh operating system, but it does not use drive letters or names such as "hard drive." It is commonly described as a tree growing upside down. The first level is referred to as "root," which the operating system writes as "/". All other directories are subdirectories of root and can have subdirectories of their own.

The agnews server stores all of its news stories under "/home/www/agnews/dailynews/stories." Under the stories directory there are 29 subdirectories, one for each story beat. Each beat subdirectory contains three subdirectories of its own: photos, audio, and video. These subdirectories store the various forms of multimedia and associated files that go with the news stories.
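The layout described above can be sketched in a few lines. The example is written in Python for illustration only (the agnews programs themselves are written in Perl), and the beat directory name "entomology" is a hypothetical spelling, since this paper does not list the actual directory names.

```python
from pathlib import PurePosixPath

# Root of the story tree, as given in the paper.
STORIES_ROOT = PurePosixPath("/home/www/agnews/dailynews/stories")

# Each beat directory holds these three multimedia subdirectories.
MEDIA_SUBDIRS = ("photos", "audio", "video")


def beat_directories(beat_dir_name):
    """Return the beat directory followed by its three media subdirectories.

    beat_dir_name is a hypothetical directory name for one story beat.
    """
    beat = STORIES_ROOT / beat_dir_name
    return [beat] + [beat / sub for sub in MEDIA_SUBDIRS]


for path in beat_directories("entomology"):
    print(path)
```

Because every beat follows the same pattern, a program only needs the beat name to know where a story and its multimedia files belong.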

Programming in Perl

Perl is a powerful programming language that comes with Linux and can also run on other platforms. Perl is well suited to complex tasks that would be cumbersome to code in other programming languages such as C. A program written in Perl will generally not execute as quickly as the same program written in C, but the development time can be significantly less, and in most cases where Perl is the appropriate tool the difference in execution time is too small to matter.

Perl makes it easy to search for patterns in text. For example, to find the string "abc" in a piece of text, a user could apply the Perl match operator: m/abc/. Perl also allows one pattern to be substituted for another. To change "abc" to "xyz," the user could apply the substitution operator: s/abc/xyz/. As written, this replaces only the first occurrence; adding the g modifier, s/abc/xyz/g, replaces every occurrence. This is a very brief explanation of these operators. Depending on the kind of text the program is looking for, the patterns can become very cumbersome and look like gibberish to the untrained eye. Regardless, they allow text to be transformed into HTML easily, provided the text is known to follow a consistent format.
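For readers who do not have Perl at hand, the same two operations can be tried in Python. This is only an illustration of matching and substitution with regular expressions, not the code used on the agnews server, and the sample sentence is made up.

```python
import re

text = "Type abc, then type abc again."

# Comparable to Perl's m/abc/: is the pattern present anywhere in the text?
has_match = re.search(r"abc", text) is not None

# Comparable to Perl's s/abc/xyz/: replace the first occurrence only.
first_replaced = re.sub(r"abc", "xyz", text, count=1)

# Comparable to Perl's s/abc/xyz/g: replace every occurrence.
all_replaced = re.sub(r"abc", "xyz", text)

print(has_match)
print(first_replaced)
print(all_replaced)
```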

Posting a story to the agnews web site is as simple as cutting and pasting the story into CGI programs called "posters." A CGI program is simply a program that accepts the data a person enters via a web browser. The web server can then manipulate this information to produce the desired results. In this case the CGI programs are written in Perl. Each CGI program takes the entered data and writes a special email message. When this email message arrives at its destination, it starts the process of updating the web site. Posting a story, complete with photos, audio, and/or video, requires two main steps: the multimedia poster and the story poster.
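As a rough illustration of the first thing a poster has to do, the sketch below decodes the data a browser submits from a web form. It is written in Python rather than Perl, and the field names "category" and "story" are assumptions made for the example; the real posters may use different names.

```python
from urllib.parse import parse_qs


def read_form(encoded_body):
    """Decode URL-encoded form data into a simple name -> value dictionary.

    Only the first value of each field is kept, which is enough for a
    form where every field appears once.
    """
    fields = parse_qs(encoded_body, keep_blank_values=True)
    return {name: values[0] for name, values in fields.items()}


# A browser sends form fields in this URL-encoded shape.
form = read_form("category=Entomology&story=Title%3A+Bees+return%0AText+here")
print(form["category"])
print(form["story"])
```

Once the fields are decoded, the program has ordinary text to work with and can write it into an email message.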

Multimedia Poster

First, the necessary photo, audio, and video files must be placed in the corresponding category directories via FTP or any other method the user prefers. Then the multimedia poster, a CGI program, is opened in the web browser. The names of the multimedia files are entered in this program. There are also areas to enter information such as captions for pictures and scripts for audio and video. The category of the story is entered as well, so the program knows how to create the links that are required for the HTML pages. When this information has been entered, a specially formatted email is sent. This email contains "triggers." Other Perl programs watch a special email box, and when an email arrives with the correct triggers, the appropriate files are written to the web site.
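The trigger idea can be sketched as follows. This paper does not give the actual trigger format, so the "##AGNEWS-" prefix, the field names, and the email addresses below are all hypothetical, and the sketch is in Python rather than the Perl used on the server.

```python
from email import message_from_string
from email.message import EmailMessage
from email.policy import default

# Hypothetical marker; the real trigger text is not given in this paper.
TRIGGER_PREFIX = "##AGNEWS-"


def build_multimedia_email(category, photo, caption):
    """Compose an email whose body carries one trigger line per field."""
    message = EmailMessage()
    message["From"] = "poster@example.org"  # placeholder address
    message["To"] = "inbox@example.org"     # placeholder address
    message["Subject"] = "multimedia post"
    message.set_content("\n".join([
        f"{TRIGGER_PREFIX}CATEGORY: {category}",
        f"{TRIGGER_PREFIX}PHOTO: {photo}",
        f"{TRIGGER_PREFIX}CAPTION: {caption}",
    ]))
    return message


def extract_triggers(raw_email):
    """Return the trigger fields found in an email, or {} if there are none."""
    message = message_from_string(raw_email, policy=default)
    fields = {}
    for line in message.get_content().splitlines():
        if line.startswith(TRIGGER_PREFIX):
            key, _, value = line[len(TRIGGER_PREFIX):].partition(":")
            fields[key.strip()] = value.strip()
    return fields


outgoing = build_multimedia_email("Entomology", "beetle.jpg", "A beetle on a leaf")
print(extract_triggers(outgoing.as_string()))
```

A watching program would act only on messages for which extract_triggers returns fields, and would ignore ordinary mail.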

Story Poster

Once the necessary multimedia files have been written, the web page that contains the whole story must be created. The story poster, another CGI program, completes this task. The text of the story is entered with certain information, such as the story title, writer, and contact, marked with simple tags. These tags are used later to format the HTML. For example, a line such as "Writer: John Doe, jdoe@tamu.edu" is later rewritten with HTML markup that highlights the writer information on the web page.
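A sketch of this kind of tag replacement is shown below, in Python for illustration. The exact HTML the agnews programs produce is not reproduced in this paper, so the bold label and the mailto link are assumptions chosen only to show the technique.

```python
import re

# Matches a whole line of the form "Writer: Name, address@host".
WRITER_LINE = re.compile(
    r"^Writer:[ \t]*(?P<name>[^,\n]+),[ \t]*(?P<email>\S+@\S+)[ \t]*$",
    re.MULTILINE,
)


def mark_up_writer(text):
    """Rewrite each "Writer:" line with (assumed) HTML markup."""
    return WRITER_LINE.sub(
        r'<b>Writer:</b> \g<name>, <a href="mailto:\g<email>">\g<email></a>',
        text,
    )


print(mark_up_writer("Writer: John Doe, jdoe@tamu.edu"))
```

Lines that do not begin with the "Writer:" tag pass through unchanged, which is why the writers only need to follow a simple convention.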

The category of the story is also entered. Typically there is a single category, but the program is able to handle multiple categories in case the story is relevant to multiple story beats. The names of the files created by the multimedia poster must also be entered. Then the email is formatted and sent.

Parse and Sort

Once the email is received, several things happen. First, a program called "parse.pl" takes the email message. It begins by stripping away all of the email header information (From, To, Subject, etc.). Next it scans the remainder of the email for the simple tags that the story poster uses to mark special information. These tags are replaced with the HTML tags that highlight the story title and writer information, as mentioned above. Parse also makes sure that special characters that do not translate directly into HTML are formatted correctly (for example, "&" is represented as "&amp;" in HTML). Next, parse takes the file names of any associated multimedia and writes them into links so anyone viewing the web site can look at the pictures, listen to the audio, or watch the video. Parse then attaches the appropriate HTML header information and the appropriate end tags. The result is an HTML page that contains all of the story information, ready to present to an Internet reader.
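The sequence of steps performed by parse can be sketched as below. This is a simplified Python illustration, not a translation of parse.pl: the "Title:" tag, the one-paragraph-per-line rule, and the HTML produced are assumptions made for the example.

```python
import html
from email import message_from_string
from email.policy import default


def parse_story(raw_email, media_files):
    """Turn a story email into a small HTML page (illustrative only)."""
    # Step 1: drop the email headers and keep only the body text.
    body = message_from_string(raw_email, policy=default).get_content()

    title = "Untitled"
    parts = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Title:"):
            # Step 2: replace the simple tag with HTML markup.
            # Step 3: escape special characters such as "&".
            title = html.escape(line[len("Title:"):].strip(), quote=False)
            parts.append(f"<h1>{title}</h1>")
        else:
            parts.append(f"<p>{html.escape(line, quote=False)}</p>")

    # Step 4: write the multimedia file names into links.
    for name in media_files:
        safe = html.escape(name)
        parts.append(f'<p><a href="{safe}">{safe}</a></p>')

    # Step 5: attach the HTML header and the closing tags.
    return (
        f"<html><head><title>{title}</title></head><body>\n"
        + "\n".join(parts)
        + "\n</body></html>\n"
    )


raw = (
    "From: poster@example.org\nTo: inbox@example.org\nSubject: story\n\n"
    "Title: Cotton & Corn\nYields rose this year.\n"
)
print(parse_story(raw, ["photos/field.jpg"]))
```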

Once parse is finished, the only steps left are to place the story on the web site and to add links to it on the various index pages. Several archives are kept on the agnews web site. Each beat keeps an archive of all of its stories. Stories are also archived by the month in which they run. Lastly, stories are archived by the types of multimedia they contain. A program called "sort.pl" does this housekeeping. Sort takes the HTML page from parse, determines which category the story belongs in, and writes the file to the proper story category directory. (Parse wrote the category of the story into the body of the story as an HTML comment.) Sort names the file containing the story by the date it runs. Next, sort writes the link that will go on the archive pages. Each link contains the first paragraph (or "graph") of the story as a summary. Once all of the links are written to the archive pages, sort checks for the special "TOP" marker. This marker means the story should also appear on the front page of the web site. If the TOP marker is present, sort rewrites the front page of the web site to feature the story.
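The decisions sort makes can be sketched in the same way. The comment format, the TOP marker spelling, the date-based file name, and the link layout below are all hypothetical, since sort.pl's actual conventions are not spelled out here; the sketch is in Python and only plans the work rather than writing any files.

```python
import re
from datetime import date

STORIES_ROOT = "/home/www/agnews/dailynews/stories"

# Hypothetical conventions, assumed for this example only.
CATEGORY_COMMENT = re.compile(r"<!--\s*category:\s*(.*?)\s*-->")
TOP_MARKER = "<!-- TOP -->"
FIRST_PARAGRAPH = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def plan_story(page, run_date):
    """Work out where a story goes and what its archive link looks like."""
    category_match = CATEGORY_COMMENT.search(page)
    if category_match is None:
        raise ValueError("story has no category comment")
    category = category_match.group(1)

    # Name the file by the date the story runs.
    file_name = run_date.isoformat() + ".html"

    # Use the first paragraph of the story as the summary in the link.
    paragraph = FIRST_PARAGRAPH.search(page)
    summary = paragraph.group(1).strip() if paragraph else ""

    return {
        "path": f"{STORIES_ROOT}/{category}/{file_name}",
        "archive_link": f'<p><a href="{category}/{file_name}">'
                        f"{run_date.isoformat()}</a>: {summary}</p>",
        "front_page": TOP_MARKER in page,
    }


example_page = (
    "<html><body><!-- category: entomology --><!-- TOP -->"
    "<h1>Bees return</h1><p>First graph.</p><p>Second graph.</p></body></html>"
)
print(plan_story(example_page, date(2002, 2, 4)))
```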

Conclusions

The process may seem complicated, but it requires only intermediate programming skills. The programs simply automate each of the steps previously performed by a student worker. The known structure of the web site (managed by the Linux operating system) and the simple formatting done by the writers, along with the power of Perl, allow non-technical personnel to update the web site without having to understand the underlying technology. Not only does this allow the writers to focus more on their writing, it also eliminates student worker positions and the funding that goes with them. Before the poster programs were developed, the news team employed a student worker who devoted 20 hours per week to putting stories on the web site. Now the only student who works on the agnews web site also works on several other web sites and sometimes has to actively search for extra work to fill a 20-hour work week.

Appendix

Story Categories/Beats:
4-H and Youth
Agricultural Communications
Agricultural Economics
Agricultural Education
Agricultural Engineering
Agriculture Program
Animal Science
Anthrax
Biochemistry and Biophysics
Consumer and Family Sciences
Entomology
Food and Nutrition
Foot-and-Mouth Disease
Forest Science
General
Health
Horticultural Sciences
Plant Pathology and Microbiology
Poultry Science
Rangeland Ecology and Management
Recreation, Park and Tourism Sciences
Rural Sociology
Soil and Crop Sciences
Texas Agricultural and Natural Resources Summit Initiative
Texas Crop and Weather Report
Texas Drought
Texas Forest Service
Veterinary Medicine and Science
Wildlife and Fisheries Sciences


mm_poster.cgi:
http://agnews.tamu.edu/posters/mm_poster.cgi

story_poster.cgi:
http://agnews.tamu.edu/posters/story_poster.cgi

parse.pl:
http://agnews.tamu.edu/posters/parse.pl.txt

sort.pl:
http://agnews.tamu.edu/posters/sort.pl.txt