As a part of my recent project to build a web based fortune cookie, I wrote the following script to parse fortune files and insert them into a database table. Most of this code is relevant regardless of what you want to do with the data.
It makes use of database handles which I described here.
The only really new thing in this script is a state machine to handle multiple lines of quote text. I have removed the non essential elements here.
# read each line from the file
while(<FH>){
# set the state to "in quote"
$in_quote=1;
# % indicates the end of a fortune
if ($_ eq '%'){
# run query
# reset our state machine
$in_quote=0;
$author='';
$text='';
}
elsif ($in_quote == 1 && $_ =~ m/^[\s|\t]+-- (.*)$/){
$author="$1";
}
elsif ($in_quote == 1 && !($_ =~ m/^$/)){
$text.="$_<BR />";
}
}
As you may be able to see, this type of behavior can be used to parse a great many things that have a beginning and ending within a file. A more sophisticated version could be used to parse an apache config file, an xml file or something similar.
#!/usr/bin/perl
use warnings;
use strict;
use DBI;
# variable definition
my $dbuser = "redacted";
my $dbpass = "redacted";
my $db = "redacted";
my $dbhost = 'localhost';
my $in_quote;
my $text;
my $author;
my $query;
# open the file
open(FH,"$ARGV[0]") || die "cannot open file $ARGV[0]";
# connect to the database
my $dbh = DBI->connect("DBI:mysql:$db:$dbhost", $dbuser, $dbpass) || die "Could not connect to database: $DBI::errstr";
$dbh->{PrintError} = 1; # do this, or check every call for errors
# prepare some queries we are likely to use
# note the use of q{} to protect the quotes in the statement
my $sthInsertRecord = $dbh->prepare(q{
INSERT INTO fortunes (`text`, `author`)
VALUE (?,?);
});
# read each line from the file
while(<FH>){
# handle new lines
chomp;
# set the state to "in quote"
$in_quote=1;
if ($_ eq '%'){
# i limited my table column to 255 characters to keep it fast
if(length($text) <= 255){
# execute the query
$sthInsertRecord->execute($text, $author);
}
# reset our state machine
$in_quote=0;
$author='';
$text='';
}
# if there is an author, it is indicated by a leading --
elsif ($in_quote == 1 && $_ =~ m/^[\s|\t]+-- (.*)$/){
$author="$1";
}
# if it isn't the terminator and it isn't an author then it is text
elsif ($in_quote == 1 && !($_ =~ m/^$/)){
$text.="$_<BR />";
}
}
#clean up
$sthInsertRecord->finish;
$dbh->disconnect();
close(FH);