Parsing linux fortune files into mysql


As a part of my recent project to build a web based fortune cookie, I wrote the following script to parse fortune files and insert them into a database table. Most of this code is relevant regardless of what you want to do with the data.

It makes use of database handles which I described here.

The only really new thing in this script is a state machine to handle multiple lines of quote text. I have removed the non essential elements here.

# read each line from the file
while(<FH>){
        # set the state to "in quote"
        $in_quote=1;
        # % indicates the end of a fortune
        if ($_ eq '%'){
                # run query

                # reset our state machine
                $in_quote=0;
                $author='';
                $text='';
        }
        elsif ($in_quote == 1 && $_ =~ m/^[\s|\t]+-- (.*)$/){
                $author="$1";
        }
        elsif ($in_quote == 1 && !($_ =~ m/^$/)){
                $text.="$_<BR />";
        }
}

As you may be able to see, this type of behavior can be used to parse a great many things that have a beginning and ending within a file. A more sophisticated version could be used to parse an apache config file, an xml file or something similar.

#!/usr/bin/perl

use warnings;
use strict;
use DBI;

# variable definition
my $dbuser = "redacted";
my $dbpass = "redacted";
my $db     = "redacted";
my $dbhost = 'localhost';
my $in_quote;
my $text;
my $author;
my $query;
# open the file

open(FH,"$ARGV[0]") || die "cannot open file $ARGV[0]";

# connect to the database

my $dbh = DBI->connect("DBI:mysql:$db:$dbhost", $dbuser, $dbpass) || die "Could not connect to database: $DBI::errstr";
$dbh->{PrintError} = 1; # do this, or check every call for errors

# prepare some queries we are likely to use
# note the use of q{} to protect the quotes in the statement
my $sthInsertRecord = $dbh->prepare(q{
    INSERT INTO fortunes (`text`, `author`)
    VALUE (?,?);
});

# read each line from the file
while(<FH>){
        # handle new lines
        chomp;
        # set the state to "in quote"
        $in_quote=1;
        if ($_ eq '%'){
                # i limited my table column to 255 characters to keep it fast
                if(length($text) <= 255){
                        # execute the query
                        $sthInsertRecord->execute($text, $author);
                }
                # reset our state machine
                $in_quote=0;
                $author='';
                $text='';
        }
        # if there is an author, it is indicated by a leading --
        elsif ($in_quote == 1 && $_ =~ m/^[\s|\t]+-- (.*)$/){
                $author="$1";
        }
        # if it isn't the terminator and it isn't an author then it is text
        elsif ($in_quote == 1 && !($_ =~ m/^$/)){
                $text.="$_<BR />";
        }

}
#clean up
$sthInsertRecord->finish;
$dbh->disconnect();
close(FH);
About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s