SQLite is a great database tool that is available almost anywhere and is quick and easy to use.
I am currently using it for work I am doing for a client. I am writing a rules-based data validator in for a tab-delimited file using Perl on Windows. With SQLite, it’s simple to slurp the whole file into a database table and then query it multiple times to see whether columns conform to rules.
SQLite doesn’t quite come with a built-in REGEXP operator. I say not quite because it does alias REGEXP to a call to a function called regexp(re,str), however the function does not exist unless you provide it.
You can provide it by compiling a dynamic library extension (.so in Linux, .dll in Windows, .dylib on Mac), but my experience with that route was frustrating and bothersome. It’s only really necessary if you need to use SQLite with regular expressions from the commandline tool or a program outside your control.
Today I found out that if you are using SQLite in a program, you can provide the function in your own language!
Here is a simple example in PHP (it’s just as easy in Ruby or Perl or Python or whatever). Save this to dbtest.php and run it from the command prompt with ‘php dbtest.php’:
<?
if(! file_exists('test.db')){
$db = new SQLite3('test.db');
$db->exec("create table users (name varchar(10))");
$db->exec("insert into users values ('Robert')");
$db->exec("insert into users values ('Egbert')");
$db->exec("insert into users values ('Dilbert')");
$db->exec("insert into users values ('Ratbert')");
$db->exec("insert into users values ('Dogbert')");
$db->exec("insert into users values ('Umberto')");
$db->close();
}
$db = new SQLite3('test.db');
// add the REGEXP magic to the db
function regexp($r,$s){return (preg_match("/$r/",$s)===1);}
$db->createFunction('regexp', 'regexp', 2);
// find names that start with a capital, two lowercase, then bert
$sql = "SELECT * FROM users WHERE name REGEXP '^[A-Z][a-z]{2}bert'";
$result = $db->query($sql);
while ($row = $result->fetchArray()) {
echo $row['name'] . "\n";
}
/p?easy/ !
I would have thought it would be really slow, but I can run a REGEXP comparison against thousands of records and it’s plenty quick enough for my needs.
*Update*
I just added a CAPTURE function to SQLite that returns the first captured section of a string when you supply a regular expression with parentheses.
Here it is in Perl:
$dbh->func('capture',2,sub {
my($regex,$string) = @_;
my($capture) = $string =~ /$regex/;
return $capture;
}, 'create_function');
# get month from YYYY/MM/DD format where month or day could be one or two digits
my $row = $dbh->selectrow_arrayref(
"SELECT CAPTURE('\\d{4}/(\\d{1,2})/\\d{1,2}', date_created) FROM invoices"
);