{"id":903,"date":"2013-11-27T23:12:21","date_gmt":"2013-11-28T04:12:21","guid":{"rendered":"http:\/\/www.ashleyit.com\/blogs\/brentashley\/?p=903"},"modified":"2013-11-28T19:56:08","modified_gmt":"2013-11-29T00:56:08","slug":"using-regular-expressions-with-sqlite","status":"publish","type":"post","link":"https:\/\/ashleyit.com\/blogs\/brentashley\/2013\/11\/27\/using-regular-expressions-with-sqlite\/","title":{"rendered":"Using Regular Expressions with SQLite"},"content":{"rendered":"<p><a href=\"http:\/\/sqlite.org\" title=\"SQLite\">SQLite<\/a> is a great database tool that is available almost anywhere and is quick and easy to use.<\/p>\n<p>I am currently using it for work I am doing for a client.  I am writing a rules-based data validator in for a tab-delimited file using Perl on Windows.  With SQLite, it&#8217;s simple to slurp the whole file into a database table and then query it multiple times to see whether columns conform to rules.<\/p>\n<p>SQLite doesn&#8217;t quite come with a built-in REGEXP operator.  I say <b><i>not quite<\/i><\/b> because it <i>does<\/i> alias REGEXP to a call to a function called regexp(re,str), however the function does not exist unless you provide it.<\/p>\n<p>You can provide it by compiling a dynamic library extension (.so in Linux, .dll in Windows, .dylib on Mac), but my experience with that route was frustrating and bothersome.  It&#8217;s only really necessary if you need to use SQLite with regular expressions from the commandline tool or a program outside your control.<\/p>\n<p>Today I found out that if you are using SQLite in a program, you can provide the function in your own language!<\/p>\n<p>Here is a simple example in PHP (it&#8217;s just as easy in Ruby or Perl or Python or whatever).  Save this to dbtest.php and run it from the command prompt with &#8216;php dbtest.php&#8217;:<\/p>\n<blockquote>\n<pre>\r\n&lt;?\r\n\r\nif(! file_exists('test.db')){\r\n        $db = new SQLite3('test.db');\r\n        $db->exec(\"create table users (name varchar(10))\");\r\n        $db->exec(\"insert into users values ('Robert')\");\r\n        $db->exec(\"insert into users values ('Egbert')\");\r\n        $db->exec(\"insert into users values ('Dilbert')\");\r\n        $db->exec(\"insert into users values ('Ratbert')\");\r\n        $db->exec(\"insert into users values ('Dogbert')\");\r\n        $db->exec(\"insert into users values ('Umberto')\");\r\n        $db->close();\r\n}\r\n\r\n$db = new SQLite3('test.db');\r\n\r\n\/\/ add the REGEXP magic to the db\r\nfunction regexp($r,$s){return (preg_match(\"\/$r\/\",$s)===1);}\r\n$db->createFunction('regexp', 'regexp', 2);\r\n\r\n\/\/ find names that start with a capital, two lowercase, then bert\r\n$sql = \"SELECT * FROM users WHERE name REGEXP '^[A-Z][a-z]{2}bert'\";\r\n$result = $db->query($sql);\r\nwhile ($row = $result->fetchArray()) {\r\n  echo $row['name'] . \"\\n\";\r\n}\r\n<\/pre>\n<\/blockquote>\n<p>\/p?easy\/ !<\/p>\n<p>I would have thought it would be really slow, but I can run a REGEXP comparison against thousands of records and it&#8217;s plenty quick enough for my needs.<\/p>\n<p><strong>*Update*<\/strong><\/p>\n<p>I just added a CAPTURE function to SQLite that returns the first captured section of a string when you supply a regular expression with parentheses.<\/p>\n<p>Here it is in Perl:<\/p>\n<pre>\r\n<blockquote>\r\n$dbh->func('capture',2,sub {\r\n  my($regex,$string) = @_;\r\n  my($capture) = $string =~ \/$regex\/;\r\n  return $capture;\r\n}, 'create_function');\r\n\r\n# get month from YYYY\/MM\/DD format where month or day could be one or two digits\r\nmy $row = $dbh->selectrow_arrayref(\r\n  \"SELECT CAPTURE('\\\\d{4}\/(\\\\d{1,2})\/\\\\d{1,2}', date_created) FROM invoices\"\r\n);\r\n \r\n<\/blockquote>\r\n<\/pre>\n<pre>\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>SQLite is a great database tool that is available almost anywhere and is quick and easy to use. I am currently using it for work I am doing for a client. I am writing a rules-based data validator in for a tab-delimited file using Perl on Windows. With SQLite, it&#8217;s simple to slurp the whole [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"activitypub_content_warning":"","footnotes":""},"categories":[1,12],"tags":[],"class_list":["post-903","post","type-post","status-publish","format-standard","hentry","category-blather","category-development"],"_links":{"self":[{"href":"https:\/\/ashleyit.com\/blogs\/brentashley\/wp-json\/wp\/v2\/posts\/903","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ashleyit.com\/blogs\/brentashley\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ashleyit.com\/blogs\/brentashley\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ashleyit.com\/blogs\/brentashley\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ashleyit.com\/blogs\/brentashley\/wp-json\/wp\/v2\/comments?post=903"}],"version-history":[{"count":27,"href":"https:\/\/ashleyit.com\/blogs\/brentashley\/wp-json\/wp\/v2\/posts\/903\/revisions"}],"predecessor-version":[{"id":930,"href":"https:\/\/ashleyit.com\/blogs\/brentashley\/wp-json\/wp\/v2\/posts\/903\/revisions\/930"}],"wp:attachment":[{"href":"https:\/\/ashleyit.com\/blogs\/brentashley\/wp-json\/wp\/v2\/media?parent=903"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ashleyit.com\/blogs\/brentashley\/wp-json\/wp\/v2\/categories?post=903"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ashleyit.com\/blogs\/brentashley\/wp-json\/wp\/v2\/tags?post=903"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}