/dev/null

Static is evil.

May 19, 2008

Updated version of Zend_Db_Adapter_Odbtp_Mssql

As people start asking me about that ODBTP adapter for Zend Framework, I will publish it as attachment to this post now. It should work with the latest version of Zend Framework (as reported by a developer). If you want to use it, just copy the files to your include path (make sure the path comes before the ZF path). Of course, you need to install ODBTP as well and add extension=php_odbtp_mssql.dll to your php.ini.

Download here

April 30, 2008

Switched from Drupal 5 to WordPress 2.5

Tags: — 13:37

After having trouble with the Drupal upgrade I decided to give WordPress a try. The funny thing is to copy the existing posts to the new tables. If you got the same problem, you can use this view:

CREATE VIEW wp_import AS SELECT a.nid AS ID, 1 AS post_author, FROM_UNIXTIME(a.created) AS post_date, FROM_UNIXTIME(a.created) AS post_date_gmt, b.body AS post_content, b.title AS post_title, 0 AS post_category, b.teaser AS post_except, ‘publish’ AS post_status, ‘open’ AS comment_status, ‘open’ AS ping_status, ” AS post_password, ‘post’ + a.nid AS post_name, ” AS to_ping, ” AS pinged, FROM_UNIXTIME(a.changed) AS post_modified, FROM_UNIXTIME(a.changed) AS post_modified_gmt, ” AS post_content_filtered, 0 AS post_parent, a.nid AS guid, 0 AS menu_order, ‘post’ AS post_type, ” AS post_mime_type, 0 AS comment_count FROM node a JOIN node_revisions b ON a.nid = b.nid WHERE a.type = ‘blog’ AND a.nid > 1 AND a.status = 1;

Then you can easily import all your stuff (without the images, sorry):

INSERT wordpress.wp_posts SELECT * FROM wp_import;

Wanna convert your custom markup to HTML? No problem, just run this query (works for cached content only, so you might want to click on each of your older posts once as a quick fix):

UPDATE wordpress.wp_posts w, (SELECT a.nid, b.data AS filtered FROM node_revisions a, cache_filter b WHERE CONCAT_WS(‘:’, a.format, MD5(a.body)) = b.cid) AS c SET post_content = c.filtered WHERE c.filtered IS NOT NULL AND w.ID = c.nid;

There might be a way to import a Drupal blog via RSS as well – let’s see if somebody answers to my posting on the WordPress support forum.

Update (6.5.2008)

I found another blog post that explains how to migrate from Drupal 5 to WordPress 2.0. The provided SQL script also converts categories and comments, which mine does not. However, it does not care about input filters, so Wikicode will not be converted to HTML. The perfect solution would be a combination of our scripts ;)

October 6, 2007

Comma separated list in SQL

Tags: — 06:42

Just a short example of how to write a stored function that returns a comma separated list of values (Microsoft Transact-SQL):


CREATE FUNCION get_authors (@pubId int)
RETURNS nvarchar(max)
WITH EXECUTE AS CALLER
AS
BEGIN
  DECLARE @authorList nvarchar(max);

  SELECT @authorList = @authorList + ', ' +
    authorFirstName + ' ' + authorLastName
    FROM pdb_authors a, pdb_publicationAuthors b
    WHERE a.authorId = b.authorId AND b.pubId = @pubId;

  RETURN(SUBSTRING(@authorList, 3, LEN(@authorList) - 2))
END

September 18, 2007

Optimizing SQL – Part 1

Tags: — 15:37

Design approaches for multi-lingual data

In this example, we’re assuming a dataset consisting of 6 multi-lingual fields. Each field holds strings with an average length of 200 characters in the 4 locales en, de, zh and ru. A fallback function is implemented using the CASE expression (doesn’t add much overhead and is faster than fetching data in two locales and implementing the fallback at application level).

This is how your queries will look like, if you use the “dictionary approach” (master data in dict_books and locale specific strings in the general table dict_dictionary, which is shared between all master tables):

SELECT book_id, CASE WHEN loc0.dict_value IS NOT NULL THEN loc0.dict_value ELSE org0.dict_value END AS trans_author_id, CASE WHEN loc1.dict_value IS NOT NULL THEN loc1.dict_value ELSE org1.dict_value END AS trans_title_id, CASE WHEN loc2.dict_value IS NOT NULL THEN loc2.dict_value ELSE org2.dict_value END AS trans_subtitle_id, CASE WHEN loc3.dict_value IS NOT NULL THEN loc3.dict_value ELSE org3.dict_value END AS trans_abstract_id, CASE WHEN loc4.dict_value IS NOT NULL THEN loc4.dict_value ELSE org4.dict_value END AS trans_copyright_id, CASE WHEN loc5.dict_value IS NOT NULL THEN loc5.dict_value ELSE org5.dict_value END AS trans_body_id FROM dict_books a JOIN dict_dictionary org0 ON a.author_id = org0.dict_id AND org0.dict_locale = 'en' LEFT JOIN dict_dictionary loc0 ON a.author_id = loc0.dict_id AND loc0.dict_locale = 'ru' JOIN dict_dictionary org1 ON a.title_id = org1.dict_id AND org1.dict_locale = 'en' LEFT JOIN dict_dictionary loc1 ON a.title_id = loc1.dict_id AND loc1.dict_locale = 'ru' JOIN dict_dictionary org2 ON a.subtitle_id = org2.dict_id AND org2.dict_locale = 'en' LEFT JOIN dict_dictionary loc2 ON a.subtitle_id = loc2.dict_id AND loc2.dict_locale = 'ru' JOIN dict_dictionary org3 ON a.abstract_id = org3.dict_id AND org3.dict_locale = 'en' LEFT JOIN dict_dictionary loc3 ON a.abstract_id = loc3.dict_id AND loc3.dict_locale = 'ru' JOIN dict_dictionary org4 ON a.copyright_id = org4.dict_id AND org4.dict_locale = 'en' LEFT JOIN dict_dictionary loc4 ON a.copyright_id = loc4.dict_id AND loc4.dict_locale = 'ru' JOIN dict_dictionary org5 ON a.body_id = org5.dict_id AND org5.dict_locale = 'en' LEFT JOIN dict_dictionary loc5 ON a.body_id = loc5.dict_id AND loc5.dict_locale = 'ru'

First, I would like to mention that inserting data in those tables is a mess, because you’ll need 25 INSERTs for each dataset (compared to 5 for the relational design below).

The query for the “true relational design” with one table (books) for the master data and one for the locale specific strings (books_locale) looks like:

SELECT a.book_id, CASE WHEN c.author IS NOT NULL THEN c.author ELSE b.author END AS author, CASE WHEN c.title IS NOT NULL THEN c.title ELSE b.title END AS title, CASE WHEN c.subtitle IS NOT NULL THEN c.subtitle ELSE b.subtitle END AS subtitle, CASE WHEN c.abstract IS NOT NULL THEN c.abstract ELSE b.abstract END AS abstract, CASE WHEN c.copyright IS NOT NULL THEN c.copyright ELSE b.copyright END AS copyright, CASE WHEN c.body IS NOT NULL THEN c.body ELSE b.body END AS body FROM books a JOIN books_locale b ON a.book_id = b.book_id AND b.locale = 'en' LEFT JOIN books_locale c ON a.book_id = c.book_id AND c.locale = 'ru';

Now, wonder what’s the difference in execution time? The first query takes 0.30s and the second just 0.06s (1000 rows). If do the query with 10,000 books instead, the execution time is 3.07s against 0.56s. Conclusion: The relational design is about 5.5 times faster than the dictionary type of design.

There’s a third option, which is using a single table with a primary key consisting of book_id and book_locale. The query would look like

SELECT b.book_id, CASE WHEN c.author IS NOT NULL THEN c.author ELSE b.author END AS author, CASE WHEN c.title IS NOT NULL THEN c.title ELSE b.title END AS title, CASE WHEN c.subtitle IS NOT NULL THEN c.subtitle ELSE b.subtitle END AS subtitle, CASE WHEN c.abstract IS NOT NULL THEN c.abstract ELSE b.abstract END AS abstract, CASE WHEN c.copyright IS NOT NULL THEN c.copyright ELSE b.copyright END AS copyright, CASE WHEN c.body IS NOT NULL THEN c.body ELSE b.body END AS body FROM books b LEFT JOIN books c ON b.book_id = c.book_id AND c.book_locale = 'ru' WHERE b.book_locale = 'en'

And the execution time for 10.000 entries is… just 0.38s! (0.05s for 1.000) That is below the relational design and makes me a bit angry, but anyways,… Inserting data is also faster, though not that safe because you can get in trouble with foreign keys. I think it’s time for database vendors to integrate multi-lingual features in their products. Like they did with XML and GIS before.

Why use subselects?

Compare those two queries:

SELECT a.article_id, article_name, b.effective_from_date, b.price FROM articles a JOIN price_history b WHERE a.article_id = b.article_id AND b.effective_from_date = (SELECT max(effective_from_date) FROM price_history c WHERE a.article_id = c.article_id AND c.effective_from_date <= NOW()) ORDER BY a.article_id;

SELECT a.article_id, article_name, b.effective_from_date, b.price FROM articles a JOIN price_history b ON a.article_id = b.article_id AND b.effective_from_date <= NOW() LEFT JOIN price_history c ON b.article_id = c.article_id AND b.effective_from_date < c.effective_from_date AND c.effective_from_date <= NOW() WHERE c.effective_from_date IS NULL ORDER BY a.article_id;

Which one is faster? They both do the same, which is returning a list of products with their current price, according to a price history table (note that there might be future prices stored as well).

The first query takes about 0.27s and the second 6.26s (1000 articles with 100 prices in the history table each) – this is why subselects are often not that bad!

June 8, 2007

Fun with SQL

Tags: — 10:02

I had heaps of fun with SQL today (again). Ever heard of Common Table Expressions? Here is an example of a recursive query:


WITH
nodeCTE (nodeId, nodeName)
AS (
SELECT
a.nodeId,
nodeName = CONVERT(varchar(8000), nodeName)
FROM
nodeNames a JOIN nodes b ON a.nodeId = b.nodeId
AND a.locale = 'en' AND b.parentId = 0
UNION ALL SELECT y.nodeId,
x.nodeName + ' -> ' + CONVERT(varchar(8000),
z.nodeName)
FROM
nodeCTE x JOIN nodes y ON y.parentId = x.nodeId
JOIN nodeNames z ON z.nodeId = y.nodeId AND
z.locale = 'en'
)
SELECT * FROM nodeCTE;

Another task was to create a database record in the locale ‘de’, if it did not exist yet:


INSERT INTO users
(userId, locale, title, firstName, lastName,
middleName, nobleName, displayName, email)
SELECT
userId, 'de', title, firstName, lastName,
middleName, nobleName, displayName, email
FROM
users b
WHERE
locale = 'en' AND NOT EXISTS
(
SELECT NULL
FROM mdb_users a
WHERE a.userId = b.userId AND locale = 'de'
);

Afterwards, it was easy to eliminate all special (German) characters from the original records in the default locale ‘en’.
Did you know that there is a major difference between German and English is the abbreviation for “Doctor”, which is “Dr” in English and “Dr.” in German? That had to be fixed too:

UPDATE users
SET title = REPLACE(title, 'Dr.', 'Dr')
WHERE locale = 'en';

Last query example is a search for name parts and the phone extension. You had to search in all locales of the multi-lingual database, but only return distinct results in the locale ‘en’ with Name und Initial:


SELECT
displayName, initial
FROM
users
WHERE
locale = 'en'
AND
userId IN


(
SELECT
userId
FROM
users
WHERE
(lastName LIKE '%example%' OR
firstName LIKE '%example%' OR
middleName LIKE '%example%' OR
nobleName LIKE '%example%' OR
title LIKE '%example%' OR
initial LIKE '%example%')
UNION SELECT
userId
FROM
userAddresses
WHERE
adrPhoneExt = '%example%'
)


ORDER BY
lastName

May 31, 2007

Zend Framework: Unicode DB adapter for SQL Server

Tags: — 13:26

Yeah, I finally got it managed to publish my proposal for the Zend Framwork ODBTP database adapter component:

http://framework.zend.com/wiki/display/ZFPROP/Zend_Db_Adapter_Odbtp_Mssql

The default MS SQL database driver does not work with Unicode, as SQL Server expects Unicode strings in the UCS-2 encoding (UCS-2 contains null bytes and the driver is not binary-safe). ODBTP works as layer between PHP and SQL Server and accepts Unicode strings in the UTF-8 encoding.

The required classes are attached bellow. Feel free to download, use and test them! Please note that they *might* need to be adapted for the latest framework release.

I also wrote an article about multi-lingual enterprise applications for the German PHP Magazine. Hope it will be published in the next issue.

November 10, 2006

Country list as MySQL dump

Tags: — 14:18

I just imported a list of all countries from a CSV file and re-exported it as (My)SQL dump:

http://www.nulldevice.de/wp-content/uploads/2008/04/countries.sql

Feel free to download and use it! An even more complete list (including TLDs etc) will follow…

Powered by PHP, Memcached, Suhosin, MySQL and WordPress