Posts Tagged ‘xml’

lxml vs. ElementTree

Wednesday, October 14th, 2009

While lxml has some excellent benchmarks about the speed of lxml.etree vs. ElementTree, I wanted to run some tests that were as close as possible to my own use case (fairly simple multi-megabyte XML files).

Here are the results of my little test script lxml-v-etree.py (times are in milliseconds):

name           generate | tostring | total | write | parse | find | total
------------------------+----------+-------+-------+-------+------+------
xml.cElementTree    132 |   2430   |  2562 |  2433 |   158 |   58 |   216
xml.cElementTree    112 |   2384   |  2497 |  2387 |   158 |   25 |   183
xml.cElementTree    113 |   2393   |  2507 |  2396 |   161 |   25 |   187
xml.ElementTree     591 |   2571   |  3163 |  2574 |  3613 |   25 |  3638
xml.ElementTree     619 |   2567   |  3187 |  2570 |  3589 |   55 |  3644
xml.ElementTree     609 |   2578   |  3188 |  2581 |  3564 |   55 |  3619
lxml                333 |     75   |   409 |    82 |   200 |    0 |   201
lxml                355 |     93   |   448 |    95 |   182 |   32 |   214
lxml                310 |     94   |   404 |    96 |   156 |   56 |   213
------------------------+----------+-------+-------+-------+------+------

Note that the first “total” is “generate + tostring”, while the second “total” is the sum of the two parsing-related columns just before it (“parse” + “find”).

My parsing tests are basically “etree.parse” followed by three calls to “Element.getchildren()”, which is far too simplistic to be meaningful, so the parsing numbers should probably be ignored. My writing tests are far more thorough and realistic.
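For anyone who wants to run a similar comparison, here is a minimal sketch of the kind of timing harness described above. It is not the original lxml-v-etree.py: the tree shape, the function names and the row count are all made up for illustration, and it targets Python 3 with the standard library's ElementTree. Since lxml.etree offers a largely compatible API, swapping the import should be enough to compare the two, though I have not verified that for every call here.

```python
import io
import time
import xml.etree.ElementTree as ET  # assumption: stdlib ElementTree, Python 3


def timed(func, *args):
    """Run func(*args) and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, (time.perf_counter() - start) * 1000.0


def generate(rows):
    """Build a flat tree with one <item> child per row (hypothetical shape)."""
    root = ET.Element("root")
    for i in range(rows):
        item = ET.SubElement(root, "item", id=str(i))
        item.text = "value %d" % i
    return root


def children_three_times(root):
    """Stand-in for the 'find' column: fetch the root's children three times."""
    children = []
    for _ in range(3):
        # list(element) replaces Element.getchildren(), removed in Python 3.9.
        children = list(root)
    return children


def run_benchmark(rows=10000):
    """Time each step once and return (timings in ms, number of children)."""
    timings = {}
    root, timings["generate"] = timed(generate, rows)
    data, timings["tostring"] = timed(ET.tostring, root)
    tree, timings["parse"] = timed(ET.parse, io.BytesIO(data))
    children, timings["find"] = timed(children_three_times, tree.getroot())
    return timings, len(children)


if __name__ == "__main__":
    timings, count = run_benchmark()
    for name, ms in timings.items():
        print("%-10s %8.1f ms" % (name, ms))
```

A single run per step is noisy, which is why the table above repeats each library three times.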

I’m running Python 2.6.2 with lxml 2.1.5 and libxml2 2.6.32 on Ubuntu 9.04 x86_64.

Fun with SQLObject and mxDateTime

Thursday, November 29th, 2007

I’m working on a small CherryPy web service that, among other things, saves timestamps to a database. The timestamps are in RFC 3339 format (like 2007-07-31T16:05:00.000-05:00), and I needed to store the time zone.

Luckily, mxDateTime and SQLObject’s DateTimeCol both support full dates with times and time zones. Unfortunately, it’s not immediately obvious from SQLObject’s lackluster documentation how to use mxDateTime instead of Python’s built-in datetime.

A little searching brought me to a mailing list post about how to use mxDateTime by default in SQLObject. (I don’t know why the sample code includes the conditional as I would think you’d want your code to outright fail if you’re unable to use the datetime library you expect.)

So my model’s code looks something like this:

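from sqlobject import *
from sqlobject import col

# Switch SQLObject's date/time columns to mxDateTime.
# Set here, before the model class below is defined.
col.default_datetime_implementation = MXDATETIME_IMPLEMENTATION

class Foo(SQLObject):
    # Defaults to the current time when no timestamp is supplied.
    timestamp = DateTimeCol(default=DateTimeCol.now)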

Then my parsing code looks something like this:

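import model
from mx import DateTime

# Sample RFC 3339 timestamp with a -05:00 offset.
timestamp = '2007-07-31T16:05:00.000-05:00'
# Parse the string with mxDateTime and create a new Foo row from it.
bar = model.Foo(timestamp=DateTime.DateTimeFrom(timestamp))
print 'UTC Timestamp:', bar.timestamp
print 'Local Timestamp:', bar.timestamp.localtime()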

Basically, once you add the magic line col.default_datetime_implementation = MXDATETIME_IMPLEMENTATION, everything Just Works.
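If you want to sanity-check the UTC value without a database or mxDateTime installed, the sample timestamp can be converted with Python’s built-in datetime instead. This is only a cross-check under stated assumptions, not part of the SQLObject setup above: it uses a different library from the rest of this post and needs Python 3.7 or later for datetime.fromisoformat.

```python
from datetime import datetime, timezone

# Same sample timestamp as in the parsing code above.
raw = "2007-07-31T16:05:00.000-05:00"

# fromisoformat keeps the -05:00 offset as tzinfo on the parsed value.
parsed = datetime.fromisoformat(raw)

# Shift the same instant to UTC: 16:05 at -05:00 is 21:05 UTC.
as_utc = parsed.astimezone(timezone.utc)

print("Parsed:", parsed.isoformat())
print("UTC:   ", as_utc.isoformat())  # 2007-07-31T21:05:00+00:00
```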