Archive for the ‘Python’ Category

Making Server-Side MongoDB Functions Less Awkward

Monday, January 11th, 2010

I’ve recently switched my project at work to use MongoDB for the user database and a few other datasets.

Currently I don’t use many JavaScript functions, but when I do, I like to store them on the server so that they’re accessible when I’m poking around in a console.

I use something similar to the following function to load all of my JS functions onto the server when my app starts:

import os
import pymongo
import pkg_resources
 
# Relative to distribution's root
SCRIPT_DIR = os.path.join('model', 'js')
 
def init_js(db):
    '''Initializes server-side javascript functions'''
    scripts = filter(
            lambda f: f.endswith('.js'),
            pkg_resources.resource_listdir(__name__, SCRIPT_DIR)
        )
    for script in scripts:
        # Name the function after the script name
        func_name, _ = script.split('.', 1)
        script_path = os.path.join(SCRIPT_DIR, script)
 
        # Create a pymongo Code object
        # otherwise it will be stored as a string
        code = pymongo.code.Code(
                pkg_resources.resource_string(__name__, script_path))
 
        # Upsert the function
        db.system.js.save({ '_id': func_name, 'value': code, })

However, using server-side functions from Python is awkward at best. Say I have the JavaScript function:

add.js

function(x, y) {
    return x + y;
}

Running that function via PyMongo requires wrapping the function call with placeholder parameters in a Code object and passing in the values as a dict:

var1 = 1
var2 = 2
result = db.eval(pymongo.code.Code('add(a, b)', {'a': var1, 'b': var2,}))
assert result == 3

Update: See MongoDB dev Mike Dirolf’s comment for a much more concise way of executing server-side functions.

That’s bearable for simple functions, but having to manually map parameters to values is tiresome and error-prone with longer function signatures.

What I wanted was something more natural like:

var1 = 1
var2 = 2
result = db.add(var1, var2)
assert result == 3

I use a simple PyMongo Database object wrapper to make my life easier:

import string
 
from pymongo.code import Code
 
class ServerSideFunctions(object):
    def __init__(self, db):
        self.db = db
 
    def func_wrapper(self, func):
        '''Returns a closure for calling a server-side function.'''
        def server_side_func(*args):
            '''Calls server side function with positional arguments.'''
            # Could be removed with better param generating logic
            if len(args) > len(string.ascii_letters):
                raise TypeError('%s() takes at most %d arguments (%d given)'
                        % (func, len(string.ascii_letters), len(args)))

            # Build these on every call so that reusing the same closure
            # doesn't accumulate parameters from earlier calls
            params = [] # To keep params ordered
            kwargs = {}

            # Prepare arguments
            for k, v in zip(string.ascii_letters, args):
                kwargs[k] = v
                params.append(k)

            # Prepare code object
            code = Code('%s(%s)' % (func, ', '.join(params)), kwargs)

            # Return result of server-side function
            return self.db.eval(code)
        return server_side_func
 
    def __getattr__(self, func):
        '''Return a closure for calling server-side function named `func`'''
        return self.func_wrapper(func)
 
# db is a PyMongo Database object, as in the earlier examples
dbjs = ServerSideFunctions(db)
var1 = 1
var2 = 2
result = dbjs.add(var1, var2)
assert result == 3
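
The comment in func_wrapper admits that the argument limit could go away with better parameter-generating logic. One possible approach, sketched here, is to number the placeholders instead of pulling them from the alphabet. The build_call helper and the argN naming are hypothetical names of my own, not part of PyMongo; the function only builds the JavaScript source and the scope dict that would then be handed to a Code object.

```python
def build_call(func, args):
    """Return (source, scope) for calling a server-side function.

    Hypothetical helper: placeholders are named arg0, arg1, ... so there
    is no upper limit on the number of positional arguments.
    """
    names = ['arg%d' % i for i in range(len(args))]
    source = '%s(%s)' % (func, ', '.join(names))
    scope = dict(zip(names, args))
    return source, scope


source, scope = build_call('add', (1, 2))
print(source)  # add(arg0, arg1)
print(scope)
```

Inside the wrapper, the result would presumably be used as Code(source, scope), which would make the length check unnecessary.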

I’m tempted to monkey-patch PyMongo’s Database class to add a ServerSideFunctions instance directly as a js attribute, so then I could drop the confusing dbjs variable and just use:

assert db.js.add(1,2) == 3
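
Here is a rough sketch of how that monkey-patch might look. So that it runs without a MongoDB server, FakeDatabase is a stand-in I made up for PyMongo's Database class, and the wrapper is a simplified version that inlines the arguments instead of building a Code object. The only real point is the last step: a property set on the class is found by normal attribute lookup, so it should win over a __getattr__ fallback that turns unknown attributes into collections.

```python
class FakeDatabase(object):
    """Stand-in for pymongo's Database class (an assumption for this sketch).

    Unknown attributes fall through to __getattr__, which in the real
    class would return a collection.
    """
    def __getattr__(self, name):
        return 'collection:%s' % name

    def eval(self, code):
        # The real method sends the code to the server; this one echoes it.
        return code


class ServerSideFunctions(object):
    """Simplified wrapper: formats the call instead of using a Code object."""
    def __init__(self, db):
        self.db = db

    def __getattr__(self, func):
        def server_side_func(*args):
            call = '%s(%s)' % (func, ', '.join(map(repr, args)))
            return self.db.eval(call)
        return server_side_func


# The monkey-patch: expose a wrapper instance as the `js` attribute.
FakeDatabase.js = property(lambda self: ServerSideFunctions(self))

db = FakeDatabase()
print(db.js.add(1, 2))  # add(1, 2)
print(db.users)         # collection:users
```

The obvious cost is that a collection named js would no longer be reachable as db.js, only through dictionary-style access.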

If someone knows of a better way to access server-side MongoDB functions from Python, please let me know!

I modified this code to remove code specific to my project, so please let me know if there are errors.

lxml vs. ElementTree

Wednesday, October 14th, 2009

While lxml has some excellent benchmarks about the speed of lxml.etree vs. ElementTree, I wanted to run some tests that were as close as possible to my own use case (fairly simple multi-megabyte XML files).

Here are the results of my little test script lxml-v-etree.py (times are in milliseconds):

name           generate | tostring | total | write | parse | find | total
------------------------+----------+-------+-------+-------+------+------
xml.cElementTree    132 |   2430   |  2562 |  2433 |   158 |   58 |   216
xml.cElementTree    112 |   2384   |  2497 |  2387 |   158 |   25 |   183
xml.cElementTree    113 |   2393   |  2507 |  2396 |   161 |   25 |   187
xml.ElementTree     591 |   2571   |  3163 |  2574 |  3613 |   25 |  3638
xml.ElementTree     619 |   2567   |  3187 |  2570 |  3589 |   55 |  3644
xml.ElementTree     609 |   2578   |  3188 |  2581 |  3564 |   55 |  3619
lxml                333 |     75   |   409 |    82 |   200 |    0 |   201
lxml                355 |     93   |   448 |    95 |   182 |   32 |   214
lxml                310 |     94   |   404 |    96 |   156 |   56 |   213
------------------------+----------+-------+-------+-------+------+------
name           generate | tostring | total | write | parse | find | total
------------------------+----------+-------+-------+-------+------+------

Note that the first “total” is “generate + tostring”, while the second “total” is “parse + find” (the two parsing-related columns before it).

My parsing tests are basically “etree.parse” and then running “Element.getchildren()” 3 times, which is ridiculously simplistic and should probably be ignored. My writing tests are far more thorough/realistic.
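
For anyone who wants to try something similar, here is a minimal sketch of the kind of timing harness described above. It is not lxml-v-etree.py itself; the function names, the document shape, and the row count are all assumptions of mine. It uses the standard library's ElementTree and lists the root's children with list(root), because Element.getchildren() is no longer available in recent Python versions.

```python
import io
import time
import xml.etree.ElementTree as etree


def time_ms(func, *args):
    """Run func(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, (time.perf_counter() - start) * 1000.0


def generate(rows):
    """Build a simple, flat document with `rows` child elements."""
    root = etree.Element('root')
    for i in range(rows):
        child = etree.SubElement(root, 'item', {'id': str(i)})
        child.text = 'value %d' % i
    return root


def find(tree):
    """Mimic the parsing test: list the root's children three times."""
    for _ in range(3):
        children = list(tree.getroot())
    return children


def benchmark(rows=1000):
    """Time each step once and return {step name: milliseconds}."""
    timings = {}
    root, timings['generate'] = time_ms(generate, rows)
    data, timings['tostring'] = time_ms(etree.tostring, root)
    tree, timings['parse'] = time_ms(etree.parse, io.BytesIO(data))
    children, timings['find'] = time_ms(find, tree)
    assert len(children) == rows
    return timings


for name, ms in sorted(benchmark().items()):
    print('%-8s %8.2f ms' % (name, ms))
```

Since lxml.etree follows the ElementTree API, swapping the import should be most of what is needed to compare the two, though I have not verified this exact sketch against lxml.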

I’m running Python 2.6.2 with lxml 2.1.5 and libxml2 2.6.32 on Ubuntu 9.04 x86_64.

Python Packaging Talk

Wednesday, September 9th, 2009

I gave a talk at PDX Python last night on Python Packaging. It’s just an overview and introduction completely lacking in any practical examples.

Let me know if the ODP source is messed up. OpenOffice.org liked randomly losing background images and forgetting other formatting.

So as penance I quickly hacked up a silly little command line utility and uploaded it to PyPI to serve as a simple packaging example:

It’d be nice to add some more advanced features like test running, including package data, and building C extensions. If you feel adventurous please fork it and send me a pull request on BitBucket.

Thanks to everyone who came to PDX Python last night! Especially Armin Ronacher who was able to clarify and elaborate on a number of different distutils/setuptools topics!

Update: Just spotted an excellent post on distutils and setuptools by Tarek Ziadé. Make sure to read his blog if you’re interested in packaging in Python.

Switched tc-rest to webob

Monday, August 10th, 2009

Small update on my toy tc-rest project: I switched to using WebOb for creating HTTP Request and Response objects. Cleaned up the code a bit, but a real dispatcher is what’s needed to really remove the cruft.

I’m anxious to extend the API and add features, but I have no clue when I’ll have time to touch it again. In the meantime I’ve pushed tc-rest to bitbucket.org if you want to take a look.

TokyoCabinet + fapws3 = tc-rest

Saturday, August 8th, 2009

Have you ever wondered how hard it would be to tack a RESTful HTTP interface on top of a fast key/value database like TokyoCabinet?

Probably not, but I did: tc-rest.tar.gz

Components:

  • TokyoCabinet – my favorite persistent key/value database
  • pytc – a wonderful Python wrapper for TC
  • fapws3 – a fast libev based HTTP/WSGI server
  • simplejson – (or the json module in Python >= 2.6) for encapsulating HTTP responses
  • okapi – a fantastic little static HTML page for testing HTTP APIs

Getting TokyoCabinet+pytc to work inside a virtualenv was a bit tricky, so check out my run.sh script if you’re having trouble getting it to start.

Once you get it started, load okapi in your browser:

http://localhost:8080/static/okapi.html

And then create a database by doing a POST like:

http://localhost:8080/foo/

And finally store/get keys and values using GET and POST requests like:

http://localhost:8080/foo/bar/
http://localhost:8080/foo/baz/

Doing a GET request to a database URL lists keys.

At any rate, I had big dreams for building a system where you would store JSON, specify indexes on certain keys, and the server would maintain those indexes for you by creating ad hoc TokyoCabinet databases.

Instead I ended up wasting most of my time learning how to write a low-level WSGI app. I should have just used CherryPy or Django from the beginning, but I had never written a pure WSGI app before. It was a good lesson even if it meant not getting some of my features implemented.

I’ll probably keep playing with this idea, but the next version will probably be based on some existing framework. Parsing environ['PATH_INFO'] and running start_response(...) manually gets old fast.
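
To give a feel for what that manual work looks like, here is a minimal sketch of a pure WSGI app with the same URL layout. It is not the tc-rest code: a plain dict stands in for the TokyoCabinet databases so it runs without pytc, and the status codes and JSON shapes are my own assumptions. The call helper is hypothetical too; it just invokes the app directly without a server.

```python
import io
import json

# In-memory stand-in for the TokyoCabinet databases (an assumption, so the
# sketch runs without pytc): {db_name: {key: value}}
DATABASES = {}


def app(environ, start_response):
    """/db/ creates or lists a database, /db/key/ sets or gets a value."""
    method = environ['REQUEST_METHOD']
    parts = [p for p in environ.get('PATH_INFO', '').split('/') if p]
    status, body = '404 Not Found', {'error': 'not found'}

    if len(parts) == 1:
        name = parts[0]
        if method == 'POST':
            DATABASES.setdefault(name, {})
            status, body = '201 Created', {'created': name}
        elif method == 'GET' and name in DATABASES:
            status, body = '200 OK', {'keys': sorted(DATABASES[name])}
    elif len(parts) == 2 and parts[0] in DATABASES:
        name, key = parts
        if method == 'POST':
            length = int(environ.get('CONTENT_LENGTH') or 0)
            value = environ['wsgi.input'].read(length).decode('utf-8')
            DATABASES[name][key] = value
            status, body = '200 OK', {'stored': key}
        elif method == 'GET' and key in DATABASES[name]:
            status, body = '200 OK', {'value': DATABASES[name][key]}

    payload = json.dumps(body).encode('utf-8')
    start_response(status, [('Content-Type', 'application/json'),
                            ('Content-Length', str(len(payload)))])
    return [payload]


def call(method, path, data=b''):
    """Invoke the app directly and return (status, decoded JSON body)."""
    captured = {}
    environ = {'REQUEST_METHOD': method, 'PATH_INFO': path,
               'CONTENT_LENGTH': str(len(data)),
               'wsgi.input': io.BytesIO(data)}
    chunks = app(environ, lambda status, headers: captured.update(status=status))
    return captured['status'], json.loads(b''.join(chunks).decode('utf-8'))


print(call('POST', '/foo/'))
print(call('POST', '/foo/bar/', b'hello'))
print(call('GET', '/foo/bar/'))
print(call('GET', '/foo/'))
```

Even at this size, most of the function is path splitting and status bookkeeping, which is exactly the part a framework or dispatcher would take over.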

fapws3 is pretty neat, but had lots of annoying rough edges. I had to manually create a README file because its setup.py expects one to exist. Then I had to manually allow DELETE HTTP methods in fapws/base.py, otherwise it would return an HTML error message for me! That was a bit shocking since I was working under the assumption fapws3 is just a low-level HTTP/WSGI server.

Update

  1. If you’re new to TokyoCabinet, I posted the presentation on it that I gave at the Portland Python meetup.
  2. Someone want to benchmark this for me? Might be interesting since it’s made with the fastest libs available in Python for their respective tasks. I’m just feeling lazy at this point. :-)

I Love Python: ZipFile Edition

Wednesday, July 8th, 2009

For a client web project I needed to create a zip file containing a number of generated XML files. This isn’t something I need to do very often, so I briefly considered just writing the XML files to disk and running a zip command. Ugly, but surely trying to dig up a pleasant Python zip library would be more work?

Turns out Python has had a wonderful zip library in its standard library since 1.6! The zipfile module makes creating zip files a breeze:

import os
from zipfile import ZipFile, ZIP_DEFLATED
 
from django.template.defaultfilters import slugify
 
from somewhere_else import render_spam_xml, render_egg_xml
 
ZIP_PATH = "/some/system/path/for/zips"
 
def create_zip(spam):
    spam_slug = slugify(spam.name)
    filename = "%s.zip" % spam_slug
    abspath = os.path.join(ZIP_PATH, filename)
 
    # Create zip
    z = ZipFile(abspath, "w", ZIP_DEFLATED)
 
    # Write spam xml directly to zip
    z.writestr("%s.xml" % spam_slug, render_spam_xml(spam))
 
    # Write xml files to zip
    for egg in spam.egg_set.all():
        egg_slug = slugify(egg.name)
 
        # Renders the egg object to an xml string
        xml = render_egg_xml(egg)
 
        # Note how easy it is to specify paths in the zip file:
        z.writestr("eggs/%s.xml" % egg_slug, xml)
 
    # Zip file must be closed to be valid
    z.close()
    return abspath

(Sorry for the Django bits in there, but they should be easy to replace.)

My favorite part is that you can use either the ZipFile.write method to add files to the zip or the ZipFile.writestr method to write bytes (strings in my case) directly to the zip file.
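
Here is a small standard-library-only sketch showing the two methods side by side. The file names and XML snippets are made up for illustration, and the archive is built in memory so nothing is left behind on disk. I also wrapped the writes in try/finally so the archive is closed even if something fails partway through.

```python
import io
import os
import shutil
import tempfile
from zipfile import ZipFile, ZIP_DEFLATED

# A file on disk, to be added with ZipFile.write
tmp_dir = tempfile.mkdtemp()
disk_path = os.path.join(tmp_dir, 'spam.xml')
with open(disk_path, 'w') as f:
    f.write('<spam/>')

# Build the archive in memory; a filesystem path works just as well
buf = io.BytesIO()
z = ZipFile(buf, 'w', ZIP_DEFLATED)
try:
    # write(): copy an existing file, here under an explicit archive name
    z.write(disk_path, 'spam.xml')
    # writestr(): store a string directly under the given archive name
    z.writestr('eggs/egg-1.xml', '<egg id="1"/>')
finally:
    # Closing writes the central directory; without it the zip is invalid
    z.close()
    shutil.rmtree(tmp_dir)

names = ZipFile(io.BytesIO(buf.getvalue())).namelist()
print(names)  # ['spam.xml', 'eggs/egg-1.xml']
```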

At any rate, just wanted to blog about it, so when I need to do it again in a few years I don’t do something stupid like running the zip command.