NeverSawUs

Py A La Node.

Or, "Watch where you stick that Python!"

Have you heard about Node.js? It's pretty awesome. Evented IO for servers! All of the promise of Erlang, with a syntax that you probably already know. But, you know, it'd be really nice if I could take my various Django applications with me when I leave on this coffee-themed rapture. Bonus points if I can take all of the muscle memory I developed using Python libraries with me as a carry-on.

Wouldn't it be cool if this worked:

# my_module.py
def greet(greeting, *greeters):
    return greeting % ' and '.join(greeters)

-----------
// python.js
var python = require('python'),
    sys = require('sys'),
    py_sys = python.import('sys'),
    path = py_sys.path;

path.append(process.cwd());
var my_module = python.import('my_module');
sys.puts(my_module.greet("hello from %s", "python", "javascript"));
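For the record, here's the Python half run on its own, so we know what string the javascript side is supposed to print. This is just a sanity check, assuming greet is defined exactly as in my_module.py above:

```python
# The same greet() as in my_module.py, called directly from Python.
def greet(greeting, *greeters):
    # join the greeters with ' and ', then drop them into the format string
    return greeting % ' and '.join(greeters)

result = greet("hello from %s", "python", "javascript")
print(result)  # hello from python and javascript
```

If the node version ever prints anything other than that, the bug is in the binding and not in the module.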

With that goal in mind (and a whole lot of free time on my hands, since the weather has been so crappy in Lawrence lately), I set out on an adventure! Can I make Python work with Node.js? I've got a background in C++ (of course, this is C++ from almost 10 years ago, and it was all developed in MSVC, so...), and I've always wanted to poke around with the CPython internals — not to mention that I was really excited to take a look under the hood at V8.

Let's get started by building a really simple plugin for node.js in C++ using the V8 engine. You know what was really handy when figuring this stuff out? Links to documentation.

  1. The Doxygen docs were really helpful, believe it or not
  2. The Google V8 Embedder's guide is also pretty cool
  3. Grab the Node.js source code
  4. ry's node_postgres is also very instructive.

This is pretty much the extent of the documentation I was able to find. Keep in mind I'm building on OSX 10.6, and the following steps may contain some things specific to OSX building.


Aim low, sweet chariot

Comin for to java me hooome

First things first. We need to make sure we can actually compile a C++ plugin for node, and call it from javascript. It's not too hard. Node's build process uses waf, for which they provide a node-waf command. Put the following into a file called wscript in your working directory:

# wscript
import Options
from os import unlink, symlink
from os.path import exists

srcdir = '.'
blddir = 'build'
VERSION = '0.0.1'

def set_options(opt):
    opt.tool_options('compiler_cxx')
    opt.tool_options('python')

def configure(conf):
    conf.check_tool('compiler_cxx')
    conf.check_tool('node_addon')
    conf.check_tool('osx')                # see what I mean about OSX specific?
    conf.check_tool('python')

def build(bld):
    obj = bld.new_task_gen('cxx', 'shlib', 'node_addon', 'py', 'pyembed', 'pyext')
    obj.env['FRAMEWORK'] = 'python'
    obj.target = 'binding'
    obj.source = "binding.cc"
    obj.init_py()
    obj.init_pyembed()

def shutdown():
    if Options.commands['clean']:
        if exists('binding.node'):
            unlink('binding.node')
    else:
        if exists('build/default/binding.node') and not exists('binding.node'):
            symlink('build/default/binding.node', 'binding.node')

To enable python support, we're setting tool_options, check_tool, adding the python framework to obj.env, and adding pyembed and pyext to the new_task_gen (then we go off and call init_py and init_pyembed.) This is pretty much verbatim from ry's node_postgres repo except for those lines, and minus the lines referencing postgres specifically. (thanks ry!)

'Kay. So node-waf will freak out if you run it now, since there is no such file as binding.cc. Let's make a super simple one:

// binding.cc
#include <v8.h>

using namespace v8;

// all node plugins must export
// an "init" function
extern "C" void
init (Handle<Object> target) {
    HandleScope scope;
    Local<String> output = String::New("hello javascript!");
    target->Set(String::New("greeting"), output);
}

Run node-waf configure build, and you should see that it generated a binding.node. Now, let's test it out:

// test.js
var sys = require('sys'),
    binding = require('./binding');

sys.puts(binding.greeting);     // should output "hello javascript!"

Run node test.js and you should see the expected output. So what did we do? HandleScope scope initializes a V8 scope. All objects are referred to by Handle wrappers, and local variables in a scope are referred to with a subclass of that wrapper, Local.

target is the equivalent of exports in node.js. We're creating a new string object and assigning it to the index greeting within that exports. Super simple? I already kind of like this. Let's mix it up, just slightly:

// binding.cc
#include <v8.h>
#include <string>

using namespace v8;
using std::string;


// functions return handles.
Handle<Value>
SayHello(const Arguments& args) {
    HandleScope scope;
    if(args.Length() != 1 || !args[0]->IsString()) {
        return ThrowException(
            Exception::Error(String::New("You gotta call this with a string, dog."))
        );
    }

    // convert the value of the first argument to a UTF8String.
    // keep the Utf8Value in a named variable: the char* you get from `*`
    // points into it, and dies along with it.
    String::Utf8Value utf8value(args[0]->ToString());

    // using std::string, put together a nice greeting
    string greeting("Hello "),
        to_who(*utf8value),
        result = greeting + to_who;

    // close the scope around our result (given that our result is local, we need
    // to tell the scope we're returning it.)
    // technically you don't have to give the length of the string, but I feel
    // safer that way.
    return scope.Close(String::New(result.c_str(), result.length()));
}

extern "C" void
init (Handle<Object> target) {
    HandleScope scope;
    // create a new wrapped FunctionTemplate
    Local<FunctionTemplate> say_hello = FunctionTemplate::New(SayHello);

    // and grab the actual function out of it, assign it to 'greeting'
    target->Set(String::New("greeting"), say_hello->GetFunction());
}

Now you should be able to do things like binding.greeting("butts"); and get the best sophomoric response from your C++ plugin. Joy and joy unrelenting! V8 makes it really, really easy to create C++ plugins.


And now for something completely different.

Py_ in yr eye

So we've successfully called into C++ from Javascript at this point. Not a small victory! We should probably talk about Python a little now. Their API is written in C — which is, IMHO, a much saner language than C++. However, the code you end up writing to make sure all is well in CPython world ends up being so very much more verbose than the V8 code you've seen above.

#include <python2.6/Python.h>

Handle<Value>
ImportAModule(const Arguments& args) {
    HandleScope scope;
    if(args.Length() < 1 || !args[0]->IsString()) {
        return ThrowException(
            Exception::Error(String::New("I don't know how to import that."))
        );
    }
    Py_Initialize();
    PyObject* module_name = PyString_FromString(*String::Utf8Value(args[0]->ToString()));
    PyObject* module = PyImport_Import(module_name);
    PyObject* module_as_string = PyObject_Str(module);
    char* cstr = PyString_AsString(module_as_string);
    Local<String> jsstr = String::New(cstr);

    Py_XDECREF(module_as_string);
    Py_XDECREF(module);
    Py_XDECREF(module_name);
    Py_Finalize();
    return scope.Close(jsstr);
}

And add it to the list of things being exported in your init function. So you basically have to be as explicit as humanly possible with CPython (not necessarily a bad thing, at least there's no crazy magic going on.) This just calls into python, imports the module, and returns the result of str(module) from inside python. Py_Initialize starts up the interpreter, Py_Finalize shuts it down, while Py_XDECREF decrements the reference count of the python object (when there are no more references, the object is freed). We're one step down the path, now.
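If the C version is hard to see through all the bookkeeping, here's roughly what it boils down to in plain Python. This is only an analogue, not the binding itself: import_a_module is a name I made up, and it assumes a Python that ships importlib.

```python
import importlib

def import_a_module(name):
    """Plain-Python analogue of ImportAModule: import by name, return str(module)."""
    module = importlib.import_module(name)  # stands in for PyImport_Import
    return str(module)                      # stands in for PyObject_Str

print(import_a_module("sys"))  # e.g. <module 'sys' (built-in)>
```

Everything else in the C version (Py_Initialize, Py_Finalize, the Py_XDECREF calls) is housekeeping that the interpreter does for you when you're already inside Python.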


back to javascript.

because now we need to know how to make objects.

So what we've got is helpful — we've peeked into Python, said "hi", and left just as quickly. For the moment, that's all we're going to do with python. We need to go back into Javascript-land, and figure out how to make an object that can wrap our adorable little PyObject*'s. We'll probably want to provide the typical javascript accessors valueOf and toString, not to mention overriding what happens when we call the objects as a function. Property access should be controlled so we can attempt to load up PyObject* children of the current PyObject*. Wow! That's a mouthful.
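To make that mouthful a bit more concrete, here's the same shape sketched in plain Python: a proxy that forwards attribute lookups, string conversion, and calls to whatever it wraps. It's only a model of the behaviour we want from the javascript object. Proxy and value_of are names I invented for the sketch, not anything from V8 or node.

```python
import math

class Proxy(object):
    """Hypothetical stand-in for the javascript object wrapping a PyObject*."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Property access: look the name up on the wrapped object and
        # wrap whatever comes back, so lookups can be chained.
        return Proxy(getattr(self._wrapped, name))

    def __str__(self):
        # toString: delegate to the wrapped object's own str().
        return str(self._wrapped)

    def __call__(self, *args):
        # Calling the proxy calls the wrapped object and wraps the result.
        return Proxy(self._wrapped(*args))

    def value_of(self):
        # valueOf: hand back the underlying value itself.
        return self._wrapped

py_math = Proxy(math)
print(py_math.sqrt(16.0).value_of())  # 4.0
```

The C++ below does the same three jobs, just with V8 templates doing the forwarding instead of Python's special methods.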

// assuming that we have python_function_template_
//         static Persistent<FunctionTemplate> python_function_template_;

static void
Initialize(Handle<Object> target) {
    HandleScope scope;
    Local<FunctionTemplate> fn_tpl = FunctionTemplate::New();
    Local<ObjectTemplate> obj_tpl = fn_tpl->InstanceTemplate();

    obj_tpl->SetInternalFieldCount(1);

    // this has first priority. see if the properties already exist on the python object
    obj_tpl->SetNamedPropertyHandler(Get, Set);

    // If we're calling `toString`, delegate to our version of ToString
    obj_tpl->SetAccessor(String::NewSymbol("toString"), ToStringAccessor);

    // Python objects can be called as functions.
    obj_tpl->SetCallAsFunctionHandler(Call, Handle<Value>());

    python_function_template_ = Persistent<FunctionTemplate>::New(fn_tpl);
    // let's also export "import"
    Local<FunctionTemplate> import = FunctionTemplate::New(Import);
    target->Set(String::New("import"), import->GetFunction());
};

That leaves us to define Import, ToStringAccessor, Call, Get, and Set. I'll be referring to snippets from the node-python repository from this point forward, as we're about to start getting a little heady, file-size wise. Importantly, we've introduced a class: PyObjectWrapper, which inherits from ObjectWrap — a utility class that Node.js provides to deal with garbage collection of C++ classes.

Let's take a look at the accessor functions first.

static Handle<Value>
ToStringAccessor(Local<String> property, const AccessorInfo& info) {
    HandleScope scope;
    Local<FunctionTemplate> func = FunctionTemplate::New(ToString);
    return scope.Close(func->GetFunction());
};

Accessors are pretty simple. In the case that the accessor should be called as a function (like toString should), we just create a FunctionTemplate around the function we want to call, and return the function it produces. You can access the current object by calling info.Holder(), and if you need the C++ PyObjectWrapper object, call PyObjectWrapper* pyobjwrapper = ObjectWrap::Unwrap<PyObjectWrapper>(info.Holder());. Easy peasy!

static Handle<Value>
ToString(const Arguments& args) {
    HandleScope scope;
    Local<Object> this_object = args.This();
    PyObjectWrapper* pyobjwrap = ObjectWrap::Unwrap<PyObjectWrapper>(args.This());
    Local<String> result = String::New(pyobjwrap->InstanceToString().c_str());          // <-- this is the exciting line
    return scope.Close(result);
}

We're just delegating to the actual object! How nice. And now — look at InstanceToString():

string InstanceToString() {
    PyObject* as_string = PyObject_Str(mPyObject);
    string native_string(PyString_AsString(as_string));
    Py_XDECREF(as_string);
    return native_string;
}

PWHEW. We're done with our call to toString(). valueOf works in a very similar fashion, though it delves into the code ghetto that is ValueOf, where we have to decide what kind of object to cast our internal PyObject* to.

Now — the NamedPropertyHandlers, Get and Set.

    static Handle<Value>
    Get(Local<String> key, const AccessorInfo& info) {
        // returning an empty Handle<Value> object signals V8 that we didn't
        // find the property here, and we should check the "NamedAccessor" functions
        HandleScope scope;
        PyObjectWrapper* wrapper = ObjectWrap::Unwrap<PyObjectWrapper>(info.Holder());
        String::Utf8Value utf8_key(key);
        string value(*utf8_key);
        PyObject* result = wrapper->InstanceGet(value);         // call down into `InstanceGet`.
        if(result) {
            RETURN_NEW_PYOBJ(scope, result);        // <-- a macro to create a PyObjectWrapper instance, wrap it around a jsobj
                                                    // and return it.
        }
        return Handle<Value>();
    }

    PyObject* InstanceGet(const string& key) {
        if(PyObject_HasAttrString(mPyObject, key.c_str())) {
            PyObject* attribute = PyObject_GetAttrString(mPyObject, key.c_str());
            return attribute;
        }
        return (PyObject*)NULL;
    }

Things to note: return Handle<Value>(); in Get signals to V8 that we haven't found any corresponding property for the key we were passed, and that it should continue on to the Accessor elements to figure out if we can avoid returning undefined. Otherwise, we're just asking Python if our object has that attribute: if it does, InstanceGet returns it unwrapped, and Get wraps it up for javascript. Just to be comprehensive, I present the RETURN_NEW_PYOBJ macro:

#define RETURN_NEW_PYOBJ(scope,pyobject) \
        Local<Object> jsobject = python_function_template_->GetFunction()->NewInstance();   \
        PyObjectWrapper* py_object_wrapper = new PyObjectWrapper(pyobject);                 \
        py_object_wrapper->Wrap(jsobject);                                                  \
        return scope.Close(jsobject);

We create a local instance from python_function_template_, which carries along our property accessors, etc, and then stash a new PyObjectWrapper inside it by calling Wrap. Internally, V8 Javascript objects are able to carry around an "InternalField", which is just a void* pointer to whatever C++ object you wish to piggyback on that javascript object.

That's pretty much the C++ side of things. The new Import function calls RETURN_NEW_PYOBJ on the module we load up.
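To recap the lookup order in one place, here's a small plain-Python model of it: the named property handler gets first crack, an "empty" result means keep looking, and the accessors (like toString) come next. This is my own sketch of the flow described above, not V8's actual algorithm. NOT_FOUND, named_property_get, and lookup are all made-up names.

```python
import math

NOT_FOUND = object()  # plays the role of the empty Handle<Value>()

def named_property_get(target, key):
    """Rough analogue of Get/InstanceGet: ask the wrapped object first."""
    if hasattr(target, key):
        return getattr(target, key)
    return NOT_FOUND

def lookup(target, key, accessors):
    """Handler first, then accessors, then give up."""
    value = named_property_get(target, key)
    if value is not NOT_FOUND:
        return value
    if key in accessors:
        return accessors[key](target)
    return None  # stands in for javascript's undefined

# Like ToStringAccessor, the accessor returns a function rather than a string.
accessors = {"toString": lambda target: (lambda: str(target))}

print(lookup(math, "pi", accessors))    # found on the wrapped object itself
print(lookup(math, "nope", accessors))  # None
```

A separate sentinel is used instead of None for "not found" because a Python attribute can legitimately hold None, much as NULL and Py_None are different things on the C side.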


ObjectWrapping up

yeah, that pun was lame

Hopefully that wasn't too scattershot to follow. At this point, if you compiled it, you could import python modules, append to sys.path, and load up custom modules. Included in my node-python repo is a really really simple, somewhat broken wsgi.js file that calls into WSGI from node.js.

My experience with this binding is as follows: it is analogous to a mod_python, except for node.js, and it probably has the exact same things against it. Embedding a python process doesn't let you predict the memory usage, and it's made even worse by the fact that V8 garbage collects only at certain points. I would certainly avoid using it in a production setting at the moment. I'm currently leaning towards connecting to a uWSGI socket through Node.js, and I have a project that follows that format on github. At the very least, hopefully this opens up the possibility of writing C++ plugins for node to more developers.