Py A La Node.
Or, "Watch where you stick that Python!"
Have you heard about Node.js? It's pretty awesome. Evented IO for servers! All of the promise of Erlang, with a syntax that you probably already know. But, you know, it'd be really nice if I could take my various Django applications with me when I leave on this coffee-themed rapture. Bonus points if I can take all of the muscle memory I developed using Python libraries with me as a carry-on.
Wouldn't it be cool if this worked:
# my_module.py
def greet(greeting, *greeters):
    return greeting % ' and '.join(greeters)
-----------
// python.js
var python = require('python'),
    sys = require('sys'),
    py_sys = python.import('sys'),
    path = py_sys.path;
path.append(process.cwd());
var my_module = python.import('my_module');
sys.puts(my_module.greet("hello from %s", "python", "javascript"));
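If the bridge works, the JavaScript above should print exactly what calling greet directly in Python prints. A quick Python-side check (greet is copied from my_module.py, and the sys.path line is the Python counterpart of the path.append(process.cwd()) call in python.js):

```python
import os
import sys

# Python counterpart of `path.append(process.cwd())` in python.js: lets
# `import my_module` find modules sitting in the current working directory.
sys.path.append(os.getcwd())


def greet(greeting, *greeters):
    # Same body as my_module.greet: join the names, then %-format them in.
    return greeting % ' and '.join(greeters)


print(greet("hello from %s", "python", "javascript"))
# prints: hello from python and javascript
```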
With that goal in mind (and a whole lot of free time on my hands, since the weather has been so crappy in Lawrence lately), I set out on an adventure! Can I make Python work with Node.js? I've got a background in C++ (of course, this is C++ from almost 10 years ago, and it was all developed in MSVC, so...), and I've always wanted to poke around with the CPython internals — not to mention that I was really excited to take a look under the hood at V8.
Let's get started by building a really simple plugin for node.js in C++ using the V8 engine. You know what was really handy when figuring this stuff out? Links to documentation.
- The Doxygen docs were really helpful, believe it or not
- The Google V8 Embedder's guide is also pretty cool
- Grab the Node.js source code
- ry's node_postgres is also very instructive.
This is pretty much the extent of the documentation I was able to find. Keep in mind I'm building on OS X 10.6, and some of the following steps are specific to building on OS X.
Aim low, sweet chariot
Comin for to java me hooome
First things first. We need to make sure we can actually compile a C++ plugin for Node and call it from JavaScript. It's not too hard. Node's build process uses waf, for which they provide a node-waf command. Put the following into a file called wscript in your working directory:
# wscript
import Options
from os import unlink, symlink, popen
from os.path import exists
srcdir = '.'
blddir = 'build'
VERSION = '0.0.1'
def set_options(opt):
    opt.tool_options('compiler_cxx')
    opt.tool_options('python')
def configure(conf):
    conf.check_tool('compiler_cxx')
    conf.check_tool('node_addon')
    conf.check_tool('osx') # see what I mean about OSX specific?
    conf.check_tool('python')
def build(bld):
    obj = bld.new_task_gen('cxx', 'shlib', 'node_addon', 'py', 'pyembed', 'pyext')
    obj.env['FRAMEWORK'] = 'python'
    obj.target = 'binding'
    obj.source = "binding.cc"
    obj.init_py()
    obj.init_pyembed()
def shutdown():
    if Options.commands['clean']:
        if exists('binding.node'):
            unlink('binding.node')
    else:
        if exists('build/default/binding.node') and not exists('binding.node'):
            symlink('build/default/binding.node', 'binding.node')
To enable Python support, we're setting tool_options and check_tool, adding the python framework to obj.env, and adding py, pyembed and pyext to the new_task_gen call (then we go off and call init_py and init_pyembed). This is pretty much verbatim from ry's node_postgres repo except for those lines, and minus the lines referencing Postgres specifically. (Thanks, ry!)
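waf's python tool is what's supposed to locate Python's headers and library for the pyembed/pyext features. If the build can't find them, it helps to ask the interpreter directly. A small sketch using the standard-library sysconfig module (available from Python 2.7/3.2 on; on 2.6 the equivalent lives in distutils.sysconfig, and the python-config script reports similar information). The python_build_info name is hypothetical, just for illustration:

```python
import sysconfig


def python_build_info():
    """Report where this interpreter keeps the pieces an embedding build needs.

    Hypothetical helper for illustration; it only reads sysconfig values.
    """
    return {
        # Directory containing Python.h
        "include": sysconfig.get_paths()["include"],
        # Directory and file name of the Python library to link against
        # (either may be None on some platforms, e.g. Windows).
        "libdir": sysconfig.get_config_var("LIBDIR"),
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
        # "major.minor", e.g. the "2.6" in python2.6/Python.h
        "version": sysconfig.get_python_version(),
    }


if __name__ == "__main__":
    for key, value in python_build_info().items():
        print(f"{key}: {value}")
```

If the include directory printed here differs from the one the build is searching, that mismatch is usually the thing to fix.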
'Kay. So node-waf will freak out if you run it now, since there is no such file as binding.cc. Let's make a super simple one:
// binding.cc
#include <v8.h>
using namespace v8;
// all node plugins must export
// an "init" function
extern "C" void
init (Handle<Object> target) {
    HandleScope scope;
    Local<String> output = String::New("hello javascript!");
    target->Set(String::New("greeting"), output);
}
Run node-waf configure build, and you should see that it generated a binding.node. Now, let's test it out:
// test.js
var sys = require('sys'),
    binding = require('./binding');
sys.puts(binding.greeting); // should output "hello javascript!"
Run node test.js and you should see the expected output. So what did we do? HandleScope scope sets up a V8 handle scope: every Local handle created while it's alive belongs to it, and those handles are released when the scope is destroyed at the end of the function. All JavaScript values are referred to through Handle wrappers, and handles that only live inside a scope are of the subclass Local. target is the equivalent of exports in Node.js: we're creating a new string and assigning it to the greeting property of that exports object. Super simple? I already kind of like this. Let's mix it up, just slightly:
// binding.cc
#include <v8.h>
#include <string>
using namespace v8;
using std::string;
// functions return handles.
Handle<Value>
SayHello(const Arguments& args) {
    HandleScope scope;
    if(args.Length() != 1 || !args[0]->IsString()) {
        return ThrowException(
            Exception::Error(String::New("You gotta call this with a string, dog."))
        );
    }
    // convert the value of the first argument to UTF-8.
    // keep the Utf8Value in a named variable: it owns the buffer, so a char*
    // taken from a temporary Utf8Value would dangle once the statement ends.
    String::Utf8Value utf8value(args[0]->ToString());
    // using std::string, put together a nice greeting.
    // the `*` dereference is important! it hands us the underlying char*.
    string greeting("Hello "),
           to_who(*utf8value, utf8value.length()),
           result = greeting + to_who;
    // close the scope around our result (given that our result is local, we need
    // to tell the scope we're returning it.)
    // technically you don't have to give the length of the string, but I feel
    // safer that way.
    return scope.Close(String::New(result.c_str(), result.length()));
}
extern "C" void
init (Handle<Object> target) {
    HandleScope scope;
    // create a new wrapped FunctionTemplate
    Local<FunctionTemplate> say_hello = FunctionTemplate::New(SayHello);
    // and grab the actual function out of it, assign it to 'greeting'
    target->Set(String::New("greeting"), say_hello->GetFunction());
}
Now you should be able to do things like binding.greeting("butts"); and get the best sophomoric response from your C++ plugin. Joy and joy unrelenting! V8 makes it really, really easy to create C++ plugins.
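To pin down the contract SayHello implements (exactly one string argument, otherwise an error), here is the same behavior written as plain Python. It's a behavioral model for checking expectations, not part of the binding, and say_hello is a hypothetical name:

```python
def say_hello(*args):
    """Plain-Python model of the SayHello binding's contract."""
    # Mirrors `args.Length() != 1 || !args[0]->IsString()` in the C++.
    if len(args) != 1 or not isinstance(args[0], str):
        # The C++ version throws a JavaScript Error with this message.
        raise TypeError("You gotta call this with a string, dog.")
    return "Hello " + args[0]


print(say_hello("butts"))  # prints: Hello butts
```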
And now for something completely different.
Py_ in yr eye
So we've successfully called into C++ from JavaScript at this point. Not a small victory! We should probably talk about Python a little now. Python's C API is written in C, which is, IMHO, a much saner language than C++. However, the code you need to write to keep everything well-behaved in CPython world is so very much more verbose than the V8 code you've seen above.
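// Python's docs ask that Python.h be included before any standard headers,
// so this belongs at the very top of binding.cc
#include <python2.6/Python.h>
Handle<Value>
ImportAModule(const Arguments& args) {
    HandleScope scope;
    if(args.Length() < 1 || !args[0]->IsString()) {
        return ThrowException(
            Exception::Error(String::New("I don't know how to import that."))
        );
    }
    Py_Initialize();
    PyObject* module_name = PyString_FromString(*String::Utf8Value(args[0]->ToString()));
    PyObject* module = PyImport_Import(module_name);
    Py_XDECREF(module_name);
    if(module == NULL) {
        // the import failed: clear Python's pending exception and report it to JS
        PyErr_Clear();
        Py_Finalize();
        return ThrowException(
            Exception::Error(String::New("Python couldn't import that module."))
        );
    }
    PyObject* module_as_string = PyObject_Str(module);
    // copy the C string into a JS string *before* releasing the Python
    // object that owns its buffer
    Local<String> jsstr = String::New(PyString_AsString(module_as_string));
    Py_XDECREF(module_as_string);
    Py_XDECREF(module);
    Py_Finalize();
    return scope.Close(jsstr);
}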
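Export it from your init function the same way we exported SayHello. You basically have to be as explicit as humanly possible with CPython (not necessarily a bad thing; at least there's no crazy magic going on). This just calls into Python, imports the module, and returns the result of str(module) from inside Python. Py_Initialize starts up the interpreter and Py_Finalize shuts it down, while Py_XDECREF decrements the reference count of a Python object (unlike Py_DECREF, it's safe to call on NULL); when an object has no more references, it is freed. Starting and stopping the interpreter on every call keeps this example self-contained, but it's slow, and Python's documentation warns that some extension modules don't work properly when initialized more than once, so a real binding should call Py_Initialize once and leave the interpreter running. We're one step down the path now.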
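For comparison, here is what those C calls amount to at the Python level, plus a quick look at the reference counting that Py_XDECREF manages. A sketch: importlib.import_module stands in for PyImport_Import, str() for PyObject_Str, and sys.getrefcount just reports CPython's count (its absolute value includes temporary references, so only the differences are meaningful). The two function names are hypothetical:

```python
import importlib
import sys


def import_and_describe(module_name):
    """Python-level equivalent of ImportAModule: import, then str(module)."""
    module = importlib.import_module(module_name)  # ~ PyImport_Import
    return str(module)  # ~ PyObject_Str + PyString_AsString


def refcount_deltas():
    """Show a reference count going up and back down (CPython only)."""
    obj = object()
    before = sys.getrefcount(obj)
    holders = [obj, obj]  # the list takes two new references (~ Py_INCREF twice)
    during = sys.getrefcount(obj)
    del holders  # freeing the list drops them again (~ Py_DECREF twice)
    after = sys.getrefcount(obj)
    return during - before, after - before


print(import_and_describe("json"))  # e.g. <module 'json' from '.../json/__init__.py'>
print(refcount_deltas())  # (2, 0) on CPython
```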
back to javascript.
because now we need to know how to make objects.
So what we've got is helpful: we've peeked into Python, said "hi", and left just as quickly. For the moment, that's all we're going to do with Python. We need to go back into JavaScript-land and figure out how to make an object that can wrap our adorable little PyObject*s. We'll probably want to provide the typical JavaScript methods valueOf and toString, not to mention overriding what happens when we call the objects as functions. Property access should be intercepted so we can look up attributes of the wrapped PyObject* and hand them back as new wrapped objects. Wow! That's a mouthful.
// assuming that we have python_function_template_
// static Persistent<FunctionTemplate> python_function_template_;
static void
Initialize(Handle<Object> target) {
    HandleScope scope;
    Local<FunctionTemplate> fn_tpl = FunctionTemplate::New();
    Local<ObjectTemplate> obj_tpl = fn_tpl->InstanceTemplate();
    // reserve one internal field: ObjectWrap stores the C++ wrapper pointer there
    obj_tpl->SetInternalFieldCount(1);
    // this has first priority. see if the properties already exist on the python object
    obj_tpl->SetNamedPropertyHandler(Get, Set);
    // If we're calling `toString`, delegate to our version of ToString
    obj_tpl->SetAccessor(String::NewSymbol("toString"), ToStringAccessor);
    // Python objects can be called as functions.
    obj_tpl->SetCallAsFunctionHandler(Call, Handle<Value>());
    python_function_template_ = Persistent<FunctionTemplate>::New(fn_tpl);
    // let's also export "import"
    Local<FunctionTemplate> import = FunctionTemplate::New(Import);
    target->Set(String::New("import"), import->GetFunction());
}
That leaves us to define Import, ToStringAccessor, Call, Get, and Set. I'll be referring to snippets from the node-python repository from this point forward, as we're about to start getting a little heady, file-size wise. Importantly, we've introduced a class, PyObjectWrapper, which inherits from ObjectWrap: a utility class that Node.js provides to tie a C++ object's lifetime to the JavaScript object wrapping it, so the C++ object is deleted when V8 garbage-collects its JavaScript counterpart.
Let's take a look at the accessor functions first.
static Handle<Value>
ToStringAccessor(Local<String> property, const AccessorInfo& info) {
    HandleScope scope;
    Local<FunctionTemplate> func = FunctionTemplate::New(ToString);
    return scope.Close(func->GetFunction());
}
Accessors are pretty simple. When the property is supposed to be a function (as toString is), we just create a FunctionTemplate for the C++ function we want to call and return the function it produces. You can access the current object by calling info.Holder(), and if you need the C++ PyObjectWrapper object, call:
PyObjectWrapper* pyobjwrapper = ObjectWrap::Unwrap<PyObjectWrapper>(info.Holder());
Easy peasy!
static Handle<Value>
ToString(const Arguments& args) {
    HandleScope scope;
    PyObjectWrapper* pyobjwrap = ObjectWrap::Unwrap<PyObjectWrapper>(args.This());
    Local<String> result = String::New(pyobjwrap->InstanceToString().c_str()); // <-- this is the exciting line
    return scope.Close(result);
}
We're just delegating to the actual object! How nice. And now, look at InstanceToString():
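string InstanceToString() {
    PyObject* as_string = PyObject_Str(mPyObject);
    if(as_string == NULL) {
        // str() raised inside Python; clear the error instead of crashing on NULL
        PyErr_Clear();
        return string();
    }
    string native_string(PyString_AsString(as_string));
    Py_XDECREF(as_string);
    return native_string;
}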
PWHEW. We're done with our call to toString(). valueOf works in a very similar fashion, though it delves into the code ghetto that is ValueOf, where we have to decide which JavaScript type to convert our internal PyObject* into.
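The decision ValueOf has to make is essentially a type dispatch. Here is a hypothetical version of that mapping written in Python, just to show its shape; the actual mapping in node-python may differ. Note that bool has to be checked before int, since bool is a subclass of int in Python, and that JavaScript numbers are doubles, so very large Python ints can't be represented as numbers:

```python
def value_of(obj):
    """Hypothetical ValueOf-style dispatch: (JS type name, JS-friendly value)."""
    if obj is None:
        return ("null", None)
    if isinstance(obj, bool):  # must come before int: bool subclasses int
        return ("boolean", obj)
    if isinstance(obj, (int, float)):
        try:
            return ("number", float(obj))  # JS numbers are IEEE-754 doubles
        except OverflowError:  # int too large for a double
            return ("string", str(obj))
    if isinstance(obj, str):
        return ("string", obj)
    return ("string", str(obj))  # anything else: fall back to toString


print(value_of(True), value_of(3), value_of([1, 2]))
# prints: ('boolean', True) ('number', 3.0) ('string', '[1, 2]')
```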
Now for the NamedPropertyHandlers, Get and Set.
static Handle<Value>
Get(Local<String> key, const AccessorInfo& info) {
    // returning an empty Handle<Value> object signals V8 that we didn't
    // find the property here, and we should check the "NamedAccessor" functions
    HandleScope scope;
    PyObjectWrapper* wrapper = ObjectWrap::Unwrap<PyObjectWrapper>(info.Holder());
    String::Utf8Value utf8_key(key);
    string value(*utf8_key);
    PyObject* result = wrapper->InstanceGet(value); // call down into `InstanceGet`.
    if(result) {
        RETURN_NEW_PYOBJ(scope, result); // <-- a macro to create a PyObjectWrapper instance, wrap it around a jsobj
                                         // and return it.
    }
    return Handle<Value>();
}
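PyObject* InstanceGet(const string& key) {
    if(PyObject_HasAttrString(mPyObject, key.c_str())) {
        // a new reference: whoever keeps it must eventually Py_DECREF it
        PyObject* attribute = PyObject_GetAttrString(mPyObject, key.c_str());
        if(attribute == NULL) {
            PyErr_Clear(); // the attribute's getter raised; treat it as "not found"
        }
        return attribute;
    }
    return (PyObject*)NULL;
}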
Things to note: return Handle<Value>(); in Get signals to V8 that we haven't found a property for the key we were passed, and that it should go on to check the accessors (like our toString accessor) before giving up and returning undefined. Otherwise, we're just asking Python whether our object has that attribute and, if it does, returning it wrapped in a new JavaScript object. Just to be comprehensive, here's the RETURN_NEW_PYOBJ macro:
#define RETURN_NEW_PYOBJ(scope,pyobject) \
Local<Object> jsobject = python_function_template_->GetFunction()->NewInstance(); \
PyObjectWrapper* py_object_wrapper = new PyObjectWrapper(pyobject); \
py_object_wrapper->Wrap(jsobject); \
return scope.Close(jsobject);
We create a new JavaScript object from python_function_template_, so it carries along our property handlers, call handler, etc., then create a PyObjectWrapper around the PyObject* and have it Wrap() that JavaScript object. Internally, V8 JavaScript objects can carry around "internal fields" (that's what SetInternalFieldCount(1) reserved room for), and ObjectWrap uses one to store a pointer to whatever C++ object you wish to piggyback on that JavaScript object.
That's pretty much the C++ side of things. The new Import function calls RETURN_NEW_PYOBJ on the module we load up.
ObjectWrapping up
yeah, that pun was lame
Hopefully that wasn't too scattershot to follow. At this point, if you compiled it, you could import Python modules, append to sys.path, and load up custom modules. Included in my node-python repo is a really, really simple, somewhat broken wsgi.js file that calls into WSGI from Node.js.
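Whatever calls into WSGI has to follow the same calling convention (PEP 333, updated as PEP 3333 for Python 3): build an environ dict, pass a start_response callable, then iterate over the returned body and call close() on it if it has one. A minimal host-side sketch of that convention in Python, using the standard library's wsgiref.util.setup_testing_defaults to fill in the required environ keys; call_wsgi and hello_app are hypothetical names, not code from wsgi.js:

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A minimal WSGI application."""
    body = ("hello from " + environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_wsgi(app, path="/"):
    """Invoke a WSGI app the way a server (or a JS bridge) would."""
    environ = {"SCRIPT_NAME": "", "PATH_INFO": path}
    setup_testing_defaults(environ)  # adds REQUEST_METHOD, wsgi.input, etc.
    response = {}

    def start_response(status, headers, exc_info=None):
        response["status"] = status
        response["headers"] = headers
        return lambda data: None  # the legacy write() callable, unused here

    result = app(environ, start_response)
    try:
        body = b"".join(result)  # a generator app calls start_response during this
    finally:
        if hasattr(result, "close"):
            result.close()
    return response["status"], response["headers"], body


print(call_wsgi(hello_app, "/node"))
```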
My experience with this binding is as follows: it is analogous to mod_python, except for Node.js, and it probably has the exact same things against it. Embedding a Python interpreter doesn't let you predict memory usage, and it's made even worse by the fact that V8 garbage collects only at certain points, so Python objects held by JavaScript wrappers stay alive until V8 gets around to collecting the wrappers. I would certainly avoid using it in a production setting at the moment. I'm currently leaning towards connecting to a uWSGI socket through Node.js, and I have a project that follows that format on GitHub. At the very least, hopefully this opens up the possibility of writing C++ plugins for Node to more developers.