<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Paul Joseph Davis</title>
    <subtitle>Semicoherent Writings</subtitle>
    <link rel="self" href="http://www.davispj.com/feeds/all.xml" />
    <link href="http://www.davispj.com/" />
    <updated>2010-02-19T23:33:32-05:00</updated>
    <author>
        <name>Paul J. Davis</name>
        <email>paul.joseph.davis@gmail.com</email>
    </author>
    <id>http://www.davispj.com/</id>

    
    <entry>
        <id>http://www.davispj.com/2010/02/19/agpl-not-awesome-gpl.html</id>
        <title>AGPL != Awesome GPL.</title>
        <link href="http://www.davispj.com/2010/02/19/agpl-not-awesome-gpl.html" />
        <updated>2010-02-19T00:00:00-05:00</updated>
        <content type="html">&lt;h1&gt;AGPL != Awesome GPL.&lt;/h1&gt;

&lt;p&gt;So I was about to write a really funny blog post. It would've been really super amazingly funny. It was going to go something like such:&lt;/p&gt;

&lt;p&gt;A verbatim copy of the AGPL license. And I would rename it to something like the CGPL - Contemplorary GNU Public License. I was going to replace all the mentions of Affero with Contemplorary (yes, I made it up) and then rewrite clause 13 that describes network interaction. My new clause would've read something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;13. Willful Contemplation of Enhancement; Use with the GNU General Public License.

Any entity who contemplates or can be shown to have likely entertained
the notion of contemplating the use or enhancement of The Program for
any purpose shall be required to convey a copy of the covered work plus
any contemplation of enhancement to anyone willing to receive said copy.

As contemplation is fleeting it is required that all persons having
obtained a copy of The Program are required to record all thoughts in
triplicate such that these contemplated enhancements may be conveyed to
all interested parties. In order to ensure that interested parties are
able to retain the right to contemplating said contemplations, the
aforementioned notes must be made publicly accessible within five
business days under applicable local laws.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Yes. I would actually have found that amusing. And I would've gone back and relicensed a project with it just to see what happens. Nothing would've happened. But I would've still enjoyed the thought of someone scratching their head over the whole thing.&lt;/p&gt;

&lt;h2&gt;As it Turns Out&lt;/h2&gt;

&lt;p&gt;I started reading the AGPL and for the first time I really read the copyright statement:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt; Copyright (C) 2007 Free Software Foundation, Inc. &amp;lt;http://fsf.org/&amp;gt;
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You're shitting me. The license itself cannot be changed? Now IANAL, but does that really mean I can't make an amusing (to me) alteration to the license? Assuming derivative work as defined in the license itself, I'm gonna have to guess that yes, making fun of the AGPL by writing a derived license would be a violation.&lt;/p&gt;

&lt;p&gt;So assuming that theory is correct, one of the most popular open source licenses doesn't allow derivative works of the license itself?&lt;/p&gt;

&lt;h2&gt;Caveats, Always Caveats&lt;/h2&gt;

&lt;p&gt;My legal interpretation is probably wrong. Though I doubt that it's so wrong that a lawyer would say, "Go for it!" And you could argue that allowing modified copies may result in brand dilution or something in that vein. I know of no other license that has a copyright notice on the license itself. The whole thing is just really messing with my head.&lt;/p&gt;

&lt;h2&gt;Tumbolia Public License&lt;/h2&gt;

&lt;p&gt;So, instead of writing that really funny blog post, I'm just going to advertise a license that more people should be using: the Tumbolia Public License. I've included a verbatim copy here for reference. I have already released projects under this license and so should you. Mostly just to screw with the lawyer types that make us non-lawyer types extremely confused over what it is we're supposed to be doing with licenses.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;                           Tumbolia Public License

Copyright &amp;lt;year&amp;gt;, &amp;lt;name of author&amp;gt;

Copying and distribution of this file, with or without modification, are
permitted in any medium without royalty provided the copyright notice and
this notice are preserved.

TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

  0. opan saurce LOL
&lt;/code&gt;&lt;/pre&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2010/01/19/erlang_js_awesome.html</id>
        <title>erlang_js - Awesome</title>
        <link href="http://www.davispj.com/2010/01/19/erlang_js_awesome.html" />
        <updated>2010-01-19T00:00:00-05:00</updated>
        <content type="html">&lt;h1&gt;erlang_js - Awesome&lt;/h1&gt;

&lt;h2&gt;erlang_js&lt;/h2&gt;

&lt;p&gt;If you haven't heard, the Basho team &lt;a href="http://twitter.com/justinsheehy/status/7950307815"&gt;released&lt;/a&gt; &lt;a href="http://bitbucket.org/basho/erlang_js/"&gt;erlang_js&lt;/a&gt; today. It's a linked-in driver that provides a SpiderMonkey JavaScript context for running JS code from Erlang. This is interesting to me because it avoids the stdio overhead incurred by the current Map/Reduce system that CouchDB uses. So I did what any bored hacker would do: threw erlang_js into the CouchDB build system and hacked the view generation code to use the in-VM contexts.&lt;/p&gt;

&lt;h2&gt;Numbers&lt;/h2&gt;

&lt;p&gt;These are the times, in seconds, for the "mega view" as reported by raindrop-perf.py, which you can find &lt;a href="/scripts/2010-01-19-raindrop-perf.py"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;table&gt;
    &lt;tr&gt;
        &lt;th&gt;Run&lt;/th&gt;
        &lt;th&gt;Trunk&lt;/th&gt;
        &lt;th&gt;erlang_js&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;1&lt;/td&gt;
        &lt;td&gt;13.63&lt;/td&gt;
        &lt;td&gt;6.89&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;2&lt;/td&gt;
        &lt;td&gt;11.16&lt;/td&gt;
        &lt;td&gt;6.94&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;3&lt;/td&gt;
        &lt;td&gt;11.82&lt;/td&gt;
        &lt;td&gt;6.80&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;


&lt;h2&gt;Code&lt;/h2&gt;

&lt;p&gt;Look &lt;a href="http://github.com/davisp/couchdb/tree/erlang_js"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Next Up&lt;/h2&gt;

&lt;p&gt;The communication between Erlang and JS is unnecessarily converting Erlang -&gt; JSON -&gt; SpiderMonkey objects. I've written the code to go from the external Erlang representation to SpiderMonkey objects directly, so I plan on integrating that in the next couple of days to see how these numbers change.&lt;/p&gt;

&lt;h2&gt;Code Might be Nice&lt;/h2&gt;

&lt;p&gt;Just thought that maybe people would be interested in the code that's used to talk to erlang_js. It's pretty straightforward, though not very elegant on my side.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;% From couch_query_servers.erl
start_doc_map(_Lang, Functions) -&amp;gt;
    {ok, Port} = js_driver:new(),
    ok = js_driver:define_js(
        Port, &amp;lt;&amp;lt;"map_support.js"&amp;gt;&amp;gt;, map_support(), 5000
    ),
    lists:foreach(fun(FuncSource) -&amp;gt;
        Source = &amp;lt;&amp;lt;"map_funs.push(", FuncSource/binary, ");"&amp;gt;&amp;gt;,
        ok = js_driver:define_js(Port, Source)
    end, Functions),
    {ok, Port}.

map_docs(Port, Docs) -&amp;gt;
    Results = lists:map(
        fun(Doc) -&amp;gt;
            Json = couch_doc:to_json_obj(Doc, []),
            % One list of emitted rows per map function.
            {ok, FunResults} = js:call(Port, &amp;lt;&amp;lt;"map_doc"&amp;gt;&amp;gt;, [Json]),
            lists:map(
                fun(FunRs) -&amp;gt;
                    [list_to_tuple(FunResult) || FunResult &amp;lt;- FunRs]
                end,
            FunResults)
        end,
        Docs),
    {ok, Results}.

stop_doc_map(nil) -&amp;gt;
    ok;
stop_doc_map(Port) -&amp;gt;
    js_driver:destroy(Port).
% EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And here's map_support.js, which I wrote to avoid having to think too hard on the Erlang side of things:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// src/couchdb/priv/map_support.js
var map_funs = [];
var results = [];

var emit = function(key, value) {
  results.push([key, value]);
};

var map_doc = function(doc) {
  var ret = [];
  map_funs.forEach(function(func) {
    results = [];
    func(doc);
    ret.push(results);
  });
  return ret;
};
// EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Shoutout&lt;/h2&gt;

&lt;p&gt;Go, &lt;a href="http://basho.com/"&gt;Basho&lt;/a&gt;, Go!&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2010/01/18/fighting-hype-with-hype.html</id>
        <title>Fighting Hype with Hype - RDBMS FTW!!!1!</title>
        <link href="http://www.davispj.com/2010/01/18/fighting-hype-with-hype.html" />
        <updated>2010-01-18T00:00:00-05:00</updated>
        <content type="html">&lt;h1&gt;Fighting Hype with Hype - RDBMS FTW!!!1!&lt;/h1&gt;

&lt;h2&gt;Yay Google Alerts&lt;/h2&gt;

&lt;p&gt;I woke up this morning to an amusing &lt;a href="http://www.ryanpark.org/2008/04/top-10-avoid-the-simpledb-hype.html"&gt;blog post&lt;/a&gt; by &lt;a href="http://www.ryanpark.org/about-2"&gt;Ryan Park&lt;/a&gt; in my Google alerts. I tend to read a lot of the "RDBMS's are awesome! No, NOSQL is moar awesome!" arguments with great bemusement. Even though the post is a year and a half old, it was amusing enough to motivate me to write out some of the thoughts I had while reading it.&lt;/p&gt;

&lt;h2&gt;1. Data integrity is not guaranteed.&lt;/h2&gt;

&lt;p&gt;Ryan spends a couple paragraphs talking about how horrible it is that non-RDBMS systems don't provide data constraints. In general he's pretty spot on here. One of the first things that tends to get left out of non-RDBMS systems is constraint enforcement.&lt;/p&gt;

&lt;p&gt;There are two and a half points I'd like to make. First, constraints are usually the first to go because they're costly. Costly to implement and costly at runtime. Especially when the system is being designed with the ability to run on multiple machines.&lt;/p&gt;

&lt;p&gt;Secondly, there are plenty of people that don't use constraints. Ryan falls pretty squarely into the "RDBMS's work for me, so they should work for you too" camp. He appears to know his stuff, but what people like Ryan forget is that there are a lot of people that don't. They use an RDBMS because that's what the internet says to do. And then they plop an ORM on top of it and never actually use any of the RDBMS features that are so lauded. As it turns out, lots of these developers are super happy using a database that doesn't provide constraint enforcement.&lt;/p&gt;

&lt;p&gt;The last half a point I'll make later as it was suggested in a comment on Ryan's post and applies later on in the conversation.&lt;/p&gt;

&lt;h2&gt;2. Inconsistency will provide a terrible user experience.&lt;/h2&gt;

&lt;p&gt;Just reading the bullet point on this one, I knew I was in for some fun. I mean seriously, if that's not proof by assertion then I don't know what is.&lt;/p&gt;

&lt;p&gt;Ryan is quite right that developers need to make some things appear consistent so as to not confuse users. I can't speak to the specifics of SimpleDB, but obviously someone's using it successfully so I'll assume that it's possible.&lt;/p&gt;

&lt;p&gt;The first thing I'd point out, though, is that consistency problems are not limited to those crazy non-RDBMS people. Even in a traditional three-tier web architecture, there's the issue with sessions. Basically the issue is that a client needs to be repeatedly routed to the same application server handling their session.&lt;/p&gt;

&lt;p&gt;The other thing I'll point out is an interesting &lt;a href="http://www.facebook.com/note.php?note_id=23844338919"&gt;Facebook blog post&lt;/a&gt; I read a couple months ago. It's an interesting look at how Facebook added a second datacenter on the east coast. A datacenter based on MySQL no less. I'll draw your attention to the "Cache Consistency" section. And all I'm going to point out is that their solution required modifying MySQL's query parser. Seriously.&lt;/p&gt;

&lt;h2&gt;3. Aggregate operations will require more coding.&lt;/h2&gt;

&lt;p&gt;For the bullet point, yes, that's more or less true. The argument in support of this is pretty much non-existent. If Ryan really wanted to make an argument about aggregates, the best thing would be to go on about how a non-RDBMS requires you to know what type of aggregates you'll want up front and then do insert-time calculations for these values. While that will work just fine, it makes ad-hoc queries harder. The ad-hoc issue is the next bullet point, but for some reason the connection wasn't made.&lt;/p&gt;
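
&lt;p&gt;To make the insert-time idea concrete, here's a minimal sketch in Python. It is not how SimpleDB or any other particular database works; the &lt;code&gt;OrderStore&lt;/code&gt; class and its fields are made up for illustration. The planned totals are cheap to read, but an aggregate nobody planned for means scanning everything.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical sketch: keep aggregates up to date at insert time.
from collections import defaultdict


class OrderStore(object):
    def __init__(self):
        self.docs = []
        # Aggregates we decided up front that we would need.
        self.total_by_customer = defaultdict(float)
        self.count_by_customer = defaultdict(int)

    def insert(self, doc):
        self.docs.append(doc)
        # Pay the aggregation cost once, when the document arrives.
        self.total_by_customer[doc["customer"]] += doc["amount"]
        self.count_by_customer[doc["customer"]] += 1

    def total_for(self, customer):
        # Planned aggregate: a dictionary lookup.
        return self.total_by_customer[customer]

    def largest_order(self):
        # Ad-hoc aggregate nobody planned for: scan every document.
        return max(doc["amount"] for doc in self.docs)


store = OrderStore()
store.insert({"customer": "alice", "amount": 10.0})
store.insert({"customer": "bob", "amount": 2.5})
store.insert({"customer": "alice", "amount": 4.0})
print(store.total_for("alice"))
print(store.largest_order())
&lt;/code&gt;&lt;/pre&gt;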

&lt;h2&gt;4. Complicated reports, and ad hoc queries, will require a lot more coding.&lt;/h2&gt;

&lt;p&gt;This was one of my favorite bullet points in the whole article. And by favorite I mean that it produced the most WTF's per word.&lt;/p&gt;

&lt;p&gt;Firstly, Ryan points out that there are three general workloads for databases: (1) general queries that are used by the application, (2) more complicated queries run by staff for reporting, and (3) ad-hoc queries for debugging. I would pretty much agree with him there. But then he goes on to make the assertion that points 2 and 3 are better served by SQL.&lt;/p&gt;

&lt;p&gt;The entire second paragraph is some sort of weird twisted logic to bolster the argument that SQL makes reporting super easy. My favorite part of the whole thing is the quote right in the middle:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;In my previous jobs, our reports often required hundreds 
of lines of SQL to get the right information out of the
database. This is a lot of code, but it was required to
generate the data for our customers.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As far as I can tell, the argument is that SQL makes complex reports easy even though it still might take hundreds of lines to get the data required. And the other thing that's not mentioned: these reports can still take a substantial amount of time to generate. But obviously this is still better than the non-RDBMS systems where they don't even have SQL! Because obviously it's impossible that any imperative language could be as good as SQL... because... because... well I never figured that part out either.&lt;/p&gt;

&lt;h2&gt;5. Aggregate operations will be much slower if you don't use an RDBMS.&lt;/h2&gt;

&lt;p&gt;Yeah. This one is special.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;RDBMSes are highly optimized for performing aggregate
operations across huge volumes of data. Fast algorithms
like the hash join, merge join, and indexed binary search
have been around for 20 years or more.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Ok. Breathe. I'm assuming that he forgot that joins are not aggregates. And a binary search, well, shit... I guess the non-RDBMS people really are screwed. And the second paragraph talks about how the client is going to need to scan the entire database thus incurring the huge network transfer to even try and compute an aggregate.&lt;/p&gt;

&lt;p&gt;I'll just point out that there are non-RDBMS systems that provide aggregate functionality, and anything that uses a B+tree probably uses binary search. Remember people, just because you can't think of a different solution to your problem doesn't mean it can't exist.&lt;/p&gt;
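
&lt;p&gt;As a small illustration that binary search isn't RDBMS magic, counting the keys in a range takes two binary searches over sorted keys. This is only a sketch using Python's standard &lt;code&gt;bisect&lt;/code&gt; module. The key list and the &lt;code&gt;count_in_range&lt;/code&gt; helper are made up, and a real B+tree would search within each node rather than over one flat list.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical sketch: a range count over sorted keys via binary search.
import bisect

keys = [3, 8, 15, 16, 23, 42]  # kept sorted, like the keys in a tree node


def count_in_range(sorted_keys, low, high):
    # Count keys between low and high (inclusive) with two binary searches.
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return end - start


print(count_in_range(keys, 8, 23))
&lt;/code&gt;&lt;/pre&gt;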

&lt;h2&gt;6. Data import, export, and backup will be slow and difficult.&lt;/h2&gt;

&lt;p&gt;Now this is just FUD. Sorry, but there's no better way to say it. If you've ever had to fit some random piece of data into your existing relational schema you'll probably agree that this is crap. Munging random data is hard. And if it's not random data then it's not really that important. And getting data out? Perhaps Ryan was being satirical?&lt;/p&gt;

&lt;h2&gt;7. SimpleDB isn't that fast.&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://jan.prima.de/plok/archives/175-Benchmarks-You-are-Doing-it-Wrong.html"&gt;Jan&lt;/a&gt; &lt;a href="http://jan.prima.de/plok/archives/176-Caveats-of-Evaluating-Databases.html"&gt;Lehnardt&lt;/a&gt; has a couple of thought-provoking arguments and &lt;a href="http://vmx.cx/cgi-bin/blog/index.cgi/benchmarking-is-not-easy:2009-09-23:en,CouchDB,Python,TileCache,geo"&gt;Volker Mische&lt;/a&gt; provides some interesting fodder as well. Basically, to say something isn't fast requires you to define what fast is and, generally, no two people will ever agree on the same definition.&lt;/p&gt;

&lt;p&gt;That said, Ryan does make an allusion to this situation when he mentions that SimpleDB probably needs a larger DB to be measured on. And he also points out that lots of databases probably fit into RAM.&lt;/p&gt;

&lt;p&gt;There's an interesting &lt;a href="http://37signals.com/svn/posts/1509-mr-moore-gets-to-punt-on-sharding"&gt;article&lt;/a&gt; by one of the 37signals guys about buying more RAM instead of sharding. While definitely a valid approach, not everyone can go out and buy a single machine with 32 GiB of RAM (though obviously that's getting closer). Though now I'm curious what type of disks they need to keep up with whatever write load they have.&lt;/p&gt;

&lt;h2&gt;8. Relational databases are scalable, even with massive data sets.&lt;/h2&gt;

&lt;p&gt;I don't have a better response than the commenter &lt;a href="http://www.ryanpark.org/2008/04/top-10-avoid-the-simpledb-hype.html#comment-4308692"&gt;jackson&lt;/a&gt; on the original blog post. Once an RDBMS is scaled to multiple machines, lots of the benefits are nullified and you're dealing with the same issues that the non-RDBMS folks are.&lt;/p&gt;

&lt;h2&gt;9. Super-scalability is overrated. Slowing the pace of your product development is even worse.&lt;/h2&gt;

&lt;p&gt;There is definitely a lot of noise in the echo chamber about scalability. Developers like to talk about needing hundreds of nodes to support their workload because that's just cool. But in reality, the issue isn't adding the hundredth node to a system, it's adding the second. Regardless of the database being used, if that second node isn't planned for it'll be painful. Non-RDBMS systems generally reduce that pain point by discouraging designs that exacerbate the problems when adding a second node.&lt;/p&gt;

&lt;h2&gt;10. SimpleDB is useful, but only in certain contexts.&lt;/h2&gt;

&lt;p&gt;I'll file this under the "No shit?" category. There are plenty of places that an RDBMS might be a better fit than any given non-RDBMS. And vice versa. The underlying issue that people seem to miss is being able to describe situations where one might be better than the other.&lt;/p&gt;

&lt;p&gt;The bottom line to this whole "My database is better than your database!" argument is that "You're both right, so STFU!" Eventually people will calm down and start to realize that there are multiple solutions and the right one will depend as much on the problem domain as the developer coding the solution. A better use of time would be finding personal projects and drawing up the arguments for and against the coded solution so that others might learn from past experience.&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2009/11/23/erlang-nif-test.html</id>
        <title>Erlang NIF Test</title>
        <link href="http://www.davispj.com/2009/11/23/erlang-nif-test.html" />
        <updated>2009-11-23T00:00:00-05:00</updated>
        <content type="html">&lt;h1&gt;Erlang NIF Test&lt;/h1&gt;

&lt;h2&gt;&lt;em&gt;NOTE&lt;/em&gt;&lt;/h2&gt;

&lt;p&gt;This tutorial requires a fairly recent &lt;a href="http://erlang.org/download/snapshots/?M=D"&gt;snapshot&lt;/a&gt; of Erlang. I'm using the snapshot from November 22nd, 2009. The official release containing the required functionality is slated to be out on November 25th, 2009. You'll see I haven't actually installed the snapshot build in case you're like me and want to wait for an official release.&lt;/p&gt;

&lt;h2&gt;&lt;em&gt;UPDATE&lt;/em&gt;&lt;/h2&gt;

&lt;p&gt;No more NIFs! The daily snapshot page was updated and the new tarball for today doesn't include the new NIF functions. Luckily there are public mirrors of the code.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ git clone git://github.com/janl/erlang0d.git otp_src_R13B03
$ cd otp_src_R13B03
$ git checkout origin/R13B03-20091122225501
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Erlang NIFs&lt;/h2&gt;

&lt;p&gt;I've been waiting excitedly for the new Native Implemented Function (NIF) interface to land in the next Erlang release since I first saw it &lt;a href="http://twitter.com/FrancescoC/status/5651602607"&gt;announced&lt;/a&gt;. Then I saw another &lt;a href="http://twitter.com/dizzyco/status/5889832914"&gt;message&lt;/a&gt; from &lt;a href="http://twitter.com/dizzyco"&gt;@dizzyco&lt;/a&gt; that was more &lt;a href="http://twitter.com/dizzyco/status/5891652969"&gt;specific&lt;/a&gt;. So I did what any normal person would do: read the test suite and wrote a minimal NIF to figure out the compiling and call semantics.&lt;/p&gt;

&lt;h2&gt;The NIF Module C API&lt;/h2&gt;

&lt;p&gt;The first thing to note is that a NIF module has four callbacks that are used for bookkeeping around loading the shared library code: load, reload, upgrade, and unload. Each function gets an ErlNifEnv* argument, a pointer to some module-specific private data, and (except unload) an ERL_NIF_TERM load_info argument. The environment and private data pointers are pretty standard for this sort of thing. I'm not entirely certain what load_info is for. The method for initializing NIF modules takes a second parameter which may be what this is for, but I haven't investigated to find out for certain.&lt;/p&gt;

&lt;p&gt;After defining each of those four methods, to actually implement the NIF functions we define a function that takes an ErlNifEnv* argument and zero or more positional parameters of type ERL_NIF_TERM. These functions will show up in the Erlang side and can be called as expected.&lt;/p&gt;

&lt;p&gt;The code for our minimal NIF module looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// mynif.c
#include &amp;lt;stdio.h&amp;gt;
#include "erl_nif.h"

static int
load(ErlNifEnv* env, void** priv, ERL_NIF_TERM load_info)
{
    return 0;
}

static int
reload(ErlNifEnv* env, void** priv, ERL_NIF_TERM load_info)
{
    return 0;
}

static int
upgrade(ErlNifEnv* env, void** priv, void** old_priv,
          ERL_NIF_TERM load_info)
{
    return 0;
}

static void
unload(ErlNifEnv* env, void* priv)
{
    return;
}

static ERL_NIF_TERM
do_something(ErlNifEnv* env, ERL_NIF_TERM a1)
{
    unsigned long val;
    if(!enif_get_ulong(env, a1, &amp;amp;val)) {
        return enif_make_badarg(env);
    } else {
        return enif_make_ulong(env, val*2);
    }
}

static ErlNifFunc mynif_funcs[] =
{
    {"do_something", 1, do_something}
};

ERL_NIF_INIT(mynif, mynif_funcs, load, reload, upgrade, unload)
// EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That's all fairly straightforward. Define the four required functions and just return 0 to indicate no error. The ErlNifFunc array appears to hold {name_in_erlang, arity, name_in_c} triples. There's an example in the source of having the same Erlang name with different arities. As you'd expect, you just specify the same string and change the second value.&lt;/p&gt;

&lt;p&gt;The implementation of do_something shows a basic error when the argument is not an unsigned long. We'll test that this works as expected later.&lt;/p&gt;

&lt;h2&gt;The Erlang API&lt;/h2&gt;

&lt;p&gt;The Erlang side is pretty simple as well. To load a NIF module we just call erlang:load_nif/2. The first parameter is the path to the shared object to load. The second parameter I just specify as 0 to follow the test code. I've not investigated its use, though I assume it shows up as the load_info argument in the module API.&lt;/p&gt;

&lt;p&gt;Another thing to note is that the NIF module and its corresponding Erlang module have overlapping function namespaces. When we define a function in the NIF module, it shows up in our Erlang module. The tests use a pattern to throw an error if the Erlang function gets called. In other words, when we load the NIF module it replaces the Erlang definition, so if we hit the Erlang definition we report an error.&lt;/p&gt;

&lt;p&gt;Our Erlang code looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;% mynif.erl
-module(mynif).
-export([start/0, do_something/1]).

start() -&amp;gt;
    erlang:load_nif("mynif", 0).

do_something(_Val) -&amp;gt;
    nif_error(?LINE).

nif_error(Line) -&amp;gt;
    exit({nif_not_loaded,module,?MODULE,line,Line}).
% EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Building the Modules&lt;/h2&gt;

&lt;p&gt;Building appears to just be the standard shared object style. I happened to have an example lying around from my earlier work on an EEP0018 module (which I'll definitely be revisiting now). The linker dark magic is outside this simple example, but there are plenty of places that will explain this. I haven't tested the Linux flags, but they should work just fine.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Makefile
OTPROOT=/Users/davisp/tmp/otp_src_R13B03/
INCLUDES = -I$(OTPROOT)/erts/emulator/beam/

# OS X flags.
GCCFLAGS = -O3 -fPIC -bundle -flat_namespace -undefined suppress -fno-common -Wall

# Linux Flags
#GCCFLAGS = -O3 -fPIC -shared -fno-common -Wall

CFLAGS = $(GCCFLAGS) $(INCLUDES)
LDFLAGS = $(GCCFLAGS) $(LIBS)

OBJECTS = mynif.o

DRIVER = mynif.so
BEAM = mynif.beam

all: $(DRIVER) $(BEAM)

clean: 
    rm -f *.o *.beam $(DRIVER)

$(DRIVER): $(OBJECTS)
    gcc -o $@ $^ $(LDFLAGS)

$(BEAM): mynif.erl
    erlc $^
# EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With all three of those files in your $CWD you should be able to just run &lt;code&gt;make&lt;/code&gt; and have the proper output in the same directory.&lt;/p&gt;

&lt;h2&gt;Running the Example&lt;/h2&gt;

&lt;p&gt;A sample console log to show that it behaves as expected:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ ~/tmp/otp_src_R13B03/bin/erl
Erlang R13B03 (erts-5.7.4) [source] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]

Eshell V5.7.4  (abort with ^G)
1&amp;gt; mynif:start().
ok
2&amp;gt; mynif:do_something(0).
0
3&amp;gt; mynif:do_something(2).
4
4&amp;gt; mynif:do_something(nil).
** exception error: bad argument
     in function  mynif:do_something/1
        called as mynif:do_something(nil)
5&amp;gt; mynif:do_something(2.3). 
** exception error: bad argument
     in function  mynif:do_something/1
        called as mynif:do_something(2.3)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And there you have it. This is fairly exciting stuff. I've already got a list of projects I'm going to play with integrating into the NIF API to see what type of speedups I can get.&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2009/11/17/unix-in-python.html</id>
        <title>Unix in 13 lines of Python - Mostly trivial</title>
        <link href="http://www.davispj.com/2009/11/17/unix-in-python.html" />
        <updated>2009-11-17T00:00:00-05:00</updated>
        <content type="html">&lt;h1&gt;Unix in 13 lines of Python - Mostly trivial&lt;/h1&gt;

&lt;p&gt;I found &lt;a href="http://pwpwp.blogspot.com/2009/11/unix-in-14-lines-of-ruby-its-trivial.html"&gt;this&lt;/a&gt; fairly amusing, but just a big if statement? So I
ported it to Python. I was reminded that iterating over stdin in Python is
always a bit weird. I generally create an iterator that just yields
sys.stdin.readline() forever, but went more hackish here to conserve lines and
add lulz.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import sys
print "You have no mail"

commands = {
    "uname": lambda args: "Punix 1.0",
    "halt": lambda args: exit(0)
}
error = lambda args: "Command not found."

sys.stdout.write("$ ")
for line in (sys.stdin.readline() for i in xrange(sys.maxint)):
    print commands.get(line.split()[0], error)(line.split()[1:])
    sys.stdout.write("$ ")
&lt;/code&gt;&lt;/pre&gt;
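
&lt;p&gt;For reference, a less hackish way to loop over stdin is the two-argument form of &lt;code&gt;iter&lt;/code&gt;, which keeps calling &lt;code&gt;readline&lt;/code&gt; until it returns the empty string at EOF. This sketch is written for Python 3 rather than the Python 2 above. The &lt;code&gt;read_commands&lt;/code&gt; helper is made up for illustration, and a &lt;code&gt;StringIO&lt;/code&gt; stands in for stdin so it runs anywhere.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import io


def read_commands(stream):
    # iter(callable, sentinel) keeps calling readline() until it
    # returns "" (end of file), unlike the bounded xrange trick.
    for line in iter(stream.readline, ""):
        words = line.split()
        if words:  # skip blank lines instead of raising IndexError
            yield words[0], words[1:]


fake_stdin = io.StringIO(u"uname -a\n\nhalt\n")
for name, args in read_commands(fake_stdin):
    print(name, args)
&lt;/code&gt;&lt;/pre&gt;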
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2009/10/22/couchdb-boston-meetup.html</id>
        <title>CouchDB Boston Meetup - Tues, 10/27/09</title>
        <link href="http://www.davispj.com/2009/10/22/couchdb-boston-meetup.html" />
        <updated>2009-10-22T00:00:00-04:00</updated>
        <content type="html">&lt;h1&gt;CouchDB Boston Meetup - Tues, 10/27/09&lt;/h1&gt;

&lt;p&gt;A couple of out-of-towners are planning on being in Boston, MA next week, so I thought that was a good enough reason to organize a bit of a meetup. And by organize I mean I'm asking who else is interested.&lt;/p&gt;

&lt;p&gt;The current plan is to meet up this coming Tuesday. As you can tell I've spent quite a lot of time figuring out the details.&lt;/p&gt;

&lt;p&gt;Email me at &lt;a href="mailto:paul.joseph.davis@gmail.com"&gt;paul.joseph.davis@gmail.com&lt;/a&gt; if you're interested so I can gauge interest and expected group size.&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2009/09/04/gcd-clang-awesome.html</id>
        <title>Grand Central Dispatch + Clang == Awesome</title>
        <link href="http://www.davispj.com/2009/09/04/gcd-clang-awesome.html" />
        <updated>2009-09-04T00:00:00-04:00</updated>
        <content type="html">&lt;h1&gt;Grand Central Dispatch + Clang == Awesome&lt;/h1&gt;

&lt;h2&gt;Blocks&lt;/h2&gt;

&lt;p&gt;I read about blocks in 10.6 and they sounded quite cool. So the first thing I did (after poking at the new Expose) was to try and write a block.&lt;/p&gt;

&lt;p&gt;So I started with this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;

int
main(int argc, char* argv[])
{
    fprintf(stderr, "Creating a closure!\n");
    int x = ^{ printf("hello world\n"); };
    x();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Obviously the "int x = ^{}" syntax was wrong, but I was banking on compiler warnings to give me a hint:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;davisp@cube:~/tests/gcd$ gcc -o gcd first.c
first.c: In function 'main':
first.c:7: error: incompatible types in initialization
first.c:8: error: called object 'x' is not a function
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Doh! No love. But then an epiphany. The &lt;a href="http://arstechnica.com/apple/reviews/2009/08/mac-os-x-10-6.ars/9#llvm-clang"&gt;review&lt;/a&gt; on Ars Technica mentioned how Clang (a new compiler) was way more awesome at compiler warnings. Thanks to a &lt;a href="http://twitter.com/binary42/statuses/3689493854"&gt;tweet&lt;/a&gt; the other day I took it for a spin:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;davisp@cube:~/tests/gcd$ /Developer/usr/bin/clang -o gcd first.c
first.c:7:13: error: incompatible type initializing 'void (^)(void)', expected
      'int'
    int x = ^{ printf("hello world\n"); };
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 diagnostic generated.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Whoa neato! So make the suggested update:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;

int
main(int argc, char* argv[])
{
    fprintf(stderr, "Creating a closure!\n");
    void (^x)(void) = ^{ printf("hello world\n"); };
    x();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And compile and run:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;davisp@cube:~/tests/gcd$ /Developer/usr/bin/clang -o gcd first.c
davisp@cube:~/tests/gcd$ ./gcd
Creating a closure!
hello world
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Grand Central Dispatch&lt;/h2&gt;

&lt;p&gt;So the only thing left to do was to set one of our closures up to run in one of the GCD queues.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;dispatch/dispatch.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;time.h&amp;gt;

int
main(int argc, char* argv[])
{
    fprintf(stderr, "DO IT!\n");
    dispatch_apply(10, dispatch_get_global_queue(0, 0), ^(size_t i){
        fprintf(stderr, "I: %zu\n", i);
    });
    /* dispatch_apply waits for all iterations; the sleep is just paranoia. */
    sleep(1);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And try that on for size:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;davisp@cube:~/tests/gcd$ /Developer/usr/bin/clang -o gcd first.c
davisp@cube:~/tests/gcd$ ./gcd 
DO IT!
I: 0
I: 1
I: 2
I: 3
I: 4
I: 5
I: 6
I: 7
I: 8
I: 9
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I think GCD and I are going to be very good friends.&lt;/p&gt;

&lt;h2&gt;One Thought&lt;/h2&gt;

&lt;p&gt;So far I'm quite fond of GCD. It looks to be as easy to use as was advertised. The only thing that worries me is if I start using this for lots of code, I'm locked to OS X. Some of the scientific bits wouldn't even be worth sketching out without an implementation for Linux.&lt;/p&gt;

&lt;p&gt;I know it's only a matter of time, but I haven't the slightest idea how long that will be.&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2009/07/06/gpl-makeshift-patriot.html</id>
        <title>GPL - Makeshift Patriot</title>
        <link href="http://www.davispj.com/2009/07/06/gpl-makeshift-patriot.html" />
        <updated>2009-07-06T00:00:00-04:00</updated>
        <content type="html">&lt;h1&gt;GPL - Makeshift Patriot&lt;/h1&gt;

&lt;h2&gt;Background&lt;/h2&gt;

&lt;p&gt;The other day &lt;a href="http://daringfireball.net"&gt;Gruber&lt;/a&gt; once again linked to a good &lt;a href="http://www.red-sweater.com/blog/825/getting-pretty-lonely/"&gt;article&lt;/a&gt;. The author, &lt;a href="http://www.red-sweater.com/about/DanielJalkut.html"&gt;Daniel Jalkut&lt;/a&gt;, described his opinion on the effects of using the &lt;a href="http://www.fsf.org/licensing/licenses/gpl.html"&gt;GPL&lt;/a&gt; as an &lt;a href="http://en.wikipedia.org/wiki/Open-source_software"&gt;OSS&lt;/a&gt; license. The post can be pretty well summed up with this quote:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;I suggest that the GPL does more to harm collaborative development
than it does to help it. - Daniel Jalkut
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Today Gruber posted a &lt;a href="http://ma.tt/2009/07/not-lonely-at-all/"&gt;follow-up&lt;/a&gt; from &lt;a href="http://ma.tt/about/"&gt;Matt Mullenweg&lt;/a&gt; that argues the exact opposite: that &lt;a href="http://wordpress.com/"&gt;WordPress&lt;/a&gt; is successful not in spite of the GPL, but because of it.&lt;/p&gt;

&lt;h2&gt;Matt, you're wrong.&lt;/h2&gt;

&lt;p&gt;Towards the bottom of Matt's post he goes off on a tangent about how the GPL protects the rights (as in constitutional) of 'users'. He even links to the Declaration of Independence article at &lt;a href="http://wikipedia.org/"&gt;Wikipedia&lt;/a&gt;. It'd be heartwarming if it weren't such a fallacy. The core of his argument:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;You are free to do pretty much whatever you want as long as it does not
infringe on the freedoms of others. - Matt Mullenweg
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The only way a user has rights to the derivative work is if those rights are somehow propagated from original to derivative which is exactly why the GPL exists. Welcome to the beginning, Alice.&lt;/p&gt;

&lt;p&gt;Let's rephrase the argument: "The GPL is good because it protects users' rights which only exist because of the GPL." Recursive FTW!&lt;/p&gt;

&lt;h2&gt;Bottom Line&lt;/h2&gt;

&lt;p&gt;The GPL exists and is used as a tool to further the political and philosophical agendas of those who choose to use it. This is not a bad thing. As a developer you are free to do as you choose with your code. As a user I must decide if I'm willing to abide by your license just as I must choose to obey a closed source license. But for fuck's sake, don't pretend you're any different than a closed source licensor. Your onerous requirements annoy the shit out of me just as much as anyone else's.&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2009/06/17/forging-beats-tlas.html</id>
        <title>Forging - Beats TLAs</title>
        <link href="http://www.davispj.com/2009/06/17/forging-beats-tlas.html" />
        <updated>2009-06-17T00:00:00-04:00</updated>
        <content type="html">&lt;h1&gt;Forging - Beats TLAs&lt;/h1&gt;

&lt;h2&gt;Avoiding hate-mail&lt;/h2&gt;

&lt;p&gt;I am a fan of testing. I write tests. I try and write good tests that test functionality and not that a computer can properly add integers. Testing is great for validating my mental model and checking that refactored code continues to conform to my mental model.&lt;/p&gt;

&lt;h2&gt;New analogy!&lt;/h2&gt;

&lt;p&gt;I just realized the reason why I dislike test driven development. My coding is like simulated annealing. I work up alternative solutions quickly and iterate through until I find a local extremum that appears to suck the least. My testing phases are more like a non-linear cooling optimization at the end. To me these phases are the final step that gives any particular solution its strength and confidence.&lt;/p&gt;

&lt;h2&gt;Slightly differently&lt;/h2&gt;

&lt;p&gt;I code like a blacksmith forges. Get the object of current obsession malleable, beat on it for a while and then temper the result. Face it. Forging etymologically kicks the crap out of all those TLAs.&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2009/06/09/erlang-deps-graph.html</id>
        <title>Erlang Dependency Graph</title>
        <link href="http://www.davispj.com/2009/06/09/erlang-deps-graph.html" />
        <updated>2009-06-09T00:00:00-04:00</updated>
        <content type="html">&lt;h1&gt;Erlang Dependency Graph&lt;/h1&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;Got distracted and wrote a small parser for &lt;a href="http://www.erlang.org/doc/apps/dialyzer/index.html"&gt;dialyzer&lt;/a&gt; to print the
dependencies for a set of Erlang beam files. Just used CouchDB sources as that's
what I had handy.&lt;/p&gt;

&lt;h2&gt;&lt;a href="/scripts/2009-06-09-erlang-deps-graph.py"&gt;Script&lt;/a&gt;&lt;/h2&gt;

&lt;pre&gt;&lt;code&gt;#! /usr/bin/env python

import os
import re
import subprocess as sp
import sys

edge_re = re.compile(r"(couch[^:]+):([^/]+)/\d+")

def dialyze(file):
    command = ' '.join([
        "dialyzer",
        "--build_plt",
        "-pa", "src/ibrowse",
        "-pa", "src/mochiweb",
        "-pa", "/usr/local/lib/erlang/lib",
        "-c", file
    ])
    pipe = sp.Popen(
        command, shell=True, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE
    )
    (stdout, stderr) = pipe.communicate(input="")
    for line in stdout.split("\n"):
        match = edge_re.match(line.strip())
        if not match:
            continue
        yield (match.group(1), match.group(2))

def start_graph():
    print "digraph G {"

def add_edges(file, edges):
    src = os.path.split(file)[1][:-len(".beam")]
    edges[src] = set()
    for (mod, fun) in dialyze(file):
        edges[src].add(mod)

def end_graph():
    print "}"

def main():
    if len(sys.argv) != 2:
        print "usage: %s code_dir" % sys.argv[0]
        exit(-1)

    start_graph()
    edges = {}
    for root, dnames, fnames in os.walk(sys.argv[1]):
        for fname in fnames:
            if not fname.endswith(".beam"):
                continue
            add_edges(os.path.join(root, fname), edges)
    keys = edges.keys()
    keys.sort(key=lambda x: len(edges[x]), reverse=True)
    for k in keys:
        for m in edges[k]:
            print "  %s -&amp;gt; %s;" % (k, m)
    end_graph()

if __name__ == '__main__':
    main()
&lt;/code&gt;&lt;/pre&gt;
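
&lt;p&gt;To see what the &lt;code&gt;edge_re&lt;/code&gt; pattern in the script actually keeps, here is a small Python 3 sketch that runs it over a few lines. The sample lines and the &lt;code&gt;parse_edges&lt;/code&gt; helper are made up for illustration; real dialyzer output may look different.&lt;/p&gt;

```python
import re

# Same pattern as in the script: a module name starting with "couch",
# then ":" and a function name, then "/" and the arity.
edge_re = re.compile(r"(couch[^:]+):([^/]+)/\d+")

# Hypothetical lines, invented for this example.
sample_lines = [
    "couch_db:open/2",
    "lists:map/2",
    "  couch_util:to_binary/1  ",
]


def parse_edges(lines):
    """Return (module, function) pairs for lines naming a couch_* call."""
    edges = []
    for line in lines:
        match = edge_re.match(line.strip())
        if match:
            edges.append((match.group(1), match.group(2)))
    return edges


edges = parse_edges(sample_lines)
print(edges)
```

&lt;p&gt;Only calls into modules whose names start with &lt;code&gt;couch&lt;/code&gt; survive, which is why the graph shows CouchDB modules and nothing from the standard library.&lt;/p&gt;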

&lt;h2&gt;Result&lt;/h2&gt;

&lt;p&gt;&lt;a href="/images/2009-06-09-erlang-deps-graph.jpg"&gt;
&lt;img width="600" src="/images/2009-06-09-erlang-deps-graph.jpg" alt="CouchDB Dependency Graph" border="0" /&gt;
&lt;/a&gt;&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2009/05/18/couchdb-timings.html</id>
        <title>CouchDB JSON Parser Timings</title>
        <link href="http://www.davispj.com/2009/05/18/couchdb-timings.html" />
        <updated>2009-05-18T00:00:00-04:00</updated>
        <content type="html">&lt;h1&gt;CouchDB JSON Parser Timings&lt;/h1&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;I've put together some work on integrating &lt;a href="http://github.com/davisp/couchdb/tree/eep0018"&gt;eep0018&lt;/a&gt; into CouchDB as
well as adding support for &lt;a href="http://github.com/davisp/couchdb/tree/spidermonkey181"&gt;Spidermonkey 1.8.1&lt;/a&gt;. This is all still very
experimental. I have the test suite passing for both branches except where
Spidermonkey's JSON serialization differs from the JavaScript function
previously used ([undefined] is serialized as [null]).&lt;/p&gt;

&lt;p&gt;So, after getting those branches together I spent a bit of time and ran some
tests to see what kind of speed differences I could get. Turns out it's
dependent on the amount of data we give it, but there is a noticeable impact.&lt;/p&gt;

&lt;h2&gt;Branches&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://github.com/davisp/couchdb/tree/master"&gt;Trunk&lt;/a&gt; - As of when I ran the tests today.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://github.com/davisp/couchdb/tree/eep0018"&gt;eep0018&lt;/a&gt; - Includes the JSON parser in Erlang&lt;/li&gt;
&lt;li&gt;&lt;a href="http://github.com/davisp/couchdb/tree/spidermonkey181"&gt;Spidermonkey 1.8.1&lt;/a&gt; - Includes eep0018 and Spidermonkey trunk as of today (or yesterday...) Configured with --enable-optimized=-O3 is the only flag touched. (Ie, no JIT enabled.)&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;Caveats&lt;/h2&gt;

&lt;p&gt;Notice that I'm inserting 4KiB documents. If we shrank those down to a couple bytes then these numbers tend to even out. I know that the eep0018 code is hampered by moving across the VM boundary, so when we're dealing with _bulk_docs the key would be that eep0018 allows us to post more docs in a single request.&lt;/p&gt;
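
&lt;p&gt;For a rough sense of scale, this Python 3 sketch estimates how much JSON one document and one batch amount to. It assumes the same document shape and batch size as the benchmark script further down; &lt;code&gt;encoded_size&lt;/code&gt; is a helper invented here, and the batch figure ignores the small wrapper around the list of docs.&lt;/p&gt;

```python
import json

DOC_TEXT_BYTES = 4096  # same payload size as the benchmark documents
BATCH_SIZE = 1000      # same batch size as the benchmark script


def encoded_size(docid):
    """Bytes of JSON for one benchmark-style document."""
    doc = {
        "_id": "%.10d" % docid,
        "integer": docid,
        "text": "a" * DOC_TEXT_BYTES,
    }
    return len(json.dumps(doc).encode("utf-8"))


per_doc = encoded_size(0)
per_batch = per_doc * BATCH_SIZE
print("one document: %d bytes" % per_doc)
print("one batch of %d: about %.1f MiB" % (BATCH_SIZE, per_batch / (1024.0 * 1024.0)))
```

&lt;p&gt;So each bulk request pushes roughly four megabytes of JSON through the parser, which is where a faster parser has room to matter.&lt;/p&gt;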

&lt;p&gt;Other things people might want to play with are the numbers in couch_utils:should_flush/0 to see if we can tune how much data gets sent to the view server in one go.&lt;/p&gt;

&lt;p&gt;So, there are a lot of permutations for different speed tests, not to mention just making sure that these branches aren't screwed beyond recognition in terms of actually working.&lt;/p&gt;

&lt;p&gt;If you're bored and looking for something to do instead of clicking through Twitter or the current trendy social news site on a Monday morning, I invite you to grab one or two or all of the branches and run your own tests.&lt;/p&gt;

&lt;h2&gt;Benchmark Script&lt;/h2&gt;

&lt;p&gt;I realize this isn't the most sound measurement system, but I'm tired and didn't
feel like being thorough. You can grab it &lt;a href="/scripts/2009-05-18-couchdb-timings.py"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#! /usr/bin/env python
import time
import couchdb
server = couchdb.Server("http://127.0.0.1:5984/")
if "eep0018" in server:
    del server["eep0018"]
db = server.create("eep0018")

start = time.time()
updates = []
for docid in xrange(10000):
    doc = {"_id": "%.10d" % docid, "integer": docid, "text": "a" * 4096}
    updates.append(doc)
    if len(updates) &amp;gt;= 1000:
        db.update(updates)
        updates = []
if len(updates): db.update(updates)
end = time.time()
print "Inserting: %f" % (end - start)

start = time.time()
for row in db.query("function(doc) {emit(doc._id, doc.integer % 100);}"):
    pass
end = time.time()
print "Map only: %f" % (end - start)

start = time.time()
for row in db.query("function(doc) {emit(doc._id, doc.integer / 100);}",
            reduce_fun="function(keys, vals) {return sum(vals);}"
        ):
    pass
end = time.time()
print "With reduce: %f" % (end - start)

start = time.time()
for row in db.query("function(doc) {emit(doc._id, doc.integer * 2);}",
            reduce_fun="_sum"
        ):
    pass
end = time.time()
print "With erlang reduce: %f" % (end - start)
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Results&lt;/h2&gt;

&lt;p&gt;This is the data that I got from running that test script against each of the three branches three times. The error bars are simple min/max notations. No fancy standard deviation shit going on here.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/2009-05-18-couchdb-timings.jpg" alt="CouchDB JSON Parsing Times" /&gt;&lt;/p&gt;

&lt;h2&gt;Hand Waving&lt;/h2&gt;

&lt;p&gt;The results generally make sense. We get a speed bump during insertion when we switch to the eep0018 branch. The views are faster too. When we add the Spidermonkey 1.8.1 updates we get the same insert speed (because we don't touch the view server) and faster view computation.&lt;/p&gt;

&lt;p&gt;For the more motivated timing people out there, if someone wants to play around with data sizes and look at timings for different scenarios that'd be pretty awesome. And more fancy number math probably wouldn't hurt.&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2009/01/24/hypercouch.html</id>
        <title>HyperCouch</title>
        <link href="http://www.davispj.com/2009/01/24/hypercouch.html" />
        <updated>2009-01-24T00:00:00-05:00</updated>
        <content type="html">&lt;h1&gt;HyperCouch&lt;/h1&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;I really don't like Java. Not one bit. I spent entirely too long fighting with it to get a &lt;a href="http://lucene.apache.org/" title="Apache Lucene"&gt;Lucene&lt;/a&gt; full text indexer for &lt;a href="http://couchdb.apache.org" title="Apache CouchDB"&gt;CouchDB&lt;/a&gt; running. Somewhere during that work I stumbled across some &lt;a href="http://www.python.org/" title="Most awesomest programming language. Evar!"&gt;Python&lt;/a&gt; bindings for &lt;a href="http://hyperestraier.sourceforge.net/" title="Hyper Estraier"&gt;Hyper Estraier&lt;/a&gt; that looked to be pretty awesome. In about three hours I managed to duplicate about forty hours of Java work (estimates adjusted for personal bias). Now I present you with &lt;a href="http://github.com/davisp/hypercouch" title="HyperCouch Full Text Goodness"&gt;HyperCouch&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Yay Features!&lt;/h2&gt;

&lt;p&gt;Just to start you off with some indication of what's currently available in HyperCouch:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Full text searching - Too Obvious?&lt;/li&gt;
&lt;li&gt;Skip/limit parameters for paging&lt;/li&gt;
&lt;li&gt;Indexing via custom &lt;a href="http://en.wikipedia.org/wiki/JavaScript" title="Yay browser programming maturing to server language!"&gt;JavaScript&lt;/a&gt; methods - No wasted view generation here.&lt;/li&gt;
&lt;li&gt;Indexing supports full-text and property searching.&lt;/li&gt;
&lt;li&gt;Searching arbitrary document properties&lt;/li&gt;
&lt;li&gt;Custom sorting based on document properties&lt;/li&gt;
&lt;li&gt;HTML text snippets from the document based on the search&lt;/li&gt;
&lt;li&gt;Did I mention indexing via custom JavaScript methods?&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;Wha?&lt;/h2&gt;

&lt;p&gt;The basic steps to get &lt;a href="http://github.com/davisp/hypercouch" title="HyperCouch Full Text Goodness"&gt;HyperCouch&lt;/a&gt; up and running are covered on the &lt;a href="http://github.com/davisp/hypercouch" title="HyperCouch @ GitHub"&gt;GitHub&lt;/a&gt; page. Hopefully those directions are sufficient for you to get the necessary bits installed. I feel a bit bad about requiring so many projects to get things working but hopefully the install process doesn't cause too many issues. Feel more than free to email me directly at &lt;a href="mailto:paul.joseph.davis@gmail.com" title="My email address!"&gt;paul.joseph.davis@gmail.com&lt;/a&gt; if you're having issues.&lt;/p&gt;

&lt;p&gt;After it appears to be up and running the only thing you should need to do is add a &lt;code&gt;ft_index&lt;/code&gt; function to one or many of your &lt;code&gt;_design/documents&lt;/code&gt; in &lt;a href="http://couchdb.apache.org" title="Apache CouchDB"&gt;CouchDB&lt;/a&gt;. An &lt;code&gt;ft_index&lt;/code&gt; function acts very much like a normal &lt;a href="http://couchdb.apache.org" title="Apache CouchDB"&gt;CouchDB&lt;/a&gt; &lt;code&gt;view&lt;/code&gt; function in that it takes a single document as input and produces some output. Unlike &lt;code&gt;view&lt;/code&gt; functions &lt;code&gt;ft_index&lt;/code&gt; functions don't use an &lt;code&gt;emit(key, value)&lt;/code&gt; function to communicate results. Instead they use &lt;code&gt;index(data)&lt;/code&gt; and &lt;code&gt;property(name, value)&lt;/code&gt; functions to specify data that should be indexed.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;index(data)&lt;/code&gt; calls should specify textual data that is intended for searching via full text queries like 'foo AND bar'.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;property(name, value)&lt;/code&gt; calls specify properties of the document that should be indexed for operations such as filtering and sorting of full text search results.&lt;/p&gt;

&lt;h2&gt;Design Document Foo&lt;/h2&gt;

&lt;pre&gt;&lt;code&gt;{
    "_id": "_design/my_awesome_doc",
    "_rev": "024244112",
    "ft_index" : "function(doc) {if(doc.body) index(doc.body); if(doc.baz) property(\"baz_prop\", doc.baz)}"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This stuff is pretty straightforward. You can specify an ft_index on as many &lt;code&gt;_design/documents&lt;/code&gt; as you want. The functions are additive in that &lt;code&gt;index(data)&lt;/code&gt; and &lt;code&gt;property(name, value)&lt;/code&gt; calls from all of them attach data to the same document object. There will probably be weirdness if your function doesn't compile in JavaScript right now. It's on the agenda to add proper error reporting for that.&lt;/p&gt;
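
&lt;p&gt;Since the function body lives inside a JSON string, any double quotes in it need escaping. One way to dodge doing that by hand is to build the design document in Python and let &lt;code&gt;json.dumps&lt;/code&gt; worry about it. This is only a sketch: it leaves out &lt;code&gt;_rev&lt;/code&gt; on the assumption that CouchDB assigns one, and it doesn't upload anything.&lt;/p&gt;

```python
import json

# JavaScript source for the indexer, same logic as the example above.
ft_index_src = (
    'function(doc) {'
    'if(doc.body) index(doc.body); '
    'if(doc.baz) property("baz_prop", doc.baz)'
    '}'
)

design_doc = {
    "_id": "_design/my_awesome_doc",
    "ft_index": ft_index_src,
}

# json.dumps escapes the inner double quotes for us.
encoded = json.dumps(design_doc, indent=4)
print(encoded)
```

&lt;p&gt;The printed document is what you would save to the database with whatever client you normally use.&lt;/p&gt;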

&lt;h2&gt;Query String Parameters&lt;/h2&gt;

&lt;p&gt;Without further ado, the list of URL parameters for querying &lt;a href="http://github.com/davisp/hypercouch" title="HyperCouch Full Text Goodness"&gt;HyperCouch&lt;/a&gt; is as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;q&lt;/code&gt; - The full text query. (Can use AND, OR, etc. See the &lt;a href="http://hyperestraier.sourceforge.net/uguide-en.html#searchcond" title="Searching Hyper Estraier"&gt;Hyper Estraier Search&lt;/a&gt; documentation.)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;matching&lt;/code&gt; - Specifies different HyperEstraier query processing types. (Default is most applicable)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;limit&lt;/code&gt; - Limit the number of returned documents.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;skip&lt;/code&gt; - Skip a number of documents in the result set.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;order&lt;/code&gt; - Specify ordering of results on arbitrary parameters. See the &lt;a href="http://hyperestraier.sourceforge.net/uguide-en.html#searchcond" title="Searching Hyper Estraier"&gt;Hyper Estraier&lt;/a&gt; docs.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;highlight&lt;/code&gt; - Receive HTML highlights from the documents returned. (Currently only supports &lt;code&gt;highlight=html&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Arbitrary property operations. See the &lt;a href="http://hyperestraier.sourceforge.net/uguide-en.html#searchcond" title="Searching Hyper Estraier"&gt;Hyper Estraier&lt;/a&gt; attribute search conditions section.&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;Matching&lt;/h2&gt;

&lt;p&gt;I'm not sure exactly what all the applications of the different &lt;code&gt;matching&lt;/code&gt; types are, but I've included support for it. Mostly because I was curious what they did and still can't really figure out the difference.&lt;/p&gt;

&lt;p&gt;Supported matching types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simple - Default&lt;/li&gt;
&lt;li&gt;rough - Different NOT syntax (-term instead of !term) (I think)&lt;/li&gt;
&lt;li&gt;union - Default OR&lt;/li&gt;
&lt;li&gt;isect - Default AND&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;Limit - Skip&lt;/h2&gt;

&lt;p&gt;I ran into issues testing this bit based on ordering. There were a few oddities with results being returned in different orders etc. I know that Hyper Estraier uses a call to &lt;code&gt;quicksort&lt;/code&gt; in the internals which isn't a stable sort so I guess technically this could be part of the issue. Let it be said, I only test limit/skip with a specified ordering.&lt;/p&gt;

&lt;h2&gt;Order&lt;/h2&gt;

&lt;p&gt;You can specify an 'order=prop_name [STRA|STRD|NUMA|NUMD]' to receive results in 'string ascending', 'string descending', 'number ascending', or 'number descending' order. From testing, it appears that Hyper Estraier does an internal conversion, so you shouldn't have to worry about proper typing, though you might get unexpected results if you number sort something that can't be converted to a number.&lt;/p&gt;

&lt;h2&gt;Highlight&lt;/h2&gt;

&lt;p&gt;If you specify 'highlight=html' in the query string each returned document will contain a &lt;code&gt;highlight&lt;/code&gt; member that is an HTML snippet of the indexed document. It was easy to implement and isn't thoroughly tested, but it's there.&lt;/p&gt;

&lt;h2&gt;Arbitrary Properties&lt;/h2&gt;

&lt;p&gt;You can filter on arbitrary properties using the operators specified in the &lt;a href="http://hyperestraier.sourceforge.net/uguide-en.html#searchcond" title="Searching Hyper Estraier"&gt;Hyper Estraier Search&lt;/a&gt; docs. Each doc can have an arbitrary number of properties associated with it, and you can combine any number of property filters in one query. For those of you reading ahead in the Hyper Estraier docs, the proper format for the query string is &lt;code&gt;property_name=operator argument&lt;/code&gt;. I.e., if you called &lt;code&gt;property("foo", doc.foo_value)&lt;/code&gt; in your &lt;code&gt;ft_index&lt;/code&gt; method, you can specify 'foo=NUMLT 3' in the URL to receive only documents whose &lt;code&gt;foo&lt;/code&gt; value is less than three. There are sixteen operators you can use for limiting string and numeric properties.&lt;/p&gt;

&lt;p&gt;Operator types for property matching taken from the Hyper Estraier docs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;STREQ - is equal to the string&lt;/li&gt;
&lt;li&gt;STRNE - is not equal to the string&lt;/li&gt;
&lt;li&gt;STRINC - includes the string&lt;/li&gt;
&lt;li&gt;STRBW - begins with the string&lt;/li&gt;
&lt;li&gt;STREW - ends with the string&lt;/li&gt;
&lt;li&gt;STRAND - includes all tokens in the string&lt;/li&gt;
&lt;li&gt;STROR - includes at least one token in the string&lt;/li&gt;
&lt;li&gt;STROREQ - is equal to at least one token in the string&lt;/li&gt;
&lt;li&gt;STRRX - matches regular expressions of the string&lt;/li&gt;
&lt;li&gt;NUMEQ - is equal to the number or date&lt;/li&gt;
&lt;li&gt;NUMNE - is not equal to the number or date&lt;/li&gt;
&lt;li&gt;NUMGT - is greater than the number or date&lt;/li&gt;
&lt;li&gt;NUMGE - is greater than or equal to the number or date&lt;/li&gt;
&lt;li&gt;NUMLT - is less than the number or date&lt;/li&gt;
&lt;li&gt;NUMLE - is less than or equal to the number or date&lt;/li&gt;
&lt;li&gt;NUMBT - is between the two numbers or dates&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;URL Foo&lt;/h2&gt;

&lt;pre&gt;&lt;code&gt;http://127.0.0.1:5984/db_name/_fti?q=foo+bar
http://127.0.0.1:5984/db_name/_fti?q=baz&amp;amp;limit=2
http://127.0.0.1:5984/db_name/_fti?q=homies&amp;amp;skip=19&amp;amp;limit=1
http://127.0.0.1:5984/db_name/_fti?q=no+ide&amp;amp;matching=rough
http://127.0.0.1:5984/db_name/_fti?q=*.**&amp;amp;my_property=NUMLT+2
http://127.0.0.1:5984/db_name/_fti?q=random+doc&amp;amp;prop_awesome=NUMBT+50+100000
http://127.0.0.1:5984/db_name/_fti?q=witty&amp;amp;order=wicked_prop_name+NUMD
http://127.0.0.1:5984/db_name/_fti?q=domain+universe&amp;amp;highlight=html
http://127.0.0.1:5984/db_name/_fti?q=which&amp;amp;skip=2
&lt;/code&gt;&lt;/pre&gt;
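
&lt;p&gt;If you'd rather not encode those query strings by hand, something like this Python 3 sketch should do. The host, database name and &lt;code&gt;_fti&lt;/code&gt; path come from the examples above; &lt;code&gt;fti_url&lt;/code&gt; is a helper made up for this post, not part of HyperCouch.&lt;/p&gt;

```python
from urllib.parse import urlencode

# Base URL taken from the examples above; adjust for your own database.
BASE = "http://127.0.0.1:5984/db_name/_fti"


def fti_url(q, **params):
    """Build a query URL; urlencode turns spaces into plus signs."""
    query = {"q": q}
    query.update(params)
    return BASE + "?" + urlencode(query)


print(fti_url("foo bar"))
print(fti_url("baz", limit=2))
print(fti_url("random doc", prop_awesome="NUMBT 50 100000"))
```

&lt;p&gt;Property filters are just extra keyword arguments, so the operator and its arguments go in as one space separated string.&lt;/p&gt;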

&lt;h2&gt;Returned Structure&lt;/h2&gt;

&lt;p&gt;The returned data should look something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
    "total_rows": 2,
    "rows": [
        {"id": "doc_id", "prop1": "val1", "prop2": "val2"},
        {"id": "doc_id", "prop3": "val_schrodinger"}
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The structure is probably going to have some refinements and there are a few caveats in property names for indexing, but all in all it should be fairly easy to figure out.&lt;/p&gt;

&lt;p&gt;Remember that &lt;code&gt;total_rows&lt;/code&gt; is the number of documents matching the query. At the moment there is no way to get the total number of indexed documents for a given database.&lt;/p&gt;
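
&lt;p&gt;Pulling the pieces back out on the client side is about as simple as it looks. A minimal Python 3 sketch, using the sample response above as a literal string instead of a live request; &lt;code&gt;row_properties&lt;/code&gt; is a name invented for the example:&lt;/p&gt;

```python
import json

# Sample response copied from the structure shown above.
body = """{
    "total_rows": 2,
    "rows": [
        {"id": "doc_id", "prop1": "val1", "prop2": "val2"},
        {"id": "doc_id", "prop3": "val_schrodinger"}
    ]
}"""


def row_properties(row):
    """Everything in a row except the document id."""
    return {key: value for key, value in row.items() if key != "id"}


result = json.loads(body)
print("matching documents:", result["total_rows"])
for row in result["rows"]:
    print(row["id"], row_properties(row))
```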

&lt;h2&gt;That is All&lt;/h2&gt;

&lt;p&gt;Hopefully that's enough of a description to whet your appetite. I'll be adding more features and better error messages as I go along. Hopefully I can trick a few people into using it and sending me feedback to make it better. Like I said, feel free to email &lt;a href="mailto:paul.joseph.davis@gmail.com" title="My email address!"&gt;me&lt;/a&gt; with questions or suggestions.&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2009/01/18/couchdb-lucene-indexing.html</id>
        <title>CouchDB Lucene Indexing</title>
        <link href="http://www.davispj.com/2009/01/18/couchdb-lucene-indexing.html" />
        <updated>2009-01-18T00:00:00-05:00</updated>
        <content type="html">&lt;h1&gt;CouchDB Lucene Indexing&lt;/h1&gt;

&lt;h2&gt;Notice&lt;/h2&gt;

&lt;p&gt;I rewrote &lt;a href="http://github.com/davisp/couchdb-lucene"&gt;couchdb-lucene&lt;/a&gt; pretty thoroughly last night. I decided that it was quick enough that instead of spamming my own blog I'll just spam your feed reader.&lt;/p&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;I updated &lt;a href="http://github.com/davisp/couchdb-lucene"&gt;couchdb-lucene&lt;/a&gt; today to work with trunk. I changed the behavior of the indexer to index views generated as per normal &lt;a href="http://couchdb.apache.org/"&gt;CouchDB&lt;/a&gt; semantics with a few minor constraints.&lt;/p&gt;

&lt;h2&gt;Indexing Strategy&lt;/h2&gt;

&lt;p&gt;The basics now revolve around a &lt;code&gt;_design/lucene&lt;/code&gt; document in your database. Any view defined in this document will be indexed by &lt;a href="http://lucene.apache.org/"&gt;Lucene&lt;/a&gt;. In order to be indexed appropriately you should make sure that all of the &lt;code&gt;emit(key, value)&lt;/code&gt; calls specify &lt;code&gt;doc._id&lt;/code&gt; as the key.&lt;/p&gt;

&lt;p&gt;In the future I plan on adding more configuration options to this document so that indexing can be controlled from Futon etc. At the moment the interaction is limited to just specifying views to be indexed.&lt;/p&gt;

&lt;p&gt;The reasons for changing semantics from any specified views in any design document to all views in a single document are twofold. First, it makes the index reset semantics a lot easier to think about. Second, it allows us to extend the awesomeness of Lucene querying a crap load further.&lt;/p&gt;

&lt;p&gt;For instance, if your &lt;code&gt;_design/lucene&lt;/code&gt; document specifies two views &lt;code&gt;foo&lt;/code&gt; and &lt;code&gt;bar&lt;/code&gt; you can use the standard Lucene syntax like &lt;code&gt;foo:plankton AND bar:goat&lt;/code&gt; to get back the intersection of those two views. In the future I can see adding support for numbers to support numeric range queries. Technically, if you emit string sortable dates, you can already do date ranges.&lt;/p&gt;
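
&lt;p&gt;By "string sortable dates" I mean a format where sorting the strings gives the same order as sorting the dates. A quick Python 3 sketch of the idea, assuming a hypothetical view named &lt;code&gt;when&lt;/code&gt; that emits dates as YYYYMMDD strings; the bracketed range form is standard Lucene query syntax:&lt;/p&gt;

```python
from datetime import date


def sortable(d):
    """Format a date so that string order matches chronological order."""
    return d.strftime("%Y%m%d")


dates = [date(2009, 1, 18), date(2008, 12, 31), date(2009, 1, 2)]
keys = sorted(sortable(d) for d in dates)
print(keys)

# "when" is a hypothetical view name used only for this example.
query = "when:[%s TO %s]" % (sortable(date(2009, 1, 1)), sortable(date(2009, 1, 31)))
print(query)
```

&lt;p&gt;Zero padded year, month and day is what makes this work; something like 1/2/2009 would sort in the wrong place.&lt;/p&gt;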

&lt;h2&gt;Caveats&lt;/h2&gt;

&lt;p&gt;At the moment, Lucene indexes are reset every time the &lt;code&gt;_design/lucene&lt;/code&gt; revision changes. Obviously this is sub-par and will change eventually, but I didn't have the brain power to consider all that was necessary to track when coding that bit.&lt;/p&gt;

&lt;h2&gt;Querying&lt;/h2&gt;

&lt;p&gt;After the rewrite, querying should be a crapload more efficient. I'm caching all of the Lucene objects in an LRU cache so things should stay pretty quick.&lt;/p&gt;

&lt;h2&gt;For Lucene People&lt;/h2&gt;

&lt;p&gt;Right now I'm not using any of the extra fancy features in Lucene and the few conversations I've had about Lucene internals make me realize that I'm probably doing some fairly nasty things. If you know Lucene and have some time please take a look at &lt;code&gt;org.apache.couchdb.lucene.Index&lt;/code&gt; and send me any comments about things I should be doing to make stuff suck less.&lt;/p&gt;

&lt;h2&gt;Java People&lt;/h2&gt;

&lt;p&gt;My Java is probably less than awesome. Any of you out there who feel like helping out with this project, please for the love of god start sending me pull requests on GitHub. That is all.&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2008/12/31/couchdb-expectations.html</id>
        <title>CouchDB Expectations</title>
        <link href="http://www.davispj.com/2008/12/31/couchdb-expectations.html" />
        <updated>2008-12-31T00:00:00-05:00</updated>
        <content type="html">&lt;h1&gt;CouchDB Expectations&lt;/h1&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;Just got finished reading this &lt;a href="http://spyced.blogspot.com/2008/12/couchdb-not-drinking-kool-aid.html"&gt;post&lt;/a&gt; by Jonathan Ellis that brought up some interesting ideas about what CouchDB is and more specifically what it isn't. I started to write a comment trying to explain some of my disagreement and realized I was writing an entire post. So instead of spamming his comments, I figure I'll just post my thoughts in my own little corner.&lt;/p&gt;

&lt;h2&gt;I distribute, therefore I am... distributed?&lt;/h2&gt;

&lt;p&gt;Jonathan's main beef appears to be that CouchDB considers itself a distributed database and he disagrees with that billing. So before I get too far ahead of myself I thought I'd pull up &lt;a href="http://en.wikipedia.org/wiki/Distributed_database"&gt;Wikipedia's&lt;/a&gt; definition:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;A distributed database is a database that is under the control of a central
database management system (DBMS) in which storage devices are not all attached
to a common CPU. It may be stored in multiple computers located in the same
physical location, or may be dispersed over a network of interconnected computers.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;As in all things in life, the true meaning of that definition depends on how you parse it. As a proponent of the "CouchDB is distributed" idea, I would say that we meet the definition in full. Someone arguing against CouchDB being distributed would point to the phrase "under the control of a central database management system". Or to put it another way, with no apparent coordination of a set of CouchDB nodes, is it really a distributed database?&lt;/p&gt;

&lt;h2&gt;The Interwebs?&lt;/h2&gt;

&lt;p&gt;So, to answer whether CouchDB is a distributed database, I'll merely ask, "Is the web a distributed system?" Because really, the answers are one and the same. Some would argue that the web is merely the collection of emergent properties of the underlying systems. I would argue that the web, while having no central coordinating authority, is a distributed system. So, in reality, it's merely, "You say tomato, I say tomato."&lt;/p&gt;

&lt;h2&gt;Less Hand Waving&lt;/h2&gt;

&lt;p&gt;Theoretical hand waving aside, I think I understand Jonathan's concern. CouchDB does not provide users with a method of automatically spreading load amongst a set of physically distinct nodes. No automatic document sharding or re-balancing as nodes enter and leave the system. Yet. This sort of work is planned: it's been discussed, and general methods and algorithms have been proposed on IRC and the mailing lists. The thing is, such features haven't hit the top of the priority queue in terms of their cost/benefit ratio. I think one of Damien Katz's less appreciated traits is that he appears to be extremely focused on developing features in order of the most benefit to the community, instead of in the order of neato-ness. That or we have quite different definitions of neato.&lt;/p&gt;

&lt;h2&gt;CouchDB != RDBMS&lt;/h2&gt;

&lt;p&gt;Lots of people seem to mistake CouchDB for a replacement for an RDBMS. They may try to convince you otherwise by saying things like, "Now, I know CouchDB isn't trying to replace the RDBMS's out there, but..." and then launch into a laundry list of things CouchDB doesn't do. I don't think it's a conscious decision at all. I spent my first three or four months with CouchDB trying to figure out how to bend it to my preconceived notions. It turns out I was bending the wrong thing. It was how I thought of CouchDB that needed to change.&lt;/p&gt;

&lt;p&gt;I have spent a fair amount of time with PostgreSQL. It's awesome. I very much dislike MySQL. It's not awesome. The reasons I like PostgreSQL are all the reasons that I imagine Jonathon is alluding to when he says that you should ask your favorite non-MySQL DBA why those fancy features exist. The thing is, he's also disregarding that a huge part of the market using RDBMS's isn't using these features. He also doesn't mention the fact that all the talk about denormalizing to improve scalability is spitting in the face of these features.&lt;/p&gt;

&lt;p&gt;The most important part of this entire post is the following statement: The only time when CouchDB should be considered as a replacement for an RDBMS is when an RDBMS was the wrong choice in the first place.&lt;/p&gt;

&lt;p&gt;Just as CouchDB is not always the right tool for the job, RDBMS's are also not always the right answer. Now, to be clear, my financial and medical institutions better damn well be using some sort of RDBMS that has all of those fancy features. My blog on the other hand (if it weren't static) does not require materialized views or pivot tables.&lt;/p&gt;

&lt;h2&gt;Other Responses&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Yes, writes to disk are serialized for a given DB on a given CouchDB node. Though it's embarrassingly parallel once sharding is implemented.&lt;/li&gt;
&lt;li&gt;CouchDB databases require compaction to reclaim unused space. Similar in spirit to PostgreSQL's vacuum. This isn't at all a consequence of MVCC though. It has to do with CouchDB's append-only file structure. The append-only file operations make things like reader snapshots trivial in implementation. Could compaction be better? Certainly! And it will be eventually. Patches welcome :D&lt;/li&gt;
&lt;li&gt;The argument that Map/Reduce is hard could be valid. But I don't think it's any harder than SQL. Doing complicated things is generally hard regardless of the paradigm/language.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;Updatifying&lt;/h2&gt;

&lt;p&gt;I wrote this pretty quick so it's probably got errors and what not. I'll be re-reading it and updating over the next day or so.&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2008/11/18/couchdb-from-far-away.html</id>
        <title>CouchDB from Far Away</title>
        <link href="http://www.davispj.com/2008/11/18/couchdb-from-far-away.html" />
        <updated>2008-11-18T00:00:00-05:00</updated>
        <content type="html">&lt;h1&gt;CouchDB from Far Away&lt;/h1&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;I read another blog post about &lt;a href="http://couchdb.apache.org/"&gt;CouchDB&lt;/a&gt; today. I'm not going to link to it. This post isn't intended to be whiny drivel. Instead I thought I'd try and describe CouchDB from the point of view of someone who's been using and developing for it for a couple months. Hopefully I can manage to paint a bit of an overall picture that helps newcomers understand some of the more basic ideas of CouchDB.&lt;/p&gt;

&lt;h2&gt;In a Nutshell&lt;/h2&gt;

&lt;p&gt;Notice the 'Nutshell'. This description is not intended to be complete or detailed. I'm merely trying to paint the overall picture. Moving on.&lt;/p&gt;

&lt;p&gt;CouchDB is centered around two core ideas. A document store and Map/Reduce.&lt;/p&gt;

&lt;h2&gt;Document Store&lt;/h2&gt;

&lt;p&gt;The central document store component is a flat name space of documents. A document is represented in JSON. JSON looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
    "_id": "me",
    "_rev": "234999001",
    "name": "Paul Davis",
    "online": {
        "email": "paul.joseph.davis@gmail.com",
        "blog": "http://www.davispj.com",
        "twitter": "http://twitter.com/davisp"
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That's it. CRUD operations are performed via HTTP using the GET, PUT, POST, and DELETE verbs. It's dead simple.&lt;/p&gt;

&lt;h2&gt;Map/Reduce&lt;/h2&gt;

&lt;p&gt;The second major component is the Map/Reduce framework. For those of you out there who have never heard of Google, a brief description is: Map functions take an input and give an output consisting of Key/Value pairs. A reduce function takes a set of Key/Value pairs as input and produces a single output. Yes. It's that simple. Kind of. (More later).&lt;/p&gt;

&lt;p&gt;A map function looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function(doc)
{
    for(var i in doc.online)
    {
        emit(doc.name, doc.online[i]);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Given the example JSON document from above, it should be clear that this is going to result in three Key/Value pairs. One for each of my online object members. That's it.&lt;/p&gt;

&lt;h2&gt;Using Map&lt;/h2&gt;

&lt;p&gt;These next two sentences are very important, pay close attention: Map output is stored in ascending order according to the Key. Keys can be arbitrary JSON objects over which there is a very specific &lt;a href="http://wiki.apache.org/couchdb/View_collation"&gt;sort order&lt;/a&gt;. Now read those last two sentences three or four more times. Regardless of how many times you read them and think you understand them, chances are you're going to end up in a situation and be taken aback about the crafty ways you can use those two properties.&lt;/p&gt;

&lt;p&gt;Now that you have that idea in your head, go read Christopher Lenz's blog post on &lt;a href="http://www.cmlenz.net/archives/2007/10/couchdb-joins"&gt;CouchDB 'Joins'&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is one of the main ideas behind using CouchDB. Understanding your data and understanding how to sort it in such a way that you can get what you need by taking a slice of that sorted list. More on this later.&lt;/p&gt;
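
&lt;p&gt;To make the "slice of a sorted list" idea concrete, here is a rough Python sketch. It is not how CouchDB stores views. The rows, the &lt;code&gt;view_slice&lt;/code&gt; helper and the key layout are all made up for illustration, and Python's list ordering is only a stand-in for CouchDB's real collation rules.&lt;/p&gt;

```python
import bisect

# Rows as (key, value), kept sorted by key like a view index.
# Python's list ordering is only a stand-in for CouchDB's collation rules.
rows = sorted([
    (["paul", "comment", 2], "second"),
    (["anna", "comment", 1], "hi"),
    (["paul", "comment", 1], "first"),
    (["paul", "profile", 0], "Paul Davis"),
    (["zed", "comment", 1], "yo"),
])


def view_slice(rows, startkey, endkey):
    """Return the rows whose keys fall between startkey and endkey, inclusive."""
    keys = [key for key, _ in rows]
    lo = bisect.bisect_left(keys, startkey)
    hi = bisect.bisect_right(keys, endkey)
    return rows[lo:hi]


# Everything keyed under ["paul", "comment", ...] sits next to each other,
# so one range lookup pulls out exactly those rows.
pauls_comments = view_slice(rows, ["paul", "comment"], ["paul", "comment", 99])
```

&lt;p&gt;Because related rows end up adjacent in the sorted list, picking the right key structure is most of the work. The lookup itself is just a start and an end.&lt;/p&gt;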

&lt;h2&gt;Using Reduce&lt;/h2&gt;

&lt;p&gt;While ignoring the &lt;code&gt;rereduce&lt;/code&gt; parameter for now, a reduce function looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function(keys, values, rereduce)
{
    if(rereduce)
    {
        return sum(values);
    }
    else
    {
        return values.length;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Reductions have quite a few parameters but suffice it to say, this function will give us two pieces of useable information. The total number of online places in our entire database as well as the number of places per unique name. In my experience, Reduce is used to get summary type information which makes a certain amount of obvious sense. They can be used for getting a sum of values as shown, or getting the top N results for each Key emitted by a Map. I won't say much more on the subject because use cases are either very straightforward or entirely too detailed for this post.&lt;/p&gt;
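
&lt;p&gt;If the &lt;code&gt;rereduce&lt;/code&gt; flag looks mysterious, this small Python sketch may help. It mirrors the JavaScript above and pretends the map output was reduced in two separate chunks before being combined. The chunk contents and the way they are split are invented for the example; the real split is up to CouchDB.&lt;/p&gt;

```python
def reduce_count(keys, values, rereduce):
    """Python mirror of the JavaScript reduce function above."""
    if rereduce:
        # values are partial counts returned by earlier reduce calls
        return sum(values)
    # values are raw map output, so count them
    return len(values)


# Map output values, split into two arbitrary chunks.
chunk_a = ["a@example.com", "http://example.com/a"]
chunk_b = ["http://example.com/b", "b@example.com", "http://example.com/c"]

# First pass: reduce each chunk on its own.
partials = [
    reduce_count(None, chunk_a, False),
    reduce_count(None, chunk_b, False),
]

# Second pass: combine the partial results with rereduce set.
total = reduce_count(None, partials, True)
```

&lt;p&gt;The total comes out the same as reducing everything in one go, which is exactly the property a reduce function has to preserve.&lt;/p&gt;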

&lt;h2&gt;Gory Details&lt;/h2&gt;

&lt;p&gt;So we have a Map. We have a Reduce. What &lt;em&gt;can't&lt;/em&gt; we do with these? As it turns out there are two specific types of limits. The first is about determinism: both types of functions must produce identical output for identical input. Reduce functions must also produce identical output regardless of the order of their input, and must be able to operate on their own output. The second type of limitation regards calculating values that require input from multiple documents. More on this shortly.&lt;/p&gt;

&lt;h2&gt;The reason?&lt;/h2&gt;

&lt;p&gt;Map/Reduce functions are calculated incrementally. If you create or update 5 records and access the Map/Reduce view, only 5 records are processed. 0 records means no processing (other than data access). This is a fairly important point people like to confuse. They think that these possibly complex Map/Reduce jobs mean hours of waiting around when in reality, construction of the Map/Reduce view is amortized over every single request made to the server. Shake your SQL stick at them apples (No, caching is not the same thing).&lt;/p&gt;

&lt;h2&gt;Too Restrictive You Say?&lt;/h2&gt;

&lt;p&gt;For anyone who is dejected after reading that last section, think of this next one as your chocolate chip cookie. Consider the semi-recent release of Google's AppEngine. There was a &lt;em&gt;lot&lt;/em&gt; of confusion over how it worked. I mean, how could the world's most hugest computing giant not support count(*), sum(), or return more than 1K records per query? Well it turns out that things in SQL (and other systems) actually don't scale worth shit beyond a single node. And last I heard, Google doesn't run on a single node so my guess is they're probably on to something here.&lt;/p&gt;

&lt;p&gt;By no means is CouchDB trying to directly emulate Google. Many might point at the Map/Reduce framework and scream the big G. To those people, I say go design a multi-node query system that is simple and robust to errors. One of three things will happen: you'll realize Map/Reduce is an excellent match, you'll write something that is hugely overly complicated that everyone hates working with, or you'll have a huge breakthrough and define an entirely new area of distributed computation.&lt;/p&gt;

&lt;h2&gt;Future Feature Fantasies&lt;/h2&gt;

&lt;p&gt;I am not a core contributor. I speak for no one. I guarantee nothing. But this is my list of features I want badly enough in CouchDB that I'm planning on actually implementing them.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extend the Map/Reduce framework to include recently published &lt;a href="http://portal.acm.org/citation.cfm?doid=1247480.1247602"&gt;Map/Reduce/Merge&lt;/a&gt; algorithms for distributed relational data processing. M/R/M provides for the possibility of having incrementally computed joins.&lt;/li&gt;
&lt;li&gt;Transparent physical host spanning with fault tolerance as hosts enter and exit the system at random. This part gets a bit tricky. Off the top of my head, it'd require Paxos, some sort of doc-to-node hashing, a minimum copy count algorithm etc.&lt;/li&gt;
&lt;li&gt;Erlang full text indexing. I've spent a lot of time trying many different solutions for full text indexing. The more I work through the different options, I'm becoming more and more convinced that a pure Erlang implementation is going to be required.&lt;/li&gt;
&lt;li&gt;Erlang Plugins. Goes with the full text indexing but also applies to other types of indexing that I'll be needing. Things like nested containment lists etc. A good system that ends up being even easier than Firefox extension installation is going to be necessary. This is a big infrastructure and planning feature so it'll be mostly reliant on community agreement rather than the code.&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;Kool-Aid&lt;/h2&gt;

&lt;p&gt;If it's not obvious, I've drunk my fair share of the CouchDB Kool-Aid. If you're still a bit uncertain, I suggest you take a peek over at Chris Anderson's excellent blog posts on the &lt;a href="http://jchris.mfdz.com/posts/128"&gt;applications&lt;/a&gt; of the &lt;a href="http://jchris.mfdz.com/posts/129"&gt;future&lt;/a&gt;. In my opinion, this is going to turn out to be a Very Big Deal&amp;trade;.&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2008/10/19/couchdb-view-indexing.html</id>
        <title>CouchDB View Indexing</title>
        <link href="http://www.davispj.com/2008/10/19/couchdb-view-indexing.html" />
        <updated>2008-10-19T00:00:00-04:00</updated>
        <content type="html">&lt;h1&gt;CouchDB View Indexing&lt;/h1&gt;

&lt;h2&gt;Note:&lt;/h2&gt;

&lt;p&gt;This was part of an experiment I tried a while back. There hasn't been any interest in including it in trunk and I don't really think it should be. So this is up for reference, but you probably shouldn't think about using this.&lt;/p&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;This is a fun little patch I wrote today for indexing CouchDB Map/Reduce views. The CouchDB patch allows you to set up a process that will listen for Key/Value pairs being added and removed from a view so that you can keep an external index in sync. The patch also exposes a URL that you can use to query the view.&lt;/p&gt;

&lt;h2&gt;Installation&lt;/h2&gt;

&lt;p&gt;A real quick nothing-goes-wrong type of installation would be along the lines of the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ cd /usr/local/src/temp
$ git clone git://github.com/davisp/couchdb-lucene.git
$ cd couchdb-lucene
$ git checkout -b view_indexer origin/view_indexer
$ ant
$ cd ../
$ git clone git://github.com/davisp/couchdb.git
$ cd couchdb/trunk
$ git checkout -b index_server origin/index_server
$ ./bootstrap &amp;amp;&amp;amp; ./configure &amp;amp;&amp;amp; make
$ sudo make install
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Config file: &lt;code&gt;/usr/local/etc/couchdb/local.ini&lt;/code&gt;&lt;/h2&gt;

&lt;pre&gt;&lt;code&gt;; CouchDB Configuration Settings

[couchdb]
;max_document_size = 4294967296 ; bytes

[httpd]
;port = 5984
;bind_address = 127.0.0.1

[log]
level = debug

[daemons]
index_servers={couch_index_servers, start_link, []}

[index_servers]
fti=/usr/local/src/temp/couchdb-lucene/bin/couchdb-lucene-index

[httpd_db_handlers]
_index = {couch_httpd_view, handle_index_req}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Pay special attention to line 17 (the &lt;code&gt;fti&lt;/code&gt; entry under &lt;code&gt;[index_servers]&lt;/code&gt;) if you changed where you built couchdb-lucene.&lt;/p&gt;

&lt;h2&gt;Example Design Document&lt;/h2&gt;

&lt;pre&gt;&lt;code&gt;{
    "_id": "_design/foo",
    "_rev": "999203757",
    "views": {
        "bar": {
            "map": "function(doc) {emit(doc.id,doc.value);}",
            "index": "fti"
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Querying the View&lt;/h2&gt;

&lt;pre&gt;&lt;code&gt;$ # Generate some data
$ curl 'http://127.0.0.1:5984/dbname/_index/fti/foo/bar?q="my query"'
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Supported Options&lt;/h2&gt;

&lt;p&gt;CouchDB-Lucene View-Indexer supports the following URL parameters:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;q -&gt; Text query to pass to Lucene. Supports the full Lucene syntax (à la org.apache.lucene.queryParser.QueryParser)&lt;/li&gt;
&lt;li&gt;count -&gt; Number of documents to return&lt;/li&gt;
&lt;li&gt;skip -&gt; Number of documents to skip&lt;/li&gt;
&lt;/ol&gt;

</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2008/10/19/couchdb-view-index-protocol.html</id>
        <title>CouchDB View Index Protocol</title>
        <link href="http://www.davispj.com/2008/10/19/couchdb-view-index-protocol.html" />
        <updated>2008-10-19T00:00:00-04:00</updated>
        <content type="html">&lt;h1&gt;CouchDB View Index Protocol&lt;/h1&gt;

&lt;h2&gt;Note:&lt;/h2&gt;

&lt;p&gt;This was part of an experiment I tried a while back. There hasn't been any interest in including it in trunk and I don't really think it should be. So this is up for reference, but you probably shouldn't think about using this.&lt;/p&gt;

&lt;h2&gt;Outline&lt;/h2&gt;

&lt;p&gt;A quick document outlining the line protocol for the couchdb-view-index patch I wrote yesterday.&lt;/p&gt;

&lt;p&gt;There are four types of interactions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reset - Sent when the external process has been grabbed for new processing&lt;/li&gt;
&lt;li&gt;Index - Sent with view rows to add and delete from the index&lt;/li&gt;
&lt;li&gt;Delete - Sent when the view has been reset and things need to start over&lt;/li&gt;
&lt;li&gt;Query - Sent with url query string parameters to query the view.&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;Note&lt;/h2&gt;

&lt;p&gt;This is a line protocol. I have formatted everything here for readability, but in real life all messages and responses (except the query response) must be a single line terminated by a newline character. The query response is discussed below.&lt;/p&gt;

&lt;h2&gt;Reset&lt;/h2&gt;

&lt;p&gt;Message:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{"action": "reset"}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Response:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;true
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Index&lt;/h2&gt;

&lt;p&gt;Message:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
    "action": "index",
    "db": "db_name",
    "group": "_design/foo",
    "views": ["bar", "baz"],
    "current_seq": 508,
    "new_seq": 602,
    "insert": [
        {"docid": "a", "key": 1, "value": "data here"},
        {"docid": "b", "key": 2, "value": null}
    ],
    "remove": [{"docid": "c", "key": 3}]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When implementing an indexer you should process the &lt;code&gt;remove&lt;/code&gt; documents first to match the semantics of CouchDB's internal system.&lt;/p&gt;

&lt;p&gt;Response:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;true
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Delete&lt;/h2&gt;

&lt;p&gt;Message:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
    "action": "delete",
    "db": "foo",
    "group": "_design/bar",
    "current_seq": 6
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Response:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;true
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Query&lt;/h2&gt;

&lt;p&gt;Message:&lt;/p&gt;

&lt;p&gt;Given a url of something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl 'http://127.0.0.1:5984/zing/_index/idx_type/foo/bar?q="my query"&amp;amp;count=19&amp;amp;skip=10'
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Anything passed as a URL parameter will be sent to the index process. You must ensure that all query string parameters are valid JSON that can be decoded by CouchDB's JSON module.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
    "action": "query",
    "db": "zing",
    "group": "_design/foo",
    "view": "bar",
    "query": {
        "q": "my query",
        "count": 19,
        "skip": 10
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Response:&lt;/p&gt;

&lt;p&gt;This is where things get complicated. The query response comes in three stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Initialization:

&lt;ul&gt;
&lt;li&gt;All is well:
  &lt;pre&gt;true&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;Or not well:
  &lt;pre&gt;[404, "missing", "not_found"]&lt;/pre&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Streaming:
&lt;pre&gt;
 {"total_rows": 10, "offset": 8, "rows": [
 {"id": "h_docid", "key": 8, "value": "foo"},
 {"id": "i_docid", "key": 9, "value": "bar"},
 {"id": "j_docid", "key": 10, "value": "baz"}
 ]}
&lt;/pre&gt;
 You can stream any free-form JSON that you want to end up at the client. You should probably support a minimum of the expected view output.&lt;/li&gt;
&lt;li&gt;Termination:
 &lt;pre&gt;\n\0\n&lt;/pre&gt;
 Once you've sent the termination sequence you should not attempt to write anything to stdout until getting the next request.&lt;/li&gt;
&lt;/ol&gt;
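
&lt;p&gt;To tie the four interactions together, here is a rough Python sketch of what an external index process might do with each incoming line. It is not the couchdb-lucene implementation. The &lt;code&gt;MemoryIndexer&lt;/code&gt; class, its in-memory storage and the handling of unknown actions are all assumptions made for the example, and it ignores the &lt;code&gt;views&lt;/code&gt; and &lt;code&gt;q&lt;/code&gt; fields entirely.&lt;/p&gt;

```python
import json


class MemoryIndexer:
    """Toy in-memory indexer; the class and its storage layout are hypothetical."""

    def __init__(self):
        # (db, group) -> {(docid, JSON-encoded key): value}
        self.rows = {}

    def handle(self, line):
        """Take one request line and return the text to write to stdout."""
        msg = json.loads(line)
        action = msg["action"]
        if action == "reset":
            return "true\n"
        bucket = self.rows.setdefault((msg["db"], msg["group"]), {})
        if action == "index":
            # Process removals first, to match CouchDB's internal semantics.
            for row in msg.get("remove", []):
                bucket.pop((row["docid"], json.dumps(row["key"])), None)
            for row in msg.get("insert", []):
                bucket[(row["docid"], json.dumps(row["key"]))] = row["value"]
            return "true\n"
        if action == "delete":
            bucket.clear()
            return "true\n"
        if action == "query":
            skip = msg["query"].get("skip", 0)
            count = msg["query"].get("count", 10)  # default of 10 is an assumption
            hits = sorted(bucket.items())[skip:skip + count]
            rows = [{"id": d, "key": json.loads(k), "value": v} for (d, k), v in hits]
            body = json.dumps({"total_rows": len(bucket), "offset": skip, "rows": rows})
            # Initialization, then the streamed body, then the termination sequence.
            return "true\n" + body + "\n\0\n"
        # Reusing the error shape for unknown actions is an assumption.
        return json.dumps([404, "missing", "not_found"]) + "\n"
```

&lt;p&gt;A real process would wrap this in a loop that reads lines from stdin, writes each returned string to stdout and flushes after every response.&lt;/p&gt;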

</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2008/10/02/couchdb-joins.html</id>
        <title>CouchDB Joins</title>
        <link href="http://www.davispj.com/2008/10/02/couchdb-joins.html" />
        <updated>2008-10-02T00:00:00-04:00</updated>
        <content type="html">&lt;h1&gt;CouchDB Joins&lt;/h1&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;There has been a fairly active &lt;a href="http://mail-archives.apache.org/mod_mbox/incubator-couchdb-user/200810.mbox/%3C2098F155-ECB7-468E-8CA7-8E54F18EE606@groovie.org%3E"&gt;thread&lt;/a&gt; on the couchdb-user list about how to join CouchDB documents. The authoritative documentation on such things so far has been a blog &lt;a href="http://www.cmlenz.net/archives/2007/10/couchdb-joins"&gt;post&lt;/a&gt; by &lt;a href="http://www.cmlenz.net/"&gt;cmlenz&lt;/a&gt;. It's a very good introduction to the power of collation. Unfortunately a couple of issues keep cropping up on the list, generally related to avoiding denormalization in some form or another.&lt;/p&gt;

&lt;p&gt;After thinking about the most recent thread I think I've managed to sit down and create a couple views that would fulfill the offered requirements. For reference, a short description of the setup:&lt;/p&gt;

&lt;h2&gt;The Documents&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;User documents: A unique non-changing document_id with a customizable username.&lt;/li&gt;
&lt;li&gt;Comment documents: Contain a reference to the user document id.&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;Goals&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Get a list of users that have comments&lt;/li&gt;
&lt;li&gt;Get a list of users with 5 comments (sorted on some criteria).&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;General Outline&lt;/h2&gt;

&lt;p&gt;All code available &lt;a href="http://www.davispj.com/git/?p=couchdb-examples.git"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Briefly, the idea here is to generate a Map/Reduce view that will list the users with comments and the number of comments per user. This is pretty easy but necessary to allow us to get the list of users with their top five comments while ignoring users with no comments.&lt;/p&gt;

&lt;p&gt;There are two Map/Reduce views we'll be using, "with-comments" and "max-comments". "with-comments" will be responsible for listing the users with the number of their comments. Users with zero comments will not appear in this view. Paging through this view will allow us to walk through the "max-comments" view getting the data we need to display a listing of users with their top N comments. (Sorry, N is not configurable at query time.)&lt;/p&gt;

&lt;p&gt;I'm using couchview from &lt;a href="http://github.com/jchris/couchrest/tree/master"&gt;couchrest&lt;/a&gt; to manage my views. Hence the '-map' and '-reduce' suffixes on each view name. Both Map/Reduce views are assumed to be placed in a "users" design doc.&lt;/p&gt;

&lt;h2&gt;Generating data&lt;/h2&gt;

&lt;p&gt;First things first, here's a script to load some data into a db named "test" on the local host:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#! /usr/bin/env python

import random
import uuid

import couchdb

server = couchdb.Server('http://localhost:5984')
if 'test' not in server:
    server.create('test')
db = server['test']

def new_uuid():
    return uuid.uuid4().hex.upper()

users = 1000
docs = []
for i in range(users):
    uid = new_uuid()
    docs.append({'_id': uid, 'type': 'user', 'name': 'user-%06d' % i})
    for c in range(random.randint(0,25)):
        cid = new_uuid()
        docs.append({'_id': cid, 'type': 'comment', 'user_id': uid,
                    'position': c, 'text': "Comment %d %d" % (i, c)})

db.update(docs)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So, here we load 1000 users and a random number of comments (0 to 25) for each. It's pretty straightforward. You'll notice it requires the &lt;a href="http://code.google.com/p/couchdb-python/"&gt;couchdb-python&lt;/a&gt; module.&lt;/p&gt;

&lt;h2&gt;Use the group_level, Luke&lt;/h2&gt;

&lt;p&gt;Remember, the default behavior of CouchDB's reduce is to produce a single value. Reduces work by creating a 'shadow' tree on top of the view b+tree. A good description (with pretty pictures) is this &lt;a href="http://horicky.blogspot.com/2008/10/couchdb-implementation.html"&gt;article&lt;/a&gt; by Ricky Ho from Adobe. So in order to produce a reduce result that has multiple rows we must include a group=true or group_level=N query parameter.&lt;/p&gt;

&lt;p&gt;The group=true parameter means that for every unique key, we will get a single reduce value. If we have a view that has keys that are JSON arrays, we can use group_level=N to reduce together the rows whose keys share the same first N elements. I.e., two keys land in the same group when:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;for i in range(N):
    key1[i] == key2[i]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Since each of the reduces below is going to want multiple rows from the reduce functions each will use a group_level=1 to group keys for reduction.&lt;/p&gt;
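
&lt;p&gt;Here's a rough Python sketch of what grouping by key prefix amounts to. It only imitates the idea on an in-memory list. The &lt;code&gt;group_reduce&lt;/code&gt; helper and the sample rows are invented for the example and have nothing to do with how CouchDB walks its b+tree.&lt;/p&gt;

```python
from itertools import groupby


def group_reduce(rows, group_level, reduce_fn):
    """Reduce sorted (key, value) rows grouped by the first group_level key elements."""
    out = []
    for prefix, group in groupby(rows, key=lambda row: row[0][:group_level]):
        out.append((prefix, reduce_fn([value for _, value in group])))
    return out


# Keys shaped like [user_id, comment_id], one row per comment.
rows = sorted([
    (["user-a", "c1"], 1),
    (["user-a", "c2"], 1),
    (["user-b", "c3"], 1),
])

# group_level=1 collapses every row that shares the same user_id.
per_user = group_reduce(rows, 1, sum)
```

&lt;p&gt;With a group level of 1 there is one output row per user, which is the shape both views below rely on.&lt;/p&gt;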

&lt;h2&gt;Users with Number of Comments&lt;/h2&gt;

&lt;p&gt;This view is straightforward. Any comments are emitted, and then reduced to a sum. Notice users with zero comments will not show up in the reduce because only rows for each comment are emit()'ed.&lt;/p&gt;

&lt;h2&gt;with-comments-map.js&lt;/h2&gt;

&lt;pre&gt;&lt;code&gt;function(doc)
{
    if(doc.type == "comment")
    {
        emit([doc.user_id, doc._id], 1);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;with-comments-reduce.js&lt;/h2&gt;

&lt;pre&gt;&lt;code&gt;function(keys, values)
{
    return sum(values) ;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Users with Top N Comments&lt;/h2&gt;

&lt;p&gt;This is where things get a bit complicated.&lt;/p&gt;

&lt;p&gt;The Map portion is fairly straightforward. For each user document we emit a key of [user id, -1], and for each comment a key of [user id, comment id]. The -1 is to indicate that this is a user document. Thinking about it now, this is an artifact of an earlier iteration, but any value that is outside the range of comment ids would be acceptable. (Assuming you change the reduce appropriately.)&lt;/p&gt;

&lt;p&gt;The reduce function is a bit of a wild one. The basic idea is that we're going to take a set of values and condense them into an object that lists the user_id and top N comments. We have to be careful on re-reduce steps that we can merge these summary objects correctly. A more detailed discussion follows the code.&lt;/p&gt;

&lt;p&gt;max-comments-map.js:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function(doc)
{
    if(doc.type == "user")
    {
        emit([doc._id, -1], doc.name);
    }
    else if(doc.type == "comment")
    {
        emit([doc.user_id, doc._id], [doc.position, doc.text]);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;max-comments-reduce.js:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function(keys, values, rereduce)
{
    var MAX = 5 ;
    var obj = null ;

    function sort_comments(a, b)
    {
        return a[0] - b[0] ;
    }

    function add_comment(current, to_add)
    {
        current[current.length] = to_add ;
        current.sort(sort_comments) ;
        log("Current: " + current.toSource()) ;
        return current.slice(0,MAX) ;
    }

    try
    {
        if(rereduce)
        {
            for(var i = 0 ; i &amp;lt; values.length ; i++)
            {
                if(typeof(values[i]) == 'object' &amp;amp;&amp;amp; values[i].type == 'reduction')
                {
                    if(obj == null)
                    {
                        obj = values[i] ;
                    }
                    else
                    {
                        if(obj.user_id == null)
                        {
                            obj.user_id = values[i].user_id ;
                        }

                        for(var j = 0 ; j &amp;lt; values[i].comments.length ; j++)
                        {
                            obj.comments = add_comment(obj.comments, values[i].comments[j]) ;
                        }
                    }
                }
            }
        }

        if(obj == null)
        {
            obj = {"comments": [], 'type': 'reduction', 'user_id': null} ;
        }

        if(keys == null)
        {
            return obj ;
        }

        for(var i = 0 ; i &amp;lt; keys.length ; i++)
        {
            if(typeof(values[i]) == 'object' &amp;amp;&amp;amp; values[i].type == 'reduction')
            {
                continue ;
            }

            if(keys[i][0][1] == -1)
            {
                obj.user_id = values[i] ;
            }
            else
            {
                obj.comments = add_comment(obj.comments, values[i]) ;
            }
        }
    }
    catch(e)
    {
        log("Error: " + obj + " Rereduce: " + rereduce) ;
        log(e.stack) ;
    }

    return obj ;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;First we define a couple of functions for merging lists of comments. This is a fairly constrained function. The returned comments array must not grow quickly. And quickly includes linear relative to the number of documents. I.e., if you just append every comment to this array it most likely will not work. By keeping just a maximum number of rows we're alleviating that concern.&lt;/p&gt;

&lt;p&gt;The second major section (lines 19-45) deals with the case where we're running a rereduce. Rereduce is important because we may be consuming the output of this function. So here we go through and merge all reduction objects we returned earlier. The biggest gotcha in this code is that user_id only exists on one reduction object initially (because we only emit()'ed it once). So we need to be sure we move it up as things are rereduced.&lt;/p&gt;

&lt;p&gt;In the third section (lines 57-73) we are dealing with the original Map output. Ignoring a couple of checks, we're basically testing whether the row was emit()'ed for a user document, then depending on the result either setting the user_id or merging a new comment.&lt;/p&gt;

&lt;p&gt;Hopefully that's all clear.&lt;/p&gt;
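
&lt;p&gt;If it isn't, the heart of the reduce is just a bounded merge. Here's a stripped-down Python sketch of that one idea, leaving out the user_id bookkeeping and the CouchDB calling convention. The &lt;code&gt;merge_top&lt;/code&gt; name and the sample comments are made up for the example.&lt;/p&gt;

```python
MAX = 5


def merge_top(current, incoming, limit=MAX):
    """Merge two lists of [position, text] comments, keeping the lowest positions."""
    return sorted(current + incoming, key=lambda comment: comment[0])[:limit]


# Two partial results, as if two separate reduce calls each saw some comments.
chunk_a = merge_top([], [[7, "h"], [0, "a"], [3, "d"], [9, "j"]])
chunk_b = merge_top([], [[1, "b"], [2, "c"], [8, "i"], [4, "e"]])

# The rereduce step merges partial results and trims back down to MAX entries.
merged = merge_top(chunk_a, chunk_b)
```

&lt;p&gt;No matter how many comments go in, the result never holds more than MAX of them, which is what keeps the reduce value from growing with the number of documents.&lt;/p&gt;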

&lt;h2&gt;Using the Views&lt;/h2&gt;

&lt;p&gt;So the general idea here is to page through the with-comments reduce view and use the first and last user_id to get the data we need to page through the max-comments view.&lt;/p&gt;

&lt;p&gt;Paging through the with-comments view as per normal suggestion, we would start like such:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl 'http://localhost:5984/test/_view/users/with-comments-reduce?group_level=1&amp;amp;count=10'
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Using the first and last row from this query, we have enough information to get the appropriate range in our max-comments view using the &lt;code&gt;startkey&lt;/code&gt; and &lt;code&gt;endkey&lt;/code&gt; query parameters with a GET request like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl 'http://localhost:5984/test/_view/users/max-comments-reduce?group_level=1&amp;amp;startkey=`blah`&amp;amp;endkey=`foo`'
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Keeping in mind that we would have to filter any returned results that had zero comments.&lt;/p&gt;

&lt;h2&gt;Feedback&lt;/h2&gt;

&lt;p&gt;Hopefully this works for people other than me. If you try it out and have questions or comments feel free to email me &lt;a href="mailto:paul.joseph.davis@gmail.com"&gt;here&lt;/a&gt;.&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2008/09/25/introducing-efti.html</id>
        <title>Introducing EFTI - Full Text Indexing in Erlang</title>
        <link href="http://www.davispj.com/2008/09/25/introducing-efti.html" />
        <updated>2008-09-25T00:00:00-04:00</updated>
        <content type="html">&lt;h1&gt;Introducing EFTI - Full Text Indexing in Erlang&lt;/h1&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;I've just posted the first version of the &lt;a href="http://github.com/davisp/efti"&gt;EFTI&lt;/a&gt; (Full Text Indexing in &lt;a href="http://erlang.org"&gt;Erlang&lt;/a&gt;) project on &lt;a href="http://github.com"&gt;github&lt;/a&gt;. It's a first stab/proof of concept project, so don't laugh too hard if you look at it. At the suggestion of &lt;a href="http://jchris.mfdz.com/"&gt;Chris Anderson&lt;/a&gt; I'm sitting down to write a hopefully explanatory post on the design decisions etc.&lt;/p&gt;

&lt;h2&gt;Indexing&lt;/h2&gt;

&lt;p&gt;The indexing is based on an &lt;a href="http://en.wikipedia.org/wiki/Inverted_index"&gt;inverted index&lt;/a&gt; using &lt;a href="http://incubator.apache.org/couchdb/"&gt;CouchDB's&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/B-tree"&gt;b-tree&lt;/a&gt; implementation. I chose using the b-tree code instead of &lt;a href="http://www.erlang.org/doc/apps/mnesia/index.html"&gt;Mnesia&lt;/a&gt; due to the fact that I'm aiming to integrate this into CouchDB as a plugin indexer (The actual plugin framework doesn't exist yet, but &lt;a href="http://damienkatz.net/"&gt;Damien Katz&lt;/a&gt; is working on the necessary refactoring to make this a breeze).&lt;/p&gt;

&lt;h2&gt;Index Steps&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Tokenize the input document&lt;/li&gt;
&lt;li&gt;Apply processor list to each token in the document&lt;/li&gt;
&lt;li&gt;Store each processed token in the inverted index, along with the list of positions at which it occurs.&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;The tokenizer and list of processors applied to the input document are both configurable. Currently I have two tokenizers that support either splitting tokens on any whitespace element or splitting tokens on any character not provided in a custom alphabet.&lt;/p&gt;

&lt;p&gt;The processors that I've been using include a length threshold, an ignore-words list, and Porter &lt;a href="http://en.wikipedia.org/wiki/Stemming"&gt;stemming&lt;/a&gt;. I borrowed the Erlang Porter stemmer implementation from &lt;a href="mailto:alden.dima@nist.gov"&gt;Alden Dima&lt;/a&gt;, which can be found &lt;a href="http://tartarus.org/~martin/PorterStemmer/"&gt;here&lt;/a&gt;.&lt;/p&gt;
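
&lt;p&gt;To make the three index steps a bit more concrete, here's a rough sketch in Python. This is not EFTI's Erlang code: the function names, the tiny ignore list, and the choice to record token offsets as "positions" are all assumptions made up for illustration, and stemming is left out entirely.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import defaultdict

STOP_WORDS = {"the", "and", "for"}  # hypothetical ignore-words list


def tokenize(text):
    # Whitespace tokenizer: split on any run of whitespace.
    return text.split()


def word_length(min_length):
    # Processor factory: drop tokens shorter than min_length characters.
    def process(token):
        return token if len(token) not in range(min_length) else None
    return process


def ignore_words(token):
    # Processor: drop tokens found in the ignore list.
    return None if token in STOP_WORDS else token


def index_document(index, doc_id, text, processors):
    # For every token that survives the processors, record the token
    # offsets (assumed meaning of "positions") it occupies in doc_id.
    for position, token in enumerate(tokenize(text.lower())):
        for process in processors:
            token = process(token)
            if token is None:
                break
        else:
            index[token][doc_id].append(position)


# Inverted index: token, then document id, then list of positions.
index = defaultdict(lambda: defaultdict(list))
processors = [word_length(3), ignore_words]
index_document(index, "doc1", "the quick fox and the quick dog", processors)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A processor here is just a function that returns either a (possibly rewritten) token or None to drop it, which is one simple way to keep the processor list configurable.&lt;/p&gt;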

&lt;p&gt;An &lt;a href="http://github.com/davisp/efti/tree/master/index.conf"&gt;example index configuration&lt;/a&gt; is available.&lt;/p&gt;

&lt;h2&gt;Querying&lt;/h2&gt;

&lt;p&gt;Querying is a bit more complex. First, an example configuration so we have some idea what we're talking about.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{config, reader, nil}.
{runner, plain, nil}.
{tokenizer, whitespace, nil}.
{processor, word_length, 3}.
{processor, stemmer, {en_us, nil}}.
{matcher, trigram, {0.5, 1.0}}.
{rank, hit_count, nil}.
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Query Steps&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Parse the input query into clauses&lt;/li&gt;
&lt;li&gt;For each clause:

&lt;ol&gt;
&lt;li&gt;Tokenize&lt;/li&gt;
&lt;li&gt;Apply list of processors&lt;/li&gt;
&lt;li&gt;Match processed tokens to database&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Combine matches to the database&lt;/li&gt;
&lt;li&gt;Rank results&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;Referencing our example config above, we can pretty much see how these things might be configured. The only thing that's a bit tricky is the fact that the runner is responsible for both the clause and the combine steps. This is to eventually support boolean logic in queries. The code that splits the query should be associated with the code that combines the results etc.&lt;/p&gt;
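
&lt;p&gt;Here's a toy Python sketch of the query steps, mostly to show why the runner owns both the split and the combine. It is not EFTI's Erlang code: the PlainRunner class, its comma-based clause splitting, the exact-match lookup (standing in for the trigram matcher), and the sample index are all assumptions made up for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import Counter

# Toy inverted index: token, then document id, then positions (made-up data).
INDEX = {
    "quick": {"doc1": [1, 5], "doc2": [0]},
    "fox": {"doc1": [2]},
    "dog": {"doc2": [3]},
}


class PlainRunner:
    # Hypothetical runner: it splits the query into clauses and later
    # combines the per-clause matches, so both ends stay in one place.
    def split(self, query):
        return [c.strip() for c in query.split(",") if c.strip()]

    def combine(self, per_clause_hits):
        total = Counter()
        for hits in per_clause_hits:
            total.update(hits)
        return total


def match_clause(clause, processors):
    # Tokenize, apply processors, then look each token up (exact match).
    hits = Counter()
    for token in clause.lower().split():
        for process in processors:
            token = process(token)
            if token is None:
                break
        else:
            for doc_id, positions in INDEX.get(token, {}).items():
                hits[doc_id] += len(positions)
    return hits


def run_query(query, runner, processors):
    clauses = runner.split(query)
    combined = runner.combine(match_clause(c, processors) for c in clauses)
    # Rank by hit count, highest first; ties are broken by document id.
    return sorted(combined, key=lambda doc_id: (-combined[doc_id], doc_id))


results = run_query("quick fox, dog", PlainRunner(), processors=[])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Swapping in a runner that understands boolean operators would only mean replacing split and combine together, which is the point of tying them to one component.&lt;/p&gt;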

&lt;h2&gt;Notes/Caveats&lt;/h2&gt;

&lt;p&gt;Obviously there's a lot of work left: adding support for more powerful processors, tokenizers, runners, matchers, and rankers, as well as optimizing the common internals.&lt;/p&gt;

&lt;p&gt;Also, I need to work on adding things in like OTP gen_server stuffs. Right now it's serial execution. Yes. Serial in Erlang. I don't have much experience with the OTP stuff and what not, but that's on the agenda after a few more tweaks.&lt;/p&gt;

&lt;p&gt;Also, the build system blows. Deal with it. I'll get around to making actual driver programs before too long.&lt;/p&gt;

&lt;h2&gt;Request for Comment (Plea for Help)&lt;/h2&gt;

&lt;p&gt;If you look at &lt;a href="http://github.com/davisp/efti/tree/master/src/efti/efti_query.erl"&gt;efti_query.erl&lt;/a&gt; you'll notice that it would be prohibitively expensive for a large dataset. As it is now all results are stored into a dict keyed by {document id, query_word, index_word} with position information as a value. This will eventually blow up trying to store an entire result set in memory I imagine.&lt;/p&gt;

&lt;p&gt;Thing is, I have no idea how people stream and/or rank results without doing something similar. Obviously, I could just hold the document ids in memory and go back to disk for position information if/when needed, but that just delays the inevitable point where too many document ids are stored in memory.&lt;/p&gt;

&lt;p&gt;If anyone out there has any idea on how to design this part of the system to Not Suck &amp;trade; I would be much obliged. Keep in mind that ideally we would be able to stream and seek through the result set to make things down the road more efficient. (Think index set math in CouchDB)&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2008/09/14/broken-bioinformatics.html</id>
        <title>Broken Bioinformatics Formats. JSON FTW.</title>
        <link href="http://www.davispj.com/2008/09/14/broken-bioinformatics.html" />
        <updated>2008-09-14T00:00:00-04:00</updated>
        <content type="html">&lt;h1&gt;Broken Bioinformatics Formats. JSON FTW.&lt;/h1&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;As a recently anointed bioinformaticist I have to come out and say it. Bioinformatics has some fucking shit file formats. As an entire discipline we need to think about the future and move away from the horrible state in which data is regularly produced and consumed by software. My vote is for JSON. Hopefully I can convince a few brave souls to jump on board.&lt;/p&gt;

&lt;h2&gt;Broken file formats&lt;/h2&gt;

&lt;p&gt;Current formats are broken. If you don't believe me, first go write a &lt;a href="ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt"&gt;GenBank&lt;/a&gt; parser and then read the three specifications of the General Feature Format (&lt;a href="http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml"&gt;GFF&lt;/a&gt;). And those are just the tip of the iceberg. If that hasn't irritated the crap out of you, write a &lt;a href="http://blast.ncbi.nlm.nih.gov/Blast.cgi"&gt;Blast&lt;/a&gt; result parser. I'll even be nice and suggest the XML output. By now I should have the majority convinced. And if not, well then maybe I should think about a different career before I commit &lt;a href="http://en.wikipedia.org/wiki/Seppuku"&gt;seppuku&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Badly designed formats (Now, not then)&lt;/h2&gt;

&lt;p&gt;Let's face it. Each of these formats was poorly designed. And I say this with all possible respect, but each of these formats was designed with human consumption in mind. And that's just wrong. What's that you say? These formats were designed years ago when they were viewed directly by humans? I don't fucking care. This is the world of today, not 10 or 15 years ago. Scientists don't normally view results in textual formats any more. They view them in terms of markup or graphical displays from some desktop application.&lt;/p&gt;

&lt;h2&gt;Recent pushes towards XML&lt;/h2&gt;

&lt;p&gt;I've noticed that EBI appears to be doing a lot of work on distributing &lt;a href="http://en.wikipedia.org/wiki/XML"&gt;XML&lt;/a&gt;. XML sucks. I used to like XML. Now it just irritates the crap out of me. It all started when I tried researching &lt;a href="http://en.wikipedia.org/wiki/Document_Type_Definition"&gt;DTD&lt;/a&gt;, &lt;a href="http://relaxng.org/"&gt;RelaxNG&lt;/a&gt;, and &lt;a href="http://www.w3.org/XML/Schema"&gt;XSD&lt;/a&gt;. Go read up on those a bit. They'll make you want to punch something. And it's not that they're intrinsically bad at what they do. It's that what they do is intrinsically bad (for applications in biology, more on that in another post).&lt;/p&gt;

&lt;p&gt;Fact is though, XML has failed to take hold. It's overly complicated to deal with for lots of the small tasks that are currently common for lots of bioinformaticists. We live in a land of one-off scripts for testing random ideas. Quick development to test approaches etc. Sitting down and writing a decent parser for each highly repetitive yet slightly different task gets old quick.&lt;/p&gt;

&lt;p&gt;XML is just too heavy for bioinformatics.&lt;/p&gt;

&lt;h2&gt;Libraries! Use the libraries!&lt;/h2&gt;

&lt;p&gt;This is a misnomer. Why not hide the ugliness of the data behind a nice library interface? I'm a purist, and throwing the bathroom mat over the puddle of vomit doesn't change the fact you had one or ten too many tequila shots. This only works as long as the libraries exist. And these huge lumbering libraries just don't keep up with every language. I mean, how many BioErlang or BioLisp (I should probably google those, but let's just assume they don't exist for now...) libraries have you evaluated lately?&lt;/p&gt;

&lt;p&gt;Think how much better our libraries would be if we didn't have to deal with these horrible data formats. Think about how using a standardized data format would allow any tool in any language to communicate without forcing data through some obtuse outdated format.&lt;/p&gt;

&lt;h2&gt;JSON - No seriously. JSON&lt;/h2&gt;

&lt;p&gt;I used to think &lt;a href="http://www.json.org/"&gt;JSON&lt;/a&gt; was a weird little cousin of the other markup languages. Turns out, it is the weird little cousin of the markup languages but in a slightly less deformed way. It's a dead simple format that has bindings in practically every language. It's a public specification, so for those languages that don't have bindings, writing new ones would be fairly straightforward.&lt;/p&gt;

&lt;p&gt;Seriously. Those of you feeling underwhelmed think a bit harder on it. Go toy with it in your language of choice. Keep in mind all the brilliant possibilities that having a common file format would open up. Imagine how easy it'd be to design large customizable pipelines for shuffling JSON documents around.&lt;/p&gt;
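
&lt;p&gt;If you want something to toy with, here's a minimal Python sketch using only the standard json module. The record is made up and the field names are purely illustrative, not a proposed standard.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json

# A made-up sequence feature; the field names are illustrative only.
feature = {
    "seqid": "chr1",
    "type": "gene",
    "start": 1300,
    "end": 9000,
    "strand": "+",
    "attributes": {"ID": "gene00001", "Name": "EDEN"},
}

# One JSON document per line makes it easy to chain tools together.
line = json.dumps(feature, sort_keys=True)

# Any downstream tool with a JSON library can read the record back
# without a format-specific parser.
decoded = json.loads(line)
length = decoded["end"] - decoded["start"] + 1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The round trip is the whole trick: the producer and the consumer only have to agree on field names, not on a parser.&lt;/p&gt;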

&lt;p&gt;Obviously I'm no dreamer. Getting JSON to actually take hold would be incredibly difficult. Official committees and standards bodies would be involved. But I think if we poke around at the idea and start with a few people designing JSON-dependent libraries and tools, we can spread JSON like a virus through the bioinformatics world.&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2008/09/14/python-in-bioinformatics.html</id>
        <title>Python in Bioinformatics</title>
        <link href="http://www.davispj.com/2008/09/14/python-in-bioinformatics.html" />
        <updated>2008-09-14T00:00:00-04:00</updated>
        <content type="html">&lt;h1&gt;Python in Bioinformatics&lt;/h1&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;Just read another interesting &lt;a href="http://ivory.idyll.org/blog/sep-08/the-future-of-bioinformatics-part-1a.html"&gt;article&lt;/a&gt; by &lt;a href="http://ivory.idyll.org/"&gt;Titus Brown&lt;/a&gt; referencing an earlier &lt;a href="http://igotgenes.blogspot.com/2008/08/not-biopythonista-i-thought-id-be.html"&gt;post&lt;/a&gt; I'd read that lamented the fact that Python hasn't taken over the bioinformatics world. I'd spent some time reflecting on why Perl seems to have gained such dominance in the world of bioinformatics.&lt;/p&gt;

&lt;p&gt;Titus argued that a lot of it had to do with Lincoln Stein getting things rolling early on in the life of BioPerl. First I have to admit that I don't know a whole lot about the history of the rise of any of the Bio* projects. But I can't say that I see one person being the main cause for such an industrial/social rise of BioPerl in particular.&lt;/p&gt;

&lt;p&gt;Titus does make an observation I agree with though:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;However, I think the tide is shifting away from Perl: from the not-so-imminent release of a complex, backwardsly-incompatible Perl 6, to the massive quantities of completely non-reusable Perl code that have been flung in every direction, people are starting to get sick of Perl. also, a lot of people in academia are moving towards Python for bioinformatics, if not in a very coordinated way&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;This observation (although indirect) is at the very core of my theory on why BioPerl rose to dominance.&lt;/p&gt;

&lt;h2&gt;Namespacing&lt;/h2&gt;

&lt;p&gt;If you think about it the ideas behind namespacing in Perl and pretty much any other language are quite different. Perl uses an &lt;em&gt;ad hoc&lt;/em&gt; namespacing system that allows anyone to contribute to particular areas of a given 'project'.&lt;/p&gt;

&lt;p&gt;Or think of it this way. For me to have code accepted into any of the other Bio* projects, I have to go through the rigamarole of submitting patches to core developers. This process involves not only writing the code, but navigating project conventions, politics, and other random hurdles.&lt;/p&gt;

&lt;p&gt;To contribute to BioPerl, I write some code and upload it to CPAN.&lt;/p&gt;

&lt;p&gt;Without devolving into a rant on why I hate Perl and this model, I'll just end it there. My theory on BioPerl's gargantuan size has more to do with its easier conglomeration of subprojects into the overall Bio::* namespace.&lt;/p&gt;

&lt;h2&gt;Brief note&lt;/h2&gt;

&lt;p&gt;James Casbon mentioned that Python supports namespace packages to allow a similar type of developmental style. I knew of these via paste and they irritate me to no end, but it's a fair point.&lt;/p&gt;

&lt;p&gt;Two things to note though. The first mention of namespaces in the setuptools change log is version 0.5a9. Googling for "setuptools 0.5a9" returns results from 2005. I'd be willing to say BioPerl was dominant before 2005 and perhaps even more dominant than it is now.&lt;/p&gt;

&lt;p&gt;Secondly, the docs on namespace packaging highlight the sad state of Python package management. But that's a whole different story.&lt;/p&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2008/08/31/webby-post-commit.html</id>
        <title>Webby Post-Commit Hook for SVN</title>
        <link href="http://www.davispj.com/2008/08/31/webby-post-commit.html" />
        <updated>2008-08-31T00:00:00-04:00</updated>
        <content type="html">&lt;h1&gt;Webby Post-Commit Hook for SVN&lt;/h1&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;Just finished setting up a post-commit hook for &lt;a href="http://subversion.tigris.org/"&gt;Subversion&lt;/a&gt; and thought I'd post it. It's fairly simple. It does emailing etc. when things break.&lt;/p&gt;

&lt;p&gt;Basically, it just listens to svn commits and checks to see if the specified webby directory was modified, and if so, clobber-builds the site and copies the built version to your web root.&lt;/p&gt;

&lt;p&gt;It's not very fantastic, but it appears to get the job done.&lt;/p&gt;

&lt;p&gt;Download the script &lt;a href="http://www.davispj.com/git/?p=webby-svn-hooks.git"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#! /usr/bin/env python

import os
import sys
import subprocess as sp
import traceback

# Arguments for post-commit
PATH = sys.argv[1]
REV = sys.argv[2]

# Email Setup
SENDMAIL = "/usr/sbin/sendmail"
RECIPIENTS = ["paul.joseph.davis@gmail.com"]
FROM = "davisp@cavernum.net"

# SVN Setup
SVN = "/usr/bin/svn"
SVNLOOK = "/usr/bin/svnlook"
WATCH = "www/root"

# Website Setup
WEB_ROOT = "/var/www/davispj.com/public"

# Webby Config
WEBBY = "/var/lib/gems/1.8/bin/webby"
WEBBY_ROOT = "/var/www/davispj.com/root"

# System Utils
COPY = "/bin/cp"

class SPError(Exception):
    def __init__(self, command, returncode, stdout, stderr):
        self.command = command
        self.returncode = returncode
        self.stdout = stdout
        self.stderr = stderr
    def __str__(self):
        return repr(self)
    def __repr__(self):
        mesg = """
            COMMAND: %s
            RETURNCODE: %s
            STDOUT:
            %s
            STDERR:
            %s
        """
        mesg = '\n'.join([t.lstrip() for t in mesg.strip().split('\n')])
        return mesg % (self.command, self.returncode, self.stdout, self.stderr)

def spcall(command, shell=False, input=None):
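    # universal_newlines=True makes communicate() exchange text rather than bytes.
    pipe = sp.Popen(command, shell=shell, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE, universal_newlines=True)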
    (stdout, stderr) = pipe.communicate(input=input)
    if pipe.returncode != 0:
        if isinstance(command, list):
            command = ' '.join(command)
        raise SPError(command, pipe.returncode, stdout, stderr)
    return (stdout, stderr)

def email(subj, mesg):
    fmt = """
        TO: %s
        FROM: %s
        SUBJECT: %s

        %s
    """
    fmt = '\n'.join([t.lstrip() for t in fmt.strip().split('\n')])
    content = fmt % (','.join(RECIPIENTS), FROM, subj, mesg)
    command = [SENDMAIL]
    command.extend(RECIPIENTS)
    spcall(command, input=content)

def site_changed():
    (stdout, stderr) = spcall([SVNLOOK, "dirs-changed", PATH])
    for line in stdout.split('\n'):
        if line.startswith(WATCH):
            return True
    return False

def main():
    try:
        if not site_changed():
            print("No change.")
            # Nothing to rebuild; return quietly instead of raising SystemExit.
            return
        os.chdir(WEBBY_ROOT)
        spcall([SVN, 'up'])
        spcall([WEBBY, 'clobber'])
        spcall([WEBBY, 'build'])
        # shell=True so the shell expands the '*' glob for cp.
        spcall(' '.join([COPY, '-r', os.path.join(WEBBY_ROOT, 'output', '*'), WEB_ROOT]), shell=True)
    except SPError as inst:
        email("Post-Commit Subprocess Failure", repr(inst))
    except Exception:
        mesg = traceback.format_exc()
        email("Post-Commit Unknown Failure", mesg)

if __name__ == '__main__':
    main()
&lt;/code&gt;&lt;/pre&gt;
</content>
    </entry>
    
    <entry>
        <id>http://www.davispj.com/2008/08/29/first-post.html</id>
        <title>First Post</title>
        <link href="http://www.davispj.com/2008/08/29/first-post.html" />
        <updated>2008-08-29T00:00:00-04:00</updated>
        <content type="html">&lt;h1&gt;First Post&lt;/h1&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;So. This is it. I'm blogging. Yay me being Web 2.0 compliant.&lt;/p&gt;

&lt;p&gt;This is mainly a place for me to post writings on work. Work = programming. By induction... fuck it. I'm gonna write about programming stuffs. Things that make me giggle and yell and kick things. So, that's where we're going. Giggling psychotic. If you'll bear with me long enough, we'll eventually get to the technical stuff.&lt;/p&gt;

&lt;h2&gt;The technical stuff&lt;/h2&gt;

&lt;p&gt;So for those of you that care, I've built this blog around &lt;a href="http://webby.rubyforge.org"&gt;Webby&lt;/a&gt;, which is an awesome little tool for generating static content sites. &amp;lt;rant&amp;gt; Notice I said static content sites. No comments on this blog. Comments are a &lt;a href="http://xkcd.com/202/"&gt;waste&lt;/a&gt;. If you want to comment on something, go watch a &lt;a href="http://www.youtube.com"&gt;video&lt;/a&gt; or start your own damn &lt;a href="http://www.yourdamnblog.com"&gt;blog&lt;/a&gt;. &amp;lt;/rant&amp;gt;&lt;/p&gt;

&lt;p&gt;Basically, that's all there is. I bask in its simplicity. KISS and all that stuff. It was a long journey through the world of Python web frameworks (including a couple obligatory frameworks of my own). Turns out I don't need to have dynamicity in a simple little blog. So, I'll leave you feeling wholly unsatisfied with my first technical stuffs content.&lt;/p&gt;

&lt;p&gt;Cheers!&lt;/p&gt;
</content>
    </entry>
    

</feed>
