Posts tagged "clojure":

17 Nov 2011

Hack OTB on the Clojure REPL

The WHY and the WHAT

Lately, I have been playing with several configurations in order to be able to fully exploit the ORFEO Toolbox library from Clojure.

As explained in previous posts, I am using OTB through the Java bindings. Recently, the folks of the OTB team have made very cool things – the application wrappers – which enrich even further the ways of using OTB from third party applications. The idea of the application wrappers is that when you code a processing chain, you may want to use it on the command line, with a GUI or even as a class within another application.

What Julien and the others have done is to build an application engine which, given a pipeline and its description (inputs, outputs and parameters), is able to generate a command line interface and a Qt-based GUI. But once you are there, there is no major problem (well, if your name is Julien, at least) in generating a SWIG interface, which then makes it possible to build bindings for the different languages supported by SWIG. This may seem a kitchen sink – usine à gaz in French – and actually it is if you don't pay attention (like me).

The application wrappers use advanced techniques which require a recent version of GCC. At least recent from a Debian point of view (GCC 4.5 or later). On the other hand, the bindings for the library (the OTB-Wrapping package) use CableSwig, which in turn uses its own SWIG interfaces. So if you want to be able to combine the application wrappers with the library bindings, odds are that you may end up with a little mess in terms of compiler versions, etc.

And this is exactly what happened to me when I had to build GCC from source in Debian Squeeze (no GCC 4.5 in the backports yet). So I started making some experiments with virtual machines in order to understand what was happening. Eventually I found that the simplest configuration was the best to sort things out. This is the HOW-TO for getting a working installation of OTB with the application wrappers and OTB-Wrapping, and for accessing everything from a Clojure REPL.

The HOW-TO

Installing Arch Linux on a virtual machine

I have chosen Arch Linux because in a few steps you can have a minimal Linux install. Just grab the net-install ISO and create a new virtual machine on VirtualBox. In terms of packages just get the minimal set (the default selection, don't add anything else). After the install, log in as root, and perform a system update with:

    pacman -Suy

If you used the net-install image, there should be no need for this step, but in case you used an ISO with the full system, it's better to make sure that your system is up to date. Then, create a user account and set up a password if you want to:

    useradd -m jordi
    passwd jordi

Again, this is not mandatory, but things are cleaner this way, and you may want to keep on using this virtual machine. Don't forget to add your newly created user to the sudoers file:

    pacman -S sudo
    vi /etc/sudoers
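The post leaves the actual sudoers line to the reader. Here is a minimal sketch, assuming the account is the jordi user created above and that a blanket rule is acceptable on a throwaway virtual machine. The rule is written to a scratch file and only syntax-checked there; as root, the same line would then be added to /etc/sudoers, preferably through visudo, which refuses to save a broken file.

```shell
user=jordi                      # assumption: the account created above
rule="$user ALL=(ALL) ALL"      # lets that user run any command via sudo

# Write the rule to a scratch file instead of touching /etc/sudoers.
candidate=$(mktemp)
printf '%s\n' "$rule" > "$candidate"

# visudo -c -f checks a file for syntax errors without installing it.
# The check is skipped when sudo is not installed yet.
if command -v visudo >/dev/null 2>&1; then
    visudo -c -f "$candidate" || echo "sudoers rule rejected" >&2
fi

cat "$candidate"
```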

You can now log out and log in with the user name you created. Oh, yes, I forgot to point out that you don't have Xorg, or Gnome or anything graphical. Even the mouse is useless, but since you are a real hacker, you don't mind.

Install the packages you need

Now the fun starts. You are going to keep things minimal and only install the things you really need.

    sudo pacman -S mercurial make cmake gcc gdal openjdk6 \
                   mesa swig cvs bison links unzip rlwrap

Yes, I know, Mesa (OpenGL) is useless without X, but we want to check that everything builds OK and OTB uses OpenGL for displaying images.

Mercurial is needed to get the OTB sources, and cvs to get CableSwig.
Make, cmake and gcc are the tools for building from sources.
Swig is needed for the binding generation, and bison is needed by CableSwig.
OpenJDK is our Java version of choice.
Links will be used to grab the Clojure distro, unzip to extract the corresponding jar file, and rlwrap is used by the Clojure REPL.

And that's all!

Get the source code for OTB and friends

I prefer using the development version of OTB, so I can complain when things don't work.

    hg clone http://hg.orfeo-toolbox.org/OTB
    hg clone http://hg.orfeo-toolbox.org/OTB-Wrapping

CableSwig is only available through CVS.

    cvs -d :pserver:anonymous@public.kitware.com:/cvsroot/CableSwig \
            co CableSwig

Start building everything

We will compile everything in a separate directory:

    mkdir builds
    cd builds/

OTB and OTB-Wrapping

We create a directory for the OTB build and configure with CMake:

    mkdir OTB
    cd OTB
    ccmake ../../OTB

Don't forget to set to ON the application build and the Java wrappers. Then just make (literally):

    make

By the way, I have noticed that the compilation of the application engine can blow up your gcc if you don't allocate enough RAM for your virtual machine. At this point, you should be able to use the Java application wrappers. But we also want the library bindings, so we go on. We can now build CableSwig, which will be needed by OTB-Wrapping. Same procedure as before:

    cd ../
    mkdir CableSwig
    cd CableSwig/
    ccmake ../../CableSwig/
    make

And now, OTB-Wrapping. Same story:

    cd ../
    mkdir OTB-Wrapping
    cd OTB-Wrapping/
    ccmake ../../OTB-Wrapping/

In the CMake configuration, I choose to build only Java, but even in this case a Python interpreter is needed. I think that CableSwig needs it to generate the XML code that represents the C++ class hierarchies. If you did not install Python explicitly in Arch, you will have by default a 2.7 version. This is OK. If you decided to install Python with pacman, you will have both Python 2.7 and 3.2, and the default Python executable will point to the latter. In this case, don't forget to set PYTHON_EXECUTABLE in CMake to /usr/bin/python2. Then, just make and cd to your home directory.

    make
    cd
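The interpreter choice described for OTB-Wrapping can be checked from the shell before configuring. This is only a sketch: pick_python is a hypothetical helper, not something shipped with OTB or CableSwig, and the script prints the cmake command rather than running it. The -D option is the standard CMake way of presetting a cache entry such as PYTHON_EXECUTABLE.

```shell
# pick_python is a hypothetical helper (not part of OTB or CableSwig).
# It prints the path of a Python 2 interpreter: python2 if it exists,
# otherwise python when that one reports a 2.x version.
pick_python() {
    if command -v python2 >/dev/null 2>&1; then
        command -v python2
    elif command -v python >/dev/null 2>&1 &&
         python --version 2>&1 | grep -q '^Python 2\.'; then
        command -v python
    else
        return 1
    fi
}

if py=$(pick_python); then
    # Printed, not executed: run it from builds/OTB-Wrapping.
    echo "cmake -DPYTHON_EXECUTABLE=$py ../../OTB-Wrapping/"
else
    echo "no Python 2 interpreter found" >&2
fi
```

Running ccmake afterwards in the same build directory should show the preset value, so the remaining options can still be set interactively.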

And you are done. Well not really. Right now, you can do Java, but what's the point? You might as well use the C++ version, right?

Land of Lisp

Since Lisp is the best language out there, and OTB is the best remote sensing image processing software (no reference here, but trust me), we'll do OTB in Lisp.

PWOB

You may want to get some examples that I have gathered on Bitbucket.

    mkdir Dev
    cd Dev/
    hg clone http://bitbucket.org/inglada/pwob

PWOB stands for Playing With OTB Bindings. I have only put there 3 languages which run on the JVM for reasons I stated in a previous post. You will of course avoid the plain Java ones. I have mixed feelings about Scala. I definitely love Clojure since it is a Lisp.

Get Clojure

The cool thing about this Lisp implementation is that it is contained in a jar file. You can get it with the text-based web browser links:

    links http://clojure.org

Go to the download page and grab the latest release. It is a zip file which contains, among other things, the needed jar. You can unzip the file:

    mkdir src
    mv clojure-1.3.0.zip src/
    cd src/
    unzip clojure-1.3.0.zip

Copy the jar file to the user .clojure dir:

    cd clojure-1.3.0
    mkdir ~/.clojure
    mv clojure-1.3.0.jar ~/.clojure/

Make a symlink so we have a clojure.jar:

    ln -s ~/.clojure/clojure-1.3.0.jar ~/.clojure/clojure.jar
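A variation on the link step, as a sketch rather than a required change: when the link target is given relative to the directory holding the link, the link does not depend on the account name or on where the home directory lives. The example below works in a scratch directory with an empty stand-in file, so nothing under your real ~/.clojure is touched.

```shell
# Scratch directory standing in for $HOME, so the real ~/.clojure is untouched.
clj_dir=$(mktemp -d)/.clojure
mkdir -p "$clj_dir"
touch "$clj_dir/clojure-1.3.0.jar"      # empty stand-in for the real jar

# Create the link from inside the directory, with a relative target.
(cd "$clj_dir" && ln -s clojure-1.3.0.jar clojure.jar)

# Show where the link points.
readlink "$clj_dir/clojure.jar"
```

When a newer release comes out, only the link has to be recreated; anything that refers to clojure.jar keeps working.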

And clean up useless things:

    cd ..
    rm -rf clojure-1.3.0*

Final steps before hacking

Some final minor steps are needed before the fun starts. You may want to create a file for the REPL to store completions:

    touch ~/.clj_completions

In order for Clojure to find all the jars and shared libraries, you have to define some environment variables. You may choose to set them in your .bashrc file:

    export LD_LIBRARY_PATH="$HOME/builds/OTB-Wrapping/lib/:$HOME/builds/OTB/bin/"
    export ITK_AUTOLOAD_PATH="$HOME/builds/OTB/bin/"
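One detail worth keeping in mind when setting these variables: the shell does not expand ~ inside double quotes, so $HOME is the safer spelling. The sketch below shows the difference and also a common idiom, not specific to OTB, for prepending to LD_LIBRARY_PATH without losing a value that is already set. The otb_libs name is just a local helper variable.

```shell
quoted="~/builds/OTB/bin/"          # stays a literal "~", no expansion
expanded="$HOME/builds/OTB/bin/"    # becomes an absolute path

echo "quoted:   $quoted"
echo "expanded: $expanded"

# Prepend the OTB directories and keep any value already present.
# ${VAR:+:$VAR} adds ":$VAR" only when VAR is set and non-empty.
otb_libs="$HOME/builds/OTB-Wrapping/lib:$HOME/builds/OTB/bin"
export LD_LIBRARY_PATH="$otb_libs${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export ITK_AUTOLOAD_PATH="$HOME/builds/OTB/bin"

echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```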

PWOB provides a script to run a Clojure REPL with everything set up:

    cd ~/Dev/pwob/Clojure/src/Pwob
    ./otb-repl.sh

Now you should see something like this:

    Clojure 1.3.0
    user=>

Welcome to Lisp! If you want to use the applications you can for instance do:

    (import '(org.otb.application Registry))
    (def available-applications (Registry/GetAvailableApplications))

In the PWOB source tree you will find other examples.
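The otb-repl.sh script is not reproduced in this post. As a rough idea of what such a launcher has to do, here is a sketch under stated assumptions; it is not the actual PWOB script. It assumes that the OTB jars sit somewhere under the two build directories used above, and build_classpath is a hypothetical helper. The command is printed instead of executed, so the sketch runs even without Java or rlwrap installed.

```shell
# build_classpath is a hypothetical helper: it prints clojure.jar followed
# by every .jar found under the given directories, joined with ":".
build_classpath() {
    path_list="$HOME/.clojure/clojure.jar"
    for dir in "$@"; do
        [ -d "$dir" ] || continue            # skip directories that do not exist
        jars=$(find "$dir" -name '*.jar' | sort | tr '\n' ':')
        # Append the jars (minus the trailing ":") only if some were found.
        path_list="$path_list${jars:+:${jars%:}}"
    done
    printf '%s\n' "$path_list"
}

classpath=$(build_classpath "$HOME/builds/OTB" "$HOME/builds/OTB-Wrapping")

# rlwrap adds line editing and history; -f feeds it the completions file.
# clojure.main is the entry point that starts the REPL.
echo rlwrap --remember -f "$HOME/.clj_completions" \
     java -cp "$classpath" clojure.main
```

Dropping the leading echo turns the last line into a real launcher, provided LD_LIBRARY_PATH and ITK_AUTOLOAD_PATH are set so that the JVM can find the shared libraries.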

Final remarks

Using the REPL is fun, but you will soon need to store the lines of code, test things, debug, etc. In this case, the best choice is to use SLIME, the Superior Lisp Interaction Mode for Emacs. There are many tutorials on the net on how to set it up for Clojure. Search for it using DuckDuckGo. In the PWOB tree (classpath.clj) you will find some hints on how to set it up for OTB and Clojure. A simpler config for Emacs is to use the inferior lisp mode, for which I have also written a config (otb-clojure-config.el). I may write a post some day about that. Have fun!

20 May 2011

Lambda Omega Lambda

In my current research work, I am investigating the possibility of integrating domain expert knowledge together with image processing techniques (segmentation, classification) in order to provide accurate land cover maps from remote sensing image time series. When we look at this in terms of tools (software) there is a set of requirements which have to be met in order to be able to go from toy problems to operational processing chains. These operational chains are needed by scientists for carbon and water cycle studies, climate change analysis, etc. In the coming years, the image data needed for this kind of near-real-time mapping will be available through Earth observation missions like ESA's Sentinel program. Briefly, the constraints that the software tools for this task have are the following.

Interactive environment for exploratory analysis
Concurrent/parallel processing
Availability of state of the art, validated image processing algorithms
Symbolic, semantic information processing

This long post / short article describes some of the thoughts I have and some of the conclusions I have come to after a rather long analysis of the tools available.

Need for a repl

The repl acronym stands for Read-Eval-Print Loop, and I think it has its origins in the first interactive Lisp systems. Nowadays, we call this an interactive interpreter. People used to numerical and/or scientific computing will think about Matlab, IDL, Octave, Scilab and what not. Computer scientists will see here the Python/Ruby/Tcl/etc. interpreters. Many people nowadays use Python+Numpy or Scipy (now gathered in Pylab) in order to have something similar to Matlab/IDL/Scilab/Octave, but with a general purpose programming language. What a repl allows is to interactively explore the problem at hand without going through the cycle of writing the code, compiling, running, etc. The simple script that I wrote for the OTB blog using OTB's Python bindings could be typed at the interpreter and bring together OTB and Pylab.

Need for concurrent programming

Without going into the details and differences between parallel and concurrent programming, it seems clear that Moore's law can only continue to hold through multi-core architectures. In terms of low-level (pixel-wise) processing, OTB provides an interesting solution (multi-threaded execution of filters by dividing images into chunks). This approach can be generalized to GPUs.

However, sometimes an algorithm needs to operate on the whole image because the image splitting affects the results. This is typically the case for Markovian approaches to filtering or methods for image segmentation. For these cases, one way to speed things up is to process several images in parallel (if the memory footprint allows that!). One way of trying to maximize the use of all available computing cores in Python is the multiprocessing module, which provides a pool of worker processes, each of which can be handed one image.

However, this does not allow for easy communication between the workers. That is not needed when each worker deals with its own image, but it can be very useful if the different processes are working on the same image: imagine a multi-agent system where classifiers, algorithms for biophysical parameter extraction, data assimilation techniques, etc. work together to produce an accurate land cover map. They may want to communicate in order to share information. As far as I understand, Python has some limitations due to the global interpreter lock.

Some languages, Erlang for instance, offer appropriate concurrency primitives for this. I have played a little bit with them in my exploration of 7 languages in 7 weeks. Unfortunately, there are no OTB bindings for Erlang. Scala has copied the Erlang actors, but I didn't really get into the Scala thing.

Need for OTB access

This one here shouldn't need much explanation. Efficient remote sensing image processing needs OTB. Period. I am sorry. I am rather biased on that! I like C++ and I have no problem in using it. But there is no repl, and one needs several lines of typedef before being able to use anything. This is the price to pay in order to have a good static checking of the types before running the program. And it's damned fast!

We have Python bindings which allow us to have a clean syntax, like in the pipeline example of the OTB tutorials. However, the lack of easy concurrency is a bad point for Python. Also, the lack of Artificial Intelligence frameworks for Python is an anti-feature. Java has them, but Java has no repl, and look at its syntax. It's worse than C++. You have all these mangled names which were clean in C++ and Python and become things like otbImageFileReaderIUS2.

Scala, thanks to its interoperability with Java (Scala runs on the JVM), can use the OTB bindings, and with a cleaner syntax than Java's. There is still the problem of the mangled names, but with some pattern matching or case classes, this should disappear. So Scala seems a good candidate. It has:

Python-like syntax (although statically typed)
Concurrency primitives
A repl

Unfortunately, Scala is not a Lisp. Bear with me.

A Lisp would be nice

I want to build an expert system. It would be nice to have something for remote sensing similar to Wolfram Alpha. We could call it Ω Toolbox and keep the OTB name (or close).

Why Lisp? Well, I am not able to explain that here, but you can read P. Norvig's or P. Graham's essays on the topic. If you have a look at books like PAIP or AIMA, or systems like LISA, CLIPS or JESS, they are either written in Lisp or they offer Lisp-like DSLs. I am aware of implementations of the AIMA code in Python, and even P. Norvig himself has reasons to have migrated from Lisp to Python, but as stated above, Python seems to be out of the game for me. The code-is-data philosophy of Lisp is, as far as I understand it, together with the repl, one of the main assets for AI programming.

Another aspect which is also important is the functional programming paradigm used in Lisp (even though other programming paradigms are also available in Lisp). Concurrency is the main reason for the rise of functional languages in recent years (Haskell, for instance). Even though I (still) don't see the need for pure functional programming for my applications, lambda calculus is elegant and interesting. Maybe λ Toolbox would be a more appropriate name?

Clojure

If we recap the needs:

A repl (no C++, no Java)
Concurrency (no Python)
OTB bindings available (no Erlang, no Haskell, no Ruby)
Lisp (none of the above)

there is one single candidate: Clojure. Clojure is a Lisp dialect which runs on the JVM and has nice concurrency features like immutability, STM and agents. And by the way, the OTB bindings work like a charm. Admittedly, the syntax is less beautiful than Python's or Scala's, but (because!) it's a Lisp. And it's a better Java than Java. So you have all the Java libs available, and even Clojure-specific repositories like Clojars. A particularly interesting project is Incanter, which is a Clojure-based, R-like platform for statistical computing and graphics. Have a look at this presentation to get an overview of what you can do at the Clojure repl with that.

If we bear in mind that in Lisp code is data and that Lisp macros are mega-powerful, one could imagine writing things like:

    (make-otb-pipeline reader gradientFilter thresholdFilter writer)

Or even emulating the C++ template syntax to avoid using the mangled names of the OTB classes in the Java bindings (using macros and keywords):

    (def filter (RescaleIntensityImageFilter :itk (Image. :otb :Float 2)
                                                  (Image. :otb :UnsignedChar 2)))

instead of

    (def filter (itkRescaleIntensityImageFilterIF2IUC2.))

I have already found a cool syntax for using the setters and getters of a filter: the doto macro, which builds an object once and then applies a series of method calls to it.

Conclusion

I am going to push further the investigation of the use of Clojure because it seems to fit my needs:

Has an interactive interpreter
Access to OTB (through the Java bindings)
Concurrency primitives (agents, STM, etc.)
It's a lisp, so I can easily port existing rule-based expert systems.

Given the fact that this would be the sum of many cool features, I think I should call it Σ Toolbox, but I don't like the name. The mix of λ calculus and the Ω Toolbox should be called λΩλ, which is LoL in Greek.