Posts tagged "clojure":
The WHY and the WHAT
As explained in previous posts, I am using OTB through the Java bindings. Recently, the folks of the OTB team have made very cool things – the application wrappers – which enrich even further the ways of using OTB from third party applications. The idea of the application wrappers is that when you code a processing chain, you may want to use it on the command line, with a GUI or even as a class within another application.
What Julien and the others have done is to build an application engine which, given a pipeline and its description (inputs, outputs and parameters), is able to generate a command-line interface and a Qt-based GUI. But once you are there, there is no major problem (well, if your name is Julien, at least) in generating a SWIG interface, which then allows building bindings for the different languages supported by SWIG. This may seem a kitchen sink (usine à gaz in French) and actually it is if you don't pay attention (like me).
The application wrappers use advanced techniques which require a recent version of GCC. At least recent from a Debian point of view (GCC 4.5 or later). On the other hand, the bindings for the library (the OTB-Wrapping package) use CableSwig, which in turn uses its own SWIG interfaces. So if you want to be able to combine the application wrappers with the library bindings, odds are that you may end up with a little mess in terms of compiler versions, etc.
And this is exactly what happened to me when I had to build GCC from source in Debian Squeeze (no GCC 4.5 in the backports yet). So I started making some experiments with virtual machines in order to understand what was happening. Eventually I found that the simplest configuration was the best to sort things out. This is the HOW-TO for having a working installation of OTB with application wrappers and OTB-Wrapping, and for accessing everything from a Clojure REPL.
Installing Arch Linux on a virtual machine
I have chosen Arch Linux because in a few steps you can have a minimal Linux install. Just grab the net-install ISO and create a new virtual machine in VirtualBox. In terms of packages, just get the minimal set (the default selection, don't add anything else). After the install, log in as root, and perform a system update with:
pacman -Syu
If you used the net-install image, there should be no need for this step, but in case you used an ISO with the full system, it's better to make sure that your system is up to date. Then, create a user account and set up a password if you want to:
useradd -m jordi
passwd jordi
Again, this is not mandatory, but things are cleaner this way, and you may want to keep on using this virtual machine. Don't forget to add your newly created user to the sudoers file:
pacman -S sudo
vi /etc/sudoers
You can now log out and log in with the user name you created. Oh, yes, I forgot to point out that you don't have Xorg, or Gnome or anything graphical. Even the mouse is useless, but since you are a real hacker, you don't mind.
Install the packages you need
sudo pacman -S mercurial make cmake gcc gdal openjdk6 \
    mesa swig cvs bison links unzip rlwrap
Yes, I know, Mesa (OpenGL) is useless without X, but we want to check that everything builds OK and OTB uses OpenGL for displaying images.
- Mercurial is needed to get the OTB sources, cvs is needed for getting CableSwig
- Make, cmake and gcc are the tools for building from sources
- Swig is needed for the binding generation, and bison is needed by CableSwig
- OpenJDK is our Java version of choice
- Links will be used to grab the Clojure distro, unzip to extract the corresponding jar file and rlwrap is used by the Clojure REPL.
And that's all!
Get the source code for OTB and friends
Start building everything
OTB and OTB-Wrapping
mkdir OTB
cd OTB
ccmake ../../OTB
Don't forget to set the application build and the Java wrappers to ON. Then just make (literally):
make
By the way, I have noticed that the compilation of the application engine can blow up your gcc if you don't allocate enough RAM for your virtual machine. At this point, you should be able to use the Java application wrappers. But we also want the library bindings, so we go on. We can now build CableSwig, which will be needed by OTB-Wrapping. Same procedure as before:
cd ../
mkdir CableSwig
cd CableSwig/
ccmake ../../CableSwig/
make
And now, OTB-Wrapping. Same story:
cd ../
mkdir OTB-Wrapping
cd OTB-Wrapping/
ccmake ../../OTB-Wrapping/
In the cmake configuration, I chose to build only Java, but even in this case a Python interpreter is needed. I think that CableSwig needs it to generate XML code to represent the C++ class hierarchies. If you did not install Python explicitly in Arch, you will have by default a 2.7 version. This is OK. If you decided to install Python with pacman, you will have both Python 2.7 and 3.2, and the default Python executable will point to the latter. In this case, don't forget to set PYTHON_EXECUTABLE in CMake to /usr/bin/python2. Then, just make and cd to your home directory.
And you are done. Well not really. Right now, you can do Java, but what's the point? You might as well use the C++ version, right?
Land of Lisp
Since Lisp is the best language out there, and OTB is the best remote sensing image processing software (no reference here, but trust me), we'll do OTB in Lisp.
mkdir Dev
cd Dev/
hg clone http://bitbucket.org/inglada/pwob
PWOB stands for Playing With OTB Bindings. I have only put there 3 languages which run on the JVM for reasons I stated in a previous post. You will of course avoid the plain Java ones. I have mixed feelings about Scala. I definitely love Clojure since it is a Lisp.
Go to the download page and grab the latest release. It is a zip file which contains, among other things, the needed jar. You can unzip the file:
mkdir src
mv clojure-1.3.0.zip src/
cd src/
unzip clojure-1.3.0.zip
Copy the jar file to the user .clojure dir:
cd clojure-1.3.0
mkdir ~/.clojure
mv clojure-1.3.0.jar ~/.clojure/
Make a sym link so we have a clojure.jar:
ln -s ~/.clojure/clojure-1.3.0.jar ~/.clojure/clojure.jar
And clean up useless things
cd ..
rm -rf clojure-1.3.0*
Final steps before hacking
In order for Clojure to find all the jars and shared libraries, you have to define some environment variables. You may choose to set them into your .bashrc file:
# Use $HOME here: bash does not expand ~ inside double quotes
export LD_LIBRARY_PATH="$HOME/builds/OTB-Wrapping/lib/:$HOME/builds/OTB/bin/"
export ITK_AUTOLOAD_PATH="$HOME/builds/OTB/bin/"
PWOB provides a script to run a Clojure REPL with everything set up:
cd ~/Dev/pwob/Clojure/src/Pwob
./otb-repl.sh
Now you should see something like this:
Clojure 1.3.0
user=>
Welcome to Lisp! If you want to use the applications you can for instance do:
(import '(org.otb.application Registry))
(def available-applications (Registry/GetAvailableApplications))
In the PWOB source tree you will find other examples.
Using the REPL is fun, but you will soon need to store the lines of code, test things, debug, etc. In this case, the best choice is to use SLIME, the Superior Lisp Interaction Mode for Emacs. There are many tutorials on the net on how to set it up for Clojure. Search for it using DuckDuckGo. In the PWOB tree (classpath.clj) you will find some hints on how to set it up for OTB and Clojure. A simpler config for Emacs is to use the inferior lisp mode, for which I have also written a config (otb-clojure-config.el). I may write a post some day about that. Have fun!
In my current research work, I am investigating the possibility of integrating domain expert knowledge together with image processing techniques (segmentation, classification) in order to provide accurate land cover maps from remote sensing image time series. When we look at this in terms of tools (software), there is a set of requirements which have to be met in order to be able to go from toy problems to operational processing chains. These operational chains are needed by scientists for carbon and water cycle studies, climate change analysis, etc. In the coming years, the image data needed for this kind of near-real-time mapping will be available through Earth observation missions like ESA's Sentinel program. Briefly, the constraints that the software tools for this task have are the following.
- Interactive environment for exploratory analysis
- Concurrent/parallel processing
- Availability of state of the art, validated image processing algorithms
- Symbolic, semantic information processing
This long post / short article describes some of the thoughts I have and some of the conclusions I have come to after a rather long analysis of the tools available.
Need for a repl
The repl acronym stands for Read-Eval-Print Loop and I think it has its origins in the first interactive Lisp systems. Nowadays, we call this an interactive interpreter. People used to numerical and/or scientific computing will think about Matlab, IDL, Octave, Scilab and what not. Computer scientists will see here the Python/Ruby/Tcl/etc. interpreters. Nowadays many people use Python+NumPy or SciPy (now gathered in Pylab) in order to have something similar to Matlab/IDL/Scilab/Octave, but with a general purpose programming language. What a repl allows is to interactively explore the problem at hand without going through the cycle of writing the code, compiling, running, etc. The simple script that I wrote for the OTB blog using OTB's Python bindings could be typed on the interpreter and bring together OTB and Pylab.
Need for concurrent programming
Without going into the details and differences between parallel and concurrent programming, it seems clear that Moore's law can only continue to hold through multi-core architectures. In terms of low-level (pixel-wise) processing, OTB provides an interesting solution (multi-threaded execution of filters by dividing images into chunks). This approach can be extended to GPUs. However, sometimes an algorithm needs to operate on the whole image because the image splitting affects the results. This is typically the case for Markovian approaches to filtering or methods for image segmentation. For these cases, one way to speed things up is to process several images in parallel (if the memory footprint allows that!). One way of trying to maximize the use of all available computing cores in Python is using the multiprocessing module, which provides a pool of worker processes. One example would be as follows:
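A minimal sketch of that idea, under stated assumptions: `process_image` is a hypothetical stand-in for a whole-image pipeline (segmentation, Markovian filtering) and only derives an output name here, since the actual OTB calls depend on your bindings. The file names are made up. Note that `multiprocessing.Pool` starts worker processes rather than threads, so each image is handled by its own interpreter.

```python
from multiprocessing import Pool


def process_image(path):
    """Hypothetical stand-in for a whole-image OTB pipeline.

    A real worker would read `path`, run the filters and write the result.
    Here we only derive an output name so that the sketch runs anywhere.
    """
    stem, dot, ext = path.rpartition(".")
    return f"{stem}_processed{dot}{ext}" if dot else f"{path}_processed"


def process_all(paths, workers=2):
    """Process several images in parallel, one image per worker process."""
    with Pool(processes=workers) as pool:
        # map() blocks until every image is done and keeps the input order.
        return pool.map(process_image, paths)


if __name__ == "__main__":
    # File names are made up for the example.
    print(process_all(["scene_2011_01.tif", "scene_2011_02.tif"]))
```

The number of workers should be chosen with the memory footprint in mind: every process holds its own image in RAM.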
However, this does not allow for easy communication between workers, which is not needed in the above example, but can be very useful if the different processes are working on the same image: imagine a multi-agent system where classifiers, algorithms for biophysical parameter extraction, data assimilation techniques, etc. work together to produce an accurate land cover map. They may want to communicate in order to share information. As far as I understand, Python has some limitations due to the global interpreter lock. Some languages, such as Erlang, offer concurrency primitives for this. I have played a little bit with them in my exploration of 7 languages in 7 weeks. Unfortunately, there are no OTB bindings for Erlang. Scala has copied the Erlang actors, but I didn't really get into the
Need for OTB access
This one here shouldn't need much explanation. Efficient remote sensing image processing needs OTB. Period. I am sorry. I am rather biased on that! I like C++ and I have no problem in using it. But there is no repl, and one needs several lines of typedef before being able to use anything. This is the price to pay in order to have good static checking of the types before running the program. And it's damned fast! We have Python bindings which allow us to have a clean syntax, like in the example of the OTB tutorials. However, the lack of easy concurrency is a bad point for Python. Also, the lack of Artificial Intelligence frameworks for Python is an anti-feature. Java has them, but Java has no repl, and look at the syntax. It's worse than C++. You have all these mangled names which were clean in C++ and Python and become things like otbImageFileReaderIUS2. Scala, thanks to its interoperability with Java (Scala runs in the JVM), can use OTB bindings. Actually, we have a cleaner syntax than Java's:
There is still the problem of the mangled names, but with some pattern matching or case classes, this should disappear. So Scala seems a good candidate. It has:
- Python-like syntax (although statically typed)
- Concurrency primitives
- A repl
Unfortunately, Scala is not a Lisp. Bear with me.
A Lisp would be nice
I want to build an expert system; it would be nice to have something for remote sensing similar to Wolfram Alpha. We could call it Ω Toolbox and keep the OTB name (or close). Why Lisp? Well, I am not able to explain that here, but you can read P. Norvig's or P. Graham's essays on the topic. If you have a look at books like PAIP or AIMA, or systems like LISA, CLIPS or JESS, they are either written in Lisp or they offer Lisp-like DSLs. I am aware of implementations of the AIMA code in Python, and even P. Norvig himself has reasons to have migrated from Lisp to Python, but as stated above, Python seems to be out of the game for me. The code is data philosophy of Lisp is, as far as I understand it, together with the repl tool, one of the main assets for AI programming. Another aspect which is also important is the functional programming paradigm used in Lisp (even though other programming paradigms are also available in Lisp). Concurrency is the main reason for the rise of functional languages in recent years (Haskell, for instance). Even though I (still) don't see the need for pure functional programming for my applications, lambda calculus is elegant and interesting. Maybe λ Toolbox would be a more appropriate name?
- A repl (no C++, no Java)
- Concurrency (no Python)
- OTB bindings available (no Erlang, no Haskell, no Ruby)
- Lisp (none of the above)
Given these four requirements, there is one single candidate: Clojure. Clojure is a Lisp dialect which runs on the JVM and has nice concurrency features like immutability, STM and agents. And by the way, OTB bindings work like a charm. Admittedly, the syntax is less beautiful than Python's or Scala's, but (because!) it's a Lisp. And it's a better Java than Java. So you have all the Java libs available, and even Clojure-specific repositories like Clojars. A particularly interesting project is Incanter, which is a Clojure-based, R-like platform for statistical computing and graphics. Have a look at this presentation to get an overview of what you can do at the Clojure repl with that. If we bear in mind that in Lisp code is data and that Lisp macros are mega-powerful, one could imagine writing things like:
(make-otb-pipeline reader gradientFilter thresholdFilter writer)
Or even emulating the C++ template syntax to avoid using the mangled names of the OTB classes in the Java bindings (using macros and keywords):
(def filter (RescaleIntensityImageFilter :itk (Image. :otb :Float 2) (Image. :otb :UnsignedChar 2)))
instead of the current:
(def filter (itkRescaleIntensityImageFilterIF2IUC2.))
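The two mangled names quoted in this post (otbImageFileReaderIUS2 and itkRescaleIntensityImageFilterIF2IUC2) suggest a regular scheme: namespace prefix, class name, then one `I<pixel code><dimension>` group per image template argument. The sketch below spells that scheme out in Python. The `PIXEL_CODES` table and the `mangled_name` helper are illustrative assumptions inferred from those two names only; they are not part of the bindings, and other types may well be encoded differently.

```python
# Pixel type codes inferred from the two names quoted in the post.
# Assumption: other pixel types may be abbreviated differently.
PIXEL_CODES = {
    "Float": "F",
    "UnsignedChar": "UC",
    "UnsignedShort": "US",
}


def mangled_name(namespace, class_name, *images):
    """Build a wrapped class name from template-like arguments.

    Each image is a (pixel_type, dimension) pair, e.g. ("Float", 2).
    """
    suffix = "".join(f"I{PIXEL_CODES[pixel]}{dim}" for pixel, dim in images)
    return f"{namespace}{class_name}{suffix}"


print(mangled_name("itk", "RescaleIntensityImageFilter",
                   ("Float", 2), ("UnsignedChar", 2)))
print(mangled_name("otb", "ImageFileReader", ("UnsignedShort", 2)))
```

A Clojure macro could perform the same kind of lookup at expansion time, so that the template-like form resolves to the mangled constructor.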
I have already found a cool syntax for using the setters and getters of a filter using the doto macro (see line 19 in the example below):
- Has an interactive interpreter
- Access to OTB (through the Java bindings)
- Concurrency primitives (agents, STM, etc.)
- It's a lisp, so I can easily port existing rule-based expert systems.
Given the fact that this would be the sum of many cool features, I think I should call it Σ Toolbox, but I don't like the name. The mix of λ calculus and latex Ω Toolbox, should be called λΩλ, which is LoL in Greek.