20 May 2011

# Lambda Omega Lambda

In my current research work, I am investigating the possibility of integrating domain expert knowledge together with image processing techniques (segmentation, classification) in order to provide accurate land cover maps from remote sensing image time series. When we look at this in terms of tools (software) there are a set of requirements which have to be met in order to be able to go from toy problems to operational processing chains. These operational chains are needed by scientists for carbon and water cycle studies, climate change analysis, etc. In the coming years, the image data needed for this kind of near-real-time mapping will be available through Earth observation missions like ESA's Sentinel program. Briefly, the constraints that the software tools for this task have are the following.

1. Interactive environment for exploratory analysis
2. Concurrent/parallel processing
3. Availability of state of the art, validated image processing algorithms
4. Symbolic, semantic information processing

This long post / short article describes some of the thoughts I have and some of the conclusions I have come to after a rather long analysis of the tools available.

## Need for a repl

The repl acronym stands for Read-Evaluate-Print-Loop and I think has its origins in the first interactive Lisp systems. Nowadays, we call this an interactive interpreter. People used to numerical and/or scientific computing will think about Matlab, IDL, Octave, Scilab and what not. Computer scientists will see here the Python/Ruby/Tcl/etc. interpreters. Many people use nowadays Pyhton+Numpy or Scipy (now gathered in Pylab) in order to have something similar to Matlab/IDL/Scilab/Octave, but with a general purpose programming language. What a repl allows is to interactively explore the problem at hand without going through the cycle of writing the code, compiling, running etc. The simple script that I wrote for the OTB blog using OTB's Python bindings could be typed on the interpreter and bring together OTB and Pylab.

## Need for concurrent programming

Without going into the details and differences between parallel and concurrent programming, it seems clear that Moore's law can only continue to hold through multi-core architectures. In terms of low-level (pixel-wise) processing, OTB provides an interesting solution (multi-threaded execution of filters by dividing images into chunks). This approach can be generalized to GPUs. However, sometimes an algorithm needs to operate on the whole image because the image splitting affects the results. This is typically the case for Markovian approaches to filtering or methods for image segmentation. For this cases, one way to speed up things is to process several images in parallel (if the memory footprint allows that!). On way of trying to maximize the use of all available computing cores in Python is using the multiprocessing module which allows to deal with a pool of threads. One example would be as follows: However, this does not allow for easy inter-thread communication which is not needed in the above example, but can be very useful if the different processes are working on the same image: imagine a multi-agent system where classifiers, algorithms for biophysical parameter extraction, data assimilation techniques, etc. work together to produce an accurate land cover map. They may want to communicate in order to share information. As far as I understand, Python has some limitations due to the global interpreter lock. Some languages as for instance Erlang offer appropriate concurrency primitives for this. I have played a little bit with them in my exploration of 7 languages in 7 weeks. Unfortunately, there are no OTB bindings for Erlang. Scala has copied the Erlang actors, but I didn't really got into the Scala thing.

## Need for OTB access

This one here shouldn't need much explanation. Efficient remote sensing image processing needs OTB. Period. I am sorry. I am rather biased on that! I like C++ and I have no problem in using it. But there is no repl, one needs several lines of typedef before being able to use anything. This is the price to pay in order to have a good static checking of the types before running the problem. And it's damned fast! We have Python bindings which allows us to have a clean syntax, like in the pipeline example of the OTB tutorials. However, the lack of easy concurrency is a bad point for Python. Also, the lack of Artificial Intelligence frameworks for Python is an anti-feature. Java has them, but Java has no repl and look at its syntax. It's worse than C++. You have all these mangled names which were clean in C++ and Python and become things like otbImageFileReaderIUS2. Scala, thanks to its interoperability with Java (Scala runs in the JVM), can use OTB bindings. Actually, we have a cleaner syntax than Java's: There is still the problem of the mangled names, but with some pattern matching or case classes, this should disappear. So Scala seems a good candidate. It has:

1. Python-like syntax (although statically typed)
2. Concurrency primitives
3. A repl

Unfortunately, Scala is not a Lisp. Bear with me.

## A Lisp would be nice

I want to build an expert system, it would be nice to have something for remote sensing similar to Wolfram Alpha. We could call it Ω Toolbox and keep the OTB name (or close). Why Lisp? Well I am not able to explain that here, but you can read P. Norvig's or P. Graham's essays on the topic. If you have a look at books like PAIP or AIMA, or systems like LISA, CLIPS or JESS, they are either written in Lisp or the offer a Lisp-like DSLs. I am aware of implementations of the AIMA code in Python, and even P. Norvig himself has reasons to have migrated from Lisp to Python, but as stated above, Python seems to be out of the game for me. The code is data philosophy of Lisp is, as far as I understand it, together with the repl tool, one of the main assets for AI programming. Another aspect which is also important is the functional programming paradigm used in Lisp (even though other programming paradigms are also available in Lisp). Concurrency is the main reason for the upheaval of functional languages in recent years (Haskell, for instance). Even though I (still) don't see the need for pure functional programming for my applications, lambda calculus is elegant and interesting. Maybe λ Toolbox should be a more appropriate name?

## Clojure

If we recap the needs:

1. A repl (no C++, no Java)
2. Concurrency (no Python)
3. OTB bindings available (no Erlang, no Haskell, no Ruby)
4. Lisp (none of the above)

there is one single candidate: Clojure. Clojure is a Lisp dialect which runs on the JVM and has nice concurrency features like inmutability and STM and agents. And by the way, OTB bindings work like a charm: Admittedly, the syntax is less beautiful than Python's or Scala's, but (because!) it's a Lisp. And it's a better Java than Java. So you have all the Java libs available, and even Clojure specific repositories like Clojars. A particular interesting project is Incanter which provides is a Clojure-based, R-like platform for statistical computing and graphics. Have a look at this presentation to get an overview of what you can do at the Clojure repl with that. If we bear in mind that in Lisp code is data and that Lisp macros are mega-powerful, one could imagine writing things like:

    (make-otb-pipeline reader gradientFilter thresholdFilter writer)


Or even emulating the C++ template syntax to avoid using the mangled names of the OTB classes in the Java bindings (using macros and keywords):

    (def filter (RescaleIntensityImageFilter :itk (Image. : otb :Float 2)
(Image. : otb :UnsignedChar 2)))


    (def filter (itkRescaleIntensityImageFilterIF2IUC2.))


I have already found a cool syntax for using the Setters and Getters of a filter using the doto macro (see line 19 in the example below):

## Conclusion

I am going to push further the investigation of the use of Clojure because is seems to fit my needs:

1. Has an interactive interpreter
3. Concurrency primitives (agents, STM, etc.)
4. It's a lisp, so I can easily port existing rule-based expert systems.

Given the fact that this would be the sum of many cool features, I think I should call it Σ Toolbox, but I don't like the name. The mix of λ calculus and latex Ω Toolbox, should be called λΩλ, which is LoL in Greek.