27 May 2015

# Installing OTB has never been so easy

You've heard about the Orfeo Toolbox library and its wonders, but urban legends say that it is difficult to install. Don't believe that. Maybe it was difficult to install, but this is not the case anymore.

Thanks to the heroic work of the OTB core development team, installing OTB has never been so easy. In this post, you will find the step-by-step procedure to compile OTB from source on a Debian 8.0 Jessie GNU/Linux distribution.

## Prepare the user account

I assume that you have a fresh install. The procedure below has been tested in a virtual machine running under VirtualBox. The virtual machine was installed from scratch using the official netinst ISO image.

During the installation, I created a user named otb that I will use for the tutorial below. For simplicity, I give this user root privileges in order to install some packages. This can be done as follows. Log in as root or use the command:

su -


You can then edit the /etc/sudoers file by using the following command:

visudo


This will open the file with the nano text editor. Scroll down to the lines containing

# User privilege specification
root    ALL=(ALL:ALL) ALL


and copy the second line and below and replace root by otb:

otb     ALL=(ALL:ALL) ALL


Write the file and quit by doing C^o ENTER C^x. Log out and log in as otb. You are set!

## System dependencies

Now, let's install some packages needed to compile OTB. Open a terminal and use aptitude to install what we need:

sudo aptitude install mercurial \
cmake-curses-gui build-essential \
qt4-dev-tools libqt4-core \
libqt4-dev libboost1.55-dev \
zlib1g-dev libopencv-dev curl \
libcurl4-openssl-dev swig \
libpython-dev


We will install OTB in its own directory. So from your $HOME directory create a directory named OTB and go into it: mkdir OTB cd OTB  Now, get the OTB sources by cloning the repository (depending on your network speed, this may take several minutes): hg clone http://hg.orfeo-toolbox.org/OTB  This will create a directory named OTB (so in my case, this is /home/otb/OTB/OTB). Using mercurial commands, you can choose a particular version or you can go bleeding edge. You will at least need the first release candidate for OTB-5.0, which you can get with the following commands: cd OTB hg update 5.0.0-rc1 cd ../  ## Get OTB dependencies OTB's SuperBuild is a procedure which deals with all external libraries needed by OTB which may not be available through your Linux package manager. It is able to download source code, configure and install many external libraries automatically. Since the download process may fail due to servers which are not maintained by the OTB team, a big tarball has been prepared for you. From the $HOME/OTB directory, do the following:

wget https://www.orfeo-toolbox.org/packages/SuperBuild-archives.tar.bz2
tar xvjf SuperBuild-archives.tar.bz2


The download step can be looooong. Be patient. Go jogging or something.

Once you have downloaded and extracted the external dependencies, you can start compiling OTB. From the $HOME/OTB directory, create the directory where OTB will be built: mkdir -p SuperBuild/OTB  At the end of the compilation, the $HOME/OTB/SuperBuild/ directory will contain a classical bin/, lib/, include/ and share/ directory tree. The $HOME/OTB/SuperBuild/OTB/ is where the configuration and compilation of OTB and all the dependencies will be stored. Go into this directory: cd SuperBuild/OTB  Now we can configure OTB using the cmake tool. Since you are on a recent GNU/Linux distribution, you can tell the compiler to use the most recent C++ standard, which can give you some benefits even if OTB still does not use it. We will also compile using the Release option (optimisations). The Python wrapping will be useful with the OTB Applications. We also tell cmake where the external dependencies are. The options chosen below for OpenJPEG make OTB use the gdal implementation. cmake \ -DCMAKE_CXX_FLAGS:STRING=-std=c++14 \ -DOTB_WRAP_PYTHON:BOOL=ON \ -DCMAKE_BUILD_TYPE:STRING=Release \ -DCMAKE_INSTALL_PREFIX:PATH=/home/otb/OTB/SuperBuild/ \ -DDOWNLOAD_LOCATION:PATH=/home/otb/OTB/SuperBuild-archives/ \ -DOTB_USE_OPENJPEG:BOOL=ON \ -DUSE_SYSTEM_OPENJPEG:BOOL=OFF \ ../../OTB/SuperBuild/  After the configuration, you should be able to compile. I have 4 cores in my machine, so I use the -j4 option for make. Adjust the value to your configuration: make -j4  This will take some time since there are many libraries which are going to be built. Time for a marathon. ## Test your installation Everything should be compiled and available now. You can set up some environment variables for an easier use of OTB. You can for instance add the following lines at the end of $HOME/.bashrc:

export OTB_HOME=${HOME}/OTB/SuperBuild export PATH=${OTB_HOME}/bin:$PATH export LD_LIBRARY_PATH=${OTB_HOME}/lib


You can now open a new terminal for this to take effect or use:

cd
source .bashrc


You should now be able to use the OTB Applications. For instance, the command:

otbcli_BandMath


should display the documentation for the BandMath application.

Another way to run the applications, is using the command line application launcher as follows:

otbApplicationLauncherCommandLine BandMath \$OTB_HOME/lib/otb/applications/


## Conclusion

The SuperBuild procedure allows to easily install OTB without having to deal with different combinations of versions for the external dependencies (TIFF, GEOTIFF, OSSIM, GDAL, ITK, etc.).

This means that once you have cmake and a compiler, you are pretty much set. QT4 and Python are optional things which will be useful for the applications, but they are not required for a base OTB installation.

I am very grateful to the OTB core development team (Julien, Manuel, Guillaume, the other Julien, Mickaël, and maybe others that I forget) for their efforts in the work done for the modularisation and the development of the SuperBuild procedure. This is the kind of thing which is not easy to see from the outside, but makes OTB go forward steadily and makes it a very mature and powerful software.

13 May 2015

# A simple thread pool to run parallel PROSail simulations

In the otb-bv we use the OTB versions of the Prospect and Sail models to perform satellite reflectance simulations of vegetation.

The code for the simulation of a single sample uses the ProSail simulator configured with the satellite relative spectral responses, the acquisition parameters (angles) and the biophysical variables (leaf pigments, LAI, etc.):

ProSailType prosail;
prosail.SetRSR(satRSR);
prosail.SetParameters(prosailPars);
prosail.SetBVs(bio_vars);
auto result = prosail();


A simulation is computationally expensive and it would be difficult to parallelize the code. However, if many simulations are going to be run, it is worth using all the available CPU cores in the machine.

I show below how using C++11 standard support for threads allows to easily run many simulations in parallel.

Each simulation uses a set of variables given in an input file. We parse the sample file in order to get the input parameters for each sample and we construct a vector of simulations with the appropriate size to store the results.

otbAppLogINFO("Processing simulations ..." << std::endl);
auto bv_vec = parse_bv_sample_file(m_SampleFile);
auto sampleCount = bv_vec.size();
std::vector<SimulationType> simus{sampleCount};


The simulation function is actually a lambda which will sequentially process a sequence of samples and store the results into the simus vector. We capture by reference the parameters which are the same for all simulations (the satellite relative spectral responses satRSR and the acquisition angles in prosailPars):

auto simulator = [&](std::vector<BVType>::const_iterator sample_first,
std::vector<BVType>::const_iterator sample_last,
std::vector<SimulationType>::iterator simu_first){
ProSailType prosail;
prosail.SetRSR(satRSR);
prosail.SetParameters(prosailPars);
while(sample_first != sample_last)
{
prosail.SetBVs(*sample_first);
*simu_first = prosail();
++sample_first;
++simu_first;
}
};


We start by figuring out how to split the simulation into concurrent threads. How many cores are there?

auto num_threads = std::thread::hardware_concurrency();


So we define the size of the chunks we are going to run in parallel and we prepare a vector to store the threads:

auto block_size = sampleCount/num_threads;


Here, I choose to use as many threads as cores available, but this could be changed by a multiplicative factor if we know, for instance that disk I/O will introduce some idle time for each thread.

An now we can fill the vector with the threads that will process every block of simulations :

auto input_start = std::begin(bv_vec);
auto output_start = std::begin(simus);
{
auto input_end = input_start;
input_start,
input_end,
output_start);
input_start = input_end;
}


The std::thread takes the name of the function object to be called, followed by the arguments of this function, which in our case are the iterators to the beginning and the end of the block of samples to be processed and the iterator of the output vector where the results will be stored. We use std::advance to update the iterator positions.

After that, we have a vector of threads which have been started concurrently. In order to make sure that they have finished before trying to write the results to disk, we call join on each thread, which results in waiting for each thread to end:

std::for_each(threads.begin(),threads.end(),
otbAppLogINFO("" << sampleCount << " samples processed."<< std::endl);
for(const auto& s : simus)
this->WriteSimulation(s);


This may no be the most efficient solution, nor the most general one. Using std::async and std::future would have allowed not to have to deal with block sizes, but in this solution we can easily decide the number of parallel threads that we want to use, which may be useful in a server shared with other users.

25 Mar 2015

# Scripting in C++

## Introduction

This quote of B. Klemens1 that I used in my previous post made me think a little bit:

"I spent much of my life ignoring the fundamentals of computing and just hacking together projects using the package or language of the month: C++, Mathematica, Octave, Perl, Python, Java, Scheme, S-PLUS, Stata, R, and probably a few others that I’ve forgotten."

Although much less skilled than M. Klemens, I have also wandered around the programming language landscape, out of curiosity, but also with particular needs in mind.

For example, I listed a set of features that I found useful for easily going from prototyping to operational code for remote sensing image processing. The main 3 bullet points were:

1. OTB accessibility, which meant C++, Python or Java (and its derivatives as Clojure);
2. Robust concurrent programming, which ruled out Python;
3. Having a REPL, which led me to say silly things and even develop working code.

Before going the OTB-from-LISP-through-bindings route, I had looked for C++ interpreters to come close to a REPL, some are available, but, at least at the time, I didn't manage to get one running.

Since I started using C++11 a little bit, I find more and more pleasant to write C++ even for small things that I would usually use Python for. Add to that all the new things in the standard library, like threads, regular expressions, tuples and proper random number generators, just to name a few, and here I am wondering how to make the edit/compile/run loop shorter.

B. Klemens proposes crazy things like compiling via cut/paste, but this is too extreme even for me. Since I use emacs, I can compile from within the editor just by hitting C-c C-c which asks for a command to compile. I usually do things like:

export ITK_AUTOLOAD_PATH="" && cd ~/Dev/builds/otb-bv/ && make \
&& ctest -I 8,8 -VV && gnuplot ~/Dev/otb-bv/data/plot_bvMultiT.gpl


I am nearly there, but not yet.

Remember that I discovered Klemens because I was looking for statistical libs after watching some videos of the Coursera Data Science specialization? One of the courses is about reproducible research and they show how you can mix together R code and Markdown to produce executable research papers.

When I saw this R Markdown approach, I thought, well, that's nice, but inferior to org-babel. The main advantage of org-babel with respect to R Markdown or IPython is that in org-mode you can use different languages in the same document.

In a recent research report written in org-mode and exported to PDF, I used, in the same document, Python, bash, gnuplot and maxima. This is a real use case. Of course, these are all interpreted languages, so it is easy to embed them in documents and talk to the interpreter.

Well, 15 years after I started using emacs and more than 6 years after discovering org-mode, I found that compiled languages can be used with org-babel. At least, decent compiled languages, since there is no mention of Objective-C or Java2.

## Setting up and testing

So how one does use C++ with org-babel? Just drop this into your emacs configuration:

Yes, don't forget the flag for C++11, since you want to be cool.

Now, let the fun begin. Open an org buffer and write:

#+begin_src C++ :includes <iostream>
std::cout << "Hello scripting world\n";
#+end_src


Go inside the code block and hit C-c C-c and you will magically see this appear just below the code block:

#+RESULTS:
: Hello scripting world


No problem. You are welcome. Good-bye.

If you want to go beyond the Hello world example keep reading.

## Using the standard library

You have noticed that the first line of the code block says it is C++ and gives information about the header files to include. If you need to include more than one header file3, just make a list of them (remember, emacs is a LISP machine):

#+begin_src C++ :includes (list "<iostream>" "<vector>")
std::vector<int> v{1,2,3};
for(auto& val : v) std::cout << val << "\n";
#+end_src

#+RESULTS:
| 1 |
| 2 |
| 3 |


Do you see the nice formatted results? This is an org-mode table. More fun with the standard library and tables:

#+begin_src C++ :includes (list "<iostream>" "<string>" "<map>")
std::map<std::string, int> m{{"pizza",1},{"muffin",2},{"cake",3}};
for(auto& val : m) std::cout << val.first << " " << val.second << "\n";
#+end_src

#+RESULTS:
| cake   | 3 |
| muffin | 2 |
| pizza  | 1 |


## Using other libraries

Sometimes, you may want to use other libraries than the C++ standard one4. Just give the header files as above and also the flags (here I split the header of the org-babel block in 2 lines for readability):

#+header: :includes (list "<gsl/gsl_fit.h>" "<stdio.h>")
#+begin_src C++ :exports both :flags (list "-lgsl" "-lgslcblas")
int i, n = 4;
double x[4] = { 1970, 1980, 1990, 2000 };
double y[4] = {   12,   11,   14,   13 };
double w[4] = {  0.1,  0.2,  0.3,  0.4 };

double c0, c1, cov00, cov01, cov11, chisq;

gsl_fit_wlinear (x, 1, w, 1, y, 1, n,
&c0, &c1, &cov00, &cov01, &cov11,
&chisq);

printf ("# best fit: Y = %g + %g X\n", c0, c1);
printf ("# covariance matrix:\n");
printf ("# [ %g, %g\n#   %g, %g]\n",
cov00, cov01, cov01, cov11);
printf ("# chisq = %g\n", chisq);

for (i = 0; i < n; i++)
printf ("data: %g %g %g\n",
x[i], y[i], 1/sqrt(w[i]));

printf ("\n");

for (i = -30; i < 130; i++)
{
double xf = x[0] + (i/100.0) * (x[n-1] - x[0]);
double yf, yf_err;

gsl_fit_linear_est (xf,
c0, c1,
cov00, cov01, cov11,
&yf, &yf_err);

printf ("fit: %g %g\n", xf, yf);
printf ("hi : %g %g\n", xf, yf + yf_err);
printf ("lo : %g %g\n", xf, yf - yf_err);
}
#+end_src

#+RESULTS:
| #     | best       |    fit: |       Y | = | -106.6 | + | 0.06 | X |
| #     | covariance | matrix: |         |   |        |   |      |   |
| #     | [          |  39602, |   -19.9 |   |        |   |      |   |
| #     | -19.9,     |   0.01] |         |   |        |   |      |   |
| #     | chisq      |       = |     0.8 |   |        |   |      |   |

etc.


## Defining things outside main

You may have noticed that we are really scripting like perl hackers, since there is no main() function in our code snippets. org-babel kindly generates all the boilerplate for us. Thank you.

But wait, what happens if I want to define or declare things outside main?

No problem, just tell org-babel not to generate a main():

#+begin_src C++ :includes <iostream> :main no
struct myStr{int i;};

int main()
{
myStr ms{2};
std::cout << "The value of the member is " << ms.i << std::endl;
}
#+end_src

#+RESULTS:
: The value of the member is 2


There are many other options you can play with and I suggest to read the documentation. Just 2 more useful things.

## Using org-mode tables as data

You saw that org-mode has tables. They are very powerful and can even be used as a spreadsheet. Let's see how they can be used as input to C++ code.

Let's start by giving a name to our table:

#+tblname: somedata
|  sqr | noise |
|------+-------|
|  0.0 |  0.23 |
|  1.0 |  1.31 |
|  4.0 |  4.61 |
|  9.0 |  9.05 |
| 16.0 | 16.55 |


Now, we can refer to it from org-babel using variables:

#+header: :exports results :includes <iostream>
#+begin_src C++ :var somedata=somedata :var somedata_cols=2 :var somedata_rows=5
for (int i=0; i<somedata_rows; i++)
{
for (int j=0; j<somedata_cols; j++)
{
std::cout << somedata[i][j]/2.0 << "\t";
}
std::cout << "\n";
}
#+end_src

#+RESULTS:
|   0 | 0.115 |
| 0.5 | 0.655 |
|   2 | 2.305 |
| 4.5 | 4.525 |
|   8 | 8.275 |


## Tangle when happy

Once you have scripted your solution to save the world from hunger5, you may want to get it out from org-mode and upload it to github6. You can of course copy and paste the code, but since you are using a superior editor, you can ask it to do that7 for you. Just tell it the name of your file using the tangle option:

#+begin_src C++ :tangle MyApp.cpp :main no :includes <iostream>
namespace ns {
class MyClass {};
}

int main(int argc, char* argv[])
{
ns::MyClass{};
std::cout << "hi" << std::endl;
return 0;
}
#+end_src


If you the call org-babel-tangle from this block, it will generate a file with the right name containing this:

#include <iostream>
namespace ns {
class MyClass {};
}

int main(int argc, char* argv[])
{
ns::MyClass{};
std::cout << "hi" << std::endl;
return 0;
}


## Conclusion

Since reproducible research in emacs has entered the realm of scientific journals8, I will end as follows:

In this article, we have shown the feasibility of coding in C++ without writing much boilerplate and going through the edit/compile/run loop. With the advent of modern C++ standards and richer standard libraries, C++ is a good candidate to quickly write throw-away code. The integration of C++ in org-babel gets C++ even closer to the tasks for which scripting languages are used for.

Future work will include learning to use the C++ standard library (data structures and algorithms), browsing the catalog of BOOST libraries, trying to get good C++ developers to use emacs and C++11, and find a generic variadic-template-based solution to world hunger.

## Footnotes:

1

Ben Klemens, Modeling with Data: Tools and Techniques for Scientific Computing, Princeton University Press, 2009, ISBN: 9780691133140.

2

Well, this is just trolling, since Java is in the list of supported languages, and I guess that you can use gcc for Objective-C.

3

This may happen if you are writing code for a nuclear power plant.

4

To guide nuclear missiles towards the nuclear power plant of the evil guys, for instance.

5

Just send some nuclear bombs to those who are hungry?

6

Isn't that cool, to be on the same cloud as all those Javascript programmers?

7

But tell it to make coffee first.

8

Eric Schulte, Dan Davison, Tom Dye, Carsten Dominik. A Multi-Language Computing Environment for Literate Programming and Reproducible Research Journal of Statistical Software http://www.jstatsoft.org/v46/i03

18 Mar 2015

# Data science in C?

Coursera is offering a Data Science specialization taught by professors from Johns Hopkins. I discovered it via one of their courses which is about reproducible research. I have been watching some of the video lectures and they are very interesting, since they combine data processing, programming and statistics.

The courses use the R language which is free software and is one the reference tools in the field of statistics, but it is also very much used for other kinds of data analysis.

While I was watching some of the lectures, I had some ideas to be implemented in some of my tools. Although I have used GSL and VXL1 for linear algebra and optimization, I have never really needed statistics libraries in C or C++, so I ducked a bit and found the apophenia library2, which builds on top of GSL, SQLite and Glib to provide very useful tools and data structures to do statistics in C.

Browsing a little bit more, I found that the author of apophenia has written a book "Modeling with data"3, which teaches you statistics like many books about R, but using C, SQLite and gnuplot.

This is the kind of technical book that I like most: good math and code, no fluff, just stuff!

The author sets the stage from the foreword. An example from page xii (the emphasis is mine):

" The politics of software

All of the software in this book is free software, meaning that it may be freely downloaded and distributed. This is because the book focuses on portability and replicability, and if you need to purchase a license every time you switch computers, then the code is not portable. If you redistribute a functioning program that you wrote based on the GSL or Apophenia, then you need to redistribute both the compiled final program and the source code you used to write the program. If you are publishing an academic work, you should be doing this anyway. If you are in a situation where you will distribute only the output of an analysis, there are no obligations at all. This book is also reliant on POSIX-compliant systems, because such systems were built from the ground up for writing and running replicable and portable projects. This does not exclude any current operating system (OS): current members of the Microsoft Windows family of OSes claim POSIX compliance, as do all OSes ending in X (Mac OS X, Linux, UNIX,…)."

Of course, the author knows the usual complaints about programming in C (or C++ for that matter) and spends many pages explaining his choice:

"I spent much of my life ignoring the fundamentals of computing and just hacking together projects using the package or language of the month: C++, Mathematica, Octave, Perl, Python, Java, Scheme, S-PLUS, Stata, R, and probably a few others that I’ve forgotten. Albee (1960, p 30)4 explains that “sometimes it’s necessary to go a long distance out of the way in order to come back a short distance correctly;” this is the distance I’ve gone to arrive at writing a book on data-oriented computing using a general and basic computing language. For the purpose of modeling with data, I have found C to be an easier and more pleasant language than the purpose-built alternatives—especially after I worked out that I could ignore much of the advice from books written in the 1980s and apply the techniques I learned from the scripting languages."

The author explains that C is a very simple language:

" Simplicity

C is a super-simple language. Its syntax has no special tricks for polymorphic operators, abstract classes, virtual inheritance, lexical scoping, lambda expressions, or other such arcana, meaning that you have less to learn. Those features are certainly helpful in their place, but without them C has already proven to be sufficient for writing some impressive programs, like the Mac and Linux operating systems and most of the stats packages listed above."

And he makes it really simple, since he actually teaches you C in one chapter of 50 pages (and 50 pages counting source code is not that much!). He does not teach you all C, though:

"As for the syntax of C, this chapter will cover only a subset. C has 32 keywords and this book will only use 18 of them."

At one point in the introduction I worried about the author bashing C++, which I like very much, but he actually gives a good explanation of the differences between C and C++ (emphasis and footnotes are mine):

"This is the appropriate time to answer a common intro-to-C question: What is the difference between C and C++? There is much confusion due to the almost-compatible syntax and similar name—when explaining the name C-double-plus5, the language’s author references the Newspeak language used in George Orwell’s 1984 (Orwell, 19496; Stroustrup, 1986, p 47). The key difference is that C++ adds a second scope paradigm on top of C’s file- and function-based scope: object-oriented scope. In this system, functions are bound to objects, where an object is effectively a struct holding several variables and functions. Variables that are private to the object are in scope only for functions bound to the object, while those that are public are in scope whenever the object itself is in scope. In C, think of one file as an object: all variables declared inside the file are private, and all those declared in a header file are public. Only those functions that have a declaration in the header file can be called outside of the file. But the real difference between C and C++ is in philosophy: C++ is intended to allow for the mixing of various styles of programming, of which object-oriented coding is one. C++ therefore includes a number of other features, such as yet another type of scope called namespaces, templates and other tools for representing more abstract structures, and a large standard library of templates. Thus, C represents a philosophy of keeping the language as simple and unchanging as possible, even if it means passing up on useful additions; C++ represents an all-inclusive philosophy, choosing additional features and conveniences over parsimony."

It is actually funny that I find myself using less and less class inheritance and leaning towards small functions (often templates) and when I use classes, it is usually to create functors. This is certainly due to the influence of the Algorithmic journeys of Alex Stepanov.

## Footnotes:

1

Actually the numerics module, vnl.

2

Apophenia is the experience of perceiving patterns or connections in random or meaningless data.

3

Ben Klemens, Modeling with Data: Tools and Techniques for Scientific Computing, Princeton University Press, 2009, ISBN: 9780691133140.

4

Albee, Edward. 1960. The American Dream and Zoo Story. Signet.

6

Orwell, George. 1949. 1984. Secker and Warburg.

7

Stroustrup, Bjarne. 1986. The C++ Programming Language. Addison-Wesley.

04 Feb 2015

# expert.py

L'autre jour je discutais avec mon ami Gilles sur la frustration ressentie face à des services clients qui n'en sont pas. Que ce soient des avatars appelés conseillers virtuels ou des humains qui suivent une procédure préétablie pour essayer de vous donner une réponse à une question posée, il me disait que c'est pareil. Et qu'en fait, il vaudrait mieux qu'il n'y ait pas d'humain dans le système, car ça doit être désagréable pour eux de recevoir des remarques méprisantes voire des insultes de la part des clients.

Notre conversation a dérivé vers l'idée générale d'automatisation du travail pour enfin arriver au constat que les possibilités ne se limitent pas à des tâches répétitives pour lesquelles des procédures explicites peuvent être établies. Ce qui, il y a quelques années, était encore considéré comme de la science fiction ou des utilisations de l'intelligence artificielle qui restaient au niveau de la démonstration, est en train de devenir presque banal.

Ceux qui utilisent les services de Google, Twitter ou Facebook sont maintenant habitués à la reconnaissance faciale sur les photos ou l'interprétation automatique du langage naturel pour proposer des publicités ciblées.

Plus insidieux encore est le filtrage des messages reçus (priority inbox dans Gmail ou sur les flux de messages Facebook ou Twitter) qui ne présentent que ce qui est censé les intéresser en fonction d'un ensemble de critères qui leur conviennent mais qu'ils n'ont pas choisi explicitement. Cela a rappelé Gilles des idées qu'il avait eues il y a quelques années quand il participait à des commissions de choix où siégeaient des experts. Je ne dévoilerai de quoi s'agissait-il dans ces commissions, car ce qui vient pourrait être considéré comme un manque de professionnalisme de la part des experts mais aussi de Gilles.

Je n'ai pas les éléments pour défendre les experts, mais à la décharge de mon ami Gilles, je dirais qu'il était jeune et qu'il ne se rendait pas compte des enjeux de haute importance qui y étaient traités. Dans ces commissions il s'agissait d'évaluer des dossiers. Beaucoup de dossiers en peu de temps. Souvent, le panel d'experts était composé de vrais experts, très bons, mais pas du domaine concerné. Apparemment, ceci n'était pas considéré comme un problème par les méta-experts qui organisaient les commissions.

Malheureusement, cette situation transformait les évaluations en une sorte de mélange de loterie combinée à l'inquisition où chaque expert essayait de s'en sortir sans être trop ridicule.

Au bout que quelques commissions, Gilles croyait être capable d'anticiper les réactions de la plupart des experts (ils étaient des experts professionnels et donc souvent les mêmes). La quantité de dossiers à évaluer poussait les experts à faire très vite et malheureusement à bâcler le travail. La plupart fonctionnaient en pilotage automatique. Ceci désespérait Gilles. J'ai d'autres amis qui auraient eu envie de frapper certains des experts, mais l'ami dont il est question ici n'est pas un garçon violent. Il est plutôt passionné d'informatique et donc sa frustration vis-à-vis des experts s'est traduite par une envie de les remplacer par des algorithmes. En séance, il regardait les experts et voyait des logigrammes.

La frustration augmentant, ces logigrammes sont devenus du pseudo-code. Et de là, il n'a pas pu s'empêcher de franchir le pas et de coder chaque expert en séance. Il a fait de l'eXtreme programming en temps réel. Il est comme ça, mon ami : ce sont les méthodes à Gilles.

Certains des experts étaient parfaitement clairs ordonnés et élégants. Ils les remplaçait par un script Python. D'autres faisaient beaucoup attention à la forme et peu au fond des dossiers, ils devenaient des classes Ruby.

Certains, on le voyait, avaient été comme cela, mais appartenaient à une autre époque. Ils méritaient du Java. D'autres étaient, malgré tout efficaces, mais parfois difficiles à suivre. Il leur accordait du C++. D'autres étaient rapidement débordés dès qu'ils avaient plusieurs dossiers à traiter. Ils finissaient en script Matlab.

D'autres, avaient une telle mémoire des dossiers qu'ils avaient traités pendant des années, qu'il fallait les remplacer par du SQL. D'autres étaient incompréhensibles, désordonnés, embrouillés. Ils méritaient à peine un peu de Perl, voir de l'awk et il avait envie de les envoyer vers /dev/null.

Il était déçu de ne jamais avoir pu se servir du Lisp, qu'il réservait pour le jour où il croiserait un expert qui se conduirait de façon élégante, intelligente et iconoclaste, mais juste.

Gilles a beaucoup mûri professionnellement et a fait reconnaître ses compétences. D'ingénieur junior il est passé architecte, puis chef de projet. Il a récemment atteint la sublimation et a été nommé responsable technico-commercial.

19 Aug 2012

# Emacs Lisp as a general purpose language?

## I live in Emacs

As you may already know if you read this blog (yes, you, the only one who reads this blog …) I live inside Emacs. I have been an Emacs user for more than 12 years now and I do nearly everything with it:

• org-mode for organization and document writing (thanks to the export functionalities),
• gnus for e-mail (I used to use VM before),
• ERC and identica-mode for instant messaging and micro-blogging (things that I don't do much anymore),
• this very post is written in Emacs and automatically uploaded to my blog using the amazing org2blog package,
• dired as a file manager,
• and of course all my programming tasks.

This has been a progressive shift from just using Emacs as a text editor to using it nearly as an operating system. An operating system with a very decent editor, despite of what some may claim!. It is not a matter of totalitarism or aesthetics, but a matter of efficiency: the keyboard is the most efficient user input method for me, so being able to use the same keyboard shortcuts for programming, writing e-mails, etc. is very useful.

## Looking for a Lisp

In a previous post I wrote about my thoughts on choosing a programming language for a particular project at work. My main criteria for the choice were:

• availability of a REPL;
• easy OTB access;
• integrated concurrent programming constructs.

## But what are the real needs?

1. Availability of a REPL: well, Emacs not only has a REPL (M-x ielm), but any Emacs lisp code in any buffer can be evaluated on the fly (C-j on elisp buffers or C-x C-e in any other buffer). Actually, you can see Emacs, not as an editor, but as some sort of Smalltalk-like environment for Lisp which gets to be enriched and modified on the fly. So this need is more than fulfilled.
2. Easy access to OTB: well, there are no bindings for Emacs lisp, but given the fact that the OTB applications are available on the command line, they can be easily wrapped with system calls which can even be asynchronous. The use of the querying facilities of the OTB application engine, should make easy this wrapping. So the solution is not so straightforward as using the Java bindings on Clojure, but seems feasible without a huge effort.
3. Concurrency: this one hurts, since Emacs lisp does not have threads, so I don't think it is possible to do concurrency at the Lisp level. However, since much of the heavy processing is done at the OTB level, OTB multi-threading capabilities allow to cope with this.

For those who think that Emacs Lisp is only useful for text manipulation, a look at the Common Lisp extensions and to the object oriented CLOS-compatible library EIEIO may be useful.

## So why not Emacs lisp then?

The main problem I see for the use of Emacs Lisp is that I have been unable to find examples of relatively large programs using it for other things than extending Emacs. There are of course huge tools like org-mode and Gnus, but they have been designed to run interactively inside Emacs. There are other more technical limitations of Emacs Lisp which can make the development of large applications difficult, as for instance the lack of namespaces, and other little things like that. However, there are very cool things that could be done to make OTB development easier using Emacs. For instance, one thing which can be tedious when testing OTB applications on the command line is setting the appropriate parameters or even remembering the list of the available ones. It would nice to have an Emacs mode able to query the application registry and auto-complete or suggest the available options. This could consist of a set of Emacs Lisp functions using the comint mode to interact with the shell and the OTB application framework. This would also allow to parse the documentation strings for the different applications and present them to the user in a different buffer. Once we have that, it should be possible to build a set of macros to set-up a pipeline of applications and generate an Emacs Lisp source file to be run later on (within another instance of Emacs or as an asynchronous process). This would build a development environment for application users as opposed to a development environment for application developers. And since Lisp code is data, the generates Emacs Lisp code, could be easily read again and used as a staring point for other programs, but better than that, a graphical representation of the pipeline could be generated (think of Tikz or GraphViz). Yes I know, talk is cheap, show me the code, etc. I agree.

## Conclusion

Clojure has a thriving community and new libraries appear every week to address very difficult problems (core.logic, mimir, etc.). Added to that, all the Java libraries out there are at your fingertips. And this is very difficult to beat. I hear the criticisms about Clojure: it's just a hype, it's mainly promoted by a company, etc. Yes, I would like to be really cool and use Scheme or Haskell, but pragmatism counts. And, anyway, the programming language you are thinking of is not good enough either.

17 Nov 2011

# Hack OTB on the Clojure REPL

## The WHY and the WHAT

Lately, I have been playing with several configurations in order to be able to fully exploit the ORFEO Toolbox library from Clojure.

As explained in previous posts, I am using OTB through the Java bindings. Recently, the folks of the OTB team have made very cool things – the application wrappers – which enrich even further the ways of using OTB from third party applications. The idea of the application wrappers is that when you code a processing chain, you may want to use it on the command line, with a GUI or even as a class within another application.

What Julien and the others have done is to build an application engine which given a pipeline and its description (inputs, outputs and parameters) is able to generate a CLI interface and a Qt-based GUI interface. But once you are there, there is no major problem (well, if your name is Julien, at least) to generate a SWIG interface which will then allow to build bindings for the different languages supported by SWIG. This may seem a kitchen sink – usine à gaz in French – and actually it is if you don't pay attention (like me).

The application wrappers use advanced techniques which require a recent version of GCC. At least recent from a Debian point of view (GCC 4.5 or later). On the other hand, the bindings for the library (the OTB-Wrapping package) use CableSwig, which in turn uses its own SWIG interfaces. So if you want to be able to combine the application wrappers with the library bindings, odds are that you may end up with a little mess in terms of compiler versions, etc.

And this is exactly what happened to me when I had to build GCC from source in Debian Squeeze (no GCC 4.5 in the backports yet). So I started making some experiments with virtual machines in order to understand what was happening. Eventually I found that the simplest configuration was the best to sort things out. This is the HOW-TO for having a working installation of OTB with application wrappers, OTB-Wrapping and access everything from a Clojure REPL.

## The HOW-TO

### Installing Arch Linux on a virtual machine

I have chosen Arch Linux because with few steps you can have a minimal Linux install. Just grab the net-install ISO and create a new virtual machine on VirtualBox. In terms of packages just get the minimal set (the default selection, don't add anything else). After the install, log in as root, and perform a system update with:

    pacman -Suy


If you used the net-install image, there should be no need for this step, but in case you used an ISO with the full system, it's better to make sure that your system is up to date. Then, create a user account and set up a password if you want to:

    useradd -m jordi
passwd jordi


Again, this is not mandatory, but things are cleaner this way, and you may want to keep on using this virtual machine. Don't forget to add your newly created user to the sudoers file:

    pacman -S sudo
vi /etc/sudoers


You can now log out and log in with the user name you created. Oh, yes, I forgot to point out that you don't have Xorg, or Gnome or anything graphical. Even the mouse is useless, but since you are a real hacker, you don't mind.

### Install the packages you need

Now the fun starts. You are going to keep things minimal and only install the things you really need.

    sudo pacman -S mercurial make cmake gcc gdal openjdk6 \
mesa swig cvs bison links unzip rlwrap


Yes, I know, Mesa (OpenGL) is useless without X, but we want to check that everything builds OK and OTB uses OpenGL for displaying images.

• Mercurial is needed to get the OTB sources, cvs is needed for getting CableSwig
• Make, cmake and gcc are the tools for building from sources
• Swig is needed for the binding generation, and bison is needed by CableSwig
• OpenJDK is our Java version of choice
• Links will be used to grab the Clojure distro, unzip to extract the corresponding jar file and rlwrap is used by the Clojure REPL.

And that's all!

### Get the source code for OTB and friends

I prefer using the development version of OTB, so I can complain when things don't work.

    hg clone http://hg.orfeo-toolbox.org/OTB
hg clone http://hg.orfeo-toolbox.org/OTB-Wrapping


CableSwig is only available through CVS.

    cvs -d :pserver:anonymous@pubilc.kitware.com:/cvsroot/CableSwig \
co CableSwig


### Start building everything

We will compile everything on a separate directory:

    mkdir builds
cd builds/


#### OTB and OTB-Wrapping

We create a directory for the OTB build and configure with CMake

    mkdir OTB
cd OTB
ccmake ../../OTB


Don't forget to set to ON the application build and the Java wrappers. Then just make (literally):

    make


By the way, I have noticed that the compilation of the application engine can blow up your gcc if you don't allocate enough RAM for your virtual machine. At this point, you should be able to use the Java application wrappers. But we want also the library bindings so we gon on. We can now build CableSwig which will be needed by OTB-Wrapping. Same procedure as before:

    cd ../
mkdir CableSwig
cd CableSwig/
ccmake ../../CableSwig/
make


And now, OTB-Wrapping. Same story:

    cd ../
mkdir OTB-Wrapping
cd OTB-Wrapping/
ccmake ../../OTB-Wrapping/


In the cmake configuration, I choose only to build Java, but even in this case a Python interpreter is needed. I think that CableSwig needs it to generate XML code to represent the C++ class hierarchies. If you did not install Python explicitly in Arch, you will have by default a 2.7 version. This is OK. If you decided to install Python with pacman, you will have both, Python 2.7 and 3.2 and the default Python executable will point to the latter. In this case, don't forget set the PYTHON\EXECUTABLE in CMake to /usr/bin/python2. Then, just make and cd to your home directory.

    make
cd


And you are done. Well not really. Right now, you can do Java, but what's the point? You might as well use the C++ version, right?

### Land of Lisp

Since Lisp is the best language out there, and OTB is the best remote sensing image processing software (no reference here, but trust me), we'll do OTB in Lisp.

#### PWOB

You may want to get some examples that I have gathered on Bitbucket.

    mkdir Dev
cd Dev/


PWOB stands for Playing With OTB Bindings. I have only put there 3 languages which run on the JVM for reasons I stated in a previous post. You will of course avoid the plain Java ones. I have mixed feelings about Scala. I definetly love Clojure since it is a Lisp.

#### Get Clojure

The cool thing about this Lisp implementation is that it is contained into a jar file. You can get it with the text-based web browser links:

    links http://clojure.org


Go to the download page and grab the latest release. It is a zip file which contains, among other things the needed jar. You can unzip the file:

    mkdir src
mv clojure-1.3.0.zip src/
cd src/
unzip clojure-1.3.0.zip


Copy the jar file to the user .clojure dir:

    cd clojure-1.3.0
mkdir ~/.clojure
mv clojure-1.3.0.jar ~/.clojure/


Make a sym link so we have a clojure.jar:

    ln -s /home/inglada/.clojure/clojure-1.3.0.jar /home/inglada/.clojure/clojure.jar


And clean up useless things

    cd ..
rm -rf clojure-1.3.0*


#### Final steps before hacking

Some final minor steps are needed before the fun starts. You may want to create a file for the REPL to store completions:

    touch ~/.clj_completions


In order for Clojure to find all the jars and shared libraries, you have to define some environment variables. You may choose to set them into your .bashrc file:

    export LD_LIBRARY_PATH="~/builds/OTB-Wrapping/lib/:~/builds/OTB/bin/"


PWOB provides a script to run a Clojure REPL with everything set up:

    cd ~/Dev/pwob/Clojure/src/Pwob
./otb-repl.sh


Now you should see something like this:

    Clojure 1.3.0
user=>


Welcome to Lisp! If you want to use the applications you can for instance do:

    (import '(org.otb.application Registry))
(def available-applications (Registry/GetAvailableApplications))


In the PWOB source tree you will find other examples.

## Final remarks

Using the REPL is fun, but you will soon need to store the lines of code, test things, debug, etc. In this case, the best choice is to use SLIME, the Superior Lisp Interaction Mode for Emacs. There are many tutorials on the net on how to set it up for Clojure. Search for it using DuckDuckGo. In the PWOB tree (classpath.clj) you will find some hints on how to set it up for OTB and Clojure. A simpler config for Emacs is to use the inferior lisp mode, for which I have also written a config (otb-clojure-config.el). I may write a post some day about that. Have fun!

20 May 2011

# Lambda Omega Lambda

In my current research work, I am investigating the possibility of integrating domain expert knowledge together with image processing techniques (segmentation, classification) in order to provide accurate land cover maps from remote sensing image time series. When we look at this in terms of tools (software) there are a set of requirements which have to be met in order to be able to go from toy problems to operational processing chains. These operational chains are needed by scientists for carbon and water cycle studies, climate change analysis, etc. In the coming years, the image data needed for this kind of near-real-time mapping will be available through Earth observation missions like ESA's Sentinel program. Briefly, the constraints that the software tools for this task have are the following.

1. Interactive environment for exploratory analysis
2. Concurrent/parallel processing
3. Availability of state of the art, validated image processing algorithms
4. Symbolic, semantic information processing

This long post / short article describes some of the thoughts I have and some of the conclusions I have come to after a rather long analysis of the tools available.

## Need for a repl

The repl acronym stands for Read-Evaluate-Print-Loop and I think has its origins in the first interactive Lisp systems. Nowadays, we call this an interactive interpreter. People used to numerical and/or scientific computing will think about Matlab, IDL, Octave, Scilab and what not. Computer scientists will see here the Python/Ruby/Tcl/etc. interpreters. Many people use nowadays Pyhton+Numpy or Scipy (now gathered in Pylab) in order to have something similar to Matlab/IDL/Scilab/Octave, but with a general purpose programming language. What a repl allows is to interactively explore the problem at hand without going through the cycle of writing the code, compiling, running etc. The simple script that I wrote for the OTB blog using OTB's Python bindings could be typed on the interpreter and bring together OTB and Pylab.

## Need for concurrent programming

Without going into the details and differences between parallel and concurrent programming, it seems clear that Moore's law can only continue to hold through multi-core architectures. In terms of low-level (pixel-wise) processing, OTB provides an interesting solution (multi-threaded execution of filters by dividing images into chunks). This approach can be generalized to GPUs. However, sometimes an algorithm needs to operate on the whole image because the image splitting affects the results. This is typically the case for Markovian approaches to filtering or methods for image segmentation. For this cases, one way to speed up things is to process several images in parallel (if the memory footprint allows that!). On way of trying to maximize the use of all available computing cores in Python is using the multiprocessing module which allows to deal with a pool of threads. One example would be as follows: However, this does not allow for easy inter-thread communication which is not needed in the above example, but can be very useful if the different processes are working on the same image: imagine a multi-agent system where classifiers, algorithms for biophysical parameter extraction, data assimilation techniques, etc. work together to produce an accurate land cover map. They may want to communicate in order to share information. As far as I understand, Python has some limitations due to the global interpreter lock. Some languages as for instance Erlang offer appropriate concurrency primitives for this. I have played a little bit with them in my exploration of 7 languages in 7 weeks. Unfortunately, there are no OTB bindings for Erlang. Scala has copied the Erlang actors, but I didn't really got into the Scala thing.

## Need for OTB access

This one here shouldn't need much explanation. Efficient remote sensing image processing needs OTB. Period. I am sorry. I am rather biased on that! I like C++ and I have no problem in using it. But there is no repl, one needs several lines of typedef before being able to use anything. This is the price to pay in order to have a good static checking of the types before running the problem. And it's damned fast! We have Python bindings which allows us to have a clean syntax, like in the pipeline example of the OTB tutorials. However, the lack of easy concurrency is a bad point for Python. Also, the lack of Artificial Intelligence frameworks for Python is an anti-feature. Java has them, but Java has no repl and look at its syntax. It's worse than C++. You have all these mangled names which were clean in C++ and Python and become things like otbImageFileReaderIUS2. Scala, thanks to its interoperability with Java (Scala runs in the JVM), can use OTB bindings. Actually, we have a cleaner syntax than Java's: There is still the problem of the mangled names, but with some pattern matching or case classes, this should disappear. So Scala seems a good candidate. It has:

1. Python-like syntax (although statically typed)
2. Concurrency primitives
3. A repl

Unfortunately, Scala is not a Lisp. Bear with me.

## A Lisp would be nice

I want to build an expert system, it would be nice to have something for remote sensing similar to Wolfram Alpha. We could call it Ω Toolbox and keep the OTB name (or close). Why Lisp? Well I am not able to explain that here, but you can read P. Norvig's or P. Graham's essays on the topic. If you have a look at books like PAIP or AIMA, or systems like LISA, CLIPS or JESS, they are either written in Lisp or the offer a Lisp-like DSLs. I am aware of implementations of the AIMA code in Python, and even P. Norvig himself has reasons to have migrated from Lisp to Python, but as stated above, Python seems to be out of the game for me. The code is data philosophy of Lisp is, as far as I understand it, together with the repl tool, one of the main assets for AI programming. Another aspect which is also important is the functional programming paradigm used in Lisp (even though other programming paradigms are also available in Lisp). Concurrency is the main reason for the upheaval of functional languages in recent years (Haskell, for instance). Even though I (still) don't see the need for pure functional programming for my applications, lambda calculus is elegant and interesting. Maybe λ Toolbox should be a more appropriate name?

## Clojure

If we recap the needs:

1. A repl (no C++, no Java)
2. Concurrency (no Python)
3. OTB bindings available (no Erlang, no Haskell, no Ruby)
4. Lisp (none of the above)

there is one single candidate: Clojure. Clojure is a Lisp dialect which runs on the JVM and has nice concurrency features like inmutability and STM and agents. And by the way, OTB bindings work like a charm: Admittedly, the syntax is less beautiful than Python's or Scala's, but (because!) it's a Lisp. And it's a better Java than Java. So you have all the Java libs available, and even Clojure specific repositories like Clojars. A particular interesting project is Incanter which provides is a Clojure-based, R-like platform for statistical computing and graphics. Have a look at this presentation to get an overview of what you can do at the Clojure repl with that. If we bear in mind that in Lisp code is data and that Lisp macros are mega-powerful, one could imagine writing things like:

    (make-otb-pipeline reader gradientFilter thresholdFilter writer)


Or even emulating the C++ template syntax to avoid using the mangled names of the OTB classes in the Java bindings (using macros and keywords):

    (def filter (RescaleIntensityImageFilter :itk (Image. : otb :Float 2)
(Image. : otb :UnsignedChar 2)))


    (def filter (itkRescaleIntensityImageFilterIF2IUC2.))


I have already found a cool syntax for using the Setters and Getters of a filter using the doto macro (see line 19 in the example below):

## Conclusion

I am going to push further the investigation of the use of Clojure because is seems to fit my needs:

1. Has an interactive interpreter
3. Concurrency primitives (agents, STM, etc.)
4. It's a lisp, so I can easily port existing rule-based expert systems.

Given the fact that this would be the sum of many cool features, I think I should call it Σ Toolbox, but I don't like the name. The mix of λ calculus and latex Ω Toolbox, should be called λΩλ, which is LoL in Greek.

19 Apr 2011

# Functional Python

Small things can make you smile when you have the Aha! moment. It seems that these few days of 7li7w are starting to have real effects in the way I think when programming. Today, I was faced to a simple but annoying problem. I had about 60 Formosat 2 acquisitions that I needed to use. These images have been pre-processed (thank Olivier!) and, because of historical reasons they are presented like this:

    Sudouest_20060206_MS_fmsat_ortho_surf_8m.B1
Sudouest_20060206_MS_fmsat_ortho_surf_8m.B2
Sudouest_20060206_MS_fmsat_ortho_surf_8m.B3
Sudouest_20060206_MS_fmsat_ortho_surf_8m.B4
Sudouest_20060206_MS_fmsat_ortho_surf_8m.hdr


That is, a single ENVI header file and the 4 bands in one file each. And the same thing for every single acquisition date. I wanted to have the list of all the acquisition dates (the 20060206 above). Since this was not a one shot operation, I wanted something a little bit more generic than opening the directory containing the images in an Emacs Dired buffer and copying and pasting the dates, so I have used Python. The thing is solved in a single line of code:

    dates = [fil.split('_')[1] for fil in os.listdir(imageDir)
if(fil.split('.')[-1]=="hdr")]


I use a list comprehension, where for each file in the imageDir directory (os.listdir), if the extension is hdr I keep the second chunk of the name of the file when _ is used as a separator.

16 Apr 2011

# 7 languages in 7 weeks

Some months ago I found a book of the Pragmatic Bookshelf which seemed intriguing to me: Seven Languages in Seven Weeks: A Pragmatic Guide to Learning Programming Languages by Bruce A. Tate.

I say that it was intriguing, because I had read not long before a very interesting essay by Peter Norvig entitled Teach Yourself Programming in Ten Years were he is very critical of the current tendency to be in a rush for learning any new skill and mostly for computer programming languages.

Since I was very much persuaded by P. Norvig's arguments, the 7li7w book upset me a little bit: I am a big fan of the Pragmatic Programmers and I couldn't believe that they were going this road! But I finally picked the book and went through quickly and found it very appealing. B. Tate, the author is very clear from the beginning: you can not pretend to learn all those languages in a week each.

The goal of the book is to present the things that make each one of these languages interesting (the paradigms, the typing, etc.) and make the reader feel the strengths and weaknesses of each one of them. Unfortunately, I didn't go further on the exploration and I forgot about the book. Several weeks ago, I found the book unexpectedly looking at me and I decided to give it a try.

The first chapter about Ruby went smoothly. I have been using Python since early 2000 and last year I played a little bit with Squeak, a version of Smalltalk. I find Ruby to be somewhere in the middle of these two languages.

When I started the chapter on Io, I realized that I had to stretch my neurons a little bit more. I started to fear that the motivation could disappear soon, so I asked for help. I e-mailed Emmanuel Christophe (a former work colleague with whom we have sometimes fun together programming and talking about crazy ideas) in order to ask him to monitor my work on the book.

As I should have expected, he proposed to join me since he also had started to read the book and never finished. So we did Io, Prolog and we are now on Scala. The source code of our project is hosted on bitbucket and available for anyone who may be interested.

26 Feb 2011

# HDF4 support for GDAL on Arch Linux

I have been trouble reading HDF files with OTB on Arch Linux for a while. I finally took the time to investigate this problem and come to a solution. At the beginning I was misled by the fact that I was able to open a HDF5 file with Monteverdi on Arch. But I finally understood that the GDAL pre-compiled package for Arch was only missing HDF4 support. It is due to the fact that HDF4 depends on libjpeg version 6, and is incompatible with the standard current version of libjpeg on Arch. So the solution is to install libjpeg6 and HDF4 from the AUR and then regenerate the gdal package, who, during the configuration phase will automatically add HDF4 support. Here are the detailed steps I took:

## Install libjpeg6 from AUR:

mkdir ~/local/packages
cd ~/local/packages
wget http://aur.archlinux.org/packages/libjpeg6/libjpeg6.tar.gz
tar xzf libjpeg6.tar.gz
cd libjpeg6
makepkg -i #will prompt for root psswd for installation


## Install HDF4 from AUR:

cd ~/local/packages
wget [[http://aur.archlinux.org/packages/hdf4-nonetcdf/hdf4-nonetcdf.tar.gz]]
tar xzf hdf4-nonetcdf.tar.gz
cd hdf4-nonetcdf
makepkg -i #will prompt for root psswd for installation


## Setup an Arch Build System build tree

sudo pacman -S abs
sudo abs
mkdir ~/local/abs


## Compile gdal

cp -r /var/abs/community/gdal ~/local/abs
makepkg -s #generated the package without installing it
sudo pacman -U gdal-1.8.0-2-x86_{64}.pkg.tar.gz


If you are new to using AUR and makepkg, please note that the AUR package installation need the sudo package to be installed first (the packages are build as a non root user and sudo is called by makepkg). Step 3 above is only needed if you have never set up the Arch Build System on your system.