Jordi Inglada

Posts tagged "cpp":

13 May 2015

A simple thread pool to run parallel PROSail simulations

In the otb-bv we use the OTB versions of the Prospect and Sail models to perform satellite reflectance simulations of vegetation.

The code for the simulation of a single sample uses the ProSail simulator configured with the satellite relative spectral responses, the acquisition parameters (angles) and the biophysical variables (leaf pigments, LAI, etc.):

ProSailType prosail;
prosail.SetRSR(satRSR);
prosail.SetParameters(prosailPars);
prosail.SetBVs(bio_vars);
auto result = prosail();

A simulation is computationally expensive and it would be difficult to parallelize the code. However, if many simulations are going to be run, it is worth using all the available CPU cores in the machine.

I show below how using C++11 standard support for threads allows to easily run many simulations in parallel.

Each simulation uses a set of variables given in an input file. We parse the sample file in order to get the input parameters for each sample and we construct a vector of simulations with the appropriate size to store the results.

otbAppLogINFO("Processing simulations ..." << std::endl);
auto bv_vec = parse_bv_sample_file(m_SampleFile);
auto sampleCount = bv_vec.size();
otbAppLogINFO("" << sampleCount << " samples read."<< std::endl);
std::vector<SimulationType> simus{sampleCount};

The simulation function is actually a lambda which will sequentially process a sequence of samples and store the results into the simus vector. We capture by reference the parameters which are the same for all simulations (the satellite relative spectral responses satRSR and the acquisition angles in prosailPars):

auto simulator = [&](std::vector<BVType>::const_iterator sample_first,
                     std::vector<BVType>::const_iterator sample_last,
                     std::vector<SimulationType>::iterator simu_first){
  ProSailType prosail;
  prosail.SetRSR(satRSR);
  prosail.SetParameters(prosailPars);
  while(sample_first != sample_last)
    {
    prosail.SetBVs(*sample_first);
    *simu_first = prosail();
    ++sample_first;
    ++simu_first;
    }
};    

We start by figuring out how to split the simulation into concurrent threads. How many cores are there?

auto num_threads = std::thread::hardware_concurrency();
otbAppLogINFO("" << num_threads << " CPUs available."<< std::endl);

So we define the size of the chunks we are going to run in parallel and we prepare a vector to store the threads:

auto block_size = sampleCount/num_threads;
if(num_threads>=sampleCount) block_size = sampleCount;
std::vector<std::thread> threads(num_threads);

Here, I choose to use as many threads as cores available, but this could be changed by a multiplicative factor if we know, for instance that disk I/O will introduce some idle time for each thread.

An now we can fill the vector with the threads that will process every block of simulations :

auto input_start = std::begin(bv_vec);
auto output_start = std::begin(simus);
for(size_t t=0; t<num_threads; ++t)
  {
  auto input_end = input_start;
  std::advance(input_end, block_size);
  threads[t] = std::thread(simulator,
                           input_start,
                           input_end,
                           output_start);
  input_start = input_end;
  std::advance(output_start, block_size);
  }

The std::thread takes the name of the function object to be called, followed by the arguments of this function, which in our case are the iterators to the beginning and the end of the block of samples to be processed and the iterator of the output vector where the results will be stored. We use std::advance to update the iterator positions.

After that, we have a vector of threads which have been started concurrently. In order to make sure that they have finished before trying to write the results to disk, we call join on each thread, which results in waiting for each thread to end:

std::for_each(threads.begin(),threads.end(),
              std::mem_fn(&std::thread::join));
otbAppLogINFO("" << sampleCount << " samples processed."<< std::endl);
for(const auto& s : simus)
  this->WriteSimulation(s);

This may no be the most efficient solution, nor the most general one. Using std::async and std::future would have allowed not to have to deal with block sizes, but in this solution we can easily decide the number of parallel threads that we want to use, which may be useful in a server shared with other users.

Tags: remote-sensing otb research programming open-source cpp algorithms
25 Mar 2015

Scripting in C++

Introduction

This quote of B. Klemens1 that I used in my previous post made me think a little bit:

"I spent much of my life ignoring the fundamentals of computing and just hacking together projects using the package or language of the month: C++, Mathematica, Octave, Perl, Python, Java, Scheme, S-PLUS, Stata, R, and probably a few others that I’ve forgotten."

Although much less skilled than M. Klemens, I have also wandered around the programming language landscape, out of curiosity, but also with particular needs in mind.

For example, I listed a set of features that I found useful for easily going from prototyping to operational code for remote sensing image processing. The main 3 bullet points were:

  1. OTB accessibility, which meant C++, Python or Java (and its derivatives as Clojure);
  2. Robust concurrent programming, which ruled out Python;
  3. Having a REPL, which led me to say silly things and even develop working code.

Before going the OTB-from-LISP-through-bindings route, I had looked for C++ interpreters to come close to a REPL, some are available, but, at least at the time, I didn't manage to get one running.

Since I started using C++11 a little bit, I find more and more pleasant to write C++ even for small things that I would usually use Python for. Add to that all the new things in the standard library, like threads, regular expressions, tuples and proper random number generators, just to name a few, and here I am wondering how to make the edit/compile/run loop shorter.

B. Klemens proposes crazy things like compiling via cut/paste, but this is too extreme even for me. Since I use emacs, I can compile from within the editor just by hitting C-c C-c which asks for a command to compile. I usually do things like:

export ITK_AUTOLOAD_PATH="" && cd ~/Dev/builds/otb-bv/ && make \
    && ctest -I 8,8 -VV && gnuplot ~/Dev/otb-bv/data/plot_bvMultiT.gpl

I am nearly there, but not yet.

Remember that I discovered Klemens because I was looking for statistical libs after watching some videos of the Coursera Data Science specialization? One of the courses is about reproducible research and they show how you can mix together R code and Markdown to produce executable research papers.

When I saw this R Markdown approach, I thought, well, that's nice, but inferior to org-babel. The main advantage of org-babel with respect to R Markdown or IPython is that in org-mode you can use different languages in the same document.

In a recent research report written in org-mode and exported to PDF, I used, in the same document, Python, bash, gnuplot and maxima. This is a real use case. Of course, these are all interpreted languages, so it is easy to embed them in documents and talk to the interpreter.

Well, 15 years after I started using emacs and more than 6 years after discovering org-mode, I found that compiled languages can be used with org-babel. At least, decent compiled languages, since there is no mention of Objective-C or Java2.

Setting up and testing

So how one does use C++ with org-babel? Just drop this into your emacs configuration:

Yes, don't forget the flag for C++11, since you want to be cool.

Now, let the fun begin. Open an org buffer and write:

#+begin_src C++ :includes <iostream>
std::cout << "Hello scripting world\n";
#+end_src

Go inside the code block and hit C-c C-c and you will magically see this appear just below the code block:

#+RESULTS:
: Hello scripting world

No problem. You are welcome. Good-bye.

If you want to go beyond the Hello world example keep reading.

Using the standard library

You have noticed that the first line of the code block says it is C++ and gives information about the header files to include. If you need to include more than one header file3, just make a list of them (remember, emacs is a LISP machine):

#+begin_src C++ :includes (list "<iostream>" "<vector>")
std::vector<int> v{1,2,3};
for(auto& val : v) std::cout << val << "\n";
#+end_src

#+RESULTS:
| 1 |
| 2 |
| 3 |

Do you see the nice formatted results? This is an org-mode table. More fun with the standard library and tables:

#+begin_src C++ :includes (list "<iostream>" "<string>" "<map>")
std::map<std::string, int> m{{"pizza",1},{"muffin",2},{"cake",3}};
for(auto& val : m) std::cout << val.first << " " << val.second << "\n";
#+end_src

#+RESULTS:
| cake   | 3 |
| muffin | 2 |
| pizza  | 1 |

Using other libraries

Sometimes, you may want to use other libraries than the C++ standard one4. Just give the header files as above and also the flags (here I split the header of the org-babel block in 2 lines for readability):

#+header: :includes (list "<gsl/gsl_fit.h>" "<stdio.h>") 
#+begin_src C++ :exports both :flags (list "-lgsl" "-lgslcblas")
int i, n = 4;
double x[4] = { 1970, 1980, 1990, 2000 };
double y[4] = {   12,   11,   14,   13 };
double w[4] = {  0.1,  0.2,  0.3,  0.4 };

double c0, c1, cov00, cov01, cov11, chisq;

gsl_fit_wlinear (x, 1, w, 1, y, 1, n, 
                 &c0, &c1, &cov00, &cov01, &cov11, 
                 &chisq);

printf ("# best fit: Y = %g + %g X\n", c0, c1);
printf ("# covariance matrix:\n");
printf ("# [ %g, %g\n#   %g, %g]\n", 
        cov00, cov01, cov01, cov11);
printf ("# chisq = %g\n", chisq);

for (i = 0; i < n; i++)
  printf ("data: %g %g %g\n", 
          x[i], y[i], 1/sqrt(w[i]));

printf ("\n");

for (i = -30; i < 130; i++)
  {
  double xf = x[0] + (i/100.0) * (x[n-1] - x[0]);
  double yf, yf_err;

  gsl_fit_linear_est (xf, 
                      c0, c1, 
                      cov00, cov01, cov11, 
                      &yf, &yf_err);

  printf ("fit: %g %g\n", xf, yf);
  printf ("hi : %g %g\n", xf, yf + yf_err);
  printf ("lo : %g %g\n", xf, yf - yf_err);
  }
#+end_src

#+RESULTS:
| #     | best       |    fit: |       Y | = | -106.6 | + | 0.06 | X |
| #     | covariance | matrix: |         |   |        |   |      |   |
| #     | [          |  39602, |   -19.9 |   |        |   |      |   |
| #     | -19.9,     |   0.01] |         |   |        |   |      |   |
| #     | chisq      |       = |     0.8 |   |        |   |      |   |

etc.

Defining things outside main

You may have noticed that we are really scripting like perl hackers, since there is no main() function in our code snippets. org-babel kindly generates all the boilerplate for us. Thank you.

But wait, what happens if I want to define or declare things outside main?

No problem, just tell org-babel not to generate a main():

#+begin_src C++ :includes <iostream> :main no
struct myStr{int i;};

int main()
{
myStr ms{2};
std::cout << "The value of the member is " << ms.i << std::endl;
}
#+end_src

#+RESULTS:
: The value of the member is 2

There are many other options you can play with and I suggest to read the documentation. Just 2 more useful things.

Using org-mode tables as data

You saw that org-mode has tables. They are very powerful and can even be used as a spreadsheet. Let's see how they can be used as input to C++ code.

Let's start by giving a name to our table:

#+tblname: somedata
|  sqr | noise |
|------+-------|
|  0.0 |  0.23 |
|  1.0 |  1.31 |
|  4.0 |  4.61 |
|  9.0 |  9.05 |
| 16.0 | 16.55 |

Now, we can refer to it from org-babel using variables:

#+header: :exports results :includes <iostream>
#+begin_src C++ :var somedata=somedata :var somedata_cols=2 :var somedata_rows=5
for (int i=0; i<somedata_rows; i++)
  {
  for (int j=0; j<somedata_cols; j++) 
    {
    std::cout << somedata[i][j]/2.0 << "\t";
    }
  std::cout << "\n";
  }
#+end_src

#+RESULTS:
|   0 | 0.115 |
| 0.5 | 0.655 |
|   2 | 2.305 |
| 4.5 | 4.525 |
|   8 | 8.275 |

Tangle when happy

Once you have scripted your solution to save the world from hunger5, you may want to get it out from org-mode and upload it to github6. You can of course copy and paste the code, but since you are using a superior editor, you can ask it to do that7 for you. Just tell it the name of your file using the tangle option:

#+begin_src C++ :tangle MyApp.cpp :main no :includes <iostream>
namespace ns {
    class MyClass {};
}

int main(int argc, char* argv[])
{
ns::MyClass{};
std::cout << "hi" << std::endl;
return 0;
}
#+end_src

If you the call org-babel-tangle from this block, it will generate a file with the right name containing this:

#include <iostream>
namespace ns {
    class MyClass {};
}

int main(int argc, char* argv[])
{
ns::MyClass{};
std::cout << "hi" << std::endl;
return 0;
}

Conclusion

Since reproducible research in emacs has entered the realm of scientific journals8, I will end as follows:

In this article, we have shown the feasibility of coding in C++ without writing much boilerplate and going through the edit/compile/run loop. With the advent of modern C++ standards and richer standard libraries, C++ is a good candidate to quickly write throw-away code. The integration of C++ in org-babel gets C++ even closer to the tasks for which scripting languages are used for.

Future work will include learning to use the C++ standard library (data structures and algorithms), browsing the catalog of BOOST libraries, trying to get good C++ developers to use emacs and C++11, and find a generic variadic-template-based solution to world hunger.

Footnotes:

1

Ben Klemens, Modeling with Data: Tools and Techniques for Scientific Computing, Princeton University Press, 2009, ISBN: 9780691133140.

2

Well, this is just trolling, since Java is in the list of supported languages, and I guess that you can use gcc for Objective-C.

3

This may happen if you are writing code for a nuclear power plant.

4

To guide nuclear missiles towards the nuclear power plant of the evil guys, for instance.

5

Just send some nuclear bombs to those who are hungry?

6

Isn't that cool, to be on the same cloud as all those Javascript programmers?

7

But tell it to make coffee first.

8

Eric Schulte, Dan Davison, Tom Dye, Carsten Dominik. A Multi-Language Computing Environment for Literate Programming and Reproducible Research Journal of Statistical Software http://www.jstatsoft.org/v46/i03

Tags: programming cpp linux emacs org-mode
18 Mar 2015

Data science in C?

Coursera is offering a Data Science specialization taught by professors from Johns Hopkins. I discovered it via one of their courses which is about reproducible research. I have been watching some of the video lectures and they are very interesting, since they combine data processing, programming and statistics.

The courses use the R language which is free software and is one the reference tools in the field of statistics, but it is also very much used for other kinds of data analysis.

While I was watching some of the lectures, I had some ideas to be implemented in some of my tools. Although I have used GSL and VXL1 for linear algebra and optimization, I have never really needed statistics libraries in C or C++, so I ducked a bit and found the apophenia library2, which builds on top of GSL, SQLite and Glib to provide very useful tools and data structures to do statistics in C.

Browsing a little bit more, I found that the author of apophenia has written a book "Modeling with data"3, which teaches you statistics like many books about R, but using C, SQLite and gnuplot.

This is the kind of technical book that I like most: good math and code, no fluff, just stuff!

The author sets the stage from the foreword. An example from page xii (the emphasis is mine):

" The politics of software

All of the software in this book is free software, meaning that it may be freely downloaded and distributed. This is because the book focuses on portability and replicability, and if you need to purchase a license every time you switch computers, then the code is not portable. If you redistribute a functioning program that you wrote based on the GSL or Apophenia, then you need to redistribute both the compiled final program and the source code you used to write the program. If you are publishing an academic work, you should be doing this anyway. If you are in a situation where you will distribute only the output of an analysis, there are no obligations at all. This book is also reliant on POSIX-compliant systems, because such systems were built from the ground up for writing and running replicable and portable projects. This does not exclude any current operating system (OS): current members of the Microsoft Windows family of OSes claim POSIX compliance, as do all OSes ending in X (Mac OS X, Linux, UNIX,…)."

Of course, the author knows the usual complaints about programming in C (or C++ for that matter) and spends many pages explaining his choice:

"I spent much of my life ignoring the fundamentals of computing and just hacking together projects using the package or language of the month: C++, Mathematica, Octave, Perl, Python, Java, Scheme, S-PLUS, Stata, R, and probably a few others that I’ve forgotten. Albee (1960, p 30)4 explains that “sometimes it’s necessary to go a long distance out of the way in order to come back a short distance correctly;” this is the distance I’ve gone to arrive at writing a book on data-oriented computing using a general and basic computing language. For the purpose of modeling with data, I have found C to be an easier and more pleasant language than the purpose-built alternatives—especially after I worked out that I could ignore much of the advice from books written in the 1980s and apply the techniques I learned from the scripting languages."

The author explains that C is a very simple language:

" Simplicity

C is a super-simple language. Its syntax has no special tricks for polymorphic operators, abstract classes, virtual inheritance, lexical scoping, lambda expressions, or other such arcana, meaning that you have less to learn. Those features are certainly helpful in their place, but without them C has already proven to be sufficient for writing some impressive programs, like the Mac and Linux operating systems and most of the stats packages listed above."

And he makes it really simple, since he actually teaches you C in one chapter of 50 pages (and 50 pages counting source code is not that much!). He does not teach you all C, though:

"As for the syntax of C, this chapter will cover only a subset. C has 32 keywords and this book will only use 18 of them."

At one point in the introduction I worried about the author bashing C++, which I like very much, but he actually gives a good explanation of the differences between C and C++ (emphasis and footnotes are mine):

"This is the appropriate time to answer a common intro-to-C question: What is the difference between C and C++? There is much confusion due to the almost-compatible syntax and similar name—when explaining the name C-double-plus5, the language’s author references the Newspeak language used in George Orwell’s 1984 (Orwell, 19496; Stroustrup, 1986, p 47). The key difference is that C++ adds a second scope paradigm on top of C’s file- and function-based scope: object-oriented scope. In this system, functions are bound to objects, where an object is effectively a struct holding several variables and functions. Variables that are private to the object are in scope only for functions bound to the object, while those that are public are in scope whenever the object itself is in scope. In C, think of one file as an object: all variables declared inside the file are private, and all those declared in a header file are public. Only those functions that have a declaration in the header file can be called outside of the file. But the real difference between C and C++ is in philosophy: C++ is intended to allow for the mixing of various styles of programming, of which object-oriented coding is one. C++ therefore includes a number of other features, such as yet another type of scope called namespaces, templates and other tools for representing more abstract structures, and a large standard library of templates. Thus, C represents a philosophy of keeping the language as simple and unchanging as possible, even if it means passing up on useful additions; C++ represents an all-inclusive philosophy, choosing additional features and conveniences over parsimony."

It is actually funny that I find myself using less and less class inheritance and leaning towards small functions (often templates) and when I use classes, it is usually to create functors. This is certainly due to the influence of the Algorithmic journeys of Alex Stepanov.

Footnotes:

1

Actually the numerics module, vnl.

2

Apophenia is the experience of perceiving patterns or connections in random or meaningless data.

3

Ben Klemens, Modeling with Data: Tools and Techniques for Scientific Computing, Princeton University Press, 2009, ISBN: 9780691133140.

4

Albee, Edward. 1960. The American Dream and Zoo Story. Signet.

6

Orwell, George. 1949. 1984. Secker and Warburg.

7

Stroustrup, Bjarne. 1986. The C++ Programming Language. Addison-Wesley.

Tags: otb research programming books open-source free-software c cpp linux algorithms
28 Jul 2014

Why C++?

Recently, on the otb-users mailing list somebody asked why C++ was chosen to implement OTB.

The main reasons why I chose C++ for OTB were the following (by order of relevance):

  1. Most of the existing libraries on which OTB was going to rely on were written in C++. Not only ITK, which is at the core of OTB, but also OSSIM and gdal.
  2. C++ can be as efficient as C and has higher level abstraction mechanisms allowing for cleaner code and architecture (templates, classes, name spaces, etc.).
  3. C++ was the language I was most familiar with.

C++ was first introduced in the late 70's and was first called "C with classes" since C++ was designed to provide Simula's facilities for program organization together with C's efficiency and flexibility for systems programming. The class concept (with derived classes and virtual functions) was borrowed from the Simula programming language.

C++ supports different programming styles:

Let's have a look at these different items.

Procedural and imperative styles: the C style

C++ has the same tools as C for imperative programming. So there are the classical control flow mechanisms such as for and while loops, if statements, etc.

C++ supports procedural programming via functions, which are like C functions but with more strict type checking of the arguments. Unlike C, function arguments can be passed by reference, which is like passing a pointer to the object, but using the same syntax as if the object itself was passed. This allows to pass large objects without copying them as with pointers, but the programmer can choose to declare the reference as constant so that the object can't be modified inside the function.

Object orientation: C with classes

C++ has classes which are composite types (that is, they can contain other objects) like C structs, but they can also have associated methods, or member functions. At first, object orientation in C++ can be seen very as similar to Java's or Python's in terms of what a class can contain and how it is used. The main difference is that C++ uses value semantics. In clear, when a variable of a given class is created in C++, it is a value, like an int or a double. In Java or Python, this is not the case. In these languages, what you get is a reference to an object, not the object itself, and therefore, you can't manage memory as you would want to and a garbage collector is needed in order to free memory when the object is not used anymore.

In C++ you can choose to have an object (the default behavior) and in this case, like an int or a double, it will be allocated in the stack and be deallocated automatically when it goes out of scope (typically at the end of the block, when a closing curly brace is found). So there is no need for garbage collector, since no garbage is generated. Java programmers will appreciate this joke …

On the other hand, you can choose to use pointers to the objects, and then the objects are allocated in the free store and the programmer is responsible for freeing the memory.

C++ offers multiple inheritance (unlike Java) and virtual functions (like Java).

C++ also offers operator overloading. This means that arithmetic operators like + for addition or * for multiplication can be redefined for each class. This is useful for instance for a matrix class. Other operators like [], -> can also be overloaded.

Generic programming: templates and the STL

Generic programming can be used in C++ thanks to templates. Class templates are classes defined in terms of generic types. That is, a class can contain an object of type T which is not defined. All code, like member functions, etc. can be written in terms of this generic type.

When a programmer wants to use this class template, she has to say which is the concrete type that will be used instead of T. This is very useful when algorithms are exactly the same for different data types like integers, floating point numbers, etc. In languages which don't support generic programming, the same algorithm has to be rewritten for every different type, or the programmer has to chose the best type (for some definition of best) for which to write the algorithm.

There are also function templates, which are functions defined in terms of generic types.

The mechanism behind templates generates, at compile time, the code for the concrete types which will be as efficient as hand-written code for these types.

C++ comes with the Standard Template Library, the STL, which provides generic containers and generic algorithms. Generic containers are containers like vectors, lists, etc. whose elements are generic (like ints, doubles, strings or any other type or class). Generic algorithms are algorithms which operate on generic types (like ints, doubles, strings or any other type or class). These algorithms operate on ranges inside a container, so they are the same for a vector of ints or for a list of doubles. These ranges are defined by iterators, which are like pointers or coordinates inside the container.

In this way, if you have N types of containers and M different algorithms, you don't have to write NxM versions of the code, but just N+M. So less code to write and maintain. I am a huge fan of templates.

Functional programming

Functional programming is not just programming using functions, but rather using functions as first class citizens. That means being able to pass functions as arguments to other functions and return them as results.

In C, one can achieve this by using pointers to functions, but this is risky and ugly, in terms of function signature.

In C++ we can define functions as types, and therefore they can be used as arguments to and return values from other functions. The way of creating a function type (in C++ we call them function objects or functors) is to create a class and use operator overloading. In the same way as + can be overloaded in a matrix class, the () operator can be overloaded and therefore if myfunc is an object of the class F where the operator has been overloaded, we can use myfunc() like a normal function call.

And therefore, now we can define functions which take or return objects of class F and implement functional programming.

The STL uses extensively function objects in order to tune the behavior of algorithms. For example, the STL find_if algorithm returns the position of first element in a container for which a particular condition is true. It takes as one of its arguments the function which will be used to test if the elements of the container verify a given condition.

Conclusion

As far as I know, C++ offers the best trade-off between efficiency and abstraction level. Or as Bjarne Stroustrup said in a recent interview:

The essence of C++ is that it provides a direct map to hardware and offers mechanisms for very general zero-overhead abstraction.

Since C++ can be as efficient as C, I don't see any reason to use C for a new programming project. Learning C is still useful if one needs to modify existing C code. However, if one wants to use an existing C library in a new project, C++ seems a better choice.

The recent C++11 and C++14 versions of the language (which is by the way an ISO standard) have introduced many new features which make the use of C++ easier and this is giving a sort of renaissance to the language.

Nowadays, added to its classical uses in systems programming and scientific applications, C++ is starting to be used in mobile applications.

Tags: cpp programming-languages otb
16 Jan 2014

OTB++11

Modern C++

I have just recently had a look at the last C++ ISO standard (known as C++11). I was not aware of all the fancy features that have been included in the language.

Some of the most interesting features are:

  • auto,
  • range-based for loops,
  • general constant expressions,
  • uniform initialisation,
  • type aliases,
  • decltype,
  • variadic templates,
  • type safe threading,
  • user-defined literals,
  • move semantics

There are also new containers and algorithms in the STL.

During the end of the year break, I spent some time reading and watching on line presentations about the new C++. I recommend watching Bjarne Stroustrup's keynote at Going Native 2012 (link) and Scott Meyer's talk at Going Native 2013 where he even anticipates some C++14 features (link).

Stroustrup introduces his talk by saying:

I use C++11 freely. Examples include auto, general constant expressions, uniform initialisation, type aliases, type safe threading, and user-defined literals. C++11 features are only just starting to appear in production compilers, so some of my suggestions are conjecture. Developing a "modern style," however, is essential if we don't want to maintain newly-written 1970s and 1980s style code in 2020.

After listening at him, I realised that the most easy to use features alone could make C++ programming much more fun and simple. Since I have spent some time in the last 2 years flirting with functional and dynamic typed (or at least with type inference) languages (Scheme, Clojure, Python, SML), I wanted to see how using C++11 affects the style of OTB code.

As usual, GCC has been the first compiler to make available these new features (closely followed by Clang). C++11 is available in recent versions of GCC using -std=c++11. So I recompiled OTB with this flag and started modifying some of the Software Guide examples using C++11 features.

All in all, I can say that these features make OTB code shorter, clearer and maybe safer. I report some examples here.

Examples

auto

The auto keyword allows to declare variables without explicitly stating their type and puts this burden on the compiler. This is not dynamic typing, this is type inference and nothing changes in terms of what the compiler generates. So you can write:

auto x = sqrt(2.0);

and the type of x will be deduced from the return value type of sqrt. You can also do this:

auto x=0;
auto y=0.0;

and x will be of type int and y will be of type double. If you want to force other types, you have to use the classical declarations.

How to benefit from auto when using OTB? A common pattern in OTB code is to instantiate a smart pointer using the New() class method. Sometimes, we just want to create one object of a given type and we do this:

typedef otb::ImageFileReader<ImageType> ReaderType;
ReaderType::Pointer reader = ReaderType::New();

typedef otb::ImageFileWriter<ImageType> WriterType;
WriterType::Pointer writer = WriterType::New();

in order to create a reader and a writer. That is, since type names are long we define a type using typedef and then we use it twice, once for Pointer and once for New.

Using auto, we can reduce the boilerplate like this:

auto reader = otb::ImageFileReader<ImageType>::New();
auto writer = otb::ImageFileWriter<ImageType>::New();

The same applies for filters, for which usually we only use one instance:

typedef itk::RescaleIntensityImageFilter<ImageType,
                                         OutputImageType> RescalerType;
RescalerType::Pointer rescaler = RescalerType::New();

becomes

auto rescaler = itk::RescaleIntensityImageFilter<ImageType,
                                                 OutputImageType>::New();

which is once again much shorter.

Another example in a for loop:

for (PolygonListType::Iterator it = polygonList->Begin();
     it != polygonList->End(); ++it)
  {
  DataNodeType::Pointer newPolygon = DataNodeType::New();
  newPolygon->SetPolygonExteriorRing(it.Get());
  tree->Add(newPolygon, multiPolygon);
  }

becomes

for (auto it = polygonList->Begin();
     it != polygonList->End(); ++it)
  {
  auto newPolygon = DataNodeType::New();
  newPolygon->SetPolygonExteriorRing(it.Get());
  tree->Add(newPolygon, multiPolygon);
  }

using

The using keyword allows to define template aliases. Whenever we write this:

typedef otb::Image<unsigned int, 2> ImageType;
typedef otb::VectorImage<unsigned int, 2> VectorImageType;
typedef otb::Image<double, 2> DoubleImageType;
typedef otb::VectorImage<double, 2> DoubleVectorImageType;

we could do this:

using ImageType = otb::Image<unsigned int, 2>;
using VectorImageType = otb::VectorImage<unsigned int, 2>;
using DoubleImageType = otb::Image<double, 2>;
using DoubleVectorImageType = otb::VectorImage<double, 2>;

For people like me who have always found counterintuitive the order of the types in a typedef, this is much clear. I know that the code is not shorter, but for a C++ beginner, this should be much clearer.

One cool thing that using allows and that typedef doesn't is partial template specialisation1:

template<typename T> 
using Image2D = otb::Image<T, 2>;
auto im = Image2D<int>::New();  

decltype

Sometimes, we don't want to mentally track the type names, but we know that to objects should have the same type. decltype allows to declare the type of one object by using the name of another object:

ImageType::IndexType start = {0, 0};
decltype(start) start2 = {0, 0};

In OTB, at first, I didn't see a way to use decltype on smart pointers and templates, since we usually manipulate ObjectType::Pointer, but the template parameters are simply ObjectType. Fortunately, ITK designers declared an ObjectType in the SmartPointer class which is an alias for the type of the pointed object.

I found an example where this is useful (the orthofusion example in the Tutorial chapter of the Software Guide).

The original code defines a type for a projection and creates an object:

typedef otb::GenericMapProjection<otb::TransformDirection::INVERSE>
        InverseProjectionType;

InverseProjectionType::Pointer utmMapProjection =
                               InverseProjectionType::New();

Later in the program, we can use the defined type as a template parameter for a filter:

typedef otb::OrthoRectificationFilter<ImageType, DoubleImageType,
                                      InverseProjectionType> 
             OrthoRectifFilterType;
OrthoRectifFilterType::Pointer orthoRectifPAN =
                               OrthoRectifFilterType::New();

If we created the projection without defining the type like this:

auto utmMapProjection =
     otb::GenericMapProjection<otb::TransformDirection::INVERSE>::New();

how can we then instantiate the filter with a type which has not been defined? Like this:

auto orthoRectifPAN =
      otb::OrthoRectificationFilter<ImageType,
                                    DoubleImageType,
                                    decltype(utmMapProjection)::ObjectType>::New();

I am not sure whether this is a good solution in production code which has to be maintained long time after it was written, but it is for sure an interesting solution when prototyping processing chains.

Lambdas

Lambdas, or anonymous functions, are one of the most exciting things in C++11 when one has used them in LISP-like languages or even in Python. They make fast prototyping much easier mainly when making a heavy use of the STL algorithms because they can replace functors or plain functions.

Usually, STL algorithms can be specialised by passing them functions as parameters. If one wants to store state, a functor or function object (a class which defines the () operator) is used.

The inconvenient of functions and functors is that they are defined away from where they are used, so they are not ideal when they are only used once. Functors are worse than functions since they need much boilerplate for only one useful method.

Lambdas are the ideal solution for this, since they are defined locally. They can store state by capturing local variables either by reference or by value.

OTB iterators are not fully compatible with the STL, so one can think that lambdas would not be very useful with OTB specific code. But actually, there are many filters which can be implemented by specialising one of the N-ary functor filters. These filters are useful when one wants to implement a filtering operation which can't be obtained with the otb::BandMathFilter, as for instance when a local neighbourhood is needed.

One example of this is the change detection framework. A long time ago when I wrote this example I had to use a lot of code to achieve a simple thing like computing the mean difference around a pixel.

First of all, I needed a functor to implement the processing in the neighbourhood:

template<class TInput1, class TInput2, class TOutput>
class MyChangeDetector
{
public:
  MyChangeDetector() {}
  ~MyChangeDetector() {}
  inline TOutput operator ()(const TInput1& itA,
                             const TInput2& itB)
  {
    TOutput result = 0.0;

    for (unsigned long pos = 0; pos < itA.Size(); ++pos)
      {

      result += static_cast<TOutput>(itA.GetPixel(pos) - itB.GetPixel(pos));

      }
    return static_cast<TOutput>(result / itA.Size());
  }
};

That is, a class definition with empty constructor and destructor. Then I needed to inherit from the binary functor which performs a neighbourhood filtering and specialise it using my functor.

template <class TInputImage1, class TInputImage2, class TOutputImage>
class ITK_EXPORT MyChangeDetectorImageFilter :
  public otb::BinaryFunctorNeighborhoodImageFilter<
      TInputImage1, TInputImage2, TOutputImage,
      MyChangeDetector<
          typename itk::ConstNeighborhoodIterator<TInputImage1>,
          typename itk::ConstNeighborhoodIterator<TInputImage2>,
          typename TOutputImage::PixelType> >
{
public:
  typedef MyChangeDetectorImageFilter Self;

  typedef typename otb::BinaryFunctorNeighborhoodImageFilter<
      TInputImage1, TInputImage2, TOutputImage,
      MyChangeDetector<
          typename itk::ConstNeighborhoodIterator<TInputImage1>,
          typename itk::ConstNeighborhoodIterator<TInputImage2>,
          typename TOutputImage::PixelType>
      >  Superclass;

  typedef itk::SmartPointer<Self>       Pointer;
  typedef itk::SmartPointer<const Self> ConstPointer;

  itkNewMacro(Self);

protected:
  MyChangeDetectorImageFilter() {}
  virtual ~MyChangeDetectorImageFilter() {}

private:
  MyChangeDetectorImageFilter(const Self &);
  void operator =(const Self&);

};

So again, a class definition for nothing. I could maybe just have defined a type, but the code using it would not have been as simple as this:

typedef MyChangeDetectorImageFilter<InputImageType1, InputImageType2,
    ChangeImageType>      FilterType;
FilterType::Pointer   filter = FilterType::New();

Even if I had defined a type alias for the filter in order to avoid to inherit from the existing filter, the functor definition is an overkill and it is located far from where it is used.

Using lambdas, the code becomes this:

using CPT = ChangeImageType::PixelType;
using CNIType1 = itk::ConstNeighborhoodIterator<InputImageType1>;
using CNIType2 = itk::ConstNeighborhoodIterator<InputImageType2>;

That is, 3 lines of type aliases for a simpler notation. Then the lambda:

std::function<CPT (CNIType1, CNIType2)>
  cdt = [](CNIType1 itA, CNIType2 itB){
  CPT result{0};
  for (auto pos = 0; pos < itA.Size(); ++pos)
    result += static_cast<CPT>(itA.GetPixel(pos) - itB.GetPixel(pos));
  return static_cast<CPT>(result / itA.Size());
};                                            

And finally the filter instantiation:

auto filter = otb::BinaryFunctorNeighborhoodImageFilter<InputImageType1,
                                                        InputImageType2,
                                                        ChangeImageType,
                                                        decltype(cdt)>::New();

If we had wanted to have a function with state, objects in the scope where the lambda is defined can be captured by value or by reference within the [], so we are very close to what a functor may achieve with much less code.

The main advantage here is that all the code is in the same location thanks to the possibility of creating a function anywhere in the sources. If I understand it well, lambdas are inlined, which means that the code is faster than a function call.

Conclusion

I have only started to use this kind of code, but I think that once you start using auto and lambdas it must be very difficult to go back to C++98/03.

The only drawback with using C++11 is dropping compatibility with old compilers, but this is not an issue for most of the things I do. So I can start writing modern OTB code instead of poo (plain old OTB)!

Footnotes:

1

With GCC 4.7 this only seems to work out of a scoped block.

Tags: cpp otb
Other posts
Creative Commons License
jordiinglada.net by Jordi Inglada is licensed under a Creative Commons Attribution-ShareAlike 4.0 Unported License. RSS. Feedback: info /at/ jordiinglada.net Mastodon