28 Jul 2014

Why C++?

Recently, on the otb-users mailing list somebody asked why C++ was chosen to implement OTB.

The main reasons why I chose C++ for OTB were the following (by order of relevance):

Most of the existing libraries on which OTB was going to rely on were written in C++. Not only ITK, which is at the core of OTB, but also OSSIM and gdal.
C++ can be as efficient as C and has higher level abstraction mechanisms allowing for cleaner code and architecture (templates, classes, name spaces, etc.).
C++ was the language I was most familiar with.

C++ was first introduced in the late 70's and was first called "C with classes" since C++ was designed to provide Simula's facilities for program organization together with C's efficiency and flexibility for systems programming. The class concept (with derived classes and virtual functions) was borrowed from the Simula programming language.

C++ supports different programming styles:

procedural and imperative styles due to its closeness to C,
object orientation with classes, virtual functions and inheritance,
generic programming through templates,
functional programming through function objects.

Let's have a look at these different items.

Procedural and imperative styles: the C style

C++ has the same tools as C for imperative programming. So there are the classical control flow mechanisms such as for and while loops, if statements, etc.

C++ supports procedural programming via functions, which are like C functions but with more strict type checking of the arguments. Unlike C, function arguments can be passed by reference, which is like passing a pointer to the object, but using the same syntax as if the object itself was passed. This allows to pass large objects without copying them as with pointers, but the programmer can choose to declare the reference as constant so that the object can't be modified inside the function.

Object orientation: C with classes

C++ has classes which are composite types (that is, they can contain other objects) like C structs, but they can also have associated methods, or member functions. At first, object orientation in C++ can be seen very as similar to Java's or Python's in terms of what a class can contain and how it is used. The main difference is that C++ uses value semantics. In clear, when a variable of a given class is created in C++, it is a value, like an int or a double. In Java or Python, this is not the case. In these languages, what you get is a reference to an object, not the object itself, and therefore, you can't manage memory as you would want to and a garbage collector is needed in order to free memory when the object is not used anymore.

In C++ you can choose to have an object (the default behavior) and in this case, like an int or a double, it will be allocated in the stack and be deallocated automatically when it goes out of scope (typically at the end of the block, when a closing curly brace is found). So there is no need for garbage collector, since no garbage is generated. Java programmers will appreciate this joke …

On the other hand, you can choose to use pointers to the objects, and then the objects are allocated in the free store and the programmer is responsible for freeing the memory.

C++ offers multiple inheritance (unlike Java) and virtual functions (like Java).

C++ also offers operator overloading. This means that arithmetic operators like + for addition or * for multiplication can be redefined for each class. This is useful for instance for a matrix class. Other operators like [], -> can also be overloaded.

Generic programming: templates and the STL

Generic programming can be used in C++ thanks to templates. Class templates are classes defined in terms of generic types. That is, a class can contain an object of type T which is not defined. All code, like member functions, etc. can be written in terms of this generic type.

When a programmer wants to use this class template, she has to say which is the concrete type that will be used instead of T. This is very useful when algorithms are exactly the same for different data types like integers, floating point numbers, etc. In languages which don't support generic programming, the same algorithm has to be rewritten for every different type, or the programmer has to chose the best type (for some definition of best) for which to write the algorithm.

There are also function templates, which are functions defined in terms of generic types.

The mechanism behind templates generates, at compile time, the code for the concrete types which will be as efficient as hand-written code for these types.

C++ comes with the Standard Template Library, the STL, which provides generic containers and generic algorithms. Generic containers are containers like vectors, lists, etc. whose elements are generic (like ints, doubles, strings or any other type or class). Generic algorithms are algorithms which operate on generic types (like ints, doubles, strings or any other type or class). These algorithms operate on ranges inside a container, so they are the same for a vector of ints or for a list of doubles. These ranges are defined by iterators, which are like pointers or coordinates inside the container.

In this way, if you have N types of containers and M different algorithms, you don't have to write NxM versions of the code, but just N+M. So less code to write and maintain. I am a huge fan of templates.

Functional programming

Functional programming is not just programming using functions, but rather using functions as first class citizens. That means being able to pass functions as arguments to other functions and return them as results.

In C, one can achieve this by using pointers to functions, but this is risky and ugly, in terms of function signature.

In C++ we can define functions as types, and therefore they can be used as arguments to and return values from other functions. The way of creating a function type (in C++ we call them function objects or functors) is to create a class and use operator overloading. In the same way as + can be overloaded in a matrix class, the () operator can be overloaded and therefore if myfunc is an object of the class F where the operator has been overloaded, we can use myfunc() like a normal function call.

And therefore, now we can define functions which take or return objects of class F and implement functional programming.

The STL uses extensively function objects in order to tune the behavior of algorithms. For example, the STL find_if algorithm returns the position of first element in a container for which a particular condition is true. It takes as one of its arguments the function which will be used to test if the elements of the container verify a given condition.

Conclusion

As far as I know, C++ offers the best trade-off between efficiency and abstraction level. Or as Bjarne Stroustrup said in a recent interview:

The essence of C++ is that it provides a direct map to hardware and offers mechanisms for very general zero-overhead abstraction.

Since C++ can be as efficient as C, I don't see any reason to use C for a new programming project. Learning C is still useful if one needs to modify existing C code. However, if one wants to use an existing C library in a new project, C++ seems a better choice.

The recent C++11 and C++14 versions of the language (which is by the way an ISO standard) have introduced many new features which make the use of C++ easier and this is giving a sort of renaissance to the language.

Nowadays, added to its classical uses in systems programming and scientific applications, C++ is starting to be used in mobile applications.