In my last post, I talked about the main source of pain in C++, namely having to know the type of all objects at compile time.
Consider a simple example: I want to parse a comma-separated text file, with an a priori unknown number of columns and rows, and store the result in my C++ program. What data structure should I use? If all of the data in the file are integers, I could use a map<string, vector<int> >, with column headers as keys. If it’s a mixture of doubles and integers, I could store them all as doubles (though that already has its problems). What if some of the data is strings?
A response frequently heard from C++ die-hards is ‘Why would you want to store data like that?’ (an actual quote from comp.lang.c++: “By definition, an array is a contiguous sequence of objects of the same type. So what you're asking is not possible.”) – a typical example of how a language can warp your brain ;)
The obvious brute-force method is to use void* everywhere and then bravely cast every entry back to what you want it to be; the scary thing about that is that it might appear to work when it shouldn’t. A more promising option is boost::any with any_cast. The most elegant one I’m aware of is map<string, vector<string>> (the data came from a text file, after all), converting to integers or doubles only when you are about to access it. Far uglier solutions can be found by Googling “parse csv c++”, or by asking job candidates you happen to be interviewing.
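For what it's worth, here is a minimal sketch of that last option. The names Table, split_line, parse_csv and as_double are mine, invented for illustration, and the splitting is deliberately naive: it assumes a header row and does not handle quoted fields or embedded commas.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Column header -> raw cell text, one entry per data row (hypothetical alias).
using Table = std::map<std::string, std::vector<std::string>>;

// Split one line on commas. Naive: no quoted fields, no escaped commas.
std::vector<std::string> split_line(const std::string& line) {
    std::vector<std::string> cells;
    std::stringstream ss(line);
    std::string cell;
    while (std::getline(ss, cell, ',')) cells.push_back(cell);
    return cells;
}

// Read a header row followed by data rows from any input stream.
Table parse_csv(std::istream& in) {
    Table table;
    std::string line;
    if (!std::getline(in, line)) return table;  // empty input: empty table
    const std::vector<std::string> headers = split_line(line);
    for (const auto& h : headers) table[h];     // create the (empty) columns
    while (std::getline(in, line)) {
        const std::vector<std::string> cells = split_line(line);
        for (std::size_t i = 0; i < headers.size(); ++i) {
            // Pad short rows with empty strings so the columns stay aligned.
            table[headers[i]].push_back(i < cells.size() ? cells[i] : "");
        }
    }
    return table;
}

// Convert only at the point of use. std::stod throws std::invalid_argument
// if the cell does not start with a number.
double as_double(const Table& t, const std::string& column, std::size_t row) {
    return std::stod(t.at(column).at(row));
}
```

Passing a std::ifstream (or a std::istringstream, for testing) to parse_csv gives you the table, and as_double(table, "price", 0) converts a single cell on demand. The cost is that a malformed cell only blows up when you finally touch it, not when the file is read.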
All this creativity is spent to achieve something that is a non-issue in dynamically typed languages such as Python or MATLAB, where you can simply have an array of heterogeneous objects.
A different gripe is that C++ objects are not self-aware. You cannot ask an arbitrary object what data members or functions it has, and if you are referring to it through a pointer to its base class, about all you can do is test it against types you already know about (with dynamic_cast or typeid, and only if the base class has virtual functions). Thus, again, the need for header files and hard-coded function names, and a need to rewrite glue code each time you add a new function.
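To be fair, the language does provide a little runtime type information for polymorphic classes. A rough sketch of what that buys you, with Instrument, Bond and Option as made-up class names: you can check an object against a type you already know, or compare the dynamic types of two objects, but nothing here lets you enumerate members or functions.

```cpp
#include <string>
#include <typeinfo>

// Hypothetical hierarchy, for illustration only.
struct Instrument {
    virtual ~Instrument() = default;  // RTTI needs at least one virtual function
};
struct Bond : Instrument { double coupon = 0.05; };
struct Option : Instrument { double strike = 100.0; };

// We can only test against types we hard-code here; every new derived
// class means another line of glue.
std::string describe(const Instrument& x) {
    if (dynamic_cast<const Bond*>(&x)) return "Bond";
    if (dynamic_cast<const Option*>(&x)) return "Option";
    return "unknown";
}

// typeid on a reference to a polymorphic type reports the dynamic type.
bool same_dynamic_type(const Instrument& a, const Instrument& b) {
    return typeid(a) == typeid(b);
}
```

Note that describe is exactly the kind of hand-maintained glue code complained about above.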
Yet one more reason to hate C++ is that it is not at all cross-platform unless written with real understanding and intent to make it so. There is Windows vs. Unix, there is gcc vs. the three or so different versions of Visual C++ in circulation at any given moment, each of them with its unique set of quirks (and they also interact with different processors in interesting and surprising ways).
Why, then, do I find myself persistently using C++ anyway? The first obvious reason is runtime speed. This is not as much of an advantage as it seems, since I tend to spend far more time writing code, and especially debugging it, than I spend waiting for it to finish running. Still, sometimes you just need things to be fast, especially in real-time systems.
A more important reason in my opinion is that it allows you to exercise absolute control over memory allocation. That is one area where MATLAB, otherwise my preferred data exploration platform, runs into a brick wall (more on that in a later post). You don’t need control over memory allocation very often, but when you do, you really, really need it.
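As one small illustration of what that control can look like (a sketch under my own assumptions, not a production allocator): a fixed-size arena that grabs one buffer up front and hands out pieces of it, so nothing touches the heap afterwards. Arena, Tick and make_tick are hypothetical names, and the arena only suits trivially destructible objects because reset() runs no destructors.

```cpp
#include <cstddef>
#include <new>

// Hypothetical fixed-capacity bump allocator: one buffer, no heap calls.
template <std::size_t Capacity>
class Arena {
public:
    // Returns storage for `bytes` bytes aligned to `align` (a power of two
    // no larger than alignof(std::max_align_t)), or nullptr if it is full.
    void* allocate(std::size_t bytes, std::size_t align) {
        const std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > Capacity || bytes > Capacity - start) return nullptr;
        used_ = start + bytes;
        return buffer_ + start;
    }
    void reset() { used_ = 0; }  // frees everything at once; no destructors run
    std::size_t used() const { return used_; }

private:
    alignas(std::max_align_t) unsigned char buffer_[Capacity];
    std::size_t used_ = 0;
};

struct Tick {  // trivially destructible, so reset() is safe
    double price;
    int size;
};

// Construct a Tick inside the arena with placement new.
template <std::size_t N>
Tick* make_tick(Arena<N>& arena, double price, int size) {
    void* p = arena.allocate(sizeof(Tick), alignof(Tick));
    return p ? new (p) Tick{price, size} : nullptr;
}
```

Every allocation is a couple of additions, and you know exactly where the memory lives and when it goes away, which is the sort of guarantee a garbage-collected or interpreted environment will not give you.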
Neither of these is in my opinion nearly enough to justify using C++ as the primary language for quant work, but they do mean you have to know at least enough of it to do the heavy lifting when necessary.
Are there other reasons why you hate C++? Do you think the things I hate are actually features? Is there a more elegant solution to the comma-separated file example? Please let me know.
In spite of all my complaints, there is an undeniable beauty to C++, because of its very weirdness. Singletons, factories, recursive templates and other beasts of the abyss might not have a friendly inclination, but having tamed them yet again into actually doing something useful does give one the proverbial warm glow.
Egor