May 15, 2007

Egor Kraev: Just two more items on my MATLAB wishlist

I know you have probably had enough of MATLAB by now, and this is indeed the last post in this series. After I wrote the last one, I realized there are two more things I have long been wishing for.

The first one is simple: MATLAB does allow you to call Perl scripts straight from its command line. However, as far as I could see, it doesn’t let you call Perl functions, so you cannot pass anything other than strings between Perl and MATLAB.

Now in my view, this renders that feature virtually useless. The one time I used a Perl script, it was pumping data from a remote source – but the only halfway reasonable way I could find to get that data into MATLAB was to open a socket, pump it over as a string, and parse that in MATLAB. How much easier would it be if I could return a Perl array of doubles, or a hash of such arrays, and have MATLAB convert it!

This, then, is my next-to-last wish: let me call Perl and Python functions from inside MATLAB, with on-the-fly conversion of at least arrays, strings, and structs to/from their Perl/Python equivalents.

My very very last wish is a bit trickier to explain. Suppose you have a vector v of length N, and a square NxN matrix A, and you want to multiply each row of A by the corresponding entry of v. Currently, you have to construct a diagonal matrix from v, and then multiply it by A, and try to remember whether you need to do A=diag(v)*A or A=A*diag(v). Now suppose that instead of one colon operator, you had a family of named colon operators – that you could just write

A(:a, :b) = A(:a,:b).*v(:a)

This makes it rather clearer which indexes of A are matched to those of v, no?
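
For readers who think in loops, here is a minimal C++ sketch of what the expression above would mean if spelled out by hand. The names Matrix and scale_rows are made up for illustration, and the nested-vector storage is just an assumption to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;  // hypothetical row-major matrix

// A(a,b) = A(a,b) * v(a): row a of A is scaled by v[a].
// This is the same result as diag(v)*A; using v[b] instead would scale columns.
void scale_rows(Matrix& A, const std::vector<double>& v) {
    assert(A.size() == v.size());          // one entry of v per row of A
    for (std::size_t a = 0; a < A.size(); ++a)
        for (std::size_t b = 0; b < A[a].size(); ++b)
            A[a][b] *= v[a];               // index a is shared by A and v
}
```

The shared index name does the same job here as the shared :a would in the proposed notation: it says which dimension of A lines up with v.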

Now suppose you wanted to sum across some of those ranges, but not all – then you could mark those that you want to sum over with an exclamation mark, say. Then, multiplying a matrix by a vector could be written as

sum(A(:a,:b!) .* v(:b!)).

MATLAB already has matrix multiplication? Sure, but what about multiplying two multi-dimensional tensors across a given dimension, such as

A(:a,:b,:d)=sum( B(:a,:c!,:d).* C(:c!,:b) )

(Yes, there are legitimate modelling uses for this kind of thing!)
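
To pin down what the tensor expression above is meant to compute, here is a rough C++ sketch that writes the sum over the marked index c as explicit loops. The types Matrix and Tensor3 and the function contract are hypothetical names, and nested vectors are assumed purely for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;
typedef std::vector<Matrix> Tensor3;       // hypothetical, indexed as T[i][j][k]

// A(a,b,d) = sum over c of B(a,c,d) * C(c,b)
Tensor3 contract(const Tensor3& B, const Matrix& C) {
    const std::size_t na = B.size();
    const std::size_t nc = C.size();
    assert(na > 0 && nc > 0 && B[0].size() == nc);  // the c ranges must agree
    const std::size_t nb = C[0].size();
    const std::size_t nd = B[0][0].size();

    Tensor3 A(na, Matrix(nb, std::vector<double>(nd, 0.0)));
    for (std::size_t a = 0; a < na; ++a)
        for (std::size_t b = 0; b < nb; ++b)
            for (std::size_t d = 0; d < nd; ++d)
                for (std::size_t c = 0; c < nc; ++c)   // the index marked with !
                    A[a][b][d] += B[a][c][d] * C[c][b];
    return A;
}
```

Four nested loops and a handful of size checks for something the proposed notation would state in one line.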

In true physicist fashion, once ! is seen in an expression using colon operators, the summing might even be automatic, without needing to write sum().

No existing code would be affected, as all the expressions I suggest are not legal in current MATLAB syntax.

Would that not be a neat extension to the already concise and powerful colon operator?

This is the very last post in this series. Coming up: a discussion of HDF5, my favourite format for structured numerical data.

Send in your comments!
Egor

May 10, 2007

Egor Kraev: What I miss most in MATLAB

My last post dealt with the annoyances of MATLAB. This, the final post of the series, is about one bit of extra functionality. Unlike most of the entries in my last post, it is not embarrassing for MATLAB to lack it (that is, if you hold those entries to the supremely high standard of the rest of MATLAB); rather, it is an extra feature that would make my use of MATLAB a lot more productive still.

The request is very simple to state: make using external C++ code with MATLAB as easy as using external Java code. When using Java, you issue one command to load the jars, and you can instantiate any objects therein, call their functions and receive the results. As you do so, MATLAB auto-converts a lot of native Java types to/from MATLAB types, such as strings and double arrays. Thus, the integration of the Java classes is truly seamless.

Now compare that to the options of using C++ in MATLAB: either you need to write wrappers to make the C++ code available as MEX functions (and first you need to understand the C++ object model for MATLAB types); or you can load existing dlls, but only if they have been C-linked and you have access to the header files and the header files are simple enough that the MATLAB built-in parser can digest them. To me, neither of the options has been worth the effort involved, so far.

How nice it would be if you could just load an existing library and instantiate its classes, with MATLAB doing at least simple conversions such as strings to std::strings, MATLAB arrays to std::vectors, and std::maps to MATLAB structs? That would allow me to do moving-boundary coding, ‘graduating’ bits of code into C++ as they mature, yet continue to use them in MATLAB. Wouldn’t that be cool?

Yes, I know such tricks are inherently easier in Java than in C++, but I’m sure the smart people at MATLAB could come up with a way. Maybe leverage the CINT interpreter?

Egor

May 08, 2007

Egor Kraev: The bits of MATLAB I really hate!

While there are a lot of things to like in MATLAB, there are of course also annoyances and things I miss. I’ll devote this post to itemizing annoyances and bits of behaviour that in my opinion are just plain embarrassing; the next post will be devoted to an additional major bit of functionality that I would like to see, one that would make MATLAB an even cooler environment to work in. (Most of the problems refer to version 2006b, the one I’ve been playing with. If any of these have been fixed in 2007a, please let me know.)

  • The debugger loses settings and breakpoints. If I want to go into debug mode each time an error is encountered, I have to set that option EVERY TIME I start MATLAB. Also, during a session I often find the breakpoint I’d set has disappeared, and the program happily zaps past the spot I wanted to drill into.
  • In the editor, there is no option for collapsing cells, for-loops, comment blocks etc. Even Visual Studio, hardly anybody’s candidate for The Friendliest GUI Ever, can do that!
  • The only way to spawn a separate process is in a DOS window – this means that if I want to run a script in the background that launches, say, a handful of Perl processes, each of these pops up in a DOS window and steals my focus (that actually happened to me).
  • Secondary axis plotting. Even Excel can do that. Plot some series with scaling on the left axis, some more series on the same plot with scaling on the right axis, sharing the x axis – how hard is that? And no, plotyy is not nearly good enough; and ‘this is not easy, because of the way MATLAB graphics work’ is not a good enough excuse.
  • The HDF5 file format support is a great thing to have, but the implementation at least in 2006b appears to have a major memory leak. At least, reading a 300MB HDF5 file increased MATLAB memory consumption by the same amount (fair enough in itself), and neither clearing all variables nor any other action I could think of, short of restarting MATLAB, managed to release that memory. Also, why does the MATLAB version not support szip decompression? I know compression must be licensed, but decompression is free - why not enable it?
  • That brings me to my major gripe, namely memory management. MATLAB seems to run inside a Java virtual machine, and appears bound by its memory limit. If I try to set the JVM memory limit to over 1GB (on a 2GB RAM machine), it won’t even start. After working with 100+ MB matrices for a while, memory fragmentation appears to set in, so that you can’t create any reasonably-sized objects without restarting the program. There is the ‘pack’ command that is supposed to remedy that, but you can’t call it from scripts, only interactively (why???), and even then it does not always help greatly. Likewise, the textscan command for parsing long strings is reasonably fast, but requires the strings to fit into contiguous RAM chunks (which, as you can see from the above, is a tall order). If there’s not enough space, it gives up. Did MATLAB programmers never hear about swap?

Do you have your own favourite MATLAB annoyances? Can you tell me of ways to fix the above problems? Do tell.

Egor

May 06, 2007

Egor Kraev: Why I Love MATLAB: the GUI

  • The side of MATLAB that in my opinion puts it well ahead of any comparable software that I’m aware of is the very mature integration of the command line with the Graphical User Interface. For example, you can select any portion of code in the editor, and right-click to evaluate it at the command line. Also, most interactive GUIs (such as plot windows in interactive mode, or the wonderful Distribution Fitting Tool) will generate a script for you that will reproduce the result, so that you can tweak things with the mouse and then look at that code to find out how to script it.
  • Another nifty feature is the ‘cell mode’ (not related to cell arrays mentioned in my last post): by formatting your comments in a particular way, you can split your files into ‘cells’ and execute them one cell at a time using ctrl-enter (and a range of related tricks).
  • Oh, and you can dock/undock any window from the MDI mother window – so on a four-monitor setup, I have a swarm of detached windows, and when I’m remoting into my machine, they can all be docked nicely into one window. A minor thing perhaps, but having to break my thought flow to hunt for a lost window is just so annoying and distracting.
  • Thus, I’d say MATLAB manages to give me the best of both the command line and the GUI world by making scripts more interactive and creating scripts from my mouse-clicks on GUIs (though admittedly some bits of that code are more readable than others).
  • It has the tools you’d expect from any decent development environment, such as a debugger and a profiler (on a side note, does S-plus have a debugger yet? I’m pretty sure they didn’t a year ago), and also an on-the-fly code analyzer that points out, for example, if a variable is written to but never read (usually signifying a typo), and even occasionally suggests a better way of using a command.
  • Finally, MATLAB graphing facilities are just awesome. I’d keep it around just to do my plotting even if it couldn’t do anything else.

So much for the praises – it is no accident I’m using MATLAB for most of my data exploration and prototyping. The next two posts will be devoted to annoyances and things I wish MATLAB had but doesn’t.

Egor

May 04, 2007

Egor Kraev: Why I Love MATLAB: The Language

A bad workman always blames his tools. But even a good workman finds himself wasting a lot of time when faced with tools not adequate to the job at hand. Thus, after a lengthy exploration of confidence intervals, I’m devoting the next couple of posts to one of my favourite tools, MATLAB.

Why do I like it so much? Well, there are two reasons. First of all, there’s the language – wonderfully concise and expressive. Second, the mature GUI. I’ll devote the next post to the GUI, and talk about the language today. It starts simple - just treat everything as a matrix; operations on vectors can be written really simply. For example, if you are given a vector x and want to extract the excess of each element over a threshold t, but only if that excess is positive, all you have to do is to write y=x(x>t)-t;

Suppose you want to make a function that does that, and call it thresh:

           thresh=@(x0,t0) x0(x0>t0)-t0;

Then y=thresh(x, t) will give exactly the same result as above. And the ‘function handle’ thus created can be assigned to variables and stored in cell arrays, as described next.

  • If you want to store heterogeneous objects, there is something called a ‘cell array’ which is like a matrix whose elements can be anything – structs, matrices, other cell arrays, functions, etc. And the objects are self-aware – there is a bunch of functions to ask an unknown object whether it’s a struct, a number, or a function.

  • Another feature that makes MATLAB ideal for prototyping and data exploration is that if you assign to something that doesn’t exist, it’s normally created for you, be it a variable, a field in a struct, or extra elements in an array.

  • A final little-known feature of MATLAB that I find really cool is its close Java integration. After issuing a single command to load your jar, the classes therein can be instantiated inside MATLAB and intermixed freely with native objects. With just a little effort, it’s also possible to hook MATLAB up to Eclipse via JDWP, so that you can set breakpoints in your Java code in Eclipse, manipulate the Java objects inside MATLAB, and use the Eclipse debugger if they throw exceptions or hit your breakpoints. Thus, MATLAB can serve as a scripting playground for your Java code.

The other side of MATLAB that I like a lot is the really mature integration of the command line with the GUI/development environment, which I’ll discuss in the next post.

Do you also have your favourite corners of the language? Do let me know.

Egor

February 07, 2007

Egor Kraev: Why I Hate C++ and Still Use It (Part 2)

In my last post, I talked about the main source of pain in C++, namely having to know the type of all objects at compile time.

Consider a simple example: I want to parse a comma-separated text file, with an a priori unknown number of columns and rows, and store the result in my C++ program. What data structure should I use? If all of the data in the file are integers, I could use a map<string, vector<int> >, with column headers as keys. If it’s a mixture of doubles and integers, I could store them all as doubles (though that already has its problems). What if some of the data is strings?

A response frequently heard from C++ die-hards is ‘Why would you want to store data like that?’ (an actual quote from comp.lang.c++: “By definition, an array is a contiguous sequence of objects of the same type. So what you're asking is not possible.”) – a typical example of how a language can warp your brain ;) .

The obvious brute force method is using void* everywhere, and then bravely casting every entry back to what you want it to be, but the scary thing about that is that it might work when it shouldn’t. A more promising option is using boost::any and any_cast for the same purpose, and the most elegant one I’m aware of is using map<string, vector<string> > (it came from a text file after all), and only converting the data to integers/doubles when you are about to access it. Far uglier solutions can be found by Googling “parse csv c++”, or by asking job candidates you happen to be interviewing.

All this creativity is spent to achieve something that is a non-issue in dynamically typed languages such as Python or MATLAB, where you can simply have an array of heterogeneous objects.
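
For what it is worth, the string-map approach mentioned above might look roughly like the sketch below. Everything named here (Table, split_line, read_csv, as_double) is invented for illustration, and the parser assumes a header row, plain commas as separators, and no quoted or escaped fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<std::string> > Table;

// Split one line on commas (assumption: no quoting or escaping).
std::vector<std::string> split_line(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, ',')) fields.push_back(field);
    return fields;
}

// Read a header row, then keep every cell as text under its column name.
Table read_csv(std::istream& in) {
    Table table;
    std::string line;
    if (!std::getline(in, line)) return table;      // empty input
    const std::vector<std::string> header = split_line(line);
    for (std::size_t i = 0; i < header.size(); ++i)
        table[header[i]];                           // create empty columns
    while (std::getline(in, line)) {
        const std::vector<std::string> row = split_line(line);
        for (std::size_t i = 0; i < header.size(); ++i)  // short rows are padded
            table[header[i]].push_back(i < row.size() ? row[i] : std::string());
    }
    return table;
}

// Convert a cell to double only at the point of use.
// strtod returns 0.0 for non-numeric text, so real code would need to check.
double as_double(const Table& t, const std::string& column, std::size_t row) {
    Table::const_iterator it = t.find(column);
    assert(it != t.end() && row < it->second.size());
    return std::strtod(it->second[row].c_str(), 0);
}
```

Calling read_csv on any std::istream and then as_double(table, "price", 0) defers the type decision to the caller, which is the whole trick: the container never has to know what the columns hold.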

A different gripe is that C++ objects are not self-aware. You cannot ask an arbitrary object what data members or functions it has, and if you are referring to it through a pointer to its base class, the most you can do is test it (via dynamic_cast or typeid) against types you already know about. Thus, again, the need for header files and hard-coded function names, and a need to rewrite glue code each time you add a new function.
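
A small sketch of how little C++ offers here: the built-in runtime type information can answer "is this a Bond?", but only for polymorphic classes and only for types the calling code already names. The classes Instrument, Bond and Swap and the function describe are hypothetical.

```cpp
#include <cassert>
#include <string>

struct Instrument { virtual ~Instrument() {} };   // polymorphic base, needed for RTTI
struct Bond : Instrument {};
struct Swap : Instrument {};

// Each candidate type has to be spelled out at compile time.
std::string describe(const Instrument* p) {
    if (dynamic_cast<const Bond*>(p)) return "Bond";
    if (dynamic_cast<const Swap*>(p)) return "Swap";
    return "unknown";   // a class added later lands here until this code is edited
}
```

There is still no way to ask the object for a list of its members or functions, so this is exactly the kind of glue code that has to be rewritten whenever a new class appears.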

Yet one more reason to hate C++ is that it is not at all cross-platform unless written with real understanding and intent to make it so. There is Windows vs. Unix, there is gcc vs. the three or so different versions of Visual C++ in circulation at any given moment, each of them with its unique set of quirks (and they also interact with different processors in interesting and surprising ways).

Why, then, do I find myself persistently using C++ anyway? The first obvious reason is runtime speed. However, this is not as much of an advantage as it seems, as I tend to spend way more time writing code and especially debugging it, than I spend waiting for the code to finish running – but still, sometimes you just need things to be fast, especially in realtime systems.

A more important reason in my opinion is that it allows you to exercise absolute control over memory allocation. That is one area where MATLAB, otherwise my preferred data exploration platform, runs into a brick wall (more on that in a later post). You don’t need control over memory allocation very often, but when you do, you really, really need it.

Neither of these is in my opinion nearly enough to justify using C++ as the primary language for quant work, but they do mean you have to know at least enough of it to do the heavy lifting when necessary.

Are there other reasons why you hate C++? Do you think the things I hate are actually features? Is there a more elegant solution to the comma-separated file example? Please let me know.

In spite of all my complaints, there is an undeniable beauty to C++, because of its very weirdness. Singletons, factories, recursive templates and other beasts of the abyss might not have a friendly inclination, but having tamed them yet again into actually doing something useful does give one the proverbial warm glow.

Egor
