When people tell me that Mathematica isn’t fast enough, I usually ask to see the offending code and often find that the problem isn’t a lack in Mathematica’s performance, but sub-optimal use of Mathematica. I thought I would share the list of things that I look for first when trying to optimize Mathematica code.
1. Use floating-point numbers if you can, and use them early.
One of the most common issues that I see when I review slow code is that the programmer has inadvertently asked Mathematica to do things more carefully than needed. Unnecessary use of exact arithmetic is the most common case.
In most numerical software, there is no such thing as exact arithmetic. 1/3 is the same thing as 0.33333333333333. That difference can be pretty important when you hit nasty, numerically unstable problems, but in the majority of tasks, floating-point numbers are good enough and, importantly, much faster. In Mathematica any number with a decimal point and less than 16 digits of input is automatically treated as a machine float, so always use the decimal point if you want speed ahead of accuracy (e.g. enter a third as 1./3.). Here is a simple example where working with floating-point numbers is nearly 40 times faster than doing the computation exactly and then converting the result to a decimal afterward. And in this case it gets the same result.
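As a minimal sketch of this kind of comparison (the test matrix and the size n are assumptions chosen for illustration, not necessarily the example timed above), the same determinant can be computed exactly and then converted, or in machine floats from the start:

```mathematica
(* Hypothetical test matrix: entries 1/(1 + |i - j|), size n x n *)
n = 100;

(* Exact rationals throughout, converted to a decimal only at the end *)
{exactTime, exactResult} =
  AbsoluteTiming[N[Det[Table[1/(1 + Abs[i - j]), {i, n}, {j, n}]]]];

(* The decimal point in 1. makes every entry a machine float from the start *)
{floatTime, floatResult} =
  AbsoluteTiming[Det[Table[1/(1. + Abs[i - j]), {i, n}, {j, n}]]];

(* Compare the two timings and the two answers *)
{exactTime, floatTime}
{exactResult, floatResult}
```

The size of the gap depends on n and on your machine, so treat any ratio you measure as indicative only.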
The same is true for symbolic computation. If you don’t care about the symbolic answer and are not worried about stability, then substitute numerical values as soon as you can. For example, solving this polynomial symbolically before substituting the values in causes Mathematica to produce a five-page-long intermediate symbolic expression.
But do the substitution first, and Solve will use fast numerical methods.
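A sketch of the two orderings, assuming a general quartic and made-up coefficient values (both are illustrative choices):

```mathematica
(* Hypothetical polynomial and coefficient values *)
poly = a x^4 + b x^3 + c x^2 + d x + e;
vals = {a -> 1., b -> 2., c -> 3., d -> 4., e -> 5.};

(* Slow route: build the general quartic formula, then substitute numbers *)
{symbolicTime, symbolicRoots} =
  AbsoluteTiming[x /. Solve[poly == 0, x] /. vals];

(* Fast route: substitute first, so Solve only ever sees numeric coefficients *)
{numericTime, numericRoots} =
  AbsoluteTiming[x /. Solve[(poly /. vals) == 0, x]];

{symbolicTime, numericTime}
```

Both routes should return four roots; the symbolic route may also show small rounding differences, because the numbers pass through a very large formula.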
When working with lists of data, be consistent in your use of reals. It only takes one exact value to cause the whole dataset to have to be held in a more flexible but less efficient form.
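One way to see this effect, assuming the Developer`PackedArrayQ utility is available in your version, is to compare a list of machine reals with a copy in which a single element has been replaced by an exact integer:

```mathematica
allReals = N[Range[10^6]];                  (* every element is a machine real *)
oneExact = ReplacePart[allReals, 1 -> 1];   (* first element is now the exact integer 1 *)

(* Is each list stored in the compact packed form? *)
Developer`PackedArrayQ /@ {allReals, oneExact}

(* Memory used by each list *)
ByteCount /@ {allReals, oneExact}
```

The all-real list should report as packed and the mixed list should not, with a correspondingly larger ByteCount for the mixed list.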
2. Learn about Compile…
The Compile function takes Mathematica code and allows you to pre-declare the types (real, complex, etc.) and structures (value, list, matrix, etc.) of input arguments. This takes away some of the flexibility of the Mathematica language, but freed from having to worry about “What if the argument was symbolic?” and the like, Mathematica can optimize the program and create a byte code to run on its own virtual machine. Not everything can be compiled, and very simple code might not benefit, but complex low-level numerical code can get a really big speedup.
Here is an example:
Using Compile instead of Function makes the execution over 10 times faster.
But we can go further by giving Compile some hints about the parallelizable nature of the code, getting an even better result.
On my dual-core machine I get a result 150 times faster than the original; the benefit would be even greater with more cores.
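The three stages can be sketched as follows. The function body (a truncated power series for the exponential) and the names fn, cfn, and cfnListable are assumptions made for illustration; the speedups you see will depend on your hardware and version.

```mathematica
arg = Range[-50., 50., 0.25];

(* Plain Function: sums x^i/i! term by term in the ordinary evaluator *)
fn = Function[{x},
   Block[{sum = 1.0, inc = 1.0},
    Do[inc = inc*x/i; sum = sum + inc, {i, 10000}]; sum]];

(* Same body, compiled with the argument declared as a machine real *)
cfn = Compile[{{x, _Real}},
   Block[{sum = 1.0, inc = 1.0},
    Do[inc = inc*x/i; sum = sum + inc, {i, 10000}]; sum]];

(* Same body again, with hints that it threads over lists and may run in parallel *)
cfnListable = Compile[{{x, _Real}},
   Block[{sum = 1.0, inc = 1.0},
    Do[inc = inc*x/i; sum = sum + inc, {i, 10000}]; sum],
   RuntimeAttributes -> {Listable}, Parallelization -> True];

First@AbsoluteTiming[Map[fn, arg]]
First@AbsoluteTiming[Map[cfn, arg]]
First@AbsoluteTiming[cfnListable[arg]]
```

Note that the listable version is called on the whole list at once rather than through Map, which is what lets it spread the work across cores.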
Be aware though that many Mathematica functions like Table, Plot, NIntegrate, and so on automatically compile their arguments, so you won’t see any improvement when passing them compiled versions of your code.
2.5. …and use Compile to generate C code.
Furthermore, if your code is compilable, then you can also use the option CompilationTarget -> "C" to generate C code, call your C compiler to compile it to a DLL, and link the DLL back into Mathematica, all automatically. There is more overhead in the compilation stage, but the DLL runs directly on your CPU, not on the Mathematica virtual machine, so the results can be even faster.
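A sketch of the same idea, assuming a supported C compiler is installed and visible to Mathematica (without one, Compile should issue a message and fall back to the virtual machine). The series body and the name cfnC are illustrative:

```mathematica
arg = Range[-50., 50., 0.25];

(* Compiled via generated C code instead of the built-in virtual machine *)
cfnC = Compile[{{x, _Real}},
   Block[{sum = 1.0, inc = 1.0},
    Do[inc = inc*x/i; sum = sum + inc, {i, 10000}]; sum],
   CompilationTarget -> "C",
   RuntimeAttributes -> {Listable}, Parallelization -> True];

First@AbsoluteTiming[cfnC[arg]]
```

The first definition takes noticeably longer because of the external compile step, so this pays off mainly for functions that are called many times.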
3. Use built-in functions.
Mathematica has a lot of functions. More than the average person would care to sit down and learn in one go. So it is not surprising that I often see code where someone has implemented some operation without having realized that Mathematica already knows how to do it. Not only is it a waste of time re-implementing work that is already done, but our guys are paid to worry about what the best algorithms are for different kinds of input and how to implement them efficiently, so most built-in functions are really fast.
If you find something close-but-not-quite-right, then check the options and optional arguments; often they generalize functions to cover many specialized uses or abstracted applications.
Here is such an example. If I have a list of a million 2×2 matrices that I want to turn into a list of a million flat lists of 4 elements, the conceptually easiest way might be to Map the basic Flatten operation down the list of them.
But Flatten knows how to do this whole task on its own when you specify that levels 2 and 3 of the data structure should be merged and level 1 be left alone. Specifying such details might be comparatively fiddly, but staying within Flatten to do the whole flattening job turns out to be nearly 4 times faster than re-implementing that sub-feature yourself.
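A sketch of both approaches on random data (the variable names are illustrative). The second argument {{1}, {2, 3}} tells Flatten to keep level 1 as it is and merge levels 2 and 3:

```mathematica
(* A million random 2x2 matrices *)
data = RandomReal[1, {10^6, 2, 2}];

(* Conceptually simple: apply Flatten to each matrix in turn *)
{mapTime, viaMap} = AbsoluteTiming[Map[Flatten, data]];

(* One call: leave level 1 alone, merge levels 2 and 3 *)
{levelTime, viaLevels} = AbsoluteTiming[Flatten[data, {{1}, {2, 3}}]];

{mapTime, levelTime}
viaMap === viaLevels
```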
So remember—do a search in the Help menu before you implement anything.
4. Use Wolfram Workbench.
Mathematica can be quite forgiving of some kinds of programming mistakes—it will proceed happily in symbolic mode if you forget to initialize a variable at the right point and doesn’t care about recursion or unexpected data types. That’s great when you just need to get a quick answer, but it will also let you get away with less than optimal solutions without realizing it.
Workbench helps in several ways. First it lets you debug and organize large code projects better, and having clean, organized code should make it easier to write good code. But the key feature in this context is the profiler that lets you see which lines of code used up the time, and how many times they were called.
Take this example, a truly horrible way (computationally speaking) to implement Fibonacci numbers. If you didn’t think about the consequences of the double recursion, you might be surprised by the 22 seconds it takes to evaluate fib[35] (about the same time it takes the built-in function to calculate all 208,987,639 digits of Fibonacci[1000000000] [see tip 3]).
Running the code in the profiler reveals the reason. The main rule is invoked 9,227,464 times, and the fib[1] value is requested 18,454,929 times.
Being told what your code really does, rather than what you thought it would do, can be a real eye-opener.
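If you do not have the profiler to hand, a crude counter gives a similar picture. This is a sketch, assuming a doubly recursive definition with base cases fib[1] = fib[2] = 1; the counter variable calls is a hypothetical stand-in for the profiler's call count, and n is kept small so that it finishes quickly:

```mathematica
ClearAll[fib];
calls = 0;

fib[1] = 1;
fib[2] = 1;

(* Main rule: count each invocation, then recurse twice *)
fib[n_] := (calls++; fib[n - 1] + fib[n - 2]);

AbsoluteTiming[fib[25]]
calls   (* 75024 invocations of the main rule for fib[25] *)
```

The count grows like the Fibonacci numbers themselves, which is why fib[35] is so much slower than fib[25].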
5. Remember values that you will need in the future.
This is good programming advice in any language. The Mathematica construct that you will want to know is a definition of the form f[x_] := f[x] = (some expensive calculation).
It saves the result of calling f on any value, so that if it is called again on the same value, Mathematica will not need to work it out again. You are trading memory for speed here, so it isn’t appropriate if your function is likely to be called for a huge number of values but rarely for the same one twice. But if the possible input set is constrained, this can really help. Here it is rescuing the program that I used to illustrate tip 4. Change the first rule to this:
And it becomes immeasurably fast, since fib[35] now only requires the main rule to be evaluated 33 times. Looking up previous results prevents the need to repeatedly recurse down to fib[1].
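A sketch of the memoized version, using the hypothetical name fibMemo so that it does not clash with any earlier definition:

```mathematica
ClearAll[fibMemo];

fibMemo[1] = 1;
fibMemo[2] = 1;

(* Each computed value is stored as a new rule, so it is never recomputed *)
fibMemo[n_] := fibMemo[n] = fibMemo[n - 1] + fibMemo[n - 2];

AbsoluteTiming[fibMemo[35]]   (* result is 9227465 *)

(* Number of stored rules, including the general one *)
Length[DownValues[fibMemo]]
```

The stored values persist for the rest of the session, which is where the memory cost comes from; ClearAll resets them.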
6. Parallelize.
An increasing number of Mathematica operations will automatically parallelize over local cores (most linear algebra, image processing, and statistics), and, as we have seen, so does Compile if manually requested. But for other operations, or if you want to parallelize over remote hardware, you can use the built-in parallel programming constructs.
There is a collection of tools for this, but for very independent tasks, you can get quite a long way with just ParallelTable, ParallelMap, and ParallelTry. Each of these automatically takes care of communication, worker management, and collection of results. There is some overhead for sending the task and retrieving the result, so there is a trade-off of time gained versus time lost. Your Mathematica comes with four compute kernels, and you can scale up with gridMathematica if you have access to additional CPU power. Here, ParallelTable gives me double the performance, since it is running on my dual-core machine. More CPUs would give a better speedup.
Anything that Mathematica can do, it can also do in parallel. For example, you could send a set of parallel tasks to remote hardware, each of which compiles and runs in C or on a GPU.
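As a sketch of the simplest case, here is an independent test run over a range with Table and with ParallelTable. The primality test and the range are assumptions chosen so that each item does enough work to outweigh the communication overhead:

```mathematica
(* Serial: test 501 large integers for primality on one kernel *)
{serialTime, serial} =
  AbsoluteTiming[Table[PrimeQ[x], {x, 10^1000, 10^1000 + 500}]];

(* Parallel: the same items are distributed across the available kernels *)
{parallelTime, parallel} =
  AbsoluteTiming[ParallelTable[PrimeQ[x], {x, 10^1000, 10^1000 + 500}]];

{serialTime, parallelTime}
serial === parallel
```

The first parallel call also pays for launching the worker kernels, so time a second run if you want to see the steady-state speedup.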
6.5. Think about CUDALink and OpenCLLink.
If you have GPU hardware, there are some really fast things you can do with massive parallelization. Unless one of the built-in CUDA-optimized functions happens to do what you want, you will need to do a little work, but the CUDALink and OpenCLLink tools automate a lot of the messy details for you.
7. Use Sow and Reap to accumulate large amounts of data (not AppendTo).
Because of the flexibility of Mathematica data structures, AppendTo can’t assume that you will be appending a number, because you might equally append a document or a sound or an image. As a result, AppendTo must create a fresh copy of all of the data, restructured to accommodate the appended information. This makes it progressively slower as the data accumulates. (And the construct data=Append[data,value] is the same as AppendTo.)
Instead use Sow and Reap. Sow throws out the values that you want to accumulate, and Reap collects them and builds a data object once at the end. The following are equivalent:
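A sketch of the two forms side by side, accumulating squares as a stand-in for real data (the names and the size n are illustrative):

```mathematica
n = 20000;

(* AppendTo: the list is rebuilt on every step *)
{appendTime, viaAppend} = AbsoluteTiming[
   Module[{data = {}},
    Do[AppendTo[data, i^2], {i, n}];
    data]];

(* Sow/Reap: values are collected and the list is built once at the end *)
{reapTime, viaReap} = AbsoluteTiming[
   Reap[Do[Sow[i^2], {i, n}]][[2, 1]]];

{appendTime, reapTime}
viaAppend === viaReap
```

Reap returns the value of its body followed by a list of the sown lists, which is why the accumulated data is extracted with [[2, 1]].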
8. Use Block or With rather than Module.
Block, With, and Module are all localization constructs with slightly different properties. In my experience, Block and Module are interchangeable in at least 95% of code that I write, but Block is usually faster, and in some cases With (effectively Block with the variables in a read-only state) is faster still.
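A rough way to check this on your own system is to time the same trivial body under each construct. This is only a sketch; the body and the repetition count are arbitrary, and the ranking may differ between versions:

```mathematica
reps = 10^5;

(* Same body in each case: localize y, then evaluate 1/y *)
moduleTime = First@AbsoluteTiming[Do[Module[{y = 2.}, 1/y], {reps}]];
blockTime = First@AbsoluteTiming[Do[Block[{y = 2.}, 1/y], {reps}]];
withTime = First@AbsoluteTiming[Do[With[{y = 2.}, 1/y], {reps}]];

{moduleTime, blockTime, withTime}
```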
9. Go easy on pattern matching.
Pattern matching is great. It can make complicated tasks easy to program. But it isn’t always fast, especially the fuzzier patterns like BlankNullSequence (usually written as “___”), which can search long and hard through your data for patterns that you, as a programmer, might already know will never be there. If execution speed matters, use tighter patterns, or none at all.
As an example, here is a rather neat way to implement a bubble sort in a single line of code using patterns:
Conceptually neat, but slow compared to this procedural approach that I was taught when I first learned programming:
Of course in this case you should use the built-in function (see tip 3), which will use better sorting algorithms than bubble sort.
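The two approaches can be sketched as follows, assuming a short list of random reals (the variable names are illustrative). The pattern version swaps any adjacent out-of-order pair until none remain; the procedural version does the classic repeated sweep:

```mathematica
data = RandomReal[1, 100];

(* Pattern version: repeatedly swap an adjacent pair b, c with b > c *)
{patternTime, patternSorted} = AbsoluteTiming[
   data //. ({a___, b_, c_, d___} /; b > c :> {a, c, b, d})];

(* Procedural version: sweep through the list until a pass makes no swaps *)
{loopTime, loopSorted} = AbsoluteTiming[
   Module[{list = data, swapped = True, temp},
    While[swapped,
     swapped = False;
     Do[
      If[list[[i]] > list[[i + 1]],
       temp = list[[i]]; list[[i]] = list[[i + 1]]; list[[i + 1]] = temp;
       swapped = True],
      {i, Length[list] - 1}]];
    list]];

{patternTime, loopTime}
patternSorted === loopSorted === Sort[data]
```

The gap widens quickly with the length of the list, because each rewrite makes the pattern matcher search the list again from the start.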
10. Try doing things differently.
One of Mathematica’s great strengths is that it can tackle the same problem in different ways. It allows you to program the way you think, as opposed to reconceptualizing the problem for the style of the programming language. However, conceptual simplicity is not always the same as computational efficiency. Sometimes the easy-to-understand idea does more work than is necessary.
But another issue is that because special optimizations and smart algorithms are applied automatically in Mathematica, it is often hard to predict when something clever is going to happen. For example, here are two ways of calculating factorial, but the second is over 10 times faster.
Why? You might guess that the Do loop is slow, or all those assignments to temp take time, or that there is something else “wrong” with the first implementation, but the real reason is probably quite unexpected. Times knows a clever binary splitting trick that can be used when you have a large number of integer arguments. It is faster to recursively split the arguments into two smaller products, (1*2*…*32767)*(32768*…*65536), rather than working through the arguments from first to last. It still has to do the same number of multiplications, but fewer of them involve very big integers, and so, on average, are quicker to do. There are lots of such pieces of hidden magic in Mathematica, and more get added with each release.
Of course the best way here is to use the built-in function (tip 3 again):
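A sketch of all three approaches for 65536!, using illustrative variable names. The timings are the interesting part; the final line just confirms that the three results agree:

```mathematica
n = 2^16;

(* Loop: multiply the running product by each integer in turn *)
loopTime = First@AbsoluteTiming[temp = 1; Do[temp = temp*i, {i, n}]; temp];

(* Times applied to the whole list of integers at once *)
timesTime = First@AbsoluteTiming[viaTimes = Apply[Times, Range[n]]];

(* Built-in factorial *)
builtinTime = First@AbsoluteTiming[viaFactorial = n!];

{loopTime, timesTime, builtinTime}
temp == viaTimes == viaFactorial
```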
Mathematica is capable of superb computational performance, and also superb robustness and accuracy, but not always both at the same time. I hope that these tips will help you to balance the sometimes conflicting needs for rapid programming, rapid execution, and accurate results.
All timings use a Windows 7 64-bit PC with 2.66 GHz Intel Core 2 Duo and 6 GB RAM.