Making Efficient use of memory in Haskell

A summary of various techniques to make efficient use of memory in Haskell.

Introduction

In part two of my series on Haskell we spoke about identifying memory leaks and tuning the GC to improve performance. In part three we’ll be looking at a number of other techniques and libraries we have found that maximise efficient use of memory in Haskell when writing performance critical code.

So without further ado.

Lazy vs Strict

As I mentioned in my previous post, Haskell is lazily evaluated by default. This can be useful for performance as we will see later in this post, but it can also make it hard to reason about memory usage, which in turn makes it easy to introduce memory problems: if a data structure is repeatedly updated without being used, the unevaluated thunks will use up more and more memory.

In order to alleviate this problem, we keep long lived data structures strict. Beyond this the general consensus seems to be that the leaves of data structures should be strict, and the spines should be lazy so that unused sub parts do not need to be evaluated and they can be efficiently traversed. We will go into more detail about this later.

Many libraries provide strict variants of data structures. To make your own data structures strict, a “bang” can be put in front of a type, e.g. data StrictPoint = StrictPoint !Int !Int. Additionally you can force evaluation of particular expressions with seq or deepseq.

Unboxing data

By default, Haskell heap objects are stored “boxed” — they are represented as a pointer to an object on the heap. Unboxed objects on the other hand are represented as the actual raw data. In some languages these would also be referred to as reference and value types. Unboxed types do not require that a pointer (or possibly nested pointers) have to be followed to read a data structure.

As well as primitive data types (Ints, Doubles, etc), fields of algebraic data types with a single data constructor can also be unboxed. This can be done if the fields are strict (a big advantage of using strict leaves in data structures), and you use the UNPACK pragma, like so:

1data Name =
2  Name
3    {-# UNPACK #-} !String
4    {-# UNPACK #-} !String

In memory the fields will then be in-lined with the object in the heap, much like a C struct.

This can make code look ugly, so you may find that making your types strict and compiling with the -funbox-strict-fields worth trying. This is essentially equivalent to putting the UNPACK pragma on all strict fields. We found this significantly boosted the performance of our application, but it is worth testing that it is does actually help in your specific case; you may have to be more selective. Note that -funbox-small-strict-fields, which unboxes fields whose representation is smaller or equal to a pointer, is already on by default in GHC 7.10.

Unboxed data may be slower if a data structure needs to be updated. Normally updating a sub part of a data structure would only involve updating the pointer to point to the new value, but with an unboxed field the whole data structure must be rewritten.

Mutable data

Haskell Data is immutable by default. This is one of the reasons it is such a nice language to program in; it removes a whole class of errors that can occur when multiple threads are modifying a data structure. It also allows the compiler to apply more optimisations.

However the disadvantage is that every time some data is “modified”, a new object (or at least part of it) must be allocated on the heap, and the old object must be garbage collected.

The Haskell GC is fast when collecting large numbers of short lived objects, but there may be critical parts of your program where this is too slow. In this case it might be much faster to manually read and write from aligned mutable buffers.

An example of this is if you have some transformations you need to apply to data, and you know that the input and output will fit in a fixed section of memory, it may be more efficient to overwrite the existing data in place, without having to allocate more memory. When doing this low level effectful work in Haskell, we lose the usual guarantees of referential transparency. Therefore we have to be very careful that we do not introduce side effects, and test the code thoroughly.

To do this at a higher level, libraries like vector and array support mutable variants. If you need full manual control over memory layout, you could use the functions from Data.Foreign to allocate memory and manipulate pointers. Although at that level you may as well be writing C because you lose a lot of safety (hello segfaults!), so don’t go down this route if you can avoid it!

There are also some data structures that can only be implemented efficiently when they can be mutated. For example a hashmap implementation with O(1) lookups and insertions requires mutable data. The hashtables package supports this, which is why it has better time complexity than the more commonly used HashMap from unordered-containers (O(log n)). The disadvantage is that it must be modified in the IO or ST monad.

Stream fusion

Because Haskell allocates new objects whenever a function transforms some data, it can get very expensive when a list of values is transformed by multiple functions. Stream fusion is an attempt at solving this problem: essentially a series of function applications are converted into a new function that only needs to allocate enough space for the final result. This lets us continue writing in a nice functional style without having to do low level memory management as described above. This Stack Overflow answer gives a nice high level overview, and the library that lets you do this is called stream-fusion.

Streaming data

Laziness

Haskell’s laziness can be a nice way of using very low total memory. Laziness means that an expression is evaluated only when the result is needed. This means that if you read from some data source lazily, and write it out lazily, only the chunk currently being processed will need to be resident in memory at any one time. Even though your source code will look like everything is loaded into memory before writing it, at runtime the memory use will be constant.

Pipes/Conduit

Another way of processing streaming data efficiently is to use the pipes or conduit libraries. Unlike lazy IO, they allow effects to be interleaved and provide deterministic resource usage. They also provide very nice abstractions for architecting your program provided it fits into the conceptual model of a data processing pipeline.

Conclusion

In my last three posts we have seen how it can be difficult to reason memory usage in Haskell, which in turn can make memory leaks and performance issues common. This is one of the symptoms of its abstractness; the code is far removed from what is actually happening on the machine. The fact that you may need to really understand what the runtime and GC are doing implies to me that the high level abstractions baked into the language are not always watertight.

On the other hand it is great to see some innovative libraries and techniques (e.g. stream fusion, pipes, conduit) that help move us towards the point where Haskell developers can continue to write high level code without worrying about the issues I have described in this post.

As usual the parts where performance matters significantly should only affect a small portion of your codebase, so the UNPACKs should be minimal. Often the libraries you use will have already done most of this work for you. Most of the time we have found finding the issues to be the challenging part. Once you know enough to understand the problem, you should know enough to fix it.

We’re always exploring new ways to use Haskell, so any tips and/or corrections are greatly appreciated :) feel free to get in touch, I’m @willsewell_ on Twitter.

Docs

Tutorials

Support

Glossary