Thursday, September 28. 2006Cache Coherent Data Structures in C++: A Real Page-TurnerOnce upon a time, cache coherency was a problem that, for the most part, was faced by game programmers and systems programmers. The basic problem is this: given an application that requires a large amount of memory, an operating system that supports virtual memory, and a very small amount of physical memory, the application can experience "death by swapping". If the data that the application needs to operate on at any given time is scattered all over the app's memory space, rather than stored contiguously in memory, and physical memory can't store all the pages that the data is found on at the same time, then the operating system will constantly be swapping memory in and out of physical memory, causing major slowdown due to disk access. If you happen to be iterating on data that's in this situation--well, you're sunk. Now days, with the growing presence of multi-core processors, the same problem is cropping up for everyone, albeit in a different context and on a different scale. In a multi-core environment, access to main memory becomes a bottleneck. When assigning work to different cores, you want to have all of the data a particular core is working on in the processor's on-chip caches, so the core isn't constantly waiting for memory to be moved between cache and main memory. Given the size of caches on today's processors, it's not always possible to have all your data in the cache at one time--but you can still minimize the amount of time spent moving memory. Again, if your data is scattered all over the place, the core that's operating on it is likely to spend most of its time waiting for data to be fetched from main memory--kinda defeating the purpose of multiple cores. Luckily, a good general solution to both of these problems is to use cache-coherent data structures, and it's generally pretty easy to do so. Cache-coherent data structures are data structures that cause your data to be stored contiguously in memory--meaning that instead of scattering your data over thousands of virtual pages, you can ensure that it gets written to just a few. This also optimizes memory prefetch on most modern processors, which fetch blocks of main memory into cache at a time rather than just the bytes you ask for. With that in mind, here are some easy ways to use cache-coherent data structures to keep your app out of disk and processor cache-miss hell: Prefer Array-Based Data Structures Over Pointer-Linked Data StructuresYep, turns out everything they said was good for you in your college Data Structures class is bad for you. When storing like data elements in a collection, the type of pointer-linked data structures in which you grow the collection by allocating one data element at a time and linking it to the rest of the collection are bad news. Because you're allocating a large amount of small bits of memory over time, you're likely to spread your data over many, many pages. Instead, it's best to implement your data structures using arrays as the underlying storage mechanism. As you probably remember from that same Data Structures class, just about any data structure that can be implemented using pointers can be implemented using arrays--you just link between array indices rather than memory locations. You can grow the array as necessary using realloc() (or your operating system's equivalent). This way, all of your data is stored contiguously in memory and will span the minimum number of pages possible. The downside to this is that implementing complex data structures on top of arrays is a big pain. Ever try to implement a red-black tree on top of an array? Not fun. And of course you have to manage "holes" in your array caused by deleted elements--either by shifting array elements around and fixing up links, or by reusing array elements. But for simple data structures, it's not such a bad thing--replacing that pointer-based linked list, stack or queue with an array-based implementation will give you a big cache win for not too much effort. For more complex data structures, the STL can help you out a bit, as we'll see below. Let STL Vectors Help You OutSTL Vectors are probably one of the most underrated classes in the Standard Library. They're amazingly flexible, and especially handy for our purposes, since in the vast majority of STL implementations, they use an array as their underlying storage mechanism. The standard library specification says that vectors have the following properties: O(1) insertion and deletion at the end of the vector, O(n) insertion and deletion anywhere else in the vector, O(1) element access, and of course O(n) search (unless you apply some tricks, as mentioned below). The only exception to this is when the vector has to expand and reallocates the array; depending on your STL implementation and your operating system's realloc implementation, that operation may take O(n) when it happens. If you're good about presizing the vector and managing its capacity, you can avoid this cost a lot of the time. With that in mind, let's take a look at some advanced functionality you can implement with a vector, while still maintaining cache coherency:
Even more interesting possibilities open up in certain situations when you start to look at the STL heap algorithms, but, for the sake of brevity, I'll leave that as an exercise for the reader. Use Memory PoolsMemory pools are a great way to ensure cache coherency in situations where you need to allocate a lot of small objects. The basic idea of a memory pool is that you allocate a large number of fixed-sized objects (probably using an array or vector). Then, as you need those objects, you request them from the memory pool, which gives you a pointer to an unused object in its object store. When you're done with the object, you notify the memory pool and it "frees" that object. Using this technique, you can use new/delete semantics and maintain cache coherency. It's almost like having your cake and eating it too! There are a few requirements, of course. Nothing's free--that's why this is only like having your cake and eating it too. First of all, the objects in your pool need the ability to return to a freshly initialized state. Since they will most likely be reused, but you can't re-construct them, you'll have to write code to reset them to pristine condition. In addition to this, you will need to keep track of which objects in your memory pool are in use and which aren't; this can be achieved in a number of ways, such as maintaining a free list or a bitfield indicating the state of each object in the pool. You'll also have to decide whether your memory pool can grow or not; if it can, that implies some additional coding. The good news is that it's relatively easy to implement a memory pool as a template class, and instantiate it on any class you want to pool that follows the guidelines above. An actual implementation of that is beyond the scope of this article--but might make a good blog entry later on (or perhaps readers can point me to an existing, free implementation). What If I Really, Really, Really Want to Use STL?Simple: use vector! Seriously, if you have a burning need to use the pointer-linked STL classes and want to ensure some measure of cache coherency, it can be done. My suggestion is to employ memory pooling. As mentioned above, memory pools preserve some of the new/delete semantic; therefore, it should be possible to write a custom STL allocator that uses a memory pool as storage. ConclusionThere you have it--some ways to ease the pain of working on a system with low physical memory, or on a multicore system. Of course, there are many other techniques for cache coherency, and I encourage all of you to comment with any that you know about/recommend. Before I send you forth into the world with your new-found knowledge, some advice: remember that there are situations where cache coherency is critical, and some where it's not. If you've got plenty of physical memory and/or aren't worried about your processor cache, using these techniques is overkill. Linked lists, sets, maps, and pointers will do just fine. Custom data structures are always more difficult to maintain then pre-existing data structures like those provided by STL. So make your choices wisely. Copyright (c) 2006, Matt Kimmel. All Rights Reserved.
(Page 1 of 1, totaling 1 entries)
|
QuicksearchSyndicate This Blog |

