Thursday, September 28. 2006Cache Coherent Data Structures in C++: A Real Page-TurnerTrackbacks
Trackback specific URI for this entry
No Trackbacks
Comments
Display comments as
(Linear | Threaded)
Good article, I like it. Very important concept for beginner-intermediate programmers, especially in peformance critical environment.
Good article, with one exception: std::binary_search doesn't actually return an iterator to the found element, it merely returns a bool indicating if the element is found. The stl algorithm that returns an iterator to this element is std::lower_bound (which has the complexity of a binary search).
As for a free implementation of memory pools, check out boost's memory pool library. Relted to the "Binary Search Table" structure: binary searching a vector is NOT cache-coherent; the fact that the elements are nicely put in a contiguous array is irrelevant when calling STL's lower_bound. So I don't see how binary searching on a vector is related to cache-coherent structures. Nor can I see it eve being a good idea if you have to insert or delete elements (O(n) time vs. O(log n) time for BSTs), unless of course you have a very small number of elements (though in this case, linear search might actually be better than binary search, precisely because of cache coherency).
Good points. I'm certainly not saying that vectors are a magic bullet that are going to solve all your cache-coherency problems; I'm just saying that they're a tool in your toolbox that can be very effective when you understand the problem you're facing. A vector is certainly more likely to be contained within a page or two than a data structure made up of dynamically allocated nodes--but it depends on the size of the vector, the way you access the data structure, and all kinds of factors. It's some food for thought.
Addendum: you also mention STL heap algorithms, which (like binary search) are not cache-coherent on a vector.
You also mention storing trees like red-black on top of an array. This is again far from helping the cache-coherent issues, as in general the daughters of a node will not be close to the node itself. While linearizing trees can help performance (e.g. by eliminating child pointers), it is not because it improves cache coherency. Also, you seem to suggest that it is important for data to span contiguous memory PAGES. This is not true; in fact, contiguous virtual memory pages probably don't even correspond to contiguous physical memory pages. Caches operate in blocks, which are of sizes on the order of 32-128 bytes; you want to minimize the number of blocks you "touch". It doesn't matter if you touch 100 blocks within the same page, or 100 blocks each in a different page. So for example having a lot of arrays of size 4096 (page size) should be the same as having a single contiguous array (provided of course you can somehow ensure proper page alignment of the 4k arrays). With these issues in mind, I am now unsure of what "advanced" things you can do with a vector to ensure cache-coherency of operations. This is not cache coherency!
(It is important stuff, though.) Cache coherency is about the hardware making sure that all the entities that talk to memory (be they CPUs or devices) have the same idea of what's actually /in/ the memory. That's not easy when caches are involved but it's crucial for keeping down the complexity of operating systems (and user-space programs that try to be smart). When I say cache coherency, what I mean is attempting to keep related data, and hopefully entire data structures, in contiguous memory so that when it's accessed by your code, it can all load into the CPU cache at once (or close to all at once) and reduce the number of cache misses you encounter while performing operations on it. This is critical on modern game consoles, where memory access is quite slow relative to cache access.
You have a good point, though--there are other meanings to the term "cache coherency" as well. Good article. About memory pools, I think the book Modern C++ Design (http://www.amazon.com/Modern-Design-Programming-Patterns-Depth/dp/0201704315/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1219507495&sr=8-1_ already has good abstraction and template implementation for pools (See chapter "Small Object Allocation")
|
QuicksearchArchives |