A blossoming paradigm for block-recursive matrix algorithms is presented that, at once, attains excellent performance measured by
It provides a philosophy and tools that allow the programmer to deal with the memory hierarchy invisibly, from L1 and L2 to TLB, paging, and interprocessor communication. Used together, they provide a cache-oblivious style of programming.
Plots are presented to support these claims on an implementation of Cholesky factorization crafted directly from the paradigm in C with a few intrinsic calls. The results in this paper focus on low-level performance, including the new Morton-hybrid representation to take advantage of hardware and compiler optimizations. In particular, this code beats Intel’s Matrix Kernel Library and matches AMD’s Core Math Library, losing a bit on L1 misses while winning decisively on TLB-misses.
Cache misses, TLB, Paging, Cholesky factorization, quad-trees, Morton-hybrid, parallel processing
and . Seven at one stroke: Results from a cache-oblivious paradigm for scalable matrix algorithms. In MSPC ’06: Proceedings of the 2006 workshop on Memory system performance and correctness, pages 41–50. ACM, New York, NY, USA, 2006. ISBN 1-59593-578-9. doi: 10.1145/1178597.1178604.
@inproceedings{adams2006seven,
author = {Adams, Michael D. and Wise, David S.},
title = {Seven at One Stroke: Results from a Cache-Oblivious Paradigm for Scalable Matrix Algorithms},
booktitle = {MSPC '06: Proceedings of the 2006 workshop on Memory system performance and correctness},
pages = {41--50},
year = {2006},
address = {New York, NY, USA},
publisher = {ACM},
isbn = {1-59593-578-9},
doi = {10.1145/1178597.1178604},
}