Seven at One Stroke: Results from a Cache-Oblivious Paradigm for Scalable Matrix Algorithms

Michael D. Adams and David S. Wise

Status: Published

Abstract

A blossoming paradigm for block-recursive matrix algorithms is presented that, at once, attains excellent performance measured by

time
TLB misses
L1 misses
L2 misses
paging to disk
scaling on distributed processors, and
portability to multiple platforms.

It provides a philosophy and tools that allow the programmer to deal with the memory hierarchy invisibly, from L1 and L2 to TLB, paging, and interprocessor communication. Used together, they provide a cache-oblivious style of programming.

Plots are presented to support these claims on an implementation of Cholesky factorization crafted directly from the paradigm in C with a few intrinsic calls. The results in this paper focus on low-level performance, including the new Morton-hybrid representation to take advantage of hardware and compiler optimizations. In particular, this code beats Intel’s Matrix Kernel Library and matches AMD’s Core Math Library, losing a bit on L1 misses while winning decisively on TLB-misses.

Keywords

Cache misses, TLB, Paging, Cholesky factorization, quad-trees, Morton-hybrid, parallel processing

Citation

Michael D. Adams and David S. Wise. Seven at one stroke: Results from a cache-oblivious paradigm for scalable matrix algorithms. In MSPC ’06: Proceedings of the 2006 workshop on Memory system performance and correctness, pages 41–50. ACM, New York, NY, USA, 2006. ISBN 1-59593-578-9. doi: 10.1145/1178597.1178604.

BibTeX Entry

@inproceedings{adams2006seven,
  author = {Adams, Michael D. and Wise, David S.},
  title = {Seven at One Stroke: Results from a Cache-Oblivious Paradigm for Scalable Matrix Algorithms},
  booktitle = {MSPC '06: Proceedings of the 2006 workshop on Memory system performance and correctness},
  pages = {41--50},
  year = {2006},
  address = {New York, NY, USA},
  publisher = {ACM},
  isbn = {1-59593-578-9},
  doi = {10.1145/1178597.1178604},
}

Copyright Notice

© ACM, 2006. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in MSPC ’06: Proceedings of the 2006 workshop on Memory system performance and correctness, (2006). http://doi.acm.org/10.1145/1178597.1178604.