Multi-core optimization for conjugate gradient benchmark onheterogeneous processors

来源期刊:中南大学学报(英文版)2011年第2期

论文作者:邓林 窦勇

文章页码:490 - 498

Key words:multi-core processor; NAS parallelization; CG; memory optimization

Abstract: Developing parallel applications on heterogeneous processors is facing the challenges of ‘memory wall’, due to limited capacity of local storage, limited bandwidth and long latency for memory access. Aiming at this problem, a parallelization approach was proposed with six memory optimization schemes for CG, four schemes of them aiming at all kinds of sparse matrix-vector multiplication (SPMV) operation. Conducted on IBM QS20, the parallelization approach can reach up to 21 and 133 times speedups with size A and B, respectively, compared with single power processor element. Finally, the conclusion is drawn that the peak bandwidth of memory access on Cell BE can be obtained in SPMV, simple computation is more efficient on heterogeneous processors and loop-unrolling can hide local storage access latency while executing scalar operation on SIMD cores.

相关论文

  • 暂无!

相关知识点

  • 暂无!

有色金属在线官网  |   会议  |   在线投稿  |   购买纸书  |   科技图书馆

中南大学出版社 技术支持 版权声明   电话:0731-88830515 88830516   传真:0731-88710482   Email:administrator@cnnmol.com

互联网出版许可证:(署)网出证(京)字第342号   京ICP备17050991号-6      京公网安备11010802042557号