| 2017 | A performance analysis framework for exploiting GPU microarchitectural capability. Keren Zhou, Guangming Tan, Xiuxia Zhang, Chaowei Wang, Ninghui Sun |
| 2017 | Automatic topology mapping of diverse large-scale parallel applications. Juan J. Galvez, Nikhil Jain, Laxmikant V. Kalé |
| 2017 | Carpool: a bufferless on-chip network supporting adaptive multicast and hotspot alleviation. Xi-Yue Xiang, Wentao Shi, Saugata Ghose, Lu Peng, Onur Mutlu, Nian-Feng Tzeng |
| 2017 | Compile-time optimized and statically scheduled N-D convnet primitives for multi-core and many-core (Xeon Phi) CPUs. Aleksandar Zlateski, H. Sebastian Seung |
| 2017 | Demystifying automata processing: GPUs, FPGAs or Micron's AP? Marziyeh Nourian, Xiang Wang, Xiaodong Yu, Wu-chun Feng, Michela Becchi |
| 2017 | Design and implementation of bandwidth-aware memory placement and migration policies for heterogeneous memory systems. Seongdae Yu, Seongbeom Park, Woongki Baek |
| 2017 | Dynamic scheduling for efficient hierarchical sparse matrix operations on the GPU. Andreas Derler, Rhaleb Zayer, Hans-Peter Seidel, Markus Steinberger |
| 2017 | Efficient SIMD and MIMD parallelization of hash-based aggregation by conflict mitigation. Peng Jiang, Gagan Agrawal |
| 2017 | Enabling scalability-sensitive speculative parallelization for FSM computations. Junqiao Qiu, Zhijia Zhao, Bo Wu, Abhinav Vishnu, Shuaiwen Leon Song |
| 2017 | Fast segmented sort on GPUs. Kaixi Hou, Weifeng Liu, Hao Wang, Wu-chun Feng |
| 2017 | Frequent subtree mining on the automata processor: challenges and opportunities. Elaheh Sadredini, Reza Rahimi, Ke Wang, Kevin Skadron |
| 2017 | Globally homogeneous, locally adaptive sparse matrix-vector multiplication on the GPU. Markus Steinberger, Rhaleb Zayer, Hans-Peter Seidel |
| 2017 | GraphGrind: addressing load imbalance of graph partitioning. Jiawen Sun, Hans Vandierendonck, Dimitrios S. Nikolopoulos |
| 2017 | HPAT: high performance analytics with scripting ease-of-use. Ehsan Totoni, Todd A. Anderson, Tatiana Shpeisman |
| 2017 | Hardware/software cooperative caching for hybrid DRAM/NVM memory architectures. Haikun Liu, Yujie Chen, Xiaofei Liao, Hai Jin, Bingsheng He, Long Zheng, Rentong Guo |
| 2017 | HiPA: history-based piecewise approximation for functions. Aurangzeb, Rudolf Eigenmann |
| 2017 | Iteration-fusing conjugate gradient. Sicong Zhuang, Marc Casas |
| 2017 | Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs. Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra |
| 2017 | On improving performance of sparse matrix-matrix multiplication on GPUs. Rakshith Kunchum, Ankur Chaudhry, Aravind Sukumaran-Rajam, Qingpeng Niu, Israt Nisa, P. Sadayappan |
| 2017 | Optimizing recursive task parallel programs. Suyash Gupta, Rahul Shrivastava, V. Krishna Nandivada |
| 2017 | Packet coalescing exploiting data redundancy in GPGPU architectures. Kyung Hoon Kim, Rahul Boyapati, Jiayi Huang, Yuho Jin, Ki Hwan Yum, Eun Jung Kim |
| 2017 | Proceedings of the International Conference on Supercomputing, ICS 2017, Chicago, IL, USA, June 14-16, 2017. William D. Gropp, Pete Beckman, Zhiyuan Li, Francisco J. Cazorla |
| 2017 | Revisiting phased transactional memory. Joao P. L. de Carvalho, Guido Araujo, Alexandro Baldassin |
| 2017 | SPIRIT: a framework for creating distributed recursive tree applications. Nikhil Hegde, Jianqiao Liu, Milind Kulkarni |
| 2017 | SSDUP: a traffic-aware SSD burst buffer for HPC systems. Xuanhua Shi, Ming Li, Wei Liu, Hai Jin, Chen Yu, Yong Chen |
| 2017 | Simplification and runtime resolution of data dependence constraints for loop transformations. Diogo Nunes Sampaio, Louis-Noël Pouchet, Fabrice Rastello |
| 2017 | Supporting automatic recovery in offloaded distributed programming models through MPI-3 techniques. Antonio J. Peña, Vicenç Beltran, Carsten Clauss, Thomas Moschny |
| 2017 | Way-combining directory: an adaptive and scalable low-cost coherence directory. J. Rubén Titos Gil, Antonio Flores, Ricardo Fernández-Pascual, Alberto Ros, Manuel E. Acacio |
| 2017 | libPRISM: an intelligent adaptation of prefetch and SMT levels. Cristobal Ortega, Miquel Moretó, Marc Casas, Ramon Bertran, Alper Buyuktosunoglu, Alexandre E. Eichenberger, Pradip Bose |