PACT B

66 papers

Year Title / Authors
2014 A run-time power manager exploiting software parallelism.
Simon Holmbacka, Sébastien Lafond, Johan Lilius
2014 A runtime support mechanism for fast mode switching of a self-morphing core for power efficiency.
Sudarshan Srinivasan, Nithesh Kurella, Israel Koren, Rance Rodrigues, Sandip Kundu
2014 ADHA: automatic data layout framework for heterogeneous architectures.
Deepak Majeti, Kuldeep S. Meel, Rajkishore Barik, Vivek Sarkar
2014 ATCache: reducing DRAM cache latency via a small SRAM tag cache.
Cheng-Chieh Huang, Vijay Nagarajan
2014 Active learning accelerated automatic heuristic construction for parallel program mapping.
William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather
2014 Adaptive heterogeneous scheduling for integrated GPUs.
Rashid Kaleem, Rajkishore Barik, Tatiana Shpeisman, Brian T. Lewis, Chunling Hu, Keshav Pingali
2014 An event-based language for dynamic binary translation frameworks.
Serguei Makarov, Angela Demke Brown, Ashvin Goel
2014 ArrayTool: a lightweight profiler to guide array regrouping.
Xu Liu, Kamal Sharma, John M. Mellor-Crummey
2014 Automatic execution of single-GPU computations across multiple GPUs.
Javier Cabezas, Lluís Vilanova, Isaac Gelado, Thomas B. Jablin, Nacho Navarro, Wen-mei W. Hwu
2014 Automatic optimization of thread-coarsening for graphics processors.
Alberto Magni, Christophe Dubach, Michael F. P. O'Boyle
2014 Automatic parallelism through macro dataflow in high-level array languages.
Pushkar Ratnalikar, Arun Chauhan
2014 Bitwise data parallelism in regular expression matching.
Robert D. Cameron, Thomas C. Shermer, Arrvindh Shriraman, Kenneth S. Herdy, Dan Lin, Benjamin R. Hull, Meng Lin
2014 Bounded memory scheduling of dynamic task graphs.
Dragos Sbirlea, Zoran Budimlic, Vivek Sarkar
2014 CAWS: criticality-aware warp scheduling for GPGPU workloads.
Shin-Ying Lee, Carole-Jean Wu
2014 COLORIS: a dynamic cache partitioning system using page coloring.
Ying Ye, Richard West, Zhuoqun Cheng, Ye Li
2014 Coarrays in GNU Fortran.
Alessandro Fanfarillo, Tobias Burnus, Valeria Cardellini, Salvatore Filippone, Dan Nagle, Damian W. I. Rouson
2014 Compiler support for selective page migration in NUMA architectures.
Guilherme Piccoli, Henrique Nazaré Santos, Raphael Ernani Rodrigues, Christiane Pousa, Edson Borin, Fernando Magno Quintão Pereira
2014 Consolidated conflict detection for hardware transactional memory.
Lihang Zhao, Jeffrey T. Draper
2014 Cooperative cache scrubbing.
Jennifer B. Sartor, Wim Heirman, Stephen M. Blackburn, Lieven Eeckhout, Kathryn S. McKinley
2014 D
Davoud Anoushe Jamshidi, Mehrzad Samadi, Scott A. Mahlke
2014 Data remapping for an energy efficient burst chop in DRAM memory systems.
Sudharsan Jagathrakshakan, Venkata Kalyan Tavva, Madhu Mutyam
2014 Data-reuse optimizations for pipelined tiling with parametric tile sizes.
Alexandre Isoard
2014 DeSTM: harnessing determinism in STMs for application development.
Kaushik Ravichandran, Ada Gavrilovska, Santosh Pande
2014 Design for scalability in enterprise SSDs.
Arash Tavakkol, Mohammad Arjomand, Hamid Sarbazi-Azad
2014 Design of a hybrid MPI-CUDA benchmark suite for CPU-GPU clusters.
Tejaswi Agarwal, Michela Becchi
2014 Domain-specific models for innovation in analytics.
Bob Blainey
2014 EFetch: optimizing instruction fetch for event-driven web applications.
Gaurav Chadha, Scott A. Mahlke, Satish Narayanasamy
2014 From petascale to the pocket: Adaptively scaling parallel programs for mobile SoCs.
Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger
2014 Graph-based performance accounting for chip multiprocessor memory systems.
Magnus Jahre
2014 Heterogeneous microarchitectures trump voltage scaling for low-power cores.
Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Ronald G. Dreslinski, Thomas F. Wenisch, Scott A. Mahlke
2014 ILP and TLP in shared memory applications: a limit study.
Ehsan Fatehi, Paul Gratz
2014 Improving performance of streaming applications with filtering and control messages.
Peng Li, Jeremy Buhler
2014 International Conference on Parallel Architectures and Compilation, PACT '14, Edmonton, AB, Canada, August 24-27, 2014
José Nelson Amaral, Josep Torrellas
2014 Internet of mobile things: challenges and opportunities.
Klara Nahrstedt
2014 Invyswell: a hybrid transactional memory for Haswell's restricted transactional memory.
Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy
2014 KLA: a new algorithmic paradigm for parallel graph computations.
Harshvardhan, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger
2014 LCA: a memory link and cache-aware co-scheduling approach for CMPs.
Alexandros-Herodotos Haritatos, Georgios I. Goumas, Nikos Anastopoulos, Konstantinos Nikas, Kornilios Kourtis, Nectarios Koziris
2014 Locality-aware memory association for multi-target worksharing in OpenMP.
Thomas R. W. Scogland, Wu-chun Feng
2014 Measuring flexibility in single-ISA heterogeneous processors.
Erik Tomusk, Christophe Dubach, Michael F. P. O'Boyle
2014 Memory scheduling towards high-throughput cooperative heterogeneous computing.
Hao Wang, Ripudaman Singh, Michael J. Schulte, Nam Sung Kim
2014 OpenTuner: an extensible framework for program autotuning.
Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Jonathan Ragan-Kelley, Jeffrey Bosboom, Una-May O'Reilly, Saman P. Amarasinghe
2014 Optimizing stencil code via locality of computation.
Yulong Luo, Guangming Tan
2014 PATS: pattern aware scheduling and power gating for GPGPUs.
Qiumin Xu, Murali Annavaram
2014 PEMOGEN: automatic adaptive performance modeling during program runtime.
Arnamoy Bhattacharyya, Torsten Hoefler
2014 Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels.
Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil
2014 Processing big data graphs on memory-restricted systems.
Harshvardhan, Nancy M. Amato, Lawrence Rauchwerger
2014 Protection and utilization in shared cache through rationing.
Raj Parihar, Jacob Brock, Chen Ding, Michael C. Huang
2014 RCS: runtime resource and core scaling for power-constrained multi-core processors.
Hamid Reza Ghasemi, Nam Sung Kim
2014 Realm: an event-based low-level runtime for distributed memory architectures.
Sean Treichler, Michael Bauer, Alex Aiken
2014 Rollback-free value prediction with approximate loads.
Bradley Thwaites, Gennady Pekhimenko, Hadi Esmaeilzadeh, Amir Yazdanbakhsh, Onur Mutlu, Jongse Park, Girish Mururu, Todd C. Mowry
2014 SM-centric transformation: circumventing hardware restrictions for flexible GPU scheduling.
Bo Wu, Guoyang Chen, Dong Li, Xipeng Shen, Jeffrey S. Vetter
2014 SQRL: hardware accelerator for collecting software data structures.
Snehasish Kumar, Arrvindh Shriraman, Vijayalakshmi Srinivasan, Dan Lin, Jordon Phillips
2014 Shuffling: a framework for lock contention aware thread scheduling for multicore multiprocessor systems.
Kishore Kumar Pusukuri, Rajiv Gupta, Laxmi N. Bhuyan
2014 SpongeDirectory: flexible sparse directories utilizing multi-level memristors.
Lunkai Zhang, Dmitri B. Strukov, Hebatallah Saadeldeen, Dongrui Fan, Mingzhe Zhang, Diana Franklin
2014 Stratified sampling for even workload partitioning.
Jeeva Paudel, José Nelson Amaral
2014 Tiling and optimizing time-iterated computations on periodic domains.
Uday Bondhugula, Vinayaka Bandishti, Albert Cohen, Guillain Potron, Nicolas Vasilache
2014 Trading cache hit rate for memory performance.
Wei Ding, Mahmut T. Kandemir, Diana R. Guttman, Adwait Jog, Chita R. Das, Praveen Yedlapalli
2014 Using STT-RAM to enable energy-efficient near-threshold chip multiprocessors.
Xiang Pan, Radu Teodorescu
2014 VAST: the illusion of a large memory space for GPUs.
Janghaeng Lee, Mehrzad Samadi, Scott A. Mahlke
2014 Velociraptor: an embedded compiler toolkit for numerical programs targeting CPUs and GPUs.
Rahul Garg, Laurie J. Hendren
2014 Versatile and scalable parallel histogram construction.
Wookeun Jung, Jongsoo Park, Jaejin Lee
2014 Virtues and limitations of commodity hardware transactional memory.
Nuno Diegues, Paolo Romano, Luís E. T. Rodrigues
2014 Warp-aware trace scheduling for GPUs.
James A. Jablin, Thomas B. Jablin, Onur Mutlu, Maurice Herlihy
2014 What is the cost of weak determinism?
Cedomir Segulja, Tarek S. Abdelrahman
2014 XStream: cross-core spatial streaming based MLC prefetchers for parallel applications in CMPs.
Biswabandan Panda, Shankar Balachandran
2014 kMAF: automatic kernel-level management of thread and data affinity.
Matthias Diener, Eduardo Henrique Molina da Cruz, Philippe Olivier Alexandre Navaux, Anselm Busse, Hans-Ulrich Heiß