PACT - RankMe

66 papers

Year	Title / Authors
2014	A run-time power manager exploiting software parallelism. Simon Holmbacka, Sébastien Lafond, Johan Lilius
2014	A runtime support mechanism for fast mode switching of a self-morphing core for power efficiency. Sudarshan Srinivasan, Nithesh Kurella, Israel Koren, Rance Rodrigues, Sandip Kundu
2014	ADHA: automatic data layout framework for heterogeneous architectures. Deepak Majeti, Kuldeep S. Meel, Rajkishore Barik, Vivek Sarkar
2014	ATCache: reducing DRAM cache latency via a small SRAM tag cache. Cheng-Chieh Huang, Vijay Nagarajan
2014	Active learning accelerated automatic heuristic construction for parallel program mapping. William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather
2014	Adaptive heterogeneous scheduling for integrated GPUs. Rashid Kaleem, Rajkishore Barik, Tatiana Shpeisman, Brian T. Lewis, Chunling Hu, Keshav Pingali
2014	An event-based language for dynamic binary translation frameworks. Serguei Makarov, Angela Demke Brown, Ashvin Goel
2014	ArrayTool: a lightweight profiler to guide array regrouping. Xu Liu, Kamal Sharma, John M. Mellor-Crummey
2014	Automatic execution of single-GPU computations across multiple GPUs. Javier Cabezas, Lluís Vilanova, Isaac Gelado, Thomas B. Jablin, Nacho Navarro, Wen-mei W. Hwu
2014	Automatic optimization of thread-coarsening for graphics processors. Alberto Magni, Christophe Dubach, Michael F. P. O'Boyle
2014	Automatic parallelism through macro dataflow in high-level array languages. Pushkar Ratnalikar, Arun Chauhan
2014	Bitwise data parallelism in regular expression matching. Robert D. Cameron, Thomas C. Shermer, Arrvindh Shriraman, Kenneth S. Herdy, Dan Lin, Benjamin R. Hull, Meng Lin
2014	Bounded memory scheduling of dynamic task graphs. Dragos Sbirlea, Zoran Budimlic, Vivek Sarkar
2014	CAWS: criticality-aware warp scheduling for GPGPU workloads. Shin-Ying Lee, Carole-Jean Wu
2014	COLORIS: a dynamic cache partitioning system using page coloring. Ying Ye, Richard West, Zhuoqun Cheng, Ye Li
2014	Coarrays in GNU Fortran. Alessandro Fanfarillo, Tobias Burnus, Valeria Cardellini, Salvatore Filippone, Dan Nagle, Damian W. I. Rouson
2014	Compiler support for selective page migration in NUMA architectures. Guilherme Piccoli, Henrique Nazaré Santos, Raphael Ernani Rodrigues, Christiane Pousa, Edson Borin, Fernando Magno Quintão Pereira
2014	Consolidated conflict detection for hardware transactional memory. Lihang Zhao, Jeffrey T. Draper
2014	Cooperative cache scrubbing. Jennifer B. Sartor, Wim Heirman, Stephen M. Blackburn, Lieven Eeckhout, Kathryn S. McKinley
2014	D²MA: accelerating coarse-grained data transfer for GPUs. Davoud Anoushe Jamshidi, Mehrzad Samadi, Scott A. Mahlke
2014	Data remapping for an energy efficient burst chop in DRAM memory systems. Sudharsan Jagathrakshakan, Venkata Kalyan Tavva, Madhu Mutyam
2014	Data-reuse optimizations for pipelined tiling with parametric tile sizes. Alexandre Isoard
2014	DeSTM: harnessing determinism in STMs for application development. Kaushik Ravichandran, Ada Gavrilovska, Santosh Pande
2014	Design for scalability in enterprise SSDs. Arash Tavakkol, Mohammad Arjomand, Hamid Sarbazi-Azad
2014	Design of a hybrid MPI-CUDA benchmark suite for CPU-GPU clusters. Tejaswi Agarwal, Michela Becchi
2014	Domain-specific models for innovation in analytics. Bob Blainey
2014	EFetch: optimizing instruction fetch for event-driven web applications. Gaurav Chadha, Scott A. Mahlke, Satish Narayanasamy
2014	From petascale to the pocket: adaptively scaling parallel programs for mobile SoCs. Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger
2014	Graph-based performance accounting for chip multiprocessor memory systems. Magnus Jahre
2014	Heterogeneous microarchitectures trump voltage scaling for low-power cores. Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Ronald G. Dreslinski, Thomas F. Wenisch, Scott A. Mahlke
2014	ILP and TLP in shared memory applications: a limit study. Ehsan Fatehi, Paul Gratz
2014	Improving performance of streaming applications with filtering and control messages. Peng Li, Jeremy Buhler
2014	International Conference on Parallel Architectures and Compilation, PACT '14, Edmonton, AB, Canada, August 24-27, 2014. José Nelson Amaral, Josep Torrellas
2014	Internet of mobile things: challenges and opportunities. Klara Nahrstedt
2014	Invyswell: a hybrid transactional memory for Haswell's restricted transactional memory. Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy
2014	KLA: a new algorithmic paradigm for parallel graph computations. Harshvardhan, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger
2014	LCA: a memory link and cache-aware co-scheduling approach for CMPs. Alexandros-Herodotos Haritatos, Georgios I. Goumas, Nikos Anastopoulos, Konstantinos Nikas, Kornilios Kourtis, Nectarios Koziris
2014	Locality-aware memory association for multi-target worksharing in OpenMP. Thomas R. W. Scogland, Wu-chun Feng
2014	Measuring flexibility in single-ISA heterogeneous processors. Erik Tomusk, Christophe Dubach, Michael F. P. O'Boyle
2014	Memory scheduling towards high-throughput cooperative heterogeneous computing. Hao Wang, Ripudaman Singh, Michael J. Schulte, Nam Sung Kim
2014	OpenTuner: an extensible framework for program autotuning. Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Jonathan Ragan-Kelley, Jeffrey Bosboom, Una-May O'Reilly, Saman P. Amarasinghe
2014	Optimizing stencil code via locality of computation. Yulong Luo, Guangming Tan
2014	PATS: pattern aware scheduling and power gating for GPGPUs. Qiumin Xu, Murali Annavaram
2014	PEMOGEN: automatic adaptive performance modeling during program runtime. Arnamoy Bhattacharyya, Torsten Hoefler
2014	Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels. Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil
2014	Processing big data graphs on memory-restricted systems. Harshvardhan, Nancy M. Amato, Lawrence Rauchwerger
2014	Protection and utilization in shared cache through rationing. Raj Parihar, Jacob Brock, Chen Ding, Michael C. Huang
2014	RCS: runtime resource and core scaling for power-constrained multi-core processors. Hamid Reza Ghasemi, Nam Sung Kim
2014	Realm: an event-based low-level runtime for distributed memory architectures. Sean Treichler, Michael Bauer, Alex Aiken
2014	Rollback-free value prediction with approximate loads. Bradley Thwaites, Gennady Pekhimenko, Hadi Esmaeilzadeh, Amir Yazdanbakhsh, Onur Mutlu, Jongse Park, Girish Mururu, Todd C. Mowry
2014	SM-centric transformation: circumventing hardware restrictions for flexible GPU scheduling. Bo Wu, Guoyang Chen, Dong Li, Xipeng Shen, Jeffrey S. Vetter
2014	SQRL: hardware accelerator for collecting software data structures. Snehasish Kumar, Arrvindh Shriraman, Vijayalakshmi Srinivasan, Dan Lin, Jordon Phillips
2014	Shuffling: a framework for lock contention aware thread scheduling for multicore multiprocessor systems. Kishore Kumar Pusukuri, Rajiv Gupta, Laxmi N. Bhuyan
2014	SpongeDirectory: flexible sparse directories utilizing multi-level memristors. Lunkai Zhang, Dmitri B. Strukov, Hebatallah Saadeldeen, Dongrui Fan, Mingzhe Zhang, Diana Franklin
2014	Stratified sampling for even workload partitioning. Jeeva Paudel, José Nelson Amaral
2014	Tiling and optimizing time-iterated computations on periodic domains. Uday Bondhugula, Vinayaka Bandishti, Albert Cohen, Guillain Potron, Nicolas Vasilache
2014	Trading cache hit rate for memory performance. Wei Ding, Mahmut T. Kandemir, Diana R. Guttman, Adwait Jog, Chita R. Das, Praveen Yedlapalli
2014	Using STT-RAM to enable energy-efficient near-threshold chip multiprocessors. Xiang Pan, Radu Teodorescu
2014	VAST: the illusion of a large memory space for GPUs. Janghaeng Lee, Mehrzad Samadi, Scott A. Mahlke
2014	Velociraptor: an embedded compiler toolkit for numerical programs targeting CPUs and GPUs. Rahul Garg, Laurie J. Hendren
2014	Versatile and scalable parallel histogram construction. Wookeun Jung, Jongsoo Park, Jaejin Lee
2014	Virtues and limitations of commodity hardware transactional memory. Nuno Diegues, Paolo Romano, Luís E. T. Rodrigues
2014	Warp-aware trace scheduling for GPUs. James A. Jablin, Thomas B. Jablin, Onur Mutlu, Maurice Herlihy
2014	What is the cost of weak determinism? Cedomir Segulja, Tarek S. Abdelrahman
2014	XStream: cross-core spatial streaming based MLC prefetchers for parallel applications in CMPs. Biswabandan Panda, Shankar Balachandran
2014	kMAF: automatic kernel-level management of thread and data affinity. Matthias Diener, Eduardo Henrique Molina da Cruz, Philippe Olivier Alexandre Navaux, Anselm Busse, Hans-Ulrich Heiß