PACT B

80 papers

Year Title / Authors
2012A low-overhead dynamic optimization framework for multicores.
Christopher W. Fletcher, Rachael Harding, Omer Khan, Srinivas Devadas
2012A software memory partition approach for eliminating bank-level interference in multicore systems.
Lei Liu, Zehan Cui, Mingjie Xing, Yungang Bao, Mingyu Chen, Chengyong Wu
2012A yoke of oxen and a thousand chickens for heavy lifting graph processing.
Abdullah Gharaibeh, Lauro Beltrão Costa, Elizeu Santos-Neto, Matei Ripeanu
2012APCR: an adaptive physical channel regulator for on-chip interconnects.
Lei Wang, Poornachandran Kumar, Ki Hwan Yum, Eun Jung Kim
2012Acceleration of bulk memory operations in a heterogeneous multicore architecture.
Jong-Hyuk Lee, Ziyi Liu, Xiaonan Tian, Dong Hyuk Woo, Weidong Shi, Dainis Boumber, Yonghong Yan, Kyeong-An Kwon
2012Application-aware prefetch prioritization in on-chip networks.
Nachiappan Chidambaram Nachiappan, Asit K. Mishra, Mahmut T. Kandemir, Anand Sivasubramaniam, Onur Mutlu, Chita R. Das
2012Application-to-core mapping policies to reduce memory interference in multi-core systems.
Reetuparna Das, Rachata Ausavarungnirun, Onur Mutlu, Akhilesh Kumar, Mani Azimi
2012Auto-parallelizing stateful distributed streaming applications.
Scott Schneider, Martin Hirzel, Bugra Gedik, Kun-Lung Wu
2012Bandwidth bandit: quantitative characterization of memory contention.
David Eklov, Nikos Nikoleris, David Black-Schaffer, Erik Hagersten
2012Base-delta-immediate compression: practical data compression for on-chip caches.
Gennady Pekhimenko, Vivek Seshadri, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry
2012Boost.SIMD: generic programming for portable SIMDization.
Pierre Estérie, Mathias Gaunard, Joel Falcou, Jean-Thierry Lapresté, Brigitte Rozoy
2012Branch and data herding: reducing control and memory divergence for error-tolerant GPU applications.
John Sartori, Rakesh Kumar
2012Chrysalis analysis: incorporating synchronization arcs in dataflow-analysis-based parallel monitoring.
Michelle L. Goodstein, Shimin Chen, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry
2012Coalition threading: combining traditional and non-traditional parallelism to maximize scalability.
Md. Kamruzzaman, Steven Swanson, Dean M. Tullsen
2012Compiling to avoid communication.
Kathy Yelick
2012Complexity-effective multicore coherence.
Alberto Ros, Stefanos Kaxiras
2012Database analytics acceleration using FPGAs.
Bharat Sukhwani, Hong Min, Mathew Thoennes, Parijat Dube, Balakrishna Iyer, Bernard Brezzo, Donna Dillenberger, Sameh W. Asaad
2012Design of a storage processing unit.
Peng Li, Kevin Gomez, David J. Lilja
2012Efficient techniques for predicting cache sharing and throughput.
Andreas Sandberg, David Black-Schaffer, Erik Hagersten
2012Energy-efficient cache partitioning for future CMPs.
Karthik T. Sundararajan, Timothy M. Jones, Nigel P. Topham
2012Energy-efficient workload mapping in heterogeneous systems with multiple types of resources.
Cong Liu
2012Enhancing performance optimization of multicore chips and multichip nodes with data structure metrics.
Ashay Rane, James C. Browne
2012Evaluation of Blue Gene/Q hardware support for transactional memories.
Amy Wang, Matthew Gaudet, Peng Wu, José Nelson Amaral, Martin Ohmacht, Christopher Barton, Raúl Silvera, Maged M. Michael
2012Fast and efficient automatic memory management for GPUs using compiler-assisted runtime coherence scheme.
Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil
2012Fine-grained parallel traversals of irregular data structures.
Bin Ren, Gagan Agrawal, James R. Larus, Todd Mytkowicz, Tomi Poutanen, Wolfram Schulte
2012HaLock: hardware-assisted lock contention detection in multithreaded applications.
Yongbing Huang, Zehan Cui, Licheng Chen, Wenli Zhang, Yungang Bao, Mingyu Chen
2012Hardware acceleration in the IBM PowerEN processor: architecture and performance.
Anil Krishna, Timothy Heil, Nicholas Lindberg, Farnaz Toussi, Steven VanderWiel
2012Hardware prefetchers for emerging parallel applications.
Biswabandan Panda, Shankar Balachandran
2012High-performance analysis of filtered semantic graphs.
Aydin Buluç, Armando Fox, John R. Gilbert, Shoaib Kamil, Adam Lugowski, Leonid Oliker, Samuel Williams
2012Inference and declaration of independence: impact on deterministic task parallelism.
Foivos S. Zakkak, Dimitrios Chasapis, Polyvios Pratikakis, Angelos Bilas, Dimitrios S. Nikolopoulos
2012Integrating nanophotonics in GPU microarchitecture.
Nilanjan Goswami, Zhongqi Li, Ajit Verma, Ramkumar Shankar, Tao Li
2012International Conference on Parallel Architectures and Compilation Techniques, PACT '12, Minneapolis, MN, USA - September 19 - 23, 2012
Pen-Chung Yew, Sangyeun Cho, Luiz DeRose, David J. Lilja
2012Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches.
Mainak Chaudhuri, Jayesh Gaur, Nithiyanandan Bashyam, Sreenivas Subramoney, Joseph Nuzman
2012Layout-oblivious optimization for matrix computations.
Huimin Cui, Qing Yi, Jingling Xue, Xiaobing Feng
2012Linearly compressed pages: a main memory compression framework with low complexity and low latency.
Gennady Pekhimenko, Todd C. Mowry, Onur Mutlu
2012Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads.
Vijay Sathish, Michael J. Schulte, Nam Sung Kim
2012LumiNOC: a power-efficient, high-performance, photonic network-on-chip for future parallel architectures.
Cheng Li, Mark Browning, Paul V. Gratz, Samuel Palermo
2012MaSiF: machine learning guided auto-tuning of parallel skeletons.
Alexander Collins, Christian Fensch, Hugh Leather
2012Making data prefetch smarter: adaptive prefetching on POWER7.
Víctor Jiménez, Roberto Gioiosa, Francisco J. Cazorla, Alper Buyuktosunoglu, Pradip Bose, Francis P. O'Connell
2012Making it practical and effective: fast and precise may-happen-in-parallel analysis.
Congming Chen, Wei Huo, Xiaobing Feng
2012Many-thread aware instruction-level parallelism: architecting shader cores for GPU computing.
Ping Xiang, Yi Yang, Mike Mantor, Norm Rubin, Huiyang Zhou
2012Mileage-based contention management in transactional memory.
Woojin Choi, Lihang Zhao, Jeff Draper
2012Multi2Sim: a simulation framework for CPU-GPU computing.
Rafael Ubal, Byunghyun Jang, Perhaad Mistry, Dana Schaa, David R. Kaeli
2012Off-chip access localization for NoC-based multicores.
Wei Ding, Mahmut T. Kandemir, Yuanrui Zhang, Emre Kultursay
2012Optimal bypass monitor for high performance last-level caches.
Lingda Li, Dong Tong, Zichao Xie, Junlin Lu, Xu Cheng
2012Optimizing datacenter power with memory system levers for guaranteed quality-of-service.
Kshitij Sudan, Sadagopan Srinivasan, Rajeev Balasubramonian, Ravi R. Iyer
2012PEPON: performance-aware hierarchical power budgeting for NoC based multicores.
Akbar Sharifi, Asit K. Mishra, Shekhar Srikantaiah, Mahmut T. Kandemir, Chita R. Das
2012PGCapping: exploiting power gating for power capping and core lifetime balancing in CMPs.
Kai Ma, Xiaorui Wang
2012PS-Dir: a scalable two-level directory cache.
Joan J. Valls, Alberto Ros, Julio Sahuquillo, María Engracia Gómez, José Duato
2012Phase-based scheduling and thread migration for heterogeneous multicore processors.
Lina Sawalha, Ronald D. Barnes
2012Pointy: a hybrid pointer prefetcher for managed runtime systems.
Ioana Burcea, Livio Soares, Andreas Moshovos
2012Power-aware multi-core simulation for early design stage hardware/software co-optimization.
Wim Heirman, Souradip Sarkar, Trevor E. Carlson, Ibrahim Hur, Lieven Eeckhout
2012Power-efficient computing for compute-intensive GPGPU applications.
Syed Zohaib Gilani, Nam Sung Kim, Michael J. Schulte
2012Power-efficient time-sensitive mapping in heterogeneous systems.
Cong Liu, Jian Li, Wei Huang, Juan Rubio, Evan Speight, Xiaozhu Lin
2012Practically private: enabling high performance CMPs through compiler-assisted data classification.
Yong Li, Rami G. Melhem, Alex K. Jones
2012Probabilistic diagnosis of performance faults in large-scale parallel applications.
Ignacio Laguna, Dong H. Ahn, Bronis R. de Supinski, Saurabh Bagchi, Todd Gamblin
2012RISE: improving the streaming processors reliability against soft errors in GPGPUs.
Jingweijia Tan, Xin Fu
2012ReCaP: a region-based cure for the common cold cache.
Jason Zebchuk, Harold W. Cain, Vijayalakshmi Srinivasan, Andreas Moshovos
2012Riposte: a trace-driven compiler and parallel VM for vector code in R.
Justin Talbot, Zachary DeVito, Pat Hanrahan
2012Runtime detection and optimization of collective communication patterns.
Torsten Hoefler, Timo Schneider
2012Sandboxing transactional memory.
Luke Dalessandro, Michael L. Scott
2012Scalability-based manycore partitioning.
Hiroshi Sasaki, Teruo Tanimoto, Koji Inoue, Hiroshi Nakamura
2012Shared memory multiplexing: a novel way to improve GPGPU throughput.
Yi Yang, Ping Xiang, Mike Mantor, Norm Rubin, Huiyang Zhou
2012SkipCache: miss-rate aware cache management.
Kanakagiri Raghavendra, Tripti S. Warrier, Madhu Mutyam
2012Speculative dynamic vectorization for HW/SW co-designed processors.
Rakesh Kumar, Alejandro Martínez, Antonio González
2012Speculative parallelization needs rigor: probabilistic analysis for optimal speculation of finite-state machine applications.
Zhijia Zhao, Bo Wu, Xipeng Shen
2012Strategies based on green policies to the grid resource allocation.
Fábio Coutinho, Luís Alfredo V. de Carvalho
2012Supporting stateful tasks in a dataflow graph.
Vladimir Gajinov, Srdjan Stipic, Osman S. Unsal, Tim Harris, Eduard Ayguadé, Adrián Cristal
2012System-level power-performance efficiency modeling for emergent GPU architectures.
Shuaiwen Song, Kirk W. Cameron
2012TMNOC: a case of HTM and NoC co-design for increased energy efficiency and concurrency.
Lihang Zhao, Woojin Choi, Jeffrey T. Draper
2012The changing role of supercomputing.
Peter J. Ungaro
2012The evicted-address filter: a unified mechanism to address both cache pollution and thrashing.
Vivek Seshadri, Onur Mutlu, Michael A. Kozuch, Todd C. Mowry
2012Top500 versus sustained performance: the top problems with the top500 list - and what to do about them.
William T. C. Kramer
2012Transactional event profiling in a best-effort hardware transactional memory system.
Matthew Gaudet, José Nelson Amaral
2012Transactional prefetching: narrowing the window of contention in hardware transactional memory.
Anurag Negi, Adrià Armejach, Adrián Cristal, Osman S. Unsal, Per Stenström
2012Transparent runtime deadlock elimination.
Hari K. Pyla, Srinidhi Varadarajan
2012Using combined profiling to decide when thread level speculation is profitable.
Arnamoy Bhattacharyya
2012Visualizing transactional memory.
Justin Emile Gottschlich, Maurice Herlihy, Gilles Pokam, Jeremy G. Siek
2012Workload and power budget partitioning for single-chip heterogeneous processors.
Hao Wang, Vijay Sathish, Ripudaman Singh, Michael J. Schulte, Nam Sung Kim
2012XPoint cache: scaling existing bus-based coherence protocols for 2D and 3D many-core systems.
Ronald G. Dreslinski, Thomas Manville, Korey Sewell, Reetuparna Das, Nathaniel Ross Pinckney, Sudhir Satpathy, David T. Blaauw, Dennis Sylvester, Trevor N. Mudge