PACT - RankMe

80 papers

Year	Title / Authors
2012	A low-overhead dynamic optimization framework for multicores. Christopher W. Fletcher, Rachael Harding, Omer Khan, Srinivas Devadas
2012	A software memory partition approach for eliminating bank-level interference in multicore systems. Lei Liu, Zehan Cui, Mingjie Xing, Yungang Bao, Mingyu Chen, Chengyong Wu
2012	A yoke of oxen and a thousand chickens for heavy lifting graph processing. Abdullah Gharaibeh, Lauro Beltrão Costa, Elizeu Santos-Neto, Matei Ripeanu
2012	APCR: an adaptive physical channel regulator for on-chip interconnects. Lei Wang, Poornachandran Kumar, Ki Hwan Yum, Eun Jung Kim
2012	Acceleration of bulk memory operations in a heterogeneous multicore architecture. Jong-Hyuk Lee, Ziyi Liu, Xiaonan Tian, Dong Hyuk Woo, Weidong Shi, Dainis Boumber, Yonghong Yan, Kyeong-An Kwon
2012	Application-aware prefetch prioritization in on-chip networks. Nachiappan Chidambaram Nachiappan, Asit K. Mishra, Mahmut T. Kandemir, Anand Sivasubramaniam, Onur Mutlu, Chita R. Das
2012	Application-to-core mapping policies to reduce memory interference in multi-core systems. Reetuparna Das, Rachata Ausavarungnirun, Onur Mutlu, Akhilesh Kumar, Mani Azimi
2012	Auto-parallelizing stateful distributed streaming applications. Scott Schneider, Martin Hirzel, Bugra Gedik, Kun-Lung Wu
2012	Bandwidth bandit: quantitative characterization of memory contention. David Eklov, Nikos Nikoleris, David Black-Schaffer, Erik Hagersten
2012	Base-delta-immediate compression: practical data compression for on-chip caches. Gennady Pekhimenko, Vivek Seshadri, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry
2012	Boost.SIMD: generic programming for portable SIMDization. Pierre Estérie, Mathias Gaunard, Joel Falcou, Jean-Thierry Lapresté, Brigitte Rozoy
2012	Branch and data herding: reducing control and memory divergence for error-tolerant GPU applications. John Sartori, Rakesh Kumar
2012	Chrysalis analysis: incorporating synchronization arcs in dataflow-analysis-based parallel monitoring. Michelle L. Goodstein, Shimin Chen, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry
2012	Coalition threading: combining traditional and non-traditional parallelism to maximize scalability. Md. Kamruzzaman, Steven Swanson, Dean M. Tullsen
2012	Compiling to avoid communication. Kathy Yelick
2012	Complexity-effective multicore coherence. Alberto Ros, Stefanos Kaxiras
2012	Database analytics acceleration using FPGAs. Bharat Sukhwani, Hong Min, Mathew Thoennes, Parijat Dube, Balakrishna Iyer, Bernard Brezzo, Donna Dillenberger, Sameh W. Asaad
2012	Design of a storage processing unit. Peng Li, Kevin Gomez, David J. Lilja
2012	Efficient techniques for predicting cache sharing and throughput. Andreas Sandberg, David Black-Schaffer, Erik Hagersten
2012	Energy-efficient cache partitioning for future CMPs. Karthik T. Sundararajan, Timothy M. Jones, Nigel P. Topham
2012	Energy-efficient workload mapping in heterogeneous systems with multiple types of resources. Cong Liu
2012	Enhancing performance optimization of multicore chips and multichip nodes with data structure metrics. Ashay Rane, James C. Browne
2012	Evaluation of Blue Gene/Q hardware support for transactional memories. Amy Wang, Matthew Gaudet, Peng Wu, José Nelson Amaral, Martin Ohmacht, Christopher Barton, Raúl Silvera, Maged M. Michael
2012	Fast and efficient automatic memory management for GPUs using compiler-assisted runtime coherence scheme. Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil
2012	Fine-grained parallel traversals of irregular data structures. Bin Ren, Gagan Agrawal, James R. Larus, Todd Mytkowicz, Tomi Poutanen, Wolfram Schulte
2012	HaLock: hardware-assisted lock contention detection in multithreaded applications. Yongbing Huang, Zehan Cui, Licheng Chen, Wenli Zhang, Yungang Bao, Mingyu Chen
2012	Hardware acceleration in the IBM PowerEN processor: architecture and performance. Anil Krishna, Timothy Heil, Nicholas Lindberg, Farnaz Toussi, Steven VanderWiel
2012	Hardware prefetchers for emerging parallel applications. Biswabandan Panda, Shankar Balachandran
2012	High-performance analysis of filtered semantic graphs. Aydin Buluç, Armando Fox, John R. Gilbert, Shoaib Kamil, Adam Lugowski, Leonid Oliker, Samuel Williams
2012	Inference and declaration of independence: impact on deterministic task parallelism. Foivos S. Zakkak, Dimitrios Chasapis, Polyvios Pratikakis, Angelos Bilas, Dimitrios S. Nikolopoulos
2012	Integrating nanophotonics in GPU microarchitecture. Nilanjan Goswami, Zhongqi Li, Ajit Verma, Ramkumar Shankar, Tao Li
2012	International Conference on Parallel Architectures and Compilation Techniques, PACT '12, Minneapolis, MN, USA - September 19 - 23, 2012. Pen-Chung Yew, Sangyeun Cho, Luiz DeRose, David J. Lilja
2012	Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches. Mainak Chaudhuri, Jayesh Gaur, Nithiyanandan Bashyam, Sreenivas Subramoney, Joseph Nuzman
2012	Layout-oblivious optimization for matrix computations. Huimin Cui, Qing Yi, Jingling Xue, Xiaobing Feng
2012	Linearly compressed pages: a main memory compression framework with low complexity and low latency. Gennady Pekhimenko, Todd C. Mowry, Onur Mutlu
2012	Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads. Vijay Sathish, Michael J. Schulte, Nam Sung Kim
2012	LumiNOC: a power-efficient, high-performance, photonic network-on-chip for future parallel architectures. Cheng Li, Mark Browning, Paul V. Gratz, Samuel Palermo
2012	MaSiF: machine learning guided auto-tuning of parallel skeletons. Alexander Collins, Christian Fensch, Hugh Leather
2012	Making data prefetch smarter: adaptive prefetching on POWER7. Víctor Jiménez, Roberto Gioiosa, Francisco J. Cazorla, Alper Buyuktosunoglu, Pradip Bose, Francis P. O'Connell
2012	Making it practical and effective: fast and precise may-happen-in-parallel analysis. Congming Chen, Wei Huo, Xiaobing Feng
2012	Many-thread aware instruction-level parallelism: architecting shader cores for GPU computing. Ping Xiang, Yi Yang, Mike Mantor, Norm Rubin, Huiyang Zhou
2012	Mileage-based contention management in transactional memory. Woojin Choi, Lihang Zhao, Jeff Draper
2012	Multi2Sim: a simulation framework for CPU-GPU computing. Rafael Ubal, Byunghyun Jang, Perhaad Mistry, Dana Schaa, David R. Kaeli
2012	Off-chip access localization for NoC-based multicores. Wei Ding, Mahmut T. Kandemir, Yuanrui Zhang, Emre Kultursay
2012	Optimal bypass monitor for high performance last-level caches. Lingda Li, Dong Tong, Zichao Xie, Junlin Lu, Xu Cheng
2012	Optimizing datacenter power with memory system levers for guaranteed quality-of-service. Kshitij Sudan, Sadagopan Srinivasan, Rajeev Balasubramonian, Ravi R. Iyer
2012	PEPON: performance-aware hierarchical power budgeting for NoC based multicores. Akbar Sharifi, Asit K. Mishra, Shekhar Srikantaiah, Mahmut T. Kandemir, Chita R. Das
2012	PGCapping: exploiting power gating for power capping and core lifetime balancing in CMPs. Kai Ma, Xiaorui Wang
2012	PS-Dir: a scalable two-level directory cache. Joan J. Valls, Alberto Ros, Julio Sahuquillo, María Engracia Gómez, José Duato
2012	Phase-based scheduling and thread migration for heterogeneous multicore processors. Lina Sawalha, Ronald D. Barnes
2012	Pointy: a hybrid pointer prefetcher for managed runtime systems. Ioana Burcea, Livio Soares, Andreas Moshovos
2012	Power-aware multi-core simulation for early design stage hardware/software co-optimization. Wim Heirman, Souradip Sarkar, Trevor E. Carlson, Ibrahim Hur, Lieven Eeckhout
2012	Power-efficient computing for compute-intensive GPGPU applications. Syed Zohaib Gilani, Nam Sung Kim, Michael J. Schulte
2012	Power-efficient time-sensitive mapping in heterogeneous systems. Cong Liu, Jian Li, Wei Huang, Juan Rubio, Evan Speight, Xiaozhu Lin
2012	Practically private: enabling high performance CMPs through compiler-assisted data classification. Yong Li, Rami G. Melhem, Alex K. Jones
2012	Probabilistic diagnosis of performance faults in large-scale parallel applications. Ignacio Laguna, Dong H. Ahn, Bronis R. de Supinski, Saurabh Bagchi, Todd Gamblin
2012	RISE: improving the streaming processors reliability against soft errors in GPGPUs. Jingweijia Tan, Xin Fu
2012	ReCaP: a region-based cure for the common cold cache. Jason Zebchuk, Harold W. Cain, Vijayalakshmi Srinivasan, Andreas Moshovos
2012	Riposte: a trace-driven compiler and parallel VM for vector code in R. Justin Talbot, Zachary DeVito, Pat Hanrahan
2012	Runtime detection and optimization of collective communication patterns. Torsten Hoefler, Timo Schneider
2012	Sandboxing transactional memory. Luke Dalessandro, Michael L. Scott
2012	Scalability-based manycore partitioning. Hiroshi Sasaki, Teruo Tanimoto, Koji Inoue, Hiroshi Nakamura
2012	Shared memory multiplexing: a novel way to improve GPGPU throughput. Yi Yang, Ping Xiang, Mike Mantor, Norm Rubin, Huiyang Zhou
2012	SkipCache: miss-rate aware cache management. Kanakagiri Raghavendra, Tripti S. Warrier, Madhu Mutyam
2012	Speculative dynamic vectorization for HW/SW co-designed processors. Rakesh Kumar, Alejandro Martínez, Antonio González
2012	Speculative parallelization needs rigor: probabilistic analysis for optimal speculation of finite-state machine applications. Zhijia Zhao, Bo Wu, Xipeng Shen
2012	Strategies based on green policies to the grid resource allocation. Fábio Coutinho, Luís Alfredo V. de Carvalho
2012	Supporting stateful tasks in a dataflow graph. Vladimir Gajinov, Srdjan Stipic, Osman S. Unsal, Tim Harris, Eduard Ayguadé, Adrián Cristal
2012	System-level power-performance efficiency modeling for emergent GPU architectures. Shuaiwen Song, Kirk W. Cameron
2012	TMNOC: a case of HTM and NoC co-design for increased energy efficiency and concurrency. Lihang Zhao, Woojin Choi, Jeffrey T. Draper
2012	The changing role of supercomputing. Peter J. Ungaro
2012	The evicted-address filter: a unified mechanism to address both cache pollution and thrashing. Vivek Seshadri, Onur Mutlu, Michael A. Kozuch, Todd C. Mowry
2012	Top500 versus sustained performance: the top problems with the Top500 list - and what to do about them. William T. C. Kramer
2012	Transactional event profiling in a best-effort hardware transactional memory system. Matthew Gaudet, José Nelson Amaral
2012	Transactional prefetching: narrowing the window of contention in hardware transactional memory. Anurag Negi, Adrià Armejach, Adrián Cristal, Osman S. Unsal, Per Stenström
2012	Transparent runtime deadlock elimination. Hari K. Pyla, Srinidhi Varadarajan
2012	Using combined profiling to decide when thread level speculation is profitable. Arnamoy Bhattacharyya
2012	Visualizing transactional memory. Justin Emile Gottschlich, Maurice Herlihy, Gilles Pokam, Jeremy G. Siek
2012	Workload and power budget partitioning for single-chip heterogeneous processors. Hao Wang, Vijay Sathish, Ripudaman Singh, Michael J. Schulte, Nam Sung Kim
2012	XPoint cache: scaling existing bus-based coherence protocols for 2D and 3D many-core systems. Ronald G. Dreslinski, Thomas Manville, Korey Sewell, Reetuparna Das, Nathaniel Ross Pinckney, Sudhir Satpathy, David T. Blaauw, Dennis Sylvester, Trevor N. Mudge