# 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2024)

5–7 May 2024, Indianapolis, Indiana, USA



**IEEE Catalog Number:** CFP24PER-POD

**ISBN:** 979-8-3503-7639-5

## Copyright © 2024 by the Institute of Electrical and Electronics Engineers, Inc. All Rights Reserved

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

For other copying, reprint or republication permission, write to IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854. All rights reserved.

\*\*\* This is a print representation of what appears in the IEEE Digital Library. Some format issues inherent in the e-media version may also appear in this print version.

IEEE Catalog Number: CFP24PER-POD

ISBN (Print-On-Demand): 979-8-3503-7639-5

ISBN (Online): 979-8-3503-7638-8

ISSN: 2994-9513

#### Additional Copies of This Publication Are Available From:

Curran Associates, Inc., 57 Morehouse Lane, Red Hook, NY 12571, USA

Phone: (845) 758-0400 Fax: (845) 758-2633

E-mail: curran@proceedings.com Web: www.proceedings.com



#### **Table of Contents**

| Title | Page |
|---|---|
| Message from the General Chairs | xi |
| Message from the Program Chairs | xii |
| Organizing Committee | xiv |
| Program Committee | |
| Steering Committee | |
| Sponsors | xviii |

#### **Best Papers**

| Title | Authors | Page |
|---|---|---|
| Aiding Microprocessor Performance Validation with Machine Learning | Erick Carvajal Barboza (Universidad de Costa Rica, Costa Rica), Mahesh Ketkar (Intel Corporation, USA), Paul Gratz (Texas A&M University, USA), and Jiang Hu (Texas A&M University, USA) | 1 |
| CiMLoop: A Flexible, Accurate, and Fast Compute-In-Memory Modeling Tool | Tanner Andrulis (Massachusetts Institute of Technology, USA), Joel S. Emer (Massachusetts Institute of Technology, Nvidia, USA), and Vivienne Sze (Massachusetts Institute of Technology, USA) | 10 |
| | Mohammadreza Rezvani (University of California Riverside, USA), Ali Jahanshahi (University of California Riverside, USA), and Daniel Wong (University of California Riverside, USA) | 24 |
| BTBench: A Benchmark for Comprehensive Binary Translation Performance Evaluation | | |

#### **Performance Modeling & Analysis**

| Title | Authors | Page |
|---|---|---|
| MuchiSim: A Simulation Framework for Design Exploration of Multi-Chip Manycore Systems | Marcelo Orenes-Vera (Princeton University), Esin Tureci (Princeton University), Margaret Martonosi (Princeton University), and David Wentzlaff (Princeton University) | 48 |
| CiFlow: Dataflow Analysis and Optimization of Key Switching for Homomorphic Encryption | Negar Neda (New York University, USA), Austin Ebel (New York University, USA), Benedict Reynwar (Information Sciences Institute, University of Southern California, USA), and Brandon Reagen (New York University, USA) | 61 |
| Workload Characterization of Commercial Mobile Benchmark Suites | | |
| RTune: Towards Automated and Coordinated Optimization of Computing and Computational Objectives of Parallel Iterative Applications | | |

#### **Analysis of HW Systems**

| Title | Authors | Page |
|---|---|---|
| Characterizing Soft-Error Resiliency in Arm's Ethos-U55 Embedded Machine Learning Accelerator | | |
| SAP: Silicon Authentication Platform for System-on-Chip Supply Chain Vulnerabilities | | |
| SimPoint-Based Microarchitectural Hotspot & Energy-Efficiency Analysis of RISC-V OoO CPUs | Odysseas Chatzopoulos (University of Athens, Greece), Maria Trakosa (University of Athens, Greece), George Papadimitriou (University of Athens, Greece), Wing Shek Wong (Intel, Austin, Texas), and Dimitris Gizopoulos (University of Athens, Greece) | 120 |
| On the Rise of AMD Matrix Cores: Performance, Power Efficiency, and Programmability | | |

#### **Simulation**

| Title | Authors | Page |
|---|---|---|
| DNA Storage Toolkit: A Modular End-to-End DNA Data Storage Codec and Simulator | | |
| Zatel: Sample Complexity-Aware Scale-Model Simulation for Ray Tracing | | |
| BZSim: Fast, Large-Scale Microarchitectural Simulation with Detailed Interconnect Modeling | Panagiotis Strikos (Chalmers University of Technology, Sweden), Ahsen Ejaz (Chalmers University of Technology, Sweden), and Ioannis Sourdis (Chalmers University of Technology, Sweden) | 167 |
| Userspace Networking in gem5 | | |

#### **System-Level Optimization**

| Title | Authors | Page |
|---|---|---|
| Vision Transformer Computation and Resilience for Dynamic Inference | | |
| LIBRA: Enabling Workload-Aware Multi-Dimensional Network Topology Optimization for Distributed Training of Large AI Models | | |
| SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems | Kailash Gogineni (George Washington University, USA), Sai Santosh Dayapule (George Washington University, USA), Juan Gómez-Luna (ETH Zürich, Switzerland), Karthikeya Gogineni (Independent), Peng Wei (George Washington University, USA), Tian Lan (George Washington University, USA), Mohammad Sadrosadati (ETH Zürich, Switzerland), Onur Mutlu (ETH Zürich, Switzerland), and Guru Venkataramani (George Washington University, USA) | |
| Forward to the Past: An Alternative to Hybrid CPU Design | Sanyam Mehta (HPE, USA) and Anna Yue (University of Minnesota, USA) | |

#### **AI & LLM Models & Analysis**

| Title | Authors | Page |
|---|---|---|
| Bandwidth Characterization of DeepSpeed on Distributed Large Language Model Training | | |
| Generative AI Beyond LLMs: System Implications of Multi-Modal Generation | | |
| Towards Cognitive AI Systems: Workload and Characterization of Neuro-Symbolic AI | | |
| Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's LLM with Open Source SLMs in Production | | |

#### **Posters**

| Title | Authors | Page |
|---|---|---|
| Leveraging Memory Expansion to Accelerate Large-Scale DL Training | | |
| APGPM: Automated PMC-Based Power Modeling Methodology for Modern Mobile GPUs | Pranab Dash (Purdue University), Y. Charlie Hu (Purdue University), and Abhilash Jindal (IIT Delhi) | 295 |
| gem5-Based Evaluation of CVA6 SoC: Insights into the Architectural Design | | |
| Accel-Bench: Exploring the Potential of Programming using Hardware-Accelerated Functions | Abenezer Wudenhe (University of California, USA), Yu-Chia Liu (University of California, USA), Chris Chen (University of California, USA), and Hung-Wei Tseng (University of California, USA) | 302 |
| SEFsim: A Statistically-Guided Fast DRAM Simulator | | |
| Architecture-Level Modeling of Photonic Deep Neural Network Accelerators | | |
| Automatic Extraction of Network Configurations for Realistic Simulation and Validation | | |
| MindPalace: A Framework for Studying Microarchitecture Design of Function-as-a-Service | | |
| Infrastructure for Exploring SIMT Architecture in General-Purpose Processors | | |
| Distributed Training of Neural Radiance Fields: A Performance Characterization | | 319 |
| Bottleneck Scenarios in use of the Conveyors Message Aggregation Library | | 322 |
| A Profiling-Based Benchmark Suite for Warehouse-Scale Computers | | 325 |
| Characterizing Dynamic Memory Behavior in WebAssembly Workloads | | 328 |
| Probing Weaknesses in GPU Reliability Assessment: A Cross-Layer Approach | Lishan Yang (George Mason University, USA), George Papadimitriou (University of Athens, Greece), Dimitrios Sartzetakis (University of Athens, Greece), Adwait Jog (University of Virginia, USA), Evgenia Smirni (William & Mary, USA), and Dimitris Gizopoulos (University of Athens, Greece) | 331 |

| Title | Page |
|---|---|
| Author Index | 335 |