## 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC 2022)

Virtual Conference 17 – 20 January 2022



IEEE Catalog Number: CFP22ASP-POD ISBN: 978-1-6654-2136-2

## Copyright © 2022 by the Institute of Electrical and Electronics Engineers, Inc. All Rights Reserved

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

For other copying, reprint or republication permission, write to IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854. All rights reserved.

\*\*\* This is a print representation of what appears in the IEEE Digital Library. Some format issues inherent in the e-media version may also appear in this print version.

IEEE Catalog Number: CFP22ASP-POD

ISBN (Print-On-Demand): 978-1-6654-2136-2

ISBN (Online): 978-1-6654-2135-5

ISSN: 2153-6961

## Additional Copies of This Publication Are Available From:

Curran Associates, Inc., 57 Morehouse Lane, Red Hook, NY 12571 USA

Phone: (845) 758-0400

Fax: (845) 758-2633

E-mail: curran@proceedings.com

Web: www.proceedings.com



## **Technical Program**

| Session / Paper | Page |
|-----------------|------|
| **Session 1A: University Design Contest-1** | |
| A 0.5 mm<sup>2</sup> Ambient Light-Driven Solar Cell-Powered Biofuel Cell-Input Biosensing System with LED Driving for Stand-Alone RF-Less Continuous Glucose Monitoring Contact Lens | 1 |
| A 76-81 GHz FMCW 2TX/3RX Radar Transceiver with Integrated Mixed-Mode PLL and Series-Fed Patch Antenna Array | 3 |
| A 5.2GHz RFID Chip Contactlessly Mountable on FPC at Any 90-Degree Rotation and Face Orientation | 5 |
| A 40nm CMOS SoC for Real-Time Dysarthric Voice Conversion of Stroke Patients | 7 |
| A Side-Channel Hardware Trojan in 65nm CMOS with 2μW precision and Multi-bit Leakage Capability | 9 |
| **Session 1B: (SS-1) New Advances towards Building Secure Computer Architectures** | |
| SC-K9: A Self-synchronizing Framework to Counter Micro-architectural Side Channels | 11 |
| CacheGuard: A Behavior Model Checker for Cache Timing Side-Channel Security | 19 |
| Lightweight and Secure Branch Predictors against Spectre Attacks | 25 |
| Computation-in-Memory Accelerators for Secure Graph Database: Opportunities and Challenges | 31 |
| **Session 1C: Research Paradigm in Approximate and Neuromorphic Computing** | |
| HEALM: Hardware-Efficient Approximate Logarithmic Multiplier with Reduced Error | 37 |
| DistriHD: A Memory Efficient Distributed Binary Hyperdimensional Computing Architecture for Image Classification | 43 |
| Thermal-aware Layout Optimization and Mapping Methods for Resistive Neuromorphic Engines | 50 |
| **Session 1D: New Design Techniques for Emerging Challenges in Microfluidic Biochips** | |
| NR-Router: Non-Regular Electrode Routing with Optimal Pin Selection for Electrowetting-on-Dielectric Chips | 56 |
| Design-for-Reliability and Probability-Based Fault Tolerance for Paper-Based Digital Microfluidic Biochips with Multiple Faults | 62 |
| Improving the Robustness of Microfluidic Networks | 68 |
| **Session 1E: Advances in Machine Learning Assisted Analog Circuit Sizing** | |
| An Efficient Kriging-based Constrained Multi-objective Evolutionary Algorithm for Analog Circuit Synthesis via Self-adaptive Incremental Learning | 74 |
| Fast Variation-aware Circuit Sizing Approach for Analog Design with ML-Assisted Evolutionary Algorithm | 80 |
| A Novel and Efficient Bayesian Optimization Approach for Analog Designs with Multi-Testbench | 86 |
| **Session 2A: University Design Contest-2** | |
| A 2.17µW @120fps Ultra-Low-Power Dual-Mode CMOS Image Sensor with Senputing Architecture | 92 |
| A Reconfigurable Inference Processor for Recurrent Neural Networks Based on Programmable Data Format in a Resource-Limited FPGA | |
| Supply-Variation-Tolerant Transimpedance Amplifier Using Non-Inverting Amplifier in 180-nm CMOS | |
| Deformable Chiplet-Based Computer Using Inductively Coupled Wireless Communication | 98 |
| **Session 2B: (SS-2) Analog Circuit and Layout Synthesis: Advancement and Prospect** | |
| AMS Circuit Synthesis Enabled by the Advancements of Circuit Architectures and ML Algorithms | |
| Automating Analog Constraint Extraction: From Heuristics to Learning | |
| Common-Centroid Layout for Active and Passive Devices: A Review and the Road Ahead | 114 |
| **Session 2C: Low-cost and Memory-Efficient Deep Learning** | |
| PUMP: Profiling-free Unified Memory Prefetcher for Large DNN Model Support | 122 |
| RADARS: Memory Efficient Reinforcement Learning Aided Differentiable Neural Architecture Search | |
| A Heuristic Exploration to Retraining-free Weight Sharing for CNN Compression | |
| HiKonv: High Throughput Quantized Convolution With Novel Bit-wise Management and Computation | |
| **Session 2D: High-level Verification and Application** | |
| Mapping Large Scale Finite Element Computing onto Wafer-Scale Engines | 147 |
| Generalizing Tandem Simulation: Connecting High-level and RTL Simulation Models | |
| Automated Detection of Spatial Memory Safety Violations for Constrained Devices | 160 |
| **Session 2E: Design for Manufacturing and Signal Integrity** | |
| Lithography Hotspot Detection via Heterogeneous Federated Learning with Local Adaptation | 166 |
| Voronoi Diagram Based Heterogeneous Circuit Layout Centerline Extraction for Mask Verification | |
| Signal-Integrity-Aware Interposer Bus Routing in 2.5D Heterogeneous Integration | |
| **Session 3B: Analysis and optimization for timing, power, and reliability** | |
| Pre-Routing Path Delay Estimation Based on Transformer and Residual Framework | |
| Fast Electromigration Stress Analysis Considering Spatial Joule Heating Effects | |
| **Session 3C: Advanced Machine Learning with Emerging Technologies** | |
| SONIC: A Sparse Neural Network Inference Accelerator with Silicon Photonics for Energy-Efficient Deep Learning | |
| XCelHD: An Efficient GPU-Powered Hyperdimensional Computing with Parallelized Training | |
| HAWIS: Hardware-Aware Automated WIdth Search for Accurate, Energy-Efficient and Robust Binary Neural Network on ReRAM Dot-Product Engine | |
| SynthNet: A High-throughput yet Energy-efficient Combinational Logic Neural Network | |
| **Session 3D: Software Solutions for Heterogeneous Embedded Architectures** | |
| Optimal Data Allocation for Graph Processing in Processing-in-Memory Systems | |
| Boosting the Search Performance of B+-tree with Sentinels for Non-volatile Memory | |
| Algorithm and Hardware Co-design for Reconfigurable CNN Accelerator | |
| Exploring ILP for VLIW architecture by Quantified Modeling and Dynamic Programming-based Instruction Scheduling | |
| Time-Triggered Scheduling for Time-Sensitive Networking with Preemption | |
| **Session 4A: (SS-3) Technology Advancements inside the Edge Computing Paradigm and using the Machine Learning Techniques** | |
| A Task Parallelism Runtime Solution for Deep Learning Applications using MPSoC on Edge Devices | |
| Circuit and System Technologies for Energy-Efficient Edge Robotics | |
| RTL Regression Test Selection using Machine Learning | |
| **Session 4B: Recent Advances in Placement Techniques** | |
| Net Separation-Oriented Printed Circuit Board Placement via Margin Maximization | |
| HybridGP: Global Placement for Hybrid-Row-Height Designs | |
| DREAMPlaceFPGA: An Open-Source Analytical Placer for Large Scale Heterogeneous FPGAs using Deep-Learning Toolkit | |
| **Session 4C: Emerging Trends in Stochastic Computing** | |
| Linear Feedback Shift Register Reseeding for Stochastic Circuit Repairing and Minimization | |
| BSC: Block-based Stochastic Computing to Enable Accurate and Efficient TinyML | |
| Streaming Accuracy: Characterizing Early Termination in Stochastic Computing | |
| **Session 4D: Efficient Techniques for Emerging Applications** | |
| TENET: Temporal CNN with Attention for Anomaly Detection in Automotive Cyber-Physical Systems | |
| ELight: Enabling Efficient Photonic In-Memory Neurocomputing with Life Enhancement | |
| Solving Least-Squares Fitting in O(1) Using RRAM-based Computing-in-Memory Technique | 339 |
| SonicFFT: A system architecture for ultrasonic-based FFT acceleration | 345 |
| **Session 5B: Moving frontiers of test and simulation** | |
| FIRVER: Concolic Testing for Systematic Validation of Firmware Binaries | 352 |
| WAL: A Novel Waveform Analysis Language for Advanced Design Understanding and Debugging | 358 |
| Accelerate SAT-based ATPG via Preprocessing and New Conflict Management Heuristics | 365 |
| A Fast and Accurate Middle End of Line Parasitic Capacitance Extraction for MOSFET and FinFET Technologies Using Machine Learning | 371 |
| **Session 5C: Optimizations in Modern Memory Architecture** | |
| Lamina: Low Overhead Wear Leveling for NVM with Bounded Tail | 377 |
| Heterogeneous Memory Architecture Accommodating Processing-In-Memory on SoC For AIoT Applications | 383 |
| Optimal Loop Tiling for Minimizing Write Operations on NVMs with Complete Memory Latency Hiding | 389 |
| Boolean Rewriting Strikes Back: Reconvergence-Driven Windowing Meets Resynthesis | 395 |
| Delay Optimization of Combinational Logic by And-Or Path Restructuring | 403 |
| A Versatile Mapping Approach for Technology Mapping and Graph Optimization | 410 |
| **Session 6B: Towards Reliable and Secure Circuits: Cross Perspectives** | |
| Avatar: Reinforcing Fault Attack Countermeasures in EDA with Fault Transformations | 417 |
| Anti-Piracy of Analog and Mixed-Signal Circuits in FD-SOI | 423 |
| Toward Optical Probing Resistant Circuits: A Comparison of Logic Styles and Circuit Design Techniques | 429 |
| **Session 6C: Accelerator Architectures for Machine Learning** | |
| Dynamic CNN Accelerator Supporting Efficient Filter Generator with Kernel Enhancement and Online Channel Pruning | 436 |
| Toward Low-Bit Neural Network Training Accelerator by Dynamic Group Accumulation | 442 |
| An Energy-Efficient Bit-Split-and-Combination Systolic Accelerator for NAS-Based Multi-Precision Convolution Neural Networks | 448 |
| Multi-Precision Deep Neural Network Acceleration on FPGAs | 454 |
| **Session 6D: Quantum and Reconfigurable Computing** | |
| Efficient Preparation of Cyclic Quantum States | 460 |
| Limiting the Search Space in Optimal Quantum Circuit Mapping | 466 |
| Efficient Routing in Coarse-Grained Reconfigurable Arrays using Multi-Pole NEM Relays | 472 |
| Fault Testing and Diagnosis Techniques for Carbon Nanotube-Based FPGAs | 479 |
| **Session 7A: (SS-4) Reshaping the Future of Physical and Circuit Design, Power and Memory with Machine Learning** | |
| Fast Thermal Analysis for Chiplet Design based on Graph Convolution Networks | 485 |
| Design Close to the Edge for Advanced Technology using Machine Learning and Brain-inspired Algorithms | |
| Reinforcement Learning for Electronic Design Automation: Case Studies and Perspectives | 500 |
| Differentially Evolving Memory Ensembles: Pareto Optimization based on Computational Intelligence for Embedded Memories on a System Level | |
| **Session 7B: Advances in Analog Design Methodologies** | |
| Transient Adjoint DAE Sensitivities: a Complete, Rigorous, and Numerically Accurate Formulation | |
| Generative-Adversarial-Network-Guided Well-Aware Placement for Analog Circuits | 519 |
| TAFA: Design Automation of Analog Mixed-Signal FIR Filters Using Time Approximation Architecture | |
| **Session 7C: Low-Energy Edge AI Computing** | |
| Efficient Computer Vision on Edge Devices with Pipeline-Parallel Hierarchical Neural Networks | |
| Efficient On-Device Incremental Learning by Weight Freezing | |
| Edge<sup>n</sup> AI: Distributed Inference with Local Edge Devices and Minimum Latency | 544 |
| Large Forests and Where to Partially Fit Them | |
| **Session 7D: Emerging Technologies in Embedded Systems and Cyber-Physical Systems** | |
| AdaSens: Adaptive Environment Monitoring by Coordinating Intermittently-Powered Sensors | 556 |
| Energy Harvesting Aware Multi-hop Routing Policy in Distributed IoT System Based on Multi-agent Reinforcement Learning | |
| An Accuracy Reconfigurable Vector Accelerator based on Approximate Logarithmic Multipliers | |
| Neural Network Pruning and Fast Training for DRL-based UAV Trajectory Planning | 574 |
| **Session 8B: Advances in VLSI Routing** | |
| High-Correlation 3D Routability Estimation for Congestion-guided Global Routing | 580 |
| SPRoute 2.0: A detailed-routability-driven deterministic parallel global router with soft capacity | |
| FPGA-Accelerated Maze Routing Kernel for VLSI Designs | |
| **Session 8C: Machine Learning with Crossbar Memories** | |
| Reliable Memristive Neural Network Accelerators Based on Early Denoising and Sparsity Induction | 598 |
| Boosting ReRAM-based DNN by Row Activation Oversubscription | 604 |
| XBM: A Crossbar Column-wise Binary Mask Learning Method for Efficient Multiple Task Adaption | |
| **Session 8D: High Level Synthesis, CGRA mapping and P&R for hotspot mitigation** | |
| CGRA Mapping Using Zero-Suppressed Binary Decision Diagrams | 616 |
| Improving the Quality of Hardware Accelerators through automatic Behavioral Input Language Conversion in HLS | |
| Hotspot Mitigation through Multi-Row Thermal-aware Re-Placement of Logic Cells based on High-Level Synthesis Scheduling | |
| **Session 9A: (SS-5) Artificial Intelligence on Back-End EDA: Panacea or One-Trick Pony?** | |
| Techniques for CAD Tool Parameter Auto-tuning in Physical Synthesis: A Survey | 635 |
| Application of Deep Learning in Back-End Simulation: Challenges and Opportunities | 641 |
| EasyMAC: Design Exploration-Enabled Multiplier-Accumulator Generator using a Canonical Architectural Representation | |
| **Session 9B: Side Channel Leakage: Characterization and Protection** | |
| DVFSspy: Using Dynamic Voltage and Frequency Scaling As A Covert Channel for Multiple Procedures | |
| Fortify: Analytical Pre-Silicon Side-Channel Characterization of Digital Designs | 660 |
| Data Leakage through Self-Terminated Write Schemes in Memristive Caches | 666 |
| A Voltage Template Attack on the Modular Polynomial Subtraction in Kyber | 672 |
| **Session 9C: Emerging Non-volatile Memory-based In-Memory Computing** | |
| FeMIC: Multi-Operands In-Memory Computing Based on FeFETs | 678 |
| Sparsity-Aware Non-Volatile Computing-In-Memory Macro with Analog Switch Array and Low-Resolution Current-Mode ADC | 684 |
| STREAM: Towards READ-based In-Memory Computing for Streaming based Data Processing | 690 |
| **Session 9D: System Level Design of Learning Systems** | |
| On the Viability of Decision Trees for Learning Models of Systems | 696 |
| This is SPATEM! A Spatial-Temporal Optimization Framework for Efficient Inference on ReRAM-based CNN Accelerator | |
| HACScale: Hardware-Aware Compound Scaling for Resource-Efficient DNNs | 708 |
| Pearl: Towards Optimization of DNN-accelerators Via Closed-Form Analytical Representation | 714 |