# 2020 30th International Conference on Field-Programmable Logic and Applications (FPL 2020)

Gothenburg, Sweden 31 August – 4 September 2020



**IEEE Catalog Number: CFP20623-POD** 

**ISBN: 978-1-7281-9903-0**

## Copyright © 2020 by the Institute of Electrical and Electronics Engineers, Inc. All Rights Reserved

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

For other copying, reprint or republication permission, write to IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854. All rights reserved.

\*\*\* This is a print representation of what appears in the IEEE Digital Library. Some format issues inherent in the e-media version may also appear in this print version.

IEEE Catalog Number: CFP20623-POD

ISBN (Print-On-Demand): 978-1-7281-9903-0

ISBN (Online): 978-1-7281-9902-3

ISSN: 1946-147X

#### **Additional Copies of This Publication Are Available From:**

Curran Associates, Inc., 57 Morehouse Lane, Red Hook, NY 12571, USA

Phone: (845) 758-0400 Fax: (845) 758-2633

E-mail: curran@proceedings.com Web: www.proceedings.com




## **Table of Contents**

Preface ... xiv

Message from the FPL Steering Committee ... xv

Organizing Committee ... xvi

Technical Program Committee ... xviii

Steering Committee ... xxiii

Additional Reviewers ... xxiv

Keynotes ... xxviii

Sponsors and Organizers ... xxxiii

#### **Session F1: Architecture**

High Bandwidth Memory on FPGAs: A Data Analytics Perspective ... 1

Kaan Kara (ETH Zurich, Switzerland), Christoph Hagleitner (IBM Research, Switzerland), Dionysios Diamantopoulos (IBM Research, Switzerland), Dimitris Syrivelis (IBM Research, Ireland), and Gustavo Alonso (ETH Zurich, Switzerland)

NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling ... 9

Gagandeep Singh (Eindhoven University of Technology; ETH Zürich; IBM Research Europe, Zurich), Dionysios Diamantopoulos (IBM Research Europe, Zurich), Christoph Hagleitner (IBM Research Europe, Zurich), Juan Gomez-Luna (ETH Zürich), Sander Stuijk (Eindhoven University of Technology), Onur Mutlu (ETH Zürich), and Henk Corporaal (Eindhoven University of Technology)

Learn the Switches: Evolving FPGA NoCs with Stall-Free and Backpressure Based Routers ... 18

Gurshaant Malik (University of Waterloo, Canada), Ian Elmor Lang (University of Waterloo, Canada), Rodolfo Pellizoni (University of Waterloo, Canada), and Nachiket Kapre (University of Waterloo, Canada)

RANTT: A RISC-V Architecture Extension for the Number Theoretic Transform ... 26

Emre Karabulut (North Carolina State University, USA) and Aydin Aysu (North Carolina State University, USA)

MCEA: A Resource-Aware Multicore CGRA Architecture for the Edge ... 33

Guilherme Korol (Universidade Federal do Rio Grande do Sul (UFRGS), Brazil), Michael Guilherme Jordan (Universidade Federal do Rio Grande do Sul (UFRGS), Brazil), Marcelo Brandalero (Brandenburg University of Technology (B-TU), Germany), Michael Hübner (Brandenburg University of Technology (B-TU), Germany), Mateus Beck Rutzig (Universidade Federal de Santa Maria (UFSM), Brazil), and Antonio Carlos Schneider Beck (Universidade Federal do Rio Grande do Sul (UFRGS), Brazil)

Endurance-Aware RRAM-Based Reconfigurable Architecture using TCAM Arrays ... 40

João Paulo Cardoso de Lima (Federal University of Rio Grande do Sul (UFRGS), Brazil), Marcelo Brandalero (Brandenburg University of Technology, Germany), and Luigi Carro (Federal University of Rio Grande do Sul (UFRGS), Brazil)

#### **Session F2: Applications**

HyperLogLog Sketch Acceleration on FPGA ... 47

Amit Kulkarni (ETH Zurich, Switzerland), Monica Chiosa (ETH Zurich, Switzerland), Thomas B. Preußer (ETH Zurich, Switzerland), Kaan Kara (ETH Zurich, Switzerland), David Sidler (Microsoft Corporation, Redmond WA), and Gustavo Alonso (ETH Zurich, Switzerland)

A Hardware/Software Co-Design of K-mer Counting Using a CAPI-Enabled FPGA ... 57

Abbas Haghi (Barcelona Supercomputing Center (BSC), Spain), Lluc Alvarez (Barcelona Supercomputing Center (BSC), Spain / Universitat Politècnica de Catalunya (UPC), Spain), Jordà Polo (Barcelona Supercomputing Center (BSC), Spain), Dionysios Diamantopoulos (IBM Research Europe, Switzerland), Christoph Hagleitner (IBM Research Europe, Switzerland), and Miquel Moreto (Barcelona Supercomputing Center (BSC), Spain / Universitat Politècnica de Catalunya (UPC), Spain)

An Adaptable High-Throughput FPGA Merge Sorter for Accelerating Database Analytics ... 65

Philippos Papaphilippou (Imperial College London, UK), Chris Brooks (Dunnhumby, UK), and Wayne Luk (Imperial College London, UK)

#### **Session S1: Architecture**

A High-Performance Out-of-Order Soft Processor Without Register Renaming ... 73

Satoshi Mitsuno (The University of Tokyo, Japan), Junichiro Kadomoto (The University of Tokyo, Japan), Toru Koizumi (The University of Tokyo, Japan), Ryota Shioya (The University of Tokyo, Japan), Hidetsugu Irie (The University of Tokyo, Japan), and Shuichi Sakai (The University of Tokyo, Japan)

TTA-SIMD Soft Core Processors ... 79

Kati Tervo (Tampere University, Finland), Samawat Malik (Tampere University, Finland), Topi Leppänen (Tampere University, Finland), and Pekka Jääskeläinen (Tampere University, Finland)

A Configurable TLB Hierarchy for the RISC-V Architecture ... 85

Nikolaos Charalampos Papadopoulos (National Technical University of Athens, Greece), Vasileios Karakostas (National Technical University of Athens, Greece), Konstantinos Nikas (National Technical University of Athens, Greece), Nectarios Koziris (National Technical University of Athens, Greece), and Dionisios N. Pnevmatikatos (National Technical University of Athens, Greece)

A Service-Oriented Memory Architecture for FPGA Computing ... 91

Joseph Melber (Carnegie Mellon University, USA) and James C. Hoe (Carnegie Mellon University, USA)

#### **Session S2: Applications**

FPGA Acceleration of Ray-Based Iterative Algorithm for 3D Low-Dose CT Reconstruction ... 98

Linjun Qiao (Peking University, China), Guojie Luo (Peking University, China), Wentai Zhang (Peking University, China), and Ming Jiang (Peking University, China)

On the Feasibility of TERO-Based True Random Number Generator on Xilinx FPGAs ... 103

Naoki Fujieda (Aichi Institute of Technology, Japan)

Accelerating Local Laplacian Filters on FPGAs ... 109

Shashwat Khandelwal (International Institute of Information Technology Hyderabad, India), Ziaul Choudhury (International Institute of Information Technology Hyderabad, India), Shashwat Shrivastava (International Institute of Information Technology Hyderabad, India), and Suresh Purini (International Institute of Information Technology Hyderabad, India)

A Seamless DFT/FFT Self-Adaptive Architecture for Embedded Radar Applications ... 115

Julien Mazuet (Thales LAS-France, France; Lab-STICC CNRS, Université de Bretagne Occidentale, France), Michel Narozny (Thales LAS-France, France), Catherine Dezan (Lab-STICC CNRS, France; Université de Bretagne Occidentale, France), and Jean-Philippe Diguet (Lab-STICC CNRS, France; Université de Bretagne, France)

Using DSP Slices as Content-Addressable Update Queues ... 121

Thomas B Preußer (ETH Zürich), Monica Chiosa (ETH Zürich), Alexander Weiss (Accemic Technologies, Germany), and Gustavo Alonso (ETH Zürich)

A Domain-Specific Architecture for Accelerating Sparse Matrix Vector Multiplication on FPGAs ... 127

Abhishek Kumar Jain (Xilinx Inc., USA), Hossein Omidian (Xilinx Inc., USA), Henri Fraisse (Xilinx Inc., USA), Mansimran Benipal (Xilinx Inc., USA), Lisa Liu (Xilinx Inc., USA), and Dinesh Gaitonde (Xilinx Inc., USA)

Exploring FPGA Optimizations in OpenCL for Breadth-First Search on Sparse Graph Datasets ... 133

Atharva Gondhalekar (Virginia Tech, USA) and Wu-Chun Feng (Virginia Tech, USA)

#### **Session F3: Synthesis and Testing**

(University of Guelph, Canada)

RapidLayout: Fast Hard Block Placement of FPGA-Optimized Systolic Arrays using Evolutionary Algorithms ... 145

Niansong Zhang (Sun Yat-sen University, China), Xiang Chen (Sun Yat-sen University, China), and Nachiket Kapre (University of Waterloo, Canada)

Timing-Driven Placement for FPGA Architectures with Dedicated Routing Paths ... 153

Stefan Nikolić (École Polytechnique Fédérale de Lausanne (EPFL), Switzerland), Grace Zgheib (Intel Corporation, USA), and Paolo Ienne (École Polytechnique Fédérale de Lausanne (EPFL), Switzerland)

LFTSM: Lightweight and Fully Testable SEU Mitigation System for Xilinx Processor-Based SoCs ... 162

Farah Abid (University of New South Wales, Australia), Darshana Jayasinghe (University of New South Wales, Australia), Sompasong Somsavaddy (Seeing Machines, Australia), and Sri Parameswaran (University of New South Wales, Australia)

Using Novel Configuration Techniques for Accelerated FPGA Aging ... 169

Tanner Gaskin (Brigham Young University, USA), Hayden Cook (Brigham Young University, USA), Wesley Stirk (Brigham Young University, USA), Robert Lucas (Brigham Young University, USA), Jeffrey Goeders (Brigham Young University, USA), and Brad Hutchings (Brigham Young University, USA)

#### **Session F4: Security**

Compact and Programmable yet High-Performance SoC Architecture for Cryptographic Pairings ...

Milad Bahadori (University of Helsinki, Finland) and Kimmo Järvinen (University of Helsinki, Finland)

X-Attack: Remote Activation of Satisfiability Don't-Care Hardware Trojans on Shared FPGAs ... 185

Dina G. Mahmoud (EPFL, Switzerland), Wei Hu (Northwestern Polytechnical University, China), and Mirjana Stojilovic (EPFL, Switzerland)

Side Channel Resistance at a Cost: A Comparison of ARX-Based Authenticated Encryption ... 193

Flora Coleman (Virginia Tech, USA), Behnaz Rezvani (Virginia Tech, USA), Sachin Sachin (Virginia Tech, USA), and William Diehl (Virginia Tech, USA)

#### **Session S3: Synthesis and Testing**

Automated Design of FPGAs Facilitated by Cycle-Free Routing ... 208

Ang Li (Princeton University, USA), Ting-Jung Chang (Princeton University, USA), and David Wentzlaff (Princeton University, USA)

Measuring the Accuracy of Layout Area Estimation Models of Tile-Based FPGAs in FinFET Technology ... 214

Sajjad Rostami Sani (Ryerson University, Canada), Farheen Fatima Khan (Ryerson University, Canada), Anas Razzaq (Ryerson University, Canada), and Andy Ye (Ryerson University, Canada)

Precise Pointer Analysis in High-Level Synthesis ... 220

Nadesh Ramanathan (Imperial College London, UK), George A. Constantinides (Imperial College London, UK), and John Wickerson (Imperial College London, UK)

Syncopation: Adaptive Clock Management for High-Level Synthesis Generated Circuits on FPGAs ... 225

Kahlan Gibson (University of British Columbia, Canada), Esther Roorda (University of British Columbia, Canada), Daniel Holanda Noronha (University of British Columbia, Canada), and Steve Wilton (University of British Columbia, Canada)

#### **Session S4: Security**

Secret Sharing MPC on FPGAs in the Datacenter ... 236

Pierre-Francois Wolfe (Boston University, USA), Rushi Patel (Boston University, USA), Robert Munafo (Boston University, USA), Mayank Varia (Boston University, USA), and Martin Herbordt (Boston University, USA)

Mask Scrambling Against SCA on Reconfigurable TBOX-Based AES ... 243

João Carlos Resende (Instituto Superior Técnico - Universidade de Lisboa / INESC-ID, Portugal), Ricardo I. R. Macãs (Instituto Superior Técnico - Universidade de Lisboa / INESC-ID / INESC-MN, Portugal), and Ricardo Chaves (Instituto Superior Técnico - Universidade de Lisboa / INESC-ID, Portugal)

FLASH: FPGA Locality-Aware Sensitive Hash for Nearest Neighbor Search and Clustering Application ... 249

Wei Yan (Washington University in St. Louis, USA), Fatemeh Tehranipoor (Santa Clara University, USA), Xuan Zhang (Washington University in St. Louis, USA), and John Chandy (University of Connecticut, USA)

#### **Session F5: AI, Vision & Robotics**

Dynamically Growing Neural Network Architecture for Lifelong Deep Learning on the Edge ... 262

Duvindu Piyasena (Nanyang Technological University, Singapore), Miyuru Thathsara (Nanyang Technological University, Singapore), Sathursan Kanagarajah (Nanyang Technological University, Singapore), Siew Kei Lam (Nanyang Technological University, Singapore), and Meiqing Wu (Nanyang Technological University, Singapore)

FP-Stereo: Hardware-Efficient Stereo Vision for Embedded Applications ... 269

Jieru Zhao (Hong Kong University of Science and Technology, China), Tingyuan Liang (Hong Kong University of Science and Technology, China), Liang Feng (Alibaba Group, China), Wenchao Ding (Hong Kong University of Science and Technology, China), Sharad Sinha (India Institute of Technology Goa, India), Wei Zhang (Hong Kong University of Science and Technology, China), and Shaojie Shen (Hong Kong University of Science and Technology, China)

A High Throughput MobileNetV2 FPGA Implementation Based on a Flexible Architecture for Depthwise Separable Convolution ... 277

Justin Knapheide (Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Germany), Benno Stabernack (Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Germany), and Maximilian Kuhnke (Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Germany)

Hardware Acceleration of Monte-Carlo Sampling for Energy Efficient Robust Robot Manipulation ... 284

Yanqi Liu (Brown University, USA), Giuseppe Calderoni (Politecnico di Torino, Italy), and Ruth Iris Bahar (Brown University, USA)

LogicNets: Co-Designed Neural Networks and Circuits for Extreme-Throughput Applications ... 291

Yaman Umuroglu (Xilinx, Ireland), Yash Akhauri (Xilinx, Ireland), Nicholas James Fraser (Xilinx, Ireland), and Michaela Blott (Xilinx, Ireland)

#### **Session S5: AI, Vision & Robotics**

An FPGA-Based Low-Latency Accelerator for Randomly Wired Neural Networks ... 298

Ryosuke Kuramochi (Tokyo Institute of Technology, Japan) and Hiroki Nakahara (Tokyo Institute of Technology, Japan)

Relaxation ... 304

Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack ... 310

Dionysios Diamantopoulos (IBM Research Europe, Switzerland), Burkhard Ringlein (IBM Research Europe, Switzerland), Mitra Purandare (IBM Research Europe, Switzerland), Gagandeep Singh (Eindhoven University of Technology, Netherlands), and Christoph Hagleitner (IBM Research Europe, Switzerland)

Caffe Barista: Brewing Caffe with FPGAs in the Training Loop ... 317

Diederik Adriaan Vink (Imperial College London, UK), Aditya Rajagopal (Imperial College London, UK), Stylianos I. Venieris (Samsung AI Centre, Cambridge UK), and Christos-Savvas Bouganis (Imperial College London, UK)

#### **Session S6: Tools, Technology, and Other**

A 171k-LUT Nonvolatile FPGA using Cu Atom-Switch Technology in 28nm CMOS ... 323

Ryusuke Nebashi (NEC Corporation, Japan), Naoki Banno (NEC Corporation, Japan), Makoto Miyamura (NEC Corporation, Japan), Xu Bai (NEC Corporation, Japan), Kazunori Funahashi (NEC Corporation, Japan), Koichiro Okamoto (NEC Corporation, Japan), Noriyuki Iguchi (NEC Corporation, Japan), Hideaki Numata (NEC Corporation, Japan), Tadahiko Sugibayashi (NEC Corporation, Japan), Toshitsugu Sakamoto (NEC Corporation, Japan), and Munehiro Tada (NEC Corporation, Japan)

Partial Reconfiguration for Design Optimization ... 328

## **PhD Forum**

Efficient Ab-Initio Molecular Dynamic Simulations by Offloading Fast Fourier Transformations to FPGAs ... 353

Design for ReConfigurability: An Electronic System Level Methodology to Exploit Reconfigurable Platforms ... 355

High-Speed Chromatic Dispersion Compensation Filtering in FPGAs for Coherent Optical Communication ... 357

Acceleration of Simulation Models Through Automatic Conversion to FPGA Hardware ... 359

Frans Skarman (Linköping University, Sweden), Oscar Gustafsson (Linköping University, Sweden), Daniel Jung (Linköping University, Sweden), and Mattias Krysander (Linköping University, Sweden)

Securing FPGA Accelerators at the Electrical Level for Multi-tenant Platforms ... 361

Resource Elastic Database Acceleration ... 363

Kristiyan Manev (University of Manchester, UK) and Dirk Koch (University of Manchester, UK)

Transparent Integration of a Dynamic FPGA Database Acceleration System ... 365

## **Demo Night**

Executing ARMv8 Loop Traces on Reconfigurable Accelerator via Binary Translation Framework ... 367

Nuno Paulino (INESC TEC and the University of Porto, Portugal), João Canas Ferreira (INESC TEC and the University of Porto, Portugal), João Bispo (INESC TEC and the University of Porto, Portugal), and João M.P. Cardoso (INESC TEC and the University of Porto, Portugal)

A Self-Compilation Flow Demo on FOS – The FPGA Operating System ... 368

Demo: A Closer Look at Malicious Bitstreams ... 369

Tuan La (The University of Manchester, UK), Kaspar Matas (The University of Manchester, UK), Joseph Powell (The University of Manchester, UK), Khoa Pham (The University of Manchester, UK), and Dirk Koch (The University of Manchester, UK)

RISC-V FPGA Platform Toward ROS-Based Robotics Application ... 370

Demonstrating Reduced-Voltage FPGA-Based Neural Network Acceleration for Power-Efficiency ... 371

Erhan Baturay Onural (TOBB ETÜ), Ismail Emir Yuksel (TOBB ETÜ), and Behzad Salami (Barcelona Supercomputing Center (BSC))

Author Index ... 373