Embedded Computing for High Performance: Efficient Mapping of Computations Using Customization, Code Transformations and Compilation
About this ebook
Embedded Computing for High Performance: Design Exploration and Customization Using High-level Compilation and Synthesis Tools provides a set of real-life example implementations that migrate traditional desktop systems to embedded systems. Working with popular hardware, including Xilinx and ARM, the book offers a comprehensive description of techniques for mapping computations expressed in programming languages such as C or MATLAB to high-performance embedded architectures consisting of multiple CPUs, GPUs, and reconfigurable hardware (FPGAs).
The authors demonstrate a domain-specific language (LARA) that facilitates retargeting to multiple computing systems using the same source code. In this way, users can decouple original application code from transformed code and enhance productivity and program portability.
After reading this book, engineers will understand the processes, methodologies, and best practices needed for the development of applications for high-performance embedded computing systems.
- Focuses on maximizing performance while managing energy consumption in embedded systems
- Explains how to retarget code for heterogeneous systems with GPUs and FPGAs
- Demonstrates a domain-specific language that facilitates migrating and retargeting existing applications to modern systems
- Includes downloadable slides, tools, and tutorials
João Manuel Paiva Cardoso
João Manuel Paiva Cardoso is Associate Professor in the Department of Informatics Engineering (DEI), Faculty of Engineering, University of Porto, Portugal. He was previously Assistant Professor in the Department of Computer Science and Engineering, Instituto Superior Técnico (IST), Technical University of Lisbon (UTL), in Lisbon (April 2006 to September 2008), Assistant Professor (2001-2006) in the Department of Electronics and Informatics Engineering (DEEI), Faculty of Sciences and Technology, University of Algarve, and Teaching Assistant at the same university (1993-2001). He was a member of INESC-ID (Systems and Computer Engineering Institute) in Lisbon from 1994 to 2009, where he served as a senior researcher.
Book preview
Embedded Computing for High Performance - João Manuel Paiva Cardoso
Chapter 1
Introduction
Abstract
This chapter introduces embedded systems and embedded computing in general, highlighting their importance in everyday life. We provide an overview of their main characteristics and of their possible external environment interfaces. In addition, this chapter highlights the trends in terms of target architectures and design flows, and explains the objectives of the book, its major target audience, the prerequisites in terms of prior knowledge, and how the book can be used in different contexts and by readers with different aptitudes.
Keywords
Embedded computing; Embedded systems; High-performance embedded computing; Embedded computing trends
1.1 Overview
Embedded computing systems permeate our lives from consumer devices, such as smartphones and game consoles, to less visible electronic devices that control, for instance, different aspects of a car's operation. Applications executing on current embedded systems exhibit a sophistication on par with applications running on desktop computers. In particular, mobile devices now support computationally intensive applications, and the trend points to a further increase in application complexity to meet the growing expectations of their users. In addition to performance requirements, energy and power consumption are of paramount importance for embedded applications, imposing restrictions on how applications are developed and which algorithms can be used.
Fig. 1.1 presents a generic and simplified architecture of an embedded computing system. A key distinguishing feature of an embedded system lies in the diversity of its input and output devices, generically known as sensors and actuators, fueled by the need to customize their use for each specific domain. In this diagram, we have a bus-based computing core system consisting of RAM, ROM, and a processor unit. The computing core interacts with its physical environment via a set of actuators and sensors using Analog-to-Digital (ADC) and Digital-to-Analog (DAC) converter units. At the software level, the operating system and application software are stored in ROM or in Flash memory. The system may run a customized version of the Linux operating system able to satisfy specific memory and/or real-time requirements [1], and can support additional software components, such as resident monitors, required by the embedded system.
Fig. 1.1 Block diagram of a typical embedded computing system.
Developing applications in heavily constrained environments, which are typical targets of embedded applications, requires considerable programming skills. Not only do programmers need to understand the limitations of the underlying hardware and accompanying runtime support, but they must also develop solutions able to meet stringent nonfunctional requirements, such as performance. Developing these interdisciplinary skills is nontrivial, and not surprisingly there is a lack of textbooks addressing the development of the relevant competences. These aptitudes are required when developing and mapping high-performance applications to current and emerging embedded computing systems. We believe this textbook is a step in that direction.
1.2 Embedded Systems in Society and Industry
While not necessarily comprehensive, Fig. 1.2 illustrates the diversity of the domains and environments in which embedded systems operate. At home, embedded systems control and monitor our appliances, from the simple stand-alone microwave oven, washing machine, and thermostat to the more sophisticated and sensitive security system that monitors cameras and possibly communicates with remote systems via the Internet. Embedded systems also control our vehicles, from fuel injection to the monitoring of emissions, while managing a plethora of information and using visual aids to display the vehicle's operation. In our cities, embedded systems monitor public transportation systems, which are ubiquitously connected to central stations that perform online scheduling of buses and trains and provide real-time updates of arrival times across all stops for all transportation lines. At the office, embedded systems handle small electronic devices such as printers, cameras, and security systems, as well as lighting.
Fig. 1.2 Embedded computing in everyday life.
Moreover, today's smartphones are a marvel of technological integration and software development. These high-end embedded systems now include multicore processors, WiFi, Bluetooth, and touch screens, and have performance commensurate with that of the high-end multiprocessing systems that, just a few years ago, were used to solve scientific and engineering computing problems.
1.3 Embedded Computing Trends
Over the last decades, the seemingly limitless availability of transistors has enabled the development of impressive computing architectures with a wide variety of heterogeneous devices. It has become possible to combine, on a single integrated circuit, computing and storage capacity previously unimaginable in terms of raw hardware performance and, consequently, in terms of software complexity and functionality. Various empirical laws have captured important trends and are still relevant today in the context of embedded systems. These empirical laws include:
- Moore's Law—The number of components in integrated circuits doubles every 18 months.
Moore's Law is one of the driving forces leading to the miniaturization of electronic components and to their increasing complexity;
- Gustafson's Law—Any sufficiently large problem can be efficiently parallelized.
Due to the increasing complexity of embedded applications, the potential to use the many- and multicore architectures has also increased;
- Wirth's Law—Software gets slower faster than hardware gets faster.
This observation is supported by the fast advances of multicore systems and custom computing and hardware accelerators when compared with the availability of effective development tools and APIs (Application Programming Interfaces) to exploit the target architectures;
- Gilder's Law—Bandwidth grows at least three times faster than computer power.
This observation points to the advances in data transmission which amplify the advances of computing and storage technologies, forming the basis of technologies such as cloud computing and the Internet of Things (IoT).
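Gustafson's Law above can be made concrete with a short numeric sketch. The snippet below is a minimal illustration (not tied to any specific system): it computes the scaled speedup S(n) = n - s(n - 1) for n processors and a serial fraction s, showing that when the problem size grows with the core count, speedup keeps increasing almost linearly:

```python
def gustafson_speedup(serial_fraction, n_procs):
    """Scaled speedup S(n) = n - s*(n - 1), where s is the fraction
    of the (scaled) workload that remains serial (Gustafson's Law)."""
    return n_procs - serial_fraction * (n_procs - 1)

# A workload that is 5% serial still scales almost linearly:
for n in (2, 8, 64):
    print(n, gustafson_speedup(0.05, n))
```

With s = 0.05, 8 cores give a scaled speedup of 7.65, and 64 cores give 60.85, in contrast to the fixed-problem-size bound discussed later under Amdahl's Law.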
One of the key trends in embedded systems has been the growing reliance on multicore heterogeneous architectures to support computationally intensive applications while ensuring long battery lifetimes. This increase in computing power, coupled with the ever-increasing desire to be connected, has fueled a fundamental transition from embedded systems mostly operating in stand-alone environments to a context where they are ubiquitously connected to other devices and communication infrastructures, in what has been coined the Internet of Things (IoT).
As with the evolution of hardware, IoT software requirements have also evolved to support more application domains. While in the past years, the focus has been on digital signal and image processing, embedded systems are now expected to interact with other devices on the network and to support a variety of applications, e.g., with the capability to search remote databases and to compute using geographically distributed data.
Not surprisingly, the computational demands of mobile applications have also increased exponentially [2], thus exacerbating the complexity of mapping these applications to mobile architectures. It is believed (see, e.g., [3]) that in some domains neither hardware scaling nor hardware replication is enough to satisfy the performance requirements of advanced mobile applications. Therefore, in addition to the research and development of next-generation hardware architectures, it will be critical to revisit the development and mapping process of applications on resource-constrained devices. In particular, a key step ahead, when considering algorithmic and/or target hardware system changes, is the evaluation of code transformations and compiler optimizations that fully leverage the acceleration capabilities of the target system.
Another critical issue for companies is time-to-market [4] (see Fig. 1.3). Delays entering the market mean smaller overall sales as products have less time to benefit from the market before it starts to decline. Thus, a fast and efficient process to develop applications is one of the key factors for success in such competitive markets.
Fig. 1.3 Simplified sales model and marketing entry. Based on Vahid F, Givargis T. Embedded system design: a unified hardware/software introduction. 1st ed. New York, NY: John Wiley & Sons, Inc.; 2001.
1.4 Embedded Systems: Prototyping and Production
Fig. 1.4 illustrates a design flow for embedded systems. Developing a high-performance application for an embedded platform requires developers to exploit sophisticated tool flows and to master different levels of abstraction across the various stages of application development, including deployment and maintenance. In our example, the development flow begins by capturing the user requirements followed by the actual development of the application. A first proof-of-concept prototype is usually validated on a desktop computer, possibly using a programming language with features that facilitate early prototyping such as MATLAB, and relying on the emulation of external interfaces (e.g., instead of using the real camera, one can use prerecorded videos or simple sequences of images stored as files). If this initial prototype does not meet its functional requirements, the developer must iterate and modify the application, possibly changing its data types (e.g., converting double to single floating-point precision), and applying code transformations and/or refactoring code to meet the desired requirements. This process is guided by developers’ knowledge about the impact of these modifications on the final embedded version. Depending on the project at hand, the prototype may be developed in the same programming language used for the embedded version, but possibly using different APIs.
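The data-type changes mentioned above (e.g., converting double- to single-precision floating point) can be evaluated on the desktop prototype before committing them to the embedded version. The following sketch is illustrative only: it uses Python's `struct` module to emulate IEEE 754 single precision and measure the rounding error introduced when a value is narrowed to 32 bits:

```python
import struct

def to_single(x):
    """Round-trip a Python float (IEEE 754 double precision) through
    single precision (32 bits) and back, exposing the rounding error."""
    return struct.unpack('f', struct.pack('f', x))[0]

value = 0.1
narrowed = to_single(value)
error = abs(value - narrowed)
print(error)  # on the order of 1e-9 for values near 0.1
```

Running such checks over representative input data helps decide whether the accuracy loss of the narrower type is acceptable for the application's requirements.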
Fig. 1.4 An example of a design flow for developing an embedded application.
The next step of the development process involves modifying the prototype code to derive an embedded code implementation. This includes using emulators, simulators, and/or virtual platforms to validate the embedded version and to optimize it if needed. At this stage, developers must consider the full set of nonfunctional requirements. It is also typical at this stage to explore and test hardware accelerators. This step partitions and maps the selected computations to available accelerators. If this validation stage is successful, the application is then deployed to the target embedded system or to a hardware system emulator, and a second stage of validation is performed. If this second level validation is successful, then the application is ready to be deployed as a product.
Depending on the application, the target system, and the nonfunctional requirements, some of the development stages described earlier might be merged. One of the barriers preventing an effective integration of these development stages is the lack of interfaces between the corresponding tools that would allow them to be truly interoperable. This limitation forces developers to manually relate the effects of transformations and analyses across tools, in an error-prone process.
The development and mapping of applications to high-performance embedded systems must consider a myriad of design choices. Typically, developers must analyze the application and partition its code among the most suitable system components through a process commonly known as hardware/software partitioning [5]. In addition, developers have to deal with multiple compilation tools (subchains) for targeting each specific system component. These problems are further exacerbated when dealing with FPGAs (Field-Programmable Gate Arrays), a technology for hardware acceleration and for fast prototyping as it combines the performance of custom hardware with the flexibility of software [5,6]. As embedded platforms are becoming increasingly more heterogeneous, developers must also explore code and mapping transformations specific to each architecture so that the resulting solutions meet their overall requirements.
One of the key stages of the mapping process is to profile the code to understand its behavior (see, e.g., [7]), which is commonly achieved by extensive code instrumentation and monitoring. In addition, the development of applications targeting high-performance embedded systems leads to source code being transformed by the extensive use of architecture-specific transformations and/or tool-specific compilation directives. Such practices require developer expertise to understand when transformations may limit portability, as otherwise, when the underlying architecture changes, developers may need to restart the design process. Another issue contributing to development complexity is the presence of different product lines for the same application in order to support multiple target platforms and/or multiple application scenarios.
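The code instrumentation mentioned above can be sketched, in simplified form, as a wrapper that records call counts and cumulative execution time for a candidate hotspot. This is a hypothetical stand-in for what profiling tools insert automatically around functions of interest:

```python
import time
from functools import wraps

def profiled(fn):
    """Hypothetical instrumentation wrapper: records the call count and
    cumulative wall-clock time of the wrapped function, mimicking the
    probes that source-level instrumentation inserts for profiling."""
    stats = {"calls": 0, "seconds": 0.0}

    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            stats["calls"] += 1
            stats["seconds"] += time.perf_counter() - start

    wrapper.stats = stats
    return wrapper

@profiled
def hotspot(n):
    # Candidate computation whose behavior we want to understand.
    return sum(i * i for i in range(n))

hotspot(10000)
hotspot(10000)
print(hotspot.stats["calls"], hotspot.stats["seconds"])
```

Aggregated statistics like these guide the decision of which computations to offload to accelerators in the partitioning step.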
A key aspect for enhancing performance is exploiting parallelism available in the computing platform. In this context, when improving the performance of an application, developers need to consider Amdahl's law [8,9] and its extensions to the multicore era [10,11] to guide code transformations and optimizations, as well as code partitioning and mapping.
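Amdahl's Law bounds the overall speedup by the fraction of the code that cannot be parallelized: with a parallel fraction f and n cores, the speedup is 1 / ((1 - f) + f/n). The following minimal numeric sketch shows that even with an unlimited number of cores, a 90% parallel application cannot exceed a 10x speedup:

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Overall speedup 1 / ((1 - f) + f/n) for a parallel fraction f
    executed on n cores (Amdahl's Law)."""
    f = parallel_fraction
    return 1.0 / ((1.0 - f) + f / n_cores)

print(amdahl_speedup(0.9, 4))      # about 3.08x on 4 cores
print(amdahl_speedup(0.9, 10**6))  # approaches, but never reaches, 10x
```

Estimates like this help decide whether to invest effort in parallelizing further, or in shrinking the serial fraction through code transformations.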
1.5 About LARA: An Aspect-Oriented Approach
LARA is an aspect-oriented language able to express code transformations and mapping strategies, allowing developers to codify nonfunctional concerns in a systematic fashion, which can be subsequently applied in an automated way on their application code.
The use of Aspect-Oriented Programming (AOP) mechanisms allows LARA descriptions to be decoupled from the application code itself—an important feature to improve maintainability and program portability across target platforms. In addition, LARA descriptions can be easily composed to create increasingly sophisticated design space exploration (DSE) strategies using native LARA looping and reporting analysis constructs. In short, LARA provides a more formal vehicle to specify the strategies for the various stages of an application's design flow, in what can be seen as executable strategies.¹
Many of the descriptions of code instrumentation and transformations in this book use the LARA language [12]. Despite the many advantages of LARA as a transformation description language, this book is not about LARA. In other texts, many if not all of the mapping techniques and code transformations used when targeting high-performance embedded systems have been described in an informal, often ad hoc fashion, using abstractions of the underlying hardware and even runtime systems. The LARA descriptions presented in this book can thus be viewed as a vehicle to help the reader to clearly and unambiguously understand the various code and data transformations used in a complex mapping process. Furthermore, LARA has been developed and validated in the context of many research projects targeting real computing systems, and supports popular languages such as C and MATLAB to target heterogeneous systems including both GPUs and FPGAs.
1.6 Objectives and Target Audience
This book aims to provide Informatics Engineering, Computer Science, and Computer Engineering undergraduate and graduate students, practitioners, and engineers with the knowledge to analyze and to efficiently map computations described in high-level programming languages to the architectures used in high-performance embedded computing (HPEC) domains. The required skills span various areas, encompassing algorithm analysis, target architectures, compiler transformations, and optimizations. This book has also been designed to address specific trends regarding the technical competences required by industry, and thus to prepare computer science and informatics engineering students with the necessary skills.
This book focuses mainly on code transformations and optimizations suitable to improve performance and/or to achieve energy savings in the context of embedded computing. The topics include ways to describe computations in order for compilers and mapping tools to exploit heterogeneous architectures and hardware accelerators effectively. Specifically, the book considers the support of data and task parallelism provided by multicore architectures and the use of GPUs and FPGAs for hardware acceleration (via C and OpenCL