Publications
http://www.madnessproject.org/bebop
My publication list

Design Space Pruning through Hybrid Analysis in System-level Design Space Exploration
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=PisPim.DATE.2012

Enabling fast ASIP design space exploration: an FPGA-based runtime reconfigurable prototyper
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=MePoTuSeRaLi.VLSI.2012
Application Specific Instruction-set Processors (ASIPs) expose a large number of degrees of freedom to the designer. Accurate and rapid simulation tools are required to handle the complexity of exploring the design space. To this aim, FPGA-based emulators have recently been proposed as a viable alternative to pure software cycle-accurate simulators, since they preserve maximum accuracy in much shorter simulation times. However, the advantages of on-hardware emulation are reduced by the overhead of the RTL synthesis/implementation process that needs to be run for every architectural configuration to be emulated. The work presented in this paper aims at mitigating this overhead, exploiting a form of software-driven platform runtime reconfiguration. We present a complete emulation toolchain that, given a set of candidate ASIP configurations, identifies and builds an overdimensioned architecture capable of being reconfigured via software at runtime, to emulate all the design space points under evaluation. The approach has been validated against two different design space exploration case studies, with a filtering kernel and an M-JPEG encoding kernel. Moreover, the presented emulation toolchain couples FPGA emulation with activity-based physical modeling to extract area and power/energy consumption figures. We show how the adoption of the presented toolchain significantly reduces the design space exploration time, while introducing an overhead lower than 10% for the FPGA platform resources and lower than 0.5% in terms of the operating frequency.

A Signature-based Power Model for MPSoC on FPGA
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=PisPim.VLSI.2012
This paper presents a framework for high-level power estimation of multiprocessor systems-on-chip (MPSoC) architectures on FPGA. The technique is based on abstract execution profiles, called event signatures, and it operates at a higher level of abstraction than, e.g., commonly-used instruction-set simulator (ISS) based power estimation methods and should thus be capable of achieving good evaluation performance. As a consequence, the technique can be very useful in the context of early system-level design space exploration. We integrated the power estimation technique in a system-level MPSoC synthesis framework. Subsequently, using this framework, we designed a range of different candidate architectures which contain different numbers of MicroBlaze processors and compared our power estimation results to those from real measurements on a Virtex-6 FPGA board.

Adaptivity Support for MPSoCs based on Process Migration in Polyhedral Process Networks
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=CaDeMeTuSt12.VLSI
System adaptivity is becoming an important feature of modern embedded multiprocessor systems. To achieve the goal of system adaptivity when executing Polyhedral Process Networks (PPNs) on a generic tiled Network-on-Chip (NoC) MPSoC platform, we propose an approach to enable the run-time migration of processes among the available platform resources. In our approach, process migration is allowed by a middleware layer which comprises two main components. The first component concerns the inter-tile data communication between processes. We develop and evaluate a number of different communication approaches which implement the semantics of the PPN model of computation on a generic NoC platform. The presented communication approaches do not depend on the mapping of processes, and have been implemented on a Network-on-Chip multiprocessor platform prototyped on an FPGA. Their comparison in terms of the introduced overhead is presented in two case studies with different communication characteristics. The second middleware component allows the actual run-time migration of PPN processes. To this end, we propose and evaluate a process migration mechanism which leverages the PPN model of computation to guarantee a predictable and efficient migration procedure. The efficiency and applicability of the proposed migration mechanism are shown in a real-life case study.

A High-Level Power Model for MPSoC on FPGA
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=PisPim.CAL.2012
This paper presents a framework for high-level power estimation of multiprocessor systems-on-chip (MPSoC) architectures on FPGA. The technique is based on abstract execution profiles, called event signatures. As a result, it is capable of achieving good evaluation performance, thereby making the technique highly useful in the context of early system-level design space exploration. We have integrated the power estimation technique in a system-level MPSoC synthesis framework. Using this framework, we have designed a range of different candidate MPSoC architectures and compared our power estimation results to those from real measurements on a Virtex-6 FPGA board.
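
As a rough illustration of the general idea of estimating power from event signatures, the Python sketch below treats a signature as a count of abstract events per processor and weights each count by a per-event energy cost. The event names, the energy values, the static power term, and both function names are hypothetical; none of them are taken from the paper, whose actual model may differ.

```python
# Hypothetical per-event dynamic energy costs in nanojoules (illustrative values only).
EVENT_ENERGY_NJ = {"alu": 0.5, "mem_read": 2.0, "mem_write": 2.5, "branch": 0.75}


def estimate_power_mw(signature, exec_time_s, static_power_mw=0.0):
    """Estimate average power in mW for one processor.

    signature: mapping from event name to the number of times it occurred.
    exec_time_s: time span in seconds over which the events were counted.
    """
    if exec_time_s <= 0:
        raise ValueError("exec_time_s must be positive")
    energy_nj = sum(EVENT_ENERGY_NJ[event] * count for event, count in signature.items())
    dynamic_mw = energy_nj * 1e-9 / exec_time_s * 1e3  # nJ -> J, then W -> mW
    return static_power_mw + dynamic_mw


def estimate_system_power_mw(signatures, exec_time_s, static_power_mw=0.0):
    """Sum the dynamic estimates of all processors and add static power once."""
    dynamic_mw = sum(estimate_power_mw(sig, exec_time_s) for sig in signatures)
    return static_power_mw + dynamic_mw
```

Because the estimate only needs event counts and an execution time, it can be evaluated without running an instruction-set simulator, which is the property the abstract credits for the good evaluation performance.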
Middleware Approaches for Adaptivity of Kahn Process Networks on Networks-on-Chip
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=CaDeSt11MISC

Middleware Approaches for Adaptivity of Kahn Process Networks on Networks-on-Chip
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=CaDeSt11.DASIP
We investigate and propose a number of different middleware approaches, namely virtual connector, virtual connector with variable rate, and request-driven, which implement the semantics of Kahn Process Networks on Network-on-Chip architectures. All of the presented solutions allow for run-time system adaptivity. We implement the approaches on a Network-on-Chip multiprocessor platform prototyped on an FPGA. Their comparison in terms of the introduced overhead is presented in two case studies with different communication characteristics. We found that the virtual connector mechanism outperforms the other approaches in the communication-intensive application. In the other case study, which has a higher computation/communication ratio, the middleware approaches show similar performance.

Towards an ESL Design Framework for Adaptive and Fault-tolerant MPSoCs: MADNESS or not?
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=madness.2011
The MADNESS project aims at the definition of innovative system-level design methodologies for embedded MPSoCs, extending the classic concept of design space exploration in multi-application domains to cope with high heterogeneity, technology scaling and system reliability. The main goal of the project is to provide a framework able to guide designers and researchers to the optimal composition of embedded MPSoC architectures, according to the requirements and the features of a given target application field. The proposed approach will tackle the new challenges, related to both architecture and design methodologies, arising with technology scaling, system reliability and the ever-growing computational needs of modern applications. The methodologies proposed with this project act at different levels of the design flow, enhancing the state of the art with novel features in system-level synthesis, architectural evaluation and prototyping. Support for fault resilience and efficient adaptive runtime management is introduced at hardware and middleware level, and considered by the system-level synthesis as one of the optimization factors to be taken into account. This paper presents the first stable results obtained in the MADNESS project, already demonstrating the effectiveness of the proposed methods.

Mapping of Applications to MPSoCs
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=marwedel:2011:codes-isss2
The advent of embedded many-core architectures results in the need to come up with techniques for mapping embedded applications onto such architectures. This paper presents a representative set of such techniques. The techniques focus on optimizing performance, temperature distribution, reliability and fault tolerance for various models.

Design and Architectures for Dependable Embedded Systems
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=SPP1500:11
The paper presents an overview of a major research project on dependable embedded systems that started in fall 2010 and is running for a projected duration of six years. The aim is a 'dependability co-design' that spans various levels of abstraction in the design process of embedded systems, from gate level through operating system and application software to system architecture. In addition, we present a new classification of faults, errors, and failures.

Embedded System Design 2.0: Rationale Behind a Textbook Revision
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=marwedel:2011:wese
Seven years after its first release, it became necessary to publish a new edition of the author's textbook on embedded system design. This paper explains the key changes that were incorporated into the second edition. These changes reflect seven years of teaching of the subject, with two courses every year. The rationale behind these changes can also be found in the paper. In this way, the paper also reflects changes in the area over time, while the area becomes more mature. The paper helps in understanding why a particular topic is included in this curriculum for embedded system design and why a certain structure of the course is suggested.

Design of Fault Tolerant Network Interfaces for NoCs
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=FiMiSa11

A High-Level Power Model for MPSoC on FPGA
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=PisPim11
This paper presents a framework for high-level power estimation of multiprocessor systems-on-chip (MPSoC) architectures on FPGA. The technique is based on abstract execution profiles, called event signatures, and it operates at a higher level of abstraction than, e.g., commonly-used instruction-set simulator (ISS) based power estimation methods and should thus be capable of achieving good evaluation performance. As a consequence, the technique can be very useful in the context of early system-level design space exploration. We integrated the power estimation technique in a system-level MPSoC synthesis framework. Subsequently, using this framework, we designed a range of different candidate architectures which contain different numbers of MicroBlaze processors and compared our power estimation results to those from real measurements on a Virtex-6 FPGA board.

Online Task Remapping Strategies for Fault-tolerant Network-on-Chip Multiprocessors
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=DeKaFi11.NOCS
As CMOS technology scales down into the deep submicron domain, the aspects of fault tolerance in complex Network-on-Chip (NoC) architectures are assuming an increasing relevance. Task remapping is a software-based solution for dealing with permanent failures in processing elements in the NoC. In this work, we formulate the optimal task mapping problem for mesh-based NoC multiprocessors with deterministic routing as an integer linear programming (ILP) problem with the objective of minimizing the communication traffic in the system and the total execution time of the application. We find the optimal mappings at design time for all scenarios where single faults occur in the processing nodes. We propose heuristics for the online task remapping problem and compare their performance with the optimal solutions.

A 0.964mW Digital Hearing Aid System
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=QiCoLi.DATE.2011
This paper concerns the design and optimization of a digital hearing aid application. It aims to show that a suitably adapted ASIP can be constructed to create a highly optimized solution for the wide variety of complex algorithms that play a role in this domain. These algorithms are configurable to fit the various hearing impairments of different users. They pose significant challenges to digital hearing aids, which have strict area and power consumption constraints. First, a typical digital hearing aid application is proposed and implemented, comprising all critical parts of today's products. Then a small-area, ultra-low-power 16-bit processor is designed for the application domain. The resulting hearing aid system achieves a power reduction of >56x over the RISC implementation and can operate for >300 hours on a typical battery.

A Middleware Approach to Achieving Fault-tolerance of Kahn Process Networks on Networks-on-Chips
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=DeDiFi11.IJRC
Kahn process networks (KPN) is a distributed model of computation used for describing systems where streams of data are transformed by processes executing in sequence or parallel. Autonomous processes communicate through unbounded FIFO channels in the absence of a global scheduler. In this work, we propose a task-aware middleware concept that allows adaptivity in KPNs implemented over a Network-on-Chip (NoC). We also list our ideas on the development of a simulation platform as an initial step towards creating fault-tolerance strategies for KPN applications running on NoCs. In doing that, we extend our SACRE (Self-adaptive Component Run-time Environment) framework by integrating it with an open source NoC simulator, Noxim. We evaluate the overhead that the middleware adds to the total execution time and to the total amount of data transferred in the NoC. With this work, we also provide a methodology that can help in identifying the requirements and implementing fault tolerance and adaptivity support on real platforms.

Temporal Properties of Error Handling for Multimedia Applications
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=engel:11:itg
In embedded consumer electronics devices, cost pressure is one of the driving design objectives. Devices that handle multimedia information, like DVD players or digital video cameras, require high computing performance and real-time capabilities while adhering to the cost restrictions. The cost pressure often results in system designs that barely exceed the minimum requirements for such a system. Thus, hardware-based fault tolerance methods are frequently ignored due to their cost overhead. However, the number of transient faults showing up in semiconductor-based systems is expected to increase sharply in the near future. Thus, low-overhead methods to correct related errors in such systems are required. Considering restrictions in processing speed, the real-time properties of a system with added error handling are of special interest. In this paper, we present our approach to flexible error handling and discuss the challenges as well as the inherent timing dependencies to deploy it in a typical soft real-time multimedia system, an H.264 video decoder.

Embedded Systems Design - Embedded Systems Foundations of Cyber-Physical Systems
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=marwedel:2011:es-design
Until the late 1980s, information processing was associated with large mainframe computers and huge tape drives. During the 1990s, this trend shifted toward information processing with personal computers, or PCs. The trend toward miniaturization continues and in the future the majority of information processing systems will be small mobile computers, many of which will be embedded into larger products and interfaced to the physical environment. Hence, these kinds of systems are called embedded systems. Embedded systems together with their physical environment are called cyber-physical systems. Examples include systems such as transportation and fabrication equipment. It is expected that the total market volume of embedded systems will be significantly larger than that of traditional information processing systems such as PCs and mainframes. Embedded systems share a number of common characteristics. For example, they must be dependable, efficient, meet real-time constraints and require customized user interfaces (instead of generic keyboard and mouse interfaces). Therefore, it makes sense to consider common principles of embedded system design. Embedded System Design starts with an introduction to the area and a survey of specification models and languages for embedded and cyber-physical systems. It provides a brief overview of hardware devices used for such systems and presents the essentials of system software for embedded systems, like real-time operating systems. The book also discusses evaluation and validation techniques for embedded systems. Furthermore, the book presents an overview of techniques for mapping applications to execution platforms. Due to the importance of resource efficiency, the book also contains a selected set of optimization techniques for embedded systems, including special compilation techniques. The book closes with a brief survey on testing. Embedded System Design can be used as a textbook for courses on embedded systems and as a source which provides pointers to relevant material in the area for PhD students and teachers. It assumes a basic knowledge of information processing hardware and software. Courseware related to this book is available at http://ls12-www.cs.tu-dortmund.de/~marwedel.

A Monitoring System for NoCs
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=FiPaSi10
In this paper, we propose and discuss a monitoring architecture for Networks-on-Chip (NoCs) that provides system information useful for helping designers in efficiently exploiting resources available in new complex Multiprocessor System-on-Chip (MPSoC) platforms, and in understanding their behavior. We focus on the analysis of the architectural details and design challenges of such systems, by describing powerful tools for detecting information that can be used both at run-time for detecting dynamic changes in system behavior and at post-execution time for debugging and profiling of applications. We detail the design of the probes monitoring the events and discuss an architecture for collection, storage, and analysis of information generated by them. We evaluate the cost of the implementation of the system in terms of area and traffic overhead, and we present results obtained when monitoring a use-case multimedia application.

Scenario-Based Design Space Exploration of MPSoCs
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=Pimentel11MISC

NASA: A Generic Infrastructure for System-level MPSoC Design Space Exploration
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=JiPiThBaNu10
System-level simulation and design space exploration (DSE) are key ingredients for the design of multiprocessor system-on-chip (MP-SoC) based embedded systems. The efforts in this area, however, typically use ad-hoc software infrastructures to facilitate and support the system-level DSE experiments. In this paper, we present a new, generic system-level MP-SoC DSE infrastructure, called NASA (Non Ad-hoc Search Algorithm). This highly modular framework uses well-defined interfaces to easily integrate different system-level simulation tools as well as different combinations of search strategies in a simple plug-and-play fashion. Moreover, NASA deploys a so-called dimension-oriented DSE approach, allowing designers to configure the appropriate number of, possibly different, search algorithms to simultaneously co-explore the various design space dimensions. As a result, NASA provides a flexible and re-usable framework for the systematic exploration of the multi-dimensional MP-SoC design space, starting from a set of relatively simple user specifications. To demonstrate the distinct aspects of NASA, we also present several DSE experiments in which we, e.g., compare NASA configurations using a single search algorithm for all design space dimensions to configurations using a separate search algorithm per dimension. These experiments indicate that the latter multi-dimensional co-exploration can find better design points and evaluate a higher diversity of design alternatives as compared to the more traditional approach of using a single search algorithm for all dimensions.

Scenario-Based Design Space Exploration of MPSoCs
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=StPi10.2
Early design space exploration (DSE) is a key ingredient in system-level design of MPSoC-based embedded systems. The state of the art in this field typically still explores systems under a single, fixed application workload. In reality, however, the applications are concurrently executing and contending for system resources in such systems. As a result, the intensity and nature of application demands can change dramatically over time. This paper therefore introduces the concept of workload scenarios in the DSE process, capturing dynamic behavior both within and between applications. More specifically, we present and evaluate a novel, scenario-based DSE approach based on a coevolutionary genetic algorithm.

Improving Transient Memory Fault Resilience of An H.264 Decoder
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=HeEnScMa10
Traditionally, fault-tolerance has been the domain of expensive, hard real-time critical systems. However, the rates of transient faults occurring in semiconductor devices will increase significantly due to shrinking structure sizes and reduced operating voltages. Thus, even consumer-grade embedded applications with soft real-time requirements, like audio and video players, will require error detection and correction methods to ensure reliable everyday operation. Cost, timing and energy considerations, however, prevent the embedded system developer from correcting every single error. In many situations, however, it will not be required to create a totally error-free system. In such a system, only perceptible errors will have to be corrected. To distinguish between perceptible and non-perceptible errors, a classification of errors according to their relevance to the application is required. When real-time conditions have to be observed, the current timing properties of the system will provide additional contextual information. In this paper, we present a structure for an error-correcting embedded system based on a real-time aware classification. Using a cross-layer approach utilizing application annotations of error classifications as well as information available inside the operating system, the error correction overhead can be significantly reduced. This is shown in a first evaluation by analyzing the achievable improvements in an H.264 video decoder under error injection and simulated error correction.

Using application knowledge to improve embedded systems dependability
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=HeEnScMa10.2
Semiconductor experts are convinced that the rate of soft errors occurring in electronic devices will rise to levels that regularly affect everyday operation of devices. Correcting every single error implies a significant hardware and real-time overhead, especially for embedded devices. Hence, an error classification is needed to distinguish whether an error has to be corrected or not. In this paper, we present an approach using application knowledge. This knowledge is used to classify errors according to their relevance and the influence of their correction on the timing behavior of the whole system. When real-time conditions have to be met, not all errors can be fixed immediately. Using a typical soft real-time application, an H.264 video decoder, as an example, we show that error correction can be delayed. Furthermore, we show that the correction overhead will be significantly reduced if application knowledge is employed.
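
The classification idea in the two abstracts above (correct only relevant errors, and delay a correction when it would violate a real-time condition) can be pictured with the following minimal Python sketch. It is an assumption-laden illustration, not the authors' implementation: the `Action` enum, the `decide` function, the boolean `perceptible` annotation, and the microsecond cost and slack parameters are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"            # error is not relevant to the application
    CORRECT_NOW = "correct_now"  # correction fits into the remaining time budget
    DEFER = "defer"              # correction would miss the deadline, so delay it


def decide(perceptible, correction_cost_us, slack_us):
    """Pick an error-handling action from application and timing knowledge.

    perceptible: hypothetical application annotation, True if the error matters.
    correction_cost_us: assumed time needed to correct the error, in microseconds.
    slack_us: assumed time left before the current deadline, in microseconds.
    """
    if not perceptible:
        return Action.IGNORE
    if correction_cost_us <= slack_us:
        return Action.CORRECT_NOW
    return Action.DEFER
```

For example, `decide(True, 500, 200)` returns `Action.DEFER`, because the assumed correction cost exceeds the remaining slack.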

Fault-tolerance of Kahn Process Networks on NoC-based heterogeneous multi-core embedded architectures
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=DeFiSaMeSeRa10
Due to thermal effects and decreasing manufacturing yields with technology scaling, continuity of service of chips and of applications running on them becomes increasingly important. In line with the trend of increasing the core numbers on a single chip, we propose fault tolerance techniques for applications based on the KPN formalism and running on NoC-based multicore architectures without a shared address space. At the core of our approach are a task-aware middleware concept and a self-adaptive run-time environment that adapts the application at run-time to meet performance and continuity of service requirements.

A Trace-based Scenario Database for High-level Simulation of Multimedia MP-SoCs
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=StPi10
High-level simulation and design space exploration nowadays are key ingredients for system-level design of modern multimedia embedded systems. The majority of the work in this area evaluates systems under a single, fixed application workload. In reality, however, the application workload in such systems (i.e., the applications that are concurrently executing and contending for system resources), and therefore the intensity and nature of the application demands, can change dramatically over time. To facilitate the simulation and exploration of different workload scenarios, this paper presents the concept of a so-called scenario database, which has been integrated in our Sesame system-level simulation framework. This scenario database compactly stores application scenarios and allows for generating application workloads - in the form of event traces - belonging to the stored scenarios for the purpose of scenario-aware simulation in Sesame.

A Task-aware Middleware for Fault-tolerance and Adaptivity of Kahn Process Networks on Network-on-Chip
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=DeDi10
We propose a task-aware middleware concept and provide details for its implementation on Network-on-Chip (NoC). We also list our ideas on the development of a simulation platform as an initial step towards creating fault-tolerance strategies for Kahn Process Networks (KPN) applications running on NoCs. In doing that, we extend our SACRE (Self-adaptive Component Run-time Environment) framework by integrating it with an open source NoC simulator, Noxim. We also hope that this work may help in identifying the requirements and implementing fault tolerance and adaptivity support on real platforms.

An FPGA-Based Framework for Technology-Aware Prototyping of Multicore Embedded Architectures
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=MeSeRa10
The use of cycle-accurate software simulators as a foundation for the exploration of all the possible full-system hardware-software (hw-sw) configurations no longer appears to be a feasible way to handle the complexity of modern embedded multicore systems. In this letter, a field-programmable gate array (FPGA)-based cycle-accurate hardware emulation framework is presented and proposed as a research accelerator for the exploration of complete multicore systems. The framework provides the possibility to extract from the automatically instantiated hardware-emulated system a set of metrics for the assessment of the performance and the evaluation of the architectural tradeoffs, as well as the estimation of figures of power and area consumption of a prospective application-specific integrated circuit (ASIC) implementation of the considered architecture.

Exploiting FPGAs for technology-aware system-level evaluation of multi-core architectures
http://www.madnessproject.org/bebop/index.php?action=showcategory&by=ID&pub=SeMeRa10.2
The hardware-software co-development of modern complex MPSoC computing platforms exposes the designer to a huge complexity, resulting from the combination of vastly different architectural possibilities with strict demands posed by the target applications. To handle this complexity, highly accurate but rapid prototyping/evaluation environments need to be developed that can provide an effective measurement of the system under design as soon as possible, making it possible to meet current time-to-market demands. While software-based fully cycle-accurate simulators no longer seem to represent an adequate solution to this issue, attention has recently shifted to the adoption of hardware emulators in the early stages of the design flow. In this work, we present an emulation framework for library-based semi-automatic instantiation of complex multi-core platforms that exploits FPGA devices to provide detailed functional information on the platform under development, and at the same time uses hardware execution traces with technology-related analytical models to extract, already at system level, physical metrics on power consumption, maximum operating frequency and area occupation of a prospective ASIC implementation of the system. Two prospective use case scenarios are presented to validate the usefulness of the presented framework: the first one analyzes the mapping and the scalability of a highly parallel application over a 2D homogeneous mesh architecture for an increasing number of processors, while the second one employs the emulation infrastructure inside a design space exploration flow for the configuration of some interconnection network parameters.