The Journal of Systems and Software 123 (2017) 145–159

journal homepage: www.elsevier.com/locate/jss

Distributed architecture for developing mixed-criticality systems in multi-core platforms

Héctor Pérez a,*, J. Javier Gutiérrez a, Salva Peiró b, Alfons Crespo c
a Computers and Real-Time Group, Universidad de Cantabria, Santander, Spain
b Institute of Industrial Computing and Control Systems, Technical University of Valencia, Valencia, Spain
c Department of Computer Engineering, Technical University of Valencia, Valencia, Spain

Article info

Article history:
Received 9 October 2015
Revised 26 July 2016
Accepted 27 August 2016
Available online 5 October 2016

Keywords:
Real-time systems and embedded systems
Middleware
Application virtualization
Multi-core

Abstract

Partitioning is a widespread technique that enables the execution of mixed-criticality applications in the same hardware platform. New challenges for the next generation of partitioned systems include the use of multiprocessor architectures and distribution standards in order to open up this technique to a heterogeneous set of emerging scenarios (e.g., cyber-physical systems). This work describes a system architecture that enables the use of data-centric distribution middleware in partitioned real-time embedded systems based on a hypervisor for multi-core, and it focuses on the analysis of the available architectural configurations. We also present an application-case study to evaluate and identify the possible trade-offs among the different configurations.
© 2016 Elsevier Inc. All rights reserved.

* Corresponding author.
E-mail addresses: perezh@unican.es (H. Pérez), gutierjj@unican.es (J.J. Gutiérrez), speiro@ai2.upv.es (S. Peiró), acrespo@disca.upv.es (A. Crespo).
http://dx.doi.org/10.1016/j.jss.2016.08.088
0164-1212/© 2016 Elsevier Inc. All rights reserved.

1. Introduction

1.1. Motivation and current trends

When multiple functionalities are integrated in the same embedded platform, it is highly likely that some of them will be more critical to the survival of the system than others. Depending on the required degree of safety, distinct levels of criticality are identified by domain-specific standards such as DO-178B or ED-12B for avionics, IEC 880 for nuclear plants or IEC61400 for wind turbines. A mixed-criticality system represents a system that contains two or more software applications with different criticality, and where failures in higher-criticality applications may lead to unacceptable consequences (e.g., financial loss, environmental harm, personal disasters or severe damage to equipment). Hence, for instance, the safety concept for a wind turbine control system can be ensuring a safe operational state according to the design limits associated with the turbine. At the same time, this control system can provide other functionalities such as real-time control of the turbine, communications with maintenance operators or management of video-surveillance cameras.

Nowadays, the integration of mixed-criticality applications is becoming more and more popular in the development of complex embedded systems (MULTIPARTES, 2013). In some domains, this integration is often required to satisfy non-functional requirements related to cost, weight or power consumption. The development of this kind of application can be enabled by means of strict space and time partitioning such as that proposed by the ARINC-653 specification (Airlines Electronic Engineering Committee, 2006), which defines an API called APplication EXecutive (APEX) that allows multiple applications with different safety levels to be executed in the same hardware platform (core module in the context of ARINC-653). One promising approach to build partitioned systems is that based on a hypervisor (Han and Jin, 2014), a minimal layer of software with a low overhead that supports mixed-criticality partitions built on top of operating systems with different purposes.

At the same time, embedded computing is also shifting to multi-core architectures as they are able to provide higher computational capabilities in less space. A recent study estimates that about 45% of industrial embedded applications will rely on multiprocessor architectures beyond 2015, and up to 95% of them will integrate several criticality levels (Ernst, 2010). However, multiprocessor architectures still have to address some challenges to be used in safety-critical applications (MULTIPARTES, 2013; parMERASA, 2014).

Unlike traditional approaches for developing partitioned systems, hypervisor technology enables the support of the execution of multiple operating systems on the same hardware platform, which facilitates the use of commercial-off-the-shelf (COTS) components. Furthermore, as ubiquitous connectivity increasingly penetrates traditional domains such as automotive or energy
domains (MULTIPARTES, 2013), new mechanisms may be required to efficiently deal with the communication requirements of mixed-criticality applications in a transparent way. As a result of both trends, the use of middleware technology is starting to be seen as a potential solution (Dubey et al., 2011; Technical Standard for Future Airborne Capability Environment, 2014), although it still has to overcome the complexity traditionally associated with it. To this end, distribution standards are evolving towards safety-critical subsets of their full distribution facilities. This is the case of the safety-critical profile for the Data Distribution Service for Real-Time Systems (DDS) (Object Management Group, 2007), which is currently under development.

The DDS standard supports a comprehensive set of quality of service (QoS) parameters that allows fine control over non-functional properties. This has led to a large number of developments on distributed real-time applications using DDS, such as Kang et al. (2012) for cyber-physical systems, Hakiri et al. (2014) for cloud computing or Albano et al. (2015) for smart grids. Furthermore, the use of DDS in safety-critical systems is also attracting a high degree of interest. For instance, the Technical Standard for Future Airborne Capability Environment (2014) (FACE) aims to bring software interoperability to avionic systems through open standard solutions, and it includes DDS as one of the suitable candidates to develop FACE-compliant distributed systems.

1.2. Key challenges and contributions

As a result, handling distinct levels of criticality in software systems from different industrial sectors can be a typical scenario in the future of embedded systems engineering. However, even when the adoption of multi-core and distribution middleware provides significant benefits to mixed-criticality systems, some challenges remain open and should be further analysed. From the communication perspective, the main challenges are:

Increasing the communication responsiveness of partitioned systems. Traditional approaches to build ARINC-like partitioned systems often suffer from a significant loss of performance in communications as a result of time partitioning (Pérez and Gutiérrez, 2016). This loss of responsiveness in communications may become an impediment in taking advantage of the benefits of partitioning in modern complex embedded systems from different industries, as they are increasingly networked and might even require global connectivity. In this context, enhanced communication mechanisms are needed to develop distributed partitioned systems suitable to satisfy different safety, real-time and communication requirements. The use of multi-core platforms opens up the possibility of executing several partitions in parallel, which may serve as a basis for the development of these new communication mechanisms.

Use of standard communication middleware in partitioned systems. The emerging safety-critical profiles of distribution standards and the adoption of hypervisor technology may facilitate the use of COTS middleware in ARINC-like partitioned systems. However, this development is not straightforward and requires further analysis, as the virtualization of the network resources may compromise the use of distribution middleware on top of the hypervisor technology.

To address these challenges, this paper identifies feasible architectural configurations that enable the use of middleware and hypervisor technologies in multiprocessor systems. In particular, this work makes the following contributions:

• Identification and analysis of challenges for partitioned distributed systems based on multi-core platforms
  This work identifies the communication challenges which need to be addressed when integrating multi-core architectures in partitioned distributed systems, and it also explores how to address them in order to increase the communication responsiveness of this kind of system without compromising its isolation features. Furthermore, this paper outlines the restrictions that distribution middleware should cope with in order to interconnect partitions regardless of whether virtual or physical communication resources are used.
• Assess the feasibility of the approach
  This work presents the proposal of a partitioned distributed real-time platform that integrates standard distribution middleware and hypervisor technologies. Furthermore, a representative case-study in the energy domain is included as a proof of concept in order to estimate the overheads incurred by our approach, as well as to identify the possible trade-offs among the proposed architectural configurations.

1.3. Paper organization

The remainder of this document is structured as follows. Section 2 presents the related work. The basic concepts of DDS and the hypervisor technology used in this work are introduced in Section 3. Section 4 explores the different system architectures that enable the use of standard distribution middleware and partitioning over multi-core architectures. The distributed real-time platform for partitioned multi-core systems is presented in Section 5. Then, Section 6 details a case-study used to evaluate the effectiveness of the proposed approach. Finally, Section 7 draws the conclusions and outlines the future work.

2. Related work

Over recent years, an increasing amount of research has been done on the use of distribution middleware in mixed-criticality environments. For instance, the European Space Agency (ESA) has elaborated a set of tools (Perrotin et al., 2010) to support the development of safety-related applications in the aerospace domain. This set of tools relies on a minimal middleware implementation which can be tailored to each target application through code-generation tools. Other approaches are based on distribution standards to boost interoperability among heterogeneous systems. For instance, Dubey et al. (2011) propose a component-based software construction using Java, and Higuera-Toledano (2012) deals with CORBA Component Model and ARINC-653.

The growing acceptance of the DDS standard in the development of distributed systems has given rise to several research contributions that investigate its design trade-offs. Hence, there are several works in the literature evaluating various performance parameters such as latency, throughput and jitter (Almadani et al., 2015; Xiong et al., 2007). Regarding the use of DDS middleware in mixed-criticality environments, the authors in Sommer et al. (2013) and Dubey et al. (2012) apply this distribution standard for communications in the automotive and aerospace domains, respectively. Additionally, Putra and Kim (2015) deal with DDS in a naval combat management system, while Marquardt et al. (2015) propose the adoption of DDS in fault-tolerant avionic systems. Nevertheless, the use of DDS in applications with high levels of criticality is still an open challenge under research which is mainly being addressed through the adaptation of DDS to ARINC-653-compliant partitioned systems. For instance, the work in Karoui and Corsaro (2011) explores the architectural similarities between DDS and ARINC-653, and Zhou et al. (2015) analyses how DDS can provide support for the ARINC-653 communication services. Even though all these works provide insights into the feasibility of applying DDS in mixed-criticality environments, our approach takes a step forward and explores the use of multi-core
architectures. Furthermore, our proposal relies on virtualization mechanisms via a hypervisor instead of the traditional kernel model for time and space partitioning, which enables the use of COTS software on the proposed platform.

The combination of DDS and virtualization technologies was previously addressed in Cho et al. (2013) and Garcia-Valls and Basanta-Val (2016). While the former research explores the use of DDS to interconnect virtual resources on heterogeneous hypervisors, the latter work analyses the overhead of using DDS in a virtualized environment. Unlike our approach, both works are based on general-purpose virtualization technology instead of relying on certifiable virtualization tools. More generally, a complete survey on challenges for real-time virtualization in safety-related systems can be found in Gu and Zhao (2012).

The increasing popularity of virtualization for multi-core environments has spawned interesting research on real-time embedded systems. A review of the virtualization technologies can be found in SCOPE Alliance (2008). The use of partitioned systems for real-time applications was initiated in the avionics sector (Rushby, 1999), and extended to the space (Windsor and Hjortnaes, 2009) and automotive (Heinecke et al., 2004) domains. Hierarchical scheduling techniques for mixed criticality systems have been one of the topics that has attracted the attention of researchers (Baruah et al., 2014; Burns and Davis, 2014; Guasque et al., 2016). Other topics such as core interference due to shared resources (e.g., cache, memory, bus, and other devices) (Cilku and Puschner, 2013; Kim et al., 2014) or I/O management (Muench et al., 2013; Münch et al., 2015) have also been studied. Several publicly funded research projects have been working on these topics from different perspectives. For instance, DREAMS (2014) and MULTIPARTES (2013) aim at facilitating the certification process; parMERASA (2014) focuses on performance optimization; and CONTREX (2014) addresses other non-functional properties such as energy efficiency.

The distributed architecture for multi-core platforms proposed in this paper is built upon the XtratuM hypervisor. The main advantage of this choice is the availability of the sources and the level of maturity, as it is used in different research (MULTIPARTES, 2013; DREAMS, 2014; OVERSEE, 2013) and industrial projects (Galizzi et al., 2012). This has led to XtratuM achieving a first level of certification for safety applications. There are other commercial solutions like PikeOS, VxWorks or QNX (Gu and Zhao, 2012), but the license costs are very high and the availability of source code is not considered. Notable examples within the open source community include POK, OKL4 and XEN. While the first is no longer an active project, the second is not directly applicable in safety-critical scenarios (Eklund and Bosch, 2014), and the third provides support for different scheduling policies at partition level, although strict cyclic scheduling is not yet supported for multi-core systems (Xi et al., 2014).

3. Background

3.1. Overview of DDS

The DDS standard (Object Management Group, 2007) defines a decentralised middleware architecture for anonymous, asynchronous and decoupled communications among publishers (i.e., suppliers of data) and subscribers (i.e., sinks of data). It is based on a global data space where data may flow from one or many publishers to one or many subscribers. The data exchanged within the global data space are defined by means of topics, and subscribers require registration of their interest in receiving particular topics. When new data is supplied by publishers, the middleware is responsible for propagating the new information transparently to all interested subscribers. Furthermore, publishers and subscribers are required to join a specific domain, and only entities belonging to the same domain can communicate. The main properties associated with DDS applications include:

• Loose coupling. Publishers and subscribers are loosely coupled in terms of time (i.e., data samples could be stored and retrieved later) and space (i.e., middleware dynamically detects the presence of new DDS entities through a special service called discovery).
• Interoperability. DDS defines a standardized protocol through the DDS Interoperability Wire Protocol Specification (DDSI) (Object Management Group, 2010). This specification defines a set of exchange information protocols and message formats to guarantee that different DDS implementations can interoperate. Although DDSI is particularly oriented to using the UDP/IP protocol, it does not preclude the use of other transport protocols.
• Fine control over non-functional properties. The QoS policies available in DDS allow several aspects of data, networks and computing resources to be configured. For instance, the History and Destination_Order QoS parameters deal with how many data samples can be stored before delivery and the order in which they should be read, respectively.

Another key feature of DDS is the aforementioned Discovery service, which is responsible for obtaining information about the characteristics of any other entity within the distributed system. This kind of information is usually referred to as discovery data. The current version of the standard only includes a dynamic discovery protocol, although it also allows static approaches to be implemented.

3.2. Overview of XtratuM

XtratuM (Crespo et al., 2010) is a bare-metal open source hypervisor designed for real-time embedded systems. It uses para-virtualization techniques to offer virtual machines to the execution environments. The para-virtualized model offers potential performance benefits when a guest operating system or application is aware that it is running within a virtualized environment, and it has been modified to exploit this. This hypervisor has been designed with simplicity in mind to be able to assess and validate its correctness. Key points of this simplicity are the minimal API, non-reentrant code, fault/interrupt isolation, etc. This special design impacts the remaining properties of XtratuM, which can be summarised as:

• Spatial isolation, as a partition is completely allocated to isolated memory regions which are inaccessible by the remaining partitions. Shared memory areas are allowed if they are statically defined at configuration time.
• Temporal isolation, as each partition is executed within specified and fixed temporal intervals. The allocation of temporal intervals to a partition is not impacted by the execution of other partitions, although shared resources could produce interference in the execution time of its activities.
• Robust inter-partition communication mechanisms, which are based on sampling and queuing channels as defined in ARINC-653. The hypervisor is responsible for the management of channels in order to store and deliver messages from/to partitions that use the concept of port to send/receive them. Moreover, a notification service is generated by the hypervisor to notify partitions about incoming messages.
• Static resource allocation. The system architect is responsible for the system definition and resource allocation. This system definition is detailed in the system's configuration file, which specifies all system resources, namely number of CPUs, memory layout, peripherals, partitions, the execution plan of
each CPU, etc. Each partition is characterized by its memory partitions involved in a communication are allocated to different
regions, communication ports, temporal requirements and cores, the hypervisor has to implement mechanisms to protect
other resources that are needed to execute its code. Static and synchronise memory accesses, based on the double buffering
resource allocation is the basis of the systems predictability technique. The interconnection of partitions belonging to different
and security. The hypervisor has to guarantee that a partition core modules also relies on these communication channels via a
can access the allocated resources and deny the requests to the communication network.
resources allocated to other partitions. XtratuM does not implement drivers at the hypervisor level
Fault isolation and management. This is a key aspect in critical in order to keep the trusted computing base (TCB) small and
systems, as faults have to be detected and handled properly veriable, and so partitions are in charge of implementing them.
in order to maintain the isolation properties. To this end, the When a device is used by several partitions, only one of them can
hypervisor implements a Health Monitor module that detects, be congured to exclusively access and manage the device, and
handles and avoids the propagation of system faults. When a this partition should provide the means (e.g., via inter-partition
fault is detected, the hypervisor identies the partition that communication mechanisms) to enable the rest of partitions to
has generated the fault and applies an action that has been make use of the shared device. This separation technique is widely
specied in the conguration le. applied in the embedded context, where this special partition is
Security. All the information in the system has to be protected known as the I/O partition.
against access and modication from unauthorized partitions XtratuM can associate a scheduling policy to each core or
or unplanned actions. Security implies the denition of a set group of cores. Two scheduling policies are implemented: cyclic
of elements and mechanisms that allow the system security and priority-based scheduling. The policy is statically specied in
functions to be established. This property is strongly related the conguration le. In cyclic scheduling, the conguration le
with the static resource allocation and a fault handling model species the temporal windows or slots in a major frame (MAF)
to identify and conne the vulnerabilities of the system. where partitions will be scheduled. In the case of priority-based
Predictability, as the services are predictable and their worst scheduling, the hypervisor selects the partition with the highest
case for a specic platform can be measured. priority among all those partitions ready to execute (i.e., this
policy is preemptive and therefore it allows switching to higher
XtratuM has been adapted to multi-core systems (Carrascosa priority partitions at any time).
et al., 2014) based on LEON4, LEON3-bicore and x86 processors
in the MULTIPARTES (2013) project. In the multi-core approach, 4. System architecture
the hypervisor can provide the partitions with several virtual
CPUs. Partitions can be mono or multi-core. Different partitions Partitioning represents a convenient approach to enable parti-
(from the point of view of the number of cores) can coexist in tions to be certied in isolation to their specic level of criticality.
the system. This approach allows multi-cores to be exploited and However, partitions may need to communicate with each other,
mono-core partitions to be executed on a multi-core platform. In and they may also require access to shared devices such as net-
order to handle the underlying multi-core hardware, two software work cards. These aspects cause dependencies among partitions
architecture alternatives can be used in XtratuM: (1) Asymmetric which may inuence both the space and the time partitioning
Multiprocessing (AMP), where an instance of the hypervisor is (Rierson, 2013):
running on each core, which executes the allocated partitions and
has exclusive access to its hardware resources; and (2) Symmetric Spatial partitioning requires mechanisms to ensure autho-
Multi-Processor (SMP), in which a single instance of the hypervisor rized transmission of data from one partition to another.
manages all hardware resources and therefore spin-locks are When the inter-partition communication is within the same
required for implementing ecient mutual exclusion mechanisms. core-module, this issue is addressed by means of the commu-
While the AMP software architecture simplies the hypervisor, the nication channels dened in ARINC-653, and the hypervisor
SMP approach permits the use of mono or multi-core execution acts as the trusted intermediary between partitions. When the
environments, and offers greater exibility to assign partitions communication is between partitions belonging to different
to different cores. Moreover, mono-core partitions in SMP archi- core-modules, another mechanism is also required to manage
tectures can be permanently allocated to one core, or to several the shared access to the network interface card (NIC). This
cores on different partition activations with no temporal overlap. mechanism can be implemented at the hypervisor-level or at
Henceforth, this paper focuses on SMP architectures and leaves partition-level: while the rst approach would increase the
AMP architectures for future work. complexity and size associated with the hypervisor, the second
In a partitioned environment, XtratuM is in charge of the in- one is more exible since it allows device drivers already
terrupt management and offers partitions a virtual interrupt table implemented in the OS to be used. As commented in Section 3,
and a set of services to interact with virtual interrupts such as XtratuM follows the second approach so sharing a NIC among
enable, disable, mask, unmask, etc. The virtual table also includes multiple partitions should focus on handling the contention by
the traps and exceptions generated by the processor which cannot dening an I/O partition with exclusive access to the network
be manipulated by the partitions mimicking the real hardware. card. As a result, this I/O partition acts as an intermediate
Additionally, the hypervisor denes new extended virtual interrupts trusted entity that is aware of the authorized inter-partition
to deal with partitioned systems. Some of these new virtual communications.
interrupts are related to the start of the execution window or the Inter-partition communication can also inuence time parti-
arrival of inter-partition messages. tioning by means of network interrupts and the shared access
The inter-partition communication relies on sampling and to the NIC. In mono-core environments, interrupts will be gen-
queuing channels (as dened in ARINC-653) provided by the erated as soon as the data arrives and they should be latched
hypervisor, for sending or receiving messages. Partitions can only until the scheduled execution of the I/O partition, which will
access ports for performing these operations and they do not have subsequently propagate the message to the target partition.
direct access to the message buffers so the transmission layer is Therefore, this partition should be executed with sucient
implemented by the hypervisor. In this case, the hypervisor in- regularity to full the I/O requirements of other partitions,
stance handles the message buffers for communication. When the which may even jeopardise the overall performance of the dis-
H. Prez et al. / The Journal of Systems and Software 123 (2017) 145159 149

Fig. 1. System architecture for distributed partitioned applications.

tributed system (Prez and Gutirrez, 2016). The step towards dising temporal isolation. The main contribution of this section is
multi-core environments may mitigate this issue, as it enables analysing the use of the hypervisors virtual interrupt model to ad-
the I/O partition to be in execution when the interrupts arrive. dress temporal interference problems in a distributed architecture.
In the virtual interrupt model (Crespo et al., 2010), the parti-
As a consequence of these issues, a set of challenges still re-
tions declare the handled interrupts in the systems static cong-
mains open in order to address the integration of distribution mid-
uration le, and the hypervisor manages hardware interrupts and
dleware, partitioned systems and multiprocessor architectures. One
propagates the congured virtual interrupts to the partitions, thus
important challenge to address in communications for partitioned
ensuring that partitions have no access to hardware interrupts.
environments is the overhead associated with time partitioning.
When a partition is scheduled, the interrupts that are not
When multiple partitions require access to the communication ser-
dened in the conguration le for this partition are disabled in
vices, time partitioning may cause extra delays in the processing of
order to avoid interference among cores. Otherwise, the hypervisor
incoming messages, either due to the delivery of interrupts or due
propagates the pending virtual interrupts generated when the
to the concurrent access to shared physical network resources. To
partition was not under execution. As a result, when a hardware
increase the communication responsiveness of partitioned systems
interrupt occurs, it is rstly handled by the hypervisor and subse-
in multi-core platforms, enhanced mechanisms should be adopted
quently propagated to the virtual interrupt table of the partitions
for the management of communication interrupts and for minimiz-
(Crespo et al., 2010). Partitions have to take into account that
ing the potential bottleneck associated with I/O operations.
at the beginning of their slot some pending interrupts can be
Furthermore, the use of standard distribution middleware
in partitioned systems must address the challenge of providing interoperability among heterogeneous systems. To this end, our approach should facilitate the integration of distribution middleware on top of the ARINC-like communication service by interconnecting partitions via the standard communication mechanism proposed by DDS, as this would enable the interoperability of partitions regardless of whether they are running within the same core-module or not and even with open systems (i.e., non-partitioned systems).

The selected strategy to address these issues may determine the configuration of some architectural aspects of the distributed partitioned platform. Fig. 1 presents an overview of the integration of distribution middleware, partitioned systems and multiprocessor architectures. It is composed of a set of key elements which will be detailed throughout this section.

4.1. Communication challenges in partitioned distributed systems based on multi-core platforms

A. Interrupt management at hypervisor level

Interrupts are a known source of temporal indeterminism, which negatively affects temporal isolation (Rierson, 2013). To address this problem, the XtratuM hypervisor provides a virtual interrupt model (see Section 3) to prevent partitions from jeopar-

present, and these interrupts should be enabled by the partition in order to detect them.

In mono-core systems, partitions are scheduled sequentially. However, multi-core systems enable the parallel scheduling of partitions in multiple cores. In the multi-core scenario, a core can be dedicated to implement the I/O server partition and therefore, when a partition sends a message to another partition that is being executed at the same time in another core, the message should be delivered as soon as it is sent. For this purpose, XtratuM provides the inter-processor virtual interrupts (IPVI) mechanism to implement communication and event notification between different virtual CPUs. These represent a special kind of interrupt used for the notification of specific partitioning events such as the arrival of an inter-partition message in multi-core systems.

In the case of distributed systems, there is likely to be a continuous exchange of inter-partition messages among the I/O partition and the remaining partitions with communication requirements. In this context, the delivery of the extended interrupts associated with the inter-partition messages should be performed as soon as they are produced. To address this issue in a multi-core platform, our approach propagates an IPVI each time an inter-partition message is sent. As a consequence, the receiving partition should be interrupted, if enabled by the partition, and the message from the communication port should be read.
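As an illustration of the notification path, the following minimal Python sketch models an IPVI being raised each time an inter-partition message is sent, together with the first-request-only propagation that this section adopts against interrupt storms. It is not XtratuM code: the class `VirtualInterruptController`, the function `send_message` and the line name `IPVI_0` are hypothetical names chosen for the example, and clearing the pending register in a single read is a simplifying assumption.

```python
class VirtualInterruptController:
    """Toy per-partition virtual interrupt controller (illustrative only)."""

    def __init__(self):
        self.enabled = set()     # lines the partition has enabled
        self.pending = set()     # virtual "interrupts pending" register
        self.propagated = []     # log of interrupts actually delivered

    def enable(self, line):
        self.enabled.add(line)

    def raise_irq(self, line):
        """Signal `line`; return True only if it is propagated to the partition."""
        first = line not in self.pending
        self.pending.add(line)               # every request is recorded as pending
        if first and line in self.enabled:
            self.propagated.append(line)     # only the first request interrupts
            return True
        return False                         # repeated requests are absorbed

    def read_and_clear_pending(self):
        """What a partition would do at the beginning of its slot (assumption)."""
        lines = sorted(self.pending)
        self.pending.clear()
        return lines


def send_message(port, payload, receiver_vic, ipvi_line="IPVI_0"):
    """Write to an inter-partition port and notify the receiver with an IPVI."""
    port.append(payload)
    return receiver_vic.raise_irq(ipvi_line)


# Demo: a burst of 1000 messages interrupts the receiver only once.
vic = VirtualInterruptController()
vic.enable("IPVI_0")
port = []
results = [send_message(port, n, vic) for n in range(1000)]
pending_at_slot_start = vic.read_and_clear_pending()
after_drain = send_message(port, "next", vic)
```

In this sketch no message is lost during the burst (all of them remain in the port), but the receiving partition is interrupted once and then handles the backlog when it inspects the pending register.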
In addition, it is necessary to address the worst-case scenario of an interrupt storm, where a faulty hardware device, core or partition can endlessly start triggering interrupt requests (IRQs, or equivalently IPVIs), thus affecting the temporal isolation of the system. To address this worst-case scenario, the solution adopted only propagates the first IRQ, while the following IRQs are marked as pending in the virtual interrupt controller. Consequently, the partitions are in charge of checking the virtual interrupts pending register at the beginning of the partition slot to manage pending interrupts.

B. The configuration of the I/O partition

The configuration of the time window allocated to each partition represents a key step in the design of distributed partitioned applications. This is not a trivial problem, and it is even harder with inter-partition dependencies as in the case of the I/O partition, which should be executed with sufficient regularity to fulfil the I/O requirements of other partitions. Under certain network workloads, this may require a significant increase in the slot time assigned to the I/O operations, which can influence the overall performance (Pérez and Gutiérrez, 2016). To overcome the potential bottleneck caused by the I/O partition, the following three strategies are explored:

1. The use of a multiprocessor approach. This allows one core to be exclusively assigned for communications, which could avoid the extra delays inherent to the time window configuration (see Fig. 1). In this case, the I/O partition can process the messages coming from the NIC or from other partitions at any time. It should be remembered that partitions which are allocated to different cores may be running simultaneously, and thus inter-partition messages may be delivered as soon as they are produced. Otherwise, partitions are running concurrently but not in parallel, and thus inter-partition messages are not delivered until the next execution of the reader partition (which is determined by the selected scheduling policy).
2. The application of new scheduling policies at partition level. This enables the use of priority-based scheduling to determine the order in the execution of partitions. Although this would enable the execution of the I/O partition to be prioritised even in mono-core systems, this solution jeopardises the time isolation feature required for safety-critical software. Nevertheless, it could be of interest for those scenarios where a high level of criticality is not required, as the I/O partition may share the CPU with other background activities.
3. The combination of scheduling policies in a multiprocessor approach. The coexistence of cyclic and priority scheduling at the same time could be feasible as long as a physical core is only assigned to a single scheduling policy. For instance, a set of cores can execute partitions under a cyclic scheduling policy, while one core can be in charge of the I/O partition under a fixed-priority scheduling policy. This approach allows time and space isolation to be guaranteed in a set of cores and thus it may execute partitions with certification requirements.

Fig. 2. Example for the partitioned distributed system.

To better explain the bottleneck associated with the communications and its possible solutions, we rely on a simple example of a partitioned system using the aforementioned I/O partition. Fig. 2 illustrates this example where Node 1 gets data from a peripheral and sends Message #1 to Node 2 in order to process them. As a result of the processing, Message #2 is sent to Node 3 to commit an actuation. Without loss of generality, our analysis is focused on Node 2. When partitioning is used, Message #1 is received by the I/O partition, which is in charge of forwarding it to the Process_Data partition as shown in Fig. 2. For this example, all the processing in Node 2 presents a deadline of 10 units of time, which is computed from the time Message #1 is received until Message #2 is transmitted. Additionally, the worst-case execution times for the I/O and the data processing operations are 2 and 3 units of time, respectively. From the viewpoint of XtratuM, we propose three partition-based configurations:

• Cyclic-based partitioning for a mono-core system (Config. #1)
This approach is depicted in Fig. 3, where the whole capacity of the single available CPU is used to run the proposed example. Hence, the application has been configured to have a dedicated time window of 2 units of time for the I/O partition, and 3 units of time for the Process_Data partition, resulting in a scheduling plan repeated every 5 units of time (this is the MAF). Therefore, the I/O partition is executed regularly to check for communication interrupts. Once the interrupt has been triggered, the message may or may not be fully processed within the current time slot. The latter case may cause a significant delay in the processing of the incoming message, as the Process_Data partition must wait until the next time slot to process the message. Fig. 3 shows a scenario where Message #1 arrives before the end of the slot allocated for the I/O partition and cannot be processed completely; this causes a worst-case response time of 11 units of time and therefore the deadline is missed. This configuration represents a common scenario in current partitioned systems, and it will be our reference case for comparison.

• Cyclic-based partitioning for a multi-core system (Config. #2)
This configuration allows the I/O partition to be allocated as a standalone partition in Core #1. Thus, the communication interrupts are processed by this partition as soon as they have been triggered. However, the Process_Data partition may have to wait until the next slot as shown in Fig. 4A, where the message from the I/O partition is generated while other activities
can be executed in a third partition allocated in Core #2. This causes a delay in the processing of the message, but the system is still able to meet the deadline, as the worst-case response time is 9 units of time. Therefore, the overhead associated with the cyclic-based scheduling for mono-core systems is partially mitigated in this configuration due to a higher responsiveness for the I/O operations.

Fig. 3. Cyclic-based partitioning for a mono-core system.

Fig. 4. Partition-based configurations for a multi-core system: (A) cyclic-based partitioning; (B) priority-based partitioning. Both panels show the I/O partition in Core #1 and the Process_Data partition in Core #2 over the MAF, with time marks at t = 1, 3, 8 and 10 for Message #1 and Message #2 (deadline = 10, WCRT = 10 - 1 = 9).

• Priority and cyclic-based partitioning for a multi-core system (Config. #3)
In this case, the I/O partition can be configured with the highest priority and therefore it is executed as soon as the communication interrupt is triggered by XtratuM. As in Config. #1 and #2, the Process_Data partition may have to wait until the next slot to process the incoming message. This configuration provides partitioned systems with a lower overhead for the I/O operations even when the core running the I/O partition is shared with other background activities (see Fig. 4B, where the worst-case response time is 9 units of time). This configuration may imply a higher complexity in the partitions' code associated with Core #1, as they have to control their execution state (i.e., changing to an idle state after finishing their execution).

Another possible configuration would be using a priority-based partitioning for the entire system. This solution not only jeopardises the time isolation feature, but also increases the complexity of the partitions' code as described for Config. #3.

A comprehensive summary of the available configurations for the allocation and scheduling of I/O partitions and their main features is shown in Table 1. This table shows how using a multi-core platform together with the appropriate mechanisms can assist in increasing the responsiveness in communications by minimizing the overhead associated with time partitioning. It is worth noting that the two proposed configurations for multi-core may enable the execution of partitions with a high level of criticality, but using a different scheme. For Config. #2, one core must be exclusively assigned to communications, while the remaining cores can execute partitions with different levels of criticality in a similar way to the traditional approach based on mono-core. Config. #3 adds flexibility in the development of the entire system by supporting the execution of multiple partitions in each core, but at the cost of jeopardising the temporal isolation in the core where communications are allocated. Despite this issue, high criticality partitions may still be executed in any of the remaining cores, as long as temporal isolation is guaranteed in these cores.

4.2. Challenges for distribution middleware in partitioned environments

One of the most important challenges that distribution middleware should overcome to be used in safety applications is complexity. As noted previously, this has led to the need for evolving towards safety-critical subsets of their full distribution facilities (e.g., the future safety-critical profile for DDS). A safety-critical subset of DDS would allow certifiable partitions to be interconnected via a standard communication mechanism, and it would also facilitate the interoperability among heterogeneous systems regardless of whether they are partitioned or not. The current DDSI specification only considers UDP/IP as the underlying transport protocol, whose use has even been adopted for safety-critical environments under certain restrictions (Airlines Electronic Engineering Committee, 2005). Therefore, the use of UDP/IP on top of the port-based communication service defined by ARINC-653 would facilitate multiple implementations to interoperate. However, the use of DDSI on top of the ARINC-653 communication service could also be worthy of consideration, as shown in Fig. 1 (i.e., bypassing the UDP/IP protocols). In any case, interoperability among systems with different communication services can be guaranteed by using the I/O partition as a bridge between transport protocols.
Table 1
Summary of available configurations for the I/O partition (deadline and worst-case response time in units of time).

Configuration   Platform     Deadline   Worst-case response time   Remarks
CONFIG #1       mono-core    10         11                         Used as reference case
CONFIG #2       multi-core   10         9                          One core strictly dedicated to communications
CONFIG #3       multi-core   10         9                          Multiple partitions on each core
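
The response times listed for Config. #1 and Config. #2 can be reproduced with a small discrete-time sketch in Python. It assumes the scenario of Figs. 3 and 4A: Message #1 arrives at t = 1, the I/O partition needs 2 units to receive and 2 units to send, and Process_Data needs 3 units. The window layout used for Core #2 in Config. #2 (Process_Data runs in [5, 10) of a 10-unit plan) is our assumption, chosen to match the time marks of Fig. 4A; the function names are illustrative.

```python
def run_stage(ready, wcet, windows, period):
    """Completion time of a stage needing `wcet` time units from time `ready`.

    `windows` lists (start, end) offsets inside one `period` during which the
    owning partition may run; None means the partition always owns its core.
    Time advances in whole units, as in the example of the text.
    """
    t, remaining = ready, wcet
    while remaining > 0:
        if windows is None or any(s <= t % period < e for s, e in windows):
            remaining -= 1      # the partition executes during [t, t + 1)
        t += 1
    return t


def response_time(arrival, io, data):
    """Message #1 received -> I/O (2) -> Process_Data (3) -> I/O (2) -> Message #2 sent."""
    t = run_stage(arrival, 2, *io)
    t = run_stage(t, 3, *data)
    t = run_stage(t, 2, *io)
    return t - arrival


DEADLINE = 10

# Config. #1: single core, MAF = 5, I/O in [0, 2) and Process_Data in [2, 5).
config1 = {"io": ([(0, 2)], 5), "data": ([(2, 5)], 5)}

# Config. #2: I/O alone on Core #1; ASSUMED plan for Core #2 (see lead-in).
config2 = {"io": (None, 1), "data": ([(5, 10)], 10)}

r1 = response_time(1, **config1)   # 11 units: deadline missed
r2 = response_time(1, **config2)   # 9 units: deadline met
```

The sketch only evaluates the arrival instant shown in the figures; it is not an exhaustive worst-case analysis over all possible arrival times.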

The integration of ARINC-653 communication services into DDS would require addressing the restrictions imposed by this kind of partitioned systems (Pérez and Gutiérrez, 2014). To this end, DDS applications in partitioned systems should be configured to be consistent with ARINC-653, and they would require:

• Single source of messages. Traditional middleware architectures often rely on the many-to-one communication model. For instance, DDS implementations may use a single receiving port for the processing of several topics, the implementation of redundancy of publishers or the automatic discovery of entities in the distributed system. In this context, a single DDS destination port can receive data from several sources. However, ARINC-653 communications are based on the channel concept, which interconnects a single source with one or more destinations. As a result, middleware for partitioned systems should implement new mechanisms to address the use of shared receiving ports at middleware-level (e.g., as shown in Pérez and Gutiérrez, 2014).
• Support for two modes of transfer. ARINC-653 defines sampling and queuing communication channels. The main difference is that received messages always overwrite previous data in sampling channels, whereas a bounded number of messages can be buffered in queuing channels. In DDS, message buffering is supported through the History QoS parameter (Object Management Group, 2007), which allows the setting of the maximum number of data samples that can be buffered. Therefore, this parameter should be consistent with the underlying ARINC-653 communication mode. Furthermore, messages in queuing channels are buffered according to a FIFO order and therefore DDS should mimic this behavior. To this end, the Destination_Order QoS parameter (Object Management Group, 2007) must order data based on the reception time.
• Static configuration of communication resources. DDS is designed to be adaptable to changes in the distributed system, but the communication resources for partitioned systems are non-changeable and they are defined at configuration-time. Hence, the QoS parameters must only be set before the corresponding entity is enabled.
• Bounded size of messages. Inter-partition messages may have a variable length, but the maximum length must be bounded at configuration time. As DDS messages can include both user and protocol data, middleware implementations should preserve the boundary of messages defined by the ARINC-653 configuration.
• Authorized inter-partition communications. While ARINC-653 ensures that all the data flows among partitions are defined at configuration time, DDS is designed for the automatic and dynamic discovery of entities throughout the distributed system. In a partitioned distributed system with static connections, the discovery data should also rely on a static scheme (Dubey et al., 2012), in which publishers and subscribers are manually asserted by the partition developer (i.e., only asserted entities may be discovered). Hence, the data associated with the discovery of entities is bounded and restricted to the initialization phase. Furthermore, the discovery information can be automatically generated from the ARINC-653 configuration.

Additionally, the virtualization of the network resources may also compromise the adoption of distribution middleware in partitioned systems, as briefly discussed below.

A. Operating system support

In a distributed system partitioned with XtratuM, the I/O partition has exclusive access to the NIC while the remaining partitions must rely on the communication services provided by the hypervisor. To minimize the complexity in the development of the platform, our approach should provide a homogeneous network interface to the middleware layer regardless of whether it has exclusive access to the NIC or not. It would enable ARINC-like communication services to be configured without modifying the DDS implementation. This is the purpose of the virtual network card (V-NIC), an entity that allows messages to be sent and received by partitions (see Fig. 1).

The V-NIC entity is implemented at operating system level and it can make use of any of the communication facilities provided by the hypervisor. Underneath this virtual device, the hypervisor will provide the required mechanisms for transporting the messages through an entity called the virtual network, which is described below.

B. Low-level communications design

In a distributed partitioned system, communications are performed at two levels:

• Communications between partitions through the virtual network provided by the hypervisor
In XtratuM, the interconnection of partitions is performed through a set of communication services which are grouped in the virtual network entity shown in Fig. 1. This entity represents a logical and isolated network responsible for connecting the V-NICs that belong to a single core module. Partitions with safety-related requirements should rely on the ARINC-like communication service with pre-defined connections. However, low-criticality partitions may rely on other communications services such as XMIO (Pérez and Gutiérrez, 2016) or the use of pre-allocated areas of shared memory, as these services are suitable for open systems with variable workload and/or dynamic connections. In this case, memory isolation properties are guaranteed by XtratuM in the sense that the memory areas to be shared and the specific partitions which are allowed to access them are explicitly configured by the user.

• Communications between core modules via the communications network
In this case, the I/O partition is responsible for redirecting messages from the remaining partitions within the same core module to the communications network. When multiple mixed-criticality partitions are accessing the NIC, the I/O partition should be certified at the same level as the highest level of criticality required by the partitions that use these I/O services. Therefore, it would be more appropriate that this partition routes received messages according to a previously generated configuration table in safety-critical environments. Another option relies on forwarding the network messages by means of the DDS facilities, as shown in (Pérez and Gutiérrez, 2016). Under this approach, messages are not opaque and can be processed by the I/O partition in order to provide other kinds of features (e.g., fault-tolerance or scalability). However, this option would require a certifiable DDS implementation.
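
The table-driven forwarding option, combined with the sampling and queuing modes and the bounded message sizes required above, can be pictured with the following Python sketch. It is only a model of the idea and does not reproduce the ARINC-653 or XtratuM interfaces: the classes `SamplingPort` and `QueuingPort`, the port names, the UDP port numbers and the policy of rejecting a message when a queue is full are all assumptions made for the example.

```python
from collections import deque


class SamplingPort:
    """Keeps only the latest message: new data overwrites previous data."""

    def __init__(self, max_size):
        self.max_size, self.value = max_size, None

    def write(self, msg):
        if len(msg) > self.max_size:
            return False               # exceeds the configured message bound
        self.value = msg
        return True

    def read(self):
        return self.value


class QueuingPort:
    """Buffers at most `depth` messages and returns them in FIFO order."""

    def __init__(self, max_size, depth):
        self.max_size, self.depth, self.queue = max_size, depth, deque()

    def write(self, msg):
        if len(msg) > self.max_size or len(self.queue) >= self.depth:
            return False               # oversized or full: rejected (assumption)
        self.queue.append(msg)
        return True

    def read(self):
        return self.queue.popleft() if self.queue else None


# Hypothetical static configuration, fixed before the system starts.
PORTS = {"control_cmd": QueuingPort(1024, 2), "wind_speed": SamplingPort(64)}
ROUTES = {7400: "control_cmd", 7401: "wind_speed"}   # UDP port -> partition port


def forward(udp_port, payload):
    """I/O partition forwarding: a table lookup, unknown destinations dropped."""
    name = ROUTES.get(udp_port)
    return PORTS[name].write(payload) if name else False


# Demo of the two transfer modes and of the static routing.
forward(7401, b"12.5")
forward(7401, b"13.1")                                   # overwrites 12.5
queued = [forward(7400, m) for m in (b"a", b"b", b"c")]  # third one rejected
dropped = forward(9999, b"x")                            # no route configured
```

Because the payload is never inspected, the forwarding code stays simple, which is in line with keeping the I/O partition small when it has to be certified at the highest criticality level of its users.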
Nevertheless, this approach based on the I/O partition may compromise the benefits of decoupled communications as this special partition represents a single-point of failure within the core-module (i.e., it routes all the traffic coming from/to a specific network). To partially mitigate this issue, the I/O partition usually executes simple and trusted code.

5. The distributed, partitioned platform

This section describes our proposal for a distributed real-time platform for partitioned systems. In particular, the platform consists of a DDS implementation called RTI Connext Micro, a real-time operating system called MaRTE OS (Aldea and González, 2001) which follows the POSIX.13 minimal real-time system profile, and the aforementioned XtratuM as the hypervisor. The development of this platform has focused on validating the feasibility of the proposed system architecture. To this end, the platform provides support for the following features shown in Fig. 1:

• The virtual network is based on the ARINC-653 communication service provided by XtratuM.
• The V-NIC entities have been implemented on top of the real-time operating system in order to provide partitions with functionality similar to network cards.
• Communications among partitions using DDS rely on the UDP/IP transport. A future version of the platform could provide DDSI with an underlying transport based on the ARINC-653 communication service (Pérez and Gutiérrez, 2014).

Nevertheless, the integration of these technologies is not straightforward and requires some extensions which are briefly described next.

5.1. Adaptation of the DDS implementation

The underlying system should provide distribution middleware with different services such as threading, data synchronization or timing services. In a partitioned environment, some of these services are virtualized and directly provided by the hypervisor. However, one of our key challenges is enabling the use of COTS middleware in ARINC-like partitioned systems, so DDS should access these services through a standard operating system interface such as POSIX.

The DDS middleware implementation used is RTI Connext Micro v2.2.3, a minimal DDS aimed at resource-constrained devices which has been designed with certification requirements in mind. To facilitate the portability of RTI Connext Micro for different platforms, two abstraction layers called OSAPI and NETIO have been designed to be OS and transport independent, respectively. Therefore, only the implementation of these abstraction layers needs to be modified. In particular, this porting has required the following functionality:

• Operating System services. The MaRTE OS adaptation layer for RTI Connext Micro provides functions to allocate and free memory, and to create and configure mutexes and threads. Furthermore, it also provides logging and time-related facilities.
• Network services. The NETIO module provides a common network interface for the underlying transport. As the DDS implementation relies on the UDP/IP transport, only a small set of changes has been required in order to provide functions for registering and managing the NIC.

5.2. Adaptation of the real-time operating system

In XtratuM, partitions need to be virtualized in order to be executed on top of the hypervisor. Unlike other emulation technologies, XtratuM provides guest operating systems with an interface of hypercalls for accessing hardware. Therefore, the guest operating system must be adapted to use this interface instead of accessing the native hardware. Similarly, the design of MaRTE OS also includes a low-level abstract interface (Aldea and González, 2001) for accessing the hardware, which facilitates the migration from one platform to another. As a result, the porting of MaRTE OS to XtratuM focuses on extending this low-level interface to use the hypercalls. The main challenges for the adaptation to the hypervisor are briefly described below:

• Use of virtualized services. For instance, this porting has also required addressing the clock and timer management. Hardware clocks and timers are owned by XtratuM and thus guest OSs do not have access to them. Instead, separate virtual clocks/timers are provided to each partition. Therefore, MaRTE OS has been modified to access these services through the appropriate XtratuM hypercalls. Advanced features of MaRTE OS include support for execution-time clocks per thread as defined in POSIX. As XtratuM only provides partitions with a global monotonic clock and a local execution-time clock for each virtual CPU, MaRTE OS must be adapted to build the execution-time clocks per thread upon the latter.
• New communication mechanism to interconnect partitions. Support for communications in partitioned systems has also been developed in MaRTE OS as shown in Fig. 1. V-NICs have been designed to provide partitions with the same basic functionality as network cards, and therefore they have been implemented as standard drivers (Aldea and González, 2001) which provide standard POSIX functions such as close, open, ioctl, write or read. The virtual network entity is built upon the ARINC-653 communication service provided by XtratuM.
• Relying on the virtualized interrupt model. As noted in Section 3, interrupts cannot be directly handled by the OS (i.e., enabling/disabling them directly is not allowed). Instead, the OS must be modified to use the appropriate hypercalls. Furthermore, the OS must be configured to define the associated interrupt handlers only for those virtual interrupts and extended virtual interrupts which are allowed by the system configuration file.
• Adaptation of the deployment process. The deployment process of a partitioned system has been adapted to include the binaries of the partitions running over MaRTE OS. Once the system configuration file has been filled in with the resources allocated to each partition (e.g., communication ports, memory areas, interrupt lines and I/O ports) and validated, the deployment process generates a software container with all the components of the system: boot loader, partitions and the hypervisor.

Other modifications such as those required to provide a virtual shell or terminal for input commands are not necessary, as MaRTE OS is a single-process, multi-threading embedded OS.

Additionally, the integration of middleware in the partitioned platform has also required some modifications at the OS level. Since the DDS standard supports non-blocking and blocking mechanisms for the reception of data, the V-NIC entity should also provide support for them. While support for the non-blocking mechanism is directly built upon the XtratuM communication service, the OS should develop a blocking mechanism to allow middleware to wait for incoming messages. In our case, this mechanism has been implemented via the extended virtual interrupt mechanism.

6. Evaluation

This section aims to evaluate the effectiveness of the proposed configurations in a simulated wind power plant (inspired by an
industrial use case in MULTIPARTES (2013)), which is depicted in Fig. 5. A wind farm can be composed of hundreds of interconnected wind turbines, each of them with a supervisory unit which provides the following functionalities:

• Control. The supervisory unit gathers a variety of on-line data to provide real-time control over the wind turbine. Among others, it receives information about the wind speed, precipitation or battery levels in order to enable control operations such as shutting down or starting up turbines, and resetting controller alarms.
• Communications. The supervisory unit sends on-line data to reporting servers, maintenance operators or customers.
• Safety. The supervisory unit ensures that the turbine always stays in a safe state by guaranteeing that design limits are not exceeded (e.g., rotor overspeed or excessive vibration).
• Services. It may include other third-party applications that may be of interest such as video-surveillance cameras to detect the presence of near-flying birds. This particular service can take corrective actions (e.g., dissuasion or warning notification) and has soft real-time requirements.

Fig. 5. Schema of a wind power plant.

As each of these functionalities presents different safety, real-time and communications requirements, this wind power system can benefit from using the proposed integration of technologies (middleware, RTOS and hypervisor) on a multi-core platform. Furthermore, the use of DDS can also provide support to fulfil other kinds of requirements such as high data availability through the provided set of quality of service (QoS) parameters (Object Management Group, 2007). Therefore, this integration of technologies allows the functionalities to be split into different partitions as shown in Fig. 5. According to Section 4, the mixed-criticality wind power system could be configured as follows:

• Cyclic-based partitioning for a mono-core system (Config. #1). This constitutes our reference scenario, as the four partitions are executed in the same processor.
• Cyclic-based partitioning for a multi-core system (Config. #2). Unlike the reference scenario, one core is exclusively assigned to execute the Communications partition. The remaining partitions are executed in a second core.
• Priority and cyclic-based partitioning for a multi-core system (Config. #3). In this scenario, the Communications and the Services partitions are assigned to one core which is scheduled through fixed priorities, and the former partition is configured with the highest priority. The Control and the Safety partitions are assigned to a second core with cyclic-based scheduling.

Although LEON and PPC processors are mainly used in space and avionics applications respectively, x86 processors are used in industrial applications within the transportation or energy domains (MULTIPARTES, 2013). Therefore, the latter represents a simple and convenient choice for a proof of concept implementation and thus the proposed distributed platform has been developed on top of x86-based hardware. To better focus on the evaluation of the proposed architecture, the distributed application is composed of only three nodes: the first node is the partitioned supervisory unit for the target turbine, and it runs in an Intel i5-4570 quad-core processor with a clock rate of 3.2 GHz; the second node is responsible for the real-time monitoring of the target turbine, and it is running in a single core 2.8 GHz non-partitioned node; the third node is a reporting server where logs are stored, and it is also a single core 2.8 GHz non-partitioned node. The three nodes are connected through an isolated 100 Mbps Ethernet switch in which internal traffic has been disabled. The software platform was described in Section 5 and consists of RTI Connext Micro v2.2.3, MaRTE OS v1.9 and XtratuM v3.7.2. In this evaluation, partitions are interconnected through the V-NIC interface (using DDS and UDP/IP on top of the ARINC-like communication service), and the Communications partition is exclusively designed to forward messages through predefined communication links.

6.1. Performance test

This test will evaluate the response time of processing a turbine data sample requested by the real-time monitoring node (i.e., the data flow between the Control partition and the maintenance operator in Fig. 5). The measurement is carried out from the time when the data sample is requested until it is returned and processed. An additional non-critical data flow between the Services partition and the reporting server is also included to simulate an extra load in the distributed system. To better estimate the performance figures,
the operation under evaluation is executed 10,000 times and the payload is bounded to 1 KB to avoid network fragmentation.

The configuration of temporal and spatial properties associated with partitions remains a challenging process which can be assisted by using specialized tools. In this work, the Xoncrete tool (Brocal et al., 2010) has been used to validate the following schedule. For Config. #1, the system has been configured to have dedicated time windows of 500 μs for the Services and Safety partitions, while the Control and Communications partitions have 1000 μs spread across multiple windows as shown in Fig. 6. The scheduling plan is repeated every 3000 μs. The Communications partition is assigned to the second core in the case of Config. #2 and Config. #3, and the Services partition is also assigned to the second core in Config. #3.

Fig. 6. Time window configuration for the case-study.

Fig. 7 shows the minimum, average and maximum response times obtained in the evaluation for each proposed configuration. As can be observed, the distributed real-time operation takes a maximum of almost 2500 μs in Config. #1, while it takes around 1000 μs for both Config. #2 and #3. Apart from the performance gain of using a multi-core architecture, the proposed configurations enhance the responsiveness of the system. Furthermore, the variation in performance between Config. #2 and Config. #3 can be explained by the slight overhead associated with the fixed priority scheduling policy, which adds extra blocking times and context switches at partition level.

Fig. 7. Response times obtained for the performance test.

The distribution of response times obtained for each configuration is shown in Fig. 8. It should be noted that Config. #1 provides two clearly differentiated sets of measurements as a result of the bottleneck associated with communications, while the remaining

Fig. 8. Histogram obtained for the performance test.

Table 2
Variability metrics for the performance test.

Metric                    Config #1   Config #2   Config #3
Variance (μs²)            180 625     37 636      23 104
Standard deviation (μs)   425         194         152
90th percentile (μs)      2237        927         869

the multicore-based configurations still present a substantial standard deviation, which is above 150 μs. This value can be explained by the nature of the partitioned system, as the Control and Safety partitions rely on cyclic-based scheduling in which messages may have to wait until the next time slot to be fully processed (see Section 4). Similarly, the results for the 90th percentile (i.e., the value below which 90% of the measurements are found) indicate that a significant amount of measurements are close to the maximum response time in any of the configurations, which is also a consequence of the cyclic-based scheduling used in the second core.

6.2. Overheads test

A second test is carried out to estimate the overheads incurred by the proposed architecture. Hence, the objective is to identify the temporal contribution of each software layer in the partitioned system to the global response time measured for the first test. Similarly to the performance test, this second experiment deals with the data flow between the Control partition and the maintenance operator using Config. #2, but it measures the execution or processing time associated with the supervisory unit (i.e., using the local execution-time clocks), instead of the elapsed time for the distributed data flow.

In the supervisory unit, the sequence of steps associated with the data flow under evaluation is depicted in Fig. 9, and it is performed in seven stages: (1) a network packet is received, processed and routed by the Communications partition; (2) the Control partition receives an interrupt to notify of the arrival of an incoming message, which is handled by the corresponding drivers at the OS level; (3) then, the incoming message is processed by distribution middleware, which delivers it to the application; (4) the application reads the request and sends the reply, which goes through distribution middleware (5), operating system (6) and the Communications partition (7) before it is transmitted to the network. Specifically, the following metrics are measured:
congurations do not.
The metrics for evaluating the variability of the proposed plat- Middleware: It measures the overhead associated with the op-
form are shown in Table 2. As expected, the variability for Cong erations performed within distribution middleware: reception
#1 is higher than for the other two congurations. Nevertheless, and processing of the incoming DDSI message to request the
156 H. Prez et al. / The Journal of Systems and Software 123 (2017) 145159

Fig. 9. Overheads in the supervisory unit.

data sample, and the corresponding publishing process for the


reply.
Operating system: In this experiment, this metric measures the
drivers overhead for the Control partition. This includes the
operations related to the V-NIC and the UDP/IP layers.
I/O: It measures the processing time associated with the routing
algorithm in order to forward messages to/from the network.
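
As an illustration, the relationship between the seven stages of the data flow and the three per-layer metrics listed above can be sketched in Python. The stage descriptions paraphrase the text; the names STAGES and stages_by_metric are hypothetical, and attributing stage 4 to a separate "Application" entry is an assumption, since no application metric is listed.

```python
from collections import defaultdict

# (stage number, short description, metric assumed to account for it).
# The stage list follows the text; the "Application" label for stage 4 is
# an assumption, because the paper does not report an application metric.
STAGES = [
    (1, "packet received and routed by the Communications partition", "I/O"),
    (2, "interrupt and driver handling in the Control partition", "Operating system"),
    (3, "incoming message processed by distribution middleware", "Middleware"),
    (4, "application reads the request and sends the reply", "Application"),
    (5, "reply published through distribution middleware", "Middleware"),
    (6, "reply handled by the operating system drivers", "Operating system"),
    (7, "reply routed by the Communications partition", "I/O"),
]


def stages_by_metric(stages):
    """Group stage numbers by the metric that is assumed to cover them."""
    grouped = defaultdict(list)
    for number, _description, metric in stages:
        grouped[metric].append(number)
    return dict(grouped)


if __name__ == "__main__":
    for metric, numbers in stages_by_metric(STAGES).items():
        print(f"{metric}: stages {numbers}")
```

Under this reading, each per-layer metric covers one stage on the request path and one on the reply path.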

For completeness, two additional metrics have been measured to evaluate the variability of the partitioned system:

• Interrupts: It measures the elapsed time from an interrupt occurrence until the interrupt handler is ready to execute it.
• Total: This metric represents the elapsed time in the partitioned system, from the network request until the reply is delivered to the network.
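
The variability statistics reported in Table 2 for the performance test (variance, standard deviation and 90th percentile) can be computed from raw response-time samples along the lines sketched below. This is only an illustration: the function names are hypothetical, and the population variance and the nearest-rank percentile are assumed definitions, since the paper does not state which ones were used. The only data taken from the paper are the Table 2 values, which the last function checks for internal consistency (each standard deviation squared should equal the reported variance).

```python
import math

# Values copied from Table 2 (response times in microseconds).
TABLE_2 = {
    "Config #1": {"variance": 180625, "std_dev": 425, "p90": 2237},
    "Config #2": {"variance": 37636, "std_dev": 194, "p90": 927},
    "Config #3": {"variance": 23104, "std_dev": 152, "p90": 869},
}


def variability(samples, percentile=90):
    """Return (variance, standard deviation, percentile) of the samples.

    Assumptions: population variance (divide by n) and the nearest-rank
    definition of the percentile.
    """
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    ordered = sorted(samples)
    rank = max(1, math.ceil(percentile * n / 100))  # 1-based nearest rank
    return variance, math.sqrt(variance), ordered[rank - 1]


def table_is_consistent(table):
    """Check that std_dev squared matches the variance in every row."""
    return all(row["std_dev"] ** 2 == row["variance"] for row in table.values())


if __name__ == "__main__":
    print(table_is_consistent(TABLE_2))
    print(variability([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Other percentile conventions (for example, linear interpolation) would give slightly different values for small sample sets; with 10,000 samples per test the difference is usually negligible.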

As can be seen in Fig. 9, the maximum processing time takes less than 250 µs. This result indicates that the proposed platform is lightweight in terms of processing time, as it adds a low overhead to the global response time. As expected for a partition that executes simple code, the maximum processing time associated with the Communications partition is less than 50 µs for the request-reply process. Similar results are obtained for the minimal DDS implementation used in the approach, and moderately higher times are obtained for the operating system. Furthermore, on this platform, an interrupt latency was obtained in the range of 10 µs, which means that around 20 µs is taken for the two interrupts required by the data flow. According to the total elapsed time in the partitioned system, it can be seen that the cyclic-based scheduling used in the second core makes up most of the blocking time, which is 500 µs for our time window configuration.

6.3. Payload test

The last part of the evaluation analyses the influence of data sizes or payload on the proposed platform. Similarly to the overheads test, this third experiment also deals with the data flow between the Control partition and the maintenance operator using the proposed configurations for multi-core (i.e., Config #2 and Config #3), but it measures the response times for the distributed data flow using different payloads. In particular, the test is repeated up to a maximum payload of 8 Kbytes.

Fig. 10. Response times obtained for the payload test.

Fig. 10 shows that the performance of the application is not significantly influenced when the payload is below 1 Kbyte for both configurations. However, the use of larger payloads requires network fragmentation, which adds an extra overhead, thus causing higher response times. It is worth noting that the maximum response times are notably increased from 2 Kbytes onward, as the processing of larger messages increases the likelihood of waiting
until the next time slot to fully process them. Furthermore, as was commented in the performance test, Config #3 presents slightly higher response times due to the overhead associated with the fixed priority scheduling policy at partition-level.

As can be seen in Fig. 11, the standard deviation remains nearly constant throughout the payload test for Config #2, while Config #3 presents a variation around 50 µs. Hence, the use of different payloads does not strongly affect this metric, and most of the variability is due to the cyclic-based scheduling used in the second core.

Fig. 11. Standard deviation obtained for the payload test.

Taking these metrics into account, it can be appreciated that the integration of these technologies can be quite efficient, and therefore it could be feasible to apply this approach to a wide range of industrial applications. Furthermore, the second and third tests reinforce the fact that temporal isolation represents one of the main sources of overhead in partitioned systems. This loss of performance is inevitable when executing safety-related applications, but it can be minimized by means of the proposed architecture for multi-core platforms.

7. Conclusions and future work

Modern industrial embedded applications already integrate several functionalities with different criticality levels, but they are starting to migrate to multi-core architectures and their need for ubiquitous connectivity is growing swiftly. This scenario leads to ever more heterogeneous systems which must preserve their criticality requirements and may need global connectivity to execute external services at the same time. In this context, the use of multi-core platforms and standard distribution middleware can be seen as a potential solution to address their different communication requirements in terms of performance and interoperability.

This paper has identified and addressed a set of challenges that enables the use of a data-centric distribution middleware in partitioned real-time embedded systems based on multi-core architectures: increasing the communication responsiveness of partitioned systems and providing interoperability among heterogeneous systems.

To address the former challenge, the paper has explored the management of communication interrupts in multi-core platforms and the implementation of the IPVI mechanism to notify partitions about incoming inter-partition messages. Likewise, the paper has also explored how the allocation and scheduling of I/O partitions can assist in minimizing the overhead associated with time partitioning. To this end, two different configurations for multi-core platforms have been analysed, and their possible trade-offs have also been identified.

To provide interoperability among heterogeneous systems, our approach is built upon the standard communication mechanism proposed by DDS and a set of new virtual entities (V-NIC and virtual network). In the context of the proposed architecture, distributed applications do not directly access communication services, but instead access the distribution facilities provided by distribution middleware. To avoid the adaptation of middleware implementations to partitioned environments, the ARINC-653 communication services are provided by the operating system using the V-NIC entity, which is designed to provide the same basic functionality as network cards. As a result, only the services provided by this virtual entity are offered to distribution middleware, thus enabling transparent communications regardless of whether virtual or physical communication resources are used.

Furthermore, the paper has also developed a partitioned distributed real-time platform as a proof of concept. This platform has been tested through a case study drawn from a wind power application. The results obtained with the proposed architecture show that the allocation of communications to an isolated core enhances the responsiveness of the system. Moreover, the use of priority-based partitioning in this core provides more flexibility in the system design, but at the cost of slightly less performance.

Future lines of research on the topic may include the definition of a restricted profile for both DDS and DDSI standards, or the adoption of different strategies for systems whose workload is not predefined. There are plans to provide DDSI with an underlying transport based on the ARINC-653 communication service. Further investigation is also required to fully determine other benefits that distribution middleware may bring to partitioned systems, such as fault tolerance. Finally, combined support for para-virtualization and full-virtualization is planned for XtratuM, as this would allow the execution of both unmodified and para-virtualized guest operating systems.

Acknowledgment

This work has been funded in part by the Spanish Government and FEDER funds under grant numbers TIN2011-28567-C03-02/TIN2011-28567-C03-03 (HIPARTES) and TIN2014-56158-C4-1-P/TIN2014-56158-C4-2-P (M2C2), and by the European Commission under grant number FP7 ICT 610640 (DREAMS).

References

Airlines Electronic Engineering Committee, 2005. ARINC specification 664P7: Aircraft data network, P7 - Avionics Full Duplex Switched Ethernet (AFDX) Network.
Airlines Electronic Engineering Committee, 2006. Avionics Application Software Standard Interface. Aeronautical radio INC., ARINC Specification, pp. 651–653.
Albano, M., Ferreira, L.L., Pinho, L.M., Alkhawaja, A.R., 2015. Message-oriented middleware for smart grids. Comput. Stand. Interfaces 38, 133–143. doi:10.1016/j.csi.2014.08.002.
Aldea, M., González, M., 2001. MaRTE OS: an Ada kernel for real-time embedded applications. In: Proc. of the International Conference on Reliable Software Technologies. Ada-Europe, Leuven, Belgium, LNCS, p. 2043.
Almadani, B., Khan, S., Bajwa, M.N., Sheltami, T.R., Shakshuki, E., 2015. AVL and monitoring for massive traffic control system over DDS. Mobile Inf. Syst. doi:10.1155/2015/187548.
Baruah, S.K., Chattopadhyay, B., Li, H., Shin, I., 2014. Mixed-criticality scheduling on multiprocessors. Real-Time Syst. J. 50, 142–177.
Brocal, V., Masmano, M., Ripoll, I., Crespo, A., Balbastre, P., 2010. Xoncrete: a scheduling tool for partitioned real-time systems. Proc. of the 5th Int. Congress on Embedded Real-Time Software and Systems (ERTS²).
Burns, A., Davis, R.I., 2014. Mixed Criticality Systems - A Review, fourth ed. Department of Computer Science, University of York Report.
Carrascosa, E., Coronel, J., Masmano, M., Balbastre, P., Crespo, A., 2014. Xtratum hypervisor redesign for LEON4 multicore processor. SIGBED Rev. 11 (2), 27–31.
Cho, Y., Choi, J., Choi, J., 2013. An integrated management system of virtual resources based on virtualization API and data distribution service. In: Proc. of the ACM Cloud and Autonomic Computing Conference. New York (USA).
Cilku, B., Puschner, P., 2013. Towards temporal and spatial isolation in memory hierarchies for mixed-criticality systems with hypervisors. In: Proc. of the 1st workshop on Real-Time Mixed Criticality Systems (ReTiMiCS). Taiwan, pp. 25–28.
Crespo, A., Ripoll, I., Masmano, M., Peiró, S., 2010. Partitioned embedded architecture based on hypervisor: the XtratuM approach. In: European Dependable Computing Conference (EDCC), pp. 67–72.
Design of embedded mixed-criticality CONTRol systems under consideration of EXtra-functional properties (CONTREX), 2014. European project, Part of the 7th Framework Programme http://contrex.os.de.
Distributed REal-time Architecture for Mixed Criticality Systems (DREAMS), 2014. European project, Part of the 7th Framework Programme http://www.uni-siegen.de/dreams.
Dubey, A., Emfinger, W., Gokhale, A., Karsai, G., Otte, W.R., Parsons, J., Szabo, C., Coglio, A., Smith, E., Bose, P., 2012. A software platform for fractionated spacecraft. In: Proc. of the 2012 IEEE Aerospace Conference, pp. 1–20.
Dubey, A., Karsai, G., Mahadevan, N., 2011. A component model for hard real-time systems: CCM with ARINC-653. Softw. Pract. Exp. 41 (12), 1517–1550.
Eklund, U., Bosch, J., 2014. Architecture for embedded open software ecosystems. J. Syst. Softw. 92, 128–142. doi:10.1016/j.jss.2014.01.009.
Ernst, R., 2010. Certification of trusted MPSoC platforms. In: Proc. of the International Forum on Embedded MPSoC and Multicore.
Galizzi, J., Metge, J.J., Arberet, P., Morand, E., Vigeant, F., Crespo, A., Masmano, M., Coronel, J., Ripoll, I., Brocal, V., Roubert, F., Scuri, C., Tedesco, V., Thomasson, N., 2012. LVCUGEN (TSP-based solution) and first porting feedback. In: Proc. of the Int. Conf. on Embedded Real Time Software and Systems. France.
Garcia-Valls, M., Basanta-Val, P., 2016. Analyzing point-to-point DDS communication over desktop virtualization software. Comput. Stand. Interfaces doi:10.1016/j.csi.2016.06.007.
Gu, Z., Zhao, Q., 2012. A state-of-the-art survey on real-time issues in embedded systems virtualization. J. Softw. Eng. Appl. 5, 277–290.
Guasque, A., Balbastre, P., Crespo, A., 2016. Real-time hierarchical systems with arbitrary scheduling at global level. J. Syst. Softw. 119, 70–86. doi:10.1016/j.jss.2016.05.040.
Hakiri, A., Berthou, P., Gokhale, A., Schmidt, D., Thierry, G., 2014. Supporting SIP-based end-to-end data distribution service QoS in WANs. J. Syst. Softw. 95, 100–121.
Han, S., Jin, H., 2014. Resource partitioning for integrated modular avionics: comparative study of implementation alternatives. Softw. Pract. Exp. 44 (12), 1441–1466.
Heinecke, H., Bortolazzi, J., Schnelle, K.-P., l. Mat, J., Fennel, H., Scharnhorst, T., 2004. AUTOSAR - an industry-wide initiative to manage the complexity of emerging automotive E/E-Architectures. SAE Convergence Congress.
Higuera-Toledano, M.T., 2012. Adaptive distributed embedded and real-time java systems based on RTSJ. In: Proc. of the 15th Int. Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing Workshops, pp. 164–171.
Kang, W., Kapitanova, K., Son, S.H., 2012. RDDS: a real-time data distribution service for cyber-physical systems. IEEE Trans. Ind. Inform. 8 (2), 393–405.
Karoui, R., Corsaro, A., 2011. Real time data distribution for airborne systems. Workshop on Real-time, Embedded and Enterprise-Scale Time-Critical Systems.
Kim, H., deNiz, D., Andersson, B., Klein, M., Mutlu, O., Rajkumar, R., 2014. Bounding memory interference delay in COTS-based multi-core systems. In: Proc. of the 20th IEEE Real-time and Embedded Technology and Applications Symposium (RTAS). Germany.
Marquardt, O., Riedlinger, M., Ahmadi, R., Reichel, R., 2015. An adaptive middleware approach for fault-tolerant avionic systems. In: Proc. of the IEEE Aerospace Conference, pp. 1–8. doi:10.1109/AERO.2015.7119303.
Muench, D., Isfort, O., Mueller, K., Paulitsch, M., Herkersdorf, A., 2013. Hardware-based I/O virtualization for mixed criticality real-time systems using PCIe SR-IOV. In: Proc. of the 16th IEEE International Conference on Computational Science and Engineering.
Multi-Core Execution of Parallelised Hard Real-Time Applications Supporting Analysability (parMERASA), 2014. European project, Part of the 7th Framework Programme http://www.parmerasa.eu.
Multi-cores Partitioning for Trusted Embedded Systems (MULTIPARTES), 2013. EU project, 7th Framework Programme www.multipartes.eu.
Münch, D., Paulitsch, M., Herkersdorf, A., 2015. IOMPU: spatial separation for hardware-based I/O virtualization for mixed-criticality embedded real-time systems using Non-transparent bridges. In: Proc. of the HPCC/CSS/ICESS, pp. 1037–1044. doi:10.1109/HPCC-CSS-ICESS.2015.221.
Object Management Group, 2007. Data Distribution Service for Real-Time Systems OMG Document, v1.2, formal/07-01-01.
Object Management Group, 2010. The real-time publish-subscribe wire protocol. DDS Interoperability Wire Protocol Specification OMG Document, v2.1, formal/2010-11-01.
Open VEhiculaR SEcurE platform (OVERSEE), 2013. European project, Part of the 7th Framework Programme https://www.oversee-project.com.
Pérez, H., Gutiérrez, J.J., 2014. Data-centric distribution technology in ARINC-653 systems. In: Proc. of the 3rd Int. Workshop on Real-Time and Distributed Computing in Emerging Applications. Italy.
Pérez, H., Gutiérrez, J.J., 2016. Enabling data-centric distribution technology for partitioned embedded systems. IEEE Trans. Parallel Distrib. Syst. 27 (11), 3186–3198. doi:10.1109/TPDS.2016.2531695.
Perrotin, M., Conquet, E., Dissaux, P., Tsiodras, T., Hugues, J., 2010. The TASTE toolset: turning human designed heterogeneous systems into computer built homogeneous software. Proc. of the 5th Int. Congress on Embedded Real-Time Software and Systems (ERTS²).
Putra, H.A., Kim, D., 2015. Node discovery scheme of DDS for combat management system. Comput. Stand. Interfaces 37, 20–28. doi:10.1016/j.csi.2014.05.002.
Rierson, L., 2013. Developing Safety-Critical Software: A Practical Guide for Aviation Software and DO-178C Compliance, first ed. CRC Press ISBN: 9781439813683.
Rushby, J., 1999. Partitioning in Avionics Architectures: Requirements, Mechanisms, and Assurance. NASA Langley Research Center.
SCOPE Alliance, 2008. Virtualization: State of the Art Version 1.0.
Sommer, S., Camek, A., Becker, K., Buckl, C., Zirkler, A., Fiege, L., Armbruster, M., Spiegelberg, G., Knoll, A., 2013. RACE: a centralized platform computer based architecture for automotive applications. In: Proc. of the 2013 IEEE International Electric Vehicle Conference (IEVC), pp. 1–6.
Technical Standard for Future Airborne Capability Environment, 2014. The Future Airborne Capability Environment (FACE) Consortium Edition 2.1. The Open Group.
Windsor, J., Hjortnaes, K., 2009. Time and space partitioning in spacecraft avionics. Space Mission Challenges for Information Technology 0:13–20 doi:10.1109/SMC-IT.2009.11.
Xi, S., Xu, M., Lu, C., Phan, L.T.X., Gill, C.D., Sokolsky, O., Lee, I., 2014. Real-time multi-core virtual machine scheduling in xen. ACM International Conference on Embedded Software (EMSOFT).
Xiong, M., Parsons, J., Edmondson, J., Nguyen, H., Schmidt, D.C., 2007. Evaluating technologies for tactical information management in net-centric systems. In: Proc. of the Defense Transformation and Net-Centric Systems. USA.
Zhou, Q., Xiong, Z., Zhan, Z., You, T., Jiang, N., 2015. The mapping mechanism between distributed integrated modular avionics and data distribution service. In: Proc. of the 12th Int. Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pp. 2502–2507.

Héctor Pérez Tijero has been participating in intense teaching and research activity in the Electronics and Computers Department at the University of Cantabria (Spain)
since 2008. He received his M.Sc. and Ph.D. in 2008 and 2012, respectively. His Ph.D. was concerned with the integration of a real-time model into distribution middleware
to facilitate the development process of distributed real-time systems. He works in software engineering for real-time systems and has been involved in several research and
industrial projects using emerging distribution middleware technologies to build distributed and deterministic applications.

J. Javier Gutiérrez received his B.Sc. and Ph.D. from the University of Cantabria (Spain) in 1989 and 1995 respectively. He has been an associate professor in the Computers
and Real-Time Group at the University of Cantabria since 1996, where he works in software engineering for real-time. His research activity deals with the scheduling,
analysis and optimization of embedded real-time distributed systems (including communication networks). He has been involved in several research projects building real-
time controllers for robots, evaluating Ada for real-time applications, developing middleware for real-time distributed systems, and proposing both models and analysis and
optimization techniques for distributed real-time applications.

Salvador Peiró Frasquet is a Software Engineer working on the software development and testing of embedded partitioned systems. He received his Master's degree in Computer Engineering in 2011 from the Polytechnic University of Valencia (UPV) and his PhD in Computer Science in 2016 from the same university. His main research topics are related to the verification of the security properties of embedded operating systems such as the Linux Kernel, and virtualization technologies such as the XtratuM Hypervisor.

Alfons Crespo is Professor of the Department of Computer Engineering of the Technical University of Valencia. He received the Ph.D. in Computer Science from the Technical University of Valencia, Spain, in 1984. He held the position of Associate Professor in 1986 and full Professor in 1991. He leads the group of Industrial Informatics and has been responsible for several European and Spanish research projects. His main research interests include different aspects of real-time systems (scheduling, hardware support, scheduling and control integration, ...). He has published more than 60 papers in specialised journals and conferences in the area of real-time systems.
