Autosar Scheduling Automotive Embedded - Copy

Model Engineering College

Scheduling Complex Automotive Embedded Real Time Systems

Chapter 2

AUTOMOTIVE ELECTRONIC CONTROL UNITS (ECUs)

2.1 ECUs in Automobiles

The number of ECUs in automobiles has increased rapidly in the last decade. This rapid growth is due to consumer demand for enhanced safety features, entertainment systems and added convenience functions, and to government edicts on emissions controls. The software interfaces in a typical automotive electronic control unit are shown in Figure 2.1.

Fig 2.1: Software interfaces inside ECU [8]

It consists of
a) Operating system
b) Application software modules
c) Input/Output systems
d) Microcontroller

2.2 Operating System

Automotive applications are characterized by stringent real-time requirements. Therefore the operating system offers the necessary functionality to support event-driven control systems.

Department of Electronics Engineering


As the operating system is intended for use in any type of control unit, it shall support time-critical applications on a wide range of hardware. A high degree of modularity and the ability for flexible configuration are prerequisites to make the operating system suitable for low-end microprocessors and complex control units alike. The OSEK operating system is a single-processor operating system meant for distributed embedded control units for automotive applications.

OSEK is an abbreviation for the German term "Offene Systeme und deren Schnittstellen für die Elektronik im Kraftfahrzeug" (English: Open Systems and the Corresponding Interfaces for Automotive Electronics). The OSEK operating system is designed to require only a minimum of hardware resources (RAM, ROM, CPU time) and therefore runs even on 8-bit microcontrollers.

2.3 Application software module

The specified operating system services constitute a basis to enable the integration of software modules made by various manufacturers. The interface between the application software and the operating system is defined by system services. The interface is identical for all implementations of the operating system on various processor families. The operating system must support the portability of application software. Portability means the ability to transfer an application software module from one ECU to another ECU without major changes inside the application. When porting application software from one ECU to another, it is necessary to consider characteristics of the software development process, the development environment, and the hardware architecture of the ECU, for example
a) Software development guidelines
b) File management system
c) Data allocation and stack usage of the compiler
d) Memory architecture of the ECU
e) Timing behaviour of the ECU
f) Different microcontroller specific interfaces, e.g. ports, A/D converter, serial communication and watchdog timer
g) Placement of the API calls

The application software lies on the operating system and in parallel on an application-specific Input/Output System interface which is not standardised in the OSEK specification. The application software module can have several interfaces. There are interfaces to the operating system for real-time control and resource management, interfaces to other software modules to represent a complete functionality in a system, and interfaces to the hardware if the application is intended to work directly with microcontroller modules.

2.4 Input/Output systems

Input/Output systems enable the ECU to communicate with the external world. The application software provides an application-specific Input/Output System interface.

2.5 Microcontroller

The automotive market for electronics is growing rapidly as the demand for comfort, safety and reduced fuel consumption increases. All of these new functions require local intelligence and control, which can be optimized by the use of small, powerful microcontrollers. In order to deal with the new range of system requirements, the automotive microcontroller will be further developed to facilitate new functions.

2.5.1 Communications

Communication between electronic control units (ECUs) is a growing trend. Multiplexed communication in vehicles was originally developed to reduce weight, interconnections, cost and complexity. It soon became apparent, however, that vehicular systems could be enhanced greatly by sharing data from different ECUs in real time.


Many new automotive microcontrollers will have more silicon devoted to communications capabilities than to the CPU. Already, microcontrollers such as the M68HC912DG128 are being offered with two independent CAN (Controller Area Network) modules along with several more synchronous and asynchronous communications systems. These communications interfaces are as autonomous as possible, so that the CPU does not need to devote a great deal of overhead to managing communications.

2.5.2 Safety critical operation

Microcontrollers have been at the heart of safety-critical systems for many years. Almost all of the safety-critical automotive systems in which they have been used have provided a fail-safe function. In the near future, there will be an added requirement for fault-tolerant microcontroller-based systems. A good example of a modern automotive microcontroller is the MPC555, which was designed for powertrain control and Intelligent Transportation Systems (ITS) applications. A block diagram of the MPC555 is shown in Figure 2.2.

Figure 2.2: MPC555 Microcontroller [3]

The MPC555 is based on the PowerPC architecture and is composed of over 6.7 million transistors, over 300 times the complexity of a microcontroller used in a comparable application a decade ago. The 32-bit CPU includes multiple execution units and a floating point unit, and supports a Harvard architecture with separate load/store and instruction buses for simultaneous instruction fetching and data handling. The chip is well equipped with peripherals to interface with the rest of the system. There are 32 analogue inputs as well as 48 Timer Processor Unit (TPU) timer-controlled input/output channels. Two CAN (Controller Area Network) serial communications interfaces are also included to provide multiplexed communications with other vehicular systems. The program memory is 448 Kbytes of Flash EEPROM, with 26 Kbytes of RAM. Certain I/O structures have been added to the chip to accommodate 5 V signals around the chip. Although the MPC555 has been developed for a 0.35 µm manufacturing process, it is expected that the technology of other system components will develop more slowly and will still operate with a 5 V power supply and signaling level.

The next major challenge for microcontroller-based automotive systems will be to optimize the efficiency of the controller and associated software through model-based development techniques, open architectures and reusability of hardware and software. Meeting these challenges will ensure that the perennial requirements of the industry are met: reduce cost, increase performance and reduce time to market.


Chapter 3

RESPONSE TIME ANALYSIS

3.1 Computational model

The real-time system analyzed in this section consists of a set of asynchronous tasks $\tau_i \in T$. A set of tasks is called asynchronous if the first activation of a task may be different from the system start time $t_0 = 0$, and synchronous if all tasks are activated at $t_0$.

In the real-time domain tasks are described using their worst-case non-functional properties. In our case these properties include the worst-case computational workload $C$, the task priority $P$, the activation period $A$, and the offset $O$ of the first task activation w.r.t. $t_0$. Hence, the task $\tau_i$ is described by a 4-tuple $\tau_i = (C_i, P_i, A_i, O_i) \in T$.

Table 3.1 summarizes the important variables used in the following equations.

Table 3.1: Variables of the computational model [1]


Tasks are grouped into transactions. A transaction $j \in R$ with activation period $T_j$ contains all tasks that have the same activation period as the transaction, i.e. all tasks $i$ with $A_i = T_j$.

3.2 Analysis using offset information

The response time of a task can be calculated by taking into account the properties of the task itself and the properties of all higher priority tasks which may pre-empt the task under analysis. The central concept for such an analysis with task offsets is the derivation of so-called busy windows. A busy window (w) is a period of time in which the processor is not idle at any time. Clearly, with pre-emptive scheduling the task with the lowest priority is always finished at the end of a busy window. The critical instant is the point in time that precedes the busy window resulting in the longest response time for the analyzed task. Consider the example schedule in figure 3.1, consisting of seven tasks in three transactions.

Fig. 3.1: Computational model of the response-time analysis [1]

The following priorities are valid for the respective tasks: P4 > P1 > P5 > P6 > P2 > P3 > P7.



The critical instant for Task6 is the instant when the higher priority tasks Task1 and Task4 are activated at the same time as Task6. To describe this situation a vector W is defined that captures, for each transaction j, the time Wj by which the transaction was started before the critical instant. The vector W describes a critical instant candidate, called an analysis context. To calculate the interference of a higher priority task k on the computation of a low priority task i without offsets, equation 3.1 can be used.

………………………….. (3.1)
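For orientation, the classical interference term of fixed-priority pre-emptive scheduling without offsets is given below. It is the textbook expression, written with the symbols of section 3.1 and an assumed busy-window length $w$; it is offered as a sketch and is not necessarily the exact notation of equation 3.1 in [1].

```latex
% Interference of a higher priority task k within a busy window of length w
I_k(w) = \left\lceil \frac{w}{A_k} \right\rceil C_k

% Resulting fixed-point equation for the busy window of task i
w_i = C_i + \sum_{k \,:\, P_k > P_i} \left\lceil \frac{w_i}{A_k} \right\rceil C_k
```

The second line is solved iteratively, starting from $w_i = C_i$, until the value no longer changes.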

3.3 Hyper period (H)

Given the hyper-period $H = \mathrm{lcm}(T_0, T_1, \dots, T_{|R|-1})$, the number of activation instances of transaction $k$ is exactly $H / T_k$. After $H$ the activation schemes are repeated.

Figure 3.2 illustrates the hyper-period and activation instances of a task-set consisting of three transactions.

Fig. 3.2: Possible transaction activation instances over the hyper-period [1]

An easy way to create the analysis contexts is to iterate over the hyper-period and consider each task activation a critical instant candidate, i.e. an analysis context (Wj). This procedure is sketched in algorithm 3.1.



Algorithm 3.1: Create analysis context [1]

The procedure iterates over all task activation instants and captures the analysis context of each respective instant. The size of the returned analysis context set is equal to the number of task activations during the hyper-period. To find the critical instant we have to investigate all task activation points. The analysis context that results in the longest response time for the task under analysis describes the critical instant.
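The iteration over the hyper-period can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm from [1]: the function name `create_analysis_contexts`, the representation of a transaction as a `(period, offsets)` pair, and the reading of Wj as the time elapsed since transaction j last started are all choices made for this example.

```python
from math import lcm


def create_analysis_contexts(transactions):
    """Enumerate critical instant candidates over one hyper-period.

    transactions: list of (period, [task offsets]) pairs in a common integer
    time unit (hypothetical representation; offsets are assumed < period).
    Returns (hyper_period, contexts); each context is a tuple W in which W[j]
    is the time elapsed since transaction j last started.
    """
    periods = [period for period, _ in transactions]
    hyper_period = lcm(*periods)
    contexts = []
    for period, offsets in transactions:
        # Transaction j has exactly H / T_j activation instances.
        for k in range(hyper_period // period):
            for offset in offsets:
                instant = k * period + offset  # one task activation
                contexts.append(tuple(instant % p for p in periods))
    return hyper_period, contexts
```

For two transactions with periods 4 and 6, holding two tasks and one task respectively, the hyper-period is 12 and the sketch returns 2·3 + 1·2 = 8 contexts, one per task activation.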

3.4 Response time analysis for multiple clock reference systems

The previous section investigated the worst-case response time (WCRT) analysis for systems with one global clock reference that triggers all transactions. This section analyzes systems that consist of multiple clock references. The original computational model lacks the information required to determine the clock reference of each transaction. The computational model is therefore refined to capture this additional information.

3.4.1 Extended Computational Model

The system model introduced in section 3.1 defines one global clock that triggers all transactions. The single clock reference is replaced by a set of clock references C, hence, S = (T, R, C).



3.4.2 Multiple Time Reference

With the refined computational model the existing response time analysis (RTA) for single clock reference systems can be extended to account for multiple clock references. The critical instant still has to be calculated. Unlike in the single clock reference scenario, transactions triggered by different clock references are treated differently from transactions triggered by the same clock reference. Consider figure 3.3. Transactions 0, 1 and 2 are triggered by one clock reference, and transactions 3 and 4 are triggered by a different one.

Fig 3.3: Hyper-periods with phase offset [1]

Unlike in the single clock reference system, tasks which are triggered by different clock references are subject to clock drift. The clock drift of the transaction triggers results in a constantly changing phase offset between the transactions from different clock references. Transactions for different clock references may therefore be triggered at arbitrary points in time relative to each other. For the analysis context generation this means that assuming any task activation in a transaction has no effect on possible task activations in transactions with a different clock reference. The analysis contexts for each transaction set belonging to the same clock reference are created using the procedure in the previous section. After the analysis contexts for each single clock transaction set have been created, we have to merge them into one joint set of analysis contexts. To do so, we consider each single clock transaction set as one transaction with a period equal to the hyper-period of the transaction set. We then combine all such transactions to create the analysis context set of the multiple clock reference problem. The procedure that combines two separately created sets of analysis contexts is summarized in algorithm 3.2.

Algorithm 3.2: Merge analysis contexts [1]

To investigate the critical instant, all possible task activation combinations have to be considered. This increases the workload significantly, but it is necessary because combining the critical instant of set 1 and the critical instant of set 2 may not result in the overall critical instant. Algorithm 3.2 iterates over all analysis contexts of set 1 and combines each of them with all analysis contexts of set 2. In other words, each instant at which a task activation in set 1 occurs is considered at the same time as a task activation in set 2, for all combinations of task activations. The resulting set of analysis contexts is merged with the remaining sets until one joint set of analysis contexts is left. The proposed procedure increases the computational complexity of the RTA because all entries in the analysis context set have to be evaluated. In real-world scenarios the RTA is feasible if the number of different clock references is small.
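The pairwise combination can be illustrated with a Cartesian product. The sketch below is an assumption-laden illustration rather than the algorithm from [1]: contexts are modelled as tuples of per-transaction values, and the names `merge_analysis_contexts` and `merge_all` are hypothetical.

```python
from itertools import product


def merge_analysis_contexts(set_1, set_2):
    """Pair every context of set_1 with every context of set_2.

    Each context is a tuple; the joint context is their concatenation,
    so the result has len(set_1) * len(set_2) entries.
    """
    return [ctx_1 + ctx_2 for ctx_1, ctx_2 in product(set_1, set_2)]


def merge_all(context_sets):
    """Fold the per-clock context sets into one joint set, two at a time."""
    merged = context_sets[0]
    for next_set in context_sets[1:]:
        merged = merge_analysis_contexts(merged, next_set)
    return merged
```

The size of the joint set is the product of the sizes of the individual sets, which is why the analysis stays practical only for a small number of clock references.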



Chapter 4

SYSTEM ANALYSIS

4.1 Simulation Model

The key model elements in a holistic schedule simulation are the tasks, the operating system, signal sources, signal sinks, and the communication between element nodes. The important parameters to capture the timing effects of a task are the execution time, activation period, priority, activation offset, and pre-emption by higher priority tasks. In order to check for deadline violations during the simulation, the deadline needs to be known as well. The operating system hosts a vector of real-time tasks as well as a vector of interrupts, which are modeled similarly to the tasks. The operating system periodically evaluates the currently running task and runs the dispatcher if new tasks have been activated. Interrupts may occur at any time and also invoke the dispatcher. The implementation of the operating system model is illustrated in figure 4.1.

Fig 4.1: Illustration of the operating system simulation model [4]

Note that the task execution and the operating system are subject to clock drift, which is modeled in the controller the operating system resides in.



Figure 4.2 illustrates the task scheduling aspect of the scheduling simulation.

Fig. 4.2: Illustration of the task simulation model (state machine Basic_OSEKTaskBehaviour with states Bas_running and Bas_suspended) [4]

Task states are described by a state machine consisting of a running, a suspended, and a waiting state. At each time step the operating system updates the workload parameter of the task that was running during the last time step and evaluates the activation times of the suspended tasks. If new tasks have been activated, the dispatcher assigns the CPU to the highest priority task, and the respective task enters the running state. Only one task may be running at a time.
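A time-stepped loop of this kind can be sketched in a few lines of Python. The sketch is a simplified illustration, not the simulation model of [4]: the `Task` fields, the convention that a larger `priority` value wins, and the use of `remaining > 0` as the "ready" condition are assumptions, and interrupts and clock drift are left out.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    wcet: int            # execution time, in time steps
    period: int          # activation period, in time steps
    priority: int        # larger value = higher priority (assumption)
    offset: int = 0      # time of the first activation
    remaining: int = 0   # workload left; 0 means the task is suspended
    released_at: int = 0


def simulate(tasks, steps):
    """Return the largest observed response time per task name."""
    worst = {task.name: 0 for task in tasks}
    for now in range(steps):
        # Evaluate the activation times of all tasks.
        for task in tasks:
            if now >= task.offset and (now - task.offset) % task.period == 0:
                task.remaining = task.wcet  # assumes the previous job is done
                task.released_at = now
        # Dispatcher: the highest priority ready task runs for this step.
        ready = [task for task in tasks if task.remaining > 0]
        if ready:
            running = max(ready, key=lambda task: task.priority)
            running.remaining -= 1
            if running.remaining == 0:
                response = now + 1 - running.released_at
                worst[running.name] = max(worst[running.name], response)
    return worst
```

With a high priority task (execution time 1, period 4) and a low priority task (execution time 2, period 6), the low priority task is delayed by one step at time 0, so its largest observed response time is 3.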

4.2 Automotive Safety Architecture

Fig. 4.3: System architecture example from the automotive safety domain [1]

The safety architecture in figure 4.3 consists of an integrated safety ECU which receives input from different sources. A forward looking sensor device perceives the current driving scene and triggers active safety applications and pedestrian protection via an event-triggered bus. A time-triggered task is triggered at a specific instant when the time-triggered bus is in a specific state. The communication via the time-triggered bus is also triggered by the clock of the bus. An external watchdog participates in a challenge/response mechanism. In the safety ECU a sanity check is performed upon this request. Moreover, a traditional legacy subsystem for airbag deployment is triggered periodically by its respective sensor ICs. The legacy subsystem consists of a number of periodic tasks responsible for the generation of features from the sensor data upon which a decision is made regarding whether or not the airbag needs to be deployed. Also the basic diagnosis functionality is implemented in the legacy subsystem. The active and pedestrian safety applications and the ECU sanity check are triggered event-based with a certain Minimum Inter arrival Time (MINT). The time-triggered tasks are activated periodically but the activation period is subject to drift, as the ECU and the time-triggered bus do not share the same global clock.



All tasks of the embedded software system and their respective real-time properties are summarized in table 4.1.

Table 4.1: Real-time properties of the tasks in the safety system (T: Period, P: Priority, O: Offset, C: Execution time) [1]

4.3 Timing Analysis

The system in fig 4.3 is analyzed using the simulation model presented in section 4.1. Statistical modeling and simulation remain feasible when clock drifts between elements have to be considered. However, investigating the worst-case timing using statistical simulation is difficult because very long simulation durations would be necessary. On the other hand, the simulation provides knowledge about mean response times and average controller load. The simulation is performed repeatedly until a predefined confidence threshold is reached. Between simulation replications the clock drifts and the initial transaction phases are re-evaluated statistically and changed. By doing so we create random samples of the mean and maximum response times of all tasks.



Using the samples from each replication we create the box plots of the mean task response times depicted in figure 4.4.

Fig 4.4: Mean task response times over multiple simulation replications [1]

The top and the bottom of the box which contains the thick horizontal bar represent the 75% quartile and the 25% quartile, i.e. the latency bounds which 75% and 25% of the replication instances, respectively, did not exceed. The thicker line inside the box represents the median of the values. The whiskers with the bars at the end of the line represent the data points that lie beyond the quartiles but within 1.5 x IQR of them, IQR being the interquartile range. Outliers are represented by circles. The confidence factor of the mean task response times was set to accept a relative error of 5% with a probability of 95%.
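The quantities drawn in such a box plot can be computed as in the following sketch. It assumes the common whisker convention described above and the "inclusive" quartile method of the Python standard library; plotting tools differ in their quartile definitions, so the exact values in figure 4.4 may have been obtained differently. The function name `box_plot_summary` is hypothetical.

```python
from statistics import quantiles


def box_plot_summary(samples):
    """Compute quartiles, whisker ends and outliers for one box."""
    q1, median, q3 = quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme samples still inside the fences.
    inside = [x for x in samples if low_fence <= x <= high_fence]
    outliers = sorted(x for x in samples if x < low_fence or x > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }
```

For the samples 1 to 8 plus a single value of 100, the box spans 3 to 7, the whiskers end at 1 and 8, and 100 is reported as an outlier.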



The simulated WCRT of all tasks over all simulation replications is depicted in figure 4.5.

Fig. 4.5: Maximum simulated task response times and calculated WCRT [1]

The simulated WCRTs of the tasks are not constant over all replications but depend on the actual selection of clock drift and phase offset. The crosses in figure 4.5 represent the WCRTs calculated using the RTA presented in chapter 3. Two important findings can be observed in the figure:
a) Even through extensive simulation, the maximum response time from the simulation may stay below the calculated WCRT.
b) The maximum response time in the simulation does not exceed the calculated WCRT.

This RTA provides important information regarding the worst-case behavior of the system by providing upper bounds on the WCRT of all real-time tasks. Especially for the lower priority tasks, the gap between the WCRT and the maximum response time during the simulation runs can be significant. This analysis is applicable in automotive systems with relatively short schedule lengths, or hyper-periods, and a limited number of different clock references.



Chapter 5

MULTICORE SCHEDULING IN AUTOMOTIVE ECUS

5.1 Introduction

Multi-source software running on the same ECU (Electronic Control Unit) is becoming increasingly widespread in the automotive industry. One of the main reasons is that OEMs want to reduce the number of ECUs, which has grown to more than 70 for high-end cars. Multicore ECUs offer new features such as higher levels of parallelism, which makes it easier to meet the safety requirements introduced by ISO 26262 and can be taken advantage of in various other automotive use cases. Multicore ECUs raise the problem of scheduling numerous elementary software components (called runnables) on a limited set of identical cores. In the context of an automotive design, we assume the use of a static task partitioning scheme, which provides simplicity and better predictability for the ECU designers in comparison with a global scheduling approach. The overall scheduling problem can then be addressed as two subproblems: partitioning the set of runnables and building the schedule on each core.

5.2 Main use cases for multicore ECU in the automotive domain

There exist very distinct hardware and software architectures for multicore ECU platforms. As far as hardware is concerned, suppliers envision various multicore architectures: identical cores, heterogeneous cores with different operating speeds and instruction sets and, possibly, various I/O and memory structures. However, chip manufacturers have been producing multicore processors with identical cores for the PC industry for a while, which may influence the automotive industry, as those architectures are proven in use and are likely to be cheaper thanks to mass production. In this section, we discuss the main use cases for a multicore ECU and the implementation solutions that would properly fit them.



5.2.1 Decreasing the complexity of in-vehicle architecture

The higher level of performance provided by multicore architectures makes it possible to simplify in-vehicle architectures by executing on multiple cores the software previously run on multiple ECUs.

5.2.2 Improving the safety

Multicore architectures provide efficient ways to implement safety mechanisms. We identify three main methods to improve safety by taking advantage of the multicore architecture. The first method consists in segregating trusted code and non-trusted code on different cores. For instance, a car manufacturer may consider the software provided by suppliers as non-trusted code, or an ECU integrator may consider the car manufacturer’s code as non-trusted for responsibility reasons. The second method consists in executing safety-critical software components in a redundant manner, possibly with a voting system that chooses the output given by a majority of the duplicated runnables. Finally, multicore architectures enable easier implementation of function monitoring. In this case, the proper execution of some functions on one core can be monitored from another core.

5.2.3 Dedicated use of cores

Finally, another important use case taking advantage of a multicore ECU consists in using a core to handle specific low-level services. In the context of AUTOSAR OS, a core could serve as a dedicated I/O controller, execute the communication stack or the whole set of basic software modules, while some other core would only take care of application-level software components. For instance, one core can be used to run the time-triggered application while a second core handles the interrupts as well as the event-triggered runnables.



Chapter 6

PARTITIONED SCHEDULING OF TASKS ON AUTOSAR OS

6.1 AUTOSAR OS

AUTOSAR (AUTomotive Open System ARchitecture) is an open and standardized automotive software architecture, jointly developed by automobile manufacturers, suppliers and tool developers. The software platform structure of AUTOSAR OS is shown in fig. 6.1.

Fig 6.1: Software platform structure of AUTOSAR OS [8]

6.2 AUTOSAR Basic Software

Basic Software is the standardized software layer which provides services to the AUTOSAR Software Components and is necessary to run the functional part of the software. It does not fulfill any functional job itself and is situated below the AUTOSAR Runtime Environment. The Basic Software contains standardized and ECU specific components. The former include:

6.2.1 Services

System services such as diagnostic protocols; NVRAM, flash and memory management.

6.2.2 Communication

Communication framework (e.g. CAN, LIN, FlexRay...), I/O management, network management.

6.3 Operating System

As AUTOSAR aims at an architecture that is common for all vehicle domains, it specifies the requirements for an AUTOSAR Operating System. The basic features of the AUTOSAR OS are that it
a) is configured and scaled statically
b) is amenable to reasoning about real-time performance
c) provides a priority-based scheduling policy
d) provides protective functions (memory, timing etc.) at run-time
e) is hostable on low-end controllers and without external resources

This feature set defines the type of OS commonly used in the current generation of automotive ECUs. AUTOSAR allows the inclusion of proprietary OSs in Basic Software components. To make the interfaces of these components AUTOSAR compliant, the proprietary OS must be abstracted to an AUTOSAR OS. The standard OSEK OS (ISO 17356-3) is used as the basis for the AUTOSAR OS.



6.4 Microcontroller Abstraction

Access to the hardware is routed through the Microcontroller Abstraction Layer (MCAL) to avoid direct access to microcontroller registers from higher-level software. MCAL is a hardware specific layer that ensures a standard interface to the components of the Basic Software. It manages the microcontroller peripherals and provides the components of the Basic Software with microcontroller independent values. MCAL implements notification mechanisms to support the distribution of commands, responses and information to different processes. Among others it can include:
a) Digital I/O (DIO)
b) Analog/Digital Converter (ADC)
c) Pulse Width (De)Modulator (PWM, PWD)
d) EEPROM (EEP)
e) Flash (FLS)
f) Capture Compare Unit (CCU)
g) Watchdog Timer (WDT)
h) Serial Peripheral Interface (SPI)
i) I2C Bus (IIC)

6.5 ECU specific components

6.5.1 ECU Abstraction

The ECU Abstraction provides a software interface to the electrical values of any specific ECU in order to decouple higher-level software from all underlying hardware dependencies.

6.5.2 Complex Device Driver (CDD)

The CDD allows direct access to the hardware, in particular for resource-critical applications.



Chapter 7

MULTICORE SCHEDULING IN AUTOSAR OS

7.1 Introduction

One of the outcomes of the AUTOSAR initiative is to help OEMs shift from the “one function per ECU” paradigm to more centralized architecture designs. There are currently several existing and suggested HW architectures for multi-core microprocessors. There is considerable variation in the features offered by these architectures. Therefore this section attempts to capture a common set of architectural features required for multi-core:
a) More than one core on the same piece of silicon.
b) The hardware supports some atomic Test-And-Set functionality or similar functionality that can be used to build a critical section shared between cores. Additional atomic operations may exist.
c) If per-core caches exist, AUTOSAR requires support for RAM-cache coherency in HW or in SW. In SW means that the cache controller can be programmed by the SW in such a way that it invalidates cache lines or excludes certain memory regions from caching.
d) The cores may have the same instruction set; at least a common basic instruction set is available on all cores. Core specific add-ons may exist but they are not taken into account.


7.2 Memory features

a) Shared RAM is available to all cores; at least all cores can share a substantial part of the memory.
b) Flash shall be shared between all cores at least. However, performance can be improved if Flash/RAM can be partitioned so that there are separate pathways from cores to Flash.
c) A single address space is assumed, at least in the shared parts of the memory address space.
d) The AUTOSAR Multi-Core architecture shall be capable of running on systems that do and do not support memory protection. If memory protection exists, all cores are covered by a hardware based memory protection.

The OS can be entered on each core in parallel. This optimizes scalability towards multiple cores. The cores schedule independently.

Fig 7.1: Multicore scheduling in AUTOSAR OS [5]

Priorities are assigned to TASKS. The cores schedule independently from each other. This implies that the schedule on one core does not consider the scheduling on the other cores. A low priority TASK on one core may run in parallel with a high priority TASK on another core.



7.3 Static cyclic and fixed priority scheduling

Static cyclic scheduling of elementary software components, or runnables, is common because there are usually many more runnables than the maximum number of tasks allowed by automotive operating systems such as OSEK/VDX or AUTOSAR OS. For this reason, runnables must be grouped together and scheduled within a sequencer task (also called dispatcher task). The first step of the approach is to partition the runnable set onto the different cores. The second and last step consists in determining the offsets between the runnables allocated on each core so as to balance the load over time.

7.4 Model description

In this case study, we consider a large set of n periodic elementary software components, also called runnables, that are to be allocated on an ECU consisting of m identical cores. In practice, a runnable can be implemented as a function called, whenever appropriate, within the body of an OS task.

7.4.1 Runnable characteristics

The ith runnable is denoted by Ri = (Ci, Ti, Oi, {R}, Pi), where
Ci = Worst-Case Execution Time (WCET) of Ri
Ti = Execution period of Ri
Oi = Offset of Ri

The offset of a runnable is the release date of the first instance of that runnable; subsequent instances are then released periodically. The choice made for the offset values has a direct influence on the distribution of the workload over time.



7.4.2 Dispatcher task Runnables are scheduled on their designated core using a dispatcher task, or “sequencer task”, that stores the runnable activation times in a table and releases them at the right points in time. A dispatcher task is characterized by the duration of the dispatch table Tcycle that is executed in a cyclic manner2, and by a quantum Ttic which is the duration of a slot in the table. Dispatch table holds Tcycle/Ttic slots.

7.4.3 Assumptions

In this paper, we make a set of working assumptions which, in our experience, can most often be met in today’s automotive applications:

a) Each runnable is executed strictly periodically. As a result, the whole trajectory of the [...]
b) The runnables are assumed to be offset-free, in the sense that the offset of a runnable can be chosen freely during the construction of the dispatch table, with the objective to uniformize the CPU load over a scheduling cycle.
c) The worst-case execution times of the runnables are assumed to be small compared to Ttic. Typical values for the case we consider would be 5 ms for Ttic and Ci ≤ 300 µs.
d) All cores are identical regarding their processing speed.
e) There are no dependencies between runnables allocated on different cores. Therefore, all cores can be scheduled independently. This assumption is in line with the choices made by AUTOSAR regarding multicore architecture.

7.5 Scheduling condition

In our context, the system is schedulable, and thus can be safely deployed, if and only if, on each core:

a) The runnables are executed strictly periodically.
b) The initial offset of each runnable is smaller than its period.
c) The sum of the WCETs of the runnables allocated in each slot does not exceed a given threshold, which is typically chosen as the duration of the slot, Ttic.
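The three conditions can be checked mechanically for one core. The sketch below is only an illustration under stated assumptions: the function name `is_schedulable` and the tuple layout are hypothetical, all times share one unit, periods and offsets are multiples of Ttic, and Tcycle is a multiple of every period.

```python
def is_schedulable(runnables, t_tic, t_cycle, threshold=None):
    """Check conditions (a) to (c) for the runnables of ONE core.

    runnables: list of (wcet, period, offset) tuples (assumed layout).
    threshold: maximum load per slot, defaults to t_tic.
    """
    if threshold is None:
        threshold = t_tic
    n_slots = t_cycle // t_tic
    load = [0.0] * n_slots
    for wcet, period, offset in runnables:
        if not 0 <= offset < period:  # condition (b)
            return False
        # Condition (a): instances are released strictly every `period`.
        for slot in range(offset // t_tic, n_slots, period // t_tic):
            load[slot] += wcet
    return max(load) <= threshold  # condition (c)


core = [(2, 10, 0), (1, 10, 5), (4, 20, 5)]
print(is_schedulable(core, t_tic=5, t_cycle=20))  # True
```

In this example the busiest slot carries a load of 5, exactly the slot duration, so the set passes with the default threshold and fails with any stricter one.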





7.6 Building tasks as a bin-packing problem

It is assumed that the number of cores is fixed. We first distribute all the runnables on the cores without checking the schedulability condition at that stage. Assigning n runnables to m cores is like subdividing a set of n elements into m non-empty subsets, and the number of such subdivisions grows very quickly with n, which rules out an exhaustive search. Considering this complexity, and to balance the utilization of the processor cores as evenly as possible, we propose a heuristic based on the worst-fit decreasing bin-packing scheme for a fixed number of bins (where the “bins” are the processor cores). The heuristic is given in Algorithm 7.1.

Algorithm 7.1: Partitioning of the runnable set [2]
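Since the listing of Algorithm 7.1 is not reproduced here, the following is only a generic worst-fit decreasing sketch, not the author's exact algorithm. It assumes that the "size" of a runnable is its utilization Ci/Ti and ignores any locality or inter-runnable constraints; the function name and the dictionary layout are hypothetical.

```python
def partition_worst_fit_decreasing(runnables, n_cores):
    """Spread runnables over a fixed number of cores.

    runnables: dict name -> (wcet, period) (assumed layout).
    Returns (cores, util): the names placed on each core and the
    resulting utilization of each core.
    """
    cores = [[] for _ in range(n_cores)]
    util = [0.0] * n_cores
    # "Decreasing": handle the largest utilization C_i / T_i first.
    order = sorted(runnables,
                   key=lambda r: runnables[r][0] / runnables[r][1],
                   reverse=True)
    for name in order:
        wcet, period = runnables[name]
        # "Worst fit": pick the bin with the most room, i.e. the least loaded core.
        target = min(range(n_cores), key=util.__getitem__)
        cores[target].append(name)
        util[target] += wcet / period
    return cores, util


cores, util = partition_worst_fit_decreasing(
    {"A": (4, 10), "B": (3, 10), "C": (2, 10), "D": (1, 10)}, n_cores=2)
print(cores)  # [['A', 'D'], ['B', 'C']]
```

Worst fit is chosen over first fit or best fit because the goal here is to even out the load across a fixed number of cores, not to minimize the number of cores used.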





Chapter 8

CASE STUDY: CAN FRAME ALLOCATION

8.1 CAN (Controller Area Network)

CAN is a serial protocol originally developed by Robert Bosch GmbH for communication among the components of cars. CAN finds many applications in automobiles:

1) A low-speed CAN bus may be employed to operate window and seat controls.
2) A high-speed CAN bus may be employed for engine management or brake control.
3) Other applications include engine sensors and anti-skid systems.

The success of CAN is due to the inexpensive electronic components (ICs) available for managing the communication protocol. The number of CAN nodes on each vehicle is typically 5-10 for the engine system, 10 for the body part, and 15, 20, 25 or more for the passenger compartment. The CAN bus is a balanced (differential) 2-wire interface. The bus supports data transfer rates up to 1 Mbit/s and 11-bit message identifiers.

8.2 CAN Bus frame

The CAN bus interface uses an asynchronous transmission scheme in which each frame is delimited by a start-of-frame bit and an end-of-frame field. CAN uses different frames with distinct functions:

a) Data Frame: sent from transmitter to receiver
b) Remote Frame: transmitted by a receiver to request data
c) Error Frame
d) Overload Frame





8.2.1 CAN bus Data Frame

Fig 8.1: CAN data frame [3]

a) The data frame is composed of an Arbitration field, Control field, Data field, CRC field and ACK field.
b) The frame begins with a 'Start of frame' [SOF] bit and ends with an 'End of frame' [EOF] space.
c) The data field may be from 0 to 8 bytes long.
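The two numeric constraints mentioned in this chapter (an 11-bit identifier and a data field of 0 to 8 bytes) can be captured in a small validating container. This is a sketch only: the class `CanDataFrame` and its attribute names are assumptions, and the CRC, ACK, SOF and EOF parts of the frame are deliberately not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanDataFrame:
    """Illustrative view of a CAN data frame (names are assumptions)."""
    can_id: int        # 11-bit identifier carried in the arbitration field
    data: bytes = b""  # payload of the data field, 0 to 8 bytes

    def __post_init__(self):
        if not 0 <= self.can_id < 2 ** 11:
            raise ValueError("identifier must fit in 11 bits")
        if len(self.data) > 8:
            raise ValueError("data field holds at most 8 bytes")

    @property
    def dlc(self) -> int:
        """Data length code, carried in the control field."""
        return len(self.data)


frame = CanDataFrame(0x123, bytes([1, 2, 3]))
print(frame.dlc)  # 3
```

Rejecting invalid values at construction time keeps any later scheduling or response-time computation from silently working on frames that could never be sent on the bus.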

8.3 CAN frame allocation and scheduling

CAN has been, and will most likely remain, a prominent network in cars for at least two more car generations. One of the issues CAN will have to face is the growth of traffic with the increasing amount of data exchanged between Electronic Control Units (ECUs). A car manufacturer has to make sure that the set of frames will be schedulable, i.e. that the response time of the frames is kept small enough to ensure that the data is still fresh enough when used at the receiver end. Clearly, for most messages, even periodic ones, we are in the realm of soft real-time constraints: a deadline can occasionally be missed without major consequences. However, the issue on CAN is that worst-case response times increase drastically with the load.

8.4 Least loaded algorithm

The problem of scheduling runnables can be addressed using the “least-loaded” algorithm, which was proposed for frame offset allocation on a CAN network. Considering a runnable Ri of period Ti, there are Ti/Ttic possible slots for allocating this runnable. As a result, there are $\prod_{i=1}^{n} T_i/T_{tic}$ alternative schedules for the n runnables.
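As a worked illustration of how quickly the number of alternative schedules grows, take assumed values Ttic = 5 ms and four runnables with periods 10, 20, 20 and 50 ms (these numbers are chosen only for the example):

```latex
\prod_{i=1}^{n} \frac{T_i}{T_{tic}}
  = \frac{10}{5} \cdot \frac{20}{5} \cdot \frac{20}{5} \cdot \frac{50}{5}
  = 2 \cdot 4 \cdot 4 \cdot 10
  = 320
```

Four runnables already give 320 candidate schedules, so an exhaustive search is not realistic for the hundreds of runnables found on a real ECU, which motivates the use of a heuristic.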




The load of a slot is the sum of the Ci of the runnables {Ri} already assigned to this slot. The intuition behind the heuristic is simple: at each step, we assign the next runnable to the least loaded slot, as described in Algorithm 8.1.

Algorithm 8.1: Assigning runnables to slots: the “least loaded” heuristic [2]

For practical applications, ties at Step (1) are broken using highest WCET first, and ties at Step (2a) by choosing the central slot of the longest sequence of consecutive slots having the minimum load. This helps to separate load peaks, which is important from the ECU designer's point of view. As an illustration, applying the least-loaded heuristic to the set of runnables Ri(Ti, Ci): R1(10, 2), R2(10, 1), R3(20, 4), R4(20, 2) leads to the dispatch table shown in Figure 8.2.

Figure 8.2: Example of dispatch table [5]

The resulting distribution of the load is:

Table 8.1: Load repartition corresponding to the dispatch table in Figure 8.2 [5]
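The least-loaded heuristic can be sketched in a few lines of Python. Several points are assumptions, because the algorithm listing is not reproduced in this text: runnables are processed by increasing period (with the highest WCET first on ties), a tie between equally loaded slots is resolved by taking the lowest slot index instead of the central-slot rule, and the example uses Ttic = 5 and Tcycle = 20. The resulting table may therefore differ from Figure 8.2.

```python
def least_loaded(runnables, t_tic, t_cycle):
    """Assign an offset to each runnable with the least-loaded heuristic.

    runnables: dict name -> (period, wcet) (assumed layout).
    Returns (offsets, load): offset per runnable and load per slot.
    """
    n_slots = t_cycle // t_tic
    load = [0] * n_slots
    offsets = {}
    # Assumed Step (1): increasing period, ties broken by highest WCET first.
    order = sorted(runnables, key=lambda r: (runnables[r][0], -runnables[r][1]))
    for name in order:
        period, wcet = runnables[name]
        span = period // t_tic  # number of candidate slots, T_i / T_tic
        # Step (2a): least loaded slot among the first `span` slots
        # (simplification: the lowest index wins a tie).
        first = min(range(span), key=load.__getitem__)
        offsets[name] = first * t_tic
        # Update the load of every slot that holds an instance.
        for slot in range(first, n_slots, span):
            load[slot] += wcet
    return offsets, load


offsets, load = least_loaded(
    {"R1": (10, 2), "R2": (10, 1), "R3": (20, 4), "R4": (20, 2)},
    t_tic=5, t_cycle=20)
print(offsets)  # {'R1': 0, 'R2': 5, 'R3': 5, 'R4': 15}
print(load)     # [2, 5, 2, 3]
```

Each runnable only needs a scan of its first Ti/Ttic slots, so the heuristic stays cheap even for large runnable sets.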





8.4.1 Dealing with non harmonic runnable set

In practice, runnable sets often do not have strictly harmonic periods. As a consequence, the previous results do not hold anymore. In particular, placing a runnable in the least loaded slot of the dispatch table could induce peaks because of the runnable periodicity. Take the following runnable set for instance: R1(10, 2), R2(20, 3), R3(20, 1), R4(50, 2) with Ttic = 5 and Tcycle = 100. Figure 8.3 shows the dispatch table before the allocation of R4.

Figure 8.3: Dispatch table before the insertion of R4 [5]

The resulting distribution of the load is:

Table 8.2: Load repartition corresponding to the dispatch table in Figure 8.3 [4]

At that point, choosing one of the least loaded slots in the dispatch table will make the schedule fail, because R4 will also have to be allocated in a slot with the highest load because of its periodicity. For example, if the first instance of R4 is allocated in slot 1, its next instance will be placed in slot 11 and make the system unschedulable. However, allocating R4 in any even slot is safe.

In order to deal with non-harmonic runnable sets, we need to go through a larger window of slots for the choice of the offsets. In the following, the variable Twindow is equal to the least common multiple (lcm) of the periods of the runnables already scheduled at the current state of the algorithm. Instead of looking for the least loaded slot in the first Ti/Ttic slots, we try to create the smallest peak over Twindow, knowing that the schedule repeats cyclically afterward, as given by Algorithm 8.2.





Algorithm 8.2: Generalized “least-loaded” heuristic [2]
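The idea of minimizing the peak over Twindow can be sketched as follows. This is an interpretation rather than a transcription of Algorithm 8.2, whose listing is not reproduced here: the window is taken as the lcm of the periods handled so far including the current one, ties go to the lowest slot index, and runnables are processed in the order given. The function name and tuple layout are hypothetical; `math.lcm` requires Python 3.9 or later.

```python
from math import lcm


def generalized_least_loaded(runnables, t_tic, t_cycle):
    """Choose offsets so that each placement creates the smallest peak.

    runnables: list of (name, period, wcet) in processing order (assumed).
    Returns (offsets, load): offset per runnable and load per slot.
    """
    n_slots = t_cycle // t_tic
    load = [0] * n_slots
    offsets = {}
    window = 1  # lcm of the periods handled so far
    for name, period, wcet in runnables:
        window = lcm(window, period)  # assumption: current period included
        span = period // t_tic
        w_slots = min(window, t_cycle) // t_tic

        def peak(first):
            # Highest existing load among the slots this runnable would hit.
            return max(load[s] for s in range(first, w_slots, span))

        first = min(range(span), key=peak)  # lowest index wins a tie
        offsets[name] = first * t_tic
        for slot in range(first, n_slots, span):
            load[slot] += wcet
    return offsets, load


offsets, load = generalized_least_loaded(
    [("R1", 10, 2), ("R2", 20, 3), ("R3", 20, 1), ("R4", 50, 2)],
    t_tic=5, t_cycle=100)
print(offsets["R4"], max(load))  # 0 4
```

With these inputs a search limited to the first Ti/Ttic slots would put R4 in the slot of load 1, and its second instance would then land on a slot of load 3, giving a peak of 5. Looking at the whole window avoids this and keeps the peak at 4.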

8.4.2 Improvement: placing outliers first

Algorithms 8.1 and 8.2 construct the schedule of runnables with arbitrary periods and possibly with locality and inter-runnable constraints. These algorithms sometimes do not handle well runnable sets where a few runnables with a low frequency have a very large WCET. In practice, runnables with a large WCET tend to have a large period. As a result, runnables with a large WCET are usually processed late in the runnable allocation process, which explains the load peaks. In order to reduce those peaks, the scheduling algorithm is improved by processing some runnables with a large WCET first. We define the WCET threshold Ccritic = µ + kσ, where:

a) µ = average of the distribution of {Ci}
b) σ = standard deviation of the distribution of {Ci}
c) k = an integer value

The runnables with Ci larger than Ccritic are allocated first. Then, the rest of the runnables are processed as done in Algorithm 8.2. This new version of the load-balancing algorithm is referred to as Generalized least-loaded sigma, or G-LLkσ for short. In the experiments that follow, k is chosen equal to 1.
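The outlier selection step can be illustrated with the standard library. The sketch assumes that σ is the population standard deviation of the WCETs (the text does not say whether the population or the sample form is meant); the function name `split_outliers` and the sample WCET values are hypothetical.

```python
from statistics import mean, pstdev


def split_outliers(wcets, k=1):
    """Split runnables around the threshold C_critic = mu + k * sigma.

    wcets: dict name -> WCET (assumed layout).
    Returns (outliers, others), each a list of names in input order.
    """
    values = list(wcets.values())
    # Assumption: population standard deviation of the WCET distribution.
    c_critic = mean(values) + k * pstdev(values)
    outliers = [name for name, c in wcets.items() if c > c_critic]
    others = [name for name, c in wcets.items() if c <= c_critic]
    return outliers, others


# Hypothetical WCETs in microseconds.
outliers, others = split_outliers({"R1": 10, "R2": 20, "R3": 30, "R4": 300})
print(outliers)  # ['R4']
print(others)    # ['R1', 'R2', 'R3']
```

The outliers would be placed first and the remaining runnables handed to the generalized least-loaded step, so that the many small runnables fill the gaps left around the few large ones.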





Chapter 9

SIMULATION USING NETCAR ECU

9.1 RTaW NETCAR ECU

NETCAR ECU is a software tool that helps designers of complex systems optimize the scheduling of tasks (or runnables) so as to reduce the load peaks while meeting task deadline constraints. RTaW-ECU enables the designers to squeeze the most from the CPUs. The tool is developed by the company RTaW (RealTime-at-Work).

RTaW-Sim supports discrete-event simulation of Controller Area Network (CAN) buses, providing the frame response time distributions and statistics about the frame buffer usage at the microcontroller and communication controller level. RTaW-Sim helps the designer choose the right communication stacks (e.g. waiting queue policy) and communication controllers (e.g. number of buffers), and configure them. RTaW-Sim also enables the designer to perform Simulation-Based Fault Injection (SBFI), for instance analyzing the effects of clock drifts or the impact of transmission errors on transmission latencies.

9.1.1 Features of NETCAR ECU

a) Handles multicore CPUs
b) Optimizes the number of tasks per time slot or the total workload per time slot
c) Reduces load peaks and thus optimizes the CPU usage
d) Offers better resilience against wrong estimations of the execution times or activation patterns
e) Achieves guaranteed performance in terms of CPU load
f) Can be used jointly with a Fixed-Priority Preemptive (FPP) scheduling tool to conceive mixed scheduling solutions
g) Incremental scheduling: new tasks can be added to an existing system
h) Very fast computation





9.2 Simulation of task sets using NETCAR ECU

Here we assess the ability of the algorithms to uniformize the CPU load over time and to keep providing feasible solutions at very high load levels. The task sets considered here are harmonic, with periods in the set {10, 50, 100, 500, 1000} ms, which is a large subset of the periods used in the real ECU. The WCETs vary from 10 µs to 300 µs with a probability derived from the real distribution on the body gateway ECU. We choose the average CPU load slightly below 94% so that feasibility is ensured. Figure 9.1 shows the distribution of the load over an LCM of the periods with the least-loaded algorithm. A set of runnables corresponding to slightly less than 94% of CPU load is scheduled on a 4-core ECU.

Fig 9.1: The distribution of the load over an LCM of the periods with LL [4]

The x axis shows the time slots and the y axis shows the total load per slot (sum of worst-case execution times).





Fig 9.2 shows the distribution with G-LL1σ.

Fig 9.2: The distribution with G-LL1σ [2]

As can be seen, the load peaks are much smaller with Generalized Least Loaded 1σ (peak load is 94.6%) than with Least Loaded (peak load is 98.4%). This can be explained by the fact that the runnables with Ci larger than Ccritic are allocated first: the few largest runnables are placed first, and the numerous smaller ones, placed afterward, fill the gaps in the dispatch table. This reduces the peak load per slot.





Chapter 10

CONCLUSION

The formal verification of task schedulability is becoming more important in the automotive domain. We introduced a computational model for the response time analysis (RTA) of automotive real-time systems. Such systems often consist of subsystems that comprise a collection of tasks, called transactions, which are triggered by external events. Today’s automotive design methodologies need to be adapted to multicore computing, and there is a wide range of technical problems to be solved. In this paper, we have presented practical scheduling solutions well suited to the basic use case, which is to execute a large number of software components on the same multicore processor in order to reduce the number of ECUs. The set of algorithms described in this paper has been shown, on realistic case studies, to be versatile and efficient in terms of CPU usage optimization.


