Adaptive QoS Control for Real-Time Systems Chenyang Lu CSE 520S
Challenges Ø Classical real-time scheduling theory relies on accurate knowledge about workload and platform. New challenges under uncertainties Ø Maintain robust real-time properties in the face of q unknown and varying workload q system failure q system upgrade Ø Tuning, testing and certification of adaptive real-time systems
Challenge 1: Workload Uncertainties Ø Task execution times q Heavily influenced by sensor data or user input q Unknown and time-varying Ø Disturbances q Aperiodic events q Resource contention from subsystems q Denial of Service attacks Ø Examples: power grid management, autonomous vehicles.
Challenge 2: System Failure Ø Only maintaining functional reliability is not sufficient. Must also maintain robust real-time properties! 1. Norbert fails. 2. Move its tasks to other processors. Hermione & Harry are overloaded!
Challenge 3: System Upgrade Ø Goal: Portable application across HW/OS platforms q Same application works on multiple platforms Ø Existing real-time middleware ü Supports functional portability û Lacks QoS portability: must manually reconfigure applications on different platforms to achieve desired QoS Manual process: profile execution times → determine/implement allocation and task rates → test/analyze schedulability. Time-consuming and expensive!
Example: nORB Middleware [Figure: nORB application architecture. Labels: client (timer thread, conn. thread), server (conn. thread, priority queues, operation request lanes, worker thread, CORBA objects). Task rates are set manually offline, e.g. T1: 2 Hz, T2: 12 Hz.]
Challenge 4: Certification Ø Uncertainties call for adaptive solutions. But Ø Adaptation can make things worse. Ø Adaptive systems are difficult to test and certify [Figure: an unstable adaptive system. CPU utilization (0 to 1) of P1 and P2 versus time (0 to 300 sampling periods), plotted against the set point.]
Adaptive QoS Control Ø Develop software feedback control in middleware q Achieve robust real-time properties for many applications Ø Apply control theory to design and analyze control algorithms q Facilitate certification of embedded software [Figure: Adaptive QoS Control Middleware sits between the applications (sensor/human input? disturbance?) and the drivers/OS/HW (available resources? HW failure?).] Maintain QoS guarantees without accurate knowledge about workload/platform and without hand tuning
Adaptive QoS Control Middleware Ø FCS/nORB: Single server control Ø FC-ORB: Distributed systems with end-to-end tasks
Feedback Control Real-Time Scheduling Ø Developers specify q Performance specs: CPU utilization = 70%; deadline miss ratio = 1%. q Tunable parameters: range of task rate (digital control loop, video/data display); quality levels (image quality, filters); admission control Ø Guarantee specs by tuning parameters based on feedback q Automatic: no need for hand tuning q Transparent to developers q Performance portability!
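A Feedback Control Loop: FC-U [Figure: the FC-U loop in the middleware. Specs: $U_s = 70\%$. Parameters: $R_1 \in [1, 5]$ Hz, $R_2 \in [10, 20]$ Hz. The monitor samples utilization $U(k)$, the controller computes new rates $\{R_i(k+1)\}$, and the actuator applies them. Sources of uncertainty: sensors and inputs, application, drivers/OS, HW.]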
The FC-U Algorithm
$U_s$: utilization reference; $K_u$: control parameter; $R_i(0)$: initial rate
1. Get utilization $U(k)$ from the Utilization Monitor.
2. Utilization Controller (integral controller): $B(k+1) = B(k) + K_u (U_s - U(k))$
3. Rate Actuator adjusts task rates: $R_i(k+1) = (B(k+1)/B(0)) \, R_i(0)$
4. Inform clients of the new task rates.
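The four steps of FC-U can be sketched as a small simulation. This is a minimal illustration and not the FCS/nORB implementation: the execution times, the gain value, and the reading of B(0) as the estimated utilization at the initial rates are assumptions, and clamping the rates to their allowed ranges is added here for safety although the algorithm statement does not mention it.

```python
def fc_u_step(budget, measured_util, util_ref, gain):
    """Integral controller: B(k+1) = B(k) + K_u * (U_s - U(k))."""
    return budget + gain * (util_ref - measured_util)


def assign_rates(budget, initial_budget, initial_rates, rate_ranges):
    """Rate actuator: R_i(k+1) = (B(k+1) / B(0)) * R_i(0), clamped (assumption)."""
    scale = budget / initial_budget
    return [min(max(scale * r0, lo), hi)
            for r0, (lo, hi) in zip(initial_rates, rate_ranges)]


def simulate_fc_u(periods=30):
    util_ref, gain = 0.70, 0.5                    # U_s = 70%; K_u is an assumed value
    est_exec = [0.06, 0.02]                       # assumed estimated execution times (s)
    actual_exec = [1.25 * c for c in est_exec]    # real times, unknown to the controller
    initial_rates = [2.0, 12.0]                   # Hz
    rate_ranges = [(1.0, 5.0), (10.0, 20.0)]      # Hz
    # Assumption: B(0) is the estimated utilization at the initial rates.
    initial_budget = sum(c * r for c, r in zip(est_exec, initial_rates))
    budget, rates = initial_budget, list(initial_rates)
    history = []
    for _ in range(periods):
        # Step 1: the monitor reports the utilization actually produced.
        measured = sum(c * r for c, r in zip(actual_exec, rates))
        history.append(measured)
        # Step 2: integral control on the utilization error.
        budget = fc_u_step(budget, measured, util_ref, gain)
        # Steps 3 and 4: scale every rate by B(k+1)/B(0) and publish it.
        rates = assign_rates(budget, initial_budget, initial_rates, rate_ranges)
    return history, rates


if __name__ == "__main__":
    history, rates = simulate_fc_u()
    print(history[0], history[-1], rates)
```

In this setup the measured utilization starts at 0.45 and approaches the 0.70 reference even though the controller only knows the (wrong) estimates, which is the point of closing the loop on measured utilization.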
The Family of FCS Algorithms Ø FC-U controls utilization q Performance spec: $U(k) = U_s$ ü Meets all deadlines if $U_s \le$ schedulable utilization bound û Relatively low utilization if the utilization bound is pessimistic Ø FC-M controls miss ratio q Performance spec: $M(k) = M_s$ ü High utilization ü Does not require the utilization bound to be known a priori û Small but non-zero deadline miss ratio: $M(k) > 0$ Ø FC-UM combines FC-U and FC-M q Performance specs: $U_s$, $M_s$ ü Allows higher utilization than FC-U ü No deadline misses in the nominal case q Performance bounded by FC-M
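Feedback Control Loop [Figure: software feedback control loop. The monitor samples the controlled variable of the computing system; the sample is subtracted from the reference to give the error; the controller turns the error into a control input; the actuator changes the manipulated variable.]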
Dynamic Response [Figure: controlled variable versus time, relative to the reference. Annotations: stability, transient state, settling time, steady state, steady-state error.]
Control Analysis Ø Rigorously designed based on feedback control theory Ø Analytic guarantees on q Stability q Steady state performance q Transient state: settling time and overshoot q Robustness against variation in execution time Ø Do not assume accurate knowledge of execution time
FCS/nORB Architecture [Figure: FCS/nORB application architecture. Labels: server, worker thread, CORBA objects, miss monitor, util monitor, controller, rate assigner, rate modulator, client, timer thread, priority queues, conn. threads, feedback lane, operation request lanes.]
Implementation Ø Running on top of COTS Linux Ø Deadline Miss Monitor q Instruments operation request lanes q Time-stamps operation requests and responses on each lane Ø CPU Utilization Monitor q Interfaces with the Linux /proc/stat file q Counts idle time: coarse granularity at the jiffy level (10 ms) Ø Only controls server delay
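A utilization monitor of the kind described here can be sketched by differencing the idle counter in /proc/stat between two samples. This is an illustrative sketch rather than the FCS/nORB code: the function names are hypothetical, and the parsing relies only on the documented layout of the aggregate `cpu` line (user, nice, system, idle, then optional further counters).

```python
def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep at most the first eight counters (user .. steal); guest counters,
    # when present, are already included in user and nice.
    ticks = [int(v) for v in fields[1:9]]
    if len(ticks) < 4:
        raise ValueError("cpu line has fewer than four counters")
    idle = ticks[3]  # the fourth counter is idle time, in jiffies
    return idle, sum(ticks)


def utilization(prev_line, curr_line):
    """CPU utilization over the interval between two samples of the cpu line."""
    prev_idle, prev_total = parse_cpu_line(prev_line)
    idle, total = parse_cpu_line(curr_line)
    elapsed = total - prev_total
    if elapsed <= 0:
        return 0.0  # no jiffies elapsed: nothing to report
    return 1.0 - (idle - prev_idle) / elapsed


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as stat_file:
        return stat_file.readline()
```

Calling `read_cpu_line()` once per sampling period and passing consecutive samples to `utilization` yields U(k). Because the counters advance in jiffies, short sampling periods give a coarse reading, which matches the granularity caveat on the slide.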
Offline or Online? Ø Offline q FCS executed in testing phase on a new platform q Turned off after entering steady state ü No run-time overhead û Cannot deal with varying workload Ø Online û Run-time overhead (actually small) ü Robustness in the face of changing execution times
Set-up Ø OS: Redhat Linux Ø Hardware platform q Server A: 1.8GHz Celeron, 512 MB RAM q Server B: 1.99GHz Pentium 4, 256 MB RAM q Same client q Connected via 100 Mbps LAN Ø Experiment 1. Overhead 2. Steady execution time (offline case) 3. Varying execution time (online case)
Server Overhead Overhead: FC-UM > FC-M > FC-U. FC-UM increases CPU utilization by <1% for a 4 s sampling period. [Figure: server overhead per sampling period (ms, axis 0 to 40) for FC-U, FC-M and FC-UM; sampling period = 4 sec.]
Performance Portability: Steady Execution Time. Same CPU utilization (and no deadline misses) on different platforms without hand-tuning! [Figure: two panels plotting U(k), B(k) and M(k) (0 to 1) versus time (0 to 200 sampling periods of 4 sec), with $U_s = 70\%$. One panel: FC-U on Server A (1.8GHz Celeron, 512 MB RAM). Other panel: FC-U on Server B (1.99GHz Pentium 4, 256 MB RAM).]
Steady-State Deadline Miss Ratio. FC-M enforces the miss ratio spec. FC-U and FC-UM cause no deadline misses. [Figure: average deadline miss ratio in steady state (%) for FC-U, FC-M and FC-UM; FC-M averages 1.49% against $M_s = 1.5\%$.]
Steady-State CPU Utilization. FC-U and FC-UM enforce the utilization spec. FC-M achieves higher utilization. [Figure: average CPU utilization in steady state (%): FC-U 70.01 ($U_s = 70\%$), FC-M 98.93, FC-UM 74.97 ($U_s = 75\%$).]
Robust Guarantees: Varying Execution Time. Same CPU utilization and no deadline misses in steady state despite changes in execution times! [Figure: U(k), B(k) and M(k) (0 to 1) versus time (0 to 400 sampling periods of 4 sec).]
Tolerance to Load Increase Ø Surprise q Server crashes under FC-M when execution time increases q FCS/nORB threads run at real-time priority q Kernel starvation when CPU utilization reaches 100% Ø Tolerance margin of load increase q FC-U, FC-UM: margin = $1/U_s - 1$; with $U_s = 70\%$ the server can tolerate a $(1/0.7 - 1) \approx 43\%$ increase in execution time q FC-M: small and unknown margin; unsuitable when execution time can increase unexpectedly
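One way to read the margin formula: if execution times suddenly grow by a factor of (1 + d) while the system sits at utilization U_s, utilization jumps to U_s(1 + d) before the controller can react, and keeping that at or below 100% requires d ≤ 1/U_s − 1. The short sketch below only evaluates that expression; the function name is hypothetical.

```python
def tolerance_margin(util_ref):
    """Largest fractional execution-time increase that keeps U_s * (1 + d) <= 1."""
    if not 0.0 < util_ref <= 1.0:
        raise ValueError("utilization reference must be in (0, 1]")
    return 1.0 / util_ref - 1.0


if __name__ == "__main__":
    for util_ref in (0.70, 0.75):
        print(f"U_s = {util_ref:.2f}: margin = {tolerance_margin(util_ref):.0%}")
```

A lower utilization reference buys a larger margin, which is the trade-off between FC-U (lower utilization, known margin) and FC-M (high utilization, small and unknown margin).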
Summary of Experimental Results Ø FCS algorithms enforce the specified CPU utilization or miss ratio in steady state q Experimental validation of control design and analysis of FCS Ø Performance portability: FCS/nORB achieves the same performance guarantee when q platform changes q execution time changes (within tolerance margin) Ø Overhead acceptable → FCS can be used online
Summary: FCS/nORB Ø Enable robust, performance-portable real-time software q Program application once → runs on multiple platforms with robust performance guarantees! q FCS/nORB 1.0 release: http://deuce.doc.wustl.edu/fcs_norb Ø Next: FC-ORB q Handle end-to-end tasks q Fault tolerance
References Ø C. Lu, J.A. Stankovic, G. Tao, and S.H. Son, Feedback Control Real-Time Scheduling: Framework, Modeling, and Algorithms, Real-Time Systems, Special Issue on Control-theoretical Approaches to Real-Time Computing, 23(1/2): 85-126, July/September 2002. Ø C. Lu, X. Wang and C.D. Gill, Feedback Control Real-Time Scheduling in ORB Middleware, IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), May 2003. Critique
Adaptive QoS Control Middleware Ø FCS/nORB: Single server control Ø FC-ORB: Distributed systems with end-to-end tasks
End-to-End Task Model Ø Periodic task $T_i$ = chain of subtasks $\{T_{ij}\}$ on different processors q All subtasks run at the same rate q End-to-end deadline Ø Task rate can be adjusted within a range q Trade-off between video quality and rate q Higher rate → better video quality & higher CPU utilization [Figure: tasks $T_1$ (subtasks $T_{11}$, $T_{12}$, $T_{13}$), $T_2$ and $T_3$ on processors $P_1$, $P_2$, $P_3$, with precedence constraints between subtasks.]
End-to-End Utilization Control Ø CPU utilization q Too high → system overload → crash q Too low → poor performance (e.g. poor video quality) q Utilization < schedulable bound → meet deadlines Ø Uncertainties: varying task execution times q Adjust task rates to compensate for variations [Figure: tasks $T_1$ (subtasks $T_{11}$, $T_{12}$, $T_{13}$), $T_2$ and $T_3$ on processors $P_1$, $P_2$, $P_3$, with precedence constraints between subtasks.]
Challenges Ø Multi-Input-Multi-Output (MIMO) control Ø Utilizations are coupled due to end-to-end tasks q Rate change affects all processors in the task chain Ø Constraints on task rates Ø Stability assurance [Figure: a controller adjusting tasks $T_1$ (subtasks $T_{11}$, $T_{12}$, $T_{13}$), $T_2$ and $T_3$ on processors $P_1$, $P_2$, $P_3$, annotated with example CPU utilizations between 30% and 80%.]
EUCON: End-to-end Utilization CONtrol q Centralized control q Designed based on Model Predictive Control (MPC) theory q Invoked periodically to control the utilizations of all processors [Figure: EUCON feedback loop. Inputs to the Model Predictive Controller: desired utilization bounds $B_1, \dots, B_n$ and the allowed rate ranges for tasks $[R_{min,1}, R_{max,1}], \dots, [R_{min,m}, R_{max,m}]$ (constraints). Controlled variables: CPU utilizations $u_1(k), \dots, u_n(k)$ from the utilization monitors (UM). Manipulated variables: task rate changes $\Delta r_1(k), \dots, \Delta r_m(k)$ sent to the rate modulators (RM).]
Control Theoretic Methodology 1. Model the controlled system 2. Design a controller 3. Analyze stability
Dynamic Model: One Processor
$u_i(k) = u_i(k-1) + g_i \sum_{T_{jl} \in S_i} c_{jl} \, \Delta r_j(k-1)$
Ø $S_i$: set of subtasks on $P_i$ Ø $c_{jl}$: estimated execution time of $T_{jl}$ running on $P_i$ q may not be correct Ø $g_i$: utilization gain of $P_i$ q ratio between actual and estimated change in utilization q unknown: models uncertainty in execution times
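Dynamic Model: Multiple Processors
$u(k) = u(k-1) + G F \, \Delta r(k-1)$
G: diagonal matrix of utilization gains. F: subtask allocation matrix; models the coupling among processors. $f_{ij} = c_{jl}$ if task $T_j$ has a subtask $T_{jl}$ on processor $P_i$; $f_{ij} = 0$ if $T_j$ has no subtask on $P_i$.
Example with tasks $T_1$ (subtask $T_{11}$), $T_2$ (subtasks $T_{21}$, $T_{22}$) and $T_3$ (subtask $T_{31}$) on processors $P_1$ and $P_2$:
$F = \begin{bmatrix} c_{11} & c_{21} & 0 \\ 0 & c_{22} & c_{31} \end{bmatrix}$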
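The matrix model can be exercised numerically with a few lines of code. The sketch below assumes an allocation in which T1 runs only on P1, T2 has one subtask on each processor, and T3 runs only on P2; the execution-time values and the function name are made up for illustration.

```python
def step_utilization(u_prev, gains, alloc, rate_changes):
    """One step of u(k) = u(k-1) + G F dr(k-1), with G = diag(gains)."""
    return [u + g * sum(f * dr for f, dr in zip(row, rate_changes))
            for u, g, row in zip(u_prev, gains, alloc)]


# Hypothetical estimated execution times in seconds.
c11, c21, c22, c31 = 0.02, 0.01, 0.03, 0.02

# Rows are processors P1, P2; columns are tasks T1, T2, T3 (assumed allocation).
F = [[c11, c21, 0.0],
     [0.0, c22, c31]]

gains = [1.0, 1.0]   # g_i = 1 means the estimates happen to be exact
u0 = [0.50, 0.50]

# Raising only T2's rate by 2 Hz changes the utilization of both processors,
# because T2 has a subtask on each of them.
u1 = step_utilization(u0, gains, F, [0.0, 2.0, 0.0])

if __name__ == "__main__":
    print(u1)
```

Changing a single task rate moves every processor in that task's chain, which is the coupling that motivates a MIMO controller instead of one independent loop per processor.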
Model Predictive Control At a sampling instant: Ø Compute inputs in several future sampling periods, $\Delta r(k), \Delta r(k+1), \dots, \Delta r(k+M-1)$, to minimize a cost function in the future Ø Cost in the future is predicted using i) feedback $u(k-1)$ ii) approximate dynamic model Ø Apply $\Delta r(k)$ to the system At the next sampling instant: Ø Shift time and re-compute $\Delta r(k+1), \Delta r(k+2), \dots, \Delta r(k+M)$ based on feedback $u(k)$
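The receding-horizon idea can be illustrated in the simplest possible setting: one task on one processor, with a control horizon of M = 1. This is a deliberately reduced sketch, not the EUCON controller: the exponential reference trajectory, the horizon length, and all numeric values are assumptions, and the scalar case lets the constrained least-squares problem be solved in closed form instead of with a solver.

```python
import math


def mpc_rate_change(u_meas, rate, set_point, est_exec, rate_range,
                    horizon=2, ts_over_tref=0.25):
    """One receding-horizon step for one task on one processor (M = 1)."""
    # Reference trajectory: exponential approach from the measured
    # utilization to the set point over the prediction horizon.
    refs = [set_point - math.exp(-ts_over_tref * i) * (set_point - u_meas)
            for i in range(1, horizon + 1)]
    # With M = 1 the model predicts u_meas + est_exec * dr at every future step,
    # so the least-squares cost is minimized where that equals the mean reference.
    dr = (sum(refs) / horizon - u_meas) / est_exec
    # Rate constraint: for a one-dimensional convex quadratic, clamping the
    # unconstrained minimizer to the interval gives the constrained optimum.
    lo, hi = rate_range
    return min(max(rate + dr, lo), hi) - rate


def simulate(gain, periods=100):
    est_exec, set_point = 0.05, 0.70      # assumed estimate (s) and set point
    rate, rate_range = 5.0, (1.0, 40.0)   # assumed initial rate and range (Hz)
    u = gain * est_exec * rate            # actual utilization; gain is unknown to the controller
    for _ in range(periods):
        # Only the first planned move is applied; the plan is recomputed
        # from fresh feedback at the next sampling instant.
        dr = mpc_rate_change(u, rate, set_point, est_exec, rate_range)
        rate += dr
        u += gain * est_exec * dr         # plant: u(k) = u(k-1) + g * c * dr(k-1)
    return u, rate


if __name__ == "__main__":
    print(simulate(gain=0.5))
```

With a gain of 0.5 (actual execution times half the estimates) the loop still settles at the set point, because the model error is corrected by feedback at every sampling instant rather than trusted over the whole horizon.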
Model Predictive Controller in EUCON [Figure: controller internals. Inputs: utilizations $u_1(k), \dots, u_n(k)$ and set points $B_1, \dots, B_n$. The reference trajectory is the desired trajectory for u(k) to converge to B; the cost function measures the difference from the reference trajectory; a constrained optimization solver (least squares solver) uses the system model and the rate constraints to produce the rate changes $\Delta r_1(k+1), \dots, \Delta r_m(k+1)$.]
Stability Analysis Ø Stability: system converges to equilibrium point from any initial condition q Equilibrium point = utilization set points B q Utilization of all processors → their set points whenever feasible Ø Derive stability condition in terms of G q Tolerable range of variation in execution times Guarantees on utilization despite uncertainty!
Simulation: Stable System [Figure: CPU utilization (0 to 1) of P1 and P2 versus time (0 to 300 sampling periods), plotted against the set point.] Execution time factor = 0.5 (actual execution times = ½ estimates)
Simulation: Unstable System [Figure: CPU utilization (0 to 1) of P1 and P2 versus time (0 to 300 sampling periods), plotted against the set point.] Execution time factor = 7 (actual execution times = 7x estimates)
Stability Ø Stability condition → tolerable range of execution times. Analytical assurance on utilizations despite uncertainty. Overestimation of execution times prevents oscillation. [Figure: predicted bound for stability; axis: actual execution time / estimation.]
FC-ORB: Feedback Controlled Object Request Broker Ø End-to-end utilization control q Maintains desired utilizations on all processors Ø End-to-end ORB architecture q Specialized for rate adaptation Ø Task migration q Reliability in terms of both functionality and real-time performance
End-to-End Utilization Control Service q Implements EUCON (End-to-end Utilization CONtrol) q Provides functional and performance portability [Figure: a Model Predictive Controller reads utilizations (controlled variables) and issues rate changes (manipulated variables) for three nodes, each with a rate modulator, priority manager and utilization monitor, linked by remote request lanes.]
End-to-End Object Request Broker Ø Release guard for end-to-end tasks Ø Priority management q Rate adaptation → continuous priority changes û Thread-per-priority → high overhead ü Thread-per-subtask: change priority only when the order of task rates changes [Figure: three nodes, each with a rate modulator, priority manager and utilization monitor, linked by remote request lanes.]
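The thread-per-subtask rule can be illustrated with a check that runs after each rate update. The sketch assumes rate-monotonic priorities (higher rate, higher priority); the function names and the tie-breaking rule are hypothetical and not taken from FC-ORB.

```python
def rm_priority_order(rates):
    """Task names from highest to lowest rate-monotonic priority.

    Ties are broken by task name so that the order is deterministic.
    """
    return sorted(rates, key=lambda task: (-rates[task], task))


def priorities_need_update(old_rates, new_rates):
    """True only when the rate change alters the relative order of the tasks."""
    return rm_priority_order(old_rates) != rm_priority_order(new_rates)


if __name__ == "__main__":
    old = {"T1": 10.0, "T2": 20.0}
    print(priorities_need_update(old, {"T1": 12.0, "T2": 18.0}))  # order unchanged
    print(priorities_need_update(old, {"T1": 25.0, "T2": 18.0}))  # T1 overtakes T2
```

Most rate adjustments made by the controller are small and leave the ordering intact, so thread priorities are touched far less often than the rates themselves.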
Task Migration Ø Fault model: permanent processor failure Ø Subtasks have backups on different processors Ø Utilization control + fault-tolerance q Automatic controller reconfiguration q Handle overload caused by task migration [Figure: a Model Predictive Controller receiving utilizations $u_1(k)$, $u_2(k)$, $u_3(k)$ and issuing rate changes $\Delta r_1(k)$, $\Delta r_2(k)$ to three nodes, each with a rate modulator, priority manager and utilization monitor, linked by remote request lanes.]
FC-ORB Implementation Ø Implemented based on FCS/nORB, nORB and ACE Ø Specialized for memory-constrained distributed real-time systems Ø 7017 lines of C++ code Ø Controller is implemented as a Dynamic Link Library (DLL) generated by MATLAB
Experimental Setup Ø 12 tasks (25 subtasks) and 4 Pentium IV processors Ø KURT Linux 2.4.22 Ø Rate Monotonic Scheduling Ø Subtasks on Norbert have backups on other processors [Figure: allocation of normal and backup subtasks on Harry, Ron, Hermione and Norbert.]
Goal 1: Robust Utilization Control [Figure annotations: execution times change at runtime; desired utilization: 73% (0.73); disturbance from external resource contention.]
Goal 2: Performance Portability. Same utilization: portable performance on systems with different capacities. Desired utilization: 73% (0.73). [Figure: two panels of CPU utilization (0 to 1) for ron, norbert, harry and hermione versus time (0 to 1600 sec). One panel: execution times = 2x expected (running on slow machines). Other panel: execution times = expected/4 (running on fast machines).]
Goal 3: Fault Tolerance 1. Norbert fails. 2. Move its tasks to other processors. 3. Reconfigure controller. 4. Control utilization by adjusting task rates. [Figure: tasks $T_1$ (subtasks $T_{11}$, $T_{12}$, $T_{13}$), $T_2$ and $T_3$ on $P_1$, $P_2$ and Norbert, with utilization annotations of 100% and 73%.]
Summary: FC-ORB Ø Robust utilization control, despite q unknown or varying execution times q external disturbances Ø Performance portability Ø Fault tolerance, in terms of q functionality q real-time performance
Conclusion: Adaptive QoS Control Ø Feedback control → robust real-time performance under uncertainty Ø Middleware: provides reusable control services to real-time applications Ø Control analysis: tuning and certification of adaptive software Ø More q Advanced control: event-driven, discrete configurations q Coordination of multiple control policies q Sophisticated fault tolerance techniques q Certification/testing methodologies
Reading Ø Control of a single server q FCS/nORB: Feedback Control Real-Time Scheduling in ORB Middleware, RTAS 03. q FCS: Feedback Control Real-Time Scheduling: Framework, Modeling, and Algorithms, Real-Time Systems, 2002. Ø Centralized control of distributed systems q FC-ORB: Enhancing the Robustness of Distributed Real-Time Middleware via End-to-End Utilization Control, RTSS 05. q EUCON: Feedback Utilization Control in Distributed Real-Time Systems with End-to-End Tasks, RTSS 05, IEEE TPDS.
For More Information Ø Papers: http://www.cse.wustl.edu/~lu Ø Open source middleware: http://www.cse.wustl.edu/~lu/aqc.htm