[ag-automation] Strange Variation in Latency Values

Thu Dec 13 15:25:43 CET 2012

Rahul,

> We are working on Freescale MPC8313ERDB board. We have ported
> 3.0.46-rt69 kernel on the same. While testing the RT-Patch using
> cyclictest utility we got some strange latency values. When
> 'histogram' option in cyclictest is disabled, we get max latency in
> range of 100 to 200us whereas if we enable 'histogram' option (-h) we
> get max latency of around 50ms. What could be the possible reason
> behind this much variation?
This is most probably related to the "real-time throttler". Please have 
a look at your kernel logs; if there is the message "sched: RT 
throttling activated", then the following topic of the OSADL Technical 
FAQs is for you:

Q:
My system has perfect real-time capabilities under idle conditions. 
Under heavy load, however, the system repeatedly suffers from latencies 
of about 50 ms. What is wrong here?

A:
One of the most important prerequisites of a reliable real-time system 
with a priority-based scheduler such as PREEMPT_RT Linux is that the CPU 
load never gets saturated by a real-time task for a relevant period of 
time. The reason for this condition: As long as the task with the 
highest priority requires 100% CPU bandwidth, the system is 
indistinguishable from a crashed system. The real-time task may still 
behave in a deterministic way, but the rest of the system is no longer 
usable. BTW: The fact that a real-time process with 100% load can create 
a kind of denial-of-service attack is the main reason why assigning 
real-time capabilities to a task or increasing its priority requires 
superuser permissions.

In order to avoid a situation where a real-time task consumes 100% CPU 
capacity and, thus, makes the system unresponsive, the Linux PREEMPT_RT 
kernel contains an automatic "real-time throttler" which is enabled by 
default. This mechanism demotes a real-time task to normal priority for 
50 ms whenever the task has been runnable continuously for more than 950 
ms. It is considered safe to leave the "real-time throttler" enabled 
even under production conditions - however, there may be rare conditions 
where a real-time design explicitly allows a situation where the task 
with the highest priority may wreak havoc. Normally, however, there is a 
rule that says that the higher the priority of a task is, the shorter 
the length of any uninterruptible code section should be.
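
The 50 ms figure follows from the throttler's budget: a runtime of 
950 ms out of a period of 1 s. As a minimal sketch, the helper below 
(the name rt_throttle_window_ms is made up for illustration) computes 
that window. It assumes the period is exposed in 
/proc/sys/kernel/sched_rt_period_us next to sched_rt_runtime_us and that 
both hold microseconds; the period file is not mentioned in the text 
above.

```shell
#!/bin/sh
# Hypothetical helper: print how many milliseconds per period real-time
# tasks are held back. Both arguments are optional; when omitted, the
# live values are read from /proc (assumed paths, see lead-in).
rt_throttle_window_ms() {
    runtime_us=${1:-$(cat /proc/sys/kernel/sched_rt_runtime_us)}
    period_us=${2:-$(cat /proc/sys/kernel/sched_rt_period_us)}
    if [ "$runtime_us" -lt 0 ]; then
        # A runtime of -1 means the throttler is disabled.
        echo 0
        return
    fi
    echo $(( ($period_us - $runtime_us) / 1000 ))
}
```

Called as "rt_throttle_window_ms 950000 1000000" it prints 50; called 
without arguments on the target it reports the currently configured 
window.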

In the rare case where the throttler needs to be disabled, simply use 
the command

echo -1 >/proc/sys/kernel/sched_rt_runtime_us

More details about bandwidth assignments to tasks or groups of tasks are 
given in the kernel documentation, in the file 
Documentation/scheduler/sched-rt-group.txt of the kernel source tree.
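
To check whether the throttler has actually fired, the kernel log can be 
searched for the message quoted at the top of this mail. A small sketch, 
with a made-up function name, that reads the log from stdin so that it 
works with dmesg output as well as with a saved log file:

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if the kernel log supplied on stdin
# contains the throttler message, non-zero otherwise.
rt_throttling_seen() {
    grep -q "RT throttling activated"
}

# Typical use on the target (not executed here):
#   dmesg | rt_throttling_seen && echo "RT throttler has fired"
```

The message may already have scrolled out of the kernel ring buffer on a 
long-running system, so a negative result is only a hint.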

> If anybody could give us a hint or so which would help us to debug
> the issue further.
No need to debug any further. You may wish to put your processor out of 
its misery and relax the load. For standard latency measurement, there 
is no need to enable tracing. To monitor latency of a system that 
already runs a critical application, do not use cyclictest but simply 
enable the kernel's built-in latency histograms:
CONFIG_SCHED_TRACER=y
CONFIG_WAKEUP_LATENCY_HIST=y
CONFIG_MISSED_TIMER_OFFSETS_HIST=y

To enable the histograms:
cd /sys/kernel/debug/tracing/latency_hist/enable
echo 1 >wakeup
echo 1 >missed_timer_offsets
echo 1 >timerandwakeup

To inspect the overall latency:
cd /sys/kernel/debug/tracing/latency_hist/timerandwakeup
grep -v " 0$" CPU0
#Minimum latency: 0 microseconds
#Average latency: 1 microseconds
#Maximum latency: 12 microseconds
#Total samples: 111153
#There are 0 samples lower than 0 microseconds.
#There are 0 samples greater or equal than 10240 microseconds.
#usecs	         samples
      0	            2311
      1	           88567
      2	           18306
      3	            1434
      4	             372
      5	              36
      6	              49
      7	              35
      8	              19
      9	               6
     10	               7
     11	               8
     12	               3

More details in Documentation/trace/histograms.txt.
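
For a quick summary of such a histogram, something like the following 
may be handy. It is only a sketch: the function name hist_summary is 
made up, and it assumes the file format shown above (header lines 
starting with "#", then one "usecs samples" pair per line).

```shell
#!/bin/sh
# Hypothetical helper: read a latency histogram from stdin and print the
# total number of samples and the largest latency bin that was hit.
hist_summary() {
    awk '
        /^#/ { next }                          # skip the header lines
        NF == 2 {
            total += $2                        # accumulate sample count
            if ($2 > 0 && $1 > max) max = $1   # highest non-empty bin
        }
        END { printf "samples=%d max_us=%d\n", total, max }
    '
}

# Typical use on the target (not executed here):
#   hist_summary </sys/kernel/debug/tracing/latency_hist/timerandwakeup/CPU0
```

Samples beyond the last bin are only reported in the header lines, so 
the overflow counter there should still be checked by eye.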

Hope this helps,
Carsten.