[ag-automation] Strange Variation in Latency Values
Carsten Emde
C.Emde at osadl.org
Thu Dec 13 15:25:43 CET 2012
Rahul,
> We are working on Freescale MPC8313ERDB board. We have ported
> 3.0.46-rt69 kernel on the same. While testing the RT-Patch using
> cyclictest utility we got some strange latency values. When
> 'histogram' option in cyclictest is disabled, we get max latency in
> range of 100 to 200us whereas if we enable 'histogram' option (-h) we
> get max latency of around 50ms. What could be the possible reason
> behind this much variation?
This is most probably related to the "real-time throttler". Please have
a look at your kernel logs; if there is the message "sched: RT
throttling activated", then the following topic of the OSADL Technical
FAQs is for you:
Q:
My system has perfect real-time capabilities under idle conditions.
Under heavy load, however, the system repeatedly suffers from latencies
of about 50 ms. What is wrong here?
A:
One of the most important prerequisites of a reliable real-time system
with a priority-based scheduler such as PREEMPT_RT Linux is that the CPU
is never saturated by a real-time task for a relevant period of
time. The reason for this condition: As long as the task with the
highest priority requires 100% CPU bandwidth, the system is
indistinguishable from a crashed system. The real-time task may still
behave in a deterministic way, but the rest of the system is no longer
usable. BTW: The fact that a real-time process with 100% load can create
a kind of denial-of-service attack is the main reason why assigning
real-time capabilities to a task or increasing its priority requires
superuser permissions.
In order to avoid a situation where a real-time task consumes 100% CPU
capacity and, thus, makes the system unresponsive, the Linux PREEMPT_RT
kernel contains an automatic "real-time throttler" which is enabled by
default. This mechanism demotes a real-time task to normal priority for
50 ms whenever the task has been runnable continuously for more than 950
ms. It is considered safe to leave the "real-time throttler" enabled
even under production conditions. There may, however, be rare cases
where a real-time design explicitly allows the task with the highest
priority to wreak havoc. As a rule, though, the higher the priority of a
task is, the shorter any uninterruptible code section should be.
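The 950 ms and 50 ms figures above correspond to the two sysctl values
sched_rt_period_us and sched_rt_runtime_us (1000000 and 950000 by
default). The following is only a sketch of that arithmetic; the
function name rt_throttle_gap_ms is made up for illustration:

```shell
# Hypothetical helper: given the RT period and RT runtime in microseconds,
# print how many milliseconds per period are left to non-real-time tasks.
rt_throttle_gap_ms() {
    period_us=$1
    runtime_us=$2
    # A runtime of -1 means the throttler is disabled, so there is no gap.
    if [ "$runtime_us" -lt 0 ]; then
        echo 0
        return 0
    fi
    echo $(( (period_us - runtime_us) / 1000 ))
}

# On a live system the two values could be read like this:
#   rt_throttle_gap_ms "$(cat /proc/sys/kernel/sched_rt_period_us)" \
#                      "$(cat /proc/sys/kernel/sched_rt_runtime_us)"
```

With the default values, the helper prints 50, which is the latency seen
when the throttler kicks in.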
In the rare case where the throttler needs to be disabled, simply use
the command
echo -1 >/proc/sys/kernel/sched_rt_runtime_us
More details about bandwidth assignments to tasks or groups of tasks are
given in the kernel documentation, relative to the kernel source code
tree, at Documentation/scheduler/sched-rt-group.txt.
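To check whether the throttler has actually fired, the kernel log can be
searched for the message quoted at the top of this reply. A minimal
sketch; the function name rt_throttling_logged is made up for
illustration, and it assumes the log text arrives on stdin or in a file:

```shell
# Hypothetical helper: succeeds (exit status 0) if the given log text
# contains the throttler message, fails otherwise.
# Reads stdin when no file name is passed.
rt_throttling_logged() {
    grep -q "sched: RT throttling activated" "$@"
}

# Possible use on a live system:
#   dmesg | rt_throttling_logged && echo "real-time throttler was activated"
```

The message is normally printed only once, so its absence from a log that
has already wrapped around does not prove that the throttler never fired.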
> If anybody could give us a hint or so which would help us to debug
> the issue further.
No need to debug any further. You may wish to put your processor out of
its misery and relax the load. For standard latency measurement, there
is no need to enable tracing. To monitor latency of a system that
already runs a critical application, do not use cyclictest but simply
enable the kernel's built-in latency histograms:
CONFIG_SCHED_TRACER=y
CONFIG_WAKEUP_LATENCY_HIST=y
CONFIG_MISSED_TIMER_OFFSETS_HIST=y
To enable the histograms:
cd /sys/kernel/debug/tracing/latency_hist/enable
echo 1 >wakeup
echo 1 >missed_timer_offsets
echo 1 >timerandwakeup
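The same three steps can be wrapped in a small function that skips
histograms the running kernel does not offer. This is only a sketch; the
function name enable_latency_hists and its optional directory argument
are made up for illustration, and it assumes debugfs is already mounted
at /sys/kernel/debug:

```shell
# Hypothetical helper: write 1 to each latency histogram switch found in
# the enable directory; report the ones that are missing or read-only.
enable_latency_hists() {
    dir=${1:-/sys/kernel/debug/tracing/latency_hist/enable}
    for name in wakeup missed_timer_offsets timerandwakeup; do
        if [ -w "$dir/$name" ]; then
            echo 1 >"$dir/$name"
        else
            echo "skipping $name: $dir/$name is not writable" >&2
        fi
    done
}
```

Called without an argument (as root), it uses the path shown above.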
To inspect the overall latency:
cd /sys/kernel/debug/tracing/latency_hist/timerandwakeup
grep -v " 0$" CPU0
#Minimum latency: 0 microseconds
#Average latency: 1 microseconds
#Maximum latency: 12 microseconds
#Total samples: 111153
#There are 0 samples lower than 0 microseconds.
#There are 0 samples greater or equal than 10240 microseconds.
#usecs samples
0 2311
1 88567
2 18306
3 1434
4 372
5 36
6 49
7 35
8 19
9 6
10 7
11 8
12 3
More details in Documentation/trace/histograms.txt.
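For scripted monitoring, the histogram body can be reduced to two
numbers. A minimal sketch, assuming the two-column "usecs samples" layout
shown above; the function name summarize_hist is made up for
illustration:

```shell
# Hypothetical helper: sum the sample counts and report the largest
# latency bucket (in microseconds) that holds at least one sample.
# Header lines starting with '#' are ignored. Reads stdin when no file
# name is passed.
summarize_hist() {
    awk '
        /^#/ || NF < 2 { next }
        $2 > 0 {
            total += $2
            if ($1 + 0 > max) max = $1 + 0
        }
        END { printf "samples=%d max_us=%d\n", total, max }
    ' "$@"
}

# Possible use on a live system:
#   summarize_hist /sys/kernel/debug/tracing/latency_hist/timerandwakeup/CPU0
```

Applied to the listing above, this would report 111153 samples and a
maximum of 12 microseconds, in agreement with the header lines.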
Hope this helps,
Carsten.