[ag-automation] neuer Mitleser

Jan Kiszka jan.kiszka at web.de
Mon Apr 24 00:38:05 CEST 2006


Hi Thomas,

Thomas Gleixner wrote:
> On Tue, 2006-04-11 at 11:41 +0200, Jan Kiszka wrote: 
>> ...
>>    More in details, RTDM focuses on basic POSIX I/O services (read /
>>    write / ioctl) and the socket interface (recvmsg / sendmsg &
>>    friends). Invocations are passed to driver handlers separated
>>    according to the caller's scheduling policy (RT or non-RT). An RTDM
>>    driver can register dedicated handlers for each context, providing,
>>    e.g., resource allocation policies depending on the criticality of
>>    the caller or rejecting wrong usages.
> 
> I dont see why this is a general necessary functional split. When the
> driver needs different handling depending on the users scheduling
> policy, then let the driver do that part. 

From a single-scheduler perspective, the motivation may not be clear
immediately. Once co-schedulers are also considered, it becomes visible
that this concept reflects the two often conflicting design goals of
average-case and worst-case performance. It allows a driver to provide
both, or only one, while clearly declaring which type it implements.

> 
>> 2. Providing a driver development interface for the time-critical part
>>    that is independent of the underlying RTOS (and its potential
>>    modifications over the time). This concept was (re-)born in the days
>>    when Preempt-rt was far from being in sight. So the current
>>    realisation may require a careful rebalancing when including
>>    Preempt-rt support.
> 
> I'm not yet convinced, that a "fit every OS driver" approach is the
> right way to go. Looking at the various existing implementations of "fit
> for everything drivers" it's not unlikely that this will become a
> nightmare.

Not "fit for everything": optimised for hard-RT, but usable under any
RT-enhanced Linux. That's the goal, and we are on quite a promising path.

>> But regarding mainline integration of this layer: well, if this is going
>> to be the ultimate criteria, RTDM may fail. Without preempt-rt in the
>> kernel, it doesn't make sense to think about this at all.
> 
> Why not. When the design is good and there is actually a benefit from
> having such a subsystem, then it makes completely sense to give it a
> try. If it's single purpose and unusable when -rt is switched off, it's
> in fact debatable.

We'll see. Such a step costs a lot of energy (you know this better than
I do), while resources are limited. Let's first see when and how other
RT-related work materialises in mainline. The recently raised
RT-networking issue will also be a good indicator of whether it already
makes sense to spend time on integration.

> 
>> But even with
>> preempt-rt, I'm a bit sceptical that the current kernel community will
>> like the idea of including a layer which aims at keeping code compilable
>> also against something outside their yard.
> 
> That's no argument. There is a lot of code in mainline which is held
> compatible to non Linux implementations. The criteria is usefullness.

Really? I have some contrary statements in mind, but I must admit that I
do not have citations at hand. So: dear jury, ignore the last statement.

> ...
> Granted, it's easier to add single purpose solutions in order to
> circumvent limitations in the generic code, but I doubt that this is a
> good general approach. I still think that it's better to spend time on
> solving such problems in the existing code or proving that the seperate
> implementation is necessary and usefull for others too. Nobody will have
> any objections if there is a solution to improve the performance of e.g.
> the networking stack. This might as well include some extra short cuts
> for critical applications.

Improving some performance numbers is one thing; introducing limitations
into generic code in order to make the predictability of the critical
path manageable is another.

> Automation is not so different from other
> demanding application fields. Restricting the view on the requirements
> of automation is fundamentally wrong.

Even the automation domain is not homogeneous regarding RT requirements.
And there is more than automation where hard-RT counts. We (Xenomai and
related approaches) surely do not target only automation applications;
ask the (serious) users of co-scheduling kernels. But we also do not
claim to solve all RT problems. I actually recommend Preempt-rt from
time to time when asked for a fitting approach.

> 
>> But first keep the integration out of the focus. I'm sure we can more
>> easily agree on the specification part, i.e. the device driver profiles.
>> CAN is such an example where we are on a good way to achieve a
>> compatible programming model, independent of the underlying
>> implementation details.
> 
> As long as the programming model is generic enough and does not furtivly
> depend on non obvious assumptions e.g. syscall splits. 

No "syscall split" implications. The idea is to keep the programming
model common.

> 
>> As you said, this depends on your scenario. I can very well imagine
>> large, complex applications (telco...) where your definitely smart
>> optimisations help a lot. But I can also imagine scenarios (tight
>> control loops e.g.) where you better spend a few percentage efficiency
>> for guarantees.
> 
> There is a wide range between both types (telco and the tight control
> loop). Also tight control loops where this might matter are usually
> lockless. OTOH we have a critical control application (migrated from
> OS9) which uses locks heavily. Giving up the non contended / no waiters
> optimization would hurt badly. 

I'm sorry that I didn't express myself clearly enough (though I was
sure I had): the non-contention optimisation per se is NOT the issue;
I'm referring to its details.

> 
> I had a short look into the xenomai implementation and I really wonder
> how you will achieve pthread_mutex handling across processes (pshared)
> with priority inheritance support - including shared mutexes across rt
> and non-rt processes. This is a basic requirement for many POSIX based
> applications and frameworks. It's not a question whether such a scenario
> is desirable from an engineering and design POV, it's simply a fact that
> it is necessary if you claim POSIX compliance.
> 
> Bluntly, without thinking too much: It does not work, you can't do
> priority inheritance across domains without a major hack all over the
> place.

Hmm, I must have missed some postings: who claimed that deterministic
synchronisation would work "across domains"? It's a well-known and
otherwise well-documented fact that you need compatible threads in
order to participate in RT co-scheduling. Ever looked at the Xenomai
docs? And we take great care that such threads can be created easily.

So, pshared PI-mutexes work very well under Xenomai.

> 
> That's one of my main criticism on the pseudo domain concept. It
> pretends to be safe by seperation and flexible by resource sharing at
> the same time. There is no way to get both things together without major
> restrictions. You claim POSIX compliance, but put self defined
> restrictions on its usage.

Sigh. Please define the "major restrictions" you see so that we can
discuss them concretely. I'm not claiming there are no restrictions at
all, but pure POSIX compliance doesn't guarantee strict determinism
either.

> 
> How do you justify that a POSIX compliant RT application has to be
> modified in order to work on such a system and the user has to redesign
> and rewrite the code?

If you have to redesign your code, it was broken anyway - oops, also a
clumsy generalisation :). Seriously, please name concrete RT design
patterns that would not work so that we can discuss whether there is an
issue.

Again, POSIX does not specify the determinism of each and every service,
nor does it say anything about timing characteristics. Porting from
POSIX RTOS A to B therefore always includes careful review and
potentially redesigning certain parts.

> 
> I'm not saying that domains are bad, but you have to state the
> restrictions clearly upfront and stay away from general compliance
> claims. For Joe User it has to be made entirely clear that the claimed
> conformance has severe restrictions:
> - works only inside a single domain
> - has incomplete functionality and needs modifications to the code
> 
> This makes it simply unusable for already existing large code bases -
> multi process user applications and code based on large frameworks like
> ACE/TAO.

ACE/TAO is a nice tool for specific problem domains, and as I recall the
code, Preempt-rt is probably the best way to help provide its services
with improved determinism. But I remember that it relies on select/poll
semantics for reading input channels, which makes its determinism depend
on whether the OS can implement this interface deterministically. That's
not unusual; I have seen other RT code before that included far more
problematic patterns.

BTW, there are also frameworks that have already been ported to
"restricted" RTOSes (as Xenomai is in your eyes), OROCOS for example.

> 
> IMNSHO the whole concept of pseudo domains is broken by design and will
> never lead to a satisfying solution. Either you have an all in one OS or

Am I saying that your approach is totally broken and will never work
just because it doesn't meet all the requirements we have?

So please define "satisfying" and name the problem domain precisely, or
stop spreading FUD of your own.

> real physically sepearated domains, which enforce a completely different
> design of the application/system.
> 
>>> If you trap into the -ENOMEM situation in the locking code, then your
>>> system has reached its limits anyway.
>> That's easy to claim, but I don't see why a well configured Linux box
>> shouldn't keep their time critical jobs alive and working even under
>> memory pressure.
> 
> Oh well. There is a huge difference between memory pressure and the
> point where you get ENOMEM in the locking code. Once this hits, there is
> serious trouble on the way. When ENOMEN happens in such a scenario then
> your complete system - xenomai or preempt-rt - is rendered unusable.

Oh, sorry, you are right. The allocation happens with GFP_KERNEL. That
should mean it will only take a few seconds to reclaim a free page by
OOM-killing.

Actually, this is a minor point for soft RT, as it may only cause very
rare deadline misses. Even better, it could be solved for hard-RT by
consistently pre-allocating that chunk! But the fact that we have to
discuss this so intensively, and that you defend it as the best-for-all
solution, is what raises my concerns about universal usability.

> 
>> Bugs are everywhere, e.g. memory leaks in
>> not-that-well-reviewed non-RT applications.
> 
> We have this discussed before and it is still a strawman argument. Bugs
> are bugs and the fact that in theory the RT-application stays alive does
> not change this and does not make any bug less dangerous. 
> 
> Worse, you might even fool unexperienced users to believe, that they
> don't have to audit the complete system, as the OS will take care that
> the RT application survives.
> 
> In the whole extent such a statement suggest that even buggy non
> reviewed device drivers for non-rt related equipment are harmless. They
> are not. Hitting a kernel BUG from one of those might kill the system
> completely. The experience we've seen in course of the preempt-rt
> development was exactly this. Apparently working drivers broke due to
> well hidden race conditions and other problems covered by the vanilla
> linux behaviour. I said behaviour, not semantics. The bugs were present
> in vanilla too, but so subtle that they were almost impossible to
> trigger.

...not to speak of those various out-of-tree drivers. I know - I have
worked with developers who could not tell a spinlock from a mutex, nor
use either correctly. That's why certain setups still work much better
with PREEMPT_NONE.

Don't get me wrong: We all appreciate the contribution Preempt-rt
provides to this problem very much!

> 
> I don't see any reason, why xenomai should not be affected in the same
> way.
> 
> As long as a domain concept does not provide complete physical
> seperation of the domains such statements are moot and delusive.
> 
> Pseudo domains as provided by xenomai are as vulnerable as single domain
> implementations by buggy and malicious code in the non-rt part despite
> of all sacrosanctness claims made in course of this and previous
> discussions.

Please don't throw all your threat models into the same bucket; it
makes replying harder. I'll try to sort them.

Threat targets: RT applications

Threat sources and potential impact:

                        Single Domain           Separate Domains

memory-sucking apps     services that depend    (separate pools
or NRT drivers          on availability are     confine the impact)
                        delayed or fail

lock-ups in buggy       high risk of losing     RT/NRT interaction
NRT drivers, including  RT properties           can be affected;
IRQ handlers                                    RT threads continue

buggy drivers that      high risk of crash      high risk of crash
overwrite kernel
memory

malicious non-root      risk of misusing        (not affected as long
apps                    shared resources        as the application
                        (e.g. timers, memory,   remains non-root)
                        locked code paths)

malicious root apps     unlimited damage        unlimited damage
or drivers

Of course, whether and for how long a concrete application may survive
without its NRT domain varies from scenario to scenario. But there are a
lot of recoverable scenarios, e.g. our autonomous forklifts, which are
still able to stop smoothly (i.e. without throwing away their load) even
when the Linux domain is no longer reactive. This doesn't solve all
problems; it "just" reduces risks without the costs and inconveniences
of virtualisation approaches.

> 
> Sorry. I really do not understand what you want to achieve with such
> arguments. I actually want to watch the fun you have with an experienced
> audit engineer, when you make such claims.
> 
> I would seriously appreciate it if we could restrict this to pure
> technical discussions again and keep FUD out.

Well, I'm trying hard to do so, and to address your arguments precisely
and directly. Memory availability is a real issue - or why are the
networking people starting to think about pre-allocated buffer pools?
The same goes for EVERY subsystem involved in critical tasks which
cannot wait for swapping or OOM-killing to resolve the situation.

> 
>> Ok, I recently noticed that
>> the out-of-memory manager of 2.6 doesn't make this goal easier, but it
>> still remains feasible and worth to achieve.
> 
> I'd be grateful, if you find a sane solution to fix the oom-killer
> itself and not just add a "it works for xenomai" hack. :)

Thomas, where did I mention some "xenomai hack"? I was just referring
to the fact that the 2.6 OOM-killer needs to be told to leave untouched
the whole process group to which an RT app belongs, not just the
application itself. Just an ugly, undocumented property. Once
understood, it's easy to make your application robust against OOM - as
long as it doesn't depend on dynamic allocation from global pools.
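For illustration, such a setup could look like the following shell fragment. To be clear about the assumptions: "rtapp" is a placeholder process name, /proc/<pid>/oom_adj is the knob of kernels from that era (-17 = never kill), oom_score_adj (-1000) its later replacement, and the whole thing requires root.

```shell
# Shield the RT application's entire process group from the OOM killer.
# "rtapp" is a placeholder; adjust to your application's name.
PGID=$(ps -o pgid= -p "$(pgrep -o rtapp)" | tr -d ' ')
for pid in $(pgrep -g "$PGID"); do
    if [ -w "/proc/$pid/oom_score_adj" ]; then
        echo -1000 > "/proc/$pid/oom_score_adj"   # modern kernels
    else
        echo -17 > "/proc/$pid/oom_adj"           # 2.6-era kernels
    fi
done
```

Iterating over the process group, not just the main pid, is exactly the undocumented detail mentioned above.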


Ok, to sum up my humble POV: neither Preempt-rt nor frameworks for
co-scheduling like Xenomai are one-size-fits-all solutions. Over time,
given the increasing RT-awareness of the kernel community (_including_
hard-RT), Preempt-rt may extend its application domain. But so far, also
looking at this discussion, I see no signs that this is a predetermined
path.

Jan
