[ag-automation] new reader

Jan Kiszka jan.kiszka at web.de
Wed Apr 26 21:12:49 CEST 2006


Hi Thomas,

Thomas Gleixner wrote:
> Hi Jan,
> 
> On Mon, 2006-04-24 at 00:38 +0200, Jan Kiszka wrote:
>>> I'm not yet convinced, that a "fit every OS driver" approach is the
>>> right way to go. Looking at the various existing implementations of "fit
>>> for everything drivers" it's not unlikely that this will become a
>>> nightmare.
>> Not "fit for everything". Optimised for hard-RT, but usable under any
>> RT-enhanced Linux. That's the goal, and we are on a quite promising path.
> 
> Promising what? 
> 
> A separate out-of-tree driver code base, which is not usable under
> vanilla Linux ? So vendors need to provide another extra driver ?
> 
> You are free to look at the problem from the restricted RT POV, but
> people really care about reusability of code and a solution which allows
> to maintain one code base for both RT and non RT kernels. That's harder
> to achieve, but a goal which is well worth to have.

As I said, given an open mind on both sides to see the other one's
issues as well, we may find a common ground for both "domains". I heard
you are also planning to promote a focused and lean API to RT-driver
developers? Looks like we are heading for similar goals here.

> ...
> I'm really curious how the Van Jacobson idea will change the situation
> and I would appreciate if somebody with your experience would actively
> participate on that development. It's a simple fact, that those who do
> the work have large influence on the design and the outcome.

I know very well, and I'm trying hard to redirect some brain cycles to
this (reminds me of Esben's reply...).

> 
>> Hmm, I must have missed some postings: Who claimed that deterministic
>> synchronisation would work "across domains"? It's a well-known and
>> otherwise well-documented fact that you need compatible threads in
>> order to participate in RT co-scheduling. Ever looked at Xenomai docs?
> 
> Yeah, from:
> http://snail.fsffrance.org/www.xenomai.org/documentation/trunk/html/api/index.html
> 
> "There is no support for the pshared attribute; mutexes created by
> Xenomai POSIX skin may be shared by kernel-space modules and user-space
> processes through shared memory."
> 
> Maybe I'm too stupid to understand that, but "no support" is rather clear
> for an uninformed reader like me.

Should read: "no support for the pshared *attribute*". The point is that
PTHREAD_PROCESS_SHARED is the default on Xenomai, so there is no difference
from PTHREAD_PROCESS_PRIVATE (which is perfectly fine according to the spec).
What is lacking so far is pthread_mutexattr_t support, hence the
fall-back to RT-reasonable defaults.

But this misunderstanding is helpful to reveal shortcomings of the doc -
or better to motivate Gilles Chanteperdrix, the skin developer, to
implement the missing attribute back-end.

> 
>> And that such threads can be created easily is something we take care
>> for very soundly.
>>
>> So, pshared PI-mutexes work very well under Xenomai.
> 
> Depending on the definition of "work". Your wilful interpretation of
> POSIX and what's necessary to support POSIX compliant applications is
> really interesting.
> 
> I accept that POSIX has some braindead interfaces, but I also know that
> tons of applications (also realtime applications) rely on those
> interfaces and I really have no clue, why a POSIX implementation has to
> deviate willingly from the standard for no obvious reason.
> 
> Simple example from the same doc:
> 
> "By default, Xenomai POSIX skin mutexes are of the recursive type and
> use the priority inheritance protocol."
> 
> I really appreciate the anticipatory care you take for the probably
> wrong design decision of the application developer, but standards are
> not subject to random interpretation and wilful enforcement of your
> personal view of computational scenarios.

I'm sorry that you have to enlighten me here. While I correctly
remembered that the default type behaviour of mutexes is undefined, I
failed to find a statement in the spec regarding the default protocol.
If there is one that the current implementation doesn't match, this has
to be fixed, for sure.

> ...
>> Actually, this is a minor point for soft RT as it may only cause very
>> rare deadline misses. Even better, it could be solved for hard-RT by
>> consistently pre-allocating that chunk! But the fact that we have to
>> discuss this so intensively, that you defend it as the best-for-all
>> solution, this is what raises my concerns about universal usability.
> 
> While I care about RT related problems I also have to maintain the hard
> way to get this stuff gradually merged and accepted into mainline.
> 
> The basic design is in a way, which allows to address the problem simply
> by preallocating the data structures in a non shrinkable private futex
> slab.

That's a good sign - though I would personally prefer a way to trigger
this allocation during init, not at some harder-to-predict point during
runtime. What about process-scope futexes to overcome global
hash-bucket-lock dependencies at least for the non-pshared case? Already
on the roadmap?

> 
>> 			Single Domain		Separate Domains
>>
>> memory-sucking apps	services are delayed	(separate pools to
>> or NRT-drivers		or fail which depend 	confine impact)
>> 			on availability
> 
> See above.

Yep, getting closer, but still not totally decoupling RT from NRT
threads. But I assume that this will be the next-but-one step, and
concepts for remaining subsystems are under development as well.

> 
>> lock-ups in buggy	high risk of loosing	RT/NRT interaction
>> NRT-drivers, including	RT properties		can be affected, RT
>> IRQ handlers					threads continue
> 
> I don't know why you claim this. You can put an endless loop into a
> non-RT driver interrupt handler and the high priority threads are still
> scheduled properly.

Sorry for the imprecision of "lock-ups" here. I know that lock-ups via
endless loops in a plain IRQ handler or deadlocks with uncritical system
parts are non-issues here.

But one point is, e.g., that many drivers still use local_irq_save() &
friends for critical path protection and that those calls even happen in
IRQ context (due to commonly used functions). Those code paths do need
review regarding potential lock-ups or high latencies (due to bugs or
sluggish hardware). Another one is incorrect usage of locking
primitives, e.g. things that touch the preemption lock or clumsy
nesting with central locks.

I assume that the Preempt-rt community already did quite an intensive
review of in-kernel drivers (if not all yet, a list of those being safe
would be good, BTW). It would be really great for the overall driver
quality if such a strong and continued review of both in-kernel and
relevant third-party drivers were provided around the Preempt-rt
project, no question.

> 
>> Of course, if and/or how long a concrete application may survive without
>> its NRT domain varies from scenario to scenario. But there are a lot of
>> recoverable scenarios, e.g. our autonomous forklifts, which are still
>> able to stop smoothly (i.e. without throwing away their load) even when
>> the Linux domain is not reactive anymore. Doesn't solve all problems,
>> "just" reduces risks without the costs and inconveniences of
>> virtualisation approaches.
> 
> I still see a control system as a whole. Even if "only" the display is
> broken your machine control is simply unusable.
> 
> Granted it is a bit of a difference with safety related parts, but I
> really don't know any real life scenario which would rely in safety
> critical parts on pure software solutions.
> 
> If your forklift relies on a pure software solution, please let me know
> where it drives around in the wild, so I can avoid to meet it.

[Then don't visit the manufacturer. ;)]

Of course, we also have "hardware"-based safety features on board,
including remote emergency-off. But I guess I do not have to explain
that the amount of software involved even in such elementary functions
is constantly increasing. Specifically, not all safety properties of
service robots can be realised on small dedicated controllers anymore.

Jan
