[ag-automation] new reader

Wed Apr 26 12:25:43 CEST 2006

Hi Jan,

On Mon, 2006-04-24 at 00:38 +0200, Jan Kiszka wrote:
> > I'm not yet convinced, that a "fit every OS driver" approach is the
> > right way to go. Looking at the various existing implementations of "fit
> > for everything drivers" it's not unlikely that this will become a
> > nightmare.
> 
> Not "fit for everything". Optimised for hard-RT, but usable under any
> RT-enhanced Linux. That's the goal, and we are on a quite promising path.

Promising what? 

A separate out-of-tree driver code base, which is not usable under
vanilla Linux? So vendors need to provide yet another driver?

You are free to look at the problem from the restricted RT POV, but
people really care about reusability of code and about a solution which
allows them to maintain one code base for both RT and non-RT kernels.
That's harder to achieve, but it is a goal well worth having.

> > Why not. When the design is good and there is actually a benefit from
> > having such a subsystem, then it makes completely sense to give it a
> > try. If it's single purpose and unusable when -rt is switched off, it's
> > in fact debatable.
> 
> We'll see. Such a step costs a lot of energy, you know this better than
> I, while resources are limited. Let's check first when and how other
> things around RT materialise in mainline. Also the recently brought up
> RT-networking issue will be a good indicator if it already makes sense
> to spend time on integration.

Well, resources are often the reason why things do not happen, but it's
also easy to hide behind that argument.

It makes a lot of sense to think about integration and the issues
related to it at a very early stage. The longer you wait, the harder
real integration will be.

> >> But even with
> >> preempt-rt, I'm a bit sceptical that the current kernel community will
> >> like the idea of including a layer which aims at keeping code compilable
> >> also against something outside their yard.
> > 
> > That's no argument. There is a lot of code in mainline which is held
> > compatible to non Linux implementations. The criterion is usefulness.
> 
> Really? I have some other comments in mind, but I must admit that I do
> not have cites at hand. So: Dear jury, ignore the last statement.

:) Just for the record: XFS, ReiserFS, JFS, JFFS2, ACPI, ibm_emac,
xilinx ....

> > ...
> > Granted, it's easier to add single purpose solutions in order to
> > circumvent limitations in the generic code, but I doubt that this is a
> > good general approach. I still think that it's better to spend time on
> > solving such problems in the existing code or proving that the separate
> > implementation is necessary and useful for others too. Nobody will have
> > any objections if there is a solution to improve the performance of e.g.
> > the networking stack. This might as well include some extra short cuts
> > for critical applications.
> 
> Improve some performance numbers is one thing, introduce limitations to
> generic code in order to make predictability of the critical path
> manageable is another.

I was talking about problem solving, not about improving performance
numbers.

Solve the problem in a generic way. I'm sure that such a solution also
results in general performance improvements, so a lot of people get a
benefit.

I'm really curious how the Van Jacobson idea will change the situation,
and I would appreciate it if somebody with your experience actively
participated in that development. It's a simple fact that those who do
the work have a large influence on the design and the outcome.

> Hmm, I must have missed some postings: Who claimed that deterministic
> synchronisation would work "across domains"? It's a well-known and
> otherwise well-documented fact that you need compatible threads in
> order to participate in RT co-scheduling. Ever looked at Xenomai docs?

Yeah, from:
http://snail.fsffrance.org/www.xenomai.org/documentation/trunk/html/api/index.html

"There is no support for the pshared attribute; mutexes created by
Xenomai POSIX skin may be shared by kernel-space modules and user-space
processes through shared memory."

Maybe I'm too stupid to understand that, but "no support" is rather clear
for an uninformed reader like me.
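
For reference, this is roughly how a portable POSIX application asks for
a process-shared mutex. It is only a minimal sketch using the standard
pthread calls; the helper name init_shared_mutex is made up for
illustration, and whether the call succeeds depends on the
implementation supporting the process-shared option.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: initialise *m as a process-shared mutex.
 * Returns 0 on success or the error number of the failing call. */
int init_shared_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int ret;

    assert(m != NULL);

    ret = pthread_mutexattr_init(&attr);
    if (ret != 0)
        return ret;

    /* The pshared attribute is what the quoted documentation refers to. */
    ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (ret == 0)
        ret = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return ret;
}
```

In a real multi-process application the mutex object itself would have
to live in memory that all participating processes map, which the sketch
leaves out.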

> And that such threads can be created easily is something we take care
> for very soundly.
> 
> So, pshared PI-mutexes work very well under Xenomai.

Depending on the definition of "work". Your wilful interpretation of
POSIX and what's necessary to support POSIX compliant applications is
really interesting.

I accept that POSIX has some braindead interfaces, but I also know that
tons of applications (including realtime applications) rely on those
interfaces, and I really have no clue why a POSIX implementation has to
deviate deliberately from the standard for no obvious reason.

Simple example from the same doc:

"By default, Xenomai POSIX skin mutexes are of the recursive type and
use the priority inheritance protocol."

I really appreciate the anticipatory care you take for the probably
wrong design decisions of the application developer, but standards are
not subject to random interpretation and wilful enforcement of your
personal view of computational scenarios.
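
To make the point about semantics concrete: in POSIX the mutex type and
protocol are attributes the application requests explicitly, and the
default protocol is PTHREAD_PRIO_NONE. The following is a minimal sketch
using only standard pthread calls; the helper name relock_result is made
up for illustration. It shows that what happens on a second lock from
the same thread depends on the requested type.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: create a mutex of the given type, lock it twice
 * from the calling thread and return the result of the second lock
 * attempt, or -1 if the setup failed. */
int relock_result(int type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int second;

    /* Only these two types have a defined, non-blocking relock result. */
    assert(type == PTHREAD_MUTEX_ERRORCHECK ||
           type == PTHREAD_MUTEX_RECURSIVE);

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_settype(&attr, type) != 0 ||
        pthread_mutex_init(&m, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    if (pthread_mutex_lock(&m) != 0) {
        pthread_mutex_destroy(&m);
        return -1;
    }

    second = pthread_mutex_lock(&m);
    if (second == 0)
        pthread_mutex_unlock(&m);   /* undo the recursive acquisition */

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return second;
}
```

With PTHREAD_MUTEX_ERRORCHECK the second lock reports EDEADLK, with
PTHREAD_MUTEX_RECURSIVE it succeeds. An application written against one
behaviour sees different semantics when an implementation silently
picks another default.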

You look at POSIX as a vehicle from a restricted RT perspective. That's
your freedom, but please accept that my freedom is to criticize that.

This liberal interpretation is not a problem restricted to
pthread_mutexes. Please run the relevant tests yourself.

> > That's one of my main criticism on the pseudo domain concept. It
> > pretends to be safe by separation and flexible by resource sharing at
> > the same time. There is no way to get both things together without major
> > restrictions. You claim POSIX compliance, but put self defined
> > restrictions on its usage.
> 
> Sigh, please define the "major restrictions" you see so that we can
> discuss concretely. I'm not claiming there are no restrictions at all,
> but pure POSIX compliance doesn't guarantee strict determinism as well.

I know.

Re. "define ....":
Sigh too :). In a POSIX environment the incompatibility is restriction
enough.

I'm not talking about unimplemented features, as this is a well-known
issue to POSIX programmers.

> > 
> > How do you justify that a POSIX compliant RT application has to be
> > modified in order to work on such a system and the user has to redesign
> > and rewrite the code?
> 
> If you have to redesign your code, it was broken anyway - oops, also a
> clumsy generalisation :). No, please name concrete RT design patterns
> that would not work so that we can discuss if there is an issue.

Aargh. Broken == it does not fit into the functional model of the
Xenomai POSIX emulation?

Concrete:
1. Non-compliance - the most crucial part
2. Restrictions vs. cross-boundary (i.e. domain) computations. That's
   something which can be handled, but it is a pain in the neck for
   large applications

> Again, POSIX does not specify the determinism of each and every service,
> nor does it say anything about timing characteristics. Porting from
> POSIX RTOS A to B therefore always include careful review and potential
> redesigns of certain parts.

Sure. I do not expect that a POSIX application on OS A will behave
exactly the same way on OS B with respect to timing, but I do expect,
and that's not a naive assumption, that the semantics of the interfaces
are exactly the same.

> > This makes it simply unusable for already existing large code bases -
> > multi process user applications and code based on large frameworks like
> > ACE/TAO.
> 
> ACE/TAO is a nice tool for specific problem domains, and how I recall
> the code, Preempt-rt is probably the best way to help providing its
> services with improved determinism. But I remember that it relies on
> select/poll semantics for reading input channels, thus making its
> determinism dependent on the question if this interface is implementable
> in a deterministic way by the OS. That's not unusual, I have seen other
> RT code before which included far more problematic patterns.
> 
> BTW, there are also frameworks which are already ported over
> "restricted" RTOSes, like Xenomai is in your eyes, OROCOS e.g.

Oh man. I know frameworks ported to an 8-bit microcontroller. And I was
also talking about large code bases aside from ACE/TAO which do not even
touch poll/select.

All I wanted is for you to declare clearly and up front

- which restrictions are part of the solution
- where and why you deliberately violate the specifications
- and what a user gains from that

> > IMNSHO the whole concept of pseudo domains is broken by design and will
> > never lead to a satisfying solution. Either you have an all in one OS or
> 
> Am I saying that your approach is totally broken and will never work
> just because it doesn't meet all requirements we have?
> 
> So, define "satisfying", name the problem domain precisely, or please
> stop spreading FUD of your own.

Sorry, I overshot the limit of objective discussion here.

> > real physically separated domains, which enforce a completely different
> > design of the application/system.

The point which upset me was the implicit declaration of the Xenomai
approach as sacrosanct - maybe it's my personal touchiness.

What I'm seriously fighting is the illusion of reduced auditing
requirements.

> Actually, this is a minor point for soft RT as it may only cause very
> rare deadline misses. Even better, it could be solved for hard-RT by
> consequently pre-allocation that chunk! But the fact that we have to
> discuss this so intensively, that you defend it as the best-for-all
> solution, this is what raises my concerns about universal usability.

While I care about RT related problems, I also have to go the hard way
of getting this stuff gradually merged and accepted into mainline.

The basic design allows the problem to be addressed simply by
preallocating the data structures in a non-shrinkable private futex
slab.
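
The idea of preallocation can be illustrated outside the kernel with a
fixed-size object pool. This is a user-space sketch only and not the
futex code; all names (POOL_SIZE, struct node, pool_init, pool_get,
pool_put) are made up for illustration, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4          /* hypothetical capacity, sized for the worst case */

struct node {
    struct node *next;       /* free-list link while the node is unused */
    int payload;             /* stand-in for the real per-waiter state */
};

static struct node pool[POOL_SIZE];   /* reserved up front, never shrinks */
static struct node *free_list;

/* Put every node of the static array on the free list. */
void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* Take a node in constant time; NULL means the pool is exhausted. */
struct node *pool_get(void)
{
    struct node *n = free_list;

    if (n != NULL)
        free_list = n->next;
    return n;
}

/* Return a node to the pool in constant time. */
void pool_put(struct node *n)
{
    assert(n != NULL);
    n->next = free_list;
    free_list = n;
}
```

Because the memory is reserved before the critical path runs, taking a
node never depends on the general allocator or on memory pressure caused
by other users.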

> 			Single Domain		Separate Domains
> 
> memory-sucking apps	services are delayed	(separate pools to
> or NRT-drivers		or fail which depend 	confine impact)
> 			on availability

See above.

> lock-ups in buggy	high risk of losing	RT/NRT interaction
> NRT-drivers, including	RT properties		can be affected, RT
> IRQ handlers					threads continue

I don't know why you claim this. You can put an endless loop into a
non-RT driver interrupt handler and the high priority threads are still
scheduled properly.

> Of course, if and/or how long a concrete application may survive without
> its NRT domain varies from scenario to scenario. But there are a lot of
> recoverable scenarios, e.g. our autonomous forklifts, which are still
> able to stop smoothly (i.e. without throwing away their load) even when
> the Linux domain is not reactive anymore. Doesn't solve all problems,
> "just" reduces risks without the costs and inconveniences of
> virtualisation approaches.

I still see a control system as a whole. Even if "only" the display is
broken your machine control is simply unusable.

Granted, it is a bit different with safety-related parts, but I really
don't know any real-life scenario which would rely on pure software
solutions for its safety-critical parts.

If your forklift relies on a pure software solution, please let me know
where it drives around in the wild, so I can avoid meeting it.

> Ok, to sum up my humble POV: neither Preempt-rt nor frameworks for
> co-scheduling like Xenomai are one-size-fits-it-all solutions. Over the
> time, given an increasing RT-awareness of the kernel community
> (_including_ hard-RT), Preempt-rt may extend its application domain. But
> so far, also when looking at this discussion, I see no signs that this
> is a predetermined path.

We'll see :)

	tglx