Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
> On Jul 5, 2018, at 12:16 PM, Ian Jackson <ian.jackson@xxxxxxxxxx> wrote:
>
> Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design
> session] Process changes: is the 6 monthly release Cadence too short,
> Security Process, ..."):
>> We didn't look at the sporadic failing tests thoroughly enough. The
>> hypercall buffer failure has been there for ages, a newer kernel just
>> made it more probable. This would have saved us some weeks.
>
> In general, as a community, we are very bad at this kind of thing.
>
> In my experience, the development community is not really interested
> in fixing bugs which aren't directly in their way.
>
> You can observe this easily in the way that regressions in Linux,
> spotted by osstest, are handled. Linux 4.9 has been broken for 43
> days. Linux mainline is broken too.
>
> We do not have a team of people reading these test reports, and
> chasing developers to fix them. I certainly do not have time to do
> this triage. On trees where osstest failures do not block
> development, things go unfixed for weeks, sometimes months.
>
> And overall my gut feeling is that tests which fail intermittently are
> usually blamed (even if this is not stated explicitly) on problems
> with osstest or with our test infrastructure. It is easy for
> developers to think this because if they wait, the test will get
> "lucky", and pass, and so there will be a push and the developers can
> carry on.
>
> I have a vague plan to sit down and think about how osstest's
> results analysers could respond better to intermittent failures.
> If I can, I would like intermittent failures to block pushes. That
> would at least help address the problem of heisenbugs (which are often
> actually quite serious issues) not being taken seriously.
>
> I would love to hear suggestions for how to get people to actually fix
> test failures in trees not maintained by the Xen Project and therefore
> not gated by osstest.
Well, at the moment, investigation is ad hoc. Basically everyone has to look to
see *whether* there’s been a failure, and it’s nobody’s job in particular to
try to chase it down to find out what it might be. If we had a team, we could
have a robot rotate through the team members to nominate one particular person
per failure to take a look at the result and at least try to classify it, and
maybe try to find the appropriate person who may be able to take a deeper look.
-George
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel