
Re: [PATCH 1/3] x86/ucode: Fix error handling during parallel ucode load


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 18 Nov 2025 08:49:34 +0100
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 18 Nov 2025 07:49:46 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 17.11.2025 23:21, Andrew Cooper wrote:
> wait_for_state() returns false on encountering LOADING_EXIT.
> control_thread_fn() can move directly to this state in the case of an early
> error.  It is not an error condition for APs, but right now the latest write
> into stopmachine_data.fn_result wins, causing the real error, -EIO, to get
> clobbered with -EBUSY.  e.g.:
> 
>   # xen-ucode /lib/firmware/amd-ucode/microcode_amd_fam17h.bin --force
>   Failed to update microcode. (err: Device or resource busy)
> 
>   (XEN) 256 cores are to update their microcode
>   (XEN) microcode: CPU0 update rev 0x830107d to 0x830107c failed, result 0x830107d
>   (XEN) Late loading aborted: CPU0 failed to update ucode: -5
> 
> Drop all the -EBUSY's, and treat hitting LOADING_EXIT as a success case.  This
> causes only a single error to be returned through stop_machine_run().  e.g.:

Why "single"? stop_machine_run() can't return multiple ones, having only a
scalar return type? Or do you mean "a single, consistent" or some such?
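
To illustrate the "latest write wins" behaviour described in the commit
message, here is a minimal stand-alone model. It is only a sketch, not Xen
code: struct result_slot, collect_results(), before_patch() and after_patch()
are made-up names, and the assumption that only non-zero return values get
written to the shared slot is mine.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared result field; not Xen's struct. */
struct result_slot {
    int fn_result;
};

/*
 * Model of a scalar result shared by all CPUs: every non-zero return value
 * overwrites the slot, so the value written last is the one reported.
 */
static int collect_results(const int *per_cpu_ret, size_t n)
{
    struct result_slot slot = { 0 };
    size_t i;

    assert(per_cpu_ret != NULL || n == 0);

    for ( i = 0; i < n; i++ )
        if ( per_cpu_ret[i] )
            slot.fn_result = per_cpu_ret[i];

    return slot.fn_result;
}

/* Old behaviour: CPU0 fails with -EIO, the bystanders then report -EBUSY. */
static int before_patch(void)
{
    const int rets[] = { -EIO, -EBUSY, -EBUSY };

    return collect_results(rets, sizeof(rets) / sizeof(rets[0]));
}

/* New behaviour: bystanders return 0, so only the real error is written. */
static int after_patch(void)
{
    const int rets[] = { -EIO, 0, 0 };

    return collect_results(rets, sizeof(rets) / sizeof(rets[0]));
}
```

In this model before_patch() yields -EBUSY and after_patch() yields -EIO,
matching the two xen-ucode messages quoted in the description. The order in
which CPUs write is fixed here; on real hardware it would not be.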

>   # xen-ucode /lib/firmware/amd-ucode/microcode_amd_fam17h.bin --force
>   Failed to update microcode. (err: Input/output error)
> 
>   (XEN) 256 cores are to update their microcode
>   (XEN) microcode: CPU0 update rev 0x830107d to 0x830107c failed, result 0x830107d
>   (XEN) Late loading aborted: CPU0 failed to update ucode: -5

The sole difference being which specific error is observed, which looks to
support the above interpretation. What I don't quite understand is ...

> Fixes: 5ed12565aa32 ("microcode: rendezvous CPUs in NMI handler and load ucode")

... this and the specific indication that this needs backporting: Why is
the particular error code this important here?

> --- a/xen/arch/x86/cpu/microcode/core.c
> +++ b/xen/arch/x86/cpu/microcode/core.c
> @@ -260,7 +260,9 @@ static int secondary_nmi_work(void)
>  {
>      cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
>  
> -    return wait_for_state(LOADING_EXIT) ? 0 : -EBUSY;
> +    wait_for_state(LOADING_EXIT);
> +
> +    return 0;
>  }

At which point the function could just as well return void? Preferably with
this adjustment (and the knock-on one at the call site) and with the slight
clarification to the description:
Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

> @@ -271,7 +273,7 @@ static int primary_thread_work(const struct microcode_patch *patch,
>      cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
>  
>      if ( !wait_for_state(LOADING_ENTER) )
> -        return -EBUSY;
> +        return 0;
>  
>      ret = alternative_call(ucode_ops.apply_microcode, patch, flags);
>      if ( !ret )
> @@ -313,7 +315,7 @@ static int cf_check microcode_nmi_callback(
>  static int secondary_thread_fn(void)
>  {
>      if ( !wait_for_state(LOADING_CALLIN) )
> -        return -EBUSY;
> +        return 0;
>  
>      self_nmi();
>  
> @@ -336,7 +338,7 @@ static int primary_thread_fn(const struct microcode_patch *patch,
>                               unsigned int flags)
>  {
>      if ( !wait_for_state(LOADING_CALLIN) )
> -        return -EBUSY;
> +        return 0;
>  
>      if ( ucode_in_nmi )
>      {

Vaguely recalling the original intentions, these changes looked wrong to me at
first glance. But yes, an exit indication from the control thread isn't
really a separate error condition.

Jan
