On Friday 08 April 2011 16:39:13 Jan Beulich wrote:
> >>> On 08.04.11 at 16:26, Wei Wang2 <wei.wang2@xxxxxxx> wrote:
> > On Friday 08 April 2011 15:43:57 Jan Beulich wrote:
> >> >>> On 08.04.11 at 13:35, Wei Wang2 <wei.wang2@xxxxxxx> wrote:
> >> >
> >> > Some device could generate bogus interrupts if an IO-APIC RTE and an
> >> > iommu interrupt remapping entry are not consistent during 2 adjacent
> >> > 64bits IO-APIC RTE updates. For example, if the 2nd operation updates
> >> > destination bits in RTE for SATA device and unmask it, in some case,
> >> > SATA device will assert ioapic pin to generate interrupt immediately
> >> > using new destination but iommu could still translate it into the old
> >> > destination, then dom0 would be confused. To fix that, we sync up
> >> > interrupt remapping entry with IO-APIC IRE on every 32 bits operation
> >> > and foward IOAPIC RTE updates after interrupt remapping table has been
> >> > changed.
> >> I don't think this is correct: Without the patch, the filling of
> >> ioapic_rte takes into account the value already written. Now that you
> >> only write the value at the end of the function, you should overwrite
> >> the
> >> affected half with "value" immediately before calling
> >> update_intremap_entry_from_ioapic().
> > Sorry, not quite understand your point. My thought is, no matter dom0
> > tried to
> > updates lower half or upper half of RTE, we always updates interrupt
> > table from the lower half. This will keep iommu table strictly
> > identically to RTE. The old code has an assumption that both lower half
> > and upper of RTE should be updated together. But this might not be always
> > true. If by incident, dom0 only updates the upper half and we don't sync
> > iommu with it, then the destination in RTE and iommu table will be
> > different.
> No, that's not my point. The problem I'm seeing is that you pass the
> old value (as read from the IO-APIC) to
> update_intremap_entry_from_ioapic(), but the function certainly
> should use the to-be-written one. Previously this was implicit because
> the IO-APIC register write happened first.
OK, got it. That is definitely problematic. will fix it.
> >> Eliminating the double write if reg == rte_lo would also seem desirable
> >> (and in no case should you write back the old value after having called
> >> update_intremap_entry_from_ioapic()).
> > It not a write back, It just finishes IO-APIC RTE writes. After updating
> > interrupt remapping table we still have to update RTE. It is just a copy
> > of __io_apic_write (maybe I should just call it). Old code updates ioapic
> > earlier than interrupt remapping table and sata device might generate
> > interrupt right after this, which is not expected.
> No. If reg == ret_lo, you write that IO-APIC register twice, which is
> pointless. With the other problem unaddressed, you actually first write
> back the old value (with the mask bit restored), which gets IO-APIC
> and remapping tables out of sync for a brief period of time (which is
> a problem by itself), then write the new value. With the other problem
> addressed, you would simply write the new value twice, which is
> wasteful given that these writes are uncached.
True. I will rework the patch try to eliminate this.
Xen-devel mailing list