Re: [Xen-devel] Problem with PV disk and iSCSI

Hi Gary,

On Fri, Feb 08, 2008 at 02:54:14PM -0500, Gary Grebus wrote:
> I've run into a problem on 3.1.2 with an HVM guest using PV disks.  In
> dom0, the physical disk is accessed using iSCSI.  The symptom is that
> applications in dom0 which are monitoring the iSCSI network interface
> (e.g. tcpdump) die with EFAULT errors.
> 
> When the block I/O completes, it looks like blkback is doing a
> GNTTABOP_unmap_grant_ref on a guest page, even though the dom0 kernel
> has done get_page() on it and still holds references.  
> 
> The page had been passed through iSCSI into the network stack, so it
> ends up referenced by one or more skb's.  Because there was an AF_PACKET
> socket open, a clone of the skb ends up queued for an indeterminate
> amount on that socket queue.  When the application finally gets around
> to reading the data, the page is no longer mapped, and the read fails
> trying to copy the data out of the kernel.
> 
> Has anyone else seen anything similar?  I mentioned tcpdump, but the
> problem also shows up with dhcpcd, which needs to process packets at the
> ethernet layer.  
> 

We're seeing the same thing with 3.1.3.  When running iSCSI in dom0
(over a Xen bridge) and presenting the iSCSI disks to the guest via
blkfront, we see the same crash (below) while performing failover tests
on the storage controller.

Just as you said, the error occurs in skb_remove_foreign_references,
called from loopback_start_xmit.  It walks all the foreign pages,
attempting to copy each one into a local page, and dies on the source
address (esi) of the following memcpy:

        vaddr = kmap_skb_frag(&skb_shinfo(skb)->frags[i]);
        off = skb_shinfo(skb)->frags[i].page_offset;
        memcpy(page_address(page) + off,
               vaddr + off,
               skb_shinfo(skb)->frags[i].size);

c053f2f7:       0f b7 74 c8 18          movzwl 0x18(%eax,%ecx,8),%esi
c053f2fc:       0f b7 5c c8 1a          movzwl 0x1a(%eax,%ecx,8),%ebx
c053f301:       8b 44 24 0c             mov    0xc(%esp),%eax
c053f305:       e8 ba 09 f1 ff          call   0xc044fcc4  page_address
c053f30a:       89 d9                   mov    %ebx,%ecx
c053f30c:       c1 e9 02                shr    $0x2,%ecx
c053f30f:       8d 3c 30                lea    (%eax,%esi,1),%edi
c053f312:       03 74 24 04             add    0x4(%esp),%esi
c053f316:       f3 a5                   rep movsl %ds:(%esi),%es:(%edi)   <<<<< memcpy
ds: 007b esi: c0df7000 es: 007b edi: ebffb000

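It seems the page behind one of the skb's frags has been unmapped: the
faulting address c0df7000 is the memcpy source in esi, and its pte in
the oops below is zero.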
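
As a sanity check on that reading, here is a small C sketch that redoes the
arithmetic from the disassembly using the register values in the oops. It
assumes ebx holds frags[i].size, that eax still holds the page_address()
return value, and that ecx is the remaining dword count of the rep movsl;
those are my interpretation of the listing, and the function names are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Register values copied from the oops in this mail. */
#define OOPS_EAX   0xebffb000u  /* assumed: page_address(page) result   */
#define OOPS_EBX   0x00000578u  /* assumed: frags[i].size               */
#define OOPS_ECX   0x0000015eu  /* dwords left for rep movsl            */
#define OOPS_ESI   0xc0df7000u  /* memcpy source (vaddr + off)          */
#define OOPS_EDI   0xebffb000u  /* memcpy destination                   */
#define FAULT_ADDR 0xc0df7000u  /* "unable to handle ... c0df7000"      */

/* Mirrors "shr $0x2,%ecx": byte count to dword count. */
uint32_t dwords_for(uint32_t size)
{
    return size >> 2;
}

/* How many dwords rep movsl had moved when the fault hit. */
uint32_t dwords_copied(uint32_t size, uint32_t ecx_at_fault)
{
    return dwords_for(size) - ecx_at_fault;
}

/* Fragment offset, valid only if edi has not advanced yet. */
uint32_t frag_offset(uint32_t eax, uint32_t edi)
{
    return edi - eax;
}

void check_oops_registers(void)
{
    /* 0x578 bytes is 0x15e dwords, and ecx is still 0x15e. */
    assert(dwords_for(OOPS_EBX) == OOPS_ECX);
    assert(dwords_copied(OOPS_EBX, OOPS_ECX) == 0);

    /* Nothing copied, so edi - eax is the fragment offset: zero. */
    assert(frag_offset(OOPS_EAX, OOPS_EDI) == 0);

    /* The very first source dword is the faulting address. */
    assert(OOPS_ESI == FAULT_ADDR);

    printf("dwords copied before fault: %u\n",
           (unsigned)dwords_copied(OOPS_EBX, OOPS_ECX));
}
```

Calling check_oops_registers() passes all the assertions, which is
consistent with the copy faulting on the first dword of the source
mapping rather than part-way through the fragment.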


> I'm thinking blkback will have to make a dom0 copy of the page before
> doing the unmap if there are still extra references?
>

Can the unmap be deferred and handled by whoever drops the last
reference?  Or does this open up a potential security hole?
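
To make the question concrete, here is a rough user-space model in C of
what "deferred unmap" would mean. It is not blkback code: struct
granted_page and all of the model_* functions are hypothetical, and
do_unmap() only stands in for GNTTABOP_unmap_grant_ref. A refcount of 1 is
taken to be blkback's own reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one granted guest page mapped into dom0. */
struct granted_page {
    int  refcount;       /* models the page reference count          */
    bool mapped;         /* true while the grant mapping exists      */
    bool unmap_pending;  /* I/O finished, unmap waiting on holders   */
};

/* Stand-in for GNTTABOP_unmap_grant_ref. */
void do_unmap(struct granted_page *p)
{
    p->mapped = false;
    p->unmap_pending = false;
}

void model_map(struct granted_page *p)
{
    p->refcount = 1;     /* blkback's own reference */
    p->mapped = true;
    p->unmap_pending = false;
}

/* Another dom0 user (e.g. an skb frag) takes a reference. */
void model_get_page(struct granted_page *p)
{
    p->refcount++;
}

/* Block I/O completion: unmap now only if nobody else holds the page. */
void model_io_complete(struct granted_page *p)
{
    if (p->refcount > 1)
        p->unmap_pending = true;
    else
        do_unmap(p);
}

/* Dropping the last extra reference performs the deferred unmap. */
void model_put_page(struct granted_page *p)
{
    assert(p->refcount > 1);
    p->refcount--;
    if (p->refcount == 1 && p->unmap_pending)
        do_unmap(p);
}

void demo_deferred_unmap(void)
{
    struct granted_page pg;

    model_map(&pg);
    model_get_page(&pg);     /* skb clone queued on an AF_PACKET socket */
    model_io_complete(&pg);  /* block I/O done, but page still in use   */
    assert(pg.mapped);       /* a late reader can still copy the data   */
    model_put_page(&pg);     /* reader finished, last extra ref dropped */
    assert(!pg.mapped);      /* only now is the grant unmapped          */
}
```

In this model the guest's page stays mapped in dom0 for as long as any
dom0 holder keeps its reference, so the lifetime of the mapping is no
longer bounded by the block request. Whether that is acceptable is the
open question.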


Thanks
kurt


Kurt Hackel
Oracle Corp.


===========================================

BUG: unable to handle kernel paging request at virtual address c0df7000
 printing eip:
c053f316
36d4c000 -> *pde = 00000000:c4237027
36c37000 -> *pme = 00000001:1bd14067
00d14000 -> *pte = 00000000:00000000
Oops: 0000 [#1]
SMP 
Modules linked in: xt_physdev bridge autofs4 sunrpc dm_round_robin
ip_conntrack_netbios_ns ipt_REJECT xt_tcpudp xt_state ip_conntrack
nfnetlink iptable_filter ip_tables x_tables ib_iser rdma_cm ib_addr
ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi scsi_transport_iscsi
ocfs2(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs dm_mirror
dm_multipath dm_mod video sbs i2c_ec button battery asus_acpi ac
parport_pc lp parport joydev sg i2c_piix4 i2c_core pcspkr k8_edac
edac_mc tg3 ide_cd serio_raw serial_core cdrom qla2xxx
scsi_transport_fc sata_svw libata mptspi mptscsih mptbase
scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    3
EIP:    0061:[<c053f316>]    Not tainted VLI
EFLAGS: 00010286   (2.6.18-8.1.6.0.18.el5xen #1) 
EIP is at loopback_start_xmit+0x107/0x2bd
eax: ebffb000   ebx: 00000578   ecx: 0000015e   edx: c065c800
esi: c0df7000   edi: ebffb000   ebp: f1134ea8   esp: c0701e6c
ds: 007b   es: 007b   ss: 0069
Process swapper (pid: 0, ti=c0701000 task=f77c05a0 task.ti=c0d2f000)
Stack: c9a13c00 c0df7000 00000001 c157ff60 c9a13800 f1134ea8 c9a13980 c9a13800
       c059fc02 c9a13800 f1134ea8 c9a13980 0000000e c05a1768 c0dcf824 c0dcf800
       f1134ea8 c05a5cfc c9a13800 ed20e040 00001fc2 00000000 f48703d4 f48703e8
Call Trace:
 [<c059fc02>] dev_hard_start_xmit+0x198/0x1ee
 [<c05a1768>] dev_queue_xmit+0x24c/0x2e8
 [<c05a5cfc>] neigh_resolve_output+0x1b7/0x1e1
 [<c05bea8b>] ip_output+0x1c0/0x1f9
 [<c05be309>] ip_queue_xmit+0x390/0x3cf
 [<c059fc02>] dev_hard_start_xmit+0x198/0x1ee
 [<c05adbe6>] __qdisc_run+0x30/0x19a
 [<c05a17e6>] dev_queue_xmit+0x2ca/0x2e8
 [<f8640d48>] br_dev_queue_push_xmit+0x15b/0x17e [bridge]
 [<c05cbc6f>] tcp_transmit_skb+0x5e4/0x612
 [<f8641945>] br_handle_frame+0x146/0x15d [bridge]
 [<c05cc9ad>] tcp_retransmit_skb+0x4b7/0x595
 [<c05c5baf>] tcp_enter_loss+0x1a2/0x1ff
 [<c05cee58>] tcp_write_timer+0x3ff/0x5d3
 [<c05cea59>] tcp_write_timer+0x0/0x5d3
 [<c0427146>] run_timer_softirq+0x120/0x19b
 [<c0423162>] __do_softirq+0x73/0xe8
 [<c0406dda>] do_softirq+0x6e/0x102
 =======================
 [<c0406d63>] do_IRQ+0xa5/0xae
 [<c052f040>] evtchn_do_upcall+0x85/0xde
 [<c04056a1>] hypervisor_callback+0x3d/0x45
 [<c040800e>] raw_safe_halt+0xc2/0xe8
 [<c040442a>] xen_idle+0x43/0x4f
 [<c04033b0>] cpu_idle+0xa1/0xbb
Code: 24 08 89 44 24 04 8b 85 a4 00 00 00 0f b7 74 c8 18 0f b7 5c c8 1a 8b 44 24 0c e8 ba 09 f1 ff 89 d9 c1 e9 02 8d 3c 30 03 74 24 04 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 8b 44 24 04 ba 05 00 00 00 e8

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel