Re: [Xen-devel] [PATCH] Ensure blktap reports I/O errors back to guest

To: "Daniel P. Berrange" <berrange@xxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH] Ensure blktap reports I/O errors back to guest
From: "Andrew Warfield" <andrew.warfield@xxxxxxxxxxxx>
Date: Fri, 1 Dec 2006 08:55:32 -0800
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20061201162000.GB17515@xxxxxxxxxx>
References: <20061201162000.GB17515@xxxxxxxxxx>

Applied, thanks Daniel.

a.

On 12/1/06, Daniel P. Berrange <berrange@xxxxxxxxxx> wrote:
There are a number of flaws in the blktap userspace daemon when dealing
with I/O errors.

 - The backends which use AIO check the io_events.res member to determine
   if an I/O error occurred. Which is good. But when calling the callback
   to signal completion of the I/O, they pass the io_events.res2 member
   instead.

   Now this seems fine at first glance[1]:

     "res is the usual result of an I/O operation: the number of bytes
      transfered, or a negative error code. res2 is a second status
      value which will be returned to the user"

   Except that

      "currently (2.6.0-test9), callers of aio_complete() within the
       kernel always set res2 to zero."

   And this hasn't changed at any point since 2.6.0, so by passing through
   the status from 'res2', the callback thinks the I/O operation succeeded
   even when it failed :-(

   The fix is simple: instead of passing 'res2', just pass

      ep->res == io->u.c.nbytes ? 0 : 1

   This would solve the error reporting to the guest, except that there
   is a second flaw (see the sketch after this list)...

 - The tapdisk I/O completion callback checks the status parameter
   passed in, syslogs it, and then returns. It never bothers to send
   the I/O completion response back to the blktap kernel driver when
   a failure occurs.

   Fortunately, the fix for this is also simple. Instead of returning
   from the callback when dealing with an error, we simply set the
   status field for the pending response to BLKIF_RSP_ERROR and then
   continue with the normal code path, so the error eventually gets
   back to the guest.


The scenario I used to discover the problem and test the patch is as follows:

 - In Dom0, create a filesystem with only 200 MB of free space.
 - Create a 1 GB sparse file on this volume.
 - Configure the guest so this sparse file appears as /dev/xvdb.
 - In the DomU, create a single partition on /dev/xvdb and format
   it with ext3.
 - In the DomU, mount /dev/xvdb1 on /mnt and then run:

      dd if=/dev/zero of=/mnt/data.bin bs=1GB count=1


Without this patch, the 'dd' command would succeed in writing 1 GB of data
even though the underlying disk in Dom0 was only 200 MB in size. More complex
tests of copying a whole directory hierarchy across resulted in catastrophic
data corruption of the filesystem itself. A manual fsck was needed to fix up
the filesystem, and there were many very bad errors needing fixing.


With this patch applied, the DomU sees the I/O failures and the kernel logs
messages such as:

Dec  1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 722127
Dec  1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 730327
Dec  1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 738527
Dec  1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 746727
Dec  1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 754927
Dec  1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 763127
Dec  1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 771327
Dec  1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 779527
Dec  1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 792399

It will retry the I/O operation until it runs out of sectors to try, and then
fail the operation. The filesystem is not seriously damaged - ext3 journal
recovery will trivially clean up if the guest is rebooted after the disk in
Dom0 is enlarged.

   Signed-off-by: Daniel P. Berrange <berrange@xxxxxxxxxx>

Regards,
Dan.

[1] http://lwn.net/Articles/24366/
--
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel