Any chance this will be refreshed for 2.6.18? I very much enjoy being
able to block-attach in domain 0, but am less enamoured of the
frequent hangs when I fsck those devices...
On Tuesday, 02 January 2007 at 17:37, jake wrote:
> blktap devices attached to dom0 are liable to wedge during IO transfers.
> The problem does not occur in typical usage scenarios (i.e., virtual
> devices attached to guest domains); it is unique to the unanticipated
> case in which virtual devices are attached to dom0.
>
> The problem arises when processes in dom0 generate a large number of
> dirty pages while writing to a block-attached device. Once the number
> of dirty pages reaches a certain threshold, the dom0 kernel begins
> throttling IO in balance_dirty_pages; processes traversing the buffered
> IO path will block in this function until the number of dirty pages
> decreases.
>
> This is bad for the tapdisk process, which is responsible for servicing
> IO requests from the blktap driver. The tapdisk process normally
> performs direct IO, but if it writes to a hole in a sparse file, it
> falls into the buffered IO path. If the tapdisk process blocks in
> balance_dirty_pages, it will do so indefinitely, because it is the only
> process that cleans the pages dirtied by the processes writing to the
> virtual device. Thus dirty pages continue to amass in dom0 as IO is
> performed on the virtual device, but none of them make it to the
> physical devices because the tapdisk process is unable to service the
> requests.
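
To make the wedge described above concrete, here is a toy model of it in C. This is only a sketch under loud assumptions: DIRTY_LIMIT, WRITER_PAGES_PER_ROUND, throttled() and simulate() are all made-up names and numbers, the "rounds" stand in for real scheduling, and nothing here is kernel code. It only shows that once both the writer and the sole cleaner are throttled, the dirty count can never drop again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numbers chosen for illustration only. */
#define DIRTY_LIMIT            8   /* throttle once dirty pages exceed this */
#define WRITER_PAGES_PER_ROUND 2   /* dom0 writer dirties pages faster than */
                                   /* tapdisk (1 page per round) cleans them */

/* A task on the buffered IO path is held back while the dirty count is
 * over the limit, unless it has been exempted from throttling. */
static bool throttled(int dirty, bool exempt)
{
    return !exempt && dirty > DIRTY_LIMIT;
}

/* Run `steps` rounds. In each round the dom0 writer dirties pages if it
 * is not throttled, then tapdisk (the only cleaner) tries to service one
 * request. Returns the number of requests tapdisk completed. */
static int simulate(int steps, bool tapdisk_exempt)
{
    int dirty = 0;
    int completed = 0;

    assert(steps >= 0);
    for (int i = 0; i < steps; i++) {
        /* The writer is an ordinary process and is never exempt. */
        if (!throttled(dirty, false))
            dirty += WRITER_PAGES_PER_ROUND;

        /* tapdisk cleans one page per round unless it is throttled too. */
        if (dirty > 0 && !throttled(dirty, tapdisk_exempt)) {
            dirty -= 1;
            completed += 1;
        }
    }
    return completed;
}
```

With these made-up constants, simulate(100, false) stalls after 7 completed requests and never recovers, while simulate(100, true) completes one request in every round. The real tapdisk also dirties pages of its own when it writes into a sparse hole; the model leaves that out.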
>
> Note that when used as originally intended, blktap does not suffer from
> this problem: when blktap devices are attached to guest domains,
> performing IO on them dirties pages in the guest domain, not in dom0, so
> the tapdisk process doesn't get throttled in balance_dirty_pages.
>
> Attached is a patch that works around the dom0 problem by exempting the
> tapdisk process from blocking in balance_dirty_pages. tapdisk processes
> servicing dom0-attached devices are granted special status using a
> modified setpriority syscall; a check in balance_dirty_pages ensures
> that such processes do not block indefinitely.
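
For anyone reading along without applying the patch, the marking and the check can be modelled in userspace roughly as follows. This is a sketch, not the kernel code: struct model_task, model_setpriority(), model_exempt_from_throttle() and the MODEL_* constants are hypothetical stand-ins, and the permission checks (-EPERM, -EACCES) and the PRIO_PGRP/PRIO_USER walk are left out. Only PRIO_SPECIAL_IO and its value (-9999) come from the patch itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef PRIO_SPECIAL_IO
#define PRIO_SPECIAL_IO (-9999)  /* sentinel from the patch; not in mainline */
#endif

/* Stand-ins for PRIO_PROCESS / PRIO_PGRP, renamed so they cannot clash
 * with <sys/resource.h>. */
enum { MODEL_PRIO_PROCESS = 0, MODEL_PRIO_PGRP = 1 };

/* Hypothetical cut-down task: just the two fields the patch cares about. */
struct model_task {
    int   nice;
    short special_prio;   /* 0, or PRIO_SPECIAL_IO once marked */
};

/* Models the patched normalisation in sys_setpriority plus the new
 * branch in set_one_prio: the sentinel bypasses the [-20, 19] clamp and
 * is only accepted for a single process. */
static int model_setpriority(struct model_task *p, int which, int niceval)
{
    assert(p);
    if (niceval == PRIO_SPECIAL_IO) {
        if (which != MODEL_PRIO_PROCESS)
            return -EINVAL;
        p->special_prio = PRIO_SPECIAL_IO;  /* nice value is left untouched */
        return 0;
    }
    if (niceval < -20)
        niceval = -20;
    if (niceval > 19)
        niceval = 19;
    p->nice = niceval;
    return 0;
}

/* Models the test added to the balance_dirty_pages loop. */
static bool model_exempt_from_throttle(const struct model_task *p)
{
    return p->special_prio == PRIO_SPECIAL_IO;
}
```

Two things stand out from the posted diff. The marker branch in set_one_prio comes before the can_nice() check, and nothing in the diff ever resets special_prio, so a marked task appears to stay exempt for its lifetime. Also, on a kernel without the patch the same setpriority call would simply be clamped to -20 rather than rejected.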
>
> This is clearly a hacky solution; any suggestions for improvement are
> welcome.
> # HG changeset patch
> # User Jake Wires <jwires@xxxxxxxxxxxxx>
> # Date 1166551978 28800
> # Node ID 34c6a9a2983ae46fad5dbba7e4b49520fb639a8c
> # Parent df1e7ae878b4badf4e5555df12a1c4d233170fb9
> [BLKTAP] prevent tapdisk processes from blocking in balance_dirty_pages
>
> This patch modifies the setpriority syscall to enable marking processes as special
> IO processes. IO processes are exempted from blocking in balance_dirty_pages.
> This patch is intended to avoid deadlocks when block-attaching a blktap VDI to
> dom0.
>
> diff -r df1e7ae878b4 -r 34c6a9a2983a patches/linux-2.6.16.33/series
> +++ b/patches/linux-2.6.16.33/series Tue Dec 19 10:12:58 2006 -0800
> @@ -5,6 +5,7 @@ git-4bfaaef01a1badb9e8ffb0c0a37cd2379008
> git-4bfaaef01a1badb9e8ffb0c0a37cd2379008d21f.patch
> linux-2.6.19-rc1-kexec-move_segment_code-x86_64.patch
> blktap-aio-16_03_06.patch
> +blktap-ioprio.patch
> device_bind.patch
> fix-hz-suspend.patch
> fix-ide-cd-pio-mode.patch
> diff -r df1e7ae878b4 -r 34c6a9a2983a tools/blktap/drivers/blktapctrl.c
> +++ b/tools/blktap/drivers/blktapctrl.c Tue Dec 19 10:12:58 2006 -0800
> @@ -51,6 +51,7 @@
> #include <xs.h>
> #include <printf.h>
> #include <sys/time.h>
> +#include <sys/resource.h>
> #include <syslog.h>
>
> #include "blktaplib.h"
> @@ -535,6 +536,14 @@ int blktapctrl_new_blkif(blkif_t *blkif)
> goto fail;
> }
>
> + /* exempt tapdisk from flushing when attached to dom0 */
> + if (blkif->domid == 0)
> + if (setpriority(PRIO_PROCESS,
> + blkif->tappid, PRIO_SPECIAL_IO)) {
> + DPRINTF("Unable to prioritize tapdisk proc\n");
> + goto fail;
> + }
> +
> /* Both of the following read and write calls will block up to
> * max_timeout val*/
> if (write_msg(blkif->fds[WRITE], CTLMSG_PARAMS, blkif, ptr)
> diff -r df1e7ae878b4 -r 34c6a9a2983a tools/blktap/lib/blktaplib.h
> +++ b/tools/blktap/lib/blktaplib.h Tue Dec 19 10:12:58 2006 -0800
> @@ -57,6 +57,8 @@
> #define BLKTAP_QUERY_ALLOC_REQS 8
> #define BLKTAP_IOCTL_FREEINTF 9
> #define BLKTAP_IOCTL_PRINT_IDXS 100
> +
> +#define PRIO_SPECIAL_IO -9999
>
> /* blktap switching modes: (Set with BLKTAP_IOCTL_SETMODE) */
> #define BLKTAP_MODE_PASSTHROUGH 0x00000000 /* default */
> diff -r df1e7ae878b4 -r 34c6a9a2983a patches/linux-2.6.16.33/blktap-ioprio.patch
> +++ b/patches/linux-2.6.16.33/blktap-ioprio.patch Tue Dec 19 10:12:58 2006 -0800
> @@ -0,0 +1,81 @@
> +diff -pruN ../orig-linux-2.6.16.33/include/linux/sched.h ./include/linux/sched.h
> +--- ../orig-linux-2.6.16.33/include/linux/sched.h 2006-12-18 18:42:00.000000000 -0800
> ++++ ./include/linux/sched.h 2006-12-18 18:46:07.000000000 -0800
> +@@ -706,6 +706,7 @@ struct task_struct {
> + prio_array_t *array;
> +
> + unsigned short ioprio;
> ++ short special_prio;
> +
> + unsigned long sleep_avg;
> + unsigned long long timestamp, last_ran;
> +diff -pruN ../orig-linux-2.6.16.33/include/linux/resource.h ./include/linux/resource.h
> +--- ../orig-linux-2.6.16.33/include/linux/resource.h 2006-12-18 18:42:00.000000000 -0800
> ++++ ./include/linux/resource.h 2006-12-18 18:44:35.000000000 -0800
> +@@ -44,6 +44,7 @@ struct rlimit {
> +
> + #define PRIO_MIN (-20)
> + #define PRIO_MAX 20
> ++#define PRIO_SPECIAL_IO -9999
> +
> + #define PRIO_PROCESS 0
> + #define PRIO_PGRP 1
> +diff -pruN ../orig-linux-2.6.16.33/include/linux/init_task.h ./include/linux/init_task.h
> +--- ../orig-linux-2.6.16.33/include/linux/init_task.h 2006-12-18 18:42:00.000000000 -0800
> ++++ ./include/linux/init_task.h 2006-12-18 18:45:56.000000000 -0800
> +@@ -85,6 +85,7 @@ extern struct group_info init_groups;
> + .lock_depth = -1, \
> + .prio = MAX_PRIO-20, \
> + .static_prio = MAX_PRIO-20, \
> ++ .special_prio = 0, \
> + .policy = SCHED_NORMAL, \
> + .cpus_allowed = CPU_MASK_ALL, \
> + .mm = NULL, \
> +diff -pruN ../orig-linux-2.6.16.33/kernel/sys.c ./kernel/sys.c
> +--- ../orig-linux-2.6.16.33/kernel/sys.c 2006-12-18 18:42:00.000000000 -0800
> ++++ ./kernel/sys.c 2006-12-18 18:43:30.000000000 -0800
> +@@ -245,6 +245,11 @@ static int set_one_prio(struct task_stru
> + error = -EPERM;
> + goto out;
> + }
> ++ if (niceval == PRIO_SPECIAL_IO) {
> ++ p->special_prio = PRIO_SPECIAL_IO;
> ++ error = 0;
> ++ goto out;
> ++ }
> + if (niceval < task_nice(p) && !can_nice(p, niceval)) {
> + error = -EACCES;
> + goto out;
> +@@ -272,10 +277,15 @@ asmlinkage long sys_setpriority(int whic
> +
> + /* normalize: avoid signed division (rounding problems) */
> + error = -ESRCH;
> +- if (niceval < -20)
> +- niceval = -20;
> +- if (niceval > 19)
> +- niceval = 19;
> ++ if (niceval == PRIO_SPECIAL_IO) {
> ++ if (which != PRIO_PROCESS)
> ++ return -EINVAL;
> ++ } else {
> ++ if (niceval < -20)
> ++ niceval = -20;
> ++ if (niceval > 19)
> ++ niceval = 19;
> ++ }
> +
> + read_lock(&tasklist_lock);
> + switch (which) {
> +diff -pruN ../orig-linux-2.6.16.33/mm/page-writeback.c ./mm/page-writeback.c
> +--- ../orig-linux-2.6.16.33/mm/page-writeback.c 2006-12-19 10:03:59.000000000 -0800
> ++++ ./mm/page-writeback.c 2006-12-19 10:04:17.000000000 -0800
> +@@ -231,6 +231,9 @@ static void balance_dirty_pages(struct a
> + pages_written += write_chunk - wbc.nr_to_write;
> + if (pages_written >= write_chunk)
> + break; /* We've done our duty */
> ++ if (current->special_prio == PRIO_SPECIAL_IO)
> ++ break; /* Exempt IO processes */
> ++
> + }
> + blk_congestion_wait(WRITE, HZ/10);
> + }
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel