[Xen-devel] Re: Linux Stubdom Problem

2011/7/29 Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>

On Thu, 28 Jul 2011, Jiageng Yu wrote:
> 2011/7/27 Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
> On Wed, 27 Jul 2011, Jiageng Yu wrote:
> > 2011/7/27 Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>:
> > > On Tue, 26 Jul 2011, Jiageng Yu wrote:
> > >> 2011/7/26 Jiageng Yu <yujiageng734@xxxxxxxxx>:
> > >> > 2011/7/22 Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>:
> > >> >> On Thu, 21 Jul 2011, Jiageng Yu wrote:
> > >> >>> 2011/7/19 Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>:
> > >> >>> > CC'ing Tim and xen-devel
> > >> >>> >
> > >> >>> > On Mon, 18 Jul 2011, Jiageng Yu wrote:
> > >> >>> >> 2011/7/16 Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>:
> > >> >>> >> > On Fri, 15 Jul 2011, Jiageng Yu wrote:
> > >> >>> >> >> 2011/7/15 Jiageng Yu <yujiageng734@xxxxxxxxx>:
> > >> >>> >> >> > 2011/7/15 Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>:
> > >> >>> >> >> >> On Fri, 15 Jul 2011, Jiageng Yu wrote:
> > >> >>> >> >> >>> > Does it mean you are actually able to boot an HVM guest using Linux
> > >> >>> >> >> >>> > based stubdoms?? Did you manage to solve the framebuffer problem too?
> > >> >>> >> >> >>>
> > >> >>> >> >> >>>
> > >> >>> >> >> >>> The HVM guest is booted. But the boot process is terminated because
> > >> >>> >> >> >>> vga bios is not invoked by seabios. I have been stuck here for a week.
> > >> >>> >> >> >>>
> > >> >>> >> >> >>
> > >> >>> >> >> >> There was a bug in xen-unstable.hg or seabios that would prevent vga bios from
> > >> >>> >> >> >> being loaded, it should be fixed now.
> > >> >>> >> >> >>
> > >> >>> >> >> >> Alternatively you can temporarily work around the issue with this hacky patch:
> > >> >>> >> >> >>
> > >> >>> >> >> >> ---
> > >> >>> >> >> >>
> > >> >>> >> >> >>
> > >> >>> >> >> >> diff -r 00d2c5ca26fd tools/firmware/hvmloader/hvmloader.c
> > >> >>> >> >> >> --- a/tools/firmware/hvmloader/hvmloader.c Fri Jul 08 18:35:24 2011 +0100
> > >> >>> >> >> >> +++ b/tools/firmware/hvmloader/hvmloader.c Fri Jul 15 11:37:12 2011 +0000
> > >> >>> >> >> >> @@ -430,7 +430,7 @@ int main(void)
> > >> >>> >> >> >> bios->create_pir_tables();
> > >> >>> >> >> >> }
> > >> >>> >> >> >>
> > >> >>> >> >> >> - if ( bios->load_roms )
> > >> >>> >> >> >> + if ( 1 )
> > >> >>> >> >> >> {
> > >> >>> >> >> >> switch ( virtual_vga )
> > >> >>> >> >> >> {
> > >> >>> >> >> >>
> > >> >>> >> >> >>
> > >> >>> >> >> >
> > >> >>> >> >> > Yes. The vga bios is booted. However, the upstream qemu receives a SIGSEGV
> > >> >>> >> >> > signal subsequently. I am trying to print the call stack when
> > >> >>> >> >> > receiving the signal.
> > >> >>> >> >> >
> > >> >>> >> >>
> > >> >>> >> >> Hi,
> > >> >>> >> >>
> > >> >>> >> >> I found the cause of the SIGSEGV signal:
> > >> >>> >> >>
> > >> >>> >> >> cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf, int
> > >> >>> >> >> len, int is_write)
> > >> >>> >> >> ->memcpy(buf, ptr + (addr & ~TARGET_PAGE_MASK), l);
> > >> >>> >> >>
> > >> >>> >> >> In my case, ptr=0 and addr=0xc253e. When qemu attempts to access
> > >> >>> >> >> address 0x53e (ptr plus the page offset of addr), the SIGSEGV signal
> > >> >>> >> >> is generated.
> > >> >>> >> >>
> > >> >>> >> >> I believe qemu is trying to access the vram at this moment. This
> > >> >>> >> >> code seems fine, so I will continue to look for the root cause.
> > >> >>> >> >>
> > >> >>> >> >
> > >> >>> >> > The vram is allocated by qemu, see hw/vga.c:vga_common_init.
> > >> >>> >> > qemu_ram_alloc under xen ends up calling xen_ram_alloc that calls
> > >> >>> >> > xc_domain_populate_physmap_exact.
> > >> >>> >> > xc_domain_populate_physmap_exact is the hypercall that should ask Xen to
> > >> >>> >> > add the missing vram pages in the guest. Maybe this hypercall is failing
> > >> >>> >> > in your case?
> > >> >>> >>
> > >> >>> >>
> > >> >>> >> Hi,
> > >> >>> >>
> > >> >>> >> I continued to investigate this bug and found that hypercall_mmu_update in
> > >> >>> >> qemu_remap_bucket(xc_map_foreign_bulk) is failing:
> > >> >>> >>
> > >> >>> >> do_mmu_update
> > >> >>> >> ->mod_l1_entry
> > >> >>> >> -> if ( !p2m_is_ram(p2mt) || unlikely(mfn == INVALID_MFN) )
> > >> >>> >> return -EINVAL;
> > >> >>> >>
> > >> >>> >> mfn==INVALID_MFN, because:
> > >> >>> >>
> > >> >>> >> mod_l1_entry
> > >> >>> >> ->gfn_to_mfn(p2m_get_hostp2m(pg_dom), l1e_get_pfn(nl1e), &p2mt));
> > >> >>> >> ->p2m->get_entry
> > >> >>> >> ->p2m_gfn_to_mfn
> > >> >>> >> -> if ( gfn > p2m->max_mapped_pfn )
> > >> >>> >> /* This pfn is higher than the
> > >> >>> >> highest the p2m map currently holds */
> > >> >>> >> return _mfn(INVALID_MFN);
> > >> >>> >>
> > >> >>> >> The p2m->max_mapped_pfn is usually 0xfffff. In our case,
> > >> >>> >> mmu_update.val exceeds 0x8000000100000000. Additionally, l1e =
> > >> >>> >> l1e_from_intpte(mmu_update.val); gfn=l1e_get_pfn(l1e ). Therefore, gfn
> > >> >>> >> will exceed 0xfffff.
> > >> >>> >>
> > >> >>> >> In the case of the minios based stubdom, the mmu_update.val values do not
> > >> >>> >> exceed 0x8000000100000000. Next, I will investigate why mmu_update.val
> > >> >>> >> exceeds 0x8000000100000000.
> > >> >>> >
> > >> >>> > It looks like the address of the guest that qemu is trying to map is not
> > >> >>> > valid.
> > >> >>> > Make sure you are running a guest with less than 2GB of ram, otherwise
> > >> >>> > you need the patch series that Anthony sent on Friday:
> > >> >>> >
> > >> >>> > http://marc.info/?l=qemu-devel&m=131074042905711&w=2
> > >> >>>
> > >> >>> That is not the problem. I never allocate more than 2GB for the hvm guest. The
> > >> >>> call stack in qemu is:
> > >> >>>
> > >> >>> qemu_get_ram_ptr
> > >> >>> ->qemu_map_cache(addr, 0, 1)
> > >> >>> -> if (!entry->vaddr_base || entry->paddr_index !=
> > >> >>> address_index ||
> > >> >>> !test_bit(address_offset >>
> > >> >>> XC_PAGE_SHIFT, entry->valid_mapping)) {
> > >> >>> qemu_remap_bucket(entry, size ? :
> > >> >>> MCACHE_BUCKET_SIZE, address_index);
> > >> >>> ->xc_map_foreign_bulk(xen_xc,
> > >> >>> xen_domid, PROT_READ|PROT_WRITE,
> > >> >>>
> > >> >>> pfns, err, nb_pfn);
> > >> >>>
> > >> >>> qemu tries to map pages from the hvm guest (xen_domid) into the linux
> > >> >>> stubdom, but some hvm pages' pfns are larger than 0xfffff. So, in
> > >> >>> p2m_gfn_to_mfn, the following condition is true (p2m->max_mapped_pfn
> > >> >>> = 0xfffff):
> > >> >>>
> > >> >>> if ( gfn > p2m->max_mapped_pfn )
> > >> >>> /* This pfn is higher than the highest the p2m map currently holds */
> > >> >>> return _mfn(INVALID_MFN);
> > >> >>>
> > >> >>> In the minios stubdom case, the hvm pages' pfns do not exceed 0xfffff.
> > >> >>> Maybe the address translation in the linux stubdom causes this problem?
> > >> >>
> > >> >> Trying to map a pfn > 0xfffff is clearly a mistake if the guest's memory
> > >> >> does not exceed 2G:
> > >> >>
> > >> >> 0xfffff * 4096 is about 4G, which is > 2G
> > >> >>
> > >> >>
> > >> >>> BTW, in the minios stubdom case, there seems to be no hvmloader process. Is it
> > >> >>> needed in the linux stubdom?
> > >> >>
> > >> >> hvmloader is the first thing that runs within the guest; it is not a
> > >> >> process in the stubdom or in dom0.
> > >> >> It is required in both minios and linux stubdoms.
> > >> >
> > >> > Hi Stefano,
> > >> >
> > >> > I applied these patches, but we still have the same problem.
> > >> > However, I noticed that qemu_get_ram_ptr(s->vram_offset) in the
> > >> > vga_common_init function also failed. Maybe this can explain the
> > >> > previous problem, which happened while trying to remap
> > >> > 0xc0000-0xc8fff of the hvm guest into the stubdom.
> > >> >
> > >> > I have traced the execution of qemu_get_ram_ptr(s->vram_offset) and
> > >> > located the failure in the p2m_gfn_to_mfn function:
> > >> >
> > >> > pod_retry_l3:
> > >> > if ( (l3e_get_flags(*l3e) & _PAGE_PRESENT) == 0 )
> > >> > {
> > >> > .....
> > >> > return _mfn(INVALID_MFN);
> > >> > }
> > >> >
> > >> > I will continue to analyze this failure.
> > >> >
> > >> > Thanks!
> > >> >
> > >> > Jiageng Yu.
> > >> >
> > >>
> > >>
> > >> Hi,
> > >>
> > >> I compared the two executions of the vga_common_init function in dom0
> > >> and in the linux based stubdom. The former succeeded and the latter
> > >> failed. First, they have the same call stack:
> > >>
> > >> Dom0 & Stubdom
> > >> _________________________________________________________
> > >> vga_common_init
> > >> ->qemu_get_ram_ptr(s->vram_offset)
> > >> ->block->host = xen_map_block(block->offset, block->length);
> > >> ->xc_map_foreign_bulk()
> > >> ->linux_privcmd_map_foreign_bulk()
> > >> ->xen_remap_domain_mfn_range()
> > >> ->HYPERVISOR_mmu_update()
> > >> __________________________________________________________
> > >>
> > >> Xen
> > >> __________________________________________________________
> > >> do_mmu_update()
> > >> ->case MMU_PT_UPDATE_PRESERVE_AD:
> > >> ->case PGT_l1_page_table:
> > >> ->mod_l1_entry(va, l1e, mfn,cmd == MMU_PT_UPDATE_PRESERVE_AD, v, pg_owner);
> > >> ->mfn_x(gfn_to_mfn(p2m_get_hostp2m(pg_dom),
> > >> l1e_get_pfn(nl1e), &p2mt));
> > >> ->gfn_to_mfn_type_p2m()
> > >> ->p2m->get_entry(p2m, gfn, t, &a, q);
> > >> ->p2m_gfn_to_mfn(p2m,gfn,t,&a,q)
> > >> ->if ( (l3e_get_flags(*l3e) &
> > >> _PAGE_PRESENT) == 0 )
> > >> -> Error happens!
> > >>
> > >> The qemu in dom0 can find the l3e of the hvm guest, but the qemu in the linux
> > >> stubdom cannot. In my case, s->vram_offset=0x40000000 and
> > >> vga_ram_size=0x800000. Therefore, we are going to map the hvm guest's
> > >> address range (pfn:0x40000, size:8M) into the linux stubdom's address space.
> > >>
> > >> In the p2m_gfn_to_mfn function, p2m->domain->domain_id=hvm guest,
> > >> gfn=0x40000, t=p2m_mmio_dm.
> > >> mfn = pagetable_get_mfn(p2m_get_pagetable(p2m)) = 0x10746e;
> > >> map_domain_page(mfn_x(mfn)) also succeeds. However, after executing:
> > >> l3e += ( (0x40000 << PAGE_SHIFT) >> L3_PAGETABLE_SHIFT)
> > >> l3e->l3 = 0, and the error happens.
> > >>
> > >> So, in the linux stubdom, when we are going to map the specified hvm
> > >> guest's address (pfn:0x40000, size:8M), we find that these pages of the hvm
> > >> guest are not present. This never happens with qemu in dom0. Could
> > >> you give me some hints on this problem?
> > >
> > >
> > > It seems that you are trying to map pages that don't exist.
> > > The pages in question should be allocated by:
> > >
> > > qemu_ram_alloc(NULL, "vga.vram", vga_ram_size)
> > > qemu_ram_alloc_from_ptr
> > > xen_ram_alloc
> > > xc_domain_populate_physmap_exact
> > >
> > > so I would add some printf and printk calls on this code path to find out if
> > > xc_domain_populate_physmap_exact fails for some reason.
> >
> > Hmm.. the linux stubdom kernel had a wrong p2m pair
> > <gfn(0x40000),mfn(0x127bd2)> for some reason. But next,
> > xc_domain_populate_physmap_exact will set up the correct p2m pair
> > <gfn(0x40000),mfn(0x896b7)>. However, the p2m pair in the stubdom kernel
> > has not been updated, because the following access to 0x40000 still
> > uses 0x127bd2.
>
> There is only one p2m for the guest domain in Xen, so I cannot understand
> how it is possible that you get the old mfn value.
> Also there shouldn't even be an old value because before
> xc_domain_populate_physmap_exact pfn 0x40000 wasn't even allocated in
> the guest yet.
>
> Make sure you are using the right domid in both calls
> (xc_domain_populate_physmap_exact and xc_map_foreign_bulk), also make
> sure that libxenlight calls xc_domain_set_target and xs_set_target for
> the stubdom otherwise the stubdom is not going to be privileged enough
> to allocate and map memory of the guest.
>
>
> > I notice you have a patch: xen: modify kernel mappings corresponding
> > to granted pages. I think maybe it could solve my problem.
>
> That patch fixes a different issue, related to grant table mappings.
>
>
>
> OK. That is my fault.
>
> The root cause of the previous problem is that the backend drivers in qemu are not stopped. To confirm this,
> I removed the stubdom-related code from the xen_be_init function of the old qemu, and the same problem appeared. The following
> patch fixes this issue in upstream qemu.
>
> diff --git a/xen-all.c b/xen-all.c
> index b73fc43..8f0645e 100644
> --- a/xen-all.c
> +++ b/xen-all.c
> @@ -472,12 +479,23 @@ static void cpu_handle_ioreq(void *opaque)
> static void xenstore_record_dm_state(XenIOState *s, const char *state)
> {
>      char path[50];
> +#ifdef CONFIG_STUBDOM
> +    s->xenstore = xs_daemon_open();
> +    if (s->xenstore == NULL) {
> +        perror("xen: xenstore open");
> +        return -errno;
> +    }
> +#endif
>      snprintf(path, sizeof (path), "/local/domain/0/device-model/%u/state", xen_domid);
>      if (!xs_write(s->xenstore, XBT_NULL, path, state, strlen(state))) {
>          fprintf(stderr, "error recording dm state\n");
>          exit(1);
>      }
> +#ifdef CONFIG_STUBDOM
> +    xs_daemon_close(s->xenstore);
> +    s->xenstore = NULL;
> +#endif
> }

Why do you need to re-open the xenstore connection here?
It should already have been opened by xen_hvm_init, like in the normal
case.

> static void xen_main_loop_prepare(XenIOState *state)
> @@ -538,6 +556,7 @@ int xen_hvm_init(void)
>
>      state = qemu_mallocz(sizeof (XenIOState));
>
> +#ifndef CONFIG_STUBDOM
>      state->xce_handle = xen_xc_evtchn_open(NULL, 0);
>      if (state->xce_handle == XC_HANDLER_INITIAL_VALUE) {
>          perror("xen: event channel open");
> @@ -549,6 +568,10 @@ int xen_hvm_init(void)
>          perror("xen: xenstore open");
>          return -errno;
>      }
> +#else
> +    state->xce_handle = XC_HANDLER_INITIAL_VALUE;
> +    state->xenstore = NULL;
> +#endif
>

So you are explicitly avoiding opening the xenstore connection from
xen_hvm_init. Why?
I think you might be trying to fix a race condition: maybe something is
not ready yet at this point that becomes ready later?

>      state->exit.notify = xen_exit_notifier;
>      qemu_add_exit_notifier(&state->exit);
> @@ -575,9 +598,10 @@ int xen_hvm_init(void)
>
>      state->ioreq_local_port = qemu_mallocz(smp_cpus * sizeof (evtchn_port_t));
>
>      /* FIXME: how about if we overflow the page here? */
>      for (i = 0; i < smp_cpus; i++) {
> -        rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
> +       rc = xc_evtchn_bind_interdomain(xen_xc, xen_domid,
>                                          xen_vcpu_eport(state->shared_page, i));
>          if (rc == -1) {
>              fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
>

This cannot be right: xc_evtchn_bind_interdomain takes an xc_evtchn* as its
first parameter, while xen_xc is an xc_interface*.
This change would prevent you from receiving any IO request
notifications from Xen.
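
To illustrate the handle mismatch, here is a minimal sketch in C. It does
not use libxc: demo_ctrl_handle, demo_evtchn_handle and
demo_bind_interdomain are hypothetical stand-ins that only play the roles
of xc_interface, xc_evtchn and xc_evtchn_bind_interdomain. The idea is that
when each handle has its own struct type, a call that passes the control
handle where the event channel handle is expected gets diagnosed by the
compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT libxc types. */
typedef struct {
    int privcmd_fd;       /* plays the role of the hypercall interface */
} demo_ctrl_handle;       /* role of xc_interface */

typedef struct {
    int evtchn_fd;        /* < 0 means the handle was never opened */
    int next_local_port;  /* next local port this toy handle hands out */
} demo_evtchn_handle;     /* role of xc_evtchn */

/* Bind a remote port and return the local port, or -1 on failure. */
int demo_bind_interdomain(demo_evtchn_handle *h, unsigned int domid,
                          unsigned int remote_port)
{
    assert(h != NULL);
    (void)domid;          /* unused in this toy model */
    (void)remote_port;    /* unused in this toy model */
    if (h->evtchn_fd < 0)
        return -1;        /* same failure convention as the rc == -1 check */
    return h->next_local_port++;
}

/*
 * A call such as demo_bind_interdomain(&some_ctrl_handle, domid, port)
 * passes an incompatible pointer type and is flagged at compile time.
 */
```

In this toy model a handle that was never opened (evtchn_fd < 0) makes the
bind fail, which is the kind of failure that would leave the device model
without IO request notifications.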

> The new problem is that my stubdom hangs at:
>
> hvmloader:
>      ->main()
>             ->pci_setup()
>                     ->pci_writeb(PCI_ISA_DEVFN, 0x60 + link, isa_irq);
>
> I am investigating this problem. pci_writeb will eventually call hvm_set_pci_link_route in Xen:
>
> hvmloader:             pci_writeb(PCI_ISA_DEVFN, 0x60 + link, isa_irq);
> qemu(stubdom):     PCIHostState->data_handler->write()
> qemu(stubdom):     i440fx_write_config_xen()
> qemu(stubdom):     xen_piix_pci_write_config_client()
> xenctrl:                   xc_hvm_set_pci_link_route()
>
> The ioport is registered by the pci_host_data_register_ioport(0xcfc, s) function.
>
> I will find out why i440fx_write_config_xen() is not invoked in my case. I will also read the pciutils.patch of the minios stubdom
> and maybe find something interesting.

I think you are not receiving any IO request notifications from Xen
because of the previous change.
It is probably worth adding a printf into xen-all.c:handle_ioreq to see
if you receive something.
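
A minimal sketch of such a trace line, assuming a simplified request
layout. struct demo_ioreq and both demo_* functions are hypothetical; the
real ioreq_t in Xen's public headers has more fields, and handle_ioreq
would be the place to call something like demo_trace_ioreq.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for an IO request; not the real ioreq_t. */
struct demo_ioreq {
    uint64_t addr;      /* port or physical address being accessed */
    uint64_t data;      /* value written, or buffer for the value read */
    uint32_t size;      /* access width in bytes */
    uint8_t  is_write;  /* non-zero for a write, zero for a read */
};

/* Format one request into buf; returns what snprintf returns. */
int demo_format_ioreq(char *buf, size_t len, const struct demo_ioreq *req)
{
    assert(buf != NULL && req != NULL);
    return snprintf(buf, len,
                    "ioreq: %s addr=0x%" PRIx64 " size=%" PRIu32
                    " data=0x%" PRIx64,
                    req->is_write ? "write" : "read",
                    req->addr, req->size, req->data);
}

/* Print one request to stderr so it shows up in the device model log. */
void demo_trace_ioreq(const struct demo_ioreq *req)
{
    char line[128];

    demo_format_ioreq(line, sizeof line, req);
    fprintf(stderr, "%s\n", line);
}
```

If no such line ever appears while hvmloader is writing to port 0xcfc, the
requests are not reaching qemu at all, which points at the event channel
binding rather than at the PCI emulation.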

I have noticed the mistake. In fact, we should stop the map_foreign_pages calls of the xen_console and xenfb devices in qemu, because the frontend drivers in the stubdom have already mapped the memory. This is my new patch. It is not stable yet; I am still testing it.

We use xc_handle to map foreign pages in the xenfb and xen_console devices. If qemu is running in a stubdom, xc_handle is invalid.

diff --git a/hw/xen_backend.c b/hw/xen_backend.c
index cfb53c8..11c53fe 100644
--- a/hw/xen_backend.c
+++ b/hw/xen_backend.c
@@ -47,6 +47,7 @@ XenXC xen_xc = XC_HANDLER_INITIAL_VALUE;
XenGnttab xen_xcg = XC_HANDLER_INITIAL_VALUE;
struct xs_handle *xenstore = NULL;
const char *xen_protocol;
+XenXC xc_handle = XC_HANDLER_INITIAL_VALUE;

/* private */
static QTAILQ_HEAD(XenDeviceHead, XenDevice) xendevs = QTAILQ_HEAD_INITIALIZER(xendevs);
@@ -655,6 +656,7 @@ static void xen_be_evtchn_event(void *opaque)

int xen_be_init(void)
{
+#ifndef CONFIG_STUBDOM
     xenstore = xs_daemon_open();
     if (!xenstore) {
         xen_be_printf(NULL, 0, "can't connect to xenstored\n");
@@ -665,10 +667,16 @@ int xen_be_init(void)
         goto err;
     }

+    if (xc_handle == XC_HANDLER_INITIAL_VALUE) {
+        goto err;
+    }
+#endif
+
     if (xen_xc == XC_HANDLER_INITIAL_VALUE) {
         /* Check if xen_init() have been called */
         goto err;
     }
+
     return 0;

err:
diff --git a/hw/xen_backend.h b/hw/xen_backend.h
index 9d36df3..bc5a157 100644
--- a/hw/xen_backend.h
+++ b/hw/xen_backend.h
@@ -59,6 +59,9 @@ extern XenXC xen_xc;
extern struct xs_handle *xenstore;
extern const char *xen_protocol;

+/* invalid in linux stubdom */
+extern XenXC xc_handle;
+
/* xenstore helper functions */
int xenstore_write_str(const char *base, const char *node, const char *val);
int xenstore_write_int(const char *base, const char *node, int ival);
diff --git a/hw/xen_console.c b/hw/xen_console.c
index c6c8163..66b6dd7 100644
--- a/hw/xen_console.c
+++ b/hw/xen_console.c
@@ -213,7 +213,7 @@ static int con_connect(struct XenDevice *xendev)
     if (xenstore_read_int(con->console, "limit", &limit) == 0)
  con->buffer.max_capacity = limit;

-    con->sring = xc_map_foreign_range(xen_xc, con->xendev.dom,
+   con->sring = xc_map_foreign_range(xc_handle, con->xendev.dom,
           XC_PAGE_SIZE,
           PROT_READ|PROT_WRITE,
           con->ring_ref);
diff --git a/hw/xenfb.c b/hw/xenfb.c
index 039076a..278fa60 100644
--- a/hw/xenfb.c
+++ b/hw/xenfb.c
@@ -104,7 +104,7 @@ static int common_bind(struct common *c)
     if (xenstore_read_fe_int(&c->xendev, "event-channel", &c->xendev.remote_port) == -1)
  return -1;

-    c->page = xc_map_foreign_range(xen_xc, c->xendev.dom,
+   c->page = xc_map_foreign_range(xc_handle, c->xendev.dom,
        XC_PAGE_SIZE,
        PROT_READ | PROT_WRITE, mfn);
     if (c->page == NULL)
@@ -482,14 +482,14 @@ static int xenfb_map_fb(struct XenFB *xenfb)
     fbmfns = qemu_mallocz(sizeof(unsigned long) * xenfb->fbpages);

     xenfb_copy_mfns(mode, n_fbdirs, pgmfns, pd);
-    map = xc_map_foreign_pages(xen_xc, xenfb->c.xendev.dom,
+   map = xc_map_foreign_pages(xc_handle, xenfb->c.xendev.dom,
           PROT_READ, pgmfns, n_fbdirs);
     if (map == NULL)
  goto out;
     xenfb_copy_mfns(mode, xenfb->fbpages, fbmfns, map);
     munmap(map, n_fbdirs * XC_PAGE_SIZE);

-    xenfb->pixels = xc_map_foreign_pages(xen_xc, xenfb->c.xendev.dom,
+   xenfb->pixels = xc_map_foreign_pages(xc_handle, xenfb->c.xendev.dom,
       PROT_READ | PROT_WRITE, fbmfns, xenfb->fbpages);
     if (xenfb->pixels == NULL)
  goto out;
diff --git a/xen-all.c b/xen-all.c
index b73fc43..04dfb51 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -527,12 +534,22 @@ int xen_init(void)
         return -1;
     }

+#ifdef CONFIG_STUBDOM
+    return 0;
+#endif
+
+    xc_handle = xen_xc_interface_open(0, 0, 0);
+    if (xc_handle == XC_HANDLER_INITIAL_VALUE) {
+        xen_be_printf(NULL, 0, "can't open xen interface\n");
+        return -1;
+    }
+
     return 0;
}

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
