On Thursday 28 July 2005 18:53, Brandon Williams wrote:
> I am having a problem running a rather large bind installation (about
> 250k domains) inside a domU machine. Bind itself takes about 600MB of
> memory when fully loaded, and I have 840MB allocated to the domU
> machine, but bind often reports that there is no more memory for zones
> when the memory is nowhere near exhausted.
>
> I originally tried using xen 2.0.5 with the 2.6.11 branch of the linux
> kernel for both dom0 and domU, and in this configuration bind would
> think it was out of memory when it had used about 200MB every time.
>
> I tried using a 2.4.30 domU kernel and this worked better to some
> degree, but still bind thinks it's out of memory at around 400MB
> fairly often.
>
> I upgraded to xen 2.0.6 when it was released, but it has been no help.
>
> I started an strace on bind and found that a malloc() call which
> should have succeeded was failing, and wrote a test program in C that
> mimicked the bind behavior as much as possible, looping malloc() many
> times, and this succeeded so I'm at a loss.
>
> I was going to try switching dom0 to 2.4, but this isn't possible
> since I'm using LVM2 due to awful loopback performance.
>
> Outside of xen, the same configurations work perfectly (I'm using xen
> to set up a staging environment for these servers), so I know
> something's going on here but I can't seem to pinpoint it.
>
> Is anyone else seeing problems like this? I'm using a dell 2650 with
> 4GB of memory, and I've tried it on multiple machines all with the
> same behavior.

I've seen similar problems on a xenU that ran just fine with the same
kernel/config/filesystem on a different host.
The strange thing was that the main applications ran quite well despite
allocating and deallocating large chunks of memory regularly (a database
server plus an app doing a lot of on-the-fly image manipulation), while
simple system commands like mv or cp sometimes failed with out-of-memory
errors. The domain had lots of free RAM and swap space, though.

Upgrading gcc to 3.4.4 and glibc to 2.3.5 (built with NPTL/TLS and
-mno-tls-direct-seg-refs) solved the problem for me and, as a side effect,
provided a large performance boost (due to NPTL and the main server app
using lots of threads).

The host system with the failures is a dual Xeon with Hyper-Threading. It
has EM64T, but I'm not using it. 4GB of RAM is installed, of which
3.something is available (due to missing PAE support in xen 2.0.6).
The host where the domU ran smoothly before the move was a dual Xeon, also
with Hyper-Threading, but without the EM64T extensions and with only 2GB of
RAM.

The only remaining problem on the new host is FPU exceptions appearing out
of the blue while compressing/decompressing JPEGs. It looks to me like xen
is failing to save/restore all FPU registers on a context switch, but that's
a different story... (the workaround was to use the integer DCT in libjpeg
instead of the float one).
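
For comparing hosts, a malloc() loop of the kind described in the original
post could look roughly like the sketch below. This is not Brandon's actual
test program: the function name probe_malloc and its parameters are made up
for illustration. It only reports how many fixed-size blocks could be
obtained before malloc() first failed. Bind's real allocation pattern (many
blocks of varying size) may well behave differently, which could be one
reason a simple loop succeeds where bind does not.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from bind or from the original test program.
 *
 * Allocate up to max_chunks blocks of chunk_size bytes each and write to
 * every block so the kernel has to back it with real pages. Returns the
 * number of blocks obtained before malloc() first returned NULL, or
 * max_chunks if it never failed. All blocks are freed before returning.
 */
size_t probe_malloc(size_t chunk_size, size_t max_chunks)
{
    void **blocks = calloc(max_chunks, sizeof *blocks);
    size_t got = 0;
    size_t i;

    if (blocks == NULL)
        return 0;                    /* could not even track the blocks */

    while (got < max_chunks) {
        void *p = malloc(chunk_size);
        if (p == NULL)
            break;                   /* first failure: stop and report */
        memset(p, 0xA5, chunk_size); /* touch the memory, not just reserve it */
        blocks[got++] = p;
    }

    for (i = 0; i < got; i++)
        free(blocks[i]);
    free(blocks);
    return got;
}
```

Calling probe_malloc(1 << 20, 840) from a small main() inside the domU and
printing the result would show after roughly how many megabytes the
allocator gives up, which can then be compared against the same binary on
bare metal.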
/Ernst
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users