Thanks for the comments! The solution I presented in my last mail has the advantage that it can reduce the event channel notification frequency almost to zero, which saves a lot of CPU cycles, especially for the HVM PV driver.

Following James's suggestion, we also have another solution that works in that style; see the attachment. It only modifies netback and keeps netfront unchanged. The patch is based on PV-ops Dom0, so the hrtimer is accurate. We set a timer in netback: when the timer elapses, or when there are RING_SIZE/2 data slots in the ring, netback notifies netfront (of course, we could modify the 'event' parameter instead of explicitly checking the number of slots in the ring). The patch contains auto-adjustment logic for each netfront's event channel frequency, based on the packet rate and size within a timer period. The user can also assign a specific timer frequency to a particular netfront through the standard coalesce interface.

If the event notification frequency is set to 1000 Hz, this solution also brings a large decrease in CPU utilization, similar to the previous test results. Here are the detailed results for the two solutions. I think the two solutions can coexist, and we can use a macro to select which one is used by default.
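Before the numbers, here is a minimal sketch (not the attached patch) of the backend idea described above: netback batches notifications per vif and kicks netfront either when a timer fires or when roughly RING_SIZE/2 responses are pending. All structure and field names here (netbk_coalesce, pending, interval_ns, the xen_netif member) are illustrative assumptions rather than the actual code in the attachment.

```c
/*
 * Illustrative sketch only -- NOT the attached patch.  Shows the
 * "notify when the timer fires or when ~half the ring is filled" idea.
 * Structure and field names are hypothetical.
 */
#include <linux/hrtimer.h>
#include <linux/ktime.h>

#define NOTIFY_THRESHOLD (NET_RX_RING_SIZE / 2)   /* ~half the RX ring */

struct netbk_coalesce {
	struct hrtimer timer;        /* per-netfront notification timer      */
	unsigned long interval_ns;   /* current period; auto-adjusted or set
	                              * through the coalesce interface       */
	unsigned int pending;        /* responses queued but not yet notified */
	struct xen_netif *netif;     /* owning vif                            */
};

/* Called after netback has placed a response on the RX ring. */
static void netbk_rx_response_queued(struct netbk_coalesce *c)
{
	if (++c->pending >= NOTIFY_THRESHOLD) {
		notify_remote_via_irq(c->netif->irq);  /* kick netfront now */
		c->pending = 0;
	}
	/* otherwise leave it to the timer to batch the notification */
}

/* Timer handler: flush whatever is pending and re-arm. */
static enum hrtimer_restart netbk_coalesce_timer_fn(struct hrtimer *t)
{
	struct netbk_coalesce *c = container_of(t, struct netbk_coalesce, timer);

	if (c->pending) {
		notify_remote_via_irq(c->netif->irq);
		c->pending = 0;
	}
	hrtimer_forward_now(t, ns_to_ktime(c->interval_ns));
	return HRTIMER_RESTART;
}
```

The per-vif interval_ns above is where the auto-adjustment (from the observed packet rate and size per period) or a user-supplied value from the standard coalesce interface would take effect; for example, something like `ethtool -C vifX.Y rx-usecs N` in Dom0, assuming the patch wires up the coalesce ops (the exact command is illustrative).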
Here, "w/ FE patch" means applying the first solution's patch (frontend side) attached in my last mail, and "w/ BE patch" means applying the second solution's patch (backend side) attached in this mail.
VM receive results:
  
  
UDP Receive (Single Guest VM)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
50                  | w/o patch   | 83.25             | 100.00%       | 26.10%
50                  | w/ FE patch | 79.56             | 100.00%       | 23.80%
50                  | w/ BE patch | 72.43             | 100.00%       | 21.90%
1472                | w/o patch   | 950.30            | 44.80%        | 22.40%
1472                | w/ FE patch | 949.32            | 46.00%        | 17.90%
1472                | w/ BE patch | 951.57            | 51.10%        | 18.50%
1500                | w/o patch   | 915.84            | 84.70%        | 42.40%
1500                | w/ FE patch | 908.94            | 88.30%        | 28.70%
1500                | w/ BE patch | 904.00            | 88.90%        | 27.30%

TCP Receive (Single Guest VM)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
50                  | w/o patch   | 506.57            | 43.30%        | 70.30%
50                  | w/ FE patch | 521.52            | 34.50%        | 57.70%
50                  | w/ BE patch | 512.78            | 38.50%        | 54.40%
1472                | w/o patch   | 926.19            | 69.00%        | 32.90%
1472                | w/ FE patch | 928.23            | 63.00%        | 24.40%
1472                | w/ BE patch | 928.59            | 67.50%        | 24.80%
1500                | w/o patch   | 935.12            | 68.60%        | 33.70%
1500                | w/ FE patch | 926.11            | 63.80%        | 24.80%
1500                | w/ BE patch | 927.00            | 68.80%        | 24.60%
  
UDP Receive (Three Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 963.43            | 50.70%        | 41.10%
1472                | w/ FE patch | 964.47            | 51.00%        | 25.00%
1472                | w/ BE patch | 963.07            | 55.60%        | 27.80%
1500                | w/o patch   | 859.96            | 99.50%        | 73.40%
1500                | w/ FE patch | 861.19            | 97.40%        | 39.90%
1500                | w/ BE patch | 860.92            | 98.90%        | 40.00%

TCP Receive (Three Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 939.68            | 78.40%        | 64.00%
1472                | w/ FE patch | 926.04            | 65.90%        | 31.80%
1472                | w/ BE patch | 930.61            | 71.60%        | 34.80%
1500                | w/o patch   | 933.00            | 78.10%        | 63.30%
1500                | w/ FE patch | 927.14            | 66.90%        | 31.90%
1500                | w/ BE patch | 930.76            | 71.10%        | 34.80%
  
UDP Receive (Six Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 978.85            | 56.90%        | 59.20%
1472                | w/ FE patch | 975.05            | 53.80%        | 33.50%
1472                | w/ BE patch | 974.71            | 59.50%        | 40.00%
1500                | w/o patch   | 886.92            | 100.00%       | 87.20%
1500                | w/ FE patch | 902.02            | 96.90%        | 46.00%
1500                | w/ BE patch | 894.57            | 98.90%        | 49.60%

TCP Receive (Six Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 962.04            | 90.30%        | 104.00%
1472                | w/ FE patch | 958.94            | 69.40%        | 43.70%
1472                | w/ BE patch | 958.08            | 68.30%        | 48.00%
1500                | w/o patch   | 960.35            | 90.10%        | 103.70%
1500                | w/ FE patch | 957.75            | 68.70%        | 42.80%
1500                | w/ BE patch | 956.42            | 68.20%        | 48.50%
  
UDP Receive (Nine Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 987.91            | 60.50%        | 70.00%
1472                | w/ FE patch | 988.30            | 56.60%        | 42.70%
1472                | w/ BE patch | 986.58            | 61.80%        | 50.00%
1500                | w/o patch   | 953.48            | 100.00%       | 93.80%
1500                | w/ FE patch | 904.17            | 98.60%        | 53.50%
1500                | w/ BE patch | 905.52            | 100.00%       | 56.80%

TCP Receive (Nine Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 974.89            | 90.00%        | 110.60%
1472                | w/ FE patch | 980.03            | 73.70%        | 55.40%
1472                | w/ BE patch | 968.29            | 72.30%        | 60.20%
1500                | w/o patch   | 971.34            | 89.80%        | 109.60%
1500                | w/ FE patch | 973.63            | 73.90%        | 54.70%
1500                | w/ BE patch | 971.08            | 72.30%        | 61.00%
  
VM send results:
UDP Send (Single Guest VM)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 949.84            | 56.50%        | 21.70%
1472                | w/ FE patch | 946.25            | 51.20%        | 20.10%
1472                | w/ BE patch | 948.73            | 51.60%        | 19.70%
1500                | w/o patch   | 912.46            | 87.00%        | 26.60%
1500                | w/ FE patch | 899.29            | 86.70%        | 26.20%
1500                | w/ BE patch | 909.31            | 86.90%        | 25.90%

TCP Send (Single Guest VM)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 932.16            | 71.50%        | 35.60%
1472                | w/ FE patch | 932.09            | 66.90%        | 29.50%
1472                | w/ BE patch | 932.54            | 66.20%        | 25.30%
1500                | w/o patch   | 929.91            | 72.60%        | 35.90%
1500                | w/ FE patch | 931.63            | 66.70%        | 29.50%
1500                | w/ BE patch | 932.83            | 66.20%        | 26.20%
  
UDP Send (Three Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 972.66            | 57.60%        | 24.00%
1472                | w/ FE patch | 970.07            | 56.30%        | 23.30%
1472                | w/ BE patch | 971.05            | 59.10%        | 23.10%
1500                | w/o patch   | 943.87            | 93.50%        | 32.50%
1500                | w/ FE patch | 933.61            | 93.90%        | 30.00%
1500                | w/ BE patch | 937.08            | 95.10%        | 31.00%

TCP Send (Three Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 955.92            | 70.40%        | 36.10%
1472                | w/ FE patch | 946.39            | 72.90%        | 32.90%
1472                | w/ BE patch | 949.80            | 70.30%        | 33.20%
1500                | w/o patch   | 966.06            | 73.00%        | 38.00%
1500                | w/ FE patch | 947.23            | 72.50%        | 33.60%
1500                | w/ BE patch | 948.74            | 72.20%        | 34.50%

Best Regards,
-- Dongxiao
  
  
  
-----Original Message-----
From: James Harper [mailto:james.harper@xxxxxxxxxxxxxxxx]
Sent: Thursday, September 10, 2009 4:03 PM
To: Xu, Dongxiao; xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-devel][PATCH][RFC] Using data polling mechanism in netfront to replace event notification between netback and netfront
> Hi,
>       This is a VNIF optimization patch, need for your comments. Thanks!
>
> [Background]:
>       One of the VNIF driver's scalability issues is the high event channel
> frequency. It's highly related to physical NIC's interrupt frequency in dom0,
> which could be 20K HZ in some situation. The high frequency event channel
> notification makes the guest and dom0 CPU utilization at a high value.
> Especially for HVM PV driver, it brings high rate of interrupts, which could
> cost a lot of CPU cycle.
>       The attached patches have two parts: one part is for netback, and the
> other is for netfront. The netback part is based on the latest PV-Ops Dom0,
> and the netfront part is based on the 2.6.18 HVM unmodified driver.
>       This patch uses a timer in netfront to poll the ring instead of event
> channel notification. If guest is transferring data, the timer will start
> working and periodicaly send/receive data from ring. If guest is idle and no
> data is transferring, the timer will stop working automatically. It will
> restart again once there is new data transferring.
>       We set a feature flag in xenstore to indicate whether the
> netfront/netback support this feature. If there is only one side supporting
> it, the communication mechanism will fall back to default, and the new feature
> will not be used. The feature is enabled only when both sides have the flag
> set in xenstore.
>       One problem is the timer polling frequency. This netfront part patch is
> based on 2.6.18 HVM unmodified driver, and in that kernel version, guest
> hrtimer is not accuracy, so I use the classical timer. The polling frequency
> is 1KHz. If rebase the netfront part patch to latest pv-ops, we could use
> hrtimer instead.
>
I experimented with this in Windows too, but the timer resolution is too poor. I think you should also look at setting the 'event' parameter too. The current driver tells the backend to tell it as soon as there is a single packet ready to be notified (np->rx.sring->rsp_event = np->rx.rsp_cons + 1), but you could set it to a higher number and also use the timer, eg "tell me when there are 32 ring slots filled, or when the timer elapses". That way you should have less problems with overflows.

Also, I don't think you need to tell the backend to stop notifying you, just don't set the 'event' field in the frontend and then RING_PUSH_RESPONSES_AND_CHECK_NOTIFY in the backend will not return that a notification is required.

James
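For reference, a minimal sketch of the 'event' threshold idea James describes above, using the standard shared-ring fields from the unmodified netfront; the threshold of 32, the helper names, and the timer wiring are assumptions for illustration, not part of either attached patch.

```c
/*
 * Sketch of the suggestion above: ask the backend to send an event only
 * after 32 more responses are queued, and rely on a local timer/poll to
 * pick up anything that arrives in between.  Illustrative only; the
 * threshold and helper names are assumptions.
 */
static void netfront_rearm_rx_event(struct netfront_info *np)
{
	/* default behaviour: notify on the very next response          */
	/* np->rx.sring->rsp_event = np->rx.rsp_cons + 1;               */

	/* batched behaviour: "tell me when 32 ring slots are filled"   */
	np->rx.sring->rsp_event = np->rx.rsp_cons + 32;
}

/* Timer (or polling) path: consume whatever is already on the ring,
 * whether or not an event was delivered, then re-arm the threshold. */
static void netfront_poll_rx(struct netfront_info *np)
{
	RING_IDX cons, prod;

	prod = np->rx.sring->rsp_prod;
	rmb();   /* make sure we see responses up to prod */

	for (cons = np->rx.rsp_cons; cons != prod; cons++) {
		/* ... handle RING_GET_RESPONSE(&np->rx, cons) ... */
	}
	np->rx.rsp_cons = cons;

	netfront_rearm_rx_event(np);
}
```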
 
  
 
netbk_lowdown_evtchn_freq.patch 
Description: netbk_lowdown_evtchn_freq.patch 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
 