From 1283d8627de251933c7d28a9fb15385e59eaf87f Mon Sep 17 00:00:00 2001
From: faith
Date: Wed, 25 Sep 2002 23:33:32 +0000
Subject: Update OProfile results section

---
 xc/programs/Xserver/hw/dmx/doc/dmx.sgml | 96 +++++++++++++++++++++++----------
 1 file changed, 67 insertions(+), 29 deletions(-)

diff --git a/xc/programs/Xserver/hw/dmx/doc/dmx.sgml b/xc/programs/Xserver/hw/dmx/doc/dmx.sgml
index 70ab04300..9590b99b3 100644
--- a/xc/programs/Xserver/hw/dmx/doc/dmx.sgml
+++ b/xc/programs/Xserver/hw/dmx/doc/dmx.sgml
@@ -1492,7 +1492,7 @@ server specifically needs to make a call to guarantee interactivity.
 With this new system, X11 buffers protocol as much as possible during a
 100mS interval, and many unnecessary XSync() calls are avoided.
-

Out of more than 300 x11perf tests, 8 tests became more than 100 +

Out of more than 300 Out of more than 300 x11perf tests, 3 tests were at least twice as +

Out of more than 300 @@ -1567,8 +1567,8 @@ resized, which is common in many window managers. servers. Greater performance gains will be had as the number of back-end servers increases. -

This optimization improved the following x11perf tests by more than -10%: +

This optimization improved the following 1.10 500x500 rectangle outline 1.12 Fill 100x100 stippled trapezoid (161x145 stipple) @@ -1603,8 +1603,8 @@ this optimization was rejected for the other rendering primitives. back-end servers. Greater performance gains will be had as the number of back-end servers increases. -

This optimization improved the following x11perf tests by more than -10%: +

This optimization improved the following 1.12 Fill 100x100 stippled trapezoid (161x145 stipple) 1.26 PutImage 10x10 square @@ -1625,17 +1625,17 @@ optimization: Summary of x11perf Data -

With all of the optimizations on, 53 x11perf tests are more than 100X -faster than the unoptimized Phase II deliverable, with 69 more than 50X -faster, 73 more than 10X faster, and 199 more than twice as fast. No -tests were more than 10% slower than the unoptimized Phase II +

With all of the optimizations on, 53 The following table summarizes relative x11perf test changes for all -optimizations individually and collectively. Note that some of the +

The following table summarizes relative @@ -1984,22 +1984,60 @@ that is similar to that provided by 30%) by calls to Hash(), -SecurityLookupIDByClass(), SecurityLookupIDByType(), and -StandardReadRequestFromClient(). Some of these tests also spent -significant time in WaitForSomething(). In contrast, the window tests -spent significant time in SecurityLookupIDByType(), Hash(), -StandardReadRequestFromClient(), but also spent significant time in -other routines, such as ConfigureWindow(). Some time was spent looking -at Hash() and the LookupID functions, but optimizations in these -routines do not lead to a dramatic increase in Test runs were performed using the RETIRED_INSNS counter on the AMD +Athlon and the CPU_CLK_HALTED counter on the Intel Pentium III (with a +test configuration different from the one described above). We are +continuing to examine OProfile output and to compare it with Retired Instructions + +%

The initial tests using OProfile were done using the RETIRED_INSNS +%counter with DMX running on the dual-processor AMD Athlon machine -- the +%same test configuration that was described above and that was used for +%other tests. The RETIRED_INSNS counter counts retired instructions and +%showed drawing, text, copying, and image tests to be dominated (> +%30%) by calls to Hash(), SecurityLookupIDByClass(), +%SecurityLookupIDByType(), and StandardReadRequestFromClient(). Some of +%these tests also executed significant instructions in +%WaitForSomething(). + +%

In contrast, the window tests executed significant +%instructions in SecurityLookupIDByType(), Hash(), and +%StandardReadRequestFromClient(), but also executed significant +%instructions in other routines, such as ConfigureWindow(). Some time +%was spent looking at the Hash() function, but optimizations in this routine +%did not lead to a dramatic increase in Clock Cycles + +%

Retired instructions can be misleading because Intel/AMD instructions +%execute in variable amounts of time. The OProfile tests were repeated +%using the Intel CPU_CLK_HALTED counter with DMX running on the second +%back-end machine. Note that this is a different test configuration than +%the one described above. However, these tests show the amount of time +%(as measured in CPU cycles) that is spent in each routine. Because +%Using CPU_CLK_HALTED, DMX showed simple drawing +%tests spending more than 10% of their time in +%StandardReadRequestFromClient(), with significant time (> 20% total) +%spent in SecurityLookupIDByClass(), WaitForSomething(), and Dispatch(). +%For these tests, < 5% of the time was spent in Hash(), which explains +%why optimizing the Hash() routine did not impact The trapezoid, text, scrolling, copying, and image tests were +%dominated by time in ProcFillPoly(), PanoramiXFillPoly(), dmxFillPolygon(), +%SecurityLookupIDByClass(), SecurityLookupIDByType(), and +%StandardReadRequestFromClient(). Hash() time was generally above 5% but +%less than 10% of total time. X Test Suite -- cgit v1.2.3