author     faith <faith>    2002-09-25 23:33:32 +0000
committer  faith <faith>    2002-09-25 23:33:32 +0000
commit     1283d8627de251933c7d28a9fb15385e59eaf87f (patch)
tree       6aecaced3b0282ed76e4b3ac2e28373a60729c57
parent     f8d17e57cd60822e299e1e849bffcd75612da585 (diff)
Update OProfile results section
-rw-r--r--   xc/programs/Xserver/hw/dmx/doc/dmx.sgml | 96
1 file changed, 67 insertions(+), 29 deletions(-)
diff --git a/xc/programs/Xserver/hw/dmx/doc/dmx.sgml b/xc/programs/Xserver/hw/dmx/doc/dmx.sgml
index 70ab04300..9590b99b3 100644
--- a/xc/programs/Xserver/hw/dmx/doc/dmx.sgml
+++ b/xc/programs/Xserver/hw/dmx/doc/dmx.sgml
@@ -1492,7 +1492,7 @@
 server specifically needs to make a call to guarantee interactivity.
 With this new system, X11 buffers protocol as much as possible during a
 100mS interval, and many unnecessary XSync() calls are avoided.
-<p>Out of more than 300 x11perf tests, 8 tests became more than 100
+<p>Out of more than 300 <tt/x11perf/ tests, 8 tests became more than 100
 times faster, with 68 more than 50X faster, 114 more than 10X faster,
 and 181 more than 2X faster.  See table below for summary.
@@ -1517,7 +1517,7 @@ XSync() calls.
 The performance tests were run on a DMX system with only
 two back-end servers.  Greater performance gains will be had as the
 number of back-end servers increases.
-<p>Out of more than 300 x11perf tests, 3 tests were at least twice as
+<p>Out of more than 300 <tt/x11perf/ tests, 3 tests were at least twice as
 fast, and 146 tests were at least 10% faster.  Two tests were more than
 10% slower with the offscreen optimization:
 <verb>
@@ -1567,8 +1567,8 @@ resized, which is common in many window managers.
 servers.  Greater performance gains will be had as the number of
 back-end servers increases.
 
-<p>This optimization improved the following x11perf tests by more than
-10%:
+<p>This optimization improved the following <tt/x11perf/ tests by more
+than 10%:
 <verb>
  1.10  500x500 rectangle outline
  1.12  Fill 100x100 stippled trapezoid (161x145 stipple)
@@ -1603,8 +1603,8 @@ this optimization was rejected for the other rendering primitives.
 back-end servers.  Greater performance gains will be had as the number
 of back-end servers increases.
 
-<p>This optimization improved the following x11perf tests by more than
-10%:
+<p>This optimization improved the following <tt/x11perf/ tests by more
+than 10%:
 <verb>
  1.12  Fill 100x100 stippled trapezoid (161x145 stipple)
  1.26  PutImage 10x10 square
@@ -1625,17 +1625,17 @@ optimization:
 
 <sect2>Summary of x11perf Data
 
-<p>With all of the optimizations on, 53 x11perf tests are more than 100X
-faster than the unoptimized Phase II deliverable, with 69 more than 50X
-faster, 73 more than 10X faster, and 199 more than twice as fast.  No
-tests were more than 10% slower than the unoptimized Phase II
+<p>With all of the optimizations on, 53 <tt/x11perf/ tests are more than
+100X faster than the unoptimized Phase II deliverable, with 69 more than
+50X faster, 73 more than 10X faster, and 199 more than twice as fast.
+No tests were more than 10% slower than the unoptimized Phase II
 deliverable.  (Compared with the Phase I deliverable, only Circulate
 Unmapped window (100 kids) was more than 10% slower than the Phase II
 deliverable.  As noted above, this test seems to have wider variability
-than other x11perf tests.)
+than other <tt/x11perf/ tests.)
 
-<p>The following table summarizes relative x11perf test changes for all
-optimizations individually and collectively.  Note that some of the
+<p>The following table summarizes relative <tt/x11perf/ test changes for
+all optimizations individually and collectively.  Note that some of the
 optimizations have a synergistic effect when used together.
 
 <verb>
@@ -1984,22 +1984,60 @@
 that is similar to that provided by <tt/gprof/, but without the
 necessity of recompiling the program with special instrumentation (i.e.,
 OProfile can collect statistical profiling information about optimized
 programs).  A test harness was developed to collect OProfile data for
-each x11perf test.  The results were examined by hand and were found to
-correlate well with gprof data.  However, they failed to reveal any
-information that was helpful for optimization of Xdmx.
-
-The OProfile results for x11perf tests showed drawing, text, copying,
-and image tests to be dominated (> 30%) by calls to Hash(),
-SecurityLookupIDByClass(), SecurityLookupIDByType(), and
-StandardReadRequestFromClient().  Some of these tests also spent
-significant time in WaitForSomething().  In contrast, the window tests
-spent significant time in SecurityLookupIDByType(), Hash(),
-StandardReadRequestFromClient(), but also spent significant time in
-other routines, such as ConfigureWindow().  Some time was spent looking
-at Hash() and the LookupID functions, but optimizations in these
-routines do not lead to a dramatic increase in <tt/x11perf/ performance.
-Since these routines are in the dix layer and are not specific to DMX,
-work based on the OProfile results has been deferred.
+each <tt/x11perf/ test.
+
+<p>Test runs were performed using the RETIRED_INSNS counter on the AMD
+Athlon and the CPU_CLK_HALTED counter on the Intel Pentium III (with a
+test configuration different from the one described above).  We are
+continuing to examine OProfile output and to compare it with <tt/gprof/
+output.  This investigation is ongoing and has not yet produced results
+that yield performance increases in <tt/x11perf/ numbers.  However, we
+will continue this investigation and provide additional information as
+necessary.
+
+%<sect3>Retired Instructions
+
+%<p>The initial tests using OProfile were done using the RETIRED_INSNS
+%counter with DMX running on the dual-processor AMD Athlon machine -- the
+%same test configuration that was described above and that was used for
+%other tests.  The RETIRED_INSNS counter counts retired instructions and
+%showed drawing, text, copying, and image tests to be dominated (>
+%30%) by calls to Hash(), SecurityLookupIDByClass(),
+%SecurityLookupIDByType(), and StandardReadRequestFromClient().  Some of
+%these tests also executed significant instructions in
+%WaitForSomething().
+
+%<p>In contrast, the window tests executed significant
+%instructions in SecurityLookupIDByType(), Hash(),
+%StandardReadRequestFromClient(), but also executed significant
+%instructions in other routines, such as ConfigureWindow().  Some time
+%was spent looking at the Hash() function, but optimizations in this
+%routine did not lead to a dramatic increase in <tt/x11perf/ performance.
+
+%<sect3>Clock Cycles
+
+%<p>Retired instructions can be misleading because Intel/AMD instructions
+%execute in variable amounts of time.  The OProfile tests were repeated
+%using the Intel CPU_CLK_HALTED counter with DMX running on the second
+%back-end machine.  Note that this is a different test configuration than
+%the one described above.  However, these tests show the amount of time
+%(as measured in CPU cycles) that are spent in each routine.  Because
+%<tt/x11perf/ was running on the first back-end machine and because
+%window optimizations were on, the load on the second back-end machine
+%was not significant.
+
+%<p>Using CPU_CLK_HALTED, DMX showed simple drawing
+%tests spending more than 10% of their time in
+%StandardReadRequestFromClient(), with significant time (> 20% total)
+%spent in SecurityLookupIDByClass(), WaitForSomething(), and Dispatch().
+%For these tests, < 5% of the time was spent in Hash(), which explains
+%why optimizing the Hash() routine did not impact <tt/x11perf/ results.
+
+%<p>The trapezoid, text, scrolling, copying, and image tests were
+%dominated by time in ProcFillPoly(), PanoramiXFillPoly(), dmxFillPolygon(),
+%SecurityLookupIDByClass(), SecurityLookupIDByType(), and
+%StandardReadRequestFromClient().  Hash() time was generally above 5% but
+%less than 10% of total time.
 
 <sect2>X Test Suite
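The text added by this commit describes collecting OProfile data around individual x11perf tests. A run of that kind might be driven as in the sketch below. This is illustrative only, not the actual DMX test harness: the chosen test name and the event sample count are assumptions, and only the standard `opcontrol`/`opreport` command-line tools and the `x11perf` client are referenced. By default the script runs dry, printing the commands it would issue, so it can be read on a machine without OProfile installed.

```shell
#!/bin/sh
# Sketch: profile one x11perf test with OProfile (hypothetical harness).
# DRY_RUN=1 (the default) only echoes each command; set DRY_RUN=0 to
# actually execute them (requires root and OProfile kernel support).

DRY_RUN=${DRY_RUN:-1}

# run CMD...: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

TEST=${1:-rect500}               # x11perf test name (assumed example)
EVENT=${EVENT:-RETIRED_INSNS}    # AMD Athlon; CPU_CLK_HALTED on the P3

run opcontrol --reset                       # clear previous samples
run opcontrol --event="$EVENT:3000" --start # start sampling on EVENT
run x11perf -repeat 1 "-$TEST"              # run the test under profile
run opcontrol --stop                        # stop the daemon
run opreport --symbols --threshold 1        # per-symbol breakdown
```

Setting `DRY_RUN=0` executes the commands for real; the `opreport --symbols` output is what would show the per-routine percentages (Hash(), SecurityLookupIDByClass(), and so on) discussed in the commented-out sections above.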