author	faith <faith>	2002-09-25 23:33:32 +0000
committer	faith <faith>	2002-09-25 23:33:32 +0000
commit1283d8627de251933c7d28a9fb15385e59eaf87f (patch)
tree6aecaced3b0282ed76e4b3ac2e28373a60729c57
parentf8d17e57cd60822e299e1e849bffcd75612da585 (diff)
Update OProfile results section
-rw-r--r--	xc/programs/Xserver/hw/dmx/doc/dmx.sgml	96
1 file changed, 67 insertions, 29 deletions
diff --git a/xc/programs/Xserver/hw/dmx/doc/dmx.sgml b/xc/programs/Xserver/hw/dmx/doc/dmx.sgml
index 70ab04300..9590b99b3 100644
--- a/xc/programs/Xserver/hw/dmx/doc/dmx.sgml
+++ b/xc/programs/Xserver/hw/dmx/doc/dmx.sgml
@@ -1492,7 +1492,7 @@ server specifically needs to make a call to guarantee interactivity.
With this new system, X11 buffers protocol as much as possible during a
100 ms interval, and many unnecessary XSync() calls are avoided.
-<p>Out of more than 300 x11perf tests, 8 tests became more than 100
+<p>Out of more than 300 <tt/x11perf/ tests, 8 tests became more than 100
times faster, with 68 more than 50X faster, 114 more than 10X faster,
and 181 more than 2X faster. See table below for summary.
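The batching policy described above can be sketched as a small simulation. This is an illustration of the idea only, not the Xdmx implementation: the back end is a stub whose `xsync()` method merely counts forced round trips, standing in for the real XSync() call.

```python
SYNC_INTERVAL_MS = 100  # the 100 ms batching window described in the text

class BackEnd:
    """Stub back-end server: counts forced round trips (stand-in for XSync())."""
    def __init__(self):
        self.syncs = 0

    def xsync(self):
        self.syncs += 1

def replay(events, batching):
    """Replay (timestamp_ms, needs_interactivity) events against one back end.

    Without batching, every request forces a sync.  With batching, protocol
    is buffered and a sync happens at most once per interval, unless a
    request specifically needs interactivity.
    """
    be = BackEnd()
    last_sync = -SYNC_INTERVAL_MS
    for t, interactive in events:
        if not batching or interactive or t - last_sync >= SYNC_INTERVAL_MS:
            be.xsync()
            last_sync = t
    return be.syncs

# 1000 buffered drawing requests, 1 ms apart, none needing interactivity
events = [(t, False) for t in range(1000)]
naive = replay(events, batching=False)    # one sync per request
batched = replay(events, batching=True)   # at most one sync per 100 ms window
print(naive, batched)  # 1000 10
```

The two counts make the saving concrete: the batched policy issues two orders of magnitude fewer round trips for the same request stream, which is the mechanism behind the x11perf gains summarized below.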
@@ -1517,7 +1517,7 @@ XSync() calls. The performance tests were run on a DMX system with only
two back-end servers. Greater performance gains will be had as the
number of back-end servers increases.
-<p>Out of more than 300 x11perf tests, 3 tests were at least twice as
+<p>Out of more than 300 <tt/x11perf/ tests, 3 tests were at least twice as
fast, and 146 tests were at least 10% faster. Two tests were more than
10% slower with the offscreen optimization:
<verb>
@@ -1567,8 +1567,8 @@ resized, which is common in many window managers.
servers. Greater performance gains will be had as the number of
back-end servers increases.
-<p>This optimization improved the following x11perf tests by more than
-10%:
+<p>This optimization improved the following <tt/x11perf/ tests by more
+than 10%:
<verb>
1.10 500x500 rectangle outline
1.12 Fill 100x100 stippled trapezoid (161x145 stipple)
@@ -1603,8 +1603,8 @@ this optimization was rejected for the other rendering primitives.
back-end servers. Greater performance gains will be had as the number
of back-end servers increases.
-<p>This optimization improved the following x11perf tests by more than
-10%:
+<p>This optimization improved the following <tt/x11perf/ tests by more
+than 10%:
<verb>
1.12 Fill 100x100 stippled trapezoid (161x145 stipple)
1.26 PutImage 10x10 square
@@ -1625,17 +1625,17 @@ optimization:
<sect2>Summary of x11perf Data
-<p>With all of the optimizations on, 53 x11perf tests are more than 100X
-faster than the unoptimized Phase II deliverable, with 69 more than 50X
-faster, 73 more than 10X faster, and 199 more than twice as fast. No
-tests were more than 10% slower than the unoptimized Phase II
+<p>With all of the optimizations on, 53 <tt/x11perf/ tests are more than
+100X faster than the unoptimized Phase II deliverable, with 69 more than
+50X faster, 73 more than 10X faster, and 199 more than twice as fast.
+No tests were more than 10% slower than the unoptimized Phase II
deliverable. (Compared with the Phase I deliverable, only Circulate
Unmapped window (100 kids) was more than 10% slower than the Phase II
deliverable. As noted above, this test seems to have wider variability
-than other x11perf tests.)
+than other <tt/x11perf/ tests.)
-<p>The following table summarizes relative x11perf test changes for all
-optimizations individually and collectively. Note that some of the
+<p>The following table summarizes relative <tt/x11perf/ test changes for
+all optimizations individually and collectively. Note that some of the
optimizations have a synergistic effect when used together.
<verb>
@@ -1984,22 +1984,60 @@ that is similar to that provided by <tt/gprof/, but without the
necessity of recompiling the program with special instrumentation (i.e.,
OProfile can collect statistical profiling information about optimized
programs). A test harness was developed to collect OProfile data for
-each x11perf test. The results were examined by hand and were found to
-correlate well with gprof data. However, they failed to reveal any
-information that was helpful for optimization of Xdmx.
-
-The OProfile results for x11perf tests showed drawing, text, copying,
-and image tests to be dominated (> 30%) by calls to Hash(),
-SecurityLookupIDByClass(), SecurityLookupIDByType(), and
-StandardReadRequestFromClient(). Some of these tests also spent
-significant time in WaitForSomething(). In contrast, the window tests
-spent significant time in SecurityLookupIDByType(), Hash(),
-StandardReadRequestFromClient(), but also spent significant time in
-other routines, such as ConfigureWindow(). Some time was spent looking
-at Hash() and the LookupID functions, but optimizations in these
-routines do not lead to a dramatic increase in <tt/x11perf/ performance.
-Since these routines are in the dix layer and are not specific to DMX,
-work based on the OProfile results has been deferred.
+each <tt/x11perf/ test.
+
+<p>Test runs were performed using the RETIRED_INSNS counter on the AMD
+Athlon and the CPU_CLK_HALTED counter on the Intel Pentium III (with a
+test configuration different from the one described above). We are
+continuing to examine OProfile output and to compare it with <tt/gprof/
+output. This investigation is ongoing and has not yet produced results
+that yield performance increases in <tt/x11perf/ numbers. However, we
+will continue this investigation and provide additional information as
+necessary.
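The per-function analysis behind these counter runs amounts to converting raw sample counts into percentage shares and flagging the routines above some threshold. A minimal sketch follows; the function names are taken from the report's discussion, but the sample counts are invented purely for illustration:

```python
def shares(samples):
    """Convert {function: sample_count} into {function: percent of total}."""
    total = sum(samples.values())
    return {fn: 100.0 * n / total for fn, n in samples.items()}

def dominant(samples, threshold_pct=10.0):
    """Routines at or above the threshold, largest share first."""
    pct = shares(samples)
    return sorted((fn for fn in pct if pct[fn] >= threshold_pct),
                  key=lambda fn: -pct[fn])

# Invented counts (per 10000 samples); only the names come from the report.
counts = {
    "StandardReadRequestFromClient": 1200,
    "SecurityLookupIDByClass": 900,
    "WaitForSomething": 800,
    "Dispatch": 700,
    "Hash": 450,
    "other": 5950,
}
top = [fn for fn in dominant(counts) if fn != "other"]
print(top)  # ['StandardReadRequestFromClient']
```

With these made-up numbers, only StandardReadRequestFromClient crosses the 10% line while Hash() stays under 5%, mirroring the shape of the CPU_CLK_HALTED observations discussed below.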
+
+%<sect3>Retired Instructions
+
+%<p>The initial tests using OProfile were done using the RETIRED_INSNS
+%counter with DMX running on the dual-processor AMD Athlon machine -- the
+%same test configuration that was described above and that was used for
+%other tests. The RETIRED_INSNS counter counts retired instructions and
+%showed drawing, text, copying, and image tests to be dominated (&gt;
+%30%) by calls to Hash(), SecurityLookupIDByClass(),
+%SecurityLookupIDByType(), and StandardReadRequestFromClient(). Some of
+%these tests also executed significant instructions in
+%WaitForSomething().
+
+%<p>In contrast, the window tests executed significant
+%instructions in SecurityLookupIDByType(), Hash(), and
+%StandardReadRequestFromClient(), but also executed significant
+%instructions in other routines, such as ConfigureWindow(). Some time
+%was spent looking at the Hash() function, but optimizations in this routine
+%did not lead to a dramatic increase in <tt/x11perf/ performance.
+
+%<sect3>Clock Cycles
+
+%<p>Retired instructions can be misleading because Intel/AMD instructions
+%execute in variable amounts of time. The OProfile tests were repeated
+%using the Intel CPU_CLK_HALTED counter with DMX running on the second
+%back-end machine. Note that this is a different test configuration than
+%the one described above. However, these tests show the amount of time
+%(as measured in CPU cycles) that is spent in each routine. Because
+%<tt/x11perf/ was running on the first back-end machine and because
+%window optimizations were on, the load on the second back-end machine
+%was not significant.
+
+%<p>Using CPU_CLK_HALTED, DMX showed simple drawing
+%tests spending more than 10% of their time in
+%StandardReadRequestFromClient(), with significant time (&gt; 20% total)
+%spent in SecurityLookupIDByClass(), WaitForSomething(), and Dispatch().
+%For these tests, &lt; 5% of the time was spent in Hash(), which explains
+%why optimizing the Hash() routine did not impact <tt/x11perf/ results.
+
+%<p>The trapezoid, text, scrolling, copying, and image tests were
+%dominated by time in ProcFillPoly(), PanoramiXFillPoly(), dmxFillPolygon(),
+%SecurityLookupIDByClass(), SecurityLookupIDByType(), and
+%StandardReadRequestFromClient(). Hash() time was generally above 5% but
+%less than 10% of total time.
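The caveat noted above, that retired-instruction counts can be misleading because instructions execute in variable amounts of time, can be illustrated with a toy comparison. The numbers here are invented: a routine that retires many cheap instructions can rank above one that retires few instructions which each stall, yet the cycle counter reverses that ranking.

```python
def ranking(counter):
    """Function names ordered by descending sample count."""
    return sorted(counter, key=lambda fn: -counter[fn])

# Invented numbers: fast_path retires many cheap instructions, while
# slow_path retires few instructions that each stall (e.g. on cache misses).
retired_insns = {"fast_path": 9000, "slow_path": 1000}
clock_cycles  = {"fast_path": 9000, "slow_path": 20000}

print(ranking(retired_insns))  # ['fast_path', 'slow_path']
print(ranking(clock_cycles))   # ['slow_path', 'fast_path']
```

This is why the RETIRED_INSNS and CPU_CLK_HALTED runs are examined separately: only the cycle-based ranking reflects where wall-clock time actually goes.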
<sect2>X Test Suite