This document describes XAA (the XFree86 Acceleration Architecture),
the new acceleration interface for the SVGA server (though it is not
limited to the SVGA server).

This code is not at all dependent on the SVGA server, but it does
assume linear addressing at > 8bpp. It might be extendable to an
mi-based setup for configurations that can't use cfb. There are
still configurations around that need banked support for 16bpp.

To use the new acceleration interface, write low-level functions
like those in sampledrv.c and ark_accel.c, and call the ChipInitAccel()
function before screen initialization (from FbInit in an SVGA
driver, for example).

You're welcome to comment on, test, debug, or add to this code.

Have fun...

Harm Hanemaayer
H.Hanemaayer@inter.nl.net
Here's a list of known problems (roughly in order of importance). If you
can confirm a problem using the latest version, please do so.
- I've seen crashes when using Netscape related to stipple functions.
These might be caused by the "fall-back" logic still getting it
wrong. It seems to be triggered by a call of
vga8256FillRectTransparentStippled32. Fixed by mod 186?
- The "NonTE" text acceleration triggers core dumps (related to an invalid
fall-back function scheme in ValidateGC). It might also trigger lock-ups
(which would point towards a problem in NonTE text color expansion).
These functions are currently disabled.
- The color expanded (monochrome) 8x8 pattern may not be working correctly
yet in all cases (not fully tested).
- The disabled non-terminal emulator font acceleration is suspect; I
don't think it handles horizontally overlapping characters correctly
(no visible evidence yet) in the xf86DrawNonTETextScanline function.
I don't know enough about the X font parameters to correctly
implement it.
- The SCANLINE_PAD_BYTE and SCANLINE_NO_PAD text transfer code for CPU
to screen color expansion has not been fully tested, nor has the
FIXED_BASE support.
- The pattern fill primitives are taken to have the same graphics operation
restrictions (planemask, rop etc) as ScreenToScreenCopy.
- The support for TRIPLE_BITS_24BPP has improved, but it has not yet
been fully tested.
- For the color expansion implementation of stipples, the graphics operation
restrictions of color expansion are not honoured; the CopyArea ones are
used instead. This is now sort-of fixed, but it has not been
tested in relevant cases.
- Instead of not accelerating GXinvert operations that would normally
access the source, we could instead do a GXinvert FillRectSolid.
When the server crashes, run 'gdb -c core XF86_SVGA' and print a
back-trace ('backtrace').
Change Log:
218. As well as GXinvert, also avoid GXclear, GXnoop, and GXset.
217. Don't accelerate functions that use source bitmap data (such as text,
stipples, bitmaps) when the raster-op is GXinvert.
216. Truncate pixel values to pixel depth in ValidateGC.
XFree86 3.2v
215. Rotate monochrome patterns stored in video memory in opposite
direction (David Bateman). I doubt whether this is correct.
214. Add FullPlanemask field to xf86AccelInfoRec, and use it for planemask
checks.
213. Use GC alu instead of cfb reduced "rrop" when checking raster-op
restrictions.
212. Add secondary restriction flag hack for stippled rectangles to
correctly handle different restrictions for pixmap cache and color
expansion stipple acceleration.
211. Move macros for graphics operations restriction checks from
xf86gcmisc.c to xf86local.h.
210. Fix the check for server resets in xf86initac.c and xf86scrin.c (use
serverGeneration instead of xf86Resetting).
XFree86 3.2u
209. Fix monochrome pattern stored in video memory with PROGRAMMED_ORIGIN
and SCREEN_ORIGIN (Corin Anderson).
XFree86 3.2s
208. Respect CapStyle when using TwoPointLine for a non-clipped segment.
207. Fix line clipping when hardware clipping is used with multiple
clipping regions (Xavier Ducoin).
206. Add a hack to counter cfb cheating in PolyGlyphBlt when it does not
call ValidateGC when changing the foreground color to fill in the
background (affecting RGB_EQUAL).
205. Remove left-over fall-back tile function setting code in xf86gcmisc.c
that may have caused problems.
204. At ValidateGC time, take note of background color changes for
evaluation of RGB_EQUAL restrictions.
203. Add support for a monochrome pattern with PROGRAMMED_BITS that
needs to be rotated in software, so that all possible monochrome
pattern variations are now supported.
202. Add NO_TEXT_COLOR_EXPANSION flag.
201. Fix bitmap (CopyPlane1ToN) color expansion acceleration at 24bpp with
TRIPLE_BITS_24BPP defined.
200. Fix support for ScanlineScreenToScreenColorExpand with
TRIPLE_BITS_24BPP defined.
199. Fix the case of a monochrome 8x8 pattern stored in video memory.
The code was not consistently assuming that the patternx
coordinate is in units of "bits" (David Bateman).
198. Invalidate the pixmap cache when VT-switching back (suggested by
Andrew Vanderstock).
197. If color expansion is used for stipples, say so in the start-up
messages.
196. Indicate in start-up messages whether 8x8 pattern fill is actually
usable.
195. Better RGB_EQUAL checks for text acceleration with TRIPLE_BITS_24BPP
(David Bateman).
194. Do not accelerate lines with non-FillSolid fill style. Stippled lines
were rendered incorrectly as solid lines.
XFree86 3.2r
193. Check RGB_EQUAL for text acceleration with TRIPLE_BITS_24BPP
(David Bateman).
192. Honour RGB_EQUAL when deciding CopyPlane1To24 acceleration in
xf86plane.c.
191. Fix bugs in handling of left edge in CPU-to-screen color expansion
of bitmaps.
190. When the server resets, don't execute the start-up benchmarks.
189. When the server resets, don't execute the main part of the
xf86GCInfoRec and xf86AccelInfoRec initialization code, which
depends on default values for some fields.
188. When there is any kind of accelerated stippled rectangle fill, also
use it for stippled spans.
187. Add 10x10 CPU-to-screen color expansion benchmark.
186. Remove left-over broken fall-back stipple function setting code
in xf86gcmisc.c, probably fixing crashes.
185. Implement "no_pixmap_cache", and new "xaa_benchmark" and
"xaa_no_color_exp" server flags.
184. Only print detailed messages when xf86Verbose is TRUE.
183. Implement color expansion acceleration of stipple-filled rectangles
in xf86stip.c. Requires SCANLINE_PAD_DWORD for CPU-to-screen color
expansion, and does not support TRIPLE_BITS_24BPP.
182. Fix SCANLINE_NO_PAD CPU-to-screen color expansion by not defining
the flag definition as zero (Koen Gadeyne).
181. Add LEFT_EDGE_CLIPPING_NEGATIVE_X color expansion flag.
180. Fix potential bug in handling of LEFT_EDGE_CLIPPING.
179. Add new file xf86tables.c with byte expansion tables for
TRIPLE_BITS_24BPP.
178. Support TRIPLE_BITS_24BPP for bitmap color expansion, with the
exception of non-DWORD scanline padding of CPU-to-screen color
expansion.
177. Fix a probable bug in CPU-to-screen bitmap color expansion in
MSB-first mode without left edge clipping.
176. When checking for hardware pattern usage for tiles, prefer the
color expand (monochrome) pattern.
175. Improve TRIPLE_BITS_24BPP support with color expansion enabled for
text using screen-to-screen color expansion or CPU-to-screen
with SCANLINE_PAD_DWORD.
174. Add TRANSPARENCY_GXCOPY graphics operation flag, and take it into
consideration for ScreenToScreenCopy with transparency.
XFree86 3.2q
173. Fix infinite loop in CPU_TRANSFER_BASE_FIXED color expansion
(Xavier Ducoin).
172. Fix color expansion benchmark when TRIPLE_BITS_24BPP is defined
(David Bateman).
171. When using the monochrome pattern, use an existing cache entry when
the stipple is the same but the colors are different.
170. Fix bugs in pattern handling code.
169. Fix a bug in the MSB-first version of the Pentium-optimized text
bitmap transfer functions (David Bateman).
168. Fully implement detection of tiles that only use two colors in order
to use a monochrome (color-expand) hardware pattern.
167. Potentially fix the case of rotated monochrome patterns stored in
video memory.
166. Change start-up messages a little.
165. Add support for 8x8 hardware pattern with SCREEN_ORIGIN in addition
to PROGRAMMED_ORIGIN, and take into account the bit order when
PROGRAMMED_BITS is defined (Radek).
164. Mark pixmaps that are found to be unsuitable for caching.
163. Make caching of transparent stipples possible for chips that don't
have ScreenToScreenCopy with transparency but do have a color-expand
pattern fill that supports transparency.
162. Add low-level benchmarks for 10x10 pattern fill.
161. Use 64-bit access on DEC alpha in CPU-to-framebuffer bandwidth
benchmark.
160. Add the HARDWARE_PATTERN_MONO_TRANSPARENCY flag to further
differentiate between mono pattern and regular color expansion.
159. Use depth instead of bitsperpixel in planemask check for line
rectangles.
158. Reorganize the "cfbGetLongWidthAndPointer" function.
157. Support the HARDWARE_PATTERN_PROGRAMMED_ORIGIN for non-color
expanded patterns (including patterns that must be aligned on a 64
pixel boundary) and for color-expanded patterns that are stored
in video memory.
156. Various fixes for the cfb8 (non-vga256) support.
155. At start-up, display a message when no acceleration primitives
are defined.
154. Add ScratchBufferBase field to support scanline screen-to-screen
color expansion without a linear framebuffer.
153. Support multiple buffers for scanline screen-to-screen color
expansion. Adds the PingPongBuffers field.
XFree86 3.2n
152. Add HARDWARE_PATTERN_BIT_ORDER_MSBFIRST flag to differentiate between
mono hardware pattern and regular color expansion.
151. Fix missing fields in xf86defs.c.
150. Delay the actual initialization of the pixmap cache until general XAA
initialization. A server InfoRec field and pixmap cache memory boundary
fields are added to the xf86AccelInfoRec. This also eliminates the
dependency of the XAA code on vga256 (the SVGA server).
149. Lift the SCANLINE_PAD_DWORD requirement in xf86initac.c for the
enabling of text color expansion.
148. Add functions in xf86expblt.c for whole text bitmap transfer in the
case of BYTE padding or no padding at the end of scanlines, and
support this in xf86text.c.
147. Add a cfb8-based layer to support stand-alone servers not using
vga256.
146. Enable fixed-base CPU-to-screen color-expansion for bitmap and TE text.
145. Add FIXEDBASE support to color expansion functions in xf86expblt.c.
144. Add the HARDWARE_PATTERN_PROGRAMMED_BITS and HARDWARE_PATTERN_
PROGRAMMED_ORIGIN flags, and implement 8x8 mono pattern code
used when both flags are set.
143. Don't use scanline-byte-padded CPU to screen color expansion since
it doesn't work.
142. Use MSB-first versions of Pentium-optimized text transfer functions
when required.
141. Finally fix the flawed fall-back function schemes, potentially fixing
crashes associated with some span and rectangle fills. Still not stable.
140. More heavily unroll the CPU to framebuffer benchmark code (mainly
for the Cyrix 6x86).
XFree86 3.2g
139. Enable 24bpp CopyPlane acceleration (Dirk).
Version of 17 December 1996 (XFree86 3.2f)
138. Enable the Pentium-optimized text transfer functions for 6 and 8-pixel
wide fonts.
137. Fix a problem with accelerated horizontal and vertical lines clashing
with framebuffer lines.
136. In some places, fix the "NO_PLANEMASK" check to only check bits up to
the actual depth (rather than PMSK).
135. Avoid recursive xf86miFillRectStippledFallBack call.
134. When the new HARDWARE_PATTERN_MOD_64_OFFSET flag is set, do use
the hardware pattern when the framebuffer width guarantees
correct alignment.
133. Avoid unaligned accesses in xf86expblt.c for DEC Alpha. Not tested.
What's a mem_barrier? Do we need them?
132. Do not use 8x8 pattern stipple fill at 24bpp because there's no
pixmap-to-pixmap CopyPlane1ToN primitive.
131. Allow CopyPlane1ToN to be accelerated at 24bpp.
130. Remove messages printed when xf86miStippleFallBack is called.
129. Add FillRectSolid graphics operation restriction flags to the line
draw restrictions in xf86initac.c, fixing a problem with planemask
restrictions not being honoured for some lines.
128. Add some Pentium-optimized text bitmap transfer functions in
xf86txtblt.s, but they are not used yet.
XFree86 3.2e
127. ImakefileBPP renamed to Imakefile.BPP
Version 0.4f (XFree86 3.2d) (28 November 1996)
126. Fix typo in xf86frect.c, pixmap-cache re-enabled (Alan).
125. Disable Non-TE text acceleration.
124. Add text fall-back functions in xf86gcmisc.c.
123. Fix ImakefileBPP.
Version 0.4e (XFree86 3.2c) (24 November 1996)
122. Fix some problems with drawing of Non-TE text strings.
121. Fix compilation problem in xf86spans.c (Takaaki Nomura).
120. Add sanity checks in PolyFillRect and FillSpans for <= 0 specified
rects or spans.
119. Make sure the cfb fall-back text functions are initialized correctly.
This problem showed up when non-TE text acceleration was added.
118. Add the TWO_POINT_LINE_ERROR_TERM flag, but don't implement it yet.
117. Implement a stipple bitmap scanline function in xf86expblt.c for
future use in color expansion stipple acceleration.
116. Implement color expansion text acceleration for non-terminal emulator
fonts. Fix non-TE text scanline function in xf86expblt.c.
115. Add NO_SYNC_AFTER_CPU_COLOR_EXPAND flag.
114. When cfb MatchCommon is successful in ValidateGC, make sure the
devPrivate.val is still correct. Fixes memory leak.
113. Fix bug in xf86PolyFillRect. This does not fix the pixmap cache.
Version 0.4d (XFree86 3.2a) (18 November 1996)
112. Prepare for integration into source tree (hw/xfree86/xaa/*).
111. Move the declaration of xf86PixmapIndex into xf86initac.c.
110. Screeninit functions renamed; vgabpp.h renamed to xf86scrin.h.
109. Cosmetic changes in preparation for integration into source tree.
108. Rename xf86gc.h to xf86xaa.h, and modify some long filenames.
107. In ValidateGC, correctly handle the case of the drawable of an
on-screen GC being changed to a pixmap.
106. Fix a problem with byte-padded CPU to screen color expansion in
xf86bitmap.c.
105. Initialize CPUToScreenColorExpandRange to default value of 64K
if it is not defined.
104. Fix missing cfb stipple function mappings in vga256map.h.
Version 0.4c (15 November 1996)
103. Really fix the CPU-to-screen color expansion benchmark.
102. Disable the accidentally enabled debugging on-screen pixmap cache.
Version 0.4b (15 November 1996)
101. Add the UsingVGA256 flag to the xf86AccelInfoRec, and use this
to adjust the address pointer for low-level line fall-backs for
vga256 so that the non-bank checking versions will be used when
linear addressing is enabled (implemented in cfb8GetLongWidthAndPointer
in xf86im.c).
100. Add general line acceleration for chips that can only accelerate
horizontal/vertical lines using FillRectSolid and for chips that
only have TwoPointLine without fool-proof hardware clipping.
99. Fix crash with line rectangles when the raster-op is not GXcopy.
98. Change the 8x8 pattern benchmark a little.
97. Add an aligned screen copy (scroll) test to the low-level benchmarks,
and remove the transparent color expansion tests.
96. Fix related type warnings in xf86bitmap.c (Radek).
95. Really fix the initialization of CPUToScreenColorExpandEndMarker
(Radek).
94. Fix the initialization of CPUToScreenColorExpandEndMarker in
xf86initacl.c
93. Fix problems with small patterns when using 8x8 hardware pattern fill.
92. Fix for CopyPlane1to32 (resolves olvwm crash at 32bpp).
91. Fix the CPU to screen color expansion benchmark (Radek).
90. Use the accelerated FillPolygonSolid from the GCInfoRec in ValidateGC.
89. In xf86orect.c, use cfbGCGetPrivate().
88. Add monochrome 8x8 tile detection (not used yet).
87. Fix external byte_reversed declaration in xf86expblt.c and
xf86pcache.c (fixes problem with 8x8 color expanded pattern).
86. Fix xf86expblt.c inline asm for different OSs (Takaaki Nomura).
85. Support xf86bench.c on different OSs (Akio Morita).
84. Fix a bug in the color expanded 8x8 pattern code.
83. In ReduceTileToSize8, don't give up when not using 8bpp.
82. Cosmetic changes to sampledrv.c.
81. Improve the start-up messages.
80. In the benchmark routines, avoid memset().
79. Lift the LSBFIRST requirement for buffered screen-to-screen color
expansion.
Version 0.4a (7 November 1996)
78. Correct xf86AccelInfoRec.BitsPerPixel for 24bpp.
77. Add ONLY_LEFT_TO_RIGHT_BITBLT for chips that only support screen-to-
screen BitBLTs with xdir = 1, and support this in CopyArea.
76. Make decisions in InitAccel about whether specified CPU-to-screen
color expansion memory range is large enough.
75. Add FramebufferWidth (equivalent to infoRec.displayWidth).
74. Add CPUToScreenColorExpandRange, which is taken into account after
each scanline in text and bitmap color-expansion operations
(CPUToScreenColorExpandEndMarker is derived from it).
73. Only allow text CPU-to-screen color expansion with SCANLINE_PAD_DWORD
defined.
72. If the CPUToScreenColorExpandBase isn't initialized, use the
start of the framebuffer as color expansion base address.
71. Fix bug in DrawNonTETextScanline (a function not yet used).
70. Add ONLY_TWO_BITBLT_DIRECTIONS for chips that only support screen-to-
screen BitBLTs with xdir = ydir, and support this in CopyArea.
69. Add VIDEO_SOURCE_GRANULARITY_DWORD flag for color expansion.
68. Fix cfbPushPixels8 name mapping for vga256. This was probably causing
most of the stability problems.
Version 0.4 (5 November 1996)
67. Add support for color-expanded 8x8 hardware patterns (untested).
66. Fix a bug in FillSpansSolid that caused some spans to be drawn
at the wrong position (Radek).
65. In FillSpansSolid, correctly handle the case of no spans remaining
after clipping.
64. When doing the raster-op precomputations for cfb in ValidateGC,
don't clear the flag indicating that the raster-op has changed
since we must still evaluate accelerated functions.
63. Correctly modify devPrivate.val when new GC ops are created in
ValidateGC.
62. Reduce tiles to 8x8 pixels if possible.
61. Set the USE_TWO_POINT_LINE flag if appropriate.
60. Reduce stipples to 8x8 pixels if possible.
59. Add start-up benchmark timings for low-level primitives.
58. Reduce stipples and tiles to 8 pixels wide if possible in order to
use the 8x8 hardware pattern.
57. Add 8x8 hardware pattern stipples.
56. Provide a mechanism to call non-accelerated CopyPlane1toN
directly. Adds CopyPlane1toNFallBack to GCInfoRec.
55. Add the HARDWARE_PATTERN_ALIGN_64 flag (not supported yet).
54. Debug the 8x8 hardware pattern.
53. Guarantee a different transparency color instead of using the GC
background color when caching transparent stipples.
52. Add support for 8x8 hardware patterns, and use them for small
tiles.
51. Add BitsPerPixel to xf86AccelInfoRec.
50. Assign fall-back function to xf86AccelInfoRec.ImageWrite if
necessary for convenience.
49. Lift the VIDEO_SOURCE_GRANULARITY_PIXEL requirement for indirect
screen-to-screen text color expansion.
48. Add an extra set of wide slots to the pixmap cache. Disabled.
47. Honour ONE_RECT_CLIPPING flag when checking line drawing function.
46. Support WriteBitmap when only non-transparent color expansion is
supported.
Version 0.3b (31 October 1996)
45. Update vga256 patch (vga.c) to force GC validation after a VT-switch.
44. Support ImageText when only transparent color expansion is supported.
43. Add PolyText color expansion for TE fonts.
42. Use devPrivate.val to signal status of GC ops, and use this to
modify them when required.
41. Create new GC ops when GC ops are still pointing to a defaults
structure when modifying GC ops in ValidateGC.
40. As a stop-gap measure, reset all GC ops and pretend everything in
the GC has changed when a switch-away is detected in ValidateGC.
39. Update sampledrv.c.
38. Disable the pixmap cache if the memory range is wrongly specified
(Alan).
37. Fix typo in pixmap cache initialization in sampledrv.c.
36. Make xf86PolyRectangle use new line drawing functions for vertical
lines.
35. Add ErrorTermBits to the xf86AccelInfoRec for re-scaling Bresenham
error terms when software clipping is used.
34. Add flags to indicate whether PolySegment is supported with CapNotLast
using TwoPointLine.
33. Implement xf86PolyLine/Segment using BresenhamLine or TwoPointLine
(untested).
32. Add BresenhamLine and TwoPointLine primitives.
32. Fix initial coordinates for color-expanded text.
31. Move function prototypes from xf86gc.h to xf86local.h.
30. Take into account source offset into first byte of bitmap scanline.
29. Bitmap with buffered screen-to-screen color expansion now works.
28. Fix prototypes for intermediate-level text functions.
27. Fix source overrun problems in xf86DrawBitmapScanline.
26. Initialize FramebufferBase in ScreenInit.
25. Implement untested/unused functions for filling 24bpp pixels using
8bpp mode color expansion in two passes.
24. Fix color expand flag testing in xf86bitmap.c.
Version 0.3a (28 October 1996)
23. If tiles are cached but not stipples (but stipples are accelerated),
be aware of this in xf86PolyFillRect.
22. Disable updating of the PolyGlyphBlt GC op in ValidateGC because of an
unresolved problem showing up at > 8bpp.
21. Implement a better understanding of how GC changes affect the selection
of cfb and accelerated functions in ValidateGC.
20. Fix the way ValidateGC handles cfb operations initialized with
MatchCommon.
19. Add xf86mapfuncs.h for local functions that are depth-mapped.
18. Fix bugs accidentally introduced into xf86initacl.c in version 0.3,
which effectively disabled pixmap caching.
17. Add untested, unoptimized CopyPlane1to24 (GXcopy, no planemask),
for use with stipple caching. Doesn't work yet.
Version 0.3 (27 October 1996)
16. Use framebuffer function for some vertical lines in PolyRectangle.
15. Fix SaveAreas and RestoreAreas for vga256.
14. Add PolyLine and PolySegment hooks to the xf86GCInfoRec.
13. Fix missing cfbPolyFillArc mappings in vga256map.h.
12. Fix MatchCommon call in ValidateGC. This fixes vga256 operation.
11. Fix updating of GC ops for text functions during ValidateGC.
10. Optimize the CopyPlane1to16/32 functions.
9. Update the docs, and include a sample driver template.
Version 0.2 (26 October 1996)
8. Re-enable CopyPlane hook.
7. Fix typo that prevented CopyArea from being accelerated.
6. Fix confusion over arguments of cfbBitBlt helper function.
5. Call the correct depth-specific cfbBitBlt helper function.
4. Fix the coordinates for the transparent stipple mi fall-back.
3. Fix problem with zero-width spans in FillSpansAsRects.
2. Disable CopyPlane hook because it doesn't work.
Version 0.1 (25 October 1996)
1. First logged version. Implements solid filled rectangles, arcs,
polygons, CopyArea, pixmap caching.
Untested are line-drawn rectangles, color expansion text, color
expansion stipple upload, bitmaps.
Overview of XAA
---------------
1.1
Some advantages of this new interface:
- Easier implementation of accelerated functions.
- More efficient use of accelerated functions.
- Code size reduction.
- Source code size reduction (less duplicated code).
- Greater test base for higher level code.
- Improvements can be beneficial for all drivers.
Disadvantages:
- More overhead in ValidateGC.
- Arguably more complex set of acceleration primitives.
1.2 Graphics Operation Flags
GXCOPY_ONLY
Indicates that the graphics operation only allows a GXcopy
raster-op (copy source). If this flag is not defined, the graphics
operation is assumed to be supported with all 16 raster operations.
NO_PLANEMASK
Indicates that the graphics operation does not allow a write
planemask. All bits in a pixel are written.
ONE_RECT_CLIPPING
Indicates that an accelerated function (usually a high-level one that
handles clipping) only accepts one clipping rectangle. This may be
of use for line drawing. [It is only checked for line drawing]
RGB_EQUAL
Indicates that the graphics operation requires that the red, green,
and blue bytes of the foreground color (and background color, if
applicable) are equal. This is useful for 24bpp when the graphics
coprocessor is used in 8bpp mode, which is often the case since
most chips have no or only limited support for acceleration at
24bpp. This way, many operations will be accelerated for the common
case of "grayscale" colors. It should only be defined for 24bpp.
NO_TRANSPARENCY
Indicates that the graphics operation does not handle transparency.
This can be enabled for screen-to-screen copy.
NO_CAP_NOT_LAST
Indicates that the graphics operation (PolyLine or PolySegment) does
not support omitting the last pixel (CapNotLast).
TRANSPARENCY_GXCOPY
Indicates that, unlike the case of no transparency, when
transparency is enabled only the GXcopy raster-op is allowed. This
is valid only for ScreenToScreenCopy.
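The kind of test these flags imply can be sketched as follows. This is
only an illustration, not the actual XAA code: the flag bit values and
the helper name op_is_supported are invented for the example (the real
values live in the XAA headers), and the RGB_EQUAL check assumes a
24-bit color stored as three bytes.

```c
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define GXCOPY_ONLY  0x01
#define NO_PLANEMASK 0x02
#define RGB_EQUAL    0x04

#define GXcopy 0x3

/* Nonzero if the red, green and blue bytes of a 24-bit color match. */
int rgb_bytes_equal(uint32_t c)
{
    uint32_t r = (c >> 16) & 0xff, g = (c >> 8) & 0xff, b = c & 0xff;
    return r == g && g == b;
}

/* Nonzero if an operation restricted by 'flags' can handle the
 * requested raster-op, planemask and foreground color. */
int op_is_supported(unsigned flags, int rop, uint32_t planemask,
                    uint32_t fg, int depth)
{
    uint32_t full = depth >= 32 ? 0xffffffffu : ((1u << depth) - 1u);

    if ((flags & GXCOPY_ONLY) && rop != GXcopy)
        return 0;
    /* NO_PLANEMASK: every bit of the pixel must be written. */
    if ((flags & NO_PLANEMASK) && (planemask & full) != full)
        return 0;
    if ((flags & RGB_EQUAL) && !rgb_bytes_equal(fg))
        return 0;
    return 1;
}
```

When the check fails, the operation would fall back to a less
restricted (typically unaccelerated) path.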
1.3 The AccelInfoRec
Flags
This is a set of flags that controls some overall parameters for
the acceleration code.
BACKGROUND_OPERATIONS
If enabled, the "simple" acceleration functions are not assumed to
wait until the graphic coprocessor operation is finished. The
generic acceleration functions will call Sync() when all operations
have been done.
DELAYED_SYNC
If enabled, the acceleration functions will try to use as much
parallelism between the CPU and the accelerator as possible by
postponing calls to Sync() as long as possible. Without this flag, and
with BACKGROUND_OPERATIONS enabled, the acceleration functions will call
Sync() after each major set of accelerated operations. With DELAYED_SYNC
set, Sync() will only be called when absolutely necessary to maintain
framebuffer consistency. In practice, this means Sync() will only be
called just before writing to or reading from the framebuffer with the
CPU. This option increases performance, but is much more vulnerable to
display errors and synchronisation problems.
PIXMAP_CACHE
Use a pixmap cache for tiles and stipples, when the required
low-level functions (such as ScreenToScreenCopy) are available.
COP_FRAMEBUFFER_CONCURRENCY
CPU access to the framebuffer can continue while a screen-to-screen
coprocessor operation is being executed. This is taken advantage of
in some color expansion routines when CPU-to-screen color expansion
is not available, and potentially in some other places.
DO_NOT_CACHE_STIPPLES
Do not cache stipples, but instead use the CPU-to-screen color
expansion routines for stipples. These routines have not yet been
implemented.
HARDWARE_CLIP_LINE
When a general line has to be clipped, use hardware clipping
(SetClippingRectangle must be defined, and clipping must only
be active for the single following general line draw).
USE_TWO_POINT_LINE
Use two-point lines (TwoPointLine) instead of Bresenham lines for
general lines. This flag is automatically set if appropriate. It
does not need to be set in a driver.
HORIZONTAL_TWOPOINTLINE
Use TwoPointLine for Horizontal Lines instead of the FillRectSolid
function.
TWO_POINT_LINE_NOT_LAST
Indicates that TwoPointLine supports the notlast flag that indicates
whether the last pixel should be drawn. If this is not supported,
PolyLine and PolySegment cannot support the CapNotLast CapStyle.
TWO_POINT_LINE_ERROR_TERM
Indicates that TwoPointLine supports the optional error term flag
and parameter that allows the initial error term to be provided
for software clipped lines.
ONLY_TWO_BITBLT_DIRECTIONS
Indicates that ScreenToScreenCopy is only allowed with xdir = ydir
(both -1 or both 1). BitBLTs are converted to smaller BitBLTs with
supported directions if necessary.
ONLY_LEFT_TO_RIGHT_BITBLT
Indicates that ScreenToScreenCopy is only allowed with xdir = 1.
BitBLTs are converted to smaller BitBLTs with supported directions
if necessary.
NO_SYNC_AFTER_CPU_COLOR_EXPAND
Indicates that a Sync() is not required after a CPU-to-screen color
expansion operation. Generally, this can be defined if host color
expansion data is processed by the graphics chip in the same way as
accelerated graphics commands (it uses the command FIFO).
NO_TEXT_COLOR_EXPANSION
Do not use color expansion to accelerate text. Define this if
color expansion is slower than plain framebuffer for text (which
might happen with scanline screen-to-screen color expansion,
when there is little video memory bandwidth but the CPU to
framebuffer bandwidth is decent).
LINE_PATTERN_POWER_OF_2_ONLY
Indicates that hardware dashed lines only support pattern lengths
that are a power of two.
LINE_PATTERN_ONLY_TRANSPARENCY
Indicates that the hardware dashed lines only draw the foreground
color (the background is always transparent).
LINE_PATTERN_MSBFIRST_INCREASING
LINE_PATTERN_MSBFIRST_DECREASING
Indicates that the dashed line pattern stored in the LinePatternBuffer
progresses from MSB to LSB within each DWORD. INCREASING patterns
are stored with the beginning of the pattern in the first DWORD of
the LinePatternBuffer, while DECREASING indicates that the end of the
pattern is in the first DWORD in the LinePatternBuffer. INCREASING or
DECREASING is only relevant if the pattern is longer than a DWORD.
PatternFlags
This is a set of flags that controls some overall parameters for
the acceleration code related to the hardware pattern.
HARDWARE_PATTERN_SCREEN_ORIGIN
Indicates that the baseline origin for hardware 8x8 pattern fills
is the top left corner of the screen, as opposed to the top left
corner of the area to be filled. Note that an origin offset feature
might still be supported.
HARDWARE_PATTERN_TRANSPARENCY
Indicates that the hardware 8x8 pattern fill supports transparency
color compare (does not apply to mono pattern).
HARDWARE_PATTERN_ALIGN_64
Indicates that the 8x8 hardware pattern must be stored on a
64-pixel boundary in video memory, and programmed pattern start
location must be the start of such a pattern. In the absence of a
programmable origin, this requires a lot more pre-rotated copies to
be made, although they should still fit within a 128x128 cache
area.
HARDWARE_PATTERN_MOD_64_OFFSET
Indicates that while the 8x8 hardware pattern must be stored
aligned on a 64-pixel boundary, the programmed pattern start
location can in fact include a multiple-of-8-pixels offset, which
indicates the vertical offset into the pattern. This flag is
mutually exclusive to HARDWARE_PATTERN_ALIGN_64. If you can also
specify the horizontal offset, do not use this flag, but instead
use HARDWARE_PATTERN_PROGRAMMED_ORIGIN.
HARDWARE_PATTERN_PROGRAMMED_BITS
Indicates that the monochrome (color expand) 8x8 pattern data must be
programmed into registers, rather than stored in video memory. This
is only supported in combination with the following flag.
HARDWARE_PATTERN_PROGRAMMED_ORIGIN
Indicates that the hardware pattern supports a programmable origin
(x and y offsets into the pattern). This is supported for all three
pattern storage types (programmed monochrome, monochrome in video
memory and regular (pixel depth) in video memory).
HARDWARE_PATTERN_BIT_ORDER_MSBFIRST
Indicates that the monochrome 8x8 pattern data is in MSB-first bit
order ("Windows-style").
HARDWARE_PATTERN_MONO_TRANSPARENCY
Indicates that the monochrome 8x8 pattern supports transparency
(signalled by a background color equal to -1).
HARDWARE_PATTERN_NOT_LINEAR
Indicates that the 8x8 pattern data should not be stored linearly
in video memory, but rather, as a tiled 8x8 pattern in the cache.
HARDWARE_PATTERN_NO_PLANEMASK
Indicates that the 8x8 pattern doesn't support a write planemask.
All bits in a pixel must be written.
Sync()
This function should be defined if BACKGROUND_OPERATIONS is enabled
(and also if any kind of CPU-to-screen color expansion is used). It
should wait for all graphics coprocessor operations to finish. It
also provides an opportunity to clean up the coprocessor state
after a batch of commands.
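The interaction between background operations and Sync() can be
pictured with a small model. Everything here is a stand-in: chip_busy
and sync_calls replace real hardware status, and MySync, accel_op and
cpu_touch_framebuffer are hypothetical names, not part of XAA.

```c
/* Toy model: the "chip" is busy after any accelerated operation and
 * becomes idle once MySync() has waited for it. */
int chip_busy;
int sync_calls;

void MySync(void)
{
    /* A real driver would poll a status register here. */
    chip_busy = 0;
    sync_calls++;
}

/* Queue an accelerated operation and return immediately. */
void accel_op(void)
{
    chip_busy = 1;
}

/* With delayed syncing, waiting happens only when the CPU is about
 * to read or write the framebuffer itself. */
void cpu_touch_framebuffer(void)
{
    if (chip_busy)
        MySync();
    /* ... CPU access to video memory would go here ... */
}
```

In this model a run of accelerated operations followed by one CPU
access costs a single wait, which is the point of DELAYED_SYNC.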
SetupForFillRectSolid(color, rop, planemask)
Sets up the color, raster-op and planemask for a solid rectangle
fill. It is called once before a batch of "Subsequent" fill
commands. Currently the restrictions for the operation are set up
with xf86GCInfoRec.PolyFillRectSolidFlags.
Another acceleration command might still be executing when a SetUp
function is called (assuming BACKGROUND_OPERATIONS). You may have
to do a Sync() here. In the current XAA code this doesn't happen,
but it might in the future.
SubsequentFillRectSolid(x, y, w, h)
This actually fills a rectangle. When writing spans, h will
be 1. It is usually called many times in a row.
A key thing to notice here is that the function call overhead
is "eaten" when performing coprocessor operations "in the
background" (concurrently with CPU processing). If you need to
wait for the previous operation to finish before sending the
commands for the next one, you can do that in this function.
Generally, you want to avoid querying the chip as much as
possible since PCI read operations have a devastating effect
on performance.
This function is taken advantage of when filling solid rectangles,
spans, polygons and arcs, and in other places.
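The Setup/Subsequent split can be illustrated with a software-only
mock. This is a sketch, not a driver: fake_fb stands in for video
memory, the toy "chip" behaves as if GXCOPY_ONLY and NO_PLANEMASK were
set, and the function names merely follow the MyXxx convention used
elsewhere in this document.

```c
#include <stdint.h>
#include <string.h>

#define FB_W 16
#define FB_H 8

uint8_t fake_fb[FB_H][FB_W];   /* stand-in for 8bpp video memory */
uint8_t cur_color;             /* state latched by the Setup call */

/* Called once before a batch: program the state shared by all fills. */
void MySetupForFillRectSolid(int color, int rop, unsigned planemask)
{
    (void)rop;        /* ignored: the toy chip only does GXcopy */
    (void)planemask;  /* ignored: all bits are always written */
    cur_color = (uint8_t)color;
}

/* Called many times: only the per-rectangle parameters are passed. */
void MySubsequentFillRectSolid(int x, int y, int w, int h)
{
    for (int row = y; row < y + h; row++)
        memset(&fake_fb[row][x], cur_color, (size_t)w);
}
```

A batch then looks like one Setup call followed by any number of
Subsequent calls, for example a 3x2 rectangle and then a span (h = 1).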
SubsequentFillTrapezoidSolid(y, h, left, dxl, dyl, el, right, dxr, dyr, er)
Defining this will enable most solid polygons to be rendered as a
collection of trapezoids rather than horizontal spans. Y is the
top line of a trapezoid which has height h. Left and right are the
starting X positions of the left and right edges on the top line.
Dyl/dxl and dyr/dxr define the slopes of the left and right edges
and el and er are the initial error terms (always rendered from
top of the screen to the bottom).
Note that dyl, dxl, dyr and dxr are merely used to define the slope
of the line and are not necessarily the deltas between the top and
bottom points.
SetupForScreenToScreenCopy(xdir, ydir, rop, planemask, transparency_color)
Set up for a screen-to-screen BitBLT. The transparency color is -1
when there is no transparency. Transparency is used when drawing
transparent stipples from the pixmap cache. There are general flags
(set in xf86AccelInfoRec.Flags) to indicate restrictions for the
direction of the BitBLT (xdir, ydir); if restrictions exist, the
generic code converts the blits to allowable blits. Currently the
other restrictions for the operation are set up with
xf86GCInfoRec.CopyAreaFlags.
SubsequentScreenToScreenCopy(x1, y1, x2, y2, w, h)
Perform a screen-to-screen BitBLT. Again, there is often
a batch of commands. Note that (x1, y1) is always the top-left
corner, regardless of the direction.
It is used for screen-to-screen area copies (such as scrolling),
and for the pixmap cache.
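For overlapping copies the direction matters. A conventional way to
choose xdir and ydir is sketched below; choose_blit_dir is a
hypothetical helper, and this shows the usual rule rather than the
exact logic of the generic code.

```c
/* Pick copy directions so that an overlapping copy never overwrites
 * source pixels before they have been read.
 * (x1, y1) is the source corner, (x2, y2) the destination corner. */
void choose_blit_dir(int x1, int y1, int x2, int y2,
                     int *xdir, int *ydir)
{
    /* Moving the area down: copy the bottom row first. */
    *ydir = (y1 < y2) ? -1 : 1;
    /* Moving the area right: copy the rightmost column first. */
    *xdir = (x1 < x2) ? -1 : 1;
}
```

A chip that sets ONLY_TWO_BITBLT_DIRECTIONS cannot execute a result
with xdir != ydir directly, which is why such blits get split up.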
SubsequentBresenhamLine(x1, y1, octant, err, e1, e2, length)
Draw a line using the Bresenham algorithm. This is the most common
general line drawing feature that chips support. The octant consists
of bitflags that are defined as follows (miline.h defines them):
XDECREASING  4  Draw from right to left (as opposed to left to right).
YDECREASING  2  Draw from bottom to top (as opposed to top to bottom).
YMAJOR       1  Y is the major axis (as opposed to X).
The error terms are usually no bigger than a screen coordinate, but
when software clipping is used, the error term might be too big; it
is then rescaled according to the number of bits specified in
ErrorTermBits. When HARDWARE_CLIP_LINE is defined,
SetClippingRectangle must be defined. It seems to me that hardware
clipping makes the implicit assumption that the chip can handle
coordinates in the range [-32768, 32767]. Or are coordinates
guaranteed to be on-screen? Anyway I think having the chip trace
lines way off the screen does not sound like a good idea.
There is no SetUp function. SetupForFillRectSolid is called before
a batch of lines (this is linked to the fact that horizontal lines
are drawn with FillRectSolid; they should not be affected by
hardware clipping).
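How the arguments relate to a pair of endpoints can be sketched with
the textbook Bresenham setup. This is an illustration under stated
assumptions: struct bres_params and bres_setup are hypothetical names,
no octant bias is applied to the error term, and length is simply the
major-axis delta (whether the final pixel is counted is not specified
above).

```c
#include <stdlib.h>

#define YMAJOR      1
#define YDECREASING 2
#define XDECREASING 4

struct bres_params {
    int octant, err, e1, e2, length;
};

/* Derive octant and error terms for a line from (x1,y1) to (x2,y2). */
struct bres_params bres_setup(int x1, int y1, int x2, int y2)
{
    struct bres_params p;
    int adx = abs(x2 - x1);
    int ady = abs(y2 - y1);

    p.octant = 0;
    if (x2 < x1) p.octant |= XDECREASING;
    if (y2 < y1) p.octant |= YDECREASING;

    if (adx > ady) {              /* X is the major axis */
        p.e1 = 2 * ady;
        p.e2 = p.e1 - 2 * adx;
        p.err = p.e1 - adx;
        p.length = adx;
    } else {                      /* Y is the major axis */
        p.octant |= YMAJOR;
        p.e1 = 2 * adx;
        p.e2 = p.e1 - 2 * ady;
        p.err = p.e1 - ady;
        p.length = ady;
    }
    return p;
}
```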
SubsequentTwoPointLine(x1, y1, x2, y2, bias)
Draw a line between (x1, y1) and (x2, y2); the last point is drawn.
This is found in some newer chips, and is used when available. The 8
lower bits of bias indicate whether 1 should be subtracted from the
error term for each of the octants (e.g. bit 0 matches octant 0);
supporting this parameter is not a requirement. If bit 8
(0x100) of bias is set, the last pixel should not be drawn (use
TWO_POINT_LINE_NOT_LAST to indicate whether this flag is
supported). This function requires hardware clipping.
Note that horizontal lines are always drawn with FillRectSolid.
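Decoding the bias argument as described above might look like this.
The macro and function names are made up for the example; only the bit
positions come from the description.

```c
#define BIAS_NOT_LAST 0x100   /* bit 8: do not draw the last pixel */

/* Nonzero if 1 should be subtracted from the error term for the
 * given octant (0..7); bit n of bias corresponds to octant n. */
int bias_for_octant(int bias, int octant)
{
    return (bias >> octant) & 1;
}

/* Nonzero if the last pixel of the line must be left undrawn. */
int bias_skip_last(int bias)
{
    return (bias & BIAS_NOT_LAST) != 0;
}
```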
SetClippingRectangle(x1, y1, x2, y2)
Set the hardware clipping rectangle. (x2, y2) is the inclusive
right-bottom corner. Clipping should be active only for the first
following line draw (BresenhamLine or TwoPointLine). This function
is only used when HARDWARE_CLIP_LINE is enabled.
SetupForDashedLine(fg, bg, rop, planemask, size)
Setup the hardware for accelerated dashed lines. Fg and bg are the
foreground and background colors respectively. If bg is -1, the
background is transparent. Size indicates the length of the color
expand pattern in bits (pixels) to be used by the subsequent dashed
line command. By the time this function is called, the bit pattern
will already reside in the LinePatternBuffer. The packing order in
the buffer is determined by one of the following flags:
LINE_PATTERN_MSBFIRST_MSBJUSTIFIED
LINE_PATTERN_MSBFIRST_LSBJUSTIFIED
One and only one of these flags must be set. A LinePatternBuffer must
be allocated and its length (in bits) recorded as LinePatternMaxLength
in order to use hardware dashed lines.
SubsequentDashedBresenhamLine(x1, y1, octant, err, e1, e2, length, offset)
SubsequentDashedTwoPointLine(x1, y1, x2, y2, bias, offset)
Draw a dashed line. The first arguments are identical to those of
SubsequentBresenhamLine and SubsequentTwoPointLine. The only
difference is the offset argument which specifies the offset into
the bit pattern the first pixel of this line must start on.
LinePatternMaxLength
LinePatternBuffer
LinePatternMaxLength is the maximum length pattern that the hardware
can accelerate. Additionally the LINE_PATTERN_POWER_OF_2_ONLY flag
can be set indicating that only patterns with lengths that are a
power of two can be accelerated. The LinePatternBuffer should be
allocated by the driver and should be long enough to hold
LinePatternMaxLength bits padded out to a DWORD.
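The buffer sizing and the length restrictions amount to a little
arithmetic, sketched here with hypothetical helper names. The pow2_only
argument stands for the LINE_PATTERN_POWER_OF_2_ONLY flag being set.

```c
/* Number of 32-bit DWORDs needed to hold max_length_bits of pattern,
 * padded out to a DWORD boundary. */
unsigned line_pattern_dwords(unsigned max_length_bits)
{
    return (max_length_bits + 31) / 32;
}

/* Nonzero if a dash pattern of 'size' bits can be accelerated. */
int pattern_length_ok(unsigned size, unsigned max_length_bits,
                      int pow2_only)
{
    if (size == 0 || size > max_length_bits)
        return 0;
    if (pow2_only && (size & (size - 1)) != 0)
        return 0;   /* more than one bit set: not a power of two */
    return 1;
}
```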
SetupForImageWrite(rop, planemask, transparency_color)
This sets up for multiple pixmap (image) transfers using the
transfer window defined by ImageWriteBase and ImageWriteRange.
Currently, DWORD padding is required. CPU_TRANSFER_BASE_FIXED
is supported but is often only faster than the non-accelerated
versions for rops other than GXcopy.
If the transparency_color is not -1, then it defines the
transparent "masking" color. If your hardware cannot do
transparent image transfers then you should set NO_TRANSPARENCY
in the ImageWriteFlags.
SubsequentImageWrite(x, y, w, h, skipleft)
This sets up for an individual image transfer and is
analogous to the SubsequentCPUToScreenColorExpand function.
If the driver supports left edge clipping, then the skipleft
parameter specifies the number of pixels to be skipped on the
left edge (0 - 3). This is used in cases of unfavorable source
alignment.
SubsequentScanlineScreenToScreenCopy(LineAddr, skipleft, x, y, w)
This sets up for a scanline by scanline image transfer. If the
driver supports left edge clipping, then the skipleft parameter
specifies the number of pixels to be skipped on the left edge
(0 - 3). This is used in cases of unfavourable source alignment.
ImageWriteFlags
In addition to the regular planemask/rop restrictions, the
following flags are defined (with meaning similar to the color
expand flags).
LEFT_EDGE_CLIPPING
LEFT_EDGE_CLIPPING_NEGATIVE_X
NO_TRANSPARENCY
CPU_TRANSFER_BASE_FIXED
Additionally, a NO_GXCOPY flag can be set to indicate that
ImageWrites should only be used for the more complicated rops.
This is useful in the case where accelerator assisted image transfers
are not faster than the unaccelerated versions for simple copies.
Note: When using the Scanline version of ImageWrite the
CPU_TRANSFER_BASE_FIXED is currently unsupported.
ImageWriteBase
ImageWriteRange
ImageWriteOffset
This is often, but not necessarily, the same window defined by the
CPUToScreenColorExpandBase and Range. ImageWrites may be
suboptimal if this range is not at least as large as the
framebuffer width. By default ImageWriteBase is set to
FramebufferBase if not defined.
When using the ScanlineScreenToScreenCopy, the ImageWriteRange must
be a large buffer in video memory; from testing, about four scanlines
should be sufficient (needs further testing).
ImageWriteOffset should be set as the (x * y) offset into the frame
buffer for the copy, similar to ImageWriteBase. ImageWriteBase is
not used in the Scanline version.
SetupForFill8x8Pattern(patternx, patterny, rop, planemask, trans_col)
Set up for hardware 8x8 pattern fill (non-color expanded). If
neither the HARDWARE_PATTERN_SCREEN_ORIGIN flag nor the
HARDWARE_PATTERN_PROGRAMMED_ORIGIN flag is set, patternx and patterny
can be ignored. Otherwise, patternx and patterny just indicate the video
memory address where the pattern is stored. The pattern is stored
linearly in video memory unless HARDWARE_PATTERN_NOT_LINEAR is
specified. When the transparency color is -1 there is no transparency.
SubsequentFill8x8Pattern(patternx, patterny, x, y, w, h)
Perform a hardware 8x8 pattern fill. If the flag
HARDWARE_PATTERN_SCREEN_ORIGIN is set, patternx and patterny can be
ignored; otherwise, patternx and patterny indicate the video memory
address where the pattern is stored. However, if
HARDWARE_PATTERN_PROGRAMMED_ORIGIN is set, patternx and patterny define
the origin offset into the pattern. Any rotation issues are handled by
the generic code by generating pre-rotated copies of the pattern. The
pattern address will always be at a multiple of 8 pixels offset
from the start of a scanline (x will be a multiple of 8), unless
HARDWARE_PATTERN_ALIGN_64 is set. At the moment, setting
HARDWARE_PATTERN_ALIGN_64 in the absence of
HARDWARE_PATTERN_PROGRAMMED_ORIGIN will disable the use of this
function, but this will change in a future version.
SetupFor8x8PatternColorExpand(patternx, patterny, bg, fg, rop, planemask)
Set up for hardware color-expanded 8x8 pattern fill. If the flag
HARDWARE_PATTERN_SCREEN_ORIGIN is set, or
HARDWARE_PATTERN_PROGRAMMED_ORIGIN is set in the absence of
HARDWARE_PATTERN_PROGRAMMED_BITS, patternx and patterny indicate the
video memory address where the pattern is stored, which will be on an
8 byte boundary relative to the start of a scanline. Otherwise, patternx
and patterny can be ignored. The pattern x-coordinate will be in
units of "bits", that is, a byte offset of one relative to the
start of the scanline is represented by a patternx value of 8.
If HARDWARE_PATTERN_PROGRAMMED_BITS is set, patternx and patterny
are overloaded as follows: patternx holds the first 4 lines (32
pixels) of the pattern, with each byte (MSB-first bit order if the
HARDWARE_PATTERN_BIT_ORDER_MSBFIRST flag is set) corresponding to a
scanline of the pattern. patterny holds the second half of the
pattern. This is the so-called "Windows-format".
A background color of -1 indicates transparency (support of
transparency is indicated by HARDWARE_PATTERN_MONO_TRANSPARENCY).
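The overloading of patternx and patterny for programmed pattern bits
can be sketched as a packing routine. Note the assumptions: the text
above does not say which byte of the 32-bit value holds the first
scanline, so this example puts scanline 0 in the least significant
byte; the function names are hypothetical; and the reverse argument
stands for "the source bit order differs from what the chip wants".

```c
#include <stdint.h>

/* Reverse the bit order within one byte. */
uint8_t reverse_bits8(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0) >> 4) | ((b & 0x0F) << 4));
    b = (uint8_t)(((b & 0xCC) >> 2) | ((b & 0x33) << 2));
    b = (uint8_t)(((b & 0xAA) >> 1) | ((b & 0x55) << 1));
    return b;
}

/* Pack an 8x8 monochrome pattern (one byte per scanline) into two
 * 32-bit values: scanlines 0-3 in *patternx, 4-7 in *patterny.
 * ASSUMPTION: the lowest-numbered scanline goes in the low byte. */
void pack_pattern(const uint8_t rows[8], int reverse,
                  uint32_t *patternx, uint32_t *patterny)
{
    uint32_t first = 0, second = 0;

    for (int i = 0; i < 4; i++) {
        uint8_t a = rows[i];
        uint8_t b = rows[i + 4];
        if (reverse) {
            a = reverse_bits8(a);
            b = reverse_bits8(b);
        }
        first  |= (uint32_t)a << (8 * i);
        second |= (uint32_t)b << (8 * i);
    }
    *patternx = first;
    *patterny = second;
}
```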
Subsequent8x8PatternColorExpand(patternx, patterny, x, y, w, h)
Perform a hardware color-expanded 8x8 pattern fill. If the flag
HARDWARE_PATTERN_SCREEN_ORIGIN is set, patternx and patterny
can be ignored; otherwise, patternx and patterny indicate the
video memory address where the pattern is stored. Any rotating
issues are handled by the generic code by generating pre-rotated
copies of the pattern. Again patternx is in "bit" or "stencil"
units.
If HARDWARE_PATTERN_PROGRAMMED_ORIGIN is set, patternx and
patterny hold the origin (x and y offsets into the pattern).
HARDWARE_PATTERN_SCREEN_ORIGIN may be defined additionally;
in that case, the following is true: patternx and patterny will
be the same for all "Subsequent" calls. You may only need to
program the origin in the first Subsequent call.
ColorExpandFlags
This selects the restrictions for color expansion operations. The
flags are extended with a set of flags that is used to define
details about the hardware-specific implementation of color
expansion, as performed by the low-level color expansion functions.
The following extra flags are defined:
SCANLINE_NO_PAD
SCANLINE_PAD_BYTE
SCANLINE_PAD_DWORD
Defines the padding at the end of a scanline of monochrome
data, which indicates the number of bits that is ignored by the
graphics chip at the end of each scanline in multi-scanline
color-expansion operations from the CPU to the screen. DWORD
padding is preferred. These flags do not apply to screen-to-screen
color expansion. Currently, not defining SCANLINE_PAD_DWORD will
result in non-optimized and limited use of CPU-to-screen color
expansion.
CPU_TRANSFER_PAD_DWORD
CPU_TRANSFER_PAD_QWORD
Defines the total amount of data to be transferred in a
multi-scanline CPU-to-screen color-expansion operation. Most
chips pad to a DWORD boundary.
CPU_TRANSFER_BASE_FIXED
Indicates that the destination address for monochrome data for
CPU-to-screen color-expansion is a fixed address, rather than
a large range starting from the ColorExpandBase address.
ONLY_TRANSPARENCY_SUPPORTED
Indicates that the color expansion operations only work with
transparency (bit 0 pixels are not written).
TRIPLE_BITS_24BPP
When enabled (must be in 24bpp mode), color expansion functions
are expected to require three times the amount of bits to be
transferred so that 24bpp grayscale colors can be used with color
expansion in 8bpp coprocessor mode. Each bit is expanded to 3
bits when writing the monochrome data. When defining this
flag, also define RGB_EQUAL.
VIDEO_SOURCE_GRANULARITY_PIXEL
VIDEO_SOURCE_GRANULARITY_BYTE
VIDEO_SOURCE_GRANULARITY_DWORD
This indicates the granularity of the horizontal source location
specification for screen-to-screen color expansion operations.
It is either one pixel, 8 pixels (a byte), or 32 pixels (a 32-bit
word). If there's some kind of clipping mechanism available, pixel
granularity is usually possible.
BIT_ORDER_IN_BYTE_LSBFIRST
BIT_ORDER_IN_BYTE_MSBFIRST
This defines the order of bits within a byte. As far as X is
concerned, it's best when the lowest-order bit corresponds to
the leftmost pixel on the screen (this is the technically
superior format), but many chips only support the "wrong" bit
order (MSBFIRST).
LEFT_EDGE_CLIPPING
This indicates that CPU-to-screen color expansion operations
support the left-edge clipping parameter, which indicates
the number of pixels to skip at the left edge.
LEFT_EDGE_CLIPPING_NEGATIVE_X
This indicates that when the left-edge clipping parameter is
specified, the x coordinate is allowed to be negative (while
being on-screen when the parameter is actually added to it).
At the moment, this flag is a requirement for CPU-to-screen
color expansion acceleration of (large) stipples.
Note that the regular graphics operations flags for raster-op,
planemask and color restrictions are also valid. NO_TRANSPARENCY
indicates that color expansion does not support transparency.
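The effect of the scanline and transfer padding flags on the amount of
monochrome data can be worked out as below. This is a sketch: the
helper names are hypothetical, and the padding is passed as a bit count
(1 for SCANLINE_NO_PAD, 8 for byte, 32 for DWORD, 64 for QWORD) rather
than as the real flag values.

```c
/* Bits occupied by one scanline of w monochrome pixels when each
 * scanline is padded to a multiple of pad_bits. */
unsigned scanline_bits(unsigned w, unsigned pad_bits)
{
    return (w + pad_bits - 1) / pad_bits * pad_bits;
}

/* Total bytes to transfer for a w x h bitmap: scanlines padded to
 * scan_pad_bits, the whole transfer padded to xfer_pad_bits. */
unsigned transfer_bytes(unsigned w, unsigned h,
                        unsigned scan_pad_bits, unsigned xfer_pad_bits)
{
    unsigned total_bits = scanline_bits(w, scan_pad_bits) * h;
    unsigned units = (total_bits + xfer_pad_bits - 1) / xfer_pad_bits;
    return units * (xfer_pad_bits / 8);
}
```

For a 10-pixel-wide, 3-line bitmap, DWORD scanline padding costs 12
bytes, while unpadded scanlines fit in a single DWORD.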
SetupForCPUToScreenColorExpand(bg, fg, rop, planemask)
Set up for CPU-to-screen color expansion operations. This is used
for writing bitmaps and text, and (not yet) stipples. When bg is
equal to -1, the background (bits that are 0) is transparent.
SubsequentCPUToScreenColorExpand(x, y, w, h, skipleft)
Perform a CPU-to-screen color expansion operation. The monochrome
data will be transferred after this function has been called.
Sync() is called when the data has been transferred. The optional
skipleft parameter defines a number of pixels (0 - 7) to be skipped
at the left edge (at the start of each scanline).
SetupForScreenToScreenColorExpand(bg, fg, rop, planemask)
Set up for screen-to-screen color expansion operations. This will
only be used when the storing of monochrome data in the pixmap (or
font) cache is implemented.
SubsequentScreenToScreenColorExpand(srcx, srcy, x, y, w, h)
Perform a screen-to-screen color expansion operation. srcx is in
pixel units (8 corresponds to one byte offset).
SetupForScanlineCPUToScreenColorExpand(x, y, w, bg, fg, rop, planemask)
Set up for a scanline-by-scanline color expansion operation from
the CPU to the screen. This is not of much use (except when a chip
is not compatible with supported methods of color expanding a whole
bitmap). It's not used currently.
SubsequentScanlineCPUToScreenColorExpand()
Color expand a scanline from the CPU to the screen. Many chips
automatically add the pitch of the display to the destination
address after a scanline has been written so that it doesn't need
to be updated. Otherwise you'll need to keep track of the address.
SetupForScanlineScreenToScreenColorExpand(x, y, w, h, bg, fg, rop, planemask)
Set up for a scanline-by-scanline color expansion operation from
the screen to the screen (top-down). This is typically used for
chips that don't have usable CPU-to-screen color expansion. It is
taken advantage of for bitmaps, text, and (not yet) stipples.
SubsequentScanlineScreenToScreenColorExpand(srcaddr)
This performs color expansion of a scanline from the screen
(typically a scratch buffer) to the screen. To take advantage of
this operation, ScratchBufferAddr and ScratchBufferSize must be
defined (> 0), and either linear addressing must be used or
ScratchBufferBase must be defined. Being able to support
COP_FRAMEBUFFER_CONCURRENCY is a win here. The srcaddr is the
linear framebuffer address in (non-expanded) pixel units. The real
address is (srcaddr / 8). When TRIPLE_BITS_24BPP is defined,
srcaddr is in non-expanded 8bpp pixel units.
In addition, PingPongBuffers defines the number of alternating
buffers used. The default is two. Depending on the implementation
and size of framebuffer and coprocessor write buffers on the chip,
you might need more than two.
CPUToScreenColorExpandBase
This address defines the base address for writing monochrome bitmap
data to when performing CPU-to-screen color expansion operations.
When the CPU_TRANSFER_BASE_FIXED flag is not set and
CPUToScreenColorExpandRange is not defined, a large range is
assumed to be available (at least the number of pixels in the virtual
screen / 8). For text operations this is probably never a problem.
At the moment hardware that has 64 bytes or so of transfer space is
unlucky. 32-bit access is always used.
If this is not defined, FramebufferBase will automatically be
used.
CPUToScreenColorExpandRange
This defines the size of the "window" starting from the base
address for writing CPU-to-screen color-expand data. If this is
not defined or zero, the range is assumed to be large enough.
When it is greater than the width of the screen in pixels / 8,
the base address will be adjusted if necessary at the end of each
scanline. Currently, if it is smaller than that, the
CPU_TRANSFER_BASE_FIXED flag is set.
At the moment, the bottom line is that you need about 256 bytes
of transfer space to use CPU-to-screen color expansion (128 bytes
with a 1024 pixel screen width) with PCI-burst mode support.
However, "fixed-base" operation is supported.
FramebufferBase
This is a pointer to the framebuffer. It is required by the
ScanlineScreenToScreenColorExpand, and is automatically
initialized. It should not be set up in a chip-specific driver.
BitsPerPixel
This is the number of bits per pixel, stored here for convenience.
There's no need to initialize this from a driver.
FramebufferWidth
This is the width of the framebuffer in pixels, stored here for
convenience. There's no need to initialize this from a driver.
ScratchBufferAddr
ScratchBufferSize
This specifies the linear address in bytes and size of the scratch
buffer used for ScanlineScreenToScreenColorExpand operations.
ScratchBufferBase
This is a pointer to the mapped video memory of the scratch buffer.
When not defined, the scratch buffer is assumed to be at the
specified offset (ScratchBufferAddr) into a linear framebuffer.
This field should only be initialized when using
ScanlineScreenToScreenColorExpand with a non-linear framebuffer,
in which case it should be noted that it is totally independent
from ScratchBufferAddr.
PingPongBuffers
This field defines the number of alternating buffers used in the
scratch buffer for ScanlineScreenToScreenColorExpand. The default
is two. Depending on the implementation and size of framebuffer and
coprocessor write buffers on the chip, you might need more than
two.
ErrorTermBits
Indicates the number of bits of precision for the Bresenham line
error terms. The absolute values of the terms are guaranteed
to be in the range [0, 2 ^ ErrorTermBits - 1]. If your registers
have 14 significant bits, you would probably use 13 here because of
the sign bit.
ServerInfoRec
This is a pointer to the XFree86 server InfoRec. It must be defined.
The InitPixmapCache function initializes it for compatibility with
earlier versions of XAA. The SVGA server initializes it
automatically.
PixmapCacheMemoryStart
PixmapCacheMemoryEnd
These values must be defined if the pixmap cache is enabled. The
InitPixmapCache function initializes them, for compatibility with
earlier versions of XAA.
1.4 Commonly Used Parameters
This section clarifies the format of some of the commonly used
parameters in the low-level functions (as described above).
Coordinates ("x", "y") are pixel coordinates unless otherwise noted.
The width and height ("w", "h") define the size of the area involved in
pixel units.
Colors (named "color", "bg" or "fg") are simple pixel values. They are
not "replicated" over the 32-bit integer argument. So for example in
8bpp mode, bits 0-7 of the value represent the pixel value, and the
rest of the bits is zero. If your chip requires a "replicated" 32-bit
pixel value (4 duplicated pixels for 8bpp), you will have to do that in
your low-level functions implementation.
The planemask is a mask that defines what bits in the pixel value
are to be modified on the screen. Again, this value cannot be assumed
to be "replicated" to 32-bit in 8bpp and 16bpp modes.
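If a chip wants a "replicated" value, the low-level function has to
build it. A minimal sketch, with replicate_pixel as a hypothetical
helper name:

```c
#include <stdint.h>

/* Duplicate a pixel value (or planemask) across a 32-bit word:
 * four copies at 8bpp, two at 16bpp, unchanged otherwise. */
uint32_t replicate_pixel(uint32_t value, int bpp)
{
    switch (bpp) {
    case 8:
        value &= 0xff;
        value |= value << 8;
        value |= value << 16;
        break;
    case 16:
        value &= 0xffff;
        value |= value << 16;
        break;
    default:
        break;
    }
    return value;
}
```

The same helper works for the planemask, since it is passed in the
same non-replicated form.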
The raster-op ("rop") is one of the 16 raster-operations that X
defines:
#define GXclear 0x0 /* 0 */
#define GXand 0x1 /* src AND dst */
#define GXandReverse 0x2 /* src AND NOT dst */
#define GXcopy 0x3 /* src */
#define GXandInverted 0x4 /* NOT src AND dst */
#define GXnoop 0x5 /* dst */
#define GXxor 0x6 /* src XOR dst */
#define GXor 0x7 /* src OR dst */
#define GXnor 0x8 /* NOT src AND NOT dst */
#define GXequiv 0x9 /* NOT src XOR dst */
#define GXinvert 0xa /* NOT dst */
#define GXorReverse 0xb /* src OR NOT dst */
#define GXcopyInverted 0xc /* NOT src */
#define GXorInverted 0xd /* NOT src OR dst */
#define GXnand 0xe /* NOT src OR NOT dst */
#define GXset 0xf /* 1 */
For each graphics operation you can define that only GXcopy is supported
by setting the GXCOPY_ONLY flag in the flags for that particular
operation. Similarly, NO_PLANEMASK indicates that the plane mask is
not supported.
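The numbering of the raster-ops is a truth table: bit 0 of the code
covers (src=1, dst=1), bit 1 (src=1, dst=0), bit 2 (src=0, dst=1) and
bit 3 (src=0, dst=0). A software model of a raster-op combined with a
planemask can therefore be written as below; apply_rop is a
hypothetical name, useful for example when checking what the hardware
should have produced.

```c
#include <stdint.h>

/* Combine src and dst according to one of the 16 X raster-ops, then
 * let the planemask select which destination bits are modified. */
uint32_t apply_rop(int rop, uint32_t src, uint32_t dst,
                   uint32_t planemask)
{
    uint32_t r = 0;

    if (rop & 0x1) r |=  src &  dst;
    if (rop & 0x2) r |=  src & ~dst;
    if (rop & 0x4) r |= ~src &  dst;
    if (rop & 0x8) r |= ~src & ~dst;

    /* Bits outside the planemask keep their old value. */
    return (r & planemask) | (dst & ~planemask);
}
```

With GXcopy (0x3), source 0xAA, destination 0x55 and planemask 0x0F,
only the low nibble is replaced, giving 0x5A.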
1.5 The best strategy
Start with simple filled solid rectangles and screen-to-screen copies
(BitBLT). Those two functions alone will accelerate the vast majority
of graphic operations requested. The sample driver can be used as a
starting point.
Next you might want to look at color expansion (CPUToScreen, or if
that can't be done, ScanlineScreenToScreen), BresenhamLine or
TwoPointLine, and Fill8x8Pattern/ColorExpand8x8Pattern.
The relative win of separately implementing functions that are already
accelerated with solid filled rectangles varies, but it can make a
difference since just using rectangle fills has some overhead. You may
be able to make better use of features of the graphics chip, and better
exploit CPU/graphics concurrency, although this is already done by the
generic code for some operations (such as filled polygons and arcs).
2 Acceleration hooks
Many operations can be "hooked" at a higher level, instead of just
defining the low-level functions. This can be useful for existing code
or operations for which there are no adequate low-level functions. What
follows is a description of most of the functions that can be hooked.
[This isn't complete]
2.1 Filled Rectangles
Rectangles can be filled with a single source color, or with three
different types of repeating pattern:
Stipple: a transparent bitmap pattern where 1's correspond to the
foreground color.
Opaque stipple: a bitmap pattern where 0's correspond to the
background color and 1's to the foreground color.
Tile: an image pattern that can have full pixel depth.
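The difference between the three pattern types can be sketched in software as follows. This is only an illustration for 8bpp: the function names are made up, the bit order within a pattern byte is assumed to be most significant bit first, and the tile origin is assumed to be at (0, 0) with non-negative coordinates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Expand one row of a monochrome pattern into 8bpp pixels.
 * bits:   pattern row, one bit per pixel, MSB first (an assumption;
 *         hardware may use either bit order)
 * opaque: nonzero for opaque stipple, zero for transparent stipple
 */
void expand_stipple_row(const uint8_t *bits, size_t width,
                        uint8_t fg, uint8_t bg, int opaque, uint8_t *dst)
{
    size_t x;

    assert(bits != NULL && dst != NULL);
    for (x = 0; x < width; x++) {
        int bit = (bits[x / 8] >> (7 - x % 8)) & 1;

        if (bit)
            dst[x] = fg;          /* 1's take the foreground color */
        else if (opaque)
            dst[x] = bg;          /* 0's take the background color */
        /* transparent stipple: 0's leave the destination untouched */
    }
}

/* Tile: a full-depth image that repeats every tile_w x tile_h pixels. */
uint8_t tile_pixel(const uint8_t *tile, int tile_w, int tile_h, int x, int y)
{
    assert(tile != NULL && tile_w > 0 && tile_h > 0 && x >= 0 && y >= 0);
    return tile[(y % tile_h) * tile_w + (x % tile_w)];
}
```

With the pattern byte 0xa0 and a destination row of four pixels, the transparent case changes only pixels 0 and 2, while the opaque case writes all four.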
2.1.1 Solid Filled Rectangles
Solid filled rectangles are a very common operation. Apart from
a regular solid fill, special raster ops are often used, for example
for inverting the destination.
To define a simple function for drawing one filled rectangle that will
be used for many kinds of operation, use this:
xf86AccelInfoRec.SetupForFillRectSolid = MySetupForFillRectSolid;
xf86AccelInfoRec.SubsequentFillRectSolid = MySubsequentFillRectSolid;
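As the names suggest, the work is split in two: the setup function programs whatever stays the same for a batch of rectangles, and the subsequent function is called once per rectangle. The sketch below mimics that split against an in-memory array. The struct is a stand-in, not the real xf86AccelInfoRec, and the argument lists are assumptions; check xf86xaa.h for the actual prototypes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FB_WIDTH  16
#define FB_HEIGHT 8

/* Stand-ins for video memory and a chip color register (hypothetical). */
static uint8_t framebuffer[FB_HEIGHT][FB_WIDTH];
static uint8_t fill_color_reg;

/* Hypothetical stand-in for the two xf86AccelInfoRec members. */
struct accel_hooks {
    void (*SetupForFillRectSolid)(int color, int rop, unsigned planemask);
    void (*SubsequentFillRectSolid)(int x, int y, int w, int h);
};

/* Called once per batch: program state shared by all rectangles. */
static void MySetupForFillRectSolid(int color, int rop, unsigned planemask)
{
    (void)rop;          /* sketch assumes GXcopy ... */
    (void)planemask;    /* ... and a full planemask */
    fill_color_reg = (uint8_t)color;
}

/* Called per rectangle: only the geometry changes. */
static void MySubsequentFillRectSolid(int x, int y, int w, int h)
{
    int row;

    assert(x >= 0 && y >= 0 && w >= 0 && h >= 0);
    assert(x + w <= FB_WIDTH && y + h <= FB_HEIGHT);
    for (row = y; row < y + h; row++)
        memset(&framebuffer[row][x], fill_color_reg, (size_t)w);
}

struct accel_hooks hooks = {
    MySetupForFillRectSolid,
    MySubsequentFillRectSolid
};
```

A caller would invoke hooks.SetupForFillRectSolid once and then hooks.SubsequentFillRectSolid for each rectangle, which keeps the per-rectangle command overhead low.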
If you accelerate solid filled rectangles, and have a complete
replacement for PolyFillRect that handles clipping, do this:
xf86GCInfoRec.PolyFillRectSolid = MyPolyFillRect;
If you don't handle clipping, but do have a replacement for accelerated
solid filled rectangles, do this:
xf86GCInfoRec.PolyFillRectSolid = xf86PolyFillRect;
xf86AccelInfoRec.FillRectSolid = MyFillRectSolid;
In all cases, the following flags can be set in
xf86GCInfoRec.FillRectSolidFlags:
GXCOPY_ONLY Only the raster-op GXcopy is supported.
NO_PLANEMASK No special planemask is supported.
RGB_EQUAL Only a foreground color with same values
for red, green and blue is accepted.
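RGB_EQUAL typically comes up when a 24bpp fill is done with the chip in 8bpp mode: every byte written is the same, so the result is only right if red, green and blue are equal. A minimal sketch of the check and of the geometry change follows. The names are made up, and it assumes packed 24bpp (three bytes per pixel) and a color laid out as 0xRRGGBB.

```c
#include <assert.h>
#include <stdint.h>

struct rect { int x, y, w, h; };

/* Nonzero if a 24-bit 0xRRGGBB color has equal red, green and blue. */
int rgb_equal(uint32_t color)
{
    uint32_t r = (color >> 16) & 0xff;
    uint32_t g = (color >> 8) & 0xff;
    uint32_t b = color & 0xff;

    assert(color <= 0xffffffu);
    return r == g && g == b;
}

/*
 * Geometry of the equivalent 8bpp fill for a packed 24bpp rectangle:
 * horizontal positions and widths are counted in bytes, so they triple.
 */
struct rect rect_24bpp_as_8bpp(struct rect r)
{
    r.x *= 3;
    r.w *= 3;
    return r;
}
```

A fill with color 0x808080 passes the check and can be issued as an 8bpp fill with the byte value 0x80; a fill with 0x808081 would have to go to a fallback.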
2.1.2 Tiled Filled Rectangles
If you have the required low-level functions and enable PIXMAP_CACHE,
the pixmap cache will be used to draw tiles. For tiles, you just need
ScreenToScreenCopy.
If you accelerate tiled filled rectangles, and have a complete
replacement for PolyFillRect that handles clipping, do this:
xf86GCInfoRec.PolyFillRectTiled = MyPolyFillRect;
If you don't handle clipping, but do have accelerated tiled filled
rectangles, do this:
xf86GCInfoRec.PolyFillRectTiled = xf86PolyFillRect;
xf86AccelInfoRec.FillRectTiled = MyFillRectTiled;
In both cases, the following flags can be set in
xf86GCInfoRec.FillRectTiledFlags:
GXCOPY_ONLY Only the raster-op GXcopy is supported.
NO_PLANEMASK No special planemask is supported.
2.1.3 Stippled Filled Rectangles
If you have the required low-level functions and enable PIXMAP_CACHE,
the pixmap cache will be used to draw stipples. For stipples, you just need
ScreenToScreenCopy with support for transparency.
If you accelerate stippled filled rectangles, and have a complete
replacement for PolyFillRect that handles clipping, do this:
xf86GCInfoRec.PolyFillRectStippled = MyPolyFillRect;
If you don't handle clipping, but do have accelerated stippled filled
rectangles, do this:
xf86GCInfoRec.PolyFillRectStippled = xf86PolyFillRect;
xf86AccelInfoRec.FillRectStippled = MyFillRectStippled;
In both cases, the following flags can be set in
xf86GCInfoRec.FillRectStippledFlags:
GXCOPY_ONLY Only the raster-op GXcopy is supported.
NO_PLANEMASK No special planemask is supported.
2.1.4 Opaque Stippled Filled Rectangles
If you have the required low-level functions and enable PIXMAP_CACHE,
the pixmap cache will be used to draw stipples. For stipples, you just need
ScreenToScreenCopy.
If you accelerate opaque stippled filled rectangles, and have a complete
replacement for PolyFillRect that handles clipping, do this:
xf86GCInfoRec.PolyFillRectOpaqueStippled = MyPolyFillRect;
If you don't handle clipping, but do have accelerated opaque stippled
filled rectangles, do this:
xf86GCInfoRec.PolyFillRectOpaqueStippled = xf86PolyFillRect;
xf86AccelInfoRec.FillRectOpaqueStippled = MyFillRectOpaqueStippled;
In both cases, the following flags can be set in
xf86GCInfoRec.FillRectOpaqueStippledFlags:
GXCOPY_ONLY Only the raster-op GXcopy is supported.
NO_PLANEMASK No special planemask is supported.
2.2 Filled Spans
Filled spans can be used for many purposes, mostly filled areas
of different shapes. The fill style can be solid (by far the most
useful), tiled, stippled and opaque stippled.
If you accelerate solid filled spans, and have a complete
replacement for FillSpansSolid that handles clipping, do this:
xf86GCInfoRec.FillSpansSolid = MyFillSpansSolid;
And similarly for other fill styles:
xf86GCInfoRec.FillSpansTiled = MyFillSpansTiled;
xf86GCInfoRec.FillSpansStippled = MyFillSpansStippled;
xf86GCInfoRec.FillSpansOpaqueStippled = MyFillSpansOpaqueStippled;
If you don't handle clipping, but do have a function for drawing solid
filled spans, do this:
xf86GCInfoRec.FillSpansSolid = xf86FillSpans;
xf86AccelInfoRec.FillSpansSolid = MyFillSpansSolid;
In all cases, the following flags can be set in
xf86GCInfoRec.FillSpansSolidFlags (and similarly for other fill styles):
GXCOPY_ONLY Only the raster-op GXcopy is supported.
NO_PLANEMASK No special planemask is supported.
RGB_EQUAL Only a foreground color with same values
for red, green and blue is accepted.
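A span is just a horizontal run of pixels, so a solid span function can be thought of as a loop over one-pixel-high rectangle fills. The sketch below shows that idea against an in-memory array. All names here are hypothetical, the spans are assumed to be clipped already, and only GXcopy is handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FB_WIDTH  32
#define FB_HEIGHT 8

static uint8_t framebuffer[FB_HEIGHT][FB_WIDTH];

struct span { int x, y, width; };

/* Hypothetical low-level solid fill (GXcopy only). */
static void fill_rect_solid(int x, int y, int w, int h, uint8_t color)
{
    int row;

    assert(x >= 0 && y >= 0 && x + w <= FB_WIDTH && y + h <= FB_HEIGHT);
    for (row = y; row < y + h; row++)
        memset(&framebuffer[row][x], color, (size_t)w);
}

/* Draw already-clipped spans; returns the number of fill commands issued. */
int fill_spans_solid(const struct span *spans, int nspans, uint8_t color)
{
    int i, commands = 0;

    for (i = 0; i < nspans; i++) {
        if (spans[i].width <= 0)
            continue;               /* nothing to draw for empty spans */
        fill_rect_solid(spans[i].x, spans[i].y, spans[i].width, 1, color);
        commands++;
    }
    return commands;
}
```

Every span costs one command, so the per-command overhead of the chip dominates when the spans are short.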
2.3 Filled Arcs
If you accelerate filled solid arcs, and have a complete replacement
for PolyFillArc that handles clipping, do this:
xf86GCInfoRec.PolyFillArc = MyPolyFillArc;
The following flags can be set in xf86GCInfoRec.PolyFillArcFlags:
GXCOPY_ONLY Only the raster-op GXcopy is supported.
NO_PLANEMASK No special planemask is supported.
If you have a function for accelerated solid horizontal spans, it will
automatically be taken advantage of for filled arcs.
2.4 Text
There are two kinds of text: transparent text (the background is not
written) and image text (the background is filled with the background
color).
There are also two types of font: terminal-emulator fonts, which have
characters that are all the same size, and non-terminal emulator fonts,
which have characters of varying size.
In the case of image text with a non-terminal emulator font, the filled
background corresponds to the bounding box of the text image.
2.4.1 Transparent Text
If you accelerate transparent text strings, and have a complete replacement
for PolyGlyphBlt that handles clipping, do this if you accelerate
terminal-emulator fonts:
xf86GCInfoRec.PolyGlyphBltTE = MyPolyGlyphBltTE;
And if you also support non-terminal emulator fonts:
xf86GCInfoRec.PolyGlyphBltNonTE = MyPolyGlyphBltNonTE;
If you don't handle clipping, but do have accelerated transparent text:
xf86GCInfoRec.PolyGlyphBltTE = xf86PolyGlyphBltTE;
xf86AccelInfoRec.PolyTextTE = MyPolyTextTE;
And similarly for non-terminal emulator fonts:
xf86GCInfoRec.PolyGlyphBltNonTE = xf86PolyGlyphBltNonTE;
xf86AccelInfoRec.PolyTextNonTE = MyPolyTextNonTE;
2.4.2 Image text
If you accelerate image text strings, and have a complete replacement
for ImageGlyphBlt that handles clipping, do this if you accelerate
terminal-emulator fonts:
xf86GCInfoRec.ImageGlyphBltTE = MyImageGlyphBltTE;
And if you also support non-terminal emulator fonts:
xf86GCInfoRec.ImageGlyphBltNonTE = MyImageGlyphBltNonTE;
If you don't handle clipping, but do have accelerated image text:
xf86GCInfoRec.ImageGlyphBltTE = xf86ImageGlyphBltTE;
xf86AccelInfoRec.ImageTextTE = MyImageTextTE;
And similarly for non-terminal emulator fonts:
xf86GCInfoRec.ImageGlyphBltNonTE = xf86ImageGlyphBltNonTE;
xf86AccelInfoRec.ImageTextNonTE = MyImageTextNonTE;
2.5 CopyArea
Screen-to-screen area copies (BitBLTs) are extremely useful. They are
vital for smooth scrolling and dragging of windows. Unaccelerated, this
operation is often slow because reads from the framebuffer are slow.
This function can also be used to great effect
for caching mechanisms for patterns and fonts, when support for it
is added.
If you accelerate screen-to-screen area copies (BitBLTs), and have a
complete replacement for CopyArea that handles clipping, do this:
xf86GCInfoRec.CopyArea = MyCopyArea;
If you don't handle clipping, but do have an accelerated CopyArea:
xf86GCInfoRec.CopyArea = xf86CopyArea;
xf86AccelInfoRec.ScreenToScreenBitBlt = MyScreenToScreenBitBlt;
In all cases, the following flags can be set in
xf86GCInfoRec.CopyAreaFlags:
GXCOPY_ONLY Only the raster-op GXcopy is supported.
NO_PLANEMASK No special planemask is supported.
NO_TRANSPARENCY Transparency color compare is not supported.
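Two things a screen-to-screen copy has to get right are overlapping source and destination areas (scrolling is the usual case) and, if supported, transparency color compare. Here is a software sketch of both. The function name and argument list are made up; the direction logic is the generic blitter technique of copying away from the overlap, not something taken from the XAA code.

```c
#include <assert.h>
#include <stdint.h>

#define FB_WIDTH  16
#define FB_HEIGHT 16

static uint8_t framebuffer[FB_HEIGHT][FB_WIDTH];

/*
 * Copy a w x h block from (sx, sy) to (dx, dy) within one framebuffer.
 * transparent < 0 disables color compare; otherwise source pixels equal
 * to that value are skipped, leaving the destination as it was.
 */
void screen_to_screen_copy(int sx, int sy, int dx, int dy, int w, int h,
                           int transparent)
{
    /* Walk away from the overlap so that no source pixel is overwritten
       before it has been read. */
    int ydir = (dy > sy) ? -1 : 1;
    int xdir = (dx > sx) ? -1 : 1;
    int i, j;

    assert(sx >= 0 && sy >= 0 && sx + w <= FB_WIDTH && sy + h <= FB_HEIGHT);
    assert(dx >= 0 && dy >= 0 && dx + w <= FB_WIDTH && dy + h <= FB_HEIGHT);

    for (j = 0; j < h; j++) {
        int row = (ydir > 0) ? j : h - 1 - j;

        for (i = 0; i < w; i++) {
            int col = (xdir > 0) ? i : w - 1 - i;
            uint8_t pixel = framebuffer[sy + row][sx + col];

            if (transparent >= 0 && pixel == (uint8_t)transparent)
                continue;
            framebuffer[dy + row][dx + col] = pixel;
        }
    }
}
```

A chip that cannot do the color compare is what NO_TRANSPARENCY describes; in terms of this sketch it would only ever be called with transparent set to -1.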
3. Opportunities For Improvement
- The graphics operation flags aren't consistent. There should be
separate flags indicating the restrictions for the lower-level
functions.
- VT-switching awareness has not been extensively tested, and the
current implementation has a few rough edges.
- Solid tile fill may be faster with cfb in some cases (if the chip
doesn't have much video memory bandwidth to play with and the PCI
bus bandwidth is decent).
- Having a function for clipped filled spans that clips on the fly. This
doesn't exist yet anywhere in the source tree. This would be a minor
speed up for things like clipped filled polygons and arcs, and wide
lines.
- Having the pixmap cache store stipples in monochrome format, and using
color expansion features of the graphics chip to replicate them. This
is more efficient since less video memory bandwidth is required for
the cached pattern source. Not all chips support this kind of operation
easily, especially w.r.t. clipping of the leftmost edge (the first pixel
to be drawn may start at some bit of the leftmost video memory byte), and
defining the location of the monochrome pattern in video memory can be
a little complex.
- Taking more advantage of built-in (8x8) chip pattern registers. This
works OK now, but things not implemented include detection of
tiles that have only two colors so that they can be done with
color-expand 8x8 pattern fill, and interleaving schemes allowing 16
and 32 pixel high patterns to be done using the hardware pattern. Also
some chips support 16x8 and 32x8 pattern fill at 8bpp by using 16bpp
or 32bpp pattern fill. Currently, support for chips that require the
pattern to be aligned on a 64-pixel boundary is missing in most cases,
which in practice means the 8x8 pattern is not usable for many chips.
- Font-caching (useful for configurations where it's not possible to use
color expansion for text, and for certain fonts). Non-"terminal emulator"
fonts are certainly a weak area of XAA.
- Complete implementation of non-terminal emulator font text acceleration
using color expansion (the code is in place, but causes problems).
- Generic hardware-cursor code (this sounds very useful to me), including
Harald Koenig's support for real-time software/hardware cursor switching.
- More complete 24bpp-in-8bpp-mode support. Missing is full implementation
of color expansion schemes to allow 24bpp fills in 8bpp mode in two
passes.
- The Pentium optimized text bitmap functions exist only for 6 and 8-pixel
wide fonts. BTW, on a Cyrix 6x86 the Pentium-optimized 6-wide function
seems to cause a 2% performance decrease.
- Accelerated stipples using direct color expansion would definitely be
worthwhile. The lowest-level function is in place (but untested). It
would take care of cases where the font cache cannot be used (such as
24bpp, lack of transparency color compare for transparent stipples,
lack of off-screen video memory), or when color expansion is faster
(generally on video memory bandwidth-starved configurations).
3.1 More Concurrency?
More concurrency between graphics and CPU processing sounds very
attractive. This can be implemented by not "syncing" when leaving the
graphic drawing code, but instead allowing graphics commands to
continue while X is doing its request processing, or even during
context switching or when the client is running. The ever larger PCI
write buffers help to make this a very nice optimization. This requires
awareness of coprocessor activity at several levels in the server code
(for example, at any point where something is read or written to the
video card).
There are variations between chipsets that affect how easily they would
support such a scheme. The best behaviour is what I would call "in
order execution" of coprocessor commands and simple CPU writes to the
framebuffer. That is, if you send some graphics coprocessor commands to
the chip, and then write something to the framebuffer, it is guaranteed
that the framebuffer writes will only happen when the graphics commands
have been completed. This avoids, for example, having to check for
coprocessor activity each time something is drawn with a "dumb"
framebuffer function. I think that PCI write buffers on the motherboard
generally follow this behaviour, but graphics chips generally do not.
Of course, reading or querying anything from the graphics card is
something you will want to avoid, since in most cases this will result
in the CPU being stalled until all the PCI and on-chip write buffers
are flushed and processed. Chips that require frequent querying or do
not allow concurrent coprocessor execution and CPU framebuffer access
will benefit much less.
A somewhat wild way to test this kind of scheme is to simply not define
the BACKGROUND_OPERATIONS flag, but despite that not do any syncing
in the graphics primitives. Without BACKGROUND_OPERATIONS set, the XAA
code almost never calls Sync itself. Someone (inadvertently) tried
this on an ET6000, and it seemed to measurably increase
performance. This is of course hazardous and prone to lock-ups etc.
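A less hazardous variant of the same idea is to remember that the coprocessor may be busy and to sync only when the CPU is about to touch the framebuffer. The following is a sketch of that bookkeeping only. All names are hypothetical, and chip_sync() stands in for a chip-specific wait-for-idle loop.

```c
#include <assert.h>

/* Hypothetical driver state: nonzero while the coprocessor may be busy. */
static int coprocessor_busy;
static int sync_count;          /* counts waits, for illustration only */

/* Stand-in for a chip-specific wait-for-idle loop. */
static void chip_sync(void)
{
    assert(coprocessor_busy);   /* only worth calling when work is pending */
    sync_count++;
    coprocessor_busy = 0;
}

/* Issue an accelerated command and return without waiting for it. */
void accel_command(void)
{
    coprocessor_busy = 1;
}

/* Call before any CPU read or write of the framebuffer. */
void prepare_cpu_access(void)
{
    if (coprocessor_busy)
        chip_sync();
}
```

Any number of accelerated commands can then be queued back to back, and the wait is paid at most once, at the first "dumb" framebuffer access that follows. On a chip with in-order execution even that wait would be unnecessary for writes.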
4. Comparison Of Chip-specific Implementations
4.1 Current Chip-specific Implementations
ARK Logic
Uses BACKGROUND_OPERATIONS and COP_FRAMEBUFFER_CONCURRENCY. The
latter is vital for high-performance color expansion, since the
ARK chips don't appear to have CPU-to-screen color expansion.
There's no need to "sync" during a batch of accelerator commands;
the ARK chips seem to have "PCI-Retry" support.
Screen locations are programmed as pixel addresses. The ARK chip
also supports coordinates, but that restricts the possible
framebuffer widths and I don't think it would be faster.
FillRectSolid is provided. At 24bpp, it uses 8bpp coprocessor mode
which leads to RGB_EQUAL and NO_PLANEMASK restrictions.
ScreenToScreenCopy is supported, again with restrictions at 24bpp:
NO_PLANEMASK and NO_TRANSPARENCY. BresenhamLine is very
straightforward.
Fill8x8Pattern is supported; the ARK chip requires the pattern to
be aligned on a 64-pixel boundary and the address modulo 64 seems
to indicate the vertical offset (y origin)
(HARDWARE_PATTERN_MOD_64_OFFSET). The latter means the pattern can
actually be used (when the framebuffer width is a multiple of 64),
despite the limited support for 64-pixel pattern alignment in XAA.
The ARK chips don't seem to have support for a monochrome pattern.
Color expansion is implemented using
ScanlineScreenToScreenColorExpand (24bpp: RGB_EQUAL, NO_PLANEMASK),
which is pretty fast thanks to COP_FRAMEBUFFER_CONCURRENCY.
Color expansion flags are VIDEO_SOURCE_GRANULARITY_PIXEL and
BIT_ORDER_IN_BYTE_LSBFIRST. At 24bpp, TRIPLE_BITS_24BPP would be
useful but is not yet supported by XAA.
ScreenToScreenColorExpand is provided for future use by XAA. One
thing that ARK chips can accelerate but is not yet provided by XAA
is styled (patterned) line drawing.
Cirrus Logic GD5426/28/29/30/34/40/46 and 7543/48
Uses BACKGROUND_OPERATIONS. The driver is shared by a very wide
range of largely compatible chips, from the first-generation
accelerator CL-GD5426 to the recent CL-GD5446, which is the only
one to support COP_FRAMEBUFFER_CONCURRENCY and also doesn't need
"sync"-ing between coprocessor operations. Screen locations are
programmed as byte addresses (which makes the driver larger than,
for example, ARK). The driver is compiled twice, with programmed I/O
(required for earlier chips) and with memory-mapped I/O.
FillRectSolid is provided (NO_PLANEMASK, since the chips don't
support a planemask); at 24bpp on a non-5436/46 it uses 8bpp mode,
in which case RGB_EQUAL is set.
ScreenToScreenCopy is supported (NO_PLANEMASK). A few chips
(5429/30/34) don't support transparency color compare at all
(NO_TRANSPARENCY), and none of the chips support it at pixel depths
greater than 16bpp.
For CPU-to-screen color expansion, chips earlier than the CL-GD5436
don't support DWORD padding of scanlines, so the XAA code isn't
usable for them. Instead, these chips use byte-padding-aware text
acceleration code from the old accelerated driver, and the
ScanlineScreenToScreenColorExpand method (which isn't very fast
on these chips) is provided for other things. NO_PLANEMASK. The
5436/46 support 24bpp color expansion, but only with transparency
(ONLY_TRANSPARENCY_SUPPORTED); the others would benefit from
TRIPLE_BITS_24BPP. The bit order is BIT_ORDER_IN_BYTE_MSBFIRST. The
LEFT_EDGE_CLIPPING parameter (a value from 0 to 7) is supported for
CPU-to-screen color expansion. Screen-to-screen color expansion is
provided for future use. It requires the source to be aligned on a
DWORD boundary (VIDEO_SOURCE_GRANULARITY_DWORD).
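The scanline padding and bit order issues mentioned above come down to a little arithmetic on the monochrome source data. A small sketch follows; the helper names are made up and not part of XAA.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per scanline of a width-pixel monochrome bitmap, byte-padded. */
unsigned scanline_bytes_byte_pad(unsigned width)
{
    assert(width > 0);
    return (width + 7) / 8;
}

/* The same with each scanline padded to a 32-bit (DWORD) boundary. */
unsigned scanline_bytes_dword_pad(unsigned width)
{
    assert(width > 0);
    return ((width + 31) / 32) * 4;
}

/* Reverse the bit order within a byte (MSB-first <-> LSB-first). */
uint8_t reverse_bits(uint8_t b)
{
    b = (uint8_t)(((b & 0xf0) >> 4) | ((b & 0x0f) << 4));
    b = (uint8_t)(((b & 0xcc) >> 2) | ((b & 0x33) << 2));
    b = (uint8_t)(((b & 0xaa) >> 1) | ((b & 0x55) << 1));
    return b;
}
```

For a 6-pixel wide glyph, byte padding needs 1 byte per scanline where DWORD padding needs 4, which is why code that assumes DWORD-padded scanlines can't simply be reused on a chip that only takes byte-padded data. Source data in the wrong bit order for the chip has to pass through something like reverse_bits() first.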
Matrox Millennium
BACKGROUND_OPERATIONS
24bpp: NO_PLANEMASK
FillRectSolid
ScreenToScreenCopy (NO_TRANSPARENCY)
Color expansion:
    CPUToScreenColorExpand
        SCANLINE_PAD_DWORD
        CPU_TRANSFER_PAD_DWORD
        BIT_ORDER_IN_BYTE_LSBFIRST
        LEFT_EDGE_CLIPPING
    ScreenToScreenColorExpand
        VIDEO_SOURCE_GRANULARITY_PIXEL
4.2 Chip-specific Performance
This table is intended to help with determining what kinds of
operations best suit a particular chip. It shows the results (in MB/s)
for the low-level bandwidth benchmarks run at start-up. Because refresh
is disabled at the time the benchmark is run, the result reflects the
full DRAM bandwidth on DRAM-based cards (the dot clock doesn't really
matter). For this reason, the comparison isn't really fair (biased against
VRAM/WRAM and MDRAM). The virtual display width can have an influence.
Chip                ARK1000PV  Trid9385   CLGD5434   TGUI9440   MGA-Mill   ET6000
Memory              1MB DRAM   2MB DRAM   2MB DRAM   1MB DRAM   2MB WRAM   2MB MDRAM
CPU                 DX4/100    ?          DX4/100    AMK5/100   AMK5/100   6x86P150+
Bus                 PCI 33MHz  PCI        VLB 33MHz  PCI 33MHz  PCI 33MHz  PCI 30MHz
bpp, width          8bpp 1024  8bpp       8bpp 1024  8bpp       8bpp       8bpp
------------------------------------------------------------------------------------
framebuffer         43.95      15.89      32.76      44.23      44.48      61.40
solid filled rect
  10x1              7.38       3.14       4.58       8.72       34.99      28.22
  40x40             85.82      89.93      120.34     62.60      369.84     143.84
  400x400           108.81     157.20     211.18     80.03      1618.11    264.35
screen copy
  10x10             24.77      11.26      20.49      18.53      28.14      24.22
  40x40             38.81      43.90      41.68      32.11      89.27      70.57
  400x400           46.70      68.59      54.47      34.14      126.88     194.23
  400x400 scroll    -          -          55.63      40.22      -          189.34
8x8 pattern fill
  400x400           105.16     116.08     -          80.02      -          264.34
color expansion
  CPU to screen     -          -          116.25     -          261.03*    -
  scanl. scr-to-scr 71.75      -          80.64      -          -          187.25
  10x10 scr-to-scr  29.90      -          26.69      20.07      -          -
Chip                MGA-Mill   MGA-Mill   MGA-Mill   TGUI9680   ARK2000PV
Memory              2MB WRAM   4MB WRAM   2MB WRAM   2MB DRAM   2MB EDO
CPU                 DX4/133    P133       P133                  DX4/100
Bus                 PCI 33MHz  PCI 33MHz  PCI 33MHz  PCI        PCI 33MHz
bpp, width          8bpp       8bpp 1024  8bpp 1152  8bpp 2048  8bpp 1024
-------------------------------------------------------------------------
framebuffer         26.48      83.70      83.13      10.09      64.44
solid filled rect
  10x1              28.63      34.28      41.30      3.06       7.91
  40x40             385.13     316.47     453.02     55.97      155.76
  400x400           1656.93    1367.64    1942.59    145.16     244.35
screen copy
  10x10             29.18      23.47      35.09      12.61      33.20
  40x40             81.52      74.55      99.92      35.38      78.80
  400x400           114.81     105.83     137.60     46.48      99.03
  400x400 scroll    -          -          -          51.88      100.08
8x8 pattern fill
  400x400           -          -          -          51.74      228.70
color expansion
  CPU to screen     211.11*    419.09*    416.23*    -          -
  scanl. scr-to-scr -          -          -          -          137.97
  10x10 scr-to-scr  -          -          -          15.26      36.79
Chip                ARK2000PV  ARK2000PV  MGA-Mill   CL-GD5426  CL-GD5446
Memory              2MB EDO    2MB EDO    2MB WRAM   1MB DRAM   2MB DRAM
CPU                 6x86P150+  6x86P166+  6x86P166+  DX4/100    P133
Bus                 PCI 30MHz  PCI 33MHz  PCI 33MHz  VLB 33MHz  PCI 33MHz
bpp, width          8bpp 1024  8bpp 1024  8bpp 1024  8bpp 1024  8bpp 1280
-------------------------------------------------------------------------
framebuffer         88.84      100.45     92.18      13.22      80.81
solid filled rect
  10x1              20.99      22.82      41.30      1.16       25.49
  40x40             157.00     157.00     458.38     28.88      167.10
  400x400           244.34     244.34     1961.74    41.74      218.02
screen copy
  10x10             33.22      33.23      35.09      5.35       34.39
  40x40             78.81      78.81      99.93      15.83      77.71
  400x400           99.13      99.02      137.87     20.98      97.03
  400x400 scroll    100.09     100.09     664.02     21.18      98.55
8x8 pattern fill
  400x400           228.68     228.72     -          41.10      217.91
color expansion
  CPU to screen     -                     730.52     -          221.10
  scanl. scr-to-scr 138.01     138.09     -          19.56      -
  10x10 scr-to-scr  36.87      36.86      39.19      6.16       66.46
(*) After this benchmark was taken, the color expansion benchmark was
changed to write a pattern including both colors instead of just the
background one, which is likely to affect the score.
The 10x1 filled rectangles score tells a lot about the command overhead
for small fills, which is important for operations that fill span-by-span.
The 10x10 and 40x40 screencopy give an impression of pixmap cache
efficiency, while the 10x10 score also indicates how a simple font cache
would perform (compare with color expansion). The 10x10 screen-to-screen
color expand score reflects a smarter kind of font cache.
If your implementation seems weak at a particular kind of operation, maybe
you are not doing it optimally and can improve it (usually by reducing
the command overhead, for example by minimizing the number of graphics
chip queries).
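To put a number on the command overhead, one can treat the large-rectangle score as the peak fill rate and see how much time per small rectangle is left unexplained. The sketch below does that. The model (a fixed cost per command plus bytes divided by peak rate) and the choice of 1 MB = 1e6 bytes are assumptions, and the function name is made up.

```c
#include <assert.h>

/*
 * Estimate the fixed per-command cost in microseconds, assuming
 *   time(bytes) = overhead + bytes / peak_rate
 * small_mbs:   score for the small operation, in MB/s
 * peak_mbs:    score for the large operation, taken as the peak rate
 * small_bytes: bytes touched by one small operation
 * With 1 MB taken as 1e6 bytes, MB/s is the same as bytes per
 * microsecond, so bytes / (MB/s) is a time in microseconds.
 */
double command_overhead_us(double small_mbs, double peak_mbs,
                           double small_bytes)
{
    double total_us, transfer_us;

    assert(small_mbs > 0.0 && peak_mbs > 0.0);
    total_us = small_bytes / small_mbs;
    transfer_us = small_bytes / peak_mbs;
    return total_us - transfer_us;
}
```

Fed with the 10x1 and 400x400 solid fill scores at 8bpp (10 bytes per small rectangle), the ARK1000PV column of the first table comes out at roughly 1.3 microseconds per command and the MGA-Mill column of the same table at roughly 0.3.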
5. Development notes
When adding a function to the GCInfoRec or AccelInfoRec, make sure to
have a Makefile with dependencies (run make depend after doing make
Makefile). If you don't, you're bound to get unexplainable core dumps.
That also applies to SVGA drivers using the new interface; they should
be recompiled after a new version of the generic acceleration code
is installed.
Header files:
vgabpp.h      Declares the new ScreenInit functions for each depth.
xf86xaa.h     General public definitions, including the GCInfoRec and
              AccelInfoRec.
xf86scrin.h   XAA screen initialization functions.
xf86local.h   Declares functions local to the generic acceleration code.
xf86gcmap.h   Maps names of some local functions to depth-specific
              versions.
xf86maploc.h  Declares local functions that are name-mapped depending on
              the depth.
vga256map.h   Maps the name of some cfb functions to their vga256
              equivalents. This is used for the vga256 version of the
              GC validation code.
xf86pcache.h  Some declarations for the pixmap cache.
xf86expblt.h  Declares monochrome data color-expansion blit functions
              defined in xf86expblt.c
6. Acknowledgements
The Mach64 server by Kevin Martin has been used as a base for some parts
(notably pixmap caching), and the set of functions accelerated in the
Mach64 server provided a baseline for what to implement first.
$XFree86: xc/programs/Xserver/hw/xfree86/xaa/NOTES,v 3.20 1998/07/26 00:40:48 dawes Exp $