1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
|
- Acceleration
- Blits and solid fill
- XAA and the shadow buffer will not work together, because the
shadow buffer updates in the block handler, so if we got any XAA
calls in between, things would get messed up.
Current plan:
- Add our own damage tracker that produces raw rectangles
- Whenever it fires, submit the copy immediately
- Wrap the necessary ops in such a way that the original
implementation gets called first. The original implementation
will use fb, which will produce damage, which will get
submitted.
If we decide to accelerate a particular operation, first set
a flag that the immediately following damage event should not
result in spice protocol being sent. Ie.,
on_op:
qxl->enable_copying = FALSE
call original;
send acceleration command
qxl->enable_copying = TRUE
Note damage is added before the drawing hits the framebuffer, so
it will have to be stored, then cleared
- in a block handler
- before accelerating
Ie.,
on_op:
clear damage
disable damage reporting
call original (this will generate unreported damage and
paint to the shadow)
submit command
enable damage
It may be possible to use the shadow code if we added a
shadowReportNow() that would report any existing
damage. Ie., basically export shadowRedisplay()
1. Get damage added, out of CreateScreenResources
2. Make sure it works
3. Submit copies and disable shadow
4. Delete shadow
5. Wrap some of the ops, or use XAA?
The input we get is:
- First a damage notification: "I am going to draw here"
- Then maybe an exa notification
So the algorithm is.
Maintain a "to_copy" region to be copied into the device
- in damage, if there is anything in to_copy, copy it
- in block handler, if there is anything in to_copy, copy it
- in exa, if we manage to accelerate, delete to_copy.
Unfortunately, for core text, what happens is
- damage is produced for the glyph box
- solid fill is generated
- the glyph is drawn
And the algorithm above means the damage is thrown away.
- Coding style fixes
- Better malloc() implementation
- Take malloc() from the windows driver?
- Put blocks in a tree?
- Find out why it picks 8x6 rather than a reasonable mode
- Possibly has to do with the timings it reports. RandR only
allows 8x6 and 6x4.
- Only compile mmtest if glib is installed
Or maybe just get rid of mmtest.c
- Notes on offscreen pixmaps
Yaniv says that PCI resources is a concern and that it would be better
if we can use guest memory instead of video memory. I guess we can
do that, given a kernel driver that can allocate pinned memory.
- If/when we add hardware acceleration to pixman, pixman will need to
generate QXL protocol. This could be tricky because DRM assumes that
everything is a pixmap, but qxl explicitly has a framebuffer. Same
goes for cairo-drm.
- Hashing
QXL has a feature where it can send hash codes for pixmaps. Unfortunately
most of the pixmaps we use are very shortlived. But there may be a benefit
for the root pixmap (and in general for the (few) windows that have
a pixmap background).
- When copying from pixmap to framebuffer, right now we just copy
the bits from the fb allocated pixmap.
- With hashing, we need to copy it to video memory, hash it, then set the
"unique" field to that hash value (plus the QXL_CACHE
flag). Presumably we'll get a normal remove on it when it is no
longer in use.
- If we know an image is available in video memory already, we should just
submit it. There is no race condition here because the image is
ultimately removed from vmem by the driver.
(Note hash value could probably just be XID plus a serial number).
- So for the proof of concept we'll be hashing complete pixmaps every time
we submit them.
- Tiles
It may be beneficial to send pixmaps in smaller tiles, though Yaniv
says we will need atomic drawing to prevent tearing.
- Video
We should certainly support Xv. The scaled blits should be sent
as commands, rather than as software. Does spice support YUV images?
If not, then it probably should.
- Multi-monitor:
- Windows may not support more than dual-head, but we do support more than
dual-head in spice. This is why they do the multi-pci device.
Ie,. the claim is that Yaniv did not find any API that would
support more than two outputs per PCI device. (This seems dubious
given that four-head cards do exist).
- Linux multi-monitor configuration supports hotplug of monitors,
and you can't make up PCI devices from inside the driver.
- On windows the guest agent is responsible for setting the monitors
and resolutions.
- On linux we should support EDID information, and enabling and
disabling PCI devices on the fly is pretty difficult to deal with
in X. Ie., we would need working support for both GPU hotplug and
for shatter. This is just not happening in RHEL 5 or 6.
- Reading back EDID over the spice protocol would be necessary
because when you hit detect displays, that's what needs to happen.
Better acceleration:
- Given offscreen pixmaps, we should get rid of the shadow framebuffer.
If we have to fall back to software, we can use the drawing area to
get the area in question, then copy them to qxl_malloced memory,
then draw there, then finally send the bits.
-=-=-=-=-
Done:
Question:
- Submit cursor images
- Note: when we set a mode, all allocated memory should be considered
released.
- What is the "vram" PCI range used for?
As I read the Windows driver, it can be mapped with the ioctl
VIDEO_MAP_VIDEO_MEMORY. In driver.c it is mapped as pdev->fb, but
it is then never used for anything as far as I can tell.
Does Windows itself use that ioctl, and if so, for what. The area
is only 32K in size so it can't really be used for any realistic
bitmaps.
It's a required ioctl. I believe it's needed for DGA-like things.
I have no idea how the Windows driver manages syncing for that,
but I think we can safely ignore it. [ajax]
- Hook up randr if it isn't already
- Garbage collection
- Before every allocation?
- When we run out of memory?
- Whenever we overflow some fixed pool?
- Get rid of qxl_mem.h header; just use qxl.h
- Split out ring code into qxl_ring.c
- Don't keep the maps around that are just used in preinit
(Is there any real reason to not just do the CheckDevice in
ScreenInit?)
|