<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % BOOK_ENTITIES SYSTEM "Wayland.ent">
%BOOK_ENTITIES;
]>
<chapter id="chap-Wayland-Architecture">
	<title>Wayland Architecture</title>
	<section id="sect-Wayland-Architecture-wayland_architecture">
		<title>X vs. Wayland Architecture</title>
		<para>
			A good way to understand the Wayland architecture,
			and how it differs from X, is to follow an event
			from the input device to the point where the change
			it produces appears on screen.
		</para>
		<para>
			This is where we are now with X:
		</para>
		<mediaobject>
			<imageobject>
				<imagedata fileref="images/x-architecture.png" format="PNG" />
			</imageobject>
		</mediaobject>
		<para>
			<orderedlist>
				<listitem>
					<para>
						The kernel gets an event from an input
						device and sends it to X through the evdev
						input driver. The kernel does all the hard
						work here by driving the device and
						translating the different device-specific
						event protocols to the Linux evdev input
						event standard (a sketch of what reading
						these events looks like follows this list).
					</para>
				</listitem>
				<listitem>
					<para>
						The X server determines which window the
						event affects and sends it to the clients
						that have selected for the event in question
						on that window. The X server doesn't
						actually know how to do this right, since
						the window location on screen is controlled
						by the compositor and may be transformed in
						a number of ways that the X server doesn't
						understand (scaled down, rotated, wobbling,
						etc.).
					</para>
				</listitem>
				<listitem>
					<para>
						The client looks at the event and decides
						what to do. Often the UI will have to change
						in response to the event - perhaps a check
						box was clicked or the pointer entered a
						button that must be highlighted. Thus the
						client sends a rendering request back to the
						X server.
					</para>
				</listitem>
				<listitem>
					<para>
						When the X server receives the rendering
						request, it sends it to the driver to let it
						program the hardware to do the rendering.
						The X server also calculates the bounding
						region of the rendering, and sends that to
						the compositor as a damage event.
					</para>
				</listitem>
				<listitem>
					<para>
						The damage event tells the compositor that
						something changed in the window and that it
						has to recomposite the part of the screen
						where that window is visible. The compositor
						is responsible for rendering the entire
						screen contents based on its scenegraph and
						the contents of the X windows. Yet, it has
						to go through the X server to render this.
					</para>
				</listitem>
				<listitem>
					<para>
						The X server receives the rendering requests
						from the compositor and either copies the
						compositor back buffer to the front buffer
						or does a pageflip. In the general case, the
						X server has to do this step itself so it
						can account for overlapping windows (which
						may require clipping) and decide whether or
						not it can page flip. However, for a
						compositor, which is always fullscreen, this
						is another unnecessary context switch.
					</para>
				</listitem>
			</orderedlist>
		</para>
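		<para>
			To make step 1 concrete, the following is a minimal
			sketch, not part of the original text, of what reading
			the Linux evdev event stream looks like in C. The device
			node path is a placeholder; real code would enumerate
			/dev/input.
		</para>
		<programlisting><![CDATA[
#include <fcntl.h>
#include <linux/input.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	struct input_event ev;
	/* Placeholder device node; a real reader enumerates /dev/input. */
	int fd = open("/dev/input/event0", O_RDONLY);

	if (fd < 0)
		return 1;

	/* Every device, whatever its hardware protocol, delivers the same
	 * fixed-size records: a type (EV_KEY, EV_REL, ...), a code and a
	 * value. This uniformity is the "hard work" the kernel does. */
	while (read(fd, &ev, sizeof ev) == sizeof ev)
		printf("type %d code %d value %d\n", ev.type, ev.code, ev.value);

	close(fd);
	return 0;
}
]]></programlisting>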
		<para>
			As suggested above, there are a few problems with this
			approach. The X server doesn't have the information to
			decide which window should receive the event, nor can it
			transform the screen coordinates to window-local
			coordinates. And even though X has handed responsibility for
			the final painting of the screen to the compositing manager,
			X still controls the front buffer and modesetting. Most of
			the complexity that the X server used to handle is now
			available in the kernel or in self-contained libraries
			(KMS, evdev, Mesa, fontconfig, FreeType, cairo, Qt, etc.).
			In general, the X server is now just a middleman that
			introduces an extra step between applications and the
			compositor and an extra step between the compositor and the
			hardware.
		</para>
		<para>
			In Wayland, the compositor is the display server. We
			transfer control of KMS and evdev to the compositor. The
			Wayland protocol lets the compositor send input events
			directly to the clients and lets the client send damage
			events directly to the compositor:
		</para>
		<mediaobject>
			<imageobject>
				<imagedata fileref="images/wayland-architecture.png" format="PNG" />
			</imageobject>
		</mediaobject>
		<para>
			<orderedlist>
				<listitem>
					<para>
						The kernel gets an event and sends
						it to the compositor. This
						is similar to the X case, which is
						great, since we get to reuse all the
						input drivers in the kernel.
					</para>
				</listitem>
				<listitem>
					<para>
						The compositor looks through its
						scenegraph to determine which window
						should receive the event. The
						scenegraph corresponds to what's on
						screen and the compositor
						understands the transformations that
						it may have applied to the elements
						in the scenegraph. Thus, the
						compositor can pick the right window
						and transform the screen coordinates
						to window-local coordinates by
						applying the inverse
						transformations. The transformations
						that can be applied to a window are
						restricted only by what the
						compositor can do, as long as it can
						compute the inverse transformation
						for the input events.
					</para>
				</listitem>
				<listitem>
					<para>
						As in the X case, when the client
						receives the event, it updates the
						UI in response. But in the Wayland
						case, the rendering happens in the
						client, and the client just sends a
						request to the compositor to
						indicate the region that was
						updated.
					</para>
				</listitem>
				<listitem>
					<para>
						The compositor collects damage
						requests from its clients and then
						recomposites the screen. The
						compositor can then directly issue
						an ioctl to schedule a pageflip with
						KMS, as sketched after this list.
					</para>
				</listitem>


			</orderedlist>
		</para>
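		<para>
			A rough sketch of that last step, under the assumption
			that the compositor drives kernel modesetting through
			libdrm; drm_fd, crtc_id and next_fb_id stand in for
			state from earlier modesetting setup:
		</para>
		<programlisting><![CDATA[
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Called by drmHandleEvent() once the flip has actually happened:
 * the previous buffer is now off-screen and can be reused. */
static void page_flip_done(int fd, unsigned int frame,
                           unsigned int sec, unsigned int usec, void *data)
{
}

static int schedule_flip(int drm_fd, uint32_t crtc_id, uint32_t next_fb_id)
{
	/* Ask the kernel to latch next_fb_id on the next vblank and to
	 * deliver a completion event on drm_fd. */
	return drmModePageFlip(drm_fd, crtc_id, next_fb_id,
	                       DRM_MODE_PAGE_FLIP_EVENT, NULL);
}

/* In the compositor's event loop, when drm_fd becomes readable: */
static void drain_drm_events(int drm_fd)
{
	drmEventContext ctx = {
		.version = DRM_EVENT_CONTEXT_VERSION,
		.page_flip_handler = page_flip_done,
	};
	drmHandleEvent(drm_fd, &ctx);
}
]]></programlisting>
		<para>
			Because the flip completes asynchronously on the next
			vblank, the compositor keeps scanning out the old buffer
			until the page_flip_handler fires.
		</para>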
	</section>
	<section id="sect-Wayland-Architecture-wayland_rendering">
		<title>Wayland Rendering</title>
		<para>
			One of the details I left out in the above overview
			is how clients actually render under Wayland. By
			removing the X server from the picture we also
			removed the mechanism by which X clients typically
			render. But there's another mechanism that we're
			already using with DRI2 under X: direct rendering.
			With direct rendering, the client and the server
			share a video memory buffer. The client links to a
			rendering library such as OpenGL that knows how to
			program the hardware and renders directly into the
			buffer. The compositor in turn can take the buffer
			and use it as a texture when it composites the
			desktop. After the initial setup, the client only
			needs to tell the compositor which buffer to use and
			when and where it has rendered new content into it.
		</para>
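		<para>
			That protocol exchange is small. Below is a minimal
			sketch using the current wayland-client API, where the
			surface and buffer are assumed to have been created
			earlier (for example through EGL or wl_shm):
		</para>
		<programlisting><![CDATA[
#include <stdint.h>
#include <wayland-client.h>

/* Tell the compositor which buffer backs the surface, which region
 * changed, and apply it all atomically. */
static void present(struct wl_surface *surface, struct wl_buffer *buffer,
                    int32_t x, int32_t y, int32_t width, int32_t height)
{
	wl_surface_attach(surface, buffer, 0, 0);
	wl_surface_damage(surface, x, y, width, height);
	wl_surface_commit(surface);
}
]]></programlisting>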

		<para>
			This leaves an application with two ways to update its window contents:
		</para>
		<para>
			<orderedlist>
				<listitem>
					<para>
						Render the new content into a new buffer and tell the compositor
						to use that instead of the old buffer. The application can
						allocate a new buffer every time it needs to update the window
						contents or it can keep two (or more) buffers around and cycle
						between them, as sketched after this list. The buffer
						management is entirely under application control.
					</para>
				</listitem>
				<listitem>
					<para>
						Render the new content into the buffer that it previously
						told the compositor to use. While it's possible to just
						render directly into the buffer shared with the compositor,
						this might race with the compositor. What can happen is that
						repainting the window contents could be interrupted by the
						compositor repainting the desktop. If the application gets
						interrupted just after clearing the window but before
						rendering the contents, the compositor will texture from a
						blank buffer. The result is that the application window will
						flicker between a blank window and half-rendered content. The
						traditional way to avoid this is to render the new content
						into a back buffer and then copy from there into the
						compositor surface. The back buffer can be allocated on the
						fly and just big enough to hold the new content, or the
						application can keep a buffer around. Again, this is under
						application control.
					</para>
				</listitem>
			</orderedlist>
		</para>
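		<para>
			A sketch of the first strategy, cycling between two
			buffers. The wl_buffer release event, a core-protocol
			detail not spelled out above, tells the client when the
			compositor has stopped reading a buffer:
		</para>
		<programlisting><![CDATA[
#include <stdbool.h>
#include <stddef.h>
#include <wayland-client.h>

struct app_buffer {
	struct wl_buffer *wl_buffer;
	bool busy; /* still held by the compositor */
};

/* The compositor sends "release" once it no longer reads the buffer. */
static void buffer_release(void *data, struct wl_buffer *wl_buffer)
{
	struct app_buffer *buf = data;
	buf->busy = false;
}

static const struct wl_buffer_listener buffer_listener = {
	.release = buffer_release,
};

/* At creation time each buffer registers the listener:
 *     wl_buffer_add_listener(buf->wl_buffer, &buffer_listener, buf);
 * Before rendering, the application picks a buffer that is free. */
static struct app_buffer *next_free_buffer(struct app_buffer bufs[2])
{
	for (int i = 0; i < 2; i++)
		if (!bufs[i].busy)
			return &bufs[i];
	return NULL; /* both busy: wait for a release event */
}
]]></programlisting>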
		<para>
			In either case, the application must tell the compositor
			which area of the surface holds new contents. When the
			application renders directly the to shared buffer, the
			compositor needs to be noticed that there is new content.
			But also when exchanging buffers, the compositor doesn't
			assume anything changed, and needs a request from the
			application before it will repaint the desktop. The idea
			that even if an application passes a new buffer to the
			compositor, only a small part of the buffer may be
			different, like a blinking cursor or a spinner.
			Hardware Enabling for Wayland
		</para>
		<para>
			Typically, hardware enabling includes modesetting/display
			and EGL/GLES2. On top of that, Wayland needs a way to share
			buffers efficiently between processes. There are two sides
			to that, the client side and the server side.
		</para>
		<para>
			On the client side we've defined a Wayland EGL platform. In
			the EGL model, that consists of the native types
			(EGLNativeDisplayType, EGLNativeWindowType and
			EGLNativePixmapType) and a way to create those types. In
			other words, it's the glue code that binds the EGL stack and
			its buffer sharing mechanism to the generic Wayland API. The
			EGL stack is expected to provide an implementation of the
			Wayland EGL platform. The full API is in the wayland-egl.h
			header. The open source implementation in the Mesa EGL stack
			is in wayland-egl.c and platform_wayland.c.
		</para>
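		<para>
			From the application's point of view, the glue looks
			roughly like the sketch below; config selection is kept
			minimal, error handling is elided, and the display and
			surface arguments are assumed to be an already-connected
			wl_display and an already-created wl_surface.
		</para>
		<programlisting><![CDATA[
#include <wayland-client.h>
#include <wayland-egl.h>
#include <EGL/egl.h>

static EGLSurface make_egl_surface(struct wl_display *display,
                                   struct wl_surface *surface,
                                   int width, int height)
{
	static const EGLint attribs[] = {
		EGL_SURFACE_TYPE, EGL_WINDOW_BIT,
		EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
		EGL_NONE
	};
	EGLConfig config;
	EGLint n;

	/* On the Wayland platform, the wl_display is the
	 * EGLNativeDisplayType. */
	EGLDisplay dpy = eglGetDisplay((EGLNativeDisplayType)display);
	eglInitialize(dpy, NULL, NULL);
	eglChooseConfig(dpy, attribs, &config, 1, &n);

	/* wl_egl_window (from wayland-egl.h) is the EGLNativeWindowType:
	 * it ties the EGL stack's buffer sharing to one wl_surface. */
	struct wl_egl_window *native =
		wl_egl_window_create(surface, width, height);

	return eglCreateWindowSurface(dpy, config,
	                              (EGLNativeWindowType)native, NULL);
}
]]></programlisting>
		<para>
			From here the client renders with GLES2, and each
			eglSwapBuffers call hands the finished buffer to the
			compositor over the vendor's private protocol extension.
		</para>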
		<para>
			Under the hood, the EGL stack is expected to define a
			vendor-specific protocol extension that lets the client-side
			EGL stack communicate buffer details with the compositor in
			order to share buffers. The point of the wayland-egl.h API
			is to abstract that away and just let the client create an
			EGLSurface for a Wayland surface and start rendering. The
			open source stack uses the drm Wayland extension, which lets
			the client discover the DRM device to use, authenticate to
			it, and then share DRM (GEM) buffers with the compositor.
		</para>
		<para>
			The server side of Wayland is the compositor and core UX for
			the vertical, typically integrating the task switcher, app
			launcher, and lock screen in one monolithic application. The
			server runs on top of a modesetting API (kernel modesetting,
			OpenWF Display or similar) and composites the final UI using
			a mix of EGL/GLES2 rendering and hardware overlays, if
			available. Enabling modesetting, EGL/GLES2 and overlays is
			something that should be part of standard hardware bringup.
			The extra requirement for Wayland enabling is the
			EGL_WL_bind_wayland_display extension that lets the
			compositor create an EGLImage from a generic Wayland shared
			buffer. It's similar to the EGL_KHR_image_pixmap extension
			to create an EGLImage from an X pixmap.
		</para>
		<para>
			The extension has a setup step where you have to bind the
			EGL display to a Wayland display. Then as the compositor
			receives generic Wayland buffers from the clients (typically
			when the client calls eglSwapBuffers), it will be able to
			pass the struct wl_buffer pointer to eglCreateImageKHR as
			the EGLClientBuffer argument and with EGL_WAYLAND_BUFFER_WL
			as the target. This will create an EGLImage, which can then
			be used by the compositor as a texture or passed to the
			modesetting code to use as an overlay plane. Again, this is
			implemented by the vendor-specific protocol extension, which
			on the server side will receive the driver-specific details
			about the shared buffer and turn them into an EGLImage when
			the user calls eglCreateImageKHR.
		</para>
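		<para>
			A compositor-side sketch of both steps; the extension
			entry points are fetched through eglGetProcAddress, as is
			usual for EGL and GL extensions, and error handling is
			elided:
		</para>
		<programlisting><![CDATA[
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

struct wl_display;
struct wl_buffer;

/* Extension entry points, resolved once at startup. */
static EGLBoolean (*bind_display)(EGLDisplay dpy, struct wl_display *display);
static EGLImageKHR (*create_image)(EGLDisplay dpy, EGLContext ctx,
                                   EGLenum target, EGLClientBuffer buffer,
                                   const EGLint *attribs);
static void (*image_target_texture)(GLenum target, GLeglImageOES image);

static void bind_wayland_display(EGLDisplay egl_dpy, struct wl_display *wl_dpy)
{
	bind_display = (void *)eglGetProcAddress("eglBindWaylandDisplayWL");
	create_image = (void *)eglGetProcAddress("eglCreateImageKHR");
	image_target_texture =
		(void *)eglGetProcAddress("glEGLImageTargetTexture2DOES");

	/* One-time setup: tie the EGL display to the Wayland display so
	 * the driver can resolve buffers sent by clients. */
	bind_display(egl_dpy, wl_dpy);
}

/* Per buffer: turn a generic Wayland buffer into a GL texture. */
static GLuint texture_from_wl_buffer(EGLDisplay egl_dpy,
                                     struct wl_buffer *buffer)
{
	GLuint tex;
	EGLImageKHR image = create_image(egl_dpy, EGL_NO_CONTEXT,
	                                 EGL_WAYLAND_BUFFER_WL,
	                                 (EGLClientBuffer)buffer, NULL);

	glGenTextures(1, &tex);
	glBindTexture(GL_TEXTURE_2D, tex);
	image_target_texture(GL_TEXTURE_2D, image);
	return tex;
}
]]></programlisting>
		<para>
			The resulting EGLImage could equally be handed to the
			modesetting code for use as an overlay plane, as described
			above.
		</para>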
	</section>
</chapter>