<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<link href="wayland.css" rel="stylesheet" type="text/css">
<script type="text/javascript" src="generated-toc.js"></script>
<title>Wayland</title>
</head>
<body>
<h1><a href="/"><img src="wayland.png" alt="Wayland logo"></a></h1>
<div id="generated-toc" class="generate_from_h2"></div>
<h2>Wayland Architecture</h2>
<p>A good way to understand the wayland architecture and how it is
different from X is to follow an event from the input device to the
point where the change it affects appears on screen.</p>
<p>This is where we are now with X:</p>
<p><img src="x-architecture.png" alt="X architecture diagram"></p>
<ol>
<li>
The kernel gets an event from an input device and sends it to X
through the evdev input driver. The kernel does all the hard work
here by driving the device and translating the different device
specific event protocols to the linux evdev input event standard.
</li>
<li>
The X server determines which window the event affects and sends
it to the clients that have selected for the event in question on
that window. The X server doesn't actually know how to do this
right, since the window location on screen is controlled by the
compositor and may be transformed in a number of ways that the X
server doesn't understand (scaled down, rotated, wobbling, etc).
</li>
<li>
The client looks at the event and decides what to do. Often the
UI will have to change in response to the event - perhaps a check
box was clicked or the pointer entered a button that must be
highlighted. Thus the client sends a rendering request back to
the X server (a minimal Xlib sketch of this round trip follows the
list).
</li>
<li>
When the X server receives the rendering request, it sends it to
the driver to let it program the hardware to do the rendering.
The X server also calculates the bounding region of the rendering,
and sends that to the compositor as a <em>damage event</em>.
</li>
<li>
The damage event tells the compositor that something changed in
the window and that it has to recomposite the part of the screen
where that window is visible. The compositor is responsible for
rendering the entire screen contents based on its scenegraph and
the contents of the X windows. Yet, it has to go through the X
server to render this.
</li>
<li>
The X server receives the rendering requests from the compositor
and either copies the compositor back buffer to the front buffer
or does a pageflip. In the general case, the X server has to do
this step so it can account for overlapping windows (which may
require clipping) and determine whether or not it can page flip.
However, for a compositor, which is always fullscreen, this is
another unnecessary context switch.
</li>
</ol>
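<p>
To make the client's side of this concrete, here is a minimal
sketch of the round trip in steps 2 through 4, written against
plain Xlib. The display, window and GC are assumed to have been
created during application setup; this is an illustration, not the
only way X clients render.
</p>
<pre>
#include &lt;X11/Xlib.h&gt;

/* Sketch: react to a button press by sending a rendering request
 * back to the X server (steps 2-4 above). */
void handle_events(Display *dpy, Window win, GC gc)
{
    XEvent ev;

    /* Step 2: select for the events we care about on this window. */
    XSelectInput(dpy, win, ButtonPressMask | ExposureMask);

    for (;;) {
        /* The X server routes the event to us. */
        XNextEvent(dpy, &amp;ev);

        /* Step 3: decide what to do; here, highlight a box. */
        if (ev.type == ButtonPress) {
            /* Step 4: a rendering request goes back to the
             * X server, which hands it to the driver. */
            XFillRectangle(dpy, win, gc, 10, 10, 100, 30);
            XFlush(dpy);
        }
    }
}
</pre>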
<p>
As suggested above, there are a few problems with this approach.
The X server doesn't have the information to decide which window
should receive the event, nor can it transform the screen
coordinates to window-local coordinates. And even though X has
handed responsibility for the final painting of the screen to the
compositing manager, X still controls the front buffer and
modesetting. Most of the complexity that the X server used to
handle is now available in the kernel or self contained libraries
(KMS, evdev, mesa, fontconfig, freetype, cairo, Qt, etc). In
general, the X server is now just a middle man that introduces an
extra step between applications and the compositor and an extra step
between the compositor and the hardware.
</p>
<p>
In wayland the compositor <em>is</em> the display server. We
transfer the control of KMS and evdev to the compositor. The
wayland protocol lets the compositor send the input events directly
to the clients and lets the client send the damage event directly to
the compositor:
</p>
<p><img src="wayland-architecture.png" alt="Wayland architecture diagram"></p>
<ol>
<li>
The kernel gets an event and sends it to the compositor. This is
similar to the X case, which is great, since we get to reuse all
the input drivers in the kernel.
</li>
<li>
The compositor looks through its scenegraph to determine which
window should receive the event. The scenegraph corresponds to
what's on screen and the compositor understands the
transformations that it may have applied to the elements in the
scenegraph. Thus, the compositor can pick the right window and
transform the screen coordinates to window-local coordinates by
applying the inverse transformations. The types of transformation
that can be applied to a window are restricted only by what the
compositor can do, as long as it can compute the inverse
transformation for the input events. (A client-side sketch of
receiving such an event follows the list.)
</li>
<li>
As in the X case, when the client receives the event, it updates
the UI in response. But in the wayland case, the rendering
happens in the client, and the client just sends a request to the
compositor to indicate the region that was updated.
</li>
<li>
The compositor collects damage requests from its clients and then
recomposites the screen. The compositor can then directly issue
an ioctl to schedule a pageflip with KMS.
</li>
</ol>
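<p>
A hedged sketch of what step 2 looks like from the client: the
wl_pointer motion event already arrives in surface-local
coordinates, because the compositor applied the inverse transform.
Only the motion handler is fleshed out; the other core (version 1)
handlers are stubs, and the wl_pointer is assumed to come from a
wl_seat bound during registry setup.
</p>
<pre>
#include &lt;wayland-client.h&gt;

static void
pointer_motion(void *data, struct wl_pointer *pointer,
               uint32_t time, wl_fixed_t sx, wl_fixed_t sy)
{
    /* Surface-local coordinates: the compositor already undid any
     * scaling, rotation or wobbling it applies on screen. */
    double x = wl_fixed_to_double(sx);
    double y = wl_fixed_to_double(sy);
    /* Update the UI, render, then send damage (step 3). */
}

/* The remaining core events, stubbed out for brevity. */
static void
pointer_enter(void *data, struct wl_pointer *pointer, uint32_t serial,
              struct wl_surface *surface, wl_fixed_t sx, wl_fixed_t sy) { }
static void
pointer_leave(void *data, struct wl_pointer *pointer, uint32_t serial,
              struct wl_surface *surface) { }
static void
pointer_button(void *data, struct wl_pointer *pointer, uint32_t serial,
               uint32_t time, uint32_t button, uint32_t state) { }
static void
pointer_axis(void *data, struct wl_pointer *pointer, uint32_t time,
             uint32_t axis, wl_fixed_t value) { }

static const struct wl_pointer_listener pointer_listener = {
    pointer_enter, pointer_leave, pointer_motion,
    pointer_button, pointer_axis,
};

void
listen_for_input(struct wl_pointer *pointer)
{
    wl_pointer_add_listener(pointer, &amp;pointer_listener, NULL);
}
</pre>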
<h2>Wayland Rendering</h2>
<p>
One of the details I left out in the above overview is how clients
actually render under wayland. By removing the X server from the
picture we also removed the mechanism by which X clients typically
render. But there's another mechanism that we're already using with
DRI2 under X: <em>direct rendering</em>. With direct rendering, the
client and the server share a video memory buffer. The client links
to a rendering library such as OpenGL that knows how to program the
hardware and renders directly into the buffer. The compositor in
turn can take the buffer and use it as a texture when it composites
the desktop. After the initial setup, the client only needs to tell
the compositor which buffer to use and when and where it has
rendered new content into it.
</p>
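<p>
In protocol terms, "tell the compositor which buffer to use and
when and where it has rendered new content" maps onto three
requests on the surface. A minimal sketch, assuming the wl_surface
and wl_buffer were created during setup:
</p>
<pre>
#include &lt;wayland-client.h&gt;

/* Publish new content; (x, y, w, h) is the repainted region. */
void
publish_frame(struct wl_surface *surface, struct wl_buffer *buffer,
              int32_t x, int32_t y, int32_t w, int32_t h)
{
    wl_surface_attach(surface, buffer, 0, 0);  /* which buffer */
    wl_surface_damage(surface, x, y, w, h);    /* where it drew */
    wl_surface_commit(surface);                /* when: now */
}
</pre>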
<p>
This leaves an application with two ways to update its window
contents:
</p>
<ol>
<li>
Render the new content into a new buffer and tell the compositor
to use that instead of the old buffer. The application can
allocate a new buffer every time it needs to update the window
contents, or it can keep two (or more) buffers around and cycle
between them. The buffer management is entirely under application
control (a sketch of this strategy follows the list).
</li>
<li>
Render the new content into the buffer that it previously told the
compositor to use. While it's possible to just render directly
into the buffer shared with the compositor, this might race with
the compositor. What can happen is that repainting the window
contents could be interrupted by the compositor repainting the
desktop. If the application gets interrupted just after clearing
the window but before rendering the contents, the compositor will
texture from a blank buffer. The result is that the application
window will flicker between a blank window and half-rendered
content. The traditional way to avoid this is to render the new
content into a back buffer and then copy from there into the
compositor surface. The back buffer can be allocated on the fly
and just big enough to hold the new content, or the application
can keep a buffer around. Again, this is under application
control.
</li>
</ol>
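<p>
As a concrete illustration of the first strategy, here is a hedged
sketch that keeps two wl_shm buffers and cycles between them. The
wl_shm global, the shared memory file descriptor and the pixel
format are assumptions made for the example; a real application
would also track wl_buffer release events before reusing a buffer.
</p>
<pre>
#include &lt;wayland-client.h&gt;

#define WIDTH  640
#define HEIGHT 480
#define STRIDE (WIDTH * 4)  /* 4 bytes per ARGB8888 pixel */

static struct wl_buffer *buffers[2];

/* Strategy 1: two buffers carved out of one wl_shm pool.  'fd' is
 * assumed to refer to a shared memory file of at least
 * 2 * STRIDE * HEIGHT bytes, and 'shm' to the wl_shm global bound
 * during registry setup. */
void
create_buffers(struct wl_shm *shm, int fd)
{
    struct wl_shm_pool *pool =
        wl_shm_create_pool(shm, fd, 2 * STRIDE * HEIGHT);

    for (int i = 0; i &lt; 2; i++)
        buffers[i] = wl_shm_pool_create_buffer(
            pool, i * STRIDE * HEIGHT,
            WIDTH, HEIGHT, STRIDE, WL_SHM_FORMAT_ARGB8888);

    wl_shm_pool_destroy(pool);  /* the buffers keep the pool alive */
}

/* Render into the buffer the compositor is not reading, then swap. */
void
update_window(struct wl_surface *surface)
{
    static int next = 0;

    /* ... draw the new contents into buffers[next]'s memory ... */
    wl_surface_attach(surface, buffers[next], 0, 0);
    wl_surface_damage(surface, 0, 0, WIDTH, HEIGHT);
    wl_surface_commit(surface);
    next = 1 - next;
}
</pre>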
<p>
In either case, the application must tell the compositor which area
of the surface holds new contents. When the application renders
directly to the shared buffer, the compositor needs to be notified
that there is new content. But even when exchanging buffers, the
compositor doesn't assume anything changed, and needs a request from
the application before it will repaint the desktop. The idea is that
even if an application passes a new buffer to the compositor, only a
small part of the buffer may be different, like a blinking cursor or
a spinner.
</p>
<h2>Hardware Enabling for Wayland</h2>
<p>
Typically, hardware enabling includes modesetting/display and
EGL/GLES2. On top of that, Wayland needs a way to share buffers
efficiently between processes. There are two sides to that, the
client side and the server side.
</p>
<p>
On the client side we've defined a Wayland EGL platform. In the EGL
model, that consists of the native types (EGLNativeDisplayType,
EGLNativeWindowType and EGLNativePixmapType) and a way to create
those types. In other words, it's the glue code that binds the EGL
stack and its buffer sharing mechanism to the generic Wayland API.
The EGL stack is expected to provide an implementation of the
Wayland EGL platform. The full API is in
the <a href="https://cgit.freedesktop.org/wayland/wayland/tree/src/wayland-egl.h">wayland-egl.h</a>
header. The open source implementation in the mesa EGL stack is
in <a href="https://cgit.freedesktop.org/mesa/mesa/tree/src/egl/wayland/wayland-egl/wayland-egl.c">wayland-egl.c</a>
and <a href="https://cgit.freedesktop.org/mesa/mesa/tree/src/egl/drivers/dri2/platform_wayland.c">platform_wayland.c</a>.
</p>
<p>
Under the hood, the EGL stack is expected to define a
vendor-specific protocol extension that lets the client side EGL
stack communicate buffer details with the compositor in order to
share buffers. The point of the wayland-egl.h API is to abstract
that away and just let the client create an EGLSurface for a Wayland
surface and start rendering. The open source stack uses
the <a href="https://cgit.freedesktop.org/mesa/mesa/tree/src/egl/wayland/wayland-drm/wayland-drm.xml">drm</a>
Wayland extension, which lets the client discover the drm device to
use and authenticate and then share drm (GEM) buffers with the
compositor.
</p>
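<p>
Seen from the application, the glue is small. A hedged sketch of
creating an EGLSurface for a wl_surface (config selection kept
minimal, error handling omitted; the wl_surface is assumed to come
from the usual wl_compositor setup):
</p>
<pre>
#include &lt;wayland-client.h&gt;
#include &lt;wayland-egl.h&gt;
#include &lt;EGL/egl.h&gt;

EGLSurface
create_egl_surface(struct wl_display *display, struct wl_surface *surface,
                   int width, int height, EGLDisplay *egl_display_out)
{
    EGLDisplay egl_display;
    EGLConfig config;
    EGLint num_configs;
    EGLint attribs[] = {
        EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
        EGL_NONE
    };

    /* The wl_display doubles as the EGLNativeDisplayType. */
    egl_display = eglGetDisplay((EGLNativeDisplayType) display);
    eglInitialize(egl_display, NULL, NULL);
    eglChooseConfig(egl_display, attribs, &amp;config, 1, &amp;num_configs);

    /* The wl_egl_window is the EGLNativeWindowType glue object. */
    struct wl_egl_window *native =
        wl_egl_window_create(surface, width, height);

    *egl_display_out = egl_display;
    return eglCreateWindowSurface(egl_display, config,
                                  (EGLNativeWindowType) native, NULL);
}
</pre>
<p>
After this the client can make the surface current, render with
GLES2 and call eglSwapBuffers; the swap is what hands the finished
buffer to the compositor via the vendor-specific extension.
</p>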
<p>
The server side of Wayland is the compositor and core UX for the
vertical, typically integrating the task switcher, app launcher and
lock screen in one monolithic application. The server runs on top of a
modesetting API (kernel modesetting, OpenWF Display or similar) and
composites the final UI using a mix of EGL/GLES2 compositing and
hardware overlays if available. Enabling modesetting, EGL/GLES2 and
overlays is something that should be part of standard hardware
bringup. The extra requirement for Wayland enabling is
the <a href="https://cgit.freedesktop.org/mesa/mesa/tree/docs/specs/WL_bind_wayland_display.spec">EGL_WL_bind_wayland_display</a>
extension that lets the compositor create an EGLImage from a generic
Wayland shared buffer. It's similar to
the <a href="http://www.khronos.org/registry/egl/extensions/KHR/EGL_KHR_image_pixmap.txt">EGL_KHR_image_pixmap</a>
extension to create an EGLImage from an X pixmap.
</p>
<p>
The extension has a setup step where you have to bind the EGL
display to a Wayland display. Then as the compositor receives
generic Wayland buffers from the clients (typically when the client
calls eglSwapBuffers), it will be able to pass the struct wl_buffer
pointer to eglCreateImageKHR as the EGLClientBuffer argument and
with EGL_WAYLAND_BUFFER_WL as the target. This will create an
EGLImage, which can then be used by the compositor as a texture or
passed to the modesetting code to use as an overlay plane. Again,
this is implemented by the vendor specific protocol extension, which
on the server side will receive the driver specific details about
the shared buffer and turn that into an EGLImage when the
compositor calls eglCreateImageKHR.
</p>
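<p>
A hedged sketch of that compositor-side sequence, fetching the
extension entry points through eglGetProcAddress (texture filtering
setup and error handling omitted):
</p>
<pre>
#include &lt;EGL/egl.h&gt;
#include &lt;EGL/eglext.h&gt;
#include &lt;GLES2/gl2.h&gt;
#include &lt;GLES2/gl2ext.h&gt;
#include &lt;wayland-server.h&gt;

static PFNEGLBINDWAYLANDDISPLAYWLPROC bind_display;
static PFNEGLCREATEIMAGEKHRPROC create_image;
static PFNGLEGLIMAGETARGETTEXTURE2DOESPROC image_target_texture;

/* Setup step: bind the EGL display to the Wayland display. */
void
init_bind_display(EGLDisplay egl_display, struct wl_display *display)
{
    bind_display = (PFNEGLBINDWAYLANDDISPLAYWLPROC)
        eglGetProcAddress("eglBindWaylandDisplayWL");
    create_image = (PFNEGLCREATEIMAGEKHRPROC)
        eglGetProcAddress("eglCreateImageKHR");
    image_target_texture = (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)
        eglGetProcAddress("glEGLImageTargetTexture2DOES");

    bind_display(egl_display, display);
}

/* Per client buffer: turn the wl_buffer into an EGLImage and use
 * it as a GL texture for compositing.  As described above, the
 * wl_buffer pointer is passed as the EGLClientBuffer. */
GLuint
texture_from_buffer(EGLDisplay egl_display, struct wl_buffer *buffer)
{
    GLuint texture;
    EGLImageKHR image =
        create_image(egl_display, EGL_NO_CONTEXT,
                     EGL_WAYLAND_BUFFER_WL,
                     (EGLClientBuffer) buffer, NULL);

    glGenTextures(1, &amp;texture);
    glBindTexture(GL_TEXTURE_2D, texture);
    image_target_texture(GL_TEXTURE_2D, (GLeglImageOES) image);

    return texture;
}
</pre>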
</body>
</html>