author steph@migrax <steph@migrax> 2010-10-10 11:18:44 -0700
committer steph@migrax <steph@migrax> 2010-10-10 11:18:44 -0700
commit 27e91711d2485b2949af03046bbc6de1c0fd98e6
tree ac21ac998124057f1185ad6bf82dd62f1594d507
parent 42cbca3b75488bdf3d6f8e6ef00195c29adf4c1f
parent 0a9904283c2e6a159f5284b52a8b796d57e1a470
Figures...
-rwxr-xr-x linuxgraphicsdrivers.lyx 1995
1 files changed, 1941 insertions, 54 deletions
diff --git a/linuxgraphicsdrivers.lyx b/linuxgraphicsdrivers.lyx
index d4e43d1..76dbd18 100755
--- a/linuxgraphicsdrivers.lyx
+++ b/linuxgraphicsdrivers.lyx
@@ -59,7 +59,7 @@
\begin_modules
theorems-ams
\end_modules
-\language english
+\language american
\inputencoding auto
\font_roman palatino
\font_sans default
@@ -115,6 +115,8 @@ theorems-ams
\begin_body
\begin_layout Title
+
+\lang english
Linux Graphics Drivers: an Introduction
\begin_inset Newline newline
\end_inset
@@ -125,6 +127,8 @@ Version 2
\end_layout
\begin_layout Author
+
+\lang english
Stéphane Marchesin
\begin_inset Newline newline
\end_inset
@@ -133,6 +137,8 @@ Stéphane Marchesin
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset CommandInset toc
LatexCommand tableofcontents
@@ -142,6 +148,8 @@ LatexCommand tableofcontents
\end_layout
\begin_layout Chapter
+
+\lang english
Introduction
\begin_inset CommandInset label
LatexCommand label
@@ -185,10 +193,14 @@ Accelerating graphics is a complex art which suffers a mostly unjustified
\end_layout
\begin_layout Section
+
+\lang english
Book overview
\end_layout
\begin_layout Standard
+
+\lang english
The book starts with an introduction of relevant hardware concepts (Chapter
\begin_inset CommandInset ref
@@ -311,6 +323,8 @@ reference "cha:Conclusions"
\end_layout
\begin_layout Standard
+
+\lang english
Each chapter finishes with the
\begin_inset Quotes eld
\end_inset
@@ -323,10 +337,14 @@ takeaways
\end_layout
\begin_layout Section
+
+\lang english
What this book does not cover
\end_layout
\begin_layout Standard
+
+\lang english
Computer graphics move at a fast pace, and this book is not about the past.
Obsolete hardware (isa, vlb, ...), old standards (the vga standard and its
dreadful int10, vesa), outdated techniques (user space modesetting) and
@@ -334,6 +352,8 @@ Computer graphics move at a fast pace, and this book is not about the past.
\end_layout
\begin_layout Chapter
+
+\lang english
A Look at the Hardware
\begin_inset CommandInset label
LatexCommand label
@@ -345,6 +365,8 @@ name "cha:A-Look-at"
\end_layout
\begin_layout Standard
+
+\lang english
Before diving any further into the subject of graphics drivers, we need
to understand the graphics hardware.
This chapter is by no means intended to be a complete description of all
@@ -367,10 +389,14 @@ cover the bases
\end_layout
\begin_layout Section
+
+\lang english
Hardware Overview
\end_layout
\begin_layout Standard
+
+\lang english
Today all computers share the same architecture: a central processor
and a number of peripherals.
In order to exchange data, these peripherals are interconnected by a bus
@@ -386,6 +412,8 @@ reference "fig:Peripheral-interconnection-in"
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float figure
wide false
sideways false
@@ -394,6 +422,8 @@ status open
\begin_layout Plain Layout
\noindent
\align center
+
+\lang english
\begin_inset ERT
status open
@@ -619,9 +649,13 @@ end{tikzpicture}
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
+
+\lang english
\begin_inset CommandInset label
LatexCommand label
name "fig:Peripheral-interconnection-in"
@@ -653,6 +687,8 @@ The first user of the bus is the CPU.
\end_layout
\begin_layout Standard
+
+\lang english
If a peripheral has the ability to achieve DMA to or from an uncontiguous
list of memory pages (which is very convenient when the data is not contiguous
in memory), it is said to have DMA scatter-gather capability (as it can
@@ -660,6 +696,8 @@ If a peripheral has the ability to achieve DMA to or from an uncontiguous
\end_layout
\begin_layout Standard
+
+\lang english
Notice that the DMA capability can be a downside in some cases.
For example on real time systems, this means the CPU is unable to access
the bus while a DMA transaction is in progress, and since DMA transactions
@@ -669,10 +707,14 @@ Notice that the DMA capability can be a downside in some cases.
\end_layout
\begin_layout Section
+
+\lang english
Bus types
\end_layout
\begin_layout Standard
+
+\lang english
Buses connect the machine peripherals together; each and every communication
between different peripherals goes over (at least) one bus.
In particular, a bus is the way most graphics cards are connected to the
@@ -692,6 +734,8 @@ reference "fig:Common-bus-types"
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float figure
wide false
sideways false
@@ -699,6 +743,8 @@ status open
\begin_layout Plain Layout
\align center
+
+\lang english
\begin_inset Tabular
<lyxtabular version="3" rows="8" columns="5">
<features>
@@ -712,6 +758,8 @@ status open
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
Bus type
\end_layout
@@ -721,6 +769,8 @@ Bus type
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
Bus width
\end_layout
@@ -730,6 +780,8 @@ Bus width
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
Frequency
\end_layout
@@ -739,6 +791,8 @@ Frequency
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
Bandwidth
\end_layout
@@ -748,6 +802,8 @@ Bandwidth
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
Capabilities
\end_layout
@@ -759,6 +815,8 @@ Capabilities
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
PCI
\end_layout
@@ -768,6 +826,8 @@ PCI
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
32 bits
\end_layout
@@ -777,6 +837,8 @@ PCI
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
33 MHz
\end_layout
@@ -786,6 +848,8 @@ PCI
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
133 MB/s (33 MHz)
\end_layout
@@ -795,6 +859,8 @@ PCI
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
-
\end_layout
@@ -806,6 +872,8 @@ PCI
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
AGP
\end_layout
@@ -815,6 +883,8 @@ AGP
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
32 bits
\end_layout
@@ -824,6 +894,8 @@ AGP
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
66 MHz
\end_layout
@@ -833,6 +905,8 @@ AGP
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
2100 MB/s (8x)
\end_layout
@@ -842,6 +916,8 @@ AGP
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
SBA, FW,
\end_layout
@@ -889,6 +965,8 @@ SBA, FW,
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
GART
\end_layout
@@ -900,6 +978,8 @@ GART
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
PCI-X
\end_layout
@@ -909,6 +989,8 @@ PCI-X
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
64 bits
\end_layout
@@ -918,6 +1000,8 @@ PCI-X
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
33, 66,
\end_layout
@@ -927,6 +1011,8 @@ PCI-X
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
533 MB/s (66 MHz)
\end_layout
@@ -936,6 +1022,8 @@ PCI-X
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
-
\end_layout
@@ -965,6 +1053,8 @@ PCI-X
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
133 MHz
\end_layout
@@ -994,6 +1084,8 @@ PCI-X
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
PCI-Express (1.0)
\end_layout
@@ -1003,6 +1095,8 @@ PCI-Express (1.0)
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
Serial
\end_layout
@@ -1012,6 +1106,8 @@ Serial
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
1.25 GHz
\end_layout
@@ -1021,6 +1117,8 @@ Serial
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
4 GB/s (16 lanes)
\end_layout
@@ -1030,6 +1128,8 @@ Serial
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
-
\end_layout
@@ -1041,6 +1141,8 @@ Serial
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
PCI-Express (3.0)
\end_layout
@@ -1050,6 +1152,8 @@ PCI-Express (3.0)
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
Serial
\end_layout
@@ -1059,6 +1163,8 @@ Serial
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
4 GHz
\end_layout
@@ -1068,6 +1174,8 @@ Serial
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
16 GB/s (16 lanes)
\end_layout
@@ -1077,6 +1185,8 @@ Serial
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
-
\end_layout
@@ -1091,9 +1201,13 @@ Serial
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
+
+\lang english
\begin_inset CommandInset label
LatexCommand label
name "fig:Common-bus-types"
@@ -1114,10 +1228,14 @@ Common bus types.
\end_layout
\begin_layout Subparagraph*
+
+\lang english
PCI (Peripheral Component Interconnect)
\end_layout
\begin_layout Standard
+
+\lang english
PCI is the most basic bus allowing connecting graphics peripherals today.
One of its key features is called bus mastering.
This feature allows a given peripheral to take hold of the bus for a given
@@ -1128,10 +1246,14 @@ PCI is the most basic bus allowing connecting graphics peripherals today.
\end_layout
\begin_layout Subparagraph*
+
+\lang english
AGP (Accelerated Graphics Port)
\end_layout
\begin_layout Standard
+
+\lang english
AGP is essentially a modified PCI bus with a number of extra features compared
to its ancestor.
Most importantly, it is faster thanks to a higher clock speed and the ability
@@ -1141,6 +1263,8 @@ AGP is essentially a modified PCI bus with a number of extra features compared
\end_layout
\begin_layout Itemize
+
+\lang english
The first feature is AGP GART (Graphics Aperture Remapping Table), a simple
form of IOMMU (as will be seen in section
\begin_inset CommandInset ref
@@ -1164,6 +1288,8 @@ ns from the other party can begin.
\end_layout
\begin_layout Itemize
+
+\lang english
The second feature is AGP side band addressing (SBA).
Side band addressing consists of 8 extra bus bits used as an address bus.
Instead of multiplexing the bus bandwidth between addresses and data, the
@@ -1172,6 +1298,8 @@ The second feature is AGP side band addressing (SBA).
\end_layout
\begin_layout Itemize
+
+\lang english
The third feature is AGP Fast Writes (FW).
Fast writes allow sending data to the graphics card directly, without having
the card initiate a DMA.
@@ -1179,6 +1307,8 @@ The third feature is AGP Fast Writes (FW).
\end_layout
\begin_layout Standard
+
+\lang english
Keep in mind that these last two features are known to be unstable on a
wide range of hardware, and oftentimes require chipset-specific hacks to
work properly.
@@ -1188,25 +1318,35 @@ Keep in mind that these last two features are known to be unstable on a
\end_layout
\begin_layout Subparagraph*
+
+\lang english
PCI-X
\end_layout
\begin_layout Standard
+
+\lang english
PCI-X was developed as a faster PCI for server boards, and very few graphics
peripherals exist in this format.
It is not to be confused with PCI-Express, which sees real widespread usage.
\end_layout
\begin_layout Subparagraph*
+
+\lang english
PCI-Express (PCI-E)
\end_layout
\begin_layout Standard
+
+\lang english
PCI-Express is the new generation of PCI devices.
It has more advantages than a simple improved PCI.
\end_layout
\begin_layout Standard
+
+\lang english
Finally, it is important to note that, depending on the architecture, the
CPU-GPU communication does not always rely on a bus.
This is especially common on embedded systems where the GPU and the CPU
@@ -1215,6 +1355,8 @@ Finally, it is important to note that, depending on the architecture, the
\end_layout
\begin_layout Section
+
+\lang english
Virtual and Physical Memory
\begin_inset CommandInset label
LatexCommand label
@@ -1226,6 +1368,8 @@ name "sec:Virtual-and-Physical"
\end_layout
\begin_layout Standard
+
+\lang english
The term
\begin_inset Quotes eld
\end_inset
@@ -1238,12 +1382,16 @@ memory
\end_layout
\begin_layout Itemize
+
+\lang english
Physical memory.
Physical memory is real, hardware memory, as stored in the memory chips.
\end_layout
\begin_layout Itemize
+
+\lang english
Virtual memory.
Virtual memory is a translation of physical memory addresses allowing user
space applications to see their allocated chunks as if they were contiguous
@@ -1251,6 +1399,8 @@ Virtual memory.
\end_layout
\begin_layout Standard
+
+\lang english
In order to simplify programming, it is easier to handle contiguous memory
areas.
This is easy to achieve as long as only a small area is needed.
@@ -1263,6 +1413,8 @@ In order to simplify programming, it is easier to handle contiguous memory
\end_layout
\begin_layout Standard
+
+\lang english
To achieve this, memory is split into pages.
For the scope of this book, it is sufficient to say that a memory page
is a collection of contiguous bytes in physical memory
@@ -1270,6 +1422,8 @@ To achieve this, memory is split into pages.
status open
\begin_layout Plain Layout
+
+\lang english
On x86 and x86-64, a page is usually 4096 bytes long, although different
sizes are possible on other architectures or with huge pages.
\end_layout
@@ -1298,6 +1452,8 @@ reference "fig:MMU-and-IOMMU"
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float figure
wide false
sideways false
@@ -1306,6 +1462,8 @@ status open
\begin_layout Plain Layout
\noindent
\align center
+
+\lang english
\begin_inset ERT
status open
@@ -1322,8 +1480,15 @@ begin{tikzpicture}[node distance=1cm, auto]
\backslash
tikzset{ mynode/.style={rectangle,rounded corners,draw=black, top color=white
, bottom color=yellow!50,very thick, inner sep=1em, minimum size=3em, text
- centered, drop shadow}, myarrow/.style={->, >=latex', shorten >=1pt,
- thick}, mylabel/.style={text width=7em, text centered} }
+ centered, drop shadow},
+\end_layout
+
+\begin_layout Plain Layout
+
+mynode2/.style={rectangle,rounded corners,draw=black, top color=white, bottom
+ color=red!50,very thick, inner sep=1em, minimum size=3em, text centered,
+ drop shadow}, myarrow/.style={->, >=latex', shorten >=1pt, thick},
+ mylabel/.style={text width=7em, text centered} }
\end_layout
\begin_layout Plain Layout
@@ -1358,14 +1523,14 @@ node[mynode, below=2cm of GPU] (iommu) {IOMMU};
\backslash
-node[mynode, left=1cm of mmu] (mmupt) {MMU page table};
+node[mynode2, left=1cm of mmu] (mmupt) {MMU page table};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode, right=1cm of iommu] (iommupt) {IOMMU page table};
+node[mynode2, right=1cm of iommu] (iommupt) {IOMMU page table};
\end_layout
\begin_layout Plain Layout
@@ -1463,9 +1628,13 @@ end{tikzpicture}
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
+
+\lang english
\begin_inset CommandInset label
LatexCommand label
name "fig:MMU-and-IOMMU"
@@ -1478,16 +1647,6 @@ MMU and IOMMU.
\end_inset
-\begin_inset Note Note
-status open
-
-\begin_layout Plain Layout
-XXX add the page tables to this drawing
-\end_layout
-
-\end_inset
-
-
\end_layout
\end_inset
@@ -1496,6 +1655,8 @@ XXX add the page tables to this drawing
\end_layout
\begin_layout Standard
+
+\lang english
While the MMU only works for CPU accesses, it has an equivalent for peripherals:
the IOMMU.
As shown on figure
@@ -1525,6 +1686,8 @@ fooling
\end_layout
\begin_layout Standard
+
+\lang english
A special case of IOMMU is the Linux swiotlb which allocates a contiguous
piece of physical memory at boot (which makes it feasible to have a large
contiguous physical allocation since there is no fragmentation yet) and
@@ -1536,13 +1699,20 @@ A special case of IOMMU is the Linux swiotlb which allocates a contiguous
\end_layout
\begin_layout Standard
+
+\lang english
AGP GART is another special case of IOMMU present with AGP graphics cards
which exposes a single linear area to the card.
- In that case the IOMMU table is embedded in the AGP chipset, on the motherboard.
+ In that case the IOMMU is embedded in the AGP chipset, on the motherboard.
+ The AGP GART area is exposed as a linear area of virtual memory to the
+ system.
+
\begin_inset Note Note
status open
\begin_layout Plain Layout
+
+\lang english
Say that it is linear in physical and virtual memory
\end_layout
@@ -1552,6 +1722,8 @@ Say that it is linear in physical and virtual memory
\end_layout
\begin_layout Standard
+
+\lang english
Yet another special case of IOMMU is the PCI GART which allows exposing
a chunk of system memory to the card.
In that case the IOMMU table is embedded in the graphics card, and often
@@ -1559,31 +1731,45 @@ Yet another special case of IOMMU is the PCI GART which allows exposing
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Note Note
status open
\begin_layout Plain Layout
+
+\lang english
http://images.google.fr/images?hl=fr&source=hp&q=page+table&btnG=Recherche+d'image
s&gbv=2&aq=f&oq=
\end_layout
\begin_layout Plain Layout
+
+\lang english
http://pages.cs.wisc.edu/~bart/537/lecturenotes/s16.html
\end_layout
\begin_layout Plain Layout
+
+\lang english
http://a.michelizza.free.fr/pmwiki.php?n=TutoOS.Mm3
\end_layout
\begin_layout Plain Layout
+
+\lang english
http://lwn.net/Articles/106177/
\end_layout
\begin_layout Plain Layout
+
+\lang english
http://www.vocw.edu.vn/content/m10106/latest/
\end_layout
\begin_layout Plain Layout
+
+\lang english
http://cs.nyu.edu/courses/spring05/G22.2250-001/lectures/lecture-08.html
\end_layout
@@ -1593,6 +1779,8 @@ http://cs.nyu.edu/courses/spring05/G22.2250-001/lectures/lecture-08.html
\end_layout
\begin_layout Standard
+
+\lang english
Obviously, with so many different memory types, performance is not homogeneous;
not all combinations of accesses are fast, depending on whether they involve
the CPU, the GPU, or bus transfers.
@@ -1604,14 +1792,20 @@ Obviously, with so many different memory types, performance is not homogeneous;
\end_layout
\begin_layout Standard
+
+\lang english
As far as setting the memory caching parameters goes, there are two ways
to set caching attributes on memory ranges:
\end_layout
\begin_layout Itemize
+
+\lang english
MTRRs.
An MTRR (Memory Type Range Register) is a register describing attributes
for a range of given physical memory.
+ Each MTRR consists of a starting physical address, a size, and a caching
+ type.
The number of MTRRs depends on the system, but is very limited.
Although this applies to a physical memory range, the effect works on the
corresponding virtual memory pages.
@@ -1621,6 +1815,8 @@ MTRRs.
status open
\begin_layout Plain Layout
+
+\lang english
XXX examples
\end_layout
@@ -1630,11 +1826,17 @@ XXX examples
\end_layout
\begin_layout Itemize
+
+\lang english
PAT (Page Attribute Table) allows setting per-page memory attributes.
+ Instead of relying on a limited number of memory ranges like with MTRRs,
+ it is possible to specify caching attributes on a per-page basis.
However it is an extension only available on recent x86 processors.
\end_layout
\begin_layout Standard
+
+\lang english
On top of these, one can use explicit caching instructions on some architectures
, for example on x86
\emph on
@@ -1648,10 +1850,14 @@ clflush
\end_layout
\begin_layout Standard
+
+\lang english
There are 3 caching modes, usable both through MTRR and PAT on system memory:
\end_layout
\begin_layout Itemize
+
+\lang english
UC (UnCached) memory is uncached.
CPU reads and writes to this area are uncached, and each memory write instruction
triggers an actual immediate memory write.
@@ -1660,6 +1866,8 @@ UC (UnCached) memory is uncached.
\end_layout
\begin_layout Itemize
+
+\lang english
WC (Write Combine) memory is uncached, but CPU writes are combined together
in order to improve the performance.
This is useful to improve performance in situations where uncached memory
@@ -1667,6 +1875,8 @@ WC (Write Combine) memory is uncached, but CPU writes are combined together
\end_layout
\begin_layout Itemize
+
+\lang english
WB (Write Back) memory is cached.
This is the default mode and leads to the best performance for CPU accesses.
However this does not ensure that memory writes are propagated to central
@@ -1674,6 +1884,8 @@ WB (Write Back) memory is cached.
\end_layout
\begin_layout Standard
+
+\lang english
Notice that these caching modes apply to the CPU only, the GPU accesses
are not directly affected by the current caching mode.
However, when the GPU has to access an area of memory which was previously
@@ -1688,17 +1900,23 @@ s present on some x86 processors (like cflush).
\end_layout
\begin_layout Standard
+
+\lang english
Obviously with so many different caching modes, not all accesses have the
same performance:
\end_layout
\begin_layout Itemize
+
+\lang english
When it comes to CPU access to system memory, uncached mode provides the
worst performance, write back provides the best performance, and write
combine is in between.
\end_layout
\begin_layout Itemize
+
+\lang english
When the CPU accesses the video memory from a discrete card, all accesses
are extremely slow, be they reads or writes, as each access needs a cycle
on the bus.
@@ -1709,10 +1927,14 @@ When the CPU accesses the video memory from a discrete card, all accesses
\end_layout
\begin_layout Itemize
+
+\lang english
Obviously the GPU accessing VRAM is extremely fast.
\end_layout
\begin_layout Itemize
+
+\lang english
GPU access to system ram is unaffected by the caching mode, but still has
to go over the bus.
This is the case of DMA transactions.
@@ -1731,6 +1953,8 @@ free
\end_layout
\begin_layout Standard
+
+\lang english
Finally, one last important point to make about memory is the notion of
memory barriers and write posting.
In the case of a cached (Write Combine or Write Back) memory area, a memory
@@ -1743,10 +1967,14 @@ Finally, one last important point to make about memory is the notion of
\end_layout
\begin_layout Section
+
+\lang english
Anatomy of the Graphics Card
\end_layout
\begin_layout Standard
+
+\lang english
Today, a graphics card is basically a computer-in-the-computer.
It is a complex beast with a dedicated processor on a separate card, and
features its own computation units, its own bus, and its own memory.
@@ -1754,10 +1982,14 @@ Today, a graphics card is basically a computer-in-the-computer.
\end_layout
\begin_layout Subsubsection*
+
+\lang english
Graphics Memory
\end_layout
\begin_layout Standard
+
+\lang english
The GPU's memory, which we will from now on refer to as video memory, can
be either real, dedicated, on-card memory (in the case of a discrete card),
or memory shared with the CPU (in the case of an integrated card).
@@ -1769,6 +2001,8 @@ d properly; while the case of dedicated memory means that transfers back
\end_layout
\begin_layout Standard
+
+\lang english
It is not uncommon for modern GPUs to feature a form of virtual memory as
well, allowing different resources (real video memory or system
memory) to be mapped into the GPU address space.
@@ -1792,10 +2026,14 @@ It is not uncommon for modern GPUs to feature a form of virtual memory as
\end_layout
\begin_layout Subsubsection*
+
+\lang english
Surfaces
\end_layout
\begin_layout Standard
+
+\lang english
Surfaces are the basic sources and targets for all rendering.
Although they can be called differently (textures, render targets, buffers...)
the basic idea is always the same.
@@ -1814,6 +2052,8 @@ reference "fig:The-layout-of"
\end_layout
\begin_layout Itemize
+
+\lang english
The pixel format of the surface.
A pixel color is represented in memory by its red, green and blue components,
plus an alpha component used as the opacity for blending.
@@ -1832,6 +2072,8 @@ bpp
status open
\begin_layout Plain Layout
+
+\lang english
, YUV12, YUY16
\end_layout
@@ -1842,12 +2084,16 @@ status open
\end_layout
\begin_layout Itemize
+
+\lang english
Width and height are the most obvious characteristics, and are given in
pixels.
\end_layout
\begin_layout Itemize
+
+\lang english
The pitch is the width in bytes (not in pixels!) of the surface, including
the dead zone pixels.
The pitch is convenient for computing memory usages, for example the size
@@ -1863,6 +2109,8 @@ The pitch is the width in bytes (not in pixels!) of the surface, including
\end_layout
\begin_layout Standard
+
+\lang english
Notice that surfaces are not always stored linearly in video memory, in
fact for performance reasons it is extremely common that they are not,
as this improves the locality of the memory accesses when rendering.
@@ -1877,6 +2125,8 @@ tiled
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float figure
wide false
sideways false
@@ -1885,6 +2135,8 @@ status open
\begin_layout Plain Layout
\noindent
\align center
+
+\lang english
\begin_inset ERT
status open
@@ -2036,9 +2288,13 @@ end{tikzpicture}
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
+
+\lang english
\begin_inset CommandInset label
LatexCommand label
name "fig:The-layout-of"
@@ -2059,10 +2315,14 @@ The layout of a surface.
\end_layout
\begin_layout Subsubsection*
+
+\lang english
2D engine
\end_layout
\begin_layout Standard
+
+\lang english
The 2D engine, or blitter, is the hardware used for 2D acceleration.
Blitters have been one of the earliest forms of graphics acceleration and
are still extremely widespread today.
@@ -2070,6 +2330,8 @@ The 2D engine, or blitter, is the hardware used for 2D acceleration.
\end_layout
\begin_layout Itemize
+
+\lang english
Blits.
Blits are a copy of a memory rectangle from one place to another by the
GPU.
@@ -2077,23 +2339,31 @@ Blits.
\end_layout
\begin_layout Itemize
+
+\lang english
Solid fills.
Solid fills consist of filling a rectangular memory area with a color.
Note that this can also include the alpha channel.
\end_layout
\begin_layout Itemize
+
+\lang english
Alpha blits.
Alpha blits use the alpha component of pixels from a surface to achieve
transparency [porter & duff].
\end_layout
\begin_layout Itemize
+
+\lang english
Stretched blits.
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float figure
wide false
sideways false
@@ -2102,6 +2372,8 @@ status open
\begin_layout Plain Layout
\noindent
\align center
+
+\lang english
\begin_inset ERT
status open
@@ -2347,9 +2619,13 @@ end{tikzpicture}
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
+
+\lang english
\begin_inset CommandInset label
LatexCommand label
name "fig:Blitting-between-two"
@@ -2370,6 +2646,8 @@ Blitting between two different surfaces.
\end_layout
\begin_layout Standard
+
+\lang english
Figure
\begin_inset CommandInset ref
LatexCommand ref
@@ -2385,6 +2663,8 @@ on coordinates, the source and destination pitches, and the blit width and
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float figure
wide false
sideways false
@@ -2393,6 +2673,8 @@ status open
\begin_layout Plain Layout
\noindent
\align center
+
+\lang english
\begin_inset ERT
status open
@@ -2566,9 +2848,13 @@ end{tikzpicture}
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
+
+\lang english
\begin_inset CommandInset label
LatexCommand label
name "fig:Overlapping-blit-inside"
@@ -2589,6 +2875,8 @@ Overlapping blit inside a surface.
\end_layout
\begin_layout Standard
+
+\lang english
When a blit happens between two overlapping source and destination surfaces,
the semantics of the copy is not trivially defined, especially if one considers
that what happens for a blit is not a simple move of a rectangle, but is
@@ -2610,6 +2898,8 @@ reference "fig:Overlapping-blit-inside"
\end_layout
\begin_layout Standard
+
+\lang english
Finally, keep in mind that not all current graphics accelerators feature
a 2D engine.
Since 3D acceleration is technically a super-set of 2D acceleration, it
@@ -2635,10 +2925,14 @@ reference "cha:Gallium-3D"
\end_layout
\begin_layout Subsubsection*
+
+\lang english
3D engine
\end_layout
\begin_layout Standard
+
+\lang english
A 3D engine is also called
\begin_inset Quotes eld
\end_inset
@@ -2652,65 +2946,95 @@ rasterization pipeline
\end_layout
\begin_layout Standard
+
+\lang english
vertex -> geom -> fragment
\end_layout
\begin_layout Standard
+
+\lang english
graphics fifo
\end_layout
\begin_layout Standard
+
+\lang english
DMA
\end_layout
\begin_layout Standard
+
+\lang english
http://www.x.org/wiki/Development/Documentation/HowVideoCardsWork
\end_layout
\begin_layout Standard
+
+\lang english
tiled textures
\end_layout
\begin_layout Subsubsection*
+
+\lang english
Overlays and hardware sprites
\end_layout
\begin_layout Subsubsection*
+
+\lang english
Scanout
\end_layout
\begin_layout Standard
+
+\lang english
The last stage of a graphics display is presenting the information onto
a display device, or screen.
\end_layout
\begin_layout Standard
+
+\lang english
Display devices are the last link of the graphics chain.
They are charged with presenting the pictures to the user.
\end_layout
\begin_layout Standard
+
+\lang english
digital vs analog signal
\end_layout
\begin_layout Standard
+
+\lang english
hsync, vsync
\end_layout
\begin_layout Standard
+
+\lang english
sync on green
\end_layout
\begin_layout Standard
+
+\lang english
Connectors and encoders: CRTC, TMDS, LVDS, DVI-I, DVI-A, DVI-D, VGA (D-SUB
15 is the proper name)
\end_layout
\begin_layout Section
+
+\lang english
Programming the card
\end_layout
\begin_layout Standard
+
+\lang english
Each PCI card exposes a number of PCI resources;
\emph on
lspci -v
@@ -2727,10 +3051,14 @@ lspci -v
\end_layout
\begin_layout Subparagraph*
+
+\lang english
MMIO
\end_layout
\begin_layout Standard
+
+\lang english
MMIO is the most direct access to the card.
A range of addresses is exposed to the CPU, where each write goes directly
to the GPU.
@@ -2746,10 +3074,14 @@ MMIO is the most direct access to the card.
\end_layout
\begin_layout Subparagraph*
+
+\lang english
DMA
\end_layout
\begin_layout Standard
+
+\lang english
A direct memory access (DMA) is the use by a peripheral of the bus mastering
feature of the bus.
This allows one peripheral to talk directly to another, without intervention
@@ -2758,6 +3090,8 @@ A direct memory access (DMA) is the use by a peripheral of the bus mastering
\end_layout
\begin_layout Itemize
+
+\lang english
Transfers by the GPU to and from system memory (for reading textures and
writing buffers).
This allows implementing things like texturing over AGP or PCI, and hardware-ac
@@ -2765,6 +3099,8 @@ celerated texture transfers.
\end_layout
\begin_layout Itemize
+
+\lang english
The implementation of command FIFO.
As MMIO between the CPU and GPU is synchronous and graphics drivers inherently
use a lot of I/O, a faster means of communicating with the card is required.
@@ -2777,10 +3113,14 @@ The implementation of command FIFO.
\end_layout
\begin_layout Subsubsection*
+
+\lang english
Interrupts
\end_layout
\begin_layout Standard
+
+\lang english
Interrupts are a way for hardware peripherals in general, and GPUs in particular
, to signal events to the CPU.
Usage examples for interrupts include signaling completion of a graphics
@@ -2794,22 +3134,32 @@ Interrupts are a way for hardware peripherals in general, and GPUs in particular
\end_layout
\begin_layout Section
+
+\lang english
Graphics Hardware Examples
\end_layout
\begin_layout Paragraph*
+
+\lang english
ATI
\end_layout
\begin_layout Standard
+
+\lang english
Shader engine 4+1
\end_layout
\begin_layout Paragraph*
+
+\lang english
Nvidia
\end_layout
\begin_layout Standard
+
+\lang english
NVidia hardware has multiple specificities compared to other architectures.
The first one is the availability of multiple contexts, which is implemented
using multiple command fifos (similar to what some high-end infiniband
@@ -2824,6 +3174,8 @@ NVidia hardware has multiple specificities compared to other architectures.
\end_layout
\begin_layout Standard
+
+\lang english
The second specificity is the notion of graphics objects.
Nvidia hardware features two levels of GPU access: the first one is at
the raw level and is used for context switches, and the second one is the
@@ -2832,22 +3184,38 @@ The second specificity is the notion of graphics objects.
\end_layout
\begin_layout Standard
+
+\lang english
Shader engine nv40/nv50
\end_layout
\begin_layout Standard
+
+\lang english
http://nouveau.freedesktop.org/wiki/HonzaHavlicek
\end_layout
\begin_layout Paragraph*
+
+\lang english
SGX
\end_layout
\begin_layout Standard
+
+\lang english
Tiling architecture
\end_layout
\begin_layout Standard
+
+\lang english
+Combined shader with blending and depth test
+\end_layout
+
+\begin_layout Standard
+
+\lang english
\begin_inset Box Shadowbox
position "t"
hor_pos "c"
@@ -2861,28 +3229,40 @@ height_special "totalheight"
status open
\begin_layout Plain Layout
+
+\lang english
Takeaways:
\end_layout
\begin_layout Itemize
+
+\lang english
There are multiple memory domains in a computer, and they are not coherent.
\end_layout
\begin_layout Itemize
+
+\lang english
A GPU is a completely separate computer with its own bus, address space
and computational units.
\end_layout
\begin_layout Itemize
+
+\lang english
Communication between the CPU and GPU is achieved over a bus, which has
non-trivial performance implications.
\end_layout
\begin_layout Itemize
+
+\lang english
GPUs can be programmed using two modes: MMIO and command FIFOs.
\end_layout
\begin_layout Itemize
+
+\lang english
There is no standard output method for display devices.
\end_layout
@@ -2892,6 +3272,8 @@ There is no standard output method for display devices.
\end_layout
\begin_layout Chapter
+
+\lang english
The Big Picture
\begin_inset CommandInset label
LatexCommand label
@@ -2903,16 +3285,22 @@ name "cha:The-Big-Picture"
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Note Note
status open
\begin_layout Plain Layout
+
+\lang english
X, how it works (encapsulating) with indirect (glx) 3D with kernel FB +
picture.
This is how utah-glx used to work.
\end_layout
\begin_layout Plain Layout
+
+\lang english
DRI : bypassing encapsulation for performance-critical operations with kernel
FB + picture
\end_layout
@@ -2923,16 +3311,22 @@ DRI : bypassing encapsulation for performance-critical operations with kernel
\end_layout
\begin_layout Standard
+
+\lang english
The Linux graphics stack has seen numerous evolutions over the years.
The purpose of this section is to detail that history, as well as the justifica
tion behind the changes in order to better motivate the current design.
\end_layout
\begin_layout Section
+
+\lang english
The X11 infrastructure
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float figure
placement tbh
wide false
@@ -2941,6 +3335,8 @@ status open
\begin_layout Plain Layout
\align center
+
+\lang english
\begin_inset ERT
status open
@@ -3055,9 +3451,13 @@ end{tikzpicture}
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
+
+\lang english
The X11 architecture.
\end_layout
@@ -3072,47 +3472,69 @@ The X11 architecture.
\end_layout
\begin_layout Standard
+
+\lang english
DIX (Device-Independent X), DDX (Device-Dependent X),
\end_layout
\begin_layout Standard
+
+\lang english
modules
\end_layout
\begin_layout Standard
+
+\lang english
Xlib
\end_layout
\begin_layout Standard
+
+\lang english
socket
\end_layout
\begin_layout Standard
+
+\lang english
X protocol
\end_layout
\begin_layout Standard
+
+\lang english
X extensions
\end_layout
\begin_layout Standard
+
+\lang english
shm -> shared memory for transport
\end_layout
\begin_layout Standard
+
+\lang english
XCB -> asynchronous
\end_layout
\begin_layout Standard
+
+\lang english
Another notable X extension is Xv, which will be discussed in further detail
in the video decoding chapter.
\end_layout
\begin_layout Section
+
+\lang english
The DRI/DRM infrastructure
\end_layout
\begin_layout Standard
+
+\lang english
Initially (when Linux first supported graphics hardware acceleration), only
a single piece of code would access the graphics card directly: the XFree86
server.
@@ -3128,6 +3550,8 @@ Initially (when Linux first supported graphics hardware acceleration), only
\end_layout
\begin_layout Standard
+
+\lang english
Later on, Utah-GLX, the first hardware-independent 3D accelerated design,
came to Linux.
Utah-GLX basically consists of an additional user space 3D driver implementing
@@ -3142,6 +3566,8 @@ Later on, Utah-GLX, the first hardware-independent 3D accelerated design,
\end_layout
\begin_layout Standard
+
+\lang english
At the same time, framebuffer drivers (which will be detailed in Chapter
\begin_inset CommandInset ref
@@ -3161,10 +3587,14 @@ reference "cha:Framebuffer-Drivers"
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Note Note
status open
\begin_layout Plain Layout
+
+\lang english
help for making figures: http://www.texample.net/tikz/examples/
\end_layout
@@ -3174,6 +3604,8 @@ aide à faire des figures : http://www.texample.net/tikz/examples/
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float figure
placement H
wide false
@@ -3182,6 +3614,8 @@ status open
\begin_layout Plain Layout
\align center
+
+\lang english
\begin_inset ERT
status open
@@ -3372,9 +3806,13 @@ end{tikzpicture}
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
+
+\lang english
\begin_inset CommandInset label
LatexCommand label
name "fig:Early-implementation-of"
@@ -3395,6 +3833,8 @@ Early implementation of the Linux graphics stack using Utah-GLX.
\end_layout
\begin_layout Standard
+
+\lang english
Obviously, this model had drawbacks.
First, it required that unprivileged user space applications be allowed
to access the graphics hardware for 3D.
@@ -3412,6 +3852,8 @@ reference "fig:Early-implementation-of"
\end_layout
\begin_layout Standard
+
+\lang english
To address the reliability and security concerns with the Utah-GLX model,
the DRI model was put together; it was used in both XFree86 and its successor,
X.Org.
@@ -3428,6 +3870,8 @@ To address the reliability and security concerns with the Utah-GLX model,
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float figure
placement H
wide false
@@ -3436,6 +3880,8 @@ status open
\begin_layout Plain Layout
\align center
+
+\lang english
\begin_inset ERT
status open
@@ -3654,9 +4100,13 @@ end{tikzpicture}
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
+
+\lang english
The old picture of the Linux graphics stack.
\end_layout
@@ -3671,6 +4121,8 @@ The old picture of the Linux graphics stack.
\end_layout
\begin_layout Standard
+
+\lang english
The current stack evolved from a new set of needs.
First, requiring the X server to have super-user privileges has always had serious
security implications.
@@ -3685,6 +4137,8 @@ er functionality into the DRM module and second, have X.Org access the graphics
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float figure
placement H
wide false
@@ -3693,6 +4147,8 @@ status open
\begin_layout Plain Layout
\align center
+
+\lang english
\begin_inset ERT
status open
@@ -3909,9 +4365,13 @@ end{tikzpicture}
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
+
+\lang english
The new picture of the Linux graphics stack.
\end_layout
@@ -3926,43 +4386,63 @@ The new picture of the Linux graphics stack.
\end_layout
\begin_layout Standard
+
+\lang english
VT switches
\end_layout
\begin_layout Standard
+
+\lang english
http://dri.sourceforge.net/doc/dri_data_flow.html
\end_layout
\begin_layout Standard
+
+\lang english
http://dri.sourceforge.net/doc/dri_control_flow.html
\end_layout
\begin_layout Standard
+
+\lang english
http://nouveau.freedesktop.org/wiki/GraphicStackOverview
\end_layout
\begin_layout Standard
+
+\lang english
http://people.freedesktop.org/~ajax/dri-explanation.txt
\end_layout
\begin_layout Standard
+
+\lang english
http://dri.sourceforge.net/doc/DRIintro.html
\end_layout
\begin_layout Standard
+
+\lang english
http://jonsmirl.googlepages.com/graphics.html
\end_layout
\begin_layout Standard
+
+\lang english
http://wiki.x.org/wiki/Development/Documentation/Glossary
\end_layout
\begin_layout Standard
+
+\lang english
http://mjules.littleboboy.net/carnet/index.php?post/2006/11/15/89-comment-marche-x1
1-xorg-et-toute-la-clique-5-partie
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Box Shadowbox
position "t"
hor_pos "c"
@@ -3976,20 +4456,28 @@ height_special "totalheight"
status open
\begin_layout Plain Layout
+
+\lang english
Takeaways:
\end_layout
\begin_layout Itemize
+
+\lang english
Applications communicate with X.Org through a specific library which encapsulates
drawing calls.
\end_layout
\begin_layout Itemize
+
+\lang english
The current DRI design has evolved over time in a number of significant
steps.
\end_layout
\begin_layout Itemize
+
+\lang english
In a modern stack, all graphics hardware activity is moderated by a kernel
module, the DRM.
\end_layout
@@ -4000,6 +4488,8 @@ In a modern stack, all graphics hardware activity is moderated by a kernel
\end_layout
\begin_layout Chapter
+
+\lang english
Framebuffer Drivers
\begin_inset CommandInset label
LatexCommand label
@@ -4011,6 +4501,8 @@ name "cha:Framebuffer-Drivers"
\end_layout
\begin_layout Standard
+
+\lang english
Framebuffer drivers are the simplest form of graphics drivers under Linux.
Kernel modesetting DRM drivers are still a relevant option if the only
thing you are after is a basic two-dimensional display.
@@ -4024,10 +4516,14 @@ Framebuffer drivers are the simplest form of graphics drivers under Linux.
\end_layout
\begin_layout Standard
+
+\lang english
At the core, a framebuffer driver implements the following functionality:
\end_layout
\begin_layout Itemize
+
+\lang english
Modesetting.
Modesetting consists of configuring the video mode to get a picture on the
screen.
@@ -4035,6 +4531,8 @@ Modesetting.
\end_layout
\begin_layout Itemize
+
+\lang english
Optional 2D acceleration.
Framebuffer drivers can provide basic 2D acceleration used to accelerate
the Linux console.
@@ -4045,6 +4543,8 @@ Optional 2d acceleration.
\end_layout
\begin_layout Standard
+
+\lang english
By implementing only these two pieces, framebuffer drivers remain the simplest
and most amenable form of Linux graphics drivers.
Framebuffer drivers do not always rely on a specific card model (like nvidia
@@ -4056,27 +4556,39 @@ By implementing only these two pieces, framebuffer drivers remain the simplest
\end_layout
\begin_layout Standard
+
+\lang english
http://www.linux-fbdev.org/HOWTO/index.html
\end_layout
\begin_layout Section
+
+\lang english
Creating a framebuffer driver
\end_layout
\begin_layout Standard
+
+\lang english
struct platform_driver with a probe function
\end_layout
\begin_layout Standard
+
+\lang english
The probe function is in charge of creating the fb_info struct and calling
register_framebuffer() on it.
\end_layout
\begin_layout Section
+
+\lang english
Framebuffer operations
\end_layout
\begin_layout Standard
+
+\lang english
The framebuffer operations structure is how non-modesetting framebuffer
callbacks are set.
Different callbacks can be set depending on what functionality you wish
@@ -4085,6 +4597,8 @@ The framebuffer operations structure is how non-modesetting framebuffer
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset ERT
status open
@@ -4114,10 +4628,14 @@ end{lstlisting}{}
\end_layout
\begin_layout Standard
+
+\lang english
/* set color register */
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset ERT
status open
@@ -4146,10 +4664,14 @@ end{lstlisting}{}
\end_layout
\begin_layout Standard
+
+\lang english
/* set color registers in batch */
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset ERT
status open
@@ -4178,10 +4700,14 @@ end{lstlisting}{}
\end_layout
\begin_layout Standard
+
+\lang english
/* blank display */
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset ERT
status open
@@ -4210,10 +4736,14 @@ end{lstlisting}{}
\end_layout
\begin_layout Standard
+
+\lang english
/* pan display */
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset ERT
status open
@@ -4243,10 +4773,14 @@ end{lstlisting}{}
\end_layout
\begin_layout Standard
+
+\lang english
/* Draws a rectangle */
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset ERT
status open
@@ -4275,10 +4809,14 @@ end{lstlisting}{}
\end_layout
\begin_layout Standard
+
+\lang english
/* Copy data from one area to another */
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset ERT
status open
@@ -4307,10 +4845,14 @@ end{lstlisting}{}
\end_layout
\begin_layout Standard
+
+\lang english
/* Draws an image to the display */
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset ERT
status open
@@ -4339,10 +4881,14 @@ end{lstlisting}{}
\end_layout
\begin_layout Standard
+
+\lang english
/* Draws cursor */
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset ERT
status open
@@ -4371,10 +4917,14 @@ end{lstlisting}{}
\end_layout
\begin_layout Standard
+
+\lang english
/* Rotates the display */
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset ERT
status open
@@ -4403,10 +4953,14 @@ end{lstlisting}{}
\end_layout
\begin_layout Standard
+
+\lang english
/* wait for blit idle, optional */
\end_layout
\begin_layout Standard
+
+\lang english
Note that common framebuffer functions (cfb) are available if you do not
want to implement everything for your device specifically.
These functions are cfb_fillrect, cfb_copyarea and cfb_imageblit and will
@@ -4415,6 +4969,8 @@ Note that common framebuffer functions (cfb) are available if you do not
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Box Shadowbox
position "t"
hor_pos "c"
@@ -4428,20 +4984,28 @@ height_special "totalheight"
status open
\begin_layout Plain Layout
+
+\lang english
Takeaways:
\end_layout
\begin_layout Itemize
+
+\lang english
Framebuffer drivers are the simplest form of Linux graphics driver, requiring
little implementation work.
\end_layout
\begin_layout Itemize
+
+\lang english
Framebuffer drivers deliver a low memory footprint and thus are useful for
embedded devices.
\end_layout
\begin_layout Itemize
+
+\lang english
Implementing acceleration is optional as software fallback functions exist.
\end_layout
@@ -4451,6 +5015,8 @@ Implementing acceleration is optional as software fallback functions exist.
\end_layout
\begin_layout Chapter
+
+\lang english
The DRM Kernel Module
\begin_inset CommandInset label
LatexCommand label
@@ -4462,27 +5028,37 @@ name "cha:The-DRM-Kernel"
\end_layout
\begin_layout Standard
+
+\lang english
The use of a kernel module is a requirement in a complex world.
The kernel module, or DRM, has multiple purposes:
\end_layout
\begin_layout Itemize
+
+\lang english
Share the rendering hardware between multiple user space components, and
arbitrate access.
\end_layout
\begin_layout Itemize
+
+\lang english
Enforce security by preventing applications from performing DMA to arbitrary
memory regions, and more generally programming the card in any way that
could result in a security hole.
\end_layout
\begin_layout Itemize
+
+\lang english
Manage the memory of the card, by providing video memory allocation functionalit
y to user space.
\end_layout
\begin_layout Itemize
+
+\lang english
More recently, DRM was improved to perform modesetting.
This simplifies the situation where both the DRM and the framebuffer driver
access the card by removing the framebuffer driver and implementing in
@@ -4490,21 +5066,29 @@ More recently, DRM was improve to achieve modesetting.
\end_layout
\begin_layout Itemize
+
+\lang english
Put critical initialization of the card in the kernel, for example by uploading
firmware or setting up DMA areas.
\end_layout
\begin_layout Standard
+
+\lang english
Kernel module (DRM)
\end_layout
\begin_layout Standard
+
+\lang english
Global DRI/DRM user space/kernel scheme (figure with libdrm - drm - entry
points - multiple user space apps)
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float figure
placement H
wide false
@@ -4513,6 +5097,8 @@ status open
\begin_layout Plain Layout
\align center
+
+\lang english
\begin_inset ERT
status open
@@ -4652,9 +5238,13 @@ end{tikzpicture}
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
+
+\lang english
Accessing the DRM through libdrm.
\end_layout
@@ -4669,6 +5259,8 @@ Accessing the DRM through libdrm.
\end_layout
\begin_layout Standard
+
+\lang english
When designing a Linux graphics driver aiming for more than simple framebuffer
support, a DRM component is the first thing to write.
One should derive a design that is both efficient and enforces security.
@@ -4678,10 +5270,14 @@ When designing a Linux graphics driver aiming for more than simple framebuffer
\end_layout
\begin_layout Section
+
+\lang english
Hardware sharing
\end_layout
\begin_layout Standard
+
+\lang english
Multiplexing of the card command fifo - For cards which only feature a single
hardware command submission fifo, it has to be shared between multiple
user space components.
@@ -4689,28 +5285,52 @@ Multiplexing of the card command fifo - For cards which only feature a single
\end_layout
\begin_layout Standard
+
+\lang english
Prevent simultaneous access to the same hw
\end_layout
+\begin_layout Standard
+
+\lang english
+Share video memory
+\end_layout
+
\begin_layout Section
+
+\lang english
Security
\end_layout
\begin_layout Standard
+
+\lang english
Prevent arbitrary DMAs to memory.
If the hardware does not feature memory protection, you have to check the
command stream before submitting it to the GPU.
\end_layout
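As an illustration of the command stream check described above, here is a minimal sketch. The command layout, the opcodes and the notion of a per-client memory window are all assumptions made for the example; real hardware formats differ. The idea is only that the kernel walks the stream and refuses to submit it if any command is unknown or would make the GPU touch memory outside what the client owns.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command layout; real hardware formats differ. */
enum { CMD_NOP = 0, CMD_DMA_COPY = 1 };

struct gpu_cmd {
    uint32_t opcode;
    uint64_t addr;  /* start address the GPU would access */
    uint64_t len;   /* number of bytes accessed */
};

/* Memory window this client is allowed to touch (hypothetical). */
struct mem_window {
    uint64_t base;
    uint64_t size;
};

/* True if [addr, addr+len) lies inside the window, without integer overflow. */
static bool range_ok(const struct mem_window *w, uint64_t addr, uint64_t len)
{
    if (addr < w->base)
        return false;
    uint64_t off = addr - w->base;
    return off <= w->size && len <= w->size - off;
}

/* Reject the whole stream if any command is unknown or reaches outside the window. */
bool validate_stream(const struct gpu_cmd *cmds, size_t n, const struct mem_window *w)
{
    for (size_t i = 0; i < n; i++) {
        switch (cmds[i].opcode) {
        case CMD_NOP:
            break;
        case CMD_DMA_COPY:
            if (!range_ok(w, cmds[i].addr, cmds[i].len))
                return false;
            break;
        default:
            return false;   /* unknown opcodes are never forwarded */
        }
    }
    return true;
}
```

The range check subtracts instead of adding so that a huge length cannot wrap around and sneak past the comparison.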
\begin_layout Section
+
+\lang english
Memory management
\end_layout
+\begin_layout Standard
+
+\lang english
+GEM, TTM
+\end_layout
+
\begin_layout Section
+
+\lang english
Modesetting
\end_layout
\begin_layout Standard
+
+\lang english
Modesetting is the act of setting a mode on the card to display.
This can range from extremely simple procedures (calling a VGA interrupt
or VESA call is a basic form of modesetting) to directly programming the
@@ -4721,6 +5341,8 @@ Modesetting is the act of setting a mode on the card to display.
\end_layout
\begin_layout Standard
+
+\lang english
However, these days it makes more sense to put it in the kernel once and
for all, and share it between different GPU users (framebuffer drivers,
DDXes, EGL stacks...).
@@ -4731,57 +5353,81 @@ However, these days it makes more sense to put it in the kernel once and
\end_layout
\begin_layout Subsubsection*
+
+\lang english
Crtc
\end_layout
\begin_layout Standard
+
+\lang english
The CRTC is in charge of reading the framebuffer memory and routing the data
to an encoder.
\end_layout
\begin_layout Subsubsection*
+
+\lang english
Encoder
\end_layout
\begin_layout Standard
+
+\lang english
The encoder encodes the pixel data for a connector.
\end_layout
\begin_layout Subsubsection*
+
+\lang english
Connector
\end_layout
\begin_layout Standard
+
+\lang english
The connector is the physical output on the card (DVI, D-sub, S-Video...).
Notice that connectors can get their data from multiple encoders (for example
DVI-I which can feed both analog and digital signals).
\end_layout
\begin_layout Standard
+
+\lang english
Also, on embedded or old hardware, it is common to have encoders and connectors
merged for simplicity/power efficiency reasons.
\end_layout
\begin_layout Standard
+
+\lang english
+++ Add a crtc-encoder-connector diagram here
\end_layout
\begin_layout Section
+
+\lang english
libdrm
\end_layout
\begin_layout Standard
+
+\lang english
libdrm is a small (but growing) component that interfaces between user space
and the DRM module, and allows calling into the entry points.
\end_layout
\begin_layout Standard
+
+\lang english
Obviously, security should not rely on components from libdrm because it
is an unprivileged user space component.
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Box Shadowbox
position "t"
hor_pos "c"
@@ -4795,19 +5441,27 @@ height_special "totalheight"
status open
\begin_layout Plain Layout
+
+\lang english
Takeaways:
\end_layout
\begin_layout Itemize
+
+\lang english
The DRM manages all graphics activity in a modern Linux graphics stack.
\end_layout
\begin_layout Itemize
+
+\lang english
It is the only trusted piece of the stack and is responsible for security.
Therefore it shall not trust the other components.
\end_layout
\begin_layout Itemize
+
+\lang english
It provides basic graphics functionality: modesetting, framebuffer driver,
memory management.
\end_layout
@@ -4818,6 +5472,8 @@ It provides basic graphics functionality: modesetting, framebuffer driver,
\end_layout
\begin_layout Chapter
+
+\lang english
X.Org Drivers
\begin_inset CommandInset label
LatexCommand label
@@ -4829,10 +5485,14 @@ name "cha:X.Org-Drivers"
\end_layout
\begin_layout Standard
+
+\lang english
This chapter covers the implementation of 2D acceleration inside X.Org.
\end_layout
\begin_layout Standard
+
+\lang english
There are multiple ways to implement a 2D X.Org driver: ShadowFB, XAA, EXA.
Another simple way of implementing X.Org support is through the FBDev module.
This module implements X.Org on top of an existing, in-kernel framebuffer
@@ -4840,18 +5500,26 @@ There are multiple ways to implement a 2D X.Org driver: ShadowFB, XAA, EXA.
\end_layout
\begin_layout Standard
+
+\lang english
http://www.x.org/wiki/DriverDevelopment
\end_layout
\begin_layout Section
+
+\lang english
Initializing a driver
\end_layout
\begin_layout Section
+
+\lang english
ShadowFB acceleration
\end_layout
\begin_layout Standard
+
+\lang english
ShadowFB provides no acceleration proper; instead, a copy of the framebuffer is kept
in system memory.
The driver implements a single hook that copies graphics from system to
@@ -4861,20 +5529,28 @@ ShadowFB provides no acceleration proper, a copy of the framebuffer is kept
\end_layout
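The copy from the shadow in system memory to video memory can be sketched as follows. This is not the X.Org ShadowFB hook itself: the box structure and the function signature are hypothetical, chosen only to show that the hook boils down to copying the damaged rows, with pitches that may differ between the shadow and the real framebuffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical dirty rectangle, in pixels; x2/y2 are exclusive. */
struct box { int x1, y1, x2, y2; };

/*
 * Copy one damaged rectangle from the shadow copy (system memory) to the
 * real framebuffer. Pitches are in bytes and may differ between the two.
 */
void shadow_update(uint8_t *fb, size_t fb_pitch,
                   const uint8_t *shadow, size_t shadow_pitch,
                   const struct box *b, size_t bytes_per_pixel)
{
    size_t row_bytes = (size_t)(b->x2 - b->x1) * bytes_per_pixel;
    size_t x_off = (size_t)b->x1 * bytes_per_pixel;

    for (int y = b->y1; y < b->y2; y++) {
        memcpy(fb + (size_t)y * fb_pitch + x_off,
               shadow + (size_t)y * shadow_pitch + x_off,
               row_bytes);
    }
}
```

All rendering happens in the shadow with plain CPU writes to cached memory; only the final, sequential copy touches video memory, which is what keeps this simple scheme reasonably fast.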
\begin_layout Standard
+
+\lang english
Despite the name, ShadowFB is not to be confused with the kernel framebuffer
drivers.
\end_layout
\begin_layout Standard
+
+\lang english
Although ShadowFB is a very basic design, it can result in a more efficient
and responsive desktop than an incomplete implementation of EXA.
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Note Note
status open
\begin_layout Plain Layout
+
+\lang english
Insert a picture showing shadowfb propagation
\end_layout
@@ -4884,6 +5560,8 @@ Insérer une image avec la propagation shadowfb
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float figure
wide false
sideways false
@@ -4892,6 +5570,8 @@ status open
\begin_layout Plain Layout
\noindent
\align center
+
+\lang english
\begin_inset ERT
status open
@@ -5058,9 +5738,13 @@ end{tikzpicture}
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
+
+\lang english
Shadowfb acceleration.
\end_layout
@@ -5075,38 +5759,102 @@ Shadowfb acceleration.
\end_layout
\begin_layout Section
+
+\lang english
XAA acceleration
\end_layout
\begin_layout Standard
+
+\lang english
Scanline based acceleration
\end_layout
\begin_layout Standard
+
+\lang english
Offscreen area, same pitch as the screen
\end_layout
\begin_layout Section
+
+\lang english
EXA acceleration
\end_layout
\begin_layout Standard
-Adapted from KAA from Kdrive
+
+\lang english
+EXA is an interface inside X.Org implemented by drivers for 2D acceleration.
+ It was originally designed as KAA in the Kdrive X server, and then was
+ adapted into X.Org.
+ The interface used is pretty simple; for each acceleration function three
+ hooks are available: PrepareAction, Action and FinishAction.
+ PrepareAction is called once before the operation is used.
+ Action can be called many times in a row after a single PrepareAction call
+ for different surfaces.
+ FinishAction is called after all the Action calls have been made.
+ The number of Action calls can be just one or many, but the PrepareAction
+ and FinishAction functions will always be called once, first and last.
+ The PrepareAction functions return a boolean, and can return false if they
+ fail at accelerating the specific type of operation, in which case a software
+ fallback is used instead.
+ Otherwise the function returns true and subsequent Action calls are expected
+ to succeed.
+\end_layout
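The calling sequence described above can be sketched as follows. This is not the EXA API: the structure, the hook names and the counting driver are hypothetical and only model the protocol, namely one prepare call, any number of action calls, one finish call, and a software fallback when prepare returns false.

```c
#include <stdbool.h>
#include <stddef.h>

struct rect { int x1, y1, x2, y2; };

/* Hypothetical driver hooks following the Prepare/Action/Finish pattern. */
struct solid_ops {
    bool (*prepare)(void *drv, unsigned long color);
    void (*action)(void *drv, const struct rect *r);
    void (*finish)(void *drv);
};

/* Stand-in for the software path used when the driver declines. */
typedef void (*sw_fill_fn)(void *drv, const struct rect *r, unsigned long color);

/* Returns true if the fill was accelerated, false if the fallback ran. */
bool fill_rects(const struct solid_ops *ops, void *drv,
                const struct rect *rects, size_t n,
                unsigned long color, sw_fill_fn sw_fill)
{
    if (!ops->prepare(drv, color)) {
        for (size_t i = 0; i < n; i++)
            sw_fill(drv, &rects[i], color);
        return false;
    }
    for (size_t i = 0; i < n; i++)
        ops->action(drv, &rects[i]);   /* many actions per single prepare */
    ops->finish(drv);                  /* exactly once, after the last action */
    return true;
}

/* Tiny fake driver that only counts calls, to make the sequence visible. */
struct counters { int prepares, actions, finishes, fallbacks; bool accept; };

bool cnt_prepare(void *drv, unsigned long color)
{
    struct counters *c = drv;
    (void)color;
    c->prepares++;
    return c->accept;
}

void cnt_action(void *drv, const struct rect *r)
{
    struct counters *c = drv;
    (void)r;
    c->actions++;
}

void cnt_finish(void *drv)
{
    struct counters *c = drv;
    c->finishes++;
}

void cnt_sw_fill(void *drv, const struct rect *r, unsigned long color)
{
    struct counters *c = drv;
    (void)r;
    (void)color;
    c->fallbacks++;
}

const struct solid_ops counting_ops = { cnt_prepare, cnt_action, cnt_finish };
```

Splitting the work this way lets the expensive state setup (colors, raster operation, destination surface) be paid once in prepare, while each action only has to emit the coordinates of one rectangle.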
+
+\begin_layout Standard
+
+\lang english
+The main EXA acceleration functions are detailed below.
\end_layout
\begin_layout Standard
-Simple interface : Prepare/Act/Finish for each acceleration function
+
+\lang english
+\begin_inset ERT
+status open
+
+\begin_layout Plain Layout
+
+
+\backslash
+begin{lstlisting}{}
+\end_layout
+
+\begin_layout Plain Layout
+
+Bool (*PrepareSolid) (PixmapPtr pPixmap, int alu, Pixel planemask, Pixel
+ fg);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+end{lstlisting}{}
+\end_layout
+
+\end_inset
+
+
\end_layout
\begin_layout Standard
+
+\lang english
Solid - fill an area with a solid color (RGBA)
\end_layout
\begin_layout Standard
+
+\lang english
Copy - copies a rectangle area from and to video memory
\end_layout
\begin_layout Standard
+
+\lang english
Composite - optional interface used to achieve composite operations like
blending.
This allows accelerating 2D desktop effects like blending, scaling, operations
@@ -5114,18 +5862,44 @@ Composite - optional interface used to achieve composite operations like
\end_layout
\begin_layout Standard
+
+\lang english
UploadToScreen - copies an area from system memory to video memory
\end_layout
\begin_layout Standard
+
+\lang english
DownloadFromScreen - copies an area from video memory to system memory
\end_layout
\begin_layout Standard
-Pixmap migration issues
+
+\lang english
+PrepareAccess - makes the pixmap accessible from the CPU.
+ This includes mapping it into memory, copying it from unmappable video
+ memory, untiling the pixmap...
+\end_layout
+
+\begin_layout Standard
+
+\lang english
+FinishAccess - is called once the pixmap is done being accessed, and must
+ do the opposite of PrepareAccess.
+\end_layout
+
+\begin_layout Standard
+
+\lang english
+EXA Pixmap migration.
+ EXA tries to be smart about pixmap migration, and will only migrate the
+ parts of a pixmap that are required for an operation.
+ Migration heuristics: Greedy/Mixed/Driver.
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Box Shadowbox
position "t"
hor_pos "c"
@@ -5139,21 +5913,35 @@ height_special "totalheight"
status open
\begin_layout Plain Layout
+
+\lang english
Takeaways:
\end_layout
\begin_layout Itemize
+
+\lang english
Multiple choices exist for accelerating 2D in X.Org.
\end_layout
\begin_layout Itemize
+
+\lang english
The most efficient one is EXA, which puts all the smart optimizations in
a common piece of code, and leaves the driver implementation very simple.
\end_layout
\begin_layout Itemize
-If your card cannot accelerate 2D operations, shadowfb is probably the path
- to take.
+
+\lang english
+Today, most 2D acceleration is implemented using the 3D engine of the graphics
+ card.
+\end_layout
+
+\begin_layout Itemize
+
+\lang english
+If your card cannot accelerate 2D operations, shadowfb is the path to take.
\end_layout
\end_inset
@@ -5162,6 +5950,8 @@ If your card cannot accelerate 2D operations, shadowfb is probably the path
\end_layout
\begin_layout Chapter
+
+\lang english
Video Decoding
\begin_inset CommandInset label
LatexCommand label
@@ -5173,68 +5963,124 @@ name "cha:Video-Decoding"
\end_layout
\begin_layout Section
+
+\lang english
+Video Standards
+\end_layout
+
+\begin_layout Standard
+
+\lang english
+H262 (mpeg 2, DVD)
+\end_layout
+
+\begin_layout Standard
+
+\lang english
+H263 (divx/mpeg4)
+\end_layout
+
+\begin_layout Standard
+
+\lang english
+H264 (used on blu-ray)
+\end_layout
+
+\begin_layout Section
+
+\lang english
Video decoding pipeline
\end_layout
\begin_layout Standard
+
+\lang english
Two typical video pipelines: MPEG-2 and H.264
\end_layout
\begin_layout Paragraph*
-The MPEG2 decoding pipeline
+
+\lang english
+The H262 decoding pipeline
\end_layout
\begin_layout Standard
+
+\lang english
iDCT -> MC -> CSC -> Final display
\end_layout
\begin_layout Paragraph*
+
+\lang english
The H.264 decoding pipeline
\end_layout
\begin_layout Standard
+
+\lang english
entropy decoding -> iDCT -> MC -> CSC -> Final display
\end_layout
\begin_layout Subsection
+
+\lang english
Entropy
\end_layout
\begin_layout Standard
+
+\lang english
Entropy encoding is a lossless compression phase.
It is the last stage of encoding and therefore also the first stage of
decoding.
\end_layout
\begin_layout Standard
+
+\lang english
CABAC/CAVLC
\end_layout
\begin_layout Subsection
+
+\lang english
Inverse DCT
\end_layout
\begin_layout Subsection
+
+\lang english
Motion Compensation
\end_layout
\begin_layout Subsection
+
+\lang english
Color Space Conversion
\end_layout
\begin_layout Standard
+
+\lang english
Color spaces
\end_layout
\begin_layout Standard
+
+\lang english
Linear relation
\end_layout
\begin_layout Standard
+
+\lang english
Conversion matrices
\end_layout
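Because the relation between color spaces is linear, converting one pixel amounts to a small matrix multiplication. The sketch below assumes the full-range BT.601 coefficients (the ones used by JFIF), with U standing for Cb and V for Cr; the text does not say which matrix a given decoder uses, so treat the choice of coefficients as an assumption of this example.

```c
#include <stdint.h>

/* Round to nearest and clamp to the 0..255 range of an 8-bit channel. */
static uint8_t to_u8(double v)
{
    if (v < 0.0)
        return 0;
    if (v > 255.0)
        return 255;
    return (uint8_t)(v + 0.5);
}

/*
 * Convert one YUV pixel to RGB using the full-range BT.601 matrix
 * (an assumption of this sketch). U and V are centered on 128.
 */
void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                uint8_t *r, uint8_t *g, uint8_t *b)
{
    double d = (double)u - 128.0;   /* blue-difference chroma */
    double e = (double)v - 128.0;   /* red-difference chroma */

    *r = to_u8(y + 1.402 * e);
    *g = to_u8(y - 0.344136 * d - 0.714136 * e);
    *b = to_u8(y + 1.772 * d);
}
```

With neutral chroma (U = V = 128) the three outputs equal Y, which is a quick sanity check for any conversion matrix. On a GPU the same arithmetic is typically done per pixel by dedicated hardware or by a shader.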
\begin_layout Standard
+
+\lang english
The YUV color space: 1 component luminance (Y) + 2 components chrominance
(UV).
Chrominance information is less relevant to the eye than luminance, so
@@ -5244,32 +6090,446 @@ The YUV color space: 1 component luminance (Y) + 2 components chrominance
\end_layout
\begin_layout Standard
+
+\lang english
Bandwidth gain (RGBA32 vs YV12)
\end_layout
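The bandwidth gain can be made concrete by comparing frame sizes. The helper names below are made up for this sketch; the arithmetic follows directly from the formats: RGBA32 stores four bytes per pixel, while YV12 stores one full-resolution Y plane plus U and V planes subsampled by two in each direction.

```c
#include <stddef.h>

/* Bytes for one RGBA32 frame: four bytes per pixel. */
size_t rgba32_frame_bytes(size_t width, size_t height)
{
    return width * height * 4;
}

/*
 * Bytes for one YV12 frame: a full-resolution Y plane plus U and V planes
 * subsampled by two horizontally and vertically (odd sizes round up).
 */
size_t yv12_frame_bytes(size_t width, size_t height)
{
    size_t cw = (width + 1) / 2;
    size_t ch = (height + 1) / 2;
    return width * height + 2 * cw * ch;
}
```

For a 720x576 frame this gives 1658880 bytes in RGBA32 against 622080 bytes in YV12, that is 12 bits per pixel instead of 32.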
\begin_layout Standard
-YUV Planar and packed (interlaced) formats
+
+\lang english
+YUV planar and packed (interleaved) formats.
\end_layout
\begin_layout Standard
+
+\lang english
+\begin_inset Float figure
+wide false
+sideways false
+status open
+
+\begin_layout Plain Layout
+\noindent
+
+\lang english
+\begin_inset ERT
+status open
+
+\begin_layout Plain Layout
+
+
+\backslash
+begin{tikzpicture}[node distance=1cm, auto]
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+tikzset{ mynode/.style={rectangle,rounded corners,draw=black, top color=white
+, bottom color=yellow!50,very thick, inner sep=1em, minimum size=3em, text
+ centered, drop shadow}, myarrow/.style={->, >=latex', shorten >=1pt,
+ thick}, mylabel/.style={text width=7em, text centered} }
+\end_layout
+
+\begin_layout Plain Layout
+
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+tikz{
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[top color=white, bottom color=yellow!50, drop shadow,very thick, inner
+ sep=1em] (0,2) rectangle (5,6);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[top color=white, bottom color=yellow!50, drop shadow,very thick, inner
+ sep=1em] (6,2) rectangle (8.5,4);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[top color=white, bottom color=yellow!50, drop shadow,very thick, inner
+ sep=1em] (10,2) rectangle (12.5,4);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node at (2.5,1.5) {Y plane};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node at (7.25,1.5) {U plane};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node at (11.25,1.5) {V plane};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node at (1.25,5.5) {$Y_{0}$ $Y_{1}$ $Y_{2}$ $
+\backslash
+cdots$};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node at (6.75,3.5) {$U_{0}$ $U_{1}$ $
+\backslash
+cdots$};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node at (10.75,3.5) {$V_{0}$ $V_{1}$ $
+\backslash
+cdots$};
+\end_layout
+
+\begin_layout Plain Layout
+
+% dummy node so that the caption does not stick to the figure
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node at (6,0.5) { };
+\end_layout
+
+\begin_layout Plain Layout
+
+}
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+end{tikzpicture}
+\end_layout
+
+\begin_layout Plain Layout
+
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Plain Layout
+\noindent
+\align left
+
+\lang english
+\begin_inset ERT
+status open
+
+\begin_layout Plain Layout
+
+
+\backslash
+begin{tikzpicture}[node distance=1cm, auto]
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+tikzset{ mynode/.style={rectangle,rounded corners,draw=black, top color=white
+, bottom color=yellow!50,very thick, inner sep=1em, minimum size=3em, text
+ centered, drop shadow}, myarrow/.style={->, >=latex', shorten >=1pt,
+ thick}, mylabel/.style={text width=7em, text centered} }
+\end_layout
+
+\begin_layout Plain Layout
+
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+tikz{
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[top color=white, bottom color=yellow!50, drop shadow,very thick, inner
+ sep=1em] (0,2) rectangle (5,6);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[top color=white, bottom color=yellow!50, drop shadow,very thick, inner
+ sep=1em] (6,2) rectangle (11,4);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node at (2.5,1.5) {Y plane};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node at (7.25,1.5) {UV plane};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node at (1.25,5.5) {$Y_{0}$ $Y_{1}$ $Y_{2}$ $
+\backslash
+cdots$};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node at (7.25,3.5) {$U_{0}$ $V_{0}$ $U_{1}$ $V_{1}$ $
+\backslash
+cdots$};
+\end_layout
+
+\begin_layout Plain Layout
+
+% dummy node so that the caption is not glued to the figure
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node at (6,0.5) { };
+\end_layout
+
+\begin_layout Plain Layout
+
+}
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+end{tikzpicture}
+\end_layout
+
+\begin_layout Plain Layout
+
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Plain Layout
+\noindent
+\align left
+
+\lang english
+\begin_inset ERT
+status open
+
+\begin_layout Plain Layout
+
+
+\backslash
+begin{tikzpicture}[node distance=1cm, auto]
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+tikzset{ mynode/.style={rectangle,rounded corners,draw=black, top color=white
+, bottom color=yellow!50,very thick, inner sep=1em, minimum size=3em, text
+ centered, drop shadow}, myarrow/.style={->, >=latex', shorten >=1pt,
+ thick}, mylabel/.style={text width=7em, text centered} }
+\end_layout
+
+\begin_layout Plain Layout
+
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+tikz{
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[top color=white, bottom color=yellow!50, drop shadow,very thick, inner
+ sep=1em] (0,2) rectangle (10,6);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node at (2.5,1.5) {YUV plane};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node at (1.25,5.5) {$Y_{0}$ $U_{0}$ $Y_{1}$ $V_{0}$ $Y_{2}$ $U_{1}$ $Y_{3}$
+ $V_{1}$ $
+\backslash
+cdots$};
+\end_layout
+
+\begin_layout Plain Layout
+
+% dummy node so that the caption is not glued to the figure
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node at (6,0.5) { };
+\end_layout
+
+\begin_layout Plain Layout
+
+}
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+end{tikzpicture}
+\end_layout
+
+\begin_layout Plain Layout
+
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Plain Layout
+\noindent
+
+\lang english
+\begin_inset Caption
+
+\begin_layout Plain Layout
+
+\lang english
+YUV layouts in memory: planar format example (YV12, top), partially interleaved
+ format example (NV12, middle), fully interleaved format example (YUY2,
+ bottom).
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\lang english
+\begin_inset Note Note
+status open
+
+\begin_layout Plain Layout
+
+\lang english
+Figure: diagram of planar vs. interleaved layouts
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\lang english
Plane order (YV12 vs NV12)
\end_layout
\begin_layout Standard
+
+\lang english
Order of the planes (YV12, I420)
\end_layout
\begin_layout Standard
+
+\lang english
http://en.wikipedia.org/wiki/YUV
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
+
+\lang english
\begin_inset Formula $\left[\begin{array}{c}
R\\
G\\
@@ -5287,6 +6547,8 @@ V\end{array}\right]$
status open
\begin_layout Plain Layout
+
+\lang english
filler: check the formula
\end_layout
@@ -5296,9 +6558,13 @@ filler: check the formula
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
+
+\lang english
\begin_inset CommandInset label
LatexCommand label
name "fig:YUV-to-RGB"
@@ -5319,12 +6585,16 @@ YUV to RGB conversion formula as per ITU-R Recommendation BT.601.
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
+
+\lang english
\begin_inset Formula $\left[\begin{array}{c}
R\\
G\\
@@ -5342,6 +6612,8 @@ V\end{array}\right]$
status open
\begin_layout Plain Layout
+
+\lang english
filler: check the formula, it cannot be the same as for 601
\end_layout
@@ -5351,9 +6623,13 @@ filler: check the formula, it cannot be the same as for 601
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
+
+\lang english
\begin_inset CommandInset label
LatexCommand label
name "fig:YUV-to-RGB-1"
@@ -5391,18 +6667,26 @@ reference "fig:YUV-to-RGB"
\end_layout
\begin_layout Standard
+
+\lang english
http://www.fourcc.org/yuv.php
\end_layout
\begin_layout Standard
+
+\lang english
http://www.glennchan.info/articles/articles.html
\end_layout
\begin_layout Standard
+
+\lang english
http://www.poynton.com/papers/SMPTE_98_YYZ_Luma/index.html
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float table
wide false
sideways false
@@ -5410,6 +6694,8 @@ status open
\begin_layout Plain Layout
\align center
+
+\lang english
\begin_inset Tabular
<lyxtabular version="3" rows="6" columns="4">
<features>
@@ -5422,6 +6708,8 @@ status open
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
Format name
\end_layout
@@ -5431,6 +6719,8 @@ Format name
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
Y:U:V bits per pixel
\end_layout
@@ -5440,6 +6730,8 @@ Y:U:V bits per pixel
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
Layout
\end_layout
@@ -5449,6 +6741,8 @@ Layout
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
Comments
\end_layout
@@ -5460,6 +6754,8 @@ Comments
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
YV12
\end_layout
@@ -5469,6 +6765,8 @@ YV12
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
8:2:2
\end_layout
@@ -5478,6 +6776,8 @@ YV12
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
1 Y plane, 1 V 2*2 sub-sampled plane, 1 U 2*2 sub-sampled plane
\end_layout
@@ -5487,6 +6787,8 @@ YV12
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
Same as I420 except U and V are reversed.
\end_layout
@@ -5498,6 +6800,8 @@ Same as I420 except U and V are reversed.
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
I420
\end_layout
@@ -5507,6 +6811,8 @@ I420
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
8:2:2
\end_layout
@@ -5516,6 +6822,8 @@ I420
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
1 Y plane, 1 U 2*2 sub-sampled plane, 1 V 2*2 sub-sampled plane
\end_layout
@@ -5525,6 +6833,8 @@ I420
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
Same as YV12 except U and V are reversed.
\end_layout
@@ -5536,6 +6846,8 @@ Same as YV12 except U and V are reversed.
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
NV12
\end_layout
@@ -5545,6 +6857,8 @@ NV12
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
8:2:2
\end_layout
@@ -5554,6 +6868,8 @@ NV12
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
1 Y plane, 1 packed U+V 2*2 sub-sampled plane
\end_layout
@@ -5563,6 +6879,8 @@ NV12
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
Convenient for hardware implementation on 3D-capable GPUs
\end_layout
@@ -5574,6 +6892,8 @@ Convenient for hardware implementation on 3D-capable GPUs
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
YUY2 (YUYV)
\end_layout
@@ -5583,6 +6903,8 @@ YUY2 (YUYV)
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
8:4:4
\end_layout
@@ -5592,6 +6914,8 @@ YUY2 (YUYV)
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
1 Packed YUV plane
\end_layout
@@ -5601,6 +6925,8 @@ YUY2 (YUYV)
\begin_inset Text
\begin_layout Plain Layout
+
+\lang english
Packed as Y0U0Y1V0
\end_layout
@@ -5653,9 +6979,13 @@ Packed as Y0U0Y1V0
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
+
+\lang english
Common YUV color space formats
\end_layout
@@ -5670,10 +7000,14 @@ Common YUV color space formats
\end_layout
\begin_layout Standard
+
+\lang english
Pixel scaling
\end_layout
\begin_layout Standard
+
+\lang english
Since the conversion from YUV space to RGB space is linear, filtered scaling
can be done in either YUV or RGB space. This conveniently allows using
the texture filtering available on 3D hardware to sample the YUV data.
@@ -5689,6 +7023,8 @@ Since the conversion from YUV space to RGB space is linear, filtered scaling
\end_layout
\begin_layout Standard
+
+\lang english
If the hardware cannot achieve color space conversion and scaling at the
same time (for example if you have a YUV->RGB blitter and a shaderless
3D engine), the linearity of the color conversion again allows you to do the scaling
@@ -5696,14 +7032,20 @@ If the hardware cannot achieve color space conversion and scaling at the
\end_layout
\begin_layout Section
+
+\lang english
Video decoding APIs
\end_layout
\begin_layout Paragraph*
+
+\lang english
Xv
\end_layout
\begin_layout Standard
+
+\lang english
Xv is simply about color space conversion (CSC) and scaling.
In order to implement Xv, a typical X.Org driver will have to implement
this color space conversion.
@@ -5718,48 +7060,70 @@ Xv is simply about color space conversion (CSC) and scaling.
\end_layout
\begin_layout Paragraph*
+
+\lang english
XvMC
\end_layout
\begin_layout Standard
+
+\lang english
IDCT + MC + CSC: inverse discrete cosine transform, motion compensation and color space conversion.
\end_layout
\begin_layout Paragraph*
+
+\lang english
VAAPI
\end_layout
\begin_layout Standard
+
+\lang english
VAAPI was initially created for video decoding on Intel's Poulsbo chipset.
The API is heavily tailored to embedded platforms and has many entry points
at different pipeline stages, which makes it more complex to implement.
\end_layout
\begin_layout Paragraph*
+
+\lang english
VDPAU
\end_layout
\begin_layout Standard
+
+\lang english
VDPAU was initiated by NVIDIA to support H.264 and VC-1 decoding.
\end_layout
\begin_layout Paragraph*
+
+\lang english
XvBA
\end_layout
\begin_layout Standard
+
+\lang english
All 3 APIs are intended for full
\end_layout
\begin_layout Paragraph*
+
+\lang english
OpenMax
\end_layout
\begin_layout Standard
+
+\lang english
http://x264dev.multimedia.cx
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Box Shadowbox
position "t"
hor_pos "c"
@@ -5773,19 +7137,27 @@ height_special "totalheight"
status open
\begin_layout Plain Layout
+
+\lang english
Takeaways:
\end_layout
\begin_layout Itemize
+
+\lang english
A video decoding pipeline consists of multiple stages chained together.
\end_layout
\begin_layout Itemize
+
+\lang english
Color space conversion and scaling is the most important stage, and if your
driver implements only one operation for simplicity, this is it.
\end_layout
\begin_layout Itemize
+
+\lang english
Implementing a full pipeline can provide a high performance boost, and save
battery life on mobile systems.
\end_layout
@@ -5796,6 +7168,8 @@ Implementing a full pipeline can provide a high performance boost, and save
\end_layout
\begin_layout Chapter
+
+\lang english
OpenGL
\begin_inset CommandInset label
LatexCommand label
@@ -5807,14 +7181,32 @@ name "cha:OpenGL"
\end_layout
\begin_layout Standard
+
+\lang english
+OpenGL is a specification.
+ There are many OpenGL implementations, both hardware accelerated and in
+ software.
+ As driver authors, our job is sometimes to provide a hardware-accelerated
+ OpenGL implementation.
+ In this chapter, we describe the OpenGL pipeline from the point of view
+ of the driver.
+\end_layout
+
+\begin_layout Standard
+
+\lang english
OpenGL ARB, Khronos, etc.
\end_layout
\begin_layout Section
+
+\lang english
The OpenGL Rendering Pipeline
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Float figure
placement tbh
wide false
@@ -5824,6 +7216,8 @@ status open
\begin_layout Plain Layout
\noindent
\align center
+
+\lang english
\begin_inset ERT
status open
@@ -5839,13 +7233,13 @@ begin{tikzpicture}[node distance=1cm, auto]
\backslash
tikzset{ mynode/.style={rectangle,rounded corners,draw=black, top color=white
-, bottom color=yellow!50,very thick, inner sep=1em, minimum size=3em, text
+, bottom color=yellow!50,very thick, inner sep=0.5em, minimum size=2em, text
centered, drop shadow}, myarrow/.style={->, >=latex', shorten >=1pt,
thick}, mylabel/.style={text width=7em, text centered} , mynode2/.style={rec
-tangle,rounded corners,draw=black, top color=white, bottom color=green!50,very
- thick, inner sep=1em, minimum size=3em, text centered, drop shadow}, mynode3/.st
-yle={rectangle,rounded corners,draw=black, top color=white, bottom color=red!50,
-very thick, inner sep=1em, minimum size=3em, text centered, drop shadow},}
+tangle,rounded corners,draw=black, top color=white, bottom color=red!50,very
+ thick, inner sep=0.5em, minimum size=2em, text centered, drop shadow}, mynode3/.s
+tyle={rectangle,rounded corners,draw=black, top color=white, bottom color=green!
+50,very thick, inner sep=0.5em, minimum size=2em, text centered, drop shadow},}
\end_layout
@@ -5853,29 +7247,68 @@ very thick, inner sep=1em, minimum size=3em, text centered, drop shadow},}
\backslash
-node[mynode] (vertex) {Vertex Shader};
+node[mynode, text width = 3cm] (vertex) {Vertex Shader};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode3, left=2cm of vertex] (vertexprog) {Vertex Shader Program};
-
+node[mynode3, text width = 4cm, right=1cm of vertex] (vertexcalls) {glDrawElemen
+ts, glArrayElement, glDrawArrays};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node[mynode2, text width = 3cm, left=1cm of vertex] (vertexprog) {Vertex
+ Shader Program};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node[mynode, text width = 3cm, below=0.3cm of vertex] (geom) {Geometry Shader};
+
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node[mynode2, text width = 3cm, left=1cm of geom] (geomprog) {Geometry Shader
+ Program};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node[mynode, text width = 3cm, below=0.3cm of geom] (clip) {Clipping};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node[mynode, text width = 3cm, below=0.3cm of clip] (viewport) {Viewport};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode, below=0.3cm of vertex] (geom) {Geometry Shader};
+node[mynode3, text width = 4cm, right=1cm of viewport] (viewportcalls) {glViewpo
+rt};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode3, left=2cm of geom] (geomprog) {Geometry Shader Program};
+node[mynode, text width = 3cm, below=0.3cm of viewport] (cull) {Face Culling};
\end_layout
@@ -5883,106 +7316,216 @@ node[mynode3, left=2cm of geom] (geomprog) {Geometry Shader Program};
\backslash
-node[mynode, below=0.3cm of geom] (clip) {Clipping};
+node[mynode3, text width = 4cm, right=1cm of cull] (cullcalls) {glCullFace,
+ glFrontFace, glPolygonMode, glEnable(GL
+\backslash
+_CULL
+\backslash
+_FACE)};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode, below=0.3cm of clip] (viewport) {Viewport};
+node[mynode2, text width = 3cm, left=1cm of viewport] (uniforms) {Uniforms
+ and Samplers};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode, below=0.3cm of viewport] (cull) {Culling};
+node[mynode, text width = 3cm, below=0.3cm of cull] (rast) {Rasterization};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode3, left=2cm of viewport] (uniforms) {Uniforms and Samplers};
-
+node[mynode3, text width = 4cm, right=1cm of rast] (rastcalls) {glPolygonOffset,
+ glPointSize, glLineStipple, glLineWidth};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node[mynode, text width = 3cm, below=0.3cm of rast] (frag) {Fragment Shader};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node[mynode2, text width = 3cm, left=1cm of frag] (fragprog) {Fragment Shader
+ Program};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node[mynode, text width = 3cm, below=0.3cm of frag] (scissor) {Scissor};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node[mynode3, text width = 4cm, right=1cm of scissor] (scissorcalls) {glScissor,
+ glEnable(GL
+\backslash
+_SCISSOR
+\backslash
+_TEST)};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node[mynode, text width = 3cm, below=0.3cm of scissor] (multisample) {Multisample
+};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node[mynode, text width = 3cm, below=0.3cm of multisample] (stencil) {Stencil};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node[mynode3, text width = 4cm, right=1cm of stencil] (stencilcalls) {glStencilF
+unc, glStencilMask, glStencilOp, glEnable(GL
+\backslash
+_STENCIL
+\backslash
+_TEST)};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node[mynode, text width = 3cm, below=0.3cm of stencil] (depth) {Depth};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node[mynode3, text width = 4cm, right=1cm of depth] (depthcalls) {glDepthFunc,
+ glDepthMask, glEnable(GL
+\backslash
+_DEPTH
+\backslash
+_TEST)};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode, below=0.3cm of cull] (rast) {Rasterization};
+node[mynode, text width = 3cm, below=0.3cm of depth] (query) {Occlusion Query};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode, below=0.3cm of rast] (frag) {Fragment Shader};
+node[mynode, text width = 3cm, below=0.3cm of query] (blend) {Blending};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode3, left=2cm of frag] (fragprog) {Fragment Shader Program};
+node[mynode3, text width = 4cm, right=1cm of blend] (blendcalls) {glBlendColor,
+ glBlendFunc, glBlendEquation, glEnable(GL
+\backslash
+_BLEND)};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode, below=0.3cm of frag] (scissor) {Scissor};
+node[mynode, text width = 3cm, below=0.3cm of blend] (dither) {Dithering};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode, below=0.3cm of scissor] (multisample) {Multisample};
+node[mynode3, text width = 4cm, right=1cm of dither] (dithercalls) {glEnable(GL
+\backslash
+_DITHER)};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode, below=0.3cm of multisample] (stencil) {Stencil};
+node[mynode, text width = 3cm, below=0.3cm of dither] (logicop) {Logic Op};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode, below=0.3cm of stencil] (depth) {Depth};
+node[mynode3, text width = 4cm, right=1cm of logicop] (logicopcalls) {glLogicOp,
+ glEnable(GL
+\backslash
+_COLOR
+\backslash
+_LOGIC
+\backslash
+_OP)};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode, below=0.3cm of depth] (query) {Occlusion Query};
+node[mynode, text width = 3cm, below=0.3cm of logicop] (mask) {Masking};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode, below=0.3cm of query] (blend) {Blending};
+node[mynode3, text width = 4cm, right=1cm of mask] (maskcalls) {glColorMask,
+ glIndexMask};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode, below=0.3cm of blend] (dither) {Dithering};
+node[mynode, text width = 3cm, below=0.3cm of mask] (fbcon) {Framebuffer
+ Control};
\end_layout
\begin_layout Plain Layout
\backslash
-node[mynode2, below=0.3cm of dither] (fb) {Framebuffer};
+node[mynode3, text width = 4cm, right=1cm of fbcon] (fbconcalls) {glDrawBuffer,
+ glDrawBuffers};
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+node[mynode2, text width = 3cm, below=0.3cm of fbcon] (fb) {Framebuffer};
+\end_layout
+
+\begin_layout Plain Layout
+
\end_layout
\begin_layout Plain Layout
@@ -6003,7 +7546,21 @@ draw[myarrow] (geom.south) -> (clip.north);
\backslash
-draw[myarrow] (clip.south) -> (rast.north);
+draw[myarrow] (clip.south) -> (viewport.north);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[myarrow] (viewport.south) -> (cull.north);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[myarrow] (cull.south) -> (rast.north);
\end_layout
\begin_layout Plain Layout
@@ -6017,18 +7574,77 @@ draw[myarrow] (rast.south) -> (frag.north);
\backslash
-draw[myarrow] (frag.south) -> (blend.north);
+draw[myarrow] (frag.south) -> (scissor.north);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[myarrow] (scissor.south) -> (multisample.north);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[myarrow] (multisample.south) -> (stencil.north);
\end_layout
\begin_layout Plain Layout
+
+\backslash
+draw[myarrow] (stencil.south) -> (depth.north);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[myarrow] (depth.south) -> (query.north);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[myarrow] (query.south) -> (blend.north);
\end_layout
\begin_layout Plain Layout
\backslash
-draw[myarrow] (blend.south) -> (fb.north);
+draw[myarrow] (blend.south) -> (dither.north);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[myarrow] (dither.south) -> (logicop.north);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[myarrow] (logicop.south) -> (mask.north);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[myarrow] (mask.south) -> (fbcon.north);
+\end_layout
+
+\begin_layout Plain Layout
+
+
+\backslash
+draw[myarrow] (fbcon.south) -> (fb.north);
\end_layout
\begin_layout Plain Layout
@@ -6102,10 +7718,14 @@ end{tikzpicture}
\end_layout
\begin_layout Plain Layout
+
+\lang english
\begin_inset Caption
\begin_layout Plain Layout
-The OpenGL pipeline.
+
+\lang english
+The OpenGL 3.2 pipeline.
\end_layout
\end_inset
@@ -6119,38 +7739,56 @@ The OpenGL pipeline.
\end_layout
\begin_layout Subsection
+
+\lang english
Vertex processing
\end_layout
\begin_layout Standard
+
+\lang english
vertex stage
\end_layout
\begin_layout Standard
+
+\lang english
vertex buffers
\end_layout
\begin_layout Subsection
+
+\lang english
Geometry processing
\end_layout
\begin_layout Subsection
+
+\lang english
Fragment processing
\end_layout
\begin_layout Standard
+
+\lang english
Rasterization
\end_layout
\begin_layout Standard
+
+\lang english
Render buffers
\end_layout
\begin_layout Standard
+
+\lang english
Textures
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Box Shadowbox
position "t"
hor_pos "c"
@@ -6164,10 +7802,14 @@ height_special "totalheight"
status open
\begin_layout Plain Layout
+
+\lang english
Takeaways:
\end_layout
\begin_layout Itemize
+
+\lang english
OpenGL is a suite of stages arranged in a pipeline.
\end_layout
@@ -6177,6 +7819,8 @@ OpenGL is a suite of stages arranged in a pipeline.
\end_layout
\begin_layout Chapter
+
+\lang english
Mesa
\begin_inset CommandInset label
LatexCommand label
@@ -6188,18 +7832,29 @@ name "cha:Mesa"
\end_layout
\begin_layout Standard
-Mesa is the Common Rendering Architecture for all open source graphics drivers.
+
+\lang english
+Mesa is both a software OpenGL implementation and the common rendering
+ architecture for all open source hardware-accelerated graphics drivers.
+ This chapter describes the internals of Mesa and the interfaces and
+ infrastructure it provides to graphics drivers.
\end_layout
\begin_layout Section
+
+\lang english
Mesa
\end_layout
\begin_layout Standard
+
+\lang english
Mesa serves two major purposes:
\end_layout
\begin_layout Itemize
+
+\lang english
Mesa is a software implementation of OpenGL.
It is considered the reference implementation and is useful for checking
conformance, since the official OpenGL conformance tests are not
@@ -6207,23 +7862,33 @@ Mesa is a software implementation of OpenGL.
\end_layout
\begin_layout Itemize
+
+\lang english
Mesa provides the OpenGL entry points for Open Source graphics drivers under
Linux.
\end_layout
\begin_layout Standard
+
+\lang english
In this section, we will focus on the second point.
\end_layout
\begin_layout Section
+
+\lang english
Mesa internals
\end_layout
\begin_layout Subsection
+
+\lang english
Textures in Mesa
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Box Shadowbox
position "t"
hor_pos "c"
@@ -6237,14 +7902,20 @@ height_special "totalheight"
status open
\begin_layout Plain Layout
+
+\lang english
Takeaways:
\end_layout
\begin_layout Itemize
+
+\lang english
Mesa is the reference OpenGL implementation under Linux.
\end_layout
\begin_layout Itemize
+
+\lang english
All Open Source graphics drivers use Mesa for 3D.
\end_layout
@@ -6254,6 +7925,8 @@ All Open Source graphics drivers use Mesa for 3D.
\end_layout
\begin_layout Chapter
+
+\lang english
Gallium 3D
\begin_inset CommandInset label
LatexCommand label
@@ -6265,22 +7938,32 @@ name "cha:Gallium-3D"
\end_layout
\begin_layout Standard
+
+\lang english
Gallium 3D is the future of 3D acceleration.
\end_layout
\begin_layout Standard
+
+\lang english
http://jrfonseca.blogspot.com/2008/04/gallium3d-introduction.html
\end_layout
\begin_layout Standard
+
+\lang english
http://people.freedesktop.org/~csimpson/gallium-docs/
\end_layout
\begin_layout Section
+
+\lang english
Gallium3D: a plan for a new generation of hardware
\end_layout
\begin_layout Standard
+
+\lang english
Ten years ago, GPUs were a direct match with all the OpenGL or Direct3D
functionality; back then the GPUs had specific transistors dedicated to
each piece of functionality.
@@ -6310,39 +7993,57 @@ emulate
\end_layout
\begin_layout Standard
+
+\lang english
everything is a shader, including inside the driver
\end_layout
\begin_layout Standard
+
+\lang english
thin layer for fixed pipe -> programmable functionality translation
\end_layout
\begin_layout Standard
+
+\lang english
global diagram
\end_layout
\begin_layout Section
+
+\lang english
State trackers
\end_layout
\begin_layout Standard
+
+\lang english
A state tracker implements an API (for example OpenGL, OpenVG, Direct3D...)
by translating it into API-agnostic and hardware-agnostic pipe driver calls and TGSI shaders.
\end_layout
\begin_layout Section
+
+\lang english
Pipe driver
\end_layout
\begin_layout Standard
+
+\lang english
A pipe driver is the main part of a hardware-specific driver.
\end_layout
\begin_layout Section
+
+\lang english
Winsys
\end_layout
\begin_layout Standard
+
+\lang english
The winsys is in charge of talking to the OS/Platform of choice.
The pipe driver relies on the winsys to talk to the hardware.
For example, this allows having a single pipe driver with multiple winsyses
@@ -6350,26 +8051,38 @@ The winsys is in charge of talking to the OS/Platform of choice.
\end_layout
\begin_layout Section
+
+\lang english
Writing Gallium3D drivers
\end_layout
\begin_layout Standard
+
+\lang english
screen
\end_layout
\begin_layout Standard
+
+\lang english
context
\end_layout
\begin_layout Standard
+
+\lang english
pipe_transfer
\end_layout
\begin_layout Section
+
+\lang english
Shaders in Gallium
\end_layout
\begin_layout Standard
+
+\lang english
In order to operate shaders, Gallium features an internal shader description
language which uses 4-component vectors.
We will later refer to the 4 components of a vector as x,y,z,w.
@@ -6381,20 +8094,28 @@ s of v in that order, and swizzling is allowed, for example v.wzyx reverses
\end_layout
\begin_layout Standard
+
+\lang english
These components carry no inherent semantics: despite their names, they
can just as well hold a color or an opacity value.
\end_layout
\begin_layout Standard
+
+\lang english
TGSI instruction set
\end_layout
\begin_layout Standard
+
+\lang english
mesa/src/gallium/auxiliary/tgsi/tgsi-instruction-set.txt
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Box Shadowbox
position "t"
hor_pos "c"
@@ -6408,18 +8129,26 @@ height_special "totalheight"
status open
\begin_layout Plain Layout
+
+\lang english
Takeaways:
\end_layout
\begin_layout Itemize
+
+\lang english
Gallium 3D is the new graphics API.
\end_layout
\begin_layout Itemize
+
+\lang english
Everything is converted to a shader internally; fixed functionality is gone.
\end_layout
\begin_layout Itemize
+
+\lang english
Drivers are simpler than classic Mesa drivers, as one only has to implement
shaders to get all fixed functionality to work.
\end_layout
@@ -6430,6 +8159,8 @@ Drivers are simpler than classic Mesa drivers, as one only has to implement
\end_layout
\begin_layout Chapter
+
+\lang english
GPU Computing
\begin_inset CommandInset label
LatexCommand label
@@ -6441,6 +8172,8 @@ name "cha:GPU-Computing"
\end_layout
\begin_layout Chapter
+
+\lang english
Suspend and Resume
\begin_inset CommandInset label
LatexCommand label
@@ -6452,18 +8185,26 @@ name "cha:Suspend-and-Resume"
\end_layout
\begin_layout Standard
+
+\lang english
VT switches
\end_layout
\begin_layout Standard
+
+\lang english
Card state
\end_layout
\begin_layout Standard
+
+\lang english
Suspend/resume hooks in the DRM
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Box Shadowbox
position "t"
hor_pos "c"
@@ -6477,10 +8218,14 @@ height_special "totalheight"
status open
\begin_layout Plain Layout
+
+\lang english
Takeaways:
\end_layout
\begin_layout Itemize
+
+\lang english
Suspend and resume has long been very clumsy, but this is solved now thanks
to the DRM implementing more functionality.
\end_layout
@@ -6491,6 +8236,8 @@ Suspend and resume has long been very clumsy, but this is solved now thanks
\end_layout
\begin_layout Chapter
+
+\lang english
Technical Specifications
\begin_inset CommandInset label
LatexCommand label
@@ -6502,6 +8249,8 @@ name "cha:Technical-Specifications"
\end_layout
\begin_layout Standard
+
+\lang english
Technical specifications are the nuts and bolts of graphics driver work.
Without hardware specifications, no work can be started.
However, manufacturing companies are usually wary of sharing said specification
@@ -6514,38 +8263,54 @@ tions), it is still very widespread and prevents a lot of hardware from
\end_layout
\begin_layout Section
+
+\lang english
Obtaining official specifications
\end_layout
\begin_layout Paragraph*
+
+\lang english
Public specifications
\end_layout
\begin_layout Standard
+
+\lang english
Some vendors distribute the technical documentation for their hardware publicly
without restrictions.
\end_layout
\begin_layout Standard
+
+\lang english
Sometimes, things can be as simple as asking the vendor, who might share
the documentation (possibly under NDA, see below).
\end_layout
\begin_layout Paragraph*
+
+\lang english
NDA (Non-Disclosure Agreement)
\end_layout
\begin_layout Standard
+
+\lang english
Put simply, an NDA is a contract signed between the developer and the hardware
company, by which the developer agrees not to disclose the documentation received.
An NDA can, however, contain further restrictions.
\end_layout
\begin_layout Standard
+
+\lang english
Terms of the NDA
\end_layout
\begin_layout Standard
+
+\lang english
Before signing an NDA, think.
Whatever lawyers say, there is no such thing as a
\begin_inset Quotes eld
@@ -6559,23 +8324,33 @@ standard
\end_layout
\begin_layout Standard
+
+\lang english
Can Open Source drivers be written from that documentation under that NDA?
\end_layout
\begin_layout Standard
+
+\lang english
What happens when the NDA expires? Can the code still be free, or are you
still bound by some clause?
\end_layout
\begin_layout Standard
+
+\lang english
What about yourself? Are you prevented from doing further work on this hardware?
\end_layout
\begin_layout Section
+
+\lang english
Reverse engineering
\end_layout
\begin_layout Standard
+
+\lang english
When specifications are not easily available or just incomplete, an alternative
route is reverse engineering.
Reverse engineering consists of figuring out the specifications for a given
@@ -6584,6 +8359,8 @@ When specifications are not easily available or just incomplete, an alternate
\end_layout
\begin_layout Standard
+
+\lang english
Reverse engineering is not just a tool to obtain missing hardware specifications;
it is also a strong means of Open Source advocacy.
Once a reverse engineered driver exists and ships in Linux distributions,
@@ -6592,6 +8369,8 @@ Reverse engineering is not just a tool to obtain missing hardware specifications
\end_layout
\begin_layout Standard
+
+\lang english
Reverse engineering is not as difficult as it seems, but it requires organization and rigor.
Write down every bit of information (even incomplete ones), share your notes among
developers, and try to work out the unknowns one by one.
@@ -6600,10 +8379,14 @@ not as difficult as it seems, requires organization, being rigorous.
\end_layout
\begin_layout Paragraph*
+
+\lang english
Mmiotrace
\end_layout
\begin_layout Standard
+
+\lang english
The basic idea behind mmiotrace is simple: it first hooks the ioremap call,
and thereby prevents mapping of a designated I/O area.
Subsequently, accesses to this area will generate page faults, which are
@@ -6616,15 +8399,21 @@ The basic idea behind mmio-trace is simple: it first hooks the ioremap call,
\end_layout
\begin_layout Standard
+
+\lang english
Mmiotrace is now part of the official Linux kernel.
Therefore, any pre-existing driver can be traced.
\end_layout
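The fault-and-log idea described above can be illustrated with a small user-space model. This is only a sketch in Python, not kernel code: the class name TracedRegion, its fields and the addresses are hypothetical, and the "page fault" is simulated with an exception. It assumes the tracer keeps the region unmapped, lets each faulting access go through once while recording it, and then unmaps the region again so the next access faults too.

```python
from dataclasses import dataclass, field


@dataclass
class TracedRegion:
    """Toy model of an I/O region that is kept unmapped so accesses can be traced."""

    base: int
    size: int
    mapped: bool = False                      # the tracer keeps the region unmapped
    memory: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def _raw_access(self, kind, addr, value=None):
        # Stand-in for the real memory access: it only succeeds while mapped.
        if not self.mapped:
            raise LookupError("page fault")
        if kind == "W":
            self.memory[addr] = value
            return value
        return self.memory.get(addr, 0)

    def access(self, kind, addr, value=None):
        if not self.base <= addr < self.base + self.size:
            raise ValueError("address outside the traced region")
        try:
            return self._raw_access(kind, addr, value)
        except LookupError:
            # Simulated fault handler: map the region, replay the access,
            # record it, then unmap again so the next access faults as well.
            self.mapped = True
            try:
                result = self._raw_access(kind, addr, value)
            finally:
                self.mapped = False
            self.log.append((kind, addr, result))
            return result


region = TracedRegion(base=0x1000, size=0x100)
region.access("W", 0x1010, 0xDEADBEEF)   # hypothetical register write
region.access("R", 0x1010)               # read the same register back
for kind, addr, value in region.log:
    print(f"{kind} 0x{addr:x} 0x{value:x}")
```

The driver code being traced never sees the difference: each access returns the same value it would have returned untraced, and the log accumulates as a side effect.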
\begin_layout Paragraph*
+
+\lang english
Libsegfault
\end_layout
\begin_layout Standard
+
+\lang english
libsegfault is similar to mmiotrace in the way it works: after removing
the pages one wants to track accesses to, it generates a segmentation
fault on each access and is therefore able to report each access.
@@ -6633,10 +8422,14 @@ libsegfault is similar to mmio-trace in the way it works: after removing
\end_layout
\begin_layout Paragraph*
+
+\lang english
Valgrind-mmt
\end_layout
\begin_layout Standard
+
+\lang english
Valgrind is a dynamic recompiling and instrumentation framework.
Valgrind-mmt is a plugin for Valgrind which implements tracing of reads
and writes to a certain range of memory addresses, usually an MMIO range
@@ -6646,14 +8439,20 @@ Valgrind is a dynamic recompiling and instrumentation framework.
\end_layout
\begin_layout Paragraph*
+
+\lang english
vbetool
\end_layout
\begin_layout Paragraph*
+
+\lang english
Virtualization
\end_layout
\begin_layout Standard
+
+\lang english
Finally, one last pre-existing tool to help reverse engineering is virtualization.
By running a proprietary driver in a controlled environment, one can figure
@@ -6663,10 +8462,14 @@ n.
\end_layout
\begin_layout Paragraph*
+
+\lang english
Ad-hoc tools
\end_layout
\begin_layout Standard
+
+\lang english
In addition to these generic tools, you will often find it useful to implement
your own additional tools, tailored for specific needs.
Renouveau is an example of such a tool that integrates the reverse engineering
@@ -6679,6 +8482,8 @@ In addition to these generic tools, you will often find it useful to implement
\end_layout
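As an illustration of what a small ad-hoc tool can look like, here is a sketch in Python that compares two register snapshots and reports what changed. This is not how Renouveau is implemented; the function name diff_snapshots, the snapshot layout (a dict mapping register offset to value) and the sample values are assumptions made up for the example.

```python
def diff_snapshots(before, after):
    """Return {offset: (old, new)} for every register whose value changed.

    A register that is missing from one snapshot is reported with None
    on that side.
    """
    changed = {}
    for offset in sorted(set(before) | set(after)):
        old, new = before.get(offset), after.get(offset)
        if old != new:
            changed[offset] = (old, new)
    return changed


def fmt(value):
    # Render a register value, or mark it as absent from the snapshot.
    return "absent" if value is None else f"0x{value:08x}"


# Hypothetical snapshots taken before and after one small test operation.
before = {0x0000: 0x00000001, 0x0004: 0x00000000, 0x0008: 0x0000FFFF}
after = {0x0000: 0x00000001, 0x0004: 0x00000010, 0x0008: 0x0000FFFF, 0x000C: 0x00000001}

changes = diff_snapshots(before, after)
for offset, (old, new) in changes.items():
    print(f"0x{offset:04x}: {fmt(old)} -> {fmt(new)}")
```

Keeping each test operation small means that few registers change between snapshots, which makes it easier to work out the meaning of individual bits one by one.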
\begin_layout Standard
+
+\lang english
\begin_inset Box Shadowbox
position "t"
hor_pos "c"
@@ -6692,19 +8497,27 @@ height_special "totalheight"
status open
\begin_layout Plain Layout
+
+\lang english
Takeaways:
\end_layout
\begin_layout Itemize
+
+\lang english
Technical specifications are of course very important for authoring graphics
drivers.
\end_layout
\begin_layout Itemize
+
+\lang english
NDAs can have unforeseen implications on yourself and your work.
\end_layout
\begin_layout Itemize
+
+\lang english
When specifications are unavailable, incomplete or just plain wrong, reverse engineering
can help you figure out how the hardware actually works.
\end_layout
@@ -6715,6 +8528,8 @@ When they are unavailable, incomplete or just plain wrong, reverse engineering
\end_layout
\begin_layout Chapter
+
+\lang english
Beyond Development
\begin_inset CommandInset label
LatexCommand label
@@ -6726,52 +8541,76 @@ name "cha:Beyond-Development"
\end_layout
\begin_layout Section
+
+\lang english
Testing for conformance
\end_layout
\begin_layout Paragraph*
+
+\lang english
Rendercheck
\end_layout
\begin_layout Paragraph*
+
+\lang english
OpenGL conformance test suite
\end_layout
\begin_layout Standard
+
+\lang english
The official OpenGL conformance test suite is not publicly available; access
requires a (paid) Khronos membership.
Instead, most developers use alternative sources for test programs.
\end_layout
\begin_layout Paragraph*
+
+\lang english
Piglit
\end_layout
\begin_layout Paragraph*
+
+\lang english
glean
\end_layout
\begin_layout Standard
+
+\lang english
glean.sourceforge.net
\end_layout
\begin_layout Paragraph*
+
+\lang english
Mesa demos
\end_layout
\begin_layout Standard
+
+\lang english
mesa/progs/*
\end_layout
\begin_layout Section
+
+\lang english
Debugging
\end_layout
\begin_layout Paragraph*
+
+\lang english
gdb and X.Org
\end_layout
\begin_layout Standard
+
+\lang english
gdb needs to run in a terminal emulator, while the application being debugged might
be stopped with a lock held.
That might result in a deadlock between the application stuck with a lock
@@ -6779,38 +8618,56 @@ gdb needs to run on a terminal emulator while the application debug might
\end_layout
\begin_layout Standard
+
+\lang english
printk debug
\end_layout
\begin_layout Standard
+
+\lang english
crash (a gdb wrapper for analyzing vmcore dumps)
\end_layout
\begin_layout Standard
+
+\lang english
kgdb
\end_layout
\begin_layout Standard
+
+\lang english
serial console
\end_layout
\begin_layout Standard
+
+\lang english
diskdump
\end_layout
\begin_layout Standard
+
+\lang english
linux-uml
\end_layout
\begin_layout Standard
+
+\lang english
systemtap
\end_layout
\begin_layout Section
+
+\lang english
Upstreaming
\end_layout
\begin_layout Standard
+
+\lang english
Submitting your code for inclusion in the official trees is an important
part of the graphics driver development process under Linux.
There are multiple motivations for doing this.
@@ -6818,27 +8675,39 @@ Submitting your code for inclusion in the official trees is an important
\end_layout
\begin_layout Standard
+
+\lang english
First, this allows end users to get hold of your driver more easily.
\end_layout
\begin_layout Standard
+
+\lang english
Second, this makes maintaining your driver easier in the future:
in the event of interface changes, you are spared the burden of adapting it yourself.
\end_layout
\begin_layout Standard
+
+\lang english
Why upstream?
\end_layout
\begin_layout Standard
+
+\lang english
How?
\end_layout
\begin_layout Standard
+
+\lang english
When?
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Box Shadowbox
position "t"
hor_pos "c"
@@ -6852,19 +8721,27 @@ height_special "totalheight"
status open
\begin_layout Plain Layout
+
+\lang english
Takeaways:
\end_layout
\begin_layout Itemize
+
+\lang english
Thoroughly testing all your changes can save you the cost of bisection later
on.
\end_layout
\begin_layout Itemize
+
+\lang english
Debugging is not easy for graphics drivers.
\end_layout
\begin_layout Itemize
+
+\lang english
By upstreaming your code to the official repositories, you save yourself the
burden of adapting it to ever-moving programming interfaces in X.Org, Mesa
and the kernel.
@@ -6876,6 +8753,8 @@ By upstreaming your code in official repositories, you save yourself the
\end_layout
\begin_layout Chapter
+
+\lang english
Conclusions
\begin_inset CommandInset label
LatexCommand label
@@ -6887,19 +8766,27 @@ name "cha:Conclusions"
\end_layout
\begin_layout Standard
+
+\lang english
\begin_inset Note Note
status open
\begin_layout Plain Layout
+
+\lang english
Miscellaneous items to fit in somewhere:
\end_layout
\begin_layout Plain Layout
+
+\lang english
- compositing, with XRender or with GLX + GL_EXT_texture_from_pixmap;
explain the differences
\end_layout
\begin_layout Plain Layout
+
+\lang english
- XGL, AIGLX
\end_layout