DVD subtitles
---------------
0. Introduction
1. Basics
2. The data structure
3. Reading the control header
4. Decoding the graphics
5. What I do not know yet / What I need
6. Thanks
7. Changes
The latest version of this document can be found here:
http://www.via.ecp.fr/~sam/doc/dvd/
0. Introduction
One of the last things we missed in DVD decoding under my system was the
decoding of subtitles. I found no information on the web or Usenet about them,
apart from a few words on them being run-length encoded in the DVD FAQ.
So we decided to reverse-engineer their format (it's completely legal in
France, since we did it for interoperability purposes), and managed to get
almost all of it.
1. Basics
DVD subtitles are carried in private PES packets inside PS packs (pack start
code 0x000001ba), just like AC3 streams are.
Within a PS pack there are PES packets and, as with AC3, the ones containing
subtitles begin with the 0x000001bd start code.
As with AC3, where the stream ID looks like (0x80 + x), subtitles have an ID
equal to (0x20 + x), where x is the subtitle number. Thus there seem to be only
16 possible different subtitles on a DVD (my Taxi Driver copy has 16).
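The ID rule above can be sketched in a few lines of Python. The names
`SUBTITLE_BASE`, `MAX_SUBTITLES` and `subtitle_number` are made up for this
illustration, and the limit of 16 is only the guess made in the text, not a
confirmed figure.

```python
SUBTITLE_BASE = 0x20   # subtitle IDs look like (0x20 + x)
MAX_SUBTITLES = 16     # assumption taken from the text above, not confirmed


def subtitle_number(stream_id: int):
    """Return x if stream_id looks like a subtitle ID (0x20 + x), else None."""
    x = stream_id - SUBTITLE_BASE
    if 0 <= x < MAX_SUBTITLES:
        return x
    return None
```

A demultiplexer would call this on the ID byte of each 0x000001bd packet and
keep only the packets whose result matches the wanted subtitle.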
I'll suppose you know how to extract AC3 from a DVD, and jump to the
interesting part of this documentation. Anyway you're unlikely to have
understood what I said without already being familiar with MPEG2.
2. The data structure
A subtitle packet, after its parts have been collected and appended, looks
like this :
+----------------------------------------------------------+
|                                                          |
|   0    2                                         size    |
|   +----+------------------------+-----------------+      |
|   |size|      data packet       |     control     |      |
|   +----+------------------------+-----------------+      |
|                                                          |
|                    a subtitle packet                     |
|                                                          |
+----------------------------------------------------------+
size is a 2-byte word, and the data packet and the control packet may have any size.
Here is the structure of the data packet :
+----------------------------------------------------------+
|                                                          |
|   2    4                                         S0+2    |
|   +----+------------------------------------------+      |
|   | S0 |                   data                   |      |
|   +----+------------------------------------------+      |
|                                                          |
|                     the data packet                      |
|                                                          |
+----------------------------------------------------------+
S0, the data packet size, is a 2-byte word.
Finally, here's the structure of the control packet :
+----------------------------------------------------------+
|                                                          |
|  S0+2 S0+4                                 S1       size |
|   +----+---------+---------+--+---------+--+---------+   |
|   | S1 |ctrl seq |ctrl seq |..|ctrl seq |ff| end seq |   |
|   +----+---------+---------+--+---------+--+---------+   |
|                                                          |
|                    the control packet                    |
|                                                          |
+----------------------------------------------------------+
To summarize :
- S1, at offset S0+2, the position of the end sequence
- several control sequences
- the 'ff' byte
- the end sequence
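Here is a minimal Python sketch of the layout summarized above. It assumes the
words are big-endian (the example header in section 3 reads 0A 0C as 0x0a0c)
and that S1 is counted from the start of the subtitle packet, as the pictures
suggest. The function name and the dictionary keys are invented for this
illustration.

```python
import struct


def split_subtitle_packet(packet: bytes) -> dict:
    """Split a reassembled subtitle packet along the layout drawn above."""
    # size and S0 are the two big-endian 16-bit words at offsets 0 and 2.
    size, s0 = struct.unpack_from(">HH", packet, 0)
    if size > len(packet):
        raise ValueError("packet is shorter than its size field")
    # S1 sits right after the data packet, at offset S0+2.
    (s1,) = struct.unpack_from(">H", packet, s0 + 2)
    if not s0 + 4 <= s1 <= size:
        raise ValueError("S1 does not point inside the control packet")
    return {
        "size": size,
        "s0": s0,
        "s1": s1,
        "data": packet[4:s0 + 2],                # encoded graphics
        "control_sequences": packet[s0 + 4:s1],  # includes the 'ff' byte
        "end_sequence": packet[s1:size],
    }
```

The three slices can then be handed to separate parsers: the control
sequences first, then the end sequence, then the graphics.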
3. Reading the control header
The first thing to read is the control sequences. There are several
types of them, and each type is determined by its first byte. As far
as I know, each type has a fixed length.
* type 0x01 : '01' - 1 byte
    it seems to be an empty control sequence.
* type 0x03 : '03wxyz' - 3 bytes
    this one holds the palette information ; w, x, y and z are nibbles, and it
    basically says 'encoded color 0 is the wth color of the palette, encoded
    color 1 is the xth color, and so on'.
* type 0x04 : '04wxyz' - 3 bytes
    I *think* this is the alpha channel information ; I only saw values of 0 or f
    for those nibbles, so I can't really be sure, but it seems plausible.
* type 0x05 : '05xxxXXXyyyYYY' - 7 bytes
    the coordinates of the subtitle on the screen :
      xxx is the first column of the subtitle
      XXX is the last column of the subtitle
      yyy is the first line of the subtitle
      YYY is the last line of the subtitle
    thus the subtitle's size is (XXX-xxx+1) x (YYY-yyy+1)
* type 0x06 : '06xxxxyyyy' - 5 bytes
    xxxx is the position of the first graphic line, and yyyy is the position of
    the second one (the graphics are interlaced, so it helps a lot :p)
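The list above maps naturally onto a small table-driven parser. The Python
sketch below only knows the five types listed here and stops at the 'ff'
byte. `SEQUENCE_LENGTHS`, `parse_control_sequences` and the dictionary keys
are names chosen for this illustration, and the meaning given to type 0x04 is
as uncertain as in the text.

```python
# type byte -> total length in bytes, type byte included
SEQUENCE_LENGTHS = {0x01: 1, 0x03: 3, 0x04: 3, 0x05: 7, 0x06: 5}


def to_nibbles(data: bytes) -> list:
    """Split each byte into its high and low nibble."""
    out = []
    for byte in data:
        out.append(byte >> 4)
        out.append(byte & 0xF)
    return out


def parse_control_sequences(buf: bytes) -> dict:
    """Parse control sequences; buf starts right after the S1 word."""
    result = {}
    pos = 0
    while pos < len(buf) and buf[pos] != 0xFF:
        kind = buf[pos]
        if kind not in SEQUENCE_LENGTHS:
            raise ValueError(f"unknown control sequence type 0x{kind:02x}")
        length = SEQUENCE_LENGTHS[kind]
        args = buf[pos + 1:pos + length]
        if len(args) != length - 1:
            raise ValueError("truncated control sequence")
        n = to_nibbles(args)
        if kind == 0x03:
            result["palette"] = n      # palette index for encoded colors 0..3
        elif kind == 0x04:
            result["alpha"] = n        # probably alpha, one nibble per color
        elif kind == 0x05:
            # four 12-bit values: first column, last column, first line, last line
            result["area"] = tuple(
                (n[i] << 8) | (n[i + 1] << 4) | n[i + 2] for i in (0, 3, 6, 9)
            )
        elif kind == 0x06:
            result["field_offsets"] = (
                int.from_bytes(args[0:2], "big"),
                int.from_bytes(args[2:4], "big"),
            )
        pos += length
    return result
```

Raising on an unknown type is a deliberate choice: since every known type has
a fixed length, an unknown one makes it impossible to find the next sequence.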
The end sequence has this structure:
xxxx yyyy 02 ff (ff)
it ends with 'ff' or 'ffff', to make the whole packet have an even length.
FIXME: I absolutely don't know what xxxx is. I suppose it may be some date
information since I found it nowhere else, but I can't be sure.
yyyy is equal to S1 (see picture).
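The only checks that can be made on the end sequence so far are structural.
The Python sketch below verifies them and returns the unknown xxxx word
untouched. `parse_end_sequence` is a name invented for this illustration, and
nothing is assumed about what xxxx means.

```python
def parse_end_sequence(end: bytes, s1: int) -> int:
    """Check an 'xxxx yyyy 02 ff (ff)' end sequence and return xxxx."""
    if len(end) < 6:
        raise ValueError("end sequence is too short")
    unknown = int.from_bytes(end[0:2], "big")   # xxxx, meaning unknown
    s1_copy = int.from_bytes(end[2:4], "big")   # yyyy, expected to equal S1
    if s1_copy != s1:
        raise ValueError("yyyy does not match S1")
    if end[4] != 0x02 or end[5] != 0xFF:
        raise ValueError("missing '02 ff' marker")
    if end[6:] not in (b"", b"\xff"):
        raise ValueError("unexpected bytes after the end sequence")
    return unknown
```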
Example of a control header :
----
0A 0C 01 03 02 31 04 0F F0 05 00 02 CF 00 22 3E 06 00 06 04 E9 FF 00 93 0A 0C 02 FF
----
Let's decode it. First of all, S1 = 0x0a0c.
The control sequences are :
  01
    Nothing to say about this one.
  03 02 31
    Color 0 is 0, color 1 is 2, color 2 is 3, and color 3 is 1.
  04 0F F0
    Colors 0 and 3 are transparent, and colors 1 and 2 are opaque (not sure of this one).
  05 00 02 CF 00 22 3E
    The first column is 0x000, the last one is 0x2cf, the first line is 0x002, and
    the last line is 0x23e. Thus the subtitle's size is 0x2d0 x 0x23d.
  06 00 06 04 E9
    The first encoded image starts at offset 0x0006, and the second one starts at 0x04e9.
  FF
    The 'ff' byte that closes the list of control sequences.
And the end sequence is :
  00 93 0A 0C 02 FF
    Which means... well, not much for now. We can at least verify that S1 (0x0a0c) is
    there.
4. Decoding the graphics
The graphics are rather easy to decode (at least, when you know how to do it - it
took us one whole week to figure out what the encoding was :p).
The picture is interlaced ; for instance, for a 40-line picture :
line 0 ---------------#----------
line 2 ------#-------------------
...
line 38 ------------#-------------
line 1 ------------------#-------
line 3 --------#-----------------
...
line 39 -------------#------------
When decoding you should get:
line 0 ---------------#----------
line 1 ------------------#-------
line 2 ------#-------------------
line 3 --------#-----------------
...
line 38 ------------#-------------
line 39 -------------#------------
Computers with weak processors could choose only to decode even lines
in order to gain some time, for instance.
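Putting the two halves back in display order is a simple interleave. Here is
a minimal Python sketch, assuming the even and odd lines have already been
decoded into two lists ; `interleave_fields` is a name invented for this
illustration.

```python
def interleave_fields(even_lines: list, odd_lines: list) -> list:
    """Merge lines 0, 2, 4... and lines 1, 3, 5... into display order."""
    # A picture with an odd number of lines has one more even line.
    if not 0 <= len(even_lines) - len(odd_lines) <= 1:
        raise ValueError("the two fields do not fit together")
    picture = []
    for i, line in enumerate(even_lines):
        picture.append(line)
        if i < len(odd_lines):
            picture.append(odd_lines[i])
    return picture
```

A decoder that wants to save time can skip this step and keep `even_lines`
only, as suggested above.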
The graphics are run-length encoded, with the following alphabet:
0xf
0xe
0xd
0xc
0xb
0xa
0x9
0x8
0x7
0x6
0x5
0x4
0x3-
0x2-
0x1-
0x0f-
0x0e-
0x0d-
0x0c-
0x0b-
0x0a-
0x09-
0x08-
0x07-
0x06-
0x05-
0x04-
0x03--
0x02--
0x01--
0x0000
'-' stands for any other nibble. Once a sequence X of this alphabet has
been read, the pixels can be displayed : (X >> 2) is the number of pixels
to display, and (X & 0x3) is the color of the pixel.
For instance, 0x23 means "8 pixels of color 3".
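Since no code of the alphabet is a prefix of another, a code can be read
nibble by nibble until its value is large enough for its length. A minimal
Python sketch, where `read_code` is a name invented for this illustration
and `nibbles` is assumed to be a list of 4-bit values :

```python
def read_code(nibbles: list, pos: int) -> tuple:
    """Return (X, next_pos) for the code starting at nibbles[pos]."""
    code = nibbles[pos]
    pos += 1
    if code >= 0x4:                 # 0x4 .. 0xf : one nibble
        return code, pos
    code = (code << 4) | nibbles[pos]
    pos += 1
    if code >= 0x10:                # 0x1- .. 0x3- : two nibbles
        return code, pos
    code = (code << 4) | nibbles[pos]
    pos += 1
    if code >= 0x040:               # 0x04- .. 0x0f- : three nibbles
        return code, pos
    code = (code << 4) | nibbles[pos]
    pos += 1
    return code, pos                # 0x01-- .. 0x03--, or 0x0000


def run_of(code: int) -> tuple:
    """Split a code X into (number of pixels, color)."""
    return code >> 2, code & 0x3
```

Note that four-nibble values of the form 0x00-- other than 0x0000 also come
out of `read_code` ; they are not part of the alphabet above, so the caller
has to decide what to do with them.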
"0000" has a special meaning : it's a carriage return. The decoder should
do a carriage return when reaching the end of the line, or when encountering
this "0000" sequence. When doing a carriage return, the parser should skip
to the next byte boundary (a line never starts in the middle of a byte).
After a carriage return, the parser should read a line on the other
interlaced picture, and swap like this after each carriage return.
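The whole loop can be sketched as follows in Python. This is not the enclosed
source, only an illustration written under several assumptions : the two
offsets of the type 0x06 sequence are counted in bytes from the start of
`data`, a line cut short by "0000" is padded with color 0, and values of the
form 0x00-- other than 0x0000 are rejected because the alphabet does not list
them. `decode_picture` is an invented name.

```python
def decode_picture(data: bytes, offsets: tuple, width: int, height: int) -> list:
    """Decode both interlaced pictures into rows of encoded colors (0..3)."""
    nibbles = [n for byte in data for n in (byte >> 4, byte & 0xF)]
    pos = [2 * offsets[0], 2 * offsets[1]]   # nibble position in each picture
    picture = []
    for y in range(height):
        field = y & 1                        # swap pictures after each line
        p = pos[field]
        row = []
        while len(row) < width:
            # Read nibbles until the value is large enough for its length.
            code = 0
            for used, threshold in enumerate((0x4, 0x10, 0x40, 0x0), start=1):
                code = (code << 4) | nibbles[p]
                p += 1
                if code >= threshold:
                    break
            if code == 0:                    # "0000" : carriage return
                break
            if used == 4 and code < 0x100:
                raise ValueError(f"code 0x{code:04x} is not in the alphabet")
            count, color = code >> 2, code & 0x3
            row.extend([color] * min(count, width - len(row)))
        row.extend([0] * (width - len(row)))  # assumption : pad with color 0
        pos[field] = p + (p & 1)             # skip to the next byte boundary
        picture.append(row)
    return picture
```

Keeping one read position per picture means the rows come out directly in
display order, so no separate de-interlacing pass is needed.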
Perhaps I don't explain this very well, so you'd better have a look at
the enclosed source.
5. What I do not know yet / What I need
I don't know what's in the end sequence yet.
Also, I don't know exactly when to display subtitles, and when to remove them.
I don't know if there are other types of control sequences (in my programs I consider
0xff as a control sequence type, as well as 0x02. I don't know if it's correct or not,
so please comment on this).
I don't know what the "official" color palette is.
I don't know how to handle transparency information.
I don't know if this document is generic enough.
So what I need is you :
- if you can, patch this document or my programs to fix strange behaviour with your subtitles.
- send me your subtitles (there's a program to extract them enclosed) ; the first 10 KB
of subtitles in a VOB should be enough, but it would be cool if you sent me one subtitle
file per language.
6. Thanks
Thanks to Michel Lespinasse <walken@via.ecp.fr> for his great help on understanding
the RLE stuff, and for all the ideas he had.
Thanks to mass (David Waite) and taaz (David I. Lehn) from irc at
openprojects.net for sending me their subtitles.
7. Changes
20000116: added the 'changes' section.
20000116: added David Waite's and David I. Lehn's name.
20000116: changed "x0" and "x1" to "S0" and "S1" to make it less confusing.
--
Paris, January 16th 2000
Samuel Hocevar <sam@via.ecp.fr>