diff --git a/docs/design/draft-subtitle-overlays.txt b/docs/design/draft-subtitle-overlays.txt
deleted file mode 100644
index 87f2c2c61..000000000
--- a/docs/design/draft-subtitle-overlays.txt
+++ /dev/null
@@ -1,546 +0,0 @@
-===============================================================
- Subtitle overlays, hardware-accelerated decoding and playbin
-===============================================================
-
-Status: EARLY DRAFT / BRAINSTORMING
-
- === 1. Background ===
-
-Subtitles can be muxed in containers or come from an external source.
-
-Subtitles come in many shapes and colours. Usually they are either
-text-based (incl. 'pango markup') or bitmap-based (e.g. DVD subtitles
-and the most common form of DVB subs). Bitmap-based subtitles are
-usually compressed in some way, e.g. some form of run-length encoding.
-
-Subtitles are currently decoded and rendered in subtitle-format-specific
-overlay elements. These elements have two sink pads (one for raw video
-and one for the subtitle format in question) and one raw video source pad.
-
-They will take care of synchronising the two input streams, and of
-decoding and rendering the subtitles on top of the raw video stream.
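-
-For illustration, a typical pipeline around such an overlay element
-(here: textoverlay fed from an external subtitle file; the file names
-are placeholders, of course) might look like this:
-
-  gst-launch-0.10 filesrc location=video.avi ! decodebin2 \
-      ! ffmpegcolorspace ! textoverlay name=overlay ! ffmpegcolorspace \
-      ! autovideosink \
-      filesrc location=subs.srt ! subparse ! overlay.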
-
-Digression: one could theoretically have dedicated decoder/render elements
-that output an AYUV or ARGB image, and then let a videomixer element do
-the actual overlaying, but this is not very efficient, because it requires
-us to allocate and blend whole pictures (1920x1080 AYUV = 8MB,
-1280x720 AYUV = 3.6MB, 720x576 AYUV = 1.6MB) even if the overlay region
-is only a small rectangle at the bottom. This wastes memory and CPU.
-We could do something better by introducing a new format that only
-encodes the region(s) of interest, but we don't have such a format yet, and
-are not necessarily keen to rewrite this part of the logic in playbin
-at this point - and we can't change existing elements' behaviour, so we
-need to introduce new elements for this.
-
-Playbin2 supports outputting compressed formats, i.e. it does not
-force decoding to a raw format, but is happy to output to a non-raw
-format as long as the sink supports that as well.
-
-In the case of certain hardware-accelerated decoding APIs, we will make use
-of that functionality. However, the decoder will not output a raw video
-format then, but some kind of hardware/API-specific format (in the caps)
-and the buffers will reference hardware/API-specific objects that
-the hardware/API-specific sink will know how to handle.
-
-
- === 2. The Problem ===
-
-In the case of such hardware-accelerated decoding, the decoder will not
-output raw pixels that can easily be manipulated. Instead, it will
-output hardware/API-specific objects that can later be used to render
-a frame using the same API.
-
-Even if we could transform such a buffer into raw pixels, we most
-likely would want to avoid that, in order to avoid the need to
-map the data back into system memory (and then later back to the GPU).
-It's much better to upload the much smaller encoded data to the GPU/DSP
-and then leave it there until rendered.
-
-Currently playbin only supports subtitles on top of raw decoded video.
-It will try to find a suitable overlay element from the plugin registry
-based on the input subtitle caps and the rank. (It is assumed that we
-will be able to convert any raw video format into any format required
-by the overlay using a converter such as videoconvert.)
-
-It will not render subtitles if the video sent to the sink is not
-raw YUV or RGB or if conversions have been disabled by setting the
-native-video flag on playbin.
-
-Subtitle rendering is considered an important feature. Enabling
-hardware-accelerated decoding by default should not lead to a major
-feature regression in this area.
-
-This means that we need to support subtitle rendering on top of
-non-raw video.
-
-
- === 3. Possible Solutions ===
-
-The goal is to keep knowledge of the subtitle format within the
-format-specific GStreamer plugins, and knowledge of any specific
-video acceleration API to the GStreamer plugins implementing
-that API. We do not want to make the pango/dvbsuboverlay/dvdspu/kate
-plugins link to libva/libvdpau/etc. and we do not want to make
-the vaapi/vdpau plugins link to all of libpango/libkate/libass etc.
-
-
-Multiple possible solutions come to mind:
-
- (a) backend-specific overlay elements
-
- e.g. vaapitextoverlay, vdpautextoverlay, vaapidvdspu, vdpaudvdspu,
- vaapidvbsuboverlay, vdpaudvbsuboverlay, etc.
-
- This assumes the overlay can be done directly on the backend-specific
- object passed around.
-
- The main drawback with this solution is that it leads to a lot of
- code duplication and may also lead to uncertainty about distributing
- certain duplicated pieces of code. The code duplication is pretty
- much unavoidable, since making textoverlay, dvbsuboverlay, dvdspu,
-    kate, assrender, etc. available in the form of base classes to derive
- from is not really an option. Similarly, one would not really want
- the vaapi/vdpau plugin to depend on a bunch of other libraries
- such as libpango, libkate, libtiger, libass, etc.
-
- One could add some new kind of overlay plugin feature though in
- combination with a generic base class of some sort, but in order
- to accommodate all the different cases and formats one would end
-    up with a rather convoluted/tricky API.
-
- (Of course there could also be a GstFancyVideoBuffer that provides
- an abstraction for such video accelerated objects and that could
- provide an API to add overlays to it in a generic way, but in the
- end this is just a less generic variant of (c), and it is not clear
- that there are real benefits to a specialised solution vs. a more
- generic one).
-
-
- (b) convert backend-specific object to raw pixels and then overlay
-
-      Even where technically possible, this is most likely very
-      inefficient.
-
-
- (c) attach the overlay data to the backend-specific video frame buffers
- in a generic way and do the actual overlaying/blitting later in
- backend-specific code such as the video sink (or an accelerated
- encoder/transcoder)
-
- In this case, the actual overlay rendering (i.e. the actual text
- rendering or decoding DVD/DVB data into pixels) is done in the
- subtitle-format-specific GStreamer plugin. All knowledge about
- the subtitle format is contained in the overlay plugin then,
- and all knowledge about the video backend in the video backend
- specific plugin.
-
- The main question then is how to get the overlay pixels (and
- we will only deal with pixels here) from the overlay element
- to the video sink.
-
- This could be done in multiple ways: One could send custom
- events downstream with the overlay data, or one could attach
- the overlay data directly to the video buffers in some way.
-
-      Sending inline events has the advantage that it is fairly
- transparent to any elements between the overlay element and
- the video sink: if an effects plugin creates a new video
- buffer for the output, nothing special needs to be done to
- maintain the subtitle overlay information, since the overlay
- data is not attached to the buffer. However, it slightly
- complicates things at the sink, since it would also need to
- look for the new event in question instead of just processing
- everything in its buffer render function.
-
- If one attaches the overlay data to the buffer directly, any
- element between overlay and video sink that creates a new
- video buffer would need to be aware of the overlay data
- attached to it and copy it over to the newly-created buffer.
-
-      One would have to implement a special kind of new query
- (e.g. FEATURE query) that is not passed on automatically by
- gst_pad_query_default() in order to make sure that all elements
- downstream will handle the attached overlay data. (This is only
- a problem if we want to also attach overlay data to raw video
- pixel buffers; for new non-raw types we can just make it
- mandatory and assume support and be done with it; for existing
- non-raw types nothing changes anyway if subtitles don't work)
- (we need to maintain backwards compatibility for existing raw
- video pipelines like e.g.: ..decoder ! suboverlay ! encoder..)
-
-      Even though it is slightly more work, attaching the overlay information
- to buffers seems more intuitive than sending it interleaved as
- events. And buffers stored or passed around (e.g. via the
- "last-buffer" property in the sink when doing screenshots via
- playbin) always contain all the information needed.
-
-
- (d) create a video/x-raw-*-delta format and use a backend-specific videomixer
-
- This possibility was hinted at already in the digression in
- section 1. It would satisfy the goal of keeping subtitle format
- knowledge in the subtitle plugins and video backend knowledge
- in the video backend plugin. It would also add a concept that
- might be generally useful (think ximagesrc capture with xdamage).
- However, it would require adding foorender variants of all the
- existing overlay elements, and changing playbin to that new
- design, which is somewhat intrusive. And given the general
- nature of such a new format/API, we would need to take a lot
- of care to be able to accommodate all possible use cases when
- designing the API, which makes it considerably more ambitious.
- Lastly, we would need to write videomixer variants for the
- various accelerated video backends as well.
-
-
-Overall (c) appears to be the most promising solution. It is the least
-intrusive and should be fairly straightforward to implement with
-reasonable effort, requiring only small changes to existing elements
-and requiring no new elements.
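-
-To make (c) more concrete, here is a rough sketch (using the
-hypothetical attach/get API outlined in section 4 below) of what an
-element between overlay and sink that creates a new output buffer
-would have to do to preserve the overlay data:
-
-  /* e.g. in a transform element's process function: carry the
-   * composition over from the input to the newly-created output
-   * buffer (API names made up, see section 4) */
-  comp = video_buffer_get_composition (inbuf);
-  if (comp != NULL) {
-    /* render x/y/width/height of the rectangles would need to be
-     * adjusted here if this element crops or rescales the video */
-    video_buffer_attach_composition (outbuf, comp);
-  }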
-
-Doing the final overlaying in the sink as opposed to a videomixer
-or overlay in the middle of the pipeline has other advantages:
-
- - if video frames need to be dropped, e.g. for QoS reasons,
- we could also skip the actual subtitle overlaying and
- possibly the decoding/rendering as well, if the
- implementation and API allows for that to be delayed.
-
- - the sink often knows the actual size of the window/surface/screen
- the output video is rendered to. This *may* make it possible to
- render the overlay image in a higher resolution than the input
- video, solving a long standing issue with pixelated subtitles on
- top of low-resolution videos that are then scaled up in the sink.
-    This would of course require the rendering to be delayed, instead
-    of just attaching an AYUV/ARGB/RGBA blob of pixels to the video buffer
-    in the overlay element, but that could all be supported.
-
- - if the video backend / sink has support for high-quality text
- rendering (clutter?) we could just pass the text or pango markup
- to the sink and let it do the rest (this is unlikely to be
- supported in the general case - text and glyph rendering is
- hard; also, we don't really want to make up our own text markup
- system, and pango markup is probably too limited for complex
- karaoke stuff).
-
-
- === 4. API needed ===
-
- (a) Representation of subtitle overlays to be rendered
-
- We need to pass the overlay pixels from the overlay element to the
- sink somehow. Whatever the exact mechanism, let's assume we pass
- a refcounted GstVideoOverlayComposition struct or object.
-
- A composition is made up of one or more overlays/rectangles.
-
- In the simplest case an overlay rectangle is just a blob of
- RGBA/ABGR [FIXME?] or AYUV pixels with positioning info and other
- metadata, and there is only one rectangle to render.
-
- We're keeping the naming generic ("OverlayFoo" rather than
- "SubtitleFoo") here, since this might also be handy for
- other use cases such as e.g. logo overlays or so. It is not
- designed for full-fledged video stream mixing though.
-
- // Note: don't mind the exact implementation details, they'll be hidden
-
-  // FIXME: might be confusing since GstXOverlay was renamed to
-  // GstVideoOverlay in 0.11, but there's not much we can do about that;
-  // maybe we can rename GstVideoOverlay to something better
-
- struct GstVideoOverlayComposition
- {
- guint num_rectangles;
- GstVideoOverlayRectangle ** rectangles;
-
- /* lowest rectangle sequence number still used by the upstream
- * overlay element. This way a renderer maintaining some kind of
- * rectangles <-> surface cache can know when to free cached
- * surfaces/rectangles. */
- guint min_seq_num_used;
-
- /* sequence number for the composition (same series as rectangles) */
- guint seq_num;
-  };
-
- struct GstVideoOverlayRectangle
- {
- /* Position on video frame and dimension of output rectangle in
- * output frame terms (already adjusted for the PAR of the output
- * frame). x/y can be negative (overlay will be clipped then) */
- gint x, y;
- guint render_width, render_height;
-
- /* Dimensions of overlay pixels */
- guint width, height, stride;
-
- /* This is the PAR of the overlay pixels */
- guint par_n, par_d;
-
- /* Format of pixels, GST_VIDEO_FORMAT_ARGB on big-endian systems,
- * and BGRA on little-endian systems (i.e. pixels are treated as
- * 32-bit values and alpha is always in the most-significant byte,
- * and blue is in the least-significant byte).
- *
- * FIXME: does anyone actually use AYUV in practice? (we do
- * in our utility function to blend on top of raw video)
- * What about AYUV and endianness? Do we always have [A][Y][U][V]
- * in memory? */
- /* FIXME: maybe use our own enum? */
- GstVideoFormat format;
-
- /* Refcounted blob of memory, no caps or timestamps */
- GstBuffer *pixels;
-
- // FIXME: how to express source like text or pango markup?
- // (just add source type enum + source buffer with data)
- //
- // FOR 0.10: always send pixel blobs, but attach source data in
- // addition (reason: if downstream changes, we can't renegotiate
- // that properly, if we just do a query of supported formats from
- // the start). Sink will just ignore pixels and use pango markup
- // from source data if it supports that.
- //
- // FOR 0.11: overlay should query formats (pango markup, pixels)
- // supported by downstream and then only send that. We can
- // renegotiate via the reconfigure event.
- //
-
- /* sequence number: useful for backends/renderers/sinks that want
- * to maintain a cache of rectangles <-> surfaces. The value of
- * the min_seq_num_used in the composition tells the renderer which
- * rectangles have expired. */
- guint seq_num;
-
- /* FIXME: we also need a (private) way to cache converted/scaled
- * pixel blobs */
-  };
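-
-    To illustrate the endianness convention for the format field
-    above, a pixel would be packed as a native-endian 32-bit value
-    (minimal sketch):
-
-      /* alpha ends up in the most significant byte regardless of
-       * host byte order */
-      guint32
-      pack_pixel (guint8 a, guint8 r, guint8 g, guint8 b)
-      {
-        return ((guint32) a << 24) | ((guint32) r << 16) |
-            ((guint32) g << 8) | (guint32) b;
-      }
-
-    Written to memory, this value is laid out byte-wise as B,G,R,A on
-    little-endian hosts and as A,R,G,B on big-endian hosts, matching
-    GST_VIDEO_FORMAT_BGRA resp. GST_VIDEO_FORMAT_ARGB.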
-
- (a1) Overlay consumer API:
-
- How would this work in a video sink that supports scaling of textures:
-
- gst_foo_sink_render () {
- /* assume only one for now */
- if video_buffer has composition:
- composition = video_buffer.get_composition()
-
- for each rectangle in composition:
- if rectangle.source_data_type == PANGO_MARKUP
- actor = text_from_pango_markup (rectangle.get_source_data())
- else
- pixels = rectangle.get_pixels_unscaled (FORMAT_RGBA, ...)
- actor = texture_from_rgba (pixels, ...)
-
- .. position + scale on top of video surface ...
- }
-
- (a2) Overlay producer API:
-
- e.g. logo or subpicture overlay: got pixels, stuff into rectangle:
-
- if (logoverlay->cached_composition == NULL) {
- comp = composition_new ();
-
- rect = rectangle_new (format, pixels_buf,
- width, height, stride, par_n, par_d,
- x, y, render_width, render_height);
-
- /* composition adds its own ref for the rectangle */
- composition_add_rectangle (comp, rect);
- rectangle_unref (rect);
-
- /* buffer adds its own ref for the composition */
-      video_buffer_attach_composition (buf, comp);
-
- /* we take ownership of the composition and save it for later */
- logoverlay->cached_composition = comp;
- } else {
-    video_buffer_attach_composition (buf, logoverlay->cached_composition);
- }
-
- FIXME: also add some API to modify render position/dimensions of
- a rectangle (probably requires creation of new rectangle, unless
- we handle writability like with other mini objects).
-
- (b) Fallback overlay rendering/blitting on top of raw video
-
- Eventually we want to use this overlay mechanism not only for
- hardware-accelerated video, but also for plain old raw video,
- either at the sink or in the overlay element directly.
-
- Apart from the advantages listed earlier in section 3, this
- allows us to consolidate a lot of overlaying/blitting code that
- is currently repeated in every single overlay element in one
- location. This makes it considerably easier to support a whole
- range of raw video formats out of the box, add SIMD-optimised
- rendering using ORC, or handle corner cases correctly.
-
- (Note: side-effect of overlaying raw video at the video sink is
-  that if e.g. a screenshotter gets the last buffer via the last-buffer
- property of basesink, it would get an image without the subtitles
- on top. This could probably be fixed by re-implementing the
- property in GstVideoSink though. Playbin2 could handle this
- internally as well).
-
- void
-  gst_video_overlay_composition_blend (GstVideoOverlayComposition * comp,
- GstBuffer * video_buf)
- {
- guint n;
-
- g_return_if_fail (gst_buffer_is_writable (video_buf));
- g_return_if_fail (GST_BUFFER_CAPS (video_buf) != NULL);
-
- ... parse video_buffer caps into BlendVideoFormatInfo ...
-
- for each rectangle in the composition: {
-
- if (gst_video_format_is_yuv (video_buf_format)) {
- overlay_format = FORMAT_AYUV;
- } else if (gst_video_format_is_rgb (video_buf_format)) {
- overlay_format = FORMAT_ARGB;
- } else {
- /* FIXME: grayscale? */
- return;
- }
-
- /* this will scale and convert AYUV<->ARGB if needed */
- pixels = rectangle_get_pixels_scaled (rectangle, overlay_format);
-
- ... clip output rectangle ...
-
- __do_blend (video_buf_format, video_buf->data,
- overlay_format, pixels->data,
- x, y, width, height, stride);
-
- gst_buffer_unref (pixels);
- }
- }
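-
-    A raw video sink could then use this utility along the following
-    lines (sketch only; the composition accessor is the hypothetical
-    API from (a) above):
-
-      /* in the sink's render/show-frame function */
-      comp = video_buffer_get_composition (buf);
-      if (comp != NULL) {
-        /* blending modifies the frame, so make sure we have our own
-         * writable copy of the buffer first */
-        buf = gst_buffer_make_writable (gst_buffer_ref (buf));
-        gst_video_overlay_composition_blend (comp, buf);
-      }
-
-      ... render buf as usual, unreffing the copy afterwards ...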
-
-
- (c) Flatten all rectangles in a composition
-
- We cannot assume that the video backend API can handle any
-      number of rectangle overlays; it's possible that it only
-      supports a single overlay, in which case we need to squash
- all rectangles into one.
-
- However, we'll just declare this a corner case for now, and
- implement it only if someone actually needs it. It's easy
- to add later API-wise. Might be a bit tricky if we have
- rectangles with different PARs/formats (e.g. subs and a logo),
- though we could probably always just use the code from (b)
- with a fully transparent video buffer to create a flattened
- overlay buffer.
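-
-      A possible sketch of such a flattening helper, re-using the
-      blending code from (b) on a fully transparent canvas (helper
-      names as in the pseudo-code above):
-
-        GstVideoOverlayRectangle *
-        flatten_composition (GstVideoOverlayComposition * comp,
-            guint width, guint height)
-        {
-          GstBuffer *canvas;
-
-          canvas = gst_buffer_new_and_alloc (width * height * 4);
-          ... set AYUV caps on canvas, fill with fully transparent pixels ...
-
-          /* blend all rectangles onto the transparent canvas */
-          gst_video_overlay_composition_blend (comp, canvas);
-
-          return rectangle_new (GST_VIDEO_FORMAT_AYUV, canvas,
-              width, height, width * 4, 1, 1, 0, 0, width, height);
-        }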
-
- (d) core API: new FEATURE query
-
- For 0.10 we need to add a FEATURE query, so the overlay element
- can query whether the sink downstream and all elements between
- the overlay element and the sink support the new overlay API.
- Elements in between need to support it because the render
- positions and dimensions need to be updated if the video is
- cropped or rescaled, for example.
-
- In order to ensure that all elements support the new API,
- we need to drop the query in the pad default query handler
- (so it only succeeds if all elements handle it explicitly).
-
- Might want two variants of the feature query - one where
- all elements in the chain need to support it explicitly
- and one where it's enough if some element downstream
- supports it.
-
- In 0.11 this could probably be handled via GstMeta and
- ALLOCATION queries (and/or we could simply require
- elements to be aware of this API from the start).
-
- There appears to be no issue with downstream possibly
- not being linked yet at the time when an overlay would
- want to do such a query.
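-
-      A rough sketch of how the overlay element might issue such a
-      query in 0.10, as a custom application query (structure and
-      field names made up for illustration):
-
-        GstStructure *s;
-        GstQuery *query;
-        gboolean supported = FALSE;
-
-        s = gst_structure_new ("GstVideoOverlayCompositionSupported", NULL);
-        query = gst_query_new_application (GST_QUERY_CUSTOM, s);
-
-        if (gst_pad_peer_query (overlay->srcpad, query)) {
-          /* elements that understand the query set this field; the
-           * default query handler would simply drop the query */
-          gst_structure_get_boolean (gst_query_get_structure (query),
-              "supported", &supported);
-        }
-        gst_query_unref (query);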
-
-
-Other considerations:
-
- - renderers (overlays or sinks) may be able to handle only ARGB or only AYUV
-    (for most graphics/hw APIs it's likely ARGB of some sort, while our
- blending utility functions will likely want the same colour space as
- the underlying raw video format, which is usually YUV of some sort).
- We need to convert where required, and should cache the conversion.
-
- - renderers may or may not be able to scale the overlay. We need to
- do the scaling internally if not (simple case: just horizontal scaling
- to adjust for PAR differences; complex case: both horizontal and vertical
- scaling, e.g. if subs come from a different source than the video or the
- video has been rescaled or cropped between overlay element and sink).
-
- - renderers may be able to generate (possibly scaled) pixels on demand
- from the original data (e.g. a string or RLE-encoded data). We will
- ignore this for now, since this functionality can still be added later
- via API additions. The most interesting case would be to pass a pango
- markup string, since e.g. clutter can handle that natively.
-
- - renderers may be able to write data directly on top of the video pixels
- (instead of creating an intermediary buffer with the overlay which is
- then blended on top of the actual video frame), e.g. dvdspu, dvbsuboverlay
-
- However, in the interest of simplicity, we should probably ignore the
- fact that some elements can blend their overlays directly on top of the
- video (decoding/uncompressing them on the fly), even more so as it's
-    not obvious that it's actually faster to decode the same overlay,
-    say, 70-90 times (i.e. ca. 3 seconds of video frames) and then blend
-    it 70-90 times, instead of decoding it once into a temporary buffer
- and then blending it directly from there, possibly SIMD-accelerated.
- Also, this is only relevant if the video is raw video and not some
- hardware-acceleration backend object.
-
- And ultimately it is the overlay element that decides whether to do
- the overlay right there and then or have the sink do it (if supported).
- It could decide to keep doing the overlay itself for raw video and
- only use our new API for non-raw video.
-
- - renderers may want to make sure they only upload the overlay pixels once
- per rectangle if that rectangle recurs in subsequent frames (as part of
-    the same composition or a different composition), as is likely. This
-    caching of e.g. surfaces needs to be done renderer-side and can be
-    accomplished based on the sequence numbers (see the sketch after this
-    list). The composition contains the lowest
- sequence number still in use upstream (an overlay element may want to
- cache created compositions+rectangles as well after all to re-use them
- for multiple frames), based on that the renderer can expire cached
- objects. The caching needs to be done renderer-side because attaching
- renderer-specific objects to the rectangles won't work well given the
- refcounted nature of rectangles and compositions, making it unpredictable
- when a rectangle or composition will be freed or from which thread
- context it will be freed. The renderer-specific objects are likely bound
- to other types of renderer-specific contexts, and need to be managed
- in connection with those.
-
- - composition/rectangles should internally provide a certain degree of
- thread-safety. Multiple elements (sinks, overlay element) might access
- or use the same objects from multiple threads at the same time, and it
- is expected that elements will keep a ref to compositions and rectangles
- they push downstream for a while, e.g. until the current subtitle
- composition expires.
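-
-  As a sketch of the renderer-side cache expiry described above
-  (assuming the renderer keeps backend surfaces in a GHashTable keyed
-  by rectangle sequence number; all names here are made up):
-
-    static gboolean
-    cache_entry_is_expired (gpointer key, gpointer value, gpointer user_data)
-    {
-      guint min_seq_num_used = GPOINTER_TO_UINT (user_data);
-
-      /* value (the cached surface) is freed by the hash table's
-       * value destroy function */
-      return GPOINTER_TO_UINT (key) < min_seq_num_used;
-    }
-
-    static void
-    renderer_expire_cached_surfaces (FooRenderer * renderer,
-        GstVideoOverlayComposition * comp)
-    {
-      /* drop cached surfaces for rectangles upstream no longer uses */
-      g_hash_table_foreach_remove (renderer->surface_cache,
-          cache_entry_is_expired, GUINT_TO_POINTER (comp->min_seq_num_used));
-    }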
-
- === 5. Future considerations ===
-
- - alternatives: there may be multiple versions/variants of the same subtitle
- stream. On DVDs, there may be a 4:3 version and a 16:9 version of the same
- subtitles. We could attach both variants and let the renderer pick the best
- one for the situation (currently we just use the 16:9 version). With totem,
- it's ultimately totem that adds the 'black bars' at the top/bottom, so totem
- also knows if it's got a 4:3 display and can/wants to fit 4:3 subs (which
- may render on top of the bars) or not, for example.
-
- === 6. Misc. FIXMEs ===
-
-TEST: these should look (roughly) alike (note the text distortion) - needs fixing in textoverlay
-
-gst-launch-0.10 \
- videotestsrc ! video/x-raw,width=640,height=480,pixel-aspect-ratio=1/1 ! textoverlay text=Hello font-desc=72 ! xvimagesink \
- videotestsrc ! video/x-raw,width=320,height=480,pixel-aspect-ratio=2/1 ! textoverlay text=Hello font-desc=72 ! xvimagesink \
- videotestsrc ! video/x-raw,width=640,height=240,pixel-aspect-ratio=1/2 ! textoverlay text=Hello font-desc=72 ! xvimagesink
-
- ~~~ THE END ~~~
-