Ytstenut Protocol Specification
Version 0.2

Intel Corporation, Open Source Technology Centre
Tomas Frydrych, tf@linux.intel.com

Revision history:
  0.4   1 November 2010    Initial draft
  0.5   29 November 2010   Changed JID specification; use XEP-0050 as messaging backbone
  0.6   30 November 2010   General edits
  0.7   15 December 2010   Improvements to Introduction, diagrams
  0.8   24 January 2011    Initial update to use XPMN backbone
  0.9   24 January 2011    Not using Ad-Hoc
  0.10  22 February 2011   Fix cap advertisement

2011 Intel Corporation
Introduction
We often carry out similar activities on different devices, e.g., watch videos on a smart phone, a laptop, or a TV set. However, as we move in time and space, the optimal choice of device for any given activity changes: a smart phone might be the perfect video viewing platform while travelling on a train, but a TV set might be preferred in the comfort of one's living room. Furthermore, our discrete activities are often interconnected even when distributed across distinct devices: a person watching TV might want to locate some additional information about the broadcast (e.g., who the director is, special effects details, etc.), and might use a smart phone, rather than the TV, to search for it.

The above examples illustrate two key aspects of human interaction with today's technology: (a) our activities are no longer confined to a single dedicated device each, but are distributed over a device mesh, and (b) the mesh as a whole now provides a context which shapes the activities themselves.

Unfortunately, the technologies of today neither allow our experience to stretch seamlessly over the device mesh, nor provide easy access to the unified activity context the device mesh represents. Such device meshing technologies as already exist (e.g., UPnP) tend to focus narrowly on the sharing of hardware resources. While resource sharing is an important capability of the device mesh, on its own it provides only a quantitative, rather than qualitative, improvement in the overall user experience (e.g., the ability to use a TV set to watch a video stored on a PC does not represent a radical improvement on using a memory stick to achieve the same). A radical improvement of the user experience requires the ability not just to share resources between devices, but to share, and to interact with, the user activities per se, and to do so across the device mesh in a seamless fashion.
And since user activities are generally mapped directly to user-facing applications, what is in fact needed is an application mesh facilitating both active interaction and passive mutual awareness between applications.

The Ytstenut framework aims to facilitate the creation of such a dynamic application mesh. It does so by providing communication channels through which individual user-facing applications on distinct devices can passively advertise their activities in real time, and actively cooperate and coordinate their discrete behaviours, and in so doing construct a dynamic and homogeneous experience spanning the devices involved.

The activities for which consumers use computers are impossible to enumerate, and are set to evolve. Consequently, the Ytstenut framework does not seek to narrowly define the activities and/or services that might fall within its scope, nor does it seek to prescribe the ways in which such activities or tasks should be accomplished. Rather, the Ytstenut framework is a set of generic protocols that can support new activities and services without the need to modify the core protocols.

More specifically, the aims of the Ytstenut framework are as follows:
- To provide a unified discovery, connection and transport mechanism that can be utilised by user-facing applications running on a variety of hardware and software platforms,
- To provide a standardised metadata model to facilitate efficient inter-application communication,
- To provide mechanisms for both active interaction between applications, and passive awareness of each other.
The Big Picture
Two-application mesh
The preceding diagram outlines a Ytstenut mesh consisting of two applications on two devices. Note the separation between the metadata and status channel, provided by the Ytstenut framework, and the actual content data transfer, which happens outside the framework and relies on other industry standards. The Ytstenut mesh may, of course, consist of any number of applications on any number of devices (potentially with multiple applications on any single device). The possible topologies of the mesh are described in the following section.
Ytstenut Mesh Topologies The application mesh established through the Ytstenut framework can have two basic topologies: server-centric, and server-less home cloud. The Ytstenut framework aims to support both of these scenarios in a transparent manner, and it is possible that additional mesh topologies will be facilitated in future versions of this protocol.
Server-based Mesh
Server mesh
In a server-based mesh applications communicate with each other via a central server (NB: only metadata and status information is passed through the server; content is passed out of band). This type of mesh provides two principal benefits: it places no requirements on the topology of the underlying network, and it gives the server operator complete control over access and services on offer. As such, the server-based mesh is well suited, for example, for subscription services.
LAN-based Cloud Mesh
Cloud mesh
The LAN-based cloud mesh differs from the server-based mesh by the absence of a central server; instead, applications are able to discover each other, and communicate, transparently throughout the cloud. The main benefit of the LAN-based cloud is that it eliminates the need to operate and administer a server; as such, this type of mesh is particularly suited to the domestic use case.
Application Classes
Ytstenut applications can be divided into two broad classes:
- Task-oriented applications: these are the core participants in the Ytstenut mesh. They are user-facing applications, such as media players, that have been enriched by adding the Ytstenut capabilities.
- Control applications: these provide background Ytstenut services on Ytstenut-enabled devices. Their principal purpose is to allow task-oriented applications to direct their communications at a device, rather than at a specific task-oriented application on that device, and to ensure that an appropriate task-oriented application is available (e.g., by spawning a suitable application on the device in response to incoming requests). While control applications can be purely background processes, when provided with a suitable UI they can be used as generic Ytstenut remote controls.
Metadata model
One of the key components of the Ytstenut framework is the metadata model. The purpose of the Ytstenut protocols is to allow applications to exchange metadata describing their activities in a way that allows them to coordinate these across multiple devices and platforms. Consequently, the metadata model must be:
- Flexible and extensible, to allow use with new, innovative applications,
- Sufficiently standardised to allow common classes of applications to talk to each other transparently.

It is worth noting that the protocol does not aim to provide mechanisms for actual data transfers, though in some common and specific cases it mandates which other standard protocols should be used (see ).

The Ytstenut metadata is modelled as a pairing of a capability subject (representing a single application feature that is of interest to a user) and an activity predicate (a way in which the user can manipulate content tied to a specific capability). Both the capability and the activity in each specific pair can be further qualified by attributes; the resulting {capability, activity, attributes} tuple constitutes the elementary unit of Ytstenut metadata.

The tuple described above is used in two distinct ways: to indicate the present application state, and to encapsulate instructions about a future desired state.

In order to facilitate communication between common application classes, the protocol defines the subjects, verbs and attributes for common types of user activities. At the same time, new subjects, verbs and attributes can be defined and used by specialised applications.

In addition to the metadata describing application activities, the protocol also specifies the means through which applications describe themselves to the user.
XMPP/XPMN Backbone
The Ytstenut communication protocols are built on the existing XMPP standard, using the XPMN protocol to construct the backbone of the application mesh. The reasons for choosing XMPP as the basic transport protocol are:
- Using an established messaging standard means that much of the wheel need not be reinvented,
- XMPP is supported on a broad range of hardware and software platforms, thus aiding the speed with which the Ytstenut framework can be rolled out,
- XMPP is an open standard that can be used without difficulties over licensing,
- XMPP is extensible by design,
- XMPP is capable of operating both in a server-based and a server-less manner, and supports both of these modes in a transparent way,
- XMPP is XML-based, so that implementation of extensions is simplified by being able to use standard XML-processing tools, such as parsers, etc.

As far as possible, the Ytstenut framework aims to reuse existing XMPP capabilities and features; these are augmented by two extensions:
- Protocols for encoding of Ytstenut metadata,
- A server-less protocol similar to link-local XMPP, but tailored for Ytstenut use.

In addition, at a number of points the Ytstenut specification mandates the use of standard, but optional, XMPP features, particularly where this is desirable to improve security and privacy.
Security and Privacy Considerations
The flexible and extensible nature of the Ytstenut framework means that it is not possible to predict what kind of data may be transmitted via the protocol in its real-world deployment. Furthermore, the expectation of deployment on a variety of platforms, ranging from desktop computers to mobile phones, means that multiple implementations of the protocol will be in use. It is, therefore, important that security and privacy of user data is a key factor in the design of the protocol itself. More specifically:
- The protocol must facilitate privacy of data in transit where that is appropriate or required,
- A reliable identity verification mechanism must be available,
- The protocol must provide structured access control to the user's local resources.

With regard to the above, the following should be noted in particular:
- XMPP on its own only provides client-to-server privacy. As such, XMPP exchanges that span multiple servers are susceptible to server eavesdropping,
- Normal XMPP presence information is broadcast across all subscribed contacts, or, in the case of the link-local XMPP protocol, even advertised entirely openly via mDNS broadcasts; consequently the presence mechanism is not suitable for metadata exchanges, including advertising extended status information (see ).

The Ytstenut framework uses the XPMN protocol, which addresses the security requirements above.
Link-local Ytstenut protocol
The link-local Ytstenut protocol allows for automatic connection between Ytstenut clients running on the same LAN. It is derived from the local-xmpp protocol, but with some differences:
- The link-local service is called 'ytstenut' rather than 'presence', i.e., the PTRs have the pattern 'JID._ytstenut._tcp._local.',
- All implementations must fulfil the requirements of XPMN.
Messaging Protocols
Descriptive Device Information
[Editor's note (tf, Intel): We need some way to advertise a user-friendly device description; in regular XMPP this is usually provided by a vCard, but the vCard spec is not well suited to this.]
Ytstenut devices need to provide descriptive information about themselves that can be presented to the user. At the bare minimum, this information includes a suitable, localised device name.
Support for Avatars In addition to the device description advertised above, it is recommended that all Ytstenut implementations support the XMPP User Avatar specification.
Application/Service Identifier Each application/service is identified by a unique identifier. The identifier is constructed following the D-Bus naming convention, e.g., com.meego.BestestFriendApplication. This identifier is used to identify message and status senders and recipients as described later in this document.
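As an illustration, the naming rule can be checked mechanically. The following is a minimal sketch, assuming the identifier follows the D-Bus interface-name syntax (two or more dot-separated elements, each consisting of ASCII letters, digits and underscores and not starting with a digit, at most 255 characters in total). The function name is_valid_service_id is hypothetical and not defined by this specification.

```python
import re

# One element of a D-Bus style name: letters, digits, underscore; no leading digit.
_ELEMENT = r"[A-Za-z_][A-Za-z0-9_]*"
# A full name needs at least two elements separated by dots.
_SERVICE_ID = re.compile(rf"{_ELEMENT}(?:\.{_ELEMENT})+")


def is_valid_service_id(service_id: str) -> bool:
    """Return True if service_id looks like a D-Bus style dotted name.

    Hypothetical helper for illustration; the specification only states
    that identifiers follow the D-Bus naming convention.
    """
    return len(service_id) <= 255 and _SERVICE_ID.fullmatch(service_id) is not None
```

An implementation might run such a check before using an identifier as a from-service or to-service value.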
Descriptive Application Information Ytstenut applications need to provide descriptive information about themselves that can be presented to the user. At the bare minimum, this information includes a suitable, localised, application name. The descriptive information is advertised together with the application capabilities, as described in
Application/Service Capabilities
Ytstenut applications/services advertise their Ytstenut capabilities via the XMPP Entity Capabilities protocol, using urn:ytstenut:capabilities as the value of the node attribute of the <c/> element. When the device capabilities are queried, the capabilities of each application/service are represented in the <iq/> reply using an XMPP data form; the form format is best described by an example, given here as the values carried by each field:

[Editor's note (tf, Intel): Need to formally specify a localisation mechanism for the form fields.]

  FORM_TYPE:     urn:ytstenut:capabilities#org.gnome.Banshee
  type:          application
  name:          en_GB/Banshee Media Player
                 fr/Banshee Lecteur de Musique
  capabilities:  urn:ytstenut:capabilities:yts-caps-audio
                 urn:ytstenut:data:jingle:rtp

Data form fields:
- FORM_TYPE: Links the form to the application; the value is constructed by concatenating the 'urn:ytstenut:capabilities#' prefix with the application unique identifier (see ). Required.
- type: The application/service type; either application or controller. Required.
  [Editor's note (tf, Intel): Review whether this distinction is really meaningful, or whether 'control' should not be another kind of capability.]
- name: A localised application/service name. Required.
- capabilities: List of application/service capabilities; the values are constructed by concatenating the 'urn:ytstenut:capabilities:' prefix and the canonical name of the capability (for standard capabilities defined in ). The capability list should further include any data transfer protocols supported, using the urns defined in as additional values. Required.
- vendor: A localised vendor name. Optional.
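The field layout described above can be sketched in code. The following Python example builds a data form using the standard XEP-0004 structure (an <x/> element in the jabber:x:data namespace containing <field/> and <value/> children). It is an illustration under stated assumptions rather than a normative encoding: the helper name build_caps_form is hypothetical, only FORM_TYPE is given an explicit field type, and the "locale/name" encoding of localised names is taken from the example values, since the localisation mechanism is not yet formally specified.

```python
import xml.etree.ElementTree as ET

DATA_FORMS_NS = "jabber:x:data"            # XEP-0004 Data Forms namespace
CAPS_PREFIX = "urn:ytstenut:capabilities"  # prefix given in this section


def build_caps_form(service_id, app_type, names, capabilities, vendor=None):
    """Build a data form describing one application/service (hypothetical helper).

    names and vendor map a locale (e.g. "en_GB") to a localised string.
    """
    form = ET.Element(f"{{{DATA_FORMS_NS}}}x", {"type": "result"})

    def add_field(var, values, field_type=None):
        attrs = {"var": var}
        if field_type:
            attrs["type"] = field_type
        field = ET.SubElement(form, f"{{{DATA_FORMS_NS}}}field", attrs)
        for value in values:
            ET.SubElement(field, f"{{{DATA_FORMS_NS}}}value").text = value

    # FORM_TYPE links the form to the application identifier.
    add_field("FORM_TYPE", [f"{CAPS_PREFIX}#{service_id}"], "hidden")
    add_field("type", [app_type])
    # Assumption: localised values are written as "<locale>/<text>".
    add_field("name", [f"{lang}/{text}" for lang, text in names.items()])
    add_field("capabilities", list(capabilities))
    if vendor:  # optional field
        add_field("vendor", [f"{lang}/{text}" for lang, text in vendor.items()])
    return form


form = build_caps_form(
    "org.gnome.Banshee",
    "application",
    {"en_GB": "Banshee Media Player", "fr": "Banshee Lecteur de Musique"},
    ["urn:ytstenut:capabilities:yts-caps-audio", "urn:ytstenut:data:jingle:rtp"],
)
xml_text = ET.tostring(form, encoding="unicode")
```

The resulting element would be embedded in the <iq/> reply to a capabilities query; how it is wrapped there is outside the scope of this sketch.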
Extended Status
Extended status information is advertised using the XPMN eventing mechanism (which in turn relies on the XMPP Personal Eventing Protocol). The status is identified by the item node urn:ytstenut:status, and the payload is held by a <ytstenut:status/> element and its attributes; applications with multiple capabilities must include a <ytstenut:status/> element for each capability.

The following attributes, in addition to those defined in , are used with the <ytstenut:status/> element:
- version: Ytstenut protocol version; required.
- from-service: The ID of the application this status message describes; required (see ).
- capability: The capability this status applies to; required. The value should preferably be one of those defined in .
- activity: The activity this status represents; optional (if not present, yts-activity-idle is implied). The value should preferably be one of those defined in .
- primary-capability: Boolean indicating whether the capability this status applies to is the primary capability of the application; optional (if absent, false is implied).

While the <ytstenut:status/> element can be extended with custom attributes, no frequently changing information (such as the current playback position) is permitted as part of the status, to avoid flooding the network.

A human-readable description is provided using one or more <ytstenut:description/> elements inside the <ytstenut:status/> element; each <ytstenut:description/> element must have an xml:lang attribute, and multiple <ytstenut:description/> elements must each have a different xml:lang attribute.

Status XML example
Playing a video about colour-based optical illusions.
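The status payload described above can be sketched as follows. This is a minimal illustration, not the canonical encoding: it assumes the element namespace is the same string as the item node (urn:ytstenut:status), the helper name build_status is hypothetical, and the version and application ID values used in the usage lines are placeholders.

```python
import xml.etree.ElementTree as ET

STATUS_NS = "urn:ytstenut:status"  # assumption: namespace equals the item node name
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # the xml:lang attribute


def build_status(version, from_service, capability,
                 activity=None, primary=False, descriptions=None):
    """Build one status element for a single capability (hypothetical helper).

    descriptions maps a language tag to text; using a dict keeps each
    xml:lang value unique, as the specification requires.
    """
    attrs = {
        "version": version,
        "from-service": from_service,
        "capability": capability,
    }
    if activity:                 # absent activity implies the idle state
        attrs["activity"] = activity
    if primary:                  # absent primary-capability implies false
        attrs["primary-capability"] = "true"
    status = ET.Element(f"{{{STATUS_NS}}}status", attrs)
    for lang, text in (descriptions or {}).items():
        desc = ET.SubElement(status, f"{{{STATUS_NS}}}description", {XML_LANG: lang})
        desc.text = text
    return status


# Placeholder values for illustration only.
status = build_status(
    "1.0", "org.gnome.Banshee", "yts-caps-video",
    activity="yts-activity-playback",
    descriptions={"en-GB": "Playing a video about colour-based optical illusions."},
)
status_xml = ET.tostring(status, encoding="unicode")
```

An application with several capabilities would call the helper once per capability and publish all resulting elements.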
Instruction Messages
Instruction messages are used to send Ytstenut commands and information queries; as per XPMN, this is achieved by exchanging <iq/> stanzas with the Ytstenut metadata payload.
Message payload: <code><ytstenut:message/></code>
The <ytstenut:message/> element is used to encapsulate the payload of standardised Ytstenut messages. Required attributes:
- version: The Ytstenut protocol version.
- from-service: The ID of the application that sent this message; required (see ).
- to-service: The ID of the application that this message is for; required (see ).
- type: Message type. Standardised types have the prefix ytstenut/ and are defined in the following section. Custom command types are permitted, and must use a suitable namespace prefix (other than ytstenut/).

Depending on the message purpose, additional attributes are used to define the message payload; there are no standard child elements defined by this specification, but custom child elements are allowed.
Standardised Message Types
<code>ytstenut/command</code>
A command sent from application A to application B, to be executed directly by application B. Required <ytstenut:message/> attributes:
- capability: Capability on which the command is to operate, preferably using one of the values defined in .
- activity: Activity to carry out, preferably using one of the values defined in .
- time: Time of command dispatch with at least millisecond precision, in standard XMPP format.

Additional attributes, preferably using those defined in , are used to further qualify the capability and activity specified.

Command example
The following XML snippet tells some other application to start playing a given video, starting 3/4 of the way into the video duration:
[Optional command data; binary data base64 encoded] ...
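A command of this kind can be sketched in code. The following Python illustration constructs a <ytstenut:message/> element of type ytstenut/command; it is not the canonical encoding. It assumes the element namespace is urn:ytstenut:messages (the name of the schema listed later in this document), and that "standard XMPP format" means the XEP-0082 DateTime profile in UTC. The helper names and the application IDs and URI in the usage lines are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

MESSAGE_NS = "urn:ytstenut:messages"  # assumption: taken from the schema name


def xmpp_timestamp(moment):
    """Format an aware datetime as an XEP-0082 DateTime, UTC, millisecond precision."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def build_command(version, from_service, to_service,
                  capability, activity, sent_at, **extra):
    """Build a ytstenut/command payload (hypothetical helper).

    extra carries qualifying attributes such as uri or progress.
    """
    attrs = {
        "version": version,
        "type": "ytstenut/command",
        "from-service": from_service,
        "to-service": to_service,
        "capability": capability,
        "activity": activity,
        "time": xmpp_timestamp(sent_at),
    }
    attrs.update({name: str(value) for name, value in extra.items()})
    return ET.Element(f"{{{MESSAGE_NS}}}message", attrs)


# Ask a player to start playback 3/4 of the way into a video (placeholder values).
command = build_command(
    "1.0", "com.example.Remote", "org.gnome.Banshee",
    "yts-caps-video", "yts-activity-playback",
    datetime(2011, 2, 22, 12, 30, 15, 250000, tzinfo=timezone.utc),
    uri="http://example.com/video.ogv", progress=0.75,
)
```

The progress attribute is used here rather than position, in line with the preference stated in the list of common attributes.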
Error handling
When the resource to which the Ytstenut command pertains is unavailable, the command recipient should return the error that best describes the reason why:
- <forbidden/>: The recipient does not have sufficient privileges to carry out the command.
- <item-not-found/>: The resource could not be located.

The type attribute of the <error/> stanza should be set appropriately: the modify value should be used if the recipient is able to explore other sources for the same resource; the value cancel is used to indicate that no further attempts to execute this command should be made. When handling errors of type modify, the sender must explore each possible source no more than once. When all known sources are exhausted, the initiating application should notify the user that the command could not be executed.
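The sender-side rule (try each source at most once, stop on cancel, notify the user when sources run out) can be sketched as a simple loop. This is an illustration only: run_with_fallback and the send_command callable are hypothetical names, and the callable is assumed to return None on success or the error type string ("modify" or "cancel") on failure.

```python
def run_with_fallback(sources, send_command):
    """Try a command against each candidate source at most once.

    send_command(source) is a caller-supplied callable (hypothetical) that
    returns None on success, or the <error/> type: "modify" or "cancel".
    Returns the source that succeeded, or None if the user should be told
    that the command could not be executed.
    """
    tried = set()
    for source in sources:
        if source in tried:
            continue          # each source is explored no more than once
        tried.add(source)
        error_type = send_command(source)
        if error_type is None:
            return source     # command accepted
        if error_type != "modify":
            break             # "cancel": make no further attempts
    return None
```

A return value of None is the point at which the initiating application would notify the user.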
<code>ytstenut/transfer</code>
A request by application A to application B to transfer B's activity to application C. Required <ytstenut:message/> attributes:
- capability: Capability that is the subject of the transfer.
- jid: JID of the application to transfer to.

Additional attributes, preferably using those defined in , may be used to further qualify the capability specified.
<code>ytstenut/find</code>
[Editor's note (tf, Intel): Rationale: because of the need to facilitate e2e encryption, commands cannot be proxied through control applications; the find request allows clients to initiate a transfer to an application that might not yet be running on the target device. Not allowing proxying of commands via intermediate applications also significantly simplifies issues related to access control.]

A request by an application A to a control application C to identify a suitable application B to dispatch a (subsequent) command to:
- The criteria for the search are given by the supplied attributes (e.g., the application capability would be specified using the capability attribute),
- The search is limited to the service context the control application is part of, or, in the case of the home cloud, the device the control application is running on,
- The control application returns the result of the search using the jid attribute of the <ytstenut:message/> payload.
Error handling
If no running application matching the specified criteria can be identified, but a suitable application is available on the system, the control application must return immediately with status executing, and then attempt to start that application. If a suitable application does not exist on the system, the control application must immediately return the error condition <item-not-found/>. When the spawned client application successfully starts up, the control application returns the result of the search using the jid attribute of the <ytstenut:message/> payload. If the application fails to start, the control application must dispatch a response with status completed and an error; the error condition should indicate why the application failed to start, if that is known.
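The immediate decision a control application makes on receiving a find request can be sketched as below. This is an assumption-laden illustration: handle_find and its arguments are hypothetical, matching is reduced to a single capability lookup, and the later steps (spawning the application and sending the follow-up result or error) are not modelled.

```python
def handle_find(capability, running, installed):
    """Decide the immediate reply to a find request (hypothetical helper).

    running   maps a capability to the JID of an application already running.
    installed is the set of capabilities for which an application could be
              started on this system.
    """
    if capability in running:
        # A matching application is running: report it via the jid attribute.
        return {"jid": running[capability]}
    if capability in installed:
        # Reply at once with status executing; the caller then spawns the
        # application and reports the JID (or an error) once that resolves.
        return {"status": "executing"}
    # Nothing suitable exists on the system.
    return {"error": "item-not-found"}
```

For example, with running = {"yts-caps-video": "tv@example.org/player"} (a made-up JID), a find for yts-caps-video yields that JID directly, while a find for an installed but idle audio player yields the executing status.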
Common metadata classes The canonical definition is given in ; the following information is extracted from the XML schemas for convenience.
Common capability classes
- yts-caps-control: control application,
- yts-caps-audio: audio playback capabilities,
- yts-caps-video: video playback capabilities,
- yts-caps-image: image display capabilities,
- yts-caps-html: html rendering capabilities,
- yts-caps-antivirus: anti-virus capabilities.

[Editor's note (tf, Intel): More standard definitions should be added here; open to suggestions.]

Custom capabilities can be defined, provided they are suitably name-spaced with a custom prefix; custom capabilities must not use the 'yts-' prefix.
Common activity classes
Absence of the 'activity' attribute, or an empty value, implies the idle state.
- yts-activity-playback: playback,
- yts-activity-pause: paused state,
- yts-activity-ffw: fast forward,
- yts-activity-rwd: rewind,
- yts-activity-scan: scan,
- yts-activity-volume: volume adjustment.

[Editor's note (tf, Intel): More standard definitions should be added here; open to suggestions.]

Custom activities can be defined, provided they are suitably name-spaced with a custom prefix; custom activities must not use the 'yts-' prefix.
Common attributes
- protocol: urn identifying a suitable protocol through which the resource on which to operate can be obtained (see ). Multiple protocols can be listed as a space-separated list, in descending order of preference.
- uri: uri of a resource associated with the activity.
- uid: Universal id identifying the resource associated with the activity.
  [Editor's note (tf, Intel): The idea is being able to use something like, for example, a musicbrainz id to identify the resource, though in practice this might be hard to extend beyond audio.]
- volume: volume level (floating point number in the interval [0, 1]).
- progress: activity progress (floating point number in the interval [0, 1]); this is the preferred way of passing information such as stream position.
- position: activity position (floating point number); NB: applications should use the progress attribute instead of 'position' whenever possible.
- description: human-readable description, suitable for presentation to the user.
- jid: XMPP id.
- speed: speed of activity (floating point number; 1.0 indicates normal speed).

Custom attributes can be defined, provided they are suitably name-spaced with a custom prefix; custom attributes must not use the 'yts-' prefix.
Data Transfer Protocols This section defines standard data transfer protocols to be used by Ytstenut clients; this list does not restrict clients to these protocols alone, but sets out preferred protocols.
File Transfers
The preferred file transfer protocol is SI File Transfer; this protocol must be supported by all compliant Ytstenut clients for which a file constitutes a meaningful data unit. It is recommended that clients also implement Jingle File Transfer; this protocol is currently at the experimental stage, but once it reaches the draft stage, it will be adopted as the default file transfer protocol for Ytstenut clients.
Streaming The preferred streaming protocol is XMPP Jingle RTP; applications that support media streaming should implement this protocol.
URNs for common resource fetching protocols
This section codifies urns to be used with the uri attribute of Ytstenut commands to indicate how to reach the resource, and when advertising application capabilities (see ). Each urn is formed by combining the 'urn:ytstenut:data:' prefix with one of the protocol ids defined below:
- si-file: Resource can be obtained from the initiating application using SI File Transfer, see .
- jingle:ft: Resource can be obtained from the initiating application using XMPP Jingle File Transfer, see .
- jingle:rtp: Resource can be obtained from the initiating application using XMPP Jingle RTP, see
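The urn construction above, together with the preference-ordered protocol attribute described under the common attributes, can be sketched as follows. The function names are hypothetical; the prefix and protocol ids are those defined in this section.

```python
DATA_URN_PREFIX = "urn:ytstenut:data:"
STANDARD_PROTOCOL_IDS = ("si-file", "jingle:ft", "jingle:rtp")


def standard_protocol_urn(protocol_id):
    """Return the urn for one of the standard protocol ids (hypothetical helper)."""
    if protocol_id not in STANDARD_PROTOCOL_IDS:
        raise ValueError(f"not a standard protocol id: {protocol_id!r}")
    return DATA_URN_PREFIX + protocol_id


def pick_protocol(protocol_attr, supported):
    """Choose a protocol from a space-separated protocol attribute value.

    The list is in descending order of preference, so the first urn that
    the receiving application supports wins. Returns None if none match.
    """
    for urn in protocol_attr.split():
        if urn in supported:
            return urn
    return None
```

A receiver that supports only SI File Transfer would, for instance, pass {standard_protocol_urn("si-file")} as the supported set and skip over any Jingle urns listed ahead of it.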
Ytstenut XML Schemas
Schema for <code>urn:ytstenut:status</code>
Schema for <code>urn:ytstenut:messages</code>
External Resources
- RFC 3920: Extensible Messaging and Presence Protocol (XMPP): Core. The Internet Engineering Task Force.
- RFC 3921: Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence. The Internet Engineering Task Force.
- RFC 2222: Simple Authentication and Security Layer (SASL). The Internet Engineering Task Force.
- RFC 3923: End-to-End Signing and Object Encryption for the Extensible Messaging and Presence Protocol (XMPP). The Internet Engineering Task Force.
- Dirk Meyer: Extended Personal Media Networks (XPMN). University of Bremen.
- XEP-0004: Data Forms. XMPP Standards Foundation.
- XEP-0030: Service Discovery. XMPP Standards Foundation.
- XEP-0050: Ad-Hoc Commands. XMPP Standards Foundation.
- XEP-0060: Publish-Subscribe. XMPP Standards Foundation.
- XEP-0082: XMPP Date and Time Profiles. XMPP Standards Foundation.
- XEP-0084: User Avatar. XMPP Standards Foundation.
- XEP-0096: SI File Transfer. XMPP Standards Foundation.
- XEP-0115: Entity Capabilities. XMPP Standards Foundation.
- XEP-0163: Personal Eventing Protocol. XMPP Standards Foundation.
- XEP-0166: Jingle. XMPP Standards Foundation.
- XEP-0167: Jingle RTP Sessions. XMPP Standards Foundation.
- XEP-0174: Serverless Messaging. XMPP Standards Foundation.
- XEP-0234: Jingle File Transfer. XMPP Standards Foundation.
- Jingle XTLS. XMPP Standards Foundation.
- D-Bus Specification.