Ytstenut Protocol Specification
Version 0.2
Intel Corporation
Open Source Technology Centre
Tomas Frydrych
tf@linux.intel.com

0.4, 1 November 2010: Initial draft
0.5, 29 November 2010: Changed JID specification; use XEP-0050 as messaging backbone
0.6, 30 November 2010: General edits.
0.7, 15 December 2010: Improvements to Introduction, diagrams
0.8, 24 January 2011: Initial update to use XPMN backbone
0.9, 24 January 2011: Not using Ad-Hoc
0.10, 22 February 2011: Fix cap advertisement

Copyright 2011 Intel Corporation

Introduction
We often carry out similar activities on different devices, e.g., watch
videos on a smart phone, laptop, or a TV set. However, as we move in time
and space, the optimal choice of a device for any given activity changes:
a smart phone might be the perfect video viewing platform while travelling
on a train, but a TV set might be preferred in the comfort of one's living
room.
Furthermore, our discrete activities are often interconnected even when
distributed across distinct devices: a person watching a TV might
want to locate some additional information about the broadcast (e.g.,
who the director is, special effects details, etc.), and might use a
smart phone, rather than the TV, to search for it.
The above examples illustrate two key aspects of human interaction with
today's technology: (a) our activities are no longer confined to a single
dedicated device each, but are distributed over a device mesh, and (b) the
mesh as a whole now provides a context which shapes the activities
themselves.
Unfortunately, the technologies of today neither allow for our experience
to stretch seamlessly over the device mesh, nor provide an easy access to
the unified activity context the device mesh represents. What device
meshing technologies there already are (e.g., UPnP), tend to focus
narrowly on the sharing of hardware resources. While resource sharing is
an important capability of the device mesh, on its own it only provides a
quantitative, rather than qualitative, improvement on the overall user
experience (e.g., the ability to use a TV set to watch a video
stored on a PC does not represent a radical improvement on using a memory
stick to achieve the same).
A radical improvement of the user experience requires the ability not just
to share resources between devices, but to share, and to interact with,
the user activities per se, and to do so across the device
mesh in a seamless fashion. And since user activities are generally mapped
directly to user facing applications, what is in fact needed is an
application mesh facilitating both active interaction and passive mutual
awareness between applications.
The Ytstenut framework aims to facilitate the creation of such a dynamic
application mesh. It does so by providing communication channels through
which individual user-facing applications on distinct devices can
passively advertise their activities in real time, and actively cooperate
and coordinate their discrete behaviours, and in so doing construct a dynamic
and homogeneous experience spanning the devices involved.
The activities for which consumers use computers are impossible to
enumerate, and are set to evolve. Consequently the Ytstenut framework does
not seek to narrowly define the activities and/or services that might fall
within its scope, nor does it seek to prescribe the ways in which such
activities or tasks should be accomplished. Rather, the Ytstenut framework
is a set of generic protocols that can support new activities and services
without the need to modify the core protocols.
More specifically, the aims of the Ytstenut framework are as follows:
To provide a unified discovery, connection and transport mechanism that
can be utilised by user-facing applications running on a variety of
hardware and software platforms,
To provide a standardised metadata model to facilitate efficient
inter-application communication,
To provide mechanisms for both active interaction between
applications, and passive awareness of each other.
The Big Picture
The preceding diagram outlines a Ytstenut mesh consisting of two
applications on two devices. Note the separation between the metadata
and status channel, provided by the Ytstenut framework, and the actual
content data transfer, which happens outside the framework, and
relies on other industry standards.
The Ytstenut mesh may, of course, consist of any number of
applications, on any number of devices (potentially with multiple
applications on any single device). The possible topologies of the mesh
are described in the following section.
Ytstenut Mesh Topologies
The application mesh established through the Ytstenut framework can have
two basic topologies: server-centric, and server-less home cloud. The
Ytstenut framework aims to support both of these scenarios in a
transparent manner, and it is possible that additional mesh topologies
will be facilitated in future versions of this protocol.
Server-based Mesh
In a server-based mesh applications communicate with each other via a
central server (NB: only metadata and status information is passed
through the server; content is passed out of band). This type of mesh
provides two principal benefits: it places no requirements on the
topology of the underlying network, and it gives the server operator
complete control over access and services on offer. As such, the
server-based mesh is well suited, for example, for subscription
services.
LAN-based Cloud Mesh
The LAN-based cloud mesh differs from the server-based mesh by the absence
of a central server; instead applications are able to discover each
other, and communicate, transparently throughout the cloud. The main
benefit of the LAN-based cloud is that it eliminates the need to operate
and administer a server; as such this type of mesh is particularly
suited to the domestic use case.
Application Classes
Ytstenut applications can be divided into two broad classes:
Task-oriented applications: these are the core participants in the
Ytstenut mesh. They are user-facing applications, such as media
players, that have been enriched by adding the Ytstenut
capabilities.
Control applications: these provide background Ytstenut services on
Ytstenut-enabled devices. Their principal purpose is to allow
task-oriented applications to direct their communications at a
device, rather than a specific task-oriented application on that
device, and to ensure that an appropriate task-oriented application
is available (e.g., by spawning a suitable application on the
device in response to incoming requests).
While control applications can be purely background processes,
when provided with a suitable UI they can be used as generic
Ytstenut remote controls.
Metadata model
One of the key components of the Ytstenut framework is the metadata
model. The purpose of the Ytstenut protocols is to allow applications to
exchange metadata describing their activities in a way that would allow
them to coordinate these across multiple devices and
platforms. Consequently, the metadata model must be:
Flexible and extensible, to allow use with new, innovative
applications,
Sufficiently standardised to allow common classes
of applications to talk to each other transparently.
It is worth noting that the protocol does not aim to provide mechanisms
for actual data transfers, though in some common and specific cases it
mandates which other standard protocols should be used (see "Data Transfer Protocols").
The Ytstenut metadata is modelled as a pairing of a capability subject
(representing a single application feature that is of interest to a
user) and an activity predicate (a way in which the user can manipulate
content tied to a specific capability). Both the capability and the
activity in each specific pair can be further qualified by attributes;
the resulting {capability, activity, attributes} tuple
constitutes the elementary unit of Ytstenut metadata.
The above described tuple is used in two distinct ways: to indicate
present application state, and to encapsulate instructions about future
desired state.
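The tuple described above could be modelled as follows. This is a minimal sketch in Python rather than anything the specification prescribes: the class name `MetadataTuple`, the variable names and the example URI are hypothetical, while the capability and activity values are taken from the common metadata classes listed later in this document.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class MetadataTuple:
    """One {capability, activity, attributes} unit (hypothetical class name)."""
    capability: str                      # the subject, e.g. "yts-caps-video"
    activity: str                        # the predicate, e.g. "yts-activity-playback"
    attributes: Dict[str, str] = field(default_factory=dict)  # optional qualifiers


# Use 1: describing the present state of an application.
current_state = MetadataTuple(
    "yts-caps-video",
    "yts-activity-playback",
    {"uri": "http://example.com/video.ogv"},  # placeholder URI
)

# Use 2: encapsulating an instruction about a desired future state.
desired_state = MetadataTuple("yts-caps-video", "yts-activity-pause")
```

The same structure serves both purposes; only the context in which it is sent (status or instruction) tells the recipient how to interpret it.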
In order to facilitate communication between common application classes,
the protocol defines the subjects, verbs and attributes for common types
of user activities. At the same time, new subjects, verbs and attributes
can be defined and used by specialised applications.
In addition to the metadata describing application activities, the
protocol also specifies means through which applications describe
themselves to the user.
XMPP/XPMN Backbone
The Ytstenut communication protocols are built on the existing XMPP
standard, using the XPMN protocol to construct the backbone of the application
mesh. The reasons for choosing XMPP as the basic transport protocol
are:
Using an established messaging standard means that much of the
wheel need not be reinvented,
XMPP is supported on a broad range of hardware and software
platforms, thus aiding the speed with which the Ytstenut framework
can be rolled out,
XMPP is an open standard that can be used without difficulties
over licensing,
XMPP is extensible by design,
XMPP is capable of operating both in a server-based and a
server-less manner, and supports both of these modes in a
transparent way,
XMPP is XML-based, so that implementation of extensions is
simplified by being able to use standard XML-processing tools,
such as parsers, etc.
As far as possible, the Ytstenut framework aims to reuse existing XMPP
capabilities and features; these are augmented by two extensions:
Protocols for encoding of Ytstenut metadata,
A server-less protocol similar to link-local XMPP, but tailored for Ytstenut use.
In addition, at a number of points, the Ytstenut specification mandates
the use of standard, but optional, XMPP features, particularly where
this is desirable to improve security and privacy.
Security and Privacy Considerations
The flexible and extensible nature of the Ytstenut framework means that
it is not possible to predict what kind of data may be transmitted via
the protocol in its real-world deployment. Furthermore, the expectation
of deployment on a variety of platforms, ranging from desktop computers
to mobile phones, means that multiple implementations of the protocol
will be in use. It is, therefore, important that security and privacy of
user data is a key factor in the design of the protocol itself. More
specifically:
The protocol must facilitate privacy of data in transit where that
is appropriate or required,
A reliable identity verification mechanism must be available,
The protocol must provide structured access control to the user's
local resources.
With regards to the above, the following should be noted in particular:
XMPP on its own only provides client-to-server privacy. As such,
XMPP exchanges that span multiple servers are susceptible to
server eavesdropping,
Normal XMPP presence information is broadcast across all
subscribed contacts, or, in the case of the link-local XMPP protocol,
even advertised entirely openly via mDNS broadcasts; consequently
the presence mechanism is not suitable for metadata exchanges,
including advertising extended status information (see "Extended Status").
The Ytstenut framework uses the XPMN protocol which addresses the
security requirements above.
Link-local Ytstenut protocol
The link-local Ytstenut protocol allows for automatic connection between
Ytstenut clients running on the same LAN. It is derived from the
local-xmpp protocol, but with some differences:
The link-local service is called 'ytstenut' rather than 'presence',
i.e., the PTRs have pattern 'JID._ytstenut._tcp._local.',
All implementations must fulfill the requirements of XPMN.
Messaging Protocols
Descriptive Device Information
Remark (tf, Intel): We need some way to advertise a user-friendly device description; in
regular XMPP this is usually provided by a vCard, but the vCard spec is
not well suited to this.
Ytstenut devices need to provide descriptive information about
themselves that can be presented to the user. At the bare minimum, this
information includes a suitable, localised, device name.
Support for Avatars
In addition to the device description advertised above, it is
recommended that all Ytstenut implementations support the XMPP User
Avatar specification.
Application/Service Identifier
Each application/service is identified by a unique identifier. The
identifier is constructed following the D-Bus naming
convention, e.g.,
com.meego.BestestFriendApplication. This identifier is used
to identify message and status senders and recipients as described later
in this document.
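A simple check of such identifiers might look like the sketch below. It assumes the rules used for D-Bus interface names (at least two dot-separated elements, each starting with a letter or underscore and containing only letters, digits and underscores, with a 255-character limit); the specification itself does not say which part of the D-Bus naming convention applies, and the function name is hypothetical.

```python
import re

# Assumption: D-Bus interface-name style elements, separated by dots.
_ELEMENT = r"[A-Za-z_][A-Za-z0-9_]*"
_SERVICE_ID = re.compile(rf"{_ELEMENT}(?:\.{_ELEMENT})+")
_MAX_LENGTH = 255  # maximum name length used by D-Bus


def is_valid_service_id(service_id: str) -> bool:
    """Return True if service_id looks like a D-Bus style dotted name."""
    return (
        len(service_id) <= _MAX_LENGTH
        and _SERVICE_ID.fullmatch(service_id) is not None
    )
```

For instance, `is_valid_service_id("com.meego.BestestFriendApplication")` accepts the identifier used above, while a single-element name such as `"Banshee"` is rejected.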
Descriptive Application Information
Ytstenut applications need to provide descriptive information about
themselves that can be presented to the user. At the bare minimum, this
information includes a suitable, localised, application name.
The descriptive information is advertised together with the application
capabilities, as described in "Application/Service Capabilities".
Application/Service Capabilities
Ytstenut applications/services advertise their Ytstenut capabilities via
the XMPP Entity Capabilities protocol, using
urn:ytstenut:capabilities as the value of the
node attribute of the <c/> element.
When the device capabilities are queried, capabilities of each
application/service are represented in the <iq/>
reply using XMPP data form; the form format is best described by an
example:
Remark (tf, Intel): Need to formally specify a localisation mechanism for the form fields.
FORM_TYPE: urn:ytstenut:capabilities#org.gnome.Banshee
type: application
name: en_GB/Banshee Media Player, fr/Banshee Lecteur de Musique
capabilities: urn:ytstenut:capabilities:yts-caps-audio, urn:ytstenut:data:jingle:rtp
Data form fields:
FORM_TYPE
Links the form to the application; the value is constructed by
concatenating the 'urn:ytstenut:capabilities#' prefix
with the application unique identifier (see "Application/Service Identifier").
Required.
type
Remark (tf, Intel): Review whether this distinction is really meaningful, or whether
'control' should not be another kind of capability.
The application/service type; either application or
controller.
Required.
name
A localised application/service name.
Required.
capabilities
List of application/service capabilities; the values are
constructed by concatenating the
'urn:ytstenut:capabilities:' prefix and the canonical
name of the capability (for standard capabilities defined in "Common capability classes").
The capability list should further include any data transfer
protocols supported, using the urns defined in "URNs for common resource fetching protocols" as additional values.
Required.
vendor
A localised vendor name.
Optional.
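The field rules above can be illustrated by assembling the form programmatically. The following Python sketch uses only the standard library; the `jabber:x:data` namespace, the `<field var='…'>`/`<value>` structure and the hidden FORM_TYPE field come from the XMPP Data Forms specification, whereas the form type `result`, the helper names, and the `locale/name` encoding of localised names (copied from the example values above) are assumptions, since the localisation mechanism is not yet formally specified.

```python
import xml.etree.ElementTree as ET

DATA_FORM_NS = "jabber:x:data"            # XEP-0004 data form namespace
CAPS_PREFIX = "urn:ytstenut:capabilities"
DATA_PREFIX = "urn:ytstenut:data:"


def _add_field(form, var, values, field_type=None):
    """Append a <field var=...> with one <value> child per entry."""
    attrib = {"var": var}
    if field_type:
        attrib["type"] = field_type
    field = ET.SubElement(form, f"{{{DATA_FORM_NS}}}field", attrib)
    for value in values:
        ET.SubElement(field, f"{{{DATA_FORM_NS}}}value").text = value
    return field


def build_capabilities_form(service_id, app_type, names, capabilities, protocols=()):
    """Build the capability form for one application (hypothetical helper)."""
    form = ET.Element(f"{{{DATA_FORM_NS}}}x", {"type": "result"})  # assumed form type
    _add_field(form, "FORM_TYPE", [f"{CAPS_PREFIX}#{service_id}"], "hidden")
    _add_field(form, "type", [app_type])
    # Assumed localisation encoding: "<locale>/<localised name>".
    _add_field(form, "name", [f"{lang}/{name}" for lang, name in names.items()])
    caps = [f"{CAPS_PREFIX}:{cap}" for cap in capabilities]
    caps += [f"{DATA_PREFIX}{proto}" for proto in protocols]
    _add_field(form, "capabilities", caps)
    return form


form = build_capabilities_form(
    "org.gnome.Banshee",
    "application",
    {"en_GB": "Banshee Media Player", "fr": "Banshee Lecteur de Musique"},
    ["yts-caps-audio"],
    ["jingle:rtp"],
)
xml_text = ET.tostring(form, encoding="unicode")
```

The optional vendor field is omitted here; it could be added with one more `_add_field` call.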
Extended Status
Extended status information is advertised using the XPMN eventing
mechanism (which in turn relies on XMPP Personal Eventing
Protocol). The status is
identified with item node urn:ytstenut:status and the
payload is held by an <ytstenut:status/> element and
its attributes; applications with multiple capabilities must include an
<ytstenut:status/> element for each capability.
The following attributes, in addition to those defined in "Common attributes", are used with the
<ytstenut:status/> element:
version
Ytstenut protocol version; required,
from-service
The ID of the application this status message describes;
required (see "Application/Service Identifier"),
capability
The capability this status applies to; required. The value
should preferably be one of those defined in "Common capability classes",
activity
The activity this status represents; optional (if not present
yts-activity-idle is implied). The value should
preferably be one of those defined in "Common activity classes",
primary-capability
Boolean indicating whether the capability this status applies to is
the primary capability of the application; optional (if absent
false is implied).
While the <ytstenut:status/> element can be extended
with custom attributes, no frequently changing information
(such as current playback position) is permitted as part of status to
avoid flooding of the network.
A human-readable description is provided using one or more
<ytstenut:description/> elements inside the
<ytstenut:status/> element; each
<ytstenut:description/> element must have an
xml:lang attribute, and multiple
<ytstenut:description/> elements must have a different
xml:lang attribute each.
Status XML example
Playing a video about colour-based optical illusions.
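A status payload with the attributes described above could be assembled as in the following Python sketch. The namespace URI `urn:ytstenut:status` bound to the `ytstenut` prefix is an assumption based on the schema name used in this document, and the version string, service ID and helper name are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

YTS_STATUS_NS = "urn:ytstenut:status"  # assumed namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialised as xml:lang


def build_status(version, from_service, capability,
                 activity=None, descriptions=None, primary=False):
    """Build one <ytstenut:status/> element for a single capability."""
    attrib = {
        "version": version,
        "from-service": from_service,
        "capability": capability,
    }
    if activity:                      # absent activity implies the idle state
        attrib["activity"] = activity
    if primary:                       # absent primary-capability implies false
        attrib["primary-capability"] = "true"
    status = ET.Element(f"{{{YTS_STATUS_NS}}}status", attrib)
    # A dict keyed by language guarantees one description per xml:lang value.
    for lang, text in (descriptions or {}).items():
        desc = ET.SubElement(status, f"{{{YTS_STATUS_NS}}}description", {XML_LANG: lang})
        desc.text = text
    return status


status = build_status(
    version="1.0",                              # placeholder version string
    from_service="org.example.VideoPlayer",     # hypothetical service ID
    capability="yts-caps-video",
    activity="yts-activity-playback",
    descriptions={"en": "Playing a video about colour-based optical illusions."},
)
```

An application with several capabilities would call `build_status` once per capability and publish all resulting elements.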
Instruction Messages
Instruction messages are used to send Ytstenut commands and information
queries; as per XPMN this is achieved by exchanging
<iq/> stanzas with the Ytstenut metadata payload.
Message payload: <ytstenut:message/>
The <ytstenut:message/> element is used to
encapsulate the payload of standardised Ytstenut messages.
Required attributes:
version
The Ytstenut protocol version,
from-service
The ID of the application that sent this message;
required (see "Application/Service Identifier"),
to-service
The ID of the application that this message is for;
required (see "Application/Service Identifier"),
type
Message type. Standardised types have the prefix
ytstenut/ and are defined in the following
section. Custom command types are permitted, and must use a
suitable namespace prefix (other than ytstenut/).
Depending on the message purpose, additional attributes are used to
define the message payload; there are no standard child elements
defined by this specification, but custom child elements are allowed.
Standardised Message Types
ytstenut/command
A command sent from application A to application
B, to be executed directly by application B.
Required <ytstenut:message/> attributes:
capability
Capability on which the command is to operate, preferably
using one of the values defined in "Common capability classes",
activity
Activity to carry out, preferably using one of the values
defined in "Common activity classes",
time
Time of command dispatch with at least millisecond
precision, in standard XMPP format.
Additional attributes, preferably using those defined in "Common attributes", are used to further
qualify the capability and activity specified.
Command example
The following XML snippet tells some other application to start
playing a given video, starting three quarters of the way into its duration:
[Optional command data; binary data base64 encoded]
...
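A command payload of this kind could be assembled as in the Python sketch below. The namespace URI `urn:ytstenut:message`, the version string, the service IDs and the video URI are assumptions or placeholders; the timestamp helper assumes that "standard XMPP format" means the UTC DateTime profile of XEP-0082 (listed under External Resources), extended to millisecond precision.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

YTS_MESSAGE_NS = "urn:ytstenut:message"  # assumed namespace URI


def xmpp_timestamp(moment: datetime) -> str:
    """Format an aware datetime as a UTC DateTime with milliseconds."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{utc.microsecond // 1000:03d}Z"


def build_command(version, from_service, to_service,
                  capability, activity, sent_at, **extra):
    """Build a <ytstenut:message/> of type ytstenut/command."""
    attrib = {
        "version": version,
        "type": "ytstenut/command",
        "from-service": from_service,
        "to-service": to_service,
        "capability": capability,
        "activity": activity,
        "time": xmpp_timestamp(sent_at),
    }
    # Additional qualifying attributes, e.g. uri or progress.
    attrib.update({key: str(value) for key, value in extra.items()})
    return ET.Element(f"{{{YTS_MESSAGE_NS}}}message", attrib)


command = build_command(
    version="1.0",                                  # placeholder
    from_service="org.example.RemoteControl",       # hypothetical
    to_service="org.example.VideoPlayer",           # hypothetical
    capability="yts-caps-video",
    activity="yts-activity-playback",
    sent_at=datetime(2011, 2, 22, 12, 0, 0, 250000, tzinfo=timezone.utc),
    uri="http://example.com/video.ogv",             # placeholder URI
    progress=0.75,                                  # three quarters into the video
)
```

The resulting element would be carried inside an `<iq/>` stanza, as described under Instruction Messages.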
Error handling
When the resource to which the Ytstenut command pertains is
unavailable, the command recipient should return the error that
best describes the reason why:
<forbidden/>
The recipient does not have sufficient privileges to
carry out the command.
<item-not-found/>
The resource could not be located.
The type attribute of the
<error/> stanza should be set appropriately:
the modify value should be used if the recipient is
able to explore other sources for the same resource; the value
cancel is used to indicate that no further attempts
to execute this command should be made.
When handling errors of type modify, the sender must
explore each possible source no more than once. When all known
sources are exhausted, the initiating application should notify
the user that the command could not be executed.
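One reading of these rules, from the sender's point of view, is sketched below in Python. The function name and the `send` callback are hypothetical: `send(source)` stands for dispatching the command for a given source and returning `None` on success, or the error type (`"modify"` or `"cancel"`) on failure.

```python
from typing import Callable, Iterable, Optional


def dispatch_with_fallback(sources: Iterable[str],
                           send: Callable[[str], Optional[str]]) -> bool:
    """Try each known source at most once; return True on success."""
    tried = set()
    for source in sources:
        if source in tried:          # never explore the same source twice
            continue
        tried.add(source)
        error_type = send(source)
        if error_type is None:       # command executed
            return True
        if error_type == "cancel":   # no further attempts should be made
            return False
        # error_type == "modify": fall through and try the next source
    return False                     # all known sources exhausted
```

When the function returns `False`, the initiating application would notify the user that the command could not be executed.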
ytstenut/transfer
A request by application A to application
B to transfer B's activity to application
C.
Required <ytstenut:message/> attributes:
capability
Capability that is subject of the transfer,
jid
JID of the application to transfer to.
Additional attributes, preferably using those defined in "Common attributes", may be used to further
qualify the capability specified.
ytstenut/find
Remark (tf, Intel): Rationale: because of the need to
facilitate e2e encryption, commands cannot be proxied through
control applications; the find request allows
clients to initiate a transfer to an application that might
not yet be running on the target device.
Not allowing proxying of commands via intermediate applications
also significantly simplifies issues related to access control.
A request by an application A to a control application
C to identify a suitable application B to
dispatch a (subsequent) command to:
The criteria for the search are given by the supplied
attributes (e.g., application capability would be specified
using the capability attribute),
The search is limited to the service context the control
application is part of, or, in the case of the home cloud, the
device the control application is running on,
The control application returns the result of the search using
the jid attribute of the
<ytstenut:message/> payload.
Error handling
If no suitable running application matching the specified criteria
can be identified but a suitable application is available on the
system, the control application must return immediately with
status executing, then attempt to start a suitable
application.
If a suitable application does not exist on the system, the
control application must immediately return the error condition
<item-not-found/>.
When the spawned client application successfully starts up, the
control application returns the result of the search using the
jid attribute of the
<ytstenut:message/> payload. If the application
fails to start, the control application must dispatch a response
with status completed and an error; the error
condition should indicate why the application failed to start, if
that is known.
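The initial decision a control application makes on receiving a find request can be sketched as follows in Python. The `FindResult` type, the function name and the two lookup structures are hypothetical simplifications: the search criteria are reduced to a single capability, and the later steps (spawning the application, then reporting its JID or a start-up failure) are left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class FindResult:
    jid: Optional[str] = None      # set when a running application was found
    status: Optional[str] = None   # "executing" while an application is started
    error: Optional[str] = None    # XMPP error condition name


def handle_find(capability: str,
                running: Dict[str, str],
                installed: Set[str]) -> FindResult:
    """running maps capability -> JID; installed lists startable capabilities."""
    if capability in running:
        return FindResult(jid=running[capability])
    if capability in installed:
        # Reply immediately; the caller then attempts to start the application.
        return FindResult(status="executing")
    return FindResult(error="item-not-found")
```

A fuller implementation would follow an `executing` reply with a second response carrying either the JID of the spawned application or status `completed` with an error.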
Common metadata classes
The canonical definition is given in "Ytstenut XML Schemas";
the following information is extracted from the XML schemas for
convenience.
Common capability classes
yts-caps-control
control application,
yts-caps-audio
audio playback capabilities,
yts-caps-video
video playback capabilities,
yts-caps-image
image display capabilities,
yts-caps-html
html rendering capabilities,
yts-caps-antivirus
anti-virus capabilities,
Remark (tf, Intel): More standard definitions should be added here; open to
suggestions.
Custom capabilities can be defined, provided these are suitably
name-spaced with a custom prefix; custom capabilities must not use
the 'yts-' prefix.
Common activity classes
Absence of the 'activity' attribute, or its empty value, implies the idle
state.
yts-activity-playback
playback,
yts-activity-pause
paused state,
yts-activity-ffw
fast forward,
yts-activity-rwd
rewind,
yts-activity-scan
scan,
yts-activity-volume
volume adjustment.
Remark (tf, Intel): More standard definitions should be added here; open to suggestions.
Custom activities can be defined, provided these are suitably
name-spaced with a custom prefix; custom activities must not use the 'yts-'
prefix.
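The naming rule for capabilities and activities can be checked mechanically. In the Python sketch below, the sets contain only the standard names listed in this section, and the function name is hypothetical; because the specification does not define what a suitable custom prefix looks like, the check only enforces the reserved 'yts-' prefix.

```python
STANDARD_CAPABILITIES = {
    "yts-caps-control", "yts-caps-audio", "yts-caps-video",
    "yts-caps-image", "yts-caps-html", "yts-caps-antivirus",
}
STANDARD_ACTIVITIES = {
    "yts-activity-playback", "yts-activity-pause", "yts-activity-ffw",
    "yts-activity-rwd", "yts-activity-scan", "yts-activity-volume",
}
RESERVED_PREFIX = "yts-"


def is_acceptable_name(name: str, standard_names: set) -> bool:
    """Accept standard names, and custom names outside the reserved prefix."""
    if name in standard_names:
        return True
    return bool(name) and not name.startswith(RESERVED_PREFIX)
```

For example, a custom activity named `acme-activity-karaoke` (a made-up name) passes the check, while `yts-activity-karaoke` does not.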
Common attributes
protocol
urn identifying a suitable protocol through which
the resource on which to operate can be obtained (see "URNs for common resource fetching protocols"). Multiple protocols
can be listed as a space separated list, in descending order
of preference,
uri
uri of a resource associated with the activity,
uid
Universal id identifying resource associated with activity,
Remark (tf, Intel): The idea is being able to use something like, for example,
a MusicBrainz id to identify the resource, though in practice
this might be hard to extend beyond audio.
volume
volume level (floating point number from <0,1>),
progress
activity progress (floating point number from
<0,1>); this is the preferred way of passing information
such as stream position,
position
activity position (floating point number); NB: applications
should use the progress attribute whenever
possible instead of 'position',
description
human-readable description, suitable for
presentation to the user,
jid
XMPP id,
speed
speed of activity (floating point number; 1.0 indicates
normal speed).
Custom attributes can be defined, provided these are suitably
name-spaced with a custom prefix; custom attributes must not use the 'yts-'
prefix.
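Since the protocol attribute lists urns in descending order of preference, a recipient can simply pick the first entry it supports. The Python sketch below illustrates this; the function name is hypothetical, and the urns are formed from the 'urn:ytstenut:data:' prefix and the protocol ids defined in this document.

```python
from typing import Optional, Set


def choose_protocol(protocol_attr: str, supported: Set[str]) -> Optional[str]:
    """Return the first urn in the space separated list that is supported."""
    for urn in protocol_attr.split():   # list order encodes sender preference
        if urn in supported:
            return urn
    return None                         # no mutually supported protocol


offered = "urn:ytstenut:data:jingle:ft urn:ytstenut:data:si-file"
chosen = choose_protocol(offered, {"urn:ytstenut:data:si-file"})
```

Here the sender prefers Jingle File Transfer, but a recipient that only supports SI File Transfer falls back to the second entry.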
Data Transfer Protocols
This section defines standard data transfer protocols to be used by
Ytstenut clients; this list does not restrict clients to these protocols
alone, but sets out preferred protocols.
File Transfers
The preferred file transfer protocol is SI File
Transfer; this
protocol must be supported by all compliant Ytstenut clients for
which a file constitutes a meaningful data unit,
It is recommended that clients also implement Jingle
File Transfer;
this protocol is currently at the experimental stage, but once it
reaches the draft stage, it will be adopted as the default file
transfer protocol for Ytstenut clients.
Streaming
The preferred streaming protocol is XMPP Jingle RTP; applications that support media
streaming should implement this protocol.
URNs for common resource fetching protocols
This section codifies the urns to be used with the
uri attribute of Ytstenut commands to indicate how to
reach the resource, and when advertising application capabilities (see
"Application/Service Capabilities"). Each urn is formed
by combining the 'urn:ytstenut:data:' prefix with one of the
protocol ids defined below:
si-file
Resource can be obtained from initiating application using SI
File Transfer,
see XEP-0096.
jingle:ft
Resource can be obtained from initiating application using
XMPP Jingle File Transfer, see XEP-0234.
jingle:rtp
Resource can be obtained from initiating application using
XMPP Jingle RTP,
see XEP-0167.
Ytstenut XML Schemas
Schema for urn:ytstenut:status
Schema for urn:ytstenut:messages
External Resources
RFC 3920: Extensible Messaging and Presence Protocol (XMPP): Core. The Internet Engineering Task Force.
RFC 3921: Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence. The Internet Engineering Task Force.
RFC 2222: Simple Authentication and Security Layer (SASL). The Internet Engineering Task Force.
RFC 3923: End-to-End Signing and Object Encryption for the Extensible Messaging and Presence Protocol (XMPP). The Internet Engineering Task Force.
Dirk Meyer: Extended Personal Media Networks (XPMN). University of Bremen.
XEP-0004: Data Forms. XMPP Standards Foundation.
XEP-0030: Service Discovery. XMPP Standards Foundation.
XEP-0050: Ad-Hoc Commands. XMPP Standards Foundation.
XEP-0060: Publish-Subscribe. XMPP Standards Foundation.
XEP-0082: XMPP Date and Time Profiles. XMPP Standards Foundation.
XEP-0084: User Avatar. XMPP Standards Foundation.
XEP-0096: SI File Transfer. XMPP Standards Foundation.
XEP-0115: Entity Capabilities. XMPP Standards Foundation.
XEP-0163: Personal Eventing Protocol. XMPP Standards Foundation.
XEP-0166: Jingle. XMPP Standards Foundation.
XEP-0167: Jingle RTP Sessions. XMPP Standards Foundation.
XEP-0174: Serverless Messaging. XMPP Standards Foundation.
XEP-0234: Jingle File Transfer. XMPP Standards Foundation.
Jingle XTLS. XMPP Standards Foundation.
D-Bus Specification.