summaryrefslogtreecommitdiff
path: root/Specifications/shared-filemetadata-spec.mdwn
blob: dda35d9accedb3df6e348299353ebfa9e51fc8d6 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163


# Introduction

There are a number of metadata frameworks and indexers such as [[Beagle|http://beaglewiki.org/Main_Page]], [[Kat|http://kat.mandriva.com/]], [[Strigi|http://strigi.sf.net/]] and [[GLScube|http://glscube.org/]], as well as a new freedesktop system [[Tracker|http://www.gnome.org/~jamiemcc/tracker/]], which is based on this spec and is currently under development. These frameworks provide a rich source of metadata about files including such things as the author of a document or the artist of an mp3 file. The purpose of this specification is to define a common metadata naming scheme that each framework can implement to allow applications to tap into this wealth of information. Some examples of interested applications would include filemanagers that want to display and allow editing of this metadata as well as providing integrated search functionality and virtual folder capability (IE folders whose file contents are defined by metadata rather than physical location). This specification will define a common set of "well-known" metadata. 

Also worth reading is [[Apple's spotlight metadata attributes reference|http://developer.apple.com/documentation/Carbon/Reference/MetadataAttributesRef/Reference/CommonAttrs.html]] 

[[CommonExtendedAttributes|CommonExtendedAttributes]] describes common extended attributes that can be used by indexers when retrieving metadata from files. 


# Metadata

Metadata is usually defined as data about data. In our case the metadata describes data about files that is often user visible in file managers, office applications, document viewers and audio players. Metadata can typically be viewed or written by selecting "properties" from the file menu of one of these applications.  

Whilst there are some standards for naming document metadata like [[Dublin Core|http://dublincore.org/documents/dces/]], most desktop applications use a propriety set of metadata names. This specification will attempt to define a common set of metadata using a mixture of [[Dublin Core|http://dublincore.org/documents/dces/]], [[ID3|http://en.wikipedia.org/wiki/Id3]] for audio files, [[EXIF|http://en.wikipedia.org/wiki/EXIF]] for image files as well as application specific metadata names. The purpose of these common metadata names is not just for the benefit of metadata frameworks and search engines but also for standardising the display of metadata in all applications.  


## Metadata rules

The only requirement for metadata names is that they are unique and do not overload or cause confusion with each other. To make this possible, all metadata is namespaced by an appropriate class based on the type of the file or the application name (if the metadata is application specific). 

This specification only defines a common subset of all possible metadata and is not designed to limit what metadata any file can have nor does it provide any formal names for custom or non-standard metadata other than a namespace class.  

None of the metadata defined in this specification is mandatory and the existence of any metadata is dependent on the framework being used and the files being indexed.  

All metadata defined here may be used in search strings. 

Only metadata that is not derived from the file or file contents may be editable in the interface (applications that want to change non-writable metadata need to modify the embedded metadata in the file's contents themselves). 


## Metadata Data Types

Metadata typically comes in a variety of formats and types. In order to facilitate efficient storage and querying, we need to define a group of data types and formats that all metadata we are interested can conform to. The basic data types specified here are: 

* String  (variable length string - up to 16MB max in length) 
* Array of String (comma delimited list of strings) 
* Integer (signed 32 bit) 
* UInt    (unsigned 32 bit) 
* Int64   (signed 64 bit) 
* UInt64 (64 bit unsigned) 
* Float   (64 bit signed floating point number - system dependent double precision) 
* Datetime (ISO 8601 format with timezone IE `YYYY-MM-DDThh:mm:ss+xx:yy` EG "2006-04-01T12:30+01:00") 
Datetime values that contain no explicit timezone info are assumed to be in the user's timezone. 


## Metadata Namespaces

For all metadata, each metadata item needs to be namespaced with its class type using a "." qualifier (EG Audio.Artist represents the metadata Artist for an audio class file). Metadata that is strictly application specific should use a namespace class based on the application name (EG "Nautilus.Window_Geometry"). 

This specification defines the following built-in classes: 

* File 
* Audio 
* Doc 
* Image 

## Generic File Metadata

Generic file metadata is applicable to all files regardless of their format. The specified metadata uses a few [[Dublin Core|http://dublincore.org/documents/dces/]] based types where applicable with the rest being custom ones. Generic file metadata types are namespaced with the "File" class. Only some of the generic metadata may be writable. Custom metadata not listed below that is generic and applies to all files should also be namespaced with the "File" class unless it is strictly application specific. 
[[!table header="no" class="mointable" data="""
 ** Name    **  |  **   Type     **  |  **Writable**  |  **    Description** 
 `File.Name`  |  string  |  No  |  File name excluding path but including the file extension 
 `File.Path`   |  string  |  No  |    full file path of file excluding the filename 
 `File.Link`   |  string  |  No  |    Uri of link target 
 `File.Format`   |  string  |  No  |    Mime type of the file or if a directory it should contain value "Folder"
 `File.Size`   |  uint64  |   No  |  size of the file in bytes or if a directory no. of items it contains
 `File.Permissions`  |  string  |   No  |  Permission string in unix format eg "-rw-r--r--" 
 `File.Publisher`  |  string  |   Yes  |  Editable DC type for the name of the publisher of the file (EG dc:publisher field in RSS feed) 
 `File.Content`   |  string  |   No  |  File's contents filtered as plain text (IE as stored by the indexer) 
 `File.Description`  |  string  |    Yes  |  Editable free text/notes 
 `File.Keywords `  |  array of string  |    Yes  |   Editable array of keywords 
 `File.Rank `  |  integer  |    Yes  |   Editable file rank for grading favourites. Value should be in the range 1..10 
 `File.IconPath `  |  string  |    Yes  |   Editable file uri for a custom icon for the file 
 `File.SmallThumbnailPath`  |  string  |    Yes  |  Editable file uri for a small thumbnail of the file suitable for use in icon views 
 `File.LargeThumbnailPath`  |  string  |    Yes  |  Editable file uri for a larger thumbnail of the file suitable for previews 
 `File.Modified `  |  datetime  |   No  |   Last modified datetime 
 `File.Accessed `  |  datetime  |   No  |   Last access datetime 
"""]]


## Audio Metadata

Audio metadata is based on the widespread [[ID3.1|http://en.wikipedia.org/wiki/Id3]] tags embedded in mp3, ogg and similar files. These are already defined in that specification. All metadata in this section is prefixed with "Audio" and it is recommended that any other metadata not listed below also uses this prefix if its audio related (unless it is application specific).  
[[!table header="no" class="mointable" data="""
 ** Name    **  |  **   Type     **  |  **Writable**   |  **    Description** 
 `Audio.Title `  |  string  |  No  |  title of the track 
 `Audio.Artist`  |  string  |  No  |   artist of the track 
 `Audio.Album `  |  string  |  No  |    name of the album 
 `Audio.AlbumArtist`  |  string  |  No  |    artist of the album 
 `Audio.AlbumTrackCount`  |  integer  |  No  |   total no. of tracks on the album 
 `Audio.TrackNo`  |  integer  |  No  |   position of track on the album 
 `Audio.DiscNo`  |  integer  |  No  |   specifies which disc the track is on 
 `Audio.Performer`  |  string  |  No  |   Name of the performer/conductor of the music 
 `Audio.TrackGain`  |  float  |  No  |   gain adjustment of track 
 `Audio.TrackPeakGain`  |  float  |  No  |   peak gain adjustment of track 
 `Audio.AlbumGain`  |  float  |  No  |   gain adjustment of album 
 `Audio.AlbumPeakGain`  |  float  |  No  |   peak gain adjustment of album 
 `Audio.Duration`  |  integer  |  No  |   duration of track in seconds 
 `Audio.ReleaseDate`     |  Datetime  |  No  |   date track was released 
 `Audio.Comment`        |  string  |  No  |    comments on the track 
 `Audio.Genre `    |  string  |  No  |   type of music classification for the track as defined in ID3 spec 
 `Audio.Codec`        |  string  |  No  |   codec encoding description 
 `Audio.CodecVersion`        |  string  |  No  |   codec version 
 `Audio.Samplerate`        |  integer  |  No  |   samplerate in Hz 
 `Audio.Bitrate`        |  float  |  No  |   bitrate in kbps 
 `Audio.Channels`        |  integer  |  No  |  no. of channels in the audio (2 = stereo) 
 `Audio.LastPlay`     |  datetime  |  Yes  |   when track was last played  
 `Audio.PlayCount`        |  integer  |  Yes  |   No. of times the track has been played 
 `Audio.IsNew`        |  integer  |  Yes  |   set to "1" if track is new to the user. (default "0") 
 `Audio.MBAlbumID`        |  string  |  Yes  |   [[MusicBrainz|http://musicbrainz.org/]] album ID in UUID format 
 `Audio.MBArtistID`        |  string  |  Yes  |   [[MusicBrainz|http://musicbrainz.org/]] artist ID in UUID format 
 `Audio.MBAlbumArtistID`        |  string  |  Yes  |   [[MusicBrainz|http://musicbrainz.org/]] album artist ID in UUID format 
 `Audio.MBTrackID`        |  string  |  Yes  |   [[MusicBrainz|http://musicbrainz.org/]] track ID in UUID format 
 `Audio.Lyrics`        |  string  |  Yes  |   Lyrics of the track 
 `Audio.CoverAlbumThumbnailPath`  |  string  |  Yes  |  File path to thumbnail image of the cover album 
"""]]


## Document Metadata

For documents, applications have typically used a mixture of Dublin Core types and propriety types. In order to be consistent with them, we have based our metadata names likewise. We have also based these names on metadata names found in Open Office, Ms Word and PDF documents. All metadata in this section is prefixed with the "Doc" class and any other document based metadata should also have this prefix (unless it is application specific). All the metadata here is not editable through the interface as all of it is derived from the file contents. 
[[!table header="no" class="mointable" data="""
 ** Name    **  |  **   Type     **  |  **    Description** 
 `Doc.Title`   |  string  |   Title of document 
 `Doc.Subject`  |  string  |   document subject 
 `Doc.Author`   |  string  |   name of the author 
 `Doc.Keywords`   |  string   |  string of keywords
 `Doc.Comments`   |  string  |  user definable free text 
 `Doc.PageCount`   |  integer  |  no. of pages in document 
 `Doc.WordCount`  |  integer  |  total no. of chars in document 
 `Doc.Created `  |  datetime  |  datetime document was originally 
"""]]


## Image Metadata

For images, most support the [[EXIF|http://en.wikipedia.org/wiki/EXIF]] standard and so this largely makes up this specification. SVG files have user definable non-standard metadata so a subset of Dublin Core is also provided here. All metadata in this section is prefixed with the "Image" class and any other image based metadata should also have this prefix (unless it is application specific). All the metadata here is not editable through the interface as all of it is derived from the file contents. 
[[!table header="no" class="mointable" data="""
 ** Name    **  |  **   Type     **  |  **    Description** 
 `Image.Height`   |  integer  |  Height in pixels 
 `Image.Width`  |  integer  |  Width in pixels 
 `Image.Title`   |  string  |   Title of image 
 `Image.Album`   |  string  |   Name of an album the image belongs to 
 `Image.Date `  |  datetime  |  datetime image was originally created 
 `Image.Keywords`  |  string  |   string of keywords 
 `Image.Creator`   |  string  |   name of the author 
 `Image.Comments`   |  string  |  user definable free text 
 `Image.Description`   |  string  |  description of the image 
 `Image.Software`   |  string  |  software used to produce/enhance the image 
 `Image.CameraMake`   |  string  |   make of camera used to take the image 
 `Image.CameraModel`   |  string  |  model of camera used to take the image 
 `Image.Orientation`   |  string  |  represents the orientation  of the image wrt camera IE "top,left" or "bottom,right" 
 `Image.ExposureProgram`  |  string  |  The program used by the camera to set exposure when the picture is taken. EG Manual, Normal, Aperture priority etc 
 `Image.ExposureTime`  |  float  |  Exposure time used to capture the photo in seconds 
 `Image.Fnumber`  |  float  |  Diameter of the aperture relative to the effective focal length of the lens. 
 `Image.Flash`   |  integer  |  Set to 1 if flash was fired 
 `Image.FocalLength`  |  float  |  Focal length of lens in mm 
 `Image.ISOSpeed`  |  float  |  ISO speed used to acquire the document contents. For example, 100, 200, 400, etc 
 `Image.MeteringMode`  |  string  |  Metering mode used to acquire the image (IE Unknown, Average, [[CenterWeightedAverage|CenterWeightedAverage]], Spot, [[MultiSpot|MultiSpot]], Pattern, Partial) 
 `Image.WhiteBalance`  |  string  |  White balance setting of the camera when the picture was taken (auto or manual) 
 `Image.Copyright`  |  string  |  Embedded copyright message 
"""]]