Diffstat (limited to 'Distributions/AppStream/Attic/XapianIndexHOWTO.mdwn')
-rw-r--r-- | Distributions/AppStream/Attic/XapianIndexHOWTO.mdwn | 71 |
1 files changed, 71 insertions, 0 deletions
diff --git a/Distributions/AppStream/Attic/XapianIndexHOWTO.mdwn b/Distributions/AppStream/Attic/XapianIndexHOWTO.mdwn
new file mode 100644
index 00000000..86873cf6
--- /dev/null
+++ b/Distributions/AppStream/Attic/XapianIndexHOWTO.mdwn
@@ -0,0 +1,71 @@
+
+
+# Build yourself a Xapian index of package info
+
+
+## Run the Debian indexer on your distro
+
+The Debian Xapian indexer is called update-apt-xapian-index and normally it reads data from the Apt database. Luckily it also has an option (--pkgfile=_file_) for reading data from a plain file, which is used to build server-side indices and to build a test environment for its test suite. If you can generate a suitable input file, update-apt-xapian-index will build an index for you.
+
+The input file has the same format as the Debian Packages file, which is similar to email or HTTP headers:
+[[!format txt """
+Package: 2vcard
+Priority: optional
+Section: utils
+Installed-Size: 108
+Maintainer: Martin Albisetti <argentina@gmail.com>
+Architecture: all
+Version: 0.5-3
+Filename: pool/main/2/2vcard/2vcard_0.5-3_all.deb
+Size: 14300
+MD5sum: d831fd82a8605e9258b2314a7d703abe
+SHA1: e903a05f168a825ff84c87326898a182635f8175
+SHA256: 2be9a86f0ec99b1299880c6bf0f4da8257c74a61341c14c103b70c9ec04b10ec
+Description: perl script to convert an addressbook to VCARD file format
+ 2vcard is a little perl script that you can use to convert the
+ popular vcard file format. Currently 2vcard can only convert addressbooks
+ and alias files from the following formats: abook,eudora,juno,ldif,mutt,
+ mh and pine.
+ .
+ The VCARD format is used by gnomecard, for example, which is used by the
+ balsa email client.
+Tag: implemented-in::perl, role::program, use::converting
+
+Package: 3dchess
+[...]
+"""]]
+Records are separated with an empty line, and long fields like 'Description' use continuation lines that start with spaces. The first line of the description is the short description, the rest is the long description; an empty line in the Description is represented with a dot.
+
+For update-apt-xapian-index you only need the fields **Package**, **Version**, **Description**, **Tag**, Section, Installed-Size and Size. Tag, Section, Installed-Size and Size are all optional, although you probably want Tag for Debtags categories.
+
+If you want to start playing with the indexer without building your own input file, you can run `apt-cache dumpavail` on any Debian or Ubuntu system to extract the whole system dataset. Alternatively, you can use any [[Packages|http://ftp.debian.org/debian/dists/sid/main/binary-amd64/Packages.gz]] file from a Debian mirror.
+
+**Dependencies**:
+
+* python-xapian (Python bindings for Xapian)
+* [[python-debian|http://packages.debian.org/sid/python-debian]] (used to read some Debian-style files, source is straightforward to build)
+* python-chardet, dependency of python-debian, available in Fedora, Mandriva/Mageia and SUSE with the same name
+**Building the index**:
+[[!format txt """
+git clone git://git.debian.org/git/collab-maint/apt-xapian-index.git
+cd apt-xapian-index
+
+# Testrun is just a simple wrapper that exports the variables needed
+# to run the indexer in the current directory
+./testrun --pkgfile=inputfile --force --verbose # Creates an index in testdb/
+
+# Try querying it with Xapian's low-level "delve" tool, to see if it worked:
+delve -1 -d -t edit testdb/index
+"""]]
+The Xapian index itself is in testdb/index; testdb/ will contain other information about the index, including an autogenerated README file documenting its contents, especially the [[term prefixes|http://xapian.org/docs/omega/termprefixes.html]] used by the index.
+
+Congratulations: you can now try querying the index. The Xapian website has documentation and examples for [[C++|http://xapian.org/docs/quickstart.html]] and [[Python, Perl, PHP, Ruby, C#, Java and more bindings|http://xapian.org/docs/bindings/]].
+
+Patches welcome for alternative input file formats and extra plugins to index extra info you may need. Please update this page with your experience if you try it.
+
+**Possible things to try**:
+
+* Change DEBTAGSDB in plugins/debtags.py to make it read Debtags information from one of the [[distromatch exports|http://www.enricozini.org/2011/debian/distromatch/]] so you don't need to add them as Tag: fields
+* Get [[pkgshelf|http://www.enricozini.org/2011/debian/pkgshelf/]] to work (it should only need `export AXI_DB_PATH=testdb` and editing `pkgshelf/__init__.py` to replace `/var/lib/debtags/package-tags` with the location of your distromatch Debtags export).
+* If you need to index some extra information, take a look at plugins/template.py for a plugin template: you only need to redefine the method indexDeb822.
+* Build an indexer that reads the native package database for your own distribution, then get in touch with Enrico to see if it can all fit in the same codebase.
\ No newline at end of file
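
Reviewer note: the input format the HOWTO describes (records separated by an empty line, continuation lines starting with a space, a lone dot standing for an empty Description line) is simple enough to check before handing a file to the indexer. The following is a minimal sketch in plain Python, not the parser apt-xapian-index itself uses (the page says that job is done with python-debian). The names `parse_records`, `split_description`, `REQUIRED` and `SAMPLE` are made up for illustration, the 2vcard record is abridged from the example in the page, and the second record is invented.

```python
import io

# Fields the HOWTO lists as non-optional for update-apt-xapian-index.
REQUIRED = ("Package", "Version", "Description")

# Sample input: the first record is abridged from the HOWTO's example,
# the second record is entirely made up for this sketch.
SAMPLE = """\
Package: 2vcard
Version: 0.5-3
Description: perl script to convert an addressbook to VCARD file format
 2vcard is a little perl script that you can use to convert the
 popular vcard file format.
 .
 The VCARD format is used by gnomecard, for example, which is used by the
 balsa email client.
Tag: implemented-in::perl, role::program, use::converting

Package: hypothetical-pkg
Version: 1.0
Description: made-up second record
"""


def parse_records(lines):
    """Yield one dict per record from Packages-style text."""
    record, last = {}, None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # An empty line ends the current record.
            if record:
                yield record
            record, last = {}, None
        elif line[0] in " \t" and last is not None:
            # Continuation line: a lone dot stands for an empty line.
            text = line[1:]
            record[last] += "\n" + ("" if text.strip() == "." else text)
        else:
            # "Name: value" starts a new field.
            name, _, value = line.partition(":")
            last = name.strip()
            record[last] = value.strip()
    if record:
        yield record


def split_description(record):
    """Return (short description, long description) for a record."""
    short, _, long_text = record.get("Description", "").partition("\n")
    return short, long_text


if __name__ == "__main__":
    for rec in parse_records(io.StringIO(SAMPLE)):
        missing = [name for name in REQUIRED if name not in rec]
        short, _ = split_description(rec)
        print(rec.get("Package"), rec.get("Version"), "-", short, "| missing:", missing)
```

Running a generated input file through something like this and looking at the `missing` list is a cheap way to catch malformed records before starting `./testrun --pkgfile=inputfile`.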