README Swahili Myspell Dictionary
Release 1.1 2005-08-17
1. Intro
Swahili Myspell Dictionary, compiled by Alberto Escudero-Pascual (aep@it46.se)
http://www.it46.se
2. Word List Sources
The word lists were compiled from the following resources:
- Dr. Jason M. Githeko (githeko at egerton.ac.ke)
Egerton University, Njoro, Kenya
http://www.egerton.ac.ke/ict/kiswa.php
(48340 words)
- Prof. D.P.B. Massamba, Prof. A.M. Khamisi et al.
TUKI English-Swahili Dictionary
(18327 words)
- Dr. Martin Benjamin et al. (swahili at yale.edu)
The Kamusi Project
http://www.yale.edu/swahili/
(15418 words)
- Dr. Kevin P. Scannell (scannell at slu.edu)
Corpus building for minority languages
http://borel.slu.edu/crubadan/
(+8008 additional words)
Total words: 67901
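The per-source counts above add up to more than the total, presumably
because the lists overlap. As a rough sketch (not the tooling actually
used for this release), merging several word lists into a MySpell .dic
file could look like the Python below. It assumes each source is already
one word per line, preserves case, attaches no affix flags, and follows
the usual .dic layout: a word count on the first line, then one word per
line. The function names merge_wordlists and build_dic are hypothetical.

```python
from typing import Iterable


def merge_wordlists(wordlists: Iterable[Iterable[str]]) -> list[str]:
    """Merge word lists into one sorted list without duplicates.

    Hypothetical helper: whitespace is stripped, blank entries are skipped,
    and case is preserved (so "Jambo" and "jambo" stay distinct).
    """
    unique: set[str] = set()
    for wordlist in wordlists:
        for word in wordlist:
            word = word.strip()
            if word:
                unique.add(word)
    return sorted(unique)


def build_dic(wordlists: Iterable[Iterable[str]]) -> str:
    """Render a MySpell-style .dic body: word count first, then one word per line.

    No affix flags are attached, since no .aff rules are assumed here.
    """
    words = merge_wordlists(wordlists)
    return f"{len(words)}\n" + "".join(word + "\n" for word in words)


# Usage: three small overlapping sources (7 entries, 5 unique words).
sources = [
    ["jambo", "habari", "asante"],
    ["habari", "karibu"],
    ["asante", "rafiki"],
]
print(build_dic(sources), end="")
```

A real build would also need the companion .aff file, which declares the
character set (SET) that the .dic file is written in.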
The following people also contributed programming work to the
Jambo Spellchecker: Dwayne Bailey, Louise Berthilson, Iñaki
Cívico Campos, Alberto Escudero-Pascual and Fredrik Lilieblad.
3. Licence
The Jambo Spellchecker is released as free software (LGPL).
4. Final Notes
- Kamusi Project wordlist:
The Kamusi Project is an ongoing work of collaborative
scholarship that is developing a free online dictionary and
learning resources for Swahili. Established in 1994, it is the
world's most-used resource for the Swahili language and the
first result returned for "Swahili" by most Internet search
engines; see http://www.yale.edu/swahili/ for more
information.
- An Crúbadán:
The Swahili word list was improved with the help of Kevin
Scannell's software An Crúbadán, a web crawler that targets
minority languages and languages with limited computational
resources.
In December 2004, the web crawler crawled more than 6600 online
Swahili documents and collected about 10 million words (not
unique).
The goal of An Crúbadán is to develop language technology for
as many languages as possible by applying statistical
techniques to the vast quantities of text freely available on
the web. Text corpora have been created for nearly 200
languages so far, and these data are available for use by open
source projects; see http://borel.slu.edu/crubadan/ for more
information.
5. TODO
Work on the .aff (affix rules) file