author Jehan <jehan@girinstud.io> 2016-09-28 22:11:19 +0200
committer Jehan <jehan@girinstud.io> 2016-09-28 22:13:17 +0200
commit d62154bd6ed1eaeca2e40f36673a3e32acd445d7
tree 1eca008cffe5e976bd22207dc2c1eab7c9a87938
parent fbd2efdbe918ec18ec79a3b2e0064b2247393cd0
LangModels: add Slovene support.
Encodings: ISO-8859-2, ISO-8859-16, Windows-1250, IBM852 and MAC-CENTRALEUROPE. Test text from https://sl.wikipedia.org/wiki/Naseljivi_planet
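
The legacy-encoded test files can in principle be derived from the UTF-8 sample. The following is a minimal sketch, not necessarily how the files in this commit were produced. It assumes that Python's standard codecs `iso8859_2`, `iso8859_16`, `cp1250`, `cp852` and `mac_latin2` correspond to the five charsets listed above; the names `CODECS` and `encode_samples` are hypothetical.

```python
# Hypothetical helper: re-encode a UTF-8 sample into the five legacy charsets.
# The mapping from test file name to Python codec name is an assumption.
CODECS = {
    "iso-8859-2.txt": "iso8859_2",
    "iso-8859-16.txt": "iso8859_16",
    "windows-1250.txt": "cp1250",
    "ibm852.txt": "cp852",
    "mac-centraleurope.txt": "mac_latin2",
}


def encode_samples(text):
    """Return {file name: encoded bytes} for every target charset.

    Raises UnicodeEncodeError if a character is missing from a charset.
    """
    return {name: text.encode(codec) for name, codec in CODECS.items()}


if __name__ == "__main__":
    sample = "zmožen razviti in ohranjati življenje"
    for name, data in encode_samples(sample).items():
        print(name, data)
```

Encoding strictly (the default) is deliberate here: a sample that silently dropped characters would no longer exercise the non-ASCII letters the detector relies on.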
-rw-r--r-- README.md 6
-rw-r--r-- script/BuildLangModelLogs/LangSloveneModel.log 148
-rw-r--r-- script/langs/sl.py 59
-rw-r--r-- src/CMakeLists.txt 1
-rw-r--r-- src/LangModels/LangSloveneModel.cpp 259
-rw-r--r-- src/nsSBCSGroupProber.cpp 6
-rw-r--r-- src/nsSBCSGroupProber.h 2
-rw-r--r-- src/nsSBCharSetProber.h 6
-rw-r--r-- test/sl/ibm852.txt 9
-rw-r--r-- test/sl/iso-8859-16.txt 9
-rw-r--r-- test/sl/iso-8859-2.txt 9
-rw-r--r-- test/sl/mac-centraleurope.txt 9
-rw-r--r-- test/sl/utf-8.txt 9
-rw-r--r-- test/sl/windows-1250.txt 9
14 files changed, 540 insertions, 1 deletions
diff --git a/README.md b/README.md
index 1b54e4a..b4951dc 100644
--- a/README.md
+++ b/README.md
@@ -132,6 +132,12 @@ Techniques used by universalchardet are described at http://www.mozilla.org/proj
* ISO-8859-2
* IBM852
* MAC-CENTRALEUROPE
+ * Slovene
+ * ISO-8859-2
+ * ISO-8859-16
+ * Windows-1250
+ * IBM852
+ * MAC-CENTRALEUROPE
* Spanish
* ISO-8859-1
* ISO-8859-15
diff --git a/script/BuildLangModelLogs/LangSloveneModel.log b/script/BuildLangModelLogs/LangSloveneModel.log
new file mode 100644
index 0000000..e494190
--- /dev/null
+++ b/script/BuildLangModelLogs/LangSloveneModel.log
@@ -0,0 +1,148 @@
+= Logs of language model for Slovene (sl) =
+
+- Generated by BuildLangModel.py
+- Started: 2016-09-28 22:00:35.243966
+- Maximum depth: 5
+- Max number of pages: 100
+
+== Parsed pages ==
+
+XCOM: Enemy Unknown (revision 4704271)
+1UP.com (revision 4547348)
+2K Games (revision 4110089)
+Android (operacijski sistem) (revision 4619359)
+Animator videoigre (revision 4702643)
+App Store (revision 3903089)
+Artefakt (revision 4484504)
+Athlon (revision 4524746)
+Avstralazija (revision 4623530)
+Avtopsija (revision 4541344)
+Bralno-pisalni pomnilnik (revision 4256388)
+Civilization (serija) (revision 4645770)
+Deus Ex: Human Revolution (revision 4694860)
+Digitalna distribucija (revision 4696215)
+DirectX (revision 4477913)
+Dishonored (revision 4619444)
+Edge (magazine) (revision 4690049)
+Electronic Entertainment Expo (revision 4538691)
+Enoigralska videoigra (revision 4610359)
+Eurogamer (revision 4694860)
+Evropa (revision 4687833)
+Fantasy Flight Games (revision 4649361)
+Firaxis Games (revision 4110089)
+GameRankings (revision 3934020)
+GameSpot (revision 4238015)
+GameSpy (revision 4538691)
+GameTrailers (revision 4704271)
+Game Informer (revision 4704271)
+GamesTM (revision 4704271)
+Grafična kartica (revision 4257980)
+Granata (revision 3859332)
+Holograf (revision 4477482)
+IGN (revision 4576233)
+IOS (revision 4597264)
+Igra igranja vlog (revision 4642276)
+Igra na deski (revision 4649363)
+Igralna konzola (revision 4649866)
+Igralni pogon (revision 4622773)
+Intel (revision 4626025)
+International Standard Book Number (revision 4015087)
+Izdelovalec videoigre (revision 3851747)
+Joker (revija) (revision 3867772)
+Kotaku (revision 4613535)
+Kristal (revision 4156234)
+Linux (revision 4524740)
+Lovec prestreznik (revision 4102792)
+MTV (revision 4621758)
+Mac OS X (revision 4601645)
+Machinima (revision 4601716)
+Major (revision 4245802)
+Mednarodna različica (revision 4116054)
+Metacritic (revision 3934020)
+Michael McCann (skladatelj) (revision 4694860)
+MicroProse (revision 4382810)
+Microsoft Windows (revision 4691357)
+Nezemeljsko življenje (revision 4620576)
+NowGamer (revision 4704271)
+OS X (revision 4601645)
+Ognjena ekipa (revision 4694450)
+Operacijski sistem (revision 4698515)
+Ostrostrelec (revision 4529694)
+Pilot (revision 4069093)
+PlayStation 3 (revision 4382944)
+PlayStation Network (revision 4382944)
+PlayStation Vita (revision 3944025)
+Pogon igre (revision 4622773)
+Procesor (revision 4702518)
+Producent videoiger (revision 4599904)
+Razvijalec videoiger (revision 4093281)
+Računalniška miška (revision 4385579)
+Računalniška platforma (revision 4673669)
+Severna Amerika (revision 4643798)
+Sid Meier (revision 4061487)
+Stealth (revision 4618630)
+Steam (revision 4696215)
+Strateška videoigra (revision 4236795)
+Tablični računalnik (revision 4409985)
+Take-Two Interactive (revision 4110089)
+Telepatija (revision 4481192)
+The Bureau: XCOM Declassified (revision 4704271)
+The Guardian (revision 3929479)
+Trdi disk (revision 4644623)
+UFO: Enemy Unknown (revision 4704271)
+Unreal Engine (revision 4622773)
+Unreal Engine 3 (revision 4622773)
+Uporabniški vmesnik (revision 4552473)
+Valve Corporation (revision 4110105)
+Večigralska videoigra (revision 4618639)
+VideoGamer.com (revision 4704271)
+Vohunski satelit (revision 4215166)
+Vojaška taktika (revision 3970259)
+Vojaški čini (revision 4363026)
+
+== End of Parsed pages ==
+
+- Wikipedia parsing ended at: 2016-09-28 22:06:46.133919
+
+41 characters appeared 411226 times.
+
+First 29 characters:
+[ 0] Char a: 10.090315301075321 %
+[ 1] Char e: 9.90477255815537 %
+[ 2] Char i: 9.666703953543793 %
+[ 3] Char o: 9.177921629468953 %
+[ 4] Char n: 7.28309980400072 %
+[ 5] Char r: 5.808241696779873 %
+[ 6] Char s: 4.575586174025961 %
+[ 7] Char t: 4.4963110309173056 %
+[ 8] Char j: 4.343840126840229 %
+[ 9] Char l: 4.2672399118732764 %
+[10] Char v: 3.802775116359374 %
+[11] Char p: 3.5216644861949393 %
+[12] Char k: 3.5136397017698293 %
+[13] Char d: 3.0387183689747244 %
+[14] Char m: 2.9487435132992563 %
+[15] Char z: 2.350775485985808 %
+[16] Char u: 1.9719083910064055 %
+[17] Char g: 1.9342162217369525 %
+[18] Char b: 1.5392995579073308 %
+[19] Char c: 1.2924766430138173 %
+[20] Char h: 1.1864522184881305 %
+[21] Char č: 1.137087635509428 %
+[22] Char š: 0.6932927392723223 %
+[23] Char ž: 0.45303555709026183 %
+[24] Char f: 0.40707542811009034 %
+[25] Char x: 0.19381070263067024 %
+[26] Char y: 0.19040624863213904 %
+[27] Char w: 0.18919037220409216 %
+[28] Char q: 0.011186063138031156 %
+
+The first 29 characters have an accumulated ratio of 0.9998978663800442.
+
+727 sequences found.
+
+First 512 (typical positive ratio): 0.9983524317161332
+Next 512 (512-1024): 2.4317528560937295e-06
+Rest: -3.859759734048396e-17
+
+- Processing end: 2016-09-28 22:06:46.601266
diff --git a/script/langs/sl.py b/script/langs/sl.py
new file mode 100644
index 0000000..bf02bf8
--- /dev/null
+++ b/script/langs/sl.py
@@ -0,0 +1,59 @@
+#!/bin/python3
+# -*- coding: utf-8 -*-
+
+# ##### BEGIN LICENSE BLOCK #####
+# Version: MPL 1.1/GPL 2.0/LGPL 2.1
+#
+# The contents of this file are subject to the Mozilla Public License Version
+# 1.1 (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+# http://www.mozilla.org/MPL/
+#
+# Software distributed under the License is distributed on an "AS IS" basis,
+# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
+# for the specific language governing rights and limitations under the
+# License.
+#
+# The Original Code is Mozilla Universal charset detector code.
+#
+# The Initial Developer of the Original Code is
+# Netscape Communications Corporation.
+# Portions created by the Initial Developer are Copyright (C) 2001
+# the Initial Developer. All Rights Reserved.
+#
+# Contributor(s):
+# Jehan <jehan@girinstud.io>
+#
+# Alternatively, the contents of this file may be used under the terms of
+# either the GNU General Public License Version 2 or later (the "GPL"), or
+# the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
+# in which case the provisions of the GPL or the LGPL are applicable instead
+# of those above. If you wish to allow use of your version of this file only
+# under the terms of either the GPL or the LGPL, and not to allow others to
+# use your version of this file under the terms of the MPL, indicate your
+# decision by deleting the provisions above and replace them with the notice
+# and other provisions required by the GPL or the LGPL. If you do not delete
+# the provisions above, a recipient may use your version of this file under
+# the terms of any one of the MPL, the GPL or the LGPL.
+#
+# ##### END LICENSE BLOCK #####
+
+import re
+
+## Mandatory Properties ##
+
+name = 'Slovene'
+code = 'sl'
+use_ascii = True
+charsets = ['ISO-8859-2', 'ISO-8859-16',
+ 'Windows-1250', 'IBM852', 'MAC-CENTRALEUROPE']
+
+## Optional Properties ##
+
+# Alphabet characters.
+alphabet = 'čšž'
+# The starred page which was rewarded on the main page when I created
+# the data.
+start_pages = ['XCOM: Enemy Unknown']
+wikipedia_code = code
+case_mapping = True
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 2525ec6..67e76b1 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -30,6 +30,7 @@ set(
LangModels/LangRomanianModel.cpp
LangModels/LangRussianModel.cpp
LangModels/LangSlovakModel.cpp
+ LangModels/LangSloveneModel.cpp
LangModels/LangSpanishModel.cpp
LangModels/LangThaiModel.cpp
LangModels/LangTurkishModel.cpp
diff --git a/src/LangModels/LangSloveneModel.cpp b/src/LangModels/LangSloveneModel.cpp
new file mode 100644
index 0000000..da28d86
--- /dev/null
+++ b/src/LangModels/LangSloveneModel.cpp
@@ -0,0 +1,259 @@
+/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
+/* ***** BEGIN LICENSE BLOCK *****
+ * Version: MPL 1.1/GPL 2.0/LGPL 2.1
+ *
+ * The contents of this file are subject to the Mozilla Public License Version
+ * 1.1 (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * http://www.mozilla.org/MPL/
+ *
+ * Software distributed under the License is distributed on an "AS IS" basis,
+ * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
+ * for the specific language governing rights and limitations under the
+ * License.
+ *
+ * The Original Code is Mozilla Communicator client code.
+ *
+ * The Initial Developer of the Original Code is
+ * Netscape Communications Corporation.
+ * Portions created by the Initial Developer are Copyright (C) 1998
+ * the Initial Developer. All Rights Reserved.
+ *
+ * Contributor(s):
+ *
+ * Alternatively, the contents of this file may be used under the terms of
+ * either the GNU General Public License Version 2 or later (the "GPL"), or
+ * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
+ * in which case the provisions of the GPL or the LGPL are applicable instead
+ * of those above. If you wish to allow use of your version of this file only
+ * under the terms of either the GPL or the LGPL, and not to allow others to
+ * use your version of this file under the terms of the MPL, indicate your
+ * decision by deleting the provisions above and replace them with the notice
+ * and other provisions required by the GPL or the LGPL. If you do not delete
+ * the provisions above, a recipient may use your version of this file under
+ * the terms of any one of the MPL, the GPL or the LGPL.
+ *
+ * ***** END LICENSE BLOCK ***** */
+
+#include "../nsSBCharSetProber.h"
+
+/********* Language model for: Slovene *********/
+
+/**
+ * Generated by BuildLangModel.py
+ * On: 2016-09-28 22:06:46.134717
+ **/
+
+/* Character Mapping Table:
+ * ILL: illegal character.
+ * CTR: control character specific to the charset.
+ * RET: carriage/return.
+ * SYM: symbol (punctuation) that does not belong to word.
+ * NUM: 0 - 9.
+ *
+ * Other characters are ordered by probabilities
+ * (0 is the most common character in the language).
+ *
+ * Orders are generic to a language. So the codepoint with order X in
+ * CHARSET1 maps to the same character as the codepoint with the same
+ * order X in CHARSET2 for the same language.
+ * As such, it is possible to get missing order. For instance the
+ * ligature of 'o' and 'e' exists in ISO-8859-15 but not in ISO-8859-1
+ * even though they are both used for French. Same for the euro sign.
+ */
+static const unsigned char Iso_8859_2_CharToOrderMap[] =
+{
+ CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
+ CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
+ SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
+ NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
+ SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 4X */
+ 11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,SYM, /* 5X */
+ SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 6X */
+ 11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,CTR, /* 7X */
+ CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
+ CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
+ SYM, 41,SYM, 42,SYM, 43, 44,SYM,SYM, 22, 45, 46, 47,SYM, 23, 48, /* AX */
+ SYM, 49,SYM, 50,SYM, 51, 52,SYM,SYM, 22, 53, 54, 55,SYM, 23, 56, /* BX */
+ 57, 32, 58, 59, 60, 61, 37, 34, 21, 29, 62, 36, 63, 30, 64, 65, /* CX */
+ 66, 67, 68, 31, 35, 69, 70,SYM, 71, 72, 39, 73, 74, 40, 75, 76, /* DX */
+ 77, 32, 78, 79, 80, 81, 37, 34, 21, 29, 82, 36, 83, 30, 84, 85, /* EX */
+ 86, 87, 88, 31, 35, 89, 90,SYM, 91, 92, 39, 93, 94, 40, 95,SYM, /* FX */
+};
+/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
+
+static const unsigned char Iso_8859_16_CharToOrderMap[] =
+{
+ CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
+ CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
+ SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
+ NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
+ SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 4X */
+ 11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,SYM, /* 5X */
+ SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 6X */
+ 11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,CTR, /* 7X */
+ CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 8X */
+ CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 9X */
+ SYM, 96, 97, 98,SYM,SYM, 22,SYM, 22,SYM, 99,SYM,100,SYM,101,102, /* AX */
+ SYM,SYM, 21,103, 23,SYM,SYM,SYM, 23, 21,104,SYM,105,106,107,108, /* BX */
+ 109, 32,110,111,112, 37,113, 34,114, 29, 33, 36,115, 30,116,117, /* CX */
+ 118,119,120, 31, 35,121,122,123,124,125, 39,126,127,128,129,130, /* DX */
+ 131, 32,132,133,134, 37,135, 34,136, 29, 33, 36,137, 30,138,139, /* EX */
+ 140,141,142, 31, 35,143,144,145,146,147, 39,148,149,150,151,152, /* FX */
+};
+/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
+
+static const unsigned char Windows_1250_CharToOrderMap[] =
+{
+ CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
+ CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
+ SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
+ NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
+ SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 4X */
+ 11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,SYM, /* 5X */
+ SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 6X */
+ 11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,CTR, /* 7X */
+ SYM,ILL,SYM,ILL,SYM,SYM,SYM,SYM,ILL,SYM, 22,SYM,153,154, 23,155, /* 8X */
+ ILL,SYM,SYM,SYM,SYM,SYM,SYM,SYM,ILL,SYM, 22,SYM,156,157, 23,158, /* 9X */
+ SYM,SYM,SYM,159,SYM,160,SYM,SYM,SYM,SYM,161,SYM,SYM,SYM,SYM,162, /* AX */
+ SYM,SYM,SYM,163,SYM,SYM,SYM,SYM,SYM,164,165,SYM,166,SYM,167,168, /* BX */
+ 169, 32,170,171,172,173, 37, 34, 21, 29,174, 36,175, 30,176,177, /* CX */
+ 178,179,180, 31, 35,181,182,SYM,183,184, 39,185,186, 40,187,188, /* DX */
+ 189, 32,190,191,192,193, 37, 34, 21, 29,194, 36,195, 30,196,197, /* EX */
+ 198,199,200, 31, 35,201,202,SYM,203,204, 39,205,206, 40,207,SYM, /* FX */
+};
+/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
+
+static const unsigned char Mac_Centraleurope_CharToOrderMap[] =
+{
+ CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
+ CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
+ SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
+ NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
+ SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 4X */
+ 11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,SYM, /* 5X */
+ SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 6X */
+ 11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,CTR, /* 7X */
+ 208,209,210, 29,211,212,213, 32,214, 21,215, 21, 37, 37, 29,216, /* 8X */
+ 217,218, 30,219, 38, 38,220, 31,221, 35,222,223, 39,224,225,226, /* 9X */
+ SYM,SYM,227,SYM,SYM,SYM,SYM,228,SYM,SYM,SYM,229,SYM,SYM,230,231, /* AX */
+ 232,233,SYM,SYM,234,235,SYM,SYM,236,237,238,239,240,241,242,243, /* BX */
+ 244,245,SYM,SYM,246,247,SYM,SYM,SYM,SYM,SYM,248,249,249,249,249, /* CX */
+ SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,249,249,249,249,SYM,SYM,249,249, /* DX */
+ 249, 22,SYM,SYM, 22,249,249, 32,249,249, 30, 23, 23,249, 31, 35, /* EX */
+ 249,249, 39,249,249,249,249,249, 40, 40,249,249,249,249,249,SYM, /* FX */
+};
+/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
+
+static const unsigned char Ibm852_CharToOrderMap[] =
+{
+ CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,RET,CTR,CTR,RET,CTR,CTR, /* 0X */
+ CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR,CTR, /* 1X */
+ SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* 2X */
+ NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,NUM,SYM,SYM,SYM,SYM,SYM,SYM, /* 3X */
+ SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 4X */
+ 11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,SYM, /* 5X */
+ SYM, 0, 18, 19, 13, 1, 24, 17, 20, 2, 8, 12, 9, 14, 4, 3, /* 6X */
+ 11, 28, 5, 6, 7, 16, 10, 27, 25, 26, 15,SYM,SYM,SYM,SYM,CTR, /* 7X */
+ 34,249, 29,249,249,249, 37, 34,249, 36,249,249,249,249,249, 37, /* 8X */
+ 29,249,249, 35,249,249,249,249,249,249,249,249,249,249,SYM, 21, /* 9X */
+ 32, 30, 31, 39,249,249, 23, 23,249,249,SYM,249, 21,249,SYM,SYM, /* AX */
+ SYM,SYM,SYM,SYM,SYM, 32,249,249,249,SYM,SYM,SYM,SYM,249,249,SYM, /* BX */
+ SYM,SYM,SYM,SYM,SYM,SYM,249,249,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM, /* CX */
+ 249,249,249, 36,249,249, 30,249,249,SYM,SYM,SYM,SYM,249,249,SYM, /* DX */
+ 31,249, 35,249,249,249, 22, 22,249, 39,249,249, 40, 40,249,SYM, /* EX */
+ SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,SYM,249,249,249,SYM,SYM, /* FX */
+};
+/*X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 XA XB XC XD XE XF */
+
+
+/* Model Table:
+ * Total sequences: 727
+ * First 512 sequences: 0.9983524317161332
+ * Next 512 sequences (512-1024): 0.0016475682838668457
+ * Rest: -3.859759734048396e-17
+ * Negative sequences: TODO
+ */
+static const PRUint8 SloveneLangModel[] =
+{
+ 2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,
+ 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,
+ 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,2,0,
+ 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,
+ 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,2,3,2,2,
+ 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,3,2,2,
+ 3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,2,3,2,3,3,3,2,0,0,3,2,3,3,2,
+ 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,0,0,0,3,2,3,3,0,
+ 3,3,3,3,3,2,3,3,0,0,3,3,3,3,3,2,3,2,3,3,3,2,3,0,0,0,0,0,0,
+ 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,2,3,2,3,3,2,3,2,0,
+ 3,3,3,3,3,3,3,3,3,3,0,3,3,3,3,3,3,3,2,3,3,3,3,2,2,2,2,0,0,
+ 3,3,3,3,3,3,3,3,2,3,0,3,3,3,2,2,3,3,3,3,3,2,2,0,0,0,3,2,2,
+ 3,3,3,3,3,3,3,3,3,3,3,0,2,3,3,2,3,0,2,3,3,0,3,0,2,0,3,2,0,
+ 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,2,3,2,2,3,2,0,
+ 3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,0,3,2,3,3,2,2,2,0,2,2,3,2,0,
+ 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,2,3,2,0,2,0,0,0,
+ 3,3,3,2,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,3,3,3,2,0,0,
+ 3,3,3,3,3,3,3,2,0,3,3,3,2,2,2,0,3,2,3,2,3,0,0,0,2,2,2,2,0,
+ 3,3,3,3,3,3,3,3,3,3,3,2,2,3,3,0,3,0,2,2,0,3,3,2,2,0,3,0,0,
+ 3,3,3,3,3,3,3,3,0,3,2,3,3,3,2,2,3,2,2,3,3,0,0,0,2,2,3,2,2,
+ 3,3,3,3,3,3,2,3,0,3,3,3,3,2,2,2,3,0,2,0,0,2,0,0,2,0,2,2,0,
+ 3,3,3,3,3,3,0,0,3,3,2,2,3,2,0,0,3,0,2,2,0,0,2,0,0,0,0,0,0,
+ 3,3,3,3,3,2,0,3,3,3,2,3,3,0,0,0,3,0,0,0,0,3,0,2,0,0,0,0,0,
+ 3,3,3,2,3,2,0,2,3,3,2,0,3,0,0,0,3,2,3,2,0,0,0,2,0,0,0,0,0,
+ 3,3,3,3,2,3,3,3,0,3,0,0,0,2,2,0,3,2,0,2,2,0,0,0,3,2,2,2,0,
+ 3,3,3,3,2,2,2,3,0,0,2,3,0,2,2,0,3,2,3,3,2,0,0,0,2,2,2,2,0,
+ 3,3,2,3,3,2,3,3,3,3,0,2,2,2,2,0,2,2,2,3,2,0,0,0,0,2,0,2,0,
+ 3,3,3,3,3,0,3,0,0,2,0,0,0,0,2,0,2,2,2,0,2,0,0,0,2,0,2,3,0,
+ 0,0,0,0,2,0,0,2,0,2,0,0,0,0,0,0,3,0,0,2,0,0,0,0,0,0,0,0,0,
+};
+
+
+const SequenceModel Iso_8859_2SloveneModel =
+{
+ Iso_8859_2_CharToOrderMap,
+ SloveneLangModel,
+ 29,
+ (float)0.9983524317161332,
+ PR_TRUE,
+ "ISO-8859-2"
+};
+
+const SequenceModel Iso_8859_16SloveneModel =
+{
+ Iso_8859_16_CharToOrderMap,
+ SloveneLangModel,
+ 29,
+ (float)0.9983524317161332,
+ PR_TRUE,
+ "ISO-8859-16"
+};
+
+const SequenceModel Windows_1250SloveneModel =
+{
+ Windows_1250_CharToOrderMap,
+ SloveneLangModel,
+ 29,
+ (float)0.9983524317161332,
+ PR_TRUE,
+ "WINDOWS-1250"
+};
+
+const SequenceModel Mac_CentraleuropeSloveneModel =
+{
+ Mac_Centraleurope_CharToOrderMap,
+ SloveneLangModel,
+ 29,
+ (float)0.9983524317161332,
+ PR_TRUE,
+ "MAC-CENTRALEUROPE"
+};
+
+const SequenceModel Ibm852SloveneModel =
+{
+ Ibm852_CharToOrderMap,
+ SloveneLangModel,
+ 29,
+ (float)0.9983524317161332,
+ PR_TRUE,
+ "IBM852"
+};
diff --git a/src/nsSBCSGroupProber.cpp b/src/nsSBCSGroupProber.cpp
index 96c93e0..161129d 100644
--- a/src/nsSBCSGroupProber.cpp
+++ b/src/nsSBCSGroupProber.cpp
@@ -179,6 +179,12 @@ nsSBCSGroupProber::nsSBCSGroupProber()
mProbers[87] = new nsSingleByteCharSetProber(&Iso_8859_16RomanianModel);
mProbers[88] = new nsSingleByteCharSetProber(&Ibm852RomanianModel);
+ mProbers[89] = new nsSingleByteCharSetProber(&Windows_1250SloveneModel);
+ mProbers[90] = new nsSingleByteCharSetProber(&Iso_8859_2SloveneModel);
+ mProbers[91] = new nsSingleByteCharSetProber(&Iso_8859_16SloveneModel);
+ mProbers[92] = new nsSingleByteCharSetProber(&Mac_CentraleuropeSloveneModel);
+ mProbers[93] = new nsSingleByteCharSetProber(&Ibm852SloveneModel);
+
Reset();
}
diff --git a/src/nsSBCSGroupProber.h b/src/nsSBCSGroupProber.h
index 7f7425c..b22f46e 100644
--- a/src/nsSBCSGroupProber.h
+++ b/src/nsSBCSGroupProber.h
@@ -40,7 +40,7 @@
#define nsSBCSGroupProber_h__
-#define NUM_OF_SBCS_PROBERS 89
+#define NUM_OF_SBCS_PROBERS 94
class nsCharSetProber;
class nsSBCSGroupProber: public nsCharSetProber {
diff --git a/src/nsSBCharSetProber.h b/src/nsSBCharSetProber.h
index e6dd2ae..dd29b90 100644
--- a/src/nsSBCharSetProber.h
+++ b/src/nsSBCharSetProber.h
@@ -240,5 +240,11 @@ extern const SequenceModel Iso_8859_2RomanianModel;
extern const SequenceModel Iso_8859_16RomanianModel;
extern const SequenceModel Ibm852RomanianModel;
+extern const SequenceModel Windows_1250SloveneModel;
+extern const SequenceModel Iso_8859_2SloveneModel;
+extern const SequenceModel Iso_8859_16SloveneModel;
+extern const SequenceModel Ibm852SloveneModel;
+extern const SequenceModel Mac_CentraleuropeSloveneModel;
+
#endif /* nsSingleByteCharSetProber_h__ */
diff --git a/test/sl/ibm852.txt b/test/sl/ibm852.txt
new file mode 100644
index 0000000..5fa60a4
--- /dev/null
+++ b/test/sl/ibm852.txt
@@ -0,0 +1,9 @@
+Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
+zmožen razviti in ohranjati življenje.
+
+Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
+naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
+Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
+pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
+preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
+smeri je del planetologije in razvijajoče astrobiologije.
diff --git a/test/sl/iso-8859-16.txt b/test/sl/iso-8859-16.txt
new file mode 100644
index 0000000..80d0b26
--- /dev/null
+++ b/test/sl/iso-8859-16.txt
@@ -0,0 +1,9 @@
+Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
+zmožen razviti in ohranjati življenje.
+
+Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
+naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
+Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
+pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
+preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
+smeri je del planetologije in razvijajoče astrobiologije.
diff --git a/test/sl/iso-8859-2.txt b/test/sl/iso-8859-2.txt
new file mode 100644
index 0000000..7af252e
--- /dev/null
+++ b/test/sl/iso-8859-2.txt
@@ -0,0 +1,9 @@
+Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
+zmožen razviti in ohranjati življenje.
+
+Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
+naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
+Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
+pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
+preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
+smeri je del planetologije in razvijajoče astrobiologije.
diff --git a/test/sl/mac-centraleurope.txt b/test/sl/mac-centraleurope.txt
new file mode 100644
index 0000000..4e84135
--- /dev/null
+++ b/test/sl/mac-centraleurope.txt
@@ -0,0 +1,9 @@
+Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
+zmožen razviti in ohranjati življenje.
+
+Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
+naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
+Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
+pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
+preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
+smeri je del planetologije in razvijajoče astrobiologije.
diff --git a/test/sl/utf-8.txt b/test/sl/utf-8.txt
new file mode 100644
index 0000000..11d013b
--- /dev/null
+++ b/test/sl/utf-8.txt
@@ -0,0 +1,9 @@
+Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
+zmožen razviti in ohranjati življenje.
+
+Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
+naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
+Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
+pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
+preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
+smeri je del planetologije in razvijajoče astrobiologije.
diff --git a/test/sl/windows-1250.txt b/test/sl/windows-1250.txt
new file mode 100644
index 0000000..512309b
--- /dev/null
+++ b/test/sl/windows-1250.txt
@@ -0,0 +1,9 @@
+Naseljívi planét je planet ali naravni satelit (redkeje tudi asteroid[1]), ki je
+zmožen razviti in ohranjati življenje.
+
+Ker je obstoj nezemeljskega življenja trenutno negotov, je raziskovanje
+naseljivih planetov v glavnem ekstrapolacija razmer na Zemlji in značilnosti
+Sonca in celotnega Osončja, ki govorijo v prid razvitju življenja. Še posebej so
+pomembni faktorji, ki so ohranili zapletene, mnogocelične organizme in ne le
+preprosta, enocelična živa bitja, mikroorganizme. Raziskovanje in teorija v tej
+smeri je del planetologije in razvijajoče astrobiologije.
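
The character mapping comment in LangSloveneModel.cpp states that an order identifies the same character in every charset of a language. This can be spot-checked for č, š and ž (orders 21, 22 and 23 in the build log). The sketch below is only a sanity check under stated assumptions: the byte/order pairs are copied by hand from the five CharToOrderMap tables in this commit, the Python codec names are assumed to match the charsets, and `TABLE_EXCERPTS` and `check_orders` are hypothetical names.

```python
# Orders taken from the build log: [21] č, [22] š, [23] ž.
ORDERS = {"č": 21, "š": 22, "ž": 23}

# Hand-copied excerpts of the CharToOrderMap tables (byte offset -> order),
# keyed by the Python codec assumed to match each charset.
TABLE_EXCERPTS = {
    "iso8859_2": {0xE8: 21, 0xB9: 22, 0xBE: 23},
    "iso8859_16": {0xB9: 21, 0xA8: 22, 0xB8: 23},
    "cp1250": {0xE8: 21, 0x9A: 22, 0x9E: 23},
    "mac_latin2": {0x8B: 21, 0xE4: 22, 0xEC: 23},
    "cp852": {0x9F: 21, 0xE7: 22, 0xA7: 23},
}


def check_orders():
    """Return (codec, char, byte) triples whose table order is unexpected."""
    mismatches = []
    for codec, table in TABLE_EXCERPTS.items():
        for char, order in ORDERS.items():
            byte = char.encode(codec)[0]  # single-byte charsets: one byte
            if table.get(byte) != order:
                mismatches.append((codec, char, hex(byte)))
    return mismatches


if __name__ == "__main__":
    print(check_orders() or "all orders consistent")
```

Only lowercase letters are checked; the tables assign the same order to the uppercase forms because the language definition sets `case_mapping = True`.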