path: root/script/BuildLangModelLogs/LangSlovakModel.log
= Logs of language model for Slovak (sk) =

- Generated by BuildLangModel.py
- Started: 2021-03-21 12:48:41.368218
- Maximum depth: 4
- Max number of pages: 100

== Parsed pages ==

Európska_únia (revision 7169513)
1. decembra (revision 6792273)
1. júl (revision 7066144)
1. svetová vojna (revision 7159151)
10 centov (euro) (revision 6293215)
1952 (revision 7177031)
1957 (revision 7078231)
1958 (revision 7144704)
1960 (revision 7163978)
1967 (revision 7016805)
1968 (revision 7173483)
1973 (revision 7149623)
1979 (revision 7169115)
1981 (revision 7066520)
1985 (revision 7161691)
1986 (revision 7151177)
1987 (revision 7065067)
1990 (revision 7178863)
1992 (revision 7135542)
1993 (revision 7122277)
1995 (revision 7133683)
1999 (revision 7133241)
1 cent (euro) (revision 6963154)
1 euro (revision 6264994)
2003 (revision 7135529)
2004 (revision 7149802)
2007 (revision 7135534)
2008 (revision 7156084)
2009 (revision 7135536)
2013 (revision 7135522)
2016 (revision 7159554)
2017 (revision 7174262)
20 centov (euro) (revision 6293208)
23. jún (revision 7052430)
2 centy (euro) (revision 6963155)
2 eurá (revision 6452782)
31. december (revision 7149783)
50 centov (euro) (revision 6293202)
5 centov (euro) (revision 6963157)
Acquis communautaire (revision 7033703)
Al Gore (revision 7146244)
Albánsko (revision 7172414)
Americký dolár (revision 7050515)
Amsterdamská zmluva (revision 7070102)
Angličtina (revision 7148052)
Angola (revision 7035956)
Antigua a Barbuda (revision 6560340)
Argentína (revision 7171908)
Arménsko (revision 7147325)
Atény (revision 7150984)
Austrália (štát) (revision 7154003)
Azory (revision 6595058)
Bahrajn (revision 7178284)
Bangladéš (revision 7147804)
Barack Obama (revision 7158748)
Barbados (revision 7178784)
Belgicko (revision 7163339)
Belgický frank (revision 6953531)
Belize (revision 7156055)
Benin (revision 7172640)
Bolívia (revision 7111159)
Botswana (revision 7158699)
Brazília (revision 7177507)
Brettonwoodská menová sústava (revision 6710540)
Brunej (revision 6975045)
Brusel (revision 7037073)
Bulharsko (revision 7177290)
Bulharský lev (revision 6230899)
Bulharčina (revision 7150125)
Burkina (revision 7158783)
Burundi (revision 7049945)
Ceuta (revision 6575679)
Charles Michel (revision 7098830)
Chorvátska kuna (revision 6935490)
Chorvátsko (revision 7131429)
Chorvátčina (revision 7178832)
Clo (revision 6894735)
Cyperská libra (revision 5964697)
Cyprus (revision 7035263)
David-Maria Sassoli (revision 7032560)
David Cameron (revision 7078464)
Demokracia (revision 7049807)
Denis Mukwege (revision 6800186)
Dominika (štát) (revision 7126694)
Dominikánska republika (revision 7080374)
Drachma (novoveké Grécko) (revision 6391564)
Druhá svetová vojna (revision 7151355)
Dunaj (revision 7150320)
Dánska koruna (revision 6125942)
Dánsko (revision 7161625)
Dánčina (revision 6557304)
Džibutsko (revision 7111764)
EHS (revision 6927031)
Eduard Kukan (revision 7079321)
Egypt (revision 7151318)
Ekvádor (revision 7073543)
Ellen Johnsonová- Sirleafová (revision 7151906)
Estónska koruna (revision 6751629)
Estónsko (revision 7148919)

== End of Parsed pages ==

- Wikipedia parsing ended at: 2021-03-21 13:00:32.553701

70 characters appeared 674892 times.

Most Frequent characters:
[ 0] Char a: 8.935503754674821 %
[ 1] Char o: 8.347409659619613 %
[ 2] Char e: 8.052103151319026 %
[ 3] Char n: 6.170320584626874 %
[ 4] Char r: 6.046300741451965 %
[ 5] Char i: 5.852195610556948 %
[ 6] Char s: 5.3632284869282785 %
[ 7] Char k: 4.751278723114217 %
[ 8] Char t: 4.600439774067555 %
[ 9] Char l: 4.167037096305779 %
[10] Char v: 4.090580418792933 %
[11] Char m: 3.1385762462734776 %
[12] Char d: 2.7853345424156752 %
[13] Char u: 2.7336225647955525 %
[14] Char p: 2.6873929458342967 %
[15] Char c: 2.5881178025521123 %
[16] Char á: 2.0701089952170126 %
[17] Char h: 2.0477350450146097 %
[18] Char j: 1.9521641981235516 %
[19] Char b: 1.921344452149381 %
[20] Char z: 1.6398179264237835 %
[21] Char y: 1.3830361005909093 %
[22] Char ý: 1.2827237543192096 %
[23] Char í: 0.8906610242824038 %
[24] Char č: 0.8473948424340486 %
[25] Char é: 0.7884224438873183 %
[26] Char ú: 0.7808656792494206 %
[27] Char g: 0.749897761419605 %
[28] Char f: 0.6475110091688744 %
[29] Char š: 0.6189138410293795 %
[30] Char ž: 0.4720755320851336 %
[31] Char ľ: 0.4089543215803418 %
[32] Char ó: 0.3095310064425123 %
[33] Char ť: 0.24344635882481935 %
[34] Char w: 0.11735210967088068 %
[35] Char ô: 0.10297943967331069 %
[36] Char ä: 0.09142203493299668 %
[37] Char x: 0.08312441101687382 %
[38] Char ň: 0.07201152184349496 %
[39] Char ď: 0.06993711586446424 %
[40] Char q: 0.017187935254825957 %
[41] Char ë: 0.011112889173378852 %
[42] Char ř: 0.010075686183863493 %
[43] Char ü: 0.009186655049993185 %
[44] Char ě: 0.008445795771767926 %
[45] Char ö: 0.007260420926607517 %
[46] Char ĺ: 0.006371389792737208 %
[47] Char ć: 0.006223217937092157 %
[48] Char ŕ: 0.0044451556693515405 %

The first 49 characters have an accumulated ratio of 0.9998118217433309.
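
The character table and the accumulated ratio above can be reproduced in principle with a short sketch like the one below. This is not taken from BuildLangModel.py: the function names `char_frequencies` and `accumulated_ratio` are hypothetical, and lowercasing the text and keeping only alphabetic characters are assumptions about how the counts were obtained.

```python
from collections import Counter


def char_frequencies(text):
    """Return (char, percent) pairs, most frequent first.

    Assumption: text is lowercased and only alphabetic characters count.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    total = len(letters)
    if total == 0:
        return []
    counts = Counter(letters)
    # Sort by descending count, then by character for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(ch, 100.0 * n / total) for ch, n in ranked]


def accumulated_ratio(freqs, n):
    """Fraction (0..1) of all occurrences covered by the first n characters."""
    return sum(pct for _, pct in freqs[:n]) / 100.0


# Tiny illustrative input (a page title from the list above, not the corpus).
freqs = char_frequencies("Európska únia")
for rank, (ch, pct) in enumerate(freqs):
    print(f"[{rank:2}] Char {ch}: {pct} %")
print(accumulated_ratio(freqs, 3))
```

On the real corpus, `accumulated_ratio(freqs, 49)` would correspond to the 0.9998... figure reported above, provided the counting rules match.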

1410 sequences found.

First 773 (typical positive ratio): 0.9950030300775062
Next 277 (1050-773): 0.003999347913144824
Rest: 0.0009976220093489419
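
The three ratios above split the 1410 ranked sequences into the first 773, the next 277 (ranks 774 to 1050) and the remaining 360, and together they sum to 1. A minimal sketch of that bucketing follows. It assumes a "sequence" is a pair of adjacent alphabet characters with no other character between them; `sequence_counts` and `bucket_ratios` are hypothetical names, and the cut-offs are passed in as parameters rather than derived.

```python
from collections import Counter


def sequence_counts(text, alphabet):
    """Count adjacent character pairs drawn from `alphabet`.

    Assumption: any character outside the alphabet breaks the sequence.
    """
    counts = Counter()
    prev = None
    for ch in text.lower():
        if ch in alphabet:
            if prev is not None:
                counts[prev + ch] += 1
            prev = ch
        else:
            prev = None
    return counts


def bucket_ratios(counts, first, second):
    """Share of occurrences in ranks [0, first), [first, second), and the rest."""
    ranked = sorted(counts.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (
        sum(ranked[:first]) / total,
        sum(ranked[first:second]) / total,
        sum(ranked[second:]) / total,
    )


# Toy input: "ab" occurs 3 times and "ba" once; the space resets the pair.
counts = sequence_counts("abab ab", set("ab"))
ratios = bucket_ratios(counts, 1, 2)
print(ratios)
```

With the corpus counts, calling `bucket_ratios(counts, 773, 1050)` would yield the three figures in the same order as the log lines above, under the stated assumptions.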

- Processing end: 2021-03-21 13:00:33.050085