= Encoding =

UTF-8

= Rule syntax =

pattern -> replacement # message

or (see Conditions)

pattern <- condition -> replacement # message

or

pattern <- condition -> = replacement # = expression_for_message
pattern <- condition -> = expression_to_generate_replacement_string # message
pattern <- condition -> = expression_to_generate_replacement_string # = expression_for_message


In essence, the pattern and the replacement are used as the parameters of the
standard Python re.sub() regular expression function (see the documentation
of the Python re module for the regular expression syntax:
http://docs.python.org/library/re.html).

Example 0. Report "foo" in the text and suggest "bar":

foo -> bar # Use bar instead of foo.

Example 1. Recognize and suggest missing hyphen:

foo bar -> foo-bar # Missing hyphen.
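
As a rough illustration of the re.sub() analogy, Examples 0 and 1 can be
reproduced in plain Python. This is only a sketch: it leaves out the
word-level matching, case handling and messages that Lightproof adds around
the regular expression, and the sample sentences are made up.

```python
import re

# Example 0: pattern "foo", replacement "bar"
example0 = re.sub(r"foo", "bar", "a foo here")

# Example 1: pattern "foo bar", replacement "foo-bar"
example1 = re.sub(r"foo bar", "foo-bar", "a foo bar here")

print(example0)
print(example1)
```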

= Rule Sections =

Example 2. Recognize two or more consecutive spaces and suggest a single space:

[char]

"  +" -> " " # Extra space.

The line [char] switches from the default word-level rules to character-level ones.
Use [Word] to switch back to the (case-insensitive) word-level rules.
Similarly, [word] selects the case-sensitive word-level rules, and [Char] the
case-insensitive character-level rules.

ASCII " characters protect spaces in the pattern and in the replacement text.
The plus sign means 1 or more repetitions of the preceding space.
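
The effect of the quoted pattern can be tried in plain Python. This is a
sketch of the regular expression only (two spaces followed by a plus sign);
the sample text is an assumption.

```python
import re

text = "too   many  spaces"

# "  +" is one space followed by one or more further spaces
fixed = re.sub(r"  +", " ", text)

print(fixed)
```

Single spaces are left alone because the pattern needs at least two of them.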

= Other examples =

Example 3. Suggest a word with correct quotation marks:

\"(\w+)\" -> “\1” # Correct quotation marks.

(Here \" is an ASCII quotation mark, \w means an arbitrary word character
(a letter, a digit or an underscore), and + means 1 or more repetitions of
the previous object. The parentheses define a regex group (the word). In the
replacement, \1 is a reference to the (first) group of the pattern.)
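
A plain Python sketch of the same group reference follows. In a Python raw
string the ASCII quotation mark needs no backslash; the sample text is made up.

```python
import re

# (\w+) captures the quoted word; \1 puts it back between typographic quotes
fixed = re.sub(r'"(\w+)"', r"“\1”", 'a "word" in quotes')

print(fixed)
```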

Example 4. Suggest the missing space after the !, ? or . signs:

\b([?!.])([a-zA-Z]+) -> \1 \2 # Missing space?

\b is the zero-length word boundary regex notation: it matches
at the beginning and at the end of words.

The [ and ] define a character class. The replacement will contain
the actual matching character (?, ! or .), a space and the word after
the punctuation character.
Note: the ? and . characters have special meanings in regular expressions;
use the [?] or [.] patterns to match "?" and "." signs in the text.
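
The behaviour of Example 4 can be checked with the Python re module. This is
a sketch with made-up sample text; it shows only the regular expression, not
Lightproof's rule handling.

```python
import re

pattern = re.compile(r"\b([?!.])([a-zA-Z]+)")

# \1 is the punctuation character, \2 is the word glued to it
fixed = pattern.sub(r"\1 \2", "Stop.Go now!Really?Yes.")

# digits are not in [a-zA-Z], so decimal numbers are left alone
number = pattern.sub(r"\1 \2", "3.14")

print(fixed)
print(number)
```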

== Multiple suggestions ==

Use | or \n (new line) in the replacement text to add multiple suggestions:

foo -> Foo|FOO|Bar|BAR # Did you mean:

(Foo, FOO, Bar and BAR suggestions for the input word "foo")
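
One way to picture how a replacement text turns into a list of suggestions
is sketched below. How Lightproof splits the text internally is an
assumption here; the sketch treats both | and a real new line character as
separators.

```python
import re

replacement = "Foo|FOO|Bar|BAR"

# split on "|" or on a new line character
suggestions = re.split(r"\||\n", replacement)

print(suggestions)
```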

= Expressions in the suggestions =

Suggestions (and warning messages) starting with an equal sign are Python string expressions
extended with optional back references and named definitions:

Example:

foo\w+ -> = '"' + \0.upper() + '"' # With uppercase letters and quotation marks

All words beginning with "foo" will be recognized, and the suggestion is
the uppercase form of the string with ASCII quotation marks, e.g. foom -> "FOOM".
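
In plain Python, an expression like the one above corresponds roughly to a
replacement function passed to re.sub(), where \0 becomes the full match.
The suggest() helper is a hypothetical name used only for this sketch.

```python
import re

def suggest(match):
    # \0 in the rule corresponds to match.group(0), the whole matched text
    return '"' + match.group(0).upper() + '"'

fixed = re.sub(r"foo\w+", suggest, "a foom here")

print(fixed)
```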

== No suggestion ==

You can display a message without making suggestions. To do so, use a single _ character in the replacement field.

Example:

foobar -> _ # Message

== Avoid automatic capitalization of suggestions ==

Start the suggestion with the character sequence "!CASE!":

Foo -> !CASE!foo # Did you mean:

or in multiple suggestions

Foo -> FOO|!CASE!foo # Did you mean:

(FOO and foo suggestions for the input word "Foo", instead
of FOO and the capitalized Foo)

== Longer explanations ==

Warning messages can contain optional URLs for longer explanations separated by "|" or "\n":

(your|her|our|their)['’]s -> \1s # Possessive pronoun:|http://en.wikipedia.org/wiki/Possessive_pronoun

== Back references in explanations ==

(fooo) bar -> foo bar # “\1” should be:

== Default variables ==

LOCALE

It contains the current locale of the checked paragraph. Its fields are
Language and Country: for en-US, LOCALE.Language = "en" and LOCALE.Country = "US", e.g.

colour <- LOCALE.Country == "US" -> color # Use American English spelling.

TEXT

Full text of the checked paragraph.

== Name definitions ==

Lightproof supports name definitions to simplify the
description of complex rules.

Definition:

name pattern # name definition

Usage in the rules:

"{name} " -> "{name}. " # Missing dot?

{Name}s in the first part of the rules mean
subpatterns (groups). {Name}s in the second
part of the rules mean back references to the
matched texts of the subpatterns.

Example: thousand markers (10000 -> 10,000 or 10 000)

# definitions
d \d\d\d	# name definition: 3 digits
d2 \d\d		# 2 digits
D \d{1,3}	# 1, 2 or 3 digits

# rules
# ISO thousand marker: space, here: no-break space (U+00A0)
{d2}{d} -> {d2},{d}|{d2} {d}           # Use thousand marker (common or ISO).
{D}{d}{d} -> {D},{d},{d}|{D} {d} {d}   # Use thousand markers (common or ISO).

Note: Lightproof uses named groups for name definitions and
their references, adding a hidden number to the group names
in the form of "_n". You can use these explicit names in the replacement:

{d2}{d} -> {d2_1},{d_1}|{d2_1} {d_1}	# Use thousand marker (common or ISO).
{D}{d}{d} -> {D_1},{d_1},{d_2}|{D_1} {d_1} {d_2} # Use thousand markers (common or ISO).
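
The hidden "_n" numbering can be pictured with a small Python sketch. The
expand() helper is hypothetical and is not Lightproof's actual code; it only
assumes that each {name} in a pattern becomes a named group whose name
carries a per-name counter.

```python
import re

# name definitions from the example above
definitions = {"d": r"\d\d\d", "d2": r"\d\d", "D": r"\d{1,3}"}

def expand(pattern, definitions):
    """Replace each {name} with a named group called name_1, name_2, ..."""
    counts = {}

    def repl(m):
        name = m.group(1)
        counts[name] = counts.get(name, 0) + 1
        return "(?P<%s_%d>%s)" % (name, counts[name], definitions[name])

    return re.sub(r"\{(\w+)\}", repl, pattern)

expanded = expand("{D}{d}{d}", definitions)
match = re.fullmatch(expanded, "1234567")

# the explicit group names can then be used in the replacement
suggestion = match.expand(r"\g<D_1>,\g<d_1>,\g<d_2>")

print(expanded)
print(suggestion)
```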

Note: back references of name definitions are zeroed after new line
characters; see the example above and the following one:

E ( |$)                       # name definition: space or end of sentence
"\b[.][.]{E}" -> .{E}|…{E}   # Period or ellipsis?

See src/en/en.dat for more examples.

= Error positioning =

By default, the full pattern will be underlined in blue.
You can shorten the underlined text area by specifying a back reference group of the pattern.
Instead of ->, write -n>, where n is the number of a back reference group.
In effect, -> behaves like -0>.

Example:
(ying) and yang -1> yin # Did you mean:

== Comparison ==

Rule 1:
ying and yang -> yin and yang # Did you mean:

Rule 2:
(ying) and yang -1> yin # Did you mean:

With rule 1, the full pattern is underlined:
    ying and yang
    ^^^^^^^^^^^^^

With rule 2, only the first back reference group is underlined:
    ying and yang
    ^^^^
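
The two underlined areas correspond to the spans of group 0 and group 1 of
a regular expression match. The sketch below uses a hypothetical underline()
helper to draw them; it is an illustration, not Lightproof code.

```python
import re

text = "ying and yang"
match = re.search(r"(ying) and yang", text)

def underline(span):
    # draw carets under the characters covered by the span
    start, end = span
    return " " * start + "^" * (end - start)

full = underline(match.span(0))    # like ->  (or -0>)
group1 = underline(match.span(1))  # like -1>

print(text)
print(full)
print(text)
print(group1)
```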

= Conditions =

A Lightproof condition is a Python condition with some modifications:
the \0..\9 regex notations and the Lightproof {name} notations in the condition will be
replaced by the matched subpatterns. For example, the rule

\w+ <- \0 == "foo" -> Foo # Foo is a capitalized word.

is equivalent to the following rule:

foo -> Foo # Foo is a capitalized word.
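
The equivalence can be sketched in plain Python. The apply_rule() helper is
hypothetical; it replaces a match only when the condition holds. The \b
anchors in the second call are an assumption that stands in for Lightproof's
word-level matching.

```python
import re

def apply_rule(pattern, condition, replacement, text):
    """Replace each match of pattern only if condition(match) is true."""
    def repl(m):
        return replacement if condition(m) else m.group(0)
    return re.sub(pattern, repl, text)

text = "foo food foo"

# \w+ <- \0 == "foo" -> Foo
with_condition = apply_rule(r"\w+", lambda m: m.group(0) == "foo", "Foo", text)

# foo -> Foo (as a word-level rule)
without_condition = re.sub(r"\bfoo\b", "Foo", text)

print(with_condition)
print(without_condition)
```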

== Standard functions ==

There are several default functions for the rule conditions.


word(n) or word(-n):

The function word(n) returns the nth word (separated only by white space)
after the matching pattern, and word(-n) the nth word before it, or None
if this word doesn't exist.
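
The idea behind word(n) can be sketched in Python as follows. This is a
simplified stand-in, not Lightproof's implementation: in a rule the function
takes only n and knows the match position implicitly, while the hypothetical
version below receives the text and the match span as explicit arguments.

```python
import re

def word(text, start, end, n):
    """Return the nth whitespace-separated word after (n > 0) or
    before (n < 0) the match text[start:end], or None."""
    if n > 0:
        words = text[end:].split()
        return words[n - 1] if n <= len(words) else None
    if n < 0:
        words = text[:start].split()
        return words[n] if -n <= len(words) else None
    return None

text = "one two three four five"
m = re.search(r"three", text)

next_word = word(text, m.start(), m.end(), 1)
previous_word = word(text, m.start(), m.end(), -1)
missing = word(text, m.start(), m.end(), 3)

print(next_word, previous_word, missing)
```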


morph(word, regex pattern):
morph(word, regex pattern, all):

The function morph returns a matching subpattern of the morphological analysis
of the input word, or None if the pattern does not match all items of the
analysis of the input word. For example, the rule

\ban ([a-z]\w+) <- morph(\1, "(po:verb|is:plural)") -> and \1 # Missing letter?

will find the word "an" followed by a non-capitalized verb or a plural noun (the notation depends on the morphological data of
the Hunspell dictionary).

The optional argument can modify the default "all" mode to "if exists", using
the False value:

morph(word, regex pattern, False):
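
The difference between the "all" and the "if exists" modes can be sketched
with a toy analysis table. Everything here is an assumption made for
illustration: the ANALYSES data is invented (real analyses come from the
Hunspell dictionary), and this morph() is a simplified stand-in, not
Lightproof's code.

```python
import re

# invented analyses; each word may have several analysis items
ANALYSES = {
    "walks": ["st:walk po:verb", "st:walk po:noun is:plural"],
    "saw": ["st:see po:verb", "st:saw po:noun"],
}

def morph(word, pattern, match_all=True):
    """Return a matching part of the analysis, or None.

    match_all=True: every analysis item must match the pattern.
    match_all=False: one matching item is enough ("if exists").
    """
    found = None
    for item in ANALYSES.get(word, []):
        m = re.search(pattern, item)
        if m:
            found = m.group(0)
            if not match_all:
                return found
        elif match_all:
            return None
    return found

pattern = "(po:verb|is:plural)"
print(morph("walks", pattern))       # both items match
print(morph("saw", pattern))         # the noun item does not match
print(morph("saw", pattern, False))  # the verb item is enough
```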

stem(word):

The function returns an array with the stems of the input word.

Usage:

(\w+) <- "foo" in stem(\1) -> bar # One of the stems of the word is "foo".

(\w+) <- stem(\1) == ["foo"] -> bar # The word has only one stem, "foo".



affix(word, regex pattern):
affix(word, regex pattern, all):

Variant of morph: it filters the affix fields from the result of the analysis
before matching the pattern.

The optional argument can modify the default "all" mode to "if exists", using
the False value:

affix(word, regex pattern, False):


calc(functionname, functionparameters):

Access to the Calc functions. Functionparameters is a tuple with the parameters
of the Calc function:

calc("CONCATENATE", ("string1", "string2"))


generate(word, example_word):

Morphological generation by example, e.g. the result of generate("mouse",
"rodents") is ["mice"] with the en_US English dictionary. (See also the
Hunspell (4) manual page for morphological generation.)

option(optionname):

Return the Boolean value of the option (see doc/dialog.txt).

== Multi-line rules ==

Rules can be broken into multiple lines by using leading tabs:

pattern <- condition
	# only comment
	-> replacement
	# message (last comment)

== User code support ==

Use [code] sections to add your own Python functions for the rules:

Example (suggesting the uppercase form for all words with an underscore character,
for example hello_world -> HELLO_WORLD):

[code]

def u(s):
    return s.upper()

[Word]

# suggest the uppercase form for all words with an underscore character

\w+_\w+ -> =u(\0) # Use uppercase form

(In fact, this is equivalent to the following rule:

\w+_\w+ -> =\0.upper() # Use uppercase form)

See the English rules (src/en/en.dat) for more examples, e.g. precompiled regular
expressions for sentence checking, sets to handle more irregular words, etc.

= Typical problems =

== Encoding ==

Python expressions (< Python 3.0) need explicit Unicode declaration for non-ASCII
characters:

fó -> bár # example

is equivalent to the following rule (note u'string' instead of 'string'):

fó -> = u'bár' # example

== Pattern matching ==

Repeated matching of a single rule continues after the end of the previous match,
so matches never overlap. Instead of general multiword patterns, like

(\w+) (\w+) <- some_check(\1, \2) -> \1, \2 # foo

use

(\w+) <- some_check(\1, word(1)) -> \1, # foo
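
The problem with the multiword pattern can be seen with the Python re
module. In this sketch (with made-up sample text) the two-word pattern never
examines the pair "b c", because matching resumes after the previous match,
while the one-word pattern visits every word.

```python
import re

text = "a b c d"

# two-word pattern: matches do not overlap, so "b c" is skipped
pairs = [m.group(0) for m in re.finditer(r"(\w+) (\w+)", text)]

# one-word pattern: every word gets its own match
singles = [m.group(1) for m in re.finditer(r"(\w+)", text)]

print(pairs)
print(singles)
```

With the one-word pattern, the following word is reached through word(1)
in the condition, so every adjacent pair can be checked.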