diff options
author | László Németh <nemeth@numbertext.org> | 2012-02-02 12:17:44 +0100 |
---|---|---|
committer | László Németh <nemeth@numbertext.org> | 2012-02-02 12:17:44 +0100 |
commit | 1a46fc1103856acae202f047528d1bbd837220e4 (patch) | |
tree | f78ec351f747ecfc76c685547f2bd97fe0a4be5b /doc |
Initial release
Diffstat (limited to 'doc')
-rw-r--r-- | doc/dialog.txt | 52 | ||||
-rw-r--r-- | doc/manual.txt | 69 | ||||
-rw-r--r-- | doc/syntax.txt | 296 |
3 files changed, 417 insertions, 0 deletions
diff --git a/doc/dialog.txt b/doc/dialog.txt new file mode 100644 index 0000000..e67f4aa --- /dev/null +++ b/doc/dialog.txt @@ -0,0 +1,52 @@ +# 1. Copy this template to the data folder, under the name of your locale +# eg. en_US.dlg. +# +# 2. Define your group ids and option ids and the localized title +# texts (see later for the syntax). +# +# Syntax of the dialog data file: +# +# Options and title texts for the Settings and conditional rules +# +# The Lightproof dialogs contain only grouped checkboxes. +# +# Format of the dialog definition: +# +# GroupID: OptionID [OptionsInTheSameLines_or_hyphen_placeholder], OptionID ... +# Group2ID: OptionID, OptionID ... +# ... +# [Language_code=title of the window] +# GroupID=title of the group +# OptionID=title of the option [\n tooltip] +# Option2ID=title of the option [\n tooltip] +# +# The first language is the default language for the other locales +# (use en_US or the common language of your country) +# +# The OptionIDs are used in the rules, too. For example: +# +# foo <- option("style") -> bar # bar is far better +# +# this rule depends from the state of the "style" checkbox. + +# options + +spelling: comma, hyphen +proofreading: style, moregrammar + +# titles + +[en_US=Hungarian grammar checking] + +spelling=Spelling +comma=Comma usage +hyphen=Check compound words with hyphen +proofreading=Proofreading +style=Style checking +moregrammar=Use grammar rules with ambiguous word analysis + +[yourlocale=yourtitle] + +spelling=... +wordpart=... +... diff --git a/doc/manual.txt b/doc/manual.txt new file mode 100644 index 0000000..96cd096 --- /dev/null +++ b/doc/manual.txt @@ -0,0 +1,69 @@ += Making a new grammar checker for LibreOffice/OpenOffice.org = + +== Rule development == + +Use Lightproof editor (LibreOffice Writer toolbar extension) to create a rule +(Lightproof .dat) file. See doc/syntax.txt for more informations. + +After LibreOffice integration (see later) switch off the integrated grammar +checker component in the Tools » Options » Language Settings » Writing Aids » +Available language modules, eg. "Lightproof Grammar Checker (en)", and switch +on the component "Lightproof editor" and *restart* LibreOffice for rule +development. + +There is no visual editor for optional features, yet (dlg file), but you can +compile and test all optional rules in the editor with the Apply all rules icon. +Also copying your rule and dialog definition files to the src/editor folder, +you can compile an extended editor extension with the +following command: python make.py src/editor/editor.cfg + +Note: the default locales of the editor (supported languages of the edited rule +document) cover only the languages with Hunspell dictionaries. Choose such +languages for editing in Writer (eg. Esperanto), or add your language to the +"locales" section of the src/editor/editor.cfg and compile the editor extension. + +== Extension development == + +1. Copy src/en/en.cfg, dat and dlg files under the name src/your_locale/your_locale.cfg, dat and dlg. + +2. Set cfg, translate messages, dat and dlg files (see doc/dialog.txt). + +3. Compile and install your data (see README) + +== LibreOffice integration == + +Short description about the Lightproof integration with the dictionary +extensions (it is not an automatic process yet): + +1. Copy dialog/* and pythonpath/* directories and Lightproof.py, Linguistic.xcu +of your Lightproof extension to the dictionaries/your_language/ directory of +LibreOffice. (Note: pythonpath/lightproof_compiler* doesn't need for the +integration.) + +2. Extend COMPONENT_FILES variable of makefile.mk with these new files. + +3. Extend manifest.xml with the following elements: + + <manifest:file-entry manifest:full-path="dialog/OptionsDialog.xcs" + +manifest:media-type="application/vnd.sun.star.configuration-schema" /> + <manifest:file-entry manifest:full-path="dialog/OptionsDialog.xcu" + +manifest:media-type="application/vnd.sun.star.configuration-data" /> + <manifest:file-entry +manifest:media-type="application/vnd.sun.star.uno-component;type=Python" + manifest:full-path="Lightproof.py"/> + <manifest:file-entry + +manifest:media-type="application/vnd.sun.star.configuration-data" + manifest:full-path="Linguistic.xcu" /> + +4. Change extension ID of dialog/OptionsDialog.xcu to the ID of the dictionary +extension (see in description.xml): + +<prop oor:name="Id"> + <value>org.openoffice.en.hunspell.dictionaries</value> +</prop> + + + diff --git a/doc/syntax.txt b/doc/syntax.txt new file mode 100644 index 0000000..12da666 --- /dev/null +++ b/doc/syntax.txt @@ -0,0 +1,296 @@ += Encoding = + +UTF-8 + += Rule syntax = + +pattern -> replacement # message + +or (see Conditions) + +pattern <- condition -> replacement # message + +or + +pattern <- condition -> = replacement # = expression_for_message +pattern <- condition -> = expression_to_generate_replacement_string # message +pattern <- condition -> = expression_to_generate_replacement_string # = expression_for_message + + +Basically pattern and replacement will be the parameters of the +standard Python re.sub() regular expression function (see also +Python regex module documentation for regular expression syntax: +http://docs.python.org/library/re.html). + +Example 0. Report "foo" in the text and suggest "bar": + +foo -> bar # Use bar instead of foo. + +Example 1. Recognize and suggest missing hyphen: + +foo bar -> foo-bar # Missing hyphen. + += Rule Sections = + +Example 2. Recognize double or more spaces and suggests a single space: + +[char] + +" +" -> " " # Extra space. + +The line [char] changes the default word-level rules to character-level ones. +Use [Word] to change back to the (case-insensitive) word-level rules. +Also [word] is for the case-sensitive word-level rules, and [Char] for the +case-insensitive character-level rules. + +ASCII " characters protect spaces in the pattern and in the replacement text. +Plus sign means 1 or more repetitions of the previous space. + += Other examples = + +Example 3. Suggest a word with correct quotation marks: + +\"(\w+)\" -> “\1” # Correct quotation marks. + +(Here \" is an ASCII quotation mark, \w means an arbitrary letter, ++ means 1 or more repetitions of the previous object, +The parentheses define a regex group (the word). In the +replacement, \1 is a reference to the (first) group of the pattern.) + +Example 4. Suggest the missing space after the !, ? or . signs: + +\b([?!.])([a-zA-Z]+) -> \1 \2 # Missing space? + +\b is the zero-length word boundary regex notation, so +\b signs the end and the begin of the words. + +The [ and ] define a character pattern, the replacement will contain +the actual matching character (?, ! or .), a space and the word after +the punctuation character. +Note: ? and . characters have special meanings in regular expressions, +use [?] or [.] patterns to check "?" and "." signs in the text. + +== Multiple suggestions == + +Use \n (new line) in the replacement text to add multiple suggestions: + +foo -> Foo\nFOO\nBar\nBAR # Did you mean: + +(Foo, FOO, Bar and BAR suggestions for the input word "foo") + += Expressions in the suggestions = + +Suggestions (and warning messages) started by an equal sign are Python string expressions +extended with possible back references and named definitions: + +Example: + +foo\w+ -> = '"' + \0.upper() + '"' # With uppercase letters and quoation marks + +All words beginning with "foo" will be recognized, and the suggestion is +the uppercase form of the string with ASCII quoation marks: eg. foom -> "FOOM". + +== Longer explanations == + +Warning messages can contain optional URLs for longer explanations separated by "\n": + +(your|her|our|their)['’]s -> \1s # Possessive pronoun: \n http://en.wikipedia.org/wiki/Possessive_pronoun + +== Default variables == + +LOCALE + +It contains the current locale of the checked paragraph. Its fields: +For en-US LOCALE.Language = "en" and LOCALE.Country = "US", eg. + +colour <- LOCALE.Language == "US" -> color # Use American English spelling. + +TEXT + +Full text of the checked paragraph. + +== Name definitions == + +Lightproof supports name definitions to simplify the +description of the complex rules. + +Definition: + +name pattern # name definition + +Usage in the rules: + +"{name} " -> "{name}. " # Missing dot? + +{Name}s in the first part of the rules mean +subpatterns (groups). {Name}s in the second +part of the rules mean back references to the +matched texts of the subpatterns. + +Example: thousand markers (10000 -> 10,000 or 10 000) + +# definitions +d \d\d\d # name definition: 3 digits +d2 \d\d # 2 digits +D \d{1,3} # 1, 2 or 3 digits + +# rules +# ISO thousand marker: space, here: no-break space (U+00A0) +{d2}{d} -> {d2},{d}\n{d2} {d} # Use thousand marker (common or ISO). +{D}{d}{d} -> {D},{d},{d}\n{D} {d} {d} # Use thousand markers (common or ISO). + +Note: Lightproof uses named groups for name definitions and +their references, adding a hidden number to the group names +in the form of "_n". You can use these explicit names in the replacement: + +{d2}{d} -> {d2_1},{d_1}\n{d2_1} {d_1} # Use thousand marker (common or ISO). +{D}{d}{d} -> {D_1},{d_1},{d_2}\n{D_1} {d_1} {d_2} # Use thousand markers (common or ISO). + +Note: back references of name definitions are zeroed after new line +characters, see this and the following example: + +E ( |$) # name definition: space or end of sentence +"\b[.][.]{E}" -> .{E}\n…{E} # Period or ellipsis? + +See src/en/en.dat for more examples. + += Conditions = + +A Lightproof condition is a Python condition with some modifications: +the \0..\9 regex notations and the Lightproof {name} notations in the condition will be +replaced by the matched subpatterns. For example, the rule + +\w+ <- \0 == "foo" -> Foo # Foo is a capitalized word. + +is equivalent of the following rule: + +foo -> Foo # Foo is a capitalized word. + +== Standard functions == + +There are some default function for the rule conditions. + + +word(n) or word(-n): + +The function word(n) returns the Nth word (separated only by white spaces) +before or after the matching pattern, or None, if this word doesn't exist. + + +morph(word, regex pattern): +morph(word, regex pattern, all): + +The function morph returns a matching subpattern of the morphological analysis +of the input word or None, if the pattern won't match all items of the +analysis of the input word. For example, the rule + +\ban ([a-z]\w+) <- morph(\1, "(po:verb|is:plural)") -> and \1 # Missing letter? + +will find the word "an" followed by a not capitalized verb or a plural noun (the notation depends from the morphological data of +the Hunspell dictionary). + +The optional argument can modify the default "all" mode to "if exists", using +the False value: + +morph(word, regex pattern, False): + +stem(word): + +The function returns an arraw with the stems of the input word. + +Usage: + +(\w+) <- "foo" in stem(\1) -> bar # One of the stem of the word is "foo" + +(\w+) <- stem(\1) == ["foo"] -> bar # The word has got only one stem, "foo". + + + +affix(word, regex pattern): +affix(word, regex pattern, all): + +Variant of morph: it filters the affix fields from the result of the analysis +before matching the pattern. + +The optional argument can modify the default "all" mode to "if exists", using +the False value: + +affix(word, regex pattern, False): + + +calc(functionname, functionparameters): + +Access to the Calc functions. Functionparameters is a tuple with the parameter +of the Calc function: + +calc("CONCATENATE", ("string1", "string2")) + + +generate(word, example_word): + +Morphological generation by example, eg. the result of generate("mouse", +"rodents") is ["mice"] with the en_US English dictionary. (See also +Hunspell (4) manual page for morphological generation.) + +option(optionname): + +Return the Boolean value of the option (see doc/dialog.txt). + +== Multi-line rules == + +Rules can be break to multiple lines by leading tabulators: + +pattern <- condition + # only comment + -> replacement + # message (last comment) + +== User code support == + +Use [code] sections to add your own Python functions for the rules: + +Example (suggesting uppercase form for all words with underline character, +for example hello_world -> HELLO_WORLD) + +[code] + +def u(s): + return s.upper() + +[Word] + +# suggest uppercase form for all words with underline character + +\w+_\w+ -> =u(\0) # Use uppercase form + +(In fact, this is equivalent of the following rule: + +\w+_\w+ -> =\0.upper() # Use uppercase form) + +See English rules (src/en/en.dat) for more examples, eg. precompiled regular +expressions for sentence checking, sets to handle more irregular words etc. + += Typical problems = + +== Encoding == + +Python expressions (< Python 3.0) need explicit Unicode declaration for non-ASCII +characters: + +fó -> bár # example + +is equivalent of the following rule (see u'string' instead of 'string') + +fó -> = u'bár' # example + +== Pattern matching == + +Repeating pattern matching of a single rule continues after the previous matching, so +instead of general multiword patterns, like + +(\w+) (\w+) <- some_check(\1, \2) -> \1, \2 # foo + +use + +(\w+) <- some_check(\1, word(1)) -> \1, # foo + |