Initial release

author: László Németh <nemeth@numbertext.org> 2012-02-02 12:17:44 +0100
committer: László Németh <nemeth@numbertext.org> 2012-02-02 12:17:44 +0100
commit: 1a46fc1103856acae202f047528d1bbd837220e4 (patch)
tree: f78ec351f747ecfc76c685547f2bd97fe0a4be5b /doc
3 files changed, 417 insertions, 0 deletions
diff --git a/doc/dialog.txt b/doc/dialog.txt
new file mode 100644
index 0000000..e67f4aa
--- /dev/null
+++ b/doc/dialog.txt
@@ -0,0 +1,52 @@
+# 1. Copy this template to the data folder, under the name of your locale
+#    eg. en_US.dlg.
+#
+# 2. Define your group ids and option ids and the localized title 
+#    texts (see later for the syntax).
+#
+# Syntax of the dialog data file:
+# 
+# Options and title texts for the Settings and conditional rules
+#
+# The Lightproof dialogs contain only grouped checkboxes.
+#
+# Format of the dialog definition:
+#
+# GroupID: OptionID [OptionsInTheSameLines_or_hyphen_placeholder], OptionID ...
+# Group2ID: OptionID, OptionID ...
+# ...
+# [Language_code=title of the window]
+# GroupID=title of the group
+# OptionID=title of the option [\n tooltip]
+# Option2ID=title of the option [\n tooltip]
+#
+# The first language is the default language for the other locales
+# (use en_US or the common language of your country)
+#
+# The OptionIDs are used in the rules, too. For example:
+#
+# foo <- option("style") -> bar # bar is far better
+#
+# This rule depends on the state of the "style" checkbox.
+
+# options
+
+spelling: comma, hyphen
+proofreading: style, moregrammar
+
+# titles
+
+[en_US=Hungarian grammar checking]
+
+spelling=Spelling
+comma=Comma usage
+hyphen=Check compound words with hyphen
+proofreading=Proofreading
+style=Style checking
+moregrammar=Use grammar rules with ambiguous word analysis
+
+[yourlocale=yourtitle]
+
+spelling=...
+wordpart=...
+...
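
The dialog definition format added in doc/dialog.txt can be read with a few lines of Python. The following is only a minimal sketch, not Lightproof's own parser: parse_dialog() is a hypothetical helper, it stores the window title under an empty key by assumption, and it ignores the optional same-line and hyphen placeholder syntax.

```python
def parse_dialog(lines):
    """Split dialog-definition lines into option groups and per-locale titles."""
    groups, titles, locale = {}, {}, None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            # [Language_code=title of the window] starts a title section
            locale, _, window_title = line[1:-1].partition("=")
            titles[locale] = {"": window_title}  # "" keys the window title
        elif locale is None and ":" in line:
            # GroupID: OptionID, OptionID ...
            group, _, options = line.partition(":")
            groups[group.strip()] = [o.strip() for o in options.split(",")]
        elif locale is not None and "=" in line:
            # GroupID=title or OptionID=title
            key, _, title = line.partition("=")
            titles[locale][key.strip()] = title.strip()
    return groups, titles


sample = """\
spelling: comma, hyphen
proofreading: style, moregrammar
[en_US=Hungarian grammar checking]
spelling=Spelling
comma=Comma usage
""".splitlines()

groups, titles = parse_dialog(sample)
print(groups)
print(titles["en_US"]["comma"])
```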
diff --git a/doc/manual.txt b/doc/manual.txt
new file mode 100644
index 0000000..96cd096
--- /dev/null
+++ b/doc/manual.txt
@@ -0,0 +1,69 @@
+= Making a new grammar checker for LibreOffice/OpenOffice.org =
+
+== Rule development ==
+
+Use the Lightproof editor (a LibreOffice Writer toolbar extension) to create a
+rule file (Lightproof .dat). See doc/syntax.txt for more information.
+
+After the LibreOffice integration (see below), switch off the integrated grammar
+checker component under Tools » Options » Language Settings » Writing Aids »
+Available language modules, e.g. "Lightproof Grammar Checker (en)", switch on
+the "Lightproof editor" component, and *restart* LibreOffice for rule
+development.
+
+There is no visual editor for the optional features (dlg file) yet, but you can
+compile and test all optional rules in the editor with the Apply all rules icon.
+You can also copy your rule and dialog definition files to the src/editor
+folder and compile an extended editor extension with the following command:
+python make.py src/editor/editor.cfg
+
+Note: the default locales of the editor (the supported languages of the edited
+rule document) cover only the languages with Hunspell dictionaries. Choose one
+of these languages for editing in Writer (e.g. Esperanto), or add your language to
+the "locales" section of src/editor/editor.cfg and compile the editor extension.
+
+== Extension development ==
+
+1. Copy the src/en/en.cfg, dat and dlg files to src/your_locale/your_locale.cfg, dat and dlg.
+
+2. Edit the cfg file, then translate the messages and adapt the dat and dlg files (see doc/dialog.txt).
+
+3. Compile and install your data (see README).
+
+== LibreOffice integration ==
+
+A short description of the Lightproof integration with the dictionary
+extensions (it is not an automatic process yet):
+
+1. Copy the dialog/* and pythonpath/* directories and the Lightproof.py and
+Linguistic.xcu files of your Lightproof extension to the dictionaries/your_language/
+directory of LibreOffice. (Note: pythonpath/lightproof_compiler* is not needed
+for the integration.)
+
+2. Extend COMPONENT_FILES variable of makefile.mk with these new files.
+
+3. Extend manifest.xml with the following elements:
+
+    <manifest:file-entry
+        manifest:full-path="dialog/OptionsDialog.xcs"
+        manifest:media-type="application/vnd.sun.star.configuration-schema" />
+    <manifest:file-entry
+        manifest:full-path="dialog/OptionsDialog.xcu"
+        manifest:media-type="application/vnd.sun.star.configuration-data" />
+    <manifest:file-entry
+        manifest:full-path="Lightproof.py"
+        manifest:media-type="application/vnd.sun.star.uno-component;type=Python" />
+    <manifest:file-entry
+        manifest:full-path="Linguistic.xcu"
+        manifest:media-type="application/vnd.sun.star.configuration-data" />
+
+4. Change the extension ID in dialog/OptionsDialog.xcu to the ID of the dictionary
+extension (see description.xml):
+
+<prop oor:name="Id">
+    <value>org.openoffice.en.hunspell.dictionaries</value>
+</prop>
+
+
+
diff --git a/doc/syntax.txt b/doc/syntax.txt
new file mode 100644
index 0000000..12da666
--- /dev/null
+++ b/doc/syntax.txt
@@ -0,0 +1,296 @@
+= Encoding =
+
+UTF-8
+
+= Rule syntax =
+
+pattern -> replacement # message
+
+or (see Conditions)
+
+pattern <- condition -> replacement # message
+
+or
+
+pattern <- condition -> replacement # = expression_for_message
+pattern <- condition -> = expression_to_generate_replacement_string # message
+pattern <- condition -> = expression_to_generate_replacement_string # = expression_for_message
+
+
+Basically, the pattern and the replacement will be the parameters of the
+standard Python re.sub() regular expression function (see also the
+documentation of the Python re module for the regular expression syntax:
+http://docs.python.org/library/re.html).
+
+Example 0. Report "foo" in the text and suggest "bar":
+
+foo -> bar # Use bar instead of foo.
+
+Example 1. Recognize and suggest missing hyphen:
+
+foo bar -> foo-bar # Missing hyphen.
+
+= Rule Sections =
+
+Example 2. Recognize two or more spaces and suggest a single space:
+
+[char]
+
+"  +" -> " " # Extra space.
+
+The line [char] changes the default word-level rules to character-level ones.
+Use [Word] to change back to the (case-insensitive) word-level rules.
+Also [word] is for the case-sensitive word-level rules, and [Char] for the
+case-insensitive character-level rules.
+
+ASCII " characters protect spaces in the pattern and in the replacement text.
+Plus sign means 1 or more repetitions of the previous space.
+
+= Other examples =
+
+Example 3. Suggest a word with correct quotation marks:
+
+\"(\w+)\" -> “\1” # Correct quotation marks.
+
+(Here \" is an ASCII quotation mark, \w means an arbitrary word character
+(a letter, digit or underscore), and + means 1 or more repetitions of the
+previous object. The parentheses define a regex group (the word). In the
+replacement, \1 is a reference to the (first) group of the pattern.)
+
+Example 4. Suggest the missing space after the !, ? or . signs:
+
+\b([?!.])([a-zA-Z]+) -> \1 \2 # Missing space?
+
+\b is the zero-length word boundary regex notation, so
+\b marks the beginning or the end of a word.
+
+The [ and ] define a character class. The replacement will contain
+the actual matching character (?, ! or .), a space and the word after
+the punctuation character.
+Note: the ? and . characters have special meanings in regular expressions;
+use the [?] or [.] patterns to match "?" and "." signs in the text.
+
+== Multiple suggestions ==
+
+Use \n (new line) in the replacement text to add multiple suggestions:
+
+foo -> Foo\nFOO\nBar\nBAR # Did you mean:
+
+(Foo, FOO, Bar and BAR suggestions for the input word "foo")
+
+= Expressions in the suggestions =
+
+Suggestions (and warning messages) starting with an equal sign are Python string
+expressions extended with possible back references and named definitions:
+
+Example:
+
+foo\w+ -> = '"' + \0.upper() + '"' # With uppercase letters and quotation marks
+
+All words beginning with "foo" will be recognized, and the suggestion is
+the uppercase form of the string with ASCII quotation marks: e.g. foom -> "FOOM".
+
+== Longer explanations ==
+
+Warning messages can contain optional URLs for longer explanations separated by "\n":
+
+(your|her|our|their)['’]s -> \1s # Possessive pronoun: \n http://en.wikipedia.org/wiki/Possessive_pronoun
+
+== Default variables ==
+
+LOCALE
+
+It contains the current locale of the checked paragraph. Its fields are Language
+and Country: for en-US, LOCALE.Language == "en" and LOCALE.Country == "US", e.g.
+
+colour <- LOCALE.Country == "US" -> color # Use American English spelling.
+
+TEXT
+
+Full text of the checked paragraph.
+
+== Name definitions ==
+
+Lightproof supports name definitions to simplify the
+description of complex rules.
+
+Definition:
+
+name pattern # name definition
+
+Usage in the rules:
+
+"{name} " -> "{name}. " # Missing dot?
+
+{Name}s in the first part of the rules mean
+subpatterns (groups). {Name}s in the second
+part of the rules mean back references to the
+matched texts of the subpatterns.
+
+Example: thousand markers (10000 -> 10,000 or 10 000)
+
+# definitions
+d \d\d\d	# name definition: 3 digits
+d2 \d\d		# 2 digits
+D \d{1,3}	# 1, 2 or 3 digits
+
+# rules
+# ISO thousand marker: space, here: no-break space (U+00A0)
+{d2}{d} -> {d2},{d}\n{d2} {d}           # Use thousand marker (common or ISO).
+{D}{d}{d} -> {D},{d},{d}\n{D} {d} {d}   # Use thousand markers (common or ISO).
+
+Note: Lightproof uses named groups for name definitions and
+their references, adding a hidden number to the group names
+in the form of "_n". You can use these explicit names in the replacement:
+
+{d2}{d} -> {d2_1},{d_1}\n{d2_1} {d_1}	# Use thousand marker (common or ISO).
+{D}{d}{d} -> {D_1},{d_1},{d_2}\n{D_1} {d_1} {d_2} # Use thousand markers (common or ISO).
+
+Note: the hidden numbering of the back references restarts after each new line
+character (\n) in the replacement; see the rules above and the following example:
+
+E ( |$)                       # name definition: space or end of sentence
+"\b[.][.]{E}" -> .{E}\n…{E}   # Period or ellipsis?
+
+See src/en/en.dat for more examples.
+
+= Conditions =
+
+A Lightproof condition is a Python condition with some modifications:
+the \0..\9 regex notations and the Lightproof {name} notations in the condition will be
+replaced by the matched subpatterns. For example, the rule
+
+\w+ <- \0 == "foo" -> Foo # Foo is a capitalized word.
+
+is equivalent to the following rule:
+
+foo -> Foo # Foo is a capitalized word.
+
+== Standard functions ==
+
+There are some default functions for the rule conditions.
+
+
+word(n) or word(-n):
+
+The function returns the nth word (separated only by white space) after
+(word(n)) or before (word(-n)) the matched pattern, or None if there is no such word.
+
+
+morph(word, regex pattern):
+morph(word, regex pattern, all):
+
+The function morph returns a matching subpattern of the morphological analysis
+of the input word, or None if the pattern does not match all items of the
+analysis of the input word. For example, the rule
+
+\ban ([a-z]\w+) <- morph(\1, "(po:verb|is:plural)") -> and \1 # Missing letter?
+
+will find the word "an" followed by a non-capitalized verb or a plural noun
+(the notation depends on the morphological data of the Hunspell dictionary).
+
+The optional argument can modify the default "all" mode to "if exists", using
+the False value:
+
+morph(word, regex pattern, False):
+
+stem(word):
+
+The function returns a list with the stems of the input word.
+
+Usage:
+
+(\w+) <- "foo" in stem(\1) -> bar # One of the stems of the word is "foo".
+
+(\w+) <- stem(\1) == ["foo"] -> bar # The word has only one stem, "foo".
+
+
+
+affix(word, regex pattern):
+affix(word, regex pattern, all):
+
+Variant of morph: it filters the affix fields from the result of the analysis
+before matching the pattern.
+
+The optional argument can modify the default "all" mode to "if exists", using
+the False value:
+
+affix(word, regex pattern, False):
+
+
+calc(functionname, functionparameters):
+
+Access to the Calc functions. Functionparameters is a tuple with the parameters
+of the Calc function:
+
+calc("CONCATENATE", ("string1", "string2"))
+
+
+generate(word, example_word):
+
+Morphological generation by example, eg. the result of generate("mouse",
+"rodents") is ["mice"] with the en_US English dictionary. (See also
+Hunspell (4) manual page for morphological generation.)
+
+option(optionname):
+
+Return the Boolean value of the option (see doc/dialog.txt).
+
+== Multi-line rules ==
+
+Rules can be broken into multiple lines; continuation lines start with a tab:
+
+pattern <- condition
+	# only comment
+	-> replacement
+	# message (last comment)
+
+== User code support ==
+
+Use [code] sections to add your own Python functions for the rules:
+
+Example (suggesting the uppercase form for all words with an underscore character,
+for example hello_world -> HELLO_WORLD):
+
+[code]
+
+def u(s):
+    return s.upper()
+
+[Word]
+
+# suggest the uppercase form for all words with an underscore character
+
+\w+_\w+ -> =u(\0) # Use uppercase form
+
+(In fact, this is equivalent to the following rule:
+
+\w+_\w+ -> =\0.upper() # Use uppercase form)
+
+See English rules (src/en/en.dat) for more examples, eg. precompiled regular
+expressions for sentence checking, sets to handle more irregular words etc.
+
+= Typical problems =
+
+== Encoding ==
+
+Python expressions (< Python 3.0) need explicit Unicode declaration for non-ASCII
+characters:
+
+fó -> bár # example
+
+is equivalent to the following rule (note u'string' instead of 'string'):
+
+fó -> = u'bár' # example
+
+== Pattern matching ==
+
+Repeated matching of a single rule continues after the end of the previous match
+(matches do not overlap), so instead of general multiword patterns, like
+
+(\w+) (\w+) <- some_check(\1, \2) -> \1, \2 # foo
+
+use
+
+(\w+) <- some_check(\1, word(1)) -> \1, # foo
+
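
The effect described under "Pattern matching" in doc/syntax.txt can be reproduced with plain Python re calls. The sketch below is only an illustration, not Lightproof code: next_word() is a hypothetical stand-in for the word(1) function, and the sample text is made up.

```python
import re

text = "alpha beta gamma delta"

# A two-word pattern consumes both words, so the scan resumes after "beta"
# and the pair ("beta", "gamma") is never examined.
pairs = [m.groups() for m in re.finditer(r"(\w+) (\w+)", text)]


def next_word(text, end):
    """Hypothetical stand-in for word(1): the word following position end."""
    m = re.match(r"\s+(\S+)", text[end:])
    return m.group(1) if m else None


# A one-word pattern visits every word; the following word is looked up
# separately, so every adjacent pair is seen.
singles = [(m.group(1), next_word(text, m.end()))
           for m in re.finditer(r"(\w+)", text)]

print(pairs)
print(singles)
```

With the two-word pattern only two of the three adjacent pairs are checked, which is why the single-word pattern combined with word(1) is the safer form.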