Provides the core functionality for spell-checking documents.
Package Specification
This package provides the interfaces for the notions of dictionary, edit distance, phonetic hash,
spell event and spell-check iterator. For most of these interfaces a default implementation
for English is provided. These implementations can be reused in custom dictionaries or
spell-check iterators, or replaced by more specialized algorithms for a particular group of languages.
Spell Check Engine
The central point to access the spell-checker functionality is the interface ISpellCheckEngine.
Implementations of this interface provide support for life-cycle management, registering and unregistering
dictionaries, changing the locale of the engine and creating a spell-checker for a specific language.
The following steps are needed to obtain a spell-checker for a specific language:
- Create an instance of ISpellCheckEngine. This package provides no default implementation,
since dictionary registration and loading are application-dependent. Usually, instances
of ISpellCheckEngine are implemented as singletons.
- Create the appropriate dictionaries that should be used during the spell-check process. All dictionaries that
can be registered with ISpellCheckEngine must implement the interface ISpellDictionary.
For this interface, an abstract implementation is provided in the class AbstractSpellDictionary.
Depending on the language of the words contained in this dictionary, custom algorithms for the phonetic hash
(IPhoneticHashProvider) and the edit distance (IPhoneticDistanceAlgorithm) should be implemented
and registered with the dictionary.
- Instances of spell-checkers can now be created by calling createSpellChecker(Locale), where the locale
denotes the language that the spell-checker should use while executing.
When requesting a new spell-checker with a different locale via createSpellChecker(Locale), the spell-checker is
reconfigured with the new dictionaries. More concretely, the old dictionary is unregistered and a new one registered for the
desired locale is associated with the spell-checker. If no such dictionary is available, no spell-checker is returned and
the locale of the engine is reset to its default locale.
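The locale fallback described above can be sketched as follows. This is only an illustration under stated assumptions: SketchEngine and SketchChecker are hypothetical stand-ins written for this example, plain word sets replace real dictionaries, and the real ISpellCheckEngine and ISpellChecker interfaces declare their own methods, which may differ.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a spell-checker bound to one word list.
final class SketchChecker {
    private final Set<String> words;

    SketchChecker(Set<String> words) {
        this.words = words;
    }

    boolean isCorrect(String word) {
        return words.contains(word.toLowerCase(Locale.ROOT));
    }
}

// Hypothetical stand-in for an engine; not the real ISpellCheckEngine.
final class SketchEngine {
    private final Locale defaultLocale;
    private final Map<Locale, Set<String>> dictionaries = new HashMap<>();
    private Locale currentLocale;

    SketchEngine(Locale defaultLocale) {
        this.defaultLocale = defaultLocale;
        this.currentLocale = defaultLocale;
    }

    void registerDictionary(Locale locale, Set<String> words) {
        dictionaries.put(locale, words);
    }

    void unregisterDictionary(Locale locale) {
        dictionaries.remove(locale);
    }

    Locale getLocale() {
        return currentLocale;
    }

    // Returns a checker for the requested locale. If no dictionary is
    // registered for it, returns null and resets the engine locale to
    // the default, mirroring the behaviour described in the text.
    SketchChecker createSpellChecker(Locale locale) {
        Set<String> words = dictionaries.get(locale);
        if (words == null) {
            currentLocale = defaultLocale;
            return null;
        }
        currentLocale = locale;
        return new SketchChecker(words);
    }
}
```

Callers of such an engine would need to check the result for null before using it, since a missing dictionary is reported by returning no spell-checker rather than by an exception.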
Dictionaries
Dictionaries are the data structures to hold word lists for a particular language. All implementations of dictionaries must
implement the interface ISpellDictionary. It provides support for life-cycle management as well as the facility to query
words from the list, add words to the list and get correction proposals for incorrectly spelt words.
This package provides a default implementation of a dictionary (AbstractSpellDictionary) that uses algorithms
suitable for English.
Every dictionary needs two kinds of algorithms to be plugged in:
- An edit distance algorithm: Edit distance algorithms implement the interface IPhoneticDistanceAlgorithm. The algorithm
is used to determine the similarity between two words. This package provides a default implementation for languages using the Latin alphabet (DefaultPhoneticDistanceAlgorithm).
The default algorithm uses the Levenshtein text edit distance.
- A hash algorithm: Phonetic hash providers implement the interface IPhoneticHashProvider. The purpose of
phonetic hashes is to have a representation of words that allows comparing them to other, similar words. This package provides a default
implementation which is convenient for Slavic and English languages. It uses the Double Metaphone algorithm published by
Lawrence Philips.
By plugging in custom implementations of one or both of these algorithms the abstract implementation AbstractSpellDictionary can
be customized to specific languages and alphabets.
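To make the notion of edit distance concrete, here is a minimal sketch of the classic Levenshtein distance. It is a generic textbook version written for this overview, not the code of DefaultPhoneticDistanceAlgorithm, and the class name LevenshteinSketch is hypothetical.

```java
// Illustrative only: a generic Levenshtein distance using two rolling rows.
final class LevenshteinSketch {

    // Returns the minimum number of single-character insertions, deletions
    // and substitutions needed to turn 'from' into 'to'.
    static int distance(String from, String to) {
        int[] previous = new int[to.length() + 1];
        int[] current = new int[to.length() + 1];

        // Turning the empty prefix into the first j characters costs j insertions.
        for (int j = 0; j <= to.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= from.length(); i++) {
            // Turning the first i characters into the empty prefix costs i deletions.
            current[0] = i;
            for (int j = 1; j <= to.length(); j++) {
                int substitution = previous[j - 1]
                        + (from.charAt(i - 1) == to.charAt(j - 1) ? 0 : 1);
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[to.length()];
    }
}
```

A dictionary could use such a distance to rank correction proposals, preferring candidates that are fewer edits away from the misspelt word.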
Spell Check Iterators
Instances of ISpellChecker are usually language-, locale- and medium-independent implementations and therefore need an input provider. The
interface ISpellCheckIterator serves this purpose by abstracting the tokenizing of text media to a simple iteration. The actual spell-check process
is launched by calling ISpellChecker#execute(ISpellCheckIterator). This method uses the indicated spell-check iterator to determine the
words that are to be spell-checked. This package provides no default implementation of a spell-check iterator.
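Since no default iterator is provided, the following sketch shows one way tokenizing could be reduced to simple iteration for plain strings. It is an assumption-laden example: WordIteratorSketch and its getBegin/getEnd accessors are hypothetical, it implements only java.util.Iterator rather than ISpellCheckIterator, and it relies on java.text.BreakIterator for word boundaries.

```java
import java.text.BreakIterator;
import java.util.Iterator;
import java.util.Locale;
import java.util.NoSuchElementException;

// Hypothetical word iterator over a plain string; not an ISpellCheckIterator.
final class WordIteratorSketch implements Iterator<String> {
    private final String text;
    private final BreakIterator boundaries;
    private int begin;           // begin index of the last word returned
    private int end;             // end index (exclusive) of the last word returned
    private int nextBegin = -1;  // -1 means no further word is available
    private int nextEnd = -1;

    WordIteratorSketch(String text, Locale locale) {
        this.text = text;
        this.boundaries = BreakIterator.getWordInstance(locale);
        this.boundaries.setText(text);
        advance();
    }

    // Moves to the next segment that starts with a letter, skipping
    // whitespace and punctuation segments.
    private void advance() {
        int start = boundaries.current();
        for (int stop = boundaries.next(); stop != BreakIterator.DONE;
                start = stop, stop = boundaries.next()) {
            if (Character.isLetter(text.codePointAt(start))) {
                nextBegin = start;
                nextEnd = stop;
                return;
            }
        }
        nextBegin = -1;
        nextEnd = -1;
    }

    @Override
    public boolean hasNext() {
        return nextBegin >= 0;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        begin = nextBegin;
        end = nextEnd;
        advance();
        return text.substring(begin, end);
    }

    int getBegin() {
        return begin;
    }

    int getEnd() {
        return end;
    }
}
```

An iterator for another medium, such as source code comments, would keep the same shape but replace the boundary detection with logic suited to that medium.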
Event Handling
To communicate the results of a spell-check pass, spell-checkers fire spell events that inform listeners about the status
of a particular word being spell-checked. Instances that are interested in receiving spell events must implement
the interface ISpellEventListener and register with the spell-checker before the spell-check process starts.
A spell event contains the following information:
- The word being spell-checked
- The begin index of the current word in the text medium
- The end index in the text medium
- A flag that indicates whether this word was found in one of the registered dictionaries
- A flag that indicates whether this word starts a new sentence
- The set of proposals if the word was not correctly spelt. This information is lazily computed.
Spell event listeners are free to handle the events in any way. However, listeners are not allowed to block during
the event handling unless the spell-checking process happens in another thread.
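The event contents and the lazy computation of proposals can be illustrated with a small sketch. All names here (SpellEventSketch, SpellEventListenerSketch, CollectingListener) are hypothetical and chosen for this example; the real spell event and ISpellEventListener types may expose different methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical event carrying the information listed above.
final class SpellEventSketch {
    private final String word;
    private final int begin;
    private final int end;
    private final boolean match;           // found in a registered dictionary
    private final boolean startsSentence;
    private final Supplier<List<String>> proposalSource;
    private List<String> proposals;        // stays null until first requested

    SpellEventSketch(String word, int begin, int end, boolean match,
            boolean startsSentence, Supplier<List<String>> proposalSource) {
        this.word = word;
        this.begin = begin;
        this.end = end;
        this.match = match;
        this.startsSentence = startsSentence;
        this.proposalSource = proposalSource;
    }

    String getWord() { return word; }
    int getBegin() { return begin; }
    int getEnd() { return end; }
    boolean isMatch() { return match; }
    boolean isStart() { return startsSentence; }

    // Computes the proposals on first access only; correctly spelt words
    // never trigger the (potentially expensive) proposal source.
    List<String> getProposals() {
        if (proposals == null) {
            proposals = match ? List.of() : proposalSource.get();
        }
        return proposals;
    }
}

// Hypothetical listener contract.
interface SpellEventListenerSketch {
    void handle(SpellEventSketch event);
}

// A listener that only records misspelt words and returns immediately,
// so it does not block the spell-check pass.
final class CollectingListener implements SpellEventListenerSketch {
    final List<SpellEventSketch> misspelt = new ArrayList<>();

    @Override
    public void handle(SpellEventSketch event) {
        if (!event.isMatch()) {
            misspelt.add(event);
        }
    }
}
```

Deferring the proposals behind a supplier means a listener that only underlines errors pays nothing for suggestions it never displays.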