Overview

SpellCor is a spell correction tool coded fully with Python, inspired by JamSpell (written in C++ with Python bindings via swig). While JamSpell is efficient due to its C++ implementation, it’s not convenient for developers who want to extend or customize features. Considering the prevalence of Python in the NLP community, SpellCor provides a pure Python alternative with greater flexibility.

Why SpellCor?

  • Fully Python — Includes a Python n-gram language model, no C++ dependency
  • Extensible — Easily plug in your own language model (e.g., NN models) with a simple interface
  • Dictionary filtering — Filter candidate words using your own clean dictionary
  • Pip-installablepip install spellcor

Installation

pip install spellcor

Train your own n-gram language model:

python -m spellcor data.txt output_model_dir

Basic Usage

import spellcor

checker = spellcor.SpellChecker()

# Load language model
model_name = "BaseLanguageModel"
model_path = "./tmp/nglm.bin"
checker.load_lang_model(model_name, model_path)

# Sentence correction
checker.fix_sentence("here are some Questino , I am pyspell checkre")
# → "here are some Question , I am spell checker"

# Fix by position
checker.fix_pos(["here", "are", "some", "Questino"], 3)
# → ['Question', 'Questino']

Extending Language Models

Create your own model by subclassing AbstractLanguageModel:

import spellcor

@spellcor.register_lang_model("NewModel")
class MyModel(spellcor.AbstractLanguageModel):
    def __init__(self, model_path, **kwargs):
        super(MyModel, self).__init__("NewModel")
        self.language_model = None
        self.load_lang_model(model_path)

    def get_score(self, sentence, word_start_index, word_end_index):
        pass

    def is_word(self, word):
        return self.language_model.is_word(word)

    def load_lang_model(self, model_path):
        pass

    def word_freq(self, word):
        pass

Dictionary Filtering

# Load a clean dictionary to filter candidates
checker.load_valid_word_dict(dict_path)

# Or filter candidate words separately
checker.load_valid_candidate_dict(dict_path)

# Toggle dictionary usage
checker.use_valid_word_dict = False
checker.use_valid_candidate_dict = True

Citation

If you use SpellCor in your research, please cite:

@software{spellcor,
  author = {Xu, Shiting},
  title = {SpellCor: A Fully Python Spell Correction Tool},
  url = {https://github.com/xvshiting/SpellCor}
}