simplebayes package

Module contents

class simplebayes.SimpleBayes(tokenizer: Callable[[str], List[str]] | None = None, alpha: float = 0.0, language: str = 'english', remove_stop_words: bool = False)[source]

Bases: object

A memory-based naive Bayesian text classifier with optional persistence.
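To make the moving parts concrete, here is a minimal, self-contained sketch of how such a classifier works. The whitespace tokenization, add-one smoothing, and per-category log-probability sums are assumptions for illustration, not the library's actual implementation:

```python
import math
from collections import Counter, defaultdict

# Per-category token tallies: tallies[category][token] -> count.
tallies = defaultdict(Counter)

def train(category, text):
    tallies[category].update(text.lower().split())

def classify(text):
    tokens = text.lower().split()
    scores = {}
    for category, counts in tallies.items():
        total = sum(counts.values())
        # Add-one smoothing so unseen tokens don't zero out a category.
        scores[category] = sum(
            math.log((counts[t] + 1) / (total + len(counts)))
            for t in tokens
        )
    return max(scores, key=scores.get) if scores else None

train("spam", "buy cheap pills now")
train("ham", "lunch meeting tomorrow")
print(classify("cheap pills"))  # → spam
```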

calculate_bayesian_probability(cat: str, token_score: float, token_tally: float) float[source]

Calculates the Bayesian probability for a given token/category pair

Parameters:
  • cat (str) – The category we’re scoring for this token

  • token_score (float) – The tally of this token for this category

  • token_tally (float) – The tally total for this token from all categories

Returns:

Bayesian probability

Return type:

float
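Given the parameters above, one plausible formulation (an assumption under a uniform category prior, not the library's confirmed formula) is the category's share of the token's total tally:

```python
def bayesian_probability(token_score, token_tally):
    """Sketch: P(category | token) under a uniform category prior,
    i.e. this category's fraction of the token's total tally.
    Guards against a zero tally."""
    return token_score / token_tally if token_tally else 0.0

print(bayesian_probability(3.0, 4.0))  # → 0.75
```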

calculate_category_probability() None[source]

Caches the individual probabilities for each category

classify(text: str) str | None[source]

Chooses the highest-scoring category for a sample of text

Parameters:

text (str) – sample text to classify

Returns:

the “winning” category

Return type:

str
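Assuming classify() picks the maximum of score()'s output, the final step amounts to an argmax over the score dict; a sketch:

```python
def pick_winner(scores):
    """Choose the highest-scoring category from a score()-style dict;
    None when no categories have been trained."""
    return max(scores, key=scores.get) if scores else None

print(pick_winner({"spam": -2.8, "ham": -3.6}))  # → spam
print(pick_winner({}))  # → None
```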

classify_result(text: str) ClassificationResult[source]

Returns structured classification output including score.

classmethod count_token_occurrences(words: List[str]) Dict[str, int][source]

Creates a word/count mapping for a given list of tokens

Parameters:

words (list) – full list of all tokens, non-unique

Returns:

key/value pairs of words and their counts in the list

Return type:

dict
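In modern Python this word/count mapping is exactly what collections.Counter produces; for example:

```python
from collections import Counter

# Counter reduces a non-unique token list to token -> count pairs.
tokens = ["buy", "now", "buy"]
counts = dict(Counter(tokens))
print(counts)  # → {'buy': 2, 'now': 1}
```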

flush() None[source]

Deletes all tokens & categories

get_summaries() Dict[str, CategorySummary][source]

Returns per-category summary details.

load(source) None[source]

Loads classifier state from a text stream.

load_from_file(absolute_path: str = '') None[source]

Loads classifier state from a persisted model file.

classmethod normalize_category(category: str | None) str[source]

Validates and normalizes category input.
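The docstring does not specify the rules; a hypothetical sketch of what validation and normalization could look like (the behavior below is an assumption, not the library's documented contract):

```python
def normalize_category(category):
    """Hypothetical sketch: reject missing or blank input, then fold
    case and strip surrounding whitespace."""
    if category is None or not category.strip():
        raise ValueError("category must be a non-empty string")
    return category.strip().lower()

print(normalize_category(" Spam "))  # → spam
```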

save(destination) None[source]

Saves classifier state to a text stream.

save_to_file(absolute_path: str = '') None[source]

Saves classifier state to file using atomic replacement.
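The atomic-replacement pattern mentioned here can be sketched with tempfile and os.replace (a generic illustration, not the library's code):

```python
import os
import tempfile

def save_atomically(path, data):
    """Write the full state to a temporary file in the same directory,
    then swap it over the old file in one step, so a crash mid-write
    never leaves a truncated model on disk."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(data)
        os.replace(tmp_path, path)  # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)
        raise
```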

score(text: str) Dict[str, float][source]

Scores a sample of text

Parameters:

text (str) – sample text to score

Returns:

dict of scores per category

Return type:

dict
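The constructor's alpha parameter suggests additive smoothing during scoring; a sketch of one plausible per-category formula (an assumption about how alpha is applied, not the library's exact scoring):

```python
import math
from collections import Counter

def score_category(tokens, counts, alpha=1.0):
    """Sum of log-probabilities for tokens under one category's tallies,
    with additive (alpha) smoothing. A sketch; the library's actual
    formula may differ."""
    total = sum(counts.values())
    vocab = len(counts)
    return sum(
        math.log((counts[t] + alpha) / (total + alpha * vocab))
        for t in tokens
    )

spam = Counter({"buy": 2, "cheap": 1})
# total=3, vocab=2, alpha=1 → P("buy") = (2+1)/(3+2) = 3/5
print(score_category(["buy"], spam))
```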

tally(category: str) int[source]

Gets the tally for a requested category

Parameters:

category (str) – The category we want a tally for

Returns:

tally for a given category

Return type:

int

classmethod tokenize_text(text: str) List[str][source]

Default tokenize method; can be overridden

Parameters:

text (str) – the text we want to tokenize

Returns:

list of tokenized text

Return type:

list
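A sketch of what a simple default tokenizer might do (the library's actual default may differ):

```python
import re

def tokenize(text):
    """Lowercase the sample and keep runs of letters, digits, and
    apostrophes; everything else is a separator."""
    return re.findall(r"[a-z0-9']+", text.lower())

print(tokenize("Don't panic, buy 2 now!"))
```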

train(category: str, text: str) None[source]

Trains a category with a sample of text

Parameters:
  • category (str) – the name of the category we want to train

  • text (str) – the text we want to train the category with

untrain(category: str, text: str) None[source]

Untrains a category with a sample of text

Parameters:
  • category (str) – the name of the category we want to untrain

  • text (str) – the text we want to untrain the category with
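Untraining can be sketched as the inverse of training: subtract the sample's token counts from the category tally, never going below zero (the clamping behavior is an assumption):

```python
from collections import Counter

counts = Counter()

def train(text):
    counts.update(text.lower().split())

def untrain(text):
    """Subtract the sample's counts; Counter.subtract can go negative,
    so clamp by dropping any token that reaches zero or below."""
    counts.subtract(text.lower().split())
    for token in [t for t, n in counts.items() if n <= 0]:
        del counts[token]

train("buy now")
train("buy cheap")
untrain("buy cheap")
print(counts)  # → Counter({'buy': 1, 'now': 1})
```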