simplebayes package
Submodules
Module contents
- class simplebayes.SimpleBayes(tokenizer: Callable[[str], List[str]] | None = None, alpha: float = 0.0, language: str = 'english', remove_stop_words: bool = False)
Bases: object
A memory-based, optional-persistence naïve Bayesian text classifier.
- calculate_bayesian_probability(cat: str, token_score: float, token_tally: float) → float
Calculates the Bayesian probability for a given token and category.
- Parameters:
cat (str) – The category we’re scoring for this token
token_score (float) – The tally of this token for this category
token_tally (float) – The tally total for this token from all categories
- Returns:
bayesian probability
- Return type:
float
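For illustration, the calculation described above can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not the library's confirmed implementation: the name `bayes_probability`, the explicit `category_prior` argument (the documented method takes `cat` and presumably looks up a cached per-category probability), and the fallback to 0.0 on a zero denominator are all hypothetical.

```python
def bayes_probability(
    category_prior: float, token_score: float, token_tally: float
) -> float:
    """Hypothetical sketch of a Bayes-style token/category weighting."""
    # Guard against tokens that were never seen in any category.
    if token_tally == 0:
        return 0.0

    # Fraction of this token's occurrences that fall inside the category.
    in_category = token_score / token_tally
    # Fraction of this token's occurrences that fall outside the category.
    out_of_category = (token_tally - token_score) / token_tally

    # Weight each fraction by the (assumed) prior of the category and of
    # its complement, then normalize.
    numerator = in_category * category_prior
    denominator = numerator + out_of_category * (1.0 - category_prior)
    return numerator / denominator if denominator != 0.0 else 0.0
```

With a prior of 0.5, a token seen 3 times in the category out of 4 times overall would get a weight of 0.75 under this sketch.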
- calculate_category_probability() → None
Caches the individual probabilities for each category
- classify(text: str) → str | None
Chooses the highest scoring category for a sample of text
- Parameters:
text (str) – sample text to classify
- Returns:
the “winning” category
- Return type:
str | None
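Choosing the highest-scoring category can be sketched as a selection over a dictionary of per-category scores, such as the one `score()` is documented to return. The helper name `pick_category` is hypothetical, and returning `None` for an empty score dictionary is an assumption suggested by the `str | None` return annotation, not confirmed behavior.

```python
from typing import Dict, Optional


def pick_category(scores: Dict[str, float]) -> Optional[str]:
    """Hypothetical helper: return the key with the largest score."""
    # Assumption: with nothing to compare, there is no winning category.
    if not scores:
        return None
    # max() over the keys, ranked by their associated score.
    return max(scores, key=scores.get)
```

Callers should be prepared to handle a `None` result rather than assuming a category name is always returned.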
- classify_result(text: str) → ClassificationResult
Returns structured classification output including score.
- classmethod count_token_occurrences(words: List[str]) → Dict[str, int]
Builds a mapping from each word to the number of times it occurs in a list of tokens
- Parameters:
words (list) – full list of all tokens, non-unique
- Returns:
key/value pairs of words and their counts in the list
- Return type:
dict
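The word/count mapping described above can be sketched with the standard library's `collections.Counter`. The function name `count_occurrences` is hypothetical, and whether the library itself uses `Counter` is not stated here.

```python
from collections import Counter
from typing import Dict, List


def count_occurrences(words: List[str]) -> Dict[str, int]:
    """Hypothetical equivalent: map each token to its occurrence count."""
    # Counter tallies hashable items; dict() converts it to a plain dict.
    return dict(Counter(words))
```

For example, the token list `["cat", "dog", "cat"]` yields `{"cat": 2, "dog": 1}`.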
- load_from_file(absolute_path: str = '') → None
Loads classifier state from a persisted model file.
- classmethod normalize_category(category: str | None) → str
Validates and normalizes category input.
- save_to_file(absolute_path: str = '') → None
Saves classifier state to file using atomic replacement.
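Atomic replacement generally means writing to a temporary file in the same directory and then renaming it over the target, so readers never observe a half-written file. The sketch below shows that general pattern only. The function name `save_state_atomically`, the JSON encoding, and the shape of `state` are assumptions for illustration; the library's actual on-disk format is not described here.

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_state_atomically(state: Dict[str, Any], path: str) -> None:
    """Hypothetical sketch: write JSON to a temp file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target so
    # that the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk first
        # os.replace overwrites the destination in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process is interrupted before the rename, the previous file at `path` is left untouched.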
- score(text: str) → Dict[str, float]
Scores a sample of text
- Parameters:
text (str) – sample text to score
- Returns:
dict of scores per category
- Return type:
dict
- tally(category: str) → int
Gets the tally for a requested category
- Parameters:
category (str) – The category we want a tally for
- Returns:
tally for a given category
- Return type:
int
- classmethod tokenize_text(text: str) → List[str]
Default tokenize method; can be overridden
- Parameters:
text (str) – the text we want to tokenize
- Returns:
list of tokenized text
- Return type:
list
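Because the default tokenizer can be overridden, a replacement only needs to match the `Callable[[str], List[str]]` shape shown in the constructor signature. The following is a minimal sketch of one such callable; the name `lowercase_word_tokenizer` and its splitting rules are hypothetical and do not describe the library's default behavior.

```python
import re
from typing import List


def lowercase_word_tokenizer(text: str) -> List[str]:
    """Hypothetical tokenizer: lowercase, keep letters, digits, apostrophes."""
    # Lowercasing first lets a simple character class cover all ASCII words.
    return re.findall(r"[a-z0-9']+", text.lower())
```

A function like this could be supplied through the constructor's `tokenizer` parameter, for example `SimpleBayes(tokenizer=lowercase_word_tokenizer)`.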