Package nltk_lite :: Package contrib :: Module langid

Module langid


Sam Huston 2007

This module reproduces the experiments of the article "Evaluation of a language identification system for mono- and multilingual text documents" by Artemenko, O.; Mandl, T.; Shramko, M.; and Womser-Hacker, C., presented at Applied Computing 2006, the 21st Annual ACM Symposium on Applied Computing, 23-27 April 2006.

This implementation handles monolingual documents only; however, it covers a much larger range of languages than the article. In addition, three supervised classification methods are explored: cosine distance, Naive Bayes, and Spearman's rho.
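This page does not show how the three methods are implemented. The following is a minimal, self-contained sketch of each one, assuming that documents and languages are represented as character-bigram count profiles (as the fd variable suggests). The function names (train, classify, and so on), the add-one smoothing with a uniform prior for Naive Bayes, and the alphabetical tie-breaking for Spearman's rho are all assumptions, not taken from nltk_lite. The code uses cosine similarity. Cosine distance is 1 minus that similarity, so choosing the highest similarity is equivalent to choosing the smallest distance.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Overlapping character bigrams of a string, e.g. 'cat' -> ['ca', 'at']."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def train(samples):
    """Build one bigram-count profile per language from {language: text}."""
    return {lang: Counter(char_bigrams(text.lower())) for lang, text in samples.items()}


def cosine_similarity(p, q):
    """Cosine of the angle between two count vectors (1.0 = same direction)."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def spearman_rho(p, q):
    """Spearman's rho between the bigram rankings of two profiles.

    Both profiles are ranked over the union of their bigrams (unseen ones
    count as 0). Ties are broken alphabetically, so each ranking is a
    permutation of 1..n and the no-ties formula 1 - 6*sum(d^2)/(n(n^2-1)) holds.
    """
    vocab = set(p) | set(q)
    n = len(vocab)
    if n < 2:
        return 0.0

    def ranks(profile):
        order = sorted(vocab, key=lambda gram: (-profile[gram], gram))
        return {gram: r for r, gram in enumerate(order, start=1)}

    rp, rq = ranks(p), ranks(q)
    d2 = sum((rp[g] - rq[g]) ** 2 for g in vocab)
    return 1 - 6 * d2 / (n * (n * n - 1))


def naive_bayes_log_likelihood(doc, profile, vocab_size):
    """log P(doc | language) with add-one smoothing (uniform prior assumed)."""
    total = sum(profile.values())
    return sum(count * math.log((profile[gram] + 1) / (total + vocab_size))
               for gram, count in doc.items())


def classify(text, profiles, method="cosine"):
    """Return the language whose profile scores highest under `method`."""
    doc = Counter(char_bigrams(text.lower()))
    if method == "cosine":
        def score(prof):
            return cosine_similarity(doc, prof)
    elif method == "spearman":
        def score(prof):
            return spearman_rho(doc, prof)
    elif method == "naive_bayes":
        vocab_size = len(set(doc).union(*profiles.values()))

        def score(prof):
            return naive_bayes_log_likelihood(doc, prof, vocab_size)
    else:
        raise ValueError(f"unknown method: {method}")
    return max(profiles, key=lambda lang: score(profiles[lang]))


profiles = train({
    "English": "the cat and the dog " * 3,
    "French": "le chat et le chien " * 3,
})
for method in ("cosine", "spearman", "naive_bayes"):
    # Each line ends with: English French
    print(method, classify("The dog and the cat", profiles, method),
          classify("Le chien et le chat", profiles, method))
```

Real training data would be far larger than these toy strings. Spearman's rho compares only the *order* of bigram frequencies, so it is less sensitive to document length than the other two scores.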

Functions
 
run(classifier, training_data, gold_data)
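Only the signature of run is documented here. One plausible reading, which is an assumption not confirmed by the source, is that it trains the classifier on training_data and scores its predictions against gold_data. The hypothetical harness below (run_sketch, word_overlap_classifier, and the dict shapes are all illustrative inventions) shows that reading:

```python
def run_sketch(classifier, training_data, gold_data):
    """Hypothetical stand-in for run(): train, predict, report accuracy per language.

    Assumed (not documented) shapes:
      classifier    -- callable taking training_data, returning predict(text) -> label
      training_data -- dict {language label: training text}
      gold_data     -- dict {language label: list of test texts}
    """
    predict = classifier(training_data)
    accuracy = {}
    for lang, texts in gold_data.items():
        correct = sum(1 for text in texts if predict(text) == lang)
        accuracy[lang] = correct / len(texts) if texts else 0.0
    return accuracy


def word_overlap_classifier(training_data):
    """Toy classifier: pick the language sharing the most words with the text."""
    vocab = {lang: set(text.split()) for lang, text in training_data.items()}

    def predict(text):
        words = set(text.split())
        return max(vocab, key=lambda lang: len(words & vocab[lang]))

    return predict


training = {"English": "the cat and the dog", "French": "le chat et le chien"}
gold = {"English": ["the dog", "a cat"], "French": ["le chien", "un chat"]}
print(run_sketch(word_overlap_classifier, training, gold))
# {'English': 1.0, 'French': 1.0}
```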
Variables
  fd = detect.feature({"char-bigrams": lambda t: [string.join(t)...
  training_data = udhr.langs(['English-Latin1', 'French_Francais...
  gold_data = {}
Variables Details

fd

Value:
detect.feature({"char-bigrams": lambda t: [string.join(t)[n:n+2] for n in range(len(t)-1)]})
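The fd value relies on Python 2's string.join, which defaults to a single-space separator and was removed in Python 3. If t is a list of tokens, range(len(t)-1) counts tokens rather than characters of the joined string, so only the first few bigrams would be produced. The following Python 3 sketch assumes that t is a token list and that bigrams over the whole joined string were intended. It does not reproduce detect.feature itself, and the features dict is a hypothetical stand-in for the mapping passed to it:

```python
def char_bigram_feature(tokens):
    """Character bigrams of the tokens joined by single spaces (Python 3)."""
    # Python 2's string.join(t) joined with " " by default; str.join replaces it.
    text = " ".join(tokens)
    # Range over the joined string, not over len(tokens) (assumed intent).
    return [text[n:n + 2] for n in range(len(text) - 1)]


# Hypothetical stand-in for the mapping passed to detect.feature.
features = {"char-bigrams": char_bigram_feature}
print(features["char-bigrams"](["le", "chat"]))
# ['le', 'e ', ' c', 'ch', 'ha', 'at']
```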

training_data

Value:
udhr.langs(['English-Latin1', 'French_Francais-Latin1', 'Indonesian-Latin1', 'Zapoteco-Latin1'])