######################################################################
    Text::Language::Guess 0.02
######################################################################

NAME
    Text::Language::Guess - Trained module to guess a document's language

SYNOPSIS
        use Text::Language::Guess;

        my $guesser = Text::Language::Guess->new();
        my $lang = $guesser->language_guess("bill.txt");

            # prints 'en'
        print "Best fit: $lang\n";

DESCRIPTION
    Text::Language::Guess guesses a document's language. Its implementation
    is simple: Using "Text::ExtractWords" and "Lingua::StopWords" from CPAN,
    it determines how many of the known stopwords the document contains for
    each language supported by "Lingua::StopWords".

    Each word in the document recognized as a stopword of a particular
    language scores one point for that language.
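
    The scoring rule above can be sketched in a few lines of plain Perl.
    This is only an illustration, not the module's actual code: the
    stopword lists are hypothetical and heavily abbreviated, the helper
    "score_text" is invented for this example, and words are extracted
    with a simple regular expression rather than "Text::ExtractWords".

```perl
use strict;
use warnings;

# Hypothetical, heavily abbreviated stopword lists (assumption); the real
# module obtains its lists from Lingua::StopWords.
my %stopwords = (
    en => { map { $_ => 1 } qw(the is and of not this but are as) },
    de => { map { $_ => 1 } qw(der die das und ist nicht aber als) },
);

# Count, per language, how many words of $text are stopwords of that
# language. Every matching word scores one point.
sub score_text {
    my ($text) = @_;
    my %score;
    for my $word (map { lc($_) } $text =~ /\w+/g) {
        for my $lang (keys %stopwords) {
            $score{$lang}++ if $stopwords{$lang}{$word};
        }
    }
    return \%score;
}

my $scores = score_text("This is not the end, aber das ist gut");
printf "%s => %d\n", $_, $scores->{$_} for sort keys %$scores;
# de => 3
# en => 4
```

    A word that is a stopword in several languages scores a point for
    each of them in this sketch, so unrelated languages can pick up a
    few points as well.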

    The "language_guess()" function takes a document as a parameter and
    returns the abbreviation of the language that it is most likely written
    in.

    Supported Languages:

    *   English (en)

    *   French (fr)

    *   Spanish (es)

    *   Portuguese (pt)

    *   Italian (it)

    *   German (de)

    *   Dutch (nl)

    *   Swedish (sv)

    *   Norwegian (no)

    *   Danish (da)

  Methods
    "new()"
        Initializes the guesser with all stopwords available for all
        supported languages. If "new" has been called before, subsequent
        calls will return the same precomputed stoplist map, avoiding
        collecting all stopwords again (as long as the number of languages
        stays the same, see next paragraph).

        You can limit the set of languages searched by specifying the
        "languages" parameter and passing it an array ref of the wanted
        language codes:

                # Only guess between English and German
            $guesser = Text::Language::Guess->new(languages => ['en', 'de']);
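
        The reuse of a precomputed stoplist map could look roughly like
        the following sketch. It is an assumption-laden illustration and
        not the module's code: the helpers "build_stopword_map" and
        "stopword_map_for" are hypothetical, and the cache is keyed on
        the sorted language list, which is one plausible reading of "as
        long as the number of languages stays the same".

```perl
use strict;
use warnings;

my %cache;          # language-list key => stopword map
my $builds = 0;     # counts how often a map is actually built

# Hypothetical stand-in for collecting stopwords from Lingua::StopWords.
sub build_stopword_map {
    my (@langs) = @_;
    $builds++;
    return { map { ($_ => {}) } @langs };
}

# Return the cached map for this set of languages, building it only once.
sub stopword_map_for {
    my (@langs) = @_;
    my $key = join ',', sort @langs;
    $cache{$key} ||= build_stopword_map(@langs);
    return $cache{$key};
}

my $first  = stopword_map_for(qw(en de));
my $second = stopword_map_for(qw(de en));   # same set, served from cache
my $third  = stopword_map_for(qw(en));      # different set, built anew
print "maps built: $builds\n";              # maps built: 2
```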

    "language_guess($textfile)"
        Reads in a text file, extracts all words, scores them using the
        stopword maps and returns a single two-letter string indicating the
        language the document is most likely written in.

    "language_guess_string($string)"
        Just like "language_guess", but takes a string instead of a file
        name.

    "scores($textfile)"
        Like "language_guess($textfile)", but returns a reference to a
        hash mapping language codes (e.g. 'en') to their scores. The entry
        with the highest score is the most likely language.

    "scores_string($string)"
        Like "scores", but takes a string instead of a file name.
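
    Given a score hash of the kind returned by "scores" or
    "scores_string", the most likely language is the key with the highest
    value. A minimal sketch, assuming a hypothetical helper
    "best_language" that is not part of the module; the alphabetical
    tie-break is likewise an assumption, since the documentation does not
    say how ties are resolved:

```perl
use strict;
use warnings;

# Return the language code with the highest score, or undef if the hash
# is empty. Ties are broken alphabetically (an assumption of this sketch).
sub best_language {
    my ($scores) = @_;
    my ($best) = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b }
                 keys %$scores;
    return $best;
}

# Example score hash in the shape described above.
my $scores = { pt => 1, en => 6, fr => 1, nl => 1 };
print best_language($scores), "\n";    # prints 'en'
```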

EXAMPLES
        use Text::Language::Guess;

            # Guess language in a string instead of a file
        my $guesser = Text::Language::Guess->new();
        my $lang = $guesser->language_guess_string("Make love not war");
            # 'en'

            # Limit number of languages to choose from
        $guesser = Text::Language::Guess->new(languages => ['da', 'nl']);
        $lang = $guesser->language_guess_string(
                       "Which is closer to English, danish or dutch?");
            # 'nl'

            # Show different scores
        $guesser = Text::Language::Guess->new();
        my $scores = $guesser->scores_string(
            "This text is English, but other languages are scoring as well");
        use Data::Dumper;
        print Dumper($scores);

            # $VAR1 = {
            #   'pt' => 1,
            #   'en' => 6,
            #   'fr' => 1,
            #   'nl' => 1
            # };

LEGALESE
    Copyright 2005 by Mike Schilli, all rights reserved. This program is
    free software, you can redistribute it and/or modify it under the same
    terms as Perl itself.

AUTHOR
    2005, Mike Schilli <cpan@perlmeister.com>