Language Detection

CoffeeHouse uses multiple methods of language detection to predict the language of the given input. Although this approach is generally accurate, it is prone to false predictions on very short input (1-2 words). If you are working with one or more sets of input from the same source, we recommend that you use generalization.

The size of your input is limited by your subscription, and larger input can take longer to process.

| Parameter Name | Default Value | Required | Description |
| input | NULL | True | The given input to process; the size of the input is limited by your subscription |
| language | en | False | The language the input is based in; if the given language is not English, CoffeeHouse will attempt to translate the input to English before processing |
| sentence_split | 0 | False | Splits the results into sentences |
| generalize | 0 | False | Generalize the results using a generalization table |
| generalization_size | NULL | False | The size of the generalization table to create |
| generalization_id | NULL | False | The ID of the generalization table to use |
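
As a rough illustration of how these parameters fit together, the Python sketch below assembles a request payload. The helper name `build_language_detection_params` is hypothetical, and sending the flags as 0/1 integers is an assumption based on the default values listed above. The endpoint URL and authentication are not described in this section, so the sketch stops at building the payload.

```python
from typing import Any, Dict, Optional


def build_language_detection_params(
    text: str,
    language: str = "en",
    sentence_split: bool = False,
    generalize: bool = False,
    generalization_size: Optional[int] = None,
    generalization_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble the documented parameters into a dict (hypothetical helper)."""
    if not text:
        raise ValueError("input is required")

    params: Dict[str, Any] = {
        "input": text,
        "language": language,
        # Assumption: boolean flags are sent as 0/1, mirroring the defaults.
        "sentence_split": int(sentence_split),
        "generalize": int(generalize),
    }
    # Optional parameters default to NULL, so they are omitted when unset.
    if generalization_size is not None:
        params["generalization_size"] = generalization_size
    if generalization_id is not None:
        params["generalization_id"] = generalization_id
    return params


params = build_language_detection_params(
    "But I must explain to you how all this mistaken idea was born.",
    sentence_split=True,
)
```

Omitting the unset optional parameters keeps the payload small; whether the API also accepts explicit null values is not stated here.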

Example Success Response (Without sentence_split)

The results have been shortened for demonstration purposes.
{
  "success": true,
  "response_code": 200,
  "results": {
    "text": "But I must explain to you how all this mistaken idea of denouncing pleasure and praising pain was born and I will give you a complete account of the system. expound the actual teachings of the great explorer of the truth, the master-builder of human happiness.",
    "language_detection": {
      "language": "en",
      "prediction": 79.12586236633693,
      "predictions": {
        "en": 79.12586236633693,
        "av": 0.2295123502269184,
        "km": 0.17199043739382588,
        "la": 0.07712329250350208,
        "ps": 0.06653515230513536,
        "sn": 0.05174984055836619,
        ...
        "qu": 1.754251991361973e-5,
        "bo": 1.4324942825786477e-5,
        "oc": 1.2381976573822845e-5,
        "sc": 1.2317664747116796e-5,
        "gn": 1.2147578650476936e-5
      }
    },
    "generalization": null
  }
}

Example Success Response (With sentence_split)

The results have been shortened for demonstration purposes.
{
  "success": true,
  "response_code": 200,
  "results": {
    "text": "But I must explain to you how all this mistaken idea of denouncing pleasure and praising pain was born and I will give you a complete account of the system. expound the actual teachings of the great explorer of the truth, the master-builder of human happiness.",
    "language_detection": {
      "language": "en",
      "prediction": 1.0322307389278638,
      "predictions": {
        "en": 1.0322307389278638,
        "my": 0.017448664741756926,
        "th": 0.012424069689276156,
        "sn": 0.008830834824679662,
        "lo": 0.005382460092292741,
        "km": 0.004873692501326716,
        "te": 0.004152841940107997,
        ...
        "wa": 4.6606214204199184e-7,
        "rn": 4.171656577970209e-7,
        "wo": 4.0380276295881717e-7,
        "qu": 2.797880597721182e-7,
        "oc": 2.5404668992172175e-7,
        "sc": 1.5684318758703247e-7
      }
    },
    "sentences": [
      {
        "text": "But I must explain to you how all this mistaken idea of denouncing pleasure and praising pain was born and I will give you a complete account of the system.",
        "offset_begin": 0,
        "offset_end": 156,
        "language_detection": {
          "language": "en",
          "prediction": 1.0322307389278638,
          "predictions": {
            "en": 1.0322307389278638,
            "my": 0.017448664741756926,
            "th": 0.012424069689276156,
            "sn": 0.008830834824679662,
            "lo": 0.005382460092292741,
            "km": 0.004873692501326716,
            ...
            "wo": 4.0380276295881717e-7,
            "qu": 2.797880597721182e-7,
            "oc": 2.5404668992172175e-7,
            "sc": 1.5684318758703247e-7
          }
        }
      },
      {
        "text": "expound the actual teachings of the great explorer of the truth, the master-builder of human happiness.",
        "offset_begin": 157,
        "offset_end": 260,
        "language_detection": {
          "language": "en",
          "prediction": 94.6400315850848,
          "predictions": {
            "en": 94.6400315850848,
            "my": 3.6607607565003675,
            "th": 2.608531731357951,
            "sn": 1.8952699435678055,
            "km": 1.2939182517338264,
            "lo": 1.1978972756304407,
            ...
            "rn": 9.777783551534327e-5,
            "qu": 6.745287005909427e-5,
            "oc": 6.534677168446407e-5,
            "wo": 5.901179558268886e-5,
            "sc": 3.617622901504524e-5,
            "wa": 1.985058409631666e-6
          }
        }
      }
    ],
    "generalization": null
  }
}

Example Success Response (With generalization)

The results have been shortened for demonstration purposes.
{
  "success": true,
  "response_code": 200,
  "results": {
    "text": "But I must explain to you how all this mistaken idea of denouncing pleasure and praising pain was born and I will give you a complete account of the system. expound the actual teachings of the great explorer of the truth, the master-builder of human happiness.",
    "language_detection": {
      "language": "en",
      "prediction": 79.12586236633693,
      "predictions": {
        "en": 79.12586236633693,
        "av": 0.2295123502269184,
        "km": 0.17199043739382588,
        "la": 0.07712329250350208,
        "ps": 0.06653515230513536,
        ...
        "bo": 1.4324942825786477e-5,
        "oc": 1.2381976573822845e-5,
        "sc": 1.2317664747116796e-5,
        "gn": 1.2147578650476936e-5
      }
    },
    "generalization": {
      "id": "aeb19b65f4af8fd29f158c89648d0468292f73f6708e4cc33e4db8244409358f",
      "size": 20,
      "top_label": "en",
      "top_probability": 79.12586236633693,
      "probabilities": [
        {
          "label": "en",
          "calculated_probability": 79.12586236633693,
          "current_pointer": 0,
          "probabilities": [
            79.12586236633693
          ]
        },
        {
          "label": "av",
          "calculated_probability": 0.2295123502269184,
          "current_pointer": 0,
          "probabilities": [
            0.2295123502269184
          ]
        },
        {
          "label": "km",
          "calculated_probability": 0.17199043739382588,
          "current_pointer": 0,
          "probabilities": [
            0.17199043739382588
          ]
        },
        ...
        {
          "label": "bo",
          "calculated_probability": 1.4324942825786477e-5,
          "current_pointer": 0,
          "probabilities": [
            1.4324942825786477e-5
          ]
        },
        {
          "label": "oc",
          "calculated_probability": 1.2381976573822845e-5,
          "current_pointer": 0,
          "probabilities": [
            1.2381976573822845e-5
          ]
        },
        {
          "label": "sc",
          "calculated_probability": 1.2317664747116796e-5,
          "current_pointer": 0,
          "probabilities": [
            1.2317664747116796e-5
          ]
        }
      ]
    }
  }
}

Response Structure

| Name | Type | Description |
| text | string | The text of the input |
| language_detection | LanguageDetection | LanguageDetection object that represents the language detection values |
| sentences | LanguageDetectionSentence[] | Array of sentence splits containing LanguageDetection values; this is returned if you use sentence_split |
| generalization | Generalization or null | The generalization results if generalization is used |

LanguageDetection Object Structure

| Name | Type | Description |
| language | string | The language prediction value with the highest score |
| prediction | float | The prediction value of the language value |
| predictions | array | String:Float combination of all the predictions; includes ISO-639-1 language codes as keys |
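
To show how the `predictions` map might be consumed, here is a minimal Python sketch that ranks the scores and keeps the top few. The helper name `top_predictions` is hypothetical. This section does not say whether `predictions` always arrives sorted, so the sketch sorts explicitly rather than relying on key order.

```python
from typing import Dict, List, Tuple


def top_predictions(language_detection: Dict, limit: int = 3) -> List[Tuple[str, float]]:
    """Return the highest-scoring (language, score) pairs, best first."""
    predictions = language_detection["predictions"]
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Values copied from the example response without sentence_split,
# deliberately listed out of order to exercise the sort.
language_detection = {
    "language": "en",
    "prediction": 79.12586236633693,
    "predictions": {
        "km": 0.17199043739382588,
        "en": 79.12586236633693,
        "av": 0.2295123502269184,
        "la": 0.07712329250350208,
    },
}

best = top_predictions(language_detection, limit=2)
```

With these values, the first ranked entry matches the `language` and `prediction` fields of the same object.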

LanguageDetectionSentence Object Structure

| Name | Type | Description |
| text | string | The text of the sentence |
| offset_begin | int | The character offset begin of the sentence |
| offset_end | int | The character offset end of the sentence |
| language_detection | LanguageDetection | LanguageDetection object that represents the language detection values for this sentence |
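
In the sentence_split example above, the offsets (0 to 156 and 157 to 260) line up with half-open character ranges into the original `text`, where `offset_end` is exclusive. The Python sketch below recovers each sentence by slicing under that assumption. It is inferred from the single example response rather than from a stated guarantee, and the helper name `slice_sentences` is hypothetical.

```python
from typing import Dict, List

# The input text from the example response.
text = (
    "But I must explain to you how all this mistaken idea of denouncing "
    "pleasure and praising pain was born and I will give you a complete "
    "account of the system. expound the actual teachings of the great "
    "explorer of the truth, the master-builder of human happiness."
)

# Offsets copied from the example response (other fields omitted).
sentences = [
    {"offset_begin": 0, "offset_end": 156},
    {"offset_begin": 157, "offset_end": 260},
]


def slice_sentences(text: str, sentences: List[Dict[str, int]]) -> List[str]:
    """Cut each sentence out of the text, treating offset_end as exclusive."""
    return [text[s["offset_begin"]:s["offset_end"]] for s in sentences]


parts = slice_sentences(text, sentences)
```

The single space between the two sentences sits at index 156 and belongs to neither slice.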

Supported Languages

These are the languages that Language Detection can detect.
Label
en
my
th
sn
lo
km
te
si
or
la
gu
lv
yi
sk
mn
av
zu
ceb
xh
ja
cs
as
ka
ps
ml
ms
ne
mi
yo
nl
da
bn
ko
rw
id
de
ur
is
fa
vi
ig
gd
zh-cn
tl
zh-tw
su
af
ga
haw
tr
hi
tg
hmn
am
el
ku
no
sv
pt
hu
pl
mg
lb
so
fy
ru
kk
et
ny
hy
nb
uk
be
bg
om
ro
sw
st
sl
gl
ca
sa
ar
cy
mr
pa
ta
tn
bs
kn
mt
ky
az
he
fi
mk
ug
sd
ba
fo
uz
lt
kv
sm
hr
es
ht
cv
sq
ha
ia
tk
eu
nn
it
jv
co
fr
sr
ce
to
br
rm
gv
li
eo
kw
se
dv
ln
lg
kl
an
ay
tt
vo
gn
ie
bo
os
nv
io
wa
rn
wo
qu
oc
sc

Generalization Labels

This method supports generalization and uses the supported languages as labels. For more information on how generalization works, see Generalization - Introduction.
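
For input that arrives in several pieces from the same source, one plausible workflow is to read the generalization table ID from a response and pass it as `generalization_id` on the next request. The Python sketch below illustrates this. The helper names are hypothetical, and combining `generalize` set to 1 with `generalization_id` is an assumption drawn from the parameter descriptions; the Generalization documentation is the authority on the exact behavior.

```python
from typing import Any, Dict, Optional


def extract_generalization_id(response: Dict[str, Any]) -> Optional[str]:
    """Return the generalization table ID from a response, or None if absent."""
    generalization = response.get("results", {}).get("generalization")
    if generalization is None:
        return None
    return generalization["id"]


def follow_up_params(response: Dict[str, Any], new_input: str) -> Dict[str, Any]:
    """Build parameters for the next request, reusing the table if one exists."""
    # Assumption: flags are sent as 0/1, mirroring the documented defaults.
    params: Dict[str, Any] = {"input": new_input, "generalize": 1}
    table_id = extract_generalization_id(response)
    if table_id is not None:
        params["generalization_id"] = table_id
    return params


# Trimmed version of the example response with generalization.
previous_response = {
    "success": True,
    "response_code": 200,
    "results": {
        "generalization": {
            "id": "aeb19b65f4af8fd29f158c89648d0468292f73f6708e4cc33e4db8244409358f",
            "size": 20,
            "top_label": "en",
        }
    },
}

next_params = follow_up_params(previous_response, "Another message from the same source")
```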