Consumers are more likely to engage with virtual assistants that communicate in their preferred language. The Kore.ai XO Platform supports enabling multiple languages within an assistant without having to rebuild the definitions. The platform supports over 100 languages and you can choose to enable any of these languages for your assistant. You can start building with one language and enable additional languages as and when you need them.
This article takes you through the general context within which multilingual VAs work. It discusses language use in VA-user conversations and in NLU processes, language enablement options, as well as language detection and selection.
Building a Multilingual VA
If you want to build a Multilingual VA, there are a few points to keep in mind:
- There is a set of basic building blocks to a multilingual VA: the language in which it talks to users, the language in which you train it, and the process through which it detects and selects which languages to use. Please continue reading this article to learn more.
- You can create a new Virtual Assistant or add new languages to an existing one. Read more about managing languages here.
- Multilingual VAs have language-specific components and some features exhibit different behaviors compared to single language assistants. Read more about this here.
- Translation can be automated using pre-built translation services from providers such as Microsoft or Google, or using custom services, including any that you build in-house. Read here to learn more.
The Conversation (Bot) Language
Enabling a language requires you to train the model to understand the user’s input and present the responses in the user’s language. To achieve this, the platform allows you to choose a Conversation (Bot) Language and NLU Language for every language that you would like to enable. In most cases, the Conversation Language can be the same as the NLU Language.
The Conversation Language is the language in which users interact with the assistant. You can choose any of the over 100 supported languages as the Conversation Language and define the responses (prompts, messages, etc.) in it.
You can also use the automatic response translation feature when you or your team do not have expertise in the user’s language. You can write the responses in your preferred language and the platform will automatically translate them to the user’s language during the conversation.
Supported Bot Languages
The following are the Conversation Languages supported by the Platform:
| Afrikaans – af | Haitian_creole – ht | Romanian – ro | 
| Amharic – am | Hungarian – hu | Russian – ru | 
| Assamese – as | Irish – ga | Sinhalese – si | 
| Arabic – ar | Indonesian – id | Slovak – sk | 
| Azerbaijani – az | Igbo – ig | Slovenian – sl | 
| Armenian – hy | Icelandic – is | Spanish – es | 
| Albanian – sq | Italian – it | Samoan – sm | 
| Bulgarian – bg | Japanese – ja | Shona – sn | 
| Belarusian – be | Javanese – jv | Somali – so | 
| Bengali/Bangla – bn | Kazakh – kk | Serbian – sr | 
| Basque – eu | Khmer – km | Sesotho – st | 
| Bosnian – bs | Kannada – kn | Sundanese – su | 
| Burmese – my | Korean – ko | Swedish – sv | 
| Cebuano – ceb | Kurdish – ku | Swahili – sw |
| Catalan – ca | Kyrgyz – ky | Tamil – ta |
| Chinese – zh | Kinyarwanda – rw | Tagalog – tl |
| Corsican – co | Latin – la | Tibetan – bo | 
| Croatian – hr | Luxembourgish – lb | Telugu – te | 
| Czech – cs | Laothian/Laos/lao – lo | Tajik – tg | 
| Danish – da | Lithuanian – lt | Thai – th | 
| Dutch – nl | Latvian – lv | Turkmen – tk | 
| English – en | Marathi – mr | Tagalog/Filipino – fil | 
| Esperanto – eo | Malay – ms | Turkish – tr | 
| Estonian – et | Malagasy – mg | Tatar – tt | 
| Finnish – fi | Maori – mi | Uighur/Uyghur – ug | 
| French – fr | Macedonian – mk | Urdu – ur | 
| Frisian – fy | Maltese – mt | Ukrainian – uk | 
| German – de | Malayalam – ml | Uzbek – uz | 
| Greek – el | Mongolian – mn | Vietnamese – vi | 
| Galician – gl | Nepali – ne | Wolof – wo | 
| Georgian – ka | Norwegian – nb | Welsh – cy | 
| Gujarati – gu | Nyanja – ny | Xhosa – xh | 
| Hausa – ha | Oriya/Odia – or | Yiddish – yi | 
| Hawaiian – haw | Punjabi – pa | Yoruba – yo | 
| Hebrew – he | Polish – pl | Zulu – zu |
| Hindi – hi | Portuguese (Brazilian) – pt | |
| Hmong – hmn | Portuguese (European) – pt_pt | |
| | Persian – fa | |
The NLU Language
The NLU Language is the language in which you train the assistant to identify the user’s intents. The NLU model is built using the NLU Language that you choose. This language can be the same as the Conversation Language or it can be any other supported language.
Supported NLU Languages
The following are the NLU Languages supported by the platform. Most NLU features are supported in all of these languages, with some exceptions; see here for more details.
| Arabic – ar | Kazakh (post v7.2 release) – kk | 
| Chinese Simplified – zh_cn | Marathi (post v9.0 release) – mr | 
| Chinese Traditional – zh_tw | Norwegian (post v8.1 release) – nb |
| Catalan (post v9.0 release) – ca | Polish (post v7.0 release) – pl | 
| Dutch – nl | Portuguese (Brazilian) – pt | 
| English – en | Portuguese (European) – pt_pt | 
| French – fr | Russian (post v7.0 release) – ru | 
| Finnish (post v6.4 release) – fi | Swedish (post v7.1 release) – sv | 
| Hindi (post v8.1 release) – hi | Slovenian – sl | 
| German – de | Spanish – es | 
| Indonesian – id | Tagalog – tl | 
| Italian – it | Telugu (post v9.0 release) – te | 
| Japanese – ja | Tamil (post v9.0 release) – ta | 
| Korean – ko | Ukrainian (post v7.0 release) – uk | 
Language-specific NLU Models
The platform supports language-specific NLU models for 26 languages. These models are pre-trained to understand system entities, concepts, sentiment, etc. in specific languages.
- In most cases, the NLU Language can be the same as the Conversation Language for these 26 languages.
- There may be cases where you choose an NLU Language that is different from the Conversation Language. For example, you may want to enable the Arabic language for your assistant but train it in English. You can enable the automatic input translation feature to support this flow. The user input is automatically translated to the NLU Language during the conversation.
- These models provide a wide range of configurations for you to fully customize the model behavior.
The Multilingual NLU Model
The Multilingual NLU model is a language-agnostic model that understands over 100 languages.
- Translation of user input is not required, as the model is pre-trained to understand over 100 languages.
- As the model is language-agnostic, you can train the model in any of your preferred languages or a combination of languages.
- This model supports fewer configurations as compared to the language-specific NLU models.
Language Enablement Options
The platform offers various options for you to enable additional languages. You can choose a combination of Conversation Language, NLU Language, Input Translation, and Response Translation that suits your needs.
Scenario 1: Enabling a language in which you can train as well
Example
| Conversation Language | NLU Language | Input Translation | Response Translation | 
| English | English | Not Required | Optional | 
- This is one of the common ways of enabling languages.
- The Conversation Language and the NLU Language will be the same.
- Input Translation and Response Translation are not required for this flow.
Scenario 2: Enabling a language using another language as NLU Language
Example
| Conversation Language | NLU Language | Input Translation | Response Translation | 
| Arabic | English | Required | Optional | 
| Georgian | French | Required | Optional | 
- Use this flow if you want to train the assistant in a language other than the conversation language.
- You can also use this flow if the language you want to enable is not supported as an NLU Language.
- Input Translation is required for this flow to translate the user’s input to the NLU Language.
- You will need to enable the Response Translation option if the responses are defined in a language other than the conversation language.
Scenario 3: Enabling a language using the Multilingual NLU model
Example
| Conversation Language | NLU Language | Input Translation | Response Translation | 
| Arabic | Multilingual Model | Not Required | Optional | 
| Georgian | Multilingual Model | Not Required | Optional | 
- Use this flow if you want to train using the multilingual NLU model.
- You can also use this flow if the language you want to enable is not supported as an NLU Language.
- Input Translation is not required for this flow as the multilingual model understands over 100 languages.
- You will need to enable the Response Translation option if the responses are defined in a language other than the conversation language.
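The three scenarios above reduce to two checks: whether the NLU Language differs from the Conversation Language, and whether the responses are written in another language. The following Python sketch is illustrative only. The `LanguageSetup` class, its field names, and the `MULTILINGUAL` marker are assumptions made for this example and are not platform settings or API names.

```python
from dataclasses import dataclass

# Hypothetical marker for the Multilingual NLU model; not a platform identifier.
MULTILINGUAL = "multilingual"


@dataclass(frozen=True)
class LanguageSetup:
    """Illustrative description of one enabled language (all names are assumptions)."""
    conversation_language: str  # language the user talks in, e.g. "ar"
    nlu_language: str           # language used for training, or MULTILINGUAL
    response_language: str      # language the responses are written in

    def input_translation_required(self) -> bool:
        # Scenario 3: the multilingual model needs no input translation.
        if self.nlu_language == MULTILINGUAL:
            return False
        # Scenario 1 vs. Scenario 2: translate only when the languages differ.
        return self.nlu_language != self.conversation_language

    def response_translation_required(self) -> bool:
        # Needed whenever responses are defined in another language.
        return self.response_language != self.conversation_language


scenario_1 = LanguageSetup("en", "en", "en")          # train and converse in English
scenario_2 = LanguageSetup("ar", "en", "en")          # Arabic users, English training
scenario_3 = LanguageSetup("ka", MULTILINGUAL, "ka")  # Georgian with the multilingual model
```

In this sketch only `scenario_2` needs Input Translation, and it also needs Response Translation because its responses are written in English rather than Arabic.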
Language Detection and Selection
Multilingual virtual assistants auto-detect and switch language based on the user’s utterance. An exception to this rule is when the user is expected to enter a value against an entity and the user input satisfies that entity’s criteria.
Language Detection
There are three ways an assistant can detect the language based upon the user utterance:
- By Default: The Kore.ai platform uses its own language detection algorithm to detect the end user’s language from the utterance. This is the default setting.
- Google API: For on-premises installations, you can keep the default Kore.ai language detection algorithm described above or use Google APIs for language detection. You can set this in the Kore Config file.
- BotKit SDK: If you are using BotKit SDK, you may also send the following cheat command from your BotKit to the platform:
 cheat language <language name or code>
The assistant continues to communicate with the user in the same language. If the user switches to another enabled language anytime later, the assistant changes to the new language automatically.
If the assistant fails to detect a user’s language with high confidence, it requests the user to select a preferred language from the list of enabled options.
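The fallback described above can be pictured as a simple decision. This is a minimal sketch, not the platform's implementation: the `choose_language` function, the `(language, confidence)` input, and the 0.8 threshold are all assumptions, since this article does not state what counts as high confidence.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical cut-off; the threshold the platform actually uses is not stated here.
CONFIDENCE_THRESHOLD = 0.8


def choose_language(
    detection: Tuple[str, float],
    enabled_languages: Sequence[str],
) -> Optional[str]:
    """Return the detected language code, or None when the user should be asked to choose."""
    language, confidence = detection
    if language in enabled_languages and confidence >= CONFIDENCE_THRESHOLD:
        return language
    # Low confidence or a language that is not enabled: prompt with the enabled options.
    return None
```

A caller would treat `None` as the signal to show the list of enabled languages to the user.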
Tagalog language detection
Tagalog is widely spoken in the Philippines and uses the Latin alphabet, the same script as English. This shared alphabet makes it challenging for standard language detection methods to distinguish between Tagalog and English, particularly in multilingual environments. To improve accuracy, the platform employs a specialized Tagalog detection wrapper with a two-layer system specifically designed to identify Tagalog utterances.
Tagalog Detection Wrapper
When Tagalog is enabled as a supported bot language, all incoming utterances are first processed through the Tagalog detection wrapper.
Detection logic
The system uses a two-step approach:
- Primary Detection: The Langdetect module analyzes the input to determine if it’s Tagalog. If Tagalog is detected, the utterance is processed accordingly.
- Secondary Verification: If the Langdetect module doesn’t identify the utterance as Tagalog, the system performs dictionary-based verification:
  - Short utterances (1–3 words): All words must exist in the Tagalog dictionary for the utterance to be classified as Tagalog.
  - Longer utterances (4+ words): At least 60% of the words must match entries in the Tagalog dictionary.
If fewer than 60% of words in longer texts match the Tagalog dictionary, the utterance is not classified as Tagalog, to avoid misclassification with English.
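The two-step logic above can be sketched in Python as follows. This is a hedged illustration, not the platform's code: the `is_tagalog` function, the pluggable `detect_language` callable (standing in for the Langdetect module), the whitespace tokenization, and the sample word set are assumptions made for this example.

```python
from typing import Callable, Set

SHORT_UTTERANCE_MAX_WORDS = 3  # 1-3 words: every word must match
LONG_UTTERANCE_MIN_PERCENT = 60  # 4+ words: at least 60% must match


def is_tagalog(
    utterance: str,
    detect_language: Callable[[str], str],
    tagalog_words: Set[str],
) -> bool:
    """Two-step check: primary detector first, dictionary verification second."""
    # Step 1: primary detection by the language detector.
    if detect_language(utterance) == "tl":
        return True

    # Step 2: dictionary-based verification (simple whitespace tokenization).
    words = utterance.lower().split()
    if not words:
        return False
    matches = sum(1 for word in words if word in tagalog_words)
    if len(words) <= SHORT_UTTERANCE_MAX_WORDS:
        return matches == len(words)
    # Integer arithmetic avoids floating-point edge cases at exactly 60%.
    return matches * 100 >= LONG_UTTERANCE_MIN_PERCENT * len(words)


def never_tagalog(text: str) -> str:
    """Stand-in detector that never reports Tagalog, to exercise the dictionary path."""
    return "en"


# Tiny illustrative word set; a real dictionary would be far larger.
sample_dictionary = {"salamat", "po", "kumusta", "ka", "magandang", "umaga"}
```

With the stand-in detector, `is_tagalog("salamat po", never_tagalog, sample_dictionary)` passes the short-utterance rule because both words are in the sample set, while a five-word utterance with only two matching words falls below the 60% mark and is rejected.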
Language Selection
- The virtual assistant identifies the user’s language from every utterance. If a change is detected, it asks the user to confirm the switch and proceeds according to the user’s response. These standard responses can be customized using the getCurrentOptions utility; see here for more.
  Note that the current conversation is discarded if the user chooses to switch languages.
- Language selection settings: You may also want to configure the language selection options. From the menu under the Build tab, click Configurations -> Languages. Under Language Selection Logic (scroll down for the option), set the language selection time frame to one of the following:
  - Lifetime: The auto-detected language is set as the user’s preferred language and used for all subsequent communications. If the user later starts to talk in another enabled language, the virtual assistant changes to that language.
  - Per Session: Detects the user’s language at the beginning of every session and responds accordingly.
  - Every User Message: Identifies the user’s language from every utterance. If a change is detected, the VA asks the user to confirm the switch and proceeds according to the user’s response.
    Note that the current conversation is discarded if the user chooses to switch languages. This feature was introduced in release 7.2 and is the default setting for multilingual virtual assistants.
 
- For testing and debugging purposes, you can override the language selection settings by using the cheat command during a chat session. Replace the language name or code with one of these values:
  - English: English or EN
  - German: German or DE
  - French: French or FR
  - Spanish: Spanish or ES
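When scripting test sessions, it can help to build the cheat command from either form of the value. The helper below is a hypothetical convenience for test scripts, not part of the platform or the BotKit SDK. It only formats the `cheat language <language name or code>` string for the four values listed above and does not send anything.

```python
# Values listed above; extend this mapping for other enabled languages.
LANGUAGE_CODES = {
    "english": "EN",
    "german": "DE",
    "french": "FR",
    "spanish": "ES",
}


def build_cheat_command(language: str) -> str:
    """Return the cheat command for a language name or code (case-insensitive)."""
    key = language.strip().lower()
    if key in LANGUAGE_CODES:
        # A language name such as "German" was given.
        return f"cheat language {LANGUAGE_CODES[key]}"
    if key.upper() in LANGUAGE_CODES.values():
        # A code such as "de" or "DE" was given.
        return f"cheat language {key.upper()}"
    raise ValueError(f"Unsupported language: {language!r}")
```

For example, both `build_cheat_command("German")` and `build_cheat_command("de")` produce the same command string, which you can then type into the chat window during testing.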
 
