Machine Learning

Developers need to provide sample utterances for each intent (task) the bot needs to identify to train the machine learning model. The platform ML engine will build a model that will try to map a user utterance to one of the bot intents.

Kore.ai’s Bots Platform supports fully unsupervised machine learning that continually expands the language capabilities of your chatbot without human intervention. Unlike unsupervised models in which chatbots learn from any input, good or bad, the Kore.ai Bots Platform lets chatbots automatically increase their vocabulary only when the chatbot successfully recognizes the intent and extracts the entities of a user’s request to complete a task.

However, we recommend keeping Supervised learning enabled to monitor the bot performance and manually tune where required. Using the bots platform, developers can evaluate all interaction logs, easily change NL settings for failed scenarios, and use the learnings to retrain the bot for better conversations.

Multiple Intent Model

Training of “similar intents” with different purposes is usually difficult as the training given for an intent can add noise or conflict with the training given to the other intent. This is more evident in cases where the intents have a contextually different meaning or purpose.

Consider the following case: when the user is in the Place Order task, any query about the returns policy or delivery options should be answered in the context of placing the order. Instead, the answer from the generic Return a Product FAQ would be triggered.

Enabling the Multiple Intent Models from the Advanced NLP Configurations (see here for how) allows you to have a dedicated ML model only for the primary intents and separate ML Models for each of the dialogs with their associated sub-intents so that the intent detection of sub-intents gets preferential treatment.

Continuing with the above example, with a Multiple Intent Model, you can define a separate context-based FAQ and ensure a proper response to the user.

All the primary intents of the bot will be part of the Bot Level Intent Model. Each of the Dialog tasks will have its own ML Model consisting of all the sub-intents added to it. The Thresholds and Configurations can be individually configured for each of the models.

For example, the Bot Level Intent Model can use ‘Standard’ Network Type and a specific Dialog’s intent model can use ‘LSTM’ Network Type.
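The preferential treatment of sub-intents can be pictured with a small sketch. It is not the platform's implementation: the keyword-based `keyword_model` helper, the `detect_intent` function, and the intent names are all hypothetical stand-ins for trained ML models, used only to show the lookup order (active dialog's model first, then the bot-level model).

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a trained ML model: maps an utterance to an
# intent name, or None when nothing matches.
IntentModel = Callable[[str], Optional[str]]


def keyword_model(keywords: Dict[str, str]) -> IntentModel:
    """Build a toy model that matches an intent by keyword (illustration only)."""
    def predict(utterance: str) -> Optional[str]:
        text = utterance.lower()
        for keyword, intent in keywords.items():
            if keyword in text:
                return intent
        return None
    return predict


def detect_intent(utterance: str,
                  bot_model: IntentModel,
                  dialog_models: Dict[str, IntentModel],
                  active_dialog: Optional[str] = None) -> Optional[str]:
    """Prefer the active dialog's model, then fall back to the bot-level model."""
    dialog_model = dialog_models.get(active_dialog)
    if dialog_model is not None:
        match = dialog_model(utterance)
        if match is not None:
            return match
    return bot_model(utterance)


# Bot-level model holds the primary intents; each dialog has its own model.
bot_model = keyword_model({"return": "Return a Product FAQ",
                           "order": "Place Order"})
dialog_models = {
    "Place Order": keyword_model({"return": "Returns Policy (Place Order)"}),
}

question = "What is the returns policy?"
in_dialog = detect_intent(question, bot_model, dialog_models, "Place Order")
outside_dialog = detect_intent(question, bot_model, dialog_models)
```

In this sketch the same question resolves to the context-specific answer while the user is inside Place Order, and to the generic FAQ otherwise.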

Adding Machine Learning Utterances

  1. Open the bot for which you want to add sample user utterances.
  2. Select the Build tab from the top menu.
  3. From the left menu, select the Natural Language -> Training option.
  4. By default, the tab with a list of all Intents is displayed.
  5. You can use the filter option to restrict the displayed items to Dialog, Sub-dialog, Sub-intents, or Action tasks.
  6. Click Utterances/+ Utterance against the Intent for which you want to add the utterances.
  7. The user utterance page opens.
  8. Enter the utterances here. Note that utterances longer than 3,000 characters are not allowed.

Note: Utterances added should be unique, but in the case of multiple intent models, the same utterance can be used across different models.

The negation of trained intents will be ignored by the platform.
For example, consider a Banking Bot with trained utterance – Funds Transfer. Then a user utterance “My account is debited even without doing funds transfer” will not trigger the “funds transfer” task.

Named Entity Recognition

Apart from the intent, you can train your Bot to recognize the entities, if present, in the user utterance. For example, if the user says “Book Flight from Hyderabad to Mumbai”, then apart from recognizing the intent as “Book Flight”, the source and destination of the flight should also be recognized. This can be achieved by marking the entities in the user utterance during training.

You can mark entities in your utterances by selecting the entity value and clicking the corresponding entity name.

The platform will also try to identify and mark the entities; you have the option to accept or discard these suggestions. The platform identifies the entities based upon:

  • System entities;
  • Static List of items either enumerated or lookup;
  • NER trained entities (from above).

For each of the entities thus marked, the confidence scores identified by the ML engine are displayed. This score is available only when Conditional Random Field is selected as the NER model.

Further, if you have enabled Entity Placeholders, the platform replaces the entity values in the training utterance with entity name placeholders when training the ML model. Using actual entity values, or adding the same utterance multiple times with only the entity value changed, has an adverse impact on the ML training model. With placeholders, the entity names also start contributing strongly to the intent detection model.

Training your Bot

After you add user utterances, you should train the Kore.ai interpreter to recognize the utterances and the associated user intent. When you have untrained utterances in your bot, the following message is displayed:

“You have untrained utterances in your ML model. Train your bot to update with all your utterances.”

Click Train. A status bar is displayed to show the progress of utterance training. When complete, the Utterances trained successfully message is displayed. The user utterances are added to the Machine Learning Database. You can further configure the ML engine to identify dummy intents when a user utterance contains words that are not used in the bot’s training (that is, the bot vocabulary); refer here for more details.

Learn how to test your bot.

Auto-Train

By default, machine learning is automatically trained for any defined user utterances whenever a task is:

  • changed from a status of In-Progress to Configured.
  • updated with a new
    • task name or intent name,
    • entity name or parameter name,
    • entity type,
    • bot name.
  • published.
  • suspended by the Bots Admin.
  • deleted by the Bots Admin.

In Bot Builder when auto-train is in progress, a warning message that untrained user utterances cannot be identified is displayed if you try to test the bot before auto-train is complete.

You can set the Auto Train option as follows:

  1. Open the bot for which you want to modify the settings.
  2. Select the Build option from the top menu.
  3. From the left navigation menu, select Natural Language -> Advanced Settings.
  4. Enable or Disable the Auto Training option as per your requirements.

Negative Patterns

Negative patterns can be used to eliminate intents detected by the Fundamental Meaning or Machine Learning models. Refer here to know more.

Threshold & Configurations

To train the bot and improve its performance, Thresholds and Configurations can be specified for all three NLP engines – FM, KG, and ML. You can access these settings under Build > Natural Language > Thresholds & Configurations.

NOTE: If your Bot is multilingual, you can set the Thresholds differently for different languages. If not set, the Default Settings will be used for all languages.

The settings for the ML engine are discussed in detail in the following sections.

Machine Learning

The Bots Platform ver 6.3 upgraded its Machine Learning (ML) model to v3. This includes a host of improvements and also allows developers to fine-tune the model using parameters to suit business requirements. The developers can change parameters like stopword usage, synonym usage, thresholds, and n-grams, as well as opt between Deep Neural Network or Conditional Random Field-based algorithm for the Named-Entity Recognition (NER) model.

In v8.0 of the platform, you can use v5 of the ML intent model and externalize several hyperparameters. This is done through the Advanced NLP Configurations; refer here for details.

When the ‘multiple intents model’ option is enabled, the ML Engine maintains multiple intent models for the bot as follows:

  • Bot level Intent Model containing all the Primary Intents of the bot which includes Primary Dialog Intents, and Alert Task Intents.
  • Dialog Intent Models – one for every primary dialog intent and sub-dialog intent which includes the Sub-intent nodes added to the dialog definition, Sub-intents scoped as part of the Group nodes and Interruption exceptions added to the dialog definition.

You can configure the Thresholds and Configurations separately for each of the intent models. This includes

  • All the configurations under Thresholds and Configurations – ML Engine as discussed in the below section;
  • All the ML Engine configurations under the Advanced NLP Configurations discussed in detail here.

Configuring the Machine Learning Parameters

The Bots Platform provides language-wise defaults for the following parameters related to the ML performance of your bot. You can customize them to suit your particular needs.

Points to note in ML configurations:

  • The following is the list of all possible configurations and these are available for both single and multiple intent models.
  • When the multiple intent model is enabled, you can configure the individual models by selecting the Configure link against the model.
  • While there is only one Bot level intent model, you can add multiple dialog intent models using the Add New button and configure each as per your requirements.
  • Advanced ML Configurations can be applied from here or from the Advanced NLP Configurations section; refer here for details.

Network Type

You can choose the Neural Network that you would like to use to train the intent models. This setting has been moved to Machine Learning from Advanced NLP Configurations in v8.1.

You can choose between the following types. Based on the selection, additional configurations can be made from the Advanced NLP Configurations section; refer here for details.

  • Standard;
  • MLP-BOW – The bag-of-words model is a simplifying representation used in natural language processing and information retrieval. In this model, a text is represented as the bag of its words, disregarding grammar and even word order but keeping multiplicity.
  • MLP-WordEmbeddings – Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing where words or phrases from the vocabulary are mapped to vectors of real numbers.
  • LSTM (Long Short-Term Memory) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. LSTM has feedback connections and hence has the ability to capture long-term dependencies for texts of any length and is well suited for longer texts.
  • CNN (convolutional neural networks) is a class of deep neural networks in deep learning most commonly applied to analyzing visual imagery. It makes use of the word order for a specific region size and has achieved remarkable results on various text classification tasks.
  • Transformers use a Universal Sentence encoder in the vectorization stage of the Training pipeline. The output of the sentence encoder is fed to a Multi-Layer perceptron network for training. SentenceEncoder has an inbuilt capability of understanding the semantic similarity between sentences taking into account the synonyms and various usage patterns of the same sentence.
    The Universal Sentence Encoder encodes text into high-dimensional vectors that can be used for text classification, semantic similarity, clustering, and other natural language tasks. The model is trained and optimized for greater-than-word length text, such as sentences, phrases, or short paragraphs. It is trained on a variety of data sources and a variety of tasks with the aim of dynamically accommodating a wide variety of natural language understanding tasks. The input is the variable-length English text and the output is a 512-dimensional vector.
  • KAEN (Kore Advanced Embeddings Network) – Models trained with Sentence Embeddings alone cannot understand domain-specific terminology, especially if the words from training are non-dictionary words. Kore.ai provides a model that can understand the meaning of the sentence and at the same time give importance to the domain-specific terminology. There are two parallel layers at work in this model – one to optimize the weights against the sentence embeddings and the other to optimize the word importance for a given sentence. The activation function used for these two layers is RReLU (Randomized Leaky Rectified Linear Unit, refer here for details).
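As an illustration of the bag-of-words representation described for MLP-BOW above, the following minimal sketch uses Python's standard `collections.Counter`. It only shows the representation itself (word order dropped, multiplicity kept); the `bag_of_words` helper is a hypothetical name, and the platform's actual tokenization and vectorization are not documented here.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count each lowercased token; grammar and word order are discarded."""
    return Counter(text.lower().split())


# Two utterances with the same words in a different order share one bag.
first = bag_of_words("transfer funds to my savings account")
second = bag_of_words("to my savings account transfer funds")

# Multiplicity is preserved: a repeated word is counted each time it occurs.
repeated = bag_of_words("pay pay the bill")
```

Here `first` and `second` compare equal, while `repeated["pay"]` is 2.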

ML Threshold

ML Threshold defines the criteria for qualifying a probability score of an intent to be a possible or definite match. The default value is set to 0.3. This means that any intent that scores > 0.3 is considered a qualified intent. Intents scoring < 0.3 are rejected.

ML Definitive Score

Configure the threshold score for definite matches, which can be set to a value between 80-100%, with the following classification:

  • Probability Score – If the probability score from the classification engine is greater than the “ML Definitive Score” divided by 100 (0.95 by default), the intent is considered a Definite Match/Perfect Match.
  • Fuzzy Logic – Each utterance of a given intent is compared against the user input to see how close the two are (scores usually range from 0 to 100). If the score is above the “ML Definitive Score” (95% by default), the intent is considered a Definite Match/Perfect Match.
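The interplay of ML Threshold and ML Definitive Score for a probability score can be sketched as follows. This is a simplified reading of the rules above, not the platform's code: the function name `classify_ml_score` and the labels it returns are hypothetical, and a score exactly equal to the threshold is treated as rejected, which is an assumption because the text only covers scores above or below it.

```python
def classify_ml_score(score: float,
                      ml_threshold: float = 0.3,
                      ml_definitive_score: float = 95.0) -> str:
    """Label a probability score (0.0 to 1.0) for one intent.

    ml_threshold is a probability; ml_definitive_score is a percentage
    (80 to 100) and is divided by 100 before comparison.
    """
    definitive_cutoff = ml_definitive_score / 100.0
    if score > definitive_cutoff:
        return "definite"
    if score > ml_threshold:
        return "possible"
    # Assumption: a score equal to the threshold is rejected as well.
    return "rejected"


labels = [classify_ml_score(s) for s in (0.97, 0.50, 0.20)]
```

With the default settings, `labels` holds one definite match, one possible match, and one rejected intent, in that order.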

Bot Synonyms

This setting is Disabled by default. Enable this option if you would like to consider intent synonyms in building the ML model.

Enabling Synonyms allows the synonyms defined under “Synonyms and Concepts” to be considered while training the ML model. This helps you avoid adding near-duplicate utterances.
For example: “I want to transfer funds”.
If we had defined “send”, “give”, “move” as synonyms of “transfer” and “money”, “dollars” as synonyms of “funds”, then we need not add training utterances like “I want to send money” or “I want to give dollars” etc.
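One way to picture the effect is to map every synonym back to its base word before comparing utterances. The sketch below is an illustration only: how the platform actually applies synonyms during training is not described here, and the `SYNONYMS` table and `normalize` helper are hypothetical.

```python
# Synonyms from the example above, keyed by the word used in training.
SYNONYMS = {
    "transfer": ["send", "give", "move"],
    "funds": ["money", "dollars"],
}

# Reverse lookup: each synonym points back to its base word.
BASE_WORD = {syn: word for word, syns in SYNONYMS.items() for syn in syns}


def normalize(utterance: str) -> str:
    """Lowercase the utterance and replace each synonym with its base word."""
    tokens = utterance.lower().split()
    return " ".join(BASE_WORD.get(token, token) for token in tokens)


trained = normalize("I want to transfer funds")
variant_one = normalize("I want to send money")
variant_two = normalize("I want to give dollars")
```

All three utterances normalize to the same string, which is why the variants do not need to be added as separate training utterances.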

NER Model

Choose the NER model to be used for entity detection. Kore.ai provides two entity recognition models for training using NER:

  • Conditional Random Fields (CRF): lightweight and easy to use for datasets of all sizes.
  • Neural network: works well with medium to large datasets, but training time is very high.

Note: The CRF model supports all languages and the Deep Neural Network model supports English, Spanish, German, and French. This option appears on the screen only when the selected bot language is supported by the Deep Neural Network model.

Stop Words

This setting is Disabled by default. Enable this option if you would like to remove the stop words in the training utterances in building the ML model. Once enabled, stop words are used to filter out the words/phrases from the Training utterances before training the ML model and removed from the user utterance before prediction.
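A minimal sketch of this filtering step is shown below. The stop word list is a made-up sample (the platform's language-specific lists are not reproduced here), and `remove_stop_words` is a hypothetical helper; the point is only that the same filter is applied to training utterances and to the user utterance before prediction.

```python
# Hypothetical sample list; real stop word lists are language-specific.
STOP_WORDS = {"i", "a", "an", "the", "to", "of"}


def remove_stop_words(utterance: str, stop_words=STOP_WORDS) -> str:
    """Drop stop words from a lowercased, whitespace-tokenized utterance."""
    tokens = utterance.lower().split()
    return " ".join(token for token in tokens if token not in stop_words)


# Applied to a training utterance before training ...
training_text = remove_stop_words("I want to transfer the funds")
# ... and to the user utterance before prediction.
user_text = remove_stop_words("Transfer funds to a friend")
```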

Not valid when Network Type is set to Transformer.

Feature Extraction

Using this option (introduced in v8.0), you can associate the ML intent model with the preferred algorithm. Not valid when Network Type is set to MLP-WordEmbeddings, LSTM, CNN, or Transformer.
The options are:

  • n-gram – this is the default setting and can be used to define the contiguous sequence of words to be used from training sentences to train the model.
    For example, if Generate sales forecast report is the user utterance and if the n-gram is set to 2, then Generate sales, Sales forecast, and Forecast report are used in training the model. If n-gram is set to 3, then Generate sales forecast, and Sales forecast report will be used in training the model.
    You can set the n-gram using the n-gram Sequence Length. The minimum n-gram limit is 1 by default, and you can set the maximum limit up to 4.
  • skip-gram – when the corpus is very limited or when the training sentences, in general, contain fewer words then skip-gram would be a better option. For this you need to define:
    • Sequence Length – the length for skip-gram sequences, with a minimum of 2 and a maximum of 4
    • Maximum Skip Distance – the maximum words to skip to form the grams, with a minimum of 1 and a maximum of 3.
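The two feature extraction options can be illustrated with a short sketch. The `ngrams` function reproduces the Generate sales forecast report example above. The `skipgrams` function is one common formulation, in which up to `max_skip` words in total may be skipped inside a gram; whether the platform interprets Maximum Skip Distance in exactly this way is an assumption, and both function names are hypothetical.

```python
from itertools import combinations
from typing import List


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Contiguous sequences of n words."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens: List[str], n: int, max_skip: int) -> List[str]:
    """Sequences of n words in order, skipping at most max_skip words in total.

    Each gram starts at tokens[i]; the remaining n - 1 words are chosen,
    in order, from the next n - 1 + max_skip words.
    """
    grams = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i + 1:i + n + max_skip]
        for tail in combinations(window, n - 1):
            grams.append(" ".join([tokens[i], *tail]))
    return grams


tokens = "Generate sales forecast report".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
skip_bigrams = skipgrams(tokens, 2, 1)
```

For this utterance the skip-grams add pairs such as "Generate forecast" and "sales report" to the three plain bigrams, which is why skip-grams can help when training sentences are short or few.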

Entity Placeholders

Enable this option to replace entity values present in the training utterances with the corresponding entity placeholders in the training model. Entity placeholders remove the contribution of real entity values to intent detection. This works only when the entity training (NER) is done via ML. Enabling this flag reduces the scores contributed by entity values.
Example: I want to transfer $500 to John Doe
In the above example, we don’t want the engine to learn that “$500” and “John Doe” are important features. Hence they are replaced with their placeholders once NER is done and the Entity Placeholders flag is enabled. The training utterance becomes “I want to transfer <Amount> to <Payee>”.
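The substitution in this example can be sketched in a few lines. This is a simplified illustration: it assumes the entity values have already been marked (here passed in as a plain dictionary), and the `apply_entity_placeholders` helper is a hypothetical name, not a platform function.

```python
from typing import Dict


def apply_entity_placeholders(utterance: str, entities: Dict[str, str]) -> str:
    """Replace each marked entity value with its <EntityName> placeholder.

    `entities` maps the literal value found in the utterance to the
    entity name, as produced by NER tagging.
    """
    for value, name in entities.items():
        utterance = utterance.replace(value, f"<{name}>")
    return utterance


marked = {"$500": "Amount", "John Doe": "Payee"}
training_text = apply_entity_placeholders(
    "I want to transfer $500 to John Doe", marked
)
```

After the call, `training_text` no longer contains the concrete amount or payee, so neither can be learned as a feature of the intent.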

Not valid when Network Type is set to Transformer.

Upgrading the ML Model

All new bots that are created use the new ML model by default. Developers can upgrade the ML model for old bots or downgrade the model for the bots using the new model.
If you are using a previous model of ML in the bots platform, you can upgrade it as follows:

  1. Open the bot for which you want to upgrade the ML model and go to Natural Language > Thresholds & Configurations.
  2. Expand Machine Learning. Under the ML Upgrade section, click the Upgrade Now button. It opens a confirmation window.
  3. Click Upgrade and Train. You can see new customizable options under the Machine Learning section.

Note: If a bot is exported using the older model (V2) and imported as a new bot, it continues to be in the V2 model until you upgrade it.

Exporting and Importing Machine Learning Utterances

You can export the ML utterances of a bot and import them into another bot in CSV and JSON formats. You can choose between ‘In-Development’ or ‘Published’ tasks to export, whereas importing utterances always replaces the latest copy of the task in the bot.

How to Export or Import ML Utterances

  1. On the bot’s Build menu, click Natural Language -> Training.
  2. The ‘In-Development’ version of the bot’s ML utterances opens by default. If you want to see the utterances in the ‘Published’ version, switch the toggle on the top right side of the window to Published.
    Note: The export of ML utterances varies based on this selection as explained in the Versioning and Behavior of the Exported Utterances section below.
  3. Click the options icon and select an option:

Versioning and Behavior of Imported Utterances

  • The imported utterances in CSV/JSON entirely replace the utterances present in the latest copy of the tasks.
  • If the task is in the Configured status, the utterances in the task get entirely replaced with the new utterances for the task present in the imported file.
  • If the task is in Upgrade in Progress status, the utterances related to the task get entirely replaced with the task utterances present in the imported file. The utterances in the Published copy of the task aren’t affected.
  • If the task is in the Published status, an Upgrade in Progress copy of the task gets created by default and the new utterances present in the imported file will be added to the upgraded copy. The utterances in the Published copy of the task aren’t affected.

Versioning and Behavior of Exported Utterances

  • When you export a bot’s utterances, all the utterances related to every task type – alert, action, information, dialog – get exported.
  • When you export an In Development copy of the bot, the utterances of all tasks in the latest available copy get exported.
  • When you export a Published copy of the bot, all the utterances in the published state get exported.
  • In the case of multi-language bots, the export of utterances includes utterances added in all of the bot languages.
  • Export of utterances to JSON includes NER tagging present in the tasks, whereas CSV export doesn’t include them.

Goal Driven Training Validations

The ML engine enables you to identify issues proactively in the training phase itself with the following set of recommendations: 

  • Untrained Intents – notifies about intents that are not trained with any utterances so that you can add the required training. 
  • Inadequate training utterances – notifies the intents that have insufficient training utterances so that you can add more utterances to them. 
  • Utterance does not qualify any intent (false negative) – notifies about an utterance for which the NLP model cannot predict any intent. For example, an utterance added to Intent A is expected to predict Intent A. In some cases, however, the model can predict neither the trained Intent A nor any other intent within the model. Identifying such cases proactively helps you to rectify the utterance and enhance the model for prediction.
  • Utterance predicts wrong intent (false positive) – identifies utterances that predict intents other than the trained intent. For example, when you add an utterance that is similar to utterances from another intent, the model could predict a different intent rather than the intent for which it is trained. Knowing this helps you to rectify the utterance and improve the model prediction.
  • Utterance predicts intent with low confidence – notifies about the utterances that have low confidence scores. With this recommendation, you can identify and fix such utterances to improve the confidence score during the virtual assistant creation phase itself.
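The checks in this list can be approximated by running every training utterance back through the model and comparing the result with the intent it was added to. The sketch below assumes a hypothetical `predict` callable that returns a predicted intent (or `None`) with a confidence; the `min_utterances` and `low_confidence` cutoffs are illustrative values, not platform defaults.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (predicted intent or None, confidence between 0.0 and 1.0)
Prediction = Tuple[Optional[str], float]
Finding = Tuple[str, str, str]  # (intent, utterance, issue)


def validate_training(training: Dict[str, List[str]],
                      predict: Callable[[str], Prediction],
                      min_utterances: int = 8,
                      low_confidence: float = 0.6) -> List[Finding]:
    """Collect the validation findings for each intent and utterance."""
    findings: List[Finding] = []
    for intent, utterances in training.items():
        if not utterances:
            findings.append((intent, "", "untrained intent"))
            continue
        if len(utterances) < min_utterances:
            findings.append((intent, "", "inadequate training utterances"))
        for utterance in utterances:
            predicted, confidence = predict(utterance)
            if predicted is None:
                findings.append((intent, utterance, "false negative"))
            elif predicted != intent:
                findings.append((intent, utterance, "false positive"))
            elif confidence < low_confidence:
                findings.append((intent, utterance, "low confidence"))
    return findings
```

A caller would pass the bot's training data and a wrapper around the trained model, then review the returned findings before publishing.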

How to View NLU Training Validations

  1. On the virtual assistant’s Build menu, click Natural Language -> Training.
  2. In the Intents tab, you can see the set of recommendations for the Intents and ML utterances.
    Note: The errors and warnings in this screen are examples. The ML validations vary based on the error or warning recommendation as explained in the Goal Driven Training Validations section above.
  3. Hover over the validation options and view the following recommendations:
    • Hover on the Error icon to view the recommendations to resolve the error.
      Note: An Error is displayed when the intent has a definite problem that impacts the virtual assistant’s accuracy or intent score. Errors are high severity problems.
    • Hover on the Warning icon and follow the instructions in the warning to enhance the training for ML utterances.
      Note: A Warning is displayed when the issue impacts the VA’s accuracy and can be resolved. Warnings are less severe problems than errors.
  4. Once you click on the Intent with error or warning, hover over the Bulb icon to view the summary of error or warning messages as illustrated below:

ML Training Recommendations

  • Give balanced training for all the intents that the bot needs to detect by adding approximately the same number of sample utterances to each. A skewed model may produce skewed results.
  • Provide at least 8-10 sample utterances against each intent. The model with just 1-2 utterances will not yield any machine learning benefits. Ensure that the utterances are varied and you do not provide variations that use the same words in a different order.
  • Avoid training common phrases that could be applied to every intent, for example, “I want to”. Ensure that the utterances are varied for larger variety and learning.
  • After every change, train the model and check it. Ensure that all the dots in the ML model lie on the diagonal (in the True-positive and True-negative quadrants) and that you do not have scattered utterances in the other quadrants. Train the model until you achieve this.
  • Regularly train the bot with new utterances.
  • Regularly review the failed or abandoned utterances and add them to the utterance list against a valid task or intent.

NLP Intent Detection Training Recommendations

  • If there are a good number of sample utterances, try training the bot using the Machine Learning approach first, before trying to train the Fundamental Meaning model.
  • Define bot synonyms to build a domain dictionary such as pwd for a password; SB for a savings bank account.
  • After every change to the model training, run the batch testing modules. Test suites are a means to perform regression testing of your bot’s ML model.

NLP Entity Detection Training Recommendations

The best approach to train entities is based on the type of entity as explained below:

  • Entity types like List of Items (enumerated, lookup), City, Date, and Country do not need any training unless the same entity type is used multiple times in the same task. If the same entity type is used more than once in a bot task, use either of the training models to find the entity within the user utterances.
  • When the entity type is String or Description, the recommended approach is to use Entity patterns and synonyms.
  • For all other entity types, both NER and Patterns can be used in combination.

Entity Training Recommendations

  • Use NER training where possible – NER coverage is higher than patterns.
  • NER approach best suits detecting an entity where information is provided as unformatted data. For entities like Date and Time, the platform has been trained with a large set of data.
  • NER is a neural network-based model and will need to be trained with at least 8-10 samples to work effectively.

Suggested Reading
You might also want to read about the ML Model; refer here.
