Improving Bot Performance – NLP Optimization

A chatbot’s ability to consistently understand and interact with a user is dictated by the robustness of the Natural Language Processing (NLP) that powers the conversation.

Kore.ai’s platform uses a unique Natural Language Processing strategy, combining Fundamental Meaning and Machine Learning engines for maximum conversation accuracy with little upfront training. Bots built on Kore.ai’s platform can understand and process multi-sentence messages, multiple intents, contextual references made by the user, patterns and idiomatic sentences, and more. See here for an overview.

The NL engine includes recognition support for a wide range of entities and provides the tools needed to further customize your bot’s language understanding using additional patterns.

Optimizing your Bot

To make sure your bot is NLP-optimized, you can define and refine the names and terms used for your bot. This improves the accuracy and performance of the NLP interpreter in recognizing the right bot task for the user.
You begin by defining synonyms at the task level, and then manage and refine synonyms, and test at the bot level.

To get started optimizing your bot and bot tasks, you need to access the Natural Language options. These options are categorized under various headings for your convenience:

Training – In the Training section, you can define how the NLP interpreter recognizes and responds to the user input for a bot, and then train the interpreter to recognize the correct user intent.
- Machine Learning Utterances – With Machine Learning, you can train the bot with sample user utterances so that it better recognizes the user intent, that is, the task the user wants to perform.
- Synonyms & Concepts – You can use the Synonyms section to optimize the NLP interpreter accuracy in recognizing the correct intent and entity provided by the user.
- Patterns & Rules – In the Patterns section, you can define slang, metaphors, or other idiomatic expressions for intent and entities.
Thresholds & Configurations – In this section, you can define the recognition confidence levels required for minimum recognition actions, the confidence range for asking a user to choose from a list of possible matches, and a recognition confidence level for a positive match for the knowledge graph.
Advanced Settings – In this section, you can modify advanced settings such as auto-training for user utterances and negative intent patterns.

You can start optimizing your bot by:

Customizing the ML engine, see here for more
Customizing the KG engine, see here for more
Customizing the FM engine, see here for more
Customizing the Traits engine, see here for more
Customizing the R&R engine, see here for more

The rest of this document explains in depth the various aspects of NLP processing and the engines involved.

NLP Training

Morphology is the underlying principle behind NLP. Morphology is the study of words, how they are formed, and their relationship to other words in the same language. It analyzes the structure of words and parts of words, such as stems, root words, prefixes, and suffixes. Morphology also looks at parts of speech, intonation, and stress, and the ways the context can change a word’s pronunciation and meaning.

Based on this, a user utterance undergoes the following preprocessing before an attempt at entity extraction and intent detection:

Tokenization – Splitting utterances into sentences (sentence tokenization) and sentences into words (word tokenization). Kore.ai NLP uses the TreeBank tokenizer for English; other languages may use their own tokenizers.
toLower() – Converts all the text to lowercase. This is not done for German, where the meaning of a word can change based on its case. This step is performed only by the ML and KG engines.
Stop word removal – Removes words that may not contribute to the learning. Each language has its own set of stop words, which the developer can edit. This step is performed only by the ML and KG engines, and it is optional (disabled by default).
Lemmatization or stemming, depending on the language:
- Stemming – Retains the stem of the word, for example, “Working” -> “work”, “Running” -> “run”, “housing” -> “hous”. Stemming essentially cuts off the end of the word, so the output may not be a valid word in the language.
- Lemmatization – Converts the word to its base form using a dictionary. As in the earlier examples, “Working” -> “work” and “Running” -> “run”; however, “housing” -> “house”.
N-grams – Combine co-occurring words, for example, “New York City” and “Internet Explorer”. Each word has its own meaning, but taking a tri-gram in the first case and a bi-gram in the second results in a more meaningful unit. N-grams also help capture some context before or after a word.
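The preprocessing steps above can be sketched in a few lines of Python. This is a simplified illustration, not the platform's implementation: the regex tokenizer stands in for the TreeBank tokenizer, `STOP_WORDS` is a hypothetical example list, and `naive_stem` is a crude suffix stripper rather than a real stemmer such as Porter's.

```python
import re

# Hypothetical stop-word list; on the platform, each language has its own editable list.
STOP_WORDS = {"a", "an", "the", "to", "in", "i", "am", "please"}


def tokenize(sentence: str) -> list[str]:
    # Simplified regex word tokenizer (a stand-in for a Treebank-style tokenizer).
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", sentence)


def naive_stem(word: str) -> str:
    # Crude suffix stripping: like real stemmers, the output may not be a valid word.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def ngrams(tokens: list[str], n: int) -> list[str]:
    # Join every run of n consecutive tokens into one unit.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(utterance: str, lowercase: bool = True, remove_stop_words: bool = False) -> list[str]:
    tokens = tokenize(utterance)
    if lowercase:  # skipped for case-sensitive languages such as German
        tokens = [t.lower() for t in tokens]
    if remove_stop_words:  # optional, disabled by default
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


tokens = preprocess("I am working in New York City", remove_stop_words=True)
print(tokens)             # ['work', 'new', 'york', 'city']
print(ngrams(tokens, 3))  # ['work new york', 'new york city']
```

Because stop words are removed before n-grams are built in this sketch, an n-gram can span words that were originally separated by a stop word ("work new york"). The text above does not say whether the platform orders these steps the same way.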

ML Engine

From the training samples and the learning process, ML Engine builds an ML model. As mentioned earlier, Machine Learning Engine is concerned with intent detection and entity extraction.

The intent prediction model is trained using statistical modeling and neural networks. Intent classification tags the user utterance with a specific intent. The classification algorithm learns from a set of sample utterances that are labeled with how they should be interpreted. Preparing and curating training utterances is one of the most significant aspects of building a robust machine learning model.
Entity detection involves recognizing system entities (out-of-the-box, rule-based model), predicting custom entities (custom-trainable, rule-based model), and named entity recognition (NER). System entities are defined using built-in rules. Using the NER approach, however, any named entity can be trained using machine learning by selecting the value in the ML training utterances and tagging it against the named entity.

ML Training

Steps in training ML engine can be listed as follows:

Choosing and gathering data that can be used as the training set
Dividing the training set for evaluation and tuning (test and cross-validation sets)
Training a few ML models according to algorithms (feed-forward neural networks, support vector machines, and so on) and hyperparameters (for example, the number of layers and the number of neurons in each layer for neural networks)
Evaluating and tuning the model over test and cross-validation sets
Choosing the best performing model and using it to solve the desired task
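The data-splitting and model-selection steps above can be sketched as follows. This is a generic illustration under stated assumptions, not how the platform trains its models: `train_and_score` is a hypothetical caller-supplied function that trains one model with the given hyperparameters and returns its validation score.

```python
import random


def split_dataset(samples, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle labeled samples, then split them into train, validation, and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


def choose_best(configs, train_and_score):
    """Train one model per configuration and keep the one with the best validation score.

    `train_and_score` (hypothetical) trains a model with the given hyperparameters,
    such as the number of layers, and returns its score on the validation set.
    """
    best_config, best_score = None, float("-inf")
    for config in configs:
        score = train_and_score(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Tuning against the validation set and reporting the final result on the untouched test set keeps the reported score from being biased by the tuning itself.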

Tips for better ML training:

Batch test suites are essential for comparing ML models: run the batch suite, configure the parameters, re-run the suite, and compare the results.
There is no set rule for which ML model to choose. It is a trial-and-error process: configure the engine, run batch suites, and compare the results.
If your training data is large, the ML engine recognizes stop words and synonyms automatically, without your having to enable them explicitly.
Check for influencer words and, if needed, add them as stop words for the n-gram algorithm.
Prepare examples that are as diverse as possible.
Avoid adding noise or pleasantries. If unavoidable, make sure the noise is equally represented across intents; otherwise, the model can easily overfit the noise to specific intents.

Confusion Matrix can be used to identify training sentences that are borderline and fix them accordingly. Each dot in the matrix represents an utterance and can be individually edited.

The graph can further be studied for each of the following parameters:

Get rid of false positives and false negatives by assigning each utterance to the correct intent: click the dot and, on the edit utterance page, assign the utterance to the correct intent.
Cohesion can be defined as the similarity between each pair of intents. The higher the cohesion, the better the intent training. Improve cohesion by adding synonyms or rephrasing the utterance.
Distance is measured between each pair of training phrases in the two intents. The larger the distance, the better the prediction.
Avoid confusing phrases, that is, phrases that are similar across intents.
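A confusion matrix can be built from labeled and predicted intents with a few lines of code. This is a minimal sketch for illustration, not the platform's implementation. Rows are the labeled intents and columns are the predicted intents, so every off-diagonal cell is a false negative for its row intent and a false positive for its column intent. The intent names are hypothetical.

```python
def confusion_matrix(expected, predicted, labels):
    """Rows are the labeled (expected) intents, columns the predicted intents."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for exp, pred in zip(expected, predicted):
        matrix[index[exp]][index[pred]] += 1
    return matrix


def misclassified(utterances, expected, predicted):
    """Return the off-diagonal entries: utterances whose prediction differs from the label."""
    return [(u, e, p) for u, e, p in zip(utterances, expected, predicted) if e != p]


labels = ["book_flight", "cancel_flight"]
expected = ["book_flight", "book_flight", "cancel_flight", "cancel_flight"]
predicted = ["book_flight", "cancel_flight", "cancel_flight", "cancel_flight"]
print(confusion_matrix(expected, predicted, labels))  # [[1, 1], [0, 2]]
```

The utterances returned by `misclassified` are the candidates to reassign or rephrase.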

K-fold cross-validation is ideal for large datasets but can also be used with smaller ones by using two folds. Track and fine-tune the F1 score, precision, and recall according to your requirements; a higher recall is generally recommended.
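The sketch below shows how k-fold splits and the per-intent precision, recall, and F1 scores are computed, using their standard definitions. It is an illustration, not the platform's code. Shuffle the samples before splitting so that each fold mixes intents.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, nearly equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def precision_recall_f1(expected, predicted, intent):
    """Per-intent precision, recall, and F1 from labeled vs. predicted intents."""
    pairs = list(zip(expected, predicted))
    tp = sum(e == intent and p == intent for e, p in pairs)  # correctly predicted
    fp = sum(e != intent and p == intent for e, p in pairs)  # wrongly predicted as intent
    fn = sum(e == intent and p != intent for e, p in pairs)  # intent missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With two folds (`k=2`), each half of a small dataset serves once as the test set. This is the small-data option mentioned above.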

ML Process

Intent Detection

The diagram below summarizes the intent detection pipeline for both the training and prediction stages. Language detection and auto-correction are not run in the training pipeline, on the assumption that the trainer knows the language in which training is done and the spellings to be used, which might include domain-specific non-dictionary words like Kore.

Entity Extraction

Entity extraction involves identifying any information provided by the user, apart from the intent, that can be used to fulfill the intent. Entities are of three types:

System entities like date, time, and color are provided out of the box by the platform. There are about 22-24 of them, and the ML engine recognizes them automatically without training, except for the string and description entity types.
Custom entities are defined by the bot developer. These include list-of-values entities (enumerated, lookup, and remote), regex entities, and composite entities. These are also mostly auto-detected by the ML engine.
NER (named entity recognition) needs training to distinguish between entities of the same type. For example, for a flight booking intent, the source and destination cities are both city-type entities, and the engine needs training to tell them apart. NER can be based on conditional random fields (CRF) or on neural networks. CRF is preferred since it works with less data and trains faster than the neural-network-based method.

The following diagram summarizes the NER entity extraction pipeline

ML Output

The ML engine runs the classification against the user utterance and generates the following outputs, which the Ranking & Resolver uses to identify the correct intent:

The probability score for each class/intent, which can be interpreted as follows:
- Definitive match/Perfect match: If the probability score is > 0.95 (default; adjustable).
- Possible match: If the score is < 0.95, the intent becomes eligible for ranking against other intents that may have been found by other engines.
The fuzzy score for each class/intent whose confidence score is greater than the threshold score (default is 0.3). Fuzzy logic goes through each utterance of a given intent and compares it against the user input to see how close they are. The scores usually range from 0 to 100 and can be interpreted as follows:
- Definitive match/Perfect match: If the score is above 95% (default; adjustable).
- Possible match: If the score is < 95%, the intent becomes eligible for ranking against other intents that may have been found by other engines.
CR sentences – The ML engine also sends the top 5 ML utterances for each intent that qualified using the threshold score. These 5 utterances are selected using the fuzzy score. The Ranking & Resolver uses these CR sentences to rescore and choose the best utterance: it compares each utterance against the user input and chooses the one with the highest score.
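The threshold logic above can be sketched as follows, using the default values quoted in the text. This is an illustration under stated assumptions: the platform's fuzzy algorithm is not described, so Python's `difflib.SequenceMatcher` ratio (scaled to 0-100) stands in for it, and the function names are hypothetical.

```python
from difflib import SequenceMatcher

# Defaults quoted in the text above; both are adjustable on the platform.
DEFINITIVE_PROBABILITY = 0.95
DEFINITIVE_FUZZY = 95
THRESHOLD = 0.3


def classify_probability(probability):
    """Map an intent's probability score to definitive / possible / below threshold."""
    if probability > DEFINITIVE_PROBABILITY:
        return "definitive"
    if probability > THRESHOLD:
        return "possible"
    return "below_threshold"


def fuzzy_score(user_input, utterance):
    """0-100 similarity; difflib is a stand-in, not the platform's fuzzy algorithm."""
    return 100 * SequenceMatcher(None, user_input.lower(), utterance.lower()).ratio()


def is_fuzzy_definitive(user_input, training_utterances):
    """True if any training utterance of the intent scores above the fuzzy cutoff."""
    return any(fuzzy_score(user_input, u) > DEFINITIVE_FUZZY for u in training_utterances)


def cr_sentences(user_input, training_utterances, top_n=5):
    """Return the top_n training utterances of one intent, ranked by fuzzy score."""
    ranked = sorted(training_utterances, key=lambda u: fuzzy_score(user_input, u), reverse=True)
    return ranked[:top_n]
```

In this sketch, `cr_sentences` would be called only for intents whose `classify_probability` result is not `"below_threshold"`.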

ML Engine Limitations

Though the ML model is very thorough and encompassing, it has its own limitations.

When sufficient training data is not available, the ML model tends to overfit the small dataset. This leads to poor generalization and, in turn, poor performance in production.
Domain adaptation might be difficult if the model is trained on datasets from general domains like the internet or news articles.
Controllability and interpretability are hard because ML models mostly work like a black box, making it difficult to explain their results.
Costs are high in terms of both resources and time.
The above two points also make maintenance and problem resolution expensive (again, in terms of both time and effort) and can result in regression issues.

Hence ML engines augmented by FM engines would yield better results. One can train the bot with a basic ML model, and any minor issues can be addressed using FM patterns and negative patterns for idiomatic sentences, command-like utterances, and quick fixes.

FM Engine

The Fundamental Meaning model is a deterministic model that uses semantic rules and language context to determine the intent match. This engine can be trained using synonyms, built-in and custom concepts, and patterns.

The FM model scores user utterance using various semantic rules which include:

Grammar;
Parts of speech;
Word Match, Word Coverage across the sentence, Word Position;
Sentence structure and many more.

FM Process

The FM model uses the following processes in training:

Tokenization (word segmentation) is the process of breaking up the given text into units called tokens. Hyphenated words are retained (they might be subjected to spell correction later). Digits with a hyphen are split, e.g., “2-3” becomes “2 - 3”. Tokenization is not done for known concepts like dates, currency, etc.
Substitution is the process of expanding interjections, abbreviations, texting shorthand, and contractions using system concepts. Like ~emohello for all greeting-related expressions, ~yes for confirmation, ~no for rejection, and much more.
Merging is the process of combining a sequence of words that are obviously a single word, numbers, or dates. E.g. “credit card” or “twenty five” or “twenty-five” merged into a single word.
Spell Check is the process of replacing unknown words with known words (if any) and involves case conversion. The platform refers to WordNet and Bot Defined Terms for spell check. E.g. “I wantt to pai bill” becomes “I want to pay bill”
Lemmatization – The Bots Platform uses the WordNet database to look up the lemmas of the words in a given text.
Gleaning is the process of identifying sections of utterances and marking them as special. This includes:
- Marking polite phrases, like “can you please…”, and treating them as noise.
- Language constructs that indicate multiple intents, like “and then”, “and after that”, and “but first”. These cause the sentence to be split into two for multiple intent detection.
- Identifying and normalizing numbers and other related entities, e.g., “seven one three triple five double zero eighty four” => 7135550084, which is probably a phone number.
- System entities like Percentages – “sixty six percent”; Units of measurement – “five sq km”, “12 stone 7 pounds”; Currencies – “twenty bucks”, “six lakh rupees”; Dates and times – “last day of next month”, “10 o’clock tonight”.
POS tagging is the process of marking up a word in a corpus with a corresponding part-of-speech tag, based on its context and definition. Part-of-speech tags are useful for building parse trees, which are used to extract relations between words. POS tagging is also essential for building lemmatizers, which reduce a word to its root form. Each word is assigned a part-of-speech tag and possibly a role (subject/verb/object) from the bot definition data.
Marking is the process of assigning concepts to each word. POS tagging and parsing are abstract, dealing with nouns and verbs while Marking applies meanings to the words. For example, “book a flight” – book can be a noun or verb, in this context, it is marked as a verb.
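The number normalization performed during gleaning can be sketched for the phone-number example above. This is a simplified, hypothetical illustration, not the platform's gleaning logic. It handles digit words, "double"/"triple" repeaters, and tens words with an optional unit ("eighty four"). It does not handle teens, hundreds, or other number forms.

```python
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
REPEATERS = {"double": 2, "triple": 3}


def normalize_digits(phrase):
    """Convert a spoken digit sequence into a digit string (simplified)."""
    words = phrase.lower().split()
    digits = []
    i = 0
    while i < len(words):
        word = words[i]
        nxt = words[i + 1] if i + 1 < len(words) else None
        if word in REPEATERS and nxt in UNITS:
            # "triple five" -> "555"
            digits.append(str(UNITS[nxt]) * REPEATERS[word])
            i += 2
        elif word in TENS:
            # "eighty four" -> "84"; a bare "eighty" -> "80"
            if nxt in UNITS and UNITS[nxt] > 0:
                digits.append(str(TENS[word] + UNITS[nxt]))
                i += 2
            else:
                digits.append(str(TENS[word]))
                i += 1
        elif word in UNITS:
            digits.append(str(UNITS[word]))
            i += 1
        else:
            raise ValueError(f"unrecognized number word: {word!r}")
    return "".join(digits)


print(normalize_digits("seven one three triple five double zero eighty four"))  # 7135550084
```

Deciding that the result is "probably a phone number" would be a separate step, for example one based on the digit count.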

Key Elements in FM Engine

Synonyms, Concepts & Patterns are the cornerstones of the FM engine. These are used in intent detection and also by Ranking & Resolver when trying to choose among the multiple possible intents.

Synonyms need to be used when the words that identify an intent/entity can be used interchangeably. The platform comes with a built-in synonym library, which can be augmented with domain words to build a bot dictionary.
Concepts are predefined sets of choices that are defined once and used in multiple places. The platform has a large set of built-in concepts, like ~world_country and ~asian_country, that developers can use to define patterns. You can also create your own custom concepts for your use case, including hierarchical concepts.
Patterns are used mainly for intent detection in the FM engine. They can be used to define metaphors or other idiomatic expressions for task names, and concepts can be used in defining patterns.

FM Output

The FM engine collects information about each matching word in a given user input, depending on:

Position of the matching word in the sentence
Whether the matching word is a noun or a verb
Role of the matching word – Main Subject, Main Verb, Main Object
Exact word match or Synonym
Tense of the matching word – present/future/past

A series of individual scores are calculated from the set of matched words.

The goal is to prefer tasks that match the most likely words in the earliest sentence in the input.
Preference is given to the words when they are close together, towards the start of the sentence, and in the same order as the task label.
It is undesirable if there are several phrases before the task name or if there is a conjunction in the middle of the task label.
Preference is given to tasks whose matching phrases are in the present or future tense.

Ontology-based KG

Ontology-based Knowledge Graph turns static FAQ text into an intelligent, personalized conversational experience. It uses domain terms and their relationships, which reduces the training needs. It can also enable ontology-weighted features when the ML engine gets confused, and it can automatically drive a conversational dialog to resolve the appropriate answer.

Capabilities of a Graph Engine

Ease of training using synonyms – Kore.ai’s Knowledge Graph has a provision to associate synonyms against a graph node. This helps capture the variation in a question.
Better coverage with alternate questions – Knowledge Graph has a provision to add alternate questions. This helps us to capture the various ways a user might ask the same question.
Improved accuracy – Ontology-driven question-answers reduce the possibility of false positives.
Weighing Phrases using Traits – Kore.ai’s graph engine allows developers to build a concept of Traits for filtering out irrelevant suggestions.
Ability to mark term importance – Kore.ai’s graph engine has a provision to mark an ontology term as important.
Ability to group relevant nodes – As the graph grows in size, managing graph nodes can become a challenging task. Using the “organizer node” construct of the ontology engine, bot developers can group relevant nodes under a node.

FAQ Detection Steps

Step 1: Extract Nodes: The KG engine processes the user utterance to extract the terms (ontology nodes) present in the graph. It also takes into consideration the synonyms, classes, and tags associated with the terms.
Step 2: Query Graph: The KG engine fetches all the paths that consist of the extracted nodes.
Step 3: Shortlist Paths: All the paths consisting of 50% or more matching terms with the user utterance are shortlisted for further processing.
Note: Path coverage computation doesn't consider the root node.
Step 4: Filter with Traits: If classes are defined in the Knowledge Graph, the paths shortlisted in the above step are further filtered based on the confidence score of a classification algorithm applied to the user utterance.
Step 5: Send to Ranker: The KG engine then sends the shortlisted paths to the Ontology Ranker Program.
Step 6: Score based on Cosine Similarity: The Ontology Ranker uses user-defined synonyms, lemma forms of words, n-grams, and stop words to compute the cosine similarity between the user utterance and the shortlisted questions. Paths are ranked in non-increasing order of cosine similarity score.
Step 7: Qualify Matches: The Ontology Ranker then qualifies the paths as follows:
- Paths with score >= upper_threshold are qualified as an answer (definitive match).
- Paths with lower_threshold < score < upper_threshold are marked as suggestion (probable match).
- Paths with a score < lower_threshold are ignored.
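Steps 3, 6, and 7 can be sketched as follows. This is a simplified illustration, not the Ontology Ranker itself. Paths are hypothetical lists of terms with the root node first, and questions are compared as plain term-frequency vectors. The real ranker also applies synonyms, lemmas, n-grams, and stop words. The upper and lower thresholds are left as parameters because their values are not given in this section.

```python
import math
from collections import Counter


def path_coverage(path, utterance_terms):
    """Fraction of a path's terms found in the utterance; the root node (path[0]) is skipped."""
    terms = path[1:]
    if not terms:
        return 0.0
    return sum(term in utterance_terms for term in terms) / len(terms)


def shortlist(paths, utterance_terms, min_coverage=0.5):
    """Step 3: keep paths with 50% or more of their terms matched."""
    return [p for p in paths if path_coverage(p, utterance_terms) >= min_coverage]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_questions(utterance_tokens, questions):
    """Step 6: rank questions in non-increasing order of cosine similarity."""
    scored = [(cosine_similarity(utterance_tokens, q.split()), q) for q in questions]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def qualify(score, lower_threshold, upper_threshold):
    """Step 7. A score exactly equal to lower_threshold is treated as ignored here."""
    if score >= upper_threshold:
        return "answer"
    if score > lower_threshold:
        return "suggestion"
    return "ignored"
```

The source leaves the case `score == lower_threshold` unspecified. This sketch treats it as ignored.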

Traits

Traits are entities that can be extracted from user input before intent recognition. They can be used in multiple scenarios:

Traits can be used in indirect entity extraction; for example, gender- or age-specific information can be inferred from the text. A phrase like “Suit for your baby aged 1-5 years” implies it is a product for toddlers.
Traits can be used for intent recognition using rules. Any rule match for an intent is considered a definitive match.
Traits can be identified based on keywords/phrases and their synonyms. For example: Trait-Color: Blue – “Blue”, “Sapphire”, “Teal”; Trait-Color: Red – “Red”, “Maroon”, “Crimson”; Trait-Status: NotWorking – “doesn’t work”, “switched off”; Trait-Status: Working – “working”, “turned on”.
Traits can also be inferred from the sentence as a whole rather than from a keyword or specific phrase. In such cases there is no obvious association between certain words in the sentence and the value of the entity; you need the whole sentence to determine the value. For example, Trait-Greeting-Emotion: Positive – “Good Morning”, “How are you”; Trait-Greeting-Emotion: Negative – “I hate to say”, “I am not having trouble”.
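Keyword-based trait identification can be sketched as whole-word phrase matching over trait definitions built from the Color and Status examples above. This is a hypothetical illustration, not the platform's traits engine. A pure keyword matcher like this cannot see negation: "not working" would still be tagged Working. That gap is why traits can also be inferred from the sentence as a whole.

```python
import re

# Hypothetical trait definitions based on the examples above.
TRAITS = {
    "Color": {
        "Blue": ["blue", "sapphire", "teal"],
        "Red": ["red", "maroon", "crimson"],
    },
    "Status": {
        "NotWorking": ["doesn't work", "switched off"],
        "Working": ["working", "turned on"],
    },
}


def extract_traits(utterance, traits=TRAITS):
    """Return {trait group: [matched trait values]} using whole-word phrase matching."""
    text = utterance.lower()
    found = {}
    for group, values in traits.items():
        for value, phrases in values.items():
            if any(re.search(rf"\b{re.escape(phrase)}\b", text) for phrase in phrases):
                found.setdefault(group, []).append(value)
    return found


print(extract_traits("My teal lamp doesn't work"))  # {'Color': ['Blue'], 'Status': ['NotWorking']}
```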

Ranking & Resolver

The Ranking & Resolver engine receives the outputs from the above engines and further processes them.

A quick recap of the various parameters the R&R engine works with:

Output of the ML engine for a given utterance:
- Deterministic match: fuzzy score match of >= 95% against the user input.
- Probable matches: confidence scores for each intent.
- Top 5 utterances (from the ML training dataset) for each intent whose confidence score is > threshold (default is 0.3).
- These top 5 utterances are the ones the ML engine finds closest to the user input.
Output of the KG engine:
- Deterministic match: fuzzy score, if the match is deterministic (fuzzy match of >= 95% against the user input).
- Probable matches: confidence scores of questions that met the minimum threshold (>= 50% match of path terms and >= 60% word match).
- Synonyms matched, Nodes matched, Path terms matched, Class/Traits matched
- Original Question, Modified question(replaced with synonyms which matched)
- Alternate questions of each matched question
Output of the FM engine:
- Deterministic match: intents that matched deterministically (pattern match, or input exactly the same as the task name).
- Probable matches: partial label matches, including synonyms.

The Ranking & Resolver decides the winner as follows:

Definitive match(es) found:
- If any engine finds an intent definitively, that is the winning intent.
- If more than one engine finds different intents deterministically, they are considered ambiguous, and the found intents are presented to the user as choices so the user can pick the right one.
Possible matches found:
- If a deterministic intent is found, all probable matches are ignored.
- If only the FM or ML engine found a probable intent, that is the winning intent.
- If only the KG engine found a probable intent and its score is > the higher threshold (80%), that is the winning intent.
- If only the KG engine found a probable intent and its score is > 60% but < 80%, that is the winning intent, but since the confidence is low, it is shown as a suggestion (the user sees “Did you mean”).
- If more than one probable intent was found:
  - Score each of the 5 utterances given by the ML engine and find the highest-scoring utterance for each probable intent.
  - Score each of the alternate and modified questions given by the KG engine and find the highest-scoring question for each intent.
  - Rank the scores and choose the top-scoring intent as the winning intent.
  - If the top-scoring intent and the next one are within 2% of each other, consider them ambiguous.
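The winner-selection rules above can be sketched as a single decision function. This is a simplified illustration, not the Ranking & Resolver's actual code. The following are assumptions:

- `Candidate` is a hypothetical record.
- `rescore` stands for the best rescored ML utterance or KG question for an intent, computed elsewhere.
- Scores are on a 0-1 scale.
- "Within 2%" is read as an absolute difference of 0.02.

```python
from dataclasses import dataclass

KG_HIGH = 0.8           # "higher threshold (80%)"
KG_LOW = 0.6            # lower bound for a "Did you mean" suggestion
AMBIGUITY_RANGE = 0.02  # "within the range of 2%", read as 0.02 on a 0-1 scale


@dataclass
class Candidate:
    engine: str           # "ML", "FM", or "KG"
    intent: str
    definitive: bool
    score: float          # engine confidence on a 0-1 scale
    rescore: float = 0.0  # hypothetical: best rescored ML utterance / KG question


def resolve(candidates):
    """Return (decision, intents) following the winner-selection rules."""
    definitive = sorted({c.intent for c in candidates if c.definitive})
    if len(definitive) == 1:
        return "winner", definitive
    if len(definitive) > 1:
        return "ambiguous", definitive  # present as choices to the user

    # No definitive match, so every candidate is probable: group them by intent.
    by_intent = {}
    for c in candidates:
        by_intent.setdefault(c.intent, []).append(c)
    if not by_intent:
        return "no_match", []

    if len(by_intent) == 1:
        intent, found = next(iter(by_intent.items()))
        if any(c.engine in ("ML", "FM") for c in found):
            return "winner", [intent]
        kg_score = max(c.score for c in found)
        if kg_score > KG_HIGH:
            return "winner", [intent]
        if kg_score > KG_LOW:  # a score of exactly 0.8 lands here; the rules leave it open
            return "suggestion", [intent]
        return "no_match", []

    # Several probable intents: rank by each intent's best rescored utterance/question.
    ranked = sorted(
        ((max(c.rescore for c in found), intent) for intent, found in by_intent.items()),
        reverse=True,
    )
    (top_score, top), (next_score, runner_up) = ranked[0], ranked[1]
    if top_score - next_score <= AMBIGUITY_RANGE:
        return "ambiguous", [top, runner_up]
    return "winner", [top]
```

Checking definitive matches first means that probable matches are never consulted when a definitive intent exists. This mirrors the "ignore all probable matches" rule.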
