Chatbot tasks can be broken down into a few words that describe what a user intends to do, usually a verb and a noun, such as Find an ATM, Create an event, Search for an item, Send an alert, and Transfer funds.
Most products use only Machine Learning (ML) for natural language processing. The drawback of training bots with machine learning alone is that it needs a vast amount of data. With ML, you must provide a collection of sentences (utterances) that match a chatbot's intended goal (and eventually a selection of sentences that do not). With such an approach, the bot does not inherently understand an input sentence. Instead, it tries to measure how similar the input is to what it already knows. An ML-only approach can also be inaccurate unless the bot is trained extensively. Our approach combines Fundamental Meaning (FM) with Machine Learning (ML) to make it easy to build natural-language-capable chatbots, whether or not rich training data is available.
Our NLP engine analyzes the structure of a user’s utterance to identify each word by meaning, position, conjugation, capitalization, plurality, and other factors. The analysis helps the chatbot correctly interpret and understand obvious and non-obvious synonyms for these common “action” words.
The goal of intent recognition isn't just to match an utterance with a task; it's to match an utterance with its correctly intended task. We do this by matching verbs and nouns with as many obvious and non-obvious synonyms as possible.
In doing so, enterprise developers can handle real-world dynamics and gain the inherent benefits of both approaches, while eliminating the shortcomings of the individual methods.
ML Model Training
To train the machine learning model, developers need to provide sample utterances for each intent (task) that the bot needs to identify. The platform ML engine then builds a model that tries to map a user utterance to one of the bot intents. Learn more about adding utterances.
ML Training Recommendations
- Provide balanced training: for all the intents that the bot needs to detect, add approximately the same number of sample utterances. A skewed model may produce skewed results.
- Provide at least 8-10 sample utterances for each intent. A model with just 1-2 utterances will not yield any machine learning benefits. Ensure that the utterances are varied, and do not provide variations that merely use the same words in a different order.
- Avoid training common phrases that could apply to every intent, for example, “I want to”. Such phrases do not help the model distinguish one intent from another.
- After every change, train the model and check the model. Ensure that all the dots in the ML model lie on the diagonal (in the True-positive and True-negative quadrants) and that you do not have scattered utterances in other quadrants. Train the model until you achieve this.
- Regularly train the bot with new utterances.
- Regularly review the failed or abandoned utterances and add them to the utterance list of a valid task or intent.
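Some of these recommendations can be checked mechanically before training. The following is a minimal sketch, not a platform feature: the `audit` function, the `MAX_IMBALANCE` tolerance, and the sample data are all hypothetical. It flags intents with too few utterances, a skewed distribution across intents, and utterances that only reorder the words of another sample.

```python
from collections import Counter

MIN_UTTERANCES = 8    # lower end of the 8-10 range recommended above
MAX_IMBALANCE = 2.0   # assumed tolerance: largest intent vs. smallest intent

def word_bag(utterance):
    """Order-insensitive signature: the multiset of lowercase words."""
    return frozenset(Counter(utterance.lower().split()).items())

def audit(training):
    """Return a list of warnings for a dict of intent -> sample utterances."""
    warnings = []
    counts = {intent: len(samples) for intent, samples in training.items()}
    for intent, n in counts.items():
        if n < MIN_UTTERANCES:
            warnings.append(f"{intent}: only {n} utterances, need at least {MIN_UTTERANCES}")
    if counts and max(counts.values()) > MAX_IMBALANCE * min(counts.values()):
        warnings.append("unbalanced: utterance counts differ widely across intents")
    for intent, samples in training.items():
        seen = {}
        for sample in samples:
            key = word_bag(sample)
            if key in seen:
                warnings.append(f"{intent}: '{sample}' only reorders '{seen[key]}'")
            else:
                seen[key] = sample
    return warnings

# Hypothetical training data, deliberately too small and unbalanced.
training = {
    "Transfer funds": [
        "send money to my savings account",
        "to my savings account send money",
        "move 50 dollars to checking",
    ],
    "Find an ATM": ["where is the nearest ATM"],
}

warnings = audit(training)
for warning in warnings:
    print(warning)
```

Running this on the sample data reports both intents as under-trained, the set as unbalanced, and the second "Transfer funds" utterance as a reordering of the first.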
Kore.ai’s Bots Platform allows fully unsupervised machine learning to constantly expand the language capabilities of your chatbot – without human intervention. Unlike unsupervised models in which chatbots learn from any input – good or bad – the Kore.ai Bots Platform enables chatbots to automatically increase their vocabulary only when the chatbot successfully recognizes the intent and extracts the entities of a human’s request to complete a task.
However, we recommend keeping Supervised learning enabled to monitor the bot performance and manually tune where required. Using the bots platform, developers can evaluate all interaction logs, easily change NL settings for failed scenarios, and use the learnings to retrain the bot for better conversations.
Fundamental Meaning Model Training
The Fundamental Meaning model creates a form of the input that uses the canonical version of each word in the user utterance: verbs are converted to their infinitive, nouns are made singular, and numbers become digits. The intent recognition process then uses this canonical form for matching. The original input form is still available and is referenced for certain entities, such as proper names, that have no canonical form. The Fundamental Meaning model considers parts of speech and inbuilt concepts to identify each word in the user utterance and relate it to the intents the bot can perform. The scoring is based on the number of words matched, total word coverage, and more.
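To make the idea of a canonical form concrete, here is a minimal sketch. The lookup tables, `canonical_form`, and `coverage` are hypothetical stand-ins: a real engine would use a full lexicon and part-of-speech analysis, and the platform's actual scoring is not specified here beyond "words matched" and "word coverage".

```python
import re

# Toy lookup tables (assumptions for illustration only).
VERB_INFINITIVE = {"paid": "pay", "paying": "pay", "sent": "send", "sending": "send"}
NOUN_SINGULAR = {"bills": "bill", "accounts": "account", "alerts": "alert"}
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "ten": "10"}

def canonical_form(utterance):
    """Return (original tokens, canonical tokens) for an utterance."""
    tokens = re.findall(r"[A-Za-z0-9']+", utterance)
    canonical = []
    for token in tokens:
        word = token.lower()
        word = VERB_INFINITIVE.get(word, word)   # verbs -> infinitive
        word = NOUN_SINGULAR.get(word, word)     # nouns -> singular
        word = NUMBER_WORDS.get(word, word)      # number words -> digits
        canonical.append(word)
    # The original tokens are kept for entities such as proper names.
    return tokens, canonical

def coverage(intent_words, canonical):
    """Fraction of the intent's words found in the canonical utterance."""
    matched = [w for w in intent_words if w in canonical]
    return len(matched) / len(intent_words)

original, canonical = canonical_form("I paid three bills")
print(original)    # ['I', 'paid', 'three', 'bills']
print(canonical)   # ['i', 'pay', '3', 'bill']
print(coverage(["pay", "bill"], canonical))   # 1.0
```

Because matching runs on the canonical tokens, "paid three bills" and "pay bill" line up even though none of the surface words are identical.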
The platform provides the following tools to train the Fundamental Meaning engine:
- Patterns: Using Patterns you can define slang, metaphors, or other idiomatic expressions for task names. Learn more about patterns along with examples.
- Synonyms: The Platform includes a built-in synonym library for common terms. Developers can further optimize the accuracy of the NLP engine by easily adding synonyms for bot names, words used in the names of your tasks and task fields, and any words associated with your dialog task entity node. The platform auto-corrects domain words unless they are specially trained, for example, Paracetamol, IVR. Learn more about synonyms along with examples.
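As a rough illustration of how a synonym table widens matching, the sketch below rewrites domain abbreviations to the terms used in task names before any intent matching happens. The `BOT_SYNONYMS` table and `apply_synonyms` function are hypothetical; on the platform, synonyms are configured through the training tools rather than written in code.

```python
# Hypothetical bot-level synonym table: variant -> term used in task names.
BOT_SYNONYMS = {
    "pwd": "password",
    "passcode": "password",
    "sb": "savings bank account",
}

def apply_synonyms(utterance, synonyms=BOT_SYNONYMS):
    """Replace each known variant with its domain term, word by word."""
    words = utterance.lower().split()
    return " ".join(synonyms.get(word, word) for word in words)

print(apply_synonyms("Reset my pwd"))       # reset my password
print(apply_synonyms("check SB balance"))   # check savings bank account balance
```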
The platform also facilitates a Default Dialog option which initiates automatically if the platform fails to identify an intent from the user utterance. Developers can modify the dialog based on the bot requirement. We also provide the ability for a human reviewer (developer, customer, support personnel, and more) to passively review every user utterance and mark the ones that need further training. Once trained, the bot recognizes the utterances based on the newly trained model.
NLP Intent Detection Training Recommendations
- If there are a good number of sample utterances, try training the bot using the Machine Learning approach first, before trying to train the Fundamental Meaning model.
- Define bot synonyms to build a domain dictionary such as pwd for password; SB for savings bank account.
- After every change to the model training, run the batch testing modules. Test suites are a means to perform regression testing of your bot’s ML model.
Important Notes on Test Suites:
- An optimal approach to bot NLP training is to first create a test suite covering most of the use cases (user utterances) that the bot needs to identify, run it against the model, and then train for the utterances that failed.
- Create/update batch testing modules for high usage utterances.
- Publish the trained model only after detailed testing.
- When naming the intent, ensure that the name is relatively short (3-5 words) and does not have special characters or words from the Stop Word list. Try to ensure the intent name is close to what the users request in their utterance.
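The test-suite workflow described above (build a suite first, run it, train on the failures) can be sketched as a small regression harness. This is an illustration under stated assumptions, not the platform's batch testing module: `run_suite` and the keyword-based `keyword_predict` stand-in for a trained model are hypothetical.

```python
def run_suite(predict, suite):
    """Run (utterance, expected_intent) pairs; return (accuracy, failures)."""
    failures = []
    for utterance, expected in suite:
        actual = predict(utterance)
        if actual != expected:
            failures.append((utterance, expected, actual))
    accuracy = 1 - len(failures) / len(suite)
    return accuracy, failures

def keyword_predict(utterance):
    """Stand-in for a trained model: naive keyword lookup."""
    text = utterance.lower()
    if "atm" in text:
        return "Find an ATM"
    if "transfer" in text or "send" in text:
        return "Transfer funds"
    return None  # no intent identified

suite = [
    ("where is the closest ATM", "Find an ATM"),
    ("transfer 50 dollars", "Transfer funds"),
    ("move money to savings", "Transfer funds"),
]

accuracy, failures = run_suite(keyword_predict, suite)
print(f"accuracy: {accuracy:.2f}")
for utterance, expected, actual in failures:
    print(f"FAILED: '{utterance}' expected {expected}, got {actual}")
```

The failures list is the training backlog: each failed utterance is a candidate to add against its expected intent before the suite is run again.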
Bot Ontology and Knowledge Graph Training
Kore.ai Knowledge Task uses a combination of machine learning techniques and a knowledge graph to help you turn your static FAQ text into an intelligent, personalized conversational experience. It goes beyond the usual practice of capturing FAQs in the form of flat question-answer pairs. Instead, the Knowledge Graph enables you to create an ontological structure of key domain terms and associate them with context-specific questions and their alternatives, synonyms, and machine learning-enabled classes.
Build and Train a Knowledge Graph:
Follow these steps to build and train a Knowledge Graph:
- Identify terms by grouping the unique words in each FAQ question.
- Build an ontology based on all such unique words.
- Define synonyms for each term in the ontology. Ensure that all the different ways to refer to the term are defined.
- Depending on the importance of each term in a path, mark it as either mandatory or regular.
- Define alternative questions for each FAQ to ensure better coverage.
- Associate classes with terms so that results can be filtered when multiple matches are identified.
FAQ Detection Steps
The following steps give you an overview of the process in which the KG engine shortlists the questions in a Knowledge Graph:
- Extract Nodes: The KG engine processes the user utterance to extract the terms (ontology nodes) present in the graph. It also takes into consideration the synonyms, classes, and tags associated with the terms.
- Query Graph: The KG engine fetches all the paths that consist of the extracted nodes.
- Shortlist Paths: All the paths consisting of 50% or more matching terms with the user utterance are shortlisted for further processing. For example, the engine shortlists a path with four nodes such as Personal Banking → Joint Account → Add → Account Holder if at least two of these terms occur in the user utterance.
Note: Path coverage computation doesn’t consider the root node.
- Filter with Classes: If you define any classes in the Knowledge Graph, the paths shortlisted in the previous step are further filtered based on the confidence score that a classification algorithm assigns to the user utterance.
- Send to Ranker: The KG engine then sends the shortlisted paths to the Ranker Program.
- Score based on Cosine Similarity: The Ranker uses user-defined synonyms, lemma forms of words, n-grams, and stop words to compute the cosine similarity between the user utterance and the shortlisted questions. Paths are ranked in non-increasing order of cosine similarity score.
- Qualify Matches: The ranker then qualifies the paths as follows:
  - Paths with score >= upper_threshold are qualified as an answer (definitive match).
  - Paths with lower_threshold < score < upper_threshold are marked as a suggestion (probable match).
  - Paths with score < lower_threshold are ignored.
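The shortlisting, scoring, and qualifying steps above can be sketched as follows. This is a simplified illustration, not the KG engine: the threshold values, the sample paths, and all function names are assumptions; paths are listed without their root node; the class filter is omitted; and the cosine similarity uses a plain bag of words rather than synonyms, lemmas, or n-grams.

```python
import math
import re
from collections import Counter

UPPER_THRESHOLD = 0.8   # assumed value; thresholds are configurable
LOWER_THRESHOLD = 0.4   # assumed value

def normalize(text):
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def coverage(terms, utterance):
    """Fraction of a path's terms (root excluded) present in the utterance."""
    padded = f" {normalize(utterance)} "
    hits = sum(1 for term in terms if f" {normalize(term)} " in padded)
    return hits / len(terms)

def cosine(a, b):
    """Cosine similarity between bag-of-words vectors of two strings."""
    va, vb = Counter(normalize(a).split()), Counter(normalize(b).split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def detect(utterance, paths):
    """Shortlist paths at 50%+ coverage, rank by cosine score, qualify."""
    shortlisted = [q for terms, q in paths if coverage(terms, utterance) >= 0.5]
    scored = sorted(((cosine(utterance, q), q) for q in shortlisted), reverse=True)
    answers = [(s, q) for s, q in scored if s >= UPPER_THRESHOLD]
    suggestions = [(s, q) for s, q in scored
                   if LOWER_THRESHOLD < s < UPPER_THRESHOLD]
    return answers, suggestions

# Hypothetical paths: (terms below the root, FAQ question).
PATHS = [
    (["joint account", "add", "account holder"],
     "How do I add an account holder to my joint account"),
    (["joint account", "close"], "How do I close my joint account"),
    (["credit card", "block"], "How do I block my credit card"),
]

answers, suggestions = detect("add an account holder to my joint account", PATHS)
print("answers:", answers)
print("suggestions:", suggestions)
```

With these assumed thresholds, the first path qualifies as a definitive match, the second is shortlisted at exactly 50% coverage but scores only as a suggestion, and the third is never shortlisted.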
Ranking and Resolver
All three Kore.ai engines finally deliver their findings to the Kore.ai Ranking and Resolver component as either exact matches or probable matches. Based on the ranking and resolver, the winning intent between the engines is ascertained. If the platform finds ambiguity, then an ambiguity dialog is initiated. The platform initiates one of these two system dialogs when it cannot ascertain a single winning intent for a user utterance:
- Disambiguation Dialog: Initiated when more than one definitive match is returned across engines. In this scenario, the bot asks the user to choose a definitive match to execute. You can customize the message shown to the user from the NLP Standard Responses.
- Did You Mean Dialog: Initiated if the Ranking and Resolver returns more than one winner, or if the only winning intent is an FAQ whose KG engine score is between the lower and upper thresholds. This dialog lets the user know that the bot found a match that it isn’t entirely sure about and asks the user to make a selection to proceed. In this scenario, the developer should identify these utterances and train the bot further. You can customize the message shown to the user from the NLP Standard Responses.
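The dialog selection described above can be summarized in a small decision function. This is a simplified sketch under stated assumptions: the `Match` type and `resolve` function are hypothetical, real ranking works on engine scores rather than a boolean flag, and the handling of a single probable match from the ML or FM engine (executed here) is an assumption not stated above.

```python
from dataclasses import dataclass

@dataclass
class Match:
    intent: str
    engine: str        # "ML", "FM", or "KG"
    definitive: bool   # True for an exact match, False for a probable match

def resolve(matches):
    """Return (action, intents) for the matches reported by all engines."""
    definitive = {m.intent for m in matches if m.definitive}
    probable = {m.intent for m in matches if not m.definitive} - definitive
    if len(definitive) == 1:
        return ("execute", sorted(definitive))
    if len(definitive) > 1:
        return ("disambiguation", sorted(definitive))
    if len(probable) > 1:
        return ("did_you_mean", sorted(probable))
    if len(probable) == 1:
        only = next(m for m in matches if m.intent in probable)
        if only.engine == "KG":
            # Sole winner is an FAQ scored between the two thresholds.
            return ("did_you_mean", sorted(probable))
        return ("execute", sorted(probable))   # assumption, see lead-in
    return ("default_dialog", [])              # no intent identified

print(resolve([Match("Transfer funds", "ML", True),
               Match("Pay bill", "FM", True)]))
print(resolve([Match("How do I close my joint account", "KG", False)]))
print(resolve([]))
```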