Using patterns can help improve the accuracy of the NLP interpreter.
In this document, we elaborate on the various pattern syntaxes and how they can be used for intent detection and entity extraction.
Things to Remember:
- Patterns are evaluated in the order in which they are listed. Once a match is found, the remaining patterns are not evaluated, so list patterns from most restrictive to least restrictive.
- Only one wildcard (*) is allowed in a pattern.
- While most features are supported in all languages, there are some exceptions; see here for more details.
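Because evaluation stops at the first match, listing a looser pattern first can shadow a stricter one. The sketch below illustrates this with two entity patterns of the kind described later in this document. It is a toy model only: `first_match`, `STRICT`, and `LOOSE` are hypothetical names, and the platform patterns are hand-translated into Python regular expressions that merely approximate their behavior.

```python
import re

# Hand-translated regex stand-ins for two platform entity patterns.
# Each regex captures the entity value in group 1 (an approximation only).
STRICT = ("to * from", re.compile(r"\bto\s+(.+?)\s+from\b", re.IGNORECASE))
LOOSE = ("to *1", re.compile(r"\bto\s+(\w+)", re.IGNORECASE))


def first_match(patterns, utterance):
    """Try patterns in the order listed and stop at the first one that matches."""
    for name, regex in patterns:
        match = regex.search(utterance)
        if match:
            return name, match.group(1)
    return None


utterance = "Transfer funds to joint savings from my account"
print(first_match([STRICT, LOOSE], utterance))  # ('to * from', 'joint savings')
print(first_match([LOOSE, STRICT], utterance))  # ('to *1', 'joint')
```

Listing the stricter pattern first keeps the full two-word account name; listing the looser one first truncates it.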
Patterns for Intent Detection
The following table lists the pattern syntax that can be configured for intent detection.
Pattern | Description
---|---
word1 word2 … wordn | All the specified words must appear in the user utterance in the same order. Up to 3 (language-specific) additional words are allowed between any two consecutive pattern words, and any number of words may appear before and after them.
word1_word2 | Enforces a phrase: no additional words are allowed between word1 and word2, so the sequence of tokens is read as a phrase. Only words can be joined this way, not concepts. Note: there must be no space between word1, word2, and the _. Also, a leading underscore (“_word1”) ensures that word1 in the user utterance is not marked as Used Up by the platform and remains available for entity extraction. This is useful when entity words are used in the intent pattern.
word1 * word2 | Zero or more additional words between the specified words/phrases.
word1 *n word2 | Exactly n additional words between the specified words/phrases.
word1 *0 word2 | Disables wildcards between two tokens. Similar to the underscore between two words, but can also be used between two concepts or within [ ] and { } groups (available from version 7.1).
word1 < word2 | Indicates that the match for word2 should start from the beginning of the sentence. This is especially useful when word2 appears in the middle of the utterance. Add a space after the angle bracket.
word1 > word2 | Indicates the end of the sentence; no words are allowed after it. Add a space before the closing angle bracket.
!abc | The word/concept “abc” must not appear anywhere in the user utterance after this token. Do not put a space between ! and the word/concept.
!!abc | The very next word/concept must not be “abc”. Do not put a space between !! and the word/concept.
[ … ] | Defines a group of words/concepts; exactly one member of the group must match. Once a match is found, the rest of the group is ignored, so order the words accordingly. Note: do not attach the brackets to the words; keep a space between each bracket and the adjacent word.
{ … } | Defines an optional group of words/concepts; zero or one member of the group matches. Once a match is found, the rest of the group is ignored, so order the words accordingly. Note: do not attach the braces to the words; keep a space between each brace and the adjacent word.
( … ) | Contains a pattern: when a pattern, or part of one, is enclosed in these parentheses, it is treated as a pattern, unlike [ ] and { }. This is the default, i.e. the pattern word1 word2 is treated as ( word1 word2 ). Commonly used explicitly to define a sub-pattern inside [ ] or { }.
<< … >> | Matches the enclosed words in any order.
‘word1 | If you quote a word or use a word that is not in canonical form, the system matches only the exact form used in the pattern.
word1~concept2, ~concept1~concept2 (from version 8.0) | A word (word1) or concept (concept1) is matched only if it is also a member of another concept (concept2). The most common use is with the system concepts that are dynamically added for each POS tag.
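To make the gap rules in the table concrete, here is a toy matcher for a small subset of the syntax: plain words with a bounded gap, word1_word2 phrases, `*`, and `*n` (including `*0`). It is a sketch under stated assumptions, not the platform's engine: the function name `matches` and the `max_gap` parameter are hypothetical, the default gap of 3 stands in for the language-specific limit, and operators such as `[ ]`, `{ }`, `!`, `<`, `>`, and concepts are not modelled.

```python
import re


def matches(pattern, utterance, max_gap=3):
    """Toy intent-pattern matcher for a subset of the syntax:
    plain words (up to max_gap extra words between them), word1_word2
    phrases, * (any number of words) and *n (exactly n words)."""
    words = re.findall(r"\w+", utterance.lower())
    parts = pattern.lower().split()

    def search(k, pos, lo, hi):
        # k: next pattern part; pos: first unused utterance index;
        # lo..hi: allowed number of skipped words before the next part.
        if k == len(parts):
            return len(words) - pos >= lo  # a trailing *n still needs n words
        part = parts[k]
        if part == "*":
            return search(k + 1, pos, 0, len(words))
        if part.startswith("*") and part[1:].isdigit():
            n = int(part[1:])
            return search(k + 1, pos, n, n)
        phrase = part.split("_")  # word1_word2 must be consecutive
        last_start = min(pos + hi, len(words) - len(phrase))
        for start in range(pos + lo, last_start + 1):
            if words[start:start + len(phrase)] == phrase:
                if search(k + 1, start + len(phrase), 0, max_gap):
                    return True
        return False

    # Any number of words may precede the first pattern word.
    return search(0, 0, 0, len(words))
```

For example, `matches("transfer funds", "I want to transfer some of my funds")` succeeds because only three words separate the pattern words, while `"transfer a bit of my funds"` needs `transfer * funds`.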
Negative Patterns
Negative patterns can be used to reject an intent that was detected when a particular phrase is also present in the utterance. This helps filter false positives out of the matched intents.
User Utterance: “I was transferring funds when I got network failure error”
Intent Detected: Transfer Funds
Intended Intent: Register Complaint
Add the negative patterns (network failure), (error), and (technical issue) to the Transfer Funds intent.
User Utterance: “I was transferring funds when I got network failure error”
or “I was transferring funds when I faced a technical issue”
or “I got an error during transfer funds process.”
Intent Rejected: Transfer Funds
Intent Triggered: Register Complaint
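The filtering step can be sketched as follows. This toy assumes Register Complaint was already among the candidate intents and only shows how a negative pattern removes Transfer Funds; `NEGATIVE_PATTERNS`, `contains_phrase`, and `filter_intents` are hypothetical names, and negative patterns are simplified to phrases whose words must be consecutive.

```python
import re

# Hypothetical configuration mirroring the example above: each intent maps
# to its negative patterns, here modelled as plain phrases.
NEGATIVE_PATTERNS = {
    "Transfer Funds": ["network failure", "error", "technical issue"],
}


def contains_phrase(words, phrase):
    """True if the phrase occurs as consecutive words (a simplification)."""
    target = phrase.split()
    size = len(target)
    return any(words[i:i + size] == target for i in range(len(words) - size + 1))


def filter_intents(candidates, utterance):
    """Drop every candidate intent whose negative pattern occurs in the utterance."""
    words = re.findall(r"\w+", utterance.lower())
    return [
        intent
        for intent in candidates
        if not any(contains_phrase(words, p) for p in NEGATIVE_PATTERNS.get(intent, []))
    ]
```

Calling `filter_intents(["Transfer Funds", "Register Complaint"], "I got an error during transfer funds process.")` leaves only Register Complaint.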
Patterns for Entity Extraction
Patterns can be used to identify entity values in the user utterance based on their position and occurrence.
Intent pattern operators such as { … }, [ … ], !, and ~concepts can also be used for entity extraction. The following use cases show how the patterns can be applied.
Every entity pattern must include some form of * (such as *, *n, or *~n) to mark where the platform should look for the entity value.
Continuing with the Banking Bot example, the Transfer Funds intent needs two entities, ToAccount and FromAccount. The following patterns show how to extract them.
Pattern 1: word1 * word2
This can be used as a positional wildcard that indicates the expected position of the entity.
Pattern for ToAccount entity: to * from
User Utterance: Transfer funds to ABC123 from my account.
Entity Extracted: ToAccount = ABC123
User Utterance not resulting in entity extraction: “transfer funds for ABC123 from my account”
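A toy model of the positional wildcard in `to * from` is shown below. The function `extract_between` is hypothetical; it assumes the captured value must contain at least one word and ends at the first `right` word after `left`.

```python
import re


def extract_between(utterance, left, right):
    """Toy model of the entity pattern 'left * right': return the words
    between an occurrence of `left` and the next `right` (at least one word)."""
    words = re.findall(r"\w+", utterance)
    lower = [w.lower() for w in words]
    for i, word in enumerate(lower):
        if word != left:
            continue
        for j in range(i + 2, len(lower)):  # i + 2 leaves room for one captured word
            if lower[j] == right:
                return " ".join(words[i + 1:j])
    return None
```

The original casing is preserved in the result, so `extract_between("Transfer funds to ABC123 from my account.", "to", "from")` yields `"ABC123"`.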
Pattern 2: word1 *n
This can be used as a positional wildcard that locates the entity by the number of words after the specified word1: the n words following word1 are taken as the entity value. If fewer than n words follow word1, the next occurrence of word1 is tried.
Pattern for FromAccount entity: from *2
User Utterance: Transfer funds to ABC123 from my account.
Entity Extracted: FromAccount = my account
User Utterance: Transfer funds to ABC123 from XYZ321 that is from my account.
Entity Extracted: FromAccount = my account
User Utterance not resulting in entity extraction: “transfer funds to ABC123 using my account”
Extension to Pattern 2: word1 *~n
Similar to Pattern 2, but extracts up to n words, as many as are available. Note that an entity must extract at least one word, so *~1 is effectively the same as *1.
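Both forms can be sketched with one hypothetical function, `extract_after`: `up_to=False` models `word1 *n` and `up_to=True` models `word1 *~n`. One assumption is made loudly here: the toy does not let a wildcard cross a sentence boundary, which is what gives the "try the next occurrence" rule an observable effect. It does not model entity-type validation.

```python
import re


def extract_after(utterance, anchor, n, up_to=False):
    """Toy model of 'anchor *n' (exactly n words) and, with up_to=True,
    'anchor *~n' (between 1 and n words, as many as are available)."""
    if n < 1:
        raise ValueError("an entity pattern must capture at least one word")
    # Assumption: a wildcard does not cross a sentence boundary, so each
    # sentence is searched separately.
    for sentence in re.split(r"[.!?]+", utterance):
        words = re.findall(r"\w+", sentence)
        for i, word in enumerate(words):
            if word.lower() != anchor:
                continue
            following = words[i + 1:i + 1 + n]
            if len(following) == n or (up_to and following):
                return " ".join(following)
            # Too few words after this occurrence: try the next one.
    return None
```

In `"Pay from savings. Then transfer from my account."`, the first `from` has only one word after it in its sentence, so `extract_after(..., "from", 2)` moves on and returns `"my account"`.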
Pattern 3: a combination of word1 * word2 and word3 *n
This combines patterns that describe where in the user utterance the entity value is likely to be found with patterns that describe how many words make up the value.
Pattern for ToAccount entity: “to * from” and “from to *1”
Pattern for FromAccount entity: “from * to” and “to from *2”
User Utterance: Transfer funds to ABC123 from my account.
or Transfer funds from my account to ABC123.
Entity Extracted: ToAccount = ABC123 and FromAccount = my account
User Utterance not resulting in entity extraction: “transfer funds for ABC123 using my account”
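The combination can be sketched as a small matcher plus an ordered list of patterns per entity; the first pattern that captures a value wins. This is a toy under stated assumptions, not the platform's matcher: `extract`, `first_extraction`, `TO_ACCOUNT`, and `FROM_ACCOUNT` are hypothetical names, plain words may be separated by up to `max_gap` extra words, `*` captures one or more words up to the next pattern word, and `*n` captures exactly n words immediately after the previous word.

```python
import re


def extract(pattern, utterance, max_gap=3):
    """Toy entity-pattern matcher for plain words, one * (captures 1+ words
    up to the next pattern word) or one *n (captures exactly n words)."""
    words = re.findall(r"\w+", utterance)
    lower = [w.lower() for w in words]
    parts = pattern.lower().split()

    def search(k, pos, hi):
        # Returns the captured words (a list) on success, None on failure.
        if k == len(parts):
            return []
        part = parts[k]
        if part == "*":
            if k + 1 == len(parts):  # trailing *: capture the rest
                return words[pos:] or None
            for end in range(pos + 1, len(words)):
                if lower[end] == parts[k + 1]:
                    rest = search(k + 2, end + 1, max_gap)
                    if rest is not None:
                        return words[pos:end] + rest
            return None
        if part.startswith("*") and part[1:].isdigit():
            n = int(part[1:])
            if n < 1 or pos + n > len(words):
                return None
            rest = search(k + 1, pos + n, 0)  # next word must follow directly
            return None if rest is None else words[pos:pos + n] + rest
        for start in range(pos, min(pos + hi, len(words) - 1) + 1):
            if lower[start] == part:
                rest = search(k + 1, start + 1, max_gap)
                if rest is not None:
                    return rest
        return None

    captured = search(0, 0, len(words))
    return " ".join(captured) if captured else None


def first_extraction(patterns, utterance):
    """Try each pattern in the listed order; the first capture wins."""
    for pattern in patterns:
        value = extract(pattern, utterance)
        if value is not None:
            return value
    return None


TO_ACCOUNT = ["to * from", "from to *1"]
FROM_ACCOUNT = ["from * to", "to from *2"]
```

For "Transfer funds from my account to ABC123.", `to * from` finds no `from` after `to`, so the second ToAccount pattern, `from to *1`, supplies `"ABC123"`.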
Pattern 4: [ word1 word2 ] *
This can be used for patterns with a group of words or concepts, one of which must be present in the utterance. The order within the group matters (see the [ … ] description under intent detection for details).
Pattern for ToAccount entity: “to * [ using from ]” and “[ using from ] to *1”
Pattern for FromAccount entity: “[ using from ] * to” and “to [ using from ] *”
User Utterance: Transfer funds to ABC123 from my account.
or Transfer funds using my account to ABC123.
Entity Extracted: ToAccount = ABC123 and FromAccount = my account
User Utterance not resulting in entity extraction: “transfer funds for ABC123 using my account”
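Why group order matters can be sketched with a toy model of `to * [ using from ]`. The function `extract_before_group` is hypothetical, uses only the first occurrence of the anchor, and assumes each group member is tried against the rest of the utterance before the next member is considered, so the first member that matches ends the search.

```python
import re


def extract_before_group(utterance, anchor, group):
    """Toy model of 'anchor * [ g1 g2 ... ]' using the first occurrence of anchor."""
    words = re.findall(r"\w+", utterance)
    lower = [w.lower() for w in words]
    if anchor not in lower:
        return None
    start = lower.index(anchor) + 1
    for member in group:  # group members are tried in the order listed
        for end in range(start + 1, len(lower)):  # capture at least one word
            if lower[end] == member:
                return " ".join(words[start:end])  # first hit: rest of group ignored
    return None
```

With the utterance "Transfer funds to ABC123 from savings using my card", the group `[ using from ]` captures "ABC123 from savings", while `[ from using ]` captures "ABC123".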
Pattern 5: ~CustomConcept *
This pattern uses concepts. You can create your own custom concepts and use them to define patterns.
Pattern for ToAccount entity: “to * ~in” and “~in to *”
Pattern for FromAccount entity: “~in * to” and “to ~in *”
Custom Concept: ~in – (using) (from)
User Utterance: Transfer funds to ABC123 using my account.
or Transfer funds from my account to ABC123.
Entity Extracted: ToAccount = ABC123 and FromAccount = my account
User Utterance not resulting in entity extraction: “transfer funds to ABC123 of my account”
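A custom concept can be modelled as a named set of words: a pattern part such as `~in` matches any member of the set. The sketch below uses the FromAccount patterns above, plus ToAccount patterns written with the same concept (`to * ~in` and `~in to *`). It is a toy: `CONCEPTS`, `MAX_GAP`, `part_matches`, `extract`, and `first_extraction` are hypothetical names, and only the two pattern shapes `A * B` and `A B *` are supported.

```python
import re

# Hypothetical concept table mirroring the custom concept ~in above.
CONCEPTS = {"~in": {"using", "from"}}
MAX_GAP = 3  # assumed number of extra words allowed between two pattern words


def part_matches(part, word):
    """A pattern part matches a word literally or, for ~concepts, by membership."""
    if part.startswith("~"):
        return word in CONCEPTS.get(part, set())
    return word == part


def extract(pattern, utterance):
    """Toy matcher for just two shapes: 'A * B' and 'A B *'."""
    words = re.findall(r"\w+", utterance)
    lower = [w.lower() for w in words]
    a, b, c = pattern.lower().split()
    for i, word in enumerate(lower):
        if not part_matches(a, word):
            continue
        if b == "*":  # 'A * B': capture up to the next B
            for j in range(i + 2, len(lower)):
                if part_matches(c, lower[j]):
                    return " ".join(words[i + 1:j])
        elif c == "*":  # 'A B *': B within MAX_GAP words of A, capture the rest
            for j in range(i + 1, min(i + 2 + MAX_GAP, len(lower) - 1)):
                if part_matches(b, lower[j]):
                    return " ".join(words[j + 1:])
    return None


def first_extraction(patterns, utterance):
    """Try each pattern in the listed order; the first capture wins."""
    for pattern in patterns:
        value = extract(pattern, utterance)
        if value is not None:
            return value
    return None


TO_ACCOUNT = ["to * ~in", "~in to *"]
FROM_ACCOUNT = ["~in * to", "to ~in *"]
```

Adding a new preposition then only requires extending the concept set, not rewriting every pattern that uses it.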
Pattern 6: ~intent
Useful in entity patterns and custom entities
Words that are used in the intent identification are dynamically marked with the ~intent concept. This can then be used as an anchor or reference point for some entity patterns.
Sample Pattern: “~intent~meeting~plural”
User Utterance not resulting in entity extraction: show my meetings.
User Utterance might mark the entity: “schedule a presentation called Meeting the Sales Goals”
Pattern 7: $currentEntity
Useful for delaying the evaluation of a pattern until the entity is actually processed. Normally, entity patterns are evaluated when a dialog starts and on each new input, to check whether any words need to be protected until that entity is processed. This might not always be desirable, especially for string entities.
Pattern: “$currentEntity=TaskTitle ‘called *”
With this rule, the pattern is evaluated when the dialog flow reaches the TaskTitle node.
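The gating idea can be sketched as a filter over entity patterns: unguarded patterns are always candidates, while a pattern prefixed with `$currentEntity=<Name>` is considered only once the dialog reaches that entity's node. This is a hypothetical model, not the platform's implementation; `EntityPattern`, `parse_pattern`, `patterns_to_evaluate`, and the TaskDate entity are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

GUARD_PREFIX = "$currentEntity="


@dataclass
class EntityPattern:
    entity: str
    pattern: str
    current_entity: Optional[str] = None  # None: evaluate at dialog start and on new input


def parse_pattern(entity: str, raw: str) -> EntityPattern:
    """Split an optional '$currentEntity=<Name>' prefix off the pattern text."""
    if raw.startswith(GUARD_PREFIX):
        guard, _, rest = raw[len(GUARD_PREFIX):].partition(" ")
        return EntityPattern(entity, rest, guard)
    return EntityPattern(entity, raw)


def patterns_to_evaluate(patterns: List[EntityPattern], current_node: str) -> List[EntityPattern]:
    """Keep unguarded patterns, plus guarded ones whose node has been reached."""
    return [p for p in patterns if p.current_entity in (None, current_node)]
```

Here the guarded TaskTitle pattern stays out of the candidate list while the dialog is at any other node, so words such as "called" are not protected early.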