Using patterns can help to improve NLP interpreter accuracy.
In this document, we elaborate on the pattern syntax and how it can be used in intent detection and entity extraction.
Things to Remember:
- Patterns are to be used as a last resort, only for cases where the ML engine cannot be used. Examples of such cases include training the bot to recognize idiomatic utterances and command-like utterances.
- Patterns are evaluated in the order in which they are listed. Once a match is found, the remaining patterns are not evaluated, so add patterns in order from most restrictive to least restrictive.
- Only one wildcard (*) is allowed in a pattern.
- While most of the features are supported in all languages, there are some exceptions, see here for more details.
- Use a minimum of 3 words.
- Use words in their canonical forms (i.e. infinitive verbs, singular nouns).
- Use lowercase both for words and their synonyms.
- Use the US spelling of words (e.g. normalize instead of normalise).
- Avoid using determiners and pronouns (the, a, my, that).
- Avoid using digits.
- Avoid using entity values in defining a task pattern.
- Don’t use elision (e.g. what’s).
- Don’t use special characters such as () & / \ $ [ ] + *.
- Don’t use punctuation such as – , . ! ? ‘ “.
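The ordering rule above (first match wins, so list the most restrictive pattern first) can be illustrated with a small sketch. This is a toy model, not the platform's matcher: `matches` and `first_match` are hypothetical helpers, and `matches` only checks that the pattern words occur in the utterance in order.

```python
def matches(pattern, utterance):
    """Toy matcher: every pattern word occurs in the utterance, in order."""
    remaining = iter(utterance.lower().split())
    # "word in remaining" consumes the iterator, which enforces the ordering.
    return all(word in remaining for word in pattern.split())


def first_match(ordered_patterns, utterance):
    """Return the first listed pattern that matches; later ones are never tried."""
    for pattern in ordered_patterns:
        if matches(pattern, utterance):
            return pattern
    return None


utterance = "cancel my phone order"
restrictive_first = ["cancel phone order", "cancel order"]
loose_first = ["cancel order", "cancel phone order"]

print(first_match(restrictive_first, utterance))  # cancel phone order
print(first_match(loose_first, utterance))        # cancel order
```

With the loose pattern listed first, the more specific pattern is never reached, which is why the ordering matters.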
Patterns for Intent Detection
The following pattern syntax can be configured for intent detection.
Note: Pattern matching is performed on the canonical form of a sentence. Therefore, the words in a pattern should be in their canonical form.
Pattern | Description
---|---
word1 word2 … wordn | All the specified words must be present in the user utterance in the same order, with up to 3 (language-specific) additional words allowed between any two consecutive pattern words and any number of words before and after the specified set of words. Note: The limit of three wildcard words is configurable in the advanced NLP options.
word1_word2 | Compound words: A compound word is treated as one word, which affects how the canonical form is constructed. No additional words are allowed between word1 and word2. This ensures that a sequence of tokens is read as a phrase. Usage is restricted to words; concepts are not allowed. Note: There should be no space between word1, word2, and _.
_word1 | Ensures that word1 in the user utterance is not marked as Used Up by the platform and is still considered for entity extraction. This is useful when entity words are used in the intent pattern. For example, the pattern (buy ~number ticket) will match “buy 2 tickets for the show at 7”; each of the three pattern words is internally tracked as used up, so a ticket number entity will first consider “7” and not “2”, because “7” is not used up. If the pattern is changed to (buy _~number ticket), then “2” is still matched for the intent pattern, but the word is not marked as used up and the entity will consider it. The leading underscore is therefore useful for pattern tokens that constitute important data.
word1 *n word2 | Exactly n additional words between the specified words/phrases.
word1 *~n word2 | Up to n additional words between the specified words/phrases. Note: The FM engine automatically generates variations with this wildcard, so unless it is a special scenario, developers need not take any further action. An advanced NLP configuration setting allows developers to change the default number of possible wildcards between tokens.
word1 *0 word2 | Disables wildcards between two tokens. Similar to the underscore between two words, but it can be used between two concepts or within [ ] and { } groups (available from version 7.1 onwards). Note: If a phrase needs to be treated as an idiom or a complete unit, then instead of using the _ or *0 syntax, the phrase can be defined in a concept. The advantages of this usage are a specific sequence of words, correct canonical handling, easy reuse, and better performance.
< word1 word2 | Indicates that the match for word1 should start at the beginning of the sentence. Add a space after the angle bracket. Note: ‘<’ indicates the start of the sentence; the next token must match the first word.
word1 word2 > | Indicates that the exact match must be immediately followed by the end of the sentence. Add a space before the closing angle bracket.
!abc | Indicates that the word/concept “abc” should not exist anywhere in the user utterance after this token. No space between ! and the word/concept.
!!abc | The very next word/concept should not be “abc”. No space between !! and the word/concept.
[ … ] | Defines a group of words/concepts; the match should be against exactly one member of the group declared in [ ]. Be aware that when a match is found, the rest of the group is ignored, so order the words accordingly. Note: The brackets should not be attached to a word, i.e. maintain a space between each bracket and the adjacent word.
{ … } | Defines an optional group of words/concepts; the match is against zero or one of the words/patterns declared in { }. Be aware that when a match is found, the rest of the group is ignored, so order the words accordingly. Note: The brackets should not be attached to a word, i.e. maintain a space between each bracket and the adjacent word.
( … ) | Contains a sub-pattern, i.e. when a pattern or part of a pattern is enclosed in these parentheses, it is treated as a pattern, unlike [ ] and { }. This is the default setting, i.e. a pattern written as word1 word2 is treated as ( word1 word2 ). Commonly used explicitly to define a sub-pattern inside [ ] or { }.
<< … >> | Used to find words in any order anywhere in the sentence. Due to the risk of false positives, you are advised not to use this pattern.
‘word1 | If you quote words or use words that are not in canonical form, the system will restrict itself to what you used in the pattern.
word1~concept2, ~concept1~concept2 (from ver8.0) | A word (word1) or concept (concept1) can be matched only if it is also a member of another concept (concept2). The most common usage of this is through the system concepts that are dynamically added for each POS tag.
word1 * word2 | Zero to an unlimited number of additional words between the specified words/phrases.
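The gap wildcards listed above differ only in how many extra words they tolerate between two pattern words. The sketch below models that with regular expressions. It is an illustration under stated assumptions, not the platform's engine: `gap_regex`, `compile_pattern`, and `matches` are hypothetical helpers, the default limit of 3 is taken from the text, and canonical-form handling is reduced to lowercasing.

```python
import re

DEFAULT_MAX_GAP = 3  # assumed default; the text says this limit is configurable


def gap_regex(token):
    """Translate a gap token into a regex for the extra words it allows."""
    if token == "*":
        return r"(?:\S+\s+)*"                          # zero to unlimited words
    if token.startswith("*~"):
        return r"(?:\S+\s+){0,%d}" % int(token[2:])    # up to n words
    return r"(?:\S+\s+){%d}" % int(token[1:])          # exactly n words (*0 = none)


def compile_pattern(pattern):
    """Build a regex for a pattern made of plain words and gap tokens."""
    parts, pending_gap = [], None
    for token in pattern.split():
        if token.startswith("*"):
            pending_gap = gap_regex(token)
            continue
        if parts:
            # Without an explicit gap token, fall back to the default limit.
            parts.append(r"\s+" + (pending_gap or r"(?:\S+\s+){0,%d}" % DEFAULT_MAX_GAP))
        parts.append(re.escape(token))
        pending_gap = None
    return re.compile(r"(?<!\S)" + "".join(parts) + r"(?!\S)")


def matches(pattern, utterance):
    return compile_pattern(pattern).search(utterance.lower()) is not None


print(matches("cancel order", "cancel last weeks big order"))         # True
print(matches("cancel order", "cancel that very last big order"))     # False
print(matches("cancel *1 order", "cancel order"))                     # False
print(matches("cancel *0 order", "cancel my order"))                  # False
print(matches("cancel * order", "cancel that very last big order"))   # True
```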
Pattern Operators
- AND: ( X Y ): An ordered relationship of words in sequence. This is the default setting. i.e. when you specify a pattern as cancel order it is the same as (cancel order).
For example, (Cancel Order) matches Cancel my phone order but doesn’t match I have a pending order for an iPhone X, can I cancel. Bot Builder tool uses patterns with increasing numbers of wildcards between words (up to 3 for an intent). So a pattern of Cancel Order can match:
- cancel order
- cancel my order
- cancel that last order
- cancel last weeks big order
- OR: [X Y Z]: Any of these can be interchangeably used in the user utterance. For example, ([get make] me [food drink dessert]) will match any of the below utterances:
- Get me food
- Make me a drink
- Get me a drink
- Get me a dessert
- Make me some quick food
- NOT: !X: Words that should not appear in the user utterance for an intent match. For example, suppose (!forecast) is defined as a pattern for an intent named Get current weather, and the bot supports another intent called Get 3-day weather forecast.
- User utterance: Planning a trip to California get me the forecast
- will not match Get current weather
- will match Get 3-day weather forecast
Note that the !word means not after this point. So (!forecast the weather) and (get the weather !forecast) are different. The utterance get the forecast for the weather matches the second but not the first.
- Optional: {X}: For example, with {phone}, the platform treats the user utterances Get me a phone number and get me a number equally.
- Enforce Phrase: X_Y: To enforce occurrence of the phrase as is in the user utterance, without any words in between. For example, transfer_funds. The utterance transfer funds or I want to transfer funds will match but not Can I transfer some funds.
- Concepts: ~: Platform has a large set of inbuilt concepts that developers can use to define a pattern. For example, (I [like love] ~world_country) will match
- I like India
- I love traveling to Australia
- I would like to visit an African country
- Unordered: <<, >>: Used to find words in any order. For example, <<Cancel Order>> matches Cancel my phone order and also I have a pending order for an iPhone X, can I cancel
- Start/End of Statement: <, >: For example, ( transfer fund > ) will match I want to transfer funds but will not match transfer funds today.
- Quote: ‘: If you quote words or use words that are not in canonical form, the system will restrict itself to what you used in the pattern. For example, (like to transfer funds) matches I would like to transfer funds from my account but not I really liked transfer funds process.
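The OR, Optional, and NOT operators described above can be sketched as follows. This is a toy model only: `expand`, `match_from`, and `matches` are hypothetical helpers, groups are written with spaces around the brackets, the gap limit of 3 is the default mentioned in the text, and canonical forms and concepts are not handled.

```python
from itertools import product

MAX_GAP = 3  # assumed default number of extra words between pattern words


def expand(pattern):
    """Expand [ ] (exactly one member) and { } (zero or one) into word sequences."""
    slots, group = [], None
    for token in pattern.split():
        if token == "[":
            group = []            # exactly one member must be chosen
        elif token == "{":
            group = [None]        # None stands for "skip this optional group"
        elif token in ("]", "}"):
            slots.append(group)
            group = None
        elif group is not None:
            group.append(token)
        else:
            slots.append([token])
    return [[w for w in combo if w is not None] for combo in product(*slots)]


def match_from(tokens, words, pos, first):
    """Match tokens in order from position pos, allowing limited gaps."""
    if not tokens:
        return True
    head, rest = tokens[0], tokens[1:]
    if head.startswith("!"):
        # NOT: the word must not occur anywhere after the current point.
        return head[1:] not in words[pos:] and match_from(rest, words, pos, first)
    # Any number of words may precede the first pattern word.
    limit = len(words) if first else min(len(words), pos + MAX_GAP + 1)
    return any(words[i] == head and match_from(rest, words, i + 1, False)
               for i in range(pos, limit))


def matches(pattern, utterance):
    words = utterance.lower().split()
    return any(match_from(seq, words, 0, True) for seq in expand(pattern))


print(matches("[ get make ] me [ food drink dessert ]", "Make me some quick food"))  # True
print(matches("{ phone } number", "get me a number"))                                # True
print(matches("!forecast the weather", "get the forecast for the weather"))          # False
print(matches("get the weather !forecast", "get the forecast for the weather"))      # True
```

The last two calls reproduce the point made for NOT: the position of !forecast decides which part of the utterance is checked.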
Negative Patterns
Negative Patterns can be used to eliminate intents detected in the presence of a phrase. This will help filter the matched intents for false positives.
User Utterance: “I was transferring funds when I got network failure error”
Intent Detected: Transfer Funds
Intended Intent: Register Complaint
Add a Negative Pattern (network failure) (error) (technical issue) for the intent Transfer Funds
User Utterance: “I was transferring funds when I got network failure error”
or “I was transferring funds when I faced a technical issue”
or “I got an error during transfer funds process.”
Intent Rejected: Transfer Funds
Intent Triggered: Register Complaint
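As a rough sketch of the filtering step above, negative patterns can be thought of as a veto applied to the list of candidate intents. The helper names and the `candidates` list are hypothetical; how the platform produces and ranks candidates is not modelled, and phrases are compared literally rather than in canonical form.

```python
# Negative patterns per intent, taken from the example above.
NEGATIVE_PATTERNS = {
    "Transfer Funds": ["network failure", "error", "technical issue"],
}


def contains_phrase(phrase, utterance):
    """True if the words of phrase appear consecutively in the utterance."""
    words = utterance.lower().replace(".", " ").split()
    target = phrase.split()
    return any(words[i:i + len(target)] == target
               for i in range(len(words) - len(target) + 1))


def filter_intents(candidates, utterance):
    """Drop every candidate intent whose negative patterns occur in the utterance."""
    return [
        intent for intent in candidates
        if not any(contains_phrase(p, utterance)
                   for p in NEGATIVE_PATTERNS.get(intent, []))
    ]


candidates = ["Transfer Funds", "Register Complaint"]  # assumed detection output
print(filter_intents(candidates, "I got an error during transfer funds process."))
# ['Register Complaint']
print(filter_intents(candidates, "I want to transfer funds"))
# ['Transfer Funds', 'Register Complaint']
```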
Patterns for Entity Extraction
Patterns can be used to identify the values for entities in user utterance based upon their position and occurrence in user utterance.
Intent pattern operators such as {…}, […], !, and ~concepts can also be used for entity extraction. The following use cases show how the patterns can be applied.
Every entity pattern has to include a * (of some form) to represent where the platform should look for an entity value.
The examples below continue with the Banking Bot and its Transfer Funds intent. This intent needs two entities, ToAccount and FromAccount.
Pattern 1: word1 * word2
This can be used as a positional wildcard that indicates the expected position of the entity.
Pattern for ToAccount entity: to * from
User Utterance: Transfer funds to ABC123 from my account.
Entity Extracted: ToAccount = ABC123
User Utterance not resulting in entity extraction: “transfer funds for ABC123 from my account”
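A positional wildcard of this kind behaves much like a capture group between two anchor words. The following is a minimal sketch using a regular expression; `extract_between` is a hypothetical helper and does not reflect how the platform validates the captured value against the entity type.

```python
import re


def extract_between(word1, word2, utterance):
    """Return the words found between word1 and word2, or None if absent."""
    match = re.search(
        rf"\b{re.escape(word1)}\s+(.+?)\s+{re.escape(word2)}\b",
        utterance,
        re.IGNORECASE,
    )
    return match.group(1) if match else None


print(extract_between("to", "from", "Transfer funds to ABC123 from my account."))
# ABC123
print(extract_between("to", "from", "transfer funds for ABC123 from my account"))
# None
```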
Pattern 2: word1 *n
This can be used as a positional wildcard * that indicates the expected position of the entity based upon the number of words after the specified word1. That is, the n words after word1 are considered for the entity; if n words are not present, the platform looks for the next occurrence of word1.
Pattern for FromAccount entity: from *2
User Utterance: Transfer funds to ABC123 from my account.
Entity Extracted: FromAccount = my account
User Utterance: Transfer funds to ABC123 from XYZ321 that is from my account.
Entity Extracted: FromAccount = my account
User Utterance not resulting in entity extraction: “transfer funds to ABC123 using my account”
Extension to Pattern 2: word1 *~n
Similar to Pattern 2 above, but extracts up to n words, depending on how many words are available. Note that an entity pattern needs to extract something, so *~1 is effectively the same as *1.
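The difference between *n and *~n can be sketched as below. This covers only the simple case of the first occurrence of word1; how the platform chooses among several occurrences of word1 is not modelled. `extract_after` and its `up_to` flag are hypothetical names.

```python
def extract_after(word1, n, utterance, up_to=False):
    """Take n words after the first word1 (*n), or up to n words (*~n)."""
    words = utterance.rstrip(".").split()
    lowered = [w.lower() for w in words]
    if word1 not in lowered:
        return None
    start = lowered.index(word1) + 1
    following = words[start:start + n]
    # *n needs exactly n words; *~n accepts fewer, but at least one.
    if len(following) == n or (up_to and following):
        return " ".join(following)
    return None


print(extract_after("from", 2, "Transfer funds to ABC123 from my account."))  # my account
print(extract_after("to", 3, "send it to ABC123"))                            # None
print(extract_after("to", 3, "send it to ABC123", up_to=True))                # ABC123
```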
Pattern 3: a combination of word1 * word2 and word3 *n
This can be used as a combination of patterns for the likely location in the user utterance that the entity value could be found and the number of words contributing to the entity.
Pattern for ToAccount entity: “to * from” and “from to *1”
Pattern for FromAccount entity: “from * to” and “to from *2”
User Utterance: Transfer funds to ABC123 from my account.
or Transfer funds from my account to ABC123.
Entity Extracted: ToAccount = ABC123 and FromAccount = my account
User Utterance not resulting in entity extraction: “transfer funds for ABC123 using my account”
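One way to picture a combination of entity patterns is to try each pattern in turn and take the first value found. The sketch below does this with regular expressions. It is an approximation under assumptions: `compile_entity_pattern` and `extract` are hypothetical helpers, plain words may be separated by up to 3 extra words (the default mentioned earlier), and the captured value is not validated against the entity type.

```python
import re

GAP = r"(?:\S+\s+){0,3}"  # assumed default gap between two plain words


def compile_entity_pattern(pattern):
    """Translate words, * and *n into a regex with one capture group."""
    pieces, prev_plain = [], False
    for token in pattern.split():
        if token == "*":
            piece, plain = r"(\S+(?:\s+\S+)*?)", False       # one or more words
        elif token.startswith("*"):
            count = int(token[1:])
            piece, plain = r"(\S+(?:\s+\S+){%d})" % (count - 1), False  # exactly n
        else:
            piece, plain = re.escape(token), True
        if pieces:
            pieces.append(r"\s+" + (GAP if plain and prev_plain else ""))
        pieces.append(piece)
        prev_plain = plain
    return re.compile(r"(?<!\S)" + "".join(pieces) + r"(?!\S)", re.IGNORECASE)


def extract(patterns, utterance):
    """Return the value captured by the first pattern that matches."""
    text = utterance.rstrip(".")
    for pattern in patterns:
        match = compile_entity_pattern(pattern).search(text)
        if match:
            return match.group(1)
    return None


to_patterns = ["to * from", "from to *1"]
from_patterns = ["from * to", "to from *2"]

for utterance in ("Transfer funds to ABC123 from my account.",
                  "Transfer funds from my account to ABC123."):
    print(extract(to_patterns, utterance), "|", extract(from_patterns, utterance))
# ABC123 | my account
# ABC123 | my account
```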
Pattern 4: [ word1 word2 ] *
This can be used for patterns that contain a group of words or concepts, of which at least one should be present in the utterance. The order within the group is important (see the intent detection section above for details).
Pattern for ToAccount entity: “to * [ using from ]” and “[ using from ] to *1”
Pattern for FromAccount entity: “[ using from ] * to” and “to [ using from ] *”
User Utterance: Transfer funds to ABC123 from my account.
or Transfer funds using my account to ABC123.
Entity Extracted: ToAccount = ABC123 and FromAccount = my account
User Utterance not resulting in entity extraction: “transfer funds for ABC123 using my account”
Pattern 5: ~CustomConcept *
This can be used for patterns that contain concepts. You can create your own custom concepts and use them to define patterns.
Pattern for ToAccount entity: “to * from” and “from to *”
Pattern for FromAccount entity: “~in * to” and “to ~in *”
Custom Concept: ~in – (using) (from)
User Utterance: Transfer funds to ABC123 using my account.
or Transfer funds from my account to ABC123.
Entity Extracted: ToAccount = ABC123 and FromAccount = my account
User Utterance not resulting in entity extraction: “transfer funds to ABC123 of my account”
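A custom concept can be pictured as a named set of interchangeable words that is substituted into the pattern before matching. The sketch below applies that idea to the FromAccount patterns above. The `CONCEPTS` table mirrors the ~in concept from the example; the helper names, the regex translation, and the gap limit of 3 are assumptions for illustration only.

```python
import re

CONCEPTS = {"~in": ["using", "from"]}  # the custom concept from the example
GAP = r"(?:\S+\s+){0,3}"               # assumed default gap between anchor words


def compile_entity_pattern(pattern):
    """Translate anchor words, ~concepts and * into a regex with one capture."""
    tokens = pattern.split()
    pieces, prev_anchor = [], False
    for token in tokens:
        anchor = token != "*"
        if anchor:
            # A concept expands to any of its members; a plain word to itself.
            members = CONCEPTS.get(token, [token])
            piece = "(?:" + "|".join(re.escape(m) for m in members) + ")"
        else:
            piece = r"(\S.*?)"
        if pieces:
            pieces.append(r"\s+" + (GAP if anchor and prev_anchor else ""))
        pieces.append(piece)
        prev_anchor = anchor
    # A trailing * runs to the end of the utterance.
    tail = "$" if tokens[-1] == "*" else r"(?!\S)"
    return re.compile(r"(?<!\S)" + "".join(pieces) + tail, re.IGNORECASE)


def extract(patterns, utterance):
    text = utterance.rstrip(".")
    for pattern in patterns:
        match = compile_entity_pattern(pattern).search(text)
        if match:
            return match.group(1)
    return None


from_patterns = ["~in * to", "to ~in *"]
print(extract(from_patterns, "Transfer funds to ABC123 using my account."))  # my account
print(extract(from_patterns, "Transfer funds from my account to ABC123."))   # my account
print(extract(from_patterns, "transfer funds to ABC123 of my account"))      # None
```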
Pattern 6: ~intent
Useful in entity patterns and custom entities.
Words that are used in the intent identification are dynamically marked with the ~intent concept. This can then be used as an anchor or reference point for some entity patterns.
Sample Pattern: “~intent~meeting~plural”
User Utterance not resulting in entity extraction: show my meetings.
User Utterance that might mark the entity: “schedule a presentation called Meeting the Sales Goals”
Pattern 7: $currentEntity
Useful in delaying the evaluation of a pattern until the entity is actually processed. Normally, entity patterns are evaluated when a dialog starts and on new input to see if any words need to be protected until that entity is processed. This might not always be desirable, especially for strings.
Pattern: “$currentEntity=TaskTitle ‘called *”
The above rule will result in evaluating the pattern when the dialog flow has reached the TaskTitle node.
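The deferral can be pictured as a guard that is checked before a pattern is evaluated. The sketch below is a simplified reading of the behaviour described above, not the platform's implementation: `is_due` is a hypothetical helper, and `DueDate` is an invented entity name used only for contrast.

```python
PREFIX = "$currentEntity="


def is_due(pattern, current_entity):
    """Decide whether a pattern should be evaluated at the current node."""
    first_token = pattern.split()[0]
    if not first_token.startswith(PREFIX):
        return True  # ordinary patterns are evaluated on every input
    # Guarded patterns wait until the named entity is being processed.
    return first_token[len(PREFIX):] == current_entity


pattern = "$currentEntity=TaskTitle 'called *"
print(is_due(pattern, "DueDate"))        # False
print(is_due(pattern, "TaskTitle"))      # True
print(is_due("to * from", "DueDate"))    # True
```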