Knowledge Extraction

The Knowledge Extraction service lets you move your enterprise’s existing Frequently Asked Questions (FAQ) content into a Knowledge Graph, which is then used to train your assistant on these questions.

The feature supports the extraction of knowledge from unstructured content such as web pages and PDF documents as well as from structured content such as CSV files.

After completing the extraction, you can edit the questions and answers using an easy-to-use interface and organize them under the relevant Knowledge Graph nodes.

The Extraction Process

Moving data to the Knowledge Graph using the Knowledge Extraction service involves the following steps:

  1. Extracting: Extract the existing FAQ content from structured or unstructured sources of question-answer data such as PDF, web pages, and CSV files. This extraction can be done before or after creating a Knowledge Graph for the assistant you are working with.
    Note: The Knowledge Extraction service supports a specific content structure for each source type. Refer to the Supported Formats and Requirements section below for details.
  2. Editing: After a successful extraction, you can edit the question and answer text before moving it to the Knowledge Graph.
  3. Moving: You can add data into a VA before or after creating a Knowledge Graph (KG). If you try to add the extracted content to a KG before it exists, the VA automatically creates one with the VA’s name.

The Knowledge Extractor allows you to add the extracted content to the Knowledge Graph as follows:

  • Add to Knowledge Graph moves the selected questions to the root node of the Knowledge Graph. You can use this option when the required term is not yet added to the KG or when the VA does not have a Knowledge Graph.
  • Add to Specific Term: If the VA already has a Knowledge Graph, drag and drop the selected content onto the required nodes.

Extract from a Website

  1. Open the VA to which you want to extract the content.
  2. Select the Build top menu item.
  3. From the left menu, click Conversational Skills > Knowledge Graph.
  4. Under the Extracts section, click Extract from URL.
  5. Enter a Name for the extraction.
  6. Enter the URL of the page, and then click Proceed.
  7. Once the extraction is completed successfully, a success status page appears.
  8. Review & Add the relevant questions to your Knowledge Graph.

Extract from a File

Note: File size must not exceed 5MB.

 

To extract content from a file, follow the steps below. For file format details, refer to the Supported Formats and Requirements section of this article.

  1. Open the VA to which you want to extract the content.
  2. Select the Build top menu item.
  3. From the left menu, click Conversational Skills > Knowledge Graph.
  4. Under the Extracts section, select the option to extract from a file.
  5. Click Browse to locate the file (PDF or CSV).
  6. Click Proceed.
  7. For PDF files, you have the option to annotate the document before extraction.
  8. After the extraction is completed successfully, a success status page is displayed.
  9. Review & Add the relevant questions to your Knowledge Graph.

Annotate & Extract

Note: This feature was introduced in v8.0 of the Platform.

 

You might have all the FAQs related to your business in a PDF file, but not in the format mandated by the platform. Before v8.0, you could not use such files. With the Annotation tool, you can annotate a document to identify the key sections of its content. The Knowledge Extraction engine uses this information to extract the FAQs from the document.

Note: This is only applicable to PDF documents.

 

  1. Select a new or previously extracted PDF file. Note that you can use a previously extracted file provided no questions from that file are added to the Knowledge Graph.
  2. Click Annotate & Extract to annotate a newly uploaded file.
  3. The PDF document is loaded into the Annotation Tool allowing you to annotate the various sections in the document.
  4. To annotate, select the text and tag it as follows:
    1. Heading – Text thus marked identifies a question. Headings are used to train the model to identify questions, and the content between two consecutive headings is treated as the answer to the first of the two.
    2. Header – Text thus marked is ignored. Text marked as Header is used to train the model to identify and ignore such text. Avoid marking arbitrary text or paragraphs as headers, because doing so invalidates the backend ML model and produces poor results.
    3. Footer – Text thus marked is ignored. Text marked as Footer is used to train the model to identify and ignore such text. As with headers, avoid marking arbitrary text or paragraphs as footers, because doing so invalidates the backend ML model and produces poor results.
    4. Exclude – This text is not used for extraction.
    5. Ignore Page – Pages marked as ignored are not used for extraction.
    6. You can use Remove Annotation to rectify any incorrect annotations.
  5. The Knowledge Graph Engine uses the headings, headers, and footers in the extraction process. Because the KG Engine trains the model from your annotations, you need not annotate the entire document. You can annotate a couple of pages with headings, headers, and footers, then extract and review the questions. If you are satisfied, proceed with adding the questions to the Knowledge Graph; otherwise, repeat the annotation process until you get satisfactory results.
  6. Additional document information is provided:
    1. Document Info – Name, Size, and the Number of Pages of the document.
    2. Annotation Summary – Number of annotations marked in each category, for the current page and for the entire document.
  7. After you annotate, you can Extract the document.
  8. Once the content is extracted, you will see a message showing you how many questions have been found and allowing you to review and add them to the Knowledge Graph.
  9. Choosing to Review the questions takes you to a screen where you can review the extracted FAQs. This screen splits your FAQs into All Questions, Added to KG, and Not Added to the KG.
  10. The All Questions tab lists the questions extracted by the KG Engine based on the annotations and training. Click the name of a question, or select the checkboxes to choose multiple questions, and then drag and drop them onto the appropriate node of the Knowledge Graph.
  11. If you are not satisfied with the extracted content, you can always re-annotate the document. Just click on the Annotate tab to return to the annotation tool.
  12. Re-annotation follows the same procedure as above. Keep the following points in mind:
    1. You can re-annotate the document provided no questions from this file are added to the Knowledge Graph.
    2. In case questions are already added, you can choose to create a copy of the annotated document and work with it. The copy will have all the annotations intact.
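
The Heading rule described in the tagging step (the content between two consecutive headings becomes the answer to the first heading, while Header, Footer, and Exclude text is skipped) can be illustrated with a small rule-based sketch. This is not how the platform works internally: the platform trains an ML model from the annotations, whereas the function below simply applies the rule to blocks that are already tagged. The tag names and the (tag, text) input shape are assumptions made for this example.

```python
def pair_headings_with_answers(blocks):
    """Pair each heading with the text that follows it, up to the next heading.

    `blocks` is a list of (tag, text) tuples in document order. The tag values
    "heading", "header", "footer", "exclude", and "text" are hypothetical labels
    chosen for this sketch, not platform identifiers.
    """
    ignored = {"header", "footer", "exclude"}
    pairs = []
    question, answer_parts = None, []
    for tag, text in blocks:
        if tag in ignored:
            continue  # ignored text never reaches a question or an answer
        if tag == "heading":
            if question is not None:
                pairs.append((question, " ".join(answer_parts)))
            question, answer_parts = text, []
        elif question is not None:
            answer_parts.append(text)  # body text belongs to the current heading
    if question is not None:
        pairs.append((question, " ".join(answer_parts)))
    return pairs


annotated_page = [
    ("header", "Acme Bank FAQ"),
    ("heading", "How do I open an account?"),
    ("text", "Visit any branch with a photo ID."),
    ("text", "You can also apply online."),
    ("footer", "Page 1 of 3"),
    ("heading", "How do I close an account?"),
    ("text", "Submit a closure request at your branch."),
]
faq_pairs = pair_headings_with_answers(annotated_page)
```

Here `faq_pairs` holds two question-answer pairs, and neither the header nor the footer text appears in any answer.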

Edit the Extracted Content

  1. Open the VA.
  2. Select the Build top menu item.
  3. From the left pane, click Conversational Skills > Knowledge Graph.
  4. The Knowledge Extraction section displays the list of all extractions.
  5. Click the name of a successful extract you want to edit.
  6. Hover over the question-answer pair to modify it and click the Edit icon.
  7. Make the necessary changes and click Save.

Add the Extracted Content to the Knowledge Graph

There are two ways to add the extracted content to the Knowledge Graph.

From the Extracts Section

  1. Open the VA.
  2. Select the Build top menu item.
  3. From the left menu, click Conversational Skills > Knowledge Graph.
  4. From the Knowledge Extraction section, select the name of a successful extract you want to add.
  5. Drag and drop the required Q&A to the node/term you want to add. As you drag and drop, the child nodes will be expanded.
  6. You can select multiple Q&As and perform a bulk move.

From Knowledge Graph

  1. Open the VA.
  2. Select the Build top menu item.
  3. From the left pane, click Conversational Skills > Knowledge Graph.
  4. Select the node to which you want to add the question-answer pairs.
  5. Click Add from Extraction. It opens the list of successful and failed extractions.
  6. Click the name of a successful extract you want to move.
  7. Select the checkboxes next to the question-answer pairs that you want to move and then click Add.

Note: Once you move a question-answer pair from the extract to the Knowledge Graph, you cannot move it again. The platform shows a duplicate error when you try to move a question from the extract that is already present in the collection. You can edit the moved content from the Knowledge Graph. However, if the question is modified in or removed from the Knowledge Graph, you can add it from the extract again.

Supported Formats and Requirements

The Knowledge Extraction service supports extracting FAQs only from supported CSV, PDF, and URL formats.

Note that the file size must not exceed 5MB.

CSV

  • The Knowledge Extraction service interprets the text in the first column as a question and that in the second column as an answer.
  • The file must not have any headers.
  • The Knowledge Extraction service ignores any headers and the text present in the other columns.
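
As an illustration of the CSV layout described above, the following minimal Python sketch builds a two-column, header-free file body and checks it against the 5MB limit. The function name and the sample questions are made up for this example, and treating 5MB as 5 × 1024 × 1024 bytes is an assumption; the platform may measure the limit differently.

```python
import csv
import io

MAX_BYTES = 5 * 1024 * 1024  # 5MB limit from this article; binary megabytes assumed


def build_faq_csv(pairs):
    """Return CSV text with one question,answer row per pair and no header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for question, answer in pairs:
        question, answer = question.strip(), answer.strip()
        if not question or not answer:
            raise ValueError("each row needs both a question and an answer")
        # Column 1 is the question, column 2 is the answer.
        writer.writerow([question, answer])
    text = buffer.getvalue()
    if len(text.encode("utf-8")) > MAX_BYTES:
        raise ValueError("CSV exceeds the 5MB limit")
    return text


faq_csv = build_faq_csv([
    ("How do I reset my password?", "Use the Forgot Password link on the sign-in page."),
    ("What are your support hours?", "Support is available 9 AM to 6 PM, Monday to Friday."),
])
```

The `csv` module quotes any answer that contains a comma, so such answers stay in the second column instead of spilling into a third column that the service would ignore.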

PDF

  • The Knowledge Extraction service processes the content from a PDF and converts it into question-answer pairs.
  • Documents with a table of contents: A document with a table of contents is preferred. In such cases, the Knowledge Extraction service extracts the table of contents first and then uses it to parse the document and identify headings. The information in the table of contents is used to derive the hierarchy of headings (headings, subheadings, sub-subheadings, and so on). During extraction, these levels are joined with a vertical bar as the delimiter (heading | subheading | sub-subheading).
  • Documents with no table of contents: In such cases, the Knowledge Extraction service uses a pre-trained machine learning model that identifies headings based on either font style or font size. When font size is used, the heading hierarchy can also be derived.
  • The text is then formatted with a uniform header and paragraph blocks.
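
The vertical-bar hierarchy described for documents with a table of contents can be pictured with a short sketch. It assumes the table of contents has already been read into (level, title) entries, which is a hypothetical input shape chosen for this example. The service's own parsing is not documented here.

```python
def heading_paths(toc_entries):
    """Turn (level, title) table-of-contents entries into 'a | b | c' paths.

    Level 1 is a top-level heading, level 2 a subheading, and so on.
    """
    stack = []  # titles of the current heading and its ancestors
    paths = []
    for level, title in toc_entries:
        # Drop any headings at the same or a deeper level than the new one.
        del stack[level - 1:]
        stack.append(title)
        paths.append(" | ".join(stack))
    return paths


toc = [
    (1, "Accounts"),
    (2, "Opening an account"),
    (3, "Required documents"),
    (2, "Closing an account"),
    (1, "Cards"),
]
paths = heading_paths(toc)
```

With this input, the third entry becomes "Accounts | Opening an account | Required documents", which mirrors the heading | subheading | sub-subheading form mentioned above.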

Web Pages

The Knowledge Extraction service supports the following three formats of FAQ web pages:

  • Plain FAQ pages with linear question-answer pairs.
  • Pages with question hyperlinks that point to answers on the same page.
  • Pages with question hyperlinks that point to answers on a different page.

Extraction of certain FAQs on the webpage fails under the following conditions:

  • The question text is split between multiple HTML tags on the FAQ page.
  • The tag applied to the answer is neither the child nor the sibling of the extracted question as per the HTML DOM structure.
  • The question does not have a hyperlink to the answer (applies to FAQs with hyperlinks).
  • The question hyperlinks to the answer, but the question statement is not repeated above the answer (applies to FAQs with hyperlinks).

The extraction of the entire FAQ page also fails if the page contains more than one of the FAQ page types mentioned above.
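
The child-or-sibling condition listed among the failure cases above can be checked ahead of time on a page you control. The sketch below uses Python's standard `xml.etree.ElementTree` on a small well-formed snippet; the function name, element ids, and sample markup are all invented for this example. Real FAQ pages are HTML and often not well-formed XML, so an HTML parser would be needed in practice.

```python
import xml.etree.ElementTree as ET


def answer_is_child_or_sibling(root, question, answer):
    """Return True if `answer` is a child of `question` or shares its parent."""
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    if parent_of.get(answer) is question:
        return True
    question_parent = parent_of.get(question)
    return question_parent is not None and parent_of.get(answer) is question_parent


page = ET.fromstring(
    "<div>"
    "<section><h3 id='q1'>What is a VA?</h3><p id='a1'>A virtual assistant.</p></section>"
    "<section><h3 id='q2'>Is it free?</h3></section>"
    "<aside><p id='a2'>See the pricing page.</p></aside>"
    "</div>"
)
by_id = {el.get("id"): el for el in page.iter() if el.get("id")}

first_pair_ok = answer_is_child_or_sibling(page, by_id["q1"], by_id["a1"])
second_pair_ok = answer_is_child_or_sibling(page, by_id["q2"], by_id["a2"])
```

In the sample markup, the first answer is a sibling of its question, while the second answer sits in an unrelated `aside` element, which is the kind of layout the condition above describes as failing.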
