Feature: Crawler
The "Crawler" is an innovative feature designed to enhance your AI chatbot's capabilities by creating dynamic knowledge bases directly from web pages. This powerful tool automatically extracts and processes information from specified URLs, turning it into a structured knowledge base that the AI can use to generate responses.
The steps below explain how to create a knowledge base with the crawler.
To create a new knowledge base from one or more web pages, select the “Crawler” mode on the “Create Knowledge Base” page.
Name: Enter a unique name for your knowledge base in the “Knowledge Base” text field.
Description: Provide a brief description of your knowledge base in the “Description” text field.
URL: Input the starting URL of the web page you want the crawler to analyze in the “Initial URL” text field.
Configuring Your Crawler
Bypass Settings: Decide whether the crawler should “Comply” with or “Bypass” the settings of websites that normally block crawlers.
Comply: The crawler will respect website settings that block crawling and won’t be able to crawl blocked pages.
Bypass: The crawler attempts to access and crawl even websites that have anti-crawling measures. (Note: bypassing blocked pages consumes more tokens.)
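To make the distinction concrete, the sketch below shows how a generic crawler might check a site’s robots.txt before fetching a page, which is roughly what “Comply” corresponds to. This is only an illustration of the standard Robots Exclusion Protocol in Python, not Await Cortex’s actual implementation, and the function name is ours.

```python
# Conceptual sketch only: "Comply" roughly corresponds to honoring robots.txt,
# while "Bypass" would fetch pages regardless of it (at a higher token cost).
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

def is_allowed(site_url: str, page_url: str, user_agent: str = "*") -> bool:
    """Return True if the site's robots.txt permits crawling page_url."""
    parser = RobotFileParser()
    parser.set_url(urljoin(site_url, "/robots.txt"))  # e.g. https://example.com/robots.txt
    parser.read()                                     # download and parse robots.txt
    return parser.can_fetch(user_agent, page_url)

# A complying crawler would skip any URL for which is_allowed(...) returns False.
```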
Setting Limits for Crawling
Links Limit: Select the maximum number of links the crawler can crawl from the dropdown list. This controls the breadth of the crawl.
Time Limit: Choose a time limit for the crawling process from the dropdown list. This defines how long the crawler is allowed to run.
Please note that the crawler will stop once it reaches either the link limit or the time limit, whichever is reached first.
For example: if the links limit is set to 50,000 and the time limit to 5 minutes, the crawler will stop after 5 minutes. Conversely, if the links limit is set to 10 and the time limit to 20 hours, the crawler will stop after crawling 10 links.
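As a rough illustration of this “whichever comes first” rule, the hypothetical loop below stops as soon as either limit is hit. The function and helper names are illustrative and not part of Await Cortex.

```python
import time

def fetch_and_extract_links(url: str) -> list[str]:
    # Placeholder: a real crawler would download the page here and return the
    # outbound links it finds; this stub simply returns no links.
    return []

def crawl(start_url: str, links_limit: int = 50_000, time_limit_s: float = 5 * 60) -> int:
    """Minimal crawl loop that stops at whichever limit is reached first."""
    deadline = time.monotonic() + time_limit_s
    queue, crawled = [start_url], 0
    while queue and crawled < links_limit and time.monotonic() < deadline:
        url = queue.pop(0)                          # next link, breadth-first
        queue.extend(fetch_and_extract_links(url))  # enqueue newly discovered links
        crawled += 1
    return crawled
```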
Starting the Crawler
Once you've configured the settings, click the “Start” button to initiate the crawling process.
Progress Tracking: You can monitor the crawler's progress in real-time, including the completion percentage and pages crawled.
Completion: Once crawling is complete, the data is processed and integrated into your chatbot’s knowledge base. You can test the knowledge base by sending a message in the “Ask a question” chat window.
Queue: If the application is handling a large number of crawling processes, your crawl will be placed in a queue and will start once capacity becomes available.
The “Crawler” feature is a versatile and robust tool that significantly enhances your AI chatbot's ability to access and deliver accurate information.
Seeing the Data You’ve Stored in a Crawler Knowledge Base
In Await Cortex, you can view the data you’ve crawled after creating a knowledge base with the crawler.
Instructions
Click a crawler knowledge base.
Click the expand button.
Click the data symbol on one of the URLs.
Now you can see the data that has been stored from the crawled URL.
Click the URL dropdown to switch between different crawled URLs.
Updating a Crawler Knowledge Base
Instructions
Click a crawler knowledge base.
Click the update button on your knowledge base.
Agree to the disclaimer and click “Update”.