
Feature: Crawler

The "Crawler" is an innovative feature designed to enhance your AI chatbot's capabilities by creating dynamic knowledge bases directly from web pages. This powerful tool automatically extracts and processes information from specified URLs, turning it into a structured knowledge base that the AI can use to generate responses.

Click “Get Started” below to learn how to create a knowledge base with the crawler.

https://app.supademo.com/demo/uft99H7vzuJq5zhykhLTh

To create a new knowledge base from one or more web pages, select “Crawler“ mode on the “Create Knowledge Base“ page.

  1. Name: Enter a unique name for your knowledge base in the “Knowledge Base“ text field.

  2. Description: Provide a brief description of your knowledge base in the “Description” text field.

  3. URL: Input the starting URL of the web page you want the crawler to analyze in the “Initial URL” text field.

Configuring Your Crawler

  1. Bypass Settings: Decide whether the crawler should “Comply” with or “Bypass” websites that normally block crawlers.

    • Comply: The crawler will respect website settings that block crawling and won’t be able to crawl blocked pages.

    • Bypass: The crawler attempts to access and crawl even those websites that have anti-crawling measures. (Note: bypassing blocked pages consumes more tokens.)

bypass.png
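Conceptually, “Comply” mode works like the standard robots.txt check that well-behaved crawlers perform before fetching a page. A minimal sketch of that check in Python (the function name, URLs, and rules below are illustrative, not the product’s actual implementation):

```python
from urllib import robotparser

def may_crawl(url: str, comply: bool, robots_txt: str, user_agent: str = "*") -> bool:
    """Return True if the crawler is allowed to fetch `url`.

    `robots_txt` is the already-downloaded robots.txt body for the site.
    In "Bypass" mode the check is skipped entirely, which is why bypassing
    blocked pages consumes more tokens: more pages get fetched and processed.
    """
    if not comply:
        return True  # Bypass: ignore robots.txt directives
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse the site's crawling rules
    return rp.can_fetch(user_agent, url)

# A site that blocks crawlers from everything under /private/
rules = "User-agent: *\nDisallow: /private/\n"
print(may_crawl("https://example.com/private/page", comply=True, robots_txt=rules))   # False
print(may_crawl("https://example.com/private/page", comply=False, robots_txt=rules))  # True
```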

Setting Limits for Crawling

  1. Links Limit: Select the maximum number of links the crawler can crawl from the dropdown list. This controls the breadth of the crawl.

  2. Time Limit: Choose a time limit for the crawling process from the dropdown list. This defines how long the crawler will be able to crawl.

Please note that the crawler will stop once it reaches either the link limit or the time limit, whichever is reached first.

For example, if the link limit is set to 50,000 and the time limit to 5 minutes, the crawler will stop after 5 minutes. Conversely, if the link limit is set to 10 and the time limit to 20 hours, the crawler will stop after 10 links.
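The whichever-comes-first rule can be sketched as a loop that checks both limits before fetching each page (the names below are illustrative, not the product’s actual API):

```python
import time
from collections import deque

def crawl(start_url, fetch_links, links_limit, time_limit_seconds):
    """Breadth-first crawl that stops at whichever limit is hit first.

    `fetch_links(url)` should return the outgoing links of a page;
    it stands in for the real fetching and parsing step.
    """
    deadline = time.monotonic() + time_limit_seconds
    queue, crawled, seen = deque([start_url]), [], {start_url}
    while queue:
        if len(crawled) >= links_limit:
            break  # link limit reached first
        if time.monotonic() >= deadline:
            break  # time limit reached first
        url = queue.popleft()
        crawled.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled

# Tiny in-memory "site": page 0 links to pages 1 and 2, and so on.
site = {0: [1, 2], 1: [3], 2: [4], 3: [], 4: []}
print(crawl(0, site.get, links_limit=3, time_limit_seconds=60))  # [0, 1, 2]
```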

Starting the Crawler

  • Once you've configured the settings, click the “Start” button to initiate the crawling process.

crawling.mov
  • Progress Tracking: You can monitor the crawler's progress in real-time, including the completion percentage and pages crawled.

  • Completion: Once the crawling is completed, the data will be processed and integrated into your chatbot’s knowledge base. You can verify the knowledge base by sending a message in the “Ask a question“ chat window.

  • Queue: If the application is handling a large number of crawling processes, your crawl will wait in a queue before starting.

The “Crawler” feature is a versatile and robust tool that significantly enhances your AI chatbot's ability to access and deliver accurate information.


Seeing the Data You’ve Stored in a Crawler Knowledge Base

In Await Cortex, you can view the data you’ve crawled after creating a knowledge base with the crawler.

Video Example:

See Data Inside a Crawled KB.mp4

Instructions

  1. Click a crawler knowledge base.

    image-20240701-230555.png
  2. Click the expand button.

    image-20240701-230835.png
  3. Click the data symbol on one of the URLs.

    image-20240701-230950.png
  4. Now you can see the data that has been stored from the crawled URL.

  5. Click the URL dropdown to switch between different crawled URLs.

    image-20240701-231516.png
    image-20240701-231432.png


Updating a Crawler Knowledge Base

You can re-crawl a previously crawled knowledge base with the click of a button.

Instructions

  1. Click a crawler knowledge base.

    image-20240701-230555.png
  2. Click the update button on your knowledge base.

    image-20240701-235007.png

  3. Agree to the disclaimer and click “Update“.

    image-20240701-235117.png
