You can get started quickly with HTML files that you have downloaded from a website.

This is particularly useful if you want to download a website’s contents and provide a natural language interface on top of them.

Installing via Python

You can install Twilix's BlitzChain package for Python by running the following pip command:

pip install -U blitzchain

Once the package is installed, create a client with your API key and open a collection. You can get your API key from app.twilix.io.

from blitzchain import Client

API_KEY = "YOUR_API_KEY"
client = Client(API_KEY)
collection = client.Collection("htmlExample")

Processing

You can then read a locally saved HTML file into memory using just a few lines of code, ready to be inserted in the next step.

html_file = 'example.html'
# An explicit encoding avoids platform-dependent defaults when reading the file
with open(html_file, encoding="utf-8") as f:
    html_content = f.read()
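
If you have downloaded a whole website rather than a single page, the same read step can be repeated for every file. The sketch below is one possible way to do that with the standard library only. The function name load_html_files, the base_url parameter, and the way the URL is rebuilt from the file path are assumptions made for illustration and are not part of BlitzChain.

```python
from pathlib import Path


def load_html_files(folder, base_url):
    """Yield (html_content, metadata) pairs for every .html file under folder.

    load_html_files and base_url are hypothetical names for this sketch.
    The URL is guessed from the file's path relative to the folder, which
    only holds if the download kept the site's directory layout.
    """
    root = Path(folder)
    # sorted() gives a stable order, which makes repeated runs predictable
    for path in sorted(root.rglob("*.html")):
        html_content = path.read_text(encoding="utf-8")
        relative = path.relative_to(root).as_posix()
        metadata = {"url": base_url.rstrip("/") + "/" + relative}
        yield html_content, metadata
```

Each pair produced this way has the same shape as the html_content and metadata values used in the insert step that follows.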

Inserting Data

Twilix supports inserting complex data types such as HTML. We handle parsing, splitting, and indexing for you.

# This runs in the background on our servers, where we handle splitting and parsing for you
# The metadata is flattened and stored alongside the document
# If you provide only the HTML, we automatically extract the title for you
collection.insert_html(html_content, metadata={"url": "https://example.com/index/", "insert_date": "22-04-21"})

# Titles are used to provide clean reference titles
# You can insert a title using the code below
collection.insert_html(
    html_content,
    metadata={"url": "https://example.com/index/", "insert_date": "22-04-21"},
    title="Sample Index HTML"
)

You will get:

{'success': True, 'results': []}
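
If you would rather pass the title explicitly, as in the second call above, you can read it from the HTML locally before inserting. The following is a minimal sketch using Python's built-in html.parser module. TitleParser and extract_title are hypothetical helper names, and this is not necessarily how Twilix extracts titles on its servers.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is considered; later ones are ignored
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace and newlines into single spaces
        return " ".join("".join(self._parts).split())


def extract_title(html_content, default=None):
    """Return the page title, or default when no non-empty title exists."""
    parser = TitleParser()
    parser.feed(html_content)
    parser.close()
    return parser.title or default
```

The returned string could then be supplied as the title argument of insert_html, with default covering pages that have no title element.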

You can then check the size of your collection to confirm that the data has been inserted properly.

collection.count()

This will return something similar to:

{'count': 135}
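
The two responses shown above can be combined into a small sanity check. This sketch assumes the responses are plain dictionaries with exactly the keys shown in the examples ('success' and 'count'). The function name check_insert and the count_before parameter are hypothetical. Because insertion runs in the background, the count may lag behind the insert, so the sketch reports the difference rather than treating a small number as an error.

```python
def check_insert(insert_response, count_response, count_before=0):
    """Return how many entries were added since count_before.

    Hypothetical helper: assumes the response shapes shown in the examples,
    {'success': ..., 'results': [...]} and {'count': ...}.
    """
    if not insert_response.get("success"):
        raise RuntimeError(f"Insert was not successful: {insert_response!r}")
    # Processing happens in the background, so this difference can be 0
    # shortly after the insert call returns
    return count_response.get("count", 0) - count_before
```

A typical use would be to record collection.count() before inserting, then pass the insert_html result and a fresh collection.count() result to this helper.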

Launching the Dashboard

You can then launch the dashboard using the following code:

collection.launch_dashboard(
    name="Example Website",
    description="Source: https://example.com/index/"
)