Building a PDF Chatbot with Twilix is simple

In this tutorial, we will be using

Using Unstructured

Unstructured is an open-source data-processing pipeline. It provides clean abstractions to partition various types of unstructured data such as PDF, Audio.

To install unstructured, just run:

pip install "unstructured[local-inference]"

You can then parse in a PDF simply by running the following:

from unstructured.partition.pdf import partition_pdf
from unstructured.staging.base import convert_to_dict

# Returns a List[Element] present in the pages of the parsed pdf document
elements = partition_pdf("example-docs/layout-parser-paper-fast.pdf")

# Applies the English and Swedish language pack for ocr. OCR is only applied
# if the text is not available in the PDF.
elements = partition_pdf("example-docs/layout-parser-paper-fast.pdf", ocr_languages="eng+swe")
objects = convert_to_dict(elements)
# Convert everything into a string to make sure it can be indexed properly
for o in objects:
    for k, v in o.items():
        o[k] = str(v)

Ingesting into Twilix

You can then ingest into Twilix using the following:

# You can ingest into Twilix using the following
from blitzchain import Client
client = Client("<YOUR_API_KEY>")
collection = client.Collection("sampleCollection")
collection.insert_objects(objects)

Using Twilix for Ingesting PDFs

If you are looking for a hosted solution with more complex chunking, indexing, formatting included solution, I highly recommend using the Insert PDF endpoint.

You can integrate this with BlitzChain (our Python SDK) or explore our API docs. You can get your API key from https://app.twilix.io/

We support local PDFs through our frontend - which you can upload on our dashboard.

from blitzchain import Client
client = Client("<YOUR_API_KEY>")
collection = client.Collection("sampleCollection")
collection.insert_pdf("pdf_url_here")

Insert PDF

See how you can insert a pdf easily from this endpoint!

If the unstructured solution does not provide great results, I recommend trying us out on Twilix.

Using Twilix for asking questions

You can then ask it generative QA questions using the Twilix

collection.generative_qa(
    user_input="Summarize this",
    prompt_fields=["content"]
)

Conversing with PDF

You can easily converse with your PDF by giving it a conversation ID.

from uuid import uuid4
conversation_id = str(uuid4())
collection.generative_qa(
    user_input="What is this?",
    prompt_fields=["content"],
    conversation_id=conversation_id
)

We automatically store previous conversations for you.

Create a copilot

You can build a copilot to do more complex things based on your data such as integrating with your product to do advanced reasoning tasks or analysis.

collection.copilot(
    # Ask it any query
    user_input="How do I use this with LangChain?",
    # Use your answer field
    prompt_fields=['content'],
)

If you are interested in further details, we recommend checking out:

Copilot

Read more about this endpoint and how you can integrate this into a variety of clients such as JS, Go, etc. and how you can test different things out such as re-ranking and fields.

For more a hands-on support, join our discord community at https://discord.gg/a3K9c8GRGt