What this tutorial will cover:

  • How to use LangChain’s loaders to load documents for insertion into Twilix
  • How to convert Jupyter Notebooks into text that can be inserted into Twilix
  • How we built question answering over Microsoft’s Guidance in just a few lines of code

In addition, we open-source our Jupyter Notebook splitting strategy in this tutorial to help others who are also looking to find better ways to index these notebooks.

Installation

pip install -q langchain
pip install -q GitPython
pip install -q nbformat tqdm

Set-up

Cloning a repository

Clone any open-source GitHub repository using the code below. This tutorial uses Microsoft’s Guidance.

from git import Repo
repo = Repo.clone_from(
    "https://github.com/microsoft/guidance", to_path="./guidance"
)
branch = repo.head.reference

Using LangChain’s Loaders

Next, load every file in the cloned repository as a document with LangChain’s GitLoader.

from langchain.document_loaders import GitLoader
loader = GitLoader(repo_path="./guidance", branch=branch)
data = loader.load()

Processing for Jupyter Notebooks

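From the loaded documents, keep only the Jupyter Notebooks and collect their file paths.

# Keep only documents whose path ends in the notebook extension
nb_examples = [
    x for x in data if x.metadata['file_path'].endswith('.ipynb')
]
file_paths = [nb.metadata['file_path'] for nb in nb_examples]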
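
The notebook filter can also be written as a small predicate on the file suffix. The sketch below is illustrative only: `is_notebook` is a hypothetical helper name, the sample paths are made up, and skipping `.ipynb_checkpoints` copies is an added assumption rather than part of the original filter. GitLoader also accepts a `file_filter` callable, so a predicate like this could be passed at load time instead of filtering afterwards.

```python
from pathlib import PurePosixPath


def is_notebook(file_path: str) -> bool:
    """Return True when the path points at a Jupyter Notebook file."""
    path = PurePosixPath(file_path)
    # Assumption: autosave copies under .ipynb_checkpoints are not wanted
    if ".ipynb_checkpoints" in path.parts:
        return False
    return path.suffix == ".ipynb"


# Made-up paths, for illustration only
sample_paths = [
    "notebooks/tutorial.ipynb",
    "guidance/__init__.py",
    "notebooks/.ipynb_checkpoints/tutorial-checkpoint.ipynb",
    "docs/pynb_notes.md",
]
notebook_paths = [p for p in sample_paths if is_notebook(p)]
print(notebook_paths)
```

Matching on the suffix rather than on a substring avoids picking up files that merely contain "pynb" somewhere in their path.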

Once the notebooks are identified, each cell can be converted into a string.

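# Convert a notebook cell to a string according to its type
def get_cell_string(cell: dict) -> str:
  content = ""
  fence = "`" * 3  # Markdown code fence marker

  if cell['cell_type'] == 'markdown':
    content += cell['source']
  elif cell['cell_type'] == 'code':
    # Wrap code in a fenced Python block so it stays distinct from prose
    content += f"{fence}python\n{cell['source']}\n{fence}\n"
  return content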
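
To make the conversion concrete, here is a minimal sketch that runs the same idea on a few hand-written cells. The name `cell_to_text` and the sample cells are hypothetical, and closing the code fence after the cell source is an assumption about the intended output.

```python
FENCE = "`" * 3  # Markdown code fence marker


def cell_to_text(cell: dict) -> str:
    """Render one notebook cell as text; unknown cell types become ''."""
    if cell["cell_type"] == "markdown":
        return cell["source"]
    if cell["cell_type"] == "code":
        # Fence the code so it stays distinct from the surrounding prose
        return f"{FENCE}python\n{cell['source']}\n{FENCE}\n"
    return ""


# Hand-written cells shaped like nbformat v4 cells (illustrative only)
sample_cells = [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": "print('hi')"},
    {"cell_type": "raw", "source": "ignored"},
]
texts = [cell_to_text(c) for c in sample_cells]
for text in texts:
    print(repr(text))
```

Markdown cells pass through unchanged, code cells are fenced, and any other cell type contributes an empty string.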

You can then read each notebook and build the documents to insert.

from nbformat import read
from tqdm.auto import tqdm
docs = []
for notebook_path in tqdm(file_paths):
  with open('guidance/' + notebook_path, 'r', encoding='utf-8') as f:
      notebook = read(f, as_version=4)

  cells = notebook['cells']

  # Convert every cell (markdown or code) to a string
  all_content = [{"content": get_cell_string(cell)} for cell in cells]

  # Pair each cell with the cell before it, so every document holds two
  # consecutive cells. The slice bounds for i = 1, 2, 3, ... are
  # [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 7], [6, 8]
  # A notebook with a single cell therefore produces no documents.
  content_clean = [
      "\n".join(c['content'] for c in all_content[(i - 1):(i + 1)])
      for i in range(1, len(all_content))
  ]
  docs += [{"content": c, "file_path": notebook_path} for c in content_clean]

# You can then quickly check how many documents there are
len(docs)
# 432
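
The splitting strategy is a sliding window over consecutive cells. The sketch below shows it on a toy list of strings. `sliding_windows` and its `window` parameter are hypothetical names introduced here; a window of 2 matches the pairing used in the loop above.

```python
from typing import List


def sliding_windows(items: List[str], window: int = 2) -> List[str]:
    """Join every run of `window` consecutive items into one string."""
    return [
        "\n".join(items[i : i + window])
        for i in range(len(items) - window + 1)
    ]


# Toy stand-ins for converted notebook cells
cells = ["intro", "code A", "notes", "code B"]

pairs = sliding_windows(cells)             # 4 cells give 3 overlapping pairs
triples = sliding_windows(cells, window=3)  # wider context, fewer documents
print(pairs)
print(triples)
```

Overlapping windows keep each code cell next to the prose that explains it, at the cost of storing most cells twice. A list shorter than the window yields no documents.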

Inserting into Twilix

Before You Start

All API requests require an API key. To get your API key, sign up for free at https://app.twilix.io.

Installing via Python

You can install Twilix's BlitzChain package by running the following pip command:

pip install -U blitzchain

Then create a client and a collection:

from blitzchain import Client
TWILIX_API_KEY = "..."
client = Client(TWILIX_API_KEY)
collection = client.Collection("microsoftGuidanceDemo")
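
Rather than pasting the key into the script, it can be read from the environment. This is a minimal sketch using only the standard library: `load_api_key` is a hypothetical helper, and the `TWILIX_API_KEY` environment variable name is an assumption chosen to mirror the variable above, not something BlitzChain is known to read on its own.

```python
import os


def load_api_key(env_var: str = "TWILIX_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Placeholder so the sketch runs as-is; a real key would be set in the shell
os.environ.setdefault("TWILIX_API_KEY", "example-placeholder-key")

api_key = load_api_key()
# api_key can then be passed to Client(...) in place of the hard-coded string
```

Keeping the key out of the source makes it harder to commit it to a repository by accident.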

Quick sense check

You can do a sense check of the number of objects in a collection.

collection.count()

Generative Question Answering

Ask simple questions about Microsoft Guidance and get useful answers back.

# Ask Microsoft Guidance questions about their docs
collection.generative_qa(
    # Ask it any query
    user_input="What is Anachronism?",
    # Use your answer field
    prompt_fields=['content']
)

You can find out more about generative question answering, and other features such as content moderation, in the Generative Question Answering documentation.

Co-Pilot

Ask Microsoft Guidance to help you write code to integrate into your application.

collection.copilot(
    # Ask it any query
    user_input="What is token healing?",
    # Use your answer field
    prompt_fields=['content']
)

You can find out more about co-pilot, and other features such as content moderation, in the CoPilot documentation.

For more hands-on support, join our Discord community at https://discord.gg/a3K9c8GRGt