
· 23 min read
Dave Shoup
info

Like in the previous post, we made a Notebook available in case you want to skip straight to all the code covered in this post! 📓

Next Steps

In Part 1, we went through a brief example of using the OpenAI function calling API to come up with arguments to our create_notebook_and_launch_kernel function, which let our specified model pick the name of the Notebook file, and optionally provide a kernel name and hardware size. That helped establish a basic understanding of the pieces at play, but didn't really show the power of the function calling API.

This time, we'll add more functions into the mix while keeping the same feedback loop through a conversation function. The main functionality we're going to expose to the large language model (LLM) this time focuses on shaping the body of the Notebook itself -- the code and markdown cells, along with any outputs from executed code.

All of this, combined with our previous example of creating a Notebook, will let us accomplish the foundational capabilities integrated into the Noteable ChatGPT plugin. Let's get started.

Setup

Make sure our required packages are available:

[ ]:
!pip install noteable-origami openai tiktoken
note

tiktoken is new here; since we expect more function calls to be made, we may end up going over the 4096-token limit that OpenAI enforces. We'll set a soft limit of 4000 tokens for testing, but depending on the model you use, you can bump that number up even higher. In any case, it's a good idea to keep track of the tokens you're using before sending requests (from both a cost/usage perspective, as well as to avoid hitting the limit part-way through a chain of requests).
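As a quick sketch of what that bookkeeping can look like (the same approach our run_conversation function uses further down), you can count tokens for a list of messages before sending them off:

import tiktoken

encoder = tiktoken.encoding_for_model("gpt-3.5-turbo")
messages = [{"role": "user", "content": "add a cell that prints the current date and time"}]
# rough count of the tokens about to be sent; if we're close to the limit, drop or summarize older messages
token_count = sum(len(encoder.encode(str(m))) for m in messages)
print(token_count)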

Creating a Notebook & Launching a Kernel

For the sake of simplicity, we'll be working in a single Notebook for the duration of this post, but you can experiment with including that same start_notebook_func_schema function schema from Part 1 in your own environment if you want to see how the LLM handles working across multiple Notebooks.

The function below should look familiar, but with the additions of setting a global DEFAULT_PROJECT_ID and returning a file_id in the response, which we'll need to provide for the new functionality further down.

[1]:
import uuid
from typing import Optional

from origami.clients.api import APIClient
from origami.models.api.files import File
from origami.models.kernels import KernelSession


api_client = APIClient()
user_info = await api_client.user_info()
# get our default Project ID from the ChatGPT default, or provide another for easy reference,
# e.g. 'a1b2c3d4-e5f6-4a7b-8123-abcdef123456' from the following URL:
# https://app.noteable.io/p/a1b2c3d4-e5f6-4a7b-8123-abcdef123456/My-ChatGPT-Project
DEFAULT_PROJECT_ID = user_info.origamist_default_project_id


async def create_notebook_and_launch_kernel(
    file_path: str,
    project_id: Optional[uuid.UUID] = None,
    kernel_name: str = "python3.9",
    hardware_size: str = "small",
) -> dict:
    """Create a Notebook in a Project and launch a Kernel session."""
    # if we're not specifying a project ID, just use what we pulled earlier
    project_id = project_id or DEFAULT_PROJECT_ID
    file: File = await api_client.create_notebook(project_id, file_path)
    kernel_session: KernelSession = await api_client.launch_kernel(
        file_id=file.id,
        kernel_name=kernel_name,
        hardware_size=hardware_size,
    )
    # test out passing other properties here and see how the LLM responds!
    return {
        'file_url': file.url,
        'kernel_state': kernel_session.kernel.execution_state,
        'file_id': file.id,
    }

New functions

Here comes the fun part -- adding multiple new functions that the LLM will have to choose from. If we think about regular Notebook use, the bulk of it comes down to an iterative combination of the following:

  • adding cells and content (code or markdown)
  • updating/replacing cell content
  • getting the current cells
  • executing (code) cells and returning cell state + outputs
  • deleting cells

So let's break these down and generate some new functions and schemas.

info

The code sections below are going to get a bit complicated. Part of this is to get ahead of common scenarios we've seen before (attempting to run code without a kernel session, handling error outputs, etc.) rather than write an exhaustive guide to incrementally handling every situation.

If this feels overwhelming, don't worry -- Part 3 will get into transitioning this more complicated functionality over to the chatlab package as well as using the Noteable ChatGPT plugin directly, where these scenarios are already handled.

Before we get anywhere in a Notebook, we need content. Whether it's documentation in markdown cells, or executable source in code cells, adding cells with content is a capability we need our LLM to know about.

Adding Cells & Content

[2]:
import asyncio
from typing import Literal

from origami.models.notebook import CodeCell, MarkdownCell, NotebookCell


async def add_cell_to_notebook(
    file_id: uuid.UUID,
    cell_source: str,
    cell_type: Literal["code", "markdown"] = "code",
    after_cell_id: Optional[str] = None,
    before_cell_id: Optional[str] = None,
) -> NotebookCell:
    """Add a Code or Markdown cell to a Notebook file with source content."""
    # connect to a Notebook file via RTU
    rtu_client = await APIClient().connect_realtime(file_id)

    # create the new cell model
    if cell_type == "code":
        new_cell = CodeCell(source=cell_source)
    elif cell_type == "markdown":
        new_cell = MarkdownCell(source=cell_source)

    existing_cell_ids = rtu_client.cell_ids
    # make sure any provided after_cell_id/before_cell_id was valid
    after_id = None
    before_id = None
    if existing_cell_ids:
        if after_cell_id and after_cell_id in existing_cell_ids:
            after_id = after_cell_id
            before_id = None
        elif before_cell_id and before_cell_id in existing_cell_ids:
            after_id = None
            before_id = before_cell_id
        if not (after_id or before_id):
            # ...otherwise, place the cell after the last cell (at the bottom of the Notebook)
            after_id = rtu_client.cell_ids[-1]

    # add the cell to the notebook document
    cell = await rtu_client.add_cell(
        cell=new_cell,
        after_id=after_id,
        before_id=before_id,
    )
    return cell

And our schema -- most of this structure should look similar to the schemas we made in Part 1, with the addition of the enum property for cell_type. This will tell the LLM that only two values should be used here: code and markdown. We're also providing optional after_cell_id and before_cell_id arguments to allow the LLM to place cells before or after existing cells.

[3]:
add_cell_func_schema = {
    'name': 'add_cell_to_notebook',
    'description': 'Add a Code or Markdown cell to a Notebook file with source content.',
    'parameters': {
        'type': 'object',
        'properties': {
            'file_id': {
                'type': 'string',
                'format': 'uuid',
                'description': 'The file UUID to add the cell to.'
            },
            'cell_source': {
                'type': 'string',
                'description': 'The source content for the cell.'
            },
            'cell_type': {
                'type': 'string',
                'description': 'The type of cell to add.',
                'enum': ['code', 'markdown'],
                'default': 'code',
            },
            'after_cell_id': {
                'type': 'string',
                'description': 'The ID of the cell to place this cell after.'
            },
            'before_cell_id': {
                'type': 'string',
                'description': 'The ID of the cell to place this cell before.'
            }
        },
        'required': ['file_id', 'cell_source']
    }
}

Executing Code Cells

This one is a bit more involved since there are some extra steps we want to take. First, we need a running kernel session in order to execute anything. Next, we not only want to kick off the execution, but we also want to see the result of that execution -- was there any output? Did we encounter an error? If so, we should probably raise that for the LLM to try to correct in a follow-on request.

We're also going to add some convenience functions for our own use to handle things like cleaning the ANSI escape codes out of colored traceback strings, as well as summarizing a cell and its outputs. (There are a lot of properties between the cell, the outputs, and output contents; we only want to provide a sample of the most important ones for our testing.)

[4]:
import json
import re

from origami.clients.rtu import RTUClient


def remove_ansi_from_text(text: str) -> str:
    """Removes ANSI escape sequences from text.
    Useful for cleaning colored text from formatted traceback strings.
    """
    pattern = re.compile(r"\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])")
    return pattern.sub("", text)


async def summarize_cell_and_outputs(
    rtu_client: RTUClient,
    cell: NotebookCell,
    raise_error_outputs: bool = False,
) -> dict:
    """Create a summary of a Code cell and its outputs."""
    cell_summary = {
        'cell_id': cell.id,
        'source': cell.source,
        'cell_type': cell.cell_type,
        'outputs': [],
        'cell_state': rtu_client.cell_states.get(cell.id),
    }

    output_collection_id = cell.metadata.get("noteable", {}).get("output_collection_id")
    if not output_collection_id:
        # no output collection; return the cell summary as-is
        return cell_summary

    # if the cell has an output collection ID, fetch the output(s)
    output_collection = await api_client.get_output_collection(output_collection_id)
    for output in output_collection.outputs:
        if output.content.mimetype == 'application/vnd.jupyter.error+json':
            output_content = json.loads(output.content.raw)
            error_traceback = remove_ansi_from_text("\n".join(output_content['traceback']))
            if raise_error_outputs:
                raise RuntimeError(error_traceback)
            # keep the cleaned-up traceback instead of the raw (ANSI-colored) error content
            cell_summary['outputs'].append({'mimetype': output.content.mimetype, 'raw': error_traceback})
            continue
        # normal output; pass it through (without any large presigned URLs)
        cell_summary['outputs'].append(
            output.content.dict(
                exclude_none=True,
                exclude={"url"},
            )
        )

    # there are a lot of properties between the cell, the outputs, and output contents; we only want
    # to provide a sample of the most important ones for our testing
    return cell_summary


async def execute_code_cell(
    file_id: uuid.UUID,
    cell_id: str,
) -> dict:
    """Execute a Code cell in a Notebook."""
    # connect to a Notebook file via RTU
    rtu_client = await APIClient().connect_realtime(file_id)

    # make sure a kernel is up
    if rtu_client.kernel_state not in ["idle", "busy"]:
        # if not, start one
        await api_client.launch_kernel(file_id=file_id)
    # wait for kernel state to be idle
    await rtu_client.wait_for_kernel_idle()

    # queue the cell for execution and wait until it's done executing
    queued_execution = await rtu_client.queue_execution(cell_id)
    cells = await asyncio.gather(*queued_execution)
    cell = cells[0]
    # create a cell summary and raise an error if the cell failed during execution
    cell_summary = await summarize_cell_and_outputs(rtu_client, cell, raise_error_outputs=True)
    return cell_summary
[5]:
execute_code_func_schema = {
    'name': 'execute_code_cell',
    'description': 'Execute a Code cell in a Notebook.',
    'parameters': {
        'type': 'object',
        'properties': {
            'file_id': {
                'type': 'string',
                'format': 'uuid',
                'description': 'The file UUID to execute the cell in.'
            },
            'cell_id': {
                'type': 'string',
                'description': 'The ID of the cell to execute.',
            }
        },
        'required': ['file_id', 'cell_id']
    }
}

Optional Extras

The add_cell_func_schema and execute_code_func_schema above will likely handle the majority of the work in a Notebook-focused conversation like this, but let's add a few more options that may come in handy for different scenarios or follow-on prompts:

  • getting current cells
  • replacing cell contents
  • deleting cells entirely

In the event of an error in execution, we can add the option to replace a cell's contents (to allow re-execution), or the option to remove a cell altogether. Or, if we decided to call run_conversation() again with different prompts (e.g. "remove the second cell and make the last cell more concise"), these give the LLM some ability to refactor our Notebook.

note

Take this all with a grain of salt; the model may not choose to use these next functions at all, and it starts to become "more of an art than a science" depending on how prompts and function schemas are structured.

Get Cells

[6]:
async def get_cells(
    file_id: uuid.UUID,
    cell_ids: Optional[list[str]] = None,
) -> list:
    """Get all (or a list of) cells from a Notebook file."""
    # connect to a Notebook file via RTU
    rtu_client = await APIClient().connect_realtime(file_id)

    cells: list[NotebookCell] = rtu_client.builder.nb.cells

    # if cell_ids were provided, check to make sure they exist and filter the cells
    if cell_ids:
        existing_cell_ids = set(rtu_client.cell_ids) & set(cell_ids)
        if existing_cell_ids:
            cells = [cell for cell in cells if cell.id in existing_cell_ids]

    # summarize the cells and their outputs using the function we made earlier
    cell_summaries = await asyncio.gather(
        *[summarize_cell_and_outputs(rtu_client, cell) for cell in cells]
    )
    return cell_summaries

This function schema is also a bit different from the others we've covered; here, we introduce an array type that indicates we're expecting a list of strings for the optional cell_ids argument.

[7]:
get_cells_func_schema = {
    'name': 'get_cells',
    'description': 'Get all (or a list of) cells from a Notebook file.',
    'parameters': {
        'type': 'object',
        'properties': {
            'file_id': {
                'type': 'string',
                'format': 'uuid',
                'description': 'The file UUID to get cells from.',
            },
            'cell_ids': {
                'type': 'array',
                'items': {
                    'type': 'string',
                },
            },
        },
        'required': ['file_id'],
    },
}

Replace Cell Contents

We could also go with the RTUClient's update_cell_content method here, but that requires providing a diff-match-patch style patch string. In practice, we've noticed LLMs have trouble consistently getting that formatting correct, so we're going with a full content replacement strategy here to be on the safe side.
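To see why that format is tricky, here's a quick illustration (using the third-party diff-match-patch package, which isn't needed for anything else in this post) of the kind of patch string update_cell_content would expect an LLM to produce:

from diff_match_patch import diff_match_patch  # pip install diff-match-patch

dmp = diff_match_patch()
patches = dmp.patch_make("print('hello')", "print('hello, world')")
# the LLM would need to produce this exact format -- offsets, context lines, URL-encoding and all
print(dmp.patch_toText(patches))

Getting those offsets and escape sequences exactly right is where models tend to slip, so full replacement is the safer default.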

[8]:
async def replace_cell_content(
    file_id: uuid.UUID,
    cell_id: str,
    new_content: str,
) -> None:
    """Replace a cell's content with new source content."""
    rtu_client = await APIClient().connect_realtime(file_id)
    await rtu_client.replace_cell_content(cell_id, new_content)
    return


replace_cell_content_func_schema = {
    'name': 'replace_cell_content',
    'description': "Replace a cell's content with new source content.",
    'parameters': {
        'type': 'object',
        'properties': {
            'file_id': {
                'type': 'string',
                'format': 'uuid',
                'description': 'The file UUID to replace the cell in.',
            },
            'cell_id': {
                'type': 'string',
                'description': 'The ID of the cell to replace.',
            },
            'new_content': {
                'type': 'string',
                'description': 'The new source content for the cell.',
            }
        },
        'required': ['file_id', 'cell_id', 'new_content']
    }
}

Delete Cell

[9]:
async def delete_cell(
    file_id: uuid.UUID,
    cell_id: str,
) -> None:
    """Delete a cell from the Notebook."""
    rtu_client = await APIClient().connect_realtime(file_id)
    await rtu_client.delete_cell(cell_id)
    return


delete_cell_func_schema = {
    'name': 'delete_cell',
    'description': 'Delete a cell from the Notebook.',
    'parameters': {
        'type': 'object',
        'properties': {
            'file_id': {
                'type': 'string',
                'format': 'uuid',
                'description': 'The file UUID to delete the cell from.'
            },
            'cell_id': {
                'type': 'string',
                'description': 'The ID of the cell to delete.'
            }
        },
        'required': ['file_id', 'cell_id']
    }
}

Putting it all together (again)

Just like in Part 1, we'll make a run_conversation() function that provides a feedback loop -- we provide the initial prompt or list of messages, get the response from OpenAI, and decide whether we make a function call or return with a content response.

One noteworthy addition below is that we're checking token counts before making requests. Since we're increasing the likelihood of continuously calling run_conversation with an updated list of messages, we'll slowly build up the length of context we submit as a request.

Note that we check that the function name in the response is valid before pulling its associated function out of func_names_to_funcs.

[10]:
import json
import openai
import tiktoken
from IPython.display import Markdown, display


function_schemas = [
    add_cell_func_schema,
    execute_code_func_schema,
    get_cells_func_schema,
    replace_cell_content_func_schema,
    delete_cell_func_schema,
]
# when the response includes `function_call`, we'll check the `name` against this dictionary and run
# the associated function (if it wasn't hallucinated)
func_names_to_funcs = {
    "add_cell_to_notebook": add_cell_to_notebook,
    "execute_code_cell": execute_code_cell,
    "get_cells": get_cells,
    "replace_cell_content": replace_cell_content,
    "delete_cell": delete_cell,
}


async def run_conversation(
    prompt: Optional[str] = None,
    messages: Optional[list] = None,
    model: str = "gpt-3.5-turbo",
    show_all_messages: bool = False,
) -> list:
    """Run a conversation with a given model.

    If `show_all_messages` is True, this will print all user, system, and assistant messages as they
    are generated, along with token counts for each time this function is called.
    """
    messages = messages or []
    if prompt:
        if show_all_messages:
            print(f"`User`: {prompt}")
        messages.append({"role": "user", "content": prompt})

    # count tokens in use and eliminate oldest messages if necessary
    encoder = tiktoken.encoding_for_model(model)
    token_count = sum([len(encoder.encode(str(m))) for m in messages])
    if show_all_messages:
        print(f"Token count: {token_count}")
    if token_count > 4000:
        # soft limit to allow room for the response token count
        # (this is where we could handle reducing the token count another way, maybe with an
        # intermediate "conversation summary" operation)
        messages = messages[1:]

    response = openai.ChatCompletion.create(
        messages=messages,
        functions=function_schemas,
        model=model,
    )
    response_message = response.choices[0]['message'].to_dict_recursive()
    messages.append(response_message)

    # since we didn't specify `function_call={"name": "create_notebook_and_launch_kernel"}`, we need
    # to check if the response has a function call or if it's just a content string
    if "function_call" not in response_message.keys():
        display(
            Markdown(f"Assistant: {response_message['content']}")
        )
        return messages

    # make sure the LLM didn't hallucinate a function name
    function_to_call = response_message["function_call"]["name"]
    if show_all_messages:
        print(f"Assistant: {response_message['function_call']}")
    if function_to_call not in func_names_to_funcs.keys():
        system_message = f"Function `{function_to_call}` not found. Available functions: {list(func_names_to_funcs.keys())}"
        messages.append({"role": "system", "content": system_message})
        return await run_conversation(
            messages=messages,
            model=model,
            show_all_messages=show_all_messages,
        )

    # call the specified function with the arguments provided by the LLM
    func = func_names_to_funcs[function_to_call]
    call_args = json.loads(response_message["function_call"]["arguments"])
    try:
        result = await func(**call_args)
        system_message = f"`{function_to_call}` ran successfully and returned: `{result}`"
    except Exception as e:
        system_message = f"Problem running {function_to_call}: `{e}`"
    if show_all_messages:
        print(f"System: {system_message}")

    # add a system message to let the LLM know whether or not the function call was successful,
    # and provide success/error states
    messages.append({"role": "system", "content": system_message})
    return await run_conversation(
        messages=messages,
        model=model,
        show_all_messages=show_all_messages,
    )

Instead of starting with just a prompt, let's lead with a system message that provides some ground rules for the LLM. We'll also enable the new show_all_messages argument to watch what functions the LLM chooses to run, and watch for any potential situations where it gets stuck.

warning

If you aren't providing a function schema for create_notebook_and_launch_kernel(), make sure you have a file_id ready to use here; otherwise, run these two lines before moving on:

# change the file name/path as needed
created_file = await create_notebook_and_launch_kernel("OrigamiLLM2.ipynb")
file_id = created_file['file_id']
[11]:
messages = await run_conversation(
    messages=[
        {
            "role": "user",
            "content": f"in file ID {file_id}, add code that prints the current date and time. after that, plot the sine function from 0 to 2pi in the next cell. in a third cell, print out a fibonacci sequence of length 20. show me the outputs when they're done.",
        },
        {
            "role": "system",
            "content": "Use markdown format for responses. When you create a cell, immediately execute it unless otherwise specified. If you have to install any packages, use `!pip install ____ -q` to suppress output, and delete the cell once the packages are installed.",
        },
    ],
    model="gpt-4",
    show_all_messages=True,
)
Click to see the `show_all_messages` print statements
[ ]:
Token count: 153

Assistant: {'name': 'add_cell_to_notebook', 'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_source": "from datetime import datetime\nprint(datetime.now())",\n "cell_type": "code"\n}'}

System: add_cell_to_notebook ran successfully and returned: id='87a26214-b243-4d80-ac52-607dbd51f0a4' source='from datetime import datetime\nprint(datetime.now())' metadata={} cell_type='code' execution_count=None outputs=[]

Token count: 319

Assistant: {'name': 'execute_code_cell', 'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_id": "87a26214-b243-4d80-ac52-607dbd51f0a4"\n}'}

System: execute_code_cell ran successfully and returned: {'cell_id': '87a26214-b243-4d80-ac52-607dbd51f0a4', 'source': 'from datetime import datetime\nprint(datetime.now())', 'cell_type': 'code', 'outputs': [{'raw': '2023-10-05 16:37:57.057048\n', 'mimetype': 'text/plain'}], 'cell_state': 'finished_with_no_error'}

Token count: 526

Assistant: {'name': 'add_cell_to_notebook', 'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_source": "import matplotlib.pyplot as plt\nimport numpy as np\n\nx = np.linspace(0, 2 * np.pi, 100)\ny = np.sin(x)\n\nplt.plot(x, y)\nplt.show()",\n "cell_type": "code",\n "after_cell_id": "87a26214-b243-4d80-ac52-607dbd51f0a4"\n}'}

System: add_cell_to_notebook ran successfully and returned: id='a053a9c5-2189-4e71-9835-19d3dcc3aa8c' source='import matplotlib.pyplot as plt\nimport numpy as np\n\nx = np.linspace(0, 2 * np.pi, 100)\ny = np.sin(x)\n\nplt.plot(x, y)\nplt.show()' metadata={} cell_type='code' execution_count=None outputs=[]

Token count: 809

Assistant: {'name': 'execute_code_cell', 'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_id": "a053a9c5-2189-4e71-9835-19d3dcc3aa8c"\n}'}

System: Problem running execute_code_cell: `--------------------------------------------------------------------------- ModuleNotFoundError Traceback (most recent call last) Cell In[2], line 1 ----> 1 import matplotlib.pyplot as plt 2 import numpy as np 4 x = np.linspace(0, 2 * np.pi, 100)

ModuleNotFoundError: No module named 'matplotlib'`

Token count: 997

Assistant: {'name': 'add_cell_to_notebook', 'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_source": "!pip install matplotlib numpy -q",\n "cell_type": "code"\n}'}

System: add_cell_to_notebook ran successfully and returned: id='21400cf9-2f39-4e9e-b09c-9715cc2e46a9' source='!pip install matplotlib numpy -q' metadata={} cell_type='code' execution_count=None outputs=[]

Token count: 1160

Assistant: {'name': 'execute_code_cell', 'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_id": "21400cf9-2f39-4e9e-b09c-9715cc2e46a9"\n}'}

System: execute_code_cell ran successfully and returned: {'cell_id': '21400cf9-2f39-4e9e-b09c-9715cc2e46a9', 'source': '!pip install matplotlib numpy -q', 'cell_type': 'code', 'outputs': [{'raw': '\r\n\x1b[1m[\x1b[0m\x1b[34;49mnotice\x1b[0m\x1b[1;39;49m]\x1b[0m\x1b[39;49m A new release of pip is available: \x1b[0m\x1b[31;49m23.0.1\x1b[0m\x1b[39;49m -> \x1b[0m\x1b[32;49m23.2.1\x1b[0m\r\n\x1b[1m[\x1b[0m\x1b[34;49mnotice\x1b[0m\x1b[1;39;49m]\x1b[0m\x1b[39;49m To update, run: \x1b[0m\x1b[32;49mpip install --upgrade pip\x1b[0m\r\n', 'mimetype': 'text/plain'}], 'cell_state': 'finished_with_no_error'}

Token count: 1590

Assistant: {'name': 'delete_cell', 'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_id": "21400cf9-2f39-4e9e-b09c-9715cc2e46a9"\n}'}

System: delete_cell ran successfully and returned: None

Token count: 1707

Assistant: {'name': 'execute_code_cell', 'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_id": "a053a9c5-2189-4e71-9835-19d3dcc3aa8c"\n}'}

System: execute_code_cell ran successfully and returned: {'cell_id': 'a053a9c5-2189-4e71-9835-19d3dcc3aa8c', 'source': 'import matplotlib.pyplot as plt\nimport numpy as np\n\nx = np.linspace(0, 2 * np.pi, 100)\ny = np.sin(x)\n\nplt.plot(x, y)\nplt.show()', 'cell_type': 'code', 'outputs': [{'mimetype': 'image/png'}], 'cell_state': 'finished_with_no_error'}

Token count: 1941

Assistant: {'name': 'add_cell_to_notebook', 'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_source": "def fibonacci(n):\n seq = [0, 1]\n while len(seq) < n:\n seq.append(seq[-1] + seq[-2])\n return seq\n\nprint(fibonacci(20))",\n "cell_type": "code",\n "after_cell_id": "a053a9c5-2189-4e71-9835-19d3dcc3aa8c"\n}'}

System: add_cell_to_notebook ran successfully and returned: id='1738d394-82c0-422d-995d-712109351540' source='def fibonacci(n):\n seq = [0, 1]\n while len(seq) < n:\n seq.append(seq[-1] + seq[-2])\n return seq\n\nprint(fibonacci(20))' metadata={} cell_type='code' execution_count=None outputs=[]

Token count: 2226

Assistant: {'name': 'execute_code_cell', 'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_id": "1738d394-82c0-422d-995d-712109351540"\n}'}

System: execute_code_cell ran successfully and returned: {'cell_id': '1738d394-82c0-422d-995d-712109351540', 'source': 'def fibonacci(n):\n seq = [0, 1]\n while len(seq) < n:\n seq.append(seq[-1] + seq[-2])\n return seq\n\nprint(fibonacci(20))', 'cell_type': 'code', 'outputs': [{'raw': '[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]\n', 'mimetype': 'text/plain'}], 'cell_state': 'finished_with_no_error'}

Token count: 2518

And the final message:

[11]:

Assistant: The outputs of your code are as follows:

  1. The current date and time is: 2023-10-05 16:37:57.057048

  2. The plot of the sine function from 0 to 2pi is shown below:

<invalid markdown image link>

  3. The first 20 numbers in the fibonacci sequence are: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]

While this process was running, I had the targeted Notebook open in another tab, and the steps taken were interesting to watch, particularly in how the LLM iterated between the available operations given the provided system messages:

  • adding and executing a cell to display the current date and time ran with no problem
  • adding sine wave plot syntax was fine, but during execution, we ran into a ModuleNotFoundError
    • after seeing this, the LLM created a new cell to install the required packages with !pip install matplotlib numpy -q and executed it without issue
    • it then deleted the pip install cell and re-executed the cell to plot the sine wave, which ran successfully and produced a matplotlib image output
  • it continued on by successfully adding and executing the fibonacci sequence

In this particular run, it didn't replace any cell contents or have to look up the existing cells in the Notebook, but that's alright.
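If you'd rather inspect the Notebook's state yourself instead of asking the LLM to do it, you can also call the get_cells helper we defined earlier directly -- a quick sketch, reusing the same file_id from the conversation above:

# look over the notebook ourselves with the helper from earlier
cell_summaries = await get_cells(file_id)
for summary in cell_summaries:
    print(summary["cell_id"], summary["cell_state"])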

The full breakdown can be found in the list of messages generated during the conversation. You could stop here, or you could feed in more messages, continue the conversation, and see where the LLM takes you (and your Notebook)!
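For example, a follow-up turn might look something like this sketch (the prompt is just an illustration -- swap in whatever refactor you'd like to see):

# continue the same conversation, reusing the accumulated message history
messages = await run_conversation(
    prompt="remove the second cell and make the last cell more concise",
    messages=messages,
    model="gpt-4",
    show_all_messages=True,
)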

[12]:
messages
Click to see the full output of `messages`
[12]:
[
{
'role': 'user',
'content': "in file ID cbc69c43-89a3-4b39-810e-2ee15119f9eb, add code that prints the current date and time. after that, plot the sine function from 0 to 2pi in the next cell. in a third cell, print out a fibonacci sequence of length 20. show me the outputs when they're done."
},
{
'role': 'system',
'content': 'Use markdown format for responses. When you create a cell, immediately execute it unless otherwise specified. If you have to install any packages, use `!pip install ____ -q` to suppress output, and delete the cell once the packages are installed.'
},
{
'role': 'assistant',
'content': None,
'function_call': {
'name': 'add_cell_to_notebook',
'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_source": "from datetime import datetime\\nprint(datetime.now())",\n "cell_type": "code"\n}'
}
},
{
'role': 'system',
'content': "`add_cell_to_notebook` ran successfully and returned: `id='87a26214-b243-4d80-ac52-607dbd51f0a4' source='from datetime import datetime\\nprint(datetime.now())' metadata={} cell_type='code' execution_count=None outputs=[]`"
},
{
'role': 'assistant',
'content': None,
'function_call': {
'name': 'execute_code_cell',
'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_id": "87a26214-b243-4d80-ac52-607dbd51f0a4"\n}'
}
},
{
'role': 'system',
'content': "`execute_code_cell` ran successfully and returned: `{'cell_id': '87a26214-b243-4d80-ac52-607dbd51f0a4', 'source': 'from datetime import datetime\\nprint(datetime.now())', 'cell_type': 'code', 'outputs': [{'raw': '2023-10-05 16:37:57.057048\\n', 'mimetype': 'text/plain'}], 'cell_state': 'finished_with_no_error'}`"
},
{
'role': 'assistant',
'content': None,
'function_call': {
'name': 'add_cell_to_notebook',
'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_source": "import matplotlib.pyplot as plt\\nimport numpy as np\\n\\nx = np.linspace(0, 2 * np.pi, 100)\\ny = np.sin(x)\\n\\nplt.plot(x, y)\\nplt.show()",\n "cell_type": "code",\n "after_cell_id": "87a26214-b243-4d80-ac52-607dbd51f0a4"\n}'
}
},
{
'role': 'system',
'content': "`add_cell_to_notebook` ran successfully and returned: `id='a053a9c5-2189-4e71-9835-19d3dcc3aa8c' source='import matplotlib.pyplot as plt\\nimport numpy as np\\n\\nx = np.linspace(0, 2 * np.pi, 100)\\ny = np.sin(x)\\n\\nplt.plot(x, y)\\nplt.show()' metadata={} cell_type='code' execution_count=None outputs=[]`"
},
{
'role': 'assistant',
'content': None,
'function_call': {
'name': 'execute_code_cell',
'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_id": "a053a9c5-2189-4e71-9835-19d3dcc3aa8c"\n}'
}
},
{
'role': 'system',
'content': "Problem running execute_code_cell: `---------------------------------------------------------------------------\nModuleNotFoundError Traceback (most recent call last)\nCell In[2], line 1\n----> 1 import matplotlib.pyplot as plt\n 2 import numpy as np\n 4 x = np.linspace(0, 2 * np.pi, 100)\n\nModuleNotFoundError: No module named 'matplotlib'`"
},
{
'role': 'assistant',
'content': None,
'function_call': {
'name': 'add_cell_to_notebook',
'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_source": "!pip install matplotlib numpy -q",\n "cell_type": "code"\n}'
}
},
{
'role': 'system',
'content': "`add_cell_to_notebook` ran successfully and returned: `id='21400cf9-2f39-4e9e-b09c-9715cc2e46a9' source='!pip install matplotlib numpy -q' metadata={} cell_type='code' execution_count=None outputs=[]`"
},
{
'role': 'assistant',
'content': None,
'function_call': {
'name': 'execute_code_cell',
'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_id": "21400cf9-2f39-4e9e-b09c-9715cc2e46a9"\n}'
}
},
{
'role': 'system',
'content': "`execute_code_cell` ran successfully and returned: `{'cell_id': '21400cf9-2f39-4e9e-b09c-9715cc2e46a9', 'source': '!pip install matplotlib numpy -q', 'cell_type': 'code', 'outputs': [{'raw': '\\r\\n\\x1b[1m[\\x1b[0m\\x1b[34;49mnotice\\x1b[0m\\x1b[1;39;49m]\\x1b[0m\\x1b[39;49m A new release of pip is available: \\x1b[0m\\x1b[31;49m23.0.1\\x1b[0m\\x1b[39;49m -> \\x1b[0m\\x1b[32;49m23.2.1\\x1b[0m\\r\\n\\x1b[1m[\\x1b[0m\\x1b[34;49mnotice\\x1b[0m\\x1b[1;39;49m]\\x1b[0m\\x1b[39;49m To update, run: \\x1b[0m\\x1b[32;49mpip install --upgrade pip\\x1b[0m\\r\\n', 'mimetype': 'text/plain'}], 'cell_state': 'finished_with_no_error'}`"
},
{
'role': 'assistant',
'content': None,
'function_call': {
'name': 'delete_cell',
'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_id": "21400cf9-2f39-4e9e-b09c-9715cc2e46a9"\n}'
}
},
{'role': 'system', 'content': '`delete_cell` ran successfully and returned: `None`'},
{
'role': 'assistant',
'content': None,
'function_call': {
'name': 'execute_code_cell',
'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_id": "a053a9c5-2189-4e71-9835-19d3dcc3aa8c"\n}'
}
},
{
'role': 'system',
'content': "`execute_code_cell` ran successfully and returned: `{'cell_id': 'a053a9c5-2189-4e71-9835-19d3dcc3aa8c', 'source': 'import matplotlib.pyplot as plt\\nimport numpy as np\\n\\nx = np.linspace(0, 2 * np.pi, 100)\\ny = np.sin(x)\\n\\nplt.plot(x, y)\\nplt.show()', 'cell_type': 'code', 'outputs': [{'mimetype': 'image/png'}], 'cell_state': 'finished_with_no_error'}`"
},
{
'role': 'assistant',
'content': None,
'function_call': {
'name': 'add_cell_to_notebook',
'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_source": "def fibonacci(n):\\n seq = [0, 1]\\n while len(seq) < n:\\n seq.append(seq[-1] + seq[-2])\\n return seq\\n\\nprint(fibonacci(20))",\n "cell_type": "code",\n "after_cell_id": "a053a9c5-2189-4e71-9835-19d3dcc3aa8c"\n}'
}
},
{
'role': 'system',
'content': "`add_cell_to_notebook` ran successfully and returned: `id='1738d394-82c0-422d-995d-712109351540' source='def fibonacci(n):\\n seq = [0, 1]\\n while len(seq) < n:\\n seq.append(seq[-1] + seq[-2])\\n return seq\\n\\nprint(fibonacci(20))' metadata={} cell_type='code' execution_count=None outputs=[]`"
},
{
'role': 'assistant',
'content': None,
'function_call': {
'name': 'execute_code_cell',
'arguments': '{\n "file_id": "cbc69c43-89a3-4b39-810e-2ee15119f9eb",\n "cell_id": "1738d394-82c0-422d-995d-712109351540"\n}'
}
},
{
'role': 'system',
'content': "`execute_code_cell` ran successfully and returned: `{'cell_id': '1738d394-82c0-422d-995d-712109351540', 'source': 'def fibonacci(n):\\n seq = [0, 1]\\n while len(seq) < n:\\n seq.append(seq[-1] + seq[-2])\\n return seq\\n\\nprint(fibonacci(20))', 'cell_type': 'code', 'outputs': [{'raw': '[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]\\n', 'mimetype': 'text/plain'}], 'cell_state': 'finished_with_no_error'}`"
},
{
'role': 'assistant',
'content': 'The outputs of your code are as follows:\n\n1. The current date and time is: 2023-10-05 16:37:57.057048\n\n2. The plot of the sine function from 0 to 2pi is shown below:\n\n [![Sine function](attachment:image.png)]()\n\n3. The first 20 numbers in the fibonacci sequence are: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]'
}
]

And the final Notebook -- note the execution count for the sine wave plot cell:

A screenshot of a Noteable notebook with three executed code cells, showing the date and time, a sine wave plot, and numbers in a fibonacci sequence.

Recap

Continuing from Part 1, we dove further into the function calling API, crafting more function calls and schemas. To make our lives a bit easier, we introduced some handy convenience functions to use between these new function wrappers. We also slightly upgraded our run_conversation() function to handle multiple function schemas and let us monitor the flow of the conversation.

Keep an eye out for Part 3, where we'll take the lessons learned in these last two posts and go even further, pulling in the chatlab package and the Noteable Plugin's OpenAPI spec to expose additional functionality.

Thanks for sticking around! Experiment, share your discoveries, and let's keep innovating. Happy coding!

· 14 min read
Dave Shoup
info

If you'd like to skip straight to the action, check out the Notebook that covers all the code used below! 📓

Let's embark on a journey to uncover how to interface with the Noteable API using its foundational Python package, origami, while also leveraging the power of OpenAI's function calling API. For those acquainted with the Noteable ChatGPT plugin, you'll not only gain direct access to the features you've already explored, but also uncover functionality that only origami and the Noteable API can offer, such as customizing not only which operations are made available, but also how those operations are described to the Large Language Models (LLMs). Whether you're a veteran user or just getting started, there's a world of possibilities waiting to be discovered.

Background

When ChatGPT burst onto the tech scene, it didn't just nudge the boundaries of conversational AI – it blew right past them. The tech community, accustomed to continuous innovations and breakthroughs, suddenly found itself in the throes of another groundbreaking revolution. But OpenAI didn’t stop there. By introducing access to plugins, it allowed ChatGPT to take user prompts and meld them with other services' APIs. This was not just another feature; it was a gateway to endless integrations and applications.

In early May 2023, Noteable released its own ChatGPT plugin, and almost instantly, the plugin became the go-to tool for those with an itch for data analysis.

The Noteable plugin on the ChatGPT plugin store.

What set the Noteable plugin apart from other plugins was its ability to transform conversations into tangible data analysis, all encapsulated within a Jupyter-notebook compatible document. This breakthrough meant that users weren’t just getting answers; they were obtaining a detailed artifact of their entire analytical journey. Such a document serves as an invaluable asset for documentation, further analysis, and collaboration. The blend of conversational AI with the structured format of Jupyter notebooks provides both clarity and context, ensuring that insights derived from data aren't just momentary flashes but are well-recorded, replicable, and ready for deeper exploration.

A conversation with ChatGPT using the Noteable plugin (left), working through data analysis of data from the Spotify API, shown in a Noteable notebook (right).

As powerful as ChatGPT Plugins are, they have their own limitations. Our own design of the plugin is limited by what we can expose to the broad user base of ChatGPT users. The Noteable plugin is a one-size-fits-all set of operations and descriptions that ChatGPT uses to drive Noteable Notebooks. You, however, are not limited by this set and can leverage a wider range of operations and customizations through the combination of origami and the function calling API. For the curious minds and the restless tinkerers, this post is for you.

Interacting with the Noteable API with OpenAI's function calling API

At a high level, the OpenAI function calling API allows users or services to expose functionality to an LLM, so when a prompt is sent, the LLM can respond with a suggestion to call on that exposed functionality. It does not allow the LLM to call functions directly.

note

Feel free to look at the OpenAI function calling docs for reference as we move along.

Install Dependencies

To get started, we first need to install some packages into our Python environment:

In a notebook environment:

[ ]:
!pip install noteable-origami openai

Or in a terminal:

pip install noteable-origami openai

Authentication

We'll need both OPENAI_API_KEY as well as NOTEABLE_TOKEN as environment variables in order to authenticate to both services.

tip

If you're using Noteable as your Python environment of choice, you can create Secrets for both of these, and they will be available as environment variables while executing code cells.
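If you're working locally instead, a quick sanity check before moving on might look like this sketch (how you load the variables -- a .env file, shell profile, secrets manager -- is up to you):

import os

# both keys need to be present for the calls below to authenticate
for var in ("OPENAI_API_KEY", "NOTEABLE_TOKEN"):
    assert os.environ.get(var), f"{var} is not set"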

Function Calling Setup

To enable OpenAI's function calling API to use origami functions, we must take a few key steps:

  • generate JSON schemas for functions
  • send a request to the OpenAI chat completion API
  • parse the response with function calling arguments

Let's get started by breaking down each step. For the first portions of this blog series, we'll be creating the function schemas by hand. This will help set a base understanding of what OpenAI is looking for, and we'll go over this structure with a couple different examples.

Basic Example

The following is a very basic function that takes three arguments of different variable types, with the last one, baz, being optional.

[1]:
from typing import Optional

def test_func(
    foo: str,
    bar: int,
    baz: Optional[dict] = None,
):
    """Sample function"""
    print(f"ChatGPT's favorite food is {foo}.")
    print(f"It thought about the Roman Empire {bar} time(s).")
    if baz:
        print(f"Its favorite rock stars are: {baz}")
    else:
        print("It doesn't have any favorite rock stars.")

The main parts of the function schema are the name, description, and the parameters. (If you're familiar with JSON schemas at all, the parameters section should look familiar.)

{
    "name": "test_func",
    "description": "Sample function",
    "parameters": {
        "type": "object",
        "properties": {
            # fill in the arguments here
        },
        "required": [
            # add required argument names here
        ],
    },
}

Once we fill in the function arguments and add some basic descriptions, we might end up with something like this:

[2]:
test_func_schema = {
    "name": "test_func",
    "description": "Sample function",
    "parameters": {
        "type": "object",
        "properties": {
            "foo": {
                "type": "string",
                "description": "Your favorite food",
            },
            "bar": {
                "type": "integer",
                "description": "The number of times you've thought about the Roman Empire",
            },
            "baz": {
                "type": "object",
                "description": "Names and ages of your favorite rock stars",
            },
        },
        "required": ["foo", "bar"],
    },
}

Now that we have a basic function schema, we can try it out by passing it to OpenAI with a basic prompt.

note

The function_call argument is set to auto by default, which means the model may return either a basic text response or a function call response. By giving it the name of the function, we're explicitly saying that we want a function call response back.

See related docs.

[3]:
import openai

# feel free to change this and play around with the results!
initial_prompt = "print some made-up data"

response = openai.ChatCompletion.create(
    messages=[
        {
            "role": "user",
            "content": initial_prompt,
        }
    ],
    functions=[test_func_schema],
    model="gpt-3.5-turbo",
    function_call={"name": "test_func"},
)
response_message = response.choices[0]['message'].to_dict_recursive()
response_message
[3]:
{
'role': 'assistant',
'content': None,
'function_call': {
'name': 'test_func',
'arguments': '{\n "foo": "pizza",\n "bar": 10\n}'
}
}

Here we see the model suggesting test_func should be called with the arguments {"foo": "pizza", "bar": 10} (and nothing for baz, because it was optional). If we pass those arguments into our test function directly, we'll see the following:

[4]:
import json

call_args = json.loads(response_message["function_call"]["arguments"])
result = test_func(**call_args)
result
[4]:
ChatGPT's favorite food is pizza.
It thought about the Roman Empire 10 time(s).
It doesn't have any favorite rock stars.

Example with origami

Now let's switch to a more relevant example using the origami package: we'll write a function that can create a new Noteable Notebook and start a kernel for us. But first, we need a Noteable Project ID to work with.

To keep things simple, we're going to use the same default Project as ChatGPT. You can swap this out for a Project of your choosing, though.

note

If you go to Noteable, you'll likely see the Spaces page with at least one Project listed. Right-clicking and copying the link to one of those should give you a URL that looks like this:

https://app.noteable.io/p/a1b2c3d4-e5f6-4a7b-8123-abcdef123456/My-ChatGPT-Project

Copy/paste that UUID between the /p/ and the name of the Project to provide it as the default here if you don't want to use the ChatGPT default.

[5]:
from origami.clients.api import APIClient

api_client = APIClient()
user_info = await api_client.user_info()
# use the same user-default project id as ChatGPT
project_id = user_info.origamist_default_project_id
project_id
[5]:
UUID('a1b2c3d4-e5f6-4a7b-8123-abcdef123456')

Creating a Notebook & Launching a Kernel

Alright, onto creating our new Notebook-creating function.

[6]:
import uuid

from origami.models.api.files import File
from origami.models.kernels import KernelSession


async def create_notebook_and_launch_kernel(
    file_path: str,
    project_id: Optional[uuid.UUID] = None,
    kernel_name: str = "python3.9",
    hardware_size: str = "small",
) -> dict:
    """Create a Notebook in a Project and launch a Kernel session."""
    # if we're not specifying a project ID, just use what we pulled earlier
    project_id = project_id or user_info.origamist_default_project_id
    file: File = await api_client.create_notebook(project_id, file_path)
    kernel_session: KernelSession = await api_client.launch_kernel(
        file_id=file.id,
        kernel_name=kernel_name,
        hardware_size=hardware_size,
    )
    # test out passing other properties here and see how the LLM responds!
    return {
        'file_url': file.url,
        'kernel_state': kernel_session.kernel.execution_state,
    }

For the time being, we're only returning the Notebook's URL and the state of the kernel to pass as context back to the LLM in cells further down.

Using the function above for reference, we can create its function schema just as we did for test_func above. In this case, adding description properties to each argument provides helpful context for the LLM.

[7]:
start_notebook_func_schema = {
    'name': 'create_notebook_and_launch_kernel',
    'description': 'Create a Notebook in a Project and launch a Kernel session.',
    'parameters': {
        'type': 'object',
        'properties': {
            'file_path': {
                'type': 'string',
                'description': 'Path to the file to create. Must end in .ipynb.',
            },
            'project_id': {
                'type': 'string',
                'format': 'uuid',
                'description': "If not specified, will use the user's default project.",
            },
            'kernel_name': {
                'type': 'string',
                'description': 'Name of the kernel to launch.',
                'default': 'python3.9',
            },
            'hardware_size': {
                'type': 'string',
                'description': 'Hardware size of the kernel.',
                'default': 'small',
            },
        },
        'required': ['file_path'],
    },
}

And again, we'll test it out while specifying the function_call argument.

[8]:
response = openai.ChatCompletion.create(
    messages=[
        {
            "role": "user",
            "content": "Make a notebook for making a 3d scatterplot using the plotly python package",
        }
    ],
    functions=[start_notebook_func_schema],
    function_call={"name": "create_notebook_and_launch_kernel"},
    model="gpt-3.5-turbo",
)
response_message = response.choices[0]['message'].to_dict_recursive()
response_message
[8]:
{
'role': 'assistant',
'content': None,
'function_call': {
'name': 'create_notebook_and_launch_kernel',
'arguments': '{\n "file_path": "3D_Scatterplot.ipynb",\n "kernel_name": "python3.9"\n}'}
}

It successfully provided a file_path, and even chose to specify a kernel_name (even though one wasn't required). If we run the function with the provided arguments, we should see our file_url and kernel_state:

[9]:
call_args = json.loads(response_message["function_call"]["arguments"])
file_kernel = await create_notebook_and_launch_kernel(**call_args)
file_kernel
[9]:
{
'file_url': 'https://app.noteable.io/f/bcd12345-6789-4abc-d012-3456abcdef90/3D_Scatterplot.ipynb',
'kernel_state': 'requested'
}

Putting it all together

The basics are done, and all that's left is to turn it into a convenience function to enable a feedback loop. To do this, we'll build up a messages list that includes our first prompt string, along with the assistant responses along the way. Here's how it should work:

  • The first time we call run_conversation(), we'll pass in our starting prompt as a user message like we did before.
  • We won't specify function_call this time, which means the response may be text content or a function call with suggested arguments.
  • If we get a text/content response, we'll exit the function and return all messages from the conversation.
  • If we get a function call response, we'll execute the function and create a system message to add to our messages list.
    • The idea here is that the system message will inform the LLM whether or not a file was created and a kernel was started. If all goes well, the LLM should respond with text/content echoing back the file URL and kernel state.

Ideally, the conversation should flow like this:

  • Call run_conversation() with prompt
    • User: "create a new notebook for making a 3d scatterplot using the plotly python package"
    • Assistant: (function call create_notebook_and_launch_kernel with file_path argument, maybe more)
    • <run create_notebook_and_launch_kernel(**args)>
    • System: "The file was successfully created and a kernel was launched. ..."
  • Call run_conversation() again, with messages
    • Assistant: "Your file was created at <file url> ..." (or some similar successful status message)
    • <return>

(In a more complicated example with multiple functions, we would probably use a dictionary of func_name: func pairs to call the correct function depending on the value of ["function_call"]["name"] in the response.)
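As a rough sketch, that dispatch might look like the snippet below (Part 2 of this series builds exactly this kind of lookup):

# map function-call names suggested by the LLM to the actual callables
func_names_to_funcs = {
    "create_notebook_and_launch_kernel": create_notebook_and_launch_kernel,
    # ...additional functions would be registered here
}

func_name = response_message["function_call"]["name"]
if func_name in func_names_to_funcs:
    call_args = json.loads(response_message["function_call"]["arguments"])
    result = await func_names_to_funcs[func_name](**call_args)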

[10]:
from IPython.display import Markdown, display


async def run_conversation(
    prompt: Optional[str] = None,
    messages: Optional[list] = None,
    model: str = "gpt-3.5-turbo",
) -> list:
    """Run a conversation with a given model, providing either a starting prompt or a list of
    messages with the `{"role": "user"/"assistant"/"system", "content": "..."}` format.
    """
    messages = messages or []
    if prompt:
        # start building up `messages` with our first user prompt
        messages.append({"role": "user", "content": prompt})

    response = openai.ChatCompletion.create(
        messages=messages,
        functions=[start_notebook_func_schema],
        model=model,
    )
    response_message = response.choices[0]['message'].to_dict_recursive()
    # whatever happens here, keep the `assistant` message in the history
    messages.append(response_message)

    # since we didn't specify `function_call={"name": "create_notebook_and_launch_kernel"}`, we need
    # to check if the response has a function call or if it's just a content string
    if "function_call" not in response_message.keys():
        # we got a text/content response; display it as markdown for easy viewing, then exit cleanly
        display(
            Markdown(f"Assistant: {response_message['content']}")
        )
        return messages

    # time to parse the args and call our function!
    call_args = json.loads(response_message["function_call"]["arguments"])
    try:
        file_kernel: dict = await create_notebook_and_launch_kernel(**call_args)
        system_message = (
            f"The file was successfully created and a kernel was launched. "
            f"Kernel state: `{file_kernel['kernel_state']}` "
            f"File URL: `{file_kernel['file_url']}`"
        )
    except Exception as e:
        # if something goes wrong here, make sure we inform the LLM and it can either echo back an
        # error state or attempt to self-correct
        system_message = f"Problem creating a file or starting a kernel session: `{e}`"

    # add a system message to the conversation to let the assistant know that the kernel session was
    # created, along with its current execution state
    messages.append({"role": "system", "content": system_message})

    return await run_conversation(messages=messages, model=model)
[11]:
messages = await run_conversation("create a new notebook for making a 3d scatterplot using the plotly python package")
[11]:

Assistant: I have created a new notebook for you to make a 3D scatter plot using the Plotly Python package. You can access the notebook using the following link: plot_3d_scatterplot.ipynb.

If we go to that link, we see the Notebook was created and the kernel successfully launched. 🎉

Screenshot of a Noteable Notebook created with arguments provided by the LLM.

And if we examine messages, we see each message that was generated during the conversation -- our starting prompt, the first assistant function_call response, the system message, and the final assistant text/content response.

[12]:
messages
[12]:
[
{
'role': 'user',
'content': 'create a new notebook for making a 3d scatterplot using the plotly python package'
},
{
'role': 'assistant',
'content': None,
'function_call': {
'name': 'create_notebook_and_launch_kernel',
'arguments': '{\n "file_path": "plot_3d_scatterplot.ipynb",\n "kernel_name": "python3.9"\n}'
}
},
{
'role': 'system',
'content': 'The file was successfully created and a kernel was launched. Kernel state: `requested` File URL: `https://app.noteable.io/f/6e30c7db-9a80-4bcd-aa0f-9f4dbe46a740/plot_3d_scatterplot.ipynb`'
},
{
'role': 'assistant',
'content': 'I have created a new notebook for you to make a 3D scatter plot using the Plotly Python package. You can access the notebook using the following link: [plot_3d_scatterplot.ipynb](https://app.noteable.io/f/6e30c7db-9a80-4bcd-aa0f-9f4dbe46a740/plot_3d_scatterplot.ipynb).'
}
]

Recap

In this post, we touched on the OpenAI plugins and how to mimic similar behavior with OpenAI's function calling API, wrote some basic function schemas, and got our feet wet with the origami package by creating a notebook and starting a kernel using arguments provided by the LLM responses.

That's it for this part of the series. Be sure to check out Part 2 for our next steps, where we build on the lessons learned here: we'll dive into creating more function schemas with the origami package to add cell content to our notebook, execute code, and retrieve outputs.

Thanks for reading!

The Noteable Origami logo, depicting a snake made out of folded paper

· 13 min read
Noel Cothren

Workers across industries want to work smarter, not harder, and recently people around the world got access to powerfully smart AI tools that could help them do just that. But, are they really the magic wand they're made out to be?

What is an LLM?

Generative AI models use complex algorithms to produce responses (text, code, answers to questions, but also images, videos, art, voice, etc.) based on very large data sets that were used to train the model. Large Language Models (LLMs) use text and code training data to generate natural language and code responses. In this article, the terms LLMs, “generative AI", and AI are used interchangeably.

The rise of the bots

It's clear the rise of accessible LLMs has been a game-changer for many industries, especially software development. If you haven’t been using these tools, you might be surprised at how prevalent AI has become. According to GitLab's DevSecOps survey, “The vast majority of respondents (90%) said their organizations are using AI in software development today or plan to, and 83% said it is essential to implement AI to avoid falling behind.”

“[90% of respondents] said their organizations are using AI in software development today or plan to, and 83% said it is essential to implement AI to avoid falling behind.” - GitLab Survey

Our engineering team at Noteable has been actively using OpenAI since last summer with the genai Python package for cell-level AI error help for notebooks. Then we started integrating it into various aspects of our product, from building our ChatGPT plugin and in-app error fix suggestions to the upcoming AI chatbot in our sidebar, and we've been seeing incredible results. And it’s not just for the product: many of us are exploring AI tools for daily work tasks, personal use, and play.

But not the fall of the humans?

Here’s the catch – amidst the buzz about what AI tools can achieve, the critical role of humans in these interactions often takes a backseat. Critics of AI’s capabilities cite the initial ‘crud’, mistakes, and just plain bad code that comes out of LLM tools as a reason they may never be useful, but the reality is more nuanced.

AI can significantly boost productivity, but the human worker remains essential. You can’t just copy and paste AI-generated code onto production and call it a day. From writing concise prompts, to refining responses, to managing the risks and biases, there is a lot of work that goes into making great things with the help of an LLM.

As developers who value craftsmanship, we want to ensure our work meets high standards and retains authenticity. But, if AI can do some of the hard, boring, or tedious stuff, let's use it! The goal is to strike a balance, ensuring our work is not just efficient but also authentic and high-quality.

What can AI tools do for you?

It’s not hard to find people online raving about what AI can do. From summarizing documents to generating boilerplate tests, there are obvious tasks that generative AI makes faster. But what’s really interesting is the way it can change your feelings about the job. GitHub’s Copilot study found that “…productivity gains went beyond speed, with 74% of developers reporting that they felt less frustrated when coding and were able to focus on more satisfying work.” [source].

By speeding up repetitive tasks, reducing some of the cognitive load, and doing the more tedious parts of the job, AI can empower developers to tackle bigger problems, learn new languages, or work on side projects that may have been too draining otherwise.

What are the tools?

There are a ton of "listicles" with AI tools for developers and others. I particularly liked this one by LeadDev.com. The most popular tools seem to be from OpenAI (the ChatGPT interface, their Code Interpreter and plugins), and many 3rd party tools are using OpenAI's API and models. GitHub's Copilot is especially made for software developers, and their Copilot Chat is in beta now. Replit is another AI IDE on the market.

Brainstorming solutions & first drafts

This is a favorite use case for myself and others I’ve talked to. Sometimes the first step is the hardest when starting a new project, feature or document. LLMs can get you over that "writer's block" with a quick prompt asking it to brainstorm ideas or outline solutions. Then you can throw it out or use it, but either way you’ll have a clearer idea where to begin.

This is especially fun for new projects or new programming languages. Little side-projects and "proofs of concept" become much easier with AI making the first pass; use it to create starter code, accelerate the initial setup, and get to the more interesting tasks sooner.

Debugging

Imagine this: You’re on call and there’s an incident - the production application has a bug, but it’s in a part of the codebase you’ve never touched. 🙀 You could spend hours combing the codebase and documentation, or you could send over the context -- error message, relevant code, etc -- to a code-trained LLM for help. While it probably can’t solve the problem, it can help illuminate the unknown and act as a virtual, interactive rubber duck debug partner in getting to the root cause. And it's generally friendly & non-judgemental, which is great.

In a less stressful daily scenario, LLM tools in your IDE can help debug small errors and refactor code. GitHub Copilot was designed specifically for this type of task. It uses a programming-specific AI model ("beyond" OpenAI Codex according to an announcement in July) that can access your entire codebase, so it has a good amount of context about the project.

Explaining unknown code & libraries

Since LLMs are great at summarizing large amounts of text, they're very useful for quickly comprehending a big code block or documentation from a library you're working with. Using these tools to understand unfamiliar code and errors can provide more direct answers than searching online, reducing context switching and conserving mental energy.

Boilerplate code, advanced auto-complete

GitHub’s marketing video for Copilot Chat aptly says, “business logic over boilerplate,” which resonates with how I use it regularly - to generate simple unit tests and repetitive functions. This means you can code faster and stay in the flow, instead of switching contexts to lookup how to write a specific syntax or code pattern you forgot. Recently our team wrote multiple E2E tests in a short one-hour pairing session by using Copilot to generate the basic outlines, adding robust coverage to a previously bare area.

Writing drafts of documentation, emails, feedback, meeting notes

Engineering-adjacent tasks, especially those that involve writing, are a perfect use case for generative AI. LLMs were made to summarize, outline, and draft text. With the right prompt and interactions, these tools can help you increase your documentation coverage, write better feedback during review cycles, and draft proposals or tech specs for projects so much faster than starting from scratch.

What must you do for the AI tools?

Despite their advantages, LLM tools come with plenty of limitations. They're often wrong and require diligent fact-checking, testing, and validation. Treating AI-generated responses with scrutiny and asking follow-up questions are essential practices for enhancing the quality of the final product.

Do you really want to use AI for that?

There are still so many unknowns, risks & biases to consider. Many businesses are legitimately concerned about how LLM-powered tools handle important data and some have banned them from the workplace. Racially or culturally biased responses are a well-known risk as well. Only you and your company policies can answer the question of whether it's appropriate to use these tools at work.

It's also worth considering the cost-benefit ratio of time spent editing & iterating on the LLM response, as well as personal work preferences and abilities. Think about what you like to do, what you’re good at, and whether you even need LLM help in that area. If you're someone who enjoys writing, you may have AI write an outline for a proposal, but fill in the text content yourself so it's not generic. Similarly, the context matters - maybe it's fine for AI to generate an email to an insurance company, but one to a colleague is probably better written by hand so it stays authentic.

There is also the issue of how much current knowledge you have in the area, since you must be able to critique the outputs. Yes, an LLM might be able to write code faster than you, but it is well-documented how often it's wrong and buggy. You have to be knowledgeable enough to understand and correct the bugs it creates, or, as blogger and Microsoft employee Scott Hanselman tweeted – "Otherwise it will make the code wrong, 10x faster." [source]

Scott Hanselman's tweet

Prompts and context matter.

When it comes to conversation-like interfaces like ChatGPT and Bing, the clarity and context of a prompt significantly influence the results. Often called prompt engineering or prompt design, learning how to write strong prompts is one key to harnessing the power of generative AI models.

Here are some common tips for better prompting:

  • Assign a role or job title: This helps in limiting the scope and tone of the AI’s responses.
  • Ask it to collaborate: Instead of telling it to do something, encourage conversation and interactions.
  • Set boundaries: Give it a specific area to focus on, like a specific coding language or problem area.
  • Be clear & specific about the response you expect: Use examples if you have them. If you're writing code, you can send along a JSON schema or type interface to get more accurate results.
  • Ask again a different way, or ask for refinements: Keep the conversation going or keep changing your prompt and note the differences in the responses. This is a great way to get better results and to learn what makes a good prompt for a specific task.

Here's an example prompt for ChatGPT using these suggestions:

Imagine you’re a software developer with expertise in Javascript (assigning a role w/ boundaries). Work with me (ask it to collaborate) on coming up with a JSON schema (specifying the response format) for a new e-commerce product catalog. This catalog has a product which has an id, name, price, and optional tags (further limiting the scope & specifying the response). We also need to think of other fields that might be useful for the catalog.

And here's how it turned out when I tried it on ChatGPT.

There are plenty of websites and blogs with advice on prompting, including resources like the OpenAI CookBook, Learn Prompting, this Prompt Engineering Guide, and many others. For an academic look, ExplainPrompt explores LLM prompts from recent research papers. The suggested prompts for students by Harvard's Ethan Mollick offer insight into the deep specificity you can add when prompting, and are good starting points for learning new code concepts. If you really want to get into it, DeepLearning.ai offers a prompt engineering course specifically for software developers using ChatGPT.

Scrutinize and iterate on the responses you get

The importance of thoroughly reviewing AI-generated responses cannot be overstated. Every time you have AI generate significant work for you, there are multiple next steps to edit, validate, clarify, and often ask again. This is where the time-to-polished-product tradeoffs have to be considered. When it comes to written work, it can take a lot of editing to get the results you want.

"Your expertise is what will turn a basic LLM response into something great."

Your expertise is what will turn a basic LLM response into something great. As Luca Maria Aiello, from the IT University of Copenhagen, puts it: “AI has the potential to provide a significant productivity boost… However, when it comes to fixing, assessing, improving, combining the response or to integrate it into complex projects, expertise is absolutely needed.” [source]

When it comes to using AI-generated code, GitHub's solid advice (quoted below) is to start with basic understanding and make sure you review, run, and test the generated response.

“So, how do developers know whether GitHub Copilot is generating good or bad code for them? To start, developers using GitHub Copilot should have some basic understanding of the language they are coding in. This will help them determine if the solutions GitHub Copilot suggests are valid. From there, the code should still be run and tested locally. And of course, code reviews should not be skipped!”

No matter which tools you're using, the responsibility is on you to ensure the quality of the work you produce. You have to review, test, and iterate on the AI generated content to make sure it’s what you want, and that there are no performance or security vulnerabilities.

Conclusion

The conversation around AI revolutionizing work in various fields, including coding, writing, and graphics, has been a mix of excitement, awe, denial, and fear. There's no question that LLM tools have a ton of uses, from productivity boosts to eliminating entire categories of tasks, but the core of a developer's job — building robust new features and fixing tricky bugs — still requires human brain power.

There's no magic software wand. Building software demands context about the whole codebase, understanding historical patterns and features, and grasping product requirements. Effective utilization of LLM tools by professionals must involve refining prompts, iterating, and critically assessing the generated content.

Still, the advent of these tools marks a significant stride in software development. Ongoing work is bringing tools that accept more diverse inputs (voice! images!) along with continual improvements in training. Embracing LLM tools, while recognizing the irreplaceable role of the human touch, paves the way for a future where technology and human creativity join forces to push the boundaries of what is possible. Best of all, you can use these AI tools today with your unique input and expertise to excel in your software developer role, and maybe even bring to life those side projects you've been dreaming of!

· 6 min read
Ana Rodriguez

I'm a DevOps engineer with six years of hands-on experience in the field. Over the years, I've dived deep into the world of DevOps, constantly seeking ways to improve and adapt. My journey has been a blend of experience and curiosity, leading me to some exciting discoveries.

I've seen firsthand how automation can empower our daily activities. Automation, however, is only part of the equation. Effective collaboration and communication are equally essential in our ever-evolving DevOps landscape. This is where Noteable notebooks come into the spotlight.

Noteable takes the concept of Jupyter notebooks to new heights by providing a collaborative and cloud-powered environment. It's a space where your DevOps expertise can be transformed into interactive documents that not only solve problems but also tell stories.

Privileged access review

Let me tell you about a particular use case that I have recently implemented in my daily activities as part of our security compliance controls: Privileged access review.

Conducting access reviews involves manually checking user privileges across multiple tools and systems. This process was time-consuming and error-prone. Any oversight or error in the process could potentially lead to security vulnerabilities or compliance issues, which are risks that organizations cannot afford to take.

Unlocking Secrets and APIs

Noteable offers Secrets Management to allow teams to securely store and share connection credentials and other sensitive information. As DevOps engineers, having a centralized repository for secrets simplifies the process of securing our interactions with external services, reducing the risk of unauthorized access or data exposure.

Once we have stored our credentials within Noteable, we can easily automate API calls to various vendors. This functionality is particularly valuable because it allows us to tailor automation to our specific use cases. Whether it's provisioning cloud resources, configuring network settings, or interacting with third-party APIs, Noteable's secrets management empowers us to automate these tasks with confidence.

Okta API

Let's go back to privileged access review; leveraging Noteable's capabilities and integrating it with the Okta API allowed me to streamline this critical process.

Here's how it went down: I explored the capabilities of the Okta API to gather all the essential details about our users and their access levels across our toolbox. This included the raw data: user roles, permissions, and any recent changes.

[1]:
import os

import pandas as pd
import requests

# Set up API parameters
base_url = os.environ['OKTA_URL']
api_token = os.environ['OKTA_TOKEN']

# Make API call to list users
headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "SSWS " + api_token
}
response = requests.get(base_url + "/api/v1/users/", headers=headers)

# Collect each user's login email and ID into a DataFrame
users = response.json()
user_data = []
for user in users:
    user_data.append({"Login Email": user["profile"]["login"], "User ID": user["id"]})
user_df = pd.DataFrame(user_data)
user_df

Next up, I put Noteable's Data Exploration and Visualization (DEX) tools to work. These handy tools allowed me to turn that raw data into visually appealing reports; they can now be easily understood by everyone, from technical colleagues to individuals in different departments and external auditors alike.

Noteable dashboard

User activity analysis

Let’s jump into another detailed example that integrates all the features we've explored. This one comes from when I was tackling a surge of crypto-mining attacks, a scenario that requires a comprehensive approach. You can draw inspiration from our real-world case study described in the blog "How to Handle High Volumes of Crypto-Mining Attacks While Still Offering a Free Plan".

Time to break down the process step by step. Here's how to use the Auth0 authentication API within Noteable:

  1. Understanding the Basics: Before diving in, take a moment to familiarize yourself with the Auth0 authentication API. You can find comprehensive documentation at Auth0's API Documentation.
  2. Setting Up the Noteable Environment: Within the Noteable platform, create a new notebook or open an existing one. This will be your workspace for interacting with the Auth0 API.
  3. Importing Necessary Libraries: Begin by importing any Auth0 libraries you'll need to make API requests. For additional information, refer to the Auth0-Python documentation.
[3]:
!pip install auth0-python
from auth0.authentication import GetToken
from auth0.management import Auth0
info

Before moving to the next step, you must get an Auth0 short-lived access token, your Auth0 tenant domain, and the Auth0 application/client ID. You can obtain all of these from the Auth0 dashboard.

  4. Configuring Secrets in Noteable: Noteable provides a robust solution to shield sensitive data like API keys. To ensure your Auth0 API key remains protected, follow these steps to store credentials and secrets.
[4]:

import os

# Your Auth0 credentials
domain = "my-amazing-domain.us.auth0.com"
client_id = os.environ["AUTH0_CLIENT_ID"]
client_secret = os.environ["AUTH0_CLIENT_SECRET"]

# Get a management token
get_token = GetToken(domain, client_id, client_secret)
token = get_token.client_credentials(
    f"https://{domain}/api/v2/"
)
mgmt_api_token = token["access_token"]

# Initialize the Auth0 management API client
auth0 = Auth0(domain, mgmt_api_token)
  5. Crafting the API Request: Using the Auth0 API's endpoints and methods outlined in their documentation, construct the API request. This might involve making requests to retrieve user data, authentication logs, or any other relevant information.
[5]:
# Get list of all users
users = auth0.users.list()
  6. Parsing the Response: Once you receive a response from the Auth0 API, extract the data you need for your specific use case. This could be data related to daily sign-ups, user activity, or any other relevant metrics.
[6]:
# Get list of all users, filtered to blocked accounts
users = auth0.users.list(q='blocked:true')

# Extract desired fields from the JSON response
user_data = [{'email': user['email'], 'created_at': user['created_at'], 'blocked': user['blocked']} for user in users['users']]
  7. Visualizing the Data: Use libraries like pandas to create meaningful tables from the data you've obtained. This step turns raw data into insights that are easy to understand.
[7]:
import pandas as pd

# Create a DataFrame from the extracted data
user_df = pd.DataFrame(user_data)

# Print the DataFrame with filtered fields
user_df
tip

Easily create a visualization any time data is returned within Noteable, either as an output to a Python cell or as the result of a SQL cell query.

DEX views
  8. Sharing the Insights: As part of Noteable's collaborative capabilities, consider sharing your notebook with relevant team members. This fosters discussions and allows others to benefit from your insights.
tip

Pro tip: Noteable provides a simple scheduling functionality that allows users to have a notebook version executed on a fixed schedule. Notebooks can be executed hourly, daily, weekly, or monthly.

Conclusion

Looking ahead, I see even more opportunities to leverage Noteable's capabilities across various facets of DevOps. From automating deployment pipelines to enhancing monitoring and reporting, the potential is vast. Noteable has not only positively impacted my current DevOps activities but also ignited a sense of excitement for the future. I can already envision various scenarios where Noteable can be integrated, but more importantly, I can see the potential to leverage its diverse features not just from a technical standpoint but also to share valuable insights with a broader audience.

· 4 min read
Kyle Kelley

For so long, I have wanted an incredible environment to do interactive and literate computing with server side JavaScript. Some of the main requirements I've had are:

  • Top level async await
  • Ease of package management
  • Native support for TypeScript
  • Commitment to web standards
  • Strong community and ecosystem

Deno has all of these.

I was so delighted to find out that Deno developers had already been working on a Jupyter kernel for Deno. The V8 Inspector protocol was already in place, as was all the REPL machinery. Discussions had already taken place about interfaces for exposing rich outputs (HTML, Markdown, images, etc.). It was a perfect time to get involved. Not knowing Rust, yet knowing Jupyter, I was happy to collaborate. I'm so grateful to the Deno team for their tenacity, support, and guidance. They turned this around so quickly.

Bringing interactive literate computing to Deno has been an incredible joy. We want to make it available to as many people as possible. Today, Deno shipped 1.37 with the Jupyter kernel. It's ready for you to experiment with.

Now, I'm excited to announce that Deno Notebooks are now available on Noteable.

info

The Deno kernel is experimental 🧪. As with any data science work, it's worth experimenting!

If you find bugs, report them either to Noteable or to Deno.

Getting Started

You have two ways to launch a Deno kernel from Noteable:

  • On any notebook, click to change the Notebook Settings from the lefthand sidebar. Then, select Deno from the dropdown menu.
  • Using the Noteable Plugin on ChatGPT, ask it to start a Deno notebook. You might have to hint to it to check on what kernels are available.

Once ready, you can write JavaScript. You can write TypeScript too. It's a REPL though so you can make typing mistakes. Get messy!

[1]:
console.log("Hello from Deno", Deno.version.deno);
[1]:
Hello from Deno 1.37.0

Importing Packages

To get started, we'll import some Deno packages using ECMAScript modules (ESM). Deno's primary way to import packages is through URLs.

[2]:
import { display } from "https://deno.land/x/display@v0.0.6/mod.ts";
import { createCanvas } from "https://deno.land/x/skia_canvas@0.5.4/mod.ts";

Here we've imported display which allows us to display rich outputs like images, markdown, and more in the notebook. We've also imported skia_canvas which is a native binding that gives us access to the Canvas API.

Let's draw some pastel colored rectangles.

[3]:
// Initialize canvas and context
const canvas = createCanvas(180, 180);
const ctx = canvas.getContext("2d");

// Define a pastel color palette
const pastelColors = ["#FFB6C1", "#FFD700", "#98FB98", "#ADD8E6"];

// Draw rectangles
for (let i = 0; i < pastelColors.length; i++) {
  ctx.fillStyle = pastelColors[i];
  ctx.fillRect(10 + i * 40, 10 + i * 40, 30, 30);
}

// Display the canvas
display(canvas);
[3]:

Pastel Colored Rectangles

npm package support

Beyond ESM, we can also tap into the npm ecosystem using npm:package_name imports.

[4]:
// Import RxJS from npm
import { range } from "npm:rxjs";
import { filter, map } from "npm:rxjs/operators";

_ = range(1, 10)
  .pipe(
    filter((value) => value % 2 !== 0),
    map((value) => value ** 2)
  )
  .subscribe(console.log);
[4]:
1
9
25
49
81

Bringing IPython Rich Objects to Deno

In IPython, we have access to classes like IPython.display.Markdown and IPython.display.HTML.

from IPython.display import Markdown
import IPython
import sys

display(Markdown(f"""# Notebooks in Python via IPython ![IPython logo](https://github.com/ipython.png?size=32)

* Python {sys.version.split()[0]}
* IPython {IPython.__version__}

Interactive compute with Jupyter _built into IPython_!
"""))

In JavaScript though, we have tagged template literals to simplify how we would make the same calls:

[5]:
import { md } from "https://deno.land/x/display/mod.ts"

md`# TypeScript Notebooks!

![Deno logo](https://github.com/denoland.png?size=32)

- TypeScript ${Deno.version.typescript}
- V8 ${Deno.version.v8}
- Deno ${Deno.version.deno}

Interactive compute with Jupyter _built into Deno_!
`;
[5]:

TypeScript Notebooks!

Deno logo

  • TypeScript 5.2.2
  • V8 11.8.172.3
  • Deno 1.37.0

Interactive compute with Jupyter built into Deno!

This opens up whole new possibilities for generating reports from the comforts of JavaScript. Also available are html and svg tagged template literals.

[6]:
import { html } from "https://deno.land/x/display/mod.ts";

html`<b>Whoa</b>`;
[6]:
Whoa

Where do we go from here?

There is plenty more to do within the kernel. Follow along at the Deno repo and let me know what else you'd like to see. There's a whole world of possibilities.

One area I'm particularly keen on is having excellent support for DataFrames in Deno. I'm excited to see what we can do with Arrow and Polars. That's for another post though. Until then, happy experimenting!

· 9 min read
Elijah Meeks
Let's get me a fun dataframe thing done in Midjourney.
The 5000-year evolution in the display of tabular data

Whether you work with notebooks or not, you work with data. And when you work with data, it typically comes in the form of tables. Rows and columns of numbers and text are a powerful way of modeling data–so powerful that examples date back to the Middle Kingdom of Egypt, with a carving at the temple of Karnak displaying what looks like a spreadsheet. In notebooks, tables of data most often take the form of Pandas dataframes. They may seem simple and almost invisible, but by better understanding what a dataframe is and what it enables, we can better understand how to empower data-driven workflows not only in notebooks but also via LLMs and BI tools.

The Great Spreadsheet of Karnak

A Brief History of the Dataframe

The dataframe, as we understand it today, is a foundational element in the area of data science, analytics, and engineering. Its inception can be traced back to R, and earlier data structures in the precursor S. R was designed primarily for statistical computing and graphics, so it was natural that it would implement features like this to support analysis and representation of table data.

In R, the dataframe was conceived as a list of vectors of equal length. This design allowed for the storage of data in a tabular format, where each column could be of a different type (numeric, character, factor, etc.), and each row represented an observation. This structure allowed for complex data manipulations and statistical analyses to be performed seamlessly.

It was soon adopted by Python via the Pandas library. Pandas was designed to provide data structures that make data analysis in Python faster and more intuitive. Before Pandas arrived on the scene, NumPy was the primary option for numerical data analysis in Python, but it lacked the flexibility to easily handle heterogeneous data types or labeled data in a tabular format. While NumPy was, and still is, excellent for numerical computations and mathematical operations, its focus on homogenous data made it less ideal for tasks that required dealing with real-world, messy data. That's where Pandas filled the gap, offering data structures specifically engineered for data manipulation and analysis, which could deal with a wide array of data types.

The name "Pandas" is a portmanteau of "Panel Data". In econometrics and multidimensional data analysis, panels refers to datasets that involve observations over multiple time periods. Imagine a traditional 2D table, but with each cell having the capability to expand into another dimension, creating a cube of data. Originally, Pandas was designed to handle this kind of 3D data, sort of a cubic table, but dropped support for panels because their usage was limited and what utility users saw in them was handled with multi-index dataframes.

How Dataframes Show You Data

Jupyter Notebooks gained momentum largely due to Pandas' robust tabular data manipulation and the simple, easy, and accessible view of dataframes within a notebook. For data scientists, viewing the first few rows of a table is often the first step in even the most sophisticated approaches.

Dataframes are more than just tables. They’re a collection of columnar functions, metadata, and instructions for rendering, deeply integrated with the approach to data analysis enshrined in notebooks.

Those columnar functions include things like filtering the data to look at subsets, group/aggregation functions that transform the data into summaries, sorting the data for more efficient analysis, deriving new columns based on calculations of existing columns and sampling that data. But they also cover what we might think of as descriptive statistics, allowing you to describe a table or summarize individual columns.
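As a rough sketch of what those operations look like in pandas (the column names here are hypothetical, borrowing the happiness.csv file used later in this post):

import pandas as pd

df = pd.read_csv("happiness.csv")

df[df["score"] > 7]                                       # filter to a subset of rows
df.groupby("region")["score"].mean()                      # group / aggregate into summaries
df.sort_values("score", ascending=False)                  # sort for easier scanning
df["score_per_capita"] = df["score"] / df["population"]   # derive a new column
df.sample(10)                                             # sample the data
df.describe()                                             # descriptive statistics per column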

That’s just looking at dataframes as a class in a library. They’re more than that–they’re a first-class citizen of a notebook. Support for Dataframes within applications like Jupyter Notebooks naturally led to more robust output processing and handling. Instantiate a dataframe in a notebook and you don’t just get access to those columnar and table-level functions, you also get a visual display of the dataframe in the notebook as part of the cell output. That’s not a feature of the dataframe but rather a feature of the notebook seeing what a dataframe represents.

[1]:
import pandas

dataframe = pandas.read_csv("happiness.csv")

dataframe
[1]:
The simple HTML table that notebooks typically output when a dataframe is created

That output is traditionally a simple HTML table, but with the implementation of the application/vnd.dataresource+json output type, notebooks could expand on what the default visual representation of a dataframe could be. That’s what enabled the open-source Data Explorer, which allowed notebooks to render data in an interactive and exploratory manner, richer than a static HTML table. It only took five millennia, but we’ve finally improved upon the display of tables beyond what was seen in Ancient Egypt.
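Under the hood, a cell output is just a bundle of mimetypes, and the frontend picks the richest representation it knows how to render. As a minimal sketch (the schema and rows below are made up), you can publish an application/vnd.dataresource+json bundle with a plain-text fallback from any IPython kernel:

from IPython.display import display

# A tiny, hand-written data resource payload; real dataframes generate this for you
data_resource = {
    "schema": {"fields": [{"name": "year", "type": "integer"},
                          {"name": "happiness", "type": "number"}]},
    "data": [{"year": 2021, "happiness": 7.3}, {"year": 2022, "happiness": 7.4}],
}

display(
    {
        "application/vnd.dataresource+json": data_resource,
        "text/plain": "year, happiness\n2021, 7.3\n2022, 7.4",
    },
    raw=True,  # tell IPython this is already a mimebundle, not an object to format
)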

[2]:
dataframe = pandas.read_csv("happiness.csv")

dataframe
[2]:
The nteract Data Explorer that provides an interactive view into data when a dataframe is created

We expanded on Data Explorer with Noteable’s DEX, a fully-functional BI experience in the notebook. Like all visualization tools, DEX takes tabular data and transforms it just-in-time into other forms of data to produce hierarchical charts, geospatial data visualization, network viz and other representations that are not strictly tabular.

[3]:
dataframe = pandas.read_csv("happiness.csv")

dataframe
[3]:
The same happiness data represented using DEX's Data Prism shared as an iframe output from a Noteable Notebook

The Secret Power of Dataframes

All that functionality describes what dataframes do, but it isn't sufficient to explain why they proved so popular. Dataframes provide an immediate answer to that critical question:

What does my data look like?

More importantly, they provide an answer in service to a solution. When you’re working with data you need answers, even when they’re not great. Take any world-class algorithm that’s a part of your life and if you go back to the beginning of its life you’ll find a data scientist looking at an HTML table of data. The top 10 rows of a dataframe, though simple and naive, give analysts, engineers and scientists the raw material to pose questions for analysis and modeling.

Even though ten rows of data is not a great representation of the data, seeing it enables you to ask the next question, get the next not great answer, and work toward a solution. LLM interfaces like ChatGPT have reminded us that people are comfortable with not great and even ephemeral answers when they are in service to a solution. Any user of ChatGPT can point to a pile of mixed up, hallucinated and simplistic answers that helped them work toward their solution.

That’s why we integrated a data visualization prototyping environment called Data Prism into DEX. Data Prism uses a heuristic approach to suggest multiple chart views into the tabular data in the dataframe. They’re like faceting the data but the facets are different views instead of just different metrics or dimensions. These might not be great views into the data but they serve as jumping-off points for better understanding the shape of the data in the same way that the html table view helps analysts and scientists think about their data.

We’ll never get away from tables because tables are the fundamental currency of communicating data. Even when you’re working with exotic data structures (hierarchical data, networks, vectors) eventually they become tables due to the accessibility of that representation. And it’s not just native Python table data, it takes little effort to envision CSVs as dataframes, Tables as dataframes, Spreadsheets as dataframes (especially now that Excel has integrated Python) and SQL databases as sources or sinks for dataframes.

Answers in Service to a Solution

The only way someone comes up with a solution is if you first give them answers. We might think, when we see the final results of EDA or data science, that it was the answer, but it was just the last answer at the end of a string of answers. Answers are both ends and means. Answers, like table views and automatic visualization and LLM chat responses, benefit from immediacy to enable greater analytical sophistication.

The resulting call-response structure, whether in a notebook or an LLM chat, is a visible workflow that we know further enables literate programming. That workflow is reproducible, branchable, and by its very nature a narrative. Dataframes work because they’re all about answers in service to a solution. It’s almost like they know that they’re part of that workflow. They are an integral part of developing data-driven decisions, beyond their ability to process tabular data. And the better we get at providing answers, even not-great answers, to data-driven workers, the better we’ll be at enabling that workflow.

· 8 min read
Kyle Kelley

Sometimes it's hard to get people to read your notebook reports no matter how well you design your dashboard. To solve this, let's push reports directly to your colleagues on Slack under the veneer of an adorable chart monster.

👋🏻 Introducing Chartsworth

[1]:
from chartsworth import Chartsworth

chartsworth = Chartsworth()
Chartsworth the chart monster. A rough sketch outline of a floating monster with scruffy fur and adorable eyes amidst axes.
info

Chartsworth is an open source monster you can install right from PyPI.

pip install chartsworth

Chartsworth posts from notebooks to Slack. This little metric monster lets you post plots, images, and anything else the Slack SDK will let you post.

[2]:
day = "Monday"
emoji = "tada"
signups = 8127

message = f"""
Happy {day}! It's time for our daily update. :{emoji}:

\# of :new: signups so far this month: *{signups}*
""".strip()

chartsworth.post(message)
Chartsworth's avatar
APP 10:04 AM

Happy Saturday! It's time for our daily update. 🎉

# of 🆕 signups so far this month: 8127

Each time you call post, Chartsworth will keep the thread going. You can keep posting without inundating your colleagues with too many messages. They can open the 🧵 for more detail.

[3]:
chartsworth.post("I'm going to post a plot now!")

# fig can be any matplotlib figure
fig = chartsworth.create_monster_plot()

chartsworth.post(fig)
Chartsworth's avatar
APP 10:04 AM

Happy Saturday! It's time for our daily update. 🎉

# of 🆕 signups so far this month: 8127


Chartsworth's avatar
APP 10:04 AM

I'm going to post a plot now!

Chartsworth's avatar
APP 10:04 AM
monster-plot.png ▾

With this in place, you can start to build your own chart monsters to post from your notebooks. Set up a Slack App and install chartsworth to start posting!

📆 Scheduling

The most interesting and useful way to use a chart monster is to schedule your work. Integrate them into your data workflows to get insight into your data pipelines, reports, or even just a daily affirmation.

ETL → Derived Data → Reporting

Within Noteable, this can be done when you save a named version. On the right-hand sidebar click the history icon. After that, you can schedule the notebook to run regularly.

Schedule notebook

This will free you up from "running a daily report" by hand and let you focus on the more interesting parts of your work.

🎂 Creating your first chart monster

note

This section will cover how to write your own version of Chartsworth, and how to use it to post from Notebooks to Slack.

You can also pip install chartsworth to use all the baked in opinions, like threads 🧵 and immediate gratification.

Set a Chart Monster up as a Slack App

The first step is to create a Slack app. This is a fairly straightforward process, but it does require a few steps.

  1. Go to api.slack.com/apps and click the "Create New App" button.

  2. Click "Create New App"

Create an App Modal on api.slack.com
  1. Click "From an app manifest", select your workspace, and click "Next"

At this point it should show you a basic manifest like this:

{
  "display_information": {
    "name": "Demo App"
  },
  "settings": {
    "org_deploy_enabled": false,
    "socket_mode_enabled": false,
    "is_hosted": false,
    "token_rotation_enabled": false
  }
}
  4. Click Next, and then Create. From here you'll land at the app's settings page. Scroll down to "Display Information" so we can add personality ✨.
App Settings on api.slack.com

To make your chart monster, I recommend the following prompts for Midjourney, DALL-E 2, or Stable Diffusion.

black and white monster on a matplotlib chart, slack app logo

a white monster sits amidst matplotlib axes, in the style of dynamic mark making, rough clusters, poodlepunk, scientific diagrams, comic art, simple line drawings, furry art

a white monster with eyes is shown with scales, in the style of simplistic ink drawings, comic/cartoon, furry art, spot metering, heavy outlines, frostpunk, konica big mini

a monster is being measured with an x and y axis, in the style of dynamic sketching, rough clusters, sabattier filter, non-linear, large-scale, jump cuts, poodlepunk

an image of a cute grey monster metrics, in the style of simplistic ink drawings, maya lin, zeiss batis 18mm f/2.8, frank miller, luminous shadowing, lively facial expressions, spiky mounds

🛠️ How Chartsworth Posts

Slack makes it really easy to post directly using the slack_sdk. Above I used the opinionated version of this workflow via chartsworth. Let's unpack how to do this with the slack_sdk directly.

[3]:
!pip install slack_sdk -q

import os
from slack_sdk import WebClient

slack_client = WebClient(token=os.environ["SLACK_API_TOKEN"])

# This can be any channel you've invited your @Chartsworth into
chartsworth_channel = "#chartsworth-dev"

Let's post!

[4]:
message_response = slack_client.chat_postMessage(
    channel=chartsworth_channel,
    text="I am ready to *plot*"
)

Assuming you've invited your @Chartsworth to a #chartsworth-dev channel, there should now be a post.

Threading

The most useful field from the message_response is the ts field. This is the timestamp of the message chartsworth just posted.

[5]:
message_response["ts"]
[5]:
'1693611091.015119'

You can use this timestamp to thread replies.

[6]:
message_response = slack_client.chat_postMessage(
    channel=chartsworth_channel,
    text=":tada:",
    thread_ts=message_response["ts"],
)

Plotting

Let's post a plot! We're going to use my favorite plot, the monster plot.

[6]:
import matplotlib.pyplot as plt
import numpy as np


def create_monster_plot():
    """Creates a Monstrous Matplotlib Figure."""
    # Create a Matplotlib figure object.
    fig, ax = plt.subplots()

    # Create some eyes
    eye1_x = np.random.uniform(0.25, 0.75)
    eye1_y = np.random.uniform(0.7, 0.9)
    eye2_x = np.random.uniform(0.25, 0.75)
    eye2_y = np.random.uniform(0.7, 0.9)
    ax.plot(eye1_x, eye1_y, "o", markersize=20, color="black")
    ax.plot(eye2_x, eye2_y, "o", markersize=20, color="black")

    # Create a mouth
    x = np.linspace(0, 1, 100)
    y = 0.2 + 0.1 * np.sin(np.random.uniform(1, 10) * x + np.random.uniform(0, 2 * np.pi))
    ax.plot(x, y, color="black", linewidth=2)

    # Set the aspect ratio of the plot to 1 so the monster looks dope
    ax.set_aspect("equal", adjustable="box")
    ax.set_ylim(0, 1)

    # Don't let matplotlib show the plot immediately
    plt.close(fig)

    return fig

The way to make a plot appear is to save it to a file or a buffer using savefig. After that you can use the files_upload_v2 method to post the image.

[7]:
import io

fig = create_monster_plot()

image_stream = io.BytesIO()
fig.savefig(image_stream, format="png")
# Close the figure so it doesn't show up in the notebook
plt.close(fig)
image_stream.seek(0)

slack_client.files_upload_v2(
    channels=chartsworth_channel,
    initial_comment="Here's a monster plot!",
    file=image_stream,
    filename="monster.png",
)

which should result in a post like this:

Chartsworth's avatar
APP 10:04 AM

Happy Saturday! It's time for our daily update. 🎉

# of 🆕 signups so far this month: 8127


Chartsworth's avatar
APP 10:04 AM

🎉

Chartsworth's avatar
APP 10:04 AM
monster-plot.png ▾

🤔 Why Chart Monsters?

Chartsworth is not just another tool or app; it's a way to think about how we communicate data and results. It's a way to make data more accessible and more fun. With a chart monster, you can still provide the link to your notebook for people to look behind the scenes and learn from you. Giving people results at-a-glance where they're already looking and working will enable them to engage with your work quickly and easily.

· 8 min read
Matt Kafonek

Introduction

By now, everyone has played with a large language model like ChatGPT. You've probably gone through those cycles of copying and pasting your email drafts, code snippets, or posts. There's a lot of power in giving large language models more context. The biggest way to do this is to provide access to documents directly. As a developer, you can enable this experience by writing a ChatGPT Plugin that uses OAuth.

tip

OpenAI's own plugin docs include two use cases that either require OAuth or are greatly augmented by it.

  1. Retrieval over user-specific or otherwise hard-to-search knowledge sources

  2. Plugins that give the model computational abilities

OAuth is a mechanism used to enable Single Sign-On (SSO) across applications. When you install the Noteable ChatGPT plugin, you can choose to login or sign up (it's free!) to Noteable using an existing Google, Github, or LinkedIn account. In this post, the Noteable engineering team wants to share some of the low-level details of how OAuth works, and how it's implemented in Noteable. We hope this helps other plugin developers and the community at large.

Why OAuth?

Let’s start with why a plugin would use OAuth, compared to “no auth” or “service level auth”. Simply put, if your plugin or downstream API needs to know about a logged-in user, use OAuth. For instance, if you were writing a Wikipedia-reading plugin you could skip OAuth because you don’t need a logged-in user to read Wikipedia. Since the large language model (LLM) is creating Notebooks and running code via the Noteable plugin, which goes through role-based access control (RBAC) permission checks and user-context-aware features, we need to know which user account the request is for.

There are many OAuth providers out there, and there's nothing stopping you from writing your own. We happen to use Auth0, so our examples will include their implementation details (such as authorize and /oauth/token endpoints). OpenAI and Auth0 both have good documentation about OAuth flows, I recommend reading these sections in addition to this blog post if you're working on an OAuth plugin yourself.

OAuth 101

When you click Install on an OAuth-enabled ChatGPT plugin, your browser will be redirected to the OAuth provider page. Once you've completed logging in there, which may entail even more OAuth redirect jumps, the provider will redirect you back to ChatGPT. If everything goes well, ChatGPT will acquire a JSON web token (JWT) that it will include in an Authorization header on every HTTP request to your plugin.

A JWT contains limited identity information about the authenticated user, and has an expiration. You can learn more about JWTs and decode their payloads at jwt.io.
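If you just want to peek at what's inside one, the middle segment of a JWT is base64url-encoded JSON. Here's a tiny sketch that builds a fake, unsigned token and decodes it, which is roughly what jwt.io does for you (no signature verification happens here; the claims are made up):

import base64
import json

# Build a fake, unsigned example token just to show the structure;
# a real one arrives in the Authorization header as "Bearer <token>".
claims = {"sub": "auth0|12345", "email": "dev@example.com", "exp": 1700000000}
payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=")
token = b"fake-header." + payload + b".fake-signature"

# Decoding: split on ".", restore the base64 padding, and parse the JSON payload
segment = token.split(b".")[1]
segment += b"=" * (-len(segment) % 4)
print(json.loads(base64.urlsafe_b64decode(segment)))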

OAuth 101

note

When you are developing a plugin in localhost mode, the only authorization type allowed is "none". You cannot test OAuth flows in localhost development mode. You will need to host your plugin somewhere or use a tool like ngrok to create a proxy to your machine.

OAuth apps

OAuth and JWTs are not unique to ChatGPT plugins. A typical front-end / back-end web application would use an OAuth flow very similar to the ChatGPT plugin experience. On the backend, you can validate that the JWTs you're receiving were issued by the OAuth provider you trust by using JSON Web Keys (JWK). At Noteable we use the jwcrypto Python library.
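Here's a rough sketch of what that check can look like with jwcrypto (the tenant domain is a placeholder; Auth0 publishes its signing keys at a well-known JWKS endpoint, and other providers do something similar):

import json

import requests
from jwcrypto import jwk, jwt

# Placeholder tenant: swap in your own OAuth provider's JWKS URL
jwks_json = requests.get("https://YOUR_TENANT.us.auth0.com/.well-known/jwks.json").text
key_set = jwk.JWKSet.from_json(jwks_json)


def claims_from_bearer_token(token_str: str) -> dict:
    """Verify the JWT signature against the provider's published keys and return its claims."""
    verified = jwt.JWT(jwt=token_str, key=key_set)  # raises if the signature doesn't check out
    return json.loads(verified.claims)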

OAuth app

The Noteable ChatGPT plugin is more or less a proxy to our main API. There's a little more going on in our application, but a plugin that is effectively a pass-through to another API can pass the JWT it got from ChatGPT right along as an Authorization header to the real API.
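A minimal sketch of that pass-through pattern, assuming a FastAPI app and httpx (illustrative choices and placeholder routes, not necessarily what our plugin uses internally):

import httpx
from fastapi import FastAPI, Header

app = FastAPI()
UPSTREAM_API = "https://api.example.com"  # placeholder for the real backend API


@app.get("/projects")
async def list_projects(authorization: str = Header(...)):
    # Forward the same Bearer token ChatGPT sent us straight to the downstream API
    async with httpx.AsyncClient(base_url=UPSTREAM_API) as client:
        resp = await client.get("/v1/projects", headers={"Authorization": authorization})
        return resp.json()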

Plugin and Frontend

OAuth configuration

Once you're ready to test out OAuth with your plugin, the first step is to have your plugin hosted somewhere besides localhost and for your manifest file (ai-plugin.json) to have its auth section set to type oauth. You'll also need the client_url and authorization_url to point to the endpoints of your OAuth provider: the initial redirect and the POST to grab the JWT, respectively.
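For reference, here's roughly what that auth section can look like for an Auth0-backed plugin (the tenant domain and verification token are placeholders; the scope matches the one described in the note below):

"auth": {
  "type": "oauth",
  "client_url": "https://YOUR_TENANT.us.auth0.com/authorize",
  "authorization_url": "https://YOUR_TENANT.us.auth0.com/oauth/token",
  "scope": "openid profile email offline_access",
  "authorization_content_type": "application/json",
  "verification_tokens": {
    "openai": "<token provided by ChatGPT during plugin setup>"
  }
}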

When you click "develop your own plugin" in ChatGPT and give it the domain your plugin is hosted at, it will try to download the manifest file and OpenAPI spec file. If it sees your manifest file has type oauth, it will prompt you to enter the client_id and client_secret from your OAuth provider.

Develop your own Plugin step 1

Develop your own Plugin step 2

After you've put those in, ChatGPT will give you a token that you need to add to your manifest file and then redeploy / restart.

Develop your own Plugin step 3

If ChatGPT can pull the manifest file and see the new token, then the "develop your own plugin" flow is complete and ChatGPT will give you a plugin application id that you can use to update the redirect_uri in your OAuth provider.

OAuth config

note

Scope is optional, and is an empty string in the OpenAI example. Noteable uses the scopes openid profile email offline_access in order to get back three tokens during the OAuth process: access_token, id_token, and refresh_token (all are JWTs).

  • ChatGPT uses the access_token in Authorization headers to our plugin
  • ChatGPT will automatically refresh access_token using the refresh_token
  • Noteable uses the name and email from the id_token payload to create a User account in Noteable if one does not already exist

You can read more about scopes here

Painful Lessons

One decision we made early on at Noteable that turned out to be a mistake was creating User accounts using the sub payload from the Auth0 identity tokens, and looking up a User row in our database from the sub in the Auth0 access token. Each login mechanism ended up being its own separate User in our system. If you logged in to app.noteable.io using your Github social login, then installed the ChatGPT plugin and authenticated with your Google login, you would end up with all sorts of permission denied errors trying to work on Notebooks between them. It was a major pain point.

A compounding problem was that we were enforcing email verification for username / password accounts using a rule in Auth0 that would not return a JWT until the user clicked a link in their email. In our Noteable app frontend, when you signed up that way we could tell the user what to do. However, we had no control over the ChatGPT UI, and from the user's perspective they would install the Noteable plugin and it would fail with no error message. Technically there was an error code in the URL arguments of the redirect from Auth0 back to ChatGPT, but it would take an eagle-eyed user to notice that. Our temporary solution was to disable username / password login from the Auth0 application we used for ChatGPT, funneling even more users into the multiple-account problem space.

Bad Times

Our solution was to create a second database table we called Principals to represent the login mechanism. A Google, Github, or Auth0 username/password login with the same email all link to the same User account now. We reconfigured our ChatGPT manifest file to proxy the authorize and token endpoints through our plugin so that we could automatically create or link Noteable accounts during the OAuth flow. We moved the email verification onto our own system instead of within an Auth0 rule, with error handling in the plugin to tell the user that while they did successfully install the Noteable ChatGPT plugin, they still need to click the email verification link before it will successfully create Notebooks or run code for them.

Good Times

Account Creation

Localhost development

We mentioned at the top of the post that you cannot do OAuth testing in localhost development. If your backend API requires a JWT for authentication though, what do you do? Luckily at Noteable we issue our own tokens for programmatic access to our API, which we'll talk more about in other blog posts and show off in Origami documentation.

Localhost Development

Final Thoughts

Integrating OAuth with ChatGPT plugins opens up a world of personalized possibilities, linking the reasoning capabilities of Large Language Models with personalized content. If you're a developer inspired by the idea of creating powerful, user-centric plugins, now's the time to get started. Dive into plugins, explore OpenAI's documentation on plugins, and make the most of OAuth to unlock the potential of personalized interaction. Join us on this journey and let's push what ChatGPT + Plugins can do!

· 11 min read
Kyle Kelley

Introduction

[1]:
import openai
import noteable

The Noteable Plugin for ChatGPT exposes rich computational notebooks to GPT Models, allowing you to experiment with data, analysis, and interactive computing in new ways.

Notebooks are a pivotal tool for data scientists, researchers, and engineers. Now, they're available for everyone. Notebooks allow us to communicate ideas, explore data, and share results. Notebooks are a tool for thinking. Most people use them to explore data. They're also a great little laboratory for experimenting with APIs.

Artificial Intelligence companies like OpenAI clearly included notebooks as part of the corpus of training data because their Large Language Models know notebooks. ChatGPT can create well documented notebooks with code and prose. It has shown me whole new ways of using libraries and analyzing data that previously were out of reach or just unknown to me. Anyone who knows notebooks and interactive computing in general, knows what this means. It's a big deal.

Please create an ETL notebook for the IMDb dataset

ChatGPTChatGPT
UsedNoteable
Request to Noteable
{
"notebook_name": "IMDb_ETL.ipynb"
}
Response from Noteable
{
"project_id": "3b179d03-9efb-4f6c-ad7a-fa0a478d45e4",
"file_id": "d3d944d0-5a18-48b3-bf3e-fdb59d8a5dea",
"notebook_url": "https://app.noteable.io/f/d3d944d0-5a18-48b3-bf3e-fdb59d8a5dea/IMDb_ETL.ipynb",
"assistant_hint": "The user can now follow along with your cell work at the notebook link."
}

I've created a new notebook named IMDb_ETL_Notebook.ipynb.

Now, let's start by writing the ETL (Extract, Transform, Load) process. The ETL process involves:

  1. Extract: Download the IMDb dataset.
  2. Transform: Clean and transform the data into a suitable format.
  3. Load: Load the transformed data into a data structure or database for analysis.

Let's start with the extraction process. Could you please provide the URL or the location of the IMDb dataset?

Early experiments with integrating LLMs into notebooks were very promising. Open source projects have shown the potential of integrating IPython and Jupyter with Large Language Models (Jupyter AI, genai). These projects allow for deeper exception handling, inline assistance, and even code generation. It's fantastic. Due to technical limitations in open source Jupyter though, there's no realtime server side access to the notebook document model.

What we needed was a way for the LLM to create and edit cells, start new notebooks, and debug its own code. Everything that would be part of the workflow for a data scientist, data engineer, or anyone else looking to learn in a notebook.

Enabling Large Language Models to Interact with Notebooks

Building on our foundations for realtime collaboration and scheduled notebooks, we began developing a plugin that would allow the model to create and edit cells, start new notebooks, and read outputs. The LLM could participate in the full workflow of data science: everything from data loading to exploration to reporting.

The model can do this because we built a realtime API that allows users to collaborate and notebooks to run server-side as scheduled jobs. This also means that AI Agents can join the mix. Let's look at how this happens from a ChatGPT plugin perspective.

Following along with the IMDb ETL example above, ChatGPT proceeds to make Markdown and Code cells in the notebook it created.

ChatGPTChatGPT
UsedNoteable
Request to Noteable
// ChatGPT requests to use noteable.create_cell
{
file_id: "f879d549-a4cc-4f5c-bde8-edea8b3950ec",
cell_type: "markdown",
source: [
"# ETL Process for IMDb Dataset\n",
"\n",
"## Step 1: Extract\n",
"\n",
"The IMDb datasets are available at 'https://datasets.imdbws.com/'. We will download the required datasets for our analysis.",
],
}
Response from Noteable
{
"cell": {
"id": "760dad4d-001f-40b3-9375-165a2f67ee21",
"cell_type": "markdown"
},
"cell_link": "https://app.noteable.io/f/f879d549-a4cc-4f5c-bde8-edea8b3950ec/?cellID=760dad4d-001f-40b3-9375-165a2f67ee21",
"cell_state": null,
"kernel_state": "idle",
"kernel_state_description": "Kernel is ready for input",
"assistant_hint": "The cell was created successfully"
}

The Noteable Plugin tells ChatGPT about the cell ID, a link to the cell, and information about the kernel (runtime) for the notebook. This is important as ChatGPT proceeds to create and run code cells.

ChatGPTChatGPT
UsedNoteable
Request to Noteable
{
"file_id": "f879d549-a4cc-4f5c-bde8-edea8b3950ec",
"cell_type": "code",
"source": [
"!pip install -q wget",
"\n",
"import wget\n",
"\n",
"# URLs of the IMDb datasets\n",
"urls = [\n",
" 'https://datasets.imdbws.com/name.basics.tsv.gz',\n",
" 'https://datasets.imdbws.com/title.akas.tsv.gz',\n",
" 'https://datasets.imdbws.com/title.basics.tsv.gz',\n",
" 'https://datasets.imdbws.com/title.crew.tsv.gz',\n",
" 'https://datasets.imdbws.com/title.episode.tsv.gz',\n",
" 'https://datasets.imdbws.com/title.principals.tsv.gz',\n",
" 'https://datasets.imdbws.com/title.ratings.tsv.gz'\n",
"]\n",
"\n",
"# Download the datasets\n",
"for url in urls:\n",
" wget.download(url)"
],
"and_run": true
}
Response from Noteable
{
"cell": {
"id": "edf64f21-40a8-4b8a-a39c-845242f66851",
"cell_type": "code",
},
"cell_link": "https://app.noteable.io/f/f879d549-a4cc-4f5c-bde8-edea8b3950ec/?cellID=edf64f21-40a8-4b8a-a39c-845242f66851",
"cell_state": "executing",
"kernel_state": "busy",
"kernel_state_description": "Kernel is processing input",
"assistant_hint": "The cell was created successfully\nCell execution taking more than 11 seconds.\nGet the cell for final results, or have the user view their notebook cell at\nhttps://app.noteable.io/f/f879d549-a4cc-4f5c-bde8-edea8b3950ec/?cellID=edf64f21-40a8-4b8a-a39c-845242f66851 to learn more."
}

I've started the process of downloading the IMDb datasets. This process might take a while due to the large size of the datasets. You can check the progress in your notebook here.

Once the download is complete, we can proceed to the next step of the ETL process, which is data transformation. In this step, we will clean and preprocess the data to make it suitable for analysis.

Do you have any specific transformations or cleaning steps you'd like to apply to the data? For example, removing null values, filtering specific columns, etc.

Go ahead and load the title basics into a dataframe.

What's amazing about this is that it can feel like having a partner working on the notebook with you, directly. Cells update live in your own window of the notebook open on Noteable.

[4]:
import pandas as pd

# Load the dataset into a pandas DataFrame
df = pd.read_csv('title.ratings.tsv', sep='\t')

As soon as you have data loaded, start requesting visualizations.

Please visualize the max runtimes per year over the past 20 years.

ChatGPTChatGPT
UsedNoteable
Request to Noteable
{
"file_id": "169e3c44-5b82-4101-9fec-52ca2d84962d",
"cell_type": "code",
"source": [
"import matplotlib.pyplot as plt",
"import datetime",
"\n",
"# Clean the data",
"title_basics['startYear'] = pd.to_numeric(title_basics['startYear'], errors='coerce')",
"title_basics['runtimeMinutes'] = pd.to_numeric(title_basics['runtimeMinutes'], errors='coerce')",
"\n",
"# Filter the data for the past 20 years",
"current_year = datetime.datetime.now().year",
"title_basics_filtered = title_basics[(title_basics['startYear'] >= current_year - 20) & (title_basics['startYear'] <= current_year)]",
"\n",
"# Group by year and get the max runtime",
"max_runtimes = title_basics_filtered.groupby('startYear')['runtimeMinutes'].max()",
"\n",
"# Plot the data",
"plt.figure(figsize=(10, 6))",
"plt.plot(max_runtimes.index, max_runtimes.values, marker='o')",
"plt.xlabel('Year')",
"plt.ylabel('Max Runtime (minutes)')",
"plt.title('Max Runtimes per Year Over the Past 20 Years')",
"plt.grid(True)",
"plt.show()"
],
"and_run": true
}
Response from Noteable
{
"cell": {
"id": "7daa7f99-097e-48c6-9467-ae79e29a9a7b",
"source": "import matplotlib.pyplot as plt\nimport datetime\n\n# Clean the data\ntitle_basics['startYear'] = pd.to_numeric(title_basics['startYear'], errors='coerce')\ntitle_basics['runtimeMinutes'] = pd.to_numeric(title_basics['runtimeMinutes'], errors='coerce')\n\n# Filter the data for the past 20 years\ncurrent_year = datetime.datetime.now().year\ntitle_basics_filtered = title_basics[(title_basics['startYear'] >= current_year - 20) & (title_basics['startYear'] <= current_year)]\n\n# Group by year and get the max runtime\nmax_runtimes = title_basics_filtered.groupby('startYear')['runtimeMinutes'].max()\n\n# Plot the data\nplt.figure(figsize=(10, 6))\nplt.plot(max_runtimes.index, max_runtimes.values, marker='o')\nplt.xlabel('Year')\nplt.ylabel('Max Runtime (minutes)')\nplt.title('Max Runtimes per Year Over the Past 20 Years')\nplt.grid(True)\nplt.show()",
"metadata": {
"noteable": {
"cell_type": "code",
"output_collection_id": "183eb3d9-d050-4f20-8b65-d868a66466bb"
}
},
"cell_type": "code",
"outputs": [
{
"type": "display_data",
"mimetype": "image/png",
"assistant_hint": null,
"url": "https://chat.noteable.io/origami/o/97599da17b2b47d0a620945c89fc1bd1.png"
}
]
},
"cell_link": "https://app.noteable.io/f/169e3c44-5b82-4101-9fec-52ca2d84962d/?cellID=7daa7f99-097e-48c6-9467-ae79e29a9a7b",
"cell_state": "finished_with_no_error",
"kernel_state": "idle",
"kernel_state_description": "Kernel is ready for input",
"assistant_hint": null
}

Here is the plot showing the maximum runtimes per year over the past 20 years:

Max runtimes across all films and TV shows for the last 20 years

It's not perfect by any means! As you can see in the chart above, ChatGPT left the years as floating-point values, so the x-axis shows ticks like 2017.5 instead of whole years. Nothing a little prompting can't fix.
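
If you'd rather make the fix yourself instead of re-prompting, a minimal sketch is to cast the year index to plain integers before plotting (assuming the max_runtimes Series from the cell above is still in memory):

[ ]:
import matplotlib.pyplot as plt  # already imported in the earlier cell

# Cast the float years (e.g. 2017.0) to ints so the ticks read as whole years
yearly_max = max_runtimes.copy()
yearly_max.index = yearly_max.index.astype(int)

plt.figure(figsize=(10, 6))
plt.plot(yearly_max.index, yearly_max.values, marker="o")
plt.xticks(yearly_max.index)  # one tick per year, no fractional labels
plt.xlabel("Year")
plt.ylabel("Max Runtime (minutes)")
plt.show()

In this case, though, a quick follow-up prompt gets the same result: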

On the x-axis, it should read as whole years not "2017.5". Can you also make this a much more beautiful plot? Human readable labels, no grid. Real clean modern infographic.

ChatGPT
Used Noteable
Request to Noteable
{
"file_id": "169e3c44-5b82-4101-9fec-52ca2d84962d",
"cell_type": "code",
"source": [
"import seaborn as sns",
"\n",
"sns.set_theme(style='whitegrid')",
"\n",
"# Create a figure and a set of subplots",
"fig, ax = plt.subplots(figsize=(10, 6))",
"\n",
"# Plot the data",
"sns.lineplot(x=max_runtimes.index, y=max_runtimes.values, marker='o', ax=ax)",
"\n",
"# Set xticks as integers (whole years)",
"ax.xaxis.set_major_locator(plt.MaxNLocator(integer=True))",
"\n",
"# Remove the grid",
"ax.grid(False)",
"\n",
"# Set labels and title",
"ax.set_xlabel('Year', fontsize=14)",
"ax.set_ylabel('Max Runtime (minutes)', fontsize=14)",
"ax.set_title('Max Runtimes per Year Over the Past 20 Years', fontsize=16)",
"\n",
"# Show the plot",
"plt.show()"
],
"and_run": true
}
Response from Noteable
{
"cell": {
"id": "9d16e873-2f43-4d9c-ba95-28131fd05eb2",
"source": "import seaborn as sns\n\nsns.set_theme(style='whitegrid')\n\n# Create a figure and a set of subplots\nfig, ax = plt.subplots(figsize=(10, 6))\n\n# Plot the data\nsns.lineplot(x=max_runtimes.index, y=max_runtimes.values, marker='o', ax=ax)\n\n# Set xticks as integers (whole years)\nax.xaxis.set_major_locator(plt.MaxNLocator(integer=True))\n\n# Remove the grid\nax.grid(False)\n\n# Set labels and title\nax.set_xlabel('Year', fontsize=14)\nax.set_ylabel('Max Runtime (minutes)', fontsize=14)\nax.set_title('Max Runtimes per Year Over the Past 20 Years', fontsize=16)\n\n# Show the plot\nplt.show()",
"metadata": {
"noteable": {
"cell_type": "code",
"output_collection_id": "e509680d-f48a-4131-97fe-00062385b1df"
}
},
"cell_type": "code",
"outputs": [
{
"type": "display_data",
"mimetype": "image/png",
"assistant_hint": null,
"url": "https://chat.noteable.io/origami/o/984967bccd8b479594f508f6674c2251.png"
}
]
},
"cell_link": "https://app.noteable.io/f/169e3c44-5b82-4101-9fec-52ca2d84962d/?cellID=9d16e873-2f43-4d9c-ba95-28131fd05eb2",
"cell_state": "finished_with_no_error",
"kernel_state": "idle",
"kernel_state_description": "Kernel is ready for input",
"assistant_hint": "The cell was created successfully"
}

Here is the updated plot showing the maximum runtimes per year over the past 20 years. I've formatted the x-axis labels as whole years and improved the aesthetics of the plot for a cleaner, more modern look:

Max runtimes across all films and TV shows for the last 20 years

Once I reach this stage of data loading, one of my favorite approaches is to start asking questions of my data. With my real datasets (unlike this example), I get to apply domain knowledge. ChatGPT will debug, reason about outputs and exceptions, and generally be a great partner in tandem with the Noteable Plugin. You know what else can? ChatGPT's Code Interpreter.

Rapid Iteration with GPT Models and Interpreters

There isn't just one way to code. There are also many ways to create data products. In both ChatGPT with the Noteable Plugin and OpenAI's Code Interpreter, the underlying Large Language Model can reason, debug, run code, analyze data, and more.

One of the strengths of OpenAI's Code Interpreter is that it's fast to start up. Additionally, they've used some clever prompting and Reinforcement Learning from Human Feedback (RLHF) to create a compelling user experience.

However, one aspect of the Code Interpreter's design limits its capability: it's ephemeral. This approach allows for aggressive caching (speed to startup), but it also means you lose your work when your session ends.

I know this from creating tmpnb back in 2014, which we used to provide notebooks to readers of an IPython article in the journal Nature. Not long after we deployed it for everyone, we found that people wanted to use our temporary notebooks for real analysis work. We had fundamentally disrupted the problem of access, and people wanted more: access to external APIs and, especially, to external SQL data sources. Provided securely, that kind of access is the most useful advancement in user interfaces to Large Language Models we've seen yet.

Whether you're using OpenAI's Code Interpreter, Jupyter extensions, or Noteable, the underlying models from OpenAI excel at performing analysis work and collaborating with human users.

If anyone from OpenAI is reading this, thank you so much for the models and the absolutely stunning APIs. I'm not even getting a chance to gush about what you can build with Chat Functions yet. Another time, very soon.

Providing Durable Access with Notebooks

For some people and workloads, an ephemeral environment is perfect. Others need a more durable solution, and that's where a Notebook Platform like Noteable comes in. I enjoy having access to my data files, environment variables and secrets, and databases.

Notebooks document computations. They're expressive, reproducible communication tools for collaborating with others. The hosted environment expands the possibilities for analysis tasks, and git integration lets our platform participate in robust, durable projects.

While LLMs are powerful as standalone tools, they are even more powerful when exposed to notebooks. I've always been excited to enable more people to feel the delight of interactive computing. Now it feels like that reach is much broader and more accessible.

My goal is to build excellent environments for collaborative analysis and development, and to enable everyone from small teams to massive scale. We're seeing a glimpse of this now with Noteable and OpenAI's GPT models: collaboration now includes not only human peers but also artificial intelligence in the same document. I can't wait to see what you create.

Get started with the Noteable Plugin today!

P.S. If you're a developer, reach out. We're launching a way for you to do everything you can do in the plugin faster directly with APIs compatible with your favorite LLM tooling.