Run Databao

Before you begin

Initialize an agent

An agent in Databao acts as the main interface for database connections and context.

To initialize an agent, use the following code:


```
import databao
from databao import LLMConfig

llm_config = LLMConfig(name="gpt-4o-mini", temperature=0)
agent = databao.new_agent(llm_config=llm_config)
```

Add data sources

You can add configured database connections and dataframes to the agent as follows:


```
from pathlib import Path

# Add a database connection
agent.add_db(conn)

# Add a dataframe
agent.add_df(df)

# Add a dataframe with context as a string
agent.add_df(df, context="some context")

# Add a dataframe with context as a file (requires the Path import above)
agent.add_df(df, context=Path("context.md"))
```
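The context file is ordinary text on disk. As a minimal sketch, the snippet below writes a `context.md` file with `pathlib` and reads it back to confirm its contents. The directory, column names, and wording are made up for illustration, and the `add_df` call is left as a comment because it needs a configured agent and dataframe.

```python
from pathlib import Path
import tempfile

# Hypothetical description of a dataframe's columns; the wording is illustrative only.
CONTEXT_TEXT = """\
# Shows dataset
- title: name of the show
- country: country where the show was produced
- release_year: year of first release
"""


def write_context(directory: Path, text: str = CONTEXT_TEXT) -> Path:
    """Write the context text to context.md inside `directory` and return its path."""
    context_path = directory / "context.md"
    context_path.write_text(text, encoding="utf-8")
    return context_path


# A temporary directory keeps the example from touching the working directory.
workdir = Path(tempfile.mkdtemp())
context_path = write_context(workdir)

# With a configured agent and dataframe, the path would be passed as in the call above:
# agent.add_df(df, context=context_path)
print(context_path.read_text(encoding="utf-8"))
```

Keeping the context in a file rather than an inline string makes it easier to version and reuse across dataframes.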

Create a thread and ask a question

Threads are conversations within an agent; you can think of each one as a single chat in ChatGPT or Claude. Each thread has its own message history, so you can ask follow-up questions based on a previous answer in the same thread.

To start a conversation thread:
```
thread = agent.thread()
```

To ask a question and get a dataframe as a response, use the df() method:


```
df = thread.ask("List all the shows produced in Germany").df()
print(df.head())
```

To ask a question and get a text response, use the text() method in either of the following ways:


```
thread.ask("List all the shows produced in Germany")
print(thread.text())
```

```
thread.ask("List all the shows produced in Germany").text()
```

To generate a visualization, use the plot() method on the thread:
```
plot = thread.plot("Create a bar chart of shows by country")
```
Databao uses Vega-Lite to generate visualizations, and you can specify any supported chart or plot type in your prompt.
You can also access the generated plot code as follows:
```
print(plot.code)
```
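For orientation, a Vega-Lite chart is a declarative JSON specification. The sketch below builds a minimal bar-chart spec by hand in Python. The data values are invented, and whether `plot.code` returns a spec shaped like this one is an assumption, not something this page states.

```python
import json

# Invented sample data: number of shows per country.
rows = [
    {"country": "Germany", "shows": 12},
    {"country": "France", "shows": 9},
    {"country": "Japan", "shows": 15},
]

# A minimal Vega-Lite v5 bar chart: one bar per country, bar height = number of shows.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},
    "mark": "bar",
    "encoding": {
        "x": {"field": "country", "type": "nominal"},
        "y": {"field": "shows", "type": "quantitative"},
    },
}

# Serialize the spec to JSON, the form in which Vega-Lite tools consume it.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Changing `"mark"` to another Vega-Lite mark type such as `"line"` or `"point"` switches the chart type without touching the data or encoding.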

Chain questions

Because all questions in a thread share the same context, you can chain them if your data processing flow requires several steps:


```
thread \
    .ask("List all the shows produced in Germany") \
    .ask("Sort them by the year") \
    .df()
```
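Chaining works when each call returns an object that accepts the next call. The toy class below illustrates that pattern with a hypothetical `ToyThread` that records questions in its own history list. It is not Databao's implementation and makes no LLM or database calls; it only shows why follow-up questions can build on earlier ones within one thread.

```python
class ToyThread:
    """Hypothetical stand-in for a conversation thread; not part of Databao."""

    def __init__(self) -> None:
        # Every thread keeps its own list of the questions asked so far.
        self.history: list[str] = []

    def ask(self, question: str) -> "ToyThread":
        # Record the question and return the same thread so calls can be chained.
        self.history.append(question)
        return self

    def text(self) -> str:
        # A real thread would return an answer; the toy reports which questions it saw.
        return " -> ".join(self.history)


thread_a = ToyThread()
thread_b = ToyThread()

# Parentheses allow the chain to span lines without backslashes.
summary = (
    thread_a
    .ask("List all the shows produced in Germany")
    .ask("Sort them by the year")
    .text()
)

print(summary)
print(thread_b.history)  # thread_b has a separate, still empty history
```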