GPT-3: A Data Scientist in the Making

Author(s): Shubham Saboo

Autopilot exploratory data analysis in pandas by leveraging the capabilities of the world’s most sophisticated language model GPT-3…

“Errors using inadequate data are much less than those using no data at all” — Charles Babbage

Pre-Requisites

I have collected the dots in the form of articles. Please go through the articles below, in order, to connect the dots and understand the key tech stack behind the GPT-3 powered pandas assistant:

FastAPI — The Spiffy Way Beyond Flask!
Streamlit — Revolutionizing Data App Creation
A Brief Introduction to GPT-3

Introduction to Pandas

Pandas is a fast, powerful, and easy-to-use open-source data analysis and manipulation tool built on top of the Python programming language. It is widely adopted in the Python community and is used by many other packages, frameworks, and modules. Pandas is extremely flexible and covers a wide range of use cases for preparing data for machine learning and deep learning models.

“Torture the data to the right extent and it will confess to anything” — Ronald Coase

Installing pandas

Pandas is available on PyPI and can be installed with either pip (pip install pandas) or conda (conda install pandas), depending on your Python environment. Because pandas is so widely used, it has its own conventional alias, pd, so once it is installed it is usually imported like this:

import pandas as pd

What kind of data can pandas handle?

If you work with tabular data, such as data in spreadsheets or databases, pandas is the right tool for you. With Pandas, you can explore, clean, and process your data. In pandas, a data table is called a DataFrame.
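
To make "explore, clean, and process" concrete, here is a minimal sketch on a tiny DataFrame. The column names and values are made up purely for illustration and are not taken from the pandas assistant application.

```python
import pandas as pd

# Hypothetical sample data; names and values are invented for illustration.
df = pd.DataFrame(
    {
        "name": ["Ada", "Grace", "Linus", "Guido"],
        "team": ["core", "core", "infra", "infra"],
        "score": [91.0, None, 78.0, 85.0],
    }
)

# Explore: table dimensions and the number of missing values per column.
shape = df.shape
missing = df.isna().sum()

# Clean: drop rows where the score is missing.
clean = df.dropna(subset=["score"])

# Process: average score per team.
mean_by_team = clean.groupby("team")["score"].mean()

print(shape)
print(missing)
print(mean_by_team)
```

Each step returns a new object rather than modifying df in place, which keeps the original table available for comparison.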

Fig: Illustration of Pandas Dataframe

How to read and write tabular data with pandas?

Pandas supports many file formats and data sources out of the box (CSV, Excel, SQL, JSON, Parquet, and more). Importing data from these sources is straightforward: use the functions prefixed with read_*. Similarly, the to_* methods export data to the respective formats.

Fig: Illustration of import and export sources in pandas
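
The read_* and to_* pairing can be sketched as a small round trip. This example assumes an in-memory string in place of a real file so that it runs without any files on disk; the column names and values are made up.

```python
import io

import pandas as pd

# In-memory text stands in for a CSV file on disk (illustrative data only).
csv_text = "item,qty\napple,3\npear,5\n"

# read_* functions live on the pandas module and return a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# to_* methods live on the DataFrame. With no path given,
# to_csv and to_json return the serialized data as a string.
csv_out = df.to_csv(index=False)
json_out = df.to_json(orient="records")

print(csv_out)
print(json_out)
```

Passing a file path instead of a StringIO object to read_csv, or as the first argument to to_csv, reads from or writes to disk in the same way.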

Application walkthrough

Now I will walk you through the GPT-3 powered pandas assistant application step by step:

When creating any GPT-3 application, the first thing to consider is the design and content of the training prompt. Prompt design is the most significant step in priming the GPT-3 model to give a favorable and contextual response.

As a rule of thumb, aim first for a zero-shot response from the model; if that isn’t possible, move on to a few examples rather than providing an entire corpus. The standard flow for training prompt design is: Zero-Shot → Few-Shot → Corpus-Based Priming.

For the pandas assistant application, I structured the training prompt as follows:

Description: An initial description of what the pandas assistant is supposed to do, plus a line or two about its functionality.
Natural Language (English): This component includes a minimal one-liner description of the task that will be performed by the pandas assistant. It helps GPT-3 to understand the context in order to generate proper pandas code in python.
Pandas Code: This component includes the pandas code corresponding to the English description provided as an input to the GPT-3 model.

Input → Natural Language ; Output → Pandas Code
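
The structure above can be sketched as a small prompt builder. This is only an assumption about how such a prompt might be assembled: the description text, the example pairs, the "English:" and "Pandas Code:" labels, and the build_prompt helper are all hypothetical and are not the author's actual prompt. Calling the function with no examples gives a zero-shot prompt; passing a few pairs gives a few-shot prompt.

```python
# Hypothetical description and example pairs; not the author's actual prompt.
DESCRIPTION = (
    "Pandas assistant: translate plain English instructions "
    "into pandas code in Python."
)

EXAMPLES = [
    ("Import the pandas library", "import pandas as pd"),
    ("Read the file data.csv into a dataframe df", "df = pd.read_csv('data.csv')"),
]


def build_prompt(task, examples=()):
    """Assemble description, optional example pairs, then the open task."""
    parts = [DESCRIPTION, ""]
    for english, code in examples:
        parts += [f"English: {english}", f"Pandas Code: {code}", ""]
    # End on an open "Pandas Code:" label so the model completes it.
    parts += [f"English: {task}", "Pandas Code:"]
    return "\n".join(parts)


zero_shot = build_prompt("Show the first five rows of df")
few_shot = build_prompt("Show the first five rows of df", EXAMPLES)
print(few_shot)
```

The resulting string would then be sent to the model as the prompt; that call is left out here because it depends on the API client and version in use.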

Fig: Streamlit-powered UI (all in Python)
Fig: The magic of FastAPI → on-the-fly API documentation

Let’s see an example in action to understand the power of GPT-3 in generating pandas code from plain English. In the example below, we generate pandas code by giving minimal instructions to the AI pandas assistant.

https://medium.com/media/0f95fe36e9e8bb31d81f222dc2d70475/href

GPT-3: A Data Scientist in the Making was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI