Getting started with
OpenAI and Data
science
SUSAN IBACH | HOCKEYGEEKGIRL
SUSAN.IBACH@LIVE.COM
You can't go
anywhere these
days without
hearing about
Generative AI
AI won't replace you, but someone with your skills + AI might
Coders are more productive when they
use AI to help them code
- Over 80% of coders say they are more productive when they use a code helper such as GitHub Copilot
- 74% say it enables them to focus on more satisfying work
- 96% say they are faster at completing repetitive tasks
- In a controlled study comparing two groups, the group using a built-in AI coding assistant completed their tasks 50% faster
Okay, I get it, Susan: this AI thing
looks useful. How do I get
started using it for data
science?
You could just
open up ChatGPT,
ask it to write code
for you, then copy
& paste
But the real win is doing it inside your IDE!
Step 1
Find a Large
Language Model
(LLM) you can install
inside your IDE
This takes a bit
of research
OpenAI – backed by Microsoft (a major investor, not the owner)
Codeium – VS Code, Vim, Jupyter Notebook, Eclipse
GitHub Copilot – comes as an extension for VS Code, Visual
Studio, JetBrains
Obsidian Integration, heroml, Superpower extension,
llmops.space, cursor.so, ChatGPT, CometLLM, Cohere
I use Jupyter
notebooks so
I'm going with
Jupyter AI
Jupyter AI is
vendor neutral
and can
connect to
different LLMs
- AI21
- Anthropic
- AWS
- Cohere
- HuggingFace Hub
- OpenAI
I chose OpenAI
because I had
played with it a
bit already
Step 2
Install the
extension or
library in your IDE
If you want to use Jupyter AI with OpenAI
in a Jupyter Notebook
Software versions required
- JupyterLab 4
- Python 3.8 – 3.11 (I installed Python 3.11.6, 64-bit)
Accounts required (you can start with the free version)
- OpenAI
If you want to use Jupyter AI with OpenAI
in a Jupyter Notebook
Install the openai library
- pip install openai
Create an environment variable and set it to the API key for your OpenAI account
- OPENAI_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxx
- Each supported LLM provider has its own environment variable name
Install the Jupyter AI extension, then load it in a notebook cell
- pip install jupyter-ai
- %load_ext jupyter_ai (recent Jupyter AI releases document this as %load_ext jupyter_ai_magics)
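Before loading the extension, it can help to confirm that the notebook kernel can actually see the key. The sketch below is one way to check, not part of Jupyter AI itself: the helper name `missing_api_keys` is hypothetical, and only `OPENAI_API_KEY` comes from the slide.

```python
import os


def missing_api_keys(names, env=None):
    """Return the variable names from `names` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


# OPENAI_API_KEY is the variable named on the slide; other providers use their own names.
missing = missing_api_keys(["OPENAI_API_KEY"])
if missing:
    print("Set these before loading the extension:", ", ".join(missing))
else:
    print("API key found; ready to load the extension.")
```

If the key shows up as missing, set it in the shell that launches JupyterLab and restart the kernel, since a running kernel does not pick up later changes.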
Not all OpenAI models are created equal
Version | GPT-3.5 Turbo | GPT-4
Speed | Faster | Slower
Model size | - | Reportedly 10X the size of GPT-3.5; can handle images
Quality of output | - | 40% more likely to produce factual responses than GPT-3.5 (per OpenAI); better at dialects
$ input / 1,000 tokens | $0.0005 | $0.03
$ output / 1,000 tokens | $0.0015 | $0.06
You can find more information on pricing at openai.com
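To get a feel for what those per-token prices mean, here is a minimal sketch of the arithmetic. The function name `estimate_cost` and the token counts are made up for illustration; the two prices are the GPT-4 figures from the table above and will change over time.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Estimate the dollar cost of one call from token counts and per-1,000-token prices."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


# GPT-4 prices from the table above; the token counts are invented for this example.
cost = estimate_cost(
    input_tokens=500,
    output_tokens=1500,
    input_price_per_1k=0.03,
    output_price_per_1k=0.06,
)
print(f"Estimated cost: ${cost:.4f}")
```

With these numbers the call works out to roughly ten cents, most of it from the output tokens.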
So what is a token anyway?
You can think of tokens as pieces of words
Wayne Gretzky’s quote "You miss 100% of the shots you don't take" contains 11 tokens
1 token is about 4 characters in English
1 English word is typically 1.3 tokens
1 French word is typically 2 tokens
Punctuation marks each count as one token
Special characters are one to three tokens
Emojis are typically two to three tokens
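The rules of thumb above can be turned into a quick estimator. This is only a rough sketch with hypothetical helper names; exact counts come from the model's own tokenizer, not from character or word counts.

```python
import math


def estimate_tokens_by_chars(text, chars_per_token=4):
    """Rough estimate: about 4 characters per token in English."""
    return math.ceil(len(text) / chars_per_token)


def estimate_tokens_by_words(text, tokens_per_word=1.3):
    """Rough estimate: about 1.3 tokens per English word."""
    return math.ceil(len(text.split()) * tokens_per_word)


quote = "You miss 100% of the shots you don't take"
print("by characters:", estimate_tokens_by_chars(quote))
print("by words:", estimate_tokens_by_words(quote))
```

For the Gretzky quote, the character-based estimate gives 11 and the word-based estimate gives 12, so both land close to the 11 tokens quoted above.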
Step 3
Try a hello world
type command
Ask the AI to create "Hello World"
%%ai chatgpt --format code
display a message that says hello world
Possible successful outputs include
print("Hello World")
System.out.println("Hello World");
console.log("Hello World");
echo "Hello World";
Step 4
Evaluate the
suggested code
AI does not replace programmers.
Programmers with AI replace programmers
- There is more than one way to write code to complete a task
- LLMs make an educated guess based on code they have seen in the past
- The coder provides the knowledge to evaluate the suggestion from the AI and to modify the prompt as needed (referred to as prompt engineering)
Curious about
pricing?
How much did that cost?
How many tokens and calls was it?
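Real token and call counts come from your OpenAI account. The sketch below only shows how such a tally adds up, using made-up per-call records and a hypothetical helper named `summarize_usage`.

```python
def summarize_usage(calls):
    """Total the number of calls and the tokens used across per-call records."""
    return {
        "calls": len(calls),
        "input_tokens": sum(call["input_tokens"] for call in calls),
        "output_tokens": sum(call["output_tokens"] for call in calls),
    }


# Invented records for illustration; each dict stands for one request to the model.
calls = [
    {"input_tokens": 42, "output_tokens": 18},
    {"input_tokens": 120, "output_tokens": 260},
    {"input_tokens": 75, "output_tokens": 140},
]
summary = summarize_usage(calls)
print(summary)
```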
Step 5
Now we can play!
Maybe I need a
dataframe with
some sample
data
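As an idea of what to expect back from a prompt like this, here is a small sketch using pandas. The column names and values are invented for illustration and are not the data from the live demo.

```python
import pandas as pd

# Made-up sample data, purely for illustration.
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Anna", "Carlos"],
        "age": [34, 28, 45, 39],
        "city": ["Montreal", "Ottawa", "Toronto", "Calgary"],
    }
)
print(df)
```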
Maybe I forgot the
syntax for returning
entries that start
with a particular
letter
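One common way to do this in pandas is the `str.startswith` accessor on a text column. The sketch below assumes a hypothetical `name` column with invented values; the demo's actual data and generated code may have differed.

```python
import pandas as pd

# Invented names for illustration.
df = pd.DataFrame({"name": ["Alice", "Bob", "Anna", "Carlos"]})

# Keep only the rows whose name starts with the letter "A".
starts_with_a = df[df["name"].str.startswith("A")]
print(starts_with_a)
```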
Let's read a .csv file
and then do some
linear regression
ValueError: Input y
contains NaN
AI does not replace
programmers.
Programmers with AI
replace
programmers
What would a
coder do? We'd
get rid of the rows
with Nulls and try
again!
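Here is a minimal sketch of that fix. It rests on a few assumptions: the data is an invented in-memory stand-in for the .csv file, the column names `hours` and `score` are hypothetical, and the fit uses `numpy.polyfit` to keep the example small, whereas the error message in the demo looks like the one scikit-learn raises.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for a .csv file on disk; the third row is missing its score.
csv_text = """hours,score
1,52
2,54
3,
4,58
5,60
"""
df = pd.read_csv(io.StringIO(csv_text))

# Drop rows with missing values so the fit does not fail on NaN.
clean = df.dropna(subset=["hours", "score"])

# Fit score = slope * hours + intercept with ordinary least squares.
slope, intercept = np.polyfit(clean["hours"].to_numpy(), clean["score"].to_numpy(), deg=1)
print(f"rows kept: {len(clean)} of {len(df)}")
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

Dropping rows is the quickest fix, but it throws data away. Whether that is acceptable is exactly the kind of judgment call the AI leaves to you.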
Victory!
I have successfully
produced a plot, but if
you don't know how
to read it, this isn't
going to help you :)
AI does not replace
data scientists. Data
scientists with AI
replace data
scientists
Until today, I have never done a
live code demo
- with this much code
- in a session this short
- without having to look up
method names and parameters
- without spending time in the
session having the audience
help me find my typing mistakes
AI doesn't replace
presenters.
Presenters with AI
replace presenters
References
ChatGPT
OpenAI
Project Jupyter | Installing Jupyter
Generative AI in Jupyter. Jupyter AI, a new open source project… | by Jason Weill | Jupyter Blog
GitHub - jupyterlab/jupyter-ai: A generative AI extension for JupyterLab
What are tokens and how to count them
OpenAI Pricing
Questions?
SUSAN IBACH | HOCKEYGEEKGIRL
SUSAN.IBACH@LIVE.COM
Thank you!

Confoo 2024: Getting started with OpenAI and data science
