Ibis: Scaling the Python Data Experience
Wes McKinney, Marcel Kornacker, Justin Erickson, Silvius Rus
© Cloudera, Inc. All rights reserved.
  
Wes McKinney
•  A key person in building today’s open source Python data community
•  Creator of pandas, a standard Python data wrangling and analytics toolkit used by data scientists
•  Author of best-selling canonical text Python for Data Analysis (2012)
•  Formerly Founder/CEO of DataPad (acquired by Cloudera in 2014)
  
Python is popular…
•  Python has become a standard language of data science
•  Why is it popular?
    • Maximizes productivity for data engineers and data scientists
    • Build robust software and do interactive data analysis with 100% Python code
    • Easy-to-learn and makes happy and productive data teams
    • Large, diverse open source development community
    • Comprehensive libraries: data wrangling, ML, visualization, etc.
•  Main use case: data science & engineering swiss army knife on small-to-medium size data
  
…but Python does not scale today
•  Python ecosystem confined to single-node analysis
    • Great for smaller data sets
    • Requires sampling or aggregations for larger data
    • Distributed tools compromise in various ways
•  Extracting samples or aggregations for larger data means:
    • “Scales” by losing more fidelity
    • Additional ETL overhead to extract samples/aggregations
    • Loss of productivity with multiple languages, tools, etc
    • Blocks certain analysis and use cases
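The loss of fidelity from sampling can be illustrated with a small, deliberately contrived sketch. Nothing below comes from the slides: the data set, the helper names (`build_rows`, `systematic_sample`, `summarize`), and the sampling interval are all assumptions chosen so that the effect is deterministic. It uses only the Python standard library.

```python
def build_rows(n_rows=100_000, n_outliers=5):
    """Build a hypothetical data set: all values are 1.0 except a few outliers."""
    rows = [1.0] * n_rows
    step = n_rows // n_outliers
    for i in range(n_outliers):
        # Offset 7 keeps every outlier off the indices the sampler visits.
        rows[i * step + 7] = 1000.0
    return rows


def systematic_sample(rows, every=100):
    """Keep one row out of every `every` rows (a 1% extract by default)."""
    return rows[::every]


def summarize(rows):
    """Compute a few summary statistics over a list of floats."""
    return {
        "count": len(rows),
        "max": max(rows),
        "mean": sum(rows) / len(rows),
    }


full = build_rows()
sample = systematic_sample(full)

full_stats = summarize(full)      # sees all five outliers
sample_stats = summarize(sample)  # sees none of them

print("full:  ", full_stats)
print("sample:", sample_stats)
```

In this contrived case the 1% extract misses every outlier, so its maximum and mean differ from the full-data values. Real samples may or may not miss rare rows, which is the uncertainty the slide refers to as lost fidelity.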
  
Ibis: Same Python, now at scale
•  Target user:
    • Data scientists and data engineers (“Python data users”)
•  Goals:
    • Mirrors single-node Python experience
    • Scales to any node and data size
    • No compromise in functionality or usability
    • Interactive experience at native hardware speeds
  
What’s announced?
•  First public release of Ibis
    • http://ibis-project.org
•  Beta release to Cloudera Labs
•  Inviting usage and community development
•  Apache-licensed open-source
  
Ibis’s Vision
•  Uncompromised Python experience
    • 100% Python end-to-end user workflows
    • Enable integration with the existing Python data ecosystem (pandas, scikit-learn, NumPy, etc)
•  Interactive at big data scale
    • Full-fidelity analysis without extractions
    • Scalability for big data
    • Native hardware speeds for a broad set of use cases
  
Advantages of our approach
•  Analyze big data 100% in Python, with the same ease as small/medium data on the local filesystem
•  Full-fidelity data access
•  Familiar Python experience and integration with existing Python data libraries
•  Provide a means for Python high performance computing tools to be leveraged at Hadoop-scale
  
Beta 0.3 release
•  High level Python API for describing analytics and ETL that can be executed by Impala
    • Familiar API for users of pandas
    • Comprehensive coverage of operations expressible as relational data flows
•  Integrated tools for managing data in HDFS
•  Simple workflows to query data files in several formats (Parquet, Avro, Text)
•  pandas data interchange
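The slides do not show any Ibis code, so the following is only a toy sketch of the general idea of describing a relational data flow in Python and handing it to a SQL engine such as Impala. It is not the Ibis API: the `TableExpr` class, its methods, and the `flights` table with its columns are all hypothetical names invented for illustration. The expression object records the operations and produces a SQL string on request; nothing is executed.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class TableExpr:
    """Hypothetical deferred table expression (not part of Ibis)."""

    name: str
    predicates: Tuple[str, ...] = ()
    group_keys: Tuple[str, ...] = ()
    aggregates: Tuple[Tuple[str, str, str], ...] = ()  # (func, column, alias)

    def filter(self, predicate: str) -> "TableExpr":
        # Return a new expression; the original is left unchanged.
        return replace(self, predicates=self.predicates + (predicate,))

    def group_by(self, *keys: str) -> "TableExpr":
        return replace(self, group_keys=self.group_keys + keys)

    def aggregate(self, func: str, column: str, alias: str) -> "TableExpr":
        return replace(self, aggregates=self.aggregates + ((func, column, alias),))

    def to_sql(self) -> str:
        # Translate the recorded operations into a single SELECT statement.
        select_items = list(self.group_keys) + [
            f"{func.upper()}({column}) AS {alias}"
            for func, column, alias in self.aggregates
        ]
        select_clause = ", ".join(select_items) if select_items else "*"
        sql = f"SELECT {select_clause} FROM {self.name}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(self.predicates)
        if self.group_keys:
            sql += " GROUP BY " + ", ".join(self.group_keys)
        return sql


# Describe the analysis in Python; the table and columns are made up.
expr = (
    TableExpr("flights")
    .filter("year = 2014")
    .group_by("carrier")
    .aggregate("avg", "dep_delay", "mean_delay")
)
sql = expr.to_sql()
print(sql)
```

Because each method returns a new immutable expression, intermediate steps can be reused and combined freely before any query is sent to the engine, which is one plausible reading of "describing analytics" in the slide.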
  
Ibis/Impala Joint Roadmap
•  More natural data modeling
    • Complex types support
•  Integration with full Python data ecosystem
    • Advanced analytics + machine learning
    • Enable use of performance computing tools
•  User extensibility with native performance
    • In-memory columnar format
    • Python-to-LLVM IR compilation
•  Workflow and usability tools
  
Benefits of Ibis
•  Maximize developer productivity
    • Mirrors single-node Python experience
    • Solve big data problems without leaving Python
    • Leverage Python skills, ecosystem, and tools
•  Python as first-class language for Hadoop
    • Full-fidelity analysis without extractions
    • Python analysis at any scale
    • Native hardware speeds for a broad set of use cases
  
Thank you
wes@cloudera.com
  

More Related Content

What's hot

Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
DataWorks Summit
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
DataWorks Summit
 
Introduction to data flow management using apache nifi
Introduction to data flow management using apache nifiIntroduction to data flow management using apache nifi
Introduction to data flow management using apache nifi
Anshuman Ghosh
 
Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016
Timothy Spann
 
Running Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an Open Source Hybrid Cloud Data ArchitectureRunning Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
DataWorks Summit
 
SDLC with Apache NiFi
SDLC with Apache NiFiSDLC with Apache NiFi
SDLC with Apache NiFi
DataWorks Summit
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data Insights
DataWorks Summit
 
Spark Infrastructure Made Easy
Spark Infrastructure Made EasySpark Infrastructure Made Easy
Spark Infrastructure Made Easy
BlueData, Inc.
 
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...
Timothy Spann
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
DataWorks Summit
 
Building a Smarter Home with Apache NiFi and Spark
Building a Smarter Home with Apache NiFi and SparkBuilding a Smarter Home with Apache NiFi and Spark
Building a Smarter Home with Apache NiFi and Spark
DataWorks Summit/Hadoop Summit
 
Apache Zeppelin and Spark for Enterprise Data Science
Apache Zeppelin and Spark for Enterprise Data ScienceApache Zeppelin and Spark for Enterprise Data Science
Apache Zeppelin and Spark for Enterprise Data Science
Bikas Saha
 
What’s new in Apache Spark 2.3 and Spark 2.4
What’s new in Apache Spark 2.3 and Spark 2.4What’s new in Apache Spark 2.3 and Spark 2.4
What’s new in Apache Spark 2.3 and Spark 2.4
DataWorks Summit
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
Hortonworks
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
DataWorks Summit/Hadoop Summit
 
Apache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop SummitApache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop Summit
Aldrin Piri
 
Apache NiFi: Ingesting Enterprise Data At Scale
Apache NiFi:   Ingesting Enterprise Data At Scale Apache NiFi:   Ingesting Enterprise Data At Scale
Apache NiFi: Ingesting Enterprise Data At Scale
Timothy Spann
 
Scaling real time streaming architectures with HDF and Dell EMC Isilon
Scaling real time streaming architectures with HDF and Dell EMC IsilonScaling real time streaming architectures with HDF and Dell EMC Isilon
Scaling real time streaming architectures with HDF and Dell EMC Isilon
Hortonworks
 
OpenStack + Nano Server + Hyper-V + S2D
OpenStack + Nano Server + Hyper-V + S2DOpenStack + Nano Server + Hyper-V + S2D
OpenStack + Nano Server + Hyper-V + S2D
Alessandro Pilotti
 
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for CybersecurityCloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
Cloudera, Inc.
 

What's hot (20)

Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
 
Introduction to data flow management using apache nifi
Introduction to data flow management using apache nifiIntroduction to data flow management using apache nifi
Introduction to data flow management using apache nifi
 
Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016
 
Running Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an Open Source Hybrid Cloud Data ArchitectureRunning Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
 
SDLC with Apache NiFi
SDLC with Apache NiFiSDLC with Apache NiFi
SDLC with Apache NiFi
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data Insights
 
Spark Infrastructure Made Easy
Spark Infrastructure Made EasySpark Infrastructure Made Easy
Spark Infrastructure Made Easy
 
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Building a Smarter Home with Apache NiFi and Spark
Building a Smarter Home with Apache NiFi and SparkBuilding a Smarter Home with Apache NiFi and Spark
Building a Smarter Home with Apache NiFi and Spark
 
Apache Zeppelin and Spark for Enterprise Data Science
Apache Zeppelin and Spark for Enterprise Data ScienceApache Zeppelin and Spark for Enterprise Data Science
Apache Zeppelin and Spark for Enterprise Data Science
 
What’s new in Apache Spark 2.3 and Spark 2.4
What’s new in Apache Spark 2.3 and Spark 2.4What’s new in Apache Spark 2.3 and Spark 2.4
What’s new in Apache Spark 2.3 and Spark 2.4
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
 
Apache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop SummitApache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop Summit
 
Apache NiFi: Ingesting Enterprise Data At Scale
Apache NiFi:   Ingesting Enterprise Data At Scale Apache NiFi:   Ingesting Enterprise Data At Scale
Apache NiFi: Ingesting Enterprise Data At Scale
 
Scaling real time streaming architectures with HDF and Dell EMC Isilon
Scaling real time streaming architectures with HDF and Dell EMC IsilonScaling real time streaming architectures with HDF and Dell EMC Isilon
Scaling real time streaming architectures with HDF and Dell EMC Isilon
 
OpenStack + Nano Server + Hyper-V + S2D
OpenStack + Nano Server + Hyper-V + S2DOpenStack + Nano Server + Hyper-V + S2D
OpenStack + Nano Server + Hyper-V + S2D
 
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for CybersecurityCloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
 

Viewers also liked

BCHS - Final Presentation
BCHS - Final PresentationBCHS - Final Presentation
BCHS - Final Presentation
Linda Zheng
 
Pharmacy baba
Pharmacy babaPharmacy baba
Pharmacy baba
sainaburg09
 
Pk 08.06 final
Pk 08.06 finalPk 08.06 final
Pk 08.06 final
luisadoniacovo
 
Inferring networks of substitute and complementary products
Inferring networks of substitute and complementary productsInferring networks of substitute and complementary products
Inferring networks of substitute and complementary products
Turi, Inc.
 
Bob’s training programs
Bob’s training programsBob’s training programs
Bob’s training programs
Bob Seshadri
 
ETP Introduction for Launch Events
ETP Introduction for Launch EventsETP Introduction for Launch Events
ETP Introduction for Launch Events
RL Learning
 
American Builders Quarterly 12-12-07
American Builders Quarterly 12-12-07American Builders Quarterly 12-12-07
American Builders Quarterly 12-12-07
Mark Roshanski
 
Ob1 unit 4 chapter - 15 - power and politics
Ob1   unit 4 chapter - 15 - power and politicsOb1   unit 4 chapter - 15 - power and politics
Ob1 unit 4 chapter - 15 - power and politics
Dr S Gokula Krishnan
 
Screenplay - 'Kay'
Screenplay - 'Kay'Screenplay - 'Kay'
Screenplay - 'Kay'
skywalker97
 
De 2
De 2 De 2
Ob1 unit 4 chapter - 12 - managing teams at work
Ob1   unit 4 chapter - 12 - managing teams at workOb1   unit 4 chapter - 12 - managing teams at work
Ob1 unit 4 chapter - 12 - managing teams at work
Dr S Gokula Krishnan
 
Managing Time as a Coach
Managing Time as a CoachManaging Time as a Coach
Managing Time as a Coach
RL Learning
 
Fuel cell stacking
Fuel cell stackingFuel cell stacking
Fuel cell stacking
Pana Mann
 
Ob1 unit 4 chapter - 16 - conflict management
Ob1   unit 4 chapter - 16 - conflict managementOb1   unit 4 chapter - 16 - conflict management
Ob1 unit 4 chapter - 16 - conflict management
Dr S Gokula Krishnan
 
Osvaldo Ajuda C.V.-English
Osvaldo Ajuda C.V.-EnglishOsvaldo Ajuda C.V.-English
Osvaldo Ajuda C.V.-English
Osvaldo Ajuda
 
Marketing_Collateral_Samples_2015_final
Marketing_Collateral_Samples_2015_finalMarketing_Collateral_Samples_2015_final
Marketing_Collateral_Samples_2015_final
Troy Wise
 
First
FirstFirst
First
Pana Mann
 

Viewers also liked (18)

BCHS - Final Presentation
BCHS - Final PresentationBCHS - Final Presentation
BCHS - Final Presentation
 
Pharmacy baba
Pharmacy babaPharmacy baba
Pharmacy baba
 
Pk 08.06 final
Pk 08.06 finalPk 08.06 final
Pk 08.06 final
 
Inferring networks of substitute and complementary products
Inferring networks of substitute and complementary productsInferring networks of substitute and complementary products
Inferring networks of substitute and complementary products
 
Bob’s training programs
Bob’s training programsBob’s training programs
Bob’s training programs
 
ETP Introduction for Launch Events
ETP Introduction for Launch EventsETP Introduction for Launch Events
ETP Introduction for Launch Events
 
American Builders Quarterly 12-12-07
American Builders Quarterly 12-12-07American Builders Quarterly 12-12-07
American Builders Quarterly 12-12-07
 
Ob1 unit 4 chapter - 15 - power and politics
Ob1   unit 4 chapter - 15 - power and politicsOb1   unit 4 chapter - 15 - power and politics
Ob1 unit 4 chapter - 15 - power and politics
 
Screenplay - 'Kay'
Screenplay - 'Kay'Screenplay - 'Kay'
Screenplay - 'Kay'
 
De 2
De 2 De 2
De 2
 
Ob1 unit 4 chapter - 12 - managing teams at work
Ob1   unit 4 chapter - 12 - managing teams at workOb1   unit 4 chapter - 12 - managing teams at work
Ob1 unit 4 chapter - 12 - managing teams at work
 
Rapport ramed 2013 v2
Rapport ramed 2013 v2Rapport ramed 2013 v2
Rapport ramed 2013 v2
 
Managing Time as a Coach
Managing Time as a CoachManaging Time as a Coach
Managing Time as a Coach
 
Fuel cell stacking
Fuel cell stackingFuel cell stacking
Fuel cell stacking
 
Ob1 unit 4 chapter - 16 - conflict management
Ob1   unit 4 chapter - 16 - conflict managementOb1   unit 4 chapter - 16 - conflict management
Ob1 unit 4 chapter - 16 - conflict management
 
Osvaldo Ajuda C.V.-English
Osvaldo Ajuda C.V.-EnglishOsvaldo Ajuda C.V.-English
Osvaldo Ajuda C.V.-English
 
Marketing_Collateral_Samples_2015_final
Marketing_Collateral_Samples_2015_finalMarketing_Collateral_Samples_2015_final
Marketing_Collateral_Samples_2015_final
 
First
FirstFirst
First
 

Similar to Pandas & Cloudera: Scaling the Python Data Experience

Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinneyIbis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Hakka Labs
 
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data CitizenEnabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
Wes McKinney
 
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015
Wes McKinney
 
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopData Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache Hadoop
Cloudera, Inc.
 
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015
Cloudera, Inc.
 
Python Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the FuturePython Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the Future
Wes McKinney
 
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Codemotion
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Cloudera, Inc.
 
Next-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache ArrowNext-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache Arrow
Wes McKinney
 
Mission to NARs with Apache NiFi
Mission to NARs with Apache NiFiMission to NARs with Apache NiFi
Mission to NARs with Apache NiFi
Hortonworks
 
Deep learning on HDP 2018 Prague
Deep learning on HDP 2018 PragueDeep learning on HDP 2018 Prague
Deep learning on HDP 2018 Prague
Timothy Spann
 
Data Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsData Science Languages and Industry Analytics
Data Science Languages and Industry Analytics
Wes McKinney
 
ADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
ADV Slides: Trends in Streaming Analytics and Message-oriented MiddlewareADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
ADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
DATAVERSITY
 
Apache Arrow and Python: The latest
Apache Arrow and Python: The latestApache Arrow and Python: The latest
Apache Arrow and Python: The latest
Wes McKinney
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
Jason Hubbard
 
PyData: The Next Generation
PyData: The Next GenerationPyData: The Next Generation
PyData: The Next Generation
Wes McKinney
 
Apache Deep Learning 101 - DWS Berlin 2018
Apache Deep Learning 101 - DWS Berlin 2018Apache Deep Learning 101 - DWS Berlin 2018
Apache Deep Learning 101 - DWS Berlin 2018
Timothy Spann
 
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems
 
Apache deep learning 101
Apache deep learning 101Apache deep learning 101
Apache deep learning 101
DataWorks Summit
 
IBM Developer Model Asset eXchange
IBM Developer Model Asset eXchangeIBM Developer Model Asset eXchange
IBM Developer Model Asset eXchange
Nick Pentreath
 

Similar to Pandas & Cloudera: Scaling the Python Data Experience (20)

Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinneyIbis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
 
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data CitizenEnabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
 
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015
 
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopData Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache Hadoop
 
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015
 
Python Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the FuturePython Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the Future
 
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
Next-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache ArrowNext-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache Arrow
 
Mission to NARs with Apache NiFi
Mission to NARs with Apache NiFiMission to NARs with Apache NiFi
Mission to NARs with Apache NiFi
 
Deep learning on HDP 2018 Prague
Deep learning on HDP 2018 PragueDeep learning on HDP 2018 Prague
Deep learning on HDP 2018 Prague
 
Data Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsData Science Languages and Industry Analytics
Data Science Languages and Industry Analytics
 
ADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
ADV Slides: Trends in Streaming Analytics and Message-oriented MiddlewareADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
ADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
 
Apache Arrow and Python: The latest
Apache Arrow and Python: The latestApache Arrow and Python: The latest
Apache Arrow and Python: The latest
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
 
PyData: The Next Generation
PyData: The Next GenerationPyData: The Next Generation
PyData: The Next Generation
 
Apache Deep Learning 101 - DWS Berlin 2018
Apache Deep Learning 101 - DWS Berlin 2018Apache Deep Learning 101 - DWS Berlin 2018
Apache Deep Learning 101 - DWS Berlin 2018
 
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
 
Apache deep learning 101
Apache deep learning 101Apache deep learning 101
Apache deep learning 101
 
IBM Developer Model Asset eXchange
IBM Developer Model Asset eXchangeIBM Developer Model Asset eXchange
IBM Developer Model Asset eXchange
 

More from Turi, Inc.

Webinar - Analyzing Video
Webinar - Analyzing VideoWebinar - Analyzing Video
Webinar - Analyzing Video
Turi, Inc.
 
Webinar - Patient Readmission Risk
Webinar - Patient Readmission RiskWebinar - Patient Readmission Risk
Webinar - Patient Readmission Risk
Turi, Inc.
 
Webinar - Know Your Customer - Arya (20160526)
Webinar - Know Your Customer - Arya (20160526)Webinar - Know Your Customer - Arya (20160526)
Webinar - Know Your Customer - Arya (20160526)
Turi, Inc.
 
Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)
Turi, Inc.
 
Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)
Turi, Inc.
 
Webinar - Fraud Detection - Palombo (20160428)
Webinar - Fraud Detection - Palombo (20160428)Webinar - Fraud Detection - Palombo (20160428)
Webinar - Fraud Detection - Palombo (20160428)
Turi, Inc.
 
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge DatasetsScaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Turi, Inc.
 
Pattern Mining: Extracting Value from Log Data
Pattern Mining: Extracting Value from Log DataPattern Mining: Extracting Value from Log Data




Pandas & Cloudera: Scaling the Python Data Experience

  • 1. © Cloudera, Inc. All rights reserved.
      Ibis: Scaling the Python Data Experience
      Wes McKinney, Marcel Kornacker, Justin Erickson, Silvius Rus
  • 2. Wes McKinney
      • A key person in building today’s open source Python data community
      • Creator of pandas, a standard Python data wrangling and analytics toolkit used by data scientists
      • Author of best-selling canonical text Python for Data Analysis (2012)
      • Formerly Founder/CEO of DataPad (acquired by Cloudera in 2014)
  • 3. Python is popular…
      • Python has become a standard language of data science
      • Why is it popular?
          • Maximizes productivity for data engineers and data scientists
          • Build robust software and do interactive data analysis with 100% Python code
          • Easy to learn, and makes for happy and productive data teams
          • Large, diverse open source development community
          • Comprehensive libraries: data wrangling, ML, visualization, etc.
      • Main use case: data science & engineering Swiss Army knife on small-to-medium size data
  • 4. …but Python does not scale today
      • Python ecosystem confined to single-node analysis
          • Great for smaller data sets
          • Requires sampling or aggregations for larger data
          • Distributed tools compromise in various ways
      • Extracting samples or aggregations for larger data means:
          • “Scales” by losing more fidelity
          • Additional ETL overhead to extract samples/aggregations
          • Loss of productivity with multiple languages, tools, etc.
          • Blocks certain analysis and use cases
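The fidelity point on this slide can be made concrete with a small sketch. It uses only the Python standard library and fully synthetic data: the population size, the number of rare events, and the 1% sampling rate are all assumptions chosen for illustration, not figures from the talk.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

POPULATION = 100_000  # synthetic event IDs 0..99_999 (assumed size)
RARE_COUNT = 10       # number of "rare" events hidden in the data (assumed)
SAMPLE_SIZE = 1_000   # a 1% sample, as one might extract for single-node analysis

# Pick which event IDs are the rare ones.
rare_ids = set(random.sample(range(POPULATION), RARE_COUNT))

# Draw the sample that would be shipped to a single machine.
sample_ids = random.sample(range(POPULATION), SAMPLE_SIZE)

# Count how many rare events survive the sampling step.
found = sum(1 for event_id in sample_ids if event_id in rare_ids)

print(f"{found} of {RARE_COUNT} rare events appear in the {SAMPLE_SIZE}-row sample")
```

On average a 1% sample retains about 1% of the rare events, so an analysis that depends on them (fraud, outliers, error traces) is largely blocked unless the full data set is queried.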
  • 5. Ibis: Same Python, now at scale
      • Target user:
          • Data scientists and data engineers (“Python data users”)
      • Goals:
          • Mirrors single-node Python experience
          • Scales to any node and data size
          • No compromise in functionality or usability
          • Interactive experience at native hardware speeds
  • 6. What’s announced?
      • First public release of Ibis
          • http://ibis-project.org
      • Beta release to Cloudera Labs
      • Inviting usage and community development
      • Apache-licensed open-source
  • 7. Ibis’s Vision
      • Uncompromised Python experience
          • 100% Python end-to-end user workflows
          • Enable integration with the existing Python data ecosystem (pandas, scikit-learn, NumPy, etc.)
      • Interactive at big data scale
          • Full-fidelity analysis without extractions
          • Scalability for big data
          • Native hardware speeds for a broad set of use cases
  • 9. Advantages of our approach
      • Analyze big data 100% in Python, with the same ease as small/medium data on the local filesystem
      • Full-fidelity data access
      • Familiar Python experience and integration with existing Python data libraries
      • Provide a means for Python high performance computing tools to be leveraged at Hadoop-scale
  • 10. Beta 0.3 release
      • High level Python API for describing analytics and ETL that can be executed by Impala
          • Familiar API for users of pandas
          • Comprehensive coverage of operations expressible as relational data flows
      • Integrated tools for managing data in HDFS
      • Simple workflows to query data files in several formats (Parquet, Avro, Text)
      • pandas data interchange
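The core idea on this slide, a Python API that describes a relational data flow and leaves execution to a SQL engine such as Impala, can be illustrated with a toy sketch. The `Table` class and its `filter`, `group_by_count`, and `to_sql` methods are hypothetical names invented for this illustration; they are not the Ibis API, and the table and column names are made up. The generated SQL is only printed here, never sent to an engine.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Table:
    """Hypothetical deferred table expression (NOT the Ibis API)."""

    name: str
    predicates: Tuple[str, ...] = ()
    group_key: Optional[str] = None

    def filter(self, predicate: str) -> "Table":
        # Record the predicate; no data is touched at this point.
        return Table(self.name, self.predicates + (predicate,), self.group_key)

    def group_by_count(self, key: str) -> "Table":
        # Record a "count rows per key" aggregation.
        return Table(self.name, self.predicates, key)

    def to_sql(self) -> str:
        # Compile the recorded steps into a single SQL string.
        if self.group_key is None:
            select = "SELECT *"
        else:
            select = f"SELECT {self.group_key}, COUNT(*) AS n"
        sql = f"{select} FROM {self.name}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(self.predicates)
        if self.group_key is not None:
            sql += f" GROUP BY {self.group_key}"
        return sql


# Build the expression step by step, pandas-style, then compile it once.
expr = Table("logs").filter("status = 500").group_by_count("host")
sql = expr.to_sql()
print(sql)
```

Because each method returns a new expression instead of running anything, the whole pipeline reaches the engine as one query, which is what lets the data stay where it is while the user stays in Python.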
  • 11. Ibis/Impala Joint Roadmap
      • More natural data modeling
          • Complex types support
      • Integration with full Python data ecosystem
          • Advanced analytics + machine learning
          • Enable use of performance computing tools
      • User extensibility with native performance
          • In-memory columnar format
          • Python-to-LLVM IR compilation
      • Workflow and usability tools
  • 12. Benefits of Ibis
      • Maximize developer productivity
          • Mirrors single-node Python experience
          • Solve big data problems without leaving Python
          • Leverage Python skills, ecosystem, and tools
      • Python as first-class language for Hadoop
          • Full-fidelity analysis without extractions
          • Python analysis at any scale
          • Native hardware speeds for a broad set of use cases
  • 13. Thank you
      wes@cloudera.com