SlideShare a Scribd company logo

Using Oracle Big Data Discovey as a Data Scientist's Toolkit

As delivered at Trivadis Tech Event 2016 - how Big Data Discovery along with Python and pySpark was used to build predictive analytics models against wearables and smart home data

Using Oracle Big Data Discovey as a Data Scientist's Toolkit

1 of 40
Download to read offline
T : @markrittman
USING ORACLE BIG DATA DISCOVERY AS THE
DATA SCIENTIST'S TOOLKIT
Mark Rittman, Oracle ACE Director
TRIVADIS TECHEVENT 2016, ZÜRICH
•Oracle ACE Director, blogger + ODTUG member
•Regular columnist for Oracle Magazine
•Past ODTUG Executive Board Member
•Author of two books on Oracle BI
•Co-founder & CTO of Rittman Mead
•15+ Years in Oracle BI, DW, ETL + now Big Data
•Implementor, trainer, consultant + company founder
•Based in Brighton, UK
About The Presenter
2
•A visual front-end to the Hadoop data reservoir, providing end-user access to datasets
•Data sampled and loaded from Hadoop (Hive) into NoSQL Dgraph engine for fast analysis
•Catalog, profile, analyse and combine schema-on-read datasets across the Hadoop cluster
•Visualize and search datasets to gain insights, potentially load in summary form into DW
Oracle Big Data Discovery - What Is It?
3
FOR DATA SCIENTISTS?
4
NOT THAT USEFUL.
5
(A Data Scientist)
6

Recommended

Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...Mark Rittman
 
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...Mark Rittman
 
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...Mark Rittman
 
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle CloudOTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle CloudMark Rittman
 
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...Mark Rittman
 
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?Mark Rittman
 
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsMark Rittman
 
OTN EMEA TOUR 2016 - OBIEE12c New Features for End-Users, Developers and Sys...
OTN EMEA TOUR 2016  - OBIEE12c New Features for End-Users, Developers and Sys...OTN EMEA TOUR 2016  - OBIEE12c New Features for End-Users, Developers and Sys...
OTN EMEA TOUR 2016 - OBIEE12c New Features for End-Users, Developers and Sys...Mark Rittman
 

More Related Content

What's hot

Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business AnalyticsOracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business AnalyticsMark Rittman
 
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Mark Rittman
 
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...Mark Rittman
 
Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Mark Rittman
 
How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017Rittman Analytics
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...Rittman Analytics
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing ArchitectureGang Tao
 
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case StudyOracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case StudyMark Rittman
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big dataTrieu Nguyen
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design PatternsJohn Yeung
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...StampedeCon
 
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformBig Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformCaserta
 
Big Data on azure
Big Data on azureBig Data on azure
Big Data on azureDavid Giard
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWSGary Stafford
 
Data Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data EngineeringData Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data EngineeringAnant Corporation
 
SQL on Hadoop for the Oracle Professional
SQL on Hadoop for the Oracle ProfessionalSQL on Hadoop for the Oracle Professional
SQL on Hadoop for the Oracle ProfessionalMichael Rainey
 
Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data SolutionsGuido Schmutz
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data PipelineJesus Rodriguez
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and DeploymentCisco Canada
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 

What's hot (20)

Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business AnalyticsOracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
 
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
 
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
 
Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...
 
How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing Architecture
 
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case StudyOracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
 
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformBig Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
 
Big Data on azure
Big Data on azureBig Data on azure
Big Data on azure
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Data Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data EngineeringData Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data Engineering
 
SQL on Hadoop for the Oracle Professional
SQL on Hadoop for the Oracle ProfessionalSQL on Hadoop for the Oracle Professional
SQL on Hadoop for the Oracle Professional
 
Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data Solutions
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data Pipeline
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 

Viewers also liked

40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料)
40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料) 40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料)
40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料) hamaken
 
Cloud Native Hadoop #cwt2016
Cloud Native Hadoop #cwt2016Cloud Native Hadoop #cwt2016
Cloud Native Hadoop #cwt2016Cloudera Japan
 
ICTSC5 DMM.comラボの紹介+お給料の話
ICTSC5 DMM.comラボの紹介+お給料の話ICTSC5 DMM.comラボの紹介+お給料の話
ICTSC5 DMM.comラボの紹介+お給料の話Ken SASAKI
 
Kafkaを活用するためのストリーム処理の基本
Kafkaを活用するためのストリーム処理の基本Kafkaを活用するためのストリーム処理の基本
Kafkaを活用するためのストリーム処理の基本Sotaro Kimura
 
ICTSC6 ちょっとだけ数学の話
ICTSC6 ちょっとだけ数学の話ICTSC6 ちょっとだけ数学の話
ICTSC6 ちょっとだけ数学の話Ken SASAKI
 
ちょっと理解に自信がないな という皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)
ちょっと理解に自信がないなという皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)ちょっと理解に自信がないなという皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)
ちょっと理解に自信がないな という皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)hamaken
 
AWS Lambda and Amazon API Gateway
AWS Lambda and Amazon API GatewayAWS Lambda and Amazon API Gateway
AWS Lambda and Amazon API GatewayShinpei Ohtani
 
Apache NiFiと 他プロダクトのつなぎ方
Apache NiFiと他プロダクトのつなぎ方Apache NiFiと他プロダクトのつなぎ方
Apache NiFiと 他プロダクトのつなぎ方Sotaro Kimura
 
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016Cloudera Japan
 
White Paper Outline
White Paper OutlineWhite Paper Outline
White Paper Outlineritabean1
 
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~Sotaro Kimura
 
Bb Tour ANZ 17 - X-Ray Roll Up Reports
Bb Tour ANZ 17 - X-Ray Roll Up ReportsBb Tour ANZ 17 - X-Ray Roll Up Reports
Bb Tour ANZ 17 - X-Ray Roll Up ReportsBlackboard APAC
 
Bb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product Updates
Bb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product UpdatesBb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product Updates
Bb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product UpdatesBlackboard APAC
 

Viewers also liked (18)

40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料)
40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料) 40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料)
40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料)
 
Application of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructureApplication of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructure
 
Cloud Native Hadoop #cwt2016
Cloud Native Hadoop #cwt2016Cloud Native Hadoop #cwt2016
Cloud Native Hadoop #cwt2016
 
ICTSC5 DMM.comラボの紹介+お給料の話
ICTSC5 DMM.comラボの紹介+お給料の話ICTSC5 DMM.comラボの紹介+お給料の話
ICTSC5 DMM.comラボの紹介+お給料の話
 
Kafkaを活用するためのストリーム処理の基本
Kafkaを活用するためのストリーム処理の基本Kafkaを活用するためのストリーム処理の基本
Kafkaを活用するためのストリーム処理の基本
 
ICTSC6 ちょっとだけ数学の話
ICTSC6 ちょっとだけ数学の話ICTSC6 ちょっとだけ数学の話
ICTSC6 ちょっとだけ数学の話
 
ちょっと理解に自信がないな という皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)
ちょっと理解に自信がないなという皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)ちょっと理解に自信がないなという皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)
ちょっと理解に自信がないな という皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)
 
AWS Lambda and Amazon API Gateway
AWS Lambda and Amazon API GatewayAWS Lambda and Amazon API Gateway
AWS Lambda and Amazon API Gateway
 
Amazon Aurora
Amazon AuroraAmazon Aurora
Amazon Aurora
 
Application of postgre sql to large social infrastructure jp
Application of postgre sql to large social infrastructure jpApplication of postgre sql to large social infrastructure jp
Application of postgre sql to large social infrastructure jp
 
Apache NiFiと 他プロダクトのつなぎ方
Apache NiFiと他プロダクトのつなぎ方Apache NiFiと他プロダクトのつなぎ方
Apache NiFiと 他プロダクトのつなぎ方
 
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
 
Oracle Advanced Analytics 概要
Oracle Advanced Analytics 概要Oracle Advanced Analytics 概要
Oracle Advanced Analytics 概要
 
White Paper Outline
White Paper OutlineWhite Paper Outline
White Paper Outline
 
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
 
Bb Tour ANZ 17 - X-Ray Roll Up Reports
Bb Tour ANZ 17 - X-Ray Roll Up ReportsBb Tour ANZ 17 - X-Ray Roll Up Reports
Bb Tour ANZ 17 - X-Ray Roll Up Reports
 
Bb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product Updates
Bb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product UpdatesBb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product Updates
Bb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product Updates
 
Apache Hadoop 2.8.0 の新機能 (抜粋)
Apache Hadoop 2.8.0 の新機能 (抜粋)Apache Hadoop 2.8.0 の新機能 (抜粋)
Apache Hadoop 2.8.0 の新機能 (抜粋)
 

Similar to Using Oracle Big Data Discovey as a Data Scientist's Toolkit

Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Rittman Analytics
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightTillmann Eitelberg
 
Transform from database professional to a Big Data architect
Transform from database professional to a Big Data architectTransform from database professional to a Big Data architect
Transform from database professional to a Big Data architectSaurabh K. Gupta
 
Big Data Analytics .pptx
Big Data Analytics .pptxBig Data Analytics .pptx
Big Data Analytics .pptxpriti jadhao
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL David Smelker
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurgeRTTS
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...DataWorks Summit
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonDremio Corporation
 
Minimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data VirtualizationMinimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data VirtualizationDenodo
 
MongoDB for Spatio-Behavioral Data Analysis and Visualization
MongoDB for Spatio-Behavioral Data Analysis and VisualizationMongoDB for Spatio-Behavioral Data Analysis and Visualization
MongoDB for Spatio-Behavioral Data Analysis and VisualizationMongoDB
 
Big Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersBig Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersAdaryl "Bob" Wakefield, MBA
 
Large scale computing
Large scale computing Large scale computing
Large scale computing Bhupesh Bansal
 

Similar to Using Oracle Big Data Discovey as a Data Scientist's Toolkit (20)

Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
 
Big Data training
Big Data trainingBig Data training
Big Data training
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsight
 
Transform from database professional to a Big Data architect
Transform from database professional to a Big Data architectTransform from database professional to a Big Data architect
Transform from database professional to a Big Data architect
 
Big data applications
Big data applicationsBig data applications
Big data applications
 
Hadoop Eco system
Hadoop Eco systemHadoop Eco system
Hadoop Eco system
 
Big Data Analytics .pptx
Big Data Analytics .pptxBig Data Analytics .pptx
Big Data Analytics .pptx
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurge
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in London
 
Minimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data VirtualizationMinimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data Virtualization
 
MongoDB for Spatio-Behavioral Data Analysis and Visualization
MongoDB for Spatio-Behavioral Data Analysis and VisualizationMongoDB for Spatio-Behavioral Data Analysis and Visualization
MongoDB for Spatio-Behavioral Data Analysis and Visualization
 
Big Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersBig Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R Users
 
Oracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_databaseOracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_database
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
 
Large scale computing
Large scale computing Large scale computing
Large scale computing
 

More from Mark Rittman

IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...Mark Rittman
 
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive Analytics
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive AnalyticsBig Data for Oracle Devs - Towards Spark, Real-Time and Predictive Analytics
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive AnalyticsMark Rittman
 
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...Mark Rittman
 
Deploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle CloudDeploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle CloudMark Rittman
 
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...Mark Rittman
 
What is Big Data Discovery, and how it complements traditional business anal...
What is Big Data Discovery, and how it complements  traditional business anal...What is Big Data Discovery, and how it complements  traditional business anal...
What is Big Data Discovery, and how it complements traditional business anal...Mark Rittman
 
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015Mark Rittman
 
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...Mark Rittman
 
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...Mark Rittman
 
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015Mark Rittman
 
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODIBIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODIMark Rittman
 
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI ProjectsOGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI ProjectsMark Rittman
 
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12cUKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12cMark Rittman
 
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...Mark Rittman
 
Part 4 - Hadoop Data Output and Reporting using OBIEE11g
Part 4 - Hadoop Data Output and Reporting using OBIEE11gPart 4 - Hadoop Data Output and Reporting using OBIEE11g
Part 4 - Hadoop Data Output and Reporting using OBIEE11gMark Rittman
 
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cPart 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cMark Rittman
 

More from Mark Rittman (16)

IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
 
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive Analytics
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive AnalyticsBig Data for Oracle Devs - Towards Spark, Real-Time and Predictive Analytics
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive Analytics
 
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
 
Deploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle CloudDeploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle Cloud
 
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
 
What is Big Data Discovery, and how it complements traditional business anal...
What is Big Data Discovery, and how it complements  traditional business anal...What is Big Data Discovery, and how it complements  traditional business anal...
What is Big Data Discovery, and how it complements traditional business anal...
 
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
 
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
 
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
 
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
 
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODIBIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
 
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI ProjectsOGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
 
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12cUKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
 
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
 
Part 4 - Hadoop Data Output and Reporting using OBIEE11g
Part 4 - Hadoop Data Output and Reporting using OBIEE11gPart 4 - Hadoop Data Output and Reporting using OBIEE11g
Part 4 - Hadoop Data Output and Reporting using OBIEE11g
 
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cPart 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
 

Recently uploaded

Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)CUO VEERANAN VEERANAN
 
chatgpt-prompts (1).pdf
chatgpt-prompts (1).pdfchatgpt-prompts (1).pdf
chatgpt-prompts (1).pdfMuntherMurjan1
 
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...Cyber Security Experts
 
SABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as referenceSABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as referencepriyansabari355
 
Hashing and File Structures in Data Structure.pdf
Hashing and File Structures in Data Structure.pdfHashing and File Structures in Data Structure.pdf
Hashing and File Structures in Data Structure.pdfJaithoonBibi
 
Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023stephizcoolio
 
Morris H. DeGroot, Mark J. Schervish - Probability and Statistics (4th Editio...
Morris H. DeGroot, Mark J. Schervish - Probability and Statistics (4th Editio...Morris H. DeGroot, Mark J. Schervish - Probability and Statistics (4th Editio...
Morris H. DeGroot, Mark J. Schervish - Probability and Statistics (4th Editio...AkbarHidayatullah11
 
GDSC Machine Learning Session Presentation
GDSC Machine Learning Session PresentationGDSC Machine Learning Session Presentation
GDSC Machine Learning Session Presentationgdsclavasa
 
Tableau User Group - Khi > First Meetup! Movies + Data Hands-On Vizathon (11t...
Tableau User Group - Khi > First Meetup! Movies + Data Hands-On Vizathon (11t...Tableau User Group - Khi > First Meetup! Movies + Data Hands-On Vizathon (11t...
Tableau User Group - Khi > First Meetup! Movies + Data Hands-On Vizathon (11t...Mesum Raza Hemani
 
Oppotus - Malaysians on Malaysia 4Q 2023.pdf
Oppotus - Malaysians on Malaysia 4Q 2023.pdfOppotus - Malaysians on Malaysia 4Q 2023.pdf
Oppotus - Malaysians on Malaysia 4Q 2023.pdfOppotus
 
Big Data Foundations Level 1-IBM SkillsBuild
Big Data Foundations Level 1-IBM SkillsBuildBig Data Foundations Level 1-IBM SkillsBuild
Big Data Foundations Level 1-IBM SkillsBuildOshri Bitton
 
SABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a referenceSABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a referencepriyansabari355
 
PredictuVu ProposalV1.pptx
PredictuVu ProposalV1.pptxPredictuVu ProposalV1.pptx
PredictuVu ProposalV1.pptxKapilSinghal47
 
Artificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptxArtificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptxVighnesh Shashtri
 
[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...
[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...
[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...Daniele Malitesta
 

Recently uploaded (17)

Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)
 
chatgpt-prompts (1).pdf
chatgpt-prompts (1).pdfchatgpt-prompts (1).pdf
chatgpt-prompts (1).pdf
 
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
 
SABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as referenceSABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as reference
 
Hashing and File Structures in Data Structure.pdf
Hashing and File Structures in Data Structure.pdfHashing and File Structures in Data Structure.pdf
Hashing and File Structures in Data Structure.pdf
 
Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023
 
Morris H. DeGroot, Mark J. Schervish - Probability and Statistics (4th Editio...
Morris H. DeGroot, Mark J. Schervish - Probability and Statistics (4th Editio...Morris H. DeGroot, Mark J. Schervish - Probability and Statistics (4th Editio...
Morris H. DeGroot, Mark J. Schervish - Probability and Statistics (4th Editio...
 
GDSC Machine Learning Session Presentation
GDSC Machine Learning Session PresentationGDSC Machine Learning Session Presentation
GDSC Machine Learning Session Presentation
 
Tableau User Group - Khi > First Meetup! Movies + Data Hands-On Vizathon (11t...
Tableau User Group - Khi > First Meetup! Movies + Data Hands-On Vizathon (11t...Tableau User Group - Khi > First Meetup! Movies + Data Hands-On Vizathon (11t...
Tableau User Group - Khi > First Meetup! Movies + Data Hands-On Vizathon (11t...
 
Oppotus - Malaysians on Malaysia 4Q 2023.pdf
Oppotus - Malaysians on Malaysia 4Q 2023.pdfOppotus - Malaysians on Malaysia 4Q 2023.pdf
Oppotus - Malaysians on Malaysia 4Q 2023.pdf
 
DELHI URBANIZATION
DELHI URBANIZATIONDELHI URBANIZATION
DELHI URBANIZATION
 
Big Data Foundations Level 1-IBM SkillsBuild
Big Data Foundations Level 1-IBM SkillsBuildBig Data Foundations Level 1-IBM SkillsBuild
Big Data Foundations Level 1-IBM SkillsBuild
 
SABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a referenceSABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a reference
 
PredictuVu ProposalV1.pptx
PredictuVu ProposalV1.pptxPredictuVu ProposalV1.pptx
PredictuVu ProposalV1.pptx
 
Artificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptxArtificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptx
 
[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...
[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...
[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...
 
Optimizing GenAI apps, by N. El Mawass and Maria Knorps
Optimizing GenAI apps, by N. El Mawass and Maria KnorpsOptimizing GenAI apps, by N. El Mawass and Maria Knorps
Optimizing GenAI apps, by N. El Mawass and Maria Knorps
 

Using Oracle Big Data Discovey as a Data Scientist's Toolkit

  • 1. T : @markrittman USING ORACLE BIG DATA DISCOVERY AS THE DATA SCIENTIST'S TOOLKIT Mark Rittman, Oracle ACE Director TRIVADIS TECHEVENT 2016, ZÜRICH
  • 2. •Oracle ACE Director, blogger + ODTUG member •Regular columnist for Oracle Magazine •Past ODTUG Executive Board Member •Author of two books on Oracle BI •Co-founder & CTO of Rittman Mead •15+ Years in Oracle BI, DW, ETL + now Big Data •Implementor, trainer, consultant + company founder •Based in Brighton, UK About The Presenter 2
  • 3. •A visual front-end to the Hadoop data reservoir, providing end-user access to datasets •Data sampled and loaded from Hadoop (Hive) into NoSQL Dgraph engine for fast analysis •Catalog, profile, analyse and combine schema-on-read datasets across the Hadoop cluster •Visualize and search datasets to gain insights, potentially load in summary form into DW Oracle Big Data Discovery - What Is It? 3
  • 7. Tools And Techniques Used By Data Scientists 7 IMPORTING AND TIDYING DATA VISUALISING AND TRANSFORMING DATA MODELING AND INFERRING COMMUNICATING AND BUNDLING VISUALISING AND TRANSFORMING DATA COMMUNICATING AND BUNDLING
  • 8. Tools And Techniques Used By Data Scientists 8 IMPORTING AND TIDYING DATA MODELING AND INFERRING •Whilst Big Data Discovery 1.1 enabled data wrangling, it was single-row only •No ability to aggregate data or perform inter-row calculations •No special null handling or other regularly-used techniques •No ability to materialise joins (only in data visualizations) •No ability to access commonly-used R, Python and other stats libraries •No solution for machine learning or predictive analytics
  • 10. IMPORTING AND TIDYING DATA METADATA AND DEVELOPER PRODUCTIVITY COMMUNICATING AND BUNDLING •Metadata Curation •Attribute-level Search from Catalog •Activity Hub •Python Interface to 
 BDD Datasets •Streamlined UI •Faster Data Indexing •Activity Hub •Sunburst Visualization •Aggregation •Materialised Joins •Better Pan and Zoom •Speed and Scale New Features In Oracle Big Data Discovery 1.2 10
  • 11. •Interactive tool designed to work with BDD without using Studio's front-end •Exposes all BDD concepts 
 (views, datasets, data sources etc) •Supports Apache Spark •HiveContext and SQLContext exposed •BDD Shell SDK for easy access to BDD
 features, functionality •Access to third-party libraries such as
 Pandas, Spark ML, numPy •Use with web-based notebook such as
 iPython, Jupyter, Zeppelin Big Data Discovery Python Shell - What Is It? 11
  • 12. 12 SOMEONE WILL BE HAPPY… 12
  • 13. SO LET’S SEE IT IN ACTION.
  • 14. •Over the past months I’ve been on sabattical, taking time out to look at new Hadoop tech •Building prototypes, working with with startups & analysts outside of core Oracle world •Asking myself the question “What will an analytics platform look like in 5 years time?” •But also during this time, getting fit, getting into cycling and losing 14kg over 12 months •Using Wahoo Elemnt + Strava for workout recording •Withings Wifi scales for weight + body fat measurement •Jawbone UP3 for steps, sleep, resting heart rate •All the time, collecting data and storing it in Hadoop Personal Data Science Project - “Quantified Self” 14
  • 15. •Quantified Self is about self-knowledge through numbers •Decide on some goals, work out what metrics to track •Use wearables and other smart devices to record steps, heart rate, workouts, weight and other health metrics •Plot, correlate, track trends and combine datasets •For me, goal was to maintain new “healthy weight” •Understand drivers of weight gain or loss •See how sleep affected productivity •Understand what behaviours led to a “good day” Personal Data Science Project - “Quantified Self” 15
  • 16. MY OTHER SABBATICAL PROJECT… 16
  • 18. Smart Devices Logging Data To Hadoop Cluster 18 Philips Hue 
 Lighting Nest Protect (X2), 
 Thermostat, Cam Withings
 Smart Scales Airplay
 Speakers Homebridge
 Homekit / Smarthings 
 Connector Samsung
 Smart Things Hub (Z-Wave, Zigbee) Door, Motion, Moisture,
 Presence Sensors Apple Homekit,
 Apple TV, Siri IFTTT Maker Channel 
 JSON via HTTP POST LogStash (real-time) (real-time) (real-time) • Gmail • Withings Scales • Strava • Jawbone UP • Weather • Youtube • IOS Photos • Twitter • RescueTime • Pocket • Instagram • Google Calendar • Facebook (real-time) 6-Node CDH5.8 Hadoop Cluster in garage,
 + Oracle Big Data Discovery 1.2.0
 on VMWare ESXi 4-node cluster
  • 19. •Data extracted or transported to target platform using LogStash, CSV file batch loads •Landed into HDFS as JSON documents, then exposed as Hive tables using Storage Handler •Cataloged, visualised and analysed using Oracle Big Data Discovery + Python ML Hadoop Cluster Dataset - “Personal Data Lake" 19 Data Transfer Data Access “Personal” Data Lake Jupyter
 Web Notebook 6 Node Hadoop Cluster (CDH5.5) Discovery & Development Labs
 Oracle Big Data Discovery 1.2 Data sets and samples Models and programs Oracle DV
 Desktop Models BDD Shell,
 Python, 
 Spark ML Data Factory LogStash
 via HTTP Manual
 CSV U/L Data streams CSV, IFTTT
 or API call Raw JSON log files in HDFS Each document an event, daily record or comms message Hive Tables
 w/ Elastic
 Storage Handler Index data turned into tabular format Health Data Unstructured Comms Data Smart Home
 Sensor Data
  • 20. •Uses IFTTT cloud workflow service to subscribe to events on wearables’ APIs •Triggers HTTP GET request via IFTTT Maker Channel to Logstash running at home •Event data sent as JSON documents, loaded
 into HDFS via webhdfs protocol •Structured in Hadoop using Hive JSONSerDe •Then loaded hourly into DGraph using
 Big Data Discovery dataprocessing CLI •Event data automatically enriched, and can
 be joined to smart home data for analysis Landing Wearables Data In Real-Time 20 New workout
 logged using
 Strava 1 Workout details uploaded
 to Strava using cloud API 2 3 IFTTT recipe gets workout event from Strava API, triggers an HTTP GET web request 4 JSON document received by
 Logstash, then forwarded to 
 Hadoop using webhdfs PUT 5 JSON documents landed in HDFS in raw form, then structured using Hive JSONSerDe 6 Hive data uploaded into Oracle Big Data Discovery, visualised and wrangled, and modelled using pySpark In the Cloud Home
  • 21. •All smart device events and sensor readings are routed through Samsung Smart Things hub •Including Apple HomeKit devices, through custom integration •Event data uploads to Smart Things cloud service + storage •Custom Groovy SmartApp subscribes to
 device events, transmits JSON documents
 to Logstash using HTTP GET requests •Then process flow the same as with
 wearables and social media / comms data Landing Smart Home Data In Real-Time 21 Sensor or other smart device
 raises a Smart Things event 1 Event logged in Samsung Smarthings Cloud Service from Smart Things Hub 2 4 JSON document received by
 Logstash, then forwarded to 
 Hadoop using webhdfs PUT 5 JSON documents landed in HDFS in raw form, then structured using Hive JSONSerDe 6 Hive data uploaded into Oracle Big Data Discovery, visualised and wrangled, and modelled using pySpark In the Cloud Home SmartApp subscribes to device events, forwards them as JSON document using HTTP GET requests 3
  • 22. •As well as visualising the combined dataset, we could also use “machine learning” •Find correlations, predict outcomes based on regression analysis, classify and cluster data •Run algorithms on the full dataset to answer questions like: •“What are the biggest determinants of weight gain or loss for me?” •“On a good day, what are the typical combination of behaviours I exhibit”? •“If I raised my cadence RPM average, how much further could I cycle per day?” •“Is working late or missing lunch self-defeating in terms of overall weekly output?” And Use Machine Learning For Insights… 22 MODELING AND INFERRING
  • 23. •Analysis started with data from Jawbone UP2 ecosystem (manual export, and via IFTTT events) •Base activity data (steps, active time, active calories expended) •Sleep data (time asleep, time in-bed, light and deep sleep, resting heart-rate) •Mood if recorded; food ingested if recorded •Workout data as provided by Strava integration •Weight data as provided by Withings integration Initial Base Dataset - Jawbone Up Extract 23 1 2 3
  • 24. •Understand the “spread” of data using histograms •Use box-plot charts to identify outliers and range of “usual” values •Sort attributes by strongest correlation to a target attribute Perform Exploratory Analysis On Data 24
  • 25. •Initial row-wise preparation and transformation of data using Groovy transformations Transform (“Wrangle”) Data As Needed 25
  • 26. •Very typical with self-recorded healthcare and workout data •Most machine-learning algorithms expect every attribute to have a value per row •Self-recorded data is typically sporadically recorded, lots of gaps in data •Need to decide what to do with columns of poorly populate values Dealing With Missing Data (“Nulls”) 26 1 2 3
  • 27. •Previous versions of BDD allowed you to create joins for views •Used in visualisations, equivalent to a SQL view i.e. SELECT only •BDD 1.2.x allows you to add new joined attributes to data view, i.e. materialise •In this instance, use to bring in data on emails, and on geolocation Joining Datasets To Materialize Related Data 27
  • 28. •Only sensible option when looking at change in weight compared to prior period •Change compared to previous day too granular Aggregate Data To Week Level 28 1 2 3
  • 29. NOW FOR THE CLEVER BIT MODELING AND INFERRING 29
  • 30. Use BDD Shell API to Identify Main Dataset ID 30
  • 31. Use Python PANDAS to Calculate % CHG W/w 31
  • 33. Use Linear Regression on BDD Dataset via Python 33 •To answer the question - which metric is the most influential when it comes to weight change?
  • 34. And the Answer … Amount of Sleep Each Night 34 •Most influential variable/attribute in my weight / loss gain is “# of emails sent” •Inverse correlation - more emails I sent, the more weight I lose - but why? •In my case - unusual set of circumstances that led to late nights, burst of intense work •So busy I skipped meals, didn’t snack, stress and overwork perhaps •And then compensated once work over by getting out on bike and exercising •Correlation and most influential variable 
 will probably change in time •This is where the data, measuring it, 
 and analysing it comes in •Useful basis for experimenting •And bring in the Smart Home data too
  • 35. •Load device + event data into Cloudera Kudu rather than HDFS + Hive •Current limitation is around Big Data Discovery - does not work with Kudu or Impala •But useful for real-time metrics (BDD requires batch ingest, and samples the data) •Use Kafka for more reliable event routing •Push email, social media, saved documents etc into Cloudera Search •Do more on the machine learning / data integration + correlation side For The Future..? 35
  • 36. 36
  • 37. 37
  • 38. 38
  • 39. THANK YOU E X A M P L E S O F T E X T - S I M P L E A N D E A S Y T O U S E T H E T H E M E O F T H E D E M O T E M P L AT E B L A C K A N D W H I T E W O R L D THANK YOU 39
  • 40. T : @markrittman USING ORACLE BIG DATA DISCOVERY AS THE DATA SCIENTIST'S TOOLKIT Mark Rittman, Oracle ACE Director TRIVADIS TECHEVENT 2016, ZÜRICH