How and why you need to build a Big Data Lab
Why GCP is a pretty cool place to do it
Chris Kernaghan
Principal Consultant
Data Lab vs Data Factory
Big data Lab – the world’s biggest
• WLCG – Worldwide LHC Computing Grid
• 170 computing facilities
• 200,000 cores
• 300GB/s data stream ingestion
• 300MB/s data stream filtered
• 27TB raw data per day
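The figures above can be sanity-checked with a little arithmetic. The sketch below assumes decimal units (1 GB = 1000 MB) and a constant filtered rate over a full day, neither of which the slide states; the function names are illustrative only. Under those assumptions the filtered stream is about a thousandth of the raw stream and lands in the same ballpark as the quoted daily volume.

```python
# Figures quoted on the slide; decimal units are an assumption (1 GB = 1000 MB).
INGEST_MB_PER_S = 300 * 1000   # 300 GB/s raw ingestion
FILTERED_MB_PER_S = 300        # 300 MB/s after filtering
SECONDS_PER_DAY = 24 * 60 * 60


def reduction_factor(ingest_mb_s: float, filtered_mb_s: float) -> float:
    """Return how many times smaller the filtered stream is than the raw one."""
    return ingest_mb_s / filtered_mb_s


def daily_volume_tb(rate_mb_s: float) -> float:
    """Return the data accumulated in one day at a constant rate, in TB."""
    return rate_mb_s * SECONDS_PER_DAY / 1_000_000


if __name__ == "__main__":
    factor = reduction_factor(INGEST_MB_PER_S, FILTERED_MB_PER_S)
    print(f"Filtering keeps roughly 1 part in {factor:.0f}")
    print(f"Filtered stream per day: {daily_volume_tb(FILTERED_MB_PER_S):.2f} TB")
```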
Big data Lab – Traditional Home brew
• Based on VMware, VirtualBox or Raspberry Pi
• Mix of hardware
• Limited resources – 6 cores, 128GB space
• Low performance – 1 GHz Processor
• Lots of baby sitting
• Equal measures of heartbreak and joy
Big data Lab – Using Cloud
• IaaS and PaaS services
• Mix of applications
• Effectively unlimited resources
• High performance
• Access to quality data sets
• Utility billing
• Sharable outcomes
Big data platforms in the Cloud - AWS
Big data platforms in the Cloud - GCP
Big data platforms in the Cloud - Azure
Big data platforms in the Cloud - SAP
Big data platforms in the Cloud - IBM
Common characteristics of Cloud based platforms
Streaming Engine
Data Storage
Hadoop
In Memory Engine
Machine Learning
Analytics
Why have a lab
• Data is a complex beast with several attributes
• Quality – different tasks require different data quality
• Machine Learning & Predictive
• Reporting
• Context – data context is vital for analytics
• Story of the data
• Volume – how much data is there
• Testing requirements for data latency
• Format – data format is not universal
• Different applications have different data types
• Analysis
• What and how to analyse
A lab is essential for testing these items before large-scale factory work is done
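As a rough illustration of the kind of check a lab makes cheap, the sketch below profiles a small CSV sample for three of the attributes listed above: volume (row count), quality (missing values) and format (inferred column types). The sample data, column names and function names are all hypothetical, and a real lab would more likely use a dataframe library; this version sticks to the standard library.

```python
import csv
import io

# Hypothetical sample standing in for a slice of a real data set.
SAMPLE = """order_id,weight_kg,depot
1,12.5,Leeds
2,,Belfast
3,7,Leeds
"""


def _infer_type(values):
    """Return 'int', 'float' or 'text' for a list of non-empty strings."""
    if not values:
        return "unknown"
    for name, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "text"


def profile_csv(text):
    """Summarise row count, missing values and inferred type per column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    profile = {"row_count": len(rows), "columns": {}}
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v not in ("", None)]
        profile["columns"][col] = {
            "missing": len(values) - len(present),
            "type": _infer_type(present),
        }
    return profile


if __name__ == "__main__":
    for column, stats in profile_csv(SAMPLE)["columns"].items():
        print(column, stats)
```

Running a profile like this on a sample before any factory-scale load is one way to find out early whether the data is fit for reporting, machine learning, or neither.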
Define your goals
• Achieving the best use of resources is critical
• Cloud based Big Data labs have a direct charge model
• Homebrew Big Data labs have limited resources
• Define what the outcome of the lab work is
• This is no different to a proper science experiment
• Design your lab and define your tools
• You have to use the right tool for the job, not just those you are familiar with
• Define your data set
• Work out what data you need
• Gain permission to use what you need if required
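Because a cloud lab is charged directly, it can help to estimate the bill before starting. The sketch below is a minimal cost model; the resource names and hourly rates are made-up placeholders, not real prices from any provider, and storage and network charges are ignored.

```python
from dataclasses import dataclass


@dataclass
class LabResource:
    name: str
    hourly_rate: float  # hypothetical price per running hour
    count: int = 1


def estimate_cost(resources, hours_per_day, days):
    """Return the total charge for running all resources on a fixed schedule."""
    hourly_total = sum(r.hourly_rate * r.count for r in resources)
    return hourly_total * hours_per_day * days


# Hypothetical lab: three worker VMs and one database VM.
LAB = [
    LabResource("worker VM", 0.20, count=3),
    LabResource("database VM", 0.50),
]

if __name__ == "__main__":
    part_time = estimate_cost(LAB, hours_per_day=4, days=10)
    always_on = estimate_cost(LAB, hours_per_day=24, days=10)
    print(f"4 hours a day for 10 days: {part_time:.2f}")
    print(f"Left running for 10 days:  {always_on:.2f}")
```

Comparing the two schedules makes the point about best use of resources concrete: in this model, switching the lab off when it is not in use cuts the charge in proportion to the hours saved.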
Mind the gap and acquire knowledge
Part of the fun of big data labs is working out what you don’t know
• A particular framework
• An algorithm
• A data set
• A visualisation
The next fun part is working out where to fill that knowledge gap
• Online sources –
• Kaggle
• MOOCs – Andrew Ng’s Stanford course
• Forums – Stack Overflow
It is also implicit that you share what you have learnt once you have learnt it
SAP and Big Data platforms
[Architecture diagram: SAP HANA Platform alongside a Hadoop Cluster]
• Components shown: SAP HANA Platform (Application, Database, Processing and Integration Services), In-Memory Store, Structured Storage, Dynamic Tiering, HANA Smart Data Access, Spark API enhancement, Hadoop Cluster (YARN, HDFS files, Vora and Spark repeated across nodes)
• Simplified processing of large volumes of archived data
• HANA SDA / Spark Adapter – HANA-Spark Adapter for real-time understanding of current data with historical context
• Unified administration using HANA cockpit simplifies system management
SAP HANA Express Edition
• Fast application development and deployment with essential features
• Free up to 32GB of memory – upgradeable for a fee
• Flexible access from a laptop, desktop, server, Cloud platform
• Pre-packaged with sample code and data
• Downloadable from the SAP Developer Network
Big data datasets
Companies are really, really bad at using external data sets
• There are many public data sets which can be used to complement existing internal data.
• Weather data for logistics companies
• AWS Public Datasets
• Google Public Datasets
• GitHub Public Datasets
• Kaggle Public Datasets
• Data.gov.uk Public Datasets
AWS Big data datasets
Google Big data datasets
GitHub Big data datasets
Kaggle Big data datasets
Data.gov.uk data datasets
SAP HANA Express Edition Deploying in GCP
DEMO
