Qrious about Insights
Big Data in the Real World
AUT DSRG Workshop
Guy Kloss
guy.kloss@qrious.co.nz
Enterprise Architect
Qrious Limited
7 February 2017
The Problem Examples The Solution Tools of the Trade Boxing up a Solution Flotsam and Jetsam
Outline
1 The Problem
2 Examples
3 The Solution
4 Tools of the Trade
5 Boxing up a Solution
6 Flotsam and Jetsam
Guy Kloss | Big Data in the Real World 2/41
Who/What is Qrious?
We help New Zealand businesses
and public sector organisations
create value
and solve their most pressing business problems
by turning data into actionable insight.
Backed by Spark
Approx. 60 employees
Offices in Auckland & Wellington
Substantial investment across Data, Platform & People
Built from the ground up
(new generation technology and working principles)
One of the largest Data Science teams in the country
with > 80% qualified to Masters & PhD level
and over 60 years of combined experience
NZ's leading data analytics specialist by 2017
Our Capabilities
Advanced analytics
Location insights
Big Data platforms
Consulting services
BI & Warehousing
Who am I?
Chemical Engineer (Masters)
Rocket Scientist (German Aerospace Centre)
Computer Scientist (PhD)
Former lecturer (AUT)
Lead Software Developer and Head Crypto Geek @ Mega
Enterprise Architect at Qrious
Dad, baseballer, diver, . . . general geek!
1 The Problem
Data size
Number of records
Data volume
An exponentially growing data world
Primary Memory/Disk Capacity
Relative Speeds
Source: http://www.cs.cmu.edu/~amarp/cpu-io-gap
Size Does Matter!
Access/processing beyond a single machine
(RAM, disk, CPU)
Expensive data transfers at volume
(latency, throughput)
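A rough back-of-the-envelope sketch of why transfers become expensive at volume. The figures (100 TB over a single 1 Gbit/s link, zero protocol overhead) are illustrative assumptions, not numbers from the talk, and the helper name `transfer_seconds` is hypothetical.

```python
def transfer_seconds(size_bytes: float,
                     throughput_bytes_per_s: float,
                     latency_s: float = 0.0) -> float:
    """Lower bound on the time to move size_bytes over one link."""
    return latency_s + size_bytes / throughput_bytes_per_s


TB = 10**12                  # decimal terabyte, in bytes
GBIT_PER_S = 10**9 / 8       # 1 Gbit/s expressed in bytes per second

# Assumed scenario: 100 TB over one 1 Gbit/s link, ignoring overhead.
hours = transfer_seconds(100 * TB, GBIT_PER_S) / 3600
print(f"{hours:.1f} hours")
```

Under these assumptions the copy alone takes more than nine days, which is the motivation for moving the processing rather than the data.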
Storage Issues
Storage, access, index, find
Transfer, manage, prevent data loss
Types of Data
Structured
Unstructured
Graphs
Free text
. . .
Correlating . . . co-relating . . . mashing . . .
Not a single-record problem
but an m:n problem
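One way to read the m:n point is as a join between two data sets on a shared key: every matching record on one side pairs with every matching record on the other, so the output can approach m × n rows. The sketch below assumes that reading; the field names (`cell`, `device`, `event`) and the `hash_join` helper are hypothetical.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Pair every left row with every right row that shares the key."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [(l, r) for l in left for r in index.get(l[key], [])]


# Three sightings and two events in the same (made-up) cell.
sightings = [{"cell": "A", "device": d} for d in ("d1", "d2", "d3")]
events = [{"cell": "A", "event": e} for e in ("concert", "match")]

pairs = hash_join(sightings, events, "cell")
print(len(pairs))  # 3 sightings x 2 events
```

With only one shared key value, 3 + 2 input records already yield 6 output pairs; at billions of records the same effect dominates the cost.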
Beyond Exponential
Problems are between exponential and hyperexponential
→ Enabling data processing in an exponential world
2 Examples
Number of Records
> 1 trillion (10^9) records: Spark’s location-based data set
Anonymised for privacy (on ingest)
Fully encrypted (at rest and in transport)
Continuous/stream ingestion
Normalisation and segmentation on data set
Correlating with external data set
→ Finding insights in this “hay mountain”
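The slide does not say how records are anonymised on ingest. As a purely illustrative sketch, one common building block is to replace direct identifiers with a keyed hash before anything is stored. Strictly speaking this is pseudonymisation and would be only one step of a real anonymisation scheme. The field name `subscriber`, the key handling, and both function names are assumptions.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (hex string)."""
    return hmac.new(secret_key, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def ingest(records, secret_key: bytes):
    """Yield copies of the records with the identifier field replaced."""
    for record in records:
        cleaned = dict(record)
        cleaned["subscriber"] = pseudonymise(cleaned["subscriber"], secret_key)
        yield cleaned


# Hypothetical input: the key would come from a secrets store in practice.
raw = [{"subscriber": "alice", "cell": "A"},
       {"subscriber": "alice", "cell": "B"}]
stored = list(ingest(raw, b"example-key-not-for-production"))
```

The same input identifier maps to the same token, so records can still be correlated after ingest without the original identifier being stored.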
Data Volume
100s of TB to PB of “Data Lakes”
Not just a backup/data grave
Fully encrypted (at rest and in transport)
Includes data querying and processing capability
→ Capability to “store everything” (every thing and kind)
3 The Solution
Divide and Conquer
Massively parallel processing: MPP
Parallelise: Map-Reduce
Pipelines: Stream processing
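A minimal single-machine sketch of the map-reduce idea: split the input into chunks, count words in each chunk independently (map), then merge the partial counts (reduce). The thread pool only stands in for cluster workers here and is not how Hadoop schedules work; the function names and `chunk_size` parameter are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one chunk of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge(total, partial):
    """Reduce step: add one partial count into the running total."""
    total.update(partial)  # Counter.update adds counts
    return total


def word_count(lines, workers=4, chunk_size=2):
    chunks = [lines[i:i + chunk_size]
              for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge, partials, Counter())


counts = word_count(["big data", "big insight", "data data"])
```

Because each chunk is processed without reference to the others, the map step can be spread across as many workers as there are chunks; only the merge needs to see all partial results.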
Leverage Data Locality
Bring processing to the data
The Right Tools
Don’t re-invent the wheel
Use existing high-performing tools where possible
Use available high-productivity frameworks that build on high-level languages
The right tool for the type of data
Use the Source, Luke!
(Leverage open-source tooling backed by a community)
The Right Data Organisation
Row vs. columnar storage
→ Columnar format is often the better fit for analytics
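A small sketch of the difference between the two layouts, using plain Python structures. The sample records and the `to_columns` helper are made up for illustration; real columnar stores add compression and encoding on top of this idea.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "Auckland", "amount": 10.0},
    {"id": 2, "region": "Wellington", "amount": 20.5},
    {"id": 3, "region": "Auckland", "amount": 5.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one field touches every record in the row layout...
total_from_rows = sum(row["amount"] for row in rows)
# ...but only a single contiguous list in the columnar layout.
total_from_columns = sum(columns["amount"])
```

Analytical queries typically aggregate a few fields over many records, so reading only the needed columns avoids scanning the rest of each record.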
In, Out, Cha-Cha-Cha
Ingest data from (legacy, external) source systems
→ ETL – Extract, Transform, Load
Make sure the rhythm fits (no missing “Out”)
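The three ETL steps can be sketched end to end with the Python standard library. The CSV content, the `sales` table, and the rule of dropping rows with an unparseable amount are all assumptions made for this example; an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical export from a legacy source system.
SOURCE_CSV = "id,name,amount\n1, Alice ,10.50\n2,Bob,n/a\n3,Carol,7\n"


def extract(text):
    """Extract: parse the raw CSV into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim names, convert types, drop unparseable amounts."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        cleaned.append((int(row["id"]), row["name"].strip(), amount))
    return cleaned


def load(rows, connection):
    """Load: write the cleaned rows into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), connection)
loaded = connection.execute(
    "SELECT name, amount FROM sales ORDER BY id").fetchall()
```

Keeping the three steps as separate functions makes it easy to check that whatever goes in also comes out: the row counts before and after each step can be compared directly.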
4 Tools of the Trade
Hadoop
Hadoop and distributions
Processing tools for relational, streaming, batch, graph, text, search, . . .
Allocates cluster resources dynamically
Data is distributed (with redundancy),
so compute is allocated where the data is
Hadoop Distributions
Many Hadoop distributions: Similar to Linux distributions
Cloudera Partnership with Qrious
“Bronze” partner
Ambitions to become “Silver” partner
and MSP (managed service provider)
Basic Hadoop Tool Suite
Example: Cloudera Hadoop Distribution
MPP Databases
DB for massively parallel processing (MPP)
Greenplum database and forks
(based on PostgreSQL)
Generic and Specialised DBs
Generic RDBMS (where useful)
NoSQL
Graph DB
Other columnar species
5 Boxing up a Solution
Delivering a Suitable Solution
Includes:
System management
Connectivity
Application logic
Services
Yummy add-ons
System Management Framework
Security
Dedicated sub-networks with specific firewall rules
External firewalls
User and credentials management
Log collector
Other security tools . . .
System access
VPN
Remote desktop services
Connectivity
API gateways
(Reverse) proxies
SFTP
Application Logic
Platform-as-a-Service (PaaS)
Huge benefits of containerising application logic (using Docker)
→ Much shorter delivery cycles
APIs, Micro-Services
Orchestration of Big Data analysis
Services
Solutioning, build
Analytics and development
Operation and maintenance
Bonus Points for . . .
Provenance
(reproducibility, auditability, compliance)
AI and ML
Blockchain
(non-repudiation, trust, “smart contracts”,
identity management, federation, . . . )
6 Flotsam and Jetsam
In the Qrious Pipeline
Make Big Data a commodity: Don’t buy, pay for what you need!
→ Big-Data-Platform-as-a-Service (BDPaaS)
Sliced, diced and configured to your needs
Straight on bare metal,
not VMs (like most cloud hosters)
Maximising the Job Market
What skills do you need?
RDBMS?
SAS?
NoSQL DBs?
Maybe Hadoop is a good answer?
Questions?
Parallelise!
Guy Kloss
guy.kloss@qrious.co.nz
Just a humble hair-dryer from the 30s:
“One of the first machines used for
permanent wave hairstyling back in the
1920’s and 1930’s.”
Dark Roasted Blend:
http://www.darkroastedblend.com/2007/05/
mystery-devices-issue-2.html
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Recently uploaded (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Qrious about Insights -- Big Data in the Real World

  • 1. Qrious about Insights: Big Data in the Real World. AUT DSRG Workshop. Guy Kloss (guy.kloss@qrious.co.nz), Enterprise Architect, Qrious Limited. 7 February 2017
  • 2. The Problem Examples The Solution Tools of the Trade Boxing up a Solution Flotsam and Jetsam Outline 1 The Problem 2 Examples 3 The Solution 4 Tools of the Trade 5 Boxing up a Solution 6 Flotsam and Jetsam Guy Kloss | Big Data in the Real World 2/41
  • 3. Who/What is Qrious? We help New Zealand businesses and public sector organisations create value and solve their most pressing business problems by turning data into actionable insight.
  • 4. Who/What is Qrious? Backed by Spark; Approx. 60 employees; Offices in Auckland & Wellington; Substantial investment across Data, Platform & People; Built from the ground up (new generation technology and working principles); One of the largest Data Science teams in the country, with > 80% qualified to Masters & PhD level and over 60 years of combined experience; NZ’s leading data analytics specialist by 2017
  • 5. Our Capabilities: Advanced analytics; Location insights; Big Data platforms; Consulting services; BI & Warehousing
  • 6. Who am I? Chemical Engineer (Masters); Rocket Scientist (German Aerospace Centre); Computer Scientist (PhD); Former lecturer (AUT); Lead Software Developer and Head Crypto Geek @ Mega; Enterprise Architect at Qrious; Dad, baseballer, diver, ... general geek!
  • 7. Outline: 1 The Problem; 2 Examples; 3 The Solution; 4 Tools of the Trade; 5 Boxing up a Solution; 6 Flotsam and Jetsam
  • 8. Data size: Number of records; Data volume
  • 9. An exponentially growing data world: Primary Memory/Disk Capacity
  • 10. An exponentially growing data world: Relative Speeds. Source: http://www.cs.cmu.edu/~amarp/cpu-io-gap
  • 11. Size Does Matter! Access/processing beyond a single machine (RAM, disk, CPU); Expensive data transfers at volume (latency, throughput)
  • 12. Storage Issues: Storage, access, index, find; Transfer, manage, prevent data loss
  • 13. Types of Data: Structured; Unstructured; Graphs; Free text; ...
  • 14. Correlating ... co-relating ... mashing ...: Not a single-record problem, but an m : n problem
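The m : n point on slide 14 can be illustrated with a small hash join: every record on one side is paired with every record on the other side that shares a key, so the output can grow far faster than either input. This is only a sketch; the record fields (`cell`, `user`, `event`) and the function name `correlate` are made up for illustration and do not come from the talk.

```python
from collections import defaultdict


def correlate(left, right, key):
    """Hash join: pair each left record with every right record sharing `key`."""
    index = defaultdict(list)
    for record in right:
        index[record[key]].append(record)
    # Each left record may match many right records, and vice versa (m : n).
    return [(l, r) for l in left for r in index.get(l[key], [])]


if __name__ == "__main__":
    # Hypothetical sample data: device sightings and external events per cell.
    sightings = [
        {"cell": "A", "user": "u1"},
        {"cell": "A", "user": "u2"},
        {"cell": "B", "user": "u3"},
    ]
    events = [
        {"cell": "A", "event": "concert"},
        {"cell": "A", "event": "market"},
        {"cell": "C", "event": "game"},
    ]
    pairs = correlate(sightings, events, "cell")
    print(len(pairs), "correlated pairs from", len(sightings), "x", len(events), "records")
```

Two sightings and two events in the same cell already yield four pairs, which is why correlation at scale is not a per-record problem.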
  • 15. Beyond Exponential: Problems are between exponential and hyperexponential → Enabling data processing in an exponential world
  • 16. Outline: 1 The Problem; 2 Examples; 3 The Solution; 4 Tools of the Trade; 5 Boxing up a Solution; 6 Flotsam and Jetsam
  • 17. Number of Records: > 1 trillion (10^9) records: Spark’s location-based data set; Anonymised for privacy (on ingest); Fully encrypted (at rest and in transport); Continuous/stream ingestion; Normalisation and segmentation on data set; Correlating with external data set → Finding insights in this “hay mountain”
  • 18. Data Volume: 100s of TB to PB of “Data Lakes”; Not just a backup/data grave; Fully encrypted (at rest and in transport); Includes data querying and processing capability → Capability to “store everything” (every thing and kind)
  • 19. Outline: 1 The Problem; 2 Examples; 3 The Solution; 4 Tools of the Trade; 5 Boxing up a Solution; 6 Flotsam and Jetsam
  • 20. Divide and Conquer: Massively parallel processing: MPP; Parallelise: Map-Reduce; Pipelines: Stream processing
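As a rough illustration of the Map-Reduce idea on slide 20, the sketch below splits the input into chunks, counts words in each chunk independently (map), and merges the partial counts (reduce). It runs on local threads only to stay self-contained; a real cluster framework would distribute the chunks across machines. The function names and the word-count task are illustrative assumptions, not taken from the talk.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words within one chunk, independently of all others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(lines, workers=2, chunk_size=2):
    # Divide: split the input into chunks that can be processed in parallel.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Conquer: fold the partial counts into a single result.
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    sample = ["big data", "data lake", "big big data", "stream"]
    print(word_count(sample))
```

Because the map step never looks outside its own chunk, adding workers (or machines) needs no change to the logic, only to how the chunks are scheduled.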
  • 21. Leverage Data Locality: Bring processing to the data
  • 22. The Right Tools: Don’t re-invent the wheel; Use existing high performing tools where possible; Available high productivity frameworks, making use of high level languages; The right tool for the type of data; Use the Source, Luke! (Leverage open source based tooling with a community)
  • 23. The Right Data Organisation: Row vs. columnar storage → For analytics often better in columnar format
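The row vs. columnar contrast on slide 23 can be made concrete with plain Python structures. In the sketch below, the same three records are held once as a list of rows and once as a dict of columns; an analytic aggregate over one field only needs to read a single list in the columnar layout. The sample fields (`id`, `region`, `amount`) are invented for illustration.

```python
# Row-oriented: one record per entry, all fields stored together.
rows = [
    {"id": 1, "region": "Auckland", "amount": 120.0},
    {"id": 2, "region": "Wellington", "amount": 80.0},
    {"id": 3, "region": "Auckland", "amount": 50.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented: one list per field, values of the same field stored together.
columns = to_columns(rows)

# Analytic query "total amount":
# the row layout walks every record, the columnar layout reads one list.
total_from_rows = sum(record["amount"] for record in rows)
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

On disk, the columnar layout additionally tends to compress better because similar values sit next to each other, which is one reason it often suits analytics workloads.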
  • 24. In, Out, Cha-Cha-Cha: Ingest data from (legacy, external) source systems → ETL – Extract, Transform, Load; Make sure the rhythm fits (no missing “Out”)
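A minimal ETL sketch for slide 24, using only the Python standard library: extract records from CSV text, transform them (trim names, drop incomplete rows, convert amounts), load them into an in-memory SQLite table, and finally read a summary back out so the "Out" step is not missing. The CSV layout, table name and function names are assumptions made for this example.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system, with typical data quality issues.
RAW_CSV = """customer,amount_nzd
 Alice ,12.50
Bob,7
,3.00
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: trim names, drop rows without a customer, convert amounts."""
    cleaned = []
    for record in records:
        name = record["customer"].strip()
        if not name:
            continue  # incomplete record, not loaded
        cleaned.append((name, float(record["amount_nzd"])))
    return cleaned


def load(rows, connection):
    """Load: write the cleaned rows into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount_nzd REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    connection.commit()


def run_etl(text, connection):
    load(transform(extract(text)), connection)
    # The "Out" step: query the loaded data back.
    return connection.execute(
        "SELECT COUNT(*), SUM(amount_nzd) FROM sales"
    ).fetchone()


if __name__ == "__main__":
    with sqlite3.connect(":memory:") as conn:
        print(run_etl(RAW_CSV, conn))
```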
  • 25. Outline: 1 The Problem; 2 Examples; 3 The Solution; 4 Tools of the Trade; 5 Boxing up a Solution; 6 Flotsam and Jetsam
  • 26. Hadoop: Hadoop and distributions; Processing tools for relational, streaming, batch, graph, text, search, ...; Allocates cluster resources dynamically; Data distributed (with redundancy), so compute allocated where data is
  • 27. Hadoop Distributions: Many Hadoop distributions: Similar to Linux distributions; Cloudera Partnership with Qrious: “Bronze” partner; Ambitions to become “Silver” partner and MSP (managed service provider)
  • 28. Basic Hadoop Tool Suite: Example: Cloudera Hadoop Distribution
  • 29. MPP Databases: DB for massively parallel processing (MPP); Greenplum database and forks (based on PostgreSQL)
  • 30. Generic and Specialised DBs: Generic RDBMS (where useful); NoSQL; Graph DB; Other columnar species
  • 31. Outline: 1 The Problem; 2 Examples; 3 The Solution; 4 Tools of the Trade; 5 Boxing up a Solution; 6 Flotsam and Jetsam
  • 32. Delivering a Suitable Solution. Includes: System management; Connectivity; Application logic; Services; Yummy add-ons
  • 33. System Management Framework. Security: Dedicated sub-networks with specific firewall rules; External firewalls; User and credentials management; Log collector; Other security tools ... System access: VPN; Remote desktop services
  • 34. Connectivity: API gateways; (Reverse) proxies; SFTP
  • 35. Application Logic: Platform-as-a-Service (PaaS); Huge benefits of containerising application logic (using Docker) → Much reduced cadence for delivery; APIs, Micro-Services; Orchestration of Big Data analysis
  • 36. Services: Solutioning, build; Analytics and development; Operation and maintenance
  • 37. Bonus Points for ...: Provenance (reproducibility, auditability, compliance); AI and ML; Blockchain (non-repudiation, trust, “smart contracts”, identity management, federation, ...)
  • 38. Outline: 1 The Problem; 2 Examples; 3 The Solution; 4 Tools of the Trade; 5 Boxing up a Solution; 6 Flotsam and Jetsam
  • 39. In the Qrious Pipeline: Make Big Data a commodity: Don’t buy, pay for what you need! → Big-Data-as-a-Service – BDPaaS; Sliced, diced and configured to your needs; Straight on bare metal, not VMs (like most cloud hosters)
  • 40. Maximising the Job Market: What skills do you need? RDBMS? SAS? NoSQL DBs? Maybe Hadoop is a good answer?
  • 41. Questions? Parallelise! Guy Kloss, guy.kloss@qrious.co.nz. Just a humble hair-dryer from the 30s: “One of the first machines used for permanent wave hairstyling back in the 1920’s and 1930’s.” Dark Roasted Blend: http://www.darkroastedblend.com/2007/05/mystery-devices-issue-2.html