© 2014 Silicon Valley Data Science LLC
All Rights Reserved.
@SVDataScience
Beyond a Big Data Pilot:
Building a Production Data Infrastructure
StampedeCon
29 May 2014, St. Louis
Stephen O’Sullivan (@steveos)
strata.svds.com @SVDataScience
Stephen O’Sullivan
Distinguished Architect
Creating a data architecture involves many moving parts. By
examining the data value chain, from ingestion through to analytics,
we will explain how the various parts of the Hadoop and big data
ecosystem fit together to support batch, interactive and realtime
analytical workloads.
By tracing the flow of data from source to output, we’ll explore the
options and considerations for components, including data
acquisition, ingestion, storage, data services, analytics and data
management. Most importantly, we’ll leave you with a framework for
understanding these options and making choices.
[Figure: data store landscape grouped as Key-Value, Columnar, Graph, Document, and General]
UP OR OUT?
Different use cases (UC1, UC2, UC3, UC4, ... UCn) put different demands on the data infrastructure.
Increasing cost per unit of capability from scale-up architectures causes rationing of resources. Only the most valuable use cases are pursued.
[Chart labels: Data Resource Usage; Value; scale-out cost; UC1, UC2, UC3, UC4, UCn]
THE DATA VALUE CHAIN
Acquire Ingest Process Persist Integrate Analyze Expose
BUILDING A DATA PLATFORM
[Diagram components: External Systems, Internal Data Sources, External Data Sources, Data Acquisition, Data Ingestion, Data Repository, Persistence, Offline Processing, Real Time Processing, Batch Processing, Analytics, Data Services]
Data Management: Security, Operations, Data Quality, Meta Data Management and Data Lineage
Acquisition:
from internal and external data sources
Ingestion:
offline and real-time processing
Persistence
Data Services
Exposing data to applications
Analytics
batch and real-time processing
Data Management:
data security, operations, lineage, quality, and metadata management
Use Case
• Collect in-store sales transactions in near real time
• Provide near real-time dashboards of sales transactions (rolled up by store, region, etc.)
• Provide ad hoc access to this data as soon as it’s collected (i.e., low latency and fine grain)
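The roll-up behind the dashboard requirement can be sketched as follows. This is a minimal illustration in Python, not the system described in the talk; the record fields (`store`, `region`, `amount`) and the `rollup` function are hypothetical names chosen for the example.

```python
from collections import defaultdict


def rollup(transactions):
    """Sum sales amounts by (region, store) and by region."""
    by_store = defaultdict(float)
    by_region = defaultdict(float)
    for tx in transactions:
        by_store[(tx["region"], tx["store"])] += tx["amount"]
        by_region[tx["region"]] += tx["amount"]
    return dict(by_store), dict(by_region)


# Hypothetical sample transactions; a real feed would arrive continuously.
sample = [
    {"store": "S1", "region": "Midwest", "amount": 10.0},
    {"store": "S1", "region": "Midwest", "amount": 5.5},
    {"store": "S2", "region": "Midwest", "amount": 4.5},
    {"store": "S9", "region": "West", "amount": 20.0},
]
by_store, by_region = rollup(sample)
```

Keeping the fine-grained records alongside the aggregates is what allows both the dashboard and the ad hoc access described above.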
FORTUNE 500 RETAIL COMPANY
Enabling Near Real-time Sales Transactions
[Diagram labels: Application Servers; Data Center A and Data Center B, each with a BI Server (http)]
FORTUNE 500 RETAIL COMPANY
Enabling Near Real-time Sales Transactions
[Diagram labels: Application Servers; Data Center A and Data Center B, each with CFS and a BI Server (http)]
Ready to go into Production?
• Data Acquisition
– Can you see the “collectors” (internal or external)? Make sure you have the correct network access in place.
– Do you need to encrypt the data (internally or externally)? That will depend on the data and your policies.
• Data Ingestion
– Can you handle the traffic to the “collectors”? Make sure the solution you choose can scale out. Apache Flume is a good example of this.
– Do you have redundant / self-healing paths into the cluster? Make sure you’re not point-to-point. In Flume, Storm, and Kafka you can configure forks, etc., but you may need to handle duplicate data.
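Handling duplicate data from forked ingestion paths could look like the sketch below. It assumes every event carries a unique ID, which the slides do not state; the `Deduplicator` class and the sample IDs are hypothetical and not part of Flume, Storm, or Kafka.

```python
from collections import OrderedDict


class Deduplicator:
    """Drop events whose ID was already seen, keeping memory bounded."""

    def __init__(self, max_ids=100_000):
        self._seen = OrderedDict()
        self._max_ids = max_ids

    def accept(self, event_id):
        """Return True the first time an ID is seen, False on repeats."""
        if event_id in self._seen:
            return False
        self._seen[event_id] = None
        if len(self._seen) > self._max_ids:
            self._seen.popitem(last=False)  # evict the oldest remembered ID
        return True


# Two redundant paths deliver overlapping events (hypothetical data).
dedup = Deduplicator()
path_a = [("tx-1", 9.99), ("tx-2", 4.50)]
path_b = [("tx-2", 4.50), ("tx-3", 1.25)]
unique = [event for event in path_a + path_b if dedup.accept(event[0])]
```

The bounded window is a design trade-off: a duplicate that arrives after its ID has been evicted would pass through, so the window size should cover the longest expected replay.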
Ready to go into Production?
• Data Repository
– Can you handle out-of-order data? Make sure you have a way to address it, as it will happen.
– Can you scale the cluster for data volume spikes and/or processing spikes? Hadoop and Cassandra make it very easy to add nodes. If you cannot add nodes, be prepared to drop data or stop processes, or both.
– Should I just store plain text (compressed)? If it’s very wide data and you query a subset of the columns, Parquet would be a good choice. If you would like to be able to version your data schema, Avro is a good choice.
• Data Services
– Do applications need to access this data? Build a RESTful service to access the data.
– Do you have data resiliency? What is data resiliency, I hear you ask...
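One common way to address out-of-order data is to organize records by when they happened rather than when they arrived. The sketch below illustrates that idea only; the hourly bucket size, the `bucket_by_event_time` function, and the sample events are assumptions for the example, not details from the talk.

```python
from collections import defaultdict
from datetime import datetime


def bucket_by_event_time(events):
    """Group payloads into hourly buckets keyed by event time, not arrival order."""
    buckets = defaultdict(list)
    for event_time, payload in events:
        hour = event_time.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(payload)
    return dict(buckets)


# Events listed in arrival order; the last one happened earlier but arrived late.
arrivals = [
    (datetime(2014, 5, 29, 10, 5), "sale-1"),
    (datetime(2014, 5, 29, 11, 2), "sale-3"),
    (datetime(2014, 5, 29, 10, 59), "sale-2"),
]
buckets = bucket_by_event_time(arrivals)
```

With this layout a late record lands in the bucket it belongs to, so downstream roll-ups for that hour can be recomputed without rescanning everything.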
DATA RESILIENCY
Stovepipe: One-to-one relationship from data source to product
Hard Failure: If the data source is broken, so is the app.
Multi-sourced: Redundancy of overlapping data sources makes your products more resilient
Graceful Degradation: If a data source breaks, there is a backup and your app continues to function
Production data services abstract the probabilistic integration of overlapping data sources. We call this model a Data Mesh.
[Diagram labels: Products; Data Services; Data Sources; Broken Data Sources]
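Graceful degradation across overlapping sources can be sketched as a data service that tries each source in turn. This is a deliberately simplified, deterministic fallback rather than the probabilistic integration the slide describes; `fetch_with_fallback`, the source names, and the sample data are hypothetical.

```python
def fetch_with_fallback(sources, key):
    """Try each (name, fetch) source in priority order; return the first hit."""
    for name, fetch in sources:
        try:
            value = fetch(key)
        except Exception:
            continue  # a broken source is skipped instead of failing the app
        if value is not None:
            return name, value
    raise LookupError(f"no source could supply {key!r}")


def broken_source(key):
    """Stand-in for a data source that is currently down."""
    raise ConnectionError("primary feed is down")


backup = {"store-42": "St. Louis"}
sources = [("primary", broken_source), ("backup", backup.get)]
result = fetch_with_fallback(sources, "store-42")
```

The product calls the service and never learns which source answered, which is what removes the one-to-one stovepipe dependency.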
Ready to go into Production?
• Analytics
– Which is the right SQL on Hadoop solution (for me)? There are a few to choose from: Hive, Impala, Spark SQL, and HAWQ (and growing). They share the same metastore. Some are faster than others (depending on the type of query).
– Which BI tool should I use? See if your current tool works with your distro. You can also look at Platfora, Datameer, and Karmasphere.
– Do I still need to set up business views of the data? Yes, you do, but the benefit is that you still have access to the raw data for the advanced data analyst or data scientist.
– What about deep analytics? Now that you have a data lake, you can take advantage of doing deep analytics on the data without moving it out.
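The point about business views sitting on top of still-accessible raw data can be sketched with any SQL engine. The example below uses Python's built-in `sqlite3` purely as a stand-in for a SQL-on-Hadoop engine; the table, view, and column names are hypothetical.

```python
import sqlite3

# In-memory database standing in for the cluster's SQL layer (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_sales (store TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?)",
    [("S1", "Midwest", 10.0), ("S2", "Midwest", 5.0), ("S9", "West", 20.0)],
)

# Business view: a curated roll-up for BI users.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM raw_sales GROUP BY region"
)

business = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()

# The raw rows remain available for analysts who need full detail.
raw_count = conn.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
```

The view adds no copy of the data, so the curated and raw perspectives stay consistent.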
Analytics tools
[Figure: analytics tool logos, not recoverable from the slide text]
Ready to go into Production?
• Data Management
– Security (who can see what?): It’s getting there. At the query level, using Hive or Impala, you can use Apache Sentry or Apache Knox. There are also third-party tools like Dataguise that let you do things like encryption at rest or masking.
– Can you meet your SLA when other jobs / queries are running? Using the “Fair Scheduler” will help you manage your jobs’ SLAs. A third-party product by Pepperdata can help with this too (and a little more).
– What monitoring do you have in place?
– Cluster failover?
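The idea behind weighted fair sharing can be illustrated with a few lines of arithmetic. This sketch does not reproduce the Hadoop Fair Scheduler's actual algorithm or configuration; the `fair_shares` function, the queue names, and the weights are hypothetical.

```python
def fair_shares(total_capacity, weights):
    """Split capacity across queues in proportion to their weights."""
    total_weight = sum(weights.values())
    return {
        queue: total_capacity * weight / total_weight
        for queue, weight in weights.items()
    }


# Hypothetical queues: dashboards get twice the weight of the other workloads.
shares = fair_shares(100, {"dashboards": 2, "adhoc": 1, "batch": 1})
```

Giving the SLA-sensitive workload a larger weight is one way to keep it responsive while other jobs and queries are running.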
Adding more use cases
• Am I duplicating data?
• Can I reuse the infrastructure I’ve already created?
• Do I have enough room in the cluster (space/processing)?
• Will I impact the SLAs of jobs/queries currently running?
FORTUNE 500 RETAIL COMPANY
Enabling Real Time Database Monitoring
HIGH LEVEL ARCHITECTURE
[Diagram labels: Oracle Stats Collection (three instances); pulling data over JDBC; sending data to Graphite; writing data to HDFS]
FORTUNE 500 RETAIL COMPANY
Enabling Log Collection & Search
[Diagram labels: Application Servers; statsd; Log Search; http]
questions
Yes, We’re Hiring
svds.com/join-us
THANK YOU
Stephen O’Sullivan @steveos
27

More Related Content

What's hot

Actian forrester- hortonworks
Actian   forrester- hortonworksActian   forrester- hortonworks
Actian forrester- hortonworks
Hortonworks
 
Big Data Discovery
Big Data DiscoveryBig Data Discovery
Big Data Discovery
Harald Erb
 
Developing a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceDeveloping a Strategy for Data Lake Governance
Developing a Strategy for Data Lake Governance
Tony Baer
 
Better Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraBetter Together: The New Data Management Orchestra
Better Together: The New Data Management Orchestra
Cloudera, Inc.
 
Harnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case StudyHarnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case Study
DataWorks Summit
 
Hadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHAHadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHA
Hortonworks
 
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
DataWorks Summit/Hadoop Summit
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Technologies
 
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Hortonworks
 
IDC Retail Insights - What's Possible with a Modern Data Architecture?
IDC Retail Insights - What's Possible with a Modern Data Architecture?IDC Retail Insights - What's Possible with a Modern Data Architecture?
IDC Retail Insights - What's Possible with a Modern Data Architecture?
Hortonworks
 
Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...
Cloudera, Inc.
 
Big data and its impact on SOA
Big data and its impact on SOABig data and its impact on SOA
Big data and its impact on SOA
Demed L'Her
 
Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25
Hortonworks
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
Mark Kerzner
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with Cloudera
Caserta
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San Jose
Jeffrey T. Pollock
 
Hortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataHortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your data
Scott Clinton
 
8 from zero to insight with real time big data
8 from zero to insight with real time big data8 from zero to insight with real time big data
8 from zero to insight with real time big dataDr. Wilfred Lin (Ph.D.)
 
A Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision MedicineA Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision Medicine
Cloudera, Inc.
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Cloudera, Inc.
 

What's hot (20)

Actian forrester- hortonworks
Actian   forrester- hortonworksActian   forrester- hortonworks
Actian forrester- hortonworks
 
Big Data Discovery
Big Data DiscoveryBig Data Discovery
Big Data Discovery
 
Developing a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceDeveloping a Strategy for Data Lake Governance
Developing a Strategy for Data Lake Governance
 
Better Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraBetter Together: The New Data Management Orchestra
Better Together: The New Data Management Orchestra
 
Harnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case StudyHarnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case Study
 
Hadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHAHadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHA
 
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
 
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
 
IDC Retail Insights - What's Possible with a Modern Data Architecture?
IDC Retail Insights - What's Possible with a Modern Data Architecture?IDC Retail Insights - What's Possible with a Modern Data Architecture?
IDC Retail Insights - What's Possible with a Modern Data Architecture?
 
Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...
 
Big data and its impact on SOA
Big data and its impact on SOABig data and its impact on SOA
Big data and its impact on SOA
 
Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with Cloudera
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San Jose
 
Hortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataHortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your data
 
8 from zero to insight with real time big data
8 from zero to insight with real time big data8 from zero to insight with real time big data
8 from zero to insight with real time big data
 
A Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision MedicineA Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision Medicine
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
 

Viewers also liked

Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
James Serra
 
Café da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with HadoopCafé da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
OCTO Technology
 
Big Data Asset Maturity Model
Big Data Asset Maturity ModelBig Data Asset Maturity Model
Big Data Asset Maturity Modelnoahwong
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Sumeet Singh
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
Uwe Printz
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
Uwe Printz
 
Apache Spark
Apache SparkApache Spark
Apache Spark
Uwe Printz
 
Unicom Big Data Conference
Unicom  Big Data ConferenceUnicom  Big Data Conference
Unicom Big Data Conference
Samudra Kanankearachchi
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
boorad
 
IT Operating Model
IT Operating ModelIT Operating Model
IT Operating Modelanusharaju38
 
How to create new business models with Big Data and Analytics
How to create new business models with Big Data and AnalyticsHow to create new business models with Big Data and Analytics
How to create new business models with Big Data and Analytics
Aki Balogh
 
8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy
Silicon Valley Data Science
 
How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model
DATUM LLC
 

Viewers also liked (13)

Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
Café da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with HadoopCafé da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
 
Big Data Asset Maturity Model
Big Data Asset Maturity ModelBig Data Asset Maturity Model
Big Data Asset Maturity Model
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
Unicom Big Data Conference
Unicom  Big Data ConferenceUnicom  Big Data Conference
Unicom Big Data Conference
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
IT Operating Model
IT Operating ModelIT Operating Model
IT Operating Model
 
How to create new business models with Big Data and Analytics
How to create new business models with Big Data and AnalyticsHow to create new business models with Big Data and Analytics
How to create new business models with Big Data and Analytics
 
8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy
 
How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model
 

Similar to Beyond a Big Data Pilot: Building a Production Data Infrastructure - StampedeCon 2014

Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Scott Mitchell
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
South West Data Meetup
 
The New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationThe New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data Exploration
Inside Analysis
 
Sqrrl March Webinar: How to Build a Big App
Sqrrl March Webinar: How to Build a Big AppSqrrl March Webinar: How to Build a Big App
Sqrrl March Webinar: How to Build a Big App
Sqrrl
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
Cloudera, Inc.
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
Eric Kavanagh
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
DataWorks Summit/Hadoop Summit
 
Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)
Jeffrey T. Pollock
 
Data Governance for Data Lakes
Data Governance for Data LakesData Governance for Data Lakes
Data Governance for Data Lakes
Kiran Kamreddy
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
Cloudera, Inc.
 
Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?
Inside Analysis
 
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostHow to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
AtScale
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
Hortonworks
 
Oracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast ChartsOracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast Charts
Jeffrey T. Pollock
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto Meetup
Hortonworks
 
Level Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop AccelerationLevel Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop Acceleration
Inside Analysis
 
Extending BI with Big Data Analytics
Extending BI with Big Data AnalyticsExtending BI with Big Data Analytics
Extending BI with Big Data Analytics
Datameer
 
4. Big data & analytics HP
4. Big data & analytics HP4. Big data & analytics HP
4. Big data & analytics HP
MITEF México
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
Sai Paravastu
 

Similar to Beyond a Big Data Pilot: Building a Production Data Infrastructure - StampedeCon 2014 (20)

Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
 
The New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationThe New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data Exploration
 
Sqrrl March Webinar: How to Build a Big App
Sqrrl March Webinar: How to Build a Big AppSqrrl March Webinar: How to Build a Big App
Sqrrl March Webinar: How to Build a Big App
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
 
Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)
 
Data Governance for Data Lakes
Data Governance for Data LakesData Governance for Data Lakes
Data Governance for Data Lakes
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
 
Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?
 
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostHow to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
 
Oracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast ChartsOracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast Charts
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto Meetup
 
Level Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop AccelerationLevel Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop Acceleration
 
Extending BI with Big Data Analytics
Extending BI with Big Data AnalyticsExtending BI with Big Data Analytics
Extending BI with Big Data Analytics
 
4. Big data & analytics HP
4. Big data & analytics HP4. Big data & analytics HP
4. Big data & analytics HP
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 


Beyond a Big Data Pilot: Building a Production Data Infrastructure - StampedeCon 2014

  • 1. © 2014 Silicon Valley Data Science LLC All Rights Reserved. @SVDataScience Beyond a Big Data Pilot: Building a Production Data Infrastructure StampedeCon 29 May 2014, St. Louis Stephen O’Sullivan (@steveos) strata.svds.com @SVDataScience
  • 2. Stephen O’Sullivan, Distinguished Architect
  • 3. Beyond a Big Data Pilot: Building a Production Data Infrastructure. Creating a data architecture involves many moving parts. By examining the data value chain, from ingestion through to analytics, we will explain how the various parts of the Hadoop and big data ecosystem fit together to support batch, interactive and real-time analytical workloads. By tracing the flow of data from source to output, we’ll explore the options and considerations for components, including data acquisition, ingestion, storage, data services, analytics and data management. Most importantly, we’ll leave you with a framework for understanding these options and making choices.
  • 4. Key-Value, Columnar, Graph, Document, General
  • 5. UP OR OUT? Different use cases (UC1, UC2, UC3, UC4, ... UCn) put different demands on the data infrastructure. Increasing cost per unit of capability from scale-up architectures causes rationing of resources. Only the most valuable use cases are pursued. [Chart labels: Data Resource Usage, Value, scale-out cost, UC1 through UCn]
  • 6. THE DATA VALUE CHAIN: Acquire → Ingest → Process → Persist → Integrate → Analyze → Expose
  • 7. BUILDING A DATA PLATFORM. [Diagram labels: External Systems; Data Acquisition; Internal Data Sources; External Data Sources; Data Ingestion; Data Repository; Persistence; Offline Processing; Real Time Processing; Batch Processing; Analytics; Data Services; Data Management (Security, Operations, Data Quality, Meta Data Management and Data Lineage)]
  • 8. Acquisition: from internal and external data sources [platform diagram repeated from slide 7]
  • 9. Ingestion: offline and real-time processing [platform diagram repeated from slide 7]
  • 10. Persistence [platform diagram repeated from slide 7]
  • 11. Data Services: exposing data to applications [platform diagram repeated from slide 7]
  • 12. Analytics: batch and real-time processing [platform diagram repeated from slide 7]
  • 13. Data Management: data security, operations, lineage, quality, and metadata management [platform diagram repeated from slide 7]
  • 14. Use Case
      • Collect in-store sales transactions in near real time
      • Provide near real-time dashboards of sales transactions (rolled up by store, region, etc.)
      • Provide ad hoc access to this data as soon as it’s collected (i.e., low latency and fine-grained)
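The dashboard roll-up in this use case can be illustrated with a minimal sketch. It is not the architecture from the talk: the record fields (`store`, `region`, `amount`) and the sample values are assumptions made up for illustration, and a production system would compute these totals continuously in the streaming or serving layer rather than over an in-memory list.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical transaction records; the field names and values are assumptions.
transactions = [
    {"store": "S001", "region": "Midwest", "amount": Decimal("19.99")},
    {"store": "S001", "region": "Midwest", "amount": Decimal("5.01")},
    {"store": "S002", "region": "Midwest", "amount": Decimal("10.00")},
    {"store": "S101", "region": "West", "amount": Decimal("7.50")},
]


def roll_up(records, key):
    """Sum transaction amounts grouped by one field (for example store or region)."""
    totals = defaultdict(Decimal)  # Decimal() starts each group at 0
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)


by_store = roll_up(transactions, "store")
by_region = roll_up(transactions, "region")
```

Keeping the fine-grained records alongside the roll-ups is what allows the ad hoc access listed in the third bullet.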
  • 15. FORTUNE 500 RETAIL COMPANY: Enabling Near Real-time Sales Transactions [Diagram labels: Application Servers; Data Center A; Data Center B; BI Server (http), shown twice]
  • 16. FORTUNE 500 RETAIL COMPANY: Enabling Near Real-time Sales Transactions [Diagram labels: Application Servers; Data Center A; Data Center B; CFS, shown twice; BI Server (http), shown twice]
  • 17. Ready to go into Production?
      • Data Acquisition
          – Can you see the “collectors” (internal or external)? Make sure you have the correct network access in place.
          – Do you need to encrypt the data (internally or externally)? That will depend on the data and your policies.
      • Data Ingestion
          – Can you handle the traffic to the “collectors”? Make sure the solution you choose can scale out. Apache Flume is a good example of this.
          – Redundant / self-healing paths into the cluster? Make sure you’re not point to point. In Flume, Storm, and Kafka you can configure forks etc., but you may need to handle duplicate data.
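The duplicate-data caveat on this slide can be sketched as follows. This is a minimal illustration, not a feature of Flume, Storm, or Kafka: it assumes every event carries a unique `id` field (an assumption, not something the slides specify) and remembers only a bounded number of recent IDs, so a duplicate that arrives after its ID has been evicted would slip through.

```python
from collections import OrderedDict


class Deduplicator:
    """Drop events whose ID was already seen, remembering only the most recent IDs."""

    def __init__(self, max_ids=100_000):
        self.max_ids = max_ids
        self._seen = OrderedDict()

    def is_new(self, event_id):
        if event_id in self._seen:
            self._seen.move_to_end(event_id)  # refresh recency of a repeated ID
            return False
        self._seen[event_id] = True
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)  # evict the least recently seen ID
        return True


def deduplicate(events, dedup):
    """Keep only the first occurrence of each event ID (hypothetical 'id' field)."""
    return [event for event in events if dedup.is_new(event["id"])]
```

The bound on remembered IDs trades memory for accuracy; size it to cover the longest delay you expect between redundant paths.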
  • 18. Ready to go into Production?
      • Data Repository
          – Can you handle out-of-order data? Make sure you have a way to address it, as it will happen.
          – Can you scale the cluster for data volume spikes and/or processing spikes? Hadoop and Cassandra make it very easy to add nodes. If you cannot add nodes, be prepared to drop data or stop processes, or both.
          – Should I just store plain text (compressed)? If it’s very wide data and you query a subset of the columns, Parquet would be a good choice. If you would like to be able to version your data schema, Avro is a good choice.
      • Data Services
          – Do applications need to access this data? Build a RESTful service to access the data.
          – Do you have data resiliency? What is data resiliency, I hear you ask...
  • 19. DATA RESILIENCY
      • Stovepipe: One-to-one relationship from data source to product.
      • Hard Failure: If the data source is broken, so is the app.
      • Multi-sourced: Redundancy of overlapping data sources makes your products more resilient.
      • Graceful Degradation: If a data source breaks, there is a backup and your app continues to function.
      • Production data services abstract the probabilistic integration of overlapping data sources. We call this model a Data Mesh. [Diagram labels: Products, Data Services, Data Sources, Broken Data Sources]
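The graceful-degradation idea can be sketched as a data service that tries overlapping sources in priority order. This is a simplified illustration under stated assumptions: the source names and fetch functions are hypothetical, and it covers only the fallback behavior, not the probabilistic integration of sources that the slide attributes to the Data Mesh model.

```python
class DataSourceError(Exception):
    """Raised by a source that cannot answer a request."""


def fetch_with_fallback(key, sources):
    """Try each (name, fetch) pair in priority order and return the first answer.

    The result records which source answered, so callers can tell when they
    are being served by a backup.
    """
    errors = []
    for name, fetch in sources:
        try:
            return {"value": fetch(key), "source": name}
        except DataSourceError as exc:
            errors.append(f"{name}: {exc}")
    raise DataSourceError("all sources failed: " + "; ".join(errors))
```

In the stovepipe case the `sources` list has a single entry, so any failure surfaces directly to the product; adding a second overlapping source is what turns a hard failure into degraded but working behavior.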
  • 20. Ready to go into Production?
      • Analytics
          – Which is the right SQL-on-Hadoop solution (for me)? There are a few to choose from: Hive, Impala, Spark SQL, and HAWQ (and growing). They share the same metastore. Some are faster than others (depending on the type of query).
          – Which BI tool should I use? See if the current tool works with your distro. You can also look at Platfora, Datameer, and Karmasphere.
          – Do I still need to set up business views of the data? Yes you do, but the benefit is that you still have access to the raw data for the advanced data analyst or data scientist.
          – What about deep analytics? Now that you have a data lake, you can take advantage of doing deep analytics on the data without moving it out.
  • 21. Analytics tools
  • 22. Ready to go into Production?
      • Data Management
          – Security (who can see what?) It’s getting there... At the query level, using Hive or Impala, you can use Apache Sentry or Apache Knox. There are other third-party tools, like Dataguise, that let you do things like encryption at rest or masking.
          – Can you meet your SLA when other jobs / queries are running? Using the “Fair Scheduler” will help you manage your jobs’ SLAs. A third-party product by Pepper Data can help with this too (and a little more).
          – What monitoring do you have in place?
          – Cluster failover?
  • 23. Adding more use cases
      • Am I duplicating data?
      • Can I reuse the infrastructure I’ve already created?
      • Do I have enough room in the cluster (space/processing)?
      • Will I impact the SLAs of jobs/queries currently running?
  • 24. FORTUNE 500 RETAIL COMPANY: Enabling Real Time Database Monitoring. HIGH LEVEL ARCHITECTURE: Oracle stats collection (pulling data over JDBC, sending data to Graphite, writing data to HDFS). [Diagram shows three Oracle Stats Collection instances]
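The "sending data to Graphite" step can be sketched with Graphite's plaintext protocol, in which each metric is one line of the form `path value timestamp`, conventionally sent to the Carbon listener on TCP port 2003. This is a minimal sketch, not the collector used in the talk: the metric path, host, and port defaults are assumptions, and the JDBC pull and HDFS write are left out.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: 'path value timestamp'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metrics(lines, host="localhost", port=2003):
    """Send pre-formatted metric lines to a Carbon plaintext listener.

    The host and port are assumptions; 2003 is the conventional default.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Hypothetical metric name for an Oracle session count sampled by the collector.
line = format_metric("oracle.db1.sessions.active", 42, timestamp=1401364800)
```

Batching several lines into one `send_metrics` call keeps the number of connections low when many statistics are sampled per polling interval.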
  • 25. FORTUNE 500 RETAIL COMPANY: Enabling Log Collection & Search [Diagram labels: Application Servers; statsd; http; Log Search (http)]
  • 26. Questions? Yes, We’re Hiring: svds.com/join-us
  • 27. THANK YOU. Stephen O’Sullivan (@steveos)