© 2014 Silicon Valley Data Science LLC
All Rights Reserved.
@SVDataScience
Beyond a Big Data Pilot:
Building a Production Data Infrastructure
StampedeCon
29 May 2014, St. Louis
Stephen O’Sullivan (@steveos)
strata.svds.com @SVDataScience
Stephen O’Sullivan
Distinguished Architect
Beyond a Big Data Pilot:
Building a Production Data Infrastructure
Creating a data architecture involves many moving parts. By
examining the data value chain, from ingestion through to analytics,
we will explain how the various parts of the Hadoop and big data
ecosystem fit together to support batch, interactive, and real-time
analytical workloads.
By tracing the flow of data from source to output, we’ll explore the
options and considerations for components, including data
acquisition, ingestion, storage, data services, analytics and data
management. Most importantly, we’ll leave you with a framework for
understanding these options and making choices.
Key-Value
Columnar
Graph
Document
GENERAL
UP OR OUT?
Different use cases put different demands on the data infrastructure
• UC1
• UC2
• UC3
• UC4
• UCn
Increasing cost per unit of capability from scale-up architectures causes rationing of resources. Only the most valuable use cases are pursued.
[Chart labels: Data Resource Usage, Value, scale-out cost; use cases UC1, UC2, UC3, UC4, UCn]
THE DATA VALUE CHAIN
Acquire Ingest Process Persist Integrate Analyze Expose
BUILDING A DATA PLATFORM
[Diagram components: External Systems; Internal Data Sources; External Data Sources; Data Acquisition; Data Ingestion; Data Repository; Persistence; Offline Processing; Real Time Processing; Batch Processing; Analytics; Data Services; Data Management (Security, Operations, Data Quality, Meta Data Management and Data Lineage)]
Acquisition:
from internal and external data sources
Ingestion:
offline and real-time processing
Persistence
Data Services
Exposing data to applications
Analytics
batch and real-time processing
Data Management:
data security, operations, lineage, quality, and metadata management
Use Case
• Collect in-store sales transactions in near real time
• Provide near-real-time dashboards of sales transactions (rolled up by store, region, etc.)
• Provide ad hoc access to this data as soon as it’s collected (i.e., low latency and fine-grained)
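As a rough illustration of the roll-up behind such a dashboard, the sketch below sums sales transactions by store and by region. The record fields (`store_id`, `region`, `amount`) and the sample values are assumptions made for the example; the deck does not describe the actual schema or tooling.

```python
from collections import defaultdict
from decimal import Decimal


def roll_up(transactions, key):
    """Sum transaction amounts grouped by the given field (store, region, ...)."""
    totals = defaultdict(Decimal)  # Decimal() starts each group at 0
    for tx in transactions:
        totals[tx[key]] += tx["amount"]
    return dict(totals)


# Hypothetical sample records; a real feed would arrive continuously.
transactions = [
    {"store_id": "S1", "region": "West", "amount": Decimal("19.99")},
    {"store_id": "S2", "region": "West", "amount": Decimal("5.00")},
    {"store_id": "S3", "region": "East", "amount": Decimal("12.50")},
    {"store_id": "S1", "region": "West", "amount": Decimal("0.01")},
]

by_store = roll_up(transactions, "store_id")
by_region = roll_up(transactions, "region")
```

Keeping the fine-grained transactions alongside the roll-ups is what allows the ad hoc access in the last bullet.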
FORTUNE 500 RETAIL COMPANY
Enabling Near Real-time Sales Transactions
[Diagram labels: APPLICATION SERVERS; DATA CENTER A and DATA CENTER B, each with a BI Server (http)]
FORTUNE 500 RETAIL COMPANY
Enabling Near Real-time Sales Transactions
[Diagram labels: APPLICATION SERVERS; DATA CENTER A and DATA CENTER B, each with CFS and a BI Server (http)]
Ready to go into Production?
• Data Acquisition
– Can you see the “collectors” (internal or external)? Make sure you have the correct network access in place.
– Do you need to encrypt the data (internally or externally)? That will depend on the data and your policies.
• Data Ingestion
– Can you handle the traffic to the “collectors”? Make sure the solution you choose can scale out. Apache Flume is a good example of this.
– Redundant / self-healing paths into the cluster? Make sure you’re not point to point. In Flume, Storm, and Kafka you can configure forks, etc., but you may need to handle duplicate data.
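One common way to handle the duplicates that forked or redundant paths can produce is to make the consumer idempotent. The sketch below is a minimal in-memory version, assuming every event carries a unique `event_id` (a hypothetical field; the deck does not say how events are identified). A production system would need to bound or persist the set of seen IDs.

```python
class Deduplicator:
    """Drops events whose ID has already been seen (in-memory, unbounded)."""

    def __init__(self):
        self._seen = set()

    def accept(self, event):
        """Return True the first time an event ID is seen, False for repeats."""
        event_id = event["event_id"]  # assumed unique per logical event
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True


# The same event arriving over two redundant paths is kept only once.
dedup = Deduplicator()
path_a = [{"event_id": "tx-1", "amount": 10}, {"event_id": "tx-2", "amount": 7}]
path_b = [{"event_id": "tx-1", "amount": 10}, {"event_id": "tx-3", "amount": 4}]

unique = [event for event in path_a + path_b if dedup.accept(event)]
```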
Ready to go into Production?
• Data Repository
– Can you handle out-of-order data? Make sure you have a way to address it, as it will happen.
– Can you scale the cluster for data volume spikes and/or processing spikes? Hadoop and Cassandra make it very easy to add nodes. If you cannot add nodes, be prepared to drop data, stop processes, or both.
– Should I just store plain text (compressed)? If it’s very wide data and you query a subset of the columns, Parquet would be a good choice. If you would like to be able to version your data schema, Avro is a good choice.
• Data Services
– Do applications need to access this data? Build a RESTful service to access the data.
– Do you have data resiliency? “What is data resiliency?” I hear you ask...
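One way to address out-of-order data is to file each record under the time it happened rather than the time it arrived, so late records still land in the right place. The sketch below assumes each record has an `event_time` field (hypothetical) and uses an hourly `dt=/hour=` partition path chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_path(event_time):
    """Build an hourly partition path from the event time, not the arrival time."""
    return event_time.strftime("dt=%Y-%m-%d/hour=%H")


def assign_partitions(records):
    """Group records by event-time partition, regardless of arrival order."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_path(record["event_time"])].append(record)
    return dict(partitions)


# Records listed in arrival order; the second one belongs to the previous hour.
records = [
    {"id": 1, "event_time": datetime(2014, 5, 29, 10, 5, tzinfo=timezone.utc)},
    {"id": 2, "event_time": datetime(2014, 5, 29, 9, 58, tzinfo=timezone.utc)},
    {"id": 3, "event_time": datetime(2014, 5, 29, 10, 7, tzinfo=timezone.utc)},
]

partitions = assign_partitions(records)
```

Any roll-up already computed for an earlier partition would have to be refreshed when a late record lands in it; how to do that depends on the processing engine in use.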
DATA RESILIENCY
Stovepipe: One-to-one relationship from data source to product
Hard Failure: If the data source is broken, so is the app.
Multi-sourced: Redundancy of overlapping data sources makes your products more resilient
Graceful Degradation: If a data source breaks, there is a backup and your app continues to function
Production data services abstract the probabilistic integration of overlapping data sources. We call this model a Data Mesh.
[Diagram labels: Products; Data Sources; Broken Data Sources; Data Services]
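The difference between hard failure and graceful degradation can be sketched as a data service that tries overlapping sources in turn. This is only an illustration: the source functions and the `store-42` key are hypothetical, and a simple ordered fallback stands in for the probabilistic integration the slide describes.

```python
class AllSourcesFailed(Exception):
    """Raised when no configured data source could answer."""


def fetch_with_fallback(sources, key):
    """Try each (name, fetch) source in order; return the first successful answer."""
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch(key)
        except Exception as exc:  # a broken source must not break the app
            errors.append((name, exc))
    raise AllSourcesFailed(errors)


def primary_source(key):
    # Hypothetical source that is currently broken.
    raise ConnectionError("primary feed unavailable")


def backup_source(key):
    # Hypothetical overlapping source holding the same data.
    return {"store-42": 1250}[key]


served_by, value = fetch_with_fallback(
    [("primary", primary_source), ("backup", backup_source)], "store-42"
)
```

With only `primary_source` configured (the stovepipe case), the same call raises `AllSourcesFailed`, which is the hard failure the slide contrasts against.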
Ready to go into Production?
• Analytics
– Which is the right SQL-on-Hadoop solution (for me)? There are a few to choose from: Hive, Impala, Spark SQL, and HAWQ (and the list is growing). They share the same metastore. Some are faster than others, depending on the type of query.
– Which BI tool should I use? See if your current tool works with your distro. You can also look at Platfora, Datameer, and Karmasphere.
– Do I still need to set up business views of the data? Yes, you do, but the benefit is that you still have access to the raw data for the advanced data analyst or data scientist.
– What about deep analytics? Now that you have a data lake, you can take advantage of doing deep analytics on the data without moving it out.
Analytics tools
Ready to go into Production?
• Data Management
– Security (who can see what?) It’s getting there. At the query level, using Hive or Impala, you can use Apache Sentry or Apache Knox. There are also 3rd party tools, like Dataguise, that let you do things like encryption at rest or masking.
– Can you meet your SLA when other jobs / queries are running? Using the “Fair Scheduler” will help you manage your jobs’ SLAs. A 3rd party product by Pepper Data can help with this too (and a little more).
– What monitoring do you have in place?
– Cluster failover?
Adding more use cases
• Am I duplicating data?
• Can I reuse the infrastructure I’ve already created?
• Do I have enough room in the cluster (space/processing)?
• Will I impact the SLAs of jobs/queries currently running?
FORTUNE 500 RETAIL COMPANY
Enabling Real Time Database Monitoring
HIGH LEVEL ARCHITECTURE
[Diagram labels: Oracle Stats Collection (several instances); Pulling data over jdbc; Sending data to Graphite; Writing data to HDFS]
FORTUNE 500 RETAIL COMPANY
Enabling Log Collection & Search
[Diagram labels: APPLICATION SERVERS; statsd (http); Log Search (http)]
questions
Yes, We’re Hiring
svds.com/join-us
THANK YOU
Stephen O’Sullivan @steveos
27

More Related Content

What's hot

Actian forrester- hortonworks
Actian   forrester- hortonworksActian   forrester- hortonworks
Actian forrester- hortonworks
Hortonworks
 
Big Data Discovery
Big Data DiscoveryBig Data Discovery
Big Data Discovery
Harald Erb
 
Developing a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceDeveloping a Strategy for Data Lake Governance
Developing a Strategy for Data Lake Governance
Tony Baer
 
Better Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraBetter Together: The New Data Management Orchestra
Better Together: The New Data Management Orchestra
Cloudera, Inc.
 
Harnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case StudyHarnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case Study
DataWorks Summit
 
Hadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHAHadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHA
Hortonworks
 
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
DataWorks Summit/Hadoop Summit
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Technologies
 
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Hortonworks
 
IDC Retail Insights - What's Possible with a Modern Data Architecture?
IDC Retail Insights - What's Possible with a Modern Data Architecture?IDC Retail Insights - What's Possible with a Modern Data Architecture?
IDC Retail Insights - What's Possible with a Modern Data Architecture?
Hortonworks
 
Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...
Cloudera, Inc.
 
Big data and its impact on SOA
Big data and its impact on SOABig data and its impact on SOA
Big data and its impact on SOA
Demed L'Her
 
Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25
Hortonworks
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
Mark Kerzner
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with Cloudera
Caserta
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San Jose
Jeffrey T. Pollock
 
Hortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataHortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your data
Scott Clinton
 
8 from zero to insight with real time big data
8 from zero to insight with real time big data8 from zero to insight with real time big data
8 from zero to insight with real time big data
Dr. Wilfred Lin (Ph.D.)
 
A Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision MedicineA Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision Medicine
Cloudera, Inc.
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Cloudera, Inc.
 

What's hot (20)

Actian forrester- hortonworks
Actian   forrester- hortonworksActian   forrester- hortonworks
Actian forrester- hortonworks
 
Big Data Discovery
Big Data DiscoveryBig Data Discovery
Big Data Discovery
 
Developing a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceDeveloping a Strategy for Data Lake Governance
Developing a Strategy for Data Lake Governance
 
Better Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraBetter Together: The New Data Management Orchestra
Better Together: The New Data Management Orchestra
 
Harnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case StudyHarnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case Study
 
Hadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHAHadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHA
 
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
 
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
 
IDC Retail Insights - What's Possible with a Modern Data Architecture?
IDC Retail Insights - What's Possible with a Modern Data Architecture?IDC Retail Insights - What's Possible with a Modern Data Architecture?
IDC Retail Insights - What's Possible with a Modern Data Architecture?
 
Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...
 
Big data and its impact on SOA
Big data and its impact on SOABig data and its impact on SOA
Big data and its impact on SOA
 
Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with Cloudera
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San Jose
 
Hortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataHortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your data
 
8 from zero to insight with real time big data
8 from zero to insight with real time big data8 from zero to insight with real time big data
8 from zero to insight with real time big data
 
A Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision MedicineA Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision Medicine
 
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl...
 

Viewers also liked

Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
James Serra
 
Café da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with HadoopCafé da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
OCTO Technology
 
Big Data Asset Maturity Model
Big Data Asset Maturity ModelBig Data Asset Maturity Model
Big Data Asset Maturity Model
noahwong
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Sumeet Singh
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
Uwe Printz
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
Uwe Printz
 
Apache Spark
Apache SparkApache Spark
Apache Spark
Uwe Printz
 
Unicom Big Data Conference
Unicom  Big Data ConferenceUnicom  Big Data Conference
Unicom Big Data Conference
Samudra Kanankearachchi
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
boorad
 
IT Operating Model
IT Operating ModelIT Operating Model
IT Operating Model
anusharaju38
 
How to create new business models with Big Data and Analytics
How to create new business models with Big Data and AnalyticsHow to create new business models with Big Data and Analytics
How to create new business models with Big Data and Analytics
Aki Balogh
 
8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy
Silicon Valley Data Science
 
How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model
DATUM LLC
 

Viewers also liked (13)

Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
Café da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with HadoopCafé da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
Café da manhã - São Paulo - Use-cases and opportunities in BigData with Hadoop
 
Big Data Asset Maturity Model
Big Data Asset Maturity ModelBig Data Asset Maturity Model
Big Data Asset Maturity Model
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
Unicom Big Data Conference
Unicom  Big Data ConferenceUnicom  Big Data Conference
Unicom Big Data Conference
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
IT Operating Model
IT Operating ModelIT Operating Model
IT Operating Model
 
How to create new business models with Big Data and Analytics
How to create new business models with Big Data and AnalyticsHow to create new business models with Big Data and Analytics
How to create new business models with Big Data and Analytics
 
8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy
 
How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model
 

Similar to Beyond a Big Data Pilot: Building a Production Data Infrastructure - StampedeCon 2014

Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Scott Mitchell
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
South West Data Meetup
 
The New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationThe New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data Exploration
Inside Analysis
 
Sqrrl March Webinar: How to Build a Big App
Sqrrl March Webinar: How to Build a Big AppSqrrl March Webinar: How to Build a Big App
Sqrrl March Webinar: How to Build a Big App
Sqrrl
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
Cloudera, Inc.
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
Eric Kavanagh
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
DataWorks Summit/Hadoop Summit
 
Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)
Jeffrey T. Pollock
 
Data Governance for Data Lakes
Data Governance for Data LakesData Governance for Data Lakes
Data Governance for Data Lakes
Kiran Kamreddy
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
Cloudera, Inc.
 
Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?
Inside Analysis
 
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostHow to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
AtScale
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
Hortonworks
 
Oracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast ChartsOracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast Charts
Jeffrey T. Pollock
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto Meetup
Hortonworks
 
Level Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop AccelerationLevel Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop Acceleration
Inside Analysis
 
Extending BI with Big Data Analytics
Extending BI with Big Data AnalyticsExtending BI with Big Data Analytics
Extending BI with Big Data Analytics
Datameer
 
4. Big data & analytics HP
4. Big data & analytics HP4. Big data & analytics HP
4. Big data & analytics HP
MITEF México
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
Sai Paravastu
 

Similar to Beyond a Big Data Pilot: Building a Production Data Infrastructure - StampedeCon 2014 (20)

Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
 
The New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationThe New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data Exploration
 
Sqrrl March Webinar: How to Build a Big App
Sqrrl March Webinar: How to Build a Big AppSqrrl March Webinar: How to Build a Big App
Sqrrl March Webinar: How to Build a Big App
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
 
Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)
 
Data Governance for Data Lakes
Data Governance for Data LakesData Governance for Data Lakes
Data Governance for Data Lakes
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
 
Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?
 
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostHow to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
 
Oracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast ChartsOracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast Charts
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto Meetup
 
Level Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop AccelerationLevel Up – How to Achieve Hadoop Acceleration
Level Up – How to Achieve Hadoop Acceleration
 
Extending BI with Big Data Analytics
Extending BI with Big Data AnalyticsExtending BI with Big Data Analytics
Extending BI with Big Data Analytics
 
4. Big data & analytics HP
4. Big data & analytics HP4. Big data & analytics HP
4. Big data & analytics HP
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 

Beyond a Big Data Pilot: Building a Production Data Infrastructure - StampedeCon 2014

  • 1. © 2014 Silicon Valley Data Science LLC All Rights Reserved. @SVDataScience Beyond a Big Data Pilot: Building a Production Data Infrastructure StampedeCon 29 May 2014, St. Louis Stephen O’Sullivan (@steveos) strata.svds.com @SVDataScience
  • 2. Stephen O’Sullivan, Distinguished Architect
  • 3. Beyond a Big Data Pilot: Building a Production Data Infrastructure. Creating a data architecture involves many moving parts. By examining the data value chain, from ingestion through to analytics, we will explain how the various parts of the Hadoop and big data ecosystem fit together to support batch, interactive and realtime analytical workloads. By tracing the flow of data from source to output, we’ll explore the options and considerations for components, including data acquisition, ingestion, storage, data services, analytics and data management. Most importantly, we’ll leave you with a framework for understanding these options and making choices.
  • 4. [Diagram labels: Key-Value, Columnar, Graph, Document, General]
  • 5. UP OR OUT? Different use cases (UC1, UC2, UC3, UC4, ... UCn) put different demands on the data infrastructure. Increasing cost per unit of capability from scale-up architectures causes rationing of resources. Only the most valuable use cases are pursued. [Chart labels: Data Resource Usage, Value, scale-out cost, UC1, UC2, UC3, UC4, UCn]
  • 6. THE DATA VALUE CHAIN: Acquire, Ingest, Process, Persist, Integrate, Analyze, Expose
  • 7. BUILDING A DATA PLATFORM [Architecture diagram labels: External Systems, Internal Data Sources, External Data Sources, Data Acquisition, Data Ingestion, Offline Processing, Real Time Processing, Data Repository, Persistence, Batch Processing, Data Services, Analytics, Data Management (Security, Operations, Data Quality, Meta Data Management and Data Lineage)]
  • 8. Acquisition: from internal and external data sources [platform diagram from slide 7 repeated]
  • 9. Ingestion: offline and real-time processing [platform diagram from slide 7 repeated]
  • 10. Persistence [platform diagram from slide 7 repeated]
  • 11. Data Services: exposing data to applications [platform diagram from slide 7 repeated]
  • 12. Analytics: batch and real-time processing [platform diagram from slide 7 repeated]
  • 13. Data Management: data security, operations, lineage, quality, and metadata management [platform diagram from slide 7 repeated]
  • 14. Use Case • Collect in-store sales transactions in near real-time • Provide near real-time dashboards of sales transactions (roll up by store, region, etc.) • Provide ad-hoc access to this data as soon as it’s collected (i.e. low latency and fine-grained)
  • 15. FORTUNE 500 RETAIL COMPANY: Enabling Near Real-time Sales Transactions [Diagram labels: APPLICATION SERVERS; DATA CENTER A; DATA CENTER B; BI Server, http (twice)]
  • 16. FORTUNE 500 RETAIL COMPANY: Enabling Near Real-time Sales Transactions [Diagram labels: APPLICATION SERVERS; DATA CENTER A; DATA CENTER B; CFS, BI Server, http (twice)]
  • 17. Ready to go into Production? • Data Acquisition – Can you see the “collectors” (internal or external)? Make sure you have the correct network access in place. – Do you need to encrypt the data (internally or externally)? It will depend on the data and your policies. • Data Ingestion – Can you handle the traffic to the “collectors”? Make sure the solution you choose can scale out. Apache Flume is a good example of this. – Redundant / self-healing paths into the cluster? Make sure you’re not point to point. In Flume, Storm, and Kafka you can configure forks etc., but you may need to handle duplicate data.
  • 18. Ready to go into Production? • Data Repository – Can you handle out-of-order data? Make sure you have a way to address it, as it will happen. – Can you scale the cluster for data volume spikes and/or processing spikes? Hadoop and Cassandra make it very easy to add nodes. If you cannot add nodes, be prepared to drop data or stop processes or both. – Should I just store plain text (compressed)? If it’s very wide data, and you query a subset of the columns, Parquet would be a good choice. If you would like to be able to version your data schema, Avro is a good choice. • Data Services – Do applications need to access this data? Build a RESTful service to access the data. – Do you have data resiliency? What is data resiliency, I hear you ask...
  • 19. DATA RESILIENCY • Stovepipe: One-to-one relationship from data source to product. • Hard Failure: If the data source is broken, so is the app. • Multi-sourced: Redundancy of overlapping data sources makes your products more resilient. • Graceful Degradation: If a data source breaks, there is a backup and your app continues to function. Production data services abstract the probabilistic integration of overlapping data sources. We call this model a Data Mesh. [Diagram labels: Products, Data Services, Data Sources, Broken Data Sources]
  • 20. Ready to go into Production? • Analytics – Which is the right SQL-on-Hadoop solution (for me)? There are a few to choose from: Hive, Impala, Spark SQL, and HAWQ (and growing). They share the same metastore. Some are faster than others (it depends on the type of query). – Which BI tool should I use? See if the current tool works with your distro. You can also look at Platfora, Datameer, and Karmasphere. – Do I still need to set up business views of the data? Yes you do, but the benefit is you still have access to the raw data for the advanced data analyst or data scientist. – What about deep analytics? Now that you have a data lake, you can take advantage of doing deep analytics on the data without moving it out.
  • 21. Analytics tools
  • 22. Ready to go into Production? • Data Management – Security (who can see what?) It’s getting there. At the query level, using Hive or Impala, you can use Apache Sentry or Apache Knox. There are other 3rd party tools like Dataguise that let you do things like encryption at rest, or masking. – Can you meet your SLA when other jobs / queries are running? Using “Fair Scheduler” will help you manage your jobs’ SLAs. A 3rd party product by Pepper Data can help with this too (and a little more). – What monitoring do you have in place? – Cluster failover?
  • 23. Adding more use cases • Am I duplicating data? • Can I reuse the infrastructure I’ve already created? • Do I have enough room in the cluster (space/processing)? • Will I impact the SLAs of jobs/queries currently running?
  • 24. FORTUNE 500 RETAIL COMPANY: Enabling Real Time Database Monitoring. HIGH LEVEL ARCHITECTURE: Oracle Stats Collection pulling data over JDBC, sending data to Graphite, and writing data to HDFS.
  • 25. FORTUNE 500 RETAIL COMPANY: Enabling Log Collection & Search [Diagram labels: APPLICATION SERVERS, statsd, http, Log Search, http]
  • 26. Questions. Yes, We’re Hiring: svds.com/join-us
  • 27. THANK YOU. Stephen O’Sullivan, @steveos
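
The contrast on slide 19 between a stovepipe (hard failure) and a multi-sourced data service (graceful degradation) can be illustrated with a minimal Python sketch. This is not code from the talk: the function and feed names (`resilient_lookup`, `broken_feed`, `backup_feed`) and the sample value are hypothetical, and the sketch uses a simple ordered fallback across overlapping sources rather than the probabilistic integration the slide mentions.

```python
from typing import Callable, Optional, Sequence

# Assumption for illustration: a data source is any callable that returns a
# value for a key, or raises an exception when the source is broken.
Source = Callable[[str], float]


def resilient_lookup(key: str, sources: Sequence[Source]) -> Optional[float]:
    """Return the first answer from overlapping sources, or None if all fail."""
    for source in sources:
        try:
            return source(key)
        except Exception:
            # A broken source should not take the product down; try the next one.
            continue
    return None


def broken_feed(key: str) -> float:
    # Hypothetical primary source that is currently unavailable.
    raise ConnectionError("primary feed is down")


def backup_feed(key: str) -> float:
    # Hypothetical overlapping source holding the same figure.
    return {"store-42": 1250.0}[key]


# Stovepipe: a single source, so a broken source means no answer.
stovepipe = resilient_lookup("store-42", [broken_feed])

# Multi-sourced: the backup answers when the primary is broken.
multi = resilient_lookup("store-42", [broken_feed, backup_feed])
```

With one source the lookup yields nothing when that source breaks; with two overlapping sources the application keeps functioning, which is the behaviour the slide calls graceful degradation.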