SlideShare a Scribd company logo
About the BDW Meetup
• Big Data is a complex, rapidly changing
landscape
• We want to share our stories and hear
about yours
• Great networking opportunity for like
minded data nerds
• Opportunities to collaborate on exciting
projects
• Founded by Caserta Concepts, DW, BI &
Big Data Analytics Consulting
• Next BDW Meetup: September 23
About Caserta Concepts
• Financial Services
• Healthcare / Insurance
• Retail / eCommerce
• Digital Media / Marketing
• K-12 / Higher Education
Industries Served
• President: Joe Caserta, industry thought leader,
consultant, educator and co-author, The Data
Warehouse ETL Toolkit (Wiley, 2004)
Founded in 2001
• Big Data Analytics
• Data Warehousing
• Business Intelligence
• Strategic Data
Ecosystems
Focused
Expertise
Implementation Expertise & Offerings
Strategic Roadmap/
Assessment/Consulting
Database
BI/Visualization/
Analytics
Master Data Management
Big Data
Analytics
Storm
WHY NOSQL MATTERS… FOR ANALYTICS
Elliott Cordo
Principal Consultant, Caserta Concepts
NOSQL So what?
• NoSQL is one of the most exciting movements in BIG DATA.
• NoSQL is changing the way a lot of people think about
application development  especially analytic applications
• Not all data is efficiently stored or processed in a relational DB.
• High data volumes
• Data does not fit or require relational model
• We have new tools in our arsenal for processing, storing, and
analyzing data with these new challenges
But we love SQL
• Relational databases still have their place
• Flexible
• Rich Query Syntax
• THEY HAVE JOINS AND AGGREGATION!!
• The relational DB is great at being general purpose!
• You build nice normalized structure, establish the logical
relationships and then you can build any query you need for your
application.
• This has kept us happy in the Data Warehousing (and
app-dev) world for decades!
Scale and Performance
Performance:
• Relational databases have a lot of features, overhead that
we in many cases don’t need.
Scale out:
• Most relational databases scale vertically giving them
limits to how large they can get. Federation and Sharding
is an awkward manual process.
• Most NoSQL scale horizontally on commodity hardware
But what will we sacrifice?
• Query Features:
• NoSQL DB’s have fairly simple query languages. Limited or no support for
the following outside of map reduce:
• Joins
• Aggregation
Why? - NoSQL databases were born to be high performance.
Data is stored as it is to be used (tuned to a query) rather than modeled
around entities. So a sophisticated query language is not needed.
• BI and ETL tool support limited
So what about NoSQL for Analytics?
• NoSQL databases are generally not as flexible as relational databases
for ad-hoc questions.
• Secondary indexes provide some flexibility but lack of Joins generally
requires denormalization
• Materialized views: Joins and aggregates can be implemented via Map
Reduce  However materializing the world has it’s drawbacks!
• A different way of doing things:
• Client-side join: query “dimensions” get key  query “fact” on secondary index.
• Link walking  leverage metadata link between entities  Riak!
• Ad-hoc Map Reduce jobs
• Aggregate Navigation  navigate from aggregate entities of different grain
• Search!
Limited native
BI Tool Support!
NoSQL can be a great fit for analytic
applications!!
• High volumes/Low Latency analytic
environments
• Queries are largely known and can be
precomuted in-stream (via application itself or
Storm) or in batch using Map Reduce
• Sweet spot is very high volumes with relatively
static analytic requirements.
• Common Design Pattern:
• Compute aggregates and events in-line and
store to aggregate entities in NoSQL
• Write enriched detail records to NoSQL or
Hadoop for further processing
RDBMS NoSQL
Volume
QueryFlexibility
BIG ETL
• One of the promises of BIG DATA is being able to enrich,
process, enormous data volumes
• The processing engines:
• Storm  inline, real-time processing
• Hadoop  batch processing
• NoSQL can play an integral part of this architecture:
• Distributed Lookup Cache
• Shared State
• Queueing Mechanism
Storm Topology
Data
Sources
Relational
EDW
Analytic Data-stores
NoSQL databases are like snowflakes or
Smurfs..
They are all special and no two are alike!
Let’s review the main categories and determine their
general fit for analytic applications
• Platforms: Riak, Redis
• Buckets/Collections are equivalent to a table in a RDMS
• Primary unit of storage is a key value pair, the value can be
anything ranging from number to a JSON Document.
• Key value stores are super fast and simple
• Analytic Capabilities:
• Although many have very spartan feature sets some platforms like Riak
have analytic friendly Links, Tags Metadata, and powerful map reduce
capabilities!
• Generally writes and reads are ultra-fast  good candidate for a BIG
ETL component
Key Value
• Platforms: Cassandra, HBase
• Column families are the equivalent to a table in a RDMS
• Primary unit of storage is a column, they are stored contiguously
Skinny Rows: Most like relational database. Except columns are optional and not
stored if omitted:
Wide Rows: Rows can be billions of columns wide, used for time series,
relationships, analytics!
Analytic Capabilities: Widely used in analytic applications Typical analytic
design pattern is to use “Skinny Rows” for detail records, “Wide Rows” for
aggregates.
Columnar
Document
• Platforms: MongoDB, CouchDB
• Collections are the equivalent to a table in a RDMS
• Primary unit of storage is a document
{ “User" : ”Bobby”,
“Email”: bobby@db-lover.com,
“Channel”: “Web”,
“State”: “NJ” }
{ “User" : ”Susie”,
“Email”: “Susie@sql-enthusiast.com”,
“PreferredCategories: [
{ Category: “Fashion”,
CategoryAdded: “2012-01-01” },
{ Category: “Outdoor Equipment”,
CategoryAdded: “2013-01-01” } ],
“Channel”: In-Store }
Analytic Capabilities: most similar to relational in function but requires
denormalization. Secondary index support, map reduce. Mongo has a cool
new aggregation framework.
In the Real World: High Volume Sensor
Analytics
• Ingestion and analytics of Sensor Data
• 6 to 12 BILLION records being ingested daily (average
500k records per second at peek load)!
• Ingested data must be stored to disk and highly available
• Pre-defined aggregates and event monitors must be near
real-time
• Ad-hoc query capabilities required on historical data
One way to do it.. that worked
Storm Cluster
Sensor
Data
d3.js Analytics
Hadoop Cluster
Low Latency
Analytics
Atomic data
Aggregates
Event Monitors
• The Kafka messaging system is used for ingestion
• Storm is used for real-time ETL and outputs atomic data
and derived data needed for analytics
• Redis is used as a reference data lookup cache
• Real time analytics are produced from the aggregated
data.
• Higher latency ad-hoc analytics are done in Hadoop
using Pig and Hive
Kafka
Parting Thought
Polyglot Persistence – “where any decent sized
enterprise will have a variety of different data storage
technologies for different kinds of data. There will still
be large amounts of it managed in relational stores,
but increasingly we'll be first asking how we want to
manipulate the data and only then figuring out what
technology is the best bet for it.”
-- Martin Fowler
Contact
Elliott Cordo
Principal Consultant, Caserta
Concepts
P: (855) 755-2246 x267
E: elliott@casertaconcepts.com
info@casertaconcepts.com
1(855) 755-2246
www.casertaconcepts.com

More Related Content

What's hot

Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...DATAVERSITY
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB James Serra
 
Data Lake, Virtual Database, or Data Hub - How to Choose?
Data Lake, Virtual Database, or Data Hub - How to Choose?Data Lake, Virtual Database, or Data Hub - How to Choose?
Data Lake, Virtual Database, or Data Hub - How to Choose?DATAVERSITY
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...Institute of Contemporary Sciences
 
NoSQL Simplified: Schema vs. Schema-less
NoSQL Simplified: Schema vs. Schema-lessNoSQL Simplified: Schema vs. Schema-less
NoSQL Simplified: Schema vs. Schema-lessInfiniteGraph
 
Enabling digital transformation api ecosystems and data virtualization
Enabling digital transformation   api ecosystems and data virtualizationEnabling digital transformation   api ecosystems and data virtualization
Enabling digital transformation api ecosystems and data virtualizationDenodo
 
Top 5 Data Architecture Challenges with Ron Huizenga
Top 5 Data Architecture Challenges with Ron HuizengaTop 5 Data Architecture Challenges with Ron Huizenga
Top 5 Data Architecture Challenges with Ron HuizengaEmbarcadero Technologies
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationMongoDB
 
Technical Demonstration - Denodo Platform 7.0
Technical Demonstration - Denodo Platform 7.0Technical Demonstration - Denodo Platform 7.0
Technical Demonstration - Denodo Platform 7.0Denodo
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Data Con LA
 
Ovum Fireside Chat: Governing the data lake - Understanding what's in there
Ovum Fireside Chat: Governing the data lake - Understanding what's in thereOvum Fireside Chat: Governing the data lake - Understanding what's in there
Ovum Fireside Chat: Governing the data lake - Understanding what's in thereZaloni
 
Webinar -Data Warehouse Augmentation: Cut Costs, Increase Power
Webinar -Data Warehouse Augmentation: Cut Costs, Increase PowerWebinar -Data Warehouse Augmentation: Cut Costs, Increase Power
Webinar -Data Warehouse Augmentation: Cut Costs, Increase PowerZaloni
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefitsRicky Barron
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Amazon Web Services
 
Best Practices to Deliver BI Solutions
Best Practices to Deliver BI SolutionsBest Practices to Deliver BI Solutions
Best Practices to Deliver BI SolutionsJames Serra
 
JBoss Enterprise Data Services (Data Virtualization)
JBoss Enterprise Data Services (Data Virtualization)JBoss Enterprise Data Services (Data Virtualization)
JBoss Enterprise Data Services (Data Virtualization)plarsen67
 
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsQuontra Solutions
 
Webinar - Data Lake Management: Extending Storage and Lifecycle of Data
Webinar - Data Lake Management: Extending Storage and Lifecycle of DataWebinar - Data Lake Management: Extending Storage and Lifecycle of Data
Webinar - Data Lake Management: Extending Storage and Lifecycle of DataZaloni
 
Data Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureData Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureLorenzo Nicora
 

What's hot (20)

Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
 
Data Lake, Virtual Database, or Data Hub - How to Choose?
Data Lake, Virtual Database, or Data Hub - How to Choose?Data Lake, Virtual Database, or Data Hub - How to Choose?
Data Lake, Virtual Database, or Data Hub - How to Choose?
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 
NoSQL Simplified: Schema vs. Schema-less
NoSQL Simplified: Schema vs. Schema-lessNoSQL Simplified: Schema vs. Schema-less
NoSQL Simplified: Schema vs. Schema-less
 
Enabling digital transformation api ecosystems and data virtualization
Enabling digital transformation   api ecosystems and data virtualizationEnabling digital transformation   api ecosystems and data virtualization
Enabling digital transformation api ecosystems and data virtualization
 
Top 5 Data Architecture Challenges with Ron Huizenga
Top 5 Data Architecture Challenges with Ron HuizengaTop 5 Data Architecture Challenges with Ron Huizenga
Top 5 Data Architecture Challenges with Ron Huizenga
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital Transformation
 
Technical Demonstration - Denodo Platform 7.0
Technical Demonstration - Denodo Platform 7.0Technical Demonstration - Denodo Platform 7.0
Technical Demonstration - Denodo Platform 7.0
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
 
Ovum Fireside Chat: Governing the data lake - Understanding what's in there
Ovum Fireside Chat: Governing the data lake - Understanding what's in thereOvum Fireside Chat: Governing the data lake - Understanding what's in there
Ovum Fireside Chat: Governing the data lake - Understanding what's in there
 
Webinar -Data Warehouse Augmentation: Cut Costs, Increase Power
Webinar -Data Warehouse Augmentation: Cut Costs, Increase PowerWebinar -Data Warehouse Augmentation: Cut Costs, Increase Power
Webinar -Data Warehouse Augmentation: Cut Costs, Increase Power
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
 
Best Practices to Deliver BI Solutions
Best Practices to Deliver BI SolutionsBest Practices to Deliver BI Solutions
Best Practices to Deliver BI Solutions
 
JBoss Enterprise Data Services (Data Virtualization)
JBoss Enterprise Data Services (Data Virtualization)JBoss Enterprise Data Services (Data Virtualization)
JBoss Enterprise Data Services (Data Virtualization)
 
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra Solutions
 
Webinar - Data Lake Management: Extending Storage and Lifecycle of Data
Webinar - Data Lake Management: Extending Storage and Lifecycle of DataWebinar - Data Lake Management: Extending Storage and Lifecycle of Data
Webinar - Data Lake Management: Extending Storage and Lifecycle of Data
 
Data Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureData Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and Future
 
How to build a successful Data Lake
How to build a successful Data LakeHow to build a successful Data Lake
How to build a successful Data Lake
 

Viewers also liked

What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsWhat Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsCloudera, Inc.
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Caserta
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoophadooparchbook
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data WarehouseCaserta
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with DataSeth Familian
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017Drift
 
TEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of WorkTEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of WorkVolker Hirsch
 

Viewers also liked (7)

What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsWhat Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoop
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data Warehouse
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with Data
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017
 
TEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of WorkTEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of Work
 

Similar to Big Data Warehousing Meetup with Riak

NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabasesAdi Challa
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...MongoDB
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7abdulrahmanhelan
 
Startup Bootcamp - Intro to NoSQL/Big Data by DataZone
Startup Bootcamp - Intro to NoSQL/Big Data by DataZoneStartup Bootcamp - Intro to NoSQL/Big Data by DataZone
Startup Bootcamp - Intro to NoSQL/Big Data by DataZoneIdan Tohami
 
Oracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data ArchitectureOracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data ArchitectureArthur Gimpel
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Denodo
 
Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsCaserta
 
No sql now2011_review_of_adhoc_architectures
No sql now2011_review_of_adhoc_architecturesNo sql now2011_review_of_adhoc_architectures
No sql now2011_review_of_adhoc_architecturesNicholas Goodman
 
Introduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBAhmed Farag
 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3xKinAnx
 
Gilbane Boston 2011 big data
Gilbane Boston 2011 big dataGilbane Boston 2011 big data
Gilbane Boston 2011 big dataPeter O'Kelly
 
Evolution of Distributed Database Technologies in the Digital era
Evolution of Distributed Database Technologies in the Digital eraEvolution of Distributed Database Technologies in the Digital era
Evolution of Distributed Database Technologies in the Digital eraVishal Puri
 
Business Intelligence & NoSQL Databases
Business Intelligence & NoSQL DatabasesBusiness Intelligence & NoSQL Databases
Business Intelligence & NoSQL DatabasesRadhoueneRouached
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataAshnikbiz
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
Introduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQLIntroduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQLTushar Shende
 

Similar to Big Data Warehousing Meetup with Riak (20)

NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
 
UNIT-2.pptx
UNIT-2.pptxUNIT-2.pptx
UNIT-2.pptx
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7
 
Startup Bootcamp - Intro to NoSQL/Big Data by DataZone
Startup Bootcamp - Intro to NoSQL/Big Data by DataZoneStartup Bootcamp - Intro to NoSQL/Big Data by DataZone
Startup Bootcamp - Intro to NoSQL/Big Data by DataZone
 
NoSQL
NoSQLNoSQL
NoSQL
 
Oracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data ArchitectureOracle Week 2016 - Modern Data Architecture
Oracle Week 2016 - Modern Data Architecture
 
A peek into the future
A peek into the futureA peek into the future
A peek into the future
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
 
Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment Options
 
No sql now2011_review_of_adhoc_architectures
No sql now2011_review_of_adhoc_architecturesNo sql now2011_review_of_adhoc_architectures
No sql now2011_review_of_adhoc_architectures
 
Introduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDB
 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3
 
Gilbane Boston 2011 big data
Gilbane Boston 2011 big dataGilbane Boston 2011 big data
Gilbane Boston 2011 big data
 
Evolution of Distributed Database Technologies in the Digital era
Evolution of Distributed Database Technologies in the Digital eraEvolution of Distributed Database Technologies in the Digital era
Evolution of Distributed Database Technologies in the Digital era
 
Business Intelligence & NoSQL Databases
Business Intelligence & NoSQL DatabasesBusiness Intelligence & NoSQL Databases
Business Intelligence & NoSQL Databases
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
NoSQL.pptx
NoSQL.pptxNoSQL.pptx
NoSQL.pptx
 
Introduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQLIntroduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQL
 

More from Caserta

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingCaserta
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Caserta
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Caserta
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Caserta
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseCaserta
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Caserta
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?Caserta
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation Caserta
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for EveryoneCaserta
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure CloudCaserta
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the CloudCaserta
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by DatabricksCaserta
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 

More from Caserta (20)

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven Marketing
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's Enterprise
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 

Recently uploaded

Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Thierry Lestable
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...Product School
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...CzechDreamin
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
The architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfThe architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfalexjohnson7307
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...CzechDreamin
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Product School
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsStefano
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1DianaGray10
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka DoktorováCzechDreamin
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀DianaGray10
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024Stephanie Beckett
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2DianaGray10
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 

Recently uploaded (20)

Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
The architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfThe architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdf
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 

Big Data Warehousing Meetup with Riak

  • 1. About the BDW Meetup • Big Data is a complex, rapidly changing landscape • We want to share our stories and hear about yours • Great networking opportunity for like minded data nerds • Opportunities to collaborate on exciting projects • Founded by Caserta Concepts, DW, BI & Big Data Analytics Consulting • Next BDW Meetup: September 23
  • 2. About Caserta Concepts • Financial Services • Healthcare / Insurance • Retail / eCommerce • Digital Media / Marketing • K-12 / Higher Education Industries Served • President: Joe Caserta, industry thought leader, consultant, educator and co-author, The Data Warehouse ETL Toolkit (Wiley, 2004) Founded in 2001 • Big Data Analytics • Data Warehousing • Business Intelligence • Strategic Data Ecosystems Focused Expertise
  • 3. Implementation Expertise & Offerings Strategic Roadmap/ Assessment/Consulting Database BI/Visualization/ Analytics Master Data Management Big Data Analytics Storm
  • 4. WHY NOSQL MATTERS… FOR ANALYTICS Elliott Cordo Principal Consultant, Caserta Concepts
  • 5. NOSQL So what? • NoSQL is one of the most exciting movements in BIG DATA. • NoSQL is changing the way a lot of people think about application development  especially analytic applications • Not all data is efficiently stored or processed in a relational DB. • High data volumes • Data does not fit or require relational model • We have new tools in our arsenal for processing, storing, and analyzing data with these new challenges
  • 6. But we love SQL • Relational databases still have their place • Flexible • Rich Query Syntax • THEY HAVE JOINS AND AGGREGATION!! • The relational DB is great at being general purpose! • You build nice normalized structure, establish the logical relationships and then you can build any query you need for your application. • This has kept us happy in the Data Warehousing (and app-dev) world for decades!
  • 7. Scale and Performance Performance: • Relational databases have a lot of features, overhead that we in many cases don’t need. Scale out: • Most relational databases scale vertically giving them limits to how large they can get. Federation and Sharding is an awkward manual process. • Most NoSQL scale horizontally on commodity hardware
  • 8. But what will we sacrifice? • Query Features: • NoSQL DB’s have fairly simple query languages. Limited or no support for the following outside of map reduce: • Joins • Aggregation Why? - NoSQL databases were born to be high performance. Data is stored as it is to be used (tuned to a query) rather than modeled around entities. So a sophisticated query language is not needed. • BI and ETL tool support limited
  • 9. So what about NoSQL for Analytics? • NoSQL databases are generally not as flexible as relational databases for ad-hoc questions. • Secondary indexes provide some flexibility but lack of Joins generally requires denormalization • Materialized views: Joins and aggregates can be implemented via Map Reduce  However materializing the world has it’s drawbacks! • A different way of doing things: • Client-side join: query “dimensions” get key  query “fact” on secondary index. • Link walking  leverage metadata link between entities  Riak! • Ad-hoc Map Reduce jobs • Aggregate Navigation  navigate from aggregate entities of different grain • Search! Limited native BI Tool Support!
  • 10. NoSQL can be a great fit for analytic applications!! • High volumes/Low Latency analytic environments • Queries are largely known and can be precomuted in-stream (via application itself or Storm) or in batch using Map Reduce • Sweet spot is very high volumes with relatively static analytic requirements. • Common Design Pattern: • Compute aggregates and events in-line and store to aggregate entities in NoSQL • Write enriched detail records to NoSQL or Hadoop for further processing RDBMS NoSQL Volume QueryFlexibility
  • 11. BIG ETL • One of the promises of BIG DATA is being able to enrich, process, enormous data volumes • The processing engines: • Storm  inline, real-time processing • Hadoop  batch processing • NoSQL can play an integral part of this architecture: • Distributed Lookup Cache • Shared State • Queueing Mechanism Storm Topology Data Sources Relational EDW Analytic Data-stores
  • 12. NoSQL databases are like snowflakes or Smurfs.. They are all special and no two are alike! Let’s review the main categories and determine their general fit for analytic applications
  • 13. • Platforms: Riak, Redis • Buckets/Collections are equivalent to a table in a RDMS • Primary unit of storage is a key value pair, the value can be anything ranging from number to a JSON Document. • Key value stores are super fast and simple • Analytic Capabilities: • Although many have very spartan feature sets some platforms like Riak have analytic friendly Links, Tags Metadata, and powerful map reduce capabilities! • Generally writes and reads are ultra-fast  good candidate for a BIG ETL component Key Value
  • 14. • Platforms: Cassandra, HBase • Column families are the equivalent to a table in a RDMS • Primary unit of storage is a column, they are stored contiguously Skinny Rows: Most like relational database. Except columns are optional and not stored if omitted: Wide Rows: Rows can be billions of columns wide, used for time series, relationships, analytics! Analytic Capabilities: Widely used in analytic applications Typical analytic design pattern is to use “Skinny Rows” for detail records, “Wide Rows” for aggregates. Columnar
  • 15. Document • Platforms: MongoDB, CouchDB • Collections are the equivalent to a table in a RDMS • Primary unit of storage is a document { “User" : ”Bobby”, “Email”: bobby@db-lover.com, “Channel”: “Web”, “State”: “NJ” } { “User" : ”Susie”, “Email”: “Susie@sql-enthusiast.com”, “PreferredCategories: [ { Category: “Fashion”, CategoryAdded: “2012-01-01” }, { Category: “Outdoor Equipment”, CategoryAdded: “2013-01-01” } ], “Channel”: In-Store } Analytic Capabilities: most similar to relational in function but requires denormalization. Secondary index support, map reduce. Mongo has a cool new aggregation framework.
  • 16. In the Real World: High Volume Sensor Analytics • Ingestion and analytics of Sensor Data • 6 to 12 BILLION records being ingested daily (average 500k records per second at peek load)! • Ingested data must be stored to disk and highly available • Pre-defined aggregates and event monitors must be near real-time • Ad-hoc query capabilities required on historical data
  • 17. One way to do it.. that worked Storm Cluster Sensor Data d3.js Analytics Hadoop Cluster Low Latency Analytics Atomic data Aggregates Event Monitors • The Kafka messaging system is used for ingestion • Storm is used for real-time ETL and outputs atomic data and derived data needed for analytics • Redis is used as a reference data lookup cache • Real time analytics are produced from the aggregated data. • Higher latency ad-hoc analytics are done in Hadoop using Pig and Hive Kafka
  • 18. Parting Thought Polyglot Persistence – “where any decent sized enterprise will have a variety of different data storage technologies for different kinds of data. There will still be large amounts of it managed in relational stores, but increasingly we'll be first asking how we want to manipulate the data and only then figuring out what technology is the best bet for it.” -- Martin Fowler
  • 19. Contact Elliott Cordo Principal Consultant, Caserta Concepts P: (855) 755-2246 x267 E: elliott@casertaconcepts.com info@casertaconcepts.com 1(855) 755-2246 www.casertaconcepts.com