Datameer and Azure
Analyze Big Data on the Cloud
Big Data Warehousing:
May 10, 2016
Today’s Topic:
Big Data Analytics on the Cloud
Presented by:
Joe Caserta
President
Caserta Concepts
Nikhil Kumar
Sr. Solutions Engineer
Datameer
Stefan Groschupf
CEO & Chairman
Datameer
James Serra
Data Platform Solution Architect
Microsoft
Agenda
6:30 Networking
Grab some food and drink... Make some friends.
6:50 Joe Caserta
President, Caserta Concepts
Welcome + Intro to BDW Meetup
About the Meetup. Why MDM needs Graph now.
7:15 Nikhil Kumar
Sr. Solutions Engineer, Datameer
Solution use cases and technical demonstration
7:45 Stefan Groschupf
CEO & Chairman, Datameer
Evolving trends in Hadoop-based analytics and the role of cloud computing
8:15 James Serra
Data Platform Solution Architect, Microsoft
Benefits of the Azure Cloud Service
8:45 Q&A
Ask questions, share your experience
About Caserta Concepts
• Consulting Data Innovation and Modern Data Engineering
• Award-winning company
• Internationally recognized workforce
• Strategy, Architecture, Implementation, Governance
• Innovation Partner
• Strategic Consulting
• Advanced Architecture
• Build & Deploy
• Leader in Enterprise Data Solutions
• Big Data Analytics
• Data Warehousing
• Business Intelligence
• Data Science
• Cloud Computing
• Data Governance
Partners
Awards & Recognition
Caserta Innovation Lab (CIL)
• Internal laboratory established to test & develop solution concepts and ideas
• Used to accelerate client projects
• Examples:
• Search (SOLR) based BI
• Big Data Governance Toolkit / Data Quality Sub-System
• Text Analytics on Social Network Data
• Continuous Integration / End-to-end streaming
• Recommendation Engine Optimization
• MDM / Relationship Intelligence / Spark Graph
We’re Hiring!
Does this word cloud excite you?
[Word cloud: Spark, Big Data Architect, NoSQL, Cloud Computing]
Speak with us about our open positions: leslie@casertaconcepts.com
We’re Experiencing a Paradigm Shift
OLD WAY:
• Structure → Ingest → Analyze
• Fixed Capacity
• Monolith
NEW WAY:
• Ingest → Analyze → Structure
• Dynamic Capacity
• Ecosystem
RECIPE:
• Cloud
• Data Lake
• Polyglot Warehouse
Move to the Cloud
Existing On-Premise Solution
• Challenges with operations of Hadoop servers in Data Center
• Increasing infrastructure complexity
• Keeping up with data growth
Cloud Advantages
• Reduced upfront capital investment
• Faster speed to value
• Elasticity
“Those that go out and buy expensive infrastructure find that the problem scope and domain shift really quickly. By the time they get around to answering the original question, the business has moved on.” - Matt Wood, AWS
Cloud Market Share
Cost savings of dynamic capacity
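As a rough illustration of the arithmetic behind this slide, the sketch below compares paying for peak capacity around the clock with paying only for the nodes actually in use each hour. The demand profile and the hourly rate are hypothetical numbers chosen for the example, not figures from the talk.

```python
def fixed_cost(hourly_demand, rate_per_node_hour):
    """Cost when enough nodes for the peak hour run for every hour."""
    peak_nodes = max(hourly_demand)
    return peak_nodes * len(hourly_demand) * rate_per_node_hour


def dynamic_cost(hourly_demand, rate_per_node_hour):
    """Cost when the node count follows demand hour by hour."""
    return sum(hourly_demand) * rate_per_node_hour


# Hypothetical day: 2 nodes for 20 hours, 10 nodes for a 4-hour batch window.
demand = [2] * 20 + [10] * 4
rate = 0.5  # hypothetical price per node-hour

fixed = fixed_cost(demand, rate)
dynamic = dynamic_cost(demand, rate)
print(f"fixed: {fixed:.2f}  dynamic: {dynamic:.2f}  saved: {fixed - dynamic:.2f}")
```

The gap widens as the workload gets spikier, since fixed capacity is sized for the peak while dynamic capacity is billed on the total node-hours used.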
Elasticity not only saves money
Essentially, Servers Suck
But more importantly, think infrastructure as code
• Your servers should be API calls
• Use stateless processes
• Make all resources ephemeral
• Make everything scalable and elastic!
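The bullets above can be sketched in a few lines. This is a minimal illustration only: FakeCloud is a hypothetical in-memory stand-in for a provider API (it is not the Azure or AWS SDK), and ephemeral_cluster and word_count are names invented for the example. The point is the shape: the cluster is created by a call, the job keeps no state on it, and it is torn down when the work finishes.

```python
from contextlib import contextmanager


class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud provider API."""

    def __init__(self):
        self.running = {}
        self._next_id = 0

    def create_cluster(self, nodes):
        self._next_id += 1
        cluster_id = f"cluster-{self._next_id}"
        self.running[cluster_id] = nodes
        return cluster_id

    def delete_cluster(self, cluster_id):
        self.running.pop(cluster_id, None)


@contextmanager
def ephemeral_cluster(cloud, nodes):
    """Create a cluster for one job and always delete it afterwards."""
    cluster_id = cloud.create_cluster(nodes)
    try:
        yield cluster_id
    finally:
        cloud.delete_cluster(cluster_id)


def word_count(lines):
    """Stateless job: the result depends only on the input passed in."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


cloud = FakeCloud()
with ephemeral_cluster(cloud, nodes=4) as cluster_id:
    result = word_count(["big data", "big cloud"])

print(result)
print(cloud.running)  # nothing is left running once the block exits
```

Because the teardown sits in a finally clause, the cluster is released even if the job raises, which is what makes the resource safe to treat as ephemeral.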
Hadoop on the Cloud Makes Sense
Hadoop on Demand
• Bootstrap whatever processing engine makes sense
• Programmatically estimate instance type and cluster size
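One way to read "programmatically estimate instance type and cluster size" is sketched below. The instance catalog, the throughput figures, and the estimate_cluster function are all hypothetical, made up for illustration. A real estimate would be calibrated against measured job runs.

```python
import math

# Hypothetical instance catalog: throughput and memory per node.
INSTANCE_TYPES = {
    "small": {"gb_per_hour": 50, "memory_gb": 16},
    "large": {"gb_per_hour": 200, "memory_gb": 64},
}


def estimate_cluster(input_gb, deadline_hours, min_memory_gb=0):
    """Pick the smallest instance type that meets the memory need,
    then size the cluster so the input is processed by the deadline."""
    by_memory = sorted(INSTANCE_TYPES.items(), key=lambda kv: kv[1]["memory_gb"])
    for name, spec in by_memory:
        if spec["memory_gb"] >= min_memory_gb:
            gb_per_node = spec["gb_per_hour"] * deadline_hours
            nodes = max(1, math.ceil(input_gb / gb_per_node))
            return name, nodes
    raise ValueError("no instance type satisfies the memory requirement")


print(estimate_cluster(1000, 2))
print(estimate_cluster(1000, 2, min_memory_gb=32))
```

The result of such a function can be passed straight to the call that bootstraps the cluster, so the size is decided per job rather than fixed up front.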
You May Need Some Persistent Servers
If at all possible, they should be inherently scalable, distributed, and elastic
The Big Data Pyramid
[Figure: pyramid diagram with two column headings, "Usage Pattern" and "Data Governance". Labels recoverable from the slide:]
• Ingest Raw Data
• Organize, Define, Complete
• Munging, Blending
• Machine Learning
• Data Quality and Monitoring
• Metadata, ILM, Security
• Data Catalog
• Data Integration
• Fully Governed (trusted)
• Arbitrary/Ad-hoc Queries and Reporting
Big Data Pyramid on Azure
Thank You / Next up
Joe Caserta
President, Caserta Concepts
joe@casertaconcepts.com
@joe_Caserta
Coming up:
