© 2015 Nebula, Inc. All rights reserved.
© 2015 Scalr, Inc. All rights reserved.
(cloud) Computing for the Enterprise
Increasing Business Agility with Real-time Processing using Apache Hadoop and Spark
Powered by
Agenda
• Big Data and Real-time Processing
– Use cases
– Why Hadoop and Spark?
– What’s required?
• Successfully Designing an Elastic Compute Infrastructure
• Solutions Demo
– Hadoop and Spark, powered by Nebula and Scalr
Presenters
• Huy Nguyen, Sr. Director, Product Marketing
• Thomas Orozco, Product Manager
Evolution of Big Data and its Impact
• Businesses are pressed to operate in real time to keep a competitive edge
• Mere minutes can make the difference between a brilliantly handled crisis and a full-blown social media disaster
• User-, machine-, and sensor-generated data must be processed in real time
• Weekly reports, scheduled jobs, and batch reporting alone are no longer sufficient
• Data delivered after the fact loses its competitive value
• Data is more relevant to the business when it is “fresh”
• Businesses need the ability to act right now, as things are happening
Batch Processing and Real-time Processing: It’s all about ‘now’
Batch Processing: acting on “Data at Rest”, on a static infrastructure
Real-time Processing: acting on “Data in Motion”, which requires an elastic infrastructure
Uses for Real-time, Stream Processing
IT Management:
Log processing and analysis, log-driven alerting, infrastructure fault protection, intelligence and surveillance, fraud detection, etc.
Brand Management and Customer Engagement:
Sentiment analysis, data mining on social media streams and user-generated content, algorithmic trading, geospatial location, etc.
Conversion Optimization:
Clickstream analysis and real-time targeted offer generation
Why use Hadoop + Spark for Real-Time Processing?
Plenty of alternatives exist:
• Mesos (+ Spark), Storm, Message Queue (+ custom processing tier)
The Hadoop + Spark stack offers unique benefits:
• Familiar, high-level APIs (HDFS distributed storage abstraction, YARN scheduling… and rescheduling).
• Natural integration with traditional batch jobs (e.g. process log streams in real time to flag high-priority events, then run traditional MapReduce jobs on them later).
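The log-stream example above can be sketched in plain Python. This is a stand-in for the streaming and batch stages, not actual Spark or MapReduce code. The log line format, the `HIGH_PRIORITY` levels, and all function names are assumptions made for illustration: incoming lines are handled in small micro-batches, high-priority events are flagged immediately, and every event is retained so that a batch job can aggregate it later.

```python
from collections import Counter
from itertools import islice

# Assumed severity levels that count as "high priority" (hypothetical).
HIGH_PRIORITY = {"ERROR", "CRITICAL"}


def parse(line):
    """Split an assumed 'LEVEL host message' log line into a dict."""
    level, host, message = line.split(" ", 2)
    return {"level": level, "host": host, "message": message}


def micro_batches(lines, size):
    """Yield successive lists of at most `size` lines from an iterable."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def process_stream(lines, size=2):
    """Flag high-priority events per micro-batch and keep all events."""
    flagged, archive = [], []
    for batch in micro_batches(lines, size):
        events = [parse(line) for line in batch]
        flagged.extend(e for e in events if e["level"] in HIGH_PRIORITY)
        archive.extend(events)  # retained for the later batch job
    return flagged, archive


def batch_report(archive):
    """Later batch step: count events per level over the stored data."""
    return Counter(event["level"] for event in archive)


sample = [
    "INFO web1 request served",
    "ERROR web2 upstream timeout",
    "INFO web1 request served",
    "CRITICAL db1 disk full",
    "WARN web2 slow response",
]
flagged, archive = process_stream(sample)
report = batch_report(archive)
```

In a real deployment the micro-batching and the later aggregation would be handled by the streaming and batch engines themselves; the sketch only shows how the two stages share the same data.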
What’s Required: The Move from Batch Processing to Real-time Processing
Hadoop YARN & Apache Spark: build processing workflows that parse, categorize, and score information in real time
Hadoop evolved from “MapReduce + HDFS” to “YARN + HDFS”
YARN distributes tasks across a set of computing nodes, regardless of whether these tasks are batch, interactive, or real-time data access
Apache Spark is a cluster-computing platform that supports real-time, streaming workloads, backed by the robust HDFS storage engine
What’s Required: Decoupling the Compute/Storage Tier & Auto-scaling
Decouple the compute tier from the storage tier for real-time processing
• Dynamically scaling the storage tier would result in major inefficiencies or data loss
The processing tier (application and infrastructure) must be able to “auto scale” compute resources as the volume, velocity, and variety of big data increase
[Diagram labels: Big Data; Storage; Compute; Processing Tier]
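The auto-scaling requirement can be illustrated with a small sizing rule. This is a minimal sketch under stated assumptions, not Scalr's or YARN's actual scaling logic: the function names, the per-worker capacity figure, and the worker limits are all hypothetical. Only the compute tier is resized; the storage tier is left untouched, in line with the decoupling described above.

```python
import math


def desired_workers(events_per_second, capacity_per_worker,
                    min_workers=1, max_workers=20):
    """Return how many compute workers the current input rate needs."""
    if capacity_per_worker <= 0:
        raise ValueError("capacity_per_worker must be positive")
    needed = math.ceil(events_per_second / capacity_per_worker)
    # Clamp to the assumed lower and upper bounds of the compute tier.
    return max(min_workers, min(max_workers, needed))


def scaling_action(current_workers, target_workers):
    """Describe the change needed to reach the target worker count."""
    delta = target_workers - current_workers
    if delta > 0:
        return ("scale_out", delta)
    if delta < 0:
        return ("scale_in", -delta)
    return ("hold", 0)
```

For example, with an assumed capacity of 1,000 events per second per worker, an input rate of 4,500 events per second calls for five workers, so a tier currently running three would scale out by two.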
Suggested Architecture for Real-time Big Data Processing
A. Hadoop Compute Tier (YARN)
• One resource manager
• One history server
• Multiple node managers
B. Hadoop Storage Tier (HDFS)
• One name node
• Multiple data nodes
C. Client Nodes
• Dispatch real-time data processing jobs
D. Intelligent Cloud Mgmt Platform from Scalr
• Orchestration and auto-scaling of applications
E. Turnkey Private Cloud Infrastructure from Nebula
• Elastic, on-demand cloud computing infrastructure
INTRODUCTION TO NEBULA
Nebula Turnkey Private Cloud
Fastest path to OpenStack
Nebula productizes OpenStack in a highly cost-efficient, fast time-to-value, secure and scalable enterprise-class product
Cost-efficient: Software delivered using an appliance with off-the-shelf, industry-standard servers and storage – freedom of choice
Fast time-to-value: Curated OpenStack (rack integration or multi-rack integration), enabling customers/partners to spend their resources building applications, not building infrastructure
Open, Secure & Scalable: Identical clouds to deliver consistent and predictable performance with open connectors for a turnkey ecosystem
Enterprise-class: Highly available with connectors to existing enterprise workflows & architecture (identity, storage, networking) for zero disruption to IT
The Only Enterprise-ready, Turnkey Solution for OpenStack Private Clouds
Workloads: DevOps / DevTest; Genome Sequencing; Big Data / Real-time; Media Rendering
Self-Service IT; Process Improvements; API / Integration
Cosmos Software
Storage, Compute, Network
Management & Orchestration
Identity/Security
Enterprise Integration: Identity (Active Directory); Storage; Networking (VLANs)
Auto-scaling with Nebula and Scalr
Traditional Infrastructure: Fixed Compute, Storage, Network
Private Cloud: Shared Resource Pool
• As real-time data feeds increase, the YARN tier can be provisioned to scale out across multiple servers
• As data feeds decrease, resources can be de-provisioned and returned to the shared pool
• Nebula enables resource pooling of compute, storage, and network services for scale-out readiness
[Diagram: six “YARN Tier w/ Spark” instances]
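The provision and de-provision cycle on this slide can be modelled with a toy shared pool. The `SharedPool` class, its method names, and the server counts are hypothetical and exist only to illustrate the idea; this is not the Nebula or Scalr API.

```python
class SharedPool:
    """Toy model of a private cloud's shared pool of servers."""

    def __init__(self, total_servers):
        self.free = total_servers
        self.allocated = {}  # tier name -> servers currently held

    def provision(self, tier, count):
        """Move up to `count` free servers to `tier`; return how many moved."""
        granted = min(count, self.free)
        self.free -= granted
        self.allocated[tier] = self.allocated.get(tier, 0) + granted
        return granted

    def deprovision(self, tier, count):
        """Return up to `count` servers from `tier` to the pool."""
        released = min(count, self.allocated.get(tier, 0))
        self.allocated[tier] = self.allocated.get(tier, 0) - released
        self.free += released
        return released


pool = SharedPool(total_servers=6)
pool.provision("yarn", 4)    # data feeds increase: scale out
pool.deprovision("yarn", 3)  # data feeds decrease: scale in
```

Because released servers go back to `free`, any other workload sharing the pool can claim them, which is the contrast with the fixed allocation of a traditional infrastructure.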
INTRODUCTION TO SCALR
Scalr is used to:
Orchestrate Resources
• Provisioning
• Templating
• Auto-scaling
• …
Define and Enforce Policies
• Lease Management
• Network Policies
• RBAC
• …
Centrally Manage Clouds
• Multi-Cloud
• Cost Analytics
• SSO, CMDB, ITSM integrations
• …
SOLUTIONS DEMO
Summary
• Businesses need to operate in real time to maintain a competitive edge
• Data processing tiers (from application to infrastructure) must be able to auto-scale to accommodate the 3 Vs of Big Data
• Emergent big data technology such as Hadoop YARN and Apache Spark can build processing workflows that parse, categorize, and score information in real time
• Nebula’s turnkey private cloud and Scalr’s intelligent Cloud Management Platform meet these demands by delivering an orchestrated infrastructure that can auto-scale compute and storage resources on demand to process data feeds in real time
For more information: www.nebula.com or www.scalr.com
Benefits of Real-Time Processing
React to changing business conditions in real time
• Adapt and react quickly to data, market conditions, and events happening in the outside world
Faster time-to-market
• Development and deployment
Deliver the best user experience
• Personalized experience
THANK YOU

Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing with Apache's Hadoop and Spark
