Webinar Nebula & Scalr: Increasing Business Agility with Real-time Processing with Apache Hadoop and Spark

© 2015 Nebula, Inc. All rights reserved.
© 2015 Scalr, Inc. All rights reserved.
(cloud) Computing for the Enterprise
Increasing Business Agility
with Real-time Processing using Apache
Hadoop and Spark

Agenda
• Big Data and Real-time Processing
– Use cases
– Why Hadoop and Spark?
– What’s required?
• Successfully Designing an Elastic Compute Infrastructure
• Solutions Demo
– Hadoop and Spark, powered by Nebula and Scalr
Presenters
• Huy Nguyen, Sr. Director, Product Marketing
• Thomas Orozco, Product Manager

Evolution of Big Data and its Impact
• Businesses are pressed to operate in real-time
for competitive edge
• Mere minutes can make the difference between
a brilliantly handled crisis and a full-blown
social media disaster
• User, machine, or sensor generated data must
be processed in real-time
• Weekly reports, scheduled jobs, and batch
reporting alone are no longer enough
• After-the-fact data loses its competitive
advantage
• Data is more relevant to the business if it’s
“fresh data”
• Ability to act right now as things are happening

Batch Processing and Real-time Processing: It’s all about ‘now’
Batch Processing
• Acting on “Data at Rest”
• Static infrastructure
Real-time Processing
• Acting on “Data in Motion”
• Requires an elastic infrastructure
[Diagram: compute nodes]

Uses for Real-time, Stream Processing
IT Management:
Log processing, analysis, and log driven alerting, infrastructure fault
protection, intelligence and surveillance, fraud detection, etc…
Brand Management and Customer Engagement:
Sentiment analysis, data mining on social media streams and user-generated
content, algorithmic trading, geospatial location, etc…
Conversion Optimization:
Clickstream analysis and real-time targeted offer generation

Why use Hadoop + Spark for Real-Time Processing?
Plenty of alternatives exist:
• Mesos (+ Spark), Storm, Message Queue (+ custom processing tier)
Hadoop + Spark stack offers unique benefits:
• Familiar and high-level API (HDFS distributed storage abstraction, YARN scheduling…
and rescheduling).
• Integrates naturally with traditional batch jobs (e.g. process log streams in real-time to
flag high-priority events, and run traditional map-reduce jobs on them later on).
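
The log-stream example above can be sketched in a few lines. This is a plain-Python illustration of the idea, not Spark or Hadoop API code: the `HIGH_PRIORITY` levels, the "LEVEL message" line format, and the in-memory `archive` list (a stand-in for HDFS) are all assumptions made for the sketch.

```python
from typing import Iterable, Iterator

# Hypothetical severity levels that count as "high priority".
HIGH_PRIORITY = {"ERROR", "CRITICAL"}


def parse(line: str) -> dict:
    """Split a 'LEVEL message' log line into a small record."""
    level, _, message = line.partition(" ")
    return {"level": level.upper(), "message": message}


def process_stream(lines: Iterable[str], archive: list) -> Iterator[dict]:
    """Yield high-priority events immediately; archive every event for later batch jobs."""
    for line in lines:
        event = parse(line)
        archive.append(event)      # stand-in for appending to distributed storage
        if event["level"] in HIGH_PRIORITY:
            yield event            # real-time path: surface the event right away


def count_by_level(archive: list) -> dict:
    """A batch-style aggregation run later over the archived events."""
    counts: dict = {}
    for event in archive:
        counts[event["level"]] = counts.get(event["level"], 0) + 1
    return counts
```

The same archived events feed both paths: `process_stream` reacts as lines arrive, while `count_by_level` can run at any later time over everything that was stored.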

What’s Required: The Move from Batch Processing to Real-time Processing
Hadoop YARN & Apache Spark: Build processing workflows that parse, categorize, and
score information in real-time
Hadoop evolved from being “MapReduce
+ HDFS” to “YARN + HDFS”
YARN is used to distribute tasks across a
set of computing nodes — regardless of
whether these tasks are batch, interactive,
or real-time data access
Apache Spark is a cluster-computing platform
that supports real-time, streaming workloads,
backed by the robust HDFS storage engine
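
To make the "regardless of task type" point concrete, here is a toy placement routine in plain Python. It is not YARN's scheduling algorithm and uses no Hadoop API; the least-loaded-node rule, the `assign_tasks` name, and the (name, kind, cost) task tuples are assumptions chosen only to show batch, interactive, and streaming tasks sharing one pool of nodes.

```python
import heapq
from typing import Dict, List, Tuple


def assign_tasks(tasks: List[Tuple[str, str, int]], nodes: List[str]) -> Dict[str, List[str]]:
    """Assign each (name, kind, cost) task to the currently least-loaded node.

    The task kind (batch, interactive, streaming) is carried along but never
    consulted: every kind competes for the same pool of nodes.
    """
    heap = [(0, node) for node in nodes]   # (current load, node name)
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {node: [] for node in nodes}
    for name, _kind, cost in tasks:
        load, node = heapq.heappop(heap)   # least-loaded node so far
        placement[node].append(name)
        heapq.heappush(heap, (load + cost, node))
    return placement
```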

What’s Required: Decoupling the Compute/Storage Tier & Auto-scaling
[Diagram labels: Big Data; Storage; Compute; Processing Tier]
Decouple the compute tier from the storage tier for real-time processing
• Dynamically scaling the storage tier would result in major inefficiencies or data loss
Processing tier (application and infrastructure) must be able to “auto scale”
compute resources as the volume, velocity, and variety of big data increase
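
One way to picture the auto-scaling requirement is a small sizing rule for the compute tier only, leaving storage untouched. The sketch below is a hypothetical illustration: the `desired_workers` function, the events-per-second metric, and the default bounds are assumptions, not part of Scalr, Nebula, or YARN.

```python
import math


def desired_workers(events_per_sec: float, per_worker_capacity: float,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Return how many compute workers are needed for the current event rate."""
    if per_worker_capacity <= 0:
        raise ValueError("per_worker_capacity must be positive")
    # Round up so the tier is never sized below the incoming load.
    needed = math.ceil(events_per_sec / per_worker_capacity)
    # Clamp to the configured floor and ceiling.
    return max(min_workers, min(max_workers, needed))
```

For example, at 2,600 events per second and an assumed 500 events per second per worker, the rule asks for 6 workers; when the feed goes quiet it falls back to the floor of 1.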

Suggested Architecture for Real-time Big Data Processing
A. Hadoop Compute Tier (YARN)
• One resource manager
• One history server
• Multiple node managers
B. Hadoop Storage Tier (HDFS)
• One name node
• Multiple data nodes
C. Client Nodes
• Dispatch real-time data processing jobs
D. Intelligent Cloud Mgmt Platform from Scalr
• Orchestration and auto-scaling of applications
E. Turnkey Private Cloud Infrastructure from Nebula
• Elastic, on-demand cloud computing infrastructure
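
The node counts in tiers A to C can be written down as a small checkable structure. This is a hedged sketch: the `ClusterTopology` class and its field names are hypothetical, "multiple" is interpreted as two or more, and requiring at least one client node is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterTopology:
    resource_managers: int
    history_servers: int
    node_managers: int
    name_nodes: int
    data_nodes: int
    client_nodes: int

    def problems(self) -> List[str]:
        """List every way this layout departs from the suggested architecture."""
        issues = []
        if self.resource_managers != 1:
            issues.append("compute tier needs exactly one resource manager")
        if self.history_servers != 1:
            issues.append("compute tier needs exactly one history server")
        if self.node_managers < 2:
            issues.append("compute tier needs multiple node managers")
        if self.name_nodes != 1:
            issues.append("storage tier needs exactly one name node")
        if self.data_nodes < 2:
            issues.append("storage tier needs multiple data nodes")
        if self.client_nodes < 1:
            issues.append("at least one client node must dispatch jobs")
        return issues
```

Only `node_managers` is expected to change under auto-scaling; the other counts stay fixed in this layout.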

INTRODUCTION TO NEBULA

Nebula Turnkey Private Cloud
Fastest path to OpenStack
Nebula productizes OpenStack in a highly cost-efficient, fast
time-to-value, secure and scalable enterprise-class product
Cost-efficient: Software delivered using appliance with off-the-
shelf industry standard servers and storage – freedom of choice
Fast time-to-value: Curated OpenStack (rack integration or multi-
rack integration), enabling customers/partners to spend their
resources building applications, not building infrastructure
Open, Secure & Scalable: Identical clouds to deliver consistent
and predictable performance with open connectors for turnkey
eco-system
Enterprise-class: Highly available with connectors to existing
enterprise workflows & architecture (identity, storage, networking)
for zero disruption to IT

Nebula Turnkey Private Cloud

The Only Enterprise-ready, Turnkey Solution for OpenStack Private Clouds
[Diagram]
• Workloads: DevOps / DevTest; Genome Sequencing; Big Data / Real-time; Media Rendering
• Self-Service IT; Process Improvements; API / Integration
• Cosmos Software: Compute, Storage, Network; Management & Orchestration; Identity/Security
• Enterprise Integration: Identity (Active Directory); Storage; Networking (VLANs)

Auto-scaling with Nebula and Scalr
Traditional Infrastructure: Fixed Compute, Storage, Network
Private Cloud: Shared Resource Pool
• As real-time data feeds increase, the YARN tier can be provisioned to
scale out across multiple servers
• As data feeds decrease, resources can be de-provisioned
and returned to the shared pool
• Nebula enables resource pooling of compute, storage, and network
services for scale-out readiness
[Diagram: six “YARN Tier w/ Spark” boxes]
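
The shared-pool behavior can be modeled in a few lines. This is an illustrative sketch only: the `SharedPool` class and the tier names are hypothetical and do not correspond to any Nebula or Scalr API.

```python
class SharedPool:
    """A fixed pool of servers shared by several tiers (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.allocated: dict = {}

    @property
    def free(self) -> int:
        """Servers not currently held by any tier."""
        return self.capacity - sum(self.allocated.values())

    def resize(self, tier: str, target: int) -> int:
        """Move a tier toward `target` servers; grant only what the pool can spare."""
        current = self.allocated.get(tier, 0)
        granted = max(0, min(target, current + self.free))
        self.allocated[tier] = granted
        return granted
```

When the YARN tier shrinks, the servers it gives up show up in `free` at once, so another workload asking for more capacity can pick them up on its next `resize` call.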

INTRODUCTION TO SCALR

Scalr is used to:
Orchestrate Resources
• Provisioning
• Templating
• Auto-scaling
• …
Define and Enforce Policies
• Lease Management
• Network Policies
• RBAC
• …
Centrally Manage Clouds
• Multi-Cloud
• Cost Analytics
• SSO, CMDB, ITSM integrations
• …

Scalr is trusted by:

SOLUTIONS DEMO

Summary
• Businesses need to operate in real-time to maintain competitive edge
• Emergent big data technology such as Hadoop YARN and Apache Spark can
build processing workflows that parse, categorize, and score information
in real-time
• Data processing tiers (from application to infrastructure) must be able
to auto-scale to accommodate the 3 Vs of Big Data
• Nebula’s turnkey private cloud and Scalr’s intelligent Cloud Management
Platform meet these demands by delivering an orchestrated infrastructure
that can auto scale compute and storage resources on-demand to process
data feeds in real-time
For more information: www.nebula.com or www.scalr.com

Benefits of Real-Time Processing
React to changing business conditions in real time
• Adapt and react quickly to data, market conditions and events happening in the
outside world
Faster time-to-market
• Development and deployment
Delivering the best user experience
• Personalized experience

THANK YOU
