This video dives into 7 best practices for how IT organizations can achieve true operational readiness on Hadoop using Driven and Cascading.
Intended for any person, organization, or enterprise currently involved in planning, deploying, or managing a Hadoop infrastructure: development teams, IT ops, and executive management.
Key Takeaways:
- Connecting execution problems with application context
- Defining and enforcing SLAs
- Understanding inter-app dependencies
- Rationing your cluster
- Tracing data access at the operational level
- Building culture and tools supporting collaboration between developers, operators, & other Hadoop team members
2. ABOUT US
• TRUSTED by over 10,000 companies as their big data app platform
• BACKED by top Silicon Valley investors True Ventures, Rembrandt VP, Bain Capital
• FOUNDED in 2008, with headquarters in San Francisco
3. WHY NOW?
As big data applications become the engine of your data management strategy, they must meet higher standards of quality, reliability, and manageability.
4. WHY DO WE NEED OPERATIONAL READINESS ON HADOOP?
• Perform routine tasks in an automated and predictable manner
• Give consistent answers to customer questions
• Meet new regulatory requirements without significant investment
• Optimize use of the cluster
• Improve quality and service of your data platform
5. CASCADING - THE STANDARD FOR BIG DATA APP DEVELOPMENT
[Diagram labels: Cascading Apps; New Fabrics; Clojure, SQL, Ruby; Storm, Tez; System Integration; Mainframe, DB / DW, Data Stores, Hadoop, In-Memory]
• Write apps in the programming language of your choice
• Decouple application logic from integration
• Future-proof your app to run on many compute fabrics
Cascading is the proven platform for building and deploying big data applications on Hadoop, with 10,000+ production deployments.
6. DRIVEN PROVIDES OPERATIONAL VISIBILITY TO YOUR HADOOP APPS
End-to-end operational telemetry metadata for big data applications
Accessible via Web browser, command-line interface (CLI), or simple search queries
Easy integrations through JMX and upcoming Driven SDK
[Diagram labels: WAR files, Web App Server, Telemetry metadata (SSL), Server, Hadoop clusters, Hadoop apps and infrastructure, Applications, YARN, Plugin, Web, CLI, JMX]
8. MANAGE BIG DATA APPS MORE EFFECTIVELY
#1 - MONITOR THE FLEET, NOT THE VEHICLE
By department: Marketing, Sales, Compliance
By team/cluster: Data science team, QA cluster, Production cluster
Compare performance, resource consumption, and other metrics across departments, teams, and any other segment you define
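As an illustration of the segment comparison described on this slide, here is a minimal Python sketch that groups application run records by an arbitrary segment and aggregates a few metrics. It is not Driven's API or data model: the record fields (`department`, `cluster`, `duration_s`, `slot_hours`), the app names, and all values are hypothetical, chosen only to mirror the department and team/cluster segments named above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical run records; field names and values are illustrative only.
runs = [
    {"app": "lead-scoring", "department": "Marketing",
     "cluster": "Production cluster", "duration_s": 420, "slot_hours": 3.5},
    {"app": "lead-scoring", "department": "Marketing",
     "cluster": "QA cluster", "duration_s": 380, "slot_hours": 1.5},
    {"app": "pipeline-forecast", "department": "Sales",
     "cluster": "Production cluster", "duration_s": 600, "slot_hours": 4.0},
    {"app": "audit-export", "department": "Compliance",
     "cluster": "Production cluster", "duration_s": 900, "slot_hours": 6.0},
]


def summarize(records, segment_key):
    """Aggregate run count, average duration, and total slot-hours per segment."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[segment_key]].append(record)
    return {
        segment: {
            "runs": len(items),
            "avg_duration_s": mean(r["duration_s"] for r in items),
            "total_slot_hours": sum(r["slot_hours"] for r in items),
        }
        for segment, items in grouped.items()
    }


# The same records viewed along two different segments.
by_department = summarize(runs, "department")
by_cluster = summarize(runs, "cluster")

for segment, stats in by_department.items():
    print(segment, stats)
```

Because the segment is just a key, the same function covers "any other segment you define", as long as each run record carries that attribute.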
9. MANAGE BIG DATA APPS MORE EFFECTIVELY
#2 - UNDERSTAND INTER-APP DEPENDENCIES
See how all apps consume resources as they run
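One way to think about inter-app dependencies is that an app depends on every other app that writes a dataset it reads. The Python sketch below derives that relationship and a valid run order from it. The slides do not describe how Driven computes dependencies, so this is an illustration only; the app names, dataset paths, and the `reads`/`writes` structure are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical app descriptions: the datasets each app reads and writes.
apps = {
    "ingest": {"reads": {"raw/events"}, "writes": {"clean/events"}},
    "sessionize": {"reads": {"clean/events"}, "writes": {"sessions"}},
    "report": {"reads": {"sessions", "clean/events"}, "writes": {"reports/daily"}},
}


def upstream_dependencies(app_io):
    """Map each app to the set of other apps that write a dataset it reads."""
    writers = {}
    for name, io in app_io.items():
        for dataset in io["writes"]:
            writers.setdefault(dataset, set()).add(name)

    deps = {}
    for name, io in app_io.items():
        deps[name] = set()
        for dataset in io["reads"]:
            # Exclude the app itself in case it rewrites its own input.
            deps[name] |= writers.get(dataset, set()) - {name}
    return deps


deps = upstream_dependencies(apps)

# TopologicalSorter takes a mapping of node -> predecessors, which matches deps.
run_order = list(TopologicalSorter(deps).static_order())

print(deps)
print(run_order)
```

With a map like this, a late or failed upstream app can be connected to the downstream apps it will delay, which is the kind of question this slide is concerned with.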
10. BUILD HIGHER QUALITY BIG DATA APPS
#3 - PROMOTE A CULTURE OF COLLABORATION
Enable multiple teams to cooperate on what went wrong and how to fix it
11. BUILD HIGHER QUALITY BIG DATA APPS
#4 - SHARE APPLICATION CONTEXT AROUND ERRORS
Quickly and easily identify execution errors without parsing log files
[Screenshot labels: RESULTS, Error report]
12. #5 - TUNE THE APP BEFORE CHANGING YOUR INFRASTRUCTURE
Pinpoint bottlenecks and identify causes
Is the problem your code, your data, or your hardware?
[Chart labels: Execution, Waiting]
13. BUILD HIGHER QUALITY BIG DATA APPS
#6 - TRACE DATA FLOW AT AN OPERATIONAL LEVEL
Fully visualize your entire data pipeline
[Diagram labels: SOURCES, OPERATIONS (Hash joins, Hive queries, MapReduce jobs, etc.), RESULTS]
Need reliable, reusable tooling to quickly build and consistently deliver data products
Need the degrees of freedom to solve problems ranging from simple to complex with existing skill sets
Need the flexibility to easily adapt an application to meet business needs (latency, scale, SLA), without having to rewrite the application
Need operational visibility for the entire data application lifecycle