This document discusses offloading ETL workloads from data warehouses to Hadoop. It provides an overview of Bitwise, an ISO-certified company that provides ETL and data quality services. It also describes Driven, a platform for building, running, and managing big data applications. Driven provides visibility into data pipelines, monitors application performance, and enables collaboration around operational issues. It stores metadata about application telemetry in a scalable and searchable manner to provide end-to-end operational visibility for Hadoop applications.
2. ABOUT CONCURRENT
TRUSTED by over 10,000 companies as their big data app platform
BACKED by top Silicon Valley investors True Ventures, Rembrandt VP, and Bain Capital
FOUNDED in 2008, with headquarters in San Francisco
3. ABOUT BITWISE
•Founded in 1995
•HQ in Chicago, IL
•Offices in India & Australia
•ISO 9001:2008 & ISO 27001:2005 certified
•Most experienced data professionals
•Proprietary frameworks and accelerators for guaranteed, efficient and cost-effective services for data projects
8. DRIVEN PROVIDES OPERATIONAL READINESS TO ETL WORKLOADS
PERFORMANCE MANAGEMENT FOR BIG DATA APPLICATIONS
BUILD higher quality big data apps
RUN big data apps more reliably
MANAGE big data apps more effectively
9. BUILD HIGHER QUALITY BIG DATA APPS
Figure: data pipeline from sources through operations (functions, filters, joins, and aggregators) to results
Fully visualize your entire data pipeline
Quickly and easily identify execution errors
11. RUN BIG DATA APPS MORE RELIABLY
Figure: currently executing apps
Watch your apps execute in real time
Easily detect apps that violate SLAs and policies
Pinpoint bottlenecks and identify causes
12. RUN BIG DATA APPS MORE RELIABLY
Pinpoint bottlenecks and identify causes
Figure: executing and waiting steps
Figure: detailed mapper/reducer stats
13. RUN BIG DATA APPS MORE RELIABLY
Pinpoint bottlenecks and identify causes
For example, see metrics for all apps on the production cluster that failed to execute in under 5 minutes, or for all applications that use more than their allotment of mappers
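The two example queries above are simple predicates over per-run metrics. As a minimal sketch, assuming a hypothetical `AppRun` record with cluster, duration, and mapper counts (these names are illustrative and are not Driven's API), they could be expressed in Python as:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppRun:
    """Hypothetical per-run metrics record; field names are illustrative."""
    name: str
    cluster: str
    duration_s: float      # wall-clock execution time in seconds
    mappers_used: int
    mapper_allotment: int  # mappers this app is allowed to use


def slow_on_cluster(runs, cluster="production", limit_s=300):
    """Apps on `cluster` that did not finish in under `limit_s` seconds (5 minutes)."""
    return [r for r in runs if r.cluster == cluster and r.duration_s >= limit_s]


def over_mapper_allotment(runs):
    """Apps that used more mappers than their allotment."""
    return [r for r in runs if r.mappers_used > r.mapper_allotment]


if __name__ == "__main__":
    runs = [
        AppRun("daily-etl", "production", 420.0, 80, 100),
        AppRun("ad-hoc-report", "production", 95.0, 150, 100),
        AppRun("qa-smoke", "qa", 610.0, 10, 20),
    ]
    print([r.name for r in slow_on_cluster(runs)])        # ['daily-etl']
    print([r.name for r in over_mapper_allotment(runs)])  # ['ad-hoc-report']
```

Keeping each query as a plain function makes it easy to combine them, for example to find slow apps that also exceed their mapper allotment.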
14. MANAGE BIG DATA APPS MORE EFFECTIVELY
See how all apps consume resources as they run
Compare performance, resource consumption, and other metrics across departments, teams, and any segment you define
15. MANAGE BIG DATA APPS MORE EFFECTIVELY
Segment performance by team, by department, or by custom tags for role-based views, chargeback models, and capacity planning
For example, see performance of all apps owned by the DevOps team
Figure: example segments (Marketing, Sales, Compliance, Data science team, QA cluster, Production cluster)
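Segmenting usage by tag for chargeback comes down to grouping resource consumption by a tag value and normalizing the totals. A minimal sketch, assuming hypothetical run records that carry a `tags` dictionary and a `slot_seconds` usage figure (neither is taken from Driven):

```python
from collections import defaultdict


def usage_by_segment(runs, segment_key):
    """Sum slot-seconds per value of `segment_key` (e.g. a team or department tag).

    Each run is a dict with a 'tags' dict and a 'slot_seconds' number (hypothetical
    schema). Runs lacking the tag are grouped under 'untagged'.
    """
    totals = defaultdict(float)
    for run in runs:
        segment = run["tags"].get(segment_key, "untagged")
        totals[segment] += run["slot_seconds"]
    return dict(totals)


def chargeback_shares(totals):
    """Convert per-segment usage into fractions of total usage."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {segment: 0.0 for segment in totals}
    return {segment: used / grand_total for segment, used in totals.items()}


if __name__ == "__main__":
    runs = [
        {"app": "etl-nightly", "tags": {"team": "DevOps"}, "slot_seconds": 600.0},
        {"app": "churn-model", "tags": {"team": "Data science team"}, "slot_seconds": 300.0},
        {"app": "smoke-test", "tags": {"cluster": "QA cluster"}, "slot_seconds": 100.0},
    ]
    # All apps owned by the DevOps team.
    print([r["app"] for r in runs if r["tags"].get("team") == "DevOps"])  # ['etl-nightly']
    totals = usage_by_segment(runs, "team")
    print(totals)  # {'DevOps': 600.0, 'Data science team': 300.0, 'untagged': 100.0}
    print(chargeback_shares(totals))  # {'DevOps': 0.6, 'Data science team': 0.3, 'untagged': 0.1}
```

Grouping untagged runs explicitly, rather than dropping them, keeps the shares summing to 1 so that no usage silently escapes chargeback.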
16. MANAGE BIG DATA APPS FOR COMPLIANCE
Visualize lineage: see exactly how each app ingests, manipulates, and outputs data
Further inspect lineage by detecting apps that write to, or read from, a given dataset
Figure: data pipeline from sources through operations (functions, filters, joins, and aggregators) to results
17. MANAGE BIG DATA APPS FOR COMPLIANCE
For example, show all apps that interact with the dataset in “rain.txt”
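The “rain.txt” example is a reverse lookup over lineage records: find every app whose inputs or outputs include the dataset. A minimal sketch, assuming hypothetical per-app records listing the datasets each app reads and writes (illustrative names, not Driven's data model):

```python
from dataclasses import dataclass, field


@dataclass
class AppLineage:
    """Hypothetical lineage record: the datasets an app reads and writes."""
    name: str
    reads: set[str] = field(default_factory=set)
    writes: set[str] = field(default_factory=set)


def apps_touching(lineage, dataset):
    """Return (readers, writers) of `dataset`, each sorted by app name."""
    readers = sorted(app.name for app in lineage if dataset in app.reads)
    writers = sorted(app.name for app in lineage if dataset in app.writes)
    return readers, writers


if __name__ == "__main__":
    lineage = [
        AppLineage("ingest-weather", reads={"raw/weather.csv"}, writes={"rain.txt"}),
        AppLineage("rain-report", reads={"rain.txt"}, writes={"reports/rain_summary"}),
        AppLineage("word-count", reads={"books.txt"}, writes={"counts"}),
    ]
    print(apps_touching(lineage, "rain.txt"))  # (['rain-report'], ['ingest-weather'])
```

Returning readers and writers separately mirrors the slide's distinction between apps that write to a dataset and apps that read from it.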
18. MANAGE BIG DATA APPS FOR COLLABORATION
Create JIRA issues that include views and data, so teams can collaborate quickly to resolve performance problems
Integrate alerts with popular notification platforms such as HipChat, PagerDuty, and Nagios
With one click, create a JIRA issue with a link to this view
19. MANAGE BIG DATA APPS FOR COLLABORATION
Automatically send app status notifications via webhooks or JMX
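A webhook notification is typically an HTTP POST with a JSON body sent to a URL that the receiving system provides. Driven's own payload format is not described here, so the sketch below only illustrates the general webhook pattern using Python's standard library; the URL and the payload fields (`app`, `status`, `details`) are hypothetical.

```python
import json
import urllib.request


def build_status_webhook(url, app_name, status, details=None):
    """Build (but do not send) a JSON POST carrying an app status notification.

    The payload fields are hypothetical; adapt them to whatever the receiving
    endpoint (e.g. a chat or paging integration) expects.
    """
    payload = {"app": app_name, "status": status, "details": details or {}}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_status_webhook(request, timeout_s=10):
    """Send the request and return the HTTP status code of the response."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status


if __name__ == "__main__":
    req = build_status_webhook(
        "https://hooks.example.com/driven-status",  # placeholder URL
        "daily-etl",
        "FAILED",
        {"cluster": "production"},
    )
    print(req.get_method(), req.full_url)
    # send_status_webhook(req) would deliver it; not called here because the URL is a placeholder.
```

Separating request construction from sending keeps the payload logic testable without a live endpoint.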
20. NURTURE A CULTURE OF OPERATIONAL EXCELLENCE
“The coolest part about Driven is being able to visualize data pipelines and inspect components in real time for easy troubleshooting and optimization. I don't know of any other tool that's close in functionality.”
- Neville Li, Software Engineer, Spotify
“Driven has given us a way to monitor the performance of our data-driven applications in a manner which is visually intuitive to both engineering and business users.”
- Joao Vicente, Performance Architect, Dun & Bradstreet
21. …THROUGH A SCALABLE, SEARCHABLE METADATA STORE
End-to-end operational telemetry metadata for big data applications
Accessible via Web browser, command-line interface (CLI), or simple search queries
Easy integrations through JMX and the upcoming Driven SDK
Figure: architecture diagram (Hadoop apps and infrastructure: applications with a plugin, running on YARN in Hadoop clusters; telemetry metadata sent over SSL to the web app server, deployed as WAR files; access via Web, CLI, and JMX)
CEO
Improved customer & brand experience
New product channels
Enhanced operational visibility
Enhanced enforcement of compliance & regulations
CIO
Develop a business case for Hadoop; develop ROI metrics
Develop an inventory of data assets on Hadoop
Promote data reuse & ensure integrity of data feeds
Ensure infrastructure governance
CDO
Ensure continuity of data best practices on Hadoop
Develop & enforce regulatory protocols
Promote principles of data library
C-SUITE
Monetize new know-how for service offerings
Create better customer experiences
Increase mind- & wallet-share