To view the full webinar, please go to: http://info.datameer.com/Slideshare-Complement-Your-Existing-EDW-with-Hadoop-OnDemand.html
With 40% yearly growth in data volumes, traditional data warehouses have become increasingly expensive and challenging.
Many of today’s new data sources are unstructured, which makes the structured data warehouse an unsuitable platform for analyzing them. As a result, organizations now look to Hadoop as a data platform that complements existing BI data warehouses and as a scalable, flexible, and cost-effective solution for data storage and analysis.
Join Datameer and Cloudera in this webinar to discuss how Hadoop and big data analytics can help to:
- Get all the data your business needs quickly into one environment
- Shorten the time to insight from months to days
- Extend the life of your existing data warehouse investments
- Enable your business analysts to ask and answer bigger questions
2. View Recording
▪ You can view the recording of this webinar at:
▪ http://info.datameer.com/Slideshare-Complement-Your-Existing-EDW-with-Hadoop-OnDemand.html
3. About our Speakers
Karen Hsu
– Karen is Senior Director, Product Marketing at Datameer. With over 15 years of experience in enterprise software, Karen Hsu has co-authored 4 patents and worked in a variety of engineering, marketing and sales roles.
– Most recently she came from Informatica, where she worked with the start-ups Informatica purchased to bring data quality, master data management, B2B and data security solutions to market.
– Karen has a Bachelor of Science degree in Management Science and Engineering from Stanford University.
4. About our Speakers
Jeff Bean
– Jeff Bean has been at Cloudera since 2010. He's helped several of Cloudera's most important customers and partners through their adoptions of Hadoop and HBase, including cluster sizing, deployment, operations, application design, and optimization.
– Jeff has also spent time on Cloudera's training team, where he focused on partner enablement, training hundreds of field personnel in Hadoop, its usage, and its position in the market. Jeff currently does partner engineering at Cloudera, where he handles field support, certifications, and joint engagements with partners such as Datameer.
6. Agenda
• Why optimize?
• What to optimize?
• How to optimize?
• Who has optimized already?
• Conclusion
7. Data Has Changed in the Last 30 Years
DATA GROWTH (timeline: 1980 to 2013)
END-USER APPLICATIONS
THE INTERNET
MOBILE DEVICES
SOPHISTICATED MACHINES
UNSTRUCTURED DATA – 90%
STRUCTURED DATA – 10%
8. EDW Expansion: A Vicious Cycle
▪ Increasing numbers of users
▪ Growing volumes of data
▪ Additional data sources
▪ New use cases
▪ Degraded quality of service and inability to meet SLAs
▪ Constant pressure to purchase additional capacity
Enterprise Data Warehouse
9. Hadoop vs. Data Warehouse:
Freeing up Capacity for High Value Workloads
Today
All growth accommodated by incremental investment in DW
100 TB
100% Data Growth
Data Warehouse: $20,000 - $100,000 / TB
100 TB + 100 TB More Capacity in Data Warehouse
Incremental Spend: $2 to $10 Million
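The incremental spend figure on this slide follows from multiplying the added capacity by the per-terabyte cost range. A minimal sketch of that arithmetic, using only the numbers shown on the slide (the function name `incremental_dw_spend` is hypothetical, chosen for illustration):

```python
def incremental_dw_spend(added_tb, cost_per_tb_low, cost_per_tb_high):
    """Return the (low, high) cost of adding capacity to the data warehouse."""
    return added_tb * cost_per_tb_low, added_tb * cost_per_tb_high


# Slide figures: 100 TB of growth at $20,000 - $100,000 per TB.
low, high = incremental_dw_spend(100, 20_000, 100_000)
print(f"Incremental spend: ${low:,} to ${high:,}")
# prints "Incremental spend: $2,000,000 to $10,000,000"
```

This reproduces the slide's "$2 to $10 Million" range; it does not attempt to model the Hadoop-side costs or savings shown on the following slide.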
10. Hadoop vs. Data Warehouse:
Freeing up Capacity for High Value Workloads
Future
Hadoop offloads data and workloads to defer/avoid incremental spend and reduce data management TCO
100 TB
Lower Value Data
High Value Data
Keep the Right Data in the Data Warehouse System
• Operational Analytics
• Reporting
• Business Analytics
50 TB
100 TB
Cloudera / Datameer (Total Cost of Cluster): $1,000 - $2,000 / TB
50 TB
Incremental Spend: $240,000 - $300,000 ACV
Savings: $1.85 to $9.8 MM
Use Hadoop for Everything Else
• Historical Data
• Data Processing
• Ad Hoc Exploratory
• Transformation / Batch
• Data Hub
11. Agenda
• Why optimize?
• What to optimize?
• How to optimize?
• Who has optimized already?
• Conclusion
12. Assessing Workloads and Data
Data Warehouse
WORKLOADS: Analytics, Self-Service BI, Operational Business Intelligence, Data Processing (ELT)
DATA: Operational Data, Archival Data, Staged Data
▪ Data Processing (ELT)
– Staged data, to be processed
– Temp tables, BLOB/CLOB types, …
▪ Analytics / Machine Learning
– Deep and broad data sets, within and beyond the warehouse
▪ Self-Service BI (Ad-Hoc Query)
– Operational data, actively used for BI
– Archival data, inactively used for BI
13. Offload Data Processing (ELT)
What?
High-scale batch data processing
Key Capabilities
▪ Integrate any type of data with pre-built connectors
▪ High availability, disaster recovery, downtime-less upgrades
▪ Low-latency SQL processing
Benefits of Cloudera and Datameer
▪ Over 2X the performance at 1/10th the cost
▪ 96% reduction in ETL time
14. Offload Analytics / Machine Learning
What?
Training & scoring predictive models
Deep and broad data sets
Key Capabilities
▪ Drag-and-drop Data Mining and Machine Learning for a business analyst
▪ Automated support for Clustering, Recommendations, Decision Tree, and Column Dependencies
▪ Ability to run SAS, R natively on the same cluster
Benefits of Cloudera and Datameer
▪ Greater flexibility at 1/10th the cost
▪ Expand data mining and machine learning to analysts
15. Offload Self-Service Business Intelligence
Workload
Self-Service BI, Exploratory BI, Data Discovery
Unknown Questions
Key Capabilities
▪ 250+ prebuilt analytics functions
▪ Open source interactive SQL
▪ Transparency and governance
Benefits of Cloudera and Datameer
▪ Better flexibility at 1/10th the cost
▪ Reduce analysis time from 4 weeks to 3 days
16. Complementing the Data Warehouse
Diagram labels:
Data Warehouse; Enterprise Applications; (High $/Byte); OLTP
Load; ETL; Archive; Query; Search
CLOUDERA / DATAMEER: Analyze, Integrate, Vis, Batch, Process, Storage
Operational BI; Business Intelligence
Archival Data, Exploration, Analytics
17. Agenda
• Why optimize?
• What to optimize?
• How to optimize?
• Who has optimized already?
• Conclusion
24. Role: Responsibilities
Admin: Set up and maintain environment
Business Analyst: Work with partners to define requirements and define goals
Deployment Team: Set up monitoring and scheduling
ETL Architect: Prepare and cleanse data
25. Roles Mapped to Process
Define (BA): Define goals, results, sources, requirements
Integrate (Admin): Source data, secure for ad hoc
Prepare & Analyze (BA / Arch.): Cleanse, combine, enrich data; create analysis
Visualize (BA): Create infographics, dashboards
Deploy (Admin / Deploy. Team):
– Business: Validate with end users
– Technical: Secure, monitor, schedule
28. Identify $2B in fraudulent transactions
POS Reports, Location Data, Transactions, Authorizations
[Image text: "HELLO my name is greg"; 7-ELEVEN; $5.15, $3.95, $4.10, $4.15, $4.55, $3.22]
29. Improve customer service, development, sales
Doubling in size every 15 months
Structured Logs, Network Data, Unstructured Logs
34. EDW Optimization
Enterprise Data Warehouse
▪ Discover fraud in less time – from 2 days to 2 hours, save $30M on DR
▪ Avoid tens of millions in expansion purchases
▪ Offload 90% of all data
▪ Shrank EDW footprint by 4 PB, 20x performance boost
35. Call to Action
▪ ROI and Solution Development Consultation
▪ Join us at Hadoop World
▪ Contacts
– Jeff Bean jwfbean@cloudera.com
– Karen Hsu khsu@datameer.com