Complement Your Existing Data Warehouse with Big Data & Hadoop


To view the full webinar, please go to:

With 40% yearly growth in data volumes, traditional data warehouses have become increasingly expensive and challenging to scale.

Many of today’s new data sources are unstructured, making the structured data warehouse an unsuitable platform for analyzing them. As a result, organizations now look at Hadoop as a data platform to complement existing BI data warehouses, and as a scalable, flexible and cost-effective solution for data storage and analysis.

Join Datameer and Cloudera in this webinar to discuss how Hadoop and big data analytics can help to:

- Get all the data your business needs quickly into one environment
- Shorten the time to insight from months to days
- Extend the life of your existing data warehouse investments
- Enable your business analysts to ask and answer bigger questions

Published in: Technology, Business

  1. 1. Complement Your Existing Data Warehouse with Big Data & Hadoop © 2013 Datameer, Inc. All rights reserved.
  2. 2. View Recording ▪ You can view the recording of this webinar at:
  3. 3. About our Speakers: Karen Hsu – Karen is Senior Director, Product Marketing at Datameer. With over 15 years of experience in enterprise software, she has co-authored 4 patents and worked in a variety of engineering, marketing and sales roles. – Most recently she came from Informatica, where she worked with the start-ups Informatica purchased to bring data quality, master data management, B2B and data security solutions to market. – Karen has a Bachelor of Science degree in Management Science and Engineering from Stanford University.
  4. 4. About our Speakers: Jeff Bean – Jeff Bean has been at Cloudera since 2010. He has helped several of Cloudera's most important customers and partners through their adoptions of Hadoop and HBase, including cluster sizing, deployment, operations, application design, and optimization. – Jeff has also spent time on Cloudera's training team, where he focused on partner enablement, training hundreds of field personnel in Hadoop, its usage, and its position in the market. Jeff currently does partner engineering at Cloudera, where he handles field support, certifications, and joint engagements with partners such as Datameer.
  5. 5. How Big Data Analytics and Hadoop Complement Your Existing Data Warehouse. Jeff Bean, Cloudera; Karen Hsu, Datameer
  6. 6. Agenda •  Why optimize? •  What to optimize? •  How to optimize? •  Who has optimized already? •  Conclusion
  8. 8. EDW Expansion: A Vicious Cycle § Increasing numbers of users § Growing volumes of data § Additional data sources § New use cases § Degraded quality of service and inability to meet SLAs § Constant pressure to purchase additional capacity § Enterprise Data Warehouse
  9. 9. Hadoop vs. Data Warehouse: Freeing up Capacity for High Value Workloads. Today: All growth accommodated by incremental investment in DW. 100 TB, 100% Data Growth. Data Warehouse: $20,000 - $100,000 / TB. 100 TB + 100 TB More Capacity in Data Warehouse. Incremental Spend: $2 to $10 Million
  10. 10. Hadoop vs. Data Warehouse: Freeing up Capacity for High Value Workloads. Future: Hadoop offloads data and workloads to defer/avoid incremental spend and reduce data management TCO. 100 TB: Lower Value Data, High Value Data. Keep the Right Data in the Data Warehouse System: • Operational Analytics • Reporting • Business Analytics. 50 TB. 100 TB. Cloudera / Datameer (Total Cost of Cluster): $1,000 - $2,000 / TB. 50 TB. Incremental Spend: $240,000 - $300,000 ACV. Use Hadoop for Everything Else: • Historical Data • Data Processing • Ad Hoc Exploratory • Transformation / Batch • Data Hub. Savings: $1.85 to 9.8 MM
  11. 11. Agenda •  Why optimize? •  What to optimize? •  How to optimize? •  Who has optimized already? •  Conclusion
  12. 12. Assessing Workloads and Data. Data Warehouse WORKLOADS: Analytics, Self-Service BI, Operational Business Intelligence, Data Processing (ELT). DATA: Operational Data, Archival Data, Staged Data. ▪ Data Processing (ELT) – Staged data, to be processed – Temp tables, BLOB/CLOB types, … ▪ Analytics / Machine Learning – Deep and broad data sets, within and beyond the warehouse ▪ Self-Service BI (Ad-Hoc Query) – Operational data, actively used for BI – Archival data, inactively used for BI
  13. 13. Offload Data Processing (ELT). What? Key Capabilities: Integrate any type of data with pre-built connectors; High-scale batch data processing; High availability, disaster recovery, downtime-less upgrades; Low-latency SQL processing. Benefits of Cloudera and Datameer: Over 2X the performance at 1/10th the cost; 96% reduction in ETL time
  14. 14. Offload Analytics / Machine Learning. What? Training & scoring predictive models; Deep and broad data sets. Key Capabilities: Drag-and-drop Data Mining and Machine Learning for a business analyst; Automated support for Clustering, Recommendations, Decision Tree, and Column Dependencies; Ability to run SAS, R natively on the same cluster. Benefits of Cloudera and Datameer: Greater flexibility at 1/10th the cost; Expand data mining and machine learning to analysts
  15. 15. Offload Self-Service Business Intelligence. Workload: Self-Service BI, Exploratory BI, Data Discovery; Unknown Questions. Key Capabilities: 250+ prebuilt analytics functions; Open source interactive SQL; Transparency and governance. Benefits of Cloudera and Datameer: Better flexibility at 1/10th the cost; Reduce analysis time from 4 weeks to 3 days
  16. 16. Complementing the Data Warehouse [Diagram labels: Data Warehouse; Enterprise Applications (High $/Byte); Load; OLTP; ETL; Archive; CLOUDERA / DATAMEER: Analyze, Integrate, Vis, Batch Process, Storage; Operational BI; Query; Search; Business Intelligence; Archival Data, Exploration, Analytics]
  17. 17. Agenda •  Why optimize? •  What to optimize? •  How to optimize? •  Who has optimized already? •  Conclusion
  18. 18. Process: Define, Integrate, Prepare and Analyze, Visualize and Validate, Deploy. Ad Hoc; Production
  19. 19. Define. Profile and Assess: Workloads in EDW; Ability to migrate; Size of data set. Prioritize: Constraints; Portability; Disruption. Identify: Use cases; Return on investment.
  20. 20. Integrate. Migration: Data ingest paths; Map EDW workload to Cloudera. Codeless Integration: ELT, not ETL; 50+ Datameer connectors, plug-in API.
  21. 21. Prepare and Analyze. Interactive Data Preparation: Ensure data quality; Enrich data. Interactive + Smart Analytics: 250+ built-in functions; Automated machine learning. Transparency + Governance: Visual data lineage; Complete audit trail; Metadata catalog.
  22. 22. Visualize and Validate. Visualization Anywhere: Infographic or dashboard; Run on tablets and smart phone devices. Validate: Verify results; Tune.
  23. 23. Deploy. Security: LDAP / Active Directory; Role based access control; Support for Kerberos. Scheduling: Dependency triggers; Data synchronization; External scheduling integration. Monitoring: Monitoring system, jobs, performance, throughput; Error handling; Log management.
  24. 24. Role: Responsibilities. Admin: Set up and maintain environment. Business Analyst: Work with partners to define requirements and define goals. Deployment Team: Set up monitoring and scheduling. ETL Architect: Prepare and cleanse data.
  25. 25. Roles Mapped to Process. Define (BA): Define goals, results, sources, requirements. Integrate (Admin): Source data, secure for ad hoc. Prepare & Analyze (BA / Arch.): Cleanse, combine, enrich data; create analysis. Visualize (BA): Create infographics, dashboards. Deploy (Admin / Deploy. Team): Business: validate with end users; Technical: secure, monitor, schedule.
  26. 26. Use Cases: Customer; Operational; Fraud and Compliance
  27. 27. Customer Reduce customer acquisition costs by 30%
  28. 28. Identify $2B in fraudulent transactions [Image labels: POS Reports, Location Data, Transactions, Authorizations]
  29. 29. Improve customer service, development, sales. Doubling in size every 15 months. [Image labels: Structured Logs, Network Data, Unstructured Logs]
  30. 30. Calculating ROI is a process
  31. 31. Apply ROI to Multiple Projects
  32. 32. Calculating Return
  33. 33. Business Benefits. Funnel Optimization: Increase customer conversion by 3x. Behavioral Analytics: Increase revenue by 2x. Fraud Prevention: Identify $2B in potential fraud. Customer Segmentation: Lower customer acquisition costs by 30%.
  34. 34. EDW Optimization: Enterprise Data Warehouse • Discover fraud in less time (from 2 days to 2 hours), save $30M on DR • Avoid tens of millions in expansion purchases • Offload 90% of all data • Shrank EDW footprint by 4PB, 20x performance boost
  35. 35. Call to Action ▪  ROI and Solution Development Consultation ▪  Join us at Hadoop World ▪  Contacts –  Jeff Bean –  Karen Hsu
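
The cost comparison on slides 9 and 10 comes down to simple arithmetic, which can be sketched as follows. This Python snippet is an illustration only and is not part of the original deck: it uses the per-TB price ranges and the 100 TB growth figure quoted on the slides, while the function and variable names are hypothetical. It reproduces the $2 to $10 million warehouse expansion figure. It does not try to reproduce the ACV spend or the savings range on slide 10, because the deck does not show the inputs behind those numbers.

```python
# Figures are taken from slides 9 and 10; all names below are illustrative.
DW_COST_PER_TB = (20_000, 100_000)    # data warehouse, USD per TB (slide 9)
HADOOP_COST_PER_TB = (1_000, 2_000)   # Cloudera / Datameer cluster, USD per TB (slide 10)
GROWTH_TB = 100                       # 100% growth on a 100 TB warehouse


def spend_range(tb, cost_per_tb):
    """Return the (low, high) spend in USD for `tb` terabytes."""
    low, high = cost_per_tb
    return tb * low, tb * high


# Cost of absorbing all growth in the warehouse versus in a Hadoop cluster.
dw_low, dw_high = spend_range(GROWTH_TB, DW_COST_PER_TB)
hadoop_low, hadoop_high = spend_range(GROWTH_TB, HADOOP_COST_PER_TB)

print(f"Warehouse expansion: ${dw_low:,} to ${dw_high:,}")
print(f"Hadoop cluster at per-TB rates: ${hadoop_low:,} to ${hadoop_high:,}")
```

At the quoted rates, the warehouse expansion comes to $2,000,000 to $10,000,000, matching slide 9, while the same 100 TB priced at the per-TB cluster rate would be $100,000 to $200,000.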