Delivering on the Hadoop/HBase Integrated Architecture
Published in: Technology

  • Transcript

    • 1. © 2014 MapR Technologies. Delivering on the Hadoop/HBase Integrated Architecture
    • 2. Topics: Getting Started; Architectures; In-Hadoop Databases in Action; Introduction to In-Hadoop Databases; What’s Next?
    • 3. What Would (Did) Google Do?
        • Timeline: 2003 GFS; 2004 MapReduce; 2006 BigTable
        • Google’s operational data store (BigTable) has enabled multiple revolutions within the company:
            – The explosion in operational applications
            – The transition from batch to real-time: in 2004 the web index is batch (GFS/MapReduce); in 2010 the web index is real-time (BigTable)
    • 4. Operations vs. Analytics
        • Operations (Databases)
            – Real-time
            – Reads/writes/updates
            – Current/recent data
            – Updated regularly
            – Fast inserts/updates
            – Large volumes of data
        • Analytics (Data Warehouses)
            – Batch
            – Reports
            – Historical data
            – Generally non-volatile
            – Fast retrievals
            – Even larger volumes of data
        • But is the data different?
    • 5. Typical Integration
        • Operational: mobile application server; web application server; operational DBMS (e.g., Oracle, MongoDB)
        • Batch import/export between the operational DBMS and Hadoop
        • Analytics: Hadoop; data exploration (SQL); Customer 360 dashboard; churn analysis (predictive analytics)
        • Is this okay?
    • 6. In-Hadoop Databases
        • Operational: mobile application server; web application server
        • Data held in the in-Hadoop database:
            – User profiles and state
            – User interactions
            – Real-time location data
            – Web and mobile session state
            – Comments/rankings
        • Real-time and actionable analytics: data exploration (SQL); Customer 360 dashboard; churn analysis (predictive analytics); product/service optimization and personalization; real-time ad targeting
    • 7. What Do You Get with In-Hadoop?
        • Real-time analysis/computation
        • Architectural simplicity
            – No duplication of data
            – Fewer disparate clusters to manage, less risk of error
        • Reduced network bandwidth utilization
    • 8. Topics: Getting Started; Architectures; In-Hadoop Databases in Action; Introduction to In-Hadoop Databases; What’s Next?
    • 9. Customer data, network security event data
        • Anomaly detection on large volumes of security event data
        • Analytics on customer data to enable incremental sales
    • 10. Sales data analysis, SaaS-based reporting
        • Large-scale analytics on POS data combined with fast responsiveness on large reports
        • SaaS-delivered reports
    • 11. Industry data analysis, SaaS-based reporting
        • Advertising Automation Cloud; Buyers Cloud
        • Sales performance management data combined with fast responsiveness
        • SaaS-delivered reports
    • 12. Telecommunications Company
        • Customer profile data, customer behavior data
        • Analytics on customer behavior for better recommendations
    • 13. Financial Services Firm
        • Customer profile and transaction data
        • Anomaly detection on customer transactions
        • Recommendations based on large-scale analysis of purchases
        • Customer care
    • 14. Topics: Getting Started; Architectures; In-Hadoop Databases in Action; Introduction to In-Hadoop Databases; What’s Next?
    • 15. Databases on Direct Attached Storage (DAS)
        • Layout: three database servers, each with its own database files on local ext3/4
        • Pros:
            – Fast local file access
            – Lower cost vs. SAN/NAS
        • Cons:
            – No storage management
            – Wasted capacity
            – No snapshots for backup
            – Unreliable storage
            – Add nodes to scale capacity
    • 16. Databases on Networked Storage (SAN/NAS)
        • Layout: three database servers, with their database files on shared SAN/NAS
        • Pros:
            – Snapshot/backup
            – Easy capacity expansion
            – Disaster recovery
            – Improved disk utilization
            – Seamless maintenance
            – Reliable
        • Cons:
            – File access is remote
            – Expensive
    • 17. Databases on Hadoop (“In-Hadoop”)
        • Layout: three database servers, with their database files stored in Hadoop
        • Pros:
            – Reduced complexity
            – Lower operational cost
            – Faster local file access
            – Easy capacity expansion
            – Dynamic storage utilization
    • 18. Lambda Architecture (lambda-architecture.net)
        • Input: new data stream
        • Batch layer: all data (HDFS); precompute views (MapReduce); Hadoop batch recompute; output is the batch views
        • Speed layer: process stream; increment views; Storm real-time increment; output is the real-time views over real-time data
        • Serving layer: merge the partial aggregates from the batch views and the real-time views into the merged view (HBase)
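The batch/speed/serving split on the Lambda Architecture slide can be sketched in a few lines. This is only an illustration under stated assumptions: plain in-memory counters stand in for HDFS, MapReduce, Storm, and HBase, and the page-view events and all function and class names are hypothetical, not taken from the deck.

```python
from collections import Counter

def batch_recompute(all_events):
    """Batch layer: recompute the whole view from the full dataset."""
    return Counter(event["page"] for event in all_events)

class SpeedLayer:
    """Speed layer: increment a real-time view one event at a time."""
    def __init__(self):
        self.view = Counter()

    def increment(self, event):
        self.view[event["page"]] += 1

def merged_view(batch_view, realtime_view):
    """Serving layer: merge the two partial aggregates into one view."""
    return batch_view + realtime_view

# Hypothetical data: events already in the batch store, and events that
# arrived after the last batch recompute.
all_data = [{"page": "/home"}, {"page": "/home"}, {"page": "/pricing"}]
new_stream = [{"page": "/home"}, {"page": "/docs"}]

batch_view = batch_recompute(all_data)
speed = SpeedLayer()
for event in new_stream:
    speed.increment(event)

merged = merged_view(batch_view, speed.view)
```

Counts are used here because they merge by simple addition; aggregates that do not combine this way would need a different merge step in the serving layer.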
    • 19. Some of the Pieces for Hadoop
        • SQL query engines: Impala
        • Streaming technologies: Kafka, Flume, Scribe
    • 20. Enterprise Data Hub Architecture
        • Steps: load more data sources; enrich data in Hadoop; analyze; offload / enrich / reload
        • Data sources: relational, SaaS, mainframe; documents, emails; log files, clickstreams; blogs, tweets, link data
        • Data movement: parse, profile, ETL; cleanse, match; load; replicate, CDC; streaming; high-speed streaming
        • MapR Distribution for Hadoop:
            – MapR Control System (MCS); Hadoop User Experience (HUE)
            – Batch processing: MR, YARN, Hive, Pig, etc.
            – M7, HBase, other data stores
            – Interactive querying: Drill, Impala, Presto, etc.
            – MapR Data Platform; MapR M7 Tables
        • Downstream: data marts; data warehouse; BI reports and applications
    • 21. Topics: Getting Started; Architectures; In-Hadoop Databases in Action; Introduction to In-Hadoop Databases; What’s Next?
    • 22. Caveats
        • Operational and analytical workloads demand different design points
        • MapReduce and HBase don’t like to share
        • YARN will help, but getting guaranteed capacity is still an issue
        • HBase compactions make the challenge harder
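The compaction caveat is easier to picture with a toy model. A compaction rewrites several sorted store files into a single file, so every cell is read and written again, and that I/O competes with whatever else the cluster is running. The sketch below is a simplified illustration and not HBase's implementation: it assumes one value per row key per file, uses `None` as a hypothetical delete marker, and keeps only the newest version of each row.

```python
import heapq

def compact(store_files):
    """Merge sorted store files into one sorted file.

    store_files is ordered oldest to newest. Each file is a list of
    (row_key, value) pairs sorted by row_key; a value of None marks a delete.
    """
    # Tag each cell with the negated file age so that, for equal row keys,
    # the cell from the newest file sorts first.
    tagged = (
        [(key, -age, value) for key, value in store_file]
        for age, store_file in enumerate(store_files)
    )
    merged = []
    last_key = object()  # sentinel that equals no real row key
    for key, _neg_age, value in heapq.merge(*tagged):
        if key == last_key:
            continue  # older version of a row already handled
        last_key = key
        if value is not None:  # drop rows whose newest version is a delete
            merged.append((key, value))
    return merged

older = [("row1", "a"), ("row2", "b"), ("row3", "c")]
newer = [("row2", "B"), ("row3", None), ("row4", "d")]
compacted = compact([older, newer])
```

Even in this toy version, every cell in every input file is read once, which is why compactions add load on top of the operational and analytical work sharing the cluster.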
    • 23. Plan Ahead
        • Expect to learn much more along the way, no matter how much you’ve already done
        • Talk to folks with successful deployments
        • Start with the lighter operational loads
        • Reserve plenty of resources in your cluster to handle both operational and analytical tasks
    • 24. Topics: Getting Started; Architectures; In-Hadoop Databases in Action; Introduction to In-Hadoop Databases; What’s Next?
    • 25. Recent HBase Innovations
        • Apache HBase 0.98
            – Released in early 2014
            – 212 resolved JIRAs
            – New features, performance improvements, API cleanup, many bug fixes
        • C API for HBase (libHBase)
            – JIRA HBASE-1015
            – Written by Aditya Kishore
            – https://github.com/mapr/libhbase
    • 26. Other In-Hadoop Database Technologies
        • Databases in Hadoop
            – Apache Accumulo
            – Splice Machine
            – MarkLogic
            – OhmData
            – MapR M7
        • Data Warehouses on Hadoop
            – HP Vertica
            – Pivotal HAWQ
            – Teradata Aster Big Data Analytics Appliance
    • 27. Q&A
        • Engage with us!
        • @mapr; maprtech; MapR; maprtech; mapr-technologies
        • dalekim@mapr.com