
Integrating Hadoop into your enterprise IT environment

1,829 views


http://bit.ly/1M8gzAM – As the old saying goes, "it's not what you do, but how you do it" that makes all the difference. The benefits of Hadoop are well-documented as mainstream adoption continues to grow. However, as with any new technology, integrating Hadoop with your existing data management infrastructure is crucial for getting the maximum value from its capabilities.

Join us for a special roundtable webcast on July 10th to learn how to do it the right way. Gain a deeper understanding of the fundamentals of Hadoop and its growing ecosystem, the key considerations for modifying your current data management practices and the types of Big Data applications you'll be able to build.

Published in: Technology, Business


  1. © 2014 MapR Technologies
  2. MapR Distribution for Hadoop Overview
     • Top Ranked
     • Exponential Growth
     • 500+ Customers
     • Cloud Leaders
     • 3X bookings, Q1 ’13 – Q1 ’14
     • 80% of accounts expand 3X
     • 90% software licenses
     • <1% lifetime churn
     • >$1B in incremental revenue generated by 1 customer
  3. Topics for Today
     • Hadoop Trends and Realities
     • Hadoop Deployment Model
     • Integrating Hadoop into Your IT Environment
  4. 3 Trends Forcing a revolution in enterprise architecture
  5. Trend 1: Industry Leaders Compete and Win with Data
     • More Data Beats Better Algorithms
       – Collecting interaction data from ecommerce, social media, offline, and call centers enables a “customer 360 view” and consumer intimacy
     • Competitive Advantage is Decided by 0.5%
       – Consumer financial services: 1% improvement in fraud detection means hundreds of millions of dollars
       – Advertising and retail: 0.5% improvement in lift means millions of dollars increase in profitability
  6. Trend 2: Big Data is Overwhelming Traditional Systems
     • Enterprise Data Architecture: outside sources, operational systems, analytical systems, enterprise users
     • Production requirements, operational systems
       – Mission-critical reliability
       – Transaction guarantees
       – Deep security
       – Real-time performance
       – Backup and recovery
     • Production requirements, analytical systems
       – Interactive SQL
       – Rich analytics
       – Workload management
       – Data governance
       – Backup and recovery
  7. Trend 3: Hadoop: The Disruptive Technology at the Core of Big Data
     [Chart: job trends from Indeed.com, Jan ’06 to Jan ’14]
  8. Common Use Cases: Taking Advantage of Hadoop
     • Enterprise data hub
       – Multi-structured data staging & archive
       – ETL / DW optimization
       – Mainframe optimization
       – Data exploration
     • Marketing optimization
       – Recommendation engines & targeting
       – Customer 360
       – Click-stream analysis
       – Social media analysis
       – Ad optimization
     • Risk & security optimization
       – Network security monitoring
       – Security information & event management
       – Fraudulent behavioral analysis
     • Operations intelligence
       – Supply chain & logistics
       – System log analysis
       – Manufacturing quality assurance
       – Preventative maintenance
       – Smart meter analysis
  9. And 2 Realities
  10. Reality 1: Hadoop now on the critical path
      • Hadoop sits between operational systems, analytical systems, and enterprise users
        – Data staging
        – Archive
        – Data transformation
        – Data exploration
        – Streaming, interactions
      • Keys for Production Success
        – 1 Reliability and DR
        – 2 Interoperability
        – 3 High performance
        – 4 Supports operations and analytics
  11. Reality 2: Moving towards operational applications
      • The transition from batch to real-time
        – 2003: GFS
        – 2004: MapReduce
        – 2004: Web index is batch (GFS/MapReduce)
        – 2006: BigTable
        – 2010: Web index is real-time (BigTable)
      • The explosion in operational applications
        – Google’s operational data store (BigTable) has enabled multiple revolutions within the company: (1) (2)
  12. Hadoop Deployment Model
  13. Modern Data Architecture for Hadoop
      • Sources: relational, SaaS, mainframe; documents, emails; log files, clickstreams; sensors; blogs, tweets, link data
      • Data warehouse
      • Data movement; data access
      • Analytics: search; schema-less data exploration; BI, reporting; ad-hoc integrated analytics
      • Data transformation, enrichment and integration
      • MapR Distribution for Hadoop: streaming (Spark Streaming, Storm); NoSQL ODBMS (HBase, Accumulo, …); batch / search (MR, Spark, Hive, Pig, …); data storage platform
      • Operational apps: recommendations, fraud detection, logistics
      • Optimized data architecture; machine learning
  14. Data Warehouse Optimization (Fortune 100 telco)
      Improve data services to customers while reducing enterprise architecture costs
      • Objectives
        – Provide cloud, security, managed services, data center, & comms
        – Report on customer usage, profiles, billing, and sales metrics
        – Improve service: Measure service quality and repair metrics
        – Reduce customer churn – identify and address IP network hotspots
      • Challenges
        – Cost of ETL & DW storage for growing IP and clickstream data; >3 months
        – Reliability & cost of Hadoop alternatives limited ETL & storage offload
      • Solution
        – MapR Data Platform for data staging, ETL, and storage at 1/10th the cost
        – MapR provided smallest datacenter footprint with best DR solution
        – Enterprise-grade: NFS file management, consistent snapshots & mirroring
      • Business Impact
        – Increased scale to handle network IP and clickstream data
        – Reduced workload on DW to maintain reporting SLAs to business
        – Unlocked new insights into network usage and customer preferences
  15. Operational Apps: Push Messaging Platform
      MapR: Enabling the “smartest, most aware, precise, easy-to-use, scalable, secure and powerful push messaging platform on the planet”
      • Objectives and challenges
        – Enable organizations to build one-on-one brand relationships
        – Push messaging and geo-location targeting that
        – Support large numbers of customers in a multi-tenant platform
        – Target specific consumers in real time with relevant offers
        – Increase reliability of push messaging while lowering data center costs
      • Solution
        – MapR Distribution for Hadoop with Apache HBase for operational workloads
        – Data placement control enables efficient cluster resource management
      • Business Impact
        – Increasing engagement and customer loyalty for 100s of leading brands
        – Reduced hardware footprint by 50%
        – Consolidated 8 Hadoop clusters into 1 MapR cluster
  16. Integrating Hadoop into Enterprise Environments
  17. Hadoop Success Depends on
      • Enterprise Grade Functionality
      • Scaling for the Future
  18. Integrating Hadoop into the Enterprise: Enterprise-Grade Functionality + Future Proofing
      • Enterprise Requirements
        – 1. Low TCO
  19. TCO: Core to Hadoop evolution
      • Hadoop TAM comes from disrupting enterprise data warehouse and storage spending
      • Data growing at 40%; IT budgets growing at 2.5% (chart: 2013 to 2017)
      • $ per terabyte: enterprise storage $9,000; database warehouse $40,000; Hadoop <$1,000
      • Sources
        – Gartner, "Forecast Analysis: Enterprise IT Spending by Vertical Industry Market, Worldwide, 2010-2016, 3Q12 Update."
        – Wall Street Journal, "Financial Services Companies Firms See Results from Big Data Push", Jan. 27, 2014
  20. Better Performance with Less Hardware
      • New MinuteSort world record: 1.65 TB in 1 minute on 298 nodes (MapR: With a Fraction of the Hardware)
      • Previous record: 1.6 TB with 2200 nodes
  21. Integrating Hadoop into the Enterprise: Enterprise-Grade Functionality + Future Proofing
      • Enterprise Requirements
        – 1. Low TCO
        – 2. Trusted Data
  22. Data Protection: Replication and Snapshots
      • Replication
        – Protect from hardware failures
        – File chunks, table regions and metadata are automatically replicated (3x by default)
        – At least one replica on a different rack
      • Snapshots
        – Protect from user and application errors
        – Point-in-time recovery
        – Redirect on write
        – No performance or scale impact
        – Read files and tables directly from snapshot
      [Diagram: chunks C1 to C7, each shown three times across nodes]
  23. Hadoop Security
      • Authorization to ensure the right access to files and databases
      • Authentication for users and user-created job requests
      • Encryption to ensure user credentials and data are always secure
      • Integration with existing security infrastructure
  24. Integrating Hadoop into the Enterprise: Enterprise-Grade Functionality + Future Proofing
      • Enterprise Requirements
        – 1. Low TCO
        – 2. Trusted Data
        – 3. Application SLAs
  25. High Availability (HA) Everywhere
      • Metadata HA
        – Distributed metadata can self-heal
        – No practical limit on # of files
      • MapReduce/YARN HA
        – Jobs are not impacted by failures
        – Meet your data processing SLAs
      • Instant recovery
        – Files and tables are accessible within seconds of a node failure or cluster restart
      • Rolling upgrades
        – Upgrade the software with no downtime
      • HA is built in
        – No special configuration to enable HA
  26. Disaster Recovery: Mirroring
      • Flexible
        – Choose the volumes/directories to mirror
        – You don’t need to mirror the entire cluster
        – Active/active
      • Fast
        – No performance impact
        – Automatic compression
      • Safe
        – Point-in-time consistency
        – End-to-end checksums
      • Easy
        – Graceful handling of network issues
        – No third-party software
        – Takes less than two minutes to configure!
      [Diagram labels: Production, Research, Datacenter 1, Datacenter 2, WAN, EC2]
  27. Integrating Hadoop into the Enterprise: Enterprise-Grade Functionality + Future Proofing
      • Enterprise Requirements
        – 1. Low TCO
        – 2. Trusted Data
        – 3. Application SLAs
        – 4. Open Standards
  28. Seamless Integration with NFS
      • POSIX compliance
        – Random reads/writes
        – Simultaneous reading and writing to a file
        – Compression is automatic and transparent
      • Industry-standard NFS interface (in addition to HDFS API)
        – Stream data into the cluster
        – Leverage thousands of tools and applications
        – Easier to use non-Java programming languages
        – No need for most proprietary Hadoop connectors
  29. When Hadoop Looks Like a NAS…
      • Data ingestion is easy
        – Popular online gaming company changed data ingestion from a complex Flume cluster to a 17-line Python script
      • Database bulk import/export with standard vendor tools
        – Large telco saved $30M on EDW costs (5 years) by leveraging MapR to pre-process and store raw data prior to loading into EDW
      • 1000s of applications/tools
        – Existing Linux commands, browsers work out of the box
      [Diagram: application servers and logs, with example commands]
        $ find . | grep log
        $ cp
        $ vi results.csv
        $ scp
        $ tail -f part-00000
  30. Integrating Hadoop into the Enterprise: Enterprise-Grade Functionality + Future Proofing
      • Enterprise Requirements
        – 1. Low TCO
        – 2. Trusted Data
        – 3. Application SLAs
        – 4. Open Standards
      • Future Proofing
        – 1. Freedom of Choice
  31. Pick the Right Tool for the Job
  32. Freedom of Choice
      • MapR Data Platform with Apache Hadoop and OSS ecosystem; management; security
      • Diagram categories: execution engines; data governance and operations; batch; streaming; NoSQL & search; ML, graph; SQL; provisioning & coordination; workflow & data governance; data integration & access
      • Components shown: YARN, MR v1 & v2, Pig, Cascading, Spark, Spark Streaming, Storm*, HBase, Solr, Juju, Savannah*, Mahout, MLLib, GraphX, Tez*, Accumulo*, Hive, Impala, Shark, Drill*, Sentry*, Oozie, ZooKeeper, Sqoop, Knox*, Whirr, Falcon*, Flume, HttpFS, Hue
      • * = 2014 timeline
  33. Integrating Hadoop into the Enterprise: Enterprise-Grade Functionality + Future Proofing
      • Enterprise Requirements
        – 1. Low TCO
        – 2. Trusted Data
        – 3. Application SLAs
        – 4. Open Standards
      • Future Proofing
        – 1. Freedom of Choice
        – 2. Multiple Users
  34. Volumes
      • 100K volumes are OK, create as many as needed
      • Volumes dramatically simplify management of multiple users:
        – Replication factor
        – Scheduled mirroring
        – Scheduled snapshots
        – Data placement control
        – User access and tracking
        – Administrative permissions
      [Diagram: directory tree with /projects (/tahoe, /yosemite) and /user (/msmith, /bjohnson)]
  35. Multi-tenancy
      • Isolation
        – Tasks sandboxed so they don’t impact other tasks or system daemons
        – System resources protected from runaway jobs
        – Volume-based data placement
        – Label-based job scheduling
      • Quotas
        – Storage quotas by volume/user/group
        – CPU and memory quotas by queue/user/group
      • Security and delegation
        – Wire-level authentication and encryption (Kerberos not required)
        – Fine-grained administration permissions including volume-level delegation
        – Authenticate users to AD, LDAP and Kerberos via Linux PAM
      • Reporting
        – Detailed reporting on resource usage (75+ different metrics)
        – All reports are available via UI, CLI and REST API
  36. Integrating Hadoop into the Enterprise: Enterprise-Grade Functionality + Future Proofing
      • Enterprise Requirements
        – 1. Low TCO
        – 2. Trusted Data
        – 3. Application SLAs
        – 4. Open Standards
      • Future Proofing
        – 1. Freedom of Choice
        – 2. Multiple Users
        – 3. Operational Applications
  37. Operations + Analytics on One Platform
      • Real-time operational applications: online transactions, fraud detection, personalized offers
      • Analytics: clickstream analysis, fraud investigation tool
      • On Hadoop: fraud model, recommendations table
      • Users: fraud investigator, interactive marketer
  38. Recap
  39. Integrating Hadoop into the Enterprise: Enterprise-Grade Functionality + Future Proofing
      • Enterprise Requirements
        – 1. Low TCO
        – 2. Trusted Data
        – 3. Application SLAs
        – 4. Open Standards
      • Future Proofing
        – 1. Freedom of Choice
        – 2. Multiple Users
        – 3. Operational Applications
  40. From Redundant Processing Silos and Data Science Experiments…
      Opportunity to Revolutionize Enterprise Data Architecture
  41. … to Consolidated Operational and Analytical Workloads
      The Production Enterprise Data Hub: Hadoop
  42. Q&A
      Engage with us! @mapr, maprtech, nitin@mapr.com, MapR, maprtech, mapr-technologies
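
The TCO slide (slide 19) contrasts data growing at 40% with IT budgets growing at 2.5% over 2013 to 2017. The sketch below works through that arithmetic. It assumes both rates are annual and compound from a 2013 baseline of 1.0, which the slide does not state explicitly; the function names are illustrative only.

```python
# Assumption: both rates are annual and compound from a 2013 baseline of 1.0.
DATA_GROWTH = 0.40     # "data growing at 40%" (slide 19)
BUDGET_GROWTH = 0.025  # "IT budgets growing at 2.5%" (slide 19)


def compound(rate, years):
    """Return the growth multiple after compounding `rate` for `years` years."""
    return (1.0 + rate) ** years


def growth_gap(start_year=2013, end_year=2017):
    """Return (year, data multiple, budget multiple, data/budget ratio) rows."""
    rows = []
    for year in range(start_year, end_year + 1):
        elapsed = year - start_year
        data = compound(DATA_GROWTH, elapsed)
        budget = compound(BUDGET_GROWTH, elapsed)
        rows.append((year, data, budget, data / budget))
    return rows


if __name__ == "__main__":
    for year, data, budget, gap in growth_gap():
        print(f"{year}: data x{data:.2f}, budget x{budget:.2f}, gap x{gap:.2f}")
```

Under these assumptions the data volume grows several times over while the budget barely moves, which is the gap the slide attributes to lower-cost-per-terabyte storage.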

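The MinuteSort comparison on slide 20 (1.65 TB on 298 nodes versus 1.6 TB on 2200 nodes) can be restated as data sorted per node. The short calculation below uses only the four figures from the slide; it says nothing about how the two clusters' hardware compared, which the slide does not describe.

```python
def tb_per_node(terabytes, nodes):
    """Terabytes sorted per node in the one-minute benchmark window."""
    return terabytes / nodes


# Figures as quoted on slide 20.
previous_record = tb_per_node(1.6, 2200)
new_record = tb_per_node(1.65, 298)

# How many times more data each node sorted in the new record.
per_node_ratio = new_record / previous_record

if __name__ == "__main__":
    print(f"previous: {previous_record * 1000:.2f} GB per node")
    print(f"new:      {new_record * 1000:.2f} GB per node")
    print(f"ratio:    {per_node_ratio:.1f}x")
```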
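
Slide 29 mentions replacing a Flume cluster with a 17-line Python script once the cluster is reachable over NFS. That script is not shown in the deck, so the following is only a sketch of what such an ingestion step might look like: it copies log files into a directory using ordinary file operations. The function name and both paths in the usage note are hypothetical, and the mount point depends on how the cluster is exported.

```python
import shutil
from pathlib import Path


def ingest_logs(source_dir, cluster_dir, pattern="*.log"):
    """Copy matching files from source_dir into cluster_dir.

    cluster_dir is assumed to be an NFS-mounted cluster directory, so plain
    file copies are enough. Files already present with the same size are
    skipped. Returns the list of destination paths that were written.
    """
    source = Path(source_dir)
    target = Path(cluster_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for log_file in sorted(source.glob(pattern)):
        destination = target / log_file.name
        if destination.exists() and destination.stat().st_size == log_file.stat().st_size:
            continue  # already ingested
        shutil.copy2(log_file, destination)
        copied.append(destination)
    return copied
```

A call such as `ingest_logs("/var/log/app", "/mapr/my.cluster.com/ingest/logs")` would then be all the ingestion logic needed; both paths here are placeholders rather than values from the deck.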