The document discusses big data and MapR's big data solutions. It provides an overview of key big data concepts like the growth of digital data, common use cases, and the big data analytics lifecycle. It also summarizes MapR's enterprise-grade platform for Hadoop, highlighting features like high availability, security, and support for real-time and batch processing workloads. Example customer implementations from HP and Cisco are described that demonstrate how MapR has helped companies gain business insights from large volumes of diverse data.
6. The Growth of the Digital Universe is Accelerating
2009: 0.8 ZB
2013: 4.5 ZB
2020: 40 ZB
Source: IDC Digital Universe Study 2013
Only 12%* of an enterprise’s data being used currently
*Forrester: Forrester Research Inc., Forrsights Business Intelligence Big Data Survey, Q3 2012
7. Why do I need to invest in big data initiatives?
13. The complete big data analytics lifecycle must be addressed
Source: A CenturyLink Technology Solutions adaptation from Forrester Research, Inc., The Future of Customer Data Management, March 6, 2013
14. Common Big Data Use Cases
• Data Lake / Data Refinery
• Risk, Fraud, Compliance
• Network Monitoring
• Real-Time Recommendations / Offers
• Sentiment and Social Graph Analysis
• Machine Generated Data Analysis
• Marketing Campaign Analysis
• Customer Churn Analysis
• Customer Experience Analysis
15. A Basic Use Case is a Common Starting Point
• Break Down Data Silos
• Create Data Archive
• Gain Business Insights
Data Lake
Analytics
16. The Enterprise Big Data Model
Monitoring, Management, Orchestration and Provisioning
INFRASTRUCTURE LAYER
Compute, Storage, Network
DATA LAYER
Data Integration Tools
Hadoop, Enterprise Data Warehouse, DBMS, External Data Sources, NoSQL
INSIGHT LAYER
Data Discovery Tools
BI, Data Science, Visualization
Horizontal / Vertical Analytics
Marketing, Sales Execution, and Operations Apps
Real-Time Analytics
Streaming Applications
17. Big Data is Ideal for Managed Services
• Standard services to reduce costs
• Diverse range of use cases, data types
• Add-on products and services
• Flexible commercial models
• Enterprise data integration
Hadoop
Network Services
Strategy and Professional Services
Infrastructure-as-a-Service
Big Data Environment Planning and Implementation
Hadoop Management
Analytics (Custom, OTS)
18. Case Study
Agricultural Equipment Manufacturing Company
Business Challenge
• Manage collection, storage and manipulation of data
• Complexity and cost of big data as an in-house solution
• Provide more value to customers
Benefits
• Fast information transfer
• Easy-to-use web-based information delivery
• Customer can make business decisions based on new data sources
• Customer increases efficiencies based on historical data
Thank goodness people, including CxOs, are talking about data again.
Base: 51 North American enterprise IT decision-makers at firms that have adopted or are currently conducting a proof of concept of Big Data.
Source: A commissioned study conducted by Forrester Consulting on behalf of Savvis, October 2013
A second trend in enterprise architecture has been big data overwhelming the existing workload-specific systems which are in production. (list of requirements for each of these on the side in text)
People started with mainframes or operational systems which run ERP, finance, CRM and other mission-critical applications. They require… (pick out attributes you want to stress on the left)
You also have data warehouses, marts, data mining, and other analytical systems, which pull data from these operational and other systems to provide insights to the business for decision making.
The amount and variety of data have been overloading these systems. As you try to ingest new types of data, you reach a point where these systems are no longer cost-effective to scale to terabytes or petabytes of data.
The first reality is that as people put Hadoop into production to relieve the pressure on other systems in their enterprise architecture, it needs to be reliable. Hadoop needs to be held to the same enterprise standards as your Oracle, SAP, Teradata, NetApp storage, or any other enterprise system.
Many organizations are putting Hadoop into their data center to provide (list of use cases underneath) … it can do all of this and more, but
For Hadoop to act as a system of record, it must provide the same guarantees for SLAs, performance, data protection, and more.
Most importantly, Hadoop has the potential for both analytics AND operations. It can be used to optimize the data warehouse or to provide batch data refining or storage. But Hadoop can also handle many operational analytics or database operations/jobs when done right.
The first trend is that the industry leaders have shown how to use big data to compete and win in their markets. It’s no longer a nice-to-have: you need big data to compete.
Google pioneered MapReduce processing on commodity hardware and used that to catapult itself into the position of leading search engine, even though it was 19th in the market.
Yahoo! leveraged these ideas to create Hadoop to keep up with Google, and many mainstream companies have followed with new data-driven applications such as “people you may know” (started by LinkedIn and now used by Facebook, Twitter, and every social application), product recommendation engines, contextual and personalized music services (Beats), measuring digital media effectiveness (comScore), serving more relevant/targeted ads (Comcast, Rubicon Project), fraud and risk detection, healthcare efficacy, and more.
What makes the difference? A lot of attention is given to data science and developing sophisticated new algorithms, but in many cases just having more data beats better algorithms. (make point on collecting more consumer interaction as well as transaction data, as an example).
In addition, competitive advantage is decided by very small percentages. Just a 1% improvement in fraud detection can mean hundreds of millions of dollars in savings. A ½% lift in advertising effectiveness means millions in new product sales and profitability. The same can be applied to customer churn, disease diagnosis, and more.
The infrastructure layer is often overlooked, but is critical to supporting any successful big data stack
The storage layer is key to the functionality and performance of the data layer
The data layer brings together different technologies and data sources – but integration remains a pain point
The end user interacts with the analytics layer to analyze the data and extract business insights
Perfect time to tee up MapR as differentiator for operations
Hadoop doesn’t fit neatly in one layer of the stack
Hadoop is emerging as its own technology stack that spans analytics, data, and storage
MapR Hadoop has the broadest span of any distribution
Lowest level support in storage, managing disk spindles directly to optimize speed
Most comprehensive support for open source projects for analytics or data management
Differentiated M7 tables functionality that improves the latency and stability of HBase
MapR’s innovations have also expanded the use cases that are possible with Hadoop. Not only do we support the full Hadoop API set, but MapR also provides support for NFS, so any file-based application can access the cluster with no changes or rewrites required.
MapR provides ODBC support, so any database application or SQL-based tool can access and manipulate data in a MapR cluster.
MapR supports real-time streaming access. This greatly expands the applications that are possible with Hadoop, moving beyond a batch limitation. Finally, the full HA, DR, and data protection capabilities of MapR allow mission-critical apps to be deployed safely and allow administrators to meet stringent SLA targets.
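To make the NFS point above concrete, here is a minimal sketch in Python of what "no changes or rewrites" means for a file-based application: the code uses only ordinary file I/O and has no Hadoop-specific calls. The mount point is an assumption for illustration; a temporary directory stands in for wherever a cluster's NFS export would be mounted, and the file name `clicks.log` is hypothetical.

```python
import os
import tempfile


def count_lines(path):
    """Count lines in a file using only standard file I/O."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


# Assumption: a temporary directory stands in for the NFS mount point.
# On a real cluster this would be the directory where the export is mounted.
mount_point = tempfile.mkdtemp()

# Hypothetical file name, written and read with plain open() calls.
log_path = os.path.join(mount_point, "clicks.log")
with open(log_path, "w", encoding="utf-8") as handle:
    handle.write("click-1\nclick-2\nclick-3\n")

line_count = count_lines(log_path)
print(line_count)
```

The same `count_lines` function would work unchanged against a local disk or a mounted export, which is the property the notes are describing.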
The power of MapR begins with the power of open source innovation and community participation.
In some cases MapR leads the community in projects like Apache Mahout (machine learning) or Apache Drill (SQL on Hadoop)
In other areas, MapR contributes to and integrates Apache and other open source software (OSS) projects into the MapR distribution, delivering a more reliable and performant system with lower overall TCO and easier system management.
MapR releases a new version with the latest OSS innovations on a monthly basis. We add 2-4 new Apache projects annually as new projects become production ready and based on customer demand.
With MapR, Hadoop is lights-out data center ready
MapR provides five nines (99.999%) of availability, including support for rolling upgrades, self-healing, and automated stateful failover. MapR is the only distribution that provides these capabilities.
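As a quick check on what an availability figure like this implies, the sketch below converts an availability percentage into a yearly downtime budget. It is plain arithmetic, not a MapR tool, and it assumes a 365-day year.

```python
# Assumption: a non-leap year of 365 days.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability):
    """Return the yearly downtime allowed by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(0.99999)
three_nines = downtime_minutes_per_year(0.999)

print(f"99.999% availability: {five_nines:.2f} minutes of downtime per year")
print(f"99.9% availability:   {three_nines:.1f} minutes of downtime per year")
```

Under these assumptions, five nines leaves a budget of a little over five minutes per year, compared with nearly nine hours at three nines.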
MapR also provides dependable data storage with full data protection and business continuity features. MapR provides point-in-time recovery to protect against application and user errors. There is end-to-end checksumming, so data corruption is automatically detected and corrected with MapR’s self-healing capabilities. Mirroring across sites is fully supported.
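The general idea behind checksum-based detection and repair can be sketched in a few lines of Python. This is an illustration of the technique only, not MapR's implementation: the block layout, the use of CRC32, and the function names `write_block`, `is_intact`, and `read_with_repair` are all assumptions made for the example.

```python
import zlib


def write_block(data):
    """Store data together with a checksum computed at write time."""
    return {"data": data, "crc": zlib.crc32(data)}


def is_intact(block):
    """Recompute the checksum and compare it with the stored one."""
    return zlib.crc32(block["data"]) == block["crc"]


def read_with_repair(replicas):
    """Return data from the first intact replica and repair corrupted ones."""
    good = next((r for r in replicas if is_intact(r)), None)
    if good is None:
        raise IOError("all replicas are corrupted")
    for replica in replicas:
        if not is_intact(replica):
            # Overwrite the corrupted copy from the intact one.
            replica["data"] = good["data"]
            replica["crc"] = good["crc"]
    return good["data"]


# Three replicas of the same block (hypothetical payload).
replicas = [write_block(b"clickstream record") for _ in range(3)]

# Simulate silent corruption of one replica.
replicas[0]["data"] = b"clickstream recorX"

recovered = read_with_repair(replicas)
print(recovered, all(is_intact(r) for r in replicas))
```

The design point is that the checksum travels with the data, so a reader can detect a bad copy without trusting the disk that served it, and repair needs only one intact replica.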
All these features support lights out data center operations. Every two weeks an administrator can take a MapR report and a shopping cart full of drives and replace failed drives.
HP.com has a case study dedicated to clickstream analytics. It talks about “Apache Hadoop”, which is actually MapR http://www.vertica.com/wp-content/uploads/2013/02/HP_BigData_casestudy.pdf
Objectives:
- How to make HP.com better and more sticky, to improve cross-sell and upsell.
- Improved ability to identify and correct issues with website hardware or software, which reduces risks of degraded customer experience and lost sales
- Improved ability to deliver interactive, personalized website experience, which improves sales conversions and drives sales and revenue
“We capture 11 to 12 billion clicks per month,” Lormand says. To fully support trending and comparative analysis, HP must store around five years’ worth of clickstream data; analysts typically want to work with about 15 months’ worth at a time to perform year over year trend analysis. This allows the analysts to account for seasonality and show correlation to previous year’s traffic.
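For a sense of scale, the figures quoted above can be multiplied out. The sketch below uses only the numbers in the quote (11 to 12 billion clicks per month, about five years retained, about 15 months per analysis); it says nothing about bytes on disk, since the notes give no record size.

```python
# Range quoted in the case study: 11 to 12 billion clicks per month.
CLICKS_PER_MONTH = (11_000_000_000, 12_000_000_000)


def clicks_over(months):
    """Return the (low, high) click count over a number of months."""
    return tuple(rate * months for rate in CLICKS_PER_MONTH)


retained = clicks_over(5 * 12)  # about five years of history
working_set = clicks_over(15)   # about 15 months per analysis

print(f"Retained: {retained[0]:,} to {retained[1]:,} clicks")
print(f"Working set: {working_set[0]:,} to {working_set[1]:,} clicks")
```

That works out to roughly 660 to 720 billion click records retained, with analysts working on 165 to 180 billion at a time.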
Now HP is better equipped to improve its website functionality and architecture. It can more easily correlate events across its server farms, for example, which will allow it to identify and isolate anomalies that will yield insights into how website functionality is affecting user interactions. “Our HP Vertica solution gives us a true, end-to-end picture of our environment,” says Lormand. “And because it gives us faster results, we can respond to issues more quickly.”
HP will be able to better tailor its website interactivity to the needs of individual visitors, delivering a more precise and granular shopping experience. In the past, for example, the site guided visitors to information on the basis of broad categories. If the visitor seemed to fit the profile of a typical retail customer, that visitor would be guided to one set of solutions. Visitors fitting the profile of a home office user would be led to a different subset of products. But some visitors don’t always fit neatly into these categories. Now, thanks to the insight gained via the HP Vertica solution, HP can build website functionality that ensures the site responds appropriately to all kinds of visitors. And this, in turn, will enhance visitor satisfaction and improve sales conversion rates.
Full MapR HP case study at http://www.mapr.com/sites/default/files/mapr_case_study_hp_4.pdf
HP leverages MapR as a low-cost, massive storage platform to integrate, consolidate, and analyze data from multiple sources. Its “data lake” has enabled the development of new solutions and client offerings, which are helping to improve the overall HP customer experience across all touch-points. After a comprehensive evaluation of Hadoop vendors, HP selected MapR as the clear choice for its performance, high availability, disaster recovery, manageability, and scalability.
At MapR, our main focus is not to sell you services, but to make your Hadoop and big data projects successful. This applies before, during, and after you go into production.
Before: Education services provide instructor-led courses in a variety of formats. We have 3-day training on Hadoop development and administration, but what is DIFFERENT is we have web-based training, as well. Unlike other companies that want to make money on services and continuous education, we are primarily a product company. With WBT, you can get more people up to speed on Hadoop at their own pace, without high travel expenses.
During: We also have both data science and data engineering teams to help with any phase of your project: use case discovery, implementation, data migration, data modeling, machine learning, HBase schema design, application analysis, and performance tuning.
Again, our focus is not to build and manage your cluster, but to transfer knowledge, help with the heavy lifting, and make you self-sustaining.
After: Support - MapR is focused on raising the bar for product and support capability in the world of Apache Hadoop. MapR delivers highly available resources to assist in all aspects of product deployment and usage. We created both a breakthrough product and support team to deliver the high level of mission critical support you expect. MapR's support team offers:
24x7 community, phone, and email support options, with staffing in San Jose, India, and Japan.
On-demand patches and proactive update notification
Online incident submission and response
License Management
On-site installation and training
Local language support