The MapR distribution for Hadoop is globally recognized as the technology leader. Forrester's Wave for Big Data Hadoop Solutions placed MapR as the highest-ranking product based on both current offering and roadmap.
Cloud: MapR has been selected by two of the companies most experienced with MapReduce technology, which is a testament to the technology advantages of MapR's distribution. Amazon, through its Elastic MapReduce (EMR) service, hosted over 2 million clusters in the past year. Amazon selected MapR to complement EMR as the only commercial Hadoop distribution offered, sold, and supported as a service by Amazon to its customers.
Google – the pioneer of MapReduce and the company whose white paper on MapReduce inspired the creation of Hadoop – has also selected MapR, making our distribution available on Google Compute Engine.
I talk about the evolution of "Must Have" strategies over time and mention how nothing ever really goes away. Is IoT next? Ask the audience what they think might be next.
I usually give an example of something the CIO might be worried about for both "Back Office" and "Front Office." For back office, I draw on my time as an IT person and talk about ensuring back-office tools are available – back in the 80s it was OK to just have phone and email working in the morning; now the expectation is much, much higher. For front office, I talk about things like dashboards, metrics, and numbers for the C-level and across the LOBs. Again, the expectation today of getting information to the front office in an easily consumable manner is very high.
Examples of the questions that are being asked. An example of the thought process of how data gets turned into action. A Data Lake/Data Hub example showing that, for this to work, data can't be siloed.
1980s: a very regimented approach to the stack. Things happened in a certain way with structured data (schema first) and that was it, no options. 2000s: we start with scale-out, not scale-up – the 80s notion of just throwing more hardware at the problem is no longer acceptable. 2010s: the data lake – operational and analytic apps together for query, no schemas required up front, and visualization of the apps is key.
Big: example of one bullet – HDFS vs. NFS for supporting/rewriting legacy apps (difficult; requires planning, resources, people, time); see the sketch below. Fast: example of one – special-purpose real-time apps or appliances; Oracle is always a good target here.
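To make the legacy-app point concrete, here is a minimal Java sketch contrasting the two access paths. Everything specific is illustrative – the file contents, paths, and the /mapr/my.cluster.com mount point are made up; the real contrast is only that the HDFS route requires rewriting against the Hadoop FileSystem API, while the NFS route reuses plain java.io code unchanged.

```java
// Illustrative sketch only; paths and the cluster mount point are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.FileWriter;

public class LegacyAppExample {
    public static void main(String[] args) throws Exception {
        // HDFS route: legacy code must be ported to the Hadoop FileSystem API.
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataOutputStream out = fs.create(new Path("/data/report.csv"))) {
            out.writeBytes("id,value\n1,42\n");
        }

        // NFS route: unmodified java.io code keeps working against a MapR NFS mount.
        try (FileWriter w = new FileWriter("/mapr/my.cluster.com/data/report.csv")) {
            w.write("id,value\n1,42\n");
        }
    }
}
```

The second path is what makes "supporting legacy apps" cheap: no rewrite, no new API, just a mount point.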
Talk about data movement, the grey arrows, and how that's still hard to do today. Moving data around just to do batch processing on multiple workloads across structured/unstructured data is not optimal. There has to be a better way.
Big: pick one to talk to – I usually talk to schemas and give the Portal example (see next slide).
Fast: polyglot – the platform has to support multiple languages; talk about developers. Can use the example of the question I got on a webinar from a C++ developer: "How much Java do I need to know to be able to work in Hadoop?"
How MapR “fixes” Big & Fast and makes it doable, enterprise grade, fast, manageable, affordable.
Pick one or two to talk to. I usually talk about HA/DR and how that's an important component of big and fast. Also mention that this view shows all the Apache Hadoop parts, and the next build is where our value-add comes in.
I usually talk to our start as a file system company and how that basis for our distribution differentiates us from the competition. Then talk to a couple of our unique differentiators. MapR-DB is a good one: most of the folks in the room will know little about us as a Hadoop vendor, so you can really surprise them with not only Hadoop but a database offering as well.
Again, pick one or two to talk to. As I've usually already covered HA/DR, I leave that one and talk about Multi-tenancy and Performance.
This is the “Why Teradata and MapR” slide. It really speaks for itself and I usually put it up and let folks read it and make a comment along the lines of “when a company the size of Teradata says that 90% of their customers want us working together, we listen.”
Point out QueryGrid support for MapR, which we announced at the Teradata Universe EMEA event in April. Teradata Loom support is coming this calendar year. And the reseller part is something no one knew in the sessions I've done thus far. So, yes, Teradata customers can purchase any of our core products from their Teradata rep. While the slide mentions the purchase of training, I usually push our free ODT training here too.
MapR subscribes to the Gartner Logical Data Warehouse view: Hadoop is NOT a replacement for the DW but part of a larger ecosystem. Cover our value props. Note: this exact model is how we position with SAP too.
For the visual thinkers in the room: how MapR fits into the Teradata UDA (Unified Data Architecture). Data sources (structured and unstructured) on the left feed into MapR and Teradata. Within the UDA – the management, movement, and access of data (data lake, hub) – data is exported to apps and finally to the UI.
Cisco IT built a Big Data Platform to transform data management and provide big data analytics services to Cisco business teams. Cisco used MapR for their enterprise Hadoop architecture to unlock hidden business intelligence of their globally distributed large data sets, including structured and unstructured information, while also providing service-level agreements (SLAs) for internal customers. The complete infrastructure solution let Cisco analyze service sales opportunities in 1/10 the time, at 1/10 the cost; generated $40 million in incremental service bookings in the current fiscal year; and yielded a multi-tenant enterprise platform while delivering immediate business value. Case study: https://www.mapr.com/customers/cisco
This image is an abbreviated version of what Cisco has shown us as their big data reference architecture within their IT organization. Cisco uses MapR as their corporate Hadoop standard, including as the backbone of their real-time security information and event management (SIEM) solution. (Get more details on the Cisco use case slides here: https://drive.google.com/open?id=0B5TzetWfnSOGcW03ZkRhb1ZlNkE&authuser=0)
Here you can see the "best of breed" approach Cisco maintains: MapR is used for large-scale data storage, text analytics, and machine learning; the DW is used for mission-critical financial reporting; and SAP is used for dashboarding.
The 1st use case was clickstream analysis using application logs, which are ingested into Splunk and then into MapR.
In the 2nd half of last year they started using the TDCH connector to bring data into Hadoop. All ETL jobs use Hive and Datameer. They also use the platform for user analytics and to generate reports: Hive creates an aggregated table with 160 attributes per user, which feeds a user 360-degree database. Data is extracted and reported in Tableau – reporting on the number of visitors that landed on the site and then converted into services, and the banners they clicked on. (A rough sketch of this kind of Hive aggregation follows.)
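For illustration only, here is a sketch of the kind of Hive aggregation described above, submitted over the standard HiveServer2 JDBC interface. The host, credentials, and table/column names are all invented (the real table had ~160 attributes per user); this is not the customer's actual job.

```java
// Hypothetical sketch: host, table, and column names are invented for illustration.
// Requires the hive-jdbc driver on the classpath.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class User360Aggregate {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC endpoint (placeholder host).
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver.example.com:10000/default", "etl_user", "");
             Statement stmt = conn.createStatement()) {
            // Build an aggregated per-user table; only a few representative
            // attributes are shown here.
            stmt.execute(
                "CREATE TABLE user_360 AS " +
                "SELECT user_id, " +
                "       COUNT(*)                     AS total_visits, " +
                "       SUM(IF(converted = 1, 1, 0)) AS conversions, " +
                "       COUNT(DISTINCT banner_id)    AS banners_clicked " +
                "FROM clickstream " +
                "GROUP BY user_id");
        }
    }
}
```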
2H2014 – in the 2nd phase of this project they moved the data into MapR-DB and made it available to application users for personalization. This customer 360 database is used to provide relevant content to users. (A sketch of a lookup against it follows.)
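Since MapR-DB is accessed through the standard HBase client API, a point lookup against the customer 360 table might look roughly like this. The table path, row key, column family, and qualifier are hypothetical; the path-style table name follows the MapR-DB convention of addressing tables in the filesystem namespace (and requires MapR's HBase client, which accepts such names).

```java
// Hypothetical sketch: table path, row key, column family, and qualifier are invented.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class Customer360Lookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             // MapR-DB tables live in the filesystem namespace, e.g. /apps/customer360.
             Table table = conn.getTable(TableName.valueOf("/apps/customer360"))) {
            // Fetch one customer's profile row for a personalization decision.
            Result row = table.get(new Get(Bytes.toBytes("user-12345")));
            byte[] topCategory = row.getValue(Bytes.toBytes("profile"),
                                              Bytes.toBytes("top_category"));
            System.out.println("Top category: " + Bytes.toString(topCategory));
        }
    }
}
```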
Smart Banner was the 2nd application and runs on a separate cluster.
They did the implementation themselves. The 1st phase was moving the use case onto MapR on their own. In the 2nd phase they sent people to M7, admin, and Hive training – 10-12 people in total.
3rd phase – real-time stream processing. They are using our PS team to develop a real-time streaming application using Storm. Using data from Schwab – real-time aggregation, ranking, and sorting per customer: what are the top 5 things they looked at, purchased, etc. This then feeds into a real-time Oracle system for customer service, so reps can see what people looked at and better understand what the customer is interested in. (A minimal sketch of the ranking step follows.)
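As a flavor of what that ranking step might look like, here is a minimal Storm bolt that keeps running per-customer counts and emits that customer's current top-5 items on each event. This is a sketch under assumptions, not the customer's actual topology: the tuple field names ("customer_id", "item_id") and the in-memory state are invented for illustration.

```java
// Hypothetical sketch only: field names and tuple layout are invented.
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Maintains per-customer view counts and emits that customer's current
// top-5 items every time a new event arrives.
public class TopItemsPerCustomerBolt extends BaseBasicBolt {
    private final Map<String, Map<String, Long>> countsByCustomer = new HashMap<>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String customerId = tuple.getStringByField("customer_id");
        String itemId = tuple.getStringByField("item_id");

        Map<String, Long> counts =
                countsByCustomer.computeIfAbsent(customerId, k -> new HashMap<>());
        counts.merge(itemId, 1L, Long::sum);

        // Sort this customer's items by count, descending, and keep the top 5.
        List<Map.Entry<String, Long>> top = new ArrayList<>(counts.entrySet());
        top.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        if (top.size() > 5) top = top.subList(0, 5);

        collector.emit(new Values(customerId, top));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("customer_id", "top_items"));
    }
}
```

With a fields grouping on "customer_id", each bolt instance only ever sees its own customers, so the local map stays consistent.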
Next phase – make data more self-service so users can generate reports rather than going through IT. They want to make the data more accessible to end users using Drill (sketch below).
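As a taste of that self-service angle, here is a sketch of querying raw files in place with Drill over its JDBC driver – no ETL job and no schema definition up front. The ZooKeeper quorum, Drill cluster name, and file path are placeholders, not the customer's environment.

```java
// Hypothetical sketch: ZooKeeper quorum, Drill cluster name, and file path are placeholders.
// Requires the drill-jdbc driver on the classpath.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SelfServiceDrillQuery {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:drill:zk=zkhost.example.com:5181/drill/mycluster-drillbits");
             Statement stmt = conn.createStatement();
             // Query a raw JSON file directly; Drill discovers the schema as it reads.
             ResultSet rs = stmt.executeQuery(
                "SELECT banner_id, COUNT(*) AS clicks " +
                "FROM dfs.`/data/clickstream/2014-11.json` " +
                "GROUP BY banner_id ORDER BY clicks DESC LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("banner_id") + ": " + rs.getLong("clicks"));
            }
        }
    }
}
```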