Data Warehouse Evolution Roadshow
Data Warehouse Evolution Roadshow presented by MapR Technologies, Informatica, MicroStrategy, and Cisco!


Usage Rights

© All Rights Reserved

    Data Warehouse Evolution Roadshow: Presentation Transcript

    • The Data Warehouse Evolution Roadshow. ©MapR Technologies - Confidential
    • Agenda
        • Welcome (MapR)
        • Big Data and your Data Warehouse (MapR)
        • The New Data Warehouse (Informatica)
        • Making the Most of Big Data (MicroStrategy)
        • Enterprise-Grade Hadoop: Use Cases (MapR)
        • Infrastructure Platform For Big Data (Cisco)
        • Getting Started/Q&A (All)
        • Close (MapR)
    • Big Data and Your Data Warehouse
    • “Data is a precious thing and will last longer than the systems themselves.” – Tim Berners-Lee, inventor of the World Wide Web.
    • “Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway.” – Geoffrey Moore, author and consultant.
    • “If we have data, let’s look at data. If all we have are opinions, let’s go with mine.” – Jim Barksdale, former Netscape CEO
    • Big Data today in the Enterprise: “Too many different types, sources & formats of critical data”
        • Multiple data sources
        • Multiple technologies
        • Multiple copies of data
    • An Enterprise Data Hub
        • Data feeding the hub: Sensor Data, Click Streams, Production Data, Web Logs, Location, Public, Social Media, Sales, SCM, CRM, Billing
        • Combine different data sources
        • Minimize data movement
        • One platform for analytics
    • Big Data in our World
        • YouTube users upload 48 hours of new video every minute of the day.
        • Twitter sees roughly 175 million tweets every day, and has more than 465 million accounts.
        • Facebook stores, accesses, and analyzes 30+ Petabytes of user generated data.
        • More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide.
        • 2.7 Zettabytes of data exist in the digital universe today.
        • Data production will be 44 times greater in 2020 than it was in 2009.
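To put the "44 times greater in 2020 than in 2009" figure in annual terms, the following is a small sketch that assumes smooth compound growth over the 11 years (an assumption for illustration; the slide states only the two endpoints). Under that assumption the implied growth is roughly 41% per year.

```python
# Implied compound annual growth rate, assuming steady growth
# between 2009 and 2020 (the slide gives only the 44x endpoint figure).
growth_factor = 44.0
years = 2020 - 2009  # 11 years

annual_rate = growth_factor ** (1 / years) - 1
print(f"Implied annual growth: {annual_rate:.1%}")
```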
    • Arrival of Big Data Impacts Data Warehouses
        • Volume: Prohibitively expensive storage costs
        • Variety: Inability to process unstructured formats
        • Velocity: Faster arrival and processing needs
    • The Hadoop Advantage
        • Fueling an industry revolution by providing infinite capability to store and process big data
        • Expanding analytics across data types
        • Compelling economics: 20 to 100X more cost effective than alternatives
        • Pioneered at
    • Important Drivers for Hadoop
        • Data on compute drives efficiencies and better analytics
        • With Hadoop you don’t need to know what questions to ask beforehand
        • Simple algorithms on Big Data outperform complex models
        • Powerful ability to analyze unstructured data
    • What is the Best Way to Deploy Hadoop?
        • Transitory Data Store: no long-term scale advantages; unprotected data; ETL tool focus
        • vs. Permanent Data Store (Enterprise Data Hub): highly available and fully protected data; works with existing tools; real-time ingestion and extraction; archive data from data warehouse
    • “Hadoop ingests and stores data very cost effectively, and handles workloads such as the simple transformations in ETL. On the other hand, Hadoop does not address the mission-critical complex business analytic workloads…” – Mike Koehler, CEO, Teradata
    • Data Warehouse Optimized: Cost Savings
        • Diagram: Sensor Data and Web Logs flow into Hadoop (ETL + Long Term Storage), which feeds the RDBMS DW (Query + Present)
        • Benefits: both structured and unstructured data; expanded analytics with MapReduce, NoSQL, etc.
        Solution                        Cost / Terabyte    Hadoop Advantage
        Hadoop                          $333
        Teradata Warehouse Appliance    $16,500            50x savings
        Oracle Exadata                  $14,000            42x savings
        IBM Netezza                     $10,000            30x savings
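The "Hadoop Advantage" multiples on this slide follow directly from dividing each alternative's cost per terabyte by the Hadoop figure. A minimal sketch of that arithmetic, using only the prices quoted on the slide (the function and variable names are illustrative):

```python
# Cost per terabyte as quoted on the slide (USD).
hadoop_cost_per_tb = 333
alternatives = {
    "Teradata Warehouse Appliance": 16_500,
    "Oracle Exadata": 14_000,
    "IBM Netezza": 10_000,
}

def savings_multiple(alternative_cost, baseline_cost=hadoop_cost_per_tb):
    """How many times more expensive the alternative is per terabyte."""
    return alternative_cost / baseline_cost

for name, cost in alternatives.items():
    print(f"{name}: {savings_multiple(cost):.0f}x savings")
```

Rounded to whole numbers, the ratios match the slide's 50x, 42x and 30x.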
    • The Enterprise Data Hub for Hadoop Compute
        • Inputs: Existing Data, Social Data, Weblog Data, Telemetry
        • Enterprise Data Warehouse: Freed Up Space
        • Applications: Fraud Detection Application, Recommendation Engine
    • Multi-Tenant Capabilities to Share a Cluster Successfully
        • Isolation: data placement control; label based job scheduling
        • Quotas: storage, CPU, memory
        • Security and delegation: ACLs; AD, LDAP, Linux PAM
        • Reporting: about 70 resource usage metrics; REST API integration
    • One Platform for Big Data
        • Workload types: Batch, Interactive, Real-Time
        • Use cases: Log file Analysis, Data Warehouse Offload, Fraud Detection, Clickstream Analytics, Forensic Analysis, Analytic Modeling, BI User Focus, Sensor Analysis, “Twitterscraping”, Telematics, Process Optimization
        • Processing: MapReduce, File-Based Applications, SQL, Database, Search, Stream Processing
        • Enterprise features: 99.999% HA, Data Protection, Disaster Recovery, Scalability & Performance, Enterprise Integration, Multi-tenancy …
    • MapR Means More from Hadoop
    • The New Data Warehouse: Big Data + Hadoop
    • Agenda
        • Big Data and Data Warehouse Optimization
        • What Are Customers Doing to Optimize their Data Warehouse?
        • Informatica on Hadoop Complements Your Data Warehouse
    • Big Data and Data Warehouse Optimization
    • Timeline of technology eras (columns: period, technology, number of sources, users, business value)
        • 1960s-1970s: Mainframe (OS/360), 10^2, Few Employees, Back Office Automation
        • 1980s: Client-Server, 10^4, Many Employees, Front Office Productivity
        • 1990s: Web, 10^6, Customers/Consumers, E-Commerce
        • 2007: Cloud, 10^7, Business Ecosystems, Line-of-Business Self-Service
        • 2011: Social, 10^9, Communities & Society, Social Engagement
        • 2014: Internet of Things, 10^11, Devices & Machines, Real-Time Optimization
    • Informatica + Hadoop: PowerCenter Developers are Now Hadoop Developers
        • Sources: Transactions, OLTP, OLAP; Documents and Emails; Social Media, Web Logs; Machine Device, Scientific
        • Operations: Archive, Profile, Parse, ETL, Cleanse, Match
        • Outputs: Analytics & Op Dashboards; Mobile Apps; Real-Time Alerts
    • Data Warehouse Optimization
        1. Identify inactive & infrequently used data
        2. Offload data & processing to Hadoop
        3. Ingest raw data, replicate changes & schemas
        4. Store & prepare (e.g. ETL) data on Hadoop
        5. Move high value results data into DW
        • Diagram elements: Data Warehouse; Reports; Transactions, OLTP, OLAP; Documents and Emails; Social Media, Web Logs; Machine Device, Scientific
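Step 1, identifying inactive and infrequently used data, can be pictured as a filter over table access metadata. The sketch below is purely illustrative: the table list, the `last_accessed` field and the 180-day threshold are assumptions made for the example, not part of any Informatica or MapR product described in the deck.

```python
from datetime import date, timedelta

def find_offload_candidates(tables, today, inactive_days=180):
    """Return names of tables not accessed within the last `inactive_days` days."""
    cutoff = today - timedelta(days=inactive_days)
    return [t["name"] for t in tables if t["last_accessed"] < cutoff]

# Hypothetical warehouse metadata for illustration only.
tables = [
    {"name": "orders_2009", "last_accessed": date(2012, 1, 15)},
    {"name": "orders_current", "last_accessed": date(2013, 10, 1)},
]

candidates = find_offload_candidates(tables, today=date(2013, 10, 15))
print(candidates)
```

In practice the access dates would come from the warehouse's own query logs or catalog; the threshold is a policy choice.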
    • PowerCenter Big Data Edition
        • Minimize Risk: quickly staff projects with trained experts
        • Map Once. Deploy Anywhere™
        • Deploy On-Premise or in the Cloud
        • Traditional Grid
    • What Are Customers Doing to Optimize their Data Warehouse?
    • Minimize risk and grow digital business (Large Global Financial Services and Communications Company)
        • The Challenge: Grow digital business to 30% ($1.8B) and reduce fraud
        • The Solution: PowerCenter Big Data Edition (Profile, Parse, ETL). Systems in the diagram: Relational - SQL Server, Oracle, DB2, AS400, Mainframe; Surveys & Net Promoter Scores (NPS); Social Media, Web Logs, JSON, XML; Machine, Forensic, Splunk; Netezza, SQL Server, Oracle, SAS; BI / Analytics Visualization & Reporting
        • The Result: comprehensive data integration platform to integrate large volumes of data from 18+ systems; ability to use existing skill sets & make them more productive; lowest risk as industry leader
    • Reduce Costs & Increase Revenue: Consolidate Data on Hadoop & Provide 360 View of Customer (Large Global Media & Entertainment Company)
        • The Challenge: Data increasing 20x every year, with costs rising from $17K per day to $50K per day within 6 months. Time to deliver information taking too long.
        • The Solution: PowerCenter Big Data Edition. Diagram elements: Transactions from 70 Data Centers; In-Store POS Data; B2B Data Exchange; Data from Gaming Consoles, TV, Tablets, Readers, & Clickstreams from 5000 Web Sites; 172 TB; Data Validation; Traditional Grid; Data Warehouse; Business Reports
        • Expected Result: gain 360 view of customer behavior, increase cross-sell & up-sell revenue; reduce data storage costs from $50K per day to $500 per day; reduce time to deliver information to business from 48 hours to 15 minutes
    • Flexible architecture to support rapid changes (Large Government Agency)
        • The Challenge: Data volumes growing at 3-5 times over the next 2-3 years
        • The Solution: diagram elements: Mainframe, RDBMS, Unstructured Data, Traditional Grid, Data Virtualization, DW, EDW, Business Reports
        • The Result: manage data integration and load of 10+ billion records from multiple disparate data sources; flexible data integration architecture to support changing business requirements in a heterogeneous data management environment
    • Lower costs of Big Data projects (Large Global Financial Institution)
        • The Challenge: Data warehouse exploding with over 200TB of data. User activity generating up to 5 million queries a day, impacting query performance.
        • The Solution: diagram elements: ERP, CRM, Custom, Interaction Data, EDW, Archived Data, Business Reports (Phase 1)
        • The Result: saved $20M + $2-3M on-going by archiving & optimization; reduced project timeline from 6 months to 2 weeks; improved performance by 25%; return on investment in less than 6 months
    • Lower costs and minimize risk (Large Global Financial Institution)
        • The Challenge: Increasing demand for faster data driven decision making and analytics as data volumes and processing loads rapidly increase
        • The Solution: diagram elements: RDBMS, Web Logs, Traditional Grid, Near Real-Time Datamarts, Data Warehouse
        • The Result: cost-effectively scale performance; increased agility by standardizing on one data integration platform; lower hardware costs; leverage new data sources for faster innovation
    • Informatica on Hadoop Complements Your Data Warehouse
    • Maximize Your Return On Big Data: Hadoop complements your existing infrastructure
        • Data Assets: Transactions, OLTP, OLAP; Documents, Email; Social Media, Web Logs; Machine Device, Scientific
        • Operational Systems, Analytical Systems and Data Products shown: OLTP, ODS, MDM, Data Warehouse, Data Mart, & other NoSQL
        • Pipeline: Access & Ingest; Parse & Prepare; Discover & Profile; Transform & Cleanse; Extract & Deliver
        • Manage (i.e. Security, Performance, Governance, Collaboration)
    • Data Integration & Quality on Hadoop
        1. Entire Informatica mapping translated to Hive Query Language
        2. Optimized HQL converted to MapReduce & submitted to Hadoop cluster (job tracker)
        3. Advanced mapping transformations executed on Hadoop through User Defined Functions using Vibe
        Hive-QL:
            SELECT T1.ORDERKEY1 AS ORDERKEY2, T1.li_count, orders.O_CUSTKEY AS CUSTKEY, customer.C_NAME,
                   customer.C_NATIONKEY, nation.N_NAME, nation.N_REGIONKEY
            FROM (
                SELECT TRANSFORM (L_Orderkey.id) USING CustomInfaTx
                FROM lineitem
                GROUP BY L_ORDERKEY
            ) T1
            JOIN orders ON (customer.C_ORDERKEY = orders.O_ORDERKEY)
            JOIN customer ON (orders.O_CUSTKEY = customer.C_CUSTKEY)
            JOIN nation ON (customer.C_NATIONKEY = nation.N_NATIONKEY)
            WHERE nation.N_NAME = 'UNITED STATES'
            ) T2
            INSERT OVERWRITE TABLE TARGET1 SELECT *
            INSERT OVERWRITE TABLE TARGET2 SELECT CUSTKEY, count(ORDERKEY2) GROUP BY CUSTKEY;
        Diagram labels: Hive-QL, UDF, MapReduce
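The two INSERT OVERWRITE clauses at the end of the slide's query use Hive's multi-insert form, in which one FROM source feeds several target tables in a single statement. The sketch below shows how a tool might assemble such a statement as text. It is a hypothetical illustration of the statement shape only, not Informatica's actual translation logic; `build_multi_insert` and the sample table names are made up for the example.

```python
def build_multi_insert(source_subquery, alias, inserts):
    """Assemble a Hive multi-insert statement.

    inserts: list of (target_table, select_clause) pairs, each becoming
    one INSERT OVERWRITE TABLE clause over the shared FROM source.
    """
    lines = [f"FROM (\n  {source_subquery}\n) {alias}"]
    for target, select_clause in inserts:
        lines.append(f"INSERT OVERWRITE TABLE {target} {select_clause}")
    return "\n".join(lines) + ";"

# Hypothetical source query and targets, loosely modeled on the slide.
statement = build_multi_insert(
    "SELECT o_custkey AS custkey, o_orderkey AS orderkey FROM orders",
    "T2",
    [
        ("TARGET1", "SELECT *"),
        ("TARGET2", "SELECT custkey, count(orderkey) GROUP BY custkey"),
    ],
)
print(statement)
```

The point of the form is that the source subquery is scanned once while populating both targets.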
    • Accelerate Development: Reuse and Import PowerCenter Metadata
        • Import and validate existing PowerCenter mappings before running on Hadoop
    • Hadoop Data Profiling Results
        • Hadoop Data Profiling results are exposed to anyone in the enterprise via browser
        1. Profiling Stats: Min/Max Values, NULLs, Inferred Data Types, etc. Stats to identify outliers and anomalies in data (CUSTOMER_ID example)
        2. Value & Pattern Analysis of Hadoop Data: Value and Pattern Frequency to isolate inconsistent/dirty data or unexpected patterns (COUNTRY CODE example)
        3. Drilldown Analysis (into Hadoop Data): Drill down into actual data values to inspect results across entire data set, including potential duplicates
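The profiling statistics named on this slide (min/max, NULL counts, value and pattern frequency) can be illustrated on a single column with a few lines of plain Python. This is a generic sketch of the idea, not the Informatica profiler; the pattern encoding (letters as `A`, digits as `9`) and the sample country codes are assumptions chosen for the example.

```python
import re
from collections import Counter

def value_pattern(value):
    """Reduce a value to its shape: letters become 'A', digits become '9'."""
    shape = re.sub(r"[A-Za-z]", "A", str(value))
    return re.sub(r"\d", "9", shape)

def profile_column(values):
    """Compute basic profiling stats for one column of values."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "value_freq": Counter(non_null),
        "pattern_freq": Counter(value_pattern(v) for v in non_null),
    }

# Hypothetical COUNTRY CODE column containing a few dirty values.
country_codes = ["US", "US", "DE", None, "U5", "usa"]
stats = profile_column(country_codes)
print(stats["nulls"], stats["pattern_freq"].most_common())
```

Rare patterns (here, anything other than two letters) are the entries a reviewer would drill into.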
    • Hadoop Data Domain Discovery: Finding functional meaning of Data in Hadoop
        • Leverage INFA rules/mapplets to identify functional meaning of Hadoop data
        • Sensitive data (e.g. SSN, Credit Card number, etc.)
        • PHI: Protected Health Information
        • PII: Personally Identifiable Information
        • View/share report of data domains/sensitive data contained in Hadoop. Ability to drill down to see suspect data values.
        • Scalable to look for/discover ANY Domain type
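Domain discovery of the kind described here generally comes down to rules that recognize a value's functional type. The sketch below shows two such rules in plain Python: a format check for US Social Security numbers and the Luhn checksum commonly used to validate card numbers. It stands in for the idea only; the rule names and the `classify` helper are hypothetical and unrelated to Informatica's actual rules or mapplets.

```python
import re

# Format rule only: three digits, two digits, four digits.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def classify(value):
    """Label a value with a sensitive-data domain, or None if no rule matches."""
    if SSN_PATTERN.match(value):
        return "SSN-like"
    if luhn_valid(value):
        return "card-number-like"
    return None

for sample in ["123-45-6789", "4111 1111 1111 1111", "hello"]:
    print(sample, "->", classify(sample))
```

A match only flags a value as suspect; as the slide notes, a reviewer would still drill down into the actual values.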
    • Unified Administration: Single Place to Manage & Monitor
        • Full traceability from workflow to MapReduce jobs
        • View generated Hive scripts
    • Maximize Your Return on Big Data
        • Lower Big Data Costs Up To 2X (helps self-fund big data projects): 5x productivity increase using existing developer skills
        • Minimize Risk of New Technologies (single platform, quickly staff projects): Design in PowerCenter, run on Hadoop or any other data platform
        • Accelerate Innovation (onboard, discover, operationalize): Enterprise scalability, security, & support
    • Making the Most of Big Data: Leveraging business intelligence to turn business users into data scientists. MicroStrategy Confidential. Distribution Prohibited without Prior Authorization.
    • Agenda
        1. Self Service
        2. Information Driven Apps
        3. Mobility
        4. Advanced Analytics
    • SELF-SERVICE ANALYTICS: Empowering everyone with rapid-fire data exploration and dashboarding
    • Self-Service Analytics Revolutionizes Traditional BI: Boost user satisfaction while massively increasing productivity
        • More Productive: 5-10x more content per creator
        • More Producers: 5-10x more users can create content
        • More Collaborative: peer-to-peer sharing, 5-10x more sharing
        • >100x more content creation and consumption
    • Business User Access to 1000s of Data Sources: Faster access to your data
        • Enterprise Applications: SAP, Oracle e-Business, Siebel, Peoplesoft, etc.
        • Relational Databases: Oracle, SQL Server, MySQL, Teradata, Netezza, etc.
        • Cloud-Based Data: Salesforce.com, NetSuite, Facebook, Eloqua, Google Docs, etc.
        • Personal or Departmental: Spreadsheets, Access databases, CSV, public data downloads, etc.
        • Big Data & Hadoop: MapR
        • MicroStrategy Modeled Data: Enterprise-certified single version of the truth
        • Quick Data Import; No SQL or Scripting
    • Enrich Every Analysis with Added Insight
        • Professional Sports: Enrich with Weather Data. Impact of weather on game outcome and attendance
        • Product Sales: Enrich with Demographic Data. Product popularity by demographic segment
        • Marketing Promotions: Enrich with Social Data. Cross-brand affinity to determine promotions or bundling offers
    • World-Class Production Dashboard Applications: Information-Driven Apps are the future of dashboards
        • 100% customized look and feel
        • Comprehensive data
        • Easy to use
        • Guided workflow for consistent user experience
        • Personalized for each user
        • Online or distributed via email
        • Multimedia content-enabled
        • Transaction-enabled
        • Live data
    • Beyond Mobile Dashboards: Build great mobile Smart Apps without the pain of native development
        • Analytics: Analytics and data visualization
        • Transactions: Update systems like ERP/CRM
        • Multimedia: Add videos and other content
        • Apps for Every Customer-Facing Process; Apps for Every Internal Business Process
        • App types: Logistics, Operations, B2E, Data Collection, Product, Context-Aware, Executive
    • Easy Integration with Third Party Analytic Models: All of an Organization’s Analytics Can Now be Distributed Through a Single Platform
        • Deploy Any of 5000+ Open Source R Analytics: MicroStrategy R Integration Pack
        • Import Predictive Models from Popular Packages: PMML Model
        • Create Your Own Custom Functions: MicroStrategy Custom Function Plug-in, ƒApply(X)
        • As a MicroStrategy metric, use models and functions in any report or dashboard
    • MicroStrategy Analytics Platform: Comprehensive analytics suite for business
        • Self-Service Analytics: Rapid-fire data discovery
        • Enterprise-Grade Business Intelligence: Produce and publish trusted analytics to elevate performance
        • Big Data Analytics: The power to transform your Big Data into insight
        • Capabilities: intuitive data exploration; self-service with no IT needed; access and combine data from all sources; trusted system-of-record reliability; advanced and predictive analytics; easy, cost-effective administration; fast dashboard development; comprehensive delivery options with massive user scale; blazing speed and performance
        • Web or Mobile; On-Premises or on MicroStrategy Cloud
    • Two Ways to Experience MicroStrategy Today: Best of all, they’re free!
        • MicroStrategy Analytics Desktop: Fastest, easiest self-service analytics tool for business users. 100% free!
        • MicroStrategy Analytics Express: Cloud-based self-service visual analytics for any organization. Free for one year!
    • Enterprise-Grade Hadoop
    • Use Cases
    • Data Warehouse Offload: Cost Savings + Analytics
        • Diagram: Sensor Data and Web Logs flow into Hadoop (ETL + Long Term Storage), which feeds the RDBMS DW (Query + Present)
        • Benefits: both structured and unstructured data; expanded analytics with MapReduce, NoSQL, etc.
        Solution                        Cost / Terabyte    Hadoop Advantage
        Hadoop                          $333
        Teradata Warehouse Appliance    $16,500            50x savings
        Oracle Exadata                  $14,000            42x savings
        IBM Netezza                     $10,000            30x savings
    • Expand Data For Existing Applications
        • Network security: Network IDS with a 3-day window instead of a 10-minute window
        • Trade Surveillance: Rogue trader detection on intra-day instead of end-of-day market data
        • Insurance: Calculate risk triangles for individual properties instead of neighborhoods
        • Advantages: 1T files and tables; real-time data ingestion with streaming writes; 24x7 operations with automated failure recovery; better hardware utilization with 2x performance
    • Combine Different Data Sources
        • Diagram: POS/Online Data (retail purchase info), streaming writes to Hadoop, real-time offers
        • Advantages: exponential decrease in time to market; real-time data ingestion with streaming writes; 1T files and tables; 24x7 operations with automated failure recovery
    • New Analytics
        • Enhanced search
        • Real-time event processing
        • MapReduce-enabled machine learning algorithms
        • Advantages: increased ROI with 2x performance; highly available, fully data protected environment; multiple users running different jobs on one cluster
    • Customer Example: Cloud-based predictive analytics platform
        • Apache HBase (✗): compactions; manual administration; poor reliability
        • Cassandra (✗): compactions; manual administration; eventual consistency
        • Third solution (✓): no compactions; zero administration; strong consistency; 2x Cassandra performance; 3x HBase performance
        • Sociocast conducted a POC with the three solutions
    • MapR Advantages for Enterprise Data Hub
        • Enterprise Grade Platform: 99.999% HA; full data protection; disaster recovery
        • Easiest Integration: industry-standard interfaces (NFS, ODBC, LDAP, REST); streaming writes
        • Best ROI: faster time to market; eliminate risk; reuse existing apps and tools
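For a sense of scale, a 99.999% availability target translates into an annual downtime budget of a little over five minutes. The short calculation below assumes a 365-day year and treats availability as a simple fraction of wall-clock time, which is an illustrative simplification rather than how any particular SLA is measured.

```python
# Annual downtime allowed by a "five nines" availability target.
availability = 0.99999
minutes_per_year = 365 * 24 * 60  # 525,600 minutes in a 365-day year

downtime_minutes = (1 - availability) * minutes_per_year
print(f"Allowed downtime: {downtime_minutes:.2f} minutes per year")
```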
    • In the era of the “Internet Of Everything”: Unified Computing Systems, The Infrastructure Platform For Big Data
    • “The internet of everything will provide a 21% increase in corporate profits in the next 10 years”
    • How many IP addresses does your home have? IPv6
    • How will the internet of things change Basketball?
    • Facebook And Cisco Let Brick-&-Mortars Demand Customers Check-In To Get Wi-Fi
        • 10.03.13 at Interop: Facebook and Cisco roll out a way to help any brick-and-mortar recoup its costs by asking users to check-in to get Internet access. Those who oblige get dropped on the business’ Facebook Page, and their anonymous, aggregate demographic info is passed to the merchant.
        • http://techcrunch.com/2013/10/02/facebook-wifi/
    • In-Store Manager View & Capabilities
        • Product Catalog: Product Characteristics, Marketing Description, Quality Data, Multi-media Information, Product Suggestions
        • Promotion Portfolio: Campaign Management, Customer Segmentation, Location triggered Rules
        • Consumer Profile: CRM profile, Loyalty status, Consumer Preferences
        • Application Analytics & Forecasting: Based on Historical Footfall, Heatmap Preferences
    • Better Retailing?
        • Diagram elements: Retailers Dashboard; Mobility Services Engine; Existing ERP Systems; Existing Retailing Platform; Cisco Wireless WLAN Controller; Cisco Wireless Access Point; Consumer Personal Shopping Assistant
    • Big Data and Key Infrastructure Attributes (What big data isn’t)
        • Usually not blade servers (not enough local storage)
        • Usually not virtualized (hypervisor only adds overhead)
        • Usually not highly oversubscribed (significant east-west traffic)
        • Usually not SAN/NAS
        • Low-cost, DAS-based, scale-out clustered filesystem
        • Move the compute to the storage
    • Cost, Performance, and Capacity
        • Structured Data: Relational Database, $20K/TB, HW:SW $ split 30:70
        • Expensive Load: 1TB/hr ETL
        • Enterprise Data: Massive Scale-Out Column Store, $10K/TB
        • Unstructured Data (Machine Logs, Web Click Stream, Call Data Records, Satellite Feeds, GPS Data, Sensor Readings, Sales Data, Blogs, Emails, Video): Hadoop, NoSQL, $500-$1K/TB, HW:SW $ split 70:30
    • Typical big data deployments
        • General Purpose IT Data Center (IT infrastructure on standard IT servers: SAP, VMware, WEB, Big Data): experimental use of Big Data; deployed into IT Ops mandated infrastructures; “skunk works”; small to medium clusters
        • Dedicated “Pod” for Big Data (x86 servers): app team mandated infrastructure; purpose built for Big Data; Big Data has established business value; performance matters; large or small clusters
    • Cisco UCS Common Platform Architecture (CPA): Building Blocks for Big Data
        • UCS Manager
        • UCS 6200 Series Fabric Interconnects
        • Nexus 2232 Fabric Extenders
        • UCS 240 M3 Servers
        • LAN, SAN, Management
    • Cisco Big Data Common Platform Architecture: Single-SKU Big Data SmartPlay Bundles
        • Single Rack UCS Solutions Bundle for Hadoop, Performance (UCS-EZ-BD-HP): 2 x UCS 6296; 2 x Nexus 2232 PP; 16 x C240 M3 (SFF); 2x E5-2665 (16 cores); 256GB; 24 x 1TB 7.2K SAS
        • Single Rack UCS Solutions Bundle for Hadoop, Capacity (UCS-EZ-BD-HC): 2 x UCS 6296; 2 x Nexus 2232 PP; 16 x C240 M3 (LFF); E5-2640 (12 cores); 128GB; 12x 3TB 7.2K SATA
        • Half-Rack UCS Solutions Bundle for MPP, Configuration (UCS-EZ-BD-STRT): 2 x UCS 6248; 2 x Nexus 2232 PP; 8 x C240 M3 (SFF); 2x E5-2690; 256GB; 24x 600GB 10K SAS
        • The Big Data Acceleration Kit, Cisco Components: 16 node UCS CPA Solution; Cisco SKUs: UCS-EZ-BD-HP and UCS-EZ-BD-HC
        • The Big Data Acceleration Kit, MapR Components: 16-node M7 license; (2) Free Administrator Training Credits; installation and configuration; data strategy and exploration; MapR SKU: M7-16-CISCO-12
        • http://www.cisco.com/en/US/docs/unified_computing/ucs/UCS_CVDs/Cisco_UCS_CPA_for_Big_Data_with_MapR.html
    • Hadoop Hardware Evolving in the Enterprise
        • Typical 2009 Hadoop node: 1RU server; 4 x 1TB 3.5” spindles; 2 x 4-core CPU; 1 x GE; 24 GB RAM; single PSU; running Apache; $
        • Typical 2012 Hadoop node: 2RU server; 12 x 3TB 3.5” or 24 x 1TB 2.5” spindles; 2 x 8-core CPU; 1-2 x 10GE; 128 GB RAM; dual PSU; running MapR; $$$
        • Economics favor “fat” nodes: 6x-9x more data/node; 3x-6x more IOPS/node; saturated gigabit, 10GE on the rise; fewer total nodes lowers licensing/support costs; increased significance of node and switch failure
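The "6x-9x more data/node" range follows from the raw disk capacities listed for the two node generations. A quick check of that arithmetic, using only the spindle counts and sizes given on the slide (raw capacity, ignoring replication and filesystem overhead):

```python
# Raw disk capacity per node, in TB, from the slide's spindle counts.
node_2009_tb = 4 * 1        # 4 x 1TB 3.5" spindles
node_2012_lff_tb = 12 * 3   # 12 x 3TB 3.5" spindles
node_2012_sff_tb = 24 * 1   # 24 x 1TB 2.5" spindles

lff_ratio = node_2012_lff_tb / node_2009_tb
sff_ratio = node_2012_sff_tb / node_2009_tb
print(f"2012 vs 2009 raw capacity: {sff_ratio:.0f}x to {lff_ratio:.0f}x")
```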
    • Seamless Integration with Enterprise Applications
        • Diagram elements: ETH 1, ETH 2, SAN A, SAN B, MGMT; Uplink Ports; OOB Mgmt; Fabric Switch (6200 Fabric A, 6200 Fabric B, Cluster); Server Ports; Fabric Extenders (FEX A, FEX B); Virtualized Adapters (CNA); Compute Blades, Half / Full width (Chassis 1, B200); Rack Mount
    • Extending UCS Enterprise Application Ecosystem to Big Data
        • Big Data Common Platform Architecture: UCS Rack-Mount Servers
        • Enterprise Applications: UCS Blade Servers, SAN/NAS Arrays
    • UCSM policy-based management, provisioning, and monitoring for Big Data Infrastructure: UCS Management (160 Nodes per UCS Managed Cluster Domain)
        • Inventory & Asset Mgmt: cluster layout and inventory; per-server inventory; ID pools (MAC, IP, UUID) management
        • Fault Detection & SW Updates: fault detection & logs; event aggregation; system software updates
        • QoS Policies & Power Capping: QoS policy definition; policy driven framework; policy based power capping
    • CPA: High-performance unified fabric and compute increases cluster efficiency
        • Single wire for data and management
        • 8 x 10GE uplinks per FEX = 2:1 oversub (16 servers/rack), no portchannel (static pinning)
        • 2 x 10GE links per server for all traffic, data and management
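The 2:1 oversubscription figure can be reproduced from the link counts on the slide. The sketch assumes that each server's two 10GE links are split one per fabric extender (the deck lists two Nexus 2232 units per rack), so each FEX sees 16 server-facing links; that split is an assumption, as the slide does not spell it out.

```python
# Per-FEX oversubscription, assuming one 10GE link from each server to each FEX.
servers_per_rack = 16
link_speed_gbps = 10
uplinks_per_fex = 8

downlink_gbps = servers_per_rack * link_speed_gbps  # server-facing bandwidth
uplink_gbps = uplinks_per_fex * link_speed_gbps     # bandwidth toward the fabric
oversubscription = downlink_gbps / uplink_gbps
print(f"Oversubscription: {oversubscription:.0f}:1")
```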
    • Cisco Unified IO Grant Bandwidth
        • Chart: bandwidth (G/s) at times t1, t2, t3 for LAN Traffic (HDFS Import), Cluster Traffic (Shuffle) and Application Traffic (HBase), comparing Individual Ethernets with Prioritised QoS
        • Near wire speed without CPU load
        • Dynamic bandwidth management according to SLAs
        • See network section for more
    • Scaling the CPA
        • Single Rack: 16 servers
        • Single Domain: up to 10 racks, 160 servers (UCS Manager)
        • Multiple Domains (UCS Central)
        • L2/L3 Switching
    • Big Data Infrastructure: UCS Multi-Domain (UCS Central Manages up to 10,000 nodes)
        • Inventory, Fault, Log, Event Aggregation
        • Global ID Pools, Firmware Updates, Backups and Global Admin Policies
        • Global Service Profiles, Templates & Policies
        • Statistics Aggregation
        • HA for UCS Central Virtual Machine with shared storage
    • Q&A
    • Big Data Acceleration - Key Benefits
        • Rapid Big Data platform deployment/Accelerate Big Data ROI
        • Ease of infrastructure management and cluster administration
        • Support for mission critical workloads
        • Enterprise-ready workload automation
        • Powerful platform for high performance and high capacity
        • Production ready with full data protection and disaster recovery
        • Support for wide variety of Big Data applications, including but not limited to: data warehouse offload, predictive analytics, 360° view of the customer, recommendation engine, and long-term data store
    • Big Data Acceleration Kit: Helping You Get Started
        • Consulting Services: data strategy & exploration; integration planning; installation & configuration
        • 16-node M7 UCS Cluster: highly scalable Cisco UCS CPA solution; HA and full data protection; advanced admin console
        • Formal Training & Support: free admin training for (2); 24/7 support
        • Hadoop Self Training: series of jumpstart videos; user forum access
    • Thank You