
Cisco event 6 05 2014v3 wwt only



Cisco Event 6/5 LA ; 6/6 Irvine

Published in: Technology


  1. 1. Leveraging Big Data to Create Value, June 5th, 2014. © 2006 Cisco Systems, Inc. All rights reserved. Cisco Confidential.
  2. 2. Agenda • 12-12:30pm Registration and Lunch • 12:30-12:40pm Welcome and Introductions -- Art Hansen • 12:40-1:45pm Keynote Presentation -- Chris Ward, Brian Vaughan, James Bigger • 1:45-2:20pm Hadoop in the Real World by MapR -- David Feldman • 2:20-2:30pm Break • 2:30-2:45pm Cisco Unified Computing System Rack Mount Servers for Big Data -- Wade Ison • 2:45-3:30pm Big Data Brainstorm Breakouts • 3:30-4:30pm Refreshments, Q&A Session, and Conclusion • 4:30pm Raffle Drawing for iPad
  3. 3. Big Data as a Competitive Strategy Harvard’s Michael Porter: 1. Cost Leadership Strategy (Wal-Mart) 2. Differentiation Strategy (Southwest) 3. Innovation Strategy (Apple) 4. Operational Effectiveness Strategy (UPS) 5. Technology-based Competitive Strategy
  4. 4. What do we have that makes us different? • Custom Apps • Process (Workflow) • Big Data • People • Culture
  5. 5. Big Data’s Financial Benefits Gartner predicts that “Big Data will deliver transformational benefits to enterprises within 2 to 5 years, and by 2015 will enable enterprises adopting this technology to outperform competitors by 20% in every available financial metric.”
  6. 6. Goals for Today: • Understand Big Data Major Building Blocks • Learn the major patterns • Understand how to introduce Big Data into the enterprise in practical ways • Identify a solid use case for Big Data. Tips for Winning: • High ROI in less than a year • Must be applied to things that are important to the business • Use of multiple patterns encouraged • New ways of correlating data that was formerly not correlated • Remember Big Data patterns usually require scale
  7. 7. WWT Big Data Leadership Team • James Bigger, Principal Consultant: 20 years of management consulting and entrepreneurial experience. Expertise in financial services, insurance and telecom. Prior consulting experience with Opera Solutions and A. T. Kearney. Ph.D. in Physics from Oxford University. • Brian Vaughan, Principal Consultant: 15 years of management consulting, analytics and software experience. Expertise in healthcare and insurance. Prior experience with Opera Solutions, Mitchell Madison Group and Broadlane. Ph.D. in Physics from Stanford University. • Chris Ward, Principal Consultant: 20 years in management consulting and executive leadership. Expertise in retail, marketing, hospitality and financial services. Prior consulting experience with Opera Solutions and The Boston Consulting Group. BA from Princeton University, MBA from the University of Virginia Darden School of Business. • Matt DuBell, Principal Systems Engineer: Over 20 years of experience in a range of IT and security disciplines. Responsible for deploying large, secure, Hadoop-based platforms for the U. S. Government. 10 years of international experience implementing networking and virtual data center environments. Undergraduate degree from AIU. • Yoni Malchi, Engagement Manager: Over 7 years of experience in management and analytics consulting. Led engagements in telecom at Opera Solutions. Previous experience performing predictive analytics for NASA and USAF at The Aerospace Corporation. Ph.D. in Mechanical Engineering from Pennsylvania State University. • Jason Lu, Chief Scientist: 18 years of analytics and software development experience. Expertise in financial services, healthcare, insurance, retail and marketing science. Prior analytics development experience at Opera Solutions, FICO and J.D. Power and Associates. Ph.D. in Physics from Stanford University. • Jamie Milne, Engagement Manager: Over 7 years of management consulting and entrepreneurial experience. Expertise in financial services, travel, and retail sectors across the US and Europe. Led Big Data strategy and analytical engagements at Opera Solutions. MSci in Astrophysics from the University of Cambridge. • Chris Infanti, Engagement Manager: Over 8 years of experience in analytics consulting and delivery management. Ran engagements in wealth management, corporate security, marketing, education and transportation at Opera Solutions and IBM Global Business Services. BS in Mathematics from Georgetown University. • Prem Jain, Principal Architect: Over 20 years of experience in enterprise datacenter, building innovative solutions in Big Data, storage, HPC, virtualization, data migration and enterprise applications. Formerly lead architect for NetApp's Big Data solutions, and led the development of the FlexPod Select solutions. B.S. in Electrical Engineering.
  8. 8. Volume, Variety and Velocity of Data are Exploding The production of data is expanding at an astonishing rate. Drivers include the switch from analog to digital technologies and the creation of structured and unstructured data by individuals and companies via social media and the Web. • Every 60 Seconds: - 98,000+ tweets - 695,000 status updates - 11 million instant messages - 698,445 Google searches - 168 million+ emails sent - 1,820TB of data created - 217 new mobile web users • The need to process more data faster to respond to dynamic business trends has brought new requirements for database architectures • We believe the industry stands at the cusp of the most significant revolution in database and, therefore, application architectures in the past 20 years. [Chart: Enterprise Managed Data vs. Enterprise Created Data, 2010-2020, in ZB] [Chart: Unstructured vs. Structured data storage, 2009-2014, in EB] Source: IDC, Gartner, EMC, Worldwide File-Based Storage 2010-2014 Forecast
  9. 9. Vendor Landscape Is Crowded and Growing Data Sources & Capture IT Infrastructure Data Management & Integration Analytics Platforms and Solutions Analytics Services and Support Data Vendors Infrastructure Vendors Open Data Platforms Proprietary Data Platforms Extended infrastructure + data platforms Systems Integrators Specialized End-to-End Solutions Analytics Service Providers Vertical Analytics Solutions
  10. 10. Key Big Data Technologies (legend: Foundational, Emerging) • Hadoop: Distributed File System and Processing Language. Characteristics: parallel storage/processing; flexible programming model; horizontal scaling; batch processing. Enablement / Uses: pre-processing of data for analytics; ETL for transforming unstructured data to structured; data summarization. • NoSQL: Non-relational Key-Value Database. Characteristics: fast read/write; real time query; horizontal scaling; simple programming model; dynamic schema. Enablement / Uses: real-time ingest; rapid retrieval; input to MapReduce. • Columnar: Column-Oriented Analytics Database. Characteristics: relational; efficient compression; optimized for fast read of many/all records. Enablement / Uses: On-Line Analytics Processing (OLAP); data storage and retrieval for advanced analytics. • In-Memory: In-Memory Database and Processing. Characteristics: relational; random access; extremely fast. Enablement / Uses: Complex Event Processing; Real Time Analytics; potential to use a common database for transactions and analytics.
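
The batch, parallel processing model that slide 10 attributes to Hadoop can be sketched in a few lines. This is a minimal single-process illustration of the map, shuffle and reduce phases, not Hadoop's actual API; the function names and the sample log lines are assumptions made up for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Summarize all values seen for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over a list of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input: three server-log lines.
logs = ["error disk full", "warning disk slow", "error network down"]
counts = run_job(logs)
print(counts["error"], counts["disk"])
```

In a real cluster each map call and each reduce call could run on a different machine, which is what makes the pattern scale horizontally.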
  11. 11. The Big Data Software Stack The big data ecosystem includes open source and proprietary distributions that span the stack from ingest through analytics. [Diagram labels] JobFlow USER/MACHINE WORKFLOW Enterprise Structured Enterprise Unstructured 3rd Party Web/ Unstructured Flexible interfaces: TRANSFORM ANALYTICS DATABASE ANALYTICS ACCESS/ QUERIES INGEST FILE SYSTEM/ DATABASE MANAGEMENT Columnar In Memory Parallel RDBMS EMC/PIVOTAL HD / GREENPLUM HP/VERTICA/CLOUDERA ORACLE BIG DATA EXADATA/EXALYTICS IBM INFOSPHERE BIGINSIGHTS SAP HANA TERRACOTTA BIGMEMORY ZOOKEEPER CLOUDERA HORTONWORKS MAPR PIVOTALHD HADOOP CASSANDRA HBASE MONGODB TERADATA NETEZZA GREENPLUM VERTICA OLAP Natural Language Custom Analytics Custom APIs SQL OPEN SOURCE COMMERCIAL OPEN SOURCE Fast, Scalable Provisioning Maintenance Flexible, Compressed, Fast Read Optimized for high vol reads Interfaces to accept data Real Time & Batch HDFS NoSQL - Document - Key-Value - Wide Column SQL PIG HIVE R PYTHON SAS SPSS Batch Streaming SQOOP FLUME SPLUNK TALEND LAYER PROPERTIES OPTIONS EXAMPLES OF PRODUCTS INTEGRATED OFFERINGS MapReduce HADOOP Parallel, Distributed ODS Data Warehouse Call Center Server Logs Financial Demographic OOZIE DATA ACQUIRE ORGANIZE ANALYZE DECIDE SOLUTIONS MICROSTRATEGY BUSINESS OBJECTS COGNOS ORACLE OBIEE PLUS
  12. 12. Technology: Expanding the Traditional Stack Big Data requires a technology stack that leverages existing infrastructure and introduces new technology for distributed parallel processing 12 Queries (SQL) Relational Databases Monolithic Hardware (few CPUs and network computers) “Shared Disk/Memory” Architecture (centralized processing) Direct Record Access or Queries Monolithic Hardware (few CPUs and network computers) “Shared Disk/Memory” Architecture (centralized processing) NoSQL Database Parallel Relational Database Distributed File System High-Performance Traditional Relational Database MapReduce Programs Distributed Hardware (multicore CPUs, multiple computers connected via high-performance network) “Shared Nothing” Architecture (distributed parallel processing) INTERFACE DATABASE/ DISTRIBUTED PROCESSING FRAMEWORK HARDWARE TRADITIONAL RELATIONAL DATABASE STACK STACK FOR THE NEW DATA FOUNDATION Source: IDC, CSC, Gartner
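
The "Shared Nothing" idea on slide 12 can be illustrated with a small sketch: each record is routed to exactly one node by hashing its key, every node aggregates only its own rows, and the partial results are combined at the end. The node count, record keys and function names are hypothetical, chosen only for illustration.

```python
import zlib
from collections import defaultdict


def node_for(key: str, num_nodes: int) -> int:
    """Pick a node deterministically from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(records, num_nodes):
    """Route every (key, value) record to exactly one node's local storage."""
    nodes = defaultdict(list)
    for key, value in records:
        nodes[node_for(key, num_nodes)].append((key, value))
    return nodes


def parallel_sum(records, num_nodes):
    """Each node sums its own rows; only the partial sums are combined."""
    nodes = distribute(records, num_nodes)
    partials = [sum(value for _, value in rows) for rows in nodes.values()]
    return sum(partials)


# Hypothetical sensor readings keyed by truck.
records = [("truck-1", 5), ("truck-2", 7), ("truck-1", 3), ("truck-3", 10)]
total = parallel_sum(records, num_nodes=4)
print(total)
```

Because no node needs another node's disk or memory to compute its partial result, adding nodes adds capacity without a shared bottleneck.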
  13. 13. Business Need Class of Analytics Analytics: Translating Business Needs to Math Regardless of industry, many use cases translate into a limited class of “math problems” that big-data platforms (unlike transactional platforms) are optimized to solve at scale 13 Method Analytics Ready Stack Hardware & Software • Parallel • Distributed • Shared Nothing • Columnar • NoSQL • In-Memory • ARMA • Decision Trees • Genetic Algorithms • Graph Theory • Kalman Filter • KNN • Linear Regression • Logistic Regression • Matrix Factorization • Monte Carlo • Neural Networks • Sorting • Survival Time Analysis • Visualization • Regression • Classification • Clustering • Forecasting • Optimization • Simulation • Sparse Data Inference • Anomaly Detection • Natural Language Processing • Intelligent Data Design • Recommendation • Risk Scoring • Pricing • Capacity Planning • Cost Reduction • Matching • Retrieval
  14. 14. Defining The Business Opportunity Is The Starting Point The power of “Big Data” lies in bringing together data in a timely fashion from sources within and external to the enterprise - structured and unstructured - to create a complete view of critical business issues, therefore enabling advanced analytics to unlock key insights that drive significant business value 14 Outcome Analytics Data Technology Clearly defined use cases with the potential to deliver significant value by distilling vast data into new, previously unknowable intelligence Advanced machine learning techniques to analyze data and mine for insights to drive critical business decisions Structured or unstructured, internal or external, requiring new methods of storage/integration Emerging/new technology stacks using scalable, distributed architectures
  15. 15. Telematics is Transforming Auto Insurance (Case Study: Insurance) Business Imperative: To gain profitable market share, insurance companies need to offer the lowest “risk adjusted” pricing possible to consumers. Big Data Use Case: Combine driving-behavior data with actuarial data to create individualized risk models that predict claims losses more accurately, enabling risk-adjusted pricing to gain market share and increase margins. Data: • Sensors to capture routes, miles driven, time of day, braking patterns, driving speed • Geospatial maps tied to database layers. Class of Analytics: • Regression • Clustering • Anomaly Detection. Methods: • KNN • Linear Regression • SVD. Technology: HDFS, MapReduce, NoSQL, Data W/H, In-database Analytics, Data Marts.
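
Slide 15 lists KNN among the methods. As a rough sketch of how a nearest-neighbor model could turn driving behavior into an expected-loss estimate, the example below averages the losses of the k most similar drivers. The two features, their scaling to the 0-1 range, and every number in the training data are invented for illustration and do not come from the case study.

```python
import math


def knn_predict(train, query, k=3):
    """Average the target of the k training rows closest to the query.

    train is a list of (features, target) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical drivers: (hard-braking rate, night-driving share), both
# scaled to 0-1, paired with a made-up annual claims loss in dollars.
train = [
    ((0.10, 0.20), 100.0),
    ((0.20, 0.10), 120.0),
    ((0.15, 0.15), 110.0),
    ((0.90, 0.80), 900.0),
    ((0.80, 0.90), 1000.0),
]

cautious = knn_predict(train, (0.12, 0.18), k=3)
aggressive = knn_predict(train, (0.85, 0.85), k=2)
print(cautious, aggressive)
```

An individualized estimate like this is what would feed a risk-adjusted price; a production model would need far more features, data, and validation.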
  16. 16. Predictive Maintenance (Case Study: Mining) Data collection: FTP over MESH; Data Logger • One per truck • (Logs, Sensors, OEM Alarms, VIMS Service Port). Data sources: Equipment; Maintenance; Dispatch & Operator; Fuel, Oil Analysis, etc.; Hours. Stratifying Alarms: 1 Urgent Component Problem; 2 Critical Sensor Problem; 3 Important/Not Urgent Component/Sensor Problem; 4 Not Important Component or Sensor Problem; 5 Noise - Ignore. Data Driven Preventative Maintenance: Data/Analytics driven timing for preventative maintenance (e.g., oil changes) on individual trucks. Major Component Failure Model(s): Urgent Component Problems, e.g., Engine, Transmission, Differentials, Torque Converters, Final Drives. Project Scope • 252 trucks, 200 sensors per truck • 7 Mine sites • 10,000 readings/second. Data Integration • Integrating 15+ siloed data sources in multiple file formats • 10 Terabytes of data • 3 year historical data ecosystem. Business Impact: Higher equipment up-time; reduced critical component failure; better preventative maintenance and increased labor productivity
  17. 17. Data Warehouse Augmentation: Value Proposition Augmenting the Data Warehouse with a less expensive Hadoop system will allow companies to free up valuable space on their DW systems to run faster queries and analysis, while storing large volumes of their data universe. [Diagram labels: WWT Hadoop Appliance; Traditional Data Warehouse; Full Data Universe (CRM, Social Media, Billing, Web logs, Payments, Scheduling); Cold Data, Warm Data, Hot Data] CURRENT: 1. A significant amount of data that may be valuable in the future is thrown out during the ETL process 2. About 50% of the data brought into a typical Data Warehouse system is rarely accessed (Cold Data) 3. About 80% of the queries and reporting performed on Hot Data do not need to be at DW speeds. PROPOSED: 1. Utilize additional Hadoop-based storage to store the full data universe; files can be stored in natural format 2. Store Cold Data in Hadoop, taking advantage of lower cost per TB (Teradata: $17K; Hadoop: $2K) 3. Continue to take advantage of DW agility and speed in real-time analysis and querying. Potential jumping-off point for Big Data Business Impact project
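
The cost argument on slide 17 can be worked through with the slide's own figures: $17K per TB for Teradata, $2K per TB for Hadoop, and roughly 50% of warehouse data being cold. The 100 TB warehouse size below is a hypothetical input, and the calculation is a back-of-the-envelope sketch that ignores migration, licensing and operating costs.

```python
TERADATA_COST_PER_TB = 17_000  # USD per TB, from the slide
HADOOP_COST_PER_TB = 2_000     # USD per TB, from the slide
COLD_FRACTION = 0.50           # share of rarely accessed data, from the slide


def augmentation_savings(warehouse_tb, cold_fraction=COLD_FRACTION):
    """Storage cost saved by moving cold data from the DW to Hadoop."""
    cold_tb = warehouse_tb * cold_fraction
    current_cost = warehouse_tb * TERADATA_COST_PER_TB
    proposed_cost = ((warehouse_tb - cold_tb) * TERADATA_COST_PER_TB
                     + cold_tb * HADOOP_COST_PER_TB)
    return current_cost - proposed_cost


# Hypothetical 100 TB warehouse: 50 TB moves at a $15K per TB difference.
savings = augmentation_savings(100)
print(f"${savings:,.0f}")
```

Under these assumptions the saving is simply cold TB times the $15K per-TB price gap, so it scales linearly with warehouse size.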
  18. 18. Integrating Many Data Sources To Provide Lift (Case Study: Global Airline) • Purchase History: Profiled 100+m transactions for millions of customers • Service History: Linked data for millions of customer interactions and service records • Web Data: Analyzed billions of page-views for behavioral indicators • Campaign Metadata: Extracted meaning from tens of thousands of email campaigns • Destination Word clouds: Mapped destinations to key “feature tags” which explain selection • Partner Hotels: Geotagged tens of thousands of partner hotels by understanding free text description [Figure: Customer Travel Profile (ID= xxxx), a timeline from Nov 2010 to Sept 2011 of Flight, Hotel, Car Rental, Holiday and Experience events] [Figure: Lift chart, Uptake % vs. % Offered]
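
The "Uptake % vs. % Offered" chart on slide 18 reads like a cumulative gains curve. Assuming that interpretation, the sketch below ranks customers by a model score and reports, for each share of customers offered, the share of all uptake captured. The scores and responses are made-up illustration data, not figures from the airline case study.

```python
def gains_curve(scores, responded):
    """Return (share offered, share of uptake captured) after each customer.

    Customers are offered in descending order of model score.
    """
    total = sum(responded)
    if total == 0:
        raise ValueError("no responders: the gains curve is undefined")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve = []
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += responded[i]
        curve.append((rank / len(scores), captured / total))
    return curve


# Hypothetical model scores and observed uptake (1 = accepted the offer).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

curve = gains_curve(scores, responded)
offered, uptake = curve[1]   # after offering to the top 2 of 10 customers
lift = uptake / offered      # uptake relative to offering at random
print(offered, round(uptake, 3), round(lift, 2))
```

A curve that rises well above the diagonal means a small, well-targeted share of offers captures most of the uptake, which is the "lift" the slide title refers to.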
  19. 19. Social Media Analytics (Case Study: Consumer Electronics) Typically, social media tools focus on monitoring past/present activity. Predictive analytics allows users to identify important threads and intervene early, shifting the focus to future activity. • Details on particular themes or attributes • Forecasts trends and provides a mechanism to intervene in attributes that are going viral • Word cloud shows ongoing buzz and sentiment • Tabular view shows emerging themes and sentiment, virality score and recommended time-window for action
  20. 20. Curriculum Management Engine (Case Study: Food Grocer) Recommendations for Grocery Retailer’s Customers: Delivered $100 million p.a. in EBIT. We designed a recommendation engine that generates a dynamic set of recommendations on a daily basis (over 1MM/day, from sales force handhelds, website, call centers) and that learns and adapts over time to increase its ability to change behaviors, through a Curriculum Management Engine. Plan for Smith Household: Total Wallet = $600. Aspiration: Achieve 60% share of wallet, up from 40%. How: • Habituate Pizza and Ice Cream and Increase Frequency • Move Into Dinner Entrees & Sides • Move Into Higher Margin Breakfast Entrees • Increase Frequency of Purchases. VISIT #1: 1. Haven’t Bought In A While 2. Others On My Route Like 3. Would You Like Another? 4. Just for You -- $1.00 Off. Household Response. VISIT #2: 1. Would You Like Another? 2. Others On My Route Like 3. No pizza; not yet consumed 4. Just For You. Nature of Recommendations • Individuated Offers: Especially for You • Cross-Sell/Up-sell: Based on latent needs • Reminders: Haven’t bought in a while • Trials: Never tried but similar people like it • Promotions: Being a loyal customer
  21. 21. Using Internal and External Data with Advanced Analytics for Site Selection (Case Study: Retail Pharmacy) • Comprehensive performance data: Front store / pharmacy sales; Customer and patient demographics; Local area demographics • Web Scraping and Text Analytics: Neighborhood business profile; Competitor performance; Healthcare alternatives (ER, Urgent Care, PCPs) • Non-linear, multivariate predictive models: Linear/Logistic Regression; Decision Trees (CART); Random Forest; Gradient Boosting Machine; Neural Networks • Incorporation of all data, including variables usually viewed as “qualitative” [Figure: Model Performance, Predicted Patient Volume vs. Actual Patient Volume, R = 0.75] [Figure: Potential Volume Impact, Original Expansion Plan 0.71 vs. Model Recommendation 0.83, +17%]
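
The two figures quoted on slide 21 can be checked with simple arithmetic. Assuming the +17% is the relative difference between the model recommendation (0.83) and the original expansion plan (0.71), and that R is the Pearson correlation between predicted and actual patient volume, a sketch looks like this. The sample volumes passed to the correlation function are invented for illustration.

```python
import math


def relative_uplift(new, baseline):
    """Fractional improvement of new over baseline."""
    return new / baseline - 1.0


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Figures from the slide: 0.83 (model) vs. 0.71 (original plan).
uplift = relative_uplift(0.83, 0.71)
print(f"{uplift:.0%}")

# Hypothetical predicted vs. actual patient volumes for five sites.
predicted = [0.4, 0.7, 0.9, 1.2, 1.5]
actual = [0.5, 0.6, 1.0, 1.1, 1.6]
print(round(pearson_r(predicted, actual), 2))
```

The uplift works out to about 0.169, which is consistent with the +17% shown on the slide under the stated assumption.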
  22. 22. Designing Appropriate Reference Architectures A reference architecture is a specific set of software and hardware components that together comprise an Analytics-Ready Infrastructure 22 USER/MACHINE WORKFLOW Visualization Forecasts Pricing Reports Alerts Scores Offers NETWORK LAYER DESCRIPTION EXAMPLES OF PRODUCTS DATA FILE SYSTEM/ DATABASES Enterprise Structured Enterprise Unstructured 3rd Party Web/ Unstructured ODS Data Warehouse Call Center Server Logs Financial Demographic CUSTOM ANALYTICS ANALYTICS TOOLS ANALYTICS DATABASES • Flexible, Compressed, Fast Read • Columnar, In Memory, Parallel RDBMS • High-level programming languages with packaged analytical modules • Can be either general purpose or industry/function specific • Services • Advanced models • Parallel, Distributed • HDFS or NoSQL • Interfaces to accept fast and varied data “Analytics-Ready Infrastructure” COMPUTE STORAGE INGEST • 10Ge, low latency • Commodity, rack mount • Purpose built servers • Internal JBOD, Direct Attached, Network SAS R PYTHON SPSS VERTICA GREENPLUM TERADATA NETEZZA EXADATA SAP HANA CLOUDERA MAPR HORTONWORKS PIVOTALHD MARKLOGIC DATATACTICSORACLE NOSQL FLUME SQOOP TALEND VELOCIDATA UCS-C240 UCS-C460 HP 380P HP SL4540 UCS 6200 NEXUS 2200 HP 5800 DELL FORCE10 JBOD SATA JBOD SSD E-SERIES ISILON
  23. 23. Four Major Big Data Challenges Facing Most Companies In our meetings with customers, four issues are consistently brought up as major challenges in creating a big data capability that can effectively support the business units. Key Big Data Challenges: • Deploying new technologies and combining with existing architecture: How do we create an effective integrated Big Data stack? What new technologies do we need and how do they fit together? • Organizing for success: Where does Big Data fit? What belongs in the BUs vs. centralized? Who is responsible for data integrity? Where do we find the critical resources needed to deliver Big Data solutions? • Navigating a crowded and evolving vendor landscape: How do we separate marketing hype from reality? Who should we use? Who can we trust? • Defining the business value proposition: What problem/opportunity are we pursuing? What is the value that can be created?
  24. 24. Dual Approach to Delivering Big Data Solutions WWT offers customers both strategic and tactical approaches to derive value from the application of Big Data analytics and technology 24 • Strategic Roadmap − Big Data Strategy − Use Case Design • Use Case PoC − Analytics Development − Workflow Integration • Data Warehouse Augmentation − ETL Offload − Data Lake Creation • SAP HANA Implementation • Big Data Stack Build / Optimization • Production Support & Sustainment BIG DATA BUSINESS IMPACT Extract value from data to drive multiple Use Cases BIG DATA TECHNOLOGY OPTIMIZATION Accomplish data tasks, faster, cheaper, better
  25. 25. EXAMPLE SCALE OUT HARDWARE • Multiple Nexus 6000/ 7000 Series switches • 5 – 50 Big Data racks • Cisco SAP HANA scale-out (e.g. 8-16 UCS-B200) • Software scale-out EXAMPLE STARTER KIT: Cisco SAP HANA Medium Appliance (2 UCS-C460) • Big Data Solution Stack: o 2 UCS 6296PP o Each Big Data rack:  2 Nexus 2232PP  8-16 HP DL380 or SL4540, UCS-C240, etc. o Initially: 1 – 2 racks o Software: MapR, E. Service and Solution Offerings 25 • Develop a roadmap for implementing Big Data - Use case exploration - Data Governance, Infrastructure and Analytics ownership • Define high impact use cases • Design and test appropriate reference architectures Plan Design Pilot Scale WWT Offerings Indicative Infra- structure • Create detailed description of selected pilot use cases - Analytics - Workflow integration • Test various reference architectures • “Stand-up” reference architecture • Design the pilot - Success criteria - Timeline - Scope • Identify and prepare data • Build analytical models • Design workflow • Implement, manage and monitor Analytics-Ready Infrastructure Solution Development • Implement design changes from pilot learnings • Invest in software development as necessary to improve UI • Prepare ETL process for scale • Build out infrastructure as required to support rollout 4. Production Support • Operationalizing POC • Infrastructure Sustainment • Training • Ongoing support 3. Proof of Concept • POC design • Analytical models • Customer data loaded, processed and analyzed 1.Strategic Roadmap • Use case definition • Organizational alignment • Big Data Architecture high level design 2. Big Data Stack Build • Detailed design Big Data architecture and BOM • Procure, configure and deploy Big Data stack
  26. 26. Advanced Technology Center (ATC) COLLABORATIONENTERPRISE NETWORKS SECURITY DATA CENTER A highly collaborative, ecosystem to design, build, educate, demo & deploy advanced technology solutions for our customers & partners Hands-on Access to over $50M in Equipment • Point Product Demos • Tech. Training Sessions • EBCs / ATC Tours • Tech Days Demos • Customer Proof of Concepts • Reference Arch. Dev. • Product Training / PS • Version Upgrade Testing • Version Upgrade Testing • Strategic Ref. Arch. Demo (RAD) • Product Comparison –Func. • Product Comparison – Perf. • Customer Access to Lab • Customer Environment • Workshop Demos • Early Field Trials / Beta Code • Certification • Next Generation Networking • Nexus (7K, 5K, 3K & 2K) • Virtual Networking (Nexus 1000v) • OTV, LISP, Fabric Path • Layer 2 Extension • DR/BC Networking • BYOD (Bring Your Own Device) & Secure Mobility • Jukebox • ISE & RSA • ASA 1000v • VSG (Virtual Security Gateway) • Cyber Security Solutions • Unified Communications • Tandberg Video • VXI (View & XenDesktop) • WebEx, Call Center & Collaboration Solutions • Phones, Backpacks & Soft, Phone Clients • Telepresence & Business Video • Vblock, FlexPod & CloudSystem Matrix • EMC & NetApp Storage • vSphere / XenServer • vCloud Director • VDI (View / XenDesktop) • Cisco CIAC & BMC CLM • EMC’s UIM & Cloupia • FAST MDC (Mobile Data Center) Solutions 26
  27. 27. ATC Big Data Functions: Overview Three functions of the ATC have been identified, which will support Sales (and other) processes. • Proof of Concept. Description: Test customer solutions prior to full onsite implementation, e.g. run Use Case analytical models and architectures on Big Data machines; create Big Data hardware/software stack, potentially with client data. Usage: Mid-term project basis, to provide an environment for customer, based on a running engagement. • Technology Comparison. Description: Compare Big Data solutions to provide insight into strengths and weaknesses of each; run “bake-offs” to gauge how well a full solution can be solved using certain components. Usage: To test generic POCs, may be customer-driven; inform Big Data Team on best solutions. • Field Demo. Description: Showcase Big Data capabilities by hosting demos of WWT PoCs and analysis; run Use Case analytical models and architectures on Big Data machines. Usage: Tool for sales calls and EBCs.
  29. 29. First Step: Big Data Workshop