Key note big data analytics ecosystem strategy

Keynote: "Big Data and Data Warehouse Modernization – Trends & Directions", Les King

Transcript

  • 1. © 2013 IBM Corporation Data Server Day – Big Data and DW Modernization Big Data Analytics Ecosystem Les King Director, Database, Analytics, Big Data Solutions May, 2014 lking@ca.ibm.com
  • 2. Agenda: The Impact of Big Data in the Marketplace; IBM’s Analytics Portfolio Today; IBM’s Analytics Vision; Current Steps Towards that Vision; Roadmap – A peek ahead
  • 3. 3 © 2013 IBM Corporation 3 The Future of Analytics - Cognitive Tabulating Systems Era 1900 Cognitive Systems Era 2011 Programmable Systems Era 1950
  • 4. 4 © 2013 IBM Corporation 2 years ago, Watson’s advanced analytic capabilities could sort through the equivalent of 200 million pages of data to uncover an answer in 3 SECONDS and would fill up this entire room. Today…Watson is now 24x faster and has gone from the size of a master bedroom to three stacked pizza boxes.
  • 5. 5 © 2013 IBM Corporation 2 years ago, Watson’s advanced analytic capabilities could sort through the equivalent of 200 million pages of data to uncover an answer in 3 SECONDS and would fill up this entire room. Today…Watson is now 24x faster and has gone from the size of a master bedroom to three stacked pizza boxes. Watson refers to a set of solutions for the era of Cognitive Analytics
  • 6. Real-time analytics is not only about reducing the latency between what flows through your transaction systems and when it lands in your data warehouse so you can perform analytics. It is also about real-time activities performed by people (your customers and potential customers) through non-traditional sources (Facebook posts, tweets) and being able to react to that immediately to capture an opportunity.
  • 7. Today’s organizations are facing many disruptive forces. (1) The shift of power to the consumer: creating the need for organizations to understand and anticipate customer behavior and needs based on customer insights across all channels. (2) Accelerating pressure to do more with less: creating the need for all parts of the organization to optimize all of their processes to create new opportunities, to mitigate risk, and to increase efficiency. (3) The ability to exploit big data: creating new opportunities to capture meaningful information from new varieties of data and content coming at organizations in huge volumes and at accelerated velocity.
  • 8. The Big Data Conundrum: data AVAILABLE to an organization vs. data an organization can PROCESS. The percentage of available data an enterprise can analyze is decreasing as the data available to that enterprise grows. Quite simply, this means that, as enterprises, we are getting “more naive” about our business over time.
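One way to read the conundrum is as two growth curves that diverge. The sketch below is an illustration only: the starting volumes and both growth rates are invented for the example and do not come from the slides.

```python
def analyzable_share(available, capacity, data_growth, capacity_growth, years):
    """Return, year by year, the share of available data that can be processed."""
    shares = []
    for _ in range(years):
        # The share can never exceed 100% of what is available.
        shares.append(min(1.0, capacity / available))
        available *= 1 + data_growth        # data available keeps growing
        capacity *= 1 + capacity_growth     # processing capacity grows more slowly
    return shares


# Hypothetical figures: 100 units of data, capacity for 50 of them,
# data growing 40% a year and processing capacity growing 10% a year.
shares = analyzable_share(available=100.0, capacity=50.0,
                          data_growth=0.40, capacity_growth=0.10, years=5)
for year, share in enumerate(shares):
    print(f"year {year}: {share:.0%} of available data can be processed")
```

Whenever the data growth rate exceeds the capacity growth rate, the share falls every year, which is the sense in which an enterprise becomes "more naive" over time.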
  • 9. 9 © 2013 IBM Corporation Transactional & Application Data Sensor Data Social Data Enterprise Content • Volume • Structured • Throughput • Variety • Unstructured • Volume • Variety • Unstructured • Veracity • Velocity • Structured • Ingestion Big Data is all data and all paradigms for extracting value
  • 10. Breakthrough Analytics for All Data: IBM’s capabilities span all dimensions of Big Data. Volume (Data at Scale): • Build fast, accurate models on petabytes of data • New automated discovery techniques to understand what’s important in large volumes of data • Perform analytics where the data is for fast performance. Velocity (Data in Motion): • Score models for immediate impact using streaming data • React in real time by embedding predictive models into apps • Establish alerts with visual context to understand what’s happening right now. Variety (Data in Many Forms): • Analyze social media to understand what’s being said about your business • Use natural language processing and sentiment analysis to process text data and extract key concepts • Analyze sensor data (Internet of Things) to improve business process & reduce costs. Veracity (Data in Doubt): • Uncover relationships among diverse entities to get a more accurate view of your entities • Discover relationships among social networks and predict their behavior • Prepare data for accurate models with sophisticated techniques
  • 11. 11 © 2013 IBM Corporation Does the Era of Big Data Signify the End of the Data Warehouse? NO! “Instead they [organizations] are moving towards multiple systems, including content management, data warehouses, data marts and specialized file systems tied together with data services and metadata, which will become the "logical" enterprise data warehouse.” Andrew Foo, Senior IT Architect Smarter Planet Solutions Team - “Big data brings new life to the data warehouse by enriching it and introducing new insights taken from non-traditional sources, as well as unexplored data sources. The integration of big data and traditional data warehousing can produce results that are the best of both worlds.” Top 10 Strategic Technology Trends for 2013
  • 12. In the era of Big Data, different data workloads require different data systems. Workloads: Transaction Processing; Reporting and Analytics; Operational Analytics; Sensor Data Analysis; Mobile Data Serving. Data systems: Transactional Database; Analytics Data Warehouse; Operational Data Warehouse; Time Series Database; JSON Database. Example uses: E-commerce; Sales Analysis; Demand Analysis; Real Time Fraud Detection; Mobile Storefront. [Diagram: a time series database holding one data series per meter, and a JSON database holding one JSON document per key]
  • 13. 3 © 2013 IBM Corporation 3 The Future of Analytics - Cognitive Tabulating Systems Era 1900 Cognitive Systems Era 2011 Programmable Systems Era 1950
  • 14. 3 © 2013 IBM Corporation 3 The Future of Analytics - Cognitive Tabulating Systems Era 1900 Cognitive Systems Era 2011 Programmable Systems Era 1950
  • 15. So, what’s the current challenge? 1. Era of Cognitive Analytics 2. Quantum leap in ability to store and work with mass amounts of information 3. Real-time analytics which includes ANY data source and ANY type of data 4. Infusion of new volumes, veracity, velocity and variety of data 5. “Power” moving to the consumer 6. “Fit for Purpose” solutions are required 7. Companies have “history” (an established ecosystem), which cannot be ignored. And to top it off: in order to stay competitive, companies need the ability to exploit this while dealing with reduced expense budgets.
  • 16. 16 © 2013 IBM Corporation Committed to Client Success IBM understands all kinds of data • Game-Changing Innovation – such as Watson, BLU acceleration, streaming analytics and expert integrated systems; 20 years of patent leadership • Business-Ready Capabilities – big data and analytics capabilities, integrated and hardened for serious use, with flexible deployment options IBM knows how to turn data into value • Client Expertise – deep industry know-how and solutions with global reach • Strong Ecosystem – growing investment with 360+ business partners & 100+ universities • Build on Current Investments – enhance existing analytics and information infrastructure with unparalleled breadth and depth of new capabilities IBM has invested in big data and analytics • $17B+ in Acquisitions – coupled with game-changing innovation since 2005 • Analytics Solution Centers – visited by 4000+ organizations accessing global expertise
  • 17. IBM’s POV on Big Data & Analytics, in three points: Build a culture that infuses analytics everywhere. Be proactive about privacy, security and governance. Invest in a Big Data & Analytics platform.
  • 18. IBM’s Key Platform Capabilities. BIG DATA PLATFORM components: Accelerators; Information Integration & Governance; Data Warehouse; Stream Computing; Hadoop System; Discovery; Application Development; Systems Management. PureData for Analytics & DB2 with BLU Acceleration: delivers deep insight with advanced database analytics & operational analytics. Information Integration and Governance: govern data quality and manage the information lifecycle. Accelerators: speed time to value with analytic and application accelerators. InfoSphere Data Explorer: find, navigate, visualize all data. InfoSphere BigInsights: bringing Hadoop to the enterprise. InfoSphere Streams: analytics for data in motion.
  • 19. 19 © 2013 IBM Corporation IBM’s Key Platform Capabilities Accelerators Information Integration & Governance Data Warehouse Stream Computing Hadoop System DiscoveryApplication Development Systems Management BIG DATA PLATFORM “IBM has the deepest Hadoop platform and application portfolio.” –The Forrester Wave™: 1Q12 “IBM InfoSphere BigInsights is a core capability of the most comprehensive Big Data analytics platforms out there right now…” – Krishna RoyLars “Mark Leader IBM offers by far the largest product and services portfolio by both breadth and depth most…” – Jeff Kelly, IBM is The Undisputed Leader in Big Data Market
  • 20. IBM Netezza’s Market-Leading Evolution. Milestones: World’s First Data Warehouse Appliance; World’s First 100 TB Data Warehouse Appliance; World’s First Petabyte Data Warehouse Appliance; World’s First Analytic Data Warehouse Appliance; World’s fastest and “greenest” analytical platform. Products: NPS® 8000 Series; NPS® 10000 Series; TwinFin™; TwinFin™ with i-Class™ Advanced Analytics; Striper. Timeline axis: 2003, 2006, 2009, 2010, 2011, 2013. Themes: Simplicity; Time to Value; Extreme Performance; Built-in analytic capabilities.
  • 21. 21 © 2013 IBM Corporation IBM DB2’s Market-Leading Evolution Software Innovations in Warehousing Prescriptive Best Practices for WH environments Broader Software capabilities and tighter h/w integration Integrated purchase process; bundled support & services Data Partitioning Feature ( DPF ), Optimization for mixed workloads IBM Smart Analytic System ( ISAS ) InfoSphere Balanced Warehouse ( IBW ) 2003 2006 2009 2010 2011 2013 PureSystems branding; single admin; single PID PureData for Operational Analytics ( PDOA ) MDC, Autonomics, Simplified Admin, pureXML, Cubing Services, Mining BLU Acceleration leveraging columnar and “in-memory”, NOSQL, Big Data Multi-temperature storage; Real-time Warehousing, WLM, Temporal Analytics Range Partitioning, Active Warehousing, Compression, Cognos, ETL Balanced Configuration Unit ( BCU ) Mixed Workloads Operational Analytics Extreme Performance Oracle Application Compatibility NOSQL
  • 22. 22 © 2013 IBM Corporation BLU Acceleration for Cloud - >90% of OLTP systems have reporting running on them >50% of OLTP systems have analytics running on them Address the demands of these “mixed workload” environments DB2 for z/OS Informix Informix Informix Warehouse Accelerator Appliances PureData for Analytics powered by Netezza PureData for Operational Analytics Leveraging DB2 Software DB2 with BLU Acceleration Customized Software Multi-tenancy Virtual Environments Accelerators Cloud Analytics Platforms and Analytic Accelerators
  • 23. 23 © 2013 IBM Corporation© 2013 IBM Corporation BigInsights Enterprise Edition Components IBMOpen Source Visualization & Discovery Integration Workload Optimization Streams Netezza Flume DB2 DataStage IBM InfoSphere BigInsights Runtime Advanced Analytic Engines File System MapReduce HDFS Data Store HBase Text Processing Engine & Extractor Library (AQL+HIL) BigSheets JDBC Applications & Development Text Analytics MapReduce Pig & Jaql Hive Administration Index Splittable Text Compression Enhanced Security Flexible SchedulerJaql Pig ZooKeeper Lucene Oozie Adaptive MapReduce Hive Integrated Installer Admin Console Sqoop Adaptive Algorithms Dashboard & Visualization Apps Workflow Monitoring Management HCatalog Security Audit & History Lineage R Guardium Platform Computing Cognos GPFS
  • 24. Enterprise Integration With Multiple Products Brings the Power of the Big Data Platform to BigInsights. InfoSphere BigInsights (center): Administration & Security; Workload Optimization; Connectors; Advanced Engines; Visualization & Exploration; Development Tools; open source Hadoop components. Integrated products: IBM InfoSphere Data Explorer: indexing and “on the glass” integration. InfoSphere Streams: enables real-time, continuous analysis of data on the fly. InfoSphere Guardium: auditing + governance. BigSQL: standard SQL query to data in Hadoop, Hive, or HBase. Cognos Business Intelligence: support for Hive; Business Intelligence capabilities. InfoSphere DataStage: ETL directly into Hadoop without MapReduce. Platform Computing: high-performance, low-latency platform computing grid (min 3X performance increase). R (BigR in 2014): application that allows users to execute R jobs directly from the BigInsights web console. DB2 and JDBC: high-speed parallel read-write for DB2 and JDBC connectivity. WebSphere: WAS 8.5 Liberty Profile, high-performance secure REST access. Rational & Data Studio: RAD, Rational Team Concert & Data Studio collaborative development integration.
  • 25. 4 © 2013 IBM Corporation 2 years ago, Watson’s advanced analytic capabilities could sort through the equivalent of 200 million pages of data to uncover an answer in 3 SECONDS and would fill up this entire room. Today…Watson is now 24x faster and has gone from the size of a master bedroom to three stacked pizza boxes.
  • 26. Automobile and Manufacturing: Quality Control and Customer Satisfaction. Inflexibility and scalability limitations of existing IT solutions have been an inhibitor to competitive advantage. A new solution is needed to improve customer insights, quality and operational efficiency • Inventory control of parts • Manufacturing equipment and assembly line data • Warranty and services data from dealers • Telemetry data from vehicles • Customer services and social media data. Next generation of Enterprise Data Warehouse: • Data landing zone and analytic zone for 5-10 years of data • Warehouse reporting zone for high performance reports
  • 27. 27 © 2013 IBM Corporation Constant Contact Transforming Email Marketing Campaign Effectiveness with IBM Big Data Capabilities • InfoSphere BigInsights, IBM PureData for Analytics – powered by Netezza technology, Cognos BI Need • Analyze 35 billion annual emails to guide customers on best dates & times to send emails for maximum response Benefits • 40 times improvement in analysis performance • 15-25% performance increase in customer email campaigns • Analysis time reduced from hours to seconds
  • 28. Large European University generates own energy and uses analytics to monitor and manage consumption. Need • After years of 8-digit electric bills, the university deployed an independent on-campus power generation system. But they lacked a solution to monitor, analyze, and manage production and consumption. Benefits • Anticipate lower energy consumption levels and costs • Ability to identify energy inefficient areas of campus and take corrective action • Improved understanding of how changes in power grid model affect energy efficiency. Capabilities Utilized: Cognos BI, SPSS; InfoSphere BigInsights; InfoSphere Warehouse; Tivoli Energy Management
  • 29. What’s the Vision?
  • 30. The Next Generation Architecture for Big Data: where do we go next? The next generation architecture vision includes: intelligent data provisioning across the ecosystem; seamless access to all data for applications; metadata asset catalog management; applications and analytics portability; in-memory systems with BLU Acceleration; customer deployment options (cloud, software, and appliance); dynamic all data governance; enterprise security for all data; intelligent life-cycle management.
  • 31. The Logical Data Warehouse: leverage fit for purpose components and zones. Data types: transaction and application data; machine and sensor data; enterprise content; social data; image and video; third-party data. Zones: operational systems; exploration, landing and archive; trusted data; reporting & interactive analysis; deep analytics & modeling; real-time processing & analytics. Actionable insight: decision management; predictive analytics and modeling; reporting, analysis, content analytics; discovery and exploration. Supporting layers: Information Integration & Governance; Advanced Application Capabilities; Vertical Industry Accelerators.
  • 32. What steps have already been taken?
  • 33. Delivered as a cloud service, Cloudant eliminates complexity by enabling developers of fast-growing web and mobile apps to focus on developing their applications without the need to manage database infrastructure or growth. Summary: • Provides a NoSQL data layer delivered as a managed service • Stores data of any structure as self-describing JSON documents • Unique clustering framework that achieves elastic scalability that can span multiple racks, data centers, cloud providers or devices • Provides multi-master replication that allows read and write to any replica and offline mobile app usage, plus mobile replication & sync for occasionally connected apps • Global data distribution and geo-load balancing provide high availability and enhanced performance for applications that require data to be located close to the user • Provides full-text search, geo-location services, and flexible, real-time indexing • Integrates via a RESTful API • Monitored and managed 24x7 by the big data experts at Cloudant • Based on open standards including Apache CouchDB, Apache Lucene, GeoJSON and others
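Because the slide describes Cloudant as storing self-describing JSON documents behind a RESTful API and as being based on Apache CouchDB, a document write can be sketched as a CouchDB-style `PUT /{db}/{docid}` request. This is a minimal sketch under assumptions: the account URL, database name, document ID and document fields are hypothetical, authentication is omitted, and the request is only built, not sent.

```python
import json
import urllib.request


def build_put_document(base_url, database, doc_id, document):
    """Build (but do not send) a CouchDB-style PUT request storing one JSON document."""
    url = f"{base_url}/{database}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical account URL, database and document, for illustration only.
request = build_put_document(
    "https://example-account.cloudant.example",
    "orders",
    "order-1001",
    {"customer": "C. Johnson", "items": ["tent", "jacket"]},
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; a real deployment would also need credentials, which this sketch leaves out.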
  • 34. DB2 with BLU Acceleration (DB2 10.5): super analytics, super easy. DB2 with BLU Acceleration: in-memory columnar data store; orders of magnitude improvement for • consumability • speed • storage savings. BLU Acceleration is breakthrough technology: combines and extends proven relational technology with in-memory; over 25 patents filed and pending; leveraging years of IBM R&D spanning 10 laboratories in 7 countries worldwide. Typical experience: simple to implement and use; average of 37X performance gains; greater than 10X compression gains.
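The general idea behind a columnar store can be shown with a toy example. The sketch below is not BLU Acceleration's implementation: the sample rows are invented, and run-length encoding is used only as a generic illustration of why values stored column by column tend to compress well.

```python
# Invented sample data in the usual row-oriented form.
rows = [
    {"region": "East", "product": "tent", "revenue": 120.0},
    {"region": "East", "product": "jacket", "revenue": 80.0},
    {"region": "East", "product": "stove", "revenue": 45.0},
    {"region": "West", "product": "tent", "revenue": 130.0},
    {"region": "West", "product": "jacket", "revenue": 75.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress consecutive repeated values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_columns(rows)
total_revenue = sum(columns["revenue"])              # reads a single column only
region_runs = run_length_encode(columns["region"])   # repeated values collapse
print(total_revenue, region_runs)
```

An analytic query such as a sum over one measure touches a single column instead of every full row, and a column with many repeated values shrinks to a few runs.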
  • 35. BLU Acceleration: Super Fast, Super Easy. Create, Load and Go! No indexes, no aggregates, no tuning, no SQL changes, no schema changes. IBM Research & Development Lab Innovations.
  • 36. 36 © 2013 IBM Corporation Offerings and Deployment Models Pure Systems Cloud Software Pure Application System IBM Business Intelligence Pattern with BLU Acceleration IBM DB2 Data Mart with BLU Acceleration BLU Acceleration for the Cloud Pay by the hour for 1TB or 10TB Use your credit card Bring your own license DB2 10.5 Advanced Workgroup Advanced Enterprise Cognos BI 10.2 DB2 10.5 Advanced Editions include 5 user licenses of Cognos Application Platform Delivering Platform Services
  • 37. “The BLU Acceleration technology has some obvious benefits: It makes our analytical queries run 4-15x faster and decreases the size of our tables by a factor of 10x. But it’s when I think about all the things I don't have to do with BLU, it made me appreciate the technology even more: no tuning, no partitioning, no indexes, no aggregates.” (Tom DeJuneas, IT Team Manager, Coca-Cola Bottling Co. Consolidated)
  • 38. NOSQL - Ready for Big Data. Emergence of a growing number of non-relational, distributed data stores for massive scale data. RDF triple examples: Curt Cotner ownsCar 2012 Ferrari; Curt Cotner ownsHouse 123 Maple Ave, Chicago; Curt Cotner ownsBoat 2001 Thunderjet. JSON document example: { "firstName": "John", "lastName": "Smith", "age": 25, "address": { "streetAddress": "21 2nd Street", "city": "New York", "state": "NY", "postalCode": "10021" }, "phoneNumber": [ { "type": "home", "number": "212 555-1234" }, { "type": "fax", "number": "646 555-4567" } ] } [Chart: time in seconds, Jena TDB vs. DB2 Graph Store; data labels 137.343 and 38.825] [Diagram labels: DB2 JSON; Big Data; Analytics; Social; Mobile; Cloud]
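The two data shapes on this slide can be handled with very little code. The sketch below parses the slide's JSON document with Python's standard `json` module and models the three ownership statements as subject, predicate, object tuples. The helper `objects_of` is a hypothetical name for illustration, and reading each statement as a triple in that order is an interpretation of the flattened slide text.

```python
import json

# The JSON document shown on the slide.
document = json.loads("""
{
  "firstName": "John", "lastName": "Smith", "age": 25,
  "address": {"streetAddress": "21 2nd Street", "city": "New York",
              "state": "NY", "postalCode": "10021"},
  "phoneNumber": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "fax", "number": "646 555-4567"}
  ]
}
""")

# Nested fields are reached by key, with no schema declared up front.
fax_numbers = [p["number"] for p in document["phoneNumber"] if p["type"] == "fax"]

# The ownership statements as (subject, predicate, object) triples.
triples = [
    ("Curt Cotner", "ownsCar", "2012 Ferrari"),
    ("Curt Cotner", "ownsHouse", "123 Maple Ave, Chicago"),
    ("Curt Cotner", "ownsBoat", "2001 Thunderjet"),
]


def objects_of(subject, predicate):
    """Return every object linked to the subject by the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(document["address"]["city"], fax_numbers, objects_of("Curt Cotner", "ownsCar"))
```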
  • 39. NOSQL: why does it matter? Combine data from systems of engagement with traditional data in the same DB2 database: best of both worlds; simplicity and agility of XML, RDF, JSON + enterprise strengths of DB2. Store data from web/mobile apps in its native form: developers don’t have to learn anything new; XQuery, SPARQL, Mongo API, …. No new business processes to worry about: security, audit; data life cycle management; backup and recoverability; resilience; high availability and disaster recovery.
  • 40. What is BigInsights BigSQL? Using rich standard SQL: comprehensive SQL '92+ support (datatypes). SQL access to all data stored in BigInsights: multiple sources; via JDBC/ODBC. Leveraging MapReduce for parallelism OR direct access for low-latency queries: Big SQL utilizes direct access or MapReduce. In direct access, users can run smaller point queries (HBase queries, for example) that will execute quickly. For bigger, complex queries on larger data sets, the parallelism of MapReduce is used to process the data. Scalable server architecture. [Diagram labels: Application; SQL Language; JDBC / ODBC Driver; JDBC / ODBC Server; BigSQL Engine; BigInsights; Data Sources: Hive Tables, HBase Tables, CSV Files]
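The choice between the two execution paths described on this slide can be sketched as a simple routing rule. Everything in the block below is hypothetical: the slide does not say how Big SQL decides, so the `QueryEstimate` type, the `choose_execution_path` function and the row threshold are assumptions made purely to illustrate the trade-off between low latency and parallelism.

```python
from dataclasses import dataclass


@dataclass
class QueryEstimate:
    """Hypothetical planner estimate for one query."""
    estimated_rows: int
    is_point_lookup: bool


def choose_execution_path(estimate, direct_row_limit=10_000):
    """Pick 'direct' for small or point queries, 'mapreduce' for large scans.

    The threshold is an invented illustration value, not a Big SQL setting.
    """
    if estimate.is_point_lookup or estimate.estimated_rows <= direct_row_limit:
        return "direct"       # low latency, no job start-up cost
    return "mapreduce"        # parallelism pays off on large data sets


point_query = QueryEstimate(estimated_rows=1, is_point_lookup=True)
big_scan = QueryEstimate(estimated_rows=50_000_000, is_point_lookup=False)
print(choose_execution_path(point_query), choose_execution_path(big_scan))
```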
  • 41. 41 © 2013 IBM Corporation Big SQL 3.0– Features at a Glance Available for POWER Linux (Redhat) and Intel x64 Linux (Redhat/SUSE)
  • 42. 5 © 2013 IBM Corporation 2 years ago, Watson’s advanced analytic capabilities could sort through the equivalent of 200 million pages of data to uncover an answer in 3 SECONDS and would fill up this entire room. Today…Watson is now 24x faster and has gone from the size of a master bedroom to three stacked pizza boxes. Watson refers to a set of solutions for the era of Cognitive Analytics
  • 43. 43 © 2013 IBM Corporation Total respondents n = 1061 Big data objectives Top functional objectives identified by organizations with active big data pilots or implementations. Responses have been weighted and aggregated. Customer-centric outcomes Operational optimization Risk / financial management New business model Employee collaboration Big Data Requires Ability to Match Customer Information Trends More than 50% of Big Data analytics projects are “customer-centric” Integrating data increases the ability to create a complete picture of today’s ‘empowered consumer’ However Clients today struggle to link this customer information, hand-coding & repeatedly tweaking algorithms Solution IBM BigMatch for BigInsights
  • 44. 44 © 2013 IBM Corporation C. Johnson 123 Main Street 512-545-1234 CRM Supply Chain Fulfillment Support Ticketing External Sources 3rd Party Chris Johnston 123 Main Street 512-554-1234 Shipping: 456 Pine Ave Christine. Johnson 123 Main Street Call length Semi-structured notes Satisfaction C. Johnson Main Street 512-554-1234 C. Johnson 125 Main Street 512-554-1234 ChrisJohnson65 “Likes” Clothes, Camping Gear @ChristyJohnson65 Christy65 Circle / Network data Order Mgmt. Internal / Structured External / Unstructured Web Chris.johnson@cj.net BigMatch provides The Ultimate Customer Dimension for Analytics at Hadoop Scale Big Match matches all these records Big Match combines the MDM probabilistic matching engine & pre-built algorithms & BigInsights for customer matching natively within Hadoop Increased Value of Customer only if… Christine Johnson Married 1 child 4/15/74 Christy65 Mail Order responder Specialty Apparel Partner Sales data VIP: Gold Customer Sat: 80% Influence Score: 8/10
  • 45. Match and Search Differentiators: Fuzzy Matching. IBM’s library of fuzzy matching techniques is the most comprehensive. Fuzzy matches are then scored against probabilistic weights based on value frequencies in your data. Techniques: Phonetics (Mohammed vs. Mahmoud); Synonyms (Andrew = Andy, George = Jorge, 1st = First); Abbreviations (AIG = American International Group, Road = Rd); Concatenation (Van de Velde = Vandevelde); Misalignment (Kim Jung-il = Kim il Jung); Edit Distance (867-5309 ~ 876-5309); Transliteration (Toyota = トヨダ); Date Similarity (01/01/1973 ~ 01/02/1973); Proximity (geocodes and great-circle distance); Noise Words (Initiate Inc. = Initiate); Typographical Errors (John Smith vs. John Snith)
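Two of the ideas on this slide, edit distance and weights based on value frequencies, can be illustrated with a short sketch. This is not IBM's matching engine: the edit distance is the textbook Levenshtein algorithm, and `agreement_weight` is a simplified, hypothetical stand-in for frequency-based probabilistic weighting in which rarer values earn more weight when two records agree on them. The surname list is invented.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts, deletes, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def agreement_weight(value, observed_values):
    """Weight for two records agreeing on `value`: rarer values score higher."""
    counts = Counter(observed_values)
    if counts[value] == 0:
        raise ValueError(f"{value!r} does not occur in the observed values")
    return -math.log2(counts[value] / len(observed_values))


phone_distance = edit_distance("867-5309", "876-5309")
name_distance = edit_distance("John Smith", "John Snith")

surnames = ["Johnson", "Johnson", "Johnson", "Johnston"]  # invented sample
common_weight = agreement_weight("Johnson", surnames)
rare_weight = agreement_weight("Johnston", surnames)
print(phone_distance, name_distance, common_weight, rare_weight)
```

Plain Levenshtein distance counts the swapped digits in the phone example as two substitutions; a variant that treats an adjacent transposition as a single edit would score it lower.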
  • 46. © 2013 IBM Corporation Thank You Les King Director, Database, Analytics, Big Data Solutions May, 2014 lking@ca.ibm.com
