
Enterprise Search: Addressing the First Problem of Big Data & Analytics - StampedeCon 2016



This session addresses the first problem of Big Data & Analytics: identifying, indexing, connecting, and gaining insight from existing data to drive value. HPE’s Chief Field Technologist will give her perspective on enterprise search as a fundamental cornerstone of building a data-driven enterprise.

Published in: Technology


  1. Enterprise Search: Addressing the First Problem of Big Data & Analytics. Raj Dhillon, Ph.D., Chief Field Technologist. StampedeCon 2016, July 27, 2016
  2. Foundational methodology: Thomas Bayes (1702-1761)
  3. Foundational methodology: If we toss a coin 100 times and get heads every time, what’s the probability of getting a head on the 101st toss? Traditional probability: 50%. Bayesian inference: 99+%.
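The 99+% figure is consistent with Laplace's rule of succession, one standard Bayesian treatment of this question. The slide does not state which prior was used, so the uniform Beta(1, 1) prior and the function name `predictive_heads` below are assumptions made for illustration.

```python
from fractions import Fraction


def predictive_heads(heads: int, tosses: int, alpha: int = 1, beta: int = 1) -> Fraction:
    """Posterior predictive P(next toss is heads) under a Beta(alpha, beta) prior.

    With the default uniform prior (alpha = beta = 1, an assumption) this is
    Laplace's rule of succession: (heads + 1) / (tosses + 2).
    """
    return Fraction(heads + alpha, tosses + alpha + beta)


# Traditional view: the coin is assumed fair, so past tosses change nothing.
traditional = Fraction(1, 2)

# Bayesian view: 100 heads in 100 tosses shifts belief strongly toward heads.
bayesian = predictive_heads(heads=100, tosses=100)

print(f"traditional: {float(traditional):.3f}")
print(f"bayesian:    {float(bayesian):.3f}")
```

With no data at all, `predictive_heads(0, 0)` falls back to 1/2, so the two views agree until evidence arrives.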
  4. Foundational methodology: Alfred Butts’ letter frequencies; Claude Shannon (1916-2001)
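The slide gives only the two names. One plausible reading, offered here as an assumption, is that letter frequencies of the kind Butts tabulated determine how much information each symbol carries in Shannon's sense. A minimal sketch of Shannon entropy over letter frequencies (the function name `letter_entropy` is hypothetical):

```python
import math
from collections import Counter


def letter_entropy(text: str) -> float:
    """Shannon entropy, in bits per letter, of the letter distribution in text."""
    letters = [c for c in text.lower() if c.isalpha()]
    total = len(letters)
    if total == 0:
        return 0.0
    counts = Counter(letters)
    # H = -sum(p * log2(p)) over the observed letter probabilities.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# A single repeated letter carries no information; four equally likely
# letters carry two bits each.
print(letter_entropy("aaaa"))
print(letter_entropy("abcd"))
```

Frequent letters contribute little surprise and rare letters contribute more, which is the intuition behind weighting rare terms more heavily in search.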
  5. What is enterprise search and what are its challenges? Enterprise search is a means of identifying and enabling content from multiple enterprise-type sources to be indexed, searched, and displayed to a defined audience. An effective enterprise search platform should enable productivity. Challenges: silos; volume and velocity; expectations.
  6. Productivity depends on effective enterprise search
      – 10%: The Butler Group reports up to 10% of staff costs are lost because employees are unable to find the right information to do their jobs. (2006)
      – 50%: In a study of over 1000 middle managers, Accenture found that managers spend up to 2 hours a day searching for information, and more than 50% of the information they obtain has no value to them. (2007)
      – 50-80%: According to the New York Times, data scientists spend 50-80% of their time collecting and prepping data. (2014)
      – 6 hours: An Aberdeen Group study of 188 organizations that had implemented enterprise search revealed executives at the top performing companies within those examined saved 6 hours a week looking for information, compared to 1 hour for executives at the other companies. (2009)
  7. Overcoming the data silo: Identify and connect disparate data sources
  8. The data landscape is radically changing: More connected people, apps, and things generating more data in many forms. Business data; human data; machine data. 10x faster growth than traditional business data.
  9. Why is processing human data different?
      – Human Information is made up of ideas, is diverse, and has context
      – Ideas don’t exactly match like data does; they have distance.
      – Human Information is not static – it’s dynamic and lives everywhere.
      Sources shown: Mobile, Texts, Email, Audio, Video, Social Media, Transactional Data, Documents, Search Engine, Images, IT/OT
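The idea that human information has "distance" rather than exact matches can be illustrated with cosine distance over word counts. This is a deliberately crude sketch: the bag-of-words representation and the function name `cosine_distance` are assumptions, not the method described in the talk.

```python
import math
from collections import Counter


def cosine_distance(a: str, b: str) -> float:
    """Cosine distance between two texts using raw word counts.

    0.0 means the same word distribution; 1.0 means no words in common.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return 1.0 - dot / norm if norm else 1.0


# An exact-match lookup would treat these two phrases as simply "not equal";
# a distance says how far apart they are.
print(cosine_distance("the service was slow", "the service was awful"))
print(cosine_distance("the service was slow", "quarterly revenue forecast"))
```

Structured data either matches a key or it does not, whereas the first pair above is close and the second is far, which is the distinction the slide draws.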
  10. Enterprise Search: Let me Google that for you (Web vs. Enterprise)
      – Content. Web: web pages; largely homogeneous. Enterprise: variety of data sources; variety of file formats; heterogeneous.
      – Relevance. Web: tolerates large number of results, as well as duplicated or overlapping information. Enterprise: demands small number of unique results with high degree of specificity.
      – Personalization. Web: little personalization expected; expect list of returned results. Enterprise: expectation of customized results (data access) aligned with user profiles (role, group, projects, etc.).
      – Analysis. Web: generic. Enterprise: domain-specific.
  11. Big data requirements for enterprise search (requirement: benefit)
      – 1. Unifying diverse sets of data: Allows users to ask questions that haven’t been asked before
      – 2. Identifying what’s relevant: Increase productivity by streamlining search, so users can focus on transforming and extracting the right data for analysis
      – 3. Automatic and real-time: Content is automatically indexed and available for search, enabling users to find data almost as quickly as it’s being captured
      – 4. Action-oriented / insight driven: Maximize return on human capital
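The "automatic and real-time" requirement, where content becomes searchable as soon as it is captured, can be sketched with an incrementally updated inverted index. The class name `InvertedIndex`, the document IDs, and the AND-only query semantics are assumptions made for illustration; a production engine does far more.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Minimal in-memory inverted index: term -> set of document IDs."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        """Index a document; it is searchable as soon as this returns."""
        for term in re.findall(r"\w+", text.lower()):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> set:
        """Return IDs of documents containing every query term (AND semantics)."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = InvertedIndex()
index.add("email-1", "Quarterly revenue report attached")
index.add("chat-7", "Revenue forecast looks strong")
print(sorted(index.search("revenue")))

# A newly captured document is findable immediately after it is added.
index.add("note-3", "Late revenue report from the field")
print(sorted(index.search("revenue report")))
```

Because documents from different sources share one index, a single query spans all of them, which also touches on the first requirement of unifying diverse data sets.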
  12. Tackling big data requirements for enterprise search
      Requirements: 1. Unifying diverse sets of data; 2. Identifying what’s relevant; 3. Automatic and real-time; 4. Action-oriented / insight driven
      How:
      – Create single view of enterprise content by connecting to different sources and repositories
      – Data streamlining
      – Automatic query guidance
      – Intelligent summarization
      – Intelligent highlighting
      – Personalization
      – Classification and clustering
      – Handled via indexing protocol – not directly visible to end users
      – Concept navigation / visualizations
      – Eduction
      – Sentiment
      – Classification and clustering
      – Machine Learning
  13. Personalizing data
      – Implicit and explicit profiling
      – Relationship discovery / community and expertise networks
      – Intent-based ranking
      Examples:
      – Customer C is linked to Customer E via Customer D
      – Customer H is the most influential in Customer B’s network
      – Customer A is in Customer B’s network
      – Customers F and G purchased the same model last year
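Relationship discovery of the "Customer C is linked to Customer E via Customer D" kind can be sketched as a shortest-path search over a relationship graph. The edge list below is hypothetical, chosen only to mirror the slide's examples, and breadth-first search is an assumed technique rather than the one used in the product.

```python
from collections import deque


def shortest_link(graph: dict, start: str, goal: str):
    """Breadth-first search for the shortest chain of relationships.

    Returns the list of nodes from start to goal, or None if unconnected.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical undirected relationships between customers.
edges = [("C", "D"), ("D", "E"), ("A", "B")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

print(shortest_link(graph, "C", "E"))  # ['C', 'D', 'E']
```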
  14. Classification and Clustering
      – Categories shown: Product performance issues; Side letters; Off balance sheet transactions
      – Managed classification: Create categories using business rules or training
      – Automatic classification and clustering: Automatically determine categories based on patterns and relationships in information
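Managed classification with business rules can be sketched as keyword rules per category. The category names come from the slide; the keywords, the `RULES` table, and the `classify` function are hypothetical, and real rule sets would be far richer.

```python
# Hypothetical business rules: a document joins a category if it mentions
# any of that category's phrases.
RULES = {
    "Product performance issues": {"defect", "failure", "recall"},
    "Side letters": {"side letter", "side agreement"},
    "Off balance sheet transactions": {"off balance sheet", "special purpose entity"},
}


def classify(text: str, rules: dict = RULES) -> list:
    """Return every category whose rule matches the text (case-insensitive)."""
    lowered = text.lower()
    return [
        category
        for category, phrases in rules.items()
        if any(phrase in lowered for phrase in phrases)
    ]


print(classify("Please keep this side letter out of the contract file."))
print(classify("Lunch at noon?"))
```

Automatic classification and clustering differs in that no such table is written by hand; the categories are inferred from patterns in the documents themselves.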
  15. Eduction and Sentiment
      – Example sentence: “I stayed at the resort last week, and though the mattresses were very nice, the service was awful.”
      – Entity types: Names, Places, IP addresses, Companies, Events, Relationships, Medicines, Airports, Cars, Social Security numbers, Phone numbers, Credit cards, Dates, Holidays, Job titles, Currencies
      – Eduction: Apply structure to unstructured data by automatically identifying and extracting terms in documents that lend themselves to key fields
      – Sentiment: Decomposition and classification within a sentence to pull out the sentiment surrounding specific topics
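Both ideas on this slide can be sketched in a few lines. For eduction, regular expressions pull a few of the listed entity types into named fields. For sentiment, the example sentence is split into clauses and each topic is scored by the clause it appears in. The patterns, the tiny word lists, and the function names `educe` and `topic_sentiment` are all assumptions for illustration; they are not the product's grammars or models.

```python
import re

# Hypothetical patterns for three of the entity types listed on the slide.
PATTERNS = {
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

# Hypothetical sentiment lexicon.
POSITIVE = {"nice", "great", "clean"}
NEGATIVE = {"awful", "slow", "dirty"}


def educe(text: str) -> dict:
    """Extract structured fields from free text, one list per entity type."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


def topic_sentiment(sentence: str, topics: set) -> dict:
    """Score each topic by the polarity words in the clause that mentions it."""
    result = {}
    for clause in sentence.lower().split(","):
        words = set(re.findall(r"[a-z]+", clause))
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        for topic in topics & words:
            if score > 0:
                result[topic] = "positive"
            elif score < 0:
                result[topic] = "negative"
            else:
                result[topic] = "neutral"
    return result


print(educe("Call 555-867-5309 from 10.0.0.1 on 2016-07-27."))

review = ("I stayed at the resort last week, and though the mattresses "
          "were very nice, the service was awful.")
print(topic_sentiment(review, {"mattresses", "service"}))
```

Scoring per clause rather than per sentence is what lets the single review yield a positive result for the mattresses and a negative one for the service.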
  16. Intelligent search with Machine Learning
      – Document interpretation / topic and concept identification
      – Sentiment analysis
      – Query analysis / clustering
      – Personalization of content / recommendations
      – Categorization / classification of data
      – Entity identification
      – Ranking results
      – Auto-complete / directed navigation
  17. What’s next?
  18. What else are users asking for?
      – Improved treatment of poor quality data
      – More interactive search / digital assistants
      – Streamlined / better defined workflows
      – Better visualization / user experience
      Diagram labels: Extract, Analyze, Connect, Index, Search, Predict
  19. Case Studies
  20. Stanford Children’s Health: Research for healthcare provider ranking study
      Challenge
      – Quality and clinical effectiveness research on ~115K patients, ~390K encounters, ~3M documents
      – Diverse data types (structured and unstructured) across data silos involved
      – Time constraints vs extensive search scope
      Result
      – Cross patient search for cohort identification
      – Intuitive UI for simple query construction
      – Easy clinical note review with highlights, navigation and related concepts
      – Portable queries and results
      – Fast indexing
  21. Leading Chinese telecom: Communications service provider industry
      Challenge
      – Allow users to access information on thousands of public services directly from their mobile phones – success of this platform depends on the users’ ability to quickly find information
      Result
      – Over 740 million subscribers can search through more than 8,000 applications for public service information, including public transportation schedules, public health records, traffic offenses, and more
      – Users receive more accurate search results than ever before
      – Customers get the most relevant and useful information regardless of the terms they use in the search
  22. Leading financial software, data and media company: Subscribers require up-to-the-second information on market conditions and trends
      Challenge
      – Deliver search performance at the scale required by the size of its data repository: 200 million messages and 15-20 million chats daily
      – Provide a robust, cost-efficient solution with scalability for a large and growing volume of data, supported by a small IT headcount
      Result
      – Detects trends in real-time messaging and chats for subscribers
      – Accommodates 10+ billion document entries without compromising performance today
      – Ensures scalability delivers ROI in the future
  23. Leading American multinational telecom: Paying careful attention to every aspect of customer-facing processes and applications
      Challenge
      – Provide support desk staff with fast access to the precise information required to address a customer’s problem
      – Improve knowledge management system search capabilities
      Result
      – Reduced time-to-resolution with fast queries that ensure support experts can resolve customer issues quickly
      – Relevant results, as query functionality makes sure that results deliver the information most likely to resolve customer issues
  24. NASCAR Fan and Media Engagement Center
      Challenge
      – Economic conditions
      – Rapidly changing media landscape (social media growth)
      – Rev pressures from sponsors
      – Industry leadership expectation
      Result
      – Live monitoring and analysis of broadcast, news and social media
      – Sponsors’ brand and fan sentiment analyses
      – Analytics to support race team sponsorship renewals
      – Crisis management
      – Build fan base with active engagement
  25. Thank you