Discovery in Action: The Transformative Power of Oracle Endeca Information Discovery

More than 80% of data is outside the reach of traditional analytical systems, including sources like social media, email, websites and more. Supercharge your analytic investments by unlocking the power of insights from any source.

1 Comment
  • I have put together a step by step guide on how to implement the text enrichment component in Endeca and perform entity extraction and sentiment analysis

    http://www.business-intelligence-quotient.com/?p=1801

  • Key Idea: Dialogs contain untapped insights about your business. One of the fastest-growing opportunities organizations have today is to better harness and understand the dialogs happening around their business. Let’s take your customers as an example. [build] They have different perceptions of products, services, and brand, which they express every day, for better or worse, on social media sites, on the internet, or directly to customer service reps. [build] Inside organizations, we can also look at what the workforce is saying: in the notes the sales and service teams record as they interact with customers, in documents and emails, performance reviews, job satisfaction surveys, and more. [build] And so it goes across supplier and partner networks, with the competition, with the government, and with the general public. Everyone is thinking and saying things that can lead to transformative new insights. [transition]
  • Key Idea: Understanding and assessing the impact of these dialogs presents a huge new business opportunity. This information is often literally telling you how to improve the business. If organizations could listen directly to customers' opinions and feedback and assess their impact, they could better target products and services to the market, address customer satisfaction issues proactively, and improve brand perception. Likewise, understanding workforce performance and sentiment is driving the next level of improvements in operational efficiency, talent management, and employee satisfaction. And so on. So if the opportunity here is to understand and analyze these dialogs to improve the quality of the relationships you have with all of these groups, why isn't everyone already doing it? [transition]
  • Key Idea: But unstructured data is a big problem for IT. First, this data isn’t all captured in a single place. It’s common knowledge at this point that barely 20% of enterprise data is maintained in traditional data management systems. [build] That means over 80% of the world’s data is not. It’s spread out everywhere: on social media sites and the internet, on file shares and in your email, or embedded in the text fields of your enterprise applications. [build] Most of this information is “unstructured”, and it turns out to be very different from traditional structured data. Unstructured data has no pre-defined data model and/or does not fit well into relational tables. Typically it has no readily identifiable structure, or it has complex, hierarchical structures, and it is often text-heavy. Due to the proliferation of the internet and social media, unstructured data is growing exponentially in volume and diversity, and organizations are looking for better ways to manage and analyze it. The second challenge is the quality of this data. While there are golden nuggets of insight buried in these mountains of data, most of it will be messy and may not be all that valuable. How do you decide what matters to your business? There’s not necessarily a right answer: different user groups will find value in different data sets. The bottom line is that most organizations believe they lack the time and skills to make sense of it all, so they struggle to get started and never realize the benefits of unstructured analysis. [transition]
  • Key Idea: You need a solution that extends analytics to unstructured sources. This is Endeca. To make this information actionable, you need to be able to look across all of these diverse sources; correlate them back to the structured data that you are already using to measure and drive your business; and enable decision makers to interact directly with that data to generate insights and improve outcomes. What we’re going to talk about today is Oracle’s unique offering in this space. [build] Oracle already has Oracle BI, the best platform for analyzing structured data and for managing the common business analytic definitions of your enterprise. [build] To that, we now add Endeca Information Discovery, which extends Oracle’s analytic capabilities across unstructured sources, uniting the worlds of structured and unstructured data to give business users complete visibility into their business processes, creating new insights and enabling better business decisions. Only Oracle provides a complete, integrated platform for structured and unstructured data analysis. Let me now give you an example of how all this information comes together in Endeca Information Discovery. [transition]
  • Key Idea: Unstructured data analytics with Endeca enhances business problem solving. Let's consider a solution to help quality engineers analyze, predict and avoid costly product recalls. To figure this out, a reliability engineer needs to ask some additional questions: “What parts receive the most claims? What other products contain those parts? Who supplies those parts? What did the customer say was wrong? What are industry experts and other consumers saying about our products?” The information sources holding these answers are diverse, so where do we start? [build] As in most Endeca solutions, we start by incorporating the gold standard data and metrics in your existing data warehouse and BI solutions. The unstructured data from the other systems will enrich and enhance this data to provide new insights. [build] One of the first places to look is your enterprise applications. Lots of valuable information gets left behind from the analysis you're doing in the warehouse, such as the text of your service requests or warranty claims. [build] Then there are all of the disparate unstructured sources within and beyond your organization: government websites and public data sets, industry safety forums, and consumer chatter on popular social media sites. [build] By seeing all of this information together, business users can now ask the new questions that will help them understand what's happening, what could happen next, and where to go for more information. They've turned all of this disparate information into insight. Let’s now take a look at what it’s like to use the resulting Endeca application. [transition]
  • The use case we are looking at today is incident analysis. For this example, we look specifically at helping authorities combat international terrorism. The information sources you see on the screen represent a wide variety of public data sets from a number of disparate sources across the internet. Across these sources we will use Oracle Endeca Information Discovery to look for interesting patterns and relationships. Sources may be structured, unstructured, or a combination of both. Out of a relational database we extract terrorist incidents, with attributes such as the time, location and type of event. These incidents also contain a field that is a long free-form text description of the incident. These notes typically come from a file or content management system. We then combine this data with very unstructured sources that might be talking about the very same incidents, such as news sites, social media and other public forums. In our demo we will focus on relating incidents of international terrorism to comments on social media. We are trying to find instances of people talking about the same terrorist incidents on Facebook or Twitter. Perhaps they know additional information about the incident that can help authorities with their investigation. To do this we will use Endeca's Text Enrichment capabilities across the unstructured fields contained in all the sources to uncover common themes and entities such as places, company names or people. [build] Here we see the country 'Somalia' is mentioned in all of the sources in various fields, both structured and unstructured. [build] As is the term 'IED'. [build] And 'bomb' and 'vehicle'. The more terms that the various sources have in common, the more likely the sources are to be related.
Oracle Endeca Information Discovery will summarize the different sources based on attributes that already exist in the data, in addition to those identified by our text enrichment capabilities within the unstructured data. Let's see how that works... [transition]
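
The idea in the incident-analysis notes, that sources sharing more terms are more likely to be related, can be illustrated with a minimal Python sketch. This is not how Oracle Endeca Information Discovery implements it; the function names (`shared_terms`, `rank_related`), the record identifiers, and the term sets are hypothetical, and only the terms 'Somalia', 'IED', 'bomb' and 'vehicle' come from the demo.

```python
def shared_terms(a, b):
    """Return the terms two records have in common, ignoring case."""
    return {t.lower() for t in a} & {t.lower() for t in b}


def rank_related(incident_terms, posts):
    """Rank posts by how many terms they share with the incident.

    `posts` maps a (hypothetical) post id to the set of terms extracted
    from that post. Posts with no shared terms are dropped.
    """
    scored = [
        (len(shared_terms(incident_terms, terms)), post_id)
        for post_id, terms in posts.items()
    ]
    # Highest overlap first; ties broken by post id for a stable order.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [post_id for score, post_id in ranked if score > 0]


# Terms taken from the demo narrative; the posts themselves are made up.
incident = {"Somalia", "IED", "bomb", "vehicle"}
posts = {
    "tweet-1": {"somalia", "bomb", "vehicle"},
    "post-2": {"Somalia", "IED"},
    "tweet-3": {"weather", "football"},
}

print(rank_related(incident, posts))  # ['tweet-1', 'post-2']
```

A plain overlap count is the simplest possible relatedness score; a real system would presumably weight rare terms more heavily than common ones.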

    1. Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
    2. Discovery in Action: The Transformative Power of Oracle Endeca Information Discovery. Adam Ferrari, VP Development. Richard Tomlinson, Director, Product Management.
    3. Safe Harbor Statements. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
    4. Agenda: Introduction to Endeca Information Discovery; Demos; Cooking Show.
    5. Dialogues Contain Critical, Untapped Insights. Customers: “Why should I pay a checking fee if other banks don’t charge one?” Workforce: “She saved the project with strong leadership & building trust with the customer.” 3rd Parties: “Competitive pricing is 15% lower than yours and they offer discounted shipping.” The Public: “RT @finwiz. Checkout video http://bit.ly/wLe6Y2 Sweet!”
    6. New Insights Drive New Opportunities. Customers: Improve Customer Loyalty and Satisfaction; Build Better Products and Services. Workforce: Improve Productivity; Retain Best Employees; Attract A-Players. 3rd Parties: Improve Operational Efficiency; Create and Maintain Better Partnerships. The Public: Understand Unsolicited Feedback; Improve Brand Perception and Sentiment. Impact: Revenue & margin. Operational efficiency. Better partnerships. Product positioning.
    7. The Challenges of Unstructured Data. Mostly text, and diverse schemas. Data can be dirty or of uncertain value. Data is growing in volume and diversity. 20% structured: Business Intelligence and Data Warehouses. 80% unstructured: Text in Enterprise Applications; Enterprise Content Systems, File Systems, Email; Websites; Social Media; Big Data.
    8. Oracle Endeca Information Discovery: Unstructured Analytics. Rapid, intuitive exploration and analysis of data from any combination of structured and unstructured sources. Benefits: Unprecedented Information Visibility; Leverage Existing BI Investments; Self-Service Data Discovery; Reduced IT Costs, Better Business Decisions. Unique Features: Contextual Search, Navigation, Analytics; Dynamic Data and Metadata; Content Acquisition and Text Enrichment; In-Memory Performance.
    9. Extend Business Analytics with Unstructured Data: Introducing Oracle Endeca Information Discovery. Oracle Business Intelligence: Best platform for integrated ROLAP and MOLAP; BI Server + Essbase; Common Enterprise Information Model. Oracle Endeca Information Discovery: Best platform for Unstructured Analytics; Endeca Server; Hybrid Search/Analytical Database; Flexible Data Model. Data sources, from structured to unstructured: OLTP & ODS Systems; Enterprise Applications (Oracle, SAP, Others); Data Warehouse & Data Marts; Websites; Content Systems, Files, Email; Social Media; Big Data.
    10. Oracle Endeca Information Discovery Platform: Technology Overview. Studio (Web Application), for intuitive exploration and analysis and for creating and sharing apps: Contextual Search, Navigation, Analytics; Qualitative and Outlier Visualizations; Easy Drag-and-Drop Applications. Endeca Server (Core Database), a hybrid search/analytical database with an in-memory architecture: Dynamic Data and Metadata; In-Memory, Multi-Threaded Performance; Enterprise Scale, Security. Integration Suite (ETL), for structured and unstructured data integration and enrichment: Integrates Structured and Unstructured; Text Enrichment and Sentiment Analysis.
    11. Oracle Endeca Information Discovery: Understand the Complete Picture with Context from Any Source. Central question: “How do we avoid costly product recalls?”
        Data Warehouse / Business Intelligence:
        Product Sales. Metric: Sale Price. Dimensions: Customer, Product, Dealer, Date.
        Warranty Claims. Metric: Claim Count, Labor Cost, Part Cost. Dimensions: Customer, Product, Part, Dealer, Date.
        Warranty Claims table:
        ClaimID | ProdID | PartID | Date | CustID | Dealer  | PartCost | LaborCost
        12324   | 506    | 234    | 12/3 | 1233   | Dealer1 | $300     | $200
        12325   | 507    | 235    | 12/4 | 1545   | Dealer2 | $450     | $900
        Sales Transactions table:
        ProdID | Wk | CustID | Date | Dealer  | Price
        506    | 25 | 1233   | 10/3 | Dealer1 | $35,000
        507    | 26 | 1545   | 09/4 | Dealer2 | $22,000
        Product Quality Application, Customer Verbatim: “..customer heard a rattling sound toward left front driver side. Had issues with steering column locking…”
        External Content, Government Agencies (Safety Administration): Claim from Competitor X - Model ABC - After driving this car for only 3 months, I started having…
        Websites, Industry Forums: “.. focus on passenger vehicle crashes, and are used to investigate injury mechanisms to identify potential improvements in vehicle design.…”
        Social Media, Consumer Comments and Sentiment: “Love my new car but having difficulty controlling steering on sharp corners..”
    12. COOKING SHOW
    13. Oracle Endeca Information Discovery: Understand the Complete Picture with Context from Any Source. Central question: “How do we better combat international terrorism?”
        Data Warehouse / Business Intelligence:
        Incidents. Metric: # incidents, # wounded, etc. Dimensions: Event Type, Weapon, Location, Date.
        Terrorist Incidents table:
        IncidentID | EventType    | Weapon  | Date | Region      | Country | GeoCoord         | NumWounded
        12324      | IED          | Bomb    | 12/3 | Africa      | Somalia | -1.234 10.1234   | 5
        12325      | Armed Attack | Rockets | 12/4 | Middle East | Iran    | -20.1234 54.1234 | 12
        Content Management System, Incident Long Text / Notes: “…in the Israac neighborhood of Gaalkacyo, Somalia, assailants detonated an improvised explosive device (IED) near a vehicle occupied…”
        External Content, 3rd Party Agencies: “Several civilians wounded in assault by al-Shabaab al-Islamiya in Mogadishu, Somalia…”
        Websites, News Media: “..Somalia: 1 dead after bomb explodes near vehicle. Try http://t.co/ve54JafA.…”
        Social Media, Citizen Comments and Sentiment: “Reports of IED detonation in Somalia, captors holding an American hostage moved him 3 times in less than 24 hours..”
    14. Part 1: Initial Data Load. Starting with raw incident data, our first goal is to create an application like this, which enables search and navigation of terrorism incidents.
    15. Part 1: Initial Data Load Details. (1) Start with raw CSV data. (2) Data loaded using Integration Suite. (3) Resulting key/value representation in the Endeca Server.
    16. Part 2: Text Enrichment. Next, we enrich the incident data by mining the large text description field associated with the incident. This uncovers themes, entities (people, companies, places), and the associated sentiment for each.
    17. Part 3: Adding Related Social Media Data. Incident key/value data, loaded in Step 1 and expanded in Step 2, is now combined with Facebook and Twitter social media posts.
    18. Part 3: Adding Related Social Media Data. The combined data allows us to create analyses that relate information from various sources using shared attributes, such as those derived from text enrichment like Themes, Entities and Sentiment.
    19. Q&A
    22. APPENDIX: TECHNOLOGY OVERVIEW
    23. Oracle Endeca Server: Flexible Data Model.
        Key-Value Store: No segmentation into tables. No overarching schema.
        Simple concepts: Attributes, like columns, except may be sparse, multi-valued, hierarchical. Records: each record is a collection of attribute/value pairs.
        Accommodates: Idiosyncratic structure (each record is self describing, has its own possibly unique schema); Multi-valued fields; Large fields of unstructured text.
        Inputs: Structured data; Semi-structured data, e.g. XML; Unstructured data + Text enrichment.
        Structured data, Transaction table:
        TxnID | ProductID | Category      | Amount
        12324 | 506       | Mountain Bike | 499
        12325 | 507       | Road Bike     | 1399
        Resulting record 1: TxnID = 12324; ProductID = 506; Category = Mountain Bike; Amount = $499.99; Suspension = Fox 32 F-Series; FrameType = Aluminium; Saddle = Bontrager SSR; Mountain Accessories = Fork and shock sag meter; Mountain Accessories = Water Bottle; Review = A great bike for off road. Smooth ride over the bumps; ReviewSentiment = Positive; ReviewTerm = Great; ReviewTerm = Off Road; ReviewTerm = Smooth; ReviewTerm = Bumps.
        Resulting record 2: TxnID = 12325; ProductID = 507; Category = Road Bike; Amount = $1399.49; Weight = 20lb.; FrameType = Composite; Saddle = Bontrager Race; Review = Disappointing for the price. The frame feels heavier than I expected.; ReviewSentiment = Negative; ReviewTerm = Disappointing; ReviewTerm = Price; ReviewTerm = Heavy.
    24. Endeca’s Unique User Experience: Interactive Data Exploration and Analysis. Deep Search: Search across all data; Dynamic typeahead; Automatic spell correction; Unlocks unstructured data. + Contextual Navigation: Data-Driven. Freely browse data without predefined paths or writing queries; Interactive. Shows only valid next steps; Easy to Use. Familiar online experience. + Visual Analysis: Charts, crosstabs, key metrics; Geospatial visualization; Tag clouds.
    25. Discovery Application Lifecycle: Building applications in days, not months. (1) Diverse and changing information (Structured, Semi-Structured, Unstructured) integrated and enriched via ETL. (2) Automatically unified in Oracle Endeca Server, no predefined model required. (3) Interactive search, navigation and visualization for exploration and analysis. (4) Drag-and-drop application composition in Studio. Iterate.
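
The load path on slide 15 (raw CSV in, key/value records out) and the flexible data model on slide 23 (sparse, multi-valued attributes with no overarching schema) can be approximated in a short Python sketch. This only mirrors the concept and does not use the Integration Suite or the Endeca Server API; `load_records`, `add_value`, and the CSV layout are assumptions made for illustration, while the attribute values are taken from slide 23.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw CSV; values are borrowed from the slide 23 bike example.
RAW_CSV = """TxnID,ProductID,Category,Amount,Suspension,Weight
12324,506,Mountain Bike,499.99,Fox 32 F-Series,
12325,507,Road Bike,1399.49,,20lb.
"""


def load_records(text):
    """Turn CSV rows into records of attribute -> list of values.

    Empty cells are skipped, so each record carries only the attributes
    it actually has (sparse, self-describing records).
    """
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        record = defaultdict(list)
        for attr, value in row.items():
            if value:
                record[attr].append(value)
        records.append(record)
    return records


def add_value(record, attr, value):
    """Attach one more value to an attribute (attributes are multi-valued)."""
    record[attr].append(value)


records = load_records(RAW_CSV)

# Simulate the output of text enrichment by adding derived attributes.
add_value(records[0], "ReviewTerm", "Great")
add_value(records[0], "ReviewTerm", "Off Road")
add_value(records[1], "ReviewSentiment", "Negative")

for record in records:
    print(dict(record))
```

Because every record is just a bag of attribute/value pairs, enrichment can add new attributes such as `ReviewTerm` to some records without changing any table definition, which is the point the slide makes about having no predefined model.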
