Social Data Analytics using IBM Big Data Technologies

Distilling Insights from Social Media Using Big Data Technologies

Have you ever wondered what your customers are saying about you on social media, and the impact it might be having on your business? This session will focus on how BigInsights and Big Data technologies can be used to glean useful and actionable insights from social media data.

You'll see how social data can be ingested and prepared, and how text analytics can be run on it in real time. Using Hadoop, we'll show you how you can store and analyze your large volume of historical social media data and reference data. This talk and demo will provide an introduction to text analytics and how it is used within the IBM Big Data platform for a social media solution.

Published in: Technology, Business

  1. Social Data Analytics using IBM Big Data technologies. Vijay Bommireddipalli (vijayrb@us.ibm.com), Development Manager, Social Data Accelerator, IBM Big Data. October 21, 2013. © 2012 IBM Corporation
  2. Please note: IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. © 2011 IBM Corporation
  3. Before we begin …
  4. Tag! You’re it! - Micro-segmentation
  5. Social Data Analytics - Using social media as a rich source of information
     Sample posts (callout labels on the slide: Behavior, Interest, Location, Consumption, Age, Intent to consume, Prediction):
     • Maybe our politicians should take a playbook out of the rivalry between duke/unc and take it to the courts http://ity.com/wfUsir
     • I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Raleigh) w/ 2 others http://4sq.com/gbsaYR
     • @silliesylvia good!!! U shouldnt! Think about the important stuff, like ur 43rd birthday ;)
     • @silliesylvia I <3 your leather leggings!! Its so katniss!! btw happy birthday Sylvia ;)
     • dear redbox please have kings speech for my new tv colin firth movie marathon
     • @silliesylvia $10 dollars says matthew & mary get married next season :) #downtownabbey
     • OMG OMG. just dropped my new ipad3 crappola!!!
     • @bamagirl can’t wait to watch sherlock with you! Oh, robert downey jr, I still love you but bbc is so amazing
     360 degree profile:
     • Personal Attributes: Sylvia Campbell, Female, In a Relationship; 32 years old, birthday on 7/17; lives near Raleigh, NC; college graduate; income of 80-120k
     • Buzz/Sentiment: Retweets BF’s comments; interest in BBC shows: Downton Abbey, Sherlock, Fringe, (P&P?); Sherlock Holmes, Robert Downey, Jr.; Hunger Games, Katniss/J. Lawrence
     • Interests/Behavior: Watch movies, tv shows; romance plots, “hero types”, strong women; uses iPad 3, Redbox, Hulu; shopping, interest in sales/deals; Duke/UNC basketball
  6. Social Data Analytics - Comprehensive Entity Extraction and Integration
     Source profiles to be integrated (all names are fictitious):
     • Name: Jane Doe; Id: jaydee; Address: Home of the Buccaneers; Interests: running, yoga, football…
     • Name: Jane Doe; Address: Tampa, FL; Twitter: jaydee; Blog Topic: food; Hobbies: running, yoga, …; Relationships: Tony C (brother)…
     • Name: Jane Doe, Cava; Address: Tampa, Fl; Twitter: @maryguida; Blog Topic: politics; Hobbies: running, yoga, …; Relationships: Tony C (brother)…
     • Name: J Doe; Blog Topic: food
     • Name: jane; Address: Tampa, FL; Relationships: Tony C (brother), …
     Challenges:
     • Scale: thousands of sites, hundreds of millions of users
     • Complex matching decisions
     • Partial, noisy and incomplete profile attributes
     • Only 3% of consumers have sufficient attribute information in their profiles.
  7. Consumer Intelligence - Social Media based 360-degree Consumer Profiles
     • Timely Insights: intent to buy various products; current location
     • Personal Attributes: identifiers (name, address, age, gender, occupation…); interests (sports, pets, cuisine…); life cycle status (marital, parental)
     • Relationships: personal relationships (family, friends and roommates…); business relationships (co-workers and work/interest network…)
     • Life Events: life-changing events (relocation, having a baby, getting married, getting divorced, buying a house…)
     • Products Interests: personal preferences of products; product purchase history
     Example posts:
     • Monetizable intent to buy products: "What should I buy?? A mini laptop with Windows 7 OR a Apple MacBook!??!"; "I need a new digital camera for my food pictures, any recommendations around 300?"
     • Location announcements: "I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj"
     • Life Events: "College: Off to Stanford for my MBA! Bbye chicago!"; "Looks like we'll be moving to New Orleans sooner than I thought."
     • Intent to buy a house: "I'm thinking about buying a home in Buckingham Estates per a recommendation. Anyone have advice on that area? #atx #austinrealestate #austin"
  8. Social Data Analytics - Profile construction
  9. Social Data Analytics - Profile construction
  10. Big Data Platform and Accelerators - Summary
      • Software components that accelerate development and/or implementation of specific solutions or use cases on top of the Big Data platform
      • Provide business logic, data processing, and UI/visualization, tailored for a given use case
      • Bundled with Big Data platform components: InfoSphere BigInsights and InfoSphere Streams
      Key Benefits:
      • Time to value
      • Leverage best practices around implementation of a given use case.
      [Platform diagram. Analytic Applications: BI / Reporting; Exploration / Visualization; Functional App; Industry App; Predictive Analytics; Content Analytics. IBM Big Data Platform: Visualization & Discovery; Applications & Development; Systems Management; Accelerators; Hadoop System; Stream Computing; Data Warehouse; Contextual Search; Information Integration & Governance. Cloud | Mobile | Security]
  11. Social Media Analytics Architecture
      • Online flow (data-in-motion analysis), Stream Computing and Analytics: Social Media Data Ingest and Prep; Entity Analytics: Profile Resolution; Extract Buzz, Intent, Sentiment; Dashboard (real-time analytics, pre-defined views and charts)
      • Offline flow (data-at-rest analysis), BigInsights System and Analytics: Social Media Data; Extract Buzz, Intent, Sentiment and Consumer Profiles; Entity Analytics and Integration; Comprehensive Social Media Customer Profiles; Pre-defined Workbooks and Dashboards
      • Optional, Indexed Search: Data Explorer; index using Push API; ad hoc access
  12. SDA 1.2
      • Social media sources supported: Gnip, Boardreader; Tweets, Boards, Blogs
      • Analyze streaming data as well as data at rest: Streams for processing of streaming data; BigInsights/Hadoop for input, output and configuration data
      • Key micro-segmentation attributes (out-of-box):
        Personal Info: Gender, Location, Parental status, Marital status, Employment
        Interests: Movie interest, Comic book fan, Product interest, Current customer of, Products owned
        ** Attributes can be added in (requires some development effort)
      • Entity resolution across the different social media sources
  13. SDA 1.2
      • Outputs/Measures (out-of-box): Buzz; Sentiment; Intent to buy/start service; Intent to attend/see
      • Example use cases: Retail (lead generation, brand management); Financial (lead generation and brand management); Media & Entertainment (brand management); Generic
      • Visualization using BigSheets
      • Extendable/customizable solution
  14. SDA - Acting on the insights
      • Metrics-based understanding of feedback in social media, and more importantly, feedback from whom!
      • Comprehensive (social media) profiles with micro-segmentation information
      • Campaign execution can be done in social media
      • Entity resolution across the different social media sources
      • External (social media) to internal (CRM) linkage (coming)
  15. SDA Outputs
      • Pre-defined Workbooks
      • Dashboards
      • Granular outputs for further slicing and dicing by Data Scientists
  16. SDA Conceptual Flow
  17. BigInsights & Streams Text Analytics
      • High-performance rule-based information extraction engine
      • Highly scalable solution available for at-rest and in-motion analytics
      • Pre-built extractors, and toolkit to build custom extractors: rich extractor library supports multiple languages; declarative Information Extraction (IE) system based on an algebraic framework
      • Sophisticated tooling to help build, test, and refine rules
      • Developed at IBM Research since 2004
      • Embedded in several IBM products: BigInsights, Streams; Lotus Notes; Cognos Consumer Insights
      Section tabs on slides 17-25: What is TA | Why BigInsights TA | How is TA deployed & used | Dev. tools
  18. Applications of Text Analytics: broad range of applications in many industries
      • CRM Analytics: voice of customer; product and services gap analysis; customer churn
      • Social Media Analytics: purchase intent; customer churn prediction; reputational risk
      • Digital Piracy: illegal broadcast of streaming and video content
      • Log Analytics: failure analysis and root cause identification; availability assurance
      • Regulatory Compliance: data redaction; identify and protect sensitive information
  19. Performance Comparison (with ANNIE open source **)
      • Task: Named Entity Recognition
      • Dataset: different document collections from the Enron corpus, obtained by randomly sampling 1000 documents for each size
      • [Chart: throughput (KB/sec) against average document size (KB) for the ANNIE open source entity tagger and SystemT. Callout for SystemT: >10x faster, < 60% memory]
      ** http://dl.acm.org/citation.cfm?id=1858681.1858695 Performance comparison with GATE 5
  20. Text Analytics Development Flow
      • Declarative language for extractor logic
      • Optimization and deployment to scalable runtime
      • Flow diagram components: Sample Input Documents; Development Tooling; Extractor; Text Analytics Optimizer; Compiled Operator Graph; Text Analytics Runtime; Extracted Information
      • Rule-based language: Annotator Query Language (AQL), with familiar SQL-like syntax
      • Specify annotator semantics declaratively
      • Choose an efficient execution plan
      • Highly scalable, embeddable Java runtime
  21. Invoking Text Analytics within BigInsights
      • Document encoded as JSON record. Jaql runtime coordinates a multi-stage map-reduce flow.
      • Diagram components: JAQL Function Wrapper; Input Adapter; SystemT Runtime; Output Adapter; AQL; Dictionaries; SystemT Optimizer; Compiled Plan
      • Input record: { label: “http://www.ibm ...”, text: “<html>\n<head> …” }
      • Output record: { label: “http://www.ibm ...”, text: “<html>\n<head> …”, Person: [ { firstName: [10, 15], lastName: [16, 25] }, … { firstName: [1042, 1045], lastName: [1046, 1050] } ], Hyperlink: [ { anchorText: [25, 33] }, … { anchorText: [990, 997] } ], H1: … }
      • Annotations added as additional attributes to JSON record.
  22. Additional Advantages of IBM Text Analytics
      • Quality: drives effectiveness of the entire application. Enables high accuracy and coverage.
      • Performance: dominant cost is CPU. Process large documents and large numbers of documents with high throughput.
      • Explain-ability: determine the cause of errors and fix them without affecting the remaining correct results.
      • Reusability: easily adaptable for a different domain. The development platform must enable layers of abstractions to be built and easily reused in a different domain.
      • Expressivity: rule language with a rich set of operators available to enable complex extraction tasks.
  23. BigInsights Text Analytics Development
  24. AQL editor with content assist
  25. Understanding the lineage of results
      • Click to drill down and see the rules that triggered inclusion of results
      • Explain and search through the results
  26. IBM Text Analytics for Big Data
      • High-performance information extraction engine
      • Analysis can be applied to data at rest and in motion: build an extractor once and use it with BigInsights or Streams
      • Parallel execution scales to Big Data volumes: linearly scalable to extremely high volumes
      • Highly customizable to a variety of domains and languages: pre-built extractors available out of the box
      • Sophisticated tooling enables ease of development and refinement of results
  27. Thank you
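
Slide 6 names entity integration across sources as a core challenge: partial, noisy profiles have to be matched to one person. The deck does not show how the Social Data Accelerator decides a match, so the following Python is only a minimal sketch under assumptions. The `same_entity` rule (same Twitter handle, or same normalized address plus a shared relationship), the greedy single-pass clustering in `integrate`, and the sample profiles are all hypothetical and only loosely modeled on the fictitious cards on the slide.

```python
def normalize(value):
    """Lowercase and drop non-alphanumerics so 'Tampa, Fl' equals 'Tampa, FL'."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def same_entity(a, b):
    """Hypothetical rule: same handle, or same address plus a shared relationship."""
    if a.get("twitter") and a.get("twitter") == b.get("twitter"):
        return True
    same_address = (
        a.get("address")
        and b.get("address")
        and normalize(a["address"]) == normalize(b["address"])
    )
    shared = set(a.get("relationships", [])) & set(b.get("relationships", []))
    return bool(same_address and shared)


def integrate(profiles):
    """Greedy single pass: add each profile to the first cluster it matches."""
    clusters = []
    for profile in profiles:
        for cluster in clusters:
            if any(same_entity(profile, member) for member in cluster):
                cluster.append(profile)
                break
        else:
            # No existing cluster matched, so this profile starts a new one.
            clusters.append([profile])
    return clusters


# Illustrative sample data (assumed, not taken from a real data set).
profiles = [
    {"name": "Jane Doe", "twitter": "jaydee", "address": "Tampa, FL"},
    {"name": "Jane Doe", "address": "Tampa, Fl", "relationships": ["Tony C"]},
    {"name": "jane", "address": "Tampa, FL", "relationships": ["Tony C"]},
    {"name": "J Doe", "twitter": "jaydee"},
    {"name": "John Roe", "address": "Austin, TX"},
]

clusters = integrate(profiles)
for cluster in clusters:
    print([p["name"] for p in cluster])
```

Under this simple rule the first two "Jane Doe" profiles stay in separate clusters, because a matching address alone is not treated as enough evidence. That is the kind of complex matching decision the slide refers to, and a production system would need richer evidence and a transitive merge step rather than this greedy pass.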

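
Slide 21 describes documents encoded as JSON records, with extracted annotations added as extra attributes that hold character offsets into the text. The Python below is a rough sketch of that record shape only. It is not AQL, Jaql, or the SystemT runtime: the `annotate` function, the `FIRST_NAMES` dictionary, and the regular expression are hypothetical stand-ins for a compiled extractor, and the sample record is invented.

```python
import json
import re

# Hypothetical stand-in for an extractor: a small first-name dictionary
# combined with a pattern for two adjacent capitalized words.
FIRST_NAMES = {"Jane", "Tony"}
NAME_PATTERN = re.compile(r"\b([A-Z][a-z]+) ([A-Z][a-z]+)\b")


def annotate(record):
    """Return a copy of the record with a Person attribute of [begin, end) spans."""
    people = []
    for match in NAME_PATTERN.finditer(record["text"]):
        if match.group(1) in FIRST_NAMES:
            people.append({
                "firstName": [match.start(1), match.end(1)],
                "lastName": [match.start(2), match.end(2)],
            })
    # The original fields are kept; annotations are added alongside them.
    return {**record, "Person": people}


record = {"label": "http://example.com/post", "text": "Lunch with Jane Doe in Tampa."}
annotated = annotate(record)
print(json.dumps(annotated, indent=2))
```

Storing spans instead of copied strings keeps each annotation tied to its position in the source text, which is what makes it possible to trace a result back to the text that produced it, as the lineage slide describes.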