
AD306 - Turbocharge Your Enterprise Social Network With Analytics


Social tools generate large volumes of data about the business (who interacts with whom, when, and in what context). However, little of this data is actively used to generate insights that let the business work smarter and faster. This technical session describes how to capture and collect interactions within IBM Connections through its public APIs and apply a variety of analytics, including map/reduce and graph analytics, on a scalable Hadoop platform. This lets us uncover what the corporate network structure looks like, how information propagates across the organization, how opinions are formed, and how resilient the organization is to attrition.

Published in: Technology, Business

  1. 1. AD306 Turbocharge Your Enterprise Social Network with Analytics Vincent Burckhardt, IBM David Robinson, IBM © 2014 IBM Corporation
  2. 2. Please Note IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
  3. 3. Agenda
      • A Peek into Data Science
      • Extracting IBM Connections data for analytical purposes
      • Analytics And Connections Data
  4. 4. A Peek Into Data Science
  5. 5. What Is This Thing Called Data Science? Credit: Rachel Schutt/Cathy O’Neil
  6. 6. A Single Coffee Receipt
      date: 12/10/2013 | time: 13:09 | cashier: Chris | location: Raleigh500 | size: reg | qty: 1 | item: mocha | spent: .80
  7. 7. A Year’s Worth Of Coffee Receipts For One Person
      date        time   cashier  location    size  qty  item     spent
      01/10/2013  13:53  Chris    Raleigh500  reg   1    mocha    .80
      01/12/2013  14:02  Doug     Carrabou    reg   1    mocha    .80
      01/14/2013  13:09  Nadia    Raleigh500  reg   1    vanilla  .75
      02/01/2013  14:02  Nadia    Raleigh500  lg    1    mocha    1.10
      03/14/2013  13:14  Chris    Raleigh500  reg   1    blend    .60
      04/20/2013  13:32  Nadia    Stardoe     lg    1    mocha    1.10
      …
      12/14/2013  13:14  Bev      Raleigh500  reg   1    blend    .60
      12/20/2013  13:32  Nadia    Winston’s   reg   1    mocha    1.10
      Insights: M-F, 1-2 pm; 72% Raleigh500; 75% regular; 63% mocha; $.87 avg spending
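
The step from individual receipts to "insights" on this slide is plain aggregation. The sketch below illustrates it with four sample rows in the slide's column layout; the field names and helper functions are hypothetical, and because only four rows are used the percentages will not match the full-year figures quoted on the slide.

```python
from collections import Counter

# Hypothetical sample: four receipts using the slide's columns.
RECEIPTS = [
    {"date": "01/10/2013", "location": "Raleigh500", "size": "reg", "item": "mocha", "spent": 0.80},
    {"date": "01/12/2013", "location": "Carrabou", "size": "reg", "item": "mocha", "spent": 0.80},
    {"date": "01/14/2013", "location": "Raleigh500", "size": "reg", "item": "vanilla", "spent": 0.75},
    {"date": "02/01/2013", "location": "Raleigh500", "size": "lg", "item": "mocha", "spent": 1.10},
]


def top_share(receipts, field):
    """Return (most common value, fraction of receipts with that value)."""
    counts = Counter(r[field] for r in receipts)
    value, n = counts.most_common(1)[0]
    return value, n / len(receipts)


def insights(receipts):
    """Summarise many receipts into a handful of per-customer facts."""
    total = sum(r["spent"] for r in receipts)
    return {
        "top_location": top_share(receipts, "location"),
        "top_size": top_share(receipts, "size"),
        "top_item": top_share(receipts, "item"),
        "avg_spent": round(total / len(receipts), 2),
    }


summary = insights(RECEIPTS)
print(summary)
```

The same shape of computation (group, count, normalise) is what later slides apply to social interaction events instead of receipts.
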
  8. 8. A Year’s Worth Of Coffee Receipts For Many People
      The same receipt columns (date, time, cashier, location, size, qty, item, spent) plus a "person" column, with rows for Joel, Toni, Joe, Dan, Dave, Ken, Sally, Joni, …
      You get the idea…
  9. 9. Business Actions From Insights
      • From a single transaction (one receipt)
      • To engaging the customer with relevant actions (many receipts)
        – Coupons for food
        – Weekend offers?
        – Loyalty card?
        – Employee rewards?
  10. 10. Datafication
      • “The process of taking all aspects of life and turning them into data” (Credit: Kenneth Cukier/Viktor Mayer-Schönberger, Foreign Affairs, May/June 2013)
        – Google’s augmented-reality glasses
        – Twitter for thoughts
        – LinkedIn for professional networks
      • Today we’ll show you how to add Lotus Connections to the list
      • Creating new products with data, improving existing products with data
  11. 11. The Value of Connections?
      • Obvious value:
        – Collaboration tool
      • Perhaps “not so obvious” value:
        – “Social Receipts” … Datafication of Interaction Patterns … Business Insights!
      (Diagram labels: Connections, Analytics, Business Insights)
  12. 12. Possible Questions Connections Data Can Help Answer
      • Are you effectively communicating your message?
      • Are others responding to your message?
      • Are customers, business partners, contractors, and employees responding to your message?
      • Who are the brokers of information in the organization?
      • Which Lotus communities are the most effective?
      • What are the communication patterns between divisions?
      • What are the communication characteristics of high-performing organizations?
      Ask Your Question… Find Your Business Value
  13. 13. Extracting IBM Connections data for analytical purposes
  14. 14. IBM Connections
      • Profiles: Find the people you need
      • Forums: Exchange ideas with, and benefit from the expertise of, others
      • Communities: Work with people who share common roles and expertise
      • Blogs: Present your own ideas, and learn from others
      • Files: Post, share, and discover documents, presentations, images, and more
      • Micro-blogging: Reach out for help from your social network
      • Wikis: Create web content together
      • Bookmarks: Save, share, and discover bookmarks
      • Activities: Organize your work and tap your professional network
      • Home page: See what's happening across your social network
  15. 15. Connections Maximizes The Value of Social Data
      • IBM Connections provides APIs and SPIs that allow external systems to maximize the value of the social data:
        – ALL Connections data can be accessed by external systems
        – Open, transparent, breaking down silos
      • Pull data from IBM Connections
        – Programmatically access much of the same information that you can reach through the IBM Connections user interface
      • Have Connections push data to you
        – All create, update, and delete (CUD) events in all IBM Connections components can be supplied to external consumers
  16. 16. Connections Architecture
      (Diagram labels: IBM Connections Apps; Common Services: Search, Person Card, Navigational Header, Directory; Administration via JMX / WSAdmin; back ends: User Directory, RDB, File System)
  17. 17. Connections Architecture
      (Same diagram, adding the client side. Clients: Browser, Mashups, Feed Reader, Sametime, Lotus Notes, Portlets, Microsoft Office, Your App. They reach Connections through the HTTP Server & Proxy Cache using the REST API / Connections Atom API: GET, POST, PUT, DELETE. Formats shown: Atom Feed, Atom Entry, JSON, HTML, HTML Form, JavaScript)
  18. 18. Connections Architecture
      (Same diagram, adding the push side: the Event SPI feeds an integration bus, other enterprise services, and Your App)
  19. 19. The Event SPI is the social data fire-hose
      • Designed to let third parties be notified whenever a data change happens in any of the IBM Connections services
        – Real-time events generated by IBM Connections include all create, update, and delete (CUD) operations
        – Potential to represent the complete interaction footprint of the enterprise
        – Lets you capture, persist, model, analyze, visualize, and monetize your enterprise network
      • SPI (System Programming Interface) vs API (Application Programming Interface)
        – The SPI sits at a lower level than the APIs: you contribute Java code at system level
        – By contributing Java code written to this SPI, third parties can listen to creation, deletion, and update (and more!) events for content within IBM Connections
  20. 20. Event SPI – Programming aspects
      • Events: collections of data generated when activities (data-modifying operations, notifications) occur in IBM Connections
        – In the SPI, an event is represented by a Java bean / object
        – An event encapsulates data such as the type of action and the object (and container) involved in the action
      • Events are delivered to event handlers:
        – An event handler is a Java class implemented by a third party (you!)
        – Event handlers are registered in an XML file (events-config.xml), which specifies what types of event to send to a given handler
        – Connections delivers the Java bean representing the event to the registered event handler(s)
      (Diagram: the Event SPI reads events-config.xml and delivers events to Handler 1, Handler 2, … Handler N)
  21. 21. Event SPI – available data in each event
      Example: blog.entry.created: “Amy Jones posted a blog entry in the blog named XYZ”
      • Actor: the person who initiated this action. Details: external id, name and, if not disabled, email address
      • Type: type of action. Example: CREATE, UPDATE, DELETE, NOTIFY, MEMBERSHIP, ..
      • Item: general concept for representing an individual entity within a container. Details: id, name, textual content, HTML and Atom paths
      • Container: general concept for representing a "bucket" or "container" that contains other items. Details: id, name
  22. 22. Event SPI – available data in each event
      • Many more data fields are encapsulated in events:
        – Correlation item, set to represent a parent-child relationship (events about a commenting action)
        – Target set, which lets you deduce interactions between content and people
        – Membership delta field, indicating who has been added to or removed from a community, activity, ...
        – ... see the Event SPI documentation (JavaDoc) for the full list
      Key point: the event model encapsulates all of the data needed to understand the interaction between people, content, and containers in the platform
  23. 23. Event SPI in the context of an analytic solution
      Challenges of analytics:
      • Large incoming event stream
        – More than 100 CUD events per second
        – Growing over the longer term
        – Needs a scalable framework for analysis: horizontal scale to address growth
      • (Near) real-time indexing
      • No data loss
  24. 24. Taming the fire-hose... (1/2)
      Analysis, even basic analysis, is time consuming, thus:
      • Analysis should not occur in the event handler, but in an external system (“Analytics Service”)
      • The event handler should not wait until the analytics service has processed the event
        – Waiting would cause events to accumulate at the Connections level
        – This is a problem because the Connections queue that retains events awaiting delivery to the event handler has a limited depth
      • Goal: retain as many events as possible for further analysis
      => Design the event handler to consume and process events as fast as possible, i.e. as the interface between IBM Connections and an external system
      (Diagram: Event SPI → Event Handler → “Data backbone”, storage for asynchronous processing → Analytics Service)
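
The "hand off quickly, analyse elsewhere" design can be sketched in a few lines. This is a conceptual illustration only: the real handler is a Java class written against the Event SPI, and the data backbone would be a durable broker or store rather than an in-memory queue. The class and method names here (`BackboneHandoff`, `handle_event`, `drain`) are hypothetical.

```python
import queue
import threading


class BackboneHandoff:
    """Hypothetical stand-in for an event handler that only buffers events."""

    def __init__(self, capacity):
        self._buffer = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def handle_event(self, event):
        """Called once per event; returns immediately and never blocks."""
        try:
            self._buffer.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1  # count losses so they can be monitored
            return False

    def drain(self, sink, stop):
        """Worker loop: forward buffered events to the (slow) consumer."""
        while not (stop.is_set() and self._buffer.empty()):
            try:
                event = self._buffer.get(timeout=0.05)
            except queue.Empty:
                continue
            sink(event)


handoff = BackboneHandoff(capacity=1000)
received = []
stop = threading.Event()
worker = threading.Thread(target=handoff.drain, args=(received.append, stop))
worker.start()
for i in range(100):
    handoff.handle_event({"id": i, "type": "CREATE"})
stop.set()
worker.join()
print(len(received), handoff.dropped)
```

The point of the design is that `handle_event` does constant work regardless of how slow the consumer is, so back-pressure shows up as a visible counter rather than as a stalled delivery queue upstream.
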
  25. 25. Taming the fire-hose... (2/2)
      • Characteristics of the data backbone
        – Distributed and highly available
        – Horizontal scale
        – High throughput
        – Agnostic to consumers' state
      • Multiple options
        – Message broker: MQ / MQTT / ActiveMQ / Apache Kafka
        – Database
        – ...
  26. 26. Integration with a message broker – Apache Kafka
      (Code screenshot: a Java class implementing the EventHandler interface. It sends a JSON representation of the event; serialization to JSON is done with the open-source GSON library)
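
The serialization step can be illustrated without the Java code from the slide. The sketch below is a Python analogue, not the Event SPI bean: the `Event` class and its field names are hypothetical, modelled on the actor / type / item / container fields listed on slide 21, and the standard `json` module stands in for GSON.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Event:
    """Hypothetical event shape; field names are illustrative only."""
    name: str
    type: str
    actor: dict
    item: dict
    container: dict


def to_json(event):
    """Serialize an event to the JSON string that would be sent to the broker."""
    return json.dumps(asdict(event), sort_keys=True)


example = Event(
    name="blog.entry.created",
    type="CREATE",
    actor={"id": "ext-123", "name": "Amy Jones"},
    item={"id": "entry-1", "name": "My first post"},
    container={"id": "blog-1", "name": "XYZ"},
)
payload = to_json(example)
print(payload)
```

Sending a self-describing JSON document keeps the broker and its consumers independent of the Java classes used inside Connections.
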
  27. 27. Integration with a message broker – Apache Kafka
      Registration through events-config.xml (annotated configuration screenshot):
      • Handler class: the Java class implementing the EventHandler interface
      • Subscriptions: define the events delivered by the SPI to the event handler. Filtered by event name, source (IBM service), and/or type (CREATE, UPDATE, DELETE, ...)
      • Properties: name/value pairs injected into the event handler Java class. Typically used to pass configuration settings
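
How subscriptions narrow the stream delivered to each handler can be modelled as a simple match on event name, source, and type. The sketch below is a conceptual model, not the events-config.xml schema: the `Subscription` class, the dictionary keys, and the rule that criteria within one subscription are combined with AND are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subscription:
    """Hypothetical filter; None means 'match anything' for that field."""
    event_name: Optional[str] = None
    source: Optional[str] = None
    type: Optional[str] = None

    def matches(self, event):
        criteria = (
            ("name", self.event_name),
            ("source", self.source),
            ("type", self.type),
        )
        return all(wanted is None or event.get(key) == wanted
                   for key, wanted in criteria)


def dispatch(event, registrations):
    """Return the names of handlers with at least one matching subscription."""
    return [handler for handler, subs in registrations.items()
            if any(s.matches(event) for s in subs)]


registrations = {
    "kafka-handler": [Subscription()],  # receives everything
    "blog-audit": [Subscription(source="BLOGS", type="CREATE")],
}
event = {"name": "blog.entry.created", "source": "BLOGS", "type": "CREATE"}
print(dispatch(event, registrations))
```
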
  28. 28. Integration with a message broker – Apache Kafka Deployment – jar and dependencies made available to the SPI (running in the IBM Connections News application) through a Shared Library in WebSphere Application Server
  29. 29. 3rd party events can also participate in the social analytics solution
      • IBM Connections provides OpenSocial Activity Streams APIs that let third parties push their own events to the Activity Stream
      • From Connections 4.5:
        – Events pushed through the Activity Stream APIs are also surfaced in the Event SPI
        – An option allows an event NOT to be surfaced in the Activity Stream, i.e. to be surfaced only through the Event SPI
      => A third-party application can participate in the social analytics graph simply by publishing to the Connections Activity Stream APIs
  30. 30. Pulling data – when is it needed?
      You can “pull” all data from Connections... but is it really needed? Good news:
      • In most cases, events surface all the data needed for analytics purposes (including the content the event is about)
      • Events about the same object repeat data
        – If there are X events about the same object, the item/correlation data set will always contain the most up-to-date information about the referenced object
      • For an analytic solution, in a nutshell, this means that the Event SPI should be sufficient in most cases
  31. 31. Pulling data – when is it needed?
      • The “push” approach (Event SPI) is sufficient to build most analytic solutions
        – All necessary content (textual content, tags, …) is surfaced in every single event
        – All operations that change relationships (i.e. adding/removing a member, colleague, or follower) are surfaced as events
      • “Pull” approaches (REST APIs) should stay limited to:
        1. “Bootstrapping” the Analytics Service from a Connections system whose data predates the introduction of the event handler used in your analytic solution
           · Essentially building membership/network data (as needed)
           · Seeding the content should not be needed, as it is repeated whenever an event about the content is generated
        2. Fetching data not available through the Event SPI
           · Relatively “rare” for events generated from Connections
  32. 32. Pulling data from Connections
      2 main approaches for pulling data from Connections:
      1. REST APIs (Atom / OpenSocial format)
        – REST-style HTTP-based APIs (XML, JSON format)
        – Transparency: programmatically access much of the same information that can be accessed through the IBM Connections UI
        – “Drink your own champagne”: the public APIs are used internally by plug-ins, mobile … and even by some components of the Web UI (Activity Stream, Activities, …)
      2. Seedlist
        – Designed to allow a search engine to crawl Connections data for indexing purposes
        – Surfaces all content in the system, so it can be of some value for an analytic solution
        – HTTP-based APIs (Atom XML format)
  33. 33. Seedlist  Example: /forums/seedlist/myserver returns ALL forum entries in the system – Textual content, author, number of comments, number of recommendations, parent id, ACL
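
Since both the REST APIs and the seedlist return Atom XML, a consumer mostly needs a namespace-aware feed parser. The sketch below parses a small inline Atom document with the Python standard library; the sample feed is hypothetical and far simpler than a real seedlist response, which carries additional Connections-specific elements not modelled here.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feed, not actual seedlist output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Hypothetical forum feed</title>
  <entry>
    <id>urn:example:forum-entry-1</id>
    <title>First topic</title>
    <author><name>Amy Jones</name></author>
    <content type="text">Has anyone tried the new API?</content>
  </entry>
  <entry>
    <id>urn:example:forum-entry-2</id>
    <title>Re: First topic</title>
    <author><name>Bob Smith</name></author>
    <content type="text">Yes, it works well.</content>
  </entry>
</feed>"""


def parse_entries(xml_text):
    """Extract id, title, author and text content from each Atom entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        entries.append({
            "id": entry.findtext(ATOM + "id"),
            "title": entry.findtext(ATOM + "title"),
            "author": entry.findtext(ATOM + "author/" + ATOM + "name"),
            "content": entry.findtext(ATOM + "content"),
        })
    return entries


entries = parse_entries(SAMPLE_FEED)
for e in entries:
    print(e["author"], "-", e["title"])
```
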
  34. 34. Authentication aspects for the REST APIs
      • The REST APIs support basic authentication, form-based authentication, and (for most APIs) OAuth
      • Private data: access is strictly enforced on API calls
        – Not very convenient for access by an analytic system...
      • “Super user”
        – For a “super user”, access control checks on private data are bypassed
        – The “super user” is a user mapped to the JEE “admin” role across all Connections services
      • Public data: APIs that access public data don't require authentication
        – Provided that the environment is not configured to prevent anonymous access
  35. 35. Pulling data from Connections – What to use, when?
      REST APIs (Atom / OS APIs)
      • Pro: fine granularity, i.e. access to content / metadata for a specific object / container
      • Pro: access to relationship information. APIs are available for fetching membership lists, network information, who liked a given object, ...
      • Con: lack of batch retrieval capabilities
      Seedlist
      • Pro: batch retrieval of textual content
      • Pro: incremental updates (but the Event SPI is much more suitable for this purpose)
      • Con: focused on content, so it does not expose all the data (missing tags, membership information, ...)
      In some very specific cases, data is not available in a form that is easy to consume for an analytic solution
        – Example: getting the list of followers for a given object in the system
        – Query the Connections databases directly (in these specific cases only)
        – The database schema is private and can change over time
  36. 36. Key points
      • Leverage the Event SPI as much as possible
        – It provides (most of) the data needed for any elaborate analytics solution
        – Just let Connections push data to you! It is easier and performs well
      • “Fill the gaps” by pulling data from the Atom/Seedlist APIs
        – Initial loading of relationship / content data
        – Data not available through the Event SPI
      • One final warning:
        – The analytic solution has access to private data through the Event SPI and the Atom/Seedlist APIs (with admin role)
        => Ensure your solution is not leaking private data to unauthorized users
  37. 37. Analytics And Connections Data
  38. 38. The “Enterprise” Workflow
      (Diagram labels: Data Sources, ETL, Data Prep, Analytics, Data Consumption. Credit: Paco Nathan)
  39. 39. The Analytics Data Service
      (Architecture diagram labels: UI service (node.js), identity service, data analytics service, web server, workflow coordinator, stream processing, pub/sub, graph database, graph analytics, Map/Reduce tools, Big Table DB, Hadoop/Zookeeper)
  40. 40. Frequently Heard Big Data Dimensions
      A fuzzy definition:
        – 4Vs: volume, velocity, variety, value
        – Can’t fit or be processed on a single machine
        – Data intensive vs. compute intensive
        – Analytics focused
  41. 41. Big Data Aspects For Us To Consider
      Connections data:
        – semi-structured, line-formatted output that works well with “a hadoop cluster” and graph
        – time and spatial aspects
        – de-normalized
        – combined with multiple data sources
        – calculations = data too
        – explored for insights, innovate with data
        – doesn’t ‘expire’, sticky
      The difference between “BI” and “Analytics”:
        – Hadoop environments are designed to interpret the data at processing time
        – Processing attributes are chosen by the person processing the data
  42. 42. ‘Simple’ Analytics Are Often Best
      • More data usually beats better algorithms
        – LOTS of data with simple algorithms is not a bad plan
      • But you will probably always want to ‘sample’ for efficiency
      Credit: Anand Rajaraman, Netflix
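
The slide does not say how to sample. One common option for an unbounded event stream is reservoir sampling, which keeps a uniform random sample of fixed size without knowing the stream length in advance. The sketch below is an illustration of that technique (the classic "Algorithm R"), not a method described in the talk; the function name and parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable of any length."""
    if rng is None:
        rng = random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)        # fill the reservoir first
        else:
            j = rng.randint(0, i)      # inclusive on both ends
            if j < k:
                sample[j] = item       # replace with probability k / (i + 1)
    return sample


events = ({"id": n} for n in range(10_000))
picked = reservoir_sample(events, k=5, rng=random.Random(42))
print(picked)
```

Because the stream is consumed in a single pass with constant memory, the same function works on a generator fed by a live event source.
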
  43. 43. Handling The Data From Connections
      • Full Refresh
        – Often called “bulk load”
      • Delta Updates
        – Streaming via the SPI
      • What do you do with the data as it comes in?
        – Files?
        – Directly into stores?
        – Directly into analytics?
      • A need for real-time analytics?
  44. 44. Why A Property Graph In Analytics?
      • A property graph has:
        – key/value properties
        – both vertices and edges can have any number of properties
        – directed relationships
        – (hint: this is not RDF)
        Reference:
      • We want to answer questions like:
        – Context around the event
        – Cause and effect of an event
        – Things related to an event
      • Property graphs are a very useful tool
        – Data science part
        – Production part
      (Diagram: vertex “Name: bob” –calls→ vertex “Name: roger”)
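
The property graph model described on this slide is small enough to sketch directly: vertices and directed, labelled edges, each carrying arbitrary key/value properties. The `PropertyGraph` class below is a hypothetical in-memory illustration, not the graph database used in the talk; the `duration_s` edge property is invented for the example.

```python
class PropertyGraph:
    """Minimal in-memory property graph with directed, labelled edges."""

    def __init__(self):
        self.vertices = {}   # vertex id -> properties
        self.edges = []      # (source id, label, target id, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = dict(props)

    def add_edge(self, src, label, dst, **props):
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((src, label, dst, dict(props)))

    def out(self, vid, label=None):
        """Ids of vertices reachable from vid by one outgoing edge."""
        return [dst for src, lbl, dst, _ in self.edges
                if src == vid and (label is None or lbl == label)]


g = PropertyGraph()
g.add_vertex("bob", name="bob")
g.add_vertex("roger", name="roger")
g.add_edge("bob", "calls", "roger", duration_s=120)  # hypothetical edge property
print(g.out("bob", "calls"))
```

Direction matters: the edge above makes roger reachable from bob, but not the reverse.
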
  45. 45. Graph Analytics: A Specific Example For Connections Data
      em·i·nence (noun, ˈe-mə-nən(t)s): a condition of being well-known and successful. Source: Merriam-Webster Online
      How might we use graph technology in our analytics service to calculate a person’s eminence?
  46. 46. Graph Analytics – A Glimpse At Eminence Calculations
      A real eminence score can have 13 or more measures from Connections metadata alone.
      (Diagram: Person A –creates→ Status Update; Person B –creates→ Status Update Comment –comments on→ Status Update)
      Look for this graph pattern, then count comments, weight by who commented, and normalize… = one element of an eminence score
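
The "count comments, weight by who commented, normalize" recipe can be sketched as follows. The slide does not give the weighting or the normalization, so both are assumptions here: each commenter contributes a per-person weight (default 1.0), self-comments are ignored, and scores are divided by the highest raw score. The function and parameter names are hypothetical.

```python
from collections import defaultdict


def comment_eminence(updates, comments, weights, default_weight=1.0):
    """One eminence element per author of a status update.

    updates:  {update_id: author}
    comments: [(commenter, update_id), ...]
    weights:  {commenter: weight}, an assumed per-person importance
    """
    raw = defaultdict(float)
    for author in updates.values():
        raw[author] += 0.0                 # authors without comments score 0
    for commenter, update_id in comments:
        author = updates.get(update_id)
        if author is None or author == commenter:
            continue                       # skip unknown updates and self-comments
        raw[author] += weights.get(commenter, default_weight)
    top = max(raw.values(), default=0.0)
    return {person: (score / top if top else 0.0)
            for person, score in raw.items()}


updates = {"u1": "alice", "u2": "alice", "u3": "bob"}
comments = [("bob", "u1"), ("carol", "u1"), ("carol", "u3"), ("alice", "u2")]
scores = comment_eminence(updates, comments, weights={"carol": 2.0})
print(scores)
```

In a graph store the same computation would be expressed as a pattern match over creates / comments-on edges; the dictionary form above just makes the arithmetic visible.
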
  47. 47. Visualizing Analytics: A Real Dashboard Example
      (Dashboard screenshot. Scores are fictionalized)
  48. 48. Gradually Add More Data and Analytics For Deeper Insights
      Example: finding potentially obese people… Source: The Wall Street Journal
      For us:
      • What other data is coming in through the Connections Event SPI? (hint: it can be more than just Connections data)
      • What other sources of data are there outside of Connections?
      (Diagram labels: Connections, CRM, Articles, E-mail, Twitter, Other…)
  49. 49. Summary: Find Business Value In Your Connections Data
      • From “transactions”/“social receipts” to insights
      • Effective use of Connections APIs
      • Key insights using Big Data Analytics on Connections Data
      • Engagement for better productivity and faster execution
        – at the personal, organizational, and company-wide levels
      • Your insights are limited only by the data and your ability to process it for insights
  50. 50. For More Information
      • Visit IBM’s Emerging Technology Page!
      • Stop by the Innovation center to see more. I’ll be there to answer your specific questions!
      • More information about the Connections APIs and SPIs is in the IBM Connections product wiki under “Developing”
  51. 51. Access Connect Online to complete your session surveys using any:
        – Web or mobile browser
        – Connect Online kiosk onsite
  52. 52. Engage Online
      • SocialBiz User Group
        – Join the epicenter of Notes and Collaboration user groups
      • Follow us on Twitter
        – @IBMConnect and @IBMSocialBiz
      • LinkedIn
        – Participate in the IBM Social Business group on LinkedIn:
      • Facebook
        – Like IBM Social Business on Facebook
      • Social Business Insights blog
        – Read and engage with our bloggers
  53. 53. Acknowledgements and Disclaimers Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. © Copyright IBM Corporation 2014. All rights reserved.  U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.  IBM, the IBM logo,, Lotus, and IBM Connections are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. 
If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at Other company, product, or service names may be trademarks or service marks of others.