OSCON 2010 - Panning for Gold: A Web Prospector's Guide to Mining and Filtering Community Data
1. Welcome to “Panning for Gold”: A “Web” Prospector’s Guide to Mining and Filtering Community Data
2. Introductions Ben Bassi, President & CEO Seasoned Internet Pioneer At the forefront of online innovation for almost two decades… Bassi was a part of the founding management team of Lycos.com responsible for creating many of the first communities and portals for companies like Microsoft, CompuServe, Prodigy, MTV and GTE. EVP of the Firefly Network (acquired by Microsoft as the basis for the Microsoft Passport for MSN).
4. Let’s Get Started! “Panning for Gold”: A “Web” Prospector’s Guide to Mining and Filtering Community Data
12. 3. Extracting Social Media Blogging / Micro-Blogging UG and Product Reviews Support Desk CRM Emails Databases
17. Sample Data Sources Blogs, Polls, Surveys Call Center Data Data Discovery Utilizes Both Structured and Unstructured Data
18. Why is Discovery Technology Different? Take Data “As-Is” with no need to first clean source files No Pre-Defined Dictionary Required No Need To Know What is In Your Data Before Beginning Exploration No Need To Know Data Relationships – Technology Identifies Them for You Real Time Works on the “Long Tail” of Information Enabling Detection of Emerging Opportunities and Risks Solution Enables “Lenses” Which are Reusable Knowledge Digital Assets That Can be Utilized Throughout the Organization For Discovery in other Fields of Interest The Connection Engine Platform that Enables Real Time Data Exploration and Discovery
19. Sample Applications Regulatory Analysis Operational Analysis Strategic & Competitive Intelligence Market Research Call Center Operations Warranty Claims Product Development Litigation Support E-Discovery Patent Research Pricing and Packaging Claims Processing Surveys WEB/BLOG Forum Subscribed Content Customer/User Profiling Social Media Exploration Voice of Customer Fraud Related Discovery
Explain this in context of WHO WE ARE AND WHY SHOULD THEY LISTEN TO CP RE: DATA COLLABORATION. Go into Ben’s history with first bullet. Open Source Development – cover first Drupal site – Greenopolis, Twolia, Kabbalah, etc. SFA is sales force automation, such as salesforce.com. Web association web awards for Greenopolis and for Twolia.com.
Plenty of search automation tools for social monitoring, brand management, and automated search. They automate the search process for you – some are more flexible than others, but mostly they determine the search methodology. Sentiment data is provided, based on predefined keywords (happy, hate, like, wish, etc.). Expensive! Costing thousands of dollars for the same results your competition is getting. Trackur - Trackur is an online reputation & social media monitoring tool designed to assist you in tracking what is said about you on the internet. Trackur scans hundreds of millions of web pages–including news, blogs, video, images, and forums–and lets you know if it discovers anything that matches the keywords that interest you. (YOU HAVE TO KNOW THE KEYWORDS UP FRONT!) Radian6 - Radian6 provides organizations with the software platform to listen, measure and engage in conversations across the social web. Our social media monitoring and engagement software is used by public relations, marketing and customer service and support professionals to better understand and serve their customers. Radian6 tracks mentions across over 100 million social media sites and sources and returns the results for exploration, understanding and action. (YOU HAVE TO KNOW WHAT YOU’RE LOOKING FOR UP FRONT!) ScoutLabs - Scout Labs is a powerful web-based application that tracks social media and finds signals in the noise to help your team build better products and stronger customer relationships. (Again, you have to know what you’re looking for ahead of time.) Techrigy Alterian - Alterian SM2 is a social media monitoring and analysis solution designed for PR and Marketing professionals. Alterian SM2 helps you track conversations, review positive/negative sentiment for your brand, clients, competitors and partners across social media channels such as blogs, wikis, micro-blogs, social networks, video/photo sharing sites and real-time alerts.
Visible Technologies - truCAST is Visible Technologies' proprietary discovery, collection, processing, analysis and engagement architecture for social media content. The truCAST engine is the foundation of our product suite, empowering our clients to comprehensively listen, learn, engage, and protect their brands online. (Need 3 products to provide 360 degree view of predefined terms.) Even Hubspot promises intelligence but only offers a scraping of when your predefined keywords appear. Check out http://www.slideshare.net/StefanBetzold/social-media-monitoring-tools-an-overview for overview of all tools discussed on this page. According to Gartner: Leaders: Vendors falling in this space have software that benefits both the company and the consumer. "Leaders' software convinces users that they will get something valuable by participating in a conversation or community," the report states, "Leaders' offerings demonstrate support for multiple CRM processes, not just one domain, and have substantial revenue coming specifically from their social CRM offerings." Jive Software, Lithium Technologies. Challengers: None. Visionaries: These vendors are paying close attention to market trends, such as collaboration. "Their products and product road maps exhibit innovation, especially in architecture and lightweight integration," Gartner writes. Mzinga, Salesforce.com, KickApps. Niche Players: Some of these vendors or products may be narrow in scope, but Gartner writes that they "provide useful, focused technology." Gartner suggests, however, that growth is crucial for all of them, especially given the research firm's predicted rise in the need for differentiation in 2011. RightNow, Demand Media, Vovici, Bazaarvoice, Nielsen BuzzMetrics, LiveWorld, Hubbard One (a unit of Thomson Reuters), Radian6, Oracle CRM On Demand, Globalpark, Leverage Software, InsideView, Visible Technologies, Overtone.
Plenty of open source or free tools to automate search and provide alerts. They will tell you who your influencers are. Still, you need to know what you are searching for (keywords, company name, product names, etc.). Google and Technorati Blog searches. Twitter Search. Twingly - Follow and discuss any topic or event with Twingly. You know how tiring it is to go through 100's of links every day just to find the one or two that matters? Get started with Twingly in just 60 seconds or less and never spend time on irrelevant news again. Tweetgrid & Monitter – set up real time monitors of keywords. Social Mention – Set up pages to monitor for keywords across blogs, microblogging, social networks, and bookmarking sites. Filtrbox (acquired by Jive Software) - Filtrbox offers a content monitoring service that lets you define and “filtr” web content that is relevant to you or your business. Since Filtrbox is web-based, it can be integrated not only into web applications, but also into e-mail and mobile text messaging. (Close, but still lacks “DISCOVERY” tool set.) Rapleaf – Lets you know where your members are members – are they on Facebook, Twitter, Myspace? How many friends do they have, how often do they post – crosses into CRM field. Still need to decipher information. Blogpulse - An automated trend discovery system for blogs. It analyzes and reports on daily activity in the blogosphere. OpenCalais – Takes a different approach and analyzes the data you have to give you insight into keywords and semantic search terms so you can reuse the information elsewhere – like all the tools we’ve shown here….
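To make the "you have to know the keywords up front" point concrete, here is a minimal Python sketch of how a keyword-driven monitor with keyword-based sentiment might behave. It is an illustration only, not how any of the tools named above are actually implemented. The function name `monitor`, the `SENTIMENT_WORDS` weights, the "Acme" brand, and the sample posts are all hypothetical.

```python
import re

# Hypothetical sentiment dictionary; the weights are illustrative assumptions,
# not taken from any real monitoring product.
SENTIMENT_WORDS = {"happy": 1, "like": 1, "love": 1, "hate": -1, "wish": -1}


def monitor(posts, keywords):
    """Return (post, score) pairs for posts that mention a tracked keyword."""
    tracked = {k.lower() for k in keywords}
    hits = []
    for post in posts:
        # Split the post into lowercase word tokens.
        words = re.findall(r"[a-z0-9']+", post.lower())
        if tracked.intersection(words):
            # Score is the sum of the weights of any dictionary words present.
            score = sum(SENTIMENT_WORDS.get(w, 0) for w in words)
            hits.append((post, score))
    return hits


posts = [
    "I like my Acme blender",
    "I hate the Acme warranty process",
    "My blender lid cracked after a week",  # relevant, but never names the brand
]
hits = monitor(posts, ["acme"])
for post, score in hits:
    print(score, post)
```

The third post describes a product problem but contains none of the tracked keywords, so a monitor built this way never surfaces it. That is the gap the notes above call out.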
Traditional search methods LIMIT the information you can pull in. (Mine) Often requires multiple tools and manual exports, grouping, aggregation. The ability to pull in useful information – not the same old stuff everyone else is using. Need to be flexible with the information gathered – can you determine the fields to be gathered? Pull in more than you need and filter it for nuggets – most tools pull a small subset of information.
Using traditional methods of extracting often results in SILOS of information. Multiple unrelated databases. Many, many spreadsheets.
Raw Data needs to be aggregated, analyzed, filtered, compared, etc. Each time you merge and manually aggregate data, you remove your data from its context – the more times you do it – the more context is lost.
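The point about losing context on each merge can be sketched in Python. This is an illustrative assumption about how records might be shaped, not a description of any product in this talk. The field names (`source`, `topic`, `text`) and both helper functions are hypothetical.

```python
from collections import defaultdict


def aggregate_counts_only(records):
    """Typical spreadsheet-style roll-up: only a count per topic survives."""
    counts = defaultdict(int)
    for rec in records:
        counts[rec["topic"]] += 1
    return dict(counts)


def aggregate_with_context(records):
    """Group by topic but carry each record's source and original text along."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["topic"]].append({"source": rec["source"], "text": rec["text"]})
    return dict(grouped)


# Hypothetical records pulled from three different silos.
records = [
    {"source": "support desk", "topic": "warranty", "text": "Claim denied twice"},
    {"source": "blog", "topic": "warranty", "text": "Warranty form is confusing"},
    {"source": "survey", "topic": "pricing", "text": "Too expensive for what it does"},
]

counts = aggregate_counts_only(records)
with_context = aggregate_with_context(records)
print(counts)
print(with_context["warranty"][0]["source"])
```

The count-only roll-up can say that warranty came up twice, but not where or in what words. Keeping the source and text alongside the grouping is one simple way to avoid throwing that context away at merge time.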