<Gina> Welcome to today’s presentation, “Using Web Data to Drive Revenue and Reduce Costs.” My name is Gina Cerami, and I’m the Vice President of Marketing here at Connotate. This presentation explores how companies can strengthen their competitive advantage by leveraging publicly available Web sources. It will last approximately 30 minutes, followed by a live question and answer session. You may submit your questions anytime during the session using the Chat feature. Also, during the presentation, we will pose several survey questions, which you may answer using the Polling feature that will appear when we open the survey.
<Gina> We still have some folks logging in. So, while we’re waiting, I’d like to take a moment to provide a bit of background about Connotate. Our heritage is in leading-edge research conducted at Rutgers University, funded in part by DARPA. For over a decade, our focus has been to discover the most efficient ways to extract value from Web data. Connotate is an expert in this field. Since 2000, we have been helping global clients like the Associated Press, Thomson Reuters, Dow Jones and many others leverage Web data for strategic advantage. Today, we will share best practices that we’ve developed over the years. We hope this information will help you get more value out of any Web data project you may attempt, either now or in the future. Our presenters today are Keith Cooper, CEO of Connotate, and Chris Giarretta, Vice President of Sales Engineering at Connotate.
<Gina> To start off our discussion today, Keith Cooper will share use cases illustrating the variety of ways in which organizations are using Web data to make money and save money. These use cases span a wide spectrum of vertical industries including financial services, biochemicals, background screening, online advertising and the information industry. Chris Giarretta will then delve into some of the technical aspects of using Web data, including options for automating Web data collection processes, questions you should ask your vendor and best practices for tackling a Web data extraction project. Throughout the presentation, we will be conducting a series of polling questions to ask you how you are using Web data today and what you are planning to do in the future. Before I turn the presentation over to Keith, let me remind the audience that there will be live Q&A at the end of the presentation. If you would like to submit a question, please use the Chat feature provided by Webex. You can submit your questions at any time during the presentation. Now I will turn the presentation over to Keith Cooper.
<Keith> (Some suggested talking points – Keith, please modify as you see fit) – “90% of the data in the world today has been created in just the past two years.” Gina got the number from an IBM citation: http://www-01.ibm.com/software/info/rte/bdig/index-pre.html?S_TACT=101MY87W
<Keith> Here’s a quick overview to set the stage. We start with a Web site, on the left, and go through a process that we call Data Extraction. The data may initially be in HTML, PDF or images – not usable by a computer. Once we extract the data, we transform it into something usable such as XML or Excel files. At that point, it’s ready to be used by your applications to generate revenue or reduce costs, as you’ll see in the next few examples. We’ll come back and revisit this process in more detail after we talk about some real world case studies.
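As an illustration of the extract-and-transform step described above, here is a minimal sketch in Python using only the standard library. It assumes the data of interest sits in a simple HTML table; the sample page, the `TableExtractor` class and the `html_table_to_csv` function are hypothetical names made up for this example and do not represent Connotate's product or method.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Turn the table rows found in `html` into CSV text that Excel can open."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical page fragment standing in for a real Web site.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Camera A</td><td>199.00</td></tr>
  <tr><td>Camera B</td><td>249.50</td></tr>
</table>
"""

csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

Real pages are rarely this tidy, which is why resilience to layout changes comes up later in the presentation; the sketch only shows the shape of the HTML-to-structured-data step.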
<Keith> There are many use cases…today we’ll touch on just a few of them.
<Keith> The first cases will focus on companies who are experiencing revenue growth derived from products and services that incorporate Web data.
<Keith> Generating sales leads is a big business. The higher the quality of the lead, the more it is worth. HG Data is building a business around this concept. Let’s say a plain vanilla B2B sales lead is worth $1 – just contact information for Joe (name, email, title). Now, append distinct attributes to that lead such as: --Joe’s years of job experience managing Oracle deployments and his background in retail --Joe’s connection to a CRM reseller who publishes a customer win announcement mentioning Joe’s employer --Joe’s employer posting job openings in his department. From this, one can deduce that Joe may be heading up a long-term CRM project and is a likely candidate for software or services that support Oracle-based CRM projects. The value of that lead increases 5- or 10-fold by appending sales intelligence attributes to the lead. That’s what HG Data does, but at a huge scale. The result is a database of millions of profiles of enterprise technology usage, a highly granular resource for technology vendors.
<Keith> To produce this high-value directory of sales leads, HG Data leverages millions of sources – including corporate postings, press releases, articles etc. They are using publicly available Web data, along with licensed content. By applying artificial intelligence to map the connections and by continually refreshing the model with updated Web data, HG Data creates a dynamic information service that it can sell to other businesses who find it extremely valuable.
<Keith> This next example is for the investors in the audience. Everyone is trying to outsmart the market. The trick is to do this with publicly available data. Smart investors are doing this today. Organizations and government agencies continually publish data on the Web that provide enough information to accurately predict company or sector performance well in advance of published numbers. However, it takes a lot of time to manually grab that data. An automated process makes it feasible.
<Keith> Let’s say you want to track camera sales. You can capture pricing data on a daily or weekly basis; depending on your analysis model you may want to do a full sweep of inventory or just samples.
<Keith> …using automation, the specific data you wanted from the Web page is now in an Excel file that can be charted and trended over time. With this, you can: --Build unique time-series data sets for predictive analysis --Feed this into your proprietary modeling and analysis tools --Outsmart the market!
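To make the idea of a time-series data set concrete, here is a small Python sketch. It assumes the daily camera prices have already been captured; the dates, prices, function names and anomaly threshold are all hypothetical values chosen for illustration.

```python
from datetime import date


def build_series(observations):
    """Sort (day, price) observations and pair each with its change from the previous one."""
    series = []
    previous = None
    for day, price in sorted(observations):
        change = None if previous is None else round(price - previous, 2)
        series.append((day, price, change))
        previous = price
    return series


def flag_anomalies(series, threshold):
    """Return the days on which the price moved by at least `threshold`."""
    return [
        day
        for day, _price, change in series
        if change is not None and abs(change) >= threshold
    ]


# Hypothetical daily prices for one camera model.
observations = [
    (date(2013, 3, 3), 179.00),
    (date(2013, 3, 1), 199.00),
    (date(2013, 3, 4), 181.50),
    (date(2013, 3, 2), 199.00),
]

series = build_series(observations)
anomalies = flag_anomalies(series, threshold=10.0)
```

A series like this is the kind of input that could be charted or fed into a modeling tool; which indicators actually predict performance depends on your own analysis model.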
<Keith> This next example focuses on a financial/business information provider. This company has rolled out several highly successful information services that leverage Web data – including one that provides accurate background data on parties to a financial transaction. The company sought to enhance this service with real-time data that would make it much more valuable to its subscribers and command a premium price. They were able to leverage automation to roll out a high-end risk-assessment service that continually monitors data sources to provide up-to-date information on a company’s assigned risk status. This is important not only to help banks and other parties put a dollar value on the deal and weigh the risks involved but also to avoid compliance penalties for doing business with “bad guys.” There was a point in product development where the team reached a critical “build versus buy” decision. They were more than halfway to getting the product out the door when the process got bogged down. They turned to Connotate and we were able to help them get the product to market faster – ahead of the competition.
<Keith> Now I’ll explain the process of collecting and transforming the data. There are thousands of regulatory bodies posting judgments, actions and notices about companies and persons with whom you might be conducting business….These sites are updated daily. On your screen are three ….the Dubai Financial Services Authority, the Netherlands Authority for Financial Markets (AFM) and the U.S. Federal Trade Commission (the FTC). You’ll notice that the AFM site is password-protected. It is easy for an automated solution to input login and passwords to extract data, as long as you are already a legitimate subscriber to the service. On the next slide, you’ll see what happens when we pull out the data and structure it in a usable form.
<Keith> The data is now in XML format. This particular file provides information about a person on the Board of Governors of a Bank; the data extracted shows that he is a Politically Exposed Person (PEP) which impacts the bank’s risk status; this is a very valuable piece of information. The level of risk can change daily - hence the high value of automating access to websites that keep tabs on this type of information.
<Keith> Now let’s look at a completely different use case – Price Optimization. Sigma-Aldrich is a top tier life science company specializing in biochemicals, with over 7,600 employees and operations in 40 countries. In the world of chemical and biochemical manufacturing, margins are key. With all things being equal, the decision of where to buy can be motivated by just a few dollars and cents. Sigma recognizes that their customers are becoming more and more educated, and they wanted to track pricing for over 150,000 products across 40+ competitor sites. They were using some automated tools along with some manual processes to collect pricing data but these processes were introducing errors and yielding inaccurate results. Automation helped them not only reduce labor costs but also optimize their prices faster and more accurately to improve margins.
<Keith> Let’s take a step back and look at what price optimization might mean for your business. Results will vary widely across different industry segments. Here is one fact to consider. According to a July 2012 Gartner report, “companies that have implemented price optimization successfully have realized improvements of two to four percent of total revenue or more.” So let’s put that into perspective: For a Fortune 500 company, this translates to an uptick of anywhere from $8.75 million to $16 billion – based on a 2% increase in annual revenue for the smallest company on the Fortune 500 list up to 4% of revenues for Walmart, which is typically at the top.
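The two-to-four-percent range is easy to apply to your own numbers. The Python sketch below shows the arithmetic; the $1 billion revenue figure is a hypothetical input chosen for illustration, not a figure from the Gartner report or the Fortune 500 list.

```python
from decimal import Decimal


def uplift_range(annual_revenue, low_pct=Decimal("2"), high_pct=Decimal("4")):
    """Return the (low, high) revenue improvement for the given percentage range."""
    revenue = Decimal(annual_revenue)
    return revenue * low_pct / 100, revenue * high_pct / 100


# Hypothetical company with $1 billion in annual revenue.
low, high = uplift_range(1_000_000_000)
print(f"Estimated uplift: ${low:,.0f} to ${high:,.0f}")
```

`Decimal` is used rather than floating point so that currency amounts stay exact.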
<Keith> Now, let’s turn our attention to cost reduction.
<Keith> Tandem Select is a mid-size credit reporting agency (CRA) performing criminal records-checking as part of its background screening services. Previously, the company relied on manual processes to obtain criminal records from various court houses. By using automation to collect criminal records from websites across hundreds of jurisdictions, the company was able to reduce background check time from hours to minutes and offer its clients guaranteed turnaround time – all without increasing staff. The volume of business shot up 62% while expenses went down.
<Keith> Here is how the solution works. When a background check is requested, the company’s internal software application uses a Web services request to kick off the process, so this is a request-driven action. The Web data extraction piece (the “Agent”) goes to the appropriate court sites and looks for criminal records, traffic violations, etc., returning the data in a spreadsheet-type format. The entire round trip to the Web sites and back takes between 6 and 20 seconds to complete…a dramatic reduction in turnaround time.
<Keith> Our last case study is quite compelling: interactive advertising. Online advertising is a huge industry – Pricewaterhouse estimated its market size at $17B in revenue in just the first half of 2012. In the world of online advertising, billing is complex, with multiple layers of providers, middlemen and services, each taking its cut of the pie when an ad is served. Usage statistics are published as Web data across hundreds of different ad network portals, so revenue collection can be a nightmare for Supply-Side Platforms, which aggregate click and impression data for advertisers. Data collection processes were error-prone, and revenue collection lagged due to the need for extensive error-checking and correction. Now, automation supports accurate data collection from hundreds of password-protected sites throughout the day. In addition, Supply-Side Platform providers can continually display aggregated dynamic ad exchange data, letting advertisers see real-time, side-by-side comparisons of online ad traffic. They can instantaneously optimize ad placement.
<Keith> The automated solution is simple: the input parameters are “logins” and “passwords.” There are also filters, which are easily configured to precisely capture the statistics needed, such as the date/time of the ad campaign, the number of impressions, and revenue generated. All of this unstructured data is transformed into an Excel spreadsheet for fast, accurate billing. The basics of this solution are applicable in any situation where usage data is retrievable from multiple Web portals. Using automation to streamline the process of capturing and structuring Web data not only saves time and money, it also reduces errors to ensure timely, accurate reporting and revenue collection.
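The filtering and aggregation step can be sketched in a few lines of Python. This assumes the usage statistics have already been collected from the portals; the network names, field names and figures below are hypothetical, and the login step is deliberately left out.

```python
from datetime import date

# Hypothetical rows standing in for statistics already extracted from ad network portals.
RECORDS = [
    {"network": "NetworkA", "day": date(2013, 3, 1), "impressions": 1000, "revenue_cents": 1250},
    {"network": "NetworkA", "day": date(2013, 3, 2), "impressions": 1500, "revenue_cents": 1800},
    {"network": "NetworkB", "day": date(2013, 3, 1), "impressions": 700, "revenue_cents": 900},
    {"network": "NetworkB", "day": date(2013, 2, 27), "impressions": 400, "revenue_cents": 500},
]


def summarize(records, start, end):
    """Total impressions and revenue per network for days within [start, end]."""
    totals = {}
    for rec in records:
        if not (start <= rec["day"] <= end):
            continue  # the date filter: skip rows outside the campaign window
        entry = totals.setdefault(rec["network"], {"impressions": 0, "revenue_cents": 0})
        entry["impressions"] += rec["impressions"]
        entry["revenue_cents"] += rec["revenue_cents"]
    return totals


march_totals = summarize(RECORDS, date(2013, 3, 1), date(2013, 3, 31))
```

Revenue is kept in integer cents here so that the billing totals add up exactly.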
<Keith> In a number of these use cases, I mentioned the use of automation. So let’s take a look at exactly what that means, when it comes into play and how it affects your use of Web data. At this point, I’m going to turn the discussion over to Chris Giarretta who will take a look at automation and other aspects of the technology behind these success stories.
<Chris> The first thing to consider is, “what kind of data is on the Web” and what do I want to retrieve? The universe of data is quite large…most of our customers focus on text data.
<Chris> …we can also extract images from the page or give a snapshot of the page in HTML or PDF format.
<Chris> Let’s take a closer look at the pros and cons of various approaches. Some applications lend themselves to manual approaches and crowdsourcing, but there is always the risk of introducing human error. A bigger concern is the fragile nature of the Web. The Web changes all the time. Many projects require continual monitoring for changes, and change detection with highlighting to support workflow productivity. A robust automated solution such as Connotate’s will provide that. More importantly, will the solution “break” when the HTML on the page changes? Connotate’s patented visual abstraction solution is designed to be resilient to certain changes; if the page is completely changed, customers using our hosted solution don’t have to worry – we get the automation up and running quickly, which isn’t the case for scrapers and data providers. Without a true monitoring service and dynamic platform, a single-pull system or fragile system delivers only a fragment of the value and doesn’t allow for the time-series analytics that organizations need today.
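Change detection with highlighting can be illustrated with a simple line-by-line comparison of two snapshots. The Python sketch below uses the standard library's `difflib`; it is a generic technique shown for illustration, not Connotate's patented approach, and the snapshot text is hypothetical.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return only the lines removed ("- ") or added ("+ ") between two snapshots."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Hypothetical snapshots of the same page taken on two different days.
yesterday = "Camera A 199.00\nCamera B 249.50"
today = "Camera A 189.00\nCamera B 249.50"

changes = changed_lines(yesterday, today)
for line in changes:
    print(line)
```

Delivering just these changed lines, rather than the whole page, is what lets a reviewer focus on what is new.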
<Chris> All of the case studies that Keith mentioned achieved results by following a fully automated approach. Scenarios that warrant an automated solution include situations where a lot of internal and external data needs to be aggregated and/or you need to monitor a variety of sources. If you are dealing with high volumes of data, or Web sites that change frequently, it quickly gets very expensive to have your staff continually check sites and look for changes. Automation is also required when you need frequent updates, such as news aggregation or price optimization in retail. At Connotate, we hear a lot of different data needs from all different kinds of companies and we understand that an automated solution is not always the answer. For example, when we see a company that needs to do a lot of complex product matching, let’s say for apparel, we may recommend crowdsourcing as a viable approach. Or, if you have a small amount of data that you need only a few times a year, you may not need a scalable, automated approach. Now I’m going to turn it over to Gina, who will invite you to answer a polling question.
<Gina> Let’s take a brief pause to ask our audience about their experiences collecting data from the Web. Is it an automated process? Are you doing it manually? Or, are you not collecting Web data at all? Also, I’d like to remind the audience that there will be a live Q&A session at the end of the presentation…but you can submit your questions anytime using the Chat feature on your screen. Now I’m going to hand the presentation back to Chris Giarretta so he can give you some tips on evaluating Web data extraction solution providers.
<Chris> If you are thinking about a Web data extraction project, I’d like to share some best practices we’ve learned over the years to help you get started.
<Chris> At this stage, after examining your options, if you still need to narrow them down, it may be possible to apply automation to leverage Google and other search engines to refine the scope of your project. Once you have the list of URLs, we can help you identify the sites that are easy to access versus those that aren’t. (Chris, can you give some examples?) Next, you need to think about scoping the project. How many sites? How often do you need to monitor and/or collect data? It’s important to be flexible here and to work with someone who will take the time to understand your needs and adjust the scope/direction of the project, if needed, to deliver you the most value. Finally, you’ll want to look at the long term and consider the maintenance costs of your project, and how to minimize them. Deploying software on-site gives you the most control, but you’re carrying the ball when it comes to maintaining the solution and expanding scope quickly if need be. A hosted deployment eliminates those headaches and can be more cost-effective in the long run.
<Chris> Let’s take a look at questions to consider before you choose a solution.
<Chris> Here are five useful questions to ask when evaluating Web data extraction solutions.
<Gina> Let’s take a moment to ask our audience to answer a question about the value of automation. Based on your experience and based on what you’ve heard here today, do you believe that automating the collection process could add value to your business? Also, I’d like to remind the audience that they can use the Chat feature on their screen now to submit questions to the presenters for our Q&A session at the end.
<Gina> We covered a number of use cases for collecting Web data - there are many other examples as well. You may be thinking of other strategic initiatives in your own organization. If so, we hope that you have found today’s presentation helpful in discovering some of the aspects you need to consider as you decide the next steps in your project. At this point, we will be posting a final polling question. Please take a moment to respond before you leave the webinar.
<Gina> Now, for your questions. Several of you have asked about obtaining a copy of today’s presentation. We will send you a link to the archived presentation within 2 business days. We also invite you to answer our last poll, which appears on the right of your screen.
Thank you for attending today’s Webinar. Please visit our Web site for information about our products, services, and future Webinars. This concludes our presentation.
Using Web Data to Drive Revenue and Reduce Costs
Presenters: Keith Cooper, CEO, Connotate
Christian Giarretta, VP of Sales Engineering, Connotate
Moderator: Gina Cerami, VP of Marketing, Connotate
Date: March 12, 2013
Chief Executive Officer
VP of Sales Engineering
• Why Web Data?
• Drive revenue
• Reduce costs and streamline processes
• Automation Options
• Scoping Your Project
• Five steps to success
• Evaluating Providers
• Five questions to ask
The Web Provides the Largest Source of Data Ever
Financial sites Corporate sites
Regulatory sites Retail sites
State and local court sites
…and the Data Continues to Grow and Change
at Unprecedented Rates
• 1.2 zettabytes of new digital content created in 2011* (zettabyte = 1B
• The Internet will double in size every 5.32 years **
* IDC’s The Digital Universe
In Order to Use All of That Information…You
Need to Find It, Filter It and Format It…
…Then You Can Turn Web Data into Profits
Sample Use Cases:
Online ad usage reports
Business risk assessment
Aggregate construction bids
Supply chain monitoring
Voice of the Customer
Social media monitoring
Deliver High-Value Directories: HG Data
• Build the largest, most accurate database of B2B tech customer intelligence
• Combine public and private content in unique ways to reveal new insights
• Solution: Use automation to cost-effectively extract
business intelligence from millions of Web documents
• 10,000+ agents built to date
• Highly granular database of 1M + profiles of enterprise technology users
• Business Benefit
• Successful go-to-market: disruptive technology replaces manual process
• Extracting new value in business area long hobbled by stale sources
Research In Motion
Supplier to Customer Value Added
Law Firm of William Koy LLP
Reveals Customer Relationships Between
Business Entities: HG Data
Gain Transparency Mid-Quarter to Better
Predict Company Performance: Financial Firm
• Gain daily/weekly/monthly visibility into inventory/sales of companies and
market segments where data is made public only on a quarterly basis
• Solution: Continually monitor available inventory and other
data posted on websites in those markets
• Use automation to capture precise indicators on an ongoing basis
• Analyze trends and make predictions
• Business Benefit
• Transparency supports more accurate predictions of financial results to
support smarter investment decisions
Gain Transparency Mid-Quarter: Web Page
•Check camera prices daily
•Full-sweep of camera
•Map trends, spot anomalies
•Compare one or two
targeted suppliers to overall
Enhance Risk Assessment:
Business Information Provider
• Deliver updates/alerts on changes in assigned risk status of counterparties to
a financial transaction instead of just producing a static report
• Solution: Use automation to monitor websites for updates
• Monitors sites for changes that affect a business entity’s assigned risk status
– mergers, acquisitions, bankruptcies, de-listings, regulatory changes,
• Business Benefit:
• First-to-market with a risk assessment service offering continual monitoring
• Fresh Web data is integrated into customer (financial institution) workflow –
enhancing customer “stickiness”
• Automated Web data extraction solution delivered a 6-month payback
Price Optimization: Sigma-Aldrich
• Optimize product positioning in B2B market where buying decisions can be
motivated by a few dollars or cents
• Competitors’ prices are changing constantly
• Solution: Replace manual spot checking of prices with
precise automated Web data extraction
• Continually extracts sizing/pricing on more than 150,000 products
• Acquired usable data for historical trend analysis
• Business Benefit
• Optimizes prices to improve profit margins
• Reduced manpower devoted to data collection by 50%
Price Optimization Pays Off
$8.75M - $16B for Fortune 500 Company
realize 2 to 4%
Automated Records Check Improves Speed
and Accuracy: Tandem Select
• Criminal records are highly structured; accuracy and reliability are key for
people making hiring decisions
• Deliver guaranteed turnaround time on accurate checks without adding staff
• Solution: Replace manual processes to extract records
directly from court websites on demand
• Business Benefit
• Average background check time reduced from hours/days to minutes
• Much better quality - far fewer errors – guaranteed turnaround time
• In 12 months, order fulfillment increased 62% while operating expenses
Automated Records Check (FetchCheck):
Standard customer order
at Tandem site
Tandem’s application calls
with a Web services
Agent extracts, transforms
and normalizes data
Information is returned to
Process takes between 6
and 20 seconds to
Improve Revenue Collection Processes with
Accurate Reporting: Interactive Advertising
• Billing reconciliation was taking weeks/months (14 people overseeing daily
data collection, 5 days/week)
• Usage data posted on multiple password-protected Web sites (portals)
• Solution: Automated Web data collection accesses portals
for highly-accurate reporting and billing
• Reported data is 100% error-free; data is collected 365 days/year
• Business Benefit
• Quality data supports timely, accurate billing (reconciliation in days)
• Aggregated views enable ad placement optimization increasing customer ad
revenue 30 – 300%
Web Page is Transformed into Usable Data
1. Navigates the portal 2. Precisely captures statistics
3. Turns data
A Closer Look at Different Approaches
Manual offshore No economies of scale; human error compromises quality.
A viable approach for complex tasks like product matching
of apparel for one-shot projects; may be less reliable for
ongoing monitoring and long-term projects.
In-house or low-cost
Not resilient; scrapers break when Web page HTML
changes, expensive programmers must fix scripts, increasing
total cost of ownership (TCO)
High degree of control; better resiliency to change – reduced
TCO; however, project complexity and future needs may
indicate a hosted solution is better
Robust solution hosted
Highest resiliency; no maintenance burden – reduced TCO;
24/7 follow-the-sun support; infinitely scalable and no capital
expenditures for hardware or IT resources.
Manual versus Automated Approaches
Your Data Needs | To Automate or Not?
High-volume data monitoring | Automate
Variety of sources | Automate
Frequent updates and/or monitoring | Automate
Need for data post-processing | Automate
Small amount of data required just a few times a year from very simple sites | A manual approach may be
One-time feed of very specific data | Purchase data from 3rd
Product matching applications where unique identifiers are not available | May want to consider
Polling Question: Web Data Collection
Are you currently collecting data from the Web?
Yes – we are doing this using an automated process
Yes – we are collecting Web data using a manual process
Yes – we are using BOTH manual and automated approaches
No – we are not collecting Web data
Scoping Your Project: 5 Steps to Success
1. Clarify what you want to do with the
2. Look at what’s happening manually
today – find out how users are
accessing the Web – these are
targets for automation
3. Identify the sources you need
4. Narrow your scope….you may not
5. Anticipate future requirements
Evaluating Providers: 5 Questions to Ask
• Can it scale up easily and quickly?
Look for a proven ability to handle high-volume, high-frequency applications without draining
your IT resources
• Is it resilient ?
Can it withstand website formatting changes – or will it “break,” requiring code fixes?
• How does it detect / deliver updates?
You’ll save time and money with change detection with highlighting – the ability to detect and
deliver “just the changes”
• Does it support my operational workflows?
Built-in job scheduling, resource shared access and other features can increase efficiencies
and coordinate workflow
• What are the deployment options?
Flexible options for on-premise and hosted solutions should adapt to your needs
Polling Question: The Value of Automated
Web Data Collection
Do you believe automating Web data monitoring and
extraction could add value to your business?
Yes – we are doing this now
Yes – we are planning a project in the near future
No – not at this time
I need more information before deciding
Here’s What Success Looks Like…
Create new and
… Connotate’s experts are ready to take you
Q & A
Connotate will email a link to this presentation as well as a
copy of the slides to you within 2 business days.
If you have an immediate need and would like us to contact
you about a forthcoming project, please check the appropriate
box in the last polling question or call (+1) 732-296-8844.
For more information, you may also visit www.connotate.com
For more information, visit
www.connotate.com or www.connotate.co.uk