We've known for years that data-driven content was a 'thing': when we produced simple infographics sharing a few statistics, they'd get easy traction for us online. The game has since lifted. Consumers are becoming more and more obsessed with data, and now demand higher-quality, more complex data-driven content. The challenge for us as "T-Shaped" marketers is that producing this content demands ever more new skills, and we don't have the time to learn them amongst the other things we need to be expert at.
This presentation gives you specific help on how to produce data-driven content without any programming skill. After watching it, you'll have the confidence to build your own data-driven content, with knowledge of:
- blueprints for data-driven content ideas
- scraping tools, frameworks and methodologies
- how to brief in a data scraping project to your in-house team or a freelancer
- how to turn your data into visually appealing content
- channels for promoting data-driven content to ensure it gets traction
Jeremy Cabral - Search Marketing Summit: Scraping Data-Driven Content
3. Insights from Analyzing 1 Million Articles
"Original research based content has the potential to achieve much higher numbers of domain links than other forms of content" - Steve Rayson (Director, BuzzSumo)
BuzzSumo Study
8. But… what if the data you need isn't available by API or downloadable?
9. Disclaimer
Seek legal advice before committing to a scraping project.
Scraping data could breach the terms of service of a website.
Scraping at a disruptive rate could slow down or even crash a website.
10. What is data scraping?
Data scraping is an automated way, using scripts and crawlers, to:
1. Fetch a page
2. Parse the data in that page to extract information
3. Format the data in an organised way
4. Store or export that data to create a dataset (DB, CSV, TXT etc.)
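Those four steps can be sketched in a few lines of Python. This is a minimal, self-contained illustration only: the page is an inline string (in a real project step 1 would fetch it with urllib.request or the requests library), parsing uses the standard library's ElementTree, and the element names and classes are invented for the example.

```python
# Minimal fetch -> parse -> format -> store sketch (hypothetical markup).
import csv
import io
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<ul>
<li class="listing"><span class="title">Widget A</span><span class="price">19.99</span></li>
<li class="listing"><span class="title">Widget B</span><span class="price">24.50</span></li>
</ul>
</body></html>"""

def scrape(page_html):
    root = ET.fromstring(page_html)               # 2. parse the page
    rows = []
    for li in root.iter("li"):                    # find each listing element
        if li.get("class") != "listing":
            continue
        title = li.find("span[@class='title']").text
        price = li.find("span[@class='price']").text
        rows.append({"title": title, "price": float(price)})  # 3. format
    return rows

def to_csv(rows):                                 # 4. store/export as CSV
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = scrape(PAGE)                               # 1. would be a fetched page
```

The same structure works whatever tool runs it: fetch, parse, format, then export somewhere a spreadsheet can open.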
11. Patterns in HTML & CSS
It's easier to scrape content broken up by a unique id or class assigned to the element you want to extract.
12. Basic overview of XPath
XPath can be used to navigate through elements and attributes in a document.
Important to understand how tags are nested, as a scraper will follow this tree.
Learn more: https://www.slideshare.net/scrapinghub/xpath-for-web-scraping
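As a rough illustration of following that tree, Python's built-in ElementTree supports a limited XPath subset (production scrapers typically use lxml or parsel for full XPath). The nested document below is invented for the example.

```python
# Hypothetical nested document, to show how an XPath-style query walks
# the nesting of tags.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<div id='results'>"
    "<article><h2>First post</h2><p class='meta'>10 upvotes</p></article>"
    "<article><h2>Second post</h2><p class='meta'>3 upvotes</p></article>"
    "</div>"
)

# './/article/h2' reads: any <article> below this node, then its <h2> child.
titles = [h2.text for h2 in doc.findall(".//article/h2")]
# Predicates like [@class='meta'] filter on an attribute value.
meta = [p.text for p in doc.findall(".//article/p[@class='meta']")]
```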
13. Finding an API
Learn more: http://www.gregreda.com/2015/02/15/web-scraping-finding-the-api/
14. Important Excel analysis skills
1. Match the same data across multiple spreadsheets:
a. VLOOKUP
b. INDEX MATCH
2. Summarising data:
a. Pivot Tables
b. Charts
3. Cleaning data:
a. =TRIM()
b. =SPLIT()
Learn more:
● https://www.distilled.net/excel-for-seo/
● https://trumpexcel.com/clean-data-in-excel/
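For anyone who later graduates from spreadsheets to scripts, the first two of those skills map onto plain Python directly; the "sheet" contents below are invented for the example.

```python
# Spreadsheet skills in script form: a keyed lookup stands in for
# VLOOKUP / INDEX MATCH, and a per-group total is the simplest pivot table.
from collections import defaultdict

# Two "sheets": URLs with a channel, and social-share counts per URL.
urls = [
    {"url": "/a", "channel": "organic"},
    {"url": "/b", "channel": "social"},
    {"url": "/c", "channel": "organic"},
]
shares = [{"url": "/a", "shares": 120}, {"url": "/c", "shares": 40}]

# VLOOKUP / INDEX MATCH: key a lookup on the column the sheets share.
share_lookup = {row["url"]: row["shares"] for row in shares}
for row in urls:
    row["shares"] = share_lookup.get(row["url"], 0)   # 0 where no match

# Pivot table: total shares per channel.
pivot = defaultdict(int)
for row in urls:
    pivot[row["channel"]] += row["shares"]
```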
23. Data scraping tools
Desktop tools (scripts run on your local machine):
● Scrapesimilar
● artoo.js
● Tabula - extract tables from PDFs
● Parsehub (free & paid versions)
● Screaming Frog
● URL Profiler
Hosted services:
● Google Sheets (ImportXML, ImportJSON, ImportHTML)
● Import.io - automatic page scraper
● Mozenda - point and click screen scraping (Windows only)
● DIFFBot (Artificial Intelligence)
● Connotate
24. Scraping with Google Sheets
Google Sheets formulas (built in):
=importXML(url, xpath_query) -- imports structured data using XPath
=importHTML(url, query, index) -- imports data from a table or list within an HTML page; index identifies which table in the source code
Learn more: https://www.distilled.net/blog/distilled/guide-to-google-docs-importxml/
=ImportJSON(url, query, parseOptions) -- a community Apps Script (not built in) that imports JSON feeds into Google Sheets
Sample feed from http://date.jsontest.com/:
{
  "time": "11:35:24 AM",
  "milliseconds_since_epoch": 1493552124786,
  "date": "02-14-2014"
}
Learn more: http://blog.fastfedora.com/projects/import-json
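Outside Sheets, that kind of feed is a one-liner to parse in Python; the payload below mirrors the sample response from date.jsontest.com.

```python
# Parse the JSON feed; a real project would first fetch the payload with
# urllib.request.urlopen("http://date.jsontest.com/").read().
import json

payload = (
    '{"time": "11:35:24 AM",'
    ' "milliseconds_since_epoch": 1493552124786,'
    ' "date": "02-14-2014"}'
)
data = json.loads(payload)  # becomes a plain Python dict
```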
27. Predictive model for real estate value
Realtor.com scraped with import.io => cleaned with Pandas => model built by BigML
Learn more: http://www.louisdorard.com/guest/everyone-can-do-data-science
29. Diffbot.com
4 main APIs that use artificial intelligence for data extraction:
1. Article: clean text from article, HTML, author, date info, related images, videos
2. Discussion: content of forum threads, article comments, product reviews
3. Product: pricing information, product IDs, images, product specs
4. Video: author/uploader, duration, title, description, date uploaded, stats
33. Briefing a freelancer
Inputs:
1. Project goal
2. List of URLs:
a. Provide it yourself, or
b. Provide an endpoint and a pattern of URLs that you'd like captured
3. Specific inputs into any filters/data-input fields which may be required to capture all the data combinations:
a. Form values (numbers, sliders, etc.)
b. Login details
4. Technical requirements:
a. Location of IP when scraping
b. Frequency of scrape
c. Scraping language
Outputs:
1. Where the data will be stored:
a. Local file (CSV, TXT)
b. Database (SQLite)
c. Stored on a web server
2. Provide an example spreadsheet showing how you would like the data presented
3. Specify any data manipulation needed to have clean output from the scrape
4. Specify how the data will be used:
a. HTML table, or
b. Single-page application (React/Angular JS) embedded with oEmbed
34. Avoid getting blocked
● Spoof header as Googlebot
● Run scrape from multiple IP addresses
● Run the scrape slowly
● Be careful scraping behind a login
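Those precautions can be sketched as follows. The User-Agent string, proxy addresses, and helper names are hypothetical placeholders, and nothing is fetched unless a real opener is passed in.

```python
# Polite-scraping sketch: custom User-Agent, a pool of (placeholder) proxy
# IPs, and a delay between requests.
import itertools
import time
import urllib.request

# Rotating through several proxy IPs spreads requests across addresses; a
# real setup would install each one with urllib.request.ProxyHandler.
PROXIES = itertools.cycle(["203.0.113.10:8080", "203.0.113.11:8080"])

def build_request(url, user_agent="Mozilla/5.0 (compatible; ExampleBot/1.0)"):
    # The deck suggests spoofing the Googlebot header; a generic custom UA
    # is used here as the placeholder default.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch_all(urls, delay_seconds=2.0, opener=urllib.request.urlopen):
    """Fetch each URL with a pause in between ('run the scrape slowly')."""
    pages = []
    for url in urls:
        pages.append(opener(build_request(url)))  # swap in a stub for testing
        time.sleep(delay_seconds)
    return pages
```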
39. Charts
Easiest way to visualise data; hardest to make look sexy with Excel & Google Sheets.
Source: https://www.labnol.org/software/find-right-chart-type-for-your-data/6523/
47. Provide a new dimension on a dataset
How? IMDB + the idea that people want their fav TV shows to come back on air
Results: 335,830 views; 142 linking root domains
48. Recognise patterns and service them
How? Combined results from NBN map search + real estate listings
Results: 1,000+ new users within 72 hours
49. Display data in an accessible format
How? Allflicks.net combining IMDB with the Netflix library plus filters
Results: 1.13k linking root domains
● Filterable
● Sortable
● Categorised
● Indexable!
51. Big data analysis 'taster'
How? Scraped Google to analyse rich snippets + blog post with a 'taste' of the data
Results: 128 linking root domains + lead source
52. Want more ideas?
1. Scrape an online community (reddit, growthhackers.com, inbound.org, Hacker News) to get a list of URLs and their:
a. Post titles
b. # of upvotes
c. # of comments
d. Date posted
2. Mash together the data with social shares and link data using URL Profiler
3. Analyse the data using pivot tables in Excel or Google Sheets
Learn how: https://blog.parsehub.com/boost-your-content-marketing-with-web-scraping-and-pivot-tables/
55. Good ol’
fashioned
reachout
Find websites with audiences that
will be interested in your data
Give journalists and bloggers a
unique angle and potentially a
different dimension on the
dataset so they can write their
own unique story
Make contact - don’t be afraid to
use the phone or go for a
coffee
List of content distribution websites: bit.ly/content-distribution-list
57. We are always hiring!
finder.com.au/careers
jeremy@finder.com
Editor's Notes
Originally part of Y-Combinator
Blogs
Airbnb rates
Combined data from Pitchfork music review websites + Facebook likes for each review.
Pitchfork: music reviews for independent music
Facebook likes: artists with the fewest likes were the most hipster, because the second criterion was that it should be a band you've never heard of
Understanding this will help you understand page structures and what’s possible
Data.gov.au
Kaggle, the data science community recently bought by Google, publishes a lot of their datasets
Talk about they aren’t niche
Second opinion
https://blog.hartleybrody.com/web-scraping/
Talk about other JSON examples
30 seconds to produce this
Point and click
Can run on a regular basis
easy-to-use tool for intermediate to advanced users who are comfortable with XPath.
More advanced than ImportXML. Allows you to capture more information than what is possible in Google Sheets