Log File Analysis: The most powerful tool in your SEO toolkitTom Bennet
Slide deck from Tom Bennet's presentation at Brighton SEO, September 2014. Accompanying guide can be found here: http://builtvisible.com/log-file-analysis/
Image Credits:
https://www.flickr.com/photos/nullvalue/4188517246
https://www.flickr.com/photos/small_realm/11189803763/
https://www.flickr.com/photos/florianric/7263382550
http://fotojenix.wordpress.com/2011/07/08/weekly-photo-challenge-old-fashioned/
My presentation from Optimise Oxford in November 2016.
In it I discuss why you should be making use of server logs, and how to go about utilising them.
Automated Reports with Rstudio Server
Automated KPI reporting with Shiny Server
Process Validation Documentation with Jupyter Notebook
Automated Machine Learning with Dataiku
The webinar will present the SemaGrow demonstrator “Web Crawler + AgroTagger”, in order to collect feedback, ideas and comments about the status of the development and how the demonstrator helps to overcome data problems.
SemaGrow is a project funded by the Seventh Framework Programme (FP7) of the European Commission, aiming at developing algorithms, infrastructures and methodologies to cope with large data volumes and real time performance.
In this context, FAO is providing a component than can be used to crawl the Web, giving a meaning to discovered resources by using the AgroTagger, which can assign some AGROVOC URIs to resources gathered by a Web crawler.
The demonstrator is publicly available at https://github.com/agrisfao/agrotagger.
Log File Analysis: The most powerful tool in your SEO toolkitTom Bennet
Slide deck from Tom Bennet's presentation at Brighton SEO, September 2014. Accompanying guide can be found here: http://builtvisible.com/log-file-analysis/
Image Credits:
https://www.flickr.com/photos/nullvalue/4188517246
https://www.flickr.com/photos/small_realm/11189803763/
https://www.flickr.com/photos/florianric/7263382550
http://fotojenix.wordpress.com/2011/07/08/weekly-photo-challenge-old-fashioned/
My presentation from Optimise Oxford in November 2016.
In it I discuss why you should be making use of server logs, and how to go about utilising them.
Automated Reports with Rstudio Server
Automated KPI reporting with Shiny Server
Process Validation Documentation with Jupyter Notebook
Automated Machine Learning with Dataiku
The webinar will present the SemaGrow demonstrator “Web Crawler + AgroTagger”, in order to collect feedback, ideas and comments about the status of the development and how the demonstrator helps to overcome data problems.
SemaGrow is a project funded by the Seventh Framework Programme (FP7) of the European Commission, aiming at developing algorithms, infrastructures and methodologies to cope with large data volumes and real time performance.
In this context, FAO is providing a component than can be used to crawl the Web, giving a meaning to discovered resources by using the AgroTagger, which can assign some AGROVOC URIs to resources gathered by a Web crawler.
The demonstrator is publicly available at https://github.com/agrisfao/agrotagger.
Building Beautiful REST APIs in ASP.NET CoreStormpath
Core 1.0 is the latest iteration of ASP.NET. What’s changed? Everything! Nate Barbettini, .NET Developer Evangelist at Stormpath, does a deep dive on how to build RESTful APIs the right way on top of ASP.NET Web API.
Learn how to build new classes of sophisticated, real-time analytics by combining Apache Spark, the industry's leading data processing engine, with MongoDB, the industry’s fastest growing database.
We live in a world of “big data.” But it isn’t just the data itself that is valuable – it’s the insight it can generate. How quickly an organization can unlock and act on that insight has become a major source of competitive advantage. Collecting data in operational systems and then relying on nightly batch extract, transform, load (ETL) processes to update the enterprise data warehouse (EDW) is no longer sufficient.
In this live session, we show you how MongoDB and Spark work together and provide examples using the new Spark Connector for MongoDB.
This session was sponsored by Stratio & Paradigma.
Android Lab Test : Using the network with HTTP (english)Bruno Delb
Android Lab Test : Using the network with HTTP (english)
Video of tutorial on : https://www.youtube.com/playlist?list=PLL2Z3bzdO25yHwIV3XdMzKs61At0Ldh6L
Visit http://www.AndroidLabTest.com
author: Kamil Durski, Senior Ruby Developer at Polcode
Twitter: @kdurski Github: github.com/kdurski
Ruby has built-in support for threads yet it’s barely used, even in situations where it could be very handy, such as crawling the web. While it can be a pretty slow process, the majority of the time is spent on waiting for IO data from the remote server. This is the perfect case to use threads.
You can read more about multi-threaded web crawler in Ruby here: http://www.polcode.com/en/multi-threaded-web-crawler-in-ruby/
At the Devoxx 2015 conference in Belgium, Guillaume Laforge, Product Ninja & Advocate at Restlet, presented about the never-ending REST API design debate, covering many topics like HTTP status codes, Hypermedia APIs, pagination/searching/filtering, and more.
This talk discusses the principles of RESTful design and what it means to be HATEOAS. It concludes by demonstrating how to implement a simple RESTful API on top of ASP.NET Core.
With the popularization of github, the lack of security control in commits has exposed several sensitive data that can compromise both companies and ordinary users. To make the search easier, I've created a script to automate and collect the results. This script is in version 2.0 with some implementations and improvements in the code, I will demonstrate how to perform the collection and how this can cause a great impact.
Controlling crawler for better Indexation and RankingRajesh Magar
My employer actually raise this request to create more interactive and value added presentation for our crew. So first 99% credit goes to our be-loving MOZ and Rank-Fishkin, plus I've added some of the fascinating stats about the internet altogether. Which I am sure will take your breath away.
Server Log Files & Technical SEO Audits: What You Need to KnowSamuel Scott
Samuel Scott's September 2016 presentation on server log analysis at MozCon in Seattle, Washington. He goes through log data in general and then talks about how to find and fix problems that are found in server logs specifically.
This is my speaker session PPT slides from BrightonSEO conference last week. Please do let me know if you have any questions (saija@mahondigital.co.uk)
Building Beautiful REST APIs in ASP.NET CoreStormpath
Core 1.0 is the latest iteration of ASP.NET. What’s changed? Everything! Nate Barbettini, .NET Developer Evangelist at Stormpath, does a deep dive on how to build RESTful APIs the right way on top of ASP.NET Web API.
Learn how to build new classes of sophisticated, real-time analytics by combining Apache Spark, the industry's leading data processing engine, with MongoDB, the industry’s fastest growing database.
We live in a world of “big data.” But it isn’t just the data itself that is valuable – it’s the insight it can generate. How quickly an organization can unlock and act on that insight has become a major source of competitive advantage. Collecting data in operational systems and then relying on nightly batch extract, transform, load (ETL) processes to update the enterprise data warehouse (EDW) is no longer sufficient.
In this live session, we show you how MongoDB and Spark work together and provide examples using the new Spark Connector for MongoDB.
This session was sponsored by Stratio & Paradigma.
Android Lab Test : Using the network with HTTP (english)Bruno Delb
Android Lab Test : Using the network with HTTP (english)
Video of tutorial on : https://www.youtube.com/playlist?list=PLL2Z3bzdO25yHwIV3XdMzKs61At0Ldh6L
Visit http://www.AndroidLabTest.com
author: Kamil Durski, Senior Ruby Developer at Polcode
Twitter: @kdurski Github: github.com/kdurski
Ruby has built-in support for threads yet it’s barely used, even in situations where it could be very handy, such as crawling the web. While it can be a pretty slow process, the majority of the time is spent on waiting for IO data from the remote server. This is the perfect case to use threads.
You can read more about multi-threaded web crawler in Ruby here: http://www.polcode.com/en/multi-threaded-web-crawler-in-ruby/
At the Devoxx 2015 conference in Belgium, Guillaume Laforge, Product Ninja & Advocate at Restlet, presented about the never-ending REST API design debate, covering many topics like HTTP status codes, Hypermedia APIs, pagination/searching/filtering, and more.
This talk discusses the principles of RESTful design and what it means to be HATEOAS. It concludes by demonstrating how to implement a simple RESTful API on top of ASP.NET Core.
With the popularization of github, the lack of security control in commits has exposed several sensitive data that can compromise both companies and ordinary users. To make the search easier, I've created a script to automate and collect the results. This script is in version 2.0 with some implementations and improvements in the code, I will demonstrate how to perform the collection and how this can cause a great impact.
Controlling crawler for better Indexation and RankingRajesh Magar
My employer actually raise this request to create more interactive and value added presentation for our crew. So first 99% credit goes to our be-loving MOZ and Rank-Fishkin, plus I've added some of the fascinating stats about the internet altogether. Which I am sure will take your breath away.
Server Log Files & Technical SEO Audits: What You Need to KnowSamuel Scott
Samuel Scott's September 2016 presentation on server log analysis at MozCon in Seattle, Washington. He goes through log data in general and then talks about how to find and fix problems that are found in server logs specifically.
This is my speaker session PPT slides from BrightonSEO conference last week. Please do let me know if you have any questions (saija@mahondigital.co.uk)
SEO tools can help you deal with your search engine optimization efforts for your site or blog. These tools are for the most part offered on the web. When you are in search for best SEO, it can be hard to discover a suite that offers all that you have to deal with an SEO.
Nichola Stott – BrightonSEO April 2016: SEO SUX: How and Why UX Must Be Front...Authoritas
https://www.authoritas.com/brightonseo/
Nichola Stott – BrightonSEO April 2016: SEO SUX: How and Why UX Must Be Front and Centre to Your Technical Strategy
We have a team of 50 application developers who have unmatched expertise, comprehensive knowledge of variety of programming languages and tools and wealth of experience that enables them to develop seamless mobile and tablet applications on all operating systems with perfection and guaranteed satisfaction.
Es el conjunto de medidas preventivas y reactivas de las organizaciones y de los sistemas tecnológicos que permitan resguardar y proteger la información buscando mantener la confidencialidad, la disponibilidad e integridad de la misma.
Local Government Balances Security, Flexibility and Productivity with BlackBe...BlackBerry
Upgrading to BES10 was a straightforward process for the Council. Using BlackBerry Balance technology enables councilors to attend to their personal lives while putting in long hours at the office. For those Council members using iPads, the Secure Work Space container, which has Federal Information Processing Standard (FIPS) certification, enables the Council’s IT team to configure, secure and wipe sensitive data should devices be lost or stolen. Data Leakage Prevention (DLP) also ensures work information cannot be shared outside the Secure Work Space.
Through the Lens of an iPhone: Charleston, SCPaul Brown
The following photos were entirely taken and processed by me with an iPhone. See more: http://paulgordonbrown.com/category/iphoneography/
iPhoneography is the art of creating photos with an Apple iPhone. This is a style of mobile photography that differs from all other forms of digital photography in that images are both shot and processed on the iOS device.
Coming up with a gift doesn’t have to be hard or expensive, especially if it’s for your mom. Here are some gift ideas that will surely be appreciated by her.
Shared by: http://www.familychiropractic.com.sg/
Sooner or later we all have to work with HTML, despite its verbosity. Those of us who claim to love HTML may just be victims of Stockholm Syndrome, both praising yet secretly loathing it.
Basho designer John Newman is making the trek from the swamps of Florida to show us the way. In the modern world of markup preprocessors, these alternative syntaxes allow you to write simpler, cleaner, more concise code in a shorter amount of time. Certain techniques can even allow your team members who may be less-tech-savvy to contribute content directly without forcing you to wire up a WYSIWYG style CMS.
This talk explores great alternatives to plain HTML and CSS, and covers how Basho put these tools together to facilitate a painless, team-oriented approach to building sites and web apps.
Need to automate tasks in Alma but are not sure where to start? This will help you understand how APIs are relevant to real work you're doing, familiarize you with concepts that help you work with any API, and help you understand what you need to know specifically to work with the Alma API
New language from Google, static safe compiler, with GC and as fast as C++ or Java, syntax simpler then Python - 2 hour-long tutorial and you can start code.
In this talk Serhii will talk about Go, also known as Golang – an open source language developed at Google and used in production by companies such as Docker, Dropbox, Facebook and Google itself. Go is now heavily used as a general-purpose programming language that’s a pleasure to use and maintain. This introductory talk contains many live demos of basic language concepts, concurrency model, simple HTTP-based endpoint implementation and, of course, tests using build-in framework. This presentation will be interesting for backend engineers and DevOps to understand why Go had become so popular and how it might help to build robust and maintanable services.
Agenda of the presentation:
1. Go is not C, not Java, not anything
2. Rob Pike argument
3. Main ideas and basics
4. Concurrency model
5. Tools
6. Issues
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
En esta sesión voy a contar las decisiones técnicas que tomamos al desarrollar QuestDB, una base de datos Open Source para series temporales compatible con Postgres, y cómo conseguimos escribir más de cuatro millones de filas por segundo sin bloquear o enlentecer las consultas.
Hablaré de cosas como (zero) Garbage Collection, vectorización de instrucciones usando SIMD, reescribir en lugar de reutilizar para arañar microsegundos, aprovecharse de los avances en procesadores, discos duros y sistemas operativos, como por ejemplo el soporte de io_uring, o del balance entre experiencia de usuario y rendimiento cuando se plantean nuevas funcionalidades.
AWS에서는 Big Data 분석 및 처리를 위해 다양한 Analytics 서비스를 지원합니다. 이 세션에서는 시간이 지날수록 증가하는 데이터 분석 및 처리를 위해 데이터 레이크 카탈로그를 구축하거나 ETL을 위해 사용되는 AWS Glue 내부 구조를 살펴보고 효율적으로 사용할 수 있는 방법들을 소개합니다.
Drupal Perfomance. Talk given at DrupalCamp North, 25th July 2015.
This session looked at tools you can use to analyse the performance and benchmark a Drupal site. It then looked at tools and techniques that can be used to improve the site performance. The session also included a case study about the Drupal based BAFTA website that was built by Access. Focusing on the recent Film and TV awards, which saw a large amount of traffic in a short amount of time.
Web scraping is mostly about parsing and normalization. This presentation introduces people to harvesting methods and tools as well as handy utilities for extracting and normalizing data
Mastering Local SEO for Service Businesses in the AI Era is tailored specifically for local service providers like plumbers, dentists, and others seeking to dominate their local search landscape. This session delves into leveraging AI advancements to enhance your online visibility and search rankings through the Content Factory model, designed for creating high-impact, SEO-driven content. Discover the Dollar-a-Day advertising strategy, a cost-effective approach to boost your local SEO efforts and attract more customers with minimal investment. Gain practical insights on optimizing your online presence to meet the specific needs of local service seekers, ensuring your business not only appears but stands out in local searches. This concise, action-oriented workshop is your roadmap to navigating the complexities of digital marketing in the AI age, driving more leads, conversions, and ultimately, success for your local service business.
Key Takeaways:
Embrace AI for Local SEO: Learn to harness the power of AI technologies to optimize your website and content for local search. Understand the pivotal role AI plays in analyzing search trends and consumer behavior, enabling you to tailor your SEO strategies to meet the specific demands of your target local audience. Leverage the Content Factory Model: Discover the step-by-step process of creating SEO-optimized content at scale. This approach ensures a steady stream of high-quality content that engages local customers and boosts your search rankings. Get an action guide on implementing this model, complete with templates and scheduling strategies to maintain a consistent online presence. Maximize ROI with Dollar-a-Day Advertising: Dive into the cost-effective Dollar-a-Day advertising strategy that amplifies your visibility in local searches without breaking the bank. Learn how to strategically allocate your budget across platforms to target potential local customers effectively. The session includes an action guide on setting up, monitoring, and optimizing your ad campaigns to ensure maximum impact with minimal investment.
Mastering Multi-Touchpoint Content Strategy: Navigate Fragmented User JourneysSearch Engine Journal
Digital platforms are constantly multiplying, and with that, user engagement is becoming more intricate and fragmented.
So how do you effectively navigate distributing and tailoring your content across these various touchpoints?
Watch this webinar as we dive into the evolving landscape of content strategy tailored for today's fragmented user journeys. Understanding how to deliver your content to your users is more crucial than ever, and we’ll provide actionable tips for navigating these intricate challenges.
You’ll learn:
- How today’s users engage with content across various channels and devices.
- The latest methodologies for identifying and addressing content gaps to keep your content strategy proactive and relevant.
- What digital shelf space is and how your content strategy needs to pivot.
With Wayne Cichanski, we’ll explore innovative strategies to map out and meet the diverse needs of your audience, ensuring every piece of content resonates and connects, regardless of where or how it is consumed.
How to Run Landing Page Tests On and Off Paid Social PlatformsVWO
Join us for an exclusive webinar featuring Mariate, Alexandra and Nima where we will unveil a comprehensive blueprint for crafting a successful paid media strategy focused on landing page testing.With escalating costs in paid advertising, understanding how to maximize each visitor’s experience is crucial for retention and conversion.
This session will dive into the methodologies for executing and analyzing landing page tests within paid social channels, offering a blend of theoretical knowledge and practical insights.
The Pearmill team will guide you through the nuances of setting up and managing landing page experiments on paid social platforms. You will learn about the critical rules to follow, the structure of effective tests, optimal conversion duration and budget allocation.
The session will also cover data analysis techniques and criteria for graduating landing pages.
In the second part of the webinar, Pearmill will explore the use of A/B testing platforms. Discover common pitfalls to avoid in A/B testing and gain insights into analyzing A/B tests results effectively.
Digital marketing is the art and science of promoting products or services using digital channels to reach and engage with potential customers. It encompasses a wide range of online tactics and strategies aimed at increasing brand visibility, driving website traffic, generating leads, and ultimately, converting those leads into customers.
https://nidmindia.com/
10 Video Ideas Any Business Can Make RIGHT NOW!
You'll never draw a blank again on what kind of video to make for your business. Go beyond the basic categories and truly reimagine a brand new advanced way to brainstorm video content creation. During this masterclass you'll be challenged to think creatively and outside of the box and view your videos through lenses you may have never thought of previously. It's guaranteed that you'll leave with more than 10 video ideas, but I like to under-promise and over-deliver. Don't miss this session.
Key Takeaways:
How to use the Video Matrix
How to use additional "Lenses"
Where to source original video ideas
Mastering Local SEO for Service Businesses in the AI Era is tailored specifically for local service providers like plumbers, dentists, and others seeking to dominate their local search landscape. This session delves into leveraging AI advancements to enhance your online visibility and search rankings through the Content Factory model, designed for creating high-impact, SEO-driven content. Discover the Dollar-a-Day advertising strategy, a cost-effective approach to boost your local SEO efforts and attract more customers with minimal investment. Gain practical insights on optimizing your online presence to meet the specific needs of local service seekers, ensuring your business not only appears but stands out in local searches. This concise, action-oriented workshop is your roadmap to navigating the complexities of digital marketing in the AI age, driving more leads, conversions, and ultimately, success for your local service business.
Key Takeaways:
Embrace AI for Local SEO: Learn to harness the power of AI technologies to optimize your website and content for local search. Understand the pivotal role AI plays in analyzing search trends and consumer behavior, enabling you to tailor your SEO strategies to meet the specific demands of your target local audience. Leverage the Content Factory Model: Discover the step-by-step process of creating SEO-optimized content at scale. This approach ensures a steady stream of high-quality content that engages local customers and boosts your search rankings. Get an action guide on implementing this model, complete with templates and scheduling strategies to maintain a consistent online presence. Maximize ROI with Dollar-a-Day Advertising: Dive into the cost-effective Dollar-a-Day advertising strategy that amplifies your visibility in local searches without breaking the bank. Learn how to strategically allocate your budget across platforms to target potential local customers effectively. The session includes an action guide on setting up, monitoring, and optimizing your ad campaigns to ensure maximum impact with minimal investment.
For too many years marketing and sales have operated in silos...while in some forward thinking companies, the two organizations work together to drive new opportunity development and revenue. This session will explore the lessons learned in that beautiful dance that can occur when marketing and sales work together...to drive new opportunity development, account expansion and customer satisfaction.
No, this is not a conversation about MQLs and SQLs. Instead we will focus on a framework that allows the two organizations to drive company success together.
Videos are more engaging, more memorable, and more popular than any other type of content out there. That’s why it’s estimated that 82% of consumer traffic will come from videos by 2025.
And with videos evolving from landscape to portrait and experts promoting shorter clips, one thing remains constant – our brains LOVE videos.
So is there science behind what makes people absolutely irresistible on camera?
The answer: definitely yes.
In this jam-packed session with Stephanie Garcia, you’ll get your hands on a steal-worthy guide that uncovers the art and science to being irresistible on camera. From body language to words that convert, she’ll show you how to captivate on command so that viewers are excited and ready to take action.
Core Web Vitals SEO Workshop - improve your performance [pdf]Peter Mead
Core Web Vitals to improve your website performance for better SEO results with CWV.
CWV Topics include:
- Understanding the latest Core Web Vitals including the significance of LCP, INP and CLS + their impact on SEO
- Optimisation techniques from our experts on how to improve your CWV on platforms like WordPress and WP Engine
- The impact of user experience and SEO
Top 3 Ways to Align Sales and Marketing Teams for Rapid GrowthDemandbase
In this session, Demandbase’s Stephanie Quinn, Sr. Director of Integrated and Digital Marketing, Devin Rosenberg, Director of Sales, and Kevin Rooney, Senior Director of Sales Development will share how sales and marketing shapes their day-to-day and what key areas are needed for true alignment.
It's another new era of digital and marketers are faced with making big bets on their digital strategy. If you are looking at modernizing your tech stack to support your digital evolution, there are a few can't miss (often overlooked) areas that should be part of every conversation. We'll cover setting your vision, avoiding siloes, adding a democratized approach to data strategy, localization, creating critical governance requirements and more. Attendees will walk away with actions they can take into initiatives they are running today and consider for the future.
Unleash the power of UK SEO with Brand Highlighters! Our guide delves into the unique search landscape of Britain, equipping you with targeted strategies to dominate UK search engine results. Discover local SEO tactics, keyword magic for UK audiences, and mobile optimization secrets. Get your website seen by the right people and propel your brand to the top of UK searches.
To learn more: https://brandhighlighters.co.uk/blog/top-seo-agencies-uk/
Monthly Social Media News Update May 2024Andy Lambert
TL;DR. These are the three themes that stood out to us over the course of last month.
1️⃣ Social media is becoming increasingly significant for brand discovery. Marketers are now understanding the impact of social and budgets are shifting accordingly.
2️⃣ Instagram’s new algorithm and latest guidance will help us maintain organic growth. Instagram continues to evolve, but Reels remains the most crucial tool for growth.
3️⃣ Collaboration will help us unlock growth. Who we work with will define how fast we grow. Meta continues to evolve their Creator Marketplace and now TikTok are beginning to push ‘collabs’ more too.
Financial curveballs sent many American families reeling in 2023. Household budgets were squeezed by rising interest rates, surging prices on everyday goods, and a stagnating housing market. Consumers were feeling strapped. That sentiment, however, appears to be waning. The question is, to what extent?
To take the pulse of consumers’ feelings about their financial well-being ahead of a highly anticipated election, ThinkNow conducted a nationally representative quantitative survey. The survey highlights consumers’ hopes and anxieties as we move into 2024. Let's unpack the key findings to gain insights about where we stand.
In this presentation, Danny Leibrandt explains the impact of AI on SEO and what Google has been doing about it. Learn how to take your SEO game to the next level and win over Google with his new strategy anyone can use. Get actionable steps to rank your name, your business, and your clients on Google - the right way.
Key Takeaways:
1. Real content is king
2. Find ways to show EEAT
3. Repurpose across all platforms
2. About Me
• Former Senior Technical Consultant @ builtvisible.
• Now Freelance Technical SEO Consultant.
• @ohgm on Twitter.
• ohgm.co.uk for my webzone.
3. What I’d like to do today
1. Talk about access logs.
2. Show you some command line tools.
3. Show you some ways to apply these tools to
common scenarios.
4. Sit back down.
This talk is on the first significant difficulty spike in
server log analysis – having too much information.
9. ohgm.co.uk1 173.245.55.1232 - - [11/Apr/2016:10:23:42 +01003]
"GET4 /please5 HTTP/1.16" 2007 68128 "-9" "Mozilla/5.0
(compatible; Googlebot/2.1;
+http://www.google.com/bot.html)10" 173.245.55.12311
1. The host responding to the
request.
2. The IP that serviced the
request.
3. The date and time of the
request.
4. The HTTP method:
GET, POST, PUT, HEAD, or
DELETE.
5. The resource requested.
6. The HTTP Version
{HTTP/1.0|HTTP/1.1|HTTP/2}
7. The server response.
8. The download size.
9. The referring URL.
10. The reported User-Agent.
11. The IP that made the
request.
Configurations vary substantially.
10. Why SEOs Like Them.
There is a lack of overlap between server logs and
crawl simulation tools.
Access logs show what’s being accessed rather than
what’s simply accessible.
We find correlation between crawl efficiency
improvements and organic performance. Access logs
are one of the best tools for identifying crawl waste.
11. Why ‘Excel Fails’?
Microsoft Excel currently supports
1,048,576 rows of data.
There are no plans to increase this.
12. Agency Scenario
Your manager has sold a Server Log Analysis
project, requesting 1 month of access logs
from the client, a UK high street retailer.
You receive 15 access_log.gz files, totalling
17.6GB. Excel won’t open any of them. You
don’t know it yet, but they are unfiltered.
Good Luck.
24. If you’re on Windows, you
probably aren’t ready*.
*Unless ‘Ubuntu on Windows’ becomes
part of the non-developer release.
25. 1. Windows Update > Update Settings >
Advanced > Get Insider Preview
Builds.
2. Install Build 14316 or greater.
3. Enable ‘Windows Subsystem for Linux
(Beta)’.
4. Open cmd and type ‘bash’.
5. Type ‘y’ and hit enter at the prompt.
26. 1. Windows Update > Update Settings >
Advanced > Get Insider Preview
Builds.
2. Install Build 14316 or greater.
3. Enable ‘Windows Subsystem for Linux
(Beta)’.
4. Open cmd and type ‘bash’.
5. Type ‘y’ and hit enter at the prompt.
No
thanks.
30. Getting Started
• Navigate to the folder
containing the
downloaded files.
• Open your chosen
terminal (cmd,
terminal, or bash).
CTRL+SHIFT+Rclick inside a folder is
an alternate method.
37. Combine Multiple Log Files
We navigate to a folder containing all our server logs,
open the terminal, and type:
~$ cat *.log >> combined.log
“Take every .log file in the folder and append each to
combined.log”
38. But they gave me files in lots of
different folders.
39. Combine Multiple Files in Multiple
Folders
~$ find . -name ‘*.log’ -exec cat {} >> combined.log ;
“Search the current folder, and all subfolders for
filenames ending with ‘.log’. Append the contents of
these files to a new file called combined.log.”
gfind in GOW
41. Combine Multiple Files in Multiple
Folders Some of Which are compressed
~$ find . -name *.gz -exec gzip -dkr {} +
&& find . -name ‘*.log’ -exec cat {} >> combined.log ;
“Find all the files with the .gz extension beneath the current
folder.
Recursively Decompress all files. Keep the originals.
Once finished, find all the .log files, append them to a new
combined.log file.”
42. Preview Huge Files with less
less streams the contents of a file to the terminal
without loading the whole file into memory.
$~ less combined.log
You can use less to review access logs without
crippling your machine.
45. RTFM
If at any time you get stuck:
~$ toolname --help
or
~$ man toolname
or
Google what you are trying to do.
The --help ( often -h) flag will usually give you what you need to know.
‘man’ (manual) tends to be much more in depth. Both are read from the
command line.
47. UA Filtering: Googlebot
Our combined access logs are in a single file:
combined.log – 16.4GB
Too large to open in Excel.
Too large to open in Notepad.
Examining it with less?
It’s too full of filthy human data.
We need to cut it down to Googlebot.
48. grep is a tool that extracts lines from text files based on a
regular expression. Using grep is pretty simple:
~$ grep [options] [pattern] [file]
…
~$ grep ‘Googlebot’ combined.log
“Give me all the lines containing Googlebot in
combined.log”
Press Enter.
grep
50. Filtering Scenario: Googlebot
So we’ll store this output to a new file using ‘>>’:
~$ grep ‘Googlebot’ combined.log >> googlebot.log
“Append all lines in combined.log that contain
Googlebot into a new file, googlebot.log”
51. Like other tools, grep has a number of optional
argument flags. The count flag ‘-c’ can provide a
useful summary for direct questions:
~$ grep [options] [pattern] [file]
…
~$ grep -c “POST /wp-login” april.log
“Show me the count of login attempts in April on
ohgm.co.uk”
grep
52.
53. Filtering Scenario – Googlebot
You can’t just verify Googlebot by name.
Apparently some people aren’t honest on the internet.
55. Filtering Scenario – IP Ranges
Start End
64.233.160.0 64.233.191.255
66.102.0.0 66.102.15.255
66.249.64.0 66.249.95.255
72.14.192.0 72.14.255.255
74.125.0.0 74.125.255.255
209.85.128.0 209.85.255.255
216.239.32.0 216.239.63.255
If we were masochistic, we could write a
regular expression to capture these all…
56. Filtering Scenario – IP Ranges
The -E flag lets grep use Extended Regular Expressions.
~$ grep -E "((b(64).233.(1([6-8][0-9]|9[0-
1])))|(b(66).102.([0-9]|1[0-5]))|(b(66).249.(6[4-
9]|[7-8][0-9]|9[0-5]))|(b(72).14.(1(9[2-9])|2([0-4][0-
9]|5[0-5])))|(b(74).125.(25[0-5]|2[0-4][0-9]|[01]?[0-
9][0-9]?))|(209.85.(1(2[8-9]|[3-9][0-9])|2([0-4][0-
9]|5[0-5])))|(216.239.(25[0-5]|2[0-4][0-9]|[01]?[0-
9][0-9]?))).(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
GbotUA.log > GbotIP.log
This shouldn’t work, but it does*.
*WOMM
57. Filtering Scenario – Impostors
The -v flag inverts the grep query to find impostors:
~$ grep -vE "((b(64).233.(1([6-8][0-9]|9[0-
1])))|(b(66).102.([0-9]|1[0-5]))|(b(66).249.(6[4-9]|[7-
8][0-9]|9[0-5]))|(b(72).14.(1(9[2-9])|2([0-4][0-9]|5[0-
5])))|(b(74).125.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-
9]?))|(209.85.(1(2[8-9]|[3-9][0-9])|2([0-4][0-9]|5[0-
5])))|(216.239.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-
9]?))).(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" GbotUA.log >
Impostors.log
“Give me every request that claims to be Googlebot, but
doesn’t come from this IP range. Put them in an impostors
file.”
58. Filtering Scenario – Verifying
Googlebot
• Disclaimer: don’t blindly use awful regex (check
with Regexr) or IP ranges, especially if you’re
analysing logs for a site using IP detection for
international SEO purposes. Read more about
Googlebot’s Geo-distributed Crawling here first.
• Use the correct reverse DNS > forward DNS lookup
when it’s important to be right. This can be
automated.
59. Filtering Scenario – IP Ranges
Anyone cloaking today
will have a good list.
You might find them at the bar.
62. I Want A Sample
The sort and split utilities do what you’d expect:
~$ sort -R combined.log | split -l 1048576
“Randomly sort the lines in the combined.log. split
the output of this command into multiple files (up to)
1048576 lines long.”
shuf is easier, but not default OSX/GOW.
A pipe ‘|’ takes the output of one
command as the input of another.
65. I Just Want it in Excel.
Use wc to check it has fewer than 1,048,576 rows.
~$ wc -l sample.log
“Count the number of lines in sample.log.”
66.
67.
68.
69.
70. The Title of The Talk
Wasn’t a Lie And We
Aren’t Going to Use Excel
And Are Going to Answer
Questions Just Using The
Command Line I Hope
That’s OK. Sorry.
72. Formulating Questions
Work a basic hypothesis. Decides what needs to be
done if it is true, false, or indeterminate before you
get the data.
“Google is ignoring robots.txt” may not be action
guiding, whilst “Googlebot is ignoring search console
parameter restrictions” just might be.
74. Example Questions
How deep is Googlebot crawling?
Where is the wasted crawl? What proportion of
requests are currently wasted?
Where is Googlebot POSTing?
What are the most popular non-200/304 resources?
How many unique resources are being crawled?
Which is the more popular form of product page?
Which sitemap pages aren’t being crawled?
Always pivot with other data.
76. AWK
AWK is a programming language focused on text
manipulation.
We are going to use it to print some columns from
our log files. That’s it.
77. Logs are space separated by default.
Awk uses spaces to define column numbers.
~$ awk ‘ {print $col_number1, $col_number2}’ [file]
ohgm.co.uk1
173.245.55.1232-3-4
[11/Apr/2016:10:23:425+0100]6
"GET7/8HTTP/1.1”920010681211“”12
"Mozilla/5.013(compatible;14Googlebot/2.1;15+http://ww
w.google.com/bot.html)“16173.245.55.123
78. AWK
~$ awk ‘{print $8, $10}’ Googlebot.log >> Gbot_responses.txt
“Output the requested resource and server response of
Googlebot.log to Gbot_responses.txt.”
/ 200
/robots.txt 304
/robots.txt 500
/amazing-blog-post 200
/forgotten-blog-post 404
/forbidden-blog-post 403
/ 200
Tailor the command to the access log format in use.
79. uniq
uniq takes text as an input and returns unique lines.
uniq -c returns these lines prefixed with a count.
uniq -d returns only repeated lines.
uniq -u returns only non-repeated lines.
80. AWK
Like grep, awk also matches patterns, using /pattern/.
~$ awk ‘/bingbot/ {print $10}’ combined.log | uniq -c
“Look for lines containing bingbot in the unfiltered logs and
print their server response codes. Deduplicate these and
return a summary.”
216663 - 302
109232 - 200
18395 - 301
2568 - 404
2147 - 304
274 - 500
261 - 403
82. Ultimate Guide to Site Migrations
Get a big list of old URLs.
301 redirect them once to the
right places.
Make sure they get crawled.
83. Site Migrations
“We want a list of all URLs requested by Googlebot in
our pre-migration dataset, sorted by popularity
(number of requests).”
e.g.
/ 49587
/index.html 25169
/robots.txt 23334
/home 19417
84. Site Migrations
~$ awk ‘/Googlebot/ {print $7}’ combined.log |
uniq -c | sort -nr >> unique_requests.txt
“Take all access log requests, and filter to Googlebot.
Extract and output the requested resources.
Deduplicate these and return a summary.
Sort these by number in descending order.”
85. Site Migrations – Encouraging
Crawl
“We want our migration to switch as
quickly as possible.”
Get the list of redirects (URI stems) you want Google
to crawl into a file.
grep can use this file as the match criteria (lines
matching this OR this OR this).
86. Site Migrations – Encouraging
Crawl
We want the URLs Google has not yet crawled.
~$ grep -f wishlist.txt postmigration.log | awk
‘/Googlebot/ {print $8}’ | uniq >> wishlist-hits.txt
“Filter the post-migration log for lines that match
wishlist.txt. Return the resources requested by Googlebot.
Deduplicate and save.”
87. Site Migrations – Encouraging
Crawl
~$ cat wishlist-hits.txt wishlist.txt | uniq -u >>
uncrawled.txt
“Read the contents of both files. Save wishlist
entries that don’t appear in the access logs.”
Tip: use an indexing service like linklicious to encourage
crawling the uncrawled.