How can we use open source tools to understand complex site graphs?
Web crawlers need well-connected websites. Large ecommerce and news websites and feed readers are graphs with hundreds of thousands of vertices (web pages) and edges (links between them). Understanding these graphs has a direct effect on usability and SEO.
Search Engine Optimization (SEO) is about analysing websites, measuring them and improving them empirically and iteratively.
Websites can be visualized and measured using graph theory, where nodes are webpages and internal links are directed edges.
This presentation explains the process of parameterizing and comparing websites based on their network of internal connections.
The technologies involved include web crawling spiders, Gephi, igraph on R and the Neo4j graph database.
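The graph model behind this approach can be sketched in a few lines of plain Python. The page paths below are hypothetical; a real pipeline would obtain them from a crawler and load them into igraph or Neo4j:

```python
# Minimal directed-graph model of a website: pages are nodes,
# internal links are directed edges (hypothetical page paths).
site = {
    "/":            ["/blog/", "/products/"],
    "/blog/":       ["/", "/blog/post-1"],
    "/blog/post-1": ["/products/"],
    "/products/":   ["/"],
}

num_nodes = len(site)
num_edges = sum(len(targets) for targets in site.values())

print(num_nodes, num_edges)  # 4 6
```

From a structure like this, the per-page metrics discussed in the talk (degree, PageRank, and so on) follow directly.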
Slides in English of my talk about Advanced on-page SEO covering topics such as content inventory, architecture of websites and PageRank, anatomy of URLs, visualization of websites and Graph theory, Single Page Applications with AJAX and markup of structured data.
Take Better Care of Library Data and Spreadsheets with Google Visualization A... by Bohyun Kim
Presentation given at 2013 LITA Forum on Nov 8, 2013. http://www.ala.org/lita/conferences/forum/2013 ; Example files are at http://github.com/bohyunkim/examples
6 Must Have Google Analytics Filters by Jason Cartwright (JM Garcia)
This whitepaper covers a few tips and tricks regarding the use of filters that will help you get more from Google Analytics and aid you in making more informed decisions about your site and its content.
1. Setting your URLs to lowercase
- If the URLs for your website allow upper- and lowercase characters, Google will report on each version of a URL separately. Imagine the pain if you had hundreds or thousands of pages on your website, with multiple line entries for the same URL in your reports. This filter saves time by ensuring all the entries for a single URL are treated as the same page.
2. Excluding your own traffic from reports
- Chances are that your own visits to your own website are not a large percentage of the total visits and page views. Nonetheless, you can still permanently remove your own traffic, and that of any agencies you might be working with, from your Google Analytics statistics.
3. Showing the domain / hostname
- If you want to see the domain / hostname appended to page names within your reports, this filter will include it for you. If you are using Google Analytics across multiple domains, this can also help to identify which domains the Account ID is running on.
4. Filtering directories and sub-directories
- This filter is useful to either report on stats for a specific directory e.g. /blog/ or exclude a directory from the report. There are many scenarios where this can be beneficial, particularly if you want to separate out e.g. blog or newsfeed traffic from the rest of the traffic to your site.
5. Separating mobile and non-mobile traffic
- Mobile traffic is on the increase, and mobile users interact with your site differently from desktop users. This filter separates out the audiences so you can make more informed decisions for each.
6. Making sense of the search keyword (Not Provided)
- If you've looked at the organic search phrase report, the chances are that one of the top keywords appearing reads as (not provided). This filter breaks down the (not provided) keyword by including the stats for the associated landing pages. This will give you an idea of what keywords are likely being classified under (not provided).
7. BONUS - See search rankings for keywords in Google
- Have you ever wanted to see what the ranking of a keyword was in Google and how much traffic it drives in that position? Using this bonus filter you can.
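The effect of the lowercase filter (item 1 above) can be simulated on report rows. The real filter is configured in the Google Analytics admin interface, so this is only an illustration, with made-up URLs and pageview counts:

```python
from collections import defaultdict

# Simulated report rows: (page URL, pageviews).
# Without a lowercase filter, /About and /about are counted separately.
rows = [("/About", 120), ("/about", 80), ("/Contact", 40)]

merged = defaultdict(int)
for url, views in rows:
    merged[url.lower()] += views  # what the lowercase filter effectively does

print(dict(merged))  # {'/about': 200, '/contact': 40}
```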
Quick & Easy Data Visualization with Google Visualization API + Google Char... by Bohyun Kim
Presentation given at the 2014 CODE4LIB Conference, Raleigh, NC. Mar. 25, 2014.
Conference Info: http://code4lib.org/conference/2014/schedule
Example code: http://github.com/bohyunkim/examples
JavaScript keeps expanding and advancing. Now, with the arrival of ECMAScript 6, certain workflows and the way we write code are going to change, and JavaScript developers will have more tools in their hands. Popular frameworks such as Angular will ship their future versions in ECMAScript 6. Why not take a look at the future?
Talk given by Luis Calvo at the latest edition of Codemotion (Madrid, Spain - Nov 21-22)
Greach is the leading event in Spain on Groovy-based technologies.
Within this event, the talk 'Use Groovy & Grails in your Spring Boot projects' presents examples of, and possibilities for, introducing this language and some modules of the Grails framework (also based on Groovy) into projects built with Spring's recently released solution, Spring Boot.
More info:
http://buff.ly/1DXXQWU
On 17 May, the monthly meetup of the Python Madrid group was held at the Paradigma Digital offices. Our colleague Álvaro León talked to us about Kafka and Python.
Video of the presentation: https://www.youtube.com/watch?v=HPfNDL-jIGM
Google Analytics is an analytics tool of which only part of the potential is widely known. Besides measuring audiences and their behaviour, Google Analytics lets you prioritize online marketing investments, capture behaviour in Single Page Applications, and visualize offline data (for example from a CRM) combined with online visit data. It is also possible to collect real-time sales data, for example from ecommerce, and data from physical devices such as Bluetooth beacons. The capabilities of Google Analytics, combined with BigQuery and other Google Cloud Platform services, make it one of the most interesting analytics platforms for digital transformation.
To watch the video in which this presentation was used, click here: https://www.youtube.com/watch?v=2mfIU-NXGXI
To see the event announcement on our website, click here: https://www.paradigmadigital.com/eventos/usar-google-analytics/
For the announcement on the Meetup.com group, click here: https://www.meetup.com/es-ES/Front-end-Developers-Madrid/events/231793469/
This presentation shows what reactive programming is, what it consists of, what it lets us do and why it is so fashionable. We also look at a concrete use of this style of programming with RxJava.
Author: Juan Pablo González de Gracia.
Video: https://www.youtube.com/edit?video_id=gQNcDyT2qnc
Description of the Google Analytics platform: how the tracking code works, metrics and dimensions, event hits, A/B tests, clientID and user_id.
How do you define a digital transformation roadmap? At Paradigma we have spent more than 20 years helping large companies on their path to digitalization.
On 17 May, the monthly meetup of the Python Madrid group was held at the Paradigma Digital offices. Pablo González Fuente, from GMV, talked to us about Python and Flink.
Video of the event: https://www.youtube.com/watch?v=HPfNDL-jIGM
At Paradigma we believe the big digital dragons have displaced traditional companies. The key to fighting those dragons is digital transformation.
What underlies what we call "HTML5" is the conversion into native browser features of frameworks and technologies we use daily. The browser thus becomes an ever more powerful application as HTML5 grows more capable. Web Components follow that line, making templating, custom tags, imports (the "includes" of other languages) and the shadow DOM native. And here we are, caught off guard...
Talk given by Luis Calvo at the latest edition of Codemotion (Madrid, Spain - Nov 21-22)
Kubernetes is an open source project from Google whose purpose is to act as a container orchestrator. This seminar builds a foundation from first principles, so that anyone with basic container concepts can understand how Kubernetes works and what it offers for managing containers.
Speaker: Alfredo Espejel, systems engineer at Paradigma
Alfredo has almost 10 years of experience in systems administration, mainly Linux. He is also interested in networking, and above all in the latest trends and technologies.
Video of the talk: https://www.youtube.com/watch?v=zI16fatmnVQ
More information about the meetup: http://www.meetup.com/Cloud-Computing-Spain/events/226254765/
How do you deploy and autoscale Couchbase in the cloud? Learn hands-on! (Paradigma Digital)
At the last meetup we presented Couchbase in general terms, but the time has come to dig into the details of the product and get to know all its capabilities. This will put us in a better position to adopt it in our projects.
This time we will talk about the operations and deployment layer of Couchbase, not with a traditional approach on physical machines, but following market best practices. We will explain and carry out the deployment on Google Cloud with elastic, automatic horizontal scalability.
To do this we will use, among others, the following technologies: Google Cloud, Kubernetes, Python and, of course, Couchbase.
We will put our infrastructure to the test with a small application; if you want to see the results, don't miss it!
We are a digital-native company, created for the Internet and with a different way of doing things. Over the last 10 years we have built a company without hierarchies, with a culture based on freedom and responsibility, which has allowed us to become the technology partner of some of Spain's largest companies. We'll tell you our secret. Want to get to know Paradigma's digital culture?
The PageRank algorithm is an important algorithm used to determine the quality of a page on the web. With search engines playing a central role in guiding traffic on the internet, PageRank is an important factor in determining its flow. Since link analysis is used in search engines' ranking systems, spammers create link-based spam structures known as link farms to generate a high PageRank for their pages and, in turn, for a target page. In this paper, we suggest a method through which these structures can be detected, thereby improving the overall ranking results.
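PageRank itself, the algorithm the paper builds on, can be computed by straightforward power iteration. This is a generic textbook sketch on a made-up four-page graph, not the detection method the paper proposes:

```python
# Plain power-iteration PageRank on a toy link graph (hypothetical pages).
# Damping factor 0.85, as in the original PageRank formulation.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],  # D links out, but nothing links to D
}

def pagerank(links, damping=0.85, iterations=50):
    n = len(links)
    rank = {page: 1.0 / n for page in links}
    for _ in range(iterations):
        # every page starts each round with the "random jump" share
        new = {page: (1.0 - damping) / n for page in links}
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new[target] += share
        rank = new
    return rank

ranks = pagerank(links)
# C receives links from A, B and D, so it ends up with the highest rank.
print(max(ranks, key=ranks.get))  # C
```

A link farm exploits exactly this mechanism: many low-value pages pointing at a target inflate the target's incoming share.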
Hi all,
This presentation covers how a search engine works internally. The second half explains PageRank in depth: how it is calculated, how it differs from the actual Google PR, how frequently Google updates the PR value, and more, with calculations and a few examples.
Smart Crawler: A Two-Stage Crawler for Concept-Based Semantic Search Engines (iosrjce)
The internet is a vast collection of billions of web pages containing terabytes of information arranged on thousands of servers using HTML. The size of this collection is itself a formidable obstacle to retrieving necessary and relevant information, which has made search engines an important part of our lives. Search engines strive to retrieve information that is as relevant as possible, and one of their building blocks is the web crawler. We propose a two-stage framework, Smart Crawler, for efficiently gathering deep-web interfaces. In the first stage, Smart Crawler performs site-based searching for centre pages with the help of search engines, avoiding visits to a large number of pages. To achieve more accurate results for a targeted crawl, Smart Crawler ranks websites to prioritize highly relevant ones for a given topic. In the second stage, Smart Crawler achieves fast in-site searching by excavating the most relevant links with an adaptive link ranking.
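The first-stage site ranking described above amounts to a priority queue over relevance scores. Here is a minimal sketch with hypothetical site names and scores; the adaptive in-site link ranking of the second stage is only indicated by a comment:

```python
import heapq

# Hypothetical relevance scores a first-stage classifier might assign.
site_scores = {"deep-db.example": 0.9, "news.example": 0.2, "forms.example": 0.7}

# Stage 1: visit sites in order of relevance (max-heap via negated score).
site_queue = [(-score, site) for site, score in site_scores.items()]
heapq.heapify(site_queue)

crawl_order = []
while site_queue:
    neg_score, site = heapq.heappop(site_queue)
    crawl_order.append(site)  # Stage 2 would do adaptive in-site link ranking here

print(crawl_order)  # most relevant site first
```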
What is the current status of the Semantic Web, as first described by Tim Berners-Lee in 2001?
Ten blue links are no longer the only thing that can drive you traffic: Google has added many so-called Knowledge cards and panels to answer its users' specific informational needs. Sounds complicated, but it isn't. If you ask for information, Google will try to answer it within the result pages.
I'll share my research from a theoretical point of view, exploring patents and papers, as well as actual test cases in Google's live indices. Getting your site listed as the source of an Answer Card can increase CTR by as much as 16%. How do you get listed? Join my session and I'll shine some light on the factors that come into play when optimizing for Google's Knowledge Graph.
This presentation is about the ranking of web pages; it consists mainly of the PageRank algorithm and the HITS algorithm. It gives a brief account of how to calculate PageRank by looking at the links between pages, and describes different search engine optimization techniques.
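HITS, the second algorithm mentioned, scores each page both as a hub (it links to good authorities) and as an authority (good hubs link to it). A minimal power-iteration sketch on a made-up graph:

```python
# HITS on a small link graph (hypothetical pages): hubs point to good
# authorities; authorities are pointed to by good hubs.
links = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["C"]}

def hits(links, iterations=50):
    auth = {p: 1.0 for p in links}
    hub = {p: 1.0 for p in links}
    for _ in range(iterations):
        # authority score = sum of hub scores of pages linking in
        auth = {p: sum(hub[q] for q in links if p in links[q]) for p in links}
        # hub score = sum of authority scores of pages linked to
        hub = {p: sum(auth[t] for t in links[p]) for p in links}
        # normalize to keep values bounded
        a_norm, h_norm = sum(auth.values()), sum(hub.values())
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return auth, hub

auth, hub = hits(links)
print(max(auth, key=auth.get))  # C: linked to by every hub
print(max(hub, key=hub.get))    # A: links to the most authorities
```

Unlike PageRank, HITS produces two scores per page, which is why it suits query-time ranking of a focused subgraph rather than a whole-web precomputation.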
Presentation given at DMZ about Data Structure Graphs.
Also known as Applying Social Network Analysis Techniques to Data Modeling and Data Architecture
Similar to Analysis of Websites as Graphs for SEO (20)
The microservices architecture seeks to maximize the adaptability of solutions by distributing software responsibilities across services with independent life cycles.
Achieving the independence of the microservices is key to reaping the benefits of the architecture. This requires a deep understanding of the functional domain, which is achieved through DDD.
Hexagonal architecture, in turn, lets us structure the software so that the code layer related to the functional domain is not interfered with by technological concerns; that is, so that this layer expresses only the Ubiquitous Language, as DDD calls the language of the model.
This separation into layers, together with inverting the dependencies, also guarantees maximum portability of the code.
What will we cover?
1. Benefits
2. Domain Driven Design.
- Concepts - Big Picture.
- Concepts - Code architecture.
- Event Storming.
3. Clean Code Architecture.
- Hexagonal Architecture.
- Onion Architecture.
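The ports-and-adapters idea behind hexagonal architecture can be sketched in a few lines of Python (the talk's context suggests Java and Spring; the class names here are purely illustrative):

```python
from abc import ABC, abstractmethod

# Port: an interface expressed in the domain's ubiquitous language.
class OrderRepository(ABC):
    @abstractmethod
    def save(self, order): ...

# Domain service: depends only on the port, never on a concrete technology.
class PlaceOrder:
    def __init__(self, repository: OrderRepository):
        self.repository = repository

    def execute(self, order):
        self.repository.save(order)
        return order

# Adapter: a technology-specific implementation living outside the domain.
class InMemoryOrderRepository(OrderRepository):
    def __init__(self):
        self.orders = []
    def save(self, order):
        self.orders.append(order)

repo = InMemoryOrderRepository()
PlaceOrder(repo).execute({"id": 1})
print(len(repo.orders))  # 1
```

Because the dependency points from adapter to port, the domain layer can be ported to another database or framework by swapping the adapter only.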
Bots 3.0: Beyond conversational bots with Dialogflow (Paradigma Digital)
Personalized service and automation of operations with AI, made simple with Dialogflow. By the end of this talk you will be able to create a bot with Dialogflow that handles simple tasks.
In this talk we will look at:
- The business needs this kind of solution satisfies
- Alternatives on the market
- Solving the need with Dialogflow
Speaker: Alex Asensio - Business Lead at Paradigma Digital
Pragmatic and always focused on business goals. In love with technology, but also with the way we deliver software to our clients, based on empiricism. Tech + Biz hand in hand is the success formula we want to share with them.
In this new installment on service mesh we look at what will probably become the reference product: Istio.
We analyse the functionality it provides, its internal architecture, its integration with third-party products, and its impact on current architectures. We will walk through some examples showing its functionality and configuration.
Speaker:
Abraham Rodríguez specializes in cloud-native solutions with microservices architectures, a stack he has worked with on several projects. A passionate advocate of everything related to cloud, agile methodologies, free software and devops.
In this presentation we talk about Linkerd, one of the pioneers of "service mesh architectures". We review the product's history, cover its main features, and include a hands-on part showing its integration into distributed architectures alongside Docker and Kubernetes.
How do I make my APIs usable?
Through an example developed in Spring, we will walk through the whole design process, applying a set of good practices that help you make decisions when facing API design.
In this meetup we analyse one of the basic pillars of companies' digital transformation: API Management. We explain what this strategy consists of, and the different concepts and components involved. To complete the picture with a practical case, we show an example implementation using one of the most successful open source API Management products on the market: WSO2.
https://www.meetup.com/Microservicios
Solr is an open source search engine that provides very powerful tools for searching text fields. This talk covers its basic characteristics and main features, both basic (indexing and search) and advanced (highlighting, spell checking and similar results).
Given by Alejandro Marqués
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
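Automated data validation of the kind described in item 4 can be as simple as a list of named checks applied to each record at the point of ingestion. The schema and rules here are hypothetical:

```python
# A tiny automated validation pass: each check runs at ingestion so bad
# records are flagged at the source (hypothetical schema and rules).
records = [
    {"id": 1, "revenue": 100.0, "region": "EU"},
    {"id": 2, "revenue": -5.0,  "region": "EU"},   # negative revenue
    {"id": 3, "revenue": 40.0,  "region": ""},     # missing region
]

checks = [
    ("non_negative_revenue", lambda r: r["revenue"] >= 0),
    ("region_present",       lambda r: bool(r["region"])),
]

failures = [(r["id"], name) for r in records
            for name, check in checks if not check(r)]
print(failures)  # [(2, 'non_negative_revenue'), (3, 'region_present')]
```

In a production pipeline the failures list would feed an alert or quarantine step rather than a print, catching errors before they propagate downstream.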
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
1. Analysis of Websites as Graphs for SEO
Analysis of Websites as Graphs for SEO
Rubén Martínez – June 2015 – Open Analytics Madrid
2. Analysis of Websites as Graphs for SEO
Items (books, music, etc.) used to be arranged in tight silos by categories.
3. Analysis of Websites as Graphs for SEO
There is more to websites than meets the eye
Has a website ever been this boring? We tend to think of websites as a homepage on top, followed by a second layer of children webpages (categories), a third level below (sub-categories) and pages of items (products, articles, etc.) at the bottom. Happily, reality is not so simple!
4. Analysis of Websites as Graphs for SEO
First-ever website - 1990
Source: Tim Berners-Lee's web catalog at CERN. A copy is available at http://www.w3.org/History/19921103-hypertext/hypertext/WWW/TheProject.html
Not even the 1st-ever website was a simple hierarchical tree of categories and sub-categories.
5. Analysis of Websites as Graphs for SEO
Websites are graphs
Graph theory: a graph is an ordered pair G = (V, E) comprising a set V of vertices or nodes together with a set E of edges or links.
Websites: websites are graphs whose webpages are nodes and whose internal links are directed edges.
Actual websites are a more organic, messy business. Visualization of a 300-page ecommerce website.
6. Analysis of Websites as Graphs for SEO
Link analysis in graph theory
PageRank is a link analysis algorithm. It assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. It outputs a probability distribution that represents the likelihood that a person clicking on links will arrive at any particular page.
Google's reasonable surfer model weights hyperlinks by their position on the page.
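The probability distribution PageRank outputs can be approximated with a short power-iteration sketch. The damping factor of 0.85 and the toy graph below are assumptions for illustration; this is not Google's actual implementation:

```python
# Minimal PageRank by power iteration (a sketch, not Google's implementation).
# graph maps each page to the pages it links to; d is the damping factor.
def pagerank(graph, d=0.85, iterations=50):
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - d) / n for p in pages}
        for p, links in graph.items():
            if links:
                share = d * rank[p] / len(links)
                for q in links:
                    new[q] += share
            else:  # dangling page: spread its rank over all pages
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

# Toy site: every page links back to the home page, so home ranks highest.
toy = {"home": ["a", "b"], "a": ["home"], "b": ["home", "a"]}
ranks = pagerank(toy)
print(max(ranks, key=ranks.get))  # "home"
```

The ranks sum to 1, which is what makes the result a probability distribution over pages.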
7. Analysis of Websites as Graphs for SEO
Optimization of PageRank in websites
The PageRank is diluted with every level down the structure of categories and sub-categories. This is a waste of expensive PageRank; the same information can sit on a leaner, more efficient web architecture.
PageRank is not as important in SEO as it used to be, but it is still useful to optimise web architectures. On-page SEO is mostly about analysing graphs, measuring them and optimising them empirically and iteratively.
8. Analysis of Websites as Graphs for SEO
Steps of the analysis of websites
• Crawling a website
• Cleaning the output of inlinks (csv file: Source,Destination)
• Visualizing the graph
• Analysing the relations of specific nodes
• Parameterizing the whole graph
SEO experts are usually presented with inefficient websites that require rationalization and, more often than not, extensive re-indexation on Google. Understanding and parameterizing the graph of a website before and after radical changes of its structure is key.
We build a comma-separated value file with pairs of URLs linking to other URLs. The csv file contains the data of the connected graph that can be visualized, parameterized and analysed.
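The Source,Destination file described above can be sketched with Python's standard csv module (the URLs are invented for illustration):

```python
import csv
import io

# Build the Source,Destination edge list in memory; URLs are made up.
edges = [
    ("http://example.com/", "http://example.com/books"),
    ("http://example.com/books", "http://example.com/books/sci-fi"),
    ("http://example.com/books/sci-fi", "http://example.com/"),
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Source", "Destination"])
writer.writerows(edges)

# Read it back as the list of directed edges to visualize and parameterize.
buffer.seek(0)
reader = csv.reader(buffer)
next(reader)  # skip the header row
graph = [tuple(row) for row in reader]

print(len(graph))  # 3 edges
```

The same file is what Gephi, igraph and Neo4j consume in the later steps.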
9. Analysis of Websites as Graphs for SEO
Crawling and exporting a csv file of inlinks
1st step – Crawl a significant sample of the webpages of a website
Desktop applications:
• Screaming Frog (fee per licence, all OS)
• Xenu Link Sleuth (free, Windows)
Bash scripts using command-line tools. Beware: poorly written scripts might not be polite.
• curl
• wget
2nd step – Scrape if you have to get specific snippets of text from the crawled pages: Scrapy in Python ($ pip install scrapy)
3rd step – Extract data if you have to get specific URLs linked from the scraped text: Beautiful Soup, a Python library for pulling data out of HTML and XML files
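As a dependency-free alternative to the Beautiful Soup step, the standard library's html.parser can pull the linked URLs out of a crawled page. The HTML snippet below is made up for illustration:

```python
from html.parser import HTMLParser

# Collect the href of every <a> tag, i.e. the outgoing edges of a page.
class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

html = """<html><body>
<a href="/program">Program</a>
<a href="/speakers">Speakers</a>
<img src="/logo.png">
</body></html>"""

parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['/program', '/speakers']
```

Pairing each crawled URL with the hrefs extracted from it yields exactly the Source,Destination rows of the edge-list csv.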
10. Analysis of Websites as Graphs for SEO
Cleansing & grooming of the output .csv file
Output: csv files with the crawled inlinks
Origin, Destination
URL 1, URL 2
URL 2, URL 3
URL 1, URL 3
…
URL n, URL m
Clean and filter: best with bash one-liners
#!/bin/bash
FILE=
DOMAIN=
cut -f2,3 "$FILE" \
  | sed -e "s|http://$DOMAIN||g" -e "s|http://www.$DOMAIN||g" -e 's/\t/,/g' \
  | grep -viE '\.jpg|http:|\.css|\.js|\.gif|\.png|@|mailto|xml|http|\?|=' \
  > filtered.csv
11. Analysis of Websites as Graphs for SEO
Visualization of a website or part of it
Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs. It performs poorly with large graphs (tens of thousands of nodes and hundreds of thousands of inlinks).
Other promising tools:
KeyLines http://keylines.com/neo4j
Tulip http://tulip.labri.fr/TulipDrupal/
12. Analysis of Websites as Graphs for SEO
Example 1 - Graph of the website of an annual conference
The home (dark green node in the center) links down to categories (light green or light orange) like the page of the program, which in its turn links down to item pages (dark orange) with the description of each talk, the bio of the speaker, etc. This web architecture seems efficient, but item pages might be better connected to the whole graph.
The cluster on the right is the 1st edition of the event (few talks). The cluster on the left is the 2nd edition of the event (more talks).
13. Analysis of Websites as Graphs for SEO
Example 2 - Graph of the website of a shopping website
The orange dots are products and the green balls categories. Why do they ALL connect to each other? Aren't there products more relevant to users and to the business than others? Some products get more traffic but yield less margin. The optimal web architecture overweighs the internal linking to the most popular products with the highest revenue or margin.
This looks like a programmatic linking scheme. Ecommerce is usually more complex than it is represented here.
14. Analysis of Websites as Graphs for SEO
Example 3 - Graphs of 2 directly competing websites
This looks like an organic network of clusters connecting other clusters and distant nodes with thin links. This is a dense pack of many webpages connecting to many other webpages without discernible patterns or clusters.
These graphs are small samples of 2 large websites competing for the same keywords on Google. Both websites are successful SEO propositions with radically different approaches. Why?
15. Analysis of Websites as Graphs for SEO
The power of weak links
Thin connections tend to link the clusters, allowing information to move between them. These networks are usually efficient enough in terms of SEO.
Source: Giles, Jim. "Making the links." Nature, Aug 23rd 2012.
16. Analysis of Websites as Graphs for SEO
Analysis of the whole graph
igraph is a collection of network analysis tools. It is available in R.
library(igraph)
dat=read.csv(file.choose(),header=TRUE) # choose an edgelist in .csv file format
summary(dat)
g=graph.data.frame(dat,directed=TRUE)
vcount(g)
200637
ecount(g)
4174400
centralization.degree(g)
0.4998589
17. Analysis of Websites as Graphs for SEO
Analysis of the whole graph - parameters
transitivity(g)
0.001666909
graph.density(g)
0.0001036989
igraph calculates metrics of whole graphs with built-in functions. Transitivity, or clustering coefficient, measures the probability that the adjacent vertices of a vertex are connected. This metric, along with the graph density, is a useful reference to compare websites between them, or one website before and after changes in its web architecture.
website5 has the lowest values of transitivity and density: increasing them would result in improved SEO.
graph     vertices  edges    diameter  transitivity  graph density
website1  8305      34185    30        0.007959      0.000499
website2  10852     88732    16        0.004671      0.000721
website3  11272     71035    20        0.004017      0.000639
website4  11593     47380    32        0.003730      0.001088
website5  200637    4174400  n/a       0.001667      0.000104
18. Analysis of Websites as Graphs for SEO
Analysis of specific nodes
http://console.neo4j.org/
MATCH (n:Crew)-[r:LOVES*]-(m)
WHERE n.name='Neo'
RETURN n, m
n, m
(0:Crew {name:"Neo"}), (2:Crew {name:"Trinity"})
19. Analysis of Websites as Graphs for SEO
Analysis of specific nodes
Count the number of nodes connected to one node:
MATCH (n { name: 'Neo' })-->(x)
RETURN x
(2:Crew {name:"Trinity"})
(1:Crew {name:"Morpheus"})
MATCH (n { name: 'Neo' })-->(x)
RETURN n, count(*)
n, count(*)
(0:Crew {name:"Neo"}), 2
20. Analysis of Websites as Graphs for SEO
Analysis of specific nodes
MATCH (n:Crew)-[r:KNOWS*]-(m:Matrix)
WHERE n.name='Neo'
RETURN m
(3:Crew:Matrix {name:"Cypher"})
(4:Matrix {name:"Agent Smith"})
Find the shortest path between n and m of type :LOVES:
MATCH p = shortestPath((n:Crew)-[:LOVES*]->(m:Matrix))
WHERE n.name='Neo'
RETURN p AS Neo, m
21. Analysis of Websites as Graphs for SEO
That’s all Folks!
Thank you.
Rubén Martínez
@ruben_at_it
rmartinez@paradigmatecnologico.com