SlideShare a Scribd company logo
1 of 47
Download to read offline
Ranking the Web with Spark
Apache Big Data Europe 2016
sylvain@sylvainzimmer.com
@sylvinus
/usr/bin/whoami
• Jamendo (Founder & CTO, 2004-2011)
• TEDxParis (Co-founder, 2009-2012)
• dotConferences (Founder, 2012-)
• Pricing Assistant (Co-founder & CTO, 2012-)
transparency
reproducibility
https://uidemo.commonsearch.org
https://explain.commonsearch.org/?q=python&g=en
Ranking
Disclaimer: IANASRE
(I Am Not A Search Relevance Engineer)
What's in a score
score = fn( doc, query, language, user, time )
What's in a score
score = fn( doc, query )
What's in a score
score = fn( static_score, dynamic_score ( query ))
Static score
Static features
• Scopes:
• Page: URL depth, markup stats, ...
• Domain: Age, page count, blacklists, ...
• WebGraph: PageRank, ...
http://infolab.stanford.edu/~backrub/google.html
The Anatomy of a Large-Scale Hypertextual Web Search Engine (1998)
Crawler
Indexer
Database
SearcherRanker
Dynamic score
Dynamic features
• Text match: TF-IDF, BM25, proximity, topic, ...
• Query-level: number of words, popularity, ...
• Usage: clicks, dwell time, reformulations, ...
• Time
Scoring function
Users
Database
Elasticsearch
Indexer
Python, Spark
Data sources
Common Crawl, Alexa top 1M, ...
words, static score
query top 10 docs, final scores
Offline
Online
Searcher
Go
https://explain.commonsearch.org/?q=python&g=en
Issues with this architecture
• Static & dynamic scoring are in different
codebases
• No control over result diversity
• Hard to optimize
• Very dependent on Elasticsearch
Rescoring
Users
Database
Indexer
words, static score, features
query
Searcher
top 1k docs, features
Rescorer
final 10 docs
Issues with rescoring
• Latency
• Pagination
• Harder to explain
Learning to rank
LTR Model
• Features
• Training dataset
• Evaluation: NDCG, ERR, ...
• Algorithms: AdaRank, ListNet, LambdaMART, ...
• Learning with Spark!
The right questions
• What do users expect?
• What features?
• How to evaluate and fine-tune in the real world?
PageRank with Spark
http://commoncrawl.org
https://github.com/commonsearch/cosr-back
Common Search Pipeline
Doc sources
Common Crawl,
WARC files,
URLs ...
Filter
plugins
Document
parsing
Output
plugins
Data output
Database, file,
HDFS, S3, ...
Most popular Wikipedia pages
Dumping the web graph
Naive pyspark PageRank
GraphFrames
SparkSQL PageRank
SparkSQL PageRank
https://github.com/commonsearch/cosr-back/blob/master/spark/jobs/pagerank.py
Tests
http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm
https://github.com/commonsearch/cosr-back/blob/master/tests/sparktests/test_pagerank.py
https://about.commonsearch.org/developer/get-started
Top 10
Spam
Spamdexing
• Keyword stuffing, hidden text
• Scraper sites, Mirrors
• Link farms
• Splogs, Comment spam
• Domaining
• Cloaking
• Bombing
Questions?
https://about.commonsearch.org/contributing
https://github.com/commonsearch
contact@commonsearch.org
slack.commonsearch.org

More Related Content

What's hot

RDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL PlatformsRDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL PlatformsGraph-TA
 
Graph Databases for SQL Server Professionals
Graph Databases for SQL Server ProfessionalsGraph Databases for SQL Server Professionals
Graph Databases for SQL Server ProfessionalsStéphane Fréchette
 
Graphs, Graphs everywhere - Lucene powered relation exploration
Graphs, Graphs everywhere - Lucene powered relation explorationGraphs, Graphs everywhere - Lucene powered relation exploration
Graphs, Graphs everywhere - Lucene powered relation explorationZbyszko Papierski
 
Search Intelligently - Liferay Symposium North America 2016, Chicago, USA
Search Intelligently - Liferay Symposium North America 2016, Chicago, USASearch Intelligently - Liferay Symposium North America 2016, Chicago, USA
Search Intelligently - Liferay Symposium North America 2016, Chicago, USAAndré Ricardo Barreto de Oliveira
 
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...South London Geek Nights
 
Deriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataDeriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataGraph-TA
 
Managing RDF data with graph databases
Managing RDF data with graph databasesManaging RDF data with graph databases
Managing RDF data with graph databasesGraph-TA
 
How Graph Databases efficiently store, manage and query connected data at s...
How Graph Databases efficiently  store, manage and query  connected data at s...How Graph Databases efficiently  store, manage and query  connected data at s...
How Graph Databases efficiently store, manage and query connected data at s...jexp
 
Elasticsearch @JBoss.org, 2014
Elasticsearch @JBoss.org, 2014Elasticsearch @JBoss.org, 2014
Elasticsearch @JBoss.org, 2014Lukas Vlcek
 

What's hot (11)

RDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL PlatformsRDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL Platforms
 
Graph Databases for SQL Server Professionals
Graph Databases for SQL Server ProfessionalsGraph Databases for SQL Server Professionals
Graph Databases for SQL Server Professionals
 
Graphs, Graphs everywhere - Lucene powered relation exploration
Graphs, Graphs everywhere - Lucene powered relation explorationGraphs, Graphs everywhere - Lucene powered relation exploration
Graphs, Graphs everywhere - Lucene powered relation exploration
 
Search Intelligently - Liferay Symposium North America 2016, Chicago, USA
Search Intelligently - Liferay Symposium North America 2016, Chicago, USASearch Intelligently - Liferay Symposium North America 2016, Chicago, USA
Search Intelligently - Liferay Symposium North America 2016, Chicago, USA
 
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
 
DBpedia Japanese
DBpedia JapaneseDBpedia Japanese
DBpedia Japanese
 
Deriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataDeriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF Data
 
Managing RDF data with graph databases
Managing RDF data with graph databasesManaging RDF data with graph databases
Managing RDF data with graph databases
 
NoSQL Roundup
NoSQL RoundupNoSQL Roundup
NoSQL Roundup
 
How Graph Databases efficiently store, manage and query connected data at s...
How Graph Databases efficiently  store, manage and query  connected data at s...How Graph Databases efficiently  store, manage and query  connected data at s...
How Graph Databases efficiently store, manage and query connected data at s...
 
Elasticsearch @JBoss.org, 2014
Elasticsearch @JBoss.org, 2014Elasticsearch @JBoss.org, 2014
Elasticsearch @JBoss.org, 2014
 

Viewers also liked

Why and how Pricing Assistant migrated from Celery to RQ - Paris.py #2
Why and how Pricing Assistant migrated from Celery to RQ - Paris.py #2Why and how Pricing Assistant migrated from Celery to RQ - Paris.py #2
Why and how Pricing Assistant migrated from Celery to RQ - Paris.py #2Sylvain Zimmer
 
PyCon FR 2016 - Et si on recodait Google en Python ?
PyCon FR 2016 - Et si on recodait Google en Python ?PyCon FR 2016 - Et si on recodait Google en Python ?
PyCon FR 2016 - Et si on recodait Google en Python ?Sylvain Zimmer
 
Predictive modeling healthcare
Predictive modeling healthcarePredictive modeling healthcare
Predictive modeling healthcareTaposh Roy
 
Building distributed processing system from scratch - Part 2
Building distributed processing system from scratch - Part 2Building distributed processing system from scratch - Part 2
Building distributed processing system from scratch - Part 2datamantra
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streamingdatamantra
 
Keyboard covert channels
Keyboard covert channelsKeyboard covert channels
Keyboard covert channelsFreeman Zhang
 
AMP Camp 5 Intro
AMP Camp 5 IntroAMP Camp 5 Intro
AMP Camp 5 Introjeykottalam
 
Introduction to dataset
Introduction to datasetIntroduction to dataset
Introduction to datasetdatamantra
 
Evolution of apache spark
Evolution of apache sparkEvolution of apache spark
Evolution of apache sparkdatamantra
 
Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2datamantra
 
Getting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache MesosGetting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache MesosPaco Nathan
 
Anatomy of in memory processing in Spark
Anatomy of in memory processing in SparkAnatomy of in memory processing in Spark
Anatomy of in memory processing in Sparkdatamantra
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFramesSpark Summit
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streamingdatamantra
 
Building Distributed Systems from Scratch - Part 1
Building Distributed Systems from Scratch - Part 1Building Distributed Systems from Scratch - Part 1
Building Distributed Systems from Scratch - Part 1datamantra
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLdatamantra
 
Resilient Distributed DataSets - Apache SPARK
Resilient Distributed DataSets - Apache SPARKResilient Distributed DataSets - Apache SPARK
Resilient Distributed DataSets - Apache SPARKTaposh Roy
 
Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in ScalaAlex Payne
 

Viewers also liked (20)

Why and how Pricing Assistant migrated from Celery to RQ - Paris.py #2
Why and how Pricing Assistant migrated from Celery to RQ - Paris.py #2Why and how Pricing Assistant migrated from Celery to RQ - Paris.py #2
Why and how Pricing Assistant migrated from Celery to RQ - Paris.py #2
 
PyCon FR 2016 - Et si on recodait Google en Python ?
PyCon FR 2016 - Et si on recodait Google en Python ?PyCon FR 2016 - Et si on recodait Google en Python ?
PyCon FR 2016 - Et si on recodait Google en Python ?
 
Predictive modeling healthcare
Predictive modeling healthcarePredictive modeling healthcare
Predictive modeling healthcare
 
Building distributed processing system from scratch - Part 2
Building distributed processing system from scratch - Part 2Building distributed processing system from scratch - Part 2
Building distributed processing system from scratch - Part 2
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
 
Keyboard covert channels
Keyboard covert channelsKeyboard covert channels
Keyboard covert channels
 
AMP Camp 5 Intro
AMP Camp 5 IntroAMP Camp 5 Intro
AMP Camp 5 Intro
 
Spark sql
Spark sqlSpark sql
Spark sql
 
Introduction to dataset
Introduction to datasetIntroduction to dataset
Introduction to dataset
 
Evolution of apache spark
Evolution of apache sparkEvolution of apache spark
Evolution of apache spark
 
Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2
 
Spark on yarn
Spark on yarnSpark on yarn
Spark on yarn
 
Getting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache MesosGetting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache Mesos
 
Anatomy of in memory processing in Spark
Anatomy of in memory processing in SparkAnatomy of in memory processing in Spark
Anatomy of in memory processing in Spark
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
 
Building Distributed Systems from Scratch - Part 1
Building Distributed Systems from Scratch - Part 1Building Distributed Systems from Scratch - Part 1
Building Distributed Systems from Scratch - Part 1
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQL
 
Resilient Distributed DataSets - Apache SPARK
Resilient Distributed DataSets - Apache SPARKResilient Distributed DataSets - Apache SPARK
Resilient Distributed DataSets - Apache SPARK
 
Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in Scala
 

Similar to Ranking the Web with Spark

Information Architecture - Don't Make Me Think
Information Architecture - Don't Make Me ThinkInformation Architecture - Don't Make Me Think
Information Architecture - Don't Make Me ThinkKerry Dirks MCPS MS
 
2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato Review2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato ReviewHang Li
 
Mark Tortoricci - Talent42 2015
Mark Tortoricci - Talent42 2015Mark Tortoricci - Talent42 2015
Mark Tortoricci - Talent42 2015Talent42
 
SPLive Orlando - Beyond the Search Center - Application or Solution?
SPLive Orlando - Beyond the Search Center - Application or Solution?SPLive Orlando - Beyond the Search Center - Application or Solution?
SPLive Orlando - Beyond the Search Center - Application or Solution?Agnes Molnar
 
Feature driven agile oriented web applications
Feature driven agile oriented web applicationsFeature driven agile oriented web applications
Feature driven agile oriented web applicationsRam G Athreya
 
Search Engine Optimization (SEO) 101
Search Engine Optimization (SEO) 101Search Engine Optimization (SEO) 101
Search Engine Optimization (SEO) 101pointit
 
SEO in the Age of Artificial Intelligence | How AI influences Search
SEO in the Age of Artificial Intelligence | How AI influences SearchSEO in the Age of Artificial Intelligence | How AI influences Search
SEO in the Age of Artificial Intelligence | How AI influences SearchPhilipp Klöckner
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...Big Data Spain
 
Ordering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect dataOrdering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect dataAndy Stretton
 
Arron daniels 1 pager researching the tech talent market
Arron daniels 1 pager   researching the tech talent marketArron daniels 1 pager   researching the tech talent market
Arron daniels 1 pager researching the tech talent marketTalent42
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLPaco Nathan
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningPaco Nathan
 
Search Analytics for Fun and Profit
Search Analytics for Fun and ProfitSearch Analytics for Fun and Profit
Search Analytics for Fun and ProfitLouis Rosenfeld
 
The Human Side of Productivity
The Human Side of ProductivityThe Human Side of Productivity
The Human Side of ProductivityTathagat Varma
 
Tri vuong-resume
Tri vuong-resumeTri vuong-resume
Tri vuong-resumeTri Vuong
 
Rapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopRapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopPeter Skomoroch
 
Semtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorialSemtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorialBarbara Starr
 
Search Analytics: Powerful diagnostics for your site
Search Analytics:  Powerful diagnostics for your siteSearch Analytics:  Powerful diagnostics for your site
Search Analytics: Powerful diagnostics for your siteLouis Rosenfeld
 
Data Driven Design: Using Web Analytics to Improve Information Architectures
Data Driven Design: Using Web Analytics to Improve Information ArchitecturesData Driven Design: Using Web Analytics to Improve Information Architectures
Data Driven Design: Using Web Analytics to Improve Information ArchitecturesAndrea Wiggins
 
A fresh new look into Information Gathering - OWASP Spain
A fresh new look into Information Gathering - OWASP SpainA fresh new look into Information Gathering - OWASP Spain
A fresh new look into Information Gathering - OWASP SpainChristian Martorella
 

Similar to Ranking the Web with Spark (20)

Information Architecture - Don't Make Me Think
Information Architecture - Don't Make Me ThinkInformation Architecture - Don't Make Me Think
Information Architecture - Don't Make Me Think
 
2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato Review2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato Review
 
Mark Tortoricci - Talent42 2015
Mark Tortoricci - Talent42 2015Mark Tortoricci - Talent42 2015
Mark Tortoricci - Talent42 2015
 
SPLive Orlando - Beyond the Search Center - Application or Solution?
SPLive Orlando - Beyond the Search Center - Application or Solution?SPLive Orlando - Beyond the Search Center - Application or Solution?
SPLive Orlando - Beyond the Search Center - Application or Solution?
 
Feature driven agile oriented web applications
Feature driven agile oriented web applicationsFeature driven agile oriented web applications
Feature driven agile oriented web applications
 
Search Engine Optimization (SEO) 101
Search Engine Optimization (SEO) 101Search Engine Optimization (SEO) 101
Search Engine Optimization (SEO) 101
 
SEO in the Age of Artificial Intelligence | How AI influences Search
SEO in the Age of Artificial Intelligence | How AI influences SearchSEO in the Age of Artificial Intelligence | How AI influences Search
SEO in the Age of Artificial Intelligence | How AI influences Search
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Ordering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect dataOrdering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect data
 
Arron daniels 1 pager researching the tech talent market
Arron daniels 1 pager   researching the tech talent marketArron daniels 1 pager   researching the tech talent market
Arron daniels 1 pager researching the tech talent market
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area ML
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
Search Analytics for Fun and Profit
Search Analytics for Fun and ProfitSearch Analytics for Fun and Profit
Search Analytics for Fun and Profit
 
The Human Side of Productivity
The Human Side of ProductivityThe Human Side of Productivity
The Human Side of Productivity
 
Tri vuong-resume
Tri vuong-resumeTri vuong-resume
Tri vuong-resume
 
Rapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopRapid Data Exploration With Hadoop
Rapid Data Exploration With Hadoop
 
Semtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorialSemtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorial
 
Search Analytics: Powerful diagnostics for your site
Search Analytics:  Powerful diagnostics for your siteSearch Analytics:  Powerful diagnostics for your site
Search Analytics: Powerful diagnostics for your site
 
Data Driven Design: Using Web Analytics to Improve Information Architectures
Data Driven Design: Using Web Analytics to Improve Information ArchitecturesData Driven Design: Using Web Analytics to Improve Information Architectures
Data Driven Design: Using Web Analytics to Improve Information Architectures
 
A fresh new look into Information Gathering - OWASP Spain
A fresh new look into Information Gathering - OWASP SpainA fresh new look into Information Gathering - OWASP Spain
A fresh new look into Information Gathering - OWASP Spain
 

More from Sylvain Zimmer

Developer-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing oneDeveloper-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing oneSylvain Zimmer
 
[fr] Introduction et Live-code Backbone.js à DevoxxFR 2013
[fr] Introduction et Live-code Backbone.js à DevoxxFR 2013[fr] Introduction et Live-code Backbone.js à DevoxxFR 2013
[fr] Introduction et Live-code Backbone.js à DevoxxFR 2013Sylvain Zimmer
 
140byt.es - The Dark Side of Javascript
140byt.es - The Dark Side of Javascript140byt.es - The Dark Side of Javascript
140byt.es - The Dark Side of JavascriptSylvain Zimmer
 
Joshfire Framework 0.9 Technical Overview
Joshfire Framework 0.9 Technical OverviewJoshfire Framework 0.9 Technical Overview
Joshfire Framework 0.9 Technical OverviewSylvain Zimmer
 
Javascript Views, Client-side or Server-side with NodeJS
Javascript Views, Client-side or Server-side with NodeJSJavascript Views, Client-side or Server-side with NodeJS
Javascript Views, Client-side or Server-side with NodeJSSylvain Zimmer
 
no.de quick presentation at #ParisJS 4
no.de quick presentation at #ParisJS 4no.de quick presentation at #ParisJS 4
no.de quick presentation at #ParisJS 4Sylvain Zimmer
 
Kinect + Javascript tech talk at #ParisJS Jan 2011
Kinect + Javascript tech talk at #ParisJS Jan 2011Kinect + Javascript tech talk at #ParisJS Jan 2011
Kinect + Javascript tech talk at #ParisJS Jan 2011Sylvain Zimmer
 
Web Crawling with NodeJS
Web Crawling with NodeJSWeb Crawling with NodeJS
Web Crawling with NodeJSSylvain Zimmer
 
Archicamp présentation
Archicamp présentationArchicamp présentation
Archicamp présentationSylvain Zimmer
 
Twisted presentation & Jamendo usecases
Twisted presentation & Jamendo usecasesTwisted presentation & Jamendo usecases
Twisted presentation & Jamendo usecasesSylvain Zimmer
 

More from Sylvain Zimmer (10)

Developer-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing oneDeveloper-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing one
 
[fr] Introduction et Live-code Backbone.js à DevoxxFR 2013
[fr] Introduction et Live-code Backbone.js à DevoxxFR 2013[fr] Introduction et Live-code Backbone.js à DevoxxFR 2013
[fr] Introduction et Live-code Backbone.js à DevoxxFR 2013
 
140byt.es - The Dark Side of Javascript
140byt.es - The Dark Side of Javascript140byt.es - The Dark Side of Javascript
140byt.es - The Dark Side of Javascript
 
Joshfire Framework 0.9 Technical Overview
Joshfire Framework 0.9 Technical OverviewJoshfire Framework 0.9 Technical Overview
Joshfire Framework 0.9 Technical Overview
 
Javascript Views, Client-side or Server-side with NodeJS
Javascript Views, Client-side or Server-side with NodeJSJavascript Views, Client-side or Server-side with NodeJS
Javascript Views, Client-side or Server-side with NodeJS
 
no.de quick presentation at #ParisJS 4
no.de quick presentation at #ParisJS 4no.de quick presentation at #ParisJS 4
no.de quick presentation at #ParisJS 4
 
Kinect + Javascript tech talk at #ParisJS Jan 2011
Kinect + Javascript tech talk at #ParisJS Jan 2011Kinect + Javascript tech talk at #ParisJS Jan 2011
Kinect + Javascript tech talk at #ParisJS Jan 2011
 
Web Crawling with NodeJS
Web Crawling with NodeJSWeb Crawling with NodeJS
Web Crawling with NodeJS
 
Archicamp présentation
Archicamp présentationArchicamp présentation
Archicamp présentation
 
Twisted presentation & Jamendo usecases
Twisted presentation & Jamendo usecasesTwisted presentation & Jamendo usecases
Twisted presentation & Jamendo usecases
 

Recently uploaded

CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 

Recently uploaded (20)

CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 

Ranking the Web with Spark