Personal Information
Organization / Workplace
Espoo, Finland
Occupation
Big Data Analytics Architect
Industry
Technology / Software / Internet
Website
fi.linkedin.com/in/dshestakov/
Tags
deep web
web crawling
hidden web
web crawler
web databases
search interfaces
web forms
web size
collaborative crawling
intelligent crawling
web metrics
apache hadoop
hadoop tuning
image similarity search
hadoop
mapreduce
tutorial
web ecosystem
deep web characterization
review
adaptive web crawling
atlanta
crawler architecture
crawling strategies
hadoop smart deployment
russian web
image search
image retrieval
hadoop job execution
map waves
image indexing
big data
web
form classifier
web data
hadoop cluster
hadoop jobs
hadoop optimization
hadoop job history
hadoop summit
amsterdam
hadoop joins
hadoop monitoring
algorithms
web engineering
web frontier
web robots
web spiders
spiders
robots
web coverage
web link structure
distributed web crawling
url frontier
stratified random sampling
random sampling
finland
web intelligence
wi-iat
usa
web structure
adaptive crawling
incremental crawling
focused crawling
publicly indexable web
web database
search forms
denmark
google
russian deep web
interface crawlers
perl
non-html forms
javascript-rich
web form crawler
mysql
dequel
deque
form query language
invisible web
ip random sampling
deep web size
dissertation
turku
lectio praecursoria
thesis
phd
js-rich
web crawlers
search interface
decision tree
aalborg
crawling algorithms
hdfs block size
hdfs
grid5k
scalability
dns-load balancing
toulouse
ip address
web characterization
host-ip clustering
virtual hosting
stratified sampling
high-dimensional indexing
multimedia retrieval
multithreaded mapper
smart deployment
mapfile
best practice
Presentations (8)
Documents (3)
Likes (64)
Viimeinen keisari
Sophia Shestakova
•
5 years ago
How Will AI Change the Role of the Data Scientist?
Hugo Gävert
•
7 years ago
10 more lessons learned from building Machine Learning systems
Xavier Amatriain
•
8 years ago
Apache Hadoop at 10
Cloudera, Inc.
•
8 years ago
Enabling Python to be a Better Big Data Citizen
Wes McKinney
•
8 years ago
2016 Spark Summit East Keynote: Matei Zaharia
Databricks
•
8 years ago
Node Labels in YARN
DataWorks Summit
•
8 years ago
Nl HUG 2016 Feb Hadoop security from the trenches
Bolke de Bruin
•
8 years ago
Ibis: Scaling Python Analytics on Hadoop and Impala
Wes McKinney
•
8 years ago
Helsinki Spark Meetup Nov 20 2015
Chris Fregly
•
8 years ago
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Cloudera, Inc.
•
8 years ago
Hadoop Backup and Disaster Recovery
Cloudera, Inc.
•
11 years ago
Frontera-Open Source Large Scale Web Crawling Framework
sixtyone
•
8 years ago
Interactive Apache Spark in Your Browser
Cloudera, Inc.
•
8 years ago
PySpark Best Practices
Cloudera, Inc.
•
8 years ago
SQL-on-Hadoop Tutorial
Daniel Abadi
•
8 years ago
Talk given at Internet of Things Helsinki Meetup held at the premise of Zalando
Nissanka Wickremasinghe
•
8 years ago
Distro-independent Hadoop cluster management
DataWorks Summit
•
8 years ago
Apache HBase Performance Tuning
Lars Hofhansl
•
9 years ago
Sampling national deep Web
Denis Shestakov
•
12 years ago
Intelligent web crawling
Denis Shestakov
•
10 years ago
Examplar-based inpainting
Olivier Le Meur
•
9 years ago
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks
•
9 years ago
Search Interfaces on the Web: Querying and Characterizing, PhD dissertation
Denis Shestakov
•
10 years ago
Terabyte-scale image similarity search: experience and best practice
Denis Shestakov
•
10 years ago
Current challenges in web crawling
Denis Shestakov
•
10 years ago
The Evolution of Hadoop at Spotify - Through Failures and Pain
Rafał Wojdyła
•
9 years ago
Improving Hadoop Cluster Performance via Linux Configuration
DataWorks Summit
•
9 years ago
Graph Structure in the Web - Revisited. WWW2014 Web Science Track
Chris Bizer
•
10 years ago