SlideShare a Scribd company logo
1 of 38
thinking in key-value stores http://developers.hover.in Bhasker V Kode co-founder  & CTO at hover.in at AWS Chennai November 10th, 2009
summary of tech at hover.in ,[object Object],[object Object],[object Object],http://developers.hover.in
typical webapp use-cases ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],http://developers.hover.in
typical crawler use-cases ,[object Object],[object Object],[object Object],[object Object],http://developers.hover.in
http://developers.hover.in crawler infrastructure at hover.in ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
http://developers.hover.in crawler infrastructure at hover.in ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
http://developers.hover.in # flowcontrol
http://developers.hover.in “ The domino effect is a chain reaction that occurs when a small change causes a similar change nearby, which then will cause another similar change.” via wikipedia page on Domino Effect
[object Object],[object Object],[object Object],[object Object],http://developers.hover.in
http://developers.hover.in crawler infrastructure at hover.in ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
spawning, in practice ,[object Object],[object Object],http://developers.hover.in
http://developers.hover.in “ the small portions of the program which cannot be parallelized will limit the overall speed-up available ” -  Amdahl's  Law wrt parallel computing...
http://developers.hover.in ,[object Object],[object Object],[object Object],[object Object],[object Object]
http://developers.hover.in Further , A & B need'nt be serial, but a batch of A's & a parallel batch of B's running tasks serially. Flowcontrol implemented by tail-recursive servers handling bursty AND slow traffic well.  flowcontrol 1 flowcontrol 2 where  is free cpu / time for handling  2x  B tasks
http://developers.hover.in crawler infrastructure at hover.in ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
http://developers.hover.in # distributed vs local
http://developers.hover.in # replication vs location transparency
http://developers.hover.in how it works at hover.in:  I  , “ node X ” will rather do tasks locally since this data is designated to me, rather than rpc'ing all over like crazy ! The meta-data of  which node does what calculated by   (a) statically assigned  (our choice)   (b) or a hash fn  (c) dynamically reassigned  (maybe later) which is  made available to nodes  by (a) replication or (b) location transparency  (our choice) Node1 Node2 NodeN
http://developers.hover.in crawler infrastructure at hover.in ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
http://developers.hover.in # persistent data vs cyclic queues
revisiting  typical webapp use-cases ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],http://developers.hover.in
fixed-length datastructures to the rescue http://developers.hover.in
http://developers.hover.in fixed-length datastores to the rescue ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],NEW!
http://developers.hover.in crawler infrastructure at hover.in ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
http://developers.hover.in # in-memory vs disk
http://developers.hover.in trends in key-value stores ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
in-memory is the new embedded ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],http://developers.hover.in
http://developers.hover.in popular key-value stores ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],http://developers.hover.in
[object Object],[object Object],[object Object],http://developers.hover.in
7 rules of in-memory capacity planning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],http://developers.hover.in
interesting key-value store features ,[object Object],[object Object],[object Object],http://developers.hover.in
interesting key-value store features - 2 ,[object Object],[object Object],[object Object],http://developers.hover.in
http://developers.hover.in crawler infrastructure at hover.in ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
brief introduction to hover.in choose words from your blog, & decide what content / ad you want when you hover* over it * or other events like click,right click,etc http://developers.hover.in or... the worlds first user-engagement platform for brands via in-text broadcasting or lets web publishers push client-side event handling to the cloud ,  to run various rich applications called  hoverlets demo at  http://start.hover.in/  and  http://hover.in/demo more at  http://hover.in  ,  http://developers.hover.in/blog/
summary of our erlang modules ,[object Object],http://developers.hover.in
thank you http://developers.hover.in
references ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],http://developers.hover.in

More Related Content

What's hot

Functional Comparison and Performance Evaluation of Streaming Frameworks
Functional Comparison and Performance Evaluation of Streaming FrameworksFunctional Comparison and Performance Evaluation of Streaming Frameworks
Functional Comparison and Performance Evaluation of Streaming FrameworksHuafeng Wang
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit
 
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...PROIDEA
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresSteve Loughran
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersDataWorks Summit/Hadoop Summit
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopHadoop User Group
 
Presto in the cloud
Presto in the cloudPresto in the cloud
Presto in the cloudQubole
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Sadayuki Furuhashi
 
Big data: Loading your data with flume and sqoop
Big data:  Loading your data with flume and sqoopBig data:  Loading your data with flume and sqoop
Big data: Loading your data with flume and sqoopChristophe Marchal
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015N Masahiro
 
Distributed Stream Processing on Fluentd / #fluentd
Distributed Stream Processing on Fluentd / #fluentdDistributed Stream Processing on Fluentd / #fluentd
Distributed Stream Processing on Fluentd / #fluentdSATOSHI TAGOMORI
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka Dori Waldman
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 

What's hot (20)

Functional Comparison and Performance Evaluation of Streaming Frameworks
Functional Comparison and Performance Evaluation of Streaming FrameworksFunctional Comparison and Performance Evaluation of Streaming Frameworks
Functional Comparison and Performance Evaluation of Streaming Frameworks
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod Narasimha
 
January 2011 HUG: Pig Presentation
January 2011 HUG: Pig PresentationJanuary 2011 HUG: Pig Presentation
January 2011 HUG: Pig Presentation
 
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
 
Prestogres internals
Prestogres internalsPrestogres internals
Prestogres internals
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object Stores
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 
Presto in the cloud
Presto in the cloudPresto in the cloud
Presto in the cloud
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014
 
January 2011 HUG: Howl Presentation
January 2011 HUG: Howl PresentationJanuary 2011 HUG: Howl Presentation
January 2011 HUG: Howl Presentation
 
Big data: Loading your data with flume and sqoop
Big data:  Loading your data with flume and sqoopBig data:  Loading your data with flume and sqoop
Big data: Loading your data with flume and sqoop
 
Powering a Virtual Power Station with Big Data
Powering a Virtual Power Station with Big DataPowering a Virtual Power Station with Big Data
Powering a Virtual Power Station with Big Data
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015
 
Hdfs high availability
Hdfs high availabilityHdfs high availability
Hdfs high availability
 
Distributed Stream Processing on Fluentd / #fluentd
Distributed Stream Processing on Fluentd / #fluentdDistributed Stream Processing on Fluentd / #fluentd
Distributed Stream Processing on Fluentd / #fluentd
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 

Viewers also liked

Pebs14 hubbub bike summit
Pebs14   hubbub bike summitPebs14   hubbub bike summit
Pebs14 hubbub bike summitBirgit Hess
 
Scientist and inventor
Scientist and inventorScientist and inventor
Scientist and inventorLee W.
 
Data Networks: Next-Generation Optical Access toward 10 Gb/s Everywhere
Data Networks: Next-Generation Optical Access toward 10 Gb/s EverywhereData Networks: Next-Generation Optical Access toward 10 Gb/s Everywhere
Data Networks: Next-Generation Optical Access toward 10 Gb/s EverywhereXi'an Jiaotong-Liverpool University
 
Short-Run Digital Book Printing
Short-Run Digital Book PrintingShort-Run Digital Book Printing
Short-Run Digital Book PrintingRISO printer
 
Thermal bimorph valve operated microthruster.
Thermal bimorph valve operated microthruster.Thermal bimorph valve operated microthruster.
Thermal bimorph valve operated microthruster.SAI SIVA
 
Secure develpment 2014
Secure develpment 2014Secure develpment 2014
Secure develpment 2014Ariel Evans
 
Power point presentation
Power point presentationPower point presentation
Power point presentationsaki-t
 
A brief introduction to Spintronics
A brief introduction to SpintronicsA brief introduction to Spintronics
A brief introduction to SpintronicsRam Kumar K R
 
National river linking project
National river linking projectNational river linking project
National river linking projectRaj_Anand
 
Project oxygen
Project oxygenProject oxygen
Project oxygenlinkoravi
 
Hyderabadi and Andhra Cuisine
Hyderabadi and Andhra CuisineHyderabadi and Andhra Cuisine
Hyderabadi and Andhra Cuisineshweta_1712
 
Presentation on smartphone by Apu
Presentation on smartphone by ApuPresentation on smartphone by Apu
Presentation on smartphone by ApuAshiqul Islam Apu
 
E-books Are Not the Future of Books
E-books Are Not the Future of BooksE-books Are Not the Future of Books
E-books Are Not the Future of BooksXavier Badosa
 

Viewers also liked (20)

Hubbub deck short
Hubbub deck shortHubbub deck short
Hubbub deck short
 
Technology Advancement Core: Our Process
Technology Advancement Core: Our ProcessTechnology Advancement Core: Our Process
Technology Advancement Core: Our Process
 
Pebs14 hubbub bike summit
Pebs14   hubbub bike summitPebs14   hubbub bike summit
Pebs14 hubbub bike summit
 
Scientist and inventor
Scientist and inventorScientist and inventor
Scientist and inventor
 
Bbr security
Bbr securityBbr security
Bbr security
 
Data Networks: Next-Generation Optical Access toward 10 Gb/s Everywhere
Data Networks: Next-Generation Optical Access toward 10 Gb/s EverywhereData Networks: Next-Generation Optical Access toward 10 Gb/s Everywhere
Data Networks: Next-Generation Optical Access toward 10 Gb/s Everywhere
 
Short-Run Digital Book Printing
Short-Run Digital Book PrintingShort-Run Digital Book Printing
Short-Run Digital Book Printing
 
Thermal bimorph valve operated microthruster.
Thermal bimorph valve operated microthruster.Thermal bimorph valve operated microthruster.
Thermal bimorph valve operated microthruster.
 
Powerpoint
PowerpointPowerpoint
Powerpoint
 
Secure develpment 2014
Secure develpment 2014Secure develpment 2014
Secure develpment 2014
 
Power point presentation
Power point presentationPower point presentation
Power point presentation
 
SHMcloud vision
SHMcloud visionSHMcloud vision
SHMcloud vision
 
A brief introduction to Spintronics
A brief introduction to SpintronicsA brief introduction to Spintronics
A brief introduction to Spintronics
 
National river linking project
National river linking projectNational river linking project
National river linking project
 
Project Oxyzen
Project OxyzenProject Oxyzen
Project Oxyzen
 
Project oxygen
Project oxygenProject oxygen
Project oxygen
 
Shaan-E-Awadh
Shaan-E-AwadhShaan-E-Awadh
Shaan-E-Awadh
 
Hyderabadi and Andhra Cuisine
Hyderabadi and Andhra CuisineHyderabadi and Andhra Cuisine
Hyderabadi and Andhra Cuisine
 
Presentation on smartphone by Apu
Presentation on smartphone by ApuPresentation on smartphone by Apu
Presentation on smartphone by Apu
 
E-books Are Not the Future of Books
E-books Are Not the Future of BooksE-books Are Not the Future of Books
E-books Are Not the Future of Books
 

Similar to Key-value stores for efficient crawling and indexing

Scaleable PHP Applications in Kubernetes
Scaleable PHP Applications in KubernetesScaleable PHP Applications in Kubernetes
Scaleable PHP Applications in KubernetesRobert Lemke
 
Infrastructure as code, using Terraform
Infrastructure as code, using TerraformInfrastructure as code, using Terraform
Infrastructure as code, using TerraformHarkamal Singh
 
21 Www Web Services
21 Www Web Services21 Www Web Services
21 Www Web Servicesroyans
 
Apache Traffic Server
Apache Traffic ServerApache Traffic Server
Apache Traffic Serversupertom
 
lessons from managing a pulsar cluster
 lessons from managing a pulsar cluster lessons from managing a pulsar cluster
lessons from managing a pulsar clusterShivji Kumar Jha
 
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptxDowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptxLex Avstreikh
 
Building a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkBuilding a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkDatabricks
 
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Provectus
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and ThenSATOSHI TAGOMORI
 
Technology Stack Discussion
Technology Stack DiscussionTechnology Stack Discussion
Technology Stack DiscussionZaiyang Li
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueAmazon Web Services
 
Knock Knock, Who’s There? With Justin Chen and Dhruv Jauhar | Current 2022
Knock Knock, Who’s There? With Justin Chen and Dhruv Jauhar | Current 2022Knock Knock, Who’s There? With Justin Chen and Dhruv Jauhar | Current 2022
Knock Knock, Who’s There? With Justin Chen and Dhruv Jauhar | Current 2022HostedbyConfluent
 
Presentation on Japanese doc sprint
Presentation on Japanese doc sprintPresentation on Japanese doc sprint
Presentation on Japanese doc sprintGo Chiba
 
Headless approach for offloading heavy tasks in Magento
Headless approach for offloading heavy tasks in MagentoHeadless approach for offloading heavy tasks in Magento
Headless approach for offloading heavy tasks in MagentoSander Mangel
 
Globo.com & Varnish
Globo.com & VarnishGlobo.com & Varnish
Globo.com & Varnishlokama
 
Kubernetes for the PHP developer
Kubernetes for the PHP developerKubernetes for the PHP developer
Kubernetes for the PHP developerPaul Czarkowski
 

Similar to Key-value stores for efficient crawling and indexing (20)

Scaleable PHP Applications in Kubernetes
Scaleable PHP Applications in KubernetesScaleable PHP Applications in Kubernetes
Scaleable PHP Applications in Kubernetes
 
Infrastructure as code, using Terraform
Infrastructure as code, using TerraformInfrastructure as code, using Terraform
Infrastructure as code, using Terraform
 
21 Www Web Services
21 Www Web Services21 Www Web Services
21 Www Web Services
 
Apache Traffic Server
Apache Traffic ServerApache Traffic Server
Apache Traffic Server
 
lessons from managing a pulsar cluster
 lessons from managing a pulsar cluster lessons from managing a pulsar cluster
lessons from managing a pulsar cluster
 
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptxDowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
 
Building a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkBuilding a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache Spark
 
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 
Technology Stack Discussion
Technology Stack DiscussionTechnology Stack Discussion
Technology Stack Discussion
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS Glue
 
Scaling PHP apps
Scaling PHP appsScaling PHP apps
Scaling PHP apps
 
Knock Knock, Who’s There? With Justin Chen and Dhruv Jauhar | Current 2022
Knock Knock, Who’s There? With Justin Chen and Dhruv Jauhar | Current 2022Knock Knock, Who’s There? With Justin Chen and Dhruv Jauhar | Current 2022
Knock Knock, Who’s There? With Justin Chen and Dhruv Jauhar | Current 2022
 
Advanced JavaScript
Advanced JavaScriptAdvanced JavaScript
Advanced JavaScript
 
Presentation on Japanese doc sprint
Presentation on Japanese doc sprintPresentation on Japanese doc sprint
Presentation on Japanese doc sprint
 
Headless approach for offloading heavy tasks in Magento
Headless approach for offloading heavy tasks in MagentoHeadless approach for offloading heavy tasks in Magento
Headless approach for offloading heavy tasks in Magento
 
Globo.com & Varnish
Globo.com & VarnishGlobo.com & Varnish
Globo.com & Varnish
 
Caching By Nyros Developer
Caching By Nyros DeveloperCaching By Nyros Developer
Caching By Nyros Developer
 
Presemtation Tier Optimizations
Presemtation Tier OptimizationsPresemtation Tier Optimizations
Presemtation Tier Optimizations
 
Kubernetes for the PHP developer
Kubernetes for the PHP developerKubernetes for the PHP developer
Kubernetes for the PHP developer
 

More from Bhasker Kode

Kafka & Storm - FifthElephant 2015 by @bhaskerkode, Helpshift
Kafka & Storm - FifthElephant 2015 by @bhaskerkode, HelpshiftKafka & Storm - FifthElephant 2015 by @bhaskerkode, Helpshift
Kafka & Storm - FifthElephant 2015 by @bhaskerkode, HelpshiftBhasker Kode
 
Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)
Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)
Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)Bhasker Kode
 
Recursion & Erlang, FunctionalConf 14, Bangalore
Recursion & Erlang, FunctionalConf 14, BangaloreRecursion & Erlang, FunctionalConf 14, Bangalore
Recursion & Erlang, FunctionalConf 14, BangaloreBhasker Kode
 
Parsing binaries and protocols with erlang
Parsing binaries and protocols with erlangParsing binaries and protocols with erlang
Parsing binaries and protocols with erlangBhasker Kode
 
hover.in at CUFP 2009
hover.in at CUFP 2009hover.in at CUFP 2009
hover.in at CUFP 2009Bhasker Kode
 
in-memory capacity planning, Erlang Factory London 09
in-memory capacity planning, Erlang Factory London 09in-memory capacity planning, Erlang Factory London 09
in-memory capacity planning, Erlang Factory London 09Bhasker Kode
 
erlang at hover.in , Devcamp Blr 09
erlang at hover.in , Devcamp Blr 09erlang at hover.in , Devcamp Blr 09
erlang at hover.in , Devcamp Blr 09Bhasker Kode
 
end user programming & yahoo pipes
end user programming & yahoo pipesend user programming & yahoo pipes
end user programming & yahoo pipesBhasker Kode
 
Overview of End-user Programming | Talk at DesignCamp Bangalore
Overview of End-user Programming | Talk at DesignCamp BangaloreOverview of End-user Programming | Talk at DesignCamp Bangalore
Overview of End-user Programming | Talk at DesignCamp BangaloreBhasker Kode
 

More from Bhasker Kode (9)

Kafka & Storm - FifthElephant 2015 by @bhaskerkode, Helpshift
Kafka & Storm - FifthElephant 2015 by @bhaskerkode, HelpshiftKafka & Storm - FifthElephant 2015 by @bhaskerkode, Helpshift
Kafka & Storm - FifthElephant 2015 by @bhaskerkode, Helpshift
 
Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)
Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)
Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)
 
Recursion & Erlang, FunctionalConf 14, Bangalore
Recursion & Erlang, FunctionalConf 14, BangaloreRecursion & Erlang, FunctionalConf 14, Bangalore
Recursion & Erlang, FunctionalConf 14, Bangalore
 
Parsing binaries and protocols with erlang
Parsing binaries and protocols with erlangParsing binaries and protocols with erlang
Parsing binaries and protocols with erlang
 
hover.in at CUFP 2009
hover.in at CUFP 2009hover.in at CUFP 2009
hover.in at CUFP 2009
 
in-memory capacity planning, Erlang Factory London 09
in-memory capacity planning, Erlang Factory London 09in-memory capacity planning, Erlang Factory London 09
in-memory capacity planning, Erlang Factory London 09
 
erlang at hover.in , Devcamp Blr 09
erlang at hover.in , Devcamp Blr 09erlang at hover.in , Devcamp Blr 09
erlang at hover.in , Devcamp Blr 09
 
end user programming & yahoo pipes
end user programming & yahoo pipesend user programming & yahoo pipes
end user programming & yahoo pipes
 
Overview of End-user Programming | Talk at DesignCamp Bangalore
Overview of End-user Programming | Talk at DesignCamp BangaloreOverview of End-user Programming | Talk at DesignCamp Bangalore
Overview of End-user Programming | Talk at DesignCamp Bangalore
 

Recently uploaded

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 

Recently uploaded (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 

Key-value stores for efficient crawling and indexing

  • 1. thinking in key-value stores http://developers.hover.in Bhasker V Kode co-founder & CTO at hover.in at AWS Chennai November 10th, 2009
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 8. http://developers.hover.in “ The domino effect is a chain reaction that occurs when a small change causes a similar change nearby, which then will cause another similar change.” via wikipedia page on Domino Effect
  • 9.
  • 10.
  • 11.
  • 12. http://developers.hover.in “ the small portions of the program which cannot be parallelized will limit the overall speed-up available ” - Amdahl's Law wrt parallel computing...
  • 13.
  • 14. http://developers.hover.in Further , A & B need'nt be serial, but a batch of A's & a parallel batch of B's running tasks serially. Flowcontrol implemented by tail-recursive servers handling bursty AND slow traffic well. flowcontrol 1 flowcontrol 2 where is free cpu / time for handling 2x B tasks
  • 15.
  • 17. http://developers.hover.in # replication vs location transparency
  • 18. http://developers.hover.in how it works at hover.in: I , “ node X ” will rather do tasks locally since this data is designated to me, rather than rpc'ing all over like crazy ! The meta-data of which node does what calculated by (a) statically assigned (our choice) (b) or a hash fn (c) dynamically reassigned (maybe later) which is made available to nodes by (a) replication or (b) location transparency (our choice) Node1 Node2 NodeN
  • 19.
  • 21.
  • 22. fixed-length datastructures to the rescue http://developers.hover.in
  • 23.
  • 24.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35. brief introduction to hover.in choose words from your blog, & decide what content / ad you want when you hover* over it * or other events like click,right click,etc http://developers.hover.in or... the worlds first user-engagement platform for brands via in-text broadcasting or lets web publishers push client-side event handling to the cloud , to run various rich applications called hoverlets demo at http://start.hover.in/ and http://hover.in/demo more at http://hover.in , http://developers.hover.in/blog/
  • 36.
  • 38.