Horizontally Scalable
Compute Infrastructure
Yosua Michael Maranatha
Content
- What?
- Why?
- Problem Definition
- Example Problem: Word Counting
- Using a Single Machine
- Proposed HSCI
- Using Proposed HSCI
- Other use cases: Crawler
- Questions?
What is Horizontally Scalable?
- A horizontally scalable system is one whose capacity grows by adding more machines
- For example, if one machine can handle a load of 100 requests per second (rps), then ten identical machines can ideally handle 1000 rps
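The 100 rps to 1000 rps example is linear arithmetic: with N identical machines that each handle r rps, the ideal capacity is N × r, so a target load T needs ceil(T / r) machines. The sketch below assumes perfectly linear scaling (no coordination or load-balancing overhead), which real systems only approximate; the function names are hypothetical.

```python
import math

def machines_needed(target_rps: float, per_machine_rps: float) -> int:
    """Machines needed for a target load, assuming capacity scales linearly."""
    if per_machine_rps <= 0:
        raise ValueError("per_machine_rps must be positive")
    # Round up: a fraction of a machine still means one more machine.
    return math.ceil(target_rps / per_machine_rps)

def ideal_capacity(machines: int, per_machine_rps: float) -> float:
    """Ideal total capacity of identical machines (ignores coordination overhead)."""
    return machines * per_machine_rps

print(machines_needed(1000, 100))  # 10
print(ideal_capacity(10, 100))     # 1000
print(machines_needed(1050, 100))  # 11
```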
What is Horizontally Scalable?
Reference: image_source
What is Horizontally Scalable Compute Infrastructure?
- A compute infrastructure is infrastructure designed for computation or processing
- A Horizontally Scalable Compute Infrastructure (HSCI) can compute or process more work by adding more machines
Why do we need HSCI?
● Scaling computation vertically (upgrading a single machine) quickly hits a ceiling, since CPU speed grows relatively slowly
● Vertical scaling is also inflexible and usually causes downtime whenever we scale up or down
HSCI Design Problem Definition
● We have a lot of independent tasks
● We have a bunch of machines
● We want to process those tasks with the machines
● We need a way to distribute the tasks to the machines that is balanced and robust
● We want to be able to add machines easily to speed up the overall process when needed
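One common way to meet the "balanced and robust" requirement is a pull-based worker pool. This is a minimal single-process sketch, assuming Python threads stand in for machines and a `queue.Queue` stands in for a real message queue; `run_pool`, `flaky_square` and the retry limit are hypothetical names chosen for illustration.

```python
import queue
import threading

def run_pool(tasks, num_workers, process, max_attempts=3):
    """Distribute independent tasks to workers through one shared queue.

    Balanced: each idle worker pulls the next task itself, so faster
    workers simply end up taking more tasks.
    Robust: a task whose processing raises is put back on the queue and
    retried, up to max_attempts times, instead of being lost.
    """
    q = queue.Queue()
    for task in tasks:
        q.put((task, 1))  # (task, attempt number)
    results, failed = {}, []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task, attempt = q.get_nowait()
            except queue.Empty:
                return  # nothing left to pull; this worker stops
            try:
                value = process(task)
            except Exception:
                if attempt < max_attempts:
                    q.put((task, attempt + 1))  # re-queue; this worker loops and can pull it again
                else:
                    with lock:
                        failed.append(task)
            else:
                with lock:
                    results[task] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failed

# Hypothetical flaky task: fails on the first try for multiples of 3.
attempts_seen = set()
seen_lock = threading.Lock()

def flaky_square(n):
    with seen_lock:
        first_try = n not in attempts_seen
        attempts_seen.add(n)
    if first_try and n % 3 == 0:
        raise RuntimeError("simulated worker failure")
    return n * n

results, failed = run_pool(range(10), num_workers=4, process=flaky_square)
print(len(results), failed)  # 10 []
```

Managed queues such as Google Pub/Sub give a comparable robustness property by redelivering messages that are not acknowledged before their ack deadline, so a crashed worker's task is eventually picked up by another one.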
Example Problem: Word Counting
Suppose we have more than a hundred million text files in Google Cloud Storage (GCS).
We want to count the term frequency of each word across all the files.
Using a Single Machine
● With a single machine, we can loop over each file in GCS
● We pre-process each file by converting it to lower case and splitting it on word separators (space, tab, comma, etc.)
● Then we store and update the count for each word in a hash map or dictionary
● This method works, but it is very time consuming...
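The single-machine approach can be sketched in a few lines. This assumes the file contents are already available as in-memory strings (listing and downloading objects from GCS would need a storage client, which is not shown), and the separator pattern covers only whitespace and commas; `count_words` is a hypothetical name.

```python
import re
from collections import Counter

# Word separators: any run of whitespace (space, tab, newline) or commas.
# Extend the character class for other punctuation as needed.
WORD_SEPARATORS = re.compile(r"[\s,]+")

def count_words(texts):
    """Sequentially count term frequency across all texts on one machine."""
    counts = Counter()  # the hash map / dictionary of word -> count
    for text in texts:
        words = WORD_SEPARATORS.split(text.lower())
        counts.update(w for w in words if w)  # drop empty strings from edge separators
    return counts

files = ["Hello world, hello", "World\tof words"]
print(count_words(files))
```

Every file passes through the same loop one after another, which is why the runtime grows linearly with the number of files.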
Proposed HSCI
Our proposed HSCI:
- Break the problem down into independent tasks
- Put the tasks into a message queue (we use Google Pub/Sub)
- Processors pull tasks from the queue and process them accordingly
- If a task has multiple layers (stages), the processor puts the next task back into the queue
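The multi-layered case can be sketched as stateless handlers that each return their follow-up tasks, which go back into the same queue. This is an in-process illustration only, assuming threads stand in for processors and `queue.Queue` for Pub/Sub; the task kinds `split_range` and `sum_chunk` are hypothetical examples, not tasks from the original system.

```python
import queue
import threading

def split_range(payload):
    """Layer 1: break a big range into chunk tasks for the next layer."""
    start, stop, chunk = payload
    next_tasks = [("sum_chunk", (i, min(i + chunk, stop))) for i in range(start, stop, chunk)]
    return next_tasks, None

def sum_chunk(payload):
    """Layer 2: do the actual work and emit a result (no further tasks)."""
    start, stop = payload
    return [], sum(range(start, stop))

HANDLERS = {"split_range": split_range, "sum_chunk": sum_chunk}

def run(initial_tasks, num_workers=4):
    q = queue.Queue()
    for task in initial_tasks:
        q.put(task)
    partials = []
    lock = threading.Lock()

    def worker():
        while True:
            kind, payload = q.get()
            try:
                next_tasks, result = HANDLERS[kind](payload)
                for task in next_tasks:
                    q.put(task)  # next layer goes back into the same queue
                if result is not None:
                    with lock:
                        partials.append(result)
            finally:
                q.task_done()  # called after follow-ups are queued, so join() cannot finish early

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()
    q.join()  # returns once every task, including follow-ups, is done
    return partials

partials = run([("split_range", (0, 1000, 100))])
print(len(partials), sum(partials))  # 10 499500
```

The workers hold no state between tasks, so adding workers (or machines) only changes how fast the queue drains, not the result. The daemon threads stay blocked on `q.get()` after `run` returns, which is acceptable for a script but would need a shutdown signal in a long-lived process.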
Proposed HSCI
Using Proposed HSCI
● First, to count the words in each file, we put each file path into Pub/Sub
● A processing engine fetches the file, counts its word frequencies, and increments the count of each word in memcached
● The final results end up in memcached :)
● We can add more processing engines as needed (they are all stateless and identical to each other)
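The distributed word count can be simulated in one process to show the data flow. This sketch assumes threads stand in for processing engines, a `queue.Queue` of paths stands in for Pub/Sub, a dict of made-up file contents stands in for GCS, and `InMemoryCounterStore` is a hypothetical stand-in for memcached. It mirrors memcached's semantics where `incr` only works on an existing key, so a missing word is created with `add` first.

```python
import queue
import re
import threading
from collections import Counter

WORD_SEPARATORS = re.compile(r"[\s,]+")  # same separators as the single-machine version

class InMemoryCounterStore:
    """Hypothetical stand-in for memcached: a shared, thread-safe counter store."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def add(self, key, value):
        """Store key only if absent (like memcached add); True if stored."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def incr(self, key, delta):
        """Increment an existing key (like memcached incr); None if the key is missing."""
        with self._lock:
            if key not in self._data:
                return None
            self._data[key] += delta
            return self._data[key]

    def snapshot(self):
        with self._lock:
            return dict(self._data)

def process_file(text, store):
    """Stateless step: count one file locally, then push its counts to the shared store."""
    local_counts = Counter(w for w in WORD_SEPARATORS.split(text.lower()) if w)
    for word, n in local_counts.items():
        if store.incr(word, n) is not None:
            continue  # key existed and was incremented
        if store.add(word, n):
            continue  # first occurrence of this word anywhere
        store.incr(word, n)  # another engine added the key between our incr and add

def run_engines(files, store, num_engines=4):
    q = queue.Queue()
    for path in files:
        q.put(path)  # stands in for publishing each file path to Pub/Sub

    def engine():
        while True:
            try:
                path = q.get_nowait()
            except queue.Empty:
                return
            process_file(files[path], store)  # files[path] stands in for downloading from GCS

    engines = [threading.Thread(target=engine) for _ in range(num_engines)]
    for e in engines:
        e.start()
    for e in engines:
        e.join()

# Hypothetical file contents keyed by made-up paths.
files = {"a.txt": "Hello world, hello", "b.txt": "World\tof words", "c.txt": "hello"}
store = InMemoryCounterStore()
run_engines(files, store)
print(sorted(store.snapshot().items()))
```

Because Pub/Sub delivers messages at least once by default, a redelivered file path would be counted twice in this sketch; a production version would need some form of deduplication.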
Other use cases: Crawler
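The crawler slide's details did not survive, so the following is only a hedged illustration of how a crawler could map onto the multi-layered task pattern: each crawl task fetches a page and queues tasks for newly discovered links. It assumes threads stand in for processors, `queue.Queue` for Pub/Sub, and a hypothetical in-memory `FAKE_WEB` for real HTTP fetching. The `seen` set is local here for simplicity; with stateless engines it would have to live in a shared store.

```python
import queue
import threading

# Hypothetical in-memory "web": page URL -> links found on that page.
FAKE_WEB = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}

def crawl(seed, fetch_links, num_workers=4):
    """Breadth-first crawl where every discovered URL becomes a new queued task."""
    q = queue.Queue()
    seen = {seed}
    lock = threading.Lock()
    q.put(seed)

    def worker():
        while True:
            url = q.get()
            try:
                for link in fetch_links(url):
                    with lock:
                        if link in seen:
                            continue  # already queued or crawled
                        seen.add(link)
                    q.put(link)  # next layer: crawl the newly found page
            finally:
                q.task_done()

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()
    q.join()  # done when no crawl task is pending or in progress
    return seen

print(sorted(crawl("/", lambda url: FAKE_WEB.get(url, []))))  # ['/', '/a', '/b', '/c']
```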
We are hiring!
1. Data Engineer
2. BI Engineer
3. Data Scientist
4. Software Engineer (Frontend, Backend & Mobile Application)
Email us at joindev@kumparan.com
THANK YOU!
QUESTIONS?
