[Diagram: Datalake -> File Reading Service -> chunks C1, C2, C3, C4, C5 -> LB -> three Word Counting Service instances (each fetches its chunk, e.g. C1, over FTP) -> Count Handler -> DB]

Example request payload for chunk C1:
{
chunkId: c1,
Url: chunk_url
}
Very large file word count: system design

Working
1. The file reading service reads the file,
whatever its size, and divides it into chunks.
2. For each chunk, look at the last word. If it is
incomplete (the chunk does not end with a space
or a full stop), remove that word from the chunk
and carry it over to the start of the next chunk.
3. The word count service receives a POST
request whose payload contains the chunk_id and
chunk_url to process. It can respond with
202 Accepted.
4. The word count service downloads the chunk
file from the URL using FTP.
5. The word count service reads the chunk file
and marks the indices that split it into n ranges,
so that n threads can work in parallel. For
example, for a 1 GB file and 8 threads, 7 indices
are found (each one a line number plus the
position where a word starts). The 8 threads then
process 0 to index1, index1 to index2, and so on.
For example, if the file has 1000 lines, then
1000 / 8 = 125 lines per thread. Index 0 is the
start of the file, and index1 is the first word
start after the first 125 lines (lines 0 to 124).
The first thread processes from index 0 to index1,
the second from index1 to index2, and so on.
6. When the word count service finishes
processing, it passes the result to the count
handler, which updates the DB by adding the
count to the existing total.
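
Step 5 can be sketched roughly as follows. This is a minimal illustration, not the slide author's implementation: the function names split_points and count_words are hypothetical, the cut positions are character offsets rather than line numbers, and each cut is moved forward to the next whitespace so that no word is shared between two threads.

```python
from concurrent.futures import ThreadPoolExecutor


def split_points(text: str, parts: int) -> list[int]:
    """Return parts + 1 cut offsets into text.

    Each inner cut is moved forward to the next whitespace character,
    so no word straddles two ranges (hypothetical helper).
    """
    size = max(1, len(text) // parts)
    cuts = [0]
    for i in range(1, parts):
        # Never move backwards past the previous cut or beyond the text.
        pos = min(max(i * size, cuts[-1]), len(text))
        while pos < len(text) and not text[pos].isspace():
            pos += 1
        cuts.append(pos)
    cuts.append(len(text))
    return cuts


def count_words(text: str, n_threads: int = 8) -> int:
    """Count whitespace-separated words using n_threads workers."""
    cuts = split_points(text, n_threads)
    ranges = list(zip(cuts, cuts[1:]))  # (start, end) pairs, one per thread
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partial = pool.map(lambda r: len(text[r[0]:r[1]].split()), ranges)
        return sum(partial)


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    print(count_words(sample, n_threads=4))
```

With 8 threads this produces 7 inner cut points, matching the "7 indices" in step 5. Note that in CPython, threads share the interpreter lock, so this shows the partitioning idea rather than a guaranteed speed-up; a process pool would be one alternative.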

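The count handler in step 6 could look something like the sketch below. It assumes the count is a single running total per file and uses an in-memory SQLite database as a stand-in for the DB; the table name totals, the file_id key, and the function names are all hypothetical and do not come from the slide.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the (stand-in) DB and make sure the totals table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS totals ("
        "file_id TEXT PRIMARY KEY, words INTEGER NOT NULL)"
    )
    return conn


def add_chunk_count(conn: sqlite3.Connection, file_id: str, chunk_count: int) -> None:
    """Add one chunk's word count to the existing total for file_id."""
    with conn:  # one transaction: commits on success, rolls back on error
        # Create the row with a zero total if this is the first chunk.
        conn.execute(
            "INSERT OR IGNORE INTO totals (file_id, words) VALUES (?, 0)",
            (file_id,),
        )
        # Let the database do the addition instead of read-modify-write.
        conn.execute(
            "UPDATE totals SET words = words + ? WHERE file_id = ?",
            (chunk_count, file_id),
        )


def total(conn: sqlite3.Connection, file_id: str) -> int:
    """Return the running total for file_id, or 0 if none is stored."""
    row = conn.execute(
        "SELECT words FROM totals WHERE file_id = ?", (file_id,)
    ).fetchone()
    return row[0] if row else 0


if __name__ == "__main__":
    db = open_store()
    add_chunk_count(db, "file-1", 125)
    add_chunk_count(db, "file-1", 75)
    print(total(db, "file-1"))
```

Doing the addition inside the UPDATE statement, rather than reading the total and writing it back, is one way to keep results from several word count services from overwriting each other.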