Lesson Keynote 
Distributed Systems in Data Engineering 
By: Oluwasegun Matthew | oadetimehin@terragonltd.com 
 
Summary 
1. Introduction to Distributed Systems 
a. The concept of server-client architecture 
b. Channel for Communication 
c. Impact on Data Engineering at Scale 
2. From Localhost to Production - things to watch out for...
3. Industry based Technologies/Tools in View 
a. Messaging kits - RabbitMQ & Kafka 
b. In Memory Data Caching - Redis & Aerospike 
c. Data in Stream Tools - AWS Kinesis 
d. Monitoring and Log Watch - CloudWatch 
4. Summary - in class 
5. Questions 
Class Activity: Form 4 groups, choose any one of the messaging and in-memory data caching
tools, and use it to create a resilient distributed system that fixes the following problems:
- Frequent crashes of the e-Portal
- Exam records processing
Let’s Dive In... 
 
1. Introduction to Distributed Systems 
According to Wikipedia, via Google:

A distributed system, in its simplest definition, is a group of computers working together so as
to appear as a single computer to the end user. These machines have a shared state, operate
concurrently, and can fail independently without affecting the whole system’s uptime.
This is in line with the ever-growing technological expansion of the world: distributed systems
are becoming more and more widespread. The number of available computer technologies and
innovations keeps increasing rapidly, and this results in intense computational requirements.
Yeah, Moore’s law predicted more computing power from fitting more transistors onto a single
chip (the count approximately doubles every two years) in a cost-efficient way - cool, but over
the past 5 years there has been a shift away from relying on this alone: the ability to scale
horizontally (adding more machines) and not just vertically (making one machine bigger).
 
 
 
 
 
 
 
The Concept of Server-Client Architecture 
Client-server architecture (client/server) is a network architecture in which each computer or
process on the network is either a client or a server.
Just as in everyday life, where many activities are based on a server/client relationship (e.g.
Cashier/Customer, Bus Conductor/Passengers), the same holds in technology.
Another type of network architecture is known as peer-to-peer architecture, in which each node
has equivalent responsibilities - but this isn’t what we are discussing today.
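The server/client relationship described above can be sketched with two small functions. This is only a minimal illustration, assuming a single request on the local machine; the function names (serve_once, ask_server, demo) and the "served:" reply format are made up for the example.

```python
import socket
import threading


def serve_once(server_sock: socket.socket) -> None:
    """Server role: wait for one client, read its request, send a reply."""
    conn, _addr = server_sock.accept()
    with conn:
        request = conn.recv(1024).decode("utf-8")
        conn.sendall(f"served: {request}".encode("utf-8"))


def ask_server(host: str, port: int, request: str) -> str:
    """Client role: connect, send one request, and return the server's reply."""
    with socket.create_connection((host, port), timeout=5) as client:
        client.sendall(request.encode("utf-8"))
        return client.recv(1024).decode("utf-8")


def demo(request: str = "one ticket, please") -> str:
    """Run one server and one client on this machine and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        # Port 0 asks the operating system for any free port (illustrative choice).
        server_sock.bind(("127.0.0.1", 0))
        server_sock.listen(1)
        host, port = server_sock.getsockname()
        worker = threading.Thread(target=serve_once, args=(server_sock,))
        worker.start()
        reply = ask_server(host, port, request)
        worker.join()
    return reply
```

Calling demo("hello") returns "served: hello". The client always starts the conversation and the server only responds, much like the passenger and the bus conductor.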
 
 
 
 
 
The approach of breaking a larger application into chunks over a server-client
architecture can be explained with Microservices. Consider the cases below:
 
 
 
Case 1 - Monolith: At the core of the application is the business logic, which is implemented by
modules that define services, domain objects, and events. Surrounding the core are adapters that
interface with the external world. Examples of adapters include database access components,
messaging components that produce and consume messages, and web components that either
expose APIs or implement a UI. Because all of this is packaged and deployed as one unit, a
growing application of this kind ends up in Monolithic Hell.
 
 
 
Case 2 - Microservices: Here we are tackling complexity. A service typically implements a set of
distinct features or functionality, such as order management, customer management etc. Each
microservice is a mini-application that has its own hexagonal architecture consisting of business
logic along with various adapters. Some microservices would expose an API that’s consumed by
other microservices or by the application’s clients. Other microservices might implement a web UI.
At runtime, each instance is often a cloud VM or a Docker container.
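The "business logic along with various adapters" idea can be sketched for a single order-management microservice. This is an illustrative sketch only: the class names (OrderRepository, InMemoryOrderRepository, OrderService), the create_order_endpoint function, and the payload fields are all hypothetical, and the in-memory adapter merely stands in for a real database component.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass
class Order:
    order_id: str
    customer: str
    total: float


class OrderRepository(Protocol):
    """Port: what the business logic needs from storage, and nothing more."""

    def save(self, order: Order) -> None: ...

    def get(self, order_id: str) -> Order: ...


class InMemoryOrderRepository:
    """Adapter: a stand-in for a real database access component."""

    def __init__(self) -> None:
        self._rows: Dict[str, Order] = {}

    def save(self, order: Order) -> None:
        self._rows[order.order_id] = order

    def get(self, order_id: str) -> Order:
        return self._rows[order_id]


class OrderService:
    """Core business logic: it talks to the port, never to a concrete adapter."""

    def __init__(self, repository: OrderRepository) -> None:
        self._repository = repository

    def place_order(self, order_id: str, customer: str, total: float) -> Order:
        if total <= 0:
            raise ValueError("order total must be positive")
        order = Order(order_id, customer, total)
        self._repository.save(order)
        return order


def create_order_endpoint(service: OrderService, payload: dict) -> dict:
    """Adapter: translates an API-style payload into a call on the core."""
    order = service.place_order(payload["id"], payload["customer"], payload["total"])
    return {"status": "created", "id": order.order_id}
```

Swapping InMemoryOrderRepository for a database-backed class would not require touching OrderService, which is the main point of keeping adapters at the edge of each service.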
 
 
Quiz Give…
● Examples of a client/server relationship in the real world
● Methods of binding two systems that you know
● Two architectures in which software is designed
● The major issue with Monolithic design
 
 
 
 
 
 
 
 
 
 
 
Channel for Communication 
When we have a decentralized system, it’s important for us to make these systems communicate
with one another. The client/server architecture emphasizes a producer/consumer computing
architecture where the server acts as the producer and the client as the consumer. The model of
communication can be either synchronous or asynchronous. These correspond to two modes:
- API Mode 
- Buffer Mode 
 
API Mode: is a synchronous (or instant feedback) mode of communication. It is usually used for
one-to-one communication through protocols like HTTP, HTTPS, SMTP, SMPP etc.
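Here is a small sketch of API Mode over HTTP: the caller sends a request and blocks until the response comes back. It assumes a throwaway local server; the StatusHandler class, the call_api function, the /health path, and the JSON shape are illustrative and not part of any real service.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class StatusHandler(BaseHTTPRequestHandler):
    """Hypothetical API: answers every GET with a small JSON document."""

    def do_GET(self) -> None:
        body = json.dumps({"path": self.path, "status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the example quiet


def call_api(path: str = "/health") -> dict:
    """Start a local server, make one synchronous request, return the JSON reply."""
    server = HTTPServer(("127.0.0.1", 0), StatusHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        connection = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        connection.request("GET", path)
        response = connection.getresponse()  # the caller waits here for feedback
        payload = json.loads(response.read().decode("utf-8"))
        connection.close()
        return payload
    finally:
        server.shutdown()
        server.server_close()
```

The line that calls getresponse() is where the "instant feedback" happens: nothing else runs in the caller until the server has answered.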
 
Buffer Mode: is an asynchronous mode of communication, where feedback isn’t needed
immediately. It works for both one-to-one and broadcast communication. In this mode of
communication, a queuing/messaging/buffering system is placed between the two systems
to manage the flow of information. The following queuing algorithms are emphasized here:
- FIFO (First In First Out) 
- LIFO (Last In First Out) 
- SJF (Shortest Job First) 
- Round Robin 
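The four queuing algorithms above can be compared side by side with a short sketch. It assumes every job is already waiting in the buffer at the start, models a job as a hypothetical (name, units of work) pair, and uses an arbitrary time slice of 2 units for Round Robin.

```python
from collections import deque
from typing import List, Tuple

Job = Tuple[str, int]  # (name, units of work); an illustrative job shape


def fifo(jobs: List[Job]) -> List[str]:
    """First In First Out: serve jobs in arrival order."""
    queue = deque(jobs)
    order = []
    while queue:
        order.append(queue.popleft()[0])
    return order


def lifo(jobs: List[Job]) -> List[str]:
    """Last In First Out: serve the most recent arrival first."""
    stack = list(jobs)
    order = []
    while stack:
        order.append(stack.pop()[0])
    return order


def sjf(jobs: List[Job]) -> List[str]:
    """Shortest Job First: serve the job with the least work first."""
    # sorted() is stable, so jobs of equal size keep their arrival order.
    return [name for name, _work in sorted(jobs, key=lambda job: job[1])]


def round_robin(jobs: List[Job], quantum: int = 2) -> List[str]:
    """Round Robin: give each job a fixed slice; return the completion order."""
    queue = deque(jobs)
    finished = []
    while queue:
        name, remaining = queue.popleft()
        remaining -= quantum
        if remaining > 0:
            queue.append((name, remaining))  # not done yet, back of the line
        else:
            finished.append(name)
    return finished
```

For the jobs report (5 units), email (1 unit) and backup (3 units), arriving in that order, FIFO serves report, email, backup; LIFO serves backup, email, report; SJF serves email, backup, report; and Round Robin with a slice of 2 completes email, backup, report.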
 
Impact on Data Engineering at Scale 
Now, bringing the concept of distributed systems into data engineering... Hey, what’s data
engineering?
Data engineering is the act of building and managing information or “big data” infrastructure.
Data engineers create architecture that helps analyze and process data in the way an
organization needs it, from data processing to building pipelines that move data into lakes and
warehouses for business value creation.
The following are some of the positive impacts of distributed systems in data engineering:
- Creating resilient data architecture
- Easily managed systems
- Security and control
- Fewer single points of failure
- Easier fault detection
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Quiz Mention...
● 2 queue algorithms you are familiar with
● Web technologies that run on the HTTP protocol
 
 
 
 
 
 
 
 
 
2. From Localhost to Production - things to watch out for...
When systems are built in a development environment, a lot isn’t considered; this may be due to
a lack of experience, a lack of the right information, or unforeseen circumstances. This implies
that a system cannot be called perfect at the development stage until it is tested in real-life scenarios.
Sometimes over-engineering (system overkill) is a major flaw of the development phase, but only
production will really tell whether it was or not.
List of things to watch out for:
- Unexpected spikes in platform/technology usage - system overload
- Performance under sustained platform usage
- Security of interconnected systems
- Extensibility of features
- Ease of deployment
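As one illustration of the first item, a bounded buffer can refuse work during a spike instead of letting the whole system fall over. This is a rough sketch, assuming requests are plain strings and that nothing drains the buffer during the spike; the admit_requests function and the capacity value are hypothetical.

```python
import queue
from typing import Iterable, List, Tuple


def admit_requests(requests: Iterable[str], capacity: int) -> Tuple[List[str], List[str]]:
    """Accept requests until the buffer is full, then reject the rest."""
    buffer = queue.Queue(maxsize=capacity)
    accepted, rejected = [], []
    for request in requests:
        try:
            buffer.put_nowait(request)  # raises queue.Full when at capacity
            accepted.append(request)
        except queue.Full:
            rejected.append(request)  # shed load rather than crash
    return accepted, rejected
```

With a capacity of 3 and five incoming requests, the first three are accepted and the last two are rejected. In a real deployment the rejected callers would typically be asked to retry later.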
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Enough of the theoretical exposition, let’s get practical…
 
3. Industry based Technologies/Tools in View 
Here we shall talk about the different tools used in the industry to manage distributed systems.
Messaging Kits - e.g. RabbitMQ or Kafka

RabbitMQ is a widely deployed open source message broker - https://www.rabbitmq.com/
Tutorial Guide (in PHP) - https://www.rabbitmq.com/tutorials/tutorial-three-php.html
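The tutorial linked above covers publish/subscribe with a fanout exchange. The idea can be modelled in a few lines of plain Python before touching a real broker. This is a toy model and not the RabbitMQ client API: the FanoutExchange class and its bind, publish and consume methods are invented for illustration.

```python
from collections import deque
from typing import Deque, Dict


class FanoutExchange:
    """Toy model of a fanout exchange: every bound queue gets a copy of each message."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[str]] = {}

    def bind(self, queue_name: str) -> None:
        """Attach a queue so that it receives all messages published from now on."""
        self._queues.setdefault(queue_name, deque())

    def publish(self, message: str) -> None:
        """Broadcast: copy the message into every queue bound at this moment."""
        for pending in self._queues.values():
            pending.append(message)

    def consume(self, queue_name: str) -> str:
        """Take the oldest pending message from one queue."""
        return self._queues[queue_name].popleft()
```

If a "logger" queue and an "archiver" queue are both bound, one publish call puts the same message in both. A queue bound after a message was published does not see that message, which mirrors how a fanout exchange drops messages when no queue is bound yet.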
 
 
 
In Memory Data Caching - e.g. Redis or Aerospike


Redis is an open source in-memory data structure store used as a database, cache and message
broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range
queries, bitmaps, hyperloglogs etc. - https://redis.io/

Documentation for PHP can be found here: https://github.com/amphp/redis
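A common way to use an in-memory cache is the cache-aside pattern: check the cache first, and only go to the slow data store on a miss. The sketch below uses a plain dictionary with expiry times as a stand-in for a real cache server, so none of it is the Redis API; TTLCache, get_exam_record and the load_from_db callback are hypothetical names chosen to match the class activity.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Dictionary-backed stand-in for an in-memory cache with per-key expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def get_exam_record(student_id: str, cache: TTLCache, load_from_db: Callable[[str], str]) -> str:
    """Cache-aside read: serve from the cache, fall back to the database on a miss."""
    cached = cache.get(student_id)
    if cached is not None:
        return cached
    record = load_from_db(student_id)
    cache.set(student_id, record)
    return record
```

Repeated reads for the same student hit the database once per expiry window, which is the kind of relief a crashing portal needs during result-checking season.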
 
Data in Stream Tools - AWS Kinesis

AWS Kinesis, a service from Amazon, makes it easy to collect, process and analyze real-time
streaming data so you can get timely insights and react quickly to new information
- https://aws.amazon.com/kinesis/
 
 
 
 
Monitoring and Log Watch - CloudWatch

AWS CloudWatch is a monitoring and management service built for developers, system
operators, site reliability engineers (SRE), and IT managers - https://aws.amazon.com/cloudwatch/
 
 
 
 
 
 
 
 
 
 
 
 
Assessment 
See class activity on the first page... 
 
 
 
 
Questions and Mentorship 
For further questions, collaboration or mentorship, reach out: 
Email: oadetimehin@terragonltd.com  
Mobile: 07060514642 
 
 
17 

More Related Content

What's hot

Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Timothy Spann
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Databus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineDatabus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineSunil Nagaraj
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkDatabricks
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Araf Karsh Hamid
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAAdam Doyle
 
SOS: Optimizing Shuffle I/O with Brian Cho and Ergin Seyfe
SOS: Optimizing Shuffle I/O with Brian Cho and Ergin SeyfeSOS: Optimizing Shuffle I/O with Brian Cho and Ergin Seyfe
SOS: Optimizing Shuffle I/O with Brian Cho and Ergin SeyfeDatabricks
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward
 
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ... A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...HostedbyConfluent
 
Best Practices for the Most Impactful Oracle Database 18c and 19c Features
Best Practices for the Most Impactful Oracle Database 18c and 19c FeaturesBest Practices for the Most Impactful Oracle Database 18c and 19c Features
Best Practices for the Most Impactful Oracle Database 18c and 19c FeaturesMarkus Michalewicz
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Jean-Paul Azar
 
Cloud Native PostgreSQL
Cloud Native PostgreSQLCloud Native PostgreSQL
Cloud Native PostgreSQLEDB
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Vinoth Chandar
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021StreamNative
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming JobsDatabricks
 

What's hot (20)

Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Databus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineDatabus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture Pipeline
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache Spark
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
SOS: Optimizing Shuffle I/O with Brian Cho and Ergin Seyfe
SOS: Optimizing Shuffle I/O with Brian Cho and Ergin SeyfeSOS: Optimizing Shuffle I/O with Brian Cho and Ergin Seyfe
SOS: Optimizing Shuffle I/O with Brian Cho and Ergin Seyfe
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
 
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ... A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 
Best Practices for the Most Impactful Oracle Database 18c and 19c Features
Best Practices for the Most Impactful Oracle Database 18c and 19c FeaturesBest Practices for the Most Impactful Oracle Database 18c and 19c Features
Best Practices for the Most Impactful Oracle Database 18c and 19c Features
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
Cloud Native PostgreSQL
Cloud Native PostgreSQLCloud Native PostgreSQL
Cloud Native PostgreSQL
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 

Similar to Distributed Systems in Data Engineering Lesson

CC LECTURE NOTES (1).pdf
CC LECTURE NOTES (1).pdfCC LECTURE NOTES (1).pdf
CC LECTURE NOTES (1).pdfHasanAfwaaz1
 
Microservices architecture
Microservices architectureMicroservices architecture
Microservices architectureFaren faren
 
NMS Projects and POCs completed and ongoing for OSS NAM v 1.5 Linkedin
NMS Projects and POCs completed and ongoing for OSS NAM v 1.5 LinkedinNMS Projects and POCs completed and ongoing for OSS NAM v 1.5 Linkedin
NMS Projects and POCs completed and ongoing for OSS NAM v 1.5 LinkedinJavier Guillermo, MBA, MSc, PMP
 
Introduction Of Cloud Computing
Introduction Of Cloud ComputingIntroduction Of Cloud Computing
Introduction Of Cloud ComputingMonica Rivera
 
HOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdf
HOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdfHOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdf
HOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdfAgaram Technologies
 
Distributed system
Distributed systemDistributed system
Distributed systemchirag patil
 
paradigms cloud.pptx
paradigms cloud.pptxparadigms cloud.pptx
paradigms cloud.pptxgunvinit931
 
A New Way Of Distributed Or Cloud Computing
A New Way Of Distributed Or Cloud ComputingA New Way Of Distributed Or Cloud Computing
A New Way Of Distributed Or Cloud ComputingAshley Lovato
 
Cloud computing Review over various scheduling algorithms
Cloud computing Review over various scheduling algorithmsCloud computing Review over various scheduling algorithms
Cloud computing Review over various scheduling algorithmsIJEEE
 
E-Business And Technology Essay
E-Business And Technology EssayE-Business And Technology Essay
E-Business And Technology EssayPamela Wright
 
Solving big data challenges for enterprise application
Solving big data challenges for enterprise applicationSolving big data challenges for enterprise application
Solving big data challenges for enterprise applicationTrieu Dao Minh
 
Client Server Model and Distributed Computing
Client Server Model and Distributed ComputingClient Server Model and Distributed Computing
Client Server Model and Distributed ComputingAbhishek Jaisingh
 
OOP - Basing Software Development on Reusable
OOP - Basing Software Development on Reusable OOP - Basing Software Development on Reusable
OOP - Basing Software Development on Reusable 17090AshikurRahman
 
Architectural Design Report G4
Architectural Design Report G4Architectural Design Report G4
Architectural Design Report G4Prizzl
 
Distributed Computing Report
Distributed Computing ReportDistributed Computing Report
Distributed Computing ReportIIT Kharagpur
 
Over view of software artitecture
Over view of software artitectureOver view of software artitecture
Over view of software artitectureABDEL RAHMAN KARIM
 

Similar to Distributed Systems in Data Engineering Lesson (20)

CC LECTURE NOTES (1).pdf
CC LECTURE NOTES (1).pdfCC LECTURE NOTES (1).pdf
CC LECTURE NOTES (1).pdf
 
Microservices architecture
Microservices architectureMicroservices architecture
Microservices architecture
 
NMS Projects and POCs completed and ongoing for OSS NAM v 1.5 Linkedin
NMS Projects and POCs completed and ongoing for OSS NAM v 1.5 LinkedinNMS Projects and POCs completed and ongoing for OSS NAM v 1.5 Linkedin
NMS Projects and POCs completed and ongoing for OSS NAM v 1.5 Linkedin
 
Introduction Of Cloud Computing
Introduction Of Cloud ComputingIntroduction Of Cloud Computing
Introduction Of Cloud Computing
 
HOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdf
HOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdfHOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdf
HOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdf
 
Distributed system
Distributed systemDistributed system
Distributed system
 
publishable paper
publishable paperpublishable paper
publishable paper
 
paradigms cloud.pptx
paradigms cloud.pptxparadigms cloud.pptx
paradigms cloud.pptx
 
A New Way Of Distributed Or Cloud Computing
A New Way Of Distributed Or Cloud ComputingA New Way Of Distributed Or Cloud Computing
A New Way Of Distributed Or Cloud Computing
 
Cloud computing Review over various scheduling algorithms
Cloud computing Review over various scheduling algorithmsCloud computing Review over various scheduling algorithms
Cloud computing Review over various scheduling algorithms
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
E-Business And Technology Essay
E-Business And Technology EssayE-Business And Technology Essay
E-Business And Technology Essay
 
Solving big data challenges for enterprise application
Solving big data challenges for enterprise applicationSolving big data challenges for enterprise application
Solving big data challenges for enterprise application
 
Client Server Model and Distributed Computing
Client Server Model and Distributed ComputingClient Server Model and Distributed Computing
Client Server Model and Distributed Computing
 
OOP - Basing Software Development on Reusable
OOP - Basing Software Development on Reusable OOP - Basing Software Development on Reusable
OOP - Basing Software Development on Reusable
 
1.intro. to distributed system
1.intro. to distributed system1.intro. to distributed system
1.intro. to distributed system
 
Architectural Design Report G4
Architectural Design Report G4Architectural Design Report G4
Architectural Design Report G4
 
Distributed Computing Report
Distributed Computing ReportDistributed Computing Report
Distributed Computing Report
 
Report_Internships
Report_InternshipsReport_Internships
Report_Internships
 
Over view of software artitecture
Over view of software artitectureOver view of software artitecture
Over view of software artitecture
 

More from Adetimehin Oluwasegun Matthew

More from Adetimehin Oluwasegun Matthew (7)

Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Machine Learning - Deep Learning
Machine Learning - Deep LearningMachine Learning - Deep Learning
Machine Learning - Deep Learning
 
Personal Branding - Necessity for DevOps Engineers
Personal Branding - Necessity for DevOps EngineersPersonal Branding - Necessity for DevOps Engineers
Personal Branding - Necessity for DevOps Engineers
 
Relevance of academics to Industry
Relevance of academics to IndustryRelevance of academics to Industry
Relevance of academics to Industry
 
Choosing a Careeer in Information Technology
Choosing a Careeer in Information TechnologyChoosing a Careeer in Information Technology
Choosing a Careeer in Information Technology
 
Engineering Data Pipeline for Data-Driven Analytics
Engineering Data Pipeline for Data-Driven AnalyticsEngineering Data Pipeline for Data-Driven Analytics
Engineering Data Pipeline for Data-Driven Analytics
 
Becoming a world class engineer
Becoming a world class engineerBecoming a world class engineer
Becoming a world class engineer
 

Recently uploaded

CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Piping Basic stress analysis by engineering
Piping Basic stress analysis by engineeringPiping Basic stress analysis by engineering
Piping Basic stress analysis by engineeringJuanCarlosMorales19600
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitterShivangiSharma879191
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgsaravananr517913
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptJasonTagapanGulla
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
computer application and construction management
computer application and construction managementcomputer application and construction management
computer application and construction managementMariconPadriquez1
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 

Recently uploaded (20)

CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Piping Basic stress analysis by engineering
Piping Basic stress analysis by engineeringPiping Basic stress analysis by engineering
Piping Basic stress analysis by engineering
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter

Distributed Systems in Data Engineering Lesson

  • 1. Lesson Keynote
    Distributed Systems in Data Engineering
    By: Oluwasegun Matthew | oadetimehin@terragonltd.com

    Summary
    1. Introduction to Distributed Systems
       a. The concept of server-client architecture
       b. Channel for Communication
       c. Impact on Data Engineering at Scale
    2. From Localhost to Production - things to watch out for...
    3. Industry based Technologies/Tools in View
       a. Messaging kits - RabbitMQ & Kafka
       b. In Memory Data Caching - Redis & Aerospike
       c. Data in Stream Tools - AWS Kinesis
       d. Monitoring and Log Watch - CloudWatch
    4. Summary - in class
    5. Questions

    Class Activity: Form 4 groups, choose any of the messaging and in-memory data caching tools, and use it to create a resilient distributed system to fix the following problems:
    - Crashing nature of the e-Portal
    - Exam records processing

    Let’s Dive In...
  • 2. 1. Introduction to Distributed Systems
    According to Wikipedia through Google,

    A distributed system, in its simplest definition, is a group of computers working together so as to appear as a single computer to the end-user. These machines have a shared state, operate concurrently and can fail independently without affecting the whole system’s uptime.

    In line with the ever-growing technological expansion of the world, distributed systems are becoming more and more widespread. Take a look at the number of available computer technologies/innovations around: it keeps increasing, and this results in intense computational requirements.

    Yeah, Moore’s law promised more computing power by fitting more transistors (a count which approximately doubles every two years) onto a single chip in a cost-efficient way - cool, but over the past 5 years there has been a little deviation from this: the ability to scale horizontally, and not just vertically.
  • 3. The Concept of Server-Client Architecture
    Client-server architecture (client/server) is a network architecture in which each computer or process on the network is either a client or a server.

    Just as in the everyday world, activities are usually based on a server/client relationship, and technology is no different, e.g. Cashier/Customer, Bus Conductor/Passengers etc.

    Another type of network architecture is known as peer-to-peer architecture, because each node has equivalent responsibilities - but this isn’t what we are discussing today.

    The approach of breaking a larger application into chunks over a server-client architecture can be explained with Microservices. Consider the cases below:
  • 4. Case 1 - Monolith: At the core of the application is the business logic, which is implemented by modules that define services, domain objects, and events. Surrounding the core are adapters that interface with the external world. Examples of adapters include database access components, messaging components that produce and consume messages, and web components that either expose an API or implement a UI. Because everything lives in one application, this results in Monolithic Hell.
  • 5. Case 2 - Microservices: Here we are tackling complexity. A service typically implements a set of distinct features or functionality, such as order management, customer management etc. Each microservice is a mini-application that has its own hexagonal architecture consisting of business logic along with various adapters. Some microservices would expose an API that’s consumed by other microservices or by the application’s clients. Other microservices might implement a web UI. At runtime, each instance is often a cloud VM or a Docker container.
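    To make the idea of a microservice exposing an API to its clients more concrete, here is a minimal sketch using only the Python standard library. Everything named in it is an assumption made for illustration: the `ORDERS` data, the `/orders/<id>` path and the helper functions are hypothetical, and a real order management service would sit on top of its own database and framework.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data owned by the "order management" service.
ORDERS = {"1001": {"item": "textbook", "status": "shipped"}}


class OrderHandler(BaseHTTPRequestHandler):
    """Web adapter that exposes the order service over HTTP."""

    def do_GET(self):
        # Treat the last path segment as the order id, e.g. /orders/1001.
        order_id = self.path.rsplit("/", 1)[-1]
        order = ORDERS.get(order_id)
        status = 200 if order else 404
        body = json.dumps(order or {"error": "order not found"}).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


def start_order_service():
    """Start the service on a free local port and return the server object."""
    server = HTTPServer(("127.0.0.1", 0), OrderHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_order(port, order_id):
    """Client side: another service (or a UI) calling the order API."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", f"/orders/{order_id}")
    response = conn.getresponse()
    data = json.loads(response.read())
    conn.close()
    return response.status, data


if __name__ == "__main__":
    server = start_order_service()
    print(fetch_order(server.server_address[1], "1001"))
    server.shutdown()
```

    The client only knows the service's address and API, not how the orders are stored, which is the point of splitting the application this way.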
  • 6. Quiz
    Give…
    ● Examples of a client/server relationship in the real world
    ● Methods of binding two systems that you know
    ● Two architectures in which software is designed
    ● The major issue with Monolithic design
  • 7. Channel for Communication
    When we have a decentralized system, it’s important to make its parts communicate with one another. The client/server architecture emphasizes a producer/consumer computing architecture, where the server acts as the producer and the client as a consumer. The model of communication can be either synchronous or asynchronous, which gives two modes:
    - API Mode
    - Buffer Mode

    API Mode: a synchronous (or instant feedback) mode of communication. It is usually used for one-to-one communication through protocols like HTTP, HTTPS, SMTP, SMPP etc.

    Buffer Mode: an asynchronous mode of communication, where feedback isn’t needed immediately. It works for both one-to-one and broadcast communication. In this mode, a queuing/messaging/buffering system is placed between the two systems to manage the flow of information. Here the following queuing algorithms are emphasized:
    - FIFO (First In First Out)
    - LIFO (Last In First Out)
    - SJF (Shortest Job First)
    - Round Robin

    Impact on Data Engineering at Scale
    Again, bringing the concept of distributed systems into data engineering... Hey, what’s data engineering?

    Data engineering is the act of building and managing information or “big data” infrastructure. Data engineers create architecture that helps analyze and process data in the way it’s needed by an organization, from data processing to creating a pipeline of data into a lake and warehouse for business value creation.

    The following are some of the positive impacts of distributed systems in data engineering:
    - Creating resilient data architecture
    - Easily managed systems
  • 8. - Security and control
    - Reduced failure point
    - Fault detection with ease
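    The four queuing algorithms listed under Buffer Mode can be compared on the same small workload. The sketch below is illustrative only: the job names, their costs and the `quantum` value are made up, and real messaging systems typically offer only some of these orderings.

```python
import heapq
from collections import deque


def fifo(jobs):
    """First In First Out: serve jobs in arrival order."""
    queue = deque(jobs)
    order = []
    while queue:
        order.append(queue.popleft()[0])
    return order


def lifo(jobs):
    """Last In First Out: serve the most recent arrival first."""
    stack = list(jobs)
    order = []
    while stack:
        order.append(stack.pop()[0])
    return order


def sjf(jobs):
    """Shortest Job First: serve the job with the smallest cost first."""
    # The arrival index breaks ties between jobs of equal cost.
    heap = [(cost, index, name) for index, (name, cost) in enumerate(jobs)]
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap)[2])
    return order


def round_robin(jobs, quantum=2):
    """Round Robin: each job gets `quantum` units, then rejoins the queue.

    Returns the order in which jobs finish.
    """
    queue = deque(jobs)
    finished = []
    while queue:
        name, remaining = queue.popleft()
        remaining -= quantum
        if remaining > 0:
            queue.append((name, remaining))
        else:
            finished.append(name)
    return finished


# Hypothetical workload: (job name, units of work), listed in arrival order.
JOBS = [("report", 5), ("email", 1), ("backup", 3)]

if __name__ == "__main__":
    for policy in (fifo, lifo, sjf, round_robin):
        print(policy.__name__, policy(JOBS))
```

    With this workload, FIFO serves report, email, backup; LIFO reverses that; SJF and Round Robin (quantum of 2) both finish email first, then backup, then report.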
  • 9. Quiz
    Mention...
    ● 2 queue algorithms you are familiar with
    ● Web technologies that run on the HTTP protocol
  • 10. 2. From Localhost to Production - things to watch out for...
    When systems are built in a development environment, a lot isn’t considered; this may be due to a lack of experience, a lack of the right information, or unforeseen circumstances. This implies that a system cannot be called perfect at the development stage until it’s tested in a real-life scenario.

    Sometimes an overkill system design is a major flaw of the development phase, but only production will really tell.

    List of things to watch out for:
    - Unexpected spike in platform/technology usage - system overload
    - Performance as a result of consistent platform usage
    - Security of interconnected systems
    - Extensibility of features
    - Ease of deployment
  • 11. Enough of theoretical exposition, let’s go practical…

    3. Industry based Technologies/Tools in View
    Here we shall talk about the different tools used in the industry to manage distributed systems.

    Messaging Kits - e.g. RabbitMQ or Kafka

    RabbitMQ is the most widely deployed open source message broker - https://www.rabbitmq.com/
    Tutorial Guide (in PHP) - https://www.rabbitmq.com/tutorials/tutorial-three-php.html
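    As a rough picture of what a message broker does between a producer and its consumers, here is an in-memory sketch of broadcast (fanout) delivery. It is not the RabbitMQ or Kafka API: `FanoutExchange`, `consume` and `run_demo` are hypothetical names, and a real setup would talk to a broker over the network through a client library.

```python
import queue
import threading


class FanoutExchange:
    """Hypothetical in-memory broker: copies each message to every bound queue."""

    def __init__(self):
        self._queues = []

    def bind(self):
        """Create a new queue for one consumer and attach it to the exchange."""
        q = queue.Queue()
        self._queues.append(q)
        return q

    def publish(self, message):
        """Deliver a copy of the message to every bound queue."""
        for q in self._queues:
            q.put(message)


STOP = object()  # sentinel telling a consumer to finish


def consume(q, received):
    """Consumer loop: take messages off the queue until STOP arrives."""
    while True:
        message = q.get()
        if message is STOP:
            break
        received.append(message)


def run_demo(messages, consumer_count=2):
    """Publish `messages` once and return what each consumer received."""
    exchange = FanoutExchange()
    inboxes = [[] for _ in range(consumer_count)]
    workers = []
    for inbox in inboxes:
        worker = threading.Thread(target=consume, args=(exchange.bind(), inbox))
        worker.start()
        workers.append(worker)
    for message in messages:
        exchange.publish(message)  # the producer does not wait for consumers
    exchange.publish(STOP)
    for worker in workers:
        worker.join()
    return inboxes


if __name__ == "__main__":
    print(run_demo(["exam-record-1", "exam-record-2"]))
```

    Each consumer reads from its own queue at its own pace, so a slow consumer does not block the producer or the other consumers.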
  • 14. In Memory Data Caching - e.g. Redis or Aerospike

    Redis is an open source in-memory data structure store used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs etc. - https://redis.io/

    Documentation found here for PHP: https://github.com/amphp/redis
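    One common way to use an in-memory store as a cache is the cache-aside pattern: read from the cache first and only go to the slow data source on a miss. The sketch below assumes that pattern (the slides do not prescribe one) and uses a small in-process `TTLCache` class as a stand-in for Redis; the class, the `exam:` key prefix and `get_exam_record` are hypothetical names, not part of any Redis client.

```python
import time


class TTLCache:
    """Hypothetical in-memory cache with per-key expiry (stand-in for Redis)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, key, value, ttl_seconds):
        """Store a value together with the time at which it expires."""
        self._store[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]
            return None
        return value


def get_exam_record(student_id, cache, load_from_database, ttl_seconds=60):
    """Cache-aside read: try the cache, fall back to the slow source on a miss."""
    key = f"exam:{student_id}"
    record = cache.get(key)
    if record is None:
        record = load_from_database(student_id)
        cache.set(key, record, ttl_seconds)
    return record
```

    Repeated reads for the same student within the expiry window are served from memory, which takes load off the database behind a busy portal.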
  • 15. Data in Stream Tools - AWS Kinesis

    AWS Kinesis, owned by Amazon, makes it easy to collect, process and analyze real-time streaming data so you can get timely insights and react quickly to new information - https://aws.amazon.com/kinesis/
  • 16. Monitoring and Logs Watch - CloudWatch

    AWS CloudWatch is a monitoring and management service built for developers, system operators, site reliability engineers (SRE), and IT managers - https://aws.amazon.com/cloudwatch/

    Assessment
    See class activity on the first page...
  • 17. Questions and Mentorship
    For further questions, collaboration or mentorship, reach out:
    Email: oadetimehin@terragonltd.com
    Mobile: 07060514642