BIG DATA ECOSYSTEM
Lucian Neghina
Big Data & Cloud Computing, by Developer for Developers

1. Introduction
Let's see what Big Data is.
BIG DATA CHALLENGES: THE 3 Vs OF BIG DATA
VOLUME: terabytes, records, transactions, tables, files
VELOCITY: batch, near real time, real time, streaming
VARIETY: structured, unstructured, semi-structured, all of the above
BIG DATA PIPELINE
Source → Ingest → Store → Process → Analyze → Visualize
● Source: structured, semi-structured, unstructured
● Ingest: messaging, API/ODBC, ETL, replication
● Store: data lake, operational data store
● Process: real time, batch
● Analyze: interactive, AI/ML
● Visualize: web dashboards, mobile devices, web services
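To make the stage boundaries concrete, here is a toy, single-process sketch in Python. Every function name and the sample records are hypothetical; in a real pipeline each stage would be a separate system (for example Kafka for ingestion, HDFS for storage, Spark for processing), not a function call.

```python
import json
from collections import Counter

# Hypothetical raw input: two CSV rows (structured) and one JSON row (semi-structured).
SOURCE = [
    "2024-01-01,login,alice",
    "2024-01-01,purchase,bob",
    '{"date": "2024-01-02", "event": "login", "user": "carol"}',
]

def ingest(lines):
    """Ingest: normalize heterogeneous records into dictionaries."""
    for line in lines:
        if line.startswith("{"):
            yield json.loads(line)
        else:
            date, event, user = line.split(",")
            yield {"date": date, "event": event, "user": user}

def store(records, lake):
    """Store: append records to a list standing in for a data lake."""
    lake.extend(records)
    return lake

def process(lake):
    """Process (batch): count events per type."""
    return Counter(record["event"] for record in lake)

def analyze(counts):
    """Analyze: return the most frequent event type and its count."""
    return counts.most_common(1)[0]

def visualize(counts):
    """Visualize: render a plain-text bar chart, one row per event type."""
    return "\n".join(f"{event:<10}{'#' * n}" for event, n in sorted(counts.items()))

lake = store(ingest(SOURCE), [])
counts = process(lake)
print(analyze(counts))   # ('login', 2)
print(visualize(counts))
```

Keeping each stage behind its own function mirrors the diagram: any one stage can be swapped for a different technology without touching the others.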
BIG DATA POPULAR USE CASES
● Fraud detection
● Security intelligence
● Price optimization
● Behavioral analytics
● Recommendation engines
● Social media analysis and response
● Internet of Things
● Financial trading
● Improving science and research
● Performance optimization
● Improving healthcare
“Big Data refers to data sets that are too large, complex and dynamic for conventional data tools to capture, store, manage and analyze.”
2. Ingest (Big Data Component)
DATA INGESTION: WHAT IS IT?
Source systems → Ingest / Collect → Destination system
Categories of data:
● Data in motion
● Data at rest
DATA INGESTION: SQOOP
RDBMS (PostgreSQL, Oracle, MySQL, ...) → Sqoop import → Sqoop job (parallel map tasks) → HDFS on the Hadoop cluster
Sqoop export moves data in the reverse direction, from HDFS back to the RDBMS.
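The parallel map tasks in a Sqoop job come from splitting the source table on a key column: Sqoop's `--split-by` option names the column, `-m` (`--num-mappers`) sets the number of map tasks, and each task imports its own slice of the key range. The Python sketch below shows only that range-splitting idea. The function names are hypothetical, and the exact boundaries Sqoop derives from the column's minimum and maximum may differ.

```python
def split_ranges(min_id: int, max_id: int, num_mappers: int):
    """Divide the inclusive key range [min_id, max_id] into contiguous,
    non-overlapping ranges of near-equal size, one per map task."""
    if max_id < min_id:
        return []
    total = max_id - min_id + 1
    num_mappers = max(1, min(num_mappers, total))   # never more tasks than keys
    size, extra = divmod(total, num_mappers)
    ranges, lo = [], min_id
    for i in range(num_mappers):
        hi = lo + size - 1 + (1 if i < extra else 0)   # first `extra` ranges get one more key
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

def where_clauses(column: str, ranges):
    """The per-task filter each map task would apply to its SELECT."""
    return [f"{column} >= {lo} AND {column} <= {hi}" for lo, hi in ranges]

# Hypothetical table whose integer key "id" runs from 1 to 1000, imported by 3 map tasks.
for clause in where_clauses("id", split_ranges(1, 1000, 3)):
    print(clause)
```

Because the ranges do not overlap, the map tasks can read the table concurrently without coordinating with each other; a skewed key distribution, however, gives some tasks far more rows than others.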
DATA INGESTION: KAFKA
Connect: Data source → Kafka Connect → Kafka cluster → Kafka Connect → Data sink
Producer-consumer: Producers → Kafka cluster → Consumers
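Kafka decouples producers from consumers by storing records in partitioned, append-only logs: a record's key picks its partition, and each consumer tracks how far it has read with an offset. The toy Python model below illustrates those two ideas only. The `Topic` and `Consumer` classes are hypothetical and are not a Kafka client API, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner uses.

```python
import zlib
from collections import defaultdict

class Topic:
    """Toy stand-in for a Kafka topic: a fixed number of append-only partition logs."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> tuple:
        # Same key -> same partition, which preserves per-key ordering.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1   # (partition, offset)

class Consumer:
    """Reads each partition starting from its own committed offset."""

    def __init__(self, topic: Topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self):
        batch = []
        for p, log in enumerate(self.topic.partitions):
            batch.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)   # "commit" the new position
        return batch

topic = Topic(num_partitions=3)
for i in range(5):
    topic.produce(key=f"sensor-{i % 2}", value=f"reading-{i}")

consumer = Consumer(topic)
print(len(consumer.poll()))  # 5: every record is delivered once
print(len(consumer.poll()))  # 0: offsets were advanced
```

Since the log is not deleted on read, a second consumer with its own offsets can replay the same records independently.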
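DATA INGESTION: FLUME
External source → Flume agent → HDFS, file
Inside the agent, the source receives events and puts them into channels (Channel 1, Channel 2); each channel is drained by its own sink (Sink 1, Sink 2), which delivers the events to HDFS or to a file.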
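A Flume agent is a source feeding one or more channels, each drained by a sink. With Flume's default replicating channel selector, every event goes to every channel, so both HDFS and the file receive the full stream. The Python sketch below models that flow in memory; the class names are hypothetical, not Flume's Java API, and real channels (memory, file) add transactions and capacity limits.

```python
from collections import deque

class Channel:
    """Buffer between the agent's source and one sink (like a memory channel)."""

    def __init__(self):
        self.events = deque()

    def put(self, event):
        self.events.append(event)

    def take(self):
        return self.events.popleft() if self.events else None

class Agent:
    """Toy Flume-style agent: one source, and one channel plus sink per destination."""

    def __init__(self, destinations):
        # Each route pairs a channel with the destination its sink writes to.
        self.routes = [(Channel(), dest) for dest in destinations]

    def source(self, event):
        # Replicating selector: every channel receives the event.
        for channel, _ in self.routes:
            channel.put(event)

    def run_sinks(self):
        # Each sink drains its channel into its destination.
        for channel, dest in self.routes:
            while (event := channel.take()) is not None:
                dest.append(event)

hdfs, local_file = [], []          # lists standing in for HDFS and a local file
agent = Agent([hdfs, local_file])
for line in ["GET /", "GET /about", "POST /login"]:
    agent.source({"headers": {}, "body": line})   # Flume events have headers and a body
agent.run_sinks()
print(len(hdfs), len(local_file))  # 3 3
```

The channel is what lets the source and sinks run at different speeds: a slow HDFS write only lets events accumulate in its channel instead of blocking ingestion.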
DATA INGESTION: NIFI
Edge data: IoT devices, mobile clients and containers, using client libraries or MiNiFi, connected through a MiNiFi gateway
Regional center: NiFi server cluster
Core data center: NiFi server cluster
Downstream systems: Kafka, Storm, Spark and others
3. Storage (Big Data Component)
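DATA STORAGE: CAP
● Consistency: all clients see the same data, regardless of updates or deletes.
● Availability: the system continues to operate as expected, even with node failures.
● Partition tolerance: the system continues to operate as expected despite network or message failures.
A distributed data store cannot guarantee all three properties at once, so the stores below are described by the pair they favor.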
DATA STORAGE: HDFS
● Distributed file system
● Master/slave architecture
● Provides file permissions and authentication
● High fault tolerance
● Reads and writes terabytes of data per second
● Streaming data access
● Replicates data for durability
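HDFS gets its fault tolerance by cutting each file into large blocks and storing several replicas of every block on different DataNodes, while the NameNode (the master) tracks where they live. The sketch below plans that layout for one file. The 128 MB block size and replication factor of 3 are the usual defaults of the `dfs.blocksize` and `dfs.replication` settings in Hadoop 2.x and later; the round-robin placement is a simplification of HDFS's rack-aware policy, and the function name is hypothetical.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB   # usual default of dfs.blocksize (Hadoop 2.x and later)
REPLICATION = 3         # usual default of dfs.replication

def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (block_index, length, replica_nodes) for each block of a file.

    Replicas of one block always land on distinct DataNodes. Nodes are
    chosen round-robin here; real HDFS placement is rack-aware."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    n_blocks = (file_size + block_size - 1) // block_size   # ceiling division
    plan = []
    for i in range(n_blocks):
        length = min(block_size, file_size - i * block_size)  # last block may be short
        nodes = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        plan.append((i, length, nodes))
    return plan

for block in plan_blocks(300 * MB, ["dn1", "dn2", "dn3", "dn4"]):
    print(block)
```

With three replicas on distinct nodes, any single DataNode can fail without losing a block, and readers can be served from whichever replica is closest.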
DATA STORAGE: HBASE
● NoSQL database
● CAP: consistency and partition tolerance
● No data types (values are stored as raw bytes)
● Stores data in HDFS
● Optimized for reads
● Column-oriented
● Automatic sharding and load balancing
● Master/slave architecture
● Supports aggregation
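HBase keeps rows sorted by row key and shards each table into regions that cover contiguous key ranges; when a region grows too large it is split in two, and the halves can be moved to other servers for load balancing. The toy Python model below shows only the key-range lookup and the split. The `Table` class, the row-count threshold (real HBase splits on size) and the sample keys are hypothetical, and this is not the HBase client API.

```python
import bisect

class Table:
    """Toy model of HBase-style sharding: regions cover sorted key ranges, and a
    region holding more than max_rows rows is split at its middle key."""

    def __init__(self, max_rows=4):
        self.max_rows = max_rows
        self.start_keys = [""]   # region i covers [start_keys[i], start_keys[i + 1])
        self.regions = [{}]      # one dict of row_key -> {column: value} per region

    def _region_index(self, row_key):
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key, column, value):
        i = self._region_index(row_key)
        self.regions[i].setdefault(row_key, {})[column] = value
        if len(self.regions[i]) > self.max_rows:
            self._split(i)

    def _split(self, i):
        keys = sorted(self.regions[i])
        mid = keys[len(keys) // 2]
        lower = {k: v for k, v in self.regions[i].items() if k < mid}
        upper = {k: v for k, v in self.regions[i].items() if k >= mid}
        self.regions[i:i + 1] = [lower, upper]
        self.start_keys.insert(i + 1, mid)

    def get(self, row_key):
        return self.regions[self._region_index(row_key)].get(row_key)

t = Table(max_rows=4)
for n in range(10):
    t.put(f"user{n:03d}", "info:name", f"name-{n}")
print(t.start_keys)      # ['', 'user002', 'user004', 'user006']
print(t.get("user007"))  # {'info:name': 'name-7'}
```

Because regions are found by a binary search over sorted start keys, a read touches exactly one region, which is part of why key design matters so much for HBase performance.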
DATA STORAGE: CASSANDRA
● NoSQL database
● Optimized for writes
● No single point of failure
● Column-oriented
● Tunable consistency
● Ring architecture
● CAP: availability and partition tolerance
● Scalable with large clusters
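Cassandra places nodes on a token ring: a key's token selects the first replica, and the following nodes on the ring hold the other copies. Consistency is tunable per request: if the number of replicas that must answer a read (R) plus the number that must acknowledge a write (W) exceeds the replication factor, every read contacts at least one replica holding the latest write. That is why QUORUM reads with QUORUM writes are strongly consistent, while ONE/ONE is not. The Python sketch below illustrates both ideas under simplifying assumptions: one token per node, `zlib.crc32` instead of Cassandra's Murmur3 partitioner, and placement similar in spirit to SimpleStrategy. All names are hypothetical.

```python
import bisect
import zlib

def token(key: str) -> int:
    """Deterministic position on the ring (a stand-in for Murmur3)."""
    return zlib.crc32(key.encode())

class Ring:
    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds number of nodes")
        self.rf = replication_factor
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key):
        """First node whose token is >= the key's token (wrapping around),
        followed by the next rf - 1 nodes clockwise."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

def quorum(rf: int) -> int:
    """Majority of replicas, as used by the QUORUM consistency level."""
    return rf // 2 + 1

def reads_see_latest_write(r: int, w: int, rf: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    return r + w > rf

ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
print(ring.replicas("user:42"))                          # three distinct nodes
print(reads_see_latest_write(quorum(3), quorum(3), 3))   # True  (QUORUM / QUORUM)
print(reads_see_latest_write(1, 1, 3))                   # False (ONE / ONE)
```

Lowering R or W trades consistency for latency and availability, which is what "tunable consistency" means in practice.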
DATA STORAGE: SOLR
● Full-text search
● Linear scalability
● Distributed index
● Schema or schemaless
● Automatic index replication
● Inverted indexing
● Automatic failover and recovery
● Sharding and replication
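Solr's full-text search rests on an inverted index: instead of scanning every document, it maps each term to the documents that contain it, so a query only has to intersect a few lists. A minimal Python sketch of the idea follows. The tokenizer is a deliberately crude stand-in for Solr's configurable analyzers, the function names are hypothetical, and real indexes also store term positions and relevance statistics.

```python
import re
from collections import defaultdict

def tokenize(text: str):
    """Lowercase and split on non-word characters (a crude analyzer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def build_index(docs: dict):
    """Inverted index: term -> set of ids of documents containing that term."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query: str):
    """AND query: ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "Kafka streams events into Hadoop",
    2: "Solr indexes documents for full-text search",
    3: "Spark reads events from Kafka",
}
index = build_index(docs)
print(sorted(search(index, "kafka events")))  # [1, 3]
```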
DATA STORAGE: REDIS
● Key-value NoSQL database
● In-memory data store
● Persistence via snapshots or a journal
● Keys can have an expiry time
● Master/slave architecture
● Publish/subscribe system
● CAP: consistency and partition tolerance
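Redis lets a key carry an expiry time (for example `SET key value EX 30`, or `EXPIRE` on an existing key), and the `TTL` command reports the seconds left, -1 for a key without expiry and -2 for a missing key. The Python class below mimics that behavior in memory to show how expiry can work; it is a hypothetical sketch, not a Redis client. It expires keys lazily on access, whereas Redis also removes expired keys in the background, and it takes an injectable clock so the example is deterministic.

```python
import time

class ExpiringStore:
    """In-memory key-value store with optional per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}
        self.expires_at = {}

    def set(self, key, value, ex=None):
        """Store value; ex is a lifetime in seconds (None means no expiry)."""
        self.data[key] = value
        if ex is None:
            self.expires_at.pop(key, None)
        else:
            self.expires_at[key] = self.clock() + ex

    def _expired(self, key):
        """Delete the key and return True if its deadline has passed."""
        deadline = self.expires_at.get(key)
        if deadline is not None and self.clock() >= deadline:
            del self.data[key]
            del self.expires_at[key]
            return True
        return False

    def get(self, key):
        if key not in self.data or self._expired(key):
            return None
        return self.data[key]

    def ttl(self, key):
        """Seconds left, -1 without expiry, -2 if missing (Redis TTL conventions)."""
        if key not in self.data or self._expired(key):
            return -2
        deadline = self.expires_at.get(key)
        return -1 if deadline is None else deadline - self.clock()

now = [0.0]                                   # fake clock for a reproducible example
store = ExpiringStore(clock=lambda: now[0])
store.set("session:1", "alice", ex=30)
store.set("config", "v1")
now[0] = 10.0
print(store.ttl("session:1"))   # 20.0
now[0] = 31.0
print(store.get("session:1"))   # None
print(store.ttl("config"))      # -1
```

Expiring keys like this is what makes Redis a natural fit for sessions and caches, where stale entries should disappear on their own.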
DATA STORAGE: TITAN
● Graph database for very large graphs
● CAP properties depend on the storage backend
● Storage backends: Cassandra, HBase, Oracle Berkeley DB
● Geo, numeric range and full-text search through Elasticsearch, Solr or Lucene
● Supports ACID and eventual consistency
● Concurrent transactions and operational graph processing
● Elastic and linear scalability
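In a graph database such as Titan, data is modeled as vertices connected by labeled edges, and queries are traversals that follow edges rather than joins. The minimal in-memory Python sketch below illustrates that model with a bounded breadth-first traversal, answering "whom can ana reach through knows edges within two hops". It is hypothetical code that keeps properties on vertices only: Titan itself is queried with the Gremlin traversal language and persists the graph to a storage backend such as Cassandra or HBase.

```python
from collections import defaultdict, deque

class Graph:
    """Minimal in-memory property graph: vertices with properties and
    labeled, directed edges."""

    def __init__(self):
        self.props = {}
        self.out_edges = defaultdict(list)   # vertex -> [(label, target vertex)]

    def add_vertex(self, vid, **props):
        self.props[vid] = props

    def add_edge(self, src, label, dst):
        self.out_edges[src].append((label, dst))

    def reachable(self, start, label, max_hops):
        """Vertices reachable from start over `label` edges within max_hops (BFS)."""
        seen, frontier, found = {start}, deque([(start, 0)]), []
        while frontier:
            vertex, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for edge_label, target in self.out_edges[vertex]:
                if edge_label == label and target not in seen:
                    seen.add(target)
                    found.append(target)
                    frontier.append((target, depth + 1))
        return found

g = Graph()
for name in ["ana", "bob", "cat", "dan"]:
    g.add_vertex(name, kind="person")
g.add_edge("ana", "knows", "bob")
g.add_edge("bob", "knows", "cat")
g.add_edge("cat", "knows", "dan")
print(g.reachable("ana", "knows", max_hops=2))  # ['bob', 'cat']
```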
4. Process & Analyze (Big Data Component)
DATA PROCESSING
Batch: data arrives and is processed at certain intervals.
Near real time: the time between when data arrives and when it is processed is very small (micro-batches).
Real time: data arrives and is processed continuously.
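The three processing styles differ mainly in when work is triggered. The Python sketch below applies the same aggregation (`sum`) to one event stream in each style; all function names are hypothetical. Micro-batches are formed by count here for simplicity, while engines such as Spark Streaming usually form them by time interval.

```python
from typing import Callable, Iterable, Iterator, List

Handler = Callable[[List[int]], int]

def batch(events: List[int], handle: Handler) -> List[int]:
    """Batch: wait for the whole interval's data, then process it once."""
    return [handle(events)]

def micro_batches(events: Iterable[int], size: int, handle: Handler) -> Iterator[int]:
    """Near real time: process small groups as soon as each one fills up."""
    buffer: List[int] = []
    for event in events:
        buffer.append(event)
        if len(buffer) == size:
            yield handle(buffer)
            buffer = []
    if buffer:
        yield handle(buffer)   # flush the final, partial micro-batch

def streaming(events: Iterable[int], handle: Handler) -> Iterator[int]:
    """Real time: process every event individually as it arrives."""
    for event in events:
        yield handle([event])

events = [3, 1, 4, 1, 5, 9, 2]
print(batch(events, sum))                    # [25]
print(list(micro_batches(events, 3, sum)))   # [8, 15, 2]
print(list(streaming(events, sum)))          # [3, 1, 4, 1, 5, 9, 2]
```

The trade-off is latency versus overhead: batch does the least per-event work but answers last, while streaming answers immediately at the cost of handling every event on its own.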
DATA ANALYTICS
Interactive: a set of approaches to explore data, supporting exploration at the rate of human thought.
Machine learning: turning data into information using automated methods, without direct human intervention.
5. Visualization (Big Data Component)
Monitoring
DATA VISUALIZATION
Audiences: business users; data scientists and developers
Tools: notebooks, business intelligence, frameworks (D3.js, Chart.js, Google Charts)
Thank you!
@eSolutionsGrup
www.esolutions.ro
