SlideShare a Scribd company logo
Building Scalable Big Data
Pipelines
Christian Gügi, Solution Architect
19.09.2013
NOSQL SEARCH ROADSHOW ZURICH
AGENDA
 Opportunities & Challenges
 Integrating Hadoop
 Lambda Architecture
 Lambda in Practice
 Recommendations
ABOUT ME
 Solution Architect @ YMC
 Founder and organizer Swiss Big Data User Group
  http://www.bigdata-usergroup.ch/
 Contact
  christian.guegi@ymc.ch
  http://about.me/cguegi
  @chrisgugi
ABOUT YMC
 Founded in 2001
 Based in Kreuzlingen, Switzerland
 Big Data Analytics, Web Solutions and
Mobile Applications
 24 experts
  Consulting, creation, engineering
OPPORTUNITIES &
A.  New sources and types from inside & outside organisations
  “Internet of things”, sensors, RFID, intelligent devices, etc.
  Unstructured information – documents, web logs, email, social
media, etc.
  Trusted 3rd party sources – industry provider & aggregators,
governments “Open Data”, weather, etc.
B.  Technology innovations to exploit new world of data
  Low cost storage and process power (cloud, on-premise &
hybrid)
  New software patterns to handle speed & volume, structured
and unstructured (In-memory computation, Hadoop,
Mapreduce, etc.)
  Revolution in user experience, analytics, recommendations
BIG DATA – WHAT IS THE BIG DEAL?
BIG DATA – CHALLENGES
• Align business
strategy
• Data Management
• Privacy protection
• Lack of skilled and
experienced people
• Volume
• Velocity
• Variety
• Veracity
Character
of data
Overwhelming
landscape &
integration
Available
talent
Organisational
issues
INTEGRATING
TYPICAL RDBMS SZENARIO
Data
Sources
Data
Systems
Apps
DWH RDBMS
RDBMS NFS Others
BI Web Mobile
ETL
BIG DATA SZENARIO
Data
Sources
Data
Systems
Apps
DWH RDBMS
RDBMS NFS Logs
BI Web Mobile
Social
Media
Sensors
Hadoop
1) Recommendations, etc.
1)
HADOOP ECOSYSTEM
LAMBDA
ARCHITECTURE
 Credits Nathan Marz
 Former Engineer at Twitter
 Storm, Cascalog, ElephantDB
LAMBDA
http://www.manning.com/marz/
DESIGN PRINCIPLES
Lambda Architecture
 Human fault-tolerance
 Data immutability
 Re-computation
HUMAN FAULT-TOLERANCE
Lambda Architecture
 Design for human error
  Bugs in code
  Accidental data loss
  Data corruption
 Protect good data, so you can always fix
what went wrong
DATA IMMUTABILIY
Lambda Architecture
 Store data in it’s rawest form
 Create and read but no update
 No data can be lost
  To fix the system just delete bad data
  Can always revert to a true state
DATA IMMUTABILIY
Lambda Architecture
Name Location Time
Alice Zurich 2009/03/29
Bob Lucerne 2012/04/12
Tom Bern 2010/04/09
Name Location Time
Alice Zurich 2009/03/29
Bob Lucerne 2012/04/12
Tom Bern 2010/04/09
Alice Basel 2013/08/20
Name Location
Alice Zurich
Bob Lucerne
Tom Bern
Name Location
Alice Basel
Bob Lucerne
Tom Bern
Capturing change traditionally (mutability)
Capturing change (immutability)
RE-COMPUTATION
Lambda Architecture
 Always able to re-compute from historical
data
 Basis for all data systems
  query = function(all data)
All Data
Pre-computed
views
Query
LAYERS
Lambda Architecture
http://www.ymc.ch/en/lambda-architecture-part-1
Lambda in Practice
ONLINE MARKETING
 Tracking and analytics solution
 Improve customer targeting and
segmentation
 Various reports
 Real-time not required
OVERVIEW
HBase
FTP
HDFS
AdServer
Flume
log
HDFS
Campaign
Database
Sqoop
csv
csv
Up- &
Download
Hive
fs -put
Aggregated
Data
Web
PigImpala
DWH
BI apps
Oozie ZooKeeper
Cloudera
Manager
DATA PIPELINE
FTP
HDFS
AdServer
Flume
log
HDFS
Campaign
Database
Sqoop
csv
csv
fs -put
M/R
Avro
Avro
Avro
Extracting Transformation
M/R
M/R
Loading
Tracking
Profiles
Bulk Importer
DWH
ADVANTAGES
 Extensible – easily add speed layer later on
 Complements existing DWH/BI system
 ETL phases are decoupled
 Reliable
  Infrastructure
  Each step can be replayed
 Scalable
  Storage
  Processing
 Highly available
 Ad-hoc analysis right from the beginning
RECOMMENDATIONS
RECOMMENDATIONS
 Not a fixed, one-size-fits-all approach
  Adopt to your needs/requirements
 Hadoop complements existing systems
 How real-time do I need to be?
 Immutability and pre-computation are just good
ideas!
  Store information in rawest format possible
  Use a serialization framework (Avro, Thrift, Protocol
Buffers)
THANK YOU!
YMC AG
Sonnenstrasse 4
CH-8280 Kreuzlingen
Switzerland
Photo Credits:
Slide 05: Success opportunity achieve by Stephen McCulloch
Slide 08: Matrix by Gamaliel Espinoza Macedo.
Slide 12: Layers by Katelyn Leblanc
Slide 20: Mining For Information by JD Hancock
Slide 27: Warning Question by longzijun
@chrisgugi
CONTACT US
christian.guegi@ymc.ch
Tel. +41 (0)71 508 24 76
www.ymc.ch

More Related Content

What's hot

Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache SparkReal-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
Databricks
 
An efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraAn efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and Cassandra
Stratio
 
Credit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisCredit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph Analysis
Jen Aman
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
Mars Lan
 
JEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache SparkJEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache Spark
Taras Matyashovsky
 
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Databricks
 
Presto Summit 2018 - 08 - FINRA
Presto Summit 2018  - 08 - FINRAPresto Summit 2018  - 08 - FINRA
Presto Summit 2018 - 08 - FINRA
kbajda
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
Databricks
 
Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story
Roman Chukh
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Databricks
 
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Spark Summit
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
Spark Summit
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ Indix
Rajesh Muppalla
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
Dremio Corporation
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream DataDataWorks Summit
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
Gary Stafford
 
Stratio CrossData: an efficient distributed datahub with batch and streaming ...
Stratio CrossData: an efficient distributed datahub with batch and streaming ...Stratio CrossData: an efficient distributed datahub with batch and streaming ...
Stratio CrossData: an efficient distributed datahub with batch and streaming ...
Stratio
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Big Data Spain
 
Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture
Tin Ho
 

What's hot (20)

Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache SparkReal-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
 
An efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraAn efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and Cassandra
 
Credit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisCredit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph Analysis
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
 
JEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache SparkJEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache Spark
 
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
 
Presto Summit 2018 - 08 - FINRA
Presto Summit 2018  - 08 - FINRAPresto Summit 2018  - 08 - FINRA
Presto Summit 2018 - 08 - FINRA
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
Open source data ingestion
Open source data ingestionOpen source data ingestion
Open source data ingestion
 
Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
 
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ Indix
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream Data
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Stratio CrossData: an efficient distributed datahub with batch and streaming ...
Stratio CrossData: an efficient distributed datahub with batch and streaming ...Stratio CrossData: an efficient distributed datahub with batch and streaming ...
Stratio CrossData: an efficient distributed datahub with batch and streaming ...
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture
 

Viewers also liked

Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
Hakka Labs
 
Building a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkBuilding a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkDataWorks Summit
 
A Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiA Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with Luigi
Growth Intelligence
 
Designing Data Pipelines Using Hadoop
Designing Data Pipelines Using HadoopDesigning Data Pipelines Using Hadoop
Designing Data Pipelines Using Hadoop
DataWorks Summit
 
Building data pipelines
Building data pipelinesBuilding data pipelines
Building data pipelines
Jonathan Holloway
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solid
Lars Albertsson
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Evan Chan
 
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
Amazon Web Services
 
Test strategies for data processing pipelines
Test strategies for data processing pipelinesTest strategies for data processing pipelines
Test strategies for data processing pipelines
Lars Albertsson
 
Managing data workflows with Luigi
Managing data workflows with LuigiManaging data workflows with Luigi
Managing data workflows with Luigi
Teemu Kurppa
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
datamantra
 
Interactive workflow management using Azkaban
Interactive workflow management using AzkabanInteractive workflow management using Azkaban
Interactive workflow management using Azkaban
datamantra
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
Amazon Web Services
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
Rafal Kwasny
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
Allen (Xiaozhong) Wang
 
Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Thoughtworks
 
Big data prototyping in AWS cloud
Big data prototyping in AWS cloudBig data prototyping in AWS cloud
Big data prototyping in AWS cloud
Samuel Yee
 
BI Suite Overview
BI Suite OverviewBI Suite Overview
BI Suite Overview
Bruno Saraiva
 
Big data pipelines
Big data pipelinesBig data pipelines
Big data pipelines
Vivek Aanand Ganesan
 
Getting Started with J2EE, A Roadmap
Getting Started with J2EE, A RoadmapGetting Started with J2EE, A Roadmap
Getting Started with J2EE, A Roadmap
Makarand Bhatambarekar
 

Viewers also liked (20)

Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
 
Building a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkBuilding a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache Spark
 
A Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiA Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with Luigi
 
Designing Data Pipelines Using Hadoop
Designing Data Pipelines Using HadoopDesigning Data Pipelines Using Hadoop
Designing Data Pipelines Using Hadoop
 
Building data pipelines
Building data pipelinesBuilding data pipelines
Building data pipelines
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solid
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
 
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
 
Test strategies for data processing pipelines
Test strategies for data processing pipelinesTest strategies for data processing pipelines
Test strategies for data processing pipelines
 
Managing data workflows with Luigi
Managing data workflows with LuigiManaging data workflows with Luigi
Managing data workflows with Luigi
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
 
Interactive workflow management using Azkaban
Interactive workflow management using AzkabanInteractive workflow management using Azkaban
Interactive workflow management using Azkaban
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...
 
Big data prototyping in AWS cloud
Big data prototyping in AWS cloudBig data prototyping in AWS cloud
Big data prototyping in AWS cloud
 
BI Suite Overview
BI Suite OverviewBI Suite Overview
BI Suite Overview
 
Big data pipelines
Big data pipelinesBig data pipelines
Big data pipelines
 
Getting Started with J2EE, A Roadmap
Getting Started with J2EE, A RoadmapGetting Started with J2EE, A Roadmap
Getting Started with J2EE, A Roadmap
 

Similar to Building Scalable Big Data Pipelines

Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data Solutions
Guido Schmutz
 
Big data presentation, explanations and use cases in industrial sector
Big data presentation, explanations and use cases in industrial sectorBig data presentation, explanations and use cases in industrial sector
Big data presentation, explanations and use cases in industrial sector
Nicolas Sarramagna
 
BIM Conversion & Analysis Workshop: Story of the I-35W Bridge Collapse
BIM Conversion & Analysis Workshop: Story of the I-35W Bridge CollapseBIM Conversion & Analysis Workshop: Story of the I-35W Bridge Collapse
BIM Conversion & Analysis Workshop: Story of the I-35W Bridge Collapse
Safe Software
 
Internet of Things (IoT) and Big Data
Internet of Things (IoT) and Big DataInternet of Things (IoT) and Big Data
Internet of Things (IoT) and Big Data
Guido Schmutz
 
The sensor data challenge - Innovations (not only) for the Internet of Things
The sensor data challenge - Innovations (not only) for the Internet of ThingsThe sensor data challenge - Innovations (not only) for the Internet of Things
The sensor data challenge - Innovations (not only) for the Internet of Things
Stephan Reimann
 
Architektur von Big Data Lösungen
Architektur von Big Data LösungenArchitektur von Big Data Lösungen
Architektur von Big Data Lösungen
Guido Schmutz
 
Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data Architectures
Guido Schmutz
 
Trivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis Azure Data Lake
Trivadis Azure Data Lake
Trivadis
 
IoT Architecture - Are Traditional Architectures Good Enough or do we Need Ne...
IoT Architecture - Are Traditional Architectures Good Enough or do we Need Ne...IoT Architecture - Are Traditional Architectures Good Enough or do we Need Ne...
IoT Architecture - Are Traditional Architectures Good Enough or do we Need Ne...
Guido Schmutz
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
Guido Schmutz
 
Prague data management meetup #30 2019-10-04
Prague data management meetup #30 2019-10-04Prague data management meetup #30 2019-10-04
Prague data management meetup #30 2019-10-04
Martin Bém
 
IBM CDS Overview
IBM CDS OverviewIBM CDS Overview
IBM CDS OverviewJean Tan
 
Big Data and Fast Data combined – is it possible?
Big Data and Fast Data combined – is it possible?Big Data and Fast Data combined – is it possible?
Big Data and Fast Data combined – is it possible?
Swiss Data Forum Swiss Data Forum
 
Internet of Things and Big Data
Internet of Things and Big DataInternet of Things and Big Data
Internet of Things and Big Data
Swiss Data Forum Swiss Data Forum
 
Building Audi’s enterprise big data platform
Building Audi’s enterprise big data platformBuilding Audi’s enterprise big data platform
Building Audi’s enterprise big data platform
DataWorks Summit
 
Sycamore Quantum Computer 2019 developed.pptx
Sycamore Quantum Computer 2019 developed.pptxSycamore Quantum Computer 2019 developed.pptx
Sycamore Quantum Computer 2019 developed.pptx
shujee381
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
Mark Kromer
 
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
Amazon Web Services
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
James Serra
 
Introduction to Big Data An analogy between Sugar Cane & Big Data
Introduction to Big Data An analogy  between Sugar Cane & Big DataIntroduction to Big Data An analogy  between Sugar Cane & Big Data
Introduction to Big Data An analogy between Sugar Cane & Big DataJean-Marc Desvaux
 

Similar to Building Scalable Big Data Pipelines (20)

Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data Solutions
 
Big data presentation, explanations and use cases in industrial sector
Big data presentation, explanations and use cases in industrial sectorBig data presentation, explanations and use cases in industrial sector
Big data presentation, explanations and use cases in industrial sector
 
BIM Conversion & Analysis Workshop: Story of the I-35W Bridge Collapse
BIM Conversion & Analysis Workshop: Story of the I-35W Bridge CollapseBIM Conversion & Analysis Workshop: Story of the I-35W Bridge Collapse
BIM Conversion & Analysis Workshop: Story of the I-35W Bridge Collapse
 
Internet of Things (IoT) and Big Data
Internet of Things (IoT) and Big DataInternet of Things (IoT) and Big Data
Internet of Things (IoT) and Big Data
 
The sensor data challenge - Innovations (not only) for the Internet of Things
The sensor data challenge - Innovations (not only) for the Internet of ThingsThe sensor data challenge - Innovations (not only) for the Internet of Things
The sensor data challenge - Innovations (not only) for the Internet of Things
 
Architektur von Big Data Lösungen
Architektur von Big Data LösungenArchitektur von Big Data Lösungen
Architektur von Big Data Lösungen
 
Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data Architectures
 
Trivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis Azure Data Lake
Trivadis Azure Data Lake
 
IoT Architecture - Are Traditional Architectures Good Enough or do we Need Ne...
IoT Architecture - Are Traditional Architectures Good Enough or do we Need Ne...IoT Architecture - Are Traditional Architectures Good Enough or do we Need Ne...
IoT Architecture - Are Traditional Architectures Good Enough or do we Need Ne...
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Prague data management meetup #30 2019-10-04
Prague data management meetup #30 2019-10-04Prague data management meetup #30 2019-10-04
Prague data management meetup #30 2019-10-04
 
IBM CDS Overview
IBM CDS OverviewIBM CDS Overview
IBM CDS Overview
 
Big Data and Fast Data combined – is it possible?
Big Data and Fast Data combined – is it possible?Big Data and Fast Data combined – is it possible?
Big Data and Fast Data combined – is it possible?
 
Internet of Things and Big Data
Internet of Things and Big DataInternet of Things and Big Data
Internet of Things and Big Data
 
Building Audi’s enterprise big data platform
Building Audi’s enterprise big data platformBuilding Audi’s enterprise big data platform
Building Audi’s enterprise big data platform
 
Sycamore Quantum Computer 2019 developed.pptx
Sycamore Quantum Computer 2019 developed.pptxSycamore Quantum Computer 2019 developed.pptx
Sycamore Quantum Computer 2019 developed.pptx
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
 
Introduction to Big Data An analogy between Sugar Cane & Big Data
Introduction to Big Data An analogy  between Sugar Cane & Big DataIntroduction to Big Data An analogy  between Sugar Cane & Big Data
Introduction to Big Data An analogy between Sugar Cane & Big Data
 

More from Christian Gügi

Real-Time Fraud Detection in Payment Transactions
Real-Time Fraud Detection in Payment TransactionsReal-Time Fraud Detection in Payment Transactions
Real-Time Fraud Detection in Payment Transactions
Christian Gügi
 
Case Study: In-Store Analysis
Case Study: In-Store AnalysisCase Study: In-Store Analysis
Case Study: In-Store Analysis
Christian Gügi
 
Apache HBase: Introduction to a column-oriented data store
Apache HBase: Introduction to a column-oriented data storeApache HBase: Introduction to a column-oriented data store
Apache HBase: Introduction to a column-oriented data storeChristian Gügi
 
Apachecon Europe 2012: Operating HBase - Things you need to know
Apachecon Europe 2012: Operating HBase - Things you need to knowApachecon Europe 2012: Operating HBase - Things you need to know
Apachecon Europe 2012: Operating HBase - Things you need to know
Christian Gügi
 
Online Media Data Stream Processing with Kafka
Online Media Data Stream Processing with KafkaOnline Media Data Stream Processing with Kafka
Online Media Data Stream Processing with KafkaChristian Gügi
 
Near Real Time Processing of Social Media Data with HBase
Near Real Time Processing of Social Media Data with HBaseNear Real Time Processing of Social Media Data with HBase
Near Real Time Processing of Social Media Data with HBase
Christian Gügi
 
Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...
Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...
Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...
Christian Gügi
 

More from Christian Gügi (7)

Real-Time Fraud Detection in Payment Transactions
Real-Time Fraud Detection in Payment TransactionsReal-Time Fraud Detection in Payment Transactions
Real-Time Fraud Detection in Payment Transactions
 
Case Study: In-Store Analysis
Case Study: In-Store AnalysisCase Study: In-Store Analysis
Case Study: In-Store Analysis
 
Apache HBase: Introduction to a column-oriented data store
Apache HBase: Introduction to a column-oriented data storeApache HBase: Introduction to a column-oriented data store
Apache HBase: Introduction to a column-oriented data store
 
Apachecon Europe 2012: Operating HBase - Things you need to know
Apachecon Europe 2012: Operating HBase - Things you need to knowApachecon Europe 2012: Operating HBase - Things you need to know
Apachecon Europe 2012: Operating HBase - Things you need to know
 
Online Media Data Stream Processing with Kafka
Online Media Data Stream Processing with KafkaOnline Media Data Stream Processing with Kafka
Online Media Data Stream Processing with Kafka
 
Near Real Time Processing of Social Media Data with HBase
Near Real Time Processing of Social Media Data with HBaseNear Real Time Processing of Social Media Data with HBase
Near Real Time Processing of Social Media Data with HBase
 
Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...
Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...
Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...
 

Recently uploaded

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 

Recently uploaded (20)

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 

Building Scalable Big Data Pipelines