BIG DATA|MAKING SENSE OF IT ALL
Author: Christos Erotocritou
christos@gigaspaces.com
Christos Erotocritou www.gigaspaces.com
Agenda
1. High-level view of the big data technology landscape
2. Big Data architecture & integration patterns
3. Complex compound queries
4. Orchestration in Big Data
Making Sense of the Exploding Big Data World
[Diagram labels: Key / Value, IMDG, Stream Processing, SSD, SQL, NoSQL]
Let’s look at some tools & technologies
SQL Technologies
• Query: ANSI SQL-92
• Semantics:
  • CRUD
  • Aggregation
  • Projection
  • Partial update
• Performance: 100s/sec
• Consistency: Transactional
• Scaling: Mostly scale-up
• Availability: Disk-based
NoSQL Technologies
• Query: Proprietary but rich
• Semantics:
  • CRUD
  • Map/Reduce
  • No projection
  • No partial update
• Performance: 1000s/sec
• Consistency: Eventual
• Scaling: Mostly scale-out
• Availability: Replication-based
IMDG Technologies
• Query: Proprietary but rich
• Semantics:
  • CRUD
  • Aggregation API + Map/Reduce
  • Projection
  • Partial update
• Performance: 100k/sec
• Consistency: Transactional
• Scaling: Mostly scale-out
• Availability: Replication-based
Key/Value Technologies
• Query: Key, Value
• Semantics:
  • Mostly read
  • No aggregation
  • No projection
  • No partial update
• Performance: 1Ms/sec
• Consistency: Atomic
• Scaling: Mostly scale-out
• Availability: Limited
Stream Processing Technologies
• Semantics:
  • Event-driven data processing
• Performance: 10Ms/sec
• Machine learning
• Real-time analytics
• Depends on external persistence for maintaining state
[Diagram: one Spout and four Bolts]
SSD Technology is quickly shaping the Big Data Landscape
Great for read-heavy workloads and initial loads
Store indexes in memory and payload on SSD
SSD-extended in-memory products can provide great performance with increased capacity and persistence
Big data but also fast data
Summary
Many APIs, same data
Use-case requirements across tools
SSD is shaping the landscape
Can we create a mashup of such technologies?
How can we integrate such technologies and provide a common access API?
A typical Big Data App logical architecture can look like this
[Diagram labels: Front End; Application / Service; Batch Processing; RT Analytics; Service; Storage; Back End]
Front-end users accessing a distributed multi-facet service
Back-end users accessing business insights and system maintenance metrics
We need a High-Speed Data Store…
• Key / Value
• Document
• Graph
• Map / Reduce
• Transactional
• Stream based
But we’re not there just yet…
Common Data Store serving Multiple Semantics/API
Disk becomes the new tape
[Diagram labels: Front End; High-speed Data Store; Back End]
We can use IMDG technologies to integrate all our Data Sources
[Diagram labels: Front End; Back End; High-speed Data Bus (IMDG); MySQL; MongoDB; Hadoop; Storm; Mongo Sync; RDBMS Sync; Hadoop Sync; RT Streaming; Direct Access; RT Transactional Data Access; Batch Layer; Speed Layer; Web Storage Layer]
Data bus:
• Resilient, fault-tolerant (FT) & highly available (HA)
• Transactional
• High-throughput
Online consumer media service real-world use-case
[Diagram labels: High-speed Data Bus (IMDG); Storm; Hadoop; Downstream System; Hadoop Sync; RT Streaming; Direct Access (if needed); Sync new data available to end user; Purchase order (Transactional); Long-term analytics and storage; Real-time business analytics on user activity; State persistency; Fast media search; business analytics]
What really goes on in the grid…
[Diagram labels: Client; New Data; Polling Container; Notify Container; Proc. Data; MongoDB; Mirror DB; Space mirroring; Service; Application; Storage]
Client writes new data to the space
Polling container writes to MongoDB
New data automatically synced to ES
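The "Space mirroring" and "Mirror DB" labels suggest a write-behind flow: the client write completes against the in-memory space, and the change reaches the database afterwards. The following is a minimal sketch of that idea using only the JDK. `MirrorSketch`, its fields and its methods are hypothetical names chosen for illustration; this is not the GigaSpaces API, and the explicit `flush()` call stands in for what a real mirror service would do asynchronously.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for a space with write-behind mirroring (not the GigaSpaces API).
final class MirrorSketch {
    private final Map<String, String> space = new HashMap<>();    // in-memory store
    private final Queue<String> pending = new ArrayDeque<>();     // keys not yet mirrored
    private final Map<String, String> mirrorDb = new HashMap<>(); // stand-in for the backing database

    // Client write: only the in-memory store is updated; the key is queued for mirroring.
    void write(String key, String value) {
        space.put(key, value);
        pending.add(key);
    }

    // Mirror step: drain queued keys in arrival order and persist their current values.
    int flush() {
        int mirrored = 0;
        String key;
        while ((key = pending.poll()) != null) {
            mirrorDb.put(key, space.get(key));
            mirrored++;
        }
        return mirrored;
    }

    String readFromSpace(String key) { return space.get(key); }

    String readFromMirror(String key) { return mirrorDb.get(key); }

    public static void main(String[] args) {
        MirrorSketch grid = new MirrorSketch();
        grid.write("order-1", "new");
        System.out.println("before flush: " + grid.readFromMirror("order-1"));
        grid.flush();
        System.out.println("after flush: " + grid.readFromMirror("order-1"));
    }
}
```

The design choice illustrated here is that reads and writes never wait on the database; the trade-off is a window in which the mirror lags behind the space.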
Stream Processing Integration
[Diagram labels: Stream Producer; Data Grid; Data Stream (FIFO); Spout; Storm Stream Processing]
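The slide shows a producer writing to a FIFO data stream in the grid, with a spout reading from it. A rough JDK-only sketch of that hand-off could look like the following. `FifoStreamSketch`, `produce` and `nextBatch` are hypothetical names, and an in-process queue is assumed in place of the data grid and of Storm.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical FIFO stream between a producer and a spout-like reader (not the Storm or GigaSpaces API).
final class FifoStreamSketch {
    private final Queue<String> stream = new ArrayDeque<>();

    // Producer side: append an event to the tail of the stream.
    void produce(String event) {
        stream.add(event);
    }

    // Spout side: remove up to maxEvents events from the head, preserving arrival order.
    List<String> nextBatch(int maxEvents) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxEvents && !stream.isEmpty()) {
            batch.add(stream.poll());
        }
        return batch;
    }

    public static void main(String[] args) {
        FifoStreamSketch sketch = new FifoStreamSketch();
        sketch.produce("click");
        sketch.produce("view");
        sketch.produce("purchase");
        System.out.println(sketch.nextBatch(2));
        System.out.println(sketch.nextBatch(2));
    }
}
```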
Summary
How to create a common data API
Using a high-speed data bus to integrate
SSD can be used for bigger and faster data
Using multiple query semantics
Nested Queries & Projections
A. Query for a Person who lives in New York:
… = new SQLQuery<Person>(Person.class, "address.city = 'New York'");
B. Query for a Dealer that sells a Honda:
… = new SQLQuery<Dealer>(Dealer.class, "cars[*] = 'Honda'");
C. Query for a Person with projections on first and last names:
IdsQuery<Person> idsQuery = new IdsQuery<Person>(Person.class, new Long[]{id1, id2})
    .setProjections("firstName", "lastName");
Person[] result = space.readByIds(idsQuery).getResultsArray();
Basic Aggregations
A. Create a query that yields a result set:
SQLQuery<Employee> query = new SQLQuery<Employee>(Employee.class, "country=? OR country=?");
query.setParameter(1, "UK");
query.setParameter(2, "USA");
B. Perform aggregations for that result set:
Integer maxAgeInSpace = max(space, query, "age");
Integer minAgeInSpace = min(space, query, "age");
Integer combinedAgeInSpace = sum(space, query, "age");
Double averageAge = average(space, query, "age");
Employee oldestEmployeeInSpace = maxEntry(space, query, "age");
Employee youngestEmployeeInSpace = minEntry(space, query, "age");
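To make the semantics of these aggregations concrete, here is a small sketch that computes the same values over an in-memory list with Java streams. It only illustrates what max, min, sum, average and maxEntry return; it is not how the grid evaluates them. `BasicAggregationSketch`, `Employee`, `query`, `ageStats` and `oldest` are hypothetical names.

```java
import java.util.Comparator;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical in-memory illustration of the aggregation semantics (not the GigaSpaces API).
final class BasicAggregationSketch {
    static final class Employee {
        final String name;
        final String country;
        final int age;

        Employee(String name, String country, int age) {
            this.name = name;
            this.country = country;
            this.age = age;
        }
    }

    // Stand-in for the SQL query: keep employees whose country matches either parameter.
    static List<Employee> query(List<Employee> all, String country1, String country2) {
        return all.stream()
                .filter(e -> e.country.equals(country1) || e.country.equals(country2))
                .collect(Collectors.toList());
    }

    // max, min, sum and average of "age" over the result set in a single pass.
    static IntSummaryStatistics ageStats(List<Employee> resultSet) {
        return resultSet.stream().mapToInt(e -> e.age).summaryStatistics();
    }

    // Equivalent of maxEntry: the whole entry holding the maximum age, if any.
    static Optional<Employee> oldest(List<Employee> resultSet) {
        return resultSet.stream().max(Comparator.comparingInt((Employee e) -> e.age));
    }

    public static void main(String[] args) {
        List<Employee> all = List.of(
                new Employee("Ann", "UK", 30),
                new Employee("Bob", "USA", 50),
                new Employee("Cy", "FR", 70));
        List<Employee> resultSet = query(all, "UK", "USA");
        System.out.println(ageStats(resultSet));
        oldest(resultSet).ifPresent(e -> System.out.println(e.name));
    }
}
```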
Complex & Compound Aggregations
A. Create a query that yields a result set:
SQLQuery<Employee> query = new SQLQuery<Employee>(Employee.class, "country=? OR country=?");
query.setParameter(1, "UK");
query.setParameter(2, "USA");
B. Perform group and filtering aggregations for that result set:
… = groupBy(space, query, new GroupByAggregator()
    .select(average("salary"), min("salary"), max("salary"))
    .groupBy("department", "gender")
    .having(new GroupByFilter() {
        public boolean process(GroupByValue group) {
            return group.getDouble("avg(salary)") > 18000;
        }
    }));
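The group-by with a having filter can be pictured with plain Java streams: group entries by department and gender, compute salary statistics per group, then drop groups whose average does not exceed the threshold. This is only a sketch of the semantics over an in-memory list; `GroupBySketch`, `Employee` and `salaryStatsByGroup` are hypothetical names, and the "department/gender" string key is an assumption made to keep the example short.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory illustration of group-by plus having (not the GigaSpaces API).
final class GroupBySketch {
    static final class Employee {
        final String department;
        final String gender;
        final double salary;

        Employee(String department, String gender, double salary) {
            this.department = department;
            this.gender = gender;
            this.salary = salary;
        }
    }

    // Groups by "department/gender", then keeps only groups with average salary above minAverage.
    static Map<String, DoubleSummaryStatistics> salaryStatsByGroup(
            List<Employee> employees, double minAverage) {
        Map<String, DoubleSummaryStatistics> groups = employees.stream()
                .collect(Collectors.groupingBy(
                        (Employee e) -> e.department + "/" + e.gender,
                        TreeMap::new,
                        Collectors.summarizingDouble((Employee e) -> e.salary)));
        // The "having" step: remove groups that fail the filter.
        groups.values().removeIf(stats -> stats.getAverage() <= minAverage);
        return groups;
    }

    public static void main(String[] args) {
        List<Employee> employees = List.of(
                new Employee("eng", "F", 20000),
                new Employee("eng", "F", 30000),
                new Employee("ops", "M", 10000));
        salaryStatsByGroup(employees, 18000).forEach((group, stats) ->
                System.out.println(group + " avg=" + stats.getAverage()
                        + " min=" + stats.getMin() + " max=" + stats.getMax()));
    }
}
```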
Fast Update & Change API
A. Performing changes at the data-store level:
IdsQuery<Person> idsQuery = new IdsQuery<Person>(Person.class, id, routing);
space.change(idsQuery, new ChangeSet()
    .increment("balance.euro", 5.2D));
B. Performing a series of changes at the data-store level:
IdsQuery<Person> idsQuery = new IdsQuery<Person>(Person.class, id, routing);
space.change(idsQuery, new ChangeSet()
    .increment("someIntProperty", 1)
    .set("someStringProperty", "newValue")
    .putInMap("someNestedProperty.someMapProperty", "myKey", 2));
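The point of a change operation is that only the delta travels to the store, which applies it in place, instead of the client reading the whole entry, modifying it and writing it back. A minimal JDK-only sketch of that idea follows. `ChangeSketch`, `Changes` and their methods are hypothetical names that loosely mirror the fluent style on the slide; nested property paths such as "balance.euro" are not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical store that applies partial updates in place (not the GigaSpaces Change API).
final class ChangeSketch {
    interface Change {
        void applyTo(Map<String, Object> entry);
    }

    // A chainable list of changes, applied in the order they were added.
    static final class Changes {
        private final List<Change> steps = new ArrayList<>();

        // Adds delta to a numeric property, creating it with value delta if absent.
        Changes increment(String property, double delta) {
            steps.add(entry -> entry.merge(property, delta,
                    (oldValue, d) -> ((Number) oldValue).doubleValue() + ((Number) d).doubleValue()));
            return this;
        }

        // Overwrites a single property without touching the others.
        Changes set(String property, Object value) {
            steps.add(entry -> entry.put(property, value));
            return this;
        }
    }

    private final Map<Long, Map<String, Object>> store = new HashMap<>();

    void write(long id, Map<String, Object> properties) {
        store.put(id, new HashMap<>(properties));
    }

    // Applies every change to the stored entry; the entry itself is never sent back to the caller.
    void change(long id, Changes changes) {
        Map<String, Object> entry = store.get(id);
        if (entry == null) {
            throw new IllegalArgumentException("No entry with id " + id);
        }
        for (Change step : changes.steps) {
            step.applyTo(entry);
        }
    }

    Object read(long id, String property) {
        return store.get(id).get(property);
    }

    public static void main(String[] args) {
        ChangeSketch space = new ChangeSketch();
        Map<String, Object> person = new HashMap<>();
        person.put("balance", 10.0);
        space.write(1L, person);
        space.change(1L, new Changes().increment("balance", 5.5).set("status", "active"));
        System.out.println(space.read(1L, "balance") + " " + space.read(1L, "status"));
    }
}
```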
Orchestration in Big Data
The Application Deployment Lifecycle
[Diagram stages: Deploy, Install, Configure, Monitor, Manage, Provision]
Create a Standardised Blueprint of the Application Topology
[Diagram: nodes of type Server, Container, DB and App, linked by "Contained In" and "Connected To" relationships]
Using TOSCA & YAML to Describe the Application Topology
...
host:
    type: cloudify.nodes.libcloud.Compute
    ...
##################################################################################
# Tomcat server
##################################################################################
tomcat_server:
    type: cloudify.nodes.TomcatServer
    relationships:
        - type: cloudify.relationships.contained_in
          target: host
##################################################################################
# MongoDB node as a backend data-store for the example Tomcat application
##################################################################################
mongodb:
    type: cloudify.nodes.MongoDB
    relationships:
        - type: cloudify.relationships.contained_in
          target: host
...
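One thing a blueprint like this gives an orchestrator is ordering: a node that is contained in another presumably cannot be installed before its container exists. The sketch below derives such an order from the contained_in relationships. It is an illustration under that assumption, not Cloudify's actual algorithm; `TopologySketch` and `installOrder` are hypothetical names, and cycles are assumed not to occur.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical ordering of blueprint nodes by their contained_in target (not Cloudify's implementation).
final class TopologySketch {
    // containedIn maps each node name to the node it is contained in, or null for a top-level node.
    static List<String> installOrder(Map<String, String> containedIn) {
        List<String> order = new ArrayList<>();
        for (String node : containedIn.keySet()) {
            visit(node, containedIn, order);
        }
        return order;
    }

    // Places the container before the node itself; assumes the relationships contain no cycles.
    private static void visit(String node, Map<String, String> containedIn, List<String> order) {
        if (order.contains(node)) {
            return;
        }
        String target = containedIn.get(node);
        if (target != null) {
            visit(target, containedIn, order);
        }
        order.add(node);
    }

    public static void main(String[] args) {
        Map<String, String> containedIn = new LinkedHashMap<>();
        containedIn.put("tomcat_server", "host");
        containedIn.put("mongodb", "host");
        containedIn.put("host", null);
        System.out.println(installOrder(containedIn));
    }
}
```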
Post-deployment Management & Monitoring
Thanks For Attending
Author: Christos Erotocritou
christos@gigaspaces.com

Big Data Expo 2015 - Gigaspaces Making Sense of it all

  • 1. BIG DATA|MAKING SENSE OF IT ALL Author: Christos Erotocritou christos@gigaspaces.com
  • 2. Christos Erotocritou, www.gigaspaces.com. Agenda: (1) High-level view of the big data technology landscape; (2) Big Data architecture & integration patterns; (3) Complex compound queries; (4) Orchestration in Big Data
  • 3. Making Sense of the Exploding Big Data World: Key/Value, IMDG, Stream Processing, SSD, SQL, NoSQL
  • 4. Let’s look at some tools & technologies
  • 5. SQL Technologies. Query: ANSI 92. Semantics: CRUD, aggregation, projection, partial update. Performance: 100’s/sec. Consistency: transactional. Scaling: mostly scale-up. Availability: disk based.
  • 6. NoSQL Technologies. Query: proprietary but rich. Semantics: CRUD, Map/Reduce, no projection, no partial update. Performance: 1000’s/sec. Consistency: eventual. Scaling: mostly scale-out. Availability: replication based.
  • 7. IMDG Technologies. Query: proprietary but rich. Semantics: CRUD, aggregation API + Map/Reduce, projection, partial update. Performance: 100k/sec. Consistency: transactional. Scaling: mostly scale-out. Availability: replication based.
  • 8. Key/Value Technologies. Query: key, value. Semantics: mostly read, no aggregation, no projection, no partial update. Performance: 1M’s/sec. Consistency: atomic. Scaling: mostly scale-out. Availability: limited.
  • 9. Stream Processing Technologies. Semantics: event-driven data processing. Performance: 10M’s/sec. Machine learning. Real-time analytics. Depend on external persistency for maintaining state. Diagram labels: one Spout, four Bolts.
  • 10. SSD Technology is quickly shaping the Big Data Landscape. Great for heavy reads and initial loads. Store indexes in memory and payload on SSD. SSD-extended in-memory products can provide great performance with increased capacity and persistence. Big data, but also fast data.
  • 11. Summary: many APIs, same data; use-case requirements across tools; SSD is shaping the landscape; can we create a mashup of such technologies?
  • 12. How can we integrate such technologies and provide a common access API?
  • 13. A typical Big Data app logical architecture can look like this. Diagram labels: Batch Processing, RT Analytics, Service, Storage, Front End, Application / Service, Back End. Front-end users access a distributed multi-facet service; back-end users access business insights and system maintenance metrics.
  • 14. We need a High-Speed Data Store… A common data store serving multiple semantics/APIs: Key/Value, Document, Graph, Map/Reduce, Transactional, Stream based. But we’re not there just yet… Disk becomes the new tape. Diagram labels: High-speed Data Store, Front End, Back End.
  • 15. We can use IMDG technologies to integrate all our data sources. Diagram labels: High-speed Data Bus (IMDG), Front End, Back End, MySQL, MongoDB, Mongo Sync, RDBMS Sync, Hadoop Sync, RT Streaming, Direct Access, RT Transactional Data Access, Batch Layer, Speed Layer, Web Storage Layer, Hadoop, Storm. Data bus: resilient, FT & HA; transactional; high-throughput.
  • 16. Online consumer media service: real-world use-case. Diagram labels: High-speed Data Bus (IMDG), Storm, Hadoop, Downstream System, Hadoop Sync, RT Streaming, Direct Access (if needed). Annotations: sync new data available to end user; purchase order (transactional); long-term analytics and storage; real-time business analytics on user activity; state persistency; fast media search; business analytics.
  • 17. What really goes on in the grid… Diagram labels: New Data, Client, Polling Container, Notify Container, Proc. Data, MongoDB, Mirror, DB, Space mirroring, Service, Application, Storage. Annotations: client writes new data to the space; polling container writes to MongoDB; new data automatically synced to ES.
  • 18. Stream Processing Integration. Diagram labels: Stream Producer, Storm Stream Processing, Spout, Data Grid, Data Stream (FIFO).
  • 19. Summary: how to create a common data API; using a high-speed data bus to integrate; SSD can be used for bigger and faster data.
  • 21. Nested Queries & Projections
      A. Query for a Person who lives in New York:
          … = new SQLQuery<Person>(Person.class, "address.city = 'New York'");
      B. Query for a Dealer that sells a Honda:
          … = new SQLQuery<Dealer>(Dealer.class, "cars[*] = 'Honda'");
      C. Query for a Person with projections on first and last names:
          IdsQuery<Person> idsQuery = new IdsQuery<Person>(Person.class, new Long[]{id1, id2})
              .setProjections("firstName", "lastName");
          Person[] result = space.readByIds(idsQuery).getResultsArray();
  • 22. Basic Aggregations
      A. Create a query that yields a result set:
          SQLQuery<Employee> query = new SQLQuery<Employee>(Employee.class, "country=? OR country=?");
          query.setParameter(1, "UK");
          query.setParameter(2, "USA");
      B. Perform aggregations for that result set:
          Integer maxAgeInSpace = max(space, query, "age");
          Integer minAgeInSpace = min(space, query, "age");
          Integer combinedAgeInSpace = sum(space, query, "age");
          Double averageAge = average(space, query, "age");
          // The query is typed on Employee, so the entry aggregations return Employee
          Employee oldestEmployeeInSpace = maxEntry(space, query, "age");
          Employee youngestEmployeeInSpace = minEntry(space, query, "age");
  • 23. Complex & Compound Aggregations
      A. Create a query that yields a result set:
          SQLQuery<Employee> query = new SQLQuery<Employee>(Employee.class, "country=? OR country=?");
          query.setParameter(1, "UK");
          query.setParameter(2, "USA");
      B. Perform group and filtering aggregations for that result set:
          … = groupBy(space, query, new GroupByAggregator()
              .select(average("salary"), min("salary"), max("salary"))
              .groupBy("department", "gender")
              // Keep only the groups whose average salary exceeds 18000
              .having(new GroupByFilter() {
                  public boolean process(GroupByValue group) {
                      return group.getDouble("avg(salary)") > 18000;
                  }
              }));
  • 24. Fast Update & Change API
      A. Performing changes at the data-store level:
          IdsQuery<Person> idsQuery = new IdsQuery<Person>(Person.class, id, routing);
          space.change(idsQuery, new ChangeSet()
              .increment("balance.euro", 5.2D));
      B. Performing a series of changes at the data-store level:
          IdsQuery<Person> idsQuery = new IdsQuery<Person>(Person.class, id, routing);
          space.change(idsQuery, new ChangeSet()
              .increment("someIntProperty", 1)
              .set("someStringProperty", "newValue")
              .putInMap("someNestedProperty.someMapProperty", "myKey", 2));
  • 27. Create a Standardised Blueprint of the Application Topology. Diagram: five nodes of types Container, Server, Container, DB and App, linked by "Contained In" and "Connected To" relationships.
  • 28. Using TOSCA & YAML to Describe the Application Topology
      ...
      host:
        type: cloudify.nodes.libcloud.Compute
        ...
      # Tomcat server
      tomcat_server:
        type: cloudify.nodes.TomcatServer
        relationships:
          - type: cloudify.relationships.contained_in
            target: host
      # MongoDB node as a backend data-store for the example Tomcat application
      mongodb:
        type: cloudify.nodes.MongoDB
        relationships:
          - type: cloudify.relationships.contained_in
            target: host
      ...
  • 30. Thanks For Attending Author: Christos Erotocritou christos@gigaspaces.com
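
The group-and-filter aggregation on slide 23 can be approximated in plain Java to show what the grid is being asked to compute. This is a minimal sketch built only on `java.util.stream`, not on the GigaSpaces API: the `GroupByHavingSketch` class, the `Employee` fields, and the sample data are assumptions made for illustration. It filters by country, groups by department and gender, and keeps only the groups whose average salary exceeds a threshold.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByHavingSketch {

    // Hypothetical entry type; the slides only show the queried property names.
    static final class Employee {
        final String country;
        final String department;
        final String gender;
        final double salary;

        Employee(String country, String department, String gender, double salary) {
            this.country = country;
            this.department = department;
            this.gender = gender;
            this.salary = salary;
        }
    }

    // Mirrors the slide's steps: WHERE country, GROUP BY (department, gender),
    // HAVING avg(salary) > minAverageSalary. Each group carries avg, min and max.
    static Map<List<String>, DoubleSummaryStatistics> groupAndFilter(
            List<Employee> employees, double minAverageSalary) {
        return employees.stream()
                .filter(e -> e.country.equals("UK") || e.country.equals("USA"))
                .collect(Collectors.groupingBy(
                        e -> Arrays.asList(e.department, e.gender),
                        Collectors.summarizingDouble(e -> e.salary)))
                .entrySet().stream()
                .filter(group -> group.getValue().getAverage() > minAverageSalary)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<Employee> employees = Arrays.asList(
                new Employee("UK", "Sales", "F", 20000),
                new Employee("UK", "Sales", "F", 22000),
                new Employee("USA", "Support", "M", 15000),
                new Employee("FR", "Sales", "F", 90000));

        groupAndFilter(employees, 18000).forEach((key, stats) ->
                System.out.println(key + " avg=" + stats.getAverage()
                        + " min=" + stats.getMin() + " max=" + stats.getMax()));
    }
}
```

The sketch runs on a single in-memory list. In a data grid the same grouping would presumably be evaluated per partition and merged, which is the part the slide's API hides from the caller.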
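
The idea behind slide 24, sending a small set of field-level changes to where the data lives instead of reading and rewriting the whole object, can be sketched in plain Java. This is an illustration only and does not use the GigaSpaces `ChangeSet` class: `DeltaUpdateSketch`, `Delta`, and the map-based record are hypothetical names, and nested property paths such as `balance.euro` are left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class DeltaUpdateSketch {

    // Hypothetical stand-in for a change set: an ordered list of field-level operations.
    static final class Delta {
        private final List<Consumer<Map<String, Object>>> operations = new ArrayList<>();

        // Adds 'by' to a numeric field; a missing field starts at 'by'.
        Delta increment(String field, double by) {
            operations.add(record ->
                    record.merge(field, by, (old, unused) -> ((Number) old).doubleValue() + by));
            return this;
        }

        // Overwrites a single field and leaves the rest of the record untouched.
        Delta set(String field, Object value) {
            operations.add(record -> record.put(field, value));
            return this;
        }

        // Runs on the side that holds the record, so only the delta has to travel.
        void applyTo(Map<String, Object> storedRecord) {
            operations.forEach(operation -> operation.accept(storedRecord));
        }
    }

    public static void main(String[] args) {
        // Made-up stored record.
        Map<String, Object> person = new HashMap<>();
        person.put("balance", 10.0);
        person.put("status", "new");

        new Delta()
                .increment("balance", 5.5)
                .set("status", "active")
                .applyTo(person);

        System.out.println(person);
    }
}
```

Chaining the operations mirrors the fluent style on the slide: the caller describes what should change, and the store decides how to apply it to the entry it already holds.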