NOSQL in Media
Sander Kieft
About me
 Manager Core Services at Sanoma
 Responsible for all common services, including the Big Data platform
 Work:
– Centralized services
– Data platform
– Search
 Like:
– Work
– Water(sports)
– Whiskey
– Tinkering: Arduino, Raspberry Pi, soldering stuff
24 April 2015
Sanoma, B2C Publishing and Learning company
2+100
2 Finnish newspapers
Over 100 magazines
5 TV channels in Finland and The Netherlands
200+ websites
100 mobile applications on various mobile platforms
Not Only SQL
Generic vs specialized solutions
 Data models
 Speed
 Scalability
 Partition tolerance
 Availability / Redundancy
 Cost per GB
Specialized focus
 CAP (or Brewer's) Theorem says:
“it is impossible for a distributed computer system
to simultaneously provide all three of the following
guarantees:
– Consistency
– Availability
– Partition tolerance”
CAP Theorem
[Diagram: CAP triangle with corners A, C and P]
Availability: each client can always read and write
Partition tolerance: the system works well despite physical network partitions
Consistency: all clients always have the same view of the data
RDBMS
MySQL
Postgres
MS SQL
Oracle
NOSQL
NOSQL
Eventual consistency
-- Werner Vogels, CTO Amazon
Various Data models
 key-value
 column
 document stores
 map/reduce
 graph
 search
 blob storage
Key/value stores
Photo credits: John Chulick - https://www.flickr.com/photos/chulickphotos/8234894686/
 Stores objects by key
 Based on the Dynamo paper (Werner Vogels)
 Products:
– Riak
– Memcached/Membase
– Tokyo Cabinet
– Redis
– Voldemort
 Use cases:
– Counting
– Top lists
– Caches
– Pre-calculated optimizations
Bucket A B C
Key/Value buckets
User XXXX YYYY ZZZZ
Article 100 200 300
Article_<5 min. TIME> 50 100 150
Real time stats
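The real-time stats keys above can be sketched as counters in a key/value store. This is a minimal illustration only: the in-memory StatsStore class, the "total:" and "window:" key prefixes and the helper names are assumptions made for the example, not the naming used at Sanoma. A real deployment would rely on the store's own atomic increment operation.

```python
from collections import Counter
from datetime import datetime, timezone

BUCKET_SECONDS = 5 * 60  # five-minute window, matching the "5 min." key on the slide


def window_key(article_id: str, ts: datetime) -> str:
    """Return the key for the five-minute window that contains ts (UTC)."""
    epoch = int(ts.timestamp())
    window_start = epoch - (epoch % BUCKET_SECONDS)
    label = datetime.fromtimestamp(window_start, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M")
    return f"window:{article_id}:{label}"


class StatsStore:
    """Hypothetical in-memory stand-in for a key/value store with increments."""

    def __init__(self) -> None:
        self.counters = Counter()

    def record_view(self, article_id: str, ts: datetime) -> None:
        self.counters[f"total:{article_id}"] += 1        # all-time counter
        self.counters[window_key(article_id, ts)] += 1   # counter for the current window

    def top_articles(self, n: int) -> list:
        """Return the n most viewed (article_id, views) pairs, highest first."""
        totals = [
            (key.split(":", 1)[1], views)
            for key, views in self.counters.items()
            if key.startswith("total:")
        ]
        return sorted(totals, key=lambda item: (-item[1], item[0]))[:n]


store = StatsStore()
now = datetime(2015, 4, 24, 10, 7, 30, tzinfo=timezone.utc)
store.record_view("100", now)
store.record_view("100", now)
store.record_view("200", now)
print(store.top_articles(5))
```

Because the window is part of the key, old windows can simply expire, and a top list only needs to read the counters it cares about.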
Document stores
 Stores "records" as documents
 Versioning
 Easy sharding (document self contained)
 Products:
– MongoDB
– CouchDB
– SimpleDB
 Use case:
– CMS
– Meta data
– Product catalog
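To make "document self contained" concrete, here is a small sketch of an article stored as one document, with a version counter and hash-based shard selection. All field names, the shard_for and revise helpers and the shard count are hypothetical; no particular product's API is shown.

```python
import copy
import hashlib

# A self-contained document: everything needed to render the article
# travels with it, so no joins are required at read time.
article = {
    "_id": "article-1",  # hypothetical identifier
    "version": 1,
    "title": "Example headline",
    "body": "Example body text.",
    "metadata": {"section": "news", "tags": ["example", "nosql"]},
    "images": [{"blob_key": "images/example.jpg", "caption": "Example"}],
}


def shard_for(doc_id: str, shard_count: int) -> int:
    """Pick a shard from the document ID alone (stable across processes)."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def revise(doc: dict, **changes) -> dict:
    """Return a new version of the document and leave the old one untouched."""
    revised = copy.deepcopy(doc)
    revised.update(changes)
    revised["version"] = doc["version"] + 1
    return revised


revised = revise(article, title="Updated headline")
print(revised["version"], shard_for(article["_id"], 4))
```

Since the shard depends only on the document ID, any node can work out where a document lives without consulting other documents.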
From relational data model to document
Product
Properties
Application
Property
Property
MyJour
Item Based Framework
….
CMS
Architecture Content Platform
Content Platform Core
Search
Solr
Blob
Storage
(S3 & MT)
Article
storage
MongoDB
Analyse
CMS
CMS
Editorial
reuse-interface
ePub
Digital
Template
system
WoodWing
Content
Portal
Feeds
Noma
Viva
PDF Based Framework
….
HomeDeco
Sources Services Solutions Products
??
??
??
??
eLinea
Blendle
Google Currents
LINDA. nieuws
NU.nl search
Column stores
 Lineage: Google's BigTable paper
 Records with many, many columns
 Distinguish between hot and cold data
 Versioning
 Records and columns can be sharded
 Products:
– HBase
– Cassandra
– Hypertable
 Use cases:
– Analytics
– Messages
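The column-store model described above (rows with many columns, each cell versioned) can be sketched with nested dictionaries. The WideRowTable class and its put/get methods are invented for this illustration and do not mirror the HBase or Cassandra client APIs.

```python
from collections import defaultdict


class WideRowTable:
    """Toy wide-row table: row key -> column name -> versioned cells."""

    def __init__(self):
        # Each cell list holds (timestamp, value) tuples, oldest first.
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp):
        cells = self.rows[row_key][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0])

    def get(self, row_key, column, at=None):
        """Return the newest value, or the newest value written at or before `at`."""
        cells = self.rows.get(row_key, {}).get(column, [])
        visible = [value for ts, value in cells if at is None or ts <= at]
        return visible[-1] if visible else None


table = WideRowTable()
# One row per user, one column per message: a row can grow very wide.
table.put("user:1", "msg:0001", "hello", timestamp=1)
table.put("user:1", "msg:0001", "hello (edited)", timestamp=5)
table.put("user:1", "msg:0002", "second message", timestamp=6)
print(table.get("user:1", "msg:0001"))
```

Rows only store the columns they actually have, which is what makes "many, many columns" per record affordable.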
Big Data
 Lineage: Google GFS & Map/Reduce
 Distributed data storage and processing
 Advanced analytics capabilities on raw data
 Schema on read
 Products:
– Hadoop
– MPP databases
 Use cases:
– Ad hoc querying of terabytes of data
– Data science
 Predictive analytics
 Model training
– Calculate recommendations
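A single-process sketch of "schema on read" combined with a map/reduce style job may help here. The tab-separated event format, the field names and the function names are assumptions for the example; a real Hadoop job would distribute the same phases over many nodes.

```python
from collections import defaultdict
from itertools import chain

# Raw data is stored as-is; no schema is enforced when it is written.
RAW_EVENTS = [
    "2015-04-24T10:00:01\tarticle-1\tview",
    "2015-04-24T10:00:02\tarticle-2\tview",
    "2015-04-24T10:00:03\tarticle-1\tshare",
    "2015-04-24T10:00:04\tarticle-1\tview",
]


def parse(line):
    """Schema on read: structure is applied only when the data is queried."""
    timestamp, article, action = line.split("\t")
    return {"timestamp": timestamp, "article": article, "action": action}


def map_phase(record):
    """Emit (key, 1) for every page view; other actions are ignored."""
    if record["action"] == "view":
        yield (record["article"], 1)


def reduce_phase(pairs):
    """Sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(lines):
    mapped = chain.from_iterable(map_phase(parse(line)) for line in lines)
    return reduce_phase(mapped)


views = run_job(RAW_EVENTS)
print(views)
```

A different question (for example shares per hour) only needs a different parse or map function; the stored raw data stays the same.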
Big Data at Sanoma
 Main use case is reporting and analytics, moving towards data science
 Evaluation of A/B and multivariate (MVT) tests
 Using QlikView as a front-end
 Supplies data to other environments (SAS, Advertising, Behavioral Targeting)
 Agile process for adding sources, from raw to intermediate to modeled data warehouse
 Sanoma standard data platform, used in all Sanoma countries
 > 250 dashboard users
 40 daily users: analysts & developers
 43 source systems, with 125 different sources
 400 tables in Hive
 Platform:
– Cloudera Hadoop
– 40-60 nodes
– > 400TB storage
– ~2000 jobs/day
 Typical data node / task tracker:
– 1-2 CPUs, 4-12 cores
– 2 system disks (RAID 1)
– 4 data disks (2TB, 3TB or 4TB)
– 24-32GB RAM
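As a rough sanity check, the node counts and disk sizes above can be multiplied out. This is arithmetic on the slide's figures only. The replication factor of 3 is an assumption (it is the HDFS default; the slides do not state the cluster's setting), and the slides do not say whether "> 400TB" means raw or replicated capacity.

```python
def raw_capacity_tb(nodes, data_disks_per_node, disk_size_tb):
    """Raw disk capacity in TB, before any replication."""
    return nodes * data_disks_per_node * disk_size_tb


REPLICATION_FACTOR = 3  # assumption: HDFS default, not taken from the slides

smallest = raw_capacity_tb(nodes=40, data_disks_per_node=4, disk_size_tb=2)
largest = raw_capacity_tb(nodes=60, data_disks_per_node=4, disk_size_tb=4)

print("raw TB:", smallest, "to", largest)
print("TB after replication:", smallest / REPLICATION_FACTOR, "to", largest / REPLICATION_FACTOR)
```

Under these assumptions the quoted "> 400TB" falls inside the raw range, which suggests it refers to raw disk capacity rather than to replicated data volume.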
Sanoma Data lake
Traditional BI vs Big Data approach
Search
Photo credits: http://www.flickr.com/photos/emyanmei/8223998414/
 Keyword search can be combined with
advanced forms of ranking the results
 Most of the fields go to an index
 Facets can be used for analytics
 Ranker can be replaced with custom logic
 Products:
– Solr
– ElasticSearch
– MarkLogic
 Use cases:
– Content Search
– Analytics / Faceted
– Percolation
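The points above (fields go into an index, facets double as analytics, the ranker can be swapped) can be sketched with a tiny inverted index. TinyIndex, term_count_score and the sample documents are invented for this illustration; Solr and Elasticsearch expose these ideas through their own, far richer APIs.

```python
from collections import Counter, defaultdict


def term_count_score(doc, matched_terms):
    """Default ranker: documents matching more query terms score higher."""
    return len(matched_terms)


class TinyIndex:
    def __init__(self, score=term_count_score):
        self.postings = defaultdict(set)  # term -> IDs of documents containing it
        self.docs = {}
        self.score = score                # replaceable ranking function

    def add(self, doc_id, text, **fields):
        self.docs[doc_id] = {"text": text, **fields}
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        matches = defaultdict(set)        # doc ID -> query terms it matched
        for term in query.lower().split():
            for doc_id in self.postings.get(term, ()):
                matches[doc_id].add(term)
        return sorted(matches, key=lambda d: (-self.score(self.docs[d], matches[d]), d))

    def facets(self, doc_ids, field):
        """Count result documents per value of a field."""
        return Counter(self.docs[d].get(field) for d in doc_ids)


index = TinyIndex()
index.add("a1", "Whisky tasting in Volendam", section="lifestyle")
index.add("a2", "Sailing news from Volendam", section="sport")
index.add("a3", "Sailing and whisky weekend", section="lifestyle")

hits = index.search("sailing volendam")
section_counts = dict(index.facets(hits, "section"))
print(hits, section_counts)
```

Passing a different score function to TinyIndex changes the ordering without touching the index itself, which is the sense in which a ranker can be replaced with custom logic.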
Search
Content
Q Σ Result ranking
Search too
Content
t
Σ Result ranking
User
Search too
Content
Page
Σ Result ranking
User
 Traditional queries: against index with existing data
 What if the data does not exist at time of query?
 Percolation lets clients register queries in advance; when a new document arrives, the IDs of the matching queries are returned, e.g. to notify users that new matches are available
 Use case:
– Search for a tweet, but after the initial results continuously
get newly tweeted items when they come in
Search - Percolation
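Percolation reverses the usual direction: queries are stored and documents are matched against them. A minimal sketch follows, assuming that a query matches when all of its terms occur in the document. The Percolator class is hypothetical and is not the Elasticsearch percolator API.

```python
class Percolator:
    """Stores queries and reports which ones match an incoming document."""

    def __init__(self):
        self.queries = {}  # query ID -> set of required terms

    def register(self, query_id, query):
        self.queries[query_id] = set(query.lower().split())

    def percolate(self, text):
        """Return the IDs of all registered queries that the new document satisfies."""
        terms = set(text.lower().split())
        return sorted(
            query_id
            for query_id, required in self.queries.items()
            if required <= terms
        )


percolator = Percolator()
percolator.register("q1", "nosql media")
percolator.register("q2", "hadoop")

first = percolator.percolate("NoSQL in media companies")
second = percolator.percolate("Hadoop and NoSQL for media")
print(first, second)
```

The returned query IDs can then be mapped back to the users who registered them, which is how newly arrived items can be pushed to an open search.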
Graph databases
 Lineage: Euler and graph theory.
 Data model: nodes & edges, both of which can hold key-value pairs
 Products:
– AllegroGraph
– InfoGrid
– Neo4j
 Use cases:
– Social relationships
– Content Linking (Entity linking)
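The node-and-edge model with key-value properties, applied to content linking, might look like the sketch below. The Graph class, the "kind" and "relation" properties and the entity and article IDs are placeholders for illustration; they are not taken from the deck or from any graph database API.

```python
from collections import defaultdict


class Graph:
    def __init__(self):
        self.nodes = {}                 # node ID -> key-value properties
        self.edges = defaultdict(list)  # node ID -> [(neighbour ID, edge properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, a, b, **props):
        # Undirected edge: stored on both endpoints.
        self.edges[a].append((b, props))
        self.edges[b].append((a, props))

    def neighbours(self, node_id, kind=None):
        return sorted(
            n for n, _ in self.edges.get(node_id, [])
            if kind is None or self.nodes[n].get("kind") == kind
        )

    def related_articles(self, article_id):
        """Articles that share at least one linked entity with the given article."""
        related = set()
        for entity in self.neighbours(article_id, kind="entity"):
            related.update(self.neighbours(entity, kind="article"))
        related.discard(article_id)
        return sorted(related)


graph = Graph()
for article in ("article-1", "article-2", "article-3"):
    graph.add_node(article, kind="article")
for entity in ("entity-a", "entity-b", "entity-c"):
    graph.add_node(entity, kind="entity")

graph.add_edge("article-1", "entity-a", relation="mentions")
graph.add_edge("article-1", "entity-b", relation="mentions")
graph.add_edge("article-2", "entity-b", relation="mentions")
graph.add_edge("article-3", "entity-c", relation="mentions")

related = graph.related_articles("article-1")
print(related)
```

Finding related content is a two-hop traversal (article to entity to article), the kind of query a graph database is designed to answer without joins.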
Jan Smit
3js
Nick en Simon
Volendam
Article
1
Article
2
Article
3
Blob storage
 Endless storage of binary data
 Stores objects larger than a single machine can hold
 “Lower” price/GB compared to SAN storage
 Products:
– Amazon S3
– CAStor
– (Hadoop)
 Use case:
– Media storage
– Archiving
Summary
 RDBMSs are good enough for many problems
 For specific problems, NOSQL products offer specialized solutions
 There’s a variety of NOSQL solutions with different characteristics
 NOSQL solutions will require a higher engineering effort
Dream NOSQL Architecture – Content Delivery
CMS
Document storage
(MongoDB/
CouchDB)
Blob storage
(S3/
CAStor)
Search
(ElasticSearch/
Solr)
Website / Mobile
Application
Dream NOSQL Architecture - Analytics
Event collection
Message Queue
(Kafka / Flume)
Event processing
(Storm)
Key-value
store
(Redis)
Real time
recommendations
/ targeting
Column
storage
(Cassandra/
HBase)
Real time
Dashboarding
Big Data
(Hadoop)
Ad hoc reporting &
Data science
CAP Theorem
MySQL Asterdata
Postgres Greenplum
MS SQL Vertica
Oracle
Dynamo Cassandra
Voldemort SimpleDB
Tokyo Cabinet CouchDB
KAI Riak
Big Table MongoDB Berkeley DB
Hypertable Terrastore MemcacheDB
HBase Scalaris Redis
Data models
Relational databases
Key-value
Column-oriented
Document-oriented
Use Cases for NoSQL in Media

Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 

Use Cases for NoSQL in Media

  • 2. About me
      – Manager Core Services at Sanoma
      – Responsible for all common services, including the Big Data platform
      – Work: centralized services, data platform, search
      – Like: work, water(sports), whiskey, tinkering (Arduino, Raspberry Pi, soldering stuff)
      24 April 2015
  • 3. Sanoma, B2C Publishing and Learning company
      – 2+100: 2 Finnish newspapers, over 100 magazines
      – 5 TV channels in Finland and The Netherlands
      – 200+ websites
      – 100 mobile applications on various mobile platforms
  • 4.
  • 5. Not Only SQL
  • 6. Generic vs specialized solutions
  • 7. Specialized focus
      – Data models
      – Speed
      – Scalability
      – Partition tolerance
      – Availability / Redundancy
      – Cost per GB
  • 8. CAP Theorem
      – The CAP (or Brewer's) theorem says: “it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: Consistency, Availability, Partition tolerance”
      Diagram labels: A, C, P
  • 9. CAP Theorem
      – Availability (A): each client can always read and write
      – Partition tolerance (P): the system works well despite physical network partitions
      – Consistency (C): all clients always have the same view of the data
      Diagram labels: RDBMS (MySQL, Postgres, MS SQL, Oracle); NOSQL (shown twice)
  • 10. Eventual consistency -- Werner Vogels, CTO Amazon
  • 12. Various data models
      – key-value
      – column
      – document stores
      – map/reduce
      – graph
      – search
      – blob storage
  • 13. Key/value stores Photo credits: John Chulick - https://www.flickr.com/photos/chulickphotos/8234894686/
  • 14. Key/value stores
      – Stores an object under a key
      – Based on the Dynamo paper (Werner Vogels)
      – Products: Riak, Memcache/Membase, Tokyo Cabinet, Redis, Voldemort
      – Use cases: counting, top lists, caches, pre-calculated optimizations
  • 15. Key/Value buckets

      Bucket                  | A    | B    | C
      User                    | XXXX | YYYY | ZZZZ
      Article                 | 100  | 200  | 300
      Article_<5 min. TIME>   | 50   | 100  | 150
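
The bucket layout on slide 15 (an all-time `Article` counter next to an `Article_<5 min. TIME>` counter) could be sketched as follows. This is a minimal in-memory illustration, not the store used at Sanoma: `BucketStore`, `window_bucket` and `record_view` are hypothetical names, and a real deployment would call the counter operations of a store such as Redis or Riak instead of a Python dict.

```python
from collections import defaultdict
from datetime import datetime


class BucketStore:
    """Toy in-memory stand-in for a key/value store with named buckets."""

    def __init__(self):
        self._buckets = defaultdict(dict)

    def incr(self, bucket, key, amount=1):
        """Increment a counter and return its new value."""
        current = self._buckets[bucket].get(key, 0) + amount
        self._buckets[bucket][key] = current
        return current

    def get(self, bucket, key, default=0):
        return self._buckets[bucket].get(key, default)

    def top(self, bucket, n=3):
        """Return the n keys with the highest counts (a 'top list')."""
        items = self._buckets[bucket].items()
        return sorted(items, key=lambda kv: kv[1], reverse=True)[:n]


def window_bucket(ts, minutes=5):
    """Name of the time-windowed bucket that timestamp `ts` falls into."""
    floored = ts.replace(minute=ts.minute - ts.minute % minutes,
                         second=0, microsecond=0)
    return f"article_{floored:%Y%m%d%H%M}"


def record_view(store, article_key, ts):
    """Count one page view in the all-time and the 5-minute bucket."""
    store.incr("article", article_key)
    store.incr(window_bucket(ts), article_key)


store = BucketStore()
record_view(store, "A", datetime(2015, 4, 24, 10, 7, 30))
record_view(store, "A", datetime(2015, 4, 24, 10, 7, 45))
record_view(store, "B", datetime(2015, 4, 24, 10, 12, 0))
print(store.top("article", n=2))
```

Keeping the time window in the bucket name means a real-time top list only has to read one small bucket, and old windows can simply be dropped.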
  • 16. Real time stats
  • 17.
  • 18.
  • 20. Document stores
      – Stores "records" as documents
      – Versioning
      – Easy sharding (documents are self-contained)
      – Products: MongoDB, CouchDB, SimpleDB
      – Use cases: CMS, metadata, product catalog
  • 21. From relational data model to document
      Diagram labels: Product, Properties, Application, Property, Property
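
As a rough illustration of the move from a relational model to a document, the sketch below folds rows from a product table and a separate property table into one self-contained document. The field names (`id`, `title`, `product_id`, `name`, `value`) and the sample values are assumptions made for this example; the slide only names the entities.

```python
import json


def to_document(product_row, property_rows):
    """Embed the matching property rows inside the product record."""
    doc = dict(product_row)
    doc["properties"] = {
        row["name"]: row["value"]
        for row in property_rows
        if row["product_id"] == product_row["id"]
    }
    return doc


# Hypothetical relational rows: one product, properties in a second table.
product = {"id": 1, "title": "Example magazine"}
properties = [
    {"product_id": 1, "name": "language", "value": "nl"},
    {"product_id": 1, "name": "frequency", "value": "weekly"},
    {"product_id": 2, "name": "language", "value": "fi"},
]

document = to_document(product, properties)
print(json.dumps(document, indent=2))
```

Because the document carries its own properties, it can be read without a join and placed on any shard, which is the "easy sharding" point made on the document stores slide.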
  • 22. CMS Architecture Content Platform
      Diagram labels: MyJour; Item Based Framework; ….; Content Platform; Content Platform Core; Search; Solr; Blob Storage (S3 & MT); Article storage; MongoDB; Analyse; CMS; CMS; Editorial reuse-interface; ePub; Digital Template system; WoodWing Content Portal; Feeds; Noma; Viva; PDF Based Framework; ….; HomeDeco; Sources; Services; Solutions; Products; ??; ??; ??; ??; eLinea; Blendle; Google Currents; LINDA. nieuws; NU.nl search
  • 24. Column stores
      – Lineage: Google's BigTable paper
      – Records with many, many columns
      – Distinguish between hot and cold data
      – Versioning
      – Records and columns can be sharded
      – Products: HBase, Cassandra, Hypertable
      – Use cases: analytics, messages
  • 26. Big Data
      – Lineage: Google GFS & Map/Reduce
      – Distributed data storage and processing
      – Advanced analytics capabilities on raw data
      – Schema on read
      – Products: Hadoop, MPP databases
      – Use cases: ad hoc querying of terabytes of data; data science (predictive analytics, model training); calculating recommendations
  • 27. Big Data at Sanoma
      – Main use case is reporting and analytics, moving to data science
      – A/B and MVT testing evaluations
      – Using QlikView as a front-end
      – Supplies data to other environments (SAS, Advertising, Behavioral Targeting)
      – Agile process for adding sources, from raw to intermediate to modeled data warehouse
      – Sanoma standard data platform, used in all Sanoma countries
      – > 250 users: dashboard users
      – 40 daily users: analysts & developers
      – 43 source systems, with 125 different sources
      – 400 tables in Hive
      – Platform: Cloudera Hadoop; 40-60 nodes; > 400 TB storage; ~2000 jobs/day
      – Typical data node / task tracker: 1-2 CPUs, 4-12 cores; 2 system disks (RAID 1); 4 data disks (2 TB, 3 TB or 4 TB); 24-32 GB RAM
  • 28. Sanoma Data lake
      – Traditional BI vs Big Data approach
      24.4.2015 © Sanoma Media
  • 30. Search
      – Keyword search can be combined with advanced forms of ranking the results
      – Most of the fields go to an index
      – Facets can be used for analytics
      – Ranker can be replaced with custom logic
      – Products: Solr, ElasticSearch, MarkLogic
      – Use cases: content search, analytics / faceted, percolation
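
To make "fields go to an index" and "facets can be used for analytics" concrete, here is a toy inverted index with facet counts. It is only a sketch: `TinyIndex` and the sample documents are made up for illustration, and it does not reflect how Solr or ElasticSearch implement indexing, ranking or faceting.

```python
from collections import Counter, defaultdict


class TinyIndex:
    """Toy inverted index: term -> set of document ids, plus stored fields."""

    def __init__(self):
        self._docs = {}
        self._postings = defaultdict(set)

    def add(self, doc_id, text, **fields):
        """Index the words of `text` and keep `fields` for faceting."""
        self._docs[doc_id] = fields
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every term of the query."""
        terms = query.lower().split()
        if not terms:
            return set()
        hits = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self._postings.get(term, set())
        return hits

    def facets(self, doc_ids, field):
        """Count the values of `field` across the given documents."""
        return Counter(
            self._docs[d][field] for d in doc_ids if field in self._docs[d]
        )


index = TinyIndex()
index.add(1, "election results in Helsinki", section="news")
index.add(2, "election debate on tv", section="tv")
index.add(3, "election night coverage", section="news")

hits = index.search("election")
print(sorted(hits), dict(index.facets(hits, "section")))
```

The facet counts are computed over the hit set only, which is what lets a search engine double as a simple analytics tool.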
  • 31. Search
      Diagram labels: Content, Q, Σ, Result ranking
  • 32. Search too
      Diagram labels: Content, t, Σ, Result ranking, User
  • 33. Search too
      Diagram labels: Content, Page, Σ, Result ranking, User
  • 34. Search - Percolation
      – Traditional queries run against an index of existing data
      – What if the data does not exist at the time of the query?
      – Percolation allows queries to be registered; when a new document arrives, the IDs of the matching queries are returned, e.g. for notification when new matches are available
      – Use case: search for a tweet, and after the initial results keep receiving newly tweeted items as they come in
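
Percolation reverses the usual roles: queries are stored and documents are matched against them. The sketch below shows that idea with plain term matching; `Percolator`, `register` and `percolate` are hypothetical names for this example and are not the API of ElasticSearch or any other product.

```python
class Percolator:
    """Toy percolator: stores queries, matches incoming documents."""

    def __init__(self):
        self._queries = {}

    def register(self, query_id, terms):
        """Store a query as the set of terms a document must contain."""
        self._queries[query_id] = {t.lower() for t in terms}

    def percolate(self, text):
        """Return the IDs of all registered queries matching `text`."""
        words = set(text.lower().split())
        return sorted(
            qid for qid, terms in self._queries.items() if terms <= words
        )


percolator = Percolator()
percolator.register("q1", ["nosql", "media"])
percolator.register("q2", ["hadoop"])

# Each new item (for example a tweet) is checked once, when it arrives.
print(percolator.percolate("NoSQL in media companies"))
print(percolator.percolate("hadoop and nosql for media"))
```

The returned query IDs can then be used to notify whoever registered the query that a new match is available.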
  • 36. Graph databases
      – Lineage: Euler and graph theory
      – Data model: nodes & edges, both of which can hold key-value pairs
      – Products: AllegroGraph, InfoGrid, Neo4j
      – Use cases: social relationships, content linking (entity linking)
      Diagram labels: Jan Smit, 3js, Nick en Simon, Volendam, Article 1, Article 2, Article 3
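
The node-and-edge model with key-value pairs on both, applied to content linking, might look like the sketch below. The node names come from the slide's diagram, but which article links to which entity is not recoverable from the transcript, so the `mentions` edges here are invented for illustration. `Graph` and `related_articles` are hypothetical helpers, not a graph database API.

```python
from collections import defaultdict


class Graph:
    """Toy property graph: nodes and edges both carry key-value pairs."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(list)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, dst, **props):
        # Stored in both directions so links can be followed either way.
        self.edges[src].append((dst, props))
        self.edges[dst].append((src, props))

    def neighbours(self, node_id, **match):
        """Neighbours reached over edges whose properties equal `match`."""
        return [
            dst
            for dst, props in self.edges[node_id]
            if all(props.get(k) == v for k, v in match.items())
        ]


def related_articles(graph, article_id):
    """Articles that mention at least one entity this article mentions."""
    related = set()
    for entity in graph.neighbours(article_id, type="mentions"):
        for other in graph.neighbours(entity, type="mentions"):
            if other != article_id:
                related.add(other)
    return sorted(related)


g = Graph()
for name in ("Jan Smit", "3js", "Nick en Simon", "Volendam"):
    g.add_node(name, kind="entity")
for name in ("Article 1", "Article 2", "Article 3"):
    g.add_node(name, kind="article")

# Assumed links, for illustration only.
g.add_edge("Article 1", "Jan Smit", type="mentions")
g.add_edge("Article 1", "Volendam", type="mentions")
g.add_edge("Article 2", "Volendam", type="mentions")
g.add_edge("Article 2", "3js", type="mentions")
g.add_edge("Article 3", "Nick en Simon", type="mentions")

print(related_articles(g, "Article 1"))
```

Finding related content is a two-hop walk (article, entity, article), the kind of traversal a graph database answers without joins.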
  • 38. Blob storage
      – Endless storage of binary data
      – Stores objects larger than a single machine can hold
      – “Lower” price/GB compared to SAN storage
      – Products: Amazon S3, CAStor, (Hadoop)
      – Use cases: media storage, archiving
  • 40. Summary
      – RDBMS systems are good enough for many problems
      – For specific problems, NOSQL solutions provide a specialized answer
      – There is a variety of NOSQL solutions with different characteristics
      – NOSQL solutions will require a higher engineering effort
  • 41. Dream NO SQL Architecture – Content Delivery
      Diagram labels: CMS; Document storage (MongoDB / CouchDB); Blob storage (S3 / CAStor); Search (ElasticSearch / Solr); Website / Mobile Application
  • 42. Dream NO SQL Architecture - Analytics
      Diagram labels: Event collection; Message Queue (Kafka / Flume); Event processing (Storm); Key-value store (Redis); Real time recommendations / targeting; Column storage (Cassandra / HBase); Real time dashboarding; Big Data (Hadoop); Ad hoc reporting & data science
  • 43. CAP Theorem
      – Availability (A): each client can always read and write
      – Partition tolerance (P): the system works well despite physical network partitions
      – Consistency (C): all clients always have the same view of the data
      – Data models (legend): relational databases, key-value, column-oriented, document-oriented
      Diagram labels: MySQL, Asterdata, Postgres, Greenplum, MS SQL, Vertica, Oracle, Dynamo, Cassandra, Voldemort, SimpleDB, Tokyo Cabinet, CouchDB, KAI, Riak, Big Table, MongoDB, Berkeley DB, Hypertable, Terrastore, MemcacheDB, HBase, Scalaris, Redis