OLAP QUERIES IN SECONDS ON A PETABYTE DATASET
Distributing Petabucket data using CephFS 
Milosz Tanski, CTO @Adfin 
milosz@adfin.com 
October 2014
Outline
• Who/what is AdFin?
• What is Petabucket?
• Petabucket on CephFS
• Contributing FSCache support to CephFS
©AdFin. All Rights Reserved
About AdFin
• = Ad-Tech + Finance-Tech
• Creating tools that bring buying intelligence to programmatic media.
• Advertising is bought and sold in real time via RTB (since 2008).
• Bringing transparency to the ad markets.
• The Bloomberg, S&P, Markit… for ad markets.
We Deliver… Pretty Analytics 
What’s the problem?
• Market is ~500 billion impressions a day; it’s growing.
• Each impression is unique.
• Each is worth a small fraction of a penny.
• An order of magnitude more than the number of trades in the financial markets.
• There’s an order of magnitude more bids for those impressions.
• That’s a lot of data to process, store, and analyze.
Petabucket
• Distributed, time series, relational, OLAP database
• Relational query language (but not SQL)
• Each query is broken up into many smaller chunks
• Great single node performance: tens of millions of rows per second.
• Vectorized query processing, vectorized compressed bitmap indexes.
• Responses in real time. Goal is low single digit seconds (uncached).
• Why? Because we’re a bit crazy.
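The slides do not show Petabucket's internals, so the following is only a minimal sketch of two ideas named above: splitting a time-range query into per-chunk sub-queries, and filtering rows with bitmap indexes. Everything here is an assumption made for illustration: the `Chunk` class, the column names, and the use of plain Python integers as uncompressed bitmaps (the real system is described as vectorized and compressed).

```python
class Chunk:
    """One time slice of rows plus a bitmap index (hypothetical layout)."""

    # Columns that get a bitmap per distinct value (assumed for the example).
    INDEXED_COLUMNS = ("exchange", "country")

    def __init__(self, day, rows):
        self.day = day
        self.rows = rows
        # (column, value) -> integer whose bit i is set when row i matches.
        self.index = {}
        for i, row in enumerate(rows):
            for col in self.INDEXED_COLUMNS:
                key = (col, row[col])
                self.index[key] = self.index.get(key, 0) | (1 << i)

    def sum_impressions(self, filters):
        """Sum 'impressions' over rows matching every column=value filter."""
        bitmap = (1 << len(self.rows)) - 1  # start with every row selected
        for key in filters.items():
            bitmap &= self.index.get(key, 0)  # AND narrows the selection
        total = 0
        position = 0
        while bitmap:
            if bitmap & 1:
                total += self.rows[position]["impressions"]
            bitmap >>= 1
            position += 1
        return total


def run_query(chunks, first_day, last_day, filters):
    """Evaluate the query per chunk, then merge the partial sums."""
    partials = [
        chunk.sum_impressions(filters)
        for chunk in chunks
        if first_day <= chunk.day <= last_day
    ]
    return sum(partials)
```

Because each chunk produces an independent partial result, the per-chunk calls in `run_query` could be handed to different nodes and merged afterwards; the sketch runs them in a single process for simplicity.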
Queries easy for humans / machines 
High Level System Diagram 
Time series bulk import 
Petabucket and CephFS
• CephFS as a single namespace storage for nodes
• Why?
• Scalable storage (speed / size)
• Separate storage from computation
• No single point of failure (SPOF)
• DFS performance
• Client (kernel) performance
High Level System Diagram, part 2 
CephFS is not production ready?
• Again, we’re a bit crazy?
• Started in early 2013.
• When we started, the client and MDS were not ready.
• We found and reported a lot of bugs.
• Yan Zheng fixed a lot of bugs. Thanks, Yan.
• Today we’re happy and in production.
• Processed multiple PB of data since then.
FSCache for kclient
• We decided to add local persistent caching support to the kclient.
• Our access pattern:
  • Working set larger than node memory (page cache)
  • Append-only data (time series)
  • Most recent month or quarter of data is accessed 100x more often
• Benefits:
  • Reducing the latency / speed lost by moving to a non-local filesystem
  • Reduced Ceph network traffic and OSD utilization
  • Cheap local SSD drives get 500MB/s read performance
  • Not re-inventing the wheel
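The access pattern above suggests why a local cache pays off. As a rough back-of-the-envelope sketch (not a measurement from the talk), the function below estimates what share of reads lands on the "hot" recent data if each unit of hot data is read 100x as often as each unit of older data. The function name, the unit of "months", and the 3-of-24-months example are all assumptions chosen for illustration.

```python
def hot_read_fraction(hot_units, total_units, hot_weight=100.0):
    """Estimate the share of reads that hit the hot (recent) data.

    hot_units / total_units: amount of recent data vs. all data, in any
    consistent unit (for example months of time series).
    hot_weight: how much more often a unit of hot data is read than a unit
    of cold data (the slide quotes 100x).
    """
    if not 0 < hot_units <= total_units:
        raise ValueError("expected 0 < hot_units <= total_units")
    cold_units = total_units - hot_units
    hot_reads = hot_units * hot_weight
    return hot_reads / (hot_reads + cold_units)


# Assumed example: 24 months stored, the most recent 3 months are hot.
share = hot_read_fraction(hot_units=3, total_units=24)
print(f"{share:.1%} of reads would target the hot data")
```

Under these assumed numbers roughly 93% of reads target one eighth of the data, so a local SSD cache sized for the recent data could absorb most read traffic before it reaches the OSDs.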
Kernel programming is hard
• Have to understand Ceph, the kernel, and concurrency.
• An error in the kernel hangs or oopses your machine.
• Bugs in other parts of the kernel? (CacheFS).
• Prototype working in two weeks.
• First submission 2 months later.
• In kernel 5 months later.
• Number one problem: concurrency.
Ceph with FSCache Status
• In kernel since: 3.13
• … Works well since: 3.15
• … All bugs fixed: 3.17
• Speed… as fast as your caching disk
• Tested single client performance: 1200MB/s
Next steps…
• Contributing to Ceph & the kernel is addictive:
  • Ceph performance work: improving latency / IOPS.
  • Kernel work: readv2() syscall, for file serving applications.
  • http://lwn.net/Articles/612483/
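The slide does not describe the readv2() interface, so the sketch below only illustrates the general idea a file server might want: try a read that returns immediately when the data is not already cached, and fall back to an ordinary blocking read otherwise. It uses Python's `os.preadv` with the `os.RWF_NOWAIT` flag as a stand-in; treating that flag as comparable to what readv2() proposed is an assumption, and the helper names are hypothetical.

```python
import os


def read_cached_or_none(fd, length, offset):
    """Return bytes if the read can complete without blocking, else None."""
    # os.preadv and os.RWF_NOWAIT are only available on some platforms.
    if not hasattr(os, "preadv") or not hasattr(os, "RWF_NOWAIT"):
        return None
    buf = bytearray(length)
    try:
        count = os.preadv(fd, [buf], offset, os.RWF_NOWAIT)
    except BlockingIOError:
        return None  # data not immediately available
    except OSError:
        return None  # flag not supported by this kernel or filesystem
    return bytes(buf[:count])


def read_at(fd, length, offset):
    """Fast path first; fall back to a blocking read when needed."""
    data = read_cached_or_none(fd, length, offset)
    if data is None or len(data) < length:
        # A real server would likely hand this to a worker thread so the
        # event loop is not stalled; the sketch simply blocks here.
        data = os.pread(fd, length, offset)
    return data
```

The design point is that the common case (data already in memory) never pays for a thread hand-off, while the slow case still completes correctly.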
Thank You!
Let’s Get in Touch 
Milosz Tanski 
CTO 
milosz@adfin.com 
16 E. 34th Street, 15th Floor 
New York, New York 10016 
linkedin.com/company/AdFin 
twitter.com/AdFin

Ceph Day New York 2014: Distributed OLAP queries in seconds using CephFS

  • 1. OLAP ON QERIES IN SECONDS ON PETABYTE DATASET Distributing Petabucket data using CephFS Milosz Tanski, CTO @Adfin milosz@adfin.com October 2014
  • 2. Outline  Who/what is AdFin?  What is PetaBucket?  Petabucket on CephFS  Contributing FSCache support to CephFS 2 ©AdFin. All Rights Reserved
  • 3. About Adfin  = Ad-Tech + Finance-Tech  Creating tools that bring buying intelligence to programmatic media.  Advertising is bought and sold in real time via RTB (since 2008)  Brining transparency to the Ad markets.  The Bloomberg, S&P, Markit… for Ad markets. 3 ©AdFin. All Rights Reserved
  • 4. We Deliver… Pretty Analytics 4 ©AdFin. All Rights Reserved
  • 5. We Deliver… Pretty Analytics 5 ©AdFin. All Rights Reserved
  • 6. We Deliver… Pretty Analytics 6 ©AdFin. All Rights Reserved
  • 7. We Deliver… Pretty Analytics 7 ©AdFin. All Rights Reserved
  • 8. What’s the problem?
      • Market is ~500 billion impressions a day; it’s growing.
      • Each impression is unique.
      • Each is worth a small fraction of a penny.
      • An order of magnitude more than the number of trades in the financial markets.
      • There are an order of magnitude more bids for those impressions.
      • That’s a lot of data to process, store, analyze.
  • 9. Petabucket
      • Distributed, time series, relational, OLAP database
      • Relational query language (but not SQL)
      • Query is broken up into many smaller chunks
      • Great single node performance: tens of millions of rows a second.
      • Vectorized query processing, vectorized compressed bitmap indexes.
      • Responses in real time. Goal is low single-digit seconds (uncached).
      • Why? Because we’re a bit crazy.
  • 10. Queries easy for humans / machines
  • 11. High Level System Diagram
  • 12. Time series bulk import
  • 13. Petabucket and CephFS
      • CephFS as a single namespace storage for nodes
      • Why?
      • Scalable storage (speed / size)
      • Separate storage from computation
      • No SPOF
      • DFS performance
      • Client (kernel) performance
  • 14. High Level System Diagram, part 2
  • 15. CephFS is not production ready?
      • Again, we’re a bit crazy?
      • Started in early 2013.
      • When we started, the client and MDS were not ready.
      • We found and reported a lot of bugs.
      • Yan Zheng fixed a lot of bugs. Thanks Yan.
      • Today we’re happy and in production.
      • Processed multiple PB of data since then.
  • 16. FSCache for kclient
      • We decided to add local persistent caching support to the kclient.
      • Our access pattern:
          • Working set larger than node memory (page cache)
          • Append-only data (time series)
          • Most recent month or quarter of data accessed 100x more often
      • Benefits:
          • Reduce the latency / speed lost by moving to a non-local filesystem
          • Reduce Ceph network traffic and OSD utilization
          • Cheap local SSD drives get 500 MB/s read performance
          • Not re-inventing the wheel
  • 17. Kernel programming is hard
      • Have to understand Ceph, kernel, concurrency.
      • An error in the kernel hangs or oopses your machine.
      • Bugs in other parts of the kernel? (CacheFS).
      • Prototype working in two weeks.
      • First submission 2 months later.
      • In kernel 5 months later.
      • Number one problem: concurrency.
  • 18. Ceph with FSCache Status
      • In since: 3.13
      • … Works well since: 3.15
      • … All bugs fixed: 3.17
      • Speed… as fast as your caching disk
      • Tested single-client performance: 1200 MB/s
  • 19. Next steps…
      • Contributing to Ceph & kernel is addicting:
          • Ceph performance work. Improving latency / IOPS.
          • Kernel work: readv2() syscall. File serving applications.
          • http://lwn.net/Articles/612483/
  • 21. Let’s Get in Touch
      • Milosz Tanski, CTO, milosz@adfin.com
      • 16 E. 34th Street, 15th Floor, New York, New York 10016
      • linkedin.com/company/AdFin
      • twitter.com/AdFin
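
Slide 9 says a query is broken up into many smaller chunks, and the speaker notes add that data is sharded into partitions by time and that most queries are sums or group-bys. The talk does not show Petabucket's query language or internals, so the following is only a minimal sketch of that general scatter-gather idea. The `PARTITIONS` table, the row layout, and the function names are all hypothetical, and a thread pool stands in for the many nodes of a real cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import date

# Hypothetical data: one partition per day, each row is (group_key, value).
PARTITIONS = {
    date(2014, 10, 1): [("exchange_a", 3), ("exchange_b", 5)],
    date(2014, 10, 2): [("exchange_a", 7)],
    date(2014, 10, 3): [("exchange_b", 1), ("exchange_a", 2)],
}


def scan_partition(day):
    """Compute a partial group-by sum over a single time partition."""
    partial = defaultdict(int)
    for key, value in PARTITIONS.get(day, []):
        partial[key] += value
    return partial


def query_sum_by_key(start, end):
    """Fan out over every partition in [start, end], then merge partial sums."""
    days = [d for d in sorted(PARTITIONS) if start <= d <= end]
    totals = defaultdict(int)
    with ThreadPoolExecutor() as pool:
        # Each chunk is independent, so partitions can be scanned in parallel.
        for partial in pool.map(scan_partition, days):
            for key, value in partial.items():
                totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    print(query_sum_by_key(date(2014, 10, 1), date(2014, 10, 3)))
```

Because sums are associative, each partition can be reduced where its data lives and only the small partial results need to be merged, which is one reason time-partitioned layouts suit this kind of workload.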

Editor's Notes

  1. Who is Adfin? What special sauce did we build … very large OLAP DB. Goals: Have you take a look at CephFS … might be one of the few people talking about it. Realize that it’s possible for your organization to develop some expertise in-house… contribute.
  2. Name implies a combination of Advertising + Finance Markets. Two home town industries (Madison Ave and Wall St). Using tools and knowledge pioneered by the financial industry. Most media (by volume) is bought and sold programmatically, à la HFT. It’s an opaque marketplace. Bloomberg … information platform, S&P … indices, Markit … aggregating market data (CDS). I am going to keep butchering these analogies.
  3. Pictures of some of the tools we’ve built. Real time analysis into your own data and market data. Run a query get a result… lots of variables. Forecasting
  4. Pictures of some of the tools we’ve built. Real time analysis into your own data and market data. Run a query get a result… lots of variables. Forecasting
  5. Pictures of some of the tools we’ve built. Real time analysis into your own data and market data. Run a query get a result… lots of variables. Forecasting
  6. Pictures of some of the tools we’ve built. Real time analysis into your own data and market data. Run a query get a result… lots of variables. Forecasting
  7. The advertising market is larger than the financial market… in terms of volume of transactions. Each impression is worth a tiny fraction of a penny. When I looked at the number of transactions for an exchange like the NASDAQ… it’s like 50 million, NYSE 100 million. A lot of duct tape, but also a lot of efficiency. This number is not getting smaller. All advertising is going to be digitally bought and sold and that day is coming.
  8. Distributed, relational database for running real time analytics queries on very large time series data. KDB on many, many nodes. Some fun things. It’s a relational model, but not SQL. 90% of queries are sums or group-bys. Data is sharded into partitions by time. Spread across many nodes. We get pretty amazing single node performance. 100s of millions of rows a second per partition. There’s been a lot of research into this stuff. Based on research into compression, indexing, query, all from like the last 3 to 4 years. For large datasets our goal is to answer in under 10 seconds for really large queries. Reality is most things we do answer in under 1 second. Why? Because the dataset is huge. Also, we’re a bit crazy.
  9. Distributed, relational database for running real time analytics queries on very large time series data. KDB on many, many nodes. Some fun things. It’s a relational model, but not SQL. 90% of queries are sums or group-bys. Data is sharded into partitions by time. Spread across many nodes. We get pretty amazing single node performance. 100s of millions of rows a second per partition. There’s been a lot of research into this stuff. Based on research into compression, indexing, query, all from like the last 3 to 4 years. For large datasets our goal is to answer in under 10 seconds for really large queries. Reality is most things we do answer in under 1 second. Why? Because the dataset is huge. Also, we’re a bit crazy.
  10. Before, we were storing it all on local disks. A couple of problems: Redundancy? Can’t grow computation without storage, and vice versa. Looked into Ceph: Scalable storage, just throw more machines at it… don’t worry about topology too much. We could separate storage from computation. No SPOF, redundancy everywhere. Pretty good speed for a DFS. We can leverage the kernel. The kernel client versus doing it directly. Page cache etc… Common theme.
  11. “Beta company, okay using a beta product.” We can get under the hood. Early start was a bit rough. There were lots of bugs. We found lots of bugs. Community was great, esp. Yan. Yan fixed our last bug around the end of 2013… haven’t had a single problem since. We’re not storing multi-PB yet, but we have processed multi-PB and haven’t had a problem.
  12. We lost some performance as a result of this. Network latency, overhead, Ceph overhead. We can also go even cheaper without Ceph nodes / network. Our access pattern: write once, read many (mostly true). Most recent data is most often used (working set larger than RAM, smaller than the full DFS). The Linux kernel people really put hundreds of man-years into scalability.
  13. I don’t want to discourage anybody … we did something not smart, picked the hardest problem. It required us to know a lot of things about Ceph, kernel, concurrency. I would pick something simpler next time. There are bugs in the other parts of the kernel? So one of the reasons we wanted to do this work in the kernel was concurrency, so our benefit was also our PITA.
  14. We got it into the Ceph code base around 3.13. A bunch of bug fixes from external folks. We’ve exposed issues with the FSCache code. We’ve fixed a bunch of concurrency bugs that only happen in the error path of FSCache under VMA pressure. A lot of filesystems benefit. We’re really happy with performance… we’ve made a good bet on the kernel. We’re able to really push the fscache up to the speed of the disks we have.
  15. So despite the initial learning curve … we want to contribute work. Where we can leverage our knowledge … performance. We’ve built a lot of things in our system for improving latency. Learned what to do, what not to do, where to apply lockless algorithms. Readv2 syscall… Help all applications that do both IO- and CPU-bound work.
  16. Thanks for listening to me. Hopefully it was a good story of what we’re up to… how we’re leveraging Ceph. Motivating to help and contribute. It’s nice to have a vendor you can call up and yell at when things are not working, but it’s even better to be able to guide the tool to do what you want. The Ceph community is great, there are so many people contributing to so many different projects.
  17. Contact info
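
Slide 16 motivates the local FSCache layer with an access pattern in which the most recent month or quarter of data is read about 100x more often than older data. A rough back-of-envelope model makes the benefit concrete. This is an illustrative sketch, not a measurement from the talk: the partition counts, the uniform weighting inside the hot and cold sets, and the function name `cache_hit_fraction` are all assumptions.

```python
def cache_hit_fraction(total_partitions, hot_partitions, cached_partitions,
                       hot_weight=100.0):
    """Estimate the share of reads served by a local cache.

    Assumed model: partitions are ordered newest first, the newest
    `hot_partitions` are each read `hot_weight` times as often as any
    older partition, and the cache holds the newest `cached_partitions`.
    """
    if not 0 <= hot_partitions <= total_partitions:
        raise ValueError("hot_partitions must be within 0..total_partitions")
    if not 0 <= cached_partitions <= total_partitions:
        raise ValueError("cached_partitions must be within 0..total_partitions")

    # Relative read frequency per partition, newest first.
    weights = [hot_weight if i < hot_partitions else 1.0
               for i in range(total_partitions)]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(weights[:cached_partitions]) / total


if __name__ == "__main__":
    # Hypothetical numbers: 24 monthly partitions, newest quarter hot and cached.
    print(f"{cache_hit_fraction(24, 3, 3):.1%}")
```

Under these assumed numbers, caching only one eighth of the partitions on local SSD would absorb the large majority of reads, which is consistent with the stated goals of reducing Ceph network traffic and OSD utilization.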