Ceph Day New York 2014: Distributed OLAP queries in seconds using CephFS

AdFin is a company that provides analytics tools for programmatic advertising markets, with the aim of bringing transparency to those markets. They developed PetaBucket, a distributed, relational OLAP database that can query a petabyte dataset in seconds. AdFin uses CephFS for scalable storage across petabyte datasets and nodes. They contributed code to the Ceph kernel client that adds local caching support (FSCache), improving performance for their workload, in which recent time-series data is queried far more frequently than older data.

OLAP QUERIES IN SECONDS ON A PETABYTE DATASET
Distributing Petabucket data using CephFS
Milosz Tanski, CTO @Adfin
milosz@adfin.com
October 2014

Outline
- Who/what is AdFin?
- What is PetaBucket?
- Petabucket on CephFS
- Contributing FSCache support to CephFS
2 ©AdFin. All Rights Reserved

About Adfin
- AdFin = Ad-Tech + Finance-Tech
- Creating tools that bring buying intelligence to programmatic media.
- Advertising is bought and sold in real time via RTB (real-time bidding) since 2008.
- Bringing transparency to the ad markets.
- The Bloomberg, S&P, Markit… for ad markets.

We Deliver… Pretty Analytics

What’s the problem?
- The market is ~500 billion impressions a day, and it’s growing.
- Each impression is unique.
- Each is worth a small fraction of a penny.
- That’s an order of magnitude more than the number of trades in the financial markets.
- There are an order of magnitude more bids for those impressions.
- That’s a lot of data to process, store, and analyze.
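To put the slide’s volume in perspective, here is a back-of-the-envelope calculation. The ~500 billion impressions a day figure is from the slide; the 100-byte record size is a hypothetical assumption for illustration only, not a number from the talk, and the result is a daily average that hides peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: float) -> float:
    """Average event rate implied by a daily volume."""
    return events_per_day / SECONDS_PER_DAY


def bytes_per_day(events_per_day: float, record_bytes: int) -> float:
    """Raw storage per day, before compression, for fixed-size records."""
    return events_per_day * record_bytes


IMPRESSIONS_PER_DAY = 500e9  # "~500 Billion impressions a day" (from the slide)
RECORD_BYTES = 100           # hypothetical record size, for illustration only

rate = per_second(IMPRESSIONS_PER_DAY)
raw_tb = bytes_per_day(IMPRESSIONS_PER_DAY, RECORD_BYTES) / 1e12

print(f"{rate:,.0f} impressions/s on average")   # 5,787,037 impressions/s on average
print(f"{raw_tb:,.0f} TB/day of raw records at {RECORD_BYTES} B each")  # 50 TB/day ...
```

Bids multiply this further, since the slide puts them at another order of magnitude above impressions.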

Petabucket
- Distributed, time-series, relational OLAP database
- Relational query language (but not SQL)
- Queries are broken up into many smaller chunks
- Great single-node performance: tens of millions of rows a second
- Vectorized query processing, vectorized compressed bitmap indexes
- Responses in real time; the goal is low single-digit seconds (uncached)
- Why? Because we’re a bit crazy.
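The slide mentions vectorized, compressed bitmap indexes. As a minimal sketch of the underlying idea (not Petabucket’s implementation, which the talk does not show), each distinct value of a column gets a bitmap with one bit per row, and a conjunctive filter becomes a bitwise AND over whole bitmaps. Python’s arbitrary-precision integers stand in for machine-word bit vectors here, so the AND runs word-at-a-time in C rather than row by row. The bitmaps are left uncompressed for clarity; a production engine would compress them (for example with WAH or Roaring bitmaps). The column names and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def build_bitmap_index(column: Sequence[str]) -> Dict[str, int]:
    """Map each distinct value to a bitmap (a Python int) with bit i set
    when row i holds that value."""
    index: Dict[str, int] = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap: int) -> List[int]:
    """Decode a bitmap into sorted row ids."""
    rows = []
    while bitmap:
        low = bitmap & -bitmap            # isolate the lowest set bit
        rows.append(low.bit_length() - 1)
        bitmap ^= low                     # clear it
    return rows


def filtered_sum(measure: Sequence[float], bitmap: int) -> float:
    """SUM(measure) over the rows selected by the bitmap."""
    return sum(measure[r] for r in rows_of(bitmap))


# Hypothetical toy partition: one row per impression.
exchange = ["ex_a", "ex_b", "ex_a", "ex_c", "ex_a", "ex_b"]
country = ["US", "US", "GB", "US", "US", "GB"]
cost = [1.5, 0.5, 2.0, 1.0, 3.0, 0.25]

ex_idx = build_bitmap_index(exchange)
co_idx = build_bitmap_index(country)

# WHERE exchange = 'ex_a' AND country = 'US': one AND over whole bitmaps.
match = ex_idx["ex_a"] & co_idx["US"]
print(rows_of(match))             # [0, 4]
print(filtered_sum(cost, match))  # 4.5
```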

Queries easy for humans / machines

High Level System Diagram

Time series bulk import

Petabucket and CephFS
- CephFS as single-namespace storage for all nodes
- Why?
  - Scalable storage (speed / size)
  - Separate storage from computation
  - No SPOF (single point of failure)
  - DFS performance
  - Client (kernel) performance
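Because every node sees the same CephFS namespace, deciding which node scans a partition becomes a scheduling decision rather than a data-placement one. The talk does not say how Petabucket assigns work, so the following is only a sketch under that assumption, using rendezvous (highest-random-weight) hashing: each partition goes to the node with the highest hash score. The partition paths and node names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


def _score(node: str, partition: str) -> int:
    """Deterministic pseudo-random weight for a (node, partition) pair."""
    digest = hashlib.sha256(f"{node}/{partition}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(partition: str, nodes: Iterable[str]) -> str:
    """Rendezvous hashing: the highest-scoring node owns the partition."""
    return max(nodes, key=lambda n: _score(n, partition))


def assign(partitions: Iterable[str], nodes: List[str]) -> Dict[str, str]:
    return {p: owner(p, nodes) for p in partitions}


# Hypothetical daily partition directories inside a shared CephFS mount.
partitions = [f"/mnt/cephfs/petabucket/2014-09-{d:02d}" for d in range(1, 31)]
before = assign(partitions, ["node1", "node2", "node3"])
after = assign(partitions, ["node1", "node2", "node3", "node4"])

moved = [p for p in partitions if before[p] != after[p]]
# Adding node4 only moves partitions that node4 now wins; nothing is copied.
assert all(after[p] == "node4" for p in moved)
print(f"{len(moved)} of {len(partitions)} partitions reassigned to node4")
```

The point of the design is that nodes can join or leave without moving data: with shared storage, only ownership of the scan work changes.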

High Level System Diagram, part 2

CephFS is not production ready?
- Again, we’re a bit crazy?
- Started in early 2013.
- When we started, the client and MDS were not ready.
- We found and reported a lot of bugs.
- Yan Zheng fixed a lot of bugs. Thanks, Yan.
- Today we’re happy and in production.
- We’ve processed multiple PB of data since then.

FSCache for kclient
- We decided to add local persistent caching support to the kernel client (kclient).
- Our access pattern:
  - Working set larger than node memory (page cache)
  - Append-only data (time series)
  - Most recent month / quarter of data accessed 100x more often
- Benefits:
  - Reduce the latency / speed lost by moving to a non-local filesystem
  - Reduce Ceph network traffic and OSD utilization
  - Cheap local SSD drives get 500MB/s read performance
  - Not re-inventing the wheel
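The access pattern above explains why a modest local cache pays off. As a back-of-the-envelope model (assumptions: reads are uniform within each age band, the recent quarter is read 100x more per byte than older data, and the dataset holds a hypothetical 36 months of history), the share of reads landing on recent data, and therefore servable by a cache sized to hold it, is computed below. The effective-bandwidth function blends the slide’s 500 MB/s local SSD figure with a remote read bandwidth that is purely a placeholder.

```python
def recent_read_share(recent_months: float, total_months: float,
                      access_multiplier: float) -> float:
    """Fraction of reads that land on the most recent `recent_months` of
    data, if each recent byte is read `access_multiplier` times as often
    as each older byte and reads are uniform within each band."""
    r = recent_months / total_months      # fraction of bytes that are recent
    hot = access_multiplier * r
    return hot / (hot + (1.0 - r))


def effective_bandwidth(hit_ratio: float, local_mb_s: float,
                        remote_mb_s: float) -> float:
    """Bandwidth for serial reads: time per MB is the hit-weighted sum of
    the local and remote per-MB times."""
    seconds_per_mb = hit_ratio / local_mb_s + (1.0 - hit_ratio) / remote_mb_s
    return 1.0 / seconds_per_mb


h = recent_read_share(recent_months=3, total_months=36, access_multiplier=100)
print(f"{h:.3f}")  # 0.901, i.e. 100/111 of reads hit the recent quarter

# 500 MB/s local SSD is from the slide; 200 MB/s remote is a placeholder.
print(f"{effective_bandwidth(h, 500, 200):.0f} MB/s")  # 435 MB/s
```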

Kernel programming is hard
- Have to understand Ceph, the kernel, and concurrency.
- An error in the kernel hangs or Oopses your machine.
- Bugs in other parts of the kernel? (CacheFS)
- Prototype working in two weeks
- First submission 2 months later
- In the kernel 5 months later
- Number one problem: concurrency

Ceph with FSCache Status
- In the kernel since: 3.13
- … works well since: 3.15
- … all bugs fixed: 3.17
- Speed… as fast as your caching disk
- Tested single-client performance: 1200 MB/s
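The version milestones above translate naturally into a pre-flight check. The helper below is hypothetical (not part of Ceph or the kernel): it classifies a kernel release string against the slide’s 3.13 / 3.15 / 3.17 milestones and scans mount-table lines in the /proc/mounts format for CephFS kernel mounts carrying the `fsc` option, which asks the CephFS kernel client to use FS-Cache. It assumes the kernel reports that option back in the mount table. FS-Cache also needs a cache backend (typically the cachefilesd daemon) configured on the client, which is outside this sketch.

```python
import re
from typing import List, Tuple

# Milestones as stated on the slide: (major, minor) -> status.
MILESTONES = [
    ((3, 17), "all bugs fixed"),
    ((3, 15), "works well"),
    ((3, 13), "in the kernel"),
]


def parse_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string like '3.16.0-4-amd64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def fscache_status(release: str) -> str:
    version = parse_release(release)
    for minimum, label in MILESTONES:
        if version >= minimum:        # tuple comparison: major, then minor
            return label
    return "no FSCache support in the Ceph client"


def ceph_mounts_with_fsc(mount_table: str) -> List[str]:
    """Mount points of CephFS kernel mounts whose options include fsc.
    Lines: device mountpoint fstype options dump pass."""
    hits = []
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ceph":
            continue
        options = fields[3].split(",")
        # Accept a bare "fsc" as well as an "fsc=<tag>" form.
        if any(opt == "fsc" or opt.startswith("fsc=") for opt in options):
            hits.append(fields[1])
    return hits


# Hypothetical mount table; on a real node you would read /proc/mounts.
sample = """\
10.0.0.1:6789:/ /mnt/cephfs ceph rw,relatime,name=admin,fsc 0 0
10.0.0.1:6789:/ /mnt/nocache ceph rw,relatime,name=admin 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""
print(fscache_status("3.16.0-4-amd64"))  # works well
print(ceph_mounts_with_fsc(sample))      # ['/mnt/cephfs']
```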

Next steps…
- Contributing to Ceph & the kernel is addictive:
  - Ceph performance work: improving latency / IOPS
  - Kernel work: readv2() syscall, for file-serving applications
  - http://lwn.net/Articles/612483/
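The readv2() work targets servers that mix I/O- and CPU-bound work: read from the page cache on the calling thread when the data is already cached, and hand the request to a thread pool only when it would block. The closest interface in current mainline Linux is preadv2() with the RWF_NOWAIT flag (Linux 4.14 and later), which Python exposes as `os.preadv(fd, buffers, offset, flags)` with `os.RWF_NOWAIT`. The sketch below illustrates that pattern under those assumptions; `read_fast_or_offload` is a hypothetical helper, and it falls back to the pool whenever the flag is unavailable or the filesystem rejects it.

```python
import errno
import os
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor

# Pool for reads that would block on storage (page-cache misses).
_io_pool = ThreadPoolExecutor(max_workers=4)


def read_fast_or_offload(fd: int, size: int, offset: int) -> "Future[bytes]":
    """Read `size` bytes at `offset`: inline if cached, else on a pool thread."""
    nowait = getattr(os, "RWF_NOWAIT", None)  # present on Linux >= 4.14 only
    if nowait is not None and hasattr(os, "preadv"):
        buf = bytearray(size)
        try:
            n = os.preadv(fd, [buf], offset, nowait)
        except BlockingIOError:
            n = -1  # EAGAIN: data not in the page cache, take the slow path
        except OSError as exc:
            if exc.errno not in (errno.EOPNOTSUPP, errno.EINVAL, errno.ENOSYS):
                raise
            n = -1  # flag or syscall unsupported here, take the slow path
        if n == size:
            done: "Future[bytes]" = Future()
            done.set_result(bytes(buf))
            return done
    # Cache miss, short read (EOF or partially cached), or no RWF_NOWAIT.
    return _io_pool.submit(os.pread, fd, size, offset)


with tempfile.TemporaryFile() as f:
    f.write(b"hello petabucket")
    f.flush()
    demo_slice = read_fast_or_offload(f.fileno(), 5, 6).result()
    demo_all = read_fast_or_offload(f.fileno(), 100, 0).result()
print(demo_slice, demo_all)  # b'petab' b'hello petabucket'
```

Either path returns the same bytes; the flag only decides which thread pays for the read.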

Let’s Get in Touch
Milosz Tanski
CTO
milosz@adfin.com
16 E. 34th Street, 15th Floor
New York, New York 10016
linkedin.com/company/AdFin
twitter.com/AdFin

Editor's Notes

Who is AdFin? What special sauce did we build… a very large OLAP DB. Goals: have you take a look at CephFS… I might be one of the few people talking about it. Realize that it’s possible for your organization to develop some expertise in-house… and contribute.
The name implies a combination of Advertising + Finance Markets. Two hometown industries (Madison Ave and Wall St). Using tools and knowledge pioneered by the financial industry. Most media (by volume) is bought and sold programmatically, à la HFT. It’s an opaque marketplace. Bloomberg… information platform; S&P… indices; Markit… aggregating market data (CDS). I am going to keep butchering these analogies.
Pictures of some of the tools we’ve built. Real-time analysis into your own data and market data. Run a query, get a result… lots of variables. Forecasting.
The advertising market is larger than the financial market… in terms of volume of transactions. Each impression is worth a tiny fraction of a penny. When I looked at the number of transactions for an exchange like the NASDAQ… it’s like 50 million, NYSE 100 million. A lot of duct tape, but also a lot of efficiency. This number is not getting smaller. All advertising is going to be digitally bought and sold, and that day is coming.
A distributed, relational database for running real-time analytics queries on very large time-series data. KDB on many, many nodes. Some fun things: it’s a relational model, but not SQL. 90% of queries are sums or group-bys. Data is sharded into partitions by time and spread across many nodes. We get pretty amazing single-node performance: 100s of millions of rows a second per partition. There’s been a lot of research into this stuff; we build on research into compression, indexing, and query processing, all from the last 3 to 4 years. For large datasets, our goal is to answer even really large queries in under 10 seconds. In reality, most things we do answer in under 1 second. Why? Because the dataset is huge. Also, we’re a bit crazy.
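Queries of the shape described above (sums and group-bys over time-sharded partitions) split cleanly: each partition computes a partial aggregate, and a coordinator merges the partials after pruning partitions outside the time range. The sketch below shows that scatter/gather shape with a thread pool standing in for remote nodes; the row schema, the daily partitioning, and the `query_partition` / `scatter_gather` helpers are hypothetical, not Petabucket’s actual API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from typing import Dict, List, Tuple

Row = Tuple[date, str, float]  # (day, group key, measure): hypothetical schema


def partition_by_day(rows: List[Row]) -> Dict[date, List[Row]]:
    """Shard rows into one partition per day."""
    parts: Dict[date, List[Row]] = {}
    for row in rows:
        parts.setdefault(row[0], []).append(row)
    return parts


def query_partition(rows: List[Row]) -> Counter:
    """Partial aggregate for one partition: SUM(measure) GROUP BY key."""
    partial: Counter = Counter()
    for _, key, value in rows:
        partial[key] += value
    return partial


def scatter_gather(parts: Dict[date, List[Row]],
                   start: date, end: date) -> Dict[str, float]:
    """Prune partitions outside [start, end], aggregate the rest in
    parallel, then merge the partial results."""
    selected = [rows for day, rows in parts.items() if start <= day <= end]
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        for partial in pool.map(query_partition, selected):
            # Sums are associative (up to floating-point rounding),
            # so partials can be merged in any order.
            total.update(partial)
    return dict(total)


rows = [
    (date(2014, 9, 1), "ex_a", 1.0),
    (date(2014, 9, 1), "ex_b", 2.0),
    (date(2014, 9, 2), "ex_a", 3.0),
    (date(2014, 10, 1), "ex_a", 100.0),  # outside the queried range
]
result = scatter_gather(partition_by_day(rows), date(2014, 9, 1), date(2014, 9, 30))
print(result)  # {'ex_a': 4.0, 'ex_b': 2.0}
```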
Before, we were storing it all on local disks. A couple of problems: Redundancy? We can’t grow computation without storage, and vice versa. Looked into Ceph: scalable storage, just throw more machines at it… don’t worry about topology too much. We could separate storage from computation. No SPOF, redundancy everywhere. Pretty good speed for a DFS. We can leverage the kernel: the kernel client versus doing it directly. Page cache, etc… a common theme.
“Beta company, okay using a beta product.” We can get under the hood. The early start was a bit rough. There were lots of bugs; we found lots of bugs. The community was great, especially Yan. Yan fixed our last bug around the end of 2013… we haven’t had a single problem since. We’re not storing multiple PB yet, but we’ve processed multiple PB and haven’t had a problem.
We lost some performance as a result of this: network latency, overhead, Ceph overhead. We can also go even cheaper on Ceph nodes / network. Our access pattern is write once, read many (mostly true). The most recent data is the most often used (working set larger than RAM, smaller than the full DFS). The Linux kernel people have put hundreds of man-years into scalability.
I don’t want to discourage anybody… we did something not smart and picked the hardest problem. It required us to know a lot of things about Ceph, the kernel, and concurrency. I would pick something simpler next time. There are bugs in other parts of the kernel? One of the reasons we wanted to do this work in the kernel was concurrency, so our benefit was also our PITA.
We got it into the Ceph code base around 3.13. A bunch of bug fixes came from external folks. We’ve exposed issues with the FSCache code. We’ve fixed a bunch of concurrency bugs that only happen in the error path of FSCache under VMA pressure. A lot of filesystems benefit. We’re really happy with performance… we made a good bet on the kernel. We’re able to get FSCache up to the speed of the disks we have.
So despite the initial learning curve… we want to contribute work where we can leverage our knowledge: performance. We’ve built a lot of things in our system for improving latency, and learned what to do, what not to do, and where to apply lockless algorithms. The readv2() syscall… helps all applications that do both I/O- and CPU-bound work.
Thanks for listening to me. Hopefully it was a good story of what we’re up to… how we’re leveraging Ceph. I hope it’s motivating to help and contribute. It’s nice to have a vendor you can call up and yell at when things are not working, but it’s even better to be able to guide the tool to do what you want. The Ceph community is great; there are so many people contributing to so many different projects.
Contact info
