SlideShare a Scribd company logo
1 of 20
GlusterFS Translators
Conceptual Overview
Jeff Darcy
August 28, 2012
We Should Have Called It DIYFS
● GlusterFS is a not a file system
● It's a way to build new file systems
● We happen to have built a fairly nice one
● distribution, replication, NFS/Swift/Hadoop, . . .
● come see that presentation tomorrow
● Don't like it? Build your own!
Translating “translator”
● A translator converts requests from users into requests
for storage
● one to one, one to many, one to zero (e.g. caching)
● A translator can modify requests on the way through
● convert one request type into another
● modify paths, flags, even data (e.g. encryption)
● ...intercept or block them (e.g. access control)
● ...or spawn new requests (e.g. pre-fetch)
Example: Request Routing in DHT
User (via FUSE)
DHT
Brick 1 Brick 2
Request goes
this way (not used)
Example: Request Fan Out in AFR
User (via FUSE)
AFR
Brick 1 Brick 2
lock+setxattr
write
setxattr+unlock
lock+setxattr
write
setxattr+unlock
user write
Example: Read Ahead
User (via FUSE)
read-ahead
Brick 1
Prefetch read
(async)
Original
read
First
user read
User (via FUSE)
read-ahead
Brick 1
Second
user read
(not used)
Reply
from cache
Why Build Your Own?
● GlusterFS represents a particular set of design choices
● e.g. data safety is first priority
● . . . then consistency is second . . .
● . . . finally performance
● Those choices aren't right for everyone
● Tuning only gets you so far
● We can never cover all of the use cases
● this is where HekaFS came from
Tradeoff Example: Slow Replication
● Principle: data safety before performance
● We do extra operations to make sure data survives a
crash
● That means more network round trips
● Optimizations work well for buffered sequential writes
● not so much for small/random/synchronous writes
● Lesson: AFR (today) might not be right for some
workloads (e.g. virtual-machine images)
● . . . so I wrote bypass, hsrepl
Tradeoff Example: Slow Directory Listings
● Principle: consistency before performance
● We assume other clients might have added, changed,
or deleted files
● We do a new lookup/stat/getxattr each time
● This especially hurts us e.g. with PHP scripts, git
service
● Lesson: tune cache/prefetch translators, use
autoloaders/APC
● . . . or try xattr-prefetch, negative-lookup
Benefit of DIY
● Let's see how negative-lookup helps “PHP” workload
● 1000 files spread across 10 directories
● power-law distribution: 80% of hits to 10% of files
● Measure time to find each file
[root@gfs-i8c-04 phpsim]# ./worker.py
average latency = 0.690ms
[root@gfs-i8c-04 phpsim]# ./worker.py
average latency = 2.478ms
GlusterFS 3.3
negative-lookup
over 3x as fast
How Do Translators Work?
● Shared objects
● Dynamically loaded according to “volfile”
● dlopen/dlsym
● set up pointers to parents/children
● call init (constructor)
● call I/O functions through fops
● Conventions for validating/passing options etc.
● “Translator 101” series at hekafs.org
Asynchronous Programming Model
1267 STACK_WIND (frame, dht_unlink_cbk,
1268 cached_subvol, cached_subvol->fops->unlink,
1269 &local->loc);
1270
1271 return 0;
callback function
next translator specific function
DANGER ZONE
Danger Zone?
● You lost control when you called STACK_WIND
● callback might have already happened reentrantly
● . . . or it might be running on another thread right now
● . . . or it might not run for a long time
● Data might be modified, freed, still in use
● Be extremely careful doing anything but return after
STACK_WIND
● (please clean up local allocations/references though)
Saving Context
● Pass translator-specific information between original
function and call back
● Framework provides frame->local for exactly this
● pointer to whatever structure you want
● local to call, not translator (that's this->private)
● you allocate from mem_pool, we free when call is done
● Gotcha: frame will be shared between STACK_WIND
callbacks
Fan Out
406 xlator_list_t *trav = NULL;
…
419 trav = this->children;
...
440 local->call_count = priv->child_count;
441 while (trav) {
442 STACK_WIND (frame, stripe_lookup_cbk, trav->xlator,
443 trav->xlator->fops->lookup,
444 loc, xattr_req);
445 trav = trav->next;
446 }
local is shared
child-iteration idiom
Fan In
314 LOCK (&frame->lock);
315 {
316 callcnt = --local->call_count;
…
374 }
375 UNLOCK (&frame->lock);
376
377 if (!callcnt) {
lock shared structure
how we know
we're done
Deferring Calls
2175 stub = fop_writev_stub (frame, NULL, fd, vector, count, offset, flags,
2176 iobref, xdata);
....
1843 call_resume (stub);
● There's an fop_xxx_stub for each operation type
● . . . and for each callback too
● You can also call_stub_destroy instead of resuming
capture arguments
in a structure
from callback
or worker thread
Initiating New Calls
477 newframe = create_frame(this,&priv->pool);
...
487 STACK_WIND_COOKIE (newframe, hsrepl_np_cbk, child1,
488 child1, child1->fops->setxattr,
489 &tmploc, dict, 0, NULL);
...
120 STACK_DESTROY(frame->root);
actually creates
a whole stack
sometimes
copy_frame
in callback
instead of STACK_UNWIND
Persistent objects
● Inode (inode_t) represents a file on disk
● File descriptor (fd_t) represents an open file
● Reference counted - inode_ref, fd_unref
● Translators can add own context
● e.g. inode_ctx_put (inode, xlator, value)
● values are 64-bit unsigned (or pointers)
● adding context causes translator's forget/release to be
called when object is deleted
Utility Functions
● Dictionaries
● Memory management with accounting: GF_MALLOC,
GF_CALLOC, GF_FREE
● Logging: gf_log, gf_print_trace
● UUIDs, hashes, red/black trees, name resolution
● all sorts of other stuff

More Related Content

What's hot

Gsummit apis-2013
Gsummit apis-2013Gsummit apis-2013
Gsummit apis-2013Gluster.org
 
On demand file-caching_-_gustavo_brand
On demand file-caching_-_gustavo_brandOn demand file-caching_-_gustavo_brand
On demand file-caching_-_gustavo_brandGluster.org
 
Integration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpanaIntegration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpanaGluster.org
 
Developing apps and_integrating_with_gluster_fs_-_libgfapi
Developing apps and_integrating_with_gluster_fs_-_libgfapiDeveloping apps and_integrating_with_gluster_fs_-_libgfapi
Developing apps and_integrating_with_gluster_fs_-_libgfapiGluster.org
 
Lcna 2012-tutorial
Lcna 2012-tutorialLcna 2012-tutorial
Lcna 2012-tutorialGluster.org
 
Accessing gluster ufo_-_eco_willson
Accessing gluster ufo_-_eco_willsonAccessing gluster ufo_-_eco_willson
Accessing gluster ufo_-_eco_willsonGluster.org
 
Sharding: Past, Present and Future with Krutika Dhananjay
Sharding: Past, Present and Future with Krutika DhananjaySharding: Past, Present and Future with Krutika Dhananjay
Sharding: Past, Present and Future with Krutika DhananjayGluster.org
 
Lisa 2015-gluster fs-introduction
Lisa 2015-gluster fs-introductionLisa 2015-gluster fs-introduction
Lisa 2015-gluster fs-introductionGluster.org
 
Dustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep DiveDustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep DiveGluster.org
 
State of Gluster Performance
State of Gluster PerformanceState of Gluster Performance
State of Gluster PerformanceGluster.org
 
Red Hat Gluster Storage : GlusterFS
Red Hat Gluster Storage : GlusterFSRed Hat Gluster Storage : GlusterFS
Red Hat Gluster Storage : GlusterFSbipin kunal
 
Gluster fs current_features_and_roadmap
Gluster fs current_features_and_roadmapGluster fs current_features_and_roadmap
Gluster fs current_features_and_roadmapGluster.org
 
State of the_gluster_-_lceu
State of the_gluster_-_lceuState of the_gluster_-_lceu
State of the_gluster_-_lceuGluster.org
 
Gluster technical overview
Gluster technical overviewGluster technical overview
Gluster technical overviewGluster.org
 
Life as a GlusterFS Consultant with Ivan Rossi
Life as a GlusterFS Consultant with Ivan RossiLife as a GlusterFS Consultant with Ivan Rossi
Life as a GlusterFS Consultant with Ivan RossiGluster.org
 
20160130 Gluster-roadmap
20160130 Gluster-roadmap20160130 Gluster-roadmap
20160130 Gluster-roadmapGluster.org
 
Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016Gluster.org
 
Sdc 2012-challenges
Sdc 2012-challengesSdc 2012-challenges
Sdc 2012-challengesGluster.org
 

What's hot (20)

Gsummit apis-2013
Gsummit apis-2013Gsummit apis-2013
Gsummit apis-2013
 
On demand file-caching_-_gustavo_brand
On demand file-caching_-_gustavo_brandOn demand file-caching_-_gustavo_brand
On demand file-caching_-_gustavo_brand
 
Integration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpanaIntegration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpana
 
Gluster d2
Gluster d2Gluster d2
Gluster d2
 
Developing apps and_integrating_with_gluster_fs_-_libgfapi
Developing apps and_integrating_with_gluster_fs_-_libgfapiDeveloping apps and_integrating_with_gluster_fs_-_libgfapi
Developing apps and_integrating_with_gluster_fs_-_libgfapi
 
Lcna 2012-tutorial
Lcna 2012-tutorialLcna 2012-tutorial
Lcna 2012-tutorial
 
Accessing gluster ufo_-_eco_willson
Accessing gluster ufo_-_eco_willsonAccessing gluster ufo_-_eco_willson
Accessing gluster ufo_-_eco_willson
 
Sharding: Past, Present and Future with Krutika Dhananjay
Sharding: Past, Present and Future with Krutika DhananjaySharding: Past, Present and Future with Krutika Dhananjay
Sharding: Past, Present and Future with Krutika Dhananjay
 
Lisa 2015-gluster fs-introduction
Lisa 2015-gluster fs-introductionLisa 2015-gluster fs-introduction
Lisa 2015-gluster fs-introduction
 
YDAL Barcelona
YDAL BarcelonaYDAL Barcelona
YDAL Barcelona
 
Dustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep DiveDustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep Dive
 
State of Gluster Performance
State of Gluster PerformanceState of Gluster Performance
State of Gluster Performance
 
Red Hat Gluster Storage : GlusterFS
Red Hat Gluster Storage : GlusterFSRed Hat Gluster Storage : GlusterFS
Red Hat Gluster Storage : GlusterFS
 
Gluster fs current_features_and_roadmap
Gluster fs current_features_and_roadmapGluster fs current_features_and_roadmap
Gluster fs current_features_and_roadmap
 
State of the_gluster_-_lceu
State of the_gluster_-_lceuState of the_gluster_-_lceu
State of the_gluster_-_lceu
 
Gluster technical overview
Gluster technical overviewGluster technical overview
Gluster technical overview
 
Life as a GlusterFS Consultant with Ivan Rossi
Life as a GlusterFS Consultant with Ivan RossiLife as a GlusterFS Consultant with Ivan Rossi
Life as a GlusterFS Consultant with Ivan Rossi
 
20160130 Gluster-roadmap
20160130 Gluster-roadmap20160130 Gluster-roadmap
20160130 Gluster-roadmap
 
Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016
 
Sdc 2012-challenges
Sdc 2012-challengesSdc 2012-challenges
Sdc 2012-challenges
 

Viewers also liked

Gluster for sysadmins
Gluster for sysadminsGluster for sysadmins
Gluster for sysadminsGluster.org
 
Hands On Gluster with Jeff Darcy
Hands On Gluster with Jeff DarcyHands On Gluster with Jeff Darcy
Hands On Gluster with Jeff DarcyGluster.org
 
Responsibilities of gluster_maintainers
Responsibilities of gluster_maintainersResponsibilities of gluster_maintainers
Responsibilities of gluster_maintainersGluster.org
 
Gluster wireshark niels_de_vos
Gluster wireshark niels_de_vosGluster wireshark niels_de_vos
Gluster wireshark niels_de_vosGluster.org
 
Software defined storage
Software defined storageSoftware defined storage
Software defined storageGluster.org
 
Gluster Webinar: Introduction to GlusterFS v3.3
Gluster Webinar: Introduction to GlusterFS v3.3Gluster Webinar: Introduction to GlusterFS v3.3
Gluster Webinar: Introduction to GlusterFS v3.3GlusterFS
 
Gluster as Block Store in Containers
Gluster as Block Store in ContainersGluster as Block Store in Containers
Gluster as Block Store in ContainersGluster.org
 
Guardian Open Platform Launch Event
Guardian Open Platform Launch EventGuardian Open Platform Launch Event
Guardian Open Platform Launch EventMatt McAlister
 
20160401 Gluster-roadmap
20160401 Gluster-roadmap20160401 Gluster-roadmap
20160401 Gluster-roadmapGluster.org
 
Leases and-caching final
Leases and-caching finalLeases and-caching final
Leases and-caching finalGluster.org
 
Red Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFSRed Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFSGlusterFS
 
Bug triage in_gluster
Bug triage in_glusterBug triage in_gluster
Bug triage in_glusterGluster.org
 
Join the super_colony_-_feb2013
Join the super_colony_-_feb2013Join the super_colony_-_feb2013
Join the super_colony_-_feb2013Gluster.org
 
Debugging with-wireshark-niels-de-vos
Debugging with-wireshark-niels-de-vosDebugging with-wireshark-niels-de-vos
Debugging with-wireshark-niels-de-vosGluster.org
 
Integrating gluster fs,_qemu_and_ovirt-vijay_bellur-linuxcon_eu_2013
Integrating gluster fs,_qemu_and_ovirt-vijay_bellur-linuxcon_eu_2013Integrating gluster fs,_qemu_and_ovirt-vijay_bellur-linuxcon_eu_2013
Integrating gluster fs,_qemu_and_ovirt-vijay_bellur-linuxcon_eu_2013Gluster.org
 

Viewers also liked (17)

Qemu gluster fs
Qemu gluster fsQemu gluster fs
Qemu gluster fs
 
Gluster for sysadmins
Gluster for sysadminsGluster for sysadmins
Gluster for sysadmins
 
Hands On Gluster with Jeff Darcy
Hands On Gluster with Jeff DarcyHands On Gluster with Jeff Darcy
Hands On Gluster with Jeff Darcy
 
Responsibilities of gluster_maintainers
Responsibilities of gluster_maintainersResponsibilities of gluster_maintainers
Responsibilities of gluster_maintainers
 
Gluster wireshark niels_de_vos
Gluster wireshark niels_de_vosGluster wireshark niels_de_vos
Gluster wireshark niels_de_vos
 
Software defined storage
Software defined storageSoftware defined storage
Software defined storage
 
Gluster Webinar: Introduction to GlusterFS v3.3
Gluster Webinar: Introduction to GlusterFS v3.3Gluster Webinar: Introduction to GlusterFS v3.3
Gluster Webinar: Introduction to GlusterFS v3.3
 
Gluster as Block Store in Containers
Gluster as Block Store in ContainersGluster as Block Store in Containers
Gluster as Block Store in Containers
 
Guardian Open Platform Launch Event
Guardian Open Platform Launch EventGuardian Open Platform Launch Event
Guardian Open Platform Launch Event
 
20160401 Gluster-roadmap
20160401 Gluster-roadmap20160401 Gluster-roadmap
20160401 Gluster-roadmap
 
Leases and-caching final
Leases and-caching finalLeases and-caching final
Leases and-caching final
 
Red Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFSRed Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFS
 
Bug triage in_gluster
Bug triage in_glusterBug triage in_gluster
Bug triage in_gluster
 
Join the super_colony_-_feb2013
Join the super_colony_-_feb2013Join the super_colony_-_feb2013
Join the super_colony_-_feb2013
 
Gdeploy 2.0
Gdeploy 2.0Gdeploy 2.0
Gdeploy 2.0
 
Debugging with-wireshark-niels-de-vos
Debugging with-wireshark-niels-de-vosDebugging with-wireshark-niels-de-vos
Debugging with-wireshark-niels-de-vos
 
Integrating gluster fs,_qemu_and_ovirt-vijay_bellur-linuxcon_eu_2013
Integrating gluster fs,_qemu_and_ovirt-vijay_bellur-linuxcon_eu_2013Integrating gluster fs,_qemu_and_ovirt-vijay_bellur-linuxcon_eu_2013
Integrating gluster fs,_qemu_and_ovirt-vijay_bellur-linuxcon_eu_2013
 

Similar to Lcna tutorial-2012

Dragoncraft Architectural Overview
Dragoncraft Architectural OverviewDragoncraft Architectural Overview
Dragoncraft Architectural Overviewjessesanford
 
Php 5.6 From the Inside Out
Php 5.6 From the Inside OutPhp 5.6 From the Inside Out
Php 5.6 From the Inside OutFerenc Kovács
 
Clug 2012 March web server optimisation
Clug 2012 March   web server optimisationClug 2012 March   web server optimisation
Clug 2012 March web server optimisationgrooverdan
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia DatabasesJaime Crespo
 
Journey through high performance django application
Journey through high performance django applicationJourney through high performance django application
Journey through high performance django applicationbangaloredjangousergroup
 
Hadoop 3.3.0 S3A community call 202004
Hadoop 3.3.0 S3A community call 202004Hadoop 3.3.0 S3A community call 202004
Hadoop 3.3.0 S3A community call 202004Gabor Bota
 
Webinar: Avoiding Sub-optimal Performance in your Retail Application
Webinar: Avoiding Sub-optimal Performance in your Retail ApplicationWebinar: Avoiding Sub-optimal Performance in your Retail Application
Webinar: Avoiding Sub-optimal Performance in your Retail ApplicationMongoDB
 
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...Chris Fregly
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaPrajal Kulkarni
 
StormCrawler at Bristech
StormCrawler at BristechStormCrawler at Bristech
StormCrawler at BristechJulien Nioche
 
Load testing in Zonky with Gatling
Load testing in Zonky with GatlingLoad testing in Zonky with Gatling
Load testing in Zonky with GatlingPetr Vlček
 
Less and faster – Cache tips for WordPress developers
Less and faster – Cache tips for WordPress developersLess and faster – Cache tips for WordPress developers
Less and faster – Cache tips for WordPress developersSeravo
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Michael Renner
 
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Databricks
 
Gsummit apis-2012
Gsummit apis-2012Gsummit apis-2012
Gsummit apis-2012Gluster.org
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaSpark Summit
 

Similar to Lcna tutorial-2012 (20)

Dragoncraft Architectural Overview
Dragoncraft Architectural OverviewDragoncraft Architectural Overview
Dragoncraft Architectural Overview
 
Php 5.6 From the Inside Out
Php 5.6 From the Inside OutPhp 5.6 From the Inside Out
Php 5.6 From the Inside Out
 
Clug 2012 March web server optimisation
Clug 2012 March   web server optimisationClug 2012 March   web server optimisation
Clug 2012 March web server optimisation
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 
Php
PhpPhp
Php
 
Journey through high performance django application
Journey through high performance django applicationJourney through high performance django application
Journey through high performance django application
 
Hadoop 3.3.0 S3A community call 202004
Hadoop 3.3.0 S3A community call 202004Hadoop 3.3.0 S3A community call 202004
Hadoop 3.3.0 S3A community call 202004
 
Webinar: Avoiding Sub-optimal Performance in your Retail Application
Webinar: Avoiding Sub-optimal Performance in your Retail ApplicationWebinar: Avoiding Sub-optimal Performance in your Retail Application
Webinar: Avoiding Sub-optimal Performance in your Retail Application
 
Php
PhpPhp
Php
 
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and Kibana
 
StormCrawler at Bristech
StormCrawler at BristechStormCrawler at Bristech
StormCrawler at Bristech
 
Load testing in Zonky with Gatling
Load testing in Zonky with GatlingLoad testing in Zonky with Gatling
Load testing in Zonky with Gatling
 
Less and faster – Cache tips for WordPress developers
Less and faster – Cache tips for WordPress developersLess and faster – Cache tips for WordPress developers
Less and faster – Cache tips for WordPress developers
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014
 
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
 
Hadoop Vectored IO
Hadoop Vectored IOHadoop Vectored IO
Hadoop Vectored IO
 
Gsummit apis-2012
Gsummit apis-2012Gsummit apis-2012
Gsummit apis-2012
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
 

More from Gluster.org

Automating Gluster @ Facebook - Shreyas Siravara
Automating Gluster @ Facebook - Shreyas SiravaraAutomating Gluster @ Facebook - Shreyas Siravara
Automating Gluster @ Facebook - Shreyas SiravaraGluster.org
 
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravaranfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas SiravaraGluster.org
 
Facebook’s upstream approach to GlusterFS - David Hasson
Facebook’s upstream approach to GlusterFS  - David HassonFacebook’s upstream approach to GlusterFS  - David Hasson
Facebook’s upstream approach to GlusterFS - David HassonGluster.org
 
Throttling Traffic at Facebook Scale
Throttling Traffic at Facebook ScaleThrottling Traffic at Facebook Scale
Throttling Traffic at Facebook ScaleGluster.org
 
GlusterFS w/ Tiered XFS
GlusterFS w/ Tiered XFS  GlusterFS w/ Tiered XFS
GlusterFS w/ Tiered XFS Gluster.org
 
Gluster Metrics: why they are crucial for running stable deployments of all s...
Gluster Metrics: why they are crucial for running stable deployments of all s...Gluster Metrics: why they are crucial for running stable deployments of all s...
Gluster Metrics: why they are crucial for running stable deployments of all s...Gluster.org
 
Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)
Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)
Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)Gluster.org
 
Data Reduction for Gluster with VDO
Data Reduction for Gluster with VDOData Reduction for Gluster with VDO
Data Reduction for Gluster with VDOGluster.org
 
Releases: What are contributors responsible for
Releases: What are contributors responsible forReleases: What are contributors responsible for
Releases: What are contributors responsible forGluster.org
 
RIO Distribution: Reconstructing the onion - Shyamsundar Ranganathan
RIO Distribution: Reconstructing the onion - Shyamsundar RanganathanRIO Distribution: Reconstructing the onion - Shyamsundar Ranganathan
RIO Distribution: Reconstructing the onion - Shyamsundar RanganathanGluster.org
 
Gluster and Kubernetes
Gluster and KubernetesGluster and Kubernetes
Gluster and KubernetesGluster.org
 
Native Clients, more the merrier with GFProxy!
Native Clients, more the merrier with GFProxy!Native Clients, more the merrier with GFProxy!
Native Clients, more the merrier with GFProxy!Gluster.org
 
Gluster: a SWOT Analysis
Gluster: a SWOT Analysis Gluster: a SWOT Analysis
Gluster: a SWOT Analysis Gluster.org
 
GlusterD-2.0: What's Happening? - Kaushal Madappa
GlusterD-2.0: What's Happening? - Kaushal MadappaGlusterD-2.0: What's Happening? - Kaushal Madappa
GlusterD-2.0: What's Happening? - Kaushal MadappaGluster.org
 
Scalability and Performance of CNS 3.6
Scalability and Performance of CNS 3.6Scalability and Performance of CNS 3.6
Scalability and Performance of CNS 3.6Gluster.org
 
What Makes Us Fail
What Makes Us FailWhat Makes Us Fail
What Makes Us FailGluster.org
 
Gluster as Native Storage for Containers - past, present and future
Gluster as Native Storage for Containers - past, present and futureGluster as Native Storage for Containers - past, present and future
Gluster as Native Storage for Containers - past, present and futureGluster.org
 
Heketi Functionality into Glusterd2
Heketi Functionality into Glusterd2Heketi Functionality into Glusterd2
Heketi Functionality into Glusterd2Gluster.org
 
Architecture of the High Availability Solution for Ganesha and Samba with Kal...
Architecture of the High Availability Solution for Ganesha and Samba with Kal...Architecture of the High Availability Solution for Ganesha and Samba with Kal...
Architecture of the High Availability Solution for Ganesha and Samba with Kal...Gluster.org
 
Gluster Containerized Storage for Cloud Applications
Gluster Containerized Storage for Cloud ApplicationsGluster Containerized Storage for Cloud Applications
Gluster Containerized Storage for Cloud ApplicationsGluster.org
 

More from Gluster.org (20)

Automating Gluster @ Facebook - Shreyas Siravara
Automating Gluster @ Facebook - Shreyas SiravaraAutomating Gluster @ Facebook - Shreyas Siravara
Automating Gluster @ Facebook - Shreyas Siravara
 
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravaranfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
 
Facebook’s upstream approach to GlusterFS - David Hasson
Facebook’s upstream approach to GlusterFS  - David HassonFacebook’s upstream approach to GlusterFS  - David Hasson
Facebook’s upstream approach to GlusterFS - David Hasson
 
Throttling Traffic at Facebook Scale
Throttling Traffic at Facebook ScaleThrottling Traffic at Facebook Scale
Throttling Traffic at Facebook Scale
 
GlusterFS w/ Tiered XFS
GlusterFS w/ Tiered XFS  GlusterFS w/ Tiered XFS
GlusterFS w/ Tiered XFS
 
Gluster Metrics: why they are crucial for running stable deployments of all s...
Gluster Metrics: why they are crucial for running stable deployments of all s...Gluster Metrics: why they are crucial for running stable deployments of all s...
Gluster Metrics: why they are crucial for running stable deployments of all s...
 
Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)
Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)
Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)
 
Data Reduction for Gluster with VDO
Data Reduction for Gluster with VDOData Reduction for Gluster with VDO
Data Reduction for Gluster with VDO
 
Releases: What are contributors responsible for
Releases: What are contributors responsible forReleases: What are contributors responsible for
Releases: What are contributors responsible for
 
RIO Distribution: Reconstructing the onion - Shyamsundar Ranganathan
RIO Distribution: Reconstructing the onion - Shyamsundar RanganathanRIO Distribution: Reconstructing the onion - Shyamsundar Ranganathan
RIO Distribution: Reconstructing the onion - Shyamsundar Ranganathan
 
Gluster and Kubernetes
Gluster and KubernetesGluster and Kubernetes
Gluster and Kubernetes
 
Native Clients, more the merrier with GFProxy!
Native Clients, more the merrier with GFProxy!Native Clients, more the merrier with GFProxy!
Native Clients, more the merrier with GFProxy!
 
Gluster: a SWOT Analysis
Gluster: a SWOT Analysis Gluster: a SWOT Analysis
Gluster: a SWOT Analysis
 
GlusterD-2.0: What's Happening? - Kaushal Madappa
GlusterD-2.0: What's Happening? - Kaushal MadappaGlusterD-2.0: What's Happening? - Kaushal Madappa
GlusterD-2.0: What's Happening? - Kaushal Madappa
 
Scalability and Performance of CNS 3.6
Scalability and Performance of CNS 3.6Scalability and Performance of CNS 3.6
Scalability and Performance of CNS 3.6
 
What Makes Us Fail
What Makes Us FailWhat Makes Us Fail
What Makes Us Fail
 
Gluster as Native Storage for Containers - past, present and future
Gluster as Native Storage for Containers - past, present and futureGluster as Native Storage for Containers - past, present and future
Gluster as Native Storage for Containers - past, present and future
 
Heketi Functionality into Glusterd2
Heketi Functionality into Glusterd2Heketi Functionality into Glusterd2
Heketi Functionality into Glusterd2
 
Architecture of the High Availability Solution for Ganesha and Samba with Kal...
Architecture of the High Availability Solution for Ganesha and Samba with Kal...Architecture of the High Availability Solution for Ganesha and Samba with Kal...
Architecture of the High Availability Solution for Ganesha and Samba with Kal...
 
Gluster Containerized Storage for Cloud Applications
Gluster Containerized Storage for Cloud ApplicationsGluster Containerized Storage for Cloud Applications
Gluster Containerized Storage for Cloud Applications
 

Recently uploaded

Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Recently uploaded (20)

Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Lcna tutorial-2012

  • 2. We Should Have Called It DIYFS ● GlusterFS is a not a file system ● It's a way to build new file systems ● We happen to have built a fairly nice one ● distribution, replication, NFS/Swift/Hadoop, . . . ● come see that presentation tomorrow ● Don't like it? Build your own!
  • 3. Translating “translator” ● A translator converts requests from users into requests for storage ● one to one, one to many, one to zero (e.g. caching) ● A translator can modify requests on the way through ● convert one request type into another ● modify paths, flags, even data (e.g. encryption) ● ...intercept or block them (e.g. access control) ● ...or spawn new requests (e.g. pre-fetch)
  • 4. Example: Request Routing in DHT User (via FUSE) DHT Brick 1 Brick 2 Request goes this way (not used)
  • 5. Example: Request Fan Out in AFR User (via FUSE) AFR Brick 1 Brick 2 lock+setxattr write setxattr+unlock lock+setxattr write setxattr+unlock user write
  • 6. Example: Read Ahead User (via FUSE) read-ahead Brick 1 Prefetch read (async) Original read First user read User (via FUSE) read-ahead Brick 1 Second user read (not used) Reply from cache
  • 7. Why Build Your Own? ● GlusterFS represents a particular set of design choices ● e.g. data safety is first priority ● . . . then consistency is second . . . ● . . . finally performance ● Those choices aren't right for everyone ● Tuning only gets you so far ● We can never cover all of the use cases ● this is where HekaFS came from
  • 8. Tradeoff Example: Slow Replication ● Principle: data safety before performance ● We do extra operations to make sure data survives a crash ● That means more network round trips ● Optimizations work well for buffered sequential writes ● not so much for small/random/synchronous writes ● Lesson: AFR (today) might not be right for some workloads (e.g. virtual-machine images) ● . . . so I wrote bypass, hsrepl
  • 9. Tradeoff Example: Slow Directory Listings ● Principle: consistency before performance ● We assume other clients might have added, changed, or deleted files ● We do a new lookup/stat/getxattr each time ● This especially hurts us e.g. with PHP scripts, git service ● Lesson: tune cache/prefetch translators, use autoloaders/APC ● . . . or try xattr-prefetch, negative-lookup
  • 10. Benefit of DIY ● Let's see how negative-lookup helps “PHP” workload ● 1000 files spread across 10 directories ● power-law distribution: 80% of hits to 10% of files ● Measure time to find each file [root@gfs-i8c-04 phpsim]# ./worker.py average latency = 0.690ms [root@gfs-i8c-04 phpsim]# ./worker.py average latency = 2.478ms GlusterFS 3.3 negative-lookup over 3x as fast
  • 11. How Do Translators Work? ● Shared objects ● Dynamically loaded according to “volfile” ● dlopen/dlsym ● set up pointers to parents/children ● call init (constructor) ● call I/O functions through fops ● Conventions for validating/passing options etc. ● “Translator 101” series at hekafs.org
  • 12. Asynchronous Programming Model 1267 STACK_WIND (frame, dht_unlink_cbk, 1268 cached_subvol, cached_subvol->fops->unlink, 1269 &local->loc); 1270 1271 return 0; callback function next translator specific function DANGER ZONE
  • 13. Danger Zone? ● You lost control when you called STACK_WIND ● callback might have already happened reentrantly ● . . . or it might be running on another thread right now ● . . . or it might not run for a long time ● Data might be modified, freed, still in use ● Be extremely careful doing anything but return after STACK_WIND ● (please clean up local allocations/references though)
  • 14. Saving Context ● Pass translator-specific information between original function and call back ● Framework provides frame->local for exactly this ● pointer to whatever structure you want ● local to call, not translator (that's this->private) ● you allocate from mem_pool, we free when call is done ● Gotcha: frame will be shared between STACK_WIND callbacks
  • 15. Fan Out 406 xlator_list_t *trav = NULL; … 419 trav = this->children; ... 440 local->call_count = priv->child_count; 441 while (trav) { 442 STACK_WIND (frame, stripe_lookup_cbk, trav->xlator, 443 trav->xlator->fops->lookup, 444 loc, xattr_req); 445 trav = trav->next; 446 } local is shared child-iteration idiom
  • 16. Fan In 314 LOCK (&frame->lock); 315 { 316 callcnt = --local->call_count; … 374 } 375 UNLOCK (&frame->lock); 376 377 if (!callcnt) { lock shared structure how we know we're done
  • 17. Deferring Calls 2175 stub = fop_writev_stub (frame, NULL, fd, vector, count, offset, flags, 2176 iobref, xdata); .... 1843 call_resume (stub); ● There's an fop_xxx_stub for each operation type ● . . . and for each callback too ● You can also call_stub_destroy instead of resuming capture arguments in a structure from callback or worker thread
  • 18. Initiating New Calls 477 newframe = create_frame(this,&priv->pool); ... 487 STACK_WIND_COOKIE (newframe, hsrepl_np_cbk, child1, 488 child1, child1->fops->setxattr, 489 &tmploc, dict, 0, NULL); ... 120 STACK_DESTROY(frame->root); actually creates a whole stack sometimes copy_frame in callback instead of STACK_UNWIND
  • 19. Persistent objects ● Inode (inode_t) represents a file on disk ● File descriptor (fd_t) represents an open file ● Reference counted - inode_ref, fd_unref ● Translators can add own context ● e.g. inode_ctx_put (inode, xlator, value) ● values are 64-bit unsigned (or pointers) ● adding context causes translator's forget/release to be called when object is deleted
  • 20. Utility Functions ● Dictionaries ● Memory management with accounting: GF_MALLOC, GF_CALLOC, GF_FREE ● Logging: gf_log, gf_print_trace ● UUIDs, hashes, red/black trees, name resolution ● all sorts of other stuff

Editor's Notes

  1. Note conversion of single user request into multiple lower-level requests (some of different types)
  2. NB: we could use atomics, but don't due to portability concerns