Joblib is a Python module that provides tools for parallel computing such as caching results on disk, efficient persistence of large numpy arrays, and support for multiple parallel backends including threading, multiprocessing, distributed, and ipyparallel. Recent updates have improved Joblib's persistence functionality by allowing multiple arrays to be stored in a single file without memory copies, supporting additional compression formats like gzip and bz2, and enabling custom parallel backends. Future work may include persisting directly to file-like objects and remote storage as well as using Joblib in cloud computing environments.
3. Overview of Joblib
Embarrassingly Parallel computing helper
Efficient disk caching to avoid recomputation
Fast I/O persistence
No dependencies, optimized for numpy arrays
Joblib is the parallel backend used by Scikit-Learn
https://pythonhosted.org/joblib/
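The "embarrassingly parallel helper" mentioned above boils down to the Parallel/delayed pair. A minimal sketch (n_jobs=2 is an arbitrary choice):

```python
from joblib import Parallel, delayed

# Square each input in parallel; delayed() captures the call lazily
results = Parallel(n_jobs=2)(delayed(pow)(i, 2) for i in range(5))
print(results)  # [0, 1, 4, 9, 16]
```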
5. Caching on disk
Use a memoize pattern with the Memory object
>>> from joblib import Memory
>>> mem = Memory(cachedir='/tmp/joblib')
>>> import numpy as np
>>> a = np.vander(np.arange(3)).astype(np.float)
>>> square = mem.cache(np.square)
>>> b = square(a)
________________________________________________________________________________
[Memory] Calling square...
square(array([[0., 0., 1.],
       [1., 1., 1.],
       [4., 2., 1.]]))
___________________________________________________________square - 0...s, 0.0min
>>> c = square(a)  # no recomputation
Use md5 hash of input parameters
Results are persisted on disk
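Memory.cache also works as a decorator on your own functions. A minimal sketch (the cache directory path and the `total` function are made up for the example):

```python
import numpy as np
from joblib import Memory

# Hypothetical cache directory; any writable path works
mem = Memory('/tmp/joblib_demo_cache', verbose=0)

@mem.cache
def total(n):
    # Stand-in for an expensive computation
    return int(np.arange(n).sum())

first = total(1000)   # computed and written to the cache directory
second = total(1000)  # returned from the on-disk cache, no recomputation
print(first, second)  # 499500 499500
```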
6. Persistence
Converts an arbitrary object to/from a string of bytes
Persistence in Joblib is based on pickle and Pickler/Unpickler subclasses
>>> import numpy as np
>>> import joblib
>>> obj = [('a', [1, 2, 3]), ('b', np.arange(10))]
>>> joblib.dump(obj, '/tmp/test.pkl')
['/tmp/test.pkl', '/tmp/test.pkl_01.npy']
>>> joblib.load('/tmp/test.pkl')
[('a', [1, 2, 3]), ('b', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))]
Use compression for fast I/O
>>> joblib.dump(obj, '/tmp/test.pkl', compress=True, cache_size=0)
['/tmp/test.pkl', '/tmp/test.pkl_01.npy.z']
>>> joblib.load('/tmp/test.pkl')
Access numpy arrays with np.memmap for out-of-core computing or for sharing between multiple workers
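The memory-mapped access is exposed through the `mmap_mode` argument of joblib.load. A minimal sketch (the file path is made up for the example; this assumes an uncompressed dump):

```python
import numpy as np
import joblib

arr = np.arange(10, dtype=np.float64)
joblib.dump(arr, '/tmp/mmap_demo.pkl')  # uncompressed dump

# mmap_mode='r' maps the file read-only instead of loading it into
# memory, which suits out-of-core access and sharing between workers
mm = joblib.load('/tmp/mmap_demo.pkl', mmap_mode='r')
print(mm[3])  # 3.0
```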
9. Persistence refactoring
Until 0.9.4:
An object with multiple arrays is persisted in multiple files
Only zlib compression available
Memory copies with compression
In 0.10.0 (not released yet):
An object with multiple arrays goes in a single file
Support for all compression methods provided by the Python standard library
No memory copies with compression
https://github.com/joblib/joblib/pull/260
10. Persistence refactoring strategy
Write numpy array buffer interleaved in the pickle stream
⇒ all arrays in a single file
Caveat: Not compatible with pickle format
Dump/reconstruct the array by chunks of bytes using numpy functions
⇒ avoid memory copies
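The chunk-by-chunk idea can be illustrated outside of Joblib. This is a simplified sketch, not Joblib's actual implementation; `dump_chunked`, `load_chunked`, and the 64 KiB chunk size are made up for the example:

```python
import io
import numpy as np

CHUNK = 1 << 16  # 64 KiB chunks (illustrative size)

def dump_chunked(arr, f):
    # Write the raw array buffer chunk by chunk, so no full-size
    # intermediate bytes object is ever materialized
    view = memoryview(arr).cast('B')  # flat byte view, zero-copy
    for start in range(0, len(view), CHUNK):
        f.write(view[start:start + CHUNK])

def load_chunked(f, dtype, shape):
    # Read bytes back chunk by chunk straight into a preallocated array
    out = np.empty(shape, dtype=dtype)
    view = memoryview(out).cast('B')
    pos = 0
    while pos < len(view):
        data = f.read(min(CHUNK, len(view) - pos))
        view[pos:pos + len(data)] = data
        pos += len(data)
    return out

a = np.arange(100, dtype=np.float64).reshape(10, 10)
buf = io.BytesIO()
dump_chunked(a, buf)
buf.seek(0)
b = load_chunked(buf, a.dtype, a.shape)
print(np.array_equal(a, b))  # True
```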
13. Persistence in a single file
>>> import numpy as np
>>> import joblib
>>> obj = [np.ones((5000, 5000)), np.random.random((5000, 5000))]
# only 1 file is generated:
>>> joblib.dump(obj, '/tmp/test.pkl', compress=True)
['/tmp/test.pkl']
>>> joblib.load('/tmp/test.pkl')
[array([[1., 1., ..., 1., 1.]],
 array([[0.47006195, 0.5436392, ..., 0.1218267, 0.48592789]])]
useful with scikit-learn estimators
simpler management of backup files
robust when using memory map on distributed file systems
15. Compression formats
New supported compression formats
⇒ gzip, bz2, lzma and xz
Automatic compression based on file extension
Automatic detection of compression format when loading
Valid compression file formats
Slower than zlib
Example with gzip compression:
>>> joblib.dump(obj, '/tmp/test.pkl.gz', compress=('gzip', 3))
>>> joblib.load('/tmp/test.pkl.gz')
[array([[1., 1., ..., 1., 1.]],
 array([[0.47006195, 0.5436392, ..., 0.1218267, 0.48592789]])]
>>> # or with file extension detection
>>> joblib.dump(obj, '/tmp/test.pkl.gz')
16. Performance comparison: Memory footprint and Speed
Joblib persists faster and with low extra memory consumption
Performance is data dependent
http://gael-varoquaux.info/programming/new_low-overhead_persistence_in_joblib_for_big_data.html
19. Custom parallel backends
Until 0.9.4:
Only threading and multiprocessing backends
Not extensible
Active backend cannot be changed easily at a high level
In 0.10.0:
Common API using the ParallelBackendBase interface
Use with to set the active backend in a context manager
New backends for:
distributed, implemented by Matthew Rocklin
ipyparallel, implemented by Min RK
YARN, implemented by Niels Zielemaker
https://github.com/joblib/joblib/pull/306, contributed by Niels Zielemaker
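The with-based backend switching can be exercised with the built-in threading backend. A minimal sketch (n_jobs=2 is an arbitrary choice):

```python
from joblib import Parallel, delayed, parallel_backend

# Switch the active backend at a high level, without changing
# the code that builds the Parallel call itself
with parallel_backend('threading', n_jobs=2):
    results = Parallel()(delayed(pow)(i, 2) for i in range(4))
print(results)  # [0, 1, 4, 9]
```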
21. IPython parallel backend
Integration for Joblib available in version 5.1
1. Launch a cluster with 5 engines:
$ ipcontroller &
$ ipcluster engines -n 5
2. Run the following script:
import time
import ipyparallel as ipp
from ipyparallel.joblib import register as register_joblib
from joblib import parallel_backend, Parallel, delayed

# Register ipyparallel backend
register_joblib()

# Start the job
with parallel_backend("ipyparallel"):
    Parallel(n_jobs=20, verbose=50)(delayed(time.sleep)(1) for i in range(10))
25. Persistence in file objects
1. With regular file object:
>>> with open('/tmp/test.pkl', 'wb') as fo:
...     joblib.dump(obj, fo)
>>> with open('/tmp/test.pkl', 'rb') as fo:
...     joblib.load(fo)
Also works with gzip.GzipFile, bz2.BZ2File and lzma.LZMAFile
https://github.com/joblib/joblib/pull/351
2. Or with any file-like object, e.g. one exposing read/write functions:
>>> with RemoteStoreObject(hostname, port, obj_id) as rso:
...     joblib.dump(obj, rso)
>>> with RemoteStoreObject(hostname, port, obj_id) as rso:
...     joblib.load(rso)
⇒ Example: blob databases, remote storage
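An in-memory buffer is the simplest such file-like object and makes the pattern easy to try. A minimal sketch, with io.BytesIO standing in for a remote store:

```python
import io
import numpy as np
import joblib

obj = {'weights': np.arange(5)}

# Anything with read/write methods can be a dump/load target;
# here an in-memory buffer plays the role of remote storage
buf = io.BytesIO()
joblib.dump(obj, buf)

buf.seek(0)
restored = joblib.load(buf)
print(restored['weights'])  # [0 1 2 3 4]
```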
26. Joblib in the cloud
Share persistence files
Use computing resources
27. Conclusion
Persistence of numpy arrays in a single file...
... and soon in file objects
New compression formats: gzip, bz2, xz, lzma
... extend to faster implementations (blosc)?
Persists without memory copies
... manipulate bigger data
New parallel backends: distributed, ipyparallel, YARN...
... new ones to come?
Thanks !