Slides for my talk at the London Perl Workshop in Nov 2013, featuring the Devel::SizeMe perl module.
See also the screencast at https://archive.org/details/Perl-Memory-Profiling-LPW2013
Slides for my talk at SkyCon'12 in Limerick.
Here I've squeezed four talks into one, covering a lot of ground quickly, so I've included links to more detailed presentations and other resources.
Slides for my Perl Memory Use talk at YAPC::Asia in Tokyo, September 2012.
(This uploaded version includes quite a few slides from the OSCON version that I skipped at YAPC::Asia in order to have more time for a demo.)
DBD::Gofer is the scalable stateless proxy driver for Perl DBI.
These are the slides for my lightning talk on DBD::Gofer given at the Italian Perl Workshop in 2008 (with a few extra slides added).
Application Logging in the 21st Century (2014) - Tim Bunce
Slides for my talk at the Austrian Perl Workshop in Salzburg on October 10th.
A video of the talk can be found at https://www.youtube.com/watch?v=4Qj-_eimGuE
Slides of my talk on Devel::NYTProf and optimizing perl code at YAPC::NA in June 2014. It covers use of NYTProf and outlines a multi-phase approach to optimizing your perl code.
A video of the talk and questions is available at https://www.youtube.com/watch?v=T7EK6RZAnEA&list=UU7y4qaRSb5w2O8cCHOsKZDw
Devel::NYTProf 2009-07 (OUTDATED, see 201008) - Tim Bunce
The slides of my "State-of-the-art Profiling with Devel::NYTProf" talk at OSCON in July 2009.
I'll upload a screencast and give the link in a blog post at http://blog.timbunce.org
This is a slightly updated draft of a talk I was planning on giving at Hadoop Summit in 2015. However the abstract was rejected. Rather than toss it, I'm going to share it with all of you on the (almost) 1 year anniversary of the first big commit of this feature!
Keep in mind that this is (currently) locked away in trunk. If you ever want this to see the light of day, bug your vendors....
How to Develop Puppet Modules: From Source to the Forge With Zero Clicks - Carlos Sanchez
Puppet Modules are a great way to reuse code, share your development with other people and take advantage of the hundreds of modules already available in the community. But how do you create, test and publish them as easily as possible? Now that infrastructure is defined as code, we need to use development best practices to build, test, deploy and use Puppet modules themselves. Three steps make for a fully automated process:
* Continuous Integration of Puppet Modules
* Automatic release and upload to the Puppet Forge
* Deploy to Puppet master
Terraform Immutablish Infrastructure with Consul-Template - Zane Williamson
Abstract: Terraform Immutablish Infrastructure with Consul-Template
• What is immutablish infrastructure? How is it different from Immutable?
• Present one of the ways we use Terraform and Consul-Template at Trulia
• What to consider when going down this route, because it is not a "silver bullet"
• Twitter: @zane_williamson, GitHub: @sepulworld
https://www.youtube.com/watch?v=Di3yZ08tsO8
Introductory Overview to Managing AWS with Terraform - Michael Heyns
From the AWS NZ Auckland Community Meetup - May 4th 2017
https://www.meetup.com/AWS_NZ/events/236169428/
We get a first look at Hashicorp's Terraform and how to use it for Infrastructure as Code with Amazon Web Services.
We'll also share how it fits in with the current CI/CD workflow of the Invenco cloud services team.
Sample code available at https://github.com/beanaroo/aws_nz_meetup-terraform_intro
Roll Your Own API Management Platform with nginx and Lua - Jon Moore
We recently replaced a proprietary API management solution with an in-house implementation built with nginx and Lua that let us get to a continuous delivery practice in a handful of months. Learn about our development process and the overall architecture that allowed us to write minimal amounts of code, enjoying native code performance while permitting interactive coding, and how we leveraged other open source tools like Vagrant, Ansible, and OpenStack to build an automation-rich delivery pipeline. We will also take an in-depth look at our capacity management approach, which differs from the rate limiting concept prevalent in the API community.
"Сравнение" инструментов анализа памяти в perl.
Текста мало, но я надеюсь целевая аудитория поймёт:)
Примеры кода использованные в презентации тут: https://github.com/kadavr/yapc-russia-2016
PCD – Process Control Daemon is a light-weight system level process manager for Embedded-Linux based projects (consumer electronics, network devices, etc.).
PCD starts, stops and monitors all the user space processes in the system, in a synchronized manner, using a textual configuration file.
PCD recovers the system in case of errors and provides useful and detailed debug information.
Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra... - peknap
Reducing memory usage is well covered in the history of this conference, yet new tricks still exist. When optimizing memory footprint for a home gateway device, the author found some unexpected places where small changes can save a valuable amount of DRAM or Flash space. This talk will visit several areas. Kernel: fragmentation threshold, the page frame reclamation task and atomic memory. Application level: memory-inefficient shared libraries due to ABI compliance and dynamic loading. Toolchain: tuning malloc allocator parameters and compiler options. System level: a general kernel might be more memory efficient than MMU-less uClinux, and preventing lock-up when the system is on the brink of running out of memory.
Workshop - Linux Memory Analysis with Volatility - Andrew Case
Slides from my 3-hour workshop at Blackhat Vegas 2011. Covers using Volatility to perform Linux memory analysis investigations, as well as Linux kernel internals.
There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe several techniques available in R to speed up workloads like these, by running multiple iterations simultaneously, in parallel.
Many of these techniques require the use of a cluster of machines running R, and I'll provide examples of using cloud-based services to provision clusters for parallel computations. In particular, I will describe how you can use the SparklyR package to distribute data manipulations using the dplyr syntax, on a cluster of servers provisioned in the Azure cloud.
Presented by David Smith at Data Day Texas in Austin, January 27 2018.
Quantifying the Performance of Garbage Collection vs. Explicit Memory Management - Emery Berger
This talk answers an age-old question: is garbage collection faster/slower/the same speed as malloc/free? We introduce oracular memory management, an approach that lets us measure unaltered Java programs as if they used malloc and free. The result: a good GC can match the performance of a good allocator, but it takes 5X more space. If physical memory is tight, however, conventional garbage collectors suffer an order-of-magnitude performance penalty.
A lot of data scientists use the python library pandas for quick exploration of data. The most useful construct in pandas (based on R, I think) is the dataframe, which is a 2D array (aka matrix) with the option to “name” the columns (and rows). But pandas is not distributed, so there is a limit on the data size that can be explored.
Spark is a great map-reduce like framework that can handle very big data by using a shared nothing cluster of machines.
This work is an attempt to provide a pandas-like DSL on top of spark, so that data scientists familiar with pandas have a very gradual learning curve.
Find out which is faster, SQL or NoSQL, for traditional reporting tasks. Discover how you can optimise MongoDB aggregation pipelines and how to push complex computation down to the database.
Speaker: Akira Kurogane, Senior Technical Services Engineer, MongoDB
Level: 300 (Advanced)
Track: Performance
One week your active dataset consumes 90% of available RAM. The next week it's 110%. Is that a 10% or 99% performance degradation? Let's discover what it looks like when different hardware capacity limitations are hit. For example, memory vs. disk bottlenecks, the rare CPU bottleneck and network bottlenecks, seeing what happens when you drop a crucial index during peak load, or what happens when you run multiple WiredTiger nodes on the same server without limiting their cache size.
What You Will Learn:
- Performance analysis
- Post-mortem log analysis
- Capacity planning
These slides were presented on a Software Craftsmanship meetup @ EPAM Hungary on 26 January, 2017.
During the talk we went through the evolution of structured data analytics in Spark. We compared the RDD, the SparkSQL (DataFrame) and the DataSet APIs. We used the very latest and greatest Spark 2.1, released on December 28, went through code samples and dove deep into Spark optimizations. The code samples can be downloaded from here: https://github.com/symat/spark-api-comparison
Fixed-width data can be processed efficiently in Perl using forks and shared file handles. This talk describes the basic mechanism and alternatives for improving performance when dealing with the records.
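A minimal sketch of that mechanism (my own illustration under assumptions, not the talk's actual code; the file name and record length are hypothetical): forked workers inherit the parent's file handle, so they share a single file offset, and each sysread() of exactly one record length hands the next unclaimed record to exactly one worker.

use strict;
use warnings;

# Hypothetical input: a file of fixed-width 80-byte records.
my $file    = 'records.dat';
my $rec_len = 80;
my $workers = 4;

open my $fh, '<:raw', $file or die "open $file: $!";

for (1 .. $workers) {
    defined(my $pid = fork) or die "fork: $!";
    next if $pid;                           # parent: keep forking
    # child: the inherited handle shares its file offset with its siblings,
    # so each sysread of one full record claims the next unread record
    while (my $got = sysread($fh, my $rec, $rec_len)) {
        last if $got < $rec_len;            # ignore a trailing partial record
        # ... process $rec here ...
    }
    exit 0;
}
wait for 1 .. $workers;                     # parent reaps the workers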
Getting started with Spark & Cassandra by Jon Haddad of Datastax - Data Con LA
Massively scalable, always on, and ridiculously fast. Apache Cassandra is the database chosen by Apple, Netflix, and 30 of the Fortune 100 to power their critical infrastructure. How do we analyze petabytes of data, whether it be massive batching or as it’s ingested via streaming with Apache Kafka? Enter Apache Spark. Challenging MapReduce head on, Apache Spark offers powerful constructs that make it possible to slice and dice your data, whether it be through machine learning, graph queries, as well as transformations familiar to people with functional programming backgrounds such as map, filter, and reduce. Step away ready to rock with the most powerful distributed database, scalable messaging, and analytics platform on the planet.
Watch the video here
https://www.youtube.com/watch?v=X-FKmKc9hkI
Mike Pittaro - High Performance Hardware for Data Analysis - PyData
Choosing hardware for big data analysis is difficult because of the many options and variables involved. The problem is more complicated when you need a full cluster for big data analytics.
This session will cover the basic guidelines and architectural choices involved in choosing analytics hardware for Spark and Hadoop. I will cover processor core and memory ratios, disk subsystems, and network architecture. This is a practical advice oriented session, and will focus on performance and cost tradeoffs for many different options.
In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ... - Gianmario Spacagna
Abstract:
Legacy enterprise architectures still rely on a relational data warehouse and require moving and syncing with the so-called "Data Lake", where raw data is stored and periodically ingested into a distributed file system such as HDFS.
Moreover, there are a number of use cases where you might want to avoid storing data on the development cluster disks, such as for regulations or reducing latency, in which case Alluxio (previously known as Tachyon) can make this data available in-memory and shared among multiple applications.
We propose an Agile workflow by combining Spark, Scala, DataFrame (and the recent DataSet API), JDBC, Parquet, Kryo and Alluxio to create a scalable, in-memory, reactive stack to explore data directly from source and develop high quality machine learning pipelines that can then be deployed straight into production.
In this talk we will:
* Present how to load raw data from an RDBMS and use Spark to make it available as a DataSet
* Explain the iterative exploratory process and advantages of adopting functional programming
* Present a critical analysis of the issues faced with the existing methodology
* Show how to deploy Alluxio and how it greatly improved the existing workflow by providing the desired in-memory solution and by decreasing the loading time from hours to seconds
* Discuss some future improvements to the overall architecture
Bio:
Gianmario is a Senior Data Scientist at Pirelli Tyre, processing telemetry data for smart manufacturing and connected vehicles applications.
His main expertise is on building production-oriented machine learning systems.
Co-author of the Professional Manifesto for Data Science (datasciencemanifesto.com), founder of the Data Science Milan meetup group, and currently writing the "Python Deep Learning" book (to be published soon).
He loves evangelising his passion for best practices and effective methodologies amongst the community.
Prior to Pirelli, he worked in Financial Services (Barclays), Cyber Security (Cisco) and Predictive Marketing (AgilOne).
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C... - Odinot Stanislas
After a short intro to distributed storage and a description of Ceph, Jian Zhang presents some interesting benchmarks: sequential tests, random tests and, above all, a comparison of results before and after optimization. The configuration parameters touched and the optimizations applied (large page numbers, Omap data on a separate disk, ...) bring at least a 2x performance gain.
Have you recently started working with Spark and your jobs are taking forever to finish? This presentation is for you.
Himanshu Arora and Nitya Nand Yadav have gathered numerous best practices, optimizations and tweaks that they have applied over the years in production to make their jobs faster and less resource-hungry.
In this presentation, they teach us advanced Spark optimization techniques, data serialization formats, storage formats, hardware optimizations, control over parallelism, resource manager settings, better data locality, GC tuning and more.
They also show us the appropriate use of RDD, DataFrame and Dataset in order to benefit fully from Spark's internal optimizations.
Slides of my Perl 6 DBDI (database interface) talk at YAPC::EU in August 2010. Please also see the fun screencast that includes a live demo of perl6 using a perl5 DBI driver: http://timbunce.blip.tv/file/3973550/
Devel::NYTProf v3 - 200908 (OUTDATED, see 201008) - Tim Bunce
Slides of my talk on Devel::NYTProf and optimizing perl code at the Italian Perl Workshop (IPW09). It covers the new features in NYTProf v3 and a new section outlining a multi-phase approach to optimizing your perl code.
30 mins long plus 10 mins of questions. Best viewed fullscreen.
An update of my Perl Myths talk (for http://ossbarcamp.com in Dublin, Ireland, September 2009). It covers jobs, cpan, community, best practices, power tools, and perl 6.
Slides of my talk about the DashProfiler perl module, which enables lightweight always-on performance monitoring for critical sections of code. See
http://search.cpan.org/perldoc?DashProfiler
Perl Myths 200802 with notes (OUTDATED, see 200909) - Tim Bunce
Perl programming has its share of myths. This presentation debunks a few popular ones with hard facts. Surprise yourself with the realities.
THIS VERSION IS OUTDATED. PLEASE SEE http://www.slideshare.net/Tim.Bunce/perl-myths-200909
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Rik Marselis' and my slides from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We finished with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Key Trends Shaping the Future of Infrastructure - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Smart TV Buyer Insights Survey 2024 - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
The New Frontiers of AI in RPA with UiPath Autopilot™ - UiPathCommunity
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that brings Artificial Intelligence into the development and use of automations.
📕 Together we will look at some examples of how Autopilot is used in the various tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Generative AI Deep Dive: Advancing from Proof of Concept to Production - Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
2. Ouch!
$ perl some_script.pl
Out of memory!
$
$ perl some_script.pl
Killed.
$
$ perl some_script.pl
$
Someone shouts: "Hey! My process has been killed!"
$ perl some_script.pl
[...later...] "Why is it taking so long?"
4. $ perl -e 'system("cat /proc/$$/stat")'    # $$ = pid
4752 (perl) S 4686 4752 4686 34816 4752 4202496 536 0 0 0 0 0 0 0 20 0 1 0 62673440 123121664
440 18446744073709551615 4194304 4198212 140735314078128 140735314077056 140645336670206 0 0
134 0 18446744071579305831 0 0 17 10 0 0 0 0 0 0 0 0 0 0 4752 111 111 111

$ perl -e 'system("cat /proc/$$/statm")'
30059 441 346 1 0 160 0

$ perl -e 'system("ps -p $$ -o vsz,rsz,sz,size")'
   VSZ   RSZ    SZ   SZ
120236  1764 30059  640

$ perl -e 'system("top -b -n1 -p $$")'
...
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
13063 tim   20   0  117m 1764 1384 S  0.0  0.1 0:00.00 perl

$ perl -e 'system("cat /proc/$$/status")'
...
VmPeak:   120236 kB
VmSize:   120236 kB   <- total (code, libs, stack, heap etc.)
VmHWM:      1760 kB
VmRSS:      1760 kB   <- how much of the total is resident in physical memory
VmData:      548 kB   <- data (heap)
VmStk:        92 kB   <- stack
VmExe:         4 kB   <- code
VmLib:      4220 kB   <- libs, including libperl.so
VmPTE:        84 kB
VmPTD:        28 kB
VmSwap:        0 kB
...
Further info on unix.stackexchange.com
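A tiny Perl helper along the same lines (my sketch, not from the slides; assumes a Linux-style /proc filesystem) makes it easy to log those Vm* figures from inside the program being measured:

use strict;
use warnings;

# Read the Vm* fields from /proc/self/status (Linux-specific).
sub vm_info {
    open my $fh, '<', '/proc/self/status' or die "open /proc/self/status: $!";
    my %vm;
    while (<$fh>) {
        $vm{$1} = $2 if /^(Vm\w+):\s+(\d+)\s+kB/;
    }
    return \%vm;
}

my $vm = vm_info();
printf "VmSize %d kB, VmRSS %d kB, VmData %d kB\n",
    @{$vm}{qw(VmSize VmRSS VmData)};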
5. Process memory layout (not to scale):
- C Program Code: int main(...) { ... }
- Read-only Data: e.g. “String constants”
- Read-write Data: un/initialized variables
- Heap
- Shared Lib Code, Shared Lib R/O Data, Shared Lib R/W Data (repeated for each lib)
- C Stack (not the perl stack)
- System
7. $ perl -e 'system("cat /proc/$$/smaps")'   # note ‘smaps’ not ‘maps’
address                   perms ...  pathname
7fb00fbc1000-7fb00fd22000 r-xp  ...  /.../5.10.1/x86_64-linux/CORE/libperl.so
Size:           1412 kB   <- size of executable code in libperl.so
Rss:             720 kB   <- amount that's currently in physical memory
Pss:             364 kB
Shared_Clean:    712 kB
Shared_Dirty:      0 kB
Private_Clean:     8 kB
Private_Dirty:     0 kB
Referenced:      720 kB
Anonymous:         0 kB
AnonHugePages:     0 kB
Swap:              0 kB
KernelPageSize:    4 kB
MMUPageSize:       4 kB
... repeated for every segment ...
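For a whole-process view, a short script (again Linux-specific, and my own illustration rather than anything from the slides) can add up the Rss and Pss of every mapping in smaps:

use strict;
use warnings;

# Sum Rss and Pss across every mapping in /proc/self/smaps (Linux-specific).
my %total;
open my $fh, '<', '/proc/self/smaps' or die "open smaps: $!";
while (<$fh>) {
    $total{$1} += $2 if /^(Rss|Pss):\s+(\d+)\s+kB/;
}
printf "Rss total: %d kB, Pss total: %d kB\n", $total{Rss}, $total{Pss};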
8. Memory Pages
✦ Process view: Large continuous regions of memory. Simple.
✦ Operating System view:
  ✦ Memory is divided into pages
  ✦ Pages are loaded to physical memory on demand
  ✦ Mapping can change without the process knowing
9. [Diagram: the memory segments above (C Program Code, Read-only Data, Read-write Data, Heap, Shared Lib Code/Data, C Stack, System), each divided into pages; some pages are ‘resident’ in physical memory, others are not.]
Memory is divided into pages. Page size is typically 4KB.
RSS “Resident Set Size” is how much process memory is currently in physical memory.
10. Key Point
✦ Don’t use Resident Set Size (RSS)
  ✦ Unless you really want to know what’s currently resident.
  ✦ It can shrink even while the process size grows.
✦ Heap size or Total memory size is a good indicator.
14. malloc manages memory allocation (Heap: perl data)
✦ malloc() requests big chunks of memory from the operating system as needed.
  Almost never returns it!
✦ Perl makes lots of malloc and free requests.
✦ Freed fragments of various sizes accumulate.
19. Devel::Peek
• Gives you a textual view of data
$ perl -MDevel::Peek -e '%a = (42 => "Hello World!"); Dump(\%a)'
SV = IV(0x1332fd0) at 0x1332fe0
  REFCNT = 1
  FLAGS = (TEMP,ROK)
  RV = 0x1346730
  SV = PVHV(0x1339090) at 0x1346730
    REFCNT = 2
    FLAGS = (SHAREKEYS)
    ARRAY = 0x1378750 (0:7, 1:1)
    KEYS = 1
    FILL = 1
    MAX = 7
    Elt "42" HASH = 0x73caace8
    SV = PV(0x1331090) at 0x1332de8
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x133f960 "Hello World!"\0
      CUR = 12   <= length in use
      LEN = 16   <= amount allocated
20. Devel::Size
• Gives you a measure of the size of a data structure
$ perl -MDevel::Size=total_size -le 'print total_size( 0 )'
24
$ perl -MDevel::Size=total_size -le 'print total_size( [] )'
64
$ perl -MDevel::Size=total_size -le 'print total_size( {} )'
120
$ perl -MDevel::Size=total_size -le 'print total_size( [ 1..100 ] )'
3264
• Created by Dan Sugalski, now maintained by Nicholas Clark
• Is very fast, and accurate for most simple data types.
• Has limitations and bugs, but is the best tool we have.
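The difference between size() and total_size() is worth seeing side by side; a quick sketch (my own example) using the module's two exported functions:

$ perl -MDevel::Size=size,total_size -le '
    my $data = { list => [ 1..100 ], name => "x" x 1000 };
    print size($data);        # the hash itself, not the things it refers to
    print total_size($data);  # the hash plus everything it references'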
21. Arenas
Heads and Bodies are allocated from ‘arenas’ (slabs) managed by perl.
One for SV heads and one for each size of SV body.
More efficient than malloc in space and speed.
Introspect arenas with Devel::Arena and Devel::Gladiator.
$ perl -MDevel::Gladiator=arena_table -e 'warn arena_table()'
ARENA COUNTS:
1063 SCALAR
199 GLOB
120 ARRAY
95 CODE
66 HASH
...
22. Key Notes
✦ All variable length data storage comes from malloc
  ✦ malloc has overheads, bucket and fragmentation issues
✦ Heads and Bodies are allocated from ‘arenas’ managed by perl
  ✦ Arenas have less overhead but are never freed
✦ Memory usage will always be higher than the sum of the sizes.
24. Memory Profiling?
✦ Track memory size over time?
✦ See where memory is allocated and freed?
  ✦ Experiments with Devel::NYTProf
  ✦ Turned out to not be very useful
✦ Need to know what is ‘holding’ memory.
25. Space in Hiding
✦ Perl tends to consume extra memory to save time
✦ This can lead to surprises, for example:

sub foo {
    my $var = "X" x 10_000_000;
}
foo();    # ~20MB still used after return!

sub bar {
    my $var = "X" x 10_000_000;
    bar($_[0]-1) if $_[0]; # recurse
}
bar(50);  # ~1GB still used after return!
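One way to avoid that surprise (my note, not from the slide) is to undef the large lexical before returning, so the pad entry doesn't keep the buffer cached between calls:

sub foo {
    my $var = "X" x 10_000_000;
    # ... use $var ...
    undef $var;   # releases the ~10MB buffer instead of caching it in the pad
}
foo();            # memory goes back to malloc (though not necessarily to the OS)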
26. X-Ray Vision!
✦ Want to see inside the black box
✦ Want to know “where memory is being held”
✦ A snapshot “crawl and dump” approach
✦ Separate capture from analysis
27. My Plan (circa 2012)
✦ Extend Devel::Size
✦ Add a C-level callback hook
✦ Add some kind of "data path name" mechanism
✦ Add a function to return the size of everything
✦ Stream the data to disk
✦ Write tools to manipulate, summarize & query the data
✦ Write tools to visualize the data
✦ Write tools to compare sets of data
29. Devel::SizeMe Outputs
✦ Text - handy for testing and simple structures
✦ Graphviz - useful visualization for up to ~1000 nodes
✦ Treemap - useful for simple top-down view (“blame”)
✦ Gephi - full network view (structure, relationships)
✦ SQLite db
✦ Very little analysis implemented yet
✦ Ref-loops are isolated from “owners”
33. Devel::SizeMe Summary
✦ Focussed on memory use
✦ Walks trees of pointers in perl internals
✦ Can dump individual data structures
✦ Stream-based - scales to any size of application
✦ Multiple output formats
✦ Very minimal and informal data model
34. Current Limitations
✦ Very minimal and informal data model
✦ Ref loops get separated out
✦ Accumulating sizes up the tree happens too soon
✦ Can’t edit the tree without invalidating sizes
✦ Needs a multi-phase processing pipeline
✦ Needs a more task-oriented user interface
35. Recommendations
✦ Store the data in some kind of database
✦ Perform transformations on the database data
✦ Generate UI from the database - scalability
✦ Express queries as db queries - flexibility
✦ What kind of database? Relational or Graph?
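As a sketch of the relational option (purely illustrative; this is not Devel::SizeMe's actual schema, and the table and column names are made up), DBI plus SQLite is enough to store the node tree and answer "which nodes hold the most memory" with plain SQL:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=sizeme.db", "", "",
                       { RaiseError => 1 });

# Hypothetical schema: one row per node in the size tree.
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS node (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER,
        name      TEXT,
        self_size INTEGER    -- bytes attributed to this node alone
    )
});

# Example query: the ten largest direct children of the root node (id 1).
my $rows = $dbh->selectall_arrayref(q{
    SELECT name, self_size FROM node
    WHERE parent_id = 1
    ORDER BY self_size DESC LIMIT 10
});
printf "%-30s %10d\n", @$_ for @$rows;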
36. Possible Futures
✦ Feed Devel::MAT data into SQLite
✦ Feed SQLite data into Neo4j
✦ Develop useful Cypher query fragments
✦ Develop graph simplifications as plugins
✦ Develop visualizations