This document provides an overview of file systems and storage technologies, including Unix System 5, log-structured file systems, ZFS, RAID, flash memory, and garbage collection. It discusses how files are represented and accessed in different systems. The key aspects covered are:
- How Unix System 5 represents files using inodes and disk blocks
- How log-structured file systems write files sequentially to avoid overwriting and better suit flash memory
- Techniques used in modern file systems like ZFS to provide redundancy, detect errors, and improve performance
- Challenges of flash memory like limited write cycles and how file systems address these
- Garbage collection methods used in log-structured file systems to reclaim
University of Virginia
cs4414: Operating Systems
http://rust-class.org
2. Plan for Today
Recap: Unix System 5 File System
Creating a File
Better File Systems: ZFS, RAID
Flash Memory
PS4 is due 11:59pm Sunday, 6 April
Exam 2 Redo: posted on course site, due 11:59pm
3. Diskmap (Unix System 5)
[Figure: inode diskmap with entries 0, 1, 2, …, 9, 10, 11, 12. Entries point to Disk Blocks (1K bytes). An Indirect Disk Block (1K bytes) holds pointers to Disk Blocks: 4 bytes for each = 256 pointers. A Double Indirect Disk Block points to Indirect Disk Blocks (1K bytes).]
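The arithmetic behind the diskmap can be sketched in a few lines. This is only an illustration: the 1K block size and 4-byte pointers come from the slide, while the split of the 13 entries into 10 direct pointers plus one single, one double, and one triple indirect pointer is an assumption based on the conventional System V layout, and the function names are hypothetical.

```python
BLOCK_SIZE = 1024                             # bytes per disk block (from the slide)
POINTER_SIZE = 4                              # bytes per block pointer (from the slide)
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE   # 256 pointers per indirect block
NUM_DIRECT = 10                               # assumption: entries 0-9 are direct


def indirection_level(logical_block):
    """Return how many indirect blocks are read to reach a logical block."""
    if logical_block < 0:
        raise ValueError("block number must be non-negative")
    if logical_block < NUM_DIRECT:
        return 0
    remaining = logical_block - NUM_DIRECT
    span = PTRS_PER_BLOCK
    for level in (1, 2, 3):        # single, double, triple indirect
        if remaining < span:
            return level
        remaining -= span
        span *= PTRS_PER_BLOCK
    raise ValueError("block is beyond the largest representable file")


def max_file_blocks():
    """Number of data blocks reachable under the assumed layout."""
    return NUM_DIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK**2 + PTRS_PER_BLOCK**3


if __name__ == "__main__":
    for block in (0, 9, 10, 265, 266):
        print(block, indirection_level(block))
    print(max_file_blocks() * BLOCK_SIZE, "bytes maximum")
```

Under these assumptions small files need no extra disk reads to locate their data, while the largest files cost up to three extra reads per block lookup.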
6. Finding a Free Block
[Figure: disk layout (not to scale): Boot block, Superblock, I-List (inodes), Data. List of free disk blocks with entries 0, 1, …, 98, 99, shown twice.]
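One way to read the free-block list is as a chain of batches. The sketch below is a toy model, assuming the classic System V scheme in which the superblock caches up to 100 free block numbers and entry 0 of each batch names a block that stores the previous batch. The class and method names are hypothetical, and a dictionary stands in for the disk.

```python
class FreeBlockList:
    """Toy chained free list (an assumed model, not the real kernel code)."""

    BATCH = 100  # entries 0..99, as drawn on the slide

    def __init__(self, free_blocks=()):
        self.disk = {}    # block number -> batch stored inside that free block
        self.cache = []   # the superblock's in-memory batch
        for block in free_blocks:
            self.free(block)

    def free(self, block):
        if len(self.cache) == self.BATCH:
            # Cache is full: store it inside the block being freed, which
            # then becomes entry 0 (the link) of a fresh batch.
            self.disk[block] = self.cache
            self.cache = [block]
        else:
            self.cache.append(block)

    def alloc(self):
        if not self.cache:
            raise RuntimeError("no free blocks")
        block = self.cache.pop()
        if not self.cache and block in self.disk:
            # We just took entry 0: reload the batch it was holding.
            self.cache = self.disk.pop(block)
        return block


if __name__ == "__main__":
    free_list = FreeBlockList(range(1, 251))
    print(free_list.alloc(), len(free_list.cache))
```

The appeal of this layout is that the free list costs no extra space: the bookkeeping lives inside blocks that are free anyway.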
7. Finding a Free inode
[Figure: disk layout (not to scale): Boot block, Superblock, I-List (inodes), Data. I-List entries are shown as pairs: (0, 0), (1, 1), (2, 0), (3, 0), …]
Superblock keeps a cache of free inodes
Lots more to do!
Need to select disk blocks, update directory, etc.
Read the OSTEP chapter.
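The idea of a superblock cache of free inodes can be sketched as follows. This is a minimal illustration, not the S5FS code: the class name `Superblock`, the tiny `CACHE_SIZE`, and the reading of the second column in the I-List figure as an "in use" flag (0 = free) are all assumptions made here.

```python
class Superblock:
    CACHE_SIZE = 4  # hypothetical; kept tiny for illustration

    def __init__(self, ilist):
        # ilist[i] == 0 is assumed to mean "inode i is free".
        self.ilist = ilist
        self.free_cache = []

    def _refill(self):
        """Scan the I-List for free inodes until the cache is full."""
        for inum, in_use in enumerate(self.ilist):
            if not in_use:
                self.free_cache.append(inum)
                if len(self.free_cache) == self.CACHE_SIZE:
                    break

    def alloc_inode(self):
        if not self.free_cache:          # only scan the disk when the cache runs dry
            self._refill()
        if not self.free_cache:
            raise OSError("no free inodes")
        inum = self.free_cache.pop(0)
        self.ilist[inum] = 1             # mark the inode as in use
        return inum

    def free_inode(self, inum):
        self.ilist[inum] = 0
        if len(self.free_cache) < self.CACHE_SIZE:
            self.free_cache.append(inum)


if __name__ == "__main__":
    sb = Superblock([0, 1, 0, 0])        # same flags as the figure
    print(sb.alloc_inode(), sb.alloc_inode(), sb.alloc_inode())
```

The point of the cache is that most allocations are served from the superblock without scanning the I-List.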
9. Modern File Systems
IBM 350 Disk Storage (1956): 118,000 in³, 5MB, 600ms seek
Seagate HDD (2013): 23 in³, 4TB (4M MB), 5ms seek
10. What should a modern file system do that Unix S5FS doesn’t?
13.
“MacZFS is free data storage and protection software for all Mac OS users. It’s for people who have Mac OS, who have any data, and who really like their data. Whether on a single-drive laptop or on a massive server, it’ll store your petabytes with ragingly redundant RAID reliability, and it’ll keep the bit-rotted bleeps and bloops out of your iTunes library.”
26. Adaptive Replacement Cache
Blocks in Cache:
T1: Recent Cache Entries
T2: Frequently-Used Blocks (entries in T1 that are accessed again)
Size of T1 adapts
“Ghost” Entries:
B1: Evicted from T1 (LRU)
B2: Evicted from T2 (LRU)
How should relative size of T1 and T2 be adjusted?
27. Adaptive Replacement Cache
Hit in B1: should increase size of T1, drop entry from T2 to B2
Hit in B2: should increase size of T2, drop entry from T1 to B1
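The adaptation rule can be made concrete with a small sketch. The code below follows the general shape of the published ARC algorithm (Megiddo and Modha) and is not ZFS's implementation, which differs in its details. The class name `ARC`, the capacity `c`, and the target size `p` for T1 are illustrative names chosen here.

```python
from collections import OrderedDict


class ARC:
    def __init__(self, c):
        self.c = c                # number of blocks the cache may hold
        self.p = 0                # adaptive target size for T1
        self.t1 = OrderedDict()   # seen once recently (oldest entry first)
        self.t2 = OrderedDict()   # seen at least twice
        self.b1 = OrderedDict()   # ghost entries evicted from T1
        self.b2 = OrderedDict()   # ghost entries evicted from T2

    def _replace(self, hit_in_b2):
        """Evict one cached block, remembering it as a ghost entry."""
        t1_over = len(self.t1) > self.p or (hit_in_b2 and len(self.t1) == self.p)
        if self.t1 and (t1_over or not self.t2):
            key, _ = self.t1.popitem(last=False)
            self.b1[key] = None
        else:
            key, _ = self.t2.popitem(last=False)
            self.b2[key] = None

    def access(self, key):
        """Return True on a cache hit, False on a miss."""
        if key in self.t1:                      # accessed again: promote to T2
            del self.t1[key]
            self.t2[key] = None
            return True
        if key in self.t2:
            self.t2.move_to_end(key)
            return True
        if key in self.b1:                      # ghost hit in B1: grow T1's target
            self.p = min(self.c, self.p + max(1, len(self.b2) // len(self.b1)))
            self._replace(False)
            del self.b1[key]
            self.t2[key] = None
            return False
        if key in self.b2:                      # ghost hit in B2: shrink T1's target
            self.p = max(0, self.p - max(1, len(self.b1) // len(self.b2)))
            self._replace(True)
            del self.b2[key]
            self.t2[key] = None
            return False
        l1 = len(self.t1) + len(self.b1)
        total = l1 + len(self.t2) + len(self.b2)
        if l1 == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(False)
            else:
                self.t1.popitem(last=False)     # T1 fills the cache: drop its LRU entry
        elif total >= self.c:
            if total == 2 * self.c:
                self.b2.popitem(last=False)
            self._replace(False)
        self.t1[key] = None
        return False
```

With `c = 2`, accessing `a, b, a, c, b` ends with a hit in B1 on the final access: the target `p` grows from 0 to 1, and `a` is dropped from T2 to B2, matching the rule on the slide.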
32. How NAND Flash Works
[Figure: flash memory cell with labels Word Line, Bit Line, Control gate, Oxide Layer, Floating gate (stores electrons), Source, Drain. Uncharged State: 1]
Adapted from http://computer.howstuffworks.com/flash-memory1.htm
33. How NAND Flash Works
[Figure: flash memory cell with labels Word Line, Bit Line, Control gate, Oxide Layer, Floating gate (stores electrons), Source, Drain. Charged State: 0]
Adapted from http://computer.howstuffworks.com/flash-memory1.htm
35. Summary: Storage Systems
Device | Example | Time to Access | Cost per Bit
Mercury (Gin) Delay Line | UNIVAC (1951) | 220,000 ns (average) | $0.38 (1968) (a bazillion n$)
DRAM | Kingston KVR16N11/4 4GB DDR3 ($40) | 13.75 ns | 1.16 n$
SSD | Samsung 500GB ($300) | ~10,000 ns (for random read) | 0.075 n$
Disk Drive | Seagate Desktop HDD 4 TB SATA 6Gb/s NCQ 64MB | 5,000,000 ns | 0.0046 n$
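The cost-per-bit column can be reproduced from the listed prices. This is a sketch under one assumption made here: the DRAM figure works out when 4GB is read as binary gigabytes (2^30 bytes) and the SSD figure when 500GB is read as decimal gigabytes (10^9 bytes). The function name is illustrative.

```python
def nano_dollars_per_bit(price_dollars, capacity_bytes):
    """Cost of one bit, in billionths of a dollar (n$)."""
    return price_dollars / (capacity_bytes * 8) * 1e9


# Prices and capacities are taken from the table.
dram = nano_dollars_per_bit(40, 4 * 2**30)       # assumes binary gigabytes
ssd = nano_dollars_per_bit(300, 500 * 10**9)     # assumes decimal gigabytes

# Ratio of access times from the table: disk drive vs. DRAM.
disk_vs_dram = 5_000_000 / 13.75

if __name__ == "__main__":
    print(f"DRAM: {dram:.2f} n$/bit, SSD: {ssd:.3f} n$/bit")
    print(f"Disk access is about {disk_vs_dram:,.0f} times slower than DRAM")
```

The disk drive's price is not listed on the slide, so its cost per bit is not recomputed here.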
36. Challenges of Flash
Writing (1 → 0) is expensive
Erasing (0 → 1) is super expensive:
  Apply electric field to release charge
  Can only erase a full block (often 128K) at a time
  Cells wear out after 10,000-1M erasings
Reading disturbs nearby cells
  Cannot read same cell too many times
But: no seek time; the time to access every cell is the same!
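A toy model makes the asymmetry between writing and erasing concrete. This is a sketch only: the class `FlashBlock`, its 8-byte size (the slide mentions blocks of around 128K), and the `erase_count` field are illustrative assumptions, and read disturbance is not modeled.

```python
class FlashBlock:
    def __init__(self, size=8):
        self.cells = [0xFF] * size    # erased state: every bit is 1
        self.erase_count = 0          # wear: cells survive a limited number of erases

    def program(self, offset, value):
        """Writing can only change bits from 1 to 0, never from 0 to 1."""
        self.cells[offset] &= value

    def erase(self):
        """Erasing resets the whole block to 1s; single cells cannot be erased."""
        self.cells = [0xFF] * len(self.cells)
        self.erase_count += 1


if __name__ == "__main__":
    block = FlashBlock()
    block.program(0, 0b10101010)
    block.program(0, 0b11110000)      # cannot restore the bits already cleared
    print(bin(block.cells[0]))        # 0b10100000
    block.erase()
    print(block.cells[0] == 0xFF, block.erase_count)
```

Overwriting a byte in place therefore needs an erase of the entire block first, which is why designs that avoid in-place updates are attractive for flash.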
37. How should we design a file system for flash memory?
39. Log-Structured File System
Write sequentially: never overwrite data
[Figure: Disk holding, in order: File 1, File 2, Updated File 1]
April Fool’s? What’s wrong with this picture?
40. Where does the meta-data go?
[Figure: Disk: Block 0, Block 1, Block 2, InodeA]
41. When should we do the writes?
[Figure: Disk: Block 0, Block 1, Block 2, InodeA]
42. When should we do the writes?
[Figure: Disk: Block 0, Block 1, Block 2, InodeA. In-Memory Buffer: Block 3, Block 4, Block 5, Block 6, Block 7, InodeB]
44. Updating a File
[Figure: Disk: Block 0, Block 1, Block 2, InodeA. Disk, continued: Block 3, Block 4, Block 5, Block 6, Block 7, InodeB]
Suppose the contents of Block 1 are modified?
48. Recap: how did we do this for S5FS?
Filename Inode
. 494211
.. 494205
.DS_Store 494212
class0 6565946
class1 6565826
… …
class16 5649155
class2 494218
… …
51.
[Figure: Disk: Block 0, Block 1, Block 2, InodeA. Disk, continued: Block 3, Block 4, Block 5, Block 6, Block 7, InodeB, Block 1 (update), InodeA’, imap (entries 0, 1, 2)]
imap: pointer to most recent version of inode.
52.
[Figure repeated from slide 51]
imap: pointer to most recent version of inode.
Where should we store the imap?
53.
[Figure repeated from slide 51]
imap: pointer to most recent version of inode.
Where should we store the imap?
At the end of each write (when necessary)! It’s small (4 bytes * number of inodes), and sequential writes are cheap!
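The update path on these slides (new data block, then a new inode, then a new imap, all appended) can be sketched as a tiny append-only log. This is a minimal illustration under stated assumptions, not a real log-structured file system: the class `LogFS`, the record tuples, and the `imap_addr` field that remembers where the newest imap lives are all hypothetical, and writes are not buffered in memory first.

```python
class LogFS:
    def __init__(self):
        self.log = []            # the "disk": records are only ever appended
        self.imap_addr = None    # address of the newest imap record (assumed to be tracked somewhere)

    def _append(self, record):
        self.log.append(record)
        return len(self.log) - 1

    def _imap(self):
        """Copy of the newest imap: inode number -> address of newest inode."""
        if self.imap_addr is None:
            return {}
        return dict(self.log[self.imap_addr][1])

    def write(self, inum, blocks):
        """Write blocks (block index -> bytes) for inode number inum."""
        imap = self._imap()
        pointers = dict(self.log[imap[inum]][1]) if inum in imap else {}
        for index, data in blocks.items():
            pointers[index] = self._append(("data", data))    # new copy of the block
        imap[inum] = self._append(("inode", pointers))         # new version of the inode
        self.imap_addr = self._append(("imap", imap))          # imap goes at the end of the write

    def read(self, inum, index):
        inode_addr = self._imap()[inum]
        data_addr = self.log[inode_addr][1][index]
        return self.log[data_addr][1]


if __name__ == "__main__":
    fs = LogFS()
    fs.write(0, {0: b"block 0", 1: b"block 1", 2: b"block 2"})
    fs.write(0, {1: b"block 1 - update"})
    print(fs.read(0, 1), len(fs.log))
```

After the update, the old copy of block 1, the old inode, and the old imap are all still in the log; only the pointers have moved on.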
54.
[Figure: Disk: Block 0, Block 1, Block 2, InodeA. Disk, continued: Block 3, Block 4, Block 5, Block 6, Block 7, InodeB, Block 1 (update), InodeA’, imap, Block 8, Block 0 (update), …, Block 5 (update), InodeA’, InodeB’, imap]
Won’t the disk fill up with lots of old junk?
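One way to see which parts of the log are "old junk" is to walk from the newest imap and mark everything still reachable. The sketch below assumes a log of `(kind, payload)` records like the figure, where an imap maps inode numbers to inode addresses and an inode maps block indexes to data addresses; this record format and the function name are assumptions made here. Real cleaners work on whole segments and are considerably more involved.

```python
def live_addresses(log, imap_addr):
    """Addresses still reachable from the newest imap; everything else is garbage."""
    live = {imap_addr}
    for inode_addr in log[imap_addr][1].values():
        live.add(inode_addr)
        live.update(log[inode_addr][1].values())
    return live


# A small hand-built log: one file is written, then its block 1 is updated.
log = [
    ("data", b"block 0"),             # 0
    ("data", b"block 1"),             # 1  superseded by address 4
    ("inode", {0: 0, 1: 1}),          # 2  superseded by address 5
    ("imap", {0: 2}),                 # 3  superseded by address 6
    ("data", b"block 1 - update"),    # 4
    ("inode", {0: 0, 1: 4}),          # 5
    ("imap", {0: 5}),                 # 6  newest imap
]

live = live_addresses(log, imap_addr=6)
garbage = set(range(len(log))) - live

if __name__ == "__main__":
    print("live:", sorted(live), "garbage:", sorted(garbage))
```

A cleaner could copy the live records to the end of the log and then reuse the space occupied by the garbage.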
62. Differences with Flash
No need for sequential writes
Just need to find unused blocks
Can do 1 → 0 rewrites!
Maintain a bitmap of used blocks at fixed block
Lots of complexities:
Bits wear out, read disruption, etc.
Who should deal with those complexities?
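One reading of the bitmap idea is that marking a block as used only needs a 1 → 0 rewrite, so the bitmap can stay at a fixed location without an erase on every allocation. The sketch below illustrates that reading; the class name `UsedBlockBitmap` and the convention that a 1 bit means "unused" are assumptions made here, and freeing a block (a 0 → 1 change, which would need an erase) is deliberately left out.

```python
class UsedBlockBitmap:
    def __init__(self, nblocks):
        self.nblocks = nblocks
        # Erased flash reads as all 1s, so 1 is taken to mean "unused".
        self.bits = bytearray([0xFF] * ((nblocks + 7) // 8))

    def is_unused(self, n):
        return bool(self.bits[n // 8] & (1 << (n % 8)))

    def allocate(self):
        """Find an unused block and mark it used by clearing its bit (1 -> 0)."""
        for n in range(self.nblocks):
            if self.is_unused(n):
                self.bits[n // 8] &= ~(1 << (n % 8)) & 0xFF
                return n
        raise OSError("no unused blocks")


if __name__ == "__main__":
    bitmap = UsedBlockBitmap(10)
    print(bitmap.allocate(), bitmap.allocate(), bitmap.is_unused(2))
```

Every update to the bitmap only clears bits, which is the kind of rewrite flash permits without erasing the block that holds the bitmap.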
66. Summary: Storage Systems
[Table repeated from slide 35, with the added label “Modern Hard Drive”]
67. Relevance to PS4?
Not expected to implement any of this: a very simple filesystem in memory is fine (but feel free to surprise us!)
Your filesystem is in memory: no need to deal with complexities of interfacing with persistent media (but doing this could be a good post-PS4 project!).