Panel session presentation on long-term digital storage, given at the Library of Congress workshop in September 2012. It covers the challenge of storing multiple petabytes (PB) of data in 2012, 2015, and 2018.
At the PASS Summit in Seattle, one of the outstanding keynote presentations was by Dr. Dave DeWitt, Microsoft Fellow, and leader of the Microsoft Jim Gray Systems Lab, in Madison, WI.
Dr. DeWitt is working on releases 1 and 2 of SQL Server Parallel Data Warehouse. In his keynote he reviewed the 30-year history of CPU, memory, and disk performance. Performance gains have varied widely across these subsystems, with disk performance lagging badly, and this has a major impact on database system performance.
Disk performance gains have been made in three areas: capacity, transfer rate, and average seek time. However, the gains over the last 30 years have not been uniform.
Capacity of high-performance disk drives has increased by a factor of 10,000. Transfer rates have increased by a factor of 65. Average seek time has improved by only a factor of 10. Dr. DeWitt talked about the impact of these discrepancies on OLTP and Data Warehouse applications.
One of his conclusions is that some problems can be fixed through smarter software, but that “SSDs provide the only real help.”
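To make the discrepancy concrete, the short Python sketch below derives two ratios from the three factors quoted above. The function names are hypothetical, and the second ratio is a simplification that treats seek time as the whole cost of a random access (rotational latency is ignored).

```python
# Improvement factors over roughly 30 years, as quoted in the keynote summary.
CAPACITY_GAIN = 10_000
TRANSFER_RATE_GAIN = 65
SEEK_TIME_GAIN = 10


def full_scan_slowdown(capacity_gain, transfer_gain):
    """Factor by which reading an entire drive sequentially now takes longer."""
    return capacity_gain / transfer_gain


def capacity_per_random_io_growth(capacity_gain, seek_gain):
    """Factor by which the data behind each random I/O per second has grown.

    Assumption: random access cost is dominated by seek time alone.
    """
    return capacity_gain / seek_gain


scan = full_scan_slowdown(CAPACITY_GAIN, TRANSFER_RATE_GAIN)
random_io = capacity_per_random_io_growth(CAPACITY_GAIN, SEEK_TIME_GAIN)
print(f"Full sequential scan takes about {scan:.0f}x longer")
print(f"Each random I/O per second serves about {random_io:.0f}x more data")
```

Under these assumptions a full scan of a drive takes on the order of 150 times longer than it did 30 years ago, which is one way to read the claim that disk performance is lagging.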
The Details That Matter: Kafka in Production, at Scale with Or Arnon and Elad... - HostedbyConfluent
Are you running at scale? Did you experience “voodoo problems” in your infrastructure? We have a 5M messages/sec cluster that taught us some valuable lessons. Seeing our Kafka clusters become sluggish or crash, taking our production services with them, we have some insights that we hope help you steer your next production incident and make sure your data pipelines run smoothly. We’ll tell the story of skews and anomalies in CPU and disk metrics - drawing graphs and conclusions. Understand how compacted topics, partitions distribution, and RAM can affect your cluster’s performance. Finally, look at how a small configuration drift can rattle your cluster. Our goal is to provide you with the tools and knowledge to navigate this uncharted territory.
AWS vs Azure vs Google Cloud Storage Deep Dive - RightScale
Cloud services keep evolving, and cloud storage is no different. It can be difficult to keep up to date with the latest from each cloud provider and understand how they compare. We’ll drill down on object, block, archival, and file storage for the leading public clouds. We’ll also compare prices for a variety of storage scenarios.
AWS re:Invent 2016: Deep Dive on Amazon Elastic Block Store (STG301) - Amazon Web Services
In this popular session, you will learn about the latest features and use cases for Amazon EBS, including best practices, an overview of newly introduced features, and brand-new re:Invent announcements. In particular we will cover the expanded portfolio of volume types, including provisioned IOPS, cold storage, and throughput-optimized. This session will help database admins and application architects understand how to blend performance and cost with applications for big data analytics, data warehousing, and transactional and NoSQL databases.
Accelerating hbase with nvme and bucket cache - David Grier
This set of slides describes some initial experiments designed to find performance improvements in Hadoop technologies using NVMe technology.
Amazon Elastic Block Store (Amazon EBS) provides flexible, persistent storage volumes for use with Amazon EC2 instances. In this technical session, we conduct a detailed analysis of all types of Amazon EBS block storage including General Purpose SSD (gp2) and Provisioned IOPS SSD (io1). Along the way, we will share Amazon EBS best practices for optimizing performance, managing snapshots and securing data.
Amazon Elastic Block Store (Amazon EBS) provides persistent block level storage volumes for use with Amazon EC2 instances. In this technical session, we conduct a detailed analysis of the types of Amazon EBS block storage including General Purpose (SSD), Provisioned IOPS (SSD) as well as the new Throughput Optimized HDD and Cold HDD. Along the way, we will share Amazon EBS best practices for performance, management and security.
Accelerating forensic and incident response workflow: the case for a new stan... - Bradley Schatz
Today’s forensic processes are mired by practices carried over from a pre-networked world. Practitioners and responders are faced with the unsatisfactory choice of either forensically preserving only a limited amount of evidence while accepting the risk of missing relevant information (triage), or delaying analysis while waiting for full forensic preservation. This seminar will examine the role of existing forensic imaging formats in creating such an environment, and examine how an improved forensic image format (the AFF4 forensic container format) enables practitioners to perform forensic analysis without the delays imposed by current approaches.
Cloud Storage Comparison: AWS vs Azure vs Google vs IBM - RightScale
As public cloud storage services mature, it becomes easier to make apples-to-apples comparisons. We drill down on the latest specs and features for object, block, archival, and file storage across AWS, Azure, Google, and IBM. We also compare prices for a variety of storage scenarios.
Implementation of Dense Storage Utilizing HDDs with SSDs and PCIe Flash Acc... - Red_Hat_Storage
At Red Hat Storage Day New York on 1/19/16, Red Hat partner Seagate presented on how to implement dense storage using HDDs with SSDs and PCIe flash accelerator cards.
Long-Term Storage - Panel Session @ Library of Congress Workshop
1. Long-Term Storage Panel Session
Erik Riedel, EMC
Library of Congress Workshop, September 2012
Top picture: “Once Blue” by Jesse Wagstaff via flickr/cc
Right picture: by Austin Marshall via flickr/cc
Revision 3
2. Parameters
• Non-compressible data
• Long-term storage
• Very high reliability
• Request rate of 10% per year
• 5, 20, 50 PB in 2012, 2015, 2018
11. Cost

2012    10%yr      Disks    Disk BW    Racks   Bandwidth   Actual    Days-to-fill
5 PB    16 MB/s    2,700    200 GB/s   6       30 GB/s     3 GB/s    19
20 PB   63 MB/s    11,000   1.1 TB/s   23      115 GB/s    11 GB/s   20
50 PB   159 MB/s   27,000   2.7 TB/s   56      280 GB/s    28 GB/s   21

Cost if using e.g. “cold” public cloud storage:

2012    $/month @ $0.01/GB
5 PB    $50,000/month
20 PB   $200,000/month
50 PB   $500,000/month

For comparison, the cost to “store” 20 librarians or data scientists:

2012            sqft/person   $/sqft   $/month          Location
20 employees    90            $48      $86,000/month    Washington, DC
80 employees    75            $48      $288,000/month   Washington, DC
200 employees   75            $24      $360,000/month   Minneapolis, MN
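The request-rate and monthly cost figures on this slide can be checked with a few lines of Python. This is a minimal sketch, not the author's own calculation: the function names are hypothetical, decimal units (1 PB = 10^15 bytes) are assumed, and the office cost is taken as the plain product of the three columns, which is what the slide's numbers appear to use.

```python
# Assumption: decimal units, 1 PB = 1e15 bytes, 1 GB = 1e9 bytes.
PB = 1e15
GB = 1e9
MB = 1e6
SECONDS_PER_YEAR = 365 * 24 * 3600


def request_rate_mb_s(capacity_pb, fraction_per_year=0.10):
    """Average read rate if a fraction of the data is read evenly over a year."""
    return capacity_pb * PB * fraction_per_year / SECONDS_PER_YEAR / MB


def cloud_cost_per_month(capacity_pb, dollars_per_gb=0.01):
    """Monthly bill at a flat per-GB price (the slide's $0.01/GB)."""
    return capacity_pb * PB / GB * dollars_per_gb


def office_cost(employees, sqft_per_person, dollars_per_sqft):
    """Office space cost as the product of the slide's three columns."""
    return employees * sqft_per_person * dollars_per_sqft


for capacity in (5, 20, 50):
    rate = request_rate_mb_s(capacity)
    cost = cloud_cost_per_month(capacity)
    print(f"{capacity} PB: {rate:.0f} MB/s at 10%/yr, ${cost:,.0f}/month in cloud")

print(f"20 employees in Washington, DC: ${office_cost(20, 90, 48):,}")
```

The office figure comes out at $86,400, which the slide rounds to $86,000.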
12. Assumptions
• Data protection in a single data center, using an erasure-coding scheme at 1.6x overhead
• 480 drive racks in 2012 (40U)
• 600 drive racks in 2015 and 2018 (50+U)
• 10%/year access assumes 10% of total data is accessed in even distribution over 365 days/year, 24 hours/day – optimistic
• 10%/2day access assumes 10% of data is accessed on only 2 days per year (say Thanksgiving and Xmas) – very bursty
• Bandwidth is theoretical bandwidth at 40 Gb/s per rack (4x 10 GbE)
• Actual bandwidth is 1/10 of theoretical maximum for 2012 and 2015; up to 1/3 theoretical max for 2018 (software improvements)
• sqft per person and $/sqft references
http://www.inc.com/news/articles/2010/10/washington-dc-rents-top-those-in-nyc.html
http://newsfeed.time.com/2011/02/08/youre-not-imagining-it-your-cubicle-is-getting-smaller/
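The bandwidth assumptions above can be turned into a small Python sketch that reproduces the Bandwidth, Actual, and Days-to-fill columns of the cost slide and compares them with the bursty 10%/2day case. The function names are hypothetical, decimal units are assumed, and rack counts are taken from the cost slide rather than derived.

```python
# Assumptions: 1 PB = 1e15 bytes, 1 GB = 1e9 bytes, 8 bits per byte.
SECONDS_PER_DAY = 86400


def theoretical_gb_s(racks, gbit_per_rack=40):
    """Aggregate network bandwidth at 40 Gb/s per rack (4x 10 GbE)."""
    return racks * gbit_per_rack / 8


def actual_gb_s(racks, efficiency=0.1):
    """Achievable bandwidth as a fraction of the theoretical maximum."""
    return theoretical_gb_s(racks) * efficiency


def days_to_fill(capacity_pb, racks, efficiency=0.1):
    """Days needed to write the full capacity at the achievable bandwidth."""
    seconds = capacity_pb * 1e15 / (actual_gb_s(racks, efficiency) * 1e9)
    return seconds / SECONDS_PER_DAY


def burst_rate_gb_s(capacity_pb, fraction=0.10, days=2):
    """Read rate if a fraction of the data is accessed on only a few days."""
    return capacity_pb * 1e15 * fraction / (days * SECONDS_PER_DAY) / 1e9


for capacity, racks in ((5, 6), (20, 23), (50, 56)):
    print(
        f"{capacity} PB, {racks} racks: "
        f"{theoretical_gb_s(racks):.0f} GB/s theoretical, "
        f"{actual_gb_s(racks):.1f} GB/s actual, "
        f"{days_to_fill(capacity, racks):.0f} days to fill, "
        f"{burst_rate_gb_s(capacity):.1f} GB/s needed for 10%/2day"
    )
```

Under these assumptions the bursty case for 5 PB needs roughly 2.9 GB/s, about the same as the 3 GB/s the racks can actually deliver, so a very bursty access pattern uses up essentially all of the available bandwidth.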
13. References
• Why access to data matters, not just “dark storage”, but wide access to electronic data:
– The Internet Archive
– http://archive.org/about/
– History of the Internet, still online after 20 years
– http://www.cs.cmu.edu/~riedel/library/birthday.html (from April 2003, LoC workshop on Digital Preservation)
• What about Flash?
– Death of Disks (has been widely exaggerated)
– http://www.cs.cmu.edu/~riedel/#HECFSIO2011
– How to Build Big Storage as a Cloud
– http://storageconference.org/2012/Presentations/R00.Keynote.pdf
16. What About Tape?
• Tapes are not a commodity technology
• 2011 total worldwide market for tape cartridges is about 8m units (just under $1b annual revenue)
• Compare to the HDD business at 650m units in 2010 (close to $40b annual revenue)
• 80 disk drives are manufactured for each tape cartridge; robots are complicated
• Fits particular application segments very well, but is not a general-purpose solution
http://www.storagenewsletter.com/news/tapes/sccg-ww-tape-market-lto-1q11
http://techreport.com/discussions.x/20890
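The "80 disk drives for each tape cartridge" figure follows from the unit counts on this slide. The Python sketch below recomputes it, along with a rough revenue per unit. The function names are hypothetical, and because the slide gives revenue only as "just under $1b" and "close to $40b", the per-unit values are approximate upper bounds.

```python
# Market figures as quoted on the slide (tape: 2011, HDD: 2010).
TAPE_UNITS = 8e6
TAPE_REVENUE = 1e9  # "just under $1b", so treated as an upper bound
HDD_UNITS = 650e6
HDD_REVENUE = 40e9  # "close to $40b", so treated as an upper bound


def units_ratio(hdd_units, tape_units):
    """Disk drives manufactured per tape cartridge."""
    return hdd_units / tape_units


def revenue_per_unit(revenue, units):
    """Rough average revenue per unit sold."""
    return revenue / units


print(f"{units_ratio(HDD_UNITS, TAPE_UNITS):.0f} drives per cartridge")
print(f"about ${revenue_per_unit(TAPE_REVENUE, TAPE_UNITS):.0f} per cartridge")
print(f"about ${revenue_per_unit(HDD_REVENUE, HDD_UNITS):.0f} per drive")
```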
17. David Anderson, James Dykes, Erik Riedel. “SCSI vs. ATA - More than an interface”. 2nd Conference on File and Storage Technology (FAST). San Francisco, CA. April 2003.
www.cs.cmu.edu/~riedel/#SCSIvsATA