Virtual data provided by Delphix can eliminate data as a constraint in application development by enabling:
1) Fast provisioning of full-sized development databases in minutes from production data without moving large amounts of data. This allows development and testing to parallelize and find bugs earlier.
2) Self-service access to consistent, masked data for multiple use cases like development, security and cloud migration. Masking only needs to be done once before cloning databases.
3) Optimized data movement to the cloud through compression, encryption and replication of thin-cloned data sets roughly one third the size of full production databases. This improves cloud migration and enables active-active disaster recovery across sites.
Virtualized storage is fast becoming the new norm.
Nobody can justify provisioning non-production environments the way they have until now.
This presentation is about how Delphix removes the biggest bottleneck in IT operations, development, and QA by virtualizing data. It identifies the bottleneck and the impact on IT, then describes how Delphix removes it to enable DevOps continuous delivery.
Accelerating DevOps via Data Virtualization | Delphix
“Accelerating DevOps Using Data Virtualization,” presented at the Collaborate 2016 conference in Las Vegas. It discusses the inevitability of data virtualization and its many use cases.
11. Put your energy into the constraint
Top 5 constraints in IT:
1. Dev environment setup
2. QA setup
3. Code architecture
4. Development
5. Product management
- Gene Kim, from surveys of 14,000 companies and hundreds of CIOs
17. Development Pipeline for QA
[Chart: a 24-hour timeline for physical data. Each test run waits on a full data reset/refresh; refresh wait time consumes more than 80% of the cycle, leaving less than 20% for actual testing.]
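To make the chart concrete, here is the arithmetic as a minimal Python sketch; the 24-hour cycle and the >80% refresh share are taken from the slide, not measured here.

```python
# Arithmetic behind the chart: a 24-hour QA cycle in which more than 80%
# of the time goes to resetting/refreshing physical data (slide figures).
cycle_hours = 24
refresh_fraction = 0.80                   # share of the cycle spent refreshing
testing_hours = cycle_hours * (1 - refresh_fraction)
print(f"Testing time per cycle: {testing_hours:.1f} h "
      f"({1 - refresh_fraction:.0%} of the day)")   # -> 4.8 h (20% of the day)
```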
18. Data Management not Agile
• 20% of SDLC time lost waiting for data
• 60% of dev/QA time consumed by data tasks
Conclusion: data management does not scale to Agile.
- Infosys
Data is the Constraint
25. 2. Bad data leads to bugs: late stage bugs
Dev → QA → UAT → Production
26. 2. Bad data leads to bugs: late stage bugs
[Chart: number of bugs found at each stage, Dev through Production.]
27. 2. Bad data leads to bugs: late stage bugs
[Chart: cost to correct a defect rises steeply from Dev through Testing and UAT to Production. Source: Barry Boehm, Software Engineering Economics (1981).]
28. 3. Slow environment builds: delays
[Diagram: the hand-off chain for a new database environment. The developer asks for a DB; a manager approves; the DBA requests a system and sets up the DB; the system admin requests storage and sets up the machine; the storage admin allocates storage (takes a snapshot); only then does the developer get access.]
29. 3. Slow environment builds: delays. Why are hand-offs so expensive?
[Diagram: about 1 hour of actual work accumulates waits of 1 day and then 9 days as the request queues between teams.]
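A rough way to see why hand-offs dominate lead time is to add up work versus queue time across the chain. The per-step numbers below are assumptions chosen to match the slide's 1-hour/1-day/9-day contrast, not measurements:

```python
# Toy lead-time model for the environment-request chain above. Work and
# queue figures are illustrative assumptions, not measured values.
steps = [
    # (role, task, days of actual work, days queued before that team acts)
    ("Manager",       "approve request",  0.1, 1.0),
    ("DBA",           "set up database",  0.5, 3.0),
    ("System admin",  "set up machine",   0.5, 3.0),
    ("Storage admin", "allocate storage", 0.1, 2.0),
]

work = sum(w for _, _, w, _ in steps)
wait = sum(q for _, _, _, q in steps)
print(f"Actual work: {work:.1f} days; queue time: {wait:.1f} days")
print(f"Lead time: {work + wait:.1f} days "
      f"({wait / (work + wait):.0%} of it spent waiting)")
```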
51. Physical Data: late stage bugs
[Charts: with physical data, bugs discovered climb from a handful in Dev to several hundred in Production (y-axis to 500), while Boehm's cost-to-correct curve shows those late-stage bugs are the most expensive to fix.]
52. Physical Data: find bugs fast
[Chart: the same cost-to-correct curve, making the case for shifting bug discovery to the early, cheap stages (Dev and Testing).]
53. Virtual Data: Fast Refresh
[Chart: over the same 24-hour window, virtual data completes many refresh/test cycles (bookmark, reset in minutes) where physical data completes barely one.]
99% less downtime: quickly refresh • Version control: bookmark and branch • Data federation: sync across data sources
54. Virtual Data: Version Control
[Diagram: production time flow with dev branches at versions 2.1 and 2.2.]
Live-archive data for years:
• Archive EBS R11 before upgrade to R12
• Sarbanes-Oxley
• Dodd-Frank
• Financial stress tests
62. Traditional Protection: Network & Perimeter
Perimeter defense, endpoint defense, encryption, and network intrusion detection protect the edges, but the interior still needs protection.
“Organizations should use data masking to protect sensitive data at rest and in transit from insiders' and outsiders' attacks.”
- Gartner, Magic Quadrant for Data Masking Technology
63. Insider Threats Are Costly
Average annualized cybercrime cost, weighted by attack frequency (consolidated view, n = 252 companies):
• Botnets: $1,075
• Viruses, worms: $1,900
• Malware: $7,378
• Stolen devices: $33,565
• Malicious code: $81,500
• Phishing & social engineering: $85,959
• Web-based attacks: $96,424
• Denial of services: $126,545
• Malicious insiders: $144,542
- 2015 Global Cost of Cyber Crime Study, Ponemon Institute
64. [Slide contrasts the pains of physical masking with the goal: it costs more, quality is lower, it is hard to mask consistently, and moving data from prod to non-prod takes a long time, versus ease of use, instant data, and consistency.]
65. Virtual Data Masking
• Automates discovery
• Provides different masking algorithms for different data types
• Mask once, clone many with thin cloning
[Diagram: traditionally each masked copy costs an 18-hour physical clone plus a 6-hour mask; with thin cloning, data is masked once (about 4 hours) and each further masked clone takes about 15 minutes.]
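The payoff of mask-once/clone-many is easy to quantify. Using the timings on the slide (and an assumed need for five masked copies), a back-of-envelope comparison looks like this:

```python
# Back-of-envelope timings from the slide: traditionally each masked copy
# needs an 18 h physical clone plus a 6 h mask; with thin cloning, one 4 h
# mask is followed by 15-minute clones. N = 5 copies is an assumption.
N = 5
traditional_hours = N * (18 + 6)        # clone + mask, repeated per copy
virtual_hours = 4 + N * 0.25            # mask once, then thin clones
print(f"Traditional: {traditional_hours} h for {N} masked copies")
print(f"Mask once, clone many: {virtual_hours:.2f} h for {N} masked copies")
```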
80. 9 TB database, 1 TB of change per day, 30 days
[Chart: storage required (TB) over four weeks. Full physical Oracle copies grow toward 60-70 TB, while Delphix stays near the size of the compressed original.]
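The storage math behind the chart can be sketched as follows; the database size and change rate come from the slide, while the copy count and the roughly 3:1 compression ratio are assumptions based on figures elsewhere in the talk:

```python
# Storage needed for dev/test copies of a 9 TB database changing 1 TB/day.
db_tb, change_tb_per_day, days = 9, 1, 28
copies = 4                 # number of full-size copies needed (assumption)
compression = 3.0          # ~3:1 compression ratio (assumption)

physical_tb = copies * db_tb                    # every copy is a full image
virtual_tb = (db_tb + days * change_tb_per_day) / compression  # shared by clones

print(f"{copies} physical copies: {physical_tb} TB")
print(f"Compressed baseline + {days} days of changes, shared by {copies} "
      f"virtual clones: {virtual_tb:.1f} TB")
```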
81. RPO & RTO
• RPO: any time in the last 30 days, down to the second
• RTO: minutes, push-button
[Chart: storage (TB) over four weeks, Delphix retention vs. the original.]
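In other words, because changed blocks are captured continuously and retained for the whole window, any second in the last 30 days is a valid restore point. A minimal sketch of that recovery-point check (the dates are made up for the example):

```python
# RPO sketch: with a continuous 30-day retention window, any requested
# point in time inside the window can be provisioned, down to the second.
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)

def restorable(requested: datetime, now: datetime) -> bool:
    return now - RETENTION <= requested <= now

now = datetime(2016, 4, 12, 9, 30, 0)
print(restorable(datetime(2016, 3, 20, 14, 59, 59), now))  # True: inside window
print(restorable(datetime(2016, 1, 1, 0, 0, 0), now))      # False: outside window
```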
87. Virtual Data Quotes
• Projects went “12 months to 6 months.” - New York Life
• Insurance product went “about 50 days ... to about 23 days.” - Presbyterian Health
• “Can't imagine working without it.” - State of California
90. “A database refresh in 15 minutes? That is mind blowing! Delphix nailed it for us.”
- Matt Lawrence, Sr Director, Wind River (Intel)
“It took 3 weeks to build a dev environment; now with Delphix it takes less than a day, and the DB part is less than 15 minutes.”
- Marty Boos, StubHub (eBay)
“Delphix goes beyond storage. Delphix is so much more than we thought it was.”
- Michael Brown, State of Colorado
91. “Worth investing in this product: the technology is strong and the value prop is high.”
- Deloitte
“I'm convinced by Delphix's technology. Delphix can really increase the quality of Dev/QA.”
- Oaktable Member
“Delphix allows us to move fast and set up database copies in seconds. Delphix is powerful and allowed us to scale from 2 projects to 11. We need Delphix to scale our agile environment.”
- Tim Campos, CIO, Facebook
92. The Goal: eliminate the constraint
“Improvement not made at the constraint is an illusion.”
- Theory of Constraints
“If you look at what's really impeding flow from development to operations to the customer, it's typically IT operations. Operations can never deliver environments on demand; you have to wait months or quarters to get a test environment. When that happens, terrible things happen: people hoard environments, they invite people onto their teams because they know they have a reputation for having a cluster of test environments, and people end up testing on environments that are years old, which doesn't actually achieve the goal.
One of the most powerful things that organizations can do is to enable development and testing to get the environments they need when they need them.”
One of the best predictors of DevOps performance is that IT Operations can make environments available on demand to Development and Test, so that they can build and test the application in an environment that is synchronized with Production.
Eliyahu Goldratt
IT bottlenecks
Setting priorities
Company goals
Defining metrics
Fast iterations
The IT version of “The Goal” by E. Goldratt
“One of the most powerful things that organizations can do is to enable development and testing to get the environments they need when they need them.”
What happens now in the industry? Typically the application development life cycle is something like this:
We have some production database with production applications running on top of it, and we have developers either customizing that application or writing new functionality for it.
We need copies of that data to make sure our code runs correctly when it gets to production.
We have teams of people (DBAs, sys admins, storage admins, etc.) making these copies. It's slow work to copy all this data. It's tedious work. All the while we have developers and QA testers waiting for these copies.
1. Not enough resources: contention on shared environments; lack of enough environments.
2. Late-stage bug discovery: faulty data leading to bugs (subsets, synthetic data, old data).
3. Slow environment builds: delays; developers waiting; QA slow and expensive.
Not sure if you've run into this, but I have personally experienced the following.
When I was talking to one development group at eBay, they shared a single copy of the production database between the developers on that team. Sharing a single copy of production meant that whenever a developer wanted to modify that database, they had to submit their changes to code review, and that code review took 1 to 2 weeks. I don't know about you, but that kind of delay would stifle my motivation, and I have direct experience with the kind of disgruntlement it can cause.
When I was last a DBA, all schema changes went through me. It took me about half a day to process schema changes. That delay was too much, so the developers unilaterally decided to go to an EAV (entity-attribute-value) schema, which meant they could add new fields without consulting me and without stepping on each other's feet. It also meant the SQL code was unreadable and performance was atrocious.
Besides creating developer frustration, sharing a database also makes refreshing the data difficult: it takes a while to refresh the full copy, and even longer to coordinate a time when everyone stops using the copy so the refresh can happen. The result is that the copy rarely gets refreshed and the data gets old and unreliable.
To circumvent the problems of sharing a single copy of production, many shops we talk to create subsets. One company we talked to spends 50% of its time copying databases; they have to subset because there isn't enough storage, and the subsetting process constantly needs fixing and modification.
Now what happens when developers use subsets?
We talked to Presbyterian Healthcare, and they told us that they spend 96% of their QA cycle time building the QA environment and only 4% actually running the QA suite. This happens for every QA run, meaning that for every dollar spent on QA there are only 4 cents of actual QA value; the other 96% is infrastructure time and overhead.
Internet vs browser
Automate or die – the revolution will be automated
The worst enemy of companies today is thinking that they have the best processes that exist, that their IT organizations are using the latest and greatest technology, and that nothing better exists in the field. This mentality will be the undoing of many companies.
http://www.kylehailey.com/automate-or-die-the-revolution-will-be-automated/
Data IS the constraint
Business skeptics are saying to themselves that data processes are just a rounding error in most of their project timelines, and that they are sure their IT has developed processes to fix that. That's the fundamental mistake. The very large and often hidden data tax lies in all the ways that we've optimized our software, data protection, and decision systems around the expectation that data is simply not virtual. The belief that there is no agility problem is part of the problem.
http://www.kylehailey.com/data-is-the-constraint/
Due to the constraints of building clone-copy database environments, one ends up in a “culture of no,” where developers stop asking for a copy of a production database because the answer is always “no.” If developers need to debug an anomaly seen on production, or need to write a custom module that requires a copy of production, they know not to even ask, and just give up.
The fastest query is the query not run.
In the physical database world, 3 clones take up 3x the storage. In the virtual world, 3 clones take up 1/3 the storage, thanks to block sharing and compression. Delphix radically changes this paradigm.
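The reason clones are so cheap is copy-on-write block sharing. Delphix's file system is proprietary, but the underlying idea can be shown in a few lines of Python: clones share the parent's blocks and store only the blocks they overwrite.

```python
# Toy copy-on-write store: each clone keeps only its own modified blocks,
# so three clones cost a fraction of one full physical copy.
class ThinClone:
    def __init__(self, parent_blocks):
        self.parent = parent_blocks      # shared, read-only baseline
        self.delta = {}                  # private copies of modified blocks

    def read(self, i):
        return self.delta.get(i, self.parent[i])

    def write(self, i, data):
        self.delta[i] = data             # copy-on-write: divergence stays local

baseline = {i: f"block{i}" for i in range(100_000)}   # the one full copy
clones = [ThinClone(baseline) for _ in range(3)]
clones[0].write(42, "changed")
print(clones[0].read(42), clones[1].read(42))   # 'changed' 'block42'
print(sum(len(c.delta) for c in clones), "private blocks across 3 clones")
```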
Delphix is software that we provide as a virtual machine OVA file that you spin up on any commodity Intel hardware. You give it any storage, and Delphix maps its own proprietary file system onto that storage. Through the web UI you can point it at any database or data source, such as Oracle, SQL Server, Sybase, Postgres, flat files, etc.
At link time we take one full copy; we only do it once and never again. We compress the data, so if the data is 3 TB on the source it will be about 1 TB on Delphix. From then on, forever, we just pull in the changed blocks. With the changed blocks, Delphix builds up a timeline of data versions. The default window is 2 weeks, but you can configure it to be 2 months or 2 years, and you can spin up a copy of the data, down to the second, at any point in that time window.
Now, with a few clicks of a mouse and in a few minutes, we can spin up copies on developer machines, QA machines, UAT, etc. When we make copies, no data is moved; we just point the copies to data that already exists on Delphix. There is no data on the target machines; all the data is on Delphix, which looks like a NAS or NFS file server to the target machines. We give them a read-writeable point-in-time snapshot of the data.
We also track all the block changes on the virtual databases. With that block change tracking we can do cool things like roll them back, branch them, version them, share them, and bookmark the data.
All this is super simple to run; Delphix can generally be run by a junior DBA in a quarter of their time. The coolest thing, especially for a DevOps process, is the self-service interface for developers and testers, where they can refresh data from production, roll back changes, and bookmark and share data between dev and QA. We can treat data the way we treat code.
For example, StubHub went from 5 copies of production in development to 120, giving each developer their own copy. StubHub estimated a 20% reduction in bugs that made it to production.
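The “timeline of data versions” works the same way: after the one initial sync, only changed blocks are captured, and a virtual copy can be materialized as of any captured moment. A hypothetical sketch of the idea (the names and structure are illustrative, not Delphix's implementation):

```python
# Toy "time flow": a baseline plus timestamped change sets; provisioning a
# copy as of time ts replays changes up to ts onto the baseline.
import bisect

class TimeFlow:
    def __init__(self, baseline):
        self.baseline = dict(baseline)
        self.changes = []                 # (timestamp, {block: data}), in order

    def capture(self, ts, changed_blocks):
        self.changes.append((ts, dict(changed_blocks)))

    def provision(self, ts):
        """Return the data image as of ts (baseline + changes up to ts)."""
        image = dict(self.baseline)
        cutoff = bisect.bisect_right([t for t, _ in self.changes], ts)
        for _, delta in self.changes[:cutoff]:
            image.update(delta)
        return image

tf = TimeFlow({1: "a", 2: "b"})
tf.capture(100, {2: "b2"})
tf.capture(200, {1: "a2"})
print(tf.provision(150))   # {1: 'a', 2: 'b2'}: the state as of t = 150
```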
Slowdowns mean bottlenecks.
Physically independent but logically correlated.
Cloning multiple source databases at the same point in time can be a daunting task. One example from our customers is Informatica, who had a project to integrate 6 databases into one central database. The project was estimated at 12 months, with much of that time coming from orchestrating copies of the 6 databases at the same point in time. Like herding cats.
Walmart.com
Informatica had a 12-month project to integrate 6 databases. After installing Delphix they did it in 6 months: “I delivered this early. I generated more revenue. I freed up money and put it into innovation.” They won an award from Ventana Research for this project.
Data masking should be a budgeted item in enterprise IT spending. JP Morgan, joined by other banks and major companies, is going to spend a large amount on cybersecurity, yet still doesn't feel that this sum is enough. Why is that?
Traditional security is network security, a.k.a. perimeter defense; it keeps the exterior protected. It is enhanced by endpoint defense, which locks down phones and laptops in this era of bring-your-own-device (BYOD). That said, organizations are taking increasingly longer to detect network and system intrusions: according to a Trustwave survey, in 80% of cases an external party informed the company of the breach.
That's why it's so important to protect the interior, i.e., the data itself. As an analogy, perimeter security is like building castle walls, but protecting the interior means strong body armor for all of the knights you send out onto the open battlefield.
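For illustration, masking at the data layer often uses deterministic pseudonymization, so the same input always masks to the same output and joins across tables stay intact. This is a generic sketch of that technique, not Delphix's specific algorithm set; the salt and name list are made up:

```python
# Deterministic masking sketch: hash-based substitution keeps referential
# integrity (the same real value always maps to the same masked value).
import hashlib

FAKE_NAMES = ["Alice", "Bob", "Carol", "Dave", "Erin", "Frank"]

def mask_name(real: str, salt: str = "per-project-secret") -> str:
    h = hashlib.sha256((salt + real).encode()).digest()
    return FAKE_NAMES[h[0] % len(FAKE_NAMES)]

def mask_ssn(real: str, salt: str = "per-project-secret") -> str:
    h = hashlib.sha256((salt + real).encode()).hexdigest()
    d = "".join(str(int(c, 16) % 10) for c in h[:9])
    return f"{d[:3]}-{d[3:5]}-{d[5:]}"

print(mask_name("Kyle"), mask_ssn("123-45-6789"))
print(mask_name("Kyle") == mask_name("Kyle"))   # True: deterministic
```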
Unshackle yourself from massive infrastructure drag and bureaucratic quagmires, and put a jetpack on your IT organizations and application development projects.
Moving the data IS the big gorilla. Eliminating the data tax is crucial to the success of your company, and if huge databases can be ready at target data centers in minutes, the rest of the excuses are flimsy.
Virtual data, i.e., virtualized data, uses a small footprint. A truly virtual data platform can deliver full-size datasets cheaper than subsets; it can move the time or location pointer on its data very rapidly; and it can store any version that's needed in a library at an unbelievably low cost. It can also massively improve app quality by making it reliable and dead simple to return to a common baseline for one or many databases in a very short amount of time.
Applications delivered with agile data can afford a lot more full-size virtual copies, eliminating the wait time and extra work caused by sharing, as well as its side effects. With the cost of data falling so dramatically, businesses can radically increase their utilization of existing hardware and storage, delivering much more rapidly without any additional cost. An agile data platform presents data so rapidly and reliably that the data becomes commoditized, and servers that sit idle because it would take too long to rebuild them can now switch roles on demand.
Now let's look at Delphix Data as a Service. With Delphix and Data as a Service, provisioning copies of data becomes push-button functionality that finishes in minutes. How does this work?
Delphix is provided as software: a virtual machine. The Delphix virtual machine manages storage and maps its own advanced specialized file system onto that storage; Delphix can be used with any storage, such as EMC, NetApp, Fujitsu, JBODs, etc. Once Delphix is installed and has been allocated storage, it can be pointed at a data source. Once, and only once, Delphix pulls in a full copy of the data source and compresses it. From then on, Delphix just pulls in the changed data blocks and stores them, creating a timeline of data.
From that timeline, clone copies of production can be spun up in minutes on target machines. The clones can be made at any point in time in the Delphix time flow, down to the second. Each clone is, for all intents and purposes, a completely independent, full-size, read/write copy of production. Delphix can typically be managed by a single person in just a fraction of their time.
Delphix provides a special developer-centric self-service interface for developers and QA, where developers can provision their own copies of data and have access to typical developer features such as rollback, bookmark, branching, and refresh.
Icon made by Freepik (http://www.freepik.com) from www.flaticon.com, licensed under CC BY 3.0 (http://creativecommons.org/licenses/by/3.0/).