This document discusses the challenges of building an optimal data management platform that can leverage on-demand hardware resources. It summarizes the CAP theorem, which states that a distributed system cannot simultaneously provide consistency, availability, and partition tolerance. The document introduces Pivotal's solution, called the Enterprise Data Fabric (EDF), which is designed to mine the gap between strong consistency and availability. The EDF uses service entities, membership roles, and configurable consistency levels to optimize for consistency and availability based on data and workflow requirements. It exploits parallelism and caches data to improve performance across distributed and global deployments.
White Paper
The Hardest Problems in Data Management
goPivotal.com

Table of Contents
Introduction
What's So Hard
Eventually Consistent Solutions Are Not Always Viable
Our Solution: The EDF
  Mining the Gap in CAP
  Have It All (Just Not at Once)
  Amortizing CAP For High Performance
  Exploiting Parallelism
  Active Caching: Automatic, Reliable Push Notifications for Event-Driven Architectures
  Spanning The Globe
  Memory-Centric Performance
Elastic Scalability
Tools, Debugging, Testing, Support
Conclusion
References

Introduction
Modern hardware trends and economics [1] combined with cloud/virtualization technology [2] are radically reshaping today's data management landscape, ushering in a new era where many-machine, many-core, memory-based computing topologies can be dynamically assembled from existing IT resources and pay-as-you-go clouds.

Arguably one of the hardest problems (and consequently, most exciting opportunities) faced by today's solution designers is figuring out how to leverage this on-demand hardware to build an optimal data management platform that can:
• Exploit memory and machine parallelism for low latency and high scalability
• Grow and shrink elastically with business demand
• Treat failures as the norm, rather than the exception, providing always-on service
• Span time zones and geography, uniting remote business processes and stakeholders
• Support pull, push, transactional, and analytic-based workloads
• Increase cost-effectiveness with service growth
What’s So Hard
A data management platform that dynamically runs across
many machines requires — as a foundation — a fast, scalable,
fault-tolerant distributed system. It is well-known, however, that
distributed systems have unavoidable trade-offs and notoriously
complex implementation challenges [3,4]. Introduced at PODC
2000 by Eric Brewer [5] and formally proven by Seth Gilbert
and Nancy Lynch in 2002 [6], the CAP Theorem, in particular,
stipulates that it is impossible for a distributed system to be
simultaneously: Consistent, Available, and Partition-Tolerant. At
any given time, only two of these three desirable properties can
be achieved. Hence when building distributed systems, design
trade-offs must be made.
Eventually Consistent Solutions
Are Not Always Viable
Driven by awareness of CAP limitations, a popular design
approach for scaling Web-oriented applications is to anticipate
that large-scale systems will inevitably encounter network
partitions and to therefore relax consistency in order to allow
services to remain highly available even as partitions occur [7].
Rather than go down and interrupt user service upon network
outages, many Web user applications (e.g., customer shopping carts, e-mail search, social network queries) can tolerate stale data and adequately merge conflicts, possibly guided by help from end users once partitions are fixed. Several so-called
eventually consistent platforms designed for partition-tolerance
and availability—largely inspired by Amazon, Google, and
Microsoft—are available as products, cloud-based services, or
even derivative, open source community projects.
In addition to these prior-art approaches by Web 2.0 vendors, GemStone's perspective on distributed system design is further illuminated by 25 years of experience with customers in the world of high finance. Here, in stark contrast to Web user-oriented applications, financial applications are highly automated and consistency of data is paramount. Eventually consistent solutions are not an option. Business rules do not permit relaxed consistency, so invariants like account trading balances must be enforced by strong transactional consistency semantics. In application terms, the cost of an apology [8] makes rollback too expensive, so many financial applications must limit workflow exposure, traditionally by relying on OLTP systems with ACID guarantees.
But just like Web 2.0 companies, financial applications also need
highly available data. So in practice, financial institutions must
“have their cake and eat it too”.
So given CAP limitations, “How is it possible to prioritize
consistency and availability yet also manage service interruptions
caused by network outages?”
The intersection between strong consistency and availability (with some assurance of partition tolerance) identifies a challenging design gap in the CAP-aware solution space. Eventually consistent approaches are a nonstarter. Distributed databases using two-phase commit provide transactional guarantees, but only so long as all nodes can see each other. Quorum-based approaches provide consistency but block service availability during network partitions.
Collaboratively designed with customers over the last five
years, GemStone has developed practical ways to mine this
gap, delivering an off-the-shelf, fast, scalable, and reliable data
management solution we call the enterprise data fabric (EDF).
Our Solution: The EDF
Any choice of distributed system design (shared-memory, shared-disk, or shared-nothing architecture) has inescapable
CAP trade-offs with downstream implications on data
correctness and concurrency. High-level design choices also
impact throughput and latency as well as programmability
models, potentially imposing constraints on schema flexibility and
transactional semantics. Guided by distributed systems theory
[9], industry trends in parallel/distributed databases [10], and
customer experience, the EDF solution adopts a shared-nothing
scalability architecture where data is partitioned onto nodes
connected into a seamless, expandable, resilient fabric capable
of spanning process, machine, and geographical boundaries. By
simply connecting more machine nodes, the EDF scales data
storage horizontally. Within a data partition (not to be confused
with network partitions), data entries are key/value pairs with
thread-based, read-your-writes [7] consistency. The isolation of
data into partitions creates a service-oriented design pattern
where related partitions can be grouped into abstractions
called service entities [11]. A service entity is deployed on a
single machine at a time where it owns and manages a discrete
collection of data — a holistic chunk of the overall business
schema — hence, multiple data entries co-located within a
service entity can be queried and updated transactionally,
independent of data within another service entity.
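To make the shape of the data concrete, here is a minimal sketch in Java using the GemFire Enterprise caching API that underlies the EDF; CacheFactory, RegionFactory, and RegionShortcut are GemFire classes, while the region name, keys, and values are invented for illustration:

    // Sketch only: assumes the GemFire Enterprise Java caching API
    // (com.gemstone.gemfire.cache.*); region name, keys, and values are illustrative.
    import com.gemstone.gemfire.cache.Cache;
    import com.gemstone.gemfire.cache.CacheFactory;
    import com.gemstone.gemfire.cache.Region;
    import com.gemstone.gemfire.cache.RegionShortcut;

    public class OrderEntityExample {
        public static void main(String[] args) {
            // Join the fabric as a peer; "mcast-port=0" keeps the example standalone.
            Cache cache = new CacheFactory().set("mcast-port", "0").create();

            // Partitioned region: key/value entries are spread horizontally across
            // whichever member nodes are connected to the fabric.
            Region<String, String> orders = cache
                .<String, String>createRegionFactory(RegionShortcut.PARTITION)
                .create("orders");

            // Entries that make up one service entity (here, one order's line items);
            // with a custom PartitionResolver (shown later in the paper) they can be
            // co-located on a single node and updated transactionally as a unit.
            orders.put("order-42:item-1", "widget x 3");
            orders.put("order-42:item-2", "gadget x 1");
            System.out.println(orders.get("order-42:item-1"));
        }
    }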
Mining the Gap in CAP
From a CAP perspective, service entities enable the EDF to
exploit a well-known approach to fault tolerance based on partial-
failure modes, fault isolation, and graceful degradation of service
[12]. Service entities allow application designers to demarcate
units of survivability in the face of network faults. So rather than
striving for complete partition-tolerance (an impossible result)
when consistency and availability must be prioritized, the EDF
implements a relaxed, weakened form of partition-tolerance that
isolates the effects of network failures enabling the maximum
number of services to remain fully consistent and available when
network splits occur.
[Figure: CAP triangle (Consistency, Availability, Partition Tolerance) positioning GemFire Enterprise relative to quorum algorithms and eventually consistent stores such as SimpleDB, Dynamo, Cassandra, and CouchDB.]

By configuring membership roles (reflecting service entity responsibilities) for each member of the distributed system, customers can orthogonally (in a light-weight, non-invasive, declarative manner) encode business semantics that give the EDF the ability to decide under what circumstances a member node can safely continue operating after a disruption caused by network failure. Membership roles perform application decomposition by encapsulating the relationship of one system member to another, expressing causal data dependencies, message stability rules (defining what attendant members must be reachable), loss actions (what to do if required roles are absent), and resumption actions (what to do when required roles rejoin). For example, membership roles can identify independent subnetworks in a complex workflow application. So long as interdependent producer and consumer activities are present after a split, reads and writes to their data partitions can safely continue. Rather than arbitrarily disabling an entire losing side during network splits, the EDF consults fine-grained knowledge of service-based role dependencies and keeps as many services consistently available as possible.
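The paper does not show the configuration syntax for membership roles. As a rough, assumption-heavy sketch, GemFire releases of this era exposed required roles, loss actions, and resumption actions through region membership attributes; the class names below (AttributesFactory, MembershipAttributes, LossAction, ResumptionAction) reflect that API, but the role name and the chosen actions are invented and should be checked against the product documentation for the release in use:

    // Rough sketch of declaring role dependencies for a region.
    // Assumes GemFire's membership-attributes feature of this era; the role name
    // ("order-producer") and the loss/resumption choices are illustrative only.
    import com.gemstone.gemfire.cache.AttributesFactory;
    import com.gemstone.gemfire.cache.LossAction;
    import com.gemstone.gemfire.cache.MembershipAttributes;
    import com.gemstone.gemfire.cache.RegionAttributes;
    import com.gemstone.gemfire.cache.ResumptionAction;

    public class RoleDependencyExample {
        static RegionAttributes<String, String> orderRegionAttributes() {
            AttributesFactory<String, String> af = new AttributesFactory<String, String>();
            // This region stays fully usable only while the "order-producer" role is
            // reachable; otherwise access is limited until that role rejoins.
            af.setMembershipAttributes(new MembershipAttributes(
                    new String[] { "order-producer" },   // required roles
                    LossAction.LIMITED_ACCESS,           // what to do if the role is absent
                    ResumptionAction.REINITIALIZE));     // what to do when the role rejoins
            return af.create();
        }
    }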
Have It All (Just Not at Once)
CAP requirements and data mutability requirements can evolve
as data flows across space and time through business processes.
So all data is not equal [13]: as data is processed by separate
activities in a business workflow, consistency, availability, and
partition-tolerance requirements change. For example, in an
e-commerce site, acquisition of shopping cart items is prioritized
as a high-write availability scenario, i.e., an e-retailer wants
shopping cart data to be highly available even at the cost of
weakened consistency, lest service interruption stalls impulse
purchases, or worse yet, ultimately forces customers to visit
competitor store fronts. Once items are placed in the shopping
cart and the order is clicked, automated order fulfillment
workflows reprioritize for data consistency (at the cost of
weakened availability) since no live customer is waiting for the
next Web screen to appear; if an activity blocks, another segment
of the automated workflow can be alternatively launched, or the
activity can be retried. In addition to changing CAP requirements,
data mutability requirements also change: e.g., at the front end
of a workflow, data may be write-intensive; whereas afterward,
during bulk processing, the same captured data may be accessed
in read-only or read-mostly fashion.
Our EDF solution exploits this changing nature of data by flexibly configuring consistency, availability, and partition-tolerance trade-offs depending on where and when data is processed in application workflows. Thus, businesses can achieve all three CAP properties, but at different application locations and times.
Each logical unit of EDF data sharing, called a region, may be individually configured for synchronous or asynchronous state machine replication, persistence, and stipulations for N (number of replicas), W (number of writers for message stability), and R (number of replicas for servicing a read request).
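As a rough illustration of this per-region configurability (not taken from the paper), the GemFire API lets one region be synchronously replicated while another is partitioned with disk persistence and a redundant in-memory copy, which plays the role of the N stipulation above; the region names and values are invented:

    // Sketch: per-region consistency/availability trade-offs in the GemFire API.
    // Region names are illustrative; redundant copies stand in for the paper's "N".
    import com.gemstone.gemfire.cache.Cache;
    import com.gemstone.gemfire.cache.CacheFactory;
    import com.gemstone.gemfire.cache.PartitionAttributesFactory;
    import com.gemstone.gemfire.cache.Region;
    import com.gemstone.gemfire.cache.RegionShortcut;

    public class RegionConfigExample {
        public static void main(String[] args) {
            Cache cache = new CacheFactory().set("mcast-port", "0").create();

            // Synchronously replicated region: every member holds a full copy, and
            // writes are acknowledged by peers before the call returns.
            Region<String, String> referenceData = cache
                .<String, String>createRegionFactory(RegionShortcut.REPLICATE)
                .create("referenceData");

            // Partitioned, disk-persistent region with one redundant in-memory copy
            // of every entry on another member.
            Region<String, String> orders = cache
                .<String, String>createRegionFactory(RegionShortcut.PARTITION_REDUNDANT_PERSISTENT)
                .setPartitionAttributes(new PartitionAttributesFactory<String, String>()
                    .setRedundantCopies(1)
                    .create())
                .create("orders");

            referenceData.put("EUR/USD", "1.09");
            orders.put("order-1", "pending");
        }
    }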
Amortizing CAP For High Performance
In addition to tuning CAP properties, this system configurability
lets designers selectively amortize the cost of fault-tolerance
throughout an end-to-end system architecture resulting in
optimal throughput and latency.
For example, Wall Street trading firms with extreme low-latency
requirements for data capture can configure order matching
systems to run completely in-memory (with synchronous,
yet very fast, redundancy to another in-memory node)
while providing disaster recovery with persistent (on-disk),
asynchronous data replication and eventual consistency to a
metropolitan area network across the river in New Jersey. To
maximize performance for competitiveness, risks of failure
are spread out and fine-tuned across the business workflow
according to failure probability: the common failure of single
machine nodes is offset by intra-datacenter, strongly consistent
HA memory backups; while the more remote possibility of
a complete data center outage event is offset by a weakly
consistent, multi-homed, disk-resident disaster recovery backup.
Exploiting Parallelism
A design goal of the EDF is to optimize performance by exploiting
parallelism and concurrency wherever and whenever possible.
For example, the EDF exploits many-core thread-level parallelism
by managing entries in highly concurrent data structures and
utilizing service pools for computation and network I/O. The
EDF employs partitioned and pipelined parallelism to perform
distributed queries, aggregation, and internal distribution
operations on behalf of user applications. With data partitioned
among multiple nodes, a query can be replicated to many
independent processors, each working on a small part of the
overall query. Similarly, business data schemas and workloads
frequently consist of multi-step operations on uniform (e.g.,
sequential time series) data streams. These operations can be
composed into parallel dataflow graphs where the output of one
operator is streamed into the input of another; hence multiple
operators can work continuously in task pipelines.
External applications can use the Function Service API to create
Map/Reduce programs that run in parallel over data in the EDF.
Rather than data flowing from many nodes into a single client
application, control (Runnable functions) logic flows from the
application to many machines in the EDF. This dramatically
reduces process execution time as well as network bandwidth.
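The Function Service is the GemFire API referenced here. The sketch below shows roughly how a data-aware function might be shipped to the members owning the relevant partitions; the function logic, region contents, and filter key are invented for illustration:

    // Sketch: shipping control logic to the data with the GemFire Function Service.
    // Function id, region name, and keys are illustrative.
    import java.util.Collections;
    import com.gemstone.gemfire.cache.Region;
    import com.gemstone.gemfire.cache.execute.FunctionAdapter;
    import com.gemstone.gemfire.cache.execute.FunctionContext;
    import com.gemstone.gemfire.cache.execute.FunctionService;
    import com.gemstone.gemfire.cache.execute.RegionFunctionContext;
    import com.gemstone.gemfire.cache.execute.ResultCollector;
    import com.gemstone.gemfire.cache.partition.PartitionRegionHelper;

    public class RiskFunction extends FunctionAdapter {
        @Override
        public void execute(FunctionContext context) {
            // Runs on each member hosting a targeted partition; work only on the
            // local slice of the data instead of pulling it across the network.
            RegionFunctionContext rfc = (RegionFunctionContext) context;
            Region<String, Double> localTrades = PartitionRegionHelper.getLocalDataForContext(rfc);
            double exposure = 0.0;
            for (Double notional : localTrades.values()) {
                exposure += notional;
            }
            context.getResultSender().lastResult(exposure);
        }

        @Override
        public String getId() {
            return "riskFunction";
        }

        // Caller side: route the function to the nodes that own the given key.
        static Object runFor(Region<String, Double> trades, String instrument) {
            ResultCollector<?, ?> results = FunctionService.onRegion(trades)
                .withFilter(Collections.singleton(instrument))
                .execute(new RiskFunction());
            return results.getResult();   // partial results aggregated from each node
        }
    }

The caller receives only the small per-node partial results rather than the underlying data, which is where the execution-time and bandwidth savings described above come from.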
To further assist parallel function execution, data partitions
can be tuned dynamically by all applications using the Partition
Resolver API. By default, the EDF uses a hashing policy where a
data entry key is hashed to compute a random bucket mapped
to a member node. The physical location of the key-value pair
is virtualized from the application. Custom partitioning, on the
other hand, enables applications to co-locate related data entries
together to form service entities. For example, a financial risk
analytics application can co-locate all trades, risk sensitivities,
and reference data associated with a single instrument in the
same region. Using the Function Service, control flow can then
be directly routed to specific nodes holding particular data
sets; hence queries can be localized and then aggregated in
parallel increasing the speed of execution when compared to a
distributed query.
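The Partition Resolver API mentioned above is a GemFire interface; a minimal sketch of routing all entries for one instrument to the same bucket (the TradeKey type and the instrument-based routing choice are illustrative, not from the paper):

    // Sketch: co-locating all entries for one instrument on the same node by
    // returning the instrument as the routing object. TradeKey is an invented key type.
    import java.io.Serializable;
    import com.gemstone.gemfire.cache.EntryOperation;
    import com.gemstone.gemfire.cache.PartitionResolver;

    public class InstrumentPartitionResolver implements PartitionResolver<TradeKey, Object> {
        @Override
        public Object getRoutingObject(EntryOperation<TradeKey, Object> opDetails) {
            // Entries whose routing objects are equal hash to the same bucket, so all
            // data for one instrument can be queried and aggregated locally.
            return opDetails.getKey().getInstrument();
        }

        @Override
        public String getName() {
            return "InstrumentPartitionResolver";
        }

        @Override
        public void close() {
        }
    }

    class TradeKey implements Serializable {
        private final String instrument;
        private final String tradeId;

        TradeKey(String instrument, String tradeId) {
            this.instrument = instrument;
            this.tradeId = tradeId;
        }

        String getInstrument() {
            return instrument;
        }
    }

The resolver would be attached to the region through PartitionAttributesFactory.setPartitionResolver, after which trades, sensitivities, and reference data keyed by the same instrument land together and can be targeted by the Function Service shown earlier.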
Server machines can also be organized into server groups —
aligned on service entity functionality — to provide concurrent
access to shared data on behalf of client application nodes.
Active Caching: Automatic, Reliable Push Notifications
for Event-Driven Architectures
EDF client applications can then create active caches to eliminate
latency. If application requests do not find data in the client
cache, the data is automatically fetched from the EDF by the
server. As modifications to shared regions occur, these updates
are automatically pushed to clients where applications receive
real-time event notifications. Likewise, modifications initiated by a
client application are sent to the server and then pushed to other
applications listening for region updates.
Caching eliminates the need for polling, since updates are pushed to client applications as real-time events. In addition to subscribing to all events on a data region, clients can subscribe to events using key-based regular expressions or continuous queries based on query language predicates. Clients can create semantic views, or narrow slices of the entire EDF, that act as durable, stateful pub/sub messaging topics specific to application use cases. Collectively, these views drive an enterprise-wide, real-time event-driven architecture.
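Continuous queries are part of the GemFire client API; a minimal sketch of a client registering a predicate-based subscription (the server address, region name, query predicate, and listener behavior are invented for illustration):

    // Sketch: a client-side continuous query that turns a narrow slice of the EDF
    // into a push-based event stream. Region name and predicate are illustrative.
    import com.gemstone.gemfire.cache.client.ClientCache;
    import com.gemstone.gemfire.cache.client.ClientCacheFactory;
    import com.gemstone.gemfire.cache.query.CqAttributesFactory;
    import com.gemstone.gemfire.cache.query.CqEvent;
    import com.gemstone.gemfire.cache.query.CqListener;
    import com.gemstone.gemfire.cache.query.CqQuery;
    import com.gemstone.gemfire.cache.query.QueryService;

    public class LargeTradeAlerts {
        public static void main(String[] args) throws Exception {
            // Connect as a client with server-to-client subscriptions enabled.
            ClientCache cache = new ClientCacheFactory()
                .addPoolServer("localhost", 40404)
                .setPoolSubscriptionEnabled(true)
                .create();

            QueryService queryService = cache.getQueryService();
            CqAttributesFactory cqf = new CqAttributesFactory();
            cqf.addCqListener(new CqListener() {
                public void onEvent(CqEvent event) {
                    // Fired in real time as matching entries are created or updated.
                    System.out.println("Large trade event: " + event.getNewValue());
                }
                public void onError(CqEvent event) {
                }
                public void close() {
                }
            });

            CqQuery alerts = queryService.newCq(
                "largeTrades",
                "SELECT * FROM /trades t WHERE t.notional > 1000000",
                cqf.create());
            alerts.execute();   // stateful, push-based subscription; no polling required
        }
    }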
Spanning The Globe
To accommodate wider-scale topologies beyond client/server,
the EDF spans geographical boundaries using WAN gateways
that allow businesses to orchestrate multi-site workflows.
By connecting multiple distributed systems together, system
architects can create 24x7 global workflows wherein each distinct
geography/city acts as a service entity that owns and distributes
its regional chunk of business data. Sharing of data across
cities — e.g., to create a global book for coordinating trades
between exchanges in New York, London, and Tokyo — can be
done via asynchronous replication over persistent WAN queues. Here, consistency of data is weakened and eventual, but with the positive trade-off of high application availability (the entire global
service won’t be disrupted if one city goes dark) and low-latency
business processing at individual locations.
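The paper describes WAN gateways only at the architecture level, and GemFire releases of this period configured them through gateway hubs. As a loose, assumption-heavy illustration, later GemFire/Geode releases express the same idea through gateway senders and receivers, roughly as below; the distributed-system ids, locator address, and region name are invented:

    // Loose sketch only: multi-site replication in later GemFire/Geode terms.
    // Ids, ports, and names are invented for illustration.
    import com.gemstone.gemfire.cache.Cache;
    import com.gemstone.gemfire.cache.CacheFactory;
    import com.gemstone.gemfire.cache.Region;
    import com.gemstone.gemfire.cache.RegionShortcut;
    import com.gemstone.gemfire.cache.wan.GatewaySender;

    public class GlobalBookSite {
        public static void main(String[] args) {
            // New York site (distributed-system-id 1) knows how to reach London (id 2).
            Cache cache = new CacheFactory()
                .set("distributed-system-id", "1")
                .set("remote-locators", "ldn-locator[10334]")
                .create();

            // Asynchronous, persistent WAN queue toward the London site.
            GatewaySender toLondon = cache.createGatewaySenderFactory()
                .setPersistenceEnabled(true)
                .create("ny-to-ldn", 2);

            // Local region whose updates are queued and shipped across the WAN.
            Region<String, String> globalBook = cache
                .<String, String>createRegionFactory(RegionShortcut.PARTITION)
                .addGatewaySenderId(toLondon.getId())
                .create("globalBook");

            cache.createGatewayReceiverFactory().create();   // accept updates from remote sites
            globalBook.put("EUR/USD", "bid 1.0901 / ask 1.0903");
        }
    }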
Memory-Centric Performance
With the plunging cost of fast memory chips (1TB/$15,000), 64-bit
computing, and multi-core servers, the EDF can give system
designers an opportunity to rethink the traditional memory
hierarchy for data management applications. The EDF obviates
the need for high-latency (tens of milliseconds) disk-based
systems by pooling memory from many machines into an ocean
of RAM, creating a fast (microsecond latency), redundant
virtualized memory layer capable of caching and distributing all
operational data within an enterprise.
The EDF core is built in Java, where garbage collection of 64-bit heaps can cause stop-the-world performance interruptions. Object graphs are managed in serialized form to decrease strain on the garbage collector. To reduce pauses further, the resource manager actively monitors Java virtual machine heap growth and proactively evicts cached data on an LRU basis to avoid GC interruptions.
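A small sketch of what this looks like in the GemFire API: heap-usage thresholds on the cache resource manager plus LRU eviction on a region (the threshold values, region name, and contents are illustrative):

    // Sketch: heap monitoring plus LRU eviction so cached data is trimmed before
    // garbage-collection pauses become a problem. Threshold values are illustrative.
    import com.gemstone.gemfire.cache.Cache;
    import com.gemstone.gemfire.cache.CacheFactory;
    import com.gemstone.gemfire.cache.EvictionAttributes;
    import com.gemstone.gemfire.cache.Region;
    import com.gemstone.gemfire.cache.RegionShortcut;

    public class HeapEvictionExample {
        public static void main(String[] args) {
            Cache cache = new CacheFactory().set("mcast-port", "0").create();

            // Treat 90% heap use as critical; begin evicting once use passes 75%.
            cache.getResourceManager().setCriticalHeapPercentage(90.0f);
            cache.getResourceManager().setEvictionHeapPercentage(75.0f);

            // Region that evicts least-recently-used entries under heap pressure.
            Region<String, byte[]> quotes = cache
                .<String, byte[]>createRegionFactory(RegionShortcut.PARTITION)
                .setEvictionAttributes(EvictionAttributes.createLRUHeapAttributes())
                .create("quotes");

            quotes.put("IBM", new byte[1024]);
        }
    }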
As flash memory replaces disk—an inevitability predicted by the
new five minute rule [14]—EDF overflow and disk persistence can
be augmented with flash memory for even faster overflow and
lower-latency durability.
Elastic Scalability
From a distributed systems viewpoint, a key technical challenge
for elastic growth is to figure out how to initialize new nodes
into a running distributed system with consistent data. The EDF
implements a patent-pending method for bootstrapping joining
nodes based on the current data that ensures “in-flight” changes
appear consistently in the new member nodes. When a node
joins the EDF, it is initialized with a distributed snapshot [15]
taken in a non-blocking manner from current state gleaned from
the active nodes. While this initial snapshot is being delivered and
applied at the new node, concurrent updates are captured and
then merged once the new node is initialized.
Tools, Debugging, Testing, Support
Even with theorem-proven algorithms and extensive hardware and development resources, implementing distributed systems is challenging. Service outages from cloud computing giants like Amazon, Google, and Microsoft attest to the enormous difficulty of getting distributed systems right.

Subtle race conditions occur within innumerable process interleavings spread across many threads in many processes on many machines. Recreating problematic scenarios, messaging patterns, and timing sequences is difficult. Testing an asynchronous distributed system with a fail-stop model is not even enough, since Byzantine faults occur in the real world (e.g., TCP checksums fail silently).
Our EDF solution is continuously tested for both correctness
and performance using state-of-the-art quality assurance and
performance tools including static and dynamic model-checking,
profiling, failure-injection, and code analysis. We use a highly
configurable multi-threaded, multi-VM, multi-node distributed
testing framework that provisions tests across a highly scalable
hardware test bed. We run a continuous project to improve our
modeling of EDF performance based on mathematical, analytical,
and simulation techniques.
Despite rigorous design, testing, and simulation, we know that bugs will always exist. Therefore, a vital design consideration of the EDF distributed system includes mechanisms for comprehensive management, debugging, and support, e.g., meticulous system logging, continuous statistics monitoring, system health checks, management consoles, scriptable admin tools, management APIs with configurable threshold conditions, and performance visualization tools.

Our EDF solution is built and supported by a team with a 25+ year history of world-class 24x7x365 support that includes just-in-time, on-site operational consultations along with company-wide escalation pathways to ensure customer success and business continuity.
Conclusion
To build a data management platform that can scale to exploit
on-demand virtualization environments requires a distributed
system at its core. The CAP Theorem, however, tells us that
building large-scale distributed systems requires unavoidable trade-offs. There is a design gap in the CAP-aware solution space that is unsolved by the current wave of eventually consistent platforms. To bridge this gap, our solution, called the Enterprise Data Fabric (EDF), prioritizes strong consistency and high availability by introducing a weakened form of partition tolerance that lets businesses shrink application exposure to failure by demarcating functional boundaries according to service entity role dependencies. When network faults inevitably occur, failures can be contained so that service does not disappear wholesale but degrades gracefully in a partial manner. Our solution also lets designers apply varying combinations of consistency, availability, and partition tolerance at different times and locations in a business workflow. Hence, by tuning CAP semantics, a data solutions architect can amortize partition outage risks across the entire workflow, thus maximizing consistency with availability while minimizing disruption windows caused by network partitions. This configurability ultimately enables architects to build an optimal, high-performance data management solution capable of scaling with business growth and deployment topologies.
From a functional viewpoint, the EDF is a memory-centric distributed system for business-wide caching, distribution, and analysis of operational data. It lets businesses run faster, more reliably, and more intelligently by co-locating entire enterprise working data sets in memory, rather than constantly fetching this data from high-latency disk.

The EDF melds the functionality of scalable database management, messaging, and stream processing into holistic middleware for sharing, replicating, and querying operational data across process, machine, and LAN/WAN boundaries. To applications, the EDF appears as a single, tightly interwoven overlay mesh that can expand and stretch elastically to match the size and shape of a growing enterprise. And unlike single-purpose, poll-driven key-value stores, the EDF is a live platform (real-time Web infrastructure) that pushes real-time events to applications as data changes.
Solution | Consistency | Tunable CAP | Multi-Entry Transactions | Latency Bottleneck
EDF | Tunable: weak, read-your-writes, and ACID on service entities | Yes | Tunable: linearizability to serializability | Memory (microseconds)
Dynamo, Cassandra, Voldemort | Weak/Eventual | Tunable N/R/W | No: only single key/value updates | Disk (Dynamo = 2 ms; Cassandra = 0.12 ms read, 15 ms write; Voldemort = 10 ms)
SimpleDB | Weak/Eventual | Tunable N/R/W | No: only single key/value updates | Web (seconds)
Google App Engine / Megastore | Strong on local entity groups; Weak/Eventual for distributed updates | No | Yes | Disk (30 ms)
CouchDB | Weak/Eventual (MVCC) | No | Document versioning | Disk (ms to sec)
Azure Tables | Basically Available, Soft state, Eventual consistency (BASE) | No | Entity group transactions | Web (seconds)
memcached | Weak/Eventual (expiration) | No | No: only single key/value updates | Memory (microseconds)
Solution | Active Caching | Pub/Sub CQs/Triggers | Custom App Partitioning | Custom Service Map/Reduce
EDF | Yes | Yes | Yes | Yes
Dynamo, Cassandra, Voldemort | No | No | No | No
SimpleDB | No | No | No | No
Google App Engine / Megastore | No | No | Yes | No
CouchDB | No | Yes | No | Yes
Azure Tables | No | No | No | No
memcached | No | No | No | No