StorageOS is a distributed block storage platform that can be deployed with Docker using volume plugins. It allows the creation of highly available volumes that can be accessed from any node and that follow containers during rescheduling. Although it was built with cloud-native principles in mind, some limitations remain, such as the assumption of geographic proximity between nodes. Overall, StorageOS scores a full 8 out of 8 on the cloud-native storage principles, thanks to its Docker integration, its ability to schedule data optimally, and the high availability and portability it provides for application volumes.
5. Why is this tricky with containers?
First of all, does anyone not know what containers are? Applications have become loosely coupled and stateless, designed to scale and manage failure – it is no longer economical to remediate state. So why is this difficult with containers?
6. No pets
First of all, you know the cattle/pet analogy? Don't treat your servers like pets that you have to lovingly name and take care of; treat them like cattle that, when they get ill, you take out back and shoot. In other words, you don't want special storage pets, you want ordinary commodity hardware or cloud instances.
7. Data follows
Secondly, data needs to follow containers around. When a node fails you want to reschedule its containers to another node and have the data follow them. You want to avoid mapping containers to individual hosts - then you've lost your portability and mobility.
8. Humans are fallible
Thirdly, humans are fallible. Don't rely on someone running through a playbook; they will screw things up. You want to manage storage through APIs, and you want that integrated with Docker and Kubernetes. So let's take a closer look at Docker.
9. Docker container layers
Docker containers comprise a layered image and a writable 'container layer'. Here, the base image is Ubuntu and the top right is the thin read/write container layer, where new or modified data is stored. When a container is deleted its writable layer is removed, leaving just the underlying image layers behind.
This is good because sharing layers makes images smaller and the lack of state makes it easy to move containers around. This is bad because generally you want your app to do something useful in the real world, as the quick demonstration below shows. So let's look at the options Docker gives us.
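First, a minimal sketch of how the writable layer disappears with the container; the container name demo and the file path /state.txt are just for illustration:

$ docker run --name demo ubuntu bash -c "echo hello > /state.txt"
$ docker rm demo                       # removes the writable layer, and /state.txt with it
$ docker run ubuntu cat /state.txt     # fresh container layer: No such file or directory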
10. Docker local volumes
Docker volumes allow you to share data between the host and containers through local volumes. Here you run docker volume create, which creates a volume called mydata, mount that into the container at /data, and write a file. When the container exits, the data persists under /var/lib/docker/volumes (see the sketch below).
On the plus side, you can't get faster than writing to your local host. The downside is that because the data is tied to that host, it doesn't follow you around, and if that host goes down your data is inaccessible. There's also no locking, so you have to be careful with consistency, and it's subject to the noisy neighbour problem.
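A minimal sketch of that create-mount-write workflow, using the mydata volume and /data mount point described above:

$ docker volume create mydata
$ docker run -v mydata:/data ubuntu bash -c "echo hello > /data/file.txt"
$ docker run -v mydata:/data ubuntu cat /data/file.txt    # hello - it outlives the first container
$ sudo ls /var/lib/docker/volumes/mydata/_data            # file.txt, persisted on the host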
To extend storage beyond a single host, Docker has volume plugins, but first I want to define what cloud native actually means.
12. What is Cloud Native?
What do I mean by cloud native?
Horizontally scalable
Built to handle failures, so no single point of failure
Resilient and survivable - in other words, it should be self-healing
Minimal operator overhead - it should be API-driven and automatable
Decoupled from the underlying platform and hardware
How does that apply to storage?
13. Eight principles of Cloud Native Storage: 1. Platform agnostic
The storage platform should be able to run anywhere and not have proprietary dependencies that lock an application to a particular platform or cloud provider. Additionally, it should be capable of being scaled out in a distributed topology just as easily as it can be scaled up, based on application requirements. Upgrades and scaling of the storage platform should be non-disruptive operations for the running applications.
14. Eight principles of Cloud Native Storage: 2. API driven
Storage resources and services should be easy to provision, consume, move and manage via an API, and should integrate with application runtimes and orchestrator platforms.
15. Eight principles of Cloud Native Storage: 3. Declarative and composable
Storage resources should be declared and composed just like all other resources required by applications and services, allowing storage to be deployed and provisioned as part of application instantiation through orchestrators.
16. Eight principles of Cloud Native Storage: 4. Application centric
Storage should be presented to and consumed by applications, not by operating systems or hypervisors. It is no longer desirable to present storage to operating system instances and then, later, have to map applications to those instances to link them to storage (whether on-premises or in a cloud provider, on VMs or bare metal). Storage needs to be able to follow an application as it scales, grows, and moves between platforms and clouds.
17. Eight principles of Cloud Native Storage: 5. Agile
The platform should be able to react dynamically to changes in the environment: move application data between locations, dynamically resize volumes for growth, take point-in-time copies of data for retention or to facilitate rapid recovery, and integrate naturally into dynamic, rapidly changing application environments.
18. Eight principles of Cloud Native Storage: 6. Natively secure
Storage services should integrate inline security features such as encryption and RBAC, and should not depend on secondary products to secure application data.
19. Eight principles of Cloud Native Storage: 7. Performant
The storage platform should be able to offer deterministic performance in complex distributed environments and scale efficiently using a minimum of compute resources.
20. Eight principles of Cloud Native Storage: 8. Consistently available
The storage platform should manage data distribution with a predictable, proven data model to ensure the high availability, durability and consistency of application data. During failure conditions, data recovery should be application independent and should not affect normal operations.
21. Storage landscape
So that's the eight principles of cloud native storage. Now we'll take a look at a few different paradigms, a popular example of each, and score them against those eight principles. A warning in advance: this is really high level, but I hope I can put these into the context of running them with Docker.
22. Centralised file system: NFS
Your classic NAS, or network-attached storage, is NFS: you take one NFS server and export its local filesystem over the network to a number of clients. Who here uses NFS?
It doesn't really follow any of the eight principles: it's hard to scale horizontally, it's not integrated into Docker or Kubernetes natively, and it's a single point of failure, so there are no availability guarantees, although there are commercial options for failover. First designed in 1984 - definitely not cloud native. I've scored NFS 0.
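That said, you can still consume an NFS export through Docker's built-in local driver. A minimal sketch, where the server address 192.168.1.10 and the export path /srv/exports are made up for illustration:

$ docker volume create --driver local \
    --opt type=nfs \
    --opt o=addr=192.168.1.10,rw \
    --opt device=:/srv/exports \
    nfsdata
$ docker run -v nfsdata:/data ubuntu ls /data    # contents of the NFS export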
24. Storage array: Dell EMC
Next up is the classic hardware storage array, like Dell EMC. This image is totally unfair of course, but it does show the complexity.
It's even less platform agnostic, since you have to buy the hardware from a specific vendor, and it's typically easier to scale up than horizontally. It's also absurdly expensive, has long lead times, is inefficient (no thin provisioning) and is definitely not application centric. But it's optimised for deterministic performance, which is why many enterprises who run databases still use storage arrays. Anybody using storage arrays from Dell EMC, HPE, NetApp, Hitachi and so on?
Also not very cloud native - I've given it 2.
26. Distributed: Ceph
Jumping ahead, let's talk about distributed storage like Ceph, which is a distributed object store maintained by Red Hat. Distributed architectures typically trade off performance against consistency, so you get better performance if you don't need strong consistency.
So how cloud native is Ceph? Distributed architectures are usually designed with scaling in mind, and although it's not natively integrated with Docker or Kubernetes, you can look into a project called Rook for the latter.
The big downsides of Ceph are that it's complicated to set up and that failures are expensive. Because data is distributed across all nodes in a cluster, any failure requires a rebuild from the whole cluster. The distributed architecture also means that one write fans out between 13 and 40 times, which limits the performance you can get from your cluster.
I've scored it 4/8. By the way, if you want to look at the numbers I'm giving, these slides are on oicheryl.com.
28. Public cloud: AWS EBS
Now if you're a hipster and you want to join all the cool kids on public cloud, then EBS, which stands for Elastic Block Store, is a popular option.
EBS is pretty nice: everything is scalable, highly consistent and high performance. On the downside, there is a maximum of 40 EBS volumes you can mount per EC2 instance, which limits how many containers you can run per instance, and when you move containers between hosts, mounting physical block devices to nodes takes at least 45 seconds.
I'll also mention S3, which is Amazon's eventually consistent object store and seemingly powers half the internet. Lots of users choose a single Availability Zone because it's cheap, but S3 only guarantees 99.9% monthly uptime, which is 43 minutes of downtime a month, so S3 outages take out half the internet too - which might be a good thing if you're as addicted to Reddit as I am. Great for backups and non-critical data, not so great for business data.
How are we doing on the cloud native front? Well, it's scalable, highly consistent and high performance, which is great. But you're locked into Amazon as a cloud provider, which is obviously how they like it; it gets pretty expensive, as I'm sure some of you know; and there are privacy issues about moving sensitive data to the cloud.
31. I've scored EBS 6/8. By the way, if you want to look at the numbers I'm giving, these slides are on oicheryl.com.
32. Volume plugin: StorageOS
Volume plugins are Docker's way of extending storage capabilities, and StorageOS is an example of a distributed block storage platform that is deployed with Docker.
To use StorageOS you create a Docker volume with the storageos driver, set the size to 15 GB and, in this case, tell it to create two replicas of the volume on other nodes (sketched below). This gets you high availability (if the node with the master volume goes down, you can promote one of the replicas to a new master), and all the volumes are accessible from any node, so if your container goes down you can spin it up anywhere without worrying about which host it's on. And because it's not a distributed filesystem, the StorageOS scheduler can place the master volume on the same node as the container, meaning your reads are local, fast and deterministic.
Given this was built with those principles in mind, I'm obviously giving it an 8, but there are some downsides; for instance, right now it assumes your cluster is geographically close, so cross-availability-zone replication would be slow.
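A minimal sketch of the volume creation just described; the option keys below follow the StorageOS Docker plugin documentation as I remember it and may differ by version, so treat them as illustrative:

$ docker volume create --driver storageos \
    --opt size=15 \
    --opt storageos.feature.replicas=2 \
    myvol
$ docker run -v myvol:/data postgres    # the volume is reachable from any node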
34. Plugin framework: REX-Ray
I mentioned Dell EMC before as a hardware vendor; they are also the developers of REX-Ray, which I want to mention because superficially it looks like another storage plugin. REX-Ray doesn't provide storage itself; it's a framework that supports a number of different storage systems. Not really cloud native - it's just a connector to existing storage options - so I'm not going to give it a score.
35. Conclusion
Of course, you can run lots of these things in combination: you could run StorageOS on EBS, REX-Ray on top of Ceph, or NFS from VMs. There's no one solution, and often it's a bit of trial and (expensive) error. But hopefully those eight principles have given you a way to evaluate what you need against what you're currently using.