LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING - ijccsa
Load balancing techniques in cloud computing can be applied at different levels. There are two main levels: load balancing on physical servers and load balancing on virtual servers. Load balancing on a physical server is a policy for allocating physical servers to virtual machines, while load balancing on virtual machines is a policy for allocating resources from the physical server to the tasks or applications running on those virtual machines. Depending on whether the user's request to the cloud is for SaaS (Software as a Service), PaaS (Platform as a Service), or IaaS (Infrastructure as a Service), a suitable load balancing policy applies. When receiving tasks, the cloud data center must allocate them efficiently so that response time is minimized and congestion is avoided. Load balancing should also be performed between different data centers in the cloud to ensure minimum transfer time. In this paper, we propose a virtual machine-level load balancing algorithm that aims to improve the average response time and average processing time of the system in the cloud environment. The proposed algorithm is compared to the Avoid Deadlocks [5], Max-Min [6], and Throttled [8] algorithms, and the results show that our algorithm achieves better response times.
• What is Storm?
• Who uses Storm?
• Storm vs. Hadoop
• Storm Components
• Storm Topology
• Storm Primitives
• Why is Storm ideal for Real-Time Processing?
A tutorial presentation based on storm.apache.org documentation.
I gave this presentation at Amirkabir University of Technology as a teaching assistant for Dr. Amir H. Payberah's Cloud Computing course in the spring semester of 2015.
PHP Backends for Real-Time User Interaction using Apache Storm - DECK36
Engaging users in real time is the topic of our times. Whether it's a game, a shop, or a content network, the aim remains the same: providing a personalized experience. In this workshop we will look under the hood of Apache Storm and lay a firm foundation for using it with PHP. That way, you can leverage your existing codebase and PHP expertise for an entirely new world: real-time analytics and business logic working on message streams. During the course of the workshop, we will introduce Apache Storm and take a look at all of its components. We will then skyrocket the applicability of Storm by showing you how to implement its components with PHP. All exercises will be conducted using an example project, the infamous and most exhilarating lolcat kitten game ever conceived: Plan 9 From Outer Kitten. In order to follow the hands-on exercises, you will need a development VM prepared by us with all relevant system components and our project repositories. To make the workshop experience as smooth as possible for all participants, please bring a prepared computer to the workshop, as there will be no time to deal with installation and setup issues. Please download all prerequisites and install them as described: VM, Plan 9 webapp, Plan 9 storm backend (Tutorial: https://github.com/DECK36/plan9_workshop_tutorial ).
Why is building a big data platform hard? What are the key aspects involved in providing a "Serverless" experience for data folks? And how does Databricks solve infrastructure problems and provide that "Serverless" experience?
Training Slides: 151 - Tungsten Replicator - Moving your Data - Continuent
This 21-minute training session provides a refresher or getting-started overview of what Tungsten Replicator is and how it works, plus an introduction to Replicator stages and states.
TOPICS COVERED
- Review the capabilities of Tungsten Replicator
- Take a look at the inner workings of the Replicator
- Understand Replicator States
- Explore the Replicator Stages
These slides are for a brief seminar that I gave in the Ph.D. exam "Perspectives in Parallel Computing" (held by Prof. Marco Danelutto) at the University of Pisa (Italy).
They are a rapid introduction to Apache Storm and how it relates to classical algorithmic-skeleton parallel frameworks.
Cassandra Backups and Restorations Using Ansible (Joshua Wickman, Knewton) | ... - DataStax
A solid backup strategy is a DBA's bread and butter. Cassandra's nodetool snapshot makes it easy to back up the SSTable files, but there remains the question of where to put them and how. Knewton's backup strategy uses Ansible for distributed backups and stores them in S3.
Unfortunately, it's all too easy to store backups that are essentially useless due to the absence of a coherent restoration strategy. This problem proved much more difficult and nuanced than taking the backups themselves. I will discuss Knewton's restoration strategy, which again leverages Ansible, yet I will focus on general principles and pitfalls to be avoided. In particular, restores necessitated modifying our backup strategy to generate cluster-wide metadata that is critical for a smooth automated restoration. Such pitfalls indicate that a restore-focused backup design leads to faster and more deterministic recovery.
About the Speaker
Joshua Wickman Database Engineer, Knewton
Dr. Joshua Wickman is currently part of the database team at Knewton, a NYC tech company focused on adaptive learning. He earned his PhD at the University of Delaware in 2012, where he studied particle physics models of the early universe. After a brief stint teaching college physics, he entered the New York tech industry in 2014 working with NoSQL, first with MongoDB and then Cassandra. He was certified in Cassandra at his first Cassandra Summit in 2015.
Apache Storm and Twitter Streaming API integration - Uday Vakalapudi
1) Storm is a distributed, real-time computation system.
2) The input stream of a Storm cluster is handled by a component called a spout. The spout passes the data to a bolt, and a bolt either persists the data in some sort of storage or passes it to another bolt. You can imagine a Storm cluster as a chain of bolt components, each of which makes some kind of transformation on the data exposed by the spout.
1) Real-time systems must guarantee data processing.
2) They should also be horizontally scalable, meaning that just adding a few nodes improves the capacity of the cluster.
3) They should be fault-tolerant, meaning that if any error occurs or any node goes down, the system keeps working without interruption.
4) We want to eliminate intermediate message brokers, because they are complex and slow: instead of sending messages directly from producers to consumers, messages have to pass through a third-party broker, and those brokers persist the incoming data to disk. This whole detour adds extra processing time.
5) Hadoop, by comparison, is a high-latency system by design, so a few hours of downtime still leaves it high-latency; but if a real-time system takes a few hours of downtime, it is no longer real time, which makes its robustness requirements much harder. Storm satisfies all of these properties.
1) Both Hadoop and Storm are distributed, fault-tolerant systems, but Hadoop is mainly used for batch processing, whereas Storm is used for real-time computation.
2) Storm does not have a built-in storage system; it is mainly built on a "come and get some" strategy. Hadoop, on the other hand, has HDFS as its storage file system.
1) Both Storm and Flume are used for real-time data processing, but Flume does not give you a real-time computation system. Moreover, Flume depends on a channel (message broker) component for guaranteed data processing: the channel always persists data before sending it to the consumer. Storm has no intermediate-broker concept; it stays as lightweight as possible, and whatever business logic you want to write goes into a Bolt component. A conceptual sketch of the spout-to-bolt chain follows this list.
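To make the spout-to-bolt dataflow concrete, here is a minimal conceptual sketch in C#. It models only the chaining idea; it is not Storm's actual API (Storm topologies are normally written in Java or via multi-lang adapters), and all type and method names here are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical interfaces modeling the Storm dataflow: a spout emits tuples,
// and each bolt transforms them before handing off to the next bolt.
interface ISpout { IEnumerable<string> Emit(); }
interface IBolt  { string Process(string tuple); }

class SentenceSpout : ISpout
{
    public IEnumerable<string> Emit() => new[] { "hello storm", "hello world" };
}

class UppercaseBolt : IBolt { public string Process(string t) => t.ToUpperInvariant(); }
class ExclaimBolt   : IBolt { public string Process(string t) => t + "!!!"; }

class Program
{
    static void Main()
    {
        ISpout spout = new SentenceSpout();
        var bolts = new IBolt[] { new UppercaseBolt(), new ExclaimBolt() };

        // Each tuple flows through the chain of bolts, as in a Storm topology.
        foreach (var tuple in spout.Emit())
            Console.WriteLine(bolts.Aggregate(tuple, (t, bolt) => bolt.Process(t)));
    }
}
```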
Windows Azure - A Platform for Application Development - Comunidade NetPonto
The Windows Azure platform opens the door to developing applications under the new paradigm: "the cloud". Scalable, redundant applications, closer to the end user. All of this building on the knowledge you already have and the new Visual Studio 2010.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all of your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing, scale-out architecture, and columnar direct-attached storage to minimize I/O time and maximize performance. Learn how you can gain deeper business insights and save money and time by migrating to Amazon Redshift. Take away strategies for migrating from on-premises data warehousing solutions, tuning schema and queries, and utilizing third party solutions.
Amazon Aurora is a cloud-optimized relational database that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. The recently announced PostgreSQL-compatibility, together with the original MySQL compatibility, are perfect for new application development and for migrations from overpriced, restrictive commercial databases. In this session, we’ll do a deep dive into the new architectural model and distributed systems techniques behind Amazon Aurora, discuss best practices and configurations, look at migration options and share customer experience from the field.
The event, held on 27th April 2019, was part of the Global Azure Bootcamp and covered Microsoft's Cosmos DB, more specifically:
- Introduction to Cosmos DB, its features, internals, resource models, and request units.
- DEMO: Create an SQL API. Download sample .NET app. Simple queries.
- Covered Change Feed and showcased various use case scenarios.
- Detailed Global Distribution and Consistency Models implications.
- DEMO: MongoDB lift and shift. Run simple .NET code against a MongoDB instance (in a Docker container) and against Cosmos DB.
- Introduction to Tinkerpop graphs
- DEMO: Graphs API. Download sample .NET app. Simple queries.
https://techspark.mt/global-azure-bootcamp-27th-april-2019/
Building a Scalable Architecture for web apps - Directi Group
Visit http://wiki.directi.com/x/LwAj for the video. This is a presentation I delivered at the Great Indian Developer Summit 2008. It covers a wide array of topics and a plethora of lessons we have learnt (some the hard way) over the last 9 years in building web apps that are used by millions of users serving billions of page views every month. Topics and Techniques include Vertical Scaling, Horizontal Scaling, Vertical Partitioning, Horizontal Partitioning, Loose Coupling, Caching, Clustering, Reverse Proxying and more.
Relational databases are a cornerstone of the enterprise IT landscape, powering business-critical applications of many kinds. Though they have been around for a while, current commercial relational databases have lagged behind in innovation. Amazon Aurora, a managed database service built for the cloud, is intended to change that. It targets the high-performance needs of business-critical applications with an emphasis on cost-effectiveness. In this session, we will look into how Aurora fits the needs of applications built and bought by enterprises to power their business. You will learn about the overall architecture, capabilities, and cost-effectiveness of Aurora, comparing it to current commercial database offerings. We will explore best practices for enterprises adopting Aurora for existing and new workloads, as well as strategies, tools, and techniques for migrating existing databases to Aurora. You will also hear from Expedia, one of world’s leading travel companies on how they are using Amazon Aurora to power application with high performance database needs.
Similar to SQL Saturday Azure Storage by Anton Vidishchev
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
A tale of scale & speed: How the US Navy is enabling software delivery from l... - sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides from Nordic Testing Days, 6.6.2024.
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Generative AI Deep Dive: Advancing from Proof of Concept to Production - Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! - SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which participants tried to find different ways to think about quality and testing in the different parts of the DevOps infinity loop.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 - Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Pushing the limits of ePRTC: 100ns holdover for 100 days - Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
UiPath Test Automation using UiPath Test Suite series, part 5 - DianaGray10
Welcome to part 5 of the UiPath Test Automation using UiPath Test Suite series. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GraphRAG is All You Need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect their personal devices and information.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... - SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs - Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
3. About me
Program Manager @ Edgar Online, RRD
Windows Azure MVP
Co-organizer of Odessa .NET User Group
Ukrainian IT Awards 2013 Winner – Software Engineering
http://cloudytimes.azurewebsites.net/
http://www.linkedin.com/in/antonvidishchev
https://www.facebook.com/anton.vidishchev
5. Windows Azure Storage
Cloud Storage - Anywhere and anytime access
Blobs, Disks, Tables and Queues
Highly Durable, Available and Massively Scalable
Easily build “internet scale” applications
10 trillion stored objects
900K request/sec on average (2.3+ trillion per month)
Pay for what you use
Exposed via easy and open REST APIs
Client libraries in .NET, Java, Node.js, Python, PHP, Ruby
12. Design Goals
Highly Available with Strong Consistency
- Provide access to data in the face of failures/partitioning
Durability
- Replicate data several times within and across regions
Scalability
- Need to scale to zettabytes
- Provide a global namespace to access data around the world
- Automatically scale out and load balance data to meet peak traffic demands
13. Windows Azure Storage Stamps
Access blob storage via the URL: http://<account>.blob.core.windows.net/
[Architecture diagram: a Location Service routes data access to storage stamps; each storage stamp sits behind a load balancer (LB) and consists of Front-Ends, a Partition Layer, and a DFS Layer. Intra-stamp replication runs within a stamp's DFS layer; inter-stamp (geo) replication runs between stamps.]
15. Availability with Consistency for Writing
All writes are appends to the end of a log, which is an append to the last extent in the log.
Write consistency across all replicas for an extent:
- Appends are ordered the same across all 3 replicas for an extent (file)
- Only return success if all 3 replica appends are committed to storage
- When an extent gets to a certain size, or on a write failure/load balancing, seal the extent's replica set and never append any more data to it
Write availability, to handle failures during a write:
- Seal the extent's replica set
- Append immediately to a new extent (replica set) on 3 other available nodes
- Add this new extent to the end of the partition's log (stream)
A conceptual sketch of this seal-and-continue protocol follows.
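To make the seal-and-continue idea concrete, here is a conceptual C# sketch. It is not Azure's implementation; the Extent/Replica types and all helper names are invented for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of the append protocol: success requires all 3 replica commits;
// any failure seals the extent, and the log continues on a fresh replica set.
class Replica { public bool TryAppend(byte[] record) { /* commit to storage */ return true; } }

class Extent
{
    public List<Replica> Replicas = new List<Replica>();
    public bool Sealed;
    public void Seal() => Sealed = true; // never appended to again
}

class PartitionLog
{
    readonly List<Extent> extents = new List<Extent> { new Extent() };

    public bool Append(byte[] record)
    {
        Extent tail = extents.Last();
        if (tail.Replicas.All(r => r.TryAppend(record)))
            return true; // committed on all replicas

        tail.Seal();                              // stop using the failed replica set
        Extent fresh = AllocateOnHealthyNodes(3); // new extent on 3 other nodes
        extents.Add(fresh);                       // becomes the new tail of the stream
        return fresh.Replicas.All(r => r.TryAppend(record));
    }

    Extent AllocateOnHealthyNodes(int replicaCount) =>
        new Extent { Replicas = Enumerable.Range(0, replicaCount)
                                          .Select(_ => new Replica()).ToList() };
}
```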
16. Availability with Consistency for Reading
Read consistency: reads can be served from any replica, since the data in each replica for an extent is bit-wise identical.
Read availability: send out parallel read requests if the first read is taking longer than the 95th-percentile latency. A client-side sketch of this hedged-read pattern follows.
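The same hedging idea can be applied in client code: issue a backup request if the first one exceeds a latency threshold, then take whichever finishes first. A generic sketch (the threshold value and the fetch delegate are assumptions, not Azure's internal code):

```csharp
using System;
using System.Threading.Tasks;

static class HedgedRead
{
    // Start a second, identical request if the first is slower than the p95 threshold,
    // then return whichever completes first.
    public static async Task<byte[]> ReadAsync(Func<Task<byte[]>> fetch, TimeSpan p95)
    {
        Task<byte[]> first = fetch();
        Task completed = await Task.WhenAny(first, Task.Delay(p95));
        if (completed == first) return await first;

        Task<byte[]> backup = fetch();
        return await await Task.WhenAny(first, backup);
    }
}
```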
17. Dynamic Load Balancing - Partition Layer
Spreads index/transaction processing across partition servers:
- The master monitors traffic load/resource utilization on the partition servers
- Partitions are dynamically load balanced across servers to achieve better performance/availability
- No data is moved around; the master only reassigns which part of the index a partition server is responsible for
18. Dynamic Load Balancing - DFS Layer
DFS read load balancing across replicas:
- Monitor the latency/load on each node/replica
- Dynamically select which replica to read from, and start additional reads in parallel based on the 95th-percentile latency
19. Architecture Summary
- Durability: all data is stored with at least 3 replicas
- Consistency: all committed data across all 3 replicas is identical
- Availability: reads can be served from any of the 3 replicas; if any issues arise while writing, seal the extent and continue appending to a new extent
- Performance/scale: retry based on 95th-percentile latencies; automatically scale out and load balance based on load/capacity
Additional details can be found in the SOSP paper: "Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency", ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011.
21. General .NET Best Practices for Azure Storage
- Disable Nagle for small messages (< 1400 bytes): ServicePointManager.UseNagleAlgorithm = false;
- Disable Expect: 100-Continue*: ServicePointManager.Expect100Continue = false;
- Increase the default connection limit: ServicePointManager.DefaultConnectionLimit = 100; (or more)
- Take advantage of the .NET 4.5 GC: GC performance is greatly improved; see Background GC: http://msdn.microsoft.com/en-us/magazine/hh882452.aspx
These settings are combined in the sketch below.
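A minimal sketch applying the three ServicePointManager settings above at application startup, before any storage client is created (the connection-limit value of 100 is the slide's suggestion, not a universal constant):

```csharp
using System.Net;

static class StorageTuning
{
    // Call once at startup, before the first request to Azure Storage.
    public static void Apply()
    {
        ServicePointManager.UseNagleAlgorithm = false;    // don't batch small messages (< 1400 B)
        ServicePointManager.Expect100Continue = false;    // skip the Expect: 100-Continue handshake
        ServicePointManager.DefaultConnectionLimit = 100; // allow more concurrent connections
    }
}
```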
22. General Best Practices
- Locate storage accounts close to compute/users
- Understand the account scalability targets
  - Use multiple storage accounts to get more throughput
  - Distribute your storage accounts across regions
- Consider heating up the storage for better performance
- Cache critical data sets
  - To get more requests/sec than the account/partition targets
  - As a backup data set to fall back on
- Distribute load over many partitions and avoid spikes
A key-spreading sketch follows this list.
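One common way to distribute load over many partitions is to prefix keys with a deterministic hash bucket. This is a generic sketch of that idea; the bucket count and the FNV-1a hash are my choices, not from the slides:

```csharp
static class PartitionSpreading
{
    // Spread writes across N partitions by prefixing the natural key with a stable hash bucket.
    public static string SpreadPartitionKey(string naturalKey, int bucketCount = 16)
    {
        // FNV-1a: simple and deterministic (string.GetHashCode is not stable across runs).
        uint hash = 2166136261;
        foreach (char c in naturalKey)
            hash = (hash ^ c) * 16777619;

        uint bucket = hash % (uint)bucketCount;
        return $"{bucket:D2}-{naturalKey}"; // e.g. "07-customer-42"
    }
}
```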
23. General Best Practices (cont.)
- Use HTTPS
- Optimize what you send & receive
  - Blobs: range reads, metadata, HEAD requests
  - Tables: upsert, projection, point queries
  - Queues: Update Message
- Control parallelism at the application layer: unbounded parallelism can lead to slow latencies and throttling
- Enable logging & metrics on each storage service
A range-read example follows.
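As an example of optimizing what you receive, a range read fetches only part of a blob. A minimal sketch with the classic Microsoft.WindowsAzure.Storage client (blob argument and the 1 MB range are placeholders):

```csharp
using System.IO;
using Microsoft.WindowsAzure.Storage.Blob;

static class RangeRead
{
    // Fetch only the first 1 MB of the blob instead of downloading all of it.
    public static byte[] ReadFirstMegabyte(CloudBlockBlob blob)
    {
        using (var ms = new MemoryStream())
        {
            blob.DownloadRangeToStream(ms, offset: 0, length: 1024 * 1024);
            return ms.ToArray();
        }
    }
}
```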
24. Blob Best Practices
- Try to match your read size with your write size
  - Avoid reading small ranges on blobs with large blocks
  - CloudBlockBlob.StreamMinimumReadSizeInBytes / StreamWriteSizeInBytes
- How do I upload a folder the fastest? Upload multiple blobs simultaneously
- How do I upload a blob the fastest? Use parallel block upload
- Concurrency (C): multiple workers upload different blobs
- Parallelism (P): multiple workers upload different blocks for the same blob
A sketch of both approaches follows.
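A minimal sketch of both knobs using the classic Microsoft.WindowsAzure.Storage client; the names, block size, and counts are placeholders, and exact method signatures vary slightly across versions of that library:

```csharp
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Blob;

static class BlobUpload
{
    // P: many workers upload different blocks of the same blob in parallel.
    public static void UploadOneBlobFast(CloudBlobContainer container, string path)
    {
        CloudBlockBlob blob = container.GetBlockBlobReference(Path.GetFileName(path));
        blob.StreamWriteSizeInBytes = 4 * 1024 * 1024; // 4 MB blocks
        var options = new BlobRequestOptions { ParallelOperationThreadCount = 30 };
        blob.UploadFromFile(path, options: options);
    }

    // C: many workers upload different blobs concurrently (capped at 30).
    public static async Task UploadFolderFast(CloudBlobContainer container, string folder)
    {
        var throttler = new SemaphoreSlim(30);
        var tasks = Directory.GetFiles(folder).Select(async path =>
        {
            await throttler.WaitAsync();
            try
            {
                var blob = container.GetBlockBlobReference(Path.GetFileName(path));
                await blob.UploadFromFileAsync(path);
            }
            finally { throttler.Release(); }
        });
        await Task.WhenAll(tasks);
    }
}
```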
25. Concurrency vs. Blob Parallelism
- C=1, P=1 => averaged ~13.2 MB/s
- C=1, P=30 => averaged ~50.72 MB/s
- C=30, P=1 => averaged ~96.64 MB/s
- A single TCP connection is bound by TCP rate control & RTT
- P=30 vs. C=30: the test completed almost twice as fast!
- A single blob is bound by the limits of a single partition
- Accessing multiple blobs concurrently scales
[Chart: time (s) to upload 512 blobs of 256 MB each from an XL VM (total upload size = 128 GB); y-axis 0 to 10000 s.]
27. Table Best Practices
- Critical queries: select by PartitionKey and RowKey to avoid hotspots; table scans are expensive, so avoid them at all costs for latency-sensitive scenarios
- Batch: use the same PartitionKey for entities that need to be updated together
- Schema-less: store multiple types in the same table
- Single index, {PartitionKey, RowKey}: if needed, concatenate columns to form composite keys
- Entity locality: {PartitionKey, RowKey} determines the sort order, so store related entities together to reduce I/O and improve performance
- Table Service client layer in 2.1 and 2.2: dramatic performance improvements and a better NoSQL interface
A point-query and batch sketch follows.
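A minimal sketch of a point query and a same-partition batch with the classic Microsoft.WindowsAzure.Storage table client; the OrderEntity type and all names are invented for illustration:

```csharp
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical entity: PartitionKey = customer id, RowKey = order id.
public class OrderEntity : TableEntity
{
    public OrderEntity() { }
    public OrderEntity(string customerId, string orderId) : base(customerId, orderId) { }
    public double Total { get; set; }
}

static class TableExamples
{
    public static void Run(CloudTable table)
    {
        // Point query: PartitionKey + RowKey addresses exactly one entity -- no scan.
        var retrieve = TableOperation.Retrieve<OrderEntity>("customer-42", "order-0001");
        var order = (OrderEntity)table.Execute(retrieve).Result;

        // Batch: every operation in the batch must share the same PartitionKey.
        var batch = new TableBatchOperation();
        batch.InsertOrReplace(new OrderEntity("customer-42", "order-0002") { Total = 19.99 });
        batch.InsertOrReplace(new OrderEntity("customer-42", "order-0003") { Total = 5.49 });
        table.ExecuteBatch(batch);
    }
}
```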
28. Queue Best Practices
- Make message processing idempotent: messages become visible again if a client worker fails to delete them
- Benefit from Update Message: extend the visibility time based on the message, or save intermittent state
- Message count: use this to scale workers
- Dequeue count: use it to identify poison messages or to validate the invisibility time used
- Use blobs to store large messages: increase throughput by having larger batches
- Use multiple queues to get more than a single queue (partition) target
A consumer sketch applying these practices follows.
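A minimal consumer sketch with the classic Microsoft.WindowsAzure.Storage queue client; the visibility timeout, poison threshold, and ProcessIdempotently are invented for illustration:

```csharp
using System;
using Microsoft.WindowsAzure.Storage.Queue;

static class QueueConsumer
{
    public static void PollOnce(CloudQueue queue)
    {
        // Hide the message for 5 minutes while we work on it.
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromMinutes(5));
        if (msg == null) return;

        if (msg.DequeueCount > 5)
        {
            // Poison message: it keeps reappearing, so set it aside instead of retrying forever.
            queue.DeleteMessage(msg);
            return;
        }

        ProcessIdempotently(msg.AsString); // must tolerate duplicate delivery
        queue.DeleteMessage(msg);          // delete only after successful processing
    }

    static void ProcessIdempotently(string body) { /* application logic (hypothetical) */ }
}
```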