The CAP theorem states that it is impossible for a distributed computer system to simultaneously provide consistency, availability, and partition tolerance, and it explains the fundamental tradeoffs between these aspects of distributed systems. Consistency means that all nodes see the same data at the same time. Availability means that node failures do not prevent surviving nodes from continuing to operate. Partition tolerance means the system continues operating despite network failures. The theorem proves that a distributed system can satisfy at most two of these properties at once.
Today's technical landscape features workloads that can no longer be accomplished on a single server using technology from years past. As a result, we must find new ways to accommodate the increasing demands on our compute performance. Some of these new strategies introduce trade-offs and additional complexity into a system.
In this presentation, we give an overview of scaling and how to address the performance concerns that businesses are facing today.
2. CAP Theorem
• Conjectured by Prof. Eric Brewer in his keynote talk at PODC (Principles of Distributed Computing) 2000
• Describes the trade-offs involved in a distributed system
• It is impossible for a web service to provide the following three guarantees at the same time:
  • Consistency
  • Availability
  • Partition tolerance
3. CAP Theorem
• Consistency:
  – All nodes should see the same data at the same time
• Availability:
  – Node failures do not prevent survivors from continuing to operate
• Partition tolerance:
  – The system continues to operate despite network partitions
• A distributed system can satisfy any two of these guarantees at the same time, but not all three
5. CAP Theorem
• A simple example:
  Hotel booking: are we double-booking the same room?
  [Figure, repeated over slides 5–7: two clients, Bob and Dong, booking the same room against replicated servers]
8. CAP Theorem: Proof
• 2002: Proven by Seth Gilbert and Nancy Lynch at MIT
  Gilbert, Seth, and Nancy Lynch. "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services." ACM SIGACT News 33.2 (2002): 51-59.
10. CAP Theorem: Proof
• A simple proof using two nodes, A and B:
  [Figure: B responds to the client without receiving the update from A — not consistent!]
11. CAP Theorem: Proof
• A simple proof using two nodes, A and B:
  [Figure: B waits to be updated by A before responding — not available!]
12. CAP Theorem: Proof
• A simple proof using two nodes, A and B:
  [Figure: A gets updated from B, which requires a working network — not partition tolerant!]
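The two-node argument above can be made concrete with a toy simulation (the `Node` class and `write` function are invented for illustration, not any real system's API): during a partition, a write must either be accepted without reaching the other node (giving up consistency) or refused (giving up availability).

```python
# Toy two-node replica pair illustrating the CAP proof sketch.

class Node:
    def __init__(self, name, value=0):
        self.name = name
        self.value = value

def write(primary, replica, value, partitioned, mode):
    """Write to `primary` and replicate to `replica` unless partitioned.

    mode="available":  accept the write even if replication fails (gives up C).
    mode="consistent": reject the write if replication fails (gives up A).
    """
    if not partitioned:
        primary.value = value
        replica.value = value
        return "ok"
    if mode == "available":
        primary.value = value      # replica is now stale: not consistent
        return "ok (stale replica)"
    return "unavailable"           # refuse the write: not available

a, b = Node("A"), Node("B")
print(write(a, b, 42, partitioned=False, mode="consistent"))  # "ok"
print(write(a, b, 99, partitioned=True, mode="available"))    # accepted
print(a.value, b.value)  # 99 42 -> readers of B now see stale data
print(write(a, b, 7, partitioned=True, mode="consistent"))    # "unavailable"
```

Either branch of the `partitioned` case sacrifices one guarantee, which is the whole content of the theorem: a system that must keep operating across the partition cannot have both.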
13. Why is this important?
• The future of databases is distributed (the Big Data trend, etc.)
• The CAP theorem describes the trade-offs involved in distributed systems
• A proper understanding of the CAP theorem is essential to making decisions about the future of distributed database design
• Misunderstanding can lead to erroneous or inappropriate design choices
14. Problem for Relational Databases to Scale
• The relational database is built on the principle of ACID (Atomicity, Consistency, Isolation, Durability)
• This implies that a truly distributed relational database should provide availability, consistency, and partition tolerance
• Which, unfortunately, is impossible …
15. Revisit the CAP Theorem
• Of the following three guarantees potentially offered by distributed systems:
  • Consistency
  • Availability
  • Partition tolerance
• Pick two
• This suggests there are three kinds of distributed systems:
  • CP
  • AP
  • CA
Any problems?
16. A popular misconception: 2 out of 3
• How about CA?
• Can a distributed system (with an unreliable network) really be not tolerant of partitions?
17. A few witnesses
• Coda Hale, Yammer software engineer:
  – "Of the CAP theorem's Consistency, Availability, and Partition Tolerance, Partition Tolerance is mandatory in distributed systems. You cannot not choose it."
  http://codahale.com/you-cant-sacrifice-partition-tolerance/
18. A few witnesses
• Werner Vogels, Amazon CTO:
  – "An important observation is that in larger distributed-scale systems, network partitions are a given; therefore, consistency and availability cannot be achieved at the same time."
  http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
19. A few witnesses
• Daniel Abadi, co-founder of Hadapt:
  – "So in reality, there are only two types of systems ... i.e., if there is a partition, does the system give up availability or consistency?"
  http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html
20. CAP Theorem, 12 years later
• Prof. Eric Brewer, father of the CAP theorem:
• "The '2 of 3' formulation was always misleading because it tended to oversimplify the tensions among properties. ... CAP prohibits only a tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare."
  http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
21. Consistency or Availability
• Consistency versus availability is not a "binary" decision
• AP systems relax consistency in favor of availability – but are not inconsistent
• CP systems sacrifice availability for consistency – but are not unavailable
• This suggests both AP and CP systems can offer a degree of consistency and availability, as well as partition tolerance
22. AP: Best-Effort Consistency
• Examples:
  – Web caching
  – DNS
• Traits:
  – Optimistic
  – Expiration/time-to-live
  – Conflict resolution
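The expiration/time-to-live trait can be sketched as a minimal TTL cache in Python (the `TTLCache` class and the example hostname and addresses are invented for illustration): entries are served, possibly stale, until they expire, which bounds the staleness window much as DNS and web caches do.

```python
# Minimal TTL-cache sketch: stale answers are served until the entry
# expires, after which the authoritative source is consulted again.
import time

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (value, expiry_timestamp)

    def get(self, key, fetch, now=None):
        now = time.time() if now is None else now
        hit = self.entries.get(key)
        if hit and hit[1] > now:
            return hit[0]          # possibly stale, but fast and available
        value = fetch(key)         # refresh from the source of truth
        self.entries[key] = (value, now + self.ttl)
        return value

origin = {"www.example.com": "93.184.216.34"}  # stands in for the origin server
cache = TTLCache(ttl_seconds=60)
ip1 = cache.get("www.example.com", origin.get, now=0)    # miss: fetched
origin["www.example.com"] = "93.184.216.35"              # record changes upstream
ip2 = cache.get("www.example.com", origin.get, now=30)   # within TTL: stale answer
ip3 = cache.get("www.example.com", origin.get, now=120)  # expired: refreshed
```

Between `now=0` and `now=60` the cache keeps answering with the old address; once the TTL lapses the next read converges to the upstream value, which is exactly the best-effort consistency the slide describes.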
24. Types of Consistency
• Strong consistency
  – After the update completes, any subsequent access will return the same updated value
• Weak consistency
  – It is not guaranteed that subsequent accesses will return the updated value
• Eventual consistency
  – A specific form of weak consistency
  – It is guaranteed that, if no new updates are made to the object, eventually all accesses will return the last updated value (e.g., updates propagate to replicas in a lazy fashion)
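A minimal sketch of the lazy propagation behind eventual consistency (the `Replica`, `lazy_write`, and `propagate` names are invented for illustration, not any real system's API): a read from a replica may miss a recent write until propagation runs, after which all replicas agree.

```python
# Sketch of eventual consistency: writes land on one replica and are
# propagated lazily; reads from other replicas may be stale meanwhile.

class Replica:
    def __init__(self):
        self.data = {}

def lazy_write(primary, replicas, key, value):
    """Apply the write locally; return the replicas still to be updated."""
    primary.data[key] = value
    return list(replicas)

def propagate(primary, pending):
    """Lazily push the primary's state to the pending replicas."""
    for r in pending:
        r.data.update(primary.data)

primary, r1, r2 = Replica(), Replica(), Replica()
pending = lazy_write(primary, [r1, r2], "x", 1)
print(r1.data.get("x"))                    # None: the weak-consistency window
propagate(primary, pending)
print(r1.data.get("x"), r2.data.get("x"))  # 1 1: replicas have converged
```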
25. Eventual Consistency Variations
• Causal consistency
  – Processes that have a causal relationship will see consistent data
• Read-your-writes consistency
  – A process always accesses the data item after its update operation and never sees an older value
• Session consistency
  – As long as the session exists, the system guarantees read-your-writes consistency
  – Guarantees do not overlap sessions
26. Eventual Consistency Variations
• Monotonic read consistency
  – If a process has seen a particular value of a data item, subsequent accesses will never return any previous values
• Monotonic write consistency
  – The system guarantees to serialize the writes by the same process
• In practice
  – A number of these properties can be combined
  – Monotonic reads and read-your-writes are most desirable
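One way to picture how monotonic reads work on the client side is a session token that remembers the highest version seen and refuses anything older from a lagging replica. This `Session` class is a hypothetical sketch, not a real client library; real systems typically route the read elsewhere or wait rather than raise.

```python
# Client-side session sketch for monotonic reads: once a version has
# been observed, an older version from a lagging replica is rejected.

class Session:
    def __init__(self):
        self.min_version = 0   # highest version this session has seen

    def read(self, replica_state):
        value, version = replica_state   # (value, version) from some replica
        if version < self.min_version:
            raise RuntimeError("stale replica: would violate monotonic reads")
        self.min_version = version       # ratchet forward, never back
        return value

s = Session()
print(s.read(("v2", 2)))   # fresh replica: accepted
try:
    s.read(("v1", 1))      # lagging replica: going backwards is rejected
except RuntimeError as e:
    print(e)
```

Read-your-writes falls out of the same ratchet if each of the session's own writes also bumps `min_version` to the version it produced.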
27. Eventual Consistency - A Facebook Example
• Bob finds an interesting story and shares it with Alice by posting it on her Facebook wall
• Bob asks Alice to check it out
• Alice logs into her account and checks her Facebook wall, but finds:
  - Nothing is there!
28. Eventual Consistency - A Facebook Example
• Bob tells Alice to wait a bit and check back later
• Alice waits for a minute or so and checks back:
  - She finds the story Bob shared with her!
29. Eventual Consistency - A Facebook Example
• Reason: this is possible because Facebook uses an eventually consistent model
• Why does Facebook choose the eventually consistent model over the strongly consistent one?
  – Facebook has more than 1 billion active users
  – It is non-trivial to efficiently and reliably store the huge amount of data generated at any given time
  – The eventually consistent model offers the option to reduce the load and improve availability
30. Eventual Consistency - A Dropbox Example
• Dropbox enables immediate consistency via synchronization in many cases
• However, what happens in case of a network partition?
31. Eventual Consistency - A Dropbox Example
• Let's do a simple experiment:
  – Open a file in your Dropbox
  – Disable your network connection (e.g., WiFi, 4G)
  – Try to edit the file in the Dropbox folder: can you do that?
  – Re-enable your network connection: what happens to your Dropbox folder?
32. Eventual Consistency - A Dropbox Example
• Dropbox embraces eventual consistency:
  – Immediate consistency is impossible in case of a network partition
  – Users would feel bad if their Word documents froze each time they hit Ctrl+S, simply due to the large latency of updating all devices across the WAN
  – Dropbox is oriented to personal syncing, not collaboration, so this is not a real limitation
33. Eventual Consistency - An ATM Example
• In the design of an automated teller machine (ATM):
  – Strong consistency appears to be the natural choice
  – However, in practice, A beats C
  – Higher availability means higher revenue
  – An ATM will allow you to withdraw money even if the machine is partitioned from the network
  – However, it puts a limit on the withdrawal amount (e.g., $200)
  – The bank might also charge you a fee when an overdraft happens
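The ATM policy above can be sketched as a small decision function (the $200 cap mirrors the slide's example; the function name and messages are illustrative, not a real banking API): while partitioned, dispense up to a fixed cap against a possibly stale balance and reconcile later.

```python
# ATM sketch: choose availability over consistency while partitioned,
# but bound the damage with a fixed offline withdrawal cap.
PARTITION_LIMIT = 200  # from the slide's example: at most $200 offline

def withdraw(balance_estimate, amount, partitioned):
    """Return (dispensed, note). `balance_estimate` may be stale."""
    if partitioned:
        if amount <= PARTITION_LIMIT:
            # Availability wins: dispense now, reconcile (and possibly
            # charge an overdraft fee) once the partition heals.
            return amount, "dispensed offline; reconciled later"
        return 0, "offline limit exceeded"
    # Online: a strongly consistent balance check is possible.
    if amount <= balance_estimate:
        return amount, "dispensed"
    return 0, "insufficient funds"

print(withdraw(500, 150, partitioned=True))   # under the cap: allowed
print(withdraw(500, 300, partitioned=True))   # over the cap: refused offline
print(withdraw(100, 300, partitioned=False))  # online: strong check applies
```

The cap is what turns "give up consistency" into a quantified business risk: the bank can lose at most $200 per account per partition, which is cheaper than the revenue lost by refusing all offline withdrawals.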
34. Dynamic Tradeoff between C and A
• An airline reservation system:
  – When most seats are available: it is OK to rely on somewhat out-of-date data; availability is more critical
  – When the plane is close to full: it needs more accurate data to ensure the plane is not overbooked; consistency is more critical
• This gives neither strong consistency nor guaranteed availability, but it may significantly increase the tolerance of network disruption
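This dynamic tradeoff can be sketched in a few lines (the 0.9 threshold and the function name are assumptions for illustration, not from the slides): cheap, possibly stale reads while the plane is mostly empty, and strongly consistent checks as it approaches capacity.

```python
# Dynamic C/A tradeoff sketch for seat booking: pick the read mode
# based on how full the flight is. The 0.9 threshold is illustrative.

def choose_read_mode(seats_sold, capacity, threshold=0.9):
    """Return "strong" near capacity, "stale_ok" otherwise."""
    if seats_sold / capacity >= threshold:
        return "strong"    # overbooking risk is real: pay for consistency
    return "stale_ok"      # plenty of seats: favor availability and speed

print(choose_read_mode(50, 200))   # mostly empty plane
print(choose_read_mode(185, 200))  # nearly full plane
```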
35. Heterogeneity: Segmenting C and A
• No single uniform requirement
  – Some aspects require strong consistency
  – Others require high availability
• Segment the system into different components
  – Each provides different types of guarantees
• Overall, the system guarantees neither consistency nor availability
  – Yet each part of the service gets exactly what it needs
• The system can be partitioned along different dimensions
36. Discussion
• In an e-commerce system (e.g., Amazon, eBay, etc.), what are the trade-offs between consistency and availability you can think of? What is your strategy?
• Hint – things you might want to consider:
  – Different types of data (e.g., shopping cart, billing, product, etc.)
  – Different types of operations (e.g., query, purchase, etc.)
  – Different types of services (e.g., distributed lock, DNS, etc.)
  – Different groups of users (e.g., users in different geographic areas, etc.)
37. Partitioning Examples
• Data Partitioning
• Operational Partitioning
• Functional Partitioning
• User Partitioning
• Hierarchical Partitioning
38. Partitioning Examples
Data Partitioning
• Different data may require different consistency and availability
• Example:
  • Shopping cart: high availability, responsive; can sometimes suffer anomalies
  • Product information: needs to be available; slight variation in inventory is tolerable
  • Checkout, billing, and shipping records: must be consistent
39. Partitioning Examples
Operational Partitioning
• Each operation may require a different balance between consistency and availability
• Example:
  • Reads: high availability; e.g., "query"
  • Writes: high consistency, lock when writing; e.g., "purchase"
40. Partitioning Examples
Functional Partitioning
• The system consists of sub-services
• Different sub-services provide different balances
• Example: a comprehensive distributed system
  – Distributed lock service (e.g., Chubby): strong consistency
  – DNS service: high availability
41. Partitioning Examples
User Partitioning
• Try to keep related data close together to assure better performance
• Example: Craigslist
  – Might want to divide its service into several data centers, e.g., east coast and west coast
  • Users get high performance (e.g., high availability and good consistency) if they query the servers closest to them
  • Poorer performance if a New York user queries Craigslist in San Francisco
42. Partitioning Examples
Hierarchical Partitioning
• Large global service with local "extensions"
• Different locations in the hierarchy may use different consistency
• Example:
  – Local servers (better connected) guarantee more consistency and availability
  – Global servers see more partitions and relax one of the requirements
43. What if there are no partitions?
• Tradeoff between consistency and latency:
  – Caused by the possibility of failure in distributed systems
  – High availability -> replicate data -> consistency problem
• Basic idea:
  – Availability and latency are arguably the same thing: unavailable -> extremely high latency
  – Achieving different levels of consistency/availability takes different amounts of time
44. CAP -> PACELC
• A more complete description of the space of potential tradeoffs for a distributed system:
  – If there is a partition (P), how does the system trade off availability and consistency (A and C); else (E), when the system is running normally in the absence of partitions, how does the system trade off latency (L) and consistency (C)?
  Abadi, Daniel J. "Consistency tradeoffs in modern distributed database system design." Computer 45.2 (2012): 37.
46. Examples
• PA/EL systems: give up both Cs for availability and lower latency
  – Dynamo, Cassandra, Riak
• PC/EC systems: refuse to give up consistency and pay the cost in availability and latency
  – BigTable, HBase, VoltDB/H-Store
• PA/EC systems: give up consistency when a partition happens and keep consistency in normal operation
  – MongoDB
• PC/EL systems: keep consistency if a partition occurs but give up consistency for latency in normal operation
  – Yahoo! PNUTS
47. Contact: Prof. Dong Wang: dwang5@nd.edu
http://www3.nd.edu/~dwang5/teach/spring15/spring15_wang_flyer.pdf
CSE 40437/60437, Spring 2015: Social Sensing & Cyber-Physical Systems
[Course flyer: cyber-physical systems topics include embedded computing systems, green navigation systems, zero-energy buildings, smart grids, body area networks, and social sensing]
49. [Course flyer diagram: a social sensing pipeline from sensors and people, through data from news and public sources, to analytics, event detection, and decision support; applications include stock prediction (money), traffic monitoring (time), disaster response (lives), and geo-tagging (smart city); example events: the Boston bombing, Hurricane Sandy, the Egypt unrest]