This talk will highlight some of the challenges to building a fault tolerant distributed architecture, and how MemSQL's architecture tackles these challenges.
Brent Friedman
- Three hour workshop format, 30 minutes or so on history, trends, quick review of normalization, 45 minutes on a normalization walk-through including everyone, then break into small teams (3-6 ppl) to do a 'normalization challenge' on a real world practical problem
Brent Friedman
- Three hour workshop format, 30 minutes or so on history, trends, quick review of normalization, 45 minutes on a normalization walk-through including everyone, then break into small teams (3-6 ppl) to do a 'normalization challenge' on a real world practical problem
HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint FederationMuhammad Saleem
Efficient federated query processing is of significant importance to tame the large amount of data available on the Web of Data. Previous works have focused on generating optimized query execution plans for fast result retrieval. However, devising source selection approaches beyond triple pattern-wise source selection has not received much attention. This work presents HiBISCuS, a novel hypergraph-based source selection approach to federated SPARQL querying. Our approach can be directly combined with existing SPARQL query federation engines to achieve the same recall while querying fewer data sources. We extend three well-known SPARQL query federation engines with HiBISCus and compare our extensions with the original approaches on FedBench. Our evaluation shows that HiBISCuS can efficiently reduce the total number of sources selected without losing recall. Moreover, our approach significantly reduces the execution time of the selected engines on most of the benchmark queries.
FedX - Optimization Techniques for Federated Query Processing on Linked Dataaschwarte
The final slides of our talk about FedX at the 10th International Semantic Web Conference in Bonn. For details about FedX see http://www.fluidops.com/fedx/
Re-using Media on the Web: Media fragment re-mixing and playoutMediaMixerCommunity
A number of novel application ideas will be introduced based on the media fragment creation, specification and rights management technologies. Semantic search and retrieval allows us to organize sets of fragments by topical or conceptual relevance. These fragment sets can then be played out in a non-linear fashion to create a new media re-mix. We look at a server-client implementation supporting Media Fragments, before allowing the participants to take the sets of media they have selected and create their own re-mix.
Hadoop is an open source software framework that supports data-intensive distributed applications. Hadoop is licensed under the Apache v2 license. It is therefore generally known as Apache Hadoop. Hadoop has been developed, based on a paper originally written by Google on MapReduce system and applies concepts of functional programming. Hadoop is written in the Java programming language and is the highest-level Apache project being constructed and used by a global community of contributors. Hadoop was developed by Doug Cutting and Michael J. Cafarella. And just don't overlook the charming yellow elephant you see, which is basically named after Doug's son's toy elephant!
The topics covered in presentation are:
1. Big Data Learning Path
2.Big Data Introduction
3. Hadoop and its Eco-system
4.Hadoop Architecture
5.Next Step on how to setup Hadoop
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison Severalnines
Galera Cluster for MySQL, Percona XtraDB Cluster and MariaDB Cluster (the three “flavours” of Galera Cluster) make use of the Galera WSREP libraries to handle synchronous replication.MySQL Cluster is the official clustering solution from Oracle, while Galera Cluster for MySQL is slowly but surely establishing itself as the de-facto clustering solution in the wider MySQL eco-system.
In this webinar, we will look at all these alternatives and present an unbiased view on their strengths/weaknesses and the use cases that fit each alternative.
This webinar will cover the following:
MySQL Cluster architecture: strengths and limitations
Galera Architecture: strengths and limitations
Deployment scenarios
Data migration
Read and write workloads (Optimistic/pessimistic locking)
WAN/Geographical replication
Schema changes
Management and monitoring
The database market is large and filled with many solutions. In this talk, Seth Luersen from MemSQL we will take a look at what is happening within AWS, the overall data landscape, and how customers can benefit from using MemSQL within the AWS ecosystem.
Converging Database Transactions and Analytics SingleStore
delivered at the Gartner Data and Analytics 2018 show in Texas. This presentation discusses real-time applications and their impact on existing data infrastructures
MemSQL 201: Advanced Tips and Tricks WebcastSingleStore
Topics discussed include differences between columnstore and rowstore engines, data ingestion, data sharding and query tuning, lastly memory and workload management.
Watch the replay at https://memsql.wistia.com/medias/4siccvlorm
An Engineering Approach to Database EvaluationsSingleStore
This talk will go over a methodical approach for making a decision, dig into interesting tradeoffs, and give tips about what things to look for under the hood and how to evaluate the tech behind the database.
HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint FederationMuhammad Saleem
Efficient federated query processing is of significant importance to tame the large amount of data available on the Web of Data. Previous works have focused on generating optimized query execution plans for fast result retrieval. However, devising source selection approaches beyond triple pattern-wise source selection has not received much attention. This work presents HiBISCuS, a novel hypergraph-based source selection approach to federated SPARQL querying. Our approach can be directly combined with existing SPARQL query federation engines to achieve the same recall while querying fewer data sources. We extend three well-known SPARQL query federation engines with HiBISCus and compare our extensions with the original approaches on FedBench. Our evaluation shows that HiBISCuS can efficiently reduce the total number of sources selected without losing recall. Moreover, our approach significantly reduces the execution time of the selected engines on most of the benchmark queries.
FedX - Optimization Techniques for Federated Query Processing on Linked Dataaschwarte
The final slides of our talk about FedX at the 10th International Semantic Web Conference in Bonn. For details about FedX see http://www.fluidops.com/fedx/
Re-using Media on the Web: Media fragment re-mixing and playoutMediaMixerCommunity
A number of novel application ideas will be introduced based on the media fragment creation, specification and rights management technologies. Semantic search and retrieval allows us to organize sets of fragments by topical or conceptual relevance. These fragment sets can then be played out in a non-linear fashion to create a new media re-mix. We look at a server-client implementation supporting Media Fragments, before allowing the participants to take the sets of media they have selected and create their own re-mix.
Hadoop is an open source software framework that supports data-intensive distributed applications. Hadoop is licensed under the Apache v2 license. It is therefore generally known as Apache Hadoop. Hadoop has been developed, based on a paper originally written by Google on MapReduce system and applies concepts of functional programming. Hadoop is written in the Java programming language and is the highest-level Apache project being constructed and used by a global community of contributors. Hadoop was developed by Doug Cutting and Michael J. Cafarella. And just don't overlook the charming yellow elephant you see, which is basically named after Doug's son's toy elephant!
The topics covered in presentation are:
1. Big Data Learning Path
2.Big Data Introduction
3. Hadoop and its Eco-system
4.Hadoop Architecture
5.Next Step on how to setup Hadoop
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison Severalnines
Galera Cluster for MySQL, Percona XtraDB Cluster and MariaDB Cluster (the three “flavours” of Galera Cluster) make use of the Galera WSREP libraries to handle synchronous replication.MySQL Cluster is the official clustering solution from Oracle, while Galera Cluster for MySQL is slowly but surely establishing itself as the de-facto clustering solution in the wider MySQL eco-system.
In this webinar, we will look at all these alternatives and present an unbiased view on their strengths/weaknesses and the use cases that fit each alternative.
This webinar will cover the following:
MySQL Cluster architecture: strengths and limitations
Galera Architecture: strengths and limitations
Deployment scenarios
Data migration
Read and write workloads (Optimistic/pessimistic locking)
WAN/Geographical replication
Schema changes
Management and monitoring
The database market is large and filled with many solutions. In this talk, Seth Luersen from MemSQL we will take a look at what is happening within AWS, the overall data landscape, and how customers can benefit from using MemSQL within the AWS ecosystem.
Converging Database Transactions and Analytics SingleStore
delivered at the Gartner Data and Analytics 2018 show in Texas. This presentation discusses real-time applications and their impact on existing data infrastructures
MemSQL 201: Advanced Tips and Tricks WebcastSingleStore
Topics discussed include differences between columnstore and rowstore engines, data ingestion, data sharding and query tuning, lastly memory and workload management.
Watch the replay at https://memsql.wistia.com/medias/4siccvlorm
An Engineering Approach to Database EvaluationsSingleStore
This talk will go over a methodical approach for making a decision, dig into interesting tradeoffs, and give tips about what things to look for under the hood and how to evaluate the tech behind the database.
Stream Processing with Pipelines and Stored ProceduresSingleStore
This talk will discuss an upcoming feature in MemSQL 6.5 showing how advanced stream processing use cases can be tackled with a combination of stored procedures (new in 6.0) and MemSQL's pipelines feature.
Learn how to leverage MPP technology and distributed data to deliver high volume transactional and analytical work loads which result in real time dashboards on rapidly changing data using standard SQL tools. Demonstrations will include the streaming of structured and JSON data from Kafka messages through a micro-batch ETL process into the MemSQL database where the data is then queried using standard SQL tools and visualized leveraging Tableau.
This session will focus on image recognition, the techniques available, and how to put those techniques into production. It will further explore algebraic operations on tensors, and how that can assist in large-scale, high-throughput, highly-parallel image recognition.
LIVE DEMO: Constructing and executing a real-time image recognition pipeline using Kafka and Spark.
Speaker: Neil Dahlke, MemSQL Senior Solutions Engineer
How Database Convergence Impacts the Coming Decades of Data ManagementSingleStore
How Database Convergence Impacts the Coming Decades of Data Management by Nikita Shamgunov, CEO and co-founder of MemSQL.
Presented at NYC Database Month in October 2017. NYC Database Month is the largest database meetup in New York, featuring talks from leaders in the technology space. You can learn more at http://www.databasemonth.com.
James Burkhart explains how Uber supports millions of analytical queries daily across real-time data with Apollo. James covers the architectural decisions and lessons learned building an exactly-once ingest pipeline storing raw events across in-memory row storage and on-disk columnar storage and a custom metalanguage and query layer leveraging partial OLAP result set caching and query canonicalization. Putting all the pieces together provides thousands of Uber employees with subsecond p95 latency analytical queries spanning hundreds of millions of recent events.
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...pchutichetpong
M Capital Group (“MCG”) expects to see demand and the changing evolution of supply, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), while the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Building a Fault Tolerant Distributed Architecture
1. Building a Fault Tolerant Distributed
Architecture
Rodrigo Toste Gomes
Software Engineer
2. A MemSQL Cluster
2
- Master Aggregator
- Leaves
Leaves store shards of data.
Master Aggregator is the source of truth for cluster state:
- Location of data shards;
3. Fault Tolerance and High Availability
3
Maintain replicas of shards in different nodes.
System can tolerate one node failure.
Master Aggregator is also responsible for:
- Location of data shard replicas;
- Replication states;
- Moving shards around.
4. 4
Cluster State:
Node State:
- Leaf A is online
- Leaf B is online
Databases:
- data
Shards:
- data:0 - primary on A
- data:0 - replica on B
- data:1 - primary on B
- data:1 - replica on A
Example Cluster
Nodes:
Master Aggregator
Leaf A
Leaf B
5. Maintaining the Cluster State
5
Cluster State:
- Leaf A is online
Leaf A:
Offline MA
BA
Heartbeats
Cluster State:
- Leaf A is offline
Leaf A:
Offline
6. Maintaining the Cluster State
6
Cluster State:
- Leaf A is online
- data:0 - primary on A
- data:0 - replica on B
Leaf A:
Offline
Leaf B:
Online; replica of tweets:0
Cluster State:
- Leaf A is offline
- data:0 - primary on B
Leaf A:
Offline
Leaf B:
Online; replica of data:0
7. Maintaining the Cluster State
7
Cluster State:
- Leaf A is online
- data:0 - primary on A
- data:0 - replica on B
Leaf A:
Offline
Leaf B:
Online; replica of tweets:0
Cluster State:
- Leaf A is offline
- data:0 - primary on B
Leaf A:
Offline
Leaf B:
Online; replica of data:0
Leaf B:
Online; primary of data:0
8. MemSQL Distributed Architecture
8
Master Aggregator:
- Maintains primary copy of cluster state;
- Monitors node state via heartbeats.
Leaves:
- Monitor for changes in local and cluster state;
- Reconcile local state with cluster state.
What happens when a node is unable to reconcile its local
state with the cluster state?
9. Shard Can’t Replicate
9
Cluster State:
- data:0 primary on A
- data:0 replica on B
Leaf A:
Primary data:0
Replicating data:0 to B
Leaf B:
Replica of data:0
MA
BA
Heartbeats
data:0
10. Shard Can’t Replicate
10
Cluster State:
- data:0 primary on A
- data:0 replica on B
Leaf A:
Primary data:0
Leaf B:
Replica of data:0
Network Partition A | B
MA
BA
Heartbeats
data:0
11. Strategy 1: Don’t block writes
11
MA
BA
Heartbeats
data:0
Cluster State:
- data:0 primary on A
- data:0 replica on B
Leaf A:
Primary data:0
Leaf B:
Out of Date Replica of data:0
Network Partition A | B
12. Strategy 1: Don’t block writes (Leaf A crashes)
12
MA
BA
Heartbeats
data:0
Cluster State:
- data:0 primary on A
- data:0 replica on B
Leaf A:
Offline
Leaf B:
Out of Date Replica of data:0
Network Partition A | B
Cluster State:
- data:0 primary on B
Leaf B:
Primary of data:0
Not a MemSQL approach
given potential for data loss
13. Strategy 2: Do nothing - block writes indefinitely
13
MA
BA
Heartbeats
data:0
Cluster State:
- data:0 primary on A
- data:0 replica on B
Leaf A:
Primary data:0
Leaf B:
Replica of data:0
Network Partition A | B
Write workload stalls
14. Strategy 3: Leaf Notify
14
MA
BA
Heartbeats
data:0
Cluster State:
- data:0 primary on A
- data:0 synchronous replica on B
Leaf A:
Primary data:0
Leaf B:
Replica of data:0
Network Partition A | B
Write workload is stalled
15. Strategy 3: Leaf Notify
15
MA
BA
Heartbeats
data:0
Cluster State:
- data:0 primary on A
- data:0 synchronous replica on B
Leaf A:
Primary data:0
Leaf Notifies MA - no replication
Leaf B:
Replica of data:0
Network Partition A | B
16. Strategy 3: Leaf Notify
16
MA
BA
Heartbeats
data:0
Cluster State:
- data:0 primary on A
- data:0 asynchronous replica on B
Leaf A:
Primary data:0
Leaf B:
Out of Date Replica of data:0
Network Partition A | B
17. MemSQL Distributed Architecture
17
Master Aggregator:
- Maintains primary copy of cluster state;
- Monitors node state via heartbeats.
Leaves:
- Monitor for changes in local and cluster state;
- Reconcile local state with cluster state;
- Notify MA to change cluster state when reconciling is
impossible.