This deck was presented at the SF Kubernetes Meetup held at Microsoft's downtown SF office, introducing the architecture of TiDB and TiKV (a CNCF project), key use cases, a user story with Mobike (one of the largest bikesharing platforms in the world), and how TiDB is deployed across different cloud environments using TiDB Operator.
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup] (Kevin Xu)
This presentation was delivered at the NYC SQL meetup on September 27, 2018. It provided a technical overview of the TiDB platform, a deep dive into TiDB's MySQL-compatible layer and MySQL ecosystem tools, the Mobike use case, and an appendix with detailed materials on the coprocessor and transaction model.
The Dark Side Of Go -- Go runtime related problems in TiDB in production (PingCAP)
Ed Huang, CTO of PingCAP, spoke at the Go System Conference about dealing with the typical and profound issues related to Go's runtime as your systems become more complex. Taking TiDB as an example, he demonstrated how these problems can be reproduced, located, and analyzed in production.
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup] (Kevin Xu)
This deck introduces TiDB, an open source distributed NewSQL database, to the Portland Cloud Native meetup on September 25, 2018. It includes materials on the technical architecture, core features, using TiDB Operator to deploy in any cloud environment, and an appendix on the transaction model and join support.
This is the speech Shen Li gave at GopherChina 2017.
TiDB is an open source distributed database. Inspired by the design of Google F1/Spanner, TiDB features infinite horizontal scalability, strong consistency, and high availability. The goal of TiDB is to serve as a one-stop solution for data storage and analysis.
In this talk, we will mainly cover the following topics:
- What is TiDB
- TiDB Architecture
- SQL Layer Internal
- Golang in TiDB
- Next Step of TiDB
Shen Li, VP of Engineering at PingCAP, shares these slides on TiDB and the big data ecosystem. Enjoy~
TiDB is an open source distributed Hybrid Transactional/Analytical Processing (HTAP) database developed by PingCAP and inspired by Google Spanner/F1. TiDB features infinite horizontal scalability, strong consistency, and high availability. Its goal is to serve as a one-stop solution for online transactions and analysis.
This is the speech Shen Li gave at Cloud Connect Event Shanghai·China 2017.
TiDB is an open source distributed database. Inspired by the design of Google F1/Spanner, TiDB features infinite horizontal scalability, strong consistency, and high availability. The goal of TiDB is to serve as a one-stop solution for data storage and analysis. In this talk, we will mainly cover the following topics:
(1) The overall architecture of TiDB and implementation details
(2) How TiDB stores large volumes of data and empowers computation
(3) How TiDB embraces the big data ecosystem, reducing both the cost of big data analysis and the barrier to entry for users
[Paper Reading] Orca: A Modular Query Optimizer Architecture for Big Data (PingCAP)
The performance of analytical query processing in data management systems depends primarily on the capabilities of the system's query optimizer. Increased data volumes and heightened interest in processing complex analytical queries have prompted Pivotal to build a new query optimizer.
In this paper we present the architecture of Orca, the new query optimizer for all Pivotal data management products, including Pivotal Greenplum Database and Pivotal HAWQ. Orca is a comprehensive development uniting state-of-the-art query optimization technology with its own original research, resulting in a modular and portable optimizer architecture.
In addition to describing the overall architecture, we highlight several unique features and present performance comparisons against other systems.
Building a transactional key-value store that scales to 100+ nodes (percona l...) (PingCAP)
This slide deck from Siddon Tang, Chief Engineer at PingCAP, accompanied his talk at Percona Live 2018 on how to scale TiKV, an open source transactional key-value store, to 100+ nodes.
This is the speech Siddon Tang gave at the 1st Rust Meetup in Beijing on April 16, 2017.
Siddon Tang: Chief Architect of PingCAP
The slides covered the following topics:
- Why do we use Rust in TiKV
- TiKV architecture introduction
- Key technology
- Future plan
At TiDB DevCon 2020, Max Liu, CEO at PingCAP, gave a keynote speech. He believes that today’s database should be more real-time, more flexible, and easier to use, and TiDB, an elastic, cloud-native, real-time HTAP database, is exactly that kind of database.
This is the speech Max Liu gave at Percona Live Open Source Database Conference 2016.
Max Liu: Co-founder and CEO, a hacker with a free soul
The slides covered the following topics:
- Why another database?
- What kind of database do we want to build?
- How to design such a database, including the principles, the architecture, and design decisions?
- How to develop such a database, including the architecture and the core technologies for TiKV and TiDB?
- How to test the database to ensure the quality and stability?
Presto talk @ Global AI conference 2018 Boston (kbajda)
Presented at Global AI Conference in Boston 2018:
http://www.globalbigdataconference.com/boston/global-artificial-intelligence-conference-106/speaker-details/kamil-bajda-pawlikowski-62952.html
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Facebook, Airbnb, Netflix, Uber, Twitter, LinkedIn, Bloomberg, and FINRA, Presto has experienced unprecedented growth in popularity in both on-premises and cloud deployments over the last few years. Presto is truly a SQL-on-Anything engine: a single query can access data from Hadoop, S3-compatible object stores, RDBMSs, NoSQL stores, and custom data stores. This talk covers some of the best use cases for Presto, recent advancements in the project such as the Cost-Based Optimizer and geospatial functions, as well as the roadmap going forward.
Iceberg: a modern table format for big data (Ryan Blue & Parth Brahmbhatt, Netflix)
Presto Summit 2018 (https://www.starburstdata.com/technical-blog/presto-summit-2018-recap/)
[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workl... (PingCAP)
Being one of the most complex components of a DBMS, query optimizers could benefit from adaptive policies that are learned systematically from the data and the query workload. This paper takes the approach used by Marcus et al. in Bao and adapts it to SCOPE, a big data system used internally at Microsoft. Along the way, multiple new challenges had to be solved. The paper also evaluates the efficacy of the approach on production workloads that include 150K daily jobs.
Paper:
https://dl.acm.org/doi/pdf/10.1145/3448016.3457568
[Paper reading] Interleaving with Coroutines: A Practical Approach for Robust... (PingCAP)
This paper proposes interleaving with coroutines for any type of index join. It showcases the proposal on SAP HANA by implementing binary search and CSB+-tree traversal for an instance of index join related to dictionary compression. Coroutine implementations not only perform similarly to prior interleaving techniques, but also closely resemble the original code, while supporting both interleaved and non-interleaved execution. Thus, the paper claims that coroutines make interleaving practical for use in real DBMS codebases.
Paper: http://www.vldb.org/pvldb/vol11/p230-psaropoulos.pdf
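The interleaving idea can be illustrated with a toy sketch using Python generators (a hypothetical, simplified stand-in for the paper's C++ coroutines; all names and data here are invented for illustration): each lookup suspends at the point where a cache miss would stall, and a simple round-robin scheduler resumes the suspended searches so one lookup's stall overlaps with another's work.

```python
def binary_search(sorted_keys, target):
    """Generator-based binary search; yields at each probe, where a real
    engine would issue a prefetch for sorted_keys[mid] and suspend."""
    lo, hi = 0, len(sorted_keys) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        yield  # suspension point (the memory stall would happen here)
        if sorted_keys[mid] == target:
            return mid
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def interleave(searches):
    """Round-robin scheduler: resumes each suspended search in turn."""
    results = [None] * len(searches)
    pending = list(enumerate(searches))
    while pending:
        still_running = []
        for i, search in pending:
            try:
                next(search)
                still_running.append((i, search))
            except StopIteration as done:
                results[i] = done.value  # the generator's return value
        pending = still_running
    return results

keys = [2, 3, 5, 7, 11, 13, 17, 19]
print(interleave([binary_search(keys, t) for t in (5, 19, 4)]))  # [2, 7, -1]
```

Note how, as the paper argues for real coroutines, the search body reads almost exactly like ordinary non-interleaved binary search; only the `yield` marks the suspension point.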
Follow PingCAP on Twitter: https://twitter.com/PingCAP
Follow PingCAP on LinkedIn: https://www.linkedin.com/company/13205484/
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir... (InfluxData)
Dean will provide practical tips and techniques learned from helping hundreds of customers deploy InfluxDB and InfluxDB Enterprise. This includes hardware and architecture choices, schema design, configuration setup, and running queries.
InfluxDB 2.0: Dashboarding 101 by David G. Simmons (InfluxData)
InfluxDB 2.0 has some new dashboarding and querying capabilities that will make using a time series database even easier. This InfluxDays NYC 2019 presentation, delivered by David G. Simmons (Senior Developer Evangelist at InfluxData), walks you through setting up your first dashboard.
Unifying Frontend and Backend Development with Scala - ScalaCon 2021 (Taro L. Saito)
Scala can be used for developing both frontend (Scala.js) and backend (Scala JVM) applications. A missing piece has been bridging these two worlds with Scala itself. We built Airframe RPC, a framework that uses Scala traits as a unified RPC interface between servers and clients. With Airframe RPC, you can build HTTP/1 (Finagle) and HTTP/2 (gRPC) services just by defining Scala traits and case classes. It simplifies web application design because you only need to care about Scala interfaces, without relying on existing web standards like REST, Protocol Buffers, or OpenAPI. Airframe's Scala.js support also enables building interactive web applications that dynamically render DOM elements while talking to Scala-based RPC servers. With Airframe RPC, Scala developers can deliver much more value across both frontend and backend work.
This is the slide deck used for introducing TiDB, an open source MySQL-compatible HTAP distributed database, at the SF DevOps meetup on August 20, 2018.
"Smooth Operator" [Bay Area NewSQL meetup]Kevin Xu
This slide deck was delivered at the Bay Area NewSQL meetup in California on how TiDB, an open source NewSQL distributed database, is deployed and managed on any Kubernetes-enabled cloud environment by applying the Operator pattern.
This slide deck was delivered at the Kubernetes/Docker meetup in Cologne, Germany, hosted by Giant Swarm, on how TiDB, an open source NewSQL distributed database, is deployed and managed on any Kubernetes-enabled cloud environment by applying the Operator pattern.
This slide deck was delivered at the Bay Area In-Memory Computing meetup in California on how TiDB, an open source NewSQL distributed database, is deployed and managed on any Kubernetes-enabled cloud environment by applying the Operator pattern.
When Apache Spark Meets TiDB with Xiaoyu Ma (Databricks)
During the past 10 years, big data storage layers have mainly focused on analytical use cases. For analytical workloads, users usually offload data onto a Hadoop cluster and perform queries on HDFS files. People struggle with modifications on append-only storage and with maintaining fragile ETL pipelines.
On the other hand, although Spark SQL has proven to be an effective parallel query processing engine, some tricks common in traditional databases are not available due to the characteristics of the underlying storage. TiSpark sits directly on top of a distributed database's (TiDB's) storage engine, expands Spark SQL's planning with its own extensions, and utilizes unique features of the database storage engine to achieve functions not possible for Spark SQL on HDFS. With TiSpark, users are able to perform queries directly on changing, fresh data in real time.
The takeaways from this talk are twofold:
— How to integrate Spark SQL with a distributed database engine and the benefits of doing so
— How to leverage Spark SQL's experimental methods to extend its capabilities.
This presentation provides an overview of the architecture and technology of TiDB, an open-source distributed NewSQL database, and how it helps Mobike, one of the largest dockless bikeshare platforms, scale its infrastructure to achieve hyper-growth.
How QBerg scaled to store data longer, query it faster (MariaDB plc)
The continuous increase in the services and countries to which QBerg delivers requires an ever-increasing load of resources. During the last year QBerg reached a critical point, storing so much transactional data that standard relational databases were unable to meet the SLAs, or support the features, required by customers. As an example, they had to cap web analytics to running on a maximum of four months of history. The introduction of MariaDB ColumnStore, flanked by existing MariaDB Server databases, not only allows them to store multiple years' worth of historical data for analytics – it also decreased overall processing time by an order of magnitude right off the bat. The move to a unified platform was incremental, using MariaDB MaxScale as both a router and a replicator. QBerg is now able to replicate full InnoDB schemas to MariaDB ColumnStore and incrementally update big tables without impacting the performance of ongoing transactions.
High Performance Computing (HPC), Storage, Networking, Supercomputing, Beowulf Clusters, Datacenters, IT Infrastructure, Linux, Open Source – William Wu, Director of Product Management at Penguin Computing, presented at the Penguin Computing booth theater at SC18 on how hyperscale infrastructure should be built and designed to handle the increasing demand for high-performance computing, storage, and networking.
This deck was the keynote speech delivered by Kevin Xu (GM of Global Strategy and Operations at PingCAP) and Shen Li (VP of Engineering at PingCAP) on TiDB architecture, tools and migration path, and the fully-managed TiDB Cloud offering, at Percona Live Europe 2018 in Frankfurt, Germany.
Free GitOps Workshop + Intro to Kubernetes & GitOps (Weaveworks)
Follow along in this free workshop and experience GitOps!
AGENDA:
Welcome - Tamao Nakahara, Head of DX (Weaveworks)
Introduction to Kubernetes & GitOps - Mark Emeis, Principal Engineer (Weaveworks)
Weave Gitops Overview - Tamao Nakahara
Free Gitops Workshop - David Harris, Product Manager (Weaveworks)
If you're new to Kubernetes and GitOps, we'll give you a brief introduction to both and how GitOps is the natural evolution of Kubernetes.
Weave GitOps Core is a continuous delivery product to run apps in any Kubernetes. It is free and open source, and you can get started today!
https://www.weave.works/product/gitops-core
If you’re stuck, also come talk to us at our Slack channel! #weave-gitops http://bit.ly/WeaveGitOpsSlack (If you need to invite yourself to the Slack, visit https://slack.weave.works/)
Building a Data Pipeline using Apache Airflow (on AWS / GCP) (Yohei Onishi)
This is the slide deck I presented at PyCon SG 2019. I talked about an overview of Airflow and how we can use Airflow and other data engineering services on AWS and GCP to build data pipelines.
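For readers new to Airflow, the core idea of such a pipeline is a DAG of tasks executed in dependency order. A minimal conceptual sketch in plain Python (this is not Airflow's actual API, which declares `DAG` objects and operators via the `airflow` package; the task names here are invented):

```python
# Conceptual sketch: an extract -> transform -> load pipeline expressed as a
# DAG and run in topological order, the scheduling idea Airflow implements.
from graphlib import TopologicalSorter

def extract():
    return ["2019-01-01,42", "2019-01-02,17"]

def transform(rows):
    return [row.split(",") for row in rows]

def load(rows):
    return f"loaded {len(rows)} rows"

# each task maps to the set of upstream tasks it depends on
dependencies = {"transform": {"extract"}, "load": {"transform"}}
order = list(TopologicalSorter(dependencies).static_order())

results = {}
for task in order:
    if task == "extract":
        results[task] = extract()
    elif task == "transform":
        results[task] = transform(results["extract"])
    else:
        results[task] = load(results["transform"])

print(order)            # ['extract', 'transform', 'load']
print(results["load"])  # 'loaded 2 rows'
```

A real Airflow deployment adds what this sketch omits: scheduling intervals, retries, backfills, and operators that call out to services such as S3, BigQuery, or GCS.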
Using druid for interactive count distinct queries at scale (Itai Yaffe)
At NMC (Nielsen Marketing Cloud) we need to present to our clients the number of unique users who meet a given criterion. The condition is typically a set-theoretic expression over a stream of events for a given time range. Historically, we used Elasticsearch to answer these types of questions; however, we encountered major scaling issues. In this presentation we detail the journey of researching, benchmarking, and productionizing a new technology, Druid, with DataSketches, to overcome the limitations we were facing.
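Count-distinct at this scale is typically answered with approximate sketches; Druid's DataSketches integration uses Theta/HLL sketches, but the underlying idea can be illustrated with a toy K-Minimum-Values estimator (illustration only, not the DataSketches algorithm; all names are invented):

```python
import hashlib
import heapq

def kmv_estimate(items, k=512):
    """K-Minimum-Values estimate of the number of distinct items.
    For clarity this keeps every distinct hash; a production sketch
    retains only the k smallest values in bounded memory."""
    hashes = set()
    for item in items:
        digest = hashlib.sha1(str(item).encode()).digest()
        hashes.add(int.from_bytes(digest[:8], "big") / 2**64)  # hash -> [0, 1)
    smallest = heapq.nsmallest(k, hashes)
    if len(smallest) < k:
        return len(smallest)            # fewer than k distinct values: exact
    return int((k - 1) / smallest[-1])  # classic KMV estimator

exact = kmv_estimate(["a", "b", "a"])  # small inputs are counted exactly
approx = kmv_estimate(f"user-{i}" for i in range(100_000))
print(exact, approx)  # 2, and a value within a few percent of 100000
```

Because such sketches are mergeable (the union of two KMV sketches is just the k smallest of the combined values), they also support the set-theoretic expressions over event streams described above.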
Dyn delivers exceptional Internet performance. Enabling high quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple data centers to enable sub-50 ms query responses for hundreds of billions of data points. From granular DNS traffic data to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks, and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements which led them to choose DSE as their go-to big data solution, the path which led to Spark, and the lessons they've learned in the process.
[Study Guide] Google Professional Cloud Architect (GCP-PCA) Certification (Amaaira Johns)
Start Here---> https://bit.ly/3bGEd9l <---Get complete details on the GCP-PCA exam guide to crack the Professional Cloud Architect certification. You can collect all information on the GCP-PCA tutorial, practice tests, books, study material, exam questions, and syllabus. Firm up your knowledge of the Professional Cloud Architect role and get ready to crack the GCP-PCA certification. Explore all information on the GCP-PCA exam, including the number of questions, passing percentage, and time allotted to complete the test.
Accelerate Enterprise Software Engineering with Platformless (WSO2)
Key takeaways:
- Challenges of building platforms and the benefits of platformless.
- Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
- How Choreo enables the platformless experience.
- How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
- Demo of an end-to-end app built and deployed on Choreo.
SOCRadar Research Team: Latest Activities of IntelBroker (SOCRadar)
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntelBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Developing Distributed High-performance Computing Capabilities of an Open Sci... (Globus)
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
First Steps with Globus Compute Multi-User Endpoints (Globus)
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Understanding Globus Data Transfers with NetSage (Globus)
NetSage is an open, privacy-aware network measurement, analysis, and visualization service designed to help end users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks worldwide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including:
- Who is using Globus to share data with my institution, and what kind of performance are they able to achieve?
- How many transfers has Globus supported for us?
- Which sites are we sharing the most data with, and how is that changing over time?
- How is my site using Globus to move data internally, and what kind of performance do we see for those transfers?
- What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to that of Globus users?
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
How Recreation Management Software Can Streamline Your Operations.pptx (wottaspaceseo)
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Advanced Flow Concepts Every Developer Should Know (Peter Caitens)
Tim Combridge from Sensible Giraffe and Salesforce Ben presents some important tips that all developers should know when dealing with Flows in Salesforce.
Software Engineering, Software Consulting, Tech Lead. Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security, Spring Transaction, Spring MVC, Log4j, REST/SOAP web services.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Strategies for Successful Data Migration Tools.pptxvarshanayak241
Data migration is a complex but essential task for organizations aiming to modernize their IT infrastructure and leverage new technologies. By understanding common challenges and implementing these strategies, businesses can achieve a successful migration with minimal disruption. Data Migration Tool like Ask On Data play a pivotal role in this journey, offering features that streamline the process, ensure data integrity, and maintain security. With the right approach and tools, organizations can turn the challenge of data migration into an opportunity for growth and innovation.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Designing for Privacy in Amazon Web ServicesKrzysztofKkol1
Data privacy is one of the most critical issues that businesses face. This presentation shares insights on the principles and best practices for ensuring the resilience and security of your workload.
Drawing on a real-life project from the HR industry, the various challenges will be demonstrated: data protection, self-healing, business continuity, security, and transparency of data processing. This systematized approach allowed to create a secure AWS cloud infrastructure that not only met strict compliance rules but also exceeded the client's expectations.
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
2. Agenda
● History and Community
● Technical Walkthrough
● Use Case with Mobike
● Q&A
● (Time Permitting) TiDB on Google Kubernetes Engine
3. A little about me
● General Manager of Global Strategy and
Operations
● Studied CS and Law at Stanford
● Programs in JavaScript and Python;
(more recently) learning Rust
4. A little about PingCAP
● Founded in April 2015 by 3
infrastructure engineers
● Offices throughout North America
and China
6. PingCAP.com
Our Product: the TiDB Platform
● TiDB Platform (Ti = Titanium)
○ TiDB (SQL Layer)
○ TiKV (Key-Value Storage)
○ TiSpark (Spark plugin to TiKV)
● Open source from Day 1
○ GA 1.0: October 2017
○ GA 2.0: April 2018
11. TiKV (in CNCF): The Storage Foundation
[Diagram: four TiKV nodes (Store 1 to Store 4), each holding several Regions (Region 1 to Region 5); replicas of the same Region across nodes form a Raft group. Clients talk to each node over gRPC, and a three-node Placement Driver cluster (PD 1, PD 2, PD 3) manages placement.]
12. TiDB: The (My)SQL Layer
[Diagram: clients (ODBC/JDBC, the MySQL client, or any ORM that supports MySQL) connect to TiDB nodes (Node1 through Node4) over the MySQL network protocol; each TiDB node contains a SQL parser and a cost-based optimizer, and pushes work down to TiKV through the Coprocessor API.]
14. Join Support
● Hash Join (fastest; used when the table has <= 50 million rows)
● Sort Merge Join (join on an indexed column or an ordered
data source)
● Index Lookup Join (join on an indexed column; ideally
after a filter, the result is < 10,000 rows)
● The join strategy is chosen by the cost-based optimizer
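To illustrate why an ordered input makes Sort Merge Join cheap, here is a minimal sketch of the algorithm (an illustrative toy, not TiDB's actual Go implementation; the row shapes and key names are invented for the example):

```python
def sort_merge_join(left, right, key_left, key_right):
    """Join two lists of row dicts that are already sorted on their
    join keys. Runs in O(n + m) once the inputs are ordered, which is
    why this strategy suits indexed columns and ordered data sources."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key_left], right[j][key_right]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit the full run of equal keys on the right side so
            # duplicate keys produce a proper cross product.
            j_start = j
            while j < len(right) and right[j][key_right] == lk:
                out.append({**left[i], **right[j]})
                j += 1
            i += 1
            j = j_start  # rewind in case the next left row has the same key
    return out
```

Because both cursors only move forward (apart from the bounded rewind over equal-key runs), neither side needs to be rehashed or buffered in full, unlike a hash join.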
15. TiSpark: Complex OLAP
[Diagram: a Spark Driver coordinates Spark executors, each running the TiSpark plugin; TiSpark retrieves data locations from the Placement Driver (PD) and then retrieves the real data from TiKV nodes over gRPC in the distributed storage layer.]
16. Placement Driver
● Provides a god’s-eye view of
the entire cluster
● Stores metadata, balances
workload, and issues
timestamps
● Itself a cluster, built on
embedded etcd
[Diagram: three Placement Driver nodes replicating state to one another via Raft]
17. Transaction Model
● Timestamp Oracle service (from
Google’s Percolator)
● 2-Phase Commit protocol (2PC)
● Problem: the timestamp oracle is a
single point of failure
● Solution: run PD as an HA cluster
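The shape of the two-phase commit protocol mentioned above can be sketched as follows (a toy single-process illustration; the Participant class and the write format are invented for the example, and TiDB's real implementation is the distributed Percolator model):

```python
class Participant:
    """A toy resource manager: votes in phase 1, applies in phase 2."""
    def __init__(self, name):
        self.name = name
        self.staged = None      # write held between prepare and commit
        self.committed = {}     # durable state

    def prepare(self, write):
        # Phase 1: stage the write and vote yes (a real participant
        # would vote no if it could not guarantee the write).
        self.staged = write
        return True

    def commit(self):
        # Phase 2a: make the staged write durable.
        key, value = self.staged
        self.committed[key] = value
        self.staged = None

    def abort(self):
        # Phase 2b: discard the staged write.
        self.staged = None


def two_phase_commit(participants, writes):
    """Apply all writes atomically: either every participant commits,
    or every participant aborts."""
    votes = [p.prepare(w) for p, w in zip(participants, writes)]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

The single point of failure the slide names is the coordinator role (and in TiDB's case the timestamp oracle), which is why PD runs as a Raft-replicated HA cluster.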
26. Scenario #1: Locking/Unlocking
● Locking and unlocking smart bikes
generates massive amounts of data
● A smooth experience is key to user retention
● TiDB supports this system by alerting
administrators within minutes when the
locking/unlocking success rate drops
● Malfunctioning bikes are found quickly
27. Scenario #2: Real-Time Analysis
● Synchronizes TiDB with MySQL
instances using Syncer
(PingCAP’s replication tool)
● TiDB + TiSpark empower
real-time analysis with
horizontal scalability
● No need for Hadoop + Hive
28. Scenario #3: Mobike Store
● An innovative loyalty program that
must be available 24x7x365
● TiDB provides:
○ High-concurrency for peak or
promotional season
○ Permanent storage
○ Horizontal scalability
● No interruption as business evolves
29. Thank You!
20% OFF KubeCon:
KCNA18SPR
kevin@pingcap.com
@kevinsxu; @pingcap
TiDB Cloud Early Access:
https://www.pingcap.com/tidb-cloud/
TiDB Academy Sign-up:
www.pingcap.com/tidb-academy/
32. CBO 101
● An operator’s total cost combines a network cost, a memory cost, and a CPU cost, each weighted by a factor
● In TiDB, the default memory factor is 5 and the default CPU factor is 0.8
● For example, the cost of an operator Sort(r) adds a CPU term for sorting and a memory term for buffering rows on top of the cost of r
● TiDB maintains histograms of the data to estimate row counts
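Such a weighted cost function can be sketched as follows. The factor values (memory = 5, CPU = 0.8) come from the slide, but the exact expression for Sort(r) is an assumption for illustration, not TiDB's real formula:

```python
import math

# Default weighting factors from the slide.
MEMORY_FACTOR = 5.0
CPU_FACTOR = 0.8

def sort_cost(child_cost, row_count):
    """Illustrative estimated cost of Sort(r): the cost of producing
    the child r, plus an O(n log n) CPU term for comparison sorting,
    plus a memory term for buffering all rows before output."""
    cpu = CPU_FACTOR * row_count * math.log2(max(row_count, 2))
    mem = MEMORY_FACTOR * row_count
    return child_cost + cpu + mem
```

The optimizer would compute an estimate like this for each candidate plan, using histogram-derived row counts, and keep the cheapest.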
33. Relational -> KV
The “User” Table:
ID  Name    Email
1   Edward  h@pingcap.com
2   Tom     tom@pingcap.com
...
In TiKV: one global sorted map spanning (-∞, +∞), with each contiguous key range stored as a Region:
user/1 → Edward,h@pingcap.com
user/2 → Tom,tom@pingcap.com
...
34. Index Structure
Row:
Key: tablePrefix_rowPrefix_tableID_rowID (IDs are assigned by TiDB, all int64)
Value: [col1, col2, col3, col4]
Index:
Key: tablePrefix_idxPrefix_tableID_indexID_ColumnsValue_rowID
Value: [null]
Keys are ordered as byte arrays in TiKV, so range SCANs are supported
Every key has a timestamp appended, issued by the Placement Driver
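The key layout above can be sketched as follows. This toy uses human-readable strings with "_" separators for clarity; TiKV actually encodes keys as order-preserving byte arrays (so, for instance, multi-digit integer IDs still sort numerically, which plain strings would not):

```python
def row_key(table_id, row_id):
    """tablePrefix_rowPrefix_tableID_rowID (prefixes are placeholders)."""
    return f"t_r_{table_id}_{row_id}"

def index_key(table_id, index_id, column_value, row_id):
    """tablePrefix_idxPrefix_tableID_indexID_ColumnsValue_rowID."""
    return f"t_i_{table_id}_{index_id}_{column_value}_{row_id}"

# Because keys sort lexicographically, all rows of one table and all
# entries of one index are stored adjacently, so a range SCAN over a
# shared key prefix retrieves them in order.
keys = sorted([row_key(10, 2), index_key(10, 1, "Edward", 1), row_key(10, 1)])
```

Scanning the `t_i_10_1_` prefix walks the index entries, each of which carries the rowID needed to look up the full row.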
35. Guaranteeing Correctness
● Formal proofs using TLA+
○ a formal specification and verification language for reasoning about
and proving properties of complex systems
● Raft
● TSO/Percolator
● 2PC
● See details: https://github.com/pingcap/tla-plus
36. MySQL Compatibility - Summary
● Compatible with MySQL 5.7
○ Joins, subqueries, DML, DDL,
etc.
● On the roadmap:
○ Views, Window Functions
● Missing:
○ Stored Procedures, Triggers,
Events, Fulltext
pingcap.com/docs/sql/mysql-compatibility/
37. MySQL Compatibility - Nuanced
● Some features work differently
○ Auto Increment
○ Optimistic Locking
● TiDB works better with smaller
transactions
○ Recommended to batch updates,
deletes, and inserts into chunks of
up to 5,000 rows
pingcap.com/docs/sql/mysql-compatibility/
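The batching advice above can be sketched as follows. Here `execute_batch` is a hypothetical stand-in for whatever MySQL-client call the application uses to run one transaction against TiDB:

```python
# Split a large write into chunks of at most 5,000 rows, the batch size
# recommended on the slide, so each chunk commits as its own small
# transaction instead of one huge one.
BATCH_SIZE = 5000

def chunked(rows, size=BATCH_SIZE):
    """Yield successive fixed-size slices of `rows`."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def batched_insert(rows, execute_batch):
    """Insert `rows` as a series of small transactions, one per chunk."""
    for batch in chunked(rows):
        execute_batch(batch)  # e.g. a multi-row INSERT inside one txn
```

Keeping each transaction small bounds its memory footprint and lock lifetime, which is why large bulk updates and deletes benefit from the same chunking.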