Dharma will describe the internals of the system design and the various design trade-offs the team had to make while building the Azure Cosmos DB service. He will also share his experiences from operating a globally distributed database service and maintaining comprehensive Service Level Agreements (SLAs).
WSO2Con ASIA 2016: An Introduction to the WSO2 Analytics Platform | WSO2
In today’s connected world, organizations have access to an enormous amount of data. We often don’t know what it means or how we can use it, in terms of hindsight, oversight, insight and foresight, to gain competitive advantage in the market. Use cases ranging from simple system monitoring to complex fraud analysis demand this.
The WSO2 Data Analytics platform lets you collect data, allows you to explore it through batch, real-time, interactive and predictive processing technologies and allows you to communicate your results. In this talk, we will discuss the WSO2 Data Analytics platform and how it brings together all analytics technologies into a single platform and user experience.
The workshop covers the HBase data model, architecture and schema design principles.
Source code demo:
https://github.com/moisieienko-valerii/hbase-workshop
Paolo Castagna is a Senior Sales Engineer at Confluent. His background is in 'big data' and he has seen first-hand the shift happening in the industry from batch to stream processing and from big data to fast data. His talk will introduce Kafka Streams and explain why Apache Kafka is a great option and simplification for stream processing.
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da... | WSO2
Today’s highly connected world is flooding businesses with big and fast-moving data. The ability to trawl this data ocean and identify actionable insights can deliver a competitive advantage to any organization. The WSO2 Analytics Platform enables businesses to do just that by providing batch, real-time, interactive and predictive analysis capabilities all in one place.
In this tutorial we will
Plug in the WSO2 Analytics Platform to some common business use cases
Showcase the numerous capabilities of the platform
Demonstrate how to collect data, analyze, predict and communicate effectively
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized | HostedbyConfluent
Enforcing formats, changing schemas and introducing privacy filters have always been a challenge with the classical Kafka-API. In this talk we'll cover how to extend existing applications with WebAssembly, allowing developers to change the shape of data at runtime, per application, without creating additional topics. By leveraging WebAssembly, we can extend the capabilities of the Kafka-API beyond what was initially imagined. Come and learn about the future of the Kafka-API.
(PFC308) How Dropbox Scales Massive Workloads Using Amazon SQS | AWS re:Inven... | Amazon Web Services
In this session, learn how Dropbox scales to provide one of the largest cloud storage and file sharing services in the world. Hear how Dropbox leverages Amazon EC2 to run varied workloads including thumbnail generation and document preview, as well as document indexing to support full-text search. Dropbox presents "Livefill", a generic framework built on top of Amazon SQS. Livefill enables them to trigger customizable data-processing workloads on data stored in Amazon S3 and helps them support more than 200,000 workload requests per second, spread across thousands of machines.
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir... | HostedbyConfluent
If a real-time dashboard takes 5 minutes to refresh, it’s not real-time. With data lakes increasingly enabling massive amounts of unprocessed data sets, delivering low-latency analytics is not for the faint-hearted. Learn how to stream massive amounts of data which used to be impossible to handle from Kafka, to serve real-time applications using lake-scale optimized approaches to storage and indexing.
A short summary of stream reasoning/CEP and an overview of Apache Storm/Apache Spark, presented at #devspace / OpenSpace 2015 in Leipzig.
Relevant "fundamentals" that I used as a starting point, among others:
http://www.kr.tuwien.ac.at/staff/beck/pub/ijcai15-bde.pdf
http://www.kr.tuwien.ac.at/staff/beck/pub/aaai2015.pdf
http://www.kr.tuwien.ac.at/staff/beck/slides/slides-aaai2015.pdf
Cassandra @ Sony: The good, the bad, and the ugly part 1 | DataStax Academy
This talk covers scaling Cassandra to a fast-growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the PlayStation community.
Spark Streaming & Kafka - The Future of Stream Processing | Jack Gudenkauf
Hari Shreedharan/Cloudera @Playtika. With its easy-to-use interfaces and native integration with some of the most popular ingest tools, such as Kafka, Flume and Kinesis, Spark Streaming has become the go-to tool for stream processing. Code sharing with Spark also makes it attractive. In this talk, we will discuss the latest features in Spark Streaming and how it integrates with Kafka natively with no data loss, and even does exactly-once processing!
Tuning Java Driver for Apache Cassandra | Nenad Bozic
Apache Cassandra is a distributed, masterless column-store database that is becoming mainstream for analytics and IoT data. Many use cases where Cassandra is a natural fit require latency tuning in order to serve requests really fast. The DataStax driver has many options, some less familiar, which can greatly influence performance. This talk will focus on Java applications and the options at your disposal in the DataStax Java driver, which has become the standard when you want to use this database. We will concentrate on both monitoring and tuning, and we will provide different options for different use cases. There is no silver-bullet solution, and having many options requires a deep dive when you want to figure out the right decision. This talk will narrow down the options and give you a push in the right direction.
Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications | Data Con LA
Adoption of open source software (OSS) at the enterprise level has flourished, as more businesses discover the considerable advantages that open source solutions hold over their proprietary counterparts, and as the enterprise mentality around open source continues to shift. We will discuss how to identify good application candidates for Apache Cassandra and Kafka as well as best practices and common pitfalls.
This presentation will also cover:
The origins of Apache Cassandra and Kafka and how these technologies have come to shape how next-gen applications are built.
Production use cases of Cassandra and Kafka: Real-time payments and buying a house (Lendi and Worldpay)
Core concepts that make the magic: explaining the technical attributes that make your project a good fit for these technologies and the architectural patterns that make the best use of their capabilities.
Speaker: Adam Zegelin, SVP Engineering and Co-Founder, Instaclustr
As Instaclustr's founding software engineer, Adam provides the foundation knowledge of Instaclustr's capability and engineering environment. Adam is also focused on providing Instaclustr's contribution to the broader open source community on which our products and the services rely, including Apache Cassandra, Apache Spark, and other core technologies such as CoreOS and Docker. Prior to founding Instaclustr, Adam worked on large-scale big data projects with Australian Government agencies.
Big Data Architecture Workshop - Vahid Amiri | datastack
Big Data Architecture Workshop
This slide deck is about the big data tools, technologies and layers that can be used in enterprise solutions.
TopHPC Conference
2019
Azure Cosmos DB - The Swiss Army NoSQL Cloud Database | BizTalk360
Microsoft Cosmos DB is the Swiss Army NoSQL database in the cloud. It is a multi-model, multi-API, globally distributed, highly available, and secure NoSQL database in Azure. In this session, we will explore its capabilities and features through several demos.
Learn the essentials of Microsoft Azure for developers.
Microsoft Azure is a growing collection of integrated cloud services which developers and IT professionals use to build, deploy and manage applications through our global network of datacentres. With Azure, you get the freedom to build and deploy wherever you want, using the tools, applications and frameworks of your choice.
Data Lake and the rise of the microservices | Bigstep
By simply looking at structured and unstructured data, Data Lakes enable companies to understand correlations between existing and new external data - such as social media - in ways traditional Business Intelligence tools cannot.
For this you need to find out the most efficient way to store and access structured or unstructured petabyte-sized data across your entire infrastructure.
In this meetup we’ll answer the following questions:
1. Why would someone use a Data Lake?
2. Is it hard to build a Data Lake?
3. What are the main features that a Data Lake should bring in?
4. What’s the role of the microservices in the big data world?
NoSQL – Data Center Centric Application Enablement | DATAVERSITY
The growth of data center infrastructure is trending out of bounds, along with the pace of user activity and data generation in this digital era. However, the nature of the typical application deployment within the data center is changing to accommodate new business needs. Those changes introduce complexities in application deployment architecture and design, which cascade into requirements for a new generation of database technology (NoSQL) destined to ease that complexity. This webcast will discuss the modern data center's data-centric application, the complexities that must be dealt with, and common architectures found to describe and prescribe new data-center-aware services. We'll look at the practical issues in implementation and give an overview of the current state of the art in NoSQL database technology solving the problems of data center awareness in application development.
Massively scalable ETL in real world applications: the hard way | J On The Beach
Big Data examples always give the correct answers. However, in the real world, Big Data might be corrupt, contradictory or consist of so many small files it becomes extremely hard to keep track - let alone scale. A solid architecture will help to overcome many of the difficulties.
Floris will talk about a real-world implementation of a massively scalable ETL architecture. Two years ago, at the time of the implementation, Airflow had just become part of Apache and still left much to be desired. The requirements from the start were thousands of ETL tasks per day on average, and on occasion this could grow to hundreds of thousands. The script-based method that was in place was already unable to meet the requirements on a day-to-day basis and needed to be replaced as soon as possible. So this custom framework was rolled out in just 8 weeks of development time.
Traditional Big Data is done on Data you have. You load the data into a repository and perform map reduce or other style calculations on the data. However, certain industries need to perform complex operations on data you might not have. Data you can acquire, Data that can be shared with you, and Data that you can model are all types of data you may not have but may need to integrate instantly into a complex data analysis. Problem is: you may not even know you need this data until deep into the execution stack at runtime. This talk discusses a new functional language paradigm for dealing naturally with data you don’t have and about how to make all data first-class citizens, regardless of whether you have it or you don’t, and we will give a demo of a project written in Scala to deal exactly with this issue.
Acoustic Time Series in Industry 4.0: Improved Reliability and Cyber-Security... | J On The Beach
Industry 4.0, aka the "Fourth Industrial Revolution," refers to the computerization of manufacturing. One important aspect of Industry 4.0 is the ability to monitor the health and reliability of a physical manufacturing plant using low-cost IoT sensors. For example, machine learning models can be trained to predict the physical degradation of a manufacturing system as a function of acoustic measurements obtained from strategically placed microphones; however, the same acoustic measurements can be used to reverse engineer proprietary information about the manufacturing process and/or precisely what is being manufactured at the time of recording. Thus, improved reliability and fault tolerance is achieved at the cost of what appears to be an unprecedented new class of security vulnerabilities related to the acoustic side channel.
As a case study, we report a novel acoustic side channel attack against a commercial DNA synthesizer, a commonly used instrument in fields such as synthetic biology. Using a smart phone-quality microphone placed on or in the near vicinity of a DNA synthesizer, we were able to determine with 88.07% accuracy the sequence of DNA being produced; using a database of biologically relevant known-sequences, we increased the accuracy of our model to 100%. An academic or industrial research project may use the synthetic DNA to engineer an organism with desired traits or functions; however, while the organism is still under development, prior to publication, patent, and/or copyright, the research remains vulnerable to academic intellectual property theft and/or industrial espionage. On the other hand, this attack could also be used for benevolent purposes, for example, to determine whether a suspected criminal or terrorist is engineering a harmful pathogen. Thus, it is essential to recognize both the benefits and risks inherent to the cyber-physical systems that will inevitably control Industry 4.0 manufacturing processes and to take steps to mitigate them whenever possible.
Where is the edge in IoT and how much can you do there? Data collection? Analytics? I’ll show you how to build and deploy an embedded IoT edge platform that can do data collection, analytics, dashboarding and much more. All using Open Source.
As IoT deployments move forward, the need to collect, analyze, and respond to data further out on the edge becomes a critical factor in the success – or failure – of any IoT project. Network bandwidth costs may be dropping, and storage is cheaper than ever, but at IoT scale, these costs can still quickly overrun a project’s budget and ultimately doom it to failure.
The more you centralize your data collection and storage, the higher these costs become. Edge data collection and analysis can dramatically lower these costs, plus decrease the time to react to critical sensor data. With most data platforms, it simply isn’t practical, or even possible, to push collection AND analytics to the edge. In this talk I’ll show how I’ve done exactly this with a combination of open source hardware – Pine64 – and open source software – InfluxDB – to build a practical, efficient and scalable data collection and analysis gateway device for IoT deployments. The edge is where the data is, so the edge is where the data collection and analytics needs to be.
Drinking from the firehose, with virtual streams and virtual actors | J On The Beach
Event Stream Processing is a popular paradigm for building robust and performant systems in many different domains, from IoT to fraud detection to high-frequency trading. Because of the wide range of scenarios and requirements, it is difficult to conceptualize a unified programming model that would be equally applicable to all of them. Another tough challenge is how to build streaming systems with cardinalities of topics ranging from hundreds to billions while delivering good performance and scalability.
In this session, Sergey Bykov will talk about the journey of building Orleans Streams that originated in gaming and monitoring scenarios, and quickly expanded beyond them. He will cover the programming model of virtual streams that emerged as a natural extension of the virtual actor model of Orleans, the architecture of the underlying runtime system, the compromises and hard choices made in the process. Sergey will share the lessons learned from the experience of running the system in production, and future ideas and opportunities that remain to be explored.
Over the last twenty years, there has been a paradigm shift in software development: from meticulously planned release cycles to an experimental way of working in which lead times are becoming shorter and shorter.
How can Java ever keep up with this trend when we have Docker containers that are several hundred megabytes in size, with warm-up times of ten minutes or longer? In this talk, I'll demonstrate how we can use Quarkus so that we can create super small, super fast Java containers! This will give us better possibilities for scaling up and down - which can be a game-changer, especially in a serverless environment. It will also provide the shortest possible lead times, as well as a much better use of cloud performance with the added bonus of lower costs.
When Cloud Native meets the Financial Sector | J On The Beach
We live in our own bubble of microservices and endlessly horizontal scaling infrastructure, but there is still critical infrastructure that runs the world of financial systems depending on Windows boxes, FTP servers, and single-threaded protocols. This talk is about how to glue these two worlds together, what works for us and what doesn't.
The advancement of technology in the last decade or so has allowed astronomy to see exponential growth in data volumes. ESA's space telescope Euclid will gather high-resolution images of a third of the sky, with ~850 GB of data downloaded daily for 6 years; by 2032 the ground-based telescope LSST will have generated 500 PB of data; and the radio telescope SKA will be producing more data per second than the entire internet worldwide. This talk will address what current techniques exist to handle big data volumes, how the astronomical community will prepare for this big data wave, and what other challenges lie ahead.
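To put the Euclid figure quoted above in perspective, here is a rough back-of-the-envelope sketch. It assumes 365-day years and decimal units (1 PB = 1,000,000 GB); the function and constant names are illustrative and do not come from the talk.

```python
# Rough estimate of Euclid's total downlink volume from the abstract's figures.
# Assumptions (not from the source): 365-day years, decimal units.
GB_PER_DAY = 850
MISSION_YEARS = 6
DAYS_PER_YEAR = 365
GB_PER_PB = 1_000_000


def total_petabytes(gb_per_day: float, years: float) -> float:
    """Total volume in PB for a constant daily rate over a number of years."""
    total_gb = gb_per_day * DAYS_PER_YEAR * years
    return total_gb / GB_PER_PB


euclid_pb = total_petabytes(GB_PER_DAY, MISSION_YEARS)
print(f"Euclid total: {euclid_pb:.2f} PB")  # prints "Euclid total: 1.86 PB"

# Ratio of the quoted 500 PB for LSST to the Euclid estimate.
lsst_to_euclid = 500 / euclid_pb
print(f"LSST / Euclid ratio: {lsst_to_euclid:.0f}x")
```

Under these assumptions the whole Euclid mission amounts to just under 2 PB, which is more than two orders of magnitude below the 500 PB quoted for LSST.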
The world is moving from a model where data sits at rest, waiting for people to make requests of it, to where data is constantly moving and streams of data flow to and from devices with or without human interaction. Decisions need to be made based on these streams of data in real-time, models need to be updated, and intelligence needs to be gathered. In this context, our old-fashioned approach of CRUD REST APIs serving CRUD database calls just doesn't cut it. It's time we moved to a stream-centric view of the world.
The TIPPSS Imperative for IoT - Ensuring Trust, Identity, Privacy, Protection... | J On The Beach
Our increasingly connected world leveraging the Internet of Things (IoT) creates great value, in connected healthcare, smart cities, and more. The increasing use of IoT also creates great risk. We will discuss the challenges and risks we need to address as developers in TIPPSS - Trust, Identity, Privacy, Protection, Safety, and Security - for devices, systems and solutions we deliver and use. Florence leads IEEE workstreams on clinical IoT and data interoperability with blockchain addressing TIPPSS issues. She is an author of IEEE articles on "Enabling Trust and Security - TIPPSS for IoT" and "Wearables and Medical Interoperability - the Evolving Frontier", "TIPPSS for Smart Cities" in the 2017 book "Creating, Analysing and Sustaining Smarter Cities: A Systems Perspective" , and Editor in Chief for an upcoming book on "Women Securing the Future with TIPPSS for IoT."
Pushing AI to the Client with WebAssembly and Blazor | J On The Beach
Want to run your AI algorithms directly in the browser on the client-side? Now you can with WebAssembly and Blazor. Join us as we write code directly in WebAssembly. Then, we’ll look at Blazor and how you can use it, along with WebAssembly to run your tooling client side in the browser.
Want to run your AI algorithms directly in the browser on the client-side without the need for transpilers or browser plug-ins? Well, now you can with WebAssembly and Blazor. WebAssembly (WASM) is the W3C specification that will be used to provide the next generation of development tools for the web and beyond. Blazor is Microsoft’s experiment that allows ASP.NET developers to create web pages that do much of the scripting work in C# using WASM. Come join us as we learn to write code directly in WebAssembly’s human-readable format. Then, we’ll look at the current state of Blazor and how you can use it, along with WebAssembly, to run your tooling client side in the browser.
The Raft protocol is a well-known protocol for consensus in distributed systems. Want to learn how consensus is achieved in a system with a large amount of data such as Axon Server’s Event Store? Join this talk to hear about all the specifics of data replication in a highly available Event Store!
Axon is a free and open source Java framework for writing Java applications following DDD, event sourcing, and CQRS principles. While especially useful in a microservices context, Axon provides great value in building structured monoliths that can be broken down into microservices when needed.
Axon Server is a messaging platform specifically built to support distributed Axon applications. One of its key benefits is storing events published by Axon applications. It is not rare for the number of these events to reach millions or even billions. Availability of Axon Server plays a significant role in the product portfolio. To keep event replication reliable, we chose the Raft protocol to implement consensus in our clustering features.
In short, consensus involves multiple servers agreeing on values. Once they reach a decision on a value, that decision is final. Typical consensus algorithms make progress when any majority of their servers is available; for example, a cluster of 5 servers can continue to operate even if 2 servers fail. If more servers fail, they stop making progress (but will never return an incorrect result).
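The majority rule described above can be sketched in a few lines. This is a minimal illustration of the quorum arithmetic only, not Axon Server's or any Raft library's actual code; the function names are hypothetical.

```python
# Quorum arithmetic for majority-based consensus (illustrative names only).

def majority(cluster_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many servers may fail while a majority remains available."""
    return cluster_size - majority(cluster_size)


def can_make_progress(cluster_size: int, failed: int) -> bool:
    """True if the surviving servers still form a majority."""
    return cluster_size - failed >= majority(cluster_size)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
# prints:
# 3 2 1
# 4 3 1
# 5 3 2
```

Note that a 4-server cluster tolerates no more failures than a 3-server one, which is why odd cluster sizes are the usual choice for majority-based protocols.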
Join this talk to learn why we chose Raft; what our findings were during the design, implementation, and testing phases; and what it means to replicate an event store holding billions of events!
The Six Pitfalls of building a Microservices Architecture (and how to avoid t... | J On The Beach
Thinking of moving to Microservices? Watch out! That quest is full of traps, social traps. If you are not able to handle it, you may be blocked by meetings, frustration, endless challenges that will make you miss the monolith. In this talk, I share my experience and mistakes, so you can avoid them.
Creating or migrating to a Microservices architecture might easily become a big mess, not only due to technical challenges but mostly because of human factors: it’s a major change in the software culture of a company. In this talk, I’ll share my past experience as the technical lead of an ambitious Microservices-based product, I’ll go through the parts we struggled with, and give you some advice on how to deal with what I call the Six Pitfalls:
The Common Patterns Phobia
The Book Club Cult
The Never-Decoupled Story
The Buzz Words Syndrome
The Agile Trap
The Conway’s Law Hackers
Instead of randomly injecting faults (e.g. Chaos Monkey), what if we could order our experiments to perform the minimum number of experiments for maximum yield? We present a solution (and results) to the problem of experiment selection using Lineage Driven Fault Injection to reduce the search space of faults.
Lineage Driven Fault Injection (LDFI) is a state-of-the-art technique in chaos engineering experiment selection. Since its inception, LDFI has used a SAT solver under the hood, which presents solutions to the decision problem (which faults to inject) in no particular order. As SREs, we would like to perform experiments that reveal the bugs that customers are most likely to hit first. In this talk, we present new improvements to LDFI that order the experiment suggestions.
In the first half of the talk we will show that LDFI is a technique that can be widely used within an enterprise. We present the motivation for ordering the chaos experiments along with some prioritization we utilized while conducting the experiments. We also highlight how ordering is a general-purpose technique that we can use to encode the peculiarities of a heterogeneous microservices architecture. LDFI can work in an enterprise by harnessing the observability infrastructure to model the redundancy of the system.
Next, we present experiments conducted within our organization using ordered LDFI and some preliminary results. We show examples of services where we discovered bugs, and how carefully controlling the order of experiments allowed LDFI to avoid running unnecessary experiments. We also present an example of an application where we declared the service shippable under the crash-stop model. We also present a comparison with Chaos Monkey and show how LDFI found the known bugs in a given application using orders of magnitude fewer experiments than a random fault injection tool like Chaos Monkey.
Finally, we discuss how we plan to take LDFI forward. We discuss open problems and possible solutions for scalarizing probabilities of failure, latency injection, integration with service mesh technologies like Envoy for fine-grained fault injection, and fault injection for stateful systems.
Key takeaways: 1) Understand how LDFI can be integrated in the enterprise by harnessing the observability infrastructure. 2) Limitations of LDFI w.r.t unordered solutions and why ordering matters for chaos engineering experiments. 3) Preliminary results of prioritized LDFI and a future direction for the community.
Complexity in systems should be defeated if it is possible to do. But the default nature of our computer systems are complex and servers are doomed to fail. In this talk, we will go through new approaches in modern architectures to design and evaluate new computer systems.
Interaction Protocols: It's all about good mannersJ On The Beach
Distributed systems collaborate to achieve collective goals via a system of rules: rules that afford good hygiene, fault tolerance, effective communication and trusted feedback. These rules form protocols which enable the system to achieve its goals.
Distributed and concurrent systems can be considered a social group that collaborates to achieve collective goals. In order to collaborate a system of rules must be applied, that affords good hygiene, fault tolerance, and effective communication to coordinate, share knowledge, and provide feedback in a polite trusted manner. These rules form a number of protocols which enable the group to act as a system which is greater than the sum of the individual components.
In this talk, we will explore the history of protocols and their application when building distributed systems.
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...J On The Beach
Do you want to check the efficiency of the new, state of the art, GraalVM JIT Compiler in comparison to the old but mostly used JIT C2? Let’s have a side by side comparison from a performance standpoint on the same source code.
The talk reveals how traditional Just In Time Compiler (e.g. JIT C2) from HotSpot/OpenJDK internally manages runtime optimizations for hot methods in comparison to the new, state of the art, GraalVM JIT Compiler on the same source code, emphasizing all of the internals and strategies used by each Compiler to achieve better performance in most common situations (or code patterns). For each optimization, there is Java source code and corresponding generated assembly code in order to prove what really happens under the hood.
Each test is covered by a dedicated benchmark (JMH), timings and conclusions. Main topics of the agenda:
- Scalar replacement
- Null checks
- Virtual calls
- Lock coarsening
- Lock elision
- Lambdas
- Vectorization (few cases)
The tools used during my research study are JITWatch, the Java Microbenchmark Harness (JMH), and perf. All test scenarios will be launched against the latest official Java release (e.g. version 11).
Leadership is easy when you're a manager, or an expert in a field, or a conference speaker! In a Kanban organisation, though, we "encourage acts of leadership at every level". In this talk, we look at what it means to be a leader in the uncertain, changing and high-learning environment of software development. We learn about the importance of safety in encouraging others to lead and follow, and how to get that safety using both technical and human practices; the necessity of a clear, compelling vision and provision of information on how we're achieving it; and the need to be able to ask awkward and difficult questions... especially the ones without easy answers.
Machine Learning: The Bare Math Behind LibrariesJ On The Beach
During this presentation, we will answer how much you’ll need to invest in a superhero costume to be as popular as Superman. We will generate a unique logo which will stand against the ever popular Batman and create new superhero teams. We shall achieve it using linear regression and neural networks.
Machine learning is one of the hottest buzzwords in technology today as well as one of the most innovative fields in computer science – yet people use libraries as black boxes without basic knowledge of the field. In this session, we will strip them to bare math, so next time you use a machine learning library, you’ll have a deeper understanding of what lies underneath.
During this session, we will first provide a short history of machine learning and an overview of two basic teaching techniques: supervised and unsupervised learning.
We will start by defining what machine learning is and equip you with an intuition of how it works. We will then explain the gradient descent algorithm with the use of simple linear regression to give you an even deeper understanding of this learning method. Then we will project it to supervised neural networks training.
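As a rough illustration of gradient descent on simple linear regression, the sketch below fits a line y = w*x + b by repeatedly stepping against the gradient of the mean squared error. It is written in plain Python rather than the Octave used in the session, and the data points, learning rate and number of epochs are made-up values chosen only so the example converges.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model on every sample.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up samples that lie exactly on y = 2x + 1.
w, b = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
```

The same loop, with the single weight and bias replaced by weight matrices and the gradients obtained by backpropagation, is the core of supervised neural network training.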
Within unsupervised learning, you will become familiar with Hebb’s learning and competitive learning (winner takes all and winner takes most algorithms). We will use Octave for examples in this session; however, you can use your favourite technology to implement presented ideas.
Our aim is to show the mathematical basics of neural networks for those who want to start using machine learning in their day-to-day work or use it already but find it difficult to understand the underlying processes. After viewing our presentation, you should find it easier to select parameters for your networks and feel more confident in your selection of network type, as well as be encouraged to dive into more complex and powerful deep learning methods.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I wondered, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our beloved cloud native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I already have working for real.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Lessons learnt from building a globally distributed database service from the ground up.
1. Azure Cosmos DB
Lessons learnt from building a globally distributed database from the ground up
Dharma Shukla, @dharmashukla, Distinguished Engineer, Microsoft
3. Timeline: 2010, 2014, 2015, 2017 (Project Florence, DocumentDB, Cosmos DB)
• Originally started to address the problems faced by large scale apps inside Microsoft
• Built from the ground up for the cloud
• Used extensively inside Microsoft
• One of the fastest growing services on Azure
4. Guaranteed high availability within region and globally
Guaranteed low latency at the 99th percentile, worldwide
Guaranteed consistency
Iterate & query without worrying about schemas & index management
Elastically scale throughput and storage, any time, on-demand, globally
Provide a variety of data model and API choices
Global distribution from the ground up
Fully resource governed stack
Comprehensive SLAs (availability, latency, throughput, consistency)
Operate at low cost
Schema-agnostic database engine
Requirements
Turnkey global distribution
6. Global distribution from the ground up
• Cosmos DB as a foundational Azure service
– Available in all Azure regions by default, including sovereign/government clouds
• Automatic multi-region replication
– Associate any number of regions with your database account
– Policy based geo-fencing
• Multi-homing APIs
– Apps don’t need to be redeployed during regional failover
• Allows for dynamically setting priorities to regions
– Simulate regional disaster via API
– Test the end to end availability for the entire app (beyond just the database)
• First to offer comprehensive SLA for latency, throughput, availability and consistency
7. • Globally distributed with reads and writes served from local region
• Write optimized, latch-free database engine designed for SSDs and low latency access
• Synchronous and automatic indexing at sustained ingestion rates
Guaranteed low latency @ P99
8. • System designed to independently scale storage and throughput
• Transparent server side partition management and routing
• Automatically indexed SSD storage
• Automatic global distribution of data across any number of Azure regions
• Optionally evict old data using built-in support for TTL
Elastically scalable storage
10. Elastically scale throughput from 10 to 100s of millions of requests/sec across multiple regions
Customers pay by the hour for the provisioned throughput
Transparent server side partition management and routing
Support for requests/sec and requests/min for different workloads
[Figure: provisioned requests/sec over time, alternating between less and more throughput between 9 PM PST and 11 PM PST]
[Figure: hourly throughput (requests/sec), Nov 2016 to Dec 2016, on an axis from 2,000,000 to 12,000,000, with Black Friday marked]
Elastically scalable throughput, globally
14. • At global scale, schema/index management is hard
• Automatic and synchronous indexing of all ingested content - hash, range, geo-spatial, and columnar
• No schemas or secondary indices ever needed
• Resource governed, write optimized database engine with latch free and log structured techniques
• Online and in-situ index transformations
Schema agnostic indexing
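[Figure: the JSON document below drawn as a tree. The root has the children locations, headquarter and exports; the array elements 0 and 1 under locations and exports hold the country and city leaves (Germany/Berlin, France/Paris, Moscow, Athens), and headquarter holds Belgium.]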
{
"locations":
[
{ "country": "Germany", "city": "Berlin" },
{ "country": "France", "city": "Paris" }
],
"headquarter": "Belgium",
"exports":[{ "city": "Moscow" },{ "city": "Athens"}]
}
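One way to picture how a document can be indexed without a schema is to walk it as a tree and record every path from the root to a leaf value. The Python sketch below does that for the document above. It is only an illustration of the idea: the path syntax, the function names and the in-memory inverted index are assumptions, not Cosmos DB's actual index format.

```python
def iter_paths(node, prefix=""):
    """Yield (path, value) pairs for every leaf of a JSON-like tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_paths(value, f"{prefix}/{key}")
    elif isinstance(node, list):
        # Array elements become numbered tree nodes (0, 1, ...).
        for position, value in enumerate(node):
            yield from iter_paths(value, f"{prefix}/{position}")
    else:
        yield prefix, node


def build_inverted_index(docs):
    """Map each (path, value) pair to the ids of the documents containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for path, value in iter_paths(doc):
            index.setdefault((path, value), set()).add(doc_id)
    return index


doc = {
    "locations": [
        {"country": "Germany", "city": "Berlin"},
        {"country": "France", "city": "Paris"},
    ],
    "headquarter": "Belgium",
    "exports": [{"city": "Moscow"}, {"city": "Athens"}],
}
index = build_inverted_index({"doc1": doc})
```

Because every path is derived from the document itself, a new property in a later document simply produces new index entries, with no schema or index definition to update.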
15. • Database engine operates on atom-record-sequence (ARS) based type system
• All data models are translated to ARS
• API and wire protocols are supported via extensible modules
• Instance of a given data model can be materialized as trees
• Graph, documents, key-value, column-family, … more to come
Native support for multiple data models
SQL
17. Resource Model
• Single system image of globally distributed, URI addressable logical resources
• Consistent, hierarchical overlay over horizontally partitioned entities
• Extensible custom projections
18. Horizontal partitioning
• All resources are horizontally partitioned
• Resource Partition
• Consistent, highly available and resource governed, coordination primitive
• Uniquely belongs to a tenant
• Partition management is transparent and made highly responsive
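Transparent partition management implies some routing layer that maps a partition key to the partition that owns it, and that can split a partition without the client noticing. The Python sketch below shows one common way to do this with hash ranges. It is a generic illustration, not a description of Cosmos DB internals: the hash function, the 32-bit hash space and the `PartitionMap` class are all assumptions.

```python
import hashlib
from bisect import bisect_right

HASH_SPACE = 2**32  # assumed size of the hash space


def key_hash(partition_key):
    """Map a partition key to a stable position in the hash space."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class PartitionMap:
    """Hypothetical routing table: each partition owns one hash range."""

    def __init__(self, partition_count):
        step = HASH_SPACE // partition_count
        # Sorted start offsets; a range ends where the next one starts.
        self.starts = [i * step for i in range(partition_count)]

    def route(self, partition_key):
        """Return the start offset of the range that owns the key."""
        position = bisect_right(self.starts, key_hash(partition_key)) - 1
        return self.starts[position]

    def split(self, start):
        """Split one range in half; all other ranges are untouched."""
        i = self.starts.index(start)
        end = self.starts[i + 1] if i + 1 < len(self.starts) else HASH_SPACE
        self.starts.insert(i + 1, (start + end) // 2)


partitions = PartitionMap(4)
owner_before = partitions.route("tenant-42")
partitions.split(owner_before)
owner_after = partitions.route("tenant-42")
```

Callers only ever use `route`, so a split changes the table but not the client code, which is the sense in which partition management stays transparent in this sketch.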
19. Global distribution
• All resources are horizontally partitioned and vertically distributed
• Nested consensus
• Distribution can be within a cluster, x-cluster, x-DC or x-region
20. Partition-sets
• Dynamic allocations of system resources
• Dynamic replication topologies (e.g. tree, chain, hub-spoke) based on consistency level and network conditions
21. Resource Governed Stack
• Replica density, COGS and SLA, all depend on stringent resource governance across the entire stack
• Request Unit (RU)
• Rate based currency
• Normalized across various access methods
• Available for second (RU/s) and minute (RU/m) granularities
• All engine operations are finely calibrated
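A rate based currency of this kind can be pictured as a token bucket: the provisioned RU/s refills a budget over time, and each operation is admitted only if its RU cost fits in what is left. The Python sketch below is an assumption-laden illustration, not Cosmos DB's resource governance: the `RequestUnitBudget` class, the refill policy and the operation costs are all hypothetical.

```python
class RequestUnitBudget:
    """Hypothetical token bucket that meters operations in Request Units."""

    def __init__(self, ru_per_second, now=0.0):
        self.rate = ru_per_second        # provisioned throughput (RU/s)
        self.available = ru_per_second   # start with one second of budget
        self.last = now                  # time of the last refill, in seconds

    def try_consume(self, cost, now):
        """Admit an operation costing `cost` RUs, or reject it (throttle)."""
        # Refill in proportion to elapsed time, capped at one second's worth.
        elapsed = now - self.last
        self.available = min(self.rate, self.available + elapsed * self.rate)
        self.last = now
        if cost <= self.available:
            self.available -= cost
            return True
        return False


budget = RequestUnitBudget(ru_per_second=100)
first = budget.try_consume(60, now=0.0)    # fits in the initial budget
second = budget.try_consume(60, now=0.0)   # only 40 RUs left, so rejected
third = budget.try_consume(60, now=0.5)    # half a second refills 50 RUs
```

Normalizing every access method to the same unit is what lets a single budget like this govern reads, writes and queries together; the real per-operation charges are calibrated by the engine and are not modelled here.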