This talk delves into the many ways that a user has to use HBase in a project. Lars will look at many practical examples based on real applications in production, for example, on Facebook and eBay and the right approach for those wanting to find their own implementation. He will also discuss advanced concepts, such as counters, coprocessors and schema design.
Apache HBase Improvements and Practices at XiaomiHBaseCon
Duo Zhang and Liangliang He (Xiaomi)
In this session, we’ll discuss the various practices around HBase in use at Xiaomi, including those relating to HA, tiered compaction, multi-tenancy, and failover across data centers.
At Salesforce, we have deployed many thousands of HBase/HDFS servers, and learned a lot about tuning during this process. This talk will walk you through the many relevant HBase, HDFS, Apache ZooKeeper, Java/GC, and Operating System configuration options and provides guidelines about which options to use in what situation, and how they relate to each other.
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaCloudera, Inc.
"While running a simple key/value based solution on HBase usually requires an equally simple schema, it is less trivial to operate a different application that has to insert thousands of records per second.
This talk will address the architectural challenges when designing for either read or write performance imposed by HBase. It will include examples of real world use-cases and how they can be implemented on top of HBase, using schemas that optimize for the given access patterns. "
The tech talk was gieven by Ranjeeth Kathiresan, Salesforce Senior Software Engineer & Gurpreet Multani, Salesforce Principal Software Engineer in June 2017.
Apache HBase Improvements and Practices at XiaomiHBaseCon
Duo Zhang and Liangliang He (Xiaomi)
In this session, we’ll discuss the various practices around HBase in use at Xiaomi, including those relating to HA, tiered compaction, multi-tenancy, and failover across data centers.
At Salesforce, we have deployed many thousands of HBase/HDFS servers, and learned a lot about tuning during this process. This talk will walk you through the many relevant HBase, HDFS, Apache ZooKeeper, Java/GC, and Operating System configuration options and provides guidelines about which options to use in what situation, and how they relate to each other.
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaCloudera, Inc.
"While running a simple key/value based solution on HBase usually requires an equally simple schema, it is less trivial to operate a different application that has to insert thousands of records per second.
This talk will address the architectural challenges when designing for either read or write performance imposed by HBase. It will include examples of real world use-cases and how they can be implemented on top of HBase, using schemas that optimize for the given access patterns. "
The tech talk was gieven by Ranjeeth Kathiresan, Salesforce Senior Software Engineer & Gurpreet Multani, Salesforce Principal Software Engineer in June 2017.
Apache HBase is the Hadoop opensource, distributed, versioned storage manager well suited for random, realtime read/write access. This talk will give an overview on how HBase achieve random I/O, focusing on the storage layer internals. Starting from how the client interact with Region Servers and Master to go into WAL, MemStore, Compactions and on-disk format details. Looking at how the storage is used by features like snapshots, and how it can be improved to gain flexibility, performance and space efficiency.
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon
In this presentation, we will introduce Hotspot's Garbage First collector (G1GC) as the most suitable collector for latency-sensitive applications running with large memory environments. We will first discuss G1GC internal operations and tuning opportunities, and also cover tuning flags that set desired GC pause targets, change adaptive GC thresholds, and adjust GC activities at runtime. We will provide several HBase case studies using Java heaps as large as 100GB that show how to best tune applications to remove unpredicted, protracted GC pauses.
Apache phoenix: Past, Present and Future of SQL over HBAseenissoz
HBase as the NoSQL database of choice in the Hadoop ecosystem has already been proven itself in scale and in many mission critical workloads in hundreds of companies. Phoenix as the SQL layer on top of HBase, has been increasingly becoming the tool of choice as the perfect complementary for HBase. Phoenix is now being used more and more for super low latency querying and fast analytics across a large number of users in production deployments. In this talk, we will cover what makes Phoenix attractive among current and prospective HBase users, like SQL support, JDBC, data modeling, secondary indexing, UDFs, and also go over recent improvements like Query Server, ODBC drivers, ACID transactions, Spark integration, etc. We will conclude by looking into items in the pipeline and how Phoenix and HBase interacts with other engines like Hive and Spark.
From: DataWorks Summit 2017 - Munich - 20170406
HBase hast established itself as the backend for many operational and interactive use-cases, powering well-known services that support millions of users and thousands of concurrent requests. In terms of features HBase has come a long way, overing advanced options such as multi-level caching on- and off-heap, pluggable request handling, fast recovery options such as region replicas, table snapshots for data governance, tuneable write-ahead logging and so on. This talk is based on the research for the an upcoming second release of the speakers HBase book, correlated with the practical experience in medium to large HBase projects around the world. You will learn how to plan for HBase, starting with the selection of the matching use-cases, to determining the number of servers needed, leading into performance tuning options. There is no reason to be afraid of using HBase, but knowing its basic premises and technical choices will make using it much more successful. You will also learn about many of the new features of HBase up to version 1.3, and where they are applicable.
Overview of HBase cluster replication feature, covering implementation details as well as monitoring tools and tips for troubleshooting and support of Replication deployments.
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceCloudera, Inc.
Most developers are familiar with the topic of “database design”. In the relational world, normalization is the name of the game. How do things change when you’re working with a scalable, distributed, non-SQL database like HBase? This talk will cover the basics of HBase schema design at a high level and give several common patterns and examples of real-world schemas to solve interesting problems. The storage and data access architecture of HBase (row keys, column families, etc.) will be explained, along with the pros and cons of different schema decisions.
Meta/Facebook's database serving social workloads is running on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depends a lot on RocksDB. Not just MyRocks, but also we have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale.
In this session, we will offer an overview of RocksDB, key differences from InnoDB, and share a few interesting lessons learned from production.
Anoop Sam John and Ramkrishna Vasudevan (Intel)
HBase provides an LRU based on heap cache but its size (and so the total data size that can be cached) is limited by Java’s max heap space. This talk highlights our work under HBASE-11425 to allow the HBase read path to work directly from the off-heap area.
Practical learnings from running thousands of Flink jobsFlink Forward
Flink Forward San Francisco 2022.
Task Managers constantly running out of memory? Flink job keeps restarting from cryptic Akka exceptions? Flink job running but doesn’t seem to be processing any records? We share practical learnings from running thousands of Flink Jobs for different use-cases and take a look at common challenges they have experienced such as out-of-memory errors, timeouts and job stability. We will cover memory tuning, S3 and Akka configurations to address common pitfalls and the approaches that we take on automating health monitoring and management of Flink jobs at scale.
by
Hong Teoh & Usamah Jassat
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...Michael Stack
Wellington Chevreuil of Cloudera
Track 1: Internals
https://open.mi.com/conference/hbasecon-asia-2019
THE COMMUNITY EVENT FOR APACHE HBASE™
July 20th, 2019 - Sheraton Hotel, Beijing, China
https://hbase.apache.org/hbaseconasia-2019/
This is the presentation I made on JavaDay Kiev 2015 regarding the architecture of Apache Spark. It covers the memory model, the shuffle implementations, data frames and some other high-level staff and can be used as an introduction to Apache Spark
Hadoop World 2011: Advanced HBase Schema DesignCloudera, Inc.
While running a simple key/value based solution on HBase usually requires an equally simple schema, it is less trivial to operate a different application that has to insert thousands of records per second.
This talk will address the architectural challenges when designing for either read or write performance imposed by HBase. It will include examples of real world use-cases and how they can be implemented on top of HBase, using schemas that optimize for the given access patterns.
Apache HBase is the Hadoop opensource, distributed, versioned storage manager well suited for random, realtime read/write access. This talk will give an overview on how HBase achieve random I/O, focusing on the storage layer internals. Starting from how the client interact with Region Servers and Master to go into WAL, MemStore, Compactions and on-disk format details. Looking at how the storage is used by features like snapshots, and how it can be improved to gain flexibility, performance and space efficiency.
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon
In this presentation, we will introduce Hotspot's Garbage First collector (G1GC) as the most suitable collector for latency-sensitive applications running with large memory environments. We will first discuss G1GC internal operations and tuning opportunities, and also cover tuning flags that set desired GC pause targets, change adaptive GC thresholds, and adjust GC activities at runtime. We will provide several HBase case studies using Java heaps as large as 100GB that show how to best tune applications to remove unpredicted, protracted GC pauses.
Apache phoenix: Past, Present and Future of SQL over HBAseenissoz
HBase as the NoSQL database of choice in the Hadoop ecosystem has already been proven itself in scale and in many mission critical workloads in hundreds of companies. Phoenix as the SQL layer on top of HBase, has been increasingly becoming the tool of choice as the perfect complementary for HBase. Phoenix is now being used more and more for super low latency querying and fast analytics across a large number of users in production deployments. In this talk, we will cover what makes Phoenix attractive among current and prospective HBase users, like SQL support, JDBC, data modeling, secondary indexing, UDFs, and also go over recent improvements like Query Server, ODBC drivers, ACID transactions, Spark integration, etc. We will conclude by looking into items in the pipeline and how Phoenix and HBase interacts with other engines like Hive and Spark.
From: DataWorks Summit 2017 - Munich - 20170406
HBase hast established itself as the backend for many operational and interactive use-cases, powering well-known services that support millions of users and thousands of concurrent requests. In terms of features HBase has come a long way, overing advanced options such as multi-level caching on- and off-heap, pluggable request handling, fast recovery options such as region replicas, table snapshots for data governance, tuneable write-ahead logging and so on. This talk is based on the research for the an upcoming second release of the speakers HBase book, correlated with the practical experience in medium to large HBase projects around the world. You will learn how to plan for HBase, starting with the selection of the matching use-cases, to determining the number of servers needed, leading into performance tuning options. There is no reason to be afraid of using HBase, but knowing its basic premises and technical choices will make using it much more successful. You will also learn about many of the new features of HBase up to version 1.3, and where they are applicable.
Overview of HBase cluster replication feature, covering implementation details as well as monitoring tools and tips for troubleshooting and support of Replication deployments.
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceCloudera, Inc.
Most developers are familiar with the topic of “database design”. In the relational world, normalization is the name of the game. How do things change when you’re working with a scalable, distributed, non-SQL database like HBase? This talk will cover the basics of HBase schema design at a high level and give several common patterns and examples of real-world schemas to solve interesting problems. The storage and data access architecture of HBase (row keys, column families, etc.) will be explained, along with the pros and cons of different schema decisions.
Meta/Facebook's database serving social workloads is running on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depends a lot on RocksDB. Not just MyRocks, but also we have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale.
In this session, we will offer an overview of RocksDB, key differences from InnoDB, and share a few interesting lessons learned from production.
Anoop Sam John and Ramkrishna Vasudevan (Intel)
HBase provides an LRU based on heap cache but its size (and so the total data size that can be cached) is limited by Java’s max heap space. This talk highlights our work under HBASE-11425 to allow the HBase read path to work directly from the off-heap area.
Practical learnings from running thousands of Flink jobsFlink Forward
Flink Forward San Francisco 2022.
Task Managers constantly running out of memory? Flink job keeps restarting from cryptic Akka exceptions? Flink job running but doesn’t seem to be processing any records? We share practical learnings from running thousands of Flink Jobs for different use-cases and take a look at common challenges they have experienced such as out-of-memory errors, timeouts and job stability. We will cover memory tuning, S3 and Akka configurations to address common pitfalls and the approaches that we take on automating health monitoring and management of Flink jobs at scale.
by
Hong Teoh & Usamah Jassat
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...Michael Stack
Wellington Chevreuil of Cloudera
Track 1: Internals
https://open.mi.com/conference/hbasecon-asia-2019
THE COMMUNITY EVENT FOR APACHE HBASE™
July 20th, 2019 - Sheraton Hotel, Beijing, China
https://hbase.apache.org/hbaseconasia-2019/
This is the presentation I made on JavaDay Kiev 2015 regarding the architecture of Apache Spark. It covers the memory model, the shuffle implementations, data frames and some other high-level staff and can be used as an introduction to Apache Spark
Hadoop World 2011: Advanced HBase Schema DesignCloudera, Inc.
While running a simple key/value based solution on HBase usually requires an equally simple schema, it is less trivial to operate a different application that has to insert thousands of records per second.
This talk will address the architectural challenges when designing for either read or write performance imposed by HBase. It will include examples of real world use-cases and how they can be implemented on top of HBase, using schemas that optimize for the given access patterns.
HBase can be an intimidating beast for someone considering its adoption. For what kinds of workloads is it well suited? How does it integrate into the rest of my application infrastructure? What are the data semantics upon which applications can be built? What are the deployment and operational concerns? In this talk, I'll address each of these questions in turn. As supporting evidence, both high-level application architecture and internal details will be discussed. This is an interactive talk: bring your questions and your use-cases!
With the public confession of Facebook, HBase is on everyone's lips when it comes to the discussion around the new "NoSQL" area of databases. In this talk, Lars will introduce and present a comprehensive overview of HBase. This includes the history of HBase, the underlying architecture, available interfaces, and integration with Hadoop.
HBase Advanced Schema Design - Berlin Buzzwords - June 2012larsgeorge
While running a simple key/value based solution on HBase usually requires an equally simple schema, it is less trivial to operate a different application that has to insert thousands of records per second. This talk will address the architectural challenges when designing for either read or write performance imposed by HBase. It will include examples of real world use-cases and how they
http://berlinbuzzwords.de/sessions/advanced-hbase-schema-design
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLPiotr Pruski
This is the hands-on-lab document I created accompanying my presentation at the Information On Demand 2013 conference for Session Number 1687 - Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL.
*Contact me for data files*
This lab has 3 independant parts:
Part I - Creating Big SQL Tables and Loading Data
(exploring different ways to create and load HBase tables with Big SQL. includes an optional section on HBase access via JAQL)
Part II - Query Handling
(how to query HBase tables with Big SQL)
Part III - Connecting to Big SQL Server via JDBC
(using BIRT, a business intelligence and reporting tool, to run a simple report on a tpch orders table showcasing use of the BigSQL JDBC driver)
HBase In Action - Chapter 04: HBase table designphanleson
HBase In Action - Chapter 04: HBase table design
Learning HBase, Real-time Access to Your Big Data, Data Manipulation at Scale, Big Data, Text Mining, HBase, Deploying HBase
Chris Larsen (Yahoo!)
This year we'll talk about the joys of the HBase Fuzzy Row Filter, new TSDB filters, expression support, Graphite functions and running OpenTSDB on top of Google’s hosted Bigtable. AsyncHBase now includes per-RPC timeouts, append support, Kerberos auth, and a beta implementation in Go.
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseCloudera, Inc.
Apache HBase is a distributed data store that is in production today at many enterprises and sites serving large volumes of near-real-time random-accesses. As Apache HBase matures, the community has augmented the system with new features that many enterprise consider to be hard requirements. We will discuss how the upcoming HBase 0.96 release addresses many of these shortcomings by introducing new features that will help the administrator monitor and control access to the system, and new mechanisms to minimize downtime due to expected and unexpected outages.
HBase from the Trenches - Phoenix Data Conference 2015Avinash Ramineni
Apache HBase has been widely adopted at many enterprises. In this talk we will cover a few war stories with troubleshooting, tuning and fixing problems with HBase Cluster. We will be covering some of the best practices, tools , utilities and lessons learnt from evaluating deployments at different organizations
HBase hast established itself as the backend for many operational and interactive use-cases, powering well-known services that support millions of users and thousands of concurrent requests. In terms of features HBase has come a long way, overing advanced options such as multi-level caching on- and off-heap, pluggable request handling, fast recovery options such as region replicas, table snapshots for data governance, tuneable write-ahead logging and so on. This talk is based on the research for the an upcoming second release of the speakers HBase book, correlated with the practical experience in medium to large HBase projects around the world. You will learn how to plan for HBase, starting with the selection of the matching use-cases, to determining the number of servers needed, leading into performance tuning options. There is no reason to be afraid of using HBase, but knowing its basic premises and technical choices will make using it much more successful. You will also learn about many of the new features of HBase up to version 1.3, and where they are applicable.
We start by looking at distributed database features that impact latency. Then we take a deeper look at the HBase read and write paths with a focus on request latency. We examine the sources of latency and how to minimize them.
Basic Introduction to Cassandra with Architecture and strategies.
with big data challenge. What is NoSQL Database.
The Big Data Challenge
The Cassandra Solution
The CAP Theorem
The Architecture of Cassandra
The Data Partition and Replication
Big Data Architecture Workshop - Vahid Amiridatastack
Big Data Architecture Workshop
This slide is about big data tools, thecnologies and layers that can be used in enterprise solutions.
TopHPC Conference
2019
This is an introduction to relational and non-relational databases and how their performance affects scaling a web application.
This is a recording of a guest Lecture I gave at the University of Texas school of Information.
In this talk I address the technologies and tools Gowalla (gowalla.com) uses including memcache, redis and cassandra.
Find more on my blog:
http://schneems.com
Introduction to HBase. HBase is a NoSQL databases which experienced a tremendous increase in popularity during the last years. Large companies like Facebook, LinkedIn, Foursquare are using HBase. In this presentation we will address questions like: what is HBase?, and compared to relational databases?, what is the architecture?, how does HBase work?, what about the schema design?, what about the IT ressources?. Questions that should help you consider whether this solution might be suitable in your case.
Flipboard services over 100 million users using heterogenous results including user generated content, interest profile, algorithmically generated content, social firehose, friends graph, ads, and web/rss crawlers. To personalize and serve these results in real time, Flipboard employs a variety of data models, access patterns and configuration. This talk will present how some of these strategies are implemented using HBase.
The workshop tells about HBase data model, architecture and schema design principles.
Source code demo:
https://github.com/moisieienko-valerii/hbase-workshop
Everything I know about software in spaghetti bolognese: managing complexityJAX London
None of of us enjoy spaghetti code, much less spaghetti builds, spaghetti tests, or spaghetti classpaths. Some days software engineering seems like the struggle against spaghetti - modern projects often become a large and complex tangle of source code and resources, and the runtime interactions are even worse! This means many projects are hard to navigate, difficult to extend and problematic to deploy/run. You only have to look at the OpenJDK itself to see the problems it faces in trying to have a smaller deployment and runtime footprint! Jigsaw was to be the inbuilt solution for the OpenJDK, extending beyond the JVM itself to help applications with their modularisation story (i.e. Jigsaw or OSGi). Sadly, Jigsaw is not going to make it for Java 8, so developers are left with only a few tooling choices in order to help them. However, tooling is only part of the answer in managing the complexity. What really matters in avoiding a noodly mess is the layout of your source code and resources, clear contracts, and how you think about the dependencies your code has at runtime. Dr Holly Cummins and Martijn Verburg will take you through some of the practical design decisions you can take to help modularise your application properly, independent of tooling. Time permitting they'll then showcase some of tooling support that OSGi can bring to the table as the premier modularity system for Java today.
Devops with the S for Sharing - Patrick DeboisJAX London
Devops means many things to many people. Even without a clear definition people instantly understand the problem space, while the solution space is much more complex and layered. In this talk on devops, different fields and practices will be presented, including how Devops is related to Agile and Lean, and the central roles that infrastructure as code, metrics and monitoring have. Many devops talks relate to the CAMS acronym : Culture, Automation, Measuring and Sharing. The S for Sharing is usually taken for granted and does not get much explanation, but in this talk it will be right in the centre. Without Sharing there is no Devops and successful adoption is impossible.
Busy Developer's Guide to Windows 8 HTML/JavaScript AppsJAX London
With the upcoming release of Windows 8, Microsoft decided to bring HTML+Javascript into the world of Windows-platform application development as a first-class citizen. But make no mistake, this isn’t an attempt to somehow subvert Web developers—it’s more about enabling Web developers to leverage those skills in building “native” Windows applications running on the Windows 8 laptops, desktops, and slates. In this presentation, we’ll go over the basics of building a Windows 8 app using HTML and JavaScript, including a brief overview of what’s possible—and what’s not—for the Web developer seeking to “go native” on Windows.
It's code but not as we know: Infrastructure as Code - Patrick DeboisJAX London
Configuration Management systems like CFEngine, Puppet and Chef, are often adopted as part of devops toolchains. It promises us infrastructure as code a concept that leads to 'Agile' infrastructure: In this session I'd like to give: - a brief explanation of the concept and why it's useful - an overview about the similarities this has with regular code - concepts such as TDD, BDD how well can they be translated - how it fits in with continuous Integration for systems Besides the concept translation I will add links to existing tools and project that working and evolving in the space. And finally, is it really the same? Does a coding background help you? Where do the tools/concepts need improvement? After this session I'm sure you'll be ready to give it a spin and explore more and possibly share some ideas back.
Locks? We Don't Need No Stinkin' Locks - Michael BarkerJAX London
Embrace the dark side. As a developer you'll often be advised that writing concurrent code should be the purview of the genius coders alone. In this talk Michael Barker will discard that notion into the cesspits of logic and reason and attempt to present on the less understood area of non-blocking concurrency, i.e. concurrency without locks. We'll look the modern Intel CPU architecture, why we need a memory model, the performance costs of various non-blocking constructs and delve into the implementation details of the latest version of the Disruptor to see how non-blocking concurrency can be applied to build high performance data structures.
Worse is better, for better or for worse - Kevlin HenneyJAX London
Over two decades ago, Richard P Gabriel proposed the thesis of "Worse Is Better" to explain why some things that are designed to be pure and perfect are eclipsed by solutions that are seemingly limited and incomplete. This is not simply the observation that things that should be better are not, but that some solutions that were not designed to be the best were nonetheless effective and were the better option. We find many examples of this in software development, some more provocative and surprising than others. In this talk we revisit the original premise and question in the context of software architecture.
Java performance: What's the big deal? - Trisha GeeJAX London
It seems that everywhere you look these days people are needing to work on high performance/low latency applications. If they're not already working on them, then they want to be. But what do you really need to know when you're working on performance-sensitive systems? Where do you get started? In this session, Trisha outlines things you need to consider when thinking about performance, covers some of the frequently-asked questions, and notes the pitfalls and common traps that developers fall into.
There is an increasing interest in functional programming from Java developers and the organisations in which they work. For many companies the challenge now is how to make use of the competitive advantage of functional programming. For developers, how do you adapt your mindset to this newly reimagined paradigm? Through the use of examples and a modular approach to design, Clojure made simple will show how developers can be productive quickly without a major change to their current development life-cycle. We will also cover the Clojure build process, tools and exciting projects out there.
HTML alchemy: the secrets of mixing JavaScript and Java EE - Matthias WessendorfJAX London
A deep dive on creating mobile-ready, cloud-enabled, HTML5 applications based on Java EE and modern JavaScript. You will learn how to balance and combine the enterprise Java programming model, based on APIs such as CDI, JAX-RS and EJB 3.1, with JavaScript libraries like jQuery Mobile, Backbone, Require and Underscore, while taking advantage of the ease of deployment and elasticity of the cloud.
This presentation introduces the key innovations that Play 2 brings to web application development in Java and Scala. The Play framework has brought high-productivity web development to Java with three innovations that changed the rules on Java EE: Java class and template save-and-reload that just works, a simplified stateless architecture that enables cloud deployment, and superior ease-of-use. Following Play's rapidly-growing popularity, Play 2.0 was released in March 2012 with innovations that are not just new in the Java world: type-safe view templates and HTTP routing, compile-time checking for static resources, and native support for both Java and Scala. Type safety matters. After dynamically-typed programming languages such as PHP and Ruby set the standard for high-productivity web development, Play built on their advantages and has created a type-safe web development framework with extensive compile-time checking. This is essential for applications that will scale to tens of thousands of lines of code, with hundreds of view templates. Meanwhile, Play avoids the architectural-complexity that is promoted by Java EE-based approaches. The result is that Play 2 first enables rapid initial application development and then Play 2 helps you build big, serious and scalable web applications.
Complexity theory and software development : Tim BerglundJAX London
Some systems are too large to be understood entirely by any one human mind. They are composed of a diverse array of individual components capable of interacting with each other and adapting to a changing environment. As systems, they produce behavior that differs in kind from the behavior of their components. Complexity Theory is an emerging discipline that seeks to describe such phenomena previously encountered in biology, sociology, economics, and other disciplines. Beyond new ways of looking at ant colonies, fashion trends, and national economies, complexity theory promises powerful insights to software development. The Internet—perhaps the most valuable piece of computing infrastructure of the present day—may fit the description of a complex system. Large corporate organizations in which developers are employed have complex characteristics. In this session, we'll explore what makes a complex system, what advantages complexity has to offer us, and how to harness these in the systems we build.
Why FLOSS is a Java developer's best friend: Dave GruberJAX London
The explosion of new open source projects is changing the game for today’s Java developers. With literally hundreds of thousands of FOSS projects underway, the opportunity to leverage open source to deliver “the trifecta” (faster/better/cheaper) has never been better. In this session we will explore tools and resources that can help you navigate the vast world of open source projects, in addition to sharing tips and tricks that will help you narrow the field so you can quickly get to the right projects for your next application.
Scalable systems make us face challenges like concurrency, distribution, fault tolerance, elasticity, etc. Akka not only steps up to meet these, but it makes writing scalable software particularly easy. We will demo Akka's most important tool, the actor model, using a vivid example and live coding.
Alternative databases continue to establish their role in the technology stack of the future—and for many, the technology stack of the present. Making mature engineering decisions about when to adopt new products is not easy, and requires that we learn about them both from an abstract perspective and from a very concrete one as well. If you are going to recommend a NoSQL database for a new project, you're going to have to look at code. In this talk, we'll examine three important contenders in the NoSQL space: Cassandra, MongoDB, and Neo4J. We'll review their data models, scaling paradigms, and query idioms. Most importantly, we'll work through the exercise of modeling a real-world problem with each database, and look at the code and queries we'd use to implement real product features. Come to this session for a thorough and thoroughly practical smackdown between three important NoSQL products.
Closures, the next "Big Thing" in Java: Russel WinderJAX London
Java 8 will bring lambda function and closures to the Java-verse. These tools will revolutionize programming using Java, leading to shorter more declarative code. But how can we prepare ourselves for this? A number of JVM-based languages already have support for lambda functions and closures: dynamically-typed languages like Groovy, and statically-typed languages like Scala and Kotlin. Can looking at examples of how to solve problems in these languages teach us how to do things in Java 8? Yes. Of course the observation of how programming changes may get people to start using Groovy, Scala, Kotlin, etc. instead of Java. The critical thing here is that we can realize the revolution in an evolutionary way. The JVM allows for all of these languages to work together, we can program bits in Java, bits in Groovy, bits in Scala, etc. So we can evolve extant systems by rewriting bits a little at a time. In this session we will look at these issues and a collection of examples to try and ascertain what the idioms might be in a post-Java 8 world.
Java and the machine - Martijn Verburg and Kirk PepperdineJAX London
In Terminator 3 - Rise of the Machines, bare metal comes back to haunt humanity, ruthlessly crushing all resistance. This keynote is here to warn you that the same thing is happening to Java and the JVM! Java was designed in a world where there were a wide range of hardware platforms to support. Its premise of Write Once Run Anywhere (WORA) proved to be one of the compelling reasons behind Java's dominance (even if the reality didn't quite meet the marketing hype). However, this WORA property means that Java and the JVM struggled to utilise specialist hardware and operating system features that could make a massive difference in the performance of your application. This problem has recently gotten much, much worse. Due to the rise of multi-core processors, massive increases in main memory and enhancements to other major hardware components (e.g. SSD), the JVM is now distant from utilising that hardware, causing some major performance and scalability issues! Kirk Pepperdine and Martijn Verburg will take you through the complexities of where Java meets the machine and loses. They'll give up some of their hard-won insights on how to work around these issues so that you can plan to avoid termination, unlike some of the poor souls that ran into the T-800...
Brendan McAdams explores the deeper relationship between the MongoDB database and various languages on the Java Virtual Machine such as Java, Scala, Clojure, JRuby and Python as well as the challenges posted getting MongoDB to play nice with these tools and their syntax. Also examined will be frameworks and integration points popular between MongoDB and the JVM such as Spring Data, Morphia and Lift’s MongoDB-Record component
New opportunities for connected data - Ian RobinsonJAX London
Today's complex data is not only big, but also semi-structured and densely connected. In this session we'll look at how size, structure and connectedness have converged to change the data world. We'll then go on to look at some of the new opportunities for creating end-user value that have emerged in a world of connected data, illustrated with practical examples implemented using Neo4j, the world's leading graph database.
The family of HTML5 technologies has pushed the pendulum away from rich client technologies and towards ever more capable web clients running in today's browsers. In particular, WebSockets brings new opportunities for efficient peer to peer communication, providing the basis for a new generation of interactive and 'live' web applications. This session examines the efforts underway to support WebSockets in the Java programming model using JSR 356, from its base level integration in the Java Servlet and Java EE containers, to a new easy to use API and toolset that is destined to become a part of the standard Java Platform.
The Big Data Con: Why Big Data is a Problem, not a Solution - Ian PloskerJAX London
"Big Data" is a commonly brandished term, but rarely does anyone provide a clear definition of it. If we stop to define Big Data, we find that at its core, it is about an inability to meet the challenges of information storage and access in a world where data is generated by the petabyte. As an alternative exists the concept of Critical Data, data that generates value for businesses, whose unavailability costs money or even lives. This is the data which has already been identified and understood to be at the core of a business. Together, we will gain a deeper understanding of how to gather, store, and access this business critical information.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Welocme to ViralQR, your best QR code generator.ViralQR
Welcome to ViralQR, your best QR code generator available on the market!
At ViralQR, we design static and dynamic QR codes. Our mission is to make business operations easier and customer engagement more powerful through the use of QR technology. Be it a small-scale business or a huge enterprise, our easy-to-use platform provides multiple choices that can be tailored according to your company's branding and marketing strategies.
Our Vision
We are here to make the process of creating QR codes easy and smooth, thus enhancing customer interaction and making business more fluid. We very strongly believe in the ability of QR codes to change the world for businesses in their interaction with customers and are set on making that technology accessible and usable far and wide.
Our Achievements
Ever since its inception, we have successfully served many clients by offering QR codes in their marketing, service delivery, and collection of feedback across various industries. Our platform has been recognized for its ease of use and amazing features, which helped a business to make QR codes.
Our Services
At ViralQR, here is a comprehensive suite of services that caters to your very needs:
Static QR Codes: Create free static QR codes. These QR codes are able to store significant information such as URLs, vCards, plain text, emails and SMS, Wi-Fi credentials, and Bitcoin addresses.
Dynamic QR codes: These also have all the advanced features but are subscription-based. They can directly link to PDF files, images, micro-landing pages, social accounts, review forms, business pages, and applications. In addition, they can be branded with CTAs, frames, patterns, colors, and logos to enhance your branding.
Pricing and Packages
Additionally, there is a 14-day free offer to ViralQR, which is an exceptional opportunity for new users to take a feel of this platform. One can easily subscribe from there and experience the full dynamic of using QR codes. The subscription plans are not only meant for business; they are priced very flexibly so that literally every business could afford to benefit from our service.
Why choose us?
ViralQR will provide services for marketing, advertising, catering, retail, and the like. The QR codes can be posted on fliers, packaging, merchandise, and banners, as well as to substitute for cash and cards in a restaurant or coffee shop. With QR codes integrated into your business, improve customer engagement and streamline operations.
Comprehensive Analytics
Subscribers of ViralQR receive detailed analytics and tracking tools in light of having a view of the core values of QR code performance. Our analytics dashboard shows aggregate views and unique views, as well as detailed information about each impression, including time, device, browser, and estimated location by city and country.
So, thank you for choosing ViralQR; we have an offer of nothing but the best in terms of QR code services to meet business diversity!
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In questo evento online gratuito, organizzato dalla Community Italiana di UiPath, potrai esplorare le nuove funzionalità di Autopilot, il tool che integra l'Intelligenza Artificiale nei processi di sviluppo e utilizzo delle Automazioni.
📕 Vedremo insieme alcuni esempi dell'utilizzo di Autopilot in diversi tool della Suite UiPath:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applicata alla Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
8. HBase Tables and Regions
• Table is made up of any number if regions
• Region is specified by its startKey and endKey
• Empty table: (Table, NULL, NULL)
• Two-region table: (Table, NULL, “com.cloudera.www”)
and (Table, “com.cloudera.www”, NULL)
• Each region may live on a different node and is
made up of several HDFS files and blocks, each
of which is replicated by Hadoop
10. HBase Tables
• Tables are sorted by Row in lexicographical order
• Table schema only defines its column families
• Each family consists of any number of columns
• Each column consists of any number of versions
• Columns only exist when inserted, NULLs are free
• Columns within a family are sorted and stored
together
• Everything except table names are byte[]
(Table, Row, Family:Column, Timestamp) -> Value
12. HBase Architecture (cont.)
• HBase uses HDFS (or similar) as its reliable
storage layer
• Handles checksums, replication, failover
• Native Java API, Gateway for REST, Thrift, Avro
• Master manages cluster
• RegionServer manage data
• ZooKeeper is used the “neural network”
• Crucial for HBase
• Bootstraps and coordinates cluster
13. HBase Architecture (cont.)
• Based on Log-Structured Merge-Trees (LSM-Trees)
• Inserts are done in write-ahead log first
• Data is stored in memory and flushed to disk on
regular intervals or based on size
• Small flushes are merged in the background to keep
number of files small
• Reads read memory stores first and then disk based
files second
• Deletes are handled with “tombstone” markers
• Atomicity on row level no matter how many columns
• keeps locking model easy
14. MemStores
• After data is written to the WAL the RegionServer
saves KeyValues in memory store
• Flush to disk based on size, see
hbase.hregion.memstore.flush.size
• Default size is 64MB
• Uses snapshot mechanism to write flush to disk
while still serving from it and accepting new data
at the same time
• Snapshots are released when flush has
succeeded
15. Compactions
• General Concepts
• Two types: Minor and Major Compactions
• Asynchronous and transparent to client
• Manage file bloat from MemStore flushes
• Minor Compactions
• Combine last “few” flushes
• Triggered by number of storage files
• Major Compactions
• Rewrite all storage files
• Drop deleted data and those values exceeding TTL and/or number of
versions
• Triggered by time threshold
• Cannot be scheduled automatically starting at a specific time (bummer!)
• May (most definitely) tax overall HDFS IO performance
Tip: Disable major compactions and schedule to run manually (e.g.
cron) at off-peak times
16. Block Cache
• Acts as very large, in-memory distributed cache
• Assigned a large part of the JVM heap in the RegionServer process,
see hfile.block.cache.size
• Optimizes reads on subsequent columns and rows
• Has priority to keep “in-memory” column families in cache
if(inMemory) {
this.priority = BlockPriority.MEMORY;
} else {
this.priority = BlockPriority.SINGLE;
}
• Cache needs to be used properly to get best read performance
• Turn off block cache on operations that cause large churn
• Store related data “close” to each other
• Uses LRU cache with threaded (asynchronous) evictions based on
priorities
17. Region Splits
• Triggered by configured maximum file size of any
store file
• This is checked directly after the compaction call to
ensure store files are actually approaching the
threshold
• Runs as asynchronous thread on RegionServer
• Splits are fast and nearly instant
• Reference files point to original region files and
represent each half of the split
• Compactions take care of splitting original files
into new region directories
19. Auto Sharding and Distribution
• Unit of scalability in HBase is the Region
• Sorted, contiguous range of rows
• Spread “randomly” across RegionServer
• Moved around for load balancing and failover
• Split automatically or manually to scale with
growing data
• Capacity is solely a factor of cluster nodes vs.
regions per node
20. Column Family vs. Column
• Use only a few column families
• Causes many files that need to stay open per region
plus class overhead per family
• Best used when logical separation between data
and meta columns
• Sorting per family can be used to convey
application logic or access pattern
21. Storage Separation
• Column Families allow for separation of data
• Used by Columnar Databases for fast analytical
queries, but on column level only
• Allows different or no compression depending on the
content type
• Segregate information based on access pattern
• Data is stored in one or more storage file, called
HFiles
25. Key Cardinality
• The best performance is gained from using row
keys
• Time range bound reads can skip store files
• So can Bloom Filters
• Selecting column families reduces the amount of
data to be scanned
• Pure value based filtering is a full table scan
• Filters often are too, but reduce network traffic
27. Fold, Store, and Shift
• Logical layout does not match physical one
• All values are stored with the full coordinates,
including: Row Key, Column Family, Column
Qualifier, and Timestamp
• Folds columns into “row per column”
• NULLs are cost free as nothing is stored
• Versions are multiple “rows” in folded table
28. Key/Table Design
• Crucial to gain best performance
• Why do I need to know? Well, you also need to know
that RDBMS is only working well when columns are
indexed and query plan is OK
• Absence of secondary indexes forces use of row
key or column name sorting
• Transfer multiple indexes into one
• Generate large table -> Good since fits architecture
and spreads across cluster
29. DDI
• Stands for Denormalization, Duplication and
Intelligent Keys
• Needed to overcome shortcomings of
architecture
• Denormalization -> Replacement for JOINs
• Duplication -> Design for reads
• Intelligent Keys -> Implement indexing and
sorting, optimize reads
30. Pre-materialize Everything
• Achieve one read per customer request if
possible
• Otherwise keep at lowest number
• Reads between 10ms (cache miss) and 1ms
(cache hit)
• Use MapReduce to compute exacts in batch
• Store and merge updates live
• Use incrementColumnValue
Motto: “Design for Reads”
31. Tall-Narrow vs. Flat-Wide Tables
• Rows do not split
• Might end up with one row per region
• Same storage footprint
• Put more details into the row key
• Sometimes dummy column only
• Make use of partial key scans
• Tall with Scans, Wide with Gets
• Atomicity only on row level
• Example: Large graphs, stored as adjacency
matrix
32. Example: Mail Inbox
<userId> : <colfam> : <messageId> : <timestamp> : <email-message>
12345 : data : 5fc38314-e290-ae5da5fc375d : 1307097848 : "Hi Lars, ..."
12345 : data : 725aae5f-d72e-f90f3f070419 : 1307099848 : "Welcome, and ..."
12345 : data : cc6775b3-f249-c6dd2b1a7467 : 1307101848 : "To Whom It ..."
12345 : data : dcbee495-6d5e-6ed48124632c : 1307103848 : "Hi, how are ..."
or
12345-5fc38314-e290-ae5da5fc375d : data : : 1307097848 : "Hi Lars, ..."
12345-725aae5f-d72e-f90f3f070419 : data : : 1307099848 : "Welcome, and ..."
12345-cc6775b3-f249-c6dd2b1a7467 : data : : 1307101848 : "To Whom It ..."
12345-dcbee495-6d5e-6ed48124632c : data : : 1307103848 : "Hi, how are ..."
è Same Storage Requirements
33. Partial Key Scans
Key
Descrip+on
<userId> Scan
over
all
messages
for
a
given
user
ID
<userId>-<date> Scan
over
all
messages
on
a
given
date
for
the
given
user
ID
<userId>-<date>-<messageId> Scan
over
all
parts
of
a
message
for
a
given
user
ID
and
date
<userId>-<date>-<messageId>-<attachmentId> Scan
over
all
a8achments
of
a
message
for
a
given
user
ID
and
date
34. Sequential Keys
<timestamp><more key>: {CF: {CQ: {TS : Val}}}
• Hotspotting on Regions: bad!
• Instead do one of the following:
• Salting
• Prefix <timestamp> with distributed value
• Binning or bucketing rows across regions
• Key field swap/promotion
• Move <more key> before the timestamp (see OpenTSDB
later)
• Randomization
• Move <timestamp> out of key
35. Salting
• Prefix row keys to gain spread
• Use well known or numbered prefixes
• Use modulo to spread across servers
• Enforce common data stay close to each other for
subsequent scanning or MapReduce processing
0_rowkey1, 1_rowkey2, 2_rowkey3
0_rowkey4, 1_rowkey5, 2_rowkey6
• Sorted by prefix first
0_rowkey1
0_rowkey4
1_rowkey2
1_rowkey5
…
36. Hashing vs. Sequential Keys
• Uses hashes for best spread
• Use for example MD5 to be able to recreate key
• Key = MD5(customerID)
• Counter productive for range scans
• Use sequential keys for locality
• Makes use of block caches
• May tax one server overly, may be avoided by salting
or splitting regions while keeping them small
38. Key Design Summary
• Based on access pattern, either use sequential or
random keys
• Often a combination of both is needed
• Overcome architectural limitations
• Neither is necessarily bad
• Use bulk import for sequential keys and reads
• Random keys are good for random access patterns
39. Example: Facebook Insights
• > 20B Events per Day
• 1M Counter Updates per Second
• 100 Nodes Cluster
• 10K OPS per Node
• ”Like” button triggers AJAX request
• Event written to log file
• 30mins current for website owner
Web
➜
Scribe
➜
Ptail
➜
Puma
➜
HBase
40. HBase Counters
• Store counters per Domain and per URL
• Leverage HBase increment (atomic read-modify-
write) feature
• Each row is one specific Domain or URL
• The columns are the counters for specific metrics
• Column families are used to group counters by
time range
• Set time-to-live on CF level to auto-expire counters by
age to save space, e.g., 2 weeks on “Daily Counters”
family
41. Key Design
• Reversed Domains
• Examples: “com.cloudera.www”, “com.cloudera.blog”
• Helps keeping pages per site close, as HBase efficiently
scans blocks of sorted keys
• Domain Row Key =
MD5(Reversed Domain) + Reversed Domain
• Leading MD5 hash spreads keys randomly across all regions
for load balancing reasons
• Only hashing the domain groups per site (and per subdomain
if needed)
• URL Row Key =
MD5(Reversed Domain) + Reversed Domain + URL ID
• Unique ID per URL already available, make use of it
43. Summary
• Design for Use-Case
• Read, Write, or Both?
• Avoid Hotspotting
• Consider using IDs instead of full text
• Leverage Column Family to HFile relation
• Shift details to appropriate position
• Composite Keys
• Column Qualifiers
44. Summary (cont.)
• Schema design is a combination of
• Designing the keys (row and column)
• Segregate data into column families
• Choose compression and block sizes
• Similar techniques are needed to scale most
systems
• Add indexes, partition data, consistent hashing
• Denormalization, Duplication, and Intelligent
Keys (DDI)