This is a version of the talk given at Dev Bootcamp in Chicago.
Technical Debt has become a catch-all phrase for any code that needs to be reworked, much like Refactoring has become a catch-all phrase for any activity that involves changing code. These fundamental misunderstandings and comfortable yet misapplied metaphors have resulted in a plethora of poor decisions. What is technical debt? What is not technical debt? Why should we care? What is the cost of misunderstanding? What do we do about it? Doc discusses the origins of the metaphor, what it means today, and how we can properly identify and manage technical debt.
CDC Stream Processing with Apache Flink (Timo Walther)
An instant world requires instant decisions at scale. This includes the ability to digest and react to changes in real-time. Thus, event logs such as Apache Kafka can be found in almost every architecture, while databases and similar systems still provide the foundation. Change Data Capture (CDC) has become popular for propagating changes. Nevertheless, integrating all these systems, which often have slightly different semantics, can be a challenge.
In this talk, we highlight what it means for Apache Flink to be a general data processor that acts as a data integration hub. Looking under the hood, we demonstrate Flink's SQL engine as a changelog processor that ships with an ecosystem tailored to processing CDC data and maintaining materialized views. We will discuss the semantics of different data sources and how to perform joins or stream enrichment between them. This talk illustrates how Flink can be used with systems such as Kafka (for upsert logging), Debezium, JDBC, and others.
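As a toy illustration of the changelog processing described above: Flink's SQL engine models changes with row kinds such as +I (insert), -U/+U (update-before/update-after) and -D (delete). The sketch below folds such a changelog into a materialized view; the event encoding here is hypothetical, not Flink's actual API.

```python
def apply_changelog(events):
    """Fold a stream of (kind, key, value) changes into a materialized view."""
    view = {}
    for kind, key, value in events:
        if kind in ("+I", "+U"):
            view[key] = value          # insert, or the new version of the row
        elif kind in ("-U", "-D"):
            view.pop(key, None)        # retract the old version / delete
    return view

changes = [
    ("+I", "order-1", {"status": "NEW"}),
    ("-U", "order-1", {"status": "NEW"}),   # retract the old version
    ("+U", "order-1", {"status": "PAID"}),  # emit the new version
    ("+I", "order-2", {"status": "NEW"}),
    ("-D", "order-2", {"status": "NEW"}),   # order cancelled
]
print(apply_changelog(changes))   # -> {'order-1': {'status': 'PAID'}}
```

The same fold works regardless of which system (Kafka, Debezium, JDBC) produced the changelog, which is the sense in which a changelog processor acts as an integration hub.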
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap... (Flink Forward)
Flink Forward San Francisco 2022.
Being in the payments space, Stripe requires strict correctness and freshness guarantees. We rely on Flink as the natural solution for delivering on this in support of our Change Data Capture (CDC) infrastructure. We heavily rely on CDC as a tool for capturing data change streams from our databases without critically impacting database reliability, scalability, and maintainability. Data derived from these streams is used broadly across the business and powers many of our critical financial reporting systems totalling over $640 Billion in payment volume annually. We use many components of Flink’s flexible DataStream API to perform aggregations and abstract away the complexities of stream processing from our downstreams. In this talk, we’ll walk through our experience from the very beginning to what we have in production today. We’ll share stories around the technical details and trade-offs we encountered along the way.
by
Jeff Chao
Apache Flink is a popular framework for real-time stream computing. Many stream compute algorithms require trailing data in order to compute the intended result. One example is computing the number of user logins in the last 7 days. This creates a dilemma: the results of the stream program are incomplete until the runtime of the program exceeds 7 days. The alternative is to bootstrap the program using historic data to seed the state before shifting to real-time data.
This talk will discuss alternatives to bootstrap programs in Flink. Some alternatives rely on technologies exogenous to the stream program, such as enhancements to the pub/sub layer, that are more generally applicable to other stream compute engines. Other alternatives include enhancements to Flink source implementations. Lyft is exploring another alternative using orchestration of multiple Flink programs. The talk will cover why Lyft pursued this alternative and future directions to further enhance bootstrapping support in Flink.
Speaker
Gregory Fee, Principal Engineer, Lyft
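The 7-day login example above can be sketched as follows: historic events are replayed to seed the state, so the first live event already sees a complete trailing window. This is a toy model, not Flink's DataStream API.

```python
from collections import defaultdict, deque

WINDOW = 7 * 24 * 3600  # trailing 7 days, in seconds

class LoginCounter:
    """Trailing-window login counts; illustrative, not a Flink operator."""
    def __init__(self):
        self.events = defaultdict(deque)   # user -> timestamps inside window

    def on_login(self, user, ts):
        q = self.events[user]
        q.append(ts)
        while q and q[0] <= ts - WINDOW:   # expire logins older than 7 days
            q.popleft()
        return len(q)

counter = LoginCounter()
# Bootstrap phase: replay historic events to seed the state.
for user, ts in [("alice", 0), ("alice", 3 * 24 * 3600)]:
    counter.on_login(user, ts)
# Live phase: the first real-time event already sees full 7-day context.
print(counter.on_login("alice", 6 * 24 * 3600))   # -> 3
```

Without the bootstrap replay, the same query would report 1 here and only converge to correct counts after a full week of runtime.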
This is the presentation I gave at JavaDay Kiev 2015 on the architecture of Apache Spark. It covers the memory model, the shuffle implementations, data frames and some other high-level topics, and can serve as an introduction to Apache Spark.
Evening out the uneven: dealing with skew in Flink (Flink Forward)
Flink Forward San Francisco 2022.
When running Flink jobs, skew is a common problem that results in wasted resources and limited scalability. In the past years, we have helped our customers and users solve various skew-related issues in their Flink jobs or clusters. In this talk, we will present the different types of skew that users often run into: data skew, key skew, event time skew, state skew, and scheduling skew, and discuss solutions for each of them. We hope this will serve as a guideline to help you reduce skew in your Flink environment.
by
Jun Qin & Karl Friedrich
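A standard remedy for the key skew mentioned above is key salting: a hot key is spread across several subkeys, partially aggregated, then merged. A toy sketch (a real Flink job would express this as two keyed aggregation stages):

```python
import random

SALT_BUCKETS = 4   # illustrative; tune to the parallelism of the hot operator

def salted_key(key):
    # Spread a single hot key across SALT_BUCKETS subkeys.
    return (key, random.randrange(SALT_BUCKETS))

def aggregate(events):
    # Stage 1: partial counts per (key, salt), so the hot key's work is
    # distributed across several parallel tasks instead of one.
    partial = {}
    for key in events:
        sk = salted_key(key)
        partial[sk] = partial.get(sk, 0) + 1
    # Stage 2: merge the partials back per original key.
    total = {}
    for (key, _salt), count in partial.items():
        total[key] = total.get(key, 0) + count
    return total

events = ["hot"] * 1000 + ["cold"] * 10
print(aggregate(events))
```

The trade-off is one extra aggregation stage in exchange for no single task receiving all records of the hot key.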
Ever tried to get clarity on what kinds of memory there are and how to tune each of them? If not, your jobs are very likely configured incorrectly. As we found out, it is not straightforward, and it is not well documented either. This session will cover the types of memory to be aware of, the calculations involved in determining how much is allocated to each type, and how to tune them depending on the use case.
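The kind of calculation such a session walks through can be sketched as a toy sizing function. The fractions and bounds below are illustrative stand-ins, not authoritative Flink defaults; the real model has more components (metaspace, JVM overhead, framework off-heap) and configurable bounds for each.

```python
def split_memory(total_mb, managed_fraction=0.4, network_fraction=0.1,
                 network_min_mb=64, network_max_mb=1024):
    """Derive memory components from a total, fraction-based with bounds."""
    managed = int(total_mb * managed_fraction)
    # Network memory is a fraction of the total, clamped to [min, max].
    network = min(max(int(total_mb * network_fraction), network_min_mb),
                  network_max_mb)
    heap = total_mb - managed - network     # what remains for the JVM heap
    return {"heap_mb": heap, "managed_mb": managed, "network_mb": network}

print(split_memory(4096))
```

The point of doing the arithmetic explicitly is that the heap you actually get is a remainder: raising the managed fraction silently shrinks the heap, which is a common source of misconfigured jobs.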
Scylla Summit 2022: IO Scheduling & NVMe Disk Modelling (ScyllaDB)
Join ScyllaDB engineer Pavel Emelyanov for a walkthrough of Diskplorer, an open-source disk latency/bandwidth exploration toolset for measuring behavior under load. Using Linux fio under the hood, Diskplorer runs a battery of measurements to discover the performance characteristics of a specific hardware configuration, giving you an at-a-glance view of how server storage I/O will behave under load.
Discover how ScyllaDB uses this detailed model of disk performance, together with a scheduling algorithm developed for the Seastar framework, to build latency-oriented I/O scheduling that cherry-picks requests from the incoming queue while keeping the disk load perfectly balanced.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
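The cherry-picking idea can be sketched as a toy scheduler that admits queued requests, earliest deadline first, only while the disk stays under an in-flight budget. Purely illustrative; Seastar's actual scheduler is considerably more sophisticated.

```python
import heapq

class IoScheduler:
    """Toy latency-oriented I/O scheduler (illustrative names throughout)."""
    def __init__(self, max_inflight_bytes):
        self.budget = max_inflight_bytes
        self.inflight = 0
        self.queue = []                      # (deadline, size, request_id)

    def submit(self, deadline, size, request_id):
        heapq.heappush(self.queue, (deadline, size, request_id))

    def dispatch(self):
        """Cherry-pick queued requests while the in-flight budget allows."""
        started = []
        while self.queue and self.inflight + self.queue[0][1] <= self.budget:
            _deadline, size, rid = heapq.heappop(self.queue)
            self.inflight += size
            started.append(rid)
        return started

    def complete(self, size):
        self.inflight -= size                # frees budget for dispatch()

sched = IoScheduler(max_inflight_bytes=128 * 1024)
sched.submit(deadline=1, size=64 * 1024, request_id="compaction-read")
sched.submit(deadline=0, size=64 * 1024, request_id="query-read")
sched.submit(deadline=2, size=64 * 1024, request_id="bulk-write")
print(sched.dispatch())   # -> ['query-read', 'compaction-read']
```

The bulk write stays queued until a completion frees budget, which is how the disk is kept loaded but never oversaturated.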
Kafka on ZFS: Better Living Through Filesystems (confluent)
(Hugh O'Brien, Jet.com) Kafka Summit SF 2018
You’re doing disk IO wrong; let ZFS show you the way. ZFS on Linux is now stable. Say goodbye to JBOD, to directories in your reassignment plans, to unevenly used disks. Instead, get 8K cloud IOPS for $25, SSD-speed reads on spinning disks, in-kernel LZ4 compression and the smartest page cache on the planet. (Fear compactions no more!)
Learn how Jet’s Kafka clusters squeeze every drop of disk performance out of Azure, all completely transparent to Kafka.
-Striping cheap disks to maximize instance IOPS
-Block compression to reduce disk usage by ~80% (JSON data)
-Instance SSD as the secondary read cache (storing compressed data), eliminating >99% of disk reads and safe across host redeployments
-Upcoming features: Compressed blocks in memory, potentially quadrupling your page cache (RAM) for free
We’ll cover:
-Basic Principles
-Adapting ZFS for cloud instances (gotchas)
-Performance tuning for Kafka
-Benchmarks
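The ~80% saving on JSON data claimed above is easy to sanity-check: repetitive JSON compresses extremely well. The snippet uses zlib for portability; ZFS's LZ4 trades some ratio for much higher speed, but the effect on log-like JSON is similar.

```python
import json
import zlib

# Synthetic log-like records: same keys and mostly repeated values,
# which is typical of the JSON Kafka topics carry.
records = [{"user_id": i, "event": "page_view", "source": "web"}
           for i in range(1000)]
raw = "\n".join(json.dumps(r) for r in records).encode()

compressed = zlib.compress(raw)
saving = 1 - len(compressed) / len(raw)
print(f"{saving:.0%} smaller")
```

Because compression happens per block inside the filesystem, Kafka itself stays completely unaware of it, matching the "transparent to Kafka" point above.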
The Dark Side Of Go -- Go runtime related problems in TiDB in production (PingCAP)
Ed Huang, CTO of PingCAP, talked at Go System Conference about dealing with the typical and profound issues related to Go’s runtime as your systems become more complex. Taking TiDB as an example, he demonstrated how these problems can be reproduced, located, and analyzed in production.
Streaming 101 Revisited: A Fresh Hot Take With Tyler Akidau and Dan Sotolongo... (HostedbyConfluent)
Streaming 101 Revisited: A Fresh Hot Take With Tyler Akidau and Dan Sotolongo | Current 2022
Seven years ago, O’Reilly published the Streaming 101 and Streaming 102 articles, which have long served as the gateway for many undertaking their own journey into the world of stream processing. But while those articles have remained remarkably relevant over the years, they have also remained entirely static. Meanwhile, those of us pushing forward the envelope of stream processing have not.
Join Tyler Akidau, original author of those articles, and Dan Sotolongo, longtime collaborator and expert in streaming theory, as they reframe the fundamentals of stream processing in modern terms. What is streaming? What theoretical changes do we need to grow beyond the limitations of stream processing systems of the past? What wisdom can we draw from other disciplines to inform these changes? And if we transcend these limitations, can we make life simpler for everyone?
This talk will cover the key concepts of stream processing theory as we understand them today. It is simultaneously an introductory talk as well as an advanced survey on the breadth of stream processing theory. Anyone with an interest in streaming should find something engaging within.
Apache Kafka lies at the heart of the largest data pipelines, handling trillions of messages and petabytes of data every day. Learn the right approach for getting the most out of Kafka from the experts at LinkedIn and Confluent. Todd Palino and Gwen Shapira demonstrate how to monitor, optimize, and troubleshoot performance of your data pipelines—from producer to consumer, development to production—as they explore some of the common problems that Kafka developers and administrators encounter when they take Apache Kafka from a proof of concept to production usage. Too often, systems are overprovisioned and underutilized and still have trouble meeting reasonable performance agreements.
Topics include:
- What latencies and throughputs you should expect from Kafka
- How to select hardware and size components
- What you should be monitoring
- Design patterns and antipatterns for client applications
- How to go about diagnosing performance bottlenecks
- Which configurations to examine and which ones to avoid
Extending Flink SQL for stream processing use cases (Flink Forward)
Flink Forward San Francisco 2022.
Apache Flink is a powerful stream processing platform that enables users to build complex real-time applications. Flink SQL provides a SQL interface that implements standard SQL. While standard SQL provides a perfect interface for batch processing, in a stream processing context it can result in ambiguity and complex syntax. As an example, consider these three types of streams: append-only streams, retract streams and upsert streams. Using standard SQL, we would represent all of these streams as Tables, the same concept used in batch processing. Such overloading of concepts can result in ambiguous SQL statements in a streaming context. In this talk, we will present extensions to Flink SQL that simplify SQL statements in the context of stream processing. We will show how these extensions work in a Flink application using different use cases. The extensions are only syntactic sugar, and users can continue to use Flink SQL as is if they prefer.
by
Hojjat Jafarpour
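The three stream types named in the abstract encode the same logical update differently. A toy illustration of the difference (hypothetical tuples, not Flink API):

```python
# An existing row "user-1" is updated from 4 clicks to 5.
append_only = [("+I", "user-1", {"clicks": 5})]         # history only grows
retract = [("-U", "user-1", {"clicks": 4}),             # retract old version
           ("+U", "user-1", {"clicks": 5})]             # then emit new one
upsert = [("+U", "user-1", {"clicks": 5})]              # key implies retraction

def materialize(stream):
    """Fold any of the three encodings into the current table state."""
    view = {}
    for kind, key, row in stream:
        if kind == "-U":
            view.pop(key, None)
        else:
            view[key] = row
    return view

# Retract and upsert streams converge to the same view; the upsert form
# requires a primary key but halves the number of change messages.
print(materialize(retract) == materialize(upsert))   # -> True
```

Representing all three as a plain Table hides these differences, which is exactly the ambiguity the proposed syntax extensions aim to surface.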
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal... (Databricks)
The reality of most large scale data deployments includes storage decoupled from computation, pipelines operating directly on files, and metadata services with no locking mechanisms or transaction tracking. For this reason, attempts at achieving transactional behavior, snapshot isolation, safe schema evolution or performant support for CRUD operations have always been marred by tradeoffs.
This talk will focus on technical aspects, practical capabilities and the potential future of three table formats that have emerged in recent years as solutions to the issues mentioned above – ACID ORC (in Hive 3.x), Iceberg and Delta Lake. To provide a richer context, a comparison between traditional databases and big data tools as well as an overview of the reasons for the current state of affairs will be included.
After the talk, the audience is expected to have a clear understanding of current development trends in large-scale table formats, on both the conceptual and practical level. This should allow attendees to make better-informed assessments about which approaches to data warehousing, metadata management and data pipelining they should adopt in their organizations.
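The design these three formats share can be reduced to a toy model: each commit writes a new immutable snapshot of the table's file list, and readers pin the snapshot that was current when they started, giving snapshot isolation without locks. The names below are illustrative, not any format's real metadata layout.

```python
class Table:
    def __init__(self):
        self.snapshots = [[]]          # snapshot 0: the empty table

    def commit(self, added_files):
        latest = self.snapshots[-1]
        # Copy-on-write metadata: the old snapshot is never mutated.
        self.snapshots.append(latest + added_files)
        return len(self.snapshots) - 1     # id of the new snapshot

    def scan(self, snapshot_id):
        return self.snapshots[snapshot_id]  # stable view, no locks needed

table = Table()
s1 = table.commit(["part-0001.parquet"])
reader_view = table.scan(s1)               # a reader pins snapshot 1
s2 = table.commit(["part-0002.parquet"])   # a concurrent write lands
# The pinned reader still sees exactly the files of snapshot 1.
print(reader_view)   # -> ['part-0001.parquet']
```

The real formats add transaction logs, manifests and conflict detection on top, but the snapshot-pointer idea is the common core.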
From cache to in-memory data grid. Introduction to Hazelcast. (Taras Matyashovsky)
This presentation:
* covers basics of caching and popular cache types
* explains evolution from simple cache to distributed, and from distributed to IMDG
* does not describe usage of NoSQL solutions for caching
* is not intended for product comparison or for promoting Hazelcast as the best solution
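For the "basics of caching" part, a minimal LRU cache captures the core eviction idea that distributed caches and IMDGs then layer partitioning and replication on top of:

```python
from collections import OrderedDict

class LruCache:
    """Minimal least-recently-used cache, for illustration only."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                    # cache miss
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LruCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")           # "a" becomes most recently used
cache.put("c", 3)        # capacity exceeded: evicts "b"
print(list(cache.data))  # -> ['a', 'c']
```

The evolution the talk traces is essentially this structure made shared (distributed cache) and then made partitioned, replicated and queryable (IMDG).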
Building robust CDC pipeline with Apache Hudi and Debezium (Tathastu.ai)
This talk covers the need for CDC and the benefits of building a CDC pipeline. We will compare various CDC streaming and reconciliation frameworks, and cover the architecture and the challenges we faced while running this system in production. Finally, we will conclude by covering Apache Hudi, Schema Registry and Debezium in detail, along with our contributions to the open-source community.
Meta/Facebook's database serving social workloads runs on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depend heavily on RocksDB. Beyond MyRocks, we have other important systems running on top of RocksDB as well. We have learned many lessons from operating and debugging RocksDB at scale.
In this session, we will offer an overview of RocksDB, key differences from InnoDB, and share a few interesting lessons learned from production.
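RocksDB is an LSM-tree: writes land in an in-memory memtable that is flushed to immutable sorted files (SSTs), and reads check the memtable first, then SSTs from newest to oldest. A toy sketch of that read path (no compaction, bloom filters or WAL):

```python
class TinyLsm:
    """Toy LSM store illustrating the memtable/SST read path."""
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.ssts = []                 # newest flushed file first
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.ssts.insert(0, dict(self.memtable))   # flush to "disk"
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sst in self.ssts:          # newest version of a key wins
            if key in sst:
                return sst[key]
        return None

db = TinyLsm()
db.put("k", "v1")
db.put("x", "y")       # reaches the limit: flushes {"k": "v1", "x": "y"}
db.put("k", "v2")      # the newer version lives only in the memtable
print(db.get("k"))     # -> v2
```

This write-optimized layout (sequential flushes instead of in-place page updates) is the key structural difference from InnoDB's B-tree that the session contrasts.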
VictoriaLogs: Open Source Log Management System - Preview (VictoriaMetrics)
VictoriaLogs Preview - Aliaksandr Valialkin
* Existing open source log management systems
- ELK (ElasticSearch) stack: Pros & Cons
- Grafana Loki: Pros & Cons
* What is VictoriaLogs
- Open source log management system from VictoriaMetrics
- Easy to set up and operate
- Scales vertically and horizontally
- Optimized for low resource usage (CPU, RAM, disk space)
- Accepts data from Logstash and Fluentbit in Elasticsearch format
- Accepts data from Promtail in Loki format
- Supports stream concept from Loki
- Provides easy to use yet powerful query language - LogsQL
* LogsQL Examples
- Search by time
- Full-text search
- Combining search queries
- Searching arbitrary labels
* Log Streams
- What is a log stream?
- LogsQL examples: querying log streams
- Stream labels vs log labels
* LogsQL: stats over access logs
* VictoriaLogs: CLI Integration
* VictoriaLogs Recap
Learn the current state of the NoSQL landscape and discover the different data models within it, from document stores and key-value databases to graph and wide-column. Then you’ll learn why wide-column databases are the most appropriate for scalable, high-performance use cases, including capabilities for massive scale-out architecture, peer-to-peer clustering to avoid bottlenecks and built-in multi-datacenter replication.
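The wide-column model itself can be sketched in a few lines: rows are grouped by a partition key (which drives placement across the cluster) and sorted by a clustering key within each partition, making partition-local range scans the natural query shape. Illustrative only:

```python
from bisect import insort

class WideColumnTable:
    """Toy wide-column layout: partition key -> rows sorted by clustering key."""
    def __init__(self):
        self.partitions = {}

    def insert(self, partition_key, clustering_key, columns):
        rows = self.partitions.setdefault(partition_key, [])
        insort(rows, (clustering_key, columns))   # keep sorted on insert

    def slice(self, partition_key, start, end):
        """Range scan within one partition: the workhorse query shape."""
        return [(ck, cols)
                for ck, cols in self.partitions.get(partition_key, [])
                if start <= ck <= end]

t = WideColumnTable()
t.insert("sensor-1", 1002, {"temp": 21.5})
t.insert("sensor-1", 1001, {"temp": 21.0})
t.insert("sensor-2", 1001, {"temp": 19.0})
print(t.slice("sensor-1", 1000, 1002))
```

Because each partition lives whole on some replica set, scale-out is just distributing partitions across peers, with no central coordinator to bottleneck.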
Using LLVM to accelerate processing of data in Apache Arrow (DataWorks Summit)
Most query engines follow an interpreter-based approach where a SQL query is translated into a tree of relational algebra operations, which is then fed through a conventional tuple-based iterator model to execute the query. We will explore the overhead associated with this approach and how the performance of query execution on columnar data can be improved using run-time code generation via LLVM.
Generally speaking, the best case for optimal query execution performance is a hand-written query plan that does exactly what is needed by the query for the exact same data types and format. Vectorized query processing models amortize the cost of function calls. However, research has shown that hand-written code for a given query plan has the potential to outperform the optimizations associated with a vectorized query processing model.
Over the last decade, the LLVM compiler framework has seen significant development. Furthermore, the database community has realized the potential of LLVM to boost query performance by implementing JIT query compilation frameworks. With LLVM, a SQL query is translated into a portable intermediary representation (IR) which is subsequently converted into machine code for the desired target architecture.
Dremio is built on top of Apache Arrow’s in-memory columnar vector format. The in-memory vectors map directly to the vector type in LLVM and that makes our job easier when writing the query processing algorithms in LLVM. We will talk about how Dremio implemented query processing logic in LLVM for some operators like FILTER and PROJECT. We will also discuss the performance benefits of LLVM-based vectorized query execution over other methods.
Speaker
Siddharth Teotia, Dremio, Software Engineer
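The interpreter-versus-codegen gap described above, in miniature: an expression tree walked per row versus the same predicate "compiled" once into a single function. Real engines emit LLVM IR rather than Python closures, but the structural difference is the same.

```python
import operator

def interpret(node, row):
    """Walk the expression tree for every row (interpreter overhead)."""
    op, left, right = node
    lv = row[left] if isinstance(left, str) else left
    rv = row[right] if isinstance(right, str) else right
    return op(lv, rv)

def compile_predicate(node):
    """Resolve the tree once; return a closure with no per-row dispatch."""
    op, left, right = node
    get_l = (lambda r, k=left: r[k]) if isinstance(left, str) else (lambda r, v=left: v)
    get_r = (lambda r, k=right: r[k]) if isinstance(right, str) else (lambda r, v=right: v)
    return lambda row: op(get_l(row), get_r(row))

predicate = (operator.gt, "price", 100)      # WHERE price > 100
compiled = compile_predicate(predicate)

rows = [{"price": 50}, {"price": 150}]
assert [interpret(predicate, r) for r in rows] == [compiled(r) for r in rows]
print([compiled(r) for r in rows])   # -> [False, True]
```

Codegen wins because the type checks and tree dispatch happen once at compile time instead of once per row, which is exactly the overhead the talk quantifies for FILTER and PROJECT.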
Cosco: An Efficient Facebook-Scale Shuffle Service (Databricks)
Cosco is an efficient shuffle-as-a-service that powers Spark (and Hive) jobs at Facebook warehouse scale. It is implemented as a scalable, reliable and maintainable distributed system. Cosco is based on the idea of partial in-memory aggregation across a shared pool of distributed memory. This provides vastly improved efficiency in disk usage compared to Spark's built-in shuffle. Long term, we believe the Cosco architecture will be key to efficiently supporting jobs at ever larger scale. In this talk we'll take a deep dive into the Cosco architecture and describe how it's deployed at Facebook. We will then describe how it's integrated to run shuffle for Spark, and contrast it with Spark's built-in sort-based shuffle mechanism and SOS (presented at Spark+AI Summit 2018).
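The core Cosco idea in miniature: instead of each mapper writing its own shuffle files, mappers stream records into shared buffers that partially aggregate per reducer partition, so each reducer reads one pre-aggregated buffer rather than one fragment per mapper. A toy sketch, not the real protocol:

```python
from collections import defaultdict

NUM_PARTITIONS = 2   # illustrative reducer count

def shuffle_with_partial_aggregation(mapper_outputs):
    # Shared buffers, one per reducer partition, aggregating across mappers.
    buffers = [defaultdict(int) for _ in range(NUM_PARTITIONS)]
    for records in mapper_outputs:
        for key, value in records:
            buffers[hash(key) % NUM_PARTITIONS][key] += value
    # Each reducer reads exactly one pre-aggregated buffer, instead of
    # len(mapper_outputs) small fragments as in classic sort-based shuffle.
    return [dict(b) for b in buffers]

mappers = [[("a", 1), ("b", 1)], [("a", 2)], [("b", 3), ("a", 1)]]
partitions = shuffle_with_partial_aggregation(mappers)
print(sum(p.get("a", 0) for p in partitions))   # -> 4
```

Turning M×R small shuffle files into R pre-aggregated streams is where the disk-efficiency win over Spark's built-in shuffle comes from.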
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F... (Databricks)
In the Big Data field, Spark SQL is an important data processing module for Apache Spark, working with structured row-based data in a majority of operators. Field-programmable gate arrays (FPGAs) with highly customized intellectual property (IP) can not only bring better performance but also lower power consumption when accelerating the CPU-intensive segments of an application.
Slide deck from the talk I gave at the conference
1. QuickView for test data -> quick feedback
2. Automating everything on top of open-source tools like Selenium, Appium, Jenkins etc.
3. Pain points and workarounds, especially for Mobile App Test Automation
Agile India 2017 Conference is Asia's Largest and Premier Conference on Agile, Scrum, eXtreme Programming, Lean, Kanban, DevOps, Enterprise Agile, Lean Startup, Continuous Delivery, Research and Patterns. Get to meet pioneers and expert practitioners from around the world on Agile Mindset, Scaling Agility, Lean Product Discovery, Continuous Delivery and DevOps. 6 - 12 March 2017 at ITC Gardenia, Bangalore. More details: http://2017.agileindia.org
Ever tried to get get clarity on what kinds of memory there are and how to tune each of them ? If not, very likely your jobs are configured incorrectly. As we found out, its is not straightforward and it is not well documented either. This session will provide information on the types of memory to be aware of, the calculations involved in determining how much is allocated to each type of memory and how to tune it depending on the use case.
Scylla Summit 2022: IO Scheduling & NVMe Disk ModellingScyllaDB
Join ScyllaDB engineer Pavel Emelyanov who will provide a walkthrough of Diskplorer, an open-source disk latency/bandwidth exploring toolset to measure behavior under load. By using Linux fio under the hood Diskplorer runs a battery of measurements to discover performance characteristics for a specific hardware configuration, giving you an at-a-glance view of how server storage I/O will behave under load.
Discover how ScyllaDB uses this elaborated model of disk performance, as well as a scheduling algorithm developed for the Seastar framework to build latency-oriented I/O scheduling that cherry-picks requests from the incoming queue keeping the disk load perfectly balanced.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Kafka on ZFS: Better Living Through Filesystems confluent
(Hugh O'Brien, Jet.com) Kafka Summit SF 2018
You’re doing disk IO wrong, let ZFS show you the way. ZFS on Linux is now stable. Say goodbye to JBOD, to directories in your reassignment plans, to unevenly used disks. Instead, have 8K Cloud IOPS for $25, SSD speed reads on spinning disks, in-kernel LZ4 compression and the smartest page cache on the planet. (Fear compactions no more!)
Learn how Jet’s Kafka clusters squeeze every drop of disk performance out of Azure, all completely transparent to Kafka.
-Striping cheap disks to maximize instance IOPS
-Block compression to reduce disk usage by ~80% (JSON data)
-Instance SSD as the secondary read cache (storing compressed data), eliminating >99% of disk reads and safe across host redeployments
-Upcoming features: Compressed blocks in memory, potentially quadrupling your page cache (RAM) for free
We’ll cover:
-Basic Principles
-Adapting ZFS for cloud instances (gotchas)
-Performance tuning for Kafka
-Benchmarks
The Dark Side Of Go -- Go runtime related problems in TiDB in productionPingCAP
Ed Huang, CTO of PingCAP, talked at Go System Conference about dealing with the typical and profound issues related to Go’s runtime as your systems become more complex. Taking TiDB as an example, he demonstrated how these problems can be reproduced, located, and analyzed in production.
Streaming 101 Revisited: A Fresh Hot Take With Tyler Akidau and Dan Sotolongo...HostedbyConfluent
Streaming 101 Revisited: A Fresh Hot Take With Tyler Akidau and Dan Sotolongo | Current 2022
Seven years ago, O’Reilly published the Streaming 101 and Streaming 102 articles, which have long served as the gateway for many undertaking their own journey into the world of stream processing. But while those articles have remained remarkably relevant over the years, they have also remained entirely static. Meanwhile, those of us pushing forward the envelope of stream processing have not.
Join Tyler Akidau, original author of those articles, and Dan Sotolongo, longtime collaborator and expert in streaming theory, as they reframe the fundamentals of stream processing in modern terms. What is streaming? What theoretical changes do we need to grow beyond the limitations of stream processing systems of the past? What wisdom can we draw from other disciplines to inform these changes? And if we transcend these limitations, can we make life simpler for everyone?
This talk will cover the key concepts of stream processing theory as we understand them today. It is simultaneously an introductory talk as well as an advanced survey on the breadth of stream processing theory. Anyone with an interest in streaming should find something engaging within.
Apache Kafka lies at the heart of the largest data pipelines, handling trillions of messages and petabytes of data every day. Learn the right approach for getting the most out of Kafka from the experts at LinkedIn and Confluent. Todd Palino and Gwen Shapira demonstrate how to monitor, optimize, and troubleshoot performance of your data pipelines—from producer to consumer, development to production—as they explore some of the common problems that Kafka developers and administrators encounter when they take Apache Kafka from a proof of concept to production usage. Too often, systems are overprovisioned and underutilized and still have trouble meeting reasonable performance agreements.
Topics include:
- What latencies and throughputs you should expect from Kafka
- How to select hardware and size components
- What you should be monitoring
- Design patterns and antipatterns for client applications
- How to go about diagnosing performance bottlenecks
- Which configurations to examine and which ones to avoid
Extending Flink SQL for stream processing use casesFlink Forward
Flink Forward San Francisco 2022.
Apache Flink is a powerful stream processing platform that enables users to build complex real time applications. Flink SQL provides a SQL interface that implements standard SQL. While the standard SQL provides a perfect interface for batch processing, in stream processing context, it can result is ambiguity and complex syntax. As an example, consider these three types of streams: Append-only stream, Retract stream and Upsert stream. Using standard SQL, we would represent all of these streams as Table along with the Table concept in batch processing. Such overloading of concepts can result in ambiguity in SQL statements in streaming context. In this talk, we will present extensions to the Flink SQL that simplify SQL statements in the context of stream processing. We will show how such extensions work in the context of a Flink application using different use cases. These extensions are only sugar syntax and users should be able to use Flink SQL as is if they desire.
by
Hojjat Jafarpour
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...Databricks
The reality of most large scale data deployments includes storage decoupled from computation, pipelines operating directly on files and metadata services with no locking mechanisms or transaction tracking. For this reason attempts at achieving transactional behavior, snapshot isolation, safe schema evolution or performant support for CRUD operations has always been marred with tradeoffs.
This talk will focus on technical aspects, practical capabilities and the potential future of three table formats that have emerged in recent years as solutions to the issues mentioned above – ACID ORC (in Hive 3.x), Iceberg and Delta Lake. To provide a richer context, a comparison between traditional databases and big data tools as well as an overview of the reasons for the current state of affairs will be included.
After the talk, the audience is expected to have a clear understanding of the current development trends in large scale table formats, on the conceptual and practical level. This should allow the attendees to make better informed assessments about which approaches to data warehousing, metadata management and data pipelining they should adapt in their organizations.
From cache to in-memory data grid. Introduction to Hazelcast.Taras Matyashovsky
This presentation:
* covers basics of caching and popular cache types
* explains evolution from simple cache to distributed, and from distributed to IMDG
* not describes usage of NoSQL solutions for caching
* is not intended for products comparison or for promotion of Hazelcast as the best solution
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
We have covered the need for CDC and the benefits of building a CDC pipeline. We will compare various CDC streaming and reconciliation frameworks. We will also cover the architecture and the challenges we faced while running this system in the production. Finally, we will conclude the talk by covering Apache Hudi, Schema Registry and Debezium in detail and our contributions to the open-source community.
Meta/Facebook's database serving social workloads is running on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depends a lot on RocksDB. Not just MyRocks, but also we have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale.
In this session, we will offer an overview of RocksDB, key differences from InnoDB, and share a few interesting lessons learned from production.
VictoriaLogs: Open Source Log Management System - PreviewVictoriaMetrics
VictoriaLogs Preview - Aliaksandr Valialkin
* Existing open source log management systems
- ELK (ElasticSearch) stack: Pros & Cons
- Grafana Loki: Pros & Cons
* What is VictoriaLogs
- Open source log management system from VictoriaMetrics
- Easy to setup and operate
- Scales vertically and horizontally
- Optimized for low resource usage (CPU, RAM, disk space)
- Accepts data from Logstash and Fluentbit in Elasticsearch format
- Accepts data from Promtail in Loki format
- Supports stream concept from Loki
- Provides easy to use yet powerful query language - LogsQL
* LogsQL Examples
- Search by time
- Full-text search
- Combining search queries
- Searching arbitrary labels
* Log Streams
- What is a log stream?
- LogsQL examples: querying log streams
- Stream labels vs log labels
* LogsQL: stats over access logs
* VictoriaLogs: CLI Integration
* VictoriaLogs Recap
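The LogsQL examples in the outline above can be issued over VictoriaLogs' HTTP query API. The sketch below only composes query URLs, so it runs without a server; the endpoint path, the default port 9428, and the exact LogsQL strings are assumptions to check against your deployment's documentation.

```python
# Hedged sketch: composing LogsQL queries for VictoriaLogs' HTTP query
# endpoint. /select/logsql/query and port 9428 are assumed defaults.
from urllib.parse import urlencode

BASE = "http://localhost:9428/select/logsql/query"

queries = {
    # search by time: errors in the last 5 minutes
    "time_filter": "_time:5m error",
    # full-text search combining words and an exact phrase
    "full_text": 'error AND "connection refused"',
    # stream filter plus stats over matching entries
    "stream_stats": '{app="nginx"} _time:1h | stats count()',
}

def query_url(q):
    """Build the GET URL for a LogsQL query string."""
    return BASE + "?" + urlencode({"query": q})

for name, q in queries.items():
    print(name, "->", query_url(q))
```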
Learn the current state of the NoSQL landscape and discover the different data models within it, from document stores and key-value databases to graph and wide-column stores. Then you’ll learn why wide-column databases are the most appropriate for scalable, high-performance use cases, including capabilities for massive scale-out architecture, peer-to-peer clustering to avoid bottlenecks, and built-in multi-datacenter replication.
Using LLVM to accelerate processing of data in Apache Arrow - DataWorks Summit
Most query engines follow an interpreter-based approach where a SQL query is translated into a tree of relational algebra operations, which is then fed through a conventional tuple-based iterator model to execute the query. We will explore the overhead associated with this approach and how the performance of query execution on columnar data can be improved using run-time code generation via LLVM.
Generally speaking, the best case for optimal query execution performance is a hand-written query plan that does exactly what is needed by the query for the exact same data types and format. Vectorized query processing models amortize the cost of function calls. However, research has shown that hand-written code for a given query plan has the potential to outperform the optimizations associated with a vectorized query processing model.
Over the last decade, the LLVM compiler framework has seen significant development. Furthermore, the database community has realized the potential of LLVM to boost query performance by implementing JIT query compilation frameworks. With LLVM, a SQL query is translated into a portable intermediary representation (IR) which is subsequently converted into machine code for the desired target architecture.
Dremio is built on top of Apache Arrow’s in-memory columnar vector format. The in-memory vectors map directly to the vector type in LLVM and that makes our job easier when writing the query processing algorithms in LLVM. We will talk about how Dremio implemented query processing logic in LLVM for some operators like FILTER and PROJECT. We will also discuss the performance benefits of LLVM-based vectorized query execution over other methods.
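The interpreter-versus-codegen contrast described above can be illustrated with a toy example. The sketch below uses Python's `compile()` as a stand-in for LLVM IR generation: the interpreter dispatches on every tree node for every row, while the "JIT" path emits straight-line code for the whole expression once and reuses it. It is a conceptual analogy, not Dremio's (Gandiva's) actual implementation.

```python
# Toy illustration of tuple-at-a-time interpretation vs. run-time code
# generation. Python's compile() stands in for LLVM IR + machine code.
import operator

OPS = {"+": operator.add, "*": operator.mul}

def interpret(node, row):
    """Interpreter: dispatch on every node of the tree, for every row."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](interpret(left, row), interpret(right, row))
    return row[node] if isinstance(node, str) else node

def codegen(node):
    """'JIT': emit one flat expression for the tree, compiled once."""
    if isinstance(node, tuple):
        op, left, right = node
        return f"({codegen(left)} {op} {codegen(right)})"
    return f"row[{node!r}]" if isinstance(node, str) else repr(node)

expr = ("+", ("*", "a", 2), "b")          # a * 2 + b
compiled = eval(compile(f"lambda row: {codegen(expr)}", "<jit>", "eval"))

row = {"a": 3, "b": 4}
assert interpret(expr, row) == compiled(row) == 10
```

The generated function avoids per-node dispatch on every row, which is the same source of speedup (in miniature) that LLVM-based query compilation exploits.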
Speaker
Siddharth Teotia, Dremio, Software Engineer
Cosco: An Efficient Facebook-Scale Shuffle Service - Databricks
Cosco is an efficient shuffle-as-a-service that powers Spark (and Hive) jobs at Facebook warehouse scale. It is implemented as a scalable, reliable and maintainable distributed system. Cosco is based on the idea of partial in-memory aggregation across a shared pool of distributed memory. This provides vastly improved efficiency in disk usage compared to Spark's built-in shuffle. Long term, we believe the Cosco architecture will be key to efficiently supporting jobs at ever larger scale. In this talk we'll take a deep dive into the Cosco architecture and describe how it's deployed at Facebook. We will then describe how it's integrated to run shuffle for Spark, and contrast it with Spark's built-in sort-based shuffle mechanism and SOS (presented at Spark+AI Summit 2018).
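The core idea of partial in-memory aggregation across shared buffers can be sketched in a few lines. The class below is a toy model of that concept only (buffer capacity, aggregation function, and the "disk" list are all illustrative, not Cosco's actual design).

```python
# Toy sketch of partial in-memory aggregation for shuffle: map output is
# pre-aggregated in a shared partition buffer, and only compacted, sorted
# runs are flushed to "disk" (a list here), reducing write volume.
from collections import defaultdict

class PartitionBuffer:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.agg = defaultdict(int)   # partial aggregation, not raw records
        self.flushed_runs = []        # stands in for files written to disk

    def add(self, key, value):
        self.agg[key] += value        # combine in memory as records arrive
        if len(self.agg) >= self.capacity:
            self.flush()

    def flush(self):
        if self.agg:
            self.flushed_runs.append(sorted(self.agg.items()))
            self.agg.clear()

buf = PartitionBuffer(capacity=2)
for k, v in [("a", 1), ("a", 2), ("b", 5), ("a", 7)]:
    buf.add(k, v)
buf.flush()
# Four input records became two small sorted runs instead of four raw writes.
```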
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F... - Databricks
In the big data field, Spark SQL is an important data processing module for Apache Spark, working with structured row-based data in the majority of its operators. A field-programmable gate array (FPGA) with highly customized intellectual property (IP) can bring not only better performance but also lower power consumption when accelerating the CPU-intensive segments of an application.
Slide deck from the talk I gave at the conference
1. QuickView for test data -> quick feedback
2. Automating everything on top of open-source tools like Selenium, Appium, Jenkins, etc.
3. Pain points and workarounds, especially for mobile app test automation
Agile India 2017 Conference is Asia's Largest and Premier Conference on Agile, Scrum, eXtreme Programming, Lean, Kanban, DevOps, Enterprise Agile, Lean Startup, Continuous Delivery, Research and Patterns. Get to meet pioneers and expert practitioners from around the world on Agile Mindset, Scaling Agility, Lean Product Discovery, Continuous Delivery and DevOps. 6 - 12 March 2017 at ITC Gardenia, Bangalore. More details: http://2017.agileindia.org
Agile Metrics: Velocity is NOT the Goal - NDC Oslo 2014 - Doc Norton
Velocity is one of the most common metrics used, and one of the most commonly misused, on agile projects. Velocity is simply a measurement of speed in a given direction: the rate at which a team is delivering toward a product release. As with a vehicle en route to a particular destination, increasing the speed may appear to ensure a timely arrival. However, that assumption is dangerous because it ignores the risks that come with higher speeds. And while it’s easy to increase a vehicle’s speed, where exactly is the accelerator on a software team?
Michael “Doc" Norton walks us through the Hawthorne Effect and Goodhart’s Law to explain why setting goals for velocity can actually hurt a project's chances. Take a look at what can negatively impact velocity, ways to stabilize fluctuating velocity, and methods to improve velocity without the risks. Leave with a toolkit of additional metrics that, coupled with velocity, give a better view of the project's overall health.
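One of the "additional metrics" ideas above can be sketched numerically: treat velocity as a distribution rather than a target, and plan against a range. The sprint numbers below are made up for illustration.

```python
# Sketch: report velocity as a range (mean +/- one standard deviation)
# instead of a single target number, so planning uses likely throughput.
from statistics import mean, stdev

velocities = [21, 18, 25, 19, 23, 17]   # last six sprints (made-up numbers)

avg = mean(velocities)
spread = stdev(velocities)              # sample standard deviation
low, high = avg - spread, avg + spread
print(f"plan for roughly {low:.0f}-{high:.0f} points per sprint")
```

A wide range is itself a signal: stabilizing fluctuating velocity narrows the band and makes forecasts more useful than chasing a higher average.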
Diffy: Automatic Testing of Microservices @ Twitter - Puneet Khanduri
Agile development has become a norm nowadays. Though it fosters faster product development cycles, it often results in a higher number of functional and/or performance regressions. In an SOA setting such as Twitter, such regressions may cascade from one service to one or more services. Detecting such regressions manually is not practically feasible in light of the hundreds of services and tens of thousands of metrics each service collects. To this end, we developed a novel tool called Diffy to automatically detect such regressions.
The key highlights of the talk are the following:
A simple yet effective approach for detecting functional regressions. False positives are minimized via statistical analysis of metrics obtained from a tuple of nodes (primary and candidate), where the same traffic is sent to each node.
An ensemble approach to performance regression. The need for an ensemble of classifiers stemmed from the multifaceted characteristics of the performance data. In order to minimize the impact of variability of hardware performance across nodes, we used two clusters – instead of a tuple of nodes – corresponding to the release candidate and production code. The approach is robust against the presence of anomalies in the performance data.
The proposed techniques work well with minute data. Diffy has been in use in production by multiple services at Twitter, and has been baked into the continuous build process so as to actively detect functional and/or performance regressions.
We shall take the audience through how the techniques are being used at Twitter with REAL data.
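Diffy's noise-filtering trick can be sketched compactly: the same request goes to a known-good primary, an identical secondary, and the candidate; fields that differ between primary and secondary are treated as nondeterministic noise, and only candidate differences outside that set are flagged. The snippet below is a simplified model of that idea, not Diffy's actual code.

```python
# Sketch of Diffy-style regression detection: differences between two
# identical known-good replicas (primary vs. secondary) define the noise
# set; only candidate differences outside it are reported as regressions.
def diff_fields(a, b):
    """Keys whose values differ between two responses."""
    return {k for k in set(a) | set(b) if a.get(k) != b.get(k)}

def regressions(primary, secondary, candidate):
    noise = diff_fields(primary, secondary)      # e.g. timestamps, request ids
    return diff_fields(primary, candidate) - noise

primary   = {"status": 200, "total": 42, "ts": 1111}
secondary = {"status": 200, "total": 42, "ts": 2222}
candidate = {"status": 200, "total": 41, "ts": 3333}

# "ts" differs everywhere, so it is noise; only "total" is a real regression.
assert regressions(primary, secondary, candidate) == {"total"}
```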
The world as we know it is growing more complex. As we automate away those things that can be easily repeated, we leave ourselves with ever more challenging work. The way we've worked in the past won't necessarily work for today's problems... or will it? Join Diane and Doc as they explore dimensions of complexity in software development and look at how teams and leaders might adjust their behaviors (and the software they create) based on the complexity of the problem at hand.
This hands-on, interactive workshop will provide a practical introduction to Cynefin (a sense-making framework for complexity) and show how it applies to the work we do every day as creators of software. You'll map your own work to Cynefin and learn about applicable management styles and optimal team interactions for each of the Cynefin contexts.
Autonomy, Connection, and Excellence: The Building Blocks of a DevOps Culture - Doc Norton
DevOps, to a great extent, is about people working together. Without true cross-discipline collaboration, the full value of DevOps cannot be realized.
But you can’t just mandate collaboration. Many organizations do more than separate developers and operations: they design systems, metrics, and rewards that make the two seem like natural enemies. “They don’t get it,” you hear from both sides.
In this talk, Doc will take a look at what motivates teams, how our systems produce the exact results we design them to produce, and how we can use simple (but not necessarily easy) techniques to counter years of “us versus them” conditioning.
Continuous Delivery Sounds Great but it Won't Work Here - Jez Humble
Since the Continuous Delivery book came out in 2010, it's gone from being a controversial idea to a commonplace... until you consider that many people who say they are doing it aren't really, and there are still plenty of places that consider it crazy talk.
In this session Jez will present some of the highlights and lowlights of the past six years listening to people explain why continuous delivery won't work, and what he learned in the process.
Technical Debt has become a catch-all phrase for any code that needs to be re-worked. Much like Refactoring has become a catch-all phrase for any activity that involves changing code.
These fundamental misunderstandings and comfortable yet mis-applied metaphors have resulted in a plethora of poor decisions.
What is technical debt?
What is not technical debt?
Why should we care?
What is the cost of misunderstanding?
What do we do about it?
What's Measured Improves: Metrics that matter - Raj Indugula
“Every line is the perfect length if you don't measure it.”
- Marty Rubin
So your organization has embarked upon a transformation to be more nimble and responsive by employing the latest tools and thinking in the Agile and DevOps arena. In this transformational context, how do you know that your initiatives are effective? Empirical measurements should provide insights on business value flow and delivery efficiency, allowing teams and organizations to see how they are progressing toward achieving their goals, but all too often we find ourselves mired in measurement traps that don't quite provide the right guidance in steering our efforts.
Rooted in contemporary thinking and tested in practice, this talk explores the principles of good measurement, what to measure, what not to measure, and enumerates some key metrics to help guide and inform our Agile and DevOps efforts. If done right, metrics can present a true picture of performance, and any progression, digression of these metrics can drive learning and improvement.
Building Blocks of a Knowledge Work Culture - NDC London 2016 - Doc Norton
Re-designed presentation on Autonomy, Connection, Excellence, and Diversity. This version shows a bit more about the management styles appropriate in different domains of complexity, connects knowledge work to Complicated and Complex, and then walks through the Building Blocks.
Technical Debt has become a catch-all phrase for any code that needs to be re-worked. Much like Refactoring has become a catch-all phrase for any activity that involves changing code. These fundamental misunderstandings and comfortable yet mis-applied metaphors have resulted in a plethora of poor decisions. What is technical debt? What is not technical debt? Why should we care? What is the cost of misunderstanding? What do we do about it? Doc discusses the origins of the metaphor, what it means today, and how we properly identify and manage technical debt.
Java Puzzlers NG S02: Down the Rabbit Hole as presented at DevNexus 2017 - Baruch Sadogursky
Moar puzzlers! The more we work with Java 8, the more we go into the rabbit hole. Did they add all those streams, lambdas, monads, Optionals and CompletableFutures only to confuse us? It surely looks so! And Java 9 that heads our way brings even more of what we like the most, more puzzlers, of course! In this season we as usual have a great batch of the best Java WTF, great jokes to present them and great prizes for the winners!
Among the traits that distinguish a good team from a great team is their ability to innovate. Despite the rhetoric in favor of innovation, most organizations are stuck in an implementation mindset, stifling creativity, excellence, and the resultant innovation. The experimentation mindset frees us from self-imposed constraints, allowing us to continually learn and improve. In this session, we'll talk about how we learn as individuals and how we learn as organizations. We'll take a look at some examples of the experimentation mindset happening in the agile community today and we'll talk about how you can foster such a mindset in your own organization.
SDEC 2014 Keynote - Among the traits that distinguish a good team from a great team is their ability to innovate. And despite the rhetoric in favor of innovation, most organizations are stuck in an implementation mindset, stifling creativity, excellence, and the resultant innovation. The experimentation mindset frees us from self-imposed constraints, allowing us to continually learn and improve.
Switching horses midstream - From Waterfall to Agile - Doc Norton
You’ve been working for several months on a key software initiative for the company and leadership has decided they want it faster than projected, so the team has been told they’re getting “the agile” installed next week.
“Great,” you think. “Right in the middle of the project. Nothing like changing horses in midstream. One way or another, this will go swimmingly.”
Sarcasm and puns aside, you’ve got a point. It isn’t easy to switch methodologies in the middle of a project. Doc shares some stories from his own experiences helping teams make this change and provides a few pointers that can help you do the same.
While this talk is focused on testing, it involves the whole team, as agile methods usually do.
Even high-functioning teams occasionally have a hard time making decisions or coming up with creative ideas. There are times when the conversation seems to drag on long after a decision is reached. There are times when we have too many people involved in the discussion or the wrong people involved. There are times when we’re not sure who the actual decision maker is. And there are those times when we just seem to be out of sync with each other. This creative collaboration workshop provides tools that help resolve all of these issues.
Building a private CI/CD pipeline with Java and Docker in the Cloud as presen... - Baruch Sadogursky
A private Java (Maven or Gradle) repository as a service can be setup in the cloud. A private Docker registry as a service can be easily setup in the cloud. But what if you want to build a holistic CI/CD pipeline, and on the cloud of YOUR choice?
In this talk Baruch will take you through the steps of setting up a universal artifact repository, which can serve both Java and Docker. You’ll learn how to build a CI/CD pipeline with traceable metadata from the Java source files all the way to Docker images. Amazon, Azure, and Google Cloud will be used as examples, although the recipes shown are applicable to other clouds as well.
8 Things That Make Continuous Delivery Go Nuts - Eduards Sizovs
Continuous Delivery is still trendy and everyone wants to get there, but there are so many walls to break through and nerves to fray! In this talk Eduards will present real-world battle stories of continuous delivery adoption, overlooked things that tend to go wrong, and the practices you can apply in order to survive.
Love is a Contagion. Let's Start an Epidemic.
This is the slide deck to go along with the Keynote I gave at That Conference in August 2013. This talk is all about the stories. The visuals support the stories told, but do not tell them on their own. I will add details to my blog and may upload another version of the slide deck with notes.
Technical debt is something that most project teams or independent developers have to deal with – we take shortcuts to push out releases, deadlines need to be met, quick fixes slowly become the standard. In this talk, we will discuss what technical debt is, when it is acceptable and when it isn’t, and strategies for effectively managing it, both on an independent and team level.
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an... - Thiago de Faria
AI is the buzzword while ML is the underlying component... but when do we use ML? To solve problems where machines can find patterns without being explicitly programmed to do so. But do you have a team building an ML model? How far are they from the IT team? Do they know how to deploy and serve that? Testing? And sharing what they have done? That's where a devops mindset comes in: reduce the batch size, continuous-everything, and a culture of failure/experimentation are vital for your data team! In the end, I will show how the workflow of a data scientist can look in real life with a live demo!
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas... - Codemotion
AI is the buzzword while ML is the underlying component... but when do we use ML? To solve problems where machines can find patterns without being explicitly programmed to do so. But do you have a team building an ML model? How far are they from the IT team? Do they know how to deploy and serve that? Testing? And sharing what they have done? That's where a devops mindset comes in: reduce the batch size, continuous-everything, and a culture of failure/experimentation are vital for your data team! In the end, I will show how the workflow of a data scientist can look in real life with a live demo!
Whether we know it or not, every time we deliver a feature we also deliver technical debt. This debt remains largely invisible: it isn't tracked, it isn't visible on our information radiators, and we very seldom tell our clients about it. The closest we come to acknowledging technical debt is bugs, mainly because our users tell us there is a problem, so we can't ignore it. Technical debt shouldn't be invisible; it's as real as the features we deliver, and we should start treating it as such. In this talk I propose a technical debt model which can be used when identifying technical debt. Furthermore, I propose a low-friction technique for integrating technical debt into your current SDLC process.
The Technical Debt Trap - Michael "Doc" Norton - LeanDog
Technical Debt has become a catch-all phrase for any code that needs to be re-worked. Much like Refactoring has become a catch-all phrase for any activity that involves changing code.
These fundamental misunderstandings and comfortable yet mis-applied metaphors have resulted in a plethora of poor decisions.
What is technical debt?
What is not technical debt?
Why should we care?
What is the cost of misunderstanding?
What do we do about it?
My closing talk for this year's Fronteers conference in Amsterdam, the Netherlands about just how cool it is to be someone who builds things for the web.
DevDay 2013 - Building Startups and Minimum Viable Products - Ben Hall
DevDay (http://devday.pl),
20th of September 2013, Kraków
Video at http://www.youtube.com/watch?v=L4eTOvq2WmM&feature=c4-overview-vl&list=PLBMFXMTB7U74NdDghygvBaDcp67owVUUF
Beyond Technical Debt: Unconventional techniques to uncover technical and soc... - Juraj Martinka
* Traditional static code analysis tools (Sonar et al.) are notoriously bad at producing actionable insights (what does the "4000 years of accumulated technical debt" really mean?)
* CodeScene takes a different approach and focuses instead on how code evolves over time, using a treasure trove we all possess - version control history.
* CodeScene takes a different approach and focuses instead on how code evolves over time using a treasure trove we all posses - version control history.
* Come and meet CodeScene, a unique behavioral code analysis tool that looks for patterns in version control data to understand the history and evolution of a codebase: unraveling things like hotspots, change coupling between modules and interesting social aspects of the code.
* I'll describe the ideas behind CodeScene, how it works and demonstrate the techniques on analyses of real projects.
OSDC 2019 | Feature Branching considered Evil by Thierry de Pauw - NETWAYS
With DVCSs, branch creation became very easy, but it comes at a certain cost. Long living branches break the flow of the software delivery process, impacting stability and throughput. The session explores why teams are using feature branches, what problems are introduced by using them and what techniques exist to avoid them altogether. It explores exactly what’s evil about feature branches, which is not necessarily the problems they introduce – but rather, the real reasons why teams are using them. After the session, you’ll understand a different branching strategy and how it relates to CI/CD.
To Make Your Chatbot Smart, You Need to Feed It Right: How to Write for Chatb... - LavaConConference
Chatbots are becoming an increasingly popular delivery channel for many types of content, including customer support, marketing, and pre-sales. To make chatbots scalable, helpful, and smoothly integrated into the content ecosystem of your organization, you need to feed the chatbots with the right content prepared in the right way.
In this workshop, you’ll learn how to write content for chatbots in a way that lets the chatbots find the relevant content and precisely match it to the user’s request.
In this session, attendees will learn:
How chatbots work: approaches to recognizing user’s intent, handling incomplete requests, and finding matching content
How the content needs to be organized, structured, and written to be discoverable by the chatbot
How to avoid creating isolated, chatbot-specific content that is available only to the chatbot
What makes content undiscoverable by the chatbot and makes it give wrong or irrelevant answers
How to handle situations when the chatbot is unsure which content should be provided to the user
How to handle content variations (for example, product- or audience-specific)
Achieving Technical Excellence in Your Software Teams - from Devternity - Peter Gfader
Our industry has a problem: We are not lacking software methodologies, programming languages, tools or frameworks but we need great software engineers.
Great software engineering teams build quality in and deliver great software on a regular basis. The technical excellence of those engineers will help you escape the "Waterfall sandwich" and make your organization a little more agile, from the inception of an idea until it goes live.
I will talk about my experiences from the last 15 years, ranging from small software delivery teams to big financial institutions.
Why would a company like to be "agile"?
How can a company achieve that?
How can you achieve Technical Excellence in your software teams?
What developer skills are more important than languages, methods or frameworks?
This will be an interactive session with a Q&A at the end.
Braintree SDK v.zero or "A payment gateway walks into a bar..." - Devfest Nan... - Alberto López Martín
Presentation of the talk given at DevFest Nantes 2014 about the new SDK of Braintree (a PayPal company), v.zero, also covering best practices for a great payment-flow experience on Android devices.
Agile and Beyond 2017 Presentation on Tuckman's Theory of Team Development. This theory was based on non-scientifically gathered surveys and has never been empirically proven despite dozens of scientific attempts. This talk covers why stable teams may have been a good thing and why we want to consider dynamic teams as we face new challenges.
From NDC Oslo 2015 - Workshop with Denise Jacobs, Doc Norton, and Carl Smith
Even high-functioning teams occasionally have a hard time making decisions or coming up with creative ideas. There are times when the conversation seems to drag on long after a decision is reached. There are times when we have too many people involved in the discussion or the wrong people involved. There are times when we're not sure who the actual decision maker is. And there are those times when we just seem to be out of sync with each other. This creative collaboration workshop provides tools that help resolve all of these issues. Come have some laughs with Denise, Doc, and Carl, play with new friends, and learn one or two new techniques you can try at home.
How does the common cold spread through a group of friends or co-workers? What about other contagions? Can a contagion be used for good? Doc explores how things like disease, politics, and even moods travel through (meat-space) social networks. What impact do we have on others? What impact do they have on us? And what does this mean for members of the software development community?
Teamwork ain’t always easy. From meetings where everybody has something to say but nothing gets done to poor decisions being made because the most senior or most forceful team member won the argument; sometimes you long for the days of high-walled cubicles and lone ranger coding. Long no more.
In this workshop, you will learn a few simple techniques that drastically improve a team’s ability to work together toward common goals with less conflict and more genuine collaboration.
Creating a Global Engineering Culture - Agile India 2014 - Doc Norton
A short (and incomplete) telling of how we got to where we are as an engineering organization for Groupon. A little philosophy about what motivates individuals and teams. And finally a little bit about what we're doing at Groupon.
When it comes to creating a global culture, remember that you are more archeologist than architect. Uncover the good that is already happening and help to share it rather than trying to design something new.
Agile Metrics: Velocity is NOT the Goal - Agile 2013 version - Doc Norton
A newly formatted version of "Velocity is NOT the Goal" for Agile 2013. I've removed some details about standard deviation, added a few more thoughts around the "psychology" of setting targets for metrics, and show a bit more about how we do this at Groupon.
Code PaLOUsa rendition of Velocity is NOT the Goal.
Velocity is one of the most common metrics used—and one of the most commonly misused—on agile projects. Velocity is simply a measurement of speed in a given direction—the rate at which a team is delivering toward a product release. As with a vehicle en route to a particular destination, increasing the speed may appear to ensure a timely arrival. However, that assumption is dangerous because it ignores the risks with higher speeds. And while it’s easy to increase a vehicle’s speed, where exactly is the accelerator on a software team? Michael “Doc" Norton walks us through the Hawthorne Effect and Goodhart’s Law to explain why setting goals for velocity can actually hurt a project's chances. Take a look at what can negatively impact velocity, ways to stabilize fluctuating velocity, and methods to improve velocity without the risks. Leave with a toolkit of additional metrics that, coupled with velocity, give a better view of the project's overall health.
Teamwork ain’t always easy. From meetings where everybody has something to say but nothing gets done to poor decisions being made because the most senior or most forceful team member won the argument; sometimes you long for the days of high-walled cubicles and lone ranger coding. Long no more.
In this workshop, you will learn about two simple techniques that drastically improve a team’s ability to work together toward common goals with less conflict and more genuine collaboration.
Updated version of this talk as presented at Pacific Northwest Software Quality Conference in 2012. This is a longer version, including content on scatter diagrams and standard deviation.
Growing Into Excellence talk given at the Pacific Northwest Software Quality Conference. This is yet another version on this theme. A little more on Collaboration 8 and Six Thinking Hats along with new material from Dave Hoover on stretching into incompetence.
Velocity is one of the most common metrics used—and one of the most commonly misused—on agile projects. Velocity is simply a measurement of speed in a given direction—the rate at which a team is delivering toward a product release. As with a vehicle en route to a particular destination, increasing the speed may appear to ensure a timely arrival. However, that assumption is dangerous because it ignores the risks with higher speeds. And while it’s easy to increase a vehicle’s speed, where exactly is the accelerator on a software team? This presentation covers Hawthorne Effect and Goodhart’s Law to explain why setting goals for velocity can actually hurt a project's chances. Take a look at what can negatively impact velocity, ways to stabilize fluctuating velocity, and methods to improve velocity without the risks.
Simple slide deck for a talk around velocity, how it should be used on a project, the common mistakes we make, why setting targets for velocity is a poor practice, and how to fix "bad" velocity.
CodeStock :: Introduction To MacRuby and HotCocoa - Doc Norton
MacRuby is an implementation of Ruby 1.9 directly on top of Mac OS X core technologies. HotCocoa is a thin, idiomatic Ruby layer that sits above Cocoa and other frameworks. Together, they make building Mac applications pain free (for Rubyists). Doc shows us how to get started and then walks us through the creation of a simple app.
I continue to tweak this deck and presentation based on feedback from the audience. I had a very good discussion with Martin Fowler about this one and I've made adjustments based on that discussion.
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
Understanding Globus Data Transfers with NetSage - Globus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks worldwide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Enhancing Research Orchestration Capabilities at ORNL.pdf - Globus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
Your Digital Assistant.
Making complex approach simple. Straightforward process saves time. No more waiting to connect with people that matter to you. Safety first is not a cliché - Securely protect information in cloud storage to prevent any third party from accessing data.
Would you rather make your visitors feel burdened by making them wait? Or choose VizMan for a stress-free experience? VizMan is an automated visitor management system that works for any industries not limited to factories, societies, government institutes, and warehouses. A new age contactless way of logging information of visitors, employees, packages, and vehicles. VizMan is a digital logbook so it deters unnecessary use of paper or space since there is no requirement of bundles of registers that is left to collect dust in a corner of a room. Visitor’s essential details, helps in scheduling meetings for visitors and employees, and assists in supervising the attendance of the employees. With VizMan, visitors don’t need to wait for hours in long queues. VizMan handles visitors with the value they deserve because we know time is important to you.
Feasible Features
One Subscription, Four Modules – Admin, Employee, Receptionist, and Gatekeeper ensures confidentiality and prevents data from being manipulated
User Friendly – can be easily used on Android, iOS, and Web Interface
Multiple Accessibility – Log in through any device from any place at any time
One app for all industries – a Visitor Management System that works for any organisation.
Stress-free Sign-up
Visitor is registered and checked-in by the Receptionist
Host gets a notification, where they opt to Approve the meeting
Host notifies the Receptionist of the end of the meeting
Visitor is checked-out by the Receptionist
Host enters notes and remarks of the meeting
Customizable Components
Scheduling Meetings – Host can invite visitors for meetings and also approve, reject and reschedule meetings
Single/Bulk invites – Invitations can be sent individually to a visitor or collectively to many visitors
VIP Visitors – Additional security of data for VIP visitors to avoid misuse of information
Courier Management – Keeps a check on deliveries like commodities being delivered in and out of establishments
Alerts & Notifications – Get notified on SMS, email, and application
Parking Management – Manage availability of parking space
Individual log-in – Every user has their own log-in id
Visitor/Meeting Analytics – Evaluate notes and remarks of the meeting stored in the system
Visitor Management System is a secure and user friendly database manager that records, filters, tracks the visitors to your organization.
"Secure Your Premises with VizMan (VMS) – Get It Now"
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Advanced Flow Concepts Every Developer Should KnowPeter Caitens
Tim Combridge from Sensible Giraffe and Salesforce Ben presents some important tips that all developers should know when dealing with Flows in Salesforce.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
Modern design is crucial in today's digital environment, and this is especially true for SharePoint intranets. The design of these digital hubs is critical to user engagement and productivity enhancement. They are the cornerstone of internal collaboration and interaction within enterprises.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?XfilesPro
Worried about document security while sharing them in Salesforce? Fret no more! Here are the top-notch security standards XfilesPro upholds to ensure strong security for your Salesforce documents while sharing with internal or external people.
To learn more, read the blog: https://www.xfilespro.com/how-does-xfilespro-make-document-sharing-secure-and-seamless-in-salesforce/
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
Even though at surface level ‘java.lang.OutOfMemoryError’ appears as one single error; underlyingly there are 9 types of OutOfMemoryError. Each type of OutOfMemoryError has different causes, diagnosis approaches and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.
5. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite.
– Ward Cunningham, OOPSLA ‘92
6. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite. The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt.
– Ward Cunningham, OOPSLA ‘92
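Cunningham's "interest" can be made concrete with a toy model: if unpaid not-quite-right code imposes a small drag on every subsequent iteration, the slowdown compounds like interest on a loan. The sketch below is illustrative only; the 10% drag per sprint and the velocity figures are invented for the example, not taken from the talk.

```java
// Toy model of the debt metaphor's "interest": a hypothetical constant
// drag per sprint, applied while the not-quite-right code goes unpaid.
// All numbers here are invented for illustration.
public class DebtInterest {

    // Returns team velocity after `sprints` iterations, starting from
    // `base`, losing `dragPerSprint` (e.g. 0.10 = 10%) each iteration.
    static double velocityAfter(double base, double dragPerSprint, int sprints) {
        double v = base;
        for (int i = 0; i < sprints; i++) {
            v *= (1.0 - dragPerSprint); // the drag compounds every sprint
        }
        return v;
    }

    public static void main(String[] args) {
        // A 10% drag nearly halves a 10-point velocity within 6 sprints.
        double v = velocityAfter(10.0, 0.10, 6);
        System.out.printf("Velocity after 6 sprints: %.2f points%n", v);
    }
}
```

The geometric decay is the point of the metaphor: the cost of leaving the debt in place is not a one-time fee but a recurring charge on every change you make.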
7. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt.
– Ward Cunningham, OOPSLA ‘92
18. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
metaphors rock
we reason by analogy
• Building on a weak foundation
• Puts pressure on our design
• Can’t keep running at this pace
• It’s raining men (hallelujah)
19. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
metaphorphosis
when metaphors go wrong
[Word cloud of debt metaphors: credit card, home loan, auto loan, student loan, loan shark, pyramid scheme, short-term, long-term, high interest, return on investment, operational, pragmatic, leverage, intentional, voluntary, inadvertent, prudent, fraudulent, reckless debt in the third quadrant]
20. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
metaphorphosis
when metaphors go wrong
• “quick and dirty” - Martin Fowler
• “sloppy” - David Laribee
• “just hack it in” - Steve McConnell
• “cut a lot of corners” - James Shore
21. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
technical debt quadrant
• Reckless / Deliberate: “We don’t have time for design”
• Prudent / Deliberate: “We must ship now and deal with consequences”
• Reckless / Inadvertent: “What’s Layering?”
• Prudent / Inadvertent: “Now we know how we should have done it”
22. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
[Many] have explained the debt metaphor and confused it with the idea that you could write code poorly with the intention of doing a good job later.
– Ward Cunningham, YouTube ‘09
23. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
confused the debt metaphor with the idea that you could write code poorly
– Ward Cunningham, YouTube ‘09
24. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
The ability to pay back debt [...] depends upon you writing code that is clean enough to be able to refactor as you come to understand your problem.
– Ward Cunningham, YouTube ‘09
26. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
Dirty code is to technical debt as the pawn broker is to financial debt.
– Ward Cunningham, Twitter ‘09
27. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
Dirty code is to technical debt as the pawn broker is to financial debt. Don’t think you are ever going to get your code back.
– Ward Cunningham, Twitter ‘09
29. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
technical debt?
Ask yourself…
• Is the code clean?
• Is the code tested?
• Is there a learning objective or event?
• Is there a plan for payback?
• Is the business truly informed?
31. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
technical debt?
If you say no to even one…
• Is the code clean?
• Is the code tested?
• Is there a learning objective or event?
• Is there a plan for payback?
• Is the business truly informed?
... then you don’t have Technical Debt
33. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
mess (noun)
• Disorderly accumulation, heap, or jumble
• A state of embarrassing confusion
• An unpleasant or difficult situation
35. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
cruft (noun)
• An unpleasant substance
• The result of shoddy construction
• Redundant, old or improperly written code
42. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
technical debt quadrant
• Reckless / Deliberate: “We don’t have time for design”
• Prudent / Deliberate: “We must ship now and deal with consequences”
• Reckless / Inadvertent: “What’s Layering?”
• Prudent / Inadvertent: “Now we know how we should have done it”
43. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
technical debt quadrant
• Reckless / Deliberate: “We don’t have time for design”
• Prudent / Deliberate: “Let’s deploy and gather more information”
• Reckless / Inadvertent: “What’s Layering?”
• Prudent / Inadvertent: “Now we know how we should have done it”
51. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
technical debt in other fields
[Photo examples labeled: reckless and deliberate; reckless and deliberate; reckless and inadvertent]
52. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
technical debt quadrant
• Reckless / Deliberate: “We don’t have time for design”
• Prudent / Deliberate: “Let’s deploy and gather more information”
• Reckless / Inadvertent: “What’s Layering?”
• Prudent / Inadvertent: “Now we know how we should have done it”
53. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
technical debt quadrant
• Reckless / Deliberate: “We don’t have time for design”
• Prudent / Deliberate: “Let’s deploy and gather more information”
• Reckless / Inadvertent: “What’s Layering?”
• Prudent / Inadvertent: “Now we know how we should have done it”
Overlay label: IRRESPONSIBLE
54. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
technical debt quadrant
• Reckless / Deliberate: “We don’t have time for design”
• Prudent / Deliberate: “Let’s deploy and gather more information”
• Reckless / Inadvertent: “What’s Layering?”
• Prudent / Inadvertent: “Now we know how we should have done it”
Overlay labels: IRRESPONSIBLE, INCOMPETENT
55. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
technical debt quadrant
• Reckless / Deliberate: “We don’t have time for design”
• Prudent / Deliberate: “Let’s deploy and gather more information”
• Reckless / Inadvertent: “What’s Layering?”
• Prudent / Inadvertent: “Now we know how we should have done it”
Overlay labels: IRRESPONSIBLE, INCOMPETENT, TECHNICAL DEBT
66. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
double getSpeed() {
    switch (_type) {
        case EUROPEAN:
            return getBaseSpeed();
        case AFRICAN:
            return getBaseSpeed() - getLoadFactor() * _numberOfCoconuts;
        case NORWEGIAN_BLUE:
            return (_isNailed) ? 0 : getBaseSpeed(_voltage);
    }
    throw new RuntimeException("Should be unreachable");
}
http://www.refactoring.com/catalog/replaceConditionalWithPolymorphism.html
cruft or debt?
67. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
class Swallow ...
    double getSpeed() { return getBaseSpeed(); }
end class
class EuropeanSwallow ...
end class
class AfricanSwallow ...
    double getSpeed() { return super.getSpeed() - coconutLoad(); }
    double coconutLoad() { return getLoadFactor() * _numberOfCoconuts; }
end class
class NorwegianSwallow ...
    double getSpeed() { return (_isNailed) ? 0 : getBaseSpeed(_voltage); }
end class
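The refactored sketch above can be fleshed out into compilable Java to show the Replace Conditional with Polymorphism move end to end. This is an illustrative sketch, not the slide's actual code: the base speed, load factor, and voltage formula are invented placeholders so the example runs; only the class structure and method names follow the slide.

```java
// Replace Conditional with Polymorphism: each subclass overrides
// getSpeed() instead of a type-switch. Numeric values are hypothetical.
abstract class Swallow {
    double getSpeed() { return getBaseSpeed(); }
    double getBaseSpeed() { return 12.0; } // hypothetical base airspeed
}

class EuropeanSwallow extends Swallow {
    // Inherits the default behavior unchanged.
}

class AfricanSwallow extends Swallow {
    private final int numberOfCoconuts;

    AfricanSwallow(int numberOfCoconuts) { this.numberOfCoconuts = numberOfCoconuts; }

    double getLoadFactor() { return 1.5; } // hypothetical drag per coconut
    double coconutLoad() { return getLoadFactor() * numberOfCoconuts; }

    @Override
    double getSpeed() { return super.getSpeed() - coconutLoad(); }
}

class NorwegianSwallow extends Swallow {
    private final boolean isNailed;
    private final double voltage;

    NorwegianSwallow(boolean isNailed, double voltage) {
        this.isNailed = isNailed;
        this.voltage = voltage;
    }

    // Hypothetical voltage-dependent speed, capped at 24.0.
    double getBaseSpeed(double voltage) { return Math.min(24.0, voltage * 0.1); }

    @Override
    double getSpeed() { return isNailed ? 0 : getBaseSpeed(voltage); }
}

public class SwallowDemo {
    public static void main(String[] args) {
        System.out.println(new EuropeanSwallow().getSpeed());           // 12.0
        System.out.println(new AfricanSwallow(2).getSpeed());           // 9.0
        System.out.println(new NorwegianSwallow(true, 120).getSpeed()); // 0.0
    }
}
```

The payoff is that adding a new swallow type means adding a subclass, not editing (and risking) the shared switch statement.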
69. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
cruft is a bad decision
• You are a professional developer
• You’re going to create unintentional cruft
• You have to clean up the existing cruft
71. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
The Trap
• Precedent for speed over quality
• Expectation of increased velocity
• Cruft slows you down
• Must write more cruft to keep up
• Ask permission to do your job correctly
77. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
winning strategy
Clean Constantly
• Never make an intentional mess
• Monitor your “Technical Debt”
• Follow the Boy Scout Rule
• Remember quality is your responsibility
• NEVER ask permission to do your job correctly
81. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
review
Technical Debt
• A strategic design decision
• Requires business to be informed
• Includes a pay-back plan
Cruft
• Happens
• Needs to be monitored and cleaned
• Is NOT Technical Debt
82. #DevBootcamp / #TechnicalDebtTrap / @DocOnDev
review
Technical Debt
• A strategic design decision
• Requires business to be informed
• Includes a pay-back plan
Cruft
• Happens
• Needs to be monitored and cleaned
• Is NOT Technical Debt
NEVER ask permission to do your job correctly