How Southwest Airlines Uses Geode
Distributed systems and fast data require new software patterns and implementation skills. Learn how Southwest Airlines uses Apache Geode, organizes team responsibilities, and approaches design tradeoffs. Drawing inspiration from real whiteboard conversations, we’ll explore: common development pitfalls, environment capacity planning, streaming data patterns like consumer checkpointing, support roles, and production lessons learned.
Every day, Apache Geode improves how Southwest Airlines schedules nearly 4,000 flights and serves over 500,000 passengers. It’s an essential component of Southwest’s ability to reduce flight delays and support future growth.
In this session we review the design of the newly released off-heap storage feature in Apache Geode, and discuss use cases and potential directions for additional capabilities of this feature.
In this session we review the design of the current capabilities of the Spring Data GemFire API that supports Geode, and explore additional use cases and the future directions in which the Spring API and the underlying Geode support might evolve.
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & GeodePivotalOpenSourceHub
In this session we review the current state of support for Apache Geode in Spring Cloud Data Flow, and explore additional use cases and the future directions in which Spring Cloud Data Flow and Apache Geode might evolve.
Apache Apex and Apache Geode are two of the most promising incubating open source projects. Combined, they promise to fill gaps in existing big data analytics platforms. Apache Apex is an enterprise-grade, native YARN, big-data-in-motion platform that unifies stream and batch processing. Apex is highly scalable, performant, fault tolerant, and strong in operability. Apache Geode provides a database-like consistency model, reliable transaction processing, and a shared-nothing architecture that maintains very low-latency performance under highly concurrent processing. We will also look at some use cases where these two projects can be used together to form a distributed, fault-tolerant, reliable in-memory data processing layer.
#GeodeSummit - Where Does Geode Fit in Modern System Architectures (PivotalOpenSourceHub)
In this talk, Eitan Suez explores the question: Where does Geode fit in an organization's system architecture? Geode is a unique and feature-rich product that perhaps hasn't seen as much adoption as it deserves. Today's apps are no longer the straightforward, database-backed web applications we used to build a few years ago. Applications have become more sophisticated, as they've had to meet the need to scale, to be reliable, fault-tolerant, and to integrate with other systems. In this talk, Eitan will suggest one particular fit for Geode in the context of a CQRS architecture, and welcomes you to attend, and to contribute by sharing how you've put Geode to use in your organization.
Motivation and goals for off-heap storage
Off-heap features and usage
Implementation overview
Preliminary benchmarks: off-heap vs. heap
Tips and best practices
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R... (PivotalOpenSourceHub)
The financial sector is an exciting mix of challenges regarding throughput and high availability, as well as specific constraints on latency and consistency. In the continuous evolution of its platform, Murex relies on open source technologies like Apache Geode and Apache Storm in a "kind of" lambda architecture to ensure storage, near-real-time (on the order of milliseconds) aggregation of thousands of events per second, advanced notification mechanisms, and on-demand deployments. This talk will focus on the technical architecture, the underlying principles, and the technologies used to support this mix of functional and non-functional requirements.
#GeodeSummit Keynote: Creating the Future of Big Data Through "The Apache Way" (PivotalOpenSourceHub)
Keynote at Geode Summit 2016 by Dr. Justin Erenkrantz, Bloomberg LP: Creating the Future of Big Data Through "The Apache Way", and why this matters to the community.
In this session we review the current capabilities of a partially completed feature in Apache Geode - the ability to act as a backend for Redis client applications. We'll explore potential use cases and the future directions in which this capability might evolve.
Low latency high throughput streaming using Apache Apex and Apache Kudu (DataWorks Summit)
True streaming is fast becoming a necessity for many business use cases. At the same time, data set sizes and volumes are growing exponentially, compounding the complexity of data processing pipelines. There is a need for true low-latency streaming coupled with very high-throughput data processing. Apache Apex, as a low-latency and high-throughput data processing framework, and Apache Kudu, as a high-throughput store, form a combination that solves this pattern very efficiently.
This session will walk through a use case that involves writing a high-throughput stream using Apache Kafka, Apache Apex, and Apache Kudu. The session will start with a general overview of Apache Apex and the capabilities that make it a foundation for a low-latency, high-throughput engine, with Apache Kafka as an example input source of streams. Subsequently we walk through the Kudu integration with Apex via patterns like end-to-end exactly-once processing, selective column writes, and timestamp propagation for out-of-band data. The session will also cover additional patterns that this integration supports for enterprise-level data processing pipelines.
The session will conclude with some metrics for latency and throughput numbers for the use case that is presented.
Speaker
Ananth Gundabattula, Senior Architect, Commonwealth Bank of Australia
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase (HBaseCon)
Zen is a storage service built at Pinterest that offers a graph data model on top of HBase and, potentially, other storage backends. In this talk, Zen's architects go over the design motivation for Zen and describe its internals, including the API, type system, and HBase backend.
An introduction to Spark ML, plus how to go beyond when you get stuck (Data Con LA)
Abstract:
This talk will introduce Spark's new machine learning framework (Spark ML) and how to train basic models with it. A companion Jupyter notebook will be provided for people to follow along. Once we've got the basics down, we'll look at what to do when we find we need more than the tools available in Spark ML (and I'll try to convince people to contribute to my latest side project -- Sparkling ML).
Bio:
Holden Karau is a transgender Canadian, Apache Spark committer, an active open source contributor, and coauthor of Learning Spark and High Performance Spark. When not in San Francisco working as a software development engineer at IBM’s Spark Technology Center, Holden speaks internationally about Spark and holds office hours at coffee shops at home and abroad. She makes frequent contributions to Spark, specializing in PySpark and machine learning. Prior to IBM, she worked on a variety of distributed, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. She holds a bachelor of mathematics in computer science from the University of Waterloo. Outside of computers she enjoys scootering and playing with fire.
Building Efficient Pipelines in Apache Spark (Jeremy Beard)
Presented at NYC Women in Machine Learning & Data Science, May 31 2017.
Apache Spark is a powerful engine for processing large data sets, but it is not always obvious how to develop pipelines that run efficiently. This presentation will cover the most helpful tips and tricks that you can use to get the performance you want from your Spark applications.
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ... (Yahoo Developer Network)
Splice Machine is an open-source database that combines the benefits of modern lambda architectures with the full expressiveness of ANSI SQL. Like lambda architectures, it employs separate compute engines for different workloads - some call this an HTAP database (Hybrid Transactional/Analytical Processing). This talk describes the architecture and implementation of Splice Machine V2.0. The system is powered by a sharded key-value store for fast short reads and writes and short range scans (Apache HBase), and an in-memory, clustered dataflow engine for analytics (Apache Spark). It differs from most other clustered SQL systems such as Impala, SparkSQL, and Hive because it combines analytical processing with a distributed multi-version concurrency control method that provides the fine-grained concurrency required to power real-time applications. This talk will highlight the Splice Machine storage representation, transaction engine, and cost-based optimizer, and present the detailed execution of operational queries on HBase and of analytical queries on Spark. We will compare and contrast how Splice Machine executes queries with other HTAP systems such as Apache Phoenix and Apache Trafodion. We will end with some roadmap items under development involving new row-based and column-based storage encodings.
Speakers:
Monte Zweben is a technology industry veteran. Monte’s early career was spent with the NASA Ames Research Center as the Deputy Chief of the Artificial Intelligence Branch, where he won the prestigious Space Act Award for his work on the Space Shuttle program. He then founded and was the Chairman and CEO of Red Pepper Software, a leading supply chain optimization company, which merged in 1996 with PeopleSoft, where he was VP and General Manager, Manufacturing Business Unit. In 1998, he was the founder and CEO of Blue Martini Software – the leader in e-commerce and multi-channel systems for retailers. Blue Martini went public on NASDAQ in one of the most successful IPOs of 2000, and is now part of JDA. Following Blue Martini, he was the chairman of SeeSaw Networks, a digital, place-based media company. Monte is also the co-author of Intelligent Scheduling and has published articles in the Harvard Business Review and various computer science journals and conference proceedings. He currently serves on the Board of Directors of Rocket Fuel Inc. as well as the Dean’s Advisory Board for Carnegie Mellon’s School of Computer Science.
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic... (Chicago Hadoop Users Group)
John Leach, Co-Founder and CTO of Splice Machine, with 15+ years of software development and machine learning experience, will discuss how to use HBase co-processors to build an ANSI-99 SQL database with 1) parallelization of SQL execution plans, 2) ACID transactions with snapshot isolation, and 3) consistent secondary indexing.
Transactions are critical in traditional RDBMSs because they ensure reliable updates across multiple rows and tables. Most operational applications require transactions, but even analytics systems use transactions to reliably update secondary indexes after a record insert or update.
In the Hadoop ecosystem, HBase is a key-value store with real-time updates, but it does not have multi-row, multi-table transactions, secondary indexes or a robust query language like SQL. Combining SQL with a full transactional model over HBase opens a whole new set of OLTP and OLAP use cases for Hadoop that were traditionally reserved for RDBMSs like MySQL or Oracle. However, a transactional HBase system has the advantage of scaling out with commodity servers, leading to a 5x-10x cost savings over traditional databases like MySQL or Oracle.
HBase co-processors, introduced in release 0.92, provide a flexible and high-performance framework to extend HBase. In this talk, we show how we used HBase co-processors to support a full ANSI SQL RDBMS without modifying the core HBase source. We will discuss how endpoint transactions are used to serialize SQL execution plans over to regions so that computation is local to where the data is stored. Additionally, we will show how observer co-processors simultaneously support both transactions and secondary indexing.
The talk will also discuss how Splice Machine extended the work of Google Percolator, Yahoo Labs’ OMID, and the University of Waterloo on distributed snapshot isolation for transactions. Lastly, performance benchmarks will be provided, including full TPC-C and TPC-H results that show how Hadoop/HBase can be a replacement of traditional RDBMS solutions.
To view the accompanying slide deck: http://www.slideshare.net/ChicagoHUG/
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti... (Yahoo Developer Network)
Monte Zweben, Co-Founder and CEO of Splice Machine, will discuss how to use HBase co-processors to build an ANSI-99 SQL database with 1) parallelization of SQL execution plans, 2) ACID transactions with snapshot isolation and 3) consistent secondary indexing.
Transactions are critical in traditional RDBMSs because they ensure reliable updates across multiple rows and tables. Most operational applications require transactions, but even analytics systems use transactions to reliably update secondary indexes after a record insert or update.
In the Hadoop ecosystem, HBase is a key-value store with real-time updates, but it does not have multi-row, multi-table transactions, secondary indexes or a robust query language like SQL. Combining SQL with a full transactional model over HBase opens a whole new set of OLTP and OLAP use cases for Hadoop that were traditionally reserved for RDBMSs like MySQL or Oracle. However, a transactional HBase system has the advantage of scaling out with commodity servers, leading to a 5x-10x cost savings over traditional databases like MySQL or Oracle.
HBase co-processors, introduced in release 0.92, provide a flexible and high-performance framework to extend HBase. In this talk, we show how we used HBase co-processors to support a full ANSI SQL RDBMS without modifying the core HBase source. We will discuss how endpoint transactions are used to serialize SQL execution plans over to regions so that computation is local to where the data is stored. Additionally, we will show how observer co-processors simultaneously support both transactions and secondary indexing.
The talk will also discuss how Splice Machine extended the work of Google Percolator, Yahoo Labs’ OMID, and the University of Waterloo on distributed snapshot isolation for transactions. Lastly, performance benchmarks will be provided, including full TPC-C and TPC-H results that show how Hadoop/HBase can be a replacement of traditional RDBMS solutions.
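The observer co-processor idea described in the two Splice Machine abstracts above can be illustrated with a small sketch. This is not Splice Machine's code; it is a minimal, HBase 1.x-era example of the general pattern: after a Put on a base table, write a corresponding row into a secondary-index table. The table, column family, and column names are hypothetical, and transactional context and error handling are omitted.

```java
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

public class SecondaryIndexObserver extends BaseRegionObserver {
  private static final byte[] CF = Bytes.toBytes("d");
  private static final byte[] COL = Bytes.toBytes("email");   // hypothetical indexed column

  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                      Put put, WALEdit edit, Durability durability) throws IOException {
    List<Cell> cells = put.get(CF, COL);
    if (cells.isEmpty()) {
      return;  // this Put did not touch the indexed column
    }
    byte[] indexedValue = CellUtil.cloneValue(cells.get(0));
    // Index row key = indexed value + original row key, so lookups by value
    // become a prefix scan on the index table.
    byte[] indexRow = Bytes.add(indexedValue, Bytes.toBytes("|"), put.getRow());
    try (Table index = ctx.getEnvironment().getTable(TableName.valueOf("users_by_email"))) {
      index.put(new Put(indexRow).addColumn(CF, Bytes.toBytes("ref"), put.getRow()));
    }
  }
}
```

A real implementation, as the abstracts note, also has to keep this index write consistent with the transaction that produced the base-table write, which is where the snapshot-isolation machinery comes in.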
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla (Spark Summit)
Spark has deservedly been adopted as the main massively parallel processing framework, and HDFS is one of the most popular Big Data storage technologies, so their combination is one of the most common Big Data use cases. But what happens with security? Can these two technologies coexist in a secure environment? Furthermore, with the proliferation of BI technologies adapted to Big Data environments, which demand that several users interact with the same cluster concurrently, can we continue to ensure that our Big Data environments are still secure? In this lecture, Abel and Jorge will explain which adaptations of Spark's core they had to perform in order to guarantee the security of multiple concurrent users sharing a single Spark cluster, with any of its cluster managers, without degrading Spark's outstanding performance.
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode (PivotalOpenSourceHub)
In this talk, Andre Langevin discusses how Geode forms the core of many Wall Street derivative risk solutions. By externalizing risk from trading systems, Geode-based solutions provide cross-product risk management at speeds suitable for automated hedging, while simultaneously eliminating the back office costs associated with traditional trading system based solutions.
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T... (PivotalOpenSourceHub)
This talk introduces an open-source solution that integrates cloud native apps running on Cloud Foundry with an open-source hybrid transactional + analytical real-time solution. The architecture is based on a fast, scalable, highly available, and fully consistent In-Memory Data Grid (Apache Geode / GemFire), natively integrated with the first open-source massively parallel data warehouse (Greenplum Database) in a hybrid transactional and analytical architecture that is extremely fast, horizontally scalable, highly resilient, and open source. This session also features a live demo running on Cloud Foundry, showing a real case of real-time closed-loop analytics and machine learning using the featured solution.
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode) (PivotalOpenSourceHub)
Today, if events change the decision model, we wait until the next batch model build for new insights. By extending fast “time-to-decisions” into the world of Big Data Analytics to get fast “time-to-insights”, apps will get what used to be batch insights in near real time. The technology enabling this includes smart in-memory data storage, new storage class memory, and products designed to do one or more parts of an analysis pipeline very well. In this talk we describe how Ampool is building on Apache Geode to allow Big Data analysis solutions to work together with a scalable smart storage class memory layer to allow fast and complex end-to-end pipelines to be built -- closing the loop and providing dramatically lower time to critical insights.
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ... (PivotalOpenSourceHub)
One of the largest retailers in North America is considering Apache Geode for its new mobile loyalty application, to support its digital transformation effort. It would use Geode to provide operational data services for its mobile cloud service. This retailer needs to replace sluggish response times with sub-second responses, which will improve conversion rates. It also wants to be able to close the loop between data science findings and the app experience, so the right customer interaction is suggested when it is needed, such as when customers are looking at the mobile app while walking in the store, or by sending notifications at an individual's most likely shopping times. The final benefits of using Geode will include faster development cycles, increased customer loyalty, and higher revenue.
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode (PivotalOpenSourceHub)
Wondering how to improve your production yield, increase asset life, and activate reliability-centered maintenance? TEKsystems has developed a "Golden Batch" recommendation engine to help you realize the goals of modern manufacturing. It is a predictive analytics framework built on top of a manufacturing data lake for analysis and training of machine learning algorithms, and for subsequent processing and detection of streaming data from sensors to detect or predict failures. We'll present a solution architecture featuring Spring XD for data pipelining, Apache Geode for in-memory processing, Hadoop as a data lake, and R for machine learning.
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree... (PivotalOpenSourceHub)
In this session we explore a case study of a large-scale government fraud detection program that prevents billions of dollars in fraudulent payments each year leveraging the beta release of the GemFire+Greenplum Connector, which is planned for release in GemFire 9. Topics will include an overview of the system architecture and a review of the new GemFire+Greenplum Connector features that simplify use cases requiring a blend of massively parallel database capabilities and accelerated in-memory data processing.
This presentation will help you understand your company's indirect rate structure, how to yield auditable rates that comply with federal regulations, and how to structure competitive indirect rates.
Design Tradeoffs in Distributed Systems - How Southwest Airlines Uses Geode (VMware Tanzu)
SpringOne Platform 2016
Speaker: Brian Dunlap; Tech Lead, Southwest Airlines.
Distributed systems and fast data require new software patterns and implementation skills. Learn how Southwest Airlines uses Apache Geode, organizes team responsibilities, and approaches design tradeoffs. Drawing inspiration from real whiteboard conversations, we’ll explore:
-Common development pitfalls
-Environment capacity planning
-Streaming data patterns, like consumer checkpointing
-Support roles
-Production lessons learned
Every day, Apache Geode improves how Southwest Airlines schedules nearly 4,000 flights and serves over 500,000 passengers. It’s an essential component of Southwest’s ability to reduce flight delays and support future growth.
Avoiding Chaos: Methodology for Managing Performance in a Shared Storage A... (brettallison)
Scope - The primary focus of this presentation is the methodology we use for managing performance in a very large shared Storage Area Network environment, with an emphasis on distributed systems and the IBM Enterprise Storage Server. The focus of this presentation is methodology, NOT measurement. There are numerous excellent presentations already out there on measurement; however, several references to measurement tools are included at the back of the presentation.
Amazon Aurora is a MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. Amazon Aurora is disruptive technology in the database space, bringing a new architectural model and distributed systems techniques to provide far higher performance, availability and durability than previously available using conventional monolithic database techniques. In this session, we will do a deep-dive into some of the key innovations behind Amazon Aurora, discuss best practices and configurations, and share early customer experience from the field.
Data analysis is being used to transform businesses, increase efficiency, and drive innovation. But organizations need to perform increasingly complex analysis of their data (streaming analytics, ad-hoc querying, and predictive analytics) in order to get better insights and actionable business intelligence. The growing volume, speed, and complexity of diverse data formats make legacy tools inadequate or difficult to use. The AWS Cloud has a comprehensive portfolio of analytics services to help you process data of any volume and automate how you put that data to work for your organization. In this session we'll see how to put those services to work on structured, unstructured, and real-time data.
Gary Grider from Los Alamos National Laboratory presented this deck at the 2016 OpenFabrics Workshop.
"Trends in computer memory/storage technology are in flux perhaps more so now than in the last two decades. Economic analysis of HPC storage hierarchies has led to new tiers of storage being added to the next fleet of supercomputers including Burst Buffers or In-System Solid State Storage and Campaign Storage. This talk will cover the background that brought us these new storage tiers and postulate what the economic crystal ball looks like for the coming decade. Further it will suggest methods of leveraging HPC workflow studies to inform the continued evolution of the HPC storage hierarchy."
Watch the video presentation: https://www.youtube.com/watch?v=iDYLIpF-6Ew
See more talks from the Open Fabrics Workshop: http://insidehpc.com/2016-open-fabrics-workshop-video-gallery/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all of your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing, scale-out architecture, and columnar direct-attached storage to minimize I/O time and maximize performance. Learn how you can gain deeper business insights and save money and time by migrating to Amazon Redshift. Take away strategies for migrating from on-premises data warehousing solutions, tuning schema and queries, and utilizing third party solutions.
Spinning Brown Donuts: Why Storage Still Counts (Sparkhound Inc.)
Storage, next to server hardware, is pretty commoditized and probably the least exciting thing in your datacenter. However, not properly assessing your storage needs and requirements can be the difference between a great app and a resume-generating event. This session will cover topics such as: why you may not need all-flash, why SAN is not just NAS spelled backwards, leveraging cloud storage, why RAID is not a sound backup solution, and cutting through the marketing to make sense of it all.
RDS for MySQL, No BS Operations and Patterns (Laine Campbell)
Amazon's RDS for MySQL is a wonderful tool with significant value. It can also create a lot of havoc if you are not aware of its limitations and changes before you make it a core part of your environment. In this deck, we discuss those issues.
In this technical overview of Azure Cosmos DB you will learn how easy it is to get started building planet-scale applications with Azure Cosmos DB. We'll then take a closer look at important design aspects around global distribution, consistency, and server-side partitioning, and at how to model your data to fit your app's needs using tools and APIs you love.
Cloudifying High Availability: The Case for Elastic Disaster Recovery (Ali Hodroj)
Elastic DR: a solution architecture that aims to optimally balance cost and recovery time via three core principles that are germane to the cloud world:
On-Demand: The disaster recovery cloud can be provisioned on any availability zone, region, or public/private cloud through Cloudify's cloud-agnostic bootstrapping mechanism.
Elastic: The ability to automatically provision resources in the recovery cloud in case of disaster while eliminating the need for idle resources in normal scenarios, thereby fully profiting from the pay-per-use pricing model of clouds.
Flexible RTO/RPO: The architecture can be easily extended from a warm DR to a hot DR pattern through enabling/disabling application recipes. This allows us to exploit economies of scale that the cloud provides by matching the number of recipes/tiers to provision (in the recovery cloud) against the recovery time/point objective for our disaster recovery strategy
Here are the slides for Greenplum Chat #8. You can view the replay here: https://www.youtube.com/watch?v=FKFiyJDgdQk
The increased frequency and sophistication of high-profile data breaches and malicious hacking is putting organizations at continued risk of data theft and significant business disruption. Complicating this scenario is the unbounded growth of Big Data and petabyte-scale data storage, new open source database and distribution schemes, and the continued adoption of cloud services by enterprises.
Pivotal Greenplum customers often look for additional encryption of data-at-rest and data-in-motion. The massively parallel processing (MPP) architecture of Pivotal Greenplum provides an architecture that is unlike traditional OLAP on RDBMS for data warehousing, and encryption capabilities must address the scale-out architecture.
The Zettaset Big Data Encryption Suite has been designed for optimal performance and scalability in distributed Big Data systems like Greenplum Database and Apache HAWQ.
Here is a replay of our recent Greenplum Chat with Zettaset:
00:59 What is Greenplum’s approach for encryption and why Zettaset?
02:17 Results of field testing Zettaset with Greenplum
03:50 Introduction to Zettaset, the security company
05:36 Overview of Zettaset and their solutions
14:51 Different layers for encrypting data at rest
16:50 Encryption key management for big data
20:51 Zettaset BD Encrypt for data at rest and data in motion
22:19 How to mitigate encryption overhead with an MPP scale-out system
24:12 How to deploy BD Encrypt
25:50 Deep dive on data at rest encryption
30:44 Deep dive on data in motion encryption
36:72 Q: How does Zettaset deal with encrypting Greenplum's multiple interfaces?
38:08 Q: Can I encrypt data for a particular column?
40:26 How Zettaset fits into a security strategy
41:21 Q: What is the performance impact on queries by encrypting the entire database?
43:28 How Zettaset helps Greenplum meet IT compliance requirements
45:12 Q: How authentication for keys is obtained
48:50 Q: How can Greenplum users try out Zettaset?
50:53 Q: What is a ‘Zettaset Security Coach’?
How to use the WAN Gateway feature of Apache Geode to implement multi-site and active-active failover, disaster recovery, and global scale applications.
Building Apps with Distributed In-Memory Computing Using Apache Geode (PivotalOpenSourceHub)
Slides from the Meetup on Monday, March 7, 2016, just before the beginning of #GeodeSummit, covering an introduction to the technology and community that is Apache Geode, the in-memory data grid.
GPORCA is a newly open-sourced advanced query optimizer that is a subproject of the Greenplum Database open source project. GPORCA is the query optimizer used in commercial distributions of both Greenplum and HAWQ. In these distributions GPORCA has achieved 1000x performance improvement across TPC-DS queries by focusing on three distinct areas: Dynamic Partition Elimination, SubQuery Unnesting, and Common Table Expressions.
Now that GPORCA is open source, we are looking for collaborators to help us realize the ultimate dream for GPORCA - to work with any database.
The new breed of data management systems in Big Data have to process so much data that optimization mistakes are magnified in traditional optimizers. Furthermore, coding and manual optimization of complex queries has proven to be hard.
In this session, Venkatesh will discuss:
- Overview of GPORCA
- How to add GPORCA to HAWQ with a build option
- How GPORCA could be made to work with any database
- Future vision for GPORCA and more immediate plans
- How to work with GPORCA, and how to contribute to GPORCA
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan (PivotalOpenSourceHub)
Pivoting Spring XD to Spring Cloud Data Flow: A microservice based architecture for stream processing
Microservice based architectures are not just for distributed web applications! They are also a powerful approach for creating distributed stream processing applications. Spring Cloud Data Flow enables you to create and orchestrate standalone executable applications that communicate over messaging middleware such as Kafka and RabbitMQ and that, when run together, form a distributed stream processing application. This allows you to scale, version, and operationalize stream processing applications following microservice based patterns and practices on a variety of runtime platforms such as Cloud Foundry, Apache YARN, and others.
About Sabby Anandan
Sabby Anandan is a Product Manager at Pivotal. Sabby is focused on building products that eliminate the barriers between application development, cloud, and big data.
Zeppelin Interpreters
PSQL (to become JDBC in 0.6.x)
Geode
SpringXD
Apache Ambari
Zeppelin Service
Geode, HAWQ and Spring XD services
Webpage Embedder View
10. Southwest’s
Network Operations Control
integrates decision makers.
BOISE ALBANY
OKLAHOMA CITY
AUSTIN PANAMA CITY BEACH
CHARLESTON
GREENVILLE-SPARTANBURG
TUCSON
LUBBOCK
AMARILLO
MIDLAND/ODESSA
EL PASO
LITTLE ROCK
NASHVILLE
DALLAS (LOVE FIELD)
SACRAMENTO
OAKLAND
SAN JOSE
BURBANK
LOS ANGELES
(LAX) ORANGE COUNTY
ONTARIO
SAN DIEGO
SAN FRANCISCO (SFO)
BIRMINGHAM
LOUISVILLE
CLEVELAND
OMAHA
TULSA
RENO/TAHOE
HARLINGEN/SOUTH PADRE ISLAND
PUERTO VALLARTA
CORPUS CHRISTI
ALBUQUERQUE
DES MOINES
MEMPHIS
CABO SAN LUCAS/LOS CABOS
ROCHESTER
AKRON/
CANTON
WICHITA
PENSACOLA
MEXICO CITY
NASSAU
PUNTA CANA
SAN JUAN
MONTEGO BAY
ARUBA
CANCÚN
FLINT
GRAND
RAPIDS
CHARLOTTE
DAYTON
MINNEAPOLIS/
ST. PAUL
PHOENIX
DENVER
INDIANAPOLIS
COLUMBUS
RALEIGH/DURHAM
CHICAGO
(MIDWAY)
FT. LAUDERDALE (MIAMI AREA)
DETROIT
HOUSTON (HOBBY)
SEATTLE/TACOMA
LAS VEGAS
NEW ORLEANS
ST. LOUIS
MILWAUKEE
BUFFALO/
NIAGARA FALLS
ATLANTA
ORLANDO
FT. MYERS/NAPLES
JACKSONVILLE
TAMPA
WEST PALM BEACH
SAN ANTONIO
KANSAS CITY
BELIZE CITY
SAN JOSÉ
LIBERIA
PORTLAND
WASHINGTON, D.C. (REAGAN NATIONAL)
RICHMOND
MANCHESTER
PROVIDENCE
HARTFORD/SPRINGFIELD
NORFOLK/VIRGINIA BEACH
BOSTON LOGAN
PHILADELPHIA
BALTIMORE/WASHINGTON (BWI)
WASHINGTON, D.C. (DULLES)
PITTSBURGH
NEW YORK (LAGUARDIA)
LONG ISLAND/ISLIP
NEW YORK (NEWARK)
SALT LAKE CITY
SPOKANE
PORTLAND
23. What do you own? (core)
<focus>
What do you need? (supporting)
<simplify>
How long can you keep it?
<intentional>
24. Adding is very easy. Watch out for data that's around for too long.
Does all of this data need to be in-memory?
Data at rest for a long time? (>365 days)
GEODE REGION SIZES
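One lever for keeping region sizes in check, hinted at on the slide above, is entry expiration. Below is a minimal sketch using the Geode server-side Java API; the region name, region type, and the 365-day TTL are illustrative assumptions, not Southwest's actual settings.

```java
import java.util.concurrent.TimeUnit;

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.ExpirationAction;
import org.apache.geode.cache.ExpirationAttributes;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

public class ExpiringRegionExample {
  public static void main(String[] args) {
    Cache cache = new CacheFactory().create();

    // Destroy entries that have lived for 365 days, so data at rest does not
    // accumulate in memory indefinitely.
    int oneYearSeconds = (int) TimeUnit.DAYS.toSeconds(365);
    Region<String, Object> flightEvents = cache
        .<String, Object>createRegionFactory(RegionShortcut.PARTITION)
        .setStatisticsEnabled(true)  // statistics are required for expiration to run
        .setEntryTimeToLive(new ExpirationAttributes(oneYearSeconds, ExpirationAction.DESTROY))
        .create("FlightEvents");

    flightEvents.put("WN1234-2016-06-01", new Object());  // placeholder value
  }
}
```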
25. Determine if each subdomain should use Geode. Don't make an automatic decision.
Domain tradeoffs
26. Maybe it needs an entirely different home?
Domain tradeoffs
30. OLD → NEW
NORMALIZED JOINS → REGIONS FOR READS, REGIONS FOR AGGREGATES
BLOCKING THREADS → ASYNC - AKKA / ACTORS
ACTIVE / PASSIVE → ACTIVE / ACTIVE
MUTABLE STATE, DATA CONVERGENCE → IMMUTABILITY / EVENT SOURCING
CRUD → CQRS / DDD, EVENT DRIVEN
ServiceManagerHandlerImpl
32. OLD → NEW
NORMALIZED JOINS → REGIONS FOR READS, REGIONS FOR AGGREGATES
BLOCKING THREADS → ASYNC - AKKA / ACTORS
ACTIVE / PASSIVE → ACTIVE / ACTIVE
MUTABLE STATE, DATA CONVERGENCE → IMMUTABILITY / EVENT SOURCING
CRUD → CQRS / DDD, EVENT DRIVEN
33. We write immutable domain events into event regions.
Clients receive events using Geode CQs.
Clients checkpoint their position into separate regions.
Event regions expire messages.
checkpointing
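A minimal client-side sketch of the checkpointing pattern described on the slide above, using the Geode continuous query (CQ) API. The region names, consumer id, and keying of events by a sequence number are assumptions for illustration; error handling and resuming from the last checkpoint on restart are omitted.

```java
import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;
import org.apache.geode.cache.query.CqAttributesFactory;
import org.apache.geode.cache.query.CqEvent;
import org.apache.geode.cache.query.CqQuery;
import org.apache.geode.cache.query.QueryService;
import org.apache.geode.cache.util.CqListenerAdapter;

public class CheckpointingConsumer {
  public static void main(String[] args) throws Exception {
    ClientCache cache = new ClientCacheFactory()
        .addPoolLocator("localhost", 10334)
        .setPoolSubscriptionEnabled(true)   // subscriptions are required for CQ delivery
        .create();

    // Separate region holding each consumer's last processed position.
    Region<String, Long> checkpoints = cache
        .<String, Long>createClientRegionFactory(ClientRegionShortcut.PROXY)
        .create("Checkpoints");

    QueryService qs = cache.getQueryService();
    CqAttributesFactory caf = new CqAttributesFactory();
    caf.addCqListener(new CqListenerAdapter() {
      @Override
      public void onEvent(CqEvent e) {
        Object domainEvent = e.getNewValue();
        Long sequence = (Long) e.getKey();
        // ... apply the immutable domain event to the local view model ...
        // Then record how far this consumer has read.
        checkpoints.put("flight-schedule-consumer", sequence);
      }
    });

    CqQuery cq = qs.newCq("scheduleEvents", "SELECT * FROM /Events", caf.create());
    cq.executeWithInitialResults();  // replay existing entries, then stream new ones
  }
}
```

The event region itself keeps entries only for a bounded time (as noted above, it expires messages), so a consumer that falls too far behind its checkpoint has to rebuild rather than replay.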
34. Akka Cluster manages Actor Singletons which coordinate parallel processing based on a logical groupId.
Backpressure is implemented through a competing consumer pattern. Take a look at Akka Streams!
All Geode replicated regions use distributed ack. We don't want to converge. (some write wins)
coordination (*important concept)
36. PUSH or PULL
How do we scale expensive read I/O?
Contain expensive reads
With CQRS view model builders, perform heavy state enriching "select *" once.
Push read updates vs. polling (Geode CQs)
Conflate triggering view model rebuild events
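The last point on the slide above, conflating the events that trigger a view model rebuild, can be as simple as a dirty flag plus a single scheduled rebuild, so a burst of updates causes one rebuild instead of one per event. A plain-Java sketch; the rebuild method and the one-second interval are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class ConflatingRebuilder {
  private final AtomicBoolean dirty = new AtomicBoolean(false);
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public ConflatingRebuilder() {
    // Check once per second; many triggering events within that window
    // collapse into a single rebuild.
    scheduler.scheduleWithFixedDelay(() -> {
      if (dirty.compareAndSet(true, false)) {
        rebuildViewModel();
      }
    }, 1, 1, TimeUnit.SECONDS);
  }

  /** Called from a CQ listener (or any event handler) whenever source data changes. */
  public void markDirty() {
    dirty.set(true);
  }

  private void rebuildViewModel() {
    // Hypothetical: run the expensive state-enriching "select *" once and
    // publish the resulting view model to its read region.
  }
}
```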
37. Be careful with timeouts!
Be careful with alerts!
Be careful with joins!
Be careful with large values!
Be careful with old habits!
safety tips
40. Integrate Geode security with a directory
Tune JVM size and GC
Deploy and upgrade environments
Size and configure VMs
Support production events
Enable WAN Gateway Sender / Receivers
Load snapshots between environments
Automate starting and stopping clusters
Teach distributed concepts - like CAP
How do we share new distributed system responsibilities?
DBAs
UNIX
DEVs
Middleware
Release Management
Offshore Support
New Geode Team
DevOps
EARLIER IS BETTER
41. Learn to luv conversation tension. When there's tension, you're on the right track!
46. Prefer less-shared disk I/O. (local to a VM rack, or dedicated)
Prefer larger + fewer Geode nodes. (4 larger nodes vs. 8 smaller ones)
Take advantage of availability zones (AZs).
CONVERSATION LEADERSHIP ACROSS TEAMS
SHARED or SHARED LESS
What infrastructure supports Geode?
47. Know your memory (and GC) limits. Watch out for slow heap growth that triggers continuous GC.
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=60
-Xloggc:/your/path/node-name.GC.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-XX:+PrintGCCause
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=20
-XX:GCLogFileSize=5M
Check out GCViewer for GC log analysis.
48. Essential tool for real-time decision optimization testing!
Helpful for QA performance and functional testing.
Wonderful Geode feature!
WAN Gateway
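For reference, a rough sketch of wiring a region to a WAN gateway with the server-side Java API. This is frequently done with gfsh or cache.xml instead; the sender id, the remote distributed-system id of 2, and the region name here are illustrative, not Southwest's configuration.

```java
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;
import org.apache.geode.cache.wan.GatewayReceiver;
import org.apache.geode.cache.wan.GatewaySender;

public class WanGatewayExample {
  public static void main(String[] args) {
    // Assumes this member's distributed-system-id and remote-locators are set
    // in gemfire.properties; those settings are omitted here.
    Cache cache = new CacheFactory().create();

    // Sender pushing events from this site to remote distributed system 2.
    GatewaySender toSite2 = cache.createGatewaySenderFactory()
        .setParallel(true)
        .create("toSite2", 2);

    // Receiver accepting WAN events from the other site.
    GatewayReceiver receiver = cache.createGatewayReceiverFactory().create();

    // Attach the sender to a region so its updates replicate across the WAN.
    Region<String, Object> schedules = cache
        .<String, Object>createRegionFactory(RegionShortcut.PARTITION)
        .addGatewaySenderId("toSite2")
        .create("FlightSchedules");

    schedules.put("WN1234", new Object());  // queued and sent to site 2 asynchronously
  }
}
```

The same mechanism that supports multi-site failover also makes it easy to feed a QA or optimization-testing site with production-shaped traffic, which is the use the slide above calls out.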
49. Optimization binary consumes PDX via C++ Native Client
Moving > 200 MB per optimization request
Be careful with refactoring PDX data types!
C++ Native Client
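The PDX caution above exists because the Java publisher and the C++ native-client consumer must agree on the PDX type: renaming or retyping a field on one side breaks the other. A hedged sketch of what such a Java-side PDX domain type might look like; the class and field names are hypothetical, not the actual optimization payload.

```java
import org.apache.geode.pdx.PdxReader;
import org.apache.geode.pdx.PdxSerializable;
import org.apache.geode.pdx.PdxWriter;

public class OptimizationRequest implements PdxSerializable {
  private String flightId;
  private long departureEpochMillis;

  public OptimizationRequest() {}  // no-arg constructor required for PDX deserialization

  @Override
  public void toData(PdxWriter writer) {
    // Field names and types become part of the wire-level PDX type; renaming or
    // retyping them is a breaking change for any consumer, including the C++
    // native client reading this data.
    writer.writeString("flightId", flightId);
    writer.writeLong("departureEpochMillis", departureEpochMillis);
  }

  @Override
  public void fromData(PdxReader reader) {
    flightId = reader.readString("flightId");
    departureEpochMillis = reader.readLong("departureEpochMillis");
  }
}
```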