1) The document introduces Infinispan, an open source in-memory data grid and distributed cache. It discusses Infinispan's architecture as an embedded library or standalone server, clustering modes, persistence, querying, transactions and more.
2) Use cases for Infinispan include sharing data, high performance caching, scalability, and as a database platform in the cloud. Example applications discussed are session clustering and a data grid platform.
3) The document provides a case study of using Infinispan with Spring for HTTP session clustering, describing how to configure Infinispan, implement a custom SecurityContextDao, and integrate it with Spring Security.
Deploying Docker Containers at Scale with Mesos and Marathon — Connor Doyle, Mesosphere
The norm these days is to operate apps at web scale. But that’s out of reach for most companies. Deploying Docker containers with Mesos and Marathon makes it easier. See how they help deploy and manage Docker containers at scale and how the Mesos cluster scheduler builds highly-available, fault-tolerant web scale apps.
The document discusses continuous deployment and practices at Disqus for releasing code frequently. It emphasizes shipping code as soon as it is ready after it has been reviewed, passes automated tests, and some level of QA. It also discusses keeping development simple, integrating code changes through automated testing, using metrics for reporting, and doing progressive rollouts of new features to subsets of users.
Caching has been an essential strategy for greater performance in computing since the beginning of the field. Nearly all applications have data access patterns that make caching an attractive technique, but caching also has hidden trade-offs related to concurrency, memory usage, and latency.
As we build larger distributed systems, caching continues to be a critical technique for building scalable, high-throughput, low-latency applications. Large systems tend to magnify the caching trade-offs and have created new approaches to distributed caching. There are unique challenges in testing systems like these as well.
Ehcache and Terracotta provide a unique way to start with simple caching for a small system and grow that system over time with a consistent API while maintaining low-latency, high-throughput caching.
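To make the memory/recency trade-off mentioned above concrete, here is a minimal in-process cache sketch in plain Java (illustrative only — this is not the Ehcache or Terracotta API): a LinkedHashMap kept in access order that evicts the least-recently-used entry once a capacity bound is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal bounded LRU cache sketch. The capacity bound trades memory for
// hit rate, and access-order tracking adds bookkeeping on every get —
// examples of the hidden caching trade-offs described above.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = access order, required for LRU eviction
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict once the bound is exceeded
    }
}
```

A production cache layers concurrency control, expiry, and (with Terracotta) cluster-wide coherence on top of this basic shape.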
Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React... — Lightbend
The Big Data industry emerged in response to the unprecedented sizes of data sets collected by Internet companies and the particular needs they had to store and use that data.
Today, the need to process that data more quickly is morphing Big Data architectures into Fast Data architectures. This session discusses the forces driving this trend and the most popular tools that have emerged to address particular design challenges:
Spark - For sophisticated processing of data streams, as well as traditional batch-mode processing.
Kafka - For durable and scalable ingestion and distribution of data streams.
Cassandra - For scalable, flexible persistence.
Reactive Platform: Lagom, Akka, and Play - For integration of other components and building microservices.
Mesos - For cluster resource management.
---
About the presenter:
Dean Wampler, Ph.D. is the Architect for Big Data Products and Services and a member of the office of the CTO at Lightbend. He is designing the product strategy and technical architecture for Lightbend's Spark on Mesos products and emerging streaming tools built around Spark and Lightbend’s ConductR and Akka products. Dean has written books on Scala, Functional Programming, and Hive for O'Reilly. He speaks at and co-organizes many industry conferences. He also organizes several Chicago-area user groups and contributes to many open-source projects, including Apache Spark. Dean has a Ph.D. in Physics from the University of Washington.
Terracotta (an open source technology) provides a clustered, durable virtual heap. Terracotta's goal is to make Java apps scale with as little effort as possible. If you are using Hibernate, there are several patterns that can be used to leverage Terracotta and reduce the load on your database so your app can scale.
First, you can use the Terracotta clustered Hibernate cache. This is a high-performance clustered cache that lets you avoid hitting the database from any node in your cluster. It's suitable not just for read-only but also for read-mostly and read-write use cases, which traditionally have not been viewed as good fits for the Hibernate second-level cache.
Another high performance option is to disconnect your POJOs from their Hibernate session and manage them entirely in Terracotta shared heap instead. This is a great option for conversational data where the conversational data is not of long-term interest but must be persistent and highly-available. This pattern can significantly reduce your database load but does require more changes to your application than using second-level cache.
This talk will examine the basics of what Terracotta provides and examples of how you can scale your Hibernate application with both clustered second level cache and detached clustered state. Also, we'll take a look at Terracotta's Hibernate-specific monitoring tools.
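For orientation, enabling a second-level cache in Hibernate typically comes down to a few settings plus per-entity opt-in. The snippet below is an illustrative sketch using the Hibernate 3.x property names and the Ehcache region factory; the exact factory class depends on your Hibernate and Terracotta versions, so treat it as a starting point rather than the talk's configuration.

```properties
# Illustrative second-level cache settings (Hibernate 3.x era; adjust
# the region factory class to match your Hibernate/Terracotta versions).
hibernate.cache.use_second_level_cache=true
hibernate.cache.use_query_cache=true
hibernate.cache.region.factory_class=net.sf.ehcache.hibernate.EhCacheRegionFactory
```

Individual entities then opt in, for example with `@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)` on the mapped class.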
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera — Cloudera, Inc.
Performance is something you can never have too much of, but it is a nebulous concept in Hadoop: unlike the database world, Hadoop has no TPC-style standard benchmark, and different use cases experience performance differently. This talk discusses advances in how Hadoop performance is measured, along with recent and upcoming performance improvements across different areas of the Hadoop stack.
1. The system must take ownership of event data to ensure durability and free input systems from managing logs.
2. Events must be causally linked through parent IDs and timestamps to reconstruct request flows.
3. The system must be idempotent to handle duplicate events and allow bulk import of historical data.
4. The output should be time-invariant so that reprocessing the same input data produces the same outputs.
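Requirements 3 and 4 above can be sketched in a few lines of plain Java (hypothetical names, no particular framework): a processor that records seen event IDs, so re-delivered or bulk-imported events are applied at most once and replaying the same input yields the same derived state.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of idempotent event handling: duplicate event IDs
// are ignored, so redelivery or bulk re-import of historical events
// leaves the derived state unchanged.
public class IdempotentCounter {
    private final Set<String> seenEventIds = new HashSet<>();
    private final Map<String, Long> countsByUser = new HashMap<>();

    /** Applies the event once; returns false if it was a duplicate. */
    public boolean apply(String eventId, String userId) {
        if (!seenEventIds.add(eventId)) {
            return false; // already processed — safe to redeliver
        }
        countsByUser.merge(userId, 1L, Long::sum);
        return true;
    }

    public long countFor(String userId) {
        return countsByUser.getOrDefault(userId, 0L);
    }
}
```

A real system would persist the seen-ID set (or derive it from the log position) rather than hold it in memory, but the contract is the same.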
This document discusses integrating Apache Gora and Apache Giraph to allow Giraph to access graph data stored in different NoSQL backends via Gora. It provides an overview of Gora and Giraph, describes how the integration would work by implementing hooks for vertices, edges and keys, and lists some challenges and future work, such as supporting more complex schemas and data stores. The goal is to give Giraph users more flexibility in how they run graph algorithms by accessing data through Gora.
The document discusses Impala, an SQL query engine for Hadoop. It provides an overview of Impala, details improvements in versions 1.4 and 2.0, and describes new features like subqueries, analytic functions, and data types. Performance optimizations like HDFS caching and partition pruning are also covered.
An embedded mirror maker is being prototyped to address the large number of dedicated machines currently used for mirroring. The proposed approach would embed the mirroring logic directly in the Kafka brokers to reduce latency, load, and number of machines. It would use idempotent producers, dynamic configuration via Zookeeper, and handle scenarios like leader movement. Challenges include tighter broker/mirror coupling and ensuring message ordering across clusters.
Big data, just an introduction to Hadoop and Scripting Languages — Corley S.r.l.
This document provides an introduction to Big Data and Apache Hadoop. It defines Big Data as large and complex datasets that are difficult to process using traditional database tools. It describes how Hadoop uses MapReduce and HDFS to provide scalable storage and parallel processing of Big Data. It provides examples of companies using Hadoop to analyze exabytes of data and common Hadoop use cases like log analysis. Finally, it summarizes some popular Hadoop ecosystem projects like Hive, Pig, and Zookeeper that provide SQL-like querying, data flows, and coordination.
Using Groovy? Got lots of stuff to do at the same time? Then you need to take a look at GPars (“Jeepers!”), a library providing support for concurrency and parallelism in Groovy. GPars brings powerful concurrency models from other languages to Groovy and makes them easy to use with custom DSLs:
- Actors (Erlang and Scala)
- Dataflow (Io)
- Fork/join (Java)
- Agent (Clojure agents)
In addition to this support, GPars integrates with standard Groovy frameworks like Grails and Griffon.
Background, comparisons to other languages, and motivating examples will be given for the major GPars features.
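Of the models listed above, fork/join is the one GPars borrows directly from Java, so it can be shown with the JDK API alone: a task recursively splits its input until subranges are small, computes them directly, and joins the partial results.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// The fork/join model GPars adopts from Java, shown with the plain JDK
// API: split until the range is small, then compute directly and join.
public class SumTask extends RecursiveTask<Long> {
    private final long[] values;
    private final int lo, hi;

    public SumTask(long[] values, int lo, int hi) {
        this.values = values;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= 1000) {             // small enough: sum sequentially
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += values[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(values, lo, mid);
        SumTask right = new SumTask(values, mid, hi);
        left.fork();                        // run left half asynchronously
        return right.compute() + left.join();
    }

    public static long parallelSum(long[] values) {
        return ForkJoinPool.commonPool().invoke(new SumTask(values, 0, values.length));
    }
}
```

GPars wraps this same pattern in a Groovy DSL; the structure of the computation is identical.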
Harnessing the power of Nutch with Scala — Knoldus Inc.
This document discusses using Nutch, an open source web crawler, with Scala. It provides an overview of Nutch's architecture and how plugins can be written in Scala to extend its functionality. As an example, it describes how Scala was used to build a plugin for an aggregator application that crawls multiple suppliers, parses content to extract details, and passes this data to an actor for processing. The solution was able to crawl 5 suppliers and collect over 500k records using Nutch and 823 lines of Scala code.
"In this session, Twitter engineer Alex Payne will explore how the popular social messaging service builds scalable, distributed systems in the Scala programming language. Since 2008, Twitter has moved the development of its most critical systems to Scala, which blends object-oriented and functional programming with the power, robust tooling, and vast library support of the Java Virtual Machine. Find out how to use the Scala components that Twitter has open sourced, and learn the patterns they employ for developing core infrastructure components in this exciting and increasingly popular language."
1) Apache Ambari is an open-source platform for provisioning, managing, and monitoring Hadoop clusters.
2) New features in Ambari 2.4 include additional services, role-based access control, management packs, and Grafana integration.
3) Ambari simplifies cluster operations through an intuitive UI for deploying, securing, monitoring, upgrading, and scaling Hadoop clusters.
Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, and fault-tolerant database. It originated at Facebook in 2007 to solve their inbox search problem. Some key companies using Cassandra include Twitter, Facebook, Digg, and Rackspace. Cassandra's data model is based on Google's Bigtable and its distribution design is based on Amazon's Dynamo.
Hazelcast provides scale-out computing capabilities that allow cluster capacity to be increased or decreased on demand. It enables resilience through automatic recovery from member failures without data loss. Hazelcast's programming model allows developers to easily program cluster applications as if they are a single process. It also provides fast application performance by holding large data sets in main memory.
Kick your database_to_the_curb_reston_08_27_19 — confluent
This document discusses using Kafka Streams interactive queries to enable powerful microservices by making stream processing results queryable in real-time. It provides an overview of Kafka Streams, describes how to embed an interactive query server to expose stateful stream processing results via HTTP endpoints, and demonstrates how to securely query processing state from client applications.
Python Utilities for Managing MySQL Databases — Mats Kindahl
Managing a MySQL database server can become a full-time job. What we need are tools that bundle a set of related tasks into a common utility. While there are several such utility libraries to choose from, you often need to customize them to your needs. The MySQL Utilities library answers that need: it is open source, so you can modify and extend it as you see fit.
This is the presentation from OSCON 2011 in Portland.
This document provides an overview of Hazelcast, an open source in-memory data grid. It discusses what Hazelcast is, common use cases, features, and how to configure and use distributed maps (IMap) and querying with predicates. Key points: Hazelcast stores data in memory and distributes it across a cluster; it supports caching, distributed computing, and messaging use cases; and IMap implements a distributed concurrent map that can be queried with predicates and configured with eviction policies and persistence.
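The shape of a predicate query over IMap can be sketched without the Hazelcast dependency — the snippet below uses only plain java.util (it is NOT the Hazelcast API, and the Employee type and its fields are invented for illustration), but mirrors how a predicate selects matching values from a distributed map.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Dependency-free sketch of predicate-style map querying, mirroring the
// shape of a Hazelcast IMap query. Plain java.util only; Employee and
// its fields are made up for illustration.
public class PredicateQueryDemo {
    public static class Employee {
        public final String name;
        public final int age;
        public final boolean active;

        public Employee(String name, int age, boolean active) {
            this.name = name;
            this.age = age;
            this.active = active;
        }
    }

    // Roughly analogous to imap.values(predicate) in Hazelcast, where the
    // filtering would run member-side across the cluster.
    public static List<String> namesWhere(Map<String, Employee> map, Predicate<Employee> p) {
        return map.values().stream()
                .filter(p)
                .map(e -> e.name)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

In Hazelcast proper the predicate is serialized and evaluated on the members holding the data, which is what makes such queries scale.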
In this session we review the design of the current capabilities of the Spring Data GemFire API that supports Geode, and explore additional use cases and future direction that the Spring API and underlying Geode support might evolve.
Impala is a SQL query engine for Apache Hadoop that allows for interactive queries on large datasets. It uses a distributed architecture where each node runs an Impala daemon and queries are distributed across nodes. Impala aims to provide general-purpose SQL with high performance by using C++ instead of Java and avoiding MapReduce execution. It runs directly on Hadoop storage systems and supports common file formats like Parquet and Avro.
This document discusses how to set up HBase with Docker in three configurations: single-node standalone, pseudo-distributed single-machine, and fully-distributed cluster. It describes features of HBase like consistent reads/writes, automatic sharding, and failover. It provides instructions for installing HBase on a single node using Docker, including building an image and running it with ports exposed. It also covers running HBase in pseudo-distributed mode with the processes running as separate containers, and interacting with the HBase shell.
Overview of data analytics service: Treasure Data Service — SATOSHI TAGOMORI
Treasure Data provides a data analytics service with the following key components:
- Data is collected from various sources using Fluentd and loaded into PlazmaDB.
- PlazmaDB is the distributed time-series database that stores metadata and data.
- Jobs like queries, imports, and optimizations are executed on Hadoop and Presto clusters using queues, workers, and a scheduler.
- The console and APIs allow users to access the service and submit jobs for processing and analyzing their data.
The document introduces the Infinispan data grid platform. It discusses how Infinispan can be used as a distributed in-memory cache both as a library and server. Key features of Infinispan are clustering, persistence, transactions, querying, and map-reduce capabilities. Examples of using Infinispan for session clustering and as a state store for Storm processing are provided.
The document introduces the Infinispan data grid platform. It discusses how Infinispan can be used as a distributed in-memory cache and data grid. It covers key Infinispan features like clustering, persistence, transactions, querying and map-reduce capabilities. It also provides examples of using Infinispan for session clustering and as a state store for Storm processing.
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement — VMware Tanzu
This document provides an agenda for a hands-on introduction and hackathon kickoff for Apache Geode. The agenda includes details about the hackathon, an introduction to Apache Geode including its history and key features, a hands-on lab to build, run, and use Geode, and a Q&A session. It also outlines how to contribute to the Geode project through code, documentation, issue tracking, and mailing lists.
Using JCache
This document discusses Java Caching (JCache), the Java standard for caching APIs specified in JSR-107. It introduces caching concepts and benefits, describes the key interfaces and classes in JCache like CacheManager and Cache, and demonstrates features like entry processors. It also discusses JCache implementations like Hazelcast and annotations for method-level caching. The future of JCache is outlined, with plans for JCache 1.1, 2.0 aligned with Java EE 8, and 3.0 aligned with future Java versions. The document ends with information on Hazelcast's JCache support.
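The entry-processor feature mentioned above — mutating a cached value atomically where it lives, instead of a racy get/modify/put from the caller — can be sketched without a JCache provider on the classpath. The snippet below uses ConcurrentHashMap.compute as a stand-in for the real API, which is javax.cache.Cache#invoke with an EntryProcessor; it shows the idea, not JSR-107 itself.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the JCache (JSR-107) entry-processor idea: the update runs
// atomically against the entry, so concurrent callers never lose writes.
// ConcurrentHashMap.compute stands in for Cache.invoke(K, EntryProcessor).
public class EntryProcessorSketch {
    private final ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();

    /** Atomically increments the counter under key, creating it at 1. */
    public int increment(String key) {
        return cache.compute(key, (k, v) -> v == null ? 1 : v + 1);
    }

    public Integer get(String key) {
        return cache.get(key);
    }
}
```

With a real JCache provider such as Hazelcast, the processor additionally executes on the member that owns the entry, avoiding a round trip of the value over the network.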
This document discusses in-memory data grids and JBoss Infinispan. It begins with an overview of in-memory data grids, their uses for caching, performance boosting, scalability, and high availability. It then discusses Infinispan specifically, describing it as an open-source, distributed in-memory key-value data grid and cache. The document outlines Infinispan's architecture, features like persistence, transactions, querying, distributed execution, and map-reduce capabilities. It also provides a case study on using Infinispan for session clustering in a web application.
This document provides an overview and agenda for the "Busy Java Developer's Guide to WebSphere Debugging & Troubleshooting" presentation. The presentation covers various WebSphere Application Server components, troubleshooting tools like IBM Support Assistant, JVM troubleshooting tools, problem determination tools, common problem scenarios, how customers run into trouble, and includes a demo and Q&A section. It provides an in-depth look at debugging and resolving issues with WebSphere Application Server.
The Design, Implementation and Open Source Way of Apache Pegasus — acelyc1112009
A presentation in Apache Pegasus meetup in 2021 from Yuchen He.
Apache Pegasus is a horizontally scalable, strongly consistent and high-performance key-value store.
Know more about Pegasus https://pegasus.apache.org, https://github.com/apache/incubator-pegasus
This document provides an overview of Hazelcast, a leading in-memory data grid solution. Hazelcast provides distributed data structures, execution services, and caching capabilities. It allows applications to scale linearly by adding additional nodes. Hazelcast can be configured via XML, API or Spring and supports features like transactions, custom serialization, and native client libraries. It integrates with Spring Framework for caching and can discover nodes via multicast or TCP/IP lists to form clusters across distributed systems.
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systexJames Chen
This document discusses using Hadoop/MapReduce with Solr/Lucene for large scale distributed search. It begins with an introduction to the speaker and his experience with Hadoop. The agenda then outlines discussing why search big data, an overview of Lucene, Solr and Zookeeper, distributed searching and indexing with Hadoop, and a case study on web log categorization.
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Chris Fregly
Spark Streaming allows for processing of real-time data streams using Spark. The document discusses using Spark Streaming with Amazon Kinesis for streaming data ingestion. It covers the Spark Streaming and Kinesis integration architecture, how the Spark Kinesis receiver works, scaling considerations, and fault tolerance mechanisms through checkpointing. Examples of monitoring and tuning Spark Streaming jobs on Kinesis data are also provided.
This document outlines the agenda and content for a presentation on xPatterns, a tool that provides APIs and tools for ingesting, transforming, querying and exporting large datasets on Apache Spark, Shark, Tachyon and Mesos. The presentation demonstrates how xPatterns has evolved its infrastructure to leverage these big data technologies for improved performance, including distributed data ingestion, transformation APIs, an interactive Shark query server, and exporting data to NoSQL databases. It also provides examples of how xPatterns has been used to build applications on large healthcare datasets.
Deploying Apache Flume to enable low-latency analyticsDataWorks Summit
The driving question behind redesigns of countless data collection architectures has often been, ?how can we make the data available to our analytical systems faster?? Increasingly, the go-to solution for this data collection problem is Apache Flume. In this talk, architectures and techniques for designing a low-latency Flume-based data collection and delivery system to enable Hadoop-based analytics are explored. Techniques for getting the data into Flume, getting the data onto HDFS and HBase, and making the data available as quickly as possible are discussed. Best practices for scaling up collection, addressing de-duplication, and utilizing a combination streaming/batch model are described in the context of Flume and Hadoop ecosystem components.
This document discusses Red Hat's distributed cache and data grid platform, Infinispan. It provides an overview of Infinispan, describing it as a distributed in-memory key/value data grid and cache that is highly available, elastic, and manageable. It outlines Infinispan's architecture as both a standalone library and clustered server, its clustering and persistence capabilities, support for transactions and querying, and use cases such as session clustering and big data processing.
Apache Pegasus (incubating): A distributed key-value storage systemacelyc1112009
A presentation in ApacheCon Asia 2021 from Yuchen He and Shuo Jia.
Apache Pegasus is a horizontally scalable, strongly consistent and high-performance key-value store.
Know more about Pegasus https://pegasus.apache.org, https://github.com/apache/incubator-pegasus
The document discusses Hazelcast, an in-memory data grid platform. Hazelcast provides features like scale-out computing, resilience, fast performance, and an easy programming model. It can be used for distributed caching, computing, messaging, and data storage. Hazelcast runs as a distributed system across multiple nodes and provides APIs for Java and other languages.
Elastic and Cloud-ready Applications with Payara MicroOndrej Mihályi
This session will explain how to build modern and scalable applications, while efficiently adding business value. With the right tools, technical decisions can be deferred and problems can be solved according to business needs instead. Payara Micro – an open source MicroProfile-compatible runtime – provides these tools in an easy-to-use package, allowing developers to focus on getting the job done. In addition, it can be connected using a standard API to Apache Kafka or Amazon SQS for high performance messaging.
In this talk, you’ll learn how to create an architecture around all these tools to get as much flexibility as possible and be ready to deploy your applications into cloud. During a live demonstration, you’ll see how a Java EE application can benefit from dynamic clustering, MicroProfile API, distributed configuration and scalable cache built into the Payara Micro runtime.
Elastic and Cloud-ready Applications with Payara MicroPayara
This document discusses how to build elastic and cloud-ready applications using Payara Micro. It covers several key aspects:
1) Payara Micro provides a scalable runtime by allowing applications to run as a lightweight executable JAR file that can dynamically form clusters of multiple instances for scalability.
2) Features like JCache, CDI events, and JCA connectors allow applications to support requirements for cloud deployments like pluggable persistence, loose coupling, and failure recovery.
3) Microprofile Config allows applications to access configuration from external sources in a standardized way, and Microprofile metrics and health provide monitoring capabilities.
4) Payara Micro integrates these technologies to provide a complete solution for building resilient
Elastic and Cloud-ready Applications with Payara MicroPayara
First presneted at the W-JAX Conference in Munich, Germany on the 8th of November 2017.
This session will explain how to build modern and scalable applications, while efficiently adding business value. With the right tools, technical decisions can be deferred and problems can be solved according to business needs instead. Payara Micro – an open source MicroProfile-compatible runtime – provides these tools in an easy-to-use package, allowing developers to focus on getting the job done. In addition, it can be connected using a standard API to Apache Kafka or Amazon SQS for high performance messaging.
In this talk, you’ll learn how to create an architecture around all these tools to get as much flexibility as possible and be ready to deploy your applications into cloud. During a live demonstration, you’ll see how a Java EE application can benefit from dynamic clustering, MicroProfile API, distributed configuration and scalable cache built into the Payara Micro runtime.
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Sunghyouk Bae
Kotlin Backend @ Coupang discusses Coupang's adoption of Kotlin for backend development. Some of the key reasons for adopting Kotlin included improving code safety, readability and testability. Example uses of Kotlin included developing common libraries, components like a Korean tokenizer, a Kafka client, an audit tool and a product creation pipeline system. Spring Data Requery was also developed as an alternative to JPA/Hibernate that provided better performance. Overall, Kotlin helped improve code quality, simplify asynchronous programming and increase development productivity at Coupang.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
3. Distributed Cache
• Fast access to data
• Performance boost
• Elasticity
• High Availability
• JSR 107 (Temporary Caching for the Java Platform)
– Read, write, expiration, write-through, distribution
5. In-Memory Data Grid
• Evolution of distributed caches
• Clustered by nature (shared across multiple servers)
• Low response time
• High throughput
• Predictable scalability
• High Availability
• Querying
• Task Execution (Map/Reduce)
• JSR 347 (Data Grid for the Java Platform)
6. In-Memory Data Grid (cont.)
[Diagram: multiple App instances access an In-Memory Data Grid made up of Nodes, backed by a Persistence layer (DataStore)]
7. IMDG for What?
• Sharing data (session, app states)
• Toolkit for clustering
• Performance (caching, in-memory processing)
• Scalability
• Database platform in the cloud
9. Cache vs. Data Grid
• JSR 107 - Temporary Caching for the Java Platform
– Basic interactions (read, write, expiry)
– Transactions with JTA compatibility
– Listener
– Persistence: read-through, write-through, write-behind
– Annotations
– javax.cache.*
• JSR 347 - Data Grids for the Java Platform
– Asynchronous, non-blocking API
– Distributed code execution and map/reduce API
– Group API for co-location
– Annotations (CDI)
– Eventually Consistent API
– Querying
– Configuration
– javax.datagrid.*
13. Architecture: as library, clustered
[Diagram: a cluster of three JVMs, each running the App with Infinispan embedded as a library]
The application doesn’t know it’s on a cluster.
• Use as library
– More features
– Richer APIs
– Programmatic/declarative configuration
– Extendable/embeddable
– Faster (direct in-process API calls)
14. Architecture: as server, clustered
• Use as server
– Remote access
– Data tier shared by multiple apps
– An app doesn’t affect the cluster
– Non-Java clients: C++, .NET, Ruby, Python, Java
[Diagram: App instances connect remotely to a cluster of Infinispan server JVMs]
15. Architecture: multi clusters
• Multi-clusters
– By replication
– By persistence
– By replication to another cluster (topology-aware)
[Diagram: two Infinispan clusters linked by replication or through shared persistence]
16. Clustering
• Peer-to-Peer
– No central master, no single point of failure, no single bottleneck
• JGroups
– Reliable multicast communication library: node discovery, data sharing, cluster scaling
• Consistent Hash
– Hash-based data distribution
– Determines which node owns a given entry
• Linear in nature: throughput, capacity
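The consistent-hash bullet above can be illustrated with a plain-Java sketch. This is a toy, not Infinispan's actual algorithm: the class name, the virtual-node scheme, and the use of String.hashCode are assumptions made for this example.

```java
import java.util.*;

// Illustrative consistent-hash ring: keys map to the first node position
// at or after their hash, so removing a node only remaps that node's keys.
public class ConsistentHashSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashSketch(Collection<String> nodes, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String node : nodes) addNode(node);
    }

    public void addNode(String node) {
        // several positions per node smooth out the key distribution
        for (int i = 0; i < virtualNodes; i++)
            ring.put(hash(node + "#" + i), node);
    }

    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++)
            ring.remove(hash(node + "#" + i));
    }

    // Locate the owning node: first ring position at or after the key's
    // hash, wrapping around to the start of the ring if necessary.
    public String nodeFor(String key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    private static int hash(String s) {
        return s.hashCode(); // a real grid would use a stronger hash
    }

    public static void main(String[] args) {
        ConsistentHashSketch ch =
                new ConsistentHashSketch(List.of("nodeA", "nodeB", "nodeC"), 64);
        String owner = ch.nodeFor("ticket:42");
        ch.removeNode(owner);
        System.out.println("ticket:42 moved from " + owner + " to " + ch.nodeFor("ticket:42"));
    }
}
```

Because only the removed node's ring positions disappear, most keys keep their owner when the cluster scales — the property behind the "linear in nature" bullet.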
20. Persistence
• Used for durability
• Cache Store - Persistence Storage
– File System, Cloud, Remote, JDBC, JPA, LevelDB, Cassandra, HBase, MongoDB, BerkeleyDB, JDBM, REST
• CacheLoader, CacheWriter
• Read-through, write-through, write-behind
• Passivation, activation
• Store chain
• Shared store
21. Persistence (cont.)
• Passivation – an entry is written to persistence when evicted from memory (default)
• Activation – an entry is read back into memory and removed from persistence
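A minimal plain-Java sketch of these semantics may help. The class below is hypothetical (Infinispan's real SPI is CacheLoader/CacheWriter), and each method illustrates one term from the slides above in isolation; a real configuration combines them differently.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of persistence semantics: read-through/activation on a
// miss, write-through on a put, passivation on eviction. Hypothetical class;
// Infinispan implements these via CacheLoader/CacheWriter cache stores.
public class PersistenceSketch<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final Map<K, V> store; // stands in for a persistent cache store

    public PersistenceSketch(Map<K, V> store) { this.store = store; }

    // read-through: on a miss, load from the store; with passivation enabled
    // this is "activation", so the entry is also removed from the store
    public V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            value = store.remove(key); // activation
            if (value != null) memory.put(key, value);
        }
        return value;
    }

    // write-through: update memory and the store in the same operation
    public void put(K key, V value) {
        memory.put(key, value);
        store.put(key, value);
    }

    // passivation: when evicted from memory, the entry moves to the store
    public void evict(K key) {
        V value = memory.remove(key);
        if (value != null) store.put(key, value);
    }
}
```

Write-behind differs from write-through only in timing: the store update would be queued and applied asynchronously instead of inline.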
24. Distributed Execution
• Executes code on distributed nodes
• Exposed through the standard JDK ExecutorService interface
• Tasks implement DistributedCallable, which extends java.util.concurrent.Callable
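Because the distributed executor follows the plain JDK ExecutorService contract, the programming model can be sketched locally. In this sketch a fixed thread pool stands in for cluster nodes, and the class and method names are invented for illustration:

```java
import java.util.*;
import java.util.concurrent.*;

// Scatter/gather over an ExecutorService: each Callable is a per-node unit
// of work, and the caller sums the partial results. With Infinispan, the
// same shape runs across the cluster instead of a local thread pool.
public class DistExecSketch {
    public static int totalLength(List<String> shards) {
        ExecutorService executor = Executors.newFixedThreadPool(3);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String shard : shards)
                tasks.add(shard::length); // per-"node" unit of work
            int total = 0;
            for (Future<Integer> f : executor.invokeAll(tasks))
                total += f.get();          // gather partial results
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(totalLength(List.of("aa", "bbb", "c"))); // prints 6
    }
}
```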
25. Map/Reduce
• Based on Distributed Execution Framework
• Mapper, Reducer, Collator, MapReduceTask
public interface Collator<KOut, VOut, R> {
R collate(Map<KOut, VOut> reducedResults);
}
public interface Mapper<KIn, VIn, KOut, VOut> extends Serializable {
void map(KIn key, VIn value, Collector<KOut, VOut> collector);
}
public interface Reducer<KOut, VOut> extends Serializable {
VOut reduce(KOut reducedKey, Iterator<VOut> iter);
}
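A single-JVM word count shows how these interfaces fit together. The nested interfaces and the driver below are simplified local stand-ins; in Infinispan, a MapReduceTask runs the map and reduce phases across the cluster's nodes.

```java
import java.util.*;

// Local simulation of the map/reduce contract: the mapper emits (word, 1)
// pairs into a Collector, the driver groups them by key, and the reducer
// sums each word's counts.
public class WordCountSketch {
    interface Collector<K, V> { void emit(K key, V value); }
    interface Mapper<KIn, VIn, KOut, VOut> {
        void map(KIn key, VIn value, Collector<KOut, VOut> collector);
    }
    interface Reducer<KOut, VOut> {
        VOut reduce(KOut reducedKey, Iterator<VOut> iter);
    }

    public static Map<String, Integer> wordCount(Map<String, String> cache) {
        Mapper<String, String, String, Integer> mapper = (key, value, collector) -> {
            for (String word : value.split("\\s+")) collector.emit(word, 1);
        };
        Reducer<String, Integer> reducer = (word, counts) -> {
            int sum = 0;
            while (counts.hasNext()) sum += counts.next();
            return sum;
        };

        // map phase: group emitted values by key
        Map<String, List<Integer>> grouped = new HashMap<>();
        cache.forEach((k, v) -> mapper.map(k, v,
                (word, one) -> grouped.computeIfAbsent(word, w -> new ArrayList<>()).add(one)));

        // reduce phase: collapse each key's values to a single result
        Map<String, Integer> result = new HashMap<>();
        grouped.forEach((word, counts) -> result.put(word, reducer.reduce(word, counts.iterator())));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> cache = Map.of("1", "hello world", "2", "hello grid");
        System.out.println(wordCount(cache)); // hello=2, world=1, grid=1 (in some order)
    }
}
```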
28. Listener
• Listener on CacheManager
– Node join/leave, cache start/stop
• Listener on Cache
– CRUD, eviction/passivation
– Rehashing, transaction completion
@Listener
public class SimpleListener {
@CacheEntryCreated
public void dataAdded(CacheEntryCreatedEvent event) {
if (event.isPre()) {
System.out.println("Before creating the entry:" + event.getKey());
} else {
System.out.println("After creating the entry:" + event.getKey());
}
…
}
SimpleListener listener = new SimpleListener();
DefaultCacheManager manager = new DefaultCacheManager();
manager.addListener(listener);
Cache<Integer, Ticket> cache = manager.getCache();
cache.addListener(listener);
29. Asynchronous APIs
• put(), get(), and remove() are synchronous
– They wait for RPCs and locks (and possibly cache stores)
• The asynchronous API returns NotifyingFuture
– Events are fired on completion of the operation
NotifyingFuture<String> future = c.removeAsync(key);
future.attachListener(new FutureListener<String>() {
@Override
public void futureDone(Future<String> future) {
try {
future.get();
System.out.printf ("The entry stored under key %s has been removed.", key);
} catch (ExecutionException e) {
System.out.printf("Failed to remove %s!", key);
}
}
});
30. Spring Integration
• Infinispan provider for Spring cache abstraction
• infinispan-spring.jar
<cache:annotation-driven cache-manager="myCacheManager"/>
<bean id="myCacheManager"
class="org.infinispan.spring.provider.SpringEmbeddedCacheManagerFactoryBean"
p:configurationFileLocation="classpath:infinispan-config.xml" />
@Cacheable(value = "secureContextCache", key="#contextId")
public SecureLayerContext getSecureLayerContext(String contextId) {
return null;
}
@CachePut(value = "secureContextCache", key="#contextId")
public SecureLayerContext setSecureLayerContext(String contextId,
SecureLayerContext secureLayerContext) {
return secureLayerContext;
}
@CacheEvict(value = "secureContextCache", key="#contextId")
public void removeSecureLayerContext(String contextId) {
// Intentionally blank
}
32. Infinispan on JBoss AS 7 (WildFly 8)
• Used for session clustering, Hibernate L2 cache
• Applications obtain a cache by its JNDI name using @Resource
• XML Configuration in server configuration file
<cache-container name="web" aliases="standard-session-cache" default-cache="repl">
<transport lock-timeout="60000" />
<replicated-cache name="repl" mode="ASYNC" batching="true">
<file-store />
</replicated-cache>
</cache-container>
33. Marshalling
• The JBoss Marshalling framework is used for POJOs
• Users can provide custom Externalizer implementations for non-Serializable objects
• A custom marshaller can be provided as well
36. RadarGun
• A benchmarking framework for data grids and distributed caches
• Built to test Infinispan and other distributed data grid platforms
• https://github.com/radargun/radargun
38. Use Cases: In Streaming Processing
Infinispan Data Grid
39. Use Cases: Data Grid Platform
Presented an architecture for large-scale in-memory data processing with applied case studies;
Grand Prize at the 6th Korea Software Architect Conference, 2013
40. Use Case: Session Clustering
• Store session information into the cache in a Spring MVC interceptor
41. Case Study: Session Clustering
#1 Use the Spring Cache Abstraction so that different cache implementations can easily be swapped in
• ConcurrentHashMap, EHCache
• Infinispan
#2 Store the SecurityContext in the cache
• By default it is kept in the HTTP session (HttpSessionSecurityContextRepository)
• Write a CacheSecurityContextRepository implementing SecurityContextRepository so the cache is used instead of the HTTP session
42. Case Study: Infinispan with Spring
[Diagram: clients reach a load balancer that routes to multiple WAS instances; each runs the application services with Spring Security and an embedded Infinispan node, and the Infinispan nodes form one cluster]
43. Infinispan with Spring (cont.)
• infinispan-config.xml
<global>
<transport clusterName="MyCacheCluster">
<properties>
<property name="configurationFile" value="jgroups-tcp.xml" />
</properties>
</transport>
</global>
<default>
<clustering mode="replication">
<sync />
</clustering>
</default>
<namedCache name="securityContextCache">
<!-- maxEntries means the maximum concurrent user connections -->
<eviction strategy="LIRS" maxEntries="10000" />
<!-- the max idle time is 30 minutes -->
<expiration maxIdle="1800000" />
</namedCache>
44. Infinispan with Spring (cont.)
• spring-cache-config.xml
• SpringEmbeddedCacheManagerFactoryBean
<cache:annotation-driven cache-manager="myCacheManager"/>
<bean id="myCacheManager"
class="my.domain.MyCacheManagerFactoryBean"
p:configurationFileLocation="classpath:infinispan-config.xml" />
public class MyCacheManagerFactoryBean extends SpringEmbeddedCacheManagerFactoryBean {
  @Override
  public void afterPropertiesSet() throws Exception {
    super.afterPropertiesSet();
    addListeners();
  }

  private void addListeners() throws Exception {
    SpringEmbeddedCacheManager cacheManager = getObject();
    if (cacheManager.getNativeCacheManager() instanceof EmbeddedCacheManager) {
      cacheManager.getNativeCacheManager().addListener(new MyCacheManagerListener());
    }
    for (String cacheName : cacheManager.getCacheNames()) {
      SpringCache cache = cacheManager.getCache(cacheName);
      if (cache.getNativeCache() instanceof Cache) {
        ((Cache<?, ?>) cache.getNativeCache()).addListener(new MyCacheListener());
      }
    }
  }
}
45. Infinispan with Spring (cont.)
• Dao Implementation
@Named("SecurityContextDao")
public class SecurityContextDaoImpl implements SecurityContextDao {
@Cacheable(value = "securityContextCache", key="#key")
public SecurityContext getSecurityContext(String key) {
return null;
}
@CachePut(value = "securityContextCache", key="#key")
public SecurityContext setSecurityContext(String key, SecurityContext securityContext) {
return securityContext;
}
@CacheEvict(value = "securityContextCache", key="#key")
public void removeSecurityContext(String key) {
// Intentionally blank
}
}
46. Infinispan with Spring (cont.)
• Spring Security’s SecurityContextRepository
public class MyCacheSecurityContextRepository implements SecurityContextRepository {
@Inject SecurityContextDao securityContextDao;
public SecurityContext loadContext(HttpRequestResponseHolder requestResponseHolder) {
...
return securityContextDao.getSecurityContext(authToken);
}
public void saveContext(SecurityContext context, HttpServletRequest request,
HttpServletResponse response) {
...
securityContextDao.setSecurityContext(authToken, context);
...
}
public boolean containsContext(HttpServletRequest request) {
...
return securityContextDao.getSecurityContext(key) != null;
}
}