Do you need to scale your application, share data across cluster, perform massive parallel processing on many JVMs or maybe consider alternative to your favorite NoSQL technology? Hazelcast to the rescue! With Hazelcast distributed development is much easier. This presentation will be useful to those who would like to get acquainted with Hazelcast top features and see some of them in action, e.g. how to cluster application, cache data in it, partition in-memory data, distribute workload onto many servers, take advantage of parallel processing, etc.
Presented on JavaDay Kyiv 2014 conference.
Communication between Microservices is inherently unreliable. These integration points may produce cascading failures, slow responses, service outages. We will walk through stability patterns like timeouts, circuit breaker, bulkheads and discuss how they improve stability of Microservices.
This presentation on "Getting Started with HazelCast" was made by Sandeep Kumar Pandey from Lastminute.com in Core Java / BoJUG meetup group on 24th March.
"In this session, we are going to talk about high level architecture of Hazelcast framework and we will look into the Java Collections and concepts which has been used to build the framework. We will also have a live demo on Distributed Cache using Hazelcast."
From distributed caches to in-memory data gridsMax Alexejev
A brief introduction into modern caching technologies, starting from distributed memcached to modern data grids like Oracle Coherence.
Slides were presented during distributed caching tech talk in Moscow, May 17 2012.
Do you need to scale your application, share data across cluster, perform massive parallel processing on many JVMs or maybe consider alternative to your favorite NoSQL technology? Hazelcast to the rescue! With Hazelcast distributed development is much easier. This presentation will be useful to those who would like to get acquainted with Hazelcast top features and see some of them in action, e.g. how to cluster application, cache data in it, partition in-memory data, distribute workload onto many servers, take advantage of parallel processing, etc.
Presented on JavaDay Kyiv 2014 conference.
Communication between Microservices is inherently unreliable. These integration points may produce cascading failures, slow responses, service outages. We will walk through stability patterns like timeouts, circuit breaker, bulkheads and discuss how they improve stability of Microservices.
This presentation on "Getting Started with HazelCast" was made by Sandeep Kumar Pandey from Lastminute.com in Core Java / BoJUG meetup group on 24th March.
"In this session, we are going to talk about high level architecture of Hazelcast framework and we will look into the Java Collections and concepts which has been used to build the framework. We will also have a live demo on Distributed Cache using Hazelcast."
From distributed caches to in-memory data gridsMax Alexejev
A brief introduction into modern caching technologies, starting from distributed memcached to modern data grids like Oracle Coherence.
Slides were presented during distributed caching tech talk in Moscow, May 17 2012.
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureKai Wähner
Apache Kafka in conjunction with Apache Spark became the de facto standard for processing and analyzing data. Both frameworks are open, flexible, and scalable.
Unfortunately, the latter makes operations a challenge for many teams. Ideally, teams can use serverless SaaS offerings to focus on business logic. However, hybrid and multi-cloud scenarios require a cloud-native platform that provides automated and elastic tooling to reduce the operations burden.
This session explores different architectures to build serverless Apache Kafka and Apache Spark multi-cloud architectures across regions and continents.
We start from the analytics perspective of a data lake and explore its relation to a fully integrated data streaming layer with Kafka to build a modern data Data Lakehouse.
Real-world use cases show the joint value and explore the benefit of the "delta lake" integration.
This is the presentation I made on JavaDay Kiev 2015 regarding the architecture of Apache Spark. It covers the memory model, the shuffle implementations, data frames and some other high-level staff and can be used as an introduction to Apache Spark
From cache to in-memory data grid. Introduction to Hazelcast.Taras Matyashovsky
This presentation:
* covers basics of caching and popular cache types
* explains evolution from simple cache to distributed, and from distributed to IMDG
* not describes usage of NoSQL solutions for caching
* is not intended for products comparison or for promotion of Hazelcast as the best solution
Compression Options in Hadoop - A Tale of TradeoffsDataWorks Summit
Yahoo! is one of the most-visited web sites in the world. It runs one of the largest private cloud infrastructures, one that operates on petabytes of data every day. Being able to store and manage that data well is essential to the efficient functioning of Yahoo!`s Hadoop clusters. A key component that enables this efficient operation is data compression. With regard to compression algorithms, there is an underlying tension between compression ratio and compression performance. Consequently, Hadoop provides support for several compression algorithms, including gzip, bzip2, Snappy, LZ4 and others. This plethora of options can make it difficult for users to select appropriate codecs for their MapReduce jobs. This paper attempts to provide guidance in that regard. Performance results with Gridmix and with several corpuses of data are presented. The paper also describes enhancements we have made to the bzip2 codec that improve its performance. This will be of particular interest to the increasing number of users operating on “Big Data” who require the best possible ratios. The impact of using the Intel IPP libraries is also investigated; these have the potential to improve performance significantly. Finally, a few proposals for future enhancements to Hadoop in this area are outlined.
Cloud-Native Apache Spark Scheduling with YuniKorn SchedulerDatabricks
Kubernetes is the most popular container orchestration system that is natively designed for Cloud. At Lyft and Cloudera, we have both emerged the next-generation, cloud-native infrastructure based on Kubernetes, which supports various distributed workloads.
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxConJérôme Petazzoni
Containers are everywhere. But what exactly is a container? What are they made from? What's the difference between LXC, butts-nspawn, Docker, and the other container systems out there? And why should we bother about specific filesystems?
In this talk, Jérôme will show the individual roles and behaviors of the components making up a container: namespaces, control groups, and copy-on-write systems. Then, he will use them to assemble a container from scratch, and highlight the differences (and likelinesses) with existing container systems.
Apache Kafka becoming the message bus to transfer huge volumes of data from various sources into Hadoop.
It's also enabling many real-time system frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through the best practices in deploying Apache Kafka
in production. How to Secure a Kafka Cluster, How to pick topic-partitions and upgrading to newer versions. Migrating to new Kafka Producer and Consumer API.
Also talk about the best practices involved in running a producer/consumer.
In Kafka 0.9 release, we’ve added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Now Kafka allows authentication of users, access control on who can read and write to a Kafka topic. Apache Ranger also uses pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase open sourced Kafka REST API and an Admin UI that will help users in creating topics, re-assign partitions, Issuing
Kafka ACLs and monitoring Consumer offsets.
Koalas: Making an Easy Transition from Pandas to Apache SparkDatabricks
Koalas is an open-source project that aims at bridging the gap between big data and small data for data scientists and at simplifying Apache Spark for people who are already familiar with pandas library in Python. Pandas is the standard tool for data science and it is typically the first step to explore and manipulate a data set, but pandas does not scale well to big data.
Hazelcast is an easy to use but scalable in-memory datagrid and distributed executor framework. It enables you to build applications having a big requirement on memory or that needs to scale horizontally.
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureKai Wähner
Apache Kafka in conjunction with Apache Spark became the de facto standard for processing and analyzing data. Both frameworks are open, flexible, and scalable.
Unfortunately, the latter makes operations a challenge for many teams. Ideally, teams can use serverless SaaS offerings to focus on business logic. However, hybrid and multi-cloud scenarios require a cloud-native platform that provides automated and elastic tooling to reduce the operations burden.
This session explores different architectures to build serverless Apache Kafka and Apache Spark multi-cloud architectures across regions and continents.
We start from the analytics perspective of a data lake and explore its relation to a fully integrated data streaming layer with Kafka to build a modern data Data Lakehouse.
Real-world use cases show the joint value and explore the benefit of the "delta lake" integration.
This is the presentation I made on JavaDay Kiev 2015 regarding the architecture of Apache Spark. It covers the memory model, the shuffle implementations, data frames and some other high-level staff and can be used as an introduction to Apache Spark
From cache to in-memory data grid. Introduction to Hazelcast.Taras Matyashovsky
This presentation:
* covers basics of caching and popular cache types
* explains evolution from simple cache to distributed, and from distributed to IMDG
* not describes usage of NoSQL solutions for caching
* is not intended for products comparison or for promotion of Hazelcast as the best solution
Compression Options in Hadoop - A Tale of TradeoffsDataWorks Summit
Yahoo! is one of the most-visited web sites in the world. It runs one of the largest private cloud infrastructures, one that operates on petabytes of data every day. Being able to store and manage that data well is essential to the efficient functioning of Yahoo!`s Hadoop clusters. A key component that enables this efficient operation is data compression. With regard to compression algorithms, there is an underlying tension between compression ratio and compression performance. Consequently, Hadoop provides support for several compression algorithms, including gzip, bzip2, Snappy, LZ4 and others. This plethora of options can make it difficult for users to select appropriate codecs for their MapReduce jobs. This paper attempts to provide guidance in that regard. Performance results with Gridmix and with several corpuses of data are presented. The paper also describes enhancements we have made to the bzip2 codec that improve its performance. This will be of particular interest to the increasing number of users operating on “Big Data” who require the best possible ratios. The impact of using the Intel IPP libraries is also investigated; these have the potential to improve performance significantly. Finally, a few proposals for future enhancements to Hadoop in this area are outlined.
Cloud-Native Apache Spark Scheduling with YuniKorn SchedulerDatabricks
Kubernetes is the most popular container orchestration system that is natively designed for Cloud. At Lyft and Cloudera, we have both emerged the next-generation, cloud-native infrastructure based on Kubernetes, which supports various distributed workloads.
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxConJérôme Petazzoni
Containers are everywhere. But what exactly is a container? What are they made from? What's the difference between LXC, butts-nspawn, Docker, and the other container systems out there? And why should we bother about specific filesystems?
In this talk, Jérôme will show the individual roles and behaviors of the components making up a container: namespaces, control groups, and copy-on-write systems. Then, he will use them to assemble a container from scratch, and highlight the differences (and likelinesses) with existing container systems.
Apache Kafka becoming the message bus to transfer huge volumes of data from various sources into Hadoop.
It's also enabling many real-time system frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through the best practices in deploying Apache Kafka
in production. How to Secure a Kafka Cluster, How to pick topic-partitions and upgrading to newer versions. Migrating to new Kafka Producer and Consumer API.
Also talk about the best practices involved in running a producer/consumer.
In Kafka 0.9 release, we’ve added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Now Kafka allows authentication of users, access control on who can read and write to a Kafka topic. Apache Ranger also uses pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase open sourced Kafka REST API and an Admin UI that will help users in creating topics, re-assign partitions, Issuing
Kafka ACLs and monitoring Consumer offsets.
Koalas: Making an Easy Transition from Pandas to Apache SparkDatabricks
Koalas is an open-source project that aims at bridging the gap between big data and small data for data scientists and at simplifying Apache Spark for people who are already familiar with pandas library in Python. Pandas is the standard tool for data science and it is typically the first step to explore and manipulate a data set, but pandas does not scale well to big data.
Hazelcast is an easy to use but scalable in-memory datagrid and distributed executor framework. It enables you to build applications having a big requirement on memory or that needs to scale horizontally.
Microservice Performance Metrics
Challenges and requirements
Tools and techniques available
(anti)patterns
Analyzing services with
Dashboard, Hystrix, DynaTrace
Setting up your services in the Dashboards and working through real and simulated problems
The ability of a system to respond gracefully to an unexpected hardware or software failure.
There are many levels of fault tolerance, the lowest being the ability to continue operation in the event of a power failure. Many fault-tolerant computer systems mirror all operations -- that is, every operation is performed on two or more duplicate systems, so if one fails the other can take over.
In most cases it could be achieved by redundancy in application design and set of patterns and approaches to software design.
Distributed Computing in Hazelcast - Geekout 2014 EditionChristoph Engelbert
Today’s amounts of collected data are showing a nearly exponential growth. More than 75% of all the data have been collected in the past 5 years. To store this data and process it in an appropriate time you need to partition the data and parallelize the processing of reports and analytics.
This talk will demonstrate how to parallelize data processing using Hazelcast and it’s underlying distributed data structures. With a quick introduction into the different terms and some short live coding examples we will make the journey into the distributed computing.
Sourcecode of the demonstrations are available here:
1. https://github.com/noctarius/hazelcast-mapreduce-presentation
2. https://github.com/noctarius/hazelcast-distributed-computing
Today’s amounts of collected data are showing nearly exponential growth. More than 75 percent of all collected data has been collected in the past five years. To store that data and process it within an appropriate time, you need to partition the data and parallelize the processing of reports and analytics. This session demonstrates how to quickly and easily parallelize data processing with Hazelcast and its underlying distributed data structures. By giving a few quick introductions to different terms and some short live coding sessions, the presentation takes you on a journey through distributed computing.
The way from monolithic to micro service architectures can hard. Overall micro services are not the all holy grail to just solve all your issues. You need to be aware that you need the right developers and the right toolset. Oh and not to forget, moving state to authorization systems doesn't mean your application is really stateless :)
Anyhow micro services are a great architecture and this deck is a short introduction on why we need to change our application architectures and what pitfalls you you have when introducing the idea of micro services.
In this webinar we will compare the complexities involved in Terracotta with the code/configuration changes to migrate to Hazelcast. You will learn about important features of Hazelcast such as IMDG capabilities, off-heap data storage, distributed collections, etc. and the feature-rich product portfolio of Hazelcast. We will cover how Hazelcast can scale up and out dynamically and without downtime against the static configuration of Terracotta. Expect to leave the webinar being more educated about Hazelcast in terms of architecture, important features and best practices.
We’ll cover these topics:
- Hazelcast architecture and features
- Terracotta distributed architecture
- Scale – Vertical + Horizontal = Showcase no downtime feature in Hazelcast
- BigMemory vs. HDC
- Ease of installation – two jars against multiple jars
- Config and Code changes – cache vs. maps, off-heap vs. HDC
- Portability of Client APIs – IMap, IQueue, Topics, etc.
- Added functionalities – Showcase IExecutorService, EntryProcessors, Multimap, etc.
- DSO – Showcase EntryProcessors taking place of DSO
- Live Q&A
Presenter:
Rahul Gupta, Senior Solutions Architect
Rahul is a technology-driven professional with 12+ years of experience in building and architecting highly scalable and concurrent, low latency business critical distributed infrastructure. His expertise lies in Big Data and Real Time Analytics space where he specializes in big data governing technologies and Enterprise Architecture. Rahul is an expert in working with decision makers across different business verticals within an organization and guiding them in right decision making through in-depth technical understanding, analysis and evaluation procedures to bring home critical deals with high business values.
JCConf 2016 - Cloud Computing Applications - Hazelcast, Spark and IgniteJoseph Kuo
This session aims to establish applications running against distributed and scalable system, or as we know cloud computing system. We will introduce you not only briefing of Hazelcast but also deeper kernel of it, and how it works with Spark, the most famous Map-reduce library. Furthermore, we will introduce another in-memory cache called Apache Ignite and compare it with Hazelcast to see what's the difference between them. In the end, we will give a demonstration showing how Hazelcast and Spark work together well to form a cloud-base service which is distributed, flexible, reliable, available, scalable and stable. You can find demo code here: https://github.com/CyberJos/jcconf2016-hazelcast-spark
https://cyberjos.blog/java/seminar/jcconf-2016-cloud-computing-applications-hazelcast-spark-and-ignite/
Sharing of Distributed Objects in a DX Cluster, thanks to Hazelcast - Online ...Jahia Solutions Group
Here are the slides of Jahia's Developers Meetup held online, via Cisco WebEx, on March 15, 2017.
The topic of this meetup was: Sharing of Distributed Objects in a DX Cluster, thanks to Hazelcast.
[OracleCode SF] In memory analytics with apache spark and hazelcastViktor Gamov
Apache Spark is a distributed computation framework optimized to work in-memory, and heavily influenced by concepts from functional programming languages.
Hazelcast - open source in-memory data grid capable of amazing feats of scale - provides wide range of distributed computing primitives computation, including ExecutorService, M/R and Aggregations frameworks.
The nature of data exploration and analysis requires data scientists be able to ask questions that weren't planned to be asked—and get an answer fast!
In this talk, Viktor will explore Spark and see how it works together with Hazelcast to provide a robust in-memory open-source big data analytics solution!
Making Clouds: Turning OpenNebula into a ProductNETWAYS
What does it takes to bring innovations like private clouds to small and medium enterprises? In the course of this talk we will present our experience in creating a self-service toolkit for creating a complete virtualization and cloud platform based on OpenNebula, as well as our experience gathered in tens of installations of all sizes. From scalable storage (with benchmarks!) to autonomic optimization, we will present what in our view is needed to bring private clouds to everyone, what components and additions we created to better solve our customers’ problems (from replacing industrial control systems to medium scale virtual desktop infrastructures), and why OpenNebula has been chosen over other competing cloud toolkits.
OpenNebulaConf 2013 - Making Clouds: Turning OpenNebula into a Product by Car...OpenNebula Project
What does it takes to bring innovations like private clouds to small and medium enterprises? In the course of this talk we will present our experience in creating a self-service toolkit for creating a complete virtualization and cloud platform based on OpenNebula, as well as our experience gathered in tens of installations of all sizes. From scalable storage (with benchmarks!) to autonomic optimization, we will present what in our view is needed to bring private clouds to everyone, what components and additions we created to better solve our customers’ problems (from replacing industrial control systems to medium scale virtual desktop infrastructures), and why OpenNebula has been chosen over other competing cloud toolkits.
Bio:
Carlo Daffara the Technical director of Cloudweavers, and formerly head of research and development at Conecta, a consulting firm specializing in open source systems and distributed computing; Italian member of the European Working Group on Libre Software and co-coordinator of the working group on SMEs of the EU ICT task force on competitiveness. Since 1999, works as evaluator for IST programme submissions in the field of component-based software engineering, GRIDs and international cooperation. Coordinator of the open source platforms technical area of the IEEE technical committee on scalable computing, co-chair of the SIENA EU cloud initiative roadmap editorial board and part of the editorial review board of the International Journal of Open Source Software & Processes (IJOSSP).
28March2024-Codeless-Generative-AI-Pipelines
https://www.meetup.com/futureofdata-princeton/events/299440871/
https://www.meetup.com/real-time-analytics-meetup-ny/events/299290822/
******Note*****
The event is seat-limited, therefore please complete your registration here. Only people completing the form will be able to attend.
-----------------------
We're excited to invite you to join us in-person, for a Real-Time Analytics exploration!
Join us for an evening of insights, networking as we delve into the OSS technologies shaping the field!
Agenda:
05:30-06:00: Pizza and friends
06:00- 06:40: Codeless GenAI Pipelines with Flink, Kafka, NiFi
06:40- 07:20 Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders
07:20-07:30 QNA
Codeless GenAI Pipelines with Flink, Kafka, NiFi | Tim Spann, Cloudera
Explore the power of real-time streaming with GenAI using Apache NiFi. Learn how NiFi simplifies data engineering workflows, allowing you to focus on creativity over technical complexities. I'll guide you through practical examples, showcasing NiFi's automation impact from ingestion to delivery. Whether you're a seasoned data engineer or new to GenAI, this talk offers valuable insights into optimizing workflows. Join us to unlock the potential of real-time streaming and witness how NiFi makes data engineering a breeze for GenAI applications!
Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders | Viktor Gamov, StarTree
Explore how industry leaders like LinkedIn, Uber Eats, and Stripe are mastering real-time data with Viktor as your guide. Discover how Apache Pinot transforms data into actionable insights instantly. Viktor will showcase Pinot's features, including the Star-Tree Index, and explain why it's a game-changer in data strategy. This session is for everyone, from data geeks to business gurus, eager to uncover the future of tech. Join us and be wowed by the power of real-time analytics with Apache Pinot!
-------
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera.
He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more.
OS for AI: Elastic Microservices & the Next Gen of MLNordic APIs
AI has been a hot topic lately, with advances being made constantly in what is possible, there has not been as much discussion of the infrastructure and scaling challenges that come with it. How do you support dozens of different languages and frameworks, and make them interoperate invisibly? How do you scale to run abstract code from thousands of different developers, simultaneously and elastically, while maintaining less than 15ms of overhead?
At Algorithmia, we’ve built, deployed, and scaled thousands of algorithms and machine learning models, using every kind of framework (from scikit-learn to tensorflow). We’ve seen many of the challenges faced in this area, and in this talk I’ll share some insights into the problems you’re likely to face, and how to approach solving them.
In brief, we’ll examine the need for, and implementations of, a complete “Operating System for AI” – a common interface for different algorithms to be used and combined, and a general architecture for serverless machine learning which is discoverable, versioned, scalable and sharable.
Linux Distribution Collaboration …on a Mainframe!All Things Open
Presented at All Things Open 2023
Presented by Elizabeth K. Joseph - IBM
Title: Linux Distribution Collaboration …on a Mainframe!
Abstract: Linux has run on the mainframe architecture (s390x) for over 20 years now, and there’s even Linux-only mainframe hardware! But tight collaboration between the Linux distributions is rather new. Enter the Open Mainframe Project Linux Distributions Working Group, founded in late 2021.
Bringing together various Linux distributions, both corporate-backed and community-driven, representatives from openSUSE, Debian, Fedora, SUSE, and more immediately joined the effort to share bug reports and patches that impact all the distributions. Issues are often shared and discussed on the mailing list, and more complicated topics covered during the monthly meetings. The working group has a number of success stories that will be shared.
Future potential issues are also tackled, and notes shared about upstream changes that may soon impact the package processes. In the latest effort, the team has started thinking about actual upstream projects to invite to our group to be more pro-active about changes that may cause problems on the s390x architecture.
But more importantly, this is a story about community and collaboration. Many people view the various Linux distributions as a competitive space, but like so much of the open source software community, we are all more successful when we share knowledge about our core. The success of this working group, and growing enthusiasm for it from new Linux distributions who are joining, is a great example of this.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)DataWorks Summit
Once the staple of HPC clusters, today high-performance network and storage devices are everywhere. For a fraction of the cost, one can rent 40/100 Gbps RDMA networks and high-end NVMe flash devices supporting 10s GB/s bandwidths, less than 100 microseconds of latencies, with millions of IOPS. How does one leverage this phenomenal performance for popular data processing frameworks such as Apache Spark, Flink, Hadoop that we all know and love?
In this talk, I will introduce the Apache Crail (Incubating), which is a fast, distributed data store that is designed specifically for high-performance network and storage devices. The goal of the project is to deliver the true hardware performance to Apache data processing frameworks in the most accessible way. With its modular design, Crail supports multiple storage back ends (DRAM, NVMe Flash, and 3D XPoint) and networking protocols (RDMA and TPC/sockets). Crail provides multiple flexible APIs (file system, KV, HDFS, streaming) for a better integration with the high-level data access operations in Apache compute frameworks. As a result, on a 100 Gbps network infrastructure, Crail delivers all-to-all shuffle operations at 80+ Gbps speed, broadcast operations at less than 10 usec latencies, and more than 8M lookups/namenode, etc. Moreover, Crail is a generic solution that integrates well with the Apache ecosystem including frameworks like Spark, Hadoop, Hive, etc.
I will present the case for Crail, its current status, and future plans. As Crail is a young Apache project, we are seeking to build a community and expand its application to other interesting domains.
Speaker
Animesh Trivedi, IBM Research, Research Staff Member (RSM)
Unified Batch and Real-Time Stream Processing Using Apache FlinkSlim Baltagi
This talk was given at Capital One on September 15, 2015 at the launch of the Washington DC Area Apache Flink Meetup. Apache flink is positioned at the forefront of 2 major trends in Big Data Analytics:
- Unification of Batch and Stream processing
- Multi-purpose Big Data Analytics frameworks
In these slides, we will also find answers to the burning question: Why Apache Flink? You will also learn more about how Apache Flink compares to Hadoop MapReduce, Apache Spark and Apache Storm.
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...confluent
Watch this talk here: https://www.confluent.io/online-talks/scaling-security-on-100s-of-millions-of-mobile-devices-using-kafka-and-scylla-on-demand
Join mobile cybersecurity leader Lookout as they talk through their data ingestion journey.
Lookout enables enterprises to protect their data by evaluating threats and risks at post-perimeter endpoint devices and providing access to corporate data after conditional security scans. Their continuous assessment of device health creates a massive amount of telemetry data, forcing new approaches to data ingestion. Learn how Lookout changed its approach in order to grow from 1.5 million devices to 100 million devices and beyond, by implementing Confluent Platform and switching to Scylla.
TYPO3 CMS v8 in the cloud
This session will look into changes happening with TYPO3 CMS version 8 and how they relate to an improved integration with cloud infrastructure:
untangled file-handling for better support
untangled database abstraction layer to support different database backends
updated and now finely tuneable caching framework
composer changes for repeatable builds
possible pre-compilation for uid
Finally the session will also look into the practical example of deploying TYPO3 into the platform.sh cloud to kickstart the audience.
The NRB Group mainframe day 2021 - Containerisation on Z - Paul Pilotto - Seb...NRB
Containerization on IBM Z : the notion of containers, their principles, how it works, their benefits on IBM Z and the reasons to adopt containers.
The second part of the presentation focuses on the various solutions available on IBM Z to run and execute your containers at the best place, on IBM Z !
Extending DevOps to Big Data Applications with KubernetesNicola Ferraro
DevOps, continuous delivery and modern architectural trends can incredibly speed up the software development process. Big Data applications cannot be an exception and need to keep the same pace.
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...Rafael Ferreira da Silva
Scientific workflows are used routinely in numerous scientific domains, and Workflow Management Systems (WMSs) have been developed to orchestrate and optimize workflow executions on distributed platforms. WMSs are complex software systems that interact with complex software infrastructures. Most WMS research and development activities rely on empirical experiments conducted with full-fledged software stacks on actual hardware platforms. Such experiments, however, are limited to hardware and software infrastructures at hand and can be labor- and/or time-intensive. As a result, relying solely on real- world experiments impedes WMS research and development. An alternative is to conduct experiments in simulation.
In this work we present WRENCH, a WMS simulation framework, whose objectives are (i) accurate and scalable simula- tions; and (ii) easy simulation software development. WRENCH achieves its first objective by building on the SimGrid framework. While SimGrid is recognized for the accuracy and scalability of its simulation models, it only provides low-level simulation abstractions and thus large software development efforts are required when implementing simulators of complex systems. WRENCH thus achieves its second objective by providing high- level and directly re-usable simulation abstractions on top of SimGrid. After describing and giving rationales for WRENCH’s software architecture and APIs, we present a case study in which we apply WRENCH to simulate the Pegasus production WMS. We report on ease of implementation, simulation accuracy, and simulation scalability so as to determine to which extent WRENCH achieves its two above objectives. We also draw both qualitative and quantitative comparisons with a previously proposed workflow simulator.
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
Les Buildpacks existent depuis plus de 10 ans ! D’abord, ils étaient utilisés pour détecter et construire une application avant de la déployer sur certains PaaS. Ensuite, nous avons pu créer des images Docker (OCI) avec leur dernière génération, les Cloud Native Buildpacks (CNCF en incubation). Sont-ils une bonne alternative au Dockerfile ? Que sont les buildpacks Paketo ? Quelles communautés les soutiennent et comment ?
Venez le découvrir lors de cette session ignite
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
May Marketo Masterclass, London MUG May 22 2024.pdfAdele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. It’s here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Navigating the Metaverse: A Journey into Virtual Evolution"Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms."
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
3. about me
3
core developer at hazelcast
holds bsc. computer engineering
started programming some time ago,
then turned into a career
lives in beautiful istanbul
interested in distributed systems
4. distributed computing
4
use of bunch of computers to solve a
computational problem
problem is divided into multiple tasks
and they are solved by one or more
computers
computers communicate each other
by sending messages
5. in-memory data grids
5
middleware software
shared nothing architecture
manages objects across distributed
servers in the RAM
ability to scale
provides fault tolerance
6. why use an imdg?
6
performance - ram is faster
flexibility - rich set of data structures
operations - easy to scale/maintain
7. other imdg solutions
7
oracle coherence
ibm extremescale
vmware gemfire
gigaspaces
redhat infinispan
gridgain
terracotta
11. a company
11
hazelcast enterprise edition
management center
enterprise support
training / consulting
offices in istanbul (r&d), palo alto(hq)
and london
12. an open-source project
12
leading open-source in-memory data
grid.
dead simple distributed
programming
easy way to scale applications
simple api
built with in Istanbul
13. use cases
13
scaling your application
sharing data across cluster
partitioning data
sending/receiving messages
load balancing
session replication
parallel task processing on multiple
machines
…
14. how differs ?
14
apache licensed open source
lightweight w/o any dependency
ease of use and more fun !
15. 15
fact : every ~0.4 second a hazelcast node is started
around the world
a lot of developers :)
who uses ?
17. 17
what is ?
distributed impl. of Java Collections
dynamic clustering, backup and
failover
transaction support ( two phase, XA)
distributed execution framework
map/reduce api
distributed queries
native Java, C#, C++ clients
33. service provider interface
( SPI )
33
roll your own services
extend hazelcast based on your needs !
hierarchical lock service
priority queue
scheduled executor service
distributed actors
anything you can think of !
check out SPI section of the hazelcast
documentation