This tutorial was presented at the KDD 2016 conference in San Francisco, CA. You can find the main presentation at http://www.slideshare.net/NeeraAgarwal2/streaming-analytics
This talk will explore two libraries, a Cassandra native CQL client and a Clojure DSL for writing CQL3 queries.
It will demonstrate how well Cassandra and Clojure fit together, show the strengths of the functional approach in this domain, and in particular how Clojure's data-centric nature makes a lot of sense in this context.
You may all know that JSON is a subset of JavaScript, but... Did you know that HTML5 implements NoSQL databases? Did you know that JavaScript was recommended for REST by Roy T. Fielding himself? Did you know that map & reduce are part of the native JavaScript API? Did you know that most NoSQL solutions integrate a JavaScript engine? CouchDB, MongoDB, WakandaDB, ArangoDB, OrientDB, Riak.... And when they don't, they have a shell client which does...
The story of NoSQL and JavaScript goes beyond your expectations and opens more opportunities than you might imagine... What better match could you find than a flexible and dynamic language for schemaless databases? Isn't an event-driven language exactly what you were waiting for to manage eventual consistency? When NoSQL doesn't come to JavaScript, JavaScript comes to NoSQL, and does it very well...
A fast message queue based on ZooKeeper.
Jafka mq is a distributed publish-subscribe messaging system cloned from Apache Kafka.
It has the following features:
(1) Persistent messaging with O(1) disk structures that provide constant-time performance even with many TB of stored messages.
(2) High throughput: even with very modest hardware, a single broker can support hundreds of thousands of messages per second.
(3) Explicit support for partitioning messages over broker servers and distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics.
(4) Simple message format for clients in many languages.
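The per-partition ordering in feature (3) and the O(1) append-only disk structure in feature (1) can be sketched in a few lines of Python. This is an illustrative toy, not Jafka's actual implementation; the hash-based partitioner and the `PartitionLog` class are assumptions for demonstration only.

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Equal keys always map to the same partition, which is what
    # preserves per-key ordering under partitioned consumption.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

class PartitionLog:
    # An append-only log per partition is what makes disk writes O(1):
    # every message is appended at the end offset, never rewritten.
    def __init__(self):
        self.messages = []

    def append(self, payload: bytes) -> int:
        self.messages.append(payload)
        return len(self.messages) - 1  # offset of the new message

logs = [PartitionLog() for _ in range(4)]
for key, value in [(b"user-1", b"a"), (b"user-2", b"b"), (b"user-1", b"c")]:
    logs[partition_for(key, len(logs))].append(value)
```

Because `partition_for` is deterministic, all messages keyed `b"user-1"` land in the same partition log and are read back in the order they were appended, regardless of what happens on other partitions.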
New languages bring new ways of thinking and teach us new principles and tools that we can bring back to our day-to-day language.
Using a real application as an example, we will learn how to build and design Java applications that follow Clojure’s functional principles using just core Java, without any libraries, lambdas, streams or weird syntax; and we will see what benefits those functional principles can bring.
Continuing where module 2 left off, this part of the course explains signals and slots in more detail and tells you how to extend functionality of existing widgets by subclassing them. In real applications, widgets are often used in dialogs or inside the main window, which is a container for widgets and by default supports menus, toolbars and actions. These topics are all demonstrated via small examples.
Basically everything you need to get started on your ZooKeeper training, and to set up Apache Hadoop high availability with QJM and automatic failover.
Object Oriented Code RE with HexraysCodeXplorer - Alex Matrosov
In recent times we have seen a large spike in complex threats with elaborate object-oriented architectures, the most notorious examples being Stuxnet, Flamer, and Duqu.
The approaches to analyzing such malware are rather distinct from those used for malware developed in procedural programming languages.
This presentation will take an in-depth look at challenges related to reversing object-oriented code with respect to modern malware and demonstrate approaches and tools employed for reversing object-oriented code.
Slides for a presentation on ZooKeeper I gave at the Near Infinity (www.nearinfinity.com) spring 2012 conference.
The associated sample code is on GitHub at https://github.com/sleberknight/zookeeper-samples
HKG15-211: Advanced Toolchain Usage Part 4
---------------------------------------------------
Speaker: Ryan Arnold, Maxim Kuvyrkov, Will Newton, Yvan Roux
Date: February 10, 2015
---------------------------------------------------
★ Session Summary ★
This session is a continuation of the Advanced Toolchain Usage Part 1 & 2 presentations given at LCU14. Parts 3 and 4 will cover a variety of topics, such as: linker tips and tricks, adding symbol versioning interfaces to a system library, debugging the dynamic linker, debugging applications that use malloc, GCC attributes, manually constructing a backtrace on ARM & AArch64, how to add lightweight debugging to your program, how to use a signal handler appropriately, and TLS models on AArch64 and when to use them.
--------------------------------------------------
★ Resources ★
Pathable: https://hkg15.pathable.com/meetings/250792
Video: https://www.youtube.com/watch?v=9AcklY0Cc7U
Etherpad: http://pad.linaro.org/p/hkg15-211
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2015 - #HKG15
February 9-13th, 2015
Regal Airport Hotel Hong Kong Airport
---------------------------------------------------
http://www.linaro.org
New Indexing and Aggregation Pipeline Capabilities in MongoDB 4.2 - Antonios Giannopoulos
MongoDB 4.2 goes GA soon, delivering some amazing new features in multiple areas. In this talk, we will focus on the new capabilities of the aggregation framework. We are going to cover the new operators and expressions. At the same time, we will explore how update commands can now use the aggregation framework operators. We are also going to present aggregation framework improvements, focusing on on-demand materialized views. Finally, we are going to explore the wildcard indexes introduced in MongoDB 4.2 and how they change the way we design documents and build queries/aggregations. We will also make a reference to the new index build system.
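To make the aggregation-framework discussion concrete, here is the shape of a typical pipeline ($match followed by $group) alongside a toy in-memory evaluator. The pipeline syntax is MongoDB's; the evaluator, the collection, and the field names are illustrative stand-ins (not from the talk) for what `db.orders.aggregate(pipeline)` would do server-side.

```python
def run_pipeline(docs, pipeline):
    """Toy evaluator for a tiny subset of the aggregation pipeline:
    $match on equality and $group with $sum. For illustration only;
    a real application sends the same pipeline to the server via
    db.collection.aggregate(pipeline)."""
    for stage in pipeline:
        if "$match" in stage:
            cond = stage["$match"]
            docs = [d for d in docs if all(d.get(k) == v for k, v in cond.items())]
        elif "$group" in stage:
            spec = stage["$group"]
            key_field = spec["_id"].lstrip("$")
            groups = {}
            for d in docs:
                group = groups.setdefault(
                    d[key_field],
                    {"_id": d[key_field], **{f: 0 for f in spec if f != "_id"}},
                )
                for field, expr in spec.items():
                    if field != "_id":
                        group[field] += d[expr["$sum"].lstrip("$")]
            docs = list(groups.values())
    return docs

# Hypothetical orders collection and a pipeline over it.
orders = [
    {"status": "paid", "customer": "a", "amount": 10},
    {"status": "paid", "customer": "a", "amount": 5},
    {"status": "open", "customer": "b", "amount": 7},
]
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
]
result = run_pipeline(orders, pipeline)  # [{'_id': 'a', 'total': 15}]
```

In 4.2 the same pipeline operators can also appear inside update commands, which is the change the talk highlights.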
From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016 - Susan Potter
This talk will introduce the audience to the Nix packaging system, NixOS, and related ecosystem tools for Erlang/Elixir developers.
By reviewing common development, testing, and deployment problems we will look at what Nix has to offer to aid Erlang/Elixir developers in these areas.
From seamless developer environment bootstrapping to consistent CI environments and beyond.
These slides were designed for an Apache Hadoop + Apache Apex workshop (university program).
The audience was mainly third-year engineering students from Computer, IT, Electronics, and Telecom disciplines.
I tried to keep it simple for beginners to understand. Some of the examples use context from India, but in general this should be a good starting point for beginners.
Advanced users/experts may not find this relevant.
The end of polling : why and how to transform a REST API into a Data Streamin... - Audrey Neveu
We know interactivity is the key to keeping our users’ interest alive, but we can’t reduce animation to UI anymore. Twitter, Waze, Slack… users are used to having real-time data in the applications they love. But how can you turn your static API into a stream of data?
When talking about data streaming, we often think about WebSockets. But have you ever heard of Server-Sent Events? In this tools-in-action session we will compare both technologies to understand which one you should opt for depending on your use case, and I’ll show you how we have gone even further by reducing the amount of data to transfer with JSON-Patch.
And because real-time data is not only needed by the web (and because it’s much more fun), I’ll show you how we can make a drone dance on streamed APIs.
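The JSON-Patch idea mentioned above is easy to demonstrate: instead of re-sending the whole document on every event, the server streams small patch operations and the client folds them into its local state. Below is a minimal, stdlib-only sketch of applying a subset of JSON Patch (RFC 6902); the document shape and field names are invented for illustration, and array indices and '~'-escaped path segments are deliberately not handled.

```python
def apply_patch(doc, patch):
    """Apply a minimal subset of JSON Patch (RFC 6902): 'add',
    'replace' and 'remove' on nested object paths."""
    for op in patch:
        *parents, leaf = [seg for seg in op["path"].split("/") if seg]
        target = doc
        for key in parents:          # walk down to the parent object
            target = target[key]
        if op["op"] in ("add", "replace"):
            target[leaf] = op["value"]
        elif op["op"] == "remove":
            del target[leaf]
    return doc

# Hypothetical ticker state kept by a streaming client; each server
# event carries a patch instead of the full document.
state = {"symbol": "EURUSD", "price": {"bid": 1.10, "ask": 1.12}}
patch = [{"op": "replace", "path": "/price/bid", "value": 1.11}]
apply_patch(state, patch)            # state["price"]["bid"] is now 1.11
```

A Server-Sent Events stream would deliver one such patch list per event, which is far less data than pushing the whole document each time.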
Real Time Analytics with Apache Cassandra - Cassandra Day Munich - Guido Schmutz
Time series data is everywhere: IoT, sensor data or financial transactions. The industry has moved to databases like Cassandra to handle the high velocity and high volume of data that is now commonplace. In this talk I will present how we have used Cassandra to store time series data. I will highlight both the Cassandra data model as well as the architecture we put in place for collecting and ingesting data into Cassandra, using Apache Kafka and Apache Storm.
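The classic Cassandra time-series model usually combines a composite partition key with a time-based bucket, so no partition grows without bound and each bucket's rows come back in time order. The schema and names below are an illustrative sketch under that pattern, not necessarily the model from the talk:

```python
from datetime import datetime, timezone

# Illustrative CQL for a bucketed time-series table: the composite
# partition key (sensor_id, day) caps partition growth at one day of
# readings, and clustering by ts keeps each day's rows in time order.
CREATE_TABLE = """
CREATE TABLE readings (
    sensor_id text,
    day text,
    ts timestamp,
    value double,
    PRIMARY KEY ((sensor_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
"""

def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Derive the (sensor_id, day) partition key a writer would target."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))

key = partition_key("sensor-42", datetime(2015, 6, 1, 13, 30, tzinfo=timezone.utc))
# key == ("sensor-42", "2015-06-01")
```

An ingest pipeline (e.g. Kafka into Storm, as in the talk) would compute this key for each reading before writing, so all of a sensor's readings for a day land in one partition.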
Real-time Stream Processing with Apache Flink @ Hadoop Summit - Gyula Fóra
Apache Flink is an open source project that offers both batch and stream processing on top of a common runtime, exposing a common API. This talk focuses on the stream processing capabilities of Flink.
RBea: Scalable Real-Time Analytics at King - Gyula Fóra
This talk introduces RBEA (Rule-Based Event Aggregator), the scalable real-time analytics platform developed by King’s Streaming Platform team. We have built RBEA to make real-time analytics easily accessible to game teams across King without having to worry about operational details. RBEA is built on top of Apache Flink and uses the framework’s capabilities to its full potential in order to provide highly scalable stateful and windowed processing logic for the analytics applications. We will talk about how we have built a high-level DSL on the abstractions provided by Flink and how we tackled different technical challenges that came up while developing the system.
Large-Scale Stream Processing in the Hadoop Ecosystem - Gyula Fóra
Distributed stream processing is one of the hot topics in big data analytics today. An increasing number of applications are shifting from traditional static data sources to processing the incoming data in real-time. Performing large scale stream processing or analysis requires specialized tools and techniques which have become publicly available in the last couple of years.
This talk will give a deep, technical overview of the top-level Apache stream processing landscape. We compare several frameworks including Spark, Storm, Samza and Flink. Our goal is to highlight the strengths and weaknesses of the individual systems in a project-neutral manner to help select the best tools for the specific applications. We will touch on the topics of API expressivity, runtime architecture, performance, fault-tolerance and strong use cases for the individual frameworks.
Real-time analytics as a service at King - Gyula Fóra
This talk introduces RBea, our scalable real-time analytics platform at King built on top of Apache Flink. The design goal of RBea is to make stream analytics easily accessible to game teams across King. RBea is powered by Apache Flink and uses the framework’s capabilities to its full potential in order to provide highly scalable stateful and windowed processing logic for the analytics applications. RBea provides a high-level scripting DSL that is more approachable to developers without stream-processing experience and uses code generation to execute user scripts efficiently at scale.
In this talk I will cover the technical details of the RBea architecture and will also look at what real-time analytics brings to the table from the business perspective. If time permits I will also give some outlook on our future plans to generalise and further grow the platform.
Real Time Analytics with Apache Cassandra - Cassandra Day Berlin - Guido Schmutz
Time series data is everywhere: IoT, sensor data or financial transactions. The industry has moved to databases like Cassandra to handle the high velocity and high volume of data that is now commonplace. In this talk I will present how we have used Cassandra to store time series data. I will highlight both the Cassandra data model as well as the architecture we put in place for collecting and ingesting data into Cassandra, using Apache Kafka and Apache Storm.
This presentation examines some of the top stream analytics platforms in the enterprise. The slide deck explores the characteristics of enterprise stream analytics solutions and discusses the capabilities of some of the top stream analytics platforms in the current market.
Processing data from social media streams and sensors in real time is becoming increasingly prevalent, and there are plenty of open source solutions to choose from. To help practitioners decide what to use when, we compare three popular Apache projects for stream processing: Apache Storm, Apache Spark and Apache Samza.
Reliable Data Ingestion in Big Data / IoT - Guido Schmutz
Many Big Data and IoT use cases are based on combining data from multiple data sources and making it available on a Big Data platform for analysis. The data sources are often very heterogeneous, from simple files and databases to high-volume event streams from sensors (IoT devices). It’s important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real time (stream processing) as well as in batch (typical big data processing). Recently, some new tools have emerged which are especially capable of handling the process of integrating data from outside, often called data ingestion. From an outside perspective, they are very similar to traditional Enterprise Service Bus infrastructures, which larger organizations often use to handle message-driven and service-oriented systems. But there are also important differences: they are typically easier to scale in a horizontal fashion, offer a more distributed setup, are capable of handling high volumes of data/messages, provide very detailed monitoring at the message level, and integrate very well with the Hadoop ecosystem. This session will present and compare Apache Flume, Apache NiFi, StreamSets and the Kafka ecosystem, and show how they handle data ingestion in a Big Data solution architecture.
More complex streaming applications generally need to store some state of the running computations in a fault-tolerant manner. This talk discusses the concept of operator state and compares state management in current stream processing frameworks such as Apache Flink Streaming, Apache Spark Streaming, Apache Storm and Apache Samza.
We will go over the recent changes in Flink streaming that introduce a unique set of tools to manage state in a scalable, fault-tolerant way backed by a lightweight asynchronous checkpointing algorithm.
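The snapshot/restore contract at the heart of operator state can be sketched independently of any framework. The toy Python operator below is an illustration of the idea only, not Flink's API: real Flink snapshots state asynchronously, coordinated by checkpoint barriers flowing through the dataflow.

```python
import copy

class CountingOperator:
    """Toy keyed-count operator showing the state save/restore contract
    behind checkpointing: processing mutates in-memory state, and a
    snapshot captures that state so a restarted instance can resume."""
    def __init__(self):
        self.counts = {}          # operator state: count per key

    def process(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1

    def snapshot(self):
        return copy.deepcopy(self.counts)   # durable checkpoint

    def restore(self, checkpoint):
        self.counts = copy.deepcopy(checkpoint)

op = CountingOperator()
for event in ["a", "b", "a"]:
    op.process(event)
checkpoint = op.snapshot()   # state checkpointed: {"a": 2, "b": 1}
op.process("a")              # more events arrive after the checkpoint...
op2 = CountingOperator()     # ...the job fails and a fresh instance starts
op2.restore(checkpoint)      # it resumes from the checkpointed state
```

Restoring the last snapshot and replaying the events received since then reproduces the lost state, which is the essence of fault-tolerant stateful processing.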
Talk presented in the Apache Flink Bay Area Meetup group on 08/26/15
Independent of the source of data, the integration and analysis of event streams is getting more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably; they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events.
So far this has mostly been a development experience, with frameworks such as Oracle Event Processing, Apache Storm or Spark Streaming. With Oracle Stream Analytics, analytics on event streams can be put in the hands of the business analyst. It simplifies the implementation of event processing solutions so that every business analyst is able to graphically and declaratively define event stream processing pipelines, without having to write a single line of code or continuous query language (CQL). Event processing is no longer “complex”! This session presents Oracle Stream Analytics directly on some selected demo use cases.
Apache Kafka - Scalable Message-Processing and more! - Guido Schmutz
Independent of the source of data, the integration of event streams into an enterprise architecture gets more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably; they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable messaging broker, built for exchanging huge amounts of messages between a source and a target.
This session will start with an introduction to Apache Kafka and present the role of Apache Kafka in a modern data/information architecture and the advantages it brings to the table. Additionally, the Kafka ecosystem will be covered, as well as the integration of Kafka in the Oracle stack, with products such as GoldenGate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
Big Data Architectures @ JAX / BigDataCon 2016 - Guido Schmutz
Every IT project stands or falls with its architecture. This is even more true for big data projects, where no standards have had decades to prove themselves. Nevertheless, good and effective solutions are spreading and becoming established here as well. This talk explains which building blocks are important for the various use cases in the big data field, and how they can be cast into concrete solutions. It examines both traditional big data architectures and current approaches such as the Lambda and Kappa architectures. Stream processing infrastructures and their combination with big data technologies are also covered. Starting from a product- and technology-independent reference architecture, this talk presents various solution options based on open source components.
Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)... - Amazon Web Services
This presentation will introduce Kinesis, the new AWS service for real-time streaming big data ingestion and processing.
We’ll provide an overview of the key scenarios and business use cases suitable for real-time processing, and how Kinesis can help customers shift from a traditional batch-oriented processing of data to a continual real-time processing model. We’ll explore the key concepts, attributes, APIs and features of the service, and discuss building a Kinesis-enabled application for real-time processing. We’ll walk through a candidate use case in detail, starting from creating an appropriate Kinesis stream for the use case, configuring data producers to push data into Kinesis, and creating the application that reads from Kinesis and performs real-time processing. This talk will also include key lessons learnt, architectural tips and design considerations in working with Kinesis and building real-time processing applications.
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016 - Gyula Fóra
Distributed stream processing is one of the hot topics in big data analytics today. An increasing number of applications are shifting from traditional static data sources to processing the incoming data in real time. Performing large-scale stream analysis requires specialized tools and techniques which have become widely available in the last couple of years. This talk will give a deep, technical overview of the Apache stream processing landscape. We compare several frameworks including Flink, Spark, Storm, Samza and Apex. Our goal is to highlight the strengths and weaknesses of the individual systems in a project-neutral manner to help select the best tools for the specific applications. We will touch on the topics of API expressivity, runtime architecture, performance, fault-tolerance and strong use cases for the individual frameworks. This talk is targeted towards anyone interested in streaming analytics, either from a user’s or a contributor’s perspective. The attendees can expect to get a clear view of the available open-source stream processing architectures.
Spark Streaming has quickly established itself as one of the more popular streaming engines running on the Hadoop ecosystem. Not only does it provide integration with many types of message brokers and stream sources, but it also provides the ability to leverage other major modules in Spark like Spark SQL and MLlib in conjunction. This allows businesses and developers to make use of data in ways they couldn’t hope to in the past.
However, while building a Spark Streaming pipeline, it’s not sufficient to only know how to express your business logic. Operationalizing these pipelines and running the application with high uptime and continuous monitoring involves a lot of operational challenges. Fortunately, Spark Streaming makes all that easy as well. In this talk, we’ll go over some of the main steps you’ll need to take to get your Spark Streaming application ready for production, specifically in conjunction with Kafka. This includes steps to gracefully shut down your application, steps to perform upgrades, monitoring, various useful Spark configurations and more.
Hands-on Lab to compare and contrast relational queries (using RDS for MySQL) with non-relational queries (using ElastiCache for Redis). You’ll need a laptop with a Firefox or Chrome browser.
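The lab's relational-vs-non-relational contrast boils down to the cache-aside access pattern: check the key-value store first, fall back to SQL on a miss. The sketch below substitutes stdlib pieces so it runs anywhere: SQLite stands in for RDS/MySQL and a plain dict for ElastiCache/Redis, and the table and key names are invented for illustration.

```python
import sqlite3

# SQLite stands in for RDS/MySQL and a dict for ElastiCache/Redis so
# the sketch is runnable anywhere; the access pattern is the point.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'alice')")
cache = {}  # stand-in for Redis

def get_user(user_id):
    """Cache-aside read: try the key-value cache, fall back to SQL."""
    key = f"user:{user_id}"
    if key in cache:                 # cache hit: no SQL round trip
        return cache[key]
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return None
    cache[key] = row[0]              # populate the cache for next time
    return row[0]

first = get_user(1)   # miss: reads from SQL, then caches
second = get_user(1)  # hit: served straight from the cache
```

With Redis the dict lookups would become `GET`/`SET` calls, but the read-through-on-miss flow is identical.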
The overall evolution towards microservices has caused a lot of IT leaders to radically rethink architectures and platforms. One can hardly keep up with the rapid onslaught of new distributed technologies. The same people who just asked yesterday "how can we deploy Docker containers?" are now asking "how can we operate Kubernetes-as-a-Service on-premise?", and are about to start asking "how can we operate the open source frameworks of our choice, such as Spark, TensorFlow, HDFS, and more, as a service across hybrid clouds?". This session will discuss: Challenges of orchestrating and operating.
by Jeff Duffy, Database Specialist Solution Architect, AWS
Database Week at the AWS Loft is an opportunity to learn about Amazon’s broad and deep family of managed database services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into Amazon RDS and Amazon Aurora relational databases, Amazon DynamoDB non-relational databases, Amazon Neptune graph databases, and Amazon ElastiCache managed Redis, along with options for database migration, caching, search and more. You'll learn how to get started, how to support applications, and how to scale.
Today there are a lot of cloud providers with a wide range of offers. Web projects usually have continuously changing needs: what worked well yesterday may not be enough today. These two facts became quite obvious for us while migrating a large PHP application from Rackspace to Amazon. In this session I’d like to share this experience, highlighting infrastructure and code evolution, migration steps, cost analysis, and issues.
by Ganesh Shankaran, Sr. Solutions Architect, AWS
Hands-on Lab to compare and contrast relational queries (using RDS for MySQL) with non-relational queries (using ElastiCache for Redis). You’ll need a laptop with a Firefox or Chrome browser.
The Vawtrak Trojan, also known as Neverquest or Snifula, has been an enduring banking Trojan for a long time and is still one of the most prevalent banking Trojans in the wild today.
This report forms part of Blueliv’s investigation into the Vawtrak group. Reversing the Trojan was a mandatory element of this investigation in order to understand and track the cybercriminal groups and malicious actors behind the Vawtrak malware. Technical security researchers and reverse engineers can use this report to increase their understanding of how the banking Trojan works.
In this meetup, Liran Cohen, Cloud platform & DevOps Team Leader, will talk about some of Kubernetes key concepts. We will learn about the architecture of the system; the different resources available in the system; the problems it’s trying to solve, and the model that it uses to manage containerized application deployments.
Similar to KDD 2016 Streaming Analytics Tutorial (20)
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. It’s here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
Les Buildpacks existent depuis plus de 10 ans ! D’abord, ils étaient utilisés pour détecter et construire une application avant de la déployer sur certains PaaS. Ensuite, nous avons pu créer des images Docker (OCI) avec leur dernière génération, les Cloud Native Buildpacks (CNCF en incubation). Sont-ils une bonne alternative au Dockerfile ? Que sont les buildpacks Paketo ? Quelles communautés les soutiennent et comment ?
Venez le découvrir lors de cette session ignite
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfJay Das
With the advent of artificial intelligence or AI tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT, and Bard organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
top nidhi software solution freedownloadvrstrong314
This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteGoogle
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
👉👉 Click Here To Get More Info 👇👇
https://sumonreview.com/ai-pilot-review/
AI Pilot Review: Key Features
✅Deploy AI expert bots in Any Niche With Just A Click
✅With one keyword, generate complete funnels, websites, landing pages, and more.
✅More than 85 AI features are included in the AI pilot.
✅No setup or configuration; use your voice (like Siri) to do whatever you want.
✅You Can Use AI Pilot To Create your version of AI Pilot And Charge People For It…
✅ZERO Manual Work With AI Pilot. Never write, Design, Or Code Again.
✅ZERO Limits On Features Or Usages
✅Use Our AI-powered Traffic To Get Hundreds Of Customers
✅No Complicated Setup: Get Up And Running In 2 Minutes
✅99.99% Up-Time Guaranteed
✅30 Days Money-Back Guarantee
✅ZERO Upfront Cost
See My Other Reviews Article:
(1) TubeTrivia AI Review: https://sumonreview.com/tubetrivia-ai-review
(2) SocioWave Review: https://sumonreview.com/sociowave-review
(3) AI Partner & Profit Review: https://sumonreview.com/ai-partner-profit-review
(4) AI Ebook Suite Review: https://sumonreview.com/ai-ebook-suite-review
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
Experience our free, in-depth three-part Tendenci Platform Corporate Membership Management workshop series! In Session 1 on May 14th, 2024, we began with an Introduction and Setup, mastering the configuration of your Corporate Membership Module settings to establish membership types, applications, and more. Then, on May 16th, 2024, in Session 2, we focused on binding individual members to a Corporate Membership and Corporate Reps, teaching you how to add individual members and assign Corporate Representatives to manage dues, renewals, and associated members. Finally, on May 28th, 2024, in Session 3, we covered questions and concerns, addressing any queries or issues you may have.
For more Tendenci AMS events, check out www.tendenci.com/events
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
2. Streaming Analytics
Before doing this tutorial, please read the main presentation.
http://www.slideshare.net/NeeraAgarwal2/streaming-analytics
3. Technical Requirements
● OS: Mac OS X
● Programming Language: Scala 2.10.x
● Open source software used in the tutorial: Kafka 0.10.0.0 and Spark 1.6.2
4. Tutorials
(Architecture diagram: Producers send Wiki Edit Events, Ad Events, and Click Events to the Kafka topics Wiki topic, Ads topic, and Clicks topic; Spark Streaming consumers compute WikiPedia article edit metrics (Tutorial 1) and impression & click metrics (Tutorial 2).)
5. Step 1: Installation
At the conference we provided a USB stick with the environment ready for the
tutorials. Here are the instructions to create your own environment:
1. Check Java: java -version
java version "1.8.0_92" (Note: Java 1.7+ should work.)
2. Check Maven: mvn -v
If Maven is not installed, check the instructions in the Additional slides at the end.
6. Step 1: Installation
3. Install Scala
curl -O http://downloads.lightbend.com/scala/2.10.6/scala-2.10.6.tgz
tar -xzf scala-2.10.6.tgz
Set SCALA_HOME to the path of the scala-2.10.6 folder. For example, on a Mac:
export SCALA_HOME=/Users/<username>/scala-2.10.6
export PATH=$PATH:$SCALA_HOME/bin
4. Install Spark
curl -O http://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz
8. Step 2: Start Kafka
Start a terminal window and start ZooKeeper:
> cd kafka_2.10-0.10.0.0
> bin/zookeeper-server-start.sh config/zookeeper.properties
Wait for ZooKeeper to start. Then start another terminal window and start Kafka:
> cd kafka_2.10-0.10.0.0
> bin/kafka-server-start.sh config/server.properties
10. Step 1: Listening to WikiPedia Edit Stream
Start a new terminal window. Run WikiPedia Connector
> cd kdd2016-streaming-tutorial
> java -cp target/streamingtutorial-1.0.0-jar-with-dependencies.jar example.WikiPediaConnector
After a few messages appear, stop it with CTRL-C. We will run it again after
writing the streaming code.
If it does not run, build the package and try again:
> mvn package
11. Step 1: WikiPedia Stream message structure
[[-acylglycerol O-acyltransferase]] MB
https://en.wikipedia.org/w/index.php?diff=733783045&oldid=721976415 * BU RoBOT * (-1) /* References
*/Sort into more specific stub template based on presence in [[Category:EC 2.3]] or subcategories (Task
25)
[[City Building]] https://en.wikipedia.org/w/index.php?diff=733783047&oldid=732314994 * Hmains * (+9)
refine category structures
[[Wikipedia:Articles for deletion/Log/2016 August 10]] B
https://en.wikipedia.org/w/index.php?diff=733783051&oldid=733783026 * Cyberbot
Fields: title, flags, diffUrl, user, byteDiff, summary
Flags (2nd field): ‘M’=Minor, ‘N’ = New, ‘!’ = Unpatrolled, ‘B’ = Bot Edit
WikiPedia Stream pattern = """\[\[(.*)\]\]\s(.*)\s(.*)\s\*\s(.*)\s\*\s\(([+-]\d+)\)\s(.*)""".r
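As a self-contained sketch of how one such message maps onto the six fields (the regex is an approximation of the tutorial's pattern, and the sample line with the single flag M is illustrative, not taken from the stream):

```scala
// Fields: title, flags, diffUrl, user, byteDiff, summary
case class WikiEdit(title: String, flags: String, diffUrl: String,
                    user: String, byteDiff: Int, summary: String)

val pattern = """\[\[(.*)\]\]\s(.*)\s(.*)\s\*\s(.*)\s\*\s\(([+-]\d+)\)\s(.*)""".r

val line = "[[City Building]] M https://en.wikipedia.org/w/index.php?diff=733783047&oldid=732314994 * Hmains * (+9) refine category structures"

line match {
  case pattern(title, flags, diffUrl, user, byteDiff, summary) =>
    // byteDiff arrives as "+9" or "-1"; toInt accepts the sign
    println(WikiEdit(title, flags, diffUrl, user, byteDiff.toInt, summary))
  case _ =>
    println("line did not match") // e.g. lines with an empty flags field
}
```

Lines that do not match the pattern (for example, edits with no flags) fall through to the default case; the tutorial code filters these out.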
12. Spark: RDD
An RDD is an immutable distributed collection of objects. Each RDD is split into
multiple partitions, which may be computed on different nodes of the cluster.
RDDs can contain any type of objects, including Python, Java, Scala, or user-defined classes.
RDDs offer two types of operations:
● Transformations construct a new RDD from a previous one.
● Actions compute a result based on an RDD, and either return it to the driver program or
save it to an external storage system.
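A quick sketch of the two operation types, assuming the SparkContext sc that spark-shell provides (the data and output path are illustrative):

```scala
val nums    = sc.parallelize(Seq(1, 2, 3, 4, 5)) // RDD[Int], split into partitions
val squares = nums.map(n => n * n)               // transformation: builds a new RDD, lazily
val evens   = squares.filter(_ % 2 == 0)         // transformation: still nothing computed
val total   = evens.reduce(_ + _)                // action: computes and returns 20 (4 + 16)
evens.saveAsTextFile("/tmp/evens")               // action: saves to external storage
```

Nothing is computed until an action such as reduce or saveAsTextFile runs; the transformations only record the lineage of the RDD.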
13. Spark: DStream
DStream is a sequence of data arriving over time.
Internally, each DStream is represented as a sequence of RDDs arriving at each time step.
DStreams offer two types of operations:
● Transformations yield a new DStream.
● Output operations write data to an external system.
Ref: https://spark.apache.org/docs/latest/streaming-programming-guide.html
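A minimal sketch of these two operation types (not part of the tutorial code; the socket source, app name, and 10-second batch interval are illustrative):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("DStreamSketch")
val ssc  = new StreamingContext(conf, Seconds(10))   // 10-second batches

val lines  = ssc.socketTextStream("localhost", 9999) // DStream[String]
val words  = lines.flatMap(_.split(" "))             // transformation: new DStream
val counts = words.map((_, 1)).reduceByKey(_ + _)    // transformation, applied per batch RDD

counts.print()          // output operation: write results (here, to the console)

ssc.start()             // start receiving and processing
ssc.awaitTermination()
```

Each transformation is applied to every RDD in the stream as its batch arrives, which is why the tutorial's metrics below are per 10-second window.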
14. Step 2: Write Code
In kdd2016-streaming-tutorial
Change the code in the src/main/scala/example/WikiPediaStreaming.scala file, using your favorite editor.
val lines = messages.foreachRDD { rdd =>
// ADD CODE HERE
}
Note: At the conference, participants were asked to write this code themselves; in the GitHub repository the full code is provided.
15. Step 2: Continued
rdd =>
  val linesDF = rdd.map(row => row._2 match {
    case pattern(title, flags, diffUrl, user, byteDiff, summary) =>
      WikiEdit(title, flags, diffUrl, user, byteDiff.toInt, summary)
    case _ => WikiEdit("title", "flags", "diffUrl", "user", 0, "summary")
  }).filter(row => row.title != "title").toDF()
16. Step 2 Continued
// Number of records in 10 second window.
val totalCnt = linesDF.count()
// Number of bot edited records in 10 second window.
val botEditCnt = linesDF.filter("flags like '%B%'").count()
// Number of human edited records in 10 second window.
val humanEditCnt = linesDF.filter("flags not like '%B%'").count()
val botEditPct = if (totalCnt > 0) 100 * botEditCnt / totalCnt else 0
val humanEditPct = if (totalCnt > 0) 100 * humanEditCnt / totalCnt else 0
17. Step 3: Build Program
Start a new terminal window.
> cd kdd2016-streaming-tutorial
> mvn package
18. Step 4: Run Programs
Run WikiPediaConnector in a terminal window. It starts receiving data from the
WikiPedia IRC channel and writes it to Kafka.
> java -cp target/streamingtutorial-1.0.0-jar-with-dependencies.jar example.WikiPediaConnector
Run WikiPediaStreaming in a new terminal window.
> cd kdd2016-streaming-tutorial
> ../spark-1.6.2-bin-hadoop2.6/bin/spark-submit --class example.WikiPediaStreaming target/streamingtutorial-1.0.0-jar-with-dependencies.jar
21. Tutorials
(Architecture diagram, as on slide 4: Producers send Wiki Edit Events, Ad Events, and Click Events to the Kafka topics Wiki topic, Ads topic, and Clicks topic; Spark Streaming consumers compute WikiPedia edit metrics (Tutorial 1) and impression & click metrics (Tutorial 2).)
22. Step 1: Listening to Ads and Clicks stream
Run the program to replay Ad and Click events from file:
> cd kdd2016-streaming-tutorial
> java -cp target/streamingtutorial-1.0.0-jar-with-dependencies.jar example.AdClickEventReplay
After a few messages appear, stop it with CTRL-C. We will run it again after
writing the streaming code.
If it does not run, build the package and try again:
> mvn package
23. Step 1: Ad and Click Event message structure
Ad Event:
QueryID, AdId, TimeStamp: 6815, 48195, 1470632477761
Click Event:
QueryID, ClickId, TimeStamp: 6815, 93630, 1470632827088
Join on QueryId, show metrics by Ad Id.
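To see what this join produces before writing the Spark version, here is the same idea in plain Scala on the two sample events above (the case class names follow the tutorial code; the Seq collections are stand-ins for the keyed DStreams):

```scala
case class AdEvent(queryId: Int, adId: Int, timeStamp: Long)
case class ClickEvent(queryId: Int, clickId: Int, timeStamp: Long)

val adEvents    = Seq(6815 -> AdEvent(6815, 48195, 1470632477761L))
val clickEvents = Seq(6815 -> ClickEvent(6815, 93630, 1470632827088L))

// Inner join on QueryID, as DStream.join does for each batch:
val joined = for {
  (adQuery, ad)       <- adEvents
  (clickQuery, click) <- clickEvents
  if adQuery == clickQuery
} yield (adQuery, (ad, click))
// joined: Seq[(Int, (AdEvent, ClickEvent))], one pair for QueryID 6815

// Metrics by AdId: count joined pairs per ad
val byAdId = joined.map { case (_, (ad, _)) => (ad.adId, 1) }
                   .groupBy(_._1).mapValues(_.map(_._2).sum)
// byAdId pairs each adId (here 48195) with its click count
```

The Spark code on the following slides does exactly this, but over DStreams and with reduceByKey instead of groupBy.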
24. Step 2: Write Code
In kdd2016-streaming-tutorial
Change the code in the src/main/scala/example/AdEventJoiner.scala file.
val adEventDStream = adStream.transform( rdd => {
rdd.map(line => line._2.split(",")).
map(row => (row(0).trim.toInt, AdEvent(row(0).trim.toInt, row(1).trim.toInt, row(2).trim.toLong)))
})
// ADD CODE HERE..
Note: At the conference, participants were asked to write this code themselves; in the GitHub repository the full code is provided.
25. Step 2 Continued
//Connects Spark Streaming to Kafka Topic and gets DStream of RDDs (click event message)
val clickStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc,
kafkaParams, clickStreamTopicSet)
//Create a new DStream by extracting kafka message and converting it to DStream[queryId, ClickEvent]
val clickEventDStream = clickStream.transform{ rdd =>
rdd.map(line => line._2.split(",")).
map(row => (row(0).trim.toInt, ClickEvent(row(0).trim.toInt, row(1).trim.toInt, row(2).trim.toLong)))
}
26. Step 2 Continued
// Join adEvent and clickEvent DStreams and output DStream[queryId, (adEvent, clickEvent)]
val joinByQueryId = adEventDStream.join(clickEventDStream)
joinByQueryId.print()
// Transform DStream to DStream[adId, count(adId)] for each RDD
val countByAdId = joinByQueryId.map(rdd => (rdd._2._1.adId,1)).reduceByKey(_+_)
27. Step 2 Continued
// Update the state [adId, countCummulative(adId)] by values from the next RDDs
val updateFunc = (values: Seq[Int], state: Option[Int]) => {
val currentCount = values.sum
val previousCount = state.getOrElse(0)
Some(currentCount + previousCount)
}
val countByAdIdCumm = countByAdId.updateStateByKey(updateFunc)
// Transform (key, value) pair to (adId, count(adId), countCummulative(adId))
val ad = countByAdId.join(countByAdIdCumm).map {case (adId, (count, cumCount)) => (adId, count, cumCount)}
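The update function itself is plain Scala, so its behavior can be checked without Spark; a quick sketch of how state accumulates across batches (the batch values are illustrative):

```scala
val updateFunc = (values: Seq[Int], state: Option[Int]) => {
  val currentCount  = values.sum
  val previousCount = state.getOrElse(0)
  Some(currentCount + previousCount)
}

// Batch 1: an adId is seen 3 times, with no prior state
val s1 = updateFunc(Seq(1, 1, 1), None) // Some(3)
// Batch 2: the same adId is seen 2 more times, with prior state s1
val s2 = updateFunc(Seq(1, 1), s1)      // Some(5)
```

updateStateByKey calls this function once per key per batch, passing the new counts for that key and the previously stored state.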
29. Step 3: Build Program
> cd kdd2016-streaming-tutorial
> mvn package
30. Step 4: Run Programs
Run AdClickEventReplay in a terminal window. It reads data from the Ad and Click
event files and writes it to Kafka.
> java -cp target/streamingtutorial-1.0.0-jar-with-dependencies.jar example.AdClickEventReplay
Run AdEventJoiner in a new terminal window.
> cd kdd2016-streaming-tutorial
> ../spark-1.6.2-bin-hadoop2.6/bin/spark-submit --class example.AdEventJoiner target/streamingtutorial-1.0.0-jar-with-dependencies.jar
33. Additional Notes - Install Java (Mac 10.11)
● java -version
java version "1.8.0_92"
If java is not found, add the following line to your .bash_profile:
export JAVA_HOME=$(/usr/libexec/java_home)
(Install Java - https://java.com/en/download/help/mac_install.xml)
34. Additional Notes - Install Maven
● Check Maven on a terminal window
○ mvn -v
○ Apache Maven 3.2.5+
● Install Maven
○ brew install maven
OR, if you do not have brew:
1. curl -O http://mirror.nexcess.net/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
2. tar -xzvf apache-maven-3.3.9-bin.tar.gz