Reactive Streams are a cross-company initiative first ignited by Lightbend in 2013, soon to be joined by RxJava and other implementations focused on solving a very similar problem: asynchronous non-blocking stream processing, with guaranteed over-flow protection. Fast forward to 2016 and now these interfaces are part of JSR-266 and proposed for JDK9.
In this talk we'll first disambiguate what the word Stream means in this context (as it's been overloaded recently by various different meanings), then look at how its protocol works and how one might use it in the real world showing examples using existing implementations.
We'll also have a peek into the future, to see what the next steps for such collaborative protocols and the JDK ecosystem are in general.
[Japanese] How Reactive Streams and Akka Streams change the JVM Ecosystem @ R...Konrad Malawski
Japanese subtitles by Yugo Maede-san, thank you very much. Japanese subtitled version of the "How Reactive Streams and Akka Streams change the JVM Ecosystem". http://www.slideshare.net/ktoso/how-reactive-streams-akka-streams-change-the-jvm-ecosystem
End to End Akka Streams / Reactive Streams - from Business to SocketKonrad Malawski
The Reactive Streams specification, along with its TCK and various implementations such as Akka Streams, is coming closer and closer with the inclusion of the RS types in JDK 9. Using an example Twitter-like streaming service implementation, this session shows why this is a game changer in terms of how you can design reactive streaming applications by connecting pipelines of back-pressured asynchronous processing stages. The presentation looks at the example from two perspectives: a raw implementation and an implementation addressing a high-level business need.
Reactive Streams are a cross-company initiative first ignited by Lightbend in 2013, soon to be joined by RxJava and other implementations focused on solving a very similar problem: asynchronous non-blocking stream processing, with guaranteed over-flow protection. Fast forward to 2016 and now these interfaces are part of JSR-266 and proposed for JDK9.
In this talk we'll first disambiguate what the word Stream means in this context (as it's been overloaded recently by various different meanings), then look at how its protocol works and how one might use it in the real world showing examples using existing implementations.
We'll also have a peek into the future, to see what the next steps for such collaborative protocols and the JDK ecosystem are in general.
[Japanese] How Reactive Streams and Akka Streams change the JVM Ecosystem @ R...Konrad Malawski
Japanese subtitles by Yugo Maede-san, thank you very much. Japanese subtitled version of the "How Reactive Streams and Akka Streams change the JVM Ecosystem". http://www.slideshare.net/ktoso/how-reactive-streams-akka-streams-change-the-jvm-ecosystem
End to End Akka Streams / Reactive Streams - from Business to SocketKonrad Malawski
The Reactive Streams specification, along with its TCK and various implementations such as Akka Streams, is coming closer and closer with the inclusion of the RS types in JDK 9. Using an example Twitter-like streaming service implementation, this session shows why this is a game changer in terms of how you can design reactive streaming applications by connecting pipelines of back-pressured asynchronous processing stages. The presentation looks at the example from two perspectives: a raw implementation and an implementation addressing a high-level business need.
The ninja elephant, scaling the analytics database in TranswerwiseFederico Campoli
Business intelligence and analytics is the core of any great company and Transferwise is not an exception.
The talk will start with a brief history on the legacy analytics implemented with MySQL and how we scaled up the performance using PostgreSQL. In order to get fresh data from the core MySQL databases in real time we used a modified version of pg_chameleon which also obfuscated the PII data.
The talk will also cover the challenges and the lesson learned by the developers and analysts when bridging MySQL with PostgreSQL.
Big data made easy with a Spark is the presentation I did for Opensource 101 in Columbia, SC, on April 18th 2019. It is a hands-on tutorial on Apache Spark with Java, walking through 3 different labs.
Building a Reactive System with Akka - Workshop @ O'Reilly SAConf NYCKonrad Malawski
Intense 3 hour workshop covering Akka Actors, Cluster, Streams, HTTP and more. Including very advanced patterns.
Presented with Henrik Engstrom at O'Reilly Software Architecture Conference in New York City in 2017
Enabling blazing fast search on a 1 billion member social network demands special weapons and tactics. In Evojam we took advantage of the Scala ecosystem and modern tools to realize persistence and processing of such a data set.
Case study focused on the client requirements and tools we have made use of to meet them. Thoughts, injuries and ideas about real life fast & big data challenge brought to you directly from the trenches.
This talk has been given on Scalar 2016 conference.
While we’re drawing ever closer to Java 9, and even hearing about features in Java 10, it’s also true that many of us are still working with an older version. Even if your project has technically adopted Java 8, and even if you’re using it when coding new features, it’s likely the majority of your code base is still not making the most of what’s available in Java 8 - features like Lambda Expressions, the Streams API, and new Date/Time.
In this presentation, Trisha:
- Highlights some of the benefits of using Java 8 - after all, you’ll probably have to persuade The Management that tampering with existing code is worthwhile
- Demonstrates how to identify areas of code that can be updated to use Java 8 features
- Shows how to automatically refactor your code to make use of features like lambdas and streams.
- Covers some of the pros and cons of using the new features - including suggestions of when refactoring may NOT be the best idea.
Reactive Streams: Handling Data-Flow the Reactive WayRoland Kuhn
Building on the success of Reactive Extensions—first in Rx.NET and now in RxJava—we are taking Observers and Observables to the next level: by adding the capability of handling back-pressure between asynchronous execution stages we enable the distribution of stream processing across a cluster of potentially thousands of nodes. The project defines the common interfaces for interoperable stream implementations on the JVM and is the result of a collaboration between Twitter, Netflix, Pivotal, RedHat and Typesafe. In this presentation I introduce the guiding principles behind its design and show examples using the actor-based implementation in Akka.
Extending the Yahoo Streaming BenchmarkJamie Grier
This presentation covers describes my own benchmarking of Apache Storm and Apache Flink based on the work started by Yahoo! It shows the incredible performance of Apache Flink
Microservices, Containers, and Machine LearningPaco Nathan
Session talk for Data Day Texas 2015, showing GraphX and SparkSQL for text analytics and graph analytics of an Apache developer email list -- including an implementation of TextRank in Spark.
Apache Spark Performance is too hard. Let's make it easierDatabricks
Apache Spark is a dynamic execution engine that can take relatively simple Scala code and create complex and optimized execution plans. In this talk, we will describe how user code translates into Spark drivers, executors, stages, tasks, transformations, and shuffles. We will then describe how this is critical to the design of Spark and how this tight interplay allows very efficient execution. We will also discuss various sources of metrics on how Spark applications use hardware resources, and show how application developers can use this information to write more efficient code. Users and operators who are aware of these concepts will become more effective at their interactions with Spark.
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinTill Rohrmann
This talk shows how we can use Apache Flink and Apache Zeppelin to do interactive data analysis. The examples show the usage of FlinkML to solve a linear regression and classification problem.
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Databricks
Overview and extended description: AI is expected to be the engine of technological advancements in the healthcare industry, especially in the areas of radiology and image processing. The purpose of this session is to demonstrate how we can build a AI-based Radiologist system using Apache Spark and Analytics Zoo to detect pneumonia and other diseases from chest x-ray images. The dataset, released by the NIH, contains around 110,00 X-ray images of around 30,000 unique patients, annotated with up to 14 different thoracic pathology labels. Stanford University developed a state-of-the-art model using CNN and exceeds average radiologist performance on the F1 metric. This talk focuses on how we can build a multi-label image classification model in a distributed Apache Spark infrastructure, and demonstrate how to build complex image transformations and deep learning pipelines using BigDL and Analytics Zoo with scalability and ease of use. Some practical image pre-processing procedures and evaluation metrics are introduced. We will also discuss runtime configuration, near-linear scalability for training and model serving, and other general performance topics.
Streaming Microservices With Akka Streams And Kafka StreamsLightbend
One of the most frequent questions that we get asked at Lightbend is “what’s the difference between Akka Streams and Kafka Streams?” After all, there is only a 1 letter difference between these two technologies, so how different could they be?
Well, as we see in this presentation, they are actually quite different. Both tools are part of the streaming Fast Data stack, but were created with entirely different technological approaches in mind. For example, While Akka Streams emerged as a dataflow-centric abstraction for the Akka Actor model, designed for general-purpose microservices, very low-latency event processing, and supports a wider class of application problems and third-party integrations via Alpakka, Kafka Streams is purpose-built for reading data from Kafka topics, processing it, and writing the results to new topics in a Kafka-centric way.
In this webinar by Dr. Dean Wampler, VP of Fast Data Engineering at Lightbend, we will:
* Discuss the strengths and weaknesses of Kafka Streams and Akka Streams for particular design needs in data-centric microservices
* Contrast them with Spark Streaming and Flink, which provide richer analytics over potentially huge data sets
* Help you map these streaming engines to your specific use cases, so you confidently pick the right ones for your jobs
Mike Krieger - A Brief, Rapid History of Scaling Instagram (with a tiny team)Jean-Luc David
Mike Krieger discusses Instagram's best and worst infrastructure decisions, building and deploying scalable and extensible services. Audio here: https://soundcloud.com/jldavid/mike-krieger-how-a-small-team-scales-instagram
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraJoe Stein
Slides for our solution we developed for using Mesos, Docker, Kafka, Spark, Cassandra and Solr (DataStax Enterprise Edition) all developed in Go for doing realtime log analysis at scale. Many organizations either need or want log analysis in real time where you can see within a second what is happening within your entire infrastructure. Today, with the hardware available and software systems we have in place, you can develop, build and use as a service these solutions.
The ninja elephant, scaling the analytics database in TranswerwiseFederico Campoli
Business intelligence and analytics is the core of any great company and Transferwise is not an exception.
The talk will start with a brief history on the legacy analytics implemented with MySQL and how we scaled up the performance using PostgreSQL. In order to get fresh data from the core MySQL databases in real time we used a modified version of pg_chameleon which also obfuscated the PII data.
The talk will also cover the challenges and the lesson learned by the developers and analysts when bridging MySQL with PostgreSQL.
Big data made easy with a Spark is the presentation I did for Opensource 101 in Columbia, SC, on April 18th 2019. It is a hands-on tutorial on Apache Spark with Java, walking through 3 different labs.
Building a Reactive System with Akka - Workshop @ O'Reilly SAConf NYCKonrad Malawski
Intense 3 hour workshop covering Akka Actors, Cluster, Streams, HTTP and more. Including very advanced patterns.
Presented with Henrik Engstrom at O'Reilly Software Architecture Conference in New York City in 2017
Enabling blazing fast search on a 1 billion member social network demands special weapons and tactics. In Evojam we took advantage of the Scala ecosystem and modern tools to realize persistence and processing of such a data set.
Case study focused on the client requirements and tools we have made use of to meet them. Thoughts, injuries and ideas about real life fast & big data challenge brought to you directly from the trenches.
This talk has been given on Scalar 2016 conference.
While we’re drawing ever closer to Java 9, and even hearing about features in Java 10, it’s also true that many of us are still working with an older version. Even if your project has technically adopted Java 8, and even if you’re using it when coding new features, it’s likely the majority of your code base is still not making the most of what’s available in Java 8 - features like Lambda Expressions, the Streams API, and new Date/Time.
In this presentation, Trisha:
- Highlights some of the benefits of using Java 8 - after all, you’ll probably have to persuade The Management that tampering with existing code is worthwhile
- Demonstrates how to identify areas of code that can be updated to use Java 8 features
- Shows how to automatically refactor your code to make use of features like lambdas and streams.
- Covers some of the pros and cons of using the new features - including suggestions of when refactoring may NOT be the best idea.
Reactive Streams: Handling Data-Flow the Reactive WayRoland Kuhn
Building on the success of Reactive Extensions—first in Rx.NET and now in RxJava—we are taking Observers and Observables to the next level: by adding the capability of handling back-pressure between asynchronous execution stages we enable the distribution of stream processing across a cluster of potentially thousands of nodes. The project defines the common interfaces for interoperable stream implementations on the JVM and is the result of a collaboration between Twitter, Netflix, Pivotal, RedHat and Typesafe. In this presentation I introduce the guiding principles behind its design and show examples using the actor-based implementation in Akka.
Extending the Yahoo Streaming BenchmarkJamie Grier
This presentation covers describes my own benchmarking of Apache Storm and Apache Flink based on the work started by Yahoo! It shows the incredible performance of Apache Flink
Microservices, Containers, and Machine LearningPaco Nathan
Session talk for Data Day Texas 2015, showing GraphX and SparkSQL for text analytics and graph analytics of an Apache developer email list -- including an implementation of TextRank in Spark.
Apache Spark Performance is too hard. Let's make it easierDatabricks
Apache Spark is a dynamic execution engine that can take relatively simple Scala code and create complex and optimized execution plans. In this talk, we will describe how user code translates into Spark drivers, executors, stages, tasks, transformations, and shuffles. We will then describe how this is critical to the design of Spark and how this tight interplay allows very efficient execution. We will also discuss various sources of metrics on how Spark applications use hardware resources, and show how application developers can use this information to write more efficient code. Users and operators who are aware of these concepts will become more effective at their interactions with Spark.
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinTill Rohrmann
This talk shows how we can use Apache Flink and Apache Zeppelin to do interactive data analysis. The examples show the usage of FlinkML to solve a linear regression and classification problem.
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Databricks
Overview and extended description: AI is expected to be the engine of technological advancements in the healthcare industry, especially in the areas of radiology and image processing. The purpose of this session is to demonstrate how we can build a AI-based Radiologist system using Apache Spark and Analytics Zoo to detect pneumonia and other diseases from chest x-ray images. The dataset, released by the NIH, contains around 110,00 X-ray images of around 30,000 unique patients, annotated with up to 14 different thoracic pathology labels. Stanford University developed a state-of-the-art model using CNN and exceeds average radiologist performance on the F1 metric. This talk focuses on how we can build a multi-label image classification model in a distributed Apache Spark infrastructure, and demonstrate how to build complex image transformations and deep learning pipelines using BigDL and Analytics Zoo with scalability and ease of use. Some practical image pre-processing procedures and evaluation metrics are introduced. We will also discuss runtime configuration, near-linear scalability for training and model serving, and other general performance topics.
Streaming Microservices With Akka Streams And Kafka StreamsLightbend
One of the most frequent questions that we get asked at Lightbend is “what’s the difference between Akka Streams and Kafka Streams?” After all, there is only a 1 letter difference between these two technologies, so how different could they be?
Well, as we see in this presentation, they are actually quite different. Both tools are part of the streaming Fast Data stack, but were created with entirely different technological approaches in mind. For example, While Akka Streams emerged as a dataflow-centric abstraction for the Akka Actor model, designed for general-purpose microservices, very low-latency event processing, and supports a wider class of application problems and third-party integrations via Alpakka, Kafka Streams is purpose-built for reading data from Kafka topics, processing it, and writing the results to new topics in a Kafka-centric way.
In this webinar by Dr. Dean Wampler, VP of Fast Data Engineering at Lightbend, we will:
* Discuss the strengths and weaknesses of Kafka Streams and Akka Streams for particular design needs in data-centric microservices
* Contrast them with Spark Streaming and Flink, which provide richer analytics over potentially huge data sets
* Help you map these streaming engines to your specific use cases, so you confidently pick the right ones for your jobs
Mike Krieger - A Brief, Rapid History of Scaling Instagram (with a tiny team)Jean-Luc David
Mike Krieger discusses Instagram's best and worst infrastructure decisions, building and deploying scalable and extensible services. Audio here: https://soundcloud.com/jldavid/mike-krieger-how-a-small-team-scales-instagram
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraJoe Stein
Slides for our solution we developed for using Mesos, Docker, Kafka, Spark, Cassandra and Solr (DataStax Enterprise Edition) all developed in Go for doing realtime log analysis at scale. Many organizations either need or want log analysis in real time where you can see within a second what is happening within your entire infrastructure. Today, with the hardware available and software systems we have in place, you can develop, build and use as a service these solutions.
In an environment where cloud-scaling applications is becoming more and more important, client-server architectures paradigms, as shown by memcached, are back with vengeance. In this talk, Galder will talk about Hot Rod, Infinispan's new client/server binary protocol, explaining the key differences compared to memcached's binary protocol, such as the possibility of receiving cluster topology changes. Audience of this talk will learn of the importance of Hot Rod in 'cloud-scale' application server clustering, where stateless application server instances could use Infinispan Hot Rod clients to retrieve state from an elastic farm of Infinispan Hot Rod servers, improving capabilities to run application server instances as a PaaS. The talk will finish with a brief demo of a cluster of Infinispan Hot Rod servers running on EC2 being accessed from a non-Java client. The audience is expected to have an intermediate understanding of client-server software architectures and cloud deployments.
Advanced Data Retrieval and Analytics with Apache Spark and Openstack SwiftDaniel Krook
Lightning talk from the OpenStack NYC meetup on October 8, 2014.
http://bit.ly/ibm-os-meetup
By Gil Vernik
The integration between Apache Spark and Swift, and the use of Storlets for smart retrieval via filtering and privacy-support.
The content of this talk is a statement from the IBM Research division, not IBM product divisions, and is not a statement from IBM regarding its plans, directions or product intents. Any activities described by this talk are subject to change.
ELC-E 2010: The Right Approach to Minimal Boot Timesandrewmurraympc
This was presented at ELC-E 2010 in Cambridge and describes an approach to cold boot time reduction. It also demonstrates the approach through a case study with an MS7724 reference board.
Managing and Versioning Machine Learning Models in PythonSimon Frid
Practical machine learning is becoming messy, and while there are lots of algorithms, there is still a lot of infrastructure needed to manage and organize the models and datasets. Estimators and Django-Estimators are two python packages that can help version data sets and models, for deployment and effective workflow.
Refactoring Organizations - A Netflix Study (QCon NYC 2017)Josh Evans
Is your service architecture and engineering velocity constrained by organizational concerns? Does it seem impossible to give priority to key initiatives regardless of intent? Are engineers switching tasks so often that they are just treading water? Are critical projects endlessly backlogged? Has staffing up pushed the limits of your team structure? Navigating through challenges like these can be daunting and solutions fraught with uncertainty. How do you know what, where, when to change. And whatever the answer is today it will most certainly vary over time. Effective organizations evolve, at key inflection points, to support critical business and technical goals. There is not only a strong relationship between organizations and the software they produce (Conway’s Law) but many organizational solutions can be derived from analogs in the technical realm. In other words, we can treat organizational improvement as a refactoring exercise. Over the last 20 years Netflix engineering has proven time and again an ability to adapt and grow, resulting in undisputed dominance over the global internet tv market. In this talk we’ll use Netflix as a case study to illustrate how specific strategies, framed as technical analogs, have been employed to maximize engineering agility, velocity, and impact. These powerful, yet simple strategies and solutions provide a useful blueprint for organizational success.
The pillars of DevOps are Culture, Automation, Measurement and Sharing. Docker is a rare tool at enables DevOps through all 4 pillars. These slides take a look at how Docker can affect each pillar in your organization through a Lean lens.
Docker Enables DevOps - Keep C.A.L.M.S. and Docker on ...Boyd Hemphill
The pillars of DevOps are Culture, Automation, Measurement and Sharing. Docker is a rare tool at enables DevOps through all 4 pillars. These slides take a look at how Docker can affect each pillar in your organization through a Lean lens.
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Simplilearn
This presentation about Hive will help you understand the history of Hive, what is Hive, Hive architecture, data flow in Hive, Hive data modeling, Hive data types, different modes in which Hive can run on, differences between Hive and RDBMS, features of Hive and a demo on HiveQL commands. Hive is a data warehouse system which is used for querying and analyzing large datasets stored in HDFS. Hive uses a query language called HiveQL which is similar to SQL. Hive issues SQL abstraction to integrate SQL queries (like HiveQL) into Java without the necessity to implement queries in the low-level Java API. Now, let us get started and understand Hadoop Hive in detail
Below topics are explained in this Hive presetntation:
1. History of Hive
2. What is Hive?
3. Architecture of Hive
4. Data flow in Hive
5. Hive data modeling
6. Hive data types
7. Different modes of Hive
8. Difference between Hive and RDBMS
9. Features of Hive
10. Demo on HiveQL
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course have been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem such as Hadoop 2.7, Yarn, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Arvo with Hive, and Sqoop and Schema evolution
7. Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distribution datasets (RDD) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, creating, transforming, and querying Data frames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
ROI & Business Value of CI, CD, DevOps, DevSecOps, & MicroservicesDavid Rico
Comprehensive overview of CI, CD, DevOps, DevSecOps, and Microservices, along with costs, benefits, facts, figures, statistics, models, tools, DevOps ecosystems and pipelines, case studies, and edge cases ...
AppSec Pipelines and Event based SecurityMatt Tesauro
Presented at AppSec California 2017, this is a continuation of earlier talks about AppSec Pipelines and demonstrates 1st and 2nd Gen Pipelines, how OWASP is creating a pipeline for its projects and how several companies have benefited from combining DevOps, Agile, CI/CD and Security into an AppSec Pipeline to move beyond traditional AppSec testing.
US Software Developers - Github Audience Analysis Affinio
Visual analysis of 20 unique developer communities found within @GitHub US audience (Twitter followers) using Affinio. Insights for each community include:
- Top coding languages
- Top brands
- Top influencers
- Top media & entertainment (shows, networks, channels)
- Top publishers
- Top events, organizations, etc.
- Top conversations & hashtags
- Top links, articles, images, and media
By the numbers:
- 127k followers analyzed
- 20 clusters
- 1bn+ relationships mapped
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...Amazon Web Services
This session discusses the architecture, formation, and usage of a collaborative HPC/big data scientific research and analysis environment on AWS. The pharmaceutical industry trend toward joint ventures and collaborations has created a need for new platforms in which to work together. We'll dive into architectural decisions for building collaborative systems. Examples include how such a platform allowed Human Longevity, Inc. to accelerate software deployment to production in a fast-paced research environment, and how Celgene uses AWS for research collaboration with outside universities and foundations.
Hadoop clusters can store nearly everything in a cheap and blazingly fast way to your data lake. Answering questions and gaining insights out of this ever growing stream becomes the decisive part for many businesses. Increasingly data has a natural structure as a graph, with vertices linked by edges, and many questions arising about the data involve graph traversals or other complex queries, for which one does not have an a priori given bound on the length of paths.
Spark with GraphX is great for answering relatively simple graph questions which are worth starting a Spark job for, because they essentially involve the whole graph. But does it make sense to start one for every ad-hoc query or is it suitable for complex real-time queries?
In this talk I will introduce an alternative solution that adds those features to an existing Hadoop/Spark setup and enables real-time insights. I will address the following topics:
* Challenges in gaining deeper insights from large amounts of graph data
* Benefits and limitations of graph analysis with Spark
* Introduction to ArangoDB SmartGraphs
* Deployment of Hadoop, Spark and ArangoDB using DC/OS
* Performing complex queries on billions of nodes and vertices leveraging ArangoDB SmartGraphs (Live Demo)
Hadoop clusters can store nearly everything in a cheap and blazingly fast way to your data lake. Answering questions and gaining insights out of this ever growing stream becomes the decisive part for many businesses. Increasingly data has a natural structure as a graph, with vertices linked by edges, and many questions arising about the data involve graph traversals or other complex queries, for which one does not have an a priori given bound on the length of paths.
Microservices, Events, and Breaking the Data Monolith with KafkaVMware Tanzu
One of the trickiest problems with microservices is dealing with data as it becomes spread across many different bounded contexts. An event architecture and event-streaming platform like Kafka provide a respite to this problem. Event-first thinking has a plethora of other advantages too, pulling in concepts from event sourcing, stream processing, and domain-driven design.
In this talk, Ben and Cornelia will tackle how to do the following:
● Transform the data monolith to microservices
● Manage bounded contexts for data fields that overlap
● Use event architectures that apply streaming technologies like Kafka to address the challenges of distributed data
Speakers:
Cornelia Davis, Author & VP, Technology, Pivotal
Ben Stopford, Author & Technologist, Office of CTO, Confluent
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxrickgrimesss22
Discover the essential features to incorporate in your Winzo clone app to boost business growth, enhance user engagement, and drive revenue. Learn how to create a compelling gaming experience that stands out in the competitive market.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntekBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfJay Das
With the advent of artificial intelligence or AI tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT, and Bard organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. It’s here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Globus Compute wth IRI Workflows - GlobusWorld 2024
Velox: Models in Action
1. VELOX:
MODELS IN ACTION
Presented by Dan Crankshaw
crankshaw@cs.berkeley.edu
Henry Milner, Joseph Gonzalez, Peter Bailis, Haoyuan Li, Tomer Kaftan,
Zhao Zhang, Ali Ghodsi, Michael Franklin, Michael Jordan, and Ion Stoica
https://amplab.cs.berkeley.edu/projects/velox/
2. MODELS AT REST
Data
Well
Studied
Train
Observe
Predictions Model Predict
24. What’s wrong?
1. Built from scratch for each
application
2. Different systems
25. What’s wrong?
1. Built from scratch for each
application
2. Different systems
3. Space inefficient
26. What’s wrong?
1. Built from scratch for each
application
2. Different systems
3. Space inefficient
4. Stale predictions
27. What’s wrong?
1. Built from scratch for each
application
2. Different systems
3. Space inefficient
4. Stale predictions
5. The T-Swift effect Sample Bias
28. Catify: Music for Cats
Pipeline
Tachyon + HDFS
NGINX
Node.js App Server
MongoDB
Training Data
New Model
34. BENEFITS
1. Low-latency and scalable
predictions as a service
2. Integrated approach leads to
fresher, better predictions
35. BENEFITS
1. Low-latency and scalable
predictions as a service
2. Integrated approach leads to
fresher, better predictions
3. Easy translation to production
predictions
36. BENEFITS
1. Low-latency and scalable
predictions as a service
2. Integrated approach leads to
fresher, better predictions
3. Easy translation to production
predictions
4. Eases operational pain
65. ACTIVE LEARNING: LinUCB
Rating
Songs
Uncertainty
Prediction
Li, L., Chu, W., Langford, J., & Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. WWW '10:
Proceedings of the 19th international conference on World wide web, New York, New York, USA: ACM. doi:10.1145/1772690.1772758
66. ACTIVE LEARNING: LinUCB
Rating
Songs
Look at upper
confidence bound
Uncertainty
Prediction
Li, L., Chu, W., Langford, J., & Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. WWW '10:
Proceedings of the 19th international conference on World wide web, New York, New York, USA: ACM. doi:10.1145/1772690.1772758
67. ACTIVE LEARNING: LinUCB
Rating
Songs
Look at upper
confidence bound
Uncertainty
Prediction
Li, L., Chu, W., Langford, J., & Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. WWW '10:
Proceedings of the 19th international conference on World wide web, New York, New York, USA: ACM. doi:10.1145/1772690.1772758
72. Catify: Music for Cats
Pipeline
Tachyon + HDFS
NGINX
Node.js App Server
MongoDB
Training Data
New Model
73. USER-FACING API
GET
/velox/catify/predict?userid=22&song=27632
GET
/velox/catify/predict_top_k?userid=22&k=100
74. USER-FACING API
GET
/velox/catify/predict?userid=22&song=27632
GET
/velox/catify/predict_top_k?userid=22&k=100
POST
/velox/catify/observe?userid=22&song=27632?score=3.7
82. The future of research in scalable learning systems will be in the
integration of the learning lifecycle:
Data
Training
Feedback
Predictions Model Serving
84. SUMMARY
•Model training and predictions rely on ad-hoc,
manual processes spread across multiple systems
85. SUMMARY
•Model training and predictions rely on ad-hoc,
manual processes spread across multiple systems
•The Velox system automatically maintains multiple
models while providing low latency, scalable, and
personalized predictions
86. SUMMARY
•Model training and predictions rely on ad-hoc,
manual processes spread across multiple systems
•The Velox system automatically maintains multiple
models while providing low latency, scalable, and
personalized predictions
•Velox is part of BDAS, is coming soon…
87. SUMMARY
•Model training and predictions rely on ad-hoc,
manual processes spread across multiple systems
•The Velox system automatically maintains multiple
models while providing low latency, scalable, and
personalized predictions
•Velox is part of BDAS, is coming soon…
•https://amplab.cs.berkeley.edu/projects/velox/