What is Distributed Tracing (DT), and why it may be useful for you.
The design of DT, and how OpenTracing and OpenCensus work for Elixir/Erlang projects (libraries, problems, my experience).
Making the big data ecosystem work together with Python & Apache Arrow, Apach... (Holden Karau)
Slides from PyData London exploring how the big data ecosystem (currently) works together as well as how different parts of the ecosystem work with Python. Proof-of-concept examples are provided using nltk & spacy with Spark. Then we look to the future and how we can improve.
Accelerating Big Data beyond the JVM - Fosdem 2018 (Holden Karau)
Many popular big data technologies (such as Apache Spark, BEAM, Flink, and Kafka) are built in the JVM, and many interesting tools are built in other languages (ranging from Python to CUDA). For simple operations the cost of copying the data can quickly dominate, and in complex cases can limit our ability to take advantage of specialty hardware. This talk explores how improved formats are being integrated to reduce these hurdles to co-operation.
Many popular big data technologies (such as Apache Spark, BEAM, and Flink) are built in the JVM, and many interesting AI tools are built in other languages, some requiring copying to the GPU. As many folks have experienced, while we may wish we could spend all of our time playing with cool algorithms, we often need to spend more of our time working on data prep. Having to copy our data slowly between the JVM and the target language of computation can remove much of the benefit of being able to access our specialized tooling. Thankfully, as illustrated in the soon-to-be-released Spark 2.3, Apache Arrow and related tools offer the ability to reduce this overhead. This talk will explore how Arrow is being integrated into Spark and how it can be integrated into other systems, but also limitations and places where Apache Arrow will not magically save us.
Link: https://fosdem.org/2018/schedule/event/big_data_outside_jvm/
Big Data Beyond the JVM - Strata San Jose 2018 (Holden Karau)
Many of the recent big data systems, like Hadoop, Spark, and Kafka, are written primarily in JVM languages. At the same time, there is a wealth of tools for data science and data analytics that exist outside of the JVM. Holden Karau and Rachel Warren explore the state of the current big data ecosystem and explain how to best work with it in non-JVM languages. While much of the focus will be on Python + Spark, the talk will also include interesting anecdotes about how these lessons apply to other systems (including Kafka).
Holden and Rachel detail how to bridge the gap using PySpark and discuss other solutions like Kafka Streams as well. They also outline the challenges of pure Python solutions like dask. Holden and Rachel start with the current architecture of PySpark and its evolution. They then turn to the future, covering Arrow-accelerated interchange for Python functions, how to expose Python machine learning models into Spark, and how to use systems like Spark to accelerate training of traditional Python models. They also dive into what other similar systems are doing as well as what the options are for (almost) completely ignoring the JVM in the big data space.
Python users will learn how to more effectively use systems like Spark and understand how the design is changing. JVM developers will gain an understanding of how to work with Python code from data scientists and Python developers while avoiding the traditional trap of needing to rewrite everything.
Testing and validating distributed systems with Apache Spark and Apache Beam ... (Holden Karau)
As distributed data parallel systems, like Spark, are used for more mission-critical tasks, it is important to have effective tools for testing and validation. This talk explores the general considerations and challenges of testing systems like Spark through spark-testing-base and other related libraries.
With over 40% of folks automatically deploying the results of their Spark jobs to production, testing is especially important. Many of the tools for working with big data systems (like notebooks) are great for exploratory work, and can give a false sense of security (as well as additional excuses not to test). This talk explores why testing these systems is hard, special considerations for simulating "bad" partitioning, figuring out when your stream tests are stopped, and solutions to these challenges.
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ... (Jimmy Lai)
Big data analysis relies on exploiting various handy tools to gain insight from data easily. In this talk, the speaker demonstrates a data mining flow for text classification using many Python tools. The flow consists of feature extraction/selection, model training/tuning, and evaluation. Various tools are used in the flow, including: Pandas for feature processing, scikit-learn for classification, IPython Notebook for fast sketching, and matplotlib for visualization.
How to Write the Fastest JSON Parser/Writer in the World (Milo Yip)
How RapidJSON was developed to achieve the highest performance among 20 C/C++ JSON libraries. Benchmarks, some C++ design, algorithm, and low-level optimizations are covered.
Apache Spark is one of the most popular big data projects, offering greatly improved performance over traditional MapReduce models. Much of Apache Spark’s power comes from lazy evaluation along with intelligent pipelining, which can make debugging more challenging. This talk will examine how to debug Apache Spark applications, the different options for logging in PySpark, as well as some common errors and how to detect them.
Spark’s own internal logging can often be quite verbose, and this talk will examine how to effectively search logs from Apache Spark to spot common problems. In addition to the internal logging, this talk will look at options for logging from within our program itself.
Spark’s accumulators have gotten a bad rap because of how they interact in the event of cache misses or partial recomputes, but this talk will look at how to effectively use Spark’s current accumulators for debugging, as well as a look to the future for data property accumulators, which may be coming to Spark in a future version.
In addition to reading logs, and instrumenting our program with accumulators, Spark’s UI can be of great help for quickly detecting certain types of problems.
Debuggers are a wonderful tool, however when you have 100 computers the “wonder” can be a bit more like “pain”. This talk will look at how to connect remote debuggers, but also remind you that it’s probably not the easiest path forward.
How do we go from your Java code to the CPU assembly that actually runs it? Using high-level constructs has made us forget what happens behind the scenes, which is, however, key to writing efficient code.
Starting from a few lines of Java, we explore the different layers that contribute to running your code: the JRE, bytecode, the structure of the OpenJDK virtual machine, HotSpot, intrinsic methods, and benchmarking.
An introductory presentation to these low-level concerns, based on the practical use case of optimizing 6 lines of code, so that hopefully you will want to explore further!
Presentation given at the Toulouse (FR) Java User Group.
Video (in French) at https://www.youtube.com/watch?v=rB0ElXf05nU
Slideshow with animations at https://docs.google.com/presentation/d/1eIcROfLpdTU2_Z_IKiMG-AwqZGZgbN1Bs2E0nGShpbk/pub?start=true&loop=false&delayms=60000
Talk presented by Aarón Fas & Andrés Viedma at the JBcnConf 2015.
'Microservices' is one of the most popular buzzwords in the industry now, but are they really a step forward? Or might they be more of a problem than a solution? When are they really helpful? How should they be approached? What challenges will we face if we decide to implement a microservices-based architecture?
One year ago, Tuenti moved from a monolithic PHP backend to a Java + PHP microservices architecture. In this talk, we'll share our experiences so far: how we addressed the change, how we implemented it, why we think it's been valuable for us (and how is that related to the company culture), why it might not be a good idea for your company / application and, mostly, what lessons we have learned from this experience.
This module has been created to answer all the questions on how IPFS can be used for dynamic real-time applications. In this module, you will learn:
- how to reason about dynamic data on IPFS,
- IPNS, the simplest construction for naming in IPFS,
- how PubSub can offer subsecond speeds for interactive applications,
- how CRDTs are a fundamental building block for distributed applications,
- what is available in the ecosystem.
Engineering software is widely employed for its powerful abstraction of scientific and technical knowledge. It enables productive applications, e.g., analysis, prototyping, and manufacturing. Making engineering software requires a profound understanding of the problem domain, as well as the art of engineering it.
Software engineering differs substantially from conventional engineering. To professionally build software, mathematicians, scientists, and engineers need skills including system administration, automated builds, automated testing, and version control, to name but a few. Computer science knowledge like algorithms and data structures is also indispensable. It is a joyful, interdisciplinary, and world-changing enterprise worth sharing with all future engineering practitioners.
Time to say goodbye to your Nagios-based setup. Discover all the cool new tools out there to do more efficient monitoring. A talk given at OSMC 2014.
https://www.youtube.com/watch?v=_BAWi9Zhmic
OSMC 2014: Time to say goodbye to your Nagios setup | Oliver Jan (NETWAYS)
Three years ago at OSMC I presented a decade's worth of new trends and possibilities in open source monitoring. Now it's time to see whether we can do monitoring without Nagios. We'll then explore:
- Collectd, Diamond, Packetbeat, StatsD and Logstash for collecting metrics and events
- Graphite, InfluxDB for storing time series data
- Elasticsearch for storing events
- Kibana and Grafana for displaying and searching metrics and events.
- Seyren and Cabot for notifications
What is possible with such a solution? How does it work compared to Nagios? Is it iso-functional, maybe even better than a Nagios-based solution? Is there any migration path from Nagios?
We'll try to answer all these questions and maybe more!
Hail hydrate! from stream to lake using open source (Timothy Spann)
(VIRTUAL) Hail Hydrate! From Stream to Lake Using Open Source - Timothy J Spann, StreamNative
https://osselc21.sched.com/event/lAPi?iframe=no
A cloud data lake that is empty is not useful to anyone. How can you quickly, scalably, and reliably fill your cloud data lake with the diverse sources of data you already have and new ones you never imagined you needed? Utilizing open source tools from Apache, the FLiP stack enables any data engineer, programmer, or analyst to build reusable modules with low or no code. FLiP utilizes Apache NiFi, Apache Pulsar, Apache Flink, and MiNiFi agents to load CDC, logs, REST, XML, images, PDFs, documents, text, semistructured data, unstructured data, structured data, and a hundred data sources you could never dream of streaming before. I will teach you how to fish in the deep end of the lake and return a data engineering hero. Let's hope everyone is ready to go from 0 to Petabyte hero.
https://osselc21.sched.com/event/lAPi/virtual-hail-hydrate-from-stream-to-lake-using-open-source-timothy-j-spann-streamnative
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK (Timothy Spann)
Building Real-Time Pipelines With FLaNK
Timothy Spann, Principal Developer Advocate, Streaming - Cloudera Future of Data meetup, startup grind, AI Camp
The combination of Apache Flink, Apache NiFi, and Apache Kafka for building real-time data processing pipelines is extremely powerful, as demonstrated by this case study using the FLaNK-MTA project. The project leverages these technologies to process and analyze real-time data from the New York City Metropolitan Transportation Authority (MTA). FLaNK-MTA demonstrates how to efficiently collect, transform, and analyze high-volume data streams, enabling timely insights and decision-making.
Apache NiFi
Apache Kafka
Apache Flink
Apache Iceberg
LLM
Generative AI
Slack
PostgreSQL
Go fits perfectly inside containers: you can ship apps as tiny images on k8s and distribute them across the globe. Gianluca will show how InfluxData debugs containers running on Kubernetes, allowing sysadmins and developers to troubleshoot and replicate issues using core dumps, debuggers, and logs.
Go applications are perfect to run inside a container. You can build a single binary and a tiny Docker image, and ship them to your Kubernetes cluster. A successful production environment requires stability and simplicity; it needs to be easy to troubleshoot, and operators need to be able to get all the information developers will need to fix a bug. During this talk, Gianluca will share what InfluxData is doing to allow developers and system administrators to work together, understanding problems running live at scale on Kubernetes and how to escalate them down to software engineers using logs, delve, gdb, core dumps, and traces to replicate and fix issues.
Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and... (apidays)
Apidays Paris 2023 - Software and APIs for Smart, Sustainable and Sovereign Societies
December 6, 7 & 8, 2023
Forget TypeScript, Choose Rust to build Robust, Fast and Cheap APIs
Zacaria Chtatar, Backend Software Engineer at HaveSomeCode
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka (Guido Schmutz)
Many Big Data and IoT use cases are based on combining data from multiple data sources and making it available on a Big Data platform for analysis. The data sources are often very heterogeneous, from simple files and databases to high-volume event streams from sensors (IoT devices). It's important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real time (stream processing) as well as in batch (typical big data processing). In the past few years, some new tools have emerged which are especially capable of handling the process of integrating data from outside, often called data ingestion. From an outside perspective, they are very similar to traditional Enterprise Service Bus infrastructures, which larger organizations often use to handle message-driven and service-oriented systems. But there are also important differences: they are typically easier to scale horizontally, offer a more distributed setup, are capable of handling high volumes of data/messages, provide very detailed monitoring at the message level, and integrate very well with the Hadoop ecosystem. This session will present and compare Apache Flume, Apache NiFi, StreamSets, and the Kafka ecosystem, and show how they handle data ingestion in a Big Data solution architecture.
Kubernetes is not needed by 90 percent of companies (Ivan Glushkov)
The euphoria and hype around Kubernetes keep companies from taking a sober look at the complexity, problems, and risks of migrating to Kubernetes.
I will try to cool the ardor of the boldest and most naive, and show that this path is very thorny and dangerous:
I will go through a list of the standard problems,
show what to pay attention to when planning a migration,
and advise you to "not touch anything while it works".
Standard and banal, but it may save you a lot of nerves and money.
NewSQL overview:
- History of RDBMSs
- The reasons why NoSQL concept appeared
- Why NoSQL was not enough, the necessity of NewSQL
- Characteristics of NewSQL
- 7 DBs that belong to NewSQL
- Overview Table with main properties
OpenMetadata Community Meeting - 5th June 2024 (OpenMetadata)
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus... (Globus)
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
May Marketo Masterclass, London MUG May 22 2024.pdf (Adele Miller)
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
GraphSummit Paris - The art of the possible with Graph Technology (Neo4j)
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
First Steps with Globus Compute Multi-User Endpoints (Globus)
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
An Enterprise Resource Planning system includes various modules that reduce any business's workload. Additionally, it organizes workflows, which drives enhanced productivity. Here is a detailed explanation of the ERP modules. Going through the points will help you understand how the software is changing work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Workshop - Innovating with Generative AI and Knowledge Graphs (Neo4j)
Go beyond the media hype around AI and discover practical techniques for using AI responsibly with your organization's data. Explore how to use knowledge graphs to increase accuracy, transparency, and explainability in generative AI systems. You will leave with hands-on experience combining data relationships and LLMs to bring domain-specific context and improve reasoning.
Bring your laptop and we will guide you through setting up your own generative AI stack, providing practical, coded examples to get started in minutes.
Globus Compute with IRI Workflows - GlobusWorld 2024 (Globus)
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Software Engineering, Software Consulting, Tech Lead, Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Transaction, Spring MVC, OpenShift Cloud Platform, Kafka, REST, SOAP, LLD & HLD.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv... (Shahin Sheidaei)
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... (Globus)
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Developing Distributed High-performance Computing Capabilities of an Open Sci... (Globus)
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam (takuyayamamoto1800)
In these slides, we show a simulation example and how to compile the solver.
The Helmholtz equation can be solved with helmholtzFoam; the Helmholtz equation with uniformly dispersed bubbles can be simulated with helmholtzBubbleFoam.
16. Design DT: Use Cases
❖ Log one request through all the services
❖ Gather all operations information (result, time)
❖ Build Dependency Graph
❖ Analytics (“Dapper” paper)
❖ Tags, Logs, Artifacts for each operation
❖ Lines of Business analytics
❖ QoS, Traffic Control
24. Design DT: Idea
❖ User Request ID -> to pass to every subsystem:
❖ HTTP: headers
❖ gRPC: additional field / auto wrapping
❖ Event Bus: additional field / auto wrapping
❖ Subsystem to have sub-request ID
❖ Relation to the previous subsystem (parent/child, sequence, …)
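A minimal sketch of the HTTP-header variant of this idea (not from the deck; it assumes the Plug library, and the x-request-id header name is my choice):

defmodule MyApp.RequestId do
  # Reuse the caller's request ID, or mint a fresh one at the system edge.
  def fetch_or_create(conn) do
    case Plug.Conn.get_req_header(conn, "x-request-id") do
      [id | _] -> id
      [] -> 16 |> :crypto.strong_rand_bytes() |> Base.encode16(case: :lower)
    end
  end

  # Attach it to outgoing calls so the next subsystem can continue the chain.
  def propagate(headers, req_id), do: [{"x-request-id", req_id} | headers]
end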
29. Design DT: Problems
❖ Too many traces -> OOM or CPU is 100%
❖ Too few traces -> miss problems
❖ Deciding “on the fly” is difficult
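The usual answer to this trade-off is sampling. A library-free sketch of head-based probabilistic sampling (module name and rate are hypothetical):

defmodule Sampler do
  @rate 0.001  # keep roughly 1 in 1,000 traces

  # The decision is made once, when the trace starts, and every child span
  # inherits it; that is exactly why tuning the rate "on the fly" is hard.
  def sample? do
    :rand.uniform() < @rate
  end
end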
30. OpenTracing
❖ Cloud Native Computing Foundation (cncf.io) incubating project
❖ Uber, Apple, Pinterest, Couchbase
❖ API specification, libraries
31. OpenTracing: Concepts
❖ Trace
❖ Span: name, start time, end time
❖ Span: kv tags, kv logs, baggage items
❖ SpanContext
❖ Scopes + Threading + ActiveSpan
❖ Tracers: API + ready solutions
❖ Carriers: API to inject/extract SpanContext
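A carrier is just an inject/extract pair over some transport. An illustrative sketch using Zipkin's B3 header names (not any real tracer's API):

defmodule Carrier do
  # Inject the SpanContext into outgoing HTTP headers.
  def inject(%{trace_id: trace_id, span_id: span_id}, headers) do
    [{"x-b3-traceid", trace_id}, {"x-b3-spanid", span_id} | headers]
  end

  # Extract it on the receiving side, or report that there is no parent.
  def extract(headers) do
    with {_, trace_id} <- List.keyfind(headers, "x-b3-traceid", 0, :missing),
         {_, span_id} <- List.keyfind(headers, "x-b3-spanid", 0, :missing) do
      {:ok, %{trace_id: trace_id, span_id: span_id}}
    else
      :missing -> :no_parent
    end
  end
end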
32. OpenTracing: Flow
1. get SpanContext or start Trace => span.start(SpanContext)
2. span.store(tags/metrics/logs/baggage)
3. continue with the SpanContext: run another function, send an async message, or make an HTTP request with the SpanContext in the headers
4. span.finish()
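A minimal sketch of these four steps using the :otter library (the same one the Ex_Ray example below wraps); I am treating otter's functional span API names as assumptions:

# 1. start a trace/span (or continue from a received SpanContext)
span = :otter.start(:checkout)
# 2. store tags and logs on the span
span = :otter.tag(span, :user_id, 42)
span = :otter.log(span, "calling payment service")
# 3. hand the trace/span ids to the next hop (function, message, or HTTP headers)
# 4. finish the span, which ships it to the collector
:otter.finish(span)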
45. OpenTracing: Ex_Ray (Elixir)
defmodule Nested do
  use ExRay, pre: :before_fun, post: :after_fun
  …
  @trace kind: :critical
  def fred(a, b), do: blee(a, b)
  …
  defp before_fun(ctx) do
    Span.open(ctx.target, @req_id)
    |> :otter.tag(:kind, ctx.meta[:kind])
    |> :otter.log(">>> #{ctx.target} with #{ctx.args |> inspect}")
  end
end
46. OpenTracing: Ex_Ray (Elixir): Summary
❖ Less code needed
❖ Low-quality code
❖ Memory leaks
❖ Exceptions are not re-raised in wrappers
❖ No default agreements
47. OpenCensus
❖ Started in Google
❖ Large community (Microsoft, Datadog, Prometheus, …)
❖ Automatic Context Propagation
❖ Reference implementation of the official W3C HTTP tracing header
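For reference, the W3C trace context header packs version, trace-id, parent-id, and flags into a single traceparent value; the canonical example from the spec:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01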
48. OpenCensus: Concepts
❖ Trace, Span - similar to OpenTracing
❖ Link between spans: child/parent/unknown
❖ Sampling: Always / Never / Probabilistic (1 in 10000) / RateLimiting (10 per sec)
❖ Automatic Context Propagation
❖ Stats/Metrics
52. OpenCensus Erlang
❖ Public GitHub repo for all Elixir/Erlang libs
❖ Libs for web servers (Elli, Cowboy, Phoenix, …)
❖ Integrate with minimum effort
53. OpenCensus Erlang
❖ ETS table for Span data + GC for abandoned Spans
❖ Track SpanContext: process dict / variable
❖ Parse transform or manual context tracking
❖ Logger can receive SpanContext
❖ Metrics
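A toy version of the process-dictionary option, to show the shape of what the library automates (this is not opencensus-erlang's actual API):

defmodule Ctx do
  # Run fun with ctx as the current SpanContext, restoring the old one after.
  def with_span_ctx(ctx, fun) do
    prev = Process.put(:span_ctx, ctx)

    try do
      fun.()
    after
      if prev, do: Process.put(:span_ctx, prev), else: Process.delete(:span_ctx)
    end
  end

  def current, do: Process.get(:span_ctx)
end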
57. OpenCensus Elixir
❖ Uses opencensus-erlang (e.g. prepares headers with SpanContext)
❖ Implements a macro:
with_child_span "span1" do
  …
end
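In context, usage looks roughly like the sketch below (the Opencensus.Trace module path is my assumption, and the module and function names are placeholders):

defmodule Shop do
  import Opencensus.Trace  # module path is an assumption

  def checkout(cart) do
    with_child_span "checkout" do
      charge(cart)  # runs inside the child span; it closes when the block exits
    end
  end

  defp charge(_cart), do: :ok
end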
58. OpenCensus Elixir: Phoenix
❖ Uses “Phoenix Instrumenter”
❖ Creates a Span for any Controller or View
❖ Integration (config.exs):
instrumenters: [OpencensusPhoenix.Instrumenter]
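In a Phoenix app that option sits in the endpoint configuration; a sketch with placeholder app and endpoint names:

# config/config.exs (app and endpoint names are placeholders)
config :my_app, MyAppWeb.Endpoint,
  instrumenters: [OpencensusPhoenix.Instrumenter]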
59. OpenCensus Elixir: Plug
❖ Integrates into any pipeline with “Plug”
❖ Gets the parent Span from headers
❖ Creates a child Span with new attributes (calls a function to get them)
❖ Integration:
defmodule MyApp.TracePlug do
  # some custom configuration
end

plug MyApp.TracePlug
60. OpenCensus BEAM: Summary
❖ A lot of libraries ready to be used
❖ Seamless integration with other languages
❖ You need to understand the concept
62. Summary
❖ A lot of advantages: Introspection, Analytics, LoB, QoS
❖ Think about sending metrics with OpenCensus
❖ Easy to integrate even with Erlang/Elixir
63. Breaking News
❖ Update: May 21st
❖ OpenTracing + OpenCensus => OpenTelemetry
❖ Backward compatibility for both projects
❖ Nov 2019: read-only mode for OpenTracing, OpenCensus