This document, Estimating Value at Risk with Spark, discusses estimating financial risk using Apache Spark. It introduces the value-at-risk (VaR) metric, which estimates the loss a portfolio is not expected to exceed over a given time period at a given confidence level (for example, the loss that is exceeded with only 5% probability). It describes approaches to VaR estimation, including variance-covariance, historical simulation, and Monte Carlo simulation. It then outlines how to predict instrument returns from market risk factors using linear regression models in Spark. The document shows how to generate multivariate normal random samples to simulate factor returns, run trials to estimate portfolio losses, and calculate VaR and expected shortfall. It argues that Spark makes financial risk analysis easier and more powerful: one platform handles cleaning data, fitting models, running simulations, and analyzing results; the full simulation-loss matrix can be kept in memory and joined with other datasets; and the matrix operations that dominate the computation can be handed off to BLAS libraries or GPUs.
14. Fancier
• Add features that are non-linear transformations of the market risk factors
• Decision trees
• For options, use Black-Scholes
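As one hedged illustration of the first bullet, the sketch below expands a vector of factor returns with signed-square and signed-square-root versions of each factor, so a linear model fit on the result can capture some non-linear response to the factors. The `Features.featurize` name and the particular transforms are assumptions chosen for illustration, not taken from the slides.

```scala
// Hypothetical helper: the transforms below are illustrative choices, not from the slides.
object Features {
  def featurize(factorReturns: Array[Double]): Array[Double] = {
    // Signed square: keeps the sign of each move while amplifying large moves
    val squared = factorReturns.map(x => math.signum(x) * x * x)
    // Signed square root: keeps the sign of each move while damping large moves
    val squareRooted = factorReturns.map(x => math.signum(x) * math.sqrt(math.abs(x)))
    // The original factors are kept alongside the transformed ones
    squared ++ squareRooted ++ factorReturns
  }
}
```

If the instrument models are fit on featurized factor returns, each simulated trial has to be featurized the same way before the models are applied, so the number of coefficients still lines up with the number of features.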
15.
import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression
import org.apache.spark.rdd.RDD
// Load the instruments and factors
// (rows of factorReturns are observations, columns are factors)
val factorReturns: Array[Array[Double]] = ...
val instrumentReturns: RDD[Array[Double]] = ...
// Fit a model to each instrument; each model is [intercept, one coefficient per factor]
val models: Array[Array[Double]] =
  instrumentReturns.map { instrument =>
    val regression = new OLSMultipleLinearRegression()
    regression.newSampleData(instrument, factorReturns)
    regression.estimateRegressionParameters()
  }.collect()
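To make the shape of each fitted model concrete, here is a dependency-free sketch of the single-factor special case of ordinary least squares. It returns `[intercept, slope]`. Commons-math's `OLSMultipleLinearRegression` likewise fits an intercept by default (unless `setNoIntercept(true)` is called) and places it first in the array returned by `estimateRegressionParameters()`, so code that applies these models should add `model(0)` and read factor coefficients starting at `model(1)`. The `SimpleOls` object is hypothetical and meant only for intuition.

```scala
// Hypothetical single-factor OLS, for intuition only (the slides use commons-math for the general case).
object SimpleOls {
  // Returns Array(intercept, slope) minimizing the sum of squared residuals of y on x.
  // Argument order (y, x) mirrors newSampleData(y, x).
  def fit(y: Array[Double], x: Array[Double]): Array[Double] = {
    require(x.length == y.length && x.length >= 2, "need at least two paired observations")
    val xMean = x.sum / x.length
    val yMean = y.sum / y.length
    // slope = covariance(x, y) / variance(x); the common 1 / (n - 1) factors cancel
    val sxy = x.indices.map(i => (x(i) - xMean) * (y(i) - yMean)).sum
    val sxx = x.map(xi => (xi - xMean) * (xi - xMean)).sum
    val slope = sxy / sxx
    Array(yMean - slope * xMean, slope)
  }
}
```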
16. How to sample factor returns?
• Need to be able to generate sample vectors where each component is a factor return.
• Factor returns are usually correlated.
19. The Multivariate Normal Distribution
• Probability distribution over vectors of length N
• Given all the variables but one, that variable is distributed according to a univariate normal distribution
• Models correlations between variables
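The slides rely on commons-math's `MultivariateNormalDistribution` for sampling. As a dependency-free sketch of one standard way to draw correlated samples (not necessarily how commons-math does it internally), the code below factors the covariance matrix as L Lᵀ with a Cholesky decomposition and returns mean + L z, where z holds independent standard normal draws. The `MvNormal` object and its method names are hypothetical.

```scala
// Hypothetical sketch of multivariate normal sampling via a Cholesky factor.
object MvNormal {
  // Returns lower-triangular L with L * L^T == a; `a` must be symmetric positive definite
  def cholesky(a: Array[Array[Double]]): Array[Array[Double]] = {
    val n = a.length
    val l = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 to i) {
      var acc = a(i)(j)
      for (k <- 0 until j) acc -= l(i)(k) * l(j)(k)
      if (i == j) {
        require(acc > 0.0, "matrix is not positive definite")
        l(i)(i) = math.sqrt(acc)
      } else {
        l(i)(j) = acc / l(j)(j)
      }
    }
    l
  }

  // One draw: mean + L * z, where z is a vector of independent standard normals
  def sample(mean: Array[Double], chol: Array[Array[Double]], rng: scala.util.Random): Array[Double] = {
    val n = mean.length
    val z = Array.fill(n)(rng.nextGaussian())
    Array.tabulate(n) { i =>
      var acc = mean(i)
      for (k <- 0 to i) acc += chol(i)(k) * z(k)
      acc
    }
  }
}
```

The decomposition is computed once per covariance matrix; each subsequent draw costs only a triangular matrix-vector product.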
21.
import org.apache.commons.math3.stat.correlation.Covariance
// Compute means (one per factor; rows of factorReturns are observations)
val factorMeans: Array[Double] = factorReturns.transpose
  .map(factor => factor.sum / factor.size)
// Compute covariances
val factorCovs: Array[Array[Double]] = new Covariance(factorReturns)
  .getCovarianceMatrix().getData()
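For readers without commons-math at hand, the sketch below computes the same two quantities in plain Scala, assuming (as on the slide) that each row of the input is one observation and each column one factor. It divides by n - 1, which matches the bias-corrected estimate that commons-math's `Covariance` produces by default. The `FactorStats` object is hypothetical.

```scala
// Hypothetical plain-Scala equivalent of the factor means and covariance matrix.
object FactorStats {
  // Column means: one mean per factor
  def means(rows: Array[Array[Double]]): Array[Double] =
    rows.transpose.map(col => col.sum / col.length)

  // Sample covariance matrix with the (n - 1) denominator
  def covariance(rows: Array[Array[Double]]): Array[Array[Double]] = {
    val n = rows.length
    val mu = means(rows)
    val k = mu.length
    Array.tabulate(k, k) { (a, b) =>
      rows.map(r => (r(a) - mu(a)) * (r(b) - mu(b))).sum / (n - 1)
    }
  }
}
```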
22. Fancier
• Multivariate normal often a poor choice compared to more sophisticated options
• Fatter tails: Multivariate T Distribution
• Filtered historical simulation
• ARMA
• GARCH
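As a sketch of the first alternative, a multivariate t sample can be drawn by scaling a correlated normal draw by sqrt(ν / W), where W is chi-squared with ν degrees of freedom; smaller ν gives fatter tails. The code assumes an integer ν (so W can be built as a sum of squared standard normals) and takes the Cholesky factor of the scale matrix, not the covariance, as input; for ν > 2 the covariance is ν / (ν - 2) times the scale matrix. `MvT.sample` is a hypothetical name.

```scala
// Hypothetical multivariate t sampler (integer degrees of freedom assumed).
object MvT {
  // `chol` is the lower-triangular Cholesky factor of the scale matrix
  def sample(mean: Array[Double], chol: Array[Array[Double]], dof: Int, rng: scala.util.Random): Array[Double] = {
    val n = mean.length
    val z = Array.fill(n)(rng.nextGaussian())
    // Chi-squared with `dof` degrees of freedom as a sum of squared standard normals
    val chiSq = Iterator.fill(dof)(rng.nextGaussian()).map(g => g * g).sum
    // One shared scale per draw: this is what couples the components and fattens the tails
    val scale = math.sqrt(dof / chiSq)
    Array.tabulate(n) { i =>
      var acc = 0.0
      for (k <- 0 to i) acc += chol(i)(k) * z(k)
      mean(i) + scale * acc
    }
  }
}
```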
23. Running the Simulations
• Create an RDD of seeds
• Use each seed to generate a set of simulations
• Aggregate results
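The three steps above can be mimicked locally to show why each task gets its own seed: a task's random stream depends only on its seed, so re-running a task (as Spark may do when it recomputes a lost partition) reproduces the same trials. In this sketch a scaled standard normal stands in for a real portfolio trial, and `SeededTrials`, `runTask`, and `runAll` are hypothetical names whose parameters mirror those in the Spark code on slide 25.

```scala
// Hypothetical local analogue of the seed-per-task simulation pattern.
object SeededTrials {
  // Stand-in for one task: `trialsPerTask` draws from a generator seeded only by `seed`.
  // A real task would sample factor returns and apply the instrument models instead.
  def runTask(seed: Long, trialsPerTask: Int): Seq[Double] = {
    val rng = new scala.util.Random(seed)
    Seq.fill(trialsPerTask)(rng.nextGaussian() * 0.01)
  }

  // Like parallelizing the seeds and flatMapping over them: one seed per task,
  // with all tasks' results concatenated into a single collection of trial returns
  def runAll(baseSeed: Long, parallelism: Int, trialsPerTask: Int): Seq[Double] =
    (baseSeed until baseSeed + parallelism).flatMap(seed => runTask(seed, trialsPerTask))
}
```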
24.
import org.apache.commons.math3.distribution.MultivariateNormalDistribution
def trialReturn(factorDist: MultivariateNormalDistribution, models: Seq[Array[Double]]): Double = {
  val trialFactorReturns = factorDist.sample()
  var totalReturn = 0.0
  for (model <- models) {
    // Each model is [intercept, one coefficient per factor], as fit on slide 15
    totalReturn += model(0)
    // Add the returns from the instrument to the total trial return
    for (i <- 0 until trialFactorReturns.length) {
      totalReturn += trialFactorReturns(i) * model(i + 1)
    }
  }
  totalReturn
}
25.
// Broadcast the factor return -> instrument return models
val bModels = sc.broadcast(models)
// Generate a seed for each task
val seeds = (baseSeed until baseSeed + parallelism)
val seedRdd = sc.parallelize(seeds, parallelism)
// Create an RDD of trials. trialReturns is a helper not shown on the slides; presumably it
// seeds a factor distribution with `seed` and returns the results of trialsPerTask trials
val trials: RDD[Double] = seedRdd.flatMap { seed =>
  trialReturns(seed, trialsPerTask, bModels.value, factorMeans, factorCovs)
}
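The slides stop at producing the RDD of trial returns. As a local sketch of the last step, assuming the trial returns have been collected into an array, VaR at a 5% tail can be read off as the boundary of the worst 5% of trials, and expected shortfall as the average of those same trials. Both are reported as returns, so a loss shows up as a negative number. The `RiskMetrics` object is hypothetical; on an RDD, `takeOrdered` could fetch the worst trials without sorting the full dataset.

```scala
// Hypothetical local VaR and expected shortfall over simulated trial returns.
object RiskMetrics {
  // The worst `fraction` of trials (at least one), sorted from worst to least bad
  private def worstTail(trialReturns: Array[Double], fraction: Double): Array[Double] = {
    val numWorst = math.max((trialReturns.length * fraction).toInt, 1)
    trialReturns.sorted.take(numWorst)
  }

  // VaR: the least bad return within the worst tail
  def valueAtRisk(trialReturns: Array[Double], fraction: Double): Double =
    worstTail(trialReturns, fraction).last

  // Expected shortfall (conditional VaR): the mean return within the worst tail
  def expectedShortfall(trialReturns: Array[Double], fraction: Double): Double = {
    val tail = worstTail(trialReturns, fraction)
    tail.sum / tail.length
  }
}
```

Expected shortfall is always at least as severe as VaR at the same tail probability, because it averages over the trials beyond the VaR boundary.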
33. Easier to use
• Scala and Python REPLs
• Single platform for:
  • Cleaning data
  • Fitting models
  • Running simulations
  • Analyzing results
34. New powers
• Save full simulation-loss matrix in memory (or on disk)
• Run deeper analyses
• Join with other datasets
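Keeping the full matrix, rather than only each trial's portfolio total, is what makes breakdowns like the following possible. As one hedged example of a deeper analysis, this sketch attributes expected shortfall to instruments: it picks the worst trials by portfolio total and averages each instrument's return over them, so the per-instrument figures add up to the portfolio's expected shortfall. The matrix layout (rows are trials, columns are instruments) and the `TailAttribution` name are assumptions.

```scala
// Hypothetical per-instrument breakdown of expected shortfall from a full trial matrix.
object TailAttribution {
  // lossMatrix(t)(j): return of instrument j in trial t
  def componentShortfall(lossMatrix: Array[Array[Double]], fraction: Double): Array[Double] = {
    val numWorst = math.max((lossMatrix.length * fraction).toInt, 1)
    // Rank trials by total portfolio return and keep the worst ones
    val worstTrials = lossMatrix.sortBy(_.sum).take(numWorst)
    val numInstruments = lossMatrix.head.length
    // Average each instrument's return over the worst trials
    Array.tabulate(numInstruments) { j =>
      worstTrials.map(_(j)).sum / numWorst
    }
  }
}
```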
35. But it’s CPU bound and we’re using Java?
• Computational bottlenecks are normally in matrix operations, which can be BLAS-ified
• Can call out to GPUs just like in C++
• Memory access patterns don’t induce heavy garbage collection
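To see why the heavy lifting is matrix arithmetic, note that applying every instrument model to every trial is a single matrix product: the (instruments × (k + 1)) coefficient matrix, with intercepts in column 0, times a ((k + 1) × trials) matrix whose columns are [1, f_1, ..., f_k]. The plain loop below computes that product directly; a BLAS-backed version would hand the same work to one general matrix multiply (gemm) call. `InstrumentTrialMatrix` and the [intercept, coefficients] layout are assumptions.

```scala
// Hypothetical naive version of the instruments-by-trials product that BLAS would accelerate.
object InstrumentTrialMatrix {
  // coefficients(j): [intercept, beta_1, ..., beta_k] for instrument j
  // factorTrials(t): sampled factor returns for trial t (length k)
  // Returns result(j)(t): return of instrument j in trial t
  def compute(coefficients: Array[Array[Double]], factorTrials: Array[Array[Double]]): Array[Array[Double]] = {
    val numInstruments = coefficients.length
    val numTrials = factorTrials.length
    val result = Array.ofDim[Double](numInstruments, numTrials)
    for (j <- 0 until numInstruments; t <- 0 until numTrials) {
      val coef = coefficients(j)
      val factors = factorTrials(t)
      var acc = coef(0) // intercept, i.e. the coefficient on the implicit leading 1
      var i = 0
      while (i < factors.length) {
        acc += coef(i + 1) * factors(i)
        i += 1
      }
      result(j)(t) = acc
    }
    result
  }
}
```

Summing each column of the result gives the per-trial portfolio returns, while the full matrix is the simulation-loss matrix that the "New powers" slide suggests keeping in memory.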