A time series is a sequence of data points ordered in time. Time series forecasting has two main purposes: to understand the mechanisms behind rises and falls, and to predict future values. It often analyses trends, cyclical events and seasonality, and is of particular importance in economics and business. Because of the temporal dependence on previous data points, the quality of predictions can only be evaluated in the future, and many model types exist for approximation. In this session we are going to talk about challenges, ways of improvement and a technology stack including ML.NET, ARIMA, Python, Azure ML, regression and FB Prophet.
UNIVARIATE & BIVARIATE ANALYSIS
UNIVARIATE, BIVARIATE & MULTIVARIATE
UNIVARIATE ANALYSIS
-One variable analysed at a time
BIVARIATE ANALYSIS
-Two variables analysed at a time
MULTIVARIATE ANALYSIS
-More than two variables analysed at a time
TYPES OF ANALYSIS
DESCRIPTIVE ANALYSIS
INFERENTIAL ANALYSIS
DESCRIPTIVE ANALYSIS
Transformation of raw data
Facilitate easy understanding and interpretation
Deals with summary measures relating to sample data
E.g. what is the average age of the sample?
INFERENTIAL ANALYSIS
Carried out after descriptive analysis
Inferences drawn on population parameters based on sample results
Generalizes results to the population based on sample results
E.g. is the average age of the population different from 35?
DESCRIPTIVE ANALYSIS OF UNIVARIATE DATA
1. Prepare frequency distribution of each variable
Missing Data
Situation where certain questions are left unanswered
Analysis of multiple responses
Measures of central tendency
3 measures of central tendency
1.Mean
2.Median
3.Mode
MEAN
Arithmetic average of a variable
Appropriate for interval and ratio scale data
x̄ = (Σxᵢ) / n
MEDIAN
Calculates the middle value of the data
Computed for ratio, interval or ordinal scale.
Data needs to be arranged in ascending or descending order
MODE
Point of maximum frequency
Should not be computed for ordinal or interval data unless grouped.
Widely used in business
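As an illustration, the three measures can be computed with Python's standard library; the sample of ages below is invented:

```python
import statistics

# Hypothetical sample of ages (ratio-scale data)
ages = [23, 25, 25, 31, 35, 40, 52]

print(statistics.mean(ages))    # arithmetic average: 33
print(statistics.median(ages))  # middle value after sorting: 31
print(statistics.mode(ages))    # most frequent value: 25
```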
MEASURE OF DISPERSION
Measures of central tendency do not describe the spread of a variable's distribution
4 measures of dispersion
1.Range
2.Variance and standard deviation
3.Coefficient of variation
4.Relative and absolute frequencies
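The first three dispersion measures can be sketched in Python on the same invented sample of ages:

```python
import statistics

ages = [23, 25, 25, 31, 35, 40, 52]

data_range = max(ages) - min(ages)  # range: maximum minus minimum
var = statistics.variance(ages)     # sample variance
sd = statistics.stdev(ages)         # sample standard deviation
cv = sd / statistics.mean(ages)     # coefficient of variation (unitless)
print(data_range, var, sd, cv)
```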
DESCRIPTIVE ANALYSIS OF BIVARIATE DATA
Three measures are commonly used:
1.Cross tabulation
2.Spearman's rank correlation coefficient
3.Pearson's linear correlation coefficient
Cross Tabulation
Responses of two questions are combined
Spearman’s rank order correlation coefficient.
Used in case of ordinal data
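A minimal, self-contained Python sketch of the two coefficients (data invented; the rank helper assumes no tied values). Spearman's coefficient is simply Pearson's coefficient applied to the ranks:

```python
import statistics

def pearson(x, y):
    # Pearson's linear correlation: covariance over product of deviations
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    # Spearman's rank correlation: Pearson applied to ranks (no ties assumed)
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson(x, y))   # 0.8
print(spearman(x, y))  # 0.8 (x and y are already rank-like here)
```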
Introduces common data management techniques in Stata. Topics covered include basic data manipulation commands such as: recoding variables, creating new variables, working with missing data, and generating variables based on complex selection criteria, merging and collapsing data sets. Intended for users who have an introductory level of knowledge of Stata software.
All workshop materials including slides, do files, and example data sets can be downloaded from http://projects.iq.harvard.edu/rtc/event/data-management-stata
Residuals represent variation in the data that cannot be explained by the model.
Residual plots useful for discovering patterns, outliers or misspecifications of the model. Systematic patterns discovered may suggest how to reformulate the model.
If the residuals exhibit no pattern, then this is a good indication that the model is appropriate for the particular data.
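As a toy illustration of the idea (numbers invented), fit a simple least-squares line and examine its residuals; the absence of a systematic pattern suggests the linear model is adequate:

```python
import statistics

# Hypothetical (x, y) data for a toy linear fit
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least-squares slope and intercept for y ≈ a + b*x
mx, my = statistics.mean(x), statistics.mean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

# Residuals: observed minus fitted; plot these against x to look for patterns
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(residuals)
```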
Time series forecasting with machine learning - Dr Wei Liu
An introduction to developing and applying time series forecasting models with both traditional time series methods and machine learning techniques. A case study of a challenging very short-term electricity price forecasting project is presented.
Quantitative data analysis - John Richardson (OUmethods)
Your project report should include: a viable research question; a critical literature review; a research proposal; and a work plan for the project. The proposed methods should include methods of data collection and methods of data analysis. Whether you are carrying out qualitative or quantitative research, you should know broadly how you are going to analyse your data before you collect them. And the work plan for your project should include a realistic estimate of the time it will take you to do the analysis. The aim of this presentation is to get you to think creatively about the kinds of analysis that might address your research problem.
Data Science - Part X - Time Series Forecasting - Derek Kane
This lecture provides an overview of time series forecasting techniques and the process of creating effective forecasts. We will go through some of the popular statistical methods, including time series decomposition, exponential smoothing, Holt-Winters, ARIMA and GLM models. These topics will be discussed in detail, and we will go through the calibration and diagnostics of time series models on a number of diverse datasets.
Data Lakehouse Symposium | Day 1 | Part 2 - Databricks
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
A Q-Q plot is a probability plot for assessing how closely two data sets agree, plotting their quantiles against each other.
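A rough sketch of the underlying idea in plain Python (samples invented): with equal sample sizes, the empirical quantiles are just the sorted values, and pairing them gives the points of the Q-Q plot; points near the line y = x indicate agreement.

```python
# Two hypothetical samples of equal size
a = [3, 1, 4, 1, 5, 9, 2, 6]
b = [2, 7, 1, 8, 2, 8, 1, 8]

# With equal sample sizes, corresponding quantiles are the sorted values
qq_pairs = list(zip(sorted(a), sorted(b)))
for qa, qb in qq_pairs:
    print(qa, qb)  # plot these pairs; closeness to y = x means agreement
```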
Overview of data collection methods and a deep dive on data (primary vs. secondary, qualitative and quantitative). Bias. Data processing and structured, unstructured and semi-structured data. Database jargon.
IoT with Azure Machine Learning and InfluxDB - Ivo Andreev
Devices from the IoT realm generate data at a rate and magnitude that make it practically impossible to retrieve valuable information without the support of adequate AI engines. Although it is one among many available solutions, Azure ML has proved to be a great balance between flexibility, usability and affordable price.
Storing and serving billions of data measurements over time is also a non-trivial task, addressed by the special class of time series DBs. Of these, InfluxDB is the most popular, provides comprehensive documentation and, above all, is available open source.
This session is about managing and understanding IoT data.
Practical deep learning for computer vision - Eran Shlomo
This is the presentation given at TLV DLD 2017. In this presentation we walk through the planning and implementation of a deep learning solution for image recognition, with a focus on the data.
It is based on the work we do at dataloop.ai and with its customers.
This project is helpful for time series analysis and forecasting. Better accuracy and metrics in short-term forecasting are provided for intermediate planning, with the target of reducing CO2 emissions. Different models, such as exponential smoothing techniques, linear statistical modeling and autoregressive models, are implemented to forecast the emissions, and the result is finally deployed on Streamlit.
Best practices for monitoring your IT infrastructure using StatsD. Find dashboard examples here: https://p.datadoghq.com/sb/9b246c4ade
Monitor StatsD easily with Datadog. Learn more at https://www.datadoghq.com
Time Series Anomaly Detection with .NET and Azure - Marco Parenzan
If you have any device or source that generates values over time (even a log from a service), you want to determine whether, in a given time frame, the time series is correct, or whether you can detect anomalies. What can you do as a developer (not a data scientist) with .NET or Azure? Let's see how in this session.
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME - HONGJOO LEE
45 min talk about collecting home network performance measures, analyzing and forecasting time series data, and building anomaly detection system.
In this talk, we will go through the whole process of data mining and knowledge discovery. Firstly we write a script to run speed test periodically and log the metric. Then we parse the log data and convert them into a time series and visualize the data for a certain period.
Next we conduct some data analysis: finding trends, forecasting, and detecting anomalous data. Several statistical and deep learning techniques are used for the analysis: ARIMA (Autoregressive Integrated Moving Average) and LSTM (Long Short-Term Memory).
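Not the talk's actual code, but the anomaly detection step can be sketched with a simple rolling-window three-sigma rule on hypothetical speed-test measurements:

```python
import statistics

# Hypothetical download-speed measurements (Mbps); 12.4 is the anomaly
speeds = [50.1, 49.8, 50.3, 50.0, 49.9, 12.4, 50.2, 49.7]

window = 5
anomalies = []
for i in range(window, len(speeds)):
    history = speeds[i - window:i]
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    if abs(speeds[i] - mean) > 3 * sd:  # simple 3-sigma rule
        anomalies.append(i)
print(anomalies)  # indices flagged as anomalous
```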
SQLBits Module 2 RStats Introduction to R and Statistics - Jen Stirrup
SQLBits Module 2 RStats Introduction to R and Statistics. This is a 90 minute segment of a full preconference workshop, focusing on data analytics with R.
4Developers 2015: Measure to fail - Tomasz Kowalczewski (PROIDEA)
YouTube: https://www.youtube.com/watch?v=H5F0D55nKX4&index=11&list=PLnKL6-WWWE_WNYmP_P5x2SfzJ7jeJNzfp
Tomasz Kowalczewski
Language: English
Hardware fails, applications fail, our code... well, it fails too (at least mine). To prevent software failure we test. Hardware failures are inevitable, so we write code that tolerates them, then we test. From tests we gather metrics and act upon them by improving parts that perform inadequately. Measuring right things at right places in an application is as much about good engineering practices and maintaining SLAs as it is about end user experience and may differentiate successful product from a failure.
In order to act on performance metrics such as max latency and consistent response times we need to know their accurate value. The problem with such metrics is that when using popular tools we get results that are not only inaccurate but also too optimistic.
During my presentation I will simulate services that require monitoring and show how gathered metrics differ from real numbers. All this while using what currently seems to be most popular metric pipeline - Graphite together with com.codahale metrics library - and get completely false results. We will learn to tune it and get much better accuracy. We will use JMeter to measure latency and observe how falsely reassuring the results are. We will check how graphite averages data just to helplessly watch important latency spikes disappear. Finally I will show how HdrHistogram helps in gathering reliable metrics. We will also run tests measuring performance of different metric classes
How can we scale forecasting to many people and problems within an organization? Here I argue for a strategy that makes model building equivalent to feature construction, making building a forecaster similar to building a classifier.
Similar to Forecasting time series powerful and simple
Cybersecurity and Generative AI - for Good and Bad vol.2 - Ivo Andreev
The presentation is an extended in-depth version review of cybersecurity challenges with generative AI, enriched with multiple demos, analysis, responsible AI topics and mitigation steps, also covering a broader scope beyond OpenAI service.
Popularity, demand and ease of access to modern generative AI technologies reveal new challenges in the cybersecurity landscape, varying from protecting the confidentiality and integrity of data to misuse and abuse of the technology by malicious actors. In this session we elaborate on monitoring and auditing, managing ethical implications and resolving common problems like prompt injections, jailbreaks, utilization in cyberattacks or generating insecure code.
Architecting AI Solutions in Azure for Business - Ivo Andreev
The topic is Azure solution architectures that involve IoT and AI to solve common business domain problems. With a near real-time recommender system and object detection with image recognition, we review the architecture, build it from the ground up and illustrate how typical realistic challenges can be addressed.
Cybersecurity Challenges with Generative AI - for Good and Bad - Ivo Andreev
The presentation is an extended in-depth version review of cybersecurity challenges with generative AI, enriched with multiple demos, analysis, responsible AI topics and mitigation steps, also covering a broader scope beyond OpenAI service.
Popularity, demand and ease of access to modern generative AI technologies reveal new challenges in the cybersecurity landscape, varying from protecting the confidentiality and integrity of data to misuse and abuse of the technology by malicious actors. In this session we elaborate on monitoring and auditing, managing ethical implications and resolving common problems like prompt injections, jailbreaks, utilization in cyberattacks or generating insecure code.
JS-Experts - Cybersecurity for Generative AI - Ivo Andreev
Popularity, demand and ease of access to modern generative AI technologies reveal new challenges in the cybersecurity landscape, varying from protecting the confidentiality and integrity of data to misuse and abuse of the technology by malicious actors. In this session we elaborate on monitoring and auditing, managing ethical implications and resolving common problems like prompt injections, jailbreaks, utilization in cyberattacks or generating insecure code.
This is a totally different perspective on LLMs.
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers - Ivo Andreev
Have you ever wondered why GPT models work? Do you ask questions like:
◉ How does GPT work? Why does the same problem receive different answers for different users? Is there a way to improve explainability?
◉ Can a GPT model provide its sources? Why does Bing Chat work differently? What are my ways to achieve better performance and improve completions?
◉ How can I work with data in my enterprise? What practical business cases could a generative AI model solve?
If you are tired of sessions just scratching the surface of OpenAI GPT, this one will go deeper and answer questions like why, why not and how.
Key Terms; ChatGPT Enterprise; Top Questions; Enterprise Data; Azure Search; Functions; Embeddings; Context Encoding; General Intelligence; Emerging Abilities; Chain of Thought; Plugins; Multimodal with DALL-E; Project Florence
OpenAI GPT in Depth - Questions and Misconceptions - Ivo Andreev
OpenAI GPT in depth – misconceptions and questions you would like answered
Have you ever wondered why GPT models work? Do you ask questions like:
How does GPT work? Why does the same problem receive different answers for different users? Is there a way to improve explainability? Can a GPT model provide its sources? Why does Bing Chat work differently? What are my ways to achieve better performance and improve completions? How can I work with data in my enterprise? What practical business cases could a generative AI model solve?
If you are tired of sessions just scratching the surface of OpenAI GPT, this one will go deeper and answer questions like why, why not and how.
Cutting Edge Computer Vision for Everyone - Ivo Andreev
Microsoft offers a wide range of tools and advanced solutions to support you in managing computer vision related tasks.
From purely coding approaches with ML.NET, through zero-code ComputerVision.ai to advanced and flexible AI service in Azure ML, there is a solution for every need and each type of person.
From running on premises, through managed infrastructure, to fully cloud-based services, the speed of getting to the desired results and the return on investment are guaranteed.
Join this session to get insights about the options, deployment, pricing, pros and cons compared and select the most appropriate tech for your business case.
Collecting and Analysing Spaceborn Data - Ivo Andreev
Communicating with space and analysing satellite data
Azure reaches beyond the clouds and brings spaceborne satellite data to your subscription for analysis and discovering insights.
Satellite as a service, Azure Orbital and a whole new ecosystem signal the ambition to push the limits and explore new opportunities.
In this session we are talking about geospatial AI-based analysis and a comprehensive flow that will allow you to touch a vector of increasing importance for extending the cloud and helping businesses make tactical decisions.
Collecting and Analysing Satellite Data with Azure Orbital - Ivo Andreev
Azure reaches beyond the clouds and brings spaceborne satellite data to your subscription for analysis and discovering insights.
Satellite as a service, Azure Orbital and a whole new ecosystem signal the ambition to push the limits and explore new opportunities.
In this session we are talking about geospatial AI-based analysis and a comprehensive flow that will allow you to touch a vector of increasing importance for extending the cloud and helping businesses make tactical decisions.
Azure Orbital - a fully managed, cloud-based ground station as a service that enables you to communicate with your spacecraft or satellites and generate products for customers.
Azure Orbital handles machine-to-machine communication for the user based on the schedule and TLE location of satellites.
Azure software modules decrypt satellite data and prepare for usage.
Since November 2021, Azure Cognitive Service for Language has a fresh tool: the Language Studio, now in Preview. The studio offers multiple prebuilt and preconfigured models which allow you to quickly implement, test and deploy tasks like understanding conversational language, extracting information, classifying text or answering questions. It goes further and offers multiple features to create, train and deploy custom models that model your data and serve your needs best. Language Studio does that by utilizing workflows that let developers build models without the need for ML knowledge and deploy the results as handy APIs.
Cosmos DB is among the top databases, with its strengths being a flexible, extremely scalable hosted model, high SLAs, low latency, global distribution, automatic indexing, two-dimensional redundancy and granular access control. But how well does it suit IoT, and for what scenarios is it appropriate?
Constrained Optimization with Genetic Algorithms and Project Bonsai - Ivo Andreev
Traditional machine learning requires volumes of labelled data that can be time-consuming and expensive to produce. Machine teaching leverages the human capability to decompose and explain concepts in order to give the training of machine learning models direction (teaching the correct answer not by showing the data for it, but by having a person show the answer).
Project Bonsai is a low-code platform for intelligent solutions, but with a different perspective on data: it allows a completely new approach to tasks, especially when the physical world is involved. Under the hood it combines machine teaching, calibration and optimization to create intelligent control systems using simulations. The teaching curriculum is written in a new language concept, "Inkling", and training a model is easy and interactive.
Azure security guidelines for developers - Ivo Andreev
Azure security baselines and benchmarks, Security Maturity Model, Industrial Internet Consortium (IIC), certification, Web Application Firewall, API Management Service
Autonomous Machines with Project Bonsai - Ivo Andreev
Autonomous machines rely on fusion of many technologies to sense, plan, optimize and act as if an intelligent superhuman is in control.
Project Bonsai is a machine teaching service that combines machine learning (ML), calibration and optimization to create intelligent control systems using simulations. The teaching curriculum is performed using a proprietary “Inkling” language close to JavaScript and training a model is easy and interactive. Join this session for a Bonsai jump start and a demo and try it yourself – it is free.
Global Azure Virtual 2021 - Azure Lighthouse - Ivo Andreev
Azure Lighthouse provides capabilities to perform cross-tenant management at scale.
We do this by providing you the ability to view and manage multiple customers from a single context.
Building a scalable business model in the cloud is a real challenge, of incomparably greater complexity than project-based solutions.
If you want to offer a solution in the cloud and onboard multiple customers, the next step is to consider how you would deploy, maintain and monitor such an environment. What Azure Lighthouse is, and how to make your first steps following good practices, is the answer to that question and the main topic of our session.
Flux QL - Nexgen Management of Time Series Inspired by JS - Ivo Andreev
The time series landscape evolves fast to meet the aggressive challenges in IoT. Influx 2.0 Beta was released in the first days of 2020 and, although InfluxDB was already the top time series database, it introduces a revolutionary change again. InfluxDB 2 is now generally available, and its key features originate from Flux - a functional, open-source, 4th-generation analytical programming language inspired by JavaScript. Supported in VS Code, it takes a new approach towards exploration of time series data and enables some unmatched capabilities, like enrichment and filtering of time series data with external data from an RDBMS.
Azure architecture design patterns - proven solutions to common challenges - Ivo Andreev
Building reliable, scalable, secure applications can happen either by following verified design patterns or the hard way, by trial and error. Azure architecture patterns are tested and accepted solutions to common challenges, reducing the technical risk to the project by not having to employ a new and untested design. However, most of the patterns are relevant to any distributed system, whether hosted on Azure or on other cloud platforms.
Industrial IoT from the Ground up with Azure and Open Source
IIoT leverages the power of machines and real-time analytics to pick up on industrial inefficiencies and problems sooner, and to save time and money in addition to supporting BI efforts. In a myriad of reference architectures, it is up to experience and trial and error to find out what really works in a real-life scenario.
We will review the challenges and solutions in building an IIoT platform from the ground up on the edge between Azure and open source in order to have the best from both worlds. Technical focus will be on IoT Edge, TS Insights, Stream Analytics, IoT Hub, App Insights, Event Grid, Service Bus, ARM templates, Influx DB, Grafana and more - all neatly glued together by Azure Functions.
The Power of Auto ML and How Does it Work - Ivo Andreev
Automated ML is an approach to minimize the need for data science effort by enabling domain experts to build ML models without deep knowledge of algorithms, mathematics or programming. The mechanism works by allowing end users to simply provide data, and the system automatically does the rest by determining the approach to perform the particular ML task. At first this may sound discouraging to those aiming at the "sexiest job of the 21st century", the data scientists. However, Auto ML should be considered a democratization of ML, rather than automatic data science.
In this session we will talk about how Auto ML works, how is it implemented by Microsoft and how it could improve the productivity of even professional data scientists.
Flying a Drone with JavaScript and Computer Vision - Ivo Andreev
Almost anything that used to run on desktop now runs in the browser, and as per Atwood's law: anything that can be written in JavaScript will eventually be written in JavaScript.
If you have dared imagining to control your toys with code, communicate with the cloud and use advanced computer intelligence, your dreams have now become close at hand.
This session is here to challenge your imagination and make you think about what you could do with JavaScript. This session is about programming drones with JavaScript and AI capabilities.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Software Engineering, Software Consulting, Tech Lead, Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Transaction, Spring MVC, OpenShift Cloud Platform, Kafka, REST, SOAP, LLD & HLD.
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Crescat
Crescat is industry-trusted event management software, built by event professionals for event professionals. Founded in 2017, we have three key products tailored for the live event industry.
Crescat Event for concert promoters and event agencies. Crescat Venue for music venues, conference centers, wedding venues, concert halls and more. And Crescat Festival for festivals, conferences and complex events.
With a wide range of popular features such as event scheduling, shift management, volunteer and crew coordination, artist booking and much more, Crescat is designed for customisation and ease-of-use.
Over 125,000 events have been planned in Crescat and with hundreds of customers of all shapes and sizes, from boutique event agencies through to international concert promoters, Crescat is rigged for success. What's more, we highly value feedback from our users and we are constantly improving our software with updates, new features and improvements.
If you plan events, run a venue or produce festivals and you're looking for ways to make your life easier, then we have a solution for you. Try our software for free or schedule a no-obligation demo with one of our product specialists today at crescat.io
Utilocate offers a comprehensive solution for locate ticket management by automating and streamlining the entire process. By integrating with Geospatial Information Systems (GIS), it provides accurate mapping and visualization of utility locations, enhancing decision-making and reducing the risk of errors. The system's advanced data analytics tools help identify trends, predict potential issues, and optimize resource allocation, making the locate ticket management process smarter and more efficient. Additionally, automated ticket management ensures consistency and reduces human error, while real-time notifications keep all relevant personnel informed and ready to respond promptly.
The system's ability to streamline workflows and automate ticket routing significantly reduces the time taken to process each ticket, making the process faster and more efficient. Mobile access allows field technicians to update ticket information on the go, ensuring that the latest information is always available and accelerating the locate process. Overall, Utilocate not only enhances the efficiency and accuracy of locate ticket management but also improves safety by minimizing the risk of utility damage through precise and timely locates.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. It’s here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Mobile App Development Company In Noida | Drona InfotechDrona Infotech
Looking for a reliable mobile app development company in Noida? Look no further than Drona Infotech. We specialize in creating customized apps for your business needs.
Visit Us For : https://www.dronainfotech.com/mobile-application-development/
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Łukasz Chruściel
No one wants their application to drag like a car stuck in the slow lane! Yet it’s all too common to encounter bumpy, pothole-filled solutions that slow the speed of any application. Symfony apps are not an exception.
In this talk, I will take you for a spin around the performance racetrack. We’ll explore common pitfalls - those hidden potholes on your application that can cause unexpected slowdowns. Learn how to spot these performance bumps early, and more importantly, how to navigate around them to keep your application running at top speed.
We will focus in particular on tuning your engine at the application level, making the right adjustments to ensure that your system responds like a well-oiled, high-performance race car.
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
7. Agenda
• Time Series?
• Forecasting?
• ML.NET
• Azure ML Service
• ARIMA/AutoARIMA
• Regression
• FB Prophet
• Demo
8. Takeaways
Time Series
o Introduction to Hierarchical Time Series
o Overview of Time Series Forecasting Models
o Time Series Analysis with Python
ARIMA
o Time Series Forecasting with ARIMA models
o ARIMA, Auto ARIMA, Prophet, Regression (YouTube)
SSA
o A Brief Introduction to SSA
o Forecast Service Demand with Time Series Analysis and ML.NET
FB Prophet
o FB Prophet Quickstart (FB GitHub)
o Time Series Analysis using FB Prophet
o Generate Accurate Forecasts with FB Prophet in Python
9. Time Series – a sequence of observations taken over time
Forecasting – the process of making predictions for new data
10. Describing or Forecasting
• Data are Temporal
o Unlike other data, the fact that a point is close to another is important
• Sample Data Look like…
• Time Series Analysis
o Understanding Time Series and underlying causes
o Create a mathematical model that describes data
o Determine seasonal patterns, trends, relations to external factors
o Note: assumptions are often in place (i.e. the form of data)
• Forecasting
o Scientific predictions based on historical time-stamped data
o Univariate / Multivariate TS Forecasting
o Note: Explanatory power is often low
Time Value
2021-11-01T00:00:00+02:00 66
2021-11-01T01:00:00+02:00 29
2021-11-01T02:00:00+02:00 6
2021-11-01T03:00:00+02:00 8
2021-11-01T04:00:00+02:00 91
2021-11-01T05:00:00+02:00 145
2021-11-01T06:00:00+02:00 14
2021-11-01T07:00:00+02:00 19
2021-11-01T08:00:00+02:00 64
2021-11-01T09:00:00+02:00 4
2021-11-01T10:00:00+02:00 22
2021-11-01T11:00:00+02:00 65
2021-11-01T12:00:00+02:00 30
2021-11-01T13:00:00+02:00 152
2021-11-01T14:00:00+02:00 30
2021-11-01T15:00:00+02:00 17
2021-11-01T16:00:00+02:00 9
2021-11-01T17:00:00+02:00 11
2021-11-01T18:00:00+02:00 19
2021-11-01T19:00:00+02:00 76
2021-11-01T20:00:00+02:00 117
2021-11-01T21:00:00+02:00 152
2021-11-01T22:00:00+02:00 53
2021-11-01T23:00:00+02:00 3
2021-11-02T00:00:00+02:00 13
11. Practical Use Cases
• Sample Data Sources
o Sensor readings (environmental data, temperature, pressure, humidity)
o Financial market data
o Medical data (body parameters, heartbeat, pulse rate, blood pressure)
• Sample Scenarios
o Unit sales for each day in a store
o Number of passengers on a station
o Number of users of a web site
o Liters of usage of hot water in a household
o Stocks price for a day
o Diesel price for the next week
o Water level of a dam during the year
o Body weight over the year ☺
13. Hierarchical Time Series Forecasting
• Hierarchical TS
o Evident hierarchical structure
o Lower levels are nested (i.e. geographical split)
• Grouped TS
o Multiple non-nested levels of detail (i.e. category, retailer, colour)
• Hierarchical Forecasting
o A collection of techniques rather than another methodology
o Generate forecast that is consistent across the whole hierarchy
o Forecasts shall add up
• Approaches
o Bottom up, Top-down
o Middle-out (Mixed) – Bottom-up (above middle), Top-down (below middle)
o Reconciliation – forecast each level independently, then determine combination coefficients with linear regression
Example hierarchy:
Bulgaria
 ├─ East: Varna, Burgas
 └─ West: Sofia
14. Quacks like Time Series, Moves like …
• Do you have enough data?
o More data = more options for aggregation, model tuning, model testing
• Time horizon for prediction?
o Shorter time horizon can be predicted with higher confidence
• Are forecasts updateable or static?
o Retrain after new data are available for more accurate results
• Frequency of forecasts?
o Downsampling and upsampling of data affect accuracy (in both directions)
• Is time series stationary?
o Time series properties do not depend on observation time?
15. Time Series Stationarity
• Stationarity
o Statistical properties of TS do not depend on time of observation (mean, variance)
o Rule: Non-stationary data are unpredictable and cannot be forecasted
o Conclusion: Non-stationary TS data need to be converted to stationary
• Differencing
o Method to transform time series and remove time-dependent attributes (trend, seasonality)
o Lag difference could be calculated on a larger time window (i.e. window size)
Note: Some TS forecasting methods do not require stationarity (i.e. ARIMA), as
preliminary differencing is performed. (ARMA does though)
difference(t) = observation(t) - observation(t-1)
Example: 1 2 3 4 5 6 7 8 9 10
Differencing: 1 1 1 1 1 1 1 1 1
inverted(t) = differenced(t) + observation(t-1)
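The two formulas above can be sketched in a few lines of NumPy; this is an illustrative example, not code from the slides:

```python
import numpy as np

series = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)

# difference(t) = observation(t) - observation(t-1)
diff = np.diff(series)  # for this linear series: nine values, all 1.0

# inverted(t) = differenced(t) + observation(t-1)
inverted = np.concatenate(([series[0]], series[:-1] + diff))

print(diff)                            # [1. 1. 1. 1. 1. 1. 1. 1. 1.]
print(np.allclose(inverted, series))   # True - differencing is reversible
```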
17. Time Series Analysis
TS Analysis provides techniques to understand data and break into components:
• Trend (Tt)
o Smooth general long term tendency to increase, decrease or both
• Seasonality (St)
o Rhythmic forces operate on smaller intervals (i.e. 1h, 1d, 1w, 1m)
• Cyclic (Ct)
o Cyclic behaviour that repeats over a long period (i.e. 4y, 1y)
• Random Noise (Rt)
o Random irregular observations that cannot be explained (unpredictable)
Additive Model: Yt = Tt + St + Ct + Rt
Multipl. Model: Yt = Tt * St * Ct * Rt
Mixed Model: Yt = Tt * Ct + St * Rt; Yt = Tt + St * Ct * Rt
18. Advanced
Observation: Time series tend to display significant autocorrelation
• Correlation
o Measures the relationship between TS and a lagged version of it (T, T-k)
o Meaning: ± 1 - perfect correlation; 0 – no correlation
• Measured with Pearson Correlation
o Preconditions: normal distribution, no significant outliers, continuous variables
o Cross-correlation – correlation between two series observed across different lags
• Augmented Dickey-Fuller Test (python adfuller function)
o Null hypothesis (H0) – the TS has a unit root (non-stationary)
o Alternate hypothesis (HA) – the TS has no unit root (stationary)
• ADF p-value < 0.05
• H0 rejected = TS is stationary
19. Common Data Preparation
• Imputation
o Replacing missing data with substitute values
• Frequency / Resampling
o Could be too high for a model compared to prediction front
o Irregular time series may require resampling at regular intervals
• Outliers
o Extreme values need to be identified and handled
o Outlier = Value ∉ [Q1-1.5*IQR; Q3+1.5*IQR]
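The IQR rule above translates directly to NumPy; the sample values are made up for illustration:

```python
import numpy as np

values = np.array([12, 14, 15, 13, 14, 16, 15, 13, 14, 95], dtype=float)

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Outlier = Value outside [lower; upper]

outliers = values[(values < lower) | (values > upper)]
print(outliers)  # [95.]
```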
Missing-data decision guide:
• Does missing data have meaning?
o YES (numerical): convert missing values to a meaningful number
o NO: consider the type of data:
- Large dataset, little data missing at random: remove instances with missing data
- Large, temporally ordered dataset: replace missing data with preceding values
- Does the data follow a simple distribution?
· YES: impute with the mean value
· YES, but with outliers: impute with the median
· NO: impute with a simple ML model
21. Naïve Algorithms Baseline
Note: Naïve algorithms are often referred to as “benchmark models”
Naïve Model
• Forecasts for any horizon match the last value
SNaïve Model (Seasonal Naïve)
• Assumes a seasonal component with time window T
• Forecast matches the last T timestamps
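Both baselines fit in a few lines; this is an illustrative sketch, with a made-up history:

```python
import numpy as np

def naive_forecast(history, horizon):
    """Every forecasted value equals the last observed value."""
    return np.full(horizon, history[-1], dtype=float)

def snaive_forecast(history, horizon, season_length):
    """Each forecasted value repeats the observation one season earlier."""
    last_season = np.asarray(history[-season_length:], dtype=float)
    reps = int(np.ceil(horizon / season_length))
    return np.tile(last_season, reps)[:horizon]

history = [3, 10, 12, 13, 12, 10, 12]
print(naive_forecast(history, 3))                    # [12. 12. 12.]
print(snaive_forecast(history, 4, season_length=3))  # [12. 10. 12. 12.]
```

Any candidate model that cannot beat these trivial forecasts on a held-out set is not worth keeping, which is why they are called benchmark models.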
22. ARIMA (AutoRegressive Integrated Moving Average)
• Auto Regressive - linear combination of past values of the variable
o Assume that future will resemble the past
o Inaccurate when an unseen event happens
• Moving Average - linear combination of past forecast errors.
o Smooth impacts of short-term fluctuations
o Simple MA – arithmetic mean of the previous 5,10,20,100 etc. values
o Exponential MA - weighted average that gives greater importance to the most recent values
• Integrated – Differencing for stationary time series
• ARIMA Parameters
o p – number of observations from the Past to forecast future
o d – degree of Differencing (number of times raw observations are differenced for stationarity)
o q – number of past forecast errors in the model (moving average window size)
ARIMA(p,d,q) = const + (weighted sum of last p values) + (weighted sum of last q errors), after d-th order differencing
23. SeasonalARIMA
• ARIMA (p,d,q) is a non-seasonal ARIMA
• SARIMA(p, d, q)(P, D, Q)m
o P – number of seasonal autoregressive terms
o D – seasonal differencing order (number of transformations to make the TS stationary)
o Q – moving-average order of the seasonal component
o m – number of periods in a season (i.e. 12 for monthly data)
• The parameter space becomes larger
• Grid search for optimal parameters
24. AutoARIMA
• Identifies the optimal ARIMA parameters (p, d, q)
o pip install pmdarima (formerly pyramid-arima; mimics R auto.arima)
o .fit() does the magic
o Uses AIC (Akaike Information Criterion) to pick the best model (smaller = better)
• AIC = N·ln(SSe/N) + 2K (N – number of observations, SSe – sum of squared errors, K – number of model parameters)
• Conducts differencing tests to determine the order of differencing
• Pros
o Saves time
o One of the simplest techniques for TS forecasting
o Eliminates the need of in-depth statistics understanding
o Reduces the chance of human error due to misinterpretation
model = auto_arima(train, [42 other optional arguments])
model.fit(train)
25. Singular Spectrum Analysis (SSA)
• Powerful non-parametric technique
• 2 complementary stages
o Decomposition - extract independent components from time series
o Reconstruction – reconstruct the series for forecasting, after removing noise
• Pros
o Works with arbitrary statistical process
o No assumptions for data (i.e. stationarity)
• ML.NET ForecastBySsa Parameters
o trainSize – number of train samples (rows) from beginning (i.e. 300)
o seriesLength – length of series in buffer (how much data to use to train on)
o windowSize – length of the window on the series (seasonality)
o horizon – number of values to forecast (i.e. 24)
o confidenceLevel – degree of certainty (i.e. 95% of estimates to contain the real)
26. SSA, How it Works
• How does it work
• Checkpoint
o Avoids replaying all previous data; provide only the most recent observations
o But if this creates a drift, a clean retrain on last observations (i.e. 1 month) may be better
MLContext mlContext = new MLContext(); // All ML.NET operations are within a context
IDataView dv = mlContext.Data.LoadFromTextFile(…); // Step 1: Load data from file
var pipeline = mlContext.Forecasting.ForecastBySsa([Parameters], …); // Step 2: SSA pipeline
SsaForecastingTransformer forecaster = pipeline.Fit(dv); // Step 3: Train on the data
… // Step 4: Evaluate (i.e. calculate RMSE)
var forecastEngine = forecaster.CreateTimeSeriesEngine<ModelInput, ModelOutput>(mlContext);
ModelOutput forecast = forecastEngine.Predict(); // Step 5: Predict with the trained model
forecastEngine.CheckPoint(mlContext, outputModelPath); // Save checkpoint
ITransformer model = mlContext.Model.Load(file, out DataViewSchema schema); // Load from checkpoint
forecastEngine = model.CreateTimeSeriesEngine<ModelInput, ModelOutput>(mlContext);
27. Regression Model
• Forecasting Recap
o Data are ordered in series as {Time: Value} pairs; No external knowledge
• Regression
o Predicting a single numeric value
o Time Series Forecasting involves Regression under the hood
o Can be applied to non-ordered data
o Must be applied repeatedly to predict a multi-step horizon
• Feature Engineering & Extraction
o Date – Year, Month, Day, Hour
o Lag – What has happened at T-1, T-2, T-12, T-24, T-48, T-n observations
o Delta – What is the difference from T-1, T-2, T-12, T-24, T-48, T-n observations
o Moving Average – Mean(2), Mean(12), Mean(24), Mean (48), …
o Sum – Sum(2), Sum(12), Sum(24), Sum(48),…
o Domain knowledge – Weather, Distance (not GPS), Ref. Price
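The lag, delta, and moving-average features listed above map directly onto pandas operations; the values and window sizes here are illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [66.0, 29.0, 6.0, 8.0, 91.0, 145.0, 14.0, 19.0]},
    index=pd.date_range("2021-11-01", periods=8, freq="D"),
)

# Date features
df["month"] = df.index.month
df["day"] = df.index.day

# Lag features: what happened at T-1, T-2
df["lag_1"] = df["value"].shift(1)
df["lag_2"] = df["value"].shift(2)

# Delta: difference from T-1
df["delta_1"] = df["value"].diff(1)

# Moving average over a 2-observation window
df["ma_2"] = df["value"].rolling(window=2).mean()
```

After this transformation each row is a self-contained feature vector, so any ordinary regression model can be trained on it, which is the point of the slide above.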
28. Azure ML Service
• Azure Auto ML (Forecasting uses AutoARIMA under the hood)
o The easiest still powerful way to do ML
o Optimizes the iterative time consuming tasks of ML
o Azure Auto ML Python SDK
o Azure ML Studio – (ML Studio Classic retires August 2024)
Workflow: Upload file → Select task type → Set parameters → Review metrics
29. FB Prophet
• Created by Facebook in 2017
• Pros
o Trains quickly, highly accurate
o No background required (like AutoARIMA)
o Can also be used for multivariate TS analysis
o Handles outliers and missing data well
o Strong at series with seasonal effects and few seasons in training data
o Handles random changes due to special events (i.e. market events)
• Under the Hood
o Requires prophet Python package
o Uses additive regression model
Y(t) = Trend(t) + Seasonality(t) + Holiday(t) + Error(t)
30. Prophet – Easy to Use, Hard to Install
• Prophet does not run on Python 3.9
• What's the easiest way?
• Install an Azure Data Science VM (< 4 cores is sluggish)
• Find the 3.8 kernel from Jupyter Lab
• Activate the kernel
• Use the Conda package manager to install
• Conda has its own C++ compiler to build the packages
• Select a channel
C:> activate py38_default
(py38_default) C:> conda install pystan -c conda-forge
(py38_default) C:> conda install -c conda-forge fbprophet