This document discusses various concepts related to time series analysis in Python. It covers topics like date and time data types, resampling time series data to different frequencies, plotting and visualizing time series, and performing time series analysis tasks such as decomposition, forecasting, and detecting trends and seasonality using Python libraries. It also discusses shifting, indexing, and selecting subsets of time series data.
2. TIME SERIES
Time series analysis is a crucial part of data analysis, and Python provides several tools and libraries for working
with time series data.
DR.V.S.PRAKASH | V BCA 2
3. Date and Time Data Types and Tools:
Datetime Module:
Python's standard library includes the datetime module, which provides classes for working with dates and
times. You can use the datetime class to represent both date and time information.
from datetime import datetime
now = datetime.now() # Current date and time
print(now)
Date and Time Objects:
The datetime module includes date and time classes that allow you to work with just the date or time portion.
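As a small sketch of the date and time classes described above, the following builds each part separately and joins them with datetime.combine. The specific date and clock time are arbitrary illustration values.

```python
from datetime import date, time, datetime

# A date holds only year/month/day; a time holds only the clock portion.
d = date(2023, 10, 24)
t = time(14, 30)

# datetime.combine joins the two parts into a full datetime object
dt = datetime.combine(d, t)

# .date() and .time() split a datetime back into its parts
print(dt)         # 2023-10-24 14:30:00
print(dt.date())  # 2023-10-24
print(dt.time())  # 14:30:00
```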
4. String to Datetime Conversion:
You can convert a string representing a date and time to a datetime object using the strptime method.
date_str = "2023-10-24"
datetime_obj = datetime.strptime(date_str, "%Y-%m-%d")
Datetime to String Conversion:
To convert a datetime object back to a string, you can use the strftime method.
formatted_date = datetime_obj.strftime("%Y-%m-%d")
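The same strptime/strftime pair also handles strings that carry a clock time, and parsed datetimes support arithmetic through timedelta. This is a minimal sketch; the timestamp string and the two-day, three-hour offset are made-up values chosen only to show the calls.

```python
from datetime import datetime, timedelta

# Parse a string that carries both a date and a time (%H = 24-hour clock, %M = minutes)
start = datetime.strptime("2023-10-24 14:30", "%Y-%m-%d %H:%M")

# Adding a timedelta shifts the timestamp; subtracting two datetimes yields a timedelta
end = start + timedelta(days=2, hours=3)
elapsed = end - start

print(end.strftime("%Y-%m-%d %H:%M"))  # 2023-10-26 17:30
print(elapsed.total_seconds() / 3600)  # 51.0
```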
8. TIME SERIES BASICS:
Time series data consists of data points collected or recorded at successive points in time, most often at equally spaced intervals.
Common examples of time series data include stock prices, temperature measurements, and website traffic.
Here are some fundamental concepts and tools for working with time series data in Python:
Pandas: The Pandas library is a powerful tool for working with time series data. It provides data structures like
DataFrame and Series that are ideal for organizing and analyzing time series data.
import pandas as pd
time_series_data = pd.Series([10, 20, 30, 40], index=pd.date_range(start='2023-10-01', periods=4, freq='D'))  # 4 daily values
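Dates often arrive as text columns rather than as a ready-made date_range. A minimal sketch of turning such a table into a time-indexed DataFrame follows; the column names "date" and "visits" and their values are hypothetical.

```python
import pandas as pd

# Hypothetical table whose dates are stored as strings
df = pd.DataFrame({
    "date": ["2023-10-01", "2023-10-02", "2023-10-03"],
    "visits": [120, 135, 128],
})

# pd.to_datetime parses the strings into Timestamps
df["date"] = pd.to_datetime(df["date"])

# Using the parsed column as the index gives the frame a DatetimeIndex
df = df.set_index("date")

print(type(df.index).__name__)        # DatetimeIndex
print(df.loc["2023-10-02", "visits"])  # 135
```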
9. Time Resampling:
You can resample time series data to change the frequency of data points (e.g., from daily to monthly) using
Pandas' resample method.
The resample() method, available on both Series and DataFrame, is a convenience method for frequency conversion and
resampling of time series. It expects a datetime-like index (DatetimeIndex, PeriodIndex or TimedeltaIndex) or, for a
DataFrame, a datetime column named through the on= parameter.
A time series is a series of data points indexed (or listed or graphed) in time order; most commonly it is a sequence
taken at successive, equally spaced points in time.
monthly_data = time_series_data.resample('M').mean()  # 'M' = month end; pandas >= 2.2 prefers 'ME'
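Resampling groups the points into bins of the new frequency and then applies an aggregation to each bin. The sketch below downsamples two days of hourly readings to daily totals and means; the readings 0 to 47 are hypothetical, and the hourly step is given as a Timedelta to sidestep the 'H'/'h' alias change.

```python
import pandas as pd

# 48 hourly readings (hypothetical values 0..47) spanning two calendar days
hourly = pd.Series(
    range(48),
    index=pd.date_range("2023-10-01", periods=48, freq=pd.Timedelta(hours=1)),
)

# Downsample: bin the hourly points by calendar day, then aggregate each bin
daily_total = hourly.resample("D").sum()
daily_mean = hourly.resample("D").mean()

print(daily_total.tolist())  # [276, 852]
print(daily_mean.tolist())   # [11.5, 35.5]
```

Swapping .sum() or .mean() for another aggregation (max, min, first, last) changes only how each bin is summarised, not how the bins are formed.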
Plotting Time Series Data:
Libraries like Matplotlib and Seaborn can be used to visualize time series data.
import matplotlib.pyplot as plt
time_series_data.plot()
plt.xlabel("Date")
plt.ylabel("Value")
plt.show()
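When the plot is produced in a script or on a server with no display, one option is to draw on an explicit Axes object and save the figure to a file instead of calling plt.show(). This is a sketch under that assumption; the title and the output file name "time_series.png" are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

time_series_data = pd.Series([10, 20, 30, 40],
                             index=pd.date_range(start="2023-10-01", periods=4, freq="D"))

# Draw on an explicit Axes object instead of the implicit "current" plot
fig, ax = plt.subplots(figsize=(6, 3))
time_series_data.plot(ax=ax, marker="o")
ax.set_xlabel("Date")
ax.set_ylabel("Value")
ax.set_title("Daily values")  # hypothetical title
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
fig.savefig("time_series.png", dpi=100)  # hypothetical output file name
plt.close(fig)
```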
10. Time Series Analysis:
You can perform various time series analysis tasks, including trend analysis, seasonality detection, and
forecasting, using tools like Statsmodels and Scikit-learn.
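import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
# Needs at least two full cycles, so the 4-point series is too short: use 4 weeks of daily data, weekly period
idx = pd.date_range(start='2023-10-01', periods=28, freq='D')
daily = pd.Series(np.arange(28) + np.tile([0, 1, 2, 3, 2, 1, 0], 4), index=idx)
decomposition = seasonal_decompose(daily, model='additive', period=7)
trend, seasonal = decomposition.trend, decomposition.seasonal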
Time series decomposition involves thinking of a series as a combination of level, trend, seasonality, and noise
components.
• Level: The average value in the series.
• Trend: The increasing or decreasing value in the series.
• Seasonality: The repeating short-term cycle in the series.
• Noise: The random variation in the series.
11. INDEXING AND SELECTION:
Selecting by Index:
You can access specific elements in a time series using index labels.
import pandas as pd
time_series_data = pd.Series([10, 20, 30, 40], index=pd.date_range(start='2023-10-01', periods=4, freq='D'))
selected_data = time_series_data['2023-10-01']
Selecting by Slicing: Use slicing to select a range of data.
selected_range = time_series_data['2023-10-01':'2023-10-03']
Subsetting by Conditions:
You can subset time series data based on conditions using boolean indexing.
subset = time_series_data[time_series_data > 20]
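A short sketch of a few more selection patterns, using the same four daily values as above plus two illustrative November dates. Two behaviours worth noticing: label slices on a DatetimeIndex include both endpoints, and a partial date string such as "2023-10" selects a whole month.

```python
import pandas as pd

# Same four October values as in the text, plus two illustrative November values
s = pd.Series(
    [10, 20, 30, 40, 50, 60],
    index=pd.to_datetime(["2023-10-01", "2023-10-02", "2023-10-03",
                          "2023-10-04", "2023-11-01", "2023-11-02"]),
)

# Label slicing on a DatetimeIndex includes BOTH endpoints
first_three = s.loc["2023-10-01":"2023-10-03"]

# Partial-string indexing: a month string selects every row in that month
october = s.loc["2023-10"]

# Positional access ignores the dates entirely
first_two = s.iloc[:2]

# Conditions combine with & and | (each condition in parentheses)
mid_values = s[(s > 20) & (s < 60)]
```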
12. Date Ranges and Frequencies:
Date Ranges: You can create date ranges using Pandas' date_range function. This is useful for generating a date
index for time series data.
date_range = pd.date_range(start='2023-10-01', end='2023-10-10', freq='D')
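Frequencies: You can specify various frequencies when creating date ranges. Common frequencies include 'D'
(day), 'H' (hour), 'M' (month end), and more. In pandas 2.2 and later, the hour and month-end aliases are
spelled 'h' and 'ME'; the older spellings are deprecated.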
hourly_range = pd.date_range(start='2023-10-01', periods=24, freq='H')
monthly_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq='M')
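The same ranges can be built with offset objects from pd.offsets instead of string aliases. This is a sketch, and choosing offset objects here is an assumption made so the code runs the same way on pandas versions before and after the alias renames; the weekly 'W-MON' range is an extra illustration.

```python
import pandas as pd

# Offset objects sidestep the string-alias renames ('H' -> 'h', 'M' -> 'ME')
daily = pd.date_range(start="2023-10-01", end="2023-10-10", freq="D")
hourly = pd.date_range(start="2023-10-01", periods=24, freq=pd.offsets.Hour())
month_ends = pd.date_range(start="2023-01-01", end="2023-12-31", freq=pd.offsets.MonthEnd())

# Weekly dates anchored on Mondays (2023-10-01 is a Sunday, so the first is 2023-10-02)
weekly = pd.date_range(start="2023-10-01", periods=4, freq="W-MON")
```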
Shifting Data:
Shifting is a common operation when working with time series data, often used for calculating differences or
creating lag features.
Shift Data: You can shift the data forward or backward using the shift method.
shifted_data = time_series_data.shift(periods=1) # Shift data one period forward
Calculating Differences: To compute the difference between consecutive values, you can subtract the shifted
series from the original series.
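A minimal sketch of the subtraction described above, on the same four daily values. It also shows that Series.diff() gives the same result in one call, and how a relative (percentage) change follows from the same shifted series.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=pd.date_range("2023-10-01", periods=4, freq="D"))

# Difference with the previous value: subtract the series shifted one period forward
manual_diff = s - s.shift(periods=1)

# Series.diff() computes the same thing in one call
builtin_diff = s.diff()

# Relative change: (current - previous) / previous
relative_change = s / s.shift(periods=1) - 1
```

The first entry of each result is NaN because the first value has no predecessor.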
13. SHIFTING DATA:
Shifting data involves moving time series values forward or backward in time. This is useful for tasks such as
computing period-over-period differences and building lag features.
Shift Data: You can shift data using Pandas' shift method.
The simplest call takes a single argument, periods (default 1), which is the number of steps to shift along the
chosen axis. By default, values are shifted vertically along axis 0 (down the rows), and the positions vacated by
the shift are filled with NaN.
import pandas as pd
# Shifting data one period forward
shifted_data = time_series_data.shift(periods=1)
# Shifting data two periods backward
shifted_data = time_series_data.shift(periods=-2)
Calculating Differences: Shifting is often used to calculate the differences between consecutive values.
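# Calculate the difference between consecutive values
differences = time_series_data - time_series_data.shift(periods=1)  # same as time_series_data.diff()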
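A sketch of three shift variations on the same four daily values: fill_value replaces the introduced NaN, passing freq moves the index instead of the values (so nothing is lost), and several shifts side by side form lag features. The column names lag_1 and lag_2 are illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=pd.date_range("2023-10-01", periods=4, freq="D"))

# Default: values move, the index stays, and the vacated slot becomes NaN
lagged = s.shift(periods=1)

# fill_value puts a chosen constant in the vacated slot instead of NaN
lagged_zero = s.shift(periods=1, fill_value=0)

# With freq, the timestamps move forward one day and the values are untouched
relabelled = s.shift(periods=1, freq="D")

# Lag features side by side, e.g. as inputs for a forecasting model
features = pd.DataFrame({"value": s, "lag_1": s.shift(1), "lag_2": s.shift(2)})
```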
14. Generating Date Ranges and Frequencies:
Pandas provides powerful tools for generating date ranges with different frequencies.
Date Ranges: Use the date_range function to create date ranges.
date_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D') # Daily frequency
Frequencies: You can specify various frequencies such as 'D' (day), 'H' (hour), 'M' (month end), 'Y' (year end),
and more when creating date ranges.
hourly_range = pd.date_range(start='2023-01-01', periods=24, freq='H')
monthly_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq='M')
15. Time Zone Handling:
Pandas can handle time zones and convert between them.
Setting Time Zone:
time_series_data = time_series_data.tz_localize('UTC') # Set time zone to UTC
Converting Time Zones:
time_series_data = time_series_data.tz_convert('US/Eastern') # Convert to US Eastern Time
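A sketch of the difference between the two calls, assuming the same four daily values. tz_localize attaches a zone to naive timestamps without changing the clock time, while tz_convert keeps the same instant and changes how it is displayed. Calling tz_localize on data that already has a zone raises an error; tz_convert is the call to use then. The example uses 'America/New_York', the canonical name for the zone the text calls 'US/Eastern'.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=pd.date_range("2023-10-01", periods=4, freq="D"))

# tz_localize: midnight stays midnight, now explicitly in UTC
utc = s.tz_localize("UTC")

# tz_convert: same instants, expressed in New York time (EDT, UTC-4, in early October)
eastern = utc.tz_convert("America/New_York")

print(eastern.index[0])  # 2023-09-30 20:00:00-04:00
```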
Quarterly Period Frequencies:
Quarterly periods can be generated with the "Q" frequency code.
quarterly_range = pd.period_range(start='2023Q1', end='2023Q4', freq='Q')
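Unlike a date_range, which holds single instants, a period_range holds whole spans of time. A minimal sketch showing that each quarterly period knows its own start and end, and that a timestamp can be mapped to the quarter containing it (the date 2023-08-15 is illustrative).

```python
import pandas as pd

quarters = pd.period_range(start="2023Q1", end="2023Q4", freq="Q")

# Each period knows its own boundaries
first_start = quarters[0].start_time   # 2023-01-01 00:00:00
last_end = quarters[-1].end_time       # last nanosecond of 2023-12-31

# Map a timestamp to the quarter that contains it
august_quarter = pd.Timestamp("2023-08-15").to_period("Q")
```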
16. TIME SERIES ANALYSIS
Time series analysis often involves various data manipulation tasks, including plotting, data munging,
splicing data from multiple sources, decile and quartile analysis, and more. Let's explore these concepts and
some sample applications in the context of time series analysis:
Time Series Plotting:
Plotting is crucial for visualizing time series data to identify patterns and trends.
import matplotlib.pyplot as plt
# Plot time series data
time_series_data.plot()
plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Time Series Plot")
plt.show()
17. Data Munging:
Data munging involves cleaning, transforming, and preparing data for analysis. In time series analysis, this might
include handling missing values, resampling, or dealing with outliers.
# Handling missing values (forward fill; replaces the deprecated fillna(method='ffill'))
time_series_data = time_series_data.ffill()
# Resampling to a different frequency
resampled_data = time_series_data.resample('W').mean()
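A sketch comparing two ways of handling gaps, on an illustrative series with two missing readings. Forward fill repeats the last known value; time-based interpolation draws a straight line between the known neighbours, weighted by the actual time elapsed. The limit argument caps how many consecutive gaps are filled.

```python
import numpy as np
import pandas as pd

# Illustrative daily series with two missing readings
s = pd.Series([10.0, np.nan, np.nan, 40.0, 50.0],
              index=pd.date_range("2023-10-01", periods=5, freq="D"))

# Forward fill: repeat the last known value
filled_forward = s.ffill()              # 10, 10, 10, 40, 50

# Time-weighted linear interpolation between known neighbours
interpolated = s.interpolate(method="time")

# Fill at most one consecutive gap; the second NaN remains
filled_limited = s.ffill(limit=1)
```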
Splicing Together Data Sources:
In some cases, you may need to combine time series data from multiple sources.
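import pandas as pd
# Concatenating data from multiple sources (data_source1 and data_source2 are
# Series or DataFrames indexed by date), then restoring chronological order
combined_data = pd.concat([data_source1, data_source2]).sort_index()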
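When the sources overlap in time, plain concatenation produces duplicate timestamps. A sketch of two ways to resolve that, using two illustrative sources that both report 2023-10-03: dropping duplicates so the later source wins, or using combine_first so the first source wins and the second only fills its gaps. Which source should win is an assumption you would make per dataset.

```python
import pandas as pd

# Two illustrative sources whose date ranges overlap on 2023-10-03
source_a = pd.Series([1.0, 2.0, 3.0], index=pd.date_range("2023-10-01", periods=3, freq="D"))
source_b = pd.Series([30.0, 4.0, 5.0], index=pd.date_range("2023-10-03", periods=3, freq="D"))

# Stack both sources, keep one value per timestamp (the later source wins), then sort
stacked = pd.concat([source_a, source_b])
spliced = stacked[~stacked.index.duplicated(keep="last")].sort_index()

# Alternative: use source_a where it has data and fill the remaining dates from source_b
patched = source_a.combine_first(source_b)
```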
Decile and Quartile Analysis:
Decile and quartile analysis helps you understand the distribution of data.
# Calculate quartiles
quartiles = time_series_data.quantile([0.25, 0.5, 0.75])
# Calculate deciles
deciles = time_series_data.quantile([i/10 for i in range(1, 10)])
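With the four values 10, 20, 30, 40, the quartile call above interpolates linearly between sorted values. A sketch that also uses pd.qcut to label each observation with the quartile it falls in (the labels 1 to 4 are an illustrative choice).

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=pd.date_range("2023-10-01", periods=4, freq="D"))

# Default quantile uses linear interpolation between sorted values
quartiles = s.quantile([0.25, 0.5, 0.75])   # 17.5, 25.0, 32.5

# qcut assigns each observation to an equal-count bin: here quartile 1..4
quartile_labels = pd.qcut(s, q=4, labels=[1, 2, 3, 4])
```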
18. Sample Applications:
Stock Market Analysis: Analyzing stock price time series data for trends and predicting future stock prices.
Temperature Forecasting: Analyzing historical temperature data to forecast future weather conditions.
Demand Forecasting: Analyzing sales data to forecast future product demand.
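Futures Contract Rolling:
In financial time series analysis, rolling futures contracts is crucial to avoid artificial jumps in a continuous
price series when one contract expires and trading moves to the next.
# Rolling futures contracts in a DataFrame
# (contract_rolling_function is a user-defined placeholder, not a pandas function)
rolled_data = contract_rolling_function(time_series_data, window=10)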
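One common way to build a continuous futures series is back-adjustment: switch from the expiring contract to the next one on a roll date, and shift all earlier prices by the price gap between the two contracts on that date. The sketch below is a hypothetical helper (back_adjusted_roll) with made-up prices; it illustrates that one approach and is not the contract_rolling_function from the text, whose behaviour is not specified.

```python
import pandas as pd


def back_adjusted_roll(front: pd.Series, nxt: pd.Series, roll_date: str) -> pd.Series:
    """Hypothetical helper: splice two contracts into one continuous series.

    Uses the front contract before roll_date and the next contract from roll_date on,
    shifting earlier prices by the gap on roll_date so no artificial jump remains.
    """
    roll = pd.Timestamp(roll_date)
    gap = nxt.loc[roll] - front.loc[roll]          # price difference between contracts at the roll
    before = front.loc[front.index < roll] + gap   # back-adjust the expiring contract
    after = nxt.loc[nxt.index >= roll]             # next contract from the roll date onward
    return pd.concat([before, after])


# Illustrative prices: the next contract trades 5 points above the front contract
days = pd.date_range("2023-10-01", periods=6, freq="D")
front = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0, 105.0], index=days)
nxt = front + 5.0

continuous = back_adjusted_roll(front, nxt, "2023-10-04")
```

A naive splice would jump from 102 to 108 on the roll date; the back-adjusted series moves by exactly 1 point per day throughout.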
Rolling Correlation and Linear Regression:
Rolling correlation and regression are used to understand how the relationship between two time series
changes over time.
# Calculate rolling correlation between two time series
rolling_corr = time_series_data1.rolling(window=30).corr(time_series_data2)
# Calculate rolling linear regression between two time series
rolling_regression = time_series_data1.rolling(window=30).apply(lambda x: your_lin
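The rolling-regression line above is cut off in the source. As a separate sketch (not a reconstruction of that line), the slope of a simple linear regression of y on x within each window equals the rolling covariance divided by the rolling variance of x, which avoids rolling().apply(), since apply only sees one window of one series at a time. The series x and y are illustrative: y is exactly 2x + 1, so every window should give slope 2 and intercept 1.

```python
import numpy as np
import pandas as pd

# Illustrative pair of series: y = 2 * x + 1 exactly
idx = pd.date_range("2023-01-01", periods=60, freq="D")
x = pd.Series(np.arange(60, dtype=float), index=idx)
y = 2.0 * x + 1.0

window = 30
# OLS slope of y on x within each window: cov(x, y) / var(x)
rolling_slope = x.rolling(window).cov(y) / x.rolling(window).var()
# Matching intercept: mean(y) - slope * mean(x)
rolling_intercept = y.rolling(window).mean() - rolling_slope * x.rolling(window).mean()
# Rolling correlation, as in the text
rolling_corr = x.rolling(window).corr(y)
```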
19. Data Munging vs. Data Wrangling:
Data Munging:
Data munging is a more general term that encompasses various data preparation tasks, including
cleaning, structuring, and organizing data.
It often involves dealing with missing data, handling outliers, and addressing issues like data
format inconsistencies.
Data munging can also include tasks such as data loading, data extraction, and basic data
exploration.
It is a broader term that doesn't specify a particular methodology or approach.
Data Wrangling:
Data wrangling is a subset of data munging that specifically refers to the process of cleaning,
transforming, and structuring data for analysis.
Data wrangling typically involves tasks like filtering, aggregating, joining, and reshaping data to
create a dataset that is ready for analysis or machine learning.
It is often associated with data preparation in the context of data analysis and is more focused on
making data suitable for specific analytical tasks.
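To make the four wrangling tasks named above concrete, here is a small sketch on a hypothetical sales table; the column names, stores, regions, and values are illustrative only, and -1 stands in for an invalid reading.

```python
import pandas as pd

# Hypothetical long-format sales records
sales = pd.DataFrame({
    "date": pd.to_datetime(["2023-10-01", "2023-10-01", "2023-10-02", "2023-10-02"]),
    "store": ["A", "B", "A", "B"],
    "units": [5, 3, 7, -1],
})
stores = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

cleaned = sales[sales["units"] >= 0]                                 # filtering
joined = cleaned.merge(stores, on="store", how="left")               # joining
daily_by_region = joined.groupby(["date", "region"])["units"].sum()  # aggregating
wide = daily_by_region.unstack("region")                             # reshaping: one column per region
```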