The document discusses strategies for analyzing large datasets. It introduces housing data from Texas metropolitan areas from 2000 to 2009, including numbers of houses listed and sold, total value, average sale price, and time on market. As an example, the document focuses on Houston's data, exploring seasonal patterns and fitting linear models to remove seasonality and better view long-term trends. These techniques are then applied to sales data from all Texas cities.
03 Modelling
1. plyr
Modelling large data
Hadley Wickham
Tuesday, 7 July 2009
2.
1. Strategy for analysing large data.
2. Introduction to the Texas housing data.
3. What’s happening in Houston?
4. Using models as a tool.
5. Using models in their own right.
3. Large data strategy
Start with a single unit, and identify
interesting patterns.
Summarise patterns with a model.
Apply model to all units.
Look for units that don’t fit the pattern.
Summarise with a single model.
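The steps above can be compressed into a few lines of base R. This is a minimal sketch on synthetic data: the units "A", "B" and "C", the sine-shaped seasonal pattern and the 0.7 R-squared cut-off are illustrative assumptions, not part of the original analysis (which uses plyr on the Texas data later on).

```r
# Minimal sketch of the large-data strategy on synthetic data.
# Assumptions: hypothetical units "A", "B", "C"; a month-effect model
# stands in for whatever pattern you found in the first unit.
set.seed(1)
units <- c("A", "B", "C")
month <- rep(1:12, times = 10)                  # ten years of monthly data
df <- data.frame(unit  = rep(units, each = length(month)),
                 month = rep(month, times = length(units)))

# Units A and B share a strong seasonal pattern; unit C is pure noise.
season <- 10 * sin(2 * pi * df$month / 12)
df$y <- ifelse(df$unit == "C", 0, season) + rnorm(nrow(df))

# Start with a single unit and summarise its pattern with a model.
fit_one <- function(d) lm(y ~ factor(month), data = d)
print(coef(fit_one(df[df$unit == "A", ])))

# Apply the same model to every unit.
models <- lapply(split(df, df$unit), fit_one)

# Look for units that don't fit the pattern (low R-squared).
rsq <- sapply(models, function(m) summary(m)$r.squared)
poor <- names(rsq)[rsq < 0.7]
print(round(rsq, 2))
print(poor)
```

The final step, summarising everything with a single model, is what the later slides do once the per-unit fits have been inspected.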
4. Texas housing data
For each metropolitan area (45) in Texas,
for each month from 2000 to 2009 (112):
Number of houses listed and sold
Total value of houses, and average sale price
Average time on market
CC BY http://www.flickr.com/photos/imagesbywestfall/3510831277/
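With one row per city and month, the data form a long table of 45 × 112 = 5,040 rows. The sketch below only builds that key skeleton; it assumes the 112 months run consecutively from January 2000 (which would make the last month April 2009), and the city names are placeholders rather than the real metropolitan areas.

```r
# Skeleton of the Texas housing layout: one row per (city, month).
# Assumptions: consecutive months starting January 2000; placeholder city names.
n_cities <- 45
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 112)
layout <- expand.grid(date = dates,
                      city = sprintf("city%02d", seq_len(n_cities)),
                      stringsAsFactors = FALSE)
# Measured columns (listings, sales, volume, avgprice, onmarket)
# would sit alongside these two key columns.
print(nrow(layout))         # 45 * 112 rows
print(range(layout$date))   # first and last month covered
```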
5. Strategy
Start with a single city (Houston).
Explore patterns & fit models.
Apply models to all cities.
6. [Plot: avgprice vs. date]
7. [Plot: sales vs. date]
8. [Plot: onmarket vs. date]
9. Seasonal trends
Seasonal patterns make it much harder to see the long-term trend. How can we remove them?
(There are many sophisticated techniques from time series analysis, but what’s the simplest thing that might work?)
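11. Challenge
What does the following function do?
deseas <- function(var, month) {
  # na.exclude pads the residuals with NA where var is missing,
  # so the result has the same length as var
  resid(lm(var ~ factor(month), na.action = na.exclude)) +
    mean(var, na.rm = TRUE)
}
How could you use it in conjunction with transform() to deseasonalise the data? What if you wanted to deseasonalise every city?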
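One possible answer, sketched in base R so that it runs without plyr: deseas() subtracts each month's mean (that is what the residuals of lm(var ~ factor(month)) are) and adds back the overall mean, so the series stays on its original scale. The toy data frame, its city names and the missing value are hypothetical, and na.exclude is an assumption about how you would want missing months handled.

```r
deseas <- function(var, month) {
  # Residuals from a month-only model = value minus its month's mean;
  # na.exclude keeps NA positions so the result matches length(var).
  resid(lm(var ~ factor(month), na.action = na.exclude)) +
    mean(var, na.rm = TRUE)
}

# Hypothetical toy data: two cities, three years of monthly sales.
set.seed(3)
toy <- data.frame(city  = rep(c("Alpha", "Beta"), each = 36),
                  month = rep(rep(1:12, 3), 2))
toy$sales <- rpois(72, 500) * rep(c(1, 3), each = 36)
toy$sales[5] <- NA   # one missing month

# One city: transform() adds the deseasonalised column.
alpha <- transform(subset(toy, city == "Alpha"),
                   sales_ds = deseas(sales, month))

# Every city: split, transform each piece, recombine (what ddply automates).
pieces <- lapply(split(toy, toy$city), transform,
                 sales_ds = deseas(sales, month))
toy_ds <- do.call(rbind, pieces)
print(head(toy_ds))
```

With plyr, the last two statements collapse into the single ddply(tx, "city", transform, ...) call shown on slide 20.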
14. Models as tools
Here we’re using the linear model as a tool: we don’t care about the coefficients or the standard errors; we’re just using it to get rid of a striking pattern.
Tukey described this approach as residuals and reiteration: by removing a striking pattern we can see more subtle patterns.
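Residuals and reiteration in miniature: a synthetic series whose seasonal swing dwarfs a slow upward trend. The month-only model is used purely as a tool; once its residuals are taken, a second, simpler model picks up the trend that is nearly invisible in the raw data. The amplitude, slope and noise level are arbitrary assumptions chosen to make the effect obvious.

```r
set.seed(42)
month <- rep(1:12, times = 9)      # nine years of monthly data
idx <- seq_along(month)            # time index (months since start)
# Strong seasonal pattern + weak linear trend + a little noise.
y <- 50 * cos(2 * pi * month / 12) + 0.05 * idx + rnorm(length(idx), sd = 0.5)

# Step 1: remove the striking pattern (month effects) and keep the residuals.
r <- resid(lm(y ~ factor(month)))

# Step 2: reiterate, looking for a subtler pattern in what is left.
raw_fit   <- lm(y ~ idx)
resid_fit <- lm(r ~ idx)
print(c(raw = summary(raw_fit)$r.squared,
        residuals = summary(resid_fit)$r.squared))
print(coef(resid_fit)["idx"])      # close to the true slope of 0.05
```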
15. [Plot: avgprice_ds vs. date]
16. [Plot: sales_ds vs. date]
17. [Plot: onmarket_ds vs. date]
18. Summary
Most variables seem to be a combination of a strong seasonal pattern plus a weaker long-term trend.
How do these patterns hold up for the rest of Texas? We’ll focus on sales.
19. [Plot: sales vs. date, one line per city]
20.
library(plyr)
library(ggplot2)
tx <- read.csv("tx-house-sales.csv")
qplot(date, sales, data = tx, geom = "line", group = city)
# Deseasonalise each city separately, using deseas() from the challenge
tx <- ddply(tx, "city", transform,
  sales_ds = deseas(sales, month))
qplot(date, sales_ds, data = tx, geom = "line", group = city)
21. [Plot: sales_ds vs. date, one line per city]
22. It works, but...
It doesn’t give us any insight into the
similarity of the patterns across multiple
cities. Are the trends the same or
different?
So instead of throwing the models away
and just using the residuals, let’s keep the
models and explore them in more depth.
23. Two new tools
dlply: takes a data frame, splits it up in the same way as ddply, applies a function to each piece and combines the results into a list
ldply: takes a list, splits it up into elements, applies a function to each piece and then combines the results into a data frame
dlply + ldply = ddply
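The split-apply-combine idea behind these functions can be mimicked in base R, which may make "dlply + ldply = ddply" more concrete. This is a sketch of the concept only, not plyr's implementation; the data frame and the summary function are hypothetical.

```r
# Hypothetical data: three cities, four observations each.
df <- data.frame(city  = rep(c("Alpha", "Beta", "Gamma"), each = 4),
                 sales = c(1, 2, 3, 4, 10, 20, 30, 40, 5, 5, 5, 5))
summarise_piece <- function(d) data.frame(mean_sales = mean(d$sales))

# Like dlply: split the data frame, apply a function, keep a (named) list.
as_list <- lapply(split(df, df$city), summarise_piece)

# Like ldply: take the list's elements and combine them into one data frame.
combined <- do.call(rbind, as_list)
combined <- data.frame(city = names(as_list), combined, row.names = NULL)

# Together, the two steps give essentially what ddply(df, "city", ...) returns.
print(combined)
```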
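24.
# Fit one seasonal model per city; the result is a list named by city
models <- dlply(tx, "city", function(df)
  lm(sales ~ factor(month), data = df))
models[[1]]           # the first city's model
coef(models[[1]])     # its coefficients
ldply(models, coef)   # all coefficients, one row per city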
25. Labelling
Notice we didn’t have to do anything to
have the coefficients labelled correctly.
Behind the scenes plyr records the labels
used for the split step, and ensures they
are preserved across multiple plyr calls.
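Base R keeps labels in a similar way, through names: split() names each piece after its group, and lapply() and sapply() carry those names through. The sketch below (hypothetical cities, synthetic sales) builds a per-city coefficient table in which the city labels come along without extra bookkeeping, roughly what ldply(models, coef) produces on slide 24.

```r
set.seed(7)
df <- data.frame(city  = rep(c("Alpha", "Beta"), each = 24),   # hypothetical
                 month = rep(rep(1:12, 2), 2))
# Synthetic sales with a summer bump.
df$sales <- 100 + 10 * (df$month %in% 5:8) + rnorm(nrow(df))

models <- lapply(split(df, df$city),
                 function(d) lm(sales ~ factor(month), data = d))
print(names(models))                 # labels recorded by split()

coefs <- t(sapply(models, coef))     # one row per city, named by city
coef_table <- data.frame(city = rownames(coefs), coefs,
                         row.names = NULL, check.names = FALSE)
print(coef_table[, 1:3])
```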
26. Back to the model
What are some problems with this model?
How could you fix them?
Is the format of the coefficients optimal?
Turn to the person next to you and
discuss for 2 minutes.
28.
qplot(date, log10(sales), data = tx, geom = "line", group = city)
# Log transform sales to make coefficients comparable (ratios)
models2 <- dlply(tx, "city", function(df)
  lm(log10(sales) ~ factor(month), data = df))
# Put coefficients in rows, so they can be plotted more easily
coef2 <- ldply(models2, function(mod) {
  data.frame(
    month = 1:12,
    effect = c(0, coef(mod)[-1]),
    intercept = coef(mod)[1])
})
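Why the log scale makes the coefficients comparable across cities, as a worked equation. It assumes the model above, with January (month 1) as the baseline level of factor(month), which is why effect is set to 0 for month 1:

```latex
\log_{10}(\text{sales}_{m}) = \beta_0 + \beta_m + \varepsilon,
\qquad \beta_1 = 0
\quad\Longrightarrow\quad
\text{sales}_{m} = 10^{\beta_0}\cdot 10^{\beta_m}\cdot 10^{\varepsilon}
```

So $10^{\beta_m}$ is a multiplicative effect: typical (geometric-mean) sales in month $m$ relative to January, whatever the size of the city. For example, an effect of 0.3 corresponds to $10^{0.3} \approx 2.0$, roughly double January's sales. This is the quantity plotted as 10^effect on slides 30 and 31.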
29. [Plot: effect vs. month, one line per city]
qplot(month, effect, data = coef2, group = city, geom = "line")
30. [Plot: 10^effect vs. month, one line per city]
qplot(month, 10 ^ effect, data = coef2, group = city, geom = "line")
31. [Plot: 10^effect vs. month, one panel per city (45 panels, Abilene to Wichita Falls)]
qplot(month, 10 ^ effect, data = coef2, geom = "line") + facet_wrap(~ city)
32. What should
we do next?
What do you think?
You have 30 seconds to come up with (at
least) one idea.
33. My ideas
Fit a single model, log(sales) ~ city * factor(month), and look at the residuals
Fit individual models, log(sales) ~ factor(month) + ns(date, 3), and look for cities that don’t fit
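In the second idea, ns(date, 3) (from the splines package that ships with R) adds a natural cubic spline with 3 degrees of freedom, so the long-term trend can bend smoothly instead of being a straight line. A minimal sketch of what that term contributes, assuming a decimal-year date variable and a synthetic one-hump trend:

```r
library(splines)
date <- 2000 + (0:111) / 12        # hypothetical: 112 months as decimal years
basis <- ns(date, 3)               # natural cubic spline basis, df = 3
print(dim(basis))                  # one row per month, three columns

# A smooth non-linear trend is just a linear combination of these columns.
set.seed(5)
trend <- sin((date - 2000) / 3)
y <- trend + rnorm(length(date), sd = 0.05)
fit_line   <- lm(y ~ date)
fit_spline <- lm(y ~ ns(date, 3))
print(c(line = summary(fit_line)$r.squared,
        spline = summary(fit_spline)$r.squared))
```

Three degrees of freedom keep the trend deliberately coarse: it can follow a boom and a bust, but not month-to-month wiggles, which are left to the seasonal term and the residuals.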
34.
# One approach: fit a single model
mod <- lm(log10(sales) ~ city + factor(month), data = tx)
# Back-transform the residuals: ratio of actual sales to the model's fitted sales
tx$sales2 <- 10 ^ resid(mod)
qplot(date, sales2, data = tx, geom = "line", group = city)
last_plot() + facet_wrap(~ city)
35. [Plot: sales2 vs. date, one line per city]
qplot(date, sales2, data = tx, geom = "line", group = city)
36. [Plot: sales2 vs. date, one panel per city]
last_plot() + facet_wrap(~ city)
37.
# Another approach: the essence of most cities is a seasonal
# term plus a long-term smooth trend. We could fit this
# model to each city, and then look for models which don't
# fit well.
library(splines)
models3 <- dlply(tx, "city", function(df) {
  lm(log10(sales) ~ factor(month) + ns(date, 3), data = df)
})
# Extract R-squared from each model
rsq <- function(mod) c(rsq = summary(mod)$r.squared)
quality <- ldply(models3, rsq)
38. [Plot: rsq (about 0.5 to 0.9) for each city, cities in alphabetical order]
qplot(rsq, city, data = quality)
39. How are the good fits different from the bad fits?
[Plot: rsq (about 0.5 to 0.9) for each city, cities ordered by rsq from San Antonio (highest) to Port Arthur (lowest)]
qplot(rsq, reorder(city, rsq), data = quality)
40.
# Flag cities whose model explains less than 70% of the variation
quality$poor <- quality$rsq < 0.7
tx2 <- merge(tx, quality, by = "city")
# Residuals and predictions from every city's model, stacked
mfit <- ldply(models3, function(mod) {
  data.frame(
    resid = resid(mod),
    pred = predict(mod))
})
tx2 <- cbind(tx2, mfit[, -1])
Can you think of any potential problems with this line?
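Two possible problems with the cbind() line: the rows of tx2 after merge() need not be in the same order as the rows stacked by ldply(), and lm() silently drops rows with missing sales, so the two pieces can even have different lengths. One way to sidestep both is to attach residuals and predictions inside each city's piece. The sketch below is base R on hypothetical toy data; na.exclude is what keeps missing values in place.

```r
library(splines)
set.seed(11)
# Hypothetical toy data: two cities, four years of monthly sales.
toy <- data.frame(city  = rep(c("Alpha", "Beta"), each = 48),
                  date  = rep(2000 + (0:47) / 12, 2),
                  month = rep(rep(1:12, 4), 2))
toy$sales <- 10^(3 + 0.1 * sin(2 * pi * toy$month / 12) +
                 rnorm(nrow(toy), sd = 0.02))
toy$sales[10] <- NA   # a missing value that would misalign a naive cbind()

add_fit <- function(df) {
  mod <- lm(log10(sales) ~ factor(month) + ns(date, 3), data = df,
            na.action = na.exclude)
  df$resid <- resid(mod)   # padded with NA, so length(resid) == nrow(df)
  df$pred  <- fitted(mod)
  df
}
toy2 <- do.call(rbind, lapply(split(toy, toy$city), add_fit))
print(head(toy2))
```

With plyr, add_fit() could be passed to ddply(tx, "city", add_fit) in place of the split/lapply/rbind steps.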
41. [Plot: log10(sales) vs. date, one panel per city; raw data]
42. [Plot: pred vs. date, one panel per city; predictions]
43. [Plot: resid vs. date, one panel per city; residuals]
44. Conclusions
A simple (and relatively small) example, but it shows how collections of models can be useful for gaining understanding.
Each attempt illustrated something new about the data.
plyr made it easy to create and summarise a collection of models, so we didn’t have to worry about the mechanics.