Modelling tick bite dynamics using volunteer data
Irene Garcia-Martí
PhD Candidate
Dept. of Geo-Information Processing (GIP)
ITC – University of Twente
25th May 2015
Ticks
Distribution of Lyme disease
http://en.wikipedia.org/wiki/Lyme_disease
Lyme, CT
Source: Dutch National Atlas
Tick bites and Lyme cases 1994 – 2009
Source: Dutch National Atlas & press releases
(2014 estimated)
Volunteer data collection
Important factors on tick densities
Temperature
• Starts the questing season
• Affects survival through winter
Precipitation
• Increases tick survival
• Prevents tick desiccation
Vegetation
• Keeps soil moisture high
• Prevents tick desiccation
Wildlife
• Sustains tick populations
Important factors on tick bites
Humans
• Recreational pressure in nature
• Low perception of tick bite risk
Temperature
• High temperatures: more people outdoors
Precipitation
• Rainy days: fewer people outdoors
Problem Conceptualization
TB = f(TA, HA, ENV, CLI)
TB = Tick Bites
TA = Tick Abundance
HA = Human Activity (Volunteer data, soil type, land use)
ENV = Environmental factors (Vegetation indices)
CLI = Weather factors (temperature and precipitation)
Data
• Four types of data:
  - Volunteer data on tick bites
  - Remote sensing data
  - Weather data
  - Official data on land use and soil type, which influence tick ecology
Data
• Volunteer tick bite collection (2006 – 2014)
• Quality check:
  - Remove observations without coordinates
  - Remove observations outside the Netherlands
• Total: 28,865 observations
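The two quality-check rules can be sketched as a simple filter. This is a minimal sketch, not the pipeline used in the study: the record layout (`id`, `lat`, `lon` keys) and the bounding box are assumptions, and a bounding box only approximates the Dutch border (a real check would test each point against the national boundary polygon).

```python
# Hypothetical, approximate bounding box of the European Netherlands in WGS84 degrees:
# (min_lat, max_lat, min_lon, max_lon). A real check would use the national border polygon.
NL_BBOX = (50.7, 53.7, 3.2, 7.3)

def has_coordinates(obs):
    """True if the observation carries both a latitude and a longitude."""
    return obs.get("lat") is not None and obs.get("lon") is not None

def inside_netherlands(obs, bbox=NL_BBOX):
    """True if the observation falls inside the (approximate) bounding box."""
    min_lat, max_lat, min_lon, max_lon = bbox
    return min_lat <= obs["lat"] <= max_lat and min_lon <= obs["lon"] <= max_lon

def quality_check(observations):
    """Apply both rules: drop records without coordinates, then records outside the box."""
    return [o for o in observations if has_coordinates(o) and inside_netherlands(o)]

raw = [
    {"id": 1, "lat": 52.22, "lon": 6.89},  # Enschede: kept
    {"id": 2, "lat": None, "lon": 5.10},   # no coordinate: removed
    {"id": 3, "lat": 48.85, "lon": 2.35},  # Paris: removed
]
clean = quality_check(raw)
print([o["id"] for o in clean])  # [1]
```

The coordinate test runs first so that records with a missing latitude or longitude never reach the numeric comparison.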
Data
• Remote-sensed data
Data
• Weather data:
  - Daily temperature and precipitation raster files
  - Provided by KNMI
• Official data:
  - Land use
  - Soil type
Data preparation
• Objective:
  - Characterize tick bite observations as a function of human, environmental and climatic indices
  - Largely a matter of trial and error
• Procedure:
  - Create a multidimensional table from the available data
  - Find features related to tick ecology
  - Set a temporal scale at which to aggregate the data
• Tools: Python and JavaScript to process the data
Data preparation
Multidimensional Table (MDT): one row per observation
Observation | Vegetation indices | Temperature indices | Precipitation indices | Human indices
1
2
3
…
28,865
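One way to fill a row of the MDT is to aggregate the daily weather series over a fixed window before each bite date, which is where the choice of temporal scale comes in. The sketch below assumes a 7-day window, a dictionary of daily values per observation location, and column names (`temp_mean_7d`, `precip_sum_7d`); all of these are hypothetical, since the actual indices used in the study are not listed on the slides.

```python
from datetime import date, timedelta
from statistics import mean

def window(daily, day, days):
    """Values of a daily series (dict: date -> value) for the `days` days before `day`.

    Days missing from the series are skipped.
    """
    values = []
    for offset in range(1, days + 1):
        key = day - timedelta(days=offset)
        if key in daily:
            values.append(daily[key])
    return values

def mdt_row(obs_id, bite_day, temp, precip, days=7):
    """Build one MDT row with hypothetical temperature and precipitation indices."""
    t = window(temp, bite_day, days)
    p = window(precip, bite_day, days)
    return {
        "id": obs_id,
        f"temp_mean_{days}d": mean(t) if t else None,
        f"precip_sum_{days}d": sum(p) if p else None,
    }

# Made-up daily series for one location: 16..22 degrees C and 1 mm of rain per day.
bite = date(2014, 6, 15)
temp = {bite - timedelta(days=d): 15.0 + d for d in range(1, 8)}
precip = {bite - timedelta(days=d): 1.0 for d in range(1, 8)}
row = mdt_row(1, bite, temp, precip)
print(row)  # {'id': 1, 'temp_mean_7d': 19.0, 'precip_sum_7d': 7.0}
```

Changing `days` is the knob for experimenting with different temporal scales; the column names follow the window length automatically.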
Data preparation
• Why are we doing this?
  - Find conditions, frequent in the data, under which humans are more exposed to tick bites
  - Find clusters of these conditions and study their spatial patterns
  - Target prevention campaigns
Analysis overview
• Analysis:
  - MDT: input for modeling with data mining algorithms
  - Three experiments:
    1st: Clustering and classification techniques
    2nd: Frequent pattern mining (today!)
    3rd: Decision trees (ongoing)
  - Visualization with maps and ringmaps
What is frequent pattern mining?
• FPM is a technique to find patterns that recur frequently in data:
  - It looks for combinations of elements, up to the longest possible, whose frequency is above a threshold
• Multiple algorithms:
  - Apriori
  - Eclat
  - FP-Growth
• Toy example with a supermarket list
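The threshold is commonly expressed as support, the fraction of records that contain a pattern. Using the standard definition, with D the set of records (shopping lists in the toy example, MDT rows in the analysis), X a candidate itemset and σ_min the user-chosen threshold (for example 0.2 for 20%):

```latex
\operatorname{supp}(X) = \frac{\lvert \{\, t \in D : X \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
X \text{ is frequent} \iff \operatorname{supp}(X) \ge \sigma_{\min}
```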
How does Apriori work?
ID Items ID Items
1 Apple, Chicken, Donut 11 Donut
2 Apple, Chicken 12 Chicken, Donut
3 Chicken, Bread, Apple 13 Apple
4 Donut, Bread 14 Apple, Donut
5 Donut, Chicken, Bread 15 Apple, Chicken
6 Apple, Donut 16 Bread
7 Bread, Chicken 17 Bread, Chicken
8 Apple, Donut 18 Donut
9 Bread, Chicken 19 Apple, Chicken
10 Chicken, Donut 20 Bread, Chicken, Donut
How does Apriori work?
• Key idea: a pattern can only be frequent if all of its subsets are also frequent
• User sets a threshold above which a pattern counts as frequent
• Pattern generation is bottom-up: from single items to longer combinations
• Computationally expensive:
  - Up to one level per distinct item
  - Multiple passes over the data
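Stated formally, the property Apriori relies on is that support can only shrink as an itemset grows. With supp the fraction of records containing an itemset and σ_min the user-set threshold, any candidate with an infrequent subset can therefore be discarded without scanning the data:

```latex
X \subseteq Y \;\Rightarrow\; \operatorname{supp}(X) \ge \operatorname{supp}(Y)
\qquad\text{hence}\qquad
\bigl(\exists\, X \subset Y : \operatorname{supp}(X) < \sigma_{\min}\bigr) \;\Rightarrow\; \operatorname{supp}(Y) < \sigma_{\min}
```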
How does Apriori work?
ID Items ID Items
1 Apple, Chicken, Donut 11 Chicken
2 Apple, Bread 12 Chicken, Donut
3 Apple, Bread, Chicken 13 Apple
4 Bread, Donut 14 Apple, Donut
5 Bread, Chicken, Donut 15 Apple, Chicken
6 Apple, Donut 16 Bread
7 Apple, Chicken 17 Bread, Chicken
8 Apple, Donut 18 Chicken
9 Bread, Chicken 19 Apple, Donut
10 Chicken, Donut 20 Bread, Chicken, Donut
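The level-wise procedure can be sketched in Python on the shopping lists of the table above. This is a textbook version of Apriori written for illustration, not the implementation used in the study; the threshold of 4 lists out of 20 (20% support) is an arbitrary choice for the example.

```python
from itertools import combinations

# Shopping lists from the toy table above (IDs 1-20, in order).
TRANSACTIONS = [
    {"Apple", "Chicken", "Donut"}, {"Apple", "Bread"}, {"Apple", "Bread", "Chicken"},
    {"Bread", "Donut"}, {"Bread", "Chicken", "Donut"}, {"Apple", "Donut"},
    {"Apple", "Chicken"}, {"Apple", "Donut"}, {"Bread", "Chicken"}, {"Chicken", "Donut"},
    {"Chicken"}, {"Chicken", "Donut"}, {"Apple"}, {"Apple", "Donut"}, {"Apple", "Chicken"},
    {"Bread"}, {"Bread", "Chicken"}, {"Chicken"}, {"Apple", "Donut"}, {"Bread", "Chicken", "Donut"},
]

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset occurring in at least min_count transactions."""
    # Level 1: count single items in one pass over the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: join pairs of frequent (k-1)-itemsets into k-itemsets ...
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k:
                # ... and prune any candidate that has an infrequent (k-1)-subset.
                if all(frozenset(sub) in frequent for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # One more pass over the data to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result

patterns = apriori(TRANSACTIONS, min_count=4)  # 4 of 20 lists = 20% support
for itemset, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With this threshold all four single items are frequent (Apple 10, Bread 8, Chicken 12, Donut 10), as are four pairs: {Apple, Chicken} 4, {Apple, Donut} 5, {Bread, Chicken} 5 and {Chicken, Donut} 5. {Bread, Chicken, Donut} is never counted because its subset {Bread, Donut} appears in only 3 lists, and the one surviving 3-item candidate, {Apple, Chicken, Donut}, appears in a single list.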
Back to the analysis
• No more supermarket items
• Reality:
  - Apriori scans combinations of 39 different items
  - Input: 28,865 rows
  - Data on temperature, precipitation, vegetation and human indices
• Thousands of patterns may be generated, depending on the threshold
  - Challenge for visualization
  - Patterns combined using ringmaps
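To put "thousands of patterns" in perspective, the number of itemsets that 39 items could form in principle is easy to count. This takes the slide's 39 items at face value and is only an upper bound: Apriori's pruning means far fewer candidates are actually counted.

```python
from math import comb

N_ITEMS = 39  # number of different items, as stated on the slide

# Possible itemsets of each length (an upper bound on the candidates Apriori might count).
for k in (1, 2, 3, 4):
    print(f"length {k}: {comb(N_ITEMS, k):,}")

# All non-empty combinations of the 39 items.
print(f"all lengths: {2 ** N_ITEMS - 1:,}")
```

This prints 39, 741, 9,139 and 82,251 possible itemsets of lengths 1 to 4, and 549,755,813,887 over all lengths, which is why a threshold (and pruning) is essential.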
Analysis at a seasonal scale
MDT (39 columns, 28,865 observations, 2006 – 2014) → Classified MDT (Jenks Natural Breaks) → Apply Apriori → Explore frequent patterns
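Classifying the MDT turns each numeric index into a few categorical items that Apriori can consume. The sketch below is a plain dynamic-programming version of the Jenks (Fisher) natural-breaks optimisation, which minimises the within-class sum of squared deviations, plus a helper that encodes a value as an item. The column name, the `C1`, `C2`, ... labels and the toy values are hypothetical. The exhaustive search is O(classes × n²), so for 28,865 observations a library implementation, such as the `FisherJenks` classifier in PySAL's `mapclassify`, would be the practical choice.

```python
from bisect import bisect_left

def jenks_breaks(values, n_classes):
    """Upper bound of each class for the optimal (Fisher-Jenks) partition of the values.

    Exhaustive dynamic programming, O(n_classes * n**2): fine for a demo,
    too slow for tens of thousands of values.
    """
    x = sorted(values)
    n = len(x)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= number of values")
    # Prefix sums give the sum of squared deviations of any run x[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):  # sum of squared deviations from the mean of x[i:j]
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: smallest total SSD when x[:j] is split into c classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last class is x[i:j]
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i
    # Walk back through the table to recover each class's upper bound.
    breaks, j = [], n
    for c in range(n_classes, 0, -1):
        breaks.append(x[j - 1])
        j = start[c][j]
    return breaks[::-1]

def to_item(column, value, breaks):
    """Encode a numeric value as a categorical item such as 'temp=C2' (classes are 1-based).

    Values above the last break are put in the top class.
    """
    index = min(bisect_left(breaks, value), len(breaks) - 1)
    return f"{column}=C{index + 1}"

temps = [1, 2, 3, 10, 11, 12, 20, 21, 22]  # made-up values
breaks = jenks_breaks(temps, 3)
print(breaks)                       # [3, 12, 22]
print(to_item("temp", 11, breaks))  # temp=C2
```

Encoding every MDT cell this way produces rows of items like `temp=C2`, which is the transaction format the Apriori sketch for the toy example expects.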
Analysis at a seasonal scale
[Ringmaps: patterns of length 3 | patterns of length 4]
• Combining 1,500 patterns
• Length 3 and 4
• Frequency of at least 20%
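Selecting the patterns to display amounts to filtering the Apriori output by itemset length and support. A minimal sketch, assuming the patterns arrive as a mapping from itemset to support fraction; the item names and support values are made up, and the per-item tally at the end is only one possible way to summarise many patterns, not necessarily how the ringmaps were built.

```python
from collections import Counter

def select_patterns(patterns, lengths=(3, 4), min_support=0.20):
    """Keep itemsets of the requested lengths whose support is at least min_support."""
    return {s: sup for s, sup in patterns.items() if len(s) in lengths and sup >= min_support}

def item_frequency(selected):
    """How many selected patterns each item takes part in."""
    return Counter(item for s in selected for item in s)

# Made-up Apriori output: itemset -> support fraction.
patterns = {
    frozenset({"temp=C3", "precip=C1", "ndvi=C4"}): 0.24,
    frozenset({"temp=C3", "precip=C1", "ndvi=C4", "landuse=forest"}): 0.21,
    frozenset({"temp=C2", "precip=C2", "ndvi=C3"}): 0.12,
    frozenset({"temp=C3", "precip=C1"}): 0.40,
}
selected = select_patterns(patterns)
print(len(selected))                        # 2
print(item_frequency(selected)["temp=C3"])  # 2
```

The third pattern is dropped for low support and the fourth for being too short.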
Analysis at a seasonal scale
Conclusions
• Identified combinations of indices under which tick bites occurred:
  - Adding human-related variables seems to produce meaningful results
• Still no spatial pattern:
  - Humans are a biasing factor
  - The results suggest there are ticks and humans everywhere
• Further work:
  - Converge on a suitable temporal scale for tick dynamics and human activity
  - Spatial aggregation of observations (forest, neighborhood, grid cell)
  - More human-related indices, fewer nature-related indices
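Of the spatial aggregation options listed, grid cells are the simplest to sketch: snap each observation to the cell that contains it and count per cell. This assumes projected coordinates in metres (for example the Dutch RD New system) and a 1 km cell size; both are illustrative choices, not decisions from the study, and the points are made up.

```python
from collections import Counter
from math import floor

def cell_of(x, y, size=1000.0):
    """Index (column, row) of the square grid cell containing point (x, y); units are metres."""
    return floor(x / size), floor(y / size)

def count_per_cell(points, size=1000.0):
    """Number of observations falling in each grid cell."""
    return Counter(cell_of(x, y, size) for x, y in points)

# Made-up projected coordinates (metres) of three observations.
points = [(257_100.0, 471_300.0), (257_900.0, 471_050.0), (121_400.0, 487_200.0)]
counts = count_per_cell(points)
print(counts[(257, 471)])  # 2
```

The same counting works for forests or neighborhoods by replacing `cell_of` with a point-in-polygon lookup.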
Discussion
• Did any of you know about Lyme disease before?
• Do any of you have experience in modeling species?
• What else would you include in the analysis?
Thanks
Questions?
