Machine learning for improving
wireless network performance
Merima Kulin, Eli De Poorter, Dirk Deschrijver, Tom Dhaene and Ingrid Moerman
merima.kulin@intec.ugent.be
Internet Based Communication Networks and Services research group (IBCN) - IDLab
Department of Information Technology (INTEC)
Ghent University - imec
BESTCOM, 21.10.2016, Louvain
Machine learning for wireless networks
• Introduction
• Data-driven design: examples
• Data science in wireless networks: a tutorial
• Conclusion
Why machine learning?
• Gartner's Hype Cycle for Emerging Technologies, 2015 and 2016
Why machine learning?
• Computational power
• Massive amounts of data
• Unprecedented advances in ML
Data is the new oil!
What kind of data are wireless networks generating?
Wireless networks as data sources:
• IoT
• Network monitoring
• Cognitive radio
Data Science
What is data science?
(Figure labels: Machine learning, Data mining, Data analysis, Pre-processing, ML algorithm selection, Model, Evaluation, …)
Kulin, Merima, Carolina Fortuna, Eli De Poorter, Dirk Deschrijver, and Ingrid Moerman.
"Data-Driven Design of Intelligent Wireless Networks: An Overview and Tutorial."
Sensors 16, no. 6 (2016): 790.
• Machine learning vs. data mining vs. data science
“Data science is the study of generalizable extraction of knowledge from data.”
Machine learning in wireless networks
• Introduction
• Data driven design: examples
• Data science in wireless networks: a tutorial
• Conclusion
Data mining/Machine learning approaches
• Regression
• Classification
• Clustering
• Anomaly detection
Regression
(Figure: input X → Regression → output Y)
Regression - example
Application area: Localization
Regression: RSSI → distance
Vanheel, F.; Verhaevert, J.; Laermans, E.; Moerman, I.; Demeester, P. Automated linear regression tools improve rssi wsn localization in multipath indoor environment. EURASIP J. Wirel. Commun. Netw. 2011, 2011, 1–27.
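A minimal sketch of RSSI-to-distance regression, assuming a log-distance path-loss model (RSSI = A - 10·n·log10(d)). This model is one common choice and is not necessarily the one used in the cited paper; the constants and the synthetic measurements below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed ground truth for the synthetic data (not measured values).
true_a, true_n = -40.0, 2.5          # RSSI at 1 m (dBm), path-loss exponent
dist = np.linspace(1.0, 30.0, 200)   # distances in metres
rssi = true_a - 10.0 * true_n * np.log10(dist) + rng.normal(0.0, 1.0, dist.size)

# Ordinary least squares on x = log10(d): rssi ≈ slope * x + intercept
slope, intercept = np.polyfit(np.log10(dist), rssi, deg=1)
n_hat = -slope / 10.0                # estimated path-loss exponent


def estimate_distance(rssi_dbm):
    """Invert the fitted line to turn an RSSI reading into a distance."""
    return 10.0 ** ((rssi_dbm - intercept) / slope)
```

Calling `estimate_distance(-65.0)` then returns the distance, in metres, that the fitted line associates with a reading of -65 dBm.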
Classification
(Figure: input X → Classifier → output Y, one of the classes C1, C2, C3)
Classification: example
Application area: System recognition
Classifier: RSSI → Zigbee, WiFi, Bluetooth, Microwave
Zheng, Xiaolong, et al. "ZiSense: towards interference resilient duty cycling in wireless sensor networks." Proceedings of the 12th ACM Conference on Embedded Network Sensor Systems. ACM, 2014.
Clustering
(Figure: input X, output unknown (?) → Clustering → groups C1, C2, C3)
Clustering: example
Application area: System identification
Clustering: X → Zigbee, WiFi, Noise
Shetty, N.; Pollin, S.; Pawełczak, P. Identifying spectrum usage by unknown systems using experiments in machine learning. In Proceedings of the 2009 IEEE Wireless Communications and Networking Conference, Budapest, Hungary, 5–8 April 2009.
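A minimal sketch of clustering unlabeled measurements, assuming plain k-means (Lloyd's algorithm) with a deterministic farthest-point initialisation. The two-dimensional features are synthetic stand-ins; the cited paper's actual features and algorithm may differ.

```python
import numpy as np


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with farthest-point initialisation."""
    points = np.asarray(points, dtype=float)
    centers = [points[0]]
    for _ in range(1, k):
        # Distance of every point to its closest already-chosen centre.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)

    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


rng = np.random.default_rng(1)
# Three synthetic groups standing in for e.g. Zigbee, WiFi and noise samples.
blobs = [rng.normal(loc=m, scale=0.3, size=(50, 2)) for m in ([0, 0], [5, 5], [0, 5])]
features = np.vstack(blobs)
labels, centers = kmeans(features, k=3)
```

The cluster indices carry no meaning by themselves; mapping them to technologies still requires domain knowledge or a few labeled samples.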
Anomaly detection
(Figure: input X, output Y/? → Anomaly detection; classes C1, C2, C3)
Which DM/ML method can you use?
X: Measurements from a device → Y: Device type
Application area: ?
Answer: Classification
Uluagac, A. Selcuk, et al. "A passive technique for fingerprinting wireless devices with wired-side observations." Communications and Network Security (CNS), 2013 IEEE Conference on. IEEE, 2013.
Machine learning in wireless networks
• Introduction
• Data driven design: examples
• Data science in wireless networks: a tutorial
• Conclusion
The knowledge discovery process
28/11/2016
The knowledge discovery process
Step 1: Understanding the problem domain
Problem formulation
• Fingerprinting wireless devices
• Identify devices and device types
• Classification problem
Goal
• A new solution for Network Access Control to enhance network security
Assumptions
• Packet generation is influenced by hardware architecture (CPU, DMA, L1/L2 cache, ...)
Hypothesis
• Identify devices and/or device types based on statistical properties of their traffic flows
Data collection
• Analyze inter-arrival times (IATs) from several devices
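A minimal sketch of how inter-arrival times can be derived from packet capture timestamps. The timestamp values and the function name are hypothetical and used for illustration only.

```python
import numpy as np


def inter_arrival_times(timestamps):
    """Return the gaps between consecutive packet arrival timestamps."""
    t = np.sort(np.asarray(timestamps, dtype=float))  # defensive: enforce time order
    return np.diff(t)


# Hypothetical arrival times in seconds (illustrative values only).
timestamps = [0.000, 0.010, 0.025, 0.045]
iats = inter_arrival_times(timestamps)
```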
The knowledge discovery process
Step 2: Understanding the data
Collected data
• IAT traces
Validate the data
• Is the selected data a representative sample for solving the formulated problem?
Validate the hypothesis
• Is the stated hypothesis true, and is the selected data mining task likely to prove it?
Device fingerprinting
• Data collection
• Data repositories (e.g. CRAWDAD)
• Run experiments on testbed facilities
• Collect data in situ
Overall
• 94 files
• Total ~137 mil.
• Mean ~1.46 mil.
• Std ~1.3 mil.
Devices: Dell, iPad, iPhone, Nokia
Device fingerprinting
Step 2: Understanding the data
Visual techniques
• Boxplot
• PDF and CDF
• Scatter plots
• Histograms
Computational techniques
• Five-number summary
• Standard deviation, variance
• Skewness
• Coefficient of determination (R²)
• Coefficient of correlation
• …
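The computational techniques listed above can be sketched with NumPy as follows. The trace is synthetic (exponentially distributed gaps are an assumption, not a property of the measured data), and `summarize` is a hypothetical helper name.

```python
import numpy as np


def summarize(x):
    """Five-number summary plus standard deviation, variance and skewness."""
    x = np.asarray(x, dtype=float)
    q_min, q1, median, q3, q_max = np.percentile(x, [0, 25, 50, 75, 100])
    mean, std = x.mean(), x.std()
    skewness = np.mean(((x - mean) / std) ** 3)  # population (biased) skewness
    return {
        "min": q_min, "q1": q1, "median": median, "q3": q3, "max": q_max,
        "std": std, "variance": std ** 2, "skewness": skewness,
    }


rng = np.random.default_rng(0)
iats = rng.exponential(scale=1.0e-3, size=10000)  # synthetic IATs in seconds
summary = summarize(iats)
```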
Device fingerprinting
Step 2: Understanding the data
• Visual techniques
• PDF, time-series, histograms…
Device fingerprinting
Step 2: Understanding the data
• Computational techniques
• Five-number summary
Device fingerprinting
Step 2: Understanding the data
• Computational techniques
• Coefficient of determination (R²)
• Analysis for device identification
• How much can data from one Dell Notebook tell about the data from other Dell Notebooks? (Figure: DN2, DN3)
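One possible way to quantify how much one device's trace explains another's is sketched below, assuming R² is taken as the squared Pearson correlation between the two traces' histograms over shared bins. This is an illustrative assumption and not necessarily the analysis behind the slide; `dn2`, `dn3` and `other` are synthetic stand-ins for measured traces.

```python
import numpy as np


def r_squared(a, b):
    """Squared Pearson correlation between two equally long vectors."""
    r = np.corrcoef(a, b)[0, 1]
    return float(r ** 2)


rng = np.random.default_rng(0)
# Synthetic IAT traces in seconds (not the measured data).
dn2 = rng.exponential(scale=1.0e-3, size=20000)
dn3 = rng.exponential(scale=1.0e-3, size=20000)
other = rng.uniform(0.0, 5.0e-3, size=20000)

edges = np.linspace(0.0, 5.0e-3, 31)  # shared bin edges for all traces


def density(trace):
    """Normalised histogram of a trace over the shared bins."""
    hist, _ = np.histogram(trace, bins=edges, density=True)
    return hist


r2_same = r_squared(density(dn2), density(dn3))    # similar devices
r2_diff = r_squared(density(dn2), density(other))  # dissimilar device
```

A high value for two devices of the same model together with a low value across models would suggest that the traffic statistics carry device-type information.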
The knowledge discovery process
Step 3: Data pre-processing
Raw data
• Traces of IAT data points
Training data
• Feature extraction
• Feature vectors
• Training examples
Device fingerprinting
Data pre-processing
Device fingerprinting
Data pre-processing
Device fingerprinting
Data pre-processing
• Feature extraction
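A minimal sketch of turning a raw IAT trace into feature vectors of histogram bins. The window length, the bin edges and the function name are assumptions chosen for illustration; the trace is synthetic.

```python
import numpy as np


def histogram_features(iats, window, bin_edges):
    """Split a trace into fixed-size windows and return one normalised
    histogram (feature vector) per window."""
    iats = np.asarray(iats, dtype=float)
    n_windows = iats.size // window          # a trailing partial window is dropped
    features = []
    for w in range(n_windows):
        chunk = iats[w * window:(w + 1) * window]
        counts, _ = np.histogram(chunk, bins=bin_edges)
        total = max(counts.sum(), 1)         # avoid division by zero
        features.append(counts / total)
    return np.array(features)


rng = np.random.default_rng(0)
trace = rng.exponential(scale=1.0e-3, size=1000)   # synthetic IATs in seconds
edges = np.linspace(0.0, 5.0e-3, 11)               # 10 bins
X = histogram_features(trace, window=100, bin_edges=edges)
```

Each row of `X` is one training example; pairing it with the label of the device that produced the trace gives the training data for the data mining step.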
The knowledge discovery process
Step 4: Data mining
Training data
• Feature vectors of histogram bins
Model
• Neural network (HL=6, α=0.1, learned weights)
• k-Nearest Neighbors (K=1)
• Decision trees
• Logistic regression
• …
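As an illustration of the simplest model on the slide, here is a minimal k-Nearest Neighbors sketch with K=1 as the default. It assumes Euclidean distance and non-negative integer class labels; the tiny training set is made up.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k=1):
    """Predict labels by majority vote among the k nearest training points."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=int)
    test_x = np.asarray(test_x, dtype=float)
    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


# Made-up feature vectors for two device types (labels 0 and 1).
train_x = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]]
train_y = [0, 0, 1, 1]
pred = knn_predict(train_x, train_y, [[0.2, 0.1], [5.5, 5.4]], k=1)
```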
Device fingerprinting
Step 5: Performance evaluation
Test data
• Test set of feature vectors
Performance indication
• RMSE, MAE, R², …
• Precision, recall
• Confusion matrix
• …
Device fingerprinting
Step 5: Performance evaluation
• Algorithm selection: k-fold cross-validation
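A minimal sketch of k-fold cross-validation for comparing algorithms. The nearest-centroid classifier is only a stand-in for whichever model is being evaluated, and the data are synthetic; all function names are hypothetical.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    perm = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(perm, k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]


def nearest_centroid(train_x, train_y, test_x):
    """Stand-in classifier: assign each point to the closest class mean."""
    classes = np.unique(train_y)
    centroids = np.array([train_x[train_y == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]


def cross_val_accuracy(fit_predict, x, y, k=5):
    """Mean accuracy of fit_predict(train_x, train_y, test_x) over k folds."""
    scores = []
    for tr, te in k_fold_indices(len(y), k):
        scores.append(np.mean(fit_predict(x[tr], y[tr], x[te]) == y[te]))
    return float(np.mean(scores))


rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
acc = cross_val_accuracy(nearest_centroid, x, y, k=5)
```

Running `cross_val_accuracy` with several candidate `fit_predict` functions on the same folds gives a like-for-like comparison between algorithms.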
Device fingerprinting
Step 5: Performance evaluation
• Performance evaluation
• Confusion matrix
• Accuracy, precision, recall, F1-score
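The metrics named on this slide can all be derived from the confusion matrix, as the following minimal sketch shows. Labels are assumed to be integers 0..n_classes-1, and the example predictions are made up.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def metrics_from_confusion(cm):
    """Return overall accuracy and per-class precision, recall and F1."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)   # column sums: how often each class was predicted
    actual = cm.sum(axis=1)      # row sums: how often each class truly occurred
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1


# Made-up labels for three device types.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy, precision, recall, f1 = metrics_from_confusion(cm)
```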
Device fingerprinting
Step 5: Performance evaluation
• Device type classification results
Device fingerprinting
Step 5: Performance evaluation
• Model tuning: neural networks
More details about how to tune your algorithm can be found in:
Kulin, Merima, Carolina Fortuna, Eli De Poorter, Dirk Deschrijver, and Ingrid Moerman. "Data-Driven Design of Intelligent Wireless Networks: An Overview and Tutorial." Sensors 16, no. 6 (2016): 790.
Conclusion
• Data-driven network design can be used for
• Failure detection
• Systems recognition
• Performance optimization…
• Data traces are valuable
• Consider releasing data traces after use
• Need for increased collaboration
• Network experts, testbed experts, data mining experts, statisticians, wireless communication experts, etc.
Machine learning for wireless networks @Bestcom2016
