Quantitative Time Series Analysis of Malware and Vulnerability Trends


Quantitative Time Series Analysis of Malware and Vulnerability Trends - Craig Wright

Published in: Business, Technology
  • Quantitative Time Series Analysis of Malware and Vulnerability Trends

    2. 2. Who Am I <ul><li>Senior IS Audit Manager - BDO </li></ul><ul><li>My Specialties </li></ul><ul><li>ISMS, ISO 7799 Consulting and Audit/Review </li></ul><ul><li>Digital Forensics </li></ul><ul><li>Information Security Design and Review </li></ul><ul><li>Threat/Risk Analysis and Review </li></ul><ul><li>Information Risk and Management (ANZ4360) </li></ul><ul><li>Data Mining </li></ul><ul><li>Neural Networks </li></ul><ul><li>Anomaly Detection Systems </li></ul><ul><li>CAATS </li></ul><ul><li>Technology Related Business Continuity Planning (BCP) and Disaster Recovery Planning (DRP) </li></ul><ul><li>Cryptography </li></ul>Craig S Wright, DTh LLM (Cand.) MNSA MMIT CISA CISM CISSP ISSMP ISSAP G7799 GCFA CCE MSDBA AFAIM MACS And a partridge in a pear tree…
    3. 3. Today’s Presentation <ul><li>To protect computer systems and network architecture effectively against attack, we need to understand the threats and be able to build predictive models of them. </li></ul>
    4. 4. A Quantitative Time Series Analysis of Malware and Vulnerability Trends <ul><li>Introduction and objectives </li></ul><ul><ul><li>The creation of Quantitative Risk models in Information Systems Security is a field in its infancy. </li></ul></ul><ul><ul><li>The prediction of threats is often described as too difficult because of a shortage of data and the cost of collecting and analysing data for a site. </li></ul></ul>
    5. 5. Research Design / Methods / Data Collection <ul><li>It has been deduced that three main problems exist within the analytical process involved with Information Systems security (Valentino, 2003): </li></ul><ul><ul><li>utilising all available information sources, </li></ul></ul><ul><ul><li>verifying the validity of a suspected computer system intrusion, and </li></ul></ul><ul><ul><li>following a standard process. </li></ul></ul>
    6. 6. Research Data Sources <ul><li>The Wildlist organisation </li></ul><ul><li>Virus Bulletin </li></ul><ul><li>Vendor Virus bulletins </li></ul><ul><li>Vendor vulnerability announcements </li></ul><ul><li>CERT </li></ul>
    7. 7. ARIMA techniques for time-series analysis <ul><li>Three sets of data have been collected for analysis. These consist of: </li></ul><ul><ul><li>The reported monthly Virus Incidents (Virus.No), </li></ul></ul><ul><ul><li>The number of infections/incidents associated with the most prevalent malware in the month (Top.Mth), and </li></ul></ul><ul><ul><li>The Wildlist collated monthly data for malware reported “in the wild” (Wild.Lst). </li></ul></ul>
    8. 8. Initial observations <ul><li>Visual analysis alone is sufficient to see that trends in malicious code incidents have increased significantly over the last 3 years in a non-linear manner. </li></ul>
    9. 9. Wildlist Trends <ul><li>It is clear that there is a trend and that the variance increases with the mean. </li></ul>
    10. 10. A logarithmic transform was selected for the three datasets <ul><li>There is a clear trend with all three sets of data with the number of malicious code incidents increasing over time. The trends are all roughly linear (particularly the Wildlist data), but it is difficult to be sure in the presence of the other features. </li></ul>
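The reason for choosing a logarithmic transform can be illustrated with a small sketch. The series below is hypothetical (exponential growth with a multiplicative wobble of plus or minus 10%), not the Wildlist or incident data, and the names `levels`, `wobble` and `spread` are introduced only for this example. On the raw scale the deviations from trend grow with the level; on the log scale they stay the same size.

```python
import math

# Hypothetical series: exponential growth with a multiplicative +/-10% wobble.
levels = [50.0 * math.exp(0.04 * t) for t in range(60)]
wobble = [1.1 if t % 2 == 0 else 0.9 for t in range(60)]
series = [lv * w for lv, w in zip(levels, wobble)]

def spread(xs):
    """Range of a list of values (max minus min)."""
    return max(xs) - min(xs)

# Deviations from the underlying trend on the raw and on the log scale.
raw_dev = [y - lv for y, lv in zip(series, levels)]
log_dev = [math.log(y) - math.log(lv) for y, lv in zip(series, levels)]

# Compare the first and last twelve months.
raw_early, raw_late = spread(raw_dev[:12]), spread(raw_dev[-12:])
log_early, log_late = spread(log_dev[:12]), spread(log_dev[-12:])

print(f"raw spread: early={raw_early:.2f} late={raw_late:.2f}")
print(f"log spread: early={log_early:.4f} late={log_late:.4f}")
```

When the variance increases with the mean, as in the Wildlist plot, this is the behaviour that makes the log scale a natural choice before differencing.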
    11. 11. Analysis of Wildlist Data <ul><li>A time plot of the first difference (d=1) of the log-transformed Wildlist data shows that the series is stationary after one difference. There appears to be no seasonality in this time series. </li></ul>
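As a minimal sketch of what taking one difference (d=1) of the logarithm means, the helper below computes successive differences. The function name `difference` and the sample counts are assumptions for illustration, not the author's code or data.

```python
import math

def difference(values, d=1):
    """Apply d rounds of first differencing: x[t] - x[t-1]."""
    out = list(values)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

# Hypothetical counts growing 10% per month (not the source data).
counts = [100.0 * 1.1 ** t for t in range(6)]

# Differencing the logs turns steady percentage growth into a constant level.
log_diffs = difference([math.log(c) for c in counts])
print([round(x, 4) for x in log_diffs])
```

Each differencing pass shortens the series by one observation. A differenced log series that fluctuates around a constant level is what a stationary time plot looks like.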
    12. 12. Wildlist ACF
    13. 13. Wildlist Partial ACF
    14. 14. Inspection of the ACF PACF Plots <ul><li>The ACF/PACF plots suggested that either an AR (1) or MA (1) model for the differenced series may be suitable. </li></ul><ul><li>Taking the log transformed differenced values (d=1), the ACF plot decreases exponentially to zero and the PACF plot is significant at lag 1. </li></ul>
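The pattern described here (an ACF that decays exponentially and a PACF with a single significant spike at lag 1) is the textbook signature of an AR(1) process. The sketch below computes a sample ACF and derives the PACF from it with the Durbin-Levinson recursion. The function names and the coefficient 0.6 are illustrative assumptions, not values from the analysis.

```python
def sample_acf(xs, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]

def pacf_from_acf(acf):
    """Partial autocorrelations for lags 1..len(acf)-1 (Durbin-Levinson)."""
    pacf = []
    phi_prev = []  # coefficients of the order k-1 autoregression
    for k in range(1, len(acf)):
        if k == 1:
            phi_kk = acf[1]
            phi = [phi_kk]
        else:
            num = acf[k] - sum(phi_prev[j] * acf[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * acf[j + 1] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf

# Theoretical ACF of an AR(1) process with a hypothetical coefficient of 0.6.
ar1_acf = [0.6 ** k for k in range(5)]
ar1_pacf = pacf_from_acf(ar1_acf)
print([round(p, 6) for p in ar1_pacf])
```

For the theoretical AR(1) input the PACF is 0.6 at lag 1 and zero afterwards. An MA(1) process shows the mirror image (one ACF spike and a decaying PACF), and with sampling noise the two can be hard to tell apart, which fits the suggestion that either model may suit the differenced series.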
    15. 15. Model Comparison
    <table>
    <tr><th>Model</th><th>DF</th><th>Variance</th><th>AIC</th><th>SBC</th><th>RSquare</th><th>-2LogLH</th></tr>
    <tr><td>ARI(1, 1) No Intercept</td><td>150</td><td>0.0107579</td><td>-683.3524</td><td>-680.3351</td><td>0.985</td><td>-685.3136</td></tr>
    <tr><td>IMA(1, 1) No Intercept</td><td>150</td><td>0.010742</td><td>-683.5753</td><td>-680.5581</td><td>0.985</td><td>-685.5343</td></tr>
    <tr><td>ARI(2, 1) No Intercept</td><td>149</td><td>0.0108106</td><td>-681.6245</td><td>-675.5899</td><td>0.985</td><td>-685.5822</td></tr>
    <tr><td>IMA(1, 2) No Intercept</td><td>149</td><td>0.010813</td><td>-681.5908</td><td>-675.5562</td><td>0.985</td><td>-685.5491</td></tr>
    </table>
    16. 16. Model Selection <ul><li>Over-fitting either model gave coefficient values that were not significant at the 5% level. </li></ul><ul><li>The diagnostic plots for each model produced no significant values within the residual plots and we could see no evidence of inadequacy for either model. </li></ul>
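A quick arithmetic check helps when reading the AIC and SBC columns. Under the standard definitions (AIC adds 2 per parameter, SBC adds ln n per parameter), SBC minus AIC equals k(ln n − 2). The sketch below tests that relation against the Wildlist figures, assuming that the number of observations n equals DF plus the number of fitted parameters k. That assumption, and the helper name `sbc_minus_aic`, are not stated in the slides.

```python
import math

# (model, DF, parameters k, AIC, SBC) copied from the Wildlist model comparison.
rows = [
    ("ARI(1, 1)", 150, 1, -683.3524, -680.3351),
    ("IMA(1, 1)", 150, 1, -683.5753, -680.5581),
    ("ARI(2, 1)", 149, 2, -681.6245, -675.5899),
    ("IMA(1, 2)", 149, 2, -681.5908, -675.5562),
]

def sbc_minus_aic(k, n):
    """Expected gap between SBC and AIC for k parameters and n observations."""
    return k * (math.log(n) - 2.0)

gaps = []
for model, df, k, aic, sbc in rows:
    n = df + k  # assumption: observations used = DF + number of parameters
    gaps.append((model, sbc - aic, sbc_minus_aic(k, n)))
    print(f"{model}: observed gap {sbc - aic:.4f}, expected {sbc_minus_aic(k, n):.4f}")

# Smaller is better for both criteria.
best_by_aic = min(rows, key=lambda r: r[3])[0]
best_by_sbc = min(rows, key=lambda r: r[4])[0]
```

The gaps agree to about four decimal places with n = 151, and the one-parameter models have the lowest AIC and SBC. The differences between the candidates are small, which is in line with the later finding that the forecasts do not differ significantly.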
    17. 17. Comparison of forecasts <ul><li>To see whether the models differed in any important way for the aim of the analysis (forecasting), forecasts and forecast intervals were computed for the last 5 months to May 2006. </li></ul>
    18. 18. Comparison of forecasts <ul><li>ARI models were tested. </li></ul><ul><li>No significant differences were found between the two models and all forecast data were contained in the predicted confidence intervals. </li></ul>
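How forecasts and forecast intervals arise from an ARI(1, 1) model without intercept can be sketched as follows. The function `ari11_forecast` and every numeric input are hypothetical and are not the fitted Wildlist values. The interval uses the usual psi-weight formula for an integrated AR(1) process, with 1.96 as the normal multiplier for an approximate 95% interval.

```python
import math

def ari11_forecast(last_level, last_diff, phi, sigma2, horizon, z=1.96):
    """Forecast an ARIMA(1,1,0) series with no intercept.

    Returns (point, lower, upper) tuples for horizons 1..horizon.
    """
    out = []
    level = last_level
    diff = last_diff
    psi = 0.0  # psi-weight of the integrated process at the current step
    var = 0.0  # accumulated forecast-error variance
    for h in range(1, horizon + 1):
        diff = phi * diff          # forecast of the next difference
        level += diff              # integrate back to the level
        psi += phi ** (h - 1)      # psi_{h-1} = 1 + phi + ... + phi^(h-1)
        var += sigma2 * psi * psi
        half = z * math.sqrt(var)
        out.append((level, level - half, level + half))
    return out

# Hypothetical inputs on the log scale.
forecasts = ari11_forecast(last_level=10.0, last_diff=1.0, phi=0.5,
                           sigma2=0.04, horizon=3)

# A hold-out check counts how many withheld observations fall in the intervals.
held_out = [10.4, 10.9, 10.6]  # hypothetical observed log values
inside = sum(lo <= obs <= hi for obs, (_, lo, hi) in zip(held_out, forecasts))
print(forecasts, inside)
```

Because the models are fitted to logarithms, applying `math.exp` to the point forecast and to both limits returns them to the original count scale.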
    19. 19. Analysis of Virus Incidents <ul><li>The analysis is focused on the overall pattern of malware incidents reported monthly. A side comparison of the number of incidents which are attributable to the most prevalent malware varietals has also been undertaken. </li></ul>
    20. 21. Analysis of Virus Incidents <ul><li>It is clear from the plot of the two variables alone that the most prevalent malware varietals follow a similar pattern to the total number of incidents and that the two series are becoming more closely correlated over time. </li></ul><ul><li>This would indicate that individual computer viruses and worms are having a greater impact individually. </li></ul>
    21. 22. Analysis of Virus Incidents <ul><li>The trend is thus that fewer malicious code types are causing more damage. </li></ul><ul><li>In the past, a large number of virus types were generally active at any given time. </li></ul><ul><li>The trend is towards greater effects from specific malicious code samples. </li></ul>
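One way to put a number on two series becoming more closely correlated over time is a rolling Pearson correlation. The sketch below is an assumption about how such a check could be done, not the method used in the slides, and both data lists are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def rolling_correlation(xs, ys, window):
    """Correlation over each consecutive window of the given length."""
    return [
        pearson(xs[i:i + window], ys[i:i + window])
        for i in range(len(xs) - window + 1)
    ]

# Hypothetical monthly totals and incidents for the most prevalent malware.
total_incidents = [10, 12, 11, 15, 20, 26, 33, 41]
top_malware = [5, 3, 6, 2, 9, 13, 17, 22]

rolling = rolling_correlation(total_incidents, top_malware, window=4)
print([round(r, 3) for r in rolling])
```

A rolling value that climbs towards 1 would match the reading that a few prevalent varietals increasingly drive the total.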
    22. 23. ACF
    23. 24. PACF
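    24. 25. Model Comparison
    <table>
    <tr><th>Model</th><th>DF</th><th>Variance</th><th>AIC</th><th>SBC</th><th>RSquare</th><th>-2LogLH</th></tr>
    <tr><td>ARI(4, 1) No Intercept</td><td>129</td><td>0.5865218</td><td>-67.02293</td><td>-55.46153</td><td>0.904</td><td>-74.54214</td></tr>
    <tr><td>ARI(5, 1) No Intercept</td><td>128</td><td>0.5700881</td><td>-69.83768</td><td>-55.38593</td><td>0.908</td><td>-79.10179</td></tr>
    </table>
    25. 26. ARI (5, 1) Model Model: ARI (5, 1) Parameter Estimates
    <table>
    <tr><th>Term</th><th>Lag</th><th>Estimate</th><th>Std Error</th><th>t Ratio</th><th>Prob&gt;|t|</th></tr>
    <tr><td>AR1</td><td>1</td><td>-0.3886438</td><td>0.0850698</td><td>-4.57</td><td>&lt;.0001</td></tr>
    <tr><td>AR2</td><td>2</td><td>-0.2034253</td><td>0.0887335</td><td>-2.29</td><td>0.0235</td></tr>
    <tr><td>AR3</td><td>3</td><td>-0.272786</td><td>0.0883067</td><td>-3.09</td><td>0.0025</td></tr>
    <tr><td>AR4</td><td>4</td><td>-0.3610897</td><td>0.0965763</td><td>-3.74</td><td>0.0003</td></tr>
    <tr><td>AR5</td><td>5</td><td>-0.2103974</td><td>0.0973837</td><td>-2.16</td><td>0.0326</td></tr>
    </table>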
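Written out, the parameter estimates correspond to the fitted equation below. This is a sketch under two assumptions: that the software reports coefficients in the convention $\phi(B) = 1 - \phi_1 B - \dots - \phi_5 B^5$, so each estimate multiplies the matching lagged difference with its tabulated sign, and that $w_t$ is the first difference of the modelled series $x_t$. The symbols $x_t$, $w_t$ and $\varepsilon_t$ are introduced here for illustration, and the coefficients are rounded to four decimal places.

```latex
\begin{aligned}
w_t &= x_t - x_{t-1},\\
w_t &= -0.3886\,w_{t-1} - 0.2034\,w_{t-2} - 0.2728\,w_{t-3}
       - 0.3611\,w_{t-4} - 0.2104\,w_{t-5} + \varepsilon_t .
\end{aligned}
```

There is no constant term because the model is fitted without an intercept. Each t ratio is the estimate divided by its standard error; for AR1, $-0.3886438 / 0.0850698 \approx -4.57$.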
    26. 27. The residual plot of the ARI (5, 1) model for the fitted values vs. the actual values shows no recognisable pattern
    27. 28. Tests of the model <ul><li>The residual plot of the ARI (5, 1) model for the fitted values vs. the actual values shows no recognisable pattern. A Normal Q-Q plot of the residuals shows that the residuals are close to normal, though slightly skewed. </li></ul><ul><li>None of the values appears to be an extreme outlier, however, so none have been excluded. </li></ul>
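The two residual checks mentioned here, a Normal Q-Q comparison and a look at skewness, can be sketched with the Python standard library. The function names and the residual values are illustrative assumptions, not the residuals of the fitted ARI (5, 1) model.

```python
from statistics import NormalDist, mean, pstdev

def skewness(xs):
    """Sample skewness: mean cubed z-score (0 for symmetric data)."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def qq_pairs(xs):
    """Pairs of (theoretical normal quantile, standardised sample value)."""
    n = len(xs)
    m, s = mean(xs), pstdev(xs)
    std_normal = NormalDist()
    return [
        (std_normal.inv_cdf((i + 0.5) / n), (x - m) / s)
        for i, x in enumerate(sorted(xs))
    ]

# Hypothetical residuals with a mild right skew.
residuals = [-0.9, -0.6, -0.4, -0.3, -0.1, 0.0, 0.2, 0.4, 0.7, 1.6]

print(f"skewness = {skewness(residuals):.3f}")
for theoretical, observed in qq_pairs(residuals):
    print(f"{theoretical:7.3f} {observed:7.3f}")
```

Points lying close to the line where both values are equal indicate approximate normality, and a positive skewness value corresponds to a longer right tail.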
    28. 29. Prediction
    29. 30. The ARI (5, 1) model supports predictions for the 5-month period, with all the observed values falling within the confidence limits. [Figure: Forecast Values]
    30. 31. Findings <ul><li>The threat is not abating! </li></ul><ul><li>It also seems that the industry is not keeping up with the threat. </li></ul><ul><li>Further research should be conducted into why this is occurring, in order to assess future threat levels. </li></ul>
    31. 32. Where this can lead <ul><li>The results demonstrate that time series analysis is a valid method of predicting trends in malicious code incidents. </li></ul><ul><li>The results have applications to operational risk in general and further development of models and risk engines is warranted from the findings. </li></ul>
    32. 33. Further Research <ul><li>Further research into frequency domain analysis is expected to aid in the determination of patterns in past threat frequencies. </li></ul><ul><li>Analysis of vulnerability data using stochastic point-process models would also be expected to produce significant findings, giving more insight into the mechanistic nature of the time series and how it is affected by the changing nature and evolution of the malware varietals. </li></ul>
    33. 34. To Conclude <ul><li>It is feasible to use ARIMA models to forecast short-term malware trends. </li></ul><ul><li>The numbers of incidents are modelled and the incident data are input into the software package for future analysis. </li></ul><ul><li>Monthly trend patterns may be derived from statistical procedures. </li></ul>
    34. 35. Thank You <ul><li>Thank you for your time </li></ul>
    35. 36. Bibliography Or a day in the life of an academic junkie…
    <ul>
    <li>Berman (1992) “Sojourns and Extremes of Stochastic Processes”, Wadsworth.</li>
    <li>Box, P. & Jenkins, G. (1976) “Time-Series Analysis”, Rev. Ed., Holden-Day, US.</li>
    <li>Bridwell, L.M. & Tibbet, P. (2000) “Sixth annual ICSA Labs Computer Virus Prevalence Survey 2000”, ICSA Labs, US.</li>
    <li>Brillinger, David (1975) “Time Series: Data Analysis and Theory (context)” Priestley</li>
    <li>Brockwell, P.J. & Davis, R.A. (1991) “ITSM: An Interactive Time Series Modelling Package for the PC”, Springer-Verlag, New York.</li>
    <li>Brockwell, P.J. & Davis, R.A. (1991) “Time series: Theory and Methods”, Springer-Verlag.</li>
    <li>Brockwell, P.J. & Davis, R.A. (1996) “Introduction to Time Series and Forecasting”, Springer.</li>
    <li>Brown, Lawrence D. (2003) “Estimation and Prediction in a Random Effects Point-process Model Involving Autoregressive Terms”, Statistics Department, U. of Penn.</li>
    <li>Butler, S.A. (2001) “Improving Security Technology Selections with Decision Theory”, Emerald.</li>
    <li>Cox, D. R. & Isham, V. (1985) “Point Processes”, Chapman & Hall.</li>
    <li>Cox, D. & Miller, H. (1965) “The Theory of Stochastic Processes”, Chapman and Hall, London.</li>
    <li>Chatfield, C. (1996) “The Analysis of Time Series: An Introduction”, 5th Ed., Chapman and Hall.</li>
    <li>Chen, Z., Gao, L. & Kwiat, K. (2003) “Modeling the spread of active worms”, in IEEE INFOCOM.</li>
    <li>Coulthard, A. & Vuori, T. A. (2002) “Computer Viruses: a quantitative analysis”, Logistics Information Management, Volume 15, Number 5/96, pp. 400-409.</li>
    <li>Figueiredo, Daniel R., Liu, Benyuan, Misra, Vishal & Towsley, Don (200) “On the autocorrelation structure of TCP traffic”, Department of Computer Science, University of Massachusetts, Amherst, MA 01003-9264, USA, 2002 Elsevier Science B.V.</li>
    <li>Forgionne, G.A. (1999) “Management Science”, Wiley Custom Services, USA.</li>
    <li>Giles, K.E. (2004) “On the spectral analysis of backscatter data”, in GMP - Hawai 2004, URL:http://www.mts.jhu.edu/ priebe/FILES/-gmp hawaii04.pdf.</li>
    <li>Garetto, M., Gong, W. & Towsley, D. (2003) “Modeling Malware Spreading Dynamics”, in Proc. of INFOCOM 2003, San Francisco, April 2003.</li>
    <li>Harder, Uli, Johnson, Matt W., Bradley, Jeremy T. & Knottenbelt, William J. (200x) “Observing Internet Worm and Virus Attacks with a Small Network Telescope”, Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom, Electronic Notes in Theoretical Computer Science.</li>
    <li>Hipel, K. W. & McLeod, A. I. (1994) “Time Series Modelling of Water Resources and Environmental Systems”, Elsevier, Amsterdam.</li>
    <li>Kephart, J. O. & White, S. R. (1993) “Measuring and Modeling Computer Virus Prevalence”, Proc. of the 1993 IEEE Computer Society Symposium on Research in Security and Privacy, 2-15, May 1993.</li>
    <li>Leadbetter, M.R., Lindgren, G. & Rootzen, H. (1983) “Extremes and Related Properties of Random Sequences and Processes”, Springer, Berlin.</li>
    <li>Pouget, F., Dacier, M. & Pham, V.H. (200) “Understanding Threats: a Prerequisite to Enhance Survivability of Computing Systems”, Institut Eurécom, B.P. 193, 06904 Sophia Antipolis, France.</li>
    <li>Rohloff, K. & Basar, T. (2005) “Stochastic Behaviour of Random Constant Scanning Worms”, in Proc. of IEEE Conference on Computer Communications and Networks 2005 (ICCCN 2005), San Diego, CA, Oct. 2005.</li>
    <li>Spafford, Eugene (1989) “The Internet Worm: Crisis and Aftermath”, Communications of the ACM 32, 6, pp. 678-687, June 1989.</li>
    <li>Shumway, R. H. & Stoffer, D.S. (2000) “Time Series Analysis and its Applications”, Springer-Verlag, New York.</li>
    <li>Tong (1990) “Non-linear Time Series: A Dynamical Systems Approach”, Oxford Univ. Press.</li>
    <li>Valentino, Christopher C. (2003) “Smarter computer intrusion detection utilizing decision modelling”, Department of Information Systems, The University of Maryland, Baltimore County, Baltimore, MD, USA.</li>
    <li>Yegneswaran, V., Barford, P. & Ullrich, J. (2003) “Internet Intrusions: Global Characteristics and Prevalence”, SIGMETRICS 2003.</li>
    <li>Zou, C. C., Gong, W. & Towsley, D. (2003) “Worm propagation modelling and analysis under dynamic quarantine defense”, in ACM WORM 03, October 2003.</li>
    <li>Zou, C. C., Gong, W., Towsley, D. & Gao, L. (2005) “The Monitoring and Early Detection of Internet Worms”, IEEE/ACM Transactions on Networking, 13(5), 961-974, October 2005.</li>
    <li>Zou, C. C., Gong, W. & Towsley, D. (2003) “Monitoring and Early Warning for Internet Worms”, Umass ECE Technical Report TR-CSE-03-01, 2003.</li>
    <li>Zou, C. C., Gong, W. & Towsley, D. “On the Performance of Internet Worm Scanning Strategies”, to appear in Journal of Performance Evaluation.</li>
    </ul>