Data Works MD January 2021 - https://www.meetup.com/DataWorks/events/274890810/
Video - https://www.youtube.com/watch?v=Bwy9aPOdDPE
----------------------------------------
Malware Detection, Enabled by Machine Learning
As the volume of new malware created each year grows, along with the market opportunities for malware reuse, protecting systems can’t rely solely on downloading a vendor’s updated virus signature files. Our customers need ways to detect and cordon off likely threats, using data drawn from a combination of static and behavioral characteristics and comparing it against known classes of “good” versus “bad” files. Optimally, the solution cordons off risky files, force-ranks them according to their likelihood of causing harm, correlates some metadata to help with further learning and to provide context to analysts, and lets an analyst “release” a file after further analysis and a request from a user. That feedback is then relayed back into the model to support further tuning.
This talk will delve into ClearEdge’s IRAD efforts to build and integrate malware detectors using machine learning algorithms.
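The triage flow described above (cordon, force-rank, release, feed back) can be sketched in a few lines of Lua. This is only an illustration under stated assumptions: the record fields (name, score, released), the functions rankCordoned and release, and the "good" label are all hypothetical and do not come from the talk.

```lua
-- Hypothetical record shape (an assumption, not from the talk):
--   { name = <file name>, score = <likelihood of harm>, released = <boolean> }

-- Returns a new list of the files still cordoned, highest score first.
function rankCordoned(files)
    local ranked = {}
    for _, f in ipairs(files) do
        if not f.released then
            ranked[#ranked + 1] = f
        end
    end
    table.sort(ranked, function(a, b) return a.score > b.score end)
    return ranked
end

-- Marks a file as released by an analyst and records the verdict
-- as a label that could later be fed back into model tuning.
function release(file, feedback_log)
    file.released = true
    feedback_log[#feedback_log + 1] = { name = file.name, label = "good" }
end

-- Example: three cordoned files, one released after analyst review.
local files = {
    { name = "a.exe", score = 0.35 },
    { name = "b.dll", score = 0.92 },
    { name = "c.doc", score = 0.60 },
}
local feedback = {}
release(files[3], feedback)
local ranked = rankCordoned(files)  -- b.dll first, then a.exe
```

Keeping the analyst's verdict in a separate log, rather than only flipping the flag, is what makes the feedback available for later retraining.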
----------------------------------------
Tina Coleman is a Technical Director for ClearEdge. In that role, she is accountable for furthering the company’s depth in cybersecurity, particularly in areas that allow ClearEdge to build solutions that scale to customer needs using its strengths in software engineering, DevOps, and data science. In addition to her work on contract and as a Technical Director, Ms. Coleman leads the Women In Technology program for ClearEdge, which seeks to encourage the participation and retention of women in technology. Ms. Coleman graduated from UMBC with undergraduate degrees in Computer Science and Economics and is currently pursuing her Master’s in Cybersecurity Technology from University of Maryland Global Campus. Tina can be found on LinkedIn at https://www.linkedin.com/in/tinadcoleman/
function init(args)
    local needs = {}
    -- data needed to be able to do analysis: a response body
    needs["http.response_body"] = tostring(true)
    -- setup for data to be returned as part of alert, if matches
    needs["flowint"] = {"confidence"}
    needs["flowvar"] = {"model"}
    return needs
end
function match(args)
    local confidence_value = callMLModel(tostring(args["http.response_body"]))
    if confidence_value and confidence_value > 0 then
        ScFlowintSet(0, confidence_value)
        -- did match. Note that even low-confidence items will match here;
        -- we may want to tune that
        return 1
    else
        ScFlowintSet(0, 0)
        return 0 -- does not match
    end
end
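The comment in match notes that even low-confidence results fire the rule. One way to tune that is a minimum-confidence cutoff, sketched below. This is an assumption rather than part of the talk's script: MIN_CONFIDENCE and shouldAlert are hypothetical names, and the cutoff of 50 presumes the model returns a score on a 0 to 100 scale, which the source does not state.

```lua
-- Hypothetical cutoff; assumes scores on a 0-100 scale (not stated in the source).
local MIN_CONFIDENCE = 50

-- Returns 1 (match) only when the score is a number at or above the cutoff,
-- otherwise 0, mirroring the return convention used by match().
function shouldAlert(confidence_value, threshold)
    threshold = threshold or MIN_CONFIDENCE
    if type(confidence_value) == "number" and confidence_value >= threshold then
        return 1
    end
    return 0
end
```

Inside match, the test `confidence_value > 0` could then become `shouldAlert(confidence_value) == 1`, with the cutoff adjusted as analysts see how many low-scoring alerts they can handle.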
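-- LuaSocket modules used for the HTTP call
local http = require("socket.http")
local ltn12 = require("ltn12")

function callMLModel(file_contents)
    -- _M is assumed to be the multipart-post module table defined earlier in this script
    local mp = _M
    local respbody = {}
    SCLogInfo("Calling ML Model")
    -- build a multipart/form-data POST carrying the file contents
    local rq = mp.gen_request({file = {filename = "temp", data = file_contents}})
    rq.url = model_url -- model_url is expected to be defined elsewhere in the script
    rq.sink = ltn12.sink.table(respbody)
    http.request(rq)
    -- the sink may receive the body in several chunks, so join them before parsing
    return tonumber(table.concat(respbody))
end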
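callMLModel trusts whatever comes back from the model service. A slightly more defensive variant could check the outcome of the request before parsing it. In LuaSocket, the table form of http.request returns 1 and the HTTP status code on success, or nil and an error message on failure. The helper below is a hypothetical sketch built on that behavior; parseConfidence is not part of the talk's script, and it assumes the model replies with a plain numeric body.

```lua
-- Hypothetical helper: turns the outcome of http.request into a confidence
-- value, or nil when the call failed or the body is not a number.
--   ok     : first return value of http.request (nil on failure)
--   code   : HTTP status code (or the error message when ok is nil)
--   chunks : table filled by ltn12.sink.table
function parseConfidence(ok, code, chunks)
    if not ok or code ~= 200 then
        return nil
    end
    -- tonumber returns nil for non-numeric text such as an error page
    return tonumber(table.concat(chunks))
end
```

In callMLModel this would be used as `local ok, code = http.request(rq)` followed by `return parseConfidence(ok, code, respbody)`. Because match already treats a nil confidence as "does not match", a failed call then falls through to the no-alert branch.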