SlideShare a Scribd company logo
1 of 36
ML* vs Cryptocoin Miners
*Statistical Analysis
About me
• Jonn Callahan
• Principal Sec Consultant @ nVisium (DC-based)
• Python & Golang
• Math is neat, even if I’m terrible at it
• I’d liken myself to a mathematical skid, but I’m learning!
• Huge metalhead
• Credit @corsoa
Quick Intro
• Cryptocoin mining is a new (ish) path of monetizing popped boxes
• Monero (XMR) + Verium (VRM)
• Vast majority of pools utilize the Stratum protocol
• This is what we’ll actually be analyzing
• Network traffic is aggregated and exposed within AWS via VPC Flow
Logs
• Mining traffic probably exhibits some patterned behaviors
• So, is it possible to build a mining-specific IDS running against Flow
Logs?
Starting Data
• 24 hours of mining traffic from 12 different cryptocoin miners
• All Monero (XMR)
• All the same pool and port
• Varying c4 sizes (had the best $:hashrate at spot pricing)
• 3 weeks worth of non-mining traffic
• ~82,000 unique-by-IP, one-way streams (pre-transformation)
• ~38,000 unique-by-IP-pair, bi-directional streams (post-transformation)
• ~1,000 unique ENIs (virtual NICs)
VPC Flow Logs
• Aggregate 5-tuple network logs organized by ENI
• Aggregation occurs over 10-minute windows
• Two log entries for each single TCP stream
• version account-id interface-id srcaddr dstaddr srcport dstport
protocol packets bytes start end action log-status
• 2 178069999999 eni-55aa9999 91.189.89.199 172.12.12.12 3333
45113 6 3 760 1520029322 1520029358 ACCEPT OK
VPC Flow Logs Transformation
• Need to re-organization and correlate multiple flow log entries that
correspond to TCP streams from two services communicating
• Can filter out REJECT
• Focusing strictly on clients that are successfully mining
• Organize by IP and protocol (ignore port)
• Proto 6 == TCP
• Client ports can vary (ephemeral ports) but a remote mining server is unlikely
to receive anything other than mining traffic – safe to aggregate traffic by
unique IP pairs
• Count all unique IP addresses, use to assume which is the source (saves an
AWS API call)
Data Extraction
• Eight features available to us
• Number of bytes sent by src/dst
• Number of packets sent by src/dst
• Number of unique src/dst ports
• Length of src/dst communication
Strategy
• Graph data (false positives if iteration > 1)
• Eyeball patterns
• Build model
• Filter non-mining traffic
• Repeat until no more false positives
Attempt 0
• Quick and dirty check: number of unique dst ports
• Since our infected machine acts as a client communicating with a
remote mining pool, it’s safe to assume only a single remote port is
used
• But a variety of ephemeral src ports will likely be used
• Therefore, lets strip out streams with multiple dst ports
• Sadly, this strips out very few streams (~200 of our ~38k)
Attempt 1
• Lets graph two dimensions of our data and see how they correlate
• Specifically, lets focus on the number of bytes and packets sent
• Do they correlate?
• Is there clustering?
• Any discernable patterns?
Patterns
• Fairly strong clustering among most metric-pairs
• src pkts x src bytes has the strongest linear correlation
• Though all metric pairs look to correlate linearly to some degree
• A striation pattern emerges for src with enough points
• Also appears with src bytes x dst bytes, but with less linear correlation
• Not so much for dst, though maybe aggregating all mining data and zooming
in a bit will reveal something
• Surprisingly, points at (0, 0)
Build a Model
• Lets focus on the clustering
• Since we strong clustering, we can define the area that encompasses
all of our samples via a convex hull
• This methodology is conceptually similar to defining a hyperplane a la SVM,
for those familiar
Let’s filter!
• With a defined hull, we can now pipe the samples through
• If a set of traffic stream’s membership rate within our hull is under
our limit, strip it out
• For now, let’s pick an arbitrary rate of 98%
• Fairly mediocre results: ~38k -> ~34k
n-Dimensional Euclidean Magic
• The previous hull graphs aren’t really representative of what we’re
actually doing
• Since computing the convex hull of a point cloud relies on Euclidean
distances, we don’t need to limit ourselves to 2-dimensional hulls
(and 5 different sets of hulls, one for each metric pairing)
• Instead, let’s define a 4-dimensional hull encompassing:
• src bytes sent
• src packets sent
• dst bytes sent
• dst packets sent
Curse of Dimensionality
• As the number of dimensions increases, volume increases at such a
rate that points become sparse
• Restated: as dimensions increases linearly, amount of data needed
increases exponentially
• Typically only starts to occur with hundreds of dimensions, so
irrelevant to us
• But an important note nonetheless!
Results
• ~38k streams -> ~19k streams
• Good first attempt, but lets take a look at what got through
Attempt 2
• Many of the FPs looked like the previous graph
• Lots of points at (0,0,0,0)
• Since our test data populates far more than (0,0,0,0), lets strip those
out -> build a new hull -> re-filter
Et voilà!
• ~19k -> 4
• No, that’s not a typo. Four streams, not 4k.
So what next?
• Lots of other options for fingerprinting cluster shape
• Size of the convex hull interior
• Mean/median nearest-neighbor
• Spatial homogeneity within the convex hull
• Ripley K and L functions (expensive and accurate)
• Number of members within a set of concentric circles centered on the center point of
the convex hull (cheap and inaccurate)
• Quantifying the striations found in src pkts x src bytes pairing
• Timestamp analysis (bursts of traffic vs continuous)
• This would easily strip out our last four false positives
• Linear regression line
• Angle
• Mean/median pt distance to regression line
So what next?
• Optimization
• A perfect model is useless if it takes too long to run
• Compare time-to-execute of other strategies
So what next?
• Collect more mining data
• Pools with different Stratum configurations
• Share difficulty
• Reward systems (PPLNS, PPS, etc)
• Solo mining
• Unsuccessful/blocked miners
• REJECT traffic – miners that can’t reach their target pool/wallet
• Different coins
• Collect more non-mining data
• Now taking VPC Flow Log donations….
Tooling + Learning
• Python!
• numpy + scipy for all the mathematical black magic
• matplotlib for making the pretty charts (and some 2D topological
stuff)
• Algorithms of the Intelligent Web by Douglas McIlwraith et. al.
• (second edition)
If you’re feeling generous….
• I need lots and lots of VPC Flow Logs
• Looking for people to donate
• I plan to release OSS tooling – would be happy to let those who help
with this preview tooling in a sort-of closed beta
Thanks for listening!
• jonn.callahan@nvisium.com
• @jonn.callahan

More Related Content

Similar to OWASP Netherlands -- ML vs Cryptocoin Miners

Blockchain Technology Introduction and Basics
Blockchain Technology  Introduction and BasicsBlockchain Technology  Introduction and Basics
Blockchain Technology Introduction and Basicsjayasris2023
 
Nearest Neighbor Customer Insight
Nearest Neighbor Customer InsightNearest Neighbor Customer Insight
Nearest Neighbor Customer InsightMapR Technologies
 
Cache aware hybrid sorter
Cache aware hybrid sorterCache aware hybrid sorter
Cache aware hybrid sorterManchor Ko
 
24-ad-hoc.ppt
24-ad-hoc.ppt24-ad-hoc.ppt
24-ad-hoc.pptsumadi26
 
Oxford 05-oct-2012
Oxford 05-oct-2012Oxford 05-oct-2012
Oxford 05-oct-2012Ted Dunning
 
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...NoSQLmatters
 
Decima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero DawnDecima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero DawnGuerrilla
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford MapR Technologies
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownTed Dunning
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015Ted Dunning
 
Dealing with an Upside Down Internet With High Performance Time Series Database
Dealing with an Upside Down Internet  With High Performance Time Series DatabaseDealing with an Upside Down Internet  With High Performance Time Series Database
Dealing with an Upside Down Internet With High Performance Time Series DatabaseDataWorks Summit
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
8. TDM Mux_Demux.pdf
8. TDM Mux_Demux.pdf8. TDM Mux_Demux.pdf
8. TDM Mux_Demux.pdfTabrezahmed39
 
Data Stream Management
Data Stream ManagementData Stream Management
Data Stream Managementk_tauhid
 

Similar to OWASP Netherlands -- ML vs Cryptocoin Miners (20)

Blockchain Technology Introduction and Basics
Blockchain Technology  Introduction and BasicsBlockchain Technology  Introduction and Basics
Blockchain Technology Introduction and Basics
 
Nearest Neighbor Customer Insight
Nearest Neighbor Customer InsightNearest Neighbor Customer Insight
Nearest Neighbor Customer Insight
 
Raptor codes
Raptor codesRaptor codes
Raptor codes
 
Cache aware hybrid sorter
Cache aware hybrid sorterCache aware hybrid sorter
Cache aware hybrid sorter
 
ACM 2013-02-25
ACM 2013-02-25ACM 2013-02-25
ACM 2013-02-25
 
24-ad-hoc.ppt
24-ad-hoc.ppt24-ad-hoc.ppt
24-ad-hoc.ppt
 
Oxford 05-oct-2012
Oxford 05-oct-2012Oxford 05-oct-2012
Oxford 05-oct-2012
 
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
 
Decima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero DawnDecima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero Dawn
 
Jvm memory model
Jvm memory modelJvm memory model
Jvm memory model
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford
 
Clustering - ACM 2013 02-25
Clustering - ACM 2013 02-25Clustering - ACM 2013 02-25
Clustering - ACM 2013 02-25
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015
 
Dealing with an Upside Down Internet With High Performance Time Series Database
Dealing with an Upside Down Internet  With High Performance Time Series DatabaseDealing with an Upside Down Internet  With High Performance Time Series Database
Dealing with an Upside Down Internet With High Performance Time Series Database
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Modern Cryptography
Modern CryptographyModern Cryptography
Modern Cryptography
 
8. TDM Mux_Demux.pdf
8. TDM Mux_Demux.pdf8. TDM Mux_Demux.pdf
8. TDM Mux_Demux.pdf
 
lecture04.ppt
lecture04.pptlecture04.ppt
lecture04.ppt
 
Data Stream Management
Data Stream ManagementData Stream Management
Data Stream Management
 

Recently uploaded

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 

OWASP Netherlands -- ML vs Cryptocoin Miners

  • 1. ML* vs Cryptocoin Miners *Statistical Analysis
  • 2. About me • Jonn Callahan • Principal Sec Consultant @ nVisium (DC-based) • Python & Golang • Math is neat, even if I’m terrible at it • I’d liken myself to a mathematical skid, but I’m learning! • Huge metalhead
  • 4. Quick Intro • Cryptocoin mining is a new (ish) path of monetizing popped boxes • Monero (XMR) + Verium (VRM) • Vast majority of pools utilize the Stratum protocol • This is what we’ll actually be analyzing • Network traffic is aggregated and exposed within AWS via VPC Flow Logs • Mining traffic probably exhibits some patterned behaviors • So, is it possible to build a mining-specific IDS running against Flow Logs?
  • 5. Starting Data • 24 hours of mining traffic from 12 different cryptocoin miners • All Monero (XMR) • All the same pool and port • Varying c4 sizes (had the best $:hashrate at spot pricing) • 3 weeks worth of non-mining traffic • ~82,000 unique-by-IP, one-way streams (pre-transformation) • ~38,000 unique-by-IP-pair, bi-directional streams (post-transformation) • ~1,000 unique ENIs (virtual NICs)
  • 6. VPC Flow Logs • Aggregate 5-tuple network logs organized by ENI • Aggregation occurs over 10-minute windows • Two log entries for each single TCP stream • version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status • 2 178069999999 eni-55aa9999 91.189.89.199 172.12.12.12 3333 45113 6 3 760 1520029322 1520029358 ACCEPT OK
  • 7. VPC Flow Logs Transformation • Need to re-organization and correlate multiple flow log entries that correspond to TCP streams from two services communicating • Can filter out REJECT • Focusing strictly on clients that are successfully mining • Organize by IP and protocol (ignore port) • Proto 6 == TCP • Client ports can vary (ephemeral ports) but a remote mining server is unlikely to receive anything other than mining traffic – safe to aggregate traffic by unique IP pairs • Count all unique IP addresses, use to assume which is the source (saves an AWS API call)
  • 8. Data Extraction • Eight features available to us • Number of bytes sent by src/dst • Number of packets sent by src/dst • Number of unique src/dst ports • Length of src/dst communication
  • 9. Strategy • Graph data (false positives if iteration > 1) • Eyeball patterns • Build model • Filter non-mining traffic • Repeat until no more false positives
  • 10. Attempt 0 • Quick and dirty check: number of unique dst ports • Since our infected machine acts as a client communicating with a remote mining pool, it’s safe to assume only a single remote port is used • But a variety of ephemeral src ports will likely be used • Therefore, lets strip out streams with multiple dst ports • Sadly, this strips out very few streams (~200 of our ~38k)
  • 11. Attempt 1 • Lets graph two dimensions of our data and see how they correlate • Specifically, lets focus on the number of bytes and packets sent • Do they correlate? • Is there clustering? • Any discernable patterns?
  • 12.
  • 13.
  • 14.
  • 15.
  • 16. Patterns • Fairly strong clustering among most metric-pairs • src pkts x src bytes has the strongest linear correlation • Though all metric pairs look to correlate linearly to some degree • A striation pattern emerges for src with enough points • Also appears with src bytes x dst bytes, but with less linear correlation • Not so much for dst, though maybe aggregating all mining data and zooming in a bit will reveal something • Surprisingly, points at (0, 0)
  • 17. Build a Model • Lets focus on the clustering • Since we strong clustering, we can define the area that encompasses all of our samples via a convex hull • This methodology is conceptually similar to defining a hyperplane a la SVM, for those familiar
  • 18.
  • 19.
  • 20.
  • 21. Let’s filter! • With a defined hull, we can now pipe the samples through • If a set of traffic stream’s membership rate within our hull is under our limit, strip it out • For now, let’s pick an arbitrary rate of 98% • Fairly mediocre results: ~38k -> ~34k
  • 22. n-Dimensional Euclidean Magic • The previous hull graphs aren’t really representative of what we’re actually doing • Since computing the convex hull of a point cloud relies on Euclidean distances, we don’t need to limit ourselves to 2-dimensional hulls (and 5 different sets of hulls, one for each metric pairing) • Instead, let’s define a 4-dimensional hull encompassing: • src bytes sent • src packets sent • dst bytes sent • dst packets sent
  • 23. Curse of Dimensionality • As the number of dimensions increases, volume increases at such a rate that points become sparse • Restated: as dimensions increases linearly, amount of data needed increases exponentially • Typically only starts to occur with hundreds of dimensions, so irrelevant to us • But an important note nonetheless!
  • 24.
  • 25. Results • ~38k streams -> ~19k streams • Good first attempt, but lets take a look at what got through
  • 26.
  • 27. Attempt 2 • Many of the FPs looked like the previous graph • Lots of points at (0,0,0,0) • Since our test data populates far more than (0,0,0,0), lets strip those out -> build a new hull -> re-filter
  • 28.
  • 29. Et voilà! • ~19k -> 4 • No, that’s not a typo. Four streams, not 4k.
  • 30.
  • 31. So what next? • Lots of other options for fingerprinting cluster shape • Size of the convex hull interior • Mean/median nearest-neighbor • Spatial homogeneity within the convex hull • Ripley K and L functions (expensive and accurate) • Number of members within a set of concentric circles centered on the center point of the convex hull (cheap and inaccurate) • Quantifying the striations found in src pkts x src bytes pairing • Timestamp analysis (bursts of traffic vs continuous) • This would easily strip out our last four false positives • Linear regression line • Angle • Mean/median pt distance to regression line
  • 32. So what next? • Optimization • A perfect model is useless if it takes too long to run • Compare time-to-execute of other strategies
  • 33. So what next? • Collect more mining data • Pools with different Stratum configurations • Share difficulty • Reward systems (PPLNS, PPS, etc) • Solo mining • Unsuccessful/blocked miners • REJECT traffic – miners that can’t reach their target pool/wallet • Different coins • Collect more non-mining data • Now taking VPC Flow Log donations….
  • 34. Tooling + Learning • Python! • numpy + scipy for all the mathematical black magic • matplotlib for making the pretty charts (and some 2D topological stuff) • Algorithms of the Intelligent Web by Douglas McIlwraith et. al. • (second edition)
  • 35. If you’re feeling generous…. • I need lots and lots of VPC Flow Logs • Looking for people to donate • I plan to release OSS tooling – would be happy to let those who help with this preview tooling in a sort-of closed beta
  • 36. Thanks for listening! • jonn.callahan@nvisium.com • @jonn.callahan