SHOBEVODSDT: SHODAN AND BINARY EDGE BASED
VULNERABLE OPEN DATA SOURCES DETECTION TOOL
OR
WHAT INTERNET OF THINGS SEARCH ENGINES KNOW
ABOUT YOU
The International Conference on Intelligent Data Science Technologies and Applications (IDSTA2021)
November 15-16, 2021. Tartu, Estonia (web-based)
Artjoms Daskevics, Anastasija Nikiforova
“Innovative Information Technologies” Laboratory, Programming Department
Faculty of Computing, University of Latvia
AIM
To propose an OSINT-based (Open Source Intelligence) tool for non-intrusive testing of open data sources, inspecting their
vulnerabilities and their extent:
is the data source visible outside the organization?
what data can be gathered from open data sources (if any), and what is their “value” for attackers and fraudsters?
can these data pose risks to the organization if they are used to deploy an attack?
This allows either a comprehensive analysis of unprotected data sources falling into a list of predefined data sources, or an
examination of a specific IP or IP range to see what is visible from outside the organization about the data source in use.
The use of Open Source Intelligence (OSINT) tools, more precisely Internet of Things Search Engines (IoTSE), should
allow the tool to inspect a list of predefined data sources for vulnerabilities and their extent.
ShoBeVODSDT
Shodan- and Binary Edge- based vulnerable open data sources detection tool
ShoBeVODSDT
ShoBeVODSDT mainly uses passive assessment (non-intrusive testing), which is characterized by its
low level of intrusiveness;
the data sources concerned are not thoroughly and actively tested;
the tool refers to the most likely and potentially existing bottlenecks or weaknesses which could be revealed and
exposed if the fourth stage of penetration testing, namely the attack, took place.
ShoBeVODSDT SCOPE
What will be inspected?
8 types of data sources: MySQL, PostgreSQL, MongoDB, Redis, Elasticsearch,
CouchDB, Cassandra and Memcached.
Three types of sources:
relational databases,
NoSQL databases (document-oriented, column-oriented and key-value),
data stores.
How will it be inspected?
With OSINT tools or, more precisely, the Internet of Things (IoT) search engines (IoTSE)
Shodan and BinaryEdge, which search for and index publicly available and accessible open data sources.
DATA SOURCES, THEIR MODELS AND CONNECTION DATA
Database | Primary database model | Connection data | Default port
MySQL | Relational DBMS | IP address, port, username, password | 3306
PostgreSQL | Relational DBMS | IP address, port, authentication data (if it supports connection with a password) | 5432
MongoDB | Document store | IP address, port, username, password | 27017
Redis | Key-value store | IP address, port, authentication data (if access control is enabled) | 6379
Elasticsearch | Search engine | IP address, port | 9200
CouchDB | Document store | IP address, port, authentication data (if anonymous access is not enabled) | 5984
Cassandra | Wide-column store | IP address, port, authentication data | 9160
Memcached | Key-value store | IP address, port | 11211
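The data sources and their default ports can be kept in a small lookup structure. The sketch below is an illustration only: the `DataSource` class, the `DATA_SOURCES` registry and the `default_port` helper are hypothetical names, not taken from the tool's code. The port numbers are the well-known defaults of each product; for Cassandra the value listed on the slide (9160, the legacy Thrift client port) is kept.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataSource:
    """One inspected data source type (hypothetical helper class)."""
    name: str
    model: str
    default_port: int


# Registry keyed by lower-case service name.
DATA_SOURCES: Dict[str, DataSource] = {
    source.name.lower(): source
    for source in (
        DataSource("MySQL", "Relational DBMS", 3306),
        DataSource("PostgreSQL", "Relational DBMS", 5432),
        DataSource("MongoDB", "Document store", 27017),
        DataSource("Redis", "Key-value store", 6379),
        DataSource("Elasticsearch", "Search engine", 9200),
        DataSource("CouchDB", "Document store", 5984),
        # 9160 as listed on the slide (legacy Thrift client port).
        DataSource("Cassandra", "Wide-column store", 9160),
        DataSource("Memcached", "Key-value store", 11211),
    )
}


def default_port(service: str) -> int:
    """Return the default port for a supported service name."""
    try:
        return DATA_SOURCES[service.strip().lower()].default_port
    except KeyError:
        raise ValueError(f"unsupported data source: {service!r}") from None
```

A lookup such as `default_port("Redis")` could then feed the port filter of a search engine query or the connection attempt in the check step.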
ShoBeVODSDT ACTION
Step I
IP address search (gather)
uses BinaryEdge and Shodan libraries to find
service IP addresses that belong to a user-defined
country;
combines results from BinaryEdge and Shodan
by eliminating duplicates;
saves results in
“parsed/<service_name>_<country>.txt”.
Step II
IP address check
searches for files in a “checked” folder that correspond to
the service and country being checked;
opens the file and checks each IP address using the “check”
class method associated with the service;
if the connection has been successful, the IP address is
stored in “good/<service_name>_<country>.txt”; if it failed,
the IP address and error information are stored in
“bad/<service_name>_<country>.txt”.
Step III
Retrieving information from an IP address (parse)
searches for files in a “parsed/good” folder that correspond to the
service and country to be checked;
opens the file and tries to reconnect. If the connection was successful,
tries to download the information from the database. For each type of
database, the procedure is different;
saves the information in the “parsed” folder as “<IP_ADDRESS>.txt”.
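The gather and check steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: the function names (`merge_results`, `result_path`, `save_lines`, `check_all`) are hypothetical, the search engine results are passed in as plain lists of IP strings, and the `check` callable stands in for the service-specific check method, returning `None` on success or an error message on failure. Only the “good” and “bad” file naming follows the slide.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def merge_results(*sources: Iterable[str]) -> List[str]:
    """Combine IP lists from several engines, dropping duplicates
    while keeping the order of first appearance."""
    seen = set()
    merged: List[str] = []
    for source in sources:
        for ip in source:
            ip = ip.strip()
            if ip and ip not in seen:
                seen.add(ip)
                merged.append(ip)
    return merged


def result_path(base: Path, folder: str, service: str, country: str) -> Path:
    """Build <base>/<folder>/<service>_<country>.txt."""
    return base / folder / f"{service}_{country}.txt"


def save_lines(path: Path, lines: Iterable[str]) -> None:
    """Write one entry per line, creating the folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(f"{line}\n" for line in lines), encoding="utf-8")


def check_all(
    base: Path,
    service: str,
    country: str,
    ips: Iterable[str],
    check: Callable[[str], Optional[str]],
) -> List[str]:
    """Run `check` on every IP; store successes in good/ and
    failures (IP plus error text) in bad/. Returns the good IPs."""
    good: List[str] = []
    bad: List[str] = []
    for ip in ips:
        error = check(ip)
        if error is None:
            good.append(ip)
        else:
            bad.append(f"{ip}\t{error}")
    save_lines(result_path(base, "good", service, country), good)
    save_lines(result_path(base, "bad", service, country), bad)
    return good
```

Keeping the deduplication order-preserving makes the output files stable between runs, which simplifies comparing results gathered on different dates.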
TOOL ARCHITECTURE
The search class includes a class constructor, where a Shodan or
Binary Edge client is initialized using a valid API key, and a
search method to obtain data from Shodan or Binary Edge*.
*In the case of Binary Edge, a page number to search for IP addresses should
also be provided.
The service class includes a class constructor, where a separate
service client tries to establish a new connection, and two
functions:
(1) “check”, which returns an error if the connection was
unsuccessful or “true” if it was successful;
(2) “parse”, which attempts to download all information
from the database.
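The two class roles described above might look roughly like this. The sketch is an assumption-laden outline, not the published implementation: the abstract base classes `Search` and `Service`, the hooks `_make_client`, `_connect` and `_dump`, and the offline stand-ins `FakeSearch` and `FakeService` are all hypothetical names, and no real Shodan or BinaryEdge client is called. For simplicity the connection is opened inside `check` and `parse` rather than in the constructor.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Union


class Search(ABC):
    """Wraps one IoT search engine client initialized with an API key."""

    def __init__(self, api_key: str) -> None:
        if not api_key:
            raise ValueError("a valid API key is required")
        self.client = self._make_client(api_key)

    @abstractmethod
    def _make_client(self, api_key: str) -> object:
        """Create the engine-specific client (hypothetical hook)."""

    @abstractmethod
    def search(self, service: str, country: str, page: int = 1) -> List[str]:
        """Return IP addresses of `service` hosts in `country`.
        `page` matters only for engines that paginate results."""


class Service(ABC):
    """Wraps one data source type reachable at ip:port."""

    def __init__(self, ip: str, port: int) -> None:
        self.ip = ip
        self.port = port

    @abstractmethod
    def _connect(self) -> object:
        """Open a connection with the service client; raise on failure."""

    @abstractmethod
    def _dump(self, connection: object) -> Dict[str, object]:
        """Download whatever the service exposes over `connection`."""

    def check(self) -> Union[bool, str]:
        """Return True if the connection succeeds, else the error text."""
        try:
            self._connect()
        except Exception as exc:
            return f"{type(exc).__name__}: {exc}"
        return True

    def parse(self) -> Dict[str, object]:
        """Reconnect and retrieve the available information."""
        return self._dump(self._connect())


class FakeSearch(Search):
    """Offline stand-in: the 'client' is a dict of IP -> country code."""

    def _make_client(self, api_key: str) -> object:
        return {"192.0.2.10": "LV", "192.0.2.11": "EE"}

    def search(self, service: str, country: str, page: int = 1) -> List[str]:
        return [ip for ip, cc in self.client.items() if cc == country]


class FakeService(Service):
    """Offline stand-in: only hosts listed here accept connections."""

    OPEN_HOSTS = {"192.0.2.10": {"keys": 3}}

    def _connect(self) -> object:
        if self.ip not in self.OPEN_HOSTS:
            raise ConnectionRefusedError(f"{self.ip}:{self.port} refused")
        return self.OPEN_HOSTS[self.ip]

    def _dump(self, connection: object) -> Dict[str, object]:
        return dict(connection)
```

With this split, supporting a further data source would mean adding one `Service` subclass, and supporting a further search engine would mean adding one `Search` subclass.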
ShoBeVODSDT IN ACTION
Use-case - data on Latvia, Estonia and Lithuania (Baltic States)
15180 IP addresses were processed,
Lithuania (7453)
Estonia (5352)
Latvia (2375)
98.43% of the addresses have failed to connect
Category | Description
0 | failed to connect
1 | managed to connect but failed to gather data or information
2 | managed to connect, but the database is empty
3 | managed to connect and gathered system data or non-sensitive information
4 | managed to connect and gathered sensitive data
5 | compromised database
Further actions took place with only 1.57%, or 93 IP addresses.
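The six categories map naturally onto an enumeration, and the share of each category is a simple count divided by the total. The sketch below is illustrative only: the `Category` member names and the `category_shares` helper are hypothetical, and any input passed to it here is made-up sample data, not the study's results.

```python
from collections import Counter
from enum import IntEnum
from typing import Dict, Iterable


class Category(IntEnum):
    """Outcome categories 0 to 5 as defined in the table above."""
    FAILED_TO_CONNECT = 0
    CONNECTED_NO_DATA = 1
    CONNECTED_EMPTY = 2
    NON_SENSITIVE_DATA = 3
    SENSITIVE_DATA = 4
    COMPROMISED = 5


def category_shares(categories: Iterable[Category]) -> Dict[Category, float]:
    """Return the percentage of inspected addresses per category,
    skipping categories that did not occur."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        cat: 100.0 * counts[cat] / total
        for cat in Category
        if counts[cat]
    }
```

For example, three failed connections and one empty database would yield 75% in category 0 and 25% in category 2.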
ShoBeVODSDT IN ACTION
Categories “2” and “3” are the most common, which is a good sign: while these
data sources are open, their data are not of very high importance to
attackers and fraudsters, although they can facilitate attacks;
8% of data sources contain data that could be used by attackers;
12% of them have already been compromised;
most empty and compromised databases belong to Elasticsearch;
most databases that store sensitive data belong to Memcached, but it also
leads in databases where sensitive data are not stored (category “3”).
Memcached and Elasticsearch have the highest number of open data sources
with higher “value” of data gathered from them in almost all categories; the exceptions are
the relatively poor results of MongoDB for the number of
compromised databases and of Redis for data sources storing sensitive data.
FUTURE WORKS
The list of IoTSE used may be extended to other well-known search engines such as Censys, ZoomEye etc. to allow a more extensive
investigation and to determine whether the number of IoTSE has an impact on the results.
Similarly, the set of data sources can be supplemented by other data sources identified as the most popular, especially given that
Oracle and MS SQL are sometimes found to have the highest number of vulnerabilities.
Although our aim was to propose a tool for investigating databases only, further studies may also cover other “types of devices”,
such as Network Equipment, Terminal, Server, Office Equipment, Industrial Control Equipment, Smart Home, Power Supply
Equipment, Web Camera, Remote Management Equipment, Blockchain and industrial connected devices in the cloud.
At the moment, the future study aims to apply the tool to the specific countries of Latvia, Lithuania and Estonia and to carry out
an extensive investigation of the current state of data sources and their security. This will allow conclusions to be drawn on differences
in country patterns, i.e. whether the technological development of Estonia will also be seen in this matter. It will also allow more objective
conclusions on the data sources that are less protected by design.
RESULTS AND CONCLUSIONS I
The paper proposes a tool called ShoBeVODSDT - Shodan- and Binary Edge- based vulnerable open data sources
detection tool, for non-intrusive testing of open data sources for detecting their vulnerabilities. ShoBeVODSDT:
supports the identification of vulnerabilities at early security assessment stages and does not require the
implementation of active and possibly disruptive techniques;
uses two IoTSE (Shodan and Binary Edge), extending their features with the advanced capabilities built
into it;
allows inspecting 8 predefined data sources (MySQL, PostgreSQL, MongoDB, Redis, Elasticsearch,
CouchDB, Cassandra and Memcached) for vulnerabilities and their extent.
While the tool covers 8 data sources representing relational databases, NoSQL databases and data stores, it is
designed to be easily extended by building on the publicly available code: https://github.com/zhmyh/ShoBEVODST
https://www.eosc-hub.eu/open-science-info
RESULTS AND CONCLUSIONS II
The total number of open data sources available to everyone (who wants to access them) is not very high, i.e. less than 2% of
the data sources scanned.
BUT there are data sources that may pose risks to organizations, since external users can access information that can be
used for further attacks. For 12% of inspected data sources this has already taken place.
Security features built into the database can protect against unauthorized access, but there are databases with weak
security features, where we were able to connect to nearly all IP addresses and retrieve information from them. Moreover, in
some cases the databases that do not use security mechanisms have already been compromised.
THANK YOU FOR
ATTENTION!
QUESTIONS?
For more information, see ResearchGate
See also anastasijanikiforova.com
For questions or any other queries, contact
me via email - Anastasija.Nikiforova@lu.lv

More Related Content

What's hot

Final review m score
Final review m scoreFinal review m score
Final review m score
azhar4010
 

What's hot (20)

Konrad cedem praesi
Konrad cedem praesiKonrad cedem praesi
Konrad cedem praesi
 
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at ScaleFull Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
 
McGeary Data Curation Network: Developing and Scaling
McGeary Data Curation Network: Developing and ScalingMcGeary Data Curation Network: Developing and Scaling
McGeary Data Curation Network: Developing and Scaling
 
DLD_SYNOPSIS
DLD_SYNOPSISDLD_SYNOPSIS
DLD_SYNOPSIS
 
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
 
Sanderson Shout It Out: LOUD
Sanderson Shout It Out: LOUDSanderson Shout It Out: LOUD
Sanderson Shout It Out: LOUD
 
Final review m score
Final review m scoreFinal review m score
Final review m score
 
Sub1555
Sub1555Sub1555
Sub1555
 
Semantics and linked data at astra zeneca
Semantics and linked data at astra zenecaSemantics and linked data at astra zeneca
Semantics and linked data at astra zeneca
 
Keystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joinedKeystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joined
 
Managing, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital EnvironmentManaging, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital Environment
 
dkNET ESP Meeting - February 2016
dkNET ESP Meeting - February 2016dkNET ESP Meeting - February 2016
dkNET ESP Meeting - February 2016
 
Implementation of Matching Tree Technique for Online Record Linkage
Implementation of Matching Tree Technique for Online Record LinkageImplementation of Matching Tree Technique for Online Record Linkage
Implementation of Matching Tree Technique for Online Record Linkage
 
Prov-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance VisualizationProv-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance Visualization
 
Introduction to linked data
Introduction to linked dataIntroduction to linked data
Introduction to linked data
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the Web
 
Tutorial Data Management and workflows
Tutorial Data Management and workflowsTutorial Data Management and workflows
Tutorial Data Management and workflows
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
Exploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesExploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sources
 
Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...
 

Similar to ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detection tool or what Internet of Things Search Engines know about you

Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdf
kalai75
 

Similar to ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detection tool or what Internet of Things Search Engines know about you (20)

BrightTALK - Semantic AI
BrightTALK - Semantic AI BrightTALK - Semantic AI
BrightTALK - Semantic AI
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Web Investigation
Web InvestigationWeb Investigation
Web Investigation
 
Top 10 data science technologies
Top 10 data science technologiesTop 10 data science technologies
Top 10 data science technologies
 
Database Management in Different Applications of IOT
Database Management in Different Applications of IOTDatabase Management in Different Applications of IOT
Database Management in Different Applications of IOT
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication Repositories
 
Ultralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC EdgeUltralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC Edge
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
 
The LOD Gateway: Open Source Infrastructure for Linked Data
The LOD Gateway: Open Source Infrastructure for Linked DataThe LOD Gateway: Open Source Infrastructure for Linked Data
The LOD Gateway: Open Source Infrastructure for Linked Data
 
Building big data solutions on azure
Building big data solutions on azureBuilding big data solutions on azure
Building big data solutions on azure
 
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdf
 
Elasticsearch, a distributed search engine with real-time analytics
Elasticsearch, a distributed search engine with real-time analyticsElasticsearch, a distributed search engine with real-time analytics
Elasticsearch, a distributed search engine with real-time analytics
 
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
EUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan BroederEUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan Broeder
 
JAVA 2013 IEEE NETWORKSECURITY PROJECT Utility privacy tradeoff in databases ...
JAVA 2013 IEEE NETWORKSECURITY PROJECT Utility privacy tradeoff in databases ...JAVA 2013 IEEE NETWORKSECURITY PROJECT Utility privacy tradeoff in databases ...
JAVA 2013 IEEE NETWORKSECURITY PROJECT Utility privacy tradeoff in databases ...
 
Utility privacy tradeoff in databases an information-theoretic approach
Utility privacy tradeoff in databases an information-theoretic approachUtility privacy tradeoff in databases an information-theoretic approach
Utility privacy tradeoff in databases an information-theoretic approach
 

More from Anastasija Nikiforova

Data Quality for AI or AI for Data quality: advances in Data Quality Manageme...
Data Quality for AI or AI for Data quality: advances in Data Quality Manageme...Data Quality for AI or AI for Data quality: advances in Data Quality Manageme...
Data Quality for AI or AI for Data quality: advances in Data Quality Manageme...
Anastasija Nikiforova
 
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Anastasija Nikiforova
 
Putting FAIR Principles in the Context of Research Information: FAIRness for ...
Putting FAIR Principles in the Context of Research Information: FAIRness for ...Putting FAIR Principles in the Context of Research Information: FAIRness for ...
Putting FAIR Principles in the Context of Research Information: FAIRness for ...
Anastasija Nikiforova
 
Open data hackathon as a tool for increased engagement of Generation Z: to h...
Open data hackathon as a tool for increased engagement of Generation Z:  to h...Open data hackathon as a tool for increased engagement of Generation Z:  to h...
Open data hackathon as a tool for increased engagement of Generation Z: to h...
Anastasija Nikiforova
 
Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Inno...
Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Inno...Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Inno...
Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Inno...
Anastasija Nikiforova
 
Data security as a top priority in the digital world: preserve data value by ...
Data security as a top priority in the digital world: preserve data value by ...Data security as a top priority in the digital world: preserve data value by ...
Data security as a top priority in the digital world: preserve data value by ...
Anastasija Nikiforova
 
Atvērto datu potenciāls
Atvērto datu potenciālsAtvērto datu potenciāls
Atvērto datu potenciāls
Anastasija Nikiforova
 
TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...
TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...
TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...
Anastasija Nikiforova
 

More from Anastasija Nikiforova (20)

Data Quality for AI or AI for Data quality: advances in Data Quality Manageme...
Data Quality for AI or AI for Data quality: advances in Data Quality Manageme...Data Quality for AI or AI for Data quality: advances in Data Quality Manageme...
Data Quality for AI or AI for Data quality: advances in Data Quality Manageme...
 
Towards High-Value Datasets determination for data-driven development: a syst...
Towards High-Value Datasets determination for data-driven development: a syst...Towards High-Value Datasets determination for data-driven development: a syst...
Towards High-Value Datasets determination for data-driven development: a syst...
 
Public data ecosystems in and for smart cities: how to make open / Big / smar...
Public data ecosystems in and for smart cities: how to make open / Big / smar...Public data ecosystems in and for smart cities: how to make open / Big / smar...
Public data ecosystems in and for smart cities: how to make open / Big / smar...
 
Artificial Intelligence for open data or open data for artificial intelligence?
Artificial Intelligence for open data or open data for artificial intelligence?Artificial Intelligence for open data or open data for artificial intelligence?
Artificial Intelligence for open data or open data for artificial intelligence?
 
Overlooked aspects of data governance: workflow framework for enterprise data...
Overlooked aspects of data governance: workflow framework for enterprise data...Overlooked aspects of data governance: workflow framework for enterprise data...
Overlooked aspects of data governance: workflow framework for enterprise data...
 
Data Quality as a prerequisite for you business success: when should I start ...
Data Quality as a prerequisite for you business success: when should I start ...Data Quality as a prerequisite for you business success: when should I start ...
Data Quality as a prerequisite for you business success: when should I start ...
 
Framework for understanding quantum computing use cases from a multidisciplin...
Framework for understanding quantum computing use cases from a multidisciplin...Framework for understanding quantum computing use cases from a multidisciplin...
Framework for understanding quantum computing use cases from a multidisciplin...
 
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
 
Putting FAIR Principles in the Context of Research Information: FAIRness for ...
Putting FAIR Principles in the Context of Research Information: FAIRness for ...Putting FAIR Principles in the Context of Research Information: FAIRness for ...
Putting FAIR Principles in the Context of Research Information: FAIRness for ...
 
Open data hackathon as a tool for increased engagement of Generation Z: to h...
Open data hackathon as a tool for increased engagement of Generation Z:  to h...Open data hackathon as a tool for increased engagement of Generation Z:  to h...
Open data hackathon as a tool for increased engagement of Generation Z: to h...
 
Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Inno...
Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Inno...Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Inno...
Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Inno...
 
Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS
Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRISCombining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS
Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS
 
The role of open data in the development of sustainable smart cities and smar...
The role of open data in the development of sustainable smart cities and smar...The role of open data in the development of sustainable smart cities and smar...
The role of open data in the development of sustainable smart cities and smar...
 
Data security as a top priority in the digital world: preserve data value by ...
Data security as a top priority in the digital world: preserve data value by ...Data security as a top priority in the digital world: preserve data value by ...
Data security as a top priority in the digital world: preserve data value by ...
 
OPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERS
OPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERSOPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERS
OPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERS
 
Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...
Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...
Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...
 
Atvērto datu potenciāls
Atvērto datu potenciālsAtvērto datu potenciāls
Atvērto datu potenciāls
 
TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...
TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...
TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...
 
ATVĒRTO DATU SAVLAICĪGUMS NACIONĀLAJOS ATVĒRTO DATU PORTĀLOS AR PANDĒMIJU SAI...
ATVĒRTO DATU SAVLAICĪGUMS NACIONĀLAJOS ATVĒRTO DATU PORTĀLOS AR PANDĒMIJU SAI...ATVĒRTO DATU SAVLAICĪGUMS NACIONĀLAJOS ATVĒRTO DATU PORTĀLOS AR PANDĒMIJU SAI...
ATVĒRTO DATU SAVLAICĪGUMS NACIONĀLAJOS ATVĒRTO DATU PORTĀLOS AR PANDĒMIJU SAI...
 
Towards a Concurrence Analysis in Business Processes
Towards a Concurrence Analysis in Business ProcessesTowards a Concurrence Analysis in Business Processes
Towards a Concurrence Analysis in Business Processes
 

Recently uploaded

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 

Recently uploaded (20)

Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
The UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, OcadoThe UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, Ocado
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
Buy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdfBuy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdf
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Buy Epson EcoTank L3210 Colour Printer Online.pptx
Buy Epson EcoTank L3210 Colour Printer Online.pptxBuy Epson EcoTank L3210 Colour Printer Online.pptx
Buy Epson EcoTank L3210 Colour Printer Online.pptx
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdf
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 

ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detection tool or what Internet of Things Search Engines know about you

  • 1. SHOBEVODSDT: SHODAN AND BINARY EDGE BASED VULNERABLE OPEN DATA SOURCES DETECTION TOOL OR WHAT INTERNET OF THINGS SEARCH ENGINES KNOW ABOUT YOU The International Conference on Intelligent Data Science Technologies and Applications (IDSTA2021) November 15-16, 2021. Tartu, Estonia (web-based) Artjoms Daskevics, Anastasija Nikiforova “Innovative Information Technologies” Laboratory, Programming Department Faculty of Computing, University of Latvia
  • 2. AIM To propose an OSINT-based (Open Source Intelligence) tool for non-intrusive testing of open data sources that inspects their vulnerabilities and their extent. The tool should answer three questions: Is the data source visible outside the organization? What data can be gathered from open data sources (if any), and what is their “value” for attackers and fraudsters? Can these data pose risks to the organization if they are used to deploy an attack? This allows both a comprehensive analysis of unprotected data sources that fall into a list of predefined data sources and an inspection of a specific IP address or IP range to examine what can be seen from outside the organization about the data source in use. The use of Open Source Intelligence (OSINT) tools, more precisely Internet of Things Search Engines (IoTSE), should allow the tool to inspect a list of predefined data sources for vulnerabilities and their extent. ShoBeVODSDT: Shodan- and Binary Edge-based vulnerable open data sources detection tool.
  • 3. ShoBeVODSDT ShoBEVODSDT uses mainly the passive assessment (non-intrusive testing), which is characterized by its low level of intrusiveness; the data sources concerned are not thoroughly and actively tested.; the tool refer to the most likely and potentially existing bottlenecks or weaknesses which, if the fourth stage of the penetration testing, namely the attack, would take place, could be revealed and exposed. ShoBeVODSDT Shodan- and Binary Edge- based vulnerable open data sources detection tool ShoBeVODSDT
  • 4. ShoBeVODSDT SCOPE What will be inspected? 8 types of data sources: MySQL, PostgreSQL, MongoDB, Redis, Elasticsearch, CouchDB, Cassandra and Memcached. They cover three types of sources: relational databases, NoSQL databases (document-oriented, column-oriented and key-value) and data stores. How will it be inspected? With OSINT tools or, more precisely, the Internet of Things (IoT) search engines (IoTSE) Shodan and BinaryEdge, which search for and index publicly available and accessible open data sources.
  • 5. DATA SOURCES, THEIR MODELS AND CONNECTION DATA
      Database | Primary database model | Connection data | Default port
      MySQL | Relational DBMS | IP address, port, username, password | 3306
      PostgreSQL | Relational DBMS | IP address, port, authentication data (if password-based connection is supported) | 5432
      MongoDB | Document store | IP address, port, username, password | 27017
      Redis | Key-value store | IP address, port, authentication data (if access control is enabled) | 6379
      Elasticsearch | Search engine | IP address, port | 9200
      CouchDB | Document store | IP address, port, authentication data (if anonymous access is not enabled) | 5984
      Cassandra | Wide-column store | IP address, port, authentication data | 9160
      Memcached | Key-value store | IP address, port | 11211
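The default ports can be captured in a small lookup table. The following is a minimal sketch only: `DEFAULT_PORTS` and `endpoint` are hypothetical names introduced for illustration and are not taken from the tool's code. The port values are the defaults commonly documented for each service.

```python
from typing import Optional, Tuple

# Commonly documented default ports for the eight inspected services.
# The dictionary name and the lower-case keys are assumptions for this sketch.
DEFAULT_PORTS = {
    "mysql": 3306,
    "postgresql": 5432,
    "mongodb": 27017,
    "redis": 6379,
    "elasticsearch": 9200,
    "couchdb": 5984,
    "cassandra": 9160,  # legacy Thrift port, as listed on the slide
    "memcached": 11211,
}


def endpoint(service: str, ip: str, port: Optional[int] = None) -> Tuple[str, int]:
    """Return (ip, port), falling back to the service's default port."""
    if port is None:
        port = DEFAULT_PORTS[service.lower()]
    return ip, port
```

Services running on a non-default port can still be addressed by passing `port` explicitly.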
  • 6. ShoBeVODSDT ACTION
      Step I. IP address search (gather): uses the BinaryEdge and Shodan libraries to find service IP addresses that belong to a user-defined country; combines the results from BinaryEdge and Shodan, eliminating duplicates; saves the results in “parsed/<service_name>_<country>.txt”.
      Step II. IP address check: searches for files in a “checked” folder that correspond to the service and country being checked; opens the file and checks each IP address using the “check” class method associated with the service; if the connection is successful, the IP address is stored in “good/<service_name>_<country>.txt”; if it fails, the IP address and error information are stored in “bad/<service_name>_<country>.txt”.
      Step III. Retrieving information from an IP address (parse): searches for files in a “parsed/good” folder that correspond to the service and country to be checked; opens the file and tries to reconnect; if the connection is successful, tries to download the information from the database (the retrieval procedure differs for each type of database); saves the information in the “parsed” folder as “<IP_ADDRESS>.txt”.
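The gather and check steps can be sketched as follows. This is a simplified illustration, not the tool's actual code: the function names, the tab-separated error format and the `root` parameter are assumptions, and the IP lists are passed in directly instead of being fetched from Shodan or BinaryEdge.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def merge_results(shodan_ips: Iterable[str], binaryedge_ips: Iterable[str]) -> List[str]:
    """Combine both result lists, dropping duplicates and keeping first-seen order."""
    return list(dict.fromkeys([*shodan_ips, *binaryedge_ips]))


def save_gathered(root: Path, service: str, country: str, ips: Iterable[str]) -> Path:
    """Step I: write the merged addresses to parsed/<service>_<country>.txt."""
    out = root / "parsed" / f"{service}_{country}.txt"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("".join(ip + "\n" for ip in ips))
    return out


def check_all(root: Path, service: str, country: str, ips: Iterable[str],
              check: Callable[[str], object]) -> Tuple[List[str], List[str]]:
    """Step II: call check(ip) per address and split the results into good/ and bad/."""
    good: List[str] = []
    bad: List[str] = []
    for ip in ips:
        try:
            check(ip)
            good.append(ip)
        except Exception as exc:  # a failed connection is recorded together with its error
            bad.append(f"{ip}\t{exc}")
    for folder, lines in (("good", good), ("bad", bad)):
        path = root / folder / f"{service}_{country}.txt"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("".join(line + "\n" for line in lines))
    return good, bad
```

Passing `check` as a callable keeps the file handling independent of any particular database client, so the same loop can serve all eight services.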
  • 7. TOOL ARCHITECTURE The search class includes a class constructor, in which a Shodan or Binary Edge client is initialized with a valid API key, and a search method that obtains data from Shodan or Binary Edge*. *In the case of Binary Edge, the page number to search for IP addresses must also be provided. The service class includes a class constructor, in which a separate service client tries to establish a new connection, and two functions: (1) “check”, which returns an error if the connection was unsuccessful or “true” if it was successful; (2) “parse”, which attempts to download all information from the database.
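A possible shape for the two classes is sketched below, under several assumptions. The class names are hypothetical. The search wrapper takes an already constructed client that is expected to behave like `shodan.Shodan(api_key)`, whose `search(query)` call returns a dictionary with a `matches` list carrying `ip_str` fields. As a simplification, the service class opens its connection in `check` rather than in the constructor.

```python
import socket
from abc import ABC, abstractmethod
from typing import List


class ShodanSearch:
    """Search wrapper; `client` is assumed to expose search(query) like shodan.Shodan."""

    def __init__(self, client) -> None:
        self.client = client

    def search(self, query: str) -> List[str]:
        """Return the IP addresses found for a query such as 'port:6379 country:LV'."""
        result = self.client.search(query)
        return [match["ip_str"] for match in result.get("matches", [])]


class Service(ABC):
    """Common interface of all service classes: check() and parse()."""

    def __init__(self, ip: str, port: int, timeout: float = 3.0) -> None:
        self.ip = ip
        self.port = port
        self.timeout = timeout

    @abstractmethod
    def check(self) -> bool:
        """Return True on a successful connection, raise an error otherwise."""

    @abstractmethod
    def parse(self) -> str:
        """Download whatever information the data source exposes."""


class TcpService(Service):
    """Fallback that only tests whether the TCP port accepts connections."""

    def check(self) -> bool:
        with socket.create_connection((self.ip, self.port), timeout=self.timeout):
            return True

    def parse(self) -> str:
        raise NotImplementedError("retrieval is specific to each database type")
```

A concrete service would subclass `Service` and implement both methods with its own database client.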
  • 8. ShoBeVODSDT IN ACTION Use-case - data on Latvia, Estonia and Lithuania (Baltic States) 15180 IP addresses were processed, Lithuania (7453) Estonia (5352) Latvia (2375) 98.43% of the addresses have failed to connect Category Description 0 failed to connect 1 has managed to connect but failed to gather data or information 2 has managed to connect, but the database is empty 3 has managed to connect by gathering system data or non-sensitive information 4 has managed to connect and gather sensitive data 5 compromised database ✔ the further actions took place with 1.57% or 93 IP addresses only
  • 9. ShoBeVODSDT IN ACTION “2” and “3” are the most common categories, which is a good sign: while these data sources are open, their data are not of very high importance to attackers and fraudsters, although they can facilitate attacks. 8% of data sources contain data that could be used by attackers; 12% have already been compromised. Most empty and compromised databases belong to Elasticsearch. Most databases that store sensitive data belong to Memcached, which also leads in databases where sensitive data are not stored (category “3”). Memcached and Elasticsearch have the highest number of open data sources with a higher “value” of gathered data in almost all categories; the exceptions are the relatively poor results of MongoDB for the number of compromised databases and of Redis for data sources storing sensitive data.
  • 10. FUTURE WORKS The list of IoTSE used may be extended to other well-known search engines, such as Censys and ZoomEye, to allow a more extensive investigation and to determine whether the number of IoTSE has an impact on the results. Similarly, the set of data sources can be supplemented with others identified as the most popular, especially given that Oracle and MS SQL are sometimes found to have the highest number of vulnerabilities. Although our aim was to propose a tool for investigating databases only, further studies may also cover other “types of devices”, such as Network Equipment, Terminal, Server, Office Equipment, Industrial Control Equipment, Smart Home, Power Supply Equipment, Web Camera, Remote Management Equipment, Blockchain and industrial connected devices in the cloud. At the moment, the future study aims to apply the tool to the specific countries of Latvia, Lithuania and Estonia and to carry out an extensive investigation of the current state of data sources and their security. This will allow conclusions to be drawn on differences in country patterns, i.e. whether the technological development of Estonia is also visible in this matter. It will also allow more objective conclusions on the data sources that are less protected by design.
  • 11. RESULTS AND CONCLUSIONS I The paper proposes a tool called ShoBeVODSDT (Shodan- and Binary Edge-based vulnerable open data sources detection tool) for non-intrusive testing of open data sources to detect their vulnerabilities. ShoBeVODSDT: supports the identification of vulnerabilities at early security assessment stages and does not require active and possibly disruptive techniques; uses two IoTSE (Shodan and Binary Edge), extending their features with the advanced capabilities built into it; allows 8 predefined data sources (MySQL, PostgreSQL, MongoDB, Redis, Elasticsearch, CouchDB, Cassandra and Memcached) to be inspected for vulnerabilities and their extent. While the tool covers 8 data sources representing relational databases, NoSQL databases and data stores, it is designed to be easily scalable by extending the publicly available code: https://github.com/zhmyh/ShoBEVODST https://www.eosc-hub.eu/open-science-info
  • 12. RESULTS AND CONCLUSIONS II The total number of open data sources available to everyone who wants to access them is not very high, i.e. less than 2% of the data sources scanned. BUT there are data sources that may pose risks to organizations, since external users can access information that can be used for further attacks. For 12% of the inspected data sources, this has already taken place. Security features built into the database help protect against unauthorized access, but there are databases with weak security features, where we were able to connect to nearly all IP addresses and retrieve information from them. Moreover, in some cases, databases that do not use security mechanisms have already been compromised.
  • 13. THANK YOU FOR ATTENTION! QUESTIONS? For more information, see ResearchGate See also anastasijanikiforova.com For questions or any other queries, contact me via email - Anastasija.Nikiforova@lu.lv