How to search extracted data
© Copyright 2015 NowSecure, Inc.
Javier Collado
Data extraction in mobile devices
● It’s hard to decode data for each application with limited resources
○ There are a lot of applications
○ Each application version might change:
■ format (file type, database schema)
■ content (new and interesting data)
● Many applications store data in SQLite databases
Index and search
● Libraries
○ Low level interface
○ Examples: Lucene, Xapian, Whoosh
● Servers
○ High level interface
○ Examples: Solr, Elasticsearch, Sphinx
SQLite
● Very flexible and permissive: each value has its own type
● Storage class: group of related datatypes (different lengths, encodings, …)
● Type affinity: preferred storage class for a column, based on its declared type
● Not all content should be indexed:
○ sqlite_master, sqlite_sequence
○ FTS tables
○ BLOBs
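The exclusions listed for SQLite can be sketched with Python's standard sqlite3 module. This is only an illustration, not the method used by any particular tool: the function name `indexable_rows` is hypothetical, FTS tables are detected with a simple text match on the stored `CREATE` statement, and shadow tables are guessed from their name prefix, which can produce false positives.

```python
import sqlite3


def indexable_rows(conn):
    """Yield (table, row) pairs, leaving out internal tables, FTS tables and BLOBs."""
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    # Heuristic (assumption): virtual tables declared with "USING fts..." and
    # any table whose name starts with "<fts table>_" are treated as FTS data.
    fts = [name for name, sql in tables if sql and "using fts" in sql.lower()]
    for name, _sql in tables:
        if name.startswith("sqlite_"):  # internal tables such as sqlite_sequence
            continue
        if any(name == f or name.startswith(f + "_") for f in fts):
            continue
        quoted = '"' + name.replace('"', '""') + '"'
        cursor = conn.execute("SELECT * FROM " + quoted)
        columns = [description[0] for description in cursor.description]
        for values in cursor:
            # BLOB values come back as bytes in Python 3 and are dropped here.
            row = {c: v for c, v in zip(columns, values) if not isinstance(v, bytes)}
            yield name, row


conn = sqlite3.connect(":memory:")
# AUTOINCREMENT makes SQLite create the internal sqlite_sequence table.
conn.execute(
    "CREATE TABLE notes (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT, thumb BLOB)"
)
conn.execute("INSERT INTO notes (body, thumb) VALUES (?, ?)", ("hello", b"\x89PNG"))
rows = list(indexable_rows(conn))
print(rows)
```

Dropping BLOB values per row, rather than skipping whole columns, keeps the sketch independent of the declared column types, which SQLite does not enforce.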
SQLite
sqlite> CREATE TABLE names (id INTEGER, name TEXT);
sqlite> INSERT INTO names VALUES (1, "Alice");
sqlite> INSERT INTO names VALUES ("Bob", 2);
sqlite> SELECT typeof(id), id, typeof(name), name FROM names;
integer|1|text|Alice
text|Bob|text|2
sqlite>
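The same session can be reproduced from Python, which is a convenient way to check whether a column holds more than one storage class before indexing it. This is a minimal sketch using the standard sqlite3 module; the variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER, name TEXT)")
conn.execute("INSERT INTO names VALUES (?, ?)", (1, "Alice"))
# Swapped values are accepted: "Bob" cannot be converted to an integer and is
# stored as text, while 2 is converted to text by the TEXT affinity of name.
conn.execute("INSERT INTO names VALUES (?, ?)", ("Bob", 2))

rows = conn.execute(
    "SELECT typeof(id), id, typeof(name), name FROM names"
).fetchall()
print(rows)

# Number of distinct storage classes actually present in the id column.
id_type_count = conn.execute(
    "SELECT COUNT(DISTINCT typeof(id)) FROM names"
).fetchone()[0]
print(id_type_count)
```

A count above one for a column signals that its values will need to be normalized before they are sent to a search server with a fixed field type.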
SQLite
sqlite> CREATE TABLE names (id INTEGER name TEXT);
sqlite> .schema names
CREATE TABLE names (id INTEGER name TEXT);
sqlite> INSERT INTO names VALUES (1, "Alice");
Error: table names has 1 columns but 2 values were supplied
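The error above arises because, without the comma, SQLite reads everything after `id` as the declared type of a single column. A short sketch with Python's sqlite3 module makes this visible through `PRAGMA table_info`; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The comma between the two intended columns is missing on purpose.
conn.execute("CREATE TABLE names (id INTEGER name TEXT)")

# Each row of table_info is (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(names)").fetchall()
column_names = [column[1] for column in columns]

try:
    conn.execute("INSERT INTO names VALUES (?, ?)", (1, "Alice"))
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

print(column_names)
print(error)
```

Reading the column list from `PRAGMA table_info` rather than parsing the `CREATE TABLE` text avoids being misled by declarations like this one.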
ElasticSearch
● Search server
● Document oriented (JSON)
● RESTful API
● Schema (mapping) not required, but needed to avoid errors due to SQLite flexibility
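One way to obtain a mapping up front is to derive it from the declared column types. The sketch below applies SQLite's documented affinity rules to a declared type and then picks an Elasticsearch field type for each affinity. The `ES_TYPES` table and the function names are assumptions made for illustration, and the `string` type matches the Elasticsearch version shown in this deck (later versions use other text types).

```python
import json


def affinity(declared_type):
    """Apply SQLite's column affinity rules to a declared type string."""
    declared = (declared_type or "").upper()
    if "INT" in declared:
        return "INTEGER"
    if any(key in declared for key in ("CHAR", "CLOB", "TEXT")):
        return "TEXT"
    if "BLOB" in declared or not declared:
        return "BLOB"
    if any(key in declared for key in ("REAL", "FLOA", "DOUB")):
        return "REAL"
    return "NUMERIC"


# Hypothetical choice of field types; BLOB columns are left out of the mapping.
ES_TYPES = {"INTEGER": "long", "TEXT": "string", "REAL": "double", "NUMERIC": "double"}


def mapping_for(type_name, columns):
    """Build a mapping body from (column name, declared type) pairs."""
    properties = {}
    for name, declared in columns:
        es_type = ES_TYPES.get(affinity(declared))
        if es_type is not None:
            properties[name] = {"type": es_type}
    return {type_name: {"properties": properties}}


mapping = mapping_for("names", [("id", "INTEGER"), ("name", "TEXT"), ("photo", "BLOB")])
print(json.dumps(mapping))
```

The mapping only fixes the expected types. Values that do not match them still have to be converted or dropped before indexing.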
ElasticSearch
$ curl -XPOST 'http://localhost:9200/dfrws/names' -d '{id: 1, name: "Alice"}'
{"_index":"dfrws","_type":"names","_id":"AUxNeQ7-7Nsk22Tyod1W","_version":1,"created":true}
$ curl -XPOST 'http://localhost:9200/dfrws/names' -d '{id: "Bob", name: 2}'
{"error":"MapperParsingException[failed to parse [id]]; nested: NumberFormatException[For input string: "Bob"]; ","status":400}
$ curl -XGET 'http://localhost:9200/dfrws/_mapping/names'
{"dfrws":{"mappings":{"names":{"properties":{"id":{"type":"long"},"name":{"type":"string"}}}}}}
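The second request fails because the first document fixed `id` as `long`. A possible client-side guard, sketched below, converts each value to the type the mapping expects and drops values that cannot be converted. The helper names `coerce` and `clean_document` are hypothetical, and dropping a value is only one of several policies; storing it in a separate text field would be another.

```python
def coerce(value, es_type):
    """Return value converted to es_type, or None when it cannot be converted."""
    if value is None:
        return None
    try:
        if es_type == "long":
            return int(value)
        if es_type == "double":
            return float(value)
        if es_type == "string":
            return str(value)
    except (TypeError, ValueError):
        return None
    return None  # unknown field or unsupported type


def clean_document(row, field_types):
    """Keep only the fields whose values fit the declared field types."""
    document = {}
    for field, value in row.items():
        converted = coerce(value, field_types.get(field))
        if converted is not None:
            document[field] = converted
    return document


field_types = {"id": "long", "name": "string"}
good = clean_document({"id": 1, "name": "Alice"}, field_types)
bad = clean_document({"id": "Bob", "name": 2}, field_types)
print(good)
print(bad)
```

With this policy the row that triggered the `MapperParsingException` is still indexed, minus the `id` value that does not fit.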
ElasticSearch
$ curl -XPOST 'http://localhost:9200/dfrws/_names' -d '{id: 1, name: "Alice"}'
{"error":"InvalidTypeNameException[mapping type name [_names] can't start with '_']","status":400}
$ curl -XGET 'http://localhost:9200/dfrws/names/_search' -d '{query: {match: {name: "Alice"}}}'
{"took":27,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.30685282,"hits":[{"_index":"dfrws","_type":"names","_id":"AUxNeQ7-7Nsk22Tyod1W","_score":0.30685282,"_source":{id: 1, name: "Alice"}}]}}
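SQLite table names may start with an underscore, so a table name cannot always be used directly as a type name. The sketch below shows one hypothetical way to derive a valid type name and to build the body of the `match` query as strict JSON. Both function names and the `"table"` fallback are assumptions, and no request is sent.

```python
import json


def type_name(table):
    """Derive a type name that does not start with '_' from a table name."""
    stripped = table.lstrip("_")
    return stripped or "table"  # hypothetical fallback for names made only of '_'


def match_query(field, text):
    """Build the body of a match query as a JSON string with quoted keys."""
    return json.dumps({"query": {"match": {field: text}}})


print(type_name("_names"))
print(match_query("name", "Alice"))
```

Stripping the prefix can make two tables collide (for example `_names` and `names`), so a real tool would need to handle that case.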
Example tool
● https://github.com/jcollado/esis
● Command line tool written in Python
○ Ability to index every row in every table in every database file found under a given directory
○ Ability to search using simple queries
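Finding the database files under a directory can be sketched by checking each file for the 16-byte header that every SQLite 3 database starts with. This is an illustration under that single assumption, not a description of how esis is implemented; `find_databases` is a hypothetical name.

```python
import os
import sqlite3
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of a SQLite 3 database file


def find_databases(root):
    """Yield paths of files under root that start with the SQLite header."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as handle:
                    if handle.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC:
                        yield path
            except OSError:
                continue  # unreadable file: skip it


with tempfile.TemporaryDirectory() as root:
    # One real database and one plain text file to tell apart.
    conn = sqlite3.connect(os.path.join(root, "contacts.db"))
    conn.execute("CREATE TABLE names (id INTEGER, name TEXT)")
    conn.commit()
    conn.close()
    with open(os.path.join(root, "notes.txt"), "w") as handle:
        handle.write("not a database")
    found = [os.path.basename(path) for path in find_databases(root)]

print(found)
```

Checking the header instead of the file extension matters on mobile devices, where databases are often stored without a `.db` or `.sqlite` suffix.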
Conclusions
● SQLite content can be indexed in Elasticsearch, but…
○ Types need to be consistent
○ Irrelevant information needs to be discarded
Future work
● Index text information from other file types (Apache Tika)
● Regular expressions
● Highlight search results
● Search suggestions
● Language detection and custom analyzers
● Proximity matching (match vs. match_phrase)
Thanks
