SlideShare a Scribd company logo
1 of 18
Imagine that you have millions of XML files
And you want to extract specific data from it and load into SQL or NoSQL database
OR
SQL
Database
NoSQL
Database
XMLParser
o You have too many file types or they have no fixed structure?
o You need very precise control over the data you extract? For example,
you need extract only 15% of data? And be able to extract zip code
from string “Amsterdam 1019” and place it in one column and city name
in another?
o You don’t want to write code at all?
o You want to have universal solution, but not single script for each
document type like single developer or GPT chat most likely can do?
XMLParser
Our solution for you if:
We provide solution that can:
Detect Morphology
of Document
Extract an arbitrary set
of fields
Load result
to database
XMLParser
1. You can define an arbitrary number of document types
2. For each type you can specify arbitrary parsing and extraction rules
3. You can get update for database records from separate documents
XMLParser
Can process documents with missing sections or with identical
sections with different spelling but the same content
Data with missed sections:
XMLParser
Expected structure: Restored document structure:
<lots>
<lot>
<object>
<name>Laptop</number>
<price>2700</price>
</object>
</lot>
</lots>
<lot>
<name>Laptop</number>
<price>2700</price>
</lot>
<lots>
<lot>
<name></number>
<price></price>
</lot>
</lots>
Can process documents with missing sections or with identical
sections with different spelling but the same content
Data with unexpected naming and
structure:
Expected structure: Restored document structure:
XMLParser
<lots>
<lot-item>
<object>
<purchase>
<name>Laptop</number>
<price>2700</price>
</purchase>
</object>
</lot-item>
</lots>
<lots>
<lot>
<name></number>
<price></price>
</lot>
</lots>
<lots>
<lot>
<object>
<name>Laptop</number>
<price>2700</price>
</object>
</lot>
</lots>
Documents can be very complex, and you can extract only required data set
XMLParser
Every XML can be represented as JSON or SQL
XMLParser
When generating SQL, you can preserve the parent-child relation
XMLParser
You can flexibly customize the SQL generation
o You can set table constrains and generate ON CONFLICT DO UPDATE
queries to prevent rows duplications;
o You can insert multiple VALUES in single query;
o You can use some documents for updating already exists records in DB;
o You can generate plain text SQL for debugging or for other purposes.
XMLParser
You can specify when to extract data as an object, and when as a list of objects
XMLParser
<lots>
<lot>
<object>
<name>Laptop</number>
<price>2700</price>
</object>
</lot>
</lots>
{"object_name": "Laptop", "price": 2700}
[
{"object_name": "Laptop", "price": 2700}
]
Extract values if they are placed as tag attributes in same way as there
were separated tags:
=
‘
XMLParser
<lots>
<lot>
<object>
<name>Laptop 1</number>
<price>1650</price>
</object>
<object>
<name>Laptop 2</number>
<price>2100</price>
</object>
</lot>
<lot>
<object>
<name>Laptop 3</number>
<price>2700</price>
</object>
</lot>
</lots>
<lots>
<lot>
<object name=”Laptop 1” price=1650/>
</lot>
<lot>
<object name=” Laptop 2” price=2100/>
</lot>
<lot>
<object name=” Laptop 3” price=2700/>
</lot>
</lots>
Forget about them, you don't need them anymore!
XSD schemes that never work?
XMLParser
You can do very tricky type conversion
<?xml version="1.0"?>
<price>”1780”<price>
1780.00
1780
<?xml version="1.0"?>
<status>on<status>
to bool
TRUE
<?xml version="1.0"?>
<status>off<status>
FALSE
to bool
XMLParser
String
In more complex cases you may use build in Natural Language Processing.
It’s very tiny but with it you can for example
evaluate some text as logical expression and store in database only
true/false flag instead of full text:
<id>1</id>
<order>
<status>canceled by user</status>
</order>
<id>1</id>
<order>
<status>the order was canceled</status>
</order>
OR
XMLParser
id order_canceled
1 TRUE
<order>
<address>country: Netherlands city: Amsterdam, phone: +31620874518</address>
</order>
XMLParser
In more complex cases you may use build in Natural Language Processing.
It’s very tiny but with it you can for example
parse text (if it have some regular patterns) and place some it’s parts as
separate fields:
country city phone
Netherlands Amsterdam +31620874518
<order>
<price>$1780</price>
</order>
Symbol of dollar removed
to allow place data to numeric field:
<order>
<bankCardNumber>4716957520893445</bankCardNumber>
</order>
Card number was masked:
XMLParser
In more complex cases you may use build in Natural Language Processing.
It’s very tiny but with it you can for example for
do some data cleaning/filtering/masking
currency price
USD 1780
id cardNumber
1 47**********3445
bubnenkoff@gmail.com
XMLParser
+ Flexible and lightweight;
+ Easy to scale and suitable for big data;
+ Not a cloud solution. Your data remains private
and local;
+ Doesn’t require knowledge of programming;
+ Built-in tiny Natural Language Processing engine
for complex rules;
+ Supports multi-model data representation.
Easy way to parse XML and load data into SQL/NoSQL database.

More Related Content

Similar to XMLParser functionality demonstration...

Similar to XMLParser functionality demonstration... (20)

xml-150211140504-conversion-gate01 (1).pdf
xml-150211140504-conversion-gate01 (1).pdfxml-150211140504-conversion-gate01 (1).pdf
xml-150211140504-conversion-gate01 (1).pdf
 
Xml
XmlXml
Xml
 
Web Developement Workshop (Oct 2009) Slides
Web Developement Workshop (Oct 2009) SlidesWeb Developement Workshop (Oct 2009) Slides
Web Developement Workshop (Oct 2009) Slides
 
Javascript2839
Javascript2839Javascript2839
Javascript2839
 
Json
JsonJson
Json
 
XML Presentation-2
XML Presentation-2XML Presentation-2
XML Presentation-2
 
XML notes.pptx
XML notes.pptxXML notes.pptx
XML notes.pptx
 
Xml
XmlXml
Xml
 
Web Services Part 1
Web Services Part 1Web Services Part 1
Web Services Part 1
 
Xml
XmlXml
Xml
 
Digital + Container List
Digital + Container ListDigital + Container List
Digital + Container List
 
Xml
XmlXml
Xml
 
uptu web technology unit 2 Xml2
uptu web technology unit 2 Xml2uptu web technology unit 2 Xml2
uptu web technology unit 2 Xml2
 
Everything You Always Wanted To Know About XML But Were Afraid To Ask
Everything You Always Wanted To Know About XML But Were Afraid To AskEverything You Always Wanted To Know About XML But Were Afraid To Ask
Everything You Always Wanted To Know About XML But Were Afraid To Ask
 
Xml and DTD's
Xml and DTD'sXml and DTD's
Xml and DTD's
 
xml.pptx
xml.pptxxml.pptx
xml.pptx
 
Introduction to xml
Introduction to xmlIntroduction to xml
Introduction to xml
 
Xml
XmlXml
Xml
 
Architecture | Busy Java Developers Guide to NoSQL | Ted Neward
Architecture | Busy Java Developers Guide to NoSQL | Ted NewardArchitecture | Busy Java Developers Guide to NoSQL | Ted Neward
Architecture | Busy Java Developers Guide to NoSQL | Ted Neward
 
XML/XSLT
XML/XSLTXML/XSLT
XML/XSLT
 

Recently uploaded

PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 

Recently uploaded (20)

PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 

XMLParser functionality demonstration...

  • 1. Imagine that you have millions of XML files And you want to extract specific data from it and load into SQL or NoSQL database OR SQL Database NoSQL Database XMLParser
  • 2. o You have too many file types or they have no fixed structure? o You need very precise control over the data you extract? For example, you need extract only 15% of data? And be able to extract zip code from string “Amsterdam 1019” and place it in one column and city name in another? o You don’t want to write code at all? o You want to have universal solution, but not single script for each document type like single developer or GPT chat most likely can do? XMLParser Our solution for you if:
  • 3. We provide solution that can: Detect Morphology of Document Extract an arbitrary set of fields Load result to database XMLParser
  • 4. 1. You can define an arbitrary number of document types 2. For each type you can specify arbitrary parsing and extraction rules 3. You can get update for database records from separate documents XMLParser
  • 5. Can process documents with missing sections or with identical sections with different spelling but the same content Data with missed sections: XMLParser Expected structure: Restored document structure: <lots> <lot> <object> <name>Laptop</number> <price>2700</price> </object> </lot> </lots> <lot> <name>Laptop</number> <price>2700</price> </lot> <lots> <lot> <name></number> <price></price> </lot> </lots>
  • 6. Can process documents with missing sections or with identical sections with different spelling but the same content Data with unexpected naming and structure: Expected structure: Restored document structure: XMLParser <lots> <lot-item> <object> <purchase> <name>Laptop</number> <price>2700</price> </purchase> </object> </lot-item> </lots> <lots> <lot> <name></number> <price></price> </lot> </lots> <lots> <lot> <object> <name>Laptop</number> <price>2700</price> </object> </lot> </lots>
  • 7. Documents can be very complex, and you can extract only required data set XMLParser
  • 8. Every XML can be represented as JSON or SQL XMLParser
  • 9. When generating SQL, you can preserve the parent-child relation XMLParser
  • 10. You can flexibly customize the SQL generation o You can set table constrains and generate ON CONFLICT DO UPDATE queries to prevent rows duplications; o You can insert multiple VALUES in single query; o You can use some documents for updating already exists records in DB; o You can generate plain text SQL for debugging or for other purposes. XMLParser
  • 11. You can specify when to extract data as an object, and when as a list of objects XMLParser <lots> <lot> <object> <name>Laptop</number> <price>2700</price> </object> </lot> </lots> {"object_name": "Laptop", "price": 2700} [ {"object_name": "Laptop", "price": 2700} ]
  • 12. Extract values if they are placed as tag attributes in same way as there were separated tags: = ‘ XMLParser <lots> <lot> <object> <name>Laptop 1</number> <price>1650</price> </object> <object> <name>Laptop 2</number> <price>2100</price> </object> </lot> <lot> <object> <name>Laptop 3</number> <price>2700</price> </object> </lot> </lots> <lots> <lot> <object name=”Laptop 1” price=1650/> </lot> <lot> <object name=” Laptop 2” price=2100/> </lot> <lot> <object name=” Laptop 3” price=2700/> </lot> </lots>
  • 13. Forget about them, you don't need them anymore! XSD schemes that never work? XMLParser
  • 14. You can do very tricky type conversion <?xml version="1.0"?> <price>”1780”<price> 1780.00 1780 <?xml version="1.0"?> <status>on<status> to bool TRUE <?xml version="1.0"?> <status>off<status> FALSE to bool XMLParser String
  • 15. In more complex cases you may use build in Natural Language Processing. It’s very tiny but with it you can for example evaluate some text as logical expression and store in database only true/false flag instead of full text: <id>1</id> <order> <status>canceled by user</status> </order> <id>1</id> <order> <status>the order was canceled</status> </order> OR XMLParser id order_canceled 1 TRUE
  • 16. <order> <address>country: Netherlands city: Amsterdam, phone: +31620874518</address> </order> XMLParser In more complex cases you may use build in Natural Language Processing. It’s very tiny but with it you can for example parse text (if it have some regular patterns) and place some it’s parts as separate fields: country city phone Netherlands Amsterdam +31620874518
  • 17. <order> <price>$1780</price> </order> Symbol of dollar removed to allow place data to numeric field: <order> <bankCardNumber>4716957520893445</bankCardNumber> </order> Card number was masked: XMLParser In more complex cases you may use build in Natural Language Processing. It’s very tiny but with it you can for example for do some data cleaning/filtering/masking currency price USD 1780 id cardNumber 1 47**********3445
  • 18. bubnenkoff@gmail.com XMLParser + Flexible and lightweight; + Easy to scale and suitable for big data; + Not a cloud solution. Your data remains private and local; + Doesn’t require knowledge of programming; + Built-in tiny Natural Language Processing engine for complex rules; + Supports multi-model data representation. Easy way to parse XML and load data into SQL/NoSQL database.