Delta for your data
How to efficiently track and store deeply nested data changes
By Sep Dehpour
Oct 2020
What is already out there?
Git
[{
"key1": "value1",
"key2": "value2"
}]
Git
[{
"key2": "value2",
"key1": "value1"
}]
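The two JSON documents above hold the same data; only the key order differs. A minimal sketch, using only the Python standard library (not part of the original slides), of why a line-based diff such as Git's reports changes here while a structural comparison does not:

```python
import difflib
import json

# The two documents from the slides: same data, different key order.
before = '[{\n"key1": "value1",\n"key2": "value2"\n}]'
after = '[{\n"key2": "value2",\n"key1": "value1"\n}]'

# Line-based view: keep only added/removed lines, skipping the file headers.
line_diff = [
    line
    for line in difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm=""
    )
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

# Structural view: parse both documents and compare the resulting objects.
same_data = json.loads(before) == json.loads(after)

print(len(line_diff), "changed lines in the text diff")
print("parsed data equal:", same_data)
```

The text diff flags every inner line as changed, while the parsed objects compare equal.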
diff & patch
diff -u file1.html file2.html > patchfile.patch
patch file1.html patchfile.patch
Quilt
"Manage data like code"
https://github.com/quiltdata/quilt
QRI
https://qri.io
Similar to Quilt but built on top of ipfs.io
DVID
dvid.io
Distributed Versioned Image-oriented Dataservice
Dolt
https://www.dolthub.com/
A "Git for data" experience in a SQL database
Databricks Delta Lake
https://delta.io/
A storage layer that brings ACID transactions to Apache Spark™ and big data workloads.
DeepDiff
https://zepworks.com
pip install deepdiff
t1.csv
first_name,last_name,zip
Joe,Nobody,90011
Jimmy,Brown,90007
Sara,Smith,90007
t2.csv
first_name,last_name,zip
Joe,Nobody,90011
Jimmy,Brown,90404
Sara,Smith,90007
$ deepdiff t1.csv t2.csv
{ 'values_changed': { "root[1]['zip']": { 'new_value': '90404',
'old_value': '90007'}}}
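The report above addresses each change by a path such as root[1]['zip']. A rough sketch of how such a positional, cell-level comparison could be written with the standard library follows. The helper names load_rows and diff_rows are hypothetical, and this is not DeepDiff's implementation; it assumes both files have the same columns and the same number of rows.

```python
import csv
import io

T1 = """first_name,last_name,zip
Joe,Nobody,90011
Jimmy,Brown,90007
Sara,Smith,90007
"""

T2 = """first_name,last_name,zip
Joe,Nobody,90011
Jimmy,Brown,90404
Sara,Smith,90007
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def diff_rows(old_rows, new_rows):
    """Compare rows position by position and report changed cells by path."""
    changed = {}
    for index, (old, new) in enumerate(zip(old_rows, new_rows)):
        for column, old_value in old.items():
            new_value = new.get(column)
            if new_value != old_value:
                changed[f"root[{index}]['{column}']"] = {
                    "new_value": new_value,
                    "old_value": old_value,
                }
    return {"values_changed": changed} if changed else {}


result = diff_rows(load_rows(T1), load_rows(T2))
print(result)
```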
t3.csv
first_name,last_name,zip
Joe,Nobody,90011
Sara,Smith,90007
Jimmy,Brown,90404
$ deepdiff t1.csv t3.csv
{ 'values_changed': { "root[1]['first_name']": { 'new_value': 'Sara',
'old_value': 'Jimmy'},
"root[1]['last_name']": { 'new_value': 'Smith',
'old_value': 'Brown'},
"root[2]['first_name']": { 'new_value': 'Jimmy',
'old_value': 'Sara'},
"root[2]['last_name']": { 'new_value': 'Brown',
'old_value': 'Smith'},
"root[2]['zip']": { 'new_value': '90404',
'old_value': '90007'}}}
$ deepdiff t1.csv t3.csv --ignore-order
{ 'values_changed': { "root[1]['zip']": { 'new_value': '90404',
'old_value': '90007'}}}
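With order ignored, swapping two rows no longer shows up as a change; only the row whose content differs remains. One way to picture this is a multiset comparison of whole rows, sketched below with collections.Counter. The function unordered_diff is a hypothetical name, and the sketch stops at "removed" and "added" rows; pairing them into a single values_changed entry, as the output above does, is a further step not shown here.

```python
from collections import Counter

# Rows of t1.csv and t3.csv as tuples (hashable, so they can be counted).
T1_ROWS = [
    ("Joe", "Nobody", "90011"),
    ("Jimmy", "Brown", "90007"),
    ("Sara", "Smith", "90007"),
]
T3_ROWS = [
    ("Joe", "Nobody", "90011"),
    ("Sara", "Smith", "90007"),
    ("Jimmy", "Brown", "90404"),
]


def unordered_diff(old_rows, new_rows):
    """Report rows present on only one side, regardless of position."""
    old_counts = Counter(old_rows)
    new_counts = Counter(new_rows)
    return {
        "removed": sorted((old_counts - new_counts).elements()),
        "added": sorted((new_counts - old_counts).elements()),
    }


result = unordered_diff(T1_ROWS, T3_ROWS)
print(result)
```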
t4.csv
first_name,last_name,zip
Joe,Nobody,90011
Jimmy,Brown,90404
Sara,Smith,90007
Jimmy,Brown,90404
$ deepdiff t1.csv t4.csv --ignore-order
{ 'values_changed': { "root[1]['zip']": { 'new_value': '90404',
'old_value': '90007'}}}
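The duplicated row in t4.csv does not appear in the report above, which is the gap that --report-repetition addresses in the next step. A small illustration of the difference, using plain sets versus counted multisets (an analogy only, not DeepDiff's internals):

```python
from collections import Counter

T1_ROWS = [
    ("Joe", "Nobody", "90011"),
    ("Jimmy", "Brown", "90007"),
    ("Sara", "Smith", "90007"),
]
# t4.csv contains the changed Jimmy row twice.
T4_ROWS = [
    ("Joe", "Nobody", "90011"),
    ("Jimmy", "Brown", "90404"),
    ("Sara", "Smith", "90007"),
    ("Jimmy", "Brown", "90404"),
]

# Set view: repetition is invisible, the new row shows up once.
added_ignoring_repetition = set(T4_ROWS) - set(T1_ROWS)

# Counter view: repetition is kept, the new row is counted twice.
added_with_repetition = Counter(T4_ROWS) - Counter(T1_ROWS)

print(added_ignoring_repetition)
print(added_with_repetition)
```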
Let's make a patch (a.k.a. a delta)
deepdiff t1.csv t4.csv --ignore-order --report-repetition --create-patch > patch1
t1new.csv
first_name,last_name,zip
Joe,Nobody,90011
Jimmy,Brown,90007
Sara,Smith,90007
John,Doe,90001
Max,Foo,23232
DeepPatch
$ deeppatch t1new.csv patch1 --backup
first_name,last_name,zip
Joe,Nobody,90011
Jimmy,Brown,90404
Sara,Smith,90007
John,Doe,90001
Max,Foo,23232
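The patch was created from t1.csv but applied to t1new.csv, a file with extra rows: only the recorded change is replayed. A minimal sketch of that idea follows. The patch format here (a dict keyed by row index and column) and the function apply_patch are hypothetical simplifications, not the format deepdiff writes to disk.

```python
import copy

# Contents of t1new.csv as a list of dicts.
T1NEW = [
    {"first_name": "Joe", "last_name": "Nobody", "zip": "90011"},
    {"first_name": "Jimmy", "last_name": "Brown", "zip": "90007"},
    {"first_name": "Sara", "last_name": "Smith", "zip": "90007"},
    {"first_name": "John", "last_name": "Doe", "zip": "90001"},
    {"first_name": "Max", "last_name": "Foo", "zip": "23232"},
]

# Hypothetical, simplified patch: (row index, column) -> new value.
PATCH1 = {(1, "zip"): "90404"}


def apply_patch(rows, patch):
    """Return a patched copy of rows; the input is left untouched."""
    patched = copy.deepcopy(rows)
    for (index, column), new_value in patch.items():
        patched[index][column] = new_value
    return patched


patched = apply_patch(T1NEW, PATCH1)
print(patched[1])
```

Working on a copy mirrors the spirit of the --backup flag: the original data stays available if the patch turns out to be wrong.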
files
$ deepdiff t1.csv t1new.csv --ignore-order --report-repetition --create-patch > patch2
$ deepdiff t1.csv t4.csv --ignore-order --report-repetition --create-patch > patch1
t1.csv
patch1
patch2
$ deeppatch t1.csv patch1 --backup
$ deeppatch t1.csv patch2 --backup
some.yaml
---
-
  first_name: Joe
  last_name: Nobody
  address: 3232 Main st.
  phone: 323-123-2345
  zip: 90011
-
  first_name: Jimmy
  last_name: Brown
  address: 11th There sq.
  phone: 111-123-9911
  zip: 90002
-
  first_name: Sara
  last_name: Smith
  address: Downtown LA
  phone: 818-113-2005
  zip: 90007
$ deeppatch some.yaml patch1 --backup
- address: 3232 Main st.
  first_name: Joe
  last_name: Nobody
  phone: 323-123-2345
  zip: 90011
- address: 11th There sq.
  first_name: Jimmy
  last_name: Brown
  phone: 111-123-9911
  zip: '90404'
- address: Downtown LA
  first_name: Sara
  last_name: Smith
  phone: 818-113-2005
  zip: 90007
$ deeppatch some.yaml patch2 --backup
- address: 3232 Main st.
  first_name: Joe
  last_name: Nobody
  phone: 323-123-2345
  zip: 90011
- address: 11th There sq.
  first_name: Jimmy
  last_name: Brown
  phone: 111-123-9911
  zip: '90404'
- address: Downtown LA
  first_name: Sara
  last_name: Smith
  phone: 818-113-2005
  zip: 90007
- first_name: John
  last_name: Doe
  zip: '90001'
- first_name: Max
  last_name: Foo
  zip: '23232'
Looking for something in the file?
DeepGrep
$ deepgrep sara some.yaml -i
{'matched_values': OrderedSet(["root[2]['first_name']"])}
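The search result is a path into the data rather than a line number. A rough sketch of such a recursive search is shown below. The function deep_grep is a hypothetical stand-in, not the deepgrep implementation, and it only inspects string values, not dictionary keys.

```python
PEOPLE = [
    {"first_name": "Joe", "last_name": "Nobody", "address": "3232 Main st.",
     "phone": "323-123-2345", "zip": 90011},
    {"first_name": "Jimmy", "last_name": "Brown", "address": "11th There sq.",
     "phone": "111-123-9911", "zip": 90002},
    {"first_name": "Sara", "last_name": "Smith", "address": "Downtown LA",
     "phone": "818-113-2005", "zip": 90007},
]


def deep_grep(obj, needle, path="root", ignore_case=True):
    """Return the paths of all string values that contain needle."""
    matches = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            matches += deep_grep(value, needle, f"{path}[{key!r}]", ignore_case)
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            matches += deep_grep(value, needle, f"{path}[{index}]", ignore_case)
    elif isinstance(obj, str):
        haystack = obj.lower() if ignore_case else obj
        wanted = needle.lower() if ignore_case else needle
        if wanted in haystack:
            matches.append(path)
    return matches


found = deep_grep(PEOPLE, "sara")
print(found)
```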
Extracting the value?
DeepExtract
$ deepextract "root[2]" some.yaml
{ 'address': 'Downtown LA',
'first_name': 'Sara',
'last_name': 'Smith',
'phone': '818-113-2005',
'zip': 90007}
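Extraction is the inverse step: given a path string, walk the data and return what sits there. A minimal sketch follows, assuming paths of the form shown above (root followed by [index] or ['key'] segments). The function deep_extract and its regular expression are illustrative only.

```python
import re

PEOPLE = [
    {"first_name": "Joe", "last_name": "Nobody", "zip": 90011},
    {"first_name": "Jimmy", "last_name": "Brown", "zip": 90002},
    {"first_name": "Sara", "last_name": "Smith", "zip": 90007},
]

# Matches one path segment: ['some key'] or [123].
SEGMENT = re.compile(r"\[(?:'([^']*)'|(\d+))\]")


def deep_extract(obj, path):
    """Follow a path such as root[2]['zip'] and return the value found there."""
    if not path.startswith("root"):
        raise ValueError("path must start with 'root'")
    current = obj
    for key, index in SEGMENT.findall(path[len("root"):]):
        # A numeric segment indexes a list; a quoted segment looks up a dict key.
        current = current[int(index)] if index else current[key]
    return current


record = deep_extract(PEOPLE, "root[2]")
zip_code = deep_extract(PEOPLE, "root[2]['zip']")
print(record, zip_code)
```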
Nested Data
[
[
[
[["a", "b", "c", "d"], ["a", "e", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]],
[[10, "b", 2, "d"], ["a", "e", "c", "d"], ["f", 2, "d", 80]],
[["10", "d"], ["a", "d", 89], [10, 2, 2, 80]]
],
[
[["a", "b", "f", "d"], ["a", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]],
[[10, "b", 2, "d"], ["a", "e", "c", "d", "d"], ["f", 22, "d", 80]],
[["10", "dd"], ["a", "d", 89], [10, 2]]
],
[
[["a", "b", "c", "d"], ["a", "e", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]],
[["10", "d"], ["a", "d", 89], [10, 2, 2, 80]]
],
[
[["a", "b", "f", "d"], ["a", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]],
[[10, "b", 22, "d"], ["a", "e", "c", "d", "d"], ["f", 22, "d", 80]],
[["10", "dd"], ["a", "d", 89], [10, 2]]
]
],
[
[
[["a", "b", "f", "d"], ["a", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]],
[[10, "dada"], ["a", "e", "c", "d", "d"], ["f", 22, "d", 80]],
[["10", "dd"], ["a", "d", 389], [10, 2]]
],
[
[["a", "b", "c", "d"], ["a", "e", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]],
[[10, "b", 2, "d"], ["a", "e", "c", "d"], ["f", 2, "d", 80]],
[["10", "d"], ["a", "d", 89], [10, 2, 2, 80]]
],
[
[["a", "b", "c", "d"], ["a", "e", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]],
[["10", "d"], ["a", "d", 89], [10, 2, 2, 801]]
],
[
[["a", "b", "f", "d"], ["a", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]],
[[10, "b", 22, "d"], ["a", "e", "d"], ["f", 22, "d", 80]],
[["10", "dd"], ["a", "d", 89], [10, 2, 11]]
]
]
]
[
[
[
[["a", "b", "c", "d"], ["a", "e", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]],
[[1, "b", 2, "d"], ["a", "e", "c", "d"], ["f", 2, "d", 80]],
[["10", 0, "d"], ["a", "d", 89], [10, 2, 2, 80]]
],
[
[["a", "b", "f", "d"], ["a", "c", "d"], ["e", "c", "d", 9, 2, 4, 5]],
[[10, "dd3"], ["ab", "d", 89], [10, 2]],
[[10, "b", 2, "d"], ["a", "e", "c", "d", "d", "f"], ["f", 80]]
],
[
[["a", "b"], ["a", "e", "c1", "d"], ["e", "c", "d", 1, 2, 4, 5]],
[["b", 2, "d"], ["a", "e", "c", "d"], ["f", 2, "d", 80]],
[["10", "d"], ["a", "d", 89], [10, 2, 2, 80]]
],
[
[["a", "b", "f", "d"], ["a", "c", "d"], ["e", "c", "d", 9, 2, 4, 5]],
[[10, "b", 2, "d"], ["f", 9], ["f", 80]],
[["10", "dd1"], ["ab", "d", 89], [10, 2]]
]
],
[
[
[["a", "b", "c", "d"], ["a", "e", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]],
[[1, "b", 2, "d"], ["a", "e", "c", "d"], ["f", 2, "d", 80]],
[["10", 0, "d"], ["a", "d", 89], [10, 2, 2, 80]]
],
[
[["a12", "b", "f", "d"], ["a", "c", "d"], ["e", "c", "d", 9, 2, 4, 5]],
[[10, "dd3"], ["ab", "d", 89], [10, 2]],
[[10, "b", 2, "d"], ["a", "e", "c", "d", "d", "f"], ["f", 80]]
],
[
[["a", "b"], ["a", "e", "c1", "d"], ["e", "d", 1, 2, 4, 5]],
[["b", 22, "d"], ["a", "e", "c", "d"], ["f", 2, "d", 80]],
[["10", "d"], ["a", "d", 89], [10, 2, 2, 80]]
],
[
[["a", "b", "f", "d"], [], ["e", "c", "d", 9, 2, 4, 5]],
[[10, "b", 2, "d"], ["f", 9], ["f", 80]],
[["10", "dd1"], ["ab", "d", 89], [10, 2, "4"]]
]
]
]
nested_b_t1.json
[
{
"key3": [[[[[[[[[[1, 2, 4, 5]]], [[[8, 7, 3, 5]]]]]]]]]],
"key4": [7, 8]
},
{
"key5": "val5",
"key6": "val6"
}
]
nested_b_t2.json
[
{
"key5": "CHANGE",
"key6": "val6"
},
{
"key3": [[[[[[[[[[1, 3, 5, 4]]], [[[8, 8, 1, 5]]]]]]]]]],
"key4": [7, 8]
}
]
$ deepdiff nested_b_t1.json nested_b_t2.json
{ 'dictionary_item_added': [root[0]['key6'], root[0]['key5'],
root[1]['key4'], root[1]['key3']],
'dictionary_item_removed': [root[0]['key4'], root[0]['key3'],
root[1]['key6'], root[1]['key5']]}
$ deepdiff nested_b_t1.json nested_b_t2.json --ignore-order
{ 'values_changed': { 'root[0]': { 'new_value': { 'key5': 'CHANGE',
'key6': 'val6'},
'old_value': { 'key3': [ [ [ [ [ [ [ [ [ [ 1,
2,
4,
5]]],
[ [ [ 8,
7,
3,
5]]]]]]]]]],
'key4': [7, 8]}},
'root[1]': { 'new_value': { 'key3': [ [ [ [ [ [ [ [ [ [ 1,
3,
5,
4]]],
[ [ [ 8,
8,
1,
5]]]]]]]]]],
'key4': [7, 8]},
'old_value': { 'key5': 'val5',
'key6': 'val6'}}}}
$ deepdiff --help
--cutoff-distance-for-pairs FLOAT
[default: 0.3]
--cutoff-intersection-for-pairs FLOAT
[default: 0.7]
--cache-size INTEGER [default: 0]
--cache-tuning-sample-size INTEGER
[default: 0]
--cache-purge-level INTEGER RANGE
[default: 1]
--create-patch [default: False]
--exclude-paths TEXT
--exclude-regex-paths TEXT
--get-deep-distance [default: False]
--ignore-order [default: False]
--ignore-string-type-changes [default: False]
--ignore-numeric-type-changes [default: False]
--ignore-type-subclasses [default: False]
--ignore-string-case [default: False]
--ignore-nan-inequality [default: False]
--include-private-variables [default: False]
--log-frequency-in-sec INTEGER [default: 0]
--max-passes INTEGER [default: 10000000]
--max_diffs INTEGER
--number-format-notation [f|e] [default: f]
--progress-logger [info|error] [default: info]
--report-repetition [default: False]
--significant-digits INTEGER
--truncate-datetime [second|minute|hour|day]
--verbose-level INTEGER RANGE [default: 1]
--help Show this message and exit.
$ deepdiff nested_b_t1.json nested_b_t2.json --ignore-order
--cache-size 5000 --cutoff-intersection-for-pairs 1
{ 'iterable_item_removed': {"root[0]['key3'][0][0][0][0][0][0][1][0][0][1]": 7},
'values_changed': { "root[0]['key3'][0][0][0][0][0][0][0][0][0][1]": { 'new_value': 3,
'old_value': 2},
"root[0]['key3'][0][0][0][0][0][0][1][0][0][2]": { 'new_value': 1,
'old_value': 3},
"root[1]['key5']": { 'new_value': 'CHANGE',
'old_value': 'val5'}}}
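The tuned run reports fine-grained changes because the two top-level items are matched to their counterparts before being compared, even though they swapped positions. The sketch below illustrates the pairing idea with a deliberately crude similarity measure (the share of common dictionary keys). This measure and the names key_overlap and pair_items are assumptions for illustration; DeepDiff's actual distance calculation and its cutoff options work differently. The key3 values are shortened here, and the sketch assumes both lists have the same length.

```python
T1 = [
    {"key3": [[1, 2, 4, 5], [8, 7, 3, 5]], "key4": [7, 8]},
    {"key5": "val5", "key6": "val6"},
]
T2 = [
    {"key5": "CHANGE", "key6": "val6"},
    {"key3": [[1, 3, 5, 4], [8, 8, 1, 5]], "key4": [7, 8]},
]


def key_overlap(a, b):
    """Fraction of keys shared by two dicts (1.0 means identical key sets)."""
    keys_a, keys_b = set(a), set(b)
    return len(keys_a & keys_b) / len(keys_a | keys_b)


def pair_items(old, new):
    """Greedily pair each old item with the most similar unused new item."""
    pairs = []
    unused = list(range(len(new)))
    for old_index, item in enumerate(old):
        best = max(unused, key=lambda new_index: key_overlap(item, new[new_index]))
        pairs.append((old_index, best))
        unused.remove(best)
    return pairs


pairs = pair_items(T1, T2)
print(pairs)
```

Once items are paired, each pair can be compared field by field, which is what turns a wholesale "root[0] changed" report into individual value changes.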
File Types Supported
JSON
YAML
TOML
CSV
Pickle
Conclusion

 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 

Delta for your Data: How to efficiently track and store deeply nested data changes

  • 1. Delta for your data How to efficiently track and store deeply nested data changes By Sep Dehpour Oct 2020
  • 2. What is already out there?
  • 3. Git
  • 6. diff & patch
      diff -u file1.html file2.html > patchfile.patch
      patch file1.html patchfile.patch
  • 7. Quilt "Manage data like code" https://github.com/quiltdata/quilt
  • 8. QRI https://qri.io Similar to Quilt but built on top of ipfs.io
  • 10. Dolt https://www.dolthub.com/ A Git-for-data experience in a SQL database.
  • 11. Databricks Delta Lake https://delta.io/ storage layer that brings ACID transactions to Apache Spark™ and big data workloads.
  • 14. $ deepdiff t1.csv t2.csv
      { 'values_changed': { "root[1]['zip']": { 'new_value': '90404', 'old_value': '90007'}}}
  • 18. $ deepdiff t1.csv t3.csv
      { 'values_changed': { "root[1]['first_name']": { 'new_value': 'Sara', 'old_value': 'Jimmy'},
                            "root[1]['last_name']": { 'new_value': 'Smith', 'old_value': 'Brown'},
                            "root[1]['zip']": { 'new_value': '90007', 'old_value': '90007'},
                            "root[2]['first_name']": { 'new_value': 'Jimmy', 'old_value': 'Sara'},
                            "root[2]['last_name']": { 'new_value': 'Brown', 'old_value': 'Smith'},
                            "root[2]['zip']": { 'new_value': '90404', 'old_value': '90007'}}}
  • 19. $ deepdiff t1.csv t3.csv --ignore-order
      { 'values_changed': { "root[1]['zip']": { 'new_value': '90404', 'old_value': '90007'}}}
  • 21. $ deepdiff t1.csv t4.csv --ignore-order
      { 'values_changed': { "root[1]['zip']": { 'new_value': '90404', 'old_value': '90007'}}}
  • 22. Let's make a patch, aka a delta
      $ deepdiff t1.csv t4.csv --ignore-order --report-repetition --create-patch > patch1
  • 25. DeepPatch
      $ deeppatch t1new.csv patch1 --backup
      first_name,last_name,zip
      Joe,Nobody,90011
      Jimmy,Brown,90007
      Sara,Smith,90007
      John,Doe,90001
      Max,Foo,23232
  • 26. DeepPatch
      $ deeppatch t1new.csv patch1 --backup
      first_name,last_name,zip
      Joe,Nobody,90011
      Jimmy,Brown,90404
      Sara,Smith,90007
      John,Doe,90001
      Max,Foo,23232
  • 27. files
      $ deepdiff t1.csv t1new.csv --ignore-order --report-repetition --create-patch > patch2
      $ deepdiff t1.csv t4.csv --ignore-order --report-repetition --create-patch > patch1
      t1.csv patch1 patch2
      $ deeppatch t1.csv patch1 --backup
      $ deeppatch t1.csv patch2 --backup
  • 28. some.yaml
      ---
      - first_name: Joe
        last_name: Nobody
        address: 3232 Main st.
        phone: 323-123-2345
        zip: 90011
      - first_name: Jimmy
        last_name: Brown
        address: 11th There sq.
        phone: 111-123-9911
        zip: 90002
      - first_name: Sara
        last_name: Smith
        address: Downtown LA
        phone: 818-113-2005
        zip: 90007
  • 29. $ deeppatch some.yaml patch1 --backup
      ---
      - first_name: Joe
        last_name: Nobody
        address: 3232 Main st.
        phone: 323-123-2345
        zip: 90011
      - first_name: Jimmy
        last_name: Brown
        address: 11th There sq.
        phone: 111-123-9911
        zip: 90002
      - first_name: Sara
        last_name: Smith
        address: Downtown LA
        phone: 818-113-2005
        zip: 90007
  • 30. $ deeppatch some.yaml patch1 --backup
      - address: 3232 Main st.
        first_name: Joe
        last_name: Nobody
        phone: 323-123-2345
        zip: 90011
      - address: 11th There sq.
        first_name: Jimmy
        last_name: Brown
        phone: 111-123-9911
        zip: '90404'
      - address: Downtown LA
        first_name: Sara
        last_name: Smith
        phone: 818-113-2005
        zip: 90007
  • 31. $ deeppatch some.yaml patch2 --backup
      - address: 3232 Main st.
        first_name: Joe
        last_name: Nobody
        phone: 323-123-2345
        zip: 90011
      - address: 11th There sq.
        first_name: Jimmy
        last_name: Brown
        phone: 111-123-9911
        zip: '90404'
      - address: Downtown LA
        first_name: Sara
        last_name: Smith
        phone: 818-113-2005
        zip: 90007
      - first_name: John
        last_name: Doe
        zip: '90001'
      - first_name: Max
        last_name: Foo
        zip: '23232'
  • 32. Looking for something in the file?
  • 33. DeepGrep
      $ deepgrep sara some.yaml -i
  • 34. DeepGrep
      $ deepgrep sara some.yaml -i
      {'matched_values': OrderedSet(["root[2]['first_name']"])}
  • 35. Extracting the value?
      $ deepgrep sara some.yaml -i
      {'matched_values': OrderedSet(["root[2]['first_name']"])}
  • 36. DeepExtract
      $ deepgrep sara some.yaml -i
      {'matched_values': OrderedSet(["root[2]['first_name']"])}
      $ deepextract "root[2]" some.yaml
      { 'address': 'Downtown LA',
        'first_name': 'Sara',
        'last_name': 'Smith',
        'phone': '818-113-2005',
        'zip': 90007}
  • 38. [ [ [ [["a", "b", "c", "d"], ["a", "e", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]], [[10, "b", 2, "d"], ["a", "e", "c", "d"], ["f", 2, "d", 80]], [["10", "d"], ["a", "d", 89], [10, 2, 2, 80]] ], [ [["a", "b", "f", "d"], ["a", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]], [[10, "b", 2, "d"], ["a", "e", "c", "d", "d"], ["f", 22, "d", 80]], [["10", "dd"], ["a", "d", 89], [10, 2]] ], [ [["a", "b", "c", "d"], ["a", "e", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]], [["10", "d"], ["a", "d", 89], [10, 2, 2, 80]] ], [ [["a", "b", "f", "d"], ["a", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]], [[10, "b", 22, "d"], ["a", "e", "c", "d", "d"], ["f", 22, "d", 80]], [["10", "dd"], ["a", "d", 89], [10, 2]] ] ], [ [ [["a", "b", "f", "d"], ["a", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]], [[10, "dada"], ["a", "e", "c", "d", "d"], ["f", 22, "d", 80]], [["10", "dd"], ["a", "d", 389], [10, 2]] ], [ [["a", "b", "c", "d"], ["a", "e", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]], [[10, "b", 2, "d"], ["a", "e", "c", "d"], ["f", 2, "d", 80]], [["10", "d"], ["a", "d", 89], [10, 2, 2, 80]] ], [ [["a", "b", "c", "d"], ["a", "e", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]], [["10", "d"], ["a", "d", 89], [10, 2, 2, 801]] ], [ [["a", "b", "f", "d"], ["a", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]], [[10, "b", 22, "d"], ["a", "e", "d"], ["f", 22, "d", 80]], [["10", "dd"], ["a", "d", 89], [10, 2, 11]] ] ] ]
  • 39. [ [ [ [["a", "b", "c", "d"], ["a", "e", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]], [[1, "b", 2, "d"], ["a", "e", "c", "d"], ["f", 2, "d", 80]], [["10", 0, "d"], ["a", "d", 89], [10, 2, 2, 80]] ], [ [["a", "b", "f", "d"], ["a", "c", "d"], ["e", "c", "d", 9, 2, 4, 5]], [[10, "dd3"], ["ab", "d", 89], [10, 2]], [[10, "b", 2, "d"], ["a", "e", "c", "d", "d", "f"], ["f", 80]] ], [ [["a", "b"], ["a", "e", "c1", "d"], ["e", "c", "d", 1, 2, 4, 5]], [["b", 2, "d"], ["a", "e", "c", "d"], ["f", 2, "d", 80]], [["10", "d"], ["a", "d", 89], [10, 2, 2, 80]] ], [ [["a", "b", "f", "d"], ["a", "c", "d"], ["e", "c", "d", 9, 2, 4, 5]], [[10, "b", 2, "d"], ["f", 9], ["f", 80]], [["10", "dd1"], ["ab", "d", 89], [10, 2]] ] ], [ [ [["a", "b", "c", "d"], ["a", "e", "c", "d"], ["e", "c", "d", 1, 2, 4, 5]], [[1, "b", 2, "d"], ["a", "e", "c", "d"], ["f", 2, "d", 80]], [["10", 0, "d"], ["a", "d", 89], [10, 2, 2, 80]] ], [ [["a12", "b", "f", "d"], ["a", "c", "d"], ["e", "c", "d", 9, 2, 4, 5]], [[10, "dd3"], ["ab", "d", 89], [10, 2]], [[10, "b", 2, "d"], ["a", "e", "c", "d", "d", "f"], ["f", 80]] ], [ [["a", "b"], ["a", "e", "c1", "d"], ["e", "d", 1, 2, 4, 5]], [["b", 22, "d"], ["a", "e", "c", "d"], ["f", 2, "d", 80]], [["10", "d"], ["a", "d", 89], [10, 2, 2, 80]] ], [ [["a", "b", "f", "d"], [], ["e", "c", "d", 9, 2, 4, 5]], [[10, "b", 2, "d"], ["f", 9], ["f", 80]], [["10", "dd1"], ["ab", "d", 89], [10, 2, "4"]] ] ] ]
  • 40. nested_b_t1.json
      [
        {
          "key3": [[[[[[[[[[1, 2, 4, 5]]], [[[8, 7, 3, 5]]]]]]]]]],
          "key4": [7, 8]
        },
        {
          "key5": "val5",
          "key6": "val6"
        }
      ]
  • 41. nested_b_t2.json
      [
        {
          "key5": "CHANGE",
          "key6": "val6"
        },
        {
          "key3": [[[[[[[[[[1, 3, 5, 4]]], [[[8, 8, 1, 5]]]]]]]]]],
          "key4": [7, 8]
        }
      ]
  • 42. $ deepdiff nested_b_t1.json nested_b_t2.json
      { 'dictionary_item_added': [root[0]['key6'], root[0]['key5'], root[1]['key4'], root[1]['key3']],
        'dictionary_item_removed': [root[0]['key4'], root[0]['key3'], root[1]['key6'], root[1]['key5']]}
  • 43. $ deepdiff nested_b_t1.json nested_b_t2.json --ignore-order
      { 'values_changed': { 'root[0]': { 'new_value': { 'key5': 'CHANGE', 'key6': 'val6'},
                                         'old_value': { 'key3': [[[[[[[[[[1, 2, 4, 5]]], [[[8, 7, 3, 5]]]]]]]]]],
                                                        'key4': [7, 8]}},
                            'root[1]': { 'new_value': { 'key3': [[[[[[[[[[1, 3, 5, 4]]], [[[8, 8, 1, 5]]]]]]]]]],
                                                        'key4': [7, 8]},
                                         'old_value': { 'key5': 'val5', 'key6': 'val6'}}}}
  • 44. $ deepdiff --help
      --cutoff-distance-for-pairs FLOAT  [default: 0.3]
      --cutoff-intersection-for-pairs FLOAT  [default: 0.7]
      --cache-size INTEGER  [default: 0]
      --cache-tuning-sample-size INTEGER  [default: 0]
      --cache-purge-level INTEGER RANGE  [default: 1]
      --create-patch  [default: False]
      --exclude-paths TEXT
      --exclude-regex-paths TEXT
      --get-deep-distance  [default: False]
      --ignore-order  [default: False]
      --ignore-string-type-changes  [default: False]
      --ignore-numeric-type-changes  [default: False]
      --ignore-type-subclasses  [default: False]
      --ignore-string-case  [default: False]
      --ignore-nan-inequality  [default: False]
      --include-private-variables  [default: False]
      --log-frequency-in-sec INTEGER  [default: 0]
      --max-passes INTEGER  [default: 10000000]
      --max_diffs INTEGER
      --number-format-notation [f|e]  [default: f]
      --progress-logger [info|error]  [default: info]
      --report-repetition  [default: False]
      --significant-digits INTEGER
      --truncate-datetime [second|minute|hour|day]
      --verbose-level INTEGER RANGE  [default: 1]
      --help  Show this message and exit.
  • 45. $ deepdiff nested_b_t1.json nested_b_t2.json --ignore-order --cache-size 5000 --cutoff-intersection-for-pairs 1
      { 'iterable_item_removed': {"root[0]['key3'][0][0][0][0][0][0][1][0][0][1]": 7},
        'values_changed': { "root[0]['key3'][0][0][0][0][0][0][0][0][0][1]": { 'new_value': 3, 'old_value': 2},
                            "root[0]['key3'][0][0][0][0][0][0][1][0][0][2]": { 'new_value': 1, 'old_value': 3},
                            "root[1]['key5']": { 'new_value': 'CHANGE', 'old_value': 'val5'}}}
  • 48. Delta for your data How to efficiently track and store deeply nested data changes By Sep Dehpour Oct 2020
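
The Git and diff & patch slides (3 to 6) show two JSON snippets that differ only in key order. The sketch below, using only Python's standard library, suggests why a line-based tool flags that as a change while a structural comparison does not. The file names `file1.json` and `file2.json` are illustrative assumptions, not names from the deck.

```python
import difflib
import json

# The two snippets from the Git slides: same data, keys in a different order.
old_text = '[{\n"key1": "value1",\n"key2": "value2"\n}]\n'
new_text = '[{\n"key2": "value2",\n"key1": "value1"\n}]\n'

# Line-based comparison, in the unified format that `diff -u` emits.
patch_lines = list(difflib.unified_diff(
    old_text.splitlines(keepends=True),
    new_text.splitlines(keepends=True),
    fromfile="file1.json",  # illustrative name
    tofile="file2.json",    # illustrative name
))
text_changed = len(patch_lines) > 0

# Structural comparison: parse first, then compare the resulting objects.
data_changed = json.loads(old_text) != json.loads(new_text)

print("".join(patch_lines))
print("text changed:", text_changed)
print("data changed:", data_changed)
```

The text diff is non-empty even though the parsed objects are equal, which is the gap the structural tools in the rest of the deck aim to close.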
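
Slides 18 to 22 show `--ignore-order` hiding row reordering, and `--report-repetition` being added when a row is duplicated in t4.csv. The following is a rough standard-library sketch of that idea, comparing rows as sets versus multisets. It is not DeepDiff's algorithm, and it reports removed and added rows rather than DeepDiff's `values_changed` entries; the function names are hypothetical.

```python
import csv
import io
from collections import Counter

T1 = "first_name,last_name,zip\nJoe,Nobody,90011\nJimmy,Brown,90007\nSara,Smith,90007\n"
T3 = "first_name,last_name,zip\nJoe,Nobody,90011\nSara,Smith,90007\nJimmy,Brown,90404\n"
T4 = ("first_name,last_name,zip\nJoe,Nobody,90011\nJimmy,Brown,90404\n"
      "Sara,Smith,90007\nJimmy,Brown,90404\n")


def load_rows(text):
    """Parse CSV text into a list of hashable row tuples (header skipped)."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [tuple(row) for row in reader]


def unordered_diff(old_rows, new_rows, report_repetition=False):
    """Return (removed, added) rows, ignoring row order.

    Without report_repetition, rows are compared as sets, so a duplicated
    row is invisible. With it, rows are compared as multisets, so an extra
    copy of a row shows up as added.
    """
    if report_repetition:
        old_count, new_count = Counter(old_rows), Counter(new_rows)
        removed = sorted((old_count - new_count).elements())
        added = sorted((new_count - old_count).elements())
    else:
        removed = sorted(set(old_rows) - set(new_rows))
        added = sorted(set(new_rows) - set(old_rows))
    return removed, added


t1, t3, t4 = load_rows(T1), load_rows(T3), load_rows(T4)
diff_t3 = unordered_diff(t1, t3)
diff_t4_sets = unordered_diff(t1, t4)
diff_t4_multi = unordered_diff(t1, t4, report_repetition=True)
print(diff_t3)
print(diff_t4_sets)
print(diff_t4_multi)
```

In this sketch t3 and t4 look identical when compared as sets; only the multiset comparison reveals that t4 carries the changed row twice.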
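
Slides 22 to 31 create patch1 and patch2 from CSV files and then apply them to a different file, some.yaml. A minimal sketch of the underlying idea, a delta addressed by path rather than by line, might look like this. The delta layout (tuple paths, `values_changed`, `items_added`) and `apply_delta` are assumptions made for illustration; they are not the format DeepDiff writes with `--create-patch`.

```python
import copy

# Records from slide 28 (some.yaml), written as Python literals.
some_yaml = [
    {"first_name": "Joe", "last_name": "Nobody", "address": "3232 Main st.",
     "phone": "323-123-2345", "zip": 90011},
    {"first_name": "Jimmy", "last_name": "Brown", "address": "11th There sq.",
     "phone": "111-123-9911", "zip": 90002},
    {"first_name": "Sara", "last_name": "Smith", "address": "Downtown LA",
     "phone": "818-113-2005", "zip": 90007},
]

# Hypothetical delta layout: tuple paths stand in for "root[1]['zip']".
patch1 = {"values_changed": {(1, "zip"): "90404"}}
patch2 = {"items_added": [
    {"first_name": "John", "last_name": "Doe", "zip": "90001"},
    {"first_name": "Max", "last_name": "Foo", "zip": "23232"},
]}


def apply_delta(data, delta):
    """Return a patched deep copy of data; the input is left untouched.

    Assumes the root object is a list, as in the deck's examples.
    """
    result = copy.deepcopy(data)
    for path, new_value in delta.get("values_changed", {}).items():
        target = result
        for key in path[:-1]:  # walk down to the parent container
            target = target[key]
        target[path[-1]] = new_value
    result.extend(copy.deepcopy(delta.get("items_added", [])))
    return result


after_patch1 = apply_delta(some_yaml, patch1)
after_patch2 = apply_delta(after_patch1, patch2)
print(after_patch1[1]["zip"])
print(len(after_patch2))
```

Because the delta addresses `root[1]['zip']` rather than a line of text, it applies to the YAML records even though they carry extra `address` and `phone` fields, mirroring slides 30 and 31.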
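
Slides 33 to 36 search some.yaml for "sara" and then extract the matching record by its path. The sketch below imitates that path notation with plain Python. `deep_grep` and `deep_extract` are hypothetical helpers written for this illustration and are not the `deepgrep` and `deepextract` implementations.

```python
import re

people = [
    {"first_name": "Joe", "last_name": "Nobody", "address": "3232 Main st.",
     "phone": "323-123-2345", "zip": 90011},
    {"first_name": "Jimmy", "last_name": "Brown", "address": "11th There sq.",
     "phone": "111-123-9911", "zip": 90002},
    {"first_name": "Sara", "last_name": "Smith", "address": "Downtown LA",
     "phone": "818-113-2005", "zip": 90007},
]


def deep_grep(obj, needle, case_sensitive=True, path="root"):
    """Yield the path of every string value that contains needle."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from deep_grep(value, needle, case_sensitive, f"{path}[{key!r}]")
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            yield from deep_grep(value, needle, case_sensitive, f"{path}[{index}]")
    elif isinstance(obj, str):
        haystack = obj if case_sensitive else obj.lower()
        wanted = needle if case_sensitive else needle.lower()
        if wanted in haystack:
            yield path


# Matches either [123] (list index) or ['name'] (string key).
_TOKEN = re.compile(r"\[(\d+)\]|\['([^']*)'\]")


def deep_extract(obj, path):
    """Follow a path such as "root[2]['first_name']" and return the value.

    Simplification: keys containing quotes, and empty-string keys, are not handled.
    """
    if not path.startswith("root"):
        raise ValueError("path must start with 'root'")
    for index, key in _TOKEN.findall(path[len("root"):]):
        obj = obj[int(index)] if index else obj[key]
    return obj


matches = list(deep_grep(people, "sara", case_sensitive=False))
record = deep_extract(people, "root[2]")
value = deep_extract(people, matches[0])
print(matches)
print(record)
print(value)
```

Here the search returns paths rather than values, so the same path string can be handed to the extraction step, which is the workflow the two slides walk through.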