Managing XML and Semistructured Data Lecture 19: Compressing XML Data Prof. Dan Suciu Spring 2001
In this lecture
Compression: The Problem
An Example: Web Server Logs
202.239.238.16|GET / HTTP/1.0|text/html|200|1997/10/01-00:00:02|-|4478|-|-|http://www.net.jp/|Mozilla/3.1[ja](I)
ASCII file: 15.9 MB (gzipped: 1.6 MB)
XML-ized, it inflates to 24.2 MB (gzipped: 2.1 MB)
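The inflation from XML-izing a log can be seen on a single line. The following is a minimal sketch, not the converter used in the lecture: only `apache:entry` and `apache:host` appear on the slides, so every other element name (and the helper `xmlize`) is a hypothetical placeholder.

```python
from xml.sax.saxutils import escape

# Hypothetical element names: only apache:entry and apache:host appear on the
# slides; the rest are placeholders chosen for this sketch.
FIELD_NAMES = [
    "host", "requestLine", "contentType", "statusCode", "date",
    "field6", "byteCount", "field8", "field9", "referer", "userAgent",
]

def xmlize(line: str) -> str:
    """Wrap each '|'-separated field of one log line in its own element."""
    values = line.split("|")
    parts = ["<apache:entry>"]
    for name, value in zip(FIELD_NAMES, values):
        parts.append(f"<apache:{name}>{escape(value)}</apache:{name}>")
    parts.append("</apache:entry>")
    return "".join(parts)

# The sample log line from the slide.
line = ("202.239.238.16|GET / HTTP/1.0|text/html|200|1997/10/01-00:00:02"
        "|-|4478|-|-|http://www.net.jp/|Mozilla/3.1[ja](I)")
xml = xmlize(line)

# Every value gains an opening and a closing tag, so the XML form is longer.
print(len(line), len(xml))
```

Because the tags repeat identically on every entry, a general-purpose compressor removes much of this overhead, which fits the slide's figures: the gzipped XML is only modestly larger than the gzipped ASCII file.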
XMill
How XMill Works: Three Ideas
Compress the structure separately from the data:
<apache:entry> <apache:host> </apache:host> . . . </apache:entry>
202.239.238.16 GET / HTTP/1.0 text/html 200 …
gzip Structure + gzip Data = 1.75 MB
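Separating structure from data can be sketched as a tree walk that emits two streams and gzips each on its own. This is a simplified illustration under stated assumptions, not XMill's actual encoding: the token format (`<tag`, `#`, `/`) and the function names are invented here, and attributes and tail text are ignored.

```python
import gzip
import xml.etree.ElementTree as ET

def separate(xml_text):
    """Split a document into a structure token list and a list of text values."""
    structure, data = [], []

    def walk(elem):
        structure.append("<" + elem.tag)      # start-tag token
        if elem.text and elem.text.strip():
            structure.append("#")             # placeholder: a data value goes here
            data.append(elem.text.strip())
        for child in elem:
            walk(child)
        structure.append("/")                 # end-tag token

    walk(ET.fromstring(xml_text))
    return structure, data

def compress_separately(xml_text):
    """Gzip the structure stream and the data stream independently."""
    structure, data = separate(xml_text)
    return (gzip.compress(" ".join(structure).encode("utf-8")),
            gzip.compress("\n".join(data).encode("utf-8")))

doc = "<entry><host>202.239.238.16</host><status>200</status></entry>"
structure, data = separate(doc)
print(structure)
print(data)
```

The structure stream is highly repetitive across entries, so it compresses well once it is no longer interleaved with the values.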
How XMill Works: Three Ideas
Group the data values according to their types:
<apache:entry> . . . </apache:entry>
202.23.23.16 224.42.24.55 …
GET / HTTP/1.0 GET / HTTP/1.1 …
gzip Structure + gzip Data1 + gzip Data2 = 1.33 MB
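Grouping values before compression can be sketched as follows. The slide groups values "according to their types"; this sketch uses the enclosing tag name as a stand-in for the type, which is an assumption, and the function names are hypothetical.

```python
import gzip
from collections import defaultdict
import xml.etree.ElementTree as ET

def group_by_tag(xml_text):
    """Collect text values into one container per enclosing tag name."""
    containers = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        if elem.text and elem.text.strip():
            containers[elem.tag].append(elem.text.strip())
    return dict(containers)

def compress_containers(containers):
    """Gzip each container on its own so similar values sit next to each other."""
    return {tag: gzip.compress("\n".join(values).encode("utf-8"))
            for tag, values in containers.items()}

doc = ("<log>"
       "<entry><host>202.23.23.16</host><request>GET / HTTP/1.0</request></entry>"
       "<entry><host>224.42.24.55</host><request>GET / HTTP/1.1</request></entry>"
       "</log>")
containers = group_by_tag(doc)
print(containers)
```

Values of the same kind tend to share prefixes and character distributions, so placing them in the same stream gives the compressor more redundancy to exploit than the original interleaved order.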
How XMill Works: Three Ideas
Apply semantic (specialized) compressors:
gzip Structure + gzip c1(Data1) + gzip c2(Data2) + ... = 0.82 MB
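A semantic compressor rewrites a container into a denser binary form before gzip sees it. The slide's list of specific compressors did not survive, so the two encoders below (IPv4 addresses as 4 bytes, small integers as 2 bytes) are assumptions chosen for illustration, standing in for c1 and c2.

```python
import gzip
import ipaddress
import struct

def pack_ips(values):
    """Encode dotted-quad IPv4 strings as 4 bytes each."""
    return b"".join(ipaddress.IPv4Address(v).packed for v in values)

def unpack_ips(blob):
    """Inverse of pack_ips."""
    return [str(ipaddress.IPv4Address(blob[i:i + 4]))
            for i in range(0, len(blob), 4)]

def pack_uint16(values):
    """Encode decimal strings in the range 0..65535 as 2 big-endian bytes each."""
    return b"".join(struct.pack(">H", int(v)) for v in values)

def unpack_uint16(blob):
    """Inverse of pack_uint16."""
    return [str(struct.unpack(">H", blob[i:i + 2])[0])
            for i in range(0, len(blob), 2)]

hosts = ["202.23.23.16", "224.42.24.55"]
codes = ["200", "404"]

# gzip c1(Data1) and gzip c2(Data2), in the slide's notation.
compressed_hosts = gzip.compress(pack_ips(hosts))
compressed_codes = gzip.compress(pack_uint16(codes))
```

Each encoder must be lossless for the values it accepts; a value that does not fit the expected format (for example a hostname in the address container) would need a fallback to plain text, which this sketch does not handle.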
XML Compression
Compression Tradeoff
Summary of XML Data Management
