SlideShare a Scribd company logo
Data compression with Python: application of differentData compression with Python: application of different
algorithms with the use of threads in genome filesalgorithms with the use of threads in genome files
Vinicius Seus¹ (viniciusseus@gmail.com)
Alex Camargo¹ (alexcamargoweb@gmail.com)
Diego Mengarda² (diegormengarda@gmail.com)
¹FURG
²UNIPAMPA
Brazil
MOL2NET
International Conference Series on Multidisciplinary Sciences
http://sciforum.net/conference/mol2net-03
2
Introduction
This work proposes and evaluates an implementation of different
algorithms of data compression using the Python programming
language allied to the use of threads.
 As a case study it was used genomic data available from the
NCBI (National Center for Biotechnology Information) public
database.
MOL2NET 2017, International Conference on Multidisciplinary Sciences, 3rd edition
3
Introduction
MOL2NET 2017, International Conference on Multidisciplinary Sciences, 3rd edition
Figure 1. Graphical Abstract
4
Materials and Methods
For the experimental environment it was used a simplified structure
composed of a computer with: 3.07GHz processor (12 cores), 12 GB
RAM and 500 GB hard disk.
 Table 1 shows the results of the experiments for the file
"ref_ASM45574v1_gnomon_scaffolds.txt" with a total size of
122 MB, referring to an excerpt from the genome identified by
"Aligator sinensis", belonging to the family Alligatoridae.
MOL2NET 2017, International Conference on Multidisciplinary Sciences, 3rd edition
5
Conclusions
The main contribution of this work was to present an algorithm
option for data compression based on the Python
programming language.
 By default, the algorithms were not designed to work in
parallel, however, with the use of the Python Threading library
this was achieved.
 With the experimental environment implemented, it was
possible to analyze the performance of both the compression
rate and the compression time for each algorithm.
 As future works, we intend to extend the range of algorithms to
be studied as well as the application and analysis of
decompression metrics with emphasis on public genomic data.
MOL2NET 2017, International Conference on Multidisciplinary Sciences, 3rd edition
6
References
BURROWS, Michael; WHEELER, David J. A block-sorting lossless data compression
algorithm. 1994.
VERLI, Hugo et al. Bioinformática da Biologia à flexibilidade molecular. Porto
Alegre, Brasil, v. 1, 2014.
WELCH, Terry A. A technique for high-performance data compression. Computer,
v. 6, n. 17, p. 8-19, 1984.
MOL2NET 2017, International Conference on Multidisciplinary Sciences, 3rd edition

More Related Content

Similar to Data compression with Python: application of different algorithms with the use of threads in genome files

Optimizing queries via search server ElasticSearch: a study applied to large ...
Optimizing queries via search server ElasticSearch: a study applied to large ...Optimizing queries via search server ElasticSearch: a study applied to large ...
Optimizing queries via search server ElasticSearch: a study applied to large ...
Alex Camargo
 
Computational of Bioinformatics
Computational of BioinformaticsComputational of Bioinformatics
Computational of Bioinformatics
ijtsrd
 
Bio-UnaGrid: Easing bioinformatics workflow execution
Bio-UnaGrid: Easing bioinformatics workflow executionBio-UnaGrid: Easing bioinformatics workflow execution
Bio-UnaGrid: Easing bioinformatics workflow execution
Mario Jose Villamizar Cano
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
Edureka!
 
A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...
CSCJournals
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
c.titus.brown
 
Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?
Paolo Romano
 
Pine education-platform
Pine education-platformPine education-platform
Pine education-platform
Jaclyn Williams
 
A N E XTENSION OF P ROTÉGÉ FOR AN AUTOMA TIC F UZZY - O NTOLOGY BUILDING U...
A N  E XTENSION OF  P ROTÉGÉ FOR AN AUTOMA TIC  F UZZY - O NTOLOGY BUILDING U...A N  E XTENSION OF  P ROTÉGÉ FOR AN AUTOMA TIC  F UZZY - O NTOLOGY BUILDING U...
A N E XTENSION OF P ROTÉGÉ FOR AN AUTOMA TIC F UZZY - O NTOLOGY BUILDING U...
ijcsit
 
Accelerating GWAS epistatic interaction analysis methods
Accelerating GWAS epistatic interaction analysis methodsAccelerating GWAS epistatic interaction analysis methods
Accelerating GWAS epistatic interaction analysis methods
Priscill Orue Esquivel
 
Towards reproducibility and maximally-open data
Towards reproducibility and maximally-open dataTowards reproducibility and maximally-open data
Towards reproducibility and maximally-open data
Pablo Bernabeu
 
The BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchThe BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative research
Blue BRIDGE
 
USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT
USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT
USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT
ecij
 
USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT
USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT
USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT
ecij
 
Book of abstract volume 8 no 9 ijcsis december 2010
Book of abstract volume 8 no 9 ijcsis december 2010Book of abstract volume 8 no 9 ijcsis december 2010
Book of abstract volume 8 no 9 ijcsis december 2010
Oladokun Sulaiman
 
OpenTox Europe 2013
OpenTox Europe 2013OpenTox Europe 2013
OpenTox Europe 2013
Alejandra Gonzalez-Beltran
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
Standards and tools for model management in biomedical research
Standards and tools for model management in biomedical researchStandards and tools for model management in biomedical research
Standards and tools for model management in biomedical research
University Medicine Greifswald
 
The Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text TechnologiesThe Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text Technologies
International Journal of Modern Research in Engineering and Technology
 
short-story.pptx
short-story.pptxshort-story.pptx
short-story.pptx
SravaniRaparla
 

Similar to Data compression with Python: application of different algorithms with the use of threads in genome files (20)

Optimizing queries via search server ElasticSearch: a study applied to large ...
Optimizing queries via search server ElasticSearch: a study applied to large ...Optimizing queries via search server ElasticSearch: a study applied to large ...
Optimizing queries via search server ElasticSearch: a study applied to large ...
 
Computational of Bioinformatics
Computational of BioinformaticsComputational of Bioinformatics
Computational of Bioinformatics
 
Bio-UnaGrid: Easing bioinformatics workflow execution
Bio-UnaGrid: Easing bioinformatics workflow executionBio-UnaGrid: Easing bioinformatics workflow execution
Bio-UnaGrid: Easing bioinformatics workflow execution
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
 
A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?
 
Pine education-platform
Pine education-platformPine education-platform
Pine education-platform
 
A N E XTENSION OF P ROTÉGÉ FOR AN AUTOMA TIC F UZZY - O NTOLOGY BUILDING U...
A N  E XTENSION OF  P ROTÉGÉ FOR AN AUTOMA TIC  F UZZY - O NTOLOGY BUILDING U...A N  E XTENSION OF  P ROTÉGÉ FOR AN AUTOMA TIC  F UZZY - O NTOLOGY BUILDING U...
A N E XTENSION OF P ROTÉGÉ FOR AN AUTOMA TIC F UZZY - O NTOLOGY BUILDING U...
 
Accelerating GWAS epistatic interaction analysis methods
Accelerating GWAS epistatic interaction analysis methodsAccelerating GWAS epistatic interaction analysis methods
Accelerating GWAS epistatic interaction analysis methods
 
Towards reproducibility and maximally-open data
Towards reproducibility and maximally-open dataTowards reproducibility and maximally-open data
Towards reproducibility and maximally-open data
 
The BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchThe BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative research
 
USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT
USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT
USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT
 
USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT
USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT
USING MACHINE LEARNING TO BUILD A SEMI-INTELLIGENT BOT
 
Book of abstract volume 8 no 9 ijcsis december 2010
Book of abstract volume 8 no 9 ijcsis december 2010Book of abstract volume 8 no 9 ijcsis december 2010
Book of abstract volume 8 no 9 ijcsis december 2010
 
OpenTox Europe 2013
OpenTox Europe 2013OpenTox Europe 2013
OpenTox Europe 2013
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Standards and tools for model management in biomedical research
Standards and tools for model management in biomedical researchStandards and tools for model management in biomedical research
Standards and tools for model management in biomedical research
 
The Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text TechnologiesThe Value and Benefits of Data-to-Text Technologies
The Value and Benefits of Data-to-Text Technologies
 
short-story.pptx
short-story.pptxshort-story.pptx
short-story.pptx
 

More from Alex Camargo

Escola Bíblica - Eclesiologia
Escola Bíblica - EclesiologiaEscola Bíblica - Eclesiologia
Escola Bíblica - Eclesiologia
Alex Camargo
 
Escola Bíblica - Demonologia
Escola Bíblica - DemonologiaEscola Bíblica - Demonologia
Escola Bíblica - Demonologia
Alex Camargo
 
Python para finanças: explorando dados financeiros
Python para finanças: explorando dados financeirosPython para finanças: explorando dados financeiros
Python para finanças: explorando dados financeiros
Alex Camargo
 
A practical guide: How to use Bitcoins?
A practical guide: How to use Bitcoins?A practical guide: How to use Bitcoins?
A practical guide: How to use Bitcoins?
Alex Camargo
 
IA e Bioinformática: modelos computacionais de proteínas
IA e Bioinformática: modelos computacionais de proteínasIA e Bioinformática: modelos computacionais de proteínas
IA e Bioinformática: modelos computacionais de proteínas
Alex Camargo
 
Introdução às criptomoedas: investimento, mercado e segurança
Introdução às criptomoedas: investimento, mercado e segurançaIntrodução às criptomoedas: investimento, mercado e segurança
Introdução às criptomoedas: investimento, mercado e segurança
Alex Camargo
 
Introdução às criptomoedas: criando a sua própria moeda como o Bitcoin!
Introdução às criptomoedas:  criando a sua própria moeda como o Bitcoin!Introdução às criptomoedas:  criando a sua própria moeda como o Bitcoin!
Introdução às criptomoedas: criando a sua própria moeda como o Bitcoin!
Alex Camargo
 
Cristão versus Redes Sociais - Alex (Arca da Aliança)
Cristão versus Redes Sociais - Alex (Arca da Aliança)Cristão versus Redes Sociais - Alex (Arca da Aliança)
Cristão versus Redes Sociais - Alex (Arca da Aliança)
Alex Camargo
 
Empatia e compaixão: O Bom Samaritano
Empatia e compaixão: O Bom SamaritanoEmpatia e compaixão: O Bom Samaritano
Empatia e compaixão: O Bom Samaritano
Alex Camargo
 
Alta performance em IA: uma abordagem pratica
Alta performance em IA: uma abordagem praticaAlta performance em IA: uma abordagem pratica
Alta performance em IA: uma abordagem pratica
Alex Camargo
 
Bioinformática do DNA ao medicamento: ferramentas e usabilidade
Bioinformática do DNA ao medicamento: ferramentas e usabilidadeBioinformática do DNA ao medicamento: ferramentas e usabilidade
Bioinformática do DNA ao medicamento: ferramentas e usabilidade
Alex Camargo
 
Inteligência Artificial aplicada: reconhecendo caracteres escritos à mão
Inteligência Artificial aplicada: reconhecendo caracteres escritos à mãoInteligência Artificial aplicada: reconhecendo caracteres escritos à mão
Inteligência Artificial aplicada: reconhecendo caracteres escritos à mão
Alex Camargo
 
IA versus COVID-19 Deep Learning, Códigos e Execução em nuvem (Tchelinux 2020)
IA versus COVID-19 Deep Learning, Códigos e Execução em nuvem (Tchelinux 2020)IA versus COVID-19 Deep Learning, Códigos e Execução em nuvem (Tchelinux 2020)
IA versus COVID-19 Deep Learning, Códigos e Execução em nuvem (Tchelinux 2020)
Alex Camargo
 
Algoritmos de inteligência artificial para classificação de notícias falsas. ...
Algoritmos de inteligência artificial para classificação de notícias falsas. ...Algoritmos de inteligência artificial para classificação de notícias falsas. ...
Algoritmos de inteligência artificial para classificação de notícias falsas. ...
Alex Camargo
 
Fake News - Conceitos, métodos e aplicações de identificação e mitigação
Fake News - Conceitos, métodos e aplicações de identificação e mitigaçãoFake News - Conceitos, métodos e aplicações de identificação e mitigação
Fake News - Conceitos, métodos e aplicações de identificação e mitigação
Alex Camargo
 
PredictCovid: IA. SIEPE UNIPAMPA 2020
PredictCovid: IA. SIEPE UNIPAMPA 2020PredictCovid: IA. SIEPE UNIPAMPA 2020
PredictCovid: IA. SIEPE UNIPAMPA 2020
Alex Camargo
 
Ia versus covid 19 - alex
Ia versus covid 19 - alexIa versus covid 19 - alex
Ia versus covid 19 - alex
Alex Camargo
 
2a Mini-conf PredictCovid. Field: Artificial Intelligence
2a Mini-conf PredictCovid. Field: Artificial Intelligence2a Mini-conf PredictCovid. Field: Artificial Intelligence
2a Mini-conf PredictCovid. Field: Artificial Intelligence
Alex Camargo
 
Aula 5 - Considerações finais
Aula 5 - Considerações finaisAula 5 - Considerações finais
Aula 5 - Considerações finais
Alex Camargo
 
Aula 04 - Injeção de código (Cross-Site Scripting)
Aula 04 - Injeção de código (Cross-Site Scripting)Aula 04 - Injeção de código (Cross-Site Scripting)
Aula 04 - Injeção de código (Cross-Site Scripting)
Alex Camargo
 

More from Alex Camargo (20)

Escola Bíblica - Eclesiologia
Escola Bíblica - EclesiologiaEscola Bíblica - Eclesiologia
Escola Bíblica - Eclesiologia
 
Escola Bíblica - Demonologia
Escola Bíblica - DemonologiaEscola Bíblica - Demonologia
Escola Bíblica - Demonologia
 
Python para finanças: explorando dados financeiros
Python para finanças: explorando dados financeirosPython para finanças: explorando dados financeiros
Python para finanças: explorando dados financeiros
 
A practical guide: How to use Bitcoins?
A practical guide: How to use Bitcoins?A practical guide: How to use Bitcoins?
A practical guide: How to use Bitcoins?
 
IA e Bioinformática: modelos computacionais de proteínas
IA e Bioinformática: modelos computacionais de proteínasIA e Bioinformática: modelos computacionais de proteínas
IA e Bioinformática: modelos computacionais de proteínas
 
Introdução às criptomoedas: investimento, mercado e segurança
Introdução às criptomoedas: investimento, mercado e segurançaIntrodução às criptomoedas: investimento, mercado e segurança
Introdução às criptomoedas: investimento, mercado e segurança
 
Introdução às criptomoedas: criando a sua própria moeda como o Bitcoin!
Introdução às criptomoedas:  criando a sua própria moeda como o Bitcoin!Introdução às criptomoedas:  criando a sua própria moeda como o Bitcoin!
Introdução às criptomoedas: criando a sua própria moeda como o Bitcoin!
 
Cristão versus Redes Sociais - Alex (Arca da Aliança)
Cristão versus Redes Sociais - Alex (Arca da Aliança)Cristão versus Redes Sociais - Alex (Arca da Aliança)
Cristão versus Redes Sociais - Alex (Arca da Aliança)
 
Empatia e compaixão: O Bom Samaritano
Empatia e compaixão: O Bom SamaritanoEmpatia e compaixão: O Bom Samaritano
Empatia e compaixão: O Bom Samaritano
 
Alta performance em IA: uma abordagem pratica
Alta performance em IA: uma abordagem praticaAlta performance em IA: uma abordagem pratica
Alta performance em IA: uma abordagem pratica
 
Bioinformática do DNA ao medicamento: ferramentas e usabilidade
Bioinformática do DNA ao medicamento: ferramentas e usabilidadeBioinformática do DNA ao medicamento: ferramentas e usabilidade
Bioinformática do DNA ao medicamento: ferramentas e usabilidade
 
Inteligência Artificial aplicada: reconhecendo caracteres escritos à mão
Inteligência Artificial aplicada: reconhecendo caracteres escritos à mãoInteligência Artificial aplicada: reconhecendo caracteres escritos à mão
Inteligência Artificial aplicada: reconhecendo caracteres escritos à mão
 
IA versus COVID-19 Deep Learning, Códigos e Execução em nuvem (Tchelinux 2020)
IA versus COVID-19 Deep Learning, Códigos e Execução em nuvem (Tchelinux 2020)IA versus COVID-19 Deep Learning, Códigos e Execução em nuvem (Tchelinux 2020)
IA versus COVID-19 Deep Learning, Códigos e Execução em nuvem (Tchelinux 2020)
 
Algoritmos de inteligência artificial para classificação de notícias falsas. ...
Algoritmos de inteligência artificial para classificação de notícias falsas. ...Algoritmos de inteligência artificial para classificação de notícias falsas. ...
Algoritmos de inteligência artificial para classificação de notícias falsas. ...
 
Fake News - Conceitos, métodos e aplicações de identificação e mitigação
Fake News - Conceitos, métodos e aplicações de identificação e mitigaçãoFake News - Conceitos, métodos e aplicações de identificação e mitigação
Fake News - Conceitos, métodos e aplicações de identificação e mitigação
 
PredictCovid: IA. SIEPE UNIPAMPA 2020
PredictCovid: IA. SIEPE UNIPAMPA 2020PredictCovid: IA. SIEPE UNIPAMPA 2020
PredictCovid: IA. SIEPE UNIPAMPA 2020
 
Ia versus covid 19 - alex
Ia versus covid 19 - alexIa versus covid 19 - alex
Ia versus covid 19 - alex
 
2a Mini-conf PredictCovid. Field: Artificial Intelligence
2a Mini-conf PredictCovid. Field: Artificial Intelligence2a Mini-conf PredictCovid. Field: Artificial Intelligence
2a Mini-conf PredictCovid. Field: Artificial Intelligence
 
Aula 5 - Considerações finais
Aula 5 - Considerações finaisAula 5 - Considerações finais
Aula 5 - Considerações finais
 
Aula 04 - Injeção de código (Cross-Site Scripting)
Aula 04 - Injeção de código (Cross-Site Scripting)Aula 04 - Injeção de código (Cross-Site Scripting)
Aula 04 - Injeção de código (Cross-Site Scripting)
 

Recently uploaded

Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
Vietnam Cotton & Spinning Association
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
aguty
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
Vietnam Cotton & Spinning Association
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
agdhot
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
vasanthatpuram
 
一比一原版(UofT毕业证)多伦多大学毕业证如何办理
一比一原版(UofT毕业证)多伦多大学毕业证如何办理一比一原版(UofT毕业证)多伦多大学毕业证如何办理
一比一原版(UofT毕业证)多伦多大学毕业证如何办理
exukyp
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
nyvan3
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
Data Scientist Machine Learning Profiles .pdf
Data Scientist Machine Learning  Profiles .pdfData Scientist Machine Learning  Profiles .pdf
Data Scientist Machine Learning Profiles .pdf
Vineet
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
zsafxbf
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
blueshagoo1
 
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理 原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
tzu5xla
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
oaxefes
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Marlon Dumas
 
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative ClassifiersML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
MastanaihnaiduYasam
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 

Recently uploaded (20)

Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
 
一比一原版(UofT毕业证)多伦多大学毕业证如何办理
一比一原版(UofT毕业证)多伦多大学毕业证如何办理一比一原版(UofT毕业证)多伦多大学毕业证如何办理
一比一原版(UofT毕业证)多伦多大学毕业证如何办理
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
Data Scientist Machine Learning Profiles .pdf
Data Scientist Machine Learning  Profiles .pdfData Scientist Machine Learning  Profiles .pdf
Data Scientist Machine Learning Profiles .pdf
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
 
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理 原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
 
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative ClassifiersML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
 

Data compression with Python: application of different algorithms with the use of threads in genome files

  • 1. Data compression with Python: application of differentData compression with Python: application of different algorithms with the use of threads in genome filesalgorithms with the use of threads in genome files Vinicius Seus¹ (viniciusseus@gmail.com) Alex Camargo¹ (alexcamargoweb@gmail.com) Diego Mengarda² (diegormengarda@gmail.com) ¹FURG ²UNIPAMPA Brazil MOL2NET International Conference Series on Multidisciplinary Sciences http://sciforum.net/conference/mol2net-03
  • 2. 2 Introduction This work proposes and evaluates an implementation of different algorithms of data compression using the Python programming language allied to the use of threads.  As a case study it was used genomic data available from the NCBI (National Center for Biotechnology Information) public database. MOL2NET 2017, International Conference on Multidisciplinary Sciences, 3rd edition
  • 3. 3 Introduction MOL2NET 2017, International Conference on Multidisciplinary Sciences, 3rd edition Figure 1. Graphical Abstract
  • 4. 4 Materials and Methods For the experimental environment it was used a simplified structure composed of a computer with: 3.07GHz processor (12 cores), 12 GB RAM and 500 GB hard disk.  Table 1 shows the results of the experiments for the file "ref_ASM45574v1_gnomon_scaffolds.txt" with a total size of 122 MB, referring to an excerpt from the genome identified by "Aligator sinensis", belonging to the family Alligatoridae. MOL2NET 2017, International Conference on Multidisciplinary Sciences, 3rd edition
  • 5. 5 Conclusions The main contribution of this work was to present an algorithm option for data compression based on the Python programming language.  By default, the algorithms were not designed to work in parallel, however, with the use of the Python Threading library this was achieved.  With the experimental environment implemented, it was possible to analyze the performance of both the compression rate and the compression time for each algorithm.  As future works, we intend to extend the range of algorithms to be studied as well as the application and analysis of decompression metrics with emphasis on public genomic data. MOL2NET 2017, International Conference on Multidisciplinary Sciences, 3rd edition
  • 6. 6 References BURROWS, Michael; WHEELER, David J. A block-sorting lossless data compression algorithm. 1994. VERLI, Hugo et al. Bioinformática da Biologia à flexibilidade molecular. Porto Alegre, Brasil, v. 1, 2014. WELCH, Terry A. A technique for high-performance data compression. Computer, v. 6, n. 17, p. 8-19, 1984. MOL2NET 2017, International Conference on Multidisciplinary Sciences, 3rd edition