DATA SCIENCE SERVICES
www.yleaf.co
Yellow
Leaf
Software

YLS lifecycle for data products
1. Define business question
2. Obtain the data (from any available source)
3. Clean the data for a working dataset sample
4. Perform exploratory data analysis
5. Apply prediction modelling / machine learning
6. Interpret and challenge results
7. Define visualization approach
8. Create production software

Defining question
Brainstorming
Interview
Problem analysis with five whys
Initial rough data set exploratory research

Obtaining the data (mining)
Web scraping (Python/Scrapy, Mechanize, Selenium)
DB extraction and manipulation (ETL)
Usage of existing research data (Kaggle, Quandl, public scientific articles and research results)
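
The "DB extraction and manipulation (ETL)" item can be illustrated with a minimal sketch using only the Python standard library. The CSV text, the `patients` table, and the column names are hypothetical placeholders, not data from YLS projects; a real pipeline would read from a production database, an API, or a scraper.

```python
import csv
import io
import sqlite3

# Hypothetical raw export (assumption); stands in for a file, API, or scraper output.
RAW_CSV = """patient_id,city,age
1,Kyiv,34
2,Lviv,
3,Kyiv,51
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: drop rows with a missing age and convert field types."""
    return [
        (int(r["patient_id"]), r["city"], int(r["age"]))
        for r in records
        if r["age"].strip()
    ]

def load(conn, records):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patients "
        "(patient_id INTEGER PRIMARY KEY, city TEXT, age INTEGER)"
    )
    conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# Count the loaded rows per city.
rows = conn.execute(
    "SELECT city, COUNT(*) FROM patients GROUP BY city ORDER BY city"
).fetchall()
print(rows)
```

Keeping extract, transform, and load as separate functions makes it easy to swap the source (for example, a scraper) without touching the rest.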

Cleaning data (munging)
Early-stage noise reduction (removing empty or irrelevant fields, etc.)
DB manipulation (table joining, switching formats, etc.)
Preparing the right table formats for analysis (e.g. factorization)
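
Two of the munging steps above can be sketched in plain Python: removing fields that are empty in every row, and factorization, assuming that term here means encoding categorical values as integer codes. The sample rows and field names are made up for illustration.

```python
def is_blank(value):
    """True for None or a string that is empty after stripping whitespace."""
    return value is None or str(value).strip() == ""

def drop_empty_columns(rows):
    """Remove columns whose value is blank in every row."""
    keep = [k for k in rows[0] if not all(is_blank(r[k]) for r in rows)]
    return [{k: r[k] for k in keep} for r in rows]

def factorize(values):
    """Map each distinct value to an integer code, in order of first appearance."""
    codes = {}
    encoded = [codes.setdefault(v, len(codes)) for v in values]
    return encoded, list(codes)

# Hypothetical survey rows (assumption); "notes" carries no information.
rows = [
    {"id": 1, "smoker": "yes", "notes": "", "region": "urban"},
    {"id": 2, "smoker": "no", "notes": None, "region": "rural"},
    {"id": 3, "smoker": "yes", "notes": " ", "region": "urban"},
]

cleaned = drop_empty_columns(rows)
codes, categories = factorize([r["region"] for r in cleaned])
print(list(cleaned[0]), codes, categories)
```

With a data-frame library the same steps are usually one-liners; the plain version is shown only to make the logic explicit.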

Exploratory data analysis
Correlation analysis
Clustering
Dimension reduction
Other visualization techniques
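
As an illustration of the correlation analysis item, here is a small sketch of the Pearson correlation coefficient in plain Python. The two sample series are invented for the example and do not come from any YLS dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical observations (assumption): habit intensity vs. a severity score.
cigarettes_per_day = [0, 5, 10, 20]
severity_score = [1, 2, 2, 5]

r = pearson(cigarettes_per_day, severity_score)
print(round(r, 3))
```

A value near 1 or -1 indicates a strong linear relationship; it does not by itself show causation.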

Prediction modeling and ML
1. Regression analysis and modelling
2. Random forest
3. SVM
4. Optical character recognition (multi-class SVM, TensorFlow, ABBYY Cloud OCR)
5. Natural language processing
6. Neural network algorithms
7. Scikit-learn, NumPy, TensorFlow software packages
8. Other supervised and unsupervised learning techniques
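
For the first item in the list above, regression analysis, a minimal sketch of ordinary least squares for a single predictor is shown below. It is written in plain Python for clarity; the sample points are invented, and in practice a package such as scikit-learn would normally be used instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical measurements (assumption), roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
fit_quality = r_squared(xs, ys, slope, intercept)
print(slope, intercept, fit_quality)
```

The fitted line can then be used for prediction with `slope * x + intercept` for a new `x`.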

Exploratory data analysis
D3.js
GoogleVis
Chart.js
Highcharts
Other libs for charts and graphs

YLSpectro POC case study
Random forest practical usage
Human voice spectrum comparator
Bad habits and place of living correlation with disease severity

Data science services YLS