SlideShare a Scribd company logo
1 of 11
Download to read offline
Research Toolbox - Data Analysis with Python
A Waternomics Case Study
Umair ul Hassan
Agenda
 An overview of Python ecosystem
 Waternomics case study
 Data Access
 Data Manipulation
 Data Visualization
 Tips & Tricks
 Advanced Libraries
 Q & A
2
The Python Language
 According to Wikipedia
3
a widely used high-level, general-purpose, interpreted, dynamic
programming language. Its design philosophy emphasizes code
readability, and its syntax allows programmers to express concepts
in fewer lines of code
Python Distribution
 Official open source interpreter is CPython available at www.python.org
 A distribution packages a set of python tools, modules and libraries to simplify
setup and installation
4
Waternomics Case Study
 Linked Water Dataspace
5
Extract Transform
Load
NEB BMS AWS S3
RDF Data
Load
DRUID
DRUID
Transform
OpenCube
Data Access
 Simple file IO functions
 open, read, write
 Pandas
 read_csv, read_excel, read_hdf, read_sql, read_json,
read_msgpack, read_html, read_gbq, read_stata, read_sas,
read_clipboard, read_pickle
 For writing replace “read” with “to” e.g. to_csv
 RDFlib
 parse, serialize
 Requests (for HTTP/HTTPS)
 get, post, put, delete, head, options
 json
 dumps, loads
6
Data Manipulation
 Numpy
 Base N-dimensional array package
 Pandas
 Data structures & analysis
 Allows multi-dimensional OLAP like operations
 Scipy
 Set of package for mathematics, science, and engineering
 Integration, optimization, signal processing, linear algebra,
image processing, spatial data analysis, etc
 Statsmodels
 Statistical models, tests, and analysis
7
Data visualization
 Matplotlib
 Library for 2D Plotting
 Allows export to images
 Seaborn
 Attractive visualization using matplotlib
 Use themes for appealing graphs
 Bokeh
 Interactive visualizations for web browsers
 Deploy visualization of as part of a webside
8
Tips & Tricks
 Running a IPython/Jupyter server on Virtual Machine
 Allows remote access and data analysis
 Always password protect the server
 Do not print or view large datasets in browser
 Figures and tables for Latex
 Generate Latex code for DataFrames using to_latex
 Save matplotlib plots as .pgf for inclusion in Latex
 Package/module management
 pip - The Python package and dependency manager
 conda - Cross-platform, Python-agnostic binary package manager
 setuptools – Python project packaging, testing, installation, etc
9
Advanced Libraries
 scikt-learn
 Python library for machine learning
 Pyomo
 Library for optimization modelling
 Use in conjuction with glpk, grobi, CPLEX, etc
 NLTK
 Natural language toolkit for
 RDFLib
 Set of libraries for RDF and OWL processing
 Tweepy
 Library to access Twitter API
10
Other resources
 Conferences (SciPy, EuroSciPy, PyData)
 Web frameworks (Django, Flask, CherryPy, Bottle)
 Cross platform GUI frameworks (PyQT, Kivy)
 Awesome Python List https://github.com/vinta/awesome-python
 MOOCs
 Introduction to Python for Data Science
https://www.edx.org/course/introduction-python-data-science-
microsoft-dat208x-1
 Python for Everybody
https://www.coursera.org/specializations/python
11

More Related Content

What's hot

BeeGFS - Dealing with Extreme Requirements in HPC
BeeGFS - Dealing with Extreme Requirements in HPCBeeGFS - Dealing with Extreme Requirements in HPC
BeeGFS - Dealing with Extreme Requirements in HPC
inside-BigData.com
 

What's hot (10)

BeeGFS Enterprise Deployment
BeeGFS Enterprise Deployment BeeGFS Enterprise Deployment
BeeGFS Enterprise Deployment
 
R vs python
R vs pythonR vs python
R vs python
 
Python
PythonPython
Python
 
Research Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories MetadataResearch Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories Metadata
 
BeeGFS - Dealing with Extreme Requirements in HPC
BeeGFS - Dealing with Extreme Requirements in HPCBeeGFS - Dealing with Extreme Requirements in HPC
BeeGFS - Dealing with Extreme Requirements in HPC
 
Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summit
 
Five python libraries should know for machine learning
Five python libraries should know for machine learningFive python libraries should know for machine learning
Five python libraries should know for machine learning
 
Exposing Bibliographic Information as Linked Open Data using Standards-based ...
Exposing Bibliographic Information as Linked Open Data using Standards-based ...Exposing Bibliographic Information as Linked Open Data using Standards-based ...
Exposing Bibliographic Information as Linked Open Data using Standards-based ...
 
Hands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4jHands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4j
 
An Efficient Search Engine for Searching Desired File
An Efficient Search Engine for Searching Desired FileAn Efficient Search Engine for Searching Desired File
An Efficient Search Engine for Searching Desired File
 

Similar to Researh toolbox-data-analysis-with-python

CHX PYTHON INTRO
CHX PYTHON INTROCHX PYTHON INTRO
CHX PYTHON INTRO
Kai Liu
 
Fedora Overview
Fedora OverviewFedora Overview
Fedora Overview
eposthumus
 

Similar to Researh toolbox-data-analysis-with-python (20)

Python standard library & list of important libraries
Python standard library & list of important librariesPython standard library & list of important libraries
Python standard library & list of important libraries
 
CHX PYTHON INTRO
CHX PYTHON INTROCHX PYTHON INTRO
CHX PYTHON INTRO
 
overview of python programming language.pptx
overview of python programming language.pptxoverview of python programming language.pptx
overview of python programming language.pptx
 
Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
 
Python Programming
Python ProgrammingPython Programming
Python Programming
 
Python for Data Engineering: Why Do Data Engineers Use Python?
Python for Data Engineering: Why Do Data Engineers Use Python?Python for Data Engineering: Why Do Data Engineers Use Python?
Python for Data Engineering: Why Do Data Engineers Use Python?
 
Why Python Should Be Your First Programming Language
Why Python Should Be Your First Programming LanguageWhy Python Should Be Your First Programming Language
Why Python Should Be Your First Programming Language
 
Pycon 2011
Pycon 2011Pycon 2011
Pycon 2011
 
Anaconda Python KNIME & Orange Installation
Anaconda Python KNIME & Orange InstallationAnaconda Python KNIME & Orange Installation
Anaconda Python KNIME & Orange Installation
 
Python webinar 4th june
Python webinar 4th junePython webinar 4th june
Python webinar 4th june
 
Python | What is Python | History of Python | Python Tutorial
Python | What is Python | History of Python | Python TutorialPython | What is Python | History of Python | Python Tutorial
Python | What is Python | History of Python | Python Tutorial
 
Python for data science
Python for data sciencePython for data science
Python for data science
 
Python
PythonPython
Python
 
Getting Started with Python
Getting Started with PythonGetting Started with Python
Getting Started with Python
 
CSP as a Domain-Specific Language Embedded in Python and Jython
CSP as a Domain-Specific Language Embedded in Python and JythonCSP as a Domain-Specific Language Embedded in Python and Jython
CSP as a Domain-Specific Language Embedded in Python and Jython
 
Fedora Overview
Fedora OverviewFedora Overview
Fedora Overview
 
A Whirlwind Tour Of Python
A Whirlwind Tour Of PythonA Whirlwind Tour Of Python
A Whirlwind Tour Of Python
 
Rapid Web Development with Python for Absolute Beginners
Rapid Web Development with Python for Absolute BeginnersRapid Web Development with Python for Absolute Beginners
Rapid Web Development with Python for Absolute Beginners
 
03_aiops-1.pptx
03_aiops-1.pptx03_aiops-1.pptx
03_aiops-1.pptx
 
Introduction_to_Python.pptx
Introduction_to_Python.pptxIntroduction_to_Python.pptx
Introduction_to_Python.pptx
 

More from Waternomics

More from Waternomics (20)

Shazam that water leak! Sensors and faults
Shazam that water leak! Sensors and faultsShazam that water leak! Sensors and faults
Shazam that water leak! Sensors and faults
 
Smart cities - Perspectives from the South
Smart cities - Perspectives from the SouthSmart cities - Perspectives from the South
Smart cities - Perspectives from the South
 
Africa: Water
Africa: WaterAfrica: Water
Africa: Water
 
Waternomics: Business Models and Exploitation
Waternomics: Business Models and ExploitationWaternomics: Business Models and Exploitation
Waternomics: Business Models and Exploitation
 
Waternomics: Overview of the Pilots Objectives, Measures and Outcomes
Waternomics: Overview of the Pilots Objectives, Measures and OutcomesWaternomics: Overview of the Pilots Objectives, Measures and Outcomes
Waternomics: Overview of the Pilots Objectives, Measures and Outcomes
 
Waternomics Methodology Overview
Waternomics Methodology OverviewWaternomics Methodology Overview
Waternomics Methodology Overview
 
Waternomics Applications Platform - Water Apps for Everyone
Waternomics Applications Platform - Water Apps for EveryoneWaternomics Applications Platform - Water Apps for Everyone
Waternomics Applications Platform - Water Apps for Everyone
 
Water Conservation in Galway City & Waternomics
Water Conservation in Galway City & WaternomicsWater Conservation in Galway City & Waternomics
Water Conservation in Galway City & Waternomics
 
Waternomics: Key impacts for smart water management
Waternomics: Key impacts for smart water managementWaternomics: Key impacts for smart water management
Waternomics: Key impacts for smart water management
 
Welcome and Project Overview
Welcome and Project OverviewWelcome and Project Overview
Welcome and Project Overview
 
Waternomics: Making Sense of Water Data
Waternomics: Making Sense of Water DataWaternomics: Making Sense of Water Data
Waternomics: Making Sense of Water Data
 
Waternomics Results and Impact
Waternomics Results and ImpactWaternomics Results and Impact
Waternomics Results and Impact
 
Waternomics Methodology design Sustainable buildings
Waternomics Methodology design Sustainable buildingsWaternomics Methodology design Sustainable buildings
Waternomics Methodology design Sustainable buildings
 
Waternomics Application Platform
Waternomics Application PlatformWaternomics Application Platform
Waternomics Application Platform
 
The business value of a smart water system
The business value of a smart water system The business value of a smart water system
The business value of a smart water system
 
AUTOMATED LEAK DETECTION SYSTEM FOR THE IMPROVEMENT OF WATER NETWORK MANAGEMENT
AUTOMATED LEAK DETECTION SYSTEM FOR THE IMPROVEMENT OF WATER NETWORK MANAGEMENTAUTOMATED LEAK DETECTION SYSTEM FOR THE IMPROVEMENT OF WATER NETWORK MANAGEMENT
AUTOMATED LEAK DETECTION SYSTEM FOR THE IMPROVEMENT OF WATER NETWORK MANAGEMENT
 
Making your-very-own-android-apps-for-waternomics-using-app-inventor-2
Making your-very-own-android-apps-for-waternomics-using-app-inventor-2Making your-very-own-android-apps-for-waternomics-using-app-inventor-2
Making your-very-own-android-apps-for-waternomics-using-app-inventor-2
 
Water usage-visualization-tutorial
Water usage-visualization-tutorialWater usage-visualization-tutorial
Water usage-visualization-tutorial
 
Waternomics - ICT for Water Resource Management - Water Information Platform
Waternomics - ICT for Water Resource Management - Water Information PlatformWaternomics - ICT for Water Resource Management - Water Information Platform
Waternomics - ICT for Water Resource Management - Water Information Platform
 
Waternomics - ICT for Water Resrouce Management - Roll up Banner
Waternomics - ICT for Water Resrouce Management - Roll up BannerWaternomics - ICT for Water Resrouce Management - Roll up Banner
Waternomics - ICT for Water Resrouce Management - Roll up Banner
 

Recently uploaded

Jax, FL Admin Community Group 05.14.2024 Combined Deck
Jax, FL Admin Community Group 05.14.2024 Combined DeckJax, FL Admin Community Group 05.14.2024 Combined Deck
Jax, FL Admin Community Group 05.14.2024 Combined Deck
Marc Lester
 

Recently uploaded (20)

What is an API Development- Definition, Types, Specifications, Documentation.pdf
What is an API Development- Definition, Types, Specifications, Documentation.pdfWhat is an API Development- Definition, Types, Specifications, Documentation.pdf
What is an API Development- Definition, Types, Specifications, Documentation.pdf
 
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
 
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdfStrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
 
Malaysia E-Invoice digital signature docpptx
Malaysia E-Invoice digital signature docpptxMalaysia E-Invoice digital signature docpptx
Malaysia E-Invoice digital signature docpptx
 
Microsoft365_Dev_Security_2024_05_16.pdf
Microsoft365_Dev_Security_2024_05_16.pdfMicrosoft365_Dev_Security_2024_05_16.pdf
Microsoft365_Dev_Security_2024_05_16.pdf
 
How to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabberHow to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabber
 
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product UpdatesGraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
 
how-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdfhow-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdf
 
The Strategic Impact of Buying vs Building in Test Automation
The Strategic Impact of Buying vs Building in Test AutomationThe Strategic Impact of Buying vs Building in Test Automation
The Strategic Impact of Buying vs Building in Test Automation
 
Jax, FL Admin Community Group 05.14.2024 Combined Deck
Jax, FL Admin Community Group 05.14.2024 Combined DeckJax, FL Admin Community Group 05.14.2024 Combined Deck
Jax, FL Admin Community Group 05.14.2024 Combined Deck
 
Salesforce Introduced Zero Copy Partner Network to Simplify the Process of In...
Salesforce Introduced Zero Copy Partner Network to Simplify the Process of In...Salesforce Introduced Zero Copy Partner Network to Simplify the Process of In...
Salesforce Introduced Zero Copy Partner Network to Simplify the Process of In...
 
Weeding your micro service landscape.pdf
Weeding your micro service landscape.pdfWeeding your micro service landscape.pdf
Weeding your micro service landscape.pdf
 
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdfImplementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdf
 
The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)
 
The Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion ProductionThe Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion Production
 
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
 
Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024
 
Workforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdfWorkforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdf
 
IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024
 

Researh toolbox-data-analysis-with-python

  • 1. Research Toolbox - Data Analysis with Python A Waternomics Case Study Umair ul Hassan
  • 2. Agenda  An overview of Python ecosystem  Waternomics case study  Data Access  Data Manipulation  Data Visualization  Tips & Tricks  Advanced Libraries  Q & A 2
  • 3. The Python Language  According to Wikipedia 3 a widely used high-level, general-purpose, interpreted, dynamic programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code
  • 4. Python Distribution  Official open source interpreter is CPython available at www.python.org  A distribution packages a set of python tools, modules and libraries to simplify setup and installation 4
  • 5. Waternomics Case Study  Linked Water Dataspace 5 Extract Transform Load NEB BMS AWS S3 RDF Data Load DRUID DRUID Transform OpenCube
  • 6. Data Access  Simple file IO functions  open, read, write  Pandas  read_csv, read_excel, read_hdf, read_sql, read_json, read_msgpack, read_html, read_gbq, read_stata, read_sas, read_clipboard, read_pickle  For writing replace “read” with “to” e.g. to_csv  RDFlib  parse, serialize  Requests (for HTTP/HTTPS)  get, post, put, delete, head, options  json  dumps, loads 6
  • 7. Data Manipulation  Numpy  Base N-dimensional array package  Pandas  Data structures & analysis  Allows multi-dimensional OLAP like operations  Scipy  Set of package for mathematics, science, and engineering  Integration, optimization, signal processing, linear algebra, image processing, spatial data analysis, etc  Statsmodels  Statistical models, tests, and analysis 7
  • 8. Data visualization  Matplotlib  Library for 2D Plotting  Allows export to images  Seaborn  Attractive visualization using matplotlib  Use themes for appealing graphs  Bokeh  Interactive visualizations for web browsers  Deploy visualization of as part of a webside 8
  • 9. Tips & Tricks  Running a IPython/Jupyter server on Virtual Machine  Allows remote access and data analysis  Always password protect the server  Do not print or view large datasets in browser  Figures and tables for Latex  Generate Latex code for DataFrames using to_latex  Save matplotlib plots as .pgf for inclusion in Latex  Package/module management  pip - The Python package and dependency manager  conda - Cross-platform, Python-agnostic binary package manager  setuptools – Python project packaging, testing, installation, etc 9
  • 10. Advanced Libraries  scikt-learn  Python library for machine learning  Pyomo  Library for optimization modelling  Use in conjuction with glpk, grobi, CPLEX, etc  NLTK  Natural language toolkit for  RDFLib  Set of libraries for RDF and OWL processing  Tweepy  Library to access Twitter API 10
  • 11. Other resources  Conferences (SciPy, EuroSciPy, PyData)  Web frameworks (Django, Flask, CherryPy, Bottle)  Cross platform GUI frameworks (PyQT, Kivy)  Awesome Python List https://github.com/vinta/awesome-python  MOOCs  Introduction to Python for Data Science https://www.edx.org/course/introduction-python-data-science- microsoft-dat208x-1  Python for Everybody https://www.coursera.org/specializations/python 11