A walk through the maze of understanding Data Visualization using several tools such as Python, R, KNIME and Google Data Studio.
This workshop is hands-on, and this set of slides is designed to serve as the agenda for the workshop.
2. "In God We Trust…All Other's, Bring Data,"
Deming
3. AGENDA
1. Exploratory Data Analysis
2. Fundamentals of Effective Data Visualization
3. Tools for Data Visualization
4. Demo using Python, R and KNIME to create visualizations
5. Creating insightful reports with visual tools
6. Q & A
4. What is EDA?
• Exploratory data analysis is an approach to data analysis that reveals the important characteristics of a dataset, mainly through visualization.
• Get to know your data!
• Distributions (symmetric, normal, skewed)
• Data quality problems
• Outliers
• Correlations and inter-relationships
• Functional relationships
• Derived attributes; keys such as primary and foreign keys
• Static attributes, dynamic attributes, etc.
5. Get a good look and feel of the data
• Always check your datasets:
• Means
• Medians
• Quantiles
• Histograms
• Boxplots
• Scatter diagrams
Consider looking at every attribute - you will understand what it represents!
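A minimal pandas sketch of this first-look checklist (the CSV path is hypothetical; any tabular dataset works):

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("dataset.csv")            # hypothetical input file

    # Means, medians and quantiles in one shot
    print(df.describe())                        # count, mean, std, min, quartiles, max
    print(df.median(numeric_only=True))

    # Histograms and boxplots for every numeric attribute
    df.hist(figsize=(10, 6))
    df.plot(kind="box", subplots=True, figsize=(10, 3))

    # Scatter diagrams between pairs of attributes
    pd.plotting.scatter_matrix(df.select_dtypes("number"), figsize=(8, 8))
    plt.show()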
6. Visualization before Analysis (Anscombe's Quartet)
         I              II             III            IV
     x      y       x      y       x      y       x      y
   10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
    8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
   13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
    9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
   11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
   14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
    6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
    4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
   12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
    7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
    5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89
7. For all four datasets
Property                                          Value               Accuracy
Mean of x                                         9                   exact
Sample variance of x                              11                  exact
Mean of y                                         7.50                to 2 decimal places
Sample variance of y                              4.125               plus/minus 0.003
Correlation between x and y                       0.816               to 3 decimal places
Linear regression line                            y = 3.00 + 0.500x   to 2 and 3 decimal places, respectively
Coefficient of determination of the regression    0.67                to 2 decimal places
8. • The first scatter plot (top left) appears to be a simple linear relationship, corresponding to two correlated variables that follow the assumption of normality.
• The second graph (top right) is not distributed normally; while a relationship between the two variables is obvious, it is not linear, and the Pearson correlation coefficient is not relevant. A more general regression and the corresponding coefficient of determination would be more appropriate.
9. • In the third graph (bottom left), the relationship is linear, but it should have a different regression line (a robust regression would have been called for). The calculated regression is offset by the one outlier, which exerts enough influence to lower the correlation coefficient from 1 to 0.816.
• Finally, the fourth graph (bottom right) shows an example where one outlier is enough to produce a high correlation coefficient, even though the other data points do not indicate any relationship between the variables.
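The quartet ships with seaborn, so the point is easy to reproduce. A short sketch (the dataset name and its columns are seaborn's; everything else is illustrative):

    import seaborn as sns
    import matplotlib.pyplot as plt

    df = sns.load_dataset("anscombe")   # columns: dataset, x, y

    # Near-identical summary statistics per dataset...
    print(df.groupby("dataset").agg(["mean", "var"]))
    print(df.groupby("dataset").apply(lambda g: g["x"].corr(g["y"])))

    # ...but four very different pictures
    sns.lmplot(data=df, x="x", y="y", col="dataset", col_wrap=2, ci=None, height=3)
    plt.show()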
10. Get a general sense of the data
• Make sure your first visualization is data-driven (model-free)
• Think interactive and visual
• Humans are the best pattern recognizers
• Use as many dimensions as your data will permit (2, 3, ...): x, y, z, space, color, time…
• Visualization is useful in the early stages of data mining:
• detect outliers (e.g. assess data quality)
• test assumptions (e.g. normal distributions or skewed?)
• identify useful raw data & transforms (e.g. log(x))
Take away: it is always well worth looking at your data!
11. AGENDA
1. Exploratory Data Analysis
2. Fundamentals of Effective Data Visualization
3. Tools for Data Visualization
4. Demo using Python, R and KNIME to create visualizations
5. Creating insightful reports with visual tools
6. Q & A
18. Visualization: converting raw data into graphics that are understandable to people.
*Adapted from The ParaView Tutorial, Moreland (Introduction to Information Visualization, Fall 2013)
22. HEATMAP VISUALIZATION
• A heatmap is a two-dimensional graphical representation of data where the individual values contained in a matrix are represented as colors.
• The seaborn Python package allows the creation of annotated heatmaps, which can be tweaked using Matplotlib tools as per the creator's requirements.
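A minimal annotated-heatmap sketch with seaborn (the choice of dataset, a correlation matrix of seaborn's bundled tips data, is illustrative):

    import seaborn as sns
    import matplotlib.pyplot as plt

    tips = sns.load_dataset("tips")                 # example data shipped with seaborn
    corr = tips.select_dtypes("number").corr()      # matrix of values to color

    # annot=True writes each value into its cell; cmap and fmt are Matplotlib-tweakable
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
    plt.title("Correlation heatmap")
    plt.show()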
23. AGENDA
1. Exploratory Data Analysis
2. Fundamentals of Effective Data Visualization
3. Tools for Data Visualization
4. Demo using Python, R and KNIME to create visualizations
5. Creating insightful reports with visual tools
6. Q & A
25. KNIME DATA VISUALIZATION TOOLS
• KNIME Analytics Platform provides many nodes for data visualization, including scatter plots, pie charts, box plots and histograms, as well as tag clouds and visualizations of networks.
Data Visualization Nodes
• KNIME has a number of dedicated native visualization nodes:
• Hiliting
• Geo-location
• R Choropleths
26. KNIME FEATURES
KNIME uses a modular workflow approach, which documents and stores the analysis process in the exact order it was conceived and implemented. All results in the workflow are instantly available for review by the user, aiding debugging at every stage of the workflow.
Core KNIME features include:
• Scalability through sophisticated data handling (intelligent automatic caching of data in the background while maximizing throughput performance)
• Highly and easily extensible via a well-defined API for plugin extensions
• Intuitive user interface
• Import/export of workflows (for exchanging with other KNIME users)
• Parallel execution on multi-core systems
• Command-line version for "headless" batch executions
27. KNIME FUNCTIONALITIES
Available KNIME modules cover a vast range of functionality, such as:
• I/O: retrieves data from files or databases
• Data Manipulation: pre-processes your input data with filtering, group-by, pivoting, binning, normalization, aggregation, joining, sampling, partitioning, etc.
• Views: inspects the data and results with several interactive views, supporting interactive data exploration
• Hiliting: ensures that data points hilited in one view are immediately hilited in all other views
• Mining: uses state-of-the-art data mining algorithms such as clustering, rule induction, decision trees, association rules, naïve Bayes, neural networks and support vector machines to better understand your data
29. MATPLOTLIB – 2D Graphics
Simple and powerful visualizations can be generated using the Matplotlib Python library. It is the most widely used plotting library in the Python community.
Libraries like pandas are "wrappers" over Matplotlib, giving access to a number of Matplotlib's methods with less code.
Matplotlib's versatility can be used to make many visualization types:
• Scatter plots
• Bar charts and histograms
• Line plots
• Pie charts
• Stem plots
• Contour plots, etc.
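A minimal Matplotlib sketch covering a few of these chart types (the data are made up for illustration):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 10, 50)
    fig, axes = plt.subplots(1, 3, figsize=(12, 3))

    axes[0].plot(x, np.sin(x))                                          # line plot
    axes[0].set_title("Line")

    axes[1].scatter(x, np.sin(x) + np.random.normal(0, 0.2, x.size))    # scatter plot
    axes[1].set_title("Scatter")

    axes[2].hist(np.random.normal(size=500), bins=20)                   # histogram
    axes[2].set_title("Histogram")

    plt.tight_layout()
    plt.show()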
30. SEABORN
• Seaborn is a popular data visualization library built on top of Matplotlib.
• Seaborn's default styles and color palettes are much more sophisticated than Matplotlib's.
• Seaborn is a higher-level library, meaning it's easier to generate certain kinds of plots, including heat maps, time series, and violin plots.
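For instance, a violin plot is a single call on one of seaborn's bundled example datasets:

    import seaborn as sns
    import matplotlib.pyplot as plt

    tips = sns.load_dataset("tips")                      # example data shipped with seaborn
    sns.violinplot(data=tips, x="day", y="total_bill")   # distribution of bills per weekday
    plt.show()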
31. ggplot
• ggplot is a Python visualization library based on R's ggplot2 and The Grammar of Graphics.
• ggplot operates differently from Matplotlib: it lets users layer components to create a full plot.
• The Grammar of Graphics has been hailed as an "intuitive" method for plotting, though seasoned Matplotlib users might need time to adjust to this new mindset.
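The layering idea looks like this. A sketch using plotnine, an actively maintained Python implementation of the same ggplot2-style grammar (the original ggplot package uses very similar syntax):

    from plotnine import ggplot, aes, geom_point, stat_smooth, labs
    from plotnine.data import mtcars    # example dataset bundled with plotnine

    # Each '+' layers one component onto the plot
    p = (ggplot(mtcars, aes(x="wt", y="mpg"))
         + geom_point()                           # data layer
         + stat_smooth(method="lm")               # fitted-line layer
         + labs(title="Weight vs. fuel economy"))
    p.save("ggplot_demo.png")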
32. Bokeh
• Bokeh is native to Python, not ported over from R like ggplot. Like ggplot, Bokeh is based on The Grammar of Graphics.
• It also supports streaming and real-time data, and its unique selling proposition is its ability to create interactive, web-ready plots, which can easily be output as JSON objects, HTML documents, or interactive web applications.
• Bokeh has three interfaces with varying degrees of control to accommodate different types of users:
• The topmost level is for creating charts quickly. It includes methods for creating common charts such as bar plots, box plots, and histograms.
• The middle level allows the user to control the basic building blocks of each chart (for example, the dots in a scatter plot) and has the same specificity as Matplotlib.
• The bottom level is geared toward developers and software engineers. It has no preset defaults and requires the user to define every element of the chart.
https://bokeh.pydata.org/en/latest/docs/gallery.html#gallery
https://demo.bokehplots.com/apps/crossfilter
https://realpython.com/python-data-visualization-bokeh/
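A minimal mid-level (bokeh.plotting) sketch that writes an interactive HTML plot (the data values are illustrative):

    from bokeh.plotting import figure, show
    from bokeh.io import output_file

    x = [1, 2, 3, 4, 5]
    y = [6, 7, 2, 4, 5]

    output_file("scatter.html")       # self-contained, web-ready HTML document
    p = figure(title="Simple Bokeh scatter", x_axis_label="x", y_axis_label="y")
    p.scatter(x, y, size=10)          # pan/zoom/hover tools come for free
    show(p)                           # opens the HTML document in a browser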
33. PLOTLY
• Plotly is widely known as an online platform for data visualization.
• It can be accessed from a Python notebook.
• Like Bokeh, Plotly's strength lies in making interactive plots, and it offers some charts not found in most libraries, such as contour plots.
• It can also be used by people with no technical background to create interactive plots, by uploading the data and using the Plotly GUI.
• Plotly is compatible with ggplot in R and Python.
• It allows you to embed interactive plots in projects or websites using iframes or HTML.
https://plot.ly/python/line-and-scatter/
https://plot.ly/feed/?q=plottype:choropleth
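A short interactive-scatter sketch using Plotly's Python API (Plotly Express, on one of Plotly's bundled example datasets):

    import plotly.express as px

    df = px.data.iris()                                   # example dataset bundled with Plotly
    fig = px.scatter(df, x="sepal_width", y="sepal_length",
                     color="species", title="Iris measurements")
    fig.write_html("iris.html")                           # embeddable HTML output
    fig.show()                                            # interactive view in notebook/browser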
34. PYGAL
• Pygal offers interactive plots that can be embedded in a web browser. Its prime differentiator is the ability to output charts as SVGs. For work involving smaller datasets, SVGs will do just fine; however, for charts with hundreds of thousands of data points, they become sluggish and have trouble rendering.
• It's easy to create a nice-looking chart with just a few lines of code, since each chart type is packaged into a method and the built-in styles are pretty.
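A minimal Pygal sketch; each chart type is a class, and one render call writes the SVG (the values are illustrative):

    import pygal

    chart = pygal.Bar(title="Browser usage (illustrative)")
    chart.add("Chrome", [62, 64, 65])
    chart.add("Firefox", [19, 18, 17])
    chart.x_labels = ["2015", "2016", "2017"]
    chart.render_to_file("browsers.svg")    # interactive SVG, embeddable in a web page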
35. ALTAIR
• Altair is a declarative statistical visualization Python library based on Vega-Lite.
• Declarative means you only need to specify the links between data columns and the encoding channels, such as x-axis, y-axis, color, etc.; the rest of the plotting details are handled automatically.
• Being declarative makes Altair simple, friendly and consistent. It is easy to design effective and beautiful visualizations with a minimal amount of code using Altair.
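A minimal declarative sketch in Altair (the cars dataset comes from the optional vega_datasets companion package):

    import altair as alt
    from vega_datasets import data       # optional package with example datasets

    cars = data.cars()

    # Only the column-to-channel links are declared; Altair handles the rest
    chart = alt.Chart(cars).mark_point().encode(
        x="Horsepower",
        y="Miles_per_Gallon",
        color="Origin",
    )
    chart.save("cars.html")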
36. Geoplotlib
• Geoplotlib is a toolbox for plotting geographical data and creating maps.
• It can be used to create a variety of map types, like choropleths, heatmaps, and dot-density maps.
• It provides a set of built-in tools for the most common tasks, such as density visualization, spatial graphs, and shapefiles.
• Simply put, Geoplotlib is a Python library dedicated to the visualization of maps.
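A minimal dot-density sketch with geoplotlib (the CSV path is hypothetical; the file is expected to contain lat and lon columns):

    import geoplotlib
    from geoplotlib.utils import read_csv

    data = read_csv("bus_stops.csv")   # hypothetical file with 'lat' and 'lon' columns
    geoplotlib.dot(data)               # one dot per geographical point
    geoplotlib.show()                  # opens an interactive map window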
37. Major R Visual Libraries
• Plotly - Plotly's R graphing library makes interactive, publication-quality graphs online. It can be used to make line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple axes, and 3D (WebGL-based) charts.
• ggplot2 - The ggplot2 package lets you make beautiful and customizable plots of your data. It implements the grammar of graphics, an easy-to-use system for building plots.
• Shiny - Shiny is an R package that makes it easy to build interactive web apps straight from R. You can host standalone apps on a webpage, embed them in R Markdown documents, or build dashboards. You can also extend your Shiny apps with CSS themes, htmlwidgets, and JavaScript actions.
https://shiny.rstudio.com/gallery/genome-browser.html
https://rdrr.io/snippets/
http://gallery.htmlwidgets.org/
docs.ggplot2.org
38. GOOGLE DATA STUDIO
• Currently in beta, Google Data Studio allows you to create branded reports with data visualizations to share with your clients. ... Google Data Studio is part of the Google Analytics 360 Suite, the high-end (i.e., pricey) Google Analytics Enterprise package.
• Data Studio is Google's reporting solution for power users who want to go beyond the data and dashboards of Google Analytics. The data widgets in Data Studio are notable for their variety, customization options, live data and interactive controls (such as column sorting and table pagination).
• Earlier you could create up to five custom reports for free; now you can create as many as required.
https://datastudio.google.com/reporting/1Rg5y6r0640X8uo2xo2XY48sG9IyMiYEN/page/wcCU
39. D3.JS
• D3.js is a JavaScript library for manipulating documents based on data.
• D3 helps you bring data to life using HTML, SVG, and CSS. D3's emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components with a data-driven approach to DOM manipulation.
40. AGENDA
1. Exploratory Data Analysis
2. Fundamentals of Effective Data Visualization
3. Tools for Data Visualization
4. Demo using Python, R and KNIME to create visualizations
5. Creating insightful reports with visual tools
6. Q & A
41. AGENDA
1. Exploratory Data Analysis
2. Fundamentals of Effective Data Visualization
3. Tools for Data Visualization
4. Demo using Python, R and KNIME to create visualizations
5. Creating insightful reports with visual tools
6. Q & A
42. Reporting and Analysis
• Reporting is "the process of organizing data into informational summaries in order to monitor how different areas of a business are performing."
• Analytics is "the process of exploring data and reports in order to extract meaningful insights, which can be used to better understand and improve business performance."
43. An Analytical Report?
An analytical report is a business report that:
• uses qualitative and quantitative data to analyze and evaluate a business strategy or process;
• empowers decision makers to make data-driven decisions based on evidence and analytics.
45. Collecting metrics is easy – generating insights is what nails it!
Generate actionable insights:
• Actionable insights is a term in data analytics and big data for information that can be acted upon, or information that gives enough insight into the future that the actions to be taken become clear for decision makers.
• Analytics (mathematical ways of synthesizing metrics) must illuminate business conditions, sentiment and directional changes over time.
• Insights are what humans make from analytics - once you have the data and perform the analysis, you have the knowledge to form insights and change your actions or responses.
46. AGENDA
1. Exploratory Data Analysis
2. Fundamentals of Effective Data Visualization
3. Tools for Data Visualization
4. Demo using Python, R and KNIME to create visualizations
5. Creating insightful reports with visual tools
6. Q & A
47. This session is for educational purposes. The material used in this presentation has been compiled from various free and readily available resources; a full acknowledgement list can be furnished on request.