As David McCandless famously said “Information Visualization is a form of knowledge compression”. In particular, it is a way of compressing information in a visual way that can be easily and correctly interpreted by the visual system in our brains.
In this tutorial we will discuss the way in which our eyes and visual cortex process colors and shapes and how we may use it to our advantage. Ideas and concepts will be presented in an intuitive and practical way while providing references for the more technical descriptions and explanations available in the relevant scientific literature.
Matplotlib is the workhorse of visualization in Python and underlies all other major Python visualization packages and it it particularly well integrated into the Jupyter ecosystem. Mastering it is a fundamental requirement to be proficient in python data visualization. Seaborn, on the other hand, is a more recent package that builds on top of matplotlib and simplifies it for some of the most common use cases, making it more productive. We will cover both tools through practical examples and highlight the main differences and advantages of each one.
Code and slides available here: https://bmtgoncalves.github.io/DataVisualization/
The slides I was using when delivering a meetup about the matplotlib library. More info about that meetup can be found at https://www.meetup.com/life-michael/events/271738271/
This slide is used to do an introduction for the matplotlib library and this will be a very basic introduction. As matplotlib is a very used and famous library for machine learning this will be very helpful to teach a student with no coding background and they can start the plotting of maps from the ending of the slide by there own.
This Edureka Python Matplotlib tutorial (Python Tutorial Blog: https://goo.gl/wd28Zr) explains what is data visualization and how to perform data visualization using Matplotlib. It also explains how to modify your plot and how to plot various types of graphs. Below are the topics covered in this tutorial:
1. Why Data Visualization?
2. What Is Data Visualization?
3. Various Types Of Plots
4. What Is Matplotlib?
6. How To Use Matplotlib?
The slides I was using when delivering a meetup about the matplotlib library. More info about that meetup can be found at https://www.meetup.com/life-michael/events/271738271/
This slide is used to do an introduction for the matplotlib library and this will be a very basic introduction. As matplotlib is a very used and famous library for machine learning this will be very helpful to teach a student with no coding background and they can start the plotting of maps from the ending of the slide by there own.
This Edureka Python Matplotlib tutorial (Python Tutorial Blog: https://goo.gl/wd28Zr) explains what is data visualization and how to perform data visualization using Matplotlib. It also explains how to modify your plot and how to plot various types of graphs. Below are the topics covered in this tutorial:
1. Why Data Visualization?
2. What Is Data Visualization?
3. Various Types Of Plots
4. What Is Matplotlib?
6. How To Use Matplotlib?
Analysis of data in Python with SciPy and pandas, Ubuntu installation, PyCharm configuration, Series, DataFrame, big data, medical data, merging data, groupby, graphing data, iPython using Wakari.io, and analyzing stock prices of US automakers including Ford and Telsa. As presented at Penguicon 2016.
Looking for a computer institute to learn Full Stack development and Digital Marketing? Our institute offers comprehensive courses in both areas, providing students with the skills and knowledge needed to succeed in today's digital landscape
Python for Data Science is a must learn for professionals in the Data Analytics domain. With the growth in IT industry, there is a booming demand for skilled Data Scientists and Python has evolved as the most preferred programming language. Through this blog, you will learn the basics, how to analyze data and then create some beautiful visualizations using Python.
Python is the choice llanguage for data analysis,
The aim of this slide is to provide a comprehensive learning path to people new to python for data analysis. This path provides a comprehensive overview of the steps you need to learn to use Python for data analysis.
Abstract: This PDSG workshop introduces the basics of Python libraries used in machine learning. Libraries covered are Numpy, Pandas and MathlibPlot.
Level: Fundamental
Requirements: One should have some knowledge of programming and some statistics.
( Python Training: https://www.edureka.co/python )
This Edureka Python Numpy tutorial (Python Tutorial Blog: https://goo.gl/wd28Zr) explains what exactly is Numpy and how it is better than Lists. It also explains various Numpy operations with examples.
Check out our Python Training Playlist: https://goo.gl/Na1p9G
This tutorial helps you to learn the following topics:
1. What is Numpy?
2. Numpy v/s Lists
3. Numpy Operations
4. Numpy Special Functions
Microsoft COCO: Common Objects in Context KhalidKhan412
Datasets are available for facial recognition, action recognition, object detection and recognition, etc. Image datasets are helpful in scene understanding and providing semantic description
Word embeddings have received a lot of attention since some Tomas Mikolov published word2vec in 2013 and showed that the embeddings that the neural network learned by “reading” a large corpus of text preserved semantic relations between words. As a result, this type of embedding started being studied in more detail and applied to more serious NLP and IR tasks such as summarization, query expansion, etc… More recently, researchers and practitioners alike have come to appreciate the power of this type of approach and have started a cottage industry of modifying Mikolov’s original approach to many different areas.
In this talk we will cover the implementation and mathematical details underlying tools like word2vec and some of the applications word embeddings have found in various areas. Starting from an intuitive overview of the main concepts and algorithms underlying the neural network architecture used in word2vec we will proceed to discussing the implementation details of the word2vec reference implementation in tensorflow. Finally, we will provide a birds eye view of the emerging field of “2vec" (dna2vec, node2vec, etc...) methods that use variations of the word2vec neural network architecture.
This (long) version of the Tutorial was presented at #O'Reilly AI 2017 in San Francisco. See https://bmtgoncalves.github.io/word2vec-and-friends/ for further details.
Analysis of data in Python with SciPy and pandas, Ubuntu installation, PyCharm configuration, Series, DataFrame, big data, medical data, merging data, groupby, graphing data, iPython using Wakari.io, and analyzing stock prices of US automakers including Ford and Telsa. As presented at Penguicon 2016.
Looking for a computer institute to learn Full Stack development and Digital Marketing? Our institute offers comprehensive courses in both areas, providing students with the skills and knowledge needed to succeed in today's digital landscape
Python for Data Science is a must learn for professionals in the Data Analytics domain. With the growth in IT industry, there is a booming demand for skilled Data Scientists and Python has evolved as the most preferred programming language. Through this blog, you will learn the basics, how to analyze data and then create some beautiful visualizations using Python.
Python is the choice llanguage for data analysis,
The aim of this slide is to provide a comprehensive learning path to people new to python for data analysis. This path provides a comprehensive overview of the steps you need to learn to use Python for data analysis.
Abstract: This PDSG workshop introduces the basics of Python libraries used in machine learning. Libraries covered are Numpy, Pandas and MathlibPlot.
Level: Fundamental
Requirements: One should have some knowledge of programming and some statistics.
( Python Training: https://www.edureka.co/python )
This Edureka Python Numpy tutorial (Python Tutorial Blog: https://goo.gl/wd28Zr) explains what exactly is Numpy and how it is better than Lists. It also explains various Numpy operations with examples.
Check out our Python Training Playlist: https://goo.gl/Na1p9G
This tutorial helps you to learn the following topics:
1. What is Numpy?
2. Numpy v/s Lists
3. Numpy Operations
4. Numpy Special Functions
Microsoft COCO: Common Objects in Context KhalidKhan412
Datasets are available for facial recognition, action recognition, object detection and recognition, etc. Image datasets are helpful in scene understanding and providing semantic description
Word embeddings have received a lot of attention since some Tomas Mikolov published word2vec in 2013 and showed that the embeddings that the neural network learned by “reading” a large corpus of text preserved semantic relations between words. As a result, this type of embedding started being studied in more detail and applied to more serious NLP and IR tasks such as summarization, query expansion, etc… More recently, researchers and practitioners alike have come to appreciate the power of this type of approach and have started a cottage industry of modifying Mikolov’s original approach to many different areas.
In this talk we will cover the implementation and mathematical details underlying tools like word2vec and some of the applications word embeddings have found in various areas. Starting from an intuitive overview of the main concepts and algorithms underlying the neural network architecture used in word2vec we will proceed to discussing the implementation details of the word2vec reference implementation in tensorflow. Finally, we will provide a birds eye view of the emerging field of “2vec" (dna2vec, node2vec, etc...) methods that use variations of the word2vec neural network architecture.
This (long) version of the Tutorial was presented at #O'Reilly AI 2017 in San Francisco. See https://bmtgoncalves.github.io/word2vec-and-friends/ for further details.
Neural networks for word embeddings have received a lot of attention since some Googlers published word2vec in 2013. They showed that the internal state (embeddings) that the neural network learned by "reading" a large corpus of text preserved semantic relations between words.
As a result, this type of embedding started being studied in more detail and applied to more serious Natural Language Processing + NLP and IR tasks such as summarization, query expansion, etc...
In this talk we will cover the intuitions and algorithms underlying word2vec family of algorithms. On the second half of the presentation we will quickly review than basics of tensorflow and analyze in detail the tensorflow reference implementation of word2vec
Stuck with preset themes? Not impressed with your dashboard color scheme? All you need is flexibility to pick the colors you desire. But first, let us understand why colors are so important?
Colors have a profound effect on human psychology and evoke certain reactions which may vary from person to person. They play a vital role in all our activities and influence our responses.
Hence the freedom of choosing colors is very enticing. With Collabion Charts for SharePoint, you will get that freedom! Yes, you can use the colors of your choice in your charts.
In the world of data visualization, colors play an important role. They help us differentiate among various data plots we represent on our charts.
However, some conventional colors also highlight certain kinds of messages. For example, red represents warnings and green represents safer situations. But then again, choosing the right hues is totally dependent on personal preferences.
So go ahead and choose from myriads of colors to enhance the look and feel of your single-series charts in Collabion Charts for SharePoint.
Visit http://collabion.com
Word embeddings have received a lot of attention since some Tomas Mikolov published word2vec in 2013 and showed that the embeddings that the neural network learned by “reading” a large corpus of text preserved semantic relations between words. As a result, this type of embedding started being studied in more detail and applied to more serious NLP and IR tasks such as summarization, query expansion, etc… More recently, researchers and practitioners alike have come to appreciate the power of this type of approach and have started a cottage industry of modifying Mikolov’s original approach to many different areas.
In this talk we will cover the implementation and mathematical details underlying tools like word2vec and some of the applications word embeddings have found in various areas. Starting from an intuitive overview of the main concepts and algorithms underlying the neural network architecture used in word2vec we will proceed to discussing the implementation details of the word2vec reference implementation in tensorflow. Finally, we will provide a birds eye view of the emerging field of “2vec" (dna2vec, node2vec, etc...) methods that use variations of the word2vec neural network architecture.
This (short) version of the Tutorial was presented at #AIWTB https://ai.withthebest.com/. See https://bmtgoncalves.github.io/word2vec-and-friends/ for further details on future (and longer) editions and sign up to http://tinyletter.com/dataforscience for related news and updates.
The world is ever changing. As a result, many of the systems and phenomena we are interested in evolve over time resulting in time evolving datasets. Timeseries often display any interesting properties and levels of correlation. In this tutorial we will introduce the students to the use of Recurrent Neural Networks and LSTMs to model and forecast different kinds of timeseries.
GitHub: https://github.com/DataForScience/RNN
Bitcoin has brought about a true revolution in how we think about money. In one fell stroke it solved the main problems that afflicted previous attempts at a truly digital currency: distributed consensus, double spending, and external attacks. Perhaps more importantly, it provided the first working version of a blockchain or distributed ledger. However, despite their relative simplicity, the underlying concepts on which these technologies are built are not well known and often obscured by hype and technical jargon.
Since the days of Bitcoin’s founding, many other crypto-currencies have been proposed and released. This tutorial will introduce these technologies in an intuitive way for data scientists, explaining their driving algorithms, motivations, and data structures.
The advent of large scale online social services coupled with the dissemination of affordable GPS enabled smartphones resulted in the accumulation of massive amounts of data documenting our individual and social behavior. Using large data sets from source such as Twitter, Wikipedia, Google Books and others we will present several recent results on how languages are used across both time and space.
In particular, we will analyze the role of multilinguals in Social Networks and how language dialects can be defined empirically based on the way a language is used in the real world. Finally, we will also analyze how Chinese and English usage changes from place to place and over time and how languages can be used to identify communities within the urban environment.
The data deluge we currently witnessing presents both opportunities and challenges. Never before have so many aspects of our world been so thoroughly quantified as now and never before has data been so plentiful. One of the main beneficiaries of the current data glut have been data intensive approaches such as Neural Networks in general and Deep Learning in particular, with many important new practical applications arising in the last couple of years.
In this tutorial we provide a practical introduction to the most important concepts of Neural Networks as we gradually implement simple neural networks from scratch to perform simple tasks such as character recognition. Our emphasis will be on understanding the architecture, training, advantages and pitfalls of practical neural networks.
Participants are expected to have a basic familiarity with calculus and linear algebra as well as working proficiency with the Python programming language.
A practical Introduction to Machine(s) LearningBruno Gonçalves
The data deluge we currently witnessing presents both opportunities and challenges. Never before have so many aspects of our world been so thoroughly quantified as now and never before has data been so plentiful. On the other hand, the complexity of the analyses required to extract useful information from these piles of data is also rapidly increasing rendering more traditional and simpler approaches simply unfeasible or unable to provide new insights.
In this tutorial we provide a practical introduction to some of the most important algorithms of machine learning that are relevant to the field of Complex Networks in general, with a particular emphasis on the analysis and modeling of empirical data. The goal is to provide the fundamental concepts necessary to make sense of the more sophisticated data analysis approaches that are currently appearing in the literature and to provide a field guide to the advantages an disadvantages of each algorithm.
In particular, we will cover unsupervised learning algorithms such as K-means, Expectation-Maximization, and supervised ones like Support Vector Machines, Neural Networks and Deep Learning. Participants are expected to have a basic understanding of calculus and linear algebra as well as working proficiency with the Python programming language.
The increasing availability of huge amounts of data on every aspect of societal and individual behavior has created unprecedented opportunities but it also created new challenges. On one hand, it is now much easier to acquire detailed datasets no practically anything while on the other it is much harder to make sure that our observations and conclusions are well justified and robust. In this short 3h tutorial we will briefly introduce some of the fundamental principles and algorithms of Machine Learning in an intuitive and practical way. Mathematical requirements will be kept to a minimum and short snippets of Python code will be presented to illustrate the application of each algorithm.
We start by very briefly introducing the Twitter platform and detailing the demographics of the users and the biases they introduce. The relationship between geography, mobility and social network properties will be described using the Twitter service as a case study. Finally, tutorial attendees will get the chance to review the most seminal works in the area where spatial and geographic perspectives are highlighted.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
4. www.bgoncalves.com@bgoncalves
The views and opinions expressed in this article are
those of the authors and do not necessarily reflect the
official policy or position of my employer. The
examples provided with this tutorial were chosen for
their didactic value and are not mean to be
representative of my day to day work.
Disclaimer
8. www.bgoncalves.com@bgoncalves
• Some cognitive tasks are significantly easier than others. In order, we are good a
distinguishing:
• Position, length
• Direction, Angle, Area
• Volume, Curvature, Shade
Perception
9. www.bgoncalves.com@bgoncalves
• Some cognitive tasks are significantly easier than others. In order, we are good a
distinguishing:
• Position, length
• Direction, Angle, Area
• Volume, Curvature, Shade
• Color Saturation.
Perception
10. www.bgoncalves.com@bgoncalves
Perception
• Context also matters!
• An object seen in the context of larger objects will appear smaller, while in the content of
smaller objects it will appear larger.
• Some cognitive tasks are significantly easier than others. In order, we are good a
distinguishing:
• Position, length
• Direction, Angle, Area
• Volume, Curvature, Shade
• Color Saturation.
11. www.bgoncalves.com@bgoncalves
Perception
• Context also matters!
• An object seen in the context of larger objects will appear smaller, while in the content of
smaller objects it will appear larger.
• And we “fill in the gaps”
• Some cognitive tasks are significantly easier than others. In order, we are good a
distinguishing:
• Position, length
• Direction, Angle, Area
• Volume, Curvature, Shade
• Color Saturation.
18. www.bgoncalves.com@bgoncalves
“Who in the rainbow can draw the line where the violet tint ends and the orange tint
begins? Distinctly we see the difference of the colors, but where exactly does the
one first blendingly enter into the other? So with sanity and insanity.”
(H. Melville)
Color Perception
35. www.bgoncalves.com@bgoncalves
Fundamental Principles of Analytical Design
1. Show comparisons, contrasts and differences
2. Show causality, mechanism, explanation and systematic structure
3. Show multivariate data: more than one or two variables
4. Completely integrate words, numbers, images and diagrams
5. Documentation
6.Content matters most of all
“Information Visualization is a form of knowledge compression”
D. McCandless
37. www.bgoncalves.com@bgoncalves
Fundamental tools
• Points
• Lines
• Areas
• Shapes
• Colors
• Text
• Each for these can be used to encode a given variable to produce
all the types of plots we are familiar with:
• Scatter plot - Just points (line)
38. www.bgoncalves.com@bgoncalves
Fundamental tools
• Points
• Lines
• Areas
• Shapes
• Colors
• Text
• Each for these can be used to encode a given variable to produce
all the types of plots we are familiar with:
• Scatter plot - Just points (line)
• Bar chart - Areas
39. www.bgoncalves.com@bgoncalves
Fundamental tools
• Points
• Lines
• Areas
• Shapes
• Colors
• Text
• Each for these can be used to encode a given variable to produce
all the types of plots we are familiar with:
• Scatter plot - Just points (line)
• Bar chart - Areas
• Bubble chart - Scatter plot + size + color (time)
40. www.bgoncalves.com@bgoncalves
Fundamental tools
• Points
• Lines
• Areas
• Shapes
• Colors
• Text
• Each for these can be used to encode a given variable to produce
all the types of plots we are familiar with:
• Scatter plot - Just points (line)
• Bar chart - Areas
• Bubble chart - Scatter plot + size + color (time)
• Pie chart - Areas + colors
41. www.bgoncalves.com@bgoncalves
Fundamental tools
• Points
• Lines
• Areas
• Shapes
• Colors
• Text
• Each for these can be used to encode a given variable to produce
all the types of plots we are familiar with:
• Scatter plot - Just points (line)
• Bar chart - Areas
• Bubble chart - Scatter plot + size + color (time)
• Pie chart - Areas + colors
• etc…
44. www.bgoncalves.com@bgoncalves
Basic Plotting
• Matplotlib uses an object oriented structure following an intuitive notation
• Each Axes object contains one or more Axis objects.
• A Figure is a set of one or more Axes.
• Each Axes is associated with exactly one Figure and each set of Markers is associated
with exactly one Axes.
• In other words, Markers/Lines represent a dataset that is plotted against one or more Axis.
An Axes object is (effectively) a subplot of a Figure.
X
Y
46. www.bgoncalves.com@bgoncalves
Basic Plotting - Programmatically! https://matplotlib.org/2.0.0/
• While the Figure object controls the way in which the figure is displayed.
• .gca() - Get the current Axes, creating one if necessary
• .show() - Show the final figure
• .savefig(“filename.ext”, dpi=300) - Save the figure to “filename.ext” where “.ext”
defines the format the saved image ()
X
Y
47. www.bgoncalves.com@bgoncalves
Basic Plotting - Programmatically!
• The first step is to import the pyplot module from matplotlib and instanciating a Figure object:
• The convention is to import pyplot as plt
• To create subplots (Axes) you use .subplots(nrows, ncols, sharex=False, sharey=False)
instead of .figure(). set sharex and/or sharey to True to keep the same scale in both cases.
• .subplots - returns a (fig, ax_lst) tuple where ax_lst is a list of Axes and fig is the Figure.
• Axes have several methods of interest:
• .plot(x, y) - Make a scatter or line plot from a list of x, y coordinates.
• .imshow(mat) - Plot a matrix as if it were an image. Element 0,0 is plotted in the top right
corner.
• .bar(x, y) - Make a bar plot where x is a list of the lower left coordinates of each bar and
y is the respective height.
• .pie(values, labels=labels) - Produce a pie plot out of a list of values list and labeled
with labels
• .savefig(filename) - Write the current figure as an static image
import matplotlib.pyplot as plt
fig = plt.figure()
X
Y
48. www.bgoncalves.com@bgoncalves
Matplotlib - decorations
• The respective functions are named in an intuitive wa
Every Axes object has as methods:
• .set_xlabel(label)
• .set_ylabel(label)
• .set_title(title)
• And axis limits can be set using:
• .set_xlim(xmin, xmax)
• .set_ylim(ymin, ymax)
• Tick marks and labels are set using:
• .set_xticks(ticks)/.set_yticks(ticks)
• .set_xticklabels(labels)/.set_yticklabels(labels)
https://matplotlib.org/2.0.0/
49. www.bgoncalves.com@bgoncalves
Matplotlib - Images
• .imshow(fig) - Display an image on a set of axes.
• fig can be any matrix of numbers.
• Further plotting can occur by simply using the functions described above
https://matplotlib.org/2.0.0/