This document is a cheat sheet for creating data visualizations with ggplot2 in R. It summarizes the key components of a ggplot2 plot: data, geoms, stats, scales, coordinate systems, facets, position adjustments, and themes. Geoms such as geom_point() and geom_bar() draw the visual marks that represent data; stats such as stat_density() compute new variables for plotting; scales map data values to visual properties such as color, size, and position. The cheat sheet lists many geom functions with their associated aesthetics, along with the stat and scale functions, and shows how to customize a plot by adding geoms, stats, scales, coordinate systems, and facets.
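As a minimal sketch of how these components combine (this assumes the ggplot2 package is installed and uses its built-in mpg dataset; the specific geom, stat, scale, facet, and theme choices here are illustrative, not from the cheat sheet itself):

```r
# One plot touching each component the cheat sheet covers:
# data, aesthetics, geoms, stats, scales, facets, and themes.
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) + # data + aesthetics
  geom_point() +                           # geom: one visual mark per row
  stat_smooth(method = "loess",            # stat: computes a fitted curve
              se = FALSE, color = "black") +
  scale_color_brewer(palette = "Dark2") +  # scale: maps class -> colors
  facet_wrap(~ drv) +                      # facet: one panel per drivetrain
  theme_minimal()                          # theme: non-data appearance

print(p)  # a ggplot object renders when printed
```

Each `+` adds a layer or setting to the plot object, which is the layered-grammar idea the rest of the document elaborates.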
ggplot2 is based on the grammar of graphics: the idea that you can build every graph from the same components: a data set, a coordinate system, and geoms (visual marks that represent data points).
In other words, ggplot2 is a grammar-of-graphics package for building plots in R from data, a coordinate system, and geoms. A geom's aesthetic properties, such as color, size, and position, can be mapped to variables; common geoms include points, lines, and bars. Scales control how data values map to those visual properties, and the coordinate system defines the space in which the geoms are drawn.
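The bare grammar can be sketched as follows (a minimal, assumed example using ggplot2's built-in mpg dataset; `coord_cartesian()` is written out only to make the default coordinate system visible):

```r
# The minimal grammar: data + aesthetic mappings + a geom + a coordinate system.
library(ggplot2)

p <- ggplot(data = mpg,                      # the data set
            mapping = aes(x = displ,         # x position <- engine displacement
                          y = hwy,           # y position <- highway mileage
                          color = class)) +  # color      <- car class
  geom_point() +                             # geom: a point for each row
  coord_cartesian()                          # coordinate system (the default)

print(p)
```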
The cheat sheet then organizes the geom functions by the number and type of variables they display: geoms for one variable (such as histograms and density plots for a continuous variable), for two variables (such as scatterplots for two continuous variables), for three variables, and for maps, each listed with its associated aesthetic mappings. Because a plot is built in layers, multiple geoms can be stacked on the same data to visualize different aspects of a relationship.
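To illustrate choosing geoms by variable count and layering them (a hedged sketch, again assuming ggplot2 and its mpg dataset; the histogram binwidth and the linear trend line are arbitrary choices for the example):

```r
# Geoms by number of variables, and layering two geoms on the same data.
library(ggplot2)

# One continuous variable: distribution of highway mileage.
p1 <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)

# Two continuous variables: scatterplot plus a second, layered geom.
p2 <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +                          # layer 1: the raw observations
  geom_smooth(method = "lm", se = FALSE)  # layer 2: a linear trend line

print(p1)
print(p2)
```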
R for Data Science 2: Grammar of Graphics (ggplot2), by Min-hyung Kim
REFERENCES
#1. RStudio official documentation (Help & Cheat Sheets)
Free webpage: https://www.rstudio.com/resources/cheatsheets/
#2. Wickham, H. and Grolemund, G., 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly.
Free webpage: https://r4ds.had.co.nz/
Cf) Examples use Tidyverse syntax (www.tidyverse.org) rather than R base syntax.
Cf) Hadley Wickham: Chief Scientist at RStudio; Adjunct Professor of Statistics at the University of Auckland, Stanford University, and Rice University.
Data visualization with ggplot2 :: CHEAT SHEET
ggplot2 is based on the grammar of graphics, the idea
that you can build every graph from the same
components: a data set, a coordinate system,
and geoms—visual marks that represent data points.
Basics
GRAPHICAL PRIMITIVES
a + geom_blank() and a + expand_limits()
Ensure limits include values across all plots.
b + geom_curve(aes(yend = lat + 1,
xend = long + 1), curvature = 1) - x, xend, y, yend,
alpha, angle, color, curvature, linetype, size
a + geom_path(lineend = "butt",
linejoin = "round", linemitre = 1)
x, y, alpha, color, group, linetype, size
a + geom_polygon(aes(alpha = 50)) - x, y, alpha,
color, fill, group, subgroup, linetype, size
b + geom_rect(aes(xmin = long, ymin = lat,
xmax = long + 1, ymax = lat + 1)) - xmax, xmin,
ymax, ymin, alpha, color, fill, linetype, size
a + geom_ribbon(aes(ymin = unemploy - 900,
ymax = unemploy + 900)) - x, ymax, ymin,
alpha, color, fill, group, linetype, size
To display values, map variables in the data to visual
properties of the geom (aesthetics) like size, color, and x
and y locations.
[Diagrams: data + geom + coordinate system = plot. The first
example maps x = F and y = A; the second also maps
color = F and size = A.]
Complete the template below to build a graph.
ggplot(data = mpg, aes(x = cty, y = hwy)) Begins a plot
that you finish by adding layers to. Add one geom
function per layer.
last_plot() Returns the last plot.
ggsave("plot.png", width = 5, height = 5) Saves the last plot
as a 5" x 5" file named "plot.png" in the working directory.
Matches file type to file extension.
LINE SEGMENTS
common aesthetics: x, y, alpha, color, linetype, size
b + geom_abline(aes(intercept = 0, slope = 1))
b + geom_hline(aes(yintercept = lat))
b + geom_vline(aes(xintercept = long))
b + geom_segment(aes(yend = lat + 1, xend = long + 1))
b + geom_spoke(aes(angle = 1:1155, radius = 1))
a <- ggplot(economics, aes(date, unemploy))
b <- ggplot(seals, aes(x = long, y = lat))
ONE VARIABLE
continuous
c <- ggplot(mpg, aes(hwy)); c2 <- ggplot(mpg)
c + geom_area(stat = "bin")
x, y, alpha, color, fill, linetype, size
c + geom_density(kernel = "gaussian")
x, y, alpha, color, fill, group, linetype, size, weight
c + geom_dotplot()
x, y, alpha, color, fill
c + geom_freqpoly()
x, y, alpha, color, group, linetype, size
c + geom_histogram(binwidth = 5)
x, y, alpha, color, fill, linetype, size, weight
c2 + geom_qq(aes(sample = hwy))
x, y, alpha, color, fill, linetype, size, weight
discrete
d <- ggplot(mpg, aes(fl))
d + geom_bar()
x, alpha, color, fill, linetype, size, weight
e + geom_label(aes(label = cty), nudge_x = 1,
nudge_y = 1) - x, y, label, alpha, angle, color,
family, fontface, hjust, lineheight, size, vjust
e + geom_point()
x, y, alpha, color, fill, shape, size, stroke
e + geom_quantile()
x, y, alpha, color, group, linetype, size, weight
e + geom_rug(sides = "bl")
x, y, alpha, color, linetype, size
e + geom_smooth(method = lm)
x, y, alpha, color, fill, group, linetype, size, weight
e + geom_text(aes(label = cty), nudge_x = 1,
nudge_y = 1) - x, y, label, alpha, angle, color,
family, fontface, hjust, lineheight, size, vjust
one discrete, one continuous
f <- ggplot(mpg, aes(class, hwy))
f + geom_col()
x, y, alpha, color, fill, group, linetype, size
f + geom_boxplot()
x, y, lower, middle, upper, ymax, ymin, alpha,
color, fill, group, linetype, shape, size, weight
f + geom_dotplot(binaxis = "y", stackdir = "center")
x, y, alpha, color, fill, group
f + geom_violin(scale = "area")
x, y, alpha, color, fill, group, linetype, size, weight
both discrete
g <- ggplot(diamonds, aes(cut, color))
g + geom_count()
x, y, alpha, color, fill, shape, size, stroke
e + geom_jitter(height = 2, width = 2)
x, y, alpha, color, fill, shape, size
THREE VARIABLES
seals$z <- with(seals, sqrt(delta_long^2 + delta_lat^2)); l <- ggplot(seals, aes(long, lat))
l + geom_raster(aes(fill = z), hjust = 0.5,
vjust = 0.5, interpolate = FALSE)
x, y, alpha, fill
l + geom_tile(aes(fill = z))
x, y, alpha, color, fill, linetype, size, width
h + geom_bin2d(binwidth = c(0.25, 500))
x, y, alpha, color, fill, linetype, size, weight
h + geom_density_2d()
x, y, alpha, color, group, linetype, size
h + geom_hex()
x, y, alpha, color, fill, size
continuous function
i <- ggplot(economics, aes(date, unemploy))
visualizing error
df <- data.frame(grp = c("A", "B"), fit = 4:5, se = 1:2)
j <- ggplot(df, aes(grp, fit, ymin = fit - se, ymax = fit + se))
maps
data <- data.frame(murder = USArrests$Murder,
state = tolower(rownames(USArrests)))
map <- map_data("state")
k <- ggplot(data, aes(fill = murder))
k + geom_map(aes(map_id = state), map = map)
+ expand_limits(x = map$long, y = map$lat)
map_id, alpha, color, fill, linetype, size
Geoms
Use a geom function to represent data points; use the geom's
aesthetic properties to represent variables. Each function
returns a layer.
TWO VARIABLES
both continuous
e <- ggplot(mpg, aes(cty, hwy))
continuous bivariate distribution
h <- ggplot(diamonds, aes(carat, price))
RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at ggplot2.tidyverse.org • ggplot2 3.3.5 • Updated: 2021-08
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>),
    stat = <STAT>, position = <POSITION>) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION> +
  <SCALE_FUNCTION> +
  <THEME_FUNCTION>
Only the data, geom function, and aesthetic mappings are
required; the other components are optional, with sensible
defaults supplied.
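For instance, a minimal sketch filling in every slot of the template with the built-in mpg data (all of the specific component choices here are illustrative):

```r
library(ggplot2)

# Every slot of the template filled in; geom_point, jitter,
# faceting by cyl, and the Dark2 palette are arbitrary choices.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = cty, y = hwy, color = drv),
             stat = "identity", position = "jitter") +
  coord_cartesian() +
  facet_wrap(vars(cyl)) +
  scale_color_brewer(palette = "Dark2") +
  theme_bw()
```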
l + geom_contour(aes(z = z))
x, y, z, alpha, color, group, linetype, size, weight
l + geom_contour_filled(aes(fill = z))
x, y, alpha, color, fill, group, linetype, size, subgroup
i + geom_area()
x, y, alpha, color, fill, linetype, size
i + geom_line()
x, y, alpha, color, group, linetype, size
i + geom_step(direction = "hv")
x, y, alpha, color, group, linetype, size
j + geom_crossbar(fatten = 2) - x, y, ymax,
ymin, alpha, color, fill, group, linetype, size
j + geom_errorbar() - x, ymax, ymin,
alpha, color, group, linetype, size, width
Also geom_errorbarh().
j + geom_linerange()
x, ymin, ymax, alpha, color, group, linetype, size
j + geom_pointrange() - x, y, ymin, ymax,
alpha, color, fill, group, linetype, shape, size
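Assembling the error-bar pieces into one runnable sketch (df and j as defined earlier in the "visualizing error" entry):

```r
library(ggplot2)

# Toy data: a fitted value and standard error per group
df <- data.frame(grp = c("A", "B"), fit = 4:5, se = 1:2)
j <- ggplot(df, aes(grp, fit, ymin = fit - se, ymax = fit + se))

# Point estimate plus a range for each group
j + geom_pointrange()
```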
Aes
Common aesthetic values:
color and fill - string ("red", "#RRGGBB")
linetype - integer or string (0 = "blank", 1 = "solid",
2 = "dashed", 3 = "dotted", 4 = "dotdash", 5 = "longdash",
6 = "twodash")
lineend - string ("round", "butt", or "square")
linejoin - string ("round", "mitre", or "bevel")
size - numeric (line width in mm)
shape - integer/shape name or
a single character ("a")
Stats
An alternative way to build a layer. A stat builds new
variables to plot (e.g., count, prop).
[Diagram: data + stat + geom + coordinate system = plot;
the stat computes x and ..count.. from the raw data.]
Visualize a stat by changing the default stat of a geom
function, geom_bar(stat="count") or by using a stat
function, stat_count(geom="bar"), which calls a default
geom to make a layer (equivalent to a geom function).
Use ..name.. syntax to map stat variables to aesthetics.
e + stat_density_2d(aes(fill = ..level..),
geom = "polygon")
Here stat_density_2d() is the stat function, ..level.. is the
variable created by the stat, and "polygon" is the geom to use.
c + stat_bin(binwidth = 1, boundary = 10)
x, y | ..count.., ..ncount.., ..density.., ..ndensity..
c + stat_count(width = 1) x, y | ..count.., ..prop..
c + stat_density(adjust = 1, kernel = "gaussian")
x, y | ..count.., ..density.., ..scaled..
e + stat_bin_2d(bins = 30, drop = T)
x, y, fill | ..count.., ..density..
e + stat_bin_hex(bins = 30) x, y, fill | ..count.., ..density..
e + stat_density_2d(contour = TRUE, n = 100)
x, y, color, size | ..level..
e + stat_ellipse(level = 0.95, segments = 51, type = "t")
l + stat_contour(aes(z = z)) x, y, z, order | ..level..
l + stat_summary_hex(aes(z = z), bins = 30, fun = max)
x, y, z, fill | ..value..
l + stat_summary_2d(aes(z = z), bins = 30, fun = mean)
x, y, z, fill | ..value..
f + stat_boxplot(coef = 1.5)
x, y | ..lower.., ..middle.., ..upper.., ..width.. , ..ymin.., ..ymax..
f + stat_ydensity(kernel = "gaussian", scale = "area") x, y
| ..density.., ..scaled.., ..count.., ..n.., ..violinwidth.., ..width..
e + stat_ecdf(n = 40) x, y | ..x.., ..y..
e + stat_quantile(quantiles = c(0.1, 0.9),
formula = y ~ log(x), method = "rq") x, y | ..quantile..
e + stat_smooth(method = "lm", formula = y ~ x, se = T,
level = 0.95) x, y | ..se.., ..x.., ..y.., ..ymin.., ..ymax..
ggplot() + xlim(-5, 5) + stat_function(fun = dnorm,
n = 20, geom = "point") x | ..x.., ..y..
ggplot() + stat_qq(aes(sample = 1:100))
x, y, sample | ..sample.., ..theoretical..
e + stat_sum() x, y, size | ..n.., ..prop..
e + stat_summary(fun.data = "mean_cl_boot")
h + stat_summary_bin(fun = "mean", geom = "bar")
e + stat_identity()
e + stat_unique()
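As a sketch of the geom/stat equivalence described above, both layers below draw the same bar chart of fl counts:

```r
library(ggplot2)

d <- ggplot(mpg, aes(fl))

# A geom function with its default stat ...
d + geom_bar(stat = "count")

# ... is equivalent to the stat function with its default geom.
d + stat_count(geom = "bar")
```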
Scales
Scales map data values to the visual values of an
aesthetic. To change a mapping, add a new scale.
n <- d + geom_bar(aes(fill = fl))
n + scale_fill_manual(
  values = c("skyblue", "royalblue", "blue", "navy"),
  limits = c("d", "e", "p", "r"), breaks = c("d", "e", "p", "r"),
  name = "fuel", labels = c("D", "E", "P", "R"))
A scale call has the form scale_<aesthetic to
adjust>_<prepackaged scale to use>(<scale-specific
arguments>). Common arguments: name (title to use in
legend/axis), labels (labels to use in legend/axis),
breaks (breaks to use in legend/axis), and limits (range
of values to include in the mapping).
GENERAL PURPOSE SCALES
Use with most aesthetics
scale_*_continuous() - Map continuous values to visual ones.
scale_*_discrete() - Map discrete values to visual ones.
scale_*_binned() - Map continuous values to discrete bins.
scale_*_identity() - Use data values as visual ones.
scale_*_manual(values = c()) - Map discrete values to
manually chosen visual ones.
scale_*_date(date_labels = "%m/%d",
date_breaks = "2 weeks") - Treat data values as dates.
scale_*_datetime() - Treat data values as date times.
Same as scale_*_date(). See ?strptime for label formats.
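As a sketch, the date arguments applied to a concrete scale on the built-in economics data (the format and break choices here are illustrative):

```r
library(ggplot2)

# Label the x axis by year, with a break every ten years
ggplot(economics, aes(date, unemploy)) +
  geom_line() +
  scale_x_date(date_labels = "%Y", date_breaks = "10 years")
```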
X & Y LOCATION SCALES
Use with x or y aesthetics (x shown here)
scale_x_log10() - Plot x on log10 scale.
scale_x_reverse() - Reverse the direction of the x axis.
scale_x_sqrt() - Plot x on square root scale.
COLOR AND FILL SCALES (DISCRETE)
n + scale_fill_brewer(palette = "Blues")
For palette choices:
RColorBrewer::display.brewer.all()
n + scale_fill_grey(start = 0.2,
end = 0.8, na.value = "red")
COLOR AND FILL SCALES (CONTINUOUS)
o <- c + geom_dotplot(aes(fill = ..x..))
o + scale_fill_distiller(palette = "Blues")
o + scale_fill_gradient(low = "red", high = "yellow")
o + scale_fill_gradient2(low = "red", high = "blue",
  mid = "white", midpoint = 25)
o + scale_fill_gradientn(colors = topo.colors(6))
Also: rainbow(), heat.colors(), terrain.colors(),
cm.colors(), RColorBrewer::brewer.pal()
SHAPE AND SIZE SCALES
p <- e + geom_point(aes(shape = fl, size = cyl))
p + scale_shape() + scale_size()
p + scale_shape_manual(values = c(3:7))
p + scale_radius(range = c(1,6))
p + scale_size_area(max_size = 6)
Coordinate Systems
r <- d + geom_bar()
r + coord_cartesian(xlim = c(0, 5)) - xlim, ylim
The default cartesian coordinate system.
r + coord_fixed(ratio = 1/2)
ratio, xlim, ylim - Cartesian coordinates with
fixed aspect ratio between x and y units.
ggplot(mpg, aes(y = fl)) + geom_bar()
Flip cartesian coordinates by switching
x and y aesthetic mappings.
r + coord_polar(theta = "x", direction=1)
theta, start, direction - Polar coordinates.
r + coord_trans(y = "sqrt") - x, y, xlim, ylim
Transformed cartesian coordinates. Set x and y
to the name of a transformation.
k + coord_quickmap()
k + coord_map(projection = "ortho", orientation
= c(41, -74, 0)) - projection, xlim, ylim
Map projections from the mapproj package
(mercator (default), azequalarea, lagrange, etc.).
Position Adjustments
Position adjustments determine how to arrange geoms
that would otherwise occupy the same space.
s <- ggplot(mpg, aes(fl, fill = drv))
s + geom_bar(position = "dodge")
Arrange elements side by side.
s + geom_bar(position = "fill")
Stack elements on top of one
another, normalize height.
e + geom_point(position = "jitter")
Add random noise to X and Y position of
each element to avoid overplotting.
e + geom_label(position = "nudge")
Nudge labels away from points.
s + geom_bar(position = "stack")
Stack elements on top of one another.
Each position adjustment can be recast as a function
with manual width and height arguments:
s + geom_bar(position = position_dodge(width = 1))
Themes
r + theme_bw()
White background
with grid lines.
r + theme_gray()
Grey background
(default theme).
r + theme_dark()
Dark for contrast.
r + theme_classic()
White background, no grid lines.
r + theme_light()
Light grey axes and grid lines.
r + theme_linedraw()
Black grid lines on white.
r + theme_minimal()
Minimal theme.
r + theme_void()
Empty theme.
Faceting
Facets divide a plot into
subplots based on the
values of one or more
discrete variables.
t <- ggplot(mpg, aes(cty, hwy)) + geom_point()
t + facet_grid(cols = vars(fl))
Facet into columns based on fl.
t + facet_grid(rows = vars(year))
Facet into rows based on year.
t + facet_grid(rows = vars(year), cols = vars(fl))
Facet into both rows and columns.
t + facet_wrap(vars(fl))
Wrap facets into a rectangular layout.
Set scales to let axis limits vary across facets.
t + facet_grid(rows = vars(drv), cols = vars(fl),
scales = "free")
x and y axis limits adjust to individual facets:
"free_x" - x axis limits adjust
"free_y" - y axis limits adjust
Set labeller to adjust facet label:
t + facet_grid(cols = vars(fl), labeller = label_both)
t + facet_grid(rows = vars(fl),
labeller = label_bquote(alpha ^ .(fl)))
Labels and Legends
Use labs() to label the elements of your plot.
t + labs(x = "New x axis label", y = "New y axis label",
title ="Add a title above the plot",
subtitle = "Add a subtitle below title",
caption = "Add a caption below plot",
alt = "Add alt text to the plot",
<aes> = "New <aes> legend title")
t + annotate(geom = "text", x = 8, y = 9, label = "A")
Places a geom with manually selected aesthetics.
p + guides(x = guide_axis(n.dodge = 2)) Avoid crowded
or overlapping labels with guide_axis(n.dodge or angle).
n + guides(fill = "none") Set legend type for each
aesthetic: colorbar, legend, or none (no legend).
n + theme(legend.position = "bottom")
Place legend at "bottom", "top", "left", or "right".
n + scale_fill_discrete(name = "Title",
labels = c("A", "B", "C", "D", "E"))
Set legend title and labels with a scale function.
Zooming
Without clipping (preferred):
t + coord_cartesian(xlim = c(0, 100), ylim = c(10, 20))
With clipping (removes unseen data points):
t + xlim(0, 100) + ylim(10, 20)
t + scale_x_continuous(limits = c(0, 100)) +
scale_y_continuous(limits = c(0, 100))
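The distinction matters when a layer computes statistics. A sketch with a fitted line (t2 is a hypothetical variant of t with a smoother added):

```r
library(ggplot2)

t2 <- ggplot(mpg, aes(cty, hwy)) + geom_point() +
  geom_smooth(method = lm)

# Zoom without clipping: the fit still uses every point.
t2 + coord_cartesian(xlim = c(10, 20))

# Clipping: points outside the limits are dropped first,
# so the fitted line changes.
t2 + xlim(10, 20)
```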
r + theme() Customize aspects of the theme such
as axis, legend, panel, and facet properties.
r + ggtitle("Title") + theme(plot.title.position = "plot")
r + theme(panel.background = element_rect(fill = "blue"))
Override defaults with the scales package.
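A sketch combining the theme tools above, starting from a complete theme and then overriding individual elements (all of the specific values are illustrative):

```r
library(ggplot2)

r <- ggplot(mpg, aes(fl)) + geom_bar()

r + theme_bw() +
  ggtitle("Title") +
  theme(
    plot.title.position = "plot",               # align title with plot edge
    legend.position = "bottom",
    panel.background = element_rect(fill = "grey95"),
    axis.text = element_text(size = 10)
  )
```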