Why do you decide to visualize data? And once you have decided to visualize your data, how do you know the best way to tell your story? To answer these questions, this talk will focus on the iterative design process, visualization fundamentals, and storytelling techniques. We will then anchor these principles in effective visualization examples.
Of particular importance is the ability and willingness to refine and redefine your objectives as you determine the right visualization for you (or your client/audience). This talk will walk through a Datascope client case study to convey the importance of a flexible and collaborative process in approaching data visualization problems and delivering the best end result.
The audience will walk away armed with helpful visualization techniques and an understanding of the iterative design process.
With the availability of powerful but relatively low-level plotting libraries like d3.js, plot.ly, and matplotlib, it is easier than it has ever been to create beautiful visualizations. However, these plotting libraries must be very general and thus quite complicated to accommodate arbitrarily complex plotting and visualization tasks. In this talk, I describe the plotting system used by yt, an analysis and visualization platform for volumetric data written in Python. The yt plotting system wraps matplotlib, creating a domain-specific API for creating publication quality plots that matches users' intuition for how they would like to explore and visualize their data. I will provide tips for designing and testing domain-specific plotting APIs so that the resulting plots are beautiful by default, but still modifiable with the full power of the underlying plotting library.
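The yt API itself isn't reproduced here, but the general pattern the talk describes - a thin, domain-specific wrapper that produces sensible plots by default while keeping the underlying matplotlib objects reachable - might look roughly like the following sketch. The `ProfilePlot` class, its defaults, and the sample data are hypothetical illustrations, not yt's actual interface.

```python
import matplotlib.pyplot as plt
import numpy as np

class ProfilePlot:
    """Hypothetical domain-specific wrapper: good defaults, matplotlib underneath."""

    def __init__(self, radius, density):
        self.figure, self.axes = plt.subplots(figsize=(6, 4))
        # Domain-appropriate defaults: log-log axes and labeled physical quantities.
        self.axes.loglog(radius, density, lw=2)
        self.axes.set_xlabel("Radius [kpc]")
        self.axes.set_ylabel(r"Density [g/cm$^3$]")
        self.axes.grid(alpha=0.3)

    def save(self, filename):
        self.figure.savefig(filename, dpi=300, bbox_inches="tight")

# One call gives a reasonable, publication-style plot...
r = np.logspace(0, 2, 50)
plot = ProfilePlot(r, 1e-24 * r**-2)
# ...but the full matplotlib API stays available for customization.
plot.axes.set_title("Radial density profile")
plot.save("profile.png")
```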
PLOTCON NYC: Interactive Visual Statistics on Massive DatasetsPlotly
Visualization is oftentimes the best way to explore raw data. But as data grows to include millions and billions of points, traditional visualization techniques break down. Whether you're loading the data into limited memory, or separating the signal from the noise when thousands of data points occupy each pixel, as data gets big, visualization gets challenging.
In this talk, Peter will describe an approach called "datashading" that deconstructs the classical infovis pipeline to place statistical processing at the heart of the visualization task. The result is a scalable, interactive system that is easy to use and produces perceptually accurate renderings of extremely large datasets. He will show the open-source Datashader library, which implements these ideas, and makes them available within Jupyter notebooks and Bokeh data applications.
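As a rough sketch of the datashading pipeline - aggregate points into a pixel-sized grid, then map the aggregates to colors - the core Datashader calls look something like this; the data frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd
import datashader as ds
import datashader.transfer_functions as tf

# A million synthetic points: far more points than pixels.
n = 1_000_000
df = pd.DataFrame({
    "x": np.random.standard_normal(n),
    "y": np.random.standard_normal(n),
})

# Step 1: rasterize - count how many points fall into each pixel-sized bin.
canvas = ds.Canvas(plot_width=800, plot_height=600)
agg = canvas.points(df, "x", "y")

# Step 2: shade - map per-pixel counts to colors with histogram equalization,
# so structure stays visible whether a pixel holds 1 point or 10,000.
img = tf.shade(agg, how="eq_hist")
img.to_pil().save("points.png")
```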
PLOTCON NYC: Custom Colormaps for Your FieldPlotly
Visualizations can be clear or obscure depending on the color scheme used to represent the data, and careful use of color can also be attractive. However, colormaps have not generally received the attention they deserve, given their significance. The colors used carry the responsibility of conveying data honestly and accurately. They should generally be perceptually uniform so that equal steps through the dataset are represented by equal perceptual jumps in the colormap. They should be intuitive to help support quick, natural understanding of the data. They should match basic properties of the data, like showing the presence of information (sequential) or anomalies in a field (diverging). Additionally, just as different variables are typically represented with different specific Greek letters when written, different variables should also be represented with different colormaps when plotted. A suite of colormaps called cmocean has been developed to meet the needs of oceanographers, and can be used by any plotter out there. The suite is freely available for many different software packages (including Python and R). You can use these colormaps to help convey your data honestly and accurately.
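Using one of the cmocean colormaps from Python, for instance, is just a matter of passing it to matplotlib; the temperature field below is synthetic, and `cmocean.cm.thermal` is one of the suite's sequential maps.

```python
import numpy as np
import matplotlib.pyplot as plt
import cmocean

# Synthetic sea-surface-temperature-like field.
x, y = np.meshgrid(np.linspace(0, 10, 200), np.linspace(0, 5, 100))
temperature = 15 + 10 * np.exp(-((x - 5) ** 2 + (y - 2.5) ** 2) / 4)

fig, ax = plt.subplots()
# A perceptually uniform, sequential colormap suited to temperature data.
mesh = ax.pcolormesh(x, y, temperature, cmap=cmocean.cm.thermal, shading="auto")
fig.colorbar(mesh, ax=ax, label="Temperature [°C]")
fig.savefig("sst.png", dpi=150)
```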
This document discusses using data to build new products and solve business problems. It outlines linking different data sets together and adding to existing data to gain new insights. Examples are given of tying demographic data to interest data to better understand audiences. Specific examples discussed include analyzing over 3.6 million tweets to understand trends around Halloween and using social listening, demographics, interests and history to inform dating predictions. The importance of clear visualizations and designing products around user workflows is emphasized.
Visualisation alone is not enough to solve most data analysis challenges. The data may be too big or too messy to show in a single plot. In this talk, I'll outline my current thinking about how the synthesis of visualisation, modeling, and data manipulation allows you to effectively explore and understand large and complex datasets. There are three key ideas:
1. Using tidyr to make a nested data frame, where one column is a list of data frames.
2. Using purrr to apply functional programming tools instead of writing for loops
3. Visualising models by converting them to tidy data with broom, by David Robinson.
This work is embedded in R so I'll not only talk about the ideas, but show concrete code for working with large sets of models. You'll see how you can combine the dplyr and purrr packages to fit many models, then use tidyr and broom to convert to tidy data which can be visualised with ggplot2.
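The talk itself is in R (dplyr, purrr, tidyr, broom, ggplot2), but the underlying idea - fit one small model per group, then collect the tidy results into a single table you can plot - has a rough pandas/NumPy analogue along these lines; the data and column names are invented, and the explicit loop stands in for what purrr::map does in R.

```python
import numpy as np
import pandas as pd

# Fake panel data: yearly measurements for several countries.
rng = np.random.default_rng(0)
rows = [
    {"country": c, "year": year, "value": 2.0 * (year - 2000) + rng.normal(scale=3)}
    for c in ["A", "B", "C"]
    for year in range(2000, 2020)
]
df = pd.DataFrame(rows)

# One tiny linear model (value ~ year) per country, collected as tidy rows.
records = []
for country, group in df.groupby("country"):
    slope, intercept = np.polyfit(group["year"], group["value"], deg=1)
    records.append({"country": country, "slope": slope, "intercept": intercept})

models = pd.DataFrame(records)
print(models)  # one row per model: ready to filter, join, or plot like any data
```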
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of WranglingPlotly
If you are struggling to make a plot, tear yourself away from stackoverflow for a moment and ... take a hard look at your data. Is it really in the most favorable form for the task at hand? Time and time again I have found that my visualization struggles are really a symptom of unfinished data wrangling. R has long had excellent facilities for data aggregation or "split-apply-combine": split an object into pieces, compute on each piece, and glue the result back together again. Recent developments, especially in the purrr package, have made "split-apply-combine" even easier and more general. But this requires a certain comfort level with lists, especially with lists that are columns inside a data frame. This is unfamiliar to most of us. I give an overview of this set of problems and match them up with solutions based on grouped, nested, and split data frames.
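A minimal pandas analogue of the split-apply-combine pattern described here, with the three steps written out explicitly and then collapsed into the idiomatic one-liner; the toy data is invented.

```python
import pandas as pd

df = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor"],
    "petal_length": [1.4, 1.3, 4.7, 4.5],
})

# Split: one piece per species.
pieces = {name: group for name, group in df.groupby("species")}
# Apply: compute on each piece.
means = {name: group["petal_length"].mean() for name, group in pieces.items()}
# Combine: glue the results back into a single tidy table.
result = pd.DataFrame({"species": list(means), "mean_petal_length": list(means.values())})

# The idiomatic one-liner hides the same three steps.
same = df.groupby("species", as_index=False)["petal_length"].mean()
print(result)
print(same)
```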
PLOTCON NYC: The Future of Business Intelligence: Data VisualizationPlotly
This document discusses the importance and rise of data visualization. It notes that we are in an era of "big data" where vast amounts of data are being generated and collected daily through activities like searching, browsing, communicating, shopping, and more. However, simply having data is not enough - the data needs to be easier to understand and act upon. The document argues that data visualization is an essential skill for communicating information to others in an efficient and effective way. It discusses some of the challenges in designing good visualizations that are readable, interpretable, meaningful, relevant and timely for audiences. The document provides tips on designing visualizations with the audience and comprehension in mind through techniques like annotation and animation.
PLOTCON NYC: New Data Viz in Data JournalismPlotly
In this talk I present a survey of forms and tools that are used by practicing data journalists. I walk through examples of different techniques used by journalists to convey complex information to readers, including static charts and graphics, probabilistic models, simulations, and others. I discuss the tools that are available for creating such storytelling devices, examining their successes and shortcomings, and speculate on future directions. I also look at how open source software has impacted journalism. The audience should walk away with a better understanding of how data journalists work in practice, what tools are available for citizen data journalists, and how journalists can work together with the open source community.
PLOTCON NYC: Data Science in the Enterprise From Concept to ExecutionPlotly
Data science can create incredible value for companies. Those that do it well use it as a tool for strategic differentiation in the market. However, generating value from data science, whether by embedding it into an actual product or using it to drive business strategy and operations, can be complex. Particularly with strategy and operations, delivering value in the enterprise from data science has a unique set of challenges. In this context, value is created only after the results of an analysis have led to actions in the line of business. Achieving that requires a complementary set of skills in addition to data analysis and modeling: business acumen, persuasion, coordination, processes, and execution. In this talk, we will discuss the concept of analytics to execution at Red Hat, and how delivering value from data science in the enterprise extends far beyond the traditional data science workflow.
PLOTCON NYC: Building a Flexible Analytics StackPlotly
ABSTRACT
Board rooms and business reports have long been stuffed with basic graphs and ugly charts. Now, inspired by the work that appears in today's newspapers and scientific journals, and powered by tools like Plotly, businesses are doing much more to find answers and insight in their data. This talk will outline both how companies, and particularly leading tech startups, are combining--and often building--technologies that collect, prepare, move, transform, and visualize data, and the problems these businesses are solving with this new stack of data tools.
PLOTCON NYC: Mapping Networked Attention: What We Learn from Social DataPlotly
At a time when attention is a scarce commodity, true power lies in understanding the networked nature of digital audiences - who is a central authority, who resides at the periphery, and how friends, followers and fans are inter-connected. It is no longer possible to demand one's attention, or even expect it at a certain point in time. For a message to spread, it must be picked out from overflowing streams of updates, photos and links, and chosen to be reposted by each individual. The networked nature of social media may give some messages an overwhelming boost in popularity, but in most cases they fade as fast as they were created. It is imperative that we use available data to better model, track and gain insight about our audience in order to make the optimal decision at any given time.
PLOTCON NYC: The Architecture of Jupyter: Protocols for Interactive Data Expl...Plotly
Project Jupyter, evolved from the IPython environment, provides a platform for interactive computing that is widely used today in research, education, journalism and industry. The core premise of the Jupyter architecture is to design tools around the experience of interactive computing, building an environment, protocol, file format and libraries optimized for the computational process when there is a human in the loop, in a live iteration with ideas and data assisted by the computer.
In this talk, I will discuss the basic ideas that underpin Jupyter, and how they provide "lego blocks" that enable the project team, and the broader community, to develop a variety of tools and approaches to problems in interactive computing, data science, visualization and more.
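To make the protocol side concrete: Jupyter frontends and kernels exchange JSON messages over ZeroMQ, and a simplified, hand-built execute_request (the message sent when you run a cell) looks roughly like the sketch below. The real wire format carries additional framing and an HMAC signature, so treat this as illustrative rather than a complete spec.

```python
import json
import uuid
from datetime import datetime, timezone

# A simplified execute_request, the message a frontend sends when a cell is run.
execute_request = {
    "header": {
        "msg_id": str(uuid.uuid4()),
        "msg_type": "execute_request",
        "session": str(uuid.uuid4()),
        "username": "demo",
        "date": datetime.now(timezone.utc).isoformat(),
        "version": "5.3",            # messaging protocol version
    },
    "parent_header": {},             # filled in on replies to link request and response
    "metadata": {},
    "content": {
        "code": "2 + 2",             # the cell's source
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": True,
    },
}

print(json.dumps(execute_request, indent=2))
```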
Summary of all tools and microsoft power biOmar Khan
This document introduces Microsoft Power BI and its tools for data visualization and reporting. It discusses how Power BI can support large data volumes, automated web reporting, and increased efficiency. Power BI tools like Power Pivot, Power View and Excel enable ad-hoc analysis, dashboards, and standard report automation from data marts and beyond Excel limits. Power BI solutions can be deployed on SharePoint for collaboration and on mobile devices.
Data Visualization Techniques in Power BIAngel Abundez
A progression from fundamental charts to more advanced ways to look at data. We end with Custom Visuals and R Visuals that extend this visualization platform.
PLOTCON NYC: PlotlyJS.jl: Interactive plotting in JuliaPlotly
PlotlyJS.jl is a Julia wrapper for the interactive JavaScript plotting library plotly.js. It provides two main layers: 1) a faithful representation of the plotly.js API to allow constructing plots and visualizations programmatically in JSON format, and 2) convenience functions and syntax to make common plotting tasks more natural in Julia, such as plotting data with a single function call or combining multiple plots into subplots. The library aims to make interactive visualization easy and publication-quality from Julia.
PLOTCON NYC: Text is data! Analysis and Visualization MethodsPlotly
Text is one of the most interesting and varied data sources on the web and beyond, but it is one of the most difficult to deal with because it is fundamentally a messy, fragmented, and unnormalized format. If you have ever wanted to analyze and visualize text, but don’t know where to get started, this talk is for you. Irene will go through examples of text visualization approaches and the analysis methods required to create them.
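One of the simplest starting points - tokenize, count, and plot term frequencies - needs only the standard library plus matplotlib; the sample sentences below are made up.

```python
import re
from collections import Counter
import matplotlib.pyplot as plt

docs = [
    "Text is messy, fragmented, and unnormalized.",
    "Visualizing text starts with counting words.",
    "Messy text still hides useful structure.",
]

# Tokenize crudely and count word frequencies across all documents.
tokens = [w for doc in docs for w in re.findall(r"[a-z']+", doc.lower())]
counts = Counter(tokens).most_common(10)

words, freqs = zip(*counts)
fig, ax = plt.subplots()
ax.barh(words[::-1], freqs[::-1])   # most frequent word on top
ax.set_xlabel("Frequency")
fig.tight_layout()
fig.savefig("word_counts.png")
```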
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016Tanya Cashorali
This document discusses various charting and visualization libraries for R and JavaScript. It outlines the evolution of charting libraries in R like base R, ggplot2, and plotly. It also lists popular JavaScript charting libraries like D3.js and Chart.js. The document demonstrates how charts like pie and bar charts can be created more efficiently in plotly compared to D3.js. It shows examples of dashboards created with Shiny and plotly that require less code than pure JavaScript equivalents. Finally, it discusses considerations for choosing R with plotly versus other options like D3.js and highlights benefits of using plotly in R.
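The talk's code is in R, but the point about how little code an interactive plotly chart needs translates directly to Python; a hedged equivalent of the bar-chart example might be:

```python
import plotly.express as px

# A handful of fake per-player stats; a few lines replace what would be
# dozens of lines of D3.js scales, axes, and DOM manipulation.
fig = px.bar(
    x=["Player A", "Player B", "Player C"],
    y=[23, 17, 31],
    labels={"x": "Player", "y": "Points"},
)
fig.write_html("points.html")   # fully interactive: hover, zoom, pan
```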
What’s New in the Berkeley Data Analytics StackTuri, Inc.
The document discusses the Berkeley Data Analytics Stack (BDAS) developed by UC Berkeley's AMPLab. It summarizes the key components of the BDAS including Spark, Mesos, Tachyon, MLlib, and Velox. It describes how the BDAS provides a unified platform for batch, iterative, and streaming analytics using in-memory techniques. It also discusses recent developments like KeystoneML/ML Pipelines for scalable machine learning and SampleClean for human-in-the-loop analytics. The goal is to make it easier to build and deploy advanced analytics applications on large datasets.
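As a small, hedged illustration of the unified, in-memory style of analytics the stack is built around, a Spark job written against the Python API might cache a dataset and reuse it across several aggregations; the event data below is a stand-in.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bdas-demo").getOrCreate()

# A tiny in-memory event log; a real job would read from HDFS, S3, etc.
events = spark.createDataFrame(
    [("2016-11-01", "alice"), ("2016-11-01", "bob"), ("2016-11-02", "alice")],
    ["date", "user_id"],
).cache()   # keep the dataset in memory and reuse it across analyses

daily_counts = events.groupBy("date").count()
top_users = (events.groupBy("user_id")
                   .agg(F.count("*").alias("n_events"))
                   .orderBy(F.desc("n_events")))

daily_counts.show()
top_users.show()
```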
Datascope: Designing your Data Viz - The (Iterative) ProcessMollie Pettit
This talk was given to a Data Visualization course, which is part of the Masters of Science in Analytics program at the Northwestern School of Engineering.
It walks through:
- Why to visualize data
- A common (linear) approach to data problems
- A look at a problem in an ambiguous world, and why the linear approach does not always lead to the ideal end point
- A better (iterative) approach
- How to get started on a project through the important practice of brainstorming
- An informal project example. In this example, an iterative approach to the visualization helped the creator gain new insights which changed her story's focus altogether.
- A case study of a project done for Procter & Gamble. In this example, an iterative approach redirected us from a more complicated network graph of the company (which we initially assumed would be the end result) to displaying the data in a simpler way (e.g. bar charts), which was more ideal for the client.
- Another case study. In this example, an iterative approach led us to create a less obvious / more creative visualization that stressed the things that were most important to the client. Nearly every iteration step (all of which were shown to the client) is shown in the slides.
It ends with a reminder that doing is better than planning. You really can't learn what your ideal end product will be until you get started; while working, you must constantly ask questions, gather feedback, and refine your approach accordingly.
This document outlines a data science competition to build a spam detector using email data. Participants will be provided with training data containing 600 emails and their corresponding labels (spam or not spam). They will use this data to build a model to classify new emails as spam or not spam. The goal is to correctly classify as many new test emails as possible. Visualization and interpretation of results will be important for evaluating model performance and identifying ways to improve the spam detection.
From Research Objects to Reproducible Science TalesBertram Ludäscher
University of Southampton. Electronics & Computer Science. Research Seminar (Invited Talk).
TITLE: From Research Objects to Reproducible Science Tales
ABSTRACT. Rumor has it that there is a reproducibility crisis in science. Or maybe there are multiple crises? What do we mean by reproducibility and replicability anyways? In this talk I will first make an attempt at sorting out some of the terminological confusion in this area, focusing on computational aspects. The PRIMAD model is another attempt to describe different aspects of reproducibility studies by focusing on the "delta" between those studies and the original study. In addition to these more theoretical investigations, I will discuss practical efforts to create more reproducible and more transparent computational platforms such as the one developed by the Whole-Tale project: here 'tales' are executable research objects that may combine data, code, runtime environments, and narratives (i.e., the traditional "science story"). I will conclude with some thoughts about the remaining challenges and opportunities to bridge the large conceptual gaps that continue to exist despite the recognition of problems of reproducibility and transparency in science.
ABOUT the Speaker. Bertram Ludäscher is a professor at the School of Information Sciences at the University of Illinois, Urbana-Champaign and a faculty affiliate with the National Center for Supercomputing Applications (NCSA) and the Department of Computer Science at Illinois. Until 2014 he was a professor at the Department of Computer Science at the University of California, Davis. His research interests range from practical questions in scientific data and workflow management, to database theory and knowledge representation and reasoning. Prior to his faculty appointments, he was a research scientist at the San Diego Supercomputer Center (SDSC) and an adjunct faculty at the CSE Department at UC San Diego. He received his M.S. (Dipl.-Inform.) in computer science from the University of Karlsruhe (now K.I.T.), and his PhD (Dr. rer. nat.) from the University of Freiburg, in Germany.
This document summarizes Spotify's approach to music discovery and recommendations using machine learning techniques. It discusses how Spotify analyzes billions of user streams to find patterns and make recommendations using collaborative filtering and latent factor models. It also explores combining multiple models like recurrent neural networks, word2vec, and gradient boosted decision trees to improve recommendations. The challenges of evaluating recommendations and optimizing for the right metrics are also summarized.
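A toy version of the latent-factor idea behind collaborative filtering - approximate the user-item play matrix as a product of two low-rank factor matrices - fits in a few lines of NumPy. This is a bare-bones stochastic-gradient sketch on made-up play counts, not Spotify's production approach.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny implicit-feedback matrix: rows are users, columns are tracks,
# entries are play counts. 0 means "not observed".
plays = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

n_users, n_items = plays.shape
k = 2                                   # number of latent factors
U = 0.1 * rng.standard_normal((n_users, k))
V = 0.1 * rng.standard_normal((n_items, k))

lr, reg = 0.01, 0.1
observed = np.argwhere(plays > 0)

# Plain stochastic gradient descent on squared error over observed entries.
for _ in range(2000):
    u, i = observed[rng.integers(len(observed))]
    err = plays[u, i] - U[u] @ V[i]
    U[u] += lr * (err * V[i] - reg * U[u])
    V[i] += lr * (err * U[u] - reg * V[i])

# Predicted affinity for every (user, track) pair, including unseen ones.
print(np.round(U @ V.T, 2))
```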
1. The document discusses creative thinking tools and techniques for generating hundreds of ideas in minutes to produce new solutions.
2. It provides examples of different creative thinking tools including checklists, forced relationships, idea grids, PCP (Pluses, Concerns, Potentials) and hits and misses ranking.
3. The document advocates that every aspect of teaching can be made more creative to help students generate more ideas and solutions.
Replication in Data Science - A Dance Between Data Science & Machine Learning...June Andrews
We use Iterative Supervised Clustering as a simple building block for exploring Pinterest's content. But simplicity can unlock great power, and with this building block we show the shocking result of how hard it is to replicate data science conclusions. This pushes us to ask about the future: When is Data Science a House of Cards?
How the Web can change social science research (including yours)Frank van Harmelen
A presentation for a group of PhD students from the Leibniz Institutes (section B, social sciences) to discuss how they could use the Web, and even better the Web of Data, as an instrument in their research.
Image is Everything: Exploring Visual Literacy for Critical Thinking EdTechTe...Amy Burvall
From cave walls to Facebook walls we have always embraced visual communication. Dual coding theory of cognition reiterates the importance of visual imagery in respect to our thinking processes - that in fact we need visual language in addition to verbal or text-based coding of stimuli. With the changing media landscape, our streams, memes, and zines have exploded with imagery, ushering in a need for visual literacy skills. We are quickly moving from images as decoration and augmentation to images as sole content and communication tool. We have some false beliefs about visual language - that it is equated with “art”, requiring “talent” from “creative types” - and therefore it is unfortunately often not overtly taught and practiced in schools. Technology has affected knowledge in such a way as to diminish the value of “raw” information and increase the value of sense-making, as well as chip away at attention spans, sparking a need for distillation of complex ideas. Images can essentialize the cumbersome in beautiful ways. They have a “stickiness” for the viewer and challenge the critical thinking of the creator.
Please note: the videos will not play, but they are located in their respective categories on the G+ community.
Workshop trailer: https://www.youtube.com/watch?v=BYNQ2hzbeQI
Workshop Resources: https://plus.google.com/u/1/communities/113762614515763343967
Deep multimodal intelligence by Xiaodong He from MicrosoftBill Liu
Presented at AI NEXTCon Seattle 1/17-20, 2018
http://aisea18.xnextcon.com
Join our free online AI group with 50,000+ tech engineers to learn and practice AI technology, including the latest AI news, tech articles/blogs, tech talks, tutorial videos, and hands-on workshops/codelabs on machine learning, deep learning, data science, etc.
Big, Open, Data and Semantics for Real-World Application Near YouBiplav Srivastava
(This is material presented as keynote at AMECSE 2014 on 21 Oct 2014 at Cairo, Egypt.)
State-of-the-art Artificial Intelligence (AI) and data management techniques have been demonstrated to process large volumes of noisy data, extract meaningful patterns, and drive decisions in diverse applications ranging from space exploration (NASA's Curiosity) and game shows (IBM's Watson on Jeopardy™) to consumer products (Apple's SIRI™ voice recognition). However, what stops them from helping us with more mundane things like fighting diseases, eliminating hunger, improving the commute to work, or reducing financial fraud and corruption? Consumable data!
In this talk, Biplav will demonstrate and discuss how large volumes of data (Big), made available publicly (Open), can be productively used with semantic web and analytical techniques to drive day-to-day applications. One important source of this type of data is government open data, which comes from governments and is free to be reused. Big Open Data is leading to early examples of "open innovations" - a confluence of open data (e.g., Data.gov, data.gov.in), accessible via API techniques (e.g., Open 311), annotated with semantic information (e.g., W3C ontologies, Schema.org), and processed with analytical techniques (e.g., R, Weka) to drive actionable insights. The talk will illustrate how this can help bring increased benefits to citizens, and discuss research issues that can accelerate its pace. It is increasingly being adopted by progressive businesses and governments to drive innovation that matters.
A New Year in Data Science: ML UnpausedPaco Nathan
This document summarizes Paco Nathan's presentation at Data Day Texas in 2015. Some key points:
- Paco Nathan discussed observations and trends from the past year in machine learning, data science, big data, and open source technologies.
- He argued that the definitions of data science and statistics are flawed and ignore important areas like development, visualization, and modeling real-world business problems.
- The presentation covered topics like functional programming approaches, streaming approximations, and the importance of an interdisciplinary approach combining computer science, statistics, and other fields like physics.
- Paco Nathan advocated for newer probabilistic techniques for analyzing large datasets that provide approximations using fewer resources compared to traditional batch processing approaches.
This document provides an overview of a machine learning course. It outlines the course structure, including topics covered, assignments, and grading. The course covers fundamental machine learning algorithms for classification, regression, clustering, and dimensionality reduction. It also discusses applications of machine learning like spam filtering, recommender systems, and chess-playing programs.
This document provides an overview of machine learning and feature engineering. It discusses how machine learning can be used for tasks like classification, regression, similarity matching, and clustering. It explains that feature engineering involves transforming raw data into numeric representations called features that machine learning models can use. Different techniques for feature engineering text and images are presented, such as bag-of-words and convolutional neural networks. Dimensionality reduction through principal component analysis is demonstrated. Finally, information is given about upcoming machine learning tutorials and Dato's machine learning platform.
Overview of Machine Learning and Feature EngineeringTuri, Inc.
Machine Learning 101 Tutorial at Strata NYC, Sep 2015
Overview of machine learning models and features. Visualization of feature space and feature engineering methods.
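A compact example of the text feature-engineering pipeline mentioned above - raw strings to bag-of-words counts to a low-dimensional representation - using scikit-learn; the four-document corpus is invented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import PCA

corpus = [
    "cheap meds buy now",
    "meeting moved to friday",
    "buy cheap watches now",
    "notes from friday's meeting",
]

# Bag-of-words: each document becomes a vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus).toarray()
print(vectorizer.get_feature_names_out())

# Dimensionality reduction: project the count vectors down to 2 components,
# which is often enough to see the two kinds of documents separate.
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d)
```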
Talk given at the 6th Irish NLP Meetup on query understanding using conceptual slices and word embeddings.
https://www.meetup.com/NLP-Dublin/events/237998517/
My talk at the Scandinavian Developer Conference 2010 about following the wrong principles and getting too excited about shiny demos rather than building things that work and proving our technologies as professional tools.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Natural Language Processing (NLP), RAG and its applications .pptxfkyes25
1. In the realm of Natural Language Processing (NLP), knowledge-intensive tasks such as question answering, fact verification, and open-domain dialogue generation require the integration of vast and up-to-date information. Traditional neural models, though powerful, struggle with encoding all necessary knowledge within their parameters, leading to limitations in generalization and scalability. The paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" introduces RAG (Retrieval-Augmented Generation), a novel framework that synergizes retrieval mechanisms with generative models, enhancing performance by dynamically incorporating external knowledge during inference.
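In outline, the retrieve-then-generate loop that RAG formalizes looks like the sketch below; `embed`, `generate`, and the tiny in-memory index are placeholders standing in for a trained dense retriever, a language model, and a real vector store.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder encoder: a real system would use a trained dense retriever."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

def generate(prompt: str) -> str:
    """Placeholder generator standing in for a seq2seq or decoder-only LM."""
    return f"[answer conditioned on]: {prompt[:80]}..."

# A tiny "knowledge base" of passages with precomputed embeddings.
passages = [
    "The Eiffel Tower was completed in 1889.",
    "Mount Everest is the highest mountain above sea level.",
    "Python was created by Guido van Rossum.",
]
index = np.stack([embed(p) for p in passages])

def rag_answer(question: str, k: int = 2) -> str:
    # 1. Retrieve: score every passage against the question embedding.
    scores = index @ embed(question)
    top = np.argsort(scores)[::-1][:k]
    # 2. Generate: condition the generator on the question plus retrieved passages.
    context = " ".join(passages[i] for i in top)
    return generate(f"question: {question} context: {context}")

print(rag_answer("Who created Python?"))
```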
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change?", the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
PLOTCON NYC: Get Your Point Across: The Art of Choosing the Right Visualization for your Data
1. get your point across
for PLOTCON • Nov 16, 2016
the art of choosing the right visualization for your data
Mollie Pettit @MollzMP
Jess Freaner @jessfreaner
2. agenda
- why visualize data?
- visualization fundamentals
- storytelling techniques
- iterative design process
- iteration in action: a case study
44. I think you learn about computer safety. :)
When you are a genis at electronics :)
Using code and fixing and making computers. :)
45. science on a computer
programming / coding
how computers work
how to use computers
studying computers
a website, program, or game
typing / testing
using computers to solve problems
engineering
someone good at computers
internet safety
experiments / research / modeling
I like it!
class / learning / lessons
making apps, games, or websites
I don’t know
what’s inside a computer
76. takeaways
- fundamentals / tradeoffs
- iterate and be flexible
- use context, audience, purpose to guide you
77. Mollie Pettit @MollzMP
Jess Freaner @jessfreaner
Thanks!
@DsAtweet
get your point across
the art of choosing the right visualization for your data
Editor's Notes
Quote:
The power of the unaided mind is highly overrated…
Without external aids, memory, thought, and reasoning are all constrained. But human intelligence is highly flexible and adaptive, superb at inventing procedures and objects that overcome its own limits.
The real powers come from devising external aids that enhance cognitive abilities.
How have we increased memory, thought, and reasoning? By the invention of external aids: It is things that make us smart.
— Don Norman
Things That Make Us Smart:
Defending Human Attributes in the Age of the Machine
There are two reasons …
One you might often overlook: taking a look at patterns visually will help you have a better understanding of a data set
When you do exploratory analysis, it’s like hunting for pearls in oysters. We might have to open 100 oysters (test 100 different hypotheses or look at the data in 100 different ways) to find perhaps two pearls.
helps to get to more interesting questions.
The great thing about this kind of exploratory work is that it takes very little effort. You can do it quick and dirty, for your eyes only - no need to worry about fancy titles or pretty colors. Just glean insight.
Mona Chalabi’s hand-drawn data sketches
When you do exploratory analysis, it’s like hunting for pearls in oysters. We might have to open 100 oysters (test 100 different hypotheses or look at the data in 100 different ways) to find perhaps two pearls.
When communicating analysis to your audience, you will likely have a specific story you want to tell - probably about those two pearls
Too often, people simply present the data - all 100 oysters - which makes the audience reopen all of the oysters all over again! It is often best to concentrate on the pearls, the information your audience needs to know to understand your story / narrative
but when there’s an audience sometimes there are better choices
particularly for a given context, especially when you consider the end purpose of the visualizations, end users / audience
the same exact choices in different situations can make a visualization successful or completely miss the mark
With all the geocoded data available, it’s quite popular to put data visualizations on maps. But something to be careful of is that oftentimes these end up being proxies for population density.
For instance, consider what our stick figure friend is pointing out here:
Our site’s users
subscribers to Martha Stewart Living
Consumers of furry pornography
These all just show that more of each group of people exist in more populous regions. :facepalm:
use maps only when geographic data is relevant
John Nelson
@John_M_Nelson
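One common way to sidestep the population-density trap is to normalize raw counts before mapping them. Below is a minimal sketch of that idea, assuming pandas and purely hypothetical region names, counts, and populations (none of this comes from the slides):

    import pandas as pd

    # Hypothetical regions: raw counts of "our site's users" plus population
    df = pd.DataFrame({
        "region": ["A", "B", "C"],
        "users": [90000, 12000, 800],
        "population": [8000000, 900000, 50000],
    })

    # Raw counts mostly mirror population; a per-capita rate does not
    df["users_per_1000"] = df["users"] / df["population"] * 1000
    print(df[["region", "users_per_1000"]])

Mapping the per-1,000 rate (or skipping the map entirely when geography isn’t the point) keeps the visualization from simply redrawing population density.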
Discuss what human eyes can detect. What they’re good at, what they’re not.
The one thing pie charts are good at is comparing 2-3 data points with very different amounts of information.
The human eye isn’t good at ascribing quantitative value to two-dimensional space. Said more simply: pie charts are hard for people to read. When pieces are close in size, it’s difficult (or likely impossible) to tell which is bigger. When they aren’t close in size, the best you can do is determine that one is bigger than the other, but you can’t judge by how much.
These pie charts, however, are perfect examples of good pie charts. Because they are only comparing two items within the pie chart, you can gather a lot of information.
Leonardo - more than half know him as Renaissance artist
Michelangelo - less than half
Donatello - less than a quarter
Raphael - just over half
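A minimal sketch of a pie chart in this two-slice spirit, assuming matplotlib and made-up percentages (the survey figures above are only paraphrased, not reproduced): with just two pieces, the eye can reliably judge "more than half" versus "less than half."

    import matplotlib.pyplot as plt

    # Made-up split, standing in for "knows Leonardo as a Renaissance artist" vs. not
    labels = ["Renaissance artist", "something else"]
    shares = [55, 45]

    fig, ax = plt.subplots()
    ax.pie(shares, labels=labels, startangle=90)
    ax.set_title("Who is Leonardo? (hypothetical numbers)")
    plt.show()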
Bar and column graphs are great representations of categorical data, in which you can count the number of different categories. Bar charts are sometimes avoided because they are common. Mistake! One of the strong points of bar charts is that they are common, requiring less of a learning curve for their audience. Rather than needing to use brain power to understand how to read the graph, your audience spends it figuring out what information to take away from the visual.
Bar charts are easy for our eyes to read and understand. Our eyes compare the end points of the bars, so it is easy to see quickly which category is the biggest, which is smallest, and also the incremental differences between categories.
Because of how our eyes compare the relative end points of the bars, it is important that they have a zero baseline to prevent giving false visual impressions.
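As a minimal sketch of that zero-baseline point, assuming matplotlib and invented category counts (not the actual survey data):

    import matplotlib.pyplot as plt

    # Invented counts for a handful of categories
    categories = ["coding", "games", "how computers work", "don't know"]
    counts = [42, 35, 28, 11]

    fig, ax = plt.subplots()
    ax.bar(categories, counts)
    ax.set_ylim(bottom=0)  # zero baseline keeps the bar lengths honest
    ax.set_ylabel("number of responses")
    plt.show()

Starting the axis anywhere above zero would exaggerate the differences our eyes read off the bar endpoints.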
sometimes the simplest viz is best
conveys 1 (or few) message(s)
easy to digest
but even with the seemingly simple…. there are choices
Bar graphs and line graphs can seem nearly interchangeable, but generally line graphs work best for continuous data, whereas bar and column graphs work best for categorical data. Continuous data is quantitative; you cannot count the number of different values. This includes data like sales, height, profit, time, etc. (Although time can also be treated as categorical, as in the last example where the bars were clumped into decades.)
Examples: Price in dinner bill compared to tip, tuition cost at a university over time, etc.
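A minimal side-by-side sketch of that distinction, assuming matplotlib and invented numbers: a line for a continuous series (tuition over time) and bars for counts by category.

    import matplotlib.pyplot as plt

    # Invented data, purely illustrative
    years = [2012, 2013, 2014, 2015, 2016]
    tuition = [11.2, 11.9, 12.5, 13.4, 14.1]   # continuous -> line
    majors = ["CS", "Math", "Physics", "Biology"]
    enrolled = [120, 80, 45, 95]               # categorical -> bars

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))
    ax1.plot(years, tuition, marker="o")
    ax1.set_title("tuition over time (continuous)")
    ax2.bar(majors, enrolled)
    ax2.set_ylim(bottom=0)
    ax2.set_title("enrollment by major (categorical)")
    plt.tight_layout()
    plt.show()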
It may be easy to discount tables, but tables still have their place. Tables are great for communicating to a mixed audience whose members will each look for their particular row of interest
Joey Cherdarchuk
depending on what you want to highlight or represent, the same data can manifest in radically different ways.
to illustrate the power and importance of considering what story you’re telling, we’ll revisit that earlier drought visualization.
same data can be presented in different ways depending on purpose
This map uses the EXACT same data as the other map (and by the same person, so the way they handled, cleaned, and analyzed it is also the same) - this one just stresses where droughts were RARE.
in both cases, these are relatively simple stories that you can quickly absorb (at least on a high level — drought happening in these areas and not in these)
with more complex stories, you need to provide much more information / context before thrusting everything upon your audience
some folks who are particularly great at this are at the NY Times’s Upshot, so we’ll use Amanda Cox’s story / visualization about the voting habits of Americans to help illustrate an effective way to do this.
@Nate_Cohn, @amandacox
ensure results are communicable
avoid solving the wrong problem
encourage more / better data
Right now, only 25% of the K-12 schools in the U.S. offer computer science with programming and coding, and only 28 states allow those courses to count towards high school graduation requirements, according to the White House.
Recognizing Chicago's success, the White House launched a national CS4All initiative, which aims to provide funding for states and school districts with the goal of making sure every K-12 student has access to computer science curriculum.
Just this year, the Chicago Public Board of Education announced that computer science courses would become a graduation requirement for all high school students. The Chicago Public School district is working with Code.org and other organizations to further develop a CS education curriculum to implement across all its high schools.
In doing so, Chicago Public Schools became the first school district in the country to elevate computer science as a core requirement for high school, separate from math and science.
"Making sure that our students are exposed to STEM and computer science opportunities early on is critical in building a pipeline to both college to career," said Mayor Emanuel. "Requiring computer science as a core requirement will ensure that our graduates are proficient in the language of the 21st century so that they can compete for the jobs of the future."
bar charts
very good for easy comparison
scales well
no granularity
no concept of the size of the study
no individuality to students - all clumped together
make comparisons very easy
scale well
show relative sizes in a compact amount of space
harder to gauge relative sizes, but these are fun
subcategories can be patterns inside the bubble
emphasize the number of objects more than a bar chart - distributed in 2D
- emphasize the fragmentedness
relative size can be determined but comparisons are harder
other extractable data
could also do this for emotion words, like exciting, cool, fun
as well as proper nouns such as code.org, google, and angry birds
- more visually compelling
- emphasizes the size of the study and *students* rather than aggregates
- Gives personality and individuality to each person
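A minimal sketch of the one-dot-per-student idea, assuming matplotlib and a small invented set of responses (not the real survey): each marker is a student, stacked within its response category, rather than a single aggregate bar.

    import matplotlib.pyplot as plt
    from collections import Counter

    # Invented responses - one entry per hypothetical student
    responses = ["coding"] * 14 + ["games"] * 9 + ["how computers work"] * 6 + ["don't know"] * 4
    counts = Counter(responses)

    fig, ax = plt.subplots()
    for x, (label, n) in enumerate(counts.items()):
        ax.scatter([x] * n, range(n), s=40)   # one dot per student
    ax.set_xticks(range(len(counts)))
    ax.set_xticklabels(list(counts), rotation=20, ha="right")
    ax.set_yticks([])
    plt.show()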
blurb list: quick to implement, shows everything at once
slide blurbs: show more sentences, gives user control to browse or just watch them cycle through, more compact
crowd blurbs: cute, humanizes the dots, not as much control for the user
learn rules of thumb and best practices
but don’t be constrained by them
know them, to make informed decisions for when to follow and when to break
use context, audience, purpose to guide you