This document discusses the importance of data fluency skills in the 21st century. It defines key terms like data science, machine learning, data literacy, and statistical literacy. While these fields require extensive training, the document argues that domain expertise combined with basic data analysis skills can solve many problems. These basic skills include understanding data structures, using programming to interact with data, and exploratory data analysis through visualization. The data analysis process involves defining problems, collecting and preparing data, visualization and modeling, and communicating results. RStudio is presented as a tool that can support the entire data analysis process within a single integrated development environment.
A huge amount of data is being collected everywhere: when we browse the web, go to the doctor's clinic, visit the supermarket, tweet, or watch a movie. This plethora of data is handled under a new realm called data science. Data science is now recognized as a critical, growing area with impact across many sectors, including science, government, finance, health care, social networks, manufacturing, advertising, retail, and others. This colloquium will try to provide an overview and clarify the bits and bobs of this emerging field.
Smart Data Slides: Data Science and Business Analysis - A Look at Best Practi... (DATAVERSITY)
Google “citizen data scientist” today and you will see about 1M results. That number is data. It may be interesting, but it is meaningless without context. Sometimes it appears that we are drowning in data from systems and sensors but starving for insights. We definitely produce more of the former than the latter, which has created demand for more powerful tools to simplify the process and lower the skills requirement for analysis. As vendors build systems to meet this demand, we hear about the coming “democratization” of big data as more people at varying levels within organizations are empowered to find meaning and improve their own performance with data-driven insights. This is a good thing, but it does require caution.
To paraphrase Col. Jessup in A Few Good Men: You want answers? You can’t handle the data.
In this webinar, we will survey emerging approaches to simplifying analysis, and discuss the benefits, dangers, and skills required for individuals and organizations to thrive in the brave new world of analytics everywhere, for everyone.
This presentation briefly explains the following topics:
Why is Data Analytics important?
What is Data Analytics?
Top Data Analytics Tools
How to Become a Data Analyst?
Data science is in high demand, and the melting pot of complex skills a qualified data scientist requires has made them the unicorns of today's data-driven landscape.
This presentation gives an insight into what big data and data analytics are, the difference between big data and data science, and salary trends in big data analytics.
Data scientist has been called the sexiest job of the twenty-first century. As data in every industry keeps growing, the need to organize, explore, analyze, predict, and summarize it is insatiable. Data science is creating new paradigms in data-driven business decisions. As the field emerges from its infancy, a wide range of skill sets is becoming an integral part of being a data scientist. In this talk I will discuss the different data-driven roles and the expertise required to be successful in them. I will highlight some of the unique challenges and rewards of working in a young and dynamic field.
Effectiveness of Data Analytics and Big Data in United States Presidential Elections, Polls, Voting and Campaigns. U.S. presidential elections are the most talked-about topic nowadays. Who will win the race: Donald Trump or Hillary Clinton? This presentation gives an insight into how people can use data analytics approaches to achieve specific goals and gain insight into target users.
The Future of Business Intelligence: Data Visualization (Kristen Sosulski)
How can data visualization be used as a platform to reveal intelligent insights and help business analysts make timely decisions? In this talk, Kristen Sosulski will discuss the opportunities for personalized, location-aware, context-relevant, and platform-independent information visualizations as a toolkit for business analysts.
Data science is different from data analytics, data engineering, and big data.
Presentation about Data Science.
What is data science: its process, future, and scope.
Data Science Presentation By Amit Singh.
"Sexiest job of 21st century"
THIRUVANANTHAPURAM, JULY 19:
Marlabs, a Bangalore-based provider of IT services, is sponsoring a ‘Business Intelligence Technology’ conference at the Thiruvananthapuram Technopark on Friday.
The event will focus on emerging trends in Business Intelligence (BI) Technology, a Marlabs spokesman said.
It will feature eminent speakers from leading information technology companies including Marlabs, Infosys, UST Global, NeST and Kreara.
The conference will discuss the latest developments in emerging BI areas such as predictive analytics, Big Data, mobile BI, social BI and advanced visualisations. It will also highlight the growing job opportunities for newly graduated software professionals in Tier II and Tier III cities.
TechWise with Eric Kavanagh, Dr. Robin Bloor and Dr. Kirk Borne
Live Webcast on July 23, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=59d50a520542ee7ed00a0c38e8319b54
Analytical applications are everywhere these days, and for good reason. Organizations large and small are using analytics to better understand every aspect of their business: customers, processes, behaviors, even competitors. There are several critical success factors for using analytics effectively: 1) know which kinds of apps make sense for your company; 2) figure out which data sets you can use, both internal and external; 3) determine optimal roles and responsibilities for your team; 4) identify where you need help, either by hiring new employees or using consultants; and 5) manage your program effectively over time.
Register for this episode of TechWise to learn from two of the most experienced analysts in the business: Dr. Robin Bloor, Chief Analyst of The Bloor Group, and Dr. Kirk Borne, Data Scientist, George Mason University. Each will provide their perspective on how companies can address each of the key success factors in building, refining and using analytics to improve their business. There will then be an extensive Q&A session in which attendees can ask detailed questions of our experts and get answers in real time. Registrants will also receive a consolidated deck of slides, not just from the main presenters, but also from a variety of software vendors who provide targeted solutions.
Visit InsideAnalysis.com for more information.
New AI-based analytics accelerate truth-finding missions along the typical dimensions: Who, When, Where, Why, What, How and How Much.
In this very practical webinar, Johannes Scholtes (ZyLAB) and Paul Starrett (a licensed attorney and private investigator with extensive experience in high-profile investigations) will talk with Mary Mack (ACEDS) and illustrate how these techniques help legal professionals speed up the eDiscovery process and improve its quality.
Data Science Applications | Data Science For Beginners | Data Science Trainin... (Edureka!)
** Data Science Certification using R: https://www.edureka.co/data-science **
This Edureka "Data Science Applications" PPT takes you through the various domains in which data science is being deployed today, along with some potential applications of this technology. The world today runs on data and this PPT shows exactly that.
Check out our Data Science Tutorial blog series: http://bit.ly/data-science-blogs
Check out our complete Youtube playlist here: http://bit.ly/data-science-playlist
Gayatri Patel, eBay, presents at the Big Analytics 2012 Roadshow
The wonders of what data can do for an organization are measured in the productivity and competitiveness of its teams' decisions. Some believe more data is the key. Agreed... but good decisions require more than just deriving intelligence from big data. In this dynamic market, the ability to socialize and evolve ideas with other teams, quickly correlate information across sources, and test ideas so as to fail fast is a strong enabler for gaining competitive footing. eBay's analytic and technology advancements yield insights and approaches that continue to help our employees tell their "data stories" and make better decisions.
Big data for situation awareness and decision making (Paloma Diaz)
In this lecture we analyse some design challenges and approaches for envisioning systems that help people make decisions in the era of big data and advanced interaction.
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq... (Dana Gardner)
Transcript of a discussion on how HTI Labs in London provides the means and governance with their Schematiq tool to bring critical data to the interface that users want most.
Understanding Data Science: Unveiling the Basics
What is Data Science?
Data science is an interdisciplinary field that combines techniques from statistics, mathematics, computer science, and domain knowledge to extract insights and knowledge from data. It involves collecting, processing, analyzing, and interpreting large and complex datasets to solve real-world problems.
Importance of Data Science
In today's data-driven world, organizations are inundated with data from various sources. Data science allows them to convert this raw data into actionable insights, enabling informed decision-making, improved efficiency, and innovation.
Intersection of Data Science, Statistics, and Computer Science
Data science borrows heavily from statistics and computer science. Statistical methods help in understanding data patterns, while computer science provides the tools to process and analyze large datasets efficiently.
Key Components of Data Science
Data Collection and Storage
The first step in data science is gathering relevant data from various sources. This data is then stored in databases or data warehouses for further processing.
Data Cleaning and Preprocessing
Raw data is often messy and inconsistent. Data cleaning involves removing errors, duplicates, and irrelevant information. Preprocessing includes transforming data into a usable format.
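The cleaning step above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a prescribed method: the record layout (id, age, city), the 0 to 120 age range, and the normalization rule (trim and title-case city names) are all assumptions chosen for the example.

```python
def clean_records(records):
    """Drop exact duplicates and rows with missing or invalid fields.

    Each record is assumed to be a dict with 'id', 'age', and 'city' keys
    (a hypothetical layout chosen for illustration).
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Normalize text so "paris " and "Paris" are treated as the same value.
        city = (rec.get("city") or "").strip().title()
        age = rec.get("age")
        # Reject rows with a missing city or an implausible age (assumed rule).
        if not city or not isinstance(age, int) or not (0 <= age <= 120):
            continue
        key = (rec.get("id"), age, city)
        if key in seen:  # duplicate once normalized
            continue
        seen.add(key)
        cleaned.append({"id": rec.get("id"), "age": age, "city": city})
    return cleaned


raw = [
    {"id": 1, "age": 34, "city": "paris "},
    {"id": 1, "age": 34, "city": "Paris"},  # duplicate after normalization
    {"id": 2, "age": -5, "city": "Lyon"},   # invalid age
    {"id": 3, "age": 29, "city": None},     # missing city
    {"id": 4, "age": 41, "city": "nice"},
]
print(clean_records(raw))
# → [{'id': 1, 'age': 34, 'city': 'Paris'}, {'id': 4, 'age': 41, 'city': 'Nice'}]
```

Normalizing before comparing is the design choice that matters here: without it, the first two records would survive as two distinct rows.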
Exploratory Data Analysis (EDA)
EDA involves visualizing and summarizing data to uncover patterns, trends, and anomalies. It helps in forming hypotheses and guiding further analysis.
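A minimal EDA sketch, assuming a plain list of numbers and Python's standard statistics module: it reports count, mean, median, and sample standard deviation, and flags outliers with Tukey's fences (1.5 times the interquartile range beyond the quartiles). The sales figures and the choice of the IQR rule are illustrative assumptions; other outlier rules are equally common.

```python
import statistics


def summarize(values):
    """Summary statistics plus outliers flagged by Tukey's 1.5 * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "outliers": [v for v in values if v < low or v > high],
    }


# Hypothetical daily sales with one suspicious spike.
daily_sales = [12, 15, 14, 13, 16, 15, 14, 90]
print(summarize(daily_sales))
```

The gap between the mean and the median is itself a quick EDA signal: a single extreme value pulls the mean well above the typical day, which is the kind of anomaly that prompts a closer look.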
Machine Learning and Predictive Modeling
Machine learning algorithms are used to build predictive models from data. These models can make predictions and decisions based on new, unseen data.
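As a small, hedged illustration of building a predictive model from data, the sketch below fits a straight line by ordinary least squares and uses it to predict an unseen input. The advertising-spend data and the function names fit_line and predict are hypothetical; in practice a library such as Scikit-Learn would usually handle this.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: advertising spend (x) vs. sales (y).
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
model = fit_line(xs, ys)
print(predict(model, 6))  # prediction for an input not seen during fitting
```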
Data Visualization
Visual representations of data, such as graphs and charts, help in understanding complex information quickly. Data visualization aids in conveying insights effectively.
The Data Science Process
Problem Definition
The data science process begins with understanding the problem you want to solve and defining clear objectives.
Data Collection and Understanding
Collect relevant data and understand its context. This step is crucial as the quality of the analysis depends on the quality of the data.
Data Preparation
Clean, preprocess, and transform the data into a suitable format for analysis. This step ensures that the data is accurate and ready for modeling.
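Two common preparation transforms, min-max scaling for numeric columns and one-hot encoding for categorical ones, can be sketched with the standard library alone. The column names and values are hypothetical, and mapping a constant column to zeros is one assumed convention among several.

```python
def min_max_scale(values):
    """Rescale numbers linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def one_hot(categories):
    """Encode category labels as 0/1 indicator rows (columns sorted by label)."""
    labels = sorted(set(categories))
    index = {label: i for i, label in enumerate(labels)}
    rows = []
    for c in categories:
        row = [0] * len(labels)
        row[index[c]] = 1
        rows.append(row)
    return labels, rows


incomes = [20_000, 35_000, 50_000]
regions = ["north", "south", "north"]
print(min_max_scale(incomes))  # [0.0, 0.5, 1.0]
print(one_hot(regions))        # (['north', 'south'], [[1, 0], [0, 1], [1, 0]])
```

Sorting the labels keeps the column order deterministic, so the same encoding can be reapplied to new data later.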
Model Building
Select appropriate algorithms and build predictive models using machine learning techniques. This step involves training and fine-tuning the models.
Model Evaluation and Deployment
Evaluate the model's performance using metrics and test datasets. If the model performs well, deploy it for making predictions on new data.
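For a classification model, evaluation on a held-out test set often comes down to counting agreements between true and predicted labels. The sketch below computes accuracy, precision, and recall; the labels are hypothetical, and which metrics matter most is an assumption that depends on the problem.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall on a held-out test set."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when no positives are predicted/present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical test-set labels and a model's predictions for them.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]
print(classification_metrics(y_true, y_pred))
```

Here accuracy (0.625) alone hides that the model misses half of the positive cases (recall 0.5), which is why several metrics are usually reported together.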
Technologies Driving Data Science
Programming Languages
Languages like Python and R are widely used in data science due to their extensive libraries and versatility.
Machine Learning Libraries
Libraries like Scikit-Learn and TensorFlow prov
Complete data scientist roadmap and all about data science: how to become a data scientist, what data science is, who a data scientist is, and why data science is the future.
Data science is an integrative field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in various forms.
You've heard the news, Data Science is the cool new career opportunity sweeping the world. Come learn from Thinkful Mentors all about this new and exciting industry.
Data and Analytics Career Paths, Presented at IEEE LYC'19.
About Speaker:
Ahmed Amr is a Data/Analytics Engineer at Rubikal, where he leads, develops, and creates daily data/analytics operations, which include data ingestion, data streaming, data warehousing, and analytical dashboards. Ahmed graduated from the Computer Engineering Department, Alexandria University, and is currently pursuing his MSc degree in Computer Science at AAST. Professionally, Ahmed has worked with Egyptian/US startups such as Badr, Incorta, and WhoKnows to develop their data/analytics projects. Academically, Ahmed worked as a Teaching Assistant in the CS department at AAST. Ahmed helps software companies develop robust data engineering infrastructure and powerful analytical insights.
References:
1) https://www.datacamp.com/community/tutorials/data-science-industry-infographic
2) Analytics: The real-world use of big data, IBM, Executive Report
Data Science - An emerging Stream of Science with its Spreading Reach & Impact (Dr. Sunil Kr. Pandey)
This is my presentation on the topic "Data Science - An emerging Stream of Science with its Spreading Reach & Impact". I have compiled statistics and data from different sources. This may be useful for students and anyone interested in this field of study.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... (Subhajit Sahu)
Abstract: Levelwise PageRank is an alternative method of PageRank computation that decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be computed in a distributed fashion without per-iteration communication, unlike the standard method, where all vertices are processed in each iteration. However, it requires that the input graph have no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but considerably slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
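The core idea of the abstract (split the graph into strongly connected components, then iterate component by component in topological order) can be sketched sequentially in Python. This is an illustration under stated assumptions, not the report's implementation: it processes one component at a time rather than grouping components into levels, does no distribution, assumes no dead ends, uses a damping factor of 0.85 and an example tolerance, and uses a recursive Tarjan's algorithm that suits only small graphs. All function names are hypothetical.

```python
def tarjan_scc(graph):
    """Return strongly connected components in reverse topological order (Tarjan)."""
    index, low, on_stack, stack, comps = {}, {}, set(), [], []
    counter = [0]  # mutable counter shared with the nested function

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v roots a component: pop its members
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            comps.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return comps


def in_links_of(graph):
    """Map each vertex to the list of vertices linking to it."""
    ins = {v: [] for v in graph}
    for u, outs in graph.items():
        for v in outs:
            ins[v].append(u)
    return ins


def update(v, rank, ins, graph, n, d):
    """One PageRank update for v; valid only when the graph has no dead ends."""
    return (1 - d) / n + d * sum(rank[u] / len(graph[u]) for u in ins[v])


def pagerank_monolithic(graph, d=0.85, tol=1e-10):
    """Standard power iteration: every vertex is updated in every iteration."""
    n, ins = len(graph), in_links_of(graph)
    rank = {v: 1.0 / n for v in graph}
    while True:
        new = {v: update(v, rank, ins, graph, n, d) for v in graph}
        if max(abs(new[v] - rank[v]) for v in graph) < tol:
            return new
        rank = new


def pagerank_levelwise(graph, d=0.85, tol=1e-10):
    """Iterate one component at a time, sources first, so upstream ranks are final."""
    n, ins = len(graph), in_links_of(graph)
    rank = {v: 1.0 / n for v in graph}
    for comp in reversed(tarjan_scc(graph)):  # reversed Tarjan order = topological order
        while True:
            new = {v: update(v, rank, ins, graph, n, d) for v in comp}
            delta = max(abs(new[v] - rank[v]) for v in comp)
            rank.update(new)
            if delta < tol:
                break
    return rank


# Two components, {a, b} -> {c, d}; every vertex has at least one out-link.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
mono, level = pagerank_monolithic(graph), pagerank_levelwise(graph)
print({v: round(level[v], 6) for v in graph})
```

Because each component only starts once everything upstream of it has converged, the contributions flowing into it stay fixed; both functions should therefore reach the same ranks, which is what lets components be handled independently.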
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first-ever open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. The marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged can save iteration time. Skipping in-identical vertices, which have the same in-links, reduces duplicate computation and thus can also reduce iteration time. Road networks often have chains that can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes are easy to calculate; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
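The in-identical idea can be sketched as follows: vertices with exactly the same set of in-neighbors receive identical PageRank updates, so each group's value needs to be computed only once per iteration. This is a hypothetical illustration of that single technique, not the STICD algorithm; it assumes no dangling nodes, no duplicate edges, a damping factor of 0.85, and an example tolerance.

```python
from collections import defaultdict


def group_in_identical(graph):
    """Group vertices whose sets of in-neighbors are exactly equal."""
    in_links = {v: set() for v in graph}
    for u, outs in graph.items():
        for v in outs:
            in_links[v].add(u)
    groups = defaultdict(list)
    for v, ins in in_links.items():
        groups[frozenset(ins)].append(v)
    return list(groups.items())  # [(in-neighbor set, member vertices), ...]


def pagerank_grouped(graph, d=0.85, tol=1e-10):
    """PageRank that computes each in-identical group's rank once per iteration.

    Assumes no dangling nodes and no duplicate edges, so len(graph[u])
    is u's out-degree.
    """
    n = len(graph)
    groups = group_in_identical(graph)
    rank = {v: 1.0 / n for v in graph}
    while True:
        new = {}
        for ins, members in groups:
            value = (1 - d) / n + d * sum(rank[u] / len(graph[u]) for u in ins)
            for v in members:  # identical in-links imply identical rank
                new[v] = value
        if max(abs(new[v] - rank[v]) for v in graph) < tol:
            return new
        rank = new


# c and d share the in-neighbors {a, b}, so they form one group.
graph = {"a": ["c", "d"], "b": ["a", "c", "d"], "c": ["a"], "d": ["b"]}
print(len(group_in_identical(graph)))  # 3 groups for 4 vertices
ranks = pagerank_grouped(graph)
```

The saving grows with the number of in-identical vertices; grouping is done once up front, so its cost is amortized over all iterations.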
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos with a distributed data ownership model that assigns clear owners and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
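As an illustration of the automated data validation point above, here is a minimal Python sketch: each column gets a rule, and every value that breaks a rule is reported so it can be fixed at the source. The rule set, the column names (order_id, email, amount), and the sample rows are all hypothetical stand-ins for whatever checks a real pipeline would need.

```python
# Hypothetical rules: each maps a column name to a check that returns True for a valid value
RULES = {
    "order_id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str) and "@" in v,
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
}


def validate(rows, rules=RULES):
    """Return (row_index, column, value) for every missing value or failed check."""
    errors = []
    for i, row in enumerate(rows):
        for column, check in rules.items():
            value = row.get(column)  # None if the column is missing
            if value is None or not check(value):
                errors.append((i, column, value))
    return errors


# Hypothetical sample rows
rows = [
    {"order_id": 1, "email": "a@example.com", "amount": 19.99},
    {"order_id": 2, "email": "not-an-email", "amount": -5},
    {"order_id": 3, "amount": 4.50},  # missing email
]
for problem in validate(rows):
    print("invalid:", problem)
```

Keeping the rules as plain data makes it easy to run the same checks at ingestion time and again downstream.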
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
1. Data Fluency for the 21st Century
Martin Frigaard & Peter Spangler
access these slides: http://bit.ly/data-fluency-slides
icons by https://www.freepik.com/
comment on these slides: http://bit.ly/data-fluency-slides
2. Objectives
● Why are you here?
● Operational definitions
● Basic skills
● Data analysis toolkit
● Communicating with data
● Questions
3. Why are you here?
Data skills are in high demand!
'Data scientist' has been the sexiest job for over 5 years. Fortunately, many of the problems
businesses and organizations face do not require someone with a PhD in machine learning
or a fancy software solution. Many of these problems can be solved by people with domain
knowledge, data analysis skills, curiosity, and the ability to communicate.
4. Why are you here?
Government agencies, nonprofits, and non-governmental
organizations are also recognizing the need for data
analysis skills
- Data analysis has become an essential tool for all policy makers, agencies, and community
action organizations to demonstrate the evidence for their ideas.
- Data for Democracy: "We work together to make the world a better place. At the heart of our
collective efforts is how data and technology can be used for good. We work to help shape a
better future and make positive changes in communities around the globe."
https://www.datafordemocracy.org/about-us
- The Civic Analytics Network: "The network will collaborate on shared projects that advance
the use of data visualization and predictive analytics in solving important urban problems
related to economic opportunity, poverty reduction, and addressing the root causes of social
problems of equity and opportunity."
https://datasmart.ash.harvard.edu/news/article/about-the-civic-analytics-network-826
- Our World in Data: "We cannot know what is happening in the world from the daily news
alone. The news media focuses on single events, too often missing the long-lasting, forceful
changes that reshape the world we live in." https://ourworldindata.org/about
5. Why should more people be here?
Today, everyone needs to understand how data and statistics are shaping the
world we live in
Data are used to represent nearly every aspect of life...
- Redistricting has a huge effect on U.S. politics but is greatly misunderstood. This project
uncovers what’s really broken, what's not, and whether gerrymandering can (or should) be
killed. Depending on the desired outcome, each of the different maps could represent the
“right” way to draw congressional district boundaries. (FiveThirtyEight's gerrymandering
project)
6. Operational definitions
What is data science vs. machine learning?
Data science: "...integrates a set of problem definitions, algorithms, and processes
that can be used to analyze data so as to extract actionable insight...deals with both
structured and unstructured (big) data and encompasses principles from a range of
fields, including machine learning, statistics, data ethics and regulation, and
high-performance computing."
Machine learning: "The field of computer science research that focuses on
developing and evaluating algorithms that can extract useful patterns from data
sets."
- Both of these fields take a great deal of schooling, training, and experience to understand.
However, as you can see, data science includes fields like statistics and machine learning.
- These are both far above what is required to work with data
- More on this here: https://arxiv.org/abs/1903.07639
7. Operational definitions
The good news!
USUALLY NOT NECESSARY!
- These are both far above what is required to work with data, create visualizations, and
gain useful insights!
- Listen to this podcast:
https://soundcloud.com/dataframed/1-data-science-past-present-and-future
8. Operational definitions
Our concern is data fluency
Information literacy: "...the ability to know when there is a need for information,
to be able to identify, locate, evaluate, and effectively use that information for the
issue or problem at hand."
Data literacy: "...the ability to read, understand, create and communicate data as
information."
Statistical literacy: "...the ability to understand and reason with statistics and
data."
These are great, but why are they separated?
Why would you have one without the other?
9. Data Fluency
Data fluency combines 1) the situational assessment skills from
information literacy, 2) the storage, retrieval, manipulation, and
management abilities from data literacy, and 3) the problem
solving, reasoning, and critical thinking from statistical literacy.
10. Operational definitions
skills that 'move across'
[Data] Transliteracy: "Transliteracy captures the idea of our capacity to
interact with information in whatever form it takes...[it] concerns the ability
to apply and transfer a range of skills and contextual insights to a variety of
settings. Rather than focusing on any one skill set or technology, transliteracy is
about fluidity of movement across a range of contexts." - Transliteracy: The Art and Craft
of ‘Moving Across’
11. Basic Skills
What's required for analytic literacy?
1. Domain expertise: you need to know your stuff
2. Understanding data structures: know what gets measured, how it's stored, and
what it looks like
3. Programming: interact with data programmatically so you can express your
intentions clearly (and document your work)
4. Exploratory Data Analysis: be able to summarize and communicate the
characteristics and patterns of a data set, using tables, graphs, and visualizations
An analyst needs characteristics like curiosity, tenacity, and stick-with-it-ness.
12. Domain expertise
Providing the context and purpose
An analytic approach to solving problems typically starts with some version of the
following questions:
1. What happened?
2. Why did it happen?
3. What will happen if it continues?
4. What can we do about it (or what will happen to y if we do x)?
The people closest to a problem will often have the information needed to solve it, so
training them to think analytically is a better long-term solution than hiring an expensive
'data scientist' who doesn't know your business.
13. Data structures
What kind of information is being collected?
What are data?
- Tweets
- Sales
- Addresses
How can we access them?
- API
- Relational databases
- Google Sheets
Where are they stored?
- Tables (SQL, Google Sheets, etc.)
- Web structures (JSON)
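To make the difference between how data are accessed and how they are stored more concrete, here is a small Python sketch using only the standard library. A few hypothetical sales records arrive as JSON (as a web API might return them), are loaded into a relational table in an in-memory SQLite database, and are then queried with SQL. The field names and values are invented for illustration.

```python
import json
import sqlite3

# Hypothetical sales records as a web API might return them (JSON)
raw = """
[
  {"region": "North", "product": "widget", "units": 12, "price": 2.50},
  {"region": "South", "product": "widget", "units": 7,  "price": 2.50},
  {"region": "North", "product": "gadget", "units": 3,  "price": 10.00}
]
"""
records = json.loads(raw)  # list of dicts, one dict per sale

# Store the same records as rows in a relational table (in-memory SQLite)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER, price REAL)")
con.executemany(
    "INSERT INTO sales VALUES (:region, :product, :units, :price)", records
)

# Ask the table a question: total revenue per region
revenue = dict(con.execute(
    "SELECT region, SUM(units * price) FROM sales GROUP BY region ORDER BY region"
))
print(revenue)  # {'North': 60.0, 'South': 17.5}
con.close()
```

The JSON version is nested and self-describing, while the table version has a fixed set of columns; knowing which shape you have decides which tools you reach for.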
14. Programming
Code is a necessary means of communication
"Instead of imagining that our main task is to instruct a computer what to do, let
us concentrate rather on explaining to human beings what we want a computer
to do." - Donald Knuth. "Literate Programming (1984)"
Should everyone learn to code?
- Knowing how to program "will vastly increase your potential in becoming a
valuable asset at any organization"
- "Having coding know-how equips you to better understand how the pieces of the
puzzle fit together in a business"
- "Coding doesn’t restrict you to a career in tech: it enhances the career, skills, or
interests you already have."
https://www.forbes.com/sites/laurencebradford/2016/06/20/why-every-millennial-should-learn-some-code/#5ebd0b1870f2
15. Exploratory Data Analysis
The goal of the analysis is exploration (not models and algorithms)
- In order to know if you'll be able to use your data to predict anything, you'll
need to understand its characteristics
- We do this through summaries, graphics, and visualizations
- "It is important to understand what you CAN DO before you learn to measure
how WELL you seem to have DONE it" - John Tukey
https://simplystatistics.org/2019/04/17/tukey-design-thinking-and-better-questions/
- ...goal of data analysis is to explore the data. In other words, data analysis is exploratory
data analysis...maybe this shouldn’t be so surprising given that Tukey wrote the book on
exploratory data analysis.
- In this paper, at least, he essentially dismisses other goals as overly optimistic or not really
meaningful.
- For the most part I agree with that sentiment, in the sense that looking for “the answer” in
a single set of data is going to result in disappointment. At best, you will accumulate
evidence that will point you in a new and promising direction. Then you can iterate,
perhaps by collecting new data, or by asking different questions.
- At worst, you will conclude that you’ve “figured it out” and then be shocked when
someone else, looking at another dataset, concludes something completely different.
In light of this, discussions about p-values and statistical significance are very much
beside the point.
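As a small illustration of exploring data through summaries and graphics, here is a minimal Python sketch using only the standard library (statistics.quantiles needs Python 3.8 or later). The daily sales numbers are made up, and the text histogram is a stand-in for a real plot.

```python
import statistics
from collections import Counter

# Hypothetical daily sales counts, made up for illustration only
daily_sales = [12, 15, 11, 14, 13, 40, 12, 16, 15, 14]


def summarize(values):
    """Return basic numeric summaries (a five-number summary plus the mean)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "n": len(values),
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
    }


def text_histogram(values, bin_width=5):
    """Return a crude text histogram: one line per bin, one '#' per value."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return [
        f"{start:>3}-{start + bin_width - 1:<3} {'#' * counts[start]}"
        for start in sorted(counts)
    ]


print(summarize(daily_sales))
for line in text_histogram(daily_sales):
    print(line)
```

The mean (16.2) sits above the median (14) because of the single day with 40 sales, and the histogram makes that lone value easy to spot. Noticing this kind of characteristic is exactly the exploration the slide argues should come before any modeling.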
16. The Data Analysis Toolkit
The necessary steps for an analytic data project (shown on the left of the slide) are:
- Problem statement or question
- Data collection and wrangling
- Data visualization and modeling
- Data communication
As you can see, staying inside the RStudio IDE minimizes the number of additional tools
you'll have to work with.
The RStudio IDE is a complementary cognitive artifact.
....Expert users of the abacus are not users of the physical abacus—they use a
mental model in their brain. And expert users of slide rules can cast the ruler aside
having internalized its mechanics. Cartographers memorize maps, and Edwin
Hutchins has shown us how expert navigators form near symbiotic relationships
with their analog instruments.
So our upper Paleolithic lineage has always possessed artificial intelligence to the
extent our ancestors have been aided in this way. In modern life, mobile devices and
their apps—to-do apps, calendar apps, journaling apps, astronomy apps, game
apps, social apps, and on near infinitum—just recapitulate the three essential
elements of the astrolabe: memory, search, and calculation.
Compare these complementary cognitive artifacts to competitive cognitive artifacts
like the mechanical calculator, the global positioning systems in our cars and
phones, and machine learning systems powering our App ecosystem. In each of
these examples our effective intelligence is amplified, but not in the way of
complementary artifacts. In the case of competitive artifacts, when we are deprived
of their use, we are no better than when we started. They are not coaches and
teachers—they are serfs. We have created an artificial serf economy where
incremental and competitive artificial intelligence both amplifies our productivity
and threatens to diminish organic and complementary artificial intelligence, and
the ethics of this sort of mechanical labor are only now engaging the attention of
practitioners and policy makers.
http://nautil.us/blog/will-ai-harm-us-better-to-ask-how-well-reckon-with-our-hybrid-nature
18. Case Study: Collecting Google data
Follow this link: https://rstudio.cloud/project/322459
21. This is all stuff I've learned from other people!
1. Hadley Wickham
2. Hilary Mason
3. Greg Wilson
4. David Krakauer
5. David Robinson
6. Jenny Bryan
7. Charlotte Wickham
8. Bradley Boehmke
9. Benjamin S. Baumer
10. Mara Averick
11. Andrew Gelman
12. Lucy D'Agostino McGowan
I didn't come up with any of this stuff on my own; I learned it from these
great folks (and many others!)