This document provides an overview of big data analytics and data visualization. It discusses key concepts like data wrangling, exploring patterns, drawing conclusions, and communicating findings. Common techniques are also summarized, including classification, clustering, association rules, and predictive analytics. Specific algorithms like decision trees, k-means clustering, and hierarchical clustering are explained. The CRISP-DM process model and applications of analytics in areas like customer understanding and process optimization are also covered at a high level. Visualization is presented as an important part of the overall analytics process.
This is a presentation I gave on Data Visualization at a General Assembly event in Singapore, on January 22, 2016. The presso provides a brief history of dataviz as well as examples of common chart and visualization formatting mistakes that you should never make.
Data visualizations make huge amounts of data more accessible and understandable. Data visualization, or "data viz," is becoming increasingly important as the amount of data generated grows and big data tools help to create meaning behind all of that data.
This SlideShare presentation takes you through more details around data visualization and includes examples of some great data visualization pieces.
North Raleigh Rotarian Katie Turnbull gave a great presentation at our Friday morning extension meeting about data visualization. Katie is a consultant at research and advisory firm, Gartner, Inc.
Data visualization in data science: exploratory (EDA) and explanatory. Anscombe's quartet, design principles, visual encoding, design engineering and journalism, choosing the right graph, narrative structures, technology and tools.
This slide deck gives a general overview of Data Visualization, with inspiring examples, the strength and weaknesses of the human visual system, a few technical frameworks that may be used for creating your own visualizations and some design concepts from the data visualization field.
Data Analytics with R, Contents and Course materials, PPT contents. Developed by K K Singh, RGUKT Nuzvid.
Contents:
Introduction to Data, Information and Data Analytics,
Types of Variables,
Types of Analytics
Life cycle of data analytics.
The Institution's Innovation Council (Ministry of HRD initiative) and the Institution of Electronics and Telecommunication Engineers (IETE) invited me to grace "World Telecommunication & Information Society Day" on 18 May 2020.
This presentation will help you understand the basic building blocks of Business Intelligence. Learn how decisions are triggered, the complete decision process and who makes decisions in the corporate world.
More importantly, understand core components of a Business Intelligence architecture such as a data warehouse, data mining, OLAP (Online Analytical Processing), OLTP (Online Transaction Processing) and data reporting. Each component plays an integral part in enabling today's managers and decision makers to collect, analyze and interpret data to make it actionable for decision making.
Business intelligence has become an integral part of ensuring business survival. It is a tool that helps analyze historical data and forecast the future so that you are always one step ahead in your business.
Please feel free to like, share and comment as you please!
Introduction to Business Analytics Part 1 published by BeamSync.
BeamSync provides a business analytics training course in Bangalore. If you are looking for analytics training, visit BeamSync. Regular classes run on weekends.
For details visit: http://beamsync.com/business-analytics-training-bangalore/
Slides used for a presentation to introduce the field of business analytics. Covers what BA is, how it is a part of business intelligence, and what areas make up BA.
In a world of data explosion, the rate of data generation and consumption keeps increasing;
hence the buzzword: Big Data.
Big Data is the concept of fast-moving, large-volume data in varying dimensions (sources) and
from highly unpredictable sources.
The 4Vs of Big Data
● Volume - Scale of Data
● Velocity - Analysis of Streaming Data
● Variety - Different forms of Data
● Veracity - Uncertainty of Data
With increasing data availability, the industry now demands not just data collection but making sense of the acquired data: hence the concept of Data Analytics.
Taking it a step further, to make predictions about the future and draw realistic inferences, brings in the concept
of Machine Learning.
A blend of both gives a robust analysis of data for the past, the present and the future.
There is a thin line between data analytics and machine learning, which becomes obvious
when you dig deep.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Opendatabay - Open Data Marketplace
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. The marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
4. 4
You will learn a few data analysis topics
Posing a question
Wrangling your data into a format you can use and fixing
any problems with it
Exploring the data, finding patterns in it, and building
your intuition about it
Drawing conclusions and/or making predictions
Communicating your findings
5. 5
What is Big Data Analytics?
Data analytics is an emerging technique that dives into a
data set without a prior set of hypotheses
Accumulated raw data captured from various sources
(e.g. discussion boards, emails, exam logs, chat logs in
e-learning systems) can be used to identify fruitful
patterns and relationships
Examining large amounts of data
8. 8
Applications of Data analytics
Understanding and targeting Customers
Understanding and optimizing Business Processes
Improving Healthcare and Public Health
Optimizing Machine and Device Performance
Financial Trading
Improving and Optimizing Cities and Countries
Can you think of anything more??
How??
18. 18
Data Classification
Some Examples:
Separating customers based on gender
Sorting data based on content type, file type, size, etc.
Classifying data into restricted, public or private data
types
"Among all the customers of Zalando, which are likely to respond to a new
offer?"
Will respond Will not respond
19. 19
Decision trees (DT)
Build classification or regression models in the form of Tree
structure
Classification Methods
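The tree-structure idea can be illustrated with the single step that a tree builder repeats at every node: pick the feature and threshold that split the data into the purest groups. The sketch below is a minimal illustration, not a full decision tree. The customer data, the Gini impurity criterion and the function names `gini` and `best_split` are assumptions made for the example and do not come from the slides.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must put something on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

# Hypothetical customers: (age, purchases last year) -> responded to the offer?
rows = [(22, 1), (25, 0), (31, 6), (35, 8), (47, 2), (52, 9)]
labels = ["no", "no", "yes", "yes", "no", "yes"]

feature, threshold, impurity = best_split(rows, labels)
print(f"split on feature {feature} at <= {threshold}, impurity {impurity}")
```

A full tree would apply `best_split` recursively to each side until the groups are pure or too small to split further.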
21. 21
Classification Methods
Support Vector Machines (SVM)
Each data item is a point in n-dimensional space (n = number
of features)
Find the hyperplane that differentiates the two classes
23. 23
Classification Methods
Select the hyperplane which
segregates the two classes better
Ans: B
Maximising the distance between
nearest data point (Margin)
Ans: C
Select hyper-plane which classifies
accurately prior to maximising margin
Ans: A
Ignores outliers
Introduce: Z=x²+y²
In original input space
hyperplane looks like a circle
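The last point on the slide (introducing z = x² + y²) can be checked numerically. The sketch below uses made-up points, with one class near the origin and the other further out, and shows that a flat threshold on z separates them. The data and the helper names `lift` and `classify` are assumptions made for illustration. A real SVM would learn the separating plane, whereas here the threshold is simply placed midway between the two classes.

```python
def lift(point):
    """Map (x, y) to (x, y, z) with z = x^2 + y^2, as on the slide."""
    x, y = point
    return (x, y, x * x + y * y)

# Hypothetical data: one class near the origin, the other surrounding it.
inner = [(0.5, 0.0), (0.0, -0.8), (-0.6, 0.3), (0.4, 0.4)]
outer = [(2.0, 0.0), (0.0, 2.5), (-1.8, 1.2), (1.5, -1.9)]

z_inner = [lift(p)[2] for p in inner]
z_outer = [lift(p)[2] for p in outer]

# In the lifted space, the flat plane z = threshold separates the two classes.
threshold = (max(z_inner) + min(z_outer)) / 2

def classify(point):
    """Label a point by which side of the plane z = threshold it falls on."""
    return "outer" if lift(point)[2] > threshold else "inner"

print(threshold, [classify(p) for p in inner + outer])
```

Back in the original (x, y) space, the boundary z = threshold is the circle x² + y² = threshold, which is why the hyperplane "looks like a circle".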
24. 24
Classification Methods
Bayesian Networks
Dotted lines: Potential Links
Blue box: Additional nodes and links between input
and output
Based on probability theory.
Can mix expert opinion and data to build
models
Backwards reasoning - in addition to
predicting outputs given inputs, we can
use output values to infer inputs.
Support for missing data during learning
and classification
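Backwards reasoning can be shown on the smallest possible network: one input node and one output node. The sketch below assumes a hypothetical network LoyalCustomer -> RespondsToOffer with made-up probabilities. It computes the forward prediction and then uses Bayes' rule to infer the input from an observed output. Exact fractions are used so the arithmetic is easy to follow.

```python
from fractions import Fraction

# Hypothetical two-node network: LoyalCustomer -> RespondsToOffer.
p_loyal = Fraction(3, 10)                   # prior P(loyal), made up
p_respond_given = {
    True: Fraction(6, 10),                  # P(respond | loyal), made up
    False: Fraction(1, 10),                 # P(respond | not loyal), made up
}

def p_respond():
    """Forward reasoning: marginal probability that a customer responds."""
    return (p_respond_given[True] * p_loyal
            + p_respond_given[False] * (1 - p_loyal))

def p_loyal_given_respond():
    """Backward reasoning: infer the input from an observed output (Bayes' rule)."""
    return p_respond_given[True] * p_loyal / p_respond()

print(p_respond())              # 1/4
print(p_loyal_given_respond())  # 18/25
```

Seeing a response raises the probability that the customer is loyal from 3/10 to 18/25 under these assumed numbers.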
26. 26
Association Rules
Discovering interesting relations between variables in
large databases
Example Problems
Which products are frequently bought together by
customers? (Basket Analysis)
● DataTable = Receipts x Products
● Results could be used to change the placements of products in the market
Which courses tend to be attended together?
● DataTable = Students x Courses
● Results could be used to avoid scheduling conflicts....
27. 27
Association Rules
Examples
Bread, Cheese → Red Wine.
Customers that buy bread and cheese, also tend to buy red
wine
Machine Learning → Web Mining, ML Praktikum
Students that take 'Machine Learning' also take 'Web Mining'
and the 'Machine Learning Praktikum'
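A rule such as Bread, Cheese → Red Wine is usually judged by its support (how often the items occur together) and its confidence (how often the right-hand side appears when the left-hand side does). The sketch below computes both on a handful of invented receipts. The data and the function names are assumptions made for illustration.

```python
# Hypothetical receipts; each receipt is the set of products bought together.
receipts = [
    {"bread", "cheese", "red wine"},
    {"bread", "cheese", "red wine", "olives"},
    {"bread", "cheese"},
    {"bread", "milk"},
    {"cheese", "red wine"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

rule_support = support({"bread", "cheese", "red wine"}, receipts)
rule_confidence = confidence({"bread", "cheese"}, {"red wine"}, receipts)
print(rule_support, rule_confidence)
```

With these receipts, bread and cheese appear together in 3 of 5 baskets and red wine accompanies them in 2 of those 3, so the rule has support 0.4 and confidence of about 0.67.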
28. 28
Apriori Principle illustration
If {c,d,e} is frequent then all
subsets of this itemset are
frequent
Support Based pruning illustration
If {a,b} is infrequent then all
supersets of this itemset are
infrequent
Association Rules
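The two principles above are what keep frequent-itemset mining tractable: candidates of size k are built only from frequent itemsets of size k-1, and any candidate with an infrequent subset is discarded without being counted. Below is a compact sketch of that level-wise search. The transactions, the `min_count` threshold and the function name `apriori` are assumptions made for the example, and the code favours readability over speed.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in >= min_count transactions."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates ...
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # ... and prune any candidate that has an infrequent (k-1)-subset.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

# Hypothetical transactions over items a..e (one character per item).
transactions = [frozenset(t) for t in ("acde", "cde", "bcde", "ace", "cd")]
result = apriori(transactions, min_count=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Here {c,d,e} is frequent and so are all of its subsets, while a and b are infrequent on their own, so no larger itemset containing them is ever counted.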
30. 30
Cluster analysis
Task of grouping a set of objects in such a way that
objects in the same group (called a cluster) are more
similar (in some sense or another) to each other than to
those in other groups (clusters).
Examples
Biology: What is the taxonomy of the species?
Education: What are student groups that need special
attention?
Business: What are the customer segments?
33. 33
K-means clustering
k-means clustering aims to partition n observations into k
clusters in which each observation belongs to the cluster
with the nearest mean, serving as a prototype of the
cluster
Unsupervised learning algorithm
Define k centroids, one for each cluster
Take each point in the data set and associate it to the
nearest centroid
Recalculate the centroids
Repeat until the centroids no longer move
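The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the points are invented, the function name `kmeans` is hypothetical, and the initial centroids are simply the first k points so that the run is reproducible (practical implementations usually pick them randomly or with a smarter seeding scheme).

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain k-means on tuples of numbers; returns (centroids, assignments)."""
    centroids = [tuple(p) for p in points[:k]]  # simplistic, deterministic start
    assignments = []
    for _ in range(max_iter):
        # Associate each point with its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Recalculate each centroid as the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # stop once no centroid moves
            break
        centroids = new_centroids
    return centroids, assignments

# Hypothetical 2-D observations forming two loose groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.5)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
print(centroids)
```

The result depends on the starting centroids, which is why k-means is often run several times with different initialisations.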
34. 34
Hierarchical clustering
Groups data over a variety of scales by creating a cluster
tree or dendrogram.
Find the similarity or dissimilarity between every pair of
objects in the data set.
Group the objects into a binary, hierarchical cluster
tree.
Determine where to cut the hierarchical tree into
clusters
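The three steps can be sketched as a bottom-up (agglomerative) procedure: compute pairwise distances, repeatedly merge the two closest clusters, and stop merging at the desired number of clusters, which plays the role of cutting the tree. The sketch below assumes single linkage (distance between the closest members of two clusters) and Euclidean distance. The slides do not name either choice, and the data and the function name `agglomerate` are invented for illustration.

```python
import math

def agglomerate(points, n_clusters):
    """Single-linkage agglomerative clustering; merge until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per object
    merges = []  # (members_a, members_b, distance): the tree, recorded bottom-up

    def linkage(a, b):
        # Single linkage: distance between the closest pair across two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge it.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters, merges

# Hypothetical 2-D objects, referred to by their index in the list.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
clusters, merges = agglomerate(points, n_clusters=3)
print(clusters)
print(merges)
```

The recorded merge distances are what a dendrogram plots on its vertical axis; a large jump between successive merge distances is a common hint for where to cut.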
39. 39
Predictive Analytics
Make predictions about unknown future events based on
past happenings
Why now?
Growing volumes and types of data, and more interest in
using data to produce valuable insights.
Faster, cheaper computers.
Easier-to-use software.
Tougher economic conditions and a need for competitive
differentiation.
40. 40
Predictive Analytics
improve pattern detection and prevent criminal
behavior.
determine customer responses or purchases, as well as
promote cross-sell opportunities
forecast inventory and manage resources, to set ticket
prices.
Credit scores are used to assess a buyer’s likelihood of
default for purchases
41. 41
Data Visualization
Data visualization is the process of converting raw data
into easily understood pictures of information that
enable fast and effective decisions.
Visualization plays the key role in the efficient
communication of information (especially with large
amounts of information).
Visualization is used as a "check" to verify / falsify
results of automatic data analysis.
42. 42
Why Data Visualization?
Identify areas that need attention or improvement.
Clarify which factors influence customer behavior.
Help you understand which products to place where.
Predict sales volumes.
Data visualization is a quick, easy way to convey concepts in a
universal manner
44. 44
Visual Analytics Loop
Visual Analytics will foster the constructive evaluation, correction and rapid
improvement of our processes and models and - ultimately - the improvement of our
knowledge and our decisions
46. 46
Visual Analytics vs Information Visualization
Visual analytics is more than just visualization. It can rather be seen as an
integral approach to decision-making, combining visualization, human
factors and data analysis.
C04-0.01 room number
Starting
LMS registration >> BD2016
Groups
Who are we repeat in brief
What are we doing
Interactive session
Why are you sitting here? Why do you want to do data analysis? What data do you have? Or what data are you familiar with? // for business people
Convert data into a preferred data format
Make others understand what you have found esp to business people
Vini
Do in day to day life
Examining raw data with the purpose of drawing conclusions about that information
Allows companies to make better decisions
3 types: Exploratory – new features in the data are discovered
Confirmatory – existing hypotheses are validated
Qualitative – draw conclusions from non-numerical data like words
Why would you use big data analytics?
Banks and credit cards companies: analyze withdrawal and spending patterns to prevent fraud or identity theft.
E-commerce companies examine website traffic to determine which customers are likely to buy a product or service, based upon prior purchases or viewing trends.
Predictive maintenance
Virus signature
Profit
Digital advertisement (targeted advertisement)
Recommender systems
Image recognition
Speech recognition
Gaming (motion gaming)
Price comparison websites – pricerunner, pricegrabber, junglee
Airline route planning
Delivery logistics – find best routes to ship
Self driving car
Robots
Improving science and research
Improving sports performance
Cities – traffic monitoring
danny
danny
Danny
Determine business objectives
Assess situations
Determine data mining goals
Produce project plan
Danny
Collect initial data
Describe data
Explore data
Verify data
Danny
Select
Clean
Construct
Integrate
Format data
Danny
Select modelling techniques
Generate test design
Build model
Assess model
Danny
Evaluate results
Review process
Determine next step
Danny
Plan deployment
Monitoring and maintenance
Review project
classification - a set of predefined classes and want to know which class a new object belongs to.
Clustering - group a set of objects and find whether there is some relationship between the objects.
classification - supervised learning
clustering - unsupervised learning.
Association: discovering interesting relations between variables
Learns a method for predicting the instance class from pre labelled classified instances
Sorting data within a db or repository
Decision trees
Support vector machines
Bayesian networks
DT: Clearly lay out the problem so that all options can be challenged.
Allow us to analyze fully the possible consequences of a decision.
Provide a framework to quantify the values of outcomes and the probabilities of achieving them.
Help us to make the best decisions on the basis of existing information and best guesses.
Apriori principle : Any subset of a frequent itemset must be frequent
Medicine : What are the diagnostic clusters?
Business: common needs, attitudes, behaviours, demographics
Student groups : what issues they have for not excelling in exams: what psychological, environmental, aptitudinal, affective, and attitudinal factors