The document discusses data mining and provides definitions and explanations of key concepts. It defines data mining as the process of discovering patterns in large data sets involving methods from statistics, machine learning, and database systems. It describes the main components of data mining as including classification, association rule learning, and clustering. Examples of real-world applications are also given such as market basket analysis, fraud detection, and scientific research.
This presentation covers data mining, data mining techniques, data analysis, data mining subtypes, uses of data mining, sources of data for mining, and privacy concerns.
Keynote address delivered on 23rd March 2011 at the Workshop on Data Mining and Computational Biology in Bioinformatics, sponsored by DBT India and organised by the Unit of Simulation and Informatics, IARI, New Delhi.
I do not claim any originality for either the slides or their content, and I acknowledge various web sources.
2. Content
Data Mining
Data Mining Definition
Data Mining – Two Main Components
Data Mining vs. Data Analysis
What is (not) Data Mining?
Related Fields
Data Mining Process
Major Data Mining Tasks
Uses of Data Mining
Sources of Data for Mining
Challenges of Data Mining
Advantages
Conclusion
Reference
3. Data Mining
New buzzword, old idea.
Inferring new information from already collected data.
Traditionally the job of data analysts.
Computers have changed this.
It is far more efficient to comb through data using a machine than to eyeball statistical data.
4. Data Mining Definition
Data mining is the non-trivial process of identifying
valid,
novel,
potentially useful,
and ultimately understandable patterns in data.
5. Data Mining vs. Data Analysis
In terms of software and the marketing thereof, Data Mining != Data Analysis.
Data Mining implies that the software uses some intelligence beyond simple grouping and partitioning of data to infer new information.
Data Analysis is more in line with standard statistical software (e.g. web stats). Such tools usually present information about subsets and relations within the recorded data set (e.g. browser/search engine usage, average visit time).
6. What is (not) Data Mining?
What is not Data Mining?
• Look up a phone number in a phone directory
• Query a Web search engine for information about “Amazon”
What is Data Mining?
• Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
• Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)
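The "group together similar documents" example can be illustrated with a minimal sketch. It assumes each search result is a short text snippet and uses word overlap (Jaccard similarity) as a stand-in for "context"; the function names, the sample snippets, and the 0.3 threshold are all hypothetical choices for illustration, not part of the original slides.

```python
def tokens(text):
    """Lower-case a snippet and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words that two token sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_snippets(snippets, threshold=0.3):
    """Greedy single-pass grouping: a snippet joins the first group whose
    seed (first member) is similar enough, otherwise it starts a new group."""
    groups = []
    for snippet in snippets:
        for group in groups:
            if jaccard(tokens(snippet), tokens(group[0])) >= threshold:
                group.append(snippet)
                break
        else:
            groups.append([snippet])
    return groups


# Hypothetical search results for the query "Amazon".
snippets = [
    "amazon rainforest river wildlife",
    "amazon online store delivery",
    "wildlife of the amazon rainforest",
    "amazon store delivery times",
]
groups = group_snippets(snippets)
for group in groups:
    print(group)
```

With these sample snippets the rainforest results and the online-store results end up in two separate groups, even though every snippet contains the word "amazon". A phone directory lookup, by contrast, only retrieves a stored value and infers nothing new.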
8. Why Mine Data? Scientific Viewpoint
Data collected and stored at enormous speeds (GB/hour)
o remote sensors on a satellite
o telescopes scanning the skies
o microarrays generating gene expression data
o scientific simulations generating terabytes of data
Traditional techniques infeasible for raw data
Data mining may help scientists
o in classifying and segmenting data
o in hypothesis formation
12. Major Data Mining Tasks
Classification: predicting an item class
Associations: e.g. A & B & C occur frequently
Visualization: to facilitate human discovery
Estimation: predicting a continuous value
Deviation Detection: finding changes
Link Analysis: finding relationships...
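The difference between classification (predicting a class) and estimation (predicting a continuous value) can be sketched with one simple technique, k-nearest neighbours, used here purely as an illustration; the slides do not name a specific algorithm. The customer features (age, monthly visits), labels, and spend figures below are made-up sample data.

```python
from collections import Counter


def nearest(train, query, k):
    """Return the k training rows closest to query (squared Euclidean distance)."""
    def dist(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], query))
    return sorted(train, key=dist)[:k]


def classify(train, query, k=3):
    """Classification: predict a class by majority vote among the k nearest rows."""
    votes = Counter(target for _, target in nearest(train, query, k))
    return votes.most_common(1)[0][0]


def estimate(train, query, k=3):
    """Estimation: predict a continuous value as the mean over the k nearest rows."""
    neighbours = nearest(train, query, k)
    return sum(target for _, target in neighbours) / len(neighbours)


# Hypothetical customers: (age, monthly visits) -> class label.
labelled = [
    ((25, 2), "browser"), ((30, 3), "browser"), ((28, 1), "browser"),
    ((45, 10), "buyer"), ((50, 12), "buyer"), ((48, 9), "buyer"),
]
# Same customers: (age, monthly visits) -> monthly spend.
spend = [
    ((25, 2), 20.0), ((30, 3), 30.0), ((28, 1), 10.0),
    ((45, 10), 200.0), ((50, 12), 260.0), ((48, 9), 170.0),
]

print(classify(labelled, (47, 11)))  # a class label
print(estimate(spend, (47, 11)))     # a continuous value
```

The same neighbour search serves both tasks; only the way the neighbours' targets are combined (vote versus average) changes.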
13. Uses of Data Mining
AI/Machine Learning
Combinatorial/Game Data Mining
Good for analyzing winning strategies in games, and thus developing intelligent AI opponents (e.g. chess).
Business Strategies
Market Basket Analysis
Identify customer demographics, preferences, and purchasing patterns.
Risk Analysis
Product Defect Analysis
Analyze product defect rates for given plants and predict possible complications (read: lawsuits) down the line.
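Market basket analysis is commonly expressed through two measures: support (how often items are bought together) and confidence (how often the second item appears given the first). The sketch below computes both for item pairs only; the baskets, the function name `pair_rules`, and the 0.5 minimum support are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    # Count each item and each unordered pair at most once per basket.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # One rule per direction: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical shopping baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["butter", "milk"],
]
rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

In this sample, every basket containing milk also contains butter, so the rule milk -> butter has confidence 1.0 even though its support is only 0.5. Full association rule miners extend the same idea to larger item sets.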
14. Uses of Data Mining (Cont..)
User Behavior Validation
Fraud Detection
In the realm of cell phones: comparing phone activity with calling records can help detect calls made on cloned phones.
Similarly, with credit cards: comparing purchases with historical purchases can detect activity on stolen cards.
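Comparing a new purchase with historical purchases can be sketched as a simple deviation check. The example below flags an amount that lies more than a chosen number of standard deviations from the cardholder's historical mean; the threshold of 3.0, the function name `is_suspicious`, and the sample amounts are assumptions, and real fraud detection systems use far richer features than the amount alone.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the historical mean."""
    if len(history) < 2:
        return False  # too little history to judge
    spread = stdev(history)
    if spread == 0:
        # All past purchases were identical: any different amount deviates.
        return amount != history[0]
    return abs(amount - mean(history)) / spread > threshold


# Hypothetical past purchase amounts for one card.
history = [42.0, 38.5, 51.0, 45.25, 40.0, 47.5]

print(is_suspicious(history, 49.0))   # close to past behaviour
print(is_suspicious(history, 900.0))  # far outside past behaviour
```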
15. Uses of Data Mining (Cont..)
Health and Science
Protein Folding
Predicting protein interactions and functionality within biological cells. Applications of this research include determining causes and possible cures for Alzheimer's, Parkinson's, and some cancers (caused by protein "misfolds").
Extra-Terrestrial Intelligence
Scanning satellite receptions for possible transmissions from other planets.
For more information see Stanford’s Folding@home and UC Berkeley’s SETI@home projects. Both involve participation in a widely distributed computing application.
16. Sources of Data for Mining
Databases (most obvious)
Text Documents
Computer Simulations
Social Networks
17. Advantages of Data Mining
Marketing / Retail
Finance / Banking
Manufacturing
Governments
18. Challenges of Data Mining
Scalability
Dimensionality
Complex and Heterogeneous Data
Data Quality
Data Ownership and Distribution
Privacy Preservation
Streaming Data
19. Conclusion
Comprehensive data warehouses that integrate operational data with customer, supplier, and market information have resulted in an explosion of information.
Competition requires timely and sophisticated analysis on an integrated view of the data.
However, there is a growing gap between more powerful storage and retrieval systems and the users’ ability to effectively analyze and act on the information they contain.