This presentation is about data mining and its applications in different fields. It shows why data mining is important and how it can impact businesses.
The presentation covers data mining, data mining techniques, data analysis, data mining subtypes, uses of data mining, sources of data for mining, and privacy concerns.
This lecture gives various definitions of data mining and explains why data mining is required. It also gives examples of classification, clustering, and association rules.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
2. Content
Data Mining
Data Mining Definition
Data Mining – Two Main Components
Data Mining vs. Data Analysis
What is (not) Data Mining?
Related Fields
Data Mining Process
Major Data Mining Tasks
Uses of Data Mining
Sources of Data for Mining
Challenges of Data Mining
Advantages
Conclusion
Reference
3. Data Mining
New buzzword, old idea.
Inferring new information from already collected data.
Traditionally the job of data analysts.
Computers have changed this.
It is far more efficient to comb through data using a machine than to eyeball statistical data.
5. Data Mining vs. Data Analysis
In terms of software and the marketing thereof:
Data Mining != Data Analysis
Data mining implies that the software applies some intelligence beyond simple grouping and partitioning of data to infer new information.
Data analysis is more in line with standard statistical software (e.g. web stats). These usually present information about subsets and relations within the recorded data set (e.g. browser/search engine usage, average visit time).
6. What Is (Not) Data Mining?
What is not data mining?
• Look up a phone number in a phone directory
• Query a Web search engine for information about “Amazon”
What is data mining?
• Certain names are more prevalent in certain US locations (O'Brien, O'Rourke, O'Reilly… in the Boston area)
• Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)
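The idea of grouping search results by context can be sketched in a few lines. This is only an illustrative sketch, not a method taken from the slides: it assumes each document is a short string, measures similarity as word overlap (Jaccard similarity), and uses a greedy pass with a hand-picked threshold. The names `jaccard`, `group_documents`, and the sample `results` are hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_documents(docs, min_similarity=0.3):
    """Greedily group documents: a document joins the first group that
    contains a sufficiently similar member, otherwise it starts a new group.
    Returns a list of groups, each a list of document indices."""
    groups = []  # each group is a list of (index, token_set) pairs
    for i, text in enumerate(docs):
        tokens = set(text.lower().split())
        for group in groups:
            if any(jaccard(tokens, other) >= min_similarity for _, other in group):
                group.append((i, tokens))
                break
        else:
            groups.append([(i, tokens)])
    return [[i for i, _ in group] for group in groups]


# Hypothetical search results for the query "Amazon", invented for illustration.
results = [
    "amazon rainforest river wildlife",
    "amazon online shopping books",
    "rainforest wildlife conservation amazon",
    "amazon shopping delivery prime",
]

print(group_documents(results))
```

With this sample data the rainforest results end up in one group and the shopping results in another, even though every result contains the word "amazon". Real systems would typically use weighted term vectors and a proper clustering algorithm rather than raw word overlap.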
8. Why Mine Data? Scientific Viewpoint
⚫ Data collected and stored at enormous speeds (GB/hour)
o remote sensors on a satellite
o telescopes scanning the skies
o microarrays generating gene expression data
o scientific simulations generating terabytes of data
⚫ Traditional techniques infeasible for raw data
⚫ Data mining may help scientists
o in classifying and segmenting data
o in hypothesis formation
12. Major Data Mining Tasks
Classification: predicting an item class
Associations: e.g. A & B & C occur frequently
Visualization: to facilitate human discovery
Estimation: predicting a continuous value
Deviation Detection: finding changes
Link Analysis: finding relationships...
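As a concrete illustration of the association task ("A & B & C occur frequently"), the sketch below counts how often each combination of items appears together and keeps those above a minimum support. It is a brute-force sketch for small data, not an optimized algorithm such as Apriori; the function name `frequent_itemsets` and the sample baskets are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, size, min_support):
    """Return {itemset: support} for itemsets of the given size.

    Support is the fraction of transactions that contain every item
    in the itemset.
    """
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(basket)), size):
            counts[itemset] += 1
    total = len(transactions)
    return {
        itemset: count / total
        for itemset, count in counts.items()
        if count / total >= min_support
    }


# Hypothetical shopping baskets, invented for illustration.
baskets = [
    ["A", "B", "C"],
    ["A", "B", "C", "D"],
    ["A", "C"],
    ["B", "D"],
    ["A", "B", "C"],
]

triples = frequent_itemsets(baskets, size=3, min_support=0.5)
print(triples)
```

Here A, B, and C appear together in three of the five baskets, so that triple is the only one meeting a 50% support threshold. Enumerating every combination grows quickly with basket size, which is why practical tools prune candidates instead of counting them all.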
13. Uses of Data Mining
AI/Machine Learning
Good for analyzing winning strategies in games, and thus developing intelligent AI opponents (e.g. chess).
Business Strategies
Identify customer demographics, preferences, and purchasing patterns.
Risk Analysis
Analyze product defect rates for given plants and predict possible complications (read: lawsuits) down the line.
14. Uses of Data Mining (Cont.)
User Behavior Validation
In the realm of cell phones: comparing phone activity to calling records can help detect calls made on cloned phones.
Similarly, with credit cards: comparing purchases with historical purchases can detect activity with stolen cards.
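Comparing a new purchase with historical purchases can be sketched as a simple deviation check. The example below is a minimal sketch that assumes purchase amounts alone are compared and flags an amount lying more than a chosen number of standard deviations from the historical mean (a z-score test). The function `is_unusual`, the threshold of 3, and the sample amounts are assumptions for illustration; real fraud detection uses many more signals.

```python
from statistics import mean, stdev


def is_unusual(history, amount, threshold=3.0):
    """Flag an amount that lies more than `threshold` sample standard
    deviations away from the mean of past amounts."""
    if len(history) < 2:
        return False  # too little history to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts are identical: any different amount deviates.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical past purchase amounts for one card, invented for illustration.
past_purchases = [20.0, 35.0, 25.0, 30.0, 40.0]

print(is_unusual(past_purchases, 32.0))   # close to the usual range
print(is_unusual(past_purchases, 500.0))  # far outside the usual range
```

The same pattern applies to the cloned-phone case by swapping purchase amounts for a measure of call activity.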
15. Uses of Data Mining (Cont.)
Health and Science
Predicting protein interactions and functionality within biological cells. Applications of this research include determining causes and possible cures for Alzheimer's, Parkinson's, and some cancers (caused by protein "misfolds").
Scanning radio telescope signals for possible transmissions from other planets.
For more information see Stanford's Folding@home and UC Berkeley's SETI@home projects. Both involve participation in a widely distributed computer application.
19. Conclusion
Comprehensive data warehouses that integrate
operational data with customer, supplier, and market
information have resulted in an explosion of information.
Competition requires timely and sophisticated analysis
on an integrated view of the data.
However, there is a growing gap between more powerful
storage and retrieval systems and the users’ ability to
effectively analyze and act on the information they
contain.