The document discusses data mining and provides details on key aspects of the data mining process. It describes how data mining is used to extract knowledge from data and identifies the main steps in the knowledge discovery process as data cleaning, integration, selection, transformation, mining, pattern evaluation and presentation. It also outlines different types of data that can be mined, including relational databases, data warehouses, transactional data, and advanced database systems. Common data mining techniques are discussed like classification, clustering, association rule mining and anomaly detection.
ADVANCE DATABASE MANAGEMENT SYSTEM CONCEPTS & ARCHITECTURE by Vikas Jagtap
Data that indicates the earth location (latitude & longitude, or height & depth) of rendered map objects is known as spatial data.
When the map is rendered, this spatial data is used to project the locations of the objects onto a 2-dimensional piece of paper.
Spatial data management systems are designed to make the storage, retrieval, and manipulation of spatial data (i.e., points, lines and polygons) easier and more natural for users of applications such as GIS.
While typical databases can handle various numeric and character types of data, additional functionality needs to be added for databases to process spatial data types.
These spatial types are typically called geometry or feature types.
This document discusses key concepts related to databases and information systems. It defines data, information, and databases. It explains that a database management system (DBMS) stores data in a structured way to facilitate retrieval and use. An information system combines a DBMS with tools for querying, analyzing, and presenting the data. The document outlines advantages of database systems like concurrent access, structured storage, separation of data and applications, and data integrity and persistence. Examples of database applications discussed include banking transactions, timetables, and library catalogs.
The document discusses major issues in data mining including mining methodology, user interaction, performance, and data types. Specifically, it outlines challenges of mining different types of knowledge, interactive mining at multiple levels of abstraction, incorporating background knowledge, visualization of results, handling noisy data, evaluating pattern interestingness, efficiency and scalability of algorithms, parallel and distributed mining, and handling relational and complex data types from heterogeneous databases.
Metadata contains answers to questions about the data in a data warehouse. It is stored in a metadata repository and describes pertinent details about the data to users, developers, and the project team. Metadata is necessary for using, building, and administering the data warehouse as it provides information about data extraction, transformations, structure, refreshment, and more. It serves important roles for both business users and IT staff across the data acquisition, storage, and delivery processes.
This presentation discusses the following topics:
Object Oriented Databases
Object Oriented Data Model (OODM)
Characteristics of Object oriented database
Object, Attributes and Identity
Object oriented methodologies
Benefits of object orientation in programming languages
Object oriented model vs Entity Relationship model
Advantages of OODB over RDBMS
The document discusses metadata in data warehousing and business intelligence contexts. Some key points:
1. Metadata provides information about data in a data warehouse or warehouse components like data marts. It describes data structures, attributes, transformations and more.
2. Metadata is important for tasks like ETL processing, querying, reporting and overall data management. It helps users understand what data is available and how to access and analyze it.
3. There are different types of metadata including technical metadata about data storage and processes, and business metadata that provides business definitions and rules. Maintaining accurate and consistent metadata is vital for a successful data warehouse.
This document discusses techniques for data reduction to reduce the size of large datasets for analysis. It describes five main strategies for data reduction: data cube aggregation, dimensionality reduction, data compression, numerosity reduction, and discretization. Data cube aggregation involves aggregating data at higher conceptual levels, such as aggregating quarterly sales data to annual totals. Dimensionality reduction removes redundant attributes. The document then focuses on attribute subset selection techniques, including stepwise forward selection, stepwise backward elimination, and combinations of the two, to select a minimal set of relevant attributes. Decision trees can also be used for attribute selection by removing attributes not used in the tree.
The document discusses database management systems (DBMS). It explains that a DBMS is software that stores and manages databases to provide benefits like data independence, efficient access, integrity and security. It also discusses key DBMS concepts like data models, schemas, transactions, concurrency control and ensuring atomicity through logging. DB application development and database administration are important roles supported by a DBMS.
The document discusses object-oriented databases and their advantages over traditional relational databases, including their ability to model more complex objects and data types. It covers fundamental concepts of object-oriented data models like classes, objects, inheritance, encapsulation, and polymorphism. Examples are provided to illustrate object identity, object structure using type constructors, and how an object-oriented model can represent relational data.
This document provides an introduction and overview of databases and the basic operations used to manage data in a database using Microsoft Access 2007. It defines what a database is, how data is organized in tables with rows and columns, and when it is appropriate to use a database. It also outlines and provides examples of the basic CRUD (create, read, update, delete) operations used in structured query language (SQL) to manipulate data, including inserting, selecting, updating, and deleting records from database tables.
This document provides an introduction to database concepts. It discusses the advantages of a database system compared to file processing, including reduced data redundancy, controlled inconsistency, shared data, standardized data, secured data, and integrated data. It also describes three levels of abstraction in a database - the physical level, conceptual level, and external or view level. Additionally, it covers database models including the relational, network, and hierarchical models as well as key database concepts such as primary keys, foreign keys, candidate keys, and alternate keys.
Data mining involves multiple steps in the knowledge discovery process including data cleaning, integration, selection, transformation, mining, and pattern evaluation. It has various functionalities including descriptive mining to characterize data, predictive mining for inference, and different mining techniques like classification, association analysis, clustering, and outlier analysis.
The document discusses database integration, which involves combining multiple existing databases with different schemas (called local conceptual schemas or LCSs) into a single integrated schema (called a global conceptual schema or GCS). It covers topics such as schema matching to find relationships between elements in different LCSs, schema mapping to translate between LCSs and the GCS, and methods for generating the GCS by combining parts of the LCSs. The goal is to enable queries and applications to interact with the distributed databases through a unified interface via the GCS.
This document discusses various machine learning techniques for classification and prediction. It covers decision tree induction, tree pruning, Bayesian classification, Bayesian belief networks, backpropagation, association rule mining, and ensemble methods like bagging and boosting. Classification involves predicting categorical labels while prediction predicts continuous values. Key steps for preparing data include cleaning, transformation, and comparing different methods based on accuracy, speed, robustness, scalability, and interpretability.
This presentation gives an idea about data preprocessing in the field of data mining. Images, examples and other material are adapted from "Data Mining: Concepts and Techniques" by Jiawei Han, Micheline Kamber and Jian Pei.
The document discusses different database models including hierarchical, network, relational, entity-relationship, object-oriented, object-relational, and semi-structured models. It provides details on the characteristics, structures, advantages and disadvantages of each model. It also includes examples and diagrams to illustrate concepts like hierarchical structure, network structure, relational schema, entity relationship diagrams, object oriented diagrams, and XML schema. The document appears to be teaching materials for a database management course that provides an overview of various database models.
This document discusses different methods for organizing and indexing data stored on disk in a database management system (DBMS). It covers unordered or heap files, ordered or sequential files, and hash files as methods for physically arranging records on disk. It also discusses various indexing techniques like primary indexes, secondary indexes, dense vs sparse indexes, and multi-level indexes like B-trees and B+-trees that provide efficient access to records. The goal of file organization and indexing in a DBMS is to optimize performance for operations like inserting, searching, updating and deleting records from disk files.
The document discusses different levels of coupling between data mining (DM) systems and database/data warehouse (DB/DW) systems. It defines:
1) No coupling as DM systems operating independently without utilizing any DB/DW functions.
2) Loose coupling as DM systems fetching data from and storing results in DB/DW systems.
3) Semi-tight coupling as DM systems linking to and using efficient implementations of some DM functions within DB/DW systems.
4) Tight coupling as DM systems being fully integrated with and optimized based on the query processing and data structures of DB/DW systems.
This document discusses data mining and different types of data mining techniques. It defines data mining as the process of analyzing large amounts of data to discover patterns and relationships. The document describes predictive data mining, which makes predictions based on historical data, and descriptive data mining, which identifies patterns and relationships. It also discusses classification, clustering, time-series analysis, and data summarization as specific data mining techniques.
This document provides an overview and introduction to a lecture on database management systems (DBMS). It discusses how companies are increasingly data-driven and how this class will teach the basics of using and managing data. The lecture will cover the motivation for studying DBMS, an overview of the subject, and course logistics. The goal is for students to understand fundamental database concepts and be able to design, query, and build applications with databases.
The document discusses transaction states, ACID properties, and concurrency control in databases. It describes the different states a transaction can be in, including active, partially committed, committed, failed, and terminated. It then explains the four ACID properties of atomicity, consistency, isolation, and durability. Finally, it discusses the need for concurrency control and some problems that can occur without it, such as lost updates, dirty reads, incorrect summaries, and unrepeatable reads.
This document discusses distributed databases and distributed database management systems (DDBMS). It defines a distributed database as a logically interrelated collection of shared data physically distributed over a computer network. A DDBMS is software that manages the distributed database and makes the distribution transparent to users. The document outlines key concepts of distributed databases including data fragmentation, allocation, and replication across multiple database sites connected by a network. It also discusses reference architectures, components, design considerations, and types of transparency provided by DDBMS.
The document discusses databases and database management systems. It provides examples of common database applications like banking, universities, sales, and airlines. It defines what a database is, the role of a database management system, and examples of DBMS software. It also compares the advantages and disadvantages of using a database system versus a traditional file system to store data. Key benefits of a DBMS include supporting complex queries, controlling redundancy and consistency, handling concurrent access from multiple users, and providing security and data recovery.
The document is a presentation on IBM's DB2 database software. It contains:
1) An overview of the DB2 product family and add-on products.
2) Descriptions of DB2 administrative programs, table spaces, constraints, and data types.
3) Explanations of instances and databases, controlling authorities, and the development center.
4) Details about the backup wizard, failure detection and recovery, and a comparison to Oracle's database software.
Lecture 4: Big data technology foundations by hktripathy
The document discusses big data architecture and its components. It explains that big data architecture is needed when analyzing large datasets over 100GB in size or when processing massive amounts of structured and unstructured data from multiple sources. The architecture consists of several layers including data sources, ingestion, storage, physical infrastructure, platform management, processing, query, security, monitoring, analytics and visualization. It provides details on each layer and their functions in ingesting, storing, processing and analyzing large volumes of diverse data.
The document discusses key concepts from Chapter 2 on database environments, including:
1) It describes the ANSI-SPARC three-level architecture for database systems, which separates data into external, conceptual, and internal levels.
2) It explains the roles of various users in a database environment like data administrators, database administrators, and end users.
3) It provides an overview of database languages, data models, and the functions of a database management system.
This document provides an overview of object-oriented databases. It introduces object-oriented programming concepts like encapsulation, polymorphism and inheritance. It then discusses how object-oriented databases combine these concepts with database principles like ACID properties. Advantages include being integrated with programming languages and automatic method storage. Disadvantages include requiring object-oriented programming and high costs to convert data. The document also discusses the Object Query Language and provides an example query in OQL.
This document provides an overview of data mining techniques and concepts. It defines data mining as the process of discovering interesting patterns and knowledge from large amounts of data. The key steps involved are data cleaning, integration, selection, transformation, mining, evaluation, and presentation. Common data mining techniques include classification, clustering, association rule mining, and anomaly detection. The document also discusses data sources, major applications of data mining, and challenges.
This document provides an overview of web mining. It defines web mining as using data mining techniques to automatically discover and extract information from web documents and services. It discusses the differences between web mining and data mining, and covers the main topics in web mining including web graph analysis, structured data extraction, and web advertising. It also describes the different approaches of web content mining, web structure mining, and web usage mining.
This document provides an overview of web mining, which uses data mining techniques to automatically discover and extract information from web documents and services. It discusses the differences between web mining and traditional data mining, and covers various topics in web mining including web content mining, web structure mining, and web usage mining. The document also examines issues around the large scale of web data and approaches for analyzing it at scale across distributed systems.
A brief description of the three mining techniques, the differences and similarities between them, and, finally, the techniques they share.
The document discusses databases and data warehouses. It explains the differences between traditional file organization and database management. Relational and object-oriented database models are used to construct and manipulate databases. Data modeling creates a conceptual design for databases. Data is extracted from transactional databases and transformed for loading into data warehouses to support analysis and decision making.
Data warehousing is an architectural model that gathers data from various sources into a single unified data model for analysis purposes. It consists of extracting data from operational systems, transforming it, and loading it into a database optimized for querying and analysis. This allows organizations to integrate data from different sources, provide historical views of data, and perform flexible analysis without impacting transaction systems. While implementation and maintenance of a data warehouse requires significant costs, the benefits include a single access point for all organizational data and optimized systems for analysis and decision making.
Various Applications of Data Warehouse.ppt by RafiulHasan19
The document discusses various applications of data warehousing. It begins by describing problems with traditional transactional systems and how data warehouses address these issues. It then defines key components of a data warehouse including the extraction, transformation, and loading of data from various sources. The document outlines how online analytical processing (OLAP) tools, metadata repositories, and data mining techniques analyze and explore the collected data. Finally, it weighs the benefits of a data warehouse against the costs of implementation and maintenance.
Data mining involves extracting useful patterns from large amounts of data. It defines the process of data mining which includes problem definition, data gathering and preparation, model building and evaluation, and knowledge deployment. The document also discusses why data mining is used, the types of data it can be applied to, and some common applications. It provides an overview of popular data mining tools and techniques such as association, classification, clustering, prediction, and decision trees.
Modern databases can be categorized as memory based distributed transactional databases, column stores, NoSQL distributed document stores, NoSQL distributed key-value stores, NoSQL distributed data stores using Apache Lucene, distributed data stores supporting ACID transactions, and graph databases. Each has advantages for different data and query requirements regarding performance, scalability, data structure, and transaction support. The document provides examples of databases for each category.
- Data warehousing aims to help knowledge workers make better decisions by integrating data from multiple sources and providing historical and aggregated data views. It separates analytical processing from operational processing for improved performance.
- A data warehouse contains subject-oriented, integrated, time-variant, and non-volatile data to support analysis. It is maintained separately from operational databases. Common schemas include star schemas and snowflake schemas.
- Online analytical processing (OLAP) supports ad-hoc querying of data warehouses for analysis. It uses multidimensional views of aggregated measures and dimensions. Relational and multidimensional OLAP are common architectures. Measures are metrics like sales, and dimensions provide context like products and time periods.
This document discusses key concepts related to databases and business intelligence. It defines common terms like databases, records, fields, and entities. It explains how relational database management systems (RDBMS) represent data in tables and allow querying, manipulation, and reporting of data through SQL. It also discusses data warehousing, analytics tools, data mining, and ensuring high quality data. The goal is to provide organizations with tools and technologies to access information from databases and improve business performance.
This document discusses architecting a data lake. It begins by introducing the speaker and topic. It then defines a data lake as a repository that stores enterprise data in its raw format including structured, semi-structured, and unstructured data. The document outlines some key aspects to consider when architecting a data lake such as design, security, data movement, processing, and discovery. It provides an example design and discusses solutions from vendors like AWS, Azure, and GCP. Finally, it includes an example implementation using Azure services for an IoT project that predicts parts failures in trucks.
A data warehouse is a collection of data integrated from multiple sources to support decision making. It contains subject-oriented, integrated, time-variant, and non-volatile data stored in a way that makes it readily available for analysis. Data marts can be dependent on the warehouse or independent subsets designed for specific departments. Successful implementation requires identifying data sources and governance, planning data quality and modeling, selecting ETL and database tools, and supporting end users. Key challenges include unrealistic expectations, technical issues, and ensuring ongoing value.
Colorado Springs Open Source Hadoop/MySQL by David Smelker
This document discusses MySQL and Hadoop integration. It covers structured versus unstructured data and the capabilities and limitations of relational databases, NoSQL, and Hadoop. It also describes several tools for integrating MySQL and Hadoop, including Sqoop for data transfers, MySQL Applier for streaming changes to Hadoop, and MySQL NoSQL interfaces. The document outlines the typical life cycle of big data with MySQL playing a role in data acquisition, organization, analysis, and decisions.
Combining Data Mining and Machine Learning for Effective User Profiling by CodePolitan
This slide presentation was delivered by Anne Regina at the Seminar & Workshop on the Introduction & Potential of Big Data & Machine Learning, organized by KUDIO on 14 May 2016.
This document provides an overview of data warehousing, including its definition, types, components, architecture, database design, OLAP, and metadata repository. It discusses the differences between OLTP and data warehousing systems and describes the key steps in building a data warehouse, including data extraction, transformation, loading, storage, analysis, delivery of information to users, and ongoing management of the data warehouse system.
4. • KDD Process Steps
• 1) Data Cleaning
• 2) Data Integration
• 3) Data Selection
• 4) Data Transformation
• 5) Data Mining
• 6) Pattern Evaluation
• 7) Knowledge Presentation
5. • KDD Process Steps
• 1) Data Cleaning – remove noise and inconsistent data
• 2) Data Integration – combine multiple data sources
• 3) Data Selection – select the data relevant to the analysis
• 4) Data Transformation – convert the data into a form appropriate for mining
• 5) Data Mining – apply intelligent methods to extract data patterns
• 6) Pattern Evaluation – select the patterns that truly represent knowledge
• 7) Knowledge Presentation – present the mined knowledge using different visualization techniques
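Purely as an illustration of how these seven steps chain together (not part of the original deck), here is a toy end-to-end sketch in Python; every record, field name and threshold is invented for the example.

```python
# A minimal, self-contained sketch of the KDD steps on toy sales records.
# All field names and thresholds here are invented for illustration.
raw = [
    {"item": "milk", "qty": 2, "price": 1.5},
    {"item": "bread", "qty": -1, "price": 1.0},   # inconsistent record (noise)
    {"item": "milk", "qty": 1, "price": None},    # missing value (noise)
    {"item": "bread", "qty": 3, "price": 1.0},
]

# 1) Data cleaning: drop noisy / inconsistent records.
cleaned = [r for r in raw if r["price"] is not None and r["qty"] > 0]

# 2) Data integration: combine a second source with the first.
other_source = [{"item": "milk", "qty": 4, "price": 1.5}]
integrated = cleaned + other_source

# 3) Data selection: keep only the attributes relevant to the analysis.
selected = [{"item": r["item"], "qty": r["qty"]} for r in integrated]

# 4) Data transformation: convert into a form suited to mining (totals per item).
totals = {}
for r in selected:
    totals[r["item"]] = totals.get(r["item"], 0) + r["qty"]

# 5) Data mining: extract a simple pattern (the best-selling item).
best_item = max(totals, key=totals.get)

# 6) Pattern evaluation: keep the pattern only if it is "interesting" enough.
interesting = totals[best_item] >= 5

# 7) Knowledge presentation: report the result to the user.
if interesting:
    print(f"Pattern: '{best_item}' is the top seller ({totals[best_item]} units).")
```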
6. • Data mining is a step in the knowledge discovery process
7. • Architecture of a data mining system
8. • Architecture of a data mining system
• Components are:
• Database, data warehouse, World Wide Web, or other information repository
• - data cleaning and integration techniques may be performed on the data
• Database or data warehouse server
• - responsible for fetching the needed data
9. • Architecture of a data mining system
• Knowledge base
• - used to guide the search
• Data mining engine
• - performs tasks such as characterization, association and correlation analysis, classification, etc.
• Pattern evaluation module
• - selects the needed (interesting) patterns
• User interface
• - handles communication with the user
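Purely as a sketch (the deck only names these components), one possible way to wire them together in code is shown below; every class, method and value is invented for illustration.

```python
# An invented, minimal wiring of the components named above.
class KnowledgeBase:
    """Domain knowledge used to guide the search (here: a minimum threshold)."""
    min_count = 2

class DatabaseServer:
    """Responsible for fetching the needed data from the repository."""
    def fetch(self):
        return ["milk", "bread", "milk", "butter", "milk"]

class MiningEngine:
    """Performs a mining task (here: a trivial frequency characterization)."""
    def mine(self, records):
        counts = {}
        for r in records:
            counts[r] = counts.get(r, 0) + 1
        return counts

class PatternEvaluator:
    """Selects the interesting patterns, guided by the knowledge base."""
    def select(self, patterns, kb):
        return {k: v for k, v in patterns.items() if v >= kb.min_count}

# User interface: the point where results are communicated to the user.
patterns = MiningEngine().mine(DatabaseServer().fetch())
print(PatternEvaluator().select(patterns, KnowledgeBase()))  # {'milk': 3}
```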
10. • Data Mining – On What Kinds of Data
• Data mining deals with a number of different data repositories on which mining can be performed.
• It is applicable to any kind of repository, as well as to data streams.
• Data repositories include:
• Relational databases
• Data warehouses
• Transactional databases
• Advanced database systems
• Flat files
• Data streams
• WWW
11. • Advanced database systems include:
• Object relational databases
• Temporal, sequence and time-series databases
• Spatial databases
• Multimedia databases
12. • Relational Databases
• DBMS – a collection of interrelated data plus a set of software programs to access and manage that data
• Relational database – a collection of tables, each of which is assigned a unique name
• Each table consists of a set of attributes and stores a large set of tuples
• A tuple represents an object, identified by a unique key and described by a set of attribute values
13. • Relational Databases
• Relational data can be accessed through a relational query language such as SQL, or with the assistance of a GUI.
• A given query is transformed into relational operations such as join, selection and projection (see the sketch below).
• Data mining in a relational database searches for data patterns. Example: predicting the credit risk of new customers based on the data available in the database.
• Relational DBs are the most commonly available and are rich information repositories.
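To make the query-to-operations point concrete, here is a small sketch using Python's built-in sqlite3 module; the table and column names are invented for the example.

```python
import sqlite3

# Invented toy schema: customers and their orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha', 'Kochi'), (2, 'Ben', 'Delhi');
    INSERT INTO orders    VALUES (10, 1, 250.0), (11, 1, 90.0), (12, 2, 40.0);
""")

# One SQL query combining the three relational operations named above:
# join (customers x orders), selection (amount > 50), projection (name, amount).
rows = conn.execute("""
    SELECT c.name, o.amount          -- projection
    FROM customers c JOIN orders o   -- join
      ON c.id = o.customer_id
    WHERE o.amount > 50              -- selection
""").fetchall()
print(rows)  # [('Asha', 250.0), ('Asha', 90.0)]
```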
14. • Data Warehouse
• A repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site.
• Constructed via data cleaning, integration, transformation, loading and periodic data refreshing.
• Rather than storing full detail, a data warehouse may store a summary of the data from a historical perspective.
15. • Data Warehouse
• Modeled by a multidimensional database structure, usually a multidimensional data cube. Dimension – an attribute or a set of attributes in the schema. Cell – an aggregate measure.
• Data mart – a department-level subset of a data warehouse that focuses on selected subjects.
• OLAP operations – roll-up and drill-down (see the sketch below).
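As a hedged illustration of roll-up and drill-down (assuming pandas is available; the sales figures are invented):

```python
import pandas as pd

# Invented quarterly sales, one row per (year, quarter, branch).
sales = pd.DataFrame({
    "year":    [2023, 2023, 2023, 2023, 2024, 2024],
    "quarter": ["Q1", "Q2", "Q3", "Q4", "Q1", "Q2"],
    "branch":  ["A", "A", "B", "B", "A", "B"],
    "amount":  [100, 120, 80, 90, 110, 95],
})

# Roll-up: climb the time hierarchy from quarter to year (aggregate up).
by_year = sales.groupby("year")["amount"].sum()
print(by_year)            # 2023 -> 390, 2024 -> 205

# Drill-down: descend again to finer detail (per year and quarter).
by_quarter = sales.groupby(["year", "quarter"])["amount"].sum()
print(by_quarter)
```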
16. • Typical framework of a data warehouse
17. • Multidimensional data cube
18. • Transactional Database
• Consists of a file where each record represents a transaction.
• Each record includes a unique transaction identity number and the list of items making up the transaction.
• Example: a transactional database for sales can answer “Which items sold well together?” – data mining on transactional data identifies frequent item sets easily (see the sketch below).
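A minimal sketch (with invented transactions) of counting which item pairs appear together frequently:

```python
from collections import Counter
from itertools import combinations

# Invented transactions: (transaction id, list of items).
transactions = [
    ("T100", ["milk", "bread", "butter"]),
    ("T200", ["milk", "bread"]),
    ("T300", ["bread", "jam"]),
    ("T400", ["milk", "bread", "jam"]),
]

# Count every unordered pair of items that co-occurs in a transaction.
pair_counts = Counter()
for _, items in transactions:
    pair_counts.update(combinations(sorted(items), 2))

# Pairs appearing in at least 3 of the 4 transactions "sold well together".
frequent = {p: c for p, c in pair_counts.items() if c >= 3}
print(frequent)  # {('bread', 'milk'): 3}
```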
19. • Advanced Data and Information Systems and Advanced Applications
• Object relational databases
• Temporal databases, sequence databases and time-series databases
• Spatial databases and spatio-temporal databases
• Text databases and multimedia databases
• Heterogeneous databases and legacy databases
• Data streams
• WWW
20. • Advanced Data and Information Systems and Advanced Applications
• Object Relational Databases
• Handle complex objects
• Each entity is considered an object – individual items, employees, etc.
• Data and code relating to an object are encapsulated into a single unit
• Each object has:
• A set of variables – attributes
• A set of messages – used to communicate with other objects
• A set of methods – hold the code that implements the messages
• Object class – objects that share a common set of properties
• Each object is an instance of a class (see the sketch below).
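For illustration (all names invented), a small Python class showing variables (attributes), a method, and objects as instances of a class:

```python
class Employee:
    """An invented object class: instances share the same set of properties."""

    def __init__(self, name, salary):
        # Variables (attributes) encapsulated with the object's code.
        self.name = name
        self.salary = salary

    def give_raise(self, amount):
        # A method: the code that implements the "give_raise" message.
        self.salary += amount

# Each object is an instance of the class.
e1 = Employee("Asha", 50_000)
e2 = Employee("Ben", 45_000)
e1.give_raise(5_000)           # sending a message to object e1
print(e1.name, e1.salary)      # Asha 55000
```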
21. • Advanced Data and Information Systems and Advanced Applications
• Temporal Databases, Sequence Databases and Time-Series Databases
• Temporal databases handle data involving time – they store relational data that include time-related attributes.
• Sequence databases store sequences of ordered events, with or without a concrete notion of time. Example: customer shopping sequences.
• Time-series databases store sequences of values or events obtained over repeated measurements of time. Example: data collected from the stock exchange.
• Data mining techniques can be used to find the trends of change for objects in the database (see the sketch below).
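One simple, hedged illustration of finding a trend in time-series values (the prices are invented): a moving average smooths the series so the direction of change stands out.

```python
# Invented daily closing prices from a stock exchange feed.
prices = [100, 102, 101, 105, 107, 106, 110, 113]

def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

ma = moving_average(prices, 3)
print([round(v, 1) for v in ma])   # [101.0, 102.7, 104.3, 106.0, 107.7, 109.7]

# A rising moving average indicates an upward trend of change.
print("upward trend" if ma[-1] > ma[0] else "no upward trend")
```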
22. • Advanced Data and Information Systems and Advanced Applications
• Spatial Databases and Spatio-Temporal Databases
• A spatial database contains objects defined in a geometric space. Examples: maps, CAD databases.
• Using data mining, the relationships among a set of spatial objects can be examined.
• Spatio-temporal databases – spatial DBs that store spatial objects that change with time. Example: tracking of moving vehicles.
23. • Advanced Data and Information Systems and Advanced Applications
• Text Databases and Multimedia Databases
• Text databases contain word descriptions of objects – long sequences of sentences or paragraphs. Example: product specifications.
• Text databases may be highly unstructured (web pages on the WWW), semi-structured (e-mail) or well structured.
• By mining text data we can uncover general and concise descriptions of the text documents, keywords, etc. (see the sketch after this slide).
• Multimedia databases store image, audio and video data, and must support large objects.
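As an invented toy example of uncovering keywords from text, a plain frequency count after removing common stop words:

```python
from collections import Counter
import re

# Invented product-specification snippets (a tiny "text database").
docs = [
    "The camera has a wide lens and a fast shutter.",
    "A fast camera with a bright lens and long battery life.",
]

STOP = {"the", "a", "and", "has", "with", "of"}

words = []
for doc in docs:
    # Lowercase, keep alphabetic tokens, drop stop words.
    words += [w for w in re.findall(r"[a-z]+", doc.lower()) if w not in STOP]

# The most frequent remaining words serve as naive keywords.
print(Counter(words).most_common(3))  # [('camera', 2), ('lens', 2), ('fast', 2)]
```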
24. • Advanced Data and Information Systems and Advanced Applications
• Heterogeneous Databases and Legacy Databases
• Heterogeneous databases consist of a set of interconnected component databases in which the objects differ greatly.
• A legacy database is a group of heterogeneous databases.
• Information exchange across these databases is very difficult due to their diverse semantics. Data mining offers a solution by transforming the data into higher, more generalized levels.
25. • Advanced Data and Information Systems and Advanced Applications
• Data Streams
• A new kind of data, in which the data flow in and out of an observation platform dynamically.
• Example: video surveillance.
• Data streams are normally not stored in any kind of repository, which poses challenges to their management and analysis.
• Analysis uses a continuous query model (see the sketch below).
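A hedged sketch of a continuous query: the stream is consumed once, element by element, while a standing query keeps a running answer; the readings and names are invented.

```python
import itertools
import random

def sensor_stream():
    """An invented unbounded stream of sensor readings."""
    rng = random.Random(42)
    while True:
        yield rng.uniform(0.0, 100.0)

# Continuous query: "report the running average so far" – the data are
# never stored; only a constant-size summary (count, total) is kept.
count, total = 0, 0.0
for reading in itertools.islice(sensor_stream(), 5):  # take 5 for the demo
    count += 1
    total += reading
    print(f"after {count} readings, running average = {total / count:.1f}")
```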
26. • Advanced Data and Information Systems and Advanced Applications
• World Wide Web
• Data objects are linked together to facilitate interactive access.
• The Web is an opportunity as well as a challenge for data mining.
• Web usage mining – capturing user access patterns in a distributed information environment.
• Keyword-based search offers only limited help to users.
• Authoritative web page analysis – ranks web pages based on their importance.
• Automated web page clustering and classification – arrange web pages based on their contents.
• Web community analysis – identifies hidden social networks and communities.
27. • Data Mining Functionalities – What Kinds of Patterns Can Be Mined?
• Functionalities are used to specify the kinds of patterns to be found in data mining tasks.
• Tasks can be classified into two categories:
• Descriptive – deals with the general properties of the data in the database
• Predictive – performs inference on the current data in order to make predictions
28. • Concept/Class Description: Characterization and Discrimination
• Mining Frequent Patterns, Associations and Correlations
• Classification and Prediction
• Cluster Analysis
• Outlier Analysis
• Evolution Analysis
Data Mining Functionalities
29. • Concept/Class Description: Characterization and Discrimination
• Data can be associated with classes or concepts.
• Example:
• classes of items for sale - computers and printers
• concepts of customers - big spenders and budget spenders
• Using precise terms, we can describe individual classes and concepts.
• Such descriptions of a class or a concept are called class/concept descriptions.
• These descriptions can be derived via:
• Data characterization − summarizing the data of the class under study, the target class.
• Data discrimination − comparing the target class with one or a set of comparative classes, the contrasting classes.
• Both of the above methods together (see the sketch below).
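A minimal Python sketch (using pandas) of characterization and discrimination on an invented customer table, where the "big spender" class is an assumption defined by an arbitrary spending threshold:

# Minimal sketch: data characterization and discrimination with pandas.
# The customer table and the "big spender" threshold are invented.
import pandas as pd

df = pd.DataFrame({
    "customer": ["a", "b", "c", "d", "e", "f"],
    "age":      [34, 45, 23, 52, 31, 40],
    "spend":    [120, 2400, 90, 3100, 150, 2200],
})
df["class"] = df["spend"].apply(lambda s: "big" if s > 1000 else "budget")

# Characterization: summarize the target class ("big spenders") on its own.
print(df[df["class"] == "big"][["age", "spend"]].mean())

# Discrimination: compare the target class against the contrasting class.
print(df.groupby("class")[["age", "spend"]].mean())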
Data Mining Functionalities
30. • Mining Frequent Patterns
• Patterns that occur frequently in transactional data.
• Frequent itemset − a set of items that frequently appear together, e.g. milk and bread.
• Frequent subsequence − a sequential pattern that occurs frequently, e.g. purchasing a camera is followed by purchasing a memory card.
• Frequent substructure − different structural forms, such as graphs, trees, or lattices, which may be combined with itemsets or subsequences.
• Mining frequent patterns leads to the discovery of interesting associations and correlations within the data (see the sketch below).
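A minimal Python sketch of frequent itemset counting over invented market-basket transactions; a real system would use Apriori or FP-growth rather than brute-force enumeration of subsets.

# Minimal sketch of frequent itemset counting over market-basket data.
# Transactions and the support threshold are illustrative.
from itertools import combinations
from collections import Counter

transactions = [
    {"milk", "bread"}, {"milk", "bread", "butter"},
    {"bread", "butter"}, {"milk", "bread"}, {"milk"},
]
min_support = 3  # absolute support threshold

counts = Counter()
for t in transactions:
    for size in (1, 2):
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1

frequent = {s: c for s, c in counts.items() if c >= min_support}
print(frequent)   # {('milk',): 4, ('bread',): 4, ('bread', 'milk'): 3}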
Data Mining Functionalities
31. • Associations and Correlations
• Association rules are of two types:
• Single-dimensional association rules involve a single predicate, e.g. buys(X, "computer") => buys(X, "software").
• Multi-dimensional association rules involve more than one predicate, e.g. age(X, "20..29") AND income(X, "40K..49K") => buys(X, "laptop").
Data Mining Functionalities
32. • Associations and Correlations
• Association rules are discarded as uninteresting if they do not satisfy both a minimum support threshold and a minimum confidence threshold.
• Confidence indicates the certainty of the rule.
• Support indicates how frequently the items appear together in the database.
• A sketch of both measures follows.
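A short Python sketch computing support and confidence for a hypothetical rule milk => bread over invented transactions, then checking it against assumed thresholds:

# Sketch: support and confidence for the rule milk => bread.
# Transactions and thresholds are illustrative.
transactions = [
    {"milk", "bread"}, {"milk", "bread", "butter"},
    {"bread", "butter"}, {"milk", "bread"}, {"milk"},
]
n = len(transactions)

both = sum(1 for t in transactions if {"milk", "bread"} <= t)
milk = sum(1 for t in transactions if "milk" in t)

support = both / n          # how often milk and bread appear together: 3/5
confidence = both / milk    # how often bread accompanies milk: 3/4

# The rule is kept only if it clears both minimum thresholds.
print(support >= 0.5 and confidence >= 0.7)   # True (0.60 and 0.75)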
Data Mining Functionalities
33. • Classification
• Classification is the process of finding a model that describes data classes or concepts.
• The derived model is based on the analysis of a set of training data whose class labels are known.
• The model is then used to predict the class of objects whose class label is unknown.
• The derived model can be presented in the following forms (see the sketch after this list):
• (IF-THEN) rules
• Decision trees
• Mathematical formulae
• Neural networks
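A minimal classification sketch in Python (requires scikit-learn): a decision tree is learned from invented training data with known labels, then used to predict the class of an unseen object; export_text shows the tree's IF-THEN view.

# Minimal sketch of classification with a decision tree.
# Features and labels are invented.
from sklearn.tree import DecisionTreeClassifier, export_text

# training data: [age, income] -> known class labels
X = [[25, 30], [47, 90], [35, 60], [52, 110], [23, 25], [40, 80]]
y = ["budget", "big", "budget", "big", "budget", "big"]

model = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(model, feature_names=["age", "income"]))  # IF-THEN view
print(model.predict([[30, 95]]))  # class of an object with an unknown label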
Data Mining Functionalities
35. • Prediction
• Prediction models continuous-valued functions.
• It is used to predict missing or unavailable numerical data values rather than class labels.
• Regression analysis is generally used for prediction (see the sketch below).
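A minimal prediction sketch in Python, using numpy's least-squares linear fit as a stand-in for full regression analysis; the data points are invented.

# Sketch: predicting a continuous value via a simple linear regression.
import numpy as np

ad_spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # e.g. in $1000s
sales    = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

slope, intercept = np.polyfit(ad_spend, sales, deg=1)  # fit y = a*x + b
print(slope * 6.0 + intercept)   # predicted sales for an unseen spend of 6.0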
Data Mining Functionalities
36. • Cluster Analysis
• Cluster analysis examines data objects without consulting known class labels.
• The objects are clustered or grouped based on the principle of "maximizing the intra-class similarity and minimizing the inter-class similarity".
• Within a cluster the data objects have high similarity to one another but are dissimilar to objects in other clusters (see the sketch below).
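A minimal clustering sketch in Python (requires scikit-learn): k-means groups invented, unlabeled points into two clusters without consulting any class labels.

# Sketch of cluster analysis: grouping unlabeled points so that
# intra-cluster similarity is high.
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0],        # one natural group...
     [10, 2], [10, 4], [10, 0]]     # ...and another, far away

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)          # e.g. [1 1 1 0 0 0] -- no class labels were given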
Data Mining Functionalities
38. • Outlier Analysis
• Outliers are data objects in a database that do not obey the general behavior or model of the data.
• In some applications, such rare events can be more interesting than the regularly occurring ones, e.g. fraud detection; their analysis is called outlier mining (see the sketch below).
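A minimal outlier analysis sketch in Python using a simple z-score rule; the transaction amounts and the threshold of 2 are invented for illustration.

# Sketch of outlier analysis: flag values that deviate strongly from the
# general behavior of the data, here via a z-score rule.
import numpy as np

amounts = np.array([42, 39, 45, 41, 38, 44, 40, 900])   # 900 looks fraudulent
z = (amounts - amounts.mean()) / amounts.std()

print(amounts[np.abs(z) > 2])   # -> [900]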
Data Mining Functionalities
39. • Evolution Analysis
• Evolution analysis describes and models regularities or trends for objects whose behavior changes over time.
Data Mining Functionalities
41. • Classification according to the kinds of databases mined
• By data model (relational, transactional, object-relational)
• By type of data (spatial, time-series, text, stream, multimedia, WWW)
• Classification according to the kinds of knowledge mined
• Based on the different data mining functionalities
• According to the level of abstraction of the knowledge mined
• According to the regularity or irregularity of the data that is mined
Data Mining: Classification of Data Mining Systems
42. • Classification according to the kinds of techniques utilized
• By the degree of user interaction involved
• By the methods of data analysis employed (database-oriented, data-warehouse-oriented, etc.)
• Classification according to the applications adapted
• Finance
• Telecommunications
• DNA
Data Mining: Classification of Data Mining Systems
43. • Each user has a data mining task to perform, expressed with the help of a data mining query.
• Such a query is defined in terms of data mining task primitives, which allow users to interact with the data mining system.
• DMQL: Data Mining Query Language
Data Mining Task Primitives
44. • The primitives specify:
• The set of task-relevant data to be mined
• Specifies the portions of the database or the set of data in which the user is interested.
• It includes:
• Database or data warehouse name
• Database tables or data warehouse cubes
• Conditions for data selection
• Relevant attributes or dimensions
• Data grouping criteria
Data Mining Task Primitives
45. • The primitives specify:
• The kind of knowledge to be mined
• Specifies the data mining functions to be performed:
• Characterization
• Discrimination
• Association/correlation
• Classification/prediction
• Clustering
• Outlier or evolution analysis
Data Mining Task Primitives
46. • The primitives specify:
• The background knowledge to be used in the discovery process
• Knowledge about the domain to be mined.
• Guides the knowledge discovery process and the evaluation of the patterns found.
• Includes user beliefs regarding relationships in the data.
Data Mining Task Primitives
47. • The primitives specify:
• The interestingness measures and thresholds for pattern evaluation
• Used to guide the mining process or to evaluate the discovered patterns.
• Different kinds of knowledge have different interestingness measures, e.g.:
• Support
• Confidence
Data Mining Task Primitives
48. • The primitives specify:
• The expected representation for visualizing the discovered patterns
• Refers to the form in which discovered patterns are to be displayed (a combined example follows this list):
• Rules
• Tables
• Charts
• Graphs
• Decision trees
• Cubes
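As an illustration of how all five primitives might come together in one query, here is a DMQL-style sketch. The database, table, attribute, and hierarchy names are hypothetical, the "--" annotations are added for readability, and the syntax follows the general style of DMQL as described in the data mining literature rather than a guaranteed-exact grammar.

use database sales_db                        -- task-relevant data
use hierarchy age_hierarchy for C.age        -- background knowledge
mine characteristics as customerSpending     -- kind of knowledge to be mined
analyze count%
in relevance to C.age, C.income
from customer C
where C.country = "India"
with support threshold = 5%                  -- interestingness measure
display as pie chart                         -- presentation of the patterns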
Data Mining Task Primitives
49. • Integration of Data Mining System with Database or Data
Warehouse System
50. • When a DM system works in an environment, it is required to communicate with other information system components, such as DB and DW systems.
• The different integration schemes are:
• No coupling
• Loose coupling
• Semi-tight coupling
• Tight coupling
Integration of Data Mining System with Database or Data Warehouse System
51. • No coupling
• The DM system does not use any facilities of a DB/DW system.
• It fetches data from a particular source (e.g. a file), processes the data, and stores the results in another file.
• This is the simplest integration scheme.
• Drawbacks:
• Time is wasted preprocessing the data
• Other tools must be used to extract the data
• Poor design
Integration of Data Mining System with Database or Data Warehouse System
52. • Loose coupling
• The data mining system uses some facilities of a DB/DW system.
• It fetches data from a data repository, processes the data, and stores the results in the DB or DW.
• It fetches the data using query processing, indexing, and other DB/DW system facilities (see the sketch below).
• Drawback:
• It is difficult to achieve high scalability and good performance with large data sets.
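A minimal loose-coupling sketch in Python: the stdlib sqlite3 module stands in for the DB system, which handles storage and query processing, while the mining (here, just item counting) happens outside it. The schema and rows are invented.

# Sketch of loose coupling: the DB does selection and query processing;
# the mining code runs outside the DB on the fetched rows.
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("a", "milk"), ("a", "bread"), ("b", "milk"), ("c", "milk")],
)

# Use the DB's query facilities to fetch only task-relevant data.
rows = conn.execute("SELECT item FROM purchases").fetchall()
print(Counter(item for (item,) in rows))   # mining results kept outside the DB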
Integration of Data Mining System with Database or Data Warehouse System
53. • Semi-tight coupling
• Essential data mining primitives are provided within the DB/DW system:
• Sorting
• Indexing
• Aggregation
• Histogram analysis
• Pre-computation of statistical measures
• Some frequently used intermediate mining results can also be pre-computed and stored in the DB/DW system.
• This design enhances the performance of a DM system.
Integration of Data Mining System with Database or Data Warehouse System
54. • Tight coupling
• The DM system is smoothly integrated into the DB/DW system.
• The DM system is treated as one functional component of an information system.
• Data mining queries and functions are optimized based on the different methods of the DB/DW system.
Integration of Data Mining System with Database or Data Warehouse System
55. • Data mining is not an easy task:
• The algorithms it uses can be very complex, and the data is not always available in one place.
• Data often needs to be integrated from various heterogeneous data sources.
• The common issues are:
• Mining methodology and user interaction issues
• Performance issues
• Issues related to the diversity of database types
Issues in Data Mining
56. • Mining different kinds of knowledge in databases
• Different users may be interested in different kinds of knowledge, so a system should cover a broad range of knowledge discovery tasks (classification, clustering, etc.).
• The same database may be used in different ways.
• Interactive mining of knowledge at multiple levels of abstraction
• The data mining process needs to be interactive, allowing users to focus the search for patterns and to provide and refine data mining requests based on the returned results.
• This enables the user to view the data from different angles and levels of abstraction.
Issues in Data Mining: Mining Methodology and User Interaction Issues
57. • Incorporation of background knowledge (knowledge about the domain under study)
• Background knowledge can be used to guide the discovery process and to express the discovered patterns, not only in concise terms but at multiple levels of abstraction.
• Data mining query languages and ad hoc data mining
• A data mining query language that allows the user to describe ad hoc mining tasks should be developed.
• Such languages should be integrated with a database or data warehouse query language and optimized for efficient and flexible data mining.
Issues in Data Mining: Mining Methodology and User Interaction Issues
58. • Presentation and visualization of data mining results
• Once patterns are discovered, they need to be expressed in high-level languages and visual representations.
• These representations should be easily understandable.
• Handling noisy and incomplete data
• Data cleaning methods are required to handle noise and incomplete objects while mining the data regularities.
• Without such methods, the accuracy of the discovered patterns will be poor.
Issues in Data Mining: Mining Methodology and User Interaction Issues
59. • Pattern evaluation
• The patterns discovered may be uninteresting because they represent common knowledge or lack novelty.
• To guide the discovery process and reduce the search space, interestingness measures or user-specified constraints should be applied.
Issues in Data Mining: Mining Methodology and User Interaction Issues
60. • Efficiency and scalability of data mining algorithms
• To effectively extract information from the huge amounts of data in databases, algorithms must be efficient and scalable, and their running time must be predictable.
• Parallel, distributed, and incremental mining algorithms
• These algorithms divide the data into partitions, which are processed in parallel; the results from the partitions are then merged (see the sketch below).
• Incremental algorithms incorporate database updates without mining the entire data again from scratch.
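A minimal Python sketch of partition-based parallel mining using only the standard library: each partition is counted by a worker process, and the partial counts are merged. The transactions are invented.

# Sketch of parallel, partition-based mining: split the transactions into
# partitions, count items per partition in parallel, then merge the counts.
from multiprocessing import Pool
from collections import Counter

def count_partition(partition):
    """Local, per-partition item counts -- the parallel phase."""
    c = Counter()
    for transaction in partition:
        c.update(transaction)
    return c

if __name__ == "__main__":
    transactions = [["milk", "bread"], ["milk"], ["bread", "butter"],
                    ["milk", "bread"], ["butter"], ["milk", "bread"]]
    partitions = [transactions[0:3], transactions[3:6]]

    with Pool(2) as pool:
        partials = pool.map(count_partition, partitions)

    merged = sum(partials, Counter())      # the merge phase
    print(merged)   # Counter({'milk': 4, 'bread': 4, 'butter': 2})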
Issues in Data Mining: Performance Issues
61. • Handling of relational and complex types of data
• A database may contain complex data objects, multimedia objects, spatial data, temporal data, etc.
• It is not possible for one system to mine all these kinds of data.
• Mining information from heterogeneous databases and global information systems
• The data is available at different data sources on a LAN or WAN.
• These data sources may be structured, semi-structured, or unstructured.
• Mining knowledge from them therefore adds challenges to data mining.
Issues in Data Mining: Issues Relating to the Diversity of Database Types