Hardware-Enhanced Association Rule Mining with Hashing and Pipelining


ABSTRACT

The project titled "Hardware-Enhanced Association Rule Mining with Hashing and Pipelining" is designed using ASP.NET with Microsoft Visual Studio .NET 2005 as the front end, running on .NET Framework version 2.0. The coding language used is C# .NET.

Data mining techniques have been widely used in various applications, and one of the most important of them is association rule mining. To perform Apriori-based association rule mining in hardware, one has to load candidate itemsets and a database into the hardware. Since the capacity of the hardware architecture is fixed, if the number of candidate itemsets or the number of items in the database is larger than the hardware capacity, the items must be loaded in separate passes. The time complexity of the steps that load candidate itemsets or database items into the hardware is proportional to the number of candidate itemsets multiplied by the number of items in the database, so too many candidate itemsets and a large database create a performance bottleneck. In this paper, we propose a HAsh-based and PiPelIned (abbreviated as HAPPI) architecture for hardware-enhanced association rule mining. By trimming infrequent items from the transactions and filtering out unnecessary candidate itemsets with a hash table, HAPPI effectively reduces the frequency of loading the database into the hardware and thereby solves the bottleneck problem of Apriori-based hardware schemes.
1. INTRODUCTION

Data mining technology is now used in a wide variety of fields. Applications include the analysis of customer transaction records, web site logs, credit card purchase information, and call records, to name a few. The results of data mining can provide useful information, such as customer behavior, to business managers and researchers. One of the most important data mining applications is association rule mining; for it, we only need to focus on methods of finding the frequent itemsets in the database. The Apriori approach was the first to address this issue. Apriori finds frequent itemsets by scanning a database to check the frequencies of candidate itemsets, which are generated by merging frequent subitemsets. However, Apriori-based algorithms suffer from bottlenecks because they generate too many candidate itemsets.

Apriori-based hardware schemes require loading the candidate itemsets and the database into the hardware. Since the capacity of the hardware is fixed, if the number of items in the database is larger than the hardware capacity, the data items must be loaded separately, so the process of comparing candidate itemsets with the database needs to be executed several times. Similarly, if the number of candidate itemsets is larger than the capacity of the hardware, the pattern-matching procedure has to be separated into many rounds. Clearly, it is infeasible for any hardware design to load the candidate itemsets and the database into hardware multiple times. Since the time complexity of those steps that load candidate itemsets or database items into the hardware is proportional to the number of candidate itemsets multiplied by the number of items in the database, this procedure is very time-consuming. In addition, numerous candidate itemsets and a huge database may cause a bottleneck in the system.
1.1 PROJECT DESCRIPTION

The project entitled "Hardware-Enhanced Association Rule Mining with Hashing and Pipelining" is developed in .NET using C#. The modules are as follows:

1. Transaction Table
2. Support Count
3. Frequent Itemset
4. Transaction Trimming
5. HAPPI Result

Transaction Table

This table is used during restart recovery to track the state of active transactions. It is initialized during the analysis pass from the most recent checkpoint record and is modified during the analysis of the log records written after the beginning of that checkpoint. It is also used during normal processing.
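Since the transaction table is the central data structure of this module, a hedged C# sketch of one way to represent it follows. All type, field, and method names here (TransactionEntry, TransId, LastLsn, Apply) are illustrative assumptions introduced for the example, not the project's actual schema.

    using System.Collections.Generic;

    // One entry per active transaction; the fields are assumptions.
    class TransactionEntry
    {
        public int TransId;    // identifier of the active transaction
        public string State;   // e.g. in-progress, committed, aborted
        public long LastLsn;   // last log record written by the transaction
    }

    class TransactionTable
    {
        readonly Dictionary<int, TransactionEntry> entries =
            new Dictionary<int, TransactionEntry>();

        // Called while scanning the log records written after the checkpoint:
        // stores the most recent state seen for each transaction.
        public void Apply(int transId, string state, long lsn)
        {
            entries[transId] = new TransactionEntry
                { TransId = transId, State = state, LastLsn = lsn };
        }
    }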
Support Count

The support-counting procedure finds frequent itemsets by comparing candidate itemsets with the transactions in the database (a software sketch appears at the end of this section).

Frequent Itemset

A frequent itemset is an itemset whose support is greater than some user-specified minimum support.

Transaction Trimming

Infrequent items in the transactions can be eliminated through the trimming filter, since they are not useful in generating frequent itemsets.

HAPPI Result

This module compares the existing system and the proposed system graphically.
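To make the support-counting step concrete, here is a minimal C# sketch. It is an illustrative software analogue of what HAPPI performs in hardware, not code from the project; the names SupportCounter, FrequentItemsets, and minSupport are assumptions introduced for the example.

    using System.Collections.Generic;
    using System.Linq;

    static class SupportCounter
    {
        // Counts, for each candidate itemset, how many transactions contain
        // all of its items, and keeps the candidates meeting minimum support.
        public static List<int[]> FrequentItemsets(
            List<int[]> candidates, List<HashSet<int>> transactions, int minSupport)
        {
            var frequent = new List<int[]>();
            foreach (var candidate in candidates)
            {
                int support = transactions.Count(t => candidate.All(t.Contains));
                if (support >= minSupport)
                    frequent.Add(candidate);
            }
            return frequent;
        }
    }

The nested comparison of every candidate against every transaction is exactly the cost that grows with the product of the two sizes, which is why the paper pushes this step into a systolic array.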
2. SYSTEM STUDY

2.1 FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are:

♦ ECONOMIC FEASIBILITY
♦ TECHNICAL FEASIBILITY
♦ SOCIAL FEASIBILITY

ECONOMIC FEASIBILITY

This study is carried out to check the economic impact that the system will have on the organization.
The amount of funds that the company can pour into the research and development of the system is limited, so the expenditures must be justified. The developed system is well within the budget; this was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.

TECHNICAL FEASIBILITY

This study is carried out to check the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements; only minimal or no changes are required for implementing this system.

SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity.
The level of acceptance by the users depends on the methods employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to make constructive criticism, which is welcomed, as he is the final user of the system.

2.2 EXISTING SYSTEM

• Apriori is a classic algorithm for learning association rules, designed to operate on databases containing transactions. Apriori finds frequent itemsets by scanning a database to check the frequencies of candidate itemsets, which are generated by merging frequent subitemsets. Apriori uses a hash tree structure to count candidate itemsets efficiently.
• Apriori-based algorithms suffer from bottlenecks because they generate too many candidate itemsets, so the frequency of loading the database into the hardware cannot be reduced.

2.3 PROPOSED SYSTEM

• We propose a HAsh-based and PiPelIned (abbreviated as HAPPI) architecture for hardware-enhanced association rule mining.
• There are three hardware modules in our system:
1. First, when the database is fed into the hardware, the candidate itemsets are compared with the items in the database by the systolic array.
2. Second, we collect trimming information. From this information, infrequent items in the transactions can be eliminated through the trimming filter, since they are not useful in generating frequent itemsets.
3. Third, we generate itemsets from transactions and hash them into the hash table, which is then used to filter out unnecessary candidate itemsets (a sketch of this filter follows the system specification below).
• Our proposed system solves the bottleneck problem in Apriori-based hardware schemes.

3. SYSTEM SPECIFICATION

Hardware Requirements:

• SYSTEM : Pentium III 700 MHz
• HARD DISK : 40 GB
• RAM : 128 MB

Software Requirements:

• Operating System : Windows XP Professional
• Front End : Microsoft Visual Studio .NET 2005
• Coding Language : Visual C# .NET
• Back End : SQL Server 2000
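To make the hash-table filter of Section 2.3 concrete, here is a hedged C# sketch of hash-based candidate filtering in the spirit of the architecture described above. The bucket count, the hash function, and all names are assumptions introduced for the example; the real filter is a fixed-size hardware table.

    using System.Collections.Generic;

    static class HashFilter
    {
        const int Buckets = 1024; // assumed table size; fixed in hardware

        static int Hash(int a, int b) => ((a * 31 + b) & 0x7fffffff) % Buckets;

        // Pass 1: hash every 2-itemset of every transaction into the table.
        public static int[] BuildTable(List<int[]> transactions)
        {
            var table = new int[Buckets];
            foreach (var t in transactions)
                for (int i = 0; i < t.Length; i++)
                    for (int j = i + 1; j < t.Length; j++)
                        table[Hash(t[i], t[j])]++;
            return table;
        }

        // Pass 2: a candidate 2-itemset can be frequent only if its bucket
        // count reaches the minimum support; other candidates are pruned.
        public static bool MayBeFrequent(int[] table, int a, int b, int minSupport)
            => table[Hash(a, b)] >= minSupport;
    }

Because a candidate can only be frequent if its bucket count reaches the minimum support, many candidates are discarded before they are ever loaded into the systolic array.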
4. LANGUAGE SPECIFICATION

4.1 THE .NET FRAMEWORK

The .NET Framework is a computing platform that simplifies application development in the highly distributed environment of the Internet.

OBJECTIVES OF .NET FRAMEWORK:

1. To provide a consistent object-oriented programming environment whether object code is stored and executed locally, executed locally but Internet-distributed, or executed remotely.
2. To provide a code-execution environment that minimizes software deployment conflicts and guarantees safe execution of code.
3. To eliminate the performance problems of scripted or interpreted environments.

There are different types of applications, such as Windows-based applications and Web-based applications.
4. To build all communication on industry standards, ensuring that code based on the .NET Framework can integrate with any other code.
COMPONENTS OF .NET FRAMEWORK

1. THE COMMON LANGUAGE RUNTIME (CLR):

The common language runtime is the foundation of the .NET Framework. It manages code at execution time, providing important services such as memory management, thread management, and remoting, and it also ensures security and robustness. The concept of code management is a fundamental principle of the runtime. Code that targets the runtime is known as managed code, while code that does not target the runtime is known as unmanaged code.

2. THE .NET FRAMEWORK CLASS LIBRARY:

It is a comprehensive, object-oriented collection of reusable types used to develop applications ranging from traditional command-line or graphical user interface (GUI) applications to applications based on the latest innovations provided by ASP.NET, such as Web Forms and XML Web services. The .NET Framework can be hosted by unmanaged components that load the common language runtime into their processes and initiate the execution of managed code, thereby creating a software environment that can exploit both managed and unmanaged features. The .NET Framework not only provides several runtime hosts but also supports the development of third-party runtime hosts.
Internet Explorer is an example of an unmanaged application that hosts the runtime (in the form of a MIME type extension). Using Internet Explorer to host the runtime enables you to embed managed components or Windows Forms controls in HTML documents.

FEATURES OF THE COMMON LANGUAGE RUNTIME:

The common language runtime manages memory, thread execution, code execution, code safety verification, compilation, and other system services. Its key features are:

Security.
Robustness.
Productivity.
Performance.

SECURITY:

The runtime enforces code access security. The security features of the runtime enable legitimate Internet-deployed software to be exceptionally feature rich. Managed components are awarded varying degrees of trust, depending on a number of factors that include their origin; accordingly, a managed component may or may not be permitted to perform file-access operations, registry-access operations, or other sensitive functions.
ROBUSTNESS:

The runtime also enforces code robustness by implementing a strict type- and code-verification infrastructure called the common type system (CTS). The CTS ensures that all managed code is self-describing. The managed environment of the runtime eliminates many common software issues.

PRODUCTIVITY:

The runtime also accelerates developer productivity. For example, programmers can write applications in their development language of choice, yet take full advantage of the runtime, the class library, and components written in other languages by other developers.

PERFORMANCE:

The runtime is designed to enhance performance.
Although the common language runtime provides many standard runtime services, managed code is never interpreted. A feature called just-in-time (JIT) compiling enables all managed code to run in the native machine language of the system on which it is executing. Finally, the runtime can be hosted by high-performance, server-side applications, such as Microsoft SQL Server and Internet Information Services (IIS).

4.2 FEATURES OF ASP.NET

ASP.NET is the next version of Active Server Pages (ASP); it is a unified Web development platform that provides the services necessary for developers to build enterprise-class Web applications. While ASP.NET is largely syntax compatible with ASP, it also provides a new programming model and infrastructure for more secure, scalable, and stable applications.

ASP.NET is a compiled, .NET-based environment; we can author applications in any .NET-compatible language, including Visual Basic .NET, C#, and JScript .NET. Additionally, the entire .NET Framework is available to any ASP.NET application. Developers can easily access the benefits of these technologies, which include the managed common language runtime environment, type safety, inheritance, and so on. ASP.NET has been designed to work seamlessly with WYSIWYG HTML editors and other programming tools, including Microsoft Visual Studio .NET. Not only does this make Web development easier, but it also provides all the benefits that these tools have to offer, including a GUI that developers can use to drop server controls onto a Web page, and fully integrated debugging support.
Developers can choose from two features when creating an ASP.NET application, Web Forms and Web services, or combine these in any way they see fit. Each is supported by the same infrastructure that allows you to use authentication schemes, cache frequently used data, or customize your application's configuration, to name only a few possibilities.

Web Forms allows us to build powerful forms-based Web pages. When building these pages, we can use ASP.NET server controls to create common UI elements and program them for common tasks. These controls allow us to rapidly build a Web Form out of reusable built-in or custom components, simplifying the code of a page.

An XML Web service provides the means to access server functionality remotely. Using Web services, businesses can expose programmatic interfaces to their data or business logic, which in turn can be obtained and manipulated by client and server applications. XML Web services enable the exchange of data in client-server or server-server scenarios, using standards like HTTP and XML messaging to move data across firewalls. XML Web services are not tied to a particular component technology or object-calling convention. As a result, programs written in any language, using any component model, and running on any operating system can access XML Web services.

Each of these models can take full advantage of all ASP.NET features, as well as the power of the .NET Framework and the common language runtime. Accessing databases from ASP.NET applications is an often-used technique for displaying data to Web site visitors.
ASP.NET makes it easier than ever to access databases for this purpose. It also allows us to manage the database from our code.

ASP.NET provides a simple model that enables Web developers to write logic that runs at the application level. Developers can write this code in the Global.asax text file or in a compiled class deployed as an assembly. This logic can include application-level events, but developers can easily extend this model to suit the needs of their Web application. ASP.NET provides easy-to-use application- and session-state facilities that are familiar to ASP developers and are readily compatible with all other .NET Framework APIs.

ASP.NET offers the IHttpHandler and IHttpModule interfaces. Implementing the IHttpHandler interface gives you a means of interacting with the low-level request and response services of the IIS Web server and provides functionality much like ISAPI extensions, but with a simpler programming model. Implementing the IHttpModule interface allows you to include custom events that participate in every request made to your application.

ASP.NET takes advantage of performance enhancements found in the .NET Framework and common language runtime. Additionally, it has been designed to offer significant performance improvements over ASP and other Web development platforms. All ASP.NET code is compiled, rather than interpreted, which allows early binding, strong typing, and just-in-time (JIT) compilation to native code, to name only a few of its benefits. ASP.NET is also easily factorable, meaning that developers can remove modules (a session module, for instance) that are not relevant to the application they are developing.
ASP.NET provides extensive caching services (both built-in services and caching APIs). ASP.NET also ships with performance counters that developers and system administrators can monitor to test new applications and gather metrics on existing applications.

Writing custom debug statements to your Web page can help immensely in troubleshooting your application's code. However, it can cause embarrassment if not removed, and removing the debug statements from your pages when your application is ready to be ported to a production server can require significant effort. ASP.NET offers the TraceContext class, which allows us to write custom debug statements to our pages as we develop them. They appear only when you have enabled tracing for a page or an entire application. Enabling tracing also appends details about a request to the page or, if you so specify, to a custom trace viewer that is stored in the root directory of your application.

The .NET Framework and ASP.NET provide default authorization and authentication schemes for Web applications. We can easily remove, add to, or replace these schemes, depending upon the needs of our application. ASP.NET configuration settings are stored in XML-based files, which are human readable and writable. Each of our applications can have a distinct configuration file, and we can extend the configuration scheme to suit our requirements.
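As an illustration of the tracing facility just described, the following hedged sketch assumes a Web Forms page named TraceDemo with tracing enabled via Trace="true" in its @ Page directive; the page name and messages are assumptions made for the example.

    using System;
    using System.Web.UI;

    public partial class TraceDemo : Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            // These messages are emitted only while tracing is enabled for
            // the page or application, so they need not be stripped before
            // moving to a production server.
            Trace.Write("Lifecycle", "Page_Load entered");
            if (Trace.IsEnabled)
                Trace.Warn("Diagnostics", "IsPostBack = " + IsPostBack);
        }
    }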
DATA ACCESS WITH ADO.NET

As you develop applications using ADO.NET, you will have different requirements for working with data. You might never need to directly edit an XML file containing data, but it is very useful to understand the data architecture in ADO.NET. ADO.NET offers several advantages over previous versions of ADO:

Interoperability
Maintainability
Programmability
Performance
Scalability

INTEROPERABILITY:

ADO.NET applications can take advantage of the flexibility and broad acceptance of XML. Because XML is the format for transmitting datasets across the network, any component that can read the XML format can process data. The receiving component need not be an ADO.NET component.
The transmitting component can simply transmit the dataset to its destination without regard to how the receiving component is implemented. The destination component might be a Visual Studio application or any other application implemented with any tool whatsoever; the only requirement is that the receiving component be able to read XML. XML was designed with exactly this kind of interoperability in mind.

MAINTAINABILITY:

In the life of a deployed system, modest changes are possible, but substantial architectural changes are rarely attempted because they are so difficult. As the performance load on a deployed application server grows, system resources can become scarce and response time or throughput can suffer. Faced with this problem, software architects can choose to divide the server's business-logic processing and user-interface processing onto separate tiers on separate machines. In effect, the application server tier is replaced with two tiers, alleviating the shortage of system resources. If the original application is implemented in ADO.NET using datasets, this transformation is made easier.
PROGRAMMABILITY:

ADO.NET data components in Visual Studio encapsulate data-access functionality in various ways that help you program more quickly and with fewer mistakes.

PERFORMANCE:

ADO.NET datasets offer performance advantages over ADO disconnected recordsets. In ADO.NET, data-type conversion is not necessary.

SCALABILITY:

ADO.NET accommodates scalability by encouraging programmers to conserve limited resources. An ADO.NET application employs disconnected access to data; it does not retain database locks or active database connections for long durations. A sketch of this disconnected pattern appears below.
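The following is a minimal sketch of the disconnected-access pattern just described; the connection string and the transactions table name are illustrative assumptions, not part of the project.

    using System.Data;
    using System.Data.SqlClient;

    static class DisconnectedAccess
    {
        public static DataSet LoadTransactions(string connectionString)
        {
            var dataSet = new DataSet();
            using (var adapter = new SqlDataAdapter(
                "select * from transactions", connectionString))
            {
                // Fill opens the connection, copies the rows into the
                // DataSet, and closes the connection again, so no lock or
                // active connection is held afterwards.
                adapter.Fill(dataSet, "transactions");
            }
            return dataSet;
        }
    }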
VISUAL STUDIO .NET

Visual Studio .NET is a complete set of development tools for building ASP Web applications, XML Web services, desktop applications, and mobile applications. In addition to building high-performing desktop applications, you can use Visual Studio's powerful component-based development tools and other technologies to simplify team-based design, development, and deployment of enterprise solutions. Visual Basic .NET, Visual C++ .NET, and Visual C# .NET all use the same integrated development environment (IDE), which allows them to share tools and facilitates the creation of mixed-language solutions. In addition, these languages leverage the functionality of the .NET Framework and simplify the development of ASP Web applications and XML Web services. Visual Studio supports the .NET Framework, which provides a common language runtime and unified programming classes; ASP.NET uses these components to create ASP Web applications and XML Web services. Visual Studio also includes the MSDN Library, which contains all the documentation for these development tools.

4.3 FEATURES OF SQL SERVER 2000

The OLAP Services feature available in SQL Server version 7.0 is now called SQL Server 2000 Analysis Services. The term OLAP Services has been replaced with the term Analysis Services. Analysis Services also includes a new data mining component. The Repository component available in SQL Server version 7.0 is now called Microsoft SQL Server 2000 Meta Data Services.
References to the component now use the term Meta Data Services; the term repository is used only in reference to the repository engine within Meta Data Services.

A database consists of six types of objects:

1. TABLE
2. QUERY
3. FORM
4. REPORT
5. MACRO
6. MODULE

TABLE:

A table is a collection of data about a specific topic.

VIEWS OF TABLE:

We can work with a table in two views:

1. Design View
2. Datasheet View
Design View

To build or modify the structure of a table, we work in the table's Design View, where we can specify what kind of data each field will hold.

Datasheet View

To add, edit, or analyze the data itself, we work in the table's Datasheet View.

QUERY:

A query is a question asked of the data. Access gathers the data that answers the question from one or more tables. The data that makes up the answer is either a dynaset (if you can edit it) or a snapshot (which cannot be edited). Each time we run a query, we get the latest information in the dynaset. Access either displays the dynaset or snapshot for us to view, or performs an action on it, such as deleting or updating.

FORMS:

A form is used to view and edit information in the database record by record. A form displays only the information we want to see, in the way we want to see it. Forms use familiar controls such as textboxes and checkboxes, which makes viewing and entering data easy.
VIEWS OF FORM:

We can work with forms in several views; primarily there are two:

1. Design View
2. Form View

Design View

To build or modify the structure of a form, we work in the form's Design View. We can add controls to the form that are bound to fields in a table or query, including textboxes, option buttons, graphs, and pictures.

Form View

The Form View displays the whole design of the form.

REPORT:

A report is used to view and print information from the database. A report can group records into many levels and compute totals and averages by checking values from many records at once.
A report is also attractive and distinctive because we have control over its size and appearance.

MACRO:

A macro is a set of actions, each of which does something, such as opening a form or printing a report. We write macros to automate common tasks, to make the work easy and save time.

MODULE:

Modules are units of code written in the Access Basic language. We can write and use modules to automate and customize the database in very sophisticated ways.

Access is a personal-computer-based RDBMS that provides most of the features available in high-end RDBMS products like Oracle, Sybase, and Ingres. VB keeps Access as its native database. A developer can create a database for development, along with the tables required to store data. During the initial development phase, data can be stored in the Access database; during the implementation phase, depending on the volume of data, a higher-end database can be used.
5. SYSTEM DESIGN

The design is documented by the following diagrams, reproduced here by title only. They cover the Admin, Vendor, and Customer actors and the Transaction Table, Support Count, Frequent Itemset, Transaction Trimming, and HAPPI Result modules.

Module Diagram
Use Case Diagram
Class Diagram
State Chart Diagram
Activity Diagram
Collaboration Diagram
Sequence Diagram
Component Diagram
E-R Diagram

6. SYSTEM TESTING AND MAINTENANCE
Testing is vital to the success of the system. System testing makes a logical assumption that if all parts of the system are correct, the goal will be successfully achieved. System testing is the stage of implementation aimed at ensuring that the system works accurately and efficiently. In the testing process, we test the actual system in an organization, gather errors from the new system, and take initiatives to correct them. All the front-end and back-end connectivity is tested to be sure that the new system operates at full efficiency as stated.

The main objective of testing is to uncover errors in the system. For the uncovering process, we have to give proper input data to the system, so we must be conscientious about the inputs; correct inputs are important for efficient testing. Testing is done for each module. After testing all the modules, the modules are integrated and the final system is tested with test data specially designed to show that the system will operate successfully under all conditions. Thus the system testing is a confirmation that all is correct and an opportunity to show the user that the system works.

Inadequate testing or non-testing leads to errors that may appear months later. This creates two problems: the time delay between the cause and the appearance of the problem, and the effect of system errors on files and records within the system. The purpose of system testing is to consider all the likely variations to which the system will be subjected and to push the system to its limits. The testing process focuses on the logical internals of the software, ensuring that all statements have been tested, and on the functional externals, i.e., conducting tests to uncover errors and ensure that defined inputs will produce actual results that agree with the required results. Testing is done using two common steps, unit testing and integration testing. In this project, procedure-level testing is done first: by giving improper inputs, the errors that occur are noted and eliminated.

Maintenance is the final step in the system life cycle. Here we implement the tested, error-free system in a real-life environment and make necessary changes so that it runs in an online fashion. System maintenance is done every month or year based on company policies; the system is checked for errors such as runtime errors and long-run errors, and other maintenance tasks such as table verification and reports are carried out.

6.1 UNIT TESTING

Unit testing focuses verification efforts on the smallest unit of software design, the module; this is known as "module testing". The modules are tested separately. This testing is carried out during the programming stage itself. In these testing steps, each module is found to be working satisfactorily as regards the expected output from the module.

6.2 INTEGRATION TESTING

Integration testing is a systematic technique for constructing tests to uncover errors associated with the interfaces. In this project, all the modules are combined and then the entire program is tested as a whole. In the integration-testing step, all the errors uncovered are corrected before the next testing steps.

7. SYSTEM IMPLEMENTATION
Implementation is the stage of the project when the theoretical design is turned into a working system. It can thus be considered the most critical stage in achieving a successful new system and in giving the user confidence that the new system will work and be effective. The implementation stage involves careful planning, investigation of the existing system and its constraints on implementation, design of methods to achieve changeover, and evaluation of changeover methods. Implementation is the process of converting a new system design into operation. It is the phase that focuses on user training, site preparation, and file conversion for installing a candidate system. An important factor to be considered here is that the conversion should not disrupt the functioning of the organization.

7.1 SCOPE FOR FUTURE ENHANCEMENTS

Every application has its own merits and demerits. This project has covered almost all the requirements. Further requirements and improvements can easily be accommodated, since the coding is mainly structured and modular in nature: changing the existing modules or adding new modules can provide improvements. Further enhancements can be made to the application so that the web site functions in a more attractive and useful manner than at present. In our future work, we are going to increase the clock frequency of the hardware architecture and try to optimize the bottleneck module.
8. CONCLUSION

We have proposed the HAPPI architecture for hardware-enhanced association rule mining. The bottleneck of Apriori-based hardware schemes is related to the number of candidate itemsets and the size of the database. To solve this problem, we apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and to collect useful information that reduces the number of candidate itemsets and the number of items in the database simultaneously. HAPPI prunes infrequent items from the transactions and reduces the size of the database gradually by utilizing the trimming filter. In addition, HAPPI effectively eliminates infrequent candidate itemsets with the help of the hash table filter. Therefore, the bottleneck of Apriori-based hardware schemes is solved by the HAPPI architecture. Moreover, we derive some properties to analyze the performance of HAPPI, and we conduct a sensitivity analysis of various parameters to provide insights into the architecture. HAPPI outperforms the previous approach, especially as the number of distinct items in the database and the minimum support values increase. Also, HAPPI increases computing power and saves the cost of data
mining in hardware design as compared to the previous approach. Furthermore, HAPPI possesses good scalability.

9. REFERENCES

Literature Survey:

We discuss two previous works that use a systolic array architecture to enhance the performance of data mining. The Systolic Process Array (SPA) architecture has been proposed to perform k-means clustering. SPA accelerates the processing speed by utilizing several hardware cells to calculate the distances in parallel. Each cell corresponds to a cluster and stores the centroid of the cluster in local memory. The data flows linked by each cell include the data object, the minimum distance between the object and its closest centroid, and the closest centroid of the object. The cell computes the distance between the centroid and the input data object. Based on the resulting distance, the cell updates the minimum distance and the closest centroid of the data object. Therefore, the system can obtain the closest centroid of each object from SPA. The centroids are recomputed and updated by the system, and the new centroids are sent to the cells. The system continuously updates the clustering results.

In the second work, the authors implemented a systolic array with several hardware cells to speed up the Apriori algorithm. Each cell performs an ALU operation (larger than, smaller than, or equal to), which compares the incoming item with the items in the memory of the cell. This operation generates frequent itemsets by comparing candidate itemsets with
the items in the database. Since all the cells can execute their own operations simultaneously, the performance of the architecture is better than that of a single processor. However, the number of cells in the systolic array is fixed. If the number of candidate itemsets is larger than the number of hardware cells, the pattern-matching procedure has to be separated into many rounds. It is infeasible to load candidate itemsets and the database into the hardware multiple times. As reported, the performance is only about four times faster than some software algorithms, so there is much room to improve the execution time.

Further research in hardware implementations of related data mining algorithms has also been done. The k-means clustering algorithm has been implemented as an example of a special reconfigurable fabric in the form of a cellular array connected to a host processor. K-means clustering is a data mining strategy that groups together elements based on a distance measure. The distance can be an actual measure of Euclidean distance or can be mapped from some other data type. Each item in a set is randomly assigned to a cluster, and the centers of the clusters are computed; the elements are then iteratively added to and removed from clusters to move them closer to the cluster centers. This is related to the Apriori algorithm in that both depend on efficient set additions and computations performed on all elements of those sets; however, k-means adds the distance computation and significantly changes how the sets are built up.

Another system attempts to mediate the high cost of data transfers for large data sets. Common databases can easily extend beyond the capacity of physical memory, so slow tertiary storage, e.g., hard drives, is brought into the datapath. That work proposes the integration of a simple computational structure for data mining onto the
hard drive controller itself. The data mining proposed there is not Apriori, but rather the problem of exact and inexact string matching, a much more computationally regular problem than the Apriori algorithm. However, the work is useful, and will become more so as FPGA performance scales up and significantly exceeds the data supply capabilities of hierarchical memory systems.

We base our comparisons of hardware performance on various state-of-the-art software implementations, as we are unaware of any comparable hardware implementation of the Apriori algorithm. Extensive research exists on parallelizing correlation algorithms, but we focus on single-processor machine performance. The array architecture provides a performance improvement of orders of magnitude over the state-of-the-art software implementations. The system is easily scalable and introduces an efficient "systolic injection" method for intelligently reporting unpredictably generated mid-array results to a controller without any chance of collision or excessive stalling. However, the support operation still requires two orders of magnitude more time than the other stages; we focus on addressing the support problem in this paper. The architecture we develop in this paper is entirely distinct from these prior works.

Techniques: Apriori Algorithm

Apriori is a classic algorithm for learning association rules. Apriori is designed to operate on databases containing transactions, for example, collections of items bought by customers.
The goal is to compare itemsets with the database and collect useful information for reducing the number of candidate itemsets and the number of items in the database simultaneously. When the database is fed into the hardware, candidate itemsets are compared with the items in the database to find frequent itemsets. A sketch of the candidate-generation step follows the lists below.

Advantages:
1. Removes infrequent itemsets from the database.
2. Avoids performance degradation.

Applications:
1. Customer transaction records
2. Web site logs
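Here is a minimal C# sketch of Apriori candidate generation, merging frequent k-itemsets that share their first k-1 items into (k+1)-candidates. The class and method names are illustrative assumptions, and itemsets are assumed to be kept sorted.

    using System.Collections.Generic;
    using System.Linq;

    static class Apriori
    {
        public static List<int[]> GenerateCandidates(List<int[]> frequentK)
        {
            var candidates = new List<int[]>();
            for (int i = 0; i < frequentK.Count; i++)
                for (int j = i + 1; j < frequentK.Count; j++)
                {
                    int k = frequentK[i].Length;
                    // Merge only when the first k-1 items agree.
                    if (frequentK[i].Take(k - 1).SequenceEqual(frequentK[j].Take(k - 1)))
                    {
                        var merged = frequentK[i]
                            .Concat(new[] { frequentK[j][k - 1] })
                            .OrderBy(x => x).ToArray();
                        candidates.Add(merged);
                    }
                }
            return candidates;
        }
    }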
BIBLIOGRAPHY

1. Z.K. Baker and V.K. Prasanna, "Efficient Hardware Data Mining with the Apriori Algorithm on FPGAs," Proc. 13th Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM '05), 2005.
2. Z.K. Baker and V.K. Prasanna, "An Architecture for Efficient Hardware Data Mining Using Reconfigurable Computing Systems," Proc. 14th Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM '06), pp. 67-75, Apr. 2006.
3. C. Besemann and A. Denton, "Integration of Profile Hidden Markov Model Output into Association Rule Mining," Proc. 11th ACM SIGKDD Int'l Conf. Knowledge Discovery in Data Mining (KDD '05), pp. 538-543, 2005.
4. C.W. Chen, J. Luo, and K.J. Parker, "Image Segmentation via Adaptive K-Mean Clustering and Knowledge-Based Morphological Operations with Biomedical Applications," IEEE Trans. Image Processing, vol. 7, no. 12, pp. 1673-1683, 1998.
5. S.M. Chung and C. Luo, "Parallel Mining of Maximal Frequent Itemsets from Databases," Proc. 15th IEEE Int'l Conf. Tools with Artificial Intelligence (ICTAI), 2003.

APPENDIX: SOURCE CODE
using System;
using System.Data.SqlClient;
using System.Web.UI;

public partial class _Default : System.Web.UI.Page
{
    // Connection string as hard-coded in the project.
    const string ConnectionString = "server=.;database=assciation;uid=sa";

    protected void Page_Load(object sender, EventArgs e)
    {
        string userType = Request.QueryString["var"];
        if (userType == "admin")
        {
            // New-user and forgot-password links apply only to
            // customers and vendors, not to the admin.
            lnknewuser.Enabled = false;
            lnkpwd.Enabled = false;
        }
        if (userType == "admin" || userType == "customer" || userType == "vendor")
            Session["s"] = userType;
    }

    protected void Button1_Click(object sender, EventArgs e)
    {
        string userType = Request.QueryString["var"];
        if (userType != "admin" && userType != "customer" && userType != "vendor")
            return;

        // Admin credentials live in their own table with a differently
        // named password column; customers and vendors share the users table.
        string query = userType == "admin"
            ? "select uname, pwd from admin where uname=@u and pwd=@p and usertype=@t"
            : "select uname, pword from users where uname=@u and pword=@p and usertype=@t";

        using (var conn = new SqlConnection(ConnectionString))
        using (var cmd = new SqlCommand(query, conn))
        {
            // Parameterized query: user input never reaches the SQL text directly.
            cmd.Parameters.AddWithValue("@u", txtus.Text);
            cmd.Parameters.AddWithValue("@p", txtpass.Text);
            cmd.Parameters.AddWithValue("@t", userType);
            conn.Open();
            using (SqlDataReader dr = cmd.ExecuteReader())
            {
                // On a successful match, forward to the role's home page.
                if (dr.Read())
                    Response.Redirect(userType + ".aspx?var=" + txtus.Text);
            }
        }
    }
}

using System;
using System.Data.SqlClient;
using System.Web.UI;

public partial class Newuser : System.Web.UI.Page
{
    const string ConnectionString = "server=.;database=assciation;uid=sa";

    public void Page_Load(object sender, EventArgs e)
    {
        Label12.Text = Session["s"].ToString();
    }

    protected void ImageButton1_Click(object sender, ImageClickEventArgs e)
    {
        // Back to the login page for the current user type.
        if (Label12.Text == "customer")
            Response.Redirect("Login.aspx?var=customer");
        if (Label12.Text == "vendor")
            Response.Redirect("Login.aspx?var=vendor");
    }

    protected void btncreate_Click(object sender, EventArgs e)
    {
        // Inserts the new user profile; parameter order matches the
        // column order of the users table.
        const string query =
            "insert into users values (@fname, @lname, @uname, @pwd, @mail, " +
            "@phone, @address, @secque, @secans, @usertype)";
        using (var conn = new SqlConnection(ConnectionString))
        using (var cmd = new SqlCommand(query, conn))
        {
            cmd.Parameters.AddWithValue("@fname", txtfname.Text);
            cmd.Parameters.AddWithValue("@lname", txtlname.Text);
            cmd.Parameters.AddWithValue("@uname", txtusername.Text);
            cmd.Parameters.AddWithValue("@pwd", txtpwd.Text);
            cmd.Parameters.AddWithValue("@mail", txtmail.Text);
            cmd.Parameters.AddWithValue("@phone", txtphone.Text);
            cmd.Parameters.AddWithValue("@address", txtaddress.Text);
            cmd.Parameters.AddWithValue("@secque", txtsecque.Text);
            cmd.Parameters.AddWithValue("@secans", txtsecans.Text);
            cmd.Parameters.AddWithValue("@usertype", Label12.Text);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}

using System;
using System.Data.SqlClient;
using System.Web.UI;

public partial class forgotpwd : System.Web.UI.Page
{
    const string ConnectionString = "server=.;database=assciation;uid=sa";

    protected void Page_Load(object sender, EventArgs e)
    {
        Label9.Text = Session["s"].ToString();
    }

    protected void ImageButton1_Click(object sender, ImageClickEventArgs e)
    {
        if (Label9.Text == "customer")
            Response.Redirect("Login.aspx?var=customer");
        if (Label9.Text == "vendor")
            Response.Redirect("Login.aspx?var=vendor");
    }

    protected void btnretrieve_Click(object sender, EventArgs e)
    {
        // Looks up the stored credentials from the security question,
        // its answer, and the registered mail id.
        const string query =
            "select uname, pword from users where secque=@secque and " +
            "secans=@secans and mailid=@mail and usertype=@t";
        using (var conn = new SqlConnection(ConnectionString))
        using (var cmd = new SqlCommand(query, conn))
        {
            cmd.Parameters.AddWithValue("@secque", txtsecque.Text);
            cmd.Parameters.AddWithValue("@secans", txtsecans.Text);
            cmd.Parameters.AddWithValue("@mail", txtmail.Text);
            cmd.Parameters.AddWithValue("@t", Label9.Text);
            conn.Open();
            using (SqlDataReader dr = cmd.ExecuteReader())
            {
                while (dr.Read())
                {
                    retuname.Text = dr[0].ToString();
                    retpwd.Text = dr[1].ToString();
                }
            }
        }
    }
}
