Final report

815 views

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
815
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
25
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Final report

  1. 1. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project WEB MINING PROJECT OF 531A SUMMERY REPORT Team member: Yi-Jen Ho Yi-Ling Lin Howard R. Zhao Ying Ding Instructor: Prof. Hsinchun Chen 12/16/2005 1
  2. 2. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project INDEX 1. Introduction................................................................................3 2. Research Questions....................................................................4 2.1 Friendly User Interface..........................................................................................4 2.2 Web-service APIs...................................................................................................5 2.3 Database and Data Mining Algorithm...................................................................5 3. Literature Review......................................................................6 4. Research Design.........................................................................7 4.1 Functionality..........................................................................................................7 4.2 Architecture............................................................................................................8 5. Findings.......................................................................................8 5.1 Database Schema / Data Mining Algorithm..........................................................9 5.1.1 Database Design..........................................................................................9 5.1.2 Schema of Data Tables..............................................................................10 5.1.3 Data Mining Algorithm.............................................................................11 5.2 APIs......................................................................................................................12 5.3 User Interface.......................................................................................................13 5.4 Sample Scenarios.................................................................................................13 6. System Novelty.........................................................................21 6.1 Data Mining Algorithm........................................................................................22 6.2 APIs......................................................................................................................22 6.3 User Interface.......................................................................................................22 6.4 Compare with others............................................................................................23 7. Conclusion / Future Directions...............................................23 Reference......................................................................................26 2
  3. 3. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project 1. Introduction Standing on the Giant’s shoulder, we can always look further and more widely. Our team, encouraged by the promising access huge, open data warehouses of Amazon, eBay, and Google, , came to a creative e-business idea: provide a platform for users to gather specifically useful information for their special products to sell or buy. In order to make our dream come true, we did our all efforts to establish a powerful website to fulfill the potential requirements of our target users and named it “Wishsky”. We expect our product, Wishsky, could provide complete production information, such as retail prices on Amazon, auction prices and seller details on eBay, and hot news of certain items. In the future, Wishsky will become an all purpose community not only to serve general Internet users but also to provide bring manufacturers and other businesses important information. According to above brainchild, we consider following five perspectives to construct the business model of Wishsky: target customers, services provided, potential business partners, information resources and financials. Our target customers are users who want to compare prices before buying, who prefer buying products online, and who prefer applications which emphasis community interaction. Moreover, we provide services including product information gathered from eBay and Amazon, Google searches related to customers’ wish items and the consuming trend information. As to potential business partners, we expect to collaborate with e-market places such as Yahoo! Auction, and online retailers as well as Walmart.com. In addition, we will get information resources from e-Bay, Amazon, Google, and our business partners. The financial opportunities in the platform would be positioned on different business values. Basically, in the short term, we may attract some advertisements and our statistics information to sustain the operation overhead of Wishsky. Besides, we are going to collect information from the “Wish-Lists” on Amazon and 3
  4. 4. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project from our own users. After making some statistical analysis, we will provide precise potential market information for sellers, like the catalog of the most popular products, the acceptable price range, the geographical distribution of buyers, etc. Furthermore, we will serve the buyers by mining meaningful information for them to get the products with the cheapest price and best services. Based on mentioned business model, Wishsky is implemented by accessing API (Application Programming Interface) of Amazon, eBay, and Google, by presenting a user friendly environment, by constructing complete supporting database, and by adopting efficient data mining algorithm. It is a critical success factor for Wishsky to combine above components closely and effectively so we have to overcome many research questions which are detailed as following. 2. Research Questions After overall business model of Wishsky is initiated, some important research challenges should be overcome. These challenges could be categorized in three topics, friendly user interface, web-service APIs, and data mining algorithm. 2.1 Friendly User Interface Because Wishsky is design for supplicated Internet users as well as aged and young users, it is very important for Wishsky to build up a friendly environment to serve multi-aged users. Therefore, there are three major issues of user interface. First, the operation processes of Wishsky should be very easy to use, and there should be some clear instruction to guide users with diverse information literacy to perform powerful functionality of Wishsky. Second, interesting and lovely interface is 4
  5. 5. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project critical for Wishsky to attractive more traffic. Last, well-design interface should combine and present all functionality in Wishsky. Therefore, in Wishsky, lots of easily understandable pictures and figures will be used to teach first time users how to operate functions so users do not spend much time to do “try-and-error”. Meanwhile, lively interface will promote the reputation of Wishsky significantly. 2.2 Web-service APIs According to initial plans, there are three APIs which are implemented in Wishsky, and these APIs come from Amazon, Google, and eBay. Basically, APIs of each web-service provider need to be tested because there could be some conflictions between APIs and system architecture. Moreover, the combination of different APIs provided by different web-service providers should be paid more attention. For instance, how to integrate different data formats of heterogeneous APIs is critical technical issue to solve. In order to achieve the planned functions, some Amazon APIs, such as customers’ wish-list retrieval and item search, are implemented in Wishsky. Furthermore, available auction search of eBay API is used to provide users valuable auction details. As to Google’s APIs, new search is important to be adopted to search hot news related to products. 2.3 Database and Data Mining Algorithm In order to provide effective recommendation mechanism of related product for users, it is important to construct correct database schema and to apply proper and efficient data mining algorithm. Fortunately, there are tons of well-designed DBMS and data mining related applications/program packages implemented easily by doing some revisions. As a result, significant are how to choose proper DMBS, how to design effective database 5
  6. 6. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project schema, how to decide which type of data mining algorithms form 4 major types, predictive modeling, database segmentation, link analysis, and deviation detection, and how to choose proper algorithm package to implement into Wishsky. In addition, time-complexity of algorithm is another major issue for data mining implementation. Imagine that it is time-consuming to execute recommendation algorithm if the volume of data is extremely high so adopting real-time process or back-end process significant affects overall performance of Wishsky. 3. Literature Review How to expand the market is always one of the most concerned issues in business. The initiative of procuring customers’ wishes has been proved to be one of the most efficient ways to understand what the market needs, and this has been widely adopted in business. Based on the customers’ wish-lists, business distributes to them their product information via various channels, like email, personalized dynamic web content. This strategy, on one hand, provides business a useful channel for advertising; on the other hand, customers get informed about those products they are interested in. Tons of businesses have integrated this method in their marketing strategies, like www. Half.com. However, many businesses are not satisfied with this simple application of information from customers’ wish-lists, they move further to dig out more value in it. Various recommendation algorithms, such as association rules and clustering algorithm, are designed to figure out the items customers may possibly be interested in according to their shopping history or the items which already exist in their wish-lists so business may develop a wider market. The logistics behind recommendation rules are different from one another, which may come from market analysis, or studies in social science or psychology, yet they are all created to expand the sellers’ market. The typical representative and also one of the pioneers to deploy this strategy is Amazon. No matter how innovative techniques the aforementioned businesses apply to make full use of customers’ wish-lists, they share one thing in common - they are trying to sell what they want to sell. However, our project set out to provide a quite different kind of 6
  7. 7. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project service to customers who create wish-lists in our website, where they can really enjoy the pleasure of being served like God. 4. Research Design 4.1 Functionality The main function of the Wishsky application is to allow users to add and manage Amazon items in a feature-rich wish-list based setting. There are two ways users can add items to their wish-list. The first way is to search for a wishlist on Amazon using the email address. The second way is to search for items by item type (i.e. Book, DVD, etc) and keywords. In both ways, a list of items will be displayed with thumbnail image (if available), title, and listing price. There’s also a link to the item on the Amazon webpage, so users can see more details about the item. To add an item to the Wishsky wish-list, select the checkbox next to it and click the “Add Items to this Wishlist” button. A user can also delete items from his or her wish-list by clicking the “Delete” button next to the item. The next major function of the application is the integration with EBay and Google. Once a user is done managing the wish-list, the next webpage shows the items on your wish-list. Above it is a list of news items from Google related to a wish-list item. Clicking in a wish-list item displays a popup with auction items searched from EBay. The bottom part of the page displays recommended items. Recommendations are figured out three ways: by Association rule, popular items from a user’s friends list, and most popular items from all users. 7
  8. 8. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project 4.2 Architecture The above diagram displays the basic architecture of the Wishsky application. Wishsky is a Java application, and uses a variety of java libraries to interact with its various components. Users access the application through a web browser (i.e. Firefox, Internet Explorer), and the web browser uses HTTP requests to interface with our application, which is hosted in a Tomcat Web container. Using JDBC, Wishsky interfaces with a SQL Server database. The database is used for saving customer information, and wish-list items. It is also used for data mining. Finally, Wishsky also connects to EBay, Google and Amazon via Web Services. Using SOAP calls, the application is able to dynamically request and receive information from each site. 5. Findings After doing our all efforts to construct basic architecture of Wishsky and link all 8
  9. 9. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project important components together, we are pleasant to detail what we found during the process of implementing Wishsky. 5.1 Database Schema / Data Mining Algorithm In order to achieve effective and efficient operations within functionality of Wishsky, there is a need to establish a supporting database to record users’ data or to store product related information which retrieved from Amazon, Google, or eBay. Moreover, based on the consideration of efficiency, the back-end recommendation of Wishsky implemented by association rules also needs database to record rules which are generated by Apriori algorithm. Hence, there are two groups of data tables which are detailed as following. 5.1.1 Database Design The first group of data tables is used to store users related data, such as their basic profiles, their wish-list information, items in users’ wish-lists, and theirs friend lists. The second group of data tables is used to record association rules which are generated by back-end data mining algorithm everyday. As following figure shows, the relationships between data tables present how Wishsky record users’ related information and data mining results. Friend Association Customer Wishlist Item 9
  10. 10. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project 5.1.2 Schema of Data Tables Table Name: Customer PK C_id int 4 C_account char 20 C_pwd char 20 C_amazon_email nvarchar 50 C_amazon_listid char 10 Null C_ebay_id nvarchar 50 Null C_ebay_pwd nvarchar 20 Null C_first_name nvarchar 20 C_last_name nvarchar 20 C_gender char 1 C_marry char 1 C_birth_year int 4 C_birth_month int 4 C_birth_day int 4 C_profession nvarchar 15 C_income char 6 C_state char 2 C_city nvarchar 15 Table Name: Wishlist PK W_id int 4 FK C_id int 4 W_name char 20 Table Name: Item PK I_id bigint 8 FK W_id int 4 I_amazon_id nvarchar 50 I_name nvarchar 50 Null I_quantity int 4 Null I_priority int 4 I_lowestprice decimal 9 10
  11. 11. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project I_set_lowprice decimal 9 Null I_set_highprice decimal 9 Null Table Name: Friend PK F_no bigint 8 C_Account_one char 20 FK C_Account_two_id int 4 C_Account_two char 20 FK C_Account_two_email nvarchar 50 C_Account_two_W_id int 4 F_Nickname nvarchar 100 Null Table Name: Association PK A_no bigint 8 0 FK I_one char 20 0 FK I_two char 20 0 A_confidential decimal 9 Null 5.1.3 Data Mining Algorithm Wishsky is very friendly because it provides recommended production information, including product picture, title, and retailer price on Amazon, to users so they can notice what they need but forget. In order to recommend proper products for users, there are three recommendation mechanisms in Wishsky, and they are “Associated Items”, “Friend’s Wish-Items”, and “Hot Product”. First, “Associated Items” is implemented by overwriting Apriori algorithm in WEKA software package. Like what is mentioned before, we decide to do data mining process with back-end method because of efficiency consideration, and the back-end method was shown in following picture. Apriori is commonly used to figure out association between items within different transaction records so we adapted it to do first type of recommendation. 11
  12. 12. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project Second, we implement “Friend’s Wish-Items” by calculating hot products between your friends’ wish-lists because we assume that birds of a feather flock together. As a result, Wishsky recommends what your friends like to you. Last one is hot product recommendation. By intuition, the most popular items are recommended to users. 5.2 APIs When these three major Internet companies (eBay, Google, and Amazon) announced that they would provide access to their wealth of data free of charge, the Internet community applauded them. The way they each decided to allow access to their data is in some ways very similar, but also different. Fortunately, they all provided a SOAP interface using Java, and sometimes, they provide the java libraries themselves. This made it easier to integrate each web service with the Wishsky application. Another similarity is that we were given unique ID strings to be attached to each SOAP request. Through this method, each company could monitor and limit the number of requests made to their system by specific applications. During our testing and development, the request limits were never an issue, but if our application were to be used by hundreds, or even thousands of customers, this would be a major issue. Because each Web Service provides different levels of functionality for users, there was noticeable different level of effort required to setup the interface. By far, EBay took the most amount of time, because not only does it allow searching of items being auctioned, but it also allows bidding and posting items to sell. What this meant is that EBay requires not one, but three IDs, a real eBay account, a registered runame (a unique name of our Wishsky application), and a long unique string (token) associated with the EBay account, just to do just the simplest search. The scope of our application did not involve bidding or selling items on EBay, only searching. This meant we only needed one EBay account and token string was needed for the whole application. Otherwise, each Wishsky user would be required to register an EBay account and go through an authorization process. In comparison, Amazon and Google especially, were relatively easy to setup, was not without its difficulties. The good folks at Google provided a jar file with the necessary java classes to quickly access Google searches. However, we found out that the supplied jar file was not 12
  13. 13. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project compatible with Tomcat. I was forced to discard the jar, and create my own SOAP classes using Apache Axis WSDL2Java function. Amazon is more similar to EBay, because we use it to search for Items, either through an Amazon Wishlist, or by keyword. However, it does not allow posting items to sell, so there’s no extensive authorization process. 5.3 User Interface The interface design rationale of the platform is giving our customer hope and making their dreams come true. Because we believe everybody is happy when he or she sees the beautiful rainbow in the sky, we design the interface with sky and rainbow. We also hope our customers are happy when they are in our website. So we design two little happy boy and girl to welcome our customers and let our customers have a family feeling in our website. In short, we will design the interface based on these expectations:  Give user the impression of being in a dream  Make the interface appealing to multiple age groups  Family friendly 5.4 Sample Scenarios When a new user visit to Wishsky, and he can easily find what products are hot ones. If click the picture of the product, it will directly link to Amazon related page so that he can get more detail about that products. Meanwhile, Wishsky provides important news and reviews information for hot products. After user clicks on the title of the production, right frame will show him the lists of related links which are generated from Google search. For instance, he would like read the first news. Pop up windows directly like to the webpage. 13
  14. 14. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project By the way, in this page, Wishsky also provides the function of searching wish-list in Amazon by interring correct email address. After entering basic data and click submit, user, as a new register, will see a user friendly page because there is a clear guide to teach first time users how to create his own wish-lists on Wishsky. First, user can directly import desired items from his Amazon 14
  15. 15. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project Wishlist according to his email address for Amazon. After successfully import it, he can choose items and add them to my wish-list. 15
  16. 16. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project 16
  17. 17. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project On the other hands, users can user type keyword of product title and select product categories to search products form Amazon. For instance, he would like search “Sony T7 Camera” related products. Easily, he can find what he want and add it into his wish-list. 17
  18. 18. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project 18
  19. 19. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project If users finish Wishlist initial setting, they just click done and go to next page. Then, they can enter a personalized page. Wishsky tries to build up an environment which is just for you so that you can see you name, cute icon based on your gender, and your wish items on this page. Again, Wishsky provides critical product information based on your wished items. Moreover, when users click on item title, a list of available auctions of certain items will pop up. If users want see more detail, just click the link to eBay. In Wishsky, users can maintain their basic data, including password, personal profiles, and friend lists. Users just need to input their friends’ ID and added into their 19
  20. 20. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project friend list. After finishing it, they can easily know what their friends want for Christmas gifts. In addition, our Wishsky provides three different mechanisms to recommend users something they might be interesting in. The first one is based on the result created by 20
  21. 21. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project association rule algorithm. Initially, Wishsky will show us recommended items which come from association rule based on our customers’ Wishlist. The second type is based on your friends’ Wishlist so Wishsky recommends what your friends like to you. The last one is based on hot item within our websites so that we can catch the trend of the fashion. 6. System Novelty Wishsky is a website integrates product information form Amazon, Google, and eBay and, at the same time, it provides friendly interface for multi-aged users to operate useful functions. There must be something special within Wishsky so, in this chapter, we discuss the novelty of Wishsky. 21
  22. 22. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project 6.1 Data Mining Algorithm Wishsky implements recommendation function based on the result generated from Apriori algorithm, this kind of recommendation is most common one applied on nearly all of electronic retailers, such as Amazon. Simultaneously, Wishsky design another interesting recommendation mechanism which depend on the most popular items which appear in users’ friends’ wish-lists repeatedly. It is the unique innovation which successfully combines wish-list and friend list because Wishsky is the only one community based on wish-list. As a result, Wishsky has high potential to become a whole new community which provides lots of business opportunities. 6.2 APIs Our application is the only application known to interface with EBay, Amazon and Google. This allows users to potentially receive a great deal of relevant information quickly. Users can open multiple browsers at once, and search each application separately, but it would tedious to search for many items, and there would be no way to easily view all the information on one page. 6.3 User Interface On the basis of the design rationale of family friendly interface, we not only have friendly interface design for different customers but also provide clear directions for different functions. Our interface is designed with soft color and naive logos. With different functions, providing circumspect directions to lead customers using each function more smoothly. 22
  23. 23. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project 6.4 Compare with others Our project is different not only in the technological way, but also our business idea is quite interesting and fresh. This demonstrates in two perspectives. The first is we create a new kind of business community. A business community is a place where customers with similar business interest gather together. In the community we create in our project, customers know each other’s wishes. This will provide a lot of convenience to our customers. For instance, you can avoid the embarrassing time of sending an inappropriate gift to your friend if you know what he or she really wishes. And you will probably get very excited to receive from your friend a right gift you have long been expecting for, as he or she gets a chance to know your wishes. As far as we know, this kind of business community has never been seen before. 7. Conclusion / Future Directions We are so glad to see what we have achieved after a long semester effort- a business website with an innovative business idea, comprehensive and considerate service, friendly and warm interface, most importantly, the optimistic huge market value. This is from a cooperative and hard working team. At the beginning period of our project, we proposed several different plans, and each sounded very reasonable, but we respected each other’s idea, and discussed without biased in order to choose out the best one. During the designing period, we all tried to do what he or she could do, and no one talked about if the work distribution was fair or not. Furthermore, we helped each other out, and never retained anything within him or her. We learned from each other, and encouraged each other whenever we felt depressed. Due to the lack of practical experience in developing complete systems, we did feel a lot of pain during the process for this project, as we always came up more requirements than planned while having to make the schedule. But the best thing was we had never given up, and we came to know that things can be more perfect but never be completely perfect. The result was we were 23
  24. 24. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project very happy to see the good responses from the professor and our classmates during the demo. We sincerely hope that all the users of “Wishsky” realize their wishes as we did. In order to continue our great dream of Wishsky, we, of course, will put more efforts in it so we have some future plans to create more value-added functions, design more interesting interfaces, and even integrating more APIs of web-services. The major future plans are summarized into three following directions.  Data Mining Algorithm At this moment, Wishsky provides users product recommendation based on product association relationships of all users’ wish-lists, product association relationships of users’ friends’ wish-lists, and hot products in Wishsky. Although current solution of recommendation is sufficient to fulfill the need of users, it still could be improved. According to our future plan, we would like to adopt ART (Adaptive Resonance Theory Network) to group our mass users based on their basic attributes so that is the reason why we require new registers full out many personal attributes. Because ART is an unsupervised machine learning algorithm, it is useful to categorize users. Hence, users can be divided into different interesting groups first, and then Apriori association rule algorithm is used to find out product association rules among wish-lists of users who are in the same interesting groups. By doing so, Wishsky could do more effective and accurate product recommendations.  APIs Web Services is still a new technology, a novelty that applications (Web or Desktop) are just starting to use. Most likely, this technology will evolve and change rapidly. To take advantage of new features and bug fixes, Wishsky will also need to update to newer versions on a regular basis. Also, because of time limitations, for now the Wishsky application can only search for items on eBay and Amazon. Future development may allow users to actually buy items or post items for auction in eBay. A key idea of Wishsky is to allow users to see buyer demand before auctioning items, and to find out which items are popular, and likely to receive more bids. It only seems only natural to also allow users to quickly auction an item where they never have to leave our interface, and may then be tempted to use a competing application’s software.  User Interface We could provide more functions for our customers to personalize their own interface in the future. Our customer can upload their personal photo to be the welcome logo, and 24
  25. 25. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project adjust the interface color by their preferences. In addition, they can design and change each block of the interface with their interests and the interface will show all of the interesting functions based on different customers. Meanwhile, the platform will dynamically show different information and functions for different customers based on their individual interests and settings. In briefly, we hope we can provide a personalized space for our customers to enjoy all of information what they want. We hope that our product, Wishsky, not only make our clients’ dreams come true but also ours. For the success of our project, we appreciate a lot for the instruction from Dr. Chen, and the help from the teaching assistant Xin Li and other teams. 25
  26. 26. MIS 531A Data Structure & Algorithm Report of Group Web Mining Project Reference  Website 1. http://www.half.com 2. http://www.amazon.com 3. http://www.google.com 4. http://www.ebay.com 5. http://www.cs.waikato.ac.nz/ml/weka/ 6. http://www.amazon.com/gp/aws/landing.html 7. http://www.google.com/apis/ 8. http://developer.ebay.com  Paper 1. Yung-Hsiang Chiu, “The Study of Applying Neural Network and Data Mining Techniques to Course Recommendation Base on E-learning Environment”, 6 June 2003 2. Guan-Hua Sun, “A Data Mining Methodology for Library New Book Recommendation”, July 2000 26

×