web personalization

  1. MINING CLIENT SIDE PARADATA FOR ADAPTIVE WEBPAGES <ul><li>By </li></ul><ul><li>Rami Shawkat Hatem Al-Salman </li></ul><ul><li>Advisor </li></ul><ul><li>Dr. Natheer Khasawneh </li></ul><ul><li>Co-Advisor </li></ul><ul><li>Dr. Ahmad Al-Hammouri </li></ul>
  2. Contents <ul><li>Introduction. </li></ul><ul><li>Server log data. </li></ul><ul><li>Client data. </li></ul><ul><li>Framework for collecting and mining client-side data. </li></ul><ul><li>Three case studies. </li></ul><ul><li>Results and discussion. </li></ul><ul><li>Conclusions. </li></ul><ul><li>Future work. </li></ul>
  3. Introduction <ul><li>In recent years, a large number of websites have been published. </li></ul><ul><li>Current web applications aim to interact with users through rich and dynamic content. </li></ul><ul><li>In recent years, JavaScript has evolved to interact not only with the client side but also with the server side; this led to Asynchronous JavaScript and XML (AJAX). </li></ul><ul><li>Many websites apply web personalization. </li></ul>
  4. Web personalization <ul><li>Web personalization tailors a website to each user’s specific needs and domain. </li></ul><ul><li>Many websites use recommender systems to support web personalization. </li></ul><ul><li>Webpages are personalized based on clients’ preferences (e.g., interests, country, gender). </li></ul>
  5. AMAZON & Web personalization <ul><li>Amazon uses a recommender system that relies on collaborative filtering to produce personal recommendations. </li></ul><ul><li>Personal (client) recommendations are generated by computing the similarity between a client’s preferences and those of other clients. </li></ul><ul><li>Collaborative filtering consists of three steps: </li></ul><ul><ul><ul><li>Record the preferences of a group of clients. </li></ul></ul></ul><ul><ul><ul><li>Choose a group of clients whose preferences are similar to the target client’s, using a similarity metric. </li></ul></ul></ul><ul><ul><ul><li>Recommend options (e.g., products) to the target client. </li></ul></ul></ul>
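The following is a minimal sketch of the three collaborative-filtering steps above. It assumes a user-based variant with cosine similarity over explicit preference scores. The function names and toy data are hypothetical, and this is not Amazon's actual implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse preference vectors (dicts item -> score)."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(prefs, target, k=2, n=3):
    """User-based collaborative filtering.

    prefs: client -> {item: score}  (step 1: the recorded preferences)
    """
    # Step 2: pick the k clients most similar to the target client.
    neighbours = sorted((c for c in prefs if c != target),
                        key=lambda c: cosine(prefs[target], prefs[c]),
                        reverse=True)[:k]
    # Step 3: score items the target has not rated, weighted by neighbour similarity.
    scores = {}
    for c in neighbours:
        sim = cosine(prefs[target], prefs[c])
        for item, s in prefs[c].items():
            if item not in prefs[target]:
                scores[item] = scores.get(item, 0.0) + sim * s
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]

prefs = {
    "alice": {"cam1": 5, "cam2": 3},
    "bob": {"cam1": 4, "cam2": 3, "mob1": 5},
    "carol": {"mob2": 5, "mob3": 4},
}
print(recommend(prefs, "alice", k=1))
```

With `k=1`, only the most similar client (`bob`) contributes, so `alice` is offered the one item she has not rated yet.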
  6. AMAZON as a real example: recommendations based on browsing history; recommendations based on preferences of people with similar profiles
  7. AMAZON as a real example: recommendations based on most recently viewed items
  8. Server log data <ul><li>A server log is a log file of records written by the web server, typically one record per request. </li></ul><ul><li>Analyzing server logs can help in understanding clients’ behavior (e.g., which pages receive the most and least traffic). </li></ul>Server Log Info example: <table><tr><th>Entry name</th><th>IP-Address</th><th>date</th><th>request</th><th>status</th><th>bytes</th><th>referrer</th><th>agent</th></tr><tr><td>Server Log Info</td><td>178.77.146.157</td><td>[03/Jan/2011:15:20:06 -0800]</td><td>&quot;GET /default.ASPX HTTP/1.0&quot;</td><td>200</td><td>8788</td><td>http://www.just.edu.jo</td><td>&quot;Mozilla/3.0 WebTV/1.2 (compatible; MSIE 2.0)&quot;</td></tr></table>
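Assume the server writes Apache's standard "combined" log format, whose fields match the example above. One entry can then be parsed with a regular expression, as sketched below. The `- -` ident and user fields in the sample line are that format's usual placeholders. The name `parse_line` is our own.

```python
import re
from datetime import datetime

# Combined format: host ident authuser [date] "request" status bytes "referrer" "agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one combined-format line, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["status"] = int(entry["status"])
    # Apache writes "-" instead of 0 when no body bytes were sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    return entry

line = ('178.77.146.157 - - [03/Jan/2011:15:20:06 -0800] '
        '"GET /default.ASPX HTTP/1.0" 200 8788 '
        '"http://www.just.edu.jo" "Mozilla/3.0 WebTV/1.2 (compatible; MSIE 2.0)"')
entry = parse_line(line)
print(entry["ip"], entry["status"], entry["bytes"])
```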
  9. Apache server access.log
  10. Client data <ul><li>Client data is recorded from the client’s navigation over the elements of the visited webpage. </li></ul><ul><li>Client data can capture the interactions between clients and the elements of the visited webpage. </li></ul><ul><ul><ul><ul><li>For example: the name, value and time spent on a specific webpage element. </li></ul></ul></ul></ul>Client Info example: <table><tr><th>Entry name</th><th>Element name</th><th>Element value</th><th>Spent time</th><th>IP-Address</th><th>date</th><th>request</th><th>status</th><th>bytes</th><th>referrer</th><th>agent</th></tr><tr><td>Client Info</td><td>DIV1</td><td>Yes</td><td>156.77 seconds</td><td>178.77.146.157</td><td>[03/Jan/2011:15:20:06 -0800]</td><td>&quot;GET /default.ASPX HTTP/1.0&quot;</td><td>200</td><td>8788</td><td>http://www.just.edu.jo</td><td>&quot;Mozilla/3.0 WebTV/1.2 (compatible; MSIE 2.0)&quot;</td></tr></table>
  11. Client data example
  12. Problem statement <ul><li>Most previous studies worked on server log data. </li></ul><ul><li>These studies used Web Usage Mining (WUM) techniques to extract knowledge from this data. </li></ul><ul><li>Some tools and systems have been proposed for tracking client data. </li></ul><ul><li>Previous studies on client data have not shown its usefulness. </li></ul><ul><li>Unfortunately, there is still no complete framework that can both record and mine client log data. </li></ul>
  13. Motivations <ul><li>Some entries can be extracted from the client’s mouse movements over the visited webpage. </li></ul><ul><li>Extracting useful knowledge from client data helps us better understand clients’ behaviors and attitudes. </li></ul><ul><li>Support clients with appropriate recommendations. </li></ul><ul><li>Understanding clients’ behaviors and needs can improve product advertising on the Web. </li></ul>
  14. Contributions <ul><li>Until now, there has been no complete framework that can record and mine client data. </li></ul><ul><li>Thus, the main contribution of this thesis is a complete framework that records clients’ events and applies WUM techniques to this data. </li></ul><ul><ul><ul><ul><li>We mainly show the usefulness of client data. </li></ul></ul></ul></ul><ul><li>We customize client data and then apply WUM techniques to it. </li></ul><ul><li>We build three different web applications and integrate our framework with them. </li></ul><ul><li>We build a recommendation engine that is able to discover clients’ patterns. </li></ul><ul><li>We extract useful information from client data. </li></ul><ul><ul><ul><ul><li>We generate a client data model based on client data statistics. </li></ul></ul></ul></ul>
  15. Framework for collecting and mining client-side data <ul><li>We propose a framework to record and mine client-side data. </li></ul><ul><li>Our framework consists of five phases, in order: </li></ul><ul><ul><ul><ul><li>Session identification. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Event identification and capture. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Event storage. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Merging and exporting events. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Web mining. </li></ul></ul></ul></ul>
  16. Framework for collecting and mining client-side data
  17. Session identification <ul><li>When a client requests a webpage, a session id is assigned to that client. </li></ul><ul><li>The session id is the number of milliseconds since midnight, Jan 1, 1970 (UTC), at the time of the request; this makes each client’s session id effectively unique. </li></ul><ul><li>The generated session id identifies all recorded events that belong to the same user. </li></ul><ul><li>A client’s session ends when the client clicks a designated target button or link. </li></ul>
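The session-id scheme can be sketched as follows. `epoch_millis` computes the same quantity as JavaScript's `Date.now()`. Two clients could request a page in the same millisecond, so this sketch adds a hypothetical `SessionIssuer` that bumps a colliding id. The slides do not say how, or whether, such collisions are handled.

```python
import time

def epoch_millis():
    """Milliseconds since midnight, Jan 1, 1970 (UTC), like JavaScript's Date.now()."""
    return time.time_ns() // 1_000_000

class SessionIssuer:
    """Hypothetical helper that hands out epoch-millisecond session ids.

    Requests arriving in the same millisecond would get the same raw value,
    so the id is incremented until it is unused.
    """

    def __init__(self):
        self.issued = set()

    def new_session(self):
        sid = epoch_millis()
        while sid in self.issued:
            sid += 1
        self.issued.add(sid)
        return sid

issuer = SessionIssuer()
ids = [issuer.new_session() for _ in range(1000)]
print(len(set(ids)))
```

A tight loop issues many ids within one millisecond, which is exactly the case the collision check guards against.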
  18. Events identification and recording <ul><li>We identify web elements and their associated events. </li></ul><ul><li>Client data is transferred, together with the session id, via an XMLHttpRequest (AJAX) call. </li></ul><ul><li>Because AJAX calls are asynchronous, transferring the data is a lightweight operation (clients do not notice while data is sent to the server). </li></ul><ul><li>Seven values are recorded: name, value, item time, session id, date, total mouse clicks and Personalized. </li></ul><ul><li>Personalized represents the web element that finished the session. </li></ul>
  19. Events identification and recording (cont.) <ul><li>Our events are classified into two categories: </li></ul><ul><ul><ul><li>Clickstream-based. </li></ul></ul></ul><ul><ul><ul><li>Time-based. </li></ul></ul></ul><ul><li>In the clickstream-based category, the name and value of the clicked element are transferred. </li></ul><ul><li>In the time-based category, the name, value and time spent on the web element are transferred. </li></ul>
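Below is a sketch of one recorded event and of the body an AJAX call might send in each category. The field names and the JSON wire format are assumptions. The slides list the seven values but not how they are encoded, or which of them are filled in on the client rather than the server. This sketch sends only the fields named for each category, plus the session id.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ClientEvent:
    # The seven recorded values; these field names are hypothetical.
    name: str           # web element name, e.g. "DIV1"
    value: str          # element value, e.g. "Yes"
    item_time: float    # seconds spent on the element (time-based category)
    session_id: int     # epoch-millisecond session id
    date: str           # timestamp of the event
    total_clicks: int   # total mouse clicks so far in the session
    personalized: str   # element that finished the session ("" until then)

def payload(event, mode):
    """Build a JSON body for one event in the given category (hypothetical format)."""
    if mode == "clickstream":
        fields = ["name", "value", "session_id"]
    elif mode == "time":
        fields = ["name", "value", "item_time", "session_id"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    data = asdict(event)
    return json.dumps({k: data[k] for k in fields})

e = ClientEvent("DIV1", "Yes", 156.77, 1294096806000, "2011-01-03T15:20:06-08:00", 4, "")
print(payload(e, "time"))
```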
  20. Snapshot of clickstream-based data (Events storing)
  21. Snapshot of time-based data (Events storing)
  22. Merging and Exporting data <ul><li>The records are grouped per client session (session id). </li></ul><ul><li>Our merging algorithm works as follows: </li></ul><ul><ul><ul><li>Load the list of session ids. </li></ul></ul></ul><ul><ul><ul><li>For each session id: </li></ul></ul></ul><ul><ul><ul><ul><ul><li>If the data is clickstream-based, accumulate the sequence of clicks. </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>If the data is time-based, accumulate the time spent on each element. </li></ul></ul></ul></ul></ul><ul><li>The merged data is exported to another database table. </li></ul><ul><li>The output of this phase is the input to the web mining phase. </li></ul>
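The merging algorithm above can be sketched in Python. It assumes the raw events arrive as dictionaries in recording order. Exporting to a database table is left out, and `merge_sessions` and its input layout are hypothetical.

```python
from collections import OrderedDict, defaultdict

def merge_sessions(events, mode):
    """Group raw event rows by session id and merge them.

    events: iterable of dicts with "session_id", "name" and, for time-based
            data, "item_time" (seconds), in the order they were recorded.
    mode:   "clickstream" -> session_id -> ordered list of clicked element names
            "time"        -> session_id -> {element name: total seconds spent}
    """
    merged = OrderedDict()
    for row in events:
        sid = row["session_id"]
        if mode == "clickstream":
            # Accumulate the sequence of clicks, preserving order.
            merged.setdefault(sid, []).append(row["name"])
        elif mode == "time":
            # Accumulate the time spent on each element.
            totals = merged.setdefault(sid, defaultdict(float))
            totals[row["name"]] += row["item_time"]
        else:
            raise ValueError(f"unknown mode: {mode}")
    if mode == "time":
        merged = OrderedDict((sid, dict(t)) for sid, t in merged.items())
    return merged

rows = [
    {"session_id": 1, "name": "Bold", "item_time": 2.0},
    {"session_id": 2, "name": "Italic", "item_time": 1.5},
    {"session_id": 1, "name": "Italic", "item_time": 3.0},
    {"session_id": 1, "name": "Bold", "item_time": 1.0},
]
print(merge_sessions(rows, "clickstream"))
print(merge_sessions(rows, "time"))
```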
  23. Snapshot of merged data in clickstream-based mode
  24. Snapshot of merged data in time-based mode
  25. Web Mining <ul><li>As in any data mining task, Web Usage Mining consists of three steps: </li></ul><ul><ul><ul><li>Data preprocessing. </li></ul></ul></ul><ul><ul><ul><li>Pattern discovery and web mining. </li></ul></ul></ul><ul><ul><ul><li>Information and pattern analysis. </li></ul></ul></ul>
  26. Data preprocessing <ul><li>Preprocessing (data cleaning) removes irrelevant data and keeps consistent data. </li></ul><ul><li>Preprocessing is based on thresholds. </li></ul><ul><li>We mainly use two thresholds: </li></ul><ul><ul><ul><ul><li>The total session time. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>The total number of visited elements. </li></ul></ul></ul></ul>
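Below is a sketch of threshold-based cleaning for merged time-based sessions. The slides do not say whether the thresholds are lower bounds or whether they are inclusive. This sketch assumes a session is kept only if it reaches both, which fits the E-survey setting where all 12 questions must be visited. For clickstream data, the element count would instead be the length of the click sequence.

```python
def session_stats(cell_times):
    """Total time and number of visited elements for one merged time-based session."""
    return sum(cell_times.values()), len(cell_times)

def preprocess(sessions, min_total_time, min_elements):
    """Drop sessions below either threshold (assumed inclusive lower bounds).

    sessions: session_id -> {element name: aggregated seconds}
    Returns (kept sessions, list of removed session ids).
    """
    kept, removed = {}, []
    for sid, cells in sessions.items():
        total_time, n_elements = session_stats(cells)
        if total_time >= min_total_time and n_elements >= min_elements:
            kept[sid] = cells
        else:
            removed.append(sid)
    return kept, removed

sessions = {
    1: {"q1": 10.0, "q2": 20.0},   # enough time, enough elements
    2: {"q1": 1.0, "q2": 2.0},     # too short
    3: {"q1": 40.0},               # too few elements
}
kept, removed = preprocess(sessions, min_total_time=25, min_elements=2)
print(sorted(kept), removed)
```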
  27. Pattern discovery and web mining
  28. Information and Pattern analysis <ul><li>In most cases, analyzing the generated patterns and information gives a deeper understanding of clients’ behavior. </li></ul><ul><li>The output of this step can take many forms. </li></ul><ul><li>One of the most important is a model generated from statistics (e.g., frequencies). </li></ul>
  29. Three case studies <ul><li>To validate the proposed framework, we integrated it with three different web applications. </li></ul><ul><li>The three web applications are: </li></ul><ul><ul><ul><ul><ul><li>A web-based editor control (TinyMCE). </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>An E-commerce web application. </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>An E-survey web application. </li></ul></ul></ul></ul></ul><ul><li>The three web applications are hosted online. </li></ul>
  30. TinyMCE <ul><li>TinyMCE is a platform-independent, web-based JavaScript HTML editor control. </li></ul><ul><li>We modified the TinyMCE source code to integrate the proposed framework. </li></ul><ul><li>TinyMCE events are general (clickstream-based) data. </li></ul><ul><li>We applied data mining to cluster clients and discover their sequence patterns. </li></ul><ul><li>Finally, we classified the clustered output. </li></ul>
  31. Snapshot of TinyMCE
  32. Data Collection <ul><li>As a data source, 60 students from JUST in the CPE 411 and CPE 311 classes were asked to use our system. </li></ul><ul><li>We asked the students to use TinyMCE to write an advertisement for JUST encouraging students from European Union (EU) countries to study at JUST. </li></ul><ul><li>The click events were recorded. </li></ul><ul><li>The events were merged in the general (clickstream-based) data mode. </li></ul><ul><li>The merged data is the input to the data preprocessing step. </li></ul>
  33. Snapshot of merged data
  34. Data Preprocessing <ul><li>The collected data was preprocessed by removing invalid sequences. </li></ul><ul><li>Invalid sequences were determined using two thresholds: </li></ul><ul><ul><ul><ul><ul><li>The number of clicked controls. </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>The total session time spent in the sequence. </li></ul></ul></ul></ul></ul><ul><li>Heuristically, we used 10 clicks as the first threshold and 200 seconds as the second. </li></ul><ul><li>Preprocessing reduced the total number of sequences to 36 (24 sequences were removed). </li></ul>
  35. Clustering <ul><li>We grouped students’ sequences into clusters of similar clickstream sequences. </li></ul><ul><li>We applied K-means clustering, with the number of clusters set heuristically to two, three and four. </li></ul><ul><li>We used edit distance as the distance measure for calculating the (dis)similarity between any two objects, with each cluster represented by the object closest to its mean point. </li></ul><ul><li>The main goal of clustering is to label students’ sequences. </li></ul>The points represent the students’ sequences.
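Edit distance between click sequences is the standard Levenshtein distance. K-means itself needs a numeric mean, which click sequences do not have. The sketch below therefore swaps in a simple k-medoids loop, where each cluster is represented by the member sequence with the smallest total edit distance to the other members. This is one way to read "the object closest to the mean point"; it illustrates the idea and is not the thesis's exact procedure.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = curr
    return prev[-1]

def k_medoids(seqs, k, iters=20):
    """Cluster sequences around k medoid sequences; returns one label per sequence."""
    medoids = list(range(k))  # deterministic start: the first k sequences
    for _ in range(iters):
        # Assign every sequence to its nearest medoid.
        labels = [min(range(k), key=lambda c: edit_distance(s, seqs[medoids[c]]))
                  for s in seqs]
        # Re-pick each medoid as the member with the smallest total distance.
        new = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new.append(medoids[c])
                continue
            new.append(min(members, key=lambda m: sum(
                edit_distance(seqs[m], seqs[o]) for o in members)))
        if new == medoids:
            break
        medoids = new
    return labels

# Each character stands for one clicked control (e.g. B = Bold, I = Italic).
seqs = ["BBI", "LLC", "BBIU", "LLCR", "BI"]
print(k_medoids(seqs, k=2))
```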
  36. Pattern discovery <ul><li>The clustered sequences are used as input to the pattern discovery algorithm. </li></ul><ul><li>We applied the Generalized Sequential Pattern (GSP) algorithm to extract patterns from each cluster. </li></ul><ul><li>GSP not only discovers sequential patterns but also preserves the order of their items. </li></ul><ul><li>The output of GSP is the top ten patterns for each cluster. </li></ul><ul><li>These patterns are assigned later in the classification step. </li></ul>
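Below is a sketch of a simplified GSP for sequences whose elements are single clicks:

- It grows frequent patterns level by level.
- It joins two frequent patterns when one pattern's tail matches the other's head.
- It prunes any candidate that has an infrequent sub-pattern.
- It counts support as ordered, gap-allowed containment.

Time constraints and multi-item elements from full GSP are omitted. The ranking in `top_patterns` is an assumption, since the slides do not say how the top ten were chosen.

```python
def is_subsequence(pattern, seq):
    """True if pattern's items occur in seq in the same order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)  # `in` consumes the iterator

def gsp(sequences, min_support):
    """Return {pattern tuple: support count} for every pattern with count >= min_support."""
    def support(cands):
        return {c: sum(is_subsequence(c, s) for s in sequences) for c in cands}

    items = sorted({x for s in sequences for x in s})
    level = {c: n for c, n in support([(x,) for x in items]).items() if n >= min_support}
    frequent = dict(level)
    while level:
        prev = set(level)
        candidates = set()
        for a in prev:
            for b in prev:
                if a[1:] == b[:-1]:              # join step
                    cand = a + (b[-1],)
                    # Prune step: dropping any one item must leave a frequent pattern.
                    if all(cand[:i] + cand[i + 1:] in prev for i in range(len(cand))):
                        candidates.add(cand)
        level = {c: n for c, n in support(candidates).items() if n >= min_support}
        frequent.update(level)
    return frequent

def top_patterns(frequent, n=10):
    """Hypothetical ranking: highest support first, longer patterns break ties."""
    return sorted(frequent, key=lambda p: (-frequent[p], -len(p), p))[:n]

clicks = [["B", "I", "U"], ["B", "U"], ["B", "I", "U", "L"], ["I", "U"]]
freq = gsp(clicks, min_support=3)
print(top_patterns(freq))
```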
  37. Classification <ul><li>The output of the clustering step was used as input to the classification models. </li></ul><ul><li>Total session time, number of controls and the clickstream sequence are the three features used by our classification models. </li></ul><ul><li>The classification models are trained on these features. </li></ul><ul><li>We use two classifiers, Naive Bayes and Support Vector Machines. </li></ul><ul><li>After training, our classifiers can assign a new client to one of two, three or four classes. </li></ul>
  38. E-commerce system <ul><li>In the second case study, an E-commerce web application is built from scratch. </li></ul><ul><li>We integrate our framework with it. </li></ul><ul><li>Our E-commerce system offers two product categories: cameras and mobiles. </li></ul><ul><li>The main goal of this web application is to show that similar clients can be classified easily and directly. </li></ul><ul><li>Each product has seven features. </li></ul>
  39. Snapshot of E-commerce system for mobiles
  40. Snapshot of E-commerce system for cameras
  41. Data Collection <ul><li>We rely on three data sources: </li></ul><ul><ul><ul><li>Students from JUST University. </li></ul></ul></ul><ul><ul><ul><li>Students from Heinrich-Heine University of Duesseldorf (Germany). </li></ul></ul></ul><ul><ul><ul><li>Social network websites (Facebook, Myspace, etc.). </li></ul></ul></ul><ul><li>We record the events. </li></ul><ul><li>The events are merged in time-based mode. </li></ul><ul><li>In time-based mode, the times spent on each cell within a user session are aggregated. </li></ul><ul><li>According to our database, 58 clients bought cameras and 54 clients bought mobiles. </li></ul>
  42. Snapshot of merged data in time-based mode
  43. Data Preprocessing <ul><li>The total session time and the number of visited features are used as two thresholds. </li></ul><ul><li>Based on our experiments, we set the total session time threshold to 20 and the number of visited features to 7. </li></ul><ul><li>Based on these thresholds: </li></ul><ul><ul><ul><ul><li>For the camera data, 40 client transactions are pruned, and 18 client transactions remain. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>For the mobile data, 35 client transactions are pruned, and 20 client transactions remain. </li></ul></ul></ul></ul>
  44. Classification <ul><li>In time-based mode, classification models can be applied directly to the preprocessed data. </li></ul><ul><li>Each client transaction is labeled by the buy button the client clicked (e.g., a client who bought camera #1). </li></ul><ul><li>The aggregated times spent on 28 features (4 products × 7 features) are used as the main features. </li></ul><ul><li>Our classification models are trained on the preprocessed time-based data. </li></ul><ul><li>We use three classifiers: Naive Bayes, Support Vector Machines and Decision Tree (C4.5 algorithm). </li></ul>
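The 28-feature layout can be sketched as a fixed-order flattening of each session's aggregated times. Unvisited cells are set to zero, and the bought product is the label. The product and feature names below are hypothetical placeholders.

```python
PRODUCTS = [f"camera{i}" for i in range(1, 5)]   # hypothetical ids for the 4 products
FEATURES = [f"feature{j}" for j in range(1, 8)]  # hypothetical names for the 7 features

def feature_vector(cell_times):
    """Flatten {(product, feature): seconds} into 4 x 7 = 28 values; unvisited cells are 0.0."""
    return [float(cell_times.get((p, f), 0.0)) for p in PRODUCTS for f in FEATURES]

def build_dataset(sessions):
    """sessions: list of (cell_times, bought_product). Returns feature rows X and labels y."""
    X = [feature_vector(cells) for cells, _ in sessions]
    y = [bought for _, bought in sessions]
    return X, y

X, y = build_dataset([
    ({("camera1", "feature1"): 3.5, ("camera4", "feature7"): 2.0}, "camera1"),
])
print(len(X[0]), y)
```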
  45. E-survey <ul><li>In the third case study, an E-survey web application is built from scratch. </li></ul><ul><li>We integrate our framework with it. </li></ul><ul><li>E-survey is a simple web application that allows students to assess lecturers through both multiple-choice and essay questions. </li></ul><ul><li>The main goal of E-survey is to understand students’ attitudes and behavior. </li></ul><ul><li>The E-survey webpage consists of twelve questions (eleven multiple-choice questions and one essay question). </li></ul><ul><li>Each multiple-choice question has four options (Cannot do it at all, Weak, Good and Very good). </li></ul>
  46. Snapshot of E-Survey
  47. Data Collection <ul><li>We rely on three data sources: </li></ul><ul><ul><ul><li>Students from Yarmook (Accounting class). </li></ul></ul></ul><ul><ul><ul><li>Students from Jadara (Computer Skills class). </li></ul></ul></ul><ul><ul><ul><li>Students from Philadelphia (Design class). </li></ul></ul></ul><ul><li>We record the events. </li></ul><ul><li>The events are merged in time-based mode. </li></ul><ul><li>In time-based mode, the times spent on each question within a user session are aggregated. </li></ul><ul><li>According to our database, 101 students assessed their lecturers. </li></ul><ul><ul><ul><ul><li>37 students from Yarmook University, 38 students from Philadelphia University and 26 students from Jadara University. </li></ul></ul></ul></ul>
  48. Data Preprocessing <ul><li>The total session time and the number of visited questions are used as two thresholds. </li></ul><ul><li>Based on our experiments, we set the total session time threshold to 25 and the number of visited questions to 12. </li></ul><ul><li>Based on these thresholds, 11 student transactions are discarded from the student database. </li></ul><ul><ul><ul><ul><li>90 transactions remain. </li></ul></ul></ul></ul>
  49. Snapshot of preprocessed data
  50. Classification <ul><li>The aggregated times spent on the 12 questions are used as the 12 main features. </li></ul><ul><li>In E-Survey, the recorded transactions are not labeled directly. </li></ul><ul><li>Labeling is done through a flag question. </li></ul><ul><li>Our classification models are trained on the preprocessed time-based data. </li></ul><ul><li>We use three classifiers: Naive Bayes, Support Vector Machines and Decision Tree (C4.5 algorithm). </li></ul>
  51. The students’ data model (exponential)
  52. Evaluation <ul><li>For evaluation, we use three well-known measures commonly used in information retrieval: precision, recall and F-measure. </li></ul><ul><li>The false positive (FP) and false negative (FN) measures are used to evaluate the errors of the classification models. </li></ul><ul><li>For testing, the classifiers are evaluated in two modes: </li></ul><ul><ul><ul><ul><li>Training dataset method. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>5-fold cross-validation method. </li></ul></ul></ul></ul><ul><li>The training dataset method uses the same dataset for both training and testing. </li></ul><ul><li>The 5-fold cross-validation method divides the dataset into five subsets; each subset is used once for testing while the remaining four are used for training. </li></ul>
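When a single class is treated as positive, the three measures can be computed from TP, FP and FN counts, as sketched below. We assume "F-measure" means the balanced F1 score. For multi-class results, the per-class values would then be averaged; the slides do not say how.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = ["a", "a", "b", "b", "a"]
y_pred = ["a", "b", "b", "a", "a"]
p, r, f = precision_recall_f1(y_true, y_pred, positive="a")
print(round(p, 3), round(r, 3), round(f, 3))
```

Here TP = 2, FP = 1 and FN = 1, so all three values equal 2/3.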
  53. 5-fold cross-validation method (green: training subsets; red: testing subset)
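Below is a minimal sketch of the 5-fold split. It assumes contiguous folds over an already shuffled dataset, so that each index is used for testing exactly once. Stratification by class, which many toolkits apply, is not shown.

```python
def k_fold_indices(n, k=5):
    """Yield (train_idx, test_idx) for k contiguous folds over n examples.

    Fold sizes differ by at most one when n is not divisible by k.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

folds = list(k_fold_indices(12, k=5))
print([len(test) for _, test in folds])
```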
  54. Results (TinyMCE): precision, recall and F-measure values for NB and DT with 2, 3 and 4 clusters using 5-fold cross-validation.
  55. Results (TinyMCE): false positive and true positive values for NB and DT with 2, 3 and 4 clusters using 5-fold cross-validation.
  56. Results (E-Survey): using the training dataset; using 5-fold cross-validation; using the training dataset; using 5-fold cross-validation
  57. Results (E-Survey): using the training dataset; using 5-fold cross-validation
  58. Conclusions <ul><li>Client data is very useful. </li></ul><ul><li>Client data can be mined flexibly. </li></ul><ul><li>Client data can take multiple forms. </li></ul><ul><li>Clustering should be used to label unlabeled client transactions. </li></ul><ul><li>Classification is very practical for client data. </li></ul><ul><li>Our complete framework can help improve clients’ experiences. </li></ul><ul><li>Our classification models show the ability to classify with a high accuracy rate. </li></ul>
  59. Future Work <ul><li>We plan to handle more client data, such as mouse x, y coordinates. </li></ul><ul><li>We plan to develop new clustering and classification techniques that handle client data efficiently. </li></ul><ul><li>We will extract more knowledge from client data. </li></ul>
  60. <ul><li>Thank You </li></ul>
  61. Results for E-commerce cameras
  62. Snapshot of the tree generated by the decision tree model for the cameras category
  63. Results for E-commerce mobiles
  64. Snapshot of the tree generated by the decision tree model for the mobiles category
  65. Web applications links <ul><li>http://web-engineering.orgfree.com/ </li></ul><ul><li>http://easyshoping.orgfree.com/ </li></ul><ul><li>http://questions.orgfree.com/ </li></ul>
  66. Machine learning Algorithms <ul><li>Naïve Bayes is a probabilistic classifier based on Bayes’ theorem that assumes the features are conditionally independent given the class. </li></ul>
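The time-based features are continuous, so this sketch assumes Gaussian Naive Bayes, one common choice. Each feature is modeled per class as a normal distribution. Prediction picks the class with the largest log prior plus summed log likelihoods. The slides do not say which Naive Bayes variant was used, and the class labels in the example are hypothetical.

```python
import math
from collections import defaultdict

class GaussianNB:
    """Naive Bayes with one normal distribution per (class, feature)."""

    def fit(self, X, y, var_floor=1e-9):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.params = {}
        for label, rows in by_class.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # Floor the variance so a constant feature does not divide by zero.
            variances = [max(sum((v - m) ** 2 for v in c) / len(c), var_floor)
                         for c, m in zip(cols, means)]
            self.params[label] = (math.log(len(rows) / len(y)), means, variances)
        return self

    def predict_one(self, x):
        def log_posterior(label):
            log_prior, means, variances = self.params[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, means, variances))
        return max(self.params, key=log_posterior)

    def predict(self, X):
        return [self.predict_one(x) for x in X]

# Two hypothetical classes of sessions, each row = seconds spent on two questions.
X = [[1.0, 20.0], [2.0, 22.0], [1.5, 21.0], [10.0, 2.0], [11.0, 3.0], [12.0, 1.0]]
y = ["A", "A", "A", "B", "B", "B"]
model = GaussianNB().fit(X, y)
print(model.predict([[1.2, 21.5], [11.5, 2.5]]))
```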
  67. Machine learning Algorithms <ul><li>C4.5 is a supervised machine learning algorithm developed as an extension of the ID3 algorithm. </li></ul><ul><li>C4.5 generates decision trees from training data using information entropy: at each node, it splits on the attribute with the highest normalized information gain (gain ratio). </li></ul>
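The entropy-based split criterion can be sketched for one continuous attribute, such as the time spent on a feature. This is a simplified illustration of gain ratio with a binary `value <= t` split. The full C4.5 differs in details such as how numeric thresholds are selected, missing-value handling, and pruning.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(labels, groups):
    """Information gain of splitting `labels` into `groups`, divided by the split information."""
    n = len(labels)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups if g)
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups if g)
    return gain / split_info if split_info > 0 else 0.0

def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (best gain ratio, threshold)."""
    pairs = sorted(zip(values, labels))
    all_labels = [lab for _, lab in pairs]
    best = (0.0, None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = gain_ratio(all_labels, [left, right])
        if score > best[0]:
            best = (score, t)
    return best

print(best_threshold([1, 2, 8, 9], ["y", "y", "n", "n"]))
```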
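  68. Machine learning Algorithms SVM is a supervised machine learning algorithm. The main idea is to find a separating boundary, called a hyperplane, that splits n-dimensional data into two classes with the largest possible margin. When the classes are not linearly separable, a soft margin or a kernel is used, and more than two classes are handled by combining several binary SVMs (e.g., one-vs-one).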
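For reference, the standard soft-margin linear SVM is the following optimization problem. The labels are y_i ∈ {−1, +1}, and C &gt; 0 trades margin width against training errors. Setting all ξ_i = 0 gives the hard-margin case, which requires linearly separable data.

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;\; \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{m}\xi_i
\quad \text{s.t.} \quad y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1 - \xi_i,
\;\; \xi_i \ge 0, \;\; i = 1,\dots,m
```

A new point x is classified by sign(wᵀx + b), and the margin width is 2/‖w‖.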
