Comparing colleges on the basis of various attributes and performing regression using the Weka software.
A demonstration of clustering using Weka on various attributes of a data set of places.
Task A. [20 marks] Data Choice.
Name the chosen data set(s) (from module resources, UCI ML Repository or other open data sources or own collection) and describe the data (e.g. attribute types and values, source of data)
[5 marks]
Adult data set, for predicting whether salary is less than or more than 50K
http://archive.ics.uci.edu/ml/datasets/adult
Describe the data mining problem (and background) you will address e.g. as a classification, prediction, association, clustering, or text mining related exercise
[5 marks] Classification, prediction, and association rule mining
Introduce the specific data mining question(s) related to the problem, with specific reference to the dataset(s) and the expected or proposed outcome of the data mining task upon completion
How can salaries be predicted based on gender and other characteristics?
How can the income of the adults be found?
[10 marks]
Predicting the salaries, and finding the best rules for determining the income of the adults from the data.
The main aim of this coursework is to critically analyse data sources and data sets, critically evaluate possible data analytics challenges and solutions, choose, design and implement data mining algorithms to the chosen data, and apply the data mining techniques to specific case studies. The coursework is worth 100 marks, and the distribution of marks is detailed on the marking scheme.
You are expected to explore one or two chosen data set(s) of your choice from open data mining/machine learning (re)sources, to develop case studies and apply data mining techniques on the data set(s) for supervised and/or unsupervised learning, as motivated and decided by which is suitable (depending on the data set characteristics). Tasks A, B, and G are compulsory, and you must choose 2 tasks from C, D, E, and F:
Task B. [20 marks] Data.
MISY 3331 Advanced Database Concepts
Assignment 3
Dr. Sotirios Zygiaris
[email protected]
Room: F084, tel. ext. 5471
10% of your final grade (Covers chapters 7,8,10,11)
1. In Exercise 2.6, related to sales forecasting, the following business requirements were set. A-Oil & Chemical is a chemical company that plans to create a database to forecast sales.
· A salesperson is responsible for a lead to a sale. Each lead consists of the responsible salesperson, the customer targeted, the date the lead occurred, the projected date, the projected sale amount, and the possibility of the sale occurring.
· Each salesperson is specified by: first name, last name, telephone, date of hire
· Each customer is specified by: title, address, telephone
· Leads that became sales are marked "S" for success. Leads that fail are marked "F" for fail. Leads that do not have a final outcome yet are marked "I" for idle.
The following diagram reflects the design for the database.
Guidelines
1. Attend and participate in assignment labs
2. For each of the questions above, create a clear screenshot that includes the database name, the SQL command, and the produced results. Make sure that you have tested the results for correctness.
3. Show your work to your professor and get the green light to submit the assignment. The instructor will sign the evaluation rubric, allowing you to submit.
7. Submit the report online on BB, and print the report and hand it to your instructor. For late submissions, a 2-mark penalty applies.
Exercises (chapters 7-10)
1. Create a view VE1 that will list customer_id, cust_title, and the total amount for each customer.
2. You want the same grouped results as in 1, but only for customers with a total amount of more than 25,000 (HAVING). Can you do it with a consecutive view VE2 built from VE1? If not, why not? If you cannot do it as a consecutive view, do it as a new view VE3.
3. Create a view VV4 to list customer_id, amount, possibility, cust_title. Create a consecutive view from VV4, named High_Possibility, for leads with possibility > 80. Create a consecutive view from VV4 called TX_CUST_LIST to list the same three attributes for only the Texas customers. Why can you not do it?
4. Using the ROUND function, create an SQL query that will COUNT LEADS by possibility in 10s. Show only the 10s that counted more than three possibilities. Sorted by 10s.
5. Using the FLOOR function, create an SQL query that will COUNT LEADS by AMOUNT in every 5000, but only for leads with an amount greater than the average amount. Sorted by 5000s.
6. Write an SQL query that will display the customers as: customer title in capitals, underscore, city with the first letter capitalized and the rest in lower case, dot, state in capitals, dot, zip code inside brackets [], dot, telephone with the first three characters in parentheses followed by a dash. Example:
NCR_Houston.TX.[55120].(345)-99345625
7. Write an SQL query that will list all leads with an expected date 2000 days before today.
8. Us ...
Weka term paper (Siddharth, 10BM60086)
Data mining technique using WEKA
IT for Business Intelligence
Submitted by:
Siddharth Verma, 10BM60086
WEKA
Data mining isn't solely the domain of big companies and expensive software. In fact, there is a piece of software that does almost all the same things as these expensive packages: WEKA. WEKA is a product of the University of Waikato (New Zealand) and was first implemented in its modern form in 1997. It is released under the GNU General Public License (GPL). The software is written in the Java language and contains a GUI for interacting with data files and producing visual results (think tables and curves). Because it is Java-based, if you don't have a JRE installed on your computer, download the WEKA version that bundles the JRE as well.
To load data into WEKA, we have to put it into a format that will be understood. WEKA's preferred method for loading data is the Attribute-Relation File Format (ARFF), in which we can define the type of data being loaded, then supply the data itself.
When we start WEKA, the GUI Chooser pops up as shown in the figure. It lets us choose four ways to work with WEKA and our data. The four ways are:
Explorer
Experimenter
Knowledge Flow
Simple CLI
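As a minimal sketch of what an ARFF file looks like (the attribute names follow the colleges data used later, but the values shown here are made up for illustration):

```
@relation colleges

@attribute School string
@attribute School_Type {LibArts, Univ}
@attribute SAT numeric
@attribute Acceptance numeric

@data
'Example University', Univ, 1220, 42
'Example College', LibArts, 1150, 55
```

The header declares each attribute's name and type (string, nominal, or numeric), and each line after @data supplies one instance in the same order.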
REGRESSION
Regression is the easiest technique to use, but it is also probably the least powerful. In effect, regression models all fit the same general pattern: there are a number of independent variables which, taken together, produce a result, the dependent variable. The regression model is then used to predict the value of an unknown dependent variable, given the values of the independent variables.
We will perform regression on the colleges data, comparing the colleges on the basis of various attributes. The attributes are:
School: the name of each school
School_Type: coded 'LibArts' for liberal arts and 'Univ' for university
SAT: median combined Math and Verbal SAT score of students
Acceptance: % of applicants accepted
$/Student: money spent per student in dollars
Top 10%: % of students in the top 10% of their h.s. graduating class
%PhD: % of faculty at the institution that have PhD degrees
Grad%: % of students at the institution who eventually graduate
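The idea of fitting independent variables to a dependent variable can be sketched in pure Python. This is a toy least-squares fit with a single independent variable, not WEKA's implementation, and the data points are made up for illustration (not taken from the colleges file):

```python
def fit_simple_regression(xs, ys):
    """Fit y = a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Illustrative (hypothetical) data: one attribute vs. median SAT
xs = [10, 20, 30, 40]
ys = [1000, 1100, 1200, 1300]
a, b = fit_simple_regression(xs, ys)
# The fitted model predicts the dependent variable for new inputs
predicted = a * 25 + b
```

WEKA's LinearRegression does the same thing with many independent variables at once, producing one coefficient per attribute.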
To create our regression model, start WEKA and choose the Explorer. In the Explorer screen, select the Preprocess tab, click the Open File button, and select the ARFF file. After selecting the file, the Explorer window looks as shown below.
The left section of the Explorer window lists all of the columns in the data (Attributes) and the number of rows of data supplied (Instances). Selecting a column makes the right section of the Explorer window show information about the data in that column. For example, selecting the SAT column in the left section changes the right section to show additional statistical information about that column. Finally, there is a visual way of examining the data, available by clicking the Visualize All button.
To create the model, click on the Classify tab. The first step is to select the model we want to build, so WEKA knows how to work with the data and how to create the appropriate model:
1. Click the Choose button, then expand the functions branch.
2. Select the LinearRegression leaf.
This tells WEKA that we want to build a regression model.
Though it may be obvious to us that we want to use the data we supplied in the ARFF file, there are actually different options than what we'll be using. The other three choices are Supplied test set, where we can supply a different set of data to build the model; Cross-validation, which lets WEKA build a model based on subsets of the supplied data and then average them out to create a final model; and Percentage split, where WEKA takes a percentile subset of the supplied data to build a final model. With regression, we can simply choose Use training set. Finally, we set the attribute-selection method to none, so that each attribute's contribution to the regression can be examined. The last step in creating our model is to choose the dependent variable from among the numerical attributes.
Right below the test options, there is a combo box that lets you choose the dependent variable. We choose SAT as the dependent variable. To create our model, click Start. The figure below shows the output window.
INTERPRETATION OF THE RESULT:
SAT = …… + (30.6632 × School) + (−1.245 × School_Type) + (0.0609 × Acceptance) + (0.0341 × Top 10%) + (0.064 × %PhD) + (0.1479 × Grad%) − 1089.2569
Interpreting the pattern and conclusions that our model generated, we see that:
School affects the SAT score the most — WEKA tells us that the School attribute has the largest coefficient in the model.
School type matters little — since School_Type is coded as a simple 0-or-1 value, we can read its small coefficient directly as the difference being a liberal arts school makes to the predicted SAT score.
SAT score has no correlation with money spent per student — WEKA will only use columns that statistically contribute to the accuracy of the model. It will throw out and ignore columns that don't help in creating a good model. So this regression model is telling us that the $/Student attribute has no effect on the SAT score.
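Using a fitted regression model for prediction is just a matter of evaluating the linear equation. A minimal sketch (the coefficient and intercept values below are hypothetical placeholders, not the exact values from the WEKA output above):

```python
def predict(coefficients, intercept, values):
    """Evaluate a linear regression model: the sum of
    coefficient * value over the attributes, plus the intercept."""
    return sum(c * v for c, v in zip(coefficients, values)) + intercept

# Hypothetical coefficients for (School_Type, Acceptance, %PhD)
coeffs = [-1.245, 0.0609, 0.1479]
sat_estimate = predict(coeffs, intercept=900.0, values=[1, 40.0, 75.0])
```

Attributes that WEKA dropped from the model simply contribute no term to this sum.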
CLUSTERING
Clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters. WEKA offers clustering capabilities not only as standalone schemes, but also as filters and classifiers.
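The grouping idea behind k-means (the algorithm we use below) can be sketched in pure Python. This is a toy one-dimensional version with fixed initial centers for reproducibility, not WEKA's SimpleKMeans, and the data values are made up for illustration:

```python
def kmeans_1d(points, centers, iterations=10):
    """Toy k-means on 1-D data: assign each point to its nearest
    center, then move each center to the mean of its points."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assignment step: nearest center by absolute distance
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two obvious groups: low values and high values
points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
```

WEKA's SimpleKMeans follows the same assign-then-update loop, but over all nine rating criteria at once rather than a single dimension.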
To begin with clustering, we will use the data set of places. The data set classifies places on the basis of the nine rating criteria used by the Places Rated Almanac: Climate & Terrain, Housing, Health Care & Environment, Crime, Transportation, Education, The Arts, Recreation, and Economics.
To create the clustering, start WEKA, then choose the Explorer. In the Explorer screen, select the Preprocess tab, click the Open File button, and select the ARFF file. After selecting the file, the Explorer window looks as shown below.
To perform clustering, select the "Cluster" tab in the Explorer and click on the "Choose" button. This brings up a drop-down list of available clustering algorithms; in this case, select "SimpleKMeans". Next, click on the text box to the right of the "Choose" button to get the pop-up window shown in the figure below, for editing the clustering parameters.
In the pop-up window we enter 5 as the number of clusters (instead of the default value of 2) and we leave the value of "seed" as is. The seed value is used in generating a random number which is, in turn, used for making the initial assignment of instances to clusters.
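The role of the seed can be illustrated in Python using the standard library's random module (this mimics the idea, not WEKA's internal generator):

```python
import random

def initial_assignment(n_instances, n_clusters, seed):
    """Randomly assign each instance to one of the clusters.
    The same seed always yields the same assignment."""
    rng = random.Random(seed)
    return [rng.randrange(n_clusters) for _ in range(n_instances)]

a = initial_assignment(10, 5, seed=42)
b = initial_assignment(10, 5, seed=42)
# Identical seeds produce identical initial assignments,
# which is why a fixed seed makes a clustering run repeatable.
```

Changing the seed changes the starting assignment, which can lead k-means to a different final set of clusters.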
Once the options have been specified, we can run the clustering algorithm. We make sure that in the "Cluster Mode" panel the "Use training set" option is selected, and we click "Start". We can right-click the result set in the "Result list" panel and view the results of clustering in a separate window.
We can choose the cluster number and any of the other attributes for each of the three different dimensions available (x-axis, y-axis, and color). Different combinations of choices will result in a visual rendering of different relationships within each cluster.
INTERPRETING THE RESULT:
Each cluster characterizes a type of place, from which we can begin to draw some conclusions:
Cluster 0: Transportation facilities are the best here, and the place excels in education as well. It is also rich in arts and recreation.
Cluster 1: This place has the least crime. Relatively low utility bills, property taxes, and mortgage payments make it a favourable place to live in.
Cluster 2: This place has the highest violent crime rate and property crime rate. People here pay the highest utility bills, property taxes, and mortgage payments, but it has the best climate and terrain, and the best health care and environment too. Transport facilities are also good. The place is rich in arts and recreation, with various venues that are the best in their categories.
Cluster 3: This place has the lowest utility bills, property taxes, and mortgage payments, and is therefore favourable. Transport facilities are in bad shape. There are few or no avenues of arts and recreation. It has the lowest average household income of all.
Cluster 4: This place has a high crime rate. Health care and environment are the worst among all places, and so are the education facilities. It has the highest average household income of all.