Summer Internship Report Page 1
Summer Internship Report
8th July, 2016
Submitted By
Durga Kant Gupta
(Roll No. 13267)
Undergraduate student at IIT Kanpur
Department: Biological Sciences and Bio Engineering
Under the Guidance of
IndiaMART Guide: Mr. Somesh Kumar, VP Business Analytics, IndiaMART InterMESH Ltd.
IndiaMART Co-Guide: Mr. Anirudh Singh, Asst. VP Business Analytics, IndiaMART InterMESH Ltd.
CONTENTS
ACKNOWLEDGEMENT
ABOUT THE COMPANY
CORE VALUES
PRODUCTS
LISTING SERVICES
BUY LEADS
ACCESS TO SERVICE
SOFTWARE OR LANGUAGES USED
DESCRIPTION OF PROJECTS / ACTIVITIES
PROJECT#1
AIM
PROCEDURE
COMPARISON WITH THE CURRENT SEARCH ALGORITHM
RESULT
PROJECT#2
AIM
PROCEDURE
PROJECT#3
AIM
DATA DESCRIPTION
PROCEDURE
PROJECT#4
AIM
PROCEDURE
PROJECT#5
PROCEDURE
BIBLIOGRAPHY
APPENDIX
CERTIFICATE OF INTERNSHIP COMPLETION
This is to certify that Mr. Durga Kant Gupta, a 3rd year undergraduate student of Biological
Sciences and Bio Engineering department at Indian Institute of Technology, Kanpur has
successfully completed his summer internship from 9th May, 2016 to 8th July, 2016.
During this period his performance was excellent and we found him dedicated, hardworking and
sincere. We have derived immense benefit from the project and his contribution to our
organization is highly appreciated.
I hereby convey my best wishes to him for all his future endeavors.
Somesh Kumar | VP - Business Analytics
Mobile: +91-9717776552
Email: somesh.kumar@indiamart.com
IndiaMART InterMESH Ltd.
"Kaam Yahi Banta Hai"
7th Floor, Advant-Navis Business Park,
Plot No -7 Sector-142, Noida - 201305
Ph: +91-(0120)-6777 777 Extn : 7787
ACKNOWLEDGEMENT
I take this opportunity to extend my sincere thanks to IndiaMART for offering me a unique
platform to gain exposure and garner knowledge in the field of Business Analytics.
I would like to extend my heartfelt gratitude to my Internship guide Mr. Somesh Kumar and
co-guide Mr. Anirudh Singh for having made my summer training a great learning experience
by their constant guidance, encouragement and support.
Last but not least, I would like to express my profound gratitude to each and every employee
of the Business Analytics Division, IndiaMART InterMESH Limited, who contributed in their own
ways to the successful completion of my internship.
Durga Kant Gupta
ABOUT THE COMPANY
IndiaMART is India's largest B2B online marketplace, connecting buyers with suppliers. The
online channel focuses on providing a platform for buyers, who can be SMEs, large enterprises
or individuals. Buyers gain access to a wider marketplace and diverse portfolios of
quality products to choose from, and can tap a one-stop shop which caters to all their specific
requirements, thereby helping the discerning buyer make well-informed choices.
IndiaMART offers a platform and tools to over 2.6 crore buyers to search from over 3.3 crore
products and get connected with over 22 lakh reliable and competitive suppliers. Founded in
1996, the company's mission is 'to make doing business easy'.
CORE VALUES:
There are four core values of IndiaMART, known in short as TRIP.
• Team Work: “Together we can achieve the impossible” is our belief. Our success is a
result of our team work. We have experts from the field of management, marketing, IT,
arts, content & various other disciplines who work cordially as a team on every project,
every endeavor. Dedication and passion are the true means to our mission fulfillment.
• Responsible: Responsible, not just for quality work but for continuous self-development,
of our decisions and of our actions. This helps us think rationally and provides a sense of
accountability to ourselves, our commitment to customers and to our colleagues.
• Integrity: We realize the importance of the job & information we handle. We understand
the responsibility that each member of our team has to shoulder and we do that with
highest levels of trust, honesty and integrity – of purpose and action.
• Passion: Work at IndiaMART involves constant innovation and creativity. It involves a
continuous thought process to get tangible benefits for our customers, taking into account
the uniqueness of their purpose. Passionate people with a determination to make the
difference are the ones who make this possible.
Customers are of two types:
• Buyers: Users who use the service with an intention to buy something.
• Suppliers: Users who use the service with an intention to sell something.
A customer can be both a Buyer and a Supplier.
Suppliers are of two types:
• Free Listed Customers: Use the basic service, which is available at zero cost.
• Paid Customers: Have bought products and listing services at a cost.
IndiaMART works on a freemium model. It earns revenue from the products, listing
services and buy leads packages.
PRODUCTS:
IndiaMART offers the following products:
• MDC (Mini Dynamic Catalogue): IndiaMART develops a compact 4-page home page
showcasing key strengths of the customer, a Zoom up Window for detailed product view
and Preferred Number Service. Website hosted on a sub domain. Also 10 Buy
Leads/Tenders worth Rs. 2000 free every month under IndiaMART Advantage Program.
• Maximiser: Website hosted on a personalized domain. 360 degree visibility through
PDF/Mobile Video (30 sec). 10 Buy Leads/Tenders worth Rs. 2000 free every week
under IndiaMART Advantage Program. Add up to 400 products. Preferred Number
Service.
LISTING SERVICES:
The various listing services are as follows:
• TrustSEAL: Third party verified TrustSEAL report. Edge over non-certified competitors
online. Certified members attract genuine buyers & more business enquiries.
• Star supplier: Priority listing among other catalog clients. Corporate video of supplier's
company. Preferred Number Service. 15 Buy Leads/Tenders worth Rs. 3000 free every
week under IndiaMART Advantage Program.
• Leading supplier: Priority listing among all clients. Corporate video of supplier's
company. Preferred Number Service. 20 Buy Leads/Tenders worth Rs. 4000 free every
week under IndiaMART Advantage Program.
• Keyword Premium Listing: Listing of clients as per Keywords (for products to be
bought) typed by buyers. One keyword can be bought by a single supplier only.
• Featured Premium Listing: Listing of clients as per their preferred city for business.
• Industry leader: Top priority listing service. Will always be listed on top whenever a
product is searched related to that industry. Only one supplier can be an Industry leader
of any particular industry.
These listing services help in Search Engine Optimization which facilitates the visibility of
suppliers on the platform.
BUY LEADS:
Buy Leads provide instant access to Buyers and their requirements.
Buy Leads are generated in three ways:
• Free buy requirement: Buy requirement made by buyers to IndiaMART.
• Direct buy requirement: Buy requirement made by buyers directly to the suppliers.
• Intent: Our system analyzes activities of users on the website and application, and figures
out their intent to buy a product, if any. It then creates buy leads and posts them to the
supplier's account after verification.
These leads are posted to the supplier's account and suppliers can buy the leads as per their requirement.
The Buy Leads package thus provides a pre-paid system for holding leads in a supplier's account, which they
can consume at any point of time.
ACCESS TO SERVICE:
Customers access the service from both the website and the mobile application.
IndiaMART Website:
Suppliers can purchase any of the products or listing services for three different tenures, i.e.
monthly, annually or 3-years (multi yearly).
SOFTWARE OR LANGUAGES USED:
• R
R is a free software environment and programming language for statistical analysis and
graphics. The R language is widely used by statisticians and data miners for statistical
analysis and for developing statistical software. R runs on a wide variety of UNIX platforms,
Windows and macOS. I used R to perform various statistical analyses and text mining. Some
examples of the packages used are stringr, stringdist and plyr.
Version used: R 3.2.1
• SQL
I used SQL to extract the data required for the analysis from the online database system.
SQL (Structured Query Language) is a standard interactive and programming language
for getting information from a database. Queries take the form of a command language
that lets you select, insert, update, find out the location of data and so forth. There is also a
programming interface.
DESCRIPTION OF PROJECTS / ACTIVITIES
PROJECT#1
AIM:
Analysis and implementation of an algorithm which maps a product to its most relevant Mcat by
considering the maximum string match and the maximum number of leads for that Mcat in the
previous three months.
DATA DESCRIPTION:
Product Name –
Contains the list of all the product names to which we have to assign the most relevant Mcat.
PC_ITEM_GLUSR_ID PC_ITEM_ID PC_ITEM_NAME MCAT_ID
Lead Name –
Contains the LEAD_OFR_TITLE against which the product name is matched; the corresponding
MCAT_ID is stored in Match_Results.
ETO_OFR_TITLE MCAT_ID ETO_OFR_GLCAT_MCAT_NAME
Match Results –
The output, containing 7 columns with the information related to the best match between OFR_TITLE and
product names.
Match Results Final –
Here we also considered the no. of leads corresponding to the Mcat IDs selected by string
matching. The two files are merged by GL_MCAT_ID and the result is sorted by the no. of leads.
PROCEDURE:
1. First I removed commas and parentheses from PC_ITEM_NAME, and then the extra spaces
left behind by removing them. This was done using regular expressions in R.
2. Then I removed commas and parentheses from ETO_OFR_TITLE, and then the extra spaces
left behind by removing them. This was also done using regular expressions in R.
3. Since I had to match PC_ITEM_NAME with LEAD_OFR_TITLE and find out how closely
they match, I had to break up PC_ITEM_NAME into smaller fragments.
4. So, I split PC_ITEM_NAME into single words using the strsplit() function in R and put
the output in a list. Splitted_Row is the list of split PC_ITEM_NAME values.
5. Similarly I split LEAD_OFR_TITLE and stored the output in a list.
Splitted_OFR_ID is the list of split LEAD_OFR_TITLE values.
6. Next I created vectors from the columns of the Product_Name matrix and put them in a
list so that accessing their elements becomes easy.
7. Similarly I created vectors from the columns of the Lead_Name matrix and put them in a
list so that accessing their elements becomes easy.
8. Then I combined these vectors using the cbind() function in R and formed two datasets,
namely product_list1 and Lead_list1, each of which holds lists inside a list.
9. After all this data preparation, and before starting the loop, I created an initial
dataframe named Match_Results with the columns OP_USR_ID, OP_ITEM_ID,
OP_ITEM_NAME, OP_MCAT_ID, OP_LN_OFR_TITLE, OP_LN_MCAT_NAME and
OP_LN_MCAT_ID, all initialised to zero.
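The cleaning and splitting steps can be sketched in base R as follows. This is a minimal illustration and not the code used in the project: the sample product names and the helper name clean_text() are assumptions made here, since the original data and scripts are not reproduced in this report.

```r
# Hypothetical sample data standing in for PC_ITEM_NAME
product_names <- c("Steel Pipe, (Galvanized)", "Cotton  T-Shirt (Round Neck)")

# Hypothetical helper: strip commas and parentheses, then tidy the spaces
clean_text <- function(x) {
  x <- gsub("[,()]", " ", x)   # replace commas and parentheses with a space
  x <- gsub("\\s+", " ", x)    # collapse runs of whitespace into one space
  trimws(x)                    # drop leading and trailing spaces
}

cleaned <- clean_text(product_names)

# Split each cleaned name into single words; strsplit() returns a list
# with one character vector per product name
split_rows <- strsplit(cleaned, " ", fixed = TRUE)
```

The same helper could be applied to the lead titles so that both sides of the comparison are cleaned identically.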
Loop:
1. For a particular row in product_list1, which contains the split PC_ITEM_NAME,
access its elements one by one and check whether each matches any of the elements in
Lead_list1, which contains the split LEAD_OFR_TITLE.
2. Once a match is found, increase the score s by one and check the next split word of
the same PC_ITEM_NAME.
3. If the last word of the split PC_ITEM_NAME matches any fragment of the split
LEAD_OFR_TITLE, then increase t by 1.
4. Similarly, if the second last word of the split PC_ITEM_NAME matches any
fragment of the split LEAD_OFR_TITLE, then increase v by 1.
5. After checking all the fragments of the split PC_ITEM_NAME against the split
LEAD_OFR_TITLE, check the values of s, t and v.
6. To consider a LEAD_OFR_TITLE as a match it has to satisfy certain criteria. The values of
t and v should not be equal to 0, which means that the last and second last words must
compulsorily match.
7. The next condition for considering a LEAD_OFR_TITLE as a match is that it should satisfy a
certain threshold of percentage match with PC_ITEM_NAME, which varies with the
length of the PC_ITEM_NAME.
8. After deciding whether this is a match or not, go to the next
LEAD_OFR_TITLE and do the same. This has to be done for every LEAD_OFR_TITLE.
9. If a particular LEAD_OFR_TITLE is considered a match, then put this
LEAD_OFR_TITLE in OP_LN_OFR_TITLE, which is initially an empty vector. Similarly put the
corresponding MCAT_ID and MCAT_NAME in the OP_LN_MCAT_ID and
OP_LN_MCAT_NAME vectors respectively.
10. Similarly, for a particular PC_ITEM_NAME, if a match is found in the
LEAD_OFR_TITLE, then put the corresponding MCAT_ID, PC_ITEM_NAME,
GLUSR_ID and PC_ITEM_ID in the empty vectors OP_MCAT_ID, OP_ITEM_NAME,
OP_USR_ID and OP_ITEM_ID respectively, and use the rbind() function to repeat the
observations for as long as the loop iterates over LEAD_OFR_TITLE.
11. Now use the cbind() function to combine all of the above mentioned vectors and give the
result the name Match_Results_K. This is only for one PC_ITEM_NAME.
12. So, repeat this for every PC_ITEM_NAME and use the rbind() function to collect the final result in
a dataframe named Match_Results.
13. Before going on to the next PC_ITEM_NAME, empty all the vectors so that they can
store the new values related to the next PC_ITEM_NAME.
14. Finally, remove the first row of zeroes from Match_Results and merge it with the
Lead_Count data by MCAT_ID. The final result, Match_Results_Final, has
all the Match_Results data along with the no. of leads corresponding to every
PC_ITEM_NAME.
15. Then I exported this data in csv format using the write.csv() command in R, sorted the
output in Excel by PC_ITEM_NAME and then added a second sort level on the no. of leads.
16. This gave me the final output, which contained each PC_ITEM_NAME and all the
LEAD_OFR_TITLE values considered a match, sorted according to the
no. of leads. The LEAD_OFR_TITLE with the maximum no. of leads comes at the top for a
particular PC_ITEM_NAME.
OP_LN_MCAT_ID OP_USR_ID OP_ITEM_ID OP_ITEM_NAME
OP_MCAT_ID OP_LN_OFR_TITLE OP_LN_MCAT_NAME NO._OF_LEADS
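The matching rule and the final merge can be sketched compactly as below. This is an illustrative simplification rather than the project code: the function name score_match(), the sample data, the lower-case column names and the fixed threshold min_share = 0.5 are all assumptions (the report states only that the threshold varies with the length of the product name), and single-word product names are treated here as trivially passing the second-last-word check.

```r
# Hypothetical scorer: both arguments are character vectors of single words
score_match <- function(product_words, lead_words, min_share = 0.5) {
  n <- length(product_words)
  s <- sum(product_words %in% lead_words)            # number of matched words
  t <- as.integer(product_words[n] %in% lead_words)  # last word matched?
  # second last word matched? (assumed to pass when there is only one word)
  v <- if (n >= 2) as.integer(product_words[n - 1] %in% lead_words) else 1L
  list(s = s, t = t, v = v,
       is_match = t > 0 && v > 0 && (s / n) >= min_share)
}

# Hypothetical sample data
product_words <- c("galvanized", "steel", "pipe")
leads <- data.frame(
  title   = c("steel pipe", "galvanized steel pipe fittings", "plastic pipe"),
  mcat_id = c(101, 102, 103),
  stringsAsFactors = FALSE
)
lead_count <- data.frame(mcat_id = c(101, 102, 103),
                         no_of_leads = c(40, 75, 10))

# Score every lead title against the product name
lead_words <- strsplit(leads$title, " ", fixed = TRUE)
keep <- vapply(lead_words,
               function(w) score_match(product_words, w)$is_match,
               logical(1))

# Attach lead counts to the matches and put the Mcat with most leads on top
matches <- merge(leads[keep, ], lead_count, by = "mcat_id")
matches <- matches[order(-matches$no_of_leads), ]
```

Vectorising with %in% replaces the word-by-word inner loop, while the acceptance criteria on t, v and the match share follow the steps listed above.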
COMPARISON WITH THE CURRENT SEARCH ALGORITHM:
When our algorithm, which maps a product to the Mcat that gets the
maximum no. of leads, was ready, Mr. Samarendra Pratap (AVP, Product Management, IndiaMART) gave
us a list of 9000 products.
We had to run our algorithm on these products and find the difference in the no. of leads for a
particular product between the Mcat assigned by our algorithm and the Mcat assigned by the current search algorithm
which is live in the system.
DATA DESCRIPTION:
1. Samar_Products
Table containing all the products with more than one Mcat assigned. Elements in the first column
repeat themselves instead of being left as "" so that merging is possible when required.
2. Lead_Count
The master Mcat table, which contains the no. of leads corresponding to every Mcat.
3. paid_supplier_new_products_mcat1
All the products with more than one Mcat; items in the left column repeat themselves.
4. Samar_Products_Max_Leads
Data containing the product and the corresponding Mcat which comes on top in search
results when searched on the IndiaMART portal.
5. Somesh_Final_Results
Data of all the products and all the corresponding Mcats with the no. of leads.
6. Somesh_Final_Max_Results
A subset of Somesh_Final_Results which contains only the Mcats with maximum leads.
7. Samar_Max_Results
Data containing the no. of leads corresponding to only the Mcat which comes on top in search
results.
8. paid_supplier_new_products_mcat
The original data set of 9k products with blank rows removed.
9. Samar_Final_Max_Results
Data containing the product and the corresponding Mcat which comes on top in search
results, and also the lead count.
10. paid_supplier_new_products_mcat2
Topmost Mcat corresponding to a product.
LOOP:
1. Firstly I removed the products which have only one Mcat, because in that case no
comparison could be made.
2. To facilitate merging I had to repeat the product in the 1st column for each of its
Mcats.
For this I checked whether the first column is blank while the Mcat column has some value in it,
and if so copied the product value of the previous row into the current cell.
3. Then I merged the table with the Lead_Count table, which contains the no. of leads for each Mcat.
4. The result is a table which contains all the products with their corresponding Mcats and
the no. of leads.
5. Finally I kept only the Mcat which had the maximum leads for each
product.
6. I also created a table which contains only the one Mcat which comes on top when the
particular product is searched on the IndiaMART portal.
7. I merged this table with the Lead_Count data by Mcat ID. Now we have the
product, the corresponding Mcat which comes on top when searched on the IndiaMART
portal and the corresponding no. of leads.
8. After all this we can find the difference between the no. of leads of the Mcat
assigned by our algorithm and that of the Mcat which comes on top when searched.
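Steps 2 to 5 can be sketched in base R as follows. The table contents and the lower-case column names are hypothetical stand-ins for the real files listed in the data description.

```r
# Hypothetical miniature of the product/Mcat table with blank product cells
prod_mcat <- data.frame(
  product = c("steel pipe", "", "cotton shirt", ""),
  mcat_id = c(101, 102, 201, 202),
  stringsAsFactors = FALSE
)
lead_count <- data.frame(mcat_id = c(101, 102, 201, 202),
                         no_of_leads = c(40, 75, 30, 12))

# Step 2: copy the previous row's product into each blank product cell
for (i in seq_len(nrow(prod_mcat))[-1]) {
  if (prod_mcat$product[i] == "") {
    prod_mcat$product[i] <- prod_mcat$product[i - 1]
  }
}

# Steps 3-4: attach the no. of leads to every product/Mcat pair
merged <- merge(prod_mcat, lead_count, by = "mcat_id")

# Step 5: keep only the Mcat with the maximum leads for each product
merged <- merged[order(merged$product, -merged$no_of_leads), ]
best <- merged[!duplicated(merged$product), ]
```

Sorting by product and descending lead count before dropping duplicates leaves exactly one row per product, the one with the most leads.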
RESULT:
After analyzing and comparing the output files, it was found that on average 142 leads
come to the Mcat which comes on top when a product is searched.
If we instead apply our algorithm and assign the Mcat with maximum leads to the same
product, the average no. of leads would be 321.
Therefore the gain is 179 leads per product, which should make suppliers happier and
more engaged on the IndiaMART portal.
PROJECT#2
AIM:
Analysis regarding the auto-rejection of intent generated leads by matching their secondary
Mcats against deleted leads. Also to find the potential loss that would occur if the auto-rejection system
were implemented.
DATA DESCRIPTION:
Deleted_total.csv
The list of all the deleted leads with codes 1 and 45, which denote manual deletion and
auto deletion. The corresponding deletion date is also given (1st - 7th May).
FK_GLUSR_USR_ID ETO_OFR_FENQ_DATE FK_GLCAT_MCAT_ID
Approved_total.csv
This is the data of leads which were approved in the same period. From this we can infer the
potential loss by matching the secondary Mcats.
Secondary_Mcat.csv
The data of all the leads which were live during 1st - 7th May.
From this data we can check whether there are leads with the same secondary Mcats as in the deleted leads.
That many leads could have been auto-rejected.
We can also check for the potential loss by finding leads with the same secondary Mcats as in the
approved leads. That many approved leads would have been auto-rejected if we had implemented
the auto-rejection system.
mydata
The output file, which had the following format.
USR_ID OFR_ID MCAT_ID
PROCEDURE:
To find the leads which could have been auto-rejected:
1. First of all, to find all the leads which could be auto-rejected, I looked for cases where a lead
had the same secondary Mcat as a deleted lead.
2. Next I checked a further condition: leads with a secondary Mcat
match must also have the same GLUSR_ID.
3. The next condition was that the lead must have been offered before the deletion.
4. Since the Secondary_Mcat table also contains some primary Mcats, the final
condition was to ignore any case where the match was on a primary Mcat.
To check for potential loss:
1. Here I followed the same steps as mentioned above. The only change is to use the
Approved table instead of the Deleted table.
2. The Approved table contains all the approved leads. I applied the same conditions as
mentioned above and found the leads with matching secondary Mcats.
3. These matches imply that this many leads would be auto-rejected if we implemented the
auto-rejection system. In this way we came to know the potential loss.
4. For both of the above activities the output was in the format described above (USR_ID, OFR_ID, MCAT_ID).
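One way to read the four conditions is as a merge followed by a filter, sketched below. All data, the lower-case column names (including is_primary, offered_on and deleted_on) and the way the primary/secondary distinction is flagged are assumptions made for illustration; the report does not show how these fields were actually stored.

```r
# Hypothetical deleted leads: user, Mcat and deletion date
deleted <- data.frame(
  usr_id     = c(1, 2),
  mcat_id    = c(500, 600),
  deleted_on = as.Date(c("2016-05-03", "2016-05-04"))
)

# Hypothetical live leads with their Mcats (secondary and some primary)
secondary <- data.frame(
  usr_id     = c(1, 1, 2, 3),
  ofr_id     = c(11, 12, 21, 31),
  mcat_id    = c(500, 500, 600, 500),
  is_primary = c(FALSE, FALSE, TRUE, FALSE),
  offered_on = as.Date(c("2016-05-02", "2016-05-05", "2016-05-01", "2016-05-02"))
)

# Conditions 1 and 2: same Mcat and same user as a deleted lead
candidates <- merge(secondary, deleted, by = c("usr_id", "mcat_id"))

# Condition 3: offered before the deletion; condition 4: ignore primary Mcats
auto_reject <- candidates[candidates$offered_on < candidates$deleted_on &
                            !candidates$is_primary, ]
```

Replacing the deleted table with the approved table in the same sketch would give the count used for the potential loss.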
RESULT:
1. The no. of deleted leads which could have been auto-rejected was found to be
2690.
2. The no. of approved leads which would have been auto-rejected by
implementing the auto-rejection system was found to be 8440.
3. Therefore the loss due to the rejection of good leads would be greater than the
cost saved by rejecting false positive leads. So, it was decided not to implement the
auto-rejection system.
PROJECT#3
AIM:
For a particular ticket raised by a customer, find which of the standard issues are present in the
description of the ticket by string matching. The expected output is a matrix with the following format: each
row corresponds to a Ticket_ID and the columns correspond to the standard issues
which can possibly be the reasons for ticket generation. A column should contain 1 if that issue
is present, else 0.
DATA DESCRIPTION:
Somesh_Ticket:
Data related to the tickets raised by suppliers. It contains their user IDs, customer ticket IDs, the
date of ticket issue, ticket detail, appendum and the history of the ticket.
GLID TICKET_ID ISSUE_DATE TICKET_DETAIL APPENDUM TICKET_HISTORY
Result:
Output table containing all the above mentioned columns plus new columns holding 1 or 0.
The new columns are created to capture particular issues, if they are present in the text provided by
a particular customer. They are the following:
STOP NO_BENEFIT NO_MATURITY NO_TIME_TO_USE
FAKE_BUYERS IRRELEVANT_ENQUIRIES FAKE_BL NOTICE_PERIOD
BUSINESS_CLOSED HYPER_LOCAL_BUYER MISCOMMITMENT PHYSICAL_VISIT
NOT_TECH_SAVY BUYER_WANT_LOW_PRICE LANGUAGE_BARRIER CHANGES_REQUIRED
WRONG_PRODUCT WRONG_IMAGE WRONG_CATALOG CHANGE_NUMBER
CHANGE_EMAIL CLOSE_ACCOUNT NOT_INTERESTED
REMOVE_PRICE DIFFERENT_CATALOG CHANGE_AFTER_APPROVAL
PROCEDURE:
1. Firstly I created an initial dataframe with the above mentioned column names, so that they
represent the different standard issues, and assigned them the value 0.
2. The list of standard issues is as follows:
1."stop the service/ Deactivate the service"
2."Did not get benefit/ No Benefit/Did not get Business"
3."Did not get maturity/ No Maturity/ Maturity Issues/Deal Not Maturing"
4."No time to use the service"
5."Buyers not responding/fake Buyers/Fake Leads"
6."Irrelevant Enquiries/ Less Enquiries/Bulk Enquiries/Low Enquiry/Retail
Enquiry/Wrong Enquiry"
Enquiry/Inquiry/Query
7."Wrong Buy Leads/Fake Buy Lead/ Wrong BL"
Buy Lead= BL
8."Notice Period"
9."Business Closed/Out of India/Partnership Issue/Personal reason/Changed My
Business"
10."Need hyper-local buyer/ Hyper local enquiries"
11."Wrong commitment from sales/Miscommitment/mis-commitment"
12."physical visit"
13."Client is not Tech Savy/Tech Savvy/ Computer Savvy"
14. "Buyer quote very less price/Buyer asking Low Price/Buyer want low price"
15. "Language Barrier/Tamil"
16. Changes required
17. Wrong Product
18. Wrong Image
19. Wrong Catalog
20. Change number
21. Change email
22. Close the account
23. Not interested
24. Remove Price
25. not the same catalog that i approved
26. change after approval/ change after hosting approval
3. In every ticket description I checked for the above strings using the grepl() function in R.
Each string has a corresponding column in the output dataframe.
4. If the string was found to be present I put 1 in that column for that particular row, else I put
zero.
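The flagging step can be sketched with grepl() as below, using four of the standard issues. The ticket texts are hypothetical, and the use of ignore.case = TRUE is an assumption, since the report does not say how letter case was handled.

```r
# Hypothetical tickets; only the columns needed for the illustration
tickets <- data.frame(
  TICKET_ID = c(1, 2, 3),
  TICKET_DETAIL = c("Please stop the service, no benefit so far",
                    "Wrong image on my catalog",
                    "Need to change number"),
  stringsAsFactors = FALSE
)

# A few of the standard issues; alternatives are joined with "|"
issue_patterns <- c(
  STOP          = "stop the service|deactivate the service",
  NO_BENEFIT    = "did not get benefit|no benefit|did not get business",
  WRONG_IMAGE   = "wrong image",
  CHANGE_NUMBER = "change number"
)

# One 0/1 column per issue: 1 if the pattern occurs in the ticket text
flags <- sapply(issue_patterns, function(p) {
  as.integer(grepl(p, tickets$TICKET_DETAIL, ignore.case = TRUE))
})
result <- cbind(tickets, flags)
```

Keeping the patterns in a named vector means a new standard issue only needs one more entry, and its name becomes the output column.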
PROJECT#4
Aim:
To predict whether a customer is going to renew his subscription or not.
DATA DESCRIPTION:
Complete file.csv
Data containing information about the customers, e.g. their turnover, the state from
which they operate, how many employees they have, and their business type, i.e. manufacturer,
wholesale trader, service provider etc.
PROCEDURE:
• Import the data file into R.
• Treat the columns of the input as the deciding variables.
• Create a model to predict whether an existing customer is going to renew his subscription
at the end of his subscription cycle or not.
• Use the C5.0 decision tree algorithm in R to create a large set of rules which will be used for the final
predictions as stated above.
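A rule-based C5.0 model can be fitted with the C50 package roughly as sketched below. The customer table is entirely synthetic, the column names are illustrative, and the renewal label is generated from a made-up rule purely so that the example has something to learn; none of this reflects the real data or results. The model is fitted only if the C50 package is installed.

```r
# Synthetic customer data; column names are illustrative assumptions
set.seed(1)
customers <- data.frame(
  turnover  = factor(sample(c("low", "mid", "high"), 200, replace = TRUE)),
  biz_type  = factor(sample(c("manufacturer", "wholesaler", "service"),
                            200, replace = TRUE)),
  employees = sample(1:500, 200, replace = TRUE)
)
# Made-up label, used only to give the example a learnable pattern
customers$renewed <- factor(ifelse(customers$employees > 100, "yes", "no"))

if (requireNamespace("C50", quietly = TRUE)) {
  # rules = TRUE asks C5.0 for a rule set instead of a single tree
  model <- C50::C5.0(renewed ~ turnover + biz_type + employees,
                     data = customers, rules = TRUE)
  predicted <- predict(model, newdata = customers)
}
```

In practice the predictions would be made on customers held out from training, so that the accuracy of the rules can be judged fairly.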
PROJECT#5
Aim:
To find the most frequently recurring Brands for a particular Mcat so that they can form a separate
category. Also to find the most asked-for specifications for products, so that only these
specifications are made compulsory for the agent to enquire about and the less important
ones can be dropped to reduce the calling cost.
DATA DESCRIPTION:
Direct_Text:
Table containing only 2 columns, i.e. the lead description and the Mcat of direct leads. This
description is in text format and contains all the brands and specifications which we have to find
out.
Brand_Data:
The table which contains the description of the lead, Brand name and Mcat name (3 columns).
These brand names (2nd column) have been extracted from the 1st column of the above table.
Brand_Data_Result:
The output table, which contains the 3 columns of Brand_Data mentioned above along with all
the specifications corresponding to that lead, e.g. size, quantity, budget etc.
PROCEDURE:
To find the Brand Names :-
1. Firstly I was instructed to look for the Brand names in the lead description text.
2. I considered only two columns, namely ETO_OFR_DESC and the MCAT.
3. In the description column of this new sheet I searched for the word “Brand” so that
we could find the leads in which Brands were indeed mentioned.
4. Then we took the text after the word “Brand” up to the start of the next line, which
therefore included multiple Brand names separated by “and”, a comma or “or”.
5. After splitting the text on these separators, all the brands can be separated.
6. Put each of these Brands and its corresponding Mcat in a separate row.
7. Remove the rows which contain wrongly captured Brand names, e.g. 'any', 'other' and
'all'.
8. Remove the Brands which start with a number, except “3m” because that is a brand.
9. At this point we have all the genuine Brands and the corresponding Mcats. The
output is in the following format.
Description Brand_Name Mcat_Name
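The brand extraction steps can be sketched in R as below. The helper name extract_brands(), the sample descriptions and the exact regular expressions are assumptions made for illustration, not the code used in the project.

```r
# Hypothetical helper: pull brand names out of one lead description
extract_brands <- function(desc) {
  # Text from the word "Brand" up to the end of that line
  m <- regmatches(desc, regexpr("Brand[: ]*[^\n]*", desc, perl = TRUE))
  if (length(m) == 0) return(character(0))   # no "Brand" in this lead
  brand_text <- sub("^Brand[: ]*", "", m)

  # Split on commas and on the whole words "and" / "or"
  parts <- strsplit(brand_text, ",|\\band\\b|\\bor\\b", perl = TRUE)
  brands <- trimws(unlist(parts))
  brands <- brands[brands != ""]

  # Drop wrongly captured words and names starting with a digit, except "3m"
  brands <- brands[!tolower(brands) %in% c("any", "other", "all")]
  brands[!grepl("^[0-9]", brands) | tolower(brands) == "3m"]
}

b1 <- extract_brands("I want steel pipes.\nBrand: Tata, Jindal and any\nQuantity: 50")
b2 <- extract_brands("Brand: 3M or 5star")
b3 <- extract_brands("No preference mentioned")
```

The word boundaries around “and” and “or” keep brand names that merely contain those letters from being split in the middle.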
To find the specifications:
Here a procedure quite similar to the one above was followed. First I
split the text using the strsplit() function in R by “:” because almost all the
specifications had this in common, e.g. Budget: 50000 INR, Model: 5690 etc.
1. The split text is now in list format. Check whether the length of this list is less than 2. If
yes, then keep only the first three columns in the final output. Otherwise split the list
by “\n”.
2. After doing this, access the elements of the list one by one and attach the last word of a
string to the first word of the next consecutive string. This attachment can be done with the paste()
function in R.
3. Now put the result of the attachment in a different column of the result. It was decided
that the maximum no. of columns can go up to 10 so that almost all the specifications can be
captured.
4. In a particular lead the word “Brand:” can be anywhere: before, after or in
between the specifications. To keep the specifications separate, I
looked for the word “Brand:” in all the columns of a particular row, and wherever I found
it I swapped that value with the value in the 4th column, so that
all specifications come in the 5th to 10th columns and not before.
5. The header of the output is in the following format. The 2nd and the 4th columns have the
same value. It is just that the Brand name in the 4th column comes after “Brand:”.
Description Brand Mcat Brand: Spec1 Spec2 Spec3 Spec4 Spec5 Spec6
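The sketch below shows a simplified route to the same layout, not the paste()-based procedure described above. It assumes that each specification sits on its own line in the form “Name: value”, which may not hold for every lead; the sample description and variable names are hypothetical.

```r
# Hypothetical lead description with one specification per line
desc <- "Steel pipes needed\nBudget: 50000 INR\nBrand: Tata\nModel: 5690"

# Split into lines and keep only those that look like "Name: value"
desc_lines <- trimws(unlist(strsplit(desc, "\n", fixed = TRUE)))
spec_lines <- desc_lines[grepl(":", desc_lines, fixed = TRUE)]

# Keep the "Brand:" entry apart so that only specifications remain
is_brand <- grepl("^Brand:", spec_lines)
brand_entry <- spec_lines[is_brand]
specs <- spec_lines[!is_brand]

# Pad or truncate to the six specification columns Spec1..Spec6
specs <- c(specs, rep(NA_character_, 6))[1:6]
```

Separating the “Brand:” entry by pattern avoids the swap step, because the specifications never share a column with the brand in the first place.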
The summary of the Results:-
1. Using pivot table, we generated the summary in which the counts of all the Mcats
mapped to a particular Brand are mentioned.
2. Similarly we created another summary sheet which contained the counts of Brands
mapped to a particular Mcat.
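The same two summaries can also be produced in R with table() instead of a spreadsheet pivot table. The data below is hypothetical; the column names follow the Brand_Data output format.

```r
# Hypothetical brand/Mcat pairs, one row per lead
brand_data <- data.frame(
  Brand_Name = c("Tata", "Tata", "Jindal", "Tata"),
  Mcat_Name  = c("Steel Pipes", "Steel Rods", "Steel Pipes", "Steel Pipes"),
  stringsAsFactors = FALSE
)

# Rows are brands, columns are Mcats, cells are the number of leads
brand_by_mcat <- table(brand_data$Brand_Name, brand_data$Mcat_Name)

# Transposed view: rows are Mcats, columns are brands
mcat_by_brand <- t(brand_by_mcat)
```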
To find the count of Brands and Specifications for a particular Mcat :-
In this part the required output is in the following format.
Mcat_Name Count_Of_All_Brands Brand_Name Individual_Brand_Count Specifications Specification_Freq
1. The output of the previous activity has been used as input for this one with certain
modifications.
2. The first modification required was to remove the values attached to the specifications, e.g.
for Budget: 50000 INR I had to remove the value after “:” so that only the specification name
remains and can be counted easily.
3. For this I split the specification text on “:” using the strsplit() function in R, which returns a
list. In this list the first element is the specification name, like “budget”, and the second
element is the value of that specification, i.e. “50000 INR”. Only the first element was kept in
place of the whole text.
4. The specification names also had to be converted to lower case, so that “Budget” and
“budget” are treated as the same value and give a cumulative count when converted into
factors.
5. Then I removed those specifications which were purely numeric and had occurred due to
errors.
6. This was all the data preparation needed for this activity. Next I created an empty
dataframe called Result, with the same 6 columns as mentioned above, all of them
initialised to zero.
7. The specification file has the following format.
Mcat Brand: Spec1 Spec2 Spec3 Spec4 Spec5 Spec6
8. To find the total count of brands, I counted how many times a particular Mcat in the first
column repeats.
9. Now put all the specifications in a vector b. It contains repeated specifications, i.e.
all the specifications of every brand for a particular Mcat are in this vector.
10. Now put all the brands in a vector a. It contains repeated brands, i.e. all the
brands for a particular Mcat are in this vector.
11. After all this we have to find out the individual brand count and individual specifications
count for a particular Mcat.
12. For this I used the count() function from the plyr library in R. This function returns a
dataframe of the frequencies of the different values.
13. I converted the elements of vectors a and b to factors so that the count() function can work and
return the dataframe which contains the frequency of the values. These dataframes are
named summary_a and summary_b.
14. Now unfactor these dataframes using the unfactor() function from the varhandle library in R, so that
the elements in these dataframes can be accessed and put into the final result sheet.
15. To know how many rows are required in the final result for a particular Mcat, find the
maximum among (length(summary_a[[1]]), length(summary_b[[1]]), count).
16. Finally put the values from summary_a and summary_b in the final result dataframe along
with their total and individual counts corresponding to a particular Mcat.
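The preparation and counting steps above can be sketched for a single Mcat as follows. The sketch uses base R table() in place of plyr's count() and varhandle's unfactor(), so it is an alternative illustration rather than the code that was run; the sample data and column names are assumptions.

```r
# Hypothetical specification rows for one Mcat
specs <- data.frame(
  Mcat  = c("Laptops", "Laptops", "Laptops"),
  Brand = c("dell", "hp", "dell"),
  Spec1 = c("Budget: 50000 INR", "budget: 40000 INR", "Model: 5690"),
  Spec2 = c("Model: 5690", "", "1234"),
  stringsAsFactors = FALSE
)
spec_cols <- c("Spec1", "Spec2")

# Steps 2 to 4: keep only the name before ":" and convert it to lower case
for (col in spec_cols) {
  specs[[col]] <- tolower(trimws(sub(":.*$", "", specs[[col]])))
}

# Step 5: pool all specification names, dropping blanks and pure numbers
all_specs <- unlist(specs[spec_cols], use.names = FALSE)
all_specs <- all_specs[all_specs != "" & !grepl("^[0-9]+$", all_specs)]

# Steps 8 to 13: total brand count plus individual frequencies
total_brands <- nrow(specs)
brand_count <- as.data.frame(table(Brand_Name = specs$Brand),
                             stringsAsFactors = FALSE)
spec_count <- as.data.frame(table(Specifications = all_specs),
                            stringsAsFactors = FALSE)
print(brand_count)
print(spec_count)
```

Because stringsAsFactors = FALSE keeps the names as plain character strings, no separate unfactor step is needed in this variant.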
BIBLIOGRAPHY:
I referred to some books which provided much of the guidance for the project. Apart
from domain knowledge, these books provided deep insight into the subject.
BOOKS:
• R for Programmers by Norman Matloff
• Introducing Python by Bill Lubanovic
APPENDIX:
#Project 1: Part1 -
library(stringr) # for str_replace_all()
# To remove the , and ( ) from the product names
Product_Name[,3] <- str_replace_all(Product_Name[,3], "[^[:alnum:]]", " ")
#After that remove extra spaces produced due to removing , and ()
Product_Name[ ,3] <- gsub(pattern = "\\s+", replacement = " ", Product_Name[ ,3])
# We have to remove the , and ( ) from the Lead names
Lead_Name[,1] <- str_replace_all(Lead_Name[,1], "[^[:alnum:]]", " ")
#After that remove extra spaces produced due to removing , and ()
Lead_Name[ ,1] <- gsub(pattern = "\\s+", replacement = " ", Lead_Name[ ,1])
# Now do the splitting of product names
# To produce the split text of Product_Name in list format:
# strsplit() is vectorised, so one call returns a list with one
# character vector of words per product name
Splitted_Row = strsplit(as.character(Product_Name[ ,3]), " ")
#Splitted_OFR_ID is a list of split offer title names, one character vector per lead
Splitted_OFR_ID = strsplit(as.character(Lead_Name[ ,1]), " ")
#Creating vectors of the columns of the Product_Name matrix and putting them in a list, so that
#access to the elements becomes easy
PN_USR_ID = list(Product_Name[ ,1])
PN_ITEM_ID = list(Product_Name[ ,2])
PN_ITEM_NAME = list(Product_Name[ ,3])
PN_MCAT_ID = list(Product_Name[ ,4])
#Creating vectors of the columns of the Lead_Name matrix and putting them in a list, so that
#accessing the elements becomes easy
LN_OFR_TITLE = list(Lead_Name[ ,1])
LN_MCAT_ID = list(Lead_Name[ ,2])
LN_MCAT_NAME = list(Lead_Name[ ,3])
#combining the data
product_list1 = list( PN_USR_ID = PN_USR_ID, PN_ITEM_ID = PN_ITEM_ID,
PN_ITEM_NAME = PN_ITEM_NAME,PN_MCAT_ID= PN_MCAT_ID,
pn_splitted = Splitted_Row )
Lead_list1 = list( LN_OFR_TITLE= LN_OFR_TITLE, LN_MCAT_ID = LN_MCAT_ID,
LN_MCAT_NAME = LN_MCAT_NAME,
ln_id_splitted = Splitted_OFR_ID )
# Loop to search matches b/w splitted_row and splitted _ofr_id
# Initialization values
s = 0
t=0
v=0
J=0
flag = 0
count = 0
count1 = 0
Match_Results = cbind(OP_USR_ID=0, OP_ITEM_ID=0,OP_ITEM_NAME=0,OP_MCAT_ID=0,
OP_LN_OFR_TITLE=0,OP_LN_MCAT_NAME=0,OP_LN_MCAT_ID=0)
for( i in 1:300)
#for(i in 1:length(product_list1$pn_splitted))
{ print(i)
count1 = 0
#print(length(product_list1$pn_splitted[[i]]))
J = length(product_list1$pn_splitted[[i]])
for (k in 1:length(Lead_list1$ln_id_splitted))
{
s = 0
t = 0
v = 0 # reset for every lead, like s and t, so an earlier match does not carry over
for ( j in 1:length(product_list1$pn_splitted[[i]]))
{
for (l in 1:length(Lead_list1$ln_id_splitted[[k]]))
{
# compulsory match for last word
if(identical( product_list1$pn_splitted[[i]][J],
Lead_list1$ln_id_splitted[[k]][l] ) == TRUE)
{ t = t+1 }
# compulsory match for second last word
if(identical( product_list1$pn_splitted[[i]][J-1],
Lead_list1$ln_id_splitted[[k]][l] ) == TRUE)
{v = v+1}
if( identical( product_list1$pn_splitted[[i]][j],
Lead_list1$ln_id_splitted[[k]][l] ) == TRUE )
{
s = s + 1
break
#print(s)
} } }
if(J==1)
{ r = 1 }
if(J == 2) {
r = 1
}
if(J == 3) {
r = .65
}
if(J == 4) {
r = .7
}
if(J == 5) {
r = .8
}
if(J == 6) {
r = .6
}
if( (s/J >= r | s >= 4) & t!=0 & v!=0 )
{
print(k)
count1 = count1 + 1
if(flag == 0)
{
OP_MCAT_ID = product_list1$PN_MCAT_ID[[1]][i]
# Returning item name of that item
OP_ITEM_NAME = product_list1$PN_ITEM_NAME[[1]][i]
# Returning iuser id of that item
OP_USR_ID = product_list1$PN_USR_ID[[1]][i]
# Returning item id of that item
op_ITEM_ID = product_list1$PN_ITEM_ID[[1]][i]
# Returning OFFER TITLE
OP_LN_OFR_TITLE = Lead_list1$LN_OFR_TITLE[[1]][k]
# Returning Lead Mcat Id
OP_LN_MCAT_ID = Lead_list1$LN_MCAT_ID[[1]][k]
# Returning Lead Mcat Name
OP_LN_MCAT_NAME = Lead_list1$LN_MCAT_NAME[[1]][k]
flag = 1
}
else {
# Returning mcat Id of that item
OP_MCAT_ID1 = product_list1$PN_MCAT_ID[[1]][i]
OP_MCAT_ID = rbind(OP_MCAT_ID,OP_MCAT_ID1)
OP_ITEM_NAME1 = product_list1$PN_ITEM_NAME[[1]][i]
OP_ITEM_NAME = rbind(OP_ITEM_NAME, OP_ITEM_NAME1)
# Returning user id of that item
OP_USR_ID1 = product_list1$PN_USR_ID[[1]][i]
OP_USR_ID = rbind(OP_USR_ID, OP_USR_ID1)
# Returning item id of that item
op_ITEM_ID1 = product_list1$PN_ITEM_ID[[1]][i]
op_ITEM_ID = rbind(op_ITEM_ID, op_ITEM_ID1)
# Returning OFFER TITLE
OP_LN_OFR_TITLE1 = Lead_list1$LN_OFR_TITLE[[1]][k]
OP_LN_OFR_TITLE = rbind(OP_LN_OFR_TITLE, OP_LN_OFR_TITLE1)
MCAT_NUM = as.numeric(Lead_list1$LN_MCAT_ID[[1]][k])
OP_LN_MCAT_ID1 = MCAT_NUM
OP_LN_MCAT_ID = rbind(OP_LN_MCAT_ID, OP_LN_MCAT_ID1)
# Returning Lead Mcat Name
OP_LN_MCAT_NAME1 = Lead_list1$LN_MCAT_NAME[[1]][k]
OP_LN_MCAT_NAME = rbind(OP_LN_MCAT_NAME, OP_LN_MCAT_NAME1)
}}}
Match_Results_K = cbind( OP_USR_ID, op_ITEM_ID,
OP_ITEM_NAME, OP_MCAT_ID, OP_LN_OFR_TITLE,
OP_LN_MCAT_NAME,OP_LN_MCAT_ID)
Match_Results_K <- subset(Match_Results_K, !duplicated(Match_Results_K[,7]))
Match_Results = rbind(Match_Results,Match_Results_K)
OP_LN_OFR_TITLE = NULL
OP_LN_MCAT_ID = NULL
OP_LN_MCAT_NAME = NULL
OP_USR_ID = NULL
OP_MCAT_ID = NULL
op_ITEM_ID = NULL
OP_ITEM_NAME = NULL
# Counting the products for which there is no match
if(count1 == 0)
{ count = count + 1
} }
# To remove the first row of zeroes from the result
Match_Results = Match_Results[ -1, ]
Match_Results_final = merge(Match_Results, Lead_Count,
by.x="OP_LN_MCAT_ID", by.y = "GLCAT_MCAT_ID")
# Finally putting the no. of leads corresponding to different Mcat
#IDs in the Match_Results
# Using merge function
#write.csv(Match_Results_final ,"Match_Results_83.csv")
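The matching rule applied inside the loops above can be condensed into one function, sketched below. The function name is_match and its arguments are hypothetical, the threshold argument corresponds to r in the code above, and treating a one-word product name as having no second-last word to check is an assumption of this sketch. It expects a non-empty product name.

```r
# Decide whether a product name matches a lead title, given both as
# character vectors of words
is_match <- function(product_words, lead_words, threshold) {
  J <- length(product_words)
  # s: how many product words also occur in the lead title
  s <- sum(product_words %in% lead_words)
  # Compulsory match for the last word of the product name
  last_ok <- product_words[J] %in% lead_words
  # Compulsory match for the second-last word, when there is one
  second_last_ok <- if (J >= 2) product_words[J - 1] %in% lead_words else TRUE
  (s / J >= threshold || s >= 4) && last_ok && second_last_ok
}

# Hypothetical three-word product name, checked with the threshold used for J == 3
product <- c("steel", "water", "bottle")
hit  <- is_match(product, c("need", "water", "bottle", "steel"), 0.65)
miss <- is_match(product, c("plastic", "bottle"), 0.65)
print(c(hit, miss))
```

Using %in% on whole word vectors replaces the two innermost loops, which matters because every product is compared against every lead.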
Part 2 : Comparison
# To remove the products with only one Mcat
for (i in 1:length(paid_supplier_new_products_mcat[[1]]))
#for(i in 29464:29469)
{ print(i)
if(paid_supplier_new_products_mcat[i,1] != "" &
paid_supplier_new_products_mcat[i+1,1] != "" )
{
paid_supplier_new_products_mcat[i,1] = ""
paid_supplier_new_products_mcat[i,2] = ""
} }
# Remove the blank rows in Excel
write.csv(paid_supplier_new_products_mcat, "Samar_Products.csv")
# To facilitate merging Product should repeat in 1st column for the corresponding Mcats
for (i in 1:length(paid_supplier_new_products_mcat1[[1]]))
{
print(i)
if(paid_supplier_new_products_mcat1[i,1] == "")
{
paid_supplier_new_products_mcat1[i,1] = paid_supplier_new_products_mcat1[i-1,1]
} }
Somesh_Results1 = merge(paid_supplier_new_products_mcat1, Lead_Count,
by.x="Mcat", by.y = "GLCAT_MCAT_NAME")
write.csv(Samar_Final_Max_Results, "Samar_Final_Max_Results.csv")
# To interchange the 1st and 2nd columns
for (i in 1:length(Results[[1]]))
{
print(i)
temp = Results[i,1]
Results[i,1] = Results[i,2]
Results[i,2] = temp
}
colnames(Results) = c( "Product","Mcat" , "GLCAT_MCAT_ID", "JFM.Approved")
write.csv(Results, "Somesh_Final_Results.csv")
write.csv(Samar_Final_Max_Results, "Samar_Final_Max_Results.csv")
# To consider only the maximum leads Mcats
for (i in 1:length(paid_supplier_new_products_mcat1[[1]]))
{ print(i)
if(paid_supplier_new_products_mcat1[i,1] == "")
{
paid_supplier_new_products_mcat1[i,2] = ""
} }
write.csv(paid_supplier_new_products_mcat1, "test.csv")
write.csv(Somesh_Results1, "Samar_Final_Max_Results.csv")
# To consider only the maximum leads Mcats
for (i in 1:length(paid_supplier_new_products_mcat2[[1]]))
{
print(i)
if(paid_supplier_new_products_mcat2[i,1] == "")
{
paid_supplier_new_products_mcat2[i,2] = ""
} }
write.csv(paid_supplier_new_products_mcat2, "paid_supplier_new_products_mcat2.csv")
Samar_Final_Max_Results = merge(paid_supplier_new_products_mcat2, Lead_Count,
by.x="Mcat", by.y = "GLCAT_MCAT_NAME")
Somesh_Results1 = merge(paid_supplier_new_products_mcat1, Lead_Count,
by.x="Mcat", by.y = "GLCAT_MCAT_NAME")
Somesh_Final_Results = Somesh_Results1
write.csv(Somesh_Final_Max_Results, "Somesh_Final_Max_Results.csv")
Results = merge(Samar_Products, Lead_Count, by.x="Mcat", by.y = "GLCAT_MCAT_NAME" )
write.csv(Results, "Somesh_Results.csv")
# Code for trimming
for (i in 1:length(paid_supplier_new_products_mcat1[[1]])) {
print(i)
paid_supplier_new_products_mcat1[i,2] = trimws(paid_supplier_new_products_mcat1[i,2])
}
for (i in 1:length(Samar_Products[[1]])) {
print(i)
Samar_Products[i,2] = trimws(Samar_Products[i,2])
}
Project #2:
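Loop to search in deleted leads data -
# mydata collects the matches; it starts with a row of zeroes, as in the approved leads loop
mydata = cbind(USR_ID = 0,OFR_ID = 0, MCAT_ID = 0, DEL_REASON = 0)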
for(i in 1:length(deleted_total[ ,1]))
{
for (j in 1:length(secondary_mcat[ ,1]) )
{
if(deleted_total[i,1] == secondary_mcat[j,1]
&&
deleted_total[i,3]== secondary_mcat[j,3] && secondary_mcat[j,4] <= deleted_total[i,2]
&& secondary_mcat[ j,3]!= secondary_mcat[ j,5])
{
print(i)
USR_ID = secondary_mcat[j,1]
OFR_ID = secondary_mcat[j,2]
MCAT_ID= secondary_mcat[j,3]
DEL_REASON= deleted_total[i,4]
mydata1 = cbind(USR_ID,OFR_ID,MCAT_ID,DEL_REASON)
mydata = rbind(mydata, mydata1)
} } }
Loop to search in Approved leads data:
mydata = cbind(USR_ID = 0,OFR_ID = 0, MCAT_ID = 0)
for(i in 1:(length(approved_total[ ,1])) )
{ print(i)
for (j in 1:length(secondary_mcat[ ,1]) )
{
if(approved_total[i,1] == secondary_mcat[j,1]
&&
approved_total[i,3]== secondary_mcat[j,3] && secondary_mcat[j,4] <= approved_total[i,2]
&& secondary_mcat[ j,3]!= secondary_mcat[ j,5])
{
USR_ID = secondary_mcat[j,1]
OFR_ID = secondary_mcat[j,2]
MCAT_ID= secondary_mcat[j,3]
mydata1 = cbind(USR_ID,OFR_ID,MCAT_ID)
mydata = rbind(mydata, mydata1)
} } }
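The nested loops above compare every row of one table with every row of the other. The same conditions can be sketched with merge() followed by a filter. The column names and values below are hypothetical, because the code above refers to columns only by position; in particular, reading column 4 as an "added on" value, column 2 of the leads table as an "approved on" value and column 5 as the primary Mcat is an assumption.

```r
# Hypothetical stand-ins for the two input tables
secondary_mcat <- data.frame(
  USR_ID = c(1, 1, 2),
  OFR_ID = c(101, 102, 103),
  MCAT_ID = c(10, 20, 10),
  ADDED_ON = c(5, 5, 9),
  PRIMARY_MCAT_ID = c(10, 30, 40)
)
approved_total <- data.frame(
  USR_ID = c(1, 2),
  APPROVED_ON = c(7, 8),
  MCAT_ID = c(20, 10)
)

# Same user and same Mcat: the two equality tests become the join keys
joined <- merge(approved_total, secondary_mcat, by = c("USR_ID", "MCAT_ID"))

# Remaining conditions: added no later than the approval, and the
# matched Mcat differs from the primary Mcat
keep <- joined$ADDED_ON <= joined$APPROVED_ON &
  joined$MCAT_ID != joined$PRIMARY_MCAT_ID
result <- joined[keep, c("USR_ID", "OFR_ID", "MCAT_ID")]
print(result)
```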
Project #3:
Result = data.frame()
GLID = 0
CUSTOMER_TICKET_ID = 0
CUSTOMER_TICKET_ISSUE_DATE = 0
CUSTOMER_TICKET_DETAIL = 0
APPENDUM = 0
TICKET_HISTORY = 0
STOP = 0
NO_BENEFIT = 0
NO_MATURITY = 0
NO_TIME_TO_USE = 0
FAKE_BUYERS = 0
IRRELEVANT_ENQUIRIES = 0
FAKE_BL = 0
NOTICE_PERIOD = 0
BUSINESS_CLOSED = 0
HYPER_LOCAL_BUYER = 0
MISCOMMITMENT = 0
PHYSICAL_VISIT = 0
NOT_TECH_SAVY = 0
BUYER_WANT_LOW_PRICE = 0
LANGUAGE_BARRIER = 0
CHANGES_REQUIRED = 0
WRONG_PRODUCT = 0
WRONG_IMAGE = 0
WRONG_CATALOG = 0
CHANGE_NUMBER = 0
CHANGE_EMAIL = 0
CLOSE_ACCOUNT = 0
NOT_INTERESTED = 0
REMOVE_PRICE = 0
DIFFERENT_CATALOG = 0
CHANGE_AFTER_APPROVAL=0
for (i in 1:length(Somesh_Ticket[[1]]))
#for(i in 1:10000)
{
print(i)
GLID = Somesh_Ticket[i,1]
CUSTOMER_TICKET_ID = Somesh_Ticket[i,2]
CUSTOMER_TICKET_ISSUE_DATE = Somesh_Ticket[i,3]
CUSTOMER_TICKET_DETAIL = Somesh_Ticket[i,4]
APPENDUM = Somesh_Ticket[i,5]
TICKET_HISTORY = Somesh_Ticket[i,6]
a = paste(Somesh_Ticket[i,4], Somesh_Ticket[i,5], Somesh_Ticket[i,6])
#1
if(grepl("stop the service", a, ignore.case = TRUE))
{ STOP = 1 }
else if(grepl("Deactivate the service", a, ignore.case = TRUE))
{ STOP = 1 }
else { STOP = 0}
#2
if(grepl("Did not get benefit", a, ignore.case = TRUE))
{
NO_BENEFIT = 1 }
else if(grepl("No Benefit", a, ignore.case = TRUE))
{
NO_BENEFIT = 1 }
else if(grepl("Did not get Business", a, ignore.case = TRUE))
{
NO_BENEFIT = 1 }
else
{NO_BENEFIT = 0 }
#3
if(grepl("Did not get maturity", a, ignore.case = TRUE))
{
NO_MATURITY = 1 }
else if(grepl("No Maturity", a, ignore.case = TRUE))
{
NO_MATURITY = 1 }
else if(grepl("Maturity Issues", a, ignore.case = TRUE))
{
NO_MATURITY = 1 }
else if(grepl("Deal Not Maturing", a, ignore.case = TRUE))
{
NO_MATURITY = 1 }
else
{
NO_MATURITY = 0 }
#4
if(grepl("No time to use the service", a, ignore.case = TRUE))
{
NO_TIME_TO_USE = 1 }
else
{
NO_TIME_TO_USE = 0 }
#5
if(grepl("Buyers not responding", a, ignore.case = TRUE))
{
FAKE_BUYERS = 1
#FAKE_BUYERS = rbind(FAKE_BUYERS, FAKE_BUYERS1)
}
else if(grepl("fake Buyers", a, ignore.case = TRUE))
{
FAKE_BUYERS = 1 }
else if(grepl("Fake Leads", a, ignore.case = TRUE))
{
FAKE_BUYERS = 1 }
else
{
FAKE_BUYERS = 0 }
#6
# All "enquiries / inquiries / queries" complaints set the same flag, so the
# phrases are combined into one case-insensitive pattern. The flag is written in
# upper case to match the name used in cbind() at the end of the loop.
if(grepl(paste0("(irrelevant|less|bulk) (enquiries|inquiries|queries)|",
"(low|retail|wrong) (enquiry|inquiry|query)"), a, ignore.case = TRUE))
{
IRRELEVANT_ENQUIRIES = 1 }
else
{
IRRELEVANT_ENQUIRIES = 0 }
#7
if(grepl("Wrong Buy Leads", a, ignore.case = TRUE))
{
FAKE_BL = 1 }
else if(grepl("Fake Buy Lead", a, ignore.case = TRUE))
{
FAKE_BL = 1 }
else if(grepl("Wrong BL", a, ignore.case = TRUE))
{
FAKE_BL = 1 }
else
{
FAKE_BL = 0 }
#8
if(grepl("Notice Period", a, ignore.case = TRUE))
{
NOTICE_PERIOD = 1 }
else
{
NOTICE_PERIOD = 0 }
#9
if(grepl("Business Closed", a, ignore.case = TRUE))
{
BUSINESS_CLOSED = 1 }
else if(grepl("Out of India", a, ignore.case = TRUE))
{
BUSINESS_CLOSED = 1 }
else if(grepl("Partnership Issue", a, ignore.case = TRUE))
{
BUSINESS_CLOSED = 1 }
else if(grepl("Personal reason", a, ignore.case = TRUE))
{
BUSINESS_CLOSED = 1
}
else if(grepl("Changed My Business", a, ignore.case = TRUE))
{
BUSINESS_CLOSED = 1
}
else
{
BUSINESS_CLOSED = 0 }
#10
if(grepl("Need local buyer", a, ignore.case = TRUE))
{
HYPER_LOCAL_BUYER = 1
}
else if(grepl("local enquiries", a, ignore.case = TRUE))
{
HYPER_LOCAL_BUYER = 1 }
else
{
HYPER_LOCAL_BUYER = 0 }
#11
if(grepl("Wrong commitment from sales", a, ignore.case = TRUE))
{
MISCOMMITMENT = 1 }
else if(grepl("Miscommitment", a, ignore.case = TRUE))
{
MISCOMMITMENT = 1 }
else if(grepl("mis-commitment", a, ignore.case = TRUE))
{
MISCOMMITMENT = 1 }
else
{
MISCOMMITMENT = 0 }
#12
if(grepl("physical visit", a, ignore.case = TRUE))
{
PHYSICAL_VISIT = 1 }
else
{
PHYSICAL_VISIT = 0 }
#13
if(grepl("Client is not Tech Savy", a, ignore.case = TRUE))
{
NOT_TECH_SAVY = 1
}
else if(grepl("Tech Savvy", a, ignore.case = TRUE))
{
NOT_TECH_SAVY = 1 }
else if(grepl("Computer Savvy", a, ignore.case = TRUE))
{
NOT_TECH_SAVY = 1 }
else
{
NOT_TECH_SAVY = 0 }
#14
if(grepl("Buyer quote very less price", a, ignore.case = TRUE))
{
BUYER_WANT_LOW_PRICE = 1 }
else if(grepl("Buyer asking Low Price", a, ignore.case = TRUE))
{
BUYER_WANT_LOW_PRICE = 1
}
else if(grepl("Buyer want low price", a, ignore.case = TRUE))
{
BUYER_WANT_LOW_PRICE = 1
}
else
{
BUYER_WANT_LOW_PRICE = 0 }
#15
if(grepl("Language Barrier", a, ignore.case = TRUE))
{
LANGUAGE_BARRIER = 1 }
else if(grepl("Tamil", a, ignore.case = TRUE))
{
LANGUAGE_BARRIER = 1 }
else
{
LANGUAGE_BARRIER = 0 }
#16
if(grepl("changes required", a, ignore.case = TRUE))
{
CHANGES_REQUIRED = 1 }
else { CHANGES_REQUIRED = 0 }
#17
if(grepl("WRONG PRODUCT", a, ignore.case = TRUE))
{
WRONG_PRODUCT = 1 }
else { WRONG_PRODUCT = 0}
#18
if(grepl("WRONG IMAGE", a, ignore.case = TRUE))
{
WRONG_IMAGE = 1 }
else { WRONG_IMAGE = 0}
#19
if(grepl("WRONG CATALOG", a, ignore.case = TRUE))
{
WRONG_CATALOG = 1 }
else { WRONG_CATALOG = 0}
#20
if(grepl("Change number", a, ignore.case = TRUE))
{
CHANGE_NUMBER = 1 }
else { CHANGE_NUMBER = 0}
#21
if(grepl("Change email", a, ignore.case = TRUE))
{
CHANGE_EMAIL = 1 }
else { CHANGE_EMAIL = 0}
#22
if(grepl("Close the account", a, ignore.case = TRUE))
{
CLOSE_ACCOUNT = 1
}
else { CLOSE_ACCOUNT = 0}
#23
if(grepl("Not interested", a, ignore.case = TRUE))
{
NOT_INTERESTED = 1}
else { NOT_INTERESTED = 0}
#24
if(grepl("Remove Price", a, ignore.case = TRUE))
{
REMOVE_PRICE = 1 }
else { REMOVE_PRICE = 0}
#25
if(grepl("not the same catalog that i approved", a, ignore.case = TRUE))
{
DIFFERENT_CATALOG = 1 }
else { DIFFERENT_CATALOG = 0}
#26
if(grepl("change after approval", a, ignore.case = TRUE))
{
CHANGE_AFTER_APPROVAL=1 }
else if(grepl("change after hosting approval", a, ignore.case = TRUE))
{
CHANGE_AFTER_APPROVAL=1 }
else { CHANGE_AFTER_APPROVAL=0}
Result1 = cbind(GLID ,CUSTOMER_TICKET_ID,CUSTOMER_TICKET_ISSUE_DATE,
CUSTOMER_TICKET_DETAIL
,APPENDUM ,TICKET_HISTORY ,STOP ,NO_BENEFIT ,NO_MATURITY
,NO_TIME_TO_USE
,FAKE_BUYERS ,IRRELEVANT_ENQUIRIES ,FAKE_BL ,NOTICE_PERIOD
,BUSINESS_CLOSED
,HYPER_LOCAL_BUYER ,MISCOMMITMENT,PHYSICAL_VISIT ,NOT_TECH_SAVY
,BUYER_WANT_LOW_PRICE ,LANGUAGE_BARRIER,
CHANGES_REQUIRED,WRONG_PRODUCT,WRONG_IMAGE
,WRONG_CATALOG, CHANGE_NUMBER, CHANGE_EMAIL, CLOSE_ACCOUNT,
NOT_INTERESTED
,REMOVE_PRICE, DIFFERENT_CATALOG, CHANGE_AFTER_APPROVAL )
Result = rbind(Result,Result1)
Result1 = NULL
}
#write.csv(Result, "Somesh_Ticket_Result.csv")
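The chain of if / else blocks above can also be written as a lookup table of phrases, sketched below for three of the flags. The ticket texts are made-up examples, and the list name keywords is hypothetical; the phrases themselves are taken from the code above.

```r
# Each flag name maps to the phrases that should switch it on
keywords <- list(
  STOP = c("stop the service", "deactivate the service"),
  NO_BENEFIT = c("did not get benefit", "no benefit", "did not get business"),
  NOTICE_PERIOD = c("notice period")
)

# Hypothetical ticket texts (detail, addendum and history pasted together)
tickets <- c(
  "Client wants to STOP the service, no benefit so far",
  "Asked about notice period",
  "General query"
)

# One column per flag: 1 when any phrase of that flag occurs in the ticket
flags <- sapply(keywords, function(phrases) {
  pattern <- paste(phrases, collapse = "|")
  as.integer(grepl(pattern, tickets, ignore.case = TRUE))
})
print(flags)
```

With this layout a new reason needs only one more entry in the list, and grepl() works on all tickets at once instead of one row per loop iteration. The phrases are used as regular expressions, so a phrase containing special characters would need escaping.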
Project #4:
d7 <- read.csv("complete_file.csv")
d7_train <- d7[1:3000,]
d7_test <- d7[3001:3613,]
library(C50)
m <- C5.0(d7_train[c(2:22)], as.factor(d7_train[[31]]), trials = 1)
summary(m)
p <- predict(m, d7_test[c(2:22)])
library(gmodels)
CrossTable(d7_test$Status, p, prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE, dnn = c("actual",
"predicted"))
library(irr)
p1 <- predict(m, d7_test[c(2:22)], type = "prob")
p1 <- cbind(p1, Prediction = p, Actual_Status = d7_test$Status)
head(p1,20)
write.csv(p1, "yearly_prob.csv")
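The CrossTable() call above compares actual and predicted status. The same comparison, together with an overall accuracy figure, can be sketched in base R as follows. The status labels and the two vectors are made-up stand-ins for d7_test$Status and p, not results from the project.

```r
# Hypothetical actual and predicted labels for five test rows
actual <- factor(c("renewed", "churned", "renewed", "churned", "renewed"))
predicted <- factor(c("renewed", "churned", "churned", "churned", "renewed"),
                    levels = levels(actual))

# Confusion matrix: rows are actual classes, columns are predicted classes
confusion <- table(actual = actual, predicted = predicted)
print(confusion)

# Accuracy: share of rows on the diagonal, i.e. predicted correctly
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```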
Project #5:
To find the Brand Names :
# Name the imported file as mydata
Brand_Data <-data.frame(x=numeric(length(mydata[,1]))
,y=numeric(length(mydata[,1]))
,z=numeric(length(mydata[,1])))
count <- 1
for(i in 1:length(mydata[ ,1]))
{
print(i)
temp <- sapply(mydata[i,1], as.character)
temp1 <- tolower(temp)
temp <- sub(".*brand(:| :|-| -|:-| :- )","",temp1)
if(temp != temp1)
{
temp <- sub("\n.*","",temp)
Brand_Data[count,1] <- sapply(mydata[i,1], as.character)
Brand_Data[count,2] <- temp
Brand_Data[count,3] <- sapply(mydata[i,2], as.character)
count = count + 1}
}
write.csv(Brand_Data,"Brand_Data.csv")
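The brand extraction above works row by row. A vectorised sketch of the same idea is given below: lower-case the text, drop everything up to the "brand" separator, then drop everything after the first line break. The lead texts are made-up samples, and perl = TRUE with the (?s) flag is used here so that the dot is guaranteed to match line breaks; that choice is an assumption of this sketch, not part of the original code.

```r
# Hypothetical lead descriptions
leads <- c(
  "Need a laptop\nBrand: Dell\nBudget: 50000 INR",
  "Want tiles, brand - Kajaria",
  "No such info here"
)
lower <- tolower(leads)

# Rows that mention a brand followed by ":", "-" or ":-"
has_brand <- grepl("brand ?(:-|:|-)", lower)

# Remove the text up to and including the separator ...
brand <- sub("(?s).*brand ?(:-|:|-) ?", "", lower[has_brand], perl = TRUE)
# ... and everything from the first line break onwards
brand <- trimws(sub("(?s)\n.*", "", brand, perl = TRUE))
print(brand)
```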
To find the specifications:
#Declaring 2 lists
list1 = list()
list2 = list()
# Declaring a null dataframe
Brand_Data_Result = data.frame(cbind( A=NULL,B=NULL, C=NULL, D=NULL, E=NULL,
F=NULL, G=NULL, H=NULL, I=NULL))
for (i in 1:length(Brand_Data[[1]])) {
print(i)
a = as.character(Brand_Data[i,1])
#Split the text on the separator that follows "Brand". The longer separators are
#tested first, because "Brand:" also matches "Brand:-" and "Brand :" also matches "Brand :-"
#Split the text based on the ":-"
if(grepl("Brand:-", a))
{ b = strsplit(a, ":-") }
#Split the text based on the " :-"
else if(grepl("Brand :-",a))
b = strsplit(a, " :-")
#Split the text based on the ":"
else if(grepl("Brand:",a))
b = strsplit(a, ":")
#Split the text based on the " :"
else if(grepl("Brand :",a))
b = strsplit(a, " :")
#Split the text based on the "-"
else if(grepl("Brand-",a))
b = strsplit(a, "-")
#Split the text based on the " -"
else if(grepl("Brand -",a))
b = strsplit(a, " -")
else b = a
if( length(b[[1]])<2 )
{
# when length of the splitted text is less than 2 Just consider the first 3 columns of input in which
# the 2nd one already contains the brand name
Brand_Data_Result[i,1] = Brand_Data[i,1]
Brand_Data_Result[i,2] = Brand_Data[i,2]
Brand_Data_Result[i,3] = Brand_Data[i,3]
next
# Go for the next iteration ( next i )
}
for (j in 1:length(b[[1]]))
{
# split the text based on "\n" (the newline character)
c = strsplit(b[[1]][j], "\n")
list1[j]= c
}
# Attach the last word of one string and the first word of next string
for (k in 1:(j-1)) {
d = paste(list1[[k]][length(list1[[k]])], list1[[k+1]][1], sep = ":")
list2[k]= d
}
Brand_Data_Result[i,1] = Brand_Data[i,1]
Brand_Data_Result[i,2] = Brand_Data[i,2]
Brand_Data_Result[i,3] = Brand_Data[i,3]
for(l in 1:length(list2))
{
Brand_Data_Result[i,l+3]=list2[l]
}
list1 = NULL
list2 = NULL}
Brand_Data_Result = Brand_Data_Result[ ,1:10]
# Code to rearrange the rows of the brand data result so that the 4th column contains only the brand
# name, not anything else
for (p in 1:length(Brand_Data_Result[[1]])){
print(p)
if(grepl("Brand", Brand_Data_Result[p,4]))
{
next
}
for (q in 5:10) {
if(grepl("Brand", Brand_Data_Result[p,q]))
{
temp = Brand_Data_Result[p,q]
Brand_Data_Result[p,q]= Brand_Data_Result[p,4]
Brand_Data_Result[p,4] = temp }}}
write.csv(Brand_Data_Result, "Brand_Data_Result.csv")
To find the count of Brands and Specifications for a particular Mcat :
Brand_Data1 = Brand_Specifications_Result
Brand_Data1[is.na(Brand_Data1)] = ""
# Code to rearrange the rows of the brand data result so that the brand column contains only the brand
# name, not anything else
a = unfactor(a)
for (p in 1:length(a[[1]]))
{
if(p%%100 ==0)
{print(p)}
if(grepl("Brand", a[p,3]))
{
next
}
for (q in 3:8)
{
if(grepl("Brand", a[p,q]))
{
temp = a[p,q]
a[p,q]= a[p,3]
a[p,3] = temp }}}
library(stringr)
list_splitted = list()
list3 = list()
test = list()
Result = data.frame()
Brand_Name = 0
Mcat_Name=0
Spec1 = 0
Spec2 = 0
Spec3 = 0
Spec4 = 0
Spec5 = 0
Spec6 = 0
Spec7 = 0
# Spec# are the specification columns
for (p in 1:length(Brand_Data1[[1]]) ){
if(p%%100 ==0)
{print(p)}
if(Brand_Data1[p,2]==""){next}
# First split the text on the basis of " and " and assign that list to test
test = strsplit(Brand_Data1[p,2]," and ")
if(test[[1]][1] ==""){next}
#print(test)
#print(length(test[[1]]))
# Now split the elements of the test on the basis of ","
for(q in 1:length(test[[1]]))
{
#print(q)
#Put all the splitted elements in list3
list3[q] = strsplit(test[[1]][q],",") }
#print(list3)
for (r in 1:length(list3))
{
# check if Brand name is pure no.- then don't consider that
for(s in 1:length(list3[[r]]))
{
#if( is.na(as.numeric(list3[[r]][s])))
test1 = list3[[r]][s]
test1 = str_trim(test1)
Brand_Name = rbind(Brand_Name,test1)
test2 = Brand_Data1[p,3]
test2 = str_trim(test2)
Mcat_Name = rbind(Mcat_Name,test2)
test2 = Brand_Data1[p,4]
Spec1 = rbind(Spec1,test2)
test2 = Brand_Data1[p,5]
Spec2 = rbind(Spec2,test2)
test2 = Brand_Data1[p,6]
Spec3 = rbind(Spec3,test2)
test2 = Brand_Data1[p,7]
Spec4 = rbind(Spec4,test2)
test2 = Brand_Data1[p,8]
Spec5 = rbind(Spec5,test2)
test2 = Brand_Data1[p,9]
Spec6 = rbind(Spec6,test2)
} }
test1 = NULL
test2 = NULL
Result1 = cbind(Brand_Name, Mcat_Name,Spec1,Spec2, Spec3, Spec4,Spec5, Spec6)
Result = rbind(Result, Result1)
Brand_Name = NULL
Mcat_Name = NULL
Spec1 = NULL
Spec2 = NULL
Spec3 = NULL
Spec4 = NULL
Spec5 = NULL
Spec6 = NULL
list3 = NULL
}
# To remove the first row of zeroes
Result = Result[-1, ]
#colnames(Result) = c("Brand_Name","Mcat_Name")
Result_final_1 = Result
# To remove wrongly chosen brands
for (t in 1:length(Result[[1]]))
{
if(t%%100 ==0)
{print(t)}
if(grepl("any|other|all ",Result[t,1]) )
{
Result = Result[-t, ]
#print("Good")
}
}
# To remove "." and ":" from Brand names in the 1st column of Result, so that when grepl is used
# no observations are missed due to an extra "."
#Result[Result.na] = 0
Result[ ,1] = sub("[.,:,',-,(,),+]","", Result[ ,1])
# To remove the rows which contain only "all" in brand column
for (t in 1:length(Result[[1]]))
{
if(t%%100 ==0)
{print(t)}
if(grepl(Result[t,1],"all ") | is.na(Result[t,1]))
{
Result = Result[-t, ]
#print("Good")
}}
# To do the trimming of extra spaces created due to removal of ":"
# # Code for trimming
for (i in 1:length(Result[[1]])) {
if(i%%100 ==0)
{print(i)}
Result[i,1] = trimws(Result[i,1])
}
#write.csv(Result, "Result.csv")
# To get rid of factors first save it and then import it
#write.csv(Result, "Result.csv")
Result2 = Result
#rm(Result)
library(varhandle)
Result = unfactor(Result)
#Result = read.csv(file.choose(), header = TRUE, sep = ",", stringsAsFactors = FALSE)
#Result[Result == ""] = 0
# Result1 = Result2
# Result2 = Result
# Result = Result2
# To split the Brands in Result based on " or "
# First assign Result to a different dataframe so that the earlier code, which split on " and ",
# can be reused as such; remove the first column of Result
Result_copy1 = Result
Result = NULL
library(stringr)
list_splitted = list()
list3 = list()
test = list()
Result = data.frame()
Brand_Name = 0
Mcat_Name=0
Spec1 = 0
Spec2 = 0
Spec3 = 0
Spec4 = 0
Spec5 = 0
Spec6 = 0
Spec7 = 0
# Now make the Result null because the output will be stored in it
#colnames(Result_copy1) = c("P","M")
for (p in 1:length(Result_copy1[[1]]) )
# for ( p in 3:4)
{
if(p%%100 ==0)
{print(p)}
# First split the text on the basis of " or " and assign that list to test
test = strsplit(Result_copy1[p,1]," or ")
if(test[[1]][1]==""){next}
#print(test)
#print(length(test[[1]]))
# Now split the elements of the test on the basis of ","
for(q in 1:length(test[[1]]))
{
#print(q)
#Put all the splitted elements in list3
list3[q] = strsplit(test[[1]][q],",")
#print(list3[q])
}
#print(list3)
for (r in 1:length(list3))
{
# check if Brand name is pure no.- then don't consider that
for(s in 1:length(list3[[r]]))
{
#if( is.na(as.numeric(list3[[r]][s])))
#{
test1 = list3[[r]][s]
test1 = str_trim(test1)
Brand_Name = rbind(Brand_Name,test1)
test2 = Result_copy1[p,2]
test2 = str_trim(test2)
Mcat_Name = rbind(Mcat_Name,test2)
test2 = Result_copy1[p,3]
Spec1 = rbind(Spec1,test2)
#print(Spec1)
test2 = Result_copy1[p,4]
Spec2 = rbind(Spec2,test2)
#print(Spec2)
test2 = Result_copy1[p,5]
Spec3 = rbind(Spec3,test2)
# print(Result_copy1[p,5])
# print(test2)
# print(Spec3)
test2 = Result_copy1[p,6]
Spec4 = rbind(Spec4,test2)
test2 = Result_copy1[p,7]
Spec5 = rbind(Spec5,test2)
test2 = Result_copy1[p,8]
Spec6 = rbind(Spec6,test2)
}}
test1 = NULL
test2 = NULL
Result1 = cbind(Brand_Name, Mcat_Name,Spec1,Spec2, Spec3, Spec4,Spec5, Spec6)
Result = rbind(Result, Result1)
Brand_Name = NULL
Mcat_Name = NULL
Spec1 = NULL
Spec2 = NULL
Spec3 = NULL
Spec4 = NULL
Spec5 = NULL
Spec6 = NULL
list3 = NULL
}
Result = as.data.frame(Result)
Result3 = Result
# To remove the first row of zeroes
Result = Result[-1, ]
Result[ ,1] = sub("[.,:,',-,(,),+]","", Result[ ,1])
# To remove wrongly chosen brands
for (t in 1:length(Result[[1]]))
{
print(t)
if(grepl("any|other|all ",Result[t,1]) )
{
Result = Result[-t, ]
#print("Good")
}}
for (t in 1:length(Result[[1]])){
if(t%%100 ==0)
{print(t)}
if(grepl(Result[t,1],"all ") | is.na(Result[t,1]))
{
Result = Result[-t, ]
#print("Good")
}}
# To do the trimming of extra spaces created due to removal of ":"
# # Code for trimming
for (i in 1:length(Result[[1]])) {
if(i%%100 ==0)
{print(i)}
Result[i,1] = trimws(Result[i,1]) }
Result4 = Result
# To remove the Brands which starts with a number
for (i in 1:length(Result[[1]]))
# for(i in 1:10 )
{
if(i%%100 ==0)
{print(i)}
if(substr(Result[i, 1], 1, 2)== "3m" |
is.na(as.numeric(substr(Result[i,1],1,1))))
{ next }
if(!is.na(as.numeric(substr(Result[i,1],1,1))))
{
Result[i, ] = "" } }
Result_Final = Result
d = Result_Final
#write.csv(Result_Final, "Result_Final.csv")
# Remove the first column in Excel. Again import that data as the final input for the count of
# specifications
# read.csv(file.choose(), header = TRUE, sep = ",", stringsAsFactors = FALSE)
# Now the final code to get the result in a given format
# Format:-
# For a particular Mcat get all the brands and their individual counts. Also get the count of all the
# specifications for the same Mcat
# Input:- Brand_Specifications_Final_Result
Specifications = Brand_Specifications_Final_Result
# First take the specifications and split on ":" to consider only Specification not the value
# Now convert these specifications to factors so that count becomes easy
for (i in 1:length(Specifications[[1]]))
{
print(i)
for (j in 3:8)
{
if(Specifications[i,j]!="")
{
a = strsplit(Specifications[i,j],":")
Specifications[i,j] = a[[1]][1]
}
a = NULL }}
# To cut a brand name at " etc" or " china" and keep only the part before it
for (i in 1:length(Specifications[[1]]))
{
print(i)
if(grepl(" etc| china",Specifications[i,2] ))
{
a = strsplit(Specifications[i,2]," etc| china")
Specifications[i,2] = a[[1]][1] }
a = NULL
}
# To remove the rows whose brand is only "etc", "china" or "chinese"
Specifications = Specifications[!(Specifications[ ,2] %in% c("etc", "china", "chinese")), ]
# To make specifications lower case so that the same specification is not counted separately
# under different capitalisations, and to blank out purely numeric entries and price/budget specifications
for (i in 1:length(Specifications[[1]]))
{
print(i)
for (j in 3:8)
{
if(is.na(suppressWarnings(as.numeric(Specifications[i,j]))))
{ Specifications[i,j] = tolower(Specifications[i,j]) }
else { Specifications[i,j] = "" }
if(grepl("price|budget", Specifications[i,j]))
{ Specifications[i,j] = "" } }}
write.csv(Specifications,"Specs_Final_Result.csv")
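The specification clean-up above (keep the text before ":", lower-case it, blank out numbers and price/budget entries) can be sketched for a single column as follows. `clean_spec` is a hypothetical helper, and the digit-only regular expression is an assumed simplification of the `as.numeric()` test used in the script.

```r
# Hypothetical helper, not part of the original script.
clean_spec <- function(spec) {
  # Keep only the specification name, i.e. the text before the first ":".
  spec <- tolower(sub(":.*$", "", spec))
  # Blank out purely numeric entries and price/budget specifications.
  is_number <- grepl("^[0-9]+([.][0-9]+)?$", spec)
  spec[is_number | grepl("price|budget", spec)] <- ""
  spec
}

cleaned_specs <- clean_spec(c("Colour: Red", "Power:500W", "100", "Price Range: 2000", ""))
print(cleaned_specs)
```

Under these assumptions it could be applied to each specification column in turn, for example with `Specifications[3:8] = lapply(Specifications[3:8], clean_spec)`.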
# Final Loop
library(plyr) # For count function
library(varhandle) # For unfactor function
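Result = data.frame()
# Column accumulators start empty so that no dummy first row is written
Mcat_Name = NULL
Total_Brand_Count = NULL
Brand_Name = NULL
Brand_count = NULL
Specs = NULL
Specs_Count = NULL
count = 1
test1 = 0
test2 = 0
summary_a = NULL
summary_b = NULL
a = NULL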
b = NULL
c = NULL
d = NULL
for (i in 1:length(Specifications[[1]]))
#for(i in 1:30)
{
print(i)
temp = Specifications[i,2]
a = c(a,temp )
#print(temp)
# a is the vector of brand names
#print(a)
# A row is the last of its Mcat when it is the final row or the next row has a different Mcat
n_rows = length(Specifications[[1]])
last_of_mcat = (i == n_rows) || (Specifications[i,1] != Specifications[i+1,1])
if(!last_of_mcat)
{
count = count + 1
#print(count)
}
for (j in 3:7)
{
#print("yes")
if(Specifications[i,j]!="")
{
temp = Specifications[i,j]
b = c(b,temp)
#print(b)
} }
# When Mcat Changes
if(last_of_mcat)
{
Mcat = Specifications[i,1]
if(length(a)!=0)
{
a = factor(a)
summary_a = count(a)
summary_a = unfactor(summary_a)
}
if(length(b)!=0)
{
b = factor(b)
summary_b = count(b)
summary_b = unfactor(summary_b)
}
#print(summary_b)
# To know how many rows are required for the current Mcat
max_len = max(length(summary_a[[1]]), length(summary_b[[1]]), count)
for (k in 1:max_len)
{ Mcat_Name = rbind(Mcat_Name, Mcat )
# To get the Brand Names and their count
if(k<= length(summary_a[[1]]))
{
test1 = summary_a[[1]][k]
Brand_Name = rbind(Brand_Name,test1)
test2 = summary_a[[2]][k]
Brand_count = rbind(Brand_count,test2)
}
else
{
test1 = ""
Brand_Name = rbind(Brand_Name,test1)
test2 = ""
Brand_count = rbind(Brand_count,test2)
}
Total_Brand_Count = rbind(Total_Brand_Count, count)
# To get the specifications and their count
if(k<= length(summary_b[[1]]))
{
test1 = summary_b[[1]][k]
Specs = rbind(Specs,test1)
test2 = summary_b[[2]][k]
Specs_Count = rbind(Specs_Count,test2)
}
else
{
test1 = ""
Specs = rbind(Specs,test1)
test2 = ""
Specs_Count = rbind(Specs_Count,test2)
} }
Result1 = cbind(Mcat_Name,Total_Brand_Count, Brand_Name,
Brand_count, Specs, Specs_Count)
Result = rbind(Result,Result1)
count = 1
a = NULL
b =NULL
summary_a = NULL
summary_b = NULL
}
Result1 = NULL
Mcat_Name = NULL
Total_Brand_Count = NULL
Brand_Name = NULL
Brand_count = NULL
Specs = NULL
Specs_Count = NULL
}
Specifications_Final_Result = Result
write.csv(Result, "Specifications_Final_Result.csv")
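The per-Mcat counting done by the final loop can be illustrated more compactly with base R's `split()` and `table()`. This is only a sketch on toy data: the column names `mcat`, `brand`, `spec1` and `spec2` and the helper `count_values` are assumptions made for the example, and it uses `table()` in place of `plyr::count()`.

```r
# Toy input; the column names are assumptions made for this sketch.
specs <- data.frame(
  mcat  = c("drill", "drill", "drill", "fan"),
  brand = c("bosch", "bosch", "dewalt", "usha"),
  spec1 = c("power", "colour", "power", "size"),
  spec2 = c("colour", "", "", "colour"),
  stringsAsFactors = FALSE
)

# Count how often each non-empty value occurs, most frequent first.
count_values <- function(x) {
  x <- x[!is.na(x) & x != ""]
  sort(table(x), decreasing = TRUE)
}

# One entry per Mcat, holding its row count, brand counts and specification counts.
summaries <- lapply(split(specs, specs$mcat), function(rows) {
  list(
    total_rows = nrow(rows),
    brands = count_values(rows$brand),
    specs  = count_values(unlist(rows[c("spec1", "spec2")], use.names = FALSE))
  )
})

print(summaries$drill$brands)
```

Grouping with `split()` avoids comparing each row with the next one, so the last Mcat needs no special handling.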

Summer_Internship_Report_DurgaKant_Gupta

  • 1. Summer Internship Report Page 1 Summer Internship Report s 8th July, 2016 Submitted By Durga Kant Gupta (Roll No. 13267) Undergraduate student at IIT Kanpur Department: Biological Sciences and Bio Engineering Under The Guidance Of IndiaMART Guide: IndiaMART Co-Guide: Mr. Somesh Kumar, Mr. Anirudh Singh, VP Business Analytics, Asst. VP Business Analytics, IndiaMART InterMESH Ltd. IndiaMART InterMESH Ltd.
  • 2. Summer Internship Report Page 2 CONTENTS ACKNOWLEDGEMENT..........................................................................................................................4 ABOUT THE COMPANY ........................................................................................................................5 CORE VALUES: .....................................................................................................................................5 PRODUCTS:..........................................................................................................................................6 LISTING SERVICES: ...............................................................................................................................7 BUY LEADS:..........................................................................................................................................7 ACCESS TO SERVICE: ............................................................................................................................8 SOFTWARE OR LANGUAGES USED:.......................................................................................................9 DESCRIPTION OF PROJECTS / ACTIVITIES............................................................................................10 PROJECT#1 ........................................................................................................................................10 AIM: ..................................................................................................................................................10 PROCEDURE: .....................................................................................................................................10 COMPARISON WITH THE CURRENT SEARCH ALGORITHM: .................................................................13 
RESULT:.............................................................................................................................................15 PROJECT#2 ........................................................................................................................................16 AIM: ..................................................................................................................................................16 PROCEDURE: .....................................................................................................................................17 PROJECT#3 ........................................................................................................................................18 AIM: ..................................................................................................................................................18 DATA DESCRIPTION: ..........................................................................................................................18 PROCEDURE: .....................................................................................................................................19 PROJECT#4 ........................................................................................................................................20 Aim:...................................................................................................................................................20 PROCEDURE: .....................................................................................................................................20 PROJECT#5 ........................................................................................................................................20 PROCEDURE: .....................................................................................................................................21 
BIBLIOGRAPHY:..................................................................................................................................24 APPENDIX:......................................................................................................................................24
  • 3. Summer Internship Report Page 3 CERTIFICATE OF INTERNSHIP COMPLETION This is to certify that Mr. Durga Kant Gupta, a 3rd year undergraduate student of Biological Sciences and Bio Engineering department at Indian Institute of Technology, Kanpur has successfully completed his summer internship from 9th May, 2016 to 8th July 2016. During this period his performance was excellent and we found him dedicated, hardworking and sincere. We have derived immense benefit from the project and his contribution to our organization is highly appreciated. I hereby convey my best wishes to him for all his future endeavors. Somesh Kumar | VP - Business Analytics Mobile: +91-9717776552 Email: somesh.kumar@indiamart.com IndiaMART InterMESH Ltd. "Kaam Yahi Banta Hai" 7th Floor, Advant-Navis Business Park, Plot No -7 Sector-142, Noida - 201305 Ph: +91-(0120)-6777 777 Extn : 7787
  • 4. Summer Internship Report Page 4 ACKNOWLEDGEMENT I take this opportunity to extend my sincere thanks to IndiaMART for offering me a unique platform to gain exposure and garner knowledge in the field of Business Analytics. I would like to extend my heartfelt gratitude to my Internship guide Mr. Somesh Kumar and co-guide Mr. Anirudh Singh for having made my summer training a great learning experience by their constant guidance, encouragement and support. Last but not the least I would like to express my profound gratitude to each and every employee of Business Analytics Division, IndiaMART InterMESH Limited who contributed in their own ways in successful completion of my Internship. Durga Kant Gupta
  • 5. Summer Internship Report Page 5 ABOUT THE COMPANY IndiaMART is India‟s largest B2B online marketplace, connecting buyers with suppliers. The online channel focuses on providing a platform for buyers, who can be SMEs, large enterprises as well as individuals. Buyers typically gain access to a wider marketplace; diverse portfolios of quality products to choose from and tap a one-stop-shop which caters to all their specific requirements, thereby aiding the discerning buyer make well-informed choices! IndiaMART offers a platform and tools to over 2.6 crore buyers to search from over 3.3 crore products and get connected with over 22 lakh reliable and competitive suppliers. Founded in 1996, the company‟s mission is „to make doing businesses, easy‟. CORE VALUES: There are four Core values of IndiaMART, in short known as TRIP.  Team Work: “Together we can achieve the impossible” is our belief. Our success is a result of our team work. We have experts from the field of management, marketing, IT, arts, content & various other disciplines who work cordially as a team on every project, every endeavor. Dedication and passion are the true means to our mission fulfillment.  Responsible: Responsible, not just for quality work but for continuous self-development, of our decisions and of our actions. This helps us think rationally and provides a sense of accountability to ourselves, our commitment to customers and to our colleagues.  Integrity: We realize the importance of the job & information we handle. We understand the responsibility that each member of our team has to shoulder and we do that with highest levels of trust, honesty and integrity – of purpose and action.  Passion: Work at IndiaMART involves constant innovation and creativity. It involves a continuous thought process to get tangible benefits for our customers, taking into account the uniqueness of their purpose. 
Passionate people with a determination to make the difference are the ones who make this possible.
  • 6. Summer Internship Report Page 6 Customers are of two types:  Buyers: Users who use the service with an intention to buy something.  Suppliers: Users who use the service with an intention to sell something. A customer can be both Buyer and Supplier. Suppliers are of two types:  Free Listed Customers: Use basic service which is available at zero cost.  Paid Customers: Bought products and listing service by paying some cost. IndiaMART works on Freemium model. It earns revenue from the products, listing services and buy leads packages. PRODUCTS: IndiaMART offers following products:  MDC (Mini Dynamic Catalogue): IndiaMART develops compact 4 page home page showcasing key strengths of the customer, Zoom up Window for detailed product view and Preferred Number Service. Website hosted on sub domain. Also 10 Buy Leads/Tenders worth Rs. 2000 free every month under IndiaMART Advantage Program.  Maximiser: Website hosted on personalized domain. 360 degree visibility through PDF/Mobile Video (30 sec). 10 Buy Leads/ Tenders worth Rs. 2000 free every week under IndiaMART Advantage Program. Add up to 400 products. Preferred Number Service
  • 7. Summer Internship Report Page 7 LISTING SERVICES: The various listing services are as follows:  TrustSEAL: Third party verified TrustSEAL report. Edge over non-certified competitors online. Certified members attract genuine buyers & more business enquiries.  Star supplier: Priority listing among other catalog clients. Corporate video of supplier‟s company. Preferred Number Service. 15 Buy Leads/ Tenders worth Rs. 3000 free every week under IndiaMART Advantage Program  Leading supplier: Priority listing among all clients. Corporate video of supplier‟s company. Preferred Number service. 20 Buy Leads/Tenders worth Rs. 4000 free every week under IndiaMART Advantage Program.  Keyword Premium Listing: Listing of clients as per Keywords (for products to be bought) typed by buyers. One keyword can be bought by a single supplier only.  Featured Premium Listing: Listing of clients as per their preferred city for business.  Industry leader: Top priority listing service. Will always be listed on top whenever a product is searched related to that industry. Only one supplier can be an Industry leader of any particular industry. These listing services help in Search Engine Optimization which facilitates the visibility of suppliers on the platform. BUY LEADS: Buy Leads provide instant access to Buyers and their requirements. Buy Leads are generated through three ways:  Free buy requirement: Buy requirement made by buyers to IndiaMART.ss  Direct buy requirement: Buy requirement made by buyers directly to the suppliers.
  • 8. - Intent: Our system analyzes users' activity on the website and the application and infers their intent to buy a product, if any. It then creates Buy Leads and, after verification, posts them to suppliers' accounts.
    These leads are posted to the supplier's account, and the supplier can buy them as required. A Buy Leads package therefore works as a prepaid balance of leads in the supplier's account, which can be consumed at any time.
    ACCESS TO SERVICE:
    Customers access the service from both the website and the mobile application.
    IndiaMART Website: Suppliers can purchase any of the products or listing services for three different tenures: monthly, annual or 3-year (multi-year).
  • 9. SOFTWARE OR LANGUAGES USED:
    - R
    R is a free software environment and programming language for statistical analysis and graphics. It is widely used by statisticians and data miners for statistical analysis and for developing statistical software. R runs on a wide variety of UNIX platforms, Windows and MacOS. I used R for statistical analysis and text mining. Some of the packages used were stringr, stringdist and plyr. Version used: R 3.2.1
    - SQL
    I used SQL to extract the data required for the analysis from the online database system. SQL (Structured Query Language) is a standard interactive and programming language for getting information from a database. Queries take the form of a command language that lets you select, insert, update, find out the location of data, and so forth. There is also a programming interface.
  • 10. DESCRIPTION OF PROJECTS / ACTIVITIES
    PROJECT#1
    AIM:
    Analysis and implementation of an algorithm that maps a product to its most relevant Mcat by considering the best string match and the maximum number of leads for that Mcat in the previous three months.
    DATA DESCRIPTION:
    Product Name – The list of all product names to which the most relevant Mcat has to be assigned.
    Columns: PC_ITEM_GLUSR_ID, PC_ITEM_ID, PC_ITEM_NAME, MCAT_ID
    Lead Name – Contains the LEAD_OFR_TITLE against which each product name is matched; the corresponding MCAT_ID is stored in Match_Results.
    Columns: ETO_OFR_TITLE, MCAT_ID, ETO_OFR_GLCAT_MCAT_NAME
    Match Results – The output, with 7 columns holding the information on the best matches between OFR_TITLE and the product names.
    Match Results Final – Here the number of leads for the Mcat IDs selected by string matching is also considered. The two files are merged by GL_MCAT_ID and the result is sorted by the number of leads.
    PROCEDURE:
    1. First I removed "," and "( )" from PC_ITEM_NAME, and then the extra spaces left behind by that removal. This was done using regular expressions in R.
  • 11. 2. Then I removed "," and "( )" from ETO_OFR_TITLE, and then the extra spaces left behind by that removal. This was also done using regular expressions in R.
    3. To match PC_ITEM_NAME against LEAD_OFR_TITLE and measure how much of it matches, I had to break PC_ITEM_NAME into smaller fragments.
    4. So I split PC_ITEM_NAME into single words using base R's strsplit() function and put the output in a list. Splitted_Row is the list of split PC_ITEM_NAME values.
    5. Similarly, I split LEAD_OFR_TITLE and stored the output in a list. Splitted_OFR_ID is the list of split LEAD_OFR_TITLE values.
    6. I then stored each column of the Product_Name matrix as a vector wrapped in a list, so that its elements are easy to access.
    7. Similarly, I stored each column of the Lead_Name matrix as a vector wrapped in a list.
    8. I combined these into two datasets, product_list1 and Lead_list1, each of which is a list of lists.
    9. After this data preparation I started the loop. Before that I created an empty result table, Match_Results, with the columns OP_USR_ID, OP_ITEM_ID, OP_ITEM_NAME, OP_MCAT_ID, OP_LN_OFR_TITLE, OP_LN_MCAT_NAME and OP_LN_MCAT_ID, all initialized to zero.
    Loop:
    1. For a particular row of product_list1, which contains the split PC_ITEM_NAME, access its words one by one and check whether each matches any word in Lead_list1, which contains the split LEAD_OFR_TITLE.
    2. Once a match is found, increase the score s by one and move on to the next word of the same PC_ITEM_NAME.
    3. If the last word of the split PC_ITEM_NAME matches any fragment of the split LEAD_OFR_TITLE, increase t by 1.
  • 12. 4. Similarly, if the second-last word of the split PC_ITEM_NAME matches any fragment of the split LEAD_OFR_TITLE, increase v by 1.
    5. After checking all fragments of the split PC_ITEM_NAME against the split LEAD_OFR_TITLE, check the values of s, t and v.
    6. For a LEAD_OFR_TITLE to be considered a match it has to satisfy certain criteria. The values of t and v must not be 0, which means that the last and second-last words must both match.
    7. The next condition is that the percentage of PC_ITEM_NAME words matched must reach a threshold, which varies with the length of the PC_ITEM_NAME.
    8. After deciding whether this is a match or not, go to the next LEAD_OFR_TITLE and do the same. This has to be done for every LEAD_OFR_TITLE.
    9. If a particular LEAD_OFR_TITLE is considered a match, append it to OP_LN_OFR_TITLE, which starts as a null vector. Similarly, append the corresponding MCAT_ID and MCAT_NAME to the OP_LN_MCAT_ID and OP_LN_MCAT_NAME vectors respectively.
    10. Likewise, when a match is found for a particular PC_ITEM_NAME, append the corresponding MCAT_ID, PC_ITEM_NAME, GLUSR_ID and PC_ITEM_ID to the null vectors OP_MCAT_ID, OP_ITEM_NAME, OP_USR_ID and OP_ITEM_ID respectively, using rbind() to repeat these values once for every matching LEAD_OFR_TITLE.
    11. Use cbind() to combine all of the above vectors into Match_Results_K. This holds the result for one PC_ITEM_NAME only.
    12. Repeat this for every PC_ITEM_NAME and use rbind() to accumulate the final result in a dataframe named Match_Results.
    13. Before moving on to the next PC_ITEM_NAME, empty all the vectors so that they can store the values for the next PC_ITEM_NAME.
    14. Finally, remove the first row of zeroes from Match_Results and merge it with the Lead_Count data by MCAT_ID. The result, Match_Results_Final, has all the Match_Results data along with the number of leads corresponding to every PC_ITEM_NAME.
  • 13. 15. I then exported this data to CSV using write.csv() in R, and sorted the output in Excel by PC_ITEM_NAME with the number of leads as a second sort level.
    16. This gave the final output, which contains each PC_ITEM_NAME and all the LEAD_OFR_TITLE values considered a match, sorted by the number of leads. For a particular PC_ITEM_NAME, the LEAD_OFR_TITLE with the maximum number of leads comes on top.
    Output columns: OP_LN_MCAT_ID, OP_USR_ID, OP_ITEM_ID, OP_ITEM_NAME, OP_MCAT_ID, OP_LN_OFR_TITLE, OP_LN_MCAT_NAME, NO._OF_LEADS
    COMPARISON WITH THE CURRENT SEARCH ALGORITHM:
    When our algorithm, which maps a product to the Mcat that gets the maximum number of leads, was ready, Mr. Samarendra Pratap (AVP, Product Management, IndiaMART) gave us a list of 9000 products. We had to run our algorithm on these products and find the difference in the number of leads between the Mcat assigned by our algorithm and the Mcat assigned by the current search algorithm, which is live in the system.
    DATA DESCRIPTION:
    1. Samar_Products: Table containing all the products with more than one Mcat assigned. Elements in the first column are repeated instead of being left as "" so that merging is possible when required.
    2. Lead_Count: The master Mcat table, which contains the number of leads corresponding to every Mcat.
    3. paid_supplier_new_products_mcat1: All the products with more than one Mcat; items in the left column are repeated.
    4. Samar_Products_Max_Leads: Data containing the product and the corresponding Mcat which comes on top in search
  • 14. results when searched on the IndiaMART portal.
    5. Somesh_Final_Results: Data of all the products and all the corresponding Mcats with the number of leads.
    6. Somesh_Final_Max_Results: A subset of Somesh_Final_Results containing only the Mcats with the maximum leads.
    7. Samar_Max_Results: Data containing the number of leads for only the Mcat which comes on top in search results.
    8. paid_supplier_new_products_mcat: The original data set of 9k products with blank rows removed.
    9. Samar_Final_Max_Results: Data containing the product, the corresponding Mcat which comes on top in search results, and the lead count.
    10. paid_supplier_new_products_mcat2: The topmost Mcat corresponding to each product.
    LOOP:
    1. First I removed the products which have only one Mcat, because in that case no comparison can be made.
    2. To facilitate merging, I had to repeat the product in the first column for each of its Mcats. For this I checked whether the first column is blank while the Mcat column has a value; if so, I copied the product value from the previous row into the current cell.
    3. I then merged this with the Lead Count table, which contains the number of leads for each Mcat.
    4. The result is a table containing all the products with their corresponding Mcats and the number of leads.
    5. Finally, I kept only the Mcat with the maximum leads for each product.
  • 15. 6. Also create a table which contains, for each product, only the one Mcat which comes on top when searching on the IndiaMART portal.
    7. Merge this table with the Lead Count data by Mcat ID. We now have the product, the Mcat which comes on top when searched on the IndiaMART portal, and the corresponding number of leads.
    8. We can now compare the number of leads of the Mcat assigned by our algorithm with that of the Mcat which comes on top in search.
    RESULT:
    After analyzing and comparing the output files, it was found that the Mcat which comes on top in search receives 142 leads on average. If we instead apply our algorithm and assign the Mcat with the maximum leads to the same product, the average number of leads would be 321. The gain is therefore 179 leads per product, which should make suppliers happier and more engaged on the IndiaMART portal.
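    The matching rule described in the Loop steps of Project #1 can be sketched compactly in base R. This is only an illustration, not the code that was run: the helper name is_match is hypothetical, lower-casing the words is an added assumption (the appendix code compares words case-sensitively), and treating single-word product names as passing the second-last-word check and reusing the 0.6 threshold for names longer than six words are also assumptions. The thresholds themselves are the ones listed in the appendix.

```r
# Hypothetical helper (not part of the report's appendix): returns TRUE when
# a lead title counts as a match for a product name.
is_match <- function(product_name, lead_title) {
  # Turn punctuation into spaces, collapse spaces, lower-case, split into words
  words <- function(x) {
    x <- gsub("[^[:alnum:]]", " ", x)
    x <- trimws(gsub("\\s+", " ", x))
    strsplit(tolower(x), " ", fixed = TRUE)[[1]]
  }
  p <- words(product_name)
  l <- words(lead_title)
  J <- length(p)
  if (J == 0) return(FALSE)

  s <- sum(p %in% l)                                    # product words found in the title
  last_ok <- p[J] %in% l                                # last word must match
  second_last_ok <- if (J >= 2) p[J - 1] %in% l else TRUE  # assumption for J == 1

  # Length-dependent thresholds as in the appendix; 0.6 beyond six words is assumed
  thresholds <- c(1, 1, 0.65, 0.7, 0.8, 0.6)
  r <- thresholds[min(J, length(thresholds))]

  (s / J >= r || s >= 4) && last_ok && second_last_ok
}

is_match("Stainless Steel Pipe (Round)", "Round stainless steel pipe")
is_match("Stainless Steel Pipe", "Plastic water pipe")
```

    Using %in% on the word vectors replaces the two innermost loops of the appendix code, so each product and lead pair needs only one vectorized comparison.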
  • 16. PROJECT#2
    AIM:
    Analysis of the auto-rejection of intent-generated leads by matching their secondary Mcats against those of deleted leads, and estimation of the potential loss that would occur if the auto-rejection system were implemented.
    DATA DESCRIPTION:
    Deleted_total.csv
    The list of all the deleted leads with codes 1 and 45, which denote manual deletion and auto deletion. The corresponding deletion date is also given (1st to 7th May).
    Columns: FK_GLUSR_USR_ID, ETO_OFR_FENQ_DATE, FK_GLCAT_MCAT_ID
    Approved_total.csv
    The leads which were approved in the same period. From this we can infer the potential loss by matching the secondary Mcats.
    Secondary_Mcat.csv
    All the leads which were live during 1st to 7th May. From this data we can check whether there are leads with the same secondary Mcats as the deleted leads; that many leads could have been auto-rejected. We can also estimate the potential loss by finding leads with the same secondary Mcats as the approved leads; that many approved leads would have been auto-rejected if the auto-rejection system were implemented.
    mydata
    The output file, which has the following columns: USR_ID, OFR_ID, MCAT_ID
  • 17. PROCEDURE:
    To find the leads which could have been auto-rejected:
    1. To find all the leads which could be auto-rejected, I first looked for live leads whose secondary Mcat is the same as the Mcat of a deleted lead.
    2. I then checked a second condition: leads with a secondary Mcat match must also have the same GLUSR_ID.
    3. The next condition was that the lead must have been offered before the deletion.
    4. Since the Secondary_Mcat table also contains some primary Mcats, the last condition was to ignore any case where the match was on a primary Mcat.
    To check for potential loss:
    1. Here I followed the same steps as above. The only change is to use the Approved table instead of the Deleted table.
    2. The Approved table contains all the approved leads. I applied the same conditions as above and found the leads with matching secondary Mcats.
    3. These matches imply that this many leads would be auto-rejected if the auto-rejection system were implemented. This gives the potential loss.
    4. For both activities the output has the format described in the data description (USR_ID, OFR_ID, MCAT_ID).
    RESULT:
    1. The number of deleted leads which could have been auto-rejected was found to be 2690.
    2. The number of approved leads which would have been auto-rejected by the auto-rejection system was found to be 8440.
    3. The loss due to the rejection of good leads is therefore greater than the cost saved by rejecting false positive leads, so it was decided not to implement the auto-rejection system.
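    The four conditions above can also be expressed as a single merge followed by a filter, instead of a nested loop over both tables. The sketch below is a minimal base R illustration on made-up data; the column names DEL_DATE, OFR_DATE and PRIMARY_MCAT and all values are hypothetical stand-ins, not the actual table layout.

```r
# Toy stand-in for the deleted leads (hypothetical values)
deleted <- data.frame(
  USR_ID   = c(101, 102),
  DEL_DATE = as.Date(c("2016-05-03", "2016-05-05")),
  MCAT_ID  = c(11, 22)
)

# Toy stand-in for the live leads with their secondary and primary Mcats
secondary <- data.frame(
  USR_ID       = c(101, 101, 102, 103),
  OFR_ID       = c(1, 2, 3, 4),
  MCAT_ID      = c(11, 11, 22, 11),
  OFR_DATE     = as.Date(c("2016-05-01", "2016-05-06", "2016-05-02", "2016-05-01")),
  PRIMARY_MCAT = c(99, 99, 22, 99)
)

# Conditions 1 and 2: same Mcat and same user
candidates <- merge(deleted, secondary, by = c("USR_ID", "MCAT_ID"))

# Condition 3: offered on or before the deletion date
# Condition 4: ignore rows where the matched Mcat is the lead's primary Mcat
auto_reject <- subset(
  candidates,
  OFR_DATE <= DEL_DATE & MCAT_ID != PRIMARY_MCAT,
  select = c(USR_ID, OFR_ID, MCAT_ID)
)
print(auto_reject)
```

    Running the same two steps with the approved leads in place of the deleted ones would give the potential-loss table. On large tables a merge of this kind is usually much faster than comparing every pair of rows in R loops.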
  • 18. PROJECT#3
    AIM:
    For each ticket raised by a customer, find which of the standard issues are present in the ticket description by string matching. The expected output is a matrix in which each row corresponds to a Ticket_ID and each column corresponds to a standard issue that could be the reason for the ticket. A column contains 1 if that issue is present, else 0.
    DATA DESCRIPTION:
    Somesh_Ticket: Data on the tickets raised by suppliers. It contains their user IDs, customer ticket IDs, the date of ticket issue, ticket detail, addendum (the APPENDUM column) and the history of the ticket.
    Columns: GLID, TICKET_ID, ISSUE_DATE, TICKET_DETAIL, APPENDUM, TICKET_HISTORY
    Result: Output table containing all the above columns, plus new columns holding 1 or 0. Each new column captures one particular issue, if it is present in the text provided by the customer. The new columns are:
    STOP, NO_BENEFIT, NO_MATURITY, NO_TIME_TO_USE, FAKE_BUYERS, IRRELEVANT_ENQUIRIES, FAKE_BL, NOTICE_PERIOD, BUSINESS_CLOSED, HYPER_LOCAL_BUYER, MISCOMMITMENT, PHYSICAL_VISIT, NOT_TECH_SAVY, BUYER_WANT_LOW_PRICE, LANGUAGE_BARRIER, CHANGES_REQUIRED, WRONG_PRODUCT, WRONG_IMAGE, WRONG_CATALOG, CHANGE_NUMBER,
  • 19. CHANGE_EMAIL, CLOSE_ACCOUNT, NOT_INTERESTED, REMOVE_PRICE, DIFFERENT_CATALOG, CHANGE_AFTER_APPROVAL
    PROCEDURE:
    1. First I created an empty dataframe with the column names listed above, so that they represent the different standard issues, and initially assigned them the value 0.
    2. The list of standard issues is as follows:
       1. "stop the service/ Deactivate the service"
       2. "Did not get benefit/ No Benefit/Did not get Business"
       3. "Did not get maturity/ No Maturity/ Maturity Issues/Deal Not Maturing"
       4. "No time to use the service"
       5. "Buyers not responding/fake Buyers/Fake Leads"
       6. "Irrelevant Enquiries/ Less Enquiries/Bulk Enquiries/Low Enquiry/Retail Enquiry/Wrong Enquiry" (Enquiry/Inquiry/Query)
       7. "Wrong Buy Leads/Fake Buy Lead/ Wrong BL" (Buy Lead = BL)
       8. "Notice Period"
       9. "Business Closed/Out of India/Partnership Issue/Personal reason/Changed My Business"
       10. "Need hyper-local buyer/ Hyper local enquiries"
       11. "Wrong commitment from sales/Miscommitment/mis-commitment"
       12. "physical visit"
       13. "Client is not Tech Savy/Tech Savvy/ Computer Savvy"
       14. "Buyer quote very less price/Buyer asking Low Price/Buyer want low price"
       15. "Language Barrier/Tamil"
       16. Changes required
       17. Wrong Product
       18. Wrong Image
       19. Wrong Catalog
       20. Change number
       21. Change email
       22. Close the account
       23. Not interested
       24. Remove Price
       25. not the same catalog that i approved
       26. change after approval/ change after hosting approval
    3. In every ticket description I checked for the above strings using the grepl() function in R. Each string has a corresponding column in the output dataframe.
  • 20. 4. If the string was found, I put 1 in that column for that row; otherwise I put 0.
    PROJECT#4
    Aim: To predict whether a customer is going to renew their subscription or not.
    DATA DESCRIPTION:
    Complete file.csv
    Data containing information about the customers, e.g. their turnover, the state from which they operate, how many employees they have, and their business type (manufacturer, wholesale trader, service provider etc.).
    PROCEDURE:
    - Import the data file into R.
    - Treat the columns in the input as the deciding variables.
    - Create a model to predict whether an existing customer is going to renew their subscription at the end of the subscription cycle.
    - Use the C5.0 decision tree in R to create a large set of rules, which are used for the final predictions as stated above.
    PROJECT#5
    Aim: To find the most recurring brands for a particular Mcat so that they can form a separate category. Also to find the most asked-for specifications for products, so that only these specifications are made compulsory for the agent to enquire about and the less important ones can be dropped to reduce the calling cost.
    DATA DESCRIPTION:
    Direct_Text: Table containing only 2 columns, i.e. the lead description and the Mcat of direct leads. The description is in text format and contains all the brands and specifications which we have to find.
    Brand_Data:
  • 21. The table which contains the description of the lead, the brand name and the Mcat name (3 columns). The brand names (2nd column) were extracted from the 1st column of the above table.
    Brand_Data_Result: The output table which contains, along with the 3 columns of Brand_Data mentioned above, all the specifications corresponding to that lead, e.g. size, quantity, budget etc.
    PROCEDURE:
    To find the brand names:
    1. I was first instructed to look for the brand names in the lead description text.
    2. I considered only two columns, namely ETO_OFR_DESC and MCAT.
    3. In the description column of this new sheet I searched for the word "Brand", to find the leads in which brands were actually mentioned.
    4. I then took the text after the word "Brand" up to the start of the next line. This text can contain multiple brand names separated by "and", a comma or "or".
    5. Splitting the text on these separators separates all the brands.
    6. Put each of these brands, with the corresponding Mcat, in a different row.
    7. Remove the rows which contain wrongly captured brand names, e.g. 'any', 'other' and 'all'.
    8. Remove the brands which start with a number, except "3m", because that is a brand.
    9. At this point we have all the genuine brands and the corresponding Mcats. The output has the following columns: Description, Brand_Name, Mcat_Name
    To find the specifications:
    A similar procedure was followed here. I first split the text by ":" using base R's strsplit() function, because almost all the specifications have this in common, e.g. Budget: 50000 INR, Model: 5690 etc.
    1. The split text is now in list format. Check whether the length of this list is less than 2. If so, keep only the first three columns in the final output. Otherwise split the list elements by "\n".
    2. After doing this, access the elements of the list one by one and attach the last word of each string to the first word of the next string. This can be done with the paste() function in R.
  • 22. 3. Put the result of the attachment in a separate column of the result. It was decided that the number of columns can go up to 10, so that almost all the specifications can be captured.
    4. In a particular lead the word "Brand:" can be anywhere: before, after or in between the specifications. To keep the specifications separate, I looked for the word "Brand:" in all the columns of a row, and wherever I found it I swapped that value with the value in the 4th column, so that all the specifications fall in the 5th to 10th columns and not before.
    5. The header of the output has the following format. The 2nd and the 4th columns hold the same value, except that the brand name in the 4th column comes after "Brand:".
    Columns: Description, Brand, Mcat, Brand:, Spec1, Spec2, Spec3, Spec4, Spec5, Spec6
    The summary of the results:
    1. Using a pivot table, we generated a summary giving the counts of all the Mcats mapped to a particular brand.
    2. Similarly, we created another summary sheet containing the counts of brands mapped to a particular Mcat.
    To find the count of brands and specifications for a particular Mcat:
    In this part the required output has the following columns: Mcat_Name, Count_Of_All_Brands, Brand_Name, Individual_Brand_Count, Specifications, Specification_Freq
    1. The output of the previous activity is used as the input for this one, with certain modifications.
    2. The first modification was to remove the values attached to the specifications. For example, in "Budget: 50000 INR" I had to remove the value after ":" so that only the specification name remains and can be counted easily.
    3. For this I split the specification text by ":" using the strsplit() function in R, which returns a list. In this list the first element is the specification name, like "budget", and the second
  • 23. element is the value of that specification, i.e. "50000 INR". The first element is what we needed in place of the whole text.
    4. The specification names also had to be converted to lower case, so that "Budget" and "budget" are not treated as different when converted into factors and give a cumulative count.
    5. I then removed those specifications which were purely numeric and had occurred due to errors.
    6. This was all the data preparation needed for this activity. I then created an empty dataframe called Result with the 6 columns mentioned above, all initialized to zero.
    7. The specification file has the following columns: Mcat, Brand:, Spec1, Spec2, Spec3, Spec4, Spec5, Spec6
    8. To find the total count of brands, I checked how many times a particular Mcat is repeated in the first column.
    9. Put all the specifications in a vector b. It contains repeated specifications, because all the specifications for every brand of a particular Mcat are in this vector.
    10. Put all the brands in a vector a. It contains repeated brands, because all the brands for a particular Mcat are in this vector.
    11. We then have to find the individual brand counts and individual specification counts for a particular Mcat.
    12. For this I used the count() function from the plyr package in R, which returns a dataframe of the frequencies of the different values.
    13. I converted the elements of vectors a and b to factors so that count() can work and return a dataframe containing the frequencies. These dataframes are named summary_a and summary_b.
    14. Unfactor these dataframes using the unfactor() function from the varhandle package in R, so that their elements can be accessed and put into the final result sheet.
    15. To know how many rows are required in the final result for a particular Mcat, find the maximum of length(summary_a[[1]]), length(summary_b[[1]]) and count.
    16. Finally, put the values from summary_a and summary_b in the final result dataframe, along with their total and individual counts for the particular Mcat.
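    The counting steps for a single Mcat can be illustrated with base R alone. The sketch below is a simplified stand-in for the plyr and varhandle based procedure described above, not the code that was run: it uses table() in place of count(), and the dataframe leads, its values and the column SpecName are made up for the example.

```r
# Toy input: one row per lead, with a brand and one specification (hypothetical data)
leads <- data.frame(
  Mcat  = c("Air Conditioner", "Air Conditioner", "Air Conditioner", "Mobile Phone"),
  Brand = c("LG", "LG", "Voltas", "Samsung"),
  Spec1 = c("Budget: 50000 INR", "Capacity: 1.5 ton", "budget: 30000 INR", "Model: 5690"),
  stringsAsFactors = FALSE
)

# Keep only the specification name (the text before ":") in lower case,
# so that "Budget" and "budget" are counted together
leads$SpecName <- tolower(trimws(sub(":.*$", "", leads$Spec1)))

# Restrict to one Mcat and count brands and specification names
ac <- leads[leads$Mcat == "Air Conditioner", ]
brand_counts <- as.data.frame(table(Brand = ac$Brand), stringsAsFactors = FALSE)
spec_counts  <- as.data.frame(table(Specification = ac$SpecName), stringsAsFactors = FALSE)

total_brands <- sum(brand_counts$Freq)   # Count_Of_All_Brands for this Mcat
print(brand_counts)
print(spec_counts)
```

    Because as.data.frame() is called with stringsAsFactors = FALSE, the resulting tables hold plain character values, so no separate unfactoring step is needed in this variant.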
  • 24. BIBLIOGRAPHY:
    I referred to the following books, which provided much of the guidance for the projects. Apart from domain knowledge, they gave deep insight into the subject.
    BOOKS:
    - R for Programmers by Norman Matloff
    - Introducing Python by Bill Lubanovic
    APPENDIX:
    # Project 1: Part 1
    library(stringr)  # provides str_replace_all()
    # Replace , and ( ) (and any other non-alphanumeric character) in the product names with a space
    Product_Name[, 3] <- str_replace_all(Product_Name[, 3], "[^[:alnum:]]", " ")
    # Then collapse the extra spaces produced by that replacement
    Product_Name[, 3] <- gsub(pattern = "\\s+", replacement = " ", Product_Name[, 3])
    # Do the same for the lead names
    Lead_Name[, 1] <- str_replace_all(Lead_Name[, 1], "[^[:alnum:]]", " ")
    Lead_Name[, 1] <- gsub(pattern = "\\s+", replacement = " ", Lead_Name[, 1])
    # Split the product names into words; the result is kept in list format
    test = 0
    for (i in 1:nrow(Product_Name)) {  # one iteration per product row
      # for (i in 1:20) {
      print(i)
      test1 = (strsplit(Product_Name[i, 3], " "))
      test = rbind(test, test1)
  • 25. }
    Splitted_Row = test[-1]  # because the first row is 0
    # Splitted_OFR_ID is a list of split offer titles
    test2 = 0
    for (i in 1:nrow(Lead_Name)) {  # one iteration per lead row
      print(i)
      test1 = (strsplit(Lead_Name[i, 1], " "))
      test2 = rbind(test2, test1)
    }
    Splitted_OFR_ID = test2[-1]  # because the first row is 0
    # Wrap each column of the Product_Name matrix in a list so that its elements are easy to access
    PN_USR_ID = list(Product_Name[, 1])
    PN_ITEM_ID = list(Product_Name[, 2])
    PN_ITEM_NAME = list(Product_Name[, 3])
    PN_MCAT_ID = list(Product_Name[, 4])
    # Wrap each column of the Lead_Name matrix in a list in the same way
    LN_OFR_TITLE = list(Lead_Name[, 1])
    LN_MCAT_ID = list(Lead_Name[, 2])
    LN_MCAT_NAME = list(Lead_Name[, 3])
    # Combining the data
    product_list1 = list(PN_USR_ID = PN_USR_ID, PN_ITEM_ID = PN_ITEM_ID,
                         PN_ITEM_NAME = PN_ITEM_NAME, PN_MCAT_ID = PN_MCAT_ID,
                         pn_splitted = Splitted_Row)
    Lead_list1 = list(LN_OFR_TITLE = LN_OFR_TITLE, LN_MCAT_ID = LN_MCAT_ID,
  • 26.               LN_MCAT_NAME = LN_MCAT_NAME, ln_id_splitted = Splitted_OFR_ID)
    # Loop to search for matches between Splitted_Row and Splitted_OFR_ID
    # Initialization values
    s = 0
    t = 0
    v = 0
    J = 0
    flag = 0
    count = 0
    count1 = 0
    Match_Results = cbind(OP_USR_ID = 0, OP_ITEM_ID = 0, OP_ITEM_NAME = 0, OP_MCAT_ID = 0,
                          OP_LN_OFR_TITLE = 0, OP_LN_MCAT_NAME = 0, OP_LN_MCAT_ID = 0)
    for (i in 1:300)
    # for (i in 1:length(product_list1$pn_splitted))
    {
      print(i)
      count1 = 0
      # print(length(product_list1$pn_splitted[[i]]))
      J = length(product_list1$pn_splitted[[i]])
      for (k in 1:length(Lead_list1$ln_id_splitted)) {
        s = 0
  • 27.   t = 0
        v = 0  # reset together with s and t so that counts from earlier leads do not carry over
        for (j in 1:length(product_list1$pn_splitted[[i]])) {
          for (l in 1:length(Lead_list1$ln_id_splitted[[k]])) {
            # compulsory match for last word
            if (identical(product_list1$pn_splitted[[i]][J], Lead_list1$ln_id_splitted[[k]][l]) == TRUE) {
              t = t + 1
            }
            # compulsory match for second last word
            if (identical(product_list1$pn_splitted[[i]][J-1], Lead_list1$ln_id_splitted[[k]][l]) == TRUE) {
              v = v + 1
            }
            # count each product word at most once, then move to the next word
            if (identical(product_list1$pn_splitted[[i]][j], Lead_list1$ln_id_splitted[[k]][l]) == TRUE) {
              s = s + 1
              break
              # print(s)
            }
          }
        }
        # r is the required share of matched words; it depends on the length J of the product name
        if (J == 1) { r = 1 }
        if (J == 2) { r = 1 }
  • 28.   if (J == 3) { r = .65 }
        if (J == 4) { r = .7 }
        if (J == 5) { r = .8 }
        if (J == 6) { r = .6 }
        # note: r is left unchanged when J > 6
        if ((s/J >= r | s >= 4) & t != 0 & v != 0) {
          print(k)
          count1 = count1 + 1
          if (flag == 0) {
            # Returning mcat id of that item
            OP_MCAT_ID = product_list1$PN_MCAT_ID[[1]][i]
            # Returning item name of that item
            OP_ITEM_NAME = product_list1$PN_ITEM_NAME[[1]][i]
            # Returning user id of that item
            OP_USR_ID = product_list1$PN_USR_ID[[1]][i]
            # Returning item id of that item
            op_ITEM_ID = product_list1$PN_ITEM_ID[[1]][i]
  • 29.       # Returning OFFER TITLE
            OP_LN_OFR_TITLE = Lead_list1$LN_OFR_TITLE[[1]][k]
            # Returning Lead Mcat Id
            OP_LN_MCAT_ID = Lead_list1$LN_MCAT_ID[[1]][k]
            # Returning Lead Mcat Name
            OP_LN_MCAT_NAME = Lead_list1$LN_MCAT_NAME[[1]][k]
            flag = 1
          } else {
            # Returning mcat Id of that item
            OP_MCAT_ID1 = product_list1$PN_MCAT_ID[[1]][i]
            OP_MCAT_ID = rbind(OP_MCAT_ID, OP_MCAT_ID1)
            # Returning item name of that item
            OP_ITEM_NAME1 = product_list1$PN_ITEM_NAME[[1]][i]
            OP_ITEM_NAME = rbind(OP_ITEM_NAME, OP_ITEM_NAME1)
            # Returning user id of that item
            OP_USR_ID1 = product_list1$PN_USR_ID[[1]][i]
            OP_USR_ID = rbind(OP_USR_ID, OP_USR_ID1)
            # Returning item id of that item
            op_ITEM_ID1 = product_list1$PN_ITEM_ID[[1]][i]
            op_ITEM_ID = rbind(op_ITEM_ID, op_ITEM_ID1)
  • 30.       # Returning OFFER TITLE
            OP_LN_OFR_TITLE1 = Lead_list1$LN_OFR_TITLE[[1]][k]
            OP_LN_OFR_TITLE = rbind(OP_LN_OFR_TITLE, OP_LN_OFR_TITLE1)
            # Returning Lead Mcat Id (as a number)
            MCAT_NUM = as.numeric(Lead_list1$LN_MCAT_ID[[1]][k])
            OP_LN_MCAT_ID1 = MCAT_NUM
            OP_LN_MCAT_ID = rbind(OP_LN_MCAT_ID, OP_LN_MCAT_ID1)
            # Returning Lead Mcat Name
            OP_LN_MCAT_NAME1 = Lead_list1$LN_MCAT_NAME[[1]][k]
            OP_LN_MCAT_NAME = rbind(OP_LN_MCAT_NAME, OP_LN_MCAT_NAME1)
          }
        }
      }
      # Combine the matches found for this product, keeping one row per lead Mcat ID
      Match_Results_K = cbind(OP_USR_ID, op_ITEM_ID, OP_ITEM_NAME, OP_MCAT_ID,
                              OP_LN_OFR_TITLE, OP_LN_MCAT_NAME, OP_LN_MCAT_ID)
      Match_Results_K <- subset(Match_Results_K, !duplicated(Match_Results_K[,7]))
      Match_Results = rbind(Match_Results, Match_Results_K)
      # Empty the vectors before moving on to the next product
      OP_LN_OFR_TITLE = NULL
      OP_LN_MCAT_ID = NULL
      OP_LN_MCAT_NAME = NULL
      OP_USR_ID = NULL
      OP_MCAT_ID = NULL
      op_ITEM_ID = NULL
      OP_ITEM_NAME = NULL
      # Counting the products for which there is no match
  • 31.   if (count1 == 0) {
        count = count + 1
      }
    }
    # To remove the first row of zeroes from the result
    Match_Results = Match_Results[-1, ]
    # Finally putting the no. of leads corresponding to different Mcat IDs
    # in the Match_Results, using the merge function
    Match_Results_final = merge(Match_Results, Lead_Count, by.x = "OP_LN_MCAT_ID", by.y = "GLCAT_MCAT_ID")
    # write.csv(Match_Results_final, "Match_Results_83.csv")
    Part 2: Comparison
    # To remove the products with only one Mcat
    for (i in 1:length(paid_supplier_new_products_mcat[[1]]))
    # for (i in 29464:29469)
    {
      print(i)
      if (paid_supplier_new_products_mcat[i,1] != "" & paid_supplier_new_products_mcat[i+1,1] != "") {
        paid_supplier_new_products_mcat[i,1] = ""
        paid_supplier_new_products_mcat[i,2] = ""
      }
    }
    # Remove the blank rows in Excel
  • 32. write.csv(paid_supplier_new_products_mcat, "Samar_Products.csv")
    # To facilitate merging, the product should repeat in the 1st column for the corresponding Mcats
    for (i in 1:length(paid_supplier_new_products_mcat1[[1]])) {
      print(i)
      if (paid_supplier_new_products_mcat1[i,1] == "") {
        paid_supplier_new_products_mcat1[i,1] = paid_supplier_new_products_mcat1[i-1,1]
      }
    }
    Somesh_Results1 = merge(paid_supplier_new_products_mcat1, Lead_Count, by.x = "Mcat", by.y = "GLCAT_MCAT_NAME")
    write.csv(Samar_Final_Max_Results, "Samar_Final_Max_Results.csv")
    # To interchange the 1st and 2nd columns
    for (i in 1:length(Results[[1]])) {
      print(i)
      temp = Results[i,1]
      Results[i,1] = Results[i,2]
      Results[i,2] = temp
    }
    colnames(Results) = c("Product", "Mcat", "GLCAT_MCAT_ID", "JFM.Approved")
    write.csv(Results, "Somesh_Final_Results.csv")
    write.csv(Samar_Final_Max_Results, "Samar_Final_Max_Results.csv")
    # To consider only the maximum leads Mcats
  • 33. for (i in 1:length(paid_supplier_new_products_mcat1[[1]])) {
      print(i)
      if (paid_supplier_new_products_mcat1[i,1] == "") {
        paid_supplier_new_products_mcat1[i,2] = ""
      }
    }
    write.csv(paid_supplier_new_products_mcat1, "test.csv")
    write.csv(Somesh_Results1, "Samar_Final_Max_Results.csv")
    # To consider only the maximum leads Mcats
    for (i in 1:length(paid_supplier_new_products_mcat2[[1]])) {
      print(i)
      if (paid_supplier_new_products_mcat2[i,1] == "") {
        paid_supplier_new_products_mcat2[i,2] = ""
      }
    }
    write.csv(paid_supplier_new_products_mcat2, "paid_supplier_new_products_mcat2.csv")
    Samar_Final_Max_Results = merge(paid_supplier_new_products_mcat2, Lead_Count, by.x = "Mcat", by.y = "GLCAT_MCAT_NAME")
    Somesh_Results1 = merge(paid_supplier_new_products_mcat1, Lead_Count, by.x = "Mcat", by.y = "GLCAT_MCAT_NAME")
    Somesh_Final_Results = Somesh_Results1
    write.csv(Somesh_Final_Max_Results, "Somesh_Final_Max_Results.csv")
    Results = merge(Samar_Products, Lead_Count, by.x = "Mcat", by.y = "GLCAT_MCAT_NAME")
  • 34.
    write.csv(Results, "Somesh_Results.csv")
    # Code for trimming
    for (i in 1:length(paid_supplier_new_products_mcat1[[1]])) {
      print(i)
      paid_supplier_new_products_mcat1[i, 2] = trimws(paid_supplier_new_products_mcat1[i, 2])
    }
    for (i in 1:length(Samar_Products[[1]])) {
      print(i)
      Samar_Products[i, 2] = trimws(Samar_Products[i, 2])
    }

    Project #2:
    Loop to search in deleted leads data:
    for (i in 1:length(deleted_total[, 1])) {
      for (j in 1:length(secondary_mcat[, 1])) {
        if (deleted_total[i, 1] == secondary_mcat[j, 1] &&
            deleted_total[i, 3] == secondary_mcat[j, 3] &&
            secondary_mcat[j, 4] <= deleted_total[i, 2] &&
            secondary_mcat[j, 3] != secondary_mcat[j, 5]) {
          print(i)
          USR_ID = secondary_mcat[j, 1]
          OFR_ID = secondary_mcat[j, 2]
  • 35.
          MCAT_ID = secondary_mcat[j, 3]
          DEL_REASON = deleted_total[i, 4]
          mydata1 = cbind(USR_ID, OFR_ID, MCAT_ID, DEL_REASON)
          mydata = rbind(mydata, mydata1)
        }
      }
    }

    Loop to search in Approved leads data:
    mydata = cbind(USR_ID = 0, OFR_ID = 0, MCAT_ID = 0)
    for (i in 1:(length(approved_total[, 1]))) {
      print(i)
      for (j in 1:length(secondary_mcat[, 1])) {
        if (approved_total[i, 1] == secondary_mcat[j, 1] &&
            approved_total[i, 3] == secondary_mcat[j, 3] &&
            secondary_mcat[j, 4] <= approved_total[i, 2] &&
            secondary_mcat[j, 3] != secondary_mcat[j, 5]) {
          USR_ID = secondary_mcat[j, 1]
          OFR_ID = secondary_mcat[j, 2]
          MCAT_ID = secondary_mcat[j, 3]
          mydata1 = cbind(USR_ID, OFR_ID, MCAT_ID)
          mydata = rbind(mydata, mydata1)
        }
      }
    }

    Project #3:
    Result = data.frame()
  • 36.
    GLID = 0
    CUSTOMER_TICKET_ID = 0
    CUSTOMER_TICKET_ISSUE_DATE = 0
    CUSTOMER_TICKET_DETAIL = 0
    APPENDUM = 0
    TICKET_HISTORY = 0
    STOP = 0
    NO_BENEFIT = 0
    NO_MATURITY = 0
    NO_TIME_TO_USE = 0
    FAKE_BUYERS = 0
    IRRELEVANT_ENQUIRIES = 0
    FAKE_BL = 0
    NOTICE_PERIOD = 0
    BUSINESS_CLOSED = 0
    HYPER_LOCAL_BUYER = 0
    MISCOMMITMENT = 0
    PHYSICAL_VISIT = 0
    NOT_TECH_SAVY = 0
    BUYER_WANT_LOW_PRICE = 0
    LANGUAGE_BARRIER = 0
    CHANGES_REQUIRED = 0
    WRONG_PRODUCT = 0
  • 37.
    WRONG_IMAGE = 0
    WRONG_CATALOG = 0
    CHANGE_NUMBER = 0
    CHANGE_EMAIL = 0
    CLOSE_ACCOUNT = 0
    NOT_INTERESTED = 0
    REMOVE_PRICE = 0
    DIFFERENT_CATALOG = 0
    CHANGE_AFTER_APPROVAL = 0
    for (i in 1:length(Somesh_Ticket[[1]]))
    #for(i in 1:10000)
    {
      print(i)
      GLID = Somesh_Ticket[i, 1]
      CUSTOMER_TICKET_ID = Somesh_Ticket[i, 2]
      CUSTOMER_TICKET_ISSUE_DATE = Somesh_Ticket[i, 3]
      CUSTOMER_TICKET_DETAIL = Somesh_Ticket[i, 4]
      APPENDUM = Somesh_Ticket[i, 5]
      TICKET_HISTORY = Somesh_Ticket[i, 6]
      a = paste(Somesh_Ticket[i, 4], Somesh_Ticket[i, 5], Somesh_Ticket[i, 6])
      #1
      if (grepl("stop the service", a, ignore.case = TRUE)) {
        STOP = 1
      } else if (grepl("Deactivate the service", a, ignore.case = TRUE))
  • 38.
      {
        STOP = 1
      } else {
        STOP = 0
      }
      #2
      if (grepl("Did not get benefit", a, ignore.case = TRUE)) {
        NO_BENEFIT = 1
      } else if (grepl("No Benefit", a, ignore.case = TRUE)) {
        NO_BENEFIT = 1
      } else if (grepl("Did not get Business", a, ignore.case = TRUE)) {
        NO_BENEFIT = 1
      } else {
        NO_BENEFIT = 0
      }
      #3
      if (grepl("Did not get maturity", a, ignore.case = TRUE)) {
        NO_MATURITY = 1
      } else if (grepl("No Maturity", a, ignore.case = TRUE)) {
        NO_MATURITY = 1
      } else if (grepl("Maturity Issues", a, ignore.case = TRUE)) {
        NO_MATURITY = 1
      } else if (grepl("Deal Not Maturing", a, ignore.case = TRUE))
  • 39.
      {
        NO_MATURITY = 1
      } else {
        NO_MATURITY = 0
      }
      #4
      if (grepl("No time to use the service", a, ignore.case = TRUE)) {
        NO_TIME_TO_USE = 1
      } else {
        NO_TIME_TO_USE = 0
      }
      #5
      if (grepl("Buyers not responding", a, ignore.case = TRUE)) {
        FAKE_BUYERS = 1
        #FAKE_BUYERS = rbind(FAKE_BUYERS, FAKE_BUYERS1)
      } else if (grepl("fake Buyers", a, ignore.case = TRUE)) {
        FAKE_BUYERS = 1
      } else if (grepl("Fake Leads", a, ignore.case = TRUE)) {
        FAKE_BUYERS = 1
      } else
  • 40.
      {
        FAKE_BUYERS = 0
      }
      #6
      # The flag uses the same upper-case name as the column initialised and exported in Result1
      if (grepl("Irrelevant Enquiries", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Less Enquiries", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Bulk Enquiries", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Low Enquiry", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Retail Enquiry", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Wrong Enquiry", a, ignore.case = TRUE)) {
  • 41.
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Irrelevant inquiries", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Less inquiries", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Bulk inquiries", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Low inquiry", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Retail inquiry", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Wrong inquiry", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Irrelevant queries", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Less queries", a, ignore.case = TRUE)) {
  • 42.
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Bulk queries", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Low query", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Retail query", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else if (grepl("Wrong query", a, ignore.case = TRUE)) {
        IRRELEVANT_ENQUIRIES = 1
      } else {
        IRRELEVANT_ENQUIRIES = 0
      }
      #7
      if (grepl("Wrong Buy Leads", a, ignore.case = TRUE)) {
  • 43.
        FAKE_BL = 1
      } else if (grepl("Fake Buy Lead", a, ignore.case = TRUE)) {
        FAKE_BL = 1
      } else if (grepl("Wrong BL", a, ignore.case = TRUE)) {
        FAKE_BL = 1
      } else {
        FAKE_BL = 0
      }
      #8
      if (grepl("Notice Period", a, ignore.case = TRUE)) {
        NOTICE_PERIOD = 1
      } else {
        NOTICE_PERIOD = 0
      }
      #9
      if (grepl("Business Closed", a, ignore.case = TRUE)) {
        BUSINESS_CLOSED = 1
      } else if (grepl("Out of India", a, ignore.case = TRUE)) {
  • 44.
        BUSINESS_CLOSED = 1
      } else if (grepl("Partnership Issue", a, ignore.case = TRUE)) {
        BUSINESS_CLOSED = 1
      } else if (grepl("Personal reason", a, ignore.case = TRUE)) {
        BUSINESS_CLOSED = 1
      } else if (grepl("Changed My Business", a, ignore.case = TRUE)) {
        BUSINESS_CLOSED = 1
      } else {
        BUSINESS_CLOSED = 0
      }
      #10
      if (grepl("Need local buyer", a, ignore.case = TRUE)) {
        HYPER_LOCAL_BUYER = 1
      } else if (grepl("local enquiries", a, ignore.case = TRUE)) {
        HYPER_LOCAL_BUYER = 1
      } else {
  • 45.
        HYPER_LOCAL_BUYER = 0
      }
      #11
      if (grepl("Wrong commitment from sales", a, ignore.case = TRUE)) {
        MISCOMMITMENT = 1
      } else if (grepl("Miscommitment", a, ignore.case = TRUE)) {
        MISCOMMITMENT = 1
      } else if (grepl("mis-commitment", a, ignore.case = TRUE)) {
        MISCOMMITMENT = 1
      } else {
        MISCOMMITMENT = 0
      }
      #12
      if (grepl("physical visit", a, ignore.case = TRUE)) {
        PHYSICAL_VISIT = 1
      } else {
        PHYSICAL_VISIT = 0
      }
      #13
      if (grepl("Client is not Tech Savy", a, ignore.case = TRUE))
  • 46.
      {
        NOT_TECH_SAVY = 1
      } else if (grepl("Tech Savvy", a, ignore.case = TRUE)) {
        NOT_TECH_SAVY = 1
      } else if (grepl("Computer Savvy", a, ignore.case = TRUE)) {
        NOT_TECH_SAVY = 1
      } else {
        NOT_TECH_SAVY = 0
      }
      #14
      if (grepl("Buyer quote very less price", a, ignore.case = TRUE)) {
        BUYER_WANT_LOW_PRICE = 1
      } else if (grepl("Buyer asking Low Price", a, ignore.case = TRUE)) {
        BUYER_WANT_LOW_PRICE = 1
      } else if (grepl("Buyer want low price", a, ignore.case = TRUE)) {
        BUYER_WANT_LOW_PRICE = 1
      } else
  • 47.
      {
        BUYER_WANT_LOW_PRICE = 0
      }
      #15
      if (grepl("Language Barrier", a, ignore.case = TRUE)) {
        LANGUAGE_BARRIER = 1
      } else if (grepl("Tamil", a, ignore.case = TRUE)) {
        LANGUAGE_BARRIER = 1
      } else {
        LANGUAGE_BARRIER = 0
      }
      #16
      if (grepl("changes required", a, ignore.case = TRUE)) {
        CHANGES_REQUIRED = 1
      } else {
        CHANGES_REQUIRED = 0
      }
      #17
      if (grepl("WRONG PRODUCT", a, ignore.case = TRUE)) {
        WRONG_PRODUCT = 1
      } else {
        WRONG_PRODUCT = 0
      }
      #18
  • 48.
      if (grepl("WRONG IMAGE", a, ignore.case = TRUE)) {
        WRONG_IMAGE = 1
      } else {
        WRONG_IMAGE = 0
      }
      #19
      if (grepl("WRONG CATALOG", a, ignore.case = TRUE)) {
        WRONG_CATALOG = 1
      } else {
        WRONG_CATALOG = 0
      }
      #20
      if (grepl("Change number", a, ignore.case = TRUE)) {
        CHANGE_NUMBER = 1
      } else {
        CHANGE_NUMBER = 0
      }
      #21
      if (grepl("Change email", a, ignore.case = TRUE)) {
        CHANGE_EMAIL = 1
      } else {
        CHANGE_EMAIL = 0
      }
      #22
      if (grepl("Close the account", a, ignore.case = TRUE)) {
        CLOSE_ACCOUNT = 1
  • 49.
      } else {
        CLOSE_ACCOUNT = 0
      }
      #23
      if (grepl("Not interested", a, ignore.case = TRUE)) {
        NOT_INTERESTED = 1
      } else {
        NOT_INTERESTED = 0
      }
      #24
      if (grepl("Remove Price", a, ignore.case = TRUE)) {
        REMOVE_PRICE = 1
      } else {
        REMOVE_PRICE = 0
      }
      #25
      if (grepl("not the same catalog that i approved", a, ignore.case = TRUE)) {
        DIFFERENT_CATALOG = 1
      } else {
        DIFFERENT_CATALOG = 0
      }
      #26
      if (grepl("change after approval", a, ignore.case = TRUE)) {
        CHANGE_AFTER_APPROVAL = 1
      } else if (grepl("change after hosting approval", a, ignore.case = TRUE)) {
  • 50.
        CHANGE_AFTER_APPROVAL = 1
      } else {
        CHANGE_AFTER_APPROVAL = 0
      }
      Result1 = cbind(GLID, CUSTOMER_TICKET_ID, CUSTOMER_TICKET_ISSUE_DATE, CUSTOMER_TICKET_DETAIL,
                      APPENDUM, TICKET_HISTORY, STOP, NO_BENEFIT, NO_MATURITY, NO_TIME_TO_USE,
                      FAKE_BUYERS, IRRELEVANT_ENQUIRIES, FAKE_BL, NOTICE_PERIOD, BUSINESS_CLOSED,
                      HYPER_LOCAL_BUYER, MISCOMMITMENT, PHYSICAL_VISIT, NOT_TECH_SAVY,
                      BUYER_WANT_LOW_PRICE, LANGUAGE_BARRIER, CHANGES_REQUIRED, WRONG_PRODUCT,
                      WRONG_IMAGE, WRONG_CATALOG, CHANGE_NUMBER, CHANGE_EMAIL, CLOSE_ACCOUNT,
                      NOT_INTERESTED, REMOVE_PRICE, DIFFERENT_CATALOG, CHANGE_AFTER_APPROVAL)
      Result = rbind(Result, Result1)
      Result1 = NULL
    }
    #write.csv(Result, "Somesh_Ticket_Result.csv")

    Project #4:
    d7 <- read.csv("complete_file.csv")
    d7_train <- d7[1:3000, ]
    d7_test <- d7[3001:3613, ]
    library(C50)
    m <- C5.0(d7_train[c(2:22)], as.factor(d7_train[[31]]), trials = 1)
    summary(m)
    p <- predict(m, d7_test[c(2:22)])
    library(gmodels)
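A note on the ticket-tagging loop of Project #3, which ends on the slide above: every category follows the same pattern (a list of phrases, matched case-insensitively, giving a 0/1 flag), so the chain of `if`/`else if` blocks can be driven from a lookup table instead. The sketch below is an illustration under assumptions, not the project's code: the names `reason_patterns` and `tag_tickets` and the sample ticket texts are made up, and only four of the 26 categories are copied over.

```r
# Phrase lists copied from four of the categories above; the list name
# `reason_patterns` and the helper `tag_tickets` are hypothetical.
reason_patterns <- list(
  STOP          = c("stop the service", "Deactivate the service"),
  NO_BENEFIT    = c("Did not get benefit", "No Benefit", "Did not get Business"),
  FAKE_BL       = c("Wrong Buy Leads", "Fake Buy Lead", "Wrong BL"),
  NOTICE_PERIOD = c("Notice Period")
)

tag_tickets <- function(text, patterns) {
  flags <- lapply(patterns, function(phrases) {
    # One alternation per category; these phrases contain no regex metacharacters.
    as.integer(grepl(paste(phrases, collapse = "|"), text, ignore.case = TRUE))
  })
  as.data.frame(flags)
}

# Made-up ticket text standing in for the pasted detail / addendum / history columns.
tickets <- c(
  "Client wants to STOP THE SERVICE, says no benefit so far",
  "Received a fake buy lead last week",
  "Please update my catalog photos"
)
flags <- tag_tickets(tickets, reason_patterns)
print(cbind(ticket = seq_along(tickets), flags))
```

Because `grepl` is vectorised over the text, this tags all tickets in one pass per category, and adding a phrase means editing the table rather than adding another `else if` branch.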
  • 51.
    CrossTable(d7_test$Status, p, prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE, dnn = c("actual", "predicted"))
    library(irr)
    p1 <- predict(m, d7_test[c(2:22)], type = "prob")
    p1 <- cbind(p1, Prediction = p, Actual_Status = d7_test$Status)
    head(p1, 20)
    write.csv(p1, "yearly_prob.csv")

    Project #5:
    To find the Brand Names :
    # Name the imported file as mydata
    Brand_Data <- data.frame(x = numeric(length(mydata[, 1])),
                             y = numeric(length(mydata[, 1])),
                             z = numeric(length(mydata[, 1])))
    count <- 1
    for (i in 1:length(mydata[, 1])) {
      print(i)
      temp <- sapply(mydata[i, 1], as.character)
      temp1 <- tolower(temp)
      temp <- sub(".*brand(:| :|-| -|:-| :- )", "", temp1)
      if (temp != temp1) {
        # Keep only the text up to the first line break
        temp <- sub("\n.*", "", temp)
        Brand_Data[count, 1] <- sapply(mydata[i, 1], as.character)
        Brand_Data[count, 2] <- temp
        Brand_Data[count, 3] <- sapply(mydata[i, 2], as.character)
        count = count + 1
      }
  • 52.
    }
    write.csv(Brand_Data, "Brand_Data.csv")

    To find the specifications:
    # Declaring 2 lists
    list1 = list()
    list2 = list()
    # Declaring a null dataframe
    Brand_Data_Result = data.frame(cbind(A = NULL, B = NULL, C = NULL, D = NULL, E = NULL, F = NULL, G = NULL, H = NULL, I = NULL))
    for (i in 1:length(Brand_Data[[1]])) {
      print(i)
      a = as.character(Brand_Data[i, 1])
      # Split the text based on the ":"
      if (grepl("Brand:", a)) {
        b = strsplit(a, ":")
      }
      # Split the text based on the ":-"
      else if (grepl("Brand:-", a)) {
        b = strsplit(a, ":-")
      }
      # Split the text based on the "-"
      else if (grepl("Brand-", a))
        b = strsplit(a, "-")
      # Split the text based on the " -"
      else if (grepl("Brand -", a))
        b = strsplit(a, " -")
  • 53.
      # Split the text based on the " :-"
      else if (grepl("Brand :-", a))
        b = strsplit(a, " :-")
      # Split the text based on the " :"
      else if (grepl("Brand :", a))
        b = strsplit(a, " :")
      else
        b = a
      if (length(b[[1]]) < 2) {
        # When the split text has fewer than 2 parts, just keep the first 3 columns of the input,
        # in which the 2nd one already contains the brand name
        Brand_Data_Result[i, 1] = Brand_Data[i, 1]
        Brand_Data_Result[i, 2] = Brand_Data[i, 2]
        Brand_Data_Result[i, 3] = Brand_Data[i, 3]
        next # Go to the next iteration (next i)
      }
      for (j in 1:length(b[[1]])) {
        # Split the text based on the line break "\n"
        c = strsplit(b[[1]][j], "\n")
        list1[j] = c
      }
      # Attach the last word of one string and the first word of the next string
      for (k in 1:(j - 1)) {
  • 54.
        d = paste(list1[[k]][length(list1[[k]])], list1[[k + 1]][1], sep = ":")
        list2[k] = d
      }
      Brand_Data_Result[i, 1] = Brand_Data[i, 1]
      Brand_Data_Result[i, 2] = Brand_Data[i, 2]
      Brand_Data_Result[i, 3] = Brand_Data[i, 3]
      for (l in 1:length(list2)) {
        Brand_Data_Result[i, l + 3] = list2[l]
      }
      list1 = NULL
      list2 = NULL
    }
    Brand_Data_Result = Brand_Data_Result[, 1:10]
    # Code to rearrange the rows of the brand data result so that the 4th column contains only the brand name and nothing else
    for (p in 1:length(Brand_Data_Result[[1]])) {
      print(p)
      if (grepl("Brand", Brand_Data_Result[p, 4])) {
        next
      }
      for (q in 5:10) {
        if (grepl("Brand", Brand_Data_Result[p, q])) {
  • 55.
          temp = Brand_Data_Result[p, q]
          Brand_Data_Result[p, q] = Brand_Data_Result[p, 4]
          Brand_Data_Result[p, 4] = temp
        }
      }
    }
    write.csv(Brand_Data_Result, "Brand_Data_Result.csv")

    To find the count of Brands and Specifications for a particular Mcat :
    Brand_Data1 = Brand_Specifications_Result
    Brand_Data1[is.na(Brand_Data1)] = ""
    # Code to rearrange the rows of the brand data result so that the 3rd column contains only the brand name and nothing else
    a = unfactor(a)
    for (p in 1:length(a[[1]])) {
      if (p %% 100 == 0) { print(p) }
      if (grepl("Brand", a[p, 3])) {
        next
      }
      for (q in 3:8) {
        if (grepl("Brand", a[p, q])) {
          temp = a[p, q]
          a[p, q] = a[p, 3]
          a[p, 3] = temp
        }
      }
    }
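The brand-name extraction at the start of Project #5 can be expressed as one vectorised regular-expression match. The following is a minimal sketch under assumptions, not the project's code: the helper `extract_brand` and the sample descriptions are made up, and it assumes each specification sits on its own line (separated by a line break) with the brand written as "Brand", an optional space, and one of the separators ":", "-" or ":-".

```r
# Hypothetical helper: return the text after "brand" + separator, up to the end
# of that line. Gives NA when the description has no brand label.
extract_brand <- function(description) {
  text <- tolower(description)
  pattern <- "brand *(:-|:|-) *([^\n]*)"
  hits <- regmatches(text, regexec(pattern, text))
  unname(vapply(hits, function(m) {
    if (length(m) == 3) trimws(m[3]) else NA_character_
  }, character(1)))
}

# Made-up product descriptions with one specification per line.
descriptions <- c(
  "Brand: Bosch\nPower: 500 W",
  "Material: Steel\nBrand :- Tata\nGrade: 304",
  "Colour: Red"
)
print(extract_brand(descriptions))
```

As in the loop above, the result is lower-cased. Rows without a brand label come back as `NA`, so they can be dropped with `!is.na(...)` instead of being tracked with a separate counter.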
  • 56.
    library(stringr)
    list_splitted = list()
    list3 = list()
    test = list()
    Result = data.frame()
    Brand_Name = 0
    Mcat_Name = 0
    Spec1 = 0
    Spec2 = 0
    Spec3 = 0
    Spec4 = 0
    Spec5 = 0
    Spec6 = 0
    Spec7 = 0
    # Spec# are the specification columns
    for (p in 1:length(Brand_Data1[[1]])) {
      if (p %% 100 == 0) { print(p) }
      if (Brand_Data1[p, 2] == "") { next }
      # First split the text on the basis of " and " and assign that list to test
      test = strsplit(Brand_Data1[p, 2], " and ")
      if (test[[1]][1] == "") { next }
      #print(test)
  • 57.
      #print(length(test[[1]]))
      # Now split the elements of test on the basis of ","
      for (q in 1:length(test[[1]])) {
        #print(q)
        # Put all the split elements in list3
        list3[q] = strsplit(test[[1]][q], ",")
      }
      #print(list3)
      for (r in 1:length(list3)) {
        # Check whether the brand name is purely a number; if so, do not consider it
        for (s in 1:length(list3[[r]])) {
          #if( is.na(as.numeric(list3[[r]][s])))
          test1 = list3[[r]][s]
          test1 = str_trim(test1)
          Brand_Name = rbind(Brand_Name, test1)
          test2 = Brand_Data1[p, 3]
          test2 = str_trim(test2)
          Mcat_Name = rbind(Mcat_Name, test2)
          test2 = Brand_Data1[p, 4]
          Spec1 = rbind(Spec1, test2)
          test2 = Brand_Data1[p, 5]
          Spec2 = rbind(Spec2, test2)
  • 58.
          test2 = Brand_Data1[p, 6]
          Spec3 = rbind(Spec3, test2)
          test2 = Brand_Data1[p, 7]
          Spec4 = rbind(Spec4, test2)
          test2 = Brand_Data1[p, 8]
          Spec5 = rbind(Spec5, test2)
          test2 = Brand_Data1[p, 9]
          Spec6 = rbind(Spec6, test2)
        }
      }
      test1 = NULL
      test2 = NULL
      Result1 = cbind(Brand_Name, Mcat_Name, Spec1, Spec2, Spec3, Spec4, Spec5, Spec6)
      Result = rbind(Result, Result1)
      Brand_Name = NULL
      Mcat_Name = NULL
      Spec1 = NULL
      Spec2 = NULL
      Spec3 = NULL
      Spec4 = NULL
      Spec5 = NULL
      Spec6 = NULL
  • 59.
      list3 = NULL
    }
    # To remove the first row of zeroes
    Result = Result[-1, ]
    #colnames(Result) = c("Brand_Name","Mcat_Name")
    Result_final_1 = Result
    # To remove wrongly chosen brands
    # (iterate from the last row so that deleting a row does not shift the rows still to be checked)
    for (t in length(Result[[1]]):1) {
      if (t %% 100 == 0) { print(t) }
      if (grepl("any|other|all ", Result[t, 1])) {
        Result = Result[-t, ]
        #print("Good")
      }
    }
    # To remove ".", ":" and similar punctuation from the brand names in the 1st column of Result,
    # so that later grepl matches are not missed because of an extra "."
    #Result[Result.na] = 0
    Result[, 1] = gsub("[.:'()+-]", "", Result[, 1])
    # To remove the rows which contain only "all" in the brand column
  • 60.
    # (iterate from the last row so that deleting a row does not shift the rows still to be checked)
    for (t in length(Result[[1]]):1) {
      if (t %% 100 == 0) { print(t) }
      if (grepl(Result[t, 1], "all ") | is.na(Result[t, 1])) {
        Result = Result[-t, ]
        #print("Good")
      }
    }
    # To do the trimming of extra spaces created due to removal of ":"
    # Code for trimming
    for (i in 1:length(Result[[1]])) {
      if (i %% 100 == 0) { print(i) }
      Result[i, 1] = trimws(Result[i, 1])
    }
    #write.csv(Result, "Result.csv")
    # To get rid of factors, first save it and then import it
    #write.csv(Result, "Result.csv")
    Result2 = Result
    #rm(Result)
    library(varhandle)
    Result = unfactor(Result)
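Row-by-row deletion is sensitive to the order in which rows are visited, because each deletion renumbers the rows below it. A logical mask avoids the issue, since all rows are tested before any is removed. The sketch below is an illustration on made-up data, not the project's code: the data frame `brands` and the names `is_generic` and `cleaned` are assumptions, while the pattern "any|other|all " is the one used in the cleaning loops above.

```r
# Made-up brand table; the column names mirror the vectors built above.
brands <- data.frame(
  Brand_Name = c("bosch", "any brand", "tata", "all ", NA, " philips ", "other"),
  Mcat_Name  = c("Drill", "Drill", "Steel", "Steel", "Steel", "Bulb", "Bulb"),
  stringsAsFactors = FALSE
)

# TRUE for rows to drop: missing names or generic placeholders.
is_generic <- is.na(brands$Brand_Name) | grepl("any|other|all ", brands$Brand_Name)

cleaned <- brands[!is_generic, ]
cleaned$Brand_Name <- trimws(cleaned$Brand_Name)   # trimws is vectorised, no loop needed
print(cleaned)
```

The mask is computed once on the untouched table, so no row can be skipped, and the progress printing needed for the slow loop becomes unnecessary.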
  • 61.
    #Result = read.csv(file.choose(), header = TRUE, sep = ",", stringsAsFactors = FALSE)
    #Result[Result == ""] = 0
    # Result1 = Result2
    # Result2 = Result
    # Result = Result2
    # To split the Brands in Result based on " or "
    # First assign Result to a different data frame, so that the earlier code for splitting
    # on " and " can be reused as such, then clear Result
    Result_copy1 = Result
    Result = NULL
    library(stringr)
    list_splitted = list()
    list3 = list()
    test = list()
    Result = data.frame()
    Brand_Name = 0
    Mcat_Name = 0
    Spec1 = 0
    Spec2 = 0
    Spec3 = 0
    Spec4 = 0
    Spec5 = 0
    Spec6 = 0
    Spec7 = 0
  • 62.
    # Now make the Result null because the output will be stored in it
    #colnames(Result_copy1) = c("P","M")
    for (p in 1:length(Result_copy1[[1]]))
    # for ( p in 3:4)
    {
      if (p %% 100 == 0) { print(p) }
      # First split the text on the basis of " or " and assign that list to test
      test = strsplit(Result_copy1[p, 1], " or ")
      if (test[[1]][1] == "") { next }
      #print(test)
      #print(length(test[[1]]))
      # Now split the elements of test on the basis of ","
      for (q in 1:length(test[[1]])) {
        #print(q)
        # Put all the split elements in list3
        list3[q] = strsplit(test[[1]][q], ",")
        #print(list3[q])
      }
      #print(list3)
      for (r in 1:length(list3)) {
        # Check whether the brand name is purely a number; if so, do not consider it
  • 63.
        for (s in 1:length(list3[[r]])) {
          #if( is.na(as.numeric(list3[[r]][s])))
          #{
          test1 = list3[[r]][s]
          test1 = str_trim(test1)
          Brand_Name = rbind(Brand_Name, test1)
          test2 = Result_copy1[p, 2]
          test2 = str_trim(test2)
          Mcat_Name = rbind(Mcat_Name, test2)
          test2 = Result_copy1[p, 3]
          Spec1 = rbind(Spec1, test2)
          #print(Spec1)
          test2 = Result_copy1[p, 4]
          Spec2 = rbind(Spec2, test2)
          #print(Spec2)
          test2 = Result_copy1[p, 5]
          Spec3 = rbind(Spec3, test2)
          # print(Result_copy1[p,5])
          # print(test2)
          # print(Spec3)
          test2 = Result_copy1[p, 6]
          Spec4 = rbind(Spec4, test2)
  • 64.
          test2 = Result_copy1[p, 7]
          Spec5 = rbind(Spec5, test2)
          test2 = Result_copy1[p, 8]
          Spec6 = rbind(Spec6, test2)
        }
      }
      test1 = NULL
      test2 = NULL
      Result1 = cbind(Brand_Name, Mcat_Name, Spec1, Spec2, Spec3, Spec4, Spec5, Spec6)
      Result = rbind(Result, Result1)
      Brand_Name = NULL
      Mcat_Name = NULL
      Spec1 = NULL
      Spec2 = NULL
      Spec3 = NULL
      Spec4 = NULL
      Spec5 = NULL
      Spec6 = NULL
      list3 = NULL
    }
    Result = as.data.frame(Result)
    Result3 = Result
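The two splitting passes above (first on " and " and ",", then on " or " and ",") both turn one row holding several brands into one row per brand. A compact way to picture that step is a single `strsplit` with an alternation, followed by repeating the Mcat name once per piece. This is a sketch under assumptions rather than the project's code: the helper `split_brands` and the sample values are made up, and the specification columns are left out for brevity.

```r
# Hypothetical helper: split each brand string on " and ", " or " and ",",
# then repeat the Mcat name once per resulting brand.
split_brands <- function(brand, mcat) {
  parts <- strsplit(brand, " and | or |,")
  parts <- lapply(parts, function(p) {
    p <- trimws(p)
    p[p != ""]            # drop empty pieces left by doubled separators
  })
  data.frame(
    Brand_Name = unlist(parts),
    Mcat_Name  = rep(mcat, lengths(parts)),
    stringsAsFactors = FALSE
  )
}

# Made-up input: multi-brand strings as they might appear in the brand column.
long <- split_brands(
  brand = c("bosch and makita", "tata, jindal or sail", "philips"),
  mcat  = c("Power Drill", "Steel Sheet", "LED Bulb")
)
print(long)
```

One caveat applies to this sketch and to the loops alike: a brand whose own name contains " and " or " or " is split into separate pieces.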
  • 65.
    # To remove the first row of zeroes
    Result = Result[-1, ]
    Result[, 1] = gsub("[.:'()+-]", "", Result[, 1])
    # To remove wrongly chosen brands
    # (iterate from the last row so that deleting a row does not shift the rows still to be checked)
    for (t in length(Result[[1]]):1) {
      print(t)
      if (grepl("any|other|all ", Result[t, 1])) {
        Result = Result[-t, ]
        #print("Good")
      }
    }
    for (t in length(Result[[1]]):1) {
      if (t %% 100 == 0) { print(t) }
      if (grepl(Result[t, 1], "all ") | is.na(Result[t, 1])) {
        Result = Result[-t, ]
        #print("Good")
      }
    }
    # To do the trimming of extra spaces created due to removal of ":"
    # Code for trimming
  • 66.
    for (i in 1:length(Result[[1]])) {
      if (i %% 100 == 0) { print(i) }
      Result[i, 1] = trimws(Result[i, 1])
    }
    Result4 = Result
    # To remove the brands which start with a number
    for (i in 1:length(Result[[1]]))
    # for(i in 1:10 )
    {
      if (i %% 100 == 0) { print(i) }
      if (substr(Result[i, 1], 1, 2) == "3m" | is.na(as.numeric(substr(Result[i, 1], 1, 1)))) {
        next
      }
      if (!is.na(as.numeric(substr(Result[i, 1], 1, 1)))) {
        Result[i, ] = ""
      }
    }
    Result_Final = Result
    d = Result_Final
    #write.csv(Result_Final, "Result_Final.csv")
    # Remove the first column in Excel, then import that data again as the final input for the count of specifications
    # read.csv(file.choose(), header = TRUE, sep = ",", stringsAsFactors = FALSE)
    # Now the final code to get the result in a given format
  • 67.
    # Format:-
    # For a particular Mcat get all the brands and their individual count. Also get the count of all the specifications for the same Mcat
    # Input:- Brand_Specifications_Final_Result
    Specifications = Brand_Specifications_Final_Result
    # First take the specifications and split on ":" to consider only the specification, not the value
    # Now convert these specifications to factors so that count becomes easy
    for (i in 1:length(Specifications[[1]])) {
      print(i)
      for (j in 3:8) {
        if (Specifications[i, j] != "") {
          a = strsplit(Specifications[i, j], ":")
          Specifications[i, j] = a[[1]][1]
        }
        a = NULL
      }
    }
    # To remove the brands with " etc" string
    for (i in 1:length(Specifications[[1]])) {
      print(i)
      if (grepl(" etc| china", Specifications[i, 2])) {
  • 68.
        a = strsplit(Specifications[i, 2], " etc")
        Specifications[i, 2] = a[[1]][1]
      }
      a = NULL
    }
    # To remove the brands which have only the "etc" string
    # (iterate from the last row so that deleting a row does not shift the rows still to be checked)
    for (i in length(Specifications[[1]]):1) {
      print(i)
      if (Specifications[i, 2] != "etc" & Specifications[i, 2] != "china" & Specifications[i, 2] != "chinese") {
        next
      } else {
        Specifications = Specifications[-i, ]
      }
    }
    # To make specifications lower case so that we don't get different and less counts for the
    # same specification and also remove the numbers from the specifications
    for (i in 1:length(Specifications[[1]])) {
      print(i)
      for (j in 3:8) {
        if (is.na(as.numeric(Specifications[i, j]))) {
  • 69.
          Specifications[i, j] = tolower(Specifications[i, j])
        } else {
          Specifications[i, j] = ""
        }
        if (!grepl("price|budget", Specifications[i, j])) {
          next
        } else {
          Specifications[i, j] = ""
        }
      }
    }
    write.csv(Specifications, "Specs_Final_Result.csv")
    # Final Loop
    library(plyr) # For count function
    library(varhandle) # For unfactor function
    Result = data.frame()
    Mcat_Name = 0
    Total_Brand_Count = 0
    Brand_Name = 0
    Brand_count = 0
    Specs = 0
    Specs_Count = 0
    count = 1
    test1 = 0
    test2 = 0
    a = NULL
  • 70.
    b = NULL
    c = NULL
    d = NULL
    for (i in 1:length(Specifications[[1]]))
    #for(i in 1:30)
    {
      print(i)
      temp = Specifications[i, 2]
      a = c(a, temp)
      #print(temp)
      # a is the vector of brand names
      #print(a)
      # The last row has no following row to compare with
      if (i < length(Specifications[[1]]) && Specifications[i, 1] == Specifications[i + 1, 1]) {
        count = count + 1
        #print(count)
      }
      for (j in 3:7) {
        #print("yes")
        if (Specifications[i, j] != "") {
          temp = Specifications[i, j]
          b = c(b, temp)
          #print(b)
  • 71.
        }
      }
      # When the Mcat changes (the last row also closes its Mcat group)
      if (i == length(Specifications[[1]]) || Specifications[i, 1] != Specifications[i + 1, 1]) {
        Mcat = Specifications[i, 1]
        if (length(a) != 0) {
          a = factor(a)
          summary_a = count(a)
          summary_a = unfactor(summary_a)
        }
        if (length(b) != 0) {
          b = factor(b)
          summary_b = count(b)
          summary_b = unfactor(summary_b)
        }
        #print(summary_b)
        # To know how many rows are required for a particular Mcat
        max_len = max(length(summary_a[[1]]), length(summary_b[[1]]), count)
        for (k in 1:max_len) {
          Mcat_Name = rbind(Mcat_Name, Mcat)
          # To get the Brand Names and their count
          if (k <= length(summary_a[[1]]))
  • 72.
          {
            test1 = summary_a[[1]][k]
            Brand_Name = rbind(Brand_Name, test1)
            test2 = summary_a[[2]][k]
            Brand_count = rbind(Brand_count, test2)
          } else {
            test1 = ""
            Brand_Name = rbind(Brand_Name, test1)
            test2 = ""
            Brand_count = rbind(Brand_count, test2)
          }
          Total_Brand_Count = rbind(Total_Brand_Count, count)
          # To get the specifications and their count
          if (k <= length(summary_b[[1]])) {
            test1 = summary_b[[1]][k]
            Specs = rbind(Specs, test1)
            test2 = summary_b[[2]][k]
            Specs_Count = rbind(Specs_Count, test2)
          } else {
  • 73.
            test1 = ""
            Specs = rbind(Specs, test1)
            test2 = ""
            Specs_Count = rbind(Specs_Count, test2)
          }
        }
        Result1 = cbind(Mcat_Name, Total_Brand_Count, Brand_Name, Brand_count, Specs, Specs_Count)
        Result = rbind(Result, Result1)
        count = 1
        a = NULL
        b = NULL
        summary_a = NULL
        summary_b = NULL
      }
      Result1 = NULL
      Mcat_Name = NULL
      Total_Brand_Count = NULL
      Brand_Name = NULL
      Brand_count = NULL
      Specs = NULL
      Specs_Count = NULL
    }
    Specifications_Final_Result = Result
    write.csv(Result, "Specifications_Final_Result.csv")
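The final loop produces, for every Mcat, the count of each brand and the count of each specification name. The same two tallies can be sketched with base R's `table`, without tracking where one Mcat ends and the next begins. This is an illustration on made-up data, not the project's code: the data frame `specs`, its column names and all values are assumptions, and the output is two long tables rather than the side-by-side layout written to "Specifications_Final_Result.csv".

```r
# Made-up input in the shape used above: Mcat, brand, then specification columns.
specs <- data.frame(
  Mcat  = c("Drill", "Drill", "Drill", "Bulb", "Bulb"),
  Brand = c("bosch", "bosch", "makita", "philips", "syska"),
  Spec1 = c("power", "power", "voltage", "wattage", "wattage"),
  Spec2 = c("voltage", "", "", "colour", ""),
  stringsAsFactors = FALSE
)

# Brand counts per Mcat: keep only the (Mcat, Brand) pairs that actually occur.
brand_counts <- as.data.frame(table(Mcat = specs$Mcat, Brand = specs$Brand),
                              stringsAsFactors = FALSE)
brand_counts <- brand_counts[brand_counts$Freq > 0, ]

# Specification counts per Mcat: stack the spec columns, drop blanks, tabulate.
spec_cols <- c("Spec1", "Spec2")
stacked <- data.frame(
  Mcat = rep(specs$Mcat, times = length(spec_cols)),
  Spec = unlist(specs[spec_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
stacked <- stacked[stacked$Spec != "", ]
spec_counts <- as.data.frame(table(Mcat = stacked$Mcat, Spec = stacked$Spec),
                             stringsAsFactors = FALSE)
spec_counts <- spec_counts[spec_counts$Freq > 0, ]

print(brand_counts)
print(spec_counts)
```

Because `table` groups by value, the input does not need to be sorted by Mcat, and the last Mcat needs no special handling.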