Resume
NAME : Gunjan Kumar Gupta
CURRENT OCCUPATION : Machine Learning Scientist, Amazon.com, Seattle, WA
INDUSTRY WORK EXPERIENCE : 10 years (Data Mining + Software Engineering)
WORK EXPERIENCE : • Nov 2006 – Present, May – Aug 2005, Machine
Learning Scientist, Amazon.com
• Jan 2004 – Aug 2006, Research Assistant, NSF/UT
• May 2002 – Jan 2004, Standard & Poor’s, New York.
• June 2000 – May 2002, i2 Technologies, Austin, TX.
• May '99 – May 2000, Net Perceptions, Austin, TX.
• Aug '98 – May ’99, Teaching Assistant, UT-Austin.
• Sept. '97 - June '98, MCI, Colorado Springs.
• April '97 to July '97, Samsung Electronics, Seoul.
• June '95 to March '97, Infosys Technologies Ltd., India.
EDUCATIONAL BACKGROUND : • PhD in Data Mining/Comp. Eng., UT Austin, 2006
• MS in Data Mining/Comp. Eng., UT Austin, 2000.
• BS in Computer Science from IT-BHU (part of IIT-
JEE), India, 1995.
SYSTEMS WORKED ON : HP, Sun, DEC Unix, DEC-VMS, AIX, NCR
COMPUTER LANGUAGES KNOWN : Java, C++, C, Perl, Pascal, HTML (CGI), SQL Plus,
MySQL, MATLAB, SAS
LANGUAGES FAMILIAR WITH : LISP, Fortran, Smalltalk, PROLOG, 8086 Assembly
OPERATING SYSTEMS : Linux, Windows, HP-UX, Solaris, DEC-Unix, VMS
DATABASES WORKED ON : Oracle, MySQL, DB2, Orchestrate, SAS
VISA STATUS : Green Card
Last updated: April 2010
WORK EXPERIENCE (latest projects first)
01 Nov 2006 – Present Machine Learning Scientist, Amazon.com, Seattle, WA
Developing and deploying scalable implementations of machine-learning algorithms, and using them to predict
various aspects of merchant and customer behavior to enhance quality of service and to reduce financial and
other types of risk for Amazon. Providing expertise and guidance to multiple groups and departments at
Amazon on strategic machine-learning technologies and issues. Domains include many areas in fraud,
product demand forecasting, item and website recommendation, matching and categorization. In particular,
dealing with learning problems involving highly skewed priors, sparse features, adversarial learning settings,
and incomplete and noisy labels. Also supervised and unsupervised learning on text data, learning on massive
amounts of real-time streaming data, and learning on temporal and historical profiles.
Platform : Linux, Windows XP, Perl, Java, Hibernate, Spring, C++, Matlab,
Shell-scripting, SAS, SQL, Oracle, MySQL, Weka, Amazon SDE platforms
02 Jan 2004 – Oct 2006 Research Assistant for Professor Joydeep Ghosh, UT-Austin, Austin, TX
An NSF RA grant directly supported my Ph.D. dissertation work; I have developed algorithms that can
discover high purity clusters in unsupervised, large, very noisy, high-dimensional datasets where most of the
data points do not cluster well. Application domains include Bioinformatics, market-basket data, web data,
biometrics and anomaly detection. (e.g. Gene DIVER: http://www.ideal.ece.utexas.edu/~gunjan/genediver/)
Platform : Linux, Windows 2000, Perl, Java, SWING, Matlab, Weka
Gunjan Kumar Gupta, 4839 130th Ave SE, Bellevue, WA 98006, Ph(r): (425) 818-1551
Email: gunjan@iname.com, Online Resume: http://www.ideal.ece.utexas.edu/~gunjan/resume.html
03 May 2005 – Aug 2005 Amazon.com, Risk Management Group, Seattle, WA
I was involved in developing a fraud detection model using historical data. I designed and developed, from the
ground up, a MySQL database and a modeling server that aggregates and summarizes modeling data from
massive amounts of continuous raw data drawn from multiple live databases. I also developed an online
probabilistic model that adapts automatically to new fraud patterns as they emerge and predicts new fraud
activity. Amazon.com also sponsored my trip to Bonn, Germany, as a first author at ICML 2005, to help
recruit researchers for Amazon.
Platform : MySQL (DBA+programming), J2SE 5.0, Red Hat, Shell scripting, reg-ex
04 May 2002 – Jan 2004 Standard & Poor’s, Risk Solutions Group, Santa Fe, NM/ New York, NY
Involved in research and development of financial models and algorithms for the Risk Solutions Group at
Standard & Poor’s using historical data on companies, countries and individuals, using logistic regression,
neural-networks, and other probabilistic methods.
Platform : Windows, Matlab, Linux, JBuilder, Java, C++, SAS, SQL, CVS, Perl.
05 June 2000 – May 2002 i2 Technologies, Austin, TX (http://www.i2.com)
Involved in development of CRM and Data Mining applications in Oracle SQL and Java, mainly:
Trending: Architect, designer and developer of a communication API to enable collection of information
from e-commerce customer web sites for a CRM product, involving extensive OOAD and integration with
existing i2 products and Rightworks.
Marketing Analytics and Campaign Manager: I was involved with design, research and development of this
product. It used data collected from Trending and other sources and provided real-time automated decision
support, targeted advertisement and campaign management. My work included algorithms for advanced
cross-sell recommendations and advising the PM on Data Mining.
Platform : NT 4.0 Server, Borland JBuilder, Java 2 SDK, C++, InstallAnywhere,
Oracle 8i, ClearCase, JRun 3.0, JSP
06 May 1999 - May 2000 KD1/Net Perceptions, Austin, TX
Involved in developing algorithms for market-basket analysis and clustering, an extension of which became
my Masters thesis that also resulted in two publications. Challenges included size and dimensionality
(~100,000 products) of the data.
Platform : NT 4.0, AIX, Sun, Matlab, Orchestrate, Korn-Shell, C++
07 Sept 1997 - June, 1998 MCI, Colorado Springs
Extensive OO design and development of class libraries for Call-Processing software used in conjunction
with DAP for regulatory routing of telephone calls on the MCI network.
Platform : VMS, DEC-Unix and Windows NT 4.0, C++, Object-Broker, Object-Store, CORBA, X-Motif,
Rogue Wave & UIMX, Visual C++ 4.0, IDL
08 March 1997 - July, 1997 Samsung Electronics, Seoul, South Korea
Design and development of Network Management Systems on the Windows 95 platform for the SR4024, a
multi-protocol router, and the SH2024 hub, using the winSNMP library and Visual C++. Also ported the NMS
onto Sun and HP platforms using the WindU tool.
Platform : SR4024/Multi router, SH2024 hub, Pentium m/c, Sun Sparc and HP M/c,
Windows 95, HPUX, SunOS, WindU, NetXRay, SNMPc, agent software and
MIB compiler.
09 July 1995 - March, 1997 Infosys Technologies Ltd., Bangalore, India (http://www.inf.com)
Involved in many projects for Infosys clients, including:
• Sept 1996 – March 1997: OOAD and development of new class libraries and modules for Datavision, an
analysis/visualization product for Nortel.
• May 1996 – Sept 1996: Developed an Extended MAPI interface for email support on Inconcert, a Workflow
Automation product from Xsoft, a part of Xerox Inc.
• Feb 1996 – May 1996: Designed and developed IMAP, a multimedia prototype client and part of the
server for Nynex S&T Lab, Bangkok. Involved in on-site demo with Nynex’s customers.
• Sept 1995 – Feb 1996: Designed and developed a back-end parser and X-Motif user interface called
SLLBFM for the SLL language for Nynex S&T.
Platform : HP, Sun, Openwin, Fore-ATM, VAT and VIC, NVATM, Windows, XRT, Motif, BSD
and Windows sockets, Extended MAPI, IPC, Exchange, GNU & Msft C++, Rogue
Wave Tools.h++ & Views.h++, UIMX, X-Windows Motif 1.2/X11R5
EDUCATION & RESEARCH BACKGROUND
August 1998 – 2000, January 2003 – October 2006, 2010 University of Texas at Austin
Continued collaboration with UT IDEAL group since 2006. Served as a Reader for a UT Austin graduate
student’s machine-learning-focused Masters thesis.
January 2004 –October 2006: PhD in Data Mining (Computer Engineering).
August 1998 - June 2000: MS in Data Mining (Computer Engineering).
Coursework: Advanced Topics in Data Mining, Bioinformatics, Arch. & App. Of Biological Databases, Data
Mining, Machine Learning, Digital Image Processing, Artificial Neural Networks, Optimization of
Engineering Systems, Knowledge Representation, Practicum in Data-Mining (involving a project for Dell),
Software Engineering Metrics, CPU Optimization for DSS Systems, Natural Language Processing.
Masters Thesis: Gupta, G. “Modeling Customer Dynamics using Motion Estimation in a Value Based
Cluster Space for Large Retail Data-sets.” MS Thesis, Department of Electrical and Computer Engineering,
University of Texas (http://www.lans.ece.utexas.edu/~gunjan/publications.html).
Refereed publications (http://www.lans.ece.utexas.edu/~gunjan/publications.html):
Journals:
1. G. Gupta, J. Ghosh, Bregman Bubble Clustering: A Robust Framework for mining Dense
Clusterings, ACM Transactions on Knowledge Discovery from Data, 2(8), July 2008
2. G. Gupta, A. Liu, J. Ghosh, Automated Hierarchical Density Shaving: A robust, automated
clustering and visualization framework for large biological datasets, IEEE/ACM Transactions on
Computational Biology and Bioinformatics, 17 March 2008
Conferences:
3. M. Deodhar, H. Cho, G. Gupta, J. Ghosh, I. Dhillon, A Scalable Framework for Discovering
Coherent Co-clusters in Noisy Data, (Best Paper Award Honorable Mention), ICML 2009.
4. M. Deodhar, H. Cho, G. Gupta, J. Ghosh, I. Dhillon, Hunting Coherent Clusters in High
Dimensional Noisy Datasets, In Workshop on Foundations of Data Mining, ICDM 2008.
5. G. Gupta, J. Ghosh, Bregman Bubble Clustering: A Robust, Scalable Framework for Locating
Multiple, Dense Regions in Data, (Runners up Best Research Paper Award), ICDM 2006,
December 2006, 12 pages.
6. G. Gupta, A. Liu, J. Ghosh, Hierarchical Density Shaving: A clustering and visualization framework
for large biological datasets, ICDM 2006 Workshop on Data Mining in Bioinformatics (DMB
2006).
7. G. Gupta, A. Liu and J. Ghosh, Clustering and Visualization of High-Dimensional Biological
Datasets using a fast HMA Approximation, In Proc. ANNIE 2006, ASME, November 2006, 6 pages
8. G. Gupta and J. Ghosh, Robust One-Class Clustering Using Hybrid Global and Local Search, In
Proc. ICML 2005, August 7-11, 2005, Bonn, Germany, pp. 273-280
9. G. Gupta and J. Ghosh, Detecting Seasonal Trends and Cluster Motion Visualization for very High
Dimensional Transactional Data, First SIAM Conf. on Data Mining (SDM 2001), Chicago, April
2001.
10. G. Gupta and J. Ghosh, Value Balanced Agglomerative Connectivity Clustering, Proc. SPIE Conf.
on Data Mining and Knowledge Discovery, SPIE Proc., Orlando, April 2001.
11. G. Gupta, A. Strehl and J. Ghosh, Distance Based Clustering of Association Rules, in Intelligent
Engineering Systems Through Artificial Neural Networks, Vol. 9, ASME Press, Proc. ANNIE '99,
Nov 1999, pp. 759-764.
Other Papers: http://www.lans.ece.utexas.edu/~gunjan/projects.html
Platform : Linux, Windows 2000, Perl, Matlab, C++, Java, MySQL
August 1991 - May 1995 B-Tech, Institute of Technology, Banaras Hindu University
My undergraduate thesis was on recognition of hand-written numerals and alphabets using a meta-learner to
combine the outputs from multiple classifiers. Other projects included Plot4, a chess-like game with AI and
learning that won first prize in an IEEE contest, a Pascal to C translator, and a 2.5-month internship at Tata
Iron & Steel Corp., Jamshedpur, India, involving stove operation simulations.
Platform : BGI, Pascal, Borland C/C++, cc, MASM, Unix, Dos, Assemb. 8086