To apply my education and skills in data warehousing, data mining, statistics, and machine learning to challenging
and interesting industry projects. My goal is to use these skills to assist with data analysis and data modeling, build data
warehouses, and eventually mine the data to produce interesting patterns for decision support.
Here are some industry-specific positions that I qualify for (based on past experience) and am interested in:
1) ETL Developer/Any position that involves ETL experience
2) Data Analyst/Data Mining Analyst/Data warehouse Analyst/Business Intelligence Analyst
3) Data modeling
I am looking for positions in Dallas, Texas.
Over 5 years of IT experience in the analysis, design, development, and implementation of software applications, data
mining, and data warehousing in the Engineering, Financial, Sales, Human Resources, Health Care, and Call Center domains
Over 3 years of experience with Data Mining, Data Warehousing and Business Intelligence applications:
a) Using Informatica (PowerCenter/PowerMart) 7.x/6.x/5.1.1 (Mapping Designer, Warehouse Designers,
Mapplet Designer, Transformation Developer, Repository Manager, Server Manager, Workflow Manager,
Workflow Monitor), OLAP and OLTP.
b) Using Microsoft Integration Services SSIS (Control Flow tasks, Data Flow tasks, Transformations,
Database administration tasks)
c) Cognos Decision Stream (Fact/Dimension Builds, Job Streams)
d) Business Objects Data Integrator
e) Data modeling experience using Cognos Framework Manager/Business Objects Universe - well versed in
Star/Snowflake schema concepts, fact & dimension tables, and logical & physical data models
f) Extensive reporting experience using Cognos ReportNet (Query Studio, Report Studio)/Business Objects.
Machine Learning/Data Mining/Predictive Analytics:
a) Excellent working knowledge of data mining models (Classification, Association Rule Mining, Clustering)
b) Extensive experience using the data mining tools WEKA (Analyzer, Experimenter, Feature Extraction,
Visualization) and Microsoft Analysis Services
Extensive experience in working with relational databases like Oracle 9i/8i/8.x/7.x, MS SQL Server 2000/2005, MS
Access 7.0/2000, SQL, PL/SQL, SQL*Plus, SQL*Loader and Developer 2000.
Extensive experience in CAD/CAE (Computer Aided Design and Engineering) software development using C/C++ and
shell scripting/programming on the UNIX platform.
Strong analytical, presentation, problem-solving, and interpersonal skills; performs well as part of a team.
M.S. in Computer Science (Specialization: Databases, Data Mining) University of Calgary, Alberta, Canada
M.S in Mathematics, Indian Institute of Technology (IIT) (Madras), Chennai, India
B.S in Mathematics, University of Madras, Chennai, India.
Diploma Equivalent (One Year) – Unix, C and Oracle (Tata InfoTech Computer Education, India)
Certificate Course (2 months) – Data warehousing using Informatica, Business Objects (Texas Technologies)
Business Objects XI: Certificate Course – ETL using Data Integrator , Universe design using Business Objects,
Report Development using Business Objects (Infosol Inc, Phoenix, Arizona, USA)
Cognos ReportNet and Cognos 8: Certificate Course - ETL, Modeling, Reporting (Calgary, Canada)
Cognos Reportnet Product User
Cognos Advanced Report Authoring Professional
Cognos Reportnet Modelling Professional
Relational Databases: Oracle 9i, 8i, 7.x, SQL Server (2000, 2005), MS Access
Querying/Reporting: PL/SQL, SQL*Plus, SQL*Loader, TOAD
Reporting Tools: Cognos ReportNet, Cognos 8, Business Objects, Oracle Reports
Modeling Tools: Cognos Framework Manager, Business Objects Universe, Rational Rose, UML
ETL Tools: Informatica PowerCenter/PowerMart (7.x, 6.x, 5.1), Business Objects Data Integrator, Cognos Decision Stream,
Microsoft Integration Services (SSIS)
Data Mining Tools: WEKA, Microsoft Analysis Services
Data Mining Models: Classification, Association Rule Mining, Clustering
Knowledge Discovery Models: Genetic Algorithms
Programming Languages: C, C++, UNIX shell scripting/programming, PL/SQL, Python
Documentation Tools: LaTeX, MS Office (including MS Project, Visio)
Web Portal Technologies, Front End: HTML, Visual Studio, SharePoint
Operating Systems: Windows (98, 2000, NT, XP), UNIX, Linux
Take Solutions Inc
New Jersey, USA June '06 - Ongoing
Business Intelligence Consultant
Client: Infosol Inc, Phoenix, Arizona, USA (Partner of Take Solutions Inc)
Working for Banner Health, Phoenix, Arizona, USA (Client)
Description: Banner Health is the administrative unit of Banner Hospital in Phoenix, Arizona. Worked on three projects with
the Enterprise Data Services (EDS) group. The first project, Quality Book, aimed to integrate 11 sources' (Administrative
Units/Health Facilities) performance data against metrics (Measures) defined by management. The final objective was to
report/produce a dashboard on initiatives (and key metrics) by unit. The second project, Nursing Advisory Board, produced
10 independent patient data extracts from 10 different sources. The third project, Glucose, was an initiative taken by Banner
to analyze patients' glucose levels and eventually build a decision support system. Every patient visits the hospital for events
such as diagnoses, treatments, or ordering medicines. When a patient is admitted for a treatment, the patient moves between
different ICU and non-ICU units and goes through several 'Clinical Events'. This project tracked patients in nursing units
along with their glucose levels/insulin orders.
The challenge in the QBOOK project was integrating 11 sources to build 2 fact tables and 6 dimension tables, which would
also be a starting point for building a flexible and scalable enterprise data warehouse system. The challenge in the Nursing
Advisory Board project was processing source data using complex transformations to produce the extracts. The challenge in
the Glucose project was dealing with 2 million records on a daily basis. From an ETL standpoint, significant query
optimization techniques were required to load the data warehouse tables in a timely manner. A lot of data cleansing of
date-time/varchar fields was required as well.
• Extract, Transform and Load source data into respective target tables to build the required data marts.
• Understand user requirements and Data Model defined and developed by the business analysts and data architects.
• Communicate with the business analyst and data modeler on a regular basis to stay informed of changing requirements
or data model design alterations.
• Stage (Extract) the source data from files, SQL server, DB2 sources into STAGE tables.
• Worked with Flat Files (Pipe Delimited) sources and implemented error handling routines.
• Worked with Flat Files Data source connections, ODBC data source connections (to SQL server sources) and DB2
data source connections.
• Built Lookup (LKP) tables (Code Set and Code Value) to identify the correlation between facilities defined by different
sources.
• Used an incremental approach in the work flows and data flows to integrate source data into warehouse tables using
Data Integrator.
• Used Query, SQL, LKP, Merge, Case, and Validation transforms to transform the data (as needed) before loading into the
target tables.
• Wrote custom functions to produce the derived fields from the source table fields for each record.
• Used date functions, string functions, and database functions to manipulate and format source data as part of the data
cleansing and data profiling process.
• Worked with date/time manipulations/arithmetic in DB2 using DB2 built-in functionality as well as Data Integrator
functions.
• Implemented history preserving using Table comparison, History Preserving and Key generation transformations in
warehouse /dimension workflows.
• As a part of optimization:
o Used MAP operations to route UPDATE and INSERT records in warehouse workflows
o Created the necessary indexes (for fields in WHERE clause)
o Run stats on all tables
o Chose the right transformations and the right design in data flows
o Implemented incremental load process
• Built Data marts (Fact and Dimension tables) from the warehouse tables.
• Handled the cross schema (Stage, Warehouse, Data mart) challenges between the different environments
(Development, QA and Production) by granting permissions on tables using Database functions in scripts before
running work flows.
• As a part of QA testing, worked with the ECS team, Dash board developer and the Data modeler to test the ETL
process to populate the dashboard successfully.
• Prepared ETL design documents for implemented work flows and loaded the documents in the project share point site.
• Configured batch jobs and set repository schedules using the Data Integrator Management Console
Environment: Data Integrator 11.7.2 (Business Objects ETL), SQL Server 2000, DB2 (server on UNIX), TOAD for DB2,
Data Integrator Management Console
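The insert/update routing and incremental-load steps above can be sketched as follows. This is a minimal illustration using SQLite in place of the DB2/SQL Server targets; the table and column names are hypothetical, and the real flows used Data Integrator's Map operation rather than hand-written loops:

```python
import sqlite3

# Hypothetical tables: stage_patient holds incoming rows, dim_patient is the
# warehouse dimension they are integrated into.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE stage_patient (patient_id INTEGER, name TEXT)")
cur.execute("CREATE TABLE dim_patient (patient_id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO stage_patient VALUES (?, ?)", [(1, "A"), (2, "B")])
cur.execute("INSERT INTO dim_patient VALUES (1, 'OLD')")

# Route each staged row to UPDATE (key already exists) or INSERT (new key),
# mirroring a Map operation that splits the flow into update/insert branches.
for pid, name in cur.execute("SELECT patient_id, name FROM stage_patient").fetchall():
    cur.execute("SELECT 1 FROM dim_patient WHERE patient_id = ?", (pid,))
    if cur.fetchone():
        cur.execute("UPDATE dim_patient SET name = ? WHERE patient_id = ?", (name, pid))
    else:
        cur.execute("INSERT INTO dim_patient VALUES (?, ?)", (pid, name))
conn.commit()
print(cur.execute("SELECT * FROM dim_patient ORDER BY patient_id").fetchall())
# → [(1, 'A'), (2, 'B')]
```

In an incremental load, the stage extract would additionally be filtered on a last-update timestamp so only changed rows enter this routing step.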
Online Business Systems, Calgary, Alberta, Canada Dec ’06 – Present
Business Intelligence Consultant
Client: ConAgra Foods, Toronto, Canada April’06 -Ongoing
Datawarehousing/Cognos consultant (focused on reporting, including ETL using Cognos Decision Stream and
modeling using Cognos Framework Manager)
Description: ConAgra Foods is one of North America's largest packaged food companies. The source system used to capture
their sales numbers is BPCS. A pre-built Sales Data Model package was purchased from Cognos and customized to
meet their specific business requirements. I was involved in developing reports in Cognos Report Studio on key
sales metrics, the related ETL work using Decision Stream, and modeling enhancements/customization in Cognos
Framework Manager. The target data warehouse tables were SQL Server 2000 tables.
• Understand the Sales domain, ConAgra business processes, and the Cognos pre-built Sales data model (dimensional, star schema)
• Analyzed the business logic built into the warehouse and also gathered user requirements for the reporting work
• Documented requirements and performed Gap analysis to identify the gaps
• Created the required fact/dimension tables in SQL server 2000, exported tables into the production server (from
development) using import/export wizard in SQL server 2000
• Implemented the business logic to identify the required sales transactions in the data source SQL code
• Used the Data Stream and Transformation model components in Decision Stream to extract the required attributes
from source and map to the target elements
• Built the Dimension and Fact builds in Decision Stream, executed the builds and job streams to load the required Fact
and Dimension data from the BPCS source to target SQL server Fact/Dimension tables
• Enhanced the Sales data model query subjects in the Cognos Framework manager by selecting the required query
items, creating complex calculation model elements (attributes) and fact measures
• Published Sales package to Cognos Report Net server
• Used Cognos Report Studio and developed complex queries to pull the sales information from different fact and
dimension tables
• Implemented the required prompts, filters, groups in the reports (based on user requirements)
• Used tabular reference from different queries, tabular models to re-use summary calculations in reporting
• Used data formatting techniques and complex SQL calculations (specifically in grouping) to present the data in the
required format
• Backed up XML specifications of reports and used the local save option in Cognos to transfer reports between
different environments/servers (Development/Production)
• Balanced sales numbers between the reports and the BPCS source system (for validation/testing) on a daily basis
Environment: BPCS ERP system, SQL Server 2000, Cognos Decision Stream ETL/data warehousing/dimensional
modeling, Cognos Framework Manager (version 7), Cognos Report Studio (ReportNet), Windows XP
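The fact/dimension queries described above follow a standard star-join pattern: join the fact table to its dimensions on surrogate keys, then group by dimension attributes to produce summary measures. A minimal sketch using SQLite with a toy product dimension (the actual work ran against the SQL Server 2000 sales tables, whose names differ):

```python
import sqlite3

# Toy star schema standing in for the Sales data mart: one fact table joined
# to a product dimension, grouped to produce summary sales metrics.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE fact_sales (product_key INTEGER, qty INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'BrandA'), (2, 'BrandB');
INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 5, 50.0), (2, 2, 20.0);
""")
rows = cur.execute("""
    SELECT d.brand, SUM(f.qty) AS units, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_product d ON d.product_key = f.product_key
    GROUP BY d.brand
    ORDER BY d.brand
""").fetchall()
print(rows)  # → [('BrandA', 15, 150.0), ('BrandB', 2, 20.0)]
```

In a reporting tool the GROUP BY attributes become report groups and the SUM expressions become fact measures.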
Client: City of Calgary, Calgary, Alberta, Canada Dec’06 – Feb‘07
Datawarehousing/Business Intelligence (with Informatica, Business Objects) consultant
Description: The City of Calgary offers services to residents through a 3-1-1 phone line. Citizens can use this service line to
request information on the services provided or for any form of assistance offered by the government. Call center employees
record the user information through a source CSR system (Contact Center Representative System). The source information
stored (in the background) in the Oracle database is loaded into the CSR Datawarehouse using the ETL tool – Informatica and
Business Objects is used to report on the warehouse information providing business intelligence for end users.
Worked on 7 key enhancement problems related to the Business Intelligence solutions currently offered by the CSR
datawarehouse. The solutions provided covered all the areas of Business intelligence including the Datawarehouse design, ETL
(Informatica) design, Business Objects Universe design and Business Objects reporting. Specific responsibilities include:
• Understand the Call Center Datawarehouse and the underlying Dimensional Data Model
• Used TOAD (menu driven interface) for Oracle to query the call center datamart (in Oracle). Extensive use of SQL
for querying to understand the underlying data model
• Technical analysis of the datawarehouse and relate to the user requirements to find the source of the problems
• Analyzed the business logic built into the warehouse and also gathered user requirements for enhancement problems
• Enhanced the data model; designed and developed technical solutions (complying with the underlying data model) for
the enhancement problems
• Attacked core issues of the data warehouse, including data mart security, and implemented solutions
• Understand the ETL (Informatica) that was used to transform and load data between the source system and datamart
• Designed the ETL process flow and developed data mapping spreadsheets to define transformation rules for each stage of
the ETL process
• Enhanced/Developed new ETL processes for the enhancement solutions. Worked with transformations in Informatica
• Responsible for monitoring scheduled, running, completed, and failed sessions in Informatica
• Enhanced the Business Objects Universe design to add new tables, identify joins and contexts, define new classes and
objects for enhancement solutions
• Provided sample reports using business objects to clients as proof of concept solutions
• Worked with different data providers in Business Objects, including TAB delimited and CSV data providers.
Enhanced Visual Basic Macros to modify CSV source files data providers
• Worked with processes involving the refresh of the data warehouse and the Business Objects universe
• Unit and integration testing of developed solutions in the data warehouse, ETL, Business Objects universe, and
reports
• Designed, implemented and documented QA processes for User Acceptance Testing
• Documented the requirements gathering and the design of the solutions provided
• Communicated with Clients and provided status reports on a weekly basis
• Project management, project planning and utilization of resources
Environment: CSR system, Oracle (9), ETL/Data warehousing/Dimensional Modeling, Informatica (7.4), Business
Objects (5.1 and 6.5), Windows XP, TOAD for Oracle
Enterprise Reporting, Information Technology, University of Calgary, Calgary, Alberta, Canada Apr ’06-Nov ‘06
Description: The University of Calgary Enterprise Reporting team offers data warehousing and reporting services to internal
customers such as Finance, HR, Student, and other administrative departments. The data warehouse is SQL Server, and the
reporting tools are Cognos ReportNet and Cognos 8. PeopleSoft ERP stores all the source data, and Decision Stream as
well as Microsoft Integration Services (SSIS) are used to extract, transform, and load between the source and the target
databases.
COGNOS 8- TRAINING
Underwent a 2-day Report Studio and Query Studio (Cognos 8) training from Cognos, Ottawa, Canada.
Data/Requirements Design and Analysis of Data warehouse portal projects.
Understand the physical data structures and contents of the Data marts in the warehouse and all related metadata
and business logic.
Interacting with business users and translating requirements to developers.
Provide technical support (hardware and software) to COGNOS Report Net users within the University.
Communication with the user base and addressing user related issues (user access, training, communication).
Understand user issues; track down problems from the Oracle source and PeopleSoft systems to the warehouse data
marts using SQL queries.
Log user reports, maintain and resolve user bugs using an action request software application. Identified
frequently asked problems and prepared documentation.
Maintain status updates of user problems using MS EXCEL spread sheets.
Extract, Transform and Load data using Microsoft ETL tool (SSIS) from stage tables between different sources
and target databases (Oracle, SQL 2005).
Integrated T-SQL stored procedures inside SSIS to automate backend monitoring tasks in SQL Server 2005. Used
property expressions in SSIS to dynamically connect to multiple servers at runtime.
Error handling using SSIS to report job failures.
Prepared test cases and test plans for system/module testing (unit, integration, and regression tests).
Created corporate report templates in ReportNet with customized headers and footers.
Created list, crosstab, and chart reports.
Added business rules, filters, calculations, and prompts to reports.
Set up drill-through targets from other Cognos reports.
Managed reports in Cognos Connection.
Tested reports for data validation, synchronization, and performance optimization.
Involved in report testing in different environments: integration testing, system testing, and user acceptance testing.
Project charter, planning and preparation of design documents for all accomplished projects.
Use Visual Source Safe version control system to maintain projects versions.
Environment: PeopleSoft source system, OLAP, Cognos ReportNet, ETL/data warehousing, Oracle, SQL Server
2005, Query Analyzer, PL/SQL, Visual SourceSafe 6.1, Windows XP, Visual Studio
Advanced Database Systems
Department of Computer Science, University of Calgary Jan’02-Dec’04, Feb’06-Apr’06
Calgary, Alberta, Canada
Graduate Assistant (Data Mining)
Description: The Database Laboratory at the University of Calgary undertakes data mining research with a focus on
applications in business intelligence. The laboratory also has several industry tie-ups to ensure the practicability of
the work in the real world.
I presented this work at an international data mining conference (attended by data mining companies and the relevant
research community) in Las Vegas, June 2006. The work was also published at a machine learning conference in Los
Angeles, December 2005 (see Publications).
Designed, Developed and Validated a Data Mining framework for knowledge discovery in Databases.
Developed solutions (using Data Mining techniques) that process and manipulate large amounts of data into
information for business decisions.
Worked with Health Care (Breast Cancer Datasets), business datasets, small and large.
Ability to bring a business problem within the framework of data mining and find solutions
Proposed a new methodology and devised and developed new algorithms for data pre-processing and data post-processing
The functionalities were implemented using the C programming language on the UNIX platform. The implementation also
included in-depth usage of the WEKA data mining software tool, written in Java.
Provided basic technical support for users of the WEKA software as well. Made enhancements to the Java functionality
to improve performance.
Extensive UNIX shell programming to interface (pass messages) between the Java code (tool) and the C code
(software) to exploit existing decision support techniques.
Demonstrated, proven sound knowledge of Data Mining and Data Analytics Concepts (Refer publications).
Experience in Predictive modeling and Data Analysis
Using appropriate tools and techniques to analyze a variety of data sources
Extensive extraction, data cleansing, data pre-processing, and feature extraction (using pre-processing tools) from
databases for mining
Qualitative data analysis to find the right data split and data selection for training and validation datasets
Experience using random sampling, stratified random sampling, bootstrap, and cross-validation techniques for model validation
Used data mining tools to find patterns or rules answering decision support problems
Extensive experience with classification (including decision trees and neural networks) and association rule mining
Data post-processing using genetic algorithms and rule extraction using relevant data mining tools to find an efficient
set of accurate, intelligent (new) patterns
Used post-processing visualization tools
Maintenance of Data quality and Data Integrity
Created and presented analytical reports – both written and graphical
Exposure to Data warehousing methodologies
Trained students in programming and data structures (Pascal, C, C++, and Python) on the UNIX platform; helped with
coding and debugging as well.
Environment: UNIX, C, WEKA Data Mining Tool, Genetic Algorithms
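The random-sampling and cross-validation work above can be illustrated with a small k-fold splitter. This is a minimal sketch in plain Python, independent of the WEKA tooling actually used; the fold count and seed are arbitrary:

```python
import random

def k_fold_splits(n_records, k=5, seed=42):
    """Shuffle record indices once, then yield (train, test) index lists per fold."""
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# Every record appears in exactly one test fold across the k iterations.
for train, test in k_fold_splits(10, k=5):
    assert sorted(train + test) == list(range(10))
```

Stratified sampling would additionally preserve class proportions (e.g. recurrence vs. no recurrence) within each fold before splitting.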
Tom Baker Cancer Center Jan’03-Dec’03
Alberta Cancer Board, Calgary, Alberta, Canada
Tom Baker Cancer Center is a unit of the Alberta Cancer Board. They were using statistical models to detect
breast cancer recurrence risk. While the quantitative data analysis was effective to an extent, they wanted to make
better predictions by combining it with qualitative data analysis such as data mining. I worked
on a project to detect risk-factor patterns and rules for high-risk cancer patients in Alberta using data mining techniques
Understand the underlying Breast Cancer Data
Bring the problem within the framework of WEKA Data Mining tool
Data pre-processing and transformation (including cleaning missing values) of the data suitable for mining
Data Split with Random sampling, Cross Validation techniques
Convert the schema to a form handled by Data Mining tools (CSV format)
Experience working with Flat File (CSV) schema, discrete attributes
Used Analyzer component of WEKA Data Mining tool to check for file formats
Used the Experimenter component of WEKA Data Mining tool to run several classification algorithms
(including Decision Trees and Neural Networks)
Used feature extraction techniques in the tool to discover the most important features affecting the problem
Compared the performance of the algorithms, including the boosting and bagging techniques
Used the Association Rule Mining technique (APRIORI) in the tool to discover patterns
Used post-processing strategies for discovering intelligent and accurate patterns hence-forth
Validated the results with the Cancer Board
Prepared reports and graphical results, submitted to the Alberta Cancer Board
Presented and communicated the benefits and process of data mining to medical audiences (at Tom Baker Cancer
Center and Calgary Health Region)
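The APRIORI-style pattern discovery described above can be sketched as a minimal frequent-itemset miner. This is a plain-Python illustration (the actual work used WEKA), with hypothetical risk-factor attributes as transaction items:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    current = {s for s in items
               if sum(s <= t for t in transactions) / n >= min_support}
    k = 1
    while current:
        for s in current:
            frequent[s] = sum(s <= t for t in transactions) / n
        # Candidate generation: join frequent k-itemsets into (k+1)-itemsets.
        k += 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = {c for c in candidates
                   if sum(c <= t for t in transactions) / n >= min_support}
    return frequent

# Toy patient records: each transaction is a set of risk-factor attributes.
tx = [frozenset(t) for t in [{"age>50", "high_risk"}, {"age>50", "high_risk"},
                             {"age>50"}, {"low_risk"}]]
freq = apriori(tx, min_support=0.5)
print(sorted(tuple(sorted(s)) for s in freq))
# → [('age>50',), ('age>50', 'high_risk'), ('high_risk',)]
```

Frequent itemsets like {age>50, high_risk} are then turned into association rules and handed to post-processing for accuracy filtering.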
Fluent India Pvt Ltd Aug’98-Jun’01
Subsidiary of Fluent Inc, New Hampshire, USA
Senior Software Engineer
Description: Fluent Inc, based in New Hampshire, develops computational fluid dynamics software. Their clients
span engineering CAD/CAE (Computer Aided Design/Engineering) applications. The products developed are
pre-processors, solvers, and post-processors (for visualization).
Training @ Fluent Inc, USA: Underwent a 2-month training program at Fluent Inc, New Hampshire, USA
between Nov '98 and Dec '98
As an application developer, worked through the software development lifecycle to design and build the
integration of Fluent Solver with other CAD/CAE Software Products.
Understand the underlying Fluent Code, Import/Export of Data File formats
Understand the File formats of several CAE software products such as Ansys, Abaqus, Data Explorer, IDEAS etc.
Requirements analysis, Design of functionalities facilitating data import/export between products
Communicate with users, translate business requirements
Implement product functionalities
Worked with several file formats (Flat files, text files, Binary files, CSV files)
Excellent experience working with conversion tools and with PDF, PS, LaTeX, and HTML for text and graphical display
Experience in understanding, interpreting underlying high level software codes (C, C++, LISP, Fortran) of
different CAE products
Experience in linking (using libraries) of software codes
Experience using CVS version control technique to maintain codes
Porting the software codes on different UNIX, LINUX platforms
Developed and debugged prototype on multiple platforms to ensure maximum usability
Developed Graphical User Interface for Integration functionalities of Fluent Software
Perform Unit testing, Integrated testing, Acceptance testing of developed functionalities
Also performed regression testing of key functionalities of the Fluent solver; as part of this work, wrote UNIX
shell scripts to automate the testing process (which otherwise involves significant manual time).
Demonstrated excellent writing skills: extensive technical writing/documentation of user manuals for POLYFLOW
(one of the software products) using LaTeX. Collaborated closely with developers, application
staff, and users of Fluent Inc across the globe (USA, UK, Belgium)
Installed Fluent application (and supporting) software on Windows/UNIX workstations. Provided support
to users during training/work sessions.
Presented developed functionalities (using MS Office) to developers in the team worldwide.
Experience working with support/developers/technical consultants between different CAE software companies
(work necessitated quite a lot of interactions) in India/North America/Europe
Environment: UNIX, C, C++, Latex, MS Office products (Word, Power Point)
Tata InfoTech Computer Education, India July’07-July’08
Automating Time Sheet Updates
The Time Sheet System is user-friendly software used to check whether employees have logged their time sheets weekly.
Developed a system using Shell programming whose functionalities include:
Keep track of users time sheets
Check if users have updated their time sheets on the specified day of the week
If users did not update their time sheets, send a reminder
Once the time sheets are updated in their local directories, copy it to a common time sheet repository
Send a report to the team manager on the status of time sheets
Created variables, used UNIX commands (echo, ls, wc, cp etc.)
Proficient in using regular expressions using grep, find, expr, sed, awk
Used File manipulation commands such as cat, touch, sort
Very proficient in using pipes, redirections and shell programming constructs (if-then, case)
Environment: UNIX Shell Programming
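The time sheet workflow above can be sketched as follows. This is a minimal illustration written in Python rather than shell for brevity; the directory layout, user names, and file naming are hypothetical:

```python
import os
import shutil
import tempfile

# Hypothetical layout: <home>/<user>/week_<N>.txt holds a user's time sheet;
# completed sheets are copied into a common repository directory.
HOME = tempfile.mkdtemp()
REPO = os.path.join(HOME, "repo")
USERS = ["alice", "bob"]

def check_timesheets(week):
    os.makedirs(REPO, exist_ok=True)
    copied, missing = [], []
    for user in USERS:
        sheet = os.path.join(HOME, user, f"week_{week}.txt")
        if os.path.exists(sheet):
            # Copy the completed sheet into the common repository.
            shutil.copy(sheet, os.path.join(REPO, f"{user}_week_{week}.txt"))
            copied.append(user)
        else:
            missing.append(user)  # a reminder would be sent here
    return copied, missing

# alice has filed her sheet; bob has not.
os.makedirs(os.path.join(HOME, "alice"), exist_ok=True)
with open(os.path.join(HOME, "alice", "week_1.txt"), "w") as f:
    f.write("40h")
copied, missing = check_timesheets(1)
print(copied, missing)  # → ['alice'] ['bob']
```

The status report to the team manager is then just the `missing` list formatted per user.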
Automated Inventory System
Involved in the design and development of an application for the automation of Government identity cards (for the
purchase of items in a Government-run unit). This interactive application has the following features.
Inform customers about the availability of items and ordering.
Kept track of consumer payments.
Accepted orders from consumers and allotted goods taking into account the priority or demand of each consumer
Involved in configuration management and quality assurance
Simple report generation on daily statistics using status reports.
Responsible for development of the front end using Developer 2000. Stored backend data in Oracle; wrote triggers to
pop event messages using PL/SQL and Oracle Reports
Environment: Oracle, PL/SQL, Oracle Designer, and Forms 4.5 and 5
Library Management System
Library management system is user-friendly software used for the maintenance of a library. It maintained the
transaction of books, record of members and the list of returnable books. It provided reports such as categorized by
members, group codes, author and publishers. It also gave the listing of members, issues, outstanding and all the
facilities and options required for an efficient library management system.
Analyzed the requirements and prepared the analysis report.
Designed and developed the functionalities:
o Item Management
o Search for Books
o Search for Members
Used basic and advanced C programming techniques using functions, procedures, Pointers, Linked lists and
Memory management on UNIX platform
Environment: UNIX and C (Pointers, Linked Lists, Memory Management)
J. Gopalan, E. Korkmaz, R. Alhajj and K. Barker, “Effective Data Mining by Integrating Genetic Algorithm into
the Data Preprocessing Phase,” Proceedings of the International Conference on Machine Learning and
Applications, Los Angeles, CA, USA, Dec. 2005.
J. Gopalan, R. Alhajj and K. Barker, “Post-processing Rule Sets Using Genetic Algorithms,” Proceedings of the
International Conference on Data Mining, Las Vegas, USA, June 2006.