Enabling Big Data Analytics with Modeling Workbench

Authors: Ravishankar Rajagopalan
and Dhanesh Padmanabhan
Data Science Infrastructure (DSI) Team
Data Sciences Group
[24]7 Innovation Labs
Bangalore, India

Published in: Technology, Education

  1. Article: Enabling Big Data Analytics with Modeling Workbench. Authors: Ravishankar Rajagopalan and Dhanesh Padmanabhan, Data Science Infrastructure (DSI) Team, Data Sciences Group, [24]7 Innovation Labs, Bangalore, India
  2. [24]7 Inc accumulates several gigabytes of data from web, mobile, chat and IVR channels every day. Innovation Labs (iLabs), the technology division of [24]7, provides predictive analytics solutions to improve customer experience. The Data Sciences Group (DSG) of iLabs is primarily responsible for developing statistical and machine learning models that predict customer intent. These models are used to offer contextual chat or a self-serve application on the web channel, or a contextual IVR menu on the IVR channel. This reduces the time a customer needs to locate the information they are seeking and thereby improves the overall experience.

     The data scientists at DSG are required to analyze enormous amounts of data to develop new insights and models that can accurately predict customer intent. The models also need constant improvement because customer behavior evolves and the business landscape of our customers changes, which requires continual monitoring and updating of models. The Data Science Infrastructure (DSI) team is primarily responsible for building scalable analytics products that equip the data scientists with tools to quickly analyze data, develop models and monitor model performance. Modeling Workbench is one such tool developed by DSI.

     What is the Modeling Workbench?

     Modeling Workbench is one of the products DSI conceptualized and developed in collaboration with the Platform Engineering (PE) team of iLabs, and it is currently being piloted for the web channel. Workbench is a web-based tool that lets data scientists analyze millions of online customer journeys, develop quick insights and build models at scale for improved online predictive targeting.

     Workbench is expected to support Exploratory Data Analysis (EDA), model building/validation and simulation. Model deployment and model monitoring are supported by other internal tools developed at iLabs. Feedback from the production systems drives model improvements.

     [Figure: Modeling Life Cycle. Labels: Development, Production, Big Data, Exploratory Data Analysis, Model Building, Model Validation, Model Simulation, Model Deployment, Model Monitoring.]

     Follow [24]7 India www.247-inc.com
  3. EDA is the process of using standardized statistical procedures, such as univariate and bivariate analysis, to extract variables (features) of interest for the problem at hand (for example, predicting an online user's purchase intent); these features are then used for model building. Model building and validation involve implementing several advanced statistical/machine learning algorithms and picking the best-performing model. Simulation is used to understand the dynamics of the model in real time. These phases are iterative, and a data scientist typically goes through several iterations to identify the most effective model.

     The workbench provides customized data analytics functionality at the click of a button and is expected to save considerable time and effort for the data scientists. Being highly scalable, the workbench could be used to analyze 100+ million customer journeys in a few minutes. In addition, it incorporates best practices to be adopted during the different phases of modeling and facilitates standardization of analyses across DSG.

     Benefits of Modeling Workbench
       • Productivity: Reduce time to analyze data and build models by 50-75%
       • Scalability: Provide the ability to build and simulate models with millions of customer journeys in a few minutes
       • Standardization: Standardize model building and analysis

     What is the Technology behind the Workbench?

     Data scientists at [24]7 have traditionally used relational databases in conjunction with statistical modeling and data mining software such as R and Python to analyze data. The process involved writing custom SQL scripts on relational databases to prepare the datasets and then moving the prepared datasets to other computing infrastructure, where R and Python scripts were used for analysis and model building. This traditional approach severely limits the size of data one can analyze, since most statistical modeling software is memory dependent.
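To make the EDA step described on this slide more concrete, here is a minimal sketch of a univariate summary and a simple bivariate breakdown. It is an illustration only, not the workbench's implementation: the toy journeys, the field layout (pages viewed, device, purchased flag) and the function names are all assumptions introduced for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical toy journeys: (pages_viewed, device, purchased).
# These values are made up for illustration; they are not [24]7 data.
JOURNEYS = [
    (3, "desktop", 0),
    (12, "desktop", 1),
    (5, "mobile", 0),
    (9, "mobile", 1),
    (2, "mobile", 0),
    (15, "desktop", 1),
]


def univariate_summary(values):
    """Univariate analysis: describe one numeric variable on its own."""
    return {
        "n": len(values),
        "mean": mean(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def purchase_rate_by(journeys, key_index, target_index=2):
    """Bivariate analysis: relate one variable to the purchase outcome."""
    outcomes = defaultdict(list)
    for row in journeys:
        outcomes[row[key_index]].append(row[target_index])
    # The mean of 0/1 flags is the purchase rate for that level.
    return {level: mean(flags) for level, flags in outcomes.items()}


if __name__ == "__main__":
    pages = [row[0] for row in JOURNEYS]
    print(univariate_summary(pages))
    print(purchase_rate_by(JOURNEYS, key_index=1))
```

A variable whose purchase rate differs clearly between levels would be a candidate feature for model building. Note that this sketch holds everything in memory, which is exactly the limitation of the traditional approach described above.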
  4. [Figure: The Modeling Workbench Architecture. Labels: Weblogs, Big Data Stack, Columnar DB, Workbench Backend, Java Front End, Data Scientists.]

     The workbench solves these issues by connecting users through a central web-based application to an analytical database built on distributed columnar database technology. The workbench exposes a standard set of analyses that execute as server-side SQL or R scripts running directly on the columnar database. This tight integration of R and columnar database technology allows for scalable data analytics without the need for data movement.

     The distributed columnar database obtains its data from Hive tables, where weblogs are transformed daily using Python MapReduce scripts within Hive. The workbench itself is a Java-based web application that accesses the data in the distributed columnar database remotely. The analyses performed by data scientists are cached in an application database powered by MongoDB, which ensures quick retrieval of results from previously saved analyses. Saved analyses can be shared across the team for effective collaboration.

     Modeling Workbench provides a scalable analytics platform for quickly crunching data, generating useful insights, and building advanced statistical and machine learning models. The current version supports the analysis of web channel data. Future versions are expected to include natural language processing and text and speech analytics for data obtained from [24]7's chat and IVR platforms.

     About the Authors

     Dhanesh Padmanabhan leads the Data Science Infrastructure team within the [24]7 Data Sciences Group (DSG). He is responsible for developing the analytics infrastructure and the prediction platform for DSG. He has 10 years of experience in marketing analytics at R&D, KPO and consulting companies, including General Motors R&D, HP Analytics and Marketics Technologies (now WNS). He holds a Ph.D. in Mechanical Engineering from the University of Notre Dame.

     Ravishankar Rajagopalan is a Principal Analytics Consultant in the [24]7 Data Science Infrastructure (DSI) team. He is the DSG (Data Sciences Group) lead for the Modeling Workbench project. Prior to [24]7, he worked with GE Power and Water, as part of their Advanced Analytics team, and with Mu Sigma. He holds a Ph.D. in Applied Statistics from The Ohio State University.
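The architecture described on this slide combines two ideas: analyses run as server-side SQL next to the data, and their results are cached for quick retrieval. The sketch below illustrates both under loud assumptions: an in-memory SQLite database stands in for the distributed columnar database, a plain dictionary stands in for the MongoDB application database, and the table, column and class names are hypothetical. It is not the workbench's actual code.

```python
import hashlib
import json
import sqlite3


def build_demo_db():
    """In-memory SQLite database standing in for the analytical database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE journeys (device TEXT, pages INTEGER, purchased INTEGER)"
    )
    # Made-up rows for illustration only.
    conn.executemany(
        "INSERT INTO journeys VALUES (?, ?, ?)",
        [
            ("desktop", 12, 1),
            ("desktop", 3, 0),
            ("mobile", 9, 1),
            ("mobile", 2, 0),
            ("mobile", 5, 0),
        ],
    )
    return conn


class AnalysisCache:
    """Runs analyses in the database and caches results by request hash."""

    def __init__(self, conn):
        self.conn = conn
        self.store = {}    # stand-in for the application database
        self.db_calls = 0  # counts how often the database is actually queried

    def run(self, name, sql, params=()):
        request = json.dumps([name, sql, list(params)])
        key = hashlib.sha256(request.encode("utf-8")).hexdigest()
        if key not in self.store:
            # Cache miss: the aggregation executes inside the database,
            # so only the small result set is moved to the application.
            self.db_calls += 1
            self.store[key] = self.conn.execute(sql, params).fetchall()
        return self.store[key]


# The aggregation is expressed in SQL rather than computed in application memory.
PURCHASE_RATE_SQL = (
    "SELECT device, COUNT(*) AS journeys, AVG(purchased) AS purchase_rate "
    "FROM journeys GROUP BY device ORDER BY device"
)

if __name__ == "__main__":
    cache = AnalysisCache(build_demo_db())
    print(cache.run("purchase_rate_by_device", PURCHASE_RATE_SQL))
    print(cache.run("purchase_rate_by_device", PURCHASE_RATE_SQL))
    print("database calls:", cache.db_calls)
```

Keying the cache on the analysis name, query text and parameters means a saved analysis can be served again, or shared with a colleague, without touching the database a second time.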
