This document presents a case study on GINA (Global Innovation Network and Analysis), which analyzed innovation data from EMC to improve knowledge sharing and identify patterns. The case study followed the data analytics lifecycle, including discovery, data preparation, model planning, model building using natural language processing, social network analysis and data visualization. Key findings were identifying "hidden innovators" and boundary spanners to promote sharing. The study was considered successful but further longitudinal studies were recommended to improve the model over time.
Analyzing Innovation Data at EMC's GINA Team
1. SNJB’S LATE SAU K. B. JAIN COE, CHANDWAD
DEPARTMENT OF COMPUTER ENGINEERING
ACADEMIC YEAR 2020-21
SUBJECT: DATA ANALYTICS
CASE STUDY ON GINA
By
Name: Divya Prafull Wani
Roll No: 41
Class: BE Computer
Date: 16-08-2020
Case Study On GINA
1
2. GINA
■ Global Innovation Network and Analysis.
■ The GINA case study provides an example of how a team applied the Data Analytics
Lifecycle to analyze innovation data at EMC.
■ GINA is a group of senior technologists located in centers of excellence around the world.
■ The GINA team thought its approach would provide a means to share ideas globally and
increase knowledge sharing among members who may be separated geographically.
■ It planned to create a data repository containing both structured and unstructured data to
accomplish three main goals:
1. Store formal and informal data.
2. Track research from global technologists.
3. Mine the data for patterns and insights to improve the team's operations and strategy.
3. Phase 1 : Discovery
■ Team Members and Roles
Business user, project sponsor, project manager – Vice President from the Office of the CTO.
BI analyst – person from IT.
Data engineer and DBA – people from IT.
Data scientist – distinguished engineer.
■ The data fell into two categories
5 years of idea submissions from internal innovation contests.
Minutes and notes representing innovation and research activity from around the world.
■ Hypotheses grouped into two categories
Descriptive analytics of what is happening to spark further creativity, collaboration, and asset generation
Predictive analytics to advise executive management of where it should be investing in the future.
4. Phase 2 : Data Preparation
■ Set up an analytical sandbox to store and experiment on the data.
■ Discovered that certain data needed conditioning and normalization and that missing
datasets were critical.
■ Team recognized that poor quality data could impact subsequent steps.
■ They discovered that many names were misspelled and that some contained extra spaces.
■ Important to determine what level of data quality and cleanliness was sufficient for
the project being undertaken.
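The name-cleaning problem described above (misspellings and extra spaces) can be sketched in Python as follows. This is only an illustration, not the GINA team's actual procedure: the function names, the reference list of known names, the sample names, and the 0.85 similarity cutoff are all assumptions made for the example.

```python
import difflib
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Normalize Unicode form, collapse runs of whitespace, trim, and title-case."""
    text = unicodedata.normalize("NFKC", raw)
    text = re.sub(r"\s+", " ", text).strip()
    return text.title()


def canonicalize(names, known_names, cutoff=0.85):
    """Map each raw name to its closest known name, or to its cleaned form.

    `known_names` is a hypothetical reference list (for example, an HR
    directory); `cutoff` is an assumed similarity threshold between 0 and 1.
    """
    result = {}
    for raw in names:
        cleaned = normalize_name(raw)
        match = difflib.get_close_matches(cleaned, known_names, n=1, cutoff=cutoff)
        result[raw] = match[0] if match else cleaned
    return result


# Hypothetical sample data, invented for illustration only.
known = ["Alice Murphy", "Rahul Deshmukh"]
raw_names = ["  alice   murphy ", "Alice Murpy", "Rahul Deshmukh  ", "Wei Zhang"]

for original, fixed in canonicalize(raw_names, known).items():
    print(repr(original), "->", fixed)
```

How strict the cutoff should be depends on the level of data quality the project needs: a lower cutoff merges more spelling variants but risks merging two different people.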
5. Phase 3: Model Planning
■ Included the following considerations:
Identify the right milestones to achieve the goals.
Trace how people move ideas from each milestone towards the goal.
Once this is done, trace ideas that die and others that reach the goal. Compare the
journeys of ideas that make it and those that do not.
Compare times and outcomes using a few different methods. These could be as simple
as t-tests or perhaps involve different types of classification algorithms.
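As a rough illustration of the "simple t-test" option mentioned above, the sketch below computes Welch's t statistic and its approximate degrees of freedom using only the Python standard library. The two samples (days taken by ideas that reached the goal versus ideas that died) are invented numbers, and the choice of Welch's variant is an assumption; the case study does not say which test the team used.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    se_a = statistics.variance(sample_a) / len(sample_a)
    se_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical data: days between milestones for two groups of ideas.
days_reached_goal = [30, 42, 35, 38, 45]
days_died = [60, 75, 58, 80, 67]

t_stat, dof = welch_t(days_reached_goal, days_died)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A large absolute t value relative to the t distribution with the reported degrees of freedom would suggest that the two groups of ideas differ in how long they take; turning it into a p-value requires a t-distribution table or a statistics library.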
6. Phase 4 : Model Building
■ The GINA team employed several analytical methods. This included work by the data
scientist using Natural Language Processing (NLP) techniques on the textual
descriptions of the Innovation Roadmap ideas.
■ Social network analysis using R and RStudio.
■ Developed social graphs and visualizations.
7. Social Graph Data
Submitters and Finalists, and Graph of Top Innovation Influencers
• The figure shows social graphs that portray
relationships between idea submitters within
GINA.
• Each colour represents an innovator from a
different country.
• The large dots with red circles around them
represent hubs.
• A hub represents a person with high
connectivity and a high “betweenness” score.
• The team used Tableau software for data
visualization and exploration and used the
Pivotal Greenplum database as the main data
repository and analytics engine.
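To make the "betweenness" score used to identify hubs more concrete, here is a minimal Python sketch of Brandes' algorithm for betweenness centrality on an unweighted, undirected graph. The GINA team worked in R and RStudio, so this is not their code; the toy graph and its node names are hypothetical.

```python
from collections import deque


def betweenness(graph):
    """Betweenness centrality via Brandes' algorithm.

    `graph` maps each node to its neighbours and must be symmetric
    (undirected, unweighted). Scores are unnormalized: the number of
    shortest paths between other pairs of nodes that pass through a node.
    """
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        sigma = {v: 0 for v in graph}       # number of shortest paths
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                        # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while order:                        # accumulate in reverse order
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical network: two local pairs of innovators linked only through "hub".
graph = {
    "a": ["b", "hub"],
    "b": ["a", "hub"],
    "c": ["d", "hub"],
    "d": ["c", "hub"],
    "hub": ["a", "b", "c", "d"],
}
scores = betweenness(graph)
print(max(scores, key=scores.get), scores)
```

In this toy network every shortest path between the two pairs runs through the single connecting node, which is the pattern the case study associates with hubs and boundary spanners.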
8. Phase 5 : Communicate Results
■ This project was considered successful in identifying boundary spanners and hidden
innovators.
■ The GINA project promoted knowledge sharing related to innovation and researchers
spanning multiple areas within the company and outside of it.
■ GINA also enabled EMC to cultivate additional intellectual property that led to research topics
and provided opportunities to forge relationships with universities for joint academic
research in the fields of Data Science and Big Data.
■ The study was successful in identifying hidden innovators (a high density was found in Cork,
Ireland).
■ The CTO office launched longitudinal studies.
9. Phase 6: Operationalize
■ Deployment was not really discussed
■ Key Findings
Need more data in future
Some data were sensitive
A parallel initiative needs to be created to improve basic BI activities.
A mechanism is needed to continually reevaluate the model after deployment.