SNJB’S LATE SAU K. B. JAIN COE, CHANDWAD
DEPARTMENT OF COMPUTER ENGINEERING
ACADEMIC YEAR 2020-21
SUBJECT: DATA ANALYTICS
CASE STUDY ON GINA
By
Name: Divya Prafull Wani
Roll No: 41
Class: BE Computer
Date: 16-08-2020
Case Study on GINA
GINA
■ Global Innovation Network and Analysis.
■ The GINA case study provides an example of how a team applied the Data Analytics
Lifecycle to analyze innovation data at EMC.
■ GINA is a group of senior technologists located in centers of excellence around the world.
■ The GINA team thought its approach would provide a means to share ideas globally and
increase knowledge sharing among members who may be separated geographically.
■ It planned to create a data repository containing both structured and unstructured data to
accomplish three main goals:
1. Store formal and informal data.
2. Track research from global technologists.
3. Mine the data for patterns and insights to improve the team's operations and strategy.
Phase 1: Discovery
■ Team Members and Roles
➢ Business user, project sponsor, project manager – Vice President from the Office of the CTO.
➢ BI analyst – person from IT.
➢ Data engineer and DBA – people from IT.
➢ Data scientist – distinguished engineer.
■ The data fell into two categories
➢ Five years of idea submissions from internal innovation contests.
➢ Minutes and notes representing innovation and research activity from around the world.
■ Hypotheses grouped into two categories
➢ Descriptive analytics of what is happening, to spark further creativity, collaboration, and asset generation.
➢ Predictive analytics to advise executive management of where it should be investing in the future.
Phase 2: Data Preparation
■ Set up an analytic sandbox to store and experiment on the data.
■ Discovered that certain data needed conditioning and normalization and that missing
datasets were critical.
■ The team recognized that poor-quality data could impact subsequent steps.
■ They found that many names were misspelled or contained extra spaces.
■ It was important to determine what level of data quality and cleanliness was sufficient for
the project being undertaken.
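The name problems noted above (misspellings and extra spaces) could be conditioned roughly as follows. This is a minimal Python sketch, not the GINA team's actual procedure: the roster of known names, the sample inputs, and the 0.85 similarity cutoff are all made-up assumptions for illustration.

```python
import difflib
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Trim, collapse repeated whitespace, and normalize case and Unicode form."""
    text = unicodedata.normalize("NFKC", raw)
    text = re.sub(r"\s+", " ", text).strip()
    return text.title()


def canonicalize(names, known_names, cutoff=0.85):
    """Map each raw name to the closest known name, or keep the cleaned form.

    `known_names` is a hypothetical reference list (for example a staff roster);
    `cutoff` is an arbitrary similarity threshold that would need tuning.
    """
    result = {}
    for raw in names:
        cleaned = normalize_name(raw)
        match = difflib.get_close_matches(cleaned, known_names, n=1, cutoff=cutoff)
        result[raw] = match[0] if match else cleaned
    return result


if __name__ == "__main__":
    roster = ["Asha Rao", "Liam Murphy"]  # made-up names
    raw_names = ["  asha   rao ", "Liam Murphey", "New Person"]
    for raw, clean in canonicalize(raw_names, roster).items():
        print(repr(raw), "->", clean)
```

Whitespace fixes are safe to automate, but fuzzy matching can merge two different people, so matched pairs would normally be reviewed before being written back. Note also that `str.title()` mangles some surnames (for example "McDonald"), which is one reason to decide up front how clean is clean enough.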
Phase 3: Model Planning
■ Included the following considerations:
➢ Identify the right milestones to achieve the goals.
➢ Trace how people move ideas from each milestone towards the goal.
➢ Once this is done, trace ideas that die and others that reach the goal. Compare the
journeys of ideas that make it and those that do not.
➢ Compare times and outcomes using a few different methods. These could be as simple
as t-tests or perhaps involve different types of classification algorithms.
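As an illustration of the simplest option mentioned above, the sketch below compares the time taken between milestones for ideas that reached the goal and ideas that died, using Welch's two-sample t-test written with the Python standard library only. The day counts are invented for the example and are not GINA data; the choice of Welch's variant (which does not assume equal variances) is also an assumption.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    se_a, se_b = var_a / n_a, var_b / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


if __name__ == "__main__":
    # Hypothetical days between two milestones (illustrative values only).
    reached_goal = [12, 15, 11, 14, 13]
    died = [20, 25, 22, 27, 21]
    t_stat, dof = welch_t(reached_goal, died)
    print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.2f}")
```

The t statistic would then be compared against a t distribution with the computed degrees of freedom to obtain a p-value, for example from a statistics table or a statistics library.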
Phase 4: Model Building
■ The GINA team employed several analytical methods. This included work by the data
scientist using Natural Language Processing (NLP) techniques on the textual
descriptions of the Innovation Roadmap ideas.
■ Performed social network analysis using R and RStudio.
■ Developed social graphs and visualizations.
Social Graph Data: Submitters and Finalists, and Graph of Top Innovation Influencers
• The figure shows social graphs that portray relationships between idea submitters within
GINA.
• Each colour represents innovators from a different country.
• The large dots with red circles around them represent hubs.
• A hub represents a person with high connectivity and a high "betweenness" score.
• The team used Tableau software for data visualization and exploration and used the
Pivotal Greenplum database as the main data repository and analytics engine.
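To make the "betweenness" score concrete: it counts how many shortest paths between other pairs of people pass through a given person. The team did this analysis in R; the sketch below is a Python stand-in that implements Brandes' algorithm for an unweighted, undirected graph using only the standard library. The people and links in the example are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map {node: set of neighbours} from edge pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted graph)."""
    scores = {node: 0.0 for node in graph}
    for source in graph:
        stack = []
        predecessors = {node: [] for node in graph}
        path_count = {node: 0 for node in graph}
        distance = {node: -1 for node in graph}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:  # breadth-first search from the source
            node = queue.popleft()
            stack.append(node)
            for neighbour in graph[node]:
                if distance[neighbour] < 0:  # first visit
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    path_count[neighbour] += path_count[node]
                    predecessors[neighbour].append(node)
        dependency = {node: 0.0 for node in graph}
        while stack:  # accumulate dependencies, farthest nodes first
            node = stack.pop()
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    # Each undirected pair was counted once from each endpoint.
    return {node: score / 2 for node, score in scores.items()}


if __name__ == "__main__":
    # Two hypothetical clusters of submitters joined through one person, "H".
    edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "H"),
             ("H", "D"), ("D", "E"), ("D", "F"), ("E", "F")]
    graph = build_graph(edges)
    scores = betweenness(graph)
    for person in sorted(scores, key=scores.get, reverse=True):
        print(person, "links:", len(graph[person]), "betweenness:", scores[person])
```

In this toy graph the person joining the two clusters gets the highest betweenness even though they have only two links, which is why betweenness and raw connectivity are reported separately when looking for hubs.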
Phase 5: Communicate Results
■ This project was considered successful in identifying boundary spanners and hidden
innovators.
■ The GINA project promoted knowledge sharing related to innovation and researchers
spanning multiple areas within the company and outside of it.
■ GINA also enabled EMC to cultivate additional intellectual property that led to further
research topics, and it provided opportunities to forge relationships with universities for
joint academic research in the fields of Data Science and Big Data.
■ The study found a high density of hidden innovators in Cork, Ireland.
■ The CTO office launched longitudinal studies.
Phase 6: Operationalize
■ Deployment was not really discussed.
■ Key Findings
➢ More data is needed in the future.
➢ Some data were sensitive.
➢ A parallel initiative needs to be created to improve basic BI activities.
➢ A mechanism is needed to continually reevaluate the model after deployment.
Analytic Plan from the EMC GINA Project
Reference
■ Data-science-and-big-data-analy-nieizv_book
■ https://bhavanakhivsara.wordpress.com/subjects/data-analytics/
Thank You!!

Case Study on GINA (Global Innovation Network and Analysis) based on the Data Analytics Lifecycle
