Philip Morris International R&D, Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
pmi.com, pmiscience.com
Agile Development and Fast Data Integration in a Large R&D Company
Antonio Castellon¹, Pavel Pospisil²
¹blue-infinity, Geneva, Switzerland; ²Philip Morris International R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
Research and Development departments in large companies like Philip Morris International (PMI) deal with extremely large
quantities of data in a variety of formats, including primary scientific data, results from chemical, biological and toxicological
assays, and data from clinical investigations. All these data are recorded in different formats, derived from scientific and
laboratory management systems, data warehouses and documents. Now the question is, how do we provide users with a
convenient, single-point entry overview for all results… and fast?
Here we present an approach to data integration that uses the best technologies and architectures currently available:
an OSGi architecture combined with micro-services, document and graph databases, and web-based user
interfaces.
The first prototype was completed by just one scientist and one IT developer in less than 3 months.
The product is named ChemoInformatics KnowledgeBase (CIKB). CIKB
is a chemocentric database that, within PMI, assembles data for the
chemical constituents present in the aerosol of our products (e.g.
conventional cigarettes, e-cigarettes) and associates them with both
internal and publicly available scientific data. It is chemocentric because
the chemical substance is the central node to which all other information
is related. The model can later be extended to be centered on other
entities.
1. Concept 2. Architecture
3. Project Management 4. Conclusion
Pros
• Rapid development of new features
• Reduction in complexity with the use of separate services
• Highly flexible and adaptable environment in which evolving user
requirements are easy to integrate, since each service can use
different languages and/or frameworks
• Easy to redistribute the services to several servers according to
computational and bandwidth requirements
• Easy deployment; only requires installation of new services and not
the full application
• The graph database allows better adaptability and flexibility in the
schema model than a traditional RDBMS
• Clean decoupling of the view and data layers through the
AngularJS framework
• Easy redistribution of the work packages into future teams
Cons
• Requires a multidisciplinary developer or a group of developers with
skills in different environments (server, languages, frameworks, etc.)
• Although graph databases allow flexible model generation, it is crucial
to maintain regular discussions with scientists regarding the desired
model, to define the correct data type (node/relationship/property/etc.)
• The continuous integration of diverse data types derived from
different scientific disciplines requires an ongoing learning process
The architecture uses an OSGi platform for the core services and
micro-services for the remainder. This split helps to manage the
challenges (ambiguity, inconsistency, vagueness, incompleteness) that
arise from the complexity of the data models and of the features required
by the users. It was also essential to be able to provide users with
different interfaces with minimal effort.
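The split between core services and independently deployable micro-services can be illustrated with a toy registry. This is a Python analogy only: OSGi itself is a Java module framework, and none of the names below (`ServiceRegistry`, `register`, `call`, the service names) come from OSGi or from CIKB.

```python
from typing import Callable, Dict


class ServiceRegistry:
    """Toy stand-in for a service registry: services are looked up by name."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[str], dict]] = {}

    def register(self, name: str, handler: Callable[[str], dict]) -> None:
        # Registering or replacing one service leaves the others untouched.
        self._services[name] = handler

    def call(self, name: str, chemical_id: str) -> dict:
        if name not in self._services:
            raise LookupError(f"no service registered as {name!r}")
        return self._services[name](chemical_id)


registry = ServiceRegistry()

# Hypothetical core service.
registry.register("search", lambda cid: {"service": "search", "id": cid})

# A micro-service added later, without redeploying "search".
registry.register("properties", lambda cid: {"service": "properties", "id": cid})

print(registry.call("properties", "compound-A"))
```

The point of the sketch is the deployment property claimed above: adding the second service required no change to the first.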
Less is More
According to the rule that “less is more,” the application development
started with minimal user requirements based upon common sense. The
complex features were defined as ‘epics’ in the tracking system JIRA. The
simple features were defined less formally within internal documents,
email communications and meeting notes.
[Figure: the CIKB graph database at the center, feeding one comparative, graphical, user-friendly interface.
Data groups: calculated data, PMI measured data, PMI internal data, publicly available data.
Source labels: chemical clustering, chemoinformatics, bioinformatics, properties calculation, in vitro toxicology,
analytical chemistry, in vivo toxicology, system toxicology, aerosol physics, aerosol chemistry, product portfolio,
flavor & sensory data, flavor properties, regulatory lists, toxicity data, chemical properties.]
To communicate efficiently between the scientific and computing
worlds, two roles were defined for this project: a Scientific Leader and an IT Leader.
The data types are highly diverse, including product names, toxicities
and calculated data, to name but a few. Developing the correct strategy
for integrating them into a single repository requires multidisciplinary
knowledge and significant effort.
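A minimal sketch of one possible integration strategy, assuming every source record can be mapped to a shared chemical identifier: records from different sources are folded into a single entry per chemical, and each value is tagged with its origin. The field names, source names and values are placeholders invented for illustration; they are not CIKB's schema or data.

```python
from collections import defaultdict


def integrate(records):
    """Merge heterogeneous records into one entry per chemical identifier.

    Each record is a dict with a "chemical_id", a "source" and arbitrary
    other fields. Values are stored per field as (source, value) pairs, so
    conflicting values from different sources remain visible.
    """
    repository = defaultdict(lambda: defaultdict(list))
    for record in records:
        chemical_id = record["chemical_id"]
        source = record["source"]
        for name, value in record.items():
            if name in ("chemical_id", "source"):
                continue
            repository[chemical_id][name].append((source, value))
    # Convert to plain dicts for easier inspection.
    return {cid: dict(fields) for cid, fields in repository.items()}


# Placeholder records from three hypothetical sources.
records = [
    {"chemical_id": "compound-A", "source": "product_portfolio", "product_name": "product-X"},
    {"chemical_id": "compound-A", "source": "calculated", "property_score": 1.5},
    {"chemical_id": "compound-A", "source": "public", "property_score": 1.7},
]

merged = integrate(records)
print(merged["compound-A"]["property_score"])
```

Keeping the source next to each value is a design choice: it defers the question of which source to trust to the scientists rather than deciding it during loading.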
The architecture is robust because each service is self-contained
and depends on other services only for security validation.
At the same time, it provides the flexibility to respond
quickly to user feedback by modifying features or adding
technologies.
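The dependency pattern described here, in which a service relies on nothing but a shared security check, might look like the following sketch. The token scheme (an HMAC over the user name with a shared secret) is an assumption chosen for brevity, not the mechanism CIKB uses, and `issue_token`, `validate_token` and `toxicity_service` are hypothetical names.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # placeholder; a real deployment would not hard-code this


def issue_token(user: str) -> str:
    """Sign the user name so that services can verify it later."""
    return hmac.new(SECRET, user.encode(), hashlib.sha256).hexdigest()


def validate_token(user: str, token: str) -> bool:
    """Shared security validation: the only cross-service dependency."""
    return hmac.compare_digest(issue_token(user), token)


def toxicity_service(user: str, token: str, chemical_id: str) -> dict:
    """A self-contained service: it calls validate_token and nothing else."""
    if not validate_token(user, token):
        raise PermissionError("invalid token")
    # Service-specific logic would go here; this returns a placeholder.
    return {"chemical_id": chemical_id, "requested_by": user}


token = issue_token("alice")
print(toxicity_service("alice", token, "compound-A"))
```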
[Figure: project workflow between the scientific domain and the IT domain.
Scientific domain: Scientific User 1 to n, Business User, Scientific Leader.
IT domain: IT Leader, IT dev. 1 to n.
Flow labels: User req.; User + System req. = Tasks; Tasks; Feedback + bugs; JIRA.]
Scientific Leader
To cover the diverse range
of scientific disciplines, and to
shield the IT developers from
the scientific complexity,
the Scientific Leader gathers
the needs and wishes of scientists
and sets the priority of each
requirement from a business
point of view.
IT Leader
To assess the complexity
of each new requirement, the IT
Leader analyzes each request and
proposes solutions. Tasks are then
created and distributed to developers
as work packages with
due dates. Finally, the Scientific and
IT Leaders regularly check that the
solutions meet business needs.

CIKB poster_2015
