By Antonio Castellón (as Blue-Infinity consultant) , February, 2014
for Philip Morris International R&D
CIKB
Software Architecture Design Proposal
Problem : Complex Data
Problem : Data Is Complex to Model
Problem : Dynamic Data ( Uncertainty )
End-user requirements and the data itself sometimes generate
different types of uncertainty
Problem : GUI - User experience
“I’m not stupid, but… this interface is too complicated!”
Problem : GUI - Adaptable + Flexible
Problem : GUI – Technology + Design
Be careful with awesome solutions that do not fit
design and engineering at the same time
“The Solution” is a mix of 4 …
“The Solution” - Is a mix of …
An Architecture
A set of Data
A cool User Interface
And a mad developer to do it
(joke)
“The Solution” – Brick 1
An Architecture
Architecture – we aim to
• Reduce complexity
• Be reusable
• Be easy to deploy
• Allow dynamic updates
• Be adaptive
• Respond fast
• Keep a low memory profile
• Provide security
• …
Architecture – The response
OSGi (Open Services Gateway initiative)
defines the standard.
Architecture – OSGi supported by
Architecture – OSGi implemented by …
. . .
Architecture – In summary, OSGi goals are …
Service Oriented +
Modular (bundles)
Bundle (x)
Service (x’)
Service (y)
Service (x)
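The "service oriented + modular" idea can be sketched outside of OSGi itself. The following is only a toy illustration in Python, not the OSGi API: `ServiceRegistry`, `Bundle`, and the `greeter` service are hypothetical names chosen for this example. It shows bundles publishing services to a shared registry when they start and withdrawing them when they stop, which is what makes dynamic updates possible.

```python
from typing import Any, Dict, List


class ServiceRegistry:
    """Toy stand-in for a framework service registry (not the OSGi API)."""

    def __init__(self) -> None:
        self._services: Dict[str, List[Any]] = {}

    def register(self, interface: str, service: Any) -> None:
        # Several bundles may publish a service under the same interface name.
        self._services.setdefault(interface, []).append(service)

    def unregister(self, interface: str, service: Any) -> None:
        self._services[interface].remove(service)

    def lookup(self, interface: str) -> List[Any]:
        # Return a copy so callers cannot mutate the registry by accident.
        return list(self._services.get(interface, []))


class Bundle:
    """A module that publishes its services on start and withdraws them on stop."""

    def __init__(self, name: str, provides: Dict[str, Any]) -> None:
        self.name = name
        self.provides = provides

    def start(self, registry: ServiceRegistry) -> None:
        for interface, service in self.provides.items():
            registry.register(interface, service)

    def stop(self, registry: ServiceRegistry) -> None:
        for interface, service in self.provides.items():
            registry.unregister(interface, service)


registry = ServiceRegistry()
bundle_x = Bundle("bundle-x", {"greeter": lambda who: f"hello, {who}"})
bundle_x.start(registry)

# Another bundle only needs the interface name, not the providing bundle.
greeter = registry.lookup("greeter")[0]
print(greeter("CIKB"))

# Stopping the bundle removes its services without restarting anything else.
bundle_x.stop(registry)
print(registry.lookup("greeter"))
```

Consumers depend on the interface name only, so a bundle can be replaced at runtime as long as the new one registers a compatible service.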
Architecture – OSGi : Simple overview
Console Logging Admin …
Web Server
WAB
Application
1
WAB
Application
2
…
Application
Service 1
Application
Service 2
…
…
OSGi Instance 1
JVM
…
Bundles to be
developed by us
Bundles to be
installed
“The Solution” – Brick 2
A set of Data
Data
NoSQL
( Not Only SQL )
Data – NoSQL – Different implementations
Data - NoSQL – Comparing data structure
Image from: http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
Data - NoSQL – Compare
98% of the business
requirements
There are still billions of
nodes and relationships
Data – Our selection
Graph
Databases
Data – Graph Databases – Why?
Flexible data structure
It doesn’t matter if the relationships change in the future.
Closer match to business logic
Data – Graph Databases – Why?
Natural query system
You say what you want, not how to get it.
with recursive cluster (party, path, depth) as (
    select cast(@userId as character varying),
           cast(@userId as character varying),
           1
    union
    (
        select (case
                    when this.party = amc.userA then amc.userB
                    when this.party = amc.userB then amc.userA
                end),
               (this.path || '.' || (case
                    when this.party = amc.userA then amc.userB
                    when this.party = amc.userB then amc.userA
                end)),
               this.depth + 1
        from cluster this, chat amc
        where ((this.party = amc.userA and position(amc.userB in this.path) = 0)
            or (this.party = amc.userB and position(amc.userA in this.path) = 0))
          and this.depth < @depth + 1
    )
)
select party, path
from cluster
where not exists (
    select *
    from cluster c2
    where cluster.party = c2.party
      and (
          char_length(cluster.path) > char_length(c2.path)
          or (char_length(cluster.path) = char_length(c2.path))
             and (cluster.path > c2.path)
      )
)
order by party, path;
SQL: several hours to execute
VS
START b = node:User(UserId='Manolo')
MATCH (b)--(friend)--(friendoffriend)
RETURN count(friendoffriend)
Cypher: 635 ms
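To make explicit what the Cypher pattern above asks for, here is a minimal sketch in plain Python over an in-memory adjacency map. The edge list and the names other than "Manolo" are invented for illustration, and the function assumes a simple undirected graph with at most one relationship between any two users. Under that assumption it counts two-hop paths that do not walk back over the same relationship, which is how a single MATCH pattern is understood to behave in Cypher; it says nothing about the timings quoted on the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (user, user) pairs."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def count_friend_of_friend_paths(adjacency: Dict[str, Set[str]], start: str) -> int:
    """Count paths start -- friend -- friend_of_friend.

    Going straight back to `start` would reuse the same relationship,
    so those steps are skipped.
    """
    count = 0
    for friend in adjacency.get(start, set()):
        for friend_of_friend in adjacency.get(friend, set()):
            if friend_of_friend != start:
                count += 1
    return count


# Hypothetical sample data.
edges = [
    ("Manolo", "Ana"),
    ("Manolo", "Luis"),
    ("Ana", "Eva"),
    ("Luis", "Eva"),
    ("Luis", "Pep"),
]
graph = build_graph(edges)
print(count_friend_of_friend_paths(graph, "Manolo"))
```

Note that the count is per path, not per distinct person: "Eva" is reached through two different friends and is therefore counted twice in this sample.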
Data - Graph Databases – Why?
Fits very well with complex data
Data - Graph Databases – Why?
Fits very well with Bio-Informatics
0.9 billion
relationships
Data – Graph Databases – Why?
Fast Prototyping and development
We don’t need to spend much time defining a fine-grained schema up front.
Data - Graph Databases – What is it?
Properties
Labels
Relationships
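The three building blocks listed above can be pictured with a small data model. This is a sketch in plain Python, not the Neo4j API; the class names, the `Compound` and `Assay` labels, the `TESTED_IN` relationship type, and all property values are hypothetical examples.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Node:
    node_id: int
    labels: Set[str] = field(default_factory=set)              # e.g. {"Compound"}
    properties: Dict[str, Any] = field(default_factory=dict)   # free-form key/value pairs


@dataclass
class Relationship:
    start: Node
    end: Node
    rel_type: str                                               # e.g. "TESTED_IN"
    properties: Dict[str, Any] = field(default_factory=dict)


def nodes_with_label(nodes: List[Node], label: str) -> List[Node]:
    """Return the nodes that carry the given label."""
    return [node for node in nodes if label in node.labels]


compound = Node(1, {"Compound"}, {"name": "sample-compound"})
assay = Node(2, {"Assay"}, {"name": "sample-assay"})
tested_in = Relationship(compound, assay, "TESTED_IN", {"year": 2014})

# A property added later needs no schema change: it is just another key.
compound.properties["status"] = "under review"

print([node.node_id for node in nodes_with_label([compound, assay], "Compound")])
```

Because properties live on individual nodes and relationships, two nodes with the same label do not have to share the same set of keys.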
Data - Graph Databases - Implemented by …
Data - Graph Databases - Compare
Name | API | Query Methods | Consistency | Staff (people) / Community
OrientDB | Java Traverser API, Blueprints, Rexster | Own SQL-like Query Language, Gremlin | ACID, MVCC | 3 / Low
Neo4j | Java, Python, JPython, Ruby, JRuby, JavaScript (Node.js), PHP, .NET, Django, Clojure, Spring, Scala, or REST (any language) | Cypher (native/preferred), Native Java APIs (special cases), Traverser API, REST, Blueprints, Gremlin | ACID | 42 / Very High
DEX | Java, C++, .NET | Native Java, C# and C++ APIs, Blueprints, Gremlin | Consistency, durability and partial isolation and atomicity | 5 / ?
Data - Graph Databases – Compare
Data - Graph Databases - Neo4j customers
Data - Graph Database - Neo4j - Partners
Data - Graph Database - Neo4j - Licenses
“The Solution” – Brick 3
A cool User Interface
GUI
+
GUI
UI Graphs
Model / View / Controller
( in the browser, using JavaScript )
JAX-RS (RESTful web services)
JSON responses
On OSGi bundle as a webservice
On Browser client
Data Driven Documents
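The hand-off between the web service and the browser client can be illustrated with the JSON payload alone. The sketch below is plain Python rather than JAX-RS, and the `nodes`/`links` layout with `id`, `source`, `target`, and `type` keys is an assumption: it is a shape commonly fed to D3 graph layouts, not something the slides specify. The function name and sample data are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple


def graph_to_json(
    nodes: Dict[str, Dict[str, Any]],
    relationships: List[Tuple[str, str, str]],
) -> str:
    """Serialize a small graph into a nodes/links JSON document.

    `nodes` maps a node id to its properties; `relationships` holds
    (source id, target id, relationship type) triples.
    """
    payload = {
        "nodes": [{"id": node_id, **props} for node_id, props in nodes.items()],
        "links": [
            {"source": source, "target": target, "type": rel_type}
            for source, target, rel_type in relationships
        ],
    }
    return json.dumps(payload, sort_keys=True)


# Hypothetical sample data.
sample_nodes = {
    "n1": {"label": "Compound"},
    "n2": {"label": "Assay"},
}
sample_relationships = [("n1", "n2", "TESTED_IN")]

print(graph_to_json(sample_nodes, sample_relationships))
```

Keeping the response to plain JSON means the browser side (AngularJS for the application logic, D3.js for the drawing) does not need to know anything about the database behind the service.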
GUI - AngularJS – What is it?
RESTful
+
JSON
GUI - D3.js – What is it?
GUI - D3.js – Rich and cool interfaces
GUI - Examples
GUI - Licenses
No payment is required to use or modify their code.
“The Solution” – The last brick
At least a mad developer to
do it (joke)
Architecture – Current draft
KARAF :: OSGi kernel platform
Shell admin
web admin
console
ServiceMix (Optional) :: Enterprise Service Bus
Groovy 2.2.1
Runtime
Jetty Server
8.1.9 Runtime
CIKB
Neo4j 2.0.0
Server
Core ( Business )
Database
connector
CVS
connector
SAW
connector
LIMS
connector
User Portal
UCSD
Connector
XML
Connector
AngularJS + D3.js
…
Admin
Portal
…
Thank you for your attention.
End

Editor's Notes

  1. Thanks for attending this presentation; I hope it meets your expectations. This is only a high-level description of the reasons for choosing the selected architecture and its tools. Do not hesitate to interrupt me if you have any questions; I will be glad to explain anything in more detail if that helps you understand the final solution better.
  2. The data is complex by definition: there are too many relationships between different nodes and different domains.
  3. Fitting the "real" world into a standard entity-relationship model is a nightmare, and it becomes a source of errors whenever something needs to change in the future (new properties, new objects, new relationships, etc.).
  4. The important thing in any design is to capture at least 99% of the user requirements correctly, but that is impossible if the users introduce uncertainty for various reasons (and also when there are different users with different domains or points of view).
  5. The user interface is the part on which all software solutions spend the most development time. It needs to be flexible and adaptable to different requirements and uses, or at least the technology used should provide the easiest way to create a good user experience. INTUITIVE
  6. It needs to be flexible and adaptable to different requirements, depending on the platforms on which it will be used.
  7. Selecting the correct technology is also key to a successful project. Not everything depends on the front end, and not everything depends on the back end.
  8. This is our solution. We know it is possible to do it with different approaches... All roads lead to Rome, but some are easier than others.
  9. A good solution is never easy to build... but if it is simple, it is much better.
  10. A good solution is never easy to build... but if it is simple, it is much better.
  11. Some of them are oriented toward web server applications, while others are more service oriented.
  12. Each module/bundle is a service that publishes some functionality to the others through the OSGi framework in which they live.
  13. A good solution is never easy to build... but if it is simple, it is much better.
  14. It is a complement. This technology appeared several years ago... but in recent years it has been imposed by the requirements for scalability, clustering and performance.
  15. Unlike with RDBMSs, the implementations sometimes differ from one solution to another, because each is based on a different paradigm and focuses on a different way of organizing data.
  16. It is a complement. This technology appeared several years ago... but in recent years it has been imposed by the requirements for scalability, clustering and performance.
  17. - The data matches the mindset of the domain experts (e.g. lab staff), not that of the IT experts. Good reference: http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
  18. http://www.slideshare.net/ayeeson/0221-cypher-for-sql-professionals
  19. Cypher will probably become the standard for graph databases... ACID: the standard for data consistency.
  20. A good solution is never easy to build... but if it is simple, it is much better.
  21. A good solution is never easy to build... but if it is simple, it is much better.