Architecture of Wemlin Hub

22 December 2013 – Ognen Ivanovski & Goran Cvetkoski
What’s Wemlin?

- Wemlin provides access to public transport information – easy, fast and independent of time and place
- Available on iOS, Android, Windows Phone and Web

Netcetera | 2
Wemlin Hub in a nutshell

Wemlin Hub – non-functional requirements
- Wemlin Hub shall be a high-performance, parallelized message processing system with:
  - low latency – good processing speed
  - high throughput – number of messages we can process
  - availability – zero downtime!
  - good disaster recovery
  - scalability (horizontal, vertical)
  - modularity – component based
  - extensibility – 80% usage pattern
  - flexibility – adapt to any infrastructure with minimal effort

What is Software Architecture?

?
Is this Software Architecture?

Is this Software Architecture?

Software Architecture is…

The decisions about software that are hard to change, e.g.:
- use of Joda-Time vs. java.util.Date
- what kind of database you will use
- GWT vs. AngularJS
Encapsulation

Wemlin Hub Architecture

Model
- Pure Java (no dependencies)
- Well-defined, extensible classes
- Immutable (like Joda-Time, every modification produces a new object)
- Algebraic
- Inverse references

Pipeline
- Compositional (all components are wired together using a fixed set of well-defined interfaces):
  - Filter (stateless, function)
  - Transformer (stateless, function)
  - Aggregator (stateful, function)
  - Sink (consumer)
  - Tap (producer)

Pure Model
- no external dependencies
- design not influenced by any technology, e.g. Hibernate, RDBMS, MVC

Algebraic, Immutable Model
- algebraic: each object's identity is defined by its contents.
- immutable: each object, once created, cannot be modified. Each part of the pipeline must copy and modify an object to perform its processing. This makes it easier to reason about concurrency.
- metadata: classes support arbitrary metadata expressed as key-value pairs. Metadata does not take part in the definition of the object's identity. This allows format-specific information to be encoded in the model and used later in the pipeline.
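A minimal sketch of an algebraic, immutable model class with metadata, assuming a simplified, hypothetical `Stop` type (not the actual Wemlin Hub model): identity comes from `referenceId` and `name` only, every modification returns a new object, and metadata is carried along but ignored by `equals`/`hashCode`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified model class; not the actual Wemlin Hub Station.
final class Stop {
    private final String referenceId;
    private final String name;
    // Arbitrary key-value metadata, deliberately excluded from identity.
    private final Map<String, String> metadata;

    Stop(String referenceId, String name, Map<String, String> metadata) {
        this.referenceId = Objects.requireNonNull(referenceId);
        this.name = Objects.requireNonNull(name);
        // Defensive, unmodifiable copy: the object cannot change after construction.
        this.metadata = Collections.unmodifiableMap(new HashMap<>(metadata));
    }

    String referenceId() { return referenceId; }
    String name() { return name; }
    Map<String, String> metadata() { return metadata; }

    // Copy-and-modify: every "change" produces a new object.
    Stop withName(String newName) {
        return new Stop(referenceId, newName, metadata);
    }

    Stop withMetadata(String key, String value) {
        Map<String, String> copy = new HashMap<>(metadata);
        copy.put(key, value);
        return new Stop(referenceId, name, copy);
    }

    // Algebraic identity: defined by contents only; metadata is ignored.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Stop)) return false;
        Stop other = (Stop) o;
        return referenceId.equals(other.referenceId) && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(referenceId, name);
    }
}
```

Because `equals` ignores metadata, two stops that differ only in format-specific annotations compare equal, which is what allows such objects to be shared or deduplicated downstream.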

Compositional
pipeElement = PipelineBuilder.from(gtfsInputJunction())
        .transform(new StoppingPlaceResolver())
        .filter(new InvalidStopsFilter())
        .transform(new UnresolvedLineVehicleTypeAdder())
        .transform(new UnresolvedLineNameAdjuster())
        .transform(new LineResolver())
        .transform(new LineColorsEnricher(…))
        .aggregate(new CacheAggregator(cache()))
        .to(nullSink());
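To illustrate how such a fluent chain can compose stages, here is a minimal, hypothetical builder (`MiniPipeline`, not the actual `PipelineBuilder`) built on `java.util.function` types. It assumes each stage turns one message into zero or one messages, and that the tap is simply whoever calls the returned `Consumer`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical fluent builder; not the actual Wemlin Hub PipelineBuilder.
final class MiniPipeline {
    // Each stage maps one message to zero or one output messages.
    private final List<Function<Object, Optional<Object>>> stages = new ArrayList<>();

    static MiniPipeline create() {
        return new MiniPipeline();
    }

    // Filter: keeps the message only if the predicate accepts it.
    MiniPipeline filter(Predicate<Object> filter) {
        stages.add(msg -> filter.test(msg) ? Optional.of(msg) : Optional.empty());
        return this;
    }

    // Transformer: always produces exactly one (new) message.
    MiniPipeline transform(Function<Object, Object> transformer) {
        stages.add(msg -> Optional.of(transformer.apply(msg)));
        return this;
    }

    // Aggregator: may swallow the message (empty) or emit something.
    MiniPipeline aggregate(Function<Object, Optional<?>> aggregator) {
        stages.add(msg -> aggregator.apply(msg).map(out -> (Object) out));
        return this;
    }

    // Terminates the chain with a sink; the returned Consumer is the entry point a tap would call.
    Consumer<Object> to(Consumer<Object> sink) {
        List<Function<Object, Optional<Object>>> frozen = new ArrayList<>(stages);
        return message -> {
            Optional<Object> current = Optional.of(message);
            for (Function<Object, Optional<Object>> stage : frozen) {
                current = current.flatMap(stage); // an empty Optional skips all later stages
            }
            current.ifPresent(sink);
        };
    }
}
```

A filter drops a message by returning empty; an aggregator can do the same while it accumulates state, so the chain short-circuits as soon as any stage produces nothing.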

Pipeline: Stateless, Functions

boolean accept(Object obj);
Object transform(Object original);

Optional<?> aggregate(Object obj);

filter and transform are referentially transparent: the same input always yields the same output, without side effects
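The three signatures can be read as single-method interfaces. The sketch below wraps them in interfaces named after the pipeline components (`Filter`, `Transformer`, `Aggregator`); the example implementations, the `key=value` message format, and the reading of `Optional.empty()` as "forward nothing downstream" are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// The slide's signatures as interfaces; everything below them is hypothetical.
interface Filter { boolean accept(Object obj); }
interface Transformer { Object transform(Object original); }
interface Aggregator { Optional<?> aggregate(Object obj); }

// Referentially transparent: same input, same answer, no side effects.
final class NonBlankFilter implements Filter {
    @Override
    public boolean accept(Object obj) {
        return obj instanceof String && !((String) obj).trim().isEmpty();
    }
}

// Referentially transparent: returns a new value, never mutates the input.
final class TrimTransformer implements Transformer {
    @Override
    public Object transform(Object original) {
        return ((String) original).trim();
    }
}

// Stateful: remembers the last value per key and forwards a message only when
// that value changed. Assumes every message has the form "key=value".
final class ChangeOnlyAggregator implements Aggregator {
    private final Map<String, String> lastSeen = new HashMap<>();

    // synchronized because the pipeline may call it from several threads
    @Override
    public synchronized Optional<?> aggregate(Object obj) {
        String[] parts = ((String) obj).split("=", 2);
        String previous = lastSeen.put(parts[0], parts[1]);
        if (parts[1].equals(previous)) {
            return Optional.empty(); // unchanged: forward nothing
        }
        return Optional.of(obj);
    }
}
```

The split mirrors the slide: filters and transformers can run anywhere in parallel because they hold no state, while aggregators own state and therefore need their own synchronization.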

Pipeline: Implementation
Pipeline: Typical Hub
Memoization
Immutable model: each object, once created, cannot be modified.
Problem: high memory consumption, because many objects with the same contents are created
Solution: memoization
When a factory method is executed, for example:
Station.get("1", "St. Gallen, Bahnhof");

a global cache of objects is searched for an object with the specified data. If the object exists, it is returned; if not, a new object is created and stored in the cache for future use.
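The lookup described above can be sketched in plain Java with a `ConcurrentHashMap` keyed by the factory arguments. This uses a simplified, hypothetical two-field `Station`; the slides route the same logic through an AspectJ aspect around the designated factory methods rather than inlining it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified Station with an inline memoizing factory method.
final class Station {
    // Global cache: factory arguments -> the single shared instance.
    private static final ConcurrentMap<List<Object>, Station> CACHE = new ConcurrentHashMap<>();

    private final String referenceId;
    private final String name;

    // Private constructor: instances can only be obtained via get(...).
    private Station(String referenceId, String name) {
        this.referenceId = referenceId;
        this.name = name;
    }

    static Station get(String referenceId, String name) {
        List<Object> key = Arrays.asList(referenceId, name);
        // computeIfAbsent is atomic on ConcurrentHashMap, so concurrent callers
        // with equal arguments end up sharing one instance.
        return CACHE.computeIfAbsent(key, k -> new Station(referenceId, name));
    }

    String referenceId() { return referenceId; }
    String name() { return name; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Station)) return false;
        Station other = (Station) o;
        return referenceId.equals(other.referenceId) && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(referenceId, name);
    }

    static int cachedInstances() {
        return CACHE.size();
    }
}
```

This cache never evicts entries; since the hub's data is daily, a real implementation would presumably need to clear it periodically, which the sketch leaves out.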
Memoization (2)
Implementation:
Constructors are made private and replaced by annotated factory methods:

@DesignatedFactoryMethod
public static Station get(Map<String, ? extends Serializable> attributes,
                          String referenceId,
                          String name,
                          String localName,
                          String place,
                          GeoPoint location) {
    return new Station(attributes, referenceId, name, localName, place, location);
}

Memoization (3)
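@Pointcut("execution(@com.wemlin.hub.memoization.annotations.DesignatedFactoryMethod * *(..))")
public void designatedFactoryMethodPointcut() {
}

@Around("designatedFactoryMethodPointcut()")
public Object handleMemoize(final ProceedingJoinPoint pjp) throws Throwable {
    // search for the factory parameters in cache (sketch; assumed field: ConcurrentMap<List<Object>, Object> cache)
    List<Object> key = new ArrayList<>(Arrays.asList(pjp.getArgs()));
    key.add(0, pjp.getSignature().toLongString());
    Object cached = cache.get(key);
    if (cached != null) {
        return cached;
    }
    Object created = pjp.proceed();
    Object previous = cache.putIfAbsent(key, created);
    return previous != null ? previous : created;
}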

Impact: 500 to 1000 times fewer objects created (depending on how much data is processed)
Modularization – component architectural constraints
- The following architectural constraints define a module in Wemlin Hub:
  - it is a Maven module, packaged as a jar or web fragment
  - it must not use Spring annotations for injection, i.e. all injection is done via constructors
  - a module provides only a few components (up to 4) and facades for the 80% usage pattern
  - each component may hook into the Wemlin pipeline only through the predefined pipeline interfaces: Filter, Transformer, Aggregator, input component and output component
  - components are Spring-independent as far as possible. They may implement some Spring interfaces, but as few as possible, and must provide a means to achieve the same functionality without Spring.
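A minimal sketch of a component that follows these constraints, assuming a hypothetical `CacheAggregator` (named after the one in the pipeline example, but its fields and behavior here are invented): no Spring annotations, and all collaborators arrive through the constructor, so it can be created with a plain `new` or from a Spring configuration alike.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical component: no Spring annotations, constructor injection only.
final class CacheAggregator {
    private final Map<String, Object> cache;
    private final Function<Object, String> keyExtractor;

    // Every collaborator is passed in explicitly; no framework looks anything up.
    CacheAggregator(Map<String, Object> cache, Function<Object, String> keyExtractor) {
        this.cache = Objects.requireNonNull(cache, "cache");
        this.keyExtractor = Objects.requireNonNull(keyExtractor, "keyExtractor");
    }

    // Stores the latest message per key and returns empty, i.e. (by assumption)
    // nothing is forwarded downstream.
    Optional<?> aggregate(Object message) {
        cache.put(keyExtractor.apply(message), message);
        return Optional.empty();
    }
}
```

Wiring by hand is then just `new CacheAggregator(new ConcurrentHashMap<>(), Object::toString)`; a Spring `@Bean` method could make exactly the same call without the class depending on Spring.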
Station resolver module (1)
- The Wemlin Hub station resolver module resolves stations with the help of the reference station list. The reference station list contains all Swiss (CH) stations listed in “Stationsnamen Fahrplan und Antragsformular für Mutationen” (timetable station names and application form for changes):
  http://www.bav.admin.ch/dokumentation/publikationen/00475/01497/index.html
- We use the reference station list primarily to match references to stations in incoming data (HAFAS, VDV, GTFS) to known stations, for which we fully control the names and have the coordinates and other metadata associated with them.
- The list is a well-defined JSON file that lists:
  - attributes of the stations (full name, local name, place, coordinates, agency, etc.)
  - their respective referenceId
  - a set of rules that may be used to match the station in incoming data
  - connection areas

Station Resolver (2)
- Every station is resolved by a set of rules:
  - general rules
  - station-specific rules
- General rules:
  - station id (optimal case)
  - station similarity – we use Apache Lucene to search for name similarity, combined with a tolerance on the distance between coordinates
- Station-specific rules:
  - matchByRegex – used when the station has a different id from ours and the name we receive is also slightly different
    Example: Bahnhof, Esslingen: Esslingen Bhf, Esslingen Bhf., Esslingen, etc.
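The rule chain can be sketched as follows. The rule order (id first, then station-specific `matchByRegex`, then similarity), the `ReferenceStation` and `StationResolver` types, and the tolerance value are assumptions. The slides use Apache Lucene for name similarity; this sketch swaps in a much cruder stand-in (case-insensitive equality after stripping punctuation) combined with a haversine distance check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical entry of the reference station list.
final class ReferenceStation {
    final String referenceId;
    final String name;
    final double lat;
    final double lon;
    final List<Pattern> matchByRegex; // station-specific rules

    ReferenceStation(String referenceId, String name, double lat, double lon, List<Pattern> matchByRegex) {
        this.referenceId = referenceId;
        this.name = name;
        this.lat = lat;
        this.lon = lon;
        this.matchByRegex = new ArrayList<>(matchByRegex);
    }
}

// Hypothetical resolver; rule order and similarity measure are assumptions.
final class StationResolver {
    private static final double EARTH_RADIUS_M = 6_371_000.0;
    private final List<ReferenceStation> reference;
    private final double toleranceMeters;

    StationResolver(List<ReferenceStation> reference, double toleranceMeters) {
        this.reference = new ArrayList<>(reference);
        this.toleranceMeters = toleranceMeters;
    }

    Optional<ReferenceStation> resolve(String id, String name, double lat, double lon) {
        // General rule 1: exact station id (the optimal case).
        for (ReferenceStation s : reference) {
            if (s.referenceId.equals(id)) return Optional.of(s);
        }
        // Station-specific rule: matchByRegex on the incoming name.
        for (ReferenceStation s : reference) {
            for (Pattern p : s.matchByRegex) {
                if (p.matcher(name).matches()) return Optional.of(s);
            }
        }
        // General rule 2: name similarity within a distance tolerance.
        // Crude stand-in for Lucene: equality after lower-casing and stripping punctuation.
        String wanted = normalize(name);
        for (ReferenceStation s : reference) {
            if (normalize(s.name).equals(wanted)
                    && distanceMeters(lat, lon, s.lat, s.lon) <= toleranceMeters) {
                return Optional.of(s);
            }
        }
        return Optional.empty();
    }

    private static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]+", " ").trim();
    }

    // Haversine great-circle distance in metres.
    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }
}
```

With a station-specific rule such as `Esslingen( Bhf\.?)?`, all three variants from the slide resolve to the same reference station, while the distance tolerance keeps an identically named stop elsewhere in the country from matching.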
Cache
- we don’t use a DB:
  - all data is time-bounded, i.e. all data we keep is temporary (daily)
- our choice was an in-memory cache:
  - a very simple cache built on Java maps
  - no third-party cache libraries are involved
- the cache is easy to browse via the cache browser component:
  - a few implementations are available:
    - standard cache browser
    - forwarding cache browser
    - lazy loading cache browser
  - all cache browsers can define filters
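A minimal sketch of a day-scoped, map-based cache with filterable browsers, assuming hypothetical `DailyCache`, `StandardCacheBrowser` and `ForwardingCacheBrowser` types. The slides name the browser variants but not their interfaces, so the `browse(Predicate)` signature is invented here, and the lazy loading variant is omitted.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical day-scoped cache: plain Java maps, no third-party library.
final class DailyCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private LocalDate day;

    DailyCache(LocalDate day) { this.day = day; }

    void put(K key, V value) { entries.put(key, value); }
    V get(K key) { return entries.get(key); }
    Iterable<V> values() { return entries.values(); }

    // Data is only valid for one day: drop everything when the day changes.
    synchronized void rollOver(LocalDate today) {
        if (!today.equals(day)) {
            entries.clear();
            day = today;
        }
    }
}

interface CacheBrowser<V> {
    List<V> browse(Predicate<? super V> filter);
}

// Standard browser: scans the cache and applies the caller's filter.
final class StandardCacheBrowser<V> implements CacheBrowser<V> {
    private final DailyCache<?, V> cache;

    StandardCacheBrowser(DailyCache<?, V> cache) { this.cache = cache; }

    @Override
    public List<V> browse(Predicate<? super V> filter) {
        List<V> result = new ArrayList<>();
        for (V value : cache.values()) {
            if (filter.test(value)) result.add(value);
        }
        return result;
    }
}

// Forwarding browser: combines its own filter with the caller's, then delegates.
final class ForwardingCacheBrowser<V> implements CacheBrowser<V> {
    private final CacheBrowser<V> delegate;
    private final Predicate<? super V> ownFilter;

    ForwardingCacheBrowser(CacheBrowser<V> delegate, Predicate<? super V> ownFilter) {
        this.delegate = delegate;
        this.ownFilter = ownFilter;
    }

    @Override
    public List<V> browse(Predicate<? super V> filter) {
        return delegate.browse(v -> ownFilter.test(v) && filter.test(v));
    }
}
```

Because all data is daily, a single `rollOver` call at the day boundary replaces any expiry policy a third-party cache library would otherwise provide.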
REST
- all data that is in the cache is available via the REST API
- two versions of the API are available:
  - legacy API (V0): Wemlin clients still operate with this one
  - V2 API according to the new transport.opendata.ch specification – some customers have started using it
- both APIs support largely the same things:
  - location listings – cities, stations
  - line listings – all lines that operate within a network
  - trips – for a given period of time
  - departures – with real-time prognosis

Microkernel architectural style

Illustration

Wemlin Hub - Blue-Green deployment
- Requirement: ensure zero downtime of the system.
- Decision: blue-green deployment
  - two identical production environments
  - the reverse proxy in front of the machines routes to one of them, depending on which one is configured as active
  - test on the “idle” server before go-live
  - switch the proxy to the tested instance – the other one is now “idle”
- Advantages:
  - zero downtime of the system
  - easy rollback if anything goes wrong
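The switching decision the reverse proxy makes can be modelled in a few lines. This is a hypothetical `BlueGreenRouter`, not the actual proxy configuration: the active environment lives in an `AtomicReference`, go-live only happens if a caller-supplied smoke test on the idle environment passes, and rollback is simply another switch.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

// Hypothetical model of the reverse proxy's routing decision.
enum Environment { BLUE, GREEN }

final class BlueGreenRouter {
    private final AtomicReference<Environment> active;

    BlueGreenRouter(Environment initiallyActive) {
        this.active = new AtomicReference<>(initiallyActive);
    }

    Environment active() { return active.get(); }

    Environment idle() {
        return active.get() == Environment.BLUE ? Environment.GREEN : Environment.BLUE;
    }

    // Atomically swaps active and idle; returns the environment that just became idle.
    // Calling it again is the rollback.
    Environment switchOver() {
        while (true) {
            Environment current = active.get();
            Environment next = current == Environment.BLUE ? Environment.GREEN : Environment.BLUE;
            if (active.compareAndSet(current, next)) return current;
        }
    }

    // Go-live: switch only if the test against the idle environment passes.
    boolean goLive(BooleanSupplier smokeTestOnIdle) {
        if (!smokeTestOnIdle.getAsBoolean()) {
            return false; // keep serving from the currently active environment
        }
        switchOver();
        return true;
    }
}
```

Since requests always go to whichever environment is active at that moment, a failed test leaves traffic untouched, and a problem found after go-live is undone by one more `switchOver()`.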
Wemlin Hub today
- currently 4 customers (all in Switzerland)
- they cover nearly 40% of public transport in Switzerland (including Liechtenstein)
- 17 agencies, 14 with real-time data
- daily processing load:
  - around 33’000 trips, 571’000 stops
  - 2’500 projections per second at peak hours (current load, not the actual capacity of the system)
- offline transport data conversion – contract with Google for Switzerland:
  - we convert the Swiss yearly transport schedule (over 400 agencies, ~1 GB of data) to GTFS (Google Transit Feed Specification) format for use in Google Maps
  - conversion takes ~20 min

Contact

Goran Cvetkoski, senior software engineer, Netcetera
goran.cvetkoski@netcetera.com

Ognen Ivanovski, chief architect, Netcetera
ognen.ivanovski@netcetera.com

Q&A?

Netcetera | 28

More Related Content

What's hot

Load balancing In cloud - In a semi distributed system
Load balancing In cloud - In a semi distributed systemLoad balancing In cloud - In a semi distributed system
Load balancing In cloud - In a semi distributed systemAchal Gupta
 
Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...
Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...
Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...Continuent
 
IEEE 2014 JAVA NETWORKING PROJECTS Cost effective resource allocation of over...
IEEE 2014 JAVA NETWORKING PROJECTS Cost effective resource allocation of over...IEEE 2014 JAVA NETWORKING PROJECTS Cost effective resource allocation of over...
IEEE 2014 JAVA NETWORKING PROJECTS Cost effective resource allocation of over...IEEEGLOBALSOFTSTUDENTPROJECTS
 
Probabilistic consolidation of virtual machines in self organizing cloud data...
Probabilistic consolidation of virtual machines in self organizing cloud data...Probabilistic consolidation of virtual machines in self organizing cloud data...
Probabilistic consolidation of virtual machines in self organizing cloud data...WMLab,NCU
 
High Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of ViewHigh Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of Viewaragozin
 
Project Deimos
Project DeimosProject Deimos
Project DeimosSimon Suo
 
Embedded Mirror Maker
Embedded Mirror MakerEmbedded Mirror Maker
Embedded Mirror MakerSimon Suo
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresCloudLightning
 
Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016Bhupesh Chawda
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache ApexApache Apex
 
Multiprocessing -Interprocessing communication and process sunchronization,se...
Multiprocessing -Interprocessing communication and process sunchronization,se...Multiprocessing -Interprocessing communication and process sunchronization,se...
Multiprocessing -Interprocessing communication and process sunchronization,se...Neena R Krishna
 
Parallel Computing - Lec 5
Parallel Computing - Lec 5Parallel Computing - Lec 5
Parallel Computing - Lec 5Shah Zaib
 
New Multi-Hop Clustering Algorithm for Vehicular Ad Hoc Networks
New Multi-Hop Clustering Algorithm for Vehicular Ad Hoc NetworksNew Multi-Hop Clustering Algorithm for Vehicular Ad Hoc Networks
New Multi-Hop Clustering Algorithm for Vehicular Ad Hoc NetworksJAYAPRAKASH JPINFOTECH
 

What's hot (15)

Load balancing In cloud - In a semi distributed system
Load balancing In cloud - In a semi distributed systemLoad balancing In cloud - In a semi distributed system
Load balancing In cloud - In a semi distributed system
 
Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...
Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...
Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...
 
IEEE 2014 JAVA NETWORKING PROJECTS Cost effective resource allocation of over...
IEEE 2014 JAVA NETWORKING PROJECTS Cost effective resource allocation of over...IEEE 2014 JAVA NETWORKING PROJECTS Cost effective resource allocation of over...
IEEE 2014 JAVA NETWORKING PROJECTS Cost effective resource allocation of over...
 
Chap6 slides
Chap6 slidesChap6 slides
Chap6 slides
 
Probabilistic consolidation of virtual machines in self organizing cloud data...
Probabilistic consolidation of virtual machines in self organizing cloud data...Probabilistic consolidation of virtual machines in self organizing cloud data...
Probabilistic consolidation of virtual machines in self organizing cloud data...
 
High Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of ViewHigh Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of View
 
Project Deimos
Project DeimosProject Deimos
Project Deimos
 
Embedded Mirror Maker
Embedded Mirror MakerEmbedded Mirror Maker
Embedded Mirror Maker
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud Infrastructures
 
Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache Apex
 
Multiprocessing -Interprocessing communication and process sunchronization,se...
Multiprocessing -Interprocessing communication and process sunchronization,se...Multiprocessing -Interprocessing communication and process sunchronization,se...
Multiprocessing -Interprocessing communication and process sunchronization,se...
 
Parallel Computing - Lec 5
Parallel Computing - Lec 5Parallel Computing - Lec 5
Parallel Computing - Lec 5
 
SparkNet presentation
SparkNet presentationSparkNet presentation
SparkNet presentation
 
New Multi-Hop Clustering Algorithm for Vehicular Ad Hoc Networks
New Multi-Hop Clustering Algorithm for Vehicular Ad Hoc NetworksNew Multi-Hop Clustering Algorithm for Vehicular Ad Hoc Networks
New Multi-Hop Clustering Algorithm for Vehicular Ad Hoc Networks
 

Similar to Architecture of Wemlin Hub

PLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aq
PLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aqPLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aq
PLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aqPROIDEA
 
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...netvis
 
NoC simulators presentation
NoC simulators presentationNoC simulators presentation
NoC simulators presentationHossam Hassan
 
Monalytics - Online Monitoring and Analytics for Large Scale Data Centers
Monalytics - Online Monitoring and Analytics for Large Scale Data CentersMonalytics - Online Monitoring and Analytics for Large Scale Data Centers
Monalytics - Online Monitoring and Analytics for Large Scale Data CentersMahendra Kutare
 
Chap 2 classification of parralel architecture and introduction to parllel p...
Chap 2  classification of parralel architecture and introduction to parllel p...Chap 2  classification of parralel architecture and introduction to parllel p...
Chap 2 classification of parralel architecture and introduction to parllel p...Malobe Lottin Cyrille Marcel
 
VET4SBO Level 3 module 1 - unit 2 - 0.009 en
VET4SBO Level 3   module 1 - unit 2 - 0.009 enVET4SBO Level 3   module 1 - unit 2 - 0.009 en
VET4SBO Level 3 module 1 - unit 2 - 0.009 enKarel Van Isacker
 
Creating a Centralized Consumer Profile Management Service with WebSphere Dat...
Creating a Centralized Consumer Profile Management Service with WebSphere Dat...Creating a Centralized Consumer Profile Management Service with WebSphere Dat...
Creating a Centralized Consumer Profile Management Service with WebSphere Dat...Prolifics
 
System models for distributed and cloud computing
System models for distributed and cloud computingSystem models for distributed and cloud computing
System models for distributed and cloud computingpurplesea
 
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...ijceronline
 
Crowd management system
Crowd management systemCrowd management system
Crowd management systemMumbaikar Le
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale SupercomputerSagar Dolas
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesAlexander Penev
 
distributed-systemsfghjjjijoijioj-chap3.pptx
distributed-systemsfghjjjijoijioj-chap3.pptxdistributed-systemsfghjjjijoijioj-chap3.pptx
distributed-systemsfghjjjijoijioj-chap3.pptxlencho3d
 
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014SAMeh Zaghloul
 
ABC Present-Service-Mesh.pptx
ABC Present-Service-Mesh.pptxABC Present-Service-Mesh.pptx
ABC Present-Service-Mesh.pptxBrodyMitchum
 

Similar to Architecture of Wemlin Hub (20)

PLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aq
PLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aqPLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aq
PLNOG 8: Peter Ashwood-Smith - Shortest Path Bridging IEEE 802.1aq
 
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
 
NoC simulators presentation
NoC simulators presentationNoC simulators presentation
NoC simulators presentation
 
Monalytics - Online Monitoring and Analytics for Large Scale Data Centers
Monalytics - Online Monitoring and Analytics for Large Scale Data CentersMonalytics - Online Monitoring and Analytics for Large Scale Data Centers
Monalytics - Online Monitoring and Analytics for Large Scale Data Centers
 
Effective VM Scheduling Strategy for Heterogeneous Cloud Environment
Effective VM Scheduling Strategy for Heterogeneous Cloud EnvironmentEffective VM Scheduling Strategy for Heterogeneous Cloud Environment
Effective VM Scheduling Strategy for Heterogeneous Cloud Environment
 
Chap 2 classification of parralel architecture and introduction to parllel p...
Chap 2  classification of parralel architecture and introduction to parllel p...Chap 2  classification of parralel architecture and introduction to parllel p...
Chap 2 classification of parralel architecture and introduction to parllel p...
 
A distributed virtual architecture for data centers
A distributed virtual architecture for data centersA distributed virtual architecture for data centers
A distributed virtual architecture for data centers
 
VET4SBO Level 3 module 1 - unit 2 - 0.009 en
VET4SBO Level 3   module 1 - unit 2 - 0.009 enVET4SBO Level 3   module 1 - unit 2 - 0.009 en
VET4SBO Level 3 module 1 - unit 2 - 0.009 en
 
10 sdn-vir-6up
10 sdn-vir-6up10 sdn-vir-6up
10 sdn-vir-6up
 
Creating a Centralized Consumer Profile Management Service with WebSphere Dat...
Creating a Centralized Consumer Profile Management Service with WebSphere Dat...Creating a Centralized Consumer Profile Management Service with WebSphere Dat...
Creating a Centralized Consumer Profile Management Service with WebSphere Dat...
 
System models for distributed and cloud computing
System models for distributed and cloud computingSystem models for distributed and cloud computing
System models for distributed and cloud computing
 
Решения NFV в контексте операторов связи
Решения NFV в контексте операторов связиРешения NFV в контексте операторов связи
Решения NFV в контексте операторов связи
 
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
 
Crowd management system
Crowd management systemCrowd management system
Crowd management system
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
 
distributed-systemsfghjjjijoijioj-chap3.pptx
distributed-systemsfghjjjijoijioj-chap3.pptxdistributed-systemsfghjjjijoijioj-chap3.pptx
distributed-systemsfghjjjijoijioj-chap3.pptx
 
Csc concepts
Csc conceptsCsc concepts
Csc concepts
 
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
 
ABC Present-Service-Mesh.pptx
ABC Present-Service-Mesh.pptxABC Present-Service-Mesh.pptx
ABC Present-Service-Mesh.pptx
 

Recently uploaded

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Recently uploaded (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024

Architecture of Wemlin Hub

  • 1. Architecture of Wemlin Hub 22 December 2013 – Ognen Ivanovski & Goran Cvetkoski
  • 2. What’s Wemlin? Wemlin provides access to public transport information – easy, fast and independent of time and place. Available on iOS, Android, Windows Phone and Web.
  • 3. Wemlin Hub in a nutshell
  • 4. Wemlin Hub – non-functional requirements
      Wemlin Hub shall be a high-performance, parallelized message processing system with:
      - low latency: good processing speed
      - throughput: number of messages we can process
      - availability: zero downtime!
      - good disaster recovery
      - scalability (horizontal, vertical)
      - modularity: component based
      - extensibility: covers the 80% usage pattern
      - flexibility: adapts to any infrastructure with minimal effort
  • 5. What is Software Architecture?
  • 6. Is this Software Architecture?
  • 7. Is this Software Architecture?
  • 8. Software Architecture is…
      The decisions about software that are hard to change, e.g.:
      - use of Joda-Time vs. java.util.Date
      - what kind of database you will use
      - GWT vs. AngularJS
      Encapsulation
  • 9. Wemlin Hub Architecture
      Model
      - Pure Java (no dependencies)
      - Well-defined, extensible classes
      - Immutable (like Joda-Time, every modification produces a new object)
      - Algebraic
      - Inverse references
      Pipeline
      - Compositional: all components are wired together using a fixed set of well-defined interfaces
          - Filter (stateless, function)
          - Transformer (stateless, function)
          - Aggregator (stateful, function)
          - Sink (consumer)
          - Tap (producer)
  • 10. Pure Model
      - no external dependencies
      - design not influenced by any technology, e.g. Hibernate, RDBMS, MVC
  • 11. Algebraic, Immutable Model
      - algebraic: each object's identity is defined by its contents.
      - immutable: each object, once created, cannot be modified. Each part of the pipeline must copy-and-modify an object to perform its processing. This characteristic makes it easier to reason about concurrency.
      - metadata: classes support arbitrary metadata expressed as key-value pairs. Metadata does not take part in the definition of the object's identity. This allows format-specific information to be encoded in the model and used in the pipeline.
  • 12. Compositional
      pipeElement = PipelineBuilder.from(gtfsInputJunction())
          .transform(new StoppingPlaceResolver())
          .filter(new InvalidStopsFilter())
          .transform(new UnresolvedLineVehicleTypeAdder())
          .transform(new UnresolvedLineNameAdjuster())
          .transform(new LineResolver())
          .transform(new LineColorsEnricher(…))
          .aggregate(new CacheAggregator(cache()))
          .to(nullSink());
  • 13. Pipeline: Stateless, Functions
      boolean accept(Object obj);
      Object transform(Object original);
      Optional<?> aggregate(Object obj);
      filter and transform are referentially transparent
  • 16. Memoization
      Immutable model: each object, once created, cannot be modified.
      Problem: big memory consumption; a lot of objects are created with the same contents.
      Solution: memoization. When a factory method is executed, for example
          Station.get("1", "St. Gallen, Bahnhof");
      a global cache of objects is searched for an object with the specified data. If such an object exists, it is returned; if not, a new object is created and stored in the cache for future use.
  • 17. Memoization (2)
      Implementation: constructors are made private and replaced by annotated factory methods:
          @DesignatedFactoryMethod
          public static Station get(Map<String, ? extends Serializable> attributes, String referenceId,
                  String name, String localName, String place, GeoPoint location) {
              return new Station(attributes, referenceId, name, localName, place, location);
          }
  • 18. Memoization (3)
          @Pointcut("execution(@com.wemlin.hub.memoization.annotations.DesignatedFactoryMethod * *(..))")
          public void designatedFactoryMethodPointcut() { }

          @Around("designatedFactoryMethodPointcut()")
          public Object handleMemoize(final ProceedingJoinPoint pjp) throws Throwable {
              // search for an object with the specified factory parameters in the cache
          }
      Impact: 500 to 1000 times fewer objects created (depending on how much data is processed)
  • 19. Modularization – components architectural constraints
      The following architectural constraints define a module in Wemlin Hub:
      - it is a Maven module, a jar or a web fragment
      - it is not allowed to use Spring annotations for injection, i.e. all injection is done via constructors
      - a module provides only a few components (up to 4) and facades for the 80% usage pattern
      - each component may hook into the Wemlin pipeline only through the predefined pipeline interfaces: Filter, Transformer, Aggregator, input component, output component
      - components are Spring-independent as far as possible. They may implement some Spring interfaces, but as few as possible, and provide means to achieve the same functionality without Spring.
  • 20. Station resolver module (1)
      - The Wemlin Hub station resolver module resolves stations with the help of the reference station list. The reference station list contains all CH stations listed in the “Stationsnamen Fahrplan und Antragsformular für Mutationen” http://www.bav.admin.ch/dokumentation/publikationen/00475/01497/index.html
      - We use the reference station list primarily to match references to stations in incoming data (HAFAS, VDV, GTFS) to known stations, for which we fully control the names and have the coordinates and other metadata that can be associated with them.
      - The list is a well-defined JSON file that lists:
          - attributes of the stations (full name, local name, place, coordinates, agency etc.)
          - their respective referenceId
          - a set of rules that may be used to match the station in incoming data
          - connection areas
  • 21. Station Resolver (2)
      Every station is resolved by a set of rules: general rules and station-specific rules.
      - General rules
          - station id (optimal)
          - station similarity: we use Apache Lucene to search for name similarity, in combination with a tolerance on the distance between coordinates
      - Station-specific rules
          - matchByRegex: used when the station has a different id from the one we have and the name we receive is also slightly different. Example: Bahnhof, Esslingen: Esslingen Bhf, Esslingen Bhf., Esslingen etc.
  • 22. Cache
      - we don’t use a DB: all data is time-bounded, i.e. all data we keep is temporary (daily)
      - our choice was an in-memory cache
      - a very simple cache built on Java maps; no third-party cache libraries are involved
      - the cache is easy to browse via the cache browser component
      - a few implementations are available: standard cache browser, forwarding cache browser, lazy loading cache browser
      - all cache browsers can define filters
  • 23. REST
      - all data in the cache is available via the REST API
      - two versions of the API are available:
          - legacy API (V0): Wemlin clients still use this one
          - V2 API according to the new transport.opendata.ch specification: some customers have started using it
      - both APIs support essentially the same things:
          - location listings: cities, stations
          - line listings: all lines that operate within a network
          - trips: for a given period of time
          - departures: with realtime prognosis
  • 25. Wemlin Hub - Blue-Green deployment
      - Requirement: ensure zero downtime of the system.
      - Decision: blue-green deployment
          - two identical production environments
          - the reverse proxy in front of the machines routes to one of them, depending on which one is configured as active
          - test on the “idle” server before go-live
          - switch the proxy to the tested instance; the other one is now “idle”
      - Advantages:
          - ensures zero downtime of the system
          - easy rollback if anything goes wrong
  • 26. Wemlin Hub today
      - currently 4 customers (all in Switzerland)
      - together they cover nearly 40% of public transport in Switzerland, including Liechtenstein
      - 17 agencies, 14 of them with realtime data
      - daily processing load: around 33’000 trips, 571’000 stops
      - 2’500 projections per second in peak hours (current load, not the actual capacity of the system)
      - offline transport data conversion
          - contract with Google for Switzerland
          - we convert the Swiss yearly transport schedule (over 400 agencies, ~1 GB of data) to GTFS (Google Transit Feed Specification) format for use in Google Maps
          - conversion takes ~20 min
  • 27. Contact
      - Goran Cvetkoski, senior software engineer, Netcetera, goran.cvetkoski@netcetera.com
      - Ognen Ivanovski, chief architect, Netcetera, ognen.ivanovski@netcetera.com
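A minimal sketch of the algebraic, immutable model from slide 11, assuming a hypothetical `Stop` class with two identity fields (`referenceId`, `name`): equality is defined by contents, every change goes through a copy-and-modify method, and the metadata map is deliberately left out of `equals`/`hashCode`. The class and method names are illustrative, not Wemlin Hub's actual model.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model class: field names and withName/withMetadata are illustrative only.
final class Stop {
    private final String referenceId;
    private final String name;
    // Format-specific metadata; deliberately excluded from equals/hashCode.
    private final Map<String, String> metadata;

    Stop(String referenceId, String name, Map<String, String> metadata) {
        this.referenceId = Objects.requireNonNull(referenceId);
        this.name = Objects.requireNonNull(name);
        // Defensive, unmodifiable copy so the caller's map cannot change this object later.
        this.metadata = Collections.unmodifiableMap(new HashMap<>(metadata));
    }

    String referenceId() { return referenceId; }
    String name() { return name; }
    Map<String, String> metadata() { return metadata; }

    // Copy-and-modify: a pipeline stage gets a new object; the original is untouched.
    Stop withName(String newName) {
        return new Stop(referenceId, newName, metadata);
    }

    Stop withMetadata(String key, String value) {
        Map<String, String> copy = new HashMap<>(metadata);
        copy.put(key, value);
        return new Stop(referenceId, name, copy);
    }

    // Algebraic identity: two stops are equal if and only if their contents are equal.
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Stop)) return false;
        Stop other = (Stop) o;
        return referenceId.equals(other.referenceId) && name.equals(other.name);
    }

    @Override public int hashCode() {
        return Objects.hash(referenceId, name);
    }
}
```

Because identity ignores metadata, a stage that only attaches format-specific metadata produces a new object that is still equal to its input, while a stage that changes the name produces an object with a different identity.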
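The fluent chain on slide 12 can be approximated by a small builder. This is a sketch under stated assumptions: the stages are plain `java.util.function` types rather than Wemlin Hub's own interfaces, `from()` takes no input junction, and `to(...)` returns the entry point that a producer (tap) would push messages into. `PipelineBuilder` here is a hypothetical stand-in, not the project's class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical fluent builder in the spirit of slide 12 (not Wemlin Hub's real API).
final class PipelineBuilder {
    // Each stage maps one message to zero or one messages (empty = dropped or held back).
    private final List<Function<Object, Optional<Object>>> stages = new ArrayList<>();

    static PipelineBuilder from() { return new PipelineBuilder(); }

    PipelineBuilder transform(Function<Object, Object> transformer) {
        stages.add(msg -> Optional.of(transformer.apply(msg)));
        return this;
    }

    PipelineBuilder filter(Predicate<Object> filter) {
        stages.add(msg -> filter.test(msg) ? Optional.of(msg) : Optional.empty());
        return this;
    }

    // An aggregator may keep state and decides whether to emit something downstream.
    PipelineBuilder aggregate(Function<Object, Optional<?>> aggregator) {
        stages.add(msg -> aggregator.apply(msg).map(o -> (Object) o));
        return this;
    }

    // Terminates the chain; the returned consumer is where a tap (producer) pushes messages.
    Consumer<Object> to(Consumer<Object> sink) {
        List<Function<Object, Optional<Object>>> frozen = List.copyOf(stages);
        return msg -> {
            Optional<Object> current = Optional.of(msg);
            for (Function<Object, Optional<Object>> stage : frozen) {
                current = current.flatMap(stage);
                if (current.isEmpty()) return; // dropped by a filter or held by an aggregator
            }
            current.ifPresent(sink);
        };
    }
}
```

Because every stage is just a function over messages, adding or reordering a resolver or enricher is a one-line change in the chain, which is the compositional property the slide describes.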
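Slide 13 gives the three method signatures. Below, they are wrapped in interfaces (the names `Filter`, `Transformer` and `Aggregator` follow slide 9; the example implementations `NonBlankFilter`, `TrimTransformer` and `BatchAggregator` are hypothetical) to show the difference between the referentially transparent stages and a stateful aggregator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Signatures from slide 13; the interface wrappers are an assumption.
interface Filter { boolean accept(Object obj); }
interface Transformer { Object transform(Object original); }
interface Aggregator { Optional<?> aggregate(Object obj); }

// Referentially transparent: the result depends only on the argument.
final class NonBlankFilter implements Filter {
    @Override public boolean accept(Object obj) {
        return obj instanceof String && !((String) obj).isBlank();
    }
}

// Referentially transparent: same input, same output, no side effects.
final class TrimTransformer implements Transformer {
    @Override public Object transform(Object original) {
        return original instanceof String ? ((String) original).trim() : original;
    }
}

// Stateful: buffers messages and emits an immutable batch once `size` have arrived,
// so the same input can yield different results depending on earlier calls.
final class BatchAggregator implements Aggregator {
    private final int size;
    private final List<Object> buffer = new ArrayList<>();

    BatchAggregator(int size) { this.size = size; }

    @Override public synchronized Optional<?> aggregate(Object obj) {
        buffer.add(obj);
        if (buffer.size() < size) return Optional.empty();
        List<Object> batch = List.copyOf(buffer);
        buffer.clear();
        return Optional.of(batch);
    }
}
```

Only the aggregator needs synchronization here; the filter and transformer hold no state, so a pipeline can call them from many threads at once.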
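A minimal sketch of the memoization described on slide 16, with the cache lookup written directly in a hypothetical two-argument `Station.get(referenceId, name)` factory (the slides move this lookup into an aspect instead). The cache here is unbounded; a long-running service would need some eviction policy, for instance when the daily data is replaced.

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical Station with a memoizing factory, mirroring Station.get("1", "St. Gallen, Bahnhof").
final class Station {
    // Global cache keyed by the factory arguments.
    private static final ConcurrentMap<List<String>, Station> CACHE = new ConcurrentHashMap<>();

    private final String referenceId;
    private final String name;

    // Private constructor: instances can only be obtained through the factory.
    private Station(String referenceId, String name) {
        this.referenceId = referenceId;
        this.name = name;
    }

    static Station get(String referenceId, String name) {
        List<String> key = List.of(referenceId, name);
        // ConcurrentHashMap.computeIfAbsent applies the function at most once per key.
        return CACHE.computeIfAbsent(key, k -> new Station(referenceId, name));
    }

    String referenceId() { return referenceId; }
    String name() { return name; }

    @Override public boolean equals(Object o) {
        return o instanceof Station
            && referenceId.equals(((Station) o).referenceId)
            && name.equals(((Station) o).name);
    }

    @Override public int hashCode() { return Objects.hash(referenceId, name); }
}
```

Repeated calls with the same arguments return the very same instance, which is where the reduction in created objects comes from.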
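Slides 17 and 18 move the cache lookup out of the factories into an AspectJ `@Around` advice. Since that needs the AspectJ weaver, the sketch below reproduces only the lookup logic in plain Java: the annotation is declared with `RUNTIME` retention (an assumption), the key is the factory `Method` plus its arguments (playing the role of `pjp.getSignature()` and `pjp.getArgs()`), and calling the factory reflectively plays the role of `pjp.proceed()`. `FactoryMemoizer` and `Line` are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Marker for memoizable factory methods (name from the slides; retention is assumed).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DesignatedFactoryMethod { }

// Plain-Java stand-in for the @Around advice; not AspectJ, just the same lookup logic.
final class FactoryMemoizer {
    private final ConcurrentMap<List<Object>, Object> cache = new ConcurrentHashMap<>();

    Object invoke(Method factory, Object... args) throws ReflectiveOperationException {
        if (!factory.isAnnotationPresent(DesignatedFactoryMethod.class)) {
            return factory.invoke(null, args); // not a designated factory: no memoization
        }
        List<Object> key = new ArrayList<>();
        key.add(factory);
        key.addAll(Arrays.asList(args));
        Object cached = cache.get(key);
        if (cached != null) return cached;
        Object created = factory.invoke(null, args); // the equivalent of pjp.proceed()
        // If another thread stored an object for the same key first, return that one.
        Object previous = cache.putIfAbsent(key, created);
        return previous != null ? previous : created;
    }
}

// Hypothetical model class following the slide's pattern: private constructor, annotated factory.
final class Line {
    final String name;

    private Line(String name) { this.name = name; }

    @DesignatedFactoryMethod
    static Line get(String name) { return new Line(name); }
}
```

In the real setup the weaver intercepts every call to an annotated factory, so callers keep writing `Station.get(...)` and never see the cache; the reflective invoker above only makes that interception explicit.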
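To illustrate the module constraints on slide 19, here is a hypothetical component that receives its only dependency through the constructor (so it needs no Spring annotations and can be wired by Spring configuration or by hand) and a facade that covers the common case. `StationNameNormalizer` and `StationModule` are illustrative names, and the alias table stands in for data derived from the reference station list.

```java
import java.util.Map;
import java.util.Objects;

// Pipeline interface with the signature from slide 13.
interface Transformer { Object transform(Object original); }

// Hypothetical component: its single dependency is injected via the constructor.
final class StationNameNormalizer implements Transformer {
    private final Map<String, String> aliases;

    StationNameNormalizer(Map<String, String> aliases) {
        this.aliases = Map.copyOf(Objects.requireNonNull(aliases));
    }

    @Override public Object transform(Object original) {
        if (!(original instanceof String)) return original; // pass other messages through
        String name = ((String) original).trim();
        return aliases.getOrDefault(name, name);
    }
}

// Hypothetical facade for the "80% usage pattern": callers get a ready-made pipeline stage.
final class StationModule {
    private StationModule() { }

    static Transformer stationNameNormalizer(Map<String, String> aliases) {
        return new StationNameNormalizer(aliases);
    }
}
```

The component only touches the pipeline through `Transformer`, so the same class can be plugged into any chain without depending on how it was constructed.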
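The rule-based resolution on slides 20 and 21 might look roughly like this. The sketch assumes the rules are tried in the order id, then `matchByRegex`, then name similarity within a distance tolerance; it also replaces Apache Lucene with a simple case-insensitive prefix comparison, so it only demonstrates the control flow, not Wemlin Hub's actual matching quality. `ReferenceStation`, `IncomingStop` and their fields are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical reference entry: id, name, coordinates and station-specific regex rules.
record ReferenceStation(String referenceId, String name, double lat, double lon,
                        List<Pattern> matchByRegex) { }

// Hypothetical incoming stop as it might arrive from HAFAS, VDV or GTFS.
record IncomingStop(String id, String name, double lat, double lon) { }

final class StationResolver {
    private final List<ReferenceStation> reference;
    private final double toleranceMeters;

    StationResolver(List<ReferenceStation> reference, double toleranceMeters) {
        this.reference = List.copyOf(reference);
        this.toleranceMeters = toleranceMeters;
    }

    Optional<ReferenceStation> resolve(IncomingStop stop) {
        // 1. General rule: exact station id.
        for (ReferenceStation r : reference) {
            if (r.referenceId().equals(stop.id())) return Optional.of(r);
        }
        // 2. Station-specific rule: matchByRegex on the incoming name.
        for (ReferenceStation r : reference) {
            for (Pattern p : r.matchByRegex()) {
                if (p.matcher(stop.name()).matches()) return Optional.of(r);
            }
        }
        // 3. General rule: similar name within the distance tolerance
        //    (a prefix test stands in for the Lucene similarity search).
        String incoming = stop.name().toLowerCase();
        for (ReferenceStation r : reference) {
            String known = r.name().toLowerCase();
            boolean similar = known.startsWith(incoming) || incoming.startsWith(known);
            if (similar && distanceMeters(r.lat(), r.lon(), stop.lat(), stop.lon()) <= toleranceMeters) {
                return Optional.of(r);
            }
        }
        return Optional.empty();
    }

    // Haversine great-circle distance in meters.
    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double earthRadius = 6_371_000.0;
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
            + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
            * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * earthRadius * Math.asin(Math.sqrt(a));
    }
}
```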
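A sketch of the cache browsers named on slide 22 and in the editor's notes, over a plain Java map. The `browse` method and the way filters are passed in are assumptions; the point is the three variants: a standard browser that returns an eagerly filtered snapshot, a forwarding browser that delegates everything to a wrapped browser so a subclass can override selected logic, and a lazy-loading browser whose entries are only read and filtered as the caller consumes the result.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical cache-browser API; method names are illustrative only.
interface CacheBrowser<V> {
    // Returns the cached values accepted by the filter.
    Stream<V> browse(Predicate<? super V> filter);
}

// Standard browser: filters the whole map eagerly and returns a snapshot.
final class StandardCacheBrowser<V> implements CacheBrowser<V> {
    private final Map<String, V> cache;

    StandardCacheBrowser(Map<String, V> cache) { this.cache = cache; }

    @Override public Stream<V> browse(Predicate<? super V> filter) {
        List<V> snapshot = cache.values().stream().filter(filter).collect(Collectors.toList());
        return snapshot.stream();
    }
}

// Forwarding browser: delegates every request; subclasses override only what they need.
class ForwardingCacheBrowser<V> implements CacheBrowser<V> {
    private final CacheBrowser<V> delegate;

    ForwardingCacheBrowser(CacheBrowser<V> delegate) { this.delegate = delegate; }

    @Override public Stream<V> browse(Predicate<? super V> filter) {
        return delegate.browse(filter);
    }
}

// Lazy-loading browser: entries are read and filtered only as the stream is consumed.
final class LazyLoadingCacheBrowser<V> implements CacheBrowser<V> {
    private final Map<String, V> cache;

    LazyLoadingCacheBrowser(Map<String, V> cache) { this.cache = cache; }

    @Override public Stream<V> browse(Predicate<? super V> filter) {
        return cache.values().stream().filter(filter);
    }
}
```

With a large schedule, a caller that needs only the first few matching trips stops the lazy stream early, while the standard browser would have filtered the whole map first.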
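The blue-green switch on slide 25 can be modelled as a single atomic reference that the reverse proxy consults. The sketch below is hypothetical (`BlueGreenRouter`, `Environment` and the smoke-test predicate are not from the slides): go-live flips to the idle environment only if its checks pass, and rollback flips back, which is cheap because the previous version is still running.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

// The two identical production environments.
enum Environment { BLUE, GREEN }

// Hypothetical model of the proxy's routing decision.
final class BlueGreenRouter {
    private final AtomicReference<Environment> activeRef = new AtomicReference<>(Environment.BLUE);

    Environment active() { return activeRef.get(); }

    static Environment opposite(Environment e) {
        return e == Environment.BLUE ? Environment.GREEN : Environment.BLUE;
    }

    // Go-live: test the idle environment, then switch atomically; returns true on switch.
    boolean promoteIdle(Predicate<Environment> smokeTest) {
        Environment current = activeRef.get();
        Environment candidate = opposite(current);
        if (!smokeTest.test(candidate)) return false; // keep serving the current version
        return activeRef.compareAndSet(current, candidate);
    }

    // Rollback: point the proxy back at the previous environment.
    void rollback() { activeRef.updateAndGet(BlueGreenRouter::opposite); }
}
```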

Editor's Notes

  1. So what is Wemlin? According to our marketing, Wemlin is a Passenger Information System that provides access to public transport information: easy, fast and independent of time and place. The Wemlin product family for the end user consists of mobile applications for iOS, Android and Windows Phone, and a web application. All applications display the same data. Transport unit. Plan data: schedules (yearly, daily), i.e. which vehicle drives where? Stop: stopping. Trip: route. GPS: sends data to the AVMS.
  2. Under the hood, Wemlin also has a server part called Wemlin Hub, whose main purpose is to: gather data from the transport agencies (using different protocols, such as VDV DFI, VDV AUS, GTFS, HAFAS); enrich the acquired data with its own reference data (why? it's third-party data, often of poor quality or not in the desired format); serve that data primarily to Wemlin mobile clients and also to third-party clients; and convert data between various formats.
  3. Extensible: many clients with different requirements.
  6. Why microkernel? It suits software systems that must be able to adapt to changing system requirements. It separates a minimal functional core from extended functionality and customer-specific parts. The microkernel also serves as a socket for plugging in these extensions and coordinating their collaboration. The size of the microkernel should be kept to a minimum; therefore, only part of the core functionality can be included in it. Benefits of the pattern: good portability, since only the microkernel needs to be modified when porting the system to a new environment; high flexibility and extensibility, as modifications or extensions can be made by modifying or extending internal servers; maintainability and changeability of the system.
  7. You want to make your architecture as small as possible (the fewer things that are “hard to change”, the better). Encapsulation: the more you are able t
  8. Microkernel. This is the architecture: small, easy to remember.
  9. - Log free
  10. Explain the responsibility of the input components: keep connections open; handle a specific protocol. Resolvers: station resolver, line resolver. We translate data from the protocols into our model format: a universal model for any protocol. Because each component adheres to a specific pipeline interface, we get parallelization for free and without much thought during implementation. Sending messages from one system to another: this is possible because the messages are immutable. Messages cannot be changed; they can only become stale.
  11. Memoization is an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously processed inputs. Having an immutable model in a system has its advantages when it comes to parallelization and concurrency, but it also has a drawback: big memory consumption, because for every modification of a single field in an object you need to shallow-copy the whole object and create a new one with the modified field. Because many of the model objects in Wemlin are recreated with the same data many times, it makes sense to check whether we have previously created an object with the specified parameters and return the old object instead of creating a new one.
  13. The pointcut selects all methods that are annotated with the @DesignatedFactoryMethod annotation. We use an Around advice on the selected join points.
  15. Forwarding cache browser: forwards all requests to a wrapped cache browser; useful for overriding some logic. Lazy loading cache browser: lazily loads data from the cache, which saves time for large schedules.
  17. Why microkernel? (As in note 6.) Apache httpd example. What is characteristic is that we parallelize.