I'll give a talk on how we've used Neo4j at NKK for data quality analysis and corrections, breed analysis and more, where throughput in dogs/second (100-9,000) and queries/second (200-20,000) are important metrics. :)
2. A Global Leader
Americas, Europe, Asia

Bringing our customers' projects to life and boosting their performance through technology and innovation.

€1,633m+ revenues in 2013
20,000+ employees in 2013
20+ countries
3. R&D and Innovation
For 30 years, Altran has had a close relationship with innovation.

Altran consultants step up to turn creative ideas into innovative solutions that enable technological progress. In this way, Altran has contributed to major technological advances in recent decades: speed, precision, security, communication, practicality, interoperability, artificial intelligence...
AEGT: the world's most powerful electric car
Altran was responsible for designing and engineering the electric transmission on this car, capable of reaching speeds of 300 km/h.

Solar Impulse: the first plane to fly on solar energy alone
Since 2003, Altran experts have dedicated their skills to bringing about this formidable technical and human achievement.

The Airport of the Future: outlining a ‘friend-lean’ space in 2040
Altran develops revolutionary concepts for airports responding to long-term changes in the industry.
4. Agenda
● Situation analysis
● From dog register via case management to dog hub
● The platform
● Performance and some metrics
5. Initial analysis
● From register to case management – over 20 years of legacy
– Dog information spread across 30+ relational tables
– 2-3 weeks of work to retrieve «a dog» with some info (every time)
– «impossible» to store new types of data/information on a dog
– Data was hidden/unavailable to people -> «data rot»
– Cascading costs of change and new features
● Recognized the need for a different approach
– But how to get out of the squeeze was not obvious
– Limited technical skills, system knowledge and functional knowledge
– No time, capacity or money to do a «full rewrite»
● We selected a bottom-up, data-first platform approach, with strong capabilities for continuous data quality processes and strong support for semi-structured data.
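To make "strong support for semi-structured data" concrete, here is a minimal Python sketch of a record that keeps a few fixed identity fields and stores everything else as open-ended, per-source attributes, so that a new kind of information on a dog needs no schema change. The class, its field names, the sources and the IDs are hypothetical illustrations, not NKK's actual data model.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DogRecord:
    """Hypothetical dog record: fixed identity fields plus open-ended attributes."""

    dog_id: str
    name: str
    breed: str
    # attribute name -> {source name -> reported value}; adding a new
    # attribute name requires no schema change
    attributes: dict[str, dict[str, Any]] = field(default_factory=dict)

    def add(self, source: str, key: str, value: Any) -> None:
        """Store `value` for `key` as reported by `source`, keeping other sources' values."""
        self.attributes.setdefault(key, {})[source] = value

    def values(self, key: str) -> dict[str, Any]:
        """Return a copy of every source's value for `key`, or an empty dict."""
        return dict(self.attributes.get(key, {}))


# Usage (made-up IDs and source names):
dog = DogRecord("DOG-0001", "Rex", "Norwegian Elkhound")
dog.add("register", "birth_date", "2019-04-02")
dog.add("vet_system", "hip_score", "A")
dog.add("show_results", "hip_score", "A")
```

Keeping every source's value side by side, rather than overwriting, leaves room for a later data quality step to compare them.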
6. From dog register to case management to dog hub
● Quick and easy access to individual dogs
● Scale - 10 to 50 integrations with other systems (hub)
● Handling individual dogs of «questionable» data quality
● Easily extendable to store more data on any individual
– Semi-structured strategy for persistence/storage
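One simple way to handle a dog whose sources disagree is to accept a value only when enough sources agree on it, and otherwise flag the dog for manual review. The sketch below assumes exactly that majority-vote rule with a hypothetical 60% agreement threshold; it illustrates the idea and is not the statistical method NKK used.

```python
from collections import Counter
from collections.abc import Hashable
from typing import Optional


def resolve_value(reported: dict[str, Optional[Hashable]],
                  min_share: float = 0.6) -> Optional[Hashable]:
    """Pick the value most sources agree on, or None if agreement is too weak.

    `reported` maps source name -> reported value (None = source has no value).
    Returns None when nothing is reported, or when the most common value
    covers less than `min_share` of the non-missing reports.
    """
    values = [v for v in reported.values() if v is not None]
    if not values:
        return None  # data missing in every source
    value, count = Counter(values).most_common(1)[0]
    if count / len(values) >= min_share:
        return value
    return None  # ambiguous: hand the dog over to a manual process


# Usage (made-up sources and values):
resolve_value({"register": "black", "show_results": "black", "breeder_db": "grey"})  # -> 'black'
resolve_value({"register": "black", "breeder_db": "grey"})  # -> None
```

Returning None instead of guessing keeps the "manual process for difficult cases" path explicit.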
8. The platform we built
● Dog search & lookup
– SolrCloud with "json_full"
● DogPopulationService
– Pedigree, population structure, breed data
– Data error, data deviation, data missing -> DogFixer
● DogIDMapper (multi-source, multi-master, mapping between different ID schemes)
● DogCrawler
– Is it possible to find additional data to fix this individual?
● DogFixer
– Is it possible to statistically find the right answer?
– Manual process for corner cases and difficult cases
● DogServiceREST
– Verify and merge, write back updates
– «Tailing» data sources of dog information
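A DogIDMapper-style component has to decide which IDs from different schemes refer to the same dog. A common technique for this is a union-find (disjoint-set) structure, sketched below under the assumption that each ID is a (scheme, id) pair and that links arrive as pairs asserted to be the same dog. The class name, schemes and IDs are hypothetical; the talk does not say how DogIDMapper is implemented.

```python
Key = tuple[str, str]  # hypothetical (ID scheme, ID within that scheme)


class IdMapper:
    """Union-find over IDs from several schemes; linked IDs share one root."""

    def __init__(self) -> None:
        self._parent: dict[Key, Key] = {}

    def _find(self, key: Key) -> Key:
        self._parent.setdefault(key, key)  # an unseen ID starts as its own group
        while self._parent[key] != key:
            # Path halving: point each visited ID at its grandparent.
            self._parent[key] = self._parent[self._parent[key]]
            key = self._parent[key]
        return key

    def link(self, a: Key, b: Key) -> None:
        """Record that IDs `a` and `b` identify the same dog."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_dog(self, a: Key, b: Key) -> bool:
        return self._find(a) == self._find(b)

    def ids_for(self, key: Key) -> set[Key]:
        """Every known ID, in any scheme, linked to `key` (linear scan)."""
        root = self._find(key)
        return {k for k in list(self._parent) if self._find(k) == root}


# Usage (made-up schemes and IDs):
mapper = IdMapper()
mapper.link(("nkk", "N-1"), ("chip", "C-1"))
mapper.link(("chip", "C-1"), ("fci", "F-9"))
```

Links are transitive, so the "nkk" and "fci" IDs above end up mapped to the same dog even though no source linked them directly. Union by rank and conflict handling for wrong links are left out to keep the sketch short.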
9. Some numbers
● 2 million requests/hour
● 10 million requests/24 hours
● Breed calculations went from taking «months» to «instant»
– 200-500 joins per individual, 1,000 individuals/year over 10 years = 2-8 sec
● Latency: 0.2 sec for 99.7% of requests
● DogIDMapper: 4000 dogs/sec
● DogGraph: 3000 dogs/sec
● DogFixer: 10-15 dogs/sec
● DogCrawler: 100-200 dogs/sec
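The breed-calculation numbers come from walking pedigrees: each extra generation is another self-join in a relational schema, but only one more step in a graph. The sketch below does that walk breadth-first over an in-memory pedigree (a hypothetical dict of dog -> (sire, dam)) and uses it to find common ancestors of a sire and dam, a typical input to inbreeding calculations. It is an illustration under those assumptions, not the Neo4j query used at NKK.

```python
from collections import deque
from typing import Optional

# Hypothetical pedigree: dog id -> (sire id, dam id); None marks an unknown parent.
Pedigree = dict[str, tuple[Optional[str], Optional[str]]]


def ancestors(pedigree: Pedigree, dog: str, max_generations: int) -> dict[str, int]:
    """Breadth-first walk up the pedigree.

    Returns ancestor id -> generation at which it is first reached (1 = parents).
    BFS guarantees that first reach is the closest generation.
    """
    seen: dict[str, int] = {}
    queue = deque([(dog, 0)])
    while queue:
        current, gen = queue.popleft()
        if gen == max_generations:
            continue  # do not walk past the requested depth
        for parent in pedigree.get(current, (None, None)):
            if parent is not None and parent not in seen:
                seen[parent] = gen + 1
                queue.append((parent, gen + 1))
    return seen


def common_ancestors(pedigree: Pedigree, sire: str, dam: str,
                     max_generations: int) -> set[str]:
    """Ancestors shared by sire and dam; each parent counts as its own
    ancestor so that e.g. a sire that is also the dam's father is found."""
    sire_side = {sire} | set(ancestors(pedigree, sire, max_generations))
    dam_side = {dam} | set(ancestors(pedigree, dam, max_generations))
    return sire_side & dam_side


# Usage (made-up pedigree where sire and dam share a father):
ped: Pedigree = {
    "pup": ("sire", "dam"),
    "sire": ("grandsire", "granddam1"),
    "dam": ("grandsire", "granddam2"),
}
```

Here `common_ancestors(ped, "sire", "dam", 5)` yields `{"grandsire"}`, the shared ancestor that makes the pup inbred.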