Automotive Information Research driven by Apache Solr

Lucene Revolution 2016, Boston: Talk by Mario-Leander Reimer (@LeanderReimer, Principal Software Architect at QAware).

Abstract: We are searching the unknown. How can you find hidden and unknown relationships in unrelated relational data silos? How can you search for the relevant information in a 10^56-dimensional space? How do you create a consistent yet up-to-date information network for over 20 languages on a daily basis? And how on earth do you convince IT governance to let you use Solr for this kind of job? Does all this sound impossible? This talk gives the answers and presents a detailed case study and success story of how we used Apache Solr to build a search-based business intelligence and automotive information research application for a major German car manufacturer.
3 Things to Learn:
• How to use Solr as a reverse data engineering tool to chart and explore relational data silos and their hidden relationships.
• Different approaches for denormalizing relational data models efficiently without suffering combinatorial explosion, such as multi-valued fields, child documents, or post filtering based on JavaScript validity terms.
• How to develop a rock-solid, scalable, performant enterprise solution with complex business logic on top of Solr and Java EE.

  1. Automotive Information Research driven by Apache Solr. Mario-Leander Reimer, Chief Technologist, QAware GmbH, mario-leander.reimer@qaware.de, @LeanderReimer
  2. Agenda: Reverse Data Engineering and Exploration with MIR; Aftersales Information Research with AIR; Architecture, Requirements, Challenges; Solutions for the Problem of Combinatorial Explosion; Data Consistency and Timeliness; BOM Explosions and Demand Forecasts with ZEBRA
  3. Reverse Data Engineering and Exploration with MIR
  4. How do we find the originating data silo for the desired data? Where to find the vehicle data? 60 potential systems with 5,000 entities. (Diagram: systems A to D holding vehicle data and other data.)
  5. How do we find the hidden relations between the systems? How are the data linked to each other? 400,000 potential relations. (Diagram: systems A to D with vehicle data, other data, parts and documents.)
  6. Reverse Data Engineering and Analysis with MIR and Solr. MIR manages the meta information, data models and record descriptions of all our source systems (RDBMS, XML, SOAP, …). MIR lets users navigate and search the metadata and drill into it easily using facets. MIR also manages the target data model and the Solr schema description.
  7. Search results: tree view of systems, tables and attributes; drill-down via facets; wildcard search; potential synonyms found for the chassis number.
  8. Aftersales Information Research with AIR
  9. Find the right information in fewer than 3 clicks. The initial situation: users had to use up to 7 different applications for their daily work. The systems were not well integrated. Finding the correct information was laborious and error-prone. The project vision: combine the data into a consistent information network, make the information network and its data searchable and navigable, and replace the existing applications with one easy-to-use application.
  10. (Image-only slide.)
  11. (Image-only slide.)
  12. "But Apache Solr is only a full-text search engine. You have to use an Oracle database for your application data." (Anonymous IT person)
  13. Solr outperformed Oracle in both query time and index size. Test data set: 150,000 records. Disk space: 132 MB for Solr vs. 385 MB for Oracle. Query times for three consecutive runs:
      | Oracle SQL | Solr query | Solr (ms) | Oracle (ms) |
      |---|---|---|---|
      | SELECT * FROM VEHICLE WHERE VIN LIKE 'V%' | INFO_TYPE:VEHICLE AND VIN:V* | 38 / 0 / 0 | 383 / 384 / 383 |
      | SELECT * FROM MEASURE WHERE TEXT='engine' | INFO_TYPE:MEASURE AND TEXT:engine | 92 / 0 / 0 | 389 / 387 / 386 |
      | SELECT * FROM VEHICLE WHERE VIN LIKE '%X%' | INFO_TYPE:VEHICLE AND VIN:*X* | 39 / 0 / 0 | 859 / 379 / 383 |
  14. The dirt race use case: no internet connection, low-end devices.
  15. Solr and AIR on a Raspberry Pi Model B as a PoC worked like a charm! Debian Linux + JDK 8, a Jetty servlet container with the Solr and AIR web apps deployed, and a reduced offline data set with ~1.5 million Solr documents. Model B hardware specs: ARMv6 CPU at 700 MHz, 512 MB RAM, 32 GB SD card. And now try this with Oracle!
  16. Careful schema design is crucial for Solr performance.
  17. Naive denormalization quickly leads to combinatorial explosion! Entity counts: 33,071,137 vehicles; 14,830,197 flat rate units; 1,678,667 packages; 5,078,411 FRU groups; 18,573 repair instructions; 648,129 technical documents; 55,000 parts; 648,129 measures; 41,385 types; 6,180 fault indications. (Diagram: relationship navigation between these entities.)
  18. Multi-valued typed fields can store 1..n relations efficiently, but may produce false positives. Example document: { "INFO_TYPE": "AWPOS_GROUP", "NUMMER": ["1134190", "1235590"], "BAUSTAND": ["1969-12-31T23:00:00Z", "1975-12-31T23:00:00Z"], "E_SERIES": ["F10", "E30"] }. The values at index 0 of each field belong together, as do the values at index 1. The filter fq=INFO_TYPE:AWPOS_GROUP AND NUMMER:1134190 AND E_SERIES:F10 correctly matches index 0, but fq=INFO_TYPE:AWPOS_GROUP AND NUMMER:1134190 AND E_SERIES:E30 matches as well, although 1134190 (index 0) and E30 (index 1) do not belong together. If such false positives matter, post-filter the results in your application. Alternative: current Solr versions support nested child documents; use them instead.
  19. Technical documents and their validity were expressed and stored in a binary representation. Validity expressions may have up to 46 characteristics, use 5 different boolean operators (AND, NOT, …), and can be nested and complex. Some characteristics are dynamic and not even known at index time. The solution: transform the validity expressions into equivalent ternary JavaScript terms and evaluate these terms at query time using a custom function query filter.
  20. Binary validity expression example: Type(53078923) = 'Brand', Value(53086475) = 'BMW PKW'; Type(53088651) = 'E-Series', Value(53161483) = 'F10'; Type(64555275) = 'Transmission', Value(53161483) = 'MECH'
  21. Transformation of the binary validity terms into their JavaScript equivalent at index time: AND(Brand='BMW PKW', E-Series='F10', Transmission='MECH') becomes ((BRAND=='BMW PKW')&&(E_SERIES=='F10')&&(TRANSMISSION=='MECH')). Indexed document: { "INFO_TYPE": "TECHNISCHES_DOKUMENT", "DOKUMENT_TITEL": "Getriebe aus- und einbauen", "DOKUMENT_ART": "reparaturanleitung", "VALIDITY": "((BRAND=='BMW PKW')&&((E_SERIES=='F10')&&(...))", "BRAND": ["BMW PKW"] } (the title means "Remove and install transmission"; "reparaturanleitung" means "repair instruction").
  22. The JavaScript validity term is evaluated at query time using a custom function query: &fq=INFO_TYPE:TECHNISCHES_DOKUMENT &fq=DOKUMENT_ART:reparaturanleitung &fq={!frange l=1 u=1 incl=true incu=true cache=false cost=500}jsTerm(VALIDITY,eyJNT1RPUl9LUkFGVFNUT0ZGQVJUX01PVE9SQVJCRUlUU1ZFUkZBSFJFTiI6IkIiLCJFX01BU0NISU5FX0tSQUZUU1RPRkZBUlQiOm51bGwsIlNJQ0hFUkhFSVRTRkFIUlpFVUciOiIwIiwiQU5UUklFQiI6IkFXRCIsIkVkJBVVJFSUhFIjoiWCcifQ==). The second jsTerm argument is a Base64-encoded JSON context; decoded example: { "BRAND":"BMW PKW", "E_SERIES":"F10", "TRANSMISSION":"MECH" }. See http://qaware.blogspot.de/2014/11/how-to-write-postfilter-for-solr-49.html
  23. Custom ETL combined with Continuous Delivery and DevOps ensures data consistency and timeliness.
  24. BOM Explosions and Demand Forecasts with ZEBRA
  25. Bills of Materials (BOMs) explained
  26. BOMs are required for production planning, forecasting, demand, scenario-based planning and simulations.
  27. The big picture of ZEBRA: parts / abstract demands; orders / actual demands; BOMs / dependent demands; demand resolver; analytics; production planning. Volumes: 7 million, 2 million, 21 billion.
  28. The most essential Solr optimizations in ZEBRA:
      • Bulk RequestHandler: solving thousands of orders with one request.
      • Binary DocValue support: speeds up access to persisted data dramatically using binary doc values.
      • Mass data binary response format: uses the standard Solr binary codec with an optimized data model that reduces the amount of data by a factor of 8.
      • Search components with a custom JOIN algorithm: store data effectively using our own JOIN implementation and compute BOM explosions.
      • Boolean interpreter as post filter: enables Solr to filter documents using stored boolean expressions via custom post filters.
  29. Low-level optimizations can yield great boosts in performance. Development of the time to calculate the BOM for one order in the Demand Calculation Service PoC, from October 2014 to October 2015: 24 ms, 4.9 ms, 0.28 ms and finally 0.08 ms. Profiling results and some of the improvements that reduced the query time: scoring (-8%), default query parser (-25%), stat cache (-8%), String DocValues (-28%).
  30. Solr has become a powerful tool for building enterprise and data analytics applications. Be creative!
  31. Mario-Leander Reimer, Chief Technologist, QAware GmbH: mario-leander.reimer@qaware.de, https://www.qaware.de, https://slideshare.net/MarioLeanderReimer/, https://speakerdeck.com/lreimer/, https://twitter.com/leanderreimer/
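
Slides 6 and 7 describe MIR's wildcard search over source-system metadata with facet drill-down. The following Java sketch imitates that behaviour in memory to show what a faceted wildcard search computes; it is not MIR's implementation. The `MetaAttribute` record, the sample catalogue and all method names are hypothetical, and in MIR the matching and counting would be done by Solr (a wildcard query plus a field facet) rather than by Java streams.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MetadataFacetSearch {

    /** Hypothetical catalogue entry: source system, table, attribute and its description. */
    public record MetaAttribute(String system, String table, String attribute, String description) {}

    /** Translates a Solr-style wildcard pattern (* and ?) into a case-insensitive regex. */
    public static Pattern wildcardToRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            switch (c) {
                case '*' -> regex.append(".*");   // any sequence of characters
                case '?' -> regex.append('.');    // exactly one character
                default -> regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** Wildcard search on the description, then facet counts per source system. */
    public static Map<String, Long> facetBySystem(List<MetaAttribute> catalogue, String wildcard) {
        Pattern pattern = wildcardToRegex(wildcard);
        return catalogue.stream()
                .filter(a -> pattern.matcher(a.description()).matches())
                .collect(Collectors.groupingBy(MetaAttribute::system, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical metadata: three systems use different names for the chassis number.
        List<MetaAttribute> catalogue = List.of(
                new MetaAttribute("System A", "VEHICLE", "VIN", "Vehicle identification / chassis number"),
                new MetaAttribute("System B", "FZG", "FGST_NR", "Fahrgestellnummer (chassis number)"),
                new MetaAttribute("System B", "ORDER", "CHASSIS_NO", "Chassis number of ordered vehicle"),
                new MetaAttribute("System C", "PART", "PART_NO", "Part number"));
        System.out.println(facetBySystem(catalogue, "*chassis*"));  // {System A=1, System B=2}
    }
}
```

Collecting into a `TreeMap` keeps the facet buckets sorted by system name; Solr's own field facets are sorted by count by default.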
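
Slide 13 pairs each SQL query with its Solr counterpart. A pattern such as 'V%' only acts as a wildcard with SQL LIKE, not with `=`. The sketch below is a hypothetical helper, not taken from the talk: it converts a LIKE pattern without an ESCAPE clause into a Solr wildcard term by mapping `%` to `*` and `_` to `?`, and backslash-escapes characters that are special to the standard query parser.

```java
public class LikeToSolrWildcard {

    // Characters with special meaning in the standard Lucene/Solr query parser.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    /** Converts a SQL LIKE pattern (assumed: no ESCAPE clause) into a Solr wildcard term. */
    public static String convert(String likePattern) {
        StringBuilder out = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            if (c == '%') {
                out.append('*');                 // any sequence of characters
            } else if (c == '_') {
                out.append('?');                 // exactly one character
            } else if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                out.append('\\').append(c);      // keep the literal meaning
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Hypothetical helper: builds a query such as INFO_TYPE:VEHICLE AND VIN:V* from a LIKE predicate. */
    public static String toSolrQuery(String infoType, String field, String likePattern) {
        return "INFO_TYPE:" + infoType + " AND " + field + ":" + convert(likePattern);
    }

    public static void main(String[] args) {
        System.out.println(toSolrQuery("VEHICLE", "VIN", "V%"));   // INFO_TYPE:VEHICLE AND VIN:V*
        System.out.println(toSolrQuery("VEHICLE", "VIN", "%X%"));  // INFO_TYPE:VEHICLE AND VIN:*X*
    }
}
```

Patterns with a leading wildcard such as `*X*` have to be checked against many terms in the index, so they are usually slower than prefix queries like `V*`.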
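
To make the combinatorial explosion of slide 17 concrete: when a root entity has several independent 1..n relations and all of them are joined into one flat table, the number of rows is the product of the fan-outs. The Java sketch below builds that cross product for one hypothetical vehicle; the fan-outs (3, 5 and 4) and all names are made up for illustration and are not taken from the slide's data.

```java
import java.util.ArrayList;
import java.util.List;

public class DenormalizationCost {

    /** Joins each independent 1..n relation of one root entity into flat rows (SQL-style cross product). */
    public static List<List<String>> flatten(String root, List<List<String>> relations) {
        List<List<String>> rows = new ArrayList<>();
        rows.add(List.of(root));
        for (List<String> relation : relations) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> row : rows) {
                for (String value : relation) {
                    List<String> extended = new ArrayList<>(row);
                    extended.add(value);
                    next.add(extended);
                }
            }
            rows = next;  // an empty relation yields no rows (inner join semantics)
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical vehicle with 3 flat rate units, 5 technical documents and 4 fault indications.
        List<List<String>> relations = List.of(
                List.of("FRU-1", "FRU-2", "FRU-3"),
                List.of("DOC-1", "DOC-2", "DOC-3", "DOC-4", "DOC-5"),
                List.of("FI-1", "FI-2", "FI-3", "FI-4"));
        System.out.println("flat documents: " + flatten("VEHICLE-1", relations).size());  // 60 = 3 * 5 * 4
        System.out.println("multi-valued documents: 1");
    }
}
```

With multi-valued fields the same vehicle would be a single document, at the price of the false positives discussed on slide 18.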
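
The false positives on slide 18 arise because each clause of a filter query may match a different position of a multi-valued field. A minimal sketch of the application-side post filter, assuming the parallel multi-valued fields are read back in the order they were indexed, compares both behaviours on the slide's example document. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

public class MultiValuePostFilter {

    /** Mimics fq=NUMMER:x AND E_SERIES:y on multi-valued fields: each clause may match any position. */
    public static boolean solrMatches(Map<String, List<String>> doc, Map<String, String> criteria) {
        return criteria.entrySet().stream()
                .allMatch(c -> doc.getOrDefault(c.getKey(), List.of()).contains(c.getValue()));
    }

    /** Hypothetical post filter: all criteria must hold at the same index of the parallel arrays. */
    public static boolean sameIndexMatches(Map<String, List<String>> doc, Map<String, String> criteria) {
        int size = doc.values().stream().mapToInt(List::size).max().orElse(0);
        for (int i = 0; i < size; i++) {
            final int idx = i;
            boolean allAtIndex = criteria.entrySet().stream().allMatch(c -> {
                List<String> values = doc.getOrDefault(c.getKey(), List.of());
                return idx < values.size() && values.get(idx).equals(c.getValue());
            });
            if (allAtIndex) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = Map.of(
                "NUMMER", List.of("1134190", "1235590"),
                "E_SERIES", List.of("F10", "E30"));
        Map<String, String> correct = Map.of("NUMMER", "1134190", "E_SERIES", "F10");
        Map<String, String> falsePositive = Map.of("NUMMER", "1134190", "E_SERIES", "E30");
        System.out.println(solrMatches(doc, correct) + " " + sameIndexMatches(doc, correct));             // true true
        System.out.println(solrMatches(doc, falsePositive) + " " + sameIndexMatches(doc, falsePositive)); // true false
    }
}
```

Read these values from stored fields: multi-valued string docValues (SORTED_SET) come back sorted and deduplicated, which breaks the index alignment the post filter relies on.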
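
Slide 21 shows the validity expression AND(Brand='BMW PKW', E-Series='F10', Transmission='MECH') turned into a JavaScript term at index time. The sketch below reproduces that transformation from a small expression tree. The tree types (`Eq`, `And`, `Or`, `Not`), the mapping of characteristic names to field names (upper case, '-' and ' ' replaced by '_') and the OR/NOT renderings are assumptions; decoding the proprietary binary format is not shown. It needs Java 16 or later for records and pattern matching.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class ValidityToJs {

    /** Hypothetical in-memory model of a decoded validity expression. */
    public interface Expr {}
    public record Eq(String type, String value) implements Expr {}
    public record And(List<Expr> operands) implements Expr {}
    public record Or(List<Expr> operands) implements Expr {}
    public record Not(Expr operand) implements Expr {}

    /** Assumed naming rule: "E-Series" becomes the field name "E_SERIES". */
    static String fieldName(String type) {
        return type.toUpperCase(Locale.ROOT).replace('-', '_').replace(' ', '_');
    }

    /** Wraps a value in a single-quoted JavaScript string literal. */
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Renders the expression as a fully parenthesised JavaScript boolean term. */
    public static String toJs(Expr e) {
        if (e instanceof Eq eq) {
            return "(" + fieldName(eq.type()) + "==" + quote(eq.value()) + ")";
        } else if (e instanceof And a) {
            return join(a.operands(), "&&");
        } else if (e instanceof Or o) {
            return join(o.operands(), "||");
        } else if (e instanceof Not n) {
            return "(!" + toJs(n.operand()) + ")";
        }
        throw new IllegalArgumentException("Unknown expression: " + e);
    }

    private static String join(List<Expr> operands, String operator) {
        return operands.stream().map(ValidityToJs::toJs).collect(Collectors.joining(operator, "(", ")"));
    }

    public static void main(String[] args) {
        Expr validity = new And(List.of(
                new Eq("Brand", "BMW PKW"),
                new Eq("E-Series", "F10"),
                new Eq("Transmission", "MECH")));
        System.out.println(toJs(validity));
        // ((BRAND=='BMW PKW')&&(E_SERIES=='F10')&&(TRANSMISSION=='MECH'))
    }
}
```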
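
The filter on slide 22 passes the query-time characteristics as a Base64-encoded JSON object to the custom `jsTerm` function and keeps only documents for which the term evaluates to 1 (`l=1 u=1`); `cache=false` together with a cost of 100 or more lets `frange` run as a post filter after the cheaper filters. The client-side sketch below assembles such a request. It assumes that `jsTerm` expects standard (not URL-safe) Base64 of a flat JSON object, uses `q=*:*` as a placeholder main query, and serialises JSON with deliberately minimal escaping; the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ValidityFilterBuilder {

    /** Serialises a flat String map to JSON (minimal escaping, enough for simple values). */
    static String toJson(Map<String, String> context) {
        return context.entrySet().stream()
                .map(e -> "\"" + escape(e.getKey()) + "\":\"" + escape(e.getValue()) + "\"")
                .collect(Collectors.joining(",", "{", "}"));
    }

    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the post-filter fq: frange keeps documents whose jsTerm(...) value is exactly 1. */
    public static String validityFilter(Map<String, String> context) {
        String encoded = Base64.getEncoder()
                .encodeToString(toJson(context).getBytes(StandardCharsets.UTF_8));
        return "{!frange l=1 u=1 incl=true incu=true cache=false cost=500}jsTerm(VALIDITY," + encoded + ")";
    }

    public static void main(String[] args) {
        Map<String, String> context = new LinkedHashMap<>();  // keeps insertion order in the JSON
        context.put("BRAND", "BMW PKW");
        context.put("E_SERIES", "F10");
        context.put("TRANSMISSION", "MECH");

        String query = "q=*:*"
                + "&fq=" + URLEncoder.encode("INFO_TYPE:TECHNISCHES_DOKUMENT", StandardCharsets.UTF_8)
                + "&fq=" + URLEncoder.encode("DOKUMENT_ART:reparaturanleitung", StandardCharsets.UTF_8)
                + "&fq=" + URLEncoder.encode(validityFilter(context), StandardCharsets.UTF_8);
        System.out.println(query);
    }
}
```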
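
Slides 25 to 27 introduce BOMs and ZEBRA's demand resolver, which derives dependent demands from orders. The core of a multi-level BOM explosion can be sketched in a few lines: multiply the order quantity down the BOM tree and sum the requirements per component. This is a textbook gross-requirements explosion, not ZEBRA's Solr-based JOIN implementation; the BOM, the order and all names are hypothetical, and the sketch assumes an acyclic BOM and ignores inventory, lead times and validity rules.

```java
import java.util.Map;
import java.util.TreeMap;

public class BomExplosion {

    /** Explodes all orders through the BOM and returns the dependent (gross) demand per component. */
    public static Map<String, Long> explode(Map<String, Map<String, Integer>> bom, Map<String, Long> orders) {
        Map<String, Long> demand = new TreeMap<>();
        orders.forEach((product, qty) -> explode(bom, product, qty, demand));
        return demand;
    }

    /** Depth-first walk: quantities multiply per level, shared components are summed up. */
    private static void explode(Map<String, Map<String, Integer>> bom, String item, long qty,
                                Map<String, Long> demand) {
        for (Map.Entry<String, Integer> child : bom.getOrDefault(item, Map.of()).entrySet()) {
            long childQty = qty * child.getValue();
            demand.merge(child.getKey(), childQty, Long::sum);
            explode(bom, child.getKey(), childQty, demand);  // recurse into sub-assemblies
        }
    }

    public static void main(String[] args) {
        // Hypothetical BOM: a vehicle needs 1 engine and 4 wheels; an engine 4 pistons; a wheel 5 bolts.
        Map<String, Map<String, Integer>> bom = Map.of(
                "VEHICLE", Map.of("ENGINE", 1, "WHEEL", 4),
                "ENGINE", Map.of("PISTON", 4),
                "WHEEL", Map.of("BOLT", 5));
        System.out.println(explode(bom, Map.of("VEHICLE", 10L)));  // {BOLT=200, ENGINE=10, PISTON=40, WHEEL=40}
    }
}
```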
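
Slide 28 mentions a boolean interpreter that runs as a post filter over stored boolean expressions. One simple way to store and interpret such expressions is postfix (reverse Polish) notation evaluated with a stack. The sketch below uses that format, which is an assumption and not necessarily ZEBRA's encoding, to keep only the BOM positions whose rule holds for an order. The token syntax (`FIELD=VALUE`, `AND`, `OR`, `NOT`) and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BooleanInterpreter {

    /** Evaluates a hypothetical postfix expression, e.g. [BRAND=BMW PKW, E_SERIES=F10, AND]. */
    public static boolean evaluate(List<String> rpn, Map<String, String> characteristics) {
        Deque<Boolean> stack = new ArrayDeque<>();
        for (String token : rpn) {
            switch (token) {
                case "AND" -> { boolean r = stack.pop(); boolean l = stack.pop(); stack.push(l && r); }
                case "OR" -> { boolean r = stack.pop(); boolean l = stack.pop(); stack.push(l || r); }
                case "NOT" -> stack.push(!stack.pop());
                default -> {
                    int eq = token.indexOf('=');
                    if (eq < 0) {
                        throw new IllegalArgumentException("Bad token: " + token);
                    }
                    String field = token.substring(0, eq);
                    String value = token.substring(eq + 1);
                    stack.push(value.equals(characteristics.get(field)));  // unknown field -> false
                }
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("Malformed expression: " + rpn);
        }
        return stack.pop();
    }

    /** Post-filter step: keeps the positions whose stored rule is valid for the order. */
    public static List<String> filter(Map<String, List<String>> rulesByPosition, Map<String, String> order) {
        return rulesByPosition.entrySet().stream()
                .filter(e -> evaluate(e.getValue(), order))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> order = Map.of("BRAND", "BMW PKW", "E_SERIES", "F10", "TRANSMISSION", "MECH");
        Map<String, List<String>> rules = Map.of(
                "POS-1", List.of("BRAND=BMW PKW", "E_SERIES=F10", "AND"),
                "POS-2", List.of("TRANSMISSION=AUTO", "NOT"),
                "POS-3", List.of("E_SERIES=E30", "TRANSMISSION=AUTO", "OR"));
        System.out.println(filter(rules, order));  // [POS-1, POS-2]
    }
}
```

Postfix form needs no parser at query time, which keeps per-document evaluation cheap inside a post filter.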
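
Taking the largest and smallest figures on slide 29 as the first (October 2014) and last (October 2015) measurements, the first line below gives the overall speed-up. The second line assumes the four profiling-driven reductions were independent and applied one after another, so that they compound multiplicatively. That is an assumption for illustration, not a measured result.

```latex
\begin{aligned}
\frac{24\ \mathrm{ms}}{0.08\ \mathrm{ms}} &= 300 \\
(1-0.08)(1-0.25)(1-0.08)(1-0.28) &= 0.92 \cdot 0.75 \cdot 0.92 \cdot 0.72 \approx 0.457
\end{aligned}
```

Under that assumption the four changes together would cut the processing time by roughly 54%, leaving about 46% of the original. The rest of the 300-fold improvement would come from the other optimizations listed on slide 28.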
