Master Meta Data


Organize & manage master meta data centrally, built upon kong, cassandra, neo4j & elasticsearch. Managing master & meta data is a very common problem with no good open source alternative as far as I know, so I am initiating this project: MasterMetaData.

Published in: Software

Master Meta Data

  1. Organize & manage master meta data centrally, built upon kong, cassandra, neo4j & elasticsearch.
  2. Hello! I am Akhil Agrawal. Managing master & meta data is a very common problem with no good open source alternative as far as I know, so I am initiating this project: MasterMetaData. Started BIZense in 2008 & Digikrit in 2015.
  3. Part 1: Problem. Let's start with the problem we are addressing: why mastermetadata?
  4. Why MasterMetaData? Less Frequently Changing: Master data and meta data share one behavior, infrequent change, although their purposes differ. Infrequently changing data, whether it describes real-world entities (master data) or other data (meta data), can be stored, accessed and managed in very similar ways.
  5. Why MasterMetaData? No Open Source Option: There are MDM solutions (mostly from ERP vendors like SAP and Oracle, and analytics companies like Informatica and SAS), but the intersection of master data and meta data has been explored only recently. There are no open source alternatives for smaller companies, or anything that can be embedded in SaaS products.
  6. Part 2: Definitions. Let's start with some definitions around data categories.
  7. Definition of Data Categories. Meta Data: meta information about other forms of data (can describe master, transaction or lower-level meta data). Master Data: real-world entities like customer, partner etc. (only the stable attributes are considered part of master data). Transaction Data: real-world interactions that have a very short lifespan and whose occurrence is linked to time and space (attribute values are unstable; the definition is stable, but each new data point is unique). Master Meta Data: combination of master and meta data defined at application, enterprise or global level (although the volume and variety of master and meta data are very different, they share many access patterns).
  8. Part 3: Implementation. Let's discuss the implementation: the technologies and concepts involved.
  9. Background ◎ Faced difficulty managing master and meta data in previous projects ◎ Implemented a custom solution while building a mobile ad platform ◎ Currently implementing the same features required for the communication platform ◎ Have worked with elasticsearch + kibana, while kong + cassandra seem useful
  10. Built With the Following Technologies. neo4j: highly scalable native graph database that leverages data relationships as first-class entities and handles evolving data challenges. elasticsearch: search and analyze data in real time; the de facto standard for making data accessible through search and aggregations. cassandra: the right choice when you need linear scalability and high availability without compromising performance & durability. kong: the open source management layer for APIs and microservices, delivering security, high performance and reliability. lua: a powerful, fast, lightweight, embeddable scripting language, used for writing kong plugins that give access to various master meta data. kibana: explore and visualize data in elasticsearch; an open source project from the elasticsearch team with an intuitive interface, visualizations & dashboards.
  11. Open Source, Scalable, Searchable, Ready to Use: Project mastermetadata needs to be ready to use for at least a few use cases like location, device, movie, tour etc.
  12. Why neo4j for mastermetadata? Challenges: complex & hierarchical data sets; real-time query performance; dynamic structure; evolving relationships. Why neo4j: native graph store; flexible schema; performance and scalability; high availability. Referenced from
  13. Why elasticsearch for mastermetadata? Scale: ◎ Real-Time Data ◎ Massively Distributed ◎ High Availability ◎ Multitenancy ◎ Per-Operation Persistence. Search: ◎ Full-Text Search ◎ Document-Oriented ◎ Schema-Free ◎ Developer-Friendly, RESTful API ◎ Built on top of Apache Lucene™. Analytics: ◎ Real-Time Advanced Analytics ◎ Very flexible Query DSL ◎ Flexible analytics & visualization platform: Kibana ◎ Real-time summary and charting of streaming data. Referenced from
  14. Why kong for mastermetadata? Secure, Manage & Extend your APIs and Microservices: RESTful Interface; Plugin Oriented; Platform Agnostic. [Diagram: Without Kong / With Kong] Referenced from
  15. Part 4: Interesting. What interesting things are happening around this?
  16. Master & Metadata Management Intersection. Maximized Metadata Model: ◎ the data model describing the metadata needs to be "maximized" to cover as many use cases as possible ◎ the metadata model needs to include all metadata in the organization as well as cover the master data ◎ governance of the metadata model requires the ability to describe as much of the system's metadata as possible, so that data describing other data can be governed. Minimalistic Master Data Model: ◎ the data model describing master data needs to be "minimalist" ◎ the master data model is neither inclusive of all data in the organization, nor specific to the applications using it for a specific purpose ◎ central governance of master data requires that the data model backing it be minimalistic, so that it can be governed without application-specific details ◎ the master data model is basically metadata describing the master data. Referenced from on-metadata-and-master-data-management-intersection/
  17. From Big Data To Smart Data: Zero Latency Organization. Types Of Latency. data: ◎ latency linked to the data (capturing) ◎ latency linked to analytical processes (processing). structural: ◎ latency linked to decision making processes ◎ time needed to implement actions linked with decisions. action: ◎ data latency added to structural latency ◎ time needed from the capturing of data until the action takes place. Smart Data. value: data is considered smart based on the value it brings to decision making and action taking (rather than anything else like size, source, etc.). master data, which represents real-world entities and remains stable over time, is smart data because it helps with common data reference. meta data, which describes other data, whether master, transactional or lower-level meta data, is also smart data because it helps in understanding.
  18. Part 5: Get Involved. Let's discuss ways to get involved in this project.
  19. Areas where you can get involved. DEMO: Functional Tests, Integration Tests, Run Demo. CODE: Implement Ideas, Fix Bugs, Enhance Features. DOCUMENT: User Documentation, Developer Documentation.
  20. Current Focus. Devices (Storage: Device, Browser, OS; Access: User Agent). Locations (Storage: Country, State, City; Access: IP Address). Tours (Storage: People, Interest, Culture, Destination, City, Activity, Duration; Access: What, Where, For).
  21. Storage & Access. Master Data Storage: storage that is highly efficient for reads while remaining efficient for writes. The stored data must also be searchable, with a flexible, efficient query interface to enable faster access. Meta Data Storage: storage that is highly flexible in defining relationships such as inheritance, composition or others. Graph-modeled relationships are the most flexible to change as the model evolves. Meta Data Access: CRUD, Fill in the blanks, Semantic Query, Search. Master Data Access: CRUD, Query (Structured / Unstructured) & Search. Diagram featured by
  22. References: 10steps_DataCategories.pdf; 04/26/more-on-metadata-and-master-data-management-intersection/; management/
  23. Thanks! Any questions? You can find me at: @digikrit. Special thanks to all the people who made and released these awesome resources for free: presentation template by SlidesCarnival; presentation models by SlideModel & PoweredTemplate; and the companies behind kong, cassandra, neo4j & elasticsearch.
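
The split described on the Storage & Access slide, with meta data stored as flexible relationships such as inheritance and master data accessed through CRUD and search, can be illustrated with a small in-memory sketch. This is not the project's implementation: `MetaType`, `MasterStore` and their methods are hypothetical names chosen for illustration, and plain Python dictionaries stand in for neo4j, cassandra and elasticsearch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetaType:
    """Meta data: describes which attributes a master data entity may have."""
    name: str
    attributes: tuple[str, ...]
    parent: MetaType | None = None  # inheritance edge, as a graph model might store it

    def all_attributes(self) -> tuple[str, ...]:
        """Own attributes plus everything inherited from parent types."""
        inherited = self.parent.all_attributes() if self.parent is not None else ()
        return inherited + self.attributes


@dataclass
class MasterStore:
    """Master data: stable records checked against a MetaType (hypothetical)."""
    meta: MetaType
    records: dict[str, dict[str, str]] = field(default_factory=dict)

    def create(self, key: str, **values: str) -> None:
        # Reject attributes that the meta model does not describe.
        unknown = set(values) - set(self.meta.all_attributes())
        if unknown:
            raise ValueError(f"attributes not in meta model: {sorted(unknown)}")
        self.records[key] = dict(values)

    def read(self, key: str) -> dict[str, str] | None:
        return self.records.get(key)

    def search(self, text: str) -> list[str]:
        # Naive case-insensitive substring match over all attribute values.
        needle = text.lower()
        return sorted(
            key
            for key, record in self.records.items()
            if any(needle in value.lower() for value in record.values())
        )


# Location example from the Current Focus slide: Country, State, City.
place = MetaType("Place", ("name",))
city = MetaType("City", ("country", "state"), parent=place)

store = MasterStore(city)
store.create("pune", name="Pune", state="Maharashtra", country="India")
store.create("austin", name="Austin", state="Texas", country="USA")

print(city.all_attributes())  # ('name', 'country', 'state')
print(store.search("india"))  # ['pune']
```

Keeping the meta model separate from the records mirrors the idea that the master data model is itself metadata describing the master data: the store only accepts attributes that the meta type, including its parents, declares.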