Online Available Data Services: a Primer

Data as a Service (DaaS) is a new paradigm in which data is delivered on demand to be consumed within platform and third-party services. As open data gains mainstream adoption at every level of government around the world, public sector organizations are increasingly looking to participate in data ecosystems and drive adoption of their data as fuel for innovation. In this context, open data speeds up the economy by combining not only government open data but also heterogeneous, large and rapidly changing datasets from public sources such as social networks, DBpedia (Wikipedia) and many more. The business idea is to design and implement an effective platform able to collect, aggregate and interlink data accessible via APIs, in order to enable the creation of new apps and services for customers. The main pillar of the whole construction is the ability to abstract both the data store and the data access patterns, combining a big data architecture with a traditional one depending on data type and volume. Large datasets may live on a Hadoop or HBase cluster, and real-time data may use Cassandra as the store of truth with relaxed transactional guarantees, but all of them keep the same APIs.

  1. Online available data services: a primer. Silvano Galasso, Michele Piunti
  2. Agenda
     • Where is the data?
     • The way to use data
     • APP-ify the data
     • Technological perspective
     • Prototype example
  3. Where is the data?
  4. Data is everywhere: public sector organizations are increasingly looking to participate in data ecosystems and drive adoption of their data as fuel for innovation.
  5. Open data as the main source: open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. In this context, open data speeds up the economy by combining not only government open data but also heterogeneous, large and rapidly changing datasets from public sources such as social networks, DBpedia (Wikipedia) and many more.
  6. G8 Open Data Charter: under the UK presidency, the recent G8 Summit (17-18 June 2013) ratified an Open Data Charter. Open data is a global drive:
     • to strengthen transparency, innovation and exchange between people and countries;
     • to fuel better outcomes in public services such as health, education, public safety, environmental protection, governance, etc.;
     • to provide a catalyst for innovation in the private sector, supporting the creation of new markets, businesses, and jobs.
     2013-2015: time for planning and implementation. https://www.gov.uk/government/publications/open-data-charter
  7. Where open data is: http://census.okfn.org/, https://nycopendata.socrata.com/, https://dati.lombardia.it ... and counting
  8. The way to use data
  9. Legal restrictions, privacy, licenses: there may be multiple legal or regulatory restrictions on the use of the data.
  10. Data owners and APIs: third parties offer public data as valuable services, with APIs freely available under certain usage quotas. Source: John Musser, ProgrammableWeb
  11. 5★ Open Data: Tim Berners-Lee, the inventor of the Web and initiator of Linked Data, suggested a 5-star deployment scheme for open data:
     ★ make your public stuff available on the Web (whatever format, e.g. .jpeg, .pdf) under an open license
     ★★ make it available as structured data (e.g. Excel instead of an image scan of a table)
     ★★★ use non-proprietary formats (e.g. CSV instead of Excel)
     ★★★★ use URIs to denote things, so that people can point at your stuff
     ★★★★★ link your data to other data to provide context
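As a rough illustration of how the scheme is cumulative, the sketch below scores a dataset against the five criteria; a star counts only if all the lower ones are met. The `Dataset` fields and the `star_rating` helper are hypothetical names chosen for this example, not part of any official tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (fields are illustrative)."""
    on_web: bool           # reachable on the Web
    open_license: bool     # published under an open license
    structured: bool       # machine-readable structure, not an image scan
    non_proprietary: bool  # open format, e.g. CSV rather than Excel
    uses_uris: bool        # things are identified by URIs
    linked: bool           # links to other datasets for context


def star_rating(d: Dataset) -> int:
    """Return the 5-star level; each star requires all the previous ones."""
    criteria = [
        d.on_web and d.open_license,  # ★
        d.structured,                 # ★★
        d.non_proprietary,            # ★★★
        d.uses_uris,                  # ★★★★
        d.linked,                     # ★★★★★
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A CSV dump under an open license, without URIs or links: three stars.
csv_dump = Dataset(True, True, True, True, False, False)
print(star_rating(csv_dump))  # 3
```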
  12. Linked Open Data: the recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs, OWL and RDF. Linked Open Data
     1. requires ontologies to be applied to data;
     2. allows heterogeneous nodes to be traversed in a semantically coherent fashion.
     Examples: http://live.dbpedia.org/page/Adoration_of_the_Magi_of_1475_(Botticelli), http://live.dbpedia.org/page/Primavera_(painting), http://live.dbpedia.org/page/Sandro_Botticelli
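A minimal sketch of the Linked Data idea, assuming a plain Python set of (subject, predicate, object) tuples in place of a real RDF store and SPARQL endpoint. The resource identifiers reuse the DBpedia pages listed above; the `ex:paintedBy` and `ex:name` predicates are hypothetical placeholders for properties of a real ontology.

```python
DBP = "http://live.dbpedia.org/page/"
BOTTICELLI = DBP + "Sandro_Botticelli"
PRIMAVERA = DBP + "Primavera_(painting)"
ADORATION = DBP + "Adoration_of_the_Magi_of_1475_(Botticelli)"

# Each fact is a (subject, predicate, object) triple; predicates are hypothetical.
triples = {
    (PRIMAVERA, "ex:paintedBy", BOTTICELLI),
    (ADORATION, "ex:paintedBy", BOTTICELLI),
    (BOTTICELLI, "ex:name", "Sandro Botticelli"),
}


def objects(subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def subjects(predicate, obj):
    """All s such that (s, predicate, obj) is in the graph."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


# Traverse: from one painting to its painter, then back to all his works.
painter = objects(PRIMAVERA, "ex:paintedBy")[0]
print(objects(painter, "ex:name"))        # ['Sandro Botticelli']
print(subjects("ex:paintedBy", painter))  # both painting URIs
```

Because every node is a URI, the same traversal would keep working if some triples came from DBpedia and others from a local dataset.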
  13. Data could be linked:
     • open government data (municipal, regional, national) and community data (geographic, media, scientific, encyclopedic) can be linked as Linked Open Data;
     • data from third parties (Facebook, Twitter, LinkedIn, Google) could be linked under certain conditions.
  14. APP-ify the data
  15. Data modeling (Identify, Integrate, Store, Process, Visualize)
     • Unstructured sources: forums, blogs, social networks and Web data from which discussions are extracted.
     • Structured sources: operational databases, CRM, SCM, ERP and other tools from which information is collected.
     • Metadata ingestion: the selected information, enriched with metadata, can create relationships between authors, websites, forums, etc.
     • Information acquisition: the information is collected through several connectors, without any structure or filtering mechanism.
     • Data organization: the metadata and information are stored in a distributed environment.
     • KPI generation: the data are processed to produce a KPI summary.
     • Organization: the data are categorized through KPI calculations.
     • Data calculation: the data are modeled through calculation and statistical instruments.
     • Data analysis: applying statistical models enhances the quality of the information.
     • Use: the data are aggregated to create and summarize the results of the analysis.
     • Display: the data are displayed through a reporting environment to visualize the results.
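The stages above could be sketched as a chain of small functions. This is a toy, in-memory illustration under assumed data: the forum posts, the CRM rows and the posts-per-segment KPI are all hypothetical, and a real pipeline would rely on connectors and a distributed store.

```python
from collections import Counter

# Identify: one unstructured source (forum posts) and one structured source (CRM).
forum_posts = [
    {"author": "anna", "site": "forum.example", "text": "Great bus service"},
    {"author": "marco", "site": "forum.example", "text": "Bus late again"},
]
crm_rows = [{"customer": "anna", "segment": "commuter"}]


def integrate(posts, rows):
    """Integrate: enrich each post with metadata (the customer segment) from the CRM."""
    segment = {r["customer"]: r["segment"] for r in rows}
    return [dict(p, segment=segment.get(p["author"], "unknown")) for p in posts]


def store(records, db):
    """Store: append records to an in-memory stand-in for the distributed store."""
    db.extend(records)
    return db


def compute_kpis(db):
    """Process: a toy KPI, the number of posts per customer segment."""
    return Counter(r["segment"] for r in db)


def display(kpis):
    """Visualize: a plain-text report."""
    return "\n".join(f"{seg}: {n}" for seg, n in sorted(kpis.items()))


db = store(integrate(forum_posts, crm_rows), [])
print(display(compute_kpis(db)))
```

Keeping each stage a separate function mirrors the slide's separation of concerns: a connector or a storage backend can be swapped without touching the KPI or reporting steps.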
  16. Data as a Service (DaaS): develop an easy-to-use platform that offers dataset management (collecting, aggregating and interlinking data accessible via APIs) in order to enable the creation of new apps and services for customers.
  17. Business case example: just as Google indexes and searches information from the Web, we are able to index, collect and expose data through APIs.
  18. Technological perspective: collect data and produce APIs
  19. Data source aggregations
     • Mash up diverse datasets: after shaping a table into the form you want, join it with another to uncover the hidden relationships between them.
     • Integrate heterogeneous data sources: many open datasets could be the source of a complex big data system.
     • Develop connectors: the connectors ingest the data into the system.
     • …and so forth
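To make the mashup idea concrete, here is a minimal sketch of joining two already-shaped tables on a shared key. The district datasets, their figures and the `inner_join` helper are hypothetical; a real platform would typically push such joins down to its storage layer.

```python
# Two hypothetical open datasets, already shaped into rows keyed by district.
population = [
    {"district": "Centro", "residents": 12000},
    {"district": "Nord", "residents": 8000},
]
bus_stops = [
    {"district": "Centro", "stops": 30},
    {"district": "Nord", "stops": 8},
    {"district": "Sud", "stops": 5},
]


def inner_join(left, right, key):
    """Join two lists of dicts on `key`; rows without a match are dropped."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


joined = inner_join(population, bus_stops, "district")
for row in joined:
    # A derived indicator that neither dataset contains on its own.
    row["residents_per_stop"] = row["residents"] / row["stops"]
    print(row["district"], row["residents_per_stop"])
```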
  20. Data models and technologies
     • Document: document-oriented storage; full index support; replication and high availability; auto-sharding; querying; GridFS
     • Linked Data: graph model for data representation; full ACID transactions; native storage engine; massively scalable; multiple graph query languages
     • MapReduce and HDFS: distributed file system; JobTracker; TaskTracker; log and file streams; real-time analysis; fast data access; sensors and IoT
     • Relational: transactional operations; ACID-based; entity-relationship model; legacy systems; user administration
  21. Architecture overview
     • Data sources: open data, private data, public data; CRM, ERP, DWH; RDF
     • APIs and data connectors: Spring Data, JDBC
     • Storage
     • Process and analytics
     • Data and APIs provider: APIs for users and API clients
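One way to read the architecture is as a single API in front of interchangeable stores, so that connectors and API clients never depend on where a dataset lives. The sketch below assumes hypothetical `Store`, `InMemoryStore` and `DataApi` classes; the in-memory store merely stands in for a relational, document, HBase or Cassandra backend, and the sketch does not use Spring Data or JDBC.

```python
from typing import Iterable, Protocol


class Store(Protocol):
    """The single access contract every backend must satisfy."""

    def save(self, dataset: str, record: dict) -> None: ...

    def find(self, dataset: str, **criteria) -> list: ...


class InMemoryStore:
    """Stand-in for a real backend (relational, document, HBase, Cassandra...)."""

    def __init__(self) -> None:
        self._data: dict = {}

    def save(self, dataset: str, record: dict) -> None:
        self._data.setdefault(dataset, []).append(record)

    def find(self, dataset: str, **criteria) -> list:
        # Equality filter on every given field.
        return [r for r in self._data.get(dataset, [])
                if all(r.get(k) == v for k, v in criteria.items())]


class DataApi:
    """API provider: routes each dataset to a store, exposes one interface."""

    def __init__(self, routes: dict, default: Store) -> None:
        self._routes = routes    # dataset name -> dedicated store
        self._default = default  # store for everything else

    def _store(self, dataset: str) -> Store:
        return self._routes.get(dataset, self._default)

    def ingest(self, dataset: str, records: Iterable[dict]) -> int:
        """Entry point for connectors; returns the number of records saved."""
        count = 0
        for record in records:
            self._store(dataset).save(dataset, record)
            count += 1
        return count

    def query(self, dataset: str, **criteria) -> list:
        return self._store(dataset).find(dataset, **criteria)


# Usage: sensor readings go to one store, everything else to a default one.
realtime, catalog = InMemoryStore(), InMemoryStore()
api = DataApi(routes={"sensors": realtime}, default=catalog)
api.ingest("sensors", [{"id": 1, "zone": "A"}, {"id": 2, "zone": "B"}])
api.ingest("museums", [{"name": "Uffizi", "city": "Florence"}])
print(api.query("sensors", zone="A"))  # [{'id': 1, 'zone': 'A'}]
print(api.query("museums"))            # [{'name': 'Uffizi', 'city': 'Florence'}]
```

The routing table is the only place that knows about storage choices, which is how "all keep the same APIs" could be achieved while volume or velocity decides the backend.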
  22. Prototype example: a first POC
  23. Enriching Open Linked Data: we may recognize a few contingencies in our scenario:
     • exponential growth in data volumes;
     • rise of connectedness;
     • increase in degrees of semi-structure;
     • structures and schemas emerge rather than being pre-defined upfront.
     Key facts:
     • Volume: the size of the stored data;
     • Velocity: the rate at which data changes over time;
     • Variety: the degree to which data is regularly or irregularly structured, dense or sparse, and, importantly, connected or disconnected.
  24. Graph theory: graph theory was pioneered by Euler in the 18th century and has received multidisciplinary contributions across the centuries. A graph is an ordered pair G = (V, E) comprising a set V of vertices (or nodes) together with a set E of edges (or lines), which are 2-element subsets of V. One trick is to search for “graph based approach to” followed by your problem.
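The definition translates almost literally into code. In this sketch (function and variable names are ours), edges are `frozenset`s so that {u, w} and {w, u} are the same 2-element subset of V; the final check is the handshaking lemma, which holds for any such graph.

```python
def make_graph(V: set, E: set) -> dict:
    """Build G = (V, E) as an adjacency map, checking that every edge is a
    2-element subset of V (a simple, undirected graph)."""
    adjacency = {v: set() for v in V}
    for edge in E:
        if len(edge) != 2 or not edge <= V:
            raise ValueError(f"not a 2-element subset of V: {set(edge)}")
        u, w = edge
        adjacency[u].add(w)
        adjacency[w].add(u)
    return adjacency


V = {"a", "b", "c", "d"}
E = {frozenset(pair) for pair in [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]}
G = make_graph(V, E)
degrees = {v: len(neighbours) for v, neighbours in G.items()}
print(degrees["c"])  # 3

# Handshaking lemma: the degrees add up to twice the number of edges.
assert sum(degrees.values()) == 2 * len(E)
```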
  25. Six degrees
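The slide carries only its title; reading “six degrees” as the six-degrees-of-separation idea, the number of hops between two people is a shortest-path length, which breadth-first search computes on an unweighted graph. The acquaintance network and the `degrees_of_separation` name are hypothetical.

```python
from collections import deque


def degrees_of_separation(graph, source, target):
    """Breadth-first search: hops on a shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


# Hypothetical acquaintance network (undirected, stored in both directions).
friends = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
    "erin": [],
}
print(degrees_of_separation(friends, "alice", "dave"))  # 3
print(degrees_of_separation(friends, "alice", "erin"))  # None
```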
  27. Storing data in graphs
     • Facebook, Google and Twitter have centered their business models around their own proprietary distributed graph technologies.
     • Graph databases store information in ways that much more closely resemble the way the world is organized and the way humans “think about” data.
     • Top 10 Gartner IT technologies in 2013: “[..] are designed to support new transaction, interaction and observation use cases involving web scale, mobile, cloud and clustered environments”
     • Facebook: The Associations and Objects (TAO) data store, https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920
     • Twitter: FlockDB, https://github.com/twitter/flockdb
  28. Neo4j stack: data storage and traversing; data access and processing; data import (batch import)
  29. From relational to graph-based modeling: graph databases make relationships first-class abstractions of the data model. (A Graph)-[:RECORDS_DATA_IN]->(Nodes)-[:WHICH_HAVE]->(Properties); (Nodes)-[:LINKED_BY]->(Relationships).
     • A graph contains nodes and relationships.
     • Nodes contain properties (key-value pairs).
     • Relationships are named, directed and always have a start and an end node.
     • Relationships can also contain properties.
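The four bullet points map naturally onto a few data classes. This is an illustrative in-memory model with hypothetical class names, not Neo4j's API; a real graph database would also index and persist these structures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Relationship:
    name: str                                       # relationships are named,
    start: Node                                     # directed from a start node
    end: Node                                       # to an end node,
    properties: dict = field(default_factory=dict)  # and may hold properties too


class Graph:
    """A graph contains nodes and relationships."""

    def __init__(self) -> None:
        self.nodes: list = []
        self.relationships: list = []

    def add_node(self, **properties) -> Node:
        node = Node(len(self.nodes), properties)
        self.nodes.append(node)
        return node

    def relate(self, start: Node, name: str, end: Node, **properties) -> Relationship:
        rel = Relationship(name, start, end, properties)
        self.relationships.append(rel)
        return rel

    def outgoing(self, node: Node, name: str) -> list:
        """End nodes of relationships called `name` that start at `node`."""
        return [r.end for r in self.relationships if r.start is node and r.name == name]


g = Graph()
alice = g.add_node(name="Alice")
report = g.add_node(title="Open data report")
g.relate(alice, "OWNS", report, since=2013)
print([n.properties["title"] for n in g.outgoing(alice, "OWNS")])  # ['Open data report']
```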
  30. From relational to graph-based modeling: shake an RDBMS while keeping all the relationships, and you'll see a graph. Where RDBMSs are optimized for aggregated data, graph databases are optimized for highly connected data.
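The slogan can be sketched by turning rows into nodes and join-table rows into named relationships. The tables below are hypothetical. In a native graph database relationships are stored alongside their nodes, so a traversal does not need a join at query time; this in-memory version only illustrates the mapping and still scans a list.

```python
# Hypothetical relational data: two entity tables and a join table of foreign keys.
users = [(1, "Alice"), (2, "Bob")]                    # (user_id, name)
documents = [(10, "Budget 2013"), (11, "Bus lines")]  # (doc_id, title)
ownership = [(1, 10), (1, 11), (2, 11)]               # (user_id, doc_id)

# Rows become nodes, keyed by (label, primary key).
nodes = {("User", uid): {"name": name} for uid, name in users}
nodes.update({("Document", did): {"title": title} for did, title in documents})

# Each join-table row becomes a named, directed relationship.
relationships = [(("User", uid), "OWNS", ("Document", did)) for uid, did in ownership]


def owned_titles(user_id):
    """Follow OWNS relationships instead of joining tables."""
    return sorted(nodes[end]["title"] for start, rel, end in relationships
                  if start == ("User", user_id) and rel == "OWNS")


print(owned_titles(1))  # ['Budget 2013', 'Bus lines']
```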
  31. Graph-directed infrastructure (Neo4j): data storage and traversing; data access and processing; data import (batch import); enterprise management; visualization; API connector; API provider
  32. Spring Data Neo4j: it is possible to derive queries for domain entities from the names of finder methods (e.g. methods returning Iterable<T>). @Indexed fields are converted into index lookups in the START clause, navigation along relationships is reflected in the MATCH clause, and properties with operators end up as expressions in the WHERE clause.
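To show the idea of deriving a query from a method name, the toy function below parses a name such as `findByNameAndAgeGreaterThan` into a Cypher-like string. It is a hypothetical simplification: it handles only `And` and two operator suffixes, emits only MATCH and WHERE (no START-clause index lookups, no relationship navigation), and does not reproduce Spring Data Neo4j's actual parser or generated queries.

```python
import re

# Toy operator suffixes (a hypothetical subset); no suffix means equality.
OPERATORS = [("GreaterThan", ">"), ("LessThan", "<")]


def derive_query(method_name: str, label: str) -> str:
    """Turn a finder name like 'findByNameAndAgeGreaterThan' into a
    Cypher-like query with one $paramN placeholder per criterion."""
    match = re.fullmatch(r"findBy(\w+)", method_name)
    if not match:
        raise ValueError(f"not a finder method: {method_name}")
    conditions = []
    # Naive split: a property whose name contains "And" would break this toy.
    for i, part in enumerate(match.group(1).split("And")):
        op = "="
        for suffix, symbol in OPERATORS:
            if part.endswith(suffix):
                part, op = part[: -len(suffix)], symbol
                break
        prop = part[0].lower() + part[1:]  # 'Age' -> 'age'
        conditions.append(f"n.{prop} {op} $param{i}")
    return f"MATCH (n:{label}) WHERE {' AND '.join(conditions)} RETURN n"


print(derive_query("findByNameAndAgeGreaterThan", "Person"))
# MATCH (n:Person) WHERE n.name = $param0 AND n.age > $param1 RETURN n
```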
  33. Linking data
  34. Open Linked Graph: User
  35. Open Linked Graph: a User [:OWNS] two Documents.
  36. Open Linked Graph: a User [:OWNS] two Documents, and each Document [:INCLUDES] three Nodes.
  37. Open Linked Graph: a User [:OWNS] Documents, each Document [:INCLUDES] Nodes, and Nodes are connected to Venues via [:LOCATED] and to DBpedia URIs via [:DBP_LINKED]; the resulting graph is exposed through an Open API.
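A traversal over this graph can be sketched with a plain adjacency map. The ids, the venue and the exact wiring (which node each relationship starts from) are assumptions read off the diagram; the DBpedia URI reuses one of the pages listed for Linked Open Data.

```python
from collections import defaultdict

# Adjacency: (start, relationship name) -> list of end nodes. All ids are hypothetical.
edges = defaultdict(list)


def relate(start, rel, end):
    edges[(start, rel)].append(end)


relate("user:1", "OWNS", "doc:a")
relate("doc:a", "INCLUDES", "node:x")
relate("doc:a", "INCLUDES", "node:y")
relate("node:x", "LOCATED", "venue:uffizi")
relate("node:x", "DBP_LINKED", "http://live.dbpedia.org/page/Primavera_(painting)")
relate("node:y", "LOCATED", "venue:uffizi")


def follow(starts, rel):
    """One traversal step: all ends reachable from `starts` via `rel`, no duplicates."""
    out = []
    for s in starts:
        for e in edges[(s, rel)]:
            if e not in out:
                out.append(e)
    return out


def linked_context(user):
    """User -[:OWNS]-> Document -[:INCLUDES]-> Node -[:LOCATED|:DBP_LINKED]-> context."""
    nodes = follow(follow([user], "OWNS"), "INCLUDES")
    return {"venues": follow(nodes, "LOCATED"), "dbpedia": follow(nodes, "DBP_LINKED")}


print(linked_context("user:1"))
```

A result like this is what an Open API endpoint could return for a user: their own content enriched with venues and links into DBpedia.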
  38. Thanks. Silvano Galasso, Michele Piunti
