Successfully reported this slideshow.
Your SlideShare is downloading. ×

Data Lakehouse Symposium | Day 1 | Part 1

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad

Check these out next

1 of 43 Ad

Data Lakehouse Symposium | Day 1 | Part 1

Download to read offline

The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.

Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.

Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.

This is an educational event.

Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.

The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.

Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.

Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.

This is an educational event.

Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.

Advertisement
Advertisement

More Related Content

Slideshows for you (20)

Advertisement

More from Databricks (20)

Advertisement

Recently uploaded (20)

Data Lakehouse Symposium | Day 1 | Part 1

  1. 1. A presentation by W H Inmon DATA LAKEHOUSE – THE BASIC ELEMENTS
  2. 2. All data in the corporation Structured data Textual data Analog/IoT data
  3. 3. Structured data Textual data Analog/IoT data Each of the different types of data have their own unique characteristics
  4. 4. Structured data Usually transaction based record key attribute index Bank transactions Point of sale Telephone call Payments made Payments received …………………….
  5. 5. Structured data The same record type is repeated Each record has different contents
  6. 6. Textual data Medical records Contracts Internet Call centers Warranty claims Insurance claims Email ……………….. Text is found everywhere English Spanish Portuguese French Mandarin Korean German Formal language Slang Acronyms …………….. Voice Written Internet Video ………………..
  7. 7. Textual data Textual ETL taxonomies Text is transformed Into a structured format
  8. 8. Analog/IoT Machine generated Drones Electric eye Temperature gauge Speed Mechanical Telemetry …………………….
  9. 9. Analog/IoT telemetry Date Sept 2, 2021 Time 11:21 am Location from Denver Location to Co Spgs Elevation 786 792 812 854 901 978 1012 1256 1469 1672 2018 2259 2871 …….. Speed 0 35 79 124 197 276 367 416 521 702 835 915 ….. Telemetry data is generated as the rocket is launched and is measured throughout the flight
  10. 10. Analog/IoT The data lake is created by throwing data all the data into the lake Textual data Structured data
  11. 11. Analog/IoT Soon the data lake turned into a swamp
  12. 12. Analog/IoT The data swamp was not good for anyone….
  13. 13. Analog/IoT The data lake needs to be turned into a lakehouse
  14. 14. Analog/IoT All this education and 95% of my job is being a data garbageman Data scientist
  15. 15. Analog/IoT Data scientist Ah, that’s more like it infrastructure
  16. 16. Analog/IoT Machine generated Time – 0912 Time – 0916 Time – 1002 Time – 1008 Time – 1017 ……………. Basic, raw measurements High probability High performance Low probability Bulk storage Analog/IoT data is often segmented
  17. 17. High probability High performance Low probability Bulk storage Date of launch Ultimate speed Ultimate height Final landing point Second by second measurements
  18. 18. Structured data Textual data Analog/IoT data Relative volumes of data in each sector
  19. 19. Structured data Textual data Analog/IoT data Business value and the volumes of data
  20. 20. Structured data Textual data Analog/IoT data Relational format Raw data format From a format standpoint, the structured and the textual environments are very different from the analog/IoT environment Format compatibility
  21. 21. Structured data Textual data Analog/IoT data Key compatibility – very unintegrated Content compatibility
  22. 22. Structured data Textual data Analog/IoT data In order to do analytics, there must be some common data on which to do a comparison Without common data it is very difficult to do a meaningful comparison
  23. 23. Structured data Textual data Analog/IoT data The problem is that there may be no obvious, easy way to isolate common identifiers
  24. 24. Structured data Textual data Analog/IoT data Fortunately there are such things as universal common connectors
  25. 25. Structured data Textual data Analog/IoT data Universal common connectors exist regardless of the way that data has been collected
  26. 26. Structured data Textual data Analog/IoT data Universal common connector for anything geography time dollar amount General common connectors
  27. 27. Structured data Textual data Analog/IoT data Universal common connector for humans gender age race Common connectors for humans
  28. 28. Structured data Textual data Analog/IoT data Universal common connector for physical objects weight color cost size shape Common connectors for objects
  29. 29. SOME EXAMPLES Universal common connector
  30. 30. Healthcare – outcomes analysis
  31. 31. Did the medicine work? Did the vaccination work? Did the operation have the right effect? Outcome analysis
  32. 32. Structured data Textual data Analog/IoT data Prolia Estrogen Vitamin D Algaecal Calcitonin Sales of - Doctor’s notes tests diagnosis procedure medication history …………… X rays date location patient age examination results
  33. 33. Structured data Textual data Analog/IoT data What medicines have been purchased What medicines have been prescribed and/or discussed with doctors By state By age By gender By state By age By gender What outcomes have been achieved By state By age By gender
  34. 34. What medicines have been purchased By state By age By gender What medicines have been prescribed and/or discussed with doctors By state By age By gender What outcomes have been achieved By state By age By gender Analyses – how does treatment in Utah vary from treatment in Oregon? is Prolia more effective than estrogen? when patients are treated with Algaecal, what other side effects are noticed? do women have better results than men? how much does age affect – the types of treatment for osteoporosis the effectiveness of treatment whether men react differently than women
  35. 35. What medicines have been purchased By state By age By gender What medicines have been prescribed and/or discussed with doctors By state By age By gender What outcomes have been achieved By state By age By gender When you have both treatment and outcome data together, you can answer – for the first time – important questions about treatment, medication, dosage, side, effects, demographics of treatment You can match outcome with treatment
  36. 36. What medicines have been purchased By state By age By gender What medicines have been prescribed and/or discussed with doctors By state By age By gender What outcomes have been achieved By state By age By gender The result is healthier people and longer life and better quality of life
  37. 37. Manufacturing
  38. 38. Structured data Textual data Analog/IoT data Sales data unit sold date of sale location of sale customer address Warranty claims unit unit type defect severity in use desc Manufacturing data unit id lot id date of manufacture machine used operator
  39. 39. Textual data Structured data Analog/IoT data Units sold Date of sale Location of sale Unit id Defect description Date of warranty Unit id Machine used for manufacture Date of manufacture Operator Lot id Manufacture telemetry Unit id Unit id Unit id
  40. 40. Units sold Date of sale Location of sale Unit id Defect description Date of warranty Unit id Machine used for manufacture Date of manufacture Operator Lot id Manufacture telemetry Analyses – what manufacturing machines are producing defects what manufacturing machines are not producing defects what operators are producing defects what operators are not producing defects what telemetry needs to be adjusted under what conditions are defects created …………………………………………………………
  41. 41. Units sold Date of sale Location of sale Unit id Defect description Date of warranty Unit id Machine used for manufacture Date of manufacture Operator Lot id Manufacture telemetry With all of this data together and able to be analyzed you can now tell what defects can be corrected and what conditions cause defects to occur. The manufacturing process can be materially improved
  42. 42. Units sold Date of sale Location of sale Unit id Defect description Date of warranty Unit id Machine used for manufacture Date of manufacture Operator Lot id Manufacture telemetry Now manufacturing can be done efficiently and in a cost effective manner
  43. 43. With analytics from the data lakehouse, you can improve the lives and livelihood of many people

×