Course 4: Big Data Structuring, Integration and Management Systems by Daan Gerits

For more info about our Big Data courses, check out our website ➡️ https://www.betacowork.com/big-data/
---------
"Data is the new oil" - Many companies and professionals do not know how to use their data or are not aware of the added value they could gain from it.

It is in response to these problems that the project “Brussels: The Beating Heart of Big Data” was born.

This project, financed by the Brussels-Capital Region and organised by Betacowork, offers 3 training cycles of 10 courses on big data, at both beginner and advanced levels. These 3 cycles will be followed by a Hackathon weekend.

No prerequisites are required to start these courses. The aim of these courses is to familiarize participants with the principles of Big Data.
------

  1. Big Data Structuring, Modeling, Managing: course by Daan Gerits
  2. Content
     - 01 Anatomy: dive into the techniques that make data systems scale.
     - 02 Data at scale: what is so different about working with data the traditional way vs. the big data way?
     - 03 Data models: an overview of the most popular types of data models.
     - 04 Advice: so what to make of all this?
  3. About the speaker: Daan Gerits (@daangerits), data unicorn, technopreneur and founder
     - Data expert at design is dead
     - Co-founder of Fitchain.io
     - Co-founder of Bigdata.be
  4. 01 Anatomy: discover the techniques that make data systems scale
     - Replication and partitioning
     - System load
  5. Replication
     - What? Copy data across physical nodes.
     - Why? Improve reliability and fault tolerance.
     - How? Create replicas of the data and keep them in sync.
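
A minimal in-memory sketch of replication, assuming a hypothetical `ReplicatedStore` that writes every value synchronously to all live replicas and serves reads from the first live replica that has the key. It only illustrates the "keep copies in sync, survive a node failure" idea; quorums, asynchronous replication and conflict resolution are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical storage node holding a plain dictionary."""
    name: str
    data: dict = field(default_factory=dict)
    up: bool = True


class ReplicatedStore:
    """Keeps a full copy of every key on each replica (synchronous replication)."""

    def __init__(self, replicas):
        self.replicas = replicas

    def set(self, key, value):
        # Write to every live replica so the copies stay in sync.
        for node in self.replicas:
            if node.up:
                node.data[key] = value

    def get(self, key):
        # Read from the first live replica: one failed node does not lose data.
        for node in self.replicas:
            if node.up and key in node.data:
                return node.data[key]
        raise KeyError(key)


nodes = [Node("a"), Node("b"), Node("c")]
store = ReplicatedStore(nodes)
store.set("users.214.name", "Daan Gerits")
nodes[0].up = False  # simulate a node failure
print(store.get("users.214.name"))  # prints "Daan Gerits"
```

Note that a node which is down during a write misses that value; real systems need a repair or catch-up mechanism to bring it back in sync.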
  6. Partitioning
     - What? Partition the data and distribute it across physical nodes.
     - Why? Scale data systems.
     - How? Choose a logical partitioning key; records with the same partitioning key go to the same node.
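
The "same partitioning key goes to the same node" rule can be sketched with hash partitioning. This is one common technique, not necessarily the one the course has in mind; `partition_for` and `distribute` are hypothetical helpers. A stable hash (MD5) is used because Python's built-in `hash()` is randomized per process for strings.

```python
import hashlib


def partition_for(key: str, num_nodes: int) -> int:
    """Map a partitioning key to a node index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def distribute(records, key_fn, num_nodes):
    """Group records per node using the logical partitioning key."""
    nodes = {i: [] for i in range(num_nodes)}
    for record in records:
        nodes[partition_for(key_fn(record), num_nodes)].append(record)
    return nodes


invoices = [
    {"customer": "214", "total": 980.03},
    {"customer": "583", "total": 20.83},
    {"customer": "214", "total": 38.73},
]
# Partition by customer: both invoices of customer 214 land on the same node.
placement = distribute(invoices, lambda r: r["customer"], num_nodes=4)
```

With a plain modulo, changing the number of nodes moves most keys to a different node; consistent hashing is the usual way to limit that reshuffling.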
  7. Load
     - Read heavy: most of the operations are read operations.
     - Write heavy: most of the operations are write operations.
     - Balanced: the number of read operations is about equal to the number of write operations.
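
Knowing which load profile a system has is an input to choosing its storage design. A small sketch that labels a workload from operation counts; the `tolerance` threshold is an illustrative assumption, not a value from the course.

```python
def classify_load(reads: int, writes: int, tolerance: float = 0.1) -> str:
    """Label a workload as read heavy, write heavy or balanced.

    `tolerance` is a hypothetical threshold: the workload counts as balanced
    when the share of reads is within `tolerance` of 50% of all operations.
    """
    total = reads + writes
    if total == 0:
        return "balanced"
    read_share = reads / total
    if abs(read_share - 0.5) <= tolerance:
        return "balanced"
    return "read heavy" if read_share > 0.5 else "write heavy"


print(classify_load(900, 100))  # prints "read heavy"
```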
  8. How you store the data depends on how you query the data.
  9. 02 Data at scale: to seasoned data professionals, many of these techniques and approaches do not seem so different from what they have been doing for decades. So what is so different?
  10. At the core of big data is the ability to deal with the volume, variety and velocity of data.
  11. Big data is all about new ways of thinking about data.
  12. Think different
      - Operational: automate your processes through the use of data.
      - Business: change the metrics you use to measure success.
      - Personal: data makes people important again, and this doesn't stop with the customer.
  13. Traditional approach [diagram: supply, one model, three requests]
  14. Big data approach [diagram: supply, three models, three requests]
  15. 03 Data models: how you want to retrieve your data affects how you store it. The following data models offer near-standard approaches for doing so.
  16. How data is stored
      - Graph: a data model built out of nodes and their connections.
      - Column family: a seriously powerful but complex data model, ideal for sparse data.
      - Key-value (KV): a very simple data model mapping a key to a value.
      - Document: a data model where the structure of every value can be different.
  17. Key-value example
      KEY                         VALUE
      users.214.name              Daan Gerits
      users.214.birthdate         18/05/1983
      users.214.roles             [user, admin]
      users.214.isSubscribed      true
      users.214.social.twitter    @daangerits
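
The dotted keys in the example suggest that a nested user record is flattened into one key-value pair per field. A sketch of that flattening, assuming the `users.<id>.<field>` convention from the slide; `flatten` is a hypothetical helper, and lists are stored as a single value, as `users.214.roles` is.

```python
def flatten(prefix: str, value, out: dict) -> dict:
    """Flatten nested dictionaries into dotted keys, one pair per leaf value."""
    if isinstance(value, dict):
        for name, child in value.items():
            flatten(f"{prefix}.{name}", child, out)
    else:
        out[prefix] = value  # scalars and lists are stored as-is
    return out


user = {
    "name": "Daan Gerits",
    "birthdate": "18/05/1983",
    "roles": ["user", "admin"],
    "isSubscribed": True,
    "social": {"twitter": "@daangerits"},
}
pairs = flatten("users.214", user, {})
```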
  18. Key-value characteristics
      - Fast lookups, but no way to query the data; scanning is possible if the keys are ordered.
      - Flexible value types: key and value can be anything, even collections and more complex data structures.
      - Easy to scale: there are little to no dependencies between key-value pairs, although ordering can become difficult to scale.
      - Use cases: caches, configuration.
  19. Key-value operations
      - SCAN <prefix>: scan through all pairs whose key matches the given prefix. This is only possible if the keys are ordered.
      - GET <key>: get a key-value pair by its key.
      - SET <key> <value>: set the value of the given key.
      - DELETE <key>: remove the pair with the given key.
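
The four operations can be sketched as a small in-memory class. `OrderedKV` is hypothetical; it keeps the keys in a sorted list so that SCAN can jump to the first matching key with a binary search and stop at the first key that no longer matches, which is why ordered keys are a prerequisite for scanning.

```python
import bisect


class OrderedKV:
    """Minimal key-value store that keeps its keys sorted so SCAN works."""

    def __init__(self):
        self._keys = []  # sorted list of keys
        self._data = {}  # key -> value

    def set(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan(self, prefix):
        # Jump to the first key >= prefix, then walk forward while it matches.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._data[self._keys[i]]
            i += 1


kv = OrderedKV()
kv.set("users.214.name", "Daan Gerits")
kv.set("users.214.birthdate", "18/05/1983")
kv.set("users.583.name", "Wim Van Leuven")
kv.delete("users.214.birthdate")
print(list(kv.scan("users.214.")))  # [('users.214.name', 'Daan Gerits')]
```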
  20. Document example
      KEY     DOCUMENT
      daan    { "name": "Daan Gerits", "birthday": "18/05/1983" }
      wim     { "name": "Wim Van Leuven", "company": "Highestpoint" }
  21. Document characteristics
      - Queryable: a technology-specific query language; a separate index needs to be kept in sync.
      - Flexible value types: the key can be anything; the value is a structured type (JSON, BSON, XML, …).
      - Scalability requires caution: relationships between documents, and scaling search can become a hurdle.
      - Use cases: search engines, entity data stores.
  22. Document operations
      - FIND <query>: find all documents matching the given query.
      - GET <key>: get the document matching the given key.
      - CREATE <key> <document>: create a new document with the given key.
      - UPDATE <key> <field> <value>: update the given field within the document with the given key.
      - DELETE <key>: remove the document with the given key.
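
A sketch of these operations that also shows why "a separate index needs to be kept in sync": every CREATE, UPDATE and DELETE must update the secondary index as well as the document. `DocumentStore` is hypothetical, and its `find` supports only field equality, far simpler than a real technology-specific query language. The value `"ExampleCorp"` is made up for illustration.

```python
import copy


class DocumentStore:
    """Minimal document store with one secondary index kept in sync on writes."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.docs = {}   # key -> document (dict)
        self.index = {}  # indexed field value -> set of keys

    def _unindex(self, key):
        value = self.docs[key].get(self.indexed_field)
        if value is not None:
            self.index[value].discard(key)

    def _reindex(self, key):
        value = self.docs[key].get(self.indexed_field)
        if value is not None:
            self.index.setdefault(value, set()).add(key)

    def create(self, key, document):
        if key in self.docs:
            raise KeyError(f"{key} already exists")
        self.docs[key] = copy.deepcopy(document)
        self._reindex(key)

    def get(self, key):
        return self.docs.get(key)

    def update(self, key, field, value):
        self._unindex(key)  # drop the stale index entry first
        self.docs[key][field] = value
        self._reindex(key)

    def delete(self, key):
        self._unindex(key)
        del self.docs[key]

    def find(self, **query):
        """Return the sorted keys of documents whose fields equal all given values."""
        if self.indexed_field in query:
            candidates = self.index.get(query[self.indexed_field], set())
        else:
            candidates = self.docs.keys()  # no index: full scan
        return sorted(k for k in candidates
                      if all(self.docs[k].get(f) == v for f, v in query.items()))


store = DocumentStore(indexed_field="company")
store.create("daan", {"name": "Daan Gerits", "birthday": "18/05/1983"})
store.create("wim", {"name": "Wim Van Leuven", "company": "Highestpoint"})
before = store.find(company="Highestpoint")    # ["wim"]
store.update("wim", "company", "ExampleCorp")  # hypothetical new value
after = store.find(company="Highestpoint")     # []
```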
  23. Graph example
      - Nodes: 1 Daan (Tutor), 2 Els (Tutor), 3 bigdata (Course), 4 Amy (Student)
      - Edges: teaches, teaches, friend of, enrolled in
  24. Graph characteristics
      - Relationships are first-class citizens: graph traversal uses a specific language, and updating relationships is cheap.
      - Easy concepts: nodes with properties, and edges.
      - Very hard to scale: "Golden Ratio"; scaling requires deep knowledge of the data.
      - Use cases: social modeling, metadata stores.
  25. Graph operations
      - LINK <type> <src-node-id> <target-node-id>: create a new link with the given characteristics.
      - UNLINK <type> <src-node-id> <target-node-id>: remove the link with the given characteristics.
      - GET <node-id>: get the node with the given node id.
      - SET <node-id> <properties>: set the properties of the node with the given id.
      - DELETE <node-id>: remove the node with the given id.
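
A sketch of the graph operations using adjacency sets, a hypothetical in-memory design rather than any particular graph database. Adding or removing a relationship touches only two sets, which reflects why updating relationships is cheap, and `neighbors` is a single traversal step. The usage assumes that both tutors teach the course and that Amy is enrolled in it; the "friend of" edge is left out because its endpoints are not recoverable from the slide.

```python
class Graph:
    """Minimal property graph: nodes with properties plus typed, directed links."""

    def __init__(self):
        self.nodes = {}     # node id -> properties
        self.outgoing = {}  # node id -> {(link type, target id)}
        self.incoming = {}  # node id -> {(link type, source id)}

    def set(self, node_id, properties):
        self.nodes[node_id] = dict(properties)
        self.outgoing.setdefault(node_id, set())
        self.incoming.setdefault(node_id, set())

    def get(self, node_id):
        return self.nodes.get(node_id)

    def link(self, link_type, src, target):
        if src not in self.nodes or target not in self.nodes:
            raise KeyError("both nodes must exist before linking")
        # A new relationship only touches two adjacency sets.
        self.outgoing[src].add((link_type, target))
        self.incoming[target].add((link_type, src))

    def unlink(self, link_type, src, target):
        self.outgoing.get(src, set()).discard((link_type, target))
        self.incoming.get(target, set()).discard((link_type, src))

    def delete(self, node_id):
        # Removing a node also removes every link that touches it.
        for link_type, target in self.outgoing.pop(node_id, set()):
            self.incoming.get(target, set()).discard((link_type, node_id))
        for link_type, src in self.incoming.pop(node_id, set()):
            self.outgoing.get(src, set()).discard((link_type, node_id))
        self.nodes.pop(node_id, None)

    def neighbors(self, node_id, link_type, incoming=False):
        """One traversal step along links of the given type."""
        adjacency = self.incoming if incoming else self.outgoing
        return sorted(other for t, other in adjacency.get(node_id, set())
                      if t == link_type)


g = Graph()
g.set(1, {"name": "Daan", "type": "Tutor"})
g.set(2, {"name": "Els", "type": "Tutor"})
g.set(3, {"name": "bigdata", "type": "Course"})
g.set(4, {"name": "Amy", "type": "Student"})
g.link("teaches", 1, 3)
g.link("teaches", 2, 3)
g.link("enrolled in", 4, 3)
tutors = g.neighbors(3, "teaches", incoming=True)         # [1, 2]
g.delete(2)
after_delete = g.neighbors(3, "teaches", incoming=True)   # [1]
```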
  26. Column family example
      KEY           | DEFAULT                     | INVOICES
                    | name           | birthday   | 2018/001             | 20../... | 2019/483
      customers/214 | Daan Gerits    | 18/05/1983 | { total: 980.03, … } | ...      | { total: 38.73, … }
      customer/583  | Wim Van Leuven | 10/05/1973 | { total: 20.83, … }  | ...      | { total: 378.60, … }
  27. Column family characteristics
      - Seemingly trivial concepts: table, row key, column family, column.
      - Hard to reason about: dynamic column names.
      - Optimized for retrieval: very fast, with all data, including related data, in one request.
      - Use cases: analytical stores.
  28. Column family operations
      - SCAN <prefix>: scan through all records whose key matches the given prefix.
      - GET <key> <column_family> [, <column_family>]: get the given column families for the given key.
      - SET <key> <value>: set the value of the given key.
      - DELETE <key>: remove the record with the given key.
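
A sketch of a column-family table as nested dictionaries: row key, then column family, then free-form (dynamic) column names. `ColumnFamilyTable` is hypothetical, and its `set(key, family, column, value)` signature is an assumption that spells out which family and column the slide's `SET <key> <value>` writes to. Only the invoice totals from the example are stored. One GET returns the customer's details and related invoices together.

```python
import bisect


class ColumnFamilyTable:
    """Minimal column-family table: row key -> family -> {column: value}."""

    def __init__(self, families):
        self.families = set(families)
        self.row_keys = []  # kept sorted so prefix scans are possible
        self.rows = {}

    def set(self, key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        if key not in self.rows:
            bisect.insort(self.row_keys, key)
            self.rows[key] = {}
        # Column names are free-form, so each row can have different columns.
        self.rows[key].setdefault(family, {})[column] = value

    def get(self, key, *families):
        row = self.rows.get(key, {})
        return {f: dict(row.get(f, {})) for f in families}

    def delete(self, key):
        if key in self.rows:
            del self.rows[key]
            self.row_keys.pop(bisect.bisect_left(self.row_keys, key))

    def scan(self, prefix):
        i = bisect.bisect_left(self.row_keys, prefix)
        while i < len(self.row_keys) and self.row_keys[i].startswith(prefix):
            yield self.row_keys[i], self.rows[self.row_keys[i]]
            i += 1


t = ColumnFamilyTable(["default", "invoices"])
t.set("customers/214", "default", "name", "Daan Gerits")
t.set("customers/214", "default", "birthday", "18/05/1983")
t.set("customers/214", "invoices", "2018/001", {"total": 980.03})
t.set("customers/214", "invoices", "2019/483", {"total": 38.73})
row = t.get("customers/214", "default", "invoices")  # one request
total = sum(inv["total"] for inv in row["invoices"].values())
```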
  29. 04 Advice: so how to deal with all of this?
  30. The data model for writing can differ from the data model for reading.
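
One way to separate the write model from the read model is a CQRS-style layout. The slide does not name a pattern; this is an illustrative choice. Here writes go to an append-only event list, and a second structure is shaped for a single question ("how much has this customer been invoiced?"). `record_invoice` is a hypothetical helper. The read side is updated synchronously here, while real systems often update it asynchronously.

```python
from collections import defaultdict

# Write model: an append-only list of invoice events (simple, cheap writes).
events = []

# Read model: running totals per customer, shaped for one question.
totals_by_customer = defaultdict(float)


def record_invoice(customer_id, invoice_id, total):
    event = {"customer": customer_id, "invoice": invoice_id, "total": total}
    events.append(event)                      # write side
    totals_by_customer[customer_id] += total  # keep the read side in sync


record_invoice("583", "2018/001", 20.83)
record_invoice("583", "2019/483", 378.60)
# Answering the question is now a single lookup, not a pass over all events.
invoiced = totals_by_customer["583"]
```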
  31. Always start from the questions you need to answer.
  32. If you need a join, you most likely did it wrong!
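
A sketch of what avoiding a join can look like: the same customer data stored normalized (answering the question needs a join across two collections) and denormalized (invoices stored inside the customer record, so one key lookup answers it). The helper names are hypothetical. The cost is duplication: every write must keep the embedded copy up to date.

```python
# Normalized layout: customer details and invoices live in separate collections.
customers = {"214": {"name": "Daan Gerits"}}
invoices = [
    {"customer": "214", "invoice": "2018/001", "total": 980.03},
    {"customer": "214", "invoice": "2019/483", "total": 38.73},
]


def report_with_join(customer_id):
    # Join: look up the customer, then scan invoices for matching rows.
    name = customers[customer_id]["name"]
    totals = [i["total"] for i in invoices if i["customer"] == customer_id]
    return name, totals


# Denormalized layout built for the question: related data sits in one record.
customers_denormalized = {
    "214": {
        "name": "Daan Gerits",
        "invoices": {"2018/001": 980.03, "2019/483": 38.73},
    }
}


def report_single_lookup(customer_id):
    record = customers_denormalized[customer_id]  # one key lookup
    return record["name"], list(record["invoices"].values())
```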
  33. Questions? @daangerits
