Data Abstraction for Large Web Applications

Published in: Technology

  • Years and years ago, when the web was young, state was maintained simply by the creation of a database. Web applications were mostly small, and databases could easily handle the traffic that was being sent their way. Most of us learned how to write web applications against a database. Most of us used the “LAMP stack”: Linux, Apache, MySQL and PHP.
  • As the web grew up, and grew bigger, methods for obtaining, storing and using data changed. Developers began using data sources provided by others, first over SOAP, then REST. Other data stores, such as NoSQL databases, Redis, Elasticsearch and Memcache, came along to complicate things. It was no longer all about the database. The database was just one piece of the puzzle.
  • Yet if we take a good look at most of the frameworks available, they’re database-centric. For a long time, Doctrine support for other data layers was non-existent. Support for something other than a database in Django is non-existent. We still think in a database-centric way. Our data layers are still database-focused.
  • The bottom line: we need to change our thinking. Databases are not the whole story. Even for applications that start against a database (and that’s most if not all of them), we need to think about the other ways that we’ll ingest data.
  • This lesson was painful for those of us working on Socorro. Socorro was initially built as a database-centric application, and we’ve slowly expanded our technology stack as new needs have arisen. While much of our webapp data comes from Postgres, we’ve begun a process of moving our data layer to a more source-agnostic middleware layer.
  • It’s clear to us that a database-centric model doesn’t work anymore. We can’t think of data in terms of rows and columns. It doesn’t work like that. So how do we solve this problem?
  • Large web applications don’t pursue abstraction as an art form. They pursue it as a necessity. Failing to properly abstract a large web application can result in catastrophic failure. It is therefore important to separate the layer that retrieves data from storage from the layers that use the data. Here’s an example...
  • When programmers are in a hurry they often don’t take the time to abstract their code in a way that makes it easy to come along later and make changes. I’ve seen this example hundreds of times in codebases I’ve worked on; many of you probably have too. The problem is that if the data source ever changes from some SQL-based database to something else, a programmer will have to rewrite the logic here and everywhere else it appears. This makes the cost of transition much higher than it has to be.
  • It would make good sense to therefore abstract the process of
  • We should instead use adapters to query the data and return it in an agreed-upon format. The processing takes place elsewhere.
  • NAP story. Data layer Postgres focused.
  • When the retrieval and processing are combined, it makes it that much harder to remove one from the other in the future.
  • When you think in terms of actions, rather than data sources, you don’t care what happens behind the scenes. Instead, you start caring about the finished product. In Socorro, we have reports that use both HBase and Postgres data. If we cared about the data source, we’d have many more calls than we need.
  • If we use JSON as a standard data format throughout our app, we can construct generic objects easily without worrying about what methods are automatically available to us.
  • Rather than relying upon model-constructed or ORM-built objects, we should create our own when and if the need arises.
  • It’s okay to process the results from a database query into some standard format or create an object using the data. But once the data has been retrieved, it should be pushed into a standard format that can be used in the app without caring about what the data source was.
  • Developers are drawn to things that are new, cool, or otherwise unique and special. But it’s important to use the correct storage medium for the data at hand.
  • Socorro uses Elasticsearch (not a NoSQL database) and HBase. We should have used Cassandra, but we have HBase instead.
  • External APIs and the file system are all valid data storage mechanisms. Just because we write database-driven applications doesn’t mean our data storage has to be entirely a database. A REST API to an external resource is a valid data storage mechanism that isn’t database-driven (at least as far as your app is concerned).
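The adapter idea from the notes above can be sketched in PHP roughly as follows. This is a minimal illustration, not Socorro's actual code: the `Adapter` interface and `Data_Model` names come from the slides, while `InMemorySqlAdapter`, `InMemoryJsonAdapter` and the hard-coded rows are hypothetical stand-ins for a real database and a real REST API.

```php
<?php
// Hypothetical contract: every adapter returns a list of stdClass rows.
interface Adapter
{
    /** @return stdClass[] */
    public function queryData(): array;
}

// Stand-in for a SQL-backed adapter; the rows are hard-coded for the sketch.
class InMemorySqlAdapter implements Adapter
{
    public function queryData(): array
    {
        $rows = [
            ['id' => 1, 'name' => 'alice'],
            ['id' => 2, 'name' => 'bob'],
        ];
        // Turn each associative row into the common format.
        return array_map(function (array $row): stdClass {
            return (object) $row;
        }, $rows);
    }
}

// Stand-in for a REST-backed adapter; the JSON payload is hard-coded.
class InMemoryJsonAdapter implements Adapter
{
    public function queryData(): array
    {
        $payload = '[{"id": 3, "name": "carol"}]';
        // json_decode() without the associative flag yields stdClass objects.
        return json_decode($payload);
    }
}

class Data_Model
{
    private $adapter;

    public function __construct(Adapter $adapter)
    {
        $this->adapter = $adapter;
    }

    /** @return string[] */
    public function getSomeData(): array
    {
        $data = $this->adapter->queryData();
        // Processing relies only on the common format, never on the source.
        return array_map(function (stdClass $row): string {
            return ucfirst($row->name);
        }, $data);
    }
}

// Swapping the data source means swapping one constructor argument.
$fromSql = (new Data_Model(new InMemorySqlAdapter()))->getSomeData();
$fromApi = (new Data_Model(new InMemoryJsonAdapter()))->getSomeData();
```

Because `Data_Model` only ever sees `stdClass` rows, the processing code would not need to change if the SQL adapter were later replaced by one that reads from an API or a search index.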
  • Data Abstraction for Large Web Applications

    1. Data Abstraction for Large Web Applications, by Brandon Savage
    2. Who Am I?
        • Software developer at Mozilla working on Socorro
        • Author of the PHP Playbook
        • Former frequent blogger on PHP topics
        • Private pilot in my spare time
    3. Data Abstraction For LARGE Web Applications
    4. No magic bullets
    5. Once upon a time... In a galaxy far far away...
    6. Eventually the web grew up. And grew larger.
    7. Most webapps still start as though they’ll always use a database.
    8. We need to change our thinking.
    9. Socorro
    10. Socorro Data Sources
        • Postgres
        • REST API (Middleware)
        • Elasticsearch
        • HBase
        • Bugzilla REST API
        • Memcache
    11. A database-centric model just doesn’t work anymore.
    12. Solving the problem
        • Separate the use of data from the retrieval of data.
        • Think in terms of actions.
        • Build our applications to be storage agnostic.
        • Use the correct data storage medium.
    13. #1 Separate the use of data from the retrieval of data
    14. <?php
        class MainPage_Controller {
            /* ... */
            public function do_something()
            {
                /* ... */
                // The SQL lives in the controller: retrieval and use are coupled.
                $sql = 'SELECT * FROM database';
                $results = $this->execute($sql);
                return $this->executeView('index', array('results' => $results));
            }
        }
    15. <?php
        class Data_Model {
            /* ... */
            public function get_some_data()
            {
                // Retrieval and processing still share one method.
                $sql = 'SELECT * FROM database';
                $results = $this->execute($sql);
                /** process results **/
                return $processedResults;
            }
        }
    16. Processing the data is a separate layer.
    17. <?php
        class Data_Model {
            /* ... */
            public function getSomeData()
            {
                // The model asks the adapter for data and only processes it.
                $data = $this->adapter->queryData();
                /** process data here **/
                return $processedData;
            }
        }
        class Data_Model_Adapter extends MySQL_Adapter implements Adapter
        {
            public function queryData()
            {
                $sql = 'SELECT * FROM table';
                /** turn into common format **/
                return $commonFormatData;
            }
        }
    18. Swapping out data sources becomes very simple.
    19. A cautionary tale
    20. Move to middleware in Socorro
    21. Make life easier on yourself: do it right the first time!
    22. #2 Think in terms of actions.
    23. Actions move beyond SELECT, INSERT, UPDATE and DELETE.
    24. Domain Modeling: “What are you modeling?”
    25. What do I want? What do I need? What does this data represent?
    26. Django Models: One model per table. All methods relate to SQL. That sucks.
    27. <?php
        abstract class User_Model {
            // Methods without a body must be declared abstract.
            abstract public function loadUser();
            abstract public function authenticateUser();
            abstract public function showUserPhones();
        }
    28. #3 Build our applications to be storage agnostic
    29. Use a standard data format
    30. stdClass()
    31. Create custom objects for typehinting or additional methods
    32. Avoid expecting built-ins like PDOStatement and MongoCursor outside the retrieval layer
    33. #4 Use the correct storage medium.
    34. Example: memcache isn’t for long-term storage.
    35. Example: MongoDB is not for relational data storage.
    36. Relational data goes in relational databases!
    37. Choose the correct NoSQL database for your needs.
    38. Availability, reliability, and consistency. Pick two.
    39. Consider data storage that isn’t a database at all.
    40. Alternative data options
        • Elasticsearch
        • Redis
        • S3
        • The File System (Yes! It still exists!)
    41. Fix it now or fix it later. But you will have to fix it.
    42. Question time
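
Points #2 and #3 can be illustrated together with a small PHP sketch. It assumes two hypothetical data sources, passed in as callables, that both hand back plain `stdClass` data; the `User` class, its `primaryPhone()` method, and the sample values are invented for illustration. Only the `User_Model` and `loadUser` names come from the slides.

```php
<?php
// Custom object, created only for type hinting and extra methods.
class User
{
    public $id;
    public $name;
    public $phones;

    public function __construct(int $id, string $name, array $phones)
    {
        $this->id = $id;
        $this->name = $name;
        $this->phones = $phones;
    }

    public function primaryPhone(): ?string
    {
        return $this->phones[0] ?? null;
    }
}

// Action-oriented model: callers ask for a user, not for rows or documents.
class User_Model
{
    private $profileSource; // callable(int): stdClass
    private $phoneSource;   // callable(int): stdClass[]

    public function __construct(callable $profileSource, callable $phoneSource)
    {
        $this->profileSource = $profileSource;
        $this->phoneSource = $phoneSource;
    }

    public function loadUser(int $id): User
    {
        // Two different sources are combined behind one action.
        $profile = ($this->profileSource)($id);
        $phones = array_map(function (stdClass $phone): string {
            return $phone->number;
        }, ($this->phoneSource)($id));

        return new User($profile->id, $profile->name, $phones);
    }
}

// Stand-in for a relational source: a row cast to stdClass.
$profiles = function (int $id): stdClass {
    return (object) ['id' => $id, 'name' => 'Ada'];
};

// Stand-in for a JSON API: json_decode() already returns stdClass objects.
$phones = function (int $id): array {
    return json_decode('[{"number": "555-0100"}, {"number": "555-0101"}]');
};

$model = new User_Model($profiles, $phones);
$user = $model->loadUser(7);
```

Nothing outside the two closures knows whether the data came from SQL or from JSON, so no `PDOStatement` or cursor type leaks past the retrieval layer.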
