SlideShare a Scribd company logo
Social networking architectures what can we learn
An enthusiasts view 2
How do sites with a social networking angle figure globally? 3 As ranked by Alexa Site		Global ranking Facebook		2 YouTube		3 Yahoo			4 Windows Live	5 Blogger		7 Wikipedia		8 Twitter		10
3 Principles 4 3 common principles Fast feature delivery is key Cache everything everywhere Relational data is dead
Interesting stats 5 	Facebook - 	Serve 120 million queries per second 			without a single join 	37 Signals -	Developed a production application 				serving over 4 million items using 				only 579 lines of code 	Flickr - 		2 Billion photos served without 				using 	relational  databases
How did they do it? 6 Nobody thought this was possible Unencumbered by history or restrictive rules Had to be creative in solving problems that nobody had experienced using very little capital outlay
3 Principles 7 Fast feature delivery is key Cache everything everywhere Relational data is dead
Fast feature delivery is key 8 Choose an appropriate language Speed of development more important than speed of execution Languages like PHP and Ruby commonly used for rapid development and deployment
Language is not religion 9
3 Principles 10 Fast feature delivery is key Cache everything everywhere Relational data is dead
Cache everything everywhere 11 You need a really good reason not to cache data for reading Local caching a good start but more than one server means duplicating the cache no group invalidation memory limited to how much spare RAM on the server Most social networks use a distributed cache like memcached
Cache everything everywhere 12 Check if the information is in the cache. If so, use it If not, query the database put the result in the cache On update delete from the cache. The next user goes to the database function get_foo(int userid) {  	result = memcached_fetch("userrow:" + userid);  	if (!result) {  		result = db_select("SELECT * FROM users WHERE userid = ?", userid);  		memcached_add("userrow:" + userid, result);  	} return result;
Responsivness is key 13
3 Principles 14 Fast feature delivery is key Cache everything everywhere Relational data is dead
Everybody wants to use a database 15
Relational issue No 1 - Normalisation 16 Relational databases do not scale well because of normalisation Why normalise? 			- reduce storage space 			- reduce anomalies Today  			- storage is cheap 			- as data gets larger, joins are expensive
Relational issue No 2 - Transactions 17 ACID principles govern transactions Relational databases do not scale well because of transactions
After relational 18 Use BASE (basically available, soft state, eventually consistent) Shard Data Favour Name value pair stores over relational databases
Lessons for enterprise 19 Design of software should always be it depends. Test your most basic assumptions Dynamic languages and frameworks may be suitable to deliver a feature quickly You don't need an RDBMS for everything, especially if you need huge scale You should always cache data for read (unless you shouldn’t)
Fresh ideas always welcome 20
Find me here  21 MarkGreville@itarc.ie
Or find me here 22

More Related Content

Viewers also liked

Iasa Architect responsibilities in the cloud
Iasa Architect responsibilities in the cloudIasa Architect responsibilities in the cloud
Iasa Architect responsibilities in the cloud
iasaglobal
 
Cita iasa certifications
Cita iasa certificationsCita iasa certifications
Cita iasa certificationsAdams Firdaus
 
Why certify
Why certifyWhy certify
Why certify
Matt Deacon
 
Iasa, Iasa Ireland, ICS Jan 2011
Iasa, Iasa Ireland, ICS Jan 2011Iasa, Iasa Ireland, ICS Jan 2011
Iasa, Iasa Ireland, ICS Jan 2011iasaireland
 
The Role of the Software Architect
The Role of the Software ArchitectThe Role of the Software Architect
The Role of the Software Architect
Hayim Makabee
 
Architecting multi sided business
Architecting multi sided businessArchitecting multi sided business
Architecting multi sided business
Richard Veryard
 
User story estimation with agile architectures
User story estimation with agile architecturesUser story estimation with agile architectures
User story estimation with agile architectures
Raffaele Garofalo
 
Solution architecture
Solution architectureSolution architecture
Solution architecture
iasaglobal
 
Are You an Accidental or Intention Software Architect
Are You an Accidental or Intention Software ArchitectAre You an Accidental or Intention Software Architect
Are You an Accidental or Intention Software Architect
Randy Ynchausti
 
Software architecture in an agile environment
Software architecture in an agile environmentSoftware architecture in an agile environment
Software architecture in an agile environment
Raffaele Garofalo
 
Business Process Management: Implementing Continuous Improvement in Your Orga...
Business Process Management: Implementing Continuous Improvement in Your Orga...Business Process Management: Implementing Continuous Improvement in Your Orga...
Business Process Management: Implementing Continuous Improvement in Your Orga...
Henry Chandra
 
Platforms or Two-sided markets
Platforms or Two-sided marketsPlatforms or Two-sided markets
Platforms or Two-sided markets
Martin Westhead
 
Structured Approach to Solution Architecture
Structured Approach to Solution ArchitectureStructured Approach to Solution Architecture
Structured Approach to Solution Architecture
Alan McSweeney
 

Viewers also liked (13)

Iasa Architect responsibilities in the cloud
Iasa Architect responsibilities in the cloudIasa Architect responsibilities in the cloud
Iasa Architect responsibilities in the cloud
 
Cita iasa certifications
Cita iasa certificationsCita iasa certifications
Cita iasa certifications
 
Why certify
Why certifyWhy certify
Why certify
 
Iasa, Iasa Ireland, ICS Jan 2011
Iasa, Iasa Ireland, ICS Jan 2011Iasa, Iasa Ireland, ICS Jan 2011
Iasa, Iasa Ireland, ICS Jan 2011
 
The Role of the Software Architect
The Role of the Software ArchitectThe Role of the Software Architect
The Role of the Software Architect
 
Architecting multi sided business
Architecting multi sided businessArchitecting multi sided business
Architecting multi sided business
 
User story estimation with agile architectures
User story estimation with agile architecturesUser story estimation with agile architectures
User story estimation with agile architectures
 
Solution architecture
Solution architectureSolution architecture
Solution architecture
 
Are You an Accidental or Intention Software Architect
Are You an Accidental or Intention Software ArchitectAre You an Accidental or Intention Software Architect
Are You an Accidental or Intention Software Architect
 
Software architecture in an agile environment
Software architecture in an agile environmentSoftware architecture in an agile environment
Software architecture in an agile environment
 
Business Process Management: Implementing Continuous Improvement in Your Orga...
Business Process Management: Implementing Continuous Improvement in Your Orga...Business Process Management: Implementing Continuous Improvement in Your Orga...
Business Process Management: Implementing Continuous Improvement in Your Orga...
 
Platforms or Two-sided markets
Platforms or Two-sided marketsPlatforms or Two-sided markets
Platforms or Two-sided markets
 
Structured Approach to Solution Architecture
Structured Approach to Solution ArchitectureStructured Approach to Solution Architecture
Structured Approach to Solution Architecture
 

Similar to Delivering Data - Social Networking Personal

Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)
Trieu Nguyen
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
markgrover
 
bigdata 2.pptx
bigdata 2.pptxbigdata 2.pptx
bigdata 2.pptx
AjayAgarwal107
 
Performance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and morePerformance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and more
Denodo
 
PyConline AU 2021 - Things might go wrong in a data-intensive application
PyConline AU 2021 - Things might go wrong in a data-intensive applicationPyConline AU 2021 - Things might go wrong in a data-intensive application
PyConline AU 2021 - Things might go wrong in a data-intensive application
Hua Chu
 
Minerva: Drill Storage Plugin for IPFS
Minerva: Drill Storage Plugin for IPFSMinerva: Drill Storage Plugin for IPFS
Minerva: Drill Storage Plugin for IPFS
BowenDing4
 
Eliminating the Problems of Exponential Data Growth, Forever
Eliminating the Problems of Exponential Data Growth, ForeverEliminating the Problems of Exponential Data Growth, Forever
Eliminating the Problems of Exponential Data Growth, Forever
spectralogic
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
Neo4j
 
Key aspects of big data storage and its architecture
Key aspects of big data storage and its architectureKey aspects of big data storage and its architecture
Key aspects of big data storage and its architecture
Rahul Chaturvedi
 
bigdata.pptx
bigdata.pptxbigdata.pptx
bigdata.pptx
VIJAYAPRABAP
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptx
Pankajkumar496281
 
Demystifying Data Virtualization (ASEAN)
Demystifying Data Virtualization (ASEAN)Demystifying Data Virtualization (ASEAN)
Demystifying Data Virtualization (ASEAN)
Denodo
 
Stored-Procedures-Presentation
Stored-Procedures-PresentationStored-Procedures-Presentation
Stored-Procedures-PresentationChuck Walker
 
PostgreSQL as a Strategic Tool
PostgreSQL as a Strategic ToolPostgreSQL as a Strategic Tool
PostgreSQL as a Strategic Tool
EDB
 
A data analyst view of Bigdata
A data analyst view of Bigdata A data analyst view of Bigdata
A data analyst view of Bigdata
Venkata Reddy Konasani
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Demi Ben-Ari
 
NoSQL Basics - a quick tour
NoSQL Basics - a quick tourNoSQL Basics - a quick tour
NoSQL Basics - a quick tour
Bikram Sinha. MBA, PMP
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
MaulikLakhani
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
Denodo
 

Similar to Delivering Data - Social Networking Personal (20)

Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
 
bigdata 2.pptx
bigdata 2.pptxbigdata 2.pptx
bigdata 2.pptx
 
Performance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and morePerformance Acceleration: Summaries, Recommendation, MPP and more
Performance Acceleration: Summaries, Recommendation, MPP and more
 
PyConline AU 2021 - Things might go wrong in a data-intensive application
PyConline AU 2021 - Things might go wrong in a data-intensive applicationPyConline AU 2021 - Things might go wrong in a data-intensive application
PyConline AU 2021 - Things might go wrong in a data-intensive application
 
Minerva: Drill Storage Plugin for IPFS
Minerva: Drill Storage Plugin for IPFSMinerva: Drill Storage Plugin for IPFS
Minerva: Drill Storage Plugin for IPFS
 
Eliminating the Problems of Exponential Data Growth, Forever
Eliminating the Problems of Exponential Data Growth, ForeverEliminating the Problems of Exponential Data Growth, Forever
Eliminating the Problems of Exponential Data Growth, Forever
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
 
Key aspects of big data storage and its architecture
Key aspects of big data storage and its architectureKey aspects of big data storage and its architecture
Key aspects of big data storage and its architecture
 
bigdata.pptx
bigdata.pptxbigdata.pptx
bigdata.pptx
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptx
 
Demystifying Data Virtualization (ASEAN)
Demystifying Data Virtualization (ASEAN)Demystifying Data Virtualization (ASEAN)
Demystifying Data Virtualization (ASEAN)
 
Stored-Procedures-Presentation
Stored-Procedures-PresentationStored-Procedures-Presentation
Stored-Procedures-Presentation
 
PostgreSQL as a Strategic Tool
PostgreSQL as a Strategic ToolPostgreSQL as a Strategic Tool
PostgreSQL as a Strategic Tool
 
A data analyst view of Bigdata
A data analyst view of Bigdata A data analyst view of Bigdata
A data analyst view of Bigdata
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
 
NoSQL Basics - a quick tour
NoSQL Basics - a quick tourNoSQL Basics - a quick tour
NoSQL Basics - a quick tour
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Big data rmoug
Big data rmougBig data rmoug
Big data rmoug
 

Recently uploaded

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 

Recently uploaded (20)

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 

Delivering Data - Social Networking Personal

  • 3. How do sites with a social networking angle figure globally? 3 As ranked by Alexa Site Global ranking Facebook 2 YouTube 3 Yahoo 4 Windows Live 5 Blogger 7 Wikipedia 8 Twitter 10
  • 4. 3 Principles 4 3 common principles Fast feature delivery is key Cache everything everywhere Relational data is dead
  • 5. Interesting stats 5 Facebook - Serve 120 million queries per second without a single join 37 Signals - Developed a production application serving over 4 million items using only 579 lines of code Flickr - 2 Billion photos served without using relational databases
  • 6. How did they do it? 6 Nobody thought this was possible Unencumbered by history or restrictive rules Had to be creative in solving problems that nobody had experienced using very little capital outlay
  • 7. 3 Principles 7 Fast feature delivery is key Cache everything everywhere Relational data is dead
  • 8. Fast feature delivery is key 8 Choose an appropriate language Speed of development more important than speed of execution Languages like PHP and Ruby commonly used for rapid development and deployment
  • 9. Language is not religion 9
  • 10. 3 Principles 10 Fast feature delivery is key Cache everything everywhere Relational data is dead
  • 11. Cache everything everywhere 11 You need a really good reason not to cache data for reading Local caching a good start but more than one server means duplicating the cache no group invalidation memory limited to how much spare RAM on the server Most social networks use a distributed cache like memcached
  • 12. Cache everything everywhere 12 Check if the information is in the cache. If so, use it If not, query the database put the result in the cache On update delete from the cache. The next user goes to the database function get_foo(int userid) { result = memcached_fetch("userrow:" + userid); if (!result) { result = db_select("SELECT * FROM users WHERE userid = ?", userid); memcached_add("userrow:" + userid, result); } return result;
  • 14. 3 Principles 14 Fast feature delivery is key Cache everything everywhere Relational data is dead
  • 15. Everybody wants to use a database 15
  • 16. Relational issue No 1 - Normalisation 16 Relational databases do not scale well because of normalisation Why normalise? - reduce storage space - reduce anomalies Today - storage is cheap - as data gets larger, joins are expensive
  • 17. Relational issue No 2 - Transactions 17 ACID principles govern transactions Relational databases do not scale well because of transactions
  • 18. After relational 18 Use BASE (basically available, soft state, eventually consistent) Shard Data Favour Name value pair stores over relational databases
  • 19. Lessons for enterprise 19 Design of software should always be it depends. Test your most basic assumptions Dynamic languages and frameworks may be suitable to deliver a feature quickly You don't need an RDBMS for everything, especially if you need huge scale You should always cache data for read (unless you shouldn’t)
  • 20. Fresh ideas always welcome 20
  • 21. Find me here 21 MarkGreville@itarc.ie
  • 22. Or find me here 22

Editor's Notes

  1. Looked at top 10 sites on the web found 7 with social networking aspectsOther:Google 1Baidu 6QQ.com 9
  2. Decided to look at the traffic and found some very interesting statsFacebook – 200 million active users & 50 billion page views per monthYouTube – over 1 billion views per dayBasecamp – 2 million active accounts & 1.3 million projects managedTwitter – 1 Million + users & 3 million tweets per day
  3. It should be noted that neither are the most efficient languages as they are not compiled (both are interpreted languages, they are not directly executed by the CPU but executed by an interpreter)Sites like Twitter and Yellowpages.com are written using Ruby on Rails. Tada list – has so much build into the framework that a full production app can be developed with very little code.
  4. Some treat language as a religion, its ok to try something different, it doesn’t define you as a person.
  5. Duplicating the cache is a waste of memoryNo group invalidation means you either need to notify all of your servers that they need to refresh their cache or rely solely on cache timeouts.a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.Memcached is used by: Facebook, YouTube, Wikipedia, LiveJournal, Digg, Twitter, SourceForgeMost site founders said that the biggest gain was from implementing a caching layer
  6. There is a significant penalty in going to disk to read every time as opposed to reading from the cache.Implementing a cache is extremely easy, as shown by the code aboveGreat for reading data, but you still have to write data
  7. All about responsivnessUsers wont tolerate long waits on social networksThey are now expecting this behaviour from all software
  8. To prevent anomalies we don't duplicate data. We split everything up so it is stored once. The price of normalization is that when we want a person's address we have to go find the person and their address and bring the data together again. This is called a join. Joins are relatively slow, especially over very large data sets. Not just for reads (caching takes care of this) but for CUD.Flickr decided to denormalize because it took 13 Selects to each Insert, Delete or Update.
  9. eBay do not use transactions, they have so much data that distributed transactions would harm responsiveness. Referential integrity and sorting are done in application code.Atomicity - all parts of a transaction succeed or none of then succeed.Consistency - The database will be in a consistent state when the transaction begins and ends.Isolation - The transaction will behave as if it is the only operation being performed upon the database.Durability - Upon completion of the transaction, the operation will not be reversed.Facebook has 4500 database servers
  10. All solutions are slightly differentSame challenge in 5 years may have a totally different solution (hardware/software changes)
  11. Need fresh ideas – otherwise well copy the mistakes of others