This document discusses principles for scaling social networking sites. It outlines three key principles: fast feature delivery through rapid development languages like PHP and Ruby, caching everything everywhere using distributed caches like Memcached, and moving away from relational databases towards non-relational databases due to issues with normalization and transactions that prevent scaling. Examples are given of large social networks like Facebook and Flickr that serve massive amounts of queries, photos, and data daily without using relational databases.
3. How do sites with a social networking angle figure globally?
Global ranking by site, as ranked by Alexa:
- Facebook: 2
- YouTube: 3
- Yahoo: 4
- Windows Live: 5
- Blogger: 7
- Wikipedia: 8
- Twitter: 10

4. 3 Principles
3 common principles:
- Fast feature delivery is key
- Cache everything everywhere
- Relational data is dead

5. Interesting stats
- Facebook - Serve 120 million queries per second without a single join
- 37 Signals - Developed a production application serving over 4 million items using only 579 lines of code
- Flickr - 2 Billion photos served without using relational databases

6. How did they do it?
- Nobody thought this was possible
- Unencumbered by history or restrictive rules
- Had to be creative in solving problems that nobody had experienced using very little capital outlay

7. 3 Principles
- Fast feature delivery is key
- Cache everything everywhere
- Relational data is dead

8. Fast feature delivery is key
- Choose an appropriate language
- Speed of development more important than speed of execution
- Languages like PHP and Ruby commonly used for rapid development and deployment

10. 3 Principles
- Fast feature delivery is key
- Cache everything everywhere
- Relational data is dead

11. Cache everything everywhere
- You need a really good reason not to cache data for reading
- Local caching a good start but more than one server means
  - duplicating the cache
  - no group invalidation
  - memory limited to how much spare RAM on the server
- Most social networks use a distributed cache like memcached

12. Cache everything everywhere
- Check if the information is in the cache. If so, use it
- If not, query the database and put the result in the cache
- On update, delete from the cache. The next user goes to the database

function get_foo(int userid) {
    // Try the cache first
    result = memcached_fetch("userrow:" + userid);
    if (!result) {
        // Cache miss: read from the database and store the row for next time
        result = db_select("SELECT * FROM users WHERE userid = ?", userid);
        memcached_add("userrow:" + userid, result);
    }
    return result;
}

16. Relational issue No 1 - Normalisation
- Relational databases do not scale well because of normalisation
- Why normalise?
  - reduce storage space
  - reduce anomalies
- Today
  - storage is cheap
  - as data gets larger, joins are expensive

17. Relational issue No 2 - Transactions
- ACID principles govern transactions
- Relational databases do not scale well because of transactions

18. After relational
- Use BASE (basically available, soft state, eventually consistent)
- Shard Data
- Favour Name value pair stores over relational databases
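
As a rough illustration of sharding a name-value store, the sketch below routes each key to one of several shards by hashing it. Everything here is an assumption for illustration: the shard count, the use of MD5 as a stable hash, the dicts standing in for separate storage servers, and the names shard_for, put and get.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

# Assumption: each dict stands in for a separate name-value store server.
shards = [{} for _ in range(NUM_SHARDS)]


def shard_for(key):
    """Map a key to a shard index using a hash that is stable across processes."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def put(key, value):
    """Store the value on the shard that owns the key."""
    shards[shard_for(key)][key] = value


def get(key, default=None):
    """Read the value from the shard that owns the key."""
    return shards[shard_for(key)].get(key, default)
```

A simple modulo scheme like this moves most keys when the shard count changes. Consistent hashing is a common alternative when shards are added or removed often.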

19. Lessons for enterprise
- The answer to a software design question should always be "it depends"
- Test your most basic assumptions
- Dynamic languages and frameworks may be suitable to deliver a feature quickly
- You don't need an RDBMS for everything, especially if you need huge scale
- You should always cache data for read (unless you shouldn't)

Looked at the top 10 sites on the web and found 7 with social networking aspects. The others: Google (1), Baidu (6), QQ.com (9).

Decided to look at the traffic and found some very interesting stats:
- Facebook: 200 million active users and 50 billion page views per month
- YouTube: over 1 billion views per day
- Basecamp: 2 million active accounts and 1.3 million projects managed
- Twitter: 1 million+ users and 3 million tweets per day

It should be noted that neither PHP nor Ruby is among the most efficient languages, because neither is compiled (both are interpreted: the code is not executed directly by the CPU but by an interpreter).
Sites like Twitter and Yellowpages.com are written using Ruby on Rails.
Tada list: so much is built into the framework that a full production app can be developed with very little code.

Some treat language as a religion. It's OK to try something different; it doesn't define you as a person.

Duplicating the cache is a waste of memory.
No group invalidation means you either need to notify all of your servers that they need to refresh their cache or rely solely on cache timeouts.
Memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
Memcached is used by: Facebook, YouTube, Wikipedia, LiveJournal, Digg, Twitter, SourceForge.
Most site founders said that the biggest gain was from implementing a caching layer.

There is a significant penalty in going to disk for every read as opposed to reading from the cache.
Implementing a cache is extremely easy, as shown by the get_foo code on slide 12.
Great for reading data, but you still have to write data.

All about responsiveness.
Users won't tolerate long waits on social networks.
They are now expecting this behaviour from all software.

To prevent anomalies we don't duplicate data. We split everything up so it is stored once. The price of normalization is that when we want a person's address we have to go find the person and their address and bring the data together again. This is called a join. Joins are relatively slow, especially over very large data sets, and not just for reads (caching takes care of those) but also for creates, updates and deletes (CUD).
Flickr decided to denormalize because it took 13 Selects to each Insert, Delete or Update.
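
The person-and-address example in these notes can be made concrete with a small sketch. It assumes an in-memory SQLite database and hypothetical table names (person, address, person_denorm); it only shows the shape of the two queries and makes no claim about how Flickr laid out its data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Normalized: each fact is stored once, so reading an address needs a join.
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE address (person_id INTEGER, city TEXT);
-- Denormalized: the city is copied onto the person row.
CREATE TABLE person_denorm (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
INSERT INTO person VALUES (1, 'alice');
INSERT INTO address VALUES (1, 'Cape Town');
INSERT INTO person_denorm VALUES (1, 'alice', 'Cape Town');
""")

# Normalized read: find the person and their address, then join them.
normalized = db.execute(
    "SELECT p.name, a.city FROM person p "
    "JOIN address a ON a.person_id = p.id WHERE p.id = ?",
    (1,),
).fetchone()

# Denormalized read: a single-table lookup by primary key.
denormalized = db.execute(
    "SELECT name, city FROM person_denorm WHERE id = ?", (1,)
).fetchone()
```

Both queries return the same row. The denormalized table avoids the join at the cost of storing the city twice, which means every copy has to be updated whenever the address changes.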

eBay does not use transactions; it has so much data that distributed transactions would harm responsiveness. Referential integrity and sorting are done in application code.
- Atomicity: all parts of a transaction succeed or none of them succeed.
- Consistency: the database will be in a consistent state when the transaction begins and ends.
- Isolation: the transaction will behave as if it is the only operation being performed upon the database.
- Durability: upon completion of the transaction, the operation will not be reversed.
Facebook has 4500 database servers.

All solutions are slightly different.
The same challenge in 5 years may have a totally different solution (hardware/software changes).

Need fresh ideas, otherwise we'll copy the mistakes of others.