Facebook's architecture is a LAMP stack, principally PHP and MySQL, extended with Memcache and a set of custom services. PHP enables rapid web development but is difficult to scale for large code bases. MySQL is used primarily as a key-value store and is optimized for recency of data. Memcache provides high-performance, distributed caching. Where other languages are a better fit, services are built with Thrift; these include News Feed, Search, and logging via Scribe. Throughout, the architecture emphasizes scalability, simplicity, and optimizing for the common use case.
4. At a Glance
The Social Graph
120M+ active users
50B+ PVs per month
10B+ Photos
1B+ connections
50K+ Platform Apps
400K+ App Developers
5. General Design Principles
▪ Use open source where possible
▪ Explore making optimizations where needed
▪ Unix Philosophy
▪ Keep individual components simple yet performant
▪ Combine as necessary
▪ Concentrate on clean interface points
▪ Build everything for scale
▪ Try to minimize failure points
▪ Simplicity, Simplicity, Simplicity!
7. PHP
▪ Good web programming language
▪ Extensive library support for web development
▪ Active developer community
▪ Good for rapid iteration
▪ Dynamically typed, interpreted scripting language
8. PHP: What we Learnt
▪ Tough to scale for large code bases
▪ Weak typing
▪ Limited opportunities for static analysis, code optimizations
▪ Not necessarily optimized for large website use case
▪ E.g. No dynamic reloading of files on web server
▪ Linearly increasing cost per included file
▪ Extension framework is difficult to use
10. MySQL
▪ Fast, reliable
▪ Used primarily as <key,value> store
▪ Data randomly distributed amongst large set of logical instances
▪ Most data access based on global id
▪ Large number of logical instances spread out across physical nodes
▪ Load balancing at physical node level
▪ No read replication
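The partitioning scheme above can be sketched in a few lines. This is a simplified model; the instance count, host names, and helper are illustrative assumptions, not Facebook's actual code:

```python
# Simplified model: data is hashed by global id into a large, fixed set of
# logical database instances, and a separate mapping assigns logical
# instances to physical nodes. All names and values are illustrative.

N_LOGICAL = 4096  # many more logical instances than physical nodes

# logical instance -> physical node; rebalancing means editing this map,
# moving whole logical instances without re-hashing any rows
logical_to_physical = {i: f"dbhost{i % 8}" for i in range(N_LOGICAL)}

def locate(global_id: int) -> tuple[int, str]:
    """Map a global id to its logical instance and current physical node."""
    logical = global_id % N_LOGICAL  # uniform distribution by id
    return logical, logical_to_physical[logical]
```

Because load balancing happens at the physical-node level, moving a hot logical instance to a quieter host only changes `logical_to_physical`; the id-to-logical mapping never changes.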
11. MySQL: What We Learnt
▪ Logical migration of data is very difficult
▪ Create a large number of logical dbs, load balance them over a varying number of physical nodes
▪ No joins in production
▪ Logically difficult (because data is distributed randomly)
▪ Easier to scale CPU on web tier
12. MySQL: What We Learnt
▪ Most data access is for recent data
▪ Optimize table layout for recency
▪ Archive older data
▪ Don’t ever store non-static data in a central db
▪ CDB makes it easier to perform certain aggregated queries
▪ Will not scale
▪ Use services or memcache for global queries
▪ E.g.: What are the most popular groups in my network
13. MySQL: Customizations
▪ No extensive native MySQL modifications
▪ Custom partitioning scheme
▪ Global id assigned to all data
▪ Custom archiving scheme
▪ Based on frequency and recency of data on a per-user basis
▪ Extended Query Engine for cross-data center replication, cache consistency
14. MySQL: Customizations
▪ Graph based data-access libraries
▪ Loosely typed objects (nodes) with limited datatypes (int, varchar, text)
▪ Replicated connections (edges)
▪ Analogous to distributed foreign keys
▪ Some data collocated
▪ Example: User profile data and all of user’s connections
▪ Most data distributed randomly
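The graph data-access model above can be illustrated roughly as follows. The `Node`, `connect`, and `neighbors` names are invented for this sketch:

```python
# Rough illustration of the graph layer: loosely typed nodes holding a few
# primitive fields, and replicated connections (edges) stored on both
# endpoints, analogous to distributed foreign keys. Names are assumptions.

from dataclasses import dataclass, field

@dataclass
class Node:
    id: int                                    # global id
    type: str                                  # e.g. "user", "photo"
    data: dict = field(default_factory=dict)   # int/varchar/text-like fields

edges: set[tuple[int, int]] = set()            # replicated connections

def connect(a: Node, b: Node) -> None:
    """Store the edge on both sides, so either endpoint can find it locally."""
    edges.add((a.id, b.id))
    edges.add((b.id, a.id))

def neighbors(n: Node) -> list[int]:
    return sorted(dst for src, dst in edges if src == n.id)
```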
15. Memcache
▪ High-Performance, distributed in-memory hash table
▪ Used to alleviate database load
▪ Primary form of caching
▪ Over 25TB of in-memory cache
▪ Average latency < 200 micro-seconds
▪ Cache serialized PHP data structures
▪ Lots and lots of multi-gets to retrieve data spanning across graph edges
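The cache-aside pattern behind those multi-gets reduces many lookups to one batched round trip. A minimal sketch, in which a plain dict stands in for memcache and `fetch_from_db` is a hypothetical accessor:

```python
# Minimal cache-aside sketch: check the cache for every key at once, then
# fill only the misses from the database. The dict stands in for memcache
# and fetch_from_db for a real MySQL lookup; both are assumptions.

cache: dict[str, object] = {}

def fetch_from_db(key: str) -> object:
    return f"row-for-{key}"  # placeholder for a real database read

def multi_get(keys: list[str]) -> dict[str, object]:
    """Return values for all keys, filling cache misses from the database."""
    found = {k: cache[k] for k in keys if k in cache}
    for k in keys:
        if k not in found:              # miss: fetch and populate the cache
            found[k] = cache[k] = fetch_from_db(k)
    return found
```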
16. Memcache: Customizations
▪ Memcache over UDP
▪ Reduce memory overhead of thousands of TCP connection buffers
▪ Application-level flow control (optimization for multi-gets)
▪ On demand aggregation of per-thread stats
▪ Reduces global lock contention
▪ Multiple Kernel changes to optimize for Memcache usage
▪ Distributing network interrupt handling over multiple cores
▪ Opportunistic polling of network interface
18. Under the Covers
▪ Get my profile data
▪ Fetch from cache, potentially go to my DB (based on user-id)
▪ Get friend connections
▪ Cache, if not DB (based on user-id)
▪ In parallel, fetch last 10 photo album ids for each of my friends
▪ Multi-get; individual cache misses fetch data from the db (based on photo-album id)
▪ Fetch data for most recent photo albums in parallel
▪ Execute page-specific rendering logic in PHP
▪ Return data, make user happy
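Sketched end to end with toy in-memory stores, the page assembly above looks roughly like this; `get_cached` and the key tuples are invented for illustration:

```python
# The request flow above, sketched with toy in-memory stores. Each call
# follows the cache-then-db pattern, and the album fetches would run in
# parallel (via multi-get) in production rather than in these loops.

def get_cached(cache, db, key):
    """Cache first, then database (keyed by user-id / album-id)."""
    if key not in cache:
        cache[key] = db[key]
    return cache[key]

def render_home(user_id, cache, db):
    profile = get_cached(cache, db, ("profile", user_id))
    friends = get_cached(cache, db, ("friends", user_id))
    # multi-get: recent photo album ids for every friend
    album_ids = [a for f in friends
                 for a in get_cached(cache, db, ("albums", f))]
    albums = [get_cached(cache, db, ("album", a)) for a in album_ids]
    return {"profile": profile, "albums": albums}  # page-specific rendering
```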
20. LAMP is not Perfect
▪ PHP+MySQL+Memcache works for a large class of problems but not for everything
▪ PHP is stateless
▪ PHP not the fastest executing language
▪ All data is remote
▪ Reasons why services are written
▪ Store code closer to data
▪ Compiled environment is more efficient
▪ Certain functionality only present in other languages
21. Services Philosophy
▪ Create a service iff required
▪ Real overhead for deployment, maintenance, separate code-base
▪ Another failure point
▪ Create a common framework and toolset that will allow for easier creation of services
▪ Thrift
▪ Scribe
▪ ODS, Alerting service, Monitoring service
▪ Use the right language, library and tool for the task
23. Thrift
▪ Lightweight software framework for cross-language development
▪ Provide IDL, statically generate code
▪ Supported bindings: C++, PHP, Python, Java, Ruby, Erlang, Perl, Haskell etc.
▪ Transports: Simple Interface to I/O
▪ Tsocket, TFileTransport, TMemoryBuffer
▪ Protocols: Serialization Format
▪ TBinaryProtocol, TJSONProtocol
▪ Servers
▪ Non-Blocking, Async, Single Threaded, Multi-threaded
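For flavor, a minimal Thrift IDL file might look like this. The struct and service are invented examples, not from Facebook's code; running `thrift --gen py search.thrift` against such a file would statically generate the client and server stubs:

```thrift
// Illustrative IDL only: struct, service, and method names are examples.
struct Result {
  1: i64 id,
  2: string title,
}

service SearchService {
  // returns up to `limit` results for the query terms
  list<Result> query(1: string terms, 2: i32 limit),
}
```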
24. Hasn’t this been done before? (yes.)
▪ SOAP
▪ XML, XML, and more XML
▪ CORBA
▪ Bloated? Remote bindings?
▪ COM
▪ Face-Win32ClientSoftware.dll-Book
▪ Pillar
▪ Slick! But no versioning/abstraction.
▪ Protocol Buffers
25. Thrift: Why?
• It’s quick. Really quick.
• Less time wasted by individual developers
• No duplicated networking and protocol code
• Less time dealing with boilerplate stuff
• Write your client and server in about 5 minutes
• Division of labor
• Work on high-performance servers separate from applications
• Common toolkit
• Fosters code reuse and shared tools
26. Scribe
▪ Scalable distributed logging framework
▪ Useful for logging a wide array of data
▪ Search Redologs
▪ Powers news feed publishing
▪ A/B testing data
▪ Weak Reliability
▪ More reliable than traditional logging but not suitable for database transactions.
▪ Simple data model
▪ Built on top of Thrift
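The data model really is simple: each entry is a (category, message) pair, buffered locally and forwarded in batches. A toy sketch of the "weak reliability" behavior, with the class name and buffer size invented for illustration:

```python
# Toy model of Scribe-style logging: entries are (category, message)
# pairs buffered locally; when the buffer is full, new entries are
# dropped rather than blocking the application ("weak reliability").

from collections import deque

class ScribeLikeLogger:
    def __init__(self, max_buffer: int = 1000):
        self.buffer: deque = deque()
        self.max_buffer = max_buffer
        self.dropped = 0

    def log(self, category: str, message: str) -> None:
        if len(self.buffer) >= self.max_buffer:
            self.dropped += 1          # drop instead of blocking the caller
            return
        self.buffer.append((category, message))

    def flush(self) -> list:
        """Hand the accumulated batch to a downstream aggregator (stubbed)."""
        batch, self.buffer = list(self.buffer), deque()
        return batch
```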
27. Other Tools
▪ SMC (Service Management Console)
▪ Centralized configuration
▪ Used to determine logical service -> physical node mapping
28. Other Tools
▪ ODS
▪ Used to log and view historical trends for any stats published by service
▪ Useful for service monitoring, alerting
29. Open Source
▪ Thrift
▪ http://developers.facebook.com/thrift/
▪ Scribe
▪ http://developers.facebook.com/scribe/
▪ PHPEmbed
▪ http://developers.facebook.com/phpembed/
▪ More good stuff
▪ http://developers.facebook.com/opensource.php
31. NewsFeed – The Work
[Flattened diagram. Recoverable flow: a user request hits home.php on the PHP web tier; friends' actions are published via Scribe to a set of leaf servers; an aggregator queries the leaf servers for friends' actions, aggregates and ranks them, and returns view state (backed by view-state storage) to PHP, which renders the HTML. Most arrows in the diagram indicate Thrift calls.]
33. Search – The Work
[Flattened diagram. Recoverable flow: the user hits the PHP web tier, which queries a search tier of index servers (one master, several slaves) over Thrift; an indexing service consumes change logs from the live DB tier via Scribe and ships updated index files to the search tier.]