Facebook is a social networking website where users can post comments, share photographs, post links to news or other interesting content on the web, chat live, and watch short-form video. You can even order food on Facebook if that's what you want to do. Shared content can be made publicly accessible, shared only among a select group of friends or family, or shared with a single person.
This document provides an introduction and overview of Apache Hadoop. It discusses how Hadoop provides the ability to store and analyze large datasets in the petabyte range across clusters of commodity hardware. It compares Hadoop to other systems like relational databases and HPC and describes how Hadoop uses MapReduce to process data in parallel. The document outlines how companies are using Hadoop for applications like log analysis, machine learning, and powering new data-driven business features and products.
This document provides an outline for a student talk on NoSQL databases. It introduces NoSQL databases and discusses their characteristics and uses. It then covers different types of NoSQL databases including key-value, column, document, and graph databases. Examples of specific NoSQL databases like MongoDB, Cassandra, HBase, Riak, and Neo4j are provided. The document also discusses concepts like CAP theorem, replication, sharding, and provides comparisons of different database types.
Course "Machine Learning and Data Mining" for the degree of Computer Engineering at the Politecnico di Milano. In this lecture we give an overview of the mining of data streams.
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup, 2013-06-17 - spark-project
Slides from Tathagata Das's talk at the Spark Meetup entitled "Deep Dive with Spark Streaming" on June 17, 2013, in Sunnyvale, California, at Plug and Play. Tathagata Das is the lead developer on Spark Streaming and a PhD student in computer science in the UC Berkeley AMPLab.
A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file-system.
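The map/sort/reduce flow described above can be sketched in a few lines. This is an in-memory illustration of the programming model, not Hadoop itself; the splits and word-count logic are made up for the example.

```python
# A minimal sketch of the MapReduce flow: map over independent input splits,
# sort/shuffle the intermediate (key, value) pairs, then reduce per key.
from itertools import groupby
from operator import itemgetter

def map_phase(split):
    # Emit one (word, 1) pair per word in the split.
    for word in split.split():
        yield (word.lower(), 1)

def reduce_phase(key, values):
    # Aggregate all values seen for one key.
    return (key, sum(values))

splits = ["the quick brown fox", "the lazy dog", "the fox"]

# Map: each split is processed independently (in parallel on a real cluster).
intermediate = [pair for split in splits for pair in map_phase(split)]

# Shuffle/sort: the framework groups the map outputs by key.
intermediate.sort(key=itemgetter(0))

# Reduce: one call per distinct key.
counts = dict(
    reduce_phase(key, (v for _, v in group))
    for key, group in groupby(intermediate, key=itemgetter(0))
)
print(counts["the"])  # 3 occurrences of "the" across all splits
```

On a real cluster the map tasks run on different nodes and the sorted intermediate data is written to and read from the file system, but the shape of the computation is the same.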
The document provides an overview of Hadoop and its ecosystem. It discusses the history and architecture of Hadoop, describing how it uses distributed storage and processing to handle large datasets across clusters of commodity hardware. The key components of Hadoop include HDFS for storage, MapReduce for processing, and an ecosystem of related projects like Hive, HBase, Pig and Zookeeper that provide additional functions. Advantages are its ability to handle unlimited data storage and high speed processing, while disadvantages include lower speeds for small datasets and limitations on data storage size.
The 'macro view' on BigQuery: we started with an overview and some typical uses, then moved on to project hierarchy, access control, and security. At the end, we touch on tools and demos.
The document discusses YARN (Yet Another Resource Negotiator), which is the cluster resource management layer of Hadoop. It describes the limitations of the previous Hadoop 1.0 architecture where MapReduce was responsible for both data processing and resource management. YARN was created to address these limitations by separating resource management from data processing. It discusses the components of YARN including the Resource Manager, Node Manager, Containers, and Application Master. It also provides examples of workloads that can run on YARN beyond MapReduce and describes the YARN architecture and how applications run on the YARN framework.
Hadoop MapReduce is an open source framework for distributed processing of large datasets across clusters of computers. It allows parallel processing of large datasets by dividing the work across nodes. The framework handles scheduling, fault tolerance, and distribution of work. MapReduce consists of two main phases: the map phase, where the data is processed as key-value pairs, and the reduce phase, where the outputs of the map phase are aggregated together. It provides an easy programming model for developers to write distributed applications for large scale processing of structured and unstructured data.
This document discusses Hadoop, an open-source software framework for distributed storage and processing of large datasets across clusters of computers. It describes how Hadoop uses HDFS for distributed storage and fault tolerance, YARN for resource management, and MapReduce for parallel processing of large datasets. It provides details on the architecture of HDFS including the name node, data nodes, and clients. It also explains the MapReduce programming model and job execution involving map and reduce tasks. Finally, it states that as data volumes continue rising, Hadoop provides an affordable solution for large-scale data handling and analysis through its distributed and scalable architecture.
The document discusses eBay's architecture and strategies for maintaining scalability and agility. It describes eBay's large scale, including billions of daily interactions. It also outlines eBay's transition to more automated, cloud-based infrastructure and a next generation service-oriented platform. This is intended to improve development productivity while allowing faster innovation and time-to-market through increased infrastructure and platform services.
Hadoop is an open-source software framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Hadoop automatically manages data replication and platform failure to ensure very large data sets can be processed efficiently in a reliable, fault-tolerant manner. Common uses of Hadoop include log analysis, data warehousing, web indexing, machine learning, financial analysis, and scientific applications.
This document provides an overview of Aneka, an open-source cloud computing platform. It discusses that Aneka allows developers to build applications that can run on private and public clouds using programming models like MapReduce. The key components of Aneka are its SDK that provides APIs and tools for application development, and a runtime engine that manages deployment and execution. Aneka provides features like scalability, heterogeneity, and cost savings. It can be deployed on various hardware and supports dynamic resource allocation and security.
This document provides an overview and introduction to NoSQL databases. It begins with an agenda that explores key-value, document, column family, and graph databases. For each type, 1-2 specific databases are discussed in more detail, including their origins, features, and use cases. Key databases mentioned include Voldemort, CouchDB, MongoDB, HBase, Cassandra, and Neo4j. The document concludes with references for further reading on NoSQL databases and related topics.
Google App Engine (GAE) is a platform as a service that allows developers to build and host web applications in Google's data centers. GAE applications are sandboxed and automatically scale based on traffic. GAE provides a computing environment with common web technologies, an admin console, scalable infrastructure, and SDK. It compares favorably to AWS with automatic scaling, large data storage, and programming language support, though developers must follow Google's policies and porting applications can be difficult. GAE offers cost savings, performance, and reliability though fees do apply for high resource usage.
Hive was initially developed by Facebook to manage large amounts of data stored in HDFS. It uses a SQL-like query language called HiveQL to analyze structured and semi-structured data. Hive compiles HiveQL queries into MapReduce jobs that are executed on a Hadoop cluster. It provides mechanisms for partitioning, bucketing, and sorting data to optimize query performance.
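The bucketing mechanism mentioned above amounts to hashing the bucketing column and taking it modulo the bucket count, so that rows with the same key always land in the same bucket. The sketch below illustrates the idea only: the column name, the rows, and the use of `crc32` as the hash are assumptions for the example, not Hive's actual implementation.

```python
# Sketch of Hive-style bucketing: assign each row to one of N buckets by
# hashing the bucketing column, so bucketed joins and sampling can read
# only the matching buckets instead of scanning the whole table.
import zlib
from collections import defaultdict

NUM_BUCKETS = 4

def bucket_for(key: str) -> int:
    # Stable hash of the bucketing column, modulo the bucket count.
    # (Hive uses its own hash function; crc32 stands in for illustration.)
    return zlib.crc32(key.encode()) % NUM_BUCKETS

rows = [("alice", 10), ("bob", 20), ("carol", 30), ("alice", 40)]
buckets = defaultdict(list)
for user, amount in rows:
    buckets[bucket_for(user)].append((user, amount))

# All rows sharing a key land in the same bucket, so a join on `user`
# only needs to pair up corresponding buckets from each table.
assert bucket_for("alice") == bucket_for("alice")
```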
This document provides an overview of Apache Flink internals. It begins with an introduction and recap of Flink programming concepts. It then discusses how Flink programs are compiled into execution plans and executed in a pipelined fashion, as opposed to being executed eagerly like regular code. The document outlines Flink's architecture including the optimizer, runtime environment, and data storage integrations. It also covers iterative processing and how Flink handles iterations both by unrolling loops and with native iterative datasets.
This document discusses different architectures for big data systems, including traditional, streaming, lambda, kappa, and unified architectures. The traditional architecture focuses on batch processing stored data using Hadoop. Streaming architectures enable low-latency analysis of real-time data streams. Lambda architecture combines batch and streaming for flexibility. Kappa architecture avoids duplicating processing logic. Finally, a unified architecture trains models on batch data and applies them to real-time streams. Choosing the right architecture depends on use cases and available components.
The document discusses choosing between SQL and NoSQL databases. It covers the evolution of data architectures from traditional client-server models to newer distributed NoSQL solutions. It provides an overview of different data store types like SQL, NoSQL, key-value, document, column family, and graph databases. The document advises picking the right data model based on business needs, use cases, data storage requirements, and growth patterns then evaluating solutions based on pros and cons. It concludes that for large, growing data, both SQL and NoSQL solutions may be needed.
This document discusses AJAX (Asynchronous JavaScript and XML). It defines AJAX as a group of interrelated web development techniques used on the client-side to create interactive web applications. AJAX allows web pages to be updated asynchronously by exchanging small amounts of data with the server without reloading the entire page. The document outlines the technologies that power AJAX like HTML, CSS, XML, JavaScript, and XMLHttpRequest and how they work together to enable asynchronous updates on web pages.
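On the server side, the exchange described above just means returning a small payload instead of a full page; the browser's XMLHttpRequest (or fetch) consumes it asynchronously. The sketch below shows a minimal JSON endpoint and queries it in-process, with `urllib` standing in for the browser; the `/api/notifications` path and `unread` field are invented for the example.

```python
# Minimal JSON endpoint plus an in-process request, illustrating the small
# asynchronous data exchange that AJAX performs instead of a page reload.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Return a small JSON fragment, not a whole HTML page.
        body = json.dumps({"unread": 3}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), ApiHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/api/notifications"
with urllib.request.urlopen(url) as resp:
    data = json.loads(resp.read())
print(data["unread"])  # the page could update this badge without reloading
server.shutdown()
```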
Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. It was created to support applications handling large datasets operating on many servers. Key Hadoop technologies include MapReduce for distributed computing, and HDFS for distributed file storage inspired by Google File System. Other related Apache projects extend Hadoop capabilities, like Pig for data flows, Hive for data warehousing, and HBase for NoSQL-like big data. Hadoop provides an effective solution for companies dealing with petabytes of data through distributed and parallel processing.
Kafka is an open-source distributed commit log service that provides high-throughput messaging functionality. It is designed to handle large volumes of data and different use cases like online and offline processing more efficiently than alternatives like RabbitMQ. Kafka works by partitioning topics into segments spread across clusters of machines, and replicates across these partitions for fault tolerance. It can be used as a central data hub or pipeline for collecting, transforming, and streaming data between systems and applications.
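The commit-log design described above can be modeled in a few lines: a topic is a set of append-only partitions, messages with the same key go to the same partition, and each consumer tracks its own read offset. This is a toy model of the concept, not the Kafka API; the class, keys, and messages are invented for illustration.

```python
# Toy model of a partitioned commit log: appends go to a partition chosen
# by key hash, reads are non-destructive, and consumers keep their offsets.
import zlib

class TopicLog:
    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> int:
        # Same key -> same partition (hash % N), preserving per-key order.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p

    def consume(self, partition: int, offset: int):
        # Reading does not remove messages; each consumer owns its offset.
        return self.partitions[partition][offset:]

log = TopicLog(num_partitions=3)
p = log.produce("user-42", "clicked")
log.produce("user-42", "purchased")

# Two independent consumers read the same partition from different offsets.
print(log.consume(p, 0))  # ['clicked', 'purchased']
print(log.consume(p, 1))  # ['purchased']
```

Because messages are retained rather than deleted on read, the same log can feed both online consumers and offline batch jobs, which is what makes it work as a central data pipeline.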
A simple presentation about different big data stream processing systems such as Spark, Samza, and Storm, and the differences between their architectures and purposes. In addition, we talk about streaming-layer tools such as Kafka and RabbitMQ. This presentation refers to this paper: https://vsis-www.informatik.uni-hamburg.de/getDoc.php/publications/561/Real-time%20stream%20processing%20for%20Big%20Data.pdf and other useful links.
Apache Flink: Real-World Use Cases for Streaming Analytics - Slim Baltagi
This face-to-face talk about Apache Flink in Sao Paulo, Brazil is the first event of its kind in Latin America! It explains how Apache Flink 1.0, announced on March 8th, 2016 by the Apache Software Foundation (link), marks a new era of Big Data analytics, and in particular real-time streaming analytics. The talk maps Flink's capabilities to real-world use cases that span multiple verticals such as financial services, healthcare, advertisement, oil and gas, retail, and telecommunications.
In this talk, you learn more about:
1. What is Apache Flink Stack?
2. Batch vs. Streaming Analytics
3. Key Differentiators of Apache Flink for Streaming Analytics
4. Real-World Use Cases with Flink for Streaming Analytics
5. Who is using Flink?
6. Where do you go from here?
Lecture 4: Big Data Technology Foundations - hktripathy
The document discusses big data architecture and its components. It explains that big data architecture is needed when analyzing large datasets over 100GB in size or when processing massive amounts of structured and unstructured data from multiple sources. The architecture consists of several layers including data sources, ingestion, storage, physical infrastructure, platform management, processing, query, security, monitoring, analytics and visualization. It provides details on each layer and their functions in ingesting, storing, processing and analyzing large volumes of diverse data.
Hadoop, Pig, and Twitter (NoSQL East 2009) - Kevin Weil
A talk on the use of Hadoop and Pig inside Twitter, focusing on the flexibility and simplicity of Pig, and the benefits of that for solving real-world big data problems.
Facebook [The Nuts and Bolts Technology] - Koushik Reddy
This document provides a summary of a seminar presentation on Facebook and the nuts and bolts of the technology behind it. The presentation covered several key topics:
Languages used at Facebook including JavaScript, PHP, C++, Java, Python, Erlang, and Haskell. Databases used include MySQL, HBase, and Cassandra. Software and technologies discussed were Linux, Apache, Memcache, Haystack, BigPipe, Thrift, Scribe, and HipHop for PHP. The presentation provided details on how some of these technologies are applied at Facebook, such as Erlang for chat messaging, Haskell for spam detection, and HBase for photo storage.
Overview of Facebook Scalable Architecture - Rishikese MR
The document provides an overview of Facebook's scalable architecture presented by Sharath Basil Kurian. It discusses how Facebook uses a variety of technologies like LAMP stack, PHP, Memcached, HipHop, Haystack, Scribe, Thrift, Hadoop and Hive to handle large amounts of user data and scale to support its massive user base. The architecture includes front-end components like PHP and BigPipe to dynamically render pages and back-end databases and caches like MySQL, Memcached and Haystack to efficiently store and retrieve user data.
Architecture Patterns - Open Discussion - Nguyen Tung
This document provides an overview of software architecture fundamentals and patterns, with a focus on architectures for scalable systems. It discusses key quality attributes for architecture like performance, reliability, and scalability. Common patterns for scalable systems are described, including load balancing, map-reduce, and caching. The document also provides a detailed look at architectures used at Facebook, including the architectures for Facebook's website, chat service, and handling of big data. Key aspects of each system are summarized, including the technologies and design principles used.
How Facebook Works and Functions: A Complete Approach - Prakhar Gethe
Facebook uses a variety of technologies to handle its massive scale, including PHP, C++, Java, Python, and custom technologies like FBML and XHP. It relies on databases like MySQL, Memcached, Haystack and Cassandra, and systems like Scribe, Presto, and Hadoop to store and retrieve massive amounts of user data and content. Technologies like Ajax, JSON, JavaScript, jQuery are used on the front-end to power Facebook's interactive features.
Web development refers to tasks associated with developing websites, including web design, content development, and client-side/server-side scripting. There are different types of web developers such as front-end developers who code the front-end using HTML, CSS, and JavaScript, and back-end developers who build the server-side logic using languages like PHP, Ruby, or Python. A web development stack typically includes a front-end framework, back-end programming language, database, and content management system. Popular stacks include LAMP (Linux, Apache, MySQL, PHP), LEMP (Linux, Nginx, MySQL, PHP), and MERN (MongoDB, Express, React, Node). Companies use different technologies
Facebook uses several technologies to handle large amounts of user data and traffic on its platform. These include cookies and caches to store and access frequently used data more quickly. Technologies like gzip compression reduce data transfer sizes. AJAX and JSON are used to asynchronously retrieve and send data to and from servers without interfering with page displays. XMPP messaging allows for real-time messaging between users. Large databases like HBase provide horizontal scalability and automatic failover, with ZooKeeper coordinating sharding and failover. Memcached alleviates database load, and Scribe aggregates log data in real time from many servers. Together, these technologies allow Facebook to efficiently store, access, and analyze massive amounts of user data every day on its social media platform.
Facebook deployed a new messaging application using Apache Hadoop and HBase to address high write throughput needs. This was due to the large volume of messages sent daily and the denormalized data model requiring multiple writes per message. Hadoop provided scalable storage through HDFS and HBase allowed for fast random lookups needed for the application. Enhancements were made to Hadoop and HBase to support the real-time requirements of this and other new applications at Facebook.
Data infrastructure at Facebook with reference to the conference paper " Data warehousing and analytics infrastructure at facebook"
Datewarehouse
Hadoop - Hive - scrive
LAMP is a shorthand term for a web application platform consisting of Linux, Apache, MySQL and one of Perl or PHP or Python. Together, these open-source tools provide a world-class platform for deploying web applications. LAMP has been touted as "the killer app" of the open-source world.
php with wordpress and mysql ppt by Naveen TokasNAVEEN TOKAS
This document discusses PHP, MySQL, and WordPress. It provides an overview of each:
PHP was created in 1994 and has evolved through several versions. It is a widely used open source scripting language that powers many popular websites. MySQL is a popular open source database that works well with PHP. WordPress is a free and open source content management system built with PHP and MySQL that powers over 60 million websites, making it the most popular blogging platform. It allows for easy publishing and management of content on the web.
Webinar - Windows Server 2016 for Nonprofits and Libraries - 2017-01-10TechSoup
Visit http://www.techsoup.org to access donated technology for nonprofits and libraries!
Learn about the features and functionality of Microsoft's Windows Server. You get a peek "under the hood" to see some of the newest features from Microsoft's principal program manager for the Windows Server program, Jeff Woolsey.
This document provides an overview of the LAMP web development stack, including its components (Linux, Apache, MySQL, PHP), why it is a popular choice for web applications, how each component works, how to implement a LAMP-based application, and the benefits of using LAMP such as ease of use, deployment and coding. It notes that LAMP is well-suited for applications that don't require large data exchanges or complex state maintenance. The conclusion reiterates that PHP, HTML, and databases will continue to dominate web design.
This document provides an overview of the LAMP web development stack, including its components (Linux, Apache, MySQL, PHP), why it is a popular choice for building web applications, how each component works together, and the benefits it provides such as ease of use, scalability, and local development. Some key points are that LAMP allows for quick development of data-driven web applications, uses open source tools, and provides a low-cost way to deploy websites and applications.
LAMP is a shorthand term for a popular open source web development platform consisting of Linux, Apache, MySQL, and PHP (LAMP). Together, these components provide a robust, scalable, and secure environment for building dynamic websites. LAMP has gained widespread adoption as it offers a full-stack solution that is free, flexible, and powerful enough to support many enterprise applications.
Facebook uses several technologies to handle large amounts of user data and traffic efficiently:
1. Cookies, caches, Gzip compression, and AJAX/JSON help speed up data transfer and front-end performance.
2. Large-scale data storage uses HBase and Haystack for elastic distributed storage, and Zookeeper for coordination during sharding and failover.
3. Additional technologies include Memcached for caching, and Scribe for centralized log aggregation to analyze site performance.
Presented on Tuesday, August 7, at the 2018 LRCN (Librarians' Registration Council of Nigeria) National Workshop on Electronic Resource Management Systems in Libraries, held at the University of Nigeria, Nsukka, Enugu State, Nigeria
This file contains full report of online fitness gym.And it was prepare by Abhishek, Saurav and Jitendra. If any query please contact at abhishek96patel@gmail.com
New ICT Trends and Issues of LibrarianshipLiaquat Rahoo
The document summarizes a one-day workshop on new ICT trends and issues in librarianship. It will cover topics like the introduction of ICT in libraries, different types of libraries supported by ICT, necessary ICT infrastructure, software for library automation, digital repositories, and web applications. The workshop will be held at the Institute of Modern Sciences and Arts on April 17, 2016.
The LAMP stack is a well know and ubiquitous web development stack, but have you heard of MEAN? It's an up and coming stack that's unified by a single language, JavaScript. Learn the basic components of the MEAN stack as well as practical use case and applications.
Introduction to Modern and Emerging Web TechnologiesSuresh Patidar
2017 is here and we are already a couple of days in!
A lot happened in the software development world in 2016. There were new releases of popular programming languages, new versions of important frameworks, and new tools. Let’s discuss some of the most important releases, and find out which skills you can learn that would be a great investment for your time in 2017!
2. THE "SOCIAL MEDIA" REVOLUTION: A STUDY AND ANALYSIS OF THE PHENOMENON
Ahmad Yar
BS Computer Science
Bahauddin Zakariya University Multan (BZU), Sahiwal Campus
Email: ahmadyark1@gmail.com
Mobile: +92303 9464551
3. What are Distributed Systems?
A distributed system is one in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages.
A distributed system is a piece of software that ensures that a collection of independent computers appears to its users as a single coherent system.
Two aspects:
• Independent computers
• A single coherent system
The World Wide Web (WWW) is the biggest example of a distributed system.
4. What is Facebook?
A portal for social networking
Interact with friends
Share photos and/or videos
Community organizing
Email and instant messaging
Various forms of interpersonal communication
Operated and privately owned by Facebook, Inc.
5. Who Created Facebook?
Mark Zuckerberg created Facebook at Harvard University in 2004 with his roommate Dustin Moskovitz and fellow student Chris Hughes.
• Initially created for college students
• Then extended to include high school students
• Now open to anyone over the age of 13
A keen computer programmer, Mr Zuckerberg, who studied psychology at Harvard, had already developed a number of social-networking websites:
• Coursematch
• Facemash
6. Idea & Creation of Facebook
Before launching his own site, Zuckerberg had briefly worked on a rival social network, HarvardConnection, for fellow students Divya Narendra and Cameron and Tyler Winklevoss.
In February 2004 Mr Zuckerberg launched "The facebook", as it was originally known; the name was taken from the sheets of paper distributed to freshmen, profiling students and staff. Within 24 hours, 1,200 Harvard students had signed up, and after one month over half of the undergraduate population had a profile.
The network was promptly extended to other Boston universities, the Ivy League and eventually all US universities. It became Facebook.com in August 2005 after the address was purchased for $200,000. US high schools could sign up from September 2005, and the site then began to spread worldwide, reaching UK universities the following month.
7. Growth of Facebook
Social Network | Feb. 2008  | Feb. 2009  | Growth
Facebook       | 20,043,000 | 65,704,000 | +228%
8. Facebook Architecture
Front End
LAMP stack: Linux, Apache, MySQL, PHP (plus Facebook's BigPipe)
Why LAMP?
• Easy to learn
• Great documentation
• Huge community
• Lots of frameworks
This is the front-end stack used by Facebook.
9. Linux & Apache
• LINUX is a computer operating system kernel.
• It's open source, very customizable, and good for security.
• Facebook runs the Linux operating system on Apache HTTP servers.
In many ways, Linux is similar to other operating systems you may have used before, such as Windows, OS X, or iOS. Like other operating systems, Linux has a graphical interface, and the types of software you are accustomed to using on other operating systems, such as word processing applications, have Linux equivalents. In many cases, the software's creator may have made a Linux version of the same program you use on other systems. If you can use a computer or other electronic device, you can use Linux.
10. Linux & Apache (continued)
• APACHE is also free and is the most popular open-source web server in use.
Facebook's messaging system was recently added to the platform, supported by Apache HBase, a database-like layer built on Hadoop designed to handle billions of messages per day. HBase was chosen for the application's requirements around consistency, availability, partition tolerance, data model and scalability.
HBase supports Facebook's billions of messages, and capacity can be increased with minimal overhead and no downtime. It offers high write throughput, efficient low-latency access, strong consistency semantics within a data center, efficient random reads from disk, high availability (especially for disaster recovery), fault isolation, and atomic read-modify-write primitives.
11. PHP & BigPipe
• PHP is a dynamically typed, interpreted scripting language.
• Facebook uses PHP because it is a good web programming language with extensive support and an active developer community, and it is good for rapid iteration.
The Facebook SDK (software development kit) for PHP is a library with powerful features that enable PHP developers to easily integrate Facebook Login and make requests to the Graph API. It also plays well with the Facebook SDK for JavaScript to give the front-end user the best possible experience. And it doesn't end there: the SDK for PHP also makes it easy to upload photos and videos and to send batch requests to the Graph API, among other things.
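Under the hood, the SDK calls described above are plain HTTPS requests against the Graph API. A minimal sketch in Python of building such a request URL (the API version, field names and token here are illustrative placeholders, not a real credential):

```python
# Build a Facebook Graph API request URL by hand -- a toy illustration of
# what SDKs like the PHP SDK do under the hood. The token value is fake.
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"

def graph_url(path, access_token, api_version="v19.0", **params):
    """Return the full URL for a Graph API GET request."""
    params["access_token"] = access_token
    return f"{GRAPH_BASE}/{api_version}/{path}?{urlencode(params)}"

url = graph_url("me", "FAKE_TOKEN", fields="id,name")
print(url)
```

An SDK adds authentication flows, batching and error handling on top of this, but the request shape stays the same.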
12. BigPipe: Pipelining
• BigPipe is a dynamic web page serving system developed by Facebook. The general idea is to pipeline sections of a page through various stages within web browsers and servers.
The traditional page-serving model that BigPipe overlaps works like this:
1. The browser sends an HTTP request to the web server.
2. The web server parses the request, pulls data from the storage tier, then formulates an HTML document and sends it to the client in an HTTP response.
3. The HTTP response is transferred over the Internet to the browser.
4. The browser parses the response, constructs a tree representation of the HTML document, and downloads the CSS and JavaScript resources referenced by the document.
5. After downloading the JavaScript resources, the browser parses and executes them.
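The pipelining idea can be sketched as a server that flushes a page skeleton immediately and then streams each page section ("pagelet") as it becomes ready, rather than waiting for the whole page. This is a toy model with illustrative names, not Facebook's actual implementation:

```python
# Toy sketch of BigPipe-style pagelet flushing: the server sends the page
# skeleton immediately, then streams each pagelet's HTML as it is ready,
# instead of composing the whole page before responding.

def render_page(pagelet_sources):
    """Yield chunks of a page: skeleton first, then one chunk per pagelet."""
    yield "<html><body><div id='skeleton'>loading...</div>"
    for pagelet_id, render in pagelet_sources:
        # Each pagelet is rendered (possibly slowly) and flushed on its own,
        # so the browser can display earlier pagelets while later ones render.
        yield f"<script>inject({pagelet_id!r}, {render()!r})</script>"
    yield "</body></html>"

chunks = list(render_page([
    ("news_feed", lambda: "<ul><li>story</li></ul>"),
    ("chat", lambda: "<div>online friends</div>"),
]))
print(chunks[0])   # the skeleton is sent before any pagelet is rendered
```

The win is overlap: while the server renders the news feed, the browser is already parsing the skeleton and downloading static resources.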
13. HipHop
• A PHP compiler developed by Facebook.
• Created to minimize server resources, since processing time for the PHP language is slow.
• Converts PHP scripts into optimized C++ code.
14. Back End
• The back end consists of the application servers.
• Application servers are responsible for answering all queries and handling all writes into the system.
• Facebook's back-end services are written in a variety of programming languages including C++, Java, Python, and Erlang.
Key back-end components:
• Haystack
• Scribe
• MySQL
• Memcached
• Cassandra
• Storing
15. Haystack
• Haystack is an object store designed for sharing photos on Facebook, where data is written once, read often, never modified, and rarely deleted.
• Efficient storage of billions of photos.
• Highly scalable.
• Uses extensive caching in main memory.
The new photo infrastructure merges the photo serving tier and storage tier into one physical tier. It implements an HTTP-based photo server which stores photos in a generic object store called Haystack. The main requirement for the new tier was to eliminate any unnecessary metadata overhead for photo read operations, so that each read I/O operation reads only actual photo data (instead of file-system metadata). Haystack can be broken down into these functional layers:
• HTTP server
• Photo Store
• Haystack Object Store
• File system
• Storage
16. SCRIBE
• Simple data model
• Scalable distributed logging framework
• Useful for logging a wide array of data
• Built on top of Thrift
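The core Haystack trick from slide 15 — one big append-only volume plus a small in-memory index, so a photo read costs a single seek rather than a file-system metadata lookup — can be sketched as follows (a toy model; names and layout are illustrative, not Facebook's actual needle format):

```python
# Toy model of a Haystack-style store: one big append-only file plus an
# in-memory index of {photo_id: (offset, size)}, so each read is a single
# seek + read instead of a per-photo file-system metadata lookup.
import io

class ToyHaystack:
    def __init__(self):
        self.volume = io.BytesIO()   # stands in for one large on-disk file
        self.index = {}              # photo_id -> (offset, size)

    def put(self, photo_id, data):
        offset = self.volume.seek(0, io.SEEK_END)   # append-only write
        self.volume.write(data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        offset, size = self.index[photo_id]         # in-memory lookup
        self.volume.seek(offset)                    # one seek...
        return self.volume.read(size)               # ...one read

store = ToyHaystack()
store.put(1, b"jpeg-bytes-1")
store.put(2, b"jpeg-bytes-2")
print(store.get(1))  # b'jpeg-bytes-1'
```

Because photos are written once and never modified, the append-only layout is safe, and deletes can simply drop the index entry.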
17. SCRIBE
Scribe is a server for aggregating log data streamed in real time from a large number of servers. It was designed to be scalable, extensible without client-side modification, and robust to failure of the network or of any specific machine.
Scribe was developed at Facebook and released in 2008 as open source. Scribe servers are arranged in a directed graph, with each server knowing only about the next server in the graph. This network topology allows for adding extra layers of fan-in as a system grows, and for batching messages before sending them between datacenters, without any code that explicitly needs to understand datacenter topology; only a simple configuration is required.
Scribe is designed with reliability in mind, but without requiring heavyweight protocols or expensive disk usage. Scribe spools data to disk on any node to handle intermittent connectivity or node failure, but doesn't sync a log file for every message. This creates the possibility of a small amount of data loss in the event of a crash or catastrophic hardware failure; however, this degree of reliability is suitable for most Facebook use cases.
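The spooling behaviour described above can be sketched in a few lines: a node forwards messages downstream, and when the link fails it holds them locally and drains the spool once the link recovers. All names here are illustrative, and the in-memory deque stands in for Scribe's on-disk spool:

```python
# Toy sketch of Scribe-style local spooling: a node forwards log messages
# downstream; if the downstream link fails, messages are spooled locally and
# re-sent, in order, once the link recovers.
from collections import deque

class ToyScribeNode:
    def __init__(self, send):
        self.send = send          # callable: delivers one message downstream
        self.spool = deque()      # stands in for the on-disk spool file

    def log(self, message):
        self.spool.append(message)
        self.flush()

    def flush(self):
        while self.spool:
            try:
                self.send(self.spool[0])
            except ConnectionError:
                return            # downstream is down; keep messages spooled
            self.spool.popleft()  # delivered: drop from the spool

delivered = []
link_up = [False]

def send(msg):
    if not link_up[0]:
        raise ConnectionError
    delivered.append(msg)

node = ToyScribeNode(send)
node.log("page_view uid=1")       # link down: message stays in the spool
link_up[0] = True
node.log("page_view uid=2")       # link restored: spool drains in order
print(delivered)                  # ['page_view uid=1', 'page_view uid=2']
```

Note the trade-off the slide mentions: the spool survives link failures, but since it is not synced per message, a crash can still lose a small tail of data.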
18. MySQL & Memcached
MySQL
• Facebook utilizes MySQL because of its speed and reliability.
• Thousands of MySQL servers.
• Users are randomly distributed across these servers.
• The relational aspect of the DB is not used: no joins, which would be logically difficult since data is distributed randomly.
• Primarily used as a key-value store.
Memcached
• Protects the main database from high read demand from users.
• Memcached is a memory caching system that is used to speed up dynamic database-driven websites (like Facebook).
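The way Memcached "protects" MySQL is the classic cache-aside pattern: reads hit the cache first and fall through to the database only on a miss, while writes invalidate the cached entry. A minimal sketch (the dict-backed cache and db are stand-ins, not real client APIs):

```python
# Sketch of the cache-aside pattern Memcached enables: reads hit the cache
# first and fall through to the database only on a miss; writes invalidate
# the cached entry so the next read repopulates it.
class CacheAside:
    def __init__(self, db):
        self.db = db        # stands in for a MySQL shard
        self.cache = {}     # stands in for a Memcached cluster
        self.db_reads = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]         # cache hit: database untouched
        self.db_reads += 1                 # cache miss: go to the database
        value = self.db[key]
        self.cache[key] = value            # populate the cache for next time
        return value

    def set(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)          # invalidate, don't update, the cache

store = CacheAside({"user:1": "Ahmad"})
store.get("user:1")        # miss: reads the database
store.get("user:1")        # hit: served from the cache
print(store.db_reads)      # 1
```

Invalidating rather than updating on write is a common design choice: it avoids writing values into the cache that may never be read again.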
19. Cassandra
Cassandra is a database management system designed to handle large amounts of data spread out across many servers. It powers Facebook's Inbox Search feature and provides a structured key-value store with eventual consistency.
Storing
Apache Hadoop is used in three broad types of systems:
• as a warehouse for web analytics
• as storage for a distributed database
• and for MySQL database backups.
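Spreading keys across many servers, as Cassandra does, is commonly done with consistent hashing: each node owns an arc of a hash ring, and adding or removing a node only moves the keys on its arc. A minimal sketch (not Cassandra's actual partitioner; node names are illustrative):

```python
# Minimal consistent-hash ring, the kind of partitioning scheme systems like
# Cassandra use to spread keys across servers: each node owns an arc of the
# ring, and a key belongs to the first node clockwise from its hash.
import bisect
import hashlib

def _h(s):
    """Deterministic hash of a string onto the ring."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.points = sorted((_h(n), n) for n in nodes)

    def node_for(self, key):
        hashes = [p for p, _ in self.points]
        # First ring point at or past the key's hash, wrapping around.
        i = bisect.bisect(hashes, _h(key)) % len(self.points)
        return self.points[i][1]

ring = Ring(["db1", "db2", "db3"])
owner = ring.node_for("inbox:user:42")
print(owner in {"db1", "db2", "db3"})  # True: every key maps to some node
```

Real systems add virtual nodes (many ring points per server) to even out the arc sizes, but the lookup logic is the same.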
20. Fault Tolerance
The ability of a system to continue functioning in the event of a partial failure. Though the system continues to function, overall performance may be affected.
Two main reasons for the occurrence of a fault:
1) Hardware or software failure. 2) Unauthorized access.
Why do we need fault tolerance?
Fault tolerance is needed in order to provide three main features to distributed systems:
1) Reliability: focuses on continuous service without any interruptions.
2) Availability: concerned with the readiness of the system.
3) Security: prevents any unauthorized access.
21. Phases in Fault Tolerance
• Implementation of a fault-tolerance technique depends on the design, configuration and application of a distributed system.
• In general, designers have suggested some general principles, which have been followed:
1) Fault Detection
2) Fault Diagnosis
3) Evidence Generation
4) Assessment
5) Recovery
22. Fault Detection
• Constantly monitoring the performance and comparing it with the expected outcome.
• A fault is reported if there is a deviation from the expected outcome.
Fault Diagnosis
• Done to understand the nature of the fault and the possible root cause.
Evidence Generation
• A report is generated based on the outcome of the fault diagnosis.
Assessment
• Understanding the extent of the damage caused by the faulty component.
• Done by examining the flow of information that has passed out from the faulty component to the rest of the system.
• A virtual boundary is created.
Recovery
• Making the system fault-free and restoring it to a consistent state: forward recovery and backward recovery.
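The "monitor and compare against the expected outcome" idea behind fault detection is often implemented with heartbeats: every node is expected to check in within a timeout, and a node whose last heartbeat is too old is flagged as faulty. A minimal sketch (node names and the integer clock are illustrative):

```python
# Sketch of heartbeat-based fault detection: each node is expected to report
# within `timeout` time units, and a node whose last heartbeat is older than
# that deviates from the expected outcome and is reported as faulty.
class HeartbeatMonitor:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}     # node -> timestamp of last heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def faulty_nodes(self, now):
        """Nodes whose heartbeat deviates from the expected schedule."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)

mon = HeartbeatMonitor(timeout=5)
mon.heartbeat("web1", now=0)
mon.heartbeat("web2", now=3)
print(mon.faulty_nodes(now=7))   # ['web1'] -- web1 missed its deadline
```

Detection like this only starts the pipeline above; diagnosis, assessment and recovery then decide what actually failed and how to restore a consistent state.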
23. Fault Tolerance Techniques
Replication
• Creating multiple copies (replicas) of data items and storing them at different sites.
• The main idea is to increase availability: if a node fails at one site, the data can still be accessed from a different site.
• Has its limitations too, such as data consistency and the degree of replication.
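The availability benefit of replication can be sketched directly: writes go to every live replica, and a read succeeds as long as at least one replica holding the data is still up. This toy model deliberately ignores the consistency issues the slide mentions (all names are illustrative):

```python
# Toy sketch of replication for availability: writes go to every live
# replica, and a read succeeds as long as any replica holding the key is up.
class ReplicatedStore:
    def __init__(self, n_replicas):
        self.replicas = [{} for _ in range(n_replicas)]  # one dict per site
        self.up = [True] * n_replicas

    def write(self, key, value):
        for i, rep in enumerate(self.replicas):
            if self.up[i]:
                rep[key] = value          # propagate the write to every site

    def read(self, key):
        for i, rep in enumerate(self.replicas):
            if self.up[i] and key in rep:
                return rep[key]           # any surviving copy will do
        raise KeyError(key)

store = ReplicatedStore(n_replicas=3)
store.write("profile:1", "Ahmad")
store.up[0] = False                # one site fails...
print(store.read("profile:1"))     # ...the data is still available elsewhere
```

The limitation is visible in the code too: a write issued while a replica is down leaves that replica stale, which is exactly the consistency problem the next slide lists.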
24. LIMITATIONS
Replication
• Difficult to manage as the number of replicas or copies increases.
• Consistency and the degree of replication are major issues.
Checkpointing
• Loss of computation performed since the last checkpoint.
• Checkpoint length, checkpoint frequency and storage are major issues.
25. Concurrency
• A situation in which two or more users access the same record at the same time is called concurrency.
• Concurrency control ensures that correct results are generated from parallel operations.
Why concurrency control?
• Concurrency control is needed because there are a lot of things that can go wrong.
• Each transaction by itself can be correct, but concurrency generates problems such as:
• The lost update problem
• The dirty read problem
• The incorrect summary problem
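The lost update problem, and the mutual exclusion that prevents it, can be shown in a few lines: two threads each read a balance and write back an incremented value. Without a lock around the read-modify-write, one increment can overwrite the other and be lost; with the lock, every update survives:

```python
# The lost update problem and its fix: two threads each perform a
# read-modify-write on a shared balance. The lock makes each increment
# atomic; removing it would allow one thread's write to overwrite the
# other's, silently losing updates.
import threading

balance = 0
lock = threading.Lock()

def deposit(times):
    global balance
    for _ in range(times):
        with lock:                 # remove this lock and updates may be lost
            current = balance      # read
            balance = current + 1  # write back

threads = [threading.Thread(target=deposit, args=(10_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)   # 20000 -- with the lock, no update is lost
```

Database concurrency control (locking, timestamps, MVCC) generalizes this same idea from a single counter to whole transactions.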
26. RacerD
• Facebook has worked hard on concurrent programming, and has shared its newest debugging tool: RacerD, an open-source race detector.
• RacerD was open-sourced by the company in 2017.
• Dedicated to identifying concurrency bugs in source code.
• RacerD statically analyzes Java code to detect potential concurrency bugs. The analysis does not attempt to prove the absence of concurrency issues; rather, it searches for a high-confidence class of data races.
• RacerD doesn't try to check all code for concurrency issues. There are two signals it looks for:
1. A class/method explicitly annotated as thread-safe
2. Use of a lock via the synchronized keyword
27. Scalability
• Scalability is an attribute that describes the ability of a process, network, software or organization to grow and manage increased demand. A system, business or software that is described as scalable has an advantage because it is more adaptable to the changing needs or demands of its users or clients.
28. Facebook's scaling challenge
Before we get into the details, here are a few factoids to give you an idea of the scaling challenge that Facebook has to deal with:
• Facebook serves 570 billion page views per month (according to Google Ad Planner).
• There are more photos on Facebook than on all other photo sites combined (including sites like Flickr). More than 3 billion photos are uploaded every month.
• Facebook's systems serve 1.2 million photos per second. This doesn't include the images served by Facebook's CDN.
• More than 25 billion pieces of content (status updates, comments, etc.) are shared every month.
• Facebook has more than 30,000 servers (and this number is from last year).
Software that helps Facebook scale:
1. LAMP 2. PHP 3. Linux 4. MySQL 5. Memcached 6. HipHop 7. Haystack 8. BigPipe 9. Cassandra 10. Scribe 11. Hadoop & Hive 12. Thrift
29. Here's a look at Facebook's rapidly growing data center campuses around the world:
• Prineville, Oregon: 2.15 million square feet of data center space in Prineville by 2021.
• Altoona, Iowa: 2.5 million square feet of data center space. The campus features three data centers between 468,000 SF and 496,000 SF. In 2016 the company added a 100,000 SF cold storage facility.
• Clonee, Ireland: 621,000 square feet of data center space.
• Fort Worth, Texas: 2.5 million square feet of data center space.
30. • Los Lunas, New Mexico: announced in Sept. 2016, nearly 3 million square feet of data center space.
• Papillion, Nebraska: announced in March 2018, 2.6 million square feet of space.
• New Albany, Ohio: Facebook is investing $750 million in a 900,000 square foot data center in New Albany, an Ohio town that also hosts a cloud computing data center for Amazon Web Services.
• Henrico County, Virginia: Facebook will spend $750 million to build a 970,000 square foot data center.
• Newton County, Georgia: announced in February 2017, Facebook is investing about $750 million in the facility in Newton County, about 40 miles east of downtown Atlanta, where it is building two data centers spanning 970,000 square feet. The buildings will be fully operational in 2020.
31. Openness
Openness means being open in terms of sharing information so employees know what's going on and, crucially, feel heard. It also means being, and expecting, an openness to different ways of working, different styles, different opinions, and, critically, feedback. It means openness to change.
In distributed systems, openness concerns whether the system can be extended in various ways without disturbing the existing system and services:
• Hardware extensions: adding peripherals, memory, communication interfaces
• Software extensions: operating system features, communication protocols
32. Openness: key aspects
Openness is supported by:
• Public interfaces
• Standardized communication protocols
1. Be personal:
Don't try to be something you're not, or someone else. Be yourself. That includes being vulnerable and honest. If something isn't working, or is worrying you, share it. If you've struggled with something that's relevant and learned a lesson or two along the way, share it. Sharing your own perspective on an event, a trend, or a challenge makes you more relatable and builds trust. Share a story.
"We can tag others and it is a much more elegant way to have a conversation, versus the email conversations that we were having a lot of times." — Stacie Sherer, SVP Corporate Communications, Weight Watchers
33. 2. Internal before external:
Just about everything should be shared internally before it's shared externally. It gives us the opportunity to get feedback, prepare for public feedback, and refine and practice our broader messages before going to the public.
3. Feedback:
Root your programs in feedback and use data to support them wherever possible. Often the feedback helps you figure out what point you're trying to make. And be clear about what kind of feedback you want, where and how you want it shared, and what you've learned or what changes you've made from the feedback.
Feedback also helps everyone get better together. Without it, people can see the problems and become complacent or jaded if they don't think their opinion matters or that their insight can make a difference.
34. Transparency
Transparency is the concealment (hiding) from the user and the application programmer of the separation of the components of a distributed system.
• Access transparency: local and remote resources are accessed in the same way.
• Location transparency: users are unaware of the location of resources.
• Migration transparency: resources can migrate without a name change.
• Replication transparency: users are unaware of the existence of multiple copies of resources.
• Failure transparency: users are unaware of the failure of individual components.
• Concurrency transparency: users are unaware of sharing resources with others.
35. Facebook released its latest Transparency Report, in which the social network shares information on government requests for user data, noting that these requests had increased globally by around 4 percent compared to the first half of 2017, though U.S. government-initiated requests stayed roughly the same. In addition, the company added a new report to accompany the usual Transparency Report, focused on detailing how and why Facebook takes action on enforcing its Community Standards, specifically in the areas of graphic violence, sexual activity, terrorist propaganda, hate speech, spam and fake accounts. Facebook notes that this is very much a work in progress and that it will likely improve its methodology over time.
Government requests for account data increased globally by around 4% compared to the first half of 2017, from 78,890 to 82,341 requests. In the US, government requests remained roughly even at 32,742, of which 62% included a non-disclosure order prohibiting Facebook from notifying the user, up from 57% during the first half of 2017.
36. During the second half of 2017, the number of pieces of content Facebook restricted based on local law fell from 28,036 to 14,294. The previous cycle's figures had been inflated primarily by content restrictions in Mexico related to the video of a tragic school shooting.
There were 46 disruptions of Facebook services in 12 countries in the second half of 2017, compared to 52 disruptions in nine countries in the first half. Facebook says it continues to be deeply concerned by internet disruptions, which prevent people from communicating with family and friends and also threaten the growth of small businesses.
The report also includes data covering the volume and nature of copyright, trademark and counterfeit reports received, as well as the amount of content affected by those reports. During this period, on Facebook and Instagram, 2,776,665 pieces of content were taken down based on 373,934 copyright reports, 222,226 pieces of content based on 61,172 trademark reports, and 459,176 pieces of content based on 28,680 counterfeit reports.