Presentation of the SIGCOMM 2010 paper: "The little engine(s) that could: scaling online social networks".
The full paper is available at:
http://ccr.sigcomm.org/online/?q=node/642
or at
http://research.tid.es/jmps/
Data Collection without Privacy Side Effects, by Josep M. Pujol
Presented at WWW BIG 2016. Paper available at: http://josepmpujol.net/public/papers/big_green_tracker.pdf
Abstract: The standard approach to collect users’ activity data on the Web relies on server-side processing. This approach requires the presence of user-identifiers in order to aggregate data in sessions, which leads to tracking. Server-side aggregation is bound to produce side-effects because the scope of sessions cannot be safely limited to a particular use-case. We provide several examples of such side-effects.
To preserve privacy we propose an alternative approach based on client-side aggregation, where user-identifiers are not needed because sessions only exist on the client-side (i.e. the user’s browser). We demonstrate the feasibility of this approach by providing an implementation of a tracking agent – green-tracker – able to gather the data needed to power a service functionally equivalent to Google Analytics.
Paper at: http://www2016.net/proceedings/proceedings/p121.pdf
Abstract: Online tracking poses a serious privacy challenge that has drawn significant attention in both academia and industry. Existing approaches for preventing user tracking, based on curated blocklists, suffer from limited coverage and coarse-grained resolution for classification, rely on exceptions that impact sites’ functionality and appearance, and require significant manual maintenance. In this paper we propose a novel approach, based on the concepts leveraged from k-Anonymity, in which users collectively identify unsafe data elements, which have the potential to identify uniquely an individual user, and remove them from requests. We deployed our system to 200,000 German users running the Cliqz Browser or the Cliqz Firefox extension to evaluate its efficiency and feasibility. Results indicate that our approach achieves better privacy protection than blocklists, as provided by Disconnect, while keeping the site breakage to a minimum, even lower than the community-optimized Ad-Block Plus. We also provide evidence of the prevalence and reach of trackers to over 21 million pages of 350,000 unique sites, the largest scale empirical evaluation to date. 95% of the pages visited contain 3rd party requests to potential trackers and 78% attempt to transfer unsafe data. Tracker organizations are also ranked, showing that a single organization can reach up to 42% of all page visits in Germany.
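The idea in the abstract above, that a value seen by only a few users may identify an individual and should be removed from requests, can be illustrated with a minimal sketch. It is not the paper's implementation: the threshold `K`, the `SafeValueRegistry` class, the `strip_unsafe` helper, and the in-memory per-user bookkeeping are all hypothetical names and simplifications chosen for illustration.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

K = 3  # hypothetical anonymity threshold; not a value taken from the paper


class SafeValueRegistry:
    """Counts how many distinct users have reported each parameter value."""

    def __init__(self, k=K):
        self.k = k
        self._seen_by = defaultdict(set)

    def report(self, user_id, value):
        # Record that this user has observed this value in a request.
        self._seen_by[value].add(user_id)

    def is_safe(self, value):
        # A value is treated as safe once at least k distinct users saw it.
        # .get avoids creating empty entries for unknown values.
        return len(self._seen_by.get(value, ())) >= self.k


def strip_unsafe(url, registry):
    """Return url with query values seen by fewer than k users removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if registry.is_safe(value)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

In this sketch a common value such as a language code survives, while a value that only one user has ever sent is dropped before the request leaves the browser.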
Keyvi is a key value index built using finite state machines, making it an immutable key value store. It was developed as a drop-in replacement for Redis to serve as the backend search index for Cliqz, allowing them to reduce data size and machine requirements. Keyvi provides faster performance than Redis through its single-threaded, shared memory model and ability to perform auto-complete matching, approximate matching, and scoring-based matching through techniques like Levenshtein distance.
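To make the approximate matching mentioned above concrete, here is a minimal sketch of Levenshtein distance and a brute-force lookup over a list of keys. It does not use Keyvi's API or its finite-state traversal; the function names `levenshtein` and `approximate_matches` and the linear scan are assumptions made only to show what "matching within an edit distance" means.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free on a match
            ))
        previous = current
    return previous[-1]


def approximate_matches(query, keys, max_distance=1):
    """Return the keys within max_distance edits of query, sorted."""
    return sorted(k for k in keys if levenshtein(query, k) <= max_distance)
```

A finite-state index can answer the same question without scanning every key, which is where the data structure described above would differ from this sketch.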
This document describes a web portal called the Placement Portal that was created to digitize and streamline the placement process at an educational institution. The portal allows students to register their details once and select companies they are interested in. It removes redundant data collection and provides transparency around placement activities. The portal is built using Node.js for the backend, MySQL for the database, and utilizes cloud computing. It aims to reduce the workload of placement coordinators and provide students and stakeholders insights into the institution's placement performance and process.
This document discusses cloud computing and provides an overview of key concepts. It begins with definitions of cloud computing and describes the three main models of cloud services: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). It then outlines some common applications of cloud computing and benefits such as scalability, simplicity, and security. The document also reviews limitations, design principles, and the future scope of cloud computing. In conclusion, cloud computing provides convenient and cost-effective Internet-based computing services.
The document describes a minor project report on creating a LAN network with redundancy that was completed by two students for their B-Tech degree. It outlines the background, methodology, requirements, and results of designing a network topology on GNS3 using routing protocols, VPN, NAT, firewalls, and Cisco phones to provide redundancy. The project implemented Cisco ASA, MPLS routing, and other protocols to create a secure and reliable network connecting different office locations.
The purpose of studying enterprise networks is to create business simplicity worldwide. The pillars of successful networking are scalability, robustness, fault identification, communication, modularity, security, and privacy. The key to building a network is to provide the essential tools and techniques that deliver the quality of a private or public network.
As I discussed earlier, the key purpose is to create business simplicity, which means IT and infrastructure simplicity across the cities that an enterprise network connects. The main purpose of a network is to keep working under failure or breakdown conditions. To achieve that, network design involves choices of topology, protocols, and bandwidth allocation. The topology requirement is to keep two adjacent networks connected despite the failure of any single link or node. The protocol requirement is to use dynamic or static routing protocols that provide congestion-free routes. Bandwidth allocation means actively reserving extra bandwidth to keep the network operational. Design and modification are handed over to the network administrator, who maintains the network and is solely responsible for anything, wanted or unwanted, that happens in it.
Using NetFlow to Improve Network Visibility and Application Performance, by Emulex Corporation
Network and application performance issues can cost your business millions of dollars in lost revenue and productivity. Without persistent, real-time visibility of the infrastructure, Network Operations teams lack the information to predict potential business disruption and prove network and application performance.
Join us on November 6 at 7:00 a.m. PT and hear from Lee Doyle, Principal Analyst at Doyle Research, about the solutions to today’s performance visibility challenges, including:
• Trends affecting traffic visibility, such as application mobility, network upgrades, and data center virtualization and consolidation
• Best practices for managing Quality of Service and reducing failure scenarios
• Critical criteria to consider when selecting performance management solutions
In addition, hear from Richard Trujillo (Emulex Product Marketing) and Scott Frymire (SevOne Product Marketing) how the joint deployment of the Emulex EndaceFlow™ 3040 NetFlow Generator Appliance and SevOne’s Network Performance Management solution lowers time to resolution by reliably monitoring the makeup of the traffic traversing your most critical links.
Making Observability Actionable At Scale - DBS DevConnect 2019, by Squadcast Inc
Many organisations already possess a vast amount of existing data about production systems. As customer expectations evolve, organisations are often challenged to find more proactive ways of dealing with traditionally reactive incident response activity. In this talk, we discuss approaches to unlock value from this data by making it truly actionable. Understanding production failure modes better, enriching technical and business context effectively, decomposing response activity into shared primitives, actions and workflows, and overall, sharing and augmenting this active knowledge repository on a continuous basis are key takeaways. Through case studies, we'll discuss how we can accomplish this by engineering your observability processes and tooling to work for human-in-the-loop interpretation and response rather than a purely human-reliant strategy.
Review and Analysis of Self Destruction of Data in Cloud Computing, by IRJET Journal
This document proposes and reviews a system for self-destruction of data in cloud computing. The system aims to enhance security by shuffling user-uploaded data between directories in the cloud after a set time interval, and by taking timely backups. The key aspects are:
1) Data is divided into fragments and distributed across different nodes in the cloud using coloring algorithms, while also being replicated for availability.
2) Advanced encryption and hashing techniques are used to secure the data fragments.
3) Algorithms are provided for initial fragment placement and subsequent replication to balance load and improve retrieval time.
The system is designed to enhance both security and usability of cloud storage by preventing unauthorized access to data over time.
Network Automation Journey, A systems engineer NetOps perspective, by Walid Shaari
Network devices play a crucial role; they are not just in the Data Center. It's the Wifi, VOIP, WAN and recently underlays and overlays. Network teams are essential for operations. It's about time we highlight to the configuration management community the importance of Network teams and include them in our discussions. This talk describes the personal experience of a systems engineer on how to kickstart a network team into automation. Most importantly, how and where to start, challenges faced, and progress made. The network team in question uses multi-vendor network devices in a large traditional enterprise.
NetDevOps: we do not hear that term as frequently as we should. Every time we hear about automation or configuration management, it is usually about the application or, if not, the systems that host the applications. How about the network systems and devices that interconnect and protect our services? This talk aims to describe the journey a systems engineer had as part of an automation assignment with the network management team. Building on lessons learned and challenges faced with system automation, how can one kickstart an automation project and gain small wins quickly? Where and how to start the journey? What to avoid? What to prioritise? How to overcome the lack of network skills for the automation engineer and the lack of automation and Linux/Unix skills for network engineers? What challenges were faced and how to overcome them? What fights to give up? Where do I see network automation and configuration management as a systems engineer? What are the status quo and future expectations?
Keynote presentation by Amin Vahdat on behalf of Google Technical Infrastructure and Google Cloud Platform. Presentation was delivered at the 2017 Open Networking Summit.
Working with the Mainstay team, the Cisco IOT Manufacturing Marketing team combined research from manufacturing trade associations, management consulting research and an internal benchmarking project to create an Executive Briefing Presentation that would educate CxOs on the opportunities IOT can provide. This content was also repurposed to create a manufacturing IOT whitepaper to provide an asset to entice prospective customers to consider Cisco’s IOT offerings.
This document describes a project to develop a graphical user interface (GUI) tool for Oracle installation. It discusses the objectives of creating a GUI tool, which would be more user-friendly than a command line interface. It provides an introduction to the hardware and software specifications for the system. It also provides an overview of the programming language Java that will be used to develop the tool, including Java's goals, versions, platforms, implementations, and performance. The document outlines the system analysis, design, and testing approach that will be taken for the project.
Deekshita P is presenting on how AI/ML algorithms can impact VLSI design technology. The presentation will cover an introduction to AI/ML, applications in various areas of VLSI design like circuit simulation, physical design, and manufacturing. It will also discuss challenges and opportunities in applying AI/ML in VLSI, such as reducing design time and improving chip yield. The objectives are to enable automated approaches for VLSI design/testing and reduce processing time for design data using learning algorithms.
How to Monitor Digital Dependencies Across Your Modern IT Stack, by ThousandEyes
The document discusses how to take a modern approach to IT operations. It recommends collecting data across infrastructure to identify problems, correlating and analyzing data to determine which alerts require immediate attention, and defining key performance indicators and workflows. This approach aims to enhance operations by integrating ThousandEyes into existing environments to facilitate data-driven conversations that minimize disruptions. A live demo was also provided.
This document provides an overview of cloud computing concepts and platforms from leading cloud providers like Amazon Web Services, Google App Engine, and Microsoft Azure. It discusses cloud characteristics like on-demand access and elastic scaling. It also covers the three main service models (IaaS, PaaS, SaaS) and four deployment models (public, private, hybrid, community). The document reviews features of each provider's cloud environment and compares their computing, storage, and database offerings. It provides an example cost calculation for storing and accessing data on different cloud platforms.
Nonfunctional Testing: Examine the Other Side of the Coin, by TechWell
Creating a highly available, scalable, and high-performing system requires a substantial amount of what we call nonfunctional testing. Developing nonfunctional testing skills is a must for many of today’s quality engineers (QEs). For the past several years, Balaji Arunachalam’s quality team for Intuit Core Services has experienced several high-availability and disaster recovery buildup and testing challenges. Their journey includes the evolution of functional QEs into hybrid QEs who are capable of doing both functional and nonfunctional testing. Nonfunctional testing includes capacity, stability, benchmarking, FMEA/RAS, datacenter failover, and scalability testing. Balaji shares nonfunctional testing best practices, learnings, and mistakes they encountered on this journey. If you or your team is ready to flip the coin and take a serious look at nonfunctional testing methods, opportunities, challenges, and solutions, this session is for you.
IRJET- Implementation of Dynamic Internetworking in the Real World IT Domain, by IRJET Journal
This document summarizes a study that implemented dynamic internetworking in a real-world IT domain. The study created a network topology for an organization using Cisco Packet Tracer with routers, switches, computers and a DHCP server. It configured routing protocols, access control lists, authentication, VLANs and inter-VLAN routing. DHCP was configured to automatically assign IP addresses. Routing protocols like RIP, OSPF and EIGRP were configured between routers. Access control lists were used to filter traffic and provide security. Authentication ensured security and remote access was provided using telnet. VLANs divided the network into broadcast domains and inter-VLAN routing allowed communication between VLANs.
This document is a project report submitted by Supriya Jangid in partial fulfillment of an MCA (Master of Computer Applications) degree. The report documents the development of an "Online SlamBook" website application. The report includes declarations, certificates, an abstract, acknowledgements, table of contents, and 8 chapters that cover an introduction to the project, existing and proposed systems, system analysis including data flow diagrams and system flowcharts, system design, coding, testing, security, and conclusions.
IRJET- Cost Effective Scheme for Delay Tolerant Data Transmission, by IRJET Journal
This document proposes two schemes, the deadline cost (DC) scheme and the deadline shortest queue first (DSQF) scheme, to improve the rate of data meeting its deadline with minimal data transmission cost in a wireless mesh network of IoT gateways. The DC scheme selects the cheapest gateway that meets the data deadline, while DSQF selects the fastest gateway, with the other metric as the secondary factor. The schemes aim to reduce overall data transmission costs compared to traditional greedy cost and shortest queue first schemes. According to tests, the proposed schemes can meet over 98% of data deadlines while reducing costs by 5.74% on average.
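The two selection rules summarized above can be sketched as follows. This is an illustrative reading of the summary only, not the paper's algorithm: the `Gateway` fields, the single `delay` estimate standing in for queue length, and the choice to return `None` when no gateway can meet the deadline are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Gateway:
    name: str
    cost: float   # hypothetical cost of sending the data via this gateway
    delay: float  # hypothetical estimated queueing plus transmission delay


def _feasible(gateways: Sequence[Gateway], deadline: float):
    # Keep only the gateways expected to deliver before the deadline.
    return [g for g in gateways if g.delay <= deadline]


def select_dc(gateways: Sequence[Gateway], deadline: float) -> Optional[Gateway]:
    """Deadline cost: cheapest feasible gateway, delay as the tie-breaker."""
    return min(_feasible(gateways, deadline),
               key=lambda g: (g.cost, g.delay), default=None)


def select_dsqf(gateways: Sequence[Gateway], deadline: float) -> Optional[Gateway]:
    """Deadline shortest queue first: fastest feasible gateway, cost as tie-breaker."""
    return min(_feasible(gateways, deadline),
               key=lambda g: (g.delay, g.cost), default=None)
```

Both rules filter by deadline first and differ only in which metric is primary, which is why they can share the same feasibility check.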
The document discusses Cisco's vision for the Internet of Everything (IoE) and how it applies to the manufacturing sector. It addresses some of the challenges manufacturers face, such as disconnected systems and technology silos. It then presents Cisco's proposed architecture for industrial networks, including security frameworks, edge computing, and fog computing to enable distributed data processing at the network edge. This architecture is meant to help manufacturers overcome challenges and leverage IoT/IoE for operational improvements and business benefits.
Read to learn what Mule Runtime Fabric (RTF) and Anypoint RTF are, how you can leverage these integration engines, the best adoption strategies, and the right way to conduct the risk-cost-benefit analysis for your business.
This document discusses cloning an organization to allow testing and manipulation without affecting the original site. It defines cloning as creating an exact copy that can be used for tasks without risk to the original. Types of clones include the frontend design, backend design, and database. Benefits of cloning for software testing are that it is cost-effective, improves security and product quality, and increases customer satisfaction. The document then discusses various software testing types, reverse engineering, and software development life cycles like waterfall, RAD, spiral, V-model, incremental, agile, iterative, big bang and prototype models. The conclusion is that cloning can help test and learn new features without interrupting the original organization's data and business.
Data centers are growing to accommodate more internet-connected devices, with innovations helping achieve network coverage for billions of devices by 2020. As data centers grow, trends like software-driven infrastructure, microtechnology, and alternative energy use are making data centers more efficient by consolidating resources and reducing size. Hyperconvergence allows more efficient use of rack space by consolidating computer storage, networking, and virtualization in compact 2U systems from companies like Simplivity and Nutanix.
This document describes an online exam project created using J2EE. It was submitted as a thesis project to fulfill requirements for an industrial training program. The project aims to automate exam assessment and provide instant results and reports to reduce workload. It allows multiple choice questions and sending score notifications via email. Future enhancements could include additional question types and improved reusability, extensibility, and portability.
Driving Business Innovation: Latest Generative AI Advancements & Success Story, by Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Essentials of Automations: The Art of Triggers and Actions in FME, by Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Deekshita P is presenting on how AI/ML algorithms can impact VLSI design technology. The presentation will cover an introduction to AI/ML, applications in various areas of VLSI design like circuit simulation, physical design, and manufacturing. It will also discuss challenges and opportunities in applying AI/ML in VLSI, such as reducing design time and improving chip yield. The objectives are to enable automated approaches for VLSI design/testing and reduce processing time for design data using learning algorithms.
How to Monitor Digital Dependencies Across Your Modern IT StackThousandEyes
The document discusses how to take a modern approach to IT operations. It recommends collecting data across infrastructure to identify problems, correlating and analyzing data to determine which alerts require immediate attention, and defining key performance indicators and workflows. This approach aims to enhance operations by integrating ThousandEyes into existing environments to facilitate data-driven conversations that minimize disruptions. A live demo was also provided.
This document provides an overview of cloud computing concepts and platforms from leading cloud providers like Amazon Web Services, Google App Engine, and Microsoft Azure. It discusses cloud characteristics like on-demand access and elastic scaling. It also covers the three main service models (IaaS, PaaS, SaaS) and four deployment models (public, private, hybrid, community). The document reviews features of each provider's cloud environment and compares their computing, storage, and database offerings. It provides an example cost calculation for storing and accessing data on different cloud platforms.
Nonfunctional Testing: Examine the Other Side of the CoinTechWell
Creating a highly available, scalable, and high-performing system requires a substantial amount of what we call nonfunctional testing. Developing nonfunctional testing skills is a must for many of today’s quality engineers (QEs). For the past several years, Balaji Arunachalam’s quality team for Intuit Core Services has experienced several highly available and disaster recovery buildup and testing challenges. Their journey includes the evolution of functional QEs into hybrid QEs who are capable of doing both functional and nonfunctional testing. Nonfunctional testing includes capacity, stability, benchmarking, FMEA/RAS, datacenter failover, and scalability testing. Balaji shares nonfunctional testing best practices, learnings, and mistakes they encountered on this journey. If you or your team is ready flip the coin and take a serious look at nonfunctional testing methods, opportunities, challenges, and solutions, this session is for you.
IRJET- Implementation of Dynamic Internetworking in the Real World it DomainIRJET Journal
This document summarizes a study that implemented a dynamic internetworking in a real-world IT domain. The study created a network topology for an organization using Cisco Packet Tracer with routers, switches, computers and a DHCP server. It configured routing protocols, access control lists, authentication, VLANs and inter-VLAN routing. DHCP was configured to automatically assign IP addresses. Routing protocols like RIP, OSPF and EIGRP were configured between routers. Access control lists were used to filter traffic and provide security. Authentication ensured security and remote access was provided using telnet. VLANs divided the network into broadcast domains and inter-VLAN routing allowed communication between VLANs.
This document is a project report submitted by Supriya Jangid in partial fulfillment of an MCA (Master of Computer Applications) degree. The report documents the development of an "Online SlamBook" website application. The report includes declarations, certificates, an abstract, acknowledgements, table of contents, and 8 chapters that cover an introduction to the project, existing and proposed systems, system analysis including data flow diagrams and system flowcharts, system design, coding, testing, security, and conclusions.
IRJET- Cost Effective Scheme for Delay Tolerant Data TransmissionIRJET Journal
This document proposes two schemes, the deadline cost (DC) scheme and the deadline shortest queue first (DSQF) scheme, to improve the rate of data meeting its deadline with minimal data transmission cost in a wireless mesh network of IoT gateways. The DC scheme selects the cheapest gateway that meets the data deadline, while DSQF selects the fastest gateway, with the other metric as the secondary factor. The schemes aim to reduce overall data transmission costs compared to traditional greedy cost and shortest queue first schemes. According to tests, the proposed schemes can meet over 98% of data deadlines while reducing costs by 5.74% on average.
The document discusses Cisco's vision for the Internet of Everything (IoE) and how it applies to the manufacturing sector. It addresses some of the challenges manufacturers face, such as disconnected systems and technology silos. It then presents Cisco's proposed architecture for industrial networks, including security frameworks, edge computing, and fog computing to enable distributed data processing at the network edge. This architecture is meant to help manufacturers overcome challenges and leverage IoT/IoE for operational improvements and business benefits.
Read to learn what Mule Runtime Fabric (RTF) and Anypoint RTF are, how you can leverage these integration engines, the best adoption strategies, and the right way to conduct the risk-cost-benefit analysis for your business.
This document discusses cloning an organization to allow testing and manipulation without affecting the original site. It defines cloning as creating an exact copy that can be used for tasks without risk to the original. Types of clones include the frontend design, backend design, and database. Benefits of cloning for software testing are that it is cost-effective, improves security and product quality, and increases customer satisfaction. The document then discusses various software testing types, reverse engineering, and software development life cycles like waterfall, RAD, spiral, V-model, incremental, agile, iterative, big bang and prototype models. The conclusion is that cloning can help test and learn new features without interrupting the original organization's data and business.
Data centers are growing to accommodate more internet-connected devices, with innovations helping achieve network coverage for billions of devices by 2020. As data centers grow, trends like software-driven infrastructure, microtechnology, and alternative energy use are making data centers more efficient by consolidating resources and reducing size. Hyperconvergence allows more efficient use of rack space by consolidating computer storage, networking, and virtualization in compact 2U systems from companies like Simplivity and Nutanix.
This document describes an online exam project created using J2EE. It was submitted as a thesis project to fulfill requirements for an industrial training program. The project aims to automate exam assessment and provide instant results and reports to reduce workload. It allows multiple choice questions and sending score notifications via email. Future enhancements could include additional question types and improved reusability, extensibility, and portability.
Similar to The little engine(s) that could: scaling online social networks (20)
The little engine(s) that could: scaling online social networks
1. ACM/SIGCOMM 2010 – New Delhi, India
The Little Engine(s) that could: Scaling Online Social Networks
Josep M. Pujol, Vijay Erramilli, Georgos Siganos, Xiaoyuan Yang, Nikos Laoutaris, Parminder Chhabra and Pablo Rodriguez
Telefonica Research, Barcelona
2. ACM/SIGCOMM 2010 – New Delhi, India
Problem
Solution
Implementation
Conclusions
3. ACM/SIGCOMM 2010 – New Delhi, India
Scalability is a pain: the designers’ dilemma
“Scalability is a desirable property of a system, a network, or a process, which indicates its ability to either handle growing amounts of work in a graceful manner or to be readily enlarged”
The designers’ dilemma: wasted complexity and long time to market vs. short time to market but risk of death by success
4. ACM/SIGCOMM 2010 – New Delhi, India
Towards transparent scalability
The advent of the Cloud: virtualized machinery + gigabit network
Hardware scalability
Application scalability
5. ACM/SIGCOMM 2010 – New Delhi, India
Towards transparent scalability
Elastic resource allocation for the Presentation and the Logic layer: more machines when load increases.
Presentation and Logic layers: components are stateless, therefore independent and duplicable.
Data layer: components are interdependent, therefore non-duplicable on demand.
6. ACM/SIGCOMM 2010 – New Delhi, India
Scaling the data backend...
• Full replication
– Load of read requests decreases with the number of servers
– State does not decrease with the number of servers
– Maintains data locality: operations on the data (e.g. a query) can be resolved in a single server from the application perspective
[Figure: application logic on top of a single data store (centralized) vs. application logic on top of several data stores (distributed)]
• Horizontal partitioning or sharding
– Load of read requests decreases with the number of servers
– State decreases with the number of servers
– Maintains data locality as long as the splits are disjoint (independent)
Bad news for OSNs!!
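The trade-off between the two options can be put into numbers. This is a minimal sketch, assuming one unit of state per user and reads spread evenly over the servers; the function names and the figures are illustrative and do not come from the talk.

```python
def full_replication(num_users, reads_per_sec, num_servers):
    """Every server stores every user; reads are spread over the servers."""
    return {
        "state_per_server": num_users,
        "reads_per_server": reads_per_sec / num_servers,
    }


def sharding(num_users, reads_per_sec, num_servers):
    """Each server stores a disjoint slice of the users and serves its reads."""
    return {
        "state_per_server": num_users / num_servers,
        "reads_per_server": reads_per_sec / num_servers,
    }


if __name__ == "__main__":
    # Assumed workload: one million users, 8,000 reads per second.
    for servers in (1, 4, 16):
        print(servers,
              full_replication(1_000_000, 8_000, servers),
              sharding(1_000_000, 8_000, servers))
```

In both cases the read load per server shrinks as servers are added, but only sharding shrinks the state each server has to hold.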
11. ACM/SIGCOMM 2010 – New Delhi, India
Why is sharding not enough for OSNs?
• Shards in an OSN can never be disjoint because operations on user i require access to the data of other users, at least one hop away.
• From graph theory: if the graph has a single connected component, there is no partition into two or more parts in which every node shares its part with all of its neighbors.
Data locality cannot be maintained by partitioning a social network!!
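The graph-theory argument can be checked by brute force on a toy example. The sketch below enumerates every assignment of nodes to two servers and keeps those that use both servers without cutting an edge; the two small graphs are hypothetical and only serve as an illustration.

```python
from itertools import product


def preserves_locality(edges, assignment):
    """True if every edge has both endpoints on the same server."""
    return all(assignment[u] == assignment[v] for u, v in edges)


def locality_preserving_splits(nodes, edges, num_servers=2):
    """Assignments that use at least two servers and cut no edge."""
    found = []
    for choice in product(range(num_servers), repeat=len(nodes)):
        if len(set(choice)) < 2:
            continue  # everything on one server is not a real partition
        assignment = dict(zip(nodes, choice))
        if preserves_locality(edges, assignment):
            found.append(assignment)
    return found


# Hypothetical connected graph: a path 0-1-2-3 plus the chord 0-2.
connected = [(0, 1), (1, 2), (2, 3), (0, 2)]
# Hypothetical graph with two components: {0, 1} and {2, 3}.
disconnected = [(0, 1), (2, 3)]

print(len(locality_preserving_splits(range(4), connected)))     # 0
print(len(locality_preserving_splits(range(4), disconnected)))  # 2
```

Only the disconnected graph admits a split that keeps every node together with its neighbors, which is why a social network with a single connected component cannot be sharded without losing locality.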
12. ACM/SIGCOMM 2010 – New Delhi, India
A very active area…
• Online Social Networks are massive systems spanning hundreds of machines in multiple datacenters
• Partition and placement to minimize network traffic and delay
– Karagiannis et al. “Hermes: Clustering Users in Large-Scale Email Services” -- SoCC 2010
– Agarwal et al. “Volley: Automated Data Placement for Geo-Distributed Cloud Services” -- NSDI 2010
• Partition and replication towards scalability
– Pujol et al. “Scaling Online Social Networks without Pains” -- NetDB / SOSP 2009
13. ACM/SIGCOMM 2010 – New Delhi, India
Problem
Solution
Implementation
Conclusions
14. ACM/SIGCOMM 2010 – New Delhi, India
OSN’s operations 101
[Figure: social graph of users 0–9, one of them marked “Me”, whose inbox is [key-146, key-121, key-105, key-101]; the items key-101, key-105, key-121 and key-146 belong to other users]
Let’s see what’s going on in the world
1) FETCH inbox [from server A]
R = [key-146, key-121, key-105, key-101]
2) FETCH text WHERE key IN R [from server ?]
Where is the data?
If all the items are in server A, then data locality!
If the items are spread over other servers (B, E, F), then no data locality!
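The two-step read above can be sketched as a count of the servers a request has to contact. The key names are the ones on the slide; the two placements and the function name are assumptions made for illustration.

```python
def servers_touched(inbox_server, keys, placement):
    """Servers a read must contact: the inbox server plus every server holding a key."""
    return {inbox_server} | {placement[k] for k in keys}


inbox = ["key-146", "key-121", "key-105", "key-101"]

# Hypothetical placements of the four items.
all_local = {k: "A" for k in inbox}
scattered = {"key-146": "E", "key-121": "F", "key-105": "B", "key-101": "A"}

print(sorted(servers_touched("A", inbox, all_local)))  # ['A']
print(sorted(servers_touched("A", inbox, scattered)))  # ['A', 'B', 'E', 'F']
```

With data locality the whole read is resolved in one server; without it, the second step fans out to every server that holds one of the keys.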
27. ACM/SIGCOMM 2010 – New Delhi, India
OSN’s operations 101
1) Relational Databases
Selects and joins across multiple shards of the database are possible but performance is poor (e.g. MySQL Cluster, Oracle RAC)
2) Key-Value Stores (DHT)
More efficient than relational databases: multi-get primitives to transparently fetch data from multiple servers.
But it’s not a silver bullet:
o lose SQL query language => programmatic queries
o lose abstraction from data operations
o suffer from high traffic, eventually affecting performance:
• Incast issue
• Multi-get hole
• latency dominated by the worst-performing server
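The last point, latency dominated by the worst-performing server, can be illustrated with a small simulation. This is a sketch under an assumed latency model (independent exponential latencies with a 1 ms mean per server); the model and the function names are not from the talk.

```python
import random


def multi_get_latency(per_server_latencies_ms):
    """A multi-get finishes only when the slowest contacted server answers."""
    return max(per_server_latencies_ms)


def mean_latency(num_servers, trials=10_000, seed=0):
    """Average multi-get latency over many simulated requests."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Assumed model: each server answers after an exponential delay (mean 1 ms).
        samples = [rng.expovariate(1.0) for _ in range(num_servers)]
        total += multi_get_latency(samples)
    return total / trials


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "servers:", round(mean_latency(n), 2), "ms")
```

Under this model the average latency grows with the number of servers contacted, even though no individual server becomes slower.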
28. ACM/SIGCOMM 2010 – New Delhi, India
Maintaining data locality
[Figure: the social graph of users 0–9 split across Server 1 and Server 2]
Random partition (DHT-like): users are assigned to servers randomly
How do we enforce data locality? Replication
[Figure: replicas of users 4 and 7 are added]
That does it for user #3, but what about the others? The very same procedure…
After a while…
[Figure: replicas of all ten users have been added]
Data locality for reads for any user is guaranteed!
Random partition + Replication can lead to high replication overheads! This is full replication!!
Can we do better?
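The replication overhead of a DHT-like placement can be counted on a toy graph. In the sketch below the friendship graph is hypothetical (the edges of the slide's figure are not recoverable from the transcript), and assigning users by `user % 2` stands in for a random, structure-oblivious placement.

```python
def replicas_needed(edges, master):
    """Set of (user, server) replicas so that every user finds all neighbors locally."""
    replicas = set()
    for u, v in edges:
        if master[u] != master[v]:
            replicas.add((v, master[u]))  # v must be readable next to u
            replicas.add((u, master[v]))  # and u next to v
    return replicas


# Hypothetical graph over users 0-9: two groups joined by the single link (2, 3).
EDGES = [(0, 3), (0, 4), (0, 9), (3, 4), (4, 9), (3, 9),
         (1, 2), (1, 5), (2, 6), (5, 6), (6, 7), (7, 8), (5, 8), (1, 7),
         (2, 3)]

# DHT-like placement: the server is chosen from the user id, ignoring the graph.
dht_like = {user: user % 2 for user in range(10)}

print(len(replicas_needed(EDGES, dht_like)))  # 10
```

On this graph every one of the ten users ends up replicated on the other server, which is the full-replication outcome the slide warns about.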
43. ACM/SIGCOMM 2010 – New Delhi, India
Maintaining data locality
1 2 3 4
5
6789
0
1 23
4
5
67
8
9
0 Data locality for reads in any user is
guaranteed!
Random partition + Replication can lead
to high replication overheads!
4 7 1
0 5
3 2 8
9 6
We can leverage the underlying
social structure for the partition…
43
44. ACM/SIGCOMM 2010 – New Delhi, India
Maintaining data locality
1 2 3 4
5
6789
0
1 23
4
5
67
8
9
0 Data locality for reads in any user is
guaranteed!
Random partition + Replication can lead
to high replication overheads!
4 7 1
0 5
3 2 8
9 6
We can leverage the underlying
social structure for the partition…
44
45. ACM/SIGCOMM 2010 – New Delhi, India
Maintaining data locality
[Figure: users 0-9 and their replicas placed across servers]
Data locality for reads for any user is guaranteed!
Random partition + replication can lead to high replication overheads!
We can leverage the underlying social structure for the partition…
[Figure: Social Partition]
46. ACM/SIGCOMM 2010 – New Delhi, India
Maintaining data locality
[Figure: users 0-9 and their replicas placed across servers]
Data locality for reads for any user is guaranteed!
Random partition + replication can lead to high replication overheads!
We can leverage the underlying social structure for the partition…
[Figure: Social Partition and Replication: SPAR]
47. ACM/SIGCOMM 2010 – New Delhi, India
Maintaining data locality
[Figure: users 0-9 and their replicas placed across servers]
Data locality for reads for any user is guaranteed!
Random partition + replication can lead to high replication overheads!
We can leverage the underlying social structure for the partition…
[Figure: Social Partition and Replication: SPAR]
… only two replicas to achieve data locality
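The difference between a partition that ignores the social structure and one that follows it can be sketched in a few lines. The graph, the server names "A" and "B", and the function names below are hypothetical and are not the example drawn on the slides; the sketch counts only the replicas needed for data locality and ignores the extra copies kept for redundancy.

```python
from collections import defaultdict

def replicas_for_locality(edges, master):
    """Per server, the users that must be replicated there so that every
    master on that server finds all of its neighbors locally."""
    replicas = defaultdict(set)
    for u, v in edges:
        if master[u] != master[v]:
            replicas[master[u]].add(v)
            replicas[master[v]].add(u)
    return replicas

def total_replicas(edges, master):
    return sum(len(users) for users in replicas_for_locality(edges, master).values())

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

# Partition that ignores the social structure (users alternate between servers).
scattered = {0: "A", 1: "B", 2: "A", 3: "B", 4: "A", 5: "B"}
# Partition that follows the two communities.
social = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

scattered_cost = total_replicas(edges, scattered)  # every user ends up on both servers
social_cost = total_replicas(edges, social)        # only the bridge endpoints are copied
```

In this toy case the scattered partition degenerates into full replication, while the community-aware partition needs replicas only for the two users on the bridging edge.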
49. ACM/SIGCOMM 2010 – New Delhi, India
SPAR Algorithm from 10000 feet
“All partition algorithms are equal, but some are more equal than others”
50. ACM/SIGCOMM 2010 – New Delhi, India
SPAR Algorithm from 10000 feet
“All partition algorithms are equal, but some are more equal than others”
1) Online (incremental)
   a) Dynamics of the SN (add/remove node, edge)
   b) Dynamics of the system (add/remove server)
2) Fast (and simple)
   a) Local information
   b) Hill-climbing heuristic
   c) Load-balancing via back-pressure
3) Stable
   a) Avoid cascades
4) Effective
   a) Optimize for MIN_REPLICA (an NP-hard problem)
   b) While maintaining a fixed number of replicas per user for redundancy (e.g. K=2)
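A heavily simplified sketch of what an online, local, hill-climbing step could look like when a new edge arrives is given below. It is not the authors' implementation: the function names, the `max_masters` cap (a crude stand-in for load balancing) and the toy data are assumptions, and the K redundancy replicas are left out. Ties are resolved in favor of not moving anything, which is one simple way to avoid cascades.

```python
from collections import Counter, defaultdict

def total_replicas(edges, master):
    """Replicas needed so every user's neighbors sit on the user's master server."""
    needed = defaultdict(set)
    for u, v in edges:
        if master[u] != master[v]:
            needed[master[u]].add(v)
            needed[master[v]].add(u)
    return sum(len(users) for users in needed.values())

def add_edge(edges, master, u, v, max_masters):
    """Greedy step for a new edge (u, v): keep the masters in place, or move one
    endpoint's master to the other's server, whichever needs the fewest replicas."""
    edges = edges + [(u, v)]
    candidates = [dict(master)]                 # option 1: no movement
    for mover, target in ((u, v), (v, u)):      # options 2 and 3: move one master
        moved = dict(master)
        moved[mover] = master[target]
        # Hypothetical load-balance rule: cap the number of masters per server.
        if max(Counter(moved.values()).values()) <= max_masters:
            candidates.append(moved)
    # min() returns the first minimum, so ties keep the current placement.
    best = min(candidates, key=lambda m: total_replicas(edges, m))
    return edges, best

# Toy usage: user 2 has no friends yet and lives on server "A".
start = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
edges, master = add_edge([(0, 1), (3, 4)], start, 2, 3, max_masters=3)
# A second edge where no move helps: the placement stays as it is.
edges2, master2 = add_edge(edges, master, 0, 4, max_masters=3)
```

Only the two endpoints of the new edge are considered, so each step uses local information and touches at most one master.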
51. ACM/SIGCOMM 2010 – New Delhi, India
Looks good on paper, but...
Real OSN data
• Twitter
– 2.4M users, 48M edges,
12M tweets for 15 days
(Twitter as of Dec08)
• Orkut (MPI)
– 3M users, 223M edges
• Facebook (MPI)
– 60K users, 500K edges
Partition algorithms
• Random (DHT)
• METIS
• MO+
• modularity
optimization + hack
• SPAR online
... let’s try it for real
51
52. ACM/SIGCOMM 2010 – New Delhi, India
How many replicas are generated?
            Twitter, 16 partitions          Twitter, 128 partitions
Algorithm   Rep. overhead   % over K=2      Rep. overhead   % over K=2
SPAR        2.44            +22%            3.69            +84%
MO+         3.04            +52%            5.14            +157%
METIS       3.44            +72%            8.03            +302%
Random      5.68            +184%           10.75           +434%
53. ACM/SIGCOMM 2010 – New Delhi, India
How many replicas are generated?
SPAR, Twitter with 16 partitions: 2.44 replicas per user (on average)
= 2 to guarantee redundancy
+ 0.44 to guarantee data locality (+22%)
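The "% over K=2" figures follow from the average replica counts. A short sketch of that arithmetic, using the 16-partition values reported in the table (the function and variable names are illustrative only):

```python
K = 2  # replicas kept per user for redundancy, as stated on the slide

def overhead_over_k(avg_replicas, k=K):
    """Percentage of extra replicas beyond the k needed for redundancy."""
    return (avg_replicas - k) / k * 100

# Average replicas per user, Twitter with 16 partitions (values from the table).
reported = {"SPAR": 2.44, "MO+": 3.04, "METIS": 3.44, "Random": 5.68}
overheads = {name: round(overhead_over_k(r)) for name, r in reported.items()}
```

For SPAR this gives (2.44 - 2) / 2 = 0.22, i.e. the +22% quoted above.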
54. ACM/SIGCOMM 2010 – New Delhi, India
How many replicas are generated?
Replication with random partitioning is too costly!
55. ACM/SIGCOMM 2010 – New Delhi, India
How many replicas are generated?
Other social-based partitioning algorithms have higher replication overhead than SPAR.
56. ACM/SIGCOMM 2010 – New Delhi, India
Problem
Solution
Implementation
Conclusions
58. ACM/SIGCOMM 2010 – New Delhi, India
The SPAR architecture
Application, 3-tiers
59. ACM/SIGCOMM 2010 – New Delhi, India
The SPAR architecture
Application, 3-tiers
SPAR Middleware (data-store specific)
60. ACM/SIGCOMM 2010 – New Delhi, India
The SPAR architecture
Application, 3-tiers
SPAR Middleware (data-store specific)
SPAR Controller: Partition manager and Directory Service
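How these pieces might fit together on the read path can be sketched as follows. This is an illustration under stated assumptions, not SPAR's actual interface: the class names, method names, server names and in-memory "stores" are all hypothetical.

```python
class DirectoryService:
    """Hypothetical lookup table: user id -> server holding that user's master."""

    def __init__(self):
        self._master = {}

    def assign(self, user, server):
        self._master[user] = server

    def lookup(self, user):
        return self._master[user]


class Middleware:
    """Hypothetical shim between the application and the per-server data stores."""

    def __init__(self, directory, stores):
        self.directory = directory
        self.stores = stores  # server name -> {user: [items]}

    def read_timeline(self, user, friends):
        # With data locality, every friend has a master or a replica on the
        # user's own server, so a single store answers the whole request.
        store = self.stores[self.directory.lookup(user)]
        return [item for friend in friends for item in store.get(friend, [])]


directory = DirectoryService()
directory.assign("alice", "s1")
stores = {"s1": {"bob": ["hi"], "carol": ["yo"]}, "s2": {}}
timeline = Middleware(directory, stores).read_timeline("alice", ["bob", "carol"])
```

The application only calls the middleware; where a user's data lives is resolved by the directory lookup.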
61. ACM/SIGCOMM 2010 – New Delhi, India
SPAR in the wild
• Twitter clone (Laconica, now StatusNet)
– Centralized architecture, PHP + MySQL/Postgres
• Twitter data
– Twitter as of end of 2008: 2.4M users, 12M tweets in 15 days
• The little engines
– 16 commodity desktops: Pentium Duo at 2.33 GHz, 2 GB RAM, connected with a Gigabit Ethernet switch
• Test SPAR on top of:
– MySQL (v5.5)
– Cassandra (v0.5.0)
62. ACM/SIGCOMM 2010 – New Delhi, India
SPAR on top of MySQL
• Different read request rates (application level):
– 1 call to the LDS, 1 join on the notice and notice inbox tables to get the last 20 tweets and their content
– User is queried only once per session (4 min)
– SPAR allows the data to be partitioned, improving both query times and cache hit ratios
– Full replication suffers from a low cache hit ratio and large tables (1B+)

                            MySQL full replication   SPAR + MySQL
99th percentile < 150 ms    16 req/s                 2500 req/s
63. ACM/SIGCOMM 2010 – New Delhi, India
SPAR on top of Cassandra
Network bandwidth is not an issue, but network I/O and CPU I/O are:
– Network delays are reduced
• The worst-performing server produces delays
– Memory hit ratio is increased
• Random partition destroys correlations
• Different read request rates (application level):

                            Vanilla Cassandra   SPAR + Cassandra
99th percentile < 100 ms    200 req/s           800 req/s
64. ACM/SIGCOMM 2010 – New Delhi, India
Problem
Solution
Implementation
Conclusions
65. ACM/SIGCOMM 2010 – New Delhi, India
Conclusions
SPAR provides the means to achieve transparent scalability
1) For applications using RDBMS (not necessarily limited to OSN)
2) And a performance boost for key-value stores, due to the reduction of network I/O
66. ACM/SIGCOMM 2010 – New Delhi, India
jmps@tid.es
http://research.tid.es/jmps/
@solso
Thanks!