Cyber Analytics Applications for Data-Intensive Computing – Mike Fisk
This document discusses applying data-intensive computing techniques to cyber security problems. It describes three characteristic cyber problems: query and retrieval of large datasets, time-series anomaly detection to find unusual patterns, and non-local graph analysis to examine global and local properties of network activity. A file-oriented MapReduce approach called FileMap is proposed to enable parallel processing of large datasets across multiple nodes in a way that is compatible with existing analysis tools. This approach aims to make distributed querying and iterative analysis more efficient.
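As a minimal sketch of the file-oriented MapReduce idea described above (illustrative only; the names and structure are assumptions, not FileMap's actual API), the map step can be viewed as applying an existing per-file analysis tool to many files in parallel, producing one result per input file:

```python
# Hypothetical sketch of a file-oriented map step: apply an existing
# analysis tool (here, a pure-Python line counter standing in for one)
# to each input file in parallel, yielding one result per input file.
from concurrent.futures import ThreadPoolExecutor

def count_lines(path):
    # Stand-in "analysis tool": count the lines in one file.
    with open(path) as f:
        return path, sum(1 for _ in f)

def file_map(paths, tool=count_lines):
    # Run the tool over every input file in parallel, mirroring the
    # one-output-per-input-file style that keeps existing tools usable.
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(tool, paths))
```

Because each file is processed independently, the same pattern distributes naturally across nodes, which is the property the summary credits with making distributed querying compatible with existing tools.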
In 2001, as early high-speed networks were deployed, George Gilder observed that “when the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances.” Two decades later, our networks are 1,000 times faster, our appliances are increasingly specialized, and our computer systems are indeed disintegrating. As hardware acceleration overcomes speed-of-light delays, time and space merge into a computing continuum. Familiar questions like “where should I compute,” “for what workloads should I design computers,” and “where should I place my computers” seem to allow for a myriad of new answers that are exhilarating but also daunting. Are there concepts that can help guide us as we design applications and computer systems in a world that is untethered from familiar landmarks like center, cloud, edge? I propose some ideas and report on experiments in coding the continuum.
This document provides an overview of RDF stream processing and existing RDF stream processing engines. It discusses RDF streams and how sensor data can be represented as RDF streams. It also summarizes some existing RDF stream processing query languages and systems, including C-SPARQL, and the features they support like continuous execution, operators, and time-based windows. The document is intended as a tutorial for developers on working with RDF stream processing.
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink – Flink Forward
1) Apache SAMOA is a platform for mining big data streams in real-time that provides algorithms, libraries and an execution framework.
2) It allows researchers to develop and compare stream mining algorithms and practitioners to easily apply state-of-the-art algorithms to problems like sentiment analysis, spam detection and recommendations.
3) The Vertical Hoeffding Tree algorithm in SAMOA provides high parallelism and accuracy for streaming decision tree learning, outperforming native Apache Flink implementations on certain datasets while being faster on others.
The document discusses requirements and approaches for RDF stream processing (RSP). It covers the following key points:
RSP aims to process continuous RDF streams to address scenarios like sensor data and social media. It involves querying streaming data, integrating streams with static data, and handling issues like imperfections. The document reviews existing RSP systems and languages, actor-based approaches, and the 8 requirements for real-time stream processing including keeping data moving, generating predictable outcomes, and responding instantaneously.
This document summarizes Jean-Paul Calbimonte's presentation on connecting stream reasoners on the web. It discusses representing data streams as RDF and using RDF stream processing systems. Key points include:
- RDF streams can be represented as sequences of timestamped RDF graphs.
- The W3C RSP community group is working to standardize RDF stream models and query languages.
- Producing RDF streams involves mapping live data sources to RDF and adding timestamps.
- Consuming RDF streams involves discovering stream metadata and endpoints to access the streams.
- Systems like TripleWave demonstrate approaches for spreading RDF streams on the web.
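The first point above can be sketched concretely. As a minimal illustration (the names and structure are assumptions, not any particular RSP engine's API), an RDF stream is an ordered sequence of timestamped graphs, each graph a set of triples, over which time-based windows can be evaluated:

```python
# Illustrative data model: an RDF stream as an ordered sequence of
# timestamped RDF graphs, where each graph is a set of
# (subject, predicate, object) triples.
from dataclasses import dataclass, field

@dataclass
class TimestampedGraph:
    timestamp: float          # application time, e.g. seconds since epoch
    triples: set = field(default_factory=set)

def time_window(stream, start, end):
    """Time-based window: all graphs with start <= timestamp < end."""
    return [g for g in stream if start <= g.timestamp < end]
```

Mapping a live sensor source to this model amounts to emitting one small timestamped graph per reading, which is the production step the bullet list describes.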
My talk at the Winter School on Big Data in Tarragona, Spain.
Abstract: We have made much progress over the past decade toward harnessing the collective power of IT resources distributed across the globe. In high-energy physics, astronomy, and climate, thousands work daily within virtual computing systems with global scope. But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that many more--ultimately most?--researchers will soon require capabilities not so different from those used by such big-science teams. How are we to meet these needs? Must every lab be filled with computers and every researcher become an IT specialist? Perhaps the solution is rather to move research IT out of the lab entirely: to leverage the “cloud” (whether private or public) to achieve economies of scale and reduce cognitive load. I explore the past, current, and potential future of large-scale outsourcing and automation for science, and suggest opportunities and challenges for today’s researchers.
Dongwon Kim – A Comparative Performance Evaluation of Flink – Flink Forward
This document provides a summary and analysis of a performance evaluation comparing the big data processing engine Flink to other engines like Spark, Tez, and MapReduce. The key points are:
- Flink completes a 3.2TB TeraSort benchmark faster than Spark, Tez, and MapReduce due to its pipelined execution model which allows more overlap between stages compared to the other engines.
- While Tez and Spark attempt to overlap stages, in practice they do not due to the way tasks are scheduled and launched. MapReduce shows some overlap but is still slower.
- Flink causes fewer disk accesses during shuffling by transferring data directly from memory to memory instead of writing intermediate data to disk as the other engines do.
IEEE ICC 2013 - Symbiotic Coupling of P2P and Cloud Systems: The Wikipedia Case – Kalman Graffi
Lars Bremer and Kalman Graffi. Symbiotic Coupling of P2P and Cloud Systems: The Wikipedia Case. In IEEE ICC ’13: Proceedings of the International Conference on Communications, 2013.
Abstract—Comparative evaluations of peer-to-peer protocols through simulations are a viable approach to judge the performance and costs of the individual protocols in large-scale networks. In order to support this work, we enhanced the peer-to-peer systems simulator PeerfactSim.KOM with a fine-grained analyzer concept, with exhaustive automated measurements and gnuplot generators as well as a coordination control to evaluate a set of experiment setups in parallel. Thus, by configuring all experiments and protocols only once and starting the simulator, all desired measurements are performed, analyzed, evaluated and combined, resulting in a holistic environment for the comparative evaluation of peer-to-peer systems.
Abstract—Cloud computing offers high availability, dynamic scalability, and elasticity while requiring only very little administration. However, this service comes with financial costs. Peer-to-peer systems, in contrast, operate at very low cost but cannot match the quality of service of the cloud. This paper focuses on the case study of Wikipedia and presents an approach to reduce the operational costs of hosting similar websites in the cloud by using a practical peer-to-peer approach. Visitors of the site join a Chord overlay, which acts as a first cache for article lookups. Simulation results show that up to 72% of the article lookups in Wikipedia could be answered by other visitors instead of by the cloud.
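The core of the caching scheme in this abstract is deciding which visitor's peer is responsible for a given article. A hedged sketch of that idea (names and hash width are illustrative assumptions, not the paper's code): article titles and peer IDs are hashed onto a Chord-style identifier ring, and the successor peer of an article's ID caches it:

```python
# Chord-style placement sketch: hash article titles and peer names onto
# an identifier ring; the successor peer of an article's ring ID is the
# peer responsible for caching that article.
import hashlib
from bisect import bisect_left

RING_BITS = 32

def ring_id(name):
    # Hash a name onto the identifier ring [0, 2**RING_BITS).
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** RING_BITS)

def successor(sorted_peer_ids, key_id):
    # First peer clockwise from key_id, wrapping around the ring.
    i = bisect_left(sorted_peer_ids, key_id)
    return sorted_peer_ids[i % len(sorted_peer_ids)]

def cache_peer(peer_names, article_title):
    ids = sorted(ring_id(p) for p in peer_names)
    return successor(ids, ring_id(article_title))
```

Because the mapping is deterministic, any visitor can compute which peer to ask for an article before falling back to the cloud, which is what makes the overlay usable as a first-level cache.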
The document discusses peer-to-peer (P2P) networking. It defines P2P networking as allowing computers and software to function without special servers by directly connecting to other peers. It then discusses the history and basic concepts of P2P, including definitions, overlay networks, typical problems faced, and some popular P2P applications like file sharing using Napster, Gnutella, Kazaa, and BitTorrent. It also briefly discusses P2P for voice over IP using Skype and efforts to implement P2P on mobile devices.
A Brief Note On Peer And Peer ( P2P ) Applications Have No... – Brenda Thomas
The document discusses peer-to-peer (P2P) networks and server-based client/server networks. In a P2P network, all computers have equal privileges to share and access information directly without restrictions. P2P networks are easier to set up but provide less security. In a client/server network, file storage and management is centralized on a server. This provides better security but is more complex to set up and manage. The document explores the advantages and disadvantages of each type of network for different usage contexts.
Cloud Camp Milan 2K9 Telecom Italia: Where P2P? – Gabriele Bozzi
1. The document discusses the potential for peer-to-peer (P2P) computing as an alternative or complement to the traditional client-server model, especially in the context of cloud computing.
2. It notes challenges with P2P such as lack of centralized control and potential for freeloading, but also advantages like harnessing unused resources.
3. Emerging technologies like autonomic and cognitive networking aim to address P2P challenges by enabling self-configuration and optimization of distributed resources.
1. The document discusses the potential for peer-to-peer (P2P) computing as an alternative or complement to the traditional client-server model, especially in the context of cloud computing.
2. P2P systems offer access to distributed resources but lack centralized control, which makes it difficult to ensure reliability, performance, and security.
3. Autonomic and cognitive approaches may help address issues with P2P by enabling self-configuration, healing, optimization and protection of distributed resources.
4. Future networking approaches like DirecNet envision high-speed mobile mesh networks that could further enable wide-scale distributed computing architectures.
This document provides an overview of network refactoring and offloading trends, including fluid network planes. It discusses the evolution of SDN from 2009 to 2019 and concepts like network softwarization. Instances of fluid network planes are described, such as RouteFlow, NFV layers, and VNF offloading to hardware or multi-vendor P4 fabrics. The document also covers slicing for IoT analytics and references recent works on in-network computing, fast connectivity recovery, and scaling distributed machine learning with in-network aggregation.
Leveraging the power of SolrCloud and Spark with OpenShift – QAware GmbH
Kubernetes/Cloud-Native-Meetup September 2018, Munich : Talk by Franz Wimmer (@zalintyre, Software Engineer at QAware)
Abstract: One of the most commonly used big data processing frameworks is Apache Spark. Spark manages to process large datasets with parallelization. Solr is a search platform based on Lucene. Solr can be distributed across a cluster using ZooKeeper for configuration management. Both applications can be combined to create performant Big Data applications.
But what if you want to scale up horizontally and add a node? In a manual setup, you'd have to install the new node manually. Cluster orchestrators like OpenShift claim to solve this problem.
This talk shows how to put Spark, Solr and ZooKeeper into containers, which can then be scaled individually inside a cluster using OpenShift. We will cover OpenShift details like DeploymentConfigs, StatefulSets, Services, Routes and Persistent Volumes and install a complete, failsafe and horizontally scalable SolrCloud / Spark / ZooKeeper cluster in seconds.
You will also learn about the drawbacks and pitfalls of running Big Data applications inside an OpenShift cluster.
LibreSocial - P2P Framework for Social Networks - Overview – Kalman Graffi
Digital social networks promise to engage their participants and to support their patterns of interaction. Private relationships evolve into friendships, professional contacts define competence networks, and political opinions grow into revolutionary trends. Social networks often act as a driving force that intensifies social and global relationships.
In the future, using the “Peer-to-Peer Framework for Social Networks”, everybody may easily host a personal online social network out of the box, without operating costs and without security risks. The framework offers a large set of interactive apps, which are freely combinable and technically unrestricted in their applicability.
The operating costs for such a social network are revolutionary: no expenses arise. Whether a network for 10 users or a global network of millions of users, one aspect is common: due to the peer-to-peer technology used, no expenses arise. Researchers led by Dr.-Ing. Kalman Graffi at the University of Paderborn combined in the framework the advantages of decentralized peer-to-peer applications, of an app market, and of the cloud principle.
The social network is maintained in a peer-to-peer fashion through the computational power of the users’ devices; expensive servers are not needed. Still, the availability, retrievability, and security of the users’ data are guaranteed. Each user retains full control over the access rights to his data. As with the central property of the cloud, the network’s capabilities grow elastically with the number of users. Further plugins can be developed easily. An included app market allows these plugins to be provided, extending the capabilities and applications of the social network on the fly.
Enormous application opportunities without operating costs are the main reason to use the “P2P Framework for Social Networks”, emphasize the researchers of the corresponding project group at the University of Paderborn. The software is already in use as a prototype. Contact us for more information.
Presentation on RDF Stream Processing models given at the SR4LD tutorial (ISWC 2013) -- updated version at: http://www.slideshare.net/dellaglio/rsp2014-01rspmodelsss
Metacomputer Architecture of the Global LambdaGrid – Larry Smarr
06.01.13
Invited Talk
Department of Computer Science
Donald Bren School of Information and Computer Sciences
Title: Metacomputer Architecture of the Global LambdaGrid
Irvine, CA
This document discusses web and social computing and peer-to-peer networks. It provides an overview of peer-to-peer network types, including unstructured and structured networks. It also describes PeerSim, a peer-to-peer network simulator. The document outlines implementing maximum and minimum functions in PeerSim and analyzing the results: new methods were designed and run, and graphs of the outputs were generated to study how the maximum and minimum values changed over the simulations.
Peer-to-peer and mobile networks have gained significant attention from both the research community and industry. Applying the peer-to-peer paradigm in mobile networks leads to several problems regarding the bandwidth demand of peer-to-peer networks. Time-critical messages are delayed and delivered unacceptably slowly. In addition, scarce bandwidth is wasted on messages of lower priority. Therefore, the focus of this paper is on bandwidth management issues at the overlay layer and how they can be solved. We present HiPNOS.KOM, a priority-based scheduling and active queue management system. It guarantees better QoS for higher-prioritized messages in upper network layers of peer-to-peer systems. Evaluation using the peer-to-peer simulator PeerfactSim.KOM shows that HiPNOS.KOM brings significant improvement to Kademlia in comparison with FIFO and drop-tail, the strategies used on each peer today. With HiPNOS.KOM, user-initiated lookups in Kademlia have 24% shorter operation durations.
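The scheduling idea in this abstract can be sketched briefly. This is a hedged illustration of priority-based message scheduling in general, not HiPNOS.KOM's actual design: overlay messages carry a priority, and the send queue always dispatches the highest-priority pending message first instead of serving them in FIFO order:

```python
# Illustrative priority-based send queue for overlay messages: lower
# priority number = more time-critical, and ties within one priority
# level are broken in FIFO order via a monotonic sequence counter.
import heapq
import itertools

class PrioritySendQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # FIFO tie-break within a priority

    def enqueue(self, priority, message):
        heapq.heappush(self._heap, (priority, next(self._seq), message))

    def dequeue(self):
        # Pop the most time-critical (lowest-number) message next.
        return heapq.heappop(self._heap)[2]
```

Under such a policy, a time-critical lookup enqueued after a bulk transfer is still sent first, which is the behavior the abstract contrasts with the FIFO and drop-tail baselines.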
SDN and NFV aim to make networks more flexible and simplify their management by separating the network control plane from the data plane and decoupling software functions from hardware. Key benefits include virtualization, orchestration, programmability, dynamic scaling, automation, visibility, performance optimization, multi-tenancy, service integration and openness. SDN controls the data plane through a centralized controller and interface, while NFV virtualizes network functions. Different SDN models take varying approaches to where the control plane resides and how the control and data planes communicate and are programmed.
FUTURE OF PEER-TO-PEER TECHNOLOGY WITH THE RISE OF CLOUD COMPUTING – ijp2p
Peer-to-Peer (P2P) networking emerged as a disruptive business model, displacing server-based networks within a short period of time. P2P technologies are on the verge of becoming all-purpose building blocks for social-networking applications. Over the past seventeen years, research on P2P computing and systems has received an enormous amount of attention in both academia and industry. P2P rose to commercially successful systems on the internet. It represents the best incarnation of the end-to-end argument, the frequently disputed design philosophy that guided the design of the internet. The puzzling question, then, is why research on P2P computing is now fading from the spotlight and suffering a fall as dramatic as its rise to popularity. This paper takes a quick look at past results in peer-to-peer computing, focusing on what led to its rise, what contributed to its commercial success, and what has led to the lack of interest. Building on these insights, the paper introduces cloud computing as a paradigm succeeding peer-to-peer computing.
Huawei Advanced Data Science With Spark Streaming – Jen Aman
This document discusses streamDM, an open source machine learning library for stream mining in Spark Streaming. It summarizes streamDM's capabilities for incremental learning on data streams using algorithms like SGD, Naive Bayes, clustering and decision trees. Examples of using streamDM in Huawei's network alarm analysis and fault localization systems are provided, demonstrating improvements in efficiency, accuracy and ability to handle large volumes of streaming data. The document encourages researchers to apply for Huawei's Innovation Research Program grants to further collaborative work on stream mining algorithms and applications.
Dissertation title and final project: Data source registration in the Virtual Laboratory. The subject of the thesis and related project was to integrate EGEE/WLCG data sources into GridSpace Virtual Laboratory (http://gs.cyfronet.pl/).
Poster presentation entitled Integrating EGEE Storage Services with the Virtual Laboratory:
http://www.plgrid.pl/en/pr_materials/posters
Dissertation available at http://virolab.cyfronet.pl/trac/vlvl#MasterofScienceThesesrelatedtoViroLab
IEEE CRS 2014 - Secure Distributed Data Structures for Peer-to-Peer-based Soc... – Kalman Graffi
The document describes research into secure distributed data structures for peer-to-peer social networks. It proposes using distributed lists to store social media content like guestbook entries or photos, partitioned into buckets stored on different nodes. Remote operations are introduced to allow efficient list manipulation with less network traffic than retrieving full buckets. Access control is implemented cryptographically by encrypting or signing list elements and buckets. Evaluation shows the approach reduces network traffic compared to naïve distributed list implementations.
Timo Klerx and Kalman Graffi. Bootstrapping Skynet: Calibration and Autonomic Self-Control of Structured Peer-to-Peer Networks. In IEEE P2P ’13: Proceedings of the International Conference on Peer-to-Peer Computing, 2013.
Abstract—Peer-to-peer systems scale to millions of nodes and provide routing and storage functions with best effort quality. In order to provide a guaranteed quality of the overlay functions, even under strong dynamics in the network with regard to peer capacities, online participation and usage patterns, we propose to calibrate the peer-to-peer overlay and to autonomously learn which qualities can be reached. For that, we simulate the peer-to-peer overlay systematically under a wide range of parameter configurations and use neural networks to learn the effects of the configurations on the quality metrics. Thus, by choosing a specific quality setting by the overlay operator, the network can tune itself to the learned parameter configurations that lead to the desired quality. Evaluation shows that the presented self-calibration succeeds in learning the configuration-quality interdependencies and that peer-to-peer systems can learn and adapt their behavior according to desired quality goals.
IEEE ICC 2013 - Symbiotic Coupling of P2P and Cloud Systems: The Wikipedia CaseKalman Graffi
Lars Bremer and Kalman Graffi. Symbiotic Coupling of P2P and Cloud Systems: The Wikipedia Case. In IEEE ICC ’13: Proceedings of the International Conference on Communications, 2013.
Abstract—Comparative evaluations of peer-to-peer protocols through simulations are a viable approach to judge the per- formance and costs of the individual protocols in large-scale networks. In order to support this work, we enhanced the peer- to-peer systems simulator PeerfactSim.KOM with a fine-grained analyzer concept, with exhaustive automated measurements and gnuplot generators as well as a coordination control to evaluate a set of experiment setups in parallel. Thus, by configuring all experiments and protocols only once and starting the simulator, all desired measurements are performed, analyzed, evaluated and combined, resulting in a holistic environment for the comparative evaluation of peer-to-peer systems.
Abstract—Cloud computing offers high availability, dynamic scalability, and elasticity requiring only very little administration. However, this service comes with financial costs. Peer-to-peer systems, in contrast, operate at very low costs but cannot match the quality of service of the cloud. This paper focuses on the case study of Wikipedia and presents an approach to reduce the operational costs of hosting similar websites in the cloud by using a practical peer-to-peer approach. The visitors of the site are joining a Chord overlay, which acts as first cache for article lookups. Simulation results show, that up to 72% of the article lookups in Wikipedia could be answered by other visitors instead of using the cloud.
The document discusses peer-to-peer (P2P) networking. It defines P2P networking as allowing computers and software to function without special servers by directly connecting to other peers. It then discusses the history and basic concepts of P2P, including definitions, overlay networks, typical problems faced, and some popular P2P applications like file sharing using Napster, Gnutella, Kazaa, and BitTorrent. It also briefly discusses P2P for voice over IP using Skype and efforts to implement P2P on mobile devices.
A Brief Note On Peer And Peer ( P2P ) Applications Have No...Brenda Thomas
The document discusses peer-to-peer (P2P) networks and server-based client/server networks. In a P2P network, all computers have equal privileges to share and access information directly without restrictions. P2P networks are easier to set up but provide less security. In a client/server network, file storage and management is centralized on a server. This provides better security but is more complex to set up and manage. The document explores the advantages and disadvantages of each type of network for different usage contexts.
Cloud Camp Milan 2K9 Telecom Italia: Where P2P?Gabriele Bozzi
1. The document discusses the potential for peer-to-peer (P2P) computing as an alternative or complement to the traditional client-server model, especially in the context of cloud computing.
2. It notes challenges with P2P such as lack of centralized control and potential for freeloading, but also advantages like harnessing unused resources.
3. Emerging technologies like autonomic and cognitive networking aim to address P2P challenges by enabling self-configuration and optimization of distributed resources.
1. The document discusses the potential for peer-to-peer (P2P) computing as an alternative or complement to the traditional client-server model, especially in the context of cloud computing.
2. P2P systems offer access to distributed resources but lack centralized control, which makes it difficult to ensure reliability, performance, and security.
3. Autonomic and cognitive approaches may help address issues with P2P by enabling self-configuration, healing, optimization and protection of distributed resources.
4. Future networking approaches like DirecNet envision high-speed mobile mesh networks that could further enable wide-scale distributed computing architectures.
This document provides an overview of network refactoring and offloading trends, including fluid network planes. It discusses the evolution of SDN from 2009 to 2019 and concepts like network softwarization. Instances of fluid network planes are described, such as RouteFlow, NFV layers, and VNF offloading to hardware or multi-vendor P4 fabrics. The document also covers slicing for IoT analytics and references recent works on in-network computing, fast connectivity recovery, and scaling distributed machine learning with in-network aggregation.
Leveraging the power of SolrCloud and Spark with OpenShiftQAware GmbH
Kubernetes/Cloud-Native-Meetup September 2018, Munich : Talk by Franz Wimmer (@zalintyre, Software Engineer at QAware)
Abstract: One of the most commonly used big data processing frameworks is Apache Spark, which processes large datasets through parallelization. Solr is a search platform based on Lucene; it can be distributed across a cluster, using ZooKeeper for configuration management. Both applications can be combined to build performant Big Data applications.
But what if you want to scale up horizontally and add a node? In a manual setup, you'd have to install the new node by hand. Cluster orchestrators like OpenShift claim to solve this problem.
This talk shows how to put Spark, Solr and ZooKeeper into containers, which can then be scaled individually inside a cluster using OpenShift. We will cover OpenShift details like DeploymentConfigs, StatefulSets, Services, Routes and Persistent Volumes, and install a complete, failsafe and horizontally scalable SolrCloud / Spark / ZooKeeper cluster in seconds.
You will also learn about the drawbacks and pitfalls of running Big Data applications inside an OpenShift cluster.
LibreSocial - P2P Framework for Social Networks - OverviewKalman Graffi
Digital social networks promise to engage their participants and to support their patterns of interaction. Private relationships evolve into friendships, professional contacts define networks of expertise, and political opinions grow into revolutionary trends. Social networks often act as a driving force that intensifies social and global relationships.
In the future, using the "Peer-to-Peer Framework for Social Networks", anyone can easily host their own online social network out of the box, without operating costs and without security risks. The framework offers a large set of interactive apps, which are freely combinable and technically unlimited in their applicability.
The operating costs for such a social network are revolutionary: there are none. Whether the network serves 10 users or millions of users worldwide, one aspect is common: due to the peer-to-peer technology used, no expenses arise. Researchers led by Dr.-Ing. Kalman Graffi at the University of Paderborn combined in the framework the advantages of decentralized peer-to-peer applications, of an app market, and of the cloud principle.
The social network is maintained in a peer-to-peer fashion through the computational power of the users' devices; expensive servers are not needed. Still, the availability, retrievability and security of the users' data are guaranteed. Each user keeps total control over the access rights to his or her data. As with the cloud, the network's capabilities grow elastically with the number of users. Further plugins can be developed easily, and an included app market makes it possible to provide these plugins, extending the capabilities and applications of the social network on the fly.
Enormous application opportunities without operating costs are the main reason to use the "P2P Framework for Social Networks", emphasize the researchers of the corresponding project group at the University of Paderborn. A prototype of the software is already in use. Contact us for more information.
Presentation on RDF Stream Processing models given at the SR4LD tutorial (ISWC 2013) -- updated version at: http://www.slideshare.net/dellaglio/rsp2014-01rspmodelsss
Metacomputer Architecture of the Global LambdaGridLarry Smarr
06.01.13
Invited Talk
Department of Computer Science
Donald Bren School of Information and Computer Sciences
Title: Metacomputer Architecture of the Global LambdaGrid
Irvine, CA
This document discusses web and social computing and peer-to-peer networks. It provides an overview of peer-to-peer network types, including unstructured and structured networks. It also describes PeerSim, a peer-to-peer network simulator. The document outlines implementing maximum and minimum functions in PeerSim and analyzing the results: new methods were designed and run, and graphs of the outputs were generated to study how the maximum and minimum values changed over the course of the simulations.
Peer-to-peer and mobile networks have gained significant attention from both the research community and industry. Applying the peer-to-peer paradigm in mobile networks leads to several problems regarding the bandwidth demand of peer-to-peer networks. Time-critical messages are delayed and delivered unacceptably slowly. In addition, scarce bandwidth is wasted on messages of lower priority. Therefore, the focus of this paper is on bandwidth management issues at the overlay layer and how they can be solved. We present HiPNOS.KOM, a priority-based scheduling and active queue management system. It guarantees better QoS for higher-prioritized messages in the upper network layers of peer-to-peer systems. Evaluation using the peer-to-peer simulator PeerfactSim.KOM shows that HiPNOS.KOM brings significant improvements in Kademlia in comparison to FIFO and Drop-Tail, the strategies used on each peer today. User-initiated lookups in Kademlia have 24% shorter operation durations when using HiPNOS.KOM.
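The priority-based scheduling idea can be sketched as a bounded priority queue at the overlay layer. This is a minimal illustration of the general technique, not the HiPNOS.KOM implementation; the class name, capacity handling and drop policy are assumptions:

```python
import heapq
import itertools

class PriorityMessageQueue:
    """Overlay-layer send queue: higher-priority messages leave first.
    When the queue is full, the lowest-priority entry is dropped
    (active queue management), instead of FIFO order with Drop-Tail."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []                  # entries: (priority, seq, message)
        self._seq = itertools.count()    # preserves FIFO order within a priority

    def enqueue(self, priority, message):
        # Lower numbers mean more urgent (e.g. 1 = time-critical user lookup).
        heapq.heappush(self._heap, (priority, next(self._seq), message))
        if len(self._heap) > self.capacity:
            # Drop the least important message rather than the newest one.
            self._heap.remove(max(self._heap))
            heapq.heapify(self._heap)

    def dequeue(self):
        return heapq.heappop(self._heap)[2]
```

With a full queue, a newly arriving high-priority message evicts a low-priority one, so time-critical lookups are no longer stuck behind maintenance traffic.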
SDN and NFV aim to make networks more flexible and simplify their management by separating the network control plane from the data plane and decoupling software functions from hardware. Key benefits include virtualization, orchestration, programmability, dynamic scaling, automation, visibility, performance optimization, multi-tenancy, service integration and openness. SDN controls the data plane through a centralized controller and interface, while NFV virtualizes network functions. Different SDN models take varying approaches to where the control plane resides and how the control and data planes communicate and are programmed.
FUTURE OF PEER-TO-PEER TECHNOLOGY WITH THE RISE OF CLOUD COMPUTINGijp2p
Peer-to-Peer (P2P) networking emerged as a disruptive business model that displaced server-based networks within a short period of time. P2P technologies are on the verge of becoming all-purpose tools for developing applications such as social networking. In the past seventeen years, research on P2P computing and systems has received an enormous amount of attention in academia and industry. P2P rose to triumphant profit-making systems on the Internet. It represents the best incarnation of the end-to-end argument, the frequently disputed design philosophy that guided the design of the Internet. The puzzling question, then, is why research on P2P computing is now fading from the spotlight and suffering a nosedive as dramatic as its rise to popularity. This paper takes a quick look at past results in peer-to-peer computing, with a focus on understanding what led to its rise, what contributed to its commercial success, and what has led to the lack of interest in it. The paper introduces cloud computing as a successor paradigm to peer-to-peer computing.
Huawei Advanced Data Science With Spark StreamingJen Aman
This document discusses streamDM, an open source machine learning library for stream mining in Spark Streaming. It summarizes streamDM's capabilities for incremental learning on data streams using algorithms like SGD, Naive Bayes, clustering and decision trees. Examples of using streamDM in Huawei's network alarm analysis and fault localization systems are provided, demonstrating improvements in efficiency, accuracy and ability to handle large volumes of streaming data. The document encourages researchers to apply for Huawei's Innovation Research Program grants to further collaborative work on stream mining algorithms and applications.
Dissertation title and final project: Data source registration in the Virtual Laboratory. The subject of the thesis and related project was to integrate EGEE/WLCG data sources into GridSpace Virtual Laboratory (http://gs.cyfronet.pl/).
Poster presentation entitled Integrating EGEE Storage Services with the Virtual Laboratory:
http://www.plgrid.pl/en/pr_materials/posters
Dissertation available at http://virolab.cyfronet.pl/trac/vlvl#MasterofScienceThesesrelatedtoViroLab
Similar to Kalman Graffi - IEEE ICC 2013 - Symbiotic Coupling of Peer-to-Peer and Cloud Systems: The Wikipedia Case
IEEE CRS 2014 - Secure Distributed Data Structures for Peer-to-Peer-based Soc...Kalman Graffi
The document describes research into secure distributed data structures for peer-to-peer social networks. It proposes using distributed lists to store social media content like guestbook entries or photos, partitioned into buckets stored on different nodes. Remote operations are introduced to allow efficient list manipulation with less network traffic than retrieving full buckets. Access control is implemented cryptographically by encrypting or signing list elements and buckets. Evaluation shows the approach reduces network traffic compared to naïve distributed list implementations.
Timo Klerx and Kalman Graffi. Bootstrapping Skynet: Calibration and Autonomic Self-Control of Structured Peer-to-Peer Networks. In IEEE P2P ’13: Proceedings of the International Conference on Peer-to-Peer Computing, 2013.
Abstract—Peer-to-peer systems scale to millions of nodes and provide routing and storage functions with best effort quality. In order to provide a guaranteed quality of the overlay functions, even under strong dynamics in the network with regard to peer capacities, online participation and usage patterns, we propose to calibrate the peer-to-peer overlay and to autonomously learn which qualities can be reached. For that, we simulate the peer-to-peer overlay systematically under a wide range of parameter configurations and use neural networks to learn the effects of the configurations on the quality metrics. Thus, when the overlay operator chooses a specific quality setting, the network can tune itself to the learned parameter configurations that lead to the desired quality. Evaluation shows that the presented self-calibration succeeds in learning the configuration-quality interdependencies and that peer-to-peer systems can learn and adapt their behavior according to desired quality goals.
Vitaliy Rapp and Kalman Graffi. Continuous Gossip-based Aggregation through Dynamic Information Aging. In IEEE ICCCN ’13: Proceedings of the International Conference on Computer Communications and Networks, 2013.
Abstract—Existing solutions for gossip-based aggregation in peer-to-peer networks use epochs to calculate a global estimation from an initial static set of local values. Once the estimation converges system-wide, a new epoch is started with fresh initial values. Long epochs result in precise estimations based on old measurements, and short epochs result in imprecise aggregated estimations. In contrast to this approach, we present in this paper a continuous, epoch-less approach which considers fresh local values in every round of the gossip-based aggregation. By using an approach for dynamic information aging, inaccurate values and values from peers that have left fade from the aggregation memory. Evaluation shows that the presented approach for continuous information aggregation in peer-to-peer systems monitors the system performance precisely, adapts to changes and is lightweight to operate.
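The epoch-less idea can be sketched as gossip averaging where every round blends in the fresh local measurement, so stale information ages out continuously. This is an illustration of the general technique only, not the paper's exact algorithm; the `aging` rate and the random pairing scheme are assumptions:

```python
import random

def gossip_round(estimates, local_values, aging=0.1):
    """One round of epoch-less gossip averaging.
    Each peer first refreshes its estimate with its current local value
    (old information ages out at rate `aging`), then averages with a
    randomly chosen partner -- no global epochs or restarts are needed."""
    peers = list(estimates)
    for p in peers:
        # Dynamic information aging: blend in the fresh local measurement.
        estimates[p] = (1 - aging) * estimates[p] + aging * local_values[p]
    random.shuffle(peers)
    for a, b in zip(peers[::2], peers[1::2]):
        avg = (estimates[a] + estimates[b]) / 2
        estimates[a] = estimates[b] = avg
    return estimates
```

Run repeatedly, the estimates track the current mean of the local values; when a local value changes, the aging term pulls the aggregate toward the new value without restarting an epoch.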
IEEE HPCS 2013 - Comparative Evaluation of Peer-to-Peer Systems Using Peerfac...Kalman Graffi
Matthias Feldotto and Kalman Graffi. Comparative Evaluation of Peer-to-Peer Systems Using PeerfactSim.KOM. In IEEE HPCS '13: Proceedings of the International Conference on High Performance Computing and Simulation, 2013.
Kalman Graffi - IEEE HPCS 2013 - Comparative Evaluation of P2P Systems Using ...Kalman Graffi
The document discusses the PeerfactSim.KOM simulator for evaluating peer-to-peer systems. It describes how PeerfactSim.KOM allows for:
1) Automated execution of large numbers of simulations in parallel by specifying simulation setups once.
2) Automated compilation and plotting of results from all simulations to compare performance of different protocols and parameters.
3) A layered architecture and common interfaces that enable flexible testing and evaluation of individual components like overlays, applications and network models.
Kalman Graffi - Monitoring and Management of P2P Systems - 2010Kalman Graffi
This is the long presentation of the contributions made in the dissertation of Dr.-Ing. Kalman Graffi - Monitoring and Management of P2P Systems. The talk was given at 29. Sept. 2009 in Madrid / Spain.
IEEE CCNC 2011: Kalman Graffi - LifeSocial.KOM: A Secure and P2P-based Soluti...Kalman Graffi
The phenomenon of online social networks reaches millions of users on the Internet nowadays. In these, users present themselves, their interests and their social links, which they use to interact with other users. We present in this paper LifeSocial.KOM, a p2p-based platform for secure online social networks which provides the functionality of common online social networks in a totally distributed and secure manner. It is plugin-based, thus extendible in its functionality, providing secure communication and access-controlled storage as well as monitored quality of service, addressing the needs of both users and system providers. The platform operates solely on the resources of the users, eliminating the concentration of crucial operational costs at one provider. In a testbed evaluation, we show the feasibility of the approach and point out the potential of the p2p paradigm in the field of online social networks.
Kalman Graffi - 15 Slide on Monitoring P2P Systems - 2010Kalman Graffi
The document discusses monitoring and managing peer-to-peer (P2P) overlays. It notes that as P2P applications have evolved to support real-time services like voice/video, there is a need to coordinate millions of autonomous peers to provide controlled quality of service (QoS). The modular nature of P2P software also necessitates monitoring and management components to optimize performance across dynamic, heterogeneous networks of peers.
LifeSocial - A P2P-Platform for Secure Online Social NetworksKalman Graffi
LifeSocial is a peer-to-peer platform for online social networks that aims to address issues with centralized social networks like high costs and limited scalability. It uses a distributed architecture that stores data across user devices instead of centralized servers to reduce costs. The platform provides rich functionality through an extensible plugin architecture and ensures security, reliability and quality of service through its monitoring and management capabilities. The goal is to satisfy both users and providers by offering a flexible yet controlled social networking environment at lower operational costs compared to centralized solutions.
Dagstuhl 2010 - Kalman Graffi - Alternative, more promising IT Paradigms for ...Kalman Graffi
This document discusses alternative IT paradigms for digital social networks, specifically a peer-to-peer (P2P) based approach. It introduces LifeSocial.KOM, a secure P2P digital social network developed by the KOM research group. LifeSocial.KOM uses a distributed architecture that shifts costs and load to users, in contrast to traditional client-server social networks. It provides functionality for user profiles, content sharing, messaging, and interaction through a framework of reusable components and an underlying P2P overlay network.
This document discusses monitoring and managing peer-to-peer systems. It aims to coordinate millions of autonomous peers to provide controlled quality of service. Specifically, it addresses how to monitor system-specific and peer-specific metrics to analyze the current system state. It then proposes a self-configuration framework where the root peer derives and distributes new parameter configurations to reach predefined quality goals. The evaluation shows this approach enables quick convergence to quality intervals while imposing low overhead.
The document discusses monitoring and management of peer-to-peer (P2P) systems. It describes monitoring P2P systems to gather global statistics and peer-specific information. A self-configuration control loop is used to automate rule application and reach and maintain quality goals through incremental steps. The contact is Kalman Graffi of Paderborn University for further information.
Kalman Graffi - 3rd Research Talk - 2010Kalman Graffi
The document discusses monitoring and managing peer-to-peer overlays through a self-configuration cycle that involves monitoring the system state, analyzing the metrics to derive a new parameter configuration, and distributing the new configuration to peers in order to meet predefined quality goals for the overlay. The goal is to coordinate millions of autonomous peers to provide controlled quality of service through an automated process of reconfiguring established peer-to-peer overlays.
IEEE P2P 2009 - Kalman Graffi - Monitoring and Management of Structured Peer-...Kalman Graffi
The peer-to-peer paradigm shows the potential to provide the same functionality and quality like client/server based systems, but with much lower costs. In order to control the quality of peer-to-peer systems, monitoring and management mechanisms need to be applied. Both tasks are challenging in large-scale networks with autonomous, unreliable nodes. In this paper we present a monitoring and management framework for structured peer-to-peer systems. It captures the live status of a peer-to-peer network in an exhaustive statistical representation. Using principles of autonomic computing, a preset system state is approached through automated system re-configuration in the case that a quality deviation is detected. Evaluation shows that the monitoring is very precise and lightweight and that preset quality goals are reached and kept automatically.
The document discusses aspects of autonomic computing applied to peer-to-peer (P2P) systems to manage quality of service. It describes using a monitoring mechanism called SkyEye.KOM to gather statistics on P2P systems in a scalable and self-organizing way. Based on the monitoring data, the system can analyze for deviations from preset quality levels, plan adaptations like changing routing table sizes, and execute adaptations to reach and maintain quality goals. Simulations showed the approach enables P2P systems to precisely reach and hold preset quality intervals through self-configuration.
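The "analyze deviations, plan adaptations" step described above can be illustrated with a toy self-configuration rule for one of the mentioned parameters, the routing-table size. This is a hypothetical rule for illustration; the actual system derives configurations from statistics gathered by SkyEye.KOM:

```python
def self_configure(measured_delay, goal, table_size,
                   step=8, min_size=8, max_size=512):
    """Toy autonomic adaptation: if the monitored lookup delay (ms)
    leaves the preset quality interval `goal = (low, high)`, change the
    routing-table size incrementally and re-check on the next cycle."""
    low, high = goal
    if measured_delay > high:
        # Too slow: more routing contacts shorten lookup paths.
        table_size = min(max_size, table_size + step)
    elif measured_delay < low:
        # Over-provisioned: shrink to save maintenance traffic.
        table_size = max(min_size, table_size - step)
    return table_size
```

Repeating this monitor-analyze-adapt cycle is what lets the system converge to, and hold, the preset quality interval.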
Cebit 2008 - PeerfactSim.KOM - A Simulator for Large Scale Peer-to-Peer SystemsKalman Graffi
This document describes a simulator called PeerfactSim.KOM for large scale peer-to-peer systems. PeerfactSim.KOM will be presented at CeBIT from March 4-10, 2008. The simulator models different aspects of peer-to-peer networks including user behavior, applications, overlay structures, transport protocols, and network effects. It aims to support research and development of new peer-to-peer applications.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Threats to mobile devices are increasingly prevalent and growing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect their personal devices and information.
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, backed by an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been easier to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Kalman Graffi - IEEE ICC 2013 - Symbiotic Coupling of Peer-to-Peer and Cloud Systems: The Wikipedia Case
1. Symbiotic Coupling of P2P and Cloud Systems:
The Wikipedia Case
Lars Bremer, University of Paderborn, Germany
Kalman Graffi, University of Düsseldorf, Germany
2. Lars Bremer, Kalman Graffi: Symbiotic Coupling of P2P and Cloud Systems: The Wikipedia Case 2
Know these banners?
3. Lars Bremer, Kalman Graffi: Symbiotic Coupling of P2P and Cloud Systems: The Wikipedia Case 3
Background on Wikipedia
Wikipedia
– Collaborative Internet Encyclopaedia
Numbers on English Wikipedia
– Alexa rank: 6
– Article Count: 3.8 million
– Edits: 3.4 million per month
– Page Views: 11.3 million per hour
Figures show the popularity of articles
– Top: All articles
– Bottom: Top 250 articles
Problem: Costs through high traffic
[Figure: Wikipedia page view distribution, log-log plots of PageViews vs. Rank — top: all ~2.5 million article ranks; bottom: top 250 articles]
4. Lars Bremer, Kalman Graffi: Symbiotic Coupling of P2P and Cloud Systems: The Wikipedia Case 4
Motivation and Outline of our Work
Goal: Efficiency increase
– Cloud-like performance
• Maintain high data availability
• Quick article delivery
– Low operational costs
• Users should help in sharing articles
• Donations of network resources
Approach
– Combine peer-to-peer (p2p) and centralized (cloud) architecture
– Cloud is used as backup and main hoster
• Much less traffic and costs
– Users participate in p2p overlay
• Lookup articles first there
• Provide downloaded articles to other peers
5. Lars Bremer, Kalman Graffi: Symbiotic Coupling of P2P and Cloud Systems: The Wikipedia Case 5
Outline
Motivation / Use Case
Background on Structured P2P Overlays
Symbiotic Coupling of P2P and Cloud Systems
Evaluation
Conclusions
6. Lars Bremer, Kalman Graffi: Symbiotic Coupling of P2P and Cloud Systems: The Wikipedia Case 6
Background on Structured P2P Overlays
Nodes and objects use the same ID space
Each object is managed by a node (its responsible node)
Assignment is based on IDs
Nodes maintain a topology / routing structure to support:
Lookup: getResponsibleNode(ID)
After that: e.g. data transfer
[Figure: ring of nodes with IDs 611, 709, 1008, 1622, 2011, 2207, 2906, 3485; a lookup for H("my data") = 3107 is routed to the responsible node, followed by the data transfer]
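The lookup primitive of such an overlay, mapping an object ID to its responsible node, can be sketched with Chord's successor rule: the responsible node is the first node clockwise on the ring at or after the object's ID. This is a local illustration using the node IDs from the slide's example; a real DHT routes the lookup through the overlay instead of knowing all nodes:

```python
import bisect
import hashlib

ID_SPACE = 2 ** 16  # small ring for illustration

def ring_id(key):
    """Hash a name (node address or object key) onto the ID ring."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % ID_SPACE

def responsible_node(node_ids, object_id):
    """Successor rule: the object is managed by the first node clockwise
    on the ring whose ID is at or after the object ID (wrapping around)."""
    node_ids = sorted(node_ids)
    i = bisect.bisect_left(node_ids, object_id)
    return node_ids[i % len(node_ids)]   # wrap past the largest ID
```

With the slide's ring, an object hashed to 3107 would be managed by node 3485, and an object hashed past the largest node ID wraps around to node 611.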
7. Lars Bremer, Kalman Graffi: Symbiotic Coupling of P2P and Cloud Systems: The Wikipedia Case 7
Cloud Computing vs. P2P Technology
Cloud and P2P
– Access to a distributed pool of resources:
• Storage, bandwidth, computational power
Cloud computing
– Resource providers: companies
– Controlled environment
• No (or minimal) churn
• Homogeneous devices
– Selective centralized structures
– Mainly paid by usage
P2P systems
– Resource providers: user devices
– Uncontrolled environment
• Churn
• Heterogeneous devices
• Uncertainty / unpredictability
• Distributed access points
8. Lars Bremer, Kalman Graffi: Symbiotic Coupling of P2P and Cloud Systems: The Wikipedia Case 8
Symbiotic Coupling of P2P and Cloud Systems
Goal: High performance at low costs
– Performance: data availability, low delays
– Costs: traffic at cloud operator (linked to monetary expenses)
Our approach
– The main service (here: Wikipedia) remains the main data pool
– Nodes install a p2p add-on and join a p2p overlay
• Allows them to share the content of specific services
– Nodes visiting Wikipedia
• Join the p2p overlay and remain online for a while
– Articles are provided and served in the p2p overlay
– If an article is not available there (or initially): download from the cloud
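The retrieval flow above can be sketched as follows. This is a simplified stand-in, not the system's implementation: the overlay, cloud, and peer classes are minimal models used to show the "p2p first, cloud fallback, then publish" pattern.

```python
class Overlay:
    """Toy stand-in for the reference tables kept by responsible nodes."""
    def __init__(self):
        self.refs = {}                      # article name -> list of providers
    def lookup_providers(self, name):
        return self.refs.get(name, [])
    def register_provider(self, name, node):
        self.refs.setdefault(name, []).append(node)

class Cloud:
    def __init__(self, articles):
        self.articles = articles
        self.downloads = 0                  # counts traffic hitting the cloud
    def download(self, name):
        self.downloads += 1
        return self.articles[name]

class Peer:
    def __init__(self, name_):
        self.name = name_
        self.store = {}                     # local document table
    def download(self, article_name):
        return self.store[article_name]
    def fetch(self, article_name, overlay, cloud):
        providers = overlay.lookup_providers(article_name)
        if providers:
            text = providers[0].download(article_name)   # served by a peer
        else:
            text = cloud.download(article_name)          # cloud fallback
        self.store[article_name] = text
        overlay.register_provider(article_name, self)    # publish the copy
        return text

overlay, cloud = Overlay(), Cloud({"A1": "article text"})
n1, n2 = Peer("N1"), Peer("N2")
n1.fetch("A1", overlay, cloud)   # first request: must go to the cloud
n2.fetch("A1", overlay, cloud)   # second request: served by N1
print(cloud.downloads)           # 1: only the initial download hit the cloud
```

Every fetch ends with a `register_provider` call, which is what turns each consumer into a provider and lets cloud traffic drop as popularity grows.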
Model Overview
• Key idea: references
– The responsible node keeps a reference table mapping each article to the peers providing it; reconstructed from the figure:

  Article  Providing node
  A1       N1
  A1       N2
  A1       N4
  A2       N2

– Each provider keeps its downloaded articles in a local document table (in the figure, one node holds A1, another holds A1 and A2)
• All downloaded articles are published
• The cloud is used as fallback
Overview of the Architecture
Document space
– The article ID is the hashed article name
– The responsible node maintains the list of article providers
– Article providers
• Have downloaded the article once
• Are registered at the responsible node
We use Chord
– Any other DHT is also fine
– It only needs to support key-based routing
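The reference table held by a responsible node can be sketched as a mapping from article IDs to provider sets. This is an illustrative data structure under assumed semantics, not the paper's implementation; `peer_left` mirrors the leave operation described later, where references to a departing peer are dropped.

```python
class ResponsibleNode:
    """Toy model of the node responsible for a set of article IDs."""
    def __init__(self):
        self.references = {}                     # article ID -> set of providers

    def register(self, article, provider):
        """Called when a peer has downloaded the article and publishes it."""
        self.references.setdefault(article, set()).add(provider)

    def providers(self, article):
        """Providers to try before falling back to the cloud."""
        return sorted(self.references.get(article, set()))

    def peer_left(self, provider):
        """Drop all references to a departing peer (the 'leave' operation)."""
        for provs in self.references.values():
            provs.discard(provider)

node = ResponsibleNode()
for article, provider in [("A1", "N1"), ("A1", "N2"), ("A1", "N4"), ("A2", "N2")]:
    node.register(article, provider)
print(node.providers("A1"))      # ['N1', 'N2', 'N4']
node.peer_left("N2")
print(node.providers("A2"))      # []: next request falls back to the cloud
```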
Operation: Initial Lookup for an Article
Operation: Further Lookup for an Article
Other Operations
Update
– Editing is done in the cloud
– Active vs. passive updates
• Active: the cloud actively informs the node holding the references
• Passive: responsible peers periodically check for updates
– Check frequency is based on object popularity
– Old references are discarded
• They point to outdated content
– A new reference table is built up
Leave
– A leaving node informs all nodes holding references to it
– Leaving can also be detected, but this introduces delay
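The passive update scheme can be sketched as below. The popularity-to-interval mapping is an assumption made for illustration (the slides only say the frequency depends on popularity), and the version-comparison step is simplified.

```python
def check_interval(requests_per_hour: float,
                   min_s: float = 60.0, max_s: float = 3600.0) -> float:
    """More popular objects are polled more often (illustrative mapping)."""
    if requests_per_hour <= 0:
        return max_s
    return max(min_s, min(max_s, 3600.0 / requests_per_hour))

def passive_update(article, references, cloud_version, local_version):
    """Periodically invoked by the responsible peer for each article."""
    if cloud_version != local_version:
        references.clear()            # old references point to stale content
        return cloud_version          # table is rebuilt as peers re-download
    return local_version

print(check_interval(120))   # 60.0: very popular, clamped to the minimum interval
print(check_interval(10))    # 360.0: checked every 6 minutes
```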
Evaluation
Main questions
– What is the efficiency gain?
– How much traffic is saved?
Approach
– Evaluation through simulation
Layer setup
– User model: downscaled Wikipedia workload
– Application: document storage
– Overlay: Chord
– Network model:
• Global Network Positioning delay model
• OECD bandwidth model
► PeerfactSim.KOM (see www.peerfact.org)
Type
– Event-based simulator
– Written in Java
– Simulations up to 100K peers possible
– Focus on simulation of p2p systems on various layers
• User
• Application
• Services: monitoring, replication …
• Overlays
• Network models
Invitation to join the community
– Several universities actively use and extend the simulator
– Used and heavily extended in this project
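The core of an event-based simulator like the one described above can be sketched as a priority queue of timestamped events processed in order, where handlers may schedule further events. This is a minimal illustration only; PeerfactSim.KOM itself is written in Java and is far richer (layers, churn models, network models).

```python
import heapq

class SimulationEngine:
    """Minimal discrete-event loop: pop the earliest event, run its handler."""
    def __init__(self):
        self.queue = []
        self.now = 0.0
        self.counter = 0                       # tie-breaker for equal timestamps

    def schedule(self, delay, handler):
        heapq.heappush(self.queue, (self.now + delay, self.counter, handler))
        self.counter += 1

    def run(self):
        while self.queue:
            self.now, _, handler = heapq.heappop(self.queue)
            handler(self)

log = []
engine = SimulationEngine()
engine.schedule(10, lambda e: log.append(("lookup", e.now)))
engine.schedule(5, lambda e: (log.append(("join", e.now)),
                              e.schedule(20, lambda e2: log.append(("leave", e2.now)))))
engine.run()
print(log)   # fires in time order: join at t=5, lookup at t=10, leave at t=25
```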
Layered View
Layered architecture
– Easy exchange of components
– Testing of new applications
– Testing of new mechanisms
Main idea
– Layers have several implementations
– Enables testing of individual layer mechanisms
• on their own
• in combination with other layers
See www.peerfact.org
(Figure: the layer stack of each simulated peer, User, Application, Service, Overlay, Transport, Network, driven by the SimulationEngine)
Simulation Setup / Workload Model
Simulation Results
(Figure (a): Reference lookup and total download time: reference lookup time in seconds over all queries, for mean session times (MST) of 30, 60, 90, and 120 min, with separate curves for reference lookups and article downloads)
Simulation Results
(Figure (b): Traffic load in the cloud, measured in articles/min over simulation time)
Simulation Results
(Figure (c): Traffic load savings for session time 120 min: articles/min (0 to 80) over simulation time (0 to 180 min), split into downloads from peers and downloads from the cloud)
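A result like figure (c) reduces to a simple ratio: the cloud's remaining share of traffic is its downloads divided by all downloads. The per-minute counts below are made-up placeholder values, not the paper's measurements.

```python
# Hypothetical per-minute download counts, in the spirit of figure (c).
peer_dl  = [0, 10, 25, 40, 55, 60]   # articles/min served by peers
cloud_dl = [70, 60, 45, 30, 15, 10]  # articles/min served by the cloud

# Share of all downloads that still hit the cloud operator.
cloud_share = sum(cloud_dl) / (sum(peer_dl) + sum(cloud_dl))
print(f"cloud serves {cloud_share:.1%} of downloads")
```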
Conclusions and Future Work
The symbiotic p2p/cloud approach lowers operational costs
– Users take load and share content
– Traffic load on the server was reduced to 27.6% in this experiment
– Websites with many users can benefit from p2p support
– WebP2P (browser-based p2p via PeerJS, Node.js, etc.) is coming
– User devices are powerful; the load can be handled "for free"
Future Work
– Investigate the WebP2P approach
• Browser plugin to create the p2p overlay
– Create a p2p framework for social networks
• Use the capacity of user devices to host a social network
• See http://www.p2pframework.com
– Further extend PeerfactSim.KOM, the p2p system simulator
• See http://www.peerfact.org
Thank You for Your Attention
Jun.-Prof. Dr.-Ing. Kalman Graffi
Technology of Social Networks Group
Institute of Computer Science
Heinrich-Heine-Universität Düsseldorf
eMail: graffi@cs.uni-duesseldorf.de
Web: www.p2pframework.com