In-network caching in Information-Centric Networking (ICN), with Content-Centric Networking (CCN) as the key design architecture. The aim is to provide an exhaustive review of work related to in-network caching in ICN.
Hadoop Meetup Jan 2019 - Overview of Ozone | Erik Krogen
A presentation by Anu Engineer of Cloudera regarding the state of the Ozone subproject. He covers a brief introduction of what Ozone is, and where it's headed.
This is taken from the Apache Hadoop Contributors Meetup on January 30, hosted by LinkedIn in Mountain View.
Bringing Real-Time to the Enterprise with Hortonworks DataFlow | DataWorks Summit
This document discusses TELUS's journey to enable real-time streaming analytics of data from IPTV set top boxes (STBs) to improve the customer experience. It describes moving from batch processing STB log data every 12 hours to streaming the data in real-time using Apache Kafka, NiFi, and Spark. Key lessons learned include using Java 8 for SSL, Spark 2.0 for Kafka integration, and addressing security challenges in their multi-tenant Hadoop environment.
This document discusses security features in Apache Kafka including SSL for encryption, SASL/Kerberos for authentication, authorization controls using an authorizer, and securing Zookeeper. It provides details on how these security components work, such as how SSL establishes an encrypted channel and SASL performs authentication. The authorizer implementation stores ACLs in Zookeeper and caches them for performance. Securing Zookeeper involves setting ACLs on Zookeeper nodes and migrating security configurations. Future plans include moving more functionality to the broker side and adding new authorization features.
LinkedIn leverages the Apache Hadoop ecosystem for its big data analytics. Steady growth of the member base at LinkedIn along with their social activities results in exponential growth of the analytics infrastructure. Innovations in analytics tooling lead to heavier workloads on the clusters, which generate more data, which in turn encourage innovations in tooling and more workloads. Thus, the infrastructure remains under constant growth pressure. Heterogeneous environments embodied via a variety of hardware and diverse workloads make the task even more challenging.
This talk will tell the story of how we doubled our Hadoop infrastructure twice in the past two years.
• We will outline our main use cases and historical rates of cluster growth in multiple dimensions.
• We will focus on optimizations, configuration improvements, performance monitoring and architectural decisions we undertook to allow the infrastructure to keep pace with business needs.
• The topics include improvements in HDFS NameNode performance, and fine tuning of block report processing, the block balancer, and the namespace checkpointer.
• We will reveal a study on the optimal storage device for HDFS persistent journals (SATA vs. SAS vs. SSD vs. RAID).
• We will also describe the Satellite Cluster project, which allowed us to double the number of objects stored on one logical cluster by splitting an HDFS cluster into two partitions without the use of federation and with practically no code changes.
• Finally, we will take a peek at our future goals, requirements, and growth perspectives.
SPEAKERS
Konstantin Shvachko, Sr Staff Software Engineer, LinkedIn
Erik Krogen, Senior Software Engineer, LinkedIn
This document discusses streaming data ingestion and processing options. It provides an overview of common streaming architectures including Kafka as an ingestion hub and various streaming engines. Spark Streaming is highlighted as a popular and full-featured option for processing streaming data due to its support for SQL, machine learning, and ease of transition from batch workflows. The document also briefly profiles StreamSets Data Collector as a higher-level tool for building streaming data pipelines.
The document discusses the upcoming introduction of IPv6. [1] IPv6 is a new standard for IP numbering that will provide more IP addresses as the current IPv4 addresses are running out. [2] It will help overcome limitations in the old IPv4 system and ensure there are enough addresses available into the next century. [3] The document outlines some of the key features and improvements IPv6 will provide, such as larger packet sizes, better security features, quality of service support, and mobility support.
With Hadoop-3.0.0-alpha2 being released in January 2017, it's time to have a closer look at the features and fixes of Hadoop 3.0.
We will have a look at Core Hadoop, HDFS and YARN, and answer the emerging question whether Hadoop 3.0 will be an architectural revolution like Hadoop 2 was with YARN & Co. or will it be more of an evolution adapting to new use cases like IoT, Machine Learning and Deep Learning (TensorFlow)?
This session covers best practices for architecting a near-real-time application on Hadoop, using an end-to-end fraud detection case study as an example. It discusses the options available for ingest, schema design, processing frameworks, storage handlers and other components of the fraud detection application, and walks through each of the architectural decisions among those choices.
A presentation by Ted Dunning of MapR on why streaming matters, made to the Hadoop User Group (HUG) Ireland at Hadoop Summit on April 12th 2016. This presentation covers streaming and why it is so important in any big data solution.
Where are we now: IPv6 deployment update - Brunei National IPv6 Day Conference | APNIC
This document discusses IPv6 deployment and the choices network operators face as IPv4 addresses run out. It describes IPv6 and the need to transition due to IPv4 address exhaustion. The main choices for network operators are to do nothing and rely solely on IPv4, prolong IPv4 usage through widespread NAT deployment and IPv4 address trading, or deploy IPv6 through dual stack or transition technologies. Each option has advantages and disadvantages relating to address availability, network architecture requirements, and supporting new protocols.
I gave this talk at Buzzwords just now to fill in for an ill speaker.
The topics include things that are being added to or taken out of Mahout. These include cruft (out), fast clustering (in), nearest neighbor search (in), Pig bindings for Mahout (who knows).
The document discusses evolving HDFS to support generalized storage containers in order to better scale the number of files and blocks. It proposes using block containers and a partial namespace approach to initially scale to billions of files and blocks, and eventually much higher numbers. The storage layer is being restructured to support various container types for use cases beyond HDFS like object storage and HBase.
Hadoop & cloud storage object store integration in production (final) | Chris Nauroth
Today's typical Apache Hadoop deployments use HDFS for persistent, fault-tolerant storage of big data files. However, emerging architectural patterns increasingly rely on cloud object storage such as S3, Azure Blob Store and GCS, which are designed for cost-efficiency, scalability and geographic distribution. Hadoop supports pluggable file system implementations to enable integration with these systems for use cases such as off-site backup or even complex multi-step ETL, but applications may encounter unique challenges related to eventual consistency, performance and differences in semantics compared to HDFS. This session explores those challenges and presents recent work to address them in a comprehensive effort spanning multiple Hadoop ecosystem components, including the Object Store FileSystem connector, Hive, Tez and ORC. Our goal is to improve correctness, performance, security and operations for users that choose to integrate Hadoop with cloud storage. We use S3 and the S3A connector as a case study.
The Power of Intelligent Flows: Real-Time IoT Botnet Classification with Apac... | DataWorks Summit
The last 5 years have been marked by an explosion of Internet-connected devices. From cars to solar power, from TVs to juice makers, modern life is filled with interconnected smart devices.
But while those ubiquitous devices enhance the interaction with the technology that surrounds us, the lifecycle management of IoT firmware and poor security design choices still present a significant threat to our daily lives.
Despite the ascent of threats like the Mirai botnet, the amount of published research around how to programmatically detect new IoTs in the wild has been somewhat limited.
In this presentation we introduce Data Engineering in the context of cyber security, discuss why it is important to move away from the view that security log pipelines are enrichment and indicator matching tools, and push the boundaries of “Simple Event Processing” to demonstrate how Apache NiFi and Apache MiNiFi’s feature rich dataflows can be used to dynamically identify new IoT botnet activities in the wild.
Speakers
Andre Fucs De Miranda, Independent Consultant, Fluenda
Andy LoPresto, Sr. Member of Technical Staff, Hortonworks
We’re living in an era of digital disruption, where the accessibility and adoption of emerging digital technologies are enabling enterprises to reimagine their businesses in exciting new ways. Data flows from the edge to the core to the cloud while performing analytics and gaining actionable intelligence at all steps along the way. This connected, automated and data-driven future enables organizations to rapidly acquire, analyze, and take action on real-time data as well as curate flows for additional analysis at a later stage. New IoT use cases require enterprises to properly handle data in motion and create newer edge applications with data flow management, stream processing and analytics while still being governed by existing enterprise services.
This session highlights the importance of an edge-to-core-to-cloud digital infrastructure that can adapt to your flexing business needs, capturing expanding data flows at the edge and aligning them to a core infrastructure that can drive insight.
Speakers
Bob Mumford, Hewlett Packard Enterprise, Big Data Solutions Architect
Ozone is an object store that can be built into HDFS to provide highly scalable object storage capabilities. It uses a hashing algorithm to map object keys to storage containers, which are then distributed across data nodes similarly to HDFS blocks. The storage containers are managed by a storage container manager that maintains metadata about container locations and performs functions like replication. This allows Ozone to provide secure, reliable storage of trillions of objects with a wide range of sizes.
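The key-to-container mapping described in this summary can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the function name `container_for_key`, the use of SHA-256, and the simple modulo placement are not taken from Ozone itself, whose actual algorithm the summary does not specify.

```python
import hashlib


def container_for_key(object_key: str, num_containers: int) -> int:
    """Map an object key to a container index (hypothetical scheme).

    SHA-256 plus modulo is an illustrative choice, not Ozone's algorithm.
    """
    if num_containers <= 0:
        raise ValueError("num_containers must be positive")
    digest = hashlib.sha256(object_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_containers


if __name__ == "__main__":
    for key in ("logs/2019/01/30.gz", "images/cat.png"):
        print(key, "->", container_for_key(key, 16))
```

The same key always lands in the same container, which is what lets a lookup find an object without a per-object directory entry. A plain modulo scheme like this one reshuffles most keys when the container count changes, so a production system would need something more careful.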
This document discusses a solution for cooperative data exploration using IPython Notebooks and a shared Spark application. The solution allows multiple users to access in-memory results from a single Spark application running on a cluster. Users can connect IPython Notebooks to the shared SparkContext and SqlContext via Py4J to collaborate on exploring big data in a transparent manner without data duplication.
This document discusses Apache Kafka and how it can be used by Oracle DBAs. It begins by explaining how Kafka builds upon the concept of a database redo log by providing a distributed commit log service. It then discusses how Kafka is a publish-subscribe messaging system and can be used to log transactions from any database, application logs, metrics and other system events. Finally, it discusses how schemas are important for Kafka since it only stores messages as bytes, and how Avro can be used to define and evolve schemas for Kafka messages.
The document discusses how Apache Ambari can be used to streamline Hadoop DevOps. It describes how Ambari can be used to provision, manage, and monitor Hadoop clusters. It highlights new features in Ambari 2.4 like support for additional services, role-based access control, management packs, and Grafana integration. It also covers how Ambari supports automated deployment and cluster management using blueprints.
Apache Tez - A New Chapter in Hadoop Data Processing | DataWorks Summit
Apache Tez is a framework for accelerating Hadoop query processing. It is based on expressing a computation as a dataflow graph and executing it in a highly customizable way. Tez is built on top of YARN and provides benefits like better performance, predictability, and utilization of cluster resources compared to traditional MapReduce. It allows applications to focus on business logic rather than Hadoop internals.
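To make the idea of "expressing a computation as a dataflow graph" concrete, here is a small sketch in plain Python. It does not use the Tez API. The helper `run_dataflow` and the task names are hypothetical, and the sketch runs tasks sequentially in dependency order, whereas a real engine would schedule them across a cluster.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dataflow(tasks, dependencies):
    """Run each task after the tasks it depends on (hypothetical helper).

    tasks:        name -> callable taking the outputs of its dependencies
    dependencies: name -> list of upstream task names
    """
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        inputs = [results[dep] for dep in dependencies.get(name, ())]
        results[name] = tasks[name](*inputs)
    return results


if __name__ == "__main__":
    tasks = {
        "read": lambda: [1, 2, 3, 4],
        "square": lambda xs: [x * x for x in xs],
        "total": lambda xs: sum(xs),
    }
    dependencies = {"square": ["read"], "total": ["square"]}
    print(run_dataflow(tasks, dependencies)["total"])
```

The application only declares vertices and edges. The order of execution falls out of the graph, which is the separation between business logic and execution engine that the summary describes.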
International Refereed Journal of Engineering and Science (IRJES) is a peer-reviewed online journal for professionals and researchers in the field of computer science. Its main aim is to resolve emerging and outstanding problems revealed by recent social and technological change. IRJES provides a platform for researchers to present and evaluate their work from both theoretical and technical aspects and to share their views.
The document discusses in-flux limiting for a multi-tenant logging service. It describes Symantec's logging and metrics architecture using Kafka, Elasticsearch, and InfluxDB. It addresses the issue of ingestion spikes overwhelming InfluxDB and presents a solution to normalize event rates using buffers that allocate ingestion quotas per tenant. The design implements rate limiting using a scheduled task pattern in Storm to track each tenant's event rate over a configurable window and throttle events if the threshold is exceeded.
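The per-tenant throttling idea can be sketched independently of Storm. The class below is a minimal fixed-window limiter written for illustration only: the name `TenantRateLimiter`, its parameters, and the fixed-window policy are assumptions, not the design from the presentation, which tracks rates with a scheduled task inside a Storm topology.

```python
from collections import defaultdict


class TenantRateLimiter:
    """Fixed-window event limiter keyed by tenant (hypothetical sketch)."""

    def __init__(self, max_events: int, window_seconds: float):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._window_start = {}          # tenant -> start time of its window
        self._counts = defaultdict(int)  # tenant -> events seen in the window

    def allow(self, tenant: str, now: float) -> bool:
        """Return True if the event may pass, False if it is throttled."""
        start = self._window_start.get(tenant)
        if start is None or now - start >= self.window_seconds:
            # Open a new window for this tenant and reset its counter.
            self._window_start[tenant] = now
            self._counts[tenant] = 0
        if self._counts[tenant] >= self.max_events:
            return False
        self._counts[tenant] += 1
        return True


if __name__ == "__main__":
    limiter = TenantRateLimiter(max_events=2, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0, 10.0):
        print(t, limiter.allow("tenant-a", now=t))
```

Passing `now` explicitly keeps the sketch deterministic and easy to test. Because each tenant has its own counter, a spike from one tenant does not consume another tenant's quota.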
HDFS Tiered Storage: Mounting Object Stores in HDFS | DataWorks Summit
Most users know HDFS as the reliable store of record for big data analytics. HDFS is also used to store transient and operational data when working with cloud object stores, such as Azure HDInsight and Amazon EMR. In these settings, but also in more traditional on-premises deployments, applications often manage data stored in multiple storage systems or clusters, requiring a complex workflow for synchronizing data between filesystems to achieve goals for durability, performance, and coordination.
Building on existing heterogeneous storage support, we add a storage tier to HDFS to work with external stores, allowing remote namespaces to be "mounted" in HDFS. This capability not only supports transparent caching of remote data as HDFS blocks, it also supports synchronous writes to remote clusters for business continuity planning (BCP) and supports hybrid cloud architectures.
This idea was presented at last year’s Summit in San Jose. Lots of progress has been made since then and the feature is in active development at the Apache Software Foundation on branch HDFS-9806, driven by Microsoft and Western Digital. We will discuss the refined design & implementation and present how end-users and admins will be able to use this powerful functionality.
Data Guarantees and Fault Tolerance in Streaming Systems | DataWorks Summit
Does your big data streaming pipeline have a hole in its pocket? Streaming involves gathering data, processing it and delivering the results to the intended destinations in real time. Glitches at any stage can cause data loss unless the products employed in the pipeline provide the necessary guarantees and are configured properly to deliver on those guarantees.
Realtime stream processing brings unique challenges with respect to data handling guarantees and fault tolerance. Each streaming product comes with a unique approach to tackle these problems. When assembling a streaming pipeline, it is important to understand this critical topic for proper selection and configuration of the individual components of the pipeline. The exercise to determine if you are missing some records in your data lake can be expensive, but it can be extremely difficult to track down the cause to prevent it from recurring.
To help you build reliable streaming pipelines, this talk will give you a better understanding of the problems involved in realtime streaming, the kinds of guarantees involved and how they are handled in popular open source products such as Storm, Flink, Kafka, Hive Streaming APIs and Flume.
Speaker
Roshan Naik, Senior MTS, Hortonworks
The document discusses using Apache Kafka for event detection pipelines. It describes how Kafka can be used to decouple data pipelines and ingest events from various source systems in real-time. It then provides an example use case of using Kafka, Hadoop, and machine learning for fraud detection in consumer banking, describing the online and offline workflows. Finally, it covers some of the challenges of building such a system and considerations for deploying Kafka.
The document discusses various techniques for transitioning from IPv4 to IPv6, including dual stack, tunnels, and translation. Dual stack allows simultaneous support of both IPv4 and IPv6 by keeping both protocol stacks. Tunnels encapsulate IPv6 packets in IPv4 packets to carry IPv6 traffic over IPv4 networks. Translation techniques like NAT64 algorithmically translate IPv4 and IPv6 addresses to allow communication between IPv4-only and IPv6-only nodes. Newer methods like 464XLAT and DS-Lite aim to address IPv4 exhaustion by sharing public IPv4 addresses among more clients.
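The "algorithmic" address translation mentioned for NAT64 can be illustrated with the standard library. This sketch assumes the RFC 6052 well-known prefix `64:ff9b::/96`, in which the IPv4 address occupies the low 32 bits of the IPv6 address. The function names are hypothetical, and the sketch covers only address mapping, not the packet translation a NAT64 gateway performs.

```python
import ipaddress

# RFC 6052 well-known prefix for IPv4-embedded IPv6 addresses.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_ipv6(ipv4: str, prefix=WELL_KNOWN_PREFIX) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of a /96 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def extract_ipv4(ipv6: str) -> ipaddress.IPv4Address:
    """Recover the embedded IPv4 address from the low 32 bits."""
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(ipv6)) & 0xFFFFFFFF)


if __name__ == "__main__":
    mapped = synthesize_ipv6("192.0.2.33")
    print(mapped)                     # 64:ff9b::c000:221
    print(extract_ipv4(str(mapped)))  # 192.0.2.33
```

Because the mapping is a fixed bit layout, the translator needs no per-address state to convert between the two forms. Per-flow state is still needed on the IPv4 side of a stateful NAT64, which this sketch does not model.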
Our scheme forgoes IP addresses entirely and instead uses hostnames as identifiers in packets. The scalability of routing is ensured by encapsulating these packets in a highly aggregated routing allocator. We use autonomous system numbers (ASNs). Here we present a data experiment which shows that a much simpler and more scalable routing for the future Internet is possible by using fewer identifiers for its entities.
IPv6 is the next generation Internet Protocol that provides a vastly larger number of IP addresses compared to the current IPv4. It features 128-bit addressing which allows for trillions of devices to have unique IP addresses. IPv6 also aims to make networking more secure and allow for more efficient routing. The transition from IPv4 to IPv6 is underway, with most modern operating systems and network hardware now supporting IPv6, though applications support is still growing. IPv6's expanded addressing capabilities and additional features will help meet future demands on the Internet as more devices connect online.
1) The document discusses 6LoWPAN (IPv6 over Low-Power Wireless Personal Area Networks), which allows IPv6 packets to be sent over IEEE 802.15.4 low-power networks.
2) A key challenge is that the large IPv6 address and header do not fit efficiently into the small 802.15.4 frames, so 6LoWPAN defines header compression methods.
3) 6LoWPAN defines a dispatch byte and optional headers for mesh routing, header compression, and fragmentation to optimize IPv6 packets for transmission over 802.15.4 networks.
Selective placement of caches for hash based off-path caching in icn slidesAnshuman Kalla
Cache Allocation in Information Centric Networking
Selective Placement of Caches in ICN
Selective versus Pervasive In-Network Caching in ICN
Hash-Based Off-Path Caching in ICN
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Community
This document discusses best practices for implementing Ceph-powered storage as a service. It covers planning a Ceph implementation based on business and technical requirements. Various use cases for Ceph are described, including OpenStack, cloud storage, web-scale applications, high performance block storage, archive/cold storage, databases and Hadoop. Architectural considerations for redundancy, servers, networking are also discussed. The document concludes with a case study of a university implementing a Ceph-based storage cloud to address storage needs for cancer and genomic research data.
PLNOG 13: B. van der Sloot, S. Abdel-Hafez: Running a 2 Tbps global IP networ...PROIDEA
Bart van der Sloot joined FiberRing as Managing Director in April 2014, with the objectives to further grow and enhance the footprint, quality and business of FiberRing’s 2 Tbps global IP network, covering over 50 locations on 3 continents. From 1999 to 2013 Bart worked at Global Crossing (acquired by Level 3 in 2011), where he developed staffing, systems and business processes for Global Crossing’s European brand new sales team, built and coached a Wholesale Sales team to sign new telecom customers and grow revenues in various countries across Europe, led Global Crossing’s expansion into Central and Eastern Europe and established Level 3’s position in the Benelux broadcast market.
Samer Abdel-Hafez joined the FiberRing network team in December 2013 as Network Design Engineer. Samer’s responsibilities within the team include planning capacity for the large traffic volume of FiberRing, arranging interconnections in new locations and markets, designing advanced ad-hoc solutions for the FiberRing network and customers and advising the Network support team on day to day issues.
Abstract: FiberRing operates one of the largest content networks in the world, peaking at over 2 Tb/s. In order to facilitate troubleshooting, detect attacks and save important data such as router configurations, we use a series of tools, mostly developed in house or open source.
The key point of this presentation is to describe how FiberRing is using these tools for:
monitoring: FiberRing makes extensive use of Opsview (Nagios) and NMIS. We utilise Opsview for alerts and reporting and NMIS for detailed traffic analysis.
capacity planning: FiberRing chose PMACCT as netflow collector software and implemented an in-house front-end solution that helps us locate strategic peering partners and explore ways to reduce the costs to deliver our content.
DDoS attack detection: Like every large hosting provider, we are regularly the target of DDoS attacks. We run a set of Linux boxes running nfcapd to collect traffic flows with one-minute, per-host granularity. This gives us great flexibility and very valuable data to quickly detect attacks and take corrective action.
routers’ configuration backups: FiberRing is actively involved in the development of Oxidized, an innovative configuration backup tool which positions itself as a RANCID replacement.
1) The document discusses optimizing NFV placement in OpenStack clouds through efficient resource placement strategies.
2) It proposes extending the OpenStack scheduler to implement a "smart scheduler" using analytics and constraints-based optimization to jointly schedule compute, storage, and networking resources in an energy-efficient manner.
3) A demo showed placing NFV service VMs with affinity constraints for specific storage volumes on nearby physical servers in an optimal way using the proposed smart scheduler approach.
The document introduces RINA (Recursive InterNetwork Architecture), a new networking architecture that aims to address structural problems with the OSI reference model. RINA defines a single type of layer that repeats as needed, providing the same inter-process communication (IPC) service between applications. It separates mechanisms from policies to simplify management. RINA allows for incremental deployment alongside existing technologies and is being researched through open source projects and standardization efforts.
This document discusses various techniques for IPv6 transition and coexistence with IPv4, including:
- Dual-stack which allows simultaneous support of both IPv4 and IPv6.
- Tunnels which encapsulate IPv6 packets in IPv4 packets to provide IPv6 connectivity through IPv4 networks.
- Translation techniques like NAT64 which allow communication between IPv4-only and IPv6-only nodes.
Network Management and Flow Analysis in Today’s Dense IT EnvironmentsSolarWinds
For more information on NetFlow Traffic Analyzer visit: http://www.solarwinds.com/products/network-traffic-analyzer/info.aspx
For more information on IP SLA visit: http://www.solarwinds.com/products/ip-sla-monitoring/info.aspx
Watch this webcast: http://www.solarwinds.com/resources/webcasts/network-management-and-flow-analysis-in-today-dense-it-environments.html
In the 1990’s, when the Internet and enterprise network build-out occurred you had to manage individual intersite WAN connections and single-purpose networking equipment. Network management required managing devices and their basic functions. Today, multifunctional and virtual devices are common. New device types and services are going to market every day.
During this webcast, we discuss how network management and flow analysis have evolved to keep pace with today’s complicated and dense IT environments. Specifically we’ll discuss managing data centers, changes in WAN technologies and WAN management, and best practices for flow analysis, including where to deploy flow exporters.
From Jisc's campus network engineering for data-intensive science workshop on 19 October 2016.
https://www.jisc.ac.uk/events/campus-network-engineering-for-data-intensive-science-workshop-19-oct-2016
Archiving data from Durham to RAL using the File Transfer Service (FTS)Jisc
From Jisc's campus network engineering for data-intensive science workshop on 19 October 2016.
https://www.jisc.ac.uk/events/campus-network-engineering-for-data-intensive-science-workshop-19-oct-2016
Update on IRATI technical work after month 6Eleni Trouva
This document provides an update on technical work in IRATI after month 6, including a description of use cases for integration testing and cloud/network integration, refinement of RINA specifications like the shim DIF over Ethernet and forwarding table generator, and an overview of the high-level software architecture and mapping of RINA concepts to the IRATI implementation. It outlines components like application processes, the IPC process daemon, IPC manager daemon, and supporting libraries.
Spark and Couchbase: Augmenting the Operational Database with SparkSpark Summit
The document discusses integrating Couchbase NoSQL with Apache Spark for augmenting operational databases with analytics. It outlines architectural alignment between Couchbase and Spark, including automatic data sharding and locality, data streaming replication from Couchbase to Spark, predicate pushdown to Couchbase global indexes from Spark, and flexible schemas. Integration points discussed include using the Couchbase data locality hints in Spark, limitations on predicate pushdown for Couchbase views and N1QL, and using the Couchbase change data capture protocol for low-latency data streaming into Spark Streaming.
12.00 - Dr. Tim Chown - University of SouthamptonIPv6 Summit 2010
1) The university deployed IPv6 in a phased approach over many years, first running it in 1997 and now having a large dual-stack production network.
2) They took a dual-stack approach to allow existing IPv4 systems while gaining experience with IPv6. Managing the complexity of dual-stack has been the main challenge.
3) Early experiences included getting IPv6 connectivity, enabling core services like DNS and web servers, and porting internal software. Harder aspects involved multi-addressing, some application support, and security issues like rogue routers.
This paper provides a comparison of the current Internet architecture based on TCP/IP and the proposed future Internet architecture of Named Data Networking (NDN). It discusses key differences in their approaches, components, packet formats, and security implementations. The TCP/IP model uses IP addresses and has a client-server request model, while NDN is information-centric, names content directly, and uses an interest-initiated model. NDN aims to more efficiently distribute popular content, optimize bandwidth usage, and reduce congestion compared to the TCP/IP architecture.
The CBC machine is a common diagnostic tool used by doctors to measure a patient's red blood cell count, white blood cell count and platelet count. The machine uses a small sample of the patient's blood, which is then placed into special tubes and analyzed. The results of the analysis are then displayed on a screen for the doctor to review. The CBC machine is an important tool for diagnosing various conditions, such as anemia, infection and leukemia. It can also help to monitor a patient's response to treatment.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman's Rimland, and Hegemonic Stability theories, the study examines China's role
in Central Asia. It adheres to the empirical epistemological method and takes care to maintain
objectivity. It critically analyzes primary and secondary research documents to elaborate the role of
China’s geo-economic outreach in Central Asian countries and its future prospects. According to this study,
China is seeing significant success in trade, pipeline politics, and gaining influence over other
governments. This success may be attributed to the effective use of key instruments such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Null Bangalore | Pentesters Approach to AWS IAMDivyanshu
#Abstract:
- Learn more about the real-world methods for auditing AWS IAM (Identity and Access Management) as a pentester. So let us proceed with a brief discussion of IAM as well as some typical misconfigurations and their potential exploits in order to reinforce the understanding of IAM security best practices.
- Gain actionable insights into AWS IAM policies and roles, using hands on approach.
#Prerequisites:
- Basic understanding of AWS services and architecture
- Familiarity with cloud security concepts
- Experience using the AWS Management Console or AWS CLI.
- For hands on lab create account on [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
# Scenario Covered:
- Basics of IAM in AWS
- Implementing IAM Policies with Least Privilege to Manage S3 Bucket
- Objective: Create an S3 bucket with least privilege IAM policy and validate access.
- Steps:
- Create S3 bucket.
- Attach least privilege policy to IAM user.
- Validate access.
- Exploiting IAM PassRole Misconfiguration
- Allows a user to pass a specific IAM role to an AWS service (EC2), typically used for service access delegation. A PassRole misconfiguration can then be exploited to gain unauthorized access to sensitive resources.
- Objective: Demonstrate how a PassRole misconfiguration can grant unauthorized access.
- Steps:
- Allow user to pass IAM role to EC2.
- Exploit misconfiguration for unauthorized access.
- Access sensitive resources.
- Exploiting IAM AssumeRole Misconfiguration with Overly Permissive Role
- An overly permissive IAM role configuration can lead to privilege escalation by creating a role with administrative privileges and allow a user to assume this role.
- Objective: Show how overly permissive IAM roles can lead to privilege escalation.
- Steps:
- Create role with administrative privileges.
- Allow user to assume the role.
- Perform administrative actions.
- Differentiation between PassRole vs AssumeRole
Try at [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
A constructive review of in network caching a core functionality of icn slides
1. A Constructive Review of In-Network
Caching: A Core Functionality of ICN*
Anshuman Kalla
* A. Kalla and S. K. Sharma, "A constructive review of in-network caching: A core functionality
of ICN," 2016 International Conference on Computing, Communication and Automation
(ICCCA), Noida, 2016, pp. 567-574. DOI: 10.1109/CCAA.2016.7813785
Square brackets ‘[ ]’ denote the reference numbers as per the reference list in the paper
4. Introduction
• ICN conceives caching at the network layer as one of its indispensable core functionalities
– beyond the premise of the end-to-end principle
• Moreover, ICN advocates named contents instead of named hosts
• Together the two functionalities result in content-aware in-network caching
6. Introduction
• The idea is to allow caching at the network layer
– That is, routers are configured with Content Stores (cache facilities) that enable them to cache the contents traversing them
• Thus every node, in addition to routing, buffering and forwarding operations,
– should perform caching of (traversing) contents
7. Review of Literature
March 7, 2017
Aim of review of In-Network Caching:
• Factors Affecting In-Network Caching
• Relevant Performance Metrics
• Network Topologies Exploited
• Traffic Patterns Fed
• Simulators Available for Evaluation
• Issues Related to In-Network Caching
• Advantages of In-Network Caching
13. Issues Related to TCP/IP Networking [1],[2]
1. Data Dissemination & Service Access (prominent usage today)
– Current networking was tailored to share networking resources
2. Named Hosts (i.e. IP addresses are what actually exist in the current network)
– Content name (identifier) must be mapped to an IP address (locator), i.e. DNS lookup
3. Mobility (was least imagined when TCP/IP was designed)
– Leads to intermittent connectivity and results in a change of IP address
4. Availability (of content and/or service with min. possible latency)
– Dependent on node/link/server state
5. Security (implies comm. over secured channel & trusted server)
– So far implemented at network level but missing at content level
6. Flash Crowd leads to congestion, DoS, poor QoS etc.
* See paper for all the references
17. The Trend For Problem Solving
• Dedicated patch(es) for each problem encountered (e.g.)
– CDN and P2P for data dissemination
– DNS for Named Hosts (i.e. to resolve a name to an IP address)
– Mobile IP for mobility
– DNSSEC and IPsec for security
– Web caching or CDN for availability
• These patches/fixes are add-ons (not integral)
– Thus transforming TCP/IP networking into a complex & delicate architecture
• Shift in primary usage of networking facility
– Instead of sharing network resources, the prime usage is content centric
• Lately researchers realized the need for a clean-slate approach
– To reconcile all the issues (and the shift in usage) in a unified manner
18. Core Functionalities of ICN
• Named content
• In-network caching
• Name-based routing
• Data-level security
• Multi-path routing
• Hop-by-hop flow control
• Pull-based communication
• Adaptability to multiple simultaneous connectivities
20. Types of In-Network Caching in ICN
In-Network Caching
• On-Path Caching
• Off-Path Caching
• Edge Caching
• Hybrid Caching
23. Types of In-Network Caching in ICN
• On-Path Caching
– Caches the retrieved contents at the intermediate nodes that fall on the (symmetrical) way back from the server to the requester
– Thus an interest taps the nodes falling on the path from requester to server
• Off-Path Caching
– Appoints node(s) as dedicated cache(s) for a retrieved content
– Selected caches have no contrived correlation with the nodes that fall on the path followed by the interest to reach the server
• Edge Caching
– Opposes pervasive in-network caching
– Only the nodes at the boundary of a network are enabled with caching capability
24. Types of In-Network Caching in ICN
[Figure: three copies of a topology with routers R1 to R8, showing the interest packet and data packet paths and the nodes that could cache data in each case]
• On-Path Caching (R1, R2, R3, R6 – On-Path Caches)
• Off-Path Caching (R4 – Designated Off-Path Cache)
• Edge Caching (R6 – Edge Cache)
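The three caching types can be contrasted with a small sketch. The links of the topology in the figure are not recoverable from this transcript, so the interest path below is a hypothetical one that merely matches the listed on-path caches. Choosing the off-path cache by hashing the content name is an assumed scheme used for illustration, not necessarily the one shown on the slide. All function names are hypothetical.

```python
import hashlib


def on_path_caches(interest_path):
    """Every router the interest traverses may cache the returning data."""
    return list(interest_path)


def off_path_cache(content_name, all_routers):
    """Pick one designated router by hashing the content name (assumed scheme)."""
    digest = hashlib.sha256(content_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(all_routers)
    return all_routers[index]


def edge_caches(interest_path, edge_routers):
    """Only boundary routers on the path are allowed to cache."""
    return [router for router in interest_path if router in edge_routers]


routers = ["R1", "R2", "R3", "R4", "R5", "R6", "R7", "R8"]
path = ["R6", "R3", "R2", "R1"]   # hypothetical interest path, requester side first

on_path = on_path_caches(path)
designated = off_path_cache("/videos/intro", routers)
edge = edge_caches(path, {"R6"})
```

With a hash-based choice the designated router is independent of the interest path, so it may or may not lie on it. That independence is the property that separates off-path from on-path caching.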
28. Advantages of In-Network Caching in ICN
1. Cost Effective Data Retrieval
– Minimizes delegation of traffic for cached contents over egress links
– Thereby minimizes traffic over expensive external links and server load
2. Reduction in Latency
– Since contents are cached at comparatively closer intermediate nodes
– Thereby improves Quality-of-Service (QoS) perceived by users
3. Heavy Load Handling
– Caching transforms nodes into legitimate proxies of origin server
– Thereby inherently tackles heavy load situations like flash crowd
4. Efficient Retransmissions
– Caching allows retransmission of content’s cached copy from closest node
– Thereby ensures better resiliency to packet losses
30. Advantages of In-Network Caching in ICN
5. Higher Availability
– More legitimate proxies of the server (i.e. caches) improve content availability
– Thereby reduces the probability of a Denial of Service (DoS) attack
6. Buoyancy to Intermittent Connectivity
– Caching inherently allows the network to sustain intermittent connectivity
– Also allows mobile nodes to act as a network medium for areas not covered by the network
33. Issues Related to In-Network Caching in ICN
1. Cache Placement or Allocation
– Where to place the caches (i.e. content stores)?
– That is, whether to provide a caching facility at all nodes or only at selected nodes in a network
– Edge nodes / core nodes / central nodes / strategically selected nodes
2. Cache Size Dimensioning
– What should be the size of the caches?
– That is, whether to allow homogeneous or heterogeneous caches
– In the heterogeneous case, where to provision comparatively larger caches
3. Content Placement
– Where to cache a retrieved content within a network?
– That is, which node(s) should cache the retrieved content to improve performance
– In a centralized or decentralized manner (explicit or implicit coordination)
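For the cache placement question above, one possible way to pick "central" nodes is to rank routers by degree, that is, by their number of neighbours. The sketch below assumes this criterion purely for illustration. The slides do not state which centrality measure should be used, and the topology and the function name `select_cache_nodes` are hypothetical.

```python
def select_cache_nodes(adjacency, budget):
    """Return the `budget` routers with the highest degree (ties broken by name)."""
    ranked = sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))
    return ranked[:budget]


# Hypothetical five-router topology given as an adjacency mapping.
topology = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

chosen = select_cache_nodes(topology, 2)
```

Other criteria, such as placing caches only at edge nodes, fit the same interface by changing the sort key.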
35. Issues Related to In-Network Caching in ICN
4. Content Selection
– What to cache out of the huge flow of contents?
– That is, to identify the contents from the content catalog that are profitable to cache
– Could be performed even after content placement if the placement mechanism is oblivious of the content’s utility characteristics
5. Replacement Policy
– Which cached content should be evicted to accommodate an incoming content?
– That is, when the cache is full, which resident content is to be evicted to cache the retrieved content
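A replacement policy answers the eviction question above. The sketch below uses Least Recently Used (LRU), chosen here only as a familiar example; the slides do not name a specific policy. The class name `LRUContentStore` and its methods are hypothetical.

```python
from collections import OrderedDict


class LRUContentStore:
    """Fixed-capacity content store that evicts the least recently used content."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()   # oldest entry first, most recent last

    def get(self, name):
        """Return cached data (and mark it as recently used) or None on a miss."""
        if name not in self._items:
            return None
        self._items.move_to_end(name)
        return self._items[name]

    def put(self, name, data):
        """Cache the content and return the name of the evicted content, if any."""
        evicted = None
        if name in self._items:
            self._items.move_to_end(name)
        elif len(self._items) >= self.capacity:
            evicted, _ = self._items.popitem(last=False)
        self._items[name] = data
        return evicted


store = LRUContentStore(capacity=2)
store.put("/a", b"1")
store.put("/b", b"2")
store.get("/a")                      # "/a" becomes the most recently used
evicted = store.put("/c", b"3")      # cache is full, so the oldest entry goes
```

Because "/a" was requested just before "/c" arrived, the resident content evicted here is "/b".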
36. Factors Affecting In-Network Caching in ICN
1. Network topology
– Awareness of the topology might be crucial for making caching decisions
2. Size of Content Population (Content Catalog)
– Total number of distinct contents for which requests could be received
3. Popularity Distribution
– Plays a vital role, but popularity estimation is itself a challenging task
4. Popularity Dynamics
– Percentage and/or frequency of change in popularity of contents
5. Latency
– Measured in hop count or distance; used to trigger caching decisions
6. Bandwidth
– Bandwidth available over the retrieval path is another factor used in caching decisions
7. Cache size per node
– Homogeneously or heterogeneously sized caches are used to analyze caching performance
8. Granularity of content
– Entire object, packet, or chunk; the granularity may affect performance
9. Size of Content
– Homogeneously sized (small or large) or heterogeneously sized contents
10. Pricing (cost involved in fetching contents)
– Used to prioritize caching of costlier contents
11. Mobility
– Movement patterns of users, exploited for prefetching-based caching
12. Routing
– Multipath routing affects the caching performance differently
13. Spatial Locality
– Access patterns of users in a geographical area, used for caching decisions
14. Social Networking
– Caching of contents accessed or produced by socially active & influential
users
50. Performance Metrics For In-Network Caching
1. Hit Ratio
– Ratio of the number of requests satisfied by caches to the total number of requests
– The higher the hit ratio, the better the caching performance
2. Bandwidth Usage
– Implies usage of expensive external links as well as internal links
– Lower bandwidth usage implies better caching performance
3. Cache Load
– Number of contents to be cached by a content store
– Caches may be homogeneously or heterogeneously loaded
– The latter leads to unbalanced caches and the creation of hot spots
4. Server Load
– Number of content requests arriving at the origin server
– The lower the server load, the better the service provided
5. Latency
– Delay encountered in retrieving requested content
– Lower latency boosts the Quality-of-Experience (QoE) perceived by users
– Thus the reduction in latency achieved is used to gauge caching performance
6. Cache Diversity
– Number of unique contents residing in network caches
– Higher cache diversity improves overall performance
7. Complexity & Overheads
– Caching needs to be simple, light-weight and practically deployable
8. Fairness
– In terms of content selection fairness, link load fairness, popularity
estimation fairness etc.
9. Resiliency to DoS Attack
– Network caching turns caches into legitimate proxies of the origin server
– Thus caches collectively handle a DoS attack in a divide-and-conquer manner
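Three of the metrics above (hit ratio, server load, cache diversity) reduce to simple counts. The sketch below follows the definitions on the slides, assuming the number of cache hits and the contents of each content store are already known. The function names are hypothetical, and the numbers in the usage lines are illustrative only, not measured results.

```python
def hit_ratio(cache_hits, total_requests):
    """Fraction of requests satisfied by caches (0.0 when there are no requests)."""
    if total_requests == 0:
        return 0.0
    return cache_hits / total_requests


def server_load(cache_hits, total_requests):
    """Requests that miss every cache and therefore reach the origin server."""
    return total_requests - cache_hits


def cache_diversity(cache_contents):
    """Number of distinct content names held across all content stores."""
    return len(set().union(*cache_contents))


# Illustrative numbers only: 3 of 4 requests are served from a cache.
print(hit_ratio(3, 4))                                # 0.75
print(server_load(3, 4))                              # 1
print(cache_diversity([{"/a", "/b"}, {"/b", "/c"}]))  # 3
```

Here `/b` is stored in both caches but counted once, which is why redundant copies lower diversity for a given total cache budget.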
59. Real Network Topologies
• Abilene [14]
• Rocketfuel [12]
• CERNET2 [9]
• CAIDA [10]
• CRAWDAD [15]
• CERNET [16]
• GEANT [17]
• Tiger [18]
• GARR [19]
• WIDE [20]
• PlanetLab [21]
* See paper for all the references
60. Fabricated Network Topologies
• Barabasi-Albert (BA) Power Law Model [11]
• Watts-Strogatz (WS) Model [13]
• Boston university Representative Internet Topology gEnerator (BRITE) Tool [22]
• Georgia Tech Internetwork Topology Models (GT-ITM) Tool [23]
• Internet Topology Generator (INET) Tool [24]
61. Traffic Patterns
• Synthetic traffic workloads have been generated using
– Zipf distribution (α ranging from 0.6 to 1.8) and
– Zipf-Mandelbrot distribution (with different values of α and q)
• Real traffic traces that have been used are
– P2P Workload [36]
– LastFM [32]
– Facebook data [33]
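A synthetic workload of the kind listed above can be sketched as follows. The code assumes the usual forms of the two distributions: under Zipf, the content of popularity rank i is requested with probability proportional to 1/i^α, and under Zipf-Mandelbrot with probability proportional to 1/(i+q)^α, so q = 0 gives plain Zipf. The function names, the catalog size, and the seed parameter are hypothetical choices for illustration.

```python
import random


def zipf_mandelbrot_pmf(catalog_size, alpha, q=0.0):
    """Request probabilities for ranks 1..catalog_size; q=0 gives plain Zipf."""
    if catalog_size <= 0:
        raise ValueError("catalog_size must be positive")
    if q <= -1:
        raise ValueError("q must be greater than -1")
    weights = [1.0 / (rank + q) ** alpha for rank in range(1, catalog_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def generate_requests(catalog_size, alpha, n_requests, q=0.0, seed=None):
    """Draw n_requests content ranks (1 = most popular) independently at random."""
    rng = random.Random(seed)  # a fixed seed makes the workload reproducible
    pmf = zipf_mandelbrot_pmf(catalog_size, alpha, q)
    ranks = range(1, catalog_size + 1)
    return rng.choices(ranks, weights=pmf, k=n_requests)


# Example: 1000 requests over a catalog of 100 contents with alpha = 0.8.
requests = generate_requests(catalog_size=100, alpha=0.8, n_requests=1000, seed=1)
```

A larger α concentrates requests on the few most popular contents, while a larger q flattens the head of the distribution.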
62. Network Simulators For ICN
• ccnSim simulator [25]
• CCNx simulator [26]
• OPNET simulator [27]
• CCNPL simulator [28]
• OMNeT++ simulator [29]
• ICARUS simulator [30]
• NEPI simulator [31]
63. References
The reference list is the same as the one used in the original paper