Big Data Analytics and Advanced Computer Networking Scenarios


Published in: Technology
  • Protocols tend to be defined in isolation, however, with each solving a specific problem and without the benefit of any fundamental abstractions. This has resulted in one of the primary limitations of today’s networks: complexity. For example, to add or move any device, IT must touch multiple switches, routers, firewalls, Web authentication portals, etc., and update ACLs, VLANs, quality of service (QoS), and other protocol-based mechanisms using device-level management tools. In addition, network topology, vendor switch model, and software version must all be taken into account. Due to this complexity, today’s networks are relatively static, as IT seeks to minimize the risk of service disruption.
  • SDN also greatly simplifies the network devices themselves, since they no longer need to understand and process thousands of protocol standards but merely accept instructions from the SDN controllers.
  • The open standards (north and south)
  • Suppose you have distributed cloud services that compute and visualize data in different locations. Can you imagine how the network might suffer when transporting massive amounts of data between datacenters? So how can the network support such operations? With current technologies, it can’t.
  • As an example, datacenters can now offer multiple clouds to different tenants, instead of separating virtual networks. This is a more abstract view and facilitates infrastructure management.
  • The FIB (Forwarding Information Base) is a table used to forward Interest packets toward potential sources of their content. The CS (Content Store) acts like the buffer memory of an IP router, but with a different replacement policy: it keeps arriving Data packets for as long as possible (using an LRU or LFU scheme) to maximize the probability of sharing and minimize upstream bandwidth demand. The PIT (Pending Interest Table) keeps track of the IOs recently requested and not yet served.
  • ICN frequently has to validate the binding between names and content. One technique for doing so is known as self-certification, which covers either the whole IO or just pieces of it, depending on the approach chosen. Self-certification ensures that the only way to make unauthorized changes to the data is to change the IO's ID (i.e., the content name). Persistent names ensure that content names do not change when the storage location changes.
  • Some of these challenges can be tackled by research work on big data.

    1. 1. August 2013 • Institute for Big Data Analytics – Dalhousie University • Big Data Analytics and Advanced Computer Networking Scenarios: Research Challenges and Opportunities • Stenio Fernandes, CIn/UFPE, Recife, Brazil
    2. 2. Agenda • A bit of technical background – Measurements and Analysis in Computer Networks • Advanced Networking Architectures – Software-Defined Networking (SDN) – Information-Centric Networking (ICN) – Network Virtualization (NV) • Tools and Techniques for High-Performance Network Traffic Analysis – Visual Analytics, GPU, MapReduce • Applied Research on Computer Networking – Opportunities and Directions • Research agenda – CIn/UFPE and DalhousieU
    3. 3. TECHNICAL BACKGROUND Measurements and Analysis in Computer Networks
    4. 4. Essential (Core) motivation Profiling Internet traffic • is an essential task for precise network management • at both access and backbone networks It provides useful information for • Proper (re)configuration of networks • Deployment of accurate policies (security, routing, throttling, capping, etc.) • Optimization of network resources • Support for network design and planning • Countering abnormal behavior
    5. 5. Why do operators need Internet profiling? • Network-wide reporting: generating basic information about usage and reliability • Performance/reliability troubleshooting: detecting and diagnosing anomalous events • Security: detecting, diagnosing, and blocking security problems • Traffic engineering: adjusting network configuration to the prevailing traffic • Capacity planning: deciding where and when to install new equipment
    6. 6. Reporting Examples • Total volume of traffic sent to/from each private peer • Mixture of traffic by application (e.g., Web, Streaming, P2P, SPAM) • Mixture of traffic to/from individual customers • Usage, loss, and reliability trends for each link Requirements • Network-wide view of basic traffic statistics • Ability to have different views: by application, by customer, by peer, by link type • Real-time and offline monitoring of high-speed links
    7. 7. Core Network Troubleshooting Detecting and diagnosing problems • Recognizing and explaining anomalous events • Why is a backbone link suddenly overloaded? • Why are DNS queries failing with high probability? • Why does a router processor have high CPU utilization? • Why can a customer not reach certain networks?
    8. 8. Core Security Detecting and diagnosing problems • Recognizing suspicious traffic or disruptions Examples • Denial-of-service attack on a customer or service • Spread of a worm or virus through the network • Router hijack Requirements • Detailed measurements from multiple places • Payload inspection, in some cases • Online analysis of the data • Installing filters to block the offending traffic
    9. 9. Core Traffic Engineering Resource allocation policies • Active queue management and link scheduling • Green networking Examples • Divert traffic from congested links • Balance load on peering links • Link-scheduling weights to reduce delay for premium traffic Requirements • Network-wide view of the traffic carried in the backbone • Timely view of the network topology • Analytical models to assess and predict performance of control operations
    10. 10. Core Capacity Planning Deploying new equipment • What? Where? When? Examples • Where to put the next backbone router • When to upgrade a link to higher capacity • Whether to add/remove a particular peer • Whether the network can accommodate a new customer • Whether to install a caching proxy Requirements • Projections of future traffic patterns from measurements • Cost estimates for buying/deploying the new equipment • Model of the potential impact of the change (e.g., latency reduction and bandwidth savings)
    11. 11. TECHNICAL BACKGROUND Measurements, Analysis, and Modeling
    12. 12. Technical Background: Measurements Packet • More detailed: from link to application layer (with timestamps) • Huge storage and processing requirements • Header or payload (full or partial) Flow • Flow summaries • Connection info, number of packets, duration, volume • IPFIX or Cisco NetFlow v5/v9 records Aggregate • SNMP counters
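To make the packet-versus-flow distinction on this slide concrete, the following is a minimal sketch of how per-packet observations could be summarized into flow records keyed by the usual 5-tuple. The tuple layout, the `FlowRecord` fields, and the sample packets are assumptions made for illustration; this is not the IPFIX or NetFlow record format.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Hypothetical flow summary: packet count, volume, and time span."""
    packets: int = 0
    octets: int = 0
    first_ts: float = 0.0
    last_ts: float = 0.0

    @property
    def duration(self) -> float:
        return self.last_ts - self.first_ts


def build_flows(packets):
    """Aggregate (ts, src, dst, sport, dport, proto, size) tuples by 5-tuple."""
    flows = {}
    for ts, src, dst, sport, dport, proto, size in packets:
        key = (src, dst, sport, dport, proto)
        rec = flows.get(key)
        if rec is None:
            # First packet of a new flow: open a record at this timestamp.
            rec = flows[key] = FlowRecord(first_ts=ts, last_ts=ts)
        rec.packets += 1
        rec.octets += size
        rec.last_ts = max(rec.last_ts, ts)
    return flows


# Made-up packets: three belong to one TCP flow, one to a UDP flow.
packets = [
    (0.00, "10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 60),
    (0.05, "10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 1500),
    (0.10, "10.0.0.3", "10.0.0.2", 5555, 53, "udp", 80),
    (0.30, "10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 40),
]
flows = build_flows(packets)
```

Four packets collapse into two records, which is the storage saving the slide attributes to flow-level measurement; the cost is that per-packet detail such as payload and inter-packet timing is lost.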
    13. 13. Measurements: Packets
    14. 14. Measurements: Flows [Figure: flow monitoring on a live network. Network packets reach the router (flow building), then the flow collector (flow storage), then the GUI (flow analysis and reporting). A sampling technique, applied on-line or off-line, turns the collected, classified flows into a representative flow sample for traffic management and analysis.]
    15. 15. Technical Background: Analysis of Packet Traces IP header • Traffic volume by IP addresses or ASes • Burstiness of the stream of packets • Packet properties (e.g., sizes, out-of-order) Transport header • Traffic breakdown by protocol • TCP congestion and flow control • Number of bytes and packets per session Application header • URLs, HTTP headers, file type • DNS queries and responses • Mobile devices
    16. 16. Core Modelling Exploratory Data Analysis • maximize insight into the data set • extract important variables • detect outliers and anomalies • develop parsimonious models Statistical Inference • Does the data follow a particular PDF? • Maximum Likelihood Estimation • Hypothesis testing
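As a small worked instance of the inference steps listed on this slide, the sketch below fits an exponential distribution to packet inter-arrival times by maximum likelihood and computes a Kolmogorov-Smirnov distance as a goodness-of-fit check. The exponential model and the sample values are assumptions chosen for illustration; measured traffic often does not follow this model.

```python
import math


def exponential_mle(samples):
    """MLE of the exponential rate: lambda_hat = n / sum(x_i)."""
    if not samples or any(x < 0 for x in samples):
        raise ValueError("need non-negative samples")
    return len(samples) / sum(samples)


def ks_statistic(samples, rate):
    """Largest gap between the empirical CDF and the fitted exponential CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        # Compare the model CDF with the empirical step just after and before x.
        d = max(d, abs((i + 1) / n - cdf), abs(cdf - i / n))
    return d


# Hypothetical inter-arrival times in seconds.
inter_arrivals = [0.5, 1.5, 1.0, 2.0, 1.0]
rate_hat = exponential_mle(inter_arrivals)
distance = ks_statistic(inter_arrivals, rate_hat)
```

Only the statistic is computed here. Turning it into a hypothesis test when the rate is estimated from the same data requires adjusted critical values rather than the standard Kolmogorov-Smirnov table.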
    18. 18. Research Challenges: Measurements Network-wide view • Crucial for evaluating control actions • Multiple kinds of data from multiple locations Large scale • Large number of high-speed links and routers • Large volume of measurement data The “do no harm” principle (passive measurements) • Don’t degrade router performance • Don’t require disabling key router features • Don’t overload the network with measurement data
    19. 19. Research Challenges: Packet Measurements Building efficient DPI engines • 1 packet every 5 ns! • Based on DFA/NFA built from regular expressions that express application signatures • For hardware-based or commodity platforms Update of app signatures database • Inspection of encrypted traffic is not possible • Analysis of packet payload is forbidden in a number of countries
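The DFA idea mentioned on this slide can be sketched for the simplest case, a single literal byte signature. The automaton below is the classic Knuth-Morris-Pratt construction, so it handles one fixed string rather than a full regular expression, and the signature and payload are made up for illustration. A production DPI engine would compile many regular expressions into one automaton.

```python
def build_dfa(pattern: bytes):
    """Build dfa[state][byte] -> next state for one literal signature (KMP)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("empty signature")
    dfa = [[0] * 256 for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    restart = 0  # state reached by the pattern with its first byte dropped
    for state in range(1, m + 1):
        # On a mismatch, behave like the restart state would.
        dfa[state] = list(dfa[restart])
        if state < m:
            dfa[state][pattern[state]] = state + 1
            restart = dfa[restart][pattern[state]]
    return dfa


def first_match(dfa, payload: bytes) -> int:
    """Return the offset of the first signature occurrence, or -1."""
    accept = len(dfa) - 1
    state = 0
    for i, byte in enumerate(payload):
        state = dfa[state][byte]  # exactly one table lookup per payload byte
        if state == accept:
            return i - accept + 1
    return -1


signature_dfa = build_dfa(b"GET /")            # hypothetical signature
offset = first_match(signature_dfa, b"xxGEGET /index")
```

The appeal for high-speed links is the fixed cost of one lookup per byte regardless of where the signature sits in the payload; the price is table size, which grows quickly once many complex signatures are combined.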
    20. 20. High-Performance Traffic Monitoring Systems • Large number of application signatures • Complexity of the signature patterns • Unpredictability of signature location in the network flow, as well as within the packet payload • Performance bottlenecks at OS and hardware levels • Visual Analytics
    21. 21. Research Challenges: Flow-level Analysis Tries to identify applications or classes of applications without looking at the payload • May extract high-level models for unsupervised classification and learning Less data volume to analyse • Still tough to do in real time on high-speed links • at 40 Gbps and beyond Address privacy issues for lawful interception
    23. 23. Server, OS, Programming Platforms Several abstraction layers in programming, databases, etc., but not in networking
    24. 24. Networking Services
    25. 25. NEW NETWORKING ARCHITECTURES Software Defined Networking (SDN)
    26. 26. SDN – Motivation Current networks cannot support this growth! -Not service-oriented -Static configuration -Status not available to apps/users -Cannot provide dynamic negotiation to users
    27. 27. Motivation: economics
    28. 28. The Need for a New Network Architecture (The ONF view) • Key computing trends: – Changing traffic patterns • in contrast to client-server applications • today’s apps access different services • access to content and applications from any type of device, anywhere, at any time – The rise of cloud services • agility to access applications, infrastructure, and other IT resources on demand and à la carte – Big data means more bandwidth • Mega datasets are fueling a constant demand for additional network capacity in the data center
    29. 29. Limitations of Current Networking Technologies (The ONF View) • Meeting current market requirements using device-level management tools and manual processes • Complexity that leads to stasis – The static nature of networks is in stark contrast to the dynamic nature of today’s environment • Inconsistent policies – To implement a network-wide policy, thousands of devices and mechanisms must be configured • Inability to scale – traffic patterns are dynamic and unpredictable – users with different apps and performance needs
    30. 30. SDN (the ONF view) • Emerging network architecture where network control is decoupled from forwarding and is directly programmable – Migration of control into accessible computing devices enables the underlying infrastructure to be abstracted for applications and network services • can treat the network as a logical or virtual entity • Network intelligence is (logically) centralized – SDN controllers maintain a global view of the network • Network appears to the applications and policy engines as a single, logical switch – infrastructure gains vendor-independent control over the entire network from a single logical point
    31. 31. SDN Architecture
    32. 32. Motivation: what drives SDN research and development? • Reduced network costs (CAPEX / OPEX) • Support for innovative new products (applications, services) • Synergy with Cloud Computing Services and Infrastructure • And most importantly: real-time network programmability • This is the quest for networks with improved performance while keeping them simple, scalable, and “smart”
    33. 33. Innovation Roadblocks vs. Enablers for Big Data Analytics • Roadblocks – from the Network Layer • Proprietary software in network devices • Developers have to rely on the network as is – Support for data-intensive science and applications • One-size-fits-all approach to network data flows • Enablers – from the Network Layer • Let developers communicate with and program the network itself • Allow developers to optimize the network for specific applications • Support for data-intensive science and applications • Allow special solutions for high-performance data flows • Include support for network programmability
    34. 34. Internet2 SDN use case
    35. 35. Internet2 SDN infrastructure
    36. 36. A Simplified View of SDN 1. A network in which the control plane is physically separate from the forwarding (data) plane • A single control plane controls several forwarding devices
    37. 37. Consequences of SDN adoption 1. Hardware and software from different vendors 2. Simplified programmability 3. Enables application-level control/programming of the network 4. Enables centralized control, which implies simplification of network operations 5. Prospective integration with Network Virtualization technologies (cf. next section)
    38. 38. Supporting SDN with OpenFlow • First standard communications interface for SDN – between the control and forwarding layers • It allows direct access to and manipulation of the forwarding plane of network devices – both physical and virtual (hypervisor-based) • OpenFlow IS NOT SDN!
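The split between a control plane and a forwarding plane described on the preceding slides can be illustrated with a toy match-action model. This is a sketch only: the class names, the `packet_in` method, the dictionary-based match fields, and the policy are all hypothetical, and none of it reproduces the OpenFlow wire protocol or any real controller's API.

```python
class Controller:
    """Logically centralized control plane that owns the forwarding policy."""

    def __init__(self, policy):
        self.policy = policy  # hypothetical policy: destination -> output port

    def packet_in(self, switch, packet):
        """Decide what to do with an unmatched packet and install a rule."""
        port = self.policy.get(packet["dst"])
        action = ("output", port) if port is not None else ("drop",)
        switch.install_rule({"dst": packet["dst"]}, action)
        return action


class Switch:
    """Forwarding plane: it only matches packets against installed rules."""

    def __init__(self, controller):
        self.controller = controller
        self.flow_table = []  # list of (match fields, action)
        self.misses = 0

    def install_rule(self, match, action):
        self.flow_table.append((match, action))

    def handle(self, packet):
        for match, action in self.flow_table:
            if all(packet.get(k) == v for k, v in match.items()):
                return action
        # Table miss: defer the decision to the controller.
        self.misses += 1
        return self.controller.packet_in(self, packet)


controller = Controller({"10.0.0.2": 2})
switch = Switch(controller)
first = switch.handle({"src": "10.0.0.1", "dst": "10.0.0.2"})
second = switch.handle({"src": "10.0.0.1", "dst": "10.0.0.2"})
```

Only the first packet of a flow reaches the controller; later packets hit the installed rule. That is why the switch itself can stay simple, and also why controller load and rule-table size become the scaling concerns raised on the challenge slides.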
    39. 39. SDN - Challenges • North (apps) to South (devices) Traffic Pattern – Needs precise classification systems – Needs model building – At high speed – In real time – Adapt to abrupt and long-term changes – Cope with millions to billions of flows in the short term (e.g., mice flows in a 5-minute time window) • Core challenge: deciding which service policy to apply to a flow (a classification and optimization problem)
    40. 40. OF-based SDN Benefits (1/2) • Centralized control of multi-vendor environments – use SDN-based orchestration and management tools to quickly deploy, configure, and update devices across the entire network • Reduced complexity through automation – develop tools that automate many management tasks • Higher rate of innovation – allowing operators to program and reprogram the network in real time to meet specific business needs and user requirements
    41. 41. OF-based SDN Benefits (2/2) • Increased network reliability and security – define high-level configuration and policy statements • More granular network control – apply policies at a very granular level: session, user, device, and application levels • Better user experience – Centralized network control and state information available to higher-level applications • Infrastructure can better adapt to dynamic user needs – E.g.: Adaptive Video Streaming
    42. 42. SDN: Virtual Cloud
    43. 43. SDN: Research Challenges (1/2) • SDN Architecture Design – accommodating consistency, dependability, and scalability requirements • control plane: centralized or distributed processing? – controller placement problem • How many? Where to place them? How to distribute tasks? – Maximizing fault tolerance and dependable infrastructure • to support high-performance intra-DC data exchange for Big Data Analytics • Optimized Policy Framework – automatic policy transformation
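The controller placement questions on this slide ("How many? Where to place them?") can be illustrated with an exhaustive search on a tiny topology. The sketch assumes one particular objective, the average latency from each switch to its nearest controller, and a made-up five-node line topology. Other objectives (worst-case latency, fault tolerance, load balance) are equally plausible, and brute force is only feasible for small networks.

```python
from itertools import combinations


def all_pairs_latency(nodes, links):
    """Floyd-Warshall shortest-path latencies over undirected links."""
    inf = float("inf")
    dist = {u: {v: (0.0 if u == v else inf) for v in nodes} for u in nodes}
    for u, v, w in links:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def place_controllers(nodes, links, k):
    """Try every set of k sites; keep one minimizing average switch latency."""
    dist = all_pairs_latency(nodes, links)

    def avg_latency(sites):
        return sum(min(dist[n][s] for s in sites) for n in nodes) / len(nodes)

    best = min(combinations(nodes, k), key=avg_latency)
    return best, avg_latency(best)


# Hypothetical topology: five switches in a line, 1 ms per link.
nodes = ["A", "B", "C", "D", "E"]
links = [("A", "B", 1.0), ("B", "C", 1.0), ("C", "D", 1.0), ("D", "E", 1.0)]
one_site, one_latency = place_controllers(nodes, links, 1)
two_sites, two_latency = place_controllers(nodes, links, 2)
```

Even this toy case shows the trade-off: going from one controller to two halves the average latency, but several placements tie, so the latency objective alone does not settle where the controllers should go.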
    44. 44. SDN Challenges (2/2) • Resiliency to security and DoS attacks – Vulnerability in the Control Plane • Multi-Dimensional Aggregation of Rules – Use multi-dimensional tags – Ensure policy consistency • Example: Mobile Infrastructure
    45. 45. NEW NETWORKING ARCHITECTURES Network Virtualization
    46. 46. NV: concepts • What is NV? – Decoupling of the services provided by a (virtualized) network from the physical network • A virtual network is a “container” of network services (L2 - L7) provisioned by software – Faithful reproduction of the services provided by the physical network • Analogy to a VM – complete reproduction of a physical machine (CPU, memory, I/O, etc.)
    47. 47. NV: concepts
    48. 48. Business Model for NV Players: 1. InP: Infrastructure Provider 2. Virtual Network Provider/Operator 3. SP: Service Provider 4. End-user
    49. 49. NV: Mapping problem
    50. 50. NEW NETWORKING ARCHITECTURES Information-Centric Networking (ICN)
    51. 51. ICN: Motivation • The traditional Internet communication model is based on end-to-end communication • There is a growing need for highly scalable and efficient distribution of content – CDN is a success, although it might be seen as a patch • Information-driven communication breaks the traditional packet-based model, allowing content-centric communication – ICN architectures take advantage of • in-network storage • multiparty communication • interaction models (e.g., publish-subscribe)
    52. 52. ICN: Technical Background • New location-independent approach to communication – more suitable for content distribution • ICN architectures are replacing where with what • Ruled by the consumers of data – Interest and Data packets • i) a content consumer asks for some content by broadcasting its Interest to all nodes it can reach • ii) any node that receives the Interest packet and has the content responds with a Data packet
    53. 53. ICN: Technical Background • The basic operation of an ICN node is similar to that of an IP host – A packet arrives on an interface • A longest-match lookup is performed on its name • Building blocks for ICN architectures – Information Objects – Content Naming – Security – Content Forwarding – In-Network Caching – Routing and Transport
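The Interest/Data exchange and the longest-match lookup described on these slides can be sketched as a single node holding three tables: a content store with LRU replacement, a pending interest table, and a FIB keyed by name prefix. The class, method names, return tuples, face labels, and the slash-separated hierarchical names are all assumptions for illustration; this does not reproduce any specific ICN implementation.

```python
from collections import OrderedDict


class IcnNode:
    """Toy ICN node: content store (CS), pending interests (PIT), and FIB."""

    def __init__(self, cache_size=2):
        self.cs = OrderedDict()  # name -> data, oldest entry first (LRU)
        self.pit = {}            # name -> set of faces waiting for the data
        self.fib = {}            # name prefix (tuple of components) -> face
        self.cache_size = cache_size

    def add_route(self, prefix, face):
        self.fib[tuple(prefix.strip("/").split("/"))] = face

    def _longest_prefix(self, name):
        parts = tuple(name.strip("/").split("/"))
        for n in range(len(parts), 0, -1):  # try the longest prefix first
            face = self.fib.get(parts[:n])
            if face is not None:
                return face
        return None

    def on_interest(self, name, in_face):
        if name in self.cs:                  # cache hit: answer locally
            self.cs.move_to_end(name)
            return ("data", in_face, self.cs[name])
        if name in self.pit:                 # already requested upstream
            self.pit[name].add(in_face)
            return ("aggregated", None, None)
        out_face = self._longest_prefix(name)
        if out_face is None:
            return ("drop", None, None)
        self.pit[name] = {in_face}
        return ("forward", out_face, None)

    def on_data(self, name, data):
        faces = self.pit.pop(name, None)
        if faces is None:
            return []                        # unsolicited Data is discarded
        self.cs[name] = data
        self.cs.move_to_end(name)
        if len(self.cs) > self.cache_size:
            self.cs.popitem(last=False)      # evict the least recently used
        return sorted(faces)


node = IcnNode()
node.add_route("/videos", "face-up")
r1 = node.on_interest("/videos/a.mp4", "face-1")
r2 = node.on_interest("/videos/a.mp4", "face-2")
delivered = node.on_data("/videos/a.mp4", b"chunk")
r3 = node.on_interest("/videos/a.mp4", "face-3")
```

In this run the second Interest is aggregated instead of being forwarded again, the Data is delivered to both waiting faces, and the third Interest is served from the cache, which is how in-network storage reduces upstream bandwidth demand.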
    54. 54. ICN: Technical Background • Information Objects (IO) – An IO represents content without taking into consideration its storage location or physical representation – An IO can have multiple copies of itself • Content Naming – treats content as a network primitive • Uniqueness, persistence, scalability – Hierarchical or flat naming
    55. 55. ICN: Technical Background • Security – Content Validation – Name Persistence – Owner Authentication and Identification • Content Forwarding
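Content validation through self-certifying names can be sketched with a cryptographic hash: if the name embeds a digest of the content, any receiver can check the binding between name and content without trusting the node that served it. The `label:digest` name layout and the choice of SHA-256 are assumptions for illustration; actual ICN proposals differ in how they build such names.

```python
import hashlib


def self_certifying_name(content: bytes, label: str = "") -> str:
    """Build a flat name that embeds the SHA-256 digest of the content."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{label}:{digest}" if label else digest


def validate(name: str, content: bytes) -> bool:
    """Check that the content still matches the digest carried in its name."""
    expected = name.rsplit(":", 1)[-1]
    return hashlib.sha256(content).hexdigest() == expected


original = b"quarterly traffic report"
name = self_certifying_name(original, label="report")
intact = validate(name, original)
tampered = validate(name, b"quarterly traffic rep0rt")
```

Any change to the content changes the digest, so altered data can only be passed off under a different name. The drawback, as the challenges slide notes, is that digest-based names are not human-friendly.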
    56. 56. ICN: Technical Background • In-Network Caching – temporarily store content in the network core elements – small but popular content generates most Internet traffic • Heavy-tailed nature of Internet traffic • Routing and Transport – IO identifiers are not bound to a specific location – common topology-based routing and forwarding algorithms are not effective for routing IOs • Current Architectures: • CCN • Publish-Subscribe Internet Routing Paradigm (PSIRP) • 4WARD-NetInf • DONA • CCNx
    57. 57. ICN: challenges • Scalability – To be effective, routers should be able to keep TBs of information in cache • Security – a naming scheme that allows both self-certification and human-friendly identification while avoiding the use of a PKI is an open issue • Privacy – ICN makes information visible and identifiable at the network level • Economic model – Adoption of ICN depends not only on technical aspects
    59. 59. VA: Motivation • Effectively use the immense wealth of data and information acquired, computed, and stored • analysts can get lost in irrelevant or inappropriately processed or presented information – For computer networks, acquisition of raw data is no longer a problem • Visualization techniques might be very effective – but for some analyses, pure visualization does not completely expose insights hidden in the data
    60. 60. VA: definition • Science of analytical reasoning supported by highly interactive visual interfaces, transcending simple and direct data visualization, and requiring active user participation
    61. 61. VA: supporting technologies
    62. 62. VA example
    63. 63. VA: Challenges • Challenges for visualization systems for computer network data – Limited scalability – Knowledge discovery – Appropriateness to perform data transformation – Data presentation – Interaction with the visualization system – Hardware bottlenecks – Multi-attribute visualization
    66. 66. Research Challenges and Opportunities • Cloud Computing Services are driving huge changes in the computer networking field – Distributed and hybrid clouds will be a reality soon • Massive amounts of data will have to be moved • SDN seems to be a smart solution to address scalability and other issues for Big Data – NV is available as the supporting technology • CCN is a paradigm shift and might face barriers to full deployment • Opportunities for advanced research are everywhere in those new scenarios – Content is becoming king in networking
    67. 67. About: Center for Informatics (CIn), Federal University of Pernambuco (UFPE), Recife, Brazil
    68. 68. CIn/UFPE UFPE • ~42K students, ~1K PhD professors Recognition • Top 5 CS Graduate Program in Brazil • Evaluation: CAPES level 6 (scale 1 to 7) • Top 10 most important CS Research Center in Latin America Faculty • 80+ PhD professors • ~25% CNPq Research Chairs Programs • Computer Science, Computer Engineering, Information Systems
    69. 69. CIn/UFPE • 2000+ students • International collaboration: Europe, Asia, and North America • Research projects (privately and publicly funded): CNPq, CAPES, FACEPE; Samsung, Ericsson, Motorola, Nokia, LG, HP, etc. • Recipient of a number of awards: • 2011 Most Innovative Brazilian Research Center • Microsoft Imagine Cup (since 2005) • ACM Intl. Programming Marathon • Recruitment: Google, Microsoft, Facebook
    70. 70. [Chart: companies per year, with count] 2002 (1): Motorola • 2003 (4): Leucotron, Mecaf, Itautec, Motorola • 2004 (6): Waytec, Ericsson, Leucotron, Mecaf, Itautec, Motorola • 2005 (7): Engetron, Samsung, Ericsson, Leucotron, Mecaf, Itautec, Motorola • 2006 (8): Epson, Engetron, Samsung, Ericsson, Leucotron, Mecaf, Itautec, Motorola • 2007 (9): Positivo, Epson, Engetron, Samsung, Ericsson, Leucotron, Mecaf, Itautec, Motorola • 2008 (10): Siemens, Positivo, Epson, Engetron, Samsung, Ericsson, HP, Mecaf, Itautec, Motorola • 2009 (10): Sankwang, Positivo, Epson, Engetron, Samsung, Ericsson, HP, Celestica, Itautec, Motorola • 2010 (13): Megaware, Elcoma, Foxconn, Sankwang, Positivo, Epson, Engetron, Samsung, Ericsson, HP, Celestica, Itautec, Motorola
    71. 71. Research Agenda with Dalhousie New R&D program • International Science & Technology Partnership (ISTP) and Pernambuco State Research Funding Agency (FACEPE) • UFPE, Dalhousie University • GSTS, Neurotech • ~CAD 2 million over 2 years Further Collaboration • Open to new ideas and interests
    72. 72. Recife, Pernambuco, Brazil