A Social Content Delivery Network for Scientific Cooperation: Vision, Design, and Architecture

Data volumes have increased so significantly that we need to carefully consider how we interact with, share, and analyze data to avoid bottlenecks. In contexts such as eScience and scientific computing, a large emphasis is placed on collaboration, resulting in many well-known challenges in ensuring that data is in the right place at the right time and accessible by the right users. Yet these simple requirements create substantial challenges for the distribution, analysis, storage, and replication of potentially "large" datasets. Additional complexity is added through constraints such as budget, data locality, usage, and available local storage. In this paper, we propose a "socially driven" approach to address some of the challenges within (academic) research contexts by defining a Social Data Cloud and underpinning Content Delivery Network: a Social CDN (S-CDN). Our approach leverages digitally encoded social constructs via social network platforms that we use to represent (virtual) research communities. Ultimately, the S-CDN builds upon the intrinsic incentives of members of a given scientific community to address their data challenges collaboratively and in proven trusted settings. We define the design and architecture of a S-CDN and investigate its feasibility via a coauthorship case study as first steps to illustrate its usefulness.

A Social Content Delivery Network for Scientific Cooperation: Vision, Design, and Architecture Presentation Transcript

  • 1. A Social Content Delivery Network for Scientific Cooperation: Vision, Design, and Architecture
    Kyle Chard, Simon Caton, Omer Rana, Daniel S. Katz
    www.ci.anl.gov www.ci.uchicago.edu
  • 2. Introduction
    • Collaboration is increasingly data intensive
    • To avoid research bottlenecks we need data...
      – At the right place, at the right time, with appropriate access permissions
    • Challenges
      – Distribution, storage, replication, budget, security, performance, locality, reliability, availability...
    • Current approaches to data distribution/sharing?
    www.ci.anl.gov  Social CDN -- DataCloud 2012  www.ci.uchicago.edu
  • 3. Data (Content) Distribution
    • Other domains use CDNs
      – E.g. web objects, downloads, streaming media, social networks
    • But scientific data is often
      – Big Data
      – Long tail
      – Private
      – Geographically distributed
    • Commercial CDNs are infeasible and unaffordable for scientific data.
  • 4. Social Content Delivery Network (S-CDN)
    • Utilizes the resources of community members
      – Low cost, distributed infrastructure
    • Social network identifies locations to distribute and store subsets of data
    • Algorithms to partition and distribute data based on relationships with others
    • Built upon the concept of a Social (Data) Cloud
    [Diagram layers: Social Layer, Resource Layer, Content Delivery Layer]
  • 5. Trust
    • Types of trust for a S-CDN
      1. Infrastructure trust via appropriate security and authentication mechanisms as well as policies
      2. Inter-personal trust as an enabler of social collaboration
         – “a positive expectation or assumption on future outcomes that results from proven contextualized personal interaction-histories”
    • In the context of a S-CDN
      – Leverage trust to select interaction partners
      – Develop “trust models” to aid CDN management algorithms
  • 6. Motivating Use Case – Medical Imaging (1)
  • 7. Motivating Use Case – Challenges
    • Data Privacy
      – Storage and transfer
      – Regulations (HIPAA)
      – Research IP
      – Trust
    • Data Access
      – Many researchers
      – Geographically distributed
      – Different institutions
    • Big Data?
      – Multiple centers
      – Multiple subjects
      – Multiple scans
      – Multiple analyses/reconstructions
  • 8. Motivating Use Case – S-CDN
    • Trustworthiness: Relationships encoded within a real world social/collaboration network and previous scientific interactions or institutional affiliations
    • Data availability: Access to those who are permitted to view (and need) data when required
    • Reduced barriers: Collaborative infrastructure and potential to aggregate other middleware such as authentication, job submission, data staging
    • Access and data placement: Algorithms that leverage properties of the social graph
  • 9. Architecture
    • Storage Servers
      – CDN edge nodes on which research datasets (or fragments thereof) reside
      – Shared folder used for CDN and local storage
      – Client to manage and transfer datasets
    • Social Middleware
      – Adds a layer of abstraction between users and the S-CDN
      – Provides authentication and authorization
    • Allocation Servers
      – Centralized catalogs for global datasets
      – Maintain a list of current replicas and place, move, update, and maintain replicas
    • Implementation?
    [Diagram labels: Trust relationship, Trusted third party]
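
The allocation-server role described on this slide (a central catalog that tracks where replicas live and places or moves them) can be illustrated with a minimal in-memory sketch. Everything named here is hypothetical: the `ReplicaCatalog` class, its method names, and the idea of passing a set of trusted nodes to `locate` are assumptions made for illustration, not the authors' implementation, which the slides leave open.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalog:
    """Hypothetical catalog: dataset id -> set of storage nodes holding a replica."""
    replicas: dict = field(default_factory=dict)

    def place(self, dataset, node):
        # Record that `node` now holds a replica of `dataset`.
        self.replicas.setdefault(dataset, set()).add(node)

    def move(self, dataset, src, dst):
        # Relocate one replica; fail loudly if the source replica is unknown.
        nodes = self.replicas.get(dataset, set())
        if src not in nodes:
            raise KeyError(f"{dataset} has no replica on {src}")
        nodes.discard(src)
        nodes.add(dst)

    def remove(self, dataset, node):
        # Drop a replica if present; unknown datasets are ignored.
        self.replicas.get(dataset, set()).discard(node)

    def locate(self, dataset, trusted_nodes):
        # Assumption: the social middleware supplies the nodes the requester trusts.
        return sorted(self.replicas.get(dataset, set()) & set(trusted_nodes))


catalog = ReplicaCatalog()
catalog.place("scan-001", "alice-node")
catalog.place("scan-001", "bob-node")
catalog.move("scan-001", "bob-node", "carol-node")
found = catalog.locate("scan-001", {"carol-node", "dave-node"})
```

In this sketch `found` contains only the replica held by a node the requester trusts, which is one simple way the catalog and the social layer could interact.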
  • 10. Preliminary Investigation
    • Explore data availability using a S-CDN
      – Based on researcher relationships in a collaboration
    • How can we extract a representation of scientific (data) collaboration?
      – Extrapolate collaborative research from the publication history of a scientist
    • Analysis
      – Extract communities with different levels of trust
      – Investigate simple CDN placement using social algorithms
  • 11. Community Graphs

                      Baseline   Double Coauthorship   Number of Authors
      Authors             2335                   811                 604
      Publications        1163                   881                 435
      Edges              17973                  5123                1988

    • Baseline: DBLP publications, 3 degrees, 2009-2010
    • Double Coauthorship: At least 2 publications
    • Number of Authors: < 6 authors per publication
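
As a rough illustration of how the three community graphs could be derived from publication records, the sketch below builds a coauthorship graph and applies the two filters named on the slide. The function `coauthor_graph`, its parameters, and the toy publication list are hypothetical; the seed-author expansion ("3 degrees") and the 2009-2010 date filter are not modelled here.

```python
from collections import Counter
from itertools import combinations


def coauthor_graph(publications, min_shared=1, max_authors=None):
    """Return {author: set(coauthors)} from a list of author lists.

    min_shared:  keep an edge only if the pair shares at least this many papers.
    max_authors: skip publications with this many authors or more (None keeps all).
    """
    pair_counts = Counter()
    for authors in publications:
        unique = sorted(set(authors))
        if max_authors is not None and len(unique) >= max_authors:
            continue
        # Every pair of authors on the same paper is a candidate edge.
        pair_counts.update(combinations(unique, 2))
    graph = {}
    for (a, b), shared in pair_counts.items():
        if shared >= min_shared:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


# Hypothetical toy data, not taken from DBLP.
pubs = [
    ["ana", "bo", "cy"],
    ["ana", "bo"],
    ["cy", "dee"],
    ["ana", "bo", "cy", "dee", "eli", "fay"],
]

baseline = coauthor_graph(pubs)                  # every coauthorship counts
double = coauthor_graph(pubs, min_shared=2)      # at least 2 joint publications
few_authors = coauthor_graph(pubs, max_authors=6)  # only papers with < 6 authors
```

Both filters shrink the graph, which matches the direction of the author and edge counts in the table above: stricter evidence of collaboration leaves a smaller community.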
  • 12. Replica Selection
    • Random
      – Avg Hops: 2.23
    • Node Degree
      – Highest number of edges
      – Avg Hops: 1.54
    • Community Node Degree
      – Highest degree within a community (i.e. no adjacent placement)
      – Avg Hops: 1.38
    • Clustering Coefficient
      – Highest likelihood that an author’s coauthors are also connected
      – Avg Hops: 2.62
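
The four placement strategies on this slide can be sketched on a small graph as follows. This is an illustration under stated assumptions, not the authors' code: "Community Node Degree" is read here as a greedy highest-degree choice that skips any node adjacent to an already chosen replica, and "average hops" is taken as the mean BFS distance from every reachable node (replica hosts included, at distance 0) to its nearest replica. All function names and the toy graph are hypothetical.

```python
import random
from collections import deque


def hops_to_nearest(graph, sources):
    """Multi-source BFS: hop count from each reachable node to its nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def average_hops(graph, replicas):
    dist = hops_to_nearest(graph, replicas)
    return sum(dist.values()) / len(dist)


def clustering(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbs = sorted(graph[node])
    k = len(nbs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbs[j] in graph[nbs[i]])
    return 2.0 * links / (k * (k - 1))


def pick_random(graph, n, seed=0):
    return random.Random(seed).sample(sorted(graph), n)


def pick_by_degree(graph, n):
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(graph, key=lambda v: (-len(graph[v]), v))[:n]


def pick_by_degree_non_adjacent(graph, n):
    chosen = []
    for v in sorted(graph, key=lambda v: (-len(graph[v]), v)):
        if all(v not in graph[c] for c in chosen):
            chosen.append(v)
        if len(chosen) == n:
            break
    return chosen


def pick_by_clustering(graph, n):
    return sorted(graph, key=lambda v: (-clustering(graph, v), v))[:n]


# Hypothetical toy graph: two triangles joined by the edge a-d.
toy = {
    "a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"a", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
by_degree = pick_by_degree(toy, 2)
by_community = pick_by_degree_non_adjacent(toy, 2)
by_clustering = pick_by_clustering(toy, 2)
```

On this toy graph the clustering-coefficient strategy places both replicas inside the same triangle, so the other triangle is far from any replica. That is one intuition for why it shows the highest average hop count on the slide.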
  • 13. Results
    [Figure: three panels (Baseline, Double Coauthorship, Number of Authors) plotting Replica Hit Rate (%) against Number of Replicas (1 to 10) for the Random, Node Degree, Community Node Degree, and Clustering Coefficient strategies]
  • 14. Target users of a Social CDN
    1. Large collaborative project with multiple distributed participants
    2. Participants are able to provide some resources to the project
    3. Good overall connectivity between participants
    4. Different data set requirements for members of the collaboration
    5. Availability of data sets that can be co-hosted by other participants
    6. Varying sized data sets, not all of which may be able to fit in one place
  • 15. Summary
    • Data management across collaborations is difficult
      – Right place, right time, accessible to the right people
      – Complicated by size, security, availability, distance...
    • Social CDN
      – Builds upon the proven CDN model from other domains
      – Relies on user contributed edge nodes
      – Social overlay to incorporate trust and social replica selection
    • Future work
      – Analysis and formalization of trust as an enabler of collaboration
        o Further investigation into mechanisms to extract trustworthiness from scientific networks
      – Simulation of a wider range of attributes, such as data access algorithms, different research networks, and indicators of trust
      – Proof of concept implementation
  • 16. Thanks
    • Questions?
    • Kyle Chard: kyle@ci.uchicago.edu
    • http://www.facebook.com/SocialCloudComputing
    [Figure callouts: Resources are idle 40-95%; 1,000,000,000 users; on average 190 friends; users contribute to “good” causes]