A Social Content Delivery Network
for Scientific Cooperation:
Vision, Design, and Architecture
Kyle Chard, Simon Caton, Omer Rana, Daniel S. Katz




                                                     www.ci.anl.gov
                                                     www.ci.uchicago.edu
Introduction
• Collaboration is increasingly data intensive
• To avoid research bottlenecks we need data...
    –   At the right place, at the right time, with appropriate
        access permissions
•   Challenges
    –   Distribution, storage, replication, budget, security, performance,
        locality, reliability, availability, ...
•   Current approaches to data distribution/sharing?



2       Social CDN -- DataCloud 2012
Data (Content) Distribution
•   Other domains use CDNs
    –   E.g. web
        objects, downloads, streaming
        media, social networks
•   But, scientific data is often
    –   BigData
    –   Long tail
    –   Private
    –   Geographically distributed
•   Commercial CDNs infeasible
    and unaffordable for scientific
    data.
Social Content Delivery Network (S-CDN)
•   Utilizes the resources of community members
    –   Low cost, distributed infrastructure
•   Social network identifies locations to distribute and store subsets of data
    –   Algorithms to partition and distribute data based on relationships with others
•   Built upon the concept of a Social (Data) Cloud
•   Layers shown in the slide diagram: Social Layer, Resource Layer, Content Delivery Layer

Trust
•   Types of trust for an S-CDN
    1. Infrastructure trust via appropriate security and
       authentication mechanisms as well as policies
    2. Inter-personal trust as an enabler of social
       collaboration.
                 – “a positive expectation or assumption on future outcomes that
                   results from proven contextualized personal interaction-
                   histories”
•   In the context of an S-CDN
    – Leverage trust to select interaction partners
    – Develop “trust models” to aid CDN management
      algorithms
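
A minimal sketch of how interaction histories could feed partner selection. The slides do not define a trust model, so everything here is an assumption made for illustration: the `(partner, context, outcome)` history format, the smoothed success ratio used as a score, and the `threshold` value are all hypothetical.

```python
from collections import defaultdict

def trust_scores(history, context):
    """Score each partner by a smoothed share of positive interactions in one context.

    `history` is a list of (partner, context, outcome) tuples, where outcome is
    True for a positive interaction. The (positive + 1) / (total + 2) smoothing
    is an illustrative choice, not a model taken from the slides.
    """
    positive = defaultdict(int)
    total = defaultdict(int)
    for partner, ctx, outcome in history:
        if ctx != context:
            continue  # trust is contextualized: ignore other kinds of interaction
        total[partner] += 1
        positive[partner] += int(outcome)
    return {p: (positive[p] + 1) / (total[p] + 2) for p in total}

def select_partners(history, context, threshold=0.6):
    """Return partners whose score in `context` meets the (hypothetical) threshold."""
    scores = trust_scores(history, context)
    return sorted(p for p, s in scores.items() if s >= threshold)

# Hypothetical interaction history.
history = [
    ("bob", "data-sharing", True),
    ("bob", "data-sharing", True),
    ("bob", "data-sharing", True),
    ("carol", "data-sharing", True),
    ("carol", "data-sharing", False),
    ("dave", "coauthoring", True),
]

print(select_partners(history, "data-sharing"))
```

With this toy history only `bob` passes the threshold for the `data-sharing` context; `dave` has no history in that context and is therefore not scored at all.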

Motivating Use Case – Medical Imaging (1)




Motivating Use Case – Challenges

•   Data Privacy
    –   Storage and transfer
    –   Regulations (HIPAA)
    –   Research IP
    –   Trust
•   Data Access
    –   Many researchers
    –   Geographically distributed
    –   Different institutions
•   Big Data?
    –   Multiple centers
    –   Multiple subjects
    –   Multiple scans
    –   Multiple analyses/reconstructions




Motivating Use Case – S-CDN
          •    Trustworthiness: Relationships encoded within a
               real world social/collaboration network and
               previous scientific interactions or institutional
               affiliations

          •    Data availability: Access to those who are
               permitted to view (and need) data when required

          •    Reduced barriers: Collaborative infrastructure and
               potential to aggregate other middleware such as
               authentication, job submission, data staging

          •    Access and data placement: Algorithms that
               leverage properties of the social graph

Architecture
•   Diagram labels: Trust relationship, Trusted third party
•   Storage Servers
    –   CDN edge nodes on which research datasets (or fragments thereof) reside
    –   Shared folder used for CDN and local storage
    –   Client to manage and transfer datasets
•   Social Middleware
    –   Adds a layer of abstraction between users and the S-CDN
    –   Provides authentication and authorization
•   Allocation Servers
    –   Centralized catalogs for global datasets
    –   Maintain a list of current replicas and place, move, update, and maintain replicas
•   Implementation?
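
The allocation server role described above (a catalog that tracks current replicas and places or moves them) can be sketched as a small in-memory structure. This is an illustration only: the slides leave the implementation open, and the class name `ReplicaCatalog`, its methods, and the dataset and node names are all hypothetical.

```python
class ReplicaCatalog:
    """Toy in-memory catalog mapping dataset names to the storage nodes holding a replica."""

    def __init__(self):
        self._replicas = {}

    def place(self, dataset, node):
        """Record that `node` now holds a replica of `dataset`."""
        self._replicas.setdefault(dataset, set()).add(node)

    def remove(self, dataset, node):
        """Forget the replica of `dataset` on `node`, if there is one."""
        self._replicas.get(dataset, set()).discard(node)

    def move(self, dataset, source, target):
        """Relocate a replica; the target is registered before the source is dropped."""
        if source not in self._replicas.get(dataset, set()):
            raise KeyError(f"{dataset} has no replica on {source}")
        self.place(dataset, target)
        self.remove(dataset, source)

    def locate(self, dataset):
        """Return the current replica locations in a stable order."""
        return sorted(self._replicas.get(dataset, set()))

# Hypothetical dataset and storage node names.
catalog = ReplicaCatalog()
catalog.place("scan-001", "node-a")
catalog.place("scan-001", "node-b")
catalog.move("scan-001", "node-a", "node-c")
print(catalog.locate("scan-001"))
```

A real allocation server would also need persistence, access control through the social middleware, and the actual data transfer, none of which is modelled here.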

Preliminary Investigation
•    Explore data availability using an S-CDN
     –   Based on researcher relationships in a collaboration
•    How can we extract a representation of
     scientific (data) collaboration?
     –   Extrapolate collaborative research from the
         publication history of a scientist
•    Analysis
     – Extract communities with different levels of trust
     – Investigate simple CDN placement using social
       algorithms
Community Graphs




                             Baseline    Double Coauthorship   Number of Authors
Authors                        2335             811                  604
Publications                   1163             881                  435
Edges                          17973            5123                 1988

• Baseline: DBLP publications, 3 Degrees, 2009-2010
• Double Coauthorship: At least 2 publications
• Number of Authors: < 6 authors per publication
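
The three graphs above can be illustrated with a small sketch that builds coauthorship edges from author lists and applies the two filters. The reading of the filters is an assumption: "Double Coauthorship" is taken to mean that a pair of authors shares at least 2 publications, and "Number of Authors" to mean that only publications with fewer than 6 authors contribute edges. The publication list is invented, and the "3 Degrees" and date restrictions of the DBLP baseline are not modelled.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each entry is the author list of one publication.
publications = [
    ["alice", "bob"],
    ["alice", "bob", "carol"],
    ["carol", "dave"],
    ["alice", "dave", "erin", "frank", "grace", "heidi"],
]

def coauthor_counts(pubs):
    """Count how many publications each unordered pair of authors shares."""
    counts = Counter()
    for authors in pubs:
        for a, b in combinations(sorted(set(authors)), 2):
            counts[(a, b)] += 1
    return counts

def baseline_edges(pubs):
    """Every pair of authors with at least one joint publication."""
    return set(coauthor_counts(pubs))

def double_coauthorship_edges(pubs, min_shared=2):
    """Assumed reading: keep pairs that coauthored at least `min_shared` publications."""
    return {pair for pair, n in coauthor_counts(pubs).items() if n >= min_shared}

def number_of_authors_edges(pubs, max_authors=6):
    """Assumed reading: only publications with fewer than `max_authors` authors count."""
    small = [p for p in pubs if len(set(p)) < max_authors]
    return set(coauthor_counts(small))

print(len(baseline_edges(publications)))
print(double_coauthorship_edges(publications))
print(len(number_of_authors_edges(publications)))
```

On this toy data the single six-author publication contributes most of the baseline edges, which mirrors how both filters shrink the graph in the table above.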

Replica Selection
•    Random
     –   Avg Hops: 2.23
•    Node Degree
     –   Highest number of edges
     –   Avg Hops: 1.54
•    Community Node Degree
     –   Highest degree within a community
         (i.e. no adjacent placement)
     –   Avg Hops: 1.38
•    Clustering Coefficient
     – Highest likelihood that an author’s
       coauthors are also connected
     – Avg Hops: 2.62
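
The four strategies above can be sketched on a toy graph. Several details are assumptions, since the slides do not specify them: "Community Node Degree" is implemented as a greedy pick by degree that skips any node adjacent to an already chosen replica, and average hops is computed as the mean shortest-path distance from each non-replica node to its nearest replica. The graph and function names are invented for illustration.

```python
import random
from collections import deque

# Hypothetical undirected coauthorship graph as adjacency sets.
graph = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"a", "d", "f", "g"},
    "f": {"e"},
    "g": {"e"},
}

def by_degree(g):
    """Nodes ordered by descending degree, ties broken by name."""
    return sorted(g, key=lambda n: (-len(g[n]), n))

def random_placement(g, k, seed=0):
    return random.Random(seed).sample(sorted(g), k)

def degree_placement(g, k):
    return by_degree(g)[:k]

def community_degree_placement(g, k):
    """Greedy by degree, never placing a replica next to an existing one."""
    chosen = []
    for node in by_degree(g):
        if len(chosen) == k:
            break
        if not any(node in g[c] for c in chosen):
            chosen.append(node)
    return chosen

def clustering_coefficient(g, node):
    """Fraction of pairs of neighbours of `node` that are themselves connected."""
    nbrs = list(g[node])
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for i, a in enumerate(nbrs) for b in nbrs[i + 1:] if b in g[a])
    return 2.0 * links / (len(nbrs) * (len(nbrs) - 1))

def clustering_placement(g, k):
    return sorted(g, key=lambda n: (-clustering_coefficient(g, n), n))[:k]

def average_hops(g, replicas):
    """Mean distance from each reachable non-replica node to its nearest replica."""
    dist = {r: 0 for r in replicas}
    queue = deque(replicas)
    while queue:  # multi-source breadth-first search
        node = queue.popleft()
        for nbr in g[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    others = [d for n, d in dist.items() if n not in replicas]
    return sum(others) / len(others) if others else 0.0

for name, place in [("degree", degree_placement),
                    ("community degree", community_degree_placement),
                    ("clustering", clustering_placement)]:
    replicas = place(graph, 2)
    print(name, replicas, average_hops(graph, replicas))
```

On this toy graph the clustering coefficient strategy places both replicas inside one tightly knit triangle and yields the largest average distance, which is in line with the ordering reported above, although a graph this small cannot reproduce the measured values.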


Results
[Figure: three plots of Replica Hit Rate (%) against Number of Replicas (1 to 10), one per graph: Baseline (y-axis 0 to 30), Double Coauthorship (y-axis 0 to 40), and Number of Authors (y-axis 0 to 70). Each plot compares Random, Node Degree, Community Node Degree, and Clustering Coefficient placement.]

Target users of a Social CDN
1.   Large collaborative project with multiple
     distributed participants
2.   Participants are able to provide some resources to
     the project
3.   Good overall connectivity between participants
4.   Different data set requirements for members of
     the collaboration
5.   Availability of data sets that can be co-hosted by
     other participants
6.   Varying sized data sets – not all of which may be
     able to fit in one place.
Summary
•    Data management across collaborations is difficult
     – Right place, right time, accessible to the right people
     – Complicated by size, security, availability, distance …
•    Social CDN
     – Builds upon the proven CDN model from other domains
     – Relies on user contributed edge nodes
     – Social overlay to incorporate trust and social replica selection
•    Future work
     –   Analysis and formalization of trust as an enabler of collaboration
          o   Further investigation into mechanisms to extract trustworthiness from
              scientific networks.
     – Simulation of a wider range of attributes, such as data access
       algorithms, different research networks, and indicators of trust.
     – Proof of concept implementation



Thanks

•    Questions?
•    1,000,000,000 Users
•    On average 190 friends
•    Resources are idle 40-95%
•    Users contribute to “good” causes




• Kyle Chard: kyle@ci.uchicago.edu
• http://www.facebook.com/SocialCloudComputing

A Social Content Delivery Network for Scientific Cooperation: Vision, Design, and Architecture

  • 1. A Social Content Delivery Network for Scientific Cooperation: Vision, Design, and Architecture Kyle Chard, Simon Caton, Omer Rana, Daniel S. Katz www.ci.anl.gov www.ci.uchicago.edu
  • 2. Introduction • Collaboration is increasingly data intensive • To avoid research bottlenecks we need data... – At the right place, at the right time, with appropriate access permissions • Challenges – Distribution, storage, replication, budget, security, perf ormance, locality, reliability, availability.. • Current approaches to data distribution/sharing? www.ci.anl.gov 2 Social CDN -- DataCloud 2012 www.ci.uchicago.edu
  • 3. Data (Content) Distribution • Other domains use CDNs – E.g. web objects, downloads, streaming media, social networks • But, scientific data is often – BigData – Long tail – Private – Geographically distributed • Commercial CDNs infeasible and unaffordable for scientific data. www.ci.anl.gov 3 Social CDN -- DataCloud 2012 www.ci.uchicago.edu
  • 4. Social Content Delivery Network (S-CDN) • Utilizes the resources of community members – Low cost, distributed infrastructure • Social network Social Layer identifies locations to distribute and store subsets of data Resource Layer • Algorithms to partition and distribute data based relationships with others • Built upon the concept Content Delivery Layer of a Social (Data) Cloud www.ci.anl.gov 4 Social CDN -- DataCloud 2012 www.ci.uchicago.edu
  • 5. Trust • Types of trust for a S-CDN 1. Infrastructure trust via appropriate security and authentication mechanisms as well as policies 2. Inter-personal trust as an enabler of social collaboration. – “a positive expectation or assumption on future outcomes that results from proven contextualized personal interaction- histories” • In the context of a S-CDN – Leverage trust to select interaction partners – Develop “trust models” to aid CDN management algorithms www.ci.anl.gov 5 Social CDN -- DataCloud 2012 www.ci.uchicago.edu
  • 6. Motivating Use Case – Medical Imaging (1)
  • 7. Motivating Use Case – Challenges • Data Privacy – Storage and transfer – Regulations (HIPAA) – Research IP – Trust • Data Access – Many researchers – Geographically distributed – Different institutions • Big Data? – Multiple centers – Multiple subjects – Multiple scans – Multiple analyses/reconstructions
  • 8. Motivating Use Case – S-CDN • Trustworthiness: Relationships encoded within a real world social/collaboration network and previous scientific interactions or institutional affiliations • Data availability: Access to those who are permitted to view (and need) data when required • Reduced barriers: Collaborative infrastructure and potential to aggregate other middleware such as authentication, job submission, data staging • Access and data placement: Algorithms that leverage properties of the social graph
  • 9. Architecture • Storage Servers – CDN edge nodes on which research datasets (or fragments thereof) reside – Shared folder used for CDN and local storage – Client to manage and transfer datasets • Social Middleware – Adds a layer of abstraction between users and the S-CDN – Provides authentication and authorization • Allocation Servers – Centralized catalogs for global datasets – Maintain a list of current replicas and place, move, update, and maintain replicas • Implementation? • [Figure labels: Trust relationship, Trusted third party]
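The slide describes the allocation server only as a centralized catalog that tracks current replicas and places or moves them. A minimal sketch of that bookkeeping follows, assuming an in-memory mapping from dataset IDs to storage-server names. The class name `AllocationServer`, its methods, and the dataset and node names are hypothetical illustrations, not part of the authors' design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AllocationServer:
    """Hypothetical catalog: dataset ID -> storage servers holding a replica."""

    replicas: Dict[str, Set[str]] = field(default_factory=dict)

    def place(self, dataset: str, node: str) -> None:
        # Record that `node` (an edge node) now stores a replica of `dataset`.
        self.replicas.setdefault(dataset, set()).add(node)

    def move(self, dataset: str, src: str, dst: str) -> None:
        # Relocate one replica; refuse if the source does not hold it.
        hosts = self.replicas.get(dataset, set())
        if src not in hosts:
            raise KeyError(f"{src} holds no replica of {dataset}")
        hosts.remove(src)
        hosts.add(dst)

    def locate(self, dataset: str) -> List[str]:
        # Return the current replica hosts in a stable (sorted) order.
        return sorted(self.replicas.get(dataset, set()))


# Illustrative usage with made-up dataset and node names.
catalog = AllocationServer()
catalog.place("mri-study-01", "alice-laptop")
catalog.place("mri-study-01", "bob-workstation")
catalog.move("mri-study-01", "alice-laptop", "carol-server")
```

Authentication, authorization, and the actual data transfer would sit in the social middleware and storage clients and are deliberately left out of this sketch.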
  • 10. Preliminary Investigation • Explore data availability using an S-CDN – Based on researcher relationships in a collaboration • How can we extract a representation of scientific (data) collaboration? – Extrapolate collaborative research from the publication history of a scientist • Analysis – Extract communities with different levels of trust – Investigate simple CDN placement using social algorithms
  • 11. Community Graphs
                      Baseline   Double Coauthorship   Number of Authors
      Authors         2335       811                   604
      Publications    1163       881                   435
      Edges           17973      5123                  1988
    • Baseline: DBLP publications, 3 Degrees, 2009-2010 • Double Coauthorship: At least 2 publications • Number of Authors: < 6 authors per publication
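The three community graphs can be read as filters over a coauthorship graph. The sketch below is one possible interpretation, not the authors' procedure: it assumes "Double Coauthorship" means a pair of authors shares at least two publications, and "Number of Authors" means publications with six or more authors are dropped. The function name `coauthor_graph`, its parameters, and the toy publication list are hypothetical, and the 3-degree DBLP crawl itself is not modelled.

```python
from collections import Counter
from itertools import combinations


def coauthor_graph(publications, min_shared=1, max_authors=None):
    """Build an undirected coauthorship graph as {author: set(coauthors)}.

    publications: list of author-name lists, one list per paper.
    min_shared:   keep an edge only if the pair shares this many papers.
    max_authors:  skip papers with more authors than this (None = no limit).
    """
    pair_counts = Counter()
    for authors in publications:
        unique = sorted(set(authors))
        if max_authors is not None and len(unique) > max_authors:
            continue
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1

    graph = {}
    for (a, b), shared in pair_counts.items():
        if shared >= min_shared:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def edge_count(graph):
    # Each undirected edge appears in two adjacency sets.
    return sum(len(nbrs) for nbrs in graph.values()) // 2


# Toy data (made up): four papers over six authors.
pubs = [["A", "B", "C"], ["A", "B"], ["C", "D"], ["A", "B", "C", "D", "E", "F"]]
baseline = coauthor_graph(pubs)
double = coauthor_graph(pubs, min_shared=2)   # pairs with >= 2 joint papers
small = coauthor_graph(pubs, max_authors=5)   # papers with < 6 authors
```

Both filters shrink the graph in this toy example, which matches the direction of the author and edge counts reported on the slide.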
  • 12. Replica Selection • Random – Avg Hops: 2.23 • Node Degree – Highest number of edges – Avg Hops: 1.54 • Community Node Degree – Highest degree within a community (i.e. no adjacent placement) – Avg Hops: 1.38 • Clustering Coefficient – Highest likelihood that an author’s coauthors are also connected – Avg Hops: 2.62
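The placement strategies on this slide can be sketched over an adjacency-set graph. This is a hedged illustration under stated assumptions: "community node degree" is approximated greedily by skipping any candidate adjacent to an already chosen replica, and "average hops" is taken to be the mean shortest-path distance from every reachable node to its nearest replica. The slides do not define either precisely, and the function names and toy graph are hypothetical.

```python
from collections import deque


def by_degree(graph, k):
    # Top-k nodes by number of edges (ties broken by name).
    return sorted(graph, key=lambda n: (-len(graph[n]), n))[:k]


def by_community_degree(graph, k):
    # Greedy variant: highest degree first, but never adjacent to a chosen node.
    chosen = []
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        if len(chosen) == k:
            break
        if not any(node in graph[c] for c in chosen):
            chosen.append(node)
    return chosen


def clustering(graph, node):
    # Local clustering coefficient: fraction of neighbour pairs that are linked.
    nbrs = sorted(graph[node])
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for i, a in enumerate(nbrs) for b in nbrs[i + 1:] if b in graph[a])
    return 2 * links / (len(nbrs) * (len(nbrs) - 1))


def by_clustering(graph, k):
    return sorted(graph, key=lambda n: (-clustering(graph, n), n))[:k]


def average_hops(graph, replicas):
    # Multi-source BFS: distance from each reachable node to its nearest replica.
    dist = {r: 0 for r in replicas}
    queue = deque(replicas)
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return sum(dist.values()) / len(dist)


# Toy graph (made up): hub "a", smaller hub "e", and a triangle a-b-c.
graph = {
    "a": {"b", "c", "d", "e"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"a"}, "e": {"a", "f", "g"}, "f": {"e"}, "g": {"e"},
}
```

Calling `average_hops(graph, by_degree(graph, 2))` and the same for the other strategies lets the placements be compared on one graph. A seven-node toy graph says nothing about the hop counts reported on the slide.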
  • 13. Results • [Figure: three panels (Baseline, Double Coauthorship, Number of Authors) plotting Replica Hit Rate (%) against Number of Replicas (1 to 10) for Random, Node Degree, Community Node Degree, and Clustering Coefficient placement]
  • 14. Target users of a Social CDN 1. Large collaborative project with multiple distributed participants 2. Participants are able to provide some resources to the project 3. Good overall connectivity between participants 4. Different data set requirements for members of the collaboration 5. Availability of data sets that can be co-hosted by other participants 6. Varying sized data sets – not all of which may be able to fit in one place.
  • 15. Summary • Data management across collaborations is difficult – Right place, right time, accessible to the right people – Complicated by size, security, availability, distance … • Social CDN – Builds upon the proven CDN model from other domains – Relies on user-contributed edge nodes – Social overlay to incorporate trust and social replica selection • Future work – Analysis and formalization of trust as an enabler of collaboration, including further investigation into mechanisms to extract trustworthiness from scientific networks – Simulation of a wider range of attributes, such as data access algorithms, different research networks, and indicators of trust – Proof of concept implementation
  • 16. Thanks • Questions? • Resources are idle 40-95% • 1,000,000,000 users • On average 190 friends • Users contribute to “good” causes • Kyle Chard: kyle@ci.uchicago.edu • http://www.facebook.com/SocialCloudComputing