Learning Open Source through GSOC

The goal of this talk is to highlight open source opportunities for students, especially the opportunity to earn $5000 through the Google Summer of Code (GSoC) program. I will discuss tips on how to engage with open source communities and the benefits of contributing. I will provide motivating examples of how students can gain significant experience by working on challenging distributed systems problems while impacting scientific research. I will focus on a concrete example, the Apache Airavata software suite for Web-based science gateways. I will list some example GSoC topics of interest and provide some recipes for getting accepted and completing the program successfully.

Published in: Education

  • Providing capabilities and services beyond flops: we provide an integrated environment allowing for the coherent use of the various resources and services supported by NSF.
  • The most popular gateway these days is CIPRES (phylogeny; Mark Miller).
  • Learning Open Source through GSOC

    1. 1. Science Gateways, Open Source & Google Summer of Code. Suresh Marru, Apache Software Foundation, Indiana University
    2. 2. Acknowledgements
        • Apache Software Foundation (ASF)
        • Extreme Science and Engineering Discovery Environment (XSEDE)
        • Science Gateways Group (SGG), Pervasive Technology Institute, Indiana University
    3. 3. Credits to the Science Gateways Group @ IU
        • Marlon Pierce: Group Lead
        • Amila Jayasekara
        • Chathuri Wimalasena
        • Heshan Suriyaachchi
        • Jun Wang
        • Lahiru Gunathilake
        • Raminder Singh
        • Saminda Wijeratne
        • Suresh Marru
        • Viknes Balasubramanee
        • Yu (Marie) Ma
    4. 4. What will you hear today?
        • Science Gateways: Web 2.0, social networking, grid & cloud computing, Big Data, everything-as-a-service, all churned into real-world scientific research.
        • Open Source: hack into open source projects, a good way to cherish doing what you like as opposed to what you have to.
        • Google Summer of Code: reward yourself with $5000 while making a case for future employment and graduate school admissions.
        • Apache Airavata
    5. 5. Outline
        • What are Science Gateways?
        • Getting your way in Open Source
        • Apache Software Foundation
        • Google Summer of Code
        • Interested? Next steps…
    6. 6. www.google-melange.org, www.google-melange.com
    7. 7. What is Google Summer of Code? Google Summer of Code is a program designed to encourage college student participation in open source software development.
    8. 8. Key Goals of GSoC
        • Inspire young developers to begin participating in open source development
        • Provide students in computer science and related fields the opportunity to do work related to their academic pursuits during the summer
        • Give students more exposure to real-world software development scenarios (e.g. distributed development, software licensing questions, mailing list etiquette, etc.)
        • Get more open source code created and released for the benefit of all
        • Help open source projects identify and bring in new developers and committers
    9. 9. GSoC in numbers: Countries
    10. 10. GSoC Top Schools
    11. 11. GSoC in numbers: Students. The number of students has maxed out and stabilized at around 1200. This is not expected to grow in the near future, which is understandable; still, thank you Google!
    12. 12. GSoC Win-Win Perspective
        • Project perspective:
            o Paid software developer for the summer.
            o Attracting a new member into the project community.
        • Student perspective:
            o Opportunity to gain (open source) software development experience.
            o Good payment for rewarding work.
            o Ability to network and become known within a structured, distributed setting.
    13. 13. What to look for in a project?
        • Can you engage with the project (not just the mentor)?
        • Can they guide you with tutorials and hold your hand early on? For instance, will you get to experience the “Apache Way”?
        • Is the project welcoming and appreciative?
        • Is there mileage for your extra effort through long-term commitment?
    14. 14. Apache Software Foundation Indiana University
    15. 15. Core Contributions beyond GSoC
        • Milinda realized he could execute his GSoC project, but he also had great thoughts on how we could fundamentally improve the Airavata architecture to make future extensions easy.
        • The developer community agreed to the new architecture:
            o Simple
            o Easily extensible
        • Airavata has adopted his proposed new architecture.
    16. 16. Enhanced Airavata Architecture [diagram labels: Job Execution Context; Global InHandlers; Provider-specific InHandlers; Application-specific InHandlers; Provider Logic; Application-specific OutHandlers; Provider-specific OutHandlers; Global OutHandlers]
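
The handler layering named on slide 16 can be illustrated with a small chain-of-handlers sketch. This is not Airavata's actual API: `JobExecutionContext`, `make_handler`, `run_job` and `echo_provider` are hypothetical names, and the order in which the global, provider-specific and application-specific handlers run is an assumption, since the transcript only preserves the diagram labels.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class JobExecutionContext:
    """Hypothetical context object shared by every handler and the provider."""
    inputs: Dict[str, int]
    outputs: Dict[str, int] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)


Handler = Callable[[JobExecutionContext], None]


def make_handler(name: str) -> Handler:
    """Build a trivial handler that only records that it ran."""
    def handler(ctx: JobExecutionContext) -> None:
        ctx.trace.append(name)
    return handler


def run_job(ctx: JobExecutionContext,
            in_handlers: List[Handler],
            provider: Handler,
            out_handlers: List[Handler]) -> JobExecutionContext:
    """Run the in-handlers, then the provider logic, then the out-handlers."""
    for handler in in_handlers:
        handler(ctx)
    provider(ctx)
    for handler in out_handlers:
        handler(ctx)
    return ctx


def echo_provider(ctx: JobExecutionContext) -> None:
    """Stand-in for provider logic: doubles the input value."""
    ctx.trace.append("provider")
    ctx.outputs["result"] = ctx.inputs["x"] * 2


# Assumed ordering: global -> provider -> application on the way in,
# and the reverse on the way out.
in_chain = [make_handler("global-in"), make_handler("provider-in"),
            make_handler("app-in")]
out_chain = [make_handler("app-out"), make_handler("provider-out"),
             make_handler("global-out")]

ctx = run_job(JobExecutionContext(inputs={"x": 21}),
              in_chain, echo_provider, out_chain)
print(ctx.trace, ctx.outputs)
```

Because each handler only sees the shared context, a new concern can be added by appending one more handler to a chain without touching the provider logic, which is one way to read the "easily extensible" claim on slide 15.
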
    17. 17. Pick what motivates you
        • Harness your skills and interests
        • If possible, pick a project that is relevant and “required” by aligning it with your academic curriculum
            o As a final-year (research) project
            o As a Masters-level research project
        • Create an interesting and challenging research problem
        • Sense of satisfaction and achievement
            o Research publications
            o Presentations at ApacheCon and similar conferences
            o Committership
    18. 18. What does a good mentor look for?
        • Free & paid contributions: the reality
        • A long-term participant in the project (not a software developer for ~3 months)
        • Accomplishing meaningful research-oriented goals, either within the project or across projects
        • Teaching open source/community participation to the next-generation workforce
    19. 19. What will you hear today? Science Gateways: Web 2.0, social networking, grid & cloud computing, Big Data, everything-as-a-service, all churned into real-world scientific research.
    20. 20. What Is Cyberinfrastructure? “Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible.” –Craig Stewart, Indiana University
    21. 21. Apache Software Foundation Indiana University
    22. 22. Dynamic Adaptive Cyberinfrastructure: Reacting to real-time weather [figure labels: Storms Forming; Streaming Observations; Data Mining; Forecast Model; On-Demand Grid Computing; Instrument Steering; Refine Forecast]. Envisioned by a multi-disciplinary team from OU, IU, NCSA, Unidata, UAH, Howard, Millersville, Colorado State, RENCI.
    23. 23. Anatomy of a Science Gateway
        • Gateway User Interface
            o Web Portals
            o Desktop Clients
            o Social/Collaboration Capabilities
        • Security Infrastructure
        • Analyses & Visualization Capabilities
        • Workflow Execution Framework
            o Application Abstraction
            o Workflow Construction & Enactment
            o Compute Resource Management
            o Scheduling
            o Messaging System
        • Data Management
        • Provenance Collection
    24. 24. Apache Software Foundation Indiana University
    25. 25. 25
    26. 26. XSEDE Vision. The eXtreme Science and Engineering Discovery Environment (XSEDE) enhances the productivity of scientists and engineers by providing them with new and innovative capabilities, and thus facilitates scientific discovery while enabling transformational science/engineering and innovative educational programs.
    27. 27. https://www.xsede.org/gateways-overview
    28. 28. Today, there are approximately 35 gateways using XSEDE
    29. 29. What will you hear today? Open Source: hack into open source projects, a good way to cherish doing what you like as opposed to what you have to.
    30. 30. The Apache Software Foundation
        • Apache software powers 65% of web sites worldwide
        • 501(c)3 non-profit foundation
        • Reasons for creating ASF
            o Create legal entity
            o Protect contributors from liability
            o Protect Apache assets
        • Membership: individual
        • Apache Incubator
        • Governance and Staffing
            o Board of Directors
            o Project Management Committees
            o ASF Members
            o Committers
            o Contributors
        • Funding
            o All-volunteer staffing/development resources
            o Donations
            o Corporate investment
    31. 31. Apache Way: Beyond Open Source, Open Community
        • Transparency
            o Decision-making and actions are observable
            o Events of interest are published and recorded
            o Transparency invites collaboration
        • Meritocratic Governance
            o Influence on decisions is based on merit
            o Merit is earned in public
            o Community-based governance
        • Community
            o Common interest, community interest, common experience
            o “Community before code”
        • Collaboration
            o Systems supporting communication and coordination: repositories, trackers, forums, build tools
            o You can reuse what you can see and influence
            o More eyeballs means better quality
    32. 32. Apache Organization
        • Apache is a meritocratic organization
            o Merit does not expire. You earn your keep and your credentials
        • Start out as Contributor
            o Patches, mailing list comments, testing, documentation, etc.
            o No commit access
        • Move on to Committer
            o Commit access, evolve the code
        • PMC Members
            o Have binding VOTEs on releases/personnel
        • Officer (VP, Project)
            o PMC Chair
        • ASF Member
            o Have binding VOTE in the state of the foundation
            o Elect Board of Directors
        • Director
            o Oversight of projects, foundation activities
    33. 33. Our experience with Apache
        • Give up control and get back contributions.
        • Being in Apache by itself doesn’t guarantee sustainability, but it opens doors for sustainability.
        • Google Summer of Code has brought in students, increased documentation, and identified confined projects.
        • We do not have to worry about getting sued by Oracle for using Java APIs; we are standing behind a shield of expert lawyers.
        • Companies make in-kind contributions; some have concrete plans, some are just evangelizing. Both are good.
        • Today’s cyberinfrastructure ecosystem is not in a funding situation to work on parallel independent implementations. Shared implementation is hard to achieve, but well-thought-out architectures can achieve it.
        • Also encourage multiple implementations and let the communities sort it out. The winner sustains. Example: Apache Axis2, Apache CXF
    34. 34. Apache Contributions Aren’t Just Software
        • Apache committers and PMC members aren’t just code writers.
        • Successful communities also include
            o Important users
            o Project evangelists
            o Content providers: documentation, tutorials
            o Testers, requirements providers, and constructive complainers (using Jira and mailing lists)
            o Anything else that needs doing.
    35. 35. Apache Airavata: http://airavata.apache.org
    36. 36. Science Gateways with Airavata
    37. 37. Apache Software Foundation Indiana University
    38. 38. Apache Software Foundation Indiana University
    39. 39. Key Airavata Features
        • Graphical user interface to construct, execute, control, manage and reuse scientific workflows.
        • Desktop tools and browser-based web interface components to manage applications, workflows and generated data.
        • Sophisticated server-side tools to register, schedule and manage scientific applications on high-performance computational resources.
        • Ability to interface and interoperate with various external (third-party) data, workflow and provenance management tools.
    40. 40. A Classic Scientific Workflow
        • Workflows are composite applications built out of independent parts.
            o Parts are executables wrapped as network-accessible services
        • The classic example is that codes A, B, and C need to be executed in a specific sequence.
            o A, B, C: parallel codes compiled and executable on a cluster, supercomputer, etc. by schedulers
            o A, B, and C do not need to be co-located
            o A, B, and C may be sequential or parallel
            o A, B, and C may have data or control dependencies
            o Data may need to be staged in and out
        • Some variations on ABC:
            o Conditional execution branches
            o Dynamic execution resource binding
            o Iterations (do-while, for-each) over all or parts of the sequence
            o Triggers, events, data streams
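
The A, B, C sequence on slide 40 can be sketched as a tiny dependency-ordered runner. This is a minimal illustration, not how Airavata enacts workflows: `run_workflow`, the lambda "codes" and the `input` key are hypothetical, and real parts would be remote services with data staging rather than in-process functions. It assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, depends_on, initial):
    """Run each step once all of its dependencies have produced output.

    steps:      name -> function taking the dict of results so far
    depends_on: name -> list of step names that must run first
    initial:    starting data (stands in for staged-in input files)
    """
    data = dict(initial)
    executed = []
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(depends_on).static_order():
        data[name] = steps[name](data)
        executed.append(name)
    return executed, data


# Hypothetical codes: B consumes A's output, C consumes B's output.
steps = {
    "A": lambda d: d["input"] + 1,
    "B": lambda d: d["A"] * 2,
    "C": lambda d: d["B"] - 3,
}
depends_on = {"A": [], "B": ["A"], "C": ["B"]}

executed, data = run_workflow(steps, depends_on, {"input": 4})
print(executed, data["C"])
```

Expressing the sequence as a dependency map rather than a hard-coded list is what leaves room for the variations the slide mentions, such as branches or steps that can run independently.
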
    41. 41. Challenges in Scientific Workflows
        • Accommodating a wide range of execution patterns
            o Iterations: for-each, do-while, dot and Cartesian products
            o Interactivity, adaptivity, non-determinism
        • Accommodating errors and uncertainties
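
The "dot and Cartesian products" iteration patterns on slide 41 can be made concrete with a short sketch. It assumes the usual workflow-system reading of the terms: a dot product pairs the i-th element of each input list, while a Cartesian product runs every combination. The function names and sample inputs are hypothetical.

```python
from itertools import product


def dot_iterations(xs, ys):
    """Pair inputs element by element: one run per index."""
    if len(xs) != len(ys):
        raise ValueError("dot iteration needs input lists of equal length")
    return list(zip(xs, ys))


def cartesian_iterations(xs, ys):
    """Combine every x with every y: len(xs) * len(ys) runs."""
    return list(product(xs, ys))


temperatures = [280, 300]
pressures = ["low", "high"]

dot_runs = dot_iterations(temperatures, pressures)
cartesian_runs = cartesian_iterations(temperatures, pressures)
print(dot_runs)
print(cartesian_runs)
```

The difference matters for scheduling: with n values per input, a dot iteration launches n jobs, while a Cartesian iteration over two inputs launches n squared.
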
    42. 42. NextGen Workflow Systems: Need for Interactivity Across Layers
        • Scientific workflow systems and compiled workflow languages have focused on modeling, scheduling, data movement, dynamic service creation and monitoring of workflows.
        • Building on these foundations, Airavata extends to an interactive and flexible workflow system.
        • Airavata workflow features include:
            o Interactive ways of intervening in and steering the workflow execution
            o An interpreted workflow execution model
            o A high-level instruction set
            o Flexibility to execute an individual workflow activity and wait for further analysis
    43. 43. Interactivity Contd.
        • Derivations during workflow execution that do not affect the structure of the workflow
            o Dynamic change of workflow inputs, workflow rerun
            o Interpreted workflow execution model
            o Dynamic change in point of execution, workflow smart rerun
            o Fault handling and exception models
        • Derivations that change the workflow DAG during runtime
            o Reconfiguration of an activity
            o Dynamic addition of activities to the workflow
            o Dynamic removal or replacement of an activity in the workflow
    44. 44. Interactivity Mathematical uncertainty:  PDE’s from domain problems do not have analytical solution and thereby look at numerical methods to find solutions  These solvers may not converge depending on method, PDE system, initial conditions and expected output tolerances  statistical techniques lead to nondeterministic results.  closer observation at computational output ensure acceptability of results. Domain uncertainty:  Scenarios of running against range of parameter values in an attempt to find the most appropriate input set.  Initial execution providing estimate of the accuracy of the inputs and facilitating further refinement.  Outputs are diverse and nondeterministic Resource uncertainty:  Failures in distributed systems are norm than an exception  transient failures can be retried if computation is side-effect free/Idempotent.  persistent failures require migration Real-time Model refinement  Real-time event processing systems not having data available prior to initialization of model.  models evolve over time and can take advantage of more and more events as they become available
    45. 45. Illustrating Interactivity [figure labels: Asynchronous Refinements; Application Steering; Orchestration-level Interactions; Job-level Interactions; Parametric Sweeps; Provenance; Workflow Steering; Job Launch, Gliding; Checkpoint/Restart; Model Refinement; Mathematical, Domain and Resource Uncertainties]
    46. 46. Apache Airavata in Action (Domain: Description)
        • Astronomy: Image processing pipeline for the One Degree Imager instrument on XSEDE
        • Astrophysics: Supporting the workflow of the Dark Energy Survey simulations working group on XSEDE
        • Bioinformatics: Supported workflow executions on Amazon EC2 for the BioVLAB project
        • Biophysics: Manage large-scale data analysis of analytical ultracentrifugation experiments on XSEDE and campus resources
        • Computational Chemistry: Manage workflows to support computational chemistry parameter studies for ParamChem.org on XSEDE
        • Nuclear Physics: Workflows for nuclear structure calculations using Leadership Class Configuration Interaction (LCCI) computations on DOE resources
    47. 47. What will you hear today? Google Summer of Code: reward yourself with $5000 while making a case for future employment and graduate school admissions.
    48. 48. How to crack GSoC?
        • Step 1: Engage early
        • Step 2: Familiarize yourself with projects
        • Step 3: Propose ideas
        • Step 4: Win, code, earn… cherish!
    49. 49. Be Part of the Project Community
        • Play with different popular open source software.
        • Experiment with emerging technologies.
        • Learn from and engage with a multidisciplinary community.
    50. 50. Be proactive instead of reactive: come up with your own ideas
    51. 51. GSoC Win-Win Perspective
        • Project perspective:
            o Paid software developer for the summer.
            o Attracting a new member into the project community.
        • Student perspective:
            o Opportunity to gain (open source) software development experience.
            o Good payment for rewarding work.
            o Ability to network and become known within a structured, distributed setting.
    52. 52. What to look for in a project?
        • Engage with the project (not just the mentor).
        • Can they guide you with tutorials and hold your hand early on? For instance, will you get to experience the “Apache Way”?
        • Is the project welcoming and appreciative?
        • Is there mileage for your extra effort through long-term commitment?
    53. 53. Pick what motivates you
        • Harness your skills and interests
        • If possible, pick a project that is relevant and “required” by aligning it with your academic curriculum
            o As a final-year (research) project
            o As a Masters-level research project
        • Create an interesting and challenging research problem
        • Sense of satisfaction and achievement
            o Research publications
            o Presentations at ApacheCon and similar conferences
            o Committership
    54. 54. What does a good mentor look for?
        • Free & paid contributions: the reality
        • A long-term participant in the project (not a software developer for ~3 months)
        • Accomplishing meaningful research-oriented goals, either within the project or across projects
        • Teaching open source/community participation to the next-generation workforce
    55. 55. Join the mailing list
        • Google Group sgw-gsoc-discuss: https://groups.google.com/d/forum/sgw-gsoc-discuss
        • Need more info? smarru@apache.org
