IESL Talk Series: Apache System Projects in the Real World
Transcript

  • 1. Apache Systems Projects in the Real World
    Srinath Perera Ph.D.
    Senior Software Architect, WSO2 Inc.
    Member, Apache Software Foundation
    Visiting Faculty, University of Moratuwa
    Research Scientist, Lanka Software Foundation
  • 2. Goals of this Talk
    Intro to Apache and open source
    Describe a large-scale E-Science project built on Apache technology, and some of its open problems
    Apache Airavata
    Discuss “Should your project move to Apache?”
    Photo by John Trainor on Flickr, http://www.flickr.com/photos/trainor/2902023575/, licensed under CC
  • 3. Open Source
    The basic definition is code that is accessible to everyone.
    Yes, you can write something and make it open source.
    But community is one of the key aspects.
    Often built by volunteers (or at least by people not paid by the project)
    Does serious crowdsourcing
    Ideally, code contributions, governance, and the decision model are all open and decentralized.
    Not all open source projects are equal (they use different licenses)
    GPL License – Linux etc.; if you distribute modified versions, you have to release your changes under the same license
    Apache License – commercial friendly
    Copyright digitalART2 and licensed for reuse under CC License, http://www.flickr.com/photos/digitalart/2101765353/
  • 4. How Does an Open Source Project Work?
    Open code repository (SVN, Git, etc.)
    Two parts of the community
    Developer community
    User community
    Communication through mailing lists / IRC channel
    Developer mailing list
    User mailing list
    Bug tracking database to track errors (Jira, Bugzilla)
    People submit improvements as patches through Jira etc.
    Committers have write access to the repository
    Committers review and apply patches, and once you have submitted a lot of them, they will make you a committer.
  • 5. Success Stories
    Apache Web Server
    Linux
    MySQL
    Apache Tomcat
    Apache Axis2
    Apache Synapse/WSO2 ESB
    Firefox
    Eclipse

    Victory
    Gartner predicted that by 2012 most systems would use open source components
    Copyright kafka4prez and licensed for reuse under CC License, http://www.flickr.com/photos/kafka4prez/198465913
  • 6. Why Do People Contribute?
    Because they enjoy it
    To work with smart people
    Because they get paid to do it.
    If you are a reputed open source developer, chances are that you can get someone to pay you for contributing to open source.
    Visibility, to Make an impact
    Recognition, prestige
    To Improve your brand / profile
    To get into Grad school
    As a Business Strategy
    Building or supporting an opensource project may be a long term strategic action.
    Great investments need faith and patience
    Copyright U. S. Fish and Wildlife Service and licensed for reuse under CC License , http://www.flickr.com/photos/usfwsnortheast/4754624921 and Copyright WxMom and licensed for reuse under CC License , http://www.flickr.com/photos/wxmom/1359996991.
  • 7. Open Source Business Model
    Open source projects occupy a significant portion of the middleware space, and of many others.
    Many commercial products are powered by open source projects
    Many large companies invest a significant amount of resources in open source projects (sometimes 1000s)
    Often there are companies around open source projects
    Business models
    Build improved pro versions and sell them
    Sell production support
    Provide consultancy, training, etc.
    Copyright Emdot and licensed for reuse under CC License, http://www.flickr.com/photos/emdot/2418695
  • 8. Apache Software Foundation
    Built on the success of the Apache Web Server
    Home to many successful and highly influential open source projects, like the Apache Web Server
    Governed by the Apache License
    You can edit and redistribute, and even sell
    Not viral; you are free to make money on top of it
    Community is the Key
    User Community
    Developer Community
    Open development model with Open decisions
    Communication through mailing lists
    Warm Springs Chiricahua Apache
    Copyright Jeff Kubina and licensed for reuse under CC License , http://www.flickr.com/photos/95118988@N00/416015918
  • 9. Apache System Projects
    Web Service Support
    Apache Axis2, Apache Rampart, Apache Sandesha, Apache CXF ..
    Workflow Engine
    Apache ODE
    Enterprise Service Bus
    Apache Synapse
    Apache Camel
    Messaging
    Apache Qpid / ActiveMQ
    Data Storage
    Apache Cassandra, CouchDB, Apache OODT
    J2EE Container
    Apache Geronimo

    Copyright ind{yeah} and licensed for reuse under CC License , http://www.flickr.com/photos/flickcoolpix/3566848458/
  • 10. A Large E-Science Project as a Case Study
  • 11. E-Science
    • Continuation of High Performance Computing, Parallel Computing, and Grid.
    • Underlying theme is “Cyber-infrastructures to support Scientific Research”.
    • Built around “Computation” as the third pillar of Science (along with Analysis and Experimentation).
    • Characterized by a wide range of computing (CPU minutes to CPU years) and data (a few KB to PBs of data) requirements.
    • Based on real-life use cases.
  • “Tis strange—but true; for truth is always strange,Stranger than fiction.” ---- Lord Byron, Don Juan (1818-24)
    • E-Science joins theory with real-life data
    • Real-life applications often go beyond our experiences.
    • Most weather models are computed at much lower than ideal resolutions; otherwise a 24-hour forecast would take more than 24 hours!
    • Physics use cases (e.g. the Large Hadron Collider), telescopes, and genome analysis generate terabytes of data in days if not hours, and moving 1 TB takes hours even on the 10 GB networks of TeraGrid.
    • Scale, geographical distribution of resources, and heterogeneity make these use cases complex.
    Surprise
    Copyright Nrbelex and licensed for reuse under CC License , http://www.flickr.com/photos/nrbelex/529393643
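As a rough check on the data-movement point above, the sketch below estimates how long it takes to move 1 TB. It assumes that the slide's "10 GB network" means a 10 gigabit-per-second link and that sizes are in decimal units. The `efficiency` values are illustrative assumptions, not measurements from TeraGrid.

```python
def transfer_time_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_tb terabytes over a link_gbps link.

    efficiency is the (assumed) fraction of the nominal link rate that a
    transfer actually achieves; 1.0 means the full line rate.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_tb * 1e12 * 8                       # decimal terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for eff in (1.0, 0.25, 0.10):                   # illustrative values only
        hours = transfer_time_hours(1, 10, eff)
        print(f"efficiency {eff:.0%}: {hours:.2f} hours")
```

Under these assumptions the transfer takes roughly 13 minutes at the full line rate and reaches hours only when the achieved throughput is a small fraction of the nominal rate, which is presumably the practical situation the slide refers to.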
  • 20. Linked Environments for Atmospheric Discovery (LEAD)
    • U.S. NSF funded, 10+ universities, $11M, 5 years.
    • Used for U.S. National Weather forecasts by NOAA.
    • Presented to the U.S. Congress as an example to justify scientific research spending by the U.S. NSF.
    • Has brought state-of-the-art forecasting capabilities to a wider audience, ranging from hardcore scientists to high school students.
    Copyright f2n_downtown and licensed for reuse under CC License , http://www.flickr.com/photos/myneighborhood/4809104443
  • 24. LEAD: Dynamic Weather Analysis at U.S.-Wide Scale
  • 25. Why is it Hard?
    • Geographically Distributed Sensors, Computing Power, Storage, and Expertise.
    • Handling Failures and Recovery.
    • Long Running Jobs (> 1 Hour).
    • Large Scale Jobs (10-1000+ processors).
    • Large Sized Data (KBs to GBs of data).
    • Need to serve many parallel users.
    • Usage spikes.
    Copyright Wonderlane and licensed for reuse under CC License , http://www.flickr.com/photos/wonderlane/3302165946
  • 32. LEAD as an Example
    • Assume a hurricane has developed, and 1000 scientists across the U.S. come to the LEAD portal to run forecasts.
    • Let's assume:
    • Each user runs 3 workflows.
    • Each workflow has 6 services, generates about 300 notifications, moves 50 100 MB files, generates 50 100 MB files, and runs for one hour.
    • Each service needs 5 CPU hours.
    Copyright gletham GIS, Social, Mobile Tech Images and licensed for reuse under CC License, http://www.flickr.com/photos/gisuser/54062274/
  • 37. Which Means
    • 3000 parallel workflows
    • Need 90,000 CPU hours, i.e. 90,000 CPUs for the hour
    • 250 TPS for the messaging system
    • Move about 8 GB/sec through the network
    • Generate 15 TB of data per hour
    Do the math
    Not all of this can be handled now, but the numbers give us an idea of the challenge.
    Copyright matsuyuki and licensed for reuse under CC License, http://www.flickr.com/photos/matsuyuki/5461363022
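The figures on this slide can be reproduced from the assumptions on the previous one. The sketch below is a back-of-the-envelope calculation, not code from LEAD: the `Scenario` and `estimate_load` names are made up for illustration, units are decimal (1 GB = 1000 MB), and the network figure assumes that the slide's 8 GB/sec counts both the moved and the generated files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Inputs taken from the 'LEAD as an Example' slide."""
    users: int = 1000
    workflows_per_user: int = 3
    services_per_workflow: int = 6
    cpu_hours_per_service: int = 5
    notifications_per_workflow: int = 300
    files_moved_per_workflow: int = 50
    files_generated_per_workflow: int = 50
    file_size_mb: int = 100
    runtime_hours: int = 1


def estimate_load(s: Scenario) -> dict:
    """Aggregate load if every workflow runs in parallel for runtime_hours."""
    workflows = s.users * s.workflows_per_user
    cpu_hours = workflows * s.services_per_workflow * s.cpu_hours_per_service
    seconds = s.runtime_hours * 3600
    moved_mb = workflows * s.files_moved_per_workflow * s.file_size_mb
    generated_mb = workflows * s.files_generated_per_workflow * s.file_size_mb
    return {
        "parallel_workflows": workflows,
        "cpus_needed": cpu_hours / s.runtime_hours,
        "notifications_per_sec": workflows * s.notifications_per_workflow / seconds,
        # Assumption: network traffic = moved files + generated files.
        "network_gb_per_sec": (moved_mb + generated_mb) / 1000 / seconds,
        "generated_tb_per_hour": generated_mb / 1_000_000 / s.runtime_hours,
    }


if __name__ == "__main__":
    for name, value in estimate_load(Scenario()).items():
        print(f"{name}: {value:,.2f}")
```

With the default inputs this gives 3000 workflows, 90,000 CPUs, 250 notifications per second, about 8.3 GB/sec of network traffic, and 15 TB of generated data per hour, in line with the slide.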
  • 42. SOA, E-Science and LEAD
    • E-Science infrastructures are distributed, complex, and heterogeneous.
    • SOA is designed to handle just that.
    • LEAD is based on many SOA specs:
    • WSDL, SOAP, WS-Addressing for communication
    • WS-BPEL for workflows
    • WS-Eventing for messaging
    • WSDM for service management
    • The LEAD people have worked closely with and contributed to Web Services, pushing its limits to apply it to LEAD.
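To make the communication specs above a little more concrete, here is a minimal sketch of a SOAP 1.2 message carrying WS-Addressing 1.0 headers, built with Python's standard library. It is not LEAD or Axis2 code. The endpoint URLs, the action URI, and the `runForecast` payload are hypothetical; only the two namespace URIs and the header element names come from the specifications. The `ReplyTo` header is what lets a reply be sent later to a separate endpoint, which suits long-running jobs.

```python
import uuid
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"   # SOAP 1.2
WSA_NS = "http://www.w3.org/2005/08/addressing"       # WS-Addressing 1.0

ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("wsa", WSA_NS)


def build_request(to: str, action: str, reply_to: str, payload: ET.Element) -> bytes:
    """Wrap payload in a SOAP envelope with WS-Addressing headers."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    ET.SubElement(header, f"{{{WSA_NS}}}To").text = to
    ET.SubElement(header, f"{{{WSA_NS}}}Action").text = action
    ET.SubElement(header, f"{{{WSA_NS}}}MessageID").text = f"urn:uuid:{uuid.uuid4()}"
    # The reply goes to this endpoint instead of the open connection.
    reply = ET.SubElement(header, f"{{{WSA_NS}}}ReplyTo")
    ET.SubElement(reply, f"{{{WSA_NS}}}Address").text = reply_to
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return ET.tostring(envelope, encoding="utf-8")


if __name__ == "__main__":
    request = ET.Element("runForecast")               # hypothetical operation
    ET.SubElement(request, "region").text = "gulf-coast"
    message = build_request(
        to="http://example.org/services/forecast",    # hypothetical endpoints
        action="urn:example:runForecast",
        reply_to="http://example.org/portal/callback",
        payload=request,
    )
    print(message.decode("utf-8"))
```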
  • LEAD Architecture
  • 50. Workflow Subsystem
  • 51. Workflow Subsystem Challenges
    Maximizing Resource Utilization
    Utilizing the Cloud
    Cloud Bursting
    Handling Priorities
    Scaling up
    Service and Workflow Governance
    Execution Delegation
    Copyright Doug Lee and licensed for reuse under CC License, http://www.geograph.org.uk/photo/1893583
  • 52. Data Subsystem
  • 53. Data Subsystem Challenges
    Large Scale data Repositories
    To detect, collect metadata, and store
    To Search
    Replica Management
    Data Mining
    CEP
    Clustering algorithms etc.
    Data Provenance
    Data Quality
    Copyright Anne Petty and licensed for reuse under CC License, http://www.geograph.org.uk/photo/101401
  • 54. Messaging Subsystem
  • 55. Messaging Subsystem Challenges
    Underlying model is the Publish/Subscribe pattern
    Challenges are
    How to scale up? Supporting a large number of users and a large number of subscriptions
    Avoid a single point of failure
    Ensure guaranteed delivery
    Security within Publish/Subscribe pattern
    Related Projects
    WS-Messenger
    Narada Broker
    Apache Qpid
    Copyright Dave Croker and licensed for reuse under CC License, http://www.geograph.org.uk/photo/689155
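For readers new to the publish/subscribe pattern named above, the following is a minimal in-memory sketch of it. The `Broker` class and the topic names are invented for illustration and have nothing to do with the APIs of WS-Messenger, Narada Broker, or Apache Qpid.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, dict], None]


class Broker:
    """Toy topic-based broker: publishers and subscribers only know topics."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register handler to be called for every message on topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver message to all current subscribers; return how many got it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(topic, message)
        return len(handlers)


if __name__ == "__main__":
    received = []
    broker = Broker()
    broker.subscribe("workflow/status", lambda topic, msg: received.append(msg))
    delivered = broker.publish("workflow/status", {"workflow": "wf-1", "state": "RUNNING"})
    print(delivered, received)
```

This toy version runs into each challenge listed on the slide: the single in-process broker is itself a single point of failure, a message published while nobody is subscribed is simply lost, and anyone can subscribe to any topic.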
  • 56. LEAD & Apache WS History
    • The LEAD and Apache teams have both contributed to each other (and there is overlap)
    • LEAD is older than Axis2, and it forked off in the Axis era, mainly because of async messaging support.
    • Five years ago LEAD implemented many tools (e.g. registries, async messaging, a workflow engine) that are hot topics now.
    • The team received continuing funding to make it open source under OGCE
    • The LEAD code base is now based on Axis2, ODE, and others
    • Moved into Apache as “Apache Airavata”
  • LEAD with Apache Projects
    • LEAD switched to Apache ODE for workflow execution more than 3 years ago.
    • LEAD data subsystems switched to Axis2 about 3 years ago.
    • Job submission was switched to Axis2 about 2 years ago.
    • Service Factory conversion to Axis2 started about a year ago.
    • Conversion of the messaging system: about a year ago (through an Indiana University and LSF collaboration).
  • Apache Airavata
    All partners agreed that the best way for the OGCE project to continue is as an Apache project
    Joined the Apache Incubator about 2 months ago
    Includes the following subprojects
    Xbaya workflow composer
    WS-Messenger as the Messaging system
    Generic Service Toolkit
    Service Registry
    Copyright ZeePack and licensed for reuse under CC License, http://www.flickr.com/photos/zeepack/3681815248
  • 66. Should You Try to Move Your Project to Apache?
  • 67. Apache as a Sustainability model for Research projects
    • Industry values “People”, we (open source) value “Code”, and Academia values “Ideas”.
    • Most NSF grants now ask for a sustainability model as part of the proposal.
    • One option is a commercial spin-off.
    Diamonds are
    Forever
    • Doing it the open source way, building a community and users around a project, is also a potential solution.
    • Many challenges: ownership, the need to renounce control, and active engagement of the community are the key.
    • “Source Open” is not good enough!
    • “Dump and Run” does not work either.
    Copyright stephend9 and licensed for reuse under CC License, http://www.flickr.com/photos/stephend9/372996705
  • 73. Pros & Cons
  • 74. How Does the Model Work?
    Need a Champion
    Have to submit a Proposal to Apache Incubator
    If accepted, will be placed in the incubator
    Team should work to build the community
    Users
    Developers
    Diversity of the community
    Graduation
    More users usually means more contribution
    Apache Board continues to monitor for compliance
  • 75. Conclusion
    • Wanted to share a real-life, large-scale SOA use case.
    • Wanted to show LEAD-Apache interactions as a real-life case study of interactions between Apache and an academic project.
    • Wanted to showcase Apache as a sustainability mechanism, if it is done right.
    • Wanted to give you a sense of some open problems, and of the kind of problems Distributed Systems and E-Science are trying to solve.
  • Questions?
    Copyright by romainguy, and licensed for reuse under CC License http://www.flickr.com/photos/romainguy/249370084
