Get involved with the Apache Software Foundation

Presented at the Indian Institute of Information Technology (IIIT) Allahabad on 21 Oct 2009. The talk introduces students to the Apache Software Foundation, Lucene, Solr and Hadoop, and to the benefits of contributing to open source projects. The target audience was sophomore, junior and senior B.Tech students.

Transcript of "Get involved with the Apache Software Foundation"

Get involved with the Apache Software Foundation
Shalin Shekhar Mangar, shalin [at] apache [dot] org

Who am I?

History
- 1996 – A “patchy” web server
- 1999 – The Apache Software Foundation, Tomcat, Lucene
- 2002 – Nutch
- 2006 – Solr, Hadoop
- 2008 – Mahout

Today
- Apache HTTPD powers 65% of all web servers and serves 100 million websites!
- Lucene powers search on thousands of websites
- Hadoop powers AOL, Yahoo and Facebook, and runs on thousand-node clusters
- So many projects!
- Thousands of active contributors

Why work on Apache/OSS?
- Work on what you like, when you like
- Development in the “real” world
- Learn from the best
- Build a publicly verifiable resume
- Companies will find you!

Problems we're solving
- Fast full-text search
- Application servers & frameworks
- Processing petabytes of data on thousands of unreliable commodity servers
- Crawling the web
- Scalable machine learning algorithms
- Data mining & analytics

Lucene
- High-performance, scalable, full-text search library
- Focus: indexing and searching documents
- 100% Java, no dependencies
- Does not include crawlers or document parsers
- Users: Wikipedia, Technorati, SourceForge, …
- Applications: Eclipse, Jira, Nutch, Solr, many commercial products

Lucene Inverted Index
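To make the inverted-index idea concrete, here is a minimal in-memory sketch: it maps each term to a sorted postings list of the IDs of documents that contain it. The whitespace tokenizer, the lowercasing and the sample documents are simplifying assumptions for illustration. Lucene's real index is a segment-based, on-disk structure whose tokenization is done by analyzers, so treat this only as a picture of the core mapping.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to a sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Sorted postings lists make AND queries a simple list intersection.
    return {term: sorted(ids) for term, ids in postings.items()}


docs = [
    "Lucene is a search library",
    "Solr is a search server built on Lucene",
    "Hadoop processes data in parallel",
]
index = build_inverted_index(docs)
print(index["lucene"])  # [0, 1]
print(index["hadoop"])  # [2]
```

Searching for a term is then a dictionary lookup rather than a scan over every document, which is what makes full-text search fast.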

Lucene Components
- Inverted index
- Write-once index segments, merged in the background
- Query types – Term, Boolean, Prefix, Range
- Scoring – term frequency (TF), inverse document frequency (IDF), field length, constant-score and function queries
- Filtering
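The Boolean query type and TF/IDF scoring from this slide can be sketched together: keep only documents that contain every query term (an AND query), then rank them by summing tf × idf over the query terms. The formula used here, raw term frequency times log(N / df), is a textbook variant chosen for illustration. It is not Lucene's exact similarity, which smooths IDF differently and also applies length normalization and boosts. The function name `and_query_tfidf` is hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def and_query_tfidf(docs, query):
    """Return (doc_id, score) pairs for documents containing every query term.

    Score is the sum over query terms of tf(term, doc) * log(N / df(term)),
    sorted best first, with ties broken by document ID.
    """
    term_counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    terms = tokenize(query)
    # Document frequency: the number of documents that contain each term.
    df = {t: sum(1 for tc in term_counts if t in tc) for t in terms}
    results = []
    for doc_id, tc in enumerate(term_counts):
        # Boolean AND: skip any document missing a query term.
        if not all(t in tc for t in terms):
            continue
        # Safe: every term occurs in this document, so df[t] >= 1.
        score = sum(tc[t] * math.log(n_docs / df[t]) for t in terms)
        results.append((doc_id, score))
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


docs = ["apache lucene search", "apache solr search search", "apache hadoop"]
for doc_id, score in and_query_tfidf(docs, "apache search"):
    print(doc_id, round(score, 3))
```

In this example "apache" appears in every document, so its IDF is log(3/3) = 0 and it adds nothing to the score; document 1 ranks above document 0 only because "search" occurs in it twice. Document 2 is excluded by the AND condition.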

Lucene – Towards the future
- Near real-time search – many engineering challenges
- Flexible indexing – alternate file formats, data structures
- Updates – common values & per-document
- Query optimization
- Better language support

Solr
- Search server built on Lucene
- Schema
- HTTP APIs
- Replication
- Distributed search
- Caching
- Extensible with plugins
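Distributed search can be pictured as scatter-gather: each shard scores its own documents and returns its local top results, and a coordinator merges those lists into a global top-k. The sketch below shows only that merge step, using `heapq.merge` from the Python standard library. The shard data and the `merge_shard_results` name are hypothetical, and the sketch ignores problems a real system such as Solr has to deal with, for example making scores comparable across shards.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, k):
    """Merge per-shard lists of (score, doc_id) into a global top-k, best first.

    Each shard's list must already be sorted by descending score.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    # Lazily take only the first k hits from the merged stream.
    return list(islice(merged, k))


shard_a = [(9.1, "a1"), (4.0, "a2"), (1.5, "a3")]
shard_b = [(7.3, "b1"), (6.2, "b2")]
print(merge_shard_results([shard_a, shard_b], 3))
# [(9.1, 'a1'), (7.3, 'b1'), (6.2, 'b2')]
```

Because each shard only needs to send its own top k hits, the coordinator's work grows with the number of shards times k rather than with the total size of the index.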

Solr – Towards the Future
- Near real-time search & replication
- Scale to hundreds of servers
- Scale to thousands of indexes on a single box
- Update documents
- Faster auto-complete component
- Field collapsing
- Clustering, spell suggestions, clickstream feedback

Hadoop
- Distributed file system – HDFS
- Map/Reduce
- Job scheduler
- Reliably store petabytes of data
- Compute in parallel
- Detect/handle failures
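Storing data reliably on unreliable machines comes down to splitting files into blocks and keeping several replicas of each block on different machines, so that losing one machine loses no data. The sketch below illustrates only that idea: it uses a tiny block size and simple round-robin placement over hypothetical node names, whereas real HDFS uses much larger blocks and a rack-aware placement policy. The replication factor of 3 matches the HDFS default; the helper functions are made up for illustration.

```python
def split_into_blocks(data, block_size):
    """Split a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


blocks = split_into_blocks(b"petabytes, in miniature", block_size=8)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
print(len(blocks))  # 3
print(placement[0])  # ['node1', 'node2', 'node3']
print(readable_after_failure(placement, "node2"))  # True
```

With a single replica per block, losing the one node that holds a block makes that block unreadable; extra replicas are what let the file system detect and survive such failures.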

Map/Reduce
- map(key1, value1) -> list<key2, value2>
- reduce(key2, list<value2>) -> list<value3>
- Many problems can be solved in this functional style:
- Sort, Word Count, PageRank, Deduplication
- Data mining, co-occurrence analysis
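The map and reduce signatures above can be simulated in a single process. The sketch below runs Word Count: `map_fn` emits a (word, 1) pair for each word, a shuffle step groups the emitted values by key, and `reduce_fn` sums each group. The function names and the in-memory shuffle are illustrative assumptions; Hadoop runs the same three phases across many machines through its Java `Mapper` and `Reducer` classes.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """map(key1, value1) -> list<key2, value2>: emit (word, 1) per word."""
    return [(word, 1) for word in text.lower().split()]


def reduce_fn(word, counts):
    """reduce(key2, list<value2>) -> list<value3>: one total per word."""
    return [sum(counts)]


def run_mapreduce(inputs, mapper, reducer):
    # Map phase: apply the mapper to every (key1, value1) input record.
    intermediate = []
    for key, value in inputs:
        intermediate.extend(mapper(key, value))
    # Shuffle phase: group all intermediate values by key2.
    groups = defaultdict(list)
    for key2, value2 in intermediate:
        groups[key2].append(value2)
    # Reduce phase: call the reducer once per key2, in sorted key order.
    return {key2: reducer(key2, values) for key2, values in sorted(groups.items())}


inputs = [(1, "the quick brown fox"), (2, "the lazy dog"), (3, "The fox")]
print(run_mapreduce(inputs, map_fn, reduce_fn))
# {'brown': [1], 'dog': [1], 'fox': [2], 'lazy': [1], 'quick': [1], 'the': [3]}
```

Because each map call and each reduce call depends only on its own inputs, the framework is free to run them in parallel on different machines and to rerun any that fail.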

Hadoop Map/Reduce

Hadoop – Towards the Future
- Better job scheduling, resource sharing
- Hadoop workflow systems
- HBase – large databases in the cloud
- Performance improvements
- Hundreds more!

How do I start?
- Choose your project
- Join the mailing list or forum
- Check out the code
- Find open issues and feature requests
- Ask the developers what you can work on

Contributing
- Ideas!
- Features & bug fixes
- Unit tests
- Documentation
- Performance benchmarks

Do's and Don'ts
- dnt rite sms lingo! (Don't write in SMS lingo.)
- Be courteous
- Don't be an island. Collaborate.
- Learn from your mistakes
- Persevere. It takes time.

Questions?
Shalin Shekhar Mangar, shalin [at] apache [dot] org
http://twitter.com/shalinmangar
http://shalinsays.blogspot.com