Get involved with the Apache Software Foundation

Presented to students at the Indian Institute of Information Technology (IIIT) Allahabad on 21 Oct 2009. The talk covers the Apache Software Foundation, Lucene, Solr and Hadoop, and the benefits of contributing to open source projects. The target audience was sophomore, junior and senior B.Tech students.

Transcript

  • 1. Get involved with the Apache Software Foundation Shalin Shekhar Mangar shalin [at] apache [dot] org
  • 2. Who am I?
  • 3. History
    • 1996 – A "patchy" web server
    • 4. 1999 – The Apache Software Foundation, Tomcat, Lucene
    • 5. 2002 – Nutch
    • 6. 2006 – Solr, Hadoop
    • 7. 2008 – Mahout
  • 8. Today
    • Apache HTTPD powers 65% of all web servers and serves 100 million websites!
    • 9. Lucene powers search on thousands of web sites
    • 10. Hadoop powers AOL, Yahoo, Facebook. Runs on thousand node clusters
    • 11. So many projects!
    • 12. Thousands of active contributors
  • 13. Why work on Apache/OSS?
    • Work on what you like, when you like
    • 14. Development in the "real" world
    • 15. Learn from the best
    • 16. Build a publicly verifiable resume
    • 17. Companies will find you!
  • 18. Problems we're solving
    • Fast full-text search
    • 19. Application servers & frameworks
    • 20. Processing petabytes of data on thousands of unreliable commodity servers
    • 21. Crawling the web
    • 22. Scalable machine learning algorithms
    • 23. Data mining & analytics
  • 24. Lucene
    • High performance, scalable, full text search library
    • 25. Focus: Indexing + Searching Documents
    • 26. 100% Java, no dependencies
    • 27. No crawlers or document parsing
    • 28. Users: Wikipedia, Technorati, SourceForge, …
    • 29. Applications: Eclipse, Jira, Nutch, Solr, many commercial products
  • 30. Lucene Inverted Index
  • 31. Lucene Components
    • Inverted Index
    • 32. Write once – merge in the background
    • 33. Query Types – Term, Boolean, Prefix, Range
    • 34. Scoring – TF, IDF, Length, Constant, Function
    • 35. Filtering
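
To make the components above concrete, here is a minimal sketch of an inverted index with term, Boolean and prefix queries and a simplified TF-IDF score. It is written in Python for brevity and is not Lucene's API or its actual scoring formula; the class `TinyIndex` and all of its methods are hypothetical names made up for this illustration.

```python
import math
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each term to the documents that contain it."""

    def __init__(self):
        # term -> {doc_id -> number of times the term occurs in that document}
        self.postings = defaultdict(dict)
        # doc_id -> number of tokens in the document
        self.doc_lengths = {}

    def add(self, doc_id, text):
        """Index a document using naive lowercase whitespace tokenisation."""
        tokens = text.lower().split()
        self.doc_lengths[doc_id] = len(tokens)
        for token in tokens:
            self.postings[token][doc_id] = self.postings[token].get(doc_id, 0) + 1

    def term(self, word):
        """Term query: ids of documents containing the exact term."""
        return set(self.postings.get(word.lower(), {}))

    def boolean_and(self, *words):
        """Boolean AND query: ids of documents containing every term."""
        results = [self.term(w) for w in words]
        return set.intersection(*results) if results else set()

    def prefix(self, stem):
        """Prefix query: ids of documents with any term starting with stem."""
        stem = stem.lower()
        hits = set()
        for term, docs in self.postings.items():
            if term.startswith(stem):
                hits.update(docs)
        return hits

    def score(self, word, doc_id):
        """Simplified TF * IDF with length normalisation (an assumption,
        not the formula Lucene uses)."""
        docs = self.postings.get(word.lower(), {})
        if doc_id not in docs:
            return 0.0
        tf = docs[doc_id]
        idf = math.log(len(self.doc_lengths) / len(docs)) + 1.0
        return tf * idf / math.sqrt(self.doc_lengths[doc_id])


index = TinyIndex()
index.add(1, "Lucene is a search library")
index.add(2, "Solr is a search server built on Lucene")
index.add(3, "Hadoop stores petabytes of data")

print(sorted(index.term("lucene")))                   # [1, 2]
print(sorted(index.boolean_and("search", "server")))  # [2]
print(sorted(index.prefix("s")))                      # [1, 2, 3]
```

With this scoring, the shorter document 1 ranks above document 2 for the term "lucene", which is the effect the "Length" factor on the scoring slide refers to.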
  • 36. Lucene – Towards the future
    • Near real-time search – Many engineering challenges
    • 37. Flexible indexing – Alternate file formats, data structures
    • 38. Updates – Common values & per-document
    • 39. Query Optimization
    • 40. Better language support
  • 41.
  • 48. Solr – Towards the Future
    • Near Real-Time Search & Replication
    • 49. Scale to hundreds of servers
    • 50. Scale to thousands of indexes on a single box
    • 51. Update documents
    • 52. Faster auto-complete component
    • 53. Field Collapsing
    • 54. Clustering, Spell Suggestions, Clickstream feedback
  • 55. Hadoop
    • Distributed File System – HDFS
    • 56. Map/Reduce
    • 57. Job scheduler
    • 58. Reliably store petabytes of data
    • 59. Compute in parallel
    • 60. Detect/handle failures
  • 61. Map/Reduce
    • map(key1, value1) -> list<key2, value2>
    • 62. reduce(key2, list<value2>) -> list<value3>
    • 63. A large number of problems can be solved in this functional way
    • 64. Sort, Word Count, PageRank, Deduplication
    • 65. Data mining, co-occurrence analysis
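
The map and reduce signatures above can be sketched with the Word Count example from the list. This is a single-process illustration in Python, not Hadoop's API: `map_fn`, `reduce_fn` and `run_job` are hypothetical names, and `run_job` only imitates the grouping ("shuffle") step that a real framework performs across many machines.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """map(key1, value1) -> list<key2, value2>: emit (word, 1) for each word."""
    return [(word, 1) for word in text.lower().split()]


def reduce_fn(word, counts):
    """reduce(key2, list<value2>) -> list<value3>: sum the counts for one word."""
    return [sum(counts)]


def run_job(inputs, mapper, reducer):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key1, value1 in inputs:
        for key2, value2 in mapper(key1, value1):
            groups[key2].append(value2)
    # Each key is reduced independently, which is what allows parallelism.
    return {key2: reducer(key2, values) for key2, values in sorted(groups.items())}


documents = [
    ("doc1", "to be or not to be"),
    ("doc2", "to search or not to search"),
]
word_counts = run_job(documents, map_fn, reduce_fn)
print(word_counts["to"])  # [4]
```

Because neither function keeps any state between calls, the map calls can run on different inputs at the same time, and so can the reduce calls on different keys.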
  • 66. Hadoop Map/Reduce
  • 67. Hadoop – Towards the Future
    • Better job scheduling, resource sharing
    • 68. Hadoop Workflow systems
    • 69. HBase – Large databases in the cloud
    • 70. Performance improvements
    • 71. Hundreds more!
  • 72. How do I start?
    • Choose your project
    • 73. Join the mailing list or forum
    • 74. Check out the code
    • 75. Find open issues and feature requests
    • 76. Ask the developers what you can work on
  • 77. Contributing
  • 82. Do's and Don'ts
    • dnt rite sms lingo!
    • 83. Be courteous
    • 84. Don't be an island. Collaborate.
    • 85. Learn from your mistakes
    • 86. Persevere. It takes time.
  • 87. Questions? Shalin Shekhar Mangar shalin [at] apache [dot] org http://twitter.com/shalinmangar http://shalinsays.blogspot.com