Get involved with the Apache Software Foundation

    Get involved with the Apache Software Foundation - Presentation Transcript

    1. Get involved with the Apache Software Foundation. Shalin Shekhar Mangar, shalin [at] apache [dot] org
    2. Who am I?
    3. History
      • 1996 – A “patchy” web server
      • 1999 – The Apache Software Foundation, Tomcat, Lucene
      • 2002 – Nutch
      • 2006 – Solr, Hadoop
      • 2008 – Mahout
    4. Today
      • Apache HTTPD powers 65% of all web servers and serves 100 million websites!
      • Lucene powers search on thousands of web sites
      • Hadoop powers AOL, Yahoo, Facebook. Runs on thousand-node clusters
      • So many projects!
      • Thousands of active contributors
    5. Why work on Apache/OSS?
      • Work on what you like, when you like
      • Development in the “real” world
      • Learn from the best
      • Build a publicly verifiable resume
      • Companies will find you!
    6. Problems we're solving
      • Fast full-text search
      • Application servers & frameworks
      • Processing petabytes of data on thousands of unreliable commodity servers
      • Crawling the web
      • Scalable machine learning algorithms
      • Data mining & analytics
    Lucene
      • High-performance, scalable, full-text search library
      • Focus: Indexing + Searching Documents
      • 100% Java, no dependencies
      • No crawlers or document parsing
      • Users: Wikipedia, Technorati, SourceForge, …
      • Applications: Eclipse, Jira, Nutch, Solr, many commercial products
    7. Lucene Inverted Index
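An inverted index maps each term to the documents that contain it, so a search only touches the posting lists of the query terms instead of scanning every document. The sketch below is a toy illustration in Python, not Lucene's implementation or API: the whitespace tokenizer, the function names, and the sample documents are all assumptions made for this example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def term_query(index, term):
    """Return the ids of documents containing the term."""
    return index.get(term.lower(), [])


def boolean_and(index, *terms):
    """Return the ids of documents containing every term."""
    if not terms:
        return []
    # Intersect the posting lists of all query terms.
    result = set(term_query(index, terms[0]))
    for term in terms[1:]:
        result &= set(term_query(index, term))
    return sorted(result)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Lucene is a search library",
    2: "Solr is a search server built on Lucene",
    3: "Hadoop stores data on commodity servers",
}
index = build_inverted_index(docs)
print(term_query(index, "lucene"))
print(boolean_and(index, "search", "server"))
```

A real engine would add proper text analysis, positions for phrase queries, and compressed on-disk posting lists; the dictionary of sorted id lists here only shows the core lookup idea.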
    8. Lucene Components
      • Inverted Index
      • Write once – merge in the background
      • Query Types – Term, Boolean, Prefix, Range
      • Scoring – TF, IDF, Length, Constant, Function
      • Filtering
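The scoring factors named above (TF, IDF, length) can be combined into a single relevance score per document. The following is a minimal sketch under stated assumptions: the exact formulas (square-root term frequency, logarithmic inverse document frequency, inverse square-root length normalization) are one common textbook-style choice picked for illustration, and the function names and sample data are hypothetical. It should not be read as Lucene's actual scoring code.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def tf_idf_scores(docs, query):
    """Rank documents for the query by a simple TF * IDF * length-norm sum."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: the number of documents each term occurs in.
    df = Counter()
    for terms in tokenized.values():
        df.update(set(terms))

    scores = {}
    for doc_id, terms in tokenized.items():
        counts = Counter(terms)
        score = 0.0
        for term in tokenize(query):
            if counts[term] == 0:
                continue
            tf = math.sqrt(counts[term])                   # more occurrences, higher score
            idf = 1.0 + math.log(n_docs / (1 + df[term]))  # rarer terms weigh more
            length_norm = 1.0 / math.sqrt(len(terms))      # shorter documents weigh more
            score += tf * idf * length_norm
        if score > 0:
            scores[doc_id] = score
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "lucene search",
    2: "lucene is a full text search library written in java",
    3: "hadoop map reduce",
}
results = tf_idf_scores(docs, "lucene")
print(results)
```

Both matching documents contain the query term once, so the length factor decides the order: the two-word document outranks the ten-word one, and the document without the term is not returned at all.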
    9. Lucene – Towards the future
      • Near real-time search – Many engineering challenges
      • Flexible indexing – Alternate file formats, data structures
      • Updates – Common values & per-document
      • Query Optimization
      • Better language support
    Solr
      • Search server built on Lucene
      • Schema
      • HTTP APIs
      • Replication
      • Distributed Search
      • Caching
      • Extensible with plugins
    10. Solr – Towards the Future
      • Near Real-Time Search & Replication
      • Scale to hundreds of servers
      • Scale to thousands of indexes on a single box
      • Update documents
      • Faster auto-complete component
      • Field Collapsing
      • Clustering, Spell Suggestions, Clickstream feedback
    Hadoop
      • Distributed File System – HDFS
      • Map/Reduce
      • Job scheduler
      • Reliably store petabytes of data
      • Compute in parallel
      • Detect/handle failures
    11. Map/Reduce
      • map(key1, value1) -> list<key2, value2>
      • reduce(key2, list<value2>) -> list<value3>
      • A large number of problems can be solved in this functional way
      • Sort, Word Count, PageRank, Deduplication
      • Data mining, co-occurrence analysis
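Word count is the usual first example of the map and reduce signatures above. The sketch below runs the whole pipeline in a single Python process to show the data flow (map, group by key, reduce); it is not Hadoop's API, and the function names, the `run_map_reduce` driver, and the sample records are assumptions made for illustration.

```python
from collections import defaultdict


def map_word_count(key, value):
    """map(key1, value1) -> list of (key2, value2): emit (word, 1) per word."""
    return [(word, 1) for word in value.lower().split()]


def reduce_word_count(key, values):
    """reduce(key2, list of value2) -> list of value3: sum the counts."""
    return [sum(values)]


def run_map_reduce(records, mapper, reducer):
    """Run map, shuffle (group values by key), and reduce in one process."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Reduce each key's grouped values; keys are processed in sorted order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Hypothetical input records: (line id, line text).
records = [
    ("line-1", "to be or not to be"),
    ("line-2", "to search is to find"),
]
counts = run_map_reduce(records, map_word_count, reduce_word_count)
print(counts)
```

In a distributed setting the map calls run in parallel on different machines and the grouping step moves data across the network, but the two user-supplied functions keep the same shape.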
    12. Hadoop Map/Reduce
    13. Hadoop – Towards the Future
      • Better job scheduling, resource sharing
      • Hadoop Workflow systems
      • HBase – Large databases in the cloud
      • Performance improvements
      • Hundreds more!
    14. How do I start?
      • Choose your project
      • Join the mailing list or forum
      • Check out the code
      • Find open issues and feature requests
      • Ask the developers what you can work on
    15. Contributing
      • Ideas!
      • Features & Bug fixes
      • Unit tests
      • Documentation
      • Performance benchmarks
    16. Do's and Don'ts
      • dnt rite sms lingo!
      • Be courteous
      • Don't be an island. Collaborate.
      • Learn from your mistakes
      • Persevere. It takes time.
    17. Questions? Shalin Shekhar Mangar, shalin [at] apache [dot] org, http://twitter.com/shalinmangar, http://shalinsays.blogspot.com

    Shalin Mangar, 4 weeks ago

    Presented at Indian Institute of Information Techno more

    © All Rights Reserved
