Get involved with the Apache Software Foundation

1.
Get involved with the Apache Software Foundation Shalin Shekhar Mangar shalin [at] apache [dot] org

2.
Who am I?

3.
History: 1996 – A "patchy" web server

4.
1999 – The Apache Software Foundation, Tomcat, Lucene

5.
2002 – Nutch

6.
2006 – Solr, Hadoop

7.
2008 – Mahout

8.
Today: Apache HTTPD powers 65% of all web servers and serves 100 million websites!

9.
Lucene powers search on thousands of web sites

10.
Hadoop powers AOL, Yahoo, Facebook. Runs on thousand-node clusters

11.
So many projects!

12.
Thousands of active contributors

13.
Why work on Apache/OSS? Work on what you like, when you like

14.
Development in the "real" world

15.
Learn from the best

16.
Build a publicly verifiable resume

17.
Companies will find you!

18.
Problems we're solving: Fast full-text search

19.
Application servers & frameworks

20.
Processing petabytes of data on thousands of unreliable commodity servers

21.
Crawling the web

22.
Scalable machine learning algorithms

23.
Data mining & analytics

24.
High-performance, scalable, full-text search library

25.
Focus: Indexing + Searching Documents

26.
100% Java, no dependencies

27.
No crawlers or document parsing

28.
Users: Wikipedia, Technorati, SourceForge, …

29.
Applications: Eclipse, Jira, Nutch, Solr, many commercial products

30.
Lucene Inverted Index

31.
Lucene Components: Inverted Index

32.
Write once – merge in the background

33.
Query Types – Term, Boolean, Prefix, Range

34.
Scoring – TF, IDF, Length, Constant, Function

35.
Filtering
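
The components listed above can be illustrated with a toy, in-memory sketch. This is not Lucene's API or its actual scoring formula: the class `InvertedIndex` and its methods are hypothetical names chosen for illustration, tokenization is a plain whitespace split, and the score is a simplified TF × IDF × length normalization.

```python
import math
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: term -> {doc_id: term frequency}. Illustrative only."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: tf}
        self.doc_lengths = {}              # doc_id -> number of tokens

    def add(self, doc_id, text):
        """Index a document by splitting it on whitespace and lowercasing."""
        tokens = text.lower().split()
        self.doc_lengths[doc_id] = len(tokens)
        for token in tokens:
            self.postings[token][doc_id] = self.postings[token].get(doc_id, 0) + 1

    def term(self, word):
        """Term query: documents containing the exact term."""
        return set(self.postings.get(word.lower(), {}))

    def boolean_and(self, *words):
        """Boolean AND query: documents containing every given term."""
        results = [self.term(w) for w in words]
        return set.intersection(*results) if results else set()

    def prefix(self, stem):
        """Prefix query: documents containing any term that starts with stem."""
        stem = stem.lower()
        docs = set()
        for term, posting in self.postings.items():
            if term.startswith(stem):
                docs.update(posting)
        return docs

    def score(self, word, doc_id):
        """Simplified TF-IDF with length normalization (an assumption, not Lucene's formula)."""
        posting = self.postings.get(word.lower(), {})
        if doc_id not in posting:
            return 0.0
        tf = math.sqrt(posting[doc_id])
        idf = 1.0 + math.log(len(self.doc_lengths) / (len(posting) + 1))
        length_norm = 1.0 / math.sqrt(self.doc_lengths[doc_id])
        return tf * idf * length_norm


index = InvertedIndex()
index.add(1, "Lucene is a search library")
index.add(2, "Solr is a search server built on Lucene")
index.add(3, "Hadoop stores petabytes of data")

print(index.term("lucene"))                    # documents 1 and 2
print(index.boolean_and("search", "server"))   # document 2 only
print(index.prefix("se"))                      # documents 1 and 2
# The shorter document scores higher for the same term frequency.
print(index.score("lucene", 1) > index.score("lucene", 2))
```

A real index also persists immutable segments to disk and merges them in the background, which this sketch leaves out.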

36.
Lucene – Towards the future: Near real-time search – Many engineering challenges

37.
Flexible indexing – Alternate file formats, data structures

38.
Updates – Common values & per-document

39.
Query Optimization

40.
Better language support

41.
Search server built on Lucene

42.
Schema

43.
HTTP APIs

44.
Replication

45.
Distributed Search

46.
Caching

47.
Extensible with plugins

48.
Solr – Towards the Future: Near Real-Time Search & Replication

49.
Scale to hundreds of servers

50.
Scale to thousands of indexes on a single box

51.
Update documents

52.
Faster auto-complete component

53.
Field Collapsing

54.
Clustering, Spell Suggestions, Clickstream feedback

55.
Distributed File System – HDFS

56.
Map/Reduce

57.
Job scheduler

58.
Reliably store petabytes of data

59.
Compute in parallel

60.
Detect/handle failures

61.
Map/Reduce: map(key1, value) -> list<key2, value2>

62.
reduce(key2, list<value2>) -> list<value3>

63.
A large number of problems can be solved in this functional way

64.
Sort, Word Count, PageRank, Deduplication

65.
Data mining, co-occurrence analysis
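
As a rough illustration of the map and reduce signatures above, here is Word Count written as two plain functions. It is a single-process sketch, not Hadoop's API: the names `map_word_count`, `reduce_word_count`, and `run_map_reduce` are hypothetical, and the "shuffle" step is simply an in-memory grouping by key.

```python
from collections import defaultdict


def map_word_count(key, value):
    """map(key1, value) -> list<key2, value2>: emit (word, 1) for each word in a line."""
    return [(word.lower(), 1) for word in value.split()]


def reduce_word_count(key, values):
    """reduce(key2, list<value2>) -> list<value3>: sum the counts for one word."""
    return [sum(values)]


def run_map_reduce(records, mapper, reducer):
    """Run map, shuffle (group by key), and reduce sequentially in one process."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Input records are (line number, line text) pairs.
lines = [(0, "to be or not to be"), (1, "to search or not to search")]
counts = run_map_reduce(lines, map_word_count, reduce_word_count)
print(counts["to"])      # [4]
print(counts["search"])  # [2]
```

Because each map call sees only one record and each reduce call sees only one key, a framework can spread the calls across many machines and rerun any that fail.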

66.
Hadoop Map/Reduce

67.
Hadoop – Towards the Future: Better job scheduling, resource sharing

68.
Hadoop Workflow systems

69.
HBase – Large databases in the cloud

70.
Performance improvements

71.
Hundreds more!

72.
How do I start? Choose your project

73.
Join the mailing list or forum

74.
Check out the code

75.
Find open issues and feature requests

76.
Ask the developers what you can work on

77.
Contributing: Ideas!

78.
Features & Bugfixes

79.
Unit tests

80.
Documentation

81.
Performance benchmarks

82.
Do's and Don'ts: dnt rite sms lingo!

83.
Be courteous

84.
Don't be an island. Collaborate.

85.
Learn from your mistakes

86.
Persevere. It takes time.

87.
Questions? Shalin Shekhar Mangar shalin [at] apache [dot] org http://twitter.com/shalinmangar http://shalinsays.blogspot.com
