Eguide lucene revolution_2011_v1d


  • 1. LUCENE REVOLUTION San Francisco 2011
    Welcome to San Francisco!
    We are excited to be bringing you the second Lucene Revolution event, following quickly on the success of our 2010 conference in Boston. In addition to all the great feedback we received after Boston, many people asked about bringing the conference to the West Coast, and here we are. It’s great to host the community here in our home state of California.
    There’s now no question: the revolution is in full swing, and Lucene and Solr are shaping the future of search. The diverse range of search technology and applications is without a doubt one of its greatest strengths. For the extended community and ecosystem of open source search, Lucene Revolution is an unmatched opportunity to learn, network, share experiences, and see how others have changed the world of search.
    Speakers here at the conference hail from companies large and small, from innovative startups and established firms, as well as from government, academia and non-profits. Even better, the range of experience and application interests of your fellow attendees should inspire you to seek out new ways to put search technology to work. We’ve allotted ample time in breaks for formal and informal conversations. And be sure to join the Revolution social network at http://lucene.crowdvine.com/. Keep an eye out at the Registration Desk for agenda changes and updates.
    One group you should definitely seek out here is the core group of developers and committers who are the heart and soul of the Apache Lucene/Solr project. You know them from the mailing lists; these are the people who do the hard work of making the code do its magic, resolving the challenging technical and architectural issues that we all benefit from. Don’t just attend their roadmap panel and technical sessions; make sure you avail yourself of the opportunity to put faces to names, so that when you’re on the mailing lists, you’ll have more than a ‘to’ and a ‘from’ to go by.
    As the commercial entity for Lucene/Solr, we at Lucid Imagination are always looking for new ways to help you make the most of open source search. Be sure to tell us what you like, what could be improved, and what topics should be covered in future events. Think about sharing your own successes with the community by speaking at the next Lucene Revolution.
    Let the conference staff, or anyone on the Lucid Imagination team, know if you have any questions or if there’s anything you need.
    Onward to the revolution!
    Eric Gries, CEO
    Lucid Imagination
  • 2. Contents
    Opening Letter                         1
    Timetable at a Glance                  3
    Agenda                                 6
    About Lucid Imagination                8
    About Our Sponsors                    10
    Training                              14
    Keynotes                              18
    Sessions–Day 1                        19
    Lightning Talks                       25
    Sessions–Day 2                        28
    Speaker Bios                          36
    Hotel, Maps & Transportation Info     50
    Lucene, Apache Lucene, Solr, Apache Solr, Hadoop, Apache Hadoop and other Apache projects mentioned are trademarks of The Apache Software Foundation.
  • 3. SUNDAY, MAY 22
      16:00 – 18:00  REGISTRATION OPEN (Sandpebble Foyer, outside Grand Peninsula Ballroom)
    MONDAY, MAY 23
      8:00 – 9:00    TRAINING REGISTRATION OPEN
      9:00 – 17:00   Training Workshops, Day 1:
                     - Solr Application Development Workshop
                     - Developing Search Applications with LucidWorks Enterprise
                     - Lucene Application Development Workshop
                     - Scaling Search with Solr and Big Data
                     See the registration desk in the Sandpebble Foyer for room assignments.
    TUESDAY, MAY 24
      8:00 – 9:00    TRAINING REGISTRATION OPEN
      9:00 – 17:00   Training Workshops, Day 2:
                     - Solr Application Development Workshop
                     - Developing Search Applications with LucidWorks Enterprise
                     - Lucene Application Development Workshop
                     - Scaling Search with Solr and Big Data
      16:00 – 18:00  Ticket pickup for the Giants game (advance tickets required). Tickets may be picked up at the Conference Registration Desk in the Sandpebble Foyer.
      18:00          Buses depart for the Giants game from the front entrance of the Hyatt Hotel.
  • 4. WEDNESDAY, MAY 25
      7:30 – 18:00   REGISTRATION OPEN
      7:30 – 8:30    Light Breakfast Available
      8:30 – 10:05   Welcome & Keynotes
                     Welcome: Eric Gries, Lucid Imagination
                     Keynotes: Marc Krellenstein, Lucid Imagination; Stephen Dunn, The Guardian News and Media
      10:05 – 10:35  BREAK
      10:35 – 11:25  Technical Track Sessions
      11:25 – 11:35  BREAK
      11:35 – 12:25  Technical Track Sessions
      12:25 – 13:30  LUNCH AND SPONSOR EXHIBITS
      13:30 – 14:20  Technical Track Sessions
      14:20 – 14:30  BREAK
      14:30 – 15:20  Technical Track Sessions
      15:20 – 15:50  BREAK
      15:50 – 16:40  Panel: “Stump the Chump”
      16:40 – 17:00  BREAK
      17:00 – 18:30  Lightning Talks
      18:30          REVOLUTION PARTY
    THURSDAY, MAY 26
      7:45 – 8:45    Light Breakfast Available
      8:45 – 10:15   Keynote: Stephen O’Grady, RedMonk
                     Panel: Committers Q&A, Lucene/Solr Roadmap
      10:15 – 10:45  BREAK
      10:45 – 11:35  Technical Track Sessions
      11:35 – 11:45  BREAK
      11:45 – 12:35  Technical Track Sessions
      12:35 – 13:45  LUNCH AND SPONSOR EXHIBITS
      13:45 – 14:35  Technical Track Sessions
      14:35 – 14:45  BREAK
      14:45 – 15:35  Technical Track Sessions
      15:35 – 15:45  BREAK
      15:45 – 16:35  Technical Track Sessions
      16:35 – 17:30  Panel: “Search for Tomorrow (RDBMS for Yesterday)”
      17:30          CONFERENCE ENDS
  • 5. LOGISTICS
    - REGISTRATION is in the Grand Peninsula Foyer.
    - KEYNOTES and PANEL DISCUSSIONS are in Grand Peninsula Ballroom D.
    - TRACK 1 is in Grand Peninsula Ballroom A/B/C.
    - TRACK 2 is in Grand Peninsula Ballroom D.
    - TRACK 3 is in Grand Peninsula Ballroom E/F/G.
    - TRACK 4 is in Sandpebble A/B/C.
    - LUNCHES are in the Atrium (upstairs, above the Ballroom).
    - THE REVOLUTION PARTY is in the Grand Peninsula Foyer.
    - TRAINING CLASSES will be held in the Sandpebble Conference Rooms.
    - TRAINING REGISTRATION is outside the Sandpebble Conference Rooms (please contact charelm@gmail.com if you are unsure which class you are in).
  • 6. Agenda
  • 7. Agenda (continued)
  • 8. As the world’s leading source of expertise in open source search technology and the commercial company for Apache Solr/Lucene, Lucid Imagination offers the products and services you need for cost-effective development and production deployment of cutting-edge search applications that lower your cost of growth. Thousands of organizations around the world have turned to the power of Apache Solr/Lucene open source technology to drive their cutting-edge search applications.
    LucidWorks: Enterprise Grade Solr/Lucene
    LucidWorks Enterprise is a flexible, cost-effective, scalable platform that simplifies development, tuning, configuration and deployment of Solr/Lucene open source search technology. It features:
    POWERFUL SEARCH
    - Complete Apache Solr 4.x Release: integrated and tested with powerful enhancements
    - Scalability: distributed search and indexing
    - Cloud-Ready: centrally managed search replication and configuration
    - REST API: simplifies integration
    SIMPLIFIED ADMINISTRATION
    - Easy-to-use Installer & Admin UI: streamlines startup and common configuration tasks
    - Data Connectors for databases, file systems, Web sites, SharePoint and more
    - Multiple file types: MS Office, PDF, native XML format documents and more
    - Security: LDAP-aware, document-level, role-based, policy-driven
    ADVANCED USER EXPERIENCE
    - Enriched Query Parsing: more resilient interpretation of user input
    - Click Scoring: boosts results based on user behavior
    - User Alerts: automatic notification of new results
    - Integrated auto-complete and spellchecking
  • 9. Global Expertise: Training & 24x7 Services
    Lucid Imagination offers a deep bench of resources in search and open source, backed by unmatched experience with thousands of diverse search applications at the world’s largest companies.
    TRAINING
    A comprehensive selection of courses and classes for developers, system administrators, managers, and search application users on LucidWorks Enterprise, Solr and Lucene; instruction is offered in a variety of formats around the world.
    CONSULTING
    Our unique ExpertLink Advisory Services provide consultative guidance on design and optimization for search applications during development and production, to ensure your Lucene/Solr implementations meet the requirements of your business.
    ENTERPRISE SUPPORT AND SUBSCRIPTIONS
    Lucid Imagination offers attractively priced subscriptions that deliver Solr/Lucene technology in an integrated, well-packaged format. Subscriptions combine stability, security, robust interfaces, and predictable release schedules with unmatched support resources within reach 24 x 7 x 365 across the globe.
  • 10. Platinum Sponsor: Basis Technology
    Basis Technology provides software solutions for multilingual text analytics, information retrieval, and name resolution. Our Rosette© Linguistics Platform is the text analysis engine behind many commercial and government search-based applications, adding language support to Lucene and Solr for better search precision and recall in English or 27 other languages. Starting with language identification in 55 languages, our high-quality linguistic analysis integrates seamlessly into Lucene and Solr via a connector, enabling customizable tokenization and stemming/lemmatization for languages like Chinese, Japanese, Arabic, and Persian. Dictionary-based decompounding is available in German, Dutch, Danish, Swedish, Norwegian, and Korean. Entity extraction enriches search by adding auto-generated metadata and faceted navigation to results. Adding support for new languages to Solr takes less than a day’s work.
    The Rosette Platform powers search, business intelligence, e-discovery, and other enterprise and government applications for customers worldwide, including Microsoft/Bing, Cisco, EMC, Endeca, Oracle, and Yahoo!
    www.basistech.com
  • 11. Exhibitors
    SALESFORCE.COM
    Salesforce.com is the enterprise cloud computing leader and the world’s 4th fastest-growing company. We’re also one of the “Best Places to Work” (FORTUNE). Salesforce.com’s Search Team is strong and experienced, with deep architecture expertise. We’re dedicated to delivering the fastest, most reliable cloud-scale enterprise search in the industry. In addition to innovating around scalability and security, we strive to delight our end users with an original, intuitive user experience and relevancy that’s adaptive, robust, and deeply satisfying. If you share our passion for search and for solving tough problems, swing by our booth to chat.
    www.salesforce.com
    SEARCH TECHNOLOGIES
    Search Technologies is the leading independent provider of search engine integration and support services. Operating internationally, we help clients gain business advantage using search. Our technical team of more than 80 experts is the most experienced group of search implementation professionals globally, and this mitigates risk for our customers. In short, we are the experts at fine-tuning search applications to deliver business benefits.
    www.searchtechnologies.com
    DOCUMILL
    Documill is an independent software vendor (ISV) enabling browser-based access to Microsoft Office and PDF documents and empowering high-volume server-side content processing solutions. Documill Visual Search dramatically improves the search user experience and the discoverability of multi-page documents. Instant document previews and page-level search results improve the experience and accuracy of document data mining. With page-level bookmarking features, Documill Visual Search enables collaborative search, allowing users to take actions based on their findings, share results and syndicate relevant pages into new documents.
    www.documill.com
  • 12. Community Sponsors
    SEMATEXT
    Sematext is a software products and services company focused on Search & Analytics using Lucene, Solr, Nutch, Hadoop, HBase, Flume, Mahout, and other open-source technologies. Sematext also offers Lucene & Solr technical support subscriptions, consulting packages, and training. The company also runs the popular search-hadoop.com and search-lucene.com sites. Founded in 2007 in New York, Sematext is privately held and self-funded, with a presence in North America and Europe. Sematext’s customers include The Library of Congress, Lockheed Martin, Simon & Schuster, Salesforce, NAVTEQ, Comcast, Cox Communications, ProQuest, Citysearch, Gilt Groupe, Autodesk, and many others.
    www.sematext.com
    EMC CORPORATION
    EMC Corporation is the world’s leading developer and provider of information infrastructure technology and solutions that enable organizations of all sizes to transform the way they compete and create value from their information. We can help you design, build, and manage flexible, scalable, and secure information infrastructures. And with these infrastructures, you’ll be able to intelligently and efficiently store, protect, and manage your information so that it can be made accessible, searchable, shareable, and, ultimately, actionable. In short, with an information infrastructure, you can avoid the potentially serious risks and reduce the significant costs associated with managing information, while fully exploiting its value for business advantage.
    www.emc.com
    SPRINGSOURCE, A DIVISION OF VMWARE, INC.
    SpringSource, a division of VMware, Inc. (NYSE: VMW), employs the open source leaders who created and drive innovation for Spring, the de facto standard programming model for enterprise Java applications, as well as the Java and web thought leaders within the Apache Tomcat, Apache HTTP Server, RabbitMQ, Hyperic, Groovy and Grails open source communities. SpringSource forges open source innovations to create lean and powerful technology that people love to use. From high-productivity developer tools and frameworks to lightweight application server runtimes and data management solutions for the hardest enterprise and cloud-scale problems, SpringSource provides solutions for tomorrow’s enterprise challenges.
    www.springsource.com
  • 13. MANNING PUBLICATIONS
    Manning Publications offers computer books for professionals: programmers, system administrators, designers, architects, managers and others. Manning’s focus is on computing titles at professional levels. We care about the quality of our books. Our books are designed without gimmicks. Their main goal is elegance and readability; we feel the two are often the same. Our covers are understated, decorated with pictures of worldwide regional dress habits of two hundred years ago. Many of our books come with online reader support: authors answer their readers’ questions in our Web-based Author Online discussion forums.
    www.manning.com
    DZONE
    DZone is a social linking and blogging network for the developer and IT communities. According to PC Magazine, “DZone is a developer’s dream: a vast network of user-submitted links to message boards, news, coding tricks, and more.” Launched in June 2006, DZone is in Alexa’s top 3,000 sites, surpassing established leaders like DevX, Sys-con, FTP Online and TheServerSide.com. DZone is the only vertically focused site regularly listed among the web’s largest social bookmarking sites. In its first year of operation, DZone sent over 5 million visitors to other developer websites. Today, DZone has curated topic pages for Java, Solr/Lucene, Cloud Computing, PHP, Agile, Mobile, and much more.
    www.dzone.com
    TNR GLOBAL
    TNR Global is a systems design and integration company focused on enterprise search and cloud computing solutions. TNR develops scalable, fault-tolerant web-based search solutions built on the open source LAMP stack, using Amazon Web Services and/or physical servers. TNR has over ten years of experience in web systems and enterprise search implementations, both proprietary and open source, and specializes in Lucene/Solr and FAST ESP search applications. TNR Global builds solutions for Vertical Search Engines, Publishing, Web Directories, News Sites, Information Portals, Web Catalogs, and Education. We also work with web-based startups to build scalable services.
    www.tnrglobal.com
    UCHIDA SPECTRUM
    Uchida Spectrum, Inc. (USI) is a leader in the Japanese search market. USI provides SMART/InSight, a search application that integrates and analyzes enterprise information. SMART/InSight is used by leading blue-chip companies such as Canon and Moody’s. USI is working with Lucid Imagination as its Strategic Alliance Partner to integrate LucidWorks Enterprise into its products and offer Lucene/Solr support services. In 2011, USI expanded its offerings to Enterprise Search and Web Services/E-commerce companies across Asia. USI now serves clients and partners in Japan, India, China and Singapore.
    www.spectrum.co.jp
  • 14. Scaling Search with Big Data and Solr
    Scaling Search with Big Data and Solr is a 2-day instructor-led, hands-on classroom training course delivered by instructors certified by Lucid in a shared classroom setting. The class is for Solr developers who want to know how to leverage the flexible search functionality of Apache Solr and the Big Data processing of Apache Hadoop to create the indexes for both general search and augmented data analytics. Lab exercises and real-world examples will be used to reinforce content.
    We’ll start with Hadoop from the ground up and cover MapReduce, HDFS (the Hadoop Distributed File System), cluster management, “the shuffle,” etc., before moving on to connecting it to Solr. We’ll look at common use cases for generating search indexes from big data, typical patterns for the data processing workflow, and how to make it all work reliably at scale. We will explore in depth an example of processing 1 billion records to create a faceted Solr search solution. You’ll learn how Solr can be used as a NoSQL solution, and how it compares to classic NoSQL projects such as Cassandra and HBase.
    The class will continue with techniques for scaling your Solr installation: how to identify bottlenecks, how to monitor your installation, and how to determine resource usage. We’ll also cover various Solr architectures, their characteristics and use cases, and examine how to apply them to make appropriate tradeoffs so you can scale your Solr installation effectively.
    THE COURSE COVERS
    - An overview of Hadoop.
    - Understanding MapReduce.
    - Principles of Hadoop development, operations & ecosystem.
    - How to use Hadoop with Solr.
    - How to index large volumes of data.
    - How to effectively search large indexes.
    - Understanding NoSQL.
    - How to shard/federate/replicate your data for large indexes.
    - Understanding resource costs & tradeoffs for Solr features.
    PREREQUISITES
    Prospective students should be familiar with Solr, through either work experience with Solr or completion of the Lucid Imagination Solr training course. No prior Hadoop experience is assumed.
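As a rough illustration of the map, shuffle and reduce steps the course starts from, here is a minimal single-process Python sketch. It is not Hadoop's API: plain functions stand in for the mapper, the framework's shuffle (grouping values by key) and the reducer, and the `category` field and sample records are hypothetical. The result is the kind of per-value count that a faceted Solr index later serves.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input records; "category" is an illustrative field name.
Record = Dict[str, str]

def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (key, 1) for each record's category, word-count style."""
    for record in records:
        yield record["category"], 1

def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (Hadoop also sorts and moves them between nodes)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the grouped values for each key, in sorted key order."""
    return {key: sum(values) for key, values in sorted(groups.items())}

records = [
    {"id": "1", "category": "books"},
    {"id": "2", "category": "music"},
    {"id": "3", "category": "books"},
]
counts = reduce_phase(shuffle_phase(map_phase(records)))
print(counts)  # {'books': 2, 'music': 1}
```

On a real cluster each phase runs in parallel over HDFS blocks and the shuffle crosses the network; the in-memory grouping here only shows the data flow.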
  • 15. Developing Search Applications with LucidWorks Enterprise
    Developing Search Applications with LucidWorks Enterprise is a 2-day instructor-led, hands-on classroom training course designed and developed by the engineers who developed LucidWorks Enterprise (LWE), and delivered by instructors certified by Lucid in a shared classroom setting.
    The objective of this course is to introduce LucidWorks Enterprise to users with no previous experience working with search applications. Through a combination of lectures and hands-on lab exercises, you will learn how to get up and running with LucidWorks Enterprise, what the components of a search application are, and how to make your content searchable and findable in a search application built on LucidWorks Enterprise. There will be time for questions and discussion to enhance your learning experience.
    At the end of the course you will know what a search application is, and how to set up and use LucidWorks Enterprise to index and search your content. You will also learn about LWE features such as highlighting, spell checking, and custom alerts, and how to use them to build a satisfying search experience for the end users who will search your content.
    THE COURSE COVERS
    - What a search application is and how to build one with LucidWorks Enterprise.
    - How to install and configure LWE.
    - How to make your content searchable and findable.
    - How to work with different data sources such as web pages, relational databases, and rich content files.
    - How to build queries to search for content in LWE.
    - Techniques and features in LWE that can be used to make results more relevant for end users.
    - Different ways to process search results returned by LWE.
    PREREQUISITES
    No programming skills are necessary; however, some technical background and familiarity with application development will be helpful. The labs accompanying the lectures will require basic computer skills, including how to run a simple command from the command line. No previous experience with search applications is necessary.
  • 16. Solr Application Development Workshop
    Solr Application Development Workshop is a two-day hands-on training course designed and developed by the engineers who helped write the Apache Lucene/Solr code, and delivered by instructors certified by Lucid in a shared classroom setting. The workshop is targeted at developers who want to build applications with Apache Solr, the Lucene Search Server. You will learn how to set up and use Solr to index and search, how to analyze and solve common problems, and how to use optional Solr modules such as facets, spell check, and highlighting. Lab exercises and real-world examples will be used to reinforce content.
    There will be time for questions and discussion to enhance your learning experience. At the end of the course you will understand how to set up and use Solr to index and search, how to analyze and solve common problems, and how to use optional Solr modules such as facets, spell check, and highlighting.
    THE COURSE COVERS
    - Principles of search application development
    - Common search use cases and their application
    - How to make content searchable
    - Key Solr and Lucene concepts
    - Basics of indexing and searching using Solr
    - How to design and run a Solr application
    - Best practices for indexing, searching and performance
    - Techniques to analyze and resolve common search problems
    - How to leverage Solr’s optional modules, including spell checking, highlighting, Data Import Handler, Tika integration and other popular capabilities
    - Advanced topics in designing Solr apps and running a site
    - Solr operations and deployment tools and strategies
    - How to customize and extend Solr
    PREREQUISITES
    Some programming skill and experience with a modern programming language such as Java, PHP, Perl, Ruby, .NET, or any language that supports HTTP and/or XML.
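Facets, highlighting and spell checking are all switched on through request parameters on Solr's `/select` handler. The Python sketch below only builds such a URL and sends no request. It assumes a Solr instance at the hypothetical base URL `http://localhost:8983/solr` and hypothetical `category`, `author` and `title` fields; `spellcheck=true` only has an effect if the spellcheck component is configured for the handler in `solrconfig.xml`.

```python
from typing import List
from urllib.parse import urlencode

def build_select_url(base_url: str, query: str,
                     facet_fields: List[str], highlight_field: str) -> str:
    """Build a Solr /select URL with faceting, highlighting and spellcheck enabled."""
    params = [
        ("q", query),        # the user's query
        ("wt", "json"),      # ask for a JSON response
        ("facet", "true"),   # enable faceting...
    ]
    # ...with one facet.field parameter per field (Solr accepts it repeated).
    params += [("facet.field", field) for field in facet_fields]
    params += [
        ("hl", "true"),              # enable highlighting
        ("hl.fl", highlight_field),  # field to return highlighted snippets for
        ("spellcheck", "true"),      # only works if the spellcheck component is configured
    ]
    # urlencode keeps repeated keys and escapes spaces as '+'.
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical host and field names, for illustration only.
url = build_select_url("http://localhost:8983/solr", "lucene revolution",
                       ["category", "author"], "title")
print(url)
```

Passing parameters as a list of pairs rather than a dict is deliberate: it lets `facet.field` appear once per field, which is how Solr expects multiple facet fields.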
  • 17. Lucene Application Development Workshop
    Lucene Application Development Workshop is a two-day instructor-led, hands-on training workshop, written and led by the engineers who helped write the Apache Lucene/Solr code. The objective of this course is to provide you with real-life use cases and teach you how to apply Lucene to real business requirements. During the course you will learn to apply best practices in developing scalable, highly available and high-performance search applications.
    There will be time for questions and discussion to enhance your learning experience.
    THE COURSE COVERS
    - Principles of search application development.
    - Common search use cases and their application.
    - How to make content searchable.
    - Key Lucene concepts.
    - Basics of indexing and searching with the Lucene APIs.
    - Best practices for indexing, searching and performance.
    - Analysis techniques for solving common search problems.
    - Lucene internals.
    - Lucene’s optional modules to enable spell checking, highlighting and other common search features.
    PREREQUISITES
    Basic Java programming skills.
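The central Lucene concept is the inverted index: analysis turns text into terms, indexing maps each term to the documents that contain it, and searching looks the query's terms up in that map. The Python sketch below is a conceptual toy, not Lucene's API; in Lucene these roles are played by classes such as `Analyzer`, `IndexWriter` and `IndexSearcher`, which also handle scoring, segments and on-disk storage that the toy omits. The tokenizing rule (ASCII letters and digits only) and the sample documents are assumptions for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

def analyze(text: str) -> List[str]:
    """Toy analyzer: lowercase, then split on runs of non-alphanumeric ASCII characters."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

class InvertedIndex:
    """Maps each term to the set of document ids containing it (its postings)."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        # Indexing: apply the same analysis used at query time, then record postings.
        for term in analyze(text):
            self.postings[term].add(doc_id)

    def search_all(self, query: str) -> List[int]:
        # Boolean AND: ids of documents containing every query term, sorted.
        terms = analyze(query)
        if not terms:
            return []
        postings = [self.postings.get(term, set()) for term in terms]
        return sorted(set.intersection(*postings))

index = InvertedIndex()
index.add(1, "Lucene Revolution in San Francisco")
index.add(2, "Scaling Solr with Hadoop")
index.add(3, "Lucene and Solr internals")
print(index.search_all("lucene solr"))  # [3]
print(index.search_all("LUCENE"))       # [1, 3]
```

Running the same `analyze` function at index and query time is what makes "LUCENE" match "Lucene"; mismatched analysis between the two sides is a classic source of missing results.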
The Once and Future History of Enterprise Search and Open Source
MARC KRELLENSTEIN | LUCID IMAGINATION

While it remains challenging to build best-practice search applications, core search technology has become commoditized. Open source Lucene/Solr represents the best form of that commodity: as good as or better than any commercial search technology, while also providing the cost, control and flexibility advantages of open source. In this talk, we’ll look at how past challenges in search were met and new ones evolved, and at the place of Lucene/Solr in that evolution.

From Publisher To Platform: How The Guardian Embraced the Internet using Content, Search, and Open Source
STEPHEN DUNN | GUARDIAN NEWS AND MEDIA UK

In 2009 The Guardian launched The Open Platform, a suite of services and tools that enable content partners and developers to build applications with The Guardian’s rich content. The content API, hosted on Solr instances on EC2, contains JSON representations of all Guardian articles back to 1999 (over 1 million articles), and is an increasingly complete representation of the organization’s output. The DataStore contains curated data sets for use in applications and visualizations.

This talk will cover how The Guardian opened up its business, enriched it, and reached new markets with its Open Platform strategy. Stephen will cover the technical architecture, the implementation of Solr (the key technology powering the platform), and how The Guardian has used it to embrace disruption in the media space while finding new sources of revenue and innovation. Two years after the launch, Stephen will share some of the lessons learned and explain how The Guardian complements its use of Solr with other open source, non-relational technology as its platform evolves.

All Data Big and Small
STEPHEN O’GRADY | REDMONK

The last twenty-four months have seen a veritable explosion in discussion around what is commonly referred to as Big Data, and the infrastructure technology employed to manage it. The wealth of available open source software means that businesses in any industry have easily accessible tools with which to tackle projects that would have been out of their reach just a few years earlier. Less heralded, however, has been the fact that making data actually useful, whatever its size, remains a challenge. In this session we’ll explore the role of search in putting data, big and small, to work answering important questions for businesses and society by reducing the friction between question and answer.
Integrating Advanced Text Analytics into Solr
STEVE KEARNS | BASIS TECHNOLOGY

Text analytics provides a number of interesting capabilities that can enhance enterprise search applications, though in practice it is not always obvious how these can be integrated effectively into Solr. This presentation will describe some of the practical ways that leading organizations are using text analytics, integrating it directly into Solr and their user interfaces to improve relevance, navigate results, and discover new information. The combination of Solr and quality text analytics can improve existing keyword search solutions and enable new ways of discovering knowledge hidden in existing data.

Finite State Automata in Lucene: Internals and Applications
DAWID WEISS | POZNAN UNIVERSITY OF TECHNOLOGY, POLAND

Finite state automata and transducers made it into Lucene fairly recently, but they already show a very promising impact on search performance. This data structure is rarely exploited because it is commonly (and unfairly) associated with high complexity. During the talk, I will try to show that automata and transducers are in fact very simple, that their construction can be very efficient (memory- and time-wise), and that their field of application is very broad. This will be backed by an introduction to how FSTs are implemented in Lucene (construction and traversal) and practical use cases where FSTs have been useful so far. If you’d like to see how to squeeze 150MB of text data into a 1.8MB compact data structure, this talk is for you.

Case Study: Panasonic Europe Powered by Apache Solr
DANIEL POTZINGER | AOE MEDIA GMBH

In 2010 Panasonic decided to replace its legacy enterprise search tool and switched the search for all its European websites to an Apache Solr-based solution. Now its customers benefit from an incredibly fast, feature-rich solution that is much more than just a search and has become a valuable sales-driving tool for Panasonic. Features like relevancy manipulation, autosuggest, and contextual filtering on properties like color or product category were implemented under less than ideal circumstances, chiefly without access to structured data. So far the search has been rolled out in close to 30 countries, which has also put Solr’s multilingual handling to the test.
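The claim in the Finite State Automata talk that automata are simpler than their reputation can be illustrated with a small sketch. The Python below builds a minimal acyclic automaton (an acceptor only, with no outputs) from a sorted word list using the incremental algorithm of Daciuk et al., which shares common suffixes as well as prefixes. It is a toy under stated assumptions: Lucene's FST builder also attaches outputs to arcs and uses a compact byte-level encoding, neither of which is shown, and every name here is hypothetical.

```python
class State:
    """A state in an acyclic automaton; edges map one character to the next state."""

    def __init__(self):
        self.final = False
        self.edges = {}

    def signature(self):
        # Equivalent states agree on finality and on every (label, target) pair.
        # Targets are already canonical when this is called, so identity suffices.
        return (self.final, tuple(sorted((c, id(t)) for c, t in self.edges.items())))


def build_minimal_automaton(sorted_words):
    """Build a minimal acyclic automaton from sorted, duplicate-free words."""
    root = State()
    register = {}   # signature -> canonical state
    unchecked = []  # (parent, label, child) along the previous word's path
    previous = None

    def minimize(down_to):
        # Merge every state deeper than `down_to` with an equivalent registered
        # state, or register it as the canonical copy.
        while len(unchecked) > down_to:
            parent, label, child = unchecked.pop()
            sig = child.signature()
            if sig in register:
                parent.edges[label] = register[sig]
            else:
                register[sig] = child

    for word in sorted_words:
        if previous is not None and word <= previous:
            raise ValueError("words must be sorted and unique")
        prev = previous or ""
        common = 0
        while common < min(len(word), len(prev)) and word[common] == prev[common]:
            common += 1
        minimize(common)  # the old path below the shared prefix is now final
        node = unchecked[-1][2] if unchecked else root
        for ch in word[common:]:
            child = State()
            node.edges[ch] = child
            unchecked.append((node, ch, child))
            node = child
        node.final = True
        previous = word
    minimize(0)
    return root


def count_states(root):
    seen, stack = set(), [root]
    while stack:
        state = stack.pop()
        if id(state) not in seen:
            seen.add(id(state))
            stack.extend(state.edges.values())
    return len(seen)


def accepts(root, word):
    state = root
    for ch in word:
        state = state.edges.get(ch)
        if state is None:
            return False
    return state.final


words = ["tap", "taps", "top", "tops"]
automaton = build_minimal_automaton(words)
trie_states = 1 + len({w[:i] for w in words for i in range(1, len(w) + 1)})
print("trie states:", trie_states, "minimal automaton states:", count_states(automaton))
```

For these four words a plain trie needs 8 states while the minimized automaton needs 5, because the `a` and `o` branches end in identical `p`/`ps` tails and are merged. The same sharing, applied to large sorted term lists, is what makes such structures compact.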
Real-time Search at Yammer
BORIS ALEKSANDROVSKY | YAMMER, INC.

This talk will focus on the architecture, scalability concerns, performance bottlenecks, operational characteristics and lessons learned while designing and implementing Yammer’s distributed real-time search system. Yammer is an enterprise social network SaaS offering with over 100,000 networks (including 85% of the Fortune 100) and nearly 2 million users. The search system we developed scales well up to 1B messages and serves as the foundation for the knowledge-base analysis services Yammer is developing.

Boosting Documents in Solr by Recency, Popularity and Personal Preferences
TIMOTHY POTTER | NATIONAL RENEWABLE ENERGY LABORATORY (NREL)

Attendees will come away from this presentation with a good understanding of, and access to source code for, boosting and/or filtering documents by recency, popularity, and personal preferences. My solution improves upon the common “recipe” based solution for boosting by document age. The framework also supports boosting documents by a popularity score, which is calculated and managed outside the index. I will present a few different ways to calculate popularity in a scalable manner. Lastly, my solution supports the concept of a personal document collection, where each user is only interested in a subset of the documents in the index. My presentation will provide a good example of how to filter and/or boost results based on user preferences, which is a very common requirement of many web applications.

Jazzed about Solr: People as a Search Problem
JOSHUA TUBERVILLE | EHARMONY

Search-oriented architectures are obvious approaches for web pages, emails, documents, and other text-based entities. With traditional structured data, text searching is often “added on” to the traditional Boolean queries in relational stores. When Jazzed was initiated we wanted search to be front and center. When we evaluated Solr we realized we could take the opposite approach and “add on” Boolean components to textual searches. This hybrid query approach makes transitioning to flexible ranking easy and straightforward. In this talk we will cover:
• How we model semi-structured user data in Solr
• Indexing strategies and their tradeoffs
• Where Solr does and doesn’t fit in the Jazzed architecture
• What aspects of Solr we are using
• Future considerations
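For reference, the common “recipe” that the boosting talk improves on is usually written as the function query `recip(ms(NOW,date_field),3.16e-11,1,1)`, where `recip(x,m,a,b)` computes `a/(m*x+b)` and `ms(NOW,date_field)` is the document's age in milliseconds (`date_field` is a placeholder name). The Python sketch below simply evaluates that formula to show the shape of the decay curve; it does not reproduce the talk's improved solution or its popularity and personalization features.

```python
# Milliseconds in an average year (365.25 days); 3.16e-11 is roughly its inverse.
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """The formula behind Solr's recip(x,m,a,b) function: a / (m*x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms):
    """Mirror of recip(ms(NOW,date_field),3.16e-11,1,1) for a document age in ms."""
    return recip(age_ms, 3.16e-11, 1, 1)


for years in (0, 1, 2, 5):
    print(f"{years} year(s) old -> boost {recency_boost(years * MS_PER_YEAR):.3f}")
```

A brand-new document gets a boost of 1.0 and a one-year-old document roughly 0.5, so the boost behaves like 1/(age in years + 1). In Solr the expression would typically be applied multiplicatively, for example `q={!boost b=recip(ms(NOW,date_field),3.16e-11,1,1)}ipod`, or additively through the dismax `bf` parameter.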
Heavy Committing: DocValues aka. Column Stride Fields in Lucene 4.0
SIMON WILLNAUER | APACHE LUCENE PMC

Lucene 4.0 is on its way to delivering a tremendous number of new features and improvements. Besides Real-Time Search and Flexible Indexing, DocValues (aka Column Stride Fields) is one of the “next generation” features. DocValues enable Lucene to efficiently store and retrieve type-safe document/value pairs in a column-stride fashion, either fully memory-resident with random access or disk-resident and iterator-based, without the need to un-invert fields. Its ultimate goal is to provide independently updatable per-document storage for scoring, sorting or even filtering. This talk will introduce the current state of development, implementation details, its features, and how DocValues have been integrated into Lucene’s Codec API for full extensibility.

Search, APIs, capability management and the Sensis journey
CRAIG REES | SENSIS

Earlier this year, Sensis launched its Business Search API, which allows publishers to develop local search propositions powered by the two million business listings contained in the Australian Yellow Pages® and White Pages® directories. This case study will explore Sensis’ strategic direction for search and explain how the framework and metrics by which search is managed at Sensis were used to define our search roadmap. Key architectural decisions, including our use of Solr and MongoDB, will be discussed, as well as our approach to real-time search tuning and quality management.

A Study of I/O and Virtualization Performance with a Search Engine based on an XML database and Lucene
ED BUECHE | EMC

Documentum xPlore provides an integrated search facility for the Documentum Content Server. The standalone search engine is based on EMC’s xDB (a native XML database) and Lucene. In this talk we will introduce xPlore and some of its key components and capabilities. These include aspects of the tight integration of Lucene with the XML database: xQuery translation and optimization into Lucene queries/APIs, as well as transactional updates to Lucene. In addition, xPlore is being deployed aggressively into virtualized environments (both disk I/O and VM). We will cover some performance results and tuning tips in these areas.
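To make “without the need to un-invert fields” concrete: an inverted index maps terms to documents, so sorting by a field first means rebuilding the document-to-value direction and parsing every term back into a typed value. The toy Python model below (plain lists and dicts, not Lucene's DocValues API; all names are hypothetical) contrasts un-inverting a single-valued `price` field with reading a value stored per document in a column.

```python
def uninvert(postings, max_doc, parse):
    """Rebuild a per-document value array from term -> doc-id postings.

    Every term must be visited and parsed back into a typed value, which is
    the work a column-stride store avoids by keeping values per document.
    Assumes a single-valued field (at most one term per document).
    """
    column = [None] * max_doc
    for term, doc_ids in postings.items():
        value = parse(term)
        for doc in doc_ids:
            column[doc] = value
    return column


# Inverted view of a single-valued "price" field: term text -> documents.
postings = {"25": [0, 3], "7": [1], "10": [2]}
uninverted = uninvert(postings, max_doc=4, parse=int)

# Column-stride view: the typed value for document i is stored at position i.
price_column = [25, 7, 10, 25]

# Sorting hits by price only needs random access by document id.
hits = [0, 1, 2, 3]
print(sorted(hits, key=lambda doc: price_column[doc]))
```

Both views hold the same information, but the column can be read directly (from memory or disk) instead of being rebuilt from the whole term dictionary every time a searcher is opened.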
Four Pillars of Designing the Search Experience
TYLER TATE | TWIGKIT

Lucene and Solr provide many excellent tools for presenting information to users, but what makes some search user interfaces better than others? Should you aim for a rich, advanced UI or should you “just make it look like Google”? Through his work at TwigKit with blue-chip corporations, scientific institutes, and governments, Tyler has identified four guiding pillars of the search experience:
• User Expertise: novices orienteer, experts teleport
• User Behaviour: lookup, learn, and investigate
• Information Diversity: homogeneous vs. heterogeneous data
• Situational Context: factors from the surrounding environment

We’ll delve deep into each dimension and discuss how to achieve useful, usable, and beautiful search interfaces using design patterns including autocomplete, faceted navigation, breadcrumbs, best bets, related searches, spelling suggestions, clickable metadata, result clustering, saved searches, data visualisation, and more.

Using Solr in Online Travel Shopping to Improve User Experience
ESTEBAN DONATO, SUDHAKARA KAREGOWDRA AND RAMON RESMA | TRAVELOCITY

In this talk we will present three different use cases of Solr in the travel industry. First, we will describe how we implemented faceted navigation for hotel shopping. Then, we will introduce how we implemented destination search functionality such as auto-complete and misspelling correction. Lastly, we will show how we integrated Solr to provide better experiences to mobile users.

Solr @ eBay Kleinanzeigen
OLAF ZSCHIEDRICH | EBAY.DE

Attendees will learn how eBay Germany has implemented Solr, why Solr was selected, which Solr features are utilized, and how Solr is configured and used in production. Recommended best practices will be profiled along with eBay Kleinanzeigen’s plans for future deployment of Solr.
Rapid Prototyping with Solr
ERIK HATCHER | LUCID IMAGINATION

Got data? Let’s make it searchable! This interactive presentation will demonstrate getting documents into Solr quickly, provide some tips on adjusting Solr’s schema to better match your needs, and finally discuss how to showcase your data in a flexible search user interface. We’ll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.

Search Analytics: What? Why? How?
OTIS GOSPODNETIC | SEMATEXT

You’ve indexed your data and people are searching it. But how do you know if they are happy with the results? How do you know if they are finding what they need? With search increasingly becoming the primary information access mechanism, knowing how your search is doing is not just a matter of curiosity; it often has direct business impact. In this talk we’ll discuss Search Analytics and how it can be used to answer questions like:
• Are too many users getting the dreaded “no matches” results?
• How deep into search results do people dig?
• Which hits are they clicking on, and what percentage of them don’t click on any hits?
• How much do they use the Did You Mean or Auto-Complete suggestions?

We’ll explore what specific Search Analytics reports tell us and what specific actions you should take based on those reports.
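The questions in the Search Analytics abstract map onto simple ratios over a search log. As a minimal sketch, assuming a hypothetical log format that records each search's hit count, the positions of clicked results, and whether a suggestion was used, the Python below computes a zero-result rate, a no-click rate, click depth, and suggestion usage. It is not based on any particular analytics product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchEvent:
    """Hypothetical search-log record; field names are illustrative only."""
    query: str
    num_hits: int
    clicked_ranks: List[int] = field(default_factory=list)  # 1-based result positions
    used_suggestion: bool = False  # accepted a Did You Mean / auto-complete suggestion


def summarize(events):
    """Compute a few basic search analytics ratios from a list of events."""
    total = len(events)
    if total == 0:
        return {}
    with_hits = [e for e in events if e.num_hits > 0]
    ranks = [r for e in events for r in e.clicked_ranks]
    return {
        "zero_result_rate": (total - len(with_hits)) / total,
        "no_click_rate": (sum(1 for e in with_hits if not e.clicked_ranks) / len(with_hits)
                          if with_hits else 0.0),
        "mean_click_rank": sum(ranks) / len(ranks) if ranks else None,
        "deepest_click_rank": max(ranks) if ranks else None,
        "suggestion_rate": sum(e.used_suggestion for e in events) / total,
    }


log = [
    SearchEvent("solr facets", 120, [1, 3]),
    SearchEvent("lucene fst", 40, [12]),
    SearchEvent("lucen", 0, used_suggestion=True),
    SearchEvent("solr clod", 0),
    SearchEvent("spellcheck", 15),
]
print(summarize(log))
```

The no-click rate is computed only over searches that returned hits, so it is not inflated by zero-result searches, which are already counted separately.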
“Stump The Chump”: Get On-The-Spot Solutions To Your Real-Life Solr/Lucene Challenges
GRANT INGERSOLL | LUCID IMAGINATION

Got a tough problem with your Solr or Lucene application? Facing challenges that you’d like some advice on? Looking for new approaches to overcome a Lucene/Solr issue? Not sure how to get the results you expected? Don’t know where to get started? Then this session is for you.

Now you can get your questions answered live, in front of an audience of hundreds of Lucene Revolution attendees! Back again by popular demand, “Stump the Chump” at Lucene Revolution 2011 is hosted by PMC chairman and Lucid Imagination co-founder Grant Ingersoll. All you need to do is send your questions to info@lucenerevolution.org. You can ask anything you like, but consider topics in areas like:
• Data modelling
• Query parsing
• Tricky faceting
• Text analysis
• Scalability

Please describe in detail the challenge you have faced and any approaches you have taken to solve the problem. Anything related to Solr/Lucene is fair game. Our MC will read the questions, and Grant will have to formulate a solution on the spot. A panel of judges will decide if he has provided an effective answer. Prizes will be awarded by the panel for the best question, and for those deemed to have “stumped the chump”.
Improve Relevance by Using Morphology and Named Entity Recognition
CHRISTOPH GOLLER, DIRECTOR, RESEARCH | INTRAFIND SOFTWARE AG

This talk will show how the relevance of search results can be improved by using morphology and named entity recognition. After briefly explaining the purpose of morphological analysis and of named entity recognition, we will analyze their potential advantages for search, faceting, and clustering of search results. Based on these ideas we will briefly sketch how to implement a morphological analyzer in Lucene and how to implement a natural language question answering system based on Lucene using named entity recognition. The talk will be accompanied by a live demo of these ideas.

Bio: Christoph Goller has more than 10 years of experience in the search industry. He received a Ph.D. in computer science from the Technical University of Munich, where he worked on several research projects in artificial intelligence, machine learning and neural networks. Christoph started his career at Lernout & Hauspie. Since 2002 he has been Director of Research at Intrafind Software AG (www.intrafind.de), a German company specializing in full-text search and text mining based on Lucene/Solr. Christoph has been a Lucene committer since 2004. He has accompanied dozens of commercial projects using Lucene and Solr. Christoph is the author of more than 15 scientific papers, frequently gives presentations on search-related topics, and is responsible for partner training at Intrafind.

Scientific Data Search in the Pharmaceutical Industry with Solr
JEFFREY GUO, CEO | SEMTIFIC SOFTWARE, INC.

In today’s pharmaceutical industry, a tremendous amount of experimental information and scientific knowledge has been locked up or lost in data silos in the form of semi-structured or unstructured data. Out-of-the-box full-text search engines do not understand embedded scientific terms and objects, or their relationships, and so cannot support context-sensitive, relevant searches. This presentation will discuss a successful implementation at a major pharmaceutical company that uses Solr as its enterprise search platform and enhances it with chemistry (molecular entities and reactions) search capabilities. The scope of the document indexing process is expanded to cover embedded chemistry objects and terms of various types, such as common chemical names, corporate IDs, SMILES, and InChI, extracted from documents. This enables scientifically aware search based on a drawn query structure or on chemical terms. Enterprise scientific search strategies and lessons learned will be discussed during the presentation.

Bio: Founder of Semtific Software, Inc., a company that provides products and services that streamline the drug discovery workflow and enterprise search of scientific research data.
Using Lucene’s Test Framework
ROBERT MUIR | LUCID IMAGINATION

The Lucene/Solr community takes testing seriously: we have a suite of over 3500 tests to ensure software quality. Over time we accumulated some useful extensions to JUnit testing, and several people found themselves using our extensions for other projects. We released this “test framework” for the first time in Lucene 3.1, and this talk is a short summary of its features, intended to encourage you to go check it out for yourself. Find out how you can:
• Improve test coverage for custom Lucene components
• Speed up your unit test suite by running tests in parallel
• Find resource leaks and localization- or timezone-sensitive bugs in your application
• Use our extensions to make unit tests easier to write

Bio: Robert Muir, software engineer for Lucid Imagination, is a Lucene/Solr committer and PMC member.

Using Apache Solr and Active Directory to Unify Data Access across Intranet, ERP and Filesystem Cluster
ROBERT WEIßGRAEBER, PROJECT DIRECTOR | LIGHTWERK

Solr is tightly linked into all available data and business intelligence sources in the enterprise: the TYPO3 CMS-based intranet, downloads, forms, handbooks, an Oxaion-based ERP database, and a file system cluster running Microsoft Distributed File System are all indexed, using Tika for full-text content extraction. All data is connected via Active Directory servers to user-based, fine-grained access control lists, which are evaluated by Solr in real time in early-binding mode. A worldwide Solr cluster using different shards gives additional security for worldwide deployment, e.g. by keeping confidential data inside the headquarters’ own data centers.

Bio: Robert Weißgraeber is Project Director at Lightwerk, primarily specializing in designing, planning and executing corporate portals.
Thousands of Indexes in the Cloud
SHANEAL MANEK, LEAD SEARCH ENGINEER | GREPLIN

Indexes at Greplin are strange: instead of one giant index that is searched all the time and updated infrequently, there are thousands of relatively small indexes that are updated much more frequently than they are searched. These unorthodox requirements led to an unorthodox architecture that uses techniques inspired by Zoie and Bobo. We will discuss techniques that allowed us to exploit the inherent shardability and access patterns of our data to build an extremely high-throughput information retrieval architecture. We will also examine some of the challenges and opportunities presented by running Lucene on Amazon’s Elastic Compute Cloud.

Bio: Shaneal Manek is the lead search engineer at Greplin. He was previously the founder and CTO of Signpost.com, which built a geospatial search and recommendation engine on top of Lucene and Lisp.
Intuit’s Live Community
FLOYD MORGAN | INTUIT

TurboTax Live Community is a large-scale web application that uses user contributions and open source technology to help millions of TurboTax users complete their tax returns. Other benefits of Live Community include reduced support calls, highly effective advertising campaigns, usability engineering and, new this year, conversion prediction analytics. I will present how Solr/Lucene powers the many facets of TurboTax Live Community, now and in the future.

Highly Relevant Search Result Ranking for Large Law Enforcement Information Sharing Systems
RONALD MAYER | FORENSIC LOGIC

Law enforcement data has many interesting complexities for search. Cross-agency searches are even more challenging because each agency has its own shorthand. Many different types of similarity between search clauses and documents should influence the ranking of results. For example, a search clause mentioning a “tall suspect” might want to include results with a “6 foot 4 suspect”. Spatial clusters are important, as are temporal patterns. Different fields may be more or less important depending on the type of crime; for example, a victim’s race may matter more than a vehicle’s make in a sex crime but less in an auto theft. Also, documents may be related to each other in various ways that may also affect their ideal search ranking.

Solr’s great flexibility in its analyzers, filters, synonyms, and boosting makes it an excellent tool for such diverse requirements. We’ve contributed a patch to Solr (SOLR-2058) that helped further improve search result ranking for cases where a search for a suspect with a “red baseball cap, black leather jacket” is compared against many documents mentioning red caps, black caps, etc. This presentation will describe how we addressed some of the domain-specific challenges of our data.

Using Solr/Lucene/LWE for eCommerce
GRANT INGERSOLL | LUCID IMAGINATION

If your users can’t find it, they can’t buy it, right? In this talk, Apache Lucene and Solr committer Grant Ingersoll will discuss architecture, techniques and tips for successfully deploying search tools like Lucene, Solr and LucidWorks Enterprise in eCommerce environments.
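The “red baseball cap, black leather jacket” example in the Forensic Logic abstract shows why word order matters: a document describing a black cap and a red jacket contains every query word. One general way to tell such documents apart is to reward query words that also appear next to each other in the document, in the spirit of phrase and shingle boosting. The Python below is a toy illustration of that idea with hypothetical documents; it is not the SOLR-2058 patch or Forensic Logic's ranking.

```python
import re


def tokens(text):
    """Lowercase word tokens; punctuation such as commas is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_overlap(query, doc):
    """Bag-of-words score: how many distinct query words appear in the doc."""
    return len(set(tokens(query)) & set(tokens(doc)))


def adjacent_pairs(words):
    return set(zip(words, words[1:]))


def pair_overlap(query, doc):
    """How many adjacent query word pairs also appear adjacently in the doc."""
    return len(adjacent_pairs(tokens(query)) & adjacent_pairs(tokens(doc)))


query = "red baseball cap, black leather jacket"
docs = {
    "A": "suspect wore a red baseball cap and a black leather jacket",
    "B": "black baseball cap, red leather jacket",
    "C": "red car, black cap",
}
for name, text in docs.items():
    print(name, word_overlap(query, text), pair_overlap(query, text))
```

Documents A and B tie on word overlap (6 each), but A shares four adjacent pairs with the query against B's two, so a pair-based boost ranks the correct description first. A production version would also avoid pairing words across the comma (here `cap black`), since that pair spans two separate clauses.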
Flexible Indexing in Lucene 4.0
UWE SCHINDLER | SD DATASOLUTIONS

Apache Lucene’s next major release, 4.0, will introduce lots of flexibility into indexing, but also fundamental changes to the well-known APIs. It features a new, consistent, 4-dimensional iteration API on top of a low-level, pluggable codec API that gives applications full control over the postings data. Terms are now arbitrary opaque bytes, enabling users to store terms in any encoding (not necessarily UTF-8) natively in the index, e.g. for numeric fields. Currently under development is a higher-performance postings iteration API, enabling interesting codecs based on recent encoding algorithms to work effectively. Several codecs have already been created, including the default “standard” codec, which enables a sizable RAM reduction for searchers, and a “pulsing” codec that inlines postings data directly into the terms dictionary, which provides a solid performance boost for primary key fields. Many new codecs, such as “PFOR”, “FOR”, “AFOR”, and “Simple64”, are under development. In this talk, Uwe presents an overview of all of these exciting changes, as well as several concrete, real-world examples of how applications can tap into these new features.

Transforming the House Hunting Experience: How Solr is Helping Trulia Reshape the Real Estate Industry
ALEXANDER KANARSKY | TRULIA

Trulia is a real estate search company that helps customers find homes for sale or rent and provides them with information to help them make better decisions in the process. It is also a hub for real estate professionals to market their listings, view real estate data and promote their services. The presentation describes how Solr helped Trulia transform the traditional real estate experience and make real estate data accessible and understandable to millions of users. It discusses the approaches we took to achieve this: custom-built distributed index management, indexing integration with Hadoop, and geospatial search enhancements to Solr.
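Codecs such as “FOR” and “PFOR” in the Flexible Indexing abstract build on two classic postings-compression ideas: store the gaps between sorted document ids instead of the ids themselves, and pack a block of small numbers with one fixed bit width chosen for the block (frame of reference). The Python sketch below shows just those two steps with hypothetical helper names; it does not reproduce the on-disk format of any Lucene codec.

```python
def delta_encode(doc_ids):
    """Replace sorted doc ids by the gaps between consecutive ids."""
    gaps, previous = [], 0
    for doc in doc_ids:
        gaps.append(doc - previous)
        previous = doc
    return gaps


def delta_decode(gaps):
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def for_pack(values):
    """Frame-of-reference packing of a non-empty block of non-negative ints.

    Values are stored relative to the block minimum, each using the same
    number of bits: enough for the largest offset.
    """
    base = min(values)
    width = max(1, max(v - base for v in values).bit_length())
    packed = 0
    for i, v in enumerate(values):
        packed |= (v - base) << (i * width)
    return base, width, len(values), packed


def for_unpack(base, width, count, packed):
    mask = (1 << width) - 1
    return [base + ((packed >> (i * width)) & mask) for i in range(count)]


doc_ids = [3, 7, 8, 15, 16, 40, 41, 42]
gaps = delta_encode(doc_ids)  # [3, 4, 1, 7, 1, 24, 1, 1]
base, width, count, packed = for_pack(gaps)
print(f"{count} gaps x {width} bits = {count * width} bits (vs {count * 32} as 32-bit ints)")
```

Here the single gap of 24 forces 5 bits for every value in the block. Patched variants such as PFOR exist to store such rare outliers separately as exceptions so that the common bit width can stay small; that refinement is omitted here.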
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platform
TREY GRAINGER | CAREERBUILDER

For CareerBuilder, a 1% deviation in search relevancy can mean millions of missed job opportunities for our users. When CareerBuilder moved to Solr from an expensive, proprietary search vendor, our top priorities were maintaining the quality of our search results and drastically improving our agility. This talk will describe how we addressed both needs. For search quality, we’ll cover some of our internal studies and resulting methods for dealing with multilingual content across dozens of languages, as well as customizing and experimenting with relevancy calculations. For platform agility, we’ll discuss CareerBuilder’s cloud-like search API framework, which seamlessly handles millions of searches an hour, processes hundreds of millions of documents, and is powered by hundreds of globally distributed servers. Come hear the results of our studies and some best practices for quality and performance. Learn how our framework has led to staggering improvements in both maintainability and technology innovation, allowing us to learn from our content, not just find it.

Handy Installation Tool “Anuenue” for Solr Cluster & Implementation of “Did you mean” Facility for Queries in Japanese
TAKAHIKO ITO | MIXI

mixi is one of the largest social networking services in Japan, providing various communication services for over 14M monthly active users. The latest internal mixi project is to replace the in-house search engine with Apache Solr. This session covers two topics: a simple packaging system for Solr that eases the installation process and daily operations, and the implementation of a “Did you mean” facility for Japanese queries using a log mining tool. These tools have been released as OSS projects.

Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
ANDRZEJ BIALECKI | LUCID IMAGINATION

This talk will explain what click-through events are and how to process them with LucidWorks Enterprise. This innovative technique puts powerful search and relevancy at your fingertips, at a fraction of the time and effort required to program them yourself with native Apache Solr. Andrzej will discuss and present how you can use LucidWorks Enterprise for:
• Click Scoring to automatically configure relevance for the most popular results
• Simplified implementation of auto-complete and “did-you-mean” functionality
• Unsupervised feedback to automatically provide relevance improvement on every query
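One common log-mining approach to “Did you mean” (not necessarily the one used by mixi's tool or by LucidWorks Enterprise) is to suggest a logged query that is spelled similarly but searched far more often. The Python sketch below assumes a hypothetical `query_counts` dictionary of query frequencies mined from search logs and uses plain Levenshtein distance; Japanese-specific handling, which the abstract does not describe, is not shown.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,               # delete ca
                               current[j - 1] + 1,            # insert cb
                               previous[j - 1] + (ca != cb)))  # substitute
        previous = current
    return previous[-1]


def did_you_mean(query, query_counts, max_distance=2, min_ratio=5):
    """Suggest a close, much more frequent logged query, or None."""
    own = query_counts.get(query, 0)
    candidates = [
        (edit_distance(query, cand), -count, cand)
        for cand, count in query_counts.items()
        if cand != query and count >= min_ratio * max(own, 1)
    ]
    # Prefer the smallest edit distance, then the most frequent candidate.
    close = [c for c in candidates if c[0] <= max_distance]
    return min(close)[2] if close else None


# Hypothetical query frequencies mined from a search log.
counts = {"lucene": 900, "lucnee": 12, "solr": 1200, "sorl": 30, "solar": 80}
for q in ("lucnee", "sorl", "solr"):
    print(q, "->", did_you_mean(q, counts))
```

The frequency ratio keeps a popular, correctly spelled query such as `solr` from being “corrected” to a rarer neighbour, while still letting a real misspelling like `sorl` map to it.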
Using Solr to find the Right Person for the Right Job
LAURA KANG | THELADDERS

In this talk, we’ll describe how TheLadders.com uses Lucene/Solr to instantly recommend candidates to a recruiter when he or she posts a job on the recruiter site. Our matching algorithm scores candidates from our job seeker site based on the criteria and description of jobs and on job seekers’ resume and profile data. This helps recruiters quickly identify candidates who are right for the job and increases the chance of our job seekers getting hired. The talk covers an overview of our Solr architecture and a description of our matching algorithm. We’ll also discuss criteria for evaluating the algorithm, including an overview of our testing sessions and their format. Finally, we’ll demo the feature so you can see how it works in practice.

Using Solr For Enabling Highly Customized Sitewide Navigation
SHANTANU DEO | AT&T

The organization needed to enable a very customizable form of global navigation for the various types of users (based on their profile and other factors). This would normally have involved complex logic to figure out the appropriate set of links to show a customer, and would have been a maintenance nightmare. Instead we approached it as a search problem. Coupled with a novel encoding scheme, we were able to solve the problem simply by searching on the customer’s profile groups and returning a coherent global navigation, using Solr to index the data. This has resulted in a solution that is very simple to understand and maintain and that will stand us in good stead in the future. The presentation describes how Solr was used to implement a real-world application.

Building Specialized Industry Applications Using Solr, And Migration From FAST ESP
RAHUL AGARWALLA | UCHIDA SPECTRUM INC.

Uchida Spectrum, Inc. (USI) is a leader in the Japanese search market. USI provides SMART/InSight, a search application used by many Fortune 500 companies for specialized industry applications like R&D and quality assurance for manufacturing, claims and customer management, etc. SMART/InSight was originally based on Microsoft FAST. This talk will review how SMART/InSight has migrated from FAST ESP to LucidWorks Enterprise, and how SMART/InSight incorporates virtual data integration, enterprise search, and the ability for users to navigate diverse data sources in a unified way, analyze data more easily, and personalize results. Several real-world use cases will be profiled and demonstrated.
The Seven Deadly Sins of Solr
JAY HILL | LUCID IMAGINATION

Sloth. Greed. Pride. Lust. Envy. Gluttony. Wrath. Getting started with Solr can present some pitfalls and temptations, often turning into a trial-and-error process. (Confess: some or all of these may have been part of your development project.) Based on a broad swath of experience across Solr implementations running in some of the largest Fortune 500 companies as well as some of the smallest start-ups, this talk will cover common mistakes made by newbies and even veteran developers, and how to avoid them. You’ll learn how best to face the challenges that can occur either when starting out with a new Solr implementation or when keeping up with the latest improvements and changes.

Advanced Search and Analytics in 20 Minutes
MARK DAVIS | KITENGA

Kitenga’s ZettaVox and ZettaSearch products support Solr and Lucene ecosystems both at the ingestion point and for the search user. In this talk, I will show how ZettaVox, our professional content mining platform on Hadoop, can be used to index content and rich metadata into a LucidWorks Enterprise installation. Being built on Hadoop, ZettaVox scales up by scaling out. I will then create an end-user search and analytics experience using our ZettaSearch solution, which leverages the faceted metadata to enhance information discovery and analysis. All in about 20 minutes.

Building SaaS Solutions for Online Media Using Apache Solr
ALBERTO MIJARES | CANOO ENGINEERING AG

SaaS applications offer the advantages of remote web deployment, which can be used instantly by potentially any consumer on the Internet, and of the cost reduction that a web-based deployment provides. In this talk the speaker explains the architecture of an innovative SaaS solution built for the Axel Springer media group (Switzerland). This application remotely extracts the content of articles from multiple online newspapers, analyzes and classifies them, determines which articles are the most similar to a given one, and integrates the results back into the article to provide the user with a “related articles” feature. The core components of the analysis process are language-specific tools (used to filter out superfluous terms) and semantic knowledge bases (like Wikipedia, used to enrich the indexed information with new context-specific terms or to disambiguate the extracted terms). On a more technical level, the speaker will explain the criteria for selecting the emerging enterprise search framework Apache Solr as the platform, and how it drastically reduced the development effort required.
  • 33. Solr Performance: Key Innovations
YONIK SEELEY | LUCID IMAGINATION
Recent developments in Solr/Lucene have made significant contributions to distributed search processing, scalability, and throughput. In this talk, Yonik Seeley, creator of Solr, will survey key performance strategies for building search applications with Solr, and review innovations included in Solr 3.1 as well as forthcoming development work in Solr 4.0 and beyond.
Solr and Lucene at Etsy
GREGG DONOVAN | ETSY
Etsy is using Solr and Lucene to serve queries at a rate of more than 8 billion per year (and growing). In this case study, we will describe how Etsy has integrated Solr/Lucene into our continuous deployment infrastructure (see: http://codeascraft.etsy.com/2010/05/20/quantum-of-deployment/), allowing Solr configuration, Java-based indexers, and query parsing logic to go from passing tests to production code in minutes. We’ll also discuss how we’re leveraging Solr’s new Geo-search to power both local item search and GeoIP-personalized location autosuggest. We’ll also share how we’ve extended Solr, adding personalized faceting and filtering as well as multi-currency sorting and filtering that accounts for real-time currency fluctuation (contributed in SOLR-2202). Code for both of these features will be open-sourced/contributed. We will share our real-time monitoring techniques, including how we track Solr replication, query, and GC times in Ganglia. Finally, we’ll discuss how we’ve used Hadoop-based user analytics to improve relevance and power data-driven spelling corrections, autocomplete suggestions, and related searches.
  • 34. Lucene @ Yelp
SUDARSHAN GAIKAIWARI | YELP
This talk describes how Yelp uses Lucene to provide search services. It includes:
- Statistics of Yelp search usage
- Overview of Yelp search architecture: Yelp uses different services to provide searches for different types of data. Some are based on Lucene and some on Solr.
- A deeper dive into business and review search, the most important search service at Yelp. We will cover:
  - Yelp’s implementation of a micro-sharded architecture and its differences from Katta
  - Yelp extensions to Lucene to implement features such as filters, and a performance comparison with Solr/Bobo
  - Yelp’s implementation of index replication
  - Various tricks used at Yelp to make the service faster
Using Solr Cloud to Tame an Index Explosion
JON GIFFORD | LOGGLY
We have hundreds of customers, each of whom may have dozens of shards. To manage this explosion of indexes, I’ll describe how we’re using Solr Cloud to manage every index, from creation, through migration from box to box, and finally destruction. I’ll describe some of the performance issues we had to deal with, especially with ZooKeeper.
Lots of Facets, Fast
ANNE VELING | BEYONDTREES
We created a web application for a well-known US newspaper: a maps-like zooming application on top of the 60,000 newspapers since 1850, using Solr over the 28,000,000 articles to create an interactive heatmap. The out-of-the-box faceting solution was optimized by an order of magnitude using domain knowledge, which allowed us to create a great visual way of exploring trends in historical newspapers.
  • 35. CPython Embedded in Solr - Search Solution for Python Lovers With the Speed of Native Java
ROMAN CHYLA | CERN
SPIRES is the biggest bibliographic database for High Energy Physics, arXiv is the biggest repository of full-text papers in High Energy Physics, and INSPIRE is the biggest digital library that merges the two. We must work with result sets bigger than 1 million for citation-related queries, and our partners from Astrophysics work with sets of 6 million; however, INSPIRE is written in Python. So how do we move result sets of several million between the two systems fast? How do we take advantage of our special NLP processing pipeline written in Python? How do we join them? We do not use Jython. We do not use pipes. We do not embed Solr inside INSPIRE. We embed INSPIRE into Solr! The talk shows the benefits and challenges of this surprisingly elegant solution.
  • 36. Rahul Agarwalla
HEAD OF INTERNATIONAL BUSINESS, UCHIDA SPECTRUM INC
Rahul Agarwalla heads international business for Uchida Spectrum Inc, Japan. Previously he built and exited two content/technology ventures, including Matrix Information, the pioneer of digital content syndication in India. He has over 14 years of experience with various search technologies such as Verity, FAST ESP and Solr/Lucene.
Boris Aleksandrovsky
SEARCH ARCHITECT, YAMMER
Boris Aleksandrovsky works for Yammer, the Enterprise Social Network company, where they are trying to bring the benefits of social media to enterprises by creating discoverable knowledge bases. He specializes in solving search, machine learning and data analysis problems at large scale by employing distributed and scalable software architectures. Boris has almost completed his PhD in Computer Science and Neuroscience at the University of California at Irvine.
Josh Berkus
CORE TEAM, POSTGRESQL
Josh Berkus has been working as a database application consultant for 8 years. Josh primarily builds applications for the legal and HR industries and does performance tuning. He was also head of Sun Microsystems’ PostgreSQL support staff for 2 years and helped launch BI startup Greenplum.
  • 37. Ed Bueche
DISTINGUISHED ENGINEER, EMC
Ed Bueche is an EMC Distinguished Engineer and one of the architects of the Documentum xPlore search engine (part of EMC’s Information Intelligence Group). He has been with Documentum/EMC for 12+ years and has more than 23 years of experience in performance/development in the industry, including at companies like AT&T Bell Labs and Sybase. At Documentum he worked to improve performance and scalability for all previous Documentum full-text integrations (Verity and FAST). Ed has been a regular speaker for over 11 years at the Documentum worldwide user conferences (in both America and Europe) as well as at EMC World.
Andrzej Bialecki
TECHNICAL ADVISOR, LUCID IMAGINATION
Andrzej Bialecki, Apache Lucene PMC member, also serves as project lead for Nutch and as a committer in the Lucene-java, Nutch and Hadoop projects. He has broad expertise across domains as diverse as information retrieval, systems architecture, embedded systems, networking and business process/e-commerce modeling. He’s also the author of the popular Luke index inspection utility.
Roman Chyla
RESEARCH FELLOW, CERN
Roman Chyla is a research fellow at CERN, Switzerland. He works in the INSPIRE team to build the biggest digital library for High Energy Physics. He is a developer and also an information specialist, and has presented at four conferences, two of them international: Knihovny soucasnosti 2006, CASLIN 2007, IKI 2009, CASLIN 2009.
Mark Davis
CTO, KITENGA, INC
Mark Davis is Founder and CTO of Kitenga, Inc. Previously he served as Principal Engineer at Xerox PARC spin-out InXight (acquired by Business Objects), where he designed their enterprise product suite, and at Microsoft as a Program Manager for enterprise search and SharePoint. Mark spent nearly a decade as an academic researcher in the defense/intelligence community, specializing in cross-language search and computational linguistics.
He has extensive speaking experience in professional and academic forums.
  • 38. Shantanu Deo
TECHNICAL DIRECTOR, AT&T
Shantanu Deo is a Technical Director at AT&T, in charge of their e-commerce CMS team. He is a patent holder and has in the past presented and published his work at the INFORMS conference on Optimization. His interests include web technologies, optimization and, lately, mobile web communications. Shantanu holds a BS in Computer Engineering from the University of Poona, India, and MS degrees in Operations Research and Computer Science from Louisiana State University.
Esteban Donato
LEAD ARCHITECT, TRAVELOCITY
Esteban Donato works as Lead Architect for Travelocity. He has worked as a Java Developer, Technical Leader and Architect for the last 10 years in different industries. Esteban has been working with Solr and Lucene technology for the last 2 years, implementing it in different projects. Esteban has given talks about Solr and data mining at Travelocity and at universities in Buenos Aires, Argentina.
Gregg Donovan
TECHNICAL LEAD SEARCH, ETSY
Gregg Donovan is currently Technical Lead, Search at Etsy.com, the world’s most vibrant handmade marketplace. He has worked extensively with Solr and Lucene at Etsy and, previously, at TheLadders.com. At Etsy, located in Brooklyn, NY, he leads the search engineering team as it tackles the challenges presented by a growing international marketplace with half a million different sellers in 150 different countries selling tens of millions of items.
Stephen Dunn
HEAD OF TECHNOLOGY STRATEGY, GUARDIAN NEWS AND MEDIA UK
Stephen Dunn is Head of Technology Strategy for Guardian News and Media in the UK. He joined The Guardian in 1999, where he helps guide the technology strategy for its multiple-award-winning network of web sites and services. His professional interests include open web technologies, digital identity and security.
Prior to joining the Guardian, Stephen completed his PhD at the Center for Computational Neuroscience and Robotics at Sussex University, UK.
  • 39. Sudarshan Gaikaiwari
SOFTWARE ENGINEER, YELP INC
Sudarshan Gaikaiwari is a software engineer working on Yelp’s search team. Prior to Yelp he worked on various information retrieval technologies in Symantec’s Data Loss Prevention group.
Jon Gifford
CO-FOUNDER, LOGGLY
Jon Gifford is the CTO and co-founder of Loggly, where he spends all day coercing Solr into playing nice with the cloud and with high-volume real-time data streams. An active user and frequent hacker of Lucene since 2004, he’s happy to let Solr take care of some of the hard work for a change. Prior to Loggly, he spent more than a decade working on search systems at Minimal Loop, Scout Labs, Technorati and LookSmart. He is concerned that his near-complete web anonymity is under threat.
Otis Gospodnetic
FOUNDER, SEMATEXT
Otis Gospodnetic is a coauthor of Lucene in Action (1st and 2nd editions). He has been involved with Lucene since 2000 and Solr since 2006. He is also a member of the Nutch and Mahout development teams, as well as the Lucene Project Management Committee. Otis is an Apache Software Foundation member and the founder of Sematext, a software development and consulting company focused on search and analytics using open-source technologies like Lucene, Solr, Nutch, Hadoop, HBase, Flume, and more.
  • 40. Trey Grainger
SEARCH TECHNOLOGY DEVELOPMENT TEAM LEAD, CAREERBUILDER
Trey Grainger leads the Search Technology Development group at CareerBuilder.com. He introduced Solr to CareerBuilder and led the successful conversion away from the Microsoft FAST ESP platform. He has been with CareerBuilder for 4 years, and his search experience includes handling multilingual content across dozens of markets/languages, genetic-algorithm and user-group-based relevancy tuning, geospatial search and validation, and work on customized payload scoring models, data mining, clustering, and recommendations. He is responsible for architecting CareerBuilder’s cloud-like search API, which exposes search as a simple, dynamic, and powerful generic service abstracted away from a large, globally distributed architecture. Trey is also the founder and Chief Architect of Celiaccess.com, a gluten-free search engine and networking site.
Eric Gries
PRESIDENT AND CEO, LUCID IMAGINATION
Eric Gries joined Lucid Imagination as President and CEO after spending more than 20 years in executive leadership roles, where he built high-growth technology-based businesses. Prior to joining the company, Eric was an Executive-in-Residence at Granite Ventures. Eric has served as CEO, general manager and vice president for companies in application development, systems management, networking, financial services and hardware systems, in both the U.S. and Europe. Prior to joining Granite Ventures, Eric led XACCT, a pioneering network mediation market leader, as its president and CEO. XACCT was acquired by Amdocs in 2004, at which time Eric joined Amdocs’ executive team as Senior Vice President.
Earlier in his career, Eric served as general manager of Compuware’s Network and Systems Management division, and held product management, marketing, sales and engineering positions at companies such as ACI, Cullinet Software and DEC.
Erik Hatcher
TECHNICAL STAFF, LUCID IMAGINATION
Erik Hatcher is the co-author of two books, Lucene in Action and Java Development with Ant. Erik has been an active member of the Lucene community: a leading Lucene and Solr committer, member of the Lucene Project Management Committee, member of the Apache Software Foundation, and a frequent invited speaker at various industry events. Erik earned his B.S. in Computer Science from the University of Virginia, Charlottesville, VA.
  • 41. Jay Hill
SENIOR SEARCH ARCHITECT, LUCID IMAGINATION
Jay Hill has been building enterprise search applications since 2003, and has worked extensively with Autonomy IDOL, Lucene, and Solr. He is a certified Solr trainer and is lead author of Lucid Imagination’s Solr training courses.
Grant Ingersoll
CO-FOUNDER, LUCID IMAGINATION
Grant Ingersoll is a founder and member of the technical staff at Lucid Imagination. Grant’s programming interests include information retrieval, machine learning, text categorization, and extraction. Grant is a regularly featured speaker at ApacheCon and other industry events. He has been an active member of the Lucene community: a Lucene and Solr committer, co-founder of the Apache Mahout machine learning project, chairman of the Lucene Project Management Committee (PMC), and a Vice President at the Apache Software Foundation. He is also the co-author of Taming Text (Manning, forthcoming), covering open source tools for natural language processing. Grant’s prior experience includes work at the Center for Natural Language Processing at Syracuse University in natural language processing and information retrieval. Grant earned his B.S. from Amherst College in Math and Computer Science and his M.S. in Computer Science from Syracuse University, NY.
Takahiko Ito
SOFTWARE ENGINEER, MIXI, INC
Takahiko Ito received his Ph.D. in Engineering at Nara Institute of Science and Technology, specializing in graph mining. He was a specialist in Japanese and Asian language processing at Fast Search and Transfer prior to joining mixi, Inc. as an R&D engineer. Selected papers include:
- Masashi Shimbo, Takahiko Ito, Daichi Mochihashi, Yuji Matsumoto. On the Properties of von Neumann Kernels for Link Analysis. Machine Learning, 75:37-67, 2009.
- Takahiko Ito, Masashi Shimbo, Taku Kudo, Yuji Matsumoto. Application of Kernels to Link Analysis. The Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2005.
  • 42. Alexander Kanarsky
SENIOR SOFTWARE ENGINEER, TRULIA
Alexander Kanarsky is responsible for managing day-to-day operations of Trulia’s indexing and search infrastructure and oversees search-related development there. Prior to Trulia he was a member of the core development team for Autonomy’s Digital Safe, the world’s largest private archive of electronic documents.
Laura Kang
TECHNICAL LEAD, SEARCH AND MATCHING, THELADDERS
Laura Kang holds a B.A. in computer science, mathematics, and economics from the University of California at Berkeley, and an M.S. and Ph.D. in computational mechanism design from Harvard University. She has presented her work at several conferences, including the International Conference on Electronic Commerce and the ACM Conference on Electronic Commerce. Before joining TheLadders, she was a manager at a NYC technology startup. At TheLadders, she focuses on search and matching algorithms.
Sudhakara Karegowdra
PRINCIPAL ARCHITECT, TRAVELOCITY
Sudhakara Karegowdra works as Principal Architect for Travelocity. He has worked as a Java Developer, Technical Leader and Architect for the last 14 years in different industries, 10 of those in the travel industry. Sudhakara has been working with Solr and Lucene technology for the last 3 years, implementing it in different projects. Sudhakara has given talks about Solr at Travelocity.
  • 43. Steve Kearns
ROSETTE PRODUCT MANAGER
Steve is the product manager for the Rosette Platform and is also the subject matter expert for the international compliance market within Basis Technology. Prior to Basis Technology, Steve worked at BBN Technologies on the Broadcast and Web Monitoring Systems, which capture and extract open-source intelligence from live television and internet news websites. He has experience in information visualization and distributed systems architecture, and received his MS in Information Technology and BS in Computer Information Systems from Bentley University. He also spoke at Apache Lucene EuroCon 2010 in Prague, on the topic of Building Multilingual Search Based Applications.
Marc Krellenstein
FOUNDER, LUCID IMAGINATION
Marc Krellenstein is the founder of Lucid Imagination. Marc has 30 years’ experience in the computer industry, focusing for the last 20 years on information retrieval technology and applications. Marc was previously Chief Technology Officer and Vice President for Search and Discovery Technology at Elsevier, the scientific, technical and medical publishing division of Reed-Elsevier. Prior to Elsevier, Marc was Chief Technology Officer and Senior Vice President of Engineering at Northern Light Technology, where he was the founding technologist and led the design and development of the Northern Light search service, including designing the data model, query interpretation, relevancy ranking, automatic document classification and patented technology for document clustering. Marc has an A.B. in philosophy from Cornell; he earned his M.S. in computer science from the University of Wisconsin at Madison and a Ph.D. in psychology (cognitive science) from the New School for Social Research, NY.
Ronald Mayer
CTO, FORENSIC LOGIC, INC.
Ronald Mayer has spent his career with technology start-ups in a number of fields ranging from medical devices to digital video to law enforcement software. Ron has also been involved in open source for decades, with code that has been incorporated into the LAME MP3 library, the PostgreSQL database, and the PostGIS geospatial extension. His most recent speaking engagement was a presentation on a broader aspect of this system to the SD Forum’s Emerging Tech SIG, titled “Fighting Crime: Information Chokepoints & New Software Solutions”.
  • 44. Alberto Mijares
CANOO ENGINEERING AG
Alberto Mijares is a software engineer with more than 10 years of experience. He is a Scrum Master and an agile practitioner. He has an extensive background in web technologies and Java, having participated in past W3C activities related to the Semantic Web. His usual role is either leading projects or designing architectures for web applications. He started working at Canoo Engineering AG (Switzerland) in 2008 and speaks Spanish, English and German. He has a degree in Computer Engineering. He has given talks at Java and web-related conferences and user groups in Switzerland and Spain.
Floyd Morgan
INTUIT
Floyd is a Principal Software Engineer who works in the Central Technology Organization at Intuit, makers of TurboTax, QuickBooks, Quicken and Intuit Payroll, to name a few. Floyd has developed core features of the flagship TurboTax product line and recently co-founded Intuit’s newest socially driven technology, Live Community. Under Floyd’s direction, Live Community has gone from a small project to a widely adopted platform used by most Intuit products and services. Floyd earned his B.S. in Computer Science from San Diego State University.
Stephen O’Grady
CO-FOUNDER AND PRINCIPAL ANALYST, REDMONK
Stephen O’Grady is the co-founder and Principal Analyst of RedMonk, a boutique industry analyst firm focused on developers. Founded in 2002, RedMonk provides strategic advisory services to some of the most successful technology firms in the world. Stephen’s focus is on infrastructure software such as programming languages, operating systems and databases, with a special focus on open source and big data. Before setting up RedMonk, Stephen worked as an analyst at Illuminata. Prior to joining Illuminata, Stephen served in various senior capacities with large systems integration firms like Keane and consultancies like Blue Hammock.
Stephen is regularly cited in publications such as the New York Times, NPR, the Boston Globe, and the Wall Street Journal, and is a popular speaker and moderator on the conference circuit; his advice and opinions are well respected throughout the industry.
  • 45. Timothy Potter
SENIOR ENGINEER, NATIONAL RENEWABLE ENERGY LABORATORY (NREL)
Timothy is a highly skilled technologist with over 13 years of experience delivering innovative software solutions that encompass a wide range of technologies and business sectors. Currently, Mr. Potter is a Senior Engineer at the National Renewable Energy Laboratory (NREL), where he leads the effort to build a large-scale distributed platform for handling smart-grid-related energy data using Hadoop and NoSQL technologies. Prior to NREL, Timothy was the CTO of Viyya Technologies, where he developed a large-scale content recommendation system based on Solr, Mahout, and Hadoop running in the Amazon cloud. As a Senior Software Engineer for the WebLogic Platform at BEA Systems, he was the chief inventor of several US patents that helped revolutionize J2EE-based enterprise application integration. His technical blog (http://thelabdude.blogspot.com/) is highly respected as a guide for other developers in the open-source Java community. Mr. Potter has a BS in Mathematics and a BA in Economics with honors (summa cum laude) from the University of Colorado.
Daniel Potzinger
AOE MEDIA GMBH
Daniel Potzinger has more than 10 years of web development experience under his belt. He is a skillful hand at developing clean solutions, with a particular love of elegant, easily maintained and reusable code. Daniel is always open to new projects and development methods, such as Agile software development. Over the last few years since joining AOE media, Daniel has played “midwife” to more than 60 enterprise CMS projects for such renowned clients as congstar, Cisco WebEx, VMware, Panasonic and the like: taking care of client requirements, directing the development and launching the results.
  • 46. Craig Rees
SENSIS
Craig Rees has been at Sensis since 2008. Craig heads up the content and search groups, which manage the search capabilities, platforms and operational teams that support the Yellow Pages® and White Pages® businesses. Craig is the author of the Sensis Content Strategy and the technology owner of the Sensis Business Search API. Prior to joining Sensis, Craig worked in digital strategy development and implementation roles in the United Kingdom with companies including BBC, Sky and Argos.
Ramon Resma
ARCHITECT, TRAVELOCITY
Ramon Resma works as an Architect for Travelocity Mobile. He has over 22 years of experience in the travel industry and has worked in technical leadership roles for Travelocity Architecture, Sabre Airline Solutions Architecture, and American Airlines. Ramon has been working with Solr and Lucene technology for the last 2 years. Recently he worked on implementing Solr functions for serving location-based content on travel mobile applications.
Yonik Seeley
CREATOR OF APACHE SOLR & CO-FOUNDER, LUCID IMAGINATION
Yonik Seeley is the creator of Solr. He is an expert in distributed search systems architecture and performance. Yonik has been a prolific Lucene/Solr committer, a member of the Lucene PMC, and a member of the Apache Software Foundation. Yonik’s work experience includes CNET Networks, BEA and Telcordia. He earned his M.S. in Computer Science from Stanford University.
  • 47. Uwe Schindler
MANAGING DIRECTOR, SD DATASOLUTIONS GMBH
Uwe is a committer and PMC member of Apache Lucene and Solr. His main focus is on development of Lucene Java. He implemented fast numerical search and maintains the new attribute-based text analysis API. He studied Physics at the University of Erlangen-Nuremberg and works as managing director of SD DataSolutions GmbH in Bremen, Germany, a company that provides consulting and support for Apache Lucene and Solr. A primary customer of his company is “PANGAEA – Publishing Network for Geoscientific & Environmental Data”, where he implemented the portal’s geospatial retrieval functions with Lucene Java. Uwe has given talks about Lucene at various international conferences such as the previous Lucene Revolution, ApacheCon EU/US, Lucene Eurocon, Berlin Buzzwords and various local meetups.
Tyler Tate
HEAD OF USER EXPERIENCE, TWIGKIT
Tyler Tate leads user experience at TwigKit, where he has helped governments, not-for-profits, and blue-chip corporations build superb search experiences. Tyler also organises the Enterprise Search London meetup and has written for a number of publications including UX Magazine, Johnny Holland, Smashing Magazine, and UX Booth. Tyler lives in London with his wife Ruth and son Galileo, and you can keep up with him on Twitter.
Joshua Tuberville
SEARCH ARCHITECT
Joshua Tuberville is a Software Architect with eHarmony.com. With over 15 years of Internet technology experience, he specializes in high-scale online architectures. He has been with eHarmony for the past 9 years and previously worked with Sony and Disney, as well as several startups. He regularly speaks at user groups and conferences. His recent focus has been leading the architecture of jazzed.com, a new dating site, which uses Solr to allow people to find highly relevant profiles.
  • 48. Anne Veling
SEARCH ARCHITECT, BEYONDTREES
After an M.Sc. in Computer Science/Artificial Intelligence, Anne worked for several years in the search engine industry, designing highly scalable knowledge extraction, clustering and visualization modules for search applications. Currently self-employed, Anne helps global companies create web applications that involve search. Anne is also busy doing performance troubleshooting, and gives Lucene and Solr workshops.
Dawid Weiss
ASSOCIATE PROFESSOR, INSTITUTE OF COMPUTING SCIENCE, POZNAN UNIVERSITY OF TECHNOLOGY, POLAND
Dawid Weiss combines an academic and an industrial background: he is an associate professor at the Institute of Computing Science of Poznan University of Technology in Poland (PhD in Information Retrieval) and co-owns Carrot Search, a company that provides commercial services revolving around text processing, text mining and text clustering. In his spare time Dawid contributes to several open source projects, including Carrot2.org, reads books and passionately plays basketball with a bunch of his old friends. He lives in Poznan, Poland with his wife and two children.
Simon Willnauer
SOLR / LUCENE COMMITTER, APACHE LUCENE PMC
Simon is a Lucene core committer and PMC member. During the last couple of years he has worked on the design and implementation of scalable software systems and search infrastructure. He studied Computer Science at the University of Applied Sciences Berlin. Currently, he works as a consultant for Apache Solr, Lucene Java and Hadoop and is a co-organizer of the “Berlin Buzzwords” conference on scalability, held in June 2011 in Berlin, Germany.
  • 49. Olaf Zschiedrich
HEAD OF TECHNOLOGY, EBAY KLEINANZEIGEN
Olaf leads development for eBay Kleinanzeigen, Germany’s number one classified ads site. Before that he was part of the core architecture team at mobile.international GmbH. He also worked for Siemens TS, where he was involved in building the Customer Information System for the MTA New York City Transit subway system. He has a passion for high-traffic web applications, search technologies and agile development methods, and is a believer in open source.
  • 50. Hotel Information
ADDRESS
Hyatt Regency San Francisco Airport
1333 Bayshore Highway, Burlingame, California, USA 94010
Tel: +1 650 347 1234
Fax: +1 650 696 2669
DIRECTIONS
FROM SAN FRANCISCO INTERNATIONAL AIRPORT (2 MILES): Take 101 South toward San Jose. Exit Millbrae Ave. Turn left on Millbrae Ave. Turn right at the second stoplight onto Bayshore Hwy. Proceed through 4 stoplights. Our Burlingame California hotel is on the right-hand side.
FROM OAKLAND AIRPORT (APPROXIMATELY 30 MILES) AND POINTS EAST: Take I-880 South toward San Jose. Merge onto CA-92 W toward San Mateo Br. Merge onto US-101 N toward San Francisco to the Broadway Exit. Take the Airport Blvd ramp toward Bayshore Blvd, then turn left onto Bayshore Hwy to our Burlingame lodging.
FROM SAN JOSE AIRPORT (APPROXIMATELY 30 MILES) AND POINTS SOUTH: Take 101 North to the Broadway Exit. Take the Airport Blvd ramp toward Bayshore Blvd, then turn left onto Bayshore Hwy to the hotel.
  • 51. HOTEL MAPS: MEETING ROOMS
Hyatt Regency San Francisco Airport
DIRECTIONS From San Francisco Int’l Airport (2 miles): Take 101 South. Exit Millbrae Ave. East. Turn right at stoplight onto Bayshore Hwy. Proceed through 4 stoplights. Hotel is on right.
  • 52. MAP OF HOTEL AND AIRPORT (map: Hyatt Regency San Francisco Airport)
  • 53. PUBLIC TRANSPORTATION (BART)
  • 54. SAN FRANCISCO DOWNTOWN
  • 55. Cloud-scale enterprise search begins here
Salesforce.com is the enterprise cloud computing leader and the world’s 4th fastest-growing company. Our Search Team is experienced, with deep architecture expertise. We’re dedicated to delivering the fastest, most reliable cloud-scale enterprise search. If you share our passion, come introduce yourself.
www.salesforce.com
  • 56. www.documill.com
