Discover HDP 2.1: Apache Solr for Hadoop Search

Apache Solr is the open source platform for searching data stored in Hadoop. Solr powers search on many of the world's largest Internet sites, enabling powerful full-text search and near real-time indexing. Whether users search for tabular, text, geo-location or sensor data in Hadoop, they find it quickly with Apache Solr. Hortonworks Data Platform 2.1 includes Apache Solr.

In this deck from their 30-minute webinar, Rohit Bakhshi, Hortonworks product manager, and Paul Codding, Hortonworks solution engineer, describe how Solr works within HDP's YARN-based architecture.

Published in: Software, Technology

Discover HDP 2.1: Apache Solr for Hadoop Search

  1. © Hortonworks Inc. 2014. Discover HDP 2.1: Apache Solr for Hadoop Search. Hortonworks. We do Hadoop.
  2. Speakers: Justin Sears, Hortonworks Product Marketing Manager; Rohit Bakhshi, Hortonworks Senior Product Manager & PM for Apache Hadoop & Apache Solr in Hortonworks Data Platform; Paul Codding, Hortonworks Solution Engineer, focused on customer success with Apache Storm & Apache Solr
  3. Agenda • Overview of Apache Solr and Hadoop Search • Hadoop Search Demo • Q & A
  4. A Modern Data Architecture. Sources: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data. Data System: Repositories (RDBMS, EDW, MPP) and Enterprise Hadoop (Governance & Integration, Security, Operations, Data Access, Data Management). Applications: Business Analytics, Custom Applications, Packaged Applications. Dev & Data Tools: Build & Test. Operations Tools: Provision, Manage & Monitor.
  5. HDP 2.1: Enterprise Hadoop (Hortonworks Data Platform). Governance & Integration: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS). Data Access: Batch (MapReduce), Script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), Others (In-Memory Analytics, ISV engines). Data Management: YARN: Data Operating System; HDFS (Hadoop Distributed File System), nodes 1 to N. Security: Authentication, Authorization, Accounting, Data Protection; Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox. Operations: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie).
  6. HDP 2.1: Enterprise Hadoop (Hortonworks Data Platform). Governance & Integration: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS). Data Access: Batch (MapReduce), Script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Others (In-Memory Analytics, ISV engines), Search (Solr). Data Management: YARN: Data Operating System; HDFS (Hadoop Distributed File System), nodes 1 to N. Security: Authentication, Authorization, Accounting, Data Protection; Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox. Operations: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie).
  7. Agenda • Overview • Features • Q & A
  8. Search: Overview. Expanded Data Access Interfaces to Hadoop: Batch (MapReduce), Interactive (Tez), Streaming (Storm), Online (HBase, Accumulo), Search (Solr). YARN: Cluster Resource Management. HDFS: Redundant, Reliable Storage.
  9. Search: Overview. Apache Solr: open source enterprise search for Hadoop and HDP • Open architecture: In the community, for the community • Simple, powerful UI for advanced search applications • High performance indexing & sub-second search times over billions of documents • Deep Integration Roadmap with HDP. LucidWorks: Hortonworks partner for search • Enterprise support provided as partnership with LucidWorks • 9 committers total (7 PMC) for Apache Solr
  10. Agenda • Overview • Features • Q & A
  11. Open Source Components for HDP Search. Comprehensive enterprise search using open source technologies: • High-Performance Indexing • Powerful, Accurate & Efficient Search Algorithms • Ranked & Field searching • Flexible faceting, highlighting, joins & result grouping • Pluggable ranking models • Advanced Full-Text Search Capabilities • Optimized for High Volume Web Traffic • Standards Based Open Interfaces: XML, JSON and HTTP • Comprehensive HTML Administration Interfaces • Server statistics exposed over JMX for monitoring • Linearly scalable
  12. Scalable Indexing of Data in HDFS • Ingest: MapReduce job (CSV; Microsoft Office files; Grok (log data); Zip; Solr XML; Seq files; WARC) • Processing: Apache Pig (write your own Pig scripts to index content; Pig for preprocessing and joining; output the resulting datasets to Solr). Diagram labels: HDFS, MapReduce or Pig Job, Solr, Raw Documents, Lucene Indexes.
  13. Search: Reference Architecture. Diagram labels: HDFS (Hadoop Distributed File System), MapReduce Indexing Job.
  14. HDP Search Demo: Paul Codding
  15. Agenda • Overview • Features • Q & A
  16. Learn More About Hadoop Search: Hortonworks.com/hadoop/solr/. Register for the remaining 2 Discover HDP 2.1 Webinars: Hortonworks.com/webinars. Next Webinar: Apache Storm for Stream Data Processing in Hadoop, Thursday, June 19, 10am Pacific
  17. Thank you!
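
The deck lists Solr XML and CSV among the ingest formats, and names faceting, highlighting, and XML/JSON over HTTP among the search features. As a rough illustration only, the Python sketch below turns a few CSV rows into a Solr XML `<add>` message and builds a `/select` query URL with faceting and highlighting enabled. It is a single-process sketch, not the MapReduce or Pig indexing job the slides describe. The host, the collection name `logs`, the field names, and both helper functions are assumptions made for the example and do not come from the webinar.

```python
import csv
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical Solr endpoint and collection; not taken from the slides.
SOLR_BASE = "http://localhost:8983/solr/logs"


def csv_to_solr_xml(csv_text):
    """Convert CSV text (header row plus data rows) into a Solr XML <add> message."""
    add = ET.Element("add")
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = ET.SubElement(add, "doc")
        for name, value in row.items():
            # Each CSV column becomes one <field name="..."> element.
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


def build_search_url(query, facet_field, highlight_field, rows=10):
    """Build a /select URL that asks for JSON output, facets, and highlighting."""
    params = [
        ("q", query),
        ("wt", "json"),            # JSON response format
        ("rows", str(rows)),
        ("facet", "true"),         # enable faceting
        ("facet.field", facet_field),
        ("hl", "true"),            # enable highlighting
        ("hl.fl", highlight_field),
    ]
    return SOLR_BASE + "/select?" + urlencode(params)


# Hypothetical sample data with made-up field names.
sample_csv = "id,host,message\n1,web01,disk full\n2,web02,login failed\n"
solr_xml = csv_to_solr_xml(sample_csv)
search_url = build_search_url("message:disk", "host", "message")

print(solr_xml)
print(search_url)
```

In a real deployment the XML message would be posted to the collection's update handler and the URL fetched over HTTP; both steps are left out here so the sketch runs without a Solr server.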