Cloudera Search Webinar: Big Data Search, Bigger Insights

Cloudera Search brings full-text, interactive search and scalable indexing to data in HDFS and Apache HBase. Powered by and adding to Apache Solr, Cloudera Search fully integrates with CDH to bring scale and reliability for next-generation open source search -- Big Data search.


Transcript

  • 1. Cloudera Search: Embracing Apache Solr into Cloudera's Platform for Big Data. Eva Andreasson, Sr. Product Manager, Cloudera; Steven Noels, Co-founder and SVP of Products, NGDATA
  • 2. Who is Cloudera? The Leader in Big Data Management: deliver a revolutionary data management platform powered by Apache Hadoop; world's leading commercial vendor of Apache Hadoop; enable organizations to improve operational efficiency and Ask Bigger Questions of all their data. What the Enterprise Requires: only 100% open source Hadoop-based platform with both batch and real-time processing engines, enterprise-ready with native high availability; suite of system and data management software; comprehensive support and consulting services; broadest Hadoop training and certification programs. Extensive Partner Ecosystem: over 600 partners across hardware, software and services. Customers & Users Across Industries: more production deployments than all other vendors combined.
  • 3. Cloudera Enterprise: a revolutionary solution powered by Apache Hadoop. Brings storage & compute together; works with every type of data; changes the economics of data management. Stages: Ingest, Store, Explore, Process, Analyze, Serve. Components: CDH, Cloudera Manager, Cloudera Navigator, Cloudera Support.
  • 4. About NGDATA: A Next Generation Customer Intelligence Company. NGDATA is the next generation Customer Intelligence company that enables actionable customer insights, personalized product offers and intimate customer experience with a unique combination of interactive Big Data management and machine learning technologies in one integrated solution. Vision & Expertise: Business Expertise; Enterprise Architectures; Big Data Technology; Machine Learning, Algorithms, Analytics; Customer Intelligence. Solution (Lily): Customer Database; Enterprise Data; Reference Data; Customer Data; Customer Engagement; Governance and Risk Management; Insights, Trends and Analysis.
  • 5. Agenda: Why Search? What is Cloudera Search? Using Cloudera Search. Learn more.
  • 6. Why Search?
  • 7. Cloudera's Enterprise Strategy: An Integrated Part of the Hadoop System. One pool of data; one security framework; one set of system resources; one management interface.
  • 8. Search Simplifies Interaction: Explore, Navigate, Correlate. Experts know MapReduce. Savvy people know SQL. Everyone knows Search.
  • 9. Benefits of Search. Improved Big Data ROI: an interactive experience without technical knowledge; single data set for multiple computing frameworks. Faster time to insight: exploratory analysis, esp. unstructured data; broad range of indexing options to accommodate needs. Cost efficiency: single scalable platform, no incremental investment; no need for separate systems, storage. Solid foundations and reliability: Solr in production environments for years; Hadoop-powered reliability and scalability.
  • 10. What is Cloudera Search?
  • 11. Cloudera Search. Interactive search for Hadoop: full-text and faceted navigation; batch, near real-time, and on-demand indexing. Apache Solr integrated with CDH: established, mature search with vibrant community; separate runtime like MapReduce, Impala; incorporated as part of the Hadoop ecosystem. Open Source: 100% Apache, 100% Solr; standard Solr APIs.
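Since slide 11 says Cloudera Search exposes the standard Solr APIs with full-text and faceted navigation, a query could look roughly like the sketch below. It only builds a request URL for Solr's `/select` handler using the standard `q`, `rows`, `wt`, `facet` and `facet.field` parameters. The host, the collection name `logs`, and the field names `hostname` and `severity` are assumptions made up for illustration; they do not come from the webinar.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, collection, text, facet_fields, rows=10):
    """Build a Solr /select URL for a full-text query with field facets."""
    params = [
        ("q", text),          # full-text query string
        ("rows", str(rows)),  # number of documents to return
        ("wt", "json"),       # response format
        ("facet", "true"),    # turn faceting on
    ]
    # One facet.field parameter per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection and field names, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr", "logs", "error timeout", ["hostname", "severity"]
)
print(url)
```

Sending this URL with any HTTP client would return matching documents plus per-field facet counts, which is what a faceted drill-down UI consumes.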
  • 12. Scalable and Robust Index Storage. Solr and HDFS: scalable, cost-efficient index storage; higher availability; search and process data in one platform. Diagram labels: Querying API, Indexing API, Solr, SolrCloud, ZooKeeper, Extraction, Mapping, Lucene, HDFS.
  • 13. Near Real Time Indexing at Ingest. Solr and Flume: data ingest at scale; flexible extraction and mapping; indexing at data ingest; document-level ACL. Diagram labels: Log File, Flume Agent, Indexer; Other Log File, Flume Agent, Indexer; HDFS.
  • 14. Streamlined Extraction and Mapping. Cloudera Morphlines: simple and flexible data transformation; reusable across multiple index workloads; over time, extend and re-use across platform workloads. Diagram labels: syslog, Flume Agent, Solr sink, Command: readLine, Command: grok, Command: loadSolr, Solr; data shown as Event, Record, Record, Record, Document.
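The command chain on slide 14 (readLine, then grok, then loadSolr) can be pictured as a pipeline of small record transformers. The sketch below imitates that idea in plain Python. It is not the Morphlines API or its configuration syntax: the function names, the syslog regular expression, and the list standing in for Solr are all assumptions made for illustration.

```python
import re

# Assumed syslog layout: "<timestamp> <hostname> <program>[<pid>]: <message>"
SYSLOG_PATTERN = re.compile(
    r"(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<program>[\w\-/]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)"
)


def read_line(event_body):
    """Stand-in for readLine: emit one record per non-empty line of the event."""
    for line in event_body.splitlines():
        if line.strip():
            yield {"message": line}


def grok(records, pattern=SYSLOG_PATTERN):
    """Stand-in for grok: extract named fields, dropping records that do not match."""
    for record in records:
        match = pattern.match(record["message"])
        if match:
            yield {k: v for k, v in match.groupdict().items() if v is not None}


def load_solr(records, index):
    """Stand-in for loadSolr: append each document to an in-memory list."""
    for record in records:
        index.append(record)


def run_pipeline(event_body, index):
    """Chain the three commands, passing records from one to the next."""
    load_solr(grok(read_line(event_body)), index)


index = []
run_pipeline(
    "Feb  4 10:46:14 syslog sshd[607]: listening on 0.0.0.0 port 22.\n"
    "not a syslog line\n",
    index,
)
print(index)
```

Because each command only consumes and produces records, the same chain could in principle be reused by different ingest paths, which is the reuse the slide points at.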
  • 15. Scalable Batch Indexing. Solr and MapReduce: flexible, scalable batch indexing; start serving new indices with no downtime; on-demand indexing, cost-efficient re-indexing. Diagram labels: Files, Indexer, Index shard, Solr server (shown twice), HDFS.
  • 16. Scalable Batch Indexing. Mappers (three shown): parse input into indexable document. Arbitrary reducing steps of indexing and merging. End-Reducer (shard 1): index document, producing Index shard 1. End-Reducer (shard 2): index document, producing Index shard 2.
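The flow on slide 16 (mappers parse input into documents, end-reducers each build one index shard) can be simulated in memory as below. This is a toy model and not the Cloudera batch indexing tool: the tab-separated input format, the CRC32-based shard assignment, and the tiny inverted index are assumptions chosen to keep the example self-contained. The intermediate merging steps mentioned on the slide are left out.

```python
import zlib
from collections import defaultdict


def map_parse(raw_line):
    """Mapper: parse an 'id<TAB>text' line into an indexable document."""
    doc_id, _, text = raw_line.partition("\t")
    return {"id": doc_id, "text": text}


def shard_for(doc_id, num_shards):
    """Partitioner: assign a document to a shard with a stable hash of its id."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def reduce_index(docs):
    """End-reducer: build a term -> set of document ids index for one shard."""
    inverted = defaultdict(set)
    for doc in docs:
        for term in doc["text"].lower().split():
            inverted[term].add(doc["id"])
    return dict(inverted)


def batch_index(raw_lines, num_shards=2):
    """Run the map, partition and reduce phases; return one index per shard."""
    partitions = defaultdict(list)
    for line in raw_lines:
        doc = map_parse(line)
        partitions[shard_for(doc["id"], num_shards)].append(doc)
    return {shard: reduce_index(partitions[shard]) for shard in range(num_shards)}


lines = ["doc1\thadoop search", "doc2\tsolr search", "doc3\thadoop solr"]
shards = batch_index(lines, num_shards=2)
print(shards)
```

A query for a term would then be fanned out to every shard and the per-shard hits merged, since each document lives in exactly one shard.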
  • 17. Searchable Real-Time Data. HBase (planet-sized tabular data, immediate access & updates) + Search (fast & flexible information discovery) = Big Data data management. Diagram labels: interactive load, HBase, HDFS, Triggers on updates, Indexer(s), Indexing, Solr server (shown five times).
  • 18. Searchable Real-Time Data: HBase & Search. HBase SEP Triggers & Indexer: HBase replication mechanism for reliable indexing; light-weight, zero impact on write performance; easy to set up & integrate; flexible, configuration-based mapping & content extraction. Many use cases: indexes near-real-time HBase updates into Solr; fielded search on HBase columns; faceted search; query by example; datacube; secondary indexes.
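To make the "configuration-based mapping" bullet on slide 18 concrete, the sketch below shows one way a row-update event could be turned into a search document from a column-to-field mapping. It is a hypothetical illustration only: it is not the HBase indexer's configuration format or API, and the column names, field names and `InMemoryIndex` class are invented for the example.

```python
# Hypothetical mapping from "family:qualifier" columns to search field names.
FIELD_MAPPING = {
    "info:firstname": "first_name",
    "info:lastname": "last_name",
    "info:email": "email",
}


def row_update_to_document(row_key, changed_cells, mapping=FIELD_MAPPING):
    """Map the changed cells of one row to a document keyed by the row key."""
    doc = {"id": row_key}
    for column, value in changed_cells.items():
        field = mapping.get(column)
        if field is not None:  # columns without a mapping are not indexed
            doc[field] = value
    return doc


class InMemoryIndex:
    """Stand-in for a Solr collection: stores the latest document per row key."""

    def __init__(self):
        self.docs = {}

    def on_row_update(self, row_key, changed_cells):
        """Handle one update event by merging the mapped fields into the document."""
        doc = row_update_to_document(row_key, changed_cells)
        self.docs.setdefault(row_key, {"id": row_key}).update(doc)


index = InMemoryIndex()
index.on_row_update("user42", {"info:firstname": "Ada", "stats:logins": "17"})
index.on_row_update("user42", {"info:email": "ada@example.com"})
print(index.docs["user42"])
```

Keeping the mapping as data rather than code is what lets the indexed fields change without touching the write path, which fits the slide's claim of zero impact on write performance.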
  • 19. Simple, Customizable Search Interface. Hue: simple UI; navigated, faceted drill down; customizable display; full text search, standard Solr API and query language.
  • 20. Simplified Management. Cloudera Manager: install, configure, deploy Solr services on the cluster; unified management and monitoring; resource management.
  • 21. Using Cloudera Search
  • 22. Skybox: scalable, efficient image search for analysis and process improvement. Advanced parallel image processing on images stored in HDFS. Before: difficult to interactively evaluate image quality and correlate with satellite logs. Now: index images and satellite logs at acquisition and on demand, interactively introspect image quality.
  • 23. Explorys Medical: event, exploration, and data correlation to meet SLAs. "Hadoop has been Explorys' center of gravity for data management since the company's inception. The addition of Search to Cloudera's platform expands its usability by supporting more workloads and reducing data movement between infrastructure systems. Deploying Cloudera Search supports Explorys' mission to help healthcare providers deliver better, more cost efficient care through fast, flexible data analysis." -- Michael Onders, SVP & CTO, Explorys
  • 24. Patterns and Predictions: proactive healthcare for returning military veterans. Identify patterns in social media and perform analytics on term usage to improve suicide predictive capability. Before: social media data sets too large; traditional enterprise search. Now: near real-time correlation of medical records, notes, social media; access for doctors and non-tech staff.
  • 25. Questions. Ask on the Q&A tab. Recording will be available at cloudera.com. After webinar, inquire at: info@cloudera.com. Presenters' contact info: eva@cloudera.com, stevenn@ngdata.com. Thank you for attending! Download Cloudera Search: cloudera.com/downloads. Learn more about Cloudera Search, powered by Solr: cloudera.com/search. Learn more about NGDATA and Lily: www.ngdata.com