Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

Presented by Mark Davis, CTO, Kitenga. See the conference video: http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012

Kitenga's Analyst system uses the LucidWorks Enterprise REST API in several ways, including configuring collections and managing the Solr schema. As part of the Kitenga platform, ZettaSearch Designer lets end users drag and drop search widgets to assemble a specialized search interface. To design search UIs that meet their needs, users must understand which schema fields populate a given collection. ZettaSearch Designer interrogates the Solr infrastructure through the Lucid REST API to provide an overview of the available metadata, which makes it easy to build rich, facetted search experiences around the metadata indexed into the collection. In this implementation overview, I will describe the design of ZettaSearch Designer, how it interacts with big data technologies like Hadoop as part of the indexing pipeline, and how it uses the LucidWorks API to let users discover the metadata needed to create novel search user interfaces on the fly.
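
The schema interrogation and facet-building flow described above can be sketched roughly as follows. This is a minimal illustration, not Kitenga's implementation: the field listing, its attribute names (`name`, `indexed`, `facet`), and the example field names are hypothetical placeholders standing in for whatever the REST API actually returns. Only the Solr query parameters (`q`, `rows`, `facet`, `facet.field`) are standard Solr.

```python
import json
from urllib.parse import urlencode

# Hypothetical field listing, standing in for the result of a
# schema-interrogation call. Attribute names are assumptions.
SAMPLE_FIELDS_JSON = """
[
  {"name": "id", "indexed": true, "facet": false},
  {"name": "company", "indexed": true, "facet": true},
  {"name": "memory_type", "indexed": true, "facet": true},
  {"name": "body", "indexed": true, "facet": false}
]
"""


def facetable_fields(fields_json):
    """Return the names of fields flagged as indexed and facet-enabled."""
    fields = json.loads(fields_json)
    return [f["name"] for f in fields if f.get("indexed") and f.get("facet")]


def facet_query_string(query, facet_fields):
    """Build a Solr query string that asks for facet counts and no documents."""
    params = [("q", query), ("rows", "0"), ("facet", "true")]
    # One facet.field parameter per field the UI designer may bind a widget to.
    params += [("facet.field", name) for name in facet_fields]
    return urlencode(params)


if __name__ == "__main__":
    names = facetable_fields(SAMPLE_FIELDS_JSON)
    print(names)
    print(facet_query_string("*:*", names))
```

A designer tool along these lines would offer only the returned field names as candidates for facet widgets, so the user never has to read the Solr schema directly.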

Published in: Technology, Education

  1. Kitenga: reinventing information. Mark Davis, Founder/CTO
  2. Enabling Big Data Search via the Lucid ReST API
  3. Big Data
     • Enormous transactional data
     • Enormous unstructured information
     • Too big for databases
     • New tools are needed
  4. Decimal unit     Value   Binary usage   Binary unit       Value
     kilobyte (kB)    10^3    2^10           kibibyte (KiB)    2^10
     megabyte (MB)    10^6    2^20           mebibyte (MiB)    2^20
     gigabyte (GB)    10^9    2^30           gibibyte (GiB)    2^30
     terabyte (TB)    10^12   2^40           tebibyte (TiB)    2^40
     petabyte (PB)    10^15   2^50           pebibyte (PiB)    2^50
     exabyte (EB)     10^18   2^60           exbibyte (EiB)    2^60
     zettabyte (ZB)   10^21   2^70           zebibyte (ZiB)    2^70
     yottabyte (YB)   10^24   2^80           yobibyte (YiB)    2^80
     Volume, Velocity, Variety
  5. Indexing Challenges
     • Complex, varied data
     • Compute-intensive metadata generation
     • Schema and collection management
     Gather Resources: crawl; crack formats
     Extract Metadata: named entities; categories; machine learning; semantic analysis
     Index: schema definition; collection management
  6. Search Experience Challenges
     • Complex, varied data
     • Resource discovery
     • Facetted search experience management
     Initial Query: keyword guesses; category guidance
     Refine Query: analytic tools; facetted guidance
     Evaluate Relevance: read KWIC; read metadata; read document
  7. The Solution
     • Enable fast metadata generation: Hadoop, Mahout, GPUs
     • Manage and control collections and schema: LucidWorks Enterprise API
  8. SQL, Search, RDBMS, Documents, Transactional Data, Text Classification, BI Tools, Taxonomies, Ontologies
  9. Machine-Learning, Finite State Transducer (×3), Parts-of-Speech Tagging, Lemmatization, Tokenization
  10. Resource Integration, Facet Browsing, Facet Charting, Spellcheck, Autosuggest, Query Language, Indexing, Metadata Extraction
  11. • Start to POC in a week
      • Open source intelligence problems
  12. ZettaSearch
      GOAL: Be more competitive
      SOURCES: Patents, PR announcements, legal documents, whitepapers, crawled websites
      ANALYSIS: Extract named entities and relationships, classify and label; visually understand relationships and trends
      ACTION: Change R&D priorities and improve marketing approaches
      Diagram labels: Facetted Search and Analytics; relationships, metadata, entities, data; ZettaVox; Sources
  13. • Understand IP among competitors
      • Assist legal team with litigation
      • Custom search experience
      • Custom extractors:
        - Electronic parts
        - Memory types
        - Flash memory
      5/15/12
  14. Company       Documents   Size
      Dell            102,508   9 GB
      EMC             303,678   14 GB
      Huawei           11,912   890 MB
      Kingston          2,534   134 MB
      Lenovo            8,305   542 MB
      NEC               3,900   252 MB
      Nokia           174,681   22 GB
      Panasonic         5,804   473 MB
      RIM                 181   8 MB
      Sharp USA        31,918   4.9 GB
      Total           645,421   60.2 GB
  15. ZettaSearch
      GOAL: Discover new drugs, detect side-effects, speed R&D
      SOURCES: Published research reports, patents, adverse effects databases, genomics and proteomics databases
      ANALYSIS: Extract named entities and relationships, classify and label; visually discover trends and relationships
      ACTION: Change R&D priorities
      Diagram labels: Facetted Search and Analytics; relationships, pathways, sequences, entities, data; ZettaVox; Sources
  16. • Lousy search (Google Search Appliance)
      • Internal regulators can't find by accession number
      • Custom extractors:
        - Accession number
        - Ontology of active ingredients
        - Drug names
      © 2012 Kitenga Proprietary
  17. ZettaSearch
      GOAL: Build "second screen experiences"
      SOURCES: Wikipedia, IMDB, blogs
      ANALYSIS: Extract named entities and relationships, preserve existing structural metadata
      ACTION: Enable new media experiences
      Diagram labels: Facetted Search and Analytics; relationships, metadata, entities, data; ZettaVox; Sources
  18. • Crawlers on Hadoop
      • Document format crackers on Hadoop
      • Extractors on Hadoop
      • Filters on Hadoop
      • HTTP documents to Solr sharded cluster
      • Intermediary files remain on HDFS for reprocessing
  19. • Missing piece of the puzzle
      • Addresses the impedance mismatch between Big Data technologies and Solr search
      • Manage collections
      • Manage schema
  20. • Create collections
      • Delete collections
      • Update collection properties
      • Create schema
      • Modify schema
  21. • Schema interrogation
      • Schema binding to user experience
      • Facetted search
      • Embedded analytics
  22. • Big Data search and analytics have many challenges:
        - Volume of data
        - Variety of data
        - Velocity of data
        - Extracting structure from unstructured information
      • Hadoop processing enables each of these aspects
      • Controlling indexing and search is enabled by the Lucid Imagination search API
      • We can enable complex user interactions with Big Data on a self-serve basis
  23. [Architecture diagram; layout not recoverable. Labels:]
      Tiers: Analyst Browser; Enterprise servers; Cloud services
      Components: Tomcat App Server; Amazon S3; Tomcat Web Services; ZettaVox Services; Enterprise Manager; Cloud Manager; ZettaVox Author RIA; XML + JSON; ReST JSON; Search Indexing; GPU Services Manager; Hadoop Services Manager; GPU MR Service Manager; Hadoop Server (Name node); Hadoop Server (Job Tracker); GPU; Hadoop Task Manager; Quantum4D RDBMS; Entity Extraction; Mahout; Crawling
  24. [Architecture diagram; layout not recoverable. Labels:]
      Tiers: Analyst Browser; Enterprise servers
      Search Indexing:
        • Get collection information
        • Create new collection
        • Create fields
        • Delete fields
        • Edit fields
      Components: ZettaVox Author RIA; ReST JSON; Hadoop Server (Name node); Hadoop Server (Job Tracker); Hadoop Task Manager; Entity Extraction; Mahout; Crawling; Indexing
  25. Questions?
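
The collection and field operations listed on slides 20 and 24 (get collection information, create and delete collections, create, edit, and delete fields) could be wrapped as small request builders, roughly as sketched below. The base URL, the paths, and the payload keys are assumptions made for illustration and are not confirmed LucidWorks endpoints; the sketch only constructs the requests and does not send them.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumed base URL; a placeholder, not a confirmed LucidWorks address.
BASE_URL = "http://localhost:8888/api"


def build_request(method, path, payload=None):
    """Build (but do not send) a JSON REST request."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


def _seg(name):
    """Percent-encode a single path segment."""
    return quote(name, safe="")


# All paths and payload keys below are hypothetical.
def get_collection_info(collection):
    return build_request("GET", f"/collections/{_seg(collection)}")


def create_collection(collection):
    return build_request("POST", "/collections", {"name": collection})


def delete_collection(collection):
    return build_request("DELETE", f"/collections/{_seg(collection)}")


def create_field(collection, field_name, field_type):
    payload = {"name": field_name, "field_type": field_type}
    return build_request("POST", f"/collections/{_seg(collection)}/fields", payload)


def edit_field(collection, field_name, changes):
    path = f"/collections/{_seg(collection)}/fields/{_seg(field_name)}"
    return build_request("PUT", path, changes)


def delete_field(collection, field_name):
    path = f"/collections/{_seg(collection)}/fields/{_seg(field_name)}"
    return build_request("DELETE", path)


if __name__ == "__main__":
    for req in (
        create_collection("patents"),
        create_field("patents", "memory_type", "string"),
        edit_field("patents", "memory_type", {"facet": True}),
        delete_field("patents", "memory_type"),
        delete_collection("patents"),
    ):
        print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the calls easy to inspect or log before they reach the search server; a request built this way can be passed to `urllib.request.urlopen` once the real endpoints are known.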