Clouds, Search or HLT The 'forecast'?
Benson Margulies
Executive Vice President and Chief Technology Officer

Basis Technology – Human Language Technology Conference 2012   1




Meteorology - or - Why Clouds

•  Lie on the grass and look up at the clouds
      •  Everyone sees something different
•  Computerized Clouds are no different
      •  Applications Always Available
      •  Data Always Available
      •  Tools for Processing Big Data

Big Data and Clouds =~ Hadoop

•  It's not just a matter of size
•  Hadoop ...
      o  Takes in structured data sets
      o  Optimizes stateless, batch processes
      o  Moves computation to data
•  All of which is great if that's what you have
•  The world is more complicated than that

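The "stateless, batch" and "moves computation to data" points can be sketched in a few lines of plain Python. This is only an illustration of the map/reduce shape, not Hadoop's API: the names `map_record`, `reduce_counts`, and `run_batch`, and the idea of a partition as a list of text records, are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain


def map_record(record):
    """Stateless map step: looks at one record and nothing else."""
    return [(word.lower(), 1) for word in record.split()]


def reduce_counts(pairs):
    """Reduce step: sums the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_batch(partitions):
    """Maps each partition independently, then reduces the combined output.

    In a real cluster the map step would run on the node that already
    holds the partition; here the partitions are just in-memory lists.
    """
    mapped = [
        [pair for record in partition for pair in map_record(record)]
        for partition in partitions
    ]
    return reduce_counts(chain.from_iterable(mapped))


# Hypothetical data: two partitions of text records.
partitions = [["clouds and search"], ["search and HLT", "clouds"]]
counts = run_batch(partitions)
```

Because `map_record` needs no shared state, the partitions can be processed in any order or in parallel, which is what makes this shape easy to scale as a batch job.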
What it Doesn't Do So Easily

•  On-the-fly (non-batch) processing
•  Stateful, non-local, processing
•  For example, consider a search engine
      o  All about online: a document arrives, users want to find it.
      o  All about global state: relevancy involves global data across the whole index.

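To see why relevancy is "global state", consider inverse document frequency (IDF), a common ingredient of relevance scoring. The sketch below uses one smoothed IDF variant chosen for illustration; it is an assumption, not the formula of any particular search engine, and the documents are hypothetical sets of terms.

```python
import math


def idf(term, docs):
    """Smoothed inverse document frequency over the whole collection."""
    doc_freq = sum(1 for doc in docs if term in doc)
    return math.log((1 + len(docs)) / (1 + doc_freq)) + 1.0


# Hypothetical index: each document is a set of terms.
docs = [{"cloud", "search"}, {"cloud", "hadoop"}]
before = idf("search", docs)

# One new document arrives and the weight of "search" changes for
# every document in the index, not just for the new one.
docs.append({"search", "solr"})
after = idf("search", docs)
```

A single arriving document shifts a statistic that every query depends on, which is awkward to express as an independent, stateless batch step.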
More on Search-in-a-Cloud

•  Good News: 'conventional' technologies scale to very large indices.
      o  Solr
      o  SolrCloud
      o  Elastic Search
      o  ...
•  How? Shards.
      o  'hash' to split docs
      o  queries go everywhere

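The two shard bullets ("'hash' to split docs", "queries go everywhere") can be sketched as a toy in-memory index. This is a minimal illustration, not how Solr, SolrCloud, or Elasticsearch implement routing; the class `ShardedIndex`, the use of MD5 as the hash, and the whitespace tokenizer are all assumptions made for the example.

```python
import hashlib


class ShardedIndex:
    """Toy index: documents are routed by hash, queries fan out to all shards."""

    def __init__(self, num_shards):
        # Each shard maps doc_id -> list of lowercased tokens.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, doc_id):
        # A stable hash of the document id picks exactly one shard.
        digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def add(self, doc_id, text):
        self.shards[self._shard_for(doc_id)][doc_id] = text.lower().split()

    def search(self, term):
        # The query goes to every shard; the partial results are merged.
        term = term.lower()
        hits = []
        for shard in self.shards:
            hits.extend(doc_id for doc_id, tokens in shard.items() if term in tokens)
        return sorted(hits)


index = ShardedIndex(num_shards=3)
index.add("doc-1", "Clouds and search")
index.add("doc-2", "Hadoop batch processing")
index.add("doc-3", "Search in a cloud")
results = index.search("search")
```

Indexing touches one shard per document, while every query touches all of them, so query cost grows with the number of shards even though each shard stays small.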
Search-in-a-Cloud less good news

•  Alternatives are still:
      o  Limited
      o  Research
      o  or both
•  Solandra
      o  Scaling via Cassandra
      o  'just another sharded solution'
      o  Just the thing if you like Cassandra
•  or Accumulo
      o  So far, very basic inverted index
      o  better things coming

Other HLT tasks ...

•  'Extraction' is 'straightforward'
•  Text comes in, entities or relationships come out.
•  Results end up in graph DB or bigtable or ...
•  Scale via Hadoop or whatever
•  The Challenge of Mixing and Matching
•  But ... what if you want a feedback loop?

Interoperation

•  Lots of focus on applications
      o  e.g. Ozone Widgets
•  Not so much on backend processes
•  What good is 'data everywhere' if:
      o  you can't deploy processing to exploit it?
      o  you can't fit together pieces of the puzzle?
•  A stovepipe in a cloud is still........
•  A stovepipe

Harder Unstructured Problems

•  Imagine you wanted to cluster ...
•  New items show up
•  Need to find 'best' existing cluster
      o  It could be 'anywhere'
•  Need to update to reflect each new item
•  (If you're wondering what we're clustering ...)

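The clustering steps above (a new item arrives, find the 'best' existing cluster, update it) can be sketched as a simple online clusterer. The slides do not say which algorithm is meant, so everything here is an assumption for illustration: the class `OnlineClusterer`, Euclidean distance to a centroid as the notion of 'best', and a fixed distance threshold for opening a new cluster.

```python
import math


class OnlineClusterer:
    """Assigns each arriving point to the nearest centroid, or opens a new cluster."""

    def __init__(self, threshold):
        self.threshold = threshold  # max distance for joining an existing cluster
        self.centroids = []         # one centroid (list of floats) per cluster
        self.counts = []            # number of points absorbed by each cluster

    def add(self, point):
        # Finding the 'best' cluster means comparing against all of them:
        # it could be anywhere, which is the non-local part of the problem.
        best, best_dist = None, float("inf")
        for i, centroid in enumerate(self.centroids):
            dist = math.dist(point, centroid)
            if dist < best_dist:
                best, best_dist = i, dist

        if best is None or best_dist > self.threshold:
            self.centroids.append(list(point))
            self.counts.append(1)
            return len(self.centroids) - 1

        # Update shared state: move the centroid toward the new point (running mean).
        self.counts[best] += 1
        n = self.counts[best]
        self.centroids[best] = [
            c + (p - c) / n for c, p in zip(self.centroids[best], point)
        ]
        return best


clusterer = OnlineClusterer(threshold=1.0)
first = clusterer.add((0.0, 0.0))
second = clusterer.add((0.2, 0.0))
third = clusterer.add((5.0, 5.0))
```

Each call both reads and writes global state, so the work cannot be split into independent per-item batch steps without some coordination between nodes.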
Rosette Concrete Examples

•  Straight Search
      o  Rosette Solr Plugins work all the same
      o  SolrCloud hashes/shards
      o  Rosette runs on the target node
•  Extraction and similar processes
      o  Same story, using Update Request Processor

Rosette and Hadoop

•  Stateless APIs lead to simple implementation
•  Non-code resources lead to some issues
•  Stateful processes (e.g. RNI) ... back to Solr

A Lightning Introduction To Clouds & HLT - Human Language Technology Conference
