SlideShare a Scribd company logo
SOLR Under the Hood
(Our Experience)
Sumit Vadhera
Senior Manager, S&P Global Market Intelligence
#Activate18 #ActivateSearch
Agenda
• A little bit about myself
• A Little bit about S&P Global
• How we use SOLR
• SOLR based search Challenges
• Our Journey so far to Cloud
• Next steps
• Q&A
A little bit about myself
• Sumit Vadhera (Senior Manger Database Engineering)
• Big Data Solution Architect for around 12+ Years
• Certified Experience ranging from different RDBMS to NoSQL & Big Data
Technologies
• Barely knew about SOLR search until I joined S&P Global
• Now manages Big Data & NoSQL Solutions(build &design architecture)
including query platform
A little bit about S&P Global
• The Market Intelligence platform delivers deep industry data across a broad range of
sectors to create cutting-edge insights The Market Intelligence platform digs deeper to
deliver solutions that are sector-specific, data-rich, and hyper-targeted for your evolving
business needs.
• Our Global Coverage currently Includes:
 56,000+ banks
 58,000+ asset management companies
 11,000+ specialty finance companies
 18,000+ investment banks and broker dealers
 25,000+ insurance companies
 90,000+ real estate companies
 240,000+ tech, media & telecommunication companies
 26,000+ oil & gas companies
 19,000+ electric, natural gas & water utilities
 30,000+ mining & exploration companies
Search Data capabilities
How we use SOLR
• Primary use case:
 Data is more than our business, it’s our passion. Some of the datasets we provide.
 Financials
 Estimates
 Ownership
 Key developments
 Private company data
 Transactions
 Professionals
 Corporate Actions
 Events and Transcripts
• Use SOLR to power some of our critical datasets and use a lot of custom code too
How we use SOLR cont..
• Universal Search engine for our platform powered on SOLR provides
 Text (keyword) & relevancy based search capabilities to our users
 Page Combinations allows our users to type in the name of all possible CIQ pages rather than navigating all of the links.
For example, a user can simply type “IBM Key Stats” in the Search box and immediately navigate to the exact page desired.
 Autosuggest feature for different objects
 Faceting search
 Type ahead search (Filter based text)
 Advanced Search
 Speech to text transcripts
• Indexing-Querying client interacting with SOLR exposes different datasets indexed from
various sources including data pipeline and exposes it to users
• Multiple SOLR Clusters(hybrid) with Terabytes of data and utilize hybrid sharding techniques
including application and cloud based
• Leveraging Lucidworks Customer Support extensively
Typical Platform Page Search 1
Typical Platform Page Search 2
SOLR Based Search Challenges(legacy)..
~40-50 million docs ingestion rate &1-2 million docs per month & transaction rate of approx. 300-350
per min. Average queries hitting per week of 5 million.
• Performance challenges(bottlenecks to overall query traffic)
• Timeouts on platform applications due to complex queries choking entire clusters and creating
bottlenecks
• Relevance performance
• Indexing lags causing near real time data lags on platform. Manual exception handling.
• Fragmentation inside SOLR Cores were a primary factor
• Optimization Downtime
• Analyzing & extracting SOLR log queries stored in RDBMS.
• Re-indexing process
• GC issues & customized code and customized indexing solution
• Security and product bugs
• Single point of failure on Master-Slave
• Document exceptions(tika parser)
What we did..
• Extensive GC Tuning
• Extensive JVM Tuning
• IO Tuning(Trying out LOCAL disks)
• Query tuning not just limited to
 Move non-scoring queries to the filter cache and improve the use of the field Value Cache – with
date descending sorts
 Caching of time range queries by decreasing granularity from seconds to day to speed auto warm
times
 Changing scoring algorithms(custom) & use of edismax parser to support multi language(foreign)
 Cleaning off date range & phrase queries
• Turning off term vectors and switching to doc values
• Addition of more searchers(horizontal scalability)
• Automation of optimization & recycling SOLR more frequently during off hours
Storage
A 5 Megabyte hard drive from 1956
Being loaded into a plane
Cost: More than USD$ 100,000
What has changed and is changing fast
1985 Cost 2018 Cost
Storage is (nearly) free
1985 Cost 2018 Cost
Processing power doubles each year
1 TB 1 GHz
The significant problems we face today cannot be solved at the same level of thinking
We were at when we created them.
- Albert Einstein
Our Journey so far to cloud..
Today we utilize SOLR latest cloud architecture with hybrid cloud infrastructure
Our Journey so far to cloud cont..
Key benefits we see as of today..
• No single point of failure
• Increased availability (HA) and reduced TAT
• Significant performance gains(query) and improved relevancy for page searches(scale searching).
• Improvements to indexing and decreased incremental lags(scale indexing).
• Banana dashboards to identify bottlenecks
• Leverage fusion for security and auditing (authorization/authentication)
• Indexing pipelines with auto detection of failures & NEAR REAL TIME data on platform
• Type ahead, Facet, Search supporting highlighting ,recent &related terms custom searches
• Moving off SOLR logs data from async sonic message queue to new piplelines integrating with ES and Kibana.
• Improved Searched through multiple filters
• Quicker alerts to setup on our variety of searches
• Support natural language search, screening & mappings
• Improved platform search serving screening questions, quick navigation to individual workflows, surfacing pages
and documents.
Next steps..
Search, now Data Science
• Further improving relevancy of search results, the presentation of our information as a
whole really, makes our platform more essential to our clients
• Data science as a whole continues creating models that feeds to improve relevancy.
• Continue leverage new features & enhancements & Support of LW SOLR Cloud
• Create Scalable, Extensible, and Transparent data pipelines
• Expand data glue to query Lucene
• Continue leveraging and expanding our analytical search capabilities
• Use further machine learning with SOLR to process rule-based tasks like data extraction
and cleaning.
• Meta data driven models based on search
• Use ML/AI in search
aa
We are hiring…
https://www.spglobal.com/en/careers/
Other DB & NoSQL/Big Data technologies we use…
• SOLR
• ELK
• Kafka
• Hadoop
• MySQL
• Oracle
• MSSQL
• Cassandra
• PostgreSQL
• Dynamo DB
• Redshift
Many More…..
Thank you!
Sumit Vadhera
Senior Manager, S&P Global
Database Engineering (Architecture)
Email:- sumitvadhera@spglobal.com
Linked IN:- https://www.linkedin.com/in/sumit-vadhera-993059162
#Activate18 #ActivateSearch

More Related Content

What's hot

SQL Server 2017 Enhancements You Need To Know
SQL Server 2017 Enhancements You Need To KnowSQL Server 2017 Enhancements You Need To Know
SQL Server 2017 Enhancements You Need To Know
Quest
 
Power BI: Tips and Tricks
Power BI: Tips and TricksPower BI: Tips and Tricks
Power BI: Tips and Tricks
GlobalLogic Ukraine
 
Tableau & MongoDB: Visual Analytics at the Speed of Thought
Tableau & MongoDB: Visual Analytics at the Speed of ThoughtTableau & MongoDB: Visual Analytics at the Speed of Thought
Tableau & MongoDB: Visual Analytics at the Speed of Thought
MongoDB
 
Top 5 Things to Know About Integrating MongoDB into Your Data Warehouse
Top 5 Things to Know About Integrating MongoDB into Your Data WarehouseTop 5 Things to Know About Integrating MongoDB into Your Data Warehouse
Top 5 Things to Know About Integrating MongoDB into Your Data WarehouseMongoDB
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
Rakesh Jayaram
 
Business Track: How MongoDB Helps Telefonia Digital Accelerate Time to Market
Business Track: How MongoDB Helps Telefonia Digital Accelerate Time to MarketBusiness Track: How MongoDB Helps Telefonia Digital Accelerate Time to Market
Business Track: How MongoDB Helps Telefonia Digital Accelerate Time to MarketMongoDB
 
Continuous Optimization for Distributed BigData Analysis
Continuous Optimization for Distributed BigData AnalysisContinuous Optimization for Distributed BigData Analysis
Continuous Optimization for Distributed BigData Analysis
Kai Sasaki
 
Data Modeling on Azure for Analytics
Data Modeling on Azure for AnalyticsData Modeling on Azure for Analytics
Data Modeling on Azure for Analytics
Ike Ellis
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data Lake
BizTalk360
 
Personalization Journey: From Single Node to Cloud Streaming
Personalization Journey: From Single Node to Cloud StreamingPersonalization Journey: From Single Node to Cloud Streaming
Personalization Journey: From Single Node to Cloud Streaming
Databricks
 
A lap around Azure Data Factory
A lap around Azure Data FactoryA lap around Azure Data Factory
A lap around Azure Data Factory
BizTalk360
 
The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...
The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...
The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...
Databricks
 
MongoDB Evenings DC: MongoDB - The New Default Database for Giant Ideas
MongoDB Evenings DC: MongoDB - The New Default Database for Giant IdeasMongoDB Evenings DC: MongoDB - The New Default Database for Giant Ideas
MongoDB Evenings DC: MongoDB - The New Default Database for Giant Ideas
MongoDB
 
Azure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsAzure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data Flows
Thomas Sykes
 
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB
 
Azure analysis services next step to bi in the cloud
Azure analysis services   next step to bi in the cloudAzure analysis services   next step to bi in the cloud
Azure analysis services next step to bi in the cloud
Gabi Münster
 
DBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data ApplicationsDBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data Applications
decode2016
 
Webinar: An Enterprise Architect’s View of MongoDB
Webinar: An Enterprise Architect’s View of MongoDBWebinar: An Enterprise Architect’s View of MongoDB
Webinar: An Enterprise Architect’s View of MongoDB
MongoDB
 
Moving to the cloud; PaaS, IaaS or Managed Instance
Moving to the cloud; PaaS, IaaS or Managed InstanceMoving to the cloud; PaaS, IaaS or Managed Instance
Moving to the cloud; PaaS, IaaS or Managed Instance
Thomas Sykes
 
Exploring Puerto Rico Open Data with Power BI
Exploring Puerto Rico Open Data with Power BIExploring Puerto Rico Open Data with Power BI
Exploring Puerto Rico Open Data with Power BI
Guillermo Caicedo
 

What's hot (20)

SQL Server 2017 Enhancements You Need To Know
SQL Server 2017 Enhancements You Need To KnowSQL Server 2017 Enhancements You Need To Know
SQL Server 2017 Enhancements You Need To Know
 
Power BI: Tips and Tricks
Power BI: Tips and TricksPower BI: Tips and Tricks
Power BI: Tips and Tricks
 
Tableau & MongoDB: Visual Analytics at the Speed of Thought
Tableau & MongoDB: Visual Analytics at the Speed of ThoughtTableau & MongoDB: Visual Analytics at the Speed of Thought
Tableau & MongoDB: Visual Analytics at the Speed of Thought
 
Top 5 Things to Know About Integrating MongoDB into Your Data Warehouse
Top 5 Things to Know About Integrating MongoDB into Your Data WarehouseTop 5 Things to Know About Integrating MongoDB into Your Data Warehouse
Top 5 Things to Know About Integrating MongoDB into Your Data Warehouse
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Business Track: How MongoDB Helps Telefonia Digital Accelerate Time to Market
Business Track: How MongoDB Helps Telefonia Digital Accelerate Time to MarketBusiness Track: How MongoDB Helps Telefonia Digital Accelerate Time to Market
Business Track: How MongoDB Helps Telefonia Digital Accelerate Time to Market
 
Continuous Optimization for Distributed BigData Analysis
Continuous Optimization for Distributed BigData AnalysisContinuous Optimization for Distributed BigData Analysis
Continuous Optimization for Distributed BigData Analysis
 
Data Modeling on Azure for Analytics
Data Modeling on Azure for AnalyticsData Modeling on Azure for Analytics
Data Modeling on Azure for Analytics
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data Lake
 
Personalization Journey: From Single Node to Cloud Streaming
Personalization Journey: From Single Node to Cloud StreamingPersonalization Journey: From Single Node to Cloud Streaming
Personalization Journey: From Single Node to Cloud Streaming
 
A lap around Azure Data Factory
A lap around Azure Data FactoryA lap around Azure Data Factory
A lap around Azure Data Factory
 
The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...
The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...
The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku...
 
MongoDB Evenings DC: MongoDB - The New Default Database for Giant Ideas
MongoDB Evenings DC: MongoDB - The New Default Database for Giant IdeasMongoDB Evenings DC: MongoDB - The New Default Database for Giant Ideas
MongoDB Evenings DC: MongoDB - The New Default Database for Giant Ideas
 
Azure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsAzure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data Flows
 
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
 
Azure analysis services next step to bi in the cloud
Azure analysis services   next step to bi in the cloudAzure analysis services   next step to bi in the cloud
Azure analysis services next step to bi in the cloud
 
DBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data ApplicationsDBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data Applications
 
Webinar: An Enterprise Architect’s View of MongoDB
Webinar: An Enterprise Architect’s View of MongoDBWebinar: An Enterprise Architect’s View of MongoDB
Webinar: An Enterprise Architect’s View of MongoDB
 
Moving to the cloud; PaaS, IaaS or Managed Instance
Moving to the cloud; PaaS, IaaS or Managed InstanceMoving to the cloud; PaaS, IaaS or Managed Instance
Moving to the cloud; PaaS, IaaS or Managed Instance
 
Exploring Puerto Rico Open Data with Power BI
Exploring Puerto Rico Open Data with Power BIExploring Puerto Rico Open Data with Power BI
Exploring Puerto Rico Open Data with Power BI
 

Similar to Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global

Le big data à l'épreuve des projets d'entreprise
Le big data à l'épreuve des projets d'entrepriseLe big data à l'épreuve des projets d'entreprise
Le big data à l'épreuve des projets d'entreprise
Rubedo, a WebTales solution
 
Enabling Telco to Build and Run Modern Applications
Enabling Telco to Build and Run Modern Applications Enabling Telco to Build and Run Modern Applications
Enabling Telco to Build and Run Modern Applications
Tugdual Grall
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Historic Opportunities: Discover the Power of Ignition's Historian
Historic Opportunities: Discover the Power of Ignition's HistorianHistoric Opportunities: Discover the Power of Ignition's Historian
Historic Opportunities: Discover the Power of Ignition's Historian
Inductive Automation
 
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Lucas Jellema
 
Data Treatment MongoDB
Data Treatment MongoDBData Treatment MongoDB
Data Treatment MongoDB
Norberto Leite
 
Euro IT Group
Euro IT GroupEuro IT Group
Euro IT Group
Ben Oakford
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014ALTER WAY
 
MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013MongoDB
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
MongoDB
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
Ashnikbiz
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 
The New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationThe New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data Exploration
Inside Analysis
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
AIMLSEMINARS
 
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
Amazon Web Services
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.
Amazon Web Services
 
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Data Con LA
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
itnewsafrica
 
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Clustrix
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Spark Summit
 

Similar to Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global (20)

Le big data à l'épreuve des projets d'entreprise
Le big data à l'épreuve des projets d'entrepriseLe big data à l'épreuve des projets d'entreprise
Le big data à l'épreuve des projets d'entreprise
 
Enabling Telco to Build and Run Modern Applications
Enabling Telco to Build and Run Modern Applications Enabling Telco to Build and Run Modern Applications
Enabling Telco to Build and Run Modern Applications
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Historic Opportunities: Discover the Power of Ignition's Historian
Historic Opportunities: Discover the Power of Ignition's HistorianHistoric Opportunities: Discover the Power of Ignition's Historian
Historic Opportunities: Discover the Power of Ignition's Historian
 
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
 
Data Treatment MongoDB
Data Treatment MongoDBData Treatment MongoDB
Data Treatment MongoDB
 
Euro IT Group
Euro IT GroupEuro IT Group
Euro IT Group
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
 
MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
The New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationThe New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data Exploration
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
 
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.
 
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
 

More from Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
Lucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
Lucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
Lucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
Lucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
Lucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
Lucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Lucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
Lucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Lucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Lucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
Lucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
Lucidworks
 

More from Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Recently uploaded

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 

Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global

  • 1. SOLR Under the Hood (Our Experience) Sumit Vadhera Senior Manager, S&P Global Market Intelligence #Activate18 #ActivateSearch
  • 2. Agenda • A little bit about myself • A Little bit about S&P Global • How we use SOLR • SOLR based search Challenges • Our Journey so far to Cloud • Next steps • Q&A
  • 3. A little bit about myself • Sumit Vadhera (Senior Manger Database Engineering) • Big Data Solution Architect for around 12+ Years • Certified Experience ranging from different RDBMS to NoSQL & Big Data Technologies • Barely knew about SOLR search until I joined S&P Global • Now manages Big Data & NoSQL Solutions(build &design architecture) including query platform
  • 4. A little bit about S&P Global • The Market Intelligence platform delivers deep industry data across a broad range of sectors to create cutting-edge insights The Market Intelligence platform digs deeper to deliver solutions that are sector-specific, data-rich, and hyper-targeted for your evolving business needs. • Our Global Coverage currently Includes:  56,000+ banks  58,000+ asset management companies  11,000+ specialty finance companies  18,000+ investment banks and broker dealers  25,000+ insurance companies  90,000+ real estate companies  240,000+ tech, media & telecommunication companies  26,000+ oil & gas companies  19,000+ electric, natural gas & water utilities  30,000+ mining & exploration companies
  • 5.
  • 7. How we use SOLR • Primary use case:  Data is more than our business, it’s our passion. Some of the datasets we provide.  Financials  Estimates  Ownership  Key developments  Private company data  Transactions  Professionals  Corporate Actions  Events and Transcripts • Use SOLR to power some of our critical datasets and use a lot of custom code too
  • 8. How we use SOLR cont.. • Universal Search engine for our platform powered on SOLR provides  Text (keyword) & relevancy based search capabilities to our users  Page Combinations allows our users to type in the name of all possible CIQ pages rather than navigating all of the links. For example, a user can simply type “IBM Key Stats” in the Search box and immediately navigate to the exact page desired.  Autosuggest feature for different objects  Faceting search  Type ahead search (Filter based text)  Advanced Search  Speech to text transcripts • Indexing-Querying client interacting with SOLR exposes different datasets indexed from various sources including data pipeline and exposes it to users • Multiple SOLR Clusters(hybrid) with Terabytes of data and utilize hybrid sharding techniques including application and cloud based • Leveraging Lucidworks Customer Support extensively
  • 11. SOLR Based Search Challenges(legacy).. ~40-50 million docs ingestion rate &1-2 million docs per month & transaction rate of approx. 300-350 per min. Average queries hitting per week of 5 million. • Performance challenges(bottlenecks to overall query traffic) • Timeouts on platform applications due to complex queries choking entire clusters and creating bottlenecks • Relevance performance • Indexing lags causing near real time data lags on platform. Manual exception handling. • Fragmentation inside SOLR Cores were a primary factor • Optimization Downtime • Analyzing & extracting SOLR log queries stored in RDBMS. • Re-indexing process • GC issues & customized code and customized indexing solution • Security and product bugs • Single point of failure on Master-Slave • Document exceptions(tika parser)
  • 12. What we did.. • Extensive GC Tuning • Extensive JVM Tuning • IO Tuning(Trying out LOCAL disks) • Query tuning not just limited to  Move non-scoring queries to the filter cache and improve the use of the field Value Cache – with date descending sorts  Caching of time range queries by decreasing granularity from seconds to day to speed auto warm times  Changing scoring algorithms(custom) & use of edismax parser to support multi language(foreign)  Cleaning off date range & phrase queries • Turning off term vectors and switching to doc values • Addition of more searchers(horizontal scalability) • Automation of optimization & recycling SOLR more frequently during off hours
  • 13. Storage A 5 Megabyte hard drive from 1956 Being loaded into a plane Cost: More than USD$ 100,000
  • 14. What has changed and is changing fast 1985 Cost 2018 Cost Storage is (nearly) free 1985 Cost 2018 Cost Processing power doubles each year 1 TB 1 GHz The significant problems we face today cannot be solved at the same level of thinking We were at when we created them. - Albert Einstein
  • 15. Our Journey so far to cloud.. Today we utilize SOLR latest cloud architecture with hybrid cloud infrastructure
  • 16. Our Journey so far to cloud cont.. Key benefits we see as of today.. • No single point of failure • Increased availability (HA) and reduced TAT • Significant performance gains(query) and improved relevancy for page searches(scale searching). • Improvements to indexing and decreased incremental lags(scale indexing). • Banana dashboards to identify bottlenecks • Leverage fusion for security and auditing (authorization/authentication) • Indexing pipelines with auto detection of failures & NEAR REAL TIME data on platform • Type ahead, Facet, Search supporting highlighting ,recent &related terms custom searches • Moving off SOLR logs data from async sonic message queue to new piplelines integrating with ES and Kibana. • Improved Searched through multiple filters • Quicker alerts to setup on our variety of searches • Support natural language search, screening & mappings • Improved platform search serving screening questions, quick navigation to individual workflows, surfacing pages and documents.
  • 17. Next steps.. Search, now Data Science • Further improving relevancy of search results, the presentation of our information as a whole really, makes our platform more essential to our clients • Data science as a whole continues creating models that feeds to improve relevancy. • Continue leverage new features & enhancements & Support of LW SOLR Cloud • Create Scalable, Extensible, and Transparent data pipelines • Expand data glue to query Lucene • Continue leveraging and expanding our analytical search capabilities • Use further machine learning with SOLR to process rule-based tasks like data extraction and cleaning. • Meta data driven models based on search • Use ML/AI in search aa
  • 18. We are hiring… https://www.spglobal.com/en/careers/ Other DB & NoSQL/Big Data technologies we use… • SOLR • ELK • Kafka • Hadoop • MySQL • Oracle • MSSQL • Cassandra • PostgreSQL • Dynamo DB • Redshift Many More…..
  • 19. Thank you! Sumit Vadhera Senior Manager, S&P Global Database Engineering (Architecture) Email:- sumitvadhera@spglobal.com Linked IN:- https://www.linkedin.com/in/sumit-vadhera-993059162 #Activate18 #ActivateSearch