The webinar discussed how Apache Solr can help businesses scale by providing search capabilities beyond what MySQL offers. It covered how Solr is more than a search engine and is key to scalability. Features such as the data import handler, replication, sharding, and tailing the MySQL binlog were presented as ways to support growing catalogs, rising traffic, and real-time indexing. The webinar gave examples of how these capabilities address the issues a business faces at different stages: initial catalog growth, growing traffic, substantial catalog growth, and a real-time catalog.
Non-interactive big-data analysis discourages experimentation and can interrupt the analyst’s train of thought, yet analyzing and drawing insights in real time is no easy task, with jobs often taking minutes or hours to complete. What if you want to put an interactive interface in front of that data that allows iterative insights? What if you need that interactive experience to be sub-second?
Traditional SQL and most MPP/NoSQL databases cannot run complex calculations over large data performantly. Popular distributed systems such as Hadoop or Spark can execute such jobs, but their job overhead prohibits sub-second response times. Learn how an in-memory computing framework enabled us to perform complex analysis jobs on massive datasets with sub-second response times, allowing us to plug it into a simple, drag-and-drop web 2.0 interface.
Collaborate 2018: How to Get Cross Functional Reporting with an Enterprise Da... - Datavail
Many organizations not only lack the ability to look at their data across the organization as a whole, but often have no lens into the metrics they need to report against or use to manage their own departments.
How beneficial would it be to have a central data information repository – we call it an Enterprise Data Warehouse – from which to retrieve accurate data from across all aspects of your business? This presentation explains how this, and more, can be a reality for your business, in a relatively short amount of time.
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution - Dmitry Anoshin
This session covers building a modern data warehouse by migrating from a traditional DW platform into the cloud, using Amazon Redshift and the cloud ETL tool Matillion to provide self-service BI for a business audience. It covers the technical migration path from a DW with PL/SQL ETL to Amazon Redshift via Matillion ETL, with a detailed comparison of modern ETL tools. Moreover, the talk works backward through the process, starting from the business audience and the needs that drive changes in the old DW. Finally, it covers the idea of self-service BI, and the author shares a step-by-step plan for building an efficient self-service environment using the modern BI platform Tableau.
Solr Under the Hood at S&P Global - Sumit Vadhera, S&P Global - Lucidworks
This document summarizes S&P Global's use of Solr for search capabilities across their large datasets. It discusses how S&P Global indexes over 50 million documents into Solr monthly and handles over 5 million queries per week. It outlines challenges faced with an on-premise Solr deployment and how migrating to Solr Cloud helped address issues like performance, availability, and scalability. Next steps discussed include improving relevancy through data science, continuing to leverage new Solr features, and exploring ways to integrate machine learning into search capabilities.
Webinar: Transforming Customer Experience Through an Always-On Data Platform - DataStax
According to Forrester Research, leaders in customer experience drive 5.1X revenue growth over laggards. And although 84% of companies aspire to be leaders in this space, only 1 in 5 successfully delivers good or great customer experience. Join us for our next webinar, where Mike Gualtieri, VP and Principal Analyst at Forrester Research, and Rajay Rai, Head of Digital Engineering at Macquarie Bank, will share how customer experience can drive business results such as faster revenue growth, longer customer retention, greater employee engagement, and improved profit margins.
View webinar recording: https://youtu.be/eEc5tx-nHvI
Explore past DataStax webinars: http://www.datastax.com/resources/webinars
This document discusses how ClearStory Data uses Spark and Shark to enable fast cycle analysis on diverse data sources. Spark and Shark allow ClearStory to perform iterative and interactive computations across structured and unstructured data at large scale. ClearStory leverages Spark and Shark's RDDs, SQL support, and machine learning libraries to power its platform for interactive visualization and analysis of blended internal and external data.
A data warehouse is an organized collection of integrated, subject-oriented databases designed to aid decision support. It supports business reporting and data mining by providing a consolidated view of cleaned and organized corporate data. Key design considerations include being subject-oriented, integrated, time-variant, nonvolatile, summarized, not normalized, including metadata, and being near-real-time or right-time. Data is loaded into the warehouse through extract, transform, and load (ETL) processes from sources such as operational data, specialized applications, and external syndicated data.
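The extract, transform, and load flow described above can be sketched in a few lines of Python. This is only an illustrative toy, not any specific warehouse's pipeline: the table name, columns, and sample rows are all hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3

# Extract: rows as they might arrive from an operational source (hypothetical data)
source_rows = [
    {"order_id": 1, "amount": "19.99", "country": "de"},
    {"order_id": 2, "amount": "5.00",  "country": "US"},
    {"order_id": 2, "amount": "5.00",  "country": "US"},  # duplicate to be cleaned
]

# Transform: deduplicate and normalize types and codes
seen, clean_rows = set(), []
for row in source_rows:
    key = row["order_id"]
    if key in seen:
        continue
    seen.add(key)
    clean_rows.append((key, float(row["amount"]), row["country"].upper()))

# Load: insert the consolidated rows into the warehouse table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_orders (order_id INTEGER, amount REAL, country TEXT)")
conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", clean_rows)
total = conn.execute("SELECT COUNT(*), SUM(amount) FROM fact_orders").fetchone()
print(total[0], round(total[1], 2))  # 2 24.99
```

Real ETL adds source connectors, incremental loads, and error handling, but the extract → clean → load shape stays the same.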
SQL Analytics for Search Engineers - Timothy Potter, Lucidworks
This document discusses how SQL can be used in Lucidworks Fusion for various purposes like aggregating signals to compute relevance scores, ingesting and transforming data from various sources using Spark SQL, enabling self-service analytics through tools like Tableau and PowerBI, and running experiments to compare variants. It provides examples of using SQL for tasks like sessionization with window functions, joining multiple data sources, hiding complex logic in user-defined functions, and powering recommendations. The document recommends SQL in Fusion for tasks like analytics, data ingestion, machine learning, and experimentation.
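The sessionization-with-window-functions task mentioned above can be sketched in plain SQL, here run through Python's built-in sqlite3 module (SQLite 3.25+ is assumed for window-function support). The table, data, and 1800-second gap threshold are hypothetical, not Fusion's actual schema — the point is the LAG-then-cumulative-SUM pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id TEXT, ts INTEGER)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)", [
    ("u1", 0), ("u1", 100), ("u1", 2500),   # the 2400s gap starts a new session
    ("u2", 50), ("u2", 60),
])

# A new session starts when the gap since the previous event for the
# same user exceeds 1800 seconds; a running sum of the "new session"
# flags then numbers the sessions per user.
rows = conn.execute("""
    WITH gaps AS (
        SELECT user_id, ts,
               CASE WHEN ts - LAG(ts) OVER w > 1800 THEN 1 ELSE 0 END AS new_session
        FROM clicks
        WINDOW w AS (PARTITION BY user_id ORDER BY ts)
    )
    SELECT user_id, ts,
           SUM(new_session) OVER (PARTITION BY user_id ORDER BY ts) AS session_id
    FROM gaps
    ORDER BY user_id, ts
""").fetchall()
for r in rows:
    print(r)
# ('u1', 0, 0), ('u1', 100, 0), ('u1', 2500, 1), ('u2', 50, 0), ('u2', 60, 0)
```

The same two-step shape (flag session starts, then cumulatively sum the flags) carries over to Spark SQL and most other engines with window functions.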
This document provides an overview of Azure Synapse Analytics and its key capabilities. Azure Synapse Analytics is a limitless analytics service that brings together enterprise data warehousing and big data analytics. It allows querying data on-demand or at scale using serverless or provisioned resources. The document outlines Synapse's integrated data platform capabilities for business intelligence, artificial intelligence and continuous intelligence. It also describes the different types of analytics workloads that Synapse supports and key architectural components like the dedicated SQL pool and massively parallel processing concepts.
How Big Data Can Help Marketers Improve Customer Relationships - Cloudera, Inc.
As consumer and business buying behaviors continually evolve, it’s become more challenging than ever to acquire, retain, and create happy customers. The good news is that the key to building successful customer relationships exists within the customer interactions that are captured in the form of big data.
This presentation explores how Tableau paired with Cloudera's distribution of Hadoop will help marketers reveal customer insights that traditional technologies miss. Learn:
-How Hadoop enables organizations to build a richer customer 360 profile
-How Cloudera and Tableau are empowering marketers to become data-driven
-How to use Tableau to reveal unknown leading indicators of customer churn
The SAS Search Journey: Using AI to Move from Google to Lucidworks - Alex Fl... - Lucidworks
1) SAS migrated their enterprise search from Google and another solution to Lucidworks Fusion to have a single search platform.
2) They encountered issues with the out-of-the-box configuration and content that impacted search relevance and ranking.
3) Through multiple iterations of configuration changes, indexing adjustments, and using AI/SAS tools to evaluate search terms, they improved relevance and the types of results returned for key search terms.
4) Future plans for the search include immediate indexing of new content, auto-suggest, spellcheck, and integrating search data into analytics dashboards. Lucidworks was chosen for its data analytics abilities, easy administration, and connectors.
AWS User Group: Building Cloud Analytics Solution with AWS - Dmitry Anoshin
Abebooks is an Amazon subsidiary that treats data as an asset. It is always looking for ways to improve its existing analytics solution and extract information from terabytes of data.
One of the recent initiatives was the migration from a legacy DW platform to AWS Redshift. During this journey, our data engineers met lots of challenges and sometimes tried to reinvent the wheel.
This talk will cover Abebooks' journey toward a cloud DW. Moreover, we will cover the ETL tool selection process for the cloud as well as the adoption process for end users. This talk will help you understand the potential of the modern cloud DW, learn about our use case, and save time on future projects.
Bloor Research & DataStax: How graph databases solve previously unsolvable bu... - DataStax
This webinar covered graph databases and how they can solve problems that were previously difficult for traditional databases. It included presentations on why graph databases are useful, common use cases like recommendations and network analysis, different types of graph databases, and a demonstration of the DataStax Enterprise graph database. There was also a question and answer session where attendees could ask about graph databases and DataStax Enterprise graph.
From Data to Services at the Speed of Business - Ali Hodroj
From Data to Services at the Speed of Business: applying a cloud-native paradigm to combine fast data analytics with a microservices architecture for hybrid workloads.
How Leroy Merlin Uses Elasticsearch to Drive Sales and Increase Revenue on their E-commerce
Nowadays, the success of any e-commerce business depends on many moving parts. Learn how Leroy Merlin improved product promotion on its e-commerce website using Elasticsearch and implemented a content search approach to increase annual sales turnover.
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal... - Lucidworks
This document discusses how Walmart uses Apache Solr as a "not-so-evil twin" to complement its source-of-truth database and help scale its data infrastructure. It describes how Walmart abstracts away the complexity of managing databases, caches, search queries, and messaging to provide scalable querying across database shards. The use of Solr has allowed Walmart to offload queries, recurring reads, and analytics from its source-of-truth databases.
How did it go? The first large enterprise search project in Europe using Shar... - Petter Skodvin-Hvammen
This document summarizes a presentation about implementing a large enterprise search project in Europe using SharePoint 2013. It describes the background of the global oil services company undertaking a knowledge initiative. It details the key pains they faced, content sources indexed, and search strategy. It outlines the infrastructure needs, customizations made, performance considerations, and efforts to improve relevancy. In conclusion, it provides the current status and outcomes of the project.
MariaDB AX: Analytics solution with ColumnStore - MariaDB plc
MariaDB ColumnStore is a high performance columnar storage engine that provides fast and efficient analytics on large datasets in distributed environments. It stores data column-by-column for high compression and read performance. Queries are processed in parallel across nodes for scalability. MariaDB ColumnStore is used for real-time analytics use cases in industries like healthcare, life sciences, and telecommunications to gain insights from large datasets for applications like customer behavior analysis, genome research, and call data monitoring.
MariaDB AX: Analytics with MariaDB ColumnStore - MariaDB plc
MariaDB ColumnStore is a high performance columnar storage engine that provides fast and efficient analytics on large datasets in distributed environments. It stores data column-by-column for high compression and read performance. Queries are processed in parallel across nodes for scalability. MariaDB ColumnStore is used for real-time analytics use cases in industries like healthcare, life sciences, and telecommunications to gain insights from large datasets.
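The column-by-column layout described above can be illustrated with a toy comparison in pure Python (the data is hypothetical, and this models the idea only — ColumnStore's actual on-disk format and compression are far more sophisticated): a single-column scan touches only one contiguous array, and low-cardinality columns compress well with run-length encoding.

```python
# Row-oriented: each record stored together; a one-column scan walks every record.
rows = [{"region": "EU", "amount": i % 7} for i in range(1000)]
row_total = sum(r["amount"] for r in rows)

# Column-oriented: each column stored contiguously; the same scan touches one array.
columns = {"region": [r["region"] for r in rows],
           "amount": [r["amount"] for r in rows]}
col_total = sum(columns["amount"])
assert row_total == col_total  # same answer, much less data touched per column

# Contiguous low-cardinality columns compress well, e.g. run-length encoding:
def rle(values):
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

print(rle(columns["region"]))  # [['EU', 1000]] -- 1000 values stored as one run
```

The same two properties — scanning only the referenced columns and compressing each column independently — are what make columnar engines fast for analytics over large tables.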
This document discusses how to take an agile approach to data warehouse projects. It introduces agile practices like iterative development, minimal inventory, and frequent delivery that can be applied. It proposes using both a normalized and dimensional data model to validate understanding of the data and business domains. Visualization tools like kanban boards and thermometers are recommended. Version control is key to integrate the data model with the rest of the project. The "Spock approach" combines relational and dimensional modeling in a hybrid method.
Just the Job: Employing Solr for Recruitment Search - Charlie Hull, lucenerevolution
See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011
Using a case study on a major European executive recruitment company, we will show how we used Apache Lucene/Solr to build powerful, flexible, accurate and scalable search services over tens of millions of CVs and candidate records, allowing the company to completely restructure their IT provision for both local and national offices.
Analytics in Search
Many companies, including Lucidworks, have embraced the open-source Kibana code to add visualization and analytics that enhance search management. Ravi Krishnamurthy, VP of Professional Services at Lucidworks, will show Silk, Lucidworks' implementation of Kibana, which provides all the capabilities of the open-source code but adds enterprise-critical capabilities like authentication and security to protect restricted content.
Types of database processing; OLTP vs. data warehouses (OLAP); data warehouse characteristics:
-Subject-oriented
-Integrated
-Time-variant
-Non-volatile
Functionalities of a data warehouse:
-Roll-up (consolidation)
-Drill-down
-Slicing
-Dicing
-Pivot
The KDD process; applications of data mining
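The warehouse operations listed above — roll-up, drill-down, slicing, dicing, and pivoting — can be demonstrated on a toy sales cube in pure Python. The dimensions and figures here are hypothetical, chosen only to make each operation visible.

```python
from collections import defaultdict

# A toy sales cube: (year, quarter, region) -> units sold (hypothetical data)
cube = {
    (2023, "Q1", "EU"): 10, (2023, "Q2", "EU"): 12,
    (2023, "Q1", "US"): 7,  (2023, "Q2", "US"): 9,
    (2024, "Q1", "EU"): 11, (2024, "Q1", "US"): 8,
}

# Roll-up (consolidation): aggregate away the quarter dimension.
# Drill-down is the inverse: returning from this summary to the
# quarter-level detail still held in `cube`.
rollup = defaultdict(int)
for (year, quarter, region), units in cube.items():
    rollup[(year, region)] += units
print(dict(rollup))
# {(2023, 'EU'): 22, (2023, 'US'): 16, (2024, 'EU'): 11, (2024, 'US'): 8}

# Slice: fix one dimension (region == 'EU') to get a lower-dimensional view
eu_slice = {(y, q): u for (y, q, r), u in cube.items() if r == "EU"}

# Dice: keep a sub-cube by restricting several dimensions at once
dice = {k: u for k, u in cube.items() if k[0] == 2023 and k[1] == "Q1"}

# Pivot: reorient the slice so quarters become columns of a per-year table
pivot = defaultdict(dict)
for (year, quarter), units in eu_slice.items():
    pivot[year][quarter] = units
print(dict(pivot))  # {2023: {'Q1': 10, 'Q2': 12}, 2024: {'Q1': 11}}
```

In a real OLAP engine these operations are expressed as GROUP BY and WHERE clauses (or MDX), but the set manipulations are the same.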
This document discusses an agile approach to developing a data warehouse. It advocates using an Agile Enterprise Data Model to provide vision and guidance. The "Spock Approach" is described, which uses an operational data store, dimensional data warehouse, and iterative development of data marts. Data visualization techniques like data hexes are recommended to improve planning and visibility. Leadership, version control, adaptability, refinement, and refactoring are identified as important ongoing processes for an agile data warehouse project.
Analyzing Billions of Data Rows with Alteryx, Amazon Redshift, and Tableau - DATAVERSITY
This document discusses amaysim's implementation of Amazon Redshift, Alteryx, and Tableau for data analytics. It provides an overview of each tool and how amaysim uses them together in their business intelligence stack. Key points include:
- Amaysim uses Redshift for data warehousing, Alteryx to prepare and blend data, and Tableau for visualization and self-service analytics. This allows for analysis within hours rather than weeks.
- With a small analytics team, the tools empower line of business users to solve their own problems quickly. This increases workforce productivity.
- Lessons learned include democratizing analytics, making tools relevant to different stakeholders, and celebrating successes to drive cultural change.
Webinar: Personalized Retail Search & Recommendations with Fusion - Lucidworks
Fusion provides personalized retail search and recommendations through machine learning capabilities like clickstream auto-tuning, query intent classification, and recommendation models. It ingests diverse data sources, processes signals like clicks and purchases, and leverages these to improve relevancy, boost performance, and drive higher conversion. The document demonstrates how Fusion has helped large travel and home improvement retailers by increasing engagement, conversion, and revenues through personalized search experiences.
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C... - Imply
Target is one of the largest retailers in the United States, with brick-and-mortar stores in all 50 states and one of the most-visited ecommerce sites in the country. In addition to typical merchandising functions like assortment planning, pricing and inventory management, Target also operates a large supply chain, financial/banking operations and property management organizations. As a data-driven organization, we need a data analytics platform that can address the unique needs of each of these various business units, while scaling to hundreds of thousands of users and accommodating an ever-increasing amount of data.
In this talk we’ll cover why Target chose to create our own analytics platform and specifically how Druid makes this platform successful. We’ll cover how we utilize key features in Druid, such as union datasources, arbitrary granularities, real-time ingestion, complex aggregation expressions and lightning-fast query response to provide analytics to users at all levels of the organization. We’ll also cover how Druid’s speed and flexibility allow us to provide interactive analytics to front-line, edge-of-business consumers to address hundreds of unique use-cases across several business units.
When Solr Is Best - M. Hausenblas, Lucene/Solr Revolution Dublin (2013-11-07) - lucenerevolution
This document discusses when Solr is a good tool to use versus other options. It provides an overview of Solr in the big data ecosystem and the concept of polyglot persistence, where different data stores serve different needs. Common Solr use cases such as search-based recommendations and log analysis are described. A checklist is presented for determining whether Solr is a good fit based on factors like data volume, query characteristics, throughput needs, and data type. The document concludes by listing some red flags where Solr may not be suitable, such as when strong consistency, transactions, or graph support are required.
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...Imply
Target is one of the largest retailers in the United States, with brick-and-mortar stores in all 50 states and one of the most-visited ecommerce sites in the country. In addition to typical merchandising functions like assortment planning, pricing and inventory management, Target also operates a large supply chain, financial/banking operations and property management organizations. As a data-driven organization, we need a data analytics platform that can address the unique needs of each of these various business units, while scaling to hundreds of thousands of users and accommodating an ever-increasing amount of data.
In this talk we’ll cover why Target chose to create our own analytics platform and specifically how Druid makes this platform successful. We’ll cover how we utilize key features in Druid, such as union datasources, arbitrary granularities, real-time ingestion, complex aggregation expressions and lightning-fast query response to provide analytics to users at all levels of the organization. We’ll also cover how Druid’s speed and flexibility allow us to provide interactive analytics to front-line, edge-of-business consumers to address hundreds of unique use-cases across several business units.
2013 11-07 lsr-dublin_m_hausenblas_when solr is bestlucenerevolution
This document discusses when Solr is a good tool to use versus other options. It provides an overview of Solr in the big data ecosystem and the concept of polyglot persistence, where different data stores are used for different needs. Common use cases for Solr like search-based recommendations and log analysis are described. A checklist is presented for determining if Solr is a good fit based on factors like data volume, query characteristics, throughput needs, and data type. The document concludes by listing some red flags where Solr may not be suitable, such as if strong consistency, transactions, or graphs are needed requirements.
The document provides guidance on leveling up a company's data infrastructure and analytics capabilities. It recommends starting by acquiring and storing data from various sources in a data warehouse. The data should then be transformed into a usable shape before performing analytics. When setting up the infrastructure, the document emphasizes collecting user requirements, designing the data warehouse around key data aspects, and choosing technology that supports iteration, extensibility and prevents data loss. It also provides tips for creating effective dashboards and exploratory analysis. Examples of implementing this approach for two sample companies, MESI and SalesGenomics, are discussed.
Prepare for Peak Holiday Season with MongoDBMongoDB
This document discusses preparing for the holiday season by providing a seamless customer experience. It covers expected trends for the 2014 holiday season including increased spending and an extended shopping window. The opportunity is to provide personalized and relevant experiences for customers. The document then provides an overview of how MongoDB can be used to power various retail functions like product catalogs, real-time inventory and orders, and consolidated customer views to enable a modern seamless retail experience. Technical details are discussed for implementing product catalogs and real-time inventory using MongoDB.
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014ALTER WAY
This document discusses Elasticsearch and how it can be used to search, analyze, and make sense of large amounts of data. It provides examples of how Elasticsearch is being used by large companies to handle petabytes of data and gain insights. Implementations in France are highlighted. The document concludes by demonstrating how easily Elasticsearch can be deployed and used to ingest and search sample data.
Frank Bien, CEO of Looker - along with Amazon, Google and other data disrupters - discuss how innovators are deeply integrating analytics into every aspect of their businesses, from mobile to warehouse to cloud.
Frank shares Looker’s vision for the future of business intelligence and data analytics and reveal pivotal product and partnership updates.
Frank Bien, CEO of Looker - along with Amazon, Google and other data disrupters - discuss how innovators are deeply integrating analytics into every aspect of their businesses, from mobile to warehouse to cloud.
Frank shares Looker’s vision for the future of business intelligence and data analytics and reveal pivotal product and partnership updates.
Similar to UnderstandingHowSolrCanHelpYourBusinessScale-ECG07.31.2013 (20)
1. Magento Expert Consulting Group Webinar | July 31, 2013
Thinking Beyond Search with Solr
Understanding How Solr Can Help Your Business Scale
2. The presenters
Magento Expert Consulting Group

Udi Shamay
Head, Expert Consulting Group
udi@ebay.com

Steve Kukla
Business Solution Architect, Expert Consulting Group
skukla@ebay.com

Kirill Morozov
Application Architect, Expert Consulting Group
kmorozov@ebay.com

Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale July 31, 2013 | 2
3. Today’s agenda

What is Apache Solr?
Business Use Cases for Scale:
• Supporting Initial Catalog Growth
• Supporting Growing Traffic
• Supporting Substantial Catalog Growth
• Supporting A Real-Time Catalog
Key Points to Remember
Q&A
4. What is Apache Solr?
5. What is Apache Solr?
General Solr Overview

• Solr is a separate application – installed on its own server, or on an existing server in the environment, depending on business needs
• Solr uses schema configuration files, which can be found in Magento/lib/Apache
• Magento communicates with Solr via HTTP/XML
• Search options are configured via the Magento admin panel
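Because the Magento–Solr integration is plain HTTP, the same requests can be reproduced with any HTTP client. A minimal sketch of building such a select URL – the host, port, and core name ("magento_en") are illustrative assumptions, not values from this deck:

```python
from urllib.parse import urlencode

# Build the kind of select URL Magento sends to Solr over HTTP.
# Base URL and core name are hypothetical examples.
def solr_select_url(base_url, core, query, rows=10):
    params = urlencode({"q": query, "rows": rows, "wt": "xml"})
    return f"{base_url}/solr/{core}/select?{params}"

url = solr_select_url("http://localhost:8983", "magento_en", "name:shirt")
# -> http://localhost:8983/solr/magento_en/select?q=name%3Ashirt&rows=10&wt=xml
```

The `wt=xml` parameter asks Solr for an XML response, matching the HTTP/XML exchange described above.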
6. What is Apache Solr?
Solr the Search Platform

Better text-based searching provides a better customer experience:
• More relevant “fuzzy” searching*
• Faceted searches
• Search corrections
• Out-of-the-box type-ahead*
• Response caching for better performance

*Requires customization to leverage fully
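Faceting is, at its core, grouped value counts over the matching result set. A toy sketch of the idea, with invented sample data rather than Magento’s real schema:

```python
from collections import Counter

# Toy product "index" -- invented sample data, not Magento's schema.
docs = [
    {"name": "red shirt", "color": "red"},
    {"name": "blue shirt", "color": "blue"},
    {"name": "red hat", "color": "red"},
]

# A facet on "color" counts each value across the documents matching a query.
hits = [d for d in docs if "shirt" in d["name"]]
color_facet = Counter(d["color"] for d in hits)
# the "shirt" query matches two documents: one red, one blue
```

Solr computes these counts inside the index (via `facet.field`), which is what makes faceted navigation cheap even on large catalogs.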
7. What is Apache Solr?
What Makes Solr Powerful

Solr is more than a search engine because…
• Most of the data customers see is handled by Solr instead of MySQL
8. What is Apache Solr?
What Makes Solr Powerful

Solr is more than a search engine because…
• Most of the data customers see is handled by Solr instead of MySQL
• Solr uses a simpler data structure

[Diagram: in MySQL’s EAV model, a product’s attributes are spread across separate tables linked by product_id and attribute_id, with attribute_name and attribute_value resolved through joins; the Solr document is flat – product_id, attribute_name, attribute_value in a single record (no EAV).]
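The difference is easy to see in miniature. A hedged sketch contrasting an EAV lookup, which must resolve ids across tables, with a flat Solr-style document – all table contents are invented for illustration:

```python
# EAV: attribute metadata and values live in separate tables keyed by ids.
attributes = {10: "color"}      # attribute_id -> attribute_name
values = [(1, 10, "blue")]      # (product_id, attribute_id, attribute_value)

def eav_lookup(product_id, attribute_name):
    # Answering one question requires resolving ids across tables
    # (a multi-table join in SQL).
    for pid, aid, val in values:
        if pid == product_id and attributes[aid] == attribute_name:
            return val
    return None

# Flat, Solr-style: one denormalized document per product -- a direct lookup.
solr_docs = {1: {"color": "blue"}}

assert eav_lookup(1, "color") == solr_docs[1]["color"] == "blue"
```

The flat document trades storage for read speed, which suits a read-heavy storefront.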
9. What is Apache Solr?
What Makes Solr Powerful

Solr is more than a search engine because…
• Most of the data customers see is handled by Solr instead of MySQL
• Solr uses a simpler data structure
• Solr supports replication, which allows it to truly scale for growth

[Diagram: Magento reading from a pool of replicated Solr nodes.]
10. Supporting Initial Catalog Growth
11. Business Use Case
Supporting Initial Catalog Growth

Business Background
• Growing catalog – from 10K to 100K SKUs
• From 1 to 2 stores
• From 1 to 2+ web nodes / 1 database node
• Using native Solr search

Problems
• Increased indexing time
• Outdated information on the front end
12. Supporting Initial Catalog Growth
Problem – Increasing Index Footprint

Expected indexing time:

                                     10,000 SKUs   50,000 SKUs   100,000 SKUs
  Control (1 website, 1 store view)  1.75 min      10 min        17.5 min
  Year 2 (2 websites, 2 store views) 3.5 min       17.5 min      35 min

→ Slow indexing
13. Supporting Initial Catalog Growth
Solution – Custom Data Import Handler

Concept
• Connects to the database using JDBC
• Extra data transformations must be written in Java/JavaScript
• Uses a prepared XML configuration
14. Supporting Initial Catalog Growth
Data Import Handler – Results

Results
• 10 times faster indexing
• Supports delta-indexing

Things to keep in mind
• Solr knows about its data source
• May require extra development effort
• Extra data transformations must be written in Java/JavaScript
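Delta-indexing is what keeps the Data Import Handler fast as the catalog grows: only rows changed since the last import are re-read, instead of the whole catalog. A sketch of the idea – the rows and timestamps are illustrative; the real DIH expresses this as a deltaQuery in its XML configuration:

```python
# Sketch of delta-indexing: reindex only rows modified since the last run.
# Rows are (sku, updated_at) tuples with invented timestamps.
rows = [
    ("SKU-1", 100),
    ("SKU-2", 205),
    ("SKU-3", 210),
]

def delta_rows(rows, last_index_time):
    # DIH does this in SQL via a deltaQuery, e.g.
    # SELECT id FROM products WHERE updated_at > '${dataimporter.last_index_time}'
    return [sku for sku, updated_at in rows if updated_at > last_index_time]

changed = delta_rows(rows, last_index_time=200)  # only SKU-2 and SKU-3 changed
```

Because the work is proportional to the number of changed rows rather than catalog size, delta runs stay short even at 100K+ SKUs.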
15. Supporting Growing Traffic
16. Business Use Case
Supporting Growing Traffic

Business Background
• Growing catalog – 1,000,000 SKUs
• Growing traffic – up to 100 requests/second
• 3 stores
• 3+ web nodes / 1 database node
• Using the Data Import Handler

Problem
• Solr can’t handle increasing user concurrency
17. Supporting Growing Traffic
Increasing Index Footprint – OK

Expected indexing time:

                                     100,000 SKUs   500,000 SKUs   1,000,000 SKUs
  Control (2 websites, 2 store views) 3.5 min       17.5 min       35 min
  Year 3 (3 websites, 3 store views)  4.75 min      23.75 min      47.5 min

→ At < 1,000 updates/sec, indexing delta data handles updates
18. Supporting Growing Traffic
Problem – Increased Response Time

Expected average response time:

                                      100,000 SKUs   500,000 SKUs   1,000,000 SKUs
                                      30 RPS         60 RPS         100 RPS
  Control (2 websites, 2 store views) 75 msec        95 msec        105 msec
  Year 3 (3 websites, 3 store views)  80 msec        100 msec       120 msec

→ Solr CPU is maxed out
19. Supporting Growing Traffic
Solution – Solr Replication

Concept
• Separate reading requests
• Replicate the index across multiple nodes
• Read from multiple servers
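With master-slave replication, index updates go to a single master while reads are spread across the replicas. A minimal sketch of the read-side routing the middleware would perform – the node names are invented for illustration:

```python
from itertools import cycle

# Invented replica pool: the master receives index updates,
# replicas serve read (select) traffic.
master = "solr-master:8983"
replicas = cycle(["solr-replica-1:8983", "solr-replica-2:8983", "solr-replica-3:8983"])

def route(request_type):
    # Writes must hit the master; reads round-robin across replicas.
    return master if request_type == "update" else next(replicas)

reads = [route("select") for _ in range(4)]  # cycles through the three replicas
```

Adding a replica then adds read capacity linearly, which is what lets Solr absorb the growing query concurrency.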
20. Supporting Growing Traffic
Solr Replication – Results

Results
• Allows Solr to handle read traffic
• Introduces fail-over

Things to keep in mind
• Requires middleware or Magento customization
• Possible heavy data duplication
• Extra changes in infrastructure
22. Business Use Case
Supporting Substantial Catalog Growth

Business Background
• Growing catalog – 5,000,000 SKUs
• 4 stores
• 4+ web nodes / 1 database node
• Using the Data Import Handler
• Using Solr replication

Problems
• Delta-indexing delays
• Slow response time
23. Supporting Substantial Catalog Growth
Problem – Increasing Index Footprint

Expected indexing time:

                                      1,000,000 SKUs   2,500,000 SKUs   5,000,000 SKUs
  Control (3 websites, 3 store views) 47.5 min         118.75 min       237.5 min
  Year 4 (4 websites, 4 store views)  63.5 min         158.75 min       317.5 min

→ At > 1,000 updates/sec, delta indexing delays
24. Supporting Substantial Catalog Growth
Problem – Increased Response Time

Expected average response time:

                                      1,000,000 SKUs   2,500,000 SKUs   5,000,000 SKUs
                                      100 RPS          200 RPS          400 RPS
  Control (3 websites, 3 store views) 120 msec         230 msec         300 msec
  Year 4 (4 websites, 4 store views)  150 msec         270 msec         400 msec

→ Slow response time
25. Supporting Substantial Catalog Growth
Solution – Index Sharding

Concept
• Distributed search
• Distributed + replication (SolrCloud)
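Sharding splits the index across nodes: each document is routed to one shard by a stable hash of its key, and a query is scattered to every shard with the partial results merged. A toy sketch of both halves – the shard count and data are illustrative, not a Solr implementation:

```python
# Toy sketch of index sharding: hash-route documents, scatter-gather queries.
NUM_SHARDS = 3
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one Solr core

def shard_for(sku):
    # Stable routing: the same SKU always lands on the same shard.
    return sum(ord(c) for c in sku) % NUM_SHARDS

def index(sku, doc):
    shards[shard_for(sku)][sku] = doc

def search(predicate):
    # Scatter the query to every shard, then merge the partial results.
    hits = []
    for shard in shards:
        hits.extend(doc for doc in shard.values() if predicate(doc))
    return hits

for i in range(9):
    index(f"SKU-{i}", {"name": f"product {i}"})

results = search(lambda d: "product" in d["name"])  # merged from all shards
```

Because each shard indexes only its slice of the catalog, shards can index in parallel, which is where the "50 times faster with 5 shards" result on the next slide comes from.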
26. Supporting Substantial Catalog Growth
Index Sharding – Results

Results
• Distributed search for faster response time
• 50 times faster indexing with 5 shards

[Diagram: Magento reads from MySQL and queries a set of Solr shards, each holding a slice (A–I) of the index.]

Things to keep in mind
• Custom solution
• Requires Magento customization or the introduction of middleware
• Extra changes in infrastructure
27. Supporting A Real-Time Catalog
28. Business Use Case
Supporting A Real-Time Catalog

Business Background
• Growing catalog – 10,000,000 SKUs
• 5 stores
• 5+ web nodes / 1 database node
• Using the Data Import Handler
• Using SolrCloud and distributed search

Business Requirement
• Always up-to-date index
29. Supporting A Real-Time Catalog
Solution – Listen To The MySQL Bin Log

Concept
• Connect via the MySQL replication protocol
• Listen to data-related events

[Diagram: the listener connects to MySQL over the replication protocol and consumes the binlog as if it were a MySQL slave.]
30. Supporting A Real-Time Catalog
Solution – Listen To The MySQL Bin Log

Concept
• Connect via the MySQL replication protocol
• Listen to data-related events
• Extract information from the events
• Manipulate documents in the Lucene index

[Diagram: MySQL binlog → replication listener → log parser → Solr.]
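The listener behaves like a MySQL slave: it consumes row events from the binlog and applies each one to the search index immediately, so the index never waits for a batch re-import. A sketch of the event-handling loop – the event shapes are invented; a real listener would decode MySQL's binary format via a library such as open-replicator:

```python
# Sketch of applying binlog row events to a search index as they arrive.
# Event shapes are invented; a real listener decodes MySQL's binary format.
solr_index = {}  # stands in for the Lucene index behind Solr

def apply_event(event):
    if event["type"] in ("insert", "update"):
        solr_index[event["id"]] = event["row"]   # upsert the document
    elif event["type"] == "delete":
        solr_index.pop(event["id"], None)        # remove the document

binlog = [
    {"type": "insert", "id": 1, "row": {"name": "shirt"}},
    {"type": "update", "id": 1, "row": {"name": "red shirt"}},
    {"type": "delete", "id": 1},
]
for event in binlog:
    apply_event(event)  # the index is up to date after every event
```

This is why the approach yields an always-current index: updates flow continuously instead of arriving in delta-import batches.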
31. Supporting A Real-Time Catalog
Listening To The MySQL Bin Log – Results

Results
• Replication-like connection
• Indexes are always up-to-date

Things to keep in mind
• Relatively complex implementation

[Diagram: Magento writes to MySQL; the bin log feeds the Solr shards (A–I) directly.]
32. Key Points to Remember
33. Key Points to Remember
Solr helps businesses scale

• Solr’s search capabilities provide a better site experience than MySQL LIKE or full-text search
• Solr is more than a search platform – it is key to scalability and growth
• Solr’s Data Import Handler keeps Solr performing well as your catalog grows
• Solr replication helps accommodate growing traffic
• Solr shards keep indexing times and search response times low for very large catalogs
• Listening to the MySQL bin log can help facilitate a continuously updating catalog
34. References

Scaling Solr
• Solr Wiki – http://wiki.apache.org/solr/
• Type-Ahead – http://wiki.apache.org/solr/Suggester
• Data Import Handler (DIH) – http://wiki.apache.org/solr/DataImportHandler
• Replication – http://wiki.apache.org/solr/SolrReplication
• Sharding – http://wiki.apache.org/solr/SolrCloud
• Distributed Search – http://wiki.apache.org/solr/DistributedSearch

MySQL Replication Listening
• Change Data Capture – http://www.slideshare.net/mkindahl/binary-log-api-presentation-oscon-2011
• Replication Listener (C) – https://launchpad.net/mysql-replication-listener
• Open-Replicator (Java) – http://code.google.com/p/open-replicator/
35. Q&A
36. The presenters
Magento Expert Consulting Group

Udi Shamay
Head, Expert Consulting Group
udi@ebay.com

Steve Kukla
Business Solution Architect, Expert Consulting Group
skukla@ebay.com

Kirill Morozov
Application Architect, Expert Consulting Group
kmorozov@ebay.com