Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...Caserta
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using Solr sponsored by O'Reilly Media!
Caserta Concepts shared one of their innovative DW projects using Solr. See how open source search technology can serve high performance analytic use cases. Presentation and solution walk-through given by Caserta Concepts' Joe Caserta and Elliott Cordo.
For more information, visit www.casertaconcepts.com
LucidWorks SiLK is an open source stack that combines Lucene/Solr with best in class open source data ingestion and analytics tools such as Flume, LogStash and Kibana. This webinar will explore the features of SiLK, and provide attendees with valuable information on how they can benefit from the following:
- A powerful UI to analyze time series data stored in Lucene/Solr
- Creating and sharing visualizations, dashboards and reports
- Discovery and analysis of data coming from servers, applications, devices and more
- Exploration of click, geospatial and social data in ways previously unimaginable
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...Caserta
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using Solr sponsored by O'Reilly Media!
Caserta Concepts shared one of their innovative DW projects using Solr. See how open source search technology can serve high performance analytic use cases. Presentation and solution walk-through given by Caserta Concepts' Joe Caserta and Elliott Cordo.
For more information, visit www.casertaconcepts.com
LucidWorks SiLK is an open source stack that combines Lucene/Solr with best in class open source data ingestion and analytics tools such as Flume, LogStash and Kibana. This webinar will explore the features of SiLK, and provide attendees with valuable information on how they can benefit from the following:
- A powerful UI to analyze time series data stored in Lucene/Solr
- Creating and sharing visualizations, dashboards and reports
- Discovery and analysis of data coming from servers, applications, devices and more
- Exploration of click, geospatial and social data in ways previously unimaginable
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...Lucidworks
Running SolrCloud in Public Cloud is the future. This presentation and the code that will be contributed back to the community will allow such clusters to be highly efficient, scalable and elastic. Attendees will understand the challenges and potential of sharing index data between servers.
Speakers: Ilan Ginzburg & Yonik Seeley, Salesforce
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Databricks
The Semantic Engine is a custom search engine deployable on top of large, non-native language corpora that goes beyond keyword search and does NOT require translation. The large, on-the-fly calculations essential to making this an effective search engine necessitated development on a distributed platform capable of processing large volumes of unstructured data.
Hear how the low barrier to entry provided by Apache Spark allowed the Novetta Solutions team to focus on the hard analytical challenges presented by their data, without having to spend much time grappling with the inherent difficulties normally associated with distributed computing.
Building a Large Scale SEO/SEM Application with Apache SolrRahul Jain
Slides from my talk on "Building a Large Scale SEO/SEM Application with Apache Solr" in Lucene/Solr Revolution 2014 where I talk how we handle Indexing/Search of 40 billion records (documents)/month in Apache Solr with 4.6 TB compressed index data.
Abstract: We are working on building a SEO/SEM application where an end user search for a "keyword" or a "domain" and gets all the insights about these including Search engine ranking, CPC/CPM, search volume, No. of Ads, competitors details etc. in a couple of seconds. To have this intelligence, we get huge web data from various sources and after intensive processing it is 40 billion records/month in MySQL database with 4.6 TB compressed index data in Apache Solr.
Due to large volume, we faced several challenges while improving indexing performance, search latency and scaling the overall system. In this session, I will talk about our several design approaches to import data faster from MySQL, tricks & techniques to improve the indexing performance, Distributed Search, DocValues(life saver), Redis and the overall system architecture.
Semi-Supervised Learning In An Adversarial EnvironmentDataWorks Summit
Unlike a typical e-commerce marketplace, Uber’s marketplace is real-time, therefore stopping fraud has to happen in real-time too. In this talk, we will dive into Account Takeover (ATO) attacks and how we built a near real-time semi-supervised learning system to keep your Uber accounts safe. ATO attacks evolve very fast and may last only for a short time period. Thus traditional way of training a model once a week/day doesn’t really help us to defend against new attack vectors. This lead us to develop a semi-supervised learning system that is built on top of Apache Spark and uses clustering techniques and feedback signals from our ATO challenges to detect and stop new attack vectors. We will discuss our results from online clustering using streaming k-means and also from batch clustering using more complex clustering algorithm, DBSCAN. We will also share some lessons on feature selection and hyperparameter tuning for clustering algorithms which plays a crucial part in performance.
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin Databricks
Machine learning has its challenges, and understanding the algorithms is not always easy. In this session, you’ll discover methods to make these challenges less daunting.
Intended for software engineers who need to understand the requirements and constraints of data scientists, and data scientists who need to implement or help implement production systems, the session will begin with a quick introduction to data quality and a level-set on common vocabulary. You’ll then explore the formats that are required by Spark ML to run its algorithms, and see how to automate the build through user-defined functions and other techniques. Automation will make reproducibility easy, minimize errors and increase the efficiency of data scientists.
Key takeaways will include:
– How to build the required tool set in Java
– Understanding the formats required by Spark ML (a new vocabulary)
– Learning fundamentals about data quality and how to make sure the data is usable
Presented at Indian Institute of Information Technology (IIIT) Allahabad on 21 Oct 2009 to students about the Apache Software Foundation, Lucene, Solr, Hadoop and on the benefits of contributing to open source projects. The target audience was sophomore, junior and senior B.Tech students.
Building a real time big data analytics platform with solrTrey Grainger
Having “big data” is great, but turning that data into actionable intelligence is where the real value lies. This talk will demonstrate how you can use Solr to build a highly scalable data analytics engine to enable customers to engage in lightning fast, real-time knowledge discovery.
At CareerBuilder, we utilize these techniques to report the supply and demand of the labor force, compensation trends, customer performance metrics, and many live internal platform analytics. You will walk away from this talk with an advanced understanding of faceting, including pivot-faceting, geo/radius faceting, time-series faceting, function faceting, and multi-select faceting. You’ll also get a sneak peak at some new faceting capabilities just wrapping up development including distributed pivot facets and percentile/stats faceting, which will be open-sourced.
The presentation will be a technical tutorial, along with real-world use-cases and data visualizations. After this talk, you'll never see Solr as just a text search engine again.
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...Lucidworks
Running SolrCloud in Public Cloud is the future. This presentation and the code that will be contributed back to the community will allow such clusters to be highly efficient, scalable and elastic. Attendees will understand the challenges and potential of sharing index data between servers.
Speakers: Ilan Ginzburg & Yonik Seeley, Salesforce
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Databricks
The Semantic Engine is a custom search engine deployable on top of large, non-native language corpora that goes beyond keyword search and does NOT require translation. The large, on-the-fly calculations essential to making this an effective search engine necessitated development on a distributed platform capable of processing large volumes of unstructured data.
Hear how the low barrier to entry provided by Apache Spark allowed the Novetta Solutions team to focus on the hard analytical challenges presented by their data, without having to spend much time grappling with the inherent difficulties normally associated with distributed computing.
Building a Large Scale SEO/SEM Application with Apache SolrRahul Jain
Slides from my talk on "Building a Large Scale SEO/SEM Application with Apache Solr" in Lucene/Solr Revolution 2014 where I talk how we handle Indexing/Search of 40 billion records (documents)/month in Apache Solr with 4.6 TB compressed index data.
Abstract: We are working on building a SEO/SEM application where an end user search for a "keyword" or a "domain" and gets all the insights about these including Search engine ranking, CPC/CPM, search volume, No. of Ads, competitors details etc. in a couple of seconds. To have this intelligence, we get huge web data from various sources and after intensive processing it is 40 billion records/month in MySQL database with 4.6 TB compressed index data in Apache Solr.
Due to large volume, we faced several challenges while improving indexing performance, search latency and scaling the overall system. In this session, I will talk about our several design approaches to import data faster from MySQL, tricks & techniques to improve the indexing performance, Distributed Search, DocValues(life saver), Redis and the overall system architecture.
Semi-Supervised Learning In An Adversarial EnvironmentDataWorks Summit
Unlike a typical e-commerce marketplace, Uber’s marketplace is real-time, therefore stopping fraud has to happen in real-time too. In this talk, we will dive into Account Takeover (ATO) attacks and how we built a near real-time semi-supervised learning system to keep your Uber accounts safe. ATO attacks evolve very fast and may last only for a short time period. Thus traditional way of training a model once a week/day doesn’t really help us to defend against new attack vectors. This lead us to develop a semi-supervised learning system that is built on top of Apache Spark and uses clustering techniques and feedback signals from our ATO challenges to detect and stop new attack vectors. We will discuss our results from online clustering using streaming k-means and also from batch clustering using more complex clustering algorithm, DBSCAN. We will also share some lessons on feature selection and hyperparameter tuning for clustering algorithms which plays a crucial part in performance.
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin Databricks
Machine learning has its challenges, and understanding the algorithms is not always easy. In this session, you’ll discover methods to make these challenges less daunting.
Intended for software engineers who need to understand the requirements and constraints of data scientists, and data scientists who need to implement or help implement production systems, the session will begin with a quick introduction to data quality and a level-set on common vocabulary. You’ll then explore the formats that are required by Spark ML to run its algorithms, and see how to automate the build through user-defined functions and other techniques. Automation will make reproducibility easy, minimize errors and increase the efficiency of data scientists.
Key takeaways will include:
– How to build the required tool set in Java
– Understanding the formats required by Spark ML (a new vocabulary)
– Learning fundamentals about data quality and how to make sure the data is usable
Presented at Indian Institute of Information Technology (IIIT) Allahabad on 21 Oct 2009 to students about the Apache Software Foundation, Lucene, Solr, Hadoop and on the benefits of contributing to open source projects. The target audience was sophomore, junior and senior B.Tech students.
Building a real time big data analytics platform with solrTrey Grainger
Having “big data” is great, but turning that data into actionable intelligence is where the real value lies. This talk will demonstrate how you can use Solr to build a highly scalable data analytics engine to enable customers to engage in lightning fast, real-time knowledge discovery.
At CareerBuilder, we utilize these techniques to report the supply and demand of the labor force, compensation trends, customer performance metrics, and many live internal platform analytics. You will walk away from this talk with an advanced understanding of faceting, including pivot-faceting, geo/radius faceting, time-series faceting, function faceting, and multi-select faceting. You’ll also get a sneak peak at some new faceting capabilities just wrapping up development including distributed pivot facets and percentile/stats faceting, which will be open-sourced.
The presentation will be a technical tutorial, along with real-world use-cases and data visualizations. After this talk, you'll never see Solr as just a text search engine again.
Webinar: Simpler Semantic Search with SolrLucidworks
Hear from Lucidworks Senior Solutions Consultant Ted Sullivan about how you can leverage Apache Solr and Lucidworks Fusion to improve semantic awareness of your search applications.
SF Solr Meetup - Interactively Search and Visualize Your Big Datagethue
Open up your user base to the data! Contrary to programming and SQL, almost everybody knows how to search. This talk describes through an interactive demo based on open source Hue how users can graphically search their data in Hadoop. The underlying technical details of the application and its interaction with Apache Solr will be clarified.
The session will detail how to get started with data indexing in just a few clicks as well as explore several data analysis scenarios with the latest Solr Analytics Facets and Spark Streaming. Through a Web browser, attendees will be shown how to explore and visualize data for quick answers. The search dashboard in Hue, with its draggable charts and dynamic interface, lets any non-technical user look for documents or patterns.
Attendees of this talk will learn how to get started with interactive search visualization in their Solr cluster.
Data Discovery, Visualization, and Apache HadoopHortonworks
In this webinar, we will discuss how Apache Hadoop works with your current infrastructure and how you can use data discovery and visualization tools to gain deeper insights from new data types stored in Hadoop and your existing data center investments.
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks
Apache NiFi, Storm and Kafka augment each other in modern enterprise architectures. NiFi provides a coding free solution to get many different formats and protocols in and out of Kafka and compliments Kafka with full audit trails and interactive command and control. Storm compliments NiFi with the capability to handle complex event processing.
Join us to learn how Apache NiFi, Storm and Kafka can augment each other for creating a new dataplane connecting multiple systems within your enterprise with ease, speed and increased productivity.
https://www.brighttalk.com/webcast/9573/224063
Over the last few years, Airbnb’s frontend architecture has evolved to keep pace with the rapid advancement happening the JavaScript world. Starting as a humble Rails 2 + Prototype.js app in 2008, the frontend stack powering airbnb.com has gone through a few revisions, including a push towards single-page app architecture with Backbone.js and Handlebars.js, an adventure into isomorphic JavaScript with Rendr (our library for using Node.js to server-render Backbone SPAs), and most recently, a move toward React.js and a re-envisioning of our build pipeline to take advantage of CommonJS, ES6, and a Node.js-based transform system. Spike Brehm, software engineer on the @AirbnbNerds team, will walk through how we approached and executed on these changes. Plus, get excited to see a preview of our new approach to isomorphic JavaScript, allowing us to server-render React components from our Rails app.
Spike Brehm is a software engineer at Airbnb who specializes in building rich web experiences. As a JavaScript nerd, he has spent the last few years shipping web apps and prototyping Airbnb’s front-end stack, experimenting with “isomorphic JavaScript” — apps that have the flexibility to run on both the client and sever using the same codebase.
Correlate Log Data with Business Metrics Like a JediTrevor Parsons
The Logentries and Hosted Graphite integration allows you to connect two of your favorite Ops tools to easily extract important data from your log files, visualize them as metrics, and share them in Hosted Graphite dashboards.
• Integrate your systems to extract the metrics you need, from both your applications and log data.
• Set-up log metric dashboards based on common use cases (e.g. error tracking, performance, app usage).
• Get off the "complexity elevator" of hosting your own in-house logging or graphite solutions.
• Delight your team and organization with valuable metrics and performance insights.
Deploying any software can be a challenge if you don't understand how resources are used or how to plan for the capacity of your systems. Whether you need to deploy or grow a single MongoDB instance, replica set, or tens of sharded clusters then you probably share the same challenges in trying to size that deployment.
This webinar will cover what resources MongoDB uses, and how to plan for their use in your deployment. Topics covered will include understanding how to model and plan capacity needs for new and growing deployments. The goal of this webinar will be to provide you with the tools needed to be successful in managing your MongoDB capacity planning tasks.
This session defines what time series data is (and isn’t), how the problem domain time series differs from more traditional data workloads like full-text search and examines how InfluxData is differentiated from other proposed solutions. This session also includes a review of the most common use cases and a brief demo of InfluxDB.
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDogRedis Labs
Think you have big data? What about high availability
requirements? At DataDog we process billions of data points every day including metrics and events, as we help the world
monitor the their applications and infrastructure. Being the world’s monitoring system is a big responsibility, and thanks to
Redis we are up to the task. Join us as we discuss how the DataDog team monitors and scales Redis to power our SaaS based monitoring offering. We will discuss our usage and deployment patterns, as well as dive into monitoring best practices for production Redis workloads
Measuring CDN performance and why you're doing it wrongFastly
Integrating content delivery networks into your application infrastructure can offer many benefits, including major performance improvements for your applications. So understanding how CDNs perform — especially for your specific use cases — is vital. However, testing for measurement is complicated and nuanced, and results in metric overload and confusion. It's becoming increasingly important to understand measurement techniques, what they're telling you, and how to apply them to your actual content.
In this session, we'll examine the challenges around measuring CDN performance and focus on the different methods for measurement. We'll discuss what to measure, important metrics to focus on, and different ways that numbers may mislead you.
More specifically, we'll cover:
Different techniques for measuring CDN performance
Differentiating between network footprint and object delivery performance
Choosing the right content to test
Core metrics to focus on and how each impacts real traffic
Understanding cache hit ratio, why it can be misleading, and how to measure for it
Webinar: Best Practices for Getting Started with MongoDBMongoDB
MongoDB adoption continues to grow at a record pace due to the significant enhancements in developer productivity and scalability that the database provides. Occasionally, however, organizations new to the technology make mistakes that limit their ability to leverage the significant advantages MongoDB provides. This webinar will discuss some of the common mistakes made by users when they first start working with MongoDB, how to identify when you've made those mistakes, and how to resolve them.
This presentation will describe how to go beyond a "Hello world" stream application and build a real-time data-driven product. We will present architectural patterns, go through tradeoffs and considerations when deciding on technology and implementation strategy, and describe how to put the pieces together. We will also cover necessary practical pieces for building real products: testing streaming applications, and how to evolve products over time.
Presented at highloadstrategy.com 2016 by Øyvind Løkling (Schibsted Products & Technology), joint work with Lars Albertsson (independent, www.mapflat.com).
Beyond DevOps: How Netflix Bridges the Gap?C4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1mv6Kpr.
Josh Evans uses the Netflix Operations Engineering as a case study to explore the challenges faced by centralized engineering teams and approaches to addressing those challenges. Filmed at qconsf.com.
Josh Evans is Director of Operations Engineering at Netflix, with experience in e-commerce, playback control services, infrastructure, tools, testing, and operations.
This presentation was given in one of the DSATL Mettups in March 2018 in partnership with Southern Data Science Conference 2018 (www.southerndatascience.com)
Effective Microservices In a Data-centric WorldRandy Shoup
From a talk at GOTOChicago 2017, these slides discuss the speaker's experiences at Stitch Fix with
* Organizational, Process, and Cultural prerequisites for being successful with Microservices: small teams, TDD / CD, DevOps
* How to handle shared data when your data is split among microservices
* How to handle "joins" across microservices
* How to simulate "transactions" across microservices
Slides link: https://gotochgo.com/3/sessions/79/slides
Video link: https://gotochgo.com/3/sessions/79/video
Great contribution from our partner Splitpoints solutions on how to collect and format Performance Vision data into Elastic Search / Kibana.
Potential applications are:
- NPM or APM custom dashboards
- Dashboards mixing Performance Vision data with other ITSM tools / sources
- Alerting and baselining.
State of Florida Neo4j Graph Briefing - Cyber IAMNeo4j
Identity is based on relationships. Graph databases ensure those connections are current, scoped to actual requirements, and secure. David Rosenblum will discuss how customers from large financial institutions to smart home security systems are IAM enabled with Neo4j.
Similar to Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadhav, Bloomberg L.P. (20)
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
With ecommerce experiencing explosive growth, it seems intuitive that the B2B segment of that ecosystem is mirroring the same trajectory. That said, B2B has very different needs when it comes to transacting with the same style of experiences that we see in B2C. For instance, B2B ecommerce is about precision findability, whereas B2C customers can convert at higher rates when they’re just browsing online. In order for the B2B buying experience to be successful, search needs to be tuned to meet the unique needs of the segment.
In this webinar with Forrester senior analyst Joe Cicman, you’ll learn:
-Which verticals in B2B will drive the most growth, and how machine-learning powered personalization tactics can be deployed to support those specific verticals
-Why an omnichannel selling approach must be deployed in order to see success in B2B
-How deploying content search capabilities will support a longer sales cycle at scale
-What the next steps are to support a robust B2B commerce strategy supported by new technology
Speakers
Joe Cicman, Senior Analyst, Forrester
Jenny Gomez, VP of Marketing, Lucidworks
Customer loyalty starts with quickly responding to your customer’s needs. When it comes to resolving open support cases, time is of the essence. Time spent searching for answers adds up and creates inefficiencies in resolving cases at scale. Relevant answers need to be a few clicks away and easily accessible for agents directly from their service console.
We will explore how Lucidworks’ Agent Insights application automatically connects agents with the correct answers and resources. You’ll learn how to:
-Configure a proactive widget in an agent’s case view page to access resources across third-party systems (such as Sharepoint, Confluence, JIRA, Zendesk, and ServiceNow).
-Easily set up query pipelines to autonomously route assets and resources that are relevant to the case-at-hand—directly to the right agent.
-Identify subject matter experts within your support data and access tribal knowledge with lightning-fast speed.
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
Lunch and Learn during Retail TouchPoints #RIC21 virtual event.
***
Crate & Barrel’s previous search solution couldn’t provide its shoppers with an online search and browse experience consistent with the customer-centric Crate & Barrel brand. Meanwhile, Crate & Barrel merchandisers spent the bulk of their time manually creating and maintaining search rules. The search experience impacted customer retention, loyalty, and revenue growth.
Join this lunch & learn for an interactive chat on how Crate & Barrel partnered with Lucidworks to:
-Improve search and browse by modernizing the technology stack with ML-based personalization and merchandising solutions
-Enhance the experience for both shoppers and merchandisers
-Explore signals to transform the omnichannel shopping experience
Questions? Visit https://lucidworks.com/contact/
Learn how to guide customers to relevant products using eCommerce search, hyper-personalisation, and recommendations in our ‘Best-In-Class Retail Product Discovery’ webinar.
Nowadays, shoppers want their online experience to be engaging, inspirational and fulfilling. They want to find what they’re looking for quickly and easily. If the sought after item isn’t available, they want the next best product or content surfaced to them. They want a website to understand their goals as though they were talking to a sales assistant in person, in-store.
In this webinar, we explore IMRG industry data insights and a best-in-class example of retail product discovery. You’ll learn:
- How AI can drive increased revenue through hyper-personalised experiences
- How user intent can be easily understood and results displayed immediately
- How merchandisers can be empowered to curate results and product placement – all without having to rely on IT.
Presented by:
Dave Hawkins, Principal Sales Engineer - Lucidworks
Matthew Walsh, Director of Data & Retail - IMRG
Connected Experiences Are Personalized ExperiencesLucidworks
Many companies claim personalization and omnichannel capabilities are top priorities. Few are able to deliver on those experiences.
For a recent Lucidworks-commissioned study, Forrester Consulting surveyed 350+ global business decision-makers to see what gets in the way of achieving these goals. They discovered that inefficient technology, lack of behavioral insights, and failure to tie initiatives to enterprise-wide goals are some of the most frequent blockers to personalization success.
Join guest speaker, Forrester VP and Principal Analyst, Brendan Witcher, and Lucidworks CEO, Will Hayes, to hear the results of the Forrester Consulting study, how to avoid “digital blindness,” and how to apply VoC data in real-time to delight customers with personalized experiences connected across every touchpoint.
In this webinar, you’ll learn:
- Why companies who utilize real-time customer signals report more effective personalization
- How to connect employees and customers in a shared experience through search and browse
- How Lucidworks clients Lenovo, Morgan Stanley and Red Hat fast-tracked improvements in conversion, engagement and customer satisfaction
Featuring
- Will Hayes, CEO, Lucidworks
- Brendan Witcher, VP, Principal Analyst, Forrester
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
Intelligent Policing. Leveraging Data to more effectively Serve Communities.
Policing in the next decade is anticipated to be very different from historical methods. More data driven, more focused on the intricacies of communities they serve and more open and collaborative to make informed recommendations a reality. Whether its social populations, NIBRS or organization improvement that’s the driver, the IT requirement is largely the same. Provide 360 access to large volumes of siloed data to gain a full 360 understanding of existing connections and patterns for improved insight and recommendation.
Join us for a round table discussion of how the Toronto Police Service is better serving their community through deploying a unified intelligent data platform.
Data innovation improves officers' engagement with existing data and streamlines investigation workflows by enhancing collaboration. This improved visibility into existing police data allows for a more intelligent and responsive police force.
In this webinar, we'll cover:
-The technology needs of an intelligent police force.
-How a Global Search improves an officer's interaction with existing data.
Featuring:
-Simon Taylor, VP, Worldwide Channels & Alliances, Lucidworks
-Michael Cizmar, Managing Director, MC+A
-Ian Williams, Manager of Analytics & Innovation, Toronto Police Service
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
Policing in the next decade is anticipated to be very different from historical methods. More data driven, more focused on the intricacies of communities they serve and more open and collaborative to make informed recommendations a reality. Whether its social populations, NIBRS or organization improvement that’s the driver, the IT requirement is largely the same. Provide 360 access to large volumes of siloed data to gain a full 360 understanding of existing connections and patterns for improved insight and recommendation.
Join us for a round table discussion of how the Toronto Police Service is better serving their community through deploying a unified intelligent data platform.
Data innovation improves officers' engagement with existing data and streamlines investigation workflows by enhancing collaboration. This improved visibility into existing police data allows for a more intelligent and responsive police force.
In this webinar, we'll cover:
The technology needs of an intelligent police force.
How a Global Search improves an officer's interaction with existing data.
Featuring
-Simon Taylor, VP, Worldwide Channels & Alliances, Lucidworks
-Michael Cizmar, Managing Director, MC+A
-Ian Williams, Manager of Analytics & Innovation, Toronto Police Service
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
Wish your conversion rates were higher? Can’t figure out how to efficiently and effectively serve all the visitors on your site? Embarrassed by the quality of your product discovery experience? The bar is high and the influx of online shopping over recent months has reminded us that the opportunities are real. We’re all deep in holiday prep, but let’s take a few minutes to think about January 2021 and beyond. How can we position ourselves for success with our customers and against our competition?
Grab your lunch and let’s dive into three strategies that need to be part of your 2021 roadmap. You don’t need an army to get there. But you do need to take action and capitalize on the shoppers abandoning the product discovery journey on your site.
In this session, attendees will find out how to:
-Take control of merchandising at scale;
-Implement hands-free search relevancy; and
-Address personalization challenges.
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
For a personalized search experience, search curation requires robust text interpretation, data enrichment, relevancy tuning and recommendations. In order to achieve this, language and entity identification are crucial.
For teams working on search applications, advanced language packages allow them to achieve greater recall without sacrificing precision.
Join us for a guided tour of our new Advanced Linguistics packages, available in Fusion, thanks to the technology partnership between Lucidworks and Basistech.
We’ll explore the application of language identification and entity extraction in the context of search, along with practical examples of personalizing search and enhancing entity extraction.
In this webinar, we’ll cover:
-How Fusion uses the Rosette Basic Linguistics and Entity Extraction packages
-Tips for improving language identification and treatment as well as data enrichment for personalization
-Speech2 demo modeling Active Recommendation
-Use Rosette’s packages with Fusion Pipelines to build custom entities for specific domain use cases
Featuring:
-Radu Miclaus, Director of Product, AI and Cloud, Lucidworks, Lucidworks
-Robert Lucarini, Senior Software Engineer, Lucidworks
-Nick Belanger, Solutions Engineer, Basis Technology
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
Before COVID-19, almost 80% of the US workforce worked service in jobs that involve in-person interaction with strangers. Now, leaders of service organizations must reshape their offerings during the pandemic and prepare for whatever the new normal turns out to be. Our three panelists will share ideas for adapting their service businesses, now that closer-than-six-feet isn’t an option.
Join Lucidworks as we talk shop with 3 service business leaders, covering:
-Common impacts of the pandemic on service businesses (and what to do about them),
-How service teams can maintain a human touch across virtual channels, and
-Plans for the future, before and after the pandemic subsides.
Featuring
-Sara Nathan, President & CEO, AMIGOS
-Anthony Carruesco, Founder, AC Fly Fishing
-sara bradley, chef and proprietor, freight house
-Justin Sears, VP Product Marketing, Lucidworks
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
The COVID-19 pandemic has forced companies to support far more customers and employees through digital channels than ever before. Many are turning to chatbots to help meet increasing demand, but traditional rules-based approaches can’t keep up. Our new Smart Answers add-on to Lucidworks Fusion makes existing chatbots and virtual assistants more intelligent and more valuable to the people you serve.
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
Watch our on-demand webinar showcasing Smart Answers on Lucidworks Fusion. This technology makes existing chatbots and virtual assistants more intelligent and more valuable to the people you serve.
In this webinar, we’ll cover off:
-How search and deep learning extend conversational frameworks for improved experiences
-How Smart Answers improves customer care, call deflection, and employee self-service
-A live demo of Smart Answers for multi-channel self-service support
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
In the current climate, it’s now more important than ever to digitally enable your workforce and customers.
Hear from Simon Taylor, VP Global Partners & Alliances, Lucidworks and Matt Aslett, Research Vice President, 451 Research to get the inside scoop on how industry leaders in Europe are developing and executing their digital transformation strategies.
In this webinar, we’ll discuss:
The top challenges and aspirations European business and technology leaders are solving using AI and search technology
Which search and AI use cases are making the biggest impact in industries such as finance, healthcare, retail and energy in Europe
What technology buyers should look for when evaluating AI and search solutions
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
In this webinar with 451 Research, you'll understand how retailers are using AI to predict customer intent and learn which key performance metrics are used by more than 120 online retailers in Lucidworks’ 2019 Retail Benchmark Survey.
In this webinar, you’ll learn:
● What trends and opportunities are facing the ecommerce industry in 2020
● Why search is the universal path to understanding customer intent
● How large online retailers apply AI to maximize the effectiveness of their personalization efforts
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
Nordstrom Rack | Hautelook curates and serves customers a wide selection of on-trend apparel, accessories, and shoes at an everyday savings of up to 75 percent off regular prices. With over a million visitors shopping across different platforms every day, and a realization that customers have become accustomed to robust and personalized search interactions, Nordstrom Rack | Hautelook launched an initiative over a year ago to provide data science-driven digital experiences to their customers.
In this session, we’ll discuss Nordstrom Rack | Hautelook’s journey of operationalizing a hefty strategy, optimizing a fickle infrastructure, and rallying troops around a single vision of building an expansible machine-learning driven product discovery engine.
The audience will learn about:
-The key technical challenges and outcomes that come with onboarding a solution
-The lessons learned of creating and executing operational design
-The use of Lucidworks Fusion to plug custom data science models into search and browse applications to understand user intent and deliver personalized experiences
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
Knowledge graphs and machine learning are on the rise as enterprises hunt for more effective ways to connect the dots between the data and the business world. With newer technologies, the digital workplace can dramatically improve employee engagement, data-driven decisions, and actions that serve tangible business objectives.
In this webinar, you will learn
-- Introduction to knowledge graphs and where they fit in the ML landscape
-- How breakthroughs in search affect your business
-- The key features to consider when choosing a data discovery platform
-- Best practices for adopting AI-powered search, with real-world examples
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
Les Buildpacks existent depuis plus de 10 ans ! D’abord, ils étaient utilisés pour détecter et construire une application avant de la déployer sur certains PaaS. Ensuite, nous avons pu créer des images Docker (OCI) avec leur dernière génération, les Cloud Native Buildpacks (CNCF en incubation). Sont-ils une bonne alternative au Dockerfile ? Que sont les buildpacks Paketo ? Quelles communautés les soutiennent et comment ?
Venez le découvrir lors de cette session ignite
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntekBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Accelerate Enterprise Software Engineering with PlatformlessWSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfJay Das
With the advent of artificial intelligence or AI tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT, and Bard organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Navigating the Metaverse: A Journey into Virtual Evolution"Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms."
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
3. Who am I ?
• Big Search and Distributed database specialist
• Built a Search as a Service platform
• Lead Search Architect @ Bloomberg Vault
• Credit Derivatives Analytics Engineer @ Bloomberg
• Masters' @ Courant Institute of Mathematical Sciences, New York University
• Passionate about Search, Scuba Diving , Motorcycles and German Shepherds
5. Agenda
• Search
at
Bloomberg
•
Goals
and
Objec5ves
•
A
li9le
background
•
Factors
affec5ng
indexing
•
Our
tests
and
benchmarks
•
Design
for
a
be9er
NRT
indexer
•
Future
work
•
Q/A
11. Indexing Workflow
We were talking about IBM during the fishing trip
Down
Cas)ng
[We] [were] [talking] [about] [IBM] [during] [the] [fishing] [trip]
Creates
tokens
by
lowercasing
all
le4ers
and
dropping
non-‐le4ers.
[we] [were] [talking] [about] [ibm] [during] [the] [fishing] [trip]
[we] [were] [talking] [about] [ibm] [during] [the] [fishing] [trip]
[talk] [fish]
[talking] [about] [ibm] [fishing] [trip]
[talk] [big] [blue] [fish] [journey]
[chat]
Consider
the
sentence:
[we] [were] [talking] [about] [ibm] [during] [the] [fishing] [trip]
[talk] [fish]
Tokeniza)on
A
tokenizer
splits
the
stream
of
characters
into
a
series
of
tokens.
Stemming
Lemma)za)on
Stemming
algorithms
reduce
words
"fishing",
"fished",
"fish",
and
"fisher"
to
the
root
word,
"fish"
Lemma*za*on
expands
words
to
their
inflected
forms
(ie
fishing
-‐>
fished,
fishes,
fish
but
not
fisher)
Stop
Word
Removal
Remove
common
stop
words
“and”,”or”
etc.
which
introduce
noise
in
the
search
process
Synonym
Expansion
Mapping
of
words
based
upon
thesaurus
(synonyms,
acronyms,
hypernyms,
business
rules,
etc..)
For
example
talk
-‐>
chat,
IBM
-‐>
“big
blue”,
trip
-‐>
journey
12. Designing the Search Index
Designing
a
good
Search
Applica)on
also
involves
many
aspects
of
user
interac)on
that
directly
influence
indexing
design
•
Data
Type
and
Data
Distribu)on
•
Server
side
parameters
•
Networking
•
Client
side
parameters
•
Query
pa4erns
14. Data and Distribution of Tokens
Common types of data that we index in a search index
•
Textual
data
(
human
generated
)
e.g.
messages,
news,
blogs
•
Textual
data
(
machine
generated
)
e.g.
logs
,
5ckets
•
Numerical
data
•
Geospa5al
data
How does this affect search index designs ?
•
Query
speed
and
indexing
speed
depend
on
the
size
of
an
index
•
Size
is
dependent
on
•
Number
of
documents
in
the
index
•
Average
size
of
each
document
•
Distribu5on
of
tokens
•
Index
features
eg.
Face5ng,
Highligh5ng
15. Server-side Factors
• Ratio of CPU’s to the number of solr cores running
•
2
Solr
indices
per
CPU
or
a
Thread
• Disk space
•
Disk
space
for
Solr
index
*
2
(
head
room
for
merge
cycles
)
• Memory
•
JVM
heap
•
Off
Heap
•
DocValues
16. Networking
Cluster design consideration
•
Should
a
cluster
span
data
centers
?
•
Latency
between
datacenters
•
Reliability
and
availability
SLA’s
•
Where
does
your
Zookeeper
ensemble
live
?
•
How
many
elec5on
members
•
Consider
observers
to
scale
zookeeper
•
Dynamically
promote
an
observer
to
elec5on
member
Manage concurrent connections on the server
Monitor network latencies for QoS guarantees
17. Client-side Factors
• Managing connections and reusing connections
• Which format to use for indexing data
•
javabin
•
csv
•
json
•
xml
• How many simultaneous threads to use
18. Experiments with NRT Indexing
It’s not always efficient to send a single document to Solr for indexing
How do you decide how many documents to send ?
Collector : A buffer that collects Solr update documents
•
Time
Triggers
(
T
)
•
Time
based
collector
on
the
client-‐side
to
batch
document
payloads
to
Solr
•
Document
Size
Triggers
(
S
)
•
Document
size
based
collector
on
the
client-‐side
to
batch
document
payloads
to
Solr
•
Document
Number
Triggers
(
N
)
•
Number
of
documents
based
collector
on
the
client-‐side
to
batch
document
payloads
to
Solr
The
collectors
are
all
simultaneously
used
in
order
of
priority.
The
lower
priority
collectors
act
as
a
cut-‐off
backups
to
safe
guard
from
overflows.
20. Benchmarking Setup
• Client application sending data to 4-way replicated SolrCloud
• 5 node Zookeeper ensemble
• All tests done with a similar dataset ( machine generated text )
• We synthesize a high throughput ingest stream, which serves as our input
• Soft commits set at 1sec
21. Benchmarking : Time Limit Tests
docs/sec
Time Triggers: Collection window in ms
22. Benchmarking : Document Limit Tests
docs/sec
Document Number Triggers: Collection window in number of documents
24. Observations
• On an average we were able to observe 5x-7x increase in ingestion throughput
• Optimization parameters are dependent constantly changing factors
• The tuning variables need to be constantly adjusted for best performance
• How to use this now
26. PID Controller
Proportional term ( P ) – present
Output proportional to current error value
Integral term ( I ) - past
Sum of instantaneous error over time,
and give accumulated offset that should
have been corrected previously
Derivative term ( D ) - future
Calculated by determining the slope of previous
error over time times the rate of change
27. PID implementation in the indexer
Solr
Cloud
Solr
response
Sampling
thread
Process
variable
Docs/sec
Client
indexer
process
Pick
one
of
the
Triggers
Time
(T
)
Control
Variable
PID
controller
implementa5on
Indexing
threads
29. Future work
• Perfect the PID indexer
• Add it to the YCSB benchmarking framework
• Add other server side parameters on the PID indexer
• Use the PID indexer along with the YCSB framework to size hardware
30. Never Stop Exploring:
Pushing the Limits of Solr
Anirudha Jadhav , Bloomberg LP
QUESTIONS ?