Until 2014, gutefrage.net was drifting towards chaos, and that chaos had many faces:
* Bad answers were shown to users, hiding the good and helpful ones
* Spam was overlooked in the huge amount of generated content
* Tags did not always represent the intended topic of the question
* Long page load times caused users to abandon the site before it was fully rendered
These problems hurt the user experience, the manageability of our content, the image of our platform, and ultimately our revenue.
To tackle these problems, we decided to leverage the full power of our data. We put great effort into automatically rating answers so we can hide the really bad ones from every user, we alert our Community Management to spammers in real time, we have improved page load time tremendously, and we are currently testing prototypes for (semi-)automatically inferring the topic of a question.
Using several examples, I will show how we discovered the problems by making the data visible to everyone in the company, fixed them either with advanced machine learning techniques or by relying on the "collective brain" of our community, and improved the user experience step by step until the chaos was finally defeated.
A Hadoop User Group (HUG) Ireland talk on Data Science production environments and their online set up using #ExpertModels by Cronan McNamara, CEO @CremeGlobal
Thinking in Graphs - GraphQL problems and more - Maciej Rybaniec (23.06.2017) - Grand Parade Poland
The GraphQL specification is intentionally silent on a handful of important issues facing APIs such as dealing with the network, authorization, errors and other things. During my presentation I want to describe each of these problems and present sample solutions for them.
Presentation from Lunch&Learn prepared by Maciej Rybaniec, Senior Frontend Developer at Grand Parade.
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes - MongoDB
With so much talk of how Big Data is revolutionizing the world and how a data lake with Hadoop and/or Spark will solve all your data problems, it is hard to tell what is hype, reality, or somewhere in-between.
In working with dozens of enterprises in varying stages of their enterprise data management (EDM) strategy, MongoDB enterprise architect Matt Kalan sees the same challenges and misunderstandings arise again and again.
In this session, he will explain common challenges in data management, what capabilities are necessary, and what the future state of architecture looks like. MongoDB is uniquely capable of filling common gaps in the data lake strategy.
This session also includes a live Q&A portion during which you are encouraged to ask questions of our team.
Detecting Anomalous Behavior with Surveillance Analytics - Databricks
Surveillance feeds were essentially monitored manually until recent years. Video analytics as a technology has made great strides; it leverages video surveillance networks to derive searchable, actionable, and quantifiable intelligence from live or recorded video content.
Driven by artificial intelligence and deep learning, video intelligence solutions detect and extract objects in a video. These solutions identify target objects based on trained Deep Neural Networks and then classify each object to enable intelligent video analysis, including search & filtering, alerting, data aggregation and visualization.
In our session, we will:
* Discuss the current state of surveillance and popular Python libraries used in video analytics
* Elucidate various approaches deployed, using a myriad of pre-trained models, from MobileNet SSD to the state-of-the-art YOLO model
* Describe the many pre-processing techniques we have used, such as the generation of a time-averaged frame, erosion, dilation, and many others
With the basics covered, it's LIGHTS! CAMERA! ACTION! Let us show you how this works: we will present a live demo that explains the performance-computing trade-offs between different models and techniques, and their limitations.
What you can expect to take away from our session:
* A deeper understanding of advanced video analytics techniques
* How to utilize pre-trained models for video analytics solutions
* The hardware requirements, limitations, and challenges posed while devising a video analytics solution
* Lessons learnt from deployment in a real-life scenario
* The future direction and possibilities of the solution we have developed
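The time-averaged-frame pre-processing step listed above can be sketched in a few lines of NumPy: average the frames to estimate a static background, then flag pixels that deviate from it. This is a minimal illustration under assumptions of my own (grayscale frames, a fixed intensity threshold), not the presenters' pipeline, which uses trained deep networks on top of such pre-processing.

```python
import numpy as np

def detect_motion(frames, threshold=50.0):
    """Toy background subtraction: compare each frame against a
    time-averaged background frame and flag pixels that differ by
    more than `threshold` intensity levels."""
    stack = np.stack(frames).astype(np.float32)   # (T, H, W) grayscale
    background = stack.mean(axis=0)               # the time-averaged frame
    masks = np.abs(stack - background) > threshold
    return background, masks

# Synthetic 8x8 grayscale clip: a static scene plus one bright moving blob.
rng = np.random.default_rng(0)
scene = rng.integers(40, 60, size=(8, 8)).astype(np.float32)
frames = []
for t in range(5):
    f = scene.copy()
    f[t, t] = 255.0  # the "moving object", sliding along the diagonal
    frames.append(f)

background, masks = detect_motion(frames)
print(masks[0].sum())  # pixels flagged as moving in the first frame
```

In a real system the background would be a running average over recent frames, and the binary mask would typically be cleaned up with the erosion and dilation operations the session also covers.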
DataMass Summit - Machine Learning for Big Data in SQL Server - Łukasz Grala
A session showcasing Machine Learning Server (machine learning algorithms in R and Python) as well as the ability to work with JSON data in SQL Server, and to connect to data residing on HDFS, Hadoop, or Spark via PolyBase in SQL Server, so that this data can be used for analysis and prediction through models written in R or Python.
A session on sentiment analysis, and also on the machine learning algorithms in Microsoft's R libraries. The session was presented at the WhyR? conference in Warsaw.
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro... - Databricks
The Semantic Engine is a custom search engine deployable on top of large, non-native language corpora that goes beyond keyword search and does NOT require translation. The large, on-the-fly calculations essential to making this an effective search engine necessitated development on a distributed platform capable of processing large volumes of unstructured data.
Hear how the low barrier to entry provided by Apache Spark allowed the Novetta Solutions team to focus on the hard analytical challenges presented by their data, without having to spend much time grappling with the inherent difficulties normally associated with distributed computing.
Talk on Data Discovery and Metadata by Mark Grover from July 2019.
Goes into detail of the problem, build/buy/adopt analysis and Lyft's solution - Amundsen, along with thoughts on the future.
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da... - Databricks
PixieDust is a new open source library that helps data scientists and developers working in Jupyter Notebooks and Apache Spark be more efficient. PixieDust speeds up data manipulation and display with features like: auto-visualization of Spark DataFrames, real-time Spark job progress monitoring, automated local install of Python and Scala kernels running with Spark, and much more.
Come along and learn how you can use this tool in your own projects to visualize and explore data effortlessly with no coding. Oh, and if you prefer working with a Scala Notebook, this session is also for you, as PixieDust can also run on a Scala Kernel. Imagine being able to visualize your favorite Python chart engines from a Scala Notebook!
We’ll finish the session with a demo combining Twitter, Watson Tone Analyzer, Spark Streaming, and some fun real-time visualizations–all running within a Notebook.
Warehousing Your Hits - The Why and How of Owning Your Data - Scott Arbeitman
These are the slides from my recent presentation at Melbourne's Web Analytics Wednesdays. I talk about transitioning from collecting your data in primary digital analytics systems to storing it in a data warehouse or data lake.
Importance of ML Reproducibility & Applications with MLflow - Databricks
With data as a valuable currency and the architecture of reliable, scalable Data Lakes and Lakehouses continuing to mature, it is crucial that machine learning training and deployment techniques keep up to realize value. Reproducibility, efficiency, and governance in training and production environments rest on the shoulders of both point in time snapshots of the data and a governing mechanism to regulate, track, and make best use of associated metadata.
This talk will outline the challenges and importance of building and maintaining reproducible, efficient, and governed machine learning solutions as well as posing solutions built on open source technologies – namely Delta Lake for data versioning and MLflow for efficiency and governance.
Although you may not have heard of JavaScript Object Notation Linked Data (JSON-LD), it is already impacting your business. Search engine giants such as Google have mandated JSON-LD as a preferred means of adding structured data to web pages to make them considerably easier to parse for more accurate search engine results. The Google use case is indicative of the larger capacity for JSON-LD to increase web traffic for sites and better guide users to the results they want.
Expectations are high for (JSON-LD), and with good reason. JSON-LD effectively delivers the many benefits of JSON, a lightweight data interchange format, into the linked data world. Linked data is the technological approach supporting the World Wide Web and one of the most effective means of sharing data ever devised.
In addition, the growing number of enterprise knowledge graphs fully exploit the potential of JSON-LD as it enables organizations to readily access data stored in document formats and a variety of semi-structured and unstructured data as well. By using this technology to link internal and external data, knowledge graphs exemplify the linked data approach underpinning the growing adoption of JSON-LD—and the demonstrable, recurring business value that linked data consistently provides.
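The structured-data use case described above can be made concrete with a minimal JSON-LD document, built here as a Python dict for easy inspection. The vocabulary (`@context`, `@type`) is standard JSON-LD/schema.org; the specific values are illustrative and not taken from the talk.

```python
import json

# A minimal JSON-LD snippet of the kind search engines consume for rich
# results: @context maps the plain keys onto the schema.org vocabulary,
# and @type identifies what the object describes.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Enterprise Knowledge Graphs with JSON-LD",
    "startDate": "2019-05-01",
    "location": {
        "@type": "Place",
        "name": "Example Conference Center",
    },
}
print(json.dumps(event, indent=2))
```

On a web page, a snippet like this is typically embedded inside a `<script type="application/ld+json">` tag, which is how crawlers pick it up without it affecting rendering.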
Join us to learn more about optimizing the unique Document and Graph Database capabilities provided by AllegroGraph to develop or enhance your Enterprise Knowledge Graph using JSON-LD.
The Query Store, part of Azure SQL Database and SQL Server 2016, changes the way query performance tuning will be done. Learn about this new technology, how it works and how to apply it.
Today’s highly connected world is flooding businesses with big and fast-moving data. The ability to trawl this data ocean and identify actionable insights can deliver a competitive advantage to any organization. The WSO2 Analytics Platform enables businesses to do just that by providing batch, real-time, interactive and predictive analysis capabilities all in one place.
In this tutorial we will
* Plug in the WSO2 Analytics Platform to some common business use cases
* Showcase the numerous capabilities of the platform
* Demonstrate how to collect data, analyze, predict and communicate effectively
* Demonstrate how it can analyze integration, security and IoT scenarios
Stick around till the end and you will walk away with the necessary skills to create a winning data strategy for your organization to stay ahead of its competition.
Neo4j-Databridge: Enterprise-scale ETL for Neo4j - GraphAware
Neo4j - London User Group Meetup - 28th March, 2018
If your data ingestion requirements have grown beyond importing occasional CSV files, then this talk is for you. Neo4j-Databridge from GraphAware is a comprehensive ETL tool specifically built for Neo4j. It has been designed for usability, expressive power, and high performance to address the most common issues faced when importing data into Neo4j: multiple data sources and types, very large data sets, bespoke data conversions, non-tabular formats, filtering, merging and de-duplication, as well as bulk imports and incremental updates.
In this talk, we'll take a quick tour of some of the main features, loading data from Kafka, Redis, JDBC, and various other data sources along the way, to understand how Neo4j-Databridge solves these problems and how it can help you import your data quickly and easily into Neo4j.
Vince Bickers is a Principal Consultant at GraphAware and the main author of Spring Data Neo4j (v4). He has been writing software and leading software development teams for over 30 years at organisations like Vodafone, Deutsche Bank, HSBC, Network Rail, UBS, VMWare, ConocoPhillips, Aviva and British Gas.
Building a data pipeline to ingest data into Hadoop in minutes using Streamse... - Guglielmo Iozzia
Slides from my talk at the Hadoop User Group Ireland meetup on June 13th, 2016: building a data pipeline to ingest data from sources of very different natures into Hadoop in minutes (with no coding at all) using the open-source StreamSets Data Collector tool.
This presentation demonstrates how Incorta supports data security requirements. It describes how to define session variables and security filters.
HBase from the Trenches - Phoenix Data Conference 2015 - Avinash Ramineni
Apache HBase has been widely adopted at many enterprises. In this talk we will cover a few war stories about troubleshooting, tuning, and fixing problems with an HBase cluster. We will also cover some of the best practices, tools, utilities, and lessons learnt from evaluating deployments at different organizations.
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch... - sparktc
At the sold-out Spark & Machine Learning Meetup in Brussels on October 27, 2016, Nick Pentreath of the Spark Technology Center teamed up with Jean-François Puget of IBM Analytics to deliver a talk called Creating an end-to-end Recommender System with Apache Spark and Elasticsearch.
Jean-François and Nick started with a look at the workflow for recommender systems and machine learning, then moved on to data modeling and using Spark ML for collaborative filtering. They closed with a discussion of deploying and scoring the recommender models, including a demo.
You’re Solr powered, and needing to customize its capabilities. Apache Solr is flexibly architected, with practically everything pluggable. Under the hood, Solr is driven by the well-known Apache Lucene. Lucene for Solr Developers will guide you through the various ways in which Solr can be extended, customized, and enhanced with a bit of Lucene API know-how. We’ll delve into improving analysis with custom character mapping, tokenizing, and token filtering extensions; show why and how to implement specialized query parsing, and how to add your own search and update request handling.
We went over what Big Data is and its value. This talk will cover the details of Elasticsearch, a Big Data solution. Elasticsearch is a NoSQL-backed search engine using an HDFS-based filesystem.
We'll cover:
• Elasticsearch basics
• Setting up a development environment
• Loading data
• Searching data using REST
• Searching data using NEST, the .NET interface
• Understanding Scores
Finally, I show a use-case for data mining using Elasticsearch.
You'll walk away from this armed with the knowledge to add Elasticsearch to your data analysis toolkit and your applications.
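The "loading data" and "searching data using REST" steps above boil down to a couple of HTTP calls against an Elasticsearch node. Here is a minimal sketch of those calls, assuming a local node at `http://localhost:9200` and a hypothetical `talks` index; the request bodies are standard Elasticsearch query DSL, and actually sending them (e.g. with the `requests` library) is left out so the sketch stays self-contained.

```python
import json

BASE = "http://localhost:9200"

# Indexing: PUT a JSON document at /<index>/_doc/<id>.
index_url = f"{BASE}/talks/_doc/1"
document = {"title": "Intro to Elasticsearch", "tags": ["big data", "search"]}

# Searching: POST a query-DSL body to /<index>/_search.
search_url = f"{BASE}/talks/_search"
query = {
    "query": {"match": {"title": "elasticsearch"}},  # full-text match query
    "size": 10,                                      # return at most 10 hits
}

print(index_url)
print(search_url)
print(json.dumps(query))
```

The .NET route the talk mentions (NEST) wraps these same endpoints in a typed client, so the DSL shown here carries over directly.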
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ... - Amazon Web Services
Organisations today need a way to manage the ever-increasing volume of data from numerous sources such as log systems, click streams or connected devices and be able to analyse this data in real-time. In this session we will walk through an architecture demonstration of how to leverage AWS services to meet these needs.
Speaker: Ganesh Raja, Solutions Architect, Amazon Web Services
The right architecture is key for any IT project. This is especially the case for big data projects, where there are no standard architectures which have proven their suitability over years. This session discusses the different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Streaming Analytics architecture as well as Lambda and Kappa architecture and presents the mapping of components from both Open Source as well as the Oracle stack onto these architectures.
Presentation from full-stack agile on how you can scale your agile teams as your company grows. As your company grows your teams need to be able to adapt to change quickly.
New feature overview of Cubes 1.0 – lightweight Python OLAP and pluggable data warehouse. Video: https://www.youtube.com/watch?v=-FDTK80zsXc Github sources: https://github.com/databrewery/cubes
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine - Trey Grainger
Search engines frequently miss the mark when it comes to understanding user intent. This talk will describe how to overcome this by leveraging Lucene/Solr to power a knowledge graph that can extract phrases, understand and weight the semantic relationships between those phrases and known entities, and expand the query to include those additional conceptual relationships. For example, if a user types in (Senior Java Developer Portland, OR Hadoop), you or I know that the term “senior” designates an experience level, that “java developer” is a job title related to “software engineering”, that “portland, or” is a city with a specific geographical boundary, and that “hadoop” is a technology related to terms like “hbase”, “hive”, and “map/reduce”. Out of the box, however, most search engines just parse this query as text:((senior AND java AND developer AND portland) OR (hadoop)), which is not at all what the user intended. We will discuss how to train the search engine to parse the query into this intended understanding, and how to reflect this understanding to the end user to provide an insightful, augmented search experience. Topics: Semantic Search, Finite State Transducers, Probabilistic Parsing, Bayes Theorem, Augmented Search, Recommendations, NLP, Knowledge Graphs
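The parse described above can be illustrated with a toy greedy tagger: a phrase dictionary maps known multi-word entities to types, and the query is consumed longest-match-first. The dictionary here is hand-built and hypothetical; in the talk's system these phrases and their relationships come from a Lucene/Solr-backed knowledge graph, not a static lookup.

```python
# Hypothetical entity dictionary standing in for the knowledge graph.
ENTITIES = {
    "senior": "experience_level",
    "java developer": "job_title",
    "portland, or": "city",
    "hadoop": "technology",
}

def parse(query):
    """Greedy longest-match-first phrase tagging of a query string."""
    tokens = query.lower().split()
    parsed, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at i, then shrink.
        for j in range(len(tokens), i, -1):
            phrase = " ".join(tokens[i:j])
            if phrase in ENTITIES:
                parsed.append((phrase, ENTITIES[phrase]))
                i = j
                break
        else:
            parsed.append((tokens[i], "keyword"))  # unknown term
            i += 1
    return parsed

print(parse("Senior Java Developer Portland, OR Hadoop"))
```

Each typed span can then be rewritten into a field-scoped or expanded sub-query (e.g. a geo filter for the city, synonym expansion for the technology), instead of the naive all-terms AND/OR parse the abstract criticizes.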
In the age of digital transformation and disruption, your ability to thrive depends on how you adapt to the constantly changing environment. MongoDB 3.4 is the latest release of the leading database for modern applications, a culmination of native database features and enhancements that will allow you to easily evolve your solutions to address emerging challenges and use cases.
In this webinar, we introduce you to what’s new, including:
- Multimodel Done Right. Native graph computation, faceted navigation, rich real-time analytics, and powerful connectors for BI and Apache Spark bring additional multimodel database support right into MongoDB.
- Mission-Critical Applications. Geo-distributed MongoDB zones, elastic clustering, tunable consistency, and enhanced security controls bring state-of-the-art database technology to your most mission-critical applications.
- Modernized Tooling. Enhanced DBA and DevOps tooling for schema management, fine-grained monitoring, and cloud-native integration allow engineering teams to ship applications faster, with less overhead and higher quality.
Building a real-time big data analytics platform with Solr - Trey Grainger
Having “big data” is great, but turning that data into actionable intelligence is where the real value lies. This talk will demonstrate how you can use Solr to build a highly scalable data analytics engine to enable customers to engage in lightning fast, real-time knowledge discovery.
At CareerBuilder, we utilize these techniques to report the supply and demand of the labor force, compensation trends, customer performance metrics, and many live internal platform analytics. You will walk away from this talk with an advanced understanding of faceting, including pivot faceting, geo/radius faceting, time-series faceting, function faceting, and multi-select faceting. You'll also get a sneak peek at some new faceting capabilities just wrapping up development, including distributed pivot facets and percentile/stats faceting, which will be open-sourced.
The presentation will be a technical tutorial, along with real-world use-cases and data visualizations. After this talk, you'll never see Solr as just a text search engine again.
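Several of the facet types named above map directly onto standard Solr request parameters. The sketch below assembles such a request; the parameters (`facet`, `facet.field`, `facet.pivot`, `facet.range` and friends) are standard Solr, while the collection and field names (`jobs`, `job_title`, `state`, `city`, `posted_date`) are hypothetical placeholders of mine.

```python
from urllib.parse import urlencode

# A facet-only Solr query combining a field facet, a pivot facet, and a
# time-series (range) facet over a date field.
params = {
    "q": "*:*",
    "rows": 0,                        # facet counts only, no documents
    "facet": "true",
    "facet.field": "job_title",       # plain field facet
    "facet.pivot": "state,city",      # pivot facet: counts per state, then city
    "facet.range": "posted_date",     # time-series facet
    "facet.range.start": "NOW-1YEAR",
    "facet.range.end": "NOW",
    "facet.range.gap": "+1MONTH",     # one bucket per month
}
url = "http://localhost:8983/solr/jobs/select?" + urlencode(params)
print(url)
```

A single request like this returns all three facet breakdowns at once, which is what makes Solr workable as a real-time analytics engine rather than just a text search box.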
How sitecore depends on mongo db for scalability and performance, and what it... - Antonios Giannopoulos
Percona Live 2017 - How sitecore depends on mongo db for scalability and performance, and what it can teach you by Antonios Giannopoulos and Grant Killian
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDBMongoDB
Data analytics can offer insights into your business and help take it to the next level. In this talk you'll learn about MongoDB tools for building visualizations, dashboards and interacting with your data. We'll start with exploratory data analysis using MongoDB Compass. Then, in a matter of minutes, we'll take you from 0 to 1 - connecting to your Atlas cluster via BI Connector and running analytical queries against it in Microsoft Excel. We'll also showcase the new MongoDB Charts product and you'll see how quick, easy and intuitive analytics can be on the MongoDB platform without flattening the data or spending time and effort on complicated and fragile ETL.
• Explored and cleaned a huge amount of user activity logs (JSON) from a movies website using MapReduce jobs in Python.
• Classified user accounts into adults and children for targeted advertising by implementing a similarity-ranking algorithm.
• Grouped user sessions by user behavior using K-means clustering to observe outliers and find distinctive groups.
• Predicted movie ratings using user-user and item-item recommendation algorithms in Mahout.
Eagle6 is a product that uses system artifacts to create a replica model representing a near-real-time view of the system architecture. Eagle6 was built to collect system data (log files, application source code, etc.) and to link system behaviors in such a way that the user can quickly identify risks associated with unknown or unwanted behavioral events that may have unknown impacts on seemingly unrelated downstream systems. This session presents the capabilities of the Eagle6 modeling product and how we are using MongoDB to support near-real-time analysis of large, disparate datasets.
[This work was presented at SIGMOD'13.]
The use of large-scale data mining and machine learning has proliferated through the adoption of technologies such as Hadoop, with its simple programming semantics and rich and active ecosystem. This paper presents LinkedIn's Hadoop-based analytics stack, which allows data scientists and machine learning researchers to extract insights and build product features from massive amounts of data. In particular, we present our solutions to the "last mile" issues in providing a rich developer ecosystem. This includes easy ingress from and egress to online systems, and managing workflows as production processes. A key characteristic of our solution is that these distributed system concerns are completely abstracted away from researchers. For example, deploying data back into the online system is simply a 1-line Pig command that a data scientist can add to the end of their script. We also present case studies on how this ecosystem is used to solve problems ranging from recommendations to news feed updates to email digesting to descriptive analytical dashboards for our members.
Navigating the Metaverse: A Journey into Virtual EvolutionDonna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. This is where custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Accelerate Enterprise Software Engineering with PlatformlessWSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Field Employee Tracking System | MiTrack App | Best Employee Tracking Solution |...informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us: https://informapuae.com/field-staff-tracking/
May Marketo Masterclass, London MUG May 22 2024.pdfAdele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfJay Das
With the advent of artificial intelligence (AI) tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT and Bard, organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteGoogle
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
👉👉 Click Here To Get More Info 👇👇
https://sumonreview.com/ai-pilot-review/
AI Pilot Review: Key Features
✅Deploy AI expert bots in Any Niche With Just A Click
✅With one keyword, generate complete funnels, websites, landing pages, and more.
✅More than 85 AI features are included in the AI pilot.
✅No setup or configuration; use your voice (like Siri) to do whatever you want.
✅You Can Use AI Pilot To Create your version of AI Pilot And Charge People For It…
✅ZERO Manual Work With AI Pilot. Never write, Design, Or Code Again.
✅ZERO Limits On Features Or Usages
✅Use Our AI-powered Traffic To Get Hundreds Of Customers
✅No Complicated Setup: Get Up And Running In 2 Minutes
✅99.99% Up-Time Guaranteed
✅30 Days Money-Back Guarantee
✅ZERO Upfront Cost
See My Other Reviews Article:
(1) TubeTrivia AI Review: https://sumonreview.com/tubetrivia-ai-review
(2) SocioWave Review: https://sumonreview.com/sociowave-review
(3) AI Partner & Profit Review: https://sumonreview.com/ai-partner-profit-review
(4) AI Ebook Suite Review: https://sumonreview.com/ai-ebook-suite-review
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Cyaniclab : Software Development Agency Portfolio.pdfCyanic lab
CyanicLab, an offshore custom software development company based in Sweden, India, and Finland, is your go-to partner for startup development and innovative web design solutions. Our expert team specializes in crafting cutting-edge software tailored to meet the unique needs of startups and established enterprises alike. From conceptualization to execution, we offer comprehensive services including web and mobile app development, UI/UX design, and ongoing software maintenance. Ready to elevate your business? Contact CyanicLab today and let us propel your vision to success with our top-notch IT solutions.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
14. Learnings
+ Reads are fast
+ Spark helps build a Lambda Architecture
- Still duplicated code and complexity
- Each change needs an update of the batch view
20. Overall ranking with MySQL
SELECT
user_id,
SUM(points) as score
FROM event_log
WHERE created_at BETWEEN NOW() - INTERVAL 90 DAY AND NOW()
GROUP BY user_id
ORDER BY score DESC
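The same 90-day ranking can be reproduced end to end with an in-memory database. A minimal runnable sketch using SQLite, with hypothetical sample events (table and column names follow the slide):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log (user_id TEXT, points INTEGER, created_at TEXT)")

# hypothetical sample events: three within the window, one far too old
conn.executemany(
    "INSERT INTO event_log VALUES (?, ?, datetime('now', ?))",
    [("alice", 10, "-1 days"),
     ("alice", 5, "-2 days"),
     ("bob", 7, "-3 days"),
     ("bob", 100, "-200 days")],  # outside the 90-day window, must not count
)

# SQLite spells the date arithmetic differently than MySQL, but the
# GROUP BY / ORDER BY structure is identical to the slide's query
rows = conn.execute("""
    SELECT user_id, SUM(points) AS score
    FROM event_log
    WHERE created_at BETWEEN datetime('now', '-90 days') AND datetime('now')
    GROUP BY user_id
    ORDER BY score DESC
""").fetchall()

print(rows)  # [('alice', 15), ('bob', 7)]
```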
21. First results of performance test
● Some queries were fast enough
● BUT: 17-20 second queries in the worst-case scenario
23. Aggregations in Elasticsearch
The aggregations framework helps provide aggregated data based on a search
query. It is based on simple building blocks called aggregations, that can be
composed in order to build complex summaries of the data.
(Elasticsearch documentation)
24. Aggregation for Top User List
"aggregations": {
  "top_users": {
    "terms": {
      "field": "user_id",
      "size": 100,
      "shard_size": 2000,
      "order": {
        "total_score": "desc"
      }
    },
    "aggregations": {
      "total_score": {
        "sum": {
          "field": "score"
        }
      }
    }
  }
}
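The response to such a request nests one total_score value inside every user_id bucket. A small sketch of pulling the ranked list out of a hypothetical response body (user IDs and scores are made up; field names follow the aggregation above):

```python
# hypothetical Elasticsearch response, trimmed to the aggregation part
response = {
    "aggregations": {
        "top_users": {
            "buckets": [
                {"key": "user-1", "doc_count": 3, "total_score": {"value": 42.0}},
                {"key": "user-2", "doc_count": 5, "total_score": {"value": 17.0}},
            ]
        }
    }
}

def ranked_users(resp):
    """Return (user_id, total_score) pairs in the order Elasticsearch ranked them."""
    buckets = resp["aggregations"]["top_users"]["buckets"]
    return [(b["key"], b["total_score"]["value"]) for b in buckets]

print(ranked_users(response))  # [('user-1', 42.0), ('user-2', 17.0)]
```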
25. Aggregation for Top User List
"aggregations": {
  "top_users": {
    "terms": {
      "field": "user_id",
      "size": 100,
      "shard_size": 2000,
      "order": {
        "total_score": "desc"
      }
    },
    "aggregations": {
      "total_score": {
        "sum": {
          "field": "score"
        }
      }
    }
  }
}
groupBy: the terms aggregation on user_id plays the role of SQL's GROUP BY user_id
26. Aggregation for Top User List
"aggregations": {
  "top_users": {
    "terms": {
      "field": "user_id",
      "size": 100,
      "shard_size": 2000,
      "order": {
        "total_score": "desc"
      }
    },
    "aggregations": {
      "total_score": {
        "sum": {
          "field": "score"
        }
      }
    }
  }
}
order by: sorting the buckets by the total_score sub-aggregation plays the role of ORDER BY score DESC
27. Aggregation for Top User List
"aggregations": {
  "top_users": {
    "terms": {
      "field": "user_id",
      "size": 100,
      "shard_size": 2000,
      "order": {
        "total_score": "desc"
      }
    },
    "aggregations": {
      "total_score": {
        "sum": {
          "field": "score"
        }
      }
    }
  }
}
tune accuracy: shard_size controls how many terms each shard returns; larger values improve accuracy at the cost of memory and latency
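Why shard_size tunes accuracy: each shard returns only its local top terms, so a user whose score is spread evenly across shards can be cut off on every shard despite being the global winner. A toy simulation of that effect (the shard contents are made up for illustration):

```python
from collections import Counter

def shard_top(shard_scores, shard_size):
    # each shard returns only its local top `shard_size` terms
    return dict(Counter(shard_scores).most_common(shard_size))

def global_top(shards, shard_size, size=3):
    # the coordinating node merges the per-shard partial results
    combined = Counter()
    for shard in shards:
        combined.update(shard_top(shard, shard_size))
    return combined.most_common(size)

# user B has the highest global score (18) but is never a local #1
shards = [{"A": 10, "B": 9}, {"C": 10, "B": 9}]

# shard_size=1: B is cut off on both shards and missing from the result
print(global_top(shards, shard_size=1))  # [('A', 10), ('C', 10)]

# shard_size=2: both shards report B, and B correctly wins with 18
print(global_top(shards, shard_size=2))  # [('B', 18), ('A', 10), ('C', 10)]
```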
29. Request cache
● Search on local shards
● Cache local
● Invalidated on changes
● Hits.total, aggregations and suggestions
30. Request cache
● Search on local shards
● Cache local
● Invalidated on changes
● Hits.total, aggregations and suggestions
➔ Too many updates
➔ Lots of cache misses
31. Split data:
● Data of today: use an index template to create today's index with the first event
● Historical data: index without changes
[Diagram: an incoming event is written to the "data of today" index; older events live in the historical-data indices]
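Creating today's index from a template (as on this slide) could look like the sketch below; the template name, index pattern, and field names are assumptions, and the exact endpoint (legacy _template vs. newer _index_template) depends on your Elasticsearch version:

```json
PUT _index_template/events_daily
{
  "index_patterns": ["events-*"],
  "template": {
    "mappings": {
      "properties": {
        "user_id":    { "type": "keyword" },
        "score":      { "type": "long" },
        "created_at": { "type": "date" }
      }
    }
  }
}
```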
32. Use filtered aliases to select the data of a time range
[Diagram: a filtered alias covering the last 90 days spans the "data of today" index and the historical-data indices]
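A filtered alias like the one on this slide could be defined as below; the alias name, index pattern, and date field are assumptions:

```json
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "events-*",
        "alias": "events-last-90d",
        "filter": {
          "range": {
            "created_at": { "gte": "now-90d/d" }
          }
        }
      }
    }
  ]
}
```

Because queries go through the alias rather than the raw index names, the 90-day window moves forward without any change on the service side.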
33. Use cached results from historical data
[Diagram: the service queries the 90-day filtered alias with _search?request_cache=true; results from the unchanging historical indices are served from the request cache]
34. The next day
[Diagram: the next day, yesterday's index becomes part of the historical data, a new "data of today" index is created, and the filtered alias still covers the last 90 days]
35. Merge the old indices
[Diagram: a merge job consolidates the old daily indices into the historical-data indices behind the filtered alias]
36. Warm cache already in merge job
[Diagram: the merge job issues _search?request_cache=true itself, so the request cache is already warm before the service queries the alias]
38. Learnings:
● Improved internal reindex framework
● Aliases are always your friends
● Request cache FTW
● Cache misses when you use the index name instead of the alias (?)
● Results may not be 100% accurate (but that's no problem for us)
40. We’re hiring…
● Web-Developer
● We are looking for experts in the area of Search and NLP who are interested in supporting us for a couple of days!
Please get in touch. :)