This document discusses search analytics and summarizes Sematext's search analytics software, which collects search data using Flume and stores it in HBase. The software then generates reports to help optimize search experiences. It provides insights that help search providers satisfy the needs of search users.
HBaseCon 2012 | Real-time Analytics with HBase - Sematext - Cloudera, Inc.
In this talk we’ll explain how we implemented “update-less updates” (not a typo!) for HBase using an append-only approach. This approach uses HBase core strengths like fast range scans and the recently added coprocessors to enable real-time analytics. It shines in situations where high data volume and velocity make random updates (aka Get+Put) prohibitively expensive. Apart from making real-time analytics possible, we’ll show how the append-only approach to updates makes it possible to perform rollbacks of data changes and avoid data inconsistency problems caused by tasks in MapReduce jobs that fail after only partially updating data in HBase.
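The append-only idea in the abstract above can be sketched roughly as follows. This is a minimal in-memory illustration, not the HBase API and not the speakers' implementation: the class `AppendOnlyCounters`, its methods, and the `batch_id` tag are hypothetical names chosen for the example. Each "update" is written as a new row, a read is a range scan that sums the deltas, and a rollback drops every row written by one batch.

```python
from bisect import bisect_left, insort


class AppendOnlyCounters:
    """In-memory stand-in for a sorted key-value table (hypothetical, not HBase)."""

    def __init__(self):
        # Rows are kept sorted by (entity_key, sequence), like sorted row keys.
        self._rows = []
        self._seq = 0

    def append(self, entity_key, delta, batch_id):
        # An "update" is just a new row: no Get, no overwrite of old data.
        self._seq += 1
        insort(self._rows, (entity_key, self._seq, batch_id, delta))

    def read(self, entity_key):
        # Range scan: jump to the first row for this key, then sum its deltas.
        i = bisect_left(self._rows, (entity_key,))
        total = 0
        while i < len(self._rows) and self._rows[i][0] == entity_key:
            total += self._rows[i][3]
            i += 1
        return total

    def rollback(self, batch_id):
        # Undo a partially applied batch by dropping only the rows it wrote.
        self._rows = [row for row in self._rows if row[2] != batch_id]


if __name__ == "__main__":
    table = AppendOnlyCounters()
    table.append("query:hbase", 3, batch_id="job-1")
    table.append("query:hbase", 2, batch_id="job-2")
    print(table.read("query:hbase"))  # 5
    table.rollback("job-2")
    print(table.read("query:hbase"))  # 3
```

In a real deployment the deltas would presumably be compacted into a single row from time to time so that scans stay short; that step is left out of this sketch.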
JavaOne-2013: Save Scarce Resources by Managing Terabytes of Objects off-heap... - harvraja
Presented at JavaOne 2013. A description of Coherence's Elastic Data feature and the reasons to want a solution that allows storing data in resources other than heap-based memory.
Apache Accumulo, originally developed by the National Security Agency and now an Apache Software Foundation project, builds upon Google's Bigtable design to provide a scalable, lightly-structured database capability complementing the ubiquitous Hadoop environment. The core capabilities of Accumulo include cell-level security, flexible schemas, real-time analytics, bulk I/O, and linear scalability beyond trillions of entries and petabytes of data. These new capabilities lead to techniques that unlock the power of Big Data, but don't fit into traditional database design patterns. Learn about the advantages of Apache Accumulo and how it fits into the Hadoop and NoSQL ecosystem.
Presenter: Adam Fuchs, CTO, sqrrl
Presentation by Otis Gospodnetić, Sematext International, at Smart Content: The Content Analytics Conference, October 19, 2010, http://smartcontentconference.com
Getting the Most Out of Google Analytics - Sanger & Eby
You can’t manage what you can’t measure, and this session on Google Analytics opens up the world of data about your website. Understand how Google Analytics works, as well as what you can learn from it and how to interpret the data and put it to use to build actionable plans that deliver results to your bottom line.
Getting The Most Out of Google Analytics - Kat Jenkins
You can’t manage what you can’t measure, and this session on Google Analytics opens up the world of data about your website. Understand how Google Analytics works, as well as what you can learn from it and how to interpret the data and put it to use to build actionable plans that deliver results to your bottom line.
Presentation I gave at Enterprise Search Summit 2011. It suggests low-tech and relatively inexpensive ways to bring failing search projects back to life.
Apache Deep Learning 101 - DWS Berlin 2018 - Timothy Spann
Apache Deep Learning 101 with Apache MXNet, Apache NiFi, MiniFi, Apache Tika, Apache Open NLP, Apache Spark, Apache Hive, Apache HBase, Apache Livy and Apache Hadoop. Using Python we run various existing models via MXNet Model Server and via Python APIs. We also use NLP for entity resolution.
A Digital Asset Management (DAM) solution and strategy can be key enablers for your enterprise to produce and deliver content in today's multi-channel world. As an open platform for content management, Alfresco can be used to build your DAM infrastructure: from search, preview, and assembly to digital rights management, renditioning, and publishing.
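[ITOnAir] Devmento video: John Pocknell, Senior Manager, Toad for Oracle Solutions, Part 1 (of 2)
Quest press conference announcing the new Toad 11 products and database strategy (October 17, 2011)
Presentation of Quest's database solution strategy and vision. It covers updates on the latest database trends and market conditions driven by recent changes in the IT environment, such as cloud and big data, and explains how the Toad 11 product family and its strong performance can fit business needs and transform the way work is done.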
Empower your Enterprise with language intelligence_Francisco Webber - Dataconomy Media
A presentation by Francisco Webber, inventor and co-founder of cortical.io, who discusses how we can fundamentally understand and computationally model language and revolutionise semantic fingerprinting. These are the slides from his presentations in Big Data Berlin, London, Paris, Munich and Vienna.
Enterprise IIoT Edge Processing with Apache NiFi - Timothy Spann
April 5, 2018 IoT Fusion 2018 Conference in Philadelphia, PA hosted by Chariot Solutions. This talk is about Apache NiFi, MiniFi, Python, Deep Learning, NVidia Jetson TX1, Raspberry Pi, Apache MXNet, TensorFlow and how to run things at the edge and process in your big data center. http://iotfusion.net/session/ https://github.com/tspannhw/IoTFusion2018Talk
UPA 2011 - Better Usability Through Visualization - OneSpring LLC
Better Usability Through Visualization
Visualization is a requirements elicitation and documentation technique which significantly reduces or eliminates the common problems of software definition.
Practitioners of this technique can expect improved usability, increased innovation, lower development costs and faster project timelines. This workshop provides attendees with the ingredients for successful use of visualization.
DataEngConf SF16 - Methods for Content Relevance at LinkedIn - Hakka Labs
Learn how LinkedIn makes article recommendations for its users. Talk by Ajit Singh, LinkedIn. To hear about future conferences go to http://dataengconf.com
This talk was given during Activate Conference 2019. Lucene has a lot of options for configuring similarity, and Solr inherits them. Similarity makes the base of your relevancy score: how similar is this document to the query? The default similarity (BM25) is a good start, but you may need to tweak it for your use-case. In this session, you will learn how BM25 works and how you may want to change its parameters. Then, we'll move to other similarity classes: DFR, DFI, IB and LM. You will learn the thinking behind them, how that thinking translates to the similarity score, and which parameters allow you to tweak how score evolves based on things like term frequency or document length. By the end, you’ll have a good understanding of which similarity options are likely to work well for your use-case. You'll know which tunables are available and whether you need to implement a custom similarity class. As an example, we’ll focus on E-commerce, where you often end up ignoring term frequency altogether.
Key Takeaway
1) What the built-in Lucene/Solr similarities are and what they do
2) Which similarity to use for which use-case
3) How to use a custom similarity class in Solr
Learn more about search relevance and similarity: sematext.com/blog/search-relevance-solr-elasticsearch-similarity
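To make the BM25 tunables mentioned in the abstract above concrete, here is a small sketch of the textbook per-term BM25 score. It is an illustration under stated assumptions, not Lucene's or Solr's exact code: the function name `bm25_term_score` and its argument names are made up for this example, and `k1=1.2`, `b=0.75` are the commonly used default values. Setting `k1` to 0 removes the influence of term frequency, which matches the e-commerce case described in the abstract, and setting `b` to 0 switches off document-length normalization.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, num_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one query term in one document (illustrative)."""
    if tf <= 0:
        return 0.0
    # Rarer terms (low doc_freq) get a higher inverse document frequency.
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly longer-than-average documents are penalized.
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences of the term saturate.
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


if __name__ == "__main__":
    common = dict(doc_len=100, avg_doc_len=100, num_docs=1000, doc_freq=10)
    # With default parameters, more occurrences score higher (with saturation).
    print(bm25_term_score(1, **common), bm25_term_score(10, **common))
    # With k1=0, term frequency no longer matters; only the IDF remains.
    print(bm25_term_score(1, k1=0, **common), bm25_term_score(10, k1=0, **common))
```

A full query score would sum this value over all query terms; that loop is omitted to keep the parameter effects visible.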
This talk was given during DockerCon EU 2018.
It ain't just a whim - to be able to continue innovating, we’ve moved our good old static production to containers. We needed to be elastic, fast, reliable and production ready at any time - that's why we chose Docker. But like in most enterprises, lots of our apps run on the JVM and most JVMs’ ergonomics assume they “own” the server they are running on. So how do you containerize JVM apps? Should you really increase JVM heap if you have spare memory? What about OS caches? What are the differences between JDK 8, 9 and 10 when it comes to container awareness? Outages because of out of memory errors? Slowness because of long garbage collection and poor environment visibility? Long story short, in this session, we’ll look at the gotchas of running JVM apps in containers and teach you how to avoid costly mistakes.
Top 3 things attendees will learn:
1. Key differences between various JVM versions relevant for containerized Java apps.
2. Best practices for running JVM in containers.
3. Avoiding common pitfalls when running containerized JVM applications.
This talk was given during Monitorama EU 2018.
Observability, like other ops practices, has hard and soft benefits. No logs - no root cause, that’s a hard benefit. A soft benefit is when we have more confidence in an observable system. Then we can be more productive in developing it. The trouble with soft benefits like confidence, is how to measure them. Does observability actually make us more productive? How about other activities, such as post-mortems? Why is alert fatigue so bad? Turns out, there are plenty of studies about the impact of such activities on our brain, our behavior, our productivity. In this session, we’ll explore what [neuro]science says about such practices so that:
We turn soft benefits into hard benefits
We can encourage a culture where we get the benefits and avoid the traps
Be prepared for surprises, as some “best practices” aren’t “best” at all.
This talk was given during DevOps Con 2017.
Have you ever spent time digging through various terminals, greping, lessing, awking and trying to find those few log lines that may be important? Have you ever done that under time pressure, because mission critical services were not working? Have you ever heard from your developers that they can’t tell you anything, because they don’t have access to application logs? Have you ever considered a centralized storage for logs, but time and resources are not on your side?
If you said yes to any of the above questions, then this talk is for you. During the talk we’ll introduce you to the world of log centralization and analysis, covering both open source and commercial tools. We will go from top to bottom and learn how to set up log centralization and analysis for servers, virtualized environments and containers. We will get from log shipping, through centralized buffering, to storage and analysis to show you that having a centralized log analysis tool is not rocket science.
Finally, you will see how useful it is to combine the logs from all your servers in a single place for blazingly fast correlation.
This talk was given during Lucene Revolution 2017.
They say optimize is bad for you, they say you shouldn't do it, they say it will invalidate operating system caches and make your system suffer. This is all true, but is it true in all cases?
In this presentation we will look closer at what optimize, or better called force merge, does to your Solr search engine. You will learn what segments are, how they are built and how they are used by Lucene and Solr for searching. We will discuss real-life performance implications regarding Solr collections that have many segments on a single node and compare that to Solr where the number of segments is moderate and low. We will see what we can do to tune the merging process to trade off indexing performance for better query performance and what pitfalls are waiting for us. Finally, at the end of the talk we will discuss possibilities of running force merge to avoid system disruption and still benefit from the query performance boost that a single-segment index provides.
This talk was given during Lucene Revolution 2017 and has two goals: first, to discuss the tradeoffs for running Solr on Docker. For example, you get dynamic allocation of operating system caches, but you also get some CPU overhead. We'll keep in mind that Solr nodes tend to be different than your average container: Solr is usually long running, takes quite some RSS and a lot of virtual memory. This will imply, for example, that it makes more sense to use Docker on big physical boxes than on configurable-size VMs (like Amazon EC2).
The second goal is to discuss issues with deploying Solr on Docker and how to work around them. For example, many older (and some of the newer) combinations of Docker, Linux Kernel and JVM have memory leaks. We'll go over Docker operations best practices, such as using container limits to cap memory usage and prevent the host OOM killer from terminating a memory-consuming process - usually a Solr node. Or running Docker in Swarm mode over multiple smaller boxes to limit the spread of a single issue.
Docker is all the rage these days. While one doesn't hear much about Solr on Docker, we're here to tell you not only that it can be done, but also share how it's done.
We'll quickly go over the basic Docker ideas - containers are lighter than VMs, they solve "but it worked on my laptop" issues - so we can dive into the specifics of running Solr on Docker.
We'll do a live demo showing you how to run Solr master - slave as well as SolrCloud using containers, how to manage CPU assignments, constrain memory and use Docker data volumes when running Solr in containers. We will also show you how to create your own containers with custom configurations.
Finally, we'll address one of the core Solr questions - which deployment type should I use? We will demonstrate performance differences between the following deployment types:
- Single Solr instance running on a bare metal machine
- Multiple Solr instances running on a single bare metal machine
- Solr running in containers
- Solr running on virtual machine
- Solr running on virtual machine using unikernel
For each deployment type we'll address how it impacts performance, operational flexibility and all other key pros and cons you ought to keep in mind.
An updated talk about how to use Solr for logs and other time-series data, like metrics and social media. In 2016, Solr, its ecosystem, and the operating systems it runs on have evolved quite a lot, so we can now show new techniques to scale and new knobs to tune.
We'll start by looking at how to scale SolrCloud through a hybrid approach using a combination of time- and size-based indices, and also how to divide the cluster in tiers in order to handle the potentially spiky load in real-time. Then, we'll look at tuning individual nodes. We'll cover everything from commits, buffers, merge policies and doc values to OS settings like disk scheduler, SSD caching, and huge pages.
Finally, we'll take a look at the pipeline of getting the logs to Solr and how to make it fast and reliable: where should buffers live, which protocols to use, where should the heavy processing be done (like parsing unstructured data), and which tools from the ecosystem can help.
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker - Sematext Group, Inc.
Sematext engineer Rafal Kuc (@kucrafal) walks through the details of running high-performance, fault tolerant Elasticsearch clusters on Docker. Topics include: Containers vs. Virtual Machines, running the official Elasticsearch container, container constraints, good network practices, dealing with storage, data-only Docker volumes, scaling, time-based data, multiple tiers and tenants, indexing with and without routing, querying with and without routing, routing vs. no routing, and monitoring. Talk was delivered at DevOps Days Warsaw 2015.
Large Scale Log Analytics with Solr (from Lucene Revolution 2015) - Sematext Group, Inc.
In this talk from Lucene/Solr Revolution 2015, Solr and centralized logging experts Radu Gheorghe and Rafal Kuć cover topics like: flow in Logstash, flow in rsyslog, parsing JSON, log shipping, Solr tuning, time-based collections and tiered clusters.
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ... - Sematext Group, Inc.
This talk covers the basics of centralizing logs in Elasticsearch and all the strategies that make it scale with billions of documents in production. Topics include:
- Time-based indices and index templates to efficiently slice your data
- Different node tiers to de-couple reading from writing, heavy traffic from low traffic
- Tuning various Elasticsearch and OS settings to maximize throughput and search performance
- Configuring tools such as logstash and rsyslog to maximize throughput and minimize overhead
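The first topic in the list above, time-based indices, can be illustrated with a short sketch. It assumes one index per UTC day and a hypothetical `logs-YYYY.MM.DD` naming scheme; the helper names `index_for` and `indices_for_range` are invented for this example and are not part of Elasticsearch or any client library. The point is that writes go only to the current day's index, and a time-bounded search only needs to touch the indices that overlap its range.

```python
from datetime import datetime, timedelta, timezone


def index_for(ts, prefix="logs"):
    """Name of the daily index that an event with timestamp ts belongs to.

    ts must be a timezone-aware datetime; days are cut in UTC.
    """
    return f"{prefix}-{ts.astimezone(timezone.utc):%Y.%m.%d}"


def indices_for_range(start, end, prefix="logs"):
    """Names of all daily indices a query over [start, end] has to search."""
    first = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    days = (last - first).days
    return [f"{prefix}-{first + timedelta(days=i):%Y.%m.%d}" for i in range(days + 1)]


if __name__ == "__main__":
    start = datetime(2015, 5, 28, 23, 30, tzinfo=timezone.utc)
    end = datetime(2015, 5, 30, 0, 10, tzinfo=timezone.utc)
    print(index_for(start))               # logs-2015.05.28
    print(indices_for_range(start, end))  # three daily indices
```

Dropping old data then becomes a matter of deleting whole indices rather than deleting individual documents.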
Sematext's DevOps Evangelist, Stefan Thies (@seti321), takes a Docker Logging tour through the different log collection options Docker users have, the pros and cons of each, specific and existing Docker logging solutions, tooling, the role of syslog, log shipping to ELK Stack, and more. Q&A session at end.
For the Docker users out there, Sematext's DevOps Evangelist, Stefan Thies, goes through a number of different Docker monitoring options, points out their pros and cons, and offers solutions for Docker monitoring. Webinar contains actionable content, diagrams and how-to steps.
"Impact of front-end architecture on development cost", Viktor Turskyi - Fwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 - Tobias Schneck
As AI technology is pushing into IT, I was wondering, as an “infrastructure container kubernetes guy”, how does this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies that could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I already have working for real.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Key Trends Shaping the Future of Infrastructure.pdf - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Search Analytics Business Value & NoSQL Backend
1. Search Analytics
Business Value
&
NoSQL Backend
Otis Gospodnetić – Sematext International
@otisg ◦ @sematext ◦ sematext.com
sematext.com/search-analytics
2. About Otis Gospodnetić
• ASF Member: Lucene, Solr, Nutch, Mahout
• Author: Lucene in Action 1 & 2
• Entrepreneur: Sematext, Simpy
Copyright 2011 Sematext Int'l. All rights reserved.
3. Sematext Metrics
● 100% organic: no GMO, no VC
● 4 years old
● < 10 people
● 7 countries
● 3 timezones
● 2 continents
● > 100 customers
4. About Sematext
Products & Services
Consulting, Development, Tech Support:
● Search (Lucene, Solr, ElasticSearch...)
● Big Data (Hadoop, HBase, Voldemort...)
● Web Crawling (Nutch, Droids)
● Machine Learning (Mahout)
5. Agenda
● What is Search Analytics and why it matters
● Example reports and their value
● What we built, why, and how
6. Communication
● twitter.com/sematext
● twitter.com/otisg
● hash tags: #stsa or #stanalytics
● http://sematext.com/search-analytics/index.html
● Raise your hand!
● otis@sematext.com
7. The Compass
Search logs are your Map
Search Analytics is your Compass
8. High Level Why
search
users
search
experience
search
providers
9. High Level Why
This search sucks!
It takes 17 tries to find anything here!
F!?@#$%^&?!?
search
users
search
experience
search
providers
Cool, the latest search tweaks
made our site really sticky!
Awesome!
10. Don't Be Like This Dude
11. Got Clue?
Performance Monitoring
Tuning Search Analytics UI
Quality Assurance
12. More Concrete Why
● Measure and monitor everything. Introspection.
● Supports (re)design, navigation choices
● Helps with content acquisition & enhancement
● Improve search experience
● Mula
13. The Moment of Truth
Question for the audience #1
What do you use for Search Analytics?
a) Home grown stuff
b) Google Analytics
c) Omniture
d) Webtrends
e) Other
f) Nothing
14. Search Analytics Outline
● Collect: queries & clicks & interactions & ...
● Analyze: actions / transactions / conversions
● Output: reports – over time
● Output++: feedback loop (remember this)
● The means, not the goal
● Ongoing, not one-off
15. Search vs. Web Analytics
● User intent and information needs vs. inferring
● Hand in hand
● Ideally you can relate data from both or even unify it
16. Example Core Reports
● Rate & Volume, Latency (mean, avg, 90%)
● Click Through Rate, Mean Reciprocal Rank
● Top Queries by count, clicks, 0 hits...
● Query Trending
● Top Seen Docs, Top Clicked Docs (msft)
● Page & Click Depth
● Facet & Sort Usage
● ...
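Two of the reports above, Click Through Rate and Mean Reciprocal Rank, can be sketched in a few lines. This is a minimal illustration, not Sematext's implementation: the `SearchEvent` record and its fields are hypothetical, and searches without a click are assumed to contribute 0 to the reciprocal-rank mean.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class SearchEvent:
    """One search request (hypothetical record layout, for illustration only)."""
    query: str
    hits: int                            # number of results returned
    clicked_rank: Optional[int] = None   # 1-based rank of the first click, None if no click


def click_through_rate(events: Sequence[SearchEvent]) -> float:
    """Fraction of searches that led to at least one click."""
    if not events:
        return 0.0
    clicked = sum(1 for e in events if e.clicked_rank is not None)
    return clicked / len(events)


def mean_reciprocal_rank(events: Sequence[SearchEvent]) -> float:
    """Mean of 1/rank of the first click; searches without a click count as 0."""
    if not events:
        return 0.0
    total = sum(1.0 / e.clicked_rank for e in events if e.clicked_rank is not None)
    return total / len(events)


def top_queries(events: Sequence[SearchEvent], n: int = 10) -> List[Tuple[str, int]]:
    """Most frequent queries by count."""
    return Counter(e.query for e in events).most_common(n)


def zero_hit_queries(events: Sequence[SearchEvent]) -> List[str]:
    """Distinct queries that returned no results, in sorted order."""
    return sorted({e.query for e in events if e.hits == 0})


events = [
    SearchEvent("hbase", hits=10, clicked_rank=1),
    SearchEvent("flume", hits=5, clicked_rank=2),
    SearchEvent("solr facet", hits=0),
    SearchEvent("hbase", hits=10, clicked_rank=4),
]
print(click_through_rate(events), mean_reciprocal_rank(events))
```

A low Mean Reciprocal Rank combined with a healthy Click Through Rate suggests users do find something, but only after scanning far down the result list.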
17. More Reports in More Detail
● See "Search Analytics: What? Why? How?"
http://blog.sematext.com/tag/analytics/
18. Part Dos
Switching gears... Juno digs NoSQL
19. What We've Built
● Search Analytics SaaS
● Numerous reports (e.g. query volume, rate, latency, term frequencies / comparisons, hit buckets, search origins, etc.)
● Trending over time
● Comparisons of time periods
● Top N reports
● Filter, slice and dice
20. Who Needs a Compass?
● We need it
● search-hadoop.com & search-lucene.com
● Our customers need it!
● You?
24. SaaS vs. In-House
Question for the audience #2
SaaS vs in-house Search Analytics?
a) SaaS
b) in-house
29. Data Flow
● See Search Analytics with Flume and HBase
http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
30. Data Collection
● See Search Analytics with Flume and HBase
http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
31. Core Tech
● JavaScript Beacons
● Metric Capture Web App aka Receiver
● Flume Agents, Collectors, Sinks
● HBase
● MapReduce Aggregations
● Search Analytics Reporting Web App
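The "MapReduce Aggregations" step can be pictured as a map phase that emits a key per event and a reduce phase that sums per key. The sketch below runs in-process in plain Python and only illustrates the shape of such a job; the event fields (`ts`, `query`) and the hourly bucketing are assumptions, not details taken from the slides.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (hour bucket, normalized query)


def map_event(event: dict) -> Iterator[Tuple[Key, int]]:
    """Map phase: emit ((hour, query), 1) for each raw search event."""
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    hour = ts.strftime("%Y-%m-%dT%H")
    yield (hour, event["query"].strip().lower()), 1


def reduce_counts(key: Key, values: List[int]) -> Tuple[Key, int]:
    """Reduce phase: sum all counts emitted for one key."""
    return key, sum(values)


def run_job(events: Iterable[dict]) -> Dict[Key, int]:
    """Tiny in-process stand-in for the shuffle between map and reduce."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for event in events:
        for key, value in map_event(event):
            groups[key].append(value)
    return dict(reduce_counts(k, v) for k, v in groups.items())


raw = [
    {"ts": 0, "query": "HBase"},
    {"ts": 60, "query": "hbase "},
    {"ts": 3600, "query": "hbase"},
]
hourly = run_job(raw)
```

Hourly counts like these can then be rolled up into daily or weekly aggregates for trending and period-comparison reports.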
32. What is Flume
● Distributed data/log collection service
● Scalable, configurable, extensible
● Centrally manageable, open source
● Agents get data from app, Collectors save it
● Abstractions: Source → Decorator(s) → Sink
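The Source → Decorator(s) → Sink abstraction can be illustrated with plain Python generators. This is a conceptual sketch only: it does not use Flume's API, and every name in it (`list_source`, `add_host`, `drop_empty`, `MemorySink`) is made up for the example.

```python
from typing import Callable, Iterable, Iterator, List

Event = dict
Decorator = Callable[[Iterable[Event]], Iterator[Event]]


def list_source(lines: Iterable[str]) -> Iterator[Event]:
    """Source: turns raw lines into events."""
    for line in lines:
        yield {"body": line}


def add_host(host: str) -> Decorator:
    """Decorator factory: tags every event with the originating host."""
    def decorate(events: Iterable[Event]) -> Iterator[Event]:
        for event in events:
            yield {**event, "host": host}
    return decorate


def drop_empty(events: Iterable[Event]) -> Iterator[Event]:
    """Decorator: filters out events with a blank body."""
    for event in events:
        if event["body"].strip():
            yield event


class MemorySink:
    """Sink: stores events in memory (a real sink would write to HDFS, HBase, ...)."""

    def __init__(self) -> None:
        self.stored: List[Event] = []

    def write(self, events: Iterable[Event]) -> None:
        self.stored.extend(events)


def run_pipeline(source: Iterable[Event], decorators: List[Decorator], sink: MemorySink) -> None:
    """Chain the decorators over the source and drain the result into the sink."""
    stream: Iterable[Event] = source
    for decorate in decorators:
        stream = decorate(stream)
    sink.write(stream)


sink = MemorySink()
run_pipeline(list_source(["q=hbase", "", "q=flume"]), [drop_empty, add_host("web-1")], sink)
```

Because each stage only consumes and produces a stream of events, decorators can be added, removed, or reordered without touching the source or the sink.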
33. What is HBase
● Scalable, reliable, distributed, column-oriented DB
● On top of HDFS
● MapReducable
35. Why Flume
● Reliable delivery
● e.g. queue msgs locally if destination unreachable
● Easy, centralized management via Web UI or console
● Good community, good progress, now @ASF
● But: more complex, more moving parts
● On Flume: slideshare.net/cloudera/inside-flume
● Alternatives: Kafka, Scribe...
36. Why HBase
● Scalable raw & aggregate data storage
● MapReduce data input
● Fast scans for time ranges, fast key lookups
● Easy storage and compute power expansion
● Good looking roadmap, community, progress
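"Fast scans for time ranges, fast key lookups" follows from rows being stored sorted by key: if the timestamp is part of the row key, a time range becomes one contiguous slice. The sketch below emulates that with a sorted in-memory list; it is not HBase client code, and the key layout (`site|timestamp|sequence`) is a hypothetical choice made for illustration.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class SortedTable:
    """In-memory stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._values: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._values[key] = value

    def get(self, key: str) -> Optional[str]:
        """Point lookup by exact row key."""
        return self._values.get(key)

    def scan(self, start: str, stop: str) -> List[Tuple[str, str]]:
        """Range scan over [start, stop): one contiguous slice of the sorted keys."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def row_key(site: str, ts: int, seq: int = 0) -> str:
    """Hypothetical key layout; zero-padding makes string order match numeric order."""
    return f"{site}|{ts:010d}|{seq:04d}"


def scan_time_range(table: SortedTable, site: str, start_ts: int, end_ts: int):
    """All rows for one site with start_ts <= timestamp < end_ts."""
    return table.scan(f"{site}|{start_ts:010d}", f"{site}|{end_ts:010d}")


table = SortedTable()
table.put(row_key("a", 100), "hbase")
table.put(row_key("a", 150), "flume")
table.put(row_key("a", 200), "solr")
table.put(row_key("b", 150), "nutch")
rows = scan_time_range(table, "a", 100, 200)
```

One trade-off of keys that grow with time is that new writes all land in the same key range, which is the kind of hotspot that key-distribution schemes aim to spread out.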
37. Open Sourcing
● 2 open-source projects:
github.com/sematext/HBaseWD
github.com/sematext/HBaseHUT
● See sematext.com/open-source/index.html
● Patches for Flume and HBase
blog.sematext.com/tag/flume/
38. Challenges
● Data size. Solutions:
● Compression (4-5x smaller with lzo)
● Data pruning (variable levels)
● Query string distribution: very long-tail
● Lots of data to process, update, aggregate
● Young tools: Flume, HBase
● Poor IO on EC2
● Hadoop distributions
39. Output++
● AutoComplete - $MM improvement
● Better DYM Spellchecker
● Related Searches
● Recommendations
● Relevance Feedback
● ...
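As one example of feeding analytics output back into search, logged queries can drive AutoComplete. The following is a minimal sketch under simple assumptions (suggestions are past queries that share the typed prefix, ranked by how often they were searched); the function names are hypothetical and a production system would likely use a trie or a search index rather than a linear pass.

```python
from collections import Counter
from typing import Callable, Iterable, List


def build_suggester(queries: Iterable[str]) -> Callable[..., List[str]]:
    """Build a prefix suggester from logged queries, ranked by frequency."""
    counts = Counter(q.strip().lower() for q in queries if q.strip())

    def suggest(prefix: str, limit: int = 5) -> List[str]:
        p = prefix.strip().lower()
        matches = [(q, c) for q, c in counts.items() if q.startswith(p)]
        # Most frequent first; ties broken alphabetically for stable output.
        matches.sort(key=lambda qc: (-qc[1], qc[0]))
        return [q for q, _ in matches[:limit]]

    return suggest


log = ["hbase scan", "hbase scan", "hbase coprocessor", "hadoop", "HBase Scan"]
suggest = build_suggester(log)
print(suggest("hb"))
```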
40. Closing the Loop
search
users
search
experience
search
providers
41. Resource
Search Analytics for Your Site
Louis Rosenfeld
http://rosenfeldmedia.com/books/searchanalytics/
42. We're Hiring
Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig working with and in open-source?
We're hiring world-wide!
http://sematext.com/about/jobs.html
43. Contact
sematext.com
blog.sematext.com
@sematext
@otisg
otis@sematext.com
Want SA? Grab me or go to:
sematext.com/search-analytics
Hash tags: #stsa or #stanalytics