Many statistics are impossible to compute precisely on streaming data. There are some very clever algorithms, however, which allow us to compute very good approximations of these values efficiently in terms of CPU and memory.
Anomaly Detection - New York Machine LearningTed Dunning
Anomaly detection is the art of finding what you don't know how to ask for. In this talk, I walk through the why and how of building probabilistic models for a variety of problems including continuous signals and web traffic. This talk blends theory and practice in a highly approachable way.
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeTed Dunning
This talk describes how indicator-based recommendations can be evolved in real time. Normally, indicator-based recommendations use a large off-line computation to understand the general structure of items to be recommended and then make recommendations in real-time to users based on a comparison of their recent history versus the large-scale product of the off-line computation.
In this talk, I show how the same components of the off-line computation that guarantee linear scalability in a batch setting also give strict real-time bounds on the cost of a practical real-time implementation of the indicator computation.
This talk focuses on how larger data sets are not only enabling advanced techniques, but also increasing the number of problems within reach of relatively simple techniques, that is "cheap learning".
These are the slides from my talk at FAR Con in Minneapolis recently. The topics are the implications of buried treasure hoards on data security, horror stories and new, simpler and provably secure methods for public data disclosure.
This talk describes the general architecture common to anomaly detections systems that are based on probabilistic models. By examining several realistic use cases, I illustrate the common themes and practical implementation methods.
This is the position talk that I gave at CIKM. Included are 4 algorithms that I feel don't get much academic attention, but which are very important industrially. It isn't necessarily true that these algorithms *should* get academic attention, but I do feel that it is true that they are quite important pragmatically speaking.
Anomaly Detection - New York Machine LearningTed Dunning
Anomaly detection is the art of finding what you don't know how to ask for. In this talk, I walk through the why and how of building probabilistic models for a variety of problems including continuous signals and web traffic. This talk blends theory and practice in a highly approachable way.
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeTed Dunning
This talk describes how indicator-based recommendations can be evolved in real time. Normally, indicator-based recommendations use a large off-line computation to understand the general structure of items to be recommended and then make recommendations in real-time to users based on a comparison of their recent history versus the large-scale product of the off-line computation.
In this talk, I show how the same components of the off-line computation that guarantee linear scalability in a batch setting also give strict real-time bounds on the cost of a practical real-time implementation of the indicator computation.
This talk focuses on how larger data sets are not only enabling advanced techniques, but also increasing the number of problems within reach of relatively simple techniques, that is "cheap learning".
These are the slides from my talk at FAR Con in Minneapolis recently. The topics are the implications of buried treasure hoards on data security, horror stories and new, simpler and provably secure methods for public data disclosure.
This talk describes the general architecture common to anomaly detections systems that are based on probabilistic models. By examining several realistic use cases, I illustrate the common themes and practical implementation methods.
This is the position talk that I gave at CIKM. Included are 4 algorithms that I feel don't get much academic attention, but which are very important industrially. It isn't necessarily true that these algorithms *should* get academic attention, but I do feel that it is true that they are quite important pragmatically speaking.
Apache Mahout is changing radically. Here is a report on what is coming, notably including an R like domain specific language that can use multiple computational engines such as Spark.
Building multi-modal recommendation engines using search enginesTed Dunning
This is my strata NY talk about how to build recommendation engines using common items. In particular, I show how multi-modal recommendations can be built using the same framework.
These are the slides that we used to ignite the conversation with the audience at Hadoop Summit EU. Come over to the Mahout dev list to be part of the ongoing conversation.
This was one of the talks that I gave at the Strata San Jose conference. I migrated my topic a bit, but here is the original abstract:
Application developers and architects today are interested in making their applications as real-time as possible. To make an application respond to events as they happen, developers need a reliable way to move data as it is generated across different systems, one event at a time. In other words, these applications need messaging.
Messaging solutions have existed for a long time. However, when compared to legacy systems, newer solutions like Apache Kafka offer higher performance, more scalability, and better integration with the Hadoop ecosystem. Kafka and similar systems are based on drastically different assumptions than legacy systems and have vastly different architectures. But do these benefits outweigh any tradeoffs in functionality? Ted Dunning dives into the architectural details and tradeoffs of both legacy and new messaging solutions to find the ideal messaging system for Hadoop.
Topics include:
* Queues versus logs
* Security issues like authentication, authorization, and encryption
* Scalability and performance
* Handling applications that span multiple data centers
* Multitenancy considerations
* APIs, integration points, and more
This talk shows practical methods for find changes in a variety of kinds of data as well as giving real-world examples from finance, telecom, systems monitoring and natural language processing.
Using Mahout and a Search Engine for RecommendationTed Dunning
I presented this talk at the Open World Forum in Paris in 2013. The ideas here are that you can do basic recommendations and extended forms of recommendation such as intelligent search or cross recommendation or multi-modal recommendation using Mahout's cooccurrence analysis together with a search engine.
Cognitive computing with big data, high tech and low tech approachesTed Dunning
I explain some very approachable methods for analyzing big data via a detour through clipper ships and the 19th century open source scene.
Note that I mixed up the route of the Flying Cloud record in this talk. The Flying Cloud's record was actually from New York to San Francisco and was even more impressive than what I said. The usual time had been about 180 days. With Maury's charts, the time was reduced to about 135 days. The Flying Cloud's time was 89 days.
Thanks to Chen Kung for noticing my error.
You know that a single number isn't a good summary of a measurement. T-digest and other non-uniform histograms can make it easy to keep track of an entire distribution and can be combined in OLAP queries.
The latest t-digest is faster, more accurate and has hard bounds on size.
Recent work in recommendations allows some really amazing simplicity of implementation while extending the inputs handled to multiple kinds of interactions against items different from the ones being recommended.
Tensor Abuse - how to reuse machine learning frameworksTed Dunning
Tensors are a very useful tool for mathematical programming. Moreover, the optimization frameworks that are part of most machine learning frameworks have some very cool uses outside of the normal machine learning kinds of tasks.
Ted Dunning, Chief Application Architect, MapR at MLconf SFMLconf
Abstract: Near real-time Updates for Cooccurrence-based Recommenders
Most recommendation algorithms are inherently batch oriented and require all relevant history to be processed. In some contexts such as music, this does not cause significant problems because waiting a day or three before recommendations are available for new items doesn’t significantly change their impact. In other contexts, the value of items drops precipitously with time so that recommending day-old items has little value to users.
In this talk, I will describe how a large-scale multi-modal cooccurrence recommender can be extended to include near real-time updates. In addition, I will show how these real-time updates are compatible with delivery of recommendations via search engines.
Apache Mahout is changing radically. Here is a report on what is coming, notably including an R like domain specific language that can use multiple computational engines such as Spark.
Building multi-modal recommendation engines using search enginesTed Dunning
This is my strata NY talk about how to build recommendation engines using common items. In particular, I show how multi-modal recommendations can be built using the same framework.
These are the slides that we used to ignite the conversation with the audience at Hadoop Summit EU. Come over to the Mahout dev list to be part of the ongoing conversation.
This was one of the talks that I gave at the Strata San Jose conference. I migrated my topic a bit, but here is the original abstract:
Application developers and architects today are interested in making their applications as real-time as possible. To make an application respond to events as they happen, developers need a reliable way to move data as it is generated across different systems, one event at a time. In other words, these applications need messaging.
Messaging solutions have existed for a long time. However, when compared to legacy systems, newer solutions like Apache Kafka offer higher performance, more scalability, and better integration with the Hadoop ecosystem. Kafka and similar systems are based on drastically different assumptions than legacy systems and have vastly different architectures. But do these benefits outweigh any tradeoffs in functionality? Ted Dunning dives into the architectural details and tradeoffs of both legacy and new messaging solutions to find the ideal messaging system for Hadoop.
Topics include:
* Queues versus logs
* Security issues like authentication, authorization, and encryption
* Scalability and performance
* Handling applications that span multiple data centers
* Multitenancy considerations
* APIs, integration points, and more
This talk shows practical methods for find changes in a variety of kinds of data as well as giving real-world examples from finance, telecom, systems monitoring and natural language processing.
Using Mahout and a Search Engine for RecommendationTed Dunning
I presented this talk at the Open World Forum in Paris in 2013. The ideas here are that you can do basic recommendations and extended forms of recommendation such as intelligent search or cross recommendation or multi-modal recommendation using Mahout's cooccurrence analysis together with a search engine.
Cognitive computing with big data, high tech and low tech approachesTed Dunning
I explain some very approachable methods for analyzing big data via a detour through clipper ships and the 19th century open source scene.
Note that I mixed up the route of the Flying Cloud record in this talk. The Flying Cloud's record was actually from New York to San Francisco and was even more impressive than what I said. The usual time had been about 180 days. With Maury's charts, the time was reduced to about 135 days. The Flying Cloud's time was 89 days.
Thanks to Chen Kung for noticing my error.
You know that a single number isn't a good summary of a measurement. T-digest and other non-uniform histograms can make it easy to keep track of an entire distribution and can be combined in OLAP queries.
The latest t-digest is faster, more accurate and has hard bounds on size.
Recent work in recommendations allows some really amazing simplicity of implementation while extending the inputs handled to multiple kinds of interactions against items different from the ones being recommended.
Tensor Abuse - how to reuse machine learning frameworksTed Dunning
Tensors are a very useful tool for mathematical programming. Moreover, the optimization frameworks that are part of most machine learning frameworks have some very cool uses outside of the normal machine learning kinds of tasks.
Ted Dunning, Chief Application Architect, MapR at MLconf SFMLconf
Abstract: Near real-time Updates for Cooccurrence-based Recommenders
Most recommendation algorithms are inherently batch oriented and require all relevant history to be processed. In some contexts such as music, this does not cause significant problems because waiting a day or three before recommendations are available for new items doesn’t significantly change their impact. In other contexts, the value of items drops precipitously with time so that recommending day-old items has little value to users.
In this talk, I will describe how a large-scale multi-modal cooccurrence recommender can be extended to include near real-time updates. In addition, I will show how these real-time updates are compatible with delivery of recommendations via search engines.
A talk that Ted Dunning gave at the Big Data Analytics meetup hosted by Klout about how real-time and long-time can be integrated into a single computation.
Talk on the upcoming Mahout nearest neighbor framework focussing particularly on the k-means acceleration provided by the streaming k-means implementation.
From the Hadoop Summit 2015 Session with Ted Dunning:
Just when we thought the last mile problem was solved, the Internet of Things is turning the last mile problem of the consumer internet into the first mile problem of the industrial internet. This inversion impacts every aspect of the design of networked applications. I will show how to use existing Hadoop ecosystem tools, such as Spark, Drill and others, to deal successfully with this inversion. I will present real examples of how data from things leads to real business benefits and describe real techniques for how these examples work.
Python is dominating the fast-growing data-science landscape. This talk provides a foundational overview of the practice of data science and some of the most popular Python libraries for doing data science. It also provides an overview of how Anaconda brings it all together.
Practical deep learning for computer visionEran Shlomo
This is the presentation given in TLV DLD 2017. In this presentation we walk through the planning and implemintation of deeplearning solution for image recognition, with focus on the data.
It is based on the work we do at dataloop.ai and its customers.
We introduce the idea that metadata, including project information, data labels, data characteristics and indications of valuable use, can be propagated through a data processing lineage graph. Further, finding examples of significant cooccurrence of propagated and original metadata gives us the basis of an interesting kind of search engine gives interesting recommendations of data given a problem statement even in a near cold-start situation.
The folk wisdom has always been that when running stateful applications inside containers, the only viable choice is to externalize the state so that the containers themselves are stateless or nearly so. Keeping large amounts of state inside containers is possible, but it’s considered a problem because stateful containers generally can’t preserve that state across restarts.
In practice, this complicates the management of large-scale Kubernetes-based infrastructure because these high-performance storage systems require separate management. In terms of overall system management, it would be ideal if we could run a software-defined storage system directly in containers managed by Kubernetes, but that has been hampered by lack of direct device access and difficult questions about what happens to the state on container restarts.
Ted Dunning describes recent developments that make it possible for Kubernetes to manage both compute and storage tiers in the same cluster. Container restarts can be handled gracefully without loss of data or a requirement to rebuild storage structures and access to storage from compute containers is extremely fast. In some environments, it’s even possible to implement elastic storage frameworks that can fold data onto just a few containers during quiescent periods or explode it in just a few seconds across a large number of machines when higher speed access is required.
The benefits of systems like this extend beyond management simplicity, because applications can be more Agile precisely because the storage layer is more stable and can be uniformly accessed from any container host. Even better, it makes it a snap to configure and deploy a full-scale compute and storage infrastructure.
Ellen Friedman and I spoke at the ACM meetup about how stream-first architecture can have a big impact and how the logistics of machine learning is a great example of that impact.
This is my half of the presentation.
The logistics of machine learning typically take waaay more effort than the machine learning itself. Moreover, machine learning systems aren't like normal software projects so continuous integration takes on new meaning.
How the Internet of Things is Turning the Internet Upside DownTed Dunning
This is a wide-ranging talk that goes into how the internet is architected, how that architecture is changing as a result of internet of things, how the internet of things worked in the 19th century big data, open-source community and how to build time-series databases to make this all possible.
Really.
Apache Kylin - OLAP Cubes for SQL on HadoopTed Dunning
Apache Kylin (incubating) is a new project to bring OLAP cubes to Hadoop. I walk through the project and describe how it works and how users see the project.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/