A talk that Ted Dunning gave at the Big Data Analytics meetup hosted by Klout about how real-time and long-time can be integrated into a single computation.
Talk on the upcoming Mahout nearest neighbor framework focussing particularly on the k-means acceleration provided by the streaming k-means implementation.
This set of slides describes several on-line learning algorithms which, taken together, can provide significant benefit to real-time applications. Given by Ted Dunning at Strata New York.
I gave this talk at Buzzwords just now to fill in for an ill speaker.
The topics include things that are being added to or taken out of Mahout. These include cruft (out), fast clustering (in), nearest neighbor search (in), Pig bindings for Mahout (who knows).
Many statistics are impossible to compute precisely on streaming data. There are some very clever algorithms, however, which allow us to compute very good approximations of these values efficiently in terms of CPU and memory.
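As one illustration of the kind of approximation meant here, the sketch below estimates the number of distinct items in a stream while keeping only the `k` smallest hash values. The technique (a k-minimum-values sketch) and all names in it (`KMVSketch`, `hash01`, the default `k`) are my own choices for illustration and are not taken from the talk.

```python
import hashlib
import heapq


def hash01(item):
    """Map an item to a pseudo-uniform float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Approximate distinct counting that remembers only k hash values."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # negated hashes, so the largest kept hash is on top
        self._members = set()  # hashes currently kept, to ignore repeats

    def add(self, item):
        x = hash01(item)
        if x in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -x)
            self._members.add(x)
        elif x < -self._heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(self._heap, -x)
            self._members.discard(evicted)
            self._members.add(x)

    def estimate(self):
        if len(self._heap) < self.k:
            # Fewer than k distinct hashes seen, so the count is exact.
            return float(len(self._heap))
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


sketch = KMVSketch(k=1024)
for i in range(50_000):
    sketch.add(f"user-{i}")
    sketch.add(f"user-{i}")  # repeats do not change the estimate
print(round(sketch.estimate()))
```

Memory stays fixed at `k` values no matter how long the stream runs, and the relative error shrinks roughly as one over the square root of `k`.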
Weather, opponents, geopolitics: there are so many uncertainties in such a case. How can power systems be managed in spite of these uncertainties, and how should investments be decided?
Talk at Saint-Etienne in 2015; thanks to R. Leriche and to the "games and optimizations" days in Saint-Etienne.
We introduce the idea that metadata, including project information, data labels, data characteristics and indications of valuable use, can be propagated through a data processing lineage graph. Further, finding examples of significant cooccurrence of propagated and original metadata gives us the basis of an interesting kind of search engine that gives interesting recommendations of data given a problem statement, even in a near cold-start situation.
The folk wisdom has always been that when running stateful applications inside containers, the only viable choice is to externalize the state so that the containers themselves are stateless or nearly so. Keeping large amounts of state inside containers is possible, but it’s considered a problem because stateful containers generally can’t preserve that state across restarts.
In practice, this complicates the management of large-scale Kubernetes-based infrastructure because these high-performance storage systems require separate management. In terms of overall system management, it would be ideal if we could run a software-defined storage system directly in containers managed by Kubernetes, but that has been hampered by lack of direct device access and difficult questions about what happens to the state on container restarts.
Ted Dunning describes recent developments that make it possible for Kubernetes to manage both compute and storage tiers in the same cluster. Container restarts can be handled gracefully without loss of data or a requirement to rebuild storage structures, and access to storage from compute containers is extremely fast. In some environments, it’s even possible to implement elastic storage frameworks that can fold data onto just a few containers during quiescent periods or explode it in just a few seconds across a large number of machines when higher speed access is required.
The benefits of systems like this extend beyond management simplicity: applications can be more agile precisely because the storage layer is more stable and can be uniformly accessed from any container host. Even better, it makes it a snap to configure and deploy a full-scale compute and storage infrastructure.
Ellen Friedman and I spoke at the ACM meetup about how stream-first architecture can have a big impact and how the logistics of machine learning is a great example of that impact.
This is my half of the presentation.
Tensor Abuse - how to reuse machine learning frameworks
Ted Dunning
Tensors are a very useful tool for mathematical programming. Moreover, the optimization frameworks that are part of most machine learning frameworks have some very cool uses outside of the normal machine learning kinds of tasks.
The logistics of machine learning typically take waaay more effort than the machine learning itself. Moreover, machine learning systems aren't like normal software projects so continuous integration takes on new meaning.
You know that a single number isn't a good summary of a measurement. T-digest and other non-uniform histograms can make it easy to keep track of an entire distribution and can be combined in OLAP queries.
The latest t-digest is faster, more accurate and has hard bounds on size.
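To make the idea of a mergeable, non-uniform histogram concrete, here is a minimal sketch. It is not t-digest: it is a much simpler histogram with logarithmically spaced bins, chosen only because it fits in a few lines. It works for positive values only and has no hard bound on size. The class name `LogHistogram` and the `rel_err` parameter are hypothetical.

```python
import math
from collections import Counter


class LogHistogram:
    """Histogram with log-spaced bins; quantiles have bounded relative error."""

    def __init__(self, rel_err=0.01):
        self.gamma = (1 + rel_err) / (1 - rel_err)
        self.log_gamma = math.log(self.gamma)
        self.counts = Counter()  # bin index -> number of samples
        self.n = 0

    def add(self, x):
        if x <= 0:
            raise ValueError("this sketch only handles positive values")
        # Bin i covers the interval (gamma**(i-1), gamma**i].
        self.counts[math.ceil(math.log(x) / self.log_gamma)] += 1
        self.n += 1

    def merge(self, other):
        """Fold another histogram with the same rel_err into this one."""
        self.counts.update(other.counts)
        self.n += other.n

    def quantile(self, q):
        if self.n == 0:
            raise ValueError("empty histogram")
        rank = q * (self.n - 1)
        seen = 0
        for idx in sorted(self.counts):
            seen += self.counts[idx]
            if seen > rank:
                # Point inside the bin that minimizes worst-case relative error.
                return 2 * self.gamma**idx / (self.gamma + 1)


# Two partial summaries, e.g. from two shards, combined at query time.
shard_a, shard_b = LogHistogram(), LogHistogram()
for latency_ms in range(1, 501):
    shard_a.add(latency_ms)
for latency_ms in range(501, 1001):
    shard_b.add(latency_ms)
shard_a.merge(shard_b)
print(shard_a.quantile(0.5), shard_a.quantile(0.99))
```

Because merging only adds bin counts, summaries built independently can be combined in any order, which is the property that makes this kind of structure usable in roll-up queries.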
This talk shows practical methods for finding changes in a variety of kinds of data as well as giving real-world examples from finance, telecom, systems monitoring and natural language processing.
This was one of the talks that I gave at the Strata San Jose conference. I migrated my topic a bit, but here is the original abstract:
Application developers and architects today are interested in making their applications as real-time as possible. To make an application respond to events as they happen, developers need a reliable way to move data as it is generated across different systems, one event at a time. In other words, these applications need messaging.
Messaging solutions have existed for a long time. However, when compared to legacy systems, newer solutions like Apache Kafka offer higher performance, more scalability, and better integration with the Hadoop ecosystem. Kafka and similar systems are based on drastically different assumptions than legacy systems and have vastly different architectures. But do these benefits outweigh any tradeoffs in functionality? Ted Dunning dives into the architectural details and tradeoffs of both legacy and new messaging solutions to find the ideal messaging system for Hadoop.
Topics include:
* Queues versus logs
* Security issues like authentication, authorization, and encryption
* Scalability and performance
* Handling applications that span multiple data centers
* Multitenancy considerations
* APIs, integration points, and more
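The first topic, queues versus logs, can be shown with a toy in-memory model. This is only a sketch of the two delivery semantics; the class and method names (`Queue`, `Log`, `poll`, `seek`) are hypothetical and do not reproduce the API of Kafka or of any other messaging system.

```python
from collections import deque


class Queue:
    """Queue semantics: each message is removed when one consumer takes it."""

    def __init__(self):
        self._items = deque()

    def publish(self, msg):
        self._items.append(msg)

    def consume(self):
        return self._items.popleft() if self._items else None


class Log:
    """Log semantics: messages are retained; each consumer tracks its own offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> next offset to read

    def publish(self, msg):
        self._records.append(msg)
        return len(self._records) - 1  # offset of the new record

    def poll(self, consumer, max_records=10):
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer, offset):
        """Rewind or skip ahead; a later poll replays from this offset."""
        self._offsets[consumer] = offset


queue, log = Queue(), Log()
for event in ["e1", "e2", "e3"]:
    queue.publish(event)
    log.publish(event)

print(queue.consume(), queue.consume())       # two workers split the messages
print(log.poll("billing"), log.poll("audit"))  # each consumer sees every record
log.seek("audit", 0)
print(log.poll("audit"))                      # replay from the beginning
```

In the queue, consuming a message destroys it, so the broker must track delivery per message. In the log, reading is non-destructive, so the only per-consumer state is a single offset, and replay is a matter of resetting it.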
This talk focuses on how larger data sets are not only enabling advanced techniques, but also increasing the number of problems within reach of relatively simple techniques, that is "cheap learning".
These are the slides from my talk at FAR Con in Minneapolis recently. The topics are the implications of buried treasure hoards on data security, horror stories and new, simpler and provably secure methods for public data disclosure.
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Ted Dunning
This talk describes how indicator-based recommendations can be evolved in real time. Normally, indicator-based recommendations use a large off-line computation to understand the general structure of items to be recommended and then make recommendations in real-time to users based on a comparison of their recent history versus the large-scale product of the off-line computation.
In this talk, I show how the same components of the off-line computation that guarantee linear scalability in a batch setting also give strict real-time bounds on the cost of a practical real-time implementation of the indicator computation.
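A rough sketch of how bounded work per event can arise is given below. It assumes that the relevant component is a cap on how much of each user's history is kept, which is my assumption and is not stated in this abstract. It also uses raw cooccurrence counts where a real system would apply a statistical test to pick indicators. All names (`CooccurrenceCounter`, `max_history`) are hypothetical.

```python
from collections import defaultdict, deque


class CooccurrenceCounter:
    """Incrementally counts item cooccurrence with bounded work per event."""

    def __init__(self, max_history=50):
        self.max_history = max_history
        self.history = defaultdict(deque)  # user -> most recent distinct items
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, user, item):
        past = self.history[user]
        if item in past:
            return
        # At most max_history pairs are updated, however busy the user is.
        for other in past:
            self.counts[item][other] += 1
            self.counts[other][item] += 1
        past.append(item)
        if len(past) > self.max_history:
            past.popleft()

    def top_cooccurring(self, item, n=3):
        row = self.counts[item]
        return sorted(row, key=lambda other: (-row[other], other))[:n]


counter = CooccurrenceCounter(max_history=50)
for user, item in [("u1", "puppy"), ("u1", "pony"),
                   ("u2", "puppy"), ("u2", "pony"), ("u2", "kitten")]:
    counter.observe(user, item)
print(counter.top_cooccurring("puppy"))
```

The cap means each incoming event touches at most `max_history` counters, so the cost of an update has a fixed upper bound that does not depend on the size of the catalog or on the total volume of history.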
How the Internet of Things is Turning the Internet Upside Down
Ted Dunning
This is a wide-ranging talk that goes into how the internet is architected, how that architecture is changing as a result of the internet of things, how the internet of things worked in the 19th-century big data, open-source community, and how to build time-series databases to make this all possible.
Really.
Apache Kylin - OLAP Cubes for SQL on Hadoop
Ted Dunning
Apache Kylin (incubating) is a new project to bring OLAP cubes to Hadoop. I walk through the project and describe how it works and how users see the project.
Anomaly Detection - New York Machine Learning
Ted Dunning
Anomaly detection is the art of finding what you don't know how to ask for. In this talk, I walk through the why and how of building probabilistic models for a variety of problems including continuous signals and web traffic. This talk blends theory and practice in a highly approachable way.
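The general shape of a probabilistic anomaly detector can be sketched as follows: fit a model of normal behavior, score each new point by how likely it is under that model, and flag the points that fall below a threshold taken from a low quantile of the training scores. The single Gaussian, the 1% quantile and the function names are illustrative assumptions, not the specific models from the talk.

```python
import math
import statistics


def fit_gaussian(samples):
    """Estimate the mean and standard deviation of normal behavior."""
    return statistics.fmean(samples), statistics.stdev(samples)


def log_likelihood(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))


def anomaly_threshold(samples, mu, sigma, quantile=0.01):
    """Log-likelihood below which only about `quantile` of training data falls."""
    scores = sorted(log_likelihood(x, mu, sigma) for x in samples)
    return scores[int(quantile * (len(scores) - 1))]


def is_anomaly(x, mu, sigma, threshold):
    return log_likelihood(x, mu, sigma) < threshold


# Synthetic "normal" signal: a value oscillating around 10.
training = [10 + math.sin(i) for i in range(1000)]
mu, sigma = fit_gaussian(training)
threshold = anomaly_threshold(training, mu, sigma)

for value in (10.0, 10.5, 20.0):
    print(value, is_anomaly(value, mu, sigma, threshold))
```

Setting the threshold from a quantile of observed scores, instead of choosing it by hand, keeps the alert rate roughly fixed even when the scale of the data changes.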
Cognitive computing with big data, high tech and low tech approaches
Ted Dunning
I explain some very approachable methods for analyzing big data via a detour through clipper ships and the 19th century open source scene.
Note that I mixed up the route of the Flying Cloud record in this talk. The Flying Cloud's record was actually from New York to San Francisco and was even more impressive than what I said. The usual time had been about 180 days. With Maury's charts, the time was reduced to about 135 days. The Flying Cloud's time was 89 days.
Thanks to Chen Kung for noticing my error.
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
The New Frontiers of AI in RPA with UiPath Autopilot™
UiPathCommunity
In this free online event, organized by the UiPath Italian Community, you can explore the new features of Autopilot, the tool that integrates artificial intelligence into the development and use of automations.
📕 Together we will look at some examples of using Autopilot in several tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.