"The greater promise of Big Data lies not in doing old things in slightly new ways. Instead, it lies in doing new things that were previously not possible. One major class of new things is adding intelligence to large-scale systems. In this session I will present a survey of how machine learning can be applied to real-life situations without having to get a PhD in advanced mathematics. These systems can be built today from open source components to increase business revenues by understanding what customers need and want. I will provide real-world examples of best practices and pitfalls in machine learning, including practical ways to build maintainable, high-performance systems." - Ted Dunning
With the rise of Apache Hadoop, a next-generation enterprise data architecture is emerging that connects the systems powering business transactions and business intelligence. Hadoop is uniquely capable of storing, aggregating, and refining multi-structured data sources into formats that fuel new business insights. Apache Hadoop is fast becoming the de facto platform for processing Big Data. Hadoop started from a relatively humble beginning as a point solution for small search systems. Its growth into an important technology for the broader enterprise community dates back to Yahoo's 2006 decision to evolve Hadoop into a system for solving its internet-scale big data problems. Eric will discuss the current state of Hadoop and what is coming from a development standpoint as Hadoop evolves to meet more workloads.
Past, Present and Future of Data Processing in Apache Hadoop - DataWorks Summit
Apache Hadoop MapReduce has undergone a complete overhaul to emerge as Apache Hadoop YARN, a generic compute fabric to support MapReduce and other application paradigms. This really changes the game, recasting Hadoop as a much more powerful data-processing system. As a result, Hadoop looks very different from itself 12 months ago. Now, ever wonder what it might look like in 12 months, or 24 months, or longer? This talk will take you through some ideas for YARN itself and the myriad ways it's moving the needle for MapReduce, Pig, Hive, Cascading and other data-processing tools in the Hadoop ecosystem.
Splout SQL - Web latency SQL views for Hadoop - Datasalt
There are many Big Data problems whose output is also Big Data. In this presentation we will show Splout SQL, which allows serving an arbitrarily big dataset by partitioning it. Splout serves partitioned SQL views which are generated and indexed by Hadoop. Splout is to Hadoop + SQL what Voldemort or Elephant DB are to Hadoop + Key/Value. Hadoop is nowadays the de facto open-source solution for Big Data batch processing. When the output of a Hadoop process is big, there isn't a satisfying solution for serving it. Think of pre-computed recommendations, for example, where the whole dataset may vary from one day to another. Splout decouples database creation from database serving and makes it efficient and safe to deploy Hadoop-generated datasets. There are many databases that allow serving Big Data, such as NoSQL solutions, but they don't have a rich query language like SQL. You generally can't aggregate data in real time as you would with a GROUP BY clause. Because you can't precompute everything, SQL is a very convenient feature to have in a Big Data serving solution. Splout is not a "fast analytics" engine. Splout is made for demanding web or mobile applications where query performance is critical. Arbitrary real-time aggregations should complete in less than 200 milliseconds under high traffic load. On top of that, Splout is scalable, flexible, RESTful and open source.
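To make the serving model concrete, here is a toy sketch in Python with SQLite (an illustration of the idea only, not Splout's actual implementation; the table, dataset, and partition count are invented). Each query is routed to the single pre-built, indexed partition that owns the key, where full SQL, including GROUP BY aggregation, remains available:

```python
import sqlite3
import zlib

N_PARTITIONS = 4

def partition_for(key: str) -> int:
    """Deterministic key-to-partition mapping."""
    return zlib.crc32(key.encode()) % N_PARTITIONS

# "Generation" phase: build one indexed SQLite database per partition
# (Splout does this at scale inside a Hadoop job, then deploys the files).
partitions = [sqlite3.connect(":memory:") for _ in range(N_PARTITIONS)]
for db in partitions:
    db.execute("CREATE TABLE recs (user TEXT, item TEXT, score REAL)")
    db.execute("CREATE INDEX idx_user ON recs (user)")

rows = [("alice", "book-1", 0.9), ("alice", "book-2", 0.4),
        ("bob", "book-3", 0.7), ("alice", "book-1", 0.2)]
for user, item, score in rows:
    partitions[partition_for(user)].execute(
        "INSERT INTO recs VALUES (?, ?, ?)", (user, item, score))

# "Serving" phase: one partition answers each query, with real SQL
# aggregation (GROUP BY) rather than key/value lookups.
def query(user):
    db = partitions[partition_for(user)]
    return db.execute(
        "SELECT item, SUM(score) FROM recs WHERE user = ? "
        "GROUP BY item ORDER BY SUM(score) DESC", (user,)).fetchall()

print(query("alice"))   # book-1 ranks first after aggregation
```

The key design point is that the database files are immutable once generated, so a bad deployment can simply be rolled back to the previous set of partition files.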
"The greater promise of Big Data lies not in doing old things in slightly new ways. Instead, it lies in doing new things that were previously not possible. One major class of new things is adding intelligence to large-scale systems. In this session I will present a survey of how machine learning can be applied to real-life situations without having to get a PhD in advanced mathematics. These systems can be built today from open source components to increase business revenues by understanding what customers need and want. I will provide real world examples of best practices and pitfalls in machine learning including practical ways to build maintainable, high performance systems." - Ted Dunning
With the rise of Apache Hadoop, a next-generation enterprise data architecture is emerging that connects the systems powering business transactions and business intelligence. Hadoop is uniquely capable of storing, aggregating, and refining multi-structured data sources into formats that fuel new business insights. Apache Hadoop is fast becoming the defacto platform for processing Big Data. Hadoop started from a relatively humble beginning as a point solution for small search systems. Its growth into an important technology to the broader enterprise community dates back to Yahoo’s 2006 decision to evolve Hadoop into a system for solving its internet scale big data problems. Eric will discuss the current state of Hadoop and what is coming from a development standpoint as Hadoop evolves to meet more workloads.
Past Present and Future of Data Processing in Apache HadoopDataWorks Summit
Apache Hadoop MapReduce has undergone a complete re-haul to emerge as Apache Hadoop YARN, a generic compute fabric to support MapReduce and other application paradigms. This really changes the game to recast Hadoop as a much more powerful data-processing system. As a result Hadoop looks very different from itself 12 months ago. Now, ever wonder what it might look like in 12 months or 24 months or longer? This talk will take you through some ideas for YARN itself and the many myriad ways it`s really moving the needle for MapReduce, Pig, Hive, Cascading and other data-processing tools in the Hadoop ecosystem.
Splout SQL - Web latency SQL views for Hadoopdatasalt
There are many Big Data problems whose output is also Big Data. In this presentation we will show Splout SQL, which allows serving an arbitrarily big dataset by partitioning it. Splout serves partitioned SQL views which are generated and indexed by Hadoop. Splout is to Hadoop + SQL what Voldemort or Elephant DB are to Hadoop + Key/Value. Hadoop is nowadays the de-facto open-source solution for Big Data batch-processing. When the output of a Hadoop process is big, there isn`t a satisfying solution for serving it. Think of pre-computed recommendations, for example, where the whole dataset may vary from one day to another. Splout decouples database creation from database serving and makes it efficient and safe to deploy Hadoop-generated datasets. There are many databases that allow serving Big Data such as NoSQL solutions, but they don`t have a rich query language like SQL. You generally can`t aggregate data in real-time like you would do with a GROUP BY clause. Because you can`t precompute everything, SQL is a very convenient feature to have in a Big Data serving solution. Splout is not a “fast analytics” engine. Splout is made for demanding web or mobile applications where query performance is critical. Arbitrary real-time aggregations should be done in less than 200 milliseconds under high traffic load. On top of that, Splout is scalable, flexible, RESTful & open-source.
"The greater promise of Big Data lies not in doing old things in slightly new ways. Instead, it lies in doing new things that were previously not possible. One major class of new things is adding intelligence to large-scale systems. In this session I will present a survey of how machine learning can be applied to real-life situations without having to get a PhD in advanced mathematics. These systems can be built today from open source components to increase business revenues by understanding what customers need and want. I will provide real world examples of best practices and pitfalls in machine learning including practical ways to build maintainable, high performance systems." - Ted Dunning
WSO2Con USA 2017: Analytics Patterns for Your Digital Enterprise - WSO2
The WSO2 analytics platform provides a high-performance, lean, enterprise-ready streaming solution to solve data integration and analytics challenges faced by connected businesses. This platform offers real-time, interactive, machine learning and batch processing technologies that empower enterprises to build a digital business by connecting various enterprise data sources, enhancing understanding of the data and increasing internal productivity.
This session explores how to enable digital transformation by building a data analytics platform. It will discuss the following topics:
WSO2 Data Analytics Server architecture
Understanding streaming constructs
Architectural styles for data integration
Debugging and troubleshooting your integration
Deployment
Performance tuning
Production hardening
Slides from Enterprise Search & Analytics Meetup @ Cisco Systems - http://www.meetup.com/Enterprise-Search-and-Analytics-Meetup/events/220742081/
Relevancy and Search Quality Analysis - By Mark David and Avi Rappoport
The Manifold Path to Search Quality
To achieve accurate search results, we must come to an understanding of the three pillars involved.
1. Understand your data
2. Understand your customers’ intent
3. Understand your search engine
The first path passes through Data Analysis and Text Processing.
The second passes through Query Processing, Log Analysis, and Result Presentation.
Everything learned from those explorations feeds into the final path of Relevancy Ranking.
Search quality is focused on end users finding what they want: technical relevance is sometimes irrelevant! Working with the short head (very frequent queries) has the most return on investment for improving the search experience: tuning results, for example, to emphasize recent documents or de-emphasize archived ones, near-duplicate detection, exposing diverse results in ambiguous situations, using synonyms, and guiding search via best bets and auto-suggest. Long-tail analysis can reveal user intent by detecting patterns, discovering related terms, and identifying the most fruitful results through aggregated behavior. All of this feeds back into regression testing, which provides reliable metrics to evaluate the changes.
By merging these insights, you can improve the quality of the search overall, in a scalable and maintainable fashion.
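As a tiny, self-contained illustration of the short-head idea (the query log below is invented for the example), a few lines of analysis show how few distinct queries typically account for most of the traffic, and hence where tuning effort pays off first:

```python
from collections import Counter

# Hypothetical query log; in practice this comes from search analytics.
log = (["price list"] * 40 + ["returns"] * 25 + ["login"] * 15 +
       ["warranty xj-200", "blue widget manual", "error 403",
        "careers", "careers"])

counts = Counter(log)
total = sum(counts.values())

# Find the "short head": the fewest distinct queries covering 80% of traffic.
covered, head = 0, []
for q, n in counts.most_common():
    head.append(q)
    covered += n
    if covered / total >= 0.8:
        break

print(f"{len(head)} of {len(counts)} distinct queries cover "
      f"{covered / total:.0%} of all traffic: {head}")
```

Everything outside `head` is the long tail, where aggregate pattern analysis rather than per-query tuning is the practical approach.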
Exploratory Search upon Semantically Described Web Data Sources: Service regi... - Marco Brambilla
This presentation was given at the SSW workshop, collocated with VLDB 2012.
Exploratory search applications upon structured Web content are becoming one of the main information-seeking paradigms for users. This is mainly due to the move towards mobile and pervasive Web access and to the increasingly tight intertwining between everyday life and information seeking.
Structured data is typically distributed on the Web and accessible through a service-oriented paradigm. This paper proposes a vision of: (1) a semantically enabled service registration framework for conveniently describing Web data services; and (2) a design method for applications that exploit such a model using a design-pattern-based approach.
Webinar: Increase Conversion With Better Search - Lucidworks
Hear from IBM Product Line Manager Iris Yuan & Lucidworks VP of Partner Engineering Sarath Jarugula for a deep discussion into how improving ecommerce search can drive conversions and increase revenue.
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ... - Amazon Web Services
The Howard Hughes Corporation partnered with 47Lining to develop a managed enterprise data lake based on Amazon S3. The purpose of the managed EDL is to fuse relevant on-premises and third-party data to enable Howard Hughes to answer its most valuable business questions. Their first analysis was a lead-scoring model that uses Amazon Machine Learning (Amazon ML) to predict propensity to purchase high-end real estate. The model is based on a combined set of public and private data sources, including all publicly recorded real estate transactions in the US for the past 35 years. By changing their business process for identifying and qualifying leads to use the results of data-driven analytics from their managed data lake in AWS, Howard Hughes increased the number of identified qualified leads in their pipeline by over 400% and reduced the acquisition cost per lead by more than 10 times. In this session, you will see a practical example of how to use Amazon ML to improve business results, how to architect a data lake with Amazon S3 that fuses on-premises, third-party, and public data sets, and how to train and run an Amazon ML model to attain predictive accuracy.
HacktoberFestPune - DSC MESCOE x DSC PVGCOET - TanyaRaina3
HacktoberFestPune is a beginner-friendly, all-inclusive event that is absolutely free of cost. Certificates will be issued by DSC MESCOE and DSC PVGCOET for everyone who can complete 4 successful Pull Requests by 13th October 10 AM! An evening filled with speaker sessions, interactions with fellow developers, and mini-games, we think you'll have a great time with everyone!
We introduce the idea that metadata, including project information, data labels, data characteristics and indications of valuable use, can be propagated through a data-processing lineage graph. Further, finding examples of significant co-occurrence of propagated and original metadata gives us the basis of an interesting kind of search engine, one that gives interesting recommendations of data given a problem statement, even in a near cold-start situation.
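A minimal sketch of the propagation step might look like the following (the dataset names, edges, and labels are invented for illustration; a real system would walk the actual lineage graph recorded by the data platform):

```python
from collections import defaultdict

# Hypothetical lineage graph: edges point from an input dataset
# to the datasets derived from it.
lineage = {
    "raw_clicks": ["sessionized", "fraud_features"],
    "user_profiles": ["fraud_features"],
    "sessionized": ["daily_report"],
}

# Original metadata labels attached directly to source datasets.
labels = defaultdict(set, {
    "raw_clicks": {"pii", "clickstream"},
    "user_profiles": {"pii"},
})

# Propagate labels downstream to a fixed point: a derived dataset
# inherits every label of every upstream input, transitively.
changed = True
while changed:
    changed = False
    for src, dsts in lineage.items():
        for dst in dsts:
            new = labels[src] - labels[dst]
            if new:
                labels[dst] |= new
                changed = True

print(sorted(labels["daily_report"]))   # labels inherited transitively
```

Once propagated and original labels coexist on the same nodes, significant co-occurrences between them can be scored to drive the recommendation side of the idea.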
The folk wisdom has always been that when running stateful applications inside containers, the only viable choice is to externalize the state so that the containers themselves are stateless or nearly so. Keeping large amounts of state inside containers is possible, but it’s considered a problem because stateful containers generally can’t preserve that state across restarts.
In practice, this complicates the management of large-scale Kubernetes-based infrastructure because these high-performance storage systems require separate management. In terms of overall system management, it would be ideal if we could run a software-defined storage system directly in containers managed by Kubernetes, but that has been hampered by lack of direct device access and difficult questions about what happens to the state on container restarts.
Ted Dunning describes recent developments that make it possible for Kubernetes to manage both compute and storage tiers in the same cluster. Container restarts can be handled gracefully without loss of data or a requirement to rebuild storage structures and access to storage from compute containers is extremely fast. In some environments, it’s even possible to implement elastic storage frameworks that can fold data onto just a few containers during quiescent periods or explode it in just a few seconds across a large number of machines when higher speed access is required.
The benefits of systems like this extend beyond management simplicity: applications can be more agile precisely because the storage layer is more stable and can be uniformly accessed from any container host. Even better, it makes it a snap to configure and deploy a full-scale compute and storage infrastructure.
Ellen Friedman and I spoke at the ACM meetup about how stream-first architecture can have a big impact and how the logistics of machine learning is a great example of that impact.
This is my half of the presentation.
Tensor Abuse - how to reuse machine learning frameworks - Ted Dunning
Tensors are a very useful tool for mathematical programming. Moreover, the optimization frameworks that are part of most machine learning frameworks have some very cool uses outside of the normal machine learning kinds of tasks.
The logistics of machine learning typically take waaay more effort than the machine learning itself. Moreover, machine learning systems aren't like normal software projects so continuous integration takes on new meaning.
You know that a single number isn't a good summary of a measurement. T-digest and other non-uniform histograms can make it easy to keep track of an entire distribution and can be combined in OLAP queries.
The latest t-digest is faster, more accurate and has hard bounds on size.
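The t-digest algorithm itself is more subtle, but the flavor of a mergeable, non-uniform histogram can be sketched with logarithmic buckets: constant relative error per bucket, and trivially combinable across shards or OLAP cells. The toy class below is an illustration of that idea, not the t-digest algorithm:

```python
import math

class LogHistogram:
    """Toy non-uniform histogram with logarithmically sized buckets.
    Counts can be merged across shards, OLAP-style, without loss."""

    def __init__(self, factor=1.1):
        self.factor = factor
        self.bins = {}           # bucket index -> count
        self.count = 0

    def add(self, x):
        b = int(math.log(x) / math.log(self.factor))
        self.bins[b] = self.bins.get(b, 0) + 1
        self.count += 1

    def merge(self, other):
        for b, n in other.bins.items():
            self.bins[b] = self.bins.get(b, 0) + n
        self.count += other.count

    def quantile(self, q):
        target = q * self.count
        seen = 0
        for b in sorted(self.bins):
            seen += self.bins[b]
            if seen >= target:
                return self.factor ** (b + 0.5)   # geometric bucket midpoint
        return None

# Two shards summarize their own data, then combine for a global quantile.
h1, h2 = LogHistogram(), LogHistogram()
for v in range(1, 501):
    h1.add(v)
for v in range(501, 1001):
    h2.add(v)
h1.merge(h2)
print(h1.quantile(0.5))   # roughly the true median of 1..1000
```

The same merge-then-query pattern is what makes distribution sketches useful in OLAP: each cell stores a small histogram, and any rollup is just a merge.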
This talk shows practical methods for finding changes in a variety of kinds of data, as well as giving real-world examples from finance, telecom, systems monitoring and natural language processing.
This was one of the talks that I gave at the Strata San Jose conference. I migrated my topic a bit, but here is the original abstract:
Application developers and architects today are interested in making their applications as real-time as possible. To make an application respond to events as they happen, developers need a reliable way to move data as it is generated across different systems, one event at a time. In other words, these applications need messaging.
Messaging solutions have existed for a long time. However, when compared to legacy systems, newer solutions like Apache Kafka offer higher performance, more scalability, and better integration with the Hadoop ecosystem. Kafka and similar systems are based on drastically different assumptions than legacy systems and have vastly different architectures. But do these benefits outweigh any tradeoffs in functionality? Ted Dunning dives into the architectural details and tradeoffs of both legacy and new messaging solutions to find the ideal messaging system for Hadoop.
Topics include:
* Queues versus logs
* Security issues like authentication, authorization, and encryption
* Scalability and performance
* Handling applications that span multiple data centers
* Multitenancy considerations
* APIs, integration points, and more
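As a toy contrast with queue semantics (purely illustrative, not Kafka's API), the log model keeps every message in order and lets each consumer track its own read offset, so many independent consumers can replay the same stream:

```python
class Log:
    """Minimal append-only log, Kafka-style: the broker retains messages;
    consumers, not the broker, remember how far they have read."""

    def __init__(self):
        self.messages = []

    def append(self, msg):
        self.messages.append(msg)
        return len(self.messages) - 1          # offset of the new message

    def read(self, offset, max_n=10):
        return self.messages[offset:offset + max_n]

log = Log()
for e in ["click:home", "click:cart", "purchase:42"]:
    log.append(e)

# Two consumers at different offsets see the stream independently:
# one replays everything, the other only reads what is new to it.
analytics_offset, billing_offset = 0, 2
print(log.read(analytics_offset))
print(log.read(billing_offset))
```

In a traditional queue, delivering a message removes it, so adding a second consumer requires a second queue; in the log model, adding a consumer is just adding an offset.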
This talk focuses on how larger data sets are not only enabling advanced techniques, but also increasing the number of problems within reach of relatively simple techniques, that is, "cheap learning".
These are the slides from my talk at FAR Con in Minneapolis recently. The topics are the implications of buried treasure hoards on data security, horror stories and new, simpler and provably secure methods for public data disclosure.
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time - Ted Dunning
This talk describes how indicator-based recommendations can be evolved in real time. Normally, indicator-based recommendations use a large off-line computation to understand the general structure of items to be recommended and then make recommendations in real-time to users based on a comparison of their recent history versus the large-scale product of the off-line computation.
In this talk, I show how the same components of the off-line computation that guarantee linear scalability in a batch setting also give strict real-time bounds on the cost of a practical real-time implementation of the indicator computation.
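One standard way to score candidate indicators in a co-occurrence analysis is the log-likelihood ratio (G²) test on a 2x2 contingency table; the sketch below shows the scoring function itself (the example counts are invented, and building the tables from user histories is the part the off-line computation handles at scale):

```python
import math

def entropy(*counts):
    """Unnormalized entropy term: sum of c * log(c / total)."""
    total = sum(counts)
    return sum(c * math.log(c / total) for c in counts if c > 0)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 co-occurrence table:
    k11 = users with both items, k12/k21 = one item only, k22 = neither."""
    return 2 * (entropy(k11, k12, k21, k22)
                - entropy(k11 + k12, k21 + k22)
                - entropy(k11 + k21, k12 + k22))

# Independent events score near zero; strong co-occurrence scores high,
# which is what flags an item pair as a useful indicator.
print(llr(10, 10, 10, 10))    # ~0: no association
print(llr(100, 5, 5, 1000))   # large positive: strong indicator
```

Because the score depends only on four counts, it can be updated incrementally as new interactions arrive, which is what makes a real-time variant of the computation tractable.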
How the Internet of Things is Turning the Internet Upside Down - Ted Dunning
This is a wide-ranging talk that goes into how the internet is architected, how that architecture is changing as a result of the internet of things, how the internet of things worked in the 19th century, big data, the open-source community, and how to build time-series databases to make this all possible.
Really.
Apache Kylin - OLAP Cubes for SQL on Hadoop - Ted Dunning
Apache Kylin (incubating) is a new project to bring OLAP cubes to Hadoop. I walk through the project and describe how it works and how users see the project.
Many statistics are impossible to compute precisely on streaming data. There are some very clever algorithms, however, which allow us to compute very good approximations of these values efficiently in terms of CPU and memory.
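A classic example is counting distinct items in a stream. The sketch below is a bare-bones HyperLogLog-style estimator (an illustration of the idea, not a production implementation): a hash of each item selects a register, and the register remembers the longest run of leading zero bits ever seen, from which the cardinality can be estimated in tiny, fixed memory:

```python
import hashlib
import math

class HyperLogLog:
    """Bare-bones HyperLogLog: approximate distinct count in 2^p registers."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                      # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:             # small-range correction
            return int(self.m * math.log(self.m / zeros))
        return int(raw)

hll = HyperLogLog()
for i in range(10000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")        # duplicates never change the estimate
print(hll.estimate())           # close to 10000, from only 1024 registers
```

With p = 10 the expected relative error is roughly 3%, regardless of how many items flow through, which is the CPU/memory trade-off such algorithms make.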
Anomaly Detection - New York Machine Learning - Ted Dunning
Anomaly detection is the art of finding what you don't know how to ask for. In this talk, I walk through the why and how of building probabilistic models for a variety of problems including continuous signals and web traffic. This talk blends theory and practice in a highly approachable way.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. I have also often seen developers implement front-end features just by following the standard rules of a framework, assuming that this is enough to successfully launch the project, and then the project fails. How can this be prevented, and what approach should you choose? I have launched dozens of complex projects, and during the talk we will analyze which approaches have worked for me and which have not.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation takes real work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Search and Society: Reimagining Information Access for Radical Futures - Bhaskar Mitra
The field of information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by Rik Marselis and me from the 30.5.2024 DASA Connect conference. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We closed with a lovely workshop in which participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Kubernetes & AI - Beauty and the Beast!? @ KCD Istanbul 2024 - Tobias Schneck
As AI technology pushes into IT, I wondered, as an "infrastructure container Kubernetes guy", how does this fancy AI technology get managed from an infrastructure operations view? Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premises strategy we may need to apply AI to our own infrastructure and make it work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerabilities and security breaches. This needs to be achieved with existing toolchains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
2. Agenda
• Intelligence – Artificial or Reflected
• Quick survey of machine learning
– without a PhD
– not all of it
• Available components
• What do customers really want
5. Reflected Intelligence!
• Society is not just a million individuals
• A web service with a million users is not the
same as a million users each with a computer
• Social computing emerges
6. What is Machine Learning?
• Statistics, but …
• New focus on prediction rather than
hypothesis testing
• Prediction means held-out data, not just the
future (now-casting)
7. The Classics
• Unsupervised
– AKA clustering (but not what you think that is)
– Mixture models, Markov models and more
– Learn from unlabeled data, describe it predictively
• Supervised
– AKA classification
– Learn from labeled data, guess labels for new data
• Also semi-supervised and hundreds of variants
8. Recent Insurgents
• Collaborative learning
– models that learn about you based on others
• Meta-modeling
– models that learn to reason about what other
models say
• Interactive systems
– systems that pick what to learn from
9. Techniques
• Surprise and coincidence
• Anomalous indicators
• Non-textual search using textual tools
• Dithering
• Meta-learning
11. A vice president of South Carolina Bank and Trust in Bamberg,
Maxwell has served as a tireless champion for economic
development in Bamberg County since 1999, welcoming
industrial prospects to the county and working with existing
industries in their expansion efforts. Maxwell served for many
years as the president of the Bamberg County Chamber of
Commerce and remains an active member today.
12. The goal of learning is prediction. Learning falls into many
categories, including supervised learning, unsupervised learning,
online learning, and reinforcement learning. From the
perspective of statistical learning theory, supervised learning is
best understood.
13. Surprise and Coincidence
• Which words stand out in these examples?
• Which are just there because these are in
English?
• The words “the” and “Bamberg” both occur 3
times in the second article
– which is the more interesting statistic? Why?
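The question the slide poses — why three occurrences of "Bamberg" are more interesting than three occurrences of "the" — is what the log-likelihood ratio (G²) test quantifies: it scores how far a word's count in one document departs from what its corpus-wide frequency predicts. A minimal pure-Python sketch (the helper names and example counts are mine, for illustration only):

```python
import math

def xlogx(x):
    # x * log(x) with the convention 0 * log(0) = 0
    return x * math.log(x) if x > 0 else 0.0

def g2(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table:
    k11 = occurrences of the word in this document
    k12 = occurrences of the word in the rest of the corpus
    k21 = occurrences of other words in this document
    k22 = occurrences of other words in the rest of the corpus"""
    total = xlogx(k11 + k12 + k21 + k22)
    rows = xlogx(k11 + k12) + xlogx(k21 + k22)
    cols = xlogx(k11 + k21) + xlogx(k12 + k22)
    cells = xlogx(k11) + xlogx(k12) + xlogx(k21) + xlogx(k22)
    return 2.0 * (cells + total - rows - cols)

# "the": 3 times here, but also everywhere else in the corpus
common = g2(3, 100000, 200, 2000000)
# "Bamberg": 3 times here, almost nowhere else
surprising = g2(3, 5, 200, 2100000)
assert surprising > common
```

The same score, implemented in Mahout's LogLikelihood class, drives the anomaly and cooccurrence computations this deck relies on.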
14. More Surprise
• Anomalous indicators
– Events that occur before other events
– But occur anomalously often
• Indicators are not causes
• Nor certain
15. Example #1- Auto Insurance
• Predict probability of attrition and loss for
auto insurance customers
• Transactional variables include
– Claim history
– Traffic violation history
– Geographical code of residence(s)
– Vehicles owned
• Observed attrition and loss define past
behavior
16. Derived Variables
• Split training data according to observable classes
• Define LLR variables for each class/variable combination
• These 2 × m × v derived variables can be used for clustering (spectral, k-means, neural gas, ...)
• Proximity in LLR space to clusters gives the new modeling variables
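The recipe above can be sketched end to end: one LLR score per class/value combination becomes the derived feature vector, and distances to cluster centroids in that space become the modeling variables. Everything here (function names, class labels, centroid shapes) is illustrative, not code from the talk:

```python
import math

def xlogx(x):
    return x * math.log(x) if x > 0 else 0.0

def llr(k11, k12, k21, k22):
    # G^2 score: does this value cooccur with this class anomalously often?
    total = xlogx(k11 + k12 + k21 + k22)
    rows = xlogx(k11 + k12) + xlogx(k21 + k22)
    cols = xlogx(k11 + k21) + xlogx(k12 + k22)
    cells = xlogx(k11) + xlogx(k12) + xlogx(k21) + xlogx(k22)
    return 2.0 * (cells + total - rows - cols)

def derived_features(value_counts, class_totals):
    """One LLR score per (class, value) combination.
    value_counts[c] = how often this customer's value appears in class c
    class_totals[c] = total observations in class c"""
    total = sum(class_totals.values())
    value_total = sum(value_counts.values())
    feats = []
    for c in class_totals:
        k11 = value_counts.get(c, 0)            # value within this class
        k12 = value_total - k11                 # value in the other classes
        k21 = class_totals[c] - k11             # other values, this class
        k22 = (total - value_total) - k21       # other values, other classes
        feats.append(llr(k11, k12, k21, k22))
    return feats

def distance_features(point, centroids):
    # proximity in LLR space to each cluster => the new modeling variables
    return [math.dist(point, c) for c in centroids]
```

The clustering itself (spectral, k-means, neural gas) would come from a library or from Mahout; only the feature derivation is sketched here.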
17. Example #2 – Fraud Detection
• Predict probability that an account is likely
to result in charge-off due to fraud
• Transactional variables include
– Zip code
– Recent payments and charges
– Recent non-monetary transactions
• Bad payments, charge-off, delinquency are
observable behavioral outcomes
18. Derived Variables
• Split training data according to observable classes
• Define LLR variables for each class/variable combination
• These 2 × m × v derived variables can be used directly as model variables
19. Search Abuse
• Non-textual search using textual tools
– A document can contain non-word tokens
– These might be anomalous indicators of an event
• Solr and similar engines can search for indicators
– If we have a history of recent indicators, search
finds possible follow-on events
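One way this "search abuse" might look in practice: recent indicator tokens become an OR query against an indexed field of non-word tokens. The `indicators` field name and token formats are assumptions, not a fixed Solr schema:

```python
def indicator_query(recent_indicators, field="indicators", max_terms=50):
    """Build a Lucene/Solr-style OR query from recent indicator tokens.
    Documents whose indicator tokens overlap the recent history score
    highest, surfacing possible follow-on events."""
    terms = recent_indicators[-max_terms:]   # cap at the most recent history
    return " OR ".join(f'{field}:"{t}"' for t in terms)

q = indicator_query(["evt_17", "evt_42", "mrc_9931"])
# q == 'indicators:"evt_17" OR indicators:"evt_42" OR indicators:"mrc_9931"'
```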
20. Introducing Noise
• Dithering
– add noise
– less for high ranks, more for low ranks
• Softens page boundary effects
• Introduces more exploration
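One common dithering recipe (my choice for illustration; the exact noise schedule is a tuning decision) perturbs log(rank) with fixed-size Gaussian noise, which automatically moves top results a little and deep results a lot:

```python
import math
import random

def dither(results, epsilon=0.5, rng=None):
    """Re-rank results by log(rank) + N(0, log(1 + epsilon)).
    Because ranks are compared on a log scale, a fixed noise level
    barely disturbs the head of the list but shuffles the tail,
    softening page-boundary effects and adding exploration."""
    rng = rng or random.Random()
    sd = math.log(1.0 + epsilon)
    scored = [(math.log(rank) + rng.gauss(0.0, sd), item)
              for rank, item in enumerate(results, start=1)]
    scored.sort(key=lambda pair: pair[0])
    return [item for _, item in scored]
```

Each impression draws a fresh dithered order, so repeated visitors see slightly different result pages and the system learns about items it would otherwise never show.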
22. Available components
• Mahout
– LLR test for anomaly
– Cooccurrence computations
– Baseline components of Bayesian Bandits
• Solr
– Ready to roll for search
26. Input Data
• User transactions
– user id, merchant id
– SIC code, amount
• Offer transactions
– user id, offer id
– vendor id, merchant id’s,
– offers, views, accepts
27. Input Data
• User transactions
– user id, merchant id
– SIC code, amount
• Offer transactions
– user id, offer id
– vendor id, merchant id’s,
– offers, views, accepts
• Derived merchant data
– merchant id's
– SIC codes
– offer & vendor id's
• Derived user data – local top40
– SIC code
– vendor code
– amount distribution
28. Cross-recommendation
• Per merchant indicators
– merchant id’s
– chain id’s
– SIC codes
– offer vendor id’s
• Computed by finding anomalous (indicator =>
merchant) rates
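The anomalous (indicator => merchant) computation can be sketched in memory: count how often two items share a user history, then keep the pairs whose cooccurrence beats what popularity alone predicts (G² above a threshold). At production scale this is a Mahout job; the threshold value and helper names below are illustrative:

```python
import math
from collections import Counter

def xlogx(x):
    return x * math.log(x) if x > 0 else 0.0

def g2(k11, k12, k21, k22):
    # log-likelihood ratio for a 2x2 cooccurrence table
    return 2.0 * (xlogx(k11) + xlogx(k12) + xlogx(k21) + xlogx(k22)
                  + xlogx(k11 + k12 + k21 + k22)
                  - xlogx(k11 + k12) - xlogx(k21 + k22)
                  - xlogx(k11 + k21) - xlogx(k12 + k22))

def indicators(histories, threshold=10.0):
    """Find anomalous (indicator => merchant) pairs.
    histories: per-user lists of item ids (merchant id's, SIC codes,
    offer vendor id's...). A pair is kept when the items cooccur in
    user histories far more often than popularity predicts."""
    item_users = Counter()
    pair_users = Counter()
    n_users = len(histories)
    for h in histories:
        items = set(h)
        item_users.update(items)
        for a in items:                    # O(len^2) per history; fine
            for b in items:                # for a sketch, not for scale
                if a != b:
                    pair_users[(a, b)] += 1
    result = {}
    for (a, b), k11 in pair_users.items():
        k12 = item_users[a] - k11          # a without b
        k21 = item_users[b] - k11          # b without a
        k22 = n_users - k11 - k12 - k21    # neither
        if g2(k11, k12, k21, k22) > threshold:
            result.setdefault(b, []).append(a)   # a indicates b
    return result
```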
30. Search-based Recommendations
• Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40
31. Search-based Recommendations
• Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant id's
– Indicator industry (SIC) id's
– Indicator offers
– Indicator text
– Local top40
• Sample query
– Current location
– Recent merchant descriptions
– Recent merchant id's
– Recent SIC codes
– Recent accepted offers
– Local top40
32. Solr
[Architecture diagram: complete user history feeds a cooccurrence computation (Mahout); the indexer combines its output with item meta-data and writes Solr index shards]
33. Solr
[Architecture diagram: the web tier sends user searches to the Solr index shards; the indexer keeps the shards up to date from history and item meta-data]
34. Objective Results
• At a very large credit card company
• History is all transactions, all web interaction
• Processing time cut from 20 hours per day to 3
• Recommendation engine load time decreased
from 8 hours to 3 minutes
35. Platform Needs
• Need to root web services and search system on the
cluster
– Copying negates unification
• Legacy indexers are extremely fast … but they assume
conventional file access
• High performance search engines need high
performance file I/O
• Need coordinated process management
36. Additional Opportunities
• Cross recommend from search queries to
documents
• Result is semantic search engine
• Uses reflected intelligence instead of artificial
intelligence
38. Another Example
• Users enter queries (A)
– (actor = user, item=query)
• Users view videos (B)
– (actor = user, item=video)
• A’A gives query recommendation
– “did you mean to ask for”
• B’B gives video recommendation
– “you might like these videos”
39. The punch-line
• B’A recommends videos in response to a
query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
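The A'A / B'B / B'A algebra is just cooccurrence counting over two event logs. A toy sketch with raw counts (a real system would LLR-filter the counts, as with the other cooccurrence work in this deck; the event data is invented):

```python
from collections import Counter, defaultdict

def cross_recommend(queries, views):
    """B'A as cooccurrence counting.
    queries: (user, query) events -> matrix A
    views:   (user, video) events -> matrix B
    (B'A)[query][video] counts users who issued the query and watched
    the video; no content or meta-data is consulted."""
    a = defaultdict(set)   # user -> queries issued
    b = defaultdict(set)   # user -> videos viewed
    for user, q in queries:
        a[user].add(q)
    for user, v in views:
        b[user].add(v)
    bta = defaultdict(Counter)   # query -> video counts
    for user in a:
        for q in a[user]:
            for v in b.get(user, ()):
                bta[q][v] += 1
    return bta

recs = cross_recommend(
    [("u1", "flamenco"), ("u2", "flamenco"), ("u3", "rock")],
    [("u1", "paco_live"), ("u2", "paco_live"), ("u3", "van_halen")])
# recs["flamenco"].most_common() ranks videos for that query
```

Counting with `a[user]` against `a[user]` instead gives A'A (query-to-query, "did you mean"), and `b` against `b` gives B'B (video-to-video).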
40. Real-life example
• Query: “Paco de Lucia”
• Conventional meta-data search results:
– “hombres del paco” times 400
– not much else
• Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff
42. Hypothetical Example
• Want a navigational ontology?
• Just put labels on a web page with traffic
– This gives A = users x label clicks
• Remember viewing history
– This gives B = users x items
• Cross recommend
– B’A = label to item mapping
• After several users click, results are whatever
users think they should be
43. Next Steps
• That is up to you
• But I can help
– platforms (Solr, MapR)
– techniques (Mahout, math)
tdunning@maprtech.com
@ted_dunning
@ApacheMahout