This document summarizes a presentation on using SQL and machine learning to detect fraud. It describes using SQL rules to flag fraudulent transactions based on attributes such as transaction amount and sender/receiver; this achieved a 1% error rate but still lost money. Machine learning was then applied by splitting the data into training and test sets, building a naive Bayes model, and computing metrics such as the confusion matrix. Combining the SQL rules with the ML results lowered the error rate below either approach alone. The document emphasizes that the real world requires balancing false positives against false negatives based on their business impact, and promotes using deterministic and probabilistic approaches together.
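The combination described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's actual code: the feature names, likelihood table, prior, and thresholds below are all invented for the example.

```python
import math

TRUSTED = {"acme_corp"}   # hypothetical list of known-good receivers
PRIOR_FRAUD = 0.01

# Illustrative per-feature likelihoods P(feature | class); a real model
# would estimate these from the training split described in the talk.
LIKELIHOOD = {
    ("high_amount", True):   {"fraud": 0.6, "legit": 0.02},
    ("high_amount", False):  {"fraud": 0.4, "legit": 0.98},
    ("new_receiver", True):  {"fraud": 0.7, "legit": 0.10},
    ("new_receiver", False): {"fraud": 0.3, "legit": 0.90},
}

def features(txn):
    return {"high_amount": txn["amount"] > 10_000,
            "new_receiver": txn["receiver"] not in TRUSTED}

def naive_bayes_p_fraud(txn):
    # Naive Bayes in log space to avoid floating-point underflow
    log_f = math.log(PRIOR_FRAUD)
    log_l = math.log(1 - PRIOR_FRAUD)
    for name, value in features(txn).items():
        log_f += math.log(LIKELIHOOD[(name, value)]["fraud"])
        log_l += math.log(LIKELIHOOD[(name, value)]["legit"])
    return 1 / (1 + math.exp(log_l - log_f))

def rule_flag(txn):
    # Deterministic SQL-style rule on amount and receiver
    return txn["amount"] > 10_000 and txn["receiver"] not in TRUSTED

def flagged(txn, threshold=0.5):
    # Combine both approaches: either signal flags the transaction
    return rule_flag(txn) or naive_bayes_p_fraud(txn) > threshold

print(flagged({"amount": 15_000, "receiver": "unknown_wallet"}))  # True
print(flagged({"amount": 50, "receiver": "acme_corp"}))           # False
```

Tuning `threshold` is where the false-positive/false-negative trade-off from the talk shows up: lowering it catches more fraud at the cost of blocking more legitimate transactions.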
Time Series Analysis by JavaScript, LL matsuri 2013 (Daichi Morifuji)
This document discusses time series analysis and visualization using JavaScript. It introduces Series.js, a utility library for time series data that provides methods for aggregation, statistics, and visualization. Sample time series CPU and memory usage data is provided, and it is summarized using Series.js methods to aggregate the data by minute. Future work includes improving performance, documentation, and client-side capabilities of Series.js.
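The minute-level aggregation that Series.js performs can be sketched in Python (the library itself is JavaScript; the sample data and field layout below are invented for illustration):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical CPU-usage samples as (unix seconds, CPU %) pairs
samples = [
    (1385000005, 21.0),
    (1385000030, 35.0),
    (1385000065, 50.0),
    (1385000090, 40.0),
]

def aggregate_by_minute(points):
    """Bucket samples by minute boundary, then average each bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % 60].append(value)   # truncate to minute boundary
    return {minute: mean(vals) for minute, vals in sorted(buckets.items())}

print(aggregate_by_minute(samples))
```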
Cloud Native Night, December 2020, talk by Jörg Viechtbauer (Senior Software Architect, QAware)
== Please download slides if blurred! ==
Abstract:
Neural networks like BERT have revolutionized the processing of natural language and achieve state-of-the-art performance in many NLP tasks. One of them is semantic search where documents are found by query intent and not only by exact match.
This talk takes us through the history of information retrieval and shows how keyword search evolved into the term vector model. The desire for better search led to the first semantic models such as LSI and PLSA. We will see how this culminates today in sophisticated deep neural networks that perform nonlinear dimensionality reduction and master long-range dependencies.
Semantic search has never been as good and easy to implement as it is today.
About Jörg:
Jörg is a search expert at QAware and uses neural networks for semantic search and text comprehension. He has spent almost 20 years developing search engines based on both proprietary and open source software for enterprise search, eDiscovery and local search - always hunting for the perfect ranking formula.
Apply Hammer Directly to Thumb; Avoiding Apache Spark and Cassandra AntiPatt... (Databricks)
This document discusses common anti-patterns when using Spark with Cassandra. It begins by introducing the authors and their experience. The main section describes several common issues like out of memory errors, RPC failures, and slow performance. It then discusses the most common performance pitfall of collecting and re-parallelizing data. Alternative approaches are provided. Other topics covered include predicate pushdowns, serialization, and understanding how Catalyst optimizes queries.
What's the great thing about a database? Why, it stores data of course! However, one feature that makes a database useful is the different data types that can be stored in it, and the breadth and sophistication of the data types in PostgreSQL is second-to-none, including some novel data types that do not exist in any other database software!
This talk will take an in-depth look at the special data types built right into PostgreSQL version 9.4, including:
* INET types
* UUIDs
* Geometries
* Arrays
* Ranges
* Document-based Data Types:
  * Key-value store (hstore)
  * JSON (text [JSON] & binary [JSONB])
We will also have some cleverly concocted examples to show how all of these data types can work together harmoniously.
Optimizing Slow Queries with Indexes and Creativity (MongoDB)
This document discusses optimizing slow queries in MongoDB through the use of indexes and schema design. It begins with examples of common performance issues like inefficient reads from table scans and writes causing locks. Each example shows the slow query, explains the problem, and proposes a solution like adding an index. It emphasizes that new indexes only scale up to a point and sometimes the best approach is to redesign the schema or add hardware. The talk provides guidance on measuring performance, finding slow queries, determining if they can be indexed, and rephrasing problems in terms of desired queries.
Mongo or Die: How MongoDB Powers Doodle or Die (Aaron Silverman)
"Doodle or Die" is a popular online drawing game built on Node.js and MongoDB. It started as an entry in the 2011 Node Knockout competition and, after winning the "Most Fun" category, has grown into a game with thousands of daily players who have produced millions of drawings. This talk uses Doodle or Die as a vehicle to showcase the many strengths of MongoDB as a go-to database for small and midsize applications. It also covers lessons learned as we used MongoDB to rapidly develop and scale Doodle or Die.
MongoDC 2012: How MongoDB Powers Doodle or Die (MongoDB)
MongoDB powers the online drawing game Doodle or Die by storing user and game data in flexible document structures within several MongoDB collections. As the game grew popular with millions of players, the MongoDB database scaled to support increased traffic. Key collections include players, chains, and groups, with embedded subdocuments used to group related data and optimize queries. MongoDB provides a simple and powerful way to store the complex interactive data needed to power real-time multiplayer games like Doodle or Die at scale.
The data deluge we are currently witnessing presents both opportunities and challenges. Never before have so many aspects of our world been so thoroughly quantified, and never before has data been so plentiful. One of the main beneficiaries of this data glut has been data-intensive approaches such as neural networks in general and deep learning in particular, with many important new practical applications arising in the last couple of years.
In this tutorial we provide a practical introduction to the most important concepts of Neural Networks as we gradually implement simple neural networks from scratch to perform simple tasks such as character recognition. Our emphasis will be on understanding the architecture, training, advantages and pitfalls of practical neural networks.
Participants are expected to have a basic familiarity with calculus and linear algebra as well as working proficiency with the Python programming language.
Data Wars: The Bloody Enterprise strikes back (Victor_Cr)
I would like to describe cases where we accidentally create problems for our "future selves". I will show how different Java data types can ease or increase the pain of supporting an application later, along with the most common pitfalls and tricky corner cases you have probably never thought about.
Software is eating the world. The rate at which we produce new software is astounding. Understanding and preventing potential issues is a growing concern.
Building software security teams is much different than building IT security teams. It requires different backgrounds and focus. Software security groups without an emphasis on software fail.
Join Aaron as he talks about the right way to build and run a software security group. You will walk away with a concrete list of actions that you can take back to your job and start working on right away.
Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding (MongoDB)
In version 2.4, MongoDB introduces hash-based sharding, allowing the user to shard based on a randomized shard key to spread documents evenly across a cluster. Hash-based sharding is an alternative to range-based sharding, making it easier to manage your growing cluster. In this talk, we'll provide an overview of this new feature and discuss the pros and cons of a hash-based vs. range-based approach.
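The contrast between the two strategies can be sketched in Python. This is a toy illustration, not MongoDB's actual chunk-assignment logic; the 4-shard cluster, the key range, and both shard functions are invented for the example.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4
keys = range(1000, 1100)   # e.g. monotonically increasing _id values

def range_shard(key):
    # Range-based: contiguous key ranges map to the same shard, so
    # monotonically increasing inserts all land on one "hot" shard.
    return min(key // 10_000, NUM_SHARDS - 1)

def hash_shard(key):
    # Hash-based: hashing the key spreads nearby keys across shards.
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(Counter(range_shard(k) for k in keys))   # everything on one shard
print(Counter(hash_shard(k) for k in keys))    # roughly even split
```

The flip side, which range-based sharding keeps, is that hash-based sharding turns range queries over the shard key into scatter-gather operations across all shards.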
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4, presented by B... (MongoDB)
In version 2.4, MongoDB introduces hash-based sharding, a new option for distributing data in sharded collections. Hash-based sharding and range-based sharding present different advantages for MongoDB users deploying large scale systems. In this talk, we'll provide an overview of this new feature and discuss when to use hash-based sharding or range-based sharding.
Beyond PHP - it's not (just) about the code (Wim Godden)
Most PHP developers focus on writing code. But creating Web applications is about much more than just writing PHP. Take a step outside the PHP cocoon and into the big PHP ecosphere to find out how small code changes can make a world of difference on servers and network. This talk is an eye-opener for developers who spend over 80% of their time coding, debugging and testing.
Midway in our life's journey, I went astray from the straight imperative road and woke to find myself alone in a dark declarative wood.
My guide out of this dark declarative wood was a familiar friend, SQL, who showed me the way to wrap a context of a window to push through using Window Functions to escape the Inferno.
Next I found myself somewhere in between, running uphill with one foot in front of the other so that the leading foot was always above the ground. Running with my friend LINQ, I was able to wrap the context of a collection around my data and advance my journey through Purgatorio.
My last guide into the blinding brilliant light of Paradiso was from the Dutch Caribbean, who taught me how to wrap my computations into a context and move my data through leading me into brilliant bliss.
Join me on my divine data comedy.
This document summarizes a presentation given at Paris Data Ladies #6 on Spark Streaming. The presentation introduced Spark Streaming, discussing how it can be used to perform real-time data analysis. It covered topics such as batch versus streaming processing, why Spark Streaming is useful, and how to build a basic Twitter trending hashtag application using Spark Streaming. Code examples were provided and a live demo of the Twitter application was shown. Considerations for productionizing Spark Streaming applications were also discussed.
This webinar took place on August 23, 2012.
Never worry about servers. Never worry about config files. Never worry about patches. Simply focus on your data with Heroku Postgres.
PostgreSQL is a powerful, reliable, and durable open-source SQL-compliant database. Now available as a fully-managed cloud database from salesforce.com, Heroku Postgres reduces the costs and administrative overhead compared to operating your own database. You can even create a database instance within seconds with a single click.
Watch this webinar to learn about:
:: When to use Heroku Postgres versus Database.com
:: What data you can and should store in Heroku Postgres
:: Architecting your application with Heroku Postgres
:: How to efficiently share data in your organization with Dataclips
:: How to take advantage of features such as Fork and Follow to scale
This document discusses statistical spam filtering techniques. It describes decoding and tokenizing emails and building a decision matrix based on probability and interestingness. It discusses Bayesian filtering and the accuracy challenges of filtering a large volume of email versus an individual's email. Machine learning techniques like training on large datasets are mentioned. Blacklists, whitelists, and "honeypot" email addresses for identifying spammers are also summarized.
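The Bayesian combination step can be sketched in a few lines of Python. This is a toy Graham-style filter, not the talk's implementation; the per-token spam probabilities below are invented, where a real filter would learn them from a corpus of labeled mail.

```python
import math

# Hypothetical learned probabilities P(spam | token appears)
token_spam_prob = {"viagra": 0.99, "offer": 0.85,
                   "meeting": 0.10, "lunch": 0.05}

def spam_score(tokens, default=0.4):
    """Combine per-token probabilities with Bayes' rule, assuming
    token independence; unseen tokens get a neutral-ish default."""
    probs = [token_spam_prob.get(t, default) for t in tokens]
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1 - p) for p in probs)
    return math.exp(log_spam) / (math.exp(log_spam) + math.exp(log_ham))

print(round(spam_score(["viagra", "offer"]), 3))   # near 1.0
print(round(spam_score(["meeting", "lunch"]), 3))  # near 0.0
```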
We all have tasks from time to time for bulk-loading external data into MySQL. What's the best way of doing this? That's the task I faced recently when I was asked to help benchmark a multi-terabyte database. We had to find the most efficient method to reload test data repeatedly without taking days to do it each time. In my presentation, I'll show you several alternative methods for bulk data loading, and describe the practical steps to use them efficiently. I'll cover SQL scripts, the mysqlimport tool, MySQL Workbench import, the CSV storage engine, and the Memcached API. I'll also give MySQL tuning tips for data loading, and how to use multi-threaded clients.
Probabilistic Data Structures and Approximate Solutions (Oleksandr Pryymak)
Probabilistic and approximate data structures can provide scalable solutions when exact answers are not required. They trade accuracy for speed and efficiency. Approaches like sampling, hashing, cardinality estimation, and probabilistic databases allow analyzing large datasets while controlling error rates. Example techniques discussed include Bloom filters, locality-sensitive hashing, count-min sketches, HyperLogLog, and feature hashing for machine learning. The talk provided code examples and comparisons of these probabilistic methods.
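A Bloom filter, one of the structures named above, illustrates the accuracy-for-space trade: membership tests may yield false positives but never false negatives. This is a minimal sketch with illustrative sizes, not tuned parameters.

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive k independent-ish positions by salting one hash function
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True means "probably present"; False is always definitive
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
for word in ("spark", "cassandra", "mongodb"):
    bf.add(word)
print(bf.might_contain("spark"))     # True: added items are always found
print(bf.might_contain("postgres"))  # almost certainly False
```

With only 9 of 1024 bits set, the false-positive rate here is tiny; it grows as the filter fills, which is why `size` and `hashes` are chosen from the expected item count.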
A good data model is key to getting the best performance from Apache Cassandra. The log-structured storage engine and its distributed architecture mean we cannot rely on a paradigm such as Normal Form to evaluate a model. Instead we need to design data models that support the read path of the application. In this talk Aaron Morton will walk through the key principles and patterns of a good CQL3 data model using simple examples.
Learning to build distributed systems the hard way (Theo Hultberg)
This document discusses some key lessons the author has learned about building distributed systems at scale:
- Start with more servers than you need and scale out capacity from the beginning to solve scaling problems proactively.
- Put significant thought into how data is keyed and identified to facilitate queries and sharding.
- Consistency is not always critical, and distributed systems can function with some degree of inconsistency.
- Precompute frequently queried data when possible to avoid ad hoc queries.
- Separate processing from storage to allow independent scaling.
- Carefully plan how data will be deleted as removal is complex.
The document discusses topics related to just-in-time (JIT) compilation in the Java Virtual Machine (JVM). It explains that the JIT compiler is triggered when methods are invoked many times based on invocation and back-edge counters, compiling frequently used code to machine code for improved performance. It also discusses why ahead-of-time compilation is not well-suited for Java due to its dynamic nature, and what can cause compiled code to be deoptimized, such as class initialization failures, null pointer exceptions, and unstable control flow.
- The document discusses optimization techniques for SQL injection attacks, including reducing injection length, improving data retrieval speed, leveraging data compression, and exploiting vulnerabilities through blind SQL injection.
- Specific techniques mentioned include using shorter SQL functions like SUBSTR() instead of SUBSTRING(), retrieving hashed data a byte at a time using logical AND operations, and ordering queries randomly to retrieve data in a non-sequential manner.
- The document provides examples of exploiting a blind SQL injection vulnerability through techniques like ordering results based on random number seeds and retrieving multi-byte values with a binary search approach.
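The binary-search extraction in the last bullet can be sketched against a purely local stand-in oracle. Here `oracle(pos, value)` is a hypothetical function playing the role of one boolean blind-injection response ("is the byte at `pos` greater than `value`?"); the secret value is invented for the demo.

```python
SECRET = b"s3cr3t"   # stands in for the value the attacker cannot read

def oracle(pos, value):
    # In a real blind injection this would be one HTTP request whose
    # true/false outcome leaks via page content or response timing.
    return SECRET[pos] > value

def extract_byte(pos):
    """Recover one byte in at most 8 oracle calls via binary search,
    instead of up to 256 calls with linear guessing."""
    lo, hi = 0, 255
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(pos, mid):
            lo = mid + 1
        else:
            hi = mid
    return lo

recovered = bytes(extract_byte(i) for i in range(len(SECRET)))
print(recovered)
```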
This document discusses future directions for LINQ, including making it work with asynchronous and reactive programming using Rx. It proposes expanding LINQ to handle more types of queries, such as querying over asynchronous data streams, remote data sources, and even non-traditional domains like constraint solving. The goal is to realize a "LINQ to everything" vision by extending LINQ's principles to new programming models and domains.
This document discusses storing product and order data as JSON in a database to support an agile development process. It describes creating tables with JSON columns to store this data, and using JSON functions like JSON_VALUE and JSON_TABLE to query and transform the JSON data. Examples are provided of indexing JSON columns for performance and updating product JSON to include unit costs by joining external data. The goal is to enable flexible and rapid evolution of the application through storing data in JSON.
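A rough Python analogue of the JSON_VALUE pattern described above (the actual functions are SQL; the product document and the `$.price` path here are invented):

```python
import json

product_json = '{"name": "widget", "price": 9.99, "tags": ["a", "b"]}'

def json_value(doc, path):
    """Minimal dotted-path lookup standing in for SQL's JSON_VALUE."""
    node = json.loads(doc)
    for key in path.lstrip("$.").split("."):   # "$.price" -> ["price"]
        node = node[key]
    return node

print(json_value(product_json, "$.price"))   # 9.99
print(json_value(product_json, "$.name"))    # widget
```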
Added in Oracle Database 18c, Polymorphic Table Functions (PTFs) allow you to change the shape of a result set at runtime. So you can add or remove columns from your results based on input parameters.
This presentation gives an overview of the why & how of PTFs.
Similar to Game of Fraud Detection with SQL and Machine Learning
Similar to Game of Fraud Detection with SQL and Machine Learning (20)
This document discusses storing product and order data as JSON in a database to support an agile development process. It describes creating tables with JSON columns to store this data, and using JSON functions like JSON_VALUE and JSON_TABLE to query and transform the JSON data. Examples are provided of indexing JSON columns for performance and updating product JSON to include unit costs by joining external data. The goal is to enable flexible and rapid evolution of the application through storing data in JSON.
Added in Oracle Database 18c, Polymorphic Table Functions (PTFs) allow you to change the shape of a result set at runtime. So you can add or remove columns from your results based on input parameters.
This presentation gives an overview of the why & how of PTFs.
Using Edition-Based Redefinition for Zero Downtime PL/SQL Changes - Chris Saxon
An introduction to edition-based redefinition, a technology which enables zero-downtime application releases for Oracle Database. Discusses the challenges with deploying PL/SQL code changes, and shows how EBR solves these issues.
Why Isn't My Query Using an Index? An Introduction to SQL Performance - Chris Saxon
An introduction to the factors that affect whether or not the optimizer will choose an index to execute a query.
Explains the clustering factor. What this is, why it matters, and how it affects query performance. It also covers techniques you can use to change the clustering factor for a table.
18(ish) Things You'll Love About Oracle Database 18c - Chris Saxon
An overview of the latest SQL & PL/SQL features in Oracle Database 18c, including:
- Polymorphic Table Functions
- Inline External Tables
- JSON improvements
How to Find Patterns in Your Data with SQL - Chris Saxon
The document discusses various SQL techniques for finding patterns in data, including identifying consecutive dates and dates that fall within the same week. It provides examples of using regular expressions, window functions, and Oracle Database 12c's MATCH_RECOGNIZE clause to analyze a sample running log dataset and determine consecutive runs, runs within the same week, and consecutive weeks with a minimum number of runs. The document compares different approaches like MATCH_RECOGNIZE versus the Tabibitosan method.
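The Tabibitosan method mentioned above groups consecutive values because value minus row_number() is constant within a run. A small demonstration using Python's built-in sqlite3 (the table of run days is invented for illustration; the talk itself targets Oracle):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table runs (day integer)")
# Days 1-3 are consecutive, 7-8 are consecutive, 12 stands alone.
conn.executemany("insert into runs values (?)",
                 [(1,), (2,), (3,), (7,), (8,), (12,)])

# Tabibitosan: day - row_number() is constant within a consecutive group.
rows = conn.execute("""
    select min(day), max(day), count(*)
    from (
        select day,
               day - row_number() over (order by day) as grp
        from runs
    )
    group by grp
    order by 1
""").fetchall()

print(rows)  # [(1, 3, 3), (7, 8, 2), (12, 12, 1)]
```

Each output row is one unbroken streak: its start day, end day, and length.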
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip, presents the "Temporal Event Neural Networks: A More Efficient Alternative to the Transformer" tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
The way information systems are built or acquired relegates information, which is what they should be about, to a secondary place. Our language adapted accordingly: we no longer talk about information systems but applications. Applications evolved in a way that breaks data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is repaid by taking even bigger "loans", resulting in ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
Monitoring and Managing Anomaly Detection on OpenShift - Tosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
AppSec PNW: Android and iOS Application Security with MobSF - Ajin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
The Microsoft 365 Migration Tutorial For Beginners - operationspcvita
This presentation will help you understand the power of Microsoft 365. It covers every productivity app included in Office 365, outlines common Office 365 migration scenarios, and explains how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an... - Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
High performance Serverless Java on AWS - GoTo Amsterdam 2024 - Vadym Kazulkin
Java is for many years one of the most popular programming languages, but it used to have hard times in the Serverless community. Java is known for its high cold start times and high memory footprint, comparing to other programming languages like Node.js and Python. In this talk I'll look at the general best practices and techniques we can use to decrease memory consumption, cold start times for Java Serverless development on AWS including GraalVM (Native Image) and AWS own offering SnapStart based on Firecracker microVM snapshot and restore and CRaC (Coordinated Restore at Checkpoint) runtime hooks. I'll also provide a lot of benchmarking on Lambda functions trying out various deployment package sizes, Lambda memory settings, Java compilation options and HTTP (a)synchronous clients and measure their impact on cold and warm start times.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk - Fwdays
In this talk we will discuss DDoS protection tools and best practices, network architectures, and what AWS has to offer. We will also look into one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022, and see what techniques helped keep web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on the Ukraine experience.
Introduction of Cybersecurity with OSS at Code Europe 2024 - Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
"Choosing proper type of scaling", Olena Syrota - Fwdays
Imagine an IoT processing system that is already quite mature and production-ready, and for which client coverage is growing, making scaling and performance life-and-death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk we will first analyze scaling approaches and then select the proper ones for our system.
What is an RPA CoE? Session 1 – CoE Vision - DianaGray10
In the first session, we will review the organization's vision and how it impacts the CoE structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
How to Interpret Trends in the Kalyan Rajdhani Mix Chart - Chart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Taking AI to the Next Level in Manufacturing - ssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
11. SQL
TDT TAMOUNT TTYPE SENDER_HOUSE RECEIVER_HOUSE
----------- -------------- ---------- --------------- ---------------
04-JAN-2017 257,657.85 CASH_IN House Tyrell House Lannister
03-FEB-2017 105,861.07 TRANSFER House Arryn House Stark
06-FEB-2017 2,410.38 DEBIT House Tyrell House Baratheon
20-FEB-2017 322.94 PAYMENT House Arryn House Stark
21-FEB-2017 1,559.41 CASH_OUT House Tyrell House Targaryen
04-MAY-2017 1,624.57 CASH_IN House Arryn House Baratheon
07-JUN-2017 2,070.47 CASH_OUT Night's Watch House Arryn
22-JUN-2017 3,392.80 DEBIT House Tully House Arryn
03-JUL-2017 1,767.72 PAYMENT Wildling House Lannister
16-JUL-2017 56,125.55 TRANSFER House Greyjoy House Stark
16. SQL
case
when transaction_type in (
'CASH_IN', 'TRANSFER', 'CASH_OUT'
) and max ( trans_id ) over ( … ) is not null
then 1 -- FRAUD, it's FRAUD!
else 0
end
18. SQL
max ( trans_id ) over (
partition by sender_house, receiver_house,
transaction_amount
)
19. SQL
max ( trans_id ) over (
partition by sender_house, receiver_house,
transaction_amount
order by transaction_date
)
20. SQL
max ( trans_id ) over (
partition by sender_house, receiver_house,
transaction_amount
order by transaction_date
range between unbounded preceding and
current row
)
All previous rows and the current row itself!
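To see concretely why this frame can never yield NULL (it always contains at least the current row), here is a toy run of the same window shape, using sqlite via Python purely for illustration; the table and data are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table trx (
        trans_id integer, tdate integer, amount real,
        sender text, receiver text
    )
""")
conn.executemany("insert into trx values (?,?,?,?,?)", [
    (1, 10, 500.0, 'Tyrell', 'Lannister'),
    (2, 11, 500.0, 'Tyrell', 'Lannister'),  # repeat of the same amount
    (3, 12, 750.0, 'Stark',  'Arryn'),
])

rows = conn.execute("""
    select trans_id,
           max(trans_id) over (
               partition by sender, receiver, amount
               order by tdate
               range between unbounded preceding and current row
           ) as max_id
    from trx
    order by trans_id
""").fetchall()

print(rows)  # [(1, 1), (2, 2), (3, 3)]
```

Because the frame includes the current row, max() is never null here, so the "is not null" fraud test fires for every row; this is why the next slides shrink the frame to exclude the current transaction.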
21. SQL
max ( trans_id ) over (
partition by sender_house, receiver_house,
transaction_amount
order by transaction_date
range between 90 preceding and
1 preceding
)
Excludes today…
22. SQL
max ( trans_id ) over (
partition by sender_house, receiver_house,
transaction_amount
order by transaction_date
range between 90 preceding and
1/1440 preceding
)
Previous minute
23. Multiple transfers same amount
count (*) over (
partition by sender_house, receiver_house,
transaction_amount
order by transaction_date
range between 90 preceding and
1/1440 preceding
)
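The same 90-days-back, at-least-a-minute-old count can be re-implemented imperatively, which makes the window boundaries explicit. In this sketch dates are plain day numbers, so one minute is 1/1440 of a day as on the slide; the sample data is invented:

```python
from collections import defaultdict

def repeat_counts(transactions, lookback_days=90, exclude_days=1 / 1440):
    """For each transaction, count earlier ones with the same
    (sender, receiver, amount) whose date falls within
    [date - lookback_days, date - exclude_days]."""
    by_key = defaultdict(list)
    counts = []
    for date, amount, sender, receiver in transactions:
        earlier = by_key[(sender, receiver, amount)]
        counts.append(sum(
            1 for d in earlier
            if date - lookback_days <= d <= date - exclude_days
        ))
        earlier.append(date)
    return counts

trx = [  # (day, amount, sender, receiver), sorted by day
    (10.0,  500.0, 'Tyrell', 'Lannister'),
    (11.0,  500.0, 'Tyrell', 'Lannister'),  # 1 earlier match within 90 days
    (11.0,  750.0, 'Stark',  'Arryn'),
    (150.0, 500.0, 'Tyrell', 'Lannister'),  # earlier matches > 90 days old
]
print(repeat_counts(trx))  # [0, 1, 0, 0]
```

A nonzero count is exactly the "multiple transfers of the same amount between the same houses" signal the SQL rule flags.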
26. SENDER_HOUSE RECEIVER_HOUSE TRX#
None None 321
Night's Watch Night's Watch 216
House Greyjoy House Greyjoy 73
House Baratheon House Baratheon 71
House Lannister House Lannister 57
47. SQL
TDT TAMOUNT TTYPE SENDER_HOUSE RECEIVER_HOUSE FRAUD
----------- -------------- ---------- --------------- --------------- -----
04-JAN-2017 257,657.85 CASH_IN House Tyrell House Lannister 1
03-FEB-2017 105,861.07 TRANSFER House Arryn House Stark 0
06-FEB-2017 2,410.38 DEBIT House Tyrell House Baratheon 0
20-FEB-2017 322.94 PAYMENT House Arryn House Stark 0
21-FEB-2017 1,559.41 CASH_OUT House Tyrell House Targaryen 0
04-MAY-2017 1,624.57 CASH_IN House Arryn House Baratheon 0
07-JUN-2017 2,070.47 CASH_OUT Night's Watch House Arryn 1
22-JUN-2017 3,392.80 DEBIT House Tully House Arryn 0
03-JUL-2017 1,767.72 PAYMENT Wildling House Lannister 0
16-JUL-2017 56,125.55 TRANSFER House Greyjoy House Stark 0
48. Machine Learning: Split the data!
Train Data (80%): got_transactions SAMPLE (80) SEED (124)
Test Data (20%): everything else (got_transactions … MINUS … the training sample)
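SAMPLE (80) with a fixed SEED gives a reproducible ~80% training sample, and MINUS yields the remaining rows as the test set. The same idea as a plain-Python sketch; the seed and percentage mirror the slide, everything else is illustrative:

```python
import random

def split(rows, train_pct=80, seed=124):
    """Reproducible train/test split: sample ~train_pct% for training,
    then take everything else (the 'MINUS') as the test set."""
    rng = random.Random(seed)
    train = [r for r in rows if rng.random() < train_pct / 100]
    train_set = set(train)
    test = [r for r in rows if r not in train_set]
    return train, test

rows = list(range(1000))
train, test = split(rows)
assert len(train) + len(test) == len(rows)
assert not set(train) & set(test)   # disjoint, like MINUS
print(len(train), len(test))        # roughly 800 / 200
```

Fixing the seed matters: it lets you rebuild the exact same split later, so model comparisons stay fair.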
49. Supervised Modeling: Settings
Step 2: Settings for the algorithm
INSERT INTO MODEL_SETTINGS (setting_name, setting_value)
VALUES ('ALGO_NAME', 'ALGO_NAIVE_BAYES');
INSERT INTO MODEL_SETTINGS (setting_name, setting_value)
VALUES ('PREP_AUTO', 'ON');
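The ALGO_NAIVE_BAYES setting tells the database to fit class priors and per-attribute conditional probabilities. As a rough illustration of what the algorithm computes (not Oracle's implementation), here is a toy categorical naive Bayes with Laplace smoothing; the training rows and the vocabulary estimate are invented for the example:

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """rows: tuples of categorical features; labels: 0/1 fraud flags."""
    priors = Counter(labels)
    cond = defaultdict(Counter)          # (feature_index, label) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, y)][v] += 1
    return priors, cond

def predict_nb(priors, cond, row, alpha=1.0):
    total = sum(priors.values())
    best, best_p = None, -1.0
    for y, n in priors.items():
        p = n / total                    # class prior
        for i, v in enumerate(row):
            counts = cond[(i, y)]
            vocab = len(counts) + 1      # crude vocabulary estimate
            p *= (counts[v] + alpha) / (n + alpha * vocab)
        if p > best_p:
            best, best_p = y, p
    return best

# Tiny synthetic data: (transaction_type, is_repeat) -> fraud?
rows = [('TRANSFER', 'Y'), ('TRANSFER', 'N'), ('PAYMENT', 'N'),
        ('DEBIT', 'N'), ('CASH_OUT', 'Y'), ('PAYMENT', 'N')]
labels = [1, 0, 0, 0, 1, 0]

priors, cond = train_nb(rows, labels)
print(predict_nb(priors, cond, ('CASH_OUT', 'Y')))  # 1
print(predict_nb(priors, cond, ('PAYMENT', 'N')))   # 0
```

PREP_AUTO = ON additionally has the database handle data preparation (binning, missing values) before this kind of counting takes place.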
60. with ranks as (
select trans_id,
row_number () over (
partition by trans_id
order by prediction desc
) rn
from fraud_nb_results
)
select * from ranks where rn = 1
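This query keeps only the top-scoring prediction per transaction. The identical pattern runs in any engine with window functions; a self-contained check using sqlite via Python, with invented scores:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table fraud_nb_results
        (trans_id integer, label integer, prediction real)
""")
conn.executemany("insert into fraud_nb_results values (?,?,?)", [
    (1, 0, 0.91), (1, 1, 0.09),   # model is 91% sure trx 1 is NOT fraud
    (2, 0, 0.20), (2, 1, 0.80),   # model is 80% sure trx 2 IS fraud
])

rows = conn.execute("""
    with ranks as (
        select trans_id, label, prediction,
               row_number() over (
                   partition by trans_id
                   order by prediction desc
               ) rn
        from fraud_nb_results
    )
    select trans_id, label from ranks where rn = 1
    order by trans_id
""").fetchall()

print(rows)  # [(1, 0), (2, 1)]
```

Filtering on rn = 1 collapses the per-class probability rows down to a single predicted label per transaction.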
61. SENDER_HOUSE RECEIVER_HOUSE TRX#
House Lannister House Lannister 421
House Arryn House Arryn 41
Night's Watch Night's Watch 38
House Greyjoy House Greyjoy 35
None None 28
71. How most people view probabilities
[Scale: % chance of an event, running from 0% through 50% to 100%]
72. How most people view probabilities
[Scale with labels: 0% = Impossible, 50% = Toss-up, 100% = Certain]
73. How most people view probabilities
[Scale with added labels: just above 0% = Nearly impossible, just below 100% = Almost certain]
74. SQL rules vs ML
SQL rules:
• Deterministic -> guaranteed to flag; easy to test/verify
• Needs to be told the rules
• Needs domain knowledge
• Uses existing skills
• You KNOW why a trx was flagged
ML:
• Probabilistic -> you don't know the outcome in advance
• Can find the rules itself
• Might miss "obvious" fraud
• Needs new skills & approaches
• Why was it flagged? ¯\_(ツ)_/¯
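The talk's conclusion is to use both approaches together, since the combination outperformed either alone. A hypothetical combiner, assuming each transaction carries a deterministic rule flag and a model probability (names and threshold are illustrative):

```python
def combined_flag(rule_flag, model_prob, threshold=0.8):
    """Flag when the deterministic rule fires OR the model is confident
    enough. The threshold is a business choice balancing false
    positives against false negatives."""
    return 1 if rule_flag == 1 or model_prob >= threshold else 0

print(combined_flag(1, 0.10))  # 1: rule caught it, model missed it
print(combined_flag(0, 0.95))  # 1: model caught it, no rule matched
print(combined_flag(0, 0.40))  # 0: neither fired
```

Lowering the threshold catches more fraud at the cost of more false positives, which is exactly the trade-off the next slide asks you to weigh.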
75. The real world
• What's the impact of false positive & false negative?
• How much effort is it to resolve misclassification?
• How high is the false positive rate?
• How do you explain the results if someone wants to know?
Which one are you ready for?