[DSC Croatia 22] Building smarter ML and AI models and making them more accurate - David Akka
Building smarter ML & AI models and making them more accurate
David Akka, SVP EMEA, SQream
A unique and accelerated approach to decision support & augmentation algorithms
A Brief History of AI

1950: Alan Turing introduces the Turing test in his paper, an effort to create an intelligence design standard for the tech industry. Isaac Asimov publishes the influential sci-fi collection "I, Robot."
1956: The Dartmouth conference launches the field of AI and coins the term "artificial intelligence"; IBM computers are used by the first AI researchers.
1964-1966: ELIZA, an early natural language processing program, is created at the MIT Artificial Intelligence Laboratory by Joseph Weizenbaum.
1980s: Boom times; the focus shifts to more practical issues.
1990s: The customer data management (CDM) market is formally established after the successful launch of packaged CRMs.
2000s: Rise of social media.
2011: IBM's Watson wins "Jeopardy!", beating former champions. Apple introduces the intelligent personal assistant Siri on the iPhone.
Traditional Data Sources

76% of data scientists say that data preparation is the worst part of their job, yet efficient, accurate business decisions can only be made with clean data.
Predictive Customer Analytics: Acquire, Grow, Retain
Predictive Threat & Fraud Analytics: Monitor, Detect, Control
Predictive Operational Analytics: Manage, Maintain, Maximize
Traditional Modelling Steps

[Architecture diagram: traditional data sources feed the modeling system through SQL pushback, a metadata server, and a middleware server]
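As a concrete illustration of the SQL pushback step above: the modeling tool generates SQL that runs inside the database, so only small aggregated results travel to the middleware server. Below is a minimal sketch of the idea, assuming a hypothetical transactions table and SQream's pysqream DB-API connector; all connection details are illustrative.

```python
# Minimal sketch of SQL pushback: aggregate inside the database rather than
# pulling raw rows into the modeling tool. The `transactions` table and all
# connection details are hypothetical.
import pysqream

con = pysqream.connect(host="sqream-host", port=5000, database="master",
                       username="analyst", password="secret", clustered=False)
cur = con.cursor()

# The GROUP BY is pushed down to the database engine; only one summary row
# per customer crosses the network to the middleware layer.
cur.execute("""
    SELECT customer_id,
           COUNT(*)    AS tx_count,
           SUM(amount) AS total_amount
    FROM transactions
    GROUP BY customer_id
""")
per_customer = cur.fetchall()

cur.close()
con.close()
```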
Challenges of the new era of data-intensive computing

• Gather massive data: traditional systems are not efficient enough to prepare massive data in time, due to scale and complexity; they require more processing power, which creates additional cost.
• Store massive data: more capacity and time are required to run models on more complex and massive data.
• NLP capabilities are important for text data.
SQream - A New Architecture for Large-Scale Data
Adding propellers to old planes won’t turn them into jets
GPU-based SQL Analytics Platform at Any Scale

POWERED BY GPUs
• Massively parallel engine
• Faster & smaller than CPUs
• Consolidate servers

MASSIVELY SCALABLE
• Terabytes to petabytes
• Not limited by RAM
• Not limited by data size

LIGHTNING FAST
• Ingests 3 TB/hour per GPU
• Accelerated streaming
• Always-on compression

SQL DATABASE
• Familiar ANSI SQL
• Standard connectors
• No learning curve

MINIMAL FOOTPRINT
• High-throughput compute
• Very cost-efficient
• Lowest CO₂ emissions

EXTENSIBLE FOR ML/AI
• Python, AI, Jupyter, etc.
• Built for data science
• Accelerated training
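To make the "standard connectors, no learning curve" point concrete, here is a minimal hedged sketch of querying the platform from Python with plain ANSI SQL and handing the result to pandas; the sensor_events table and connection details are invented for the example.

```python
# Hypothetical sketch: familiar ANSI SQL over a standard DB-API connector
# (pysqream), with results loaded into pandas for data science work.
import pandas as pd
import pysqream

con = pysqream.connect(host="sqream-host", port=5000, database="master",
                       username="analyst", password="secret", clustered=False)
cur = con.cursor()
cur.execute("SELECT event_time, device_id, reading "
            "FROM sensor_events WHERE event_time >= '2022-01-01'")
df = pd.DataFrame(cur.fetchall(), columns=[c[0] for c in cur.description])
cur.close()
con.close()

print(df.describe())
```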
GPU-based SQL Analytics Platform at Any Scale
• 100x faster queries
• 20x cost saving on resources
• 100x more data analyzed
• 90% less emissions
Rapid Analytics on More Data, More Frequently

• Analyze data faster: run SQL queries faster
• Analyze more data: run queries on more data
• Analyze more dimensions: enable more complex joins
• Shorten data preparation: ad-hoc queries on raw data

Rapidly analyze the full scope of your massive data, from terabytes to petabytes, to achieve critical insights that were previously unattainable.
Seamless Integration into Your Ecosystem

[Diagram: data sources and infrastructure connect through SQream to data science & BI tools]
Insights Platform at Any Scale

[Diagram: end-to-end AI/ML analytics spanning cloud, edge, and on-prem deployments]
• Users: business analysts, data scientists, business apps, PLCs/machines
• Data: structured, semi-structured, and unstructured
• Deployment: cloud, edge, on-prem
Add SQream to Build Smarter ML & AI Models

Modeling:
• Direct data connection from a Jupyter Notebook
• No need for data preparation: ad-hoc querying at any scale
• Real-time data interrogation (drill-down) at any data scale
• Self-empowering
• Full data sets

Training & model accuracy:
• Very large data sets
• No in-memory limitation (GPU to GPU)
• No SQL join limitation
• Aggregation at any scale
• Small footprint, lowest TCO
• Model integration

A minimal sketch of this flow appears below.
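The sketch ties the modeling bullets together: query the full, already-joined data set directly from a Jupyter notebook and train on every row, with no sampling or intermediate extracts. The customer_features table, its columns, and the use of scikit-learn are assumptions for illustration only.

```python
# Hypothetical end-to-end sketch: ad-hoc SQL from a notebook straight into
# model training on the full data set. Table, columns, and connection
# details are invented; scikit-learn stands in for any modeling library.
import pandas as pd
import pysqream
from sklearn.ensemble import RandomForestClassifier

con = pysqream.connect(host="sqream-host", port=5000, database="master",
                       username="analyst", password="secret", clustered=False)
cur = con.cursor()
cur.execute("SELECT tenure_months, tx_count, total_amount, churned "
            "FROM customer_features")
df = pd.DataFrame(cur.fetchall(), columns=[c[0] for c in cur.description])
cur.close()
con.close()

# Train on the full result set, not a sample of it.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(df[["tenure_months", "tx_count", "total_amount"]], df["churned"])
```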
Enterprise Fraud Management (EFM) in Financial Institutions Using ML

Fraudulent transactions cost a financial institution in the Netherlands millions of euros per year because they were not detected in time and acted upon accordingly. Machine learning models allowed this organization to study how regular transactions behave and, more importantly, to pinpoint abnormalities and swiftly prevent fraud. For the model to be as accurate as possible, it needs to be based on a wide, all-encompassing database that joins together a vast array of data sources.

Results:
• ML model based on over 95% of the available data
• 18 million euros saved per year
• 47.6% reduction in fraudulent transactions
Why Do Anything?
According to CNBC, users around the world lost almost $6 billion to banking fraud in 2021 alone, a 70% rise over 2020 and a dubious record. In addition, the European Union has allocated a dedicated budget to measure, prevent, detect, report, and prosecute fraud.

Why Now?
With digital transactions increasing due to Covid-19, and the rising popularity of Apple Pay and Google Pay driving more complex fraud, the organization was harmed both financially and reputationally. This motivated its search for better EFM.
Why SQream?
• Flexible connectivity to leading machine learning vendors such as SAS and IBM
• Fast ingestion, near-real-time feedback
• Rapid joining of many tables with large datasets to fit the machine learning model format and achieve multiple reference points (see the sketch below)
• Can upload 30 TB in 2 hours
• Supports multiple data types; runs 5x faster than competitors
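As a rough illustration of the multi-table joining described above, this hedged sketch assembles reference points around each transaction for the fraud model; every table and column name is invented for the example.

```python
# Hypothetical feature-assembly join: enrich each transaction with account
# and device reference points so the ML model sees more context. All names
# and connection details are illustrative.
import pysqream

con = pysqream.connect(host="sqream-host", port=5000, database="master",
                       username="analyst", password="secret", clustered=False)
cur = con.cursor()
cur.execute("""
    SELECT t.tx_id,
           t.amount,
           a.account_age_days,
           a.avg_monthly_spend,
           d.device_risk_score
    FROM transactions t
    JOIN accounts a ON a.account_id = t.account_id
    JOIN devices  d ON d.device_id  = t.device_id
    WHERE t.tx_time >= '2022-01-01'
""")
features = cur.fetchall()
cur.close()
con.close()
```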
Industry Vertical: Finance
Economic Buyer: Security and Monitoring Team
Enabler: BI Teams; Lead Data Scientist
Aggregation times reduced from 12 hours to 40 minutes.
Architecture Considerations (Before)

Events come in from multiple sources. The in-memory engine of the machine learning tool is incapable of directly handling the full data, so it relies on samples and mediating layers. Insights are not accurate, frauds are missed, and reputation is harmed. Data preparation takes 12 hours on partial data.
Architecture Considerations (After)

Events come in from multiple sources. SQream rapidly prepares the full data, and the ML tool performs the intended analytics unburdened by the prep work. Insights are accurate, more frauds are found, and reputation is improved. Data preparation takes 40 minutes.
Come to our booth for an ML demo & giveaways!
To book a "Trial & Buy", feel free to contact us on any platform!
davida@sqream.com
www.sqream.com
SQreamTech
SQream