Fight Fraud with Big Data
Analytics this Holiday Season

© 2013 Datameer, Inc. All rights reserved.
View Full Recording

View the full recording of this webinar at:
http://info.datameer.com/SlideshareFighting-Fraud-this-Holiday-Season.html
About our Speakers
Karen Hsu (@Karenhsumar)
–  Karen is Senior Director, Product Marketing
at Datameer. With over 15 years of
experience in enterprise software, Karen
Hsu has co-authored 4 patents and worked
in a variety of engineering, marketing and
sales roles. 
–  Most recently she came from Informatica
where she worked with the start-ups
Informatica purchased to bring data quality,
master data management, B2B and data
security solutions to market.  

–  Karen has a Bachelor of Science degree
in Management Science and Engineering
from Stanford University.  
About our Speakers
• John Kreisa (@marked_man)
– A veteran of the enterprise
marketing industry, John has
worked on products at every level of
the IT stack, from the depths of
storage through to the insight of
business intelligence and analytics.
Currently John leads partner and
strategic marketing initiatives at open
source leader Hortonworks, which
develops, distributes and supports
Apache Hadoop.
Agenda
•  Current challenges
•  What to look for in a solution addressing
fraud
•  Demo
•  Q&A
Challenges
Merchants paying
$200-250B in fraud
losses annually
Banks and Financial
Organizations
losing $12-15B
annually
eTailers lost $3.5B
to online fraud

Over 20B
credit card
transactions
annually
Face of Fraud is Changing

POS Reports

Location Data

Transactions

Authorizations
Challenges with Existing Data Architecture
• APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
• DATA SYSTEM: RDBMS, EDW, MPP (REPOSITORIES)
• SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)
• Data growth (Source: IDC): 2.8 ZB in 2012; 85% from New Data Types; 15x Machine Data by 2020; 40 ZB by 2020

© Hortonworks Inc. 2013
What to Look For in a Fraud
Analytics Solution

Big Data Analytics Lifecycle
Identify Use Case
1. Integrate
2. Prepare
3. Analyze
4. Visualize
Deploy
Modern Day Architecture
Define!
▪ Use Cases
  – Customer Analytics
  – Operational Analytics
  – Legacy Modernization
  – Fraud and Compliance
▪ ROI and TCO Methodology
  – ROI customer metrics
  – ROI and TCO calculator

▪ Funnel Optimization: Increase customer conversion by 3x
▪ Behavioral Analytics: Increase revenue by 2x
▪ Fraud Prevention: Identify $2B in potential fraud
▪ EDW Optimization: 98% OpEx savings, $1M+ CapEx savings
▪ Customer Segmentation: Lower customer acquisition costs by 30%
Polling question 1

Polling Question
What use cases are you looking at or implementing
today?
▪  Profiling and segmentation 

▪  Product development and operations optimization
▪  Cross-sell / up-sell
▪  Campaign management
▪  Acquisition and retention
▪  EDW optimization
▪  Fraud and compliance
▪  Other
Integrate!
Codeless Integration
▪  Reuse existing DB views and SQL
▪  50+ Datameer connectors, plug-in API

Big Data Management
▪  Data Partitioning
▪  Data Retention policies
Prepare and Analyze!
Interactive Data Preparation
▪  JSON, XML, URL-specific functions
▪  Multi-column joins, unions

Interactive + Smart Analytics
▪  250+ built-in functions
▪  Automated machine learning
▪  SmartSampling

Transparency + Governance
▪  Visual data lineage
▪  Complete audit trail
▪  Metadata catalog
Visualize!
Visualization Anywhere
▪  Infographic or dashboard
▪  Run on tablets and smartphone devices

Visual Discovery
▪  Machine Learning algorithms
Deploy!
Security
▪  LDAP / Active Directory
▪  Role based access control
▪  Support for Kerberos

Scheduling
▪  Dependency triggers
▪  Data synchronization
▪  External scheduling integration

Monitoring
▪  Monitoring system, jobs, performance, throughput
▪  Error handling
▪  Log management
Modern Data Architecture Enabled
• APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
• DEV & DATA TOOLS: BUILD & TEST
• OPERATIONAL TOOLS: MANAGE & MONITOR
• DATA SYSTEM: RDBMS, EDW, MPP (REPOSITORIES)
• SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
3 Requirements for Hadoop Adoption
Requirements for Hadoop’s Role in the Modern Data Architecture
• Integrated: Interoperable with existing data center investments
• Key Services: Platform, operational and data services essential for the enterprise
• Skills: Leverage your existing skills: development, operations, analytics
Requirements for Enterprise Hadoop
1. Key Services: Platform, operational and data services essential for the enterprise
2. Skills: Leverage your existing skills: development, analytics, operations
3. Integrated: Engineered with existing data center investments

HORTONWORKS DATA PLATFORM (HDP)
• Layers: OPERATIONAL SERVICES, DATA SERVICES, CORE, PLATFORM SERVICES
• Components: AMBARI, FALCON*, OOZIE, FLUME, SQOOP, NFS, WebHDFS, KNOX*, PIG, HIVE & HCATALOG, HBASE, MAP REDUCE, TEZ, YARN, HDFS
• LOAD & EXTRACT
• Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
• Deployment: OS/VM, Cloud, Appliance
Requirements for Enterprise Hadoop
1. Key Services: Platform, operational and data services essential for the enterprise
2. Skills: Leverage your existing skills: development, analytics, operations
   • DEVELOP: COLLECT, PROCESS, BUILD
   • ANALYZE: EXPLORE, QUERY, DELIVER
   • OPERATE: PROVISION, MANAGE, MONITOR
3. Integration: Engineered with existing data center investments
Familiar and Existing Tools
1. Key Services: Platform, operational and data services essential for the enterprise
2. Skills: Leverage your existing skills: development, analytics, operations
   • DEVELOP: COLLECT, PROCESS, BUILD
   • ANALYZE: EXPLORE, QUERY, DELIVER
   • OPERATE: PROVISION, MANAGE, MONITOR
3. Integration: Interoperable with existing data center investments
Requirements for Enterprise Hadoop
3. Integration: Engineered with existing data center investments
   • Integrated with Applications: Business Intelligence, Developer IDEs, Data Integration
   • Integrated with Systems: Data Systems & Storage, Systems Management
   • Integrated with Platforms: Operating Systems, Virtualization, Cloud, Appliances

• APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
• DEV & DATA TOOLS: BUILD & TEST
• OPERATIONAL TOOLS: MANAGE & MONITOR
• DATA SYSTEM: RDBMS, EDW, MPP (REPOSITORIES)
• SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
Datameer in the Modern Data Architecture
• APPLICATIONS
• DEV & DATA TOOLS
• OPERATIONAL TOOLS
• DATA SYSTEM: RDBMS, EDW, HANA, MPP
• INFRASTRUCTURE
• SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
Demonstration 1

Identifying Potential Fraud

How much has been spent at
a vendor?

Is that spend normal?

Were there transactions when a credit card was stolen?
Identify Outliers in Transactions


1.  Calculate average and standard deviation
for each category



2.  Identify outliers in all transactions:
    Transaction Amount - Category Average > 2 * Std Dev of Category
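
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Datameer workbook shown in the demo. The record fields (`id`, `category`, `amount`), the function name `find_outliers`, and the sample values are all hypothetical, and the sketch assumes the population standard deviation because the slide does not say which variant is used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_outliers(transactions, threshold=2.0):
    """Flag transactions whose amount exceeds the category average
    by more than `threshold` standard deviations of that category."""
    # Step 1: average and standard deviation for each category.
    amounts_by_category = defaultdict(list)
    for txn in transactions:
        amounts_by_category[txn["category"]].append(txn["amount"])
    stats = {
        category: (mean(amounts), pstdev(amounts))  # population std dev (assumption)
        for category, amounts in amounts_by_category.items()
    }

    # Step 2: apply the slide's rule to every transaction.
    outliers = []
    for txn in transactions:
        average, std_dev = stats[txn["category"]]
        # As written on the slide, only unusually HIGH amounts are flagged.
        if std_dev > 0 and txn["amount"] - average > threshold * std_dev:
            outliers.append(txn)
    return outliers


# Hypothetical sample data: one fuel purchase is far above the rest.
SAMPLE = [
    {"id": 1, "category": "fuel", "amount": 4.0},
    {"id": 2, "category": "fuel", "amount": 4.0},
    {"id": 3, "category": "fuel", "amount": 4.0},
    {"id": 4, "category": "fuel", "amount": 4.0},
    {"id": 5, "category": "fuel", "amount": 4.0},
    {"id": 6, "category": "fuel", "amount": 40.0},
    {"id": 7, "category": "groceries", "amount": 50.0},
    {"id": 8, "category": "groceries", "amount": 55.0},
    {"id": 9, "category": "groceries", "amount": 60.0},
    {"id": 10, "category": "groceries", "amount": 45.0},
]
```

Calling `find_outliers(SAMPLE)` returns only the 40.0 fuel purchase. To also catch unusually low amounts, one could compare the absolute difference instead; that would be a change from the rule as stated on the slide.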
Demonstration 2

Fraud and Data Mining on Hadoop 

Clustering

Column Dependencies

Decision Tree

Recommendations
Demonstration 3

Predictive Modeling and Datameer
• Model Building
• Model Deployment
• Integration / Execution
• PMML (models)
• Datameer Server
• UPPI
Predictive Modeling and Fraud


1.  Bring in model
2.  Apply function to data to get the likelihood that a
transaction is fraudulent
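
As a rough illustration of step 2, the sketch below scores a transaction with a logistic regression model. It is a stand-in only: it does not parse PMML or call any Datameer function, and the model dictionary, its coefficients, the feature names (`amount_zscore`, `is_foreign`), and the function name `fraud_likelihood` are hypothetical.

```python
import math

# Hypothetical coefficients standing in for a model that would be
# brought in from elsewhere (for example, exported as PMML).
MODEL = {
    "intercept": -4.0,
    "coefficients": {"amount_zscore": 1.5, "is_foreign": 2.0},
}


def fraud_likelihood(model, features):
    """Apply a logistic regression model to one transaction's features
    and return a likelihood between 0 and 1."""
    linear_score = model["intercept"] + sum(
        weight * features.get(name, 0.0)  # missing features count as 0
        for name, weight in model["coefficients"].items()
    )
    # The logistic function maps the linear score to a probability.
    return 1.0 / (1.0 + math.exp(-linear_score))
```

With these made-up coefficients, a transaction with `amount_zscore=2.0` and `is_foreign=1.0` has a linear score of 1.0 and therefore a likelihood above 0.5, while a transaction with no risk features scores well below it.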
Next Steps:
More about Datameer and Big Data
www.datameer.com

Get started with Datameer and Hortonworks
http://hortonworks.com/hadoop-tutorial/datameer/

Contact us:
John Kreisa jkreisa@hortonworks.com 

Karen Hsu khsu@datameer.com 

Polling Question
What part of the webinar did you find the most useful?

▪  Use cases
▪  Tool ease of use and setup comparison
▪  Tool quality comparison
▪  Best practices
▪  Demonstration
Q&A
Best Practices

Calculating ROI is a process
Apply ROI to Multiple Projects
• Projects: Project 1, Project 2, Project 3
• Categories: Hardware Savings, Software Savings, Productivity, Business Benefit
Calculating Return
Benefits - Costs = Return
• Benefits: Identify Fraud, Improve Marketing, Increase Sales, Improve Product, Increase Conversion, Lower IT expenses
• Costs: Hardware, Software, Integration, People, Operations, Logistics
• Return: $$$, Time, Flexibility
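
The Benefits - Costs = Return relationship can be worked through for several projects at once. The sketch below is only an arithmetic illustration: the project names follow the slides, but every dollar figure, the dictionary layout, and the function names are hypothetical.

```python
def calculate_return(benefits, costs):
    """Return = total benefits - total costs."""
    return sum(benefits.values()) - sum(costs.values())


def portfolio_return(projects):
    """Compute the return of each project in a portfolio."""
    return {
        name: calculate_return(project["benefits"], project["costs"])
        for name, project in projects.items()
    }


# Hypothetical annual figures in USD; none of these come from the webinar.
PROJECTS = {
    "Project 1": {
        "benefits": {"identify_fraud": 500_000, "lower_it_expenses": 100_000},
        "costs": {"hardware": 150_000, "software": 100_000, "people": 200_000},
    },
    "Project 2": {
        "benefits": {"improve_marketing": 200_000, "increase_sales": 300_000},
        "costs": {"software": 120_000, "integration": 80_000, "operations": 50_000},
    },
    "Project 3": {
        "benefits": {"increase_conversion": 90_000},
        "costs": {"hardware": 60_000, "logistics": 40_000},
    },
}
```

With these made-up numbers, `portfolio_return(PROJECTS)` gives 150,000 for Project 1, 250,000 for Project 2, and -10,000 for Project 3, a combined return of 390,000. Keeping each project separate makes it visible when one project has a negative return even though the portfolio as a whole is positive.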
Universal Plug-In Overview
Features and Model Types

The Plug-in delivers a wide range of predictive analytics for high performance scoring, including:
•  Decision Trees for classification and regression

•  Neural Network Models: Back-Propagation, Radial-Basis Function, and Neural-Gas
•  Support Vector Machines for regression, binary and multi-class classification
•  Linear and Logistic Regression (binary and multinomial)
•  Naïve Bayes Classifiers
•  General and Generalized Linear Models
•  Cox Regression Models
•  Rule Set Models (flat decision trees)
•  Clustering Models: Distribution-Based, Center-Based, and 2-Step Clustering
•  Scorecards (including reason codes)
•  Association Rules
•  Multiple Models: Model ensemble, segmentation, chaining and composition
It also implements a data dictionary, missing/invalid value handling, and data pre-processing.
