Implementing Big Data solutions is commonly regarded as a technology challenge. That is true, given the ever-growing array of options that we, as engineers, can choose from.
This talk will discuss the other, less visible obstacles that teams and organizations face when dealing with a large-scale rollout of new data products.
6.
2017
The Workshop
Accepting payments for virtual
goods is a risky business.
Ecommerce fraud
1.0 Countermeasures
§ Limit the number of attempts
§ Vet the source of transactions
§ Manual review of transactions
vs.
Ecommerce fraud
2.0 Countermeasures
§ Semantic associations
§ Risk modeling
§ Behavioral modeling
Fraud types: friendly fraud, identity theft, financial fraud
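As an illustration of the simplest 1.0 countermeasure, limiting the number of attempts, here is a minimal sketch of a sliding-window limiter. The class name, the key format, the limit of 3 attempts and the 600-second window are all assumptions made for the example, not values from the talk.

```python
from collections import defaultdict, deque


class AttemptLimiter:
    """Allow at most `max_attempts` per key within a sliding time window."""

    def __init__(self, max_attempts=3, window_seconds=600):
        # Both defaults are illustrative assumptions.
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # key -> timestamps of allowed attempts

    def allow(self, key, now):
        """Return True and record the attempt if `key` is under its limit at time `now` (seconds)."""
        recent = self._attempts[key]
        # Forget attempts that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_attempts:
            return False  # blocked attempts are not recorded
        recent.append(now)
        return True


limiter = AttemptLimiter(max_attempts=3, window_seconds=600)
# Hypothetical key: a card token. Four quick attempts, then one much later.
results = [limiter.allow("card-1234", t) for t in (0, 10, 20, 30, 700)]
print(results)
```

The key could just as well be a device, an account or an IP address; which one to limit on is a design decision the slides leave open.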
10.
The Problem
A long feedback loop between
§ Data entering our systems
§ People gaining insights about its meaning
§ Those insights having an impact
[Figure: loop connecting Data, Applications, Users and Data Driven Decisions (DDD); one cycle takes months]
14.
Big Data
More than a technology enabler
Used to define
• A new approach to decision making
• A different operating model
• Changes in roles and responsibilities
Something that captures people's imagination
We love buzzwords
17.
Landscape
Technology selection is getting increasingly complex
§ Vendors push for vertical platforms
§ We love to build frameworks
All data products can be bent to a certain extent
§ Native graph, non-native document
§ Native columnar, non-native time-series
19.
1. Big Picture
Used as a collection of guiding
principles and patterns
It highlights
§ What capabilities are needed
§ When decisions need to be made
Other considerations
§ Existing skills
§ Competency availability
§ Learning curve
[Figure: big-picture architecture. Components: Applications, OLTP, Data Ingestion, Events store (events, raw data), Stream Processing, Batch Processing (MR / Hive / Spark / R), Aggregate / Specialized databases (aggregates), RTDW, OLAP, BI, Dashboard, ML, User]
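To make the step from raw events to aggregates concrete, here is a minimal sketch that folds a handful of events into per-player counts and totals. The event fields (`player`, `type`, `amount`) and the sample values are hypothetical; the same fold could run one event at a time in a stream processor or over the full events store in a batch job.

```python
from collections import Counter

# Hypothetical raw events, as they might sit in the events store.
events = [
    {"player": "p1", "type": "deposit", "amount": 20.0},
    {"player": "p2", "type": "deposit", "amount": 5.0},
    {"player": "p1", "type": "chargeback", "amount": 20.0},
    {"player": "p1", "type": "deposit", "amount": 50.0},
]


def apply_event(counts, totals, event):
    """Update the aggregates in place with a single event (the streaming step)."""
    key = (event["player"], event["type"])
    counts[key] += 1
    totals[key] += event["amount"]


def aggregate(all_events):
    """Rebuild the aggregates from scratch (the batch step)."""
    counts, totals = Counter(), Counter()
    for event in all_events:
        apply_event(counts, totals, event)
    return counts, totals


counts, totals = aggregate(events)
print(counts[("p1", "deposit")], totals[("p1", "deposit")])
```

Keeping the raw events and deriving aggregates from them means a new aggregate can be added later by replaying the store, which is one reason to separate the two in the picture.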
20.
2. Complexity
Traditionally regarded as a matter of size.
In reality, there is more to it
§ The 5 Vs describe inherent complexity
§ Extrinsic complexity needs to be factored in
[Figure: the 5 Vs (Volume, Velocity, Variety, Veracity, Value) alongside Availability and Confidentiality]
21.
3. Experiment
§ Operational and production
readiness
§ CAP theorem in practice
§ The devil is in the details
The product brochure does not give the
full picture
§ What is @Aphyr saying?
§ Do you really need it?
23.
Traditional BI
The role of BI is traditionally biased
towards reaction.
§ Reports
§ KPIs
§ Alerts
Heavy reliance on
§ Few, coarse-grained aggregates
§ SQL
§ Excel!
[Figure: BI team skills across Business, Data visualization, Programming and Statistics]
24.
Data Science
§ Not just about upskilling
§ Focus on building actionable
insights
§ Find champions that can help
spread the word
§ Learn the craft
[Figure: Data Science team skills across Business, Data visualization, Programming, Statistics and Big data]
26.
Maturity
Process, organizational structure and engineering practices can hinder innovation
Innovation-led projects are hard to manage when an organization is in a later phase
So, ultimately…
[Figure: rate of product innovation and process innovation over time, across the fluid, transitional and specific phases. Source: (Utterback 1994)]
29.
The road to MVP
Focus on a Minimum Viable Solution
§ Focus on outcome, not output
§ Deliver value incrementally
§ Measure early
§ Experiment with real data
Build a start-up team to focus on core
benefits
§ Cutting through bureaucracy
§ Ensuring we avoid biggerism
[Figure: product layers: Core benefits, Tangible specification, Augmented features. Innovation happens at the centre]
30.
MVS
Core benefits
§ Taxonomy of associations between
players and other data sources
§ Device protection for account
takeovers
§ Fraud ring identification
§ Bonus abuse prevention
[Figure: MVS components. Data sources: device fingerprint, location, ID check, physical address, email, phone number, credit cards, date of birth, password, risk score. Stores: relationship graph, event store. Classifier labels: friendly fraud, financial fraud, identity theft]
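One way to picture fraud ring identification on top of a relationship graph is to link accounts that share an attribute value and then read off the connected groups. The sketch below does this with a small union-find; the account names, the attribute names and the rule "any shared value links two accounts" are assumptions made for the example, not the method used in the project.

```python
from collections import defaultdict

# Hypothetical accounts and attribute values.
accounts = {
    "alice": {"device": "d1", "email": "a@example.com", "card": "c1"},
    "bob": {"device": "d1", "email": "b@example.com", "card": "c2"},
    "carol": {"device": "d2", "email": "c@example.com", "card": "c2"},
    "dave": {"device": "d3", "email": "d@example.com", "card": "c3"},
}


def find_rings(accounts, min_size=2):
    """Group accounts that are linked, directly or transitively, by a shared attribute value."""
    parent = {name: name for name in accounts}

    def find(x):
        # Follow parent links to the group representative, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (attribute, value) -> first account seen with it
    for name, attrs in accounts.items():
        for pair in attrs.items():
            if pair in first_seen:
                union(first_seen[pair], name)
            else:
                first_seen[pair] = name

    groups = defaultdict(set)
    for name in accounts:
        groups[find(name)].add(name)
    return [group for group in groups.values() if len(group) >= min_size]


rings = find_rings(accounts)
print(rings)
```

Here alice and bob share a device and bob and carol share a card, so all three end up in one group while dave stays on his own. In practice some shared values (a household device, for instance) are innocent, so a grouping like this would be a candidate for review rather than a verdict.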
31.
Outcomes
§ Prove something, then engineer it
§ ML can be done in Excel
Choosing not to adopt something is as
important as adopting it
§ Reduces clutter
§ Improves focus
Again, focus on the core benefits
[Figure: the big-picture architecture diagram. Components: Applications, OLTP, Data Ingestion, Events store (events, raw data), Stream Processing, Batch Processing (MR / Hive / Spark / R), Aggregate / Specialized databases (aggregates), RTDW, OLAP, BI, Dashboard, ML, User]