Data analytics is extremely powerful, but trouble is on the horizon. This deck makes the case for a new approach to data science that delivers its benefits without the pitfalls.
2. Benefits of Data Science
• Cheap business analytics on demand
• Decision consequences predicted in advance
• Produce personalized products cheaply and effectively
4. Legal Compliance and Privacy
• Regulations on what data, where, and from which people
• Protecting sensitive information from prying eyes, inside and out
• Guaranteeing non-discrimination in machine learning
• Antitrust and competition regulations
5. Data Breaches and Analytics Integrity
• Are we breached? By whom? What did they access?
• Are our analytics correct? Are they tampered with?
• Did our data transmit correctly?
• Did input streams ingest correctly?
• Is there malicious intent from any of our data suppliers?
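Questions such as "Did our data transmit correctly?" and "Is it tampered with?" can be partly answered with message authentication codes. The following is a minimal sketch, assuming the sender and receiver already share a secret key (key distribution is out of scope here). It uses Python's standard `hmac` module to tag each record so the receiver can detect any modification in transit. The names `sign_payload` and `verify_payload` are hypothetical and are not part of Moonstone or Qu.

```python
import hashlib
import hmac
import secrets


def sign_payload(payload: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag that travels alongside the payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_payload(payload: bytes, tag: str, key: bytes) -> bool:
    """Recompute the tag on receipt and compare it in constant time."""
    expected = sign_payload(payload, key)
    return hmac.compare_digest(expected, tag)


# Assumption: both ends hold this shared secret key.
key = secrets.token_bytes(32)
record = b"account=1234,amount=500.00"
tag = sign_payload(record, key)

tampered = b"account=1234,amount=50000.00"
print(verify_payload(record, tag, key))    # True
print(verify_payload(tampered, tag, key))  # False
```

A plain checksum would catch accidental corruption, but only a keyed tag like this also catches deliberate edits by a party who does not hold the key.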
6. Case In Point
• Financial Industry
• Our entire company was looted by a hacker who triggered devastating trades
• Health Care Industry
• Massive lawsuit over mental health records made accessible by rogue analysis
• Credit Reporting Bureaus
• Class-action lawsuit over malicious, bogus content inserted by a rogue provider
• Intellectual Property Generating Firms
• A competitor just bought a new company with an exact copy of our stack?
7. Where did it go wrong?
• Spoofing and Identity Theft
• Capability gap between attackers and defenders
• Security versus scalability myth
8. Specific Issues to Address
• Due diligence for legal battles over specific breaches or illicit access
• Inability to detect intrusions
• Excessive trust in identities in ‘restricted environments’
• Need to solve these without performance hits
9. Nice-to-Haves
• Did my data set linking actually work?
• Did this new analytic tool produce ‘quality’ results?
• What questions can I ask?
• Have my sensor arrays just fried and started producing garbage output?
• Is someone tampering with my input data?
10. Potential Solutions
• Option A – extend the TCP/IP stack with a security layer
• Option B – rebuild the entire stack from the ground up
• Option C – There is no C
11. Option B: Project Moonstone
• A set of design and project planning docs - not code
• Designed to provide security capabilities as a framework
• Replace your Hadoop, Spark, and other data science systems
• Adapters that allow these systems to operate inside Moonstone
• Inside a pre-built Scrum project framework
• Little overhead required
• Integrated modular distributed anti-virus and intrusion detection
12. Qu Secure Data Science Language Concept
• Built-in graph analysis and SQL primitives from which other systems can be built
• Derived from Scala and Erlang
• Security everywhere – no trusted places
• Auditing guarantees
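The deck does not say how Qu would provide auditing guarantees. One common technique for tamper-evident auditing is a hash-chained, append-only log, in which each entry commits to the hash of the previous one. The following is a minimal sketch of that technique in Python, not Qu's actual design. The `AuditLog` class, its field names, and the all-zero genesis hash are hypothetical choices made for illustration.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field

GENESIS = "0" * 64  # hypothetical "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def append(self, actor: str, action: str) -> dict:
        # Each new entry records the hash of the entry before it.
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"actor": actor, "action": action, "ts": time.time(), "prev": prev}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Walk the chain; editing any earlier entry breaks every later link.
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.append("alice", "read table patients")
log.append("bob", "export table patients")
print(log.verify())  # True

log.entries[0]["actor"] = "mallory"  # simulate after-the-fact tampering
print(log.verify())  # False
```

The chain is only tamper-evident: an attacker who can rewrite the whole log can also recompute every hash. For that reason, the latest hash would need to be anchored somewhere the attacker cannot reach.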
13. Qu Concept Overview
• A DataSet contains Tables, which are collections of Nodes
• A Node contains Links to other Nodes, in the same Table or in other Tables
• All are immutable – they cannot change once created
• DataSets control access to the DataStores that load and create them
• All versions of all data are stored, but some are offloaded to bulk storage
• Every DataSet, Table, Node, and Link carries a creation timestamp
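Qu is presented only as a concept, so the following is a minimal sketch of the containment and immutability rules above, using Python frozen dataclasses as a stand-in for Qu's types. The field names (`node_id`, `target_table`, `target_node`, `created_at`) and the `with_link` method are hypothetical. In this sketch, "changing" a Node yields a new version with its own creation timestamp, and the old version stays intact, so both versions can be retained. DataStore access control is not modeled.

```python
from __future__ import annotations

import time
from dataclasses import FrozenInstanceError, dataclass, field, replace


def _now() -> float:
    return time.time()


@dataclass(frozen=True)
class Link:
    target_table: str  # the target may live in this Table or another one
    target_node: str
    created_at: float = field(default_factory=_now)


@dataclass(frozen=True)
class Node:
    node_id: str
    links: tuple[Link, ...] = ()
    created_at: float = field(default_factory=_now)

    def with_link(self, link: Link) -> Node:
        # Returns a new Node version; self is never modified.
        return replace(self, links=self.links + (link,), created_at=_now())


@dataclass(frozen=True)
class Table:
    name: str
    nodes: tuple[Node, ...] = ()
    created_at: float = field(default_factory=_now)


@dataclass(frozen=True)
class DataSet:
    name: str
    tables: tuple[Table, ...] = ()
    created_at: float = field(default_factory=_now)


patient = Node("patient-1")
patient_v2 = patient.with_link(Link("visits", "visit-9"))
patients = Table("patients", nodes=(patient, patient_v2))  # both versions kept
clinic = DataSet("clinic", tables=(patients,))

try:
    patient.node_id = "patient-2"  # in-place mutation is rejected
except FrozenInstanceError:
    print("immutable")
```

Tuples are used for the collections instead of lists, so that no container nested inside a frozen object can be mutated either.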
14. Moonstone on Qu
• Identity Graph, 3-connectedness, and the SecureSocket Interface
• Data cleaning as a security module
• ‘System Temperature’ and automated intrusion reactions
• Automated evaluations and auditing interfaces
• Detecting perimeter threats