Designing and building post compromise recoverable services


A look at how to design and build services, systems, networks, hosts and applications that can successfully deal with a security compromise.

The deck also touches on the topics of self-healing systems and potential applications of machine learning to the problem space.

  • These aren’t the only attack paths. For example, an attacker could go upstream, e.g.:
    Third-party software components’ source repositories.
    Threat actors could go after the service’s corporate IT, etc.
  • Packaging, testing & deployment
    Careful trust and architecture boundary considerations
    Kill passwords forever (2FA/MFA)
    Ability to easily monitor to varying degrees (live, log or full packet capture)
    Ability to easily isolate aspects while maintaining service
    Ability to easily operate while isolated from known-compromised / known-good
  • Ability to roll credentials / secrets
    Ability to query service properties, behaviour, performance etc.
    Ability to increase protective monitoring / active response
    Ability to verify integrity* (configuration, software, package, system, host, network etc..)
    Ability to increase integrity verification frequency
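The integrity-verification abilities above can be sketched as a baseline of known-good hashes checked against the live filesystem. A minimal illustration, assuming SHA-256 over files under a root directory (the function names and baseline format are hypothetical, not from the deck):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries don't exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_integrity(baseline: dict[str, str], root: Path) -> list[str]:
    """Return relative paths whose current hash differs from the
    known-good baseline (missing files count as drifted)."""
    drifted = []
    for rel, expected in baseline.items():
        target = root / rel
        if not target.exists() or sha256_of(target) != expected:
            drifted.append(rel)
    return drifted
```

Increasing verification frequency is then just running `verify_integrity` on a shorter schedule; the expensive part, hashing, stays the same.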
  • Ability to define, model or learn healthy / normal
    Ability to define and execute reactions to events / situations
    if this then that
    Consider (less tried and tested, or ‘it worked in a PhD project’)
    Machine learning for behaviours at all layers (we’ve seen this productized in a focused manner)
    Ability to rate-limit or access-limit functionality automatically and/or manually in high-alert situations
    Something we’ve not considered
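The “if this then that” reaction model above can be sketched as a small rule engine: each rule pairs a predicate over an event with an action to execute. This is an illustrative sketch, not a known product; the `Rule` and `ReactionEngine` names are invented for the example:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Rule:
    """An 'if this then that' reaction: when the predicate matches
    an event, run the action."""
    name: str
    predicate: Callable[[dict], bool]
    action: Callable[[dict], None]

@dataclass
class ReactionEngine:
    rules: list[Rule] = field(default_factory=list)

    def handle(self, event: dict) -> list[str]:
        """Run every matching rule; return the names of rules that fired."""
        fired = []
        for rule in self.rules:
            if rule.predicate(event):
                rule.action(event)
                fired.append(rule.name)
        return fired
```

A rule such as “if a credential anomaly is seen, disable the credential” then becomes one `Rule` whose action calls into the credential system.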
  • Educate in defensive coding and functional design
    Consider 3rd party component integrity verification
    Ability to verify source control integrity
    Ability to verify build server integrity
    Ability to verify development to live assets integrity
    Archive releases (artefacts, source, test output and logs)
    Develop compromise unit test cases for functionality in systems and software
    Test compromise scenarios in pre-production
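A “compromise unit test case” feeds attack-shaped input to a component and asserts the defensive reaction, just like a functional test asserts normal behaviour. A minimal sketch, assuming a hypothetical `Account` model that locks after repeated failures (the class and threshold are invented for illustration):

```python
import unittest

class Account:
    """Hypothetical account model with lockout after repeated failures."""
    MAX_FAILURES = 3

    def __init__(self, password: str):
        self._password = password
        self.failures = 0
        self.locked = False

    def login(self, attempt: str) -> bool:
        if self.locked:
            return False
        if attempt == self._password:
            self.failures = 0
            return True
        self.failures += 1
        if self.failures >= self.MAX_FAILURES:
            self.locked = True
        return False

class CompromiseScenarioTest(unittest.TestCase):
    """A compromise unit test: simulate the attack, assert the defence."""

    def test_brute_force_locks_account(self):
        acct = Account("s3cret")
        for _ in range(3):
            self.assertFalse(acct.login("guess"))
        self.assertTrue(acct.locked)
        # Even the correct password is rejected once locked.
        self.assertFalse(acct.login("s3cret"))
```

Running such scenarios in pre-production keeps the defensive behaviour from silently regressing.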
  • Able to define ‘security healthy’
    Plan for highest level of access compromise
    Ensure configuration management
    Ensure configuration integrity monitoring
    Protective monitoring and anomaly detection
    Have the ability to time-line across many distinct sources of data
    Take inspiration* from Netflix’s Simian Army and fire drill
    Fire drill: investigating, segregating, operating, rebuilding, repairing, rolling and reintegrating
  • You need to be able to define system, network, host, software and service
  • Integrity verification or other high confidence indicator
    Ability to identify likely root cause and remediate*
    Alert (operations)
    Opt out of operation
    Snapshot (machines / configuration / logs)
    Revert (to known good)
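The detect / alert / opt-out / snapshot / revert sequence above can be sketched as one response loop. Everything here is a hypothetical stand-in for real tooling (the `Host` and `SelfHealer` types are invented for the example), shown only to make the ordering of steps concrete:

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    config_hash: str
    in_service: bool = True

@dataclass
class SelfHealer:
    """Sketch of the detect -> alert -> isolate -> snapshot -> revert loop."""
    known_good: str
    alerts: list = field(default_factory=list)
    snapshots: list = field(default_factory=list)

    def check(self, host: Host) -> bool:
        """Return True if the host was healthy; otherwise run the response steps."""
        if host.config_hash == self.known_good:
            return True
        self.alerts.append(f"integrity failure on {host.name}")  # alert operations
        host.in_service = False                                  # opt out of operation
        self.snapshots.append((host.name, host.config_hash))     # snapshot for analysis
        host.config_hash = self.known_good                       # revert to known good
        return False
```

Note the snapshot is taken before the revert, so the evidence of the compromise survives the repair.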
  • Client’s user behaviour – needs to be learnt
    Client’s software behaviour – do we care?
    Client’s system behaviour – do we care?
    Client behaviour – needs to be learnt
  • Service behaviour – needs to be defined / modelled / learnt
    Software behaviour – needs to be defined / modelled / learnt
    System behaviour – needs to be defined / modelled / learnt
    Network behaviour – needs to be defined / modelled / learnt
    Operations / staff (and their credentials) behaviour
  • Client’s database queries are usually*(1) non-sequential across records and return non-complete result sets*(2)
    A query is observed doing select * from what is usually a source*(3) of the same 75 base queries
    Result return speed is rate-limited*(4) with marginal effect
    Alert is raised to client security point of contact
    query, source, destination (including db and table), time and date
    reaction by system
    Snapshot database logs and source machine taken into security incident zone for client / your analysis
    … facilitates full post incident analysis
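The detection signal in Example #1 — normal clients read records non-sequentially, a table dump reads them in order — can be sketched as a check for long consecutive runs of accessed record IDs. The function name and threshold below are illustrative assumptions, not known-good tuning values:

```python
def looks_like_table_dump(record_ids, threshold=20):
    """Flag a query stream whose accessed record IDs form a long consecutive
    run: normal client access patterns are scattered, a dump via SQLi tends
    to walk the table in order."""
    run = 1
    for prev, cur in zip(record_ids, record_ids[1:]):
        run = run + 1 if cur == prev + 1 else 1
        if run >= threshold:
            return True
    return False
```

In a real deployment this check would feed the semi-passive response: raise the alert, rate-limit the results, and trigger the snapshots described above.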
  • An operations desktop gets rolled (compromised) by a client-side attack
    Credentials stolen and used at a higher rate*(1) than normal, during a non-incident window*(2) or against systems not part of the incident group*(3)
    Credentials used from hosts other than expected*(4)
    Alert sent to operations shift manager and security operations centre
    sources, destination, times and dates
    reaction by system
    Credentials automatically disabled

    … exposure window minutes
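The active response in Example #2 — disable a credential when it is used too fast or from an unexpected host — can be sketched with a sliding window over recent uses. The class, window size and multiplier below are illustrative assumptions:

```python
from collections import deque

class CredentialMonitor:
    """Disable a credential when its use rate in a sliding window exceeds a
    multiple of baseline, or when it is used from an unexpected host."""

    def __init__(self, baseline_per_window: int, expected_hosts: set,
                 window_seconds: int = 300, multiplier: int = 3):
        self.baseline = baseline_per_window
        self.expected_hosts = expected_hosts
        self.window = window_seconds
        self.multiplier = multiplier
        self.uses = deque()       # timestamps of recent uses
        self.disabled = False

    def record_use(self, timestamp: float, host: str) -> bool:
        """Record one credential use; return True if the credential is now disabled."""
        if self.disabled:
            return True
        if host not in self.expected_hosts:
            self.disabled = True  # used from a host other than expected
            return True
        self.uses.append(timestamp)
        while self.uses and self.uses[0] <= timestamp - self.window:
            self.uses.popleft()   # expire uses outside the window
        if len(self.uses) > self.baseline * self.multiplier:
            self.disabled = True  # used at a higher rate than normal
        return self.disabled
```

Because the disable decision is automatic, the exposure window shrinks from the hours a human triage loop takes down to minutes.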
  • One large company has Red and Blue teams
    Red always attacking the services
    Blue always trying to detect and mitigate

    Your Red team could be a Netflix-esque simian army
    Your Blue team could be your self-healing systems

    Result = If stuff isn’t happening then it’s broken!
  • Services, systems and software need to be compromise ready – old school:
    Secure engineering
    Intrusion prevention
    Principle of least privilege
    Intrusion detection
    Current approaches revolve around:
    Event correlation / confidence indicators
    Human analysis and intervention
    Machine learning
    … it’s the way of the future …
  • Designing and building post compromise recoverable services

    1. Designing and building post compromise recoverable services Ollie Whitehouse
    2. Why? "We may be at the point of diminishing returns by trying to buy down vulnerability" "maybe it’s time to place more emphasis on coping with the consequences of a successful attack, and trying to develop networks that can ‘self-heal’ or ‘self-limit’ the damages inflicted upon them” Gen. Michael Hayden (USAF-Ret.) ex NSA and CIA head February, 2012
    3. Why?
    4. Agenda • Stages of a compromise • Impact limitation • Healing • Requirements for: • design • build • operations • Wrap-up and conclusions
    5. Stages of a compromise
    6. Stages of a compromise
    7. Stages of a compromise
    8. What can we do? Deny
    9. What can we do? Frustrate
    10. What can we do? Misdirect
    11. What can we do? Contain
    12. Services are unique
    13. Indicator collection
    14. Detection
    15. Impact limitation
    16. Healing – old wisdom / not practical: rebuild & reinstall everything down to bare metal (to avoid whack-a-mole and persistence)
    17. Healing – reality: remediate, re-establish trust & re-integrate (whilst continuing to provide service, avoiding whack-a-mole & persistence)
    18. Healing
    19. Healing – configuration
    20. Healing a live service
    21. Healing – real world
    22. The requirements: design, development and operations
    23. Design • Packaging, testing & deployment • Boundaries • Authentication • System wide monitoring • Isolation • Operation while isolated
    24. Design • Roll-ability (not a word) • Query-ability (not a word) • Variable protection • Integrity verification • Frequency of checks
    25. Design • Health / normal • Response • if this then that • Consider • Machine learning for behaviours • Rate limiting • Something else
    26. Development • Staff & vendor education • 3rd party components • Source integrity • Build environment integrity • Build artefact integrity • Archive releases • Compromise unit test cases • Test compromise scenarios
    27. Operations • Able to define ‘security healthy’ • Worst case scenario planning • Configuration management • Configuration integrity • Protective monitoring • Time-line capability • Fire drill – continually
    28. The requirements of tomorrow: self healing
    29. Self-heal – defining states
    30. Self-heal – steps • Detect • Verify integrity • Understand and remediate • Alert • Segregate • Snapshot • Revert / Rebuild / Restart • Verify • Reintegrate
    31. Self-heal – what is healthy? • Client’s user behaviour • Client’s software behaviour • Client’s system behaviour • Client’s behaviour
    32. Self-heal – what is healthy? • Service behaviour • Software behaviour • System behaviour • Network behaviour • Operations / staff (and their credentials)
    33. Putting it into practice: two (simplistic) examples and one point for consideration
    34. Example #1 (semi-passive response) • Client SQLi • Database dump – sequential record read • Response taken • Alerts raised • Snapshots taken … facilitates full post incident analysis
    35. Example #2 (active response) • Ops client side attack • Credentials stolen • Anomalous credential behaviour • Alerts sent • Credentials automatically disabled … exposure window minutes
    36. Point for consideration • Red and Blue teams • Red team could be a Netflix-esque simian army • Blue team could be your self-healing systems
    37. Conclusions • Design and implement compromise readiness • Self learning / healing the future • Plan for worst case* • Test scenarios continually
    38. Europe Manchester – Head Office Cheltenham Edinburgh Leatherhead London Milton Keynes Amsterdam Copenhagen Munich Zurich North America Atlanta Austin Chicago Mountain View New York San Francisco Seattle Australia Sydney Thanks! Questions?