THE WEBSITE
RESILIENCY
IMPERATIVE
Scott Hilton
VP & GM of Product
Development
Engin Akyol
CTO & Co-Founder
Andy Lawrence
Research Vice
President
Our Expert Panel
Andy Lawrence, VP Research, 451
Research, Datacenters and Critical
Infrastructure
451 Research - Agenda
• Resiliency – The growing promise of zero downtime
• The reality – complex, public facing outages
• Binary and partial failures
• Distributed resiliency
• Distributed planning…
Company Confidential
Towards an age of zero downtime?
“This is a service that has
not gone down in 12
years, so it’s not that…we
could rely on some sort of
previous experience.”
Werner Vogels,
CTO, AWS
“You can never say never. But
we’ve had no outage for 8 years
and as I look at, an outage is
looking less, not more likely.”
CIO, Major manufacturer
Drivers to reliability
• Cloud replicates data and reduces single points of failure
• Equipment is engineered for reliability
• Analytics forecasts and prevents failures
• Redundancy and replication route around failure
• Distributed architectures spread the risks…
• 50 years of IT! We are past IT 1.0…
Outages make Headline news!
Source: Uptime Institute Research
Outages proving costly
Southwest CEO characterizes IT service outage as
“once in a 1,000 year flood”
Downtime costs going up…
Source: Ponemon Institute 2016/Vertiv
Causes/impacts, diagnosis vary widely…
Most commonly
cited causes
Datacenter power outage
Software configuration error
DDOS, BOTs and other security issues
Network or network programming/device
Storage corruption/replication/access
Datacenter equipment or site problem
But most
problems are
never explained
or are partial,
temporary or
intermittent
Outages and incidents in 2017
Cloud providers
have problems
too…
Aug 2017
Google Cloud
Aug 2017
Downtime is now a major management concern…
92% of organizations said
their management is more
concerned about outages
than they were a year ago
60% of organizations said
they now formally attempt
to measure the cost of
downtime.
Source: Uptime Institute Survey 2017
The problem of partial downtime and degradation
Performance issues force up cloud costs…
• DDOS, BOTs and configuration issues can drag
down performance or use up CPU and disk I/O.
• Cloud services (AWS and Microsoft) respond
by autoscaling up, which uses up credits.
• On-premises problems may trigger a burst to
cloud or, more rarely, require DR-as-a-service.
Architectural swing…
Scale up
• Transactional
• High Integrity
• Interdependent stack
• Single/mirrored sites
• DR/Back up
• Resilient facilities
Scale out
• Distributed components
• Horizontally layered
• Highly virtualized
• Highly replicated
• Less transactional
• Available but integrity issues
• Less redundant facilities
Resiliency Trends
• More sharing
• More replication
• More in software
• More distributed
• More is active
Distributed resiliency
Advantages
● Can reduce/eliminate vulnerability to local/regional issues
● Can support extremely high availability/maintainability
● Can eliminate need for Disaster Recovery
● Can enable reduced investment in physical redundancy
● Very scalable – suited to Cloud native/scale out IT
Challenges
● Introduces IT complexity, expense
● Requires compromises – integrity v availability
● Requires scale or close collaboration
● Susceptible to problems with performance, replication and load
management
● Requires continual hands on management
Cloud resiliency and distributed data integrity…
“[This] is possible in practice if you control the whole network,
which is rare over the wide area.”
Even then, it requires significant redundancy of network paths,
architectural planning to manage correlated failures, and very
careful operations, especially for upgrades.
Even then outages will occur…
Professor Eric Brewer, VP Infrastructure, Google
Problems move up, and out…
• Datacenter level switching/domain name management
• Intra-datacenter pathway independence
• Capacity planning/management/latency
• Performance planning/protection
• Visibility and recovery
Who deals with resiliency?
2. Security
Resiliency
• Intrusion detection and
prevention
• Viruses, malware
• DDOS and Bots
• Privacy and encryption
• Secure key management
• Fraud and hacking
• Staff and process management
• Management and operations
• Redundancy – power
• Redundancy – Network
• Availability architectures
4. Facility and Network
Resiliency
• Redundancy – power
• Redundancy – Network
• Cooling systems
• High availability power
distribution
• High Availability architectures
• Tier II-IV designs
• Network pathways
• Capacity forecasting/management
• Management and operations
• Physical site security
Resiliency is multi-disciplined
1. Systems and software
Resiliency
• Redundancy and replication
• Database synchronisation and
transaction management
• Storage management
• Capacity and load management
• Global traffic management
• Domain management
• High availability designs
• Performance monitoring
• Availability zones/regions
• Management and operations
3. Service providers &
partners
Resiliency
• Network carrier assessment
• SLAs and contracts
• Visibility and transparency
• Certification
• Performance monitoring
• Architecture risk assessments
• Capacity assurance
• Cloud service assessment
• Infrastructure test
• Business continuity services
Summary
• Failures and incidents are becoming more visible, more expensive and more
complex.
• “Failures” are increasingly likely to be complex, partial, and be distributed
across multiple sites.
• Distributed architectures are more resilient and may be cheaper – but result in
a need for resilient and diligent IT and network management.
• Resiliency is achieved by diligence, investment and attention at all levels.
Copyright © 2017, Oracle and/or its affiliates. All rights reserved.
Scott Hilton
GM & VP, Product Development, Oracle Dyn GBU
Oracle Cloud Infrastructure
Safe Harbor Statement
The following is intended to outline our general product
direction. It is intended for information purposes only, and may
not be incorporated into any contract. It is not a commitment to
deliver any material, code, or functionality, and should not be
relied upon in making purchasing decisions. The development,
release, and timing of any features or functionality described for
Oracle’s products remains at the sole discretion of Oracle.
Oracle Confidential – Internal/Restricted/Highly Restricted
About Oracle Dyn
● Powered by a global network that drives 40B traffic optimization decisions for
customers daily
● Serves over 3,500 enterprise customers globally
● Enterprise Grade services
● World’s most comprehensive internet performance data set
● Battle-tested reliability and security
Confidential – Oracle Internal/Restricted/Highly Restricted
● ‘01: Incorporated with focus on Dynamic DNS & Remote Access
● ‘07: Launched enterprise Managed DNS services
● ‘09: Customers include Twitter, Netflix, Zappos
● ‘10: Launch Email Delivery services
● ‘14: Launch Monitoring & Analytics services
● ‘16: Acquired by Oracle
● 2017: Becomes Oracle Dyn Global Business Unit (GBU)
Internet Events, Natural and Manmade, Affect Website Resiliency
Constant Disruptions Demand a Comprehensive View
Edge and Websites Contribute to the Performance and Resilience of the User Experience
User Access
to Websites
Edge
DNS Lookup
Time to
First Byte
Internet
Connections
Time
Website Infrastructure
Processing
Database
Transaction
Storage I/O
50-70% 30-50%
Websites on cloud,
data centers,
hosters, etc.
Edge Resiliency Challenges
• User connections to website resources
• Path Compromises
• DDoS attacks
• IP address hijacking
• Site spoofing
• Resources Availability
• Service
• Geography
• Content
Portal
CDN
Cloud PaaS
Partner SaaS
Ad Network
Websites are Only as Good as the Slowest/Least Secure Process
separate resources – separate
paths
Users
[Customers,
Partners,
Employees,
Things]
IT
[DevOps,
Administrators,
Architects]
Core and Edge Create a Complete Cloud
Expectation
[High quality
experience]
Identity
Compute
Block
Storage
Database Networking
Object
Storage
Edge
Messaging
Name
Resolution
Distributed
Content
Traffic
Steering
The Next Generation Cloud
Internet
Monitoring
Availability
Performance
Security
Control
Edge
Networking
Edge
Security
Core+
Users DNS
www.hybrid.com
is:
192.168.9.10
192.168.9.11
175.230.3.2
cdn.hybrid.com
Modern DNS
with Multiple End Points
IaaS Cloud Providers
cloud.hybrid.com
192.168.9.10
Data Centers
hybrid.com
192.168.9.11
SaaS Providers
saas.hybrid.com
175.230.3.2
CDN
cdn.hybrid.com
175.230.3.4
Want to reach
www.hybrid.com
75%
25%
Want to reach
www.gohere.com
Users
Oracle Cloud
DNS
Oracle EBS on Oracle
Cloud
Oracle EBS Hosted in
Datacenter
4.4.4.4/24
3.3.3.3/24
Data replication
Monitoring
RULESET
Status: Load Balancing
Geographical Area: Global
Location: Data Center > 3.3.3.3/24
Percentage Distribution: 75
Response Pool Failover: 4.4.4.4/24
Location: Cloud > 4.4.4.4/24
Percentage Distribution: 25
Response Pool Failover: 3.3.3.3/24
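The ruleset above can be read as weighted answer selection with a per-pool failover target. The following is a minimal Python sketch of that logic only. The `RULESET` structure and the `choose_pool` function are hypothetical names made up for illustration; they are not the Oracle Dyn API, and the health set stands in for whatever the monitoring service reports.

```python
import random

# Hypothetical in-memory model of the ruleset on the slide; this is not
# the Oracle Dyn API, only an illustration of the selection logic.
RULESET = [
    {"location": "Data Center", "pool": "3.3.3.3/24", "weight": 75, "failover": "4.4.4.4/24"},
    {"location": "Cloud", "pool": "4.4.4.4/24", "weight": 25, "failover": "3.3.3.3/24"},
]


def choose_pool(ruleset, healthy, rng=random):
    """Pick a response pool by weight, falling back if it is unhealthy."""
    weights = [rule["weight"] for rule in ruleset]
    rule = rng.choices(ruleset, weights=weights, k=1)[0]
    if rule["pool"] in healthy:
        return rule["pool"]
    if rule["failover"] in healthy:
        return rule["failover"]
    return None  # nothing healthy to answer with


if __name__ == "__main__":
    rng = random.Random(0)
    both_up = {"3.3.3.3/24", "4.4.4.4/24"}
    picks = [choose_pool(RULESET, both_up, rng) for _ in range(10_000)]
    print("share sent to data center:", picks.count("3.3.3.3/24") / len(picks))
    # With the data center marked unhealthy, every answer is the cloud pool.
    print(choose_pool(RULESET, {"4.4.4.4/24"}, rng))
```

Shifting the two weights step by step (for example 75/25, then 50/50, then 0/100) is one way to picture the gradual migration the slide describes, with the failover pool covering either side if it goes unhealthy.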
Graceful Cloud Migration
Cloud
Data
Center
CONCLUSION
Key Takeaways
● Rethink the role of DNS as part of your cloud migration strategy.
● Evaluate and understand your edge and edge requirements as you migrate to cloud.
● The edge is critical to cloud performance & resilience.
Distil Networks protects
mission-critical websites,
mobile apps, and APIs from
automated cybersecurity
threats (aka bots).
Engin Akyol
CTO & Co-Founder
What are bad bots up to?
Scraping
Spamming
Ad Fraud
Account
Credentials
Cardholder Data
Carding
Denial of
Service
Skewing
Vulnerability
Identification
Industry Expertise
● Invented the category
● The recognized leader
The Most Effective Technology
● Wider:
○ Web, API, and Mobile
○ Deploy Anywhere
● Deeper: Catch more bots
● Smarter: Without impacting users
Vigilant and Dedicated Partner
● Not A Solution, Your Solution
● Unprecedented access
● An extension of your team
Bot Defense as Adaptable and
Vigilant as the Threat Itself
Distil Secure CDN
Why did we build a new CDN?
For our Customers
● Prevent both Layer 7 DoS and Layer 3/4 DDoS
● Network resiliency and reduced latency for
content served at edge
For Distil Networks
● Capacity
● Scalability
● Resiliency
Legacy CDN compared to Secure CDN
                          Legacy CDN    Distil Secure CDN
Integrated Bot Defense
App DoS Protection
DDoS Protection
PCI Compliant Network
PoPs                      13            25
Capacity                  ~130 Gbps     ~36 Tbps
Interconnects             100           3,000
Global Anycast Routing
Highly Resilient Design
The Case for Resiliency
● Distil Networks protects websites, mobile
applications and APIs from automated
attacks by processing all traffic going to a
website in real-time
● Distil’s core application has to be available
24/7
● Our Secure CDN adds DDoS protection as
a capability, but it too has to maintain 24/7
availability
● Any downtime on our network will lead to
immediate revenue loss for many of our
customers
Because of this, the core requirements for our new
CDN centered around resiliency:
Individual POPs (points of presence)
● Cannot go offline due to hardware failure
● Cannot go offline due to networking issues
● Our application must support graceful failover
● Our POP must maintain 99.999% reachability to Distil NOC services
and customer origin servers
● Our POP must have a 0 downtime failover to a designated backup site
in case of catastrophic failure
Hardware and Network
Redundancy Specific
Requirements
● Every physical server component must
have redundancy
● Every physical network component
must have redundancy
● Layer 2 and Layer 3 services need to
be configured for graceful failover
Equipment going through validation testing
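To see why component-level redundancy matters for a five-nines target, here is a rough Python sketch of the standard availability arithmetic. It assumes component failures are independent, which real deployments often violate through correlated failures, and the per-component figure of 99% is an illustrative assumption, not a number from the slides.

```python
def redundant_availability(component_availability, copies):
    """Availability of `copies` identical components in parallel.

    The group is down only when every copy is down at once, which
    assumes the copies fail independently of each other.
    """
    return 1.0 - (1.0 - component_availability) ** copies


def series_availability(availabilities):
    """Availability of a chain in which every stage must be up."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


if __name__ == "__main__":
    single = 0.99  # assumed availability of one server or switch
    for copies in (1, 2, 3):
        value = redundant_availability(single, copies)
        print(f"{copies} copies: {value:.6f}")
    # A redundant server pair behind a redundant switch pair.
    pair = redundant_availability(single, 2)
    print(f"server pair + switch pair: {series_availability([pair, pair]):.6f}")
```

Under these assumptions each extra copy multiplies the unavailability by the single-component failure rate, while every non-redundant stage in the chain drags the total back down.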
Additional Requirements
● Capacity - we need to be able to handle some of the world’s
largest e-commerce sites
● Scalability - the network needs to be resilient and adaptive to
special events like flash sales, Black Friday or DoS attacks.
● Cost - must not cost us millions of dollars
www.example.com
Meeting all of our Requirements: Top Level
Distil Networks Architecture
Meeting all of our Requirements: High Level
While our POPs are designed to be fault tolerant and resilient, we still
need to plan for a POP being offline and handle POP to POP failover
gracefully.
● Each POP has its own CNAME and IP subnets with a DNS failover
address configured.
● DNS TTLs are configured for 1 minute
● If a POP were to have an unplanned event and go offline, Dyn failover
monitoring would shift the POP CNAME to the IP of its designated backup
● Within a few minutes of a POP having an unplanned outage, traffic would
resume through a backup POP
Inter-POP failover utilizing DNS
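The failover behaviour described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Distil's or Dyn's implementation: the POP names, the RFC 5737 documentation addresses, the 60-second probe interval and the two-failure threshold are all hypothetical. Only the 1-minute TTL comes from the slide.

```python
# Hypothetical POP records; the names and the RFC 5737 documentation
# addresses are placeholders, not a real configuration.
POPS = {
    "pop-a.example.net": {"ip": "192.0.2.10", "backup": "pop-b.example.net"},
    "pop-b.example.net": {"ip": "198.51.100.10", "backup": "pop-a.example.net"},
}


def answer_for(cname, healthy, pops=POPS):
    """Return the IP a monitored DNS record would hand out for a POP CNAME."""
    pop = pops[cname]
    if cname in healthy:
        return pop["ip"]
    backup = pop["backup"]
    if backup in healthy:
        return pops[backup]["ip"]
    return None  # neither the POP nor its designated backup is reachable


def worst_case_failover_seconds(ttl_s, probe_interval_s, failures_to_trip):
    """Upper bound on the time until clients follow the new answer.

    Detection takes up to `failures_to_trip` probe intervals, after which
    resolvers may keep serving the old answer for one full TTL.
    """
    detection = probe_interval_s * failures_to_trip
    return detection + ttl_s


if __name__ == "__main__":
    print(answer_for("pop-a.example.net", {"pop-b.example.net"}))
    print(worst_case_failover_seconds(ttl_s=60, probe_interval_s=60, failures_to_trip=2))
```

With these assumed monitoring settings the bound is three minutes, in line with traffic resuming "within a few minutes". Resolvers that ignore TTLs are not modelled here.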
Our Secure CDN has to maintain 99.999% uptime with no tolerance for
interruptions of customer traffic. This is accomplished through:
● A partnership with Verizon Edgecast, one of the world’s premier CDNs
● Network and Hardware redundancy at the POP level
● Fault-Tolerant application
● Oracle Dyn DNS monitoring and failover
In Summary
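As a quick check on what 99.999% uptime allows, the short Python sketch below converts an availability percentage into a yearly downtime budget. The function name and the 365-day year are assumptions made for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def downtime_budget_minutes(availability_pct, period_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime permitted over the period at a given availability."""
    return period_minutes * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        budget = downtime_budget_minutes(pct)
        print(f"{pct}% -> {budget:.2f} minutes per year")
```

Five nines leaves roughly five minutes of downtime per year, so a single failover that takes a few minutes would consume most of the annual budget.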
Questions for our Experts?
Questions from Registration Page:
● I want to register dyn.com and making payment processing for my account: cdchieu or kanaco for 30 user
within 05 years. Thanks so much. ??? Pass?
● recommendations for Cloud networks.
● What to look for in a DDoS protection solution? For Engin
● Can you guarantee 100% resilience for availability and integrity? (akamai employee) for Engin?
● How can we be assured that the solutions the cloud providers first implemented are not
downgraded and compromised over time as cost pressures mount? For ??
● How do you feel about IoT? For Andy?
● What is the single most reliable warning sign that an outage is imminent?
Seed Questions:
● It’s interesting to see Bots mentioned in a resiliency context. Is there a lot of evidence of Bots causing failures and
downtime, or is it more a nuisance and something that adds costs?
● It seems that there may be a need for a kind of Chief IT risk officer who assesses and manages responses from power
to security. Do organizations have such a role?
● Can you speak to how bot and DDoS attacks are changing and becoming more sophisticated? I’m getting the
impression that the battle is getting a lot tougher as the tools get smarter…
