A study of our DNS full-resolvers

Matsuzaki ‘maz’ Yoshinobu
<maz@iij.ad.jp>
Topics for today
• Lessons learned from an outage of our full-resolvers
• An interesting behavior of clients
Users and DNS full-resolver
full-resolver (cache nameserver)
Most users use their ISP’s full-resolvers, because that information is provided automatically
DNS cache nameserver
• ISPs usually provide 2 nameservers for customers
• Just in case
• Our assumptions here:
• Even if a single server fails, the other server can handle DNS queries
• Users somehow automatically pick a usable one
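The second assumption can be sketched as a tiny stub-resolver model. This is only an illustration: the function names, the sequential try-in-order strategy and the number of rounds are assumptions, not how any particular client is known to behave (the traffic described on later slides suggests many clients query both servers anyway). The network is faked, so nothing is sent on the wire.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple


def resolve_with_failover(
    qname: str,
    servers: Sequence[str],
    send_query: Callable[[str, str], Optional[str]],
    rounds: int = 2,
) -> Tuple[Optional[str], List[str]]:
    """Try each configured server in order, for up to `rounds` passes.

    Returns (answer, servers_tried). `send_query` returns None on timeout.
    """
    tried: List[str] = []
    for _ in range(rounds):
        for server in servers:
            tried.append(server)
            answer = send_query(server, qname)
            if answer is not None:
                return answer, tried
    return None, tried


def make_fake_network(down: Set[str]) -> Callable[[str, str], Optional[str]]:
    """Hypothetical network: servers listed in `down` silently drop queries."""
    def send_query(server: str, qname: str) -> Optional[str]:
        return None if server in down else "192.0.2.1"
    return send_query


servers = ["ns01", "ns11"]
healthy = resolve_with_failover("www.example.com", servers, make_fake_network(set()))
single = resolve_with_failover("www.example.com", servers, make_fake_network({"ns01"}))
double = resolve_with_failover("www.example.com", servers, make_fake_network({"ns01", "ns11"}))
```

In this model a single failure costs the client one extra query and one timeout, while a double failure makes every lookup send `rounds * len(servers)` queries and still fail.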
In 2009, we had trouble
• Trouble on cache nameservers for consumers
• April 2009
• On both (all) nameservers
• The 1st failure happened on one server (ns01)
• Then the 2nd failure happened on the other server (ns11)
• About 12 min of blackout
Failures on our cache nameservers
• ns01: 17:14:26 - 17:48:07 (33min14sec)
• ns11: 17:35:51 - 17:48:52 (13min01sec)
• While both servers were in trouble (12min16sec), our users couldn’t resolve hostnames
• During this trouble, the servers couldn’t answer 14,005,644 DNS queries
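The blackout length follows from the overlap of the two failure windows. A small sketch of that arithmetic, using only the timestamps listed on this slide (the helper names are made up for illustration):

```python
from datetime import datetime, timedelta


def parse(hms: str) -> datetime:
    """Parse an HH:MM:SS time of day (the date part is irrelevant here)."""
    return datetime.strptime(hms, "%H:%M:%S")


def overlap(a, b) -> timedelta:
    """Length of the intersection of two (start, end) intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(end - start, timedelta(0))


ns01 = (parse("17:14:26"), parse("17:48:07"))
ns11 = (parse("17:35:51"), parse("17:48:52"))

blackout = overlap(ns01, ns11)       # both servers down at the same time
ns01_duration = ns01[1] - ns01[0]
ns11_duration = ns11[1] - ns11[0]
```

The overlap comes out as 12 min 16 sec and the ns11 window as 13 min 01 sec, matching the slide. The ns01 timestamps as listed give 33 min 41 sec rather than the stated 33min14sec, so one of those two figures is presumably a transcription slip.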
The query graph
Before failure
• Clients preferred to use ns01
• Order of configuration?
• Clients sent DNS queries to the other server as well
• Measuring delays?
• Just in case?
During single failure
• DNS queries to ns01 were discarded during this period
• It seems users could still resolve hostnames, as ns11 was alive
• No strange traffic pattern here
• Users might have felt some delays
• A slightly higher query rate in the first 3 min, and then a ‘stable’ state
During single failure (cont.)
• The query rate (ns01+ns11) looks almost the same as before
• Even though ns01 was discarding queries during this period
• Probably most clients usually send DNS queries to both nameservers
During double failure (outage)
• Users couldn’t resolve hostnames at all during this period
• The query rate suddenly increased on both nameservers
• Those queries were all discarded, though
• Mostly because of ‘retries’
• We observed multiple queries that had the same QNAME
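One way to spot such retries in a query log is to count queries that repeat the same client, QNAME and QTYPE within a short window. The sketch below is an assumption about how this could be done, not the method used for the slides; the log tuple layout, the 5-second window and the sample addresses are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical log record: (timestamp in seconds, client address, QNAME, QTYPE)
Record = Tuple[float, str, str, str]


def count_repeated_queries(log: Iterable[Record], window_sec: float = 5.0) -> int:
    """Count queries that repeat an earlier (client, QNAME, QTYPE) within window_sec.

    The log is assumed to be sorted by timestamp. QNAMEs are compared
    case-insensitively, as DNS names are.
    """
    last_seen: Dict[Tuple[str, str, str], float] = {}
    repeats = 0
    for ts, client, qname, qtype in log:
        key = (client, qname.lower(), qtype)
        previous = last_seen.get(key)
        if previous is not None and ts - previous <= window_sec:
            repeats += 1
        last_seen[key] = ts
    return repeats


sample_log = [
    (0.0, "198.51.100.7", "www.example.com", "A"),
    (1.0, "198.51.100.7", "www.example.com", "A"),   # retry
    (3.0, "198.51.100.7", "WWW.example.com", "A"),   # retry, different case
    (3.5, "198.51.100.9", "www.example.com", "A"),   # another client
    (60.0, "198.51.100.7", "www.example.com", "A"),  # outside the window
]
repeats = count_repeated_queries(sample_log)
```

A rising share of repeats during an outage would be consistent with the ‘retries’ explanation above.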
Restoration
• Once ns01 was restored, it got about 7 times more queries than usual for several seconds
• Web pages are composed of many “modules”
• A single web page sometimes triggers several or more DNS queries
• Browsers’ prefetch function
• 12 min was enough to flush the client-side DNS caches
Restoration (cont.)
• Then ns11 was also restored
• It also got a higher rate of queries for several seconds
• The query rate gradually returned to the same ‘normal’ state as before
Lessons learned
• A single server failure will not cause a disaster when users configure multiple DNS cache servers on their devices
• The impact is probably negligible
• During the double failure (full outage), the nameservers got more queries
• Once a server is restored from a full outage, it gets a higher query rate for a while
• In our case, about 7 times more than usual for several seconds
Redundancy is important
• DNS resolution keeps working as long as one of the servers is functional in each part of the DNS
• Full-resolvers (caching nameservers)
• Authoritative nameservers
• Do have redundancy, avoid outages
• A multiple-server deployment works well
• IP anycast would also be useful
• my bdnog7 talk - https://www.slideshare.net/bdnog/ip-anycasting
• Once an outage occurs, we should expect a large amount of queries during and just after the outage
• A warning for those who have a security device in front of their nameservers
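The warning about security devices can be made concrete with a rough headroom check: if a device in front of the nameservers caps the query rate, how much of a post-outage surge would it drop? The sketch below assumes a simple hard rate limit and uses the roughly 7-times surge seen in our case; the baseline and limit values are made-up numbers.

```python
def dropped_fraction(
    baseline_qps: float,
    device_limit_qps: float,
    surge_factor: float = 7.0,
) -> float:
    """Fraction of surge traffic dropped by a hard rate limit (0.0 if none)."""
    surge_qps = baseline_qps * surge_factor
    if surge_qps <= device_limit_qps:
        return 0.0
    return (surge_qps - device_limit_qps) / surge_qps


# Hypothetical figures: 10,000 qps on a normal day, device limit of 50,000 qps.
dropped = dropped_fraction(baseline_qps=10_000, device_limit_qps=50_000)
# A limit of 80,000 qps would leave enough room for the same surge.
no_drop = dropped_fraction(baseline_qps=10_000, device_limit_qps=80_000)
```

Dropped queries are likely to be retried by clients, so a limit that is too tight could prolong the surge rather than dampen it.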
Users and DNS full-resolver
full-resolver (cache nameserver)
Many devices on a home network
Usual graph - 5min average
measurement date: 2018/05/16
A bit different view - 1sec average
measurement date: 2018/05/16
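Why the two views differ: a short burst almost disappears in a 5-minute average but stands out at 1-second resolution. A minimal sketch with synthetic numbers (the baseline, burst size and burst length are invented, not taken from the measurement):

```python
from typing import List


def window_averages(per_second: List[float], window: int) -> List[float]:
    """Average a per-second series over consecutive, non-overlapping windows."""
    return [
        sum(per_second[i:i + window]) / window
        for i in range(0, len(per_second) - window + 1, window)
    ]


# Ten minutes of synthetic traffic at 1,000 qps ...
series = [1000.0] * 600
# ... with a 5-second burst of 5,000 qps at the start of the second window.
for second in range(300, 305):
    series[second] = 5000.0

one_sec_peak = max(series)                 # what a 1-second graph shows
five_min = window_averages(series, 300)    # what a 5-minute graph shows
```

Here the 1-second peak is 5 times the baseline, while the 5-minute average containing the burst rises by less than 7 percent.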
Peaks on the hour
• Minor peaks on the half hour as well (like 07:30)
• An “alarm clock” wakes up the phone itself
• Some applications are also started by the wakeup
• I guess those queries mostly come from smartphones
• The QNAMEs also hint at this
A spike at 15sec before the hour
measurement date: 2018/05/16
We have something different now
measurement date: 2019/10/25
Summary
• It’s reasonable for us to provide 2 full-resolvers (caching nameservers) to customers
• Clients seem to be able to use a functional one
• Once an outage occurs, we should expect a large amount of queries during and just after the outage
• A warning to those who have a security device in front of their nameservers
• Clients are synced up unintentionally
• ‘Alarm clock’ or scheduled tasks
• This particular case is not an issue at this moment, but it’s worth paying attention to those behaviors