Cassandra-Powered Distributed DNS
Presentation Transcript

  • Highly Available DNS and Request Routing Using Apache Cassandra
    • A Real-World Introduction to Cassandra's Data Structures + Python's pyCassa module
    • David Strauss / Founder + CTO / Pantheon Systems
  • Why another DNS server?
    • DNS servers either have no replication or require writing to a defined master server.
      • Exceptions (Active Directory and ApacheDS) require backing DNS with a heavyweight and annoying directory service like LDAP.
    • Critical DNS services need replication
      • ...to withstand DDoS attacks
      • ...to maintain uptime when major regional links fail
    • The zone file formats in use are awful.
    • Maintaining data persistence and replication should not be the DNS server's problem.
  • Why Cassandra?
    • Easy cluster setup and management
    • Built-in replication and high availability
    • Multi-master: writers don't need to understand the replication topology.
    • Data model similarity to DNS
    • Eventual consistency isn't a problem
    • Not a perfect match, though:
      • Write scalability is overkill
      • High memory requirements
  • Demo break
    • Let's set up a basic, three-node Cassandra cluster...
  • Creating the data model
    • Think in terms of a nested dictionary.
    • Design for eventual consistency.
      • Columns are the units (atoms) of replication.
      • Some columns may be replicated before others.
      • Column names are unique within each row or Super Column.
      • When possible, dissect objects into columns, keeping in mind that Cassandra may replicate those columns in any order.
    • Design for common read/write patterns.
      • The ability to arbitrarily query is limited.
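The "columns are the atoms of replication" idea can be sketched without a cluster. The snippet below is a simplified model, not Cassandra's actual reconciliation code: the `merge_replicas` helper and the integer timestamps are assumptions made for illustration. It shows two replicas that have each seen different column updates converging column by column, with the newer write winning for each column independently.

```python
def merge_replicas(a, b):
    """Merge two replica views of one row, column by column.

    Each replica maps column name -> (timestamp, value). For every column,
    the copy with the newer timestamp wins, independently of other columns.
    """
    merged = dict(a)
    for name, (ts, value) in b.items():
        if name not in merged or ts > merged[name][0]:
            merged[name] = (ts, value)
    return merged


# Replica 1 has the newer "A" column; replica 2 has the newer "MX" column.
replica_1 = {"A": (20, "192.168.0.2"), "MX": (10, "mail1.example.com")}
replica_2 = {"A": (10, "192.168.0.1"), "MX": (30, "mail2.example.com")}

row = merge_replicas(replica_1, replica_2)
print(row)
```

Because each column is reconciled on its own, a reader may briefly see the new "A" next to the old "MX". That is why objects split across columns need to stay valid in any partial state.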
  • My initial data model
    • I started with a normal Column Family:
    • names (Column Family)
      • Key: fully qualified domain name (FQDN)
      • Columns
        • Name: Record type (A, AAAA, MX, …)
        • Value: All data (addresses, TTL, etc.) as JSON
    • Efficient for lookups for a type or ANY
    • But: All records of one type must be replaced at once, because Cassandra keeps only the latest column written.
      • Can't rely on reading, modifying, then writing
  • Data Model
    • Then, I dissected records into sub-columns:
    • names (Super Column Family)
      • Key: fully qualified domain name (FQDN)
      • Super Columns
        • Name: Record Type (A, AAAA, MX, …)
        • Sub-Columns
          • Name: Data (e.g. IP address)
          • Value: Metadata as JSON (TTL, preference)
    • Still efficient for lookups for a type or ANY
    • Using data as sub-column name results in keeping the latest metadata for any record.
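The difference between the two models can be illustrated with plain dictionaries. This is a rough simulation under stated assumptions: `write_blob_model` and `write_subcolumn_model` are hypothetical helpers, and "latest write wins" is reduced to comparing integer timestamps. Two writers each add one A record without seeing the other's write.

```python
import json


def write_blob_model(row, rtype, addresses, ts):
    """First model: one column per record type, holding all data as JSON."""
    current = row.get(rtype)
    if current is None or ts > current[0]:
        row[rtype] = (ts, json.dumps(sorted(addresses)))


def write_subcolumn_model(row, rtype, address, metadata, ts):
    """Second model: one sub-column per record, named by the record's data."""
    subcolumns = row.setdefault(rtype, {})
    current = subcolumns.get(address)
    if current is None or ts > current[0]:
        subcolumns[address] = (ts, json.dumps(metadata))


# Blob model: the second writer's column replaces the first writer's.
blob_row = {}
write_blob_model(blob_row, "A", ["192.168.0.1"], ts=1)
write_blob_model(blob_row, "A", ["192.168.0.2"], ts=2)
blob_addresses = json.loads(blob_row["A"][1])

# Sub-column model: each record lives in its own sub-column, so both survive.
sub_row = {}
write_subcolumn_model(sub_row, "A", "192.168.0.1", {"ttl": 86400}, ts=1)
write_subcolumn_model(sub_row, "A", "192.168.0.2", {"ttl": 86400}, ts=2)
sub_addresses = sorted(sub_row["A"])

print(blob_addresses)
print(sub_addresses)
```

Rewriting an existing address in the second model only touches that one sub-column, which is how the latest metadata for a record is kept.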
  • Visualizing as a dictionary
      {
          "test.example.com": {                    # Key
              "A": {                               # Super Column Name
                  "192.168.0.1": {"ttl": 86400},   # Sub-Column Name: Sub-Column Value
                  "192.168.0.2": {"ttl": 86400},
              },
              "MX": {                              # Super Column Name
                  "mail.example.com": {"preference": 10, "ttl": 86400},
              },
          }
      }
    • Each metadata dictionary is stored in Cassandra as a JSON-encoded sub-column value.
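To make the JSON-encoded sub-column values concrete, here is a small round-trip sketch. The `encode_row` and `decode_row` helpers are hypothetical names introduced for illustration and are not taken from the talk's code. They convert between the nested dictionary for one FQDN and the string values that would sit in the sub-columns.

```python
import json


def encode_row(records):
    """Turn {type: {data: metadata dict}} into {type: {data: JSON string}}."""
    return {
        rtype: {data: json.dumps(meta, sort_keys=True) for data, meta in subcols.items()}
        for rtype, subcols in records.items()
    }


def decode_row(stored):
    """Inverse of encode_row: parse each sub-column value back into a dict."""
    return {
        rtype: {data: json.loads(raw) for data, raw in subcols.items()}
        for rtype, subcols in stored.items()
    }


records = {
    "A": {"192.168.0.1": {"ttl": 86400}, "192.168.0.2": {"ttl": 86400}},
    "MX": {"mail.example.com": {"preference": 10, "ttl": 86400}},
}

stored = encode_row(records)
print(stored["MX"]["mail.example.com"])
```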
  • Structuring the application
    • cassandranames.py + CassandraNames
      • DNS-centric Python API wrapping Cassandra
    • cassandranames-import.py
      • Shell-based import tool for BIND files
    • cassandranames-test.py
      • Python unit test to exercise the persistence
    • cassandradns.py + CassandraNamesResolver
      • Twisted-based DNS server using CassandraNames
  • Want to follow along with code?
    • Setup directions: https://wiki.getpantheon.com/display/CONF/Cassandra+DNS+server+setup
    • Code on GitHub: https://github.com/pantheon-systems/cassandra-dns/
  • Demo break
    • Let's clone the code down to two boxes on our demo cluster and run the test suite...
  • Schema setup
      def install_schema(drop_first=False, rf=3):
          keyspace_name = "dns"
          sm = pycassa.system_manager.SystemManager("127.0.0.1:9160")
          [snip the drop_first implementation]
          sm.create_keyspace(keyspace_name, replication_factor=rf)
          sm.create_column_family(
              keyspace_name, "names", super=True,
              key_validation_class=pycassa.system_manager.UTF8_TYPE,
              comparator_type=pycassa.system_manager.UTF8_TYPE,
              default_validation_class=pycassa.system_manager.UTF8_TYPE)
  • The CassandraNames class
      class CassandraNames:
          def __init__(self):
              self.pool = pycassa.connect("dns")
          [rest on upcoming slides]
  • Adding new records
      def insert(self, fqdn, type, data, ttl=900, preference=None):
          # Connect to the ColumnFamily
          cf = pycassa.ColumnFamily(self.pool, "names")
          # Start the metadata with just a TTL
          metadata = {"ttl": int(ttl)}
          # Add in a "preference" if requested.
          if preference is not None:
              metadata["preference"] = int(preference)
          # Actually perform the insertion.
          cf.insert(fqdn, {str(type): {data: json.dumps(metadata)}})
  • Reading records
      def lookup(self, fqdn, type=ANY):
          cf = pycassa.ColumnFamily(self.pool, "names")
          try:
              columns = {}
              if type == ANY:
                  # Pull all types of records.
                  columns = dict(cf.get(fqdn))
              else:
                  # Pull only one type of record.
                  columns = {str(type): dict(cf.get(fqdn, super_column=str(type)))}
              # Convert the JSON metadata into valid Python data.
              [snip]
              return decoded_columns
          except pycassa.cassandra.ttypes.NotFoundException:
              # If no records exist for the FQDN or type,
              # fail gracefully.
              pass
          return {}
  • Deleting records
      def remove(self, fqdn, type=ANY, data=None):
          cf = pycassa.ColumnFamily(self.pool, "names")
          if type == ANY:
              # Delete all records for the FQDN.
              cf.remove(fqdn)
          elif data is None:
              # Delete all records of a certain type from the FQDN.
              cf.remove(fqdn, super_column=str(type))
          else:
              # Delete all records for a certain type and data.
              cf.remove(fqdn, super_column=str(type), columns=[data])
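For experimenting with the insert/lookup/remove semantics without a running cluster, a dictionary-backed stand-in can be handy. The sketch below is not part of the talk's code: `InMemoryNames` is a hypothetical class that mirrors the method shapes shown on the slides, and it uses plain strings for record types and for `ANY` instead of the constants the real code compares against.

```python
import json

ANY = "ANY"  # assumption: string stand-in for the ANY constant on the slides


class InMemoryNames:
    """Dictionary-backed stand-in with the same method shapes as CassandraNames."""

    def __init__(self):
        self.rows = {}  # fqdn -> {type: {data: JSON-encoded metadata}}

    def insert(self, fqdn, type, data, ttl=900, preference=None):
        metadata = {"ttl": int(ttl)}
        if preference is not None:
            metadata["preference"] = int(preference)
        row = self.rows.setdefault(fqdn, {})
        row.setdefault(str(type), {})[data] = json.dumps(metadata)

    def lookup(self, fqdn, type=ANY):
        row = self.rows.get(fqdn, {})
        wanted = row if type == ANY else {str(type): row.get(str(type), {})}
        # Skip types with no remaining records and decode the JSON metadata.
        return {
            rtype: {data: json.loads(meta) for data, meta in subcols.items()}
            for rtype, subcols in wanted.items()
            if subcols
        }

    def remove(self, fqdn, type=ANY, data=None):
        if type == ANY:
            self.rows.pop(fqdn, None)
        elif data is None:
            self.rows.get(fqdn, {}).pop(str(type), None)
        else:
            self.rows.get(fqdn, {}).get(str(type), {}).pop(data, None)


names = InMemoryNames()
names.insert("test.example.com", "A", "192.168.0.1", ttl=86400)
names.insert("test.example.com", "A", "192.168.0.2", ttl=86400)
names.insert("test.example.com", "MX", "mail.example.com", ttl=86400, preference=10)
names.remove("test.example.com", "A", "192.168.0.1")
result = names.lookup("test.example.com")
print(result)
```

A stand-in like this says nothing about replication or consistency. It only exercises the nested-dictionary shape of the API.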
  • Making it actually serve DNS
      class CassandraNamesResolver(common.ResolverBase):
          implements(interfaces.IResolver)

          def __init__(self):
              self.names = cassandranames.CassandraNames()
              common.ResolverBase.__init__(self)

          def _lookup(self, name, cls, type, timeout):
              log.msg("Type %s records for name: %s" % (type, name))
              all_types = self.names.lookup(name, type)
              results = []
              authority = []
              additional = []
              [continued on next slide]
    • Python's Twisted includes a complete DNS server implementation with a pluggable resolver base (IResolver and common.ResolverBase).
  • Making it actually serve DNS
      def _lookup(self, name, cls, type, timeout):
          [function started on previous slide]
          for type, records in all_types.items():
              for data, metadata in records.items():
                  if type == A:
                      payload = dns.Record_A(data)
                  elif type == MX:
                      payload = dns.Record_MX(metadata["preference"], data)
                  elif type == NS:
                      payload = dns.Record_NS(data)
                  else:
                      # Skip record types that are not implemented yet,
                      # so payload is never undefined or stale.
                      continue
                  header = dns.RRHeader(name, type=type, payload=payload,
                                        ttl=metadata["ttl"], auth=True)
                  results.append(header)
          return defer.succeed((results, authority, additional))
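One possible way to keep the type-to-payload mapping manageable as more record types are added is a dispatch table. The sketch below runs without Twisted: the `RecordA`, `RecordNS`, `RecordMX`, and `Answer` namedtuples are hypothetical stand-ins for the Twisted record and header classes, and record types are plain strings. It is an alternative arrangement for illustration, not the resolver's actual code.

```python
from collections import namedtuple

# Hypothetical stand-ins for Twisted's record and header classes.
RecordA = namedtuple("RecordA", "address")
RecordNS = namedtuple("RecordNS", "name")
RecordMX = namedtuple("RecordMX", "preference name")
Answer = namedtuple("Answer", "name type ttl payload")

# One builder per supported record type; adding a type means adding an entry.
PAYLOAD_BUILDERS = {
    "A": lambda data, meta: RecordA(data),
    "NS": lambda data, meta: RecordNS(data),
    "MX": lambda data, meta: RecordMX(meta["preference"], data),
}


def build_answers(name, all_types):
    """Convert {type: {data: metadata}} into a list of Answer tuples."""
    answers = []
    for rtype, records in sorted(all_types.items()):
        builder = PAYLOAD_BUILDERS.get(rtype)
        if builder is None:
            continue  # unsupported record type: skip rather than guess
        for data, meta in sorted(records.items()):
            answers.append(Answer(name, rtype, meta["ttl"], builder(data, meta)))
    return answers


looked_up = {
    "A": {"192.168.0.1": {"ttl": 86400}},
    "MX": {"mail.example.com": {"preference": 10, "ttl": 86400}},
    "TXT": {"some text": {"ttl": 60}},
}
answers = build_answers("test.example.com", looked_up)
print(answers)
```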
  • Demo break
    • Let's actually play with the cluster:
      • Query the records left around by the test suite
      • Use the Python shell to manage records
      • Import a BIND zone file on one server
      • Query the imported records on a different server
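The slides do not show how the import tool parses BIND files. As a rough idea of the shape of that step, the sketch below handles only a simplified subset: one record per line, written as `<fqdn> <ttl> IN <type> <rdata>`, with numeric TTLs and fully qualified names. The `parse_zone_lines` function is hypothetical, and real zone files allow much more (relative names, `$ORIGIN`, multi-line records) that this ignores.

```python
def parse_zone_lines(text):
    """Yield (fqdn, type, data, ttl, preference) tuples from simplified zone text.

    Lines outside the supported subset, comments (after ';'), directives
    (starting with '$'), and blank lines are skipped.
    """
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()
        if not line or line.startswith("$"):
            continue
        fields = line.split()
        if len(fields) < 5 or not fields[1].isdigit() or fields[2].upper() != "IN":
            continue
        fqdn, ttl, rtype = fields[0], int(fields[1]), fields[3].upper()
        rdata = fields[4:]
        preference = None
        if rtype == "MX" and len(rdata) == 2 and rdata[0].isdigit():
            preference, rdata = int(rdata[0]), rdata[1:]
        # Trailing dots are dropped to match the un-dotted names on the slides.
        yield (fqdn.rstrip("."), rtype, " ".join(rdata).rstrip("."), ttl, preference)


ZONE = """
; simplified sample, not a complete zone file
$TTL 86400
test.example.com. 86400 IN A 192.168.0.1
test.example.com. 86400 IN A 192.168.0.2   ; second address
test.example.com. 86400 IN MX 10 mail.example.com.
"""

records = list(parse_zone_lines(ZONE))
for record in records:
    print(record)
```

Each tuple lines up with the `insert(fqdn, type, data, ttl, preference)` signature shown earlier, so an importer could pass them straight through.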
  • Next steps
    • Properly firewall the cluster
      • Cassandra needs port 7000 for replication with other cluster servers.
      • Port 53 needs to be open for DNS requests.
    • Accelerate DNS by fronting each server with a djbdns cache
    • Finish the CNAME implementation (and other record types)
    • Consider a non-blocking library, like txCQL
    • GeoDNS using a Python GeoIP library
  • Conclusion
    • Questions?
    • Questions for later?
      • I'm David Strauss (@davidstrauss)
    • Setup directions: https://wiki.getpantheon.com/display/CONF/Cassandra+DNS+server+setup
    • Code on GitHub: https://github.com/pantheon-systems/cassandra-dns/
    • Pantheon Systems is hiring engineers and developers in the San Francisco Bay Area