Cassandra-Powered Distributed DNS
 

Usage Rights: CC Attribution-ShareAlike License

    Presentation Transcript

    • Highly Available DNS and Request Routing Using Apache Cassandra
      • A Real-World Introduction to Cassandra's Data Structures + Python's pyCassa module
      • David Strauss / Founder + CTO / Pantheon Systems
    • Why another DNS server?
      • DNS servers either have no replication or require writing to a defined master server.
        • Exceptions (Active Directory and ApacheDS) require backing DNS with a heavyweight and annoying directory service like LDAP.
      • Critical DNS services need replication
        • ...to withstand DDoS attacks
        • ...to maintain uptime when major regional links fail
      • The zone file formats in use are awful.
      • Maintaining data persistence and replication should not be the DNS server's problem.
    • Why Cassandra?
      • Easy cluster setup and management
      • Built-in replication and high availability
      • Multi-master: writers don't need to understand the replication topology.
      • Data model similarity to DNS
      • Eventual consistency isn't a problem
      • Not a perfect match, though:
        • Write scalability is overkill
        • High memory requirements
    • Demo break
      • Let's set up a basic, three-node Cassandra cluster...
    • Creating the data model
      • Think in terms of a nested dictionary.
      • Design for eventual consistency.
        • Columns are the units (atoms) of replication.
        • Some columns may be replicated before others.
        • Column names are unique within each row or Super Column.
        • When possible, dissect objects into columns, keeping in mind that Cassandra may replicate those columns in any order.
      • Design for common read/write patterns.
        • The ability to arbitrarily query is limited.
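The point about columns replicating in any order can be illustrated with a toy model. This is a minimal sketch in plain Python, not Cassandra itself: the `apply_write` and `read_row` helpers are hypothetical, and it assumes each column write carries a timestamp and that the newest timestamp wins for that column.

```python
# A toy model of per-column, last-write-wins replication (not Cassandra itself).

def apply_write(replica, row_key, column, value, timestamp):
    """Apply one column write; keep it only if it is newer than what is stored."""
    row = replica.setdefault(row_key, {})
    current = row.get(column)
    if current is None or timestamp > current[1]:
        row[column] = (value, timestamp)


def read_row(replica, row_key):
    """Return the row as a plain {column: value} dictionary."""
    return {col: val for col, (val, _ts) in replica.get(row_key, {}).items()}


# Three column writes, each tagged with a writer-supplied timestamp.
writes = [
    ("test.example.com", "A", "192.168.0.1", 1),
    ("test.example.com", "MX", "mail.example.com", 2),
    ("test.example.com", "A", "192.168.0.2", 3),
]

replica_one, replica_two = {}, {}
for write in writes:            # arrives in the order written
    apply_write(replica_one, *write)
for write in reversed(writes):  # arrives in the opposite order
    apply_write(replica_two, *write)

# Both replicas converge on the same row despite the different arrival order.
print(read_row(replica_one, "test.example.com") == read_row(replica_two, "test.example.com"))  # True
```

Because each column is resolved on its own, a reader may briefly see a mix of old and new columns, which is why the design has to tolerate partial updates.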
    • My initial data model
      • I started with a normal Column Family:
      • names (Column Family)
        • Key: fully qualified domain name (FQDN)
        • Columns
          • Name: Record type (A, AAAA, MX, …)
          • Value: All data (addresses, TTL, etc.) as JSON
      • Efficient for lookups for a type or ANY
      • But: All records of one type must be replaced at once, because Cassandra keeps only the latest column written.
        • Can't rely on reading, modifying, then writing
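The read-modify-write problem with this first model can be sketched without a cluster. The example below is an illustration in plain Python under one assumption: two writers read the same JSON value for the "A" column before either writes back. The `add_address` helper is hypothetical and is not part of the talk's code.

```python
import json


def add_address(stored_json, address, ttl):
    """Read-modify-write: decode the whole blob, add one address, re-encode."""
    records = json.loads(stored_json)
    records[address] = {"ttl": ttl}
    return json.dumps(records, sort_keys=True)


# Both writers read the same starting value of the "A" column.
original = json.dumps({"192.168.0.1": {"ttl": 86400}})

write_one = add_address(original, "192.168.0.2", 86400)
write_two = add_address(original, "192.168.0.3", 86400)

# The column holds a single value, so the later write replaces the earlier one.
row = {"A": original}
row["A"] = write_one
row["A"] = write_two

surviving = sorted(json.loads(row["A"]))
print(surviving)  # ['192.168.0.1', '192.168.0.3']
```

The address added by the first writer (192.168.0.2) is silently lost, even though neither writer did anything wrong on its own.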
    • Data Model
      • Then, I dissected records into sub-columns:
      • names (Super Column Family)
        • Key: fully qualified domain name (FQDN)
        • Super Columns
          • Name: Record Type (A, AAAA, MX, …)
          • Sub-Columns
            • Name: Data (e.g. IP address)
            • Value: Metadata as JSON (TTL, preference)
      • Still efficient for lookups for a type or ANY
      • Using the data as the sub-column name means Cassandra keeps the latest metadata for each record.
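A rough sketch of why the dissected model behaves better: each record becomes its own sub-column, so independent additions do not overwrite each other, and rewriting the same data only refreshes its metadata. This is a plain-Python toy with a hypothetical `write_record` helper and explicit timestamps, not the pyCassa API.

```python
import json


def write_record(row, record_type, data, metadata, timestamp):
    """Write one sub-column; the newest timestamp wins for that sub-column only."""
    sub_columns = row.setdefault(record_type, {})
    current = sub_columns.get(data)
    if current is None or timestamp > current[1]:
        sub_columns[data] = (json.dumps(metadata), timestamp)


row = {}
# Two writers add different addresses; neither needs to read first.
write_record(row, "A", "192.168.0.2", {"ttl": 86400}, timestamp=1)
write_record(row, "A", "192.168.0.3", {"ttl": 86400}, timestamp=2)
# A later write for an existing address replaces only that address's metadata.
write_record(row, "A", "192.168.0.2", {"ttl": 300}, timestamp=3)

addresses = sorted(row["A"])
latest_ttl = json.loads(row["A"]["192.168.0.2"][0])["ttl"]
print(addresses, latest_ttl)  # ['192.168.0.2', '192.168.0.3'] 300
```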
    • Visualizing as a dictionary
          {
            "test.example.com": {                # Key
              "A": {                             # Super Column Name
                "192.168.0.1": {"ttl": 86400},   # Sub-Column Name: Sub-Column Value
                "192.168.0.2": {"ttl": 86400}
              },
              "MX": {                            # Super Column Name
                "mail.example.com": {"preference": 10, "ttl": 86400}
              }
            }
          }
      • The metadata is stored in Cassandra as a JSON-encoded sub-column value.
    • Structuring the application
      • cassandranames.py + CassandraNames
        • DNS-centric Python API wrapping Cassandra
      • cassandranames-import.py
        • Shell-based import tool for BIND files
      • cassandranames-test.py
        • Python unit test to exercise the persistence
      • cassandradns.py + CassandraNamesResolver
        • Twisted-based DNS server using CassandraNames
    • Want to follow along with code?
      • Setup directions: https://wiki.getpantheon.com/display/CONF/Cassandra+DNS+server+setup
      • Code on GitHub: https://github.com/pantheon-systems/cassandra-dns/
    • Demo break
      • Let's clone the code down to two boxes on our demo cluster and run the test suite...
    • Schema setup
          def install_schema(drop_first=False, rf=3):
              keyspace_name = "dns"
              sm = pycassa.system_manager.SystemManager("127.0.0.1:9160")
              [snip the drop_first implementation]
              sm.create_keyspace(keyspace_name, replication_factor=rf)
              sm.create_column_family(
                  keyspace_name, "names", super=True,
                  key_validation_class=pycassa.system_manager.UTF8_TYPE,
                  comparator_type=pycassa.system_manager.UTF8_TYPE,
                  default_validation_class=pycassa.system_manager.UTF8_TYPE)
    • The CassandraNames class
          class CassandraNames:
              def __init__(self):
                  self.pool = pycassa.connect("dns")

              [rest on upcoming slides]
    • Adding new records
          def insert(self, fqdn, type, data, ttl=900, preference=None):
              # Connect to the ColumnFamily
              cf = pycassa.ColumnFamily(self.pool, "names")
              # Start the metadata with just a TTL
              metadata = {"ttl": int(ttl)}
              # Add in a "preference" if requested.
              if preference is not None:
                  metadata["preference"] = int(preference)
              # Actually perform the insertion.
              cf.insert(fqdn, {str(type): {data: json.dumps(metadata)}})
    • Reading records
          def lookup(self, fqdn, type=ANY):
              cf = pycassa.ColumnFamily(self.pool, "names")
              try:
                  columns = {}
                  if type == ANY:
                      # Pull all types of records.
                      columns = dict(cf.get(fqdn))
                  else:
                      # Pull only one type of record.
                      columns = {str(type): dict(cf.get(fqdn, super_column=str(type)))}
                  # Convert the JSON metadata into valid Python data.
                  [snip]
                  return decoded_columns
              except pycassa.cassandra.ttypes.NotFoundException:
                  # If no records exist for the FQDN or type,
                  # fail gracefully.
                  pass
              return {}
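The slide snips the step that converts the JSON metadata back into Python data. One way that step could look is sketched below; `decode_columns` is a hypothetical helper written for illustration and is not the code that was elided from the talk. It assumes the input has the shape {record type: {data: JSON string}}.

```python
import json


def decode_columns(columns):
    """Decode {type: {data: json_string}} into {type: {data: dict}}."""
    decoded = {}
    for record_type, sub_columns in columns.items():
        decoded[record_type] = {
            data: json.loads(metadata) for data, metadata in sub_columns.items()
        }
    return decoded


# Example input shaped like a super column read for one FQDN.
raw = {"MX": {"mail.example.com": json.dumps({"preference": 10, "ttl": 86400})}}
decoded_columns = decode_columns(raw)
print(decoded_columns["MX"]["mail.example.com"]["preference"])  # 10
```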
    • Deleting records
          def remove(self, fqdn, type=ANY, data=None):
              cf = pycassa.ColumnFamily(self.pool, "names")
              if type == ANY:
                  # Delete all records for the FQDN.
                  cf.remove(fqdn)
              elif data is None:
                  # Delete all records of a certain type from the FQDN.
                  cf.remove(fqdn, super_column=str(type))
              else:
                  # Delete all records for a certain type and data.
                  cf.remove(fqdn, super_column=str(type), columns=[data])
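The three deletion granularities (one record, one record type, the whole name) can be tried out without a cluster. The class below is a dictionary-backed stand-in written for illustration only: `InMemoryNames` and the string sentinel `ANY` are assumptions, and unlike Cassandra it leaves an empty dictionary behind when the last record of a type is removed.

```python
ANY = "ANY"  # hypothetical sentinel; the talk's code defines its own ANY


class InMemoryNames:
    """Dictionary-backed stand-in mirroring the insert/remove call shapes."""

    def __init__(self):
        self.rows = {}

    def insert(self, fqdn, type, data, ttl=900):
        self.rows.setdefault(fqdn, {}).setdefault(str(type), {})[data] = {"ttl": int(ttl)}

    def remove(self, fqdn, type=ANY, data=None):
        if type == ANY:
            # Delete all records for the FQDN.
            self.rows.pop(fqdn, None)
        elif data is None:
            # Delete all records of one type.
            self.rows.get(fqdn, {}).pop(str(type), None)
        else:
            # Delete a single record identified by type and data.
            self.rows.get(fqdn, {}).get(str(type), {}).pop(data, None)


names = InMemoryNames()
names.insert("test.example.com", "A", "192.168.0.1")
names.insert("test.example.com", "A", "192.168.0.2")
names.insert("test.example.com", "MX", "mail.example.com")

names.remove("test.example.com", "A", "192.168.0.1")
after_one = sorted(names.rows["test.example.com"]["A"])  # ['192.168.0.2']
names.remove("test.example.com", "A")
after_type = sorted(names.rows["test.example.com"])      # ['MX']
names.remove("test.example.com")
after_all = dict(names.rows)                             # {}
```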
    • Making it actually serve DNS
          class CassandraNamesResolver(common.ResolverBase):
              implements(interfaces.IResolver)

              def __init__(self):
                  self.names = cassandranames.CassandraNames()
                  common.ResolverBase.__init__(self)

              def _lookup(self, name, cls, type, timeout):
                  log.msg("Type %s records for name: %s" % (type, name))
                  all_types = self.names.lookup(name, type)
                  results = []
                  authority = []
                  additional = []
                  [continued on next slide]
      • Python's Twisted includes a complete DNS server implementation with a pluggable resolver base (IResolver and common.ResolverBase).
    • Making it actually serve DNS
          def _lookup(self, name, cls, type, timeout):
              [function started on previous slide]
              for type, records in all_types.items():
                  for data, metadata in records.items():
                      if type == A:
                          payload = dns.Record_A(data)
                      elif type == MX:
                          payload = dns.Record_MX(metadata["preference"], data)
                      elif type == NS:
                          payload = dns.Record_NS(data)
                      else:
                          # Other record types are not implemented yet; skip them
                          # rather than reuse a payload from an earlier record.
                          continue
                      header = dns.RRHeader(name, type=type, payload=payload,
                                            ttl=metadata["ttl"], auth=True)
                      results.append(header)
              return defer.succeed((results, authority, additional))
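The loop that turns looked-up records into answers can be exercised without Twisted. The sketch below is a standard-library approximation: `Answer` is a hypothetical stand-in for Twisted's `dns.RRHeader`, record types are plain strings rather than Twisted's constants, and record types outside A, MX, and NS are skipped.

```python
from collections import namedtuple

# Hypothetical stand-in for a DNS answer; the talk's code uses Twisted's dns.RRHeader.
Answer = namedtuple("Answer", ["name", "type", "ttl", "payload"])

SUPPORTED_TYPES = {"A", "MX", "NS"}


def build_answers(name, all_types):
    """Flatten {type: {data: metadata}} into a list of Answer tuples."""
    answers = []
    for record_type, records in sorted(all_types.items()):
        if record_type not in SUPPORTED_TYPES:
            continue  # unsupported types produce no answer
        for data, metadata in sorted(records.items()):
            if record_type == "MX":
                # MX answers carry the preference alongside the mail host.
                payload = (metadata["preference"], data)
            else:
                payload = data
            answers.append(Answer(name, record_type, metadata["ttl"], payload))
    return answers


looked_up = {
    "A": {"192.168.0.1": {"ttl": 86400}},
    "MX": {"mail.example.com": {"preference": 10, "ttl": 86400}},
    "TXT": {"v=spf1 -all": {"ttl": 300}},
}
answers = build_answers("test.example.com", looked_up)
print([(a.type, a.payload) for a in answers])
# [('A', '192.168.0.1'), ('MX', (10, 'mail.example.com'))]
```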
    • Demo break
      • Let's actually play with the cluster:
        • Query the records left around by the test suite
        • Use the Python shell to manage records
        • Import a BIND zone file on one server
        • Query the imported records on a different server
    • Next steps
      • Properly firewall the cluster
        • Cassandra needs port 7000 for replication with other cluster servers.
        • Port 53 needs to be open for DNS requests.
      • Accelerate DNS by fronting each server with a djbdns cache
      • Finish the CNAME implementation (and other record types)
      • Consider a non-blocking library, like txCQL
      • GeoDNS using a Python GeoIP library
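After firewalling, it may help to confirm that the ports named above are still reachable from where they should be. The sketch below is a simple TCP reachability check using only the standard library; the `tcp_port_open` helper and the 127.0.0.1 address are placeholders. Most DNS traffic on port 53 is UDP, so a TCP check of that port is only a partial signal.

```python
import socket


def tcp_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical peer address; replace with another node in the cluster.
    for port in (7000, 53):
        print(port, tcp_port_open("127.0.0.1", port))
```

Running it from another cluster node should report port 7000 as open, while running it from outside the cluster should not.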
    • Conclusion
      • Questions?
      • Questions for later?
        • I'm David Strauss (@davidstrauss)
      • Setup directions: https://wiki.getpantheon.com/display/CONF/Cassandra+DNS+server+setup
      • Code on GitHub: https://github.com/pantheon-systems/cassandra-dns/
      • Pantheon Systems is hiring engineers and developers in the San Francisco Bay Area