Saving the Elephant with Slonik


Railsberry 2013 presentation about how we're saving the planet at WCMC using PostgreSQL :)

  1. Saving the Elephant with Slonik. Agnieszka Figiel (@agnessa480), UNEP-WCMC, Railsberry 2013
  2. Taxon concepts and ranks (diagram labels: taxon concepts, ranks)
  3. A brief history of gorilla classification
     Author & Year                   Scientific name
     Savage 1847                     Troglodytes gorilla (Pan gorilla)
     I. Geoffroy St. Hilaire 1852    Gorilla gorilla
     Tuttle 1967                     Pan gorilla
     Groves 1967                     Gorilla gorilla gorilla
     (diagram labels: homonym, synonym, split / merge)
  4. A matter of opinion
     Taxonomy A: Loxodonta africana
     Taxonomy B: Loxodonta africana, Loxodonta cyclotis
  5. #1: CTEs
     WITH name [ (columns) ] AS (attached query)
     primary query
  6. WITH endemic_taxon_concepts AS (
       SELECT taxon_concept_id
       FROM distributions
       GROUP BY taxon_concept_id
       HAVING COUNT(*) = 1
     ), countries_with_endemic_distributions AS (
       SELECT d.geo_entity_id, COUNT(d.taxon_concept_id) AS cnt
       FROM distributions d
       INNER JOIN endemic_taxon_concepts q
         ON d.taxon_concept_id = q.taxon_concept_id
       GROUP BY d.geo_entity_id
     )
     SELECT geo_entities.name_en, cnt
     FROM countries_with_endemic_distributions q
     INNER JOIN geo_entities ON geo_entities.id = q.geo_entity_id
     ORDER BY cnt DESC
  7. name                          cnt
     Indonesia                     1353
     Mexico                        1069
     Madagascar                     970
     Australia                      886
     Brazil                         763
     Ecuador                        564
     Papua New Guinea               561
     South Africa                   532
     United States of America       520
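A minimal sketch of the same two-CTE query, run with Python's built-in sqlite3 module instead of PostgreSQL. SQLite is an assumption made only so the example is self-contained; the query itself uses nothing PostgreSQL-specific. The two-table schema, the function name endemic_counts and the rows are toy data invented for illustration, not the dataset behind the results above.

```python
import sqlite3

# Minimal toy schema: only the columns the slide's query touches.
SCHEMA = """
CREATE TABLE geo_entities (id INTEGER PRIMARY KEY, name_en TEXT);
CREATE TABLE distributions (taxon_concept_id INTEGER, geo_entity_id INTEGER);
"""

ENDEMIC_COUNTS = """
WITH endemic_taxon_concepts AS (
  -- taxa recorded in exactly one geo entity
  SELECT taxon_concept_id
  FROM distributions
  GROUP BY taxon_concept_id
  HAVING COUNT(*) = 1
), countries_with_endemic_distributions AS (
  -- number of endemic taxa per geo entity
  SELECT d.geo_entity_id, COUNT(d.taxon_concept_id) AS cnt
  FROM distributions d
  INNER JOIN endemic_taxon_concepts q
    ON d.taxon_concept_id = q.taxon_concept_id
  GROUP BY d.geo_entity_id
)
SELECT geo_entities.name_en, cnt
FROM countries_with_endemic_distributions q
INNER JOIN geo_entities ON geo_entities.id = q.geo_entity_id
ORDER BY cnt DESC
"""


def endemic_counts(conn):
    return conn.execute(ENDEMIC_COUNTS).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO geo_entities VALUES (?, ?)",
                 [(1, "Madagascar"), (2, "Mexico"), (3, "Indonesia")])
# Made-up taxon ids: 10-12 occur only in Madagascar, 20 only in Mexico,
# 30 in both Mexico and Indonesia (so it is endemic nowhere).
conn.executemany("INSERT INTO distributions VALUES (?, ?)",
                 [(10, 1), (11, 1), (12, 1), (20, 2), (30, 2), (30, 3)])
print(endemic_counts(conn))  # [('Madagascar', 3), ('Mexico', 1)]
```

Each CTE reads like a named intermediate result, so the primary query only has to join the final counts to country names. Indonesia drops out of the toy result because its only taxon also occurs in Mexico.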
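  8. Data-modifying CTEs
     WITH deactivated_geo_entities AS (
       UPDATE geo_entities SET is_active = FALSE
       WHERE id IN (#{old_geo_entity_ids})
       RETURNING id
     )
     UPDATE distributions
     SET geo_entity_id = #{new_geo_entity_id}
     FROM deactivated_geo_entities
     WHERE distributions.geo_entity_id = deactivated_geo_entities.id
     CTE = materialized by design (true of every CTE before PostgreSQL 12; from 12 on, side-effect-free CTEs may be inlined unless declared MATERIALIZED, but a data-modifying CTE is always executed exactly once)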
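SQLite, used in these sketches as a self-contained stand-in for PostgreSQL, does not allow an UPDATE inside a WITH clause. This sketch therefore swaps the data-modifying CTE for a different technique: two UPDATE statements inside one transaction. It performs the same merge (deactivate the old geo entities, repoint their distributions to the new one) but is not the slide's single-statement form. The helper name merge_geo_entities and the toy rows are hypothetical. Binding the ids as parameters, rather than interpolating them into the SQL string the way the slide's #{...} placeholders do, also avoids SQL injection if the ids could ever come from user input.

```python
import sqlite3


def merge_geo_entities(conn, old_ids, new_id):
    """Hypothetical helper: deactivate old_ids and move their distributions to new_id.

    Stand-in for the slide's data-modifying CTE: two UPDATEs in one transaction.
    """
    old_ids = list(old_ids)
    placeholders = ",".join("?" for _ in old_ids)
    with conn:  # commits on success, rolls back if either UPDATE fails
        conn.execute(
            f"UPDATE geo_entities SET is_active = 0 WHERE id IN ({placeholders})",
            old_ids,
        )
        conn.execute(
            "UPDATE distributions SET geo_entity_id = ? "
            f"WHERE geo_entity_id IN ({placeholders})",
            [new_id, *old_ids],
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE geo_entities (id INTEGER PRIMARY KEY, name_en TEXT,
                           is_active INTEGER NOT NULL DEFAULT 1);
CREATE TABLE distributions (taxon_concept_id INTEGER, geo_entity_id INTEGER);
INSERT INTO geo_entities (id, name_en) VALUES (1, 'Old A'), (2, 'Old B'), (3, 'New C');
INSERT INTO distributions VALUES (100, 1), (101, 2), (102, 3);
""")
merge_geo_entities(conn, [1, 2], 3)
```

The PostgreSQL version on the slide has the advantage of being one statement whose second half only touches rows the first half actually deactivated (via RETURNING); the transaction here gives the same all-or-nothing behaviour.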
  9. #2: Recursive CTEs
     WITH RECURSIVE name [ (columns) ] AS (
       non-recursive term
       UNION [ALL]
       recursive term
     )
     primary query
  10. WITH RECURSIVE self_and_descendants (id, full_name) AS (
        SELECT id, full_name FROM taxon_concepts
        WHERE id = 472
        UNION
        SELECT hi.id, hi.full_name FROM taxon_concepts hi
        JOIN self_and_descendants d ON d.id = hi.parent_id
      )
      SELECT COUNT(*) FROM self_and_descendants
      Result:
      count
      432
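A minimal sketch of the descendants query against a toy hierarchy, again using Python's sqlite3 as a self-contained stand-in for PostgreSQL. The taxon ids, the function name count_self_and_descendants and the :root parameter are made up for illustration; the CTE body follows the slide.

```python
import sqlite3

SELF_AND_DESCENDANTS = """
WITH RECURSIVE self_and_descendants (id, full_name) AS (
  -- non-recursive term: the starting taxon
  SELECT id, full_name FROM taxon_concepts
  WHERE id = :root
  UNION
  -- recursive term: children of every row found so far
  SELECT hi.id, hi.full_name FROM taxon_concepts hi
  JOIN self_and_descendants d ON d.id = hi.parent_id
)
SELECT COUNT(*) FROM self_and_descendants
"""


def count_self_and_descendants(conn, root_id):
    return conn.execute(SELF_AND_DESCENDANTS, {"root": root_id}).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxon_concepts "
             "(id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT)")
# Toy hierarchy with made-up ids.
conn.executemany("INSERT INTO taxon_concepts VALUES (?, ?, ?)", [
    (1, None, "Felidae"),
    (2, 1, "Panthera"),
    (3, 2, "Panthera leo"),
    (4, 2, "Panthera tigris"),
    (5, 1, "Lynx"),
    (6, 5, "Lynx lynx"),
])
print(count_self_and_descendants(conn, 1))  # 6 (Felidae plus five descendants)
```

The recursion stops on its own: once an iteration finds no new children, the recursive term returns no rows and the CTE is complete.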
  11. WITH RECURSIVE self_and_ancestors (parent_id, full_name, level) AS (
        SELECT parent_id, full_name, 1
        FROM taxon_concepts WHERE id = 5563
        UNION
        SELECT hi.parent_id, hi.full_name, q.level + 1
        FROM taxon_concepts hi
        JOIN self_and_ancestors q ON hi.id = q.parent_id
      )
      SELECT full_name
      FROM self_and_ancestors ORDER BY level DESC
  12. WITH crocodile_ancestry AS (
        WITH RECURSIVE self_and_ancestors (
          -- [AS IN PREVIOUS SLIDE]
        )
      )
      SELECT ARRAY_TO_STRING(ARRAY_AGG(full_name), ' > ') AS breadcrumb
      FROM crocodile_ancestry
      Result:
      breadcrumb
      Animalia > Chordata > Reptilia > Crocodylia > Crocodylidae > Crocodylus > Crocodylus niloticus
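A minimal sketch of the ancestors query plus breadcrumb, using Python's sqlite3 as a stand-in for PostgreSQL. SQLite has no ARRAY_AGG or ARRAY_TO_STRING, so the sketch swaps the string building into Python: it reads the rows in ORDER BY level DESC order and joins them. The lineage is the one shown on the slide; the ids and the function name breadcrumb are made up.

```python
import sqlite3

SELF_AND_ANCESTORS = """
WITH RECURSIVE self_and_ancestors (parent_id, full_name, level) AS (
  SELECT parent_id, full_name, 1
  FROM taxon_concepts WHERE id = ?
  UNION
  -- climb one level: the row whose id is the current row's parent
  SELECT hi.parent_id, hi.full_name, q.level + 1
  FROM taxon_concepts hi
  JOIN self_and_ancestors q ON hi.id = q.parent_id
)
SELECT full_name FROM self_and_ancestors ORDER BY level DESC
"""


def breadcrumb(conn, taxon_id, sep=" > "):
    # Highest level = root, so ORDER BY level DESC yields root-first order.
    return sep.join(name for (name,) in conn.execute(SELF_AND_ANCESTORS, (taxon_id,)))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxon_concepts "
             "(id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT)")
lineage = ["Animalia", "Chordata", "Reptilia", "Crocodylia",
           "Crocodylidae", "Crocodylus", "Crocodylus niloticus"]
# Made-up ids 1..7; each taxon's parent is the previous one (root has none).
for i, name in enumerate(lineage, start=1):
    conn.execute("INSERT INTO taxon_concepts VALUES (?, ?, ?)", (i, i - 1 or None, name))

print(breadcrumb(conn, 7))
# Animalia > Chordata > Reptilia > Crocodylia > Crocodylidae > Crocodylus > Crocodylus niloticus
```

In PostgreSQL itself, writing ARRAY_AGG(full_name ORDER BY level DESC) makes the aggregation order explicit instead of relying on the order of the inner query.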
  13. Cascade with exceptions
  14. WITH RECURSIVE cascading_refs (taxon_concept_id, exclusions) AS (
        SELECT h.id, h_refs.excluded_taxon_concepts_ids
        FROM taxon_concepts h
        LEFT JOIN taxon_concept_references h_refs
          ON h_refs.taxon_concept_id = h.id
        WHERE h.id = 10 AND h_refs.reference_id = 369
        UNION
        SELECT hi.id, cascading_refs.exclusions
        FROM taxon_concepts hi
        JOIN cascading_refs ON cascading_refs.taxon_concept_id = hi.parent_id
        WHERE NOT COALESCE(cascading_refs.exclusions, ARRAY[]::INT[]) @> ARRAY[hi.id]
      )
      UPDATE taxon_concepts SET has_std_ref = TRUE
      FROM cascading_refs
      WHERE cascading_refs.taxon_concept_id = taxon_concepts.id
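A minimal sketch of the cascade-with-exceptions idea, using Python's sqlite3 as a stand-in for PostgreSQL. SQLite has no array type or @> operator, so the exclusions move into a hypothetical excluded_taxon_concepts table, and the per-reference filtering (reference_id) is left out to keep the example small. The shape is the same as on the slide: a recursive CTE walks down the tree, refuses to step into excluded taxa, and its result drives an UPDATE. Ids, names and the function name cascade_std_ref are made up.

```python
import sqlite3

CASCADE_STD_REF = """
WITH RECURSIVE cascading_refs (taxon_concept_id) AS (
  SELECT id FROM taxon_concepts WHERE id = :root
  UNION
  SELECT hi.id
  FROM taxon_concepts hi
  JOIN cascading_refs ON cascading_refs.taxon_concept_id = hi.parent_id
  -- an excluded taxon is never added, so the walk never reaches its subtree
  WHERE hi.id NOT IN (SELECT taxon_concept_id FROM excluded_taxon_concepts)
)
UPDATE taxon_concepts SET has_std_ref = 1
WHERE id IN (SELECT taxon_concept_id FROM cascading_refs)
"""


def cascade_std_ref(conn, root_id):
    with conn:  # run the UPDATE in its own transaction
        conn.execute(CASCADE_STD_REF, {"root": root_id})


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon_concepts (id INTEGER PRIMARY KEY, parent_id INTEGER,
                             full_name TEXT, has_std_ref INTEGER NOT NULL DEFAULT 0);
-- hypothetical stand-in for the slide's excluded_taxon_concepts_ids array
CREATE TABLE excluded_taxon_concepts (taxon_concept_id INTEGER);
INSERT INTO taxon_concepts (id, parent_id, full_name) VALUES
  (1, NULL, 'Felidae'), (2, 1, 'Panthera'), (3, 2, 'Panthera leo'),
  (4, 2, 'Panthera tigris'), (5, 1, 'Lynx'), (6, 5, 'Lynx lynx');
INSERT INTO excluded_taxon_concepts VALUES (5);
""")
cascade_std_ref(conn, 1)
flagged = [row[0] for row in conn.execute(
    "SELECT id FROM taxon_concepts WHERE has_std_ref = 1 ORDER BY id")]
print(flagged)  # [1, 2, 3, 4]
```

Excluding Lynx (id 5) also shields Lynx lynx (id 6), because the recursion can only reach a taxon through its parent.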
  15. #3: Window functions
      SELECT ROW_NUMBER() OVER (ORDER BY full_name), full_name
      FROM taxon_concepts
      WHERE parent_id = 335 ORDER BY full_name
      Result:
      row_number  full_name
      1           Canis
      2           Cerdocyon
      3           Chrysocyon
      4           Cuon
      5           Dusicyon
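A minimal sketch of the ROW_NUMBER() query with the genera listed above, assuming Python's sqlite3 is linked against SQLite 3.25 or newer (the first release with window functions). The genera come from the slide; they are inserted out of alphabetical order to show that the numbering follows the OVER (ORDER BY ...) clause, not insertion order. Ids other than 335 are assigned automatically and are not meaningful.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxon_concepts "
             "(id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT)")
conn.execute("INSERT INTO taxon_concepts VALUES (335, NULL, 'Canidae')")
# Genera from the slide, inserted out of alphabetical order on purpose.
for genus in ["Cuon", "Canis", "Dusicyon", "Chrysocyon", "Cerdocyon"]:
    conn.execute("INSERT INTO taxon_concepts (parent_id, full_name) VALUES (335, ?)",
                 (genus,))

# The window function numbers rows without collapsing them, unlike GROUP BY.
rows = conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY full_name) AS rn, full_name
    FROM taxon_concepts
    WHERE parent_id = 335
    ORDER BY full_name
""").fetchall()
for rn, name in rows:
    print(rn, name)
```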
  16. CTE + window function
      WITH RECURSIVE q (id, full_name, path) AS (
        SELECT id, full_name, ARRAY[1] FROM taxon_concepts h
        WHERE id = 335
        UNION
        SELECT hi.id, hi.full_name,
          q.path || (ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY hi.full_name))::INT
        FROM taxon_concepts hi
        JOIN q ON hi.parent_id = q.id
      )
      SELECT path, full_name FROM q
      ORDER BY path
  17. path        full_name
      {1}         Canidae
      {1,1}       Canis
      {1,1,1}     Canis adustus
      {1,1,2}     Canis aureus
      {1,1,3}     Canis familiaris
      (...)
      {1,1,7}     Canis lupus
      {1,1,7,1}   Canis lupus crassodon
      {1,1,7,2}   Canis lupus dingo
      {1,2}       Cerdocyon
      {1,2,1}     Cerdocyon thous
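A minimal sketch of the same materialized-path idea in Python's sqlite3, with two swaps that SQLite forces. First, SQLite does not allow window functions inside the recursive term, so the sibling ordinal (ROW_NUMBER() per parent) is computed in a separate, non-recursive CTE named ranked and joined in. Second, SQLite has no arrays, so the path is built as text, alongside a zero-padded sort_key whose plain text ordering matches tree order (assuming fewer than 10,000 siblings per parent). The toy data is a subset of the slide's Canidae tree with made-up ids, so ordinals differ from the slide (Canis lupus is the 3rd child of Canis here, not the 7th). The function name tree_paths is hypothetical.

```python
import sqlite3

PATHS = """
WITH RECURSIVE
ranked AS (
  -- sibling ordinal per parent; computed outside the recursion because
  -- SQLite does not allow window functions in the recursive term
  SELECT id, parent_id, full_name,
         ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY full_name) AS rn
  FROM taxon_concepts
),
q (id, full_name, path, sort_key) AS (
  SELECT id, full_name, '1', '0001' FROM taxon_concepts WHERE id = :root
  UNION ALL  -- a tree has no cycles, so no duplicate check is needed
  SELECT r.id, r.full_name,
         q.path || ',' || r.rn,                     -- readable path, e.g. 1,1,3
         q.sort_key || '.' || printf('%04d', r.rn)  -- zero-padded: text order = tree order
  FROM ranked r JOIN q ON r.parent_id = q.id
)
SELECT '{' || path || '}', full_name FROM q ORDER BY sort_key
"""


def tree_paths(conn, root_id):
    return conn.execute(PATHS, {"root": root_id}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxon_concepts "
             "(id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT)")
conn.executemany("INSERT INTO taxon_concepts VALUES (?, ?, ?)", [
    (335, None, "Canidae"),
    (1, 335, "Canis"), (2, 335, "Cerdocyon"),
    (3, 1, "Canis lupus"), (4, 1, "Canis aureus"), (5, 1, "Canis adustus"),
    (6, 3, "Canis lupus dingo"), (7, 3, "Canis lupus crassodon"),
    (8, 2, "Cerdocyon thous"),
])
for path, name in tree_paths(conn, 335):
    print(path, name)
```

PostgreSQL's integer arrays compare element by element, which is why the slide can simply ORDER BY path; the padded text key is only needed because SQLite lacks that type.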
  18. With CTE and Windowing, SQL is Turing Complete.
  19. PostgreSQL
        SQL Antipatterns: Avoiding the Pitfalls of Database Programming, Bill Karwin
        PostgreSQL: Up and Running, Regina Obe and Leo Hsu
        High Performance SQL with PostgreSQL 8.4
      Code & Demo
        of CITES Species
      Taxonomy
        Biodiversity Information Standards (TDWG)
      Graphics
        Items freed into the public domain, Pearson Scott Foresman