Saving the Elephant with Slonik
Railsberry 2013 presentation about how we're saving the planet at WCMC using PostgreSQL :)

Published in: Career, Technology, Business

  • 1. Saving the Elephant with Slonik. Agnieszka Figiel, @agnessa480, UNEP-WCMC. Railsberry 2013
  • 2. Taxon concepts and ranks (figure labels: taxon concepts, ranks)
  • 3. A brief history of gorilla classification
       Author                   Year  Scientific name
       Savage                   1847  Troglodytes gorilla (Pan gorilla)
       I. Geoffroy St. Hilaire  1852  Gorilla gorilla
       Tuttle                   1967  Pan gorilla
       Groves                   1967  Gorilla gorilla gorilla
       Terms: homonym, synonym, split / merge
  • 4. A matter of opinion
       Taxonomy A: Loxodonta africana
       Taxonomy B: Loxodonta africana, Loxodonta cyclotis
  • 5. #1: CTEs
       WITH name [ (columns) ] AS (attached query)
       primary query
  • 6. WITH endemic_taxon_concepts AS (
         SELECT taxon_concept_id
         FROM distributions
         GROUP BY taxon_concept_id
         HAVING COUNT(*) = 1
       ), countries_with_endemic_distributions AS (
         SELECT d.geo_entity_id, COUNT(d.taxon_concept_id) AS cnt
         FROM distributions d
         INNER JOIN endemic_taxon_concepts q
         ON d.taxon_concept_id = q.taxon_concept_id
         GROUP BY d.geo_entity_id
       )
       SELECT geo_entities.name_en, cnt
       FROM countries_with_endemic_distributions q
       INNER JOIN geo_entities ON = q.geo_entity_id
       ORDER BY cnt DESC
  • 7. name                      cnt
       Indonesia                 1353
       Mexico                    1069
       Madagascar                970
       Australia                 886
       Brazil                    763
       Ecuador                   564
       Papua New Guinea          561
       South Africa              532
       United States of America  520
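The endemic-species query from slides 6 and 7 can be tried on a toy dataset. The sketch below is an illustration only: it runs the same two-CTE query through Python's built-in sqlite3 module as a stand-in for PostgreSQL, the sample rows are invented, and the join condition `geo_entities.id = q.geo_entity_id` is an assumption, because the transcript has lost the left-hand side of that ON clause.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE geo_entities (id INTEGER PRIMARY KEY, name_en TEXT);
CREATE TABLE distributions (taxon_concept_id INTEGER, geo_entity_id INTEGER);
-- Invented sample data: taxa 10, 11 and 12 each occur in one country only.
INSERT INTO geo_entities VALUES (1, 'Indonesia'), (2, 'Mexico'), (3, 'Brazil');
INSERT INTO distributions VALUES
  (10, 1), (11, 1), (12, 2), (13, 1), (13, 2), (14, 3), (14, 2);
""")

rows = conn.execute("""
WITH endemic_taxon_concepts AS (
  -- a taxon with exactly one distribution row is endemic to that country
  SELECT taxon_concept_id
  FROM distributions
  GROUP BY taxon_concept_id
  HAVING COUNT(*) = 1
), countries_with_endemic_distributions AS (
  SELECT d.geo_entity_id, COUNT(d.taxon_concept_id) AS cnt
  FROM distributions d
  INNER JOIN endemic_taxon_concepts q
    ON d.taxon_concept_id = q.taxon_concept_id
  GROUP BY d.geo_entity_id
)
SELECT geo_entities.name_en, cnt
FROM countries_with_endemic_distributions q
INNER JOIN geo_entities ON geo_entities.id = q.geo_entity_id  -- assumed join key
ORDER BY cnt DESC
""").fetchall()

for name_en, cnt in rows:
    print(name_en, cnt)
```

Brazil is absent from the output because its only taxon in the sample data also occurs in Mexico, so it has no endemic species to count.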
  • 8. Data-modifying CTEs
       WITH deactivated_geo_entities AS (
         UPDATE geo_entities SET is_active = FALSE
         WHERE id IN (#{old_geo_entity_ids})
         RETURNING id
       )
       UPDATE distributions
       SET geo_entity_id = #{new_geo_entity_id}
       FROM deactivated_geo_entities
       WHERE distributions.geo_entity_id = deactivated_geo_entities.id
       CTE = materialize by design
  • 9. #2: Recursive CTEs
       WITH RECURSIVE name [ (columns) ] AS (
         non-recursive term
         UNION [ALL]
         recursive term
       )
       primary query
  • 10. WITH RECURSIVE self_and_descendants (id, full_name) AS (
          SELECT id, full_name FROM taxon_concepts
          WHERE id = 472
          UNION
          SELECT, hi.full_name FROM taxon_concepts hi
          JOIN self_and_descendants d ON = hi.parent_id
        )
        SELECT COUNT(*) FROM self_and_descendants
        Result:
        count
        432
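A small runnable sketch of the descendant count on slide 10 follows. It makes several assumptions: SQLite (via Python's sqlite3) stands in for PostgreSQL, the taxon rows and their ids are invented, and the two expressions missing from the transcript are taken to be `hi.id` in the SELECT list and `d.id` in the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon_concepts (
  id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT
);
-- Invented sample hierarchy; Felidae is a separate root and must not be counted.
INSERT INTO taxon_concepts VALUES
  (1, NULL, 'Canidae'),
  (2, 1, 'Canis'),
  (3, 2, 'Canis lupus'),
  (4, 3, 'Canis lupus dingo'),
  (5, 1, 'Cerdocyon'),
  (6, NULL, 'Felidae');
""")

descendant_count = conn.execute("""
WITH RECURSIVE self_and_descendants (id, full_name) AS (
  -- non-recursive term: the starting taxon itself
  SELECT id, full_name FROM taxon_concepts
  WHERE id = ?
  UNION
  -- recursive term: children of rows found so far (hi.id and d.id are assumed)
  SELECT hi.id, hi.full_name FROM taxon_concepts hi
  JOIN self_and_descendants d ON d.id = hi.parent_id
)
SELECT COUNT(*) FROM self_and_descendants
""", (1,)).fetchone()[0]

print(descendant_count)
```

The count includes the starting taxon, which is why the CTE is named self_and_descendants.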
  • 11. WITH RECURSIVE self_and_ancestors (parent_id, full_name, level) AS (
          SELECT parent_id, full_name, 1
          FROM taxon_concepts WHERE id = 5563
          UNION
          SELECT hi.parent_id, hi.full_name, q.level + 1
          FROM taxon_concepts hi
          JOIN self_and_ancestors q ON = q.parent_id
        )
        SELECT full_name
        FROM self_and_ancestors ORDER BY level DESC
  • 12. WITH crocodile_ancestry AS (
          WITH RECURSIVE self_and_ancestors (
            -- [AS IN PREVIOUS SLIDE]
          )
        )
        SELECT ARRAY_TO_STRING(ARRAY_AGG(full_name), ' > ')
        AS breadcrumb FROM crocodile_ancestry
        Result:
        breadcrumb
        Animalia > Chordata > Reptilia > Crocodylia > Crocodylidae > Crocodylus > Crocodylus niloticus
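The ancestor walk and breadcrumb from slides 11 and 12 can be sketched as below. This is a hedged illustration rather than the talk's code: SQLite stands in for PostgreSQL, the ids are invented (the names come from the breadcrumb on slide 12), the missing join operand is assumed to be `hi.id`, and because SQLite has no ARRAY_AGG the names are joined in Python instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon_concepts (
  id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT
);
-- Invented ids; each row points at its parent taxon.
INSERT INTO taxon_concepts VALUES
  (1, NULL, 'Animalia'),
  (2, 1, 'Chordata'),
  (3, 2, 'Reptilia'),
  (4, 3, 'Crocodylia'),
  (5, 4, 'Crocodylidae'),
  (6, 5, 'Crocodylus'),
  (7, 6, 'Crocodylus niloticus');
""")

names = [row[0] for row in conn.execute("""
WITH RECURSIVE self_and_ancestors (parent_id, full_name, level) AS (
  SELECT parent_id, full_name, 1
  FROM taxon_concepts WHERE id = ?
  UNION
  -- climb one level: the row whose id is the current row's parent_id
  SELECT hi.parent_id, hi.full_name, q.level + 1
  FROM taxon_concepts hi
  JOIN self_and_ancestors q ON hi.id = q.parent_id  -- hi.id is assumed
)
SELECT full_name
FROM self_and_ancestors ORDER BY level DESC
""", (7,))]

# Stand-in for ARRAY_TO_STRING(ARRAY_AGG(full_name), ' > ')
breadcrumb = " > ".join(names)
print(breadcrumb)
```

In PostgreSQL itself, writing the aggregate as `ARRAY_AGG(full_name ORDER BY level DESC)` would make the ordering explicit, provided the CTE also exposes the level column.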
  • 13. Cascade with exceptions
  • 14. WITH RECURSIVE cascading_refs (taxon_concept_id, exclusions) AS (
          SELECT, h_refs.excluded_taxon_concepts_ids
          FROM taxon_concepts h
          LEFT JOIN taxon_concept_references h_refs ON h_refs.taxon_concept_id = h.id
          WHERE = 10 AND h_refs.reference_id = 369
          UNION
          SELECT, cascading_refs.exclusions
          FROM taxon_concepts hi
          JOIN cascading_refs ON cascading_refs.taxon_concept_id = hi.parent_id
          WHERE NOT COALESCE(cascading_refs.exclusions, ARRAY[]::INT[]) @> ARRAY[]
        )
        UPDATE taxon_concepts SET has_std_ref = TRUE
        FROM cascading_refs
        WHERE cascading_refs.taxon_concept_id =
  • 15. #3: Window functions
        SELECT ROW_NUMBER() OVER (ORDER BY full_name), full_name
        FROM taxon_concepts
        WHERE parent_id = 335 ORDER BY full_name
        Result:
        row_number  full_name
        1           Canis
        2           Cerdocyon
        3           Chrysocyon
        4           Cuon
        5           Dusicyon
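The numbering query on slide 15 runs almost unchanged in SQLite 3.25 or later, which is what the sketch below assumes in place of PostgreSQL. The ids of the child rows are invented, and the `pos` alias is added only for readability. The genus names and parent id 335 are from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon_concepts (
  id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT
);
-- Rows inserted out of alphabetical order on purpose.
INSERT INTO taxon_concepts VALUES
  (335, NULL, 'Canidae'),
  (1, 335, 'Cuon'),
  (2, 335, 'Canis'),
  (3, 335, 'Dusicyon'),
  (4, 335, 'Cerdocyon'),
  (5, 335, 'Chrysocyon');
""")

rows = conn.execute("""
SELECT ROW_NUMBER() OVER (ORDER BY full_name) AS pos, full_name
FROM taxon_concepts
WHERE parent_id = 335 ORDER BY full_name
""").fetchall()

for pos, full_name in rows:
    print(pos, full_name)
```

ROW_NUMBER() assigns positions according to the ORDER BY inside OVER, independent of insertion order.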
  • 16. WITH RECURSIVE q (id, full_name, path) AS (
          SELECT id, full_name, ARRAY[1] FROM taxon_concepts h
          WHERE id = 335
          UNION
          SELECT, hi.full_name,
          q.path || (ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY hi.full_name))::INT
          FROM taxon_concepts hi
          JOIN q ON hi.parent_id = path, full_name FROM q
        ORDER BY path
        CTE + window function
  • 17. path       full_name
        {1}        Canidae
        {1,1}      Canis
        {1,1,1}    Canis adustus
        {1,1,2}    Canis aureus
        {1,1,3}    Canis familiaris
        (...)
        {1,1,7}    Canis lupus
        {1,1,7,1}  Canis lupus crassodon
        {1,1,7,2}  Canis lupus dingo
        {1,2}      Cerdocyon
        {1,2,1}    Cerdocyon thous
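The tree paths on slides 16 and 17 combine a recursive CTE with a window function. The sketch below is a variation, not the slide's query. SQLite, used here as a stand-in, does not allow window functions inside the recursive term, so sibling positions are precomputed in a separate `ranked` CTE (a hypothetical name). Paths are built as dotted text because SQLite has no array type, and the final ordering is done in Python. The row ids other than 335 are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon_concepts (
  id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT
);
INSERT INTO taxon_concepts VALUES
  (335, NULL, 'Canidae'),
  (1, 335, 'Canis'),
  (2, 335, 'Cerdocyon'),
  (3, 1, 'Canis lupus'),
  (4, 1, 'Canis adustus'),
  (5, 3, 'Canis lupus dingo'),
  (6, 2, 'Cerdocyon thous');
""")

rows = conn.execute("""
WITH RECURSIVE ranked AS (
  -- position of each taxon among its siblings, alphabetically
  SELECT id, parent_id, full_name,
         ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY full_name) AS pos
  FROM taxon_concepts
), q (id, full_name, path) AS (
  SELECT id, full_name, '1' FROM taxon_concepts
  WHERE id = 335
  UNION
  -- append the child's sibling position to the parent's path
  SELECT r.id, r.full_name, q.path || '.' || r.pos
  FROM ranked r
  JOIN q ON r.parent_id = q.id
)
SELECT path, full_name FROM q
""").fetchall()

# Sort numerically per path segment (text sorting would misplace 10 before 2).
rows.sort(key=lambda r: tuple(int(part) for part in r[0].split(".")))
for path, full_name in rows:
    print(path, full_name)
```

Sorting by path yields a depth-first listing of the tree with siblings in alphabetical order, the same shape as the table on slide 17.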
  • 18. With CTE and Windowing, SQL is Turing Complete.
  • 19. SQL Antipatterns: Avoiding the Pitfalls of Database Programming, Bill Karwin
        PostgreSQL: Up and Running, Regina Obe, Leo Hsu
        High Performance SQL with PostgreSQL 8.4
        of CITES Species
        Biodiversity Information Standards (TDWG)
        Items freed into the public domain, Pearson Scott Foresman
        PostgreSQL / Code & Demo / Graphics / Taxonomy