Saving the Elephant with Slonik
 


Railsberry 2013 presentation about how we're saving the planet at WCMC using PostgreSQL :)


    Presentation Transcript

    • Saving the Elephant with Slonik
      Agnieszka Figiel, @agnessa480
      UNEP-WCMC
      Railsberry 2013
    • Taxon concepts and ranks (diagram labels: taxon concepts, ranks)
    • A brief history of gorilla classification
      Author                    Year   Scientific name
      Savage                    1847   Troglodytes gorilla (Pan gorilla)
      I. Geoffroy St. Hilaire   1852   Gorilla gorilla
      Tuttle                    1967   Pan gorilla
      Groves                    1967   Gorilla gorilla gorilla
      Terms illustrated: homonym, synonym, split / merge
    • A matter of opinion
      Taxonomy A: Loxodonta africana
      Taxonomy B: Loxodonta africana, Loxodonta cyclotis
    • #1: CTEs
      WITH name [ (columns) ] AS (
        attached query
      )
      primary query
    • WITH endemic_taxon_concepts AS (
        SELECT taxon_concept_id
        FROM distributions
        GROUP BY taxon_concept_id
        HAVING COUNT(*) = 1
      ), countries_with_endemic_distributions AS (
        SELECT d.geo_entity_id, COUNT(d.taxon_concept_id) AS cnt
        FROM distributions d
        INNER JOIN endemic_taxon_concepts q
          ON d.taxon_concept_id = q.taxon_concept_id
        GROUP BY d.geo_entity_id
      )
      SELECT geo_entities.name_en, cnt
      FROM countries_with_endemic_distributions q
      INNER JOIN geo_entities ON geo_entities.id = q.geo_entity_id
      ORDER BY cnt DESC
    • name                       cnt
      Indonesia                  1353
      Mexico                     1069
      Madagascar                 970
      Australia                  886
      Brazil                     763
      Ecuador                    564
      Papua New Guinea           561
      South Africa               532
      United States of America   520
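The chained CTEs above can be tried without a PostgreSQL server. The following is a minimal sketch that runs the same query through Python's built-in sqlite3 module as a stand-in; the table layout follows the slide, but the rows and the helper name `endemic_counts` are invented for illustration and are not the WCMC data.

```python
import sqlite3

# Hypothetical miniature of the slide's schema; the rows are made up.
SCHEMA_AND_DATA = """
CREATE TABLE geo_entities (id INTEGER PRIMARY KEY, name_en TEXT);
CREATE TABLE distributions (taxon_concept_id INTEGER, geo_entity_id INTEGER);
INSERT INTO geo_entities VALUES (1, 'Indonesia'), (2, 'Mexico'), (3, 'Brazil');
INSERT INTO distributions VALUES
  (10, 1),           -- taxon 10 occurs only in Indonesia
  (11, 1),           -- taxon 11 occurs only in Indonesia
  (12, 2),           -- taxon 12 occurs only in Mexico
  (13, 2), (13, 3);  -- taxon 13 occurs in two countries, so it is not endemic
"""

# Same shape as the slide: the first CTE finds taxa with exactly one
# distribution row, the second counts them per country.
ENDEMIC_QUERY = """
WITH endemic_taxon_concepts AS (
  SELECT taxon_concept_id
  FROM distributions
  GROUP BY taxon_concept_id
  HAVING COUNT(*) = 1
), countries_with_endemic_distributions AS (
  SELECT d.geo_entity_id, COUNT(d.taxon_concept_id) AS cnt
  FROM distributions d
  INNER JOIN endemic_taxon_concepts q
    ON d.taxon_concept_id = q.taxon_concept_id
  GROUP BY d.geo_entity_id
)
SELECT geo_entities.name_en, cnt
FROM countries_with_endemic_distributions q
INNER JOIN geo_entities ON geo_entities.id = q.geo_entity_id
ORDER BY cnt DESC
"""


def endemic_counts(conn):
    """Return (country name, number of endemic taxa) rows, largest first."""
    return conn.execute(ENDEMIC_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_DATA)
rows = endemic_counts(conn)
print(rows)
```

Brazil does not appear in the result because its only taxon is shared with Mexico; the second CTE sees only the taxa that the first one kept.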
    • Data-modifying CTEs
      WITH deactivated_geo_entities AS (
        UPDATE geo_entities SET is_active = FALSE
        WHERE id IN (#{old_geo_entity_ids})
        RETURNING id
      )
      UPDATE distributions
      SET geo_entity_id = #{new_geo_entity_id}
      FROM deactivated_geo_entities
      WHERE distributions.geo_entity_id = deactivated_geo_entities.id

      CTE = materialize by design
    • #2: Recursive CTEs
      WITH RECURSIVE name [ (columns) ] AS (
        non-recursive term
        UNION [ALL]
        recursive term
      )
      primary query
    • WITH RECURSIVE self_and_descendants (id, full_name) AS (
        SELECT id, full_name FROM taxon_concepts
        WHERE id = 472
        UNION
        SELECT hi.id, hi.full_name FROM taxon_concepts hi
        JOIN self_and_descendants d ON d.id = hi.parent_id
      )
      SELECT COUNT(*) FROM self_and_descendants

      count
      432
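To see the recursive term at work, here is a small sketch of the descendants query, again using Python's sqlite3 in place of PostgreSQL. The ids, the sample rows, and the helper name `count_self_and_descendants` are assumptions made for this example; only the query shape comes from the slide.

```python
import sqlite3

# Hypothetical taxonomy fragment: (id, parent_id, full_name).
TAXA = [
    (1, None, "Canidae"),
    (2, 1, "Canis"),
    (3, 2, "Canis lupus"),
    (4, 2, "Canis aureus"),
    (5, 1, "Cerdocyon"),
    (6, None, "Felidae"),
    (7, 6, "Felis"),
]

# The non-recursive term selects the root; the recursive term keeps
# joining children onto the rows found so far until no new rows appear.
DESCENDANTS_QUERY = """
WITH RECURSIVE self_and_descendants (id, full_name) AS (
  SELECT id, full_name FROM taxon_concepts
  WHERE id = ?
  UNION
  SELECT hi.id, hi.full_name FROM taxon_concepts hi
  JOIN self_and_descendants d ON d.id = hi.parent_id
)
SELECT COUNT(*) FROM self_and_descendants
"""


def count_self_and_descendants(conn, root_id):
    """Count the taxon with the given id plus everything below it."""
    return conn.execute(DESCENDANTS_QUERY, (root_id,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon_concepts "
    "(id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT)"
)
conn.executemany("INSERT INTO taxon_concepts VALUES (?, ?, ?)", TAXA)

canidae_count = count_self_and_descendants(conn, 1)
felidae_count = count_self_and_descendants(conn, 6)
print(canidae_count, felidae_count)
```

The root id is passed as a bound parameter here rather than written into the SQL, which is a choice of this sketch and not something the slide shows.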
    • WITH RECURSIVE self_and_ancestors (parent_id, full_name, level) AS (
        SELECT parent_id, full_name, 1
        FROM taxon_concepts WHERE id = 5563
        UNION
        SELECT hi.parent_id, hi.full_name, q.level + 1
        FROM taxon_concepts hi
        JOIN self_and_ancestors q ON hi.id = q.parent_id
      )
      SELECT full_name
      FROM self_and_ancestors ORDER BY level DESC
    • WITH crocodile_ancestry AS (
        WITH RECURSIVE self_and_ancestors (
          -- [AS IN PREVIOUS SLIDE]
        )
      )
      SELECT ARRAY_TO_STRING(ARRAY_AGG(full_name), ' > ') AS breadcrumb
      FROM crocodile_ancestry

      breadcrumb
      Animalia > Chordata > Reptilia > Crocodylia > Crocodylidae > Crocodylus > Crocodylus niloticus
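The ancestor walk and the breadcrumb can be sketched together. SQLite has no ARRAY_AGG or ARRAY_TO_STRING, so this example, which uses Python's sqlite3 as a stand-in for PostgreSQL, runs only the recursive query in SQL and joins the names in Python. The taxon names come from the breadcrumb on the slide; the ids and the helper name `breadcrumb` are hypothetical.

```python
import sqlite3

# Names taken from the slide's breadcrumb; the ids are invented.
TAXA = [
    (1, None, "Animalia"),
    (2, 1, "Chordata"),
    (3, 2, "Reptilia"),
    (4, 3, "Crocodylia"),
    (5, 4, "Crocodylidae"),
    (6, 5, "Crocodylus"),
    (7, 6, "Crocodylus niloticus"),
]

# Walk upwards: each step joins the parent of the row found last and
# increments level, so the root ends up with the highest level.
ANCESTORS_QUERY = """
WITH RECURSIVE self_and_ancestors (parent_id, full_name, level) AS (
  SELECT parent_id, full_name, 1
  FROM taxon_concepts WHERE id = ?
  UNION
  SELECT hi.parent_id, hi.full_name, q.level + 1
  FROM taxon_concepts hi
  JOIN self_and_ancestors q ON hi.id = q.parent_id
)
SELECT full_name
FROM self_and_ancestors ORDER BY level DESC
"""


def breadcrumb(conn, taxon_id, separator=" > "):
    """Return the ancestry of a taxon as a single string, root first."""
    names = [row[0] for row in conn.execute(ANCESTORS_QUERY, (taxon_id,))]
    return separator.join(names)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon_concepts "
    "(id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT)"
)
conn.executemany("INSERT INTO taxon_concepts VALUES (?, ?, ?)", TAXA)

crumb = breadcrumb(conn, 7)
print(crumb)
```

The recursion stops on its own at Animalia, because its parent_id is NULL and the join condition never matches NULL.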
    • Cascade with exceptions
    • WITH RECURSIVE cascading_refs(taxon_concept_id, exclusions) AS (
        SELECT h.id, h_refs.excluded_taxon_concepts_ids
        FROM taxon_concepts h
        LEFT JOIN taxon_concept_references h_refs
          ON h_refs.taxon_concept_id = h.id
        WHERE h.id = 10 AND h_refs.reference_id = 369
        UNION
        SELECT hi.id, cascading_refs.exclusions
        FROM taxon_concepts hi
        JOIN cascading_refs
          ON cascading_refs.taxon_concept_id = hi.parent_id
        WHERE NOT COALESCE(cascading_refs.exclusions, ARRAY[]::INT[]) @> ARRAY[hi.id]
      )
      UPDATE taxon_concepts SET has_std_ref = TRUE
      FROM cascading_refs
      WHERE cascading_refs.taxon_concept_id = taxon_concepts.id
    • #3: Window functions
      SELECT ROW_NUMBER() OVER(ORDER BY full_name), full_name
      FROM taxon_concepts
      WHERE parent_id = 335 ORDER BY full_name

      row_number   full_name
      1            Canis
      2            Cerdocyon
      3            Chrysocyon
      4            Cuon
      5            Dusicyon
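The ROW_NUMBER() query above can be reproduced on a toy table. This sketch uses Python's sqlite3 as a stand-in for PostgreSQL and assumes the bundled SQLite is version 3.25 or newer, the first release with window functions. The genus names and parent id 335 come from the slide; the other ids, the extra row, and the helper name `numbered_children` are invented.

```python
import sqlite3

# Children of the hypothetical parent 335, deliberately inserted out of
# alphabetical order, plus one row that belongs to a different parent.
TAXA = [
    (1, 335, "Dusicyon"),
    (2, 335, "Canis"),
    (3, 335, "Cuon"),
    (4, 335, "Cerdocyon"),
    (5, 335, "Chrysocyon"),
    (6, 999, "Felis"),
]

# ROW_NUMBER() numbers the rows that survive the WHERE clause, in the
# order given inside OVER(...).
NUMBERED_QUERY = """
SELECT ROW_NUMBER() OVER (ORDER BY full_name), full_name
FROM taxon_concepts
WHERE parent_id = ? ORDER BY full_name
"""


def numbered_children(conn, parent_id):
    """Return (row_number, full_name) pairs for the children of a taxon."""
    return conn.execute(NUMBERED_QUERY, (parent_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon_concepts "
    "(id INTEGER PRIMARY KEY, parent_id INTEGER, full_name TEXT)"
)
conn.executemany("INSERT INTO taxon_concepts VALUES (?, ?, ?)", TAXA)

numbered = numbered_children(conn, 335)
for number, name in numbered:
    print(number, name)
```

The numbering depends only on the ORDER BY inside OVER(...); the outer ORDER BY just controls the order in which the already numbered rows are returned.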
    • CTE + window function
      WITH RECURSIVE q(id, full_name, path) AS (
        SELECT id, full_name, ARRAY[1] FROM taxon_concepts h
        WHERE id = 335
        UNION
        SELECT hi.id, hi.full_name,
          q.path || (ROW_NUMBER() OVER(PARTITION BY parent_id ORDER BY hi.full_name))::INT
        FROM taxon_concepts hi
        JOIN q ON hi.parent_id = q.id
      )
      SELECT path, full_name FROM q
      ORDER BY path
    • path        full_name
      {1}         Canidae
      {1,1}       Canis
      {1,1,1}     Canis adustus
      {1,1,2}     Canis aureus
      {1,1,3}     Canis familiaris
      (...)
      {1,1,7}     Canis lupus
      {1,1,7,1}   Canis lupus crassodon
      {1,1,7,2}   Canis lupus dingo
      {1,2}       Cerdocyon
      {1,2,1}     Cerdocyon thous
    • With CTE and Windowing, SQL is Turing Complete.
    • PostgreSQL
        Bill Karwin, SQL Antipatterns: Avoiding the Pitfalls of Database Programming
        Regina Obe, Leo Hsu, PostgreSQL: Up and Running
        High Performance SQL with PostgreSQL 8.4
      Code & Demo
        https://github.com/unepwcmc/SAPI
        Checklist of CITES Species
      Taxonomy
        Biodiversity Information Standards (TDWG)
      Graphics
        Items freed into the public domain, Pearson Scott Foresman