SlideShare a Scribd company logo
Delicious Data:
Automated Techniques for
   Complex Reports


             Presented by
    Jeff Godin and Benjamin Shum
How NOT to run reports...
Planning the framework
Planning the framework

   SQL Query
     Report
Planning the framework

   SQL Query   Export Results
     Report       to CSV
Planning the framework

   SQL Query    Export Results
     Report        to CSV




    Convert
   CSV to XLS
Planning the framework

   SQL Query    Export Results
     Report        to CSV




    Convert       Email XLS
   CSV to XLS     to Library
Planning the framework

   SQL Query      Export Results
     Report          to CSV

           Schedule as
            Cron Job

    Convert          Email XLS
   CSV to XLS        to Library
Some Strategies


Schedule as   Use our experience
 Cron Job     setting up crontab
SQL Query        Use our existing
   Report          SQL reports



Export Results     CSV is a standard
   to CSV        format to work with
Convert
CSV to XLS    Find a converter




Email XLS
             Find an email client
to Library
Implementation


• Can find online at
  http://evergreen-ils.org/dokuwiki/doku.php?
  id=scratchpad:automated_sql_reports
Future of the work

• Additional formatting
• Dealing with reports with zero results
• Adding new reports
• Implementing more refinements
No database access?

• Create scheduled reports
• Completion notification via e-mail
• Script pulls report data
• Import and process
Don’t do this unless
   you have to
Reports as Exports

• Create template that outputs raw data
• Schedule report
• Enable e-mail notification to special address
Act on the e-mail

• fetchmail -> procmail -> perl
• Report data is automatically downloaded
• Within a minute or so
act-on-mail.pl
#!/usr/bin/perl

# called by procmail, and receives the mail message on stdin

use strict; use warnings;

my $url;

while (<>) {

    if ($_ =~ /https://reporter/) {

    
     $url = $_;

    
     last;

    }
}

system("/home/stats/bin/fetch-report.pl", "-u", $url);
fetch-report.pl logic

• Fetch URL like:
  https://host/reporter/[...]/report-data.html

•   Automatically log in if needed (keep a cookie)

•   Download report-data.csv and save to output dir
imports.ini example 1
[general]
dir=/home/stats/reports
state_dir=/home/stats/state

[group holdratio]
name=Hold Ratio
file1=holds_outstanding.csv
file2=holdable_copies.csv
file3=bib_info.csv
command=psql -f /home/stats/sql/holds_init.sql
#post-command=/home/stats/bin/mail-holdratio.pl
imports.ini example 2
[general]
dir=/home/stats/reports
state_dir=/home/stats/state

[group items_for_export]
name=Items for Export
file1=items_for_export.csv
command=psql -f /home/stats/sql/items_for_export.sql
post-command=/home/stats/bin/export_bibs.pl -f /home/stats/
output/bre_ids.csv
Import script overview
• Runs from cron
• Have all input files been updated since the
  last time I ran this import?
• Run import
• Update timestamp
• Run post-import command
The import process

• drop table
• create table
• import data
• process to output
The post-import

• Send the report output somewhere
• E-mail notify that “report’s ready at
  someurl”
• Trigger something like a bib export of the
  records that were output
One more thing...


• Don’t forget to delete your scheduled
  report output
Custom reporting
views exposed in the
        IDL
SQL views in the IDL

• Exposed via reporting interface
• Good for on-demand reporting
• Don’t break the IDL
 • xmllint is your friend
 • pick a class id unlikely to collide
<class id="rlcd"
   controller="open-ils.cstore open-ils.pcrud open-
ils.reporter-store"
   oils_obj:fieldmapper="reporter::last_copy_deleted"
   oils_persist:readonly="true"
   reporter:core="true"
   reporter:label="Last Copy Delete Time" >
<oils_persist:source_definition>

SELECT b.id,
    MAX(dcp.edit_date) AS last_delete_date

FROM biblio.record_entry b
   JOIN asset.call_number cn ON (cn.record = b.id)
   JOIN asset.copy dcp ON (cn.id = dcp.call_number)

WHERE NOT b.deleted

GROUP BY b.id

HAVING SUM( CASE WHEN NOT dcp.deleted THEN 1
ELSE 0 END) = 0
<!-- continued -->
     <fields oils_persist:primary="id"
oils_persist:sequence="biblio.record_entry">
        <field reporter:label="Record ID" name="id"
reporter:datatype="id"/>
        <field reporter:label="Delete Date/Time"
name="last_delete_date" reporter:datatype="timestamp"/>
     </fields>
     <links>
        <link field="id" reltype="has_a" key="id" map=""
class="bre"/>
     </links>
<!-- continued -->
     <permacrud xmlns="http://open-ils.org/spec/opensrf/
IDL/permacrud/v1">
       <actions>
          <retrieve/>
       </actions>
     </permacrud>
  </class>
The Future

• General third party reporting frameworks?
• Library Dashboards?
• Your Ideas?
Photo Credits
Surrounded by papers by Flickr user suratlozowick
http://www.flickr.com/photos/suratlozowick/
4522926028/
Thanks!

More Related Content

What's hot

Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
DataWorks Summit
 
Operational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos ChristineOperational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos Christine
Spark Summit
 
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
Databricks
 
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopProject Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Databricks
 
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
Flink Forward
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Julian Hyde
 
H2O World - Sparkling Water - Michal Malohlava
H2O World - Sparkling Water - Michal MalohlavaH2O World - Sparkling Water - Michal Malohlava
H2O World - Sparkling Water - Michal Malohlava
Sri Ambati
 
Degrading Performance? You Might be Suffering From the Small Files Syndrome
Degrading Performance? You Might be Suffering From the Small Files SyndromeDegrading Performance? You Might be Suffering From the Small Files Syndrome
Degrading Performance? You Might be Suffering From the Small Files Syndrome
Databricks
 
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested ColumnsMaterialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Databricks
 
Spark streaming state of the union
Spark streaming state of the unionSpark streaming state of the union
Spark streaming state of the union
Databricks
 
Spark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir VolkSpark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streams
Radu Tudoran
 
Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021
Mark Kromer
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
Databricks
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
Databricks
 
Parallelizing Existing R Packages with SparkR
Parallelizing Existing R Packages with SparkRParallelizing Existing R Packages with SparkR
Parallelizing Existing R Packages with SparkR
Databricks
 
Structured Streaming Use-Cases at Apple
Structured Streaming Use-Cases at AppleStructured Streaming Use-Cases at Apple
Structured Streaming Use-Cases at Apple
Databricks
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
Tao Feng
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
Knoldus Inc.
 
Automated Spark Deployment With Declarative Infrastructure
Automated Spark Deployment With Declarative InfrastructureAutomated Spark Deployment With Declarative Infrastructure
Automated Spark Deployment With Declarative Infrastructure
Spark Summit
 

What's hot (20)

Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
 
Operational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos ChristineOperational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos Christine
 
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
Correctness and Performance of Apache Spark SQL with Bogdan Ghit and Nicolas ...
 
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopProject Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
 
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
H2O World - Sparkling Water - Michal Malohlava
H2O World - Sparkling Water - Michal MalohlavaH2O World - Sparkling Water - Michal Malohlava
H2O World - Sparkling Water - Michal Malohlava
 
Degrading Performance? You Might be Suffering From the Small Files Syndrome
Degrading Performance? You Might be Suffering From the Small Files SyndromeDegrading Performance? You Might be Suffering From the Small Files Syndrome
Degrading Performance? You Might be Suffering From the Small Files Syndrome
 
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested ColumnsMaterialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
 
Spark streaming state of the union
Spark streaming state of the unionSpark streaming state of the union
Spark streaming state of the union
 
Spark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir VolkSpark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir Volk
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streams
 
Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
 
Parallelizing Existing R Packages with SparkR
Parallelizing Existing R Packages with SparkRParallelizing Existing R Packages with SparkR
Parallelizing Existing R Packages with SparkR
 
Structured Streaming Use-Cases at Apple
Structured Streaming Use-Cases at AppleStructured Streaming Use-Cases at Apple
Structured Streaming Use-Cases at Apple
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
 
Automated Spark Deployment With Declarative Infrastructure
Automated Spark Deployment With Declarative InfrastructureAutomated Spark Deployment With Declarative Infrastructure
Automated Spark Deployment With Declarative Infrastructure
 

Viewers also liked

What's New With Open Source (CLA2011)
What's New With Open Source (CLA2011)What's New With Open Source (CLA2011)
What's New With Open Source (CLA2011)
Benjamin Shum
 
The Perils of Perception in 2016: Ipsos MORI
The Perils of Perception in 2016: Ipsos MORIThe Perils of Perception in 2016: Ipsos MORI
The Perils of Perception in 2016: Ipsos MORI
Ipsos UK
 
Designing Teams for Emerging Challenges
Designing Teams for Emerging ChallengesDesigning Teams for Emerging Challenges
Designing Teams for Emerging Challenges
Aaron Irizarry
 
UX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and ArchivesUX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and Archives
Ned Potter
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving Cars
LinkedIn
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with Data
Seth Familian
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017
Drift
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your Niche
Leslie Samuel
 

Viewers also liked (8)

What's New With Open Source (CLA2011)
What's New With Open Source (CLA2011)What's New With Open Source (CLA2011)
What's New With Open Source (CLA2011)
 
The Perils of Perception in 2016: Ipsos MORI
The Perils of Perception in 2016: Ipsos MORIThe Perils of Perception in 2016: Ipsos MORI
The Perils of Perception in 2016: Ipsos MORI
 
Designing Teams for Emerging Challenges
Designing Teams for Emerging ChallengesDesigning Teams for Emerging Challenges
Designing Teams for Emerging Challenges
 
UX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and ArchivesUX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and Archives
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving Cars
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with Data
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your Niche
 

Similar to EG Reports - Delicious Data

Digdagによる大規模データ処理の自動化とエラー処理
Digdagによる大規模データ処理の自動化とエラー処理Digdagによる大規模データ処理の自動化とエラー処理
Digdagによる大規模データ処理の自動化とエラー処理
Sadayuki Furuhashi
 
Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
Sadayuki Furuhashi
 
Evolutionary db development
Evolutionary db development Evolutionary db development
Evolutionary db development
Open Party
 
6 tips for improving ruby performance
6 tips for improving ruby performance6 tips for improving ruby performance
6 tips for improving ruby performance
Engine Yard
 
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct
Amazon Web Services
 
Unbreakable Sharepoint 2016 With SQL Server 2016 availability groups
Unbreakable Sharepoint 2016 With SQL Server 2016 availability groupsUnbreakable Sharepoint 2016 With SQL Server 2016 availability groups
Unbreakable Sharepoint 2016 With SQL Server 2016 availability groups
Isabelle Van Campenhoudt
 
Many Faces Of Bi Publisher In Oracle Ebs
Many Faces Of Bi Publisher In Oracle EbsMany Faces Of Bi Publisher In Oracle Ebs
Many Faces Of Bi Publisher In Oracle Ebs
Hossam El-Faxe
 
Ldap Synchronization Connector @ 2011.RMLL
Ldap Synchronization Connector @ 2011.RMLLLdap Synchronization Connector @ 2011.RMLL
Ldap Synchronization Connector @ 2011.RMLL
sbahloul
 
DW on AWS
DW on AWSDW on AWS
DW on AWS
Gaurav Agrawal
 
AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722
Amazon Web Services
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
Chris Baynes
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
Amazon Web Services
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
Julian Hyde
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
Amazon Web Services
 
Unbreakable SharePoint 2016 with SQL Server 2016 Always On Availability groups
Unbreakable SharePoint 2016 with SQL Server 2016 Always On Availability groupsUnbreakable SharePoint 2016 with SQL Server 2016 Always On Availability groups
Unbreakable SharePoint 2016 with SQL Server 2016 Always On Availability groups
serge luca
 
DMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4ReportingDMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4Reporting
David Mann
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and Drupal
Promet Source
 
GGX 2014 - Grails and the real time world
GGX 2014 - Grails and the real time worldGGX 2014 - Grails and the real time world
GGX 2014 - Grails and the real time world
Iván López Martín
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
Databricks
 
Creating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleCreating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at Scale
Sean Chittenden
 

Similar to EG Reports - Delicious Data (20)

Digdagによる大規模データ処理の自動化とエラー処理
Digdagによる大規模データ処理の自動化とエラー処理Digdagによる大規模データ処理の自動化とエラー処理
Digdagによる大規模データ処理の自動化とエラー処理
 
Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
 
Evolutionary db development
Evolutionary db development Evolutionary db development
Evolutionary db development
 
6 tips for improving ruby performance
6 tips for improving ruby performance6 tips for improving ruby performance
6 tips for improving ruby performance
 
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct
 
Unbreakable Sharepoint 2016 With SQL Server 2016 availability groups
Unbreakable Sharepoint 2016 With SQL Server 2016 availability groupsUnbreakable Sharepoint 2016 With SQL Server 2016 availability groups
Unbreakable Sharepoint 2016 With SQL Server 2016 availability groups
 
Many Faces Of Bi Publisher In Oracle Ebs
Many Faces Of Bi Publisher In Oracle EbsMany Faces Of Bi Publisher In Oracle Ebs
Many Faces Of Bi Publisher In Oracle Ebs
 
Ldap Synchronization Connector @ 2011.RMLL
Ldap Synchronization Connector @ 2011.RMLLLdap Synchronization Connector @ 2011.RMLL
Ldap Synchronization Connector @ 2011.RMLL
 
DW on AWS
DW on AWSDW on AWS
DW on AWS
 
AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Unbreakable SharePoint 2016 with SQL Server 2016 Always On Availability groups
Unbreakable SharePoint 2016 with SQL Server 2016 Always On Availability groupsUnbreakable SharePoint 2016 with SQL Server 2016 Always On Availability groups
Unbreakable SharePoint 2016 with SQL Server 2016 Always On Availability groups
 
DMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4ReportingDMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4Reporting
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and Drupal
 
GGX 2014 - Grails and the real time world
GGX 2014 - Grails and the real time worldGGX 2014 - Grails and the real time world
GGX 2014 - Grails and the real time world
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
 
Creating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleCreating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at Scale
 

Recently uploaded

A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
thanhdowork
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
Jean Carlos Nunes Paixão
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
TechSoup
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
Nguyen Thanh Tu Collection
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
Smart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICTSmart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICT
simonomuemu
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
adhitya5119
 
How to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
Celine George
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
History of Stoke Newington
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
ak6969907
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
Colégio Santa Teresinha
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
chanes7
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
amberjdewit93
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
mulvey2
 

Recently uploaded (20)

A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
Smart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICTSmart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICT
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
 
How to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 

EG Reports - Delicious Data

  • 1. Delicious Data: Automated Techniques for Complex Reports Presented by Jeff Godin and Benjamin Shum
  • 2. How NOT to run reports...
  • 4. Planning the framework SQL Query Report
  • 5. Planning the framework SQL Query Export Results Report to CSV
  • 6. Planning the framework SQL Query Export Results Report to CSV Convert CSV to XLS
  • 7. Planning the framework SQL Query Export Results Report to CSV Convert Email XLS CSV to XLS to Library
  • 8. Planning the framework SQL Query Export Results Report to CSV Schedule as Cron Job Convert Email XLS CSV to XLS to Library
  • 9. Some Strategies Schedule as Use our experience Cron Job setting up crontab
  • 10. SQL Query Use our existing Report SQL reports Export Results CSV is a standard to CSV format to work with
  • 11. Convert CSV to XLS Find a converter Email XLS Find an email client to Library
  • 12. Implementation • Can find online at http://evergreen-ils.org/dokuwiki/doku.php? id=scratchpad:automated_sql_reports
  • 13. Future of the work • Additional formatting • Dealing with reports with zero results • Adding new reports • Implementing more refinements
  • 14. No database access? • Create scheduled reports • Completion notification via e-mail • Script pulls report data • Import and process
  • 15. Don’t do this unless you have to
  • 16. Reports as Exports • Create template that outputs raw data • Schedule report • Enable e-mail notification to special address
  • 17. Act on the e-mail • fetchmail -> procmail -> perl • Report data is automatically downloaded • Within a minute or so
  • 18. act-on-mail.pl #!/usr/bin/perl # called by procmail, and receives the mail message on stdin use strict; use warnings; my $url; while (<>) { if ($_ =~ /https://reporter/) { $url = $_; last; } } system("/home/stats/bin/fetch-report.pl", "-u", $url);
  • 19. fetch-report.pl logic • Fetch URL like: https://host/reporter/[...]/report-data.html • Automatically log in if needed (keep a cookie) • Download report-data.csv and save to output dir
  • 20. imports.ini example 1 [general] dir=/home/stats/reports state_dir=/home/stats/state [group holdratio] name=Hold Ratio file1=holds_outstanding.csv file2=holdable_copies.csv file3=bib_info.csv command=psql -f /home/stats/sql/holds_init.sql #post-command=/home/stats/bin/mail-holdratio.pl
  • 21. imports.ini example 2 [general] dir=/home/stats/reports state_dir=/home/stats/state [group items_for_export] name=Items for Export file1=items_for_export.csv command=psql -f /home/stats/sql/items_for_export.sql post-command=/home/stats/bin/export_bibs.pl -f /home/stats/ output/bre_ids.csv
  • 22. Import script overview • Runs from cron • Have all input files been updated since the last time I ran this import? • Run import • Update timestamp • Run post-import command
  • 23. The import process • drop table • create table • import data • process to output
  • 24. The post-import • Send the report output somewhere • E-mail notify that “report’s ready at someurl” • Trigger something like a bib export of the records that were output
  • 25. One more thing... • Don’t forget to delete your scheduled report output
  • 27. SQL views in the IDL • Exposed via reporting interface • Good for on-demand reporting • Don’t break the IDL • xmllint is your friend • pick a class id unlikely to collide
  • 28. <class id="rlcd" controller="open-ils.cstore open-ils.pcrud open- ils.reporter-store" oils_obj:fieldmapper="reporter::last_copy_deleted" oils_persist:readonly="true" reporter:core="true" reporter:label="Last Copy Delete Time" >
  • 29. <oils_persist:source_definition> SELECT b.id, MAX(dcp.edit_date) AS last_delete_date FROM biblio.record_entry b JOIN asset.call_number cn ON (cn.record = b.id) JOIN asset.copy dcp ON (cn.id = dcp.call_number) WHERE NOT b.deleted GROUP BY b.id HAVING SUM( CASE WHEN NOT dcp.deleted THEN 1 ELSE 0 END) = 0
  • 30. <!-- continued --> <fields oils_persist:primary="id" oils_persist:sequence="biblio.record_entry"> <field reporter:label="Record ID" name="id" reporter:datatype="id"/> <field reporter:label="Delete Date/Time" name="last_delete_date" reporter:datatype="timestamp"/> </fields> <links> <link field="id" reltype="has_a" key="id" map="" class="bre"/> </links>
  • 31. <!-- continued --> <permacrud xmlns="http://open-ils.org/spec/opensrf/ IDL/permacrud/v1"> <actions> <retrieve/> </actions> </permacrud> </class>
  • 32. The Future • General third party reporting frameworks? • Library Dashboards? • Your Ideas?
  • 33. Photo Credits Surrounded by papers by Flickr user suratlozowick http://www.flickr.com/photos/suratlozowick/ 4522926028/

Editor's Notes

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n
  25. \n
  26. \n
  27. \n
  28. \n
  29. \n
  30. \n
  31. \n
  32. \n
  33. \n
  34. \n
  35. \n
  36. \n
  37. \n