Alfresco Business Reporting - Tech Talk Live 20130501

This is the slide deck used in Alfresco's Tech Talk Live, May 1, 2013. It featured my Alfresco add-on: Alfresco Business Reporting. Its purpose is to explain the technical 'why' and 'how' of the add-on module, the challenges faced, and the solutions designed.

Published in: Technology

  1. Tech Talk Live #65: Jeff Potts & Richard Esplin & Tjarda Peelen
  2. Agenda
      • Who am I?
      • Why business reporting?
      • What is it about?
      • How was it achieved?
      • Demo
      • Q&A
  3. Who is Tjarda?
      • Netherlands
      • Incentro (ECM, WCM, BI, Search, Advisory)
      • Working with Alfresco since v1.4 EE (privately since 1.2 CE)
      • Generic Java, Config, Architecture, (Pre)Sales
      • Document Management, Publishing, Government
  4. Incentro
      Information is the center of our approach
      Expertise
  5. Agenda
      • Who am I?
      • Why business reporting?
      • What is it about?
      • How was it achieved?
      • Demo
      • Q&A
  6. Why Business Reporting
      The challenge:
      • Alfresco does ‘not really’ support reporting
      • Business has reporting needs
      • Reporting needs:
        – change over time
        – can be specific for each business/organization/dept.
  7. Why Business Reporting: My solution
      • Based on standard tooling (Pentaho Report Designer)
      • Scheduled execution (no UI for live configuration)
      • In a language a business user understands
      • Against Alfresco:
        – business objects (docs, folders, sites, users, audit)
        – metadata/properties
  8. Agenda
      • Who am I?
      • Why business reporting?
      • What is it about?
      • How was it achieved?
      • Demo
      • Q&A
  9. What is it about
      (diagram labels: Harvesting; Business-related objects + metadata; Execution)
  10. What is it about - Harvesting
      # usage: key = tablename, value=Lucene query
      folder=TYPE:"cm:folder" AND NOT TYPE:"st:site" AND NOT TYPE:"dl:dataList" AND NOT TYPE:"bpm:package" AND NOT TYPE:"cm:systemfolder" AND NOT TYPE:"fm:forum"
      document=TYPE:"cm:content" AND NOT TYPE:"bpm:task" AND NOT TYPE:"dl:dataListItem" AND NOT TYPE:"ia:calendarEvent" AND NOT TYPE:"lnk:link" AND NOT TYPE:"cm:dictionaryModel" AND NOT ASPECT:"reporting:executionResult"
      calendar=TYPE:"ia:calendarEvent"
      forum=TYPE:"fm:forum"
      link=TYPE:"lnk:link"
      site=TYPE:"st:site"
      #datalist=TYPE:"dl:dataList"
      datalistitem=TYPE:"dl:dataListItem"
  11. What is it about - Execution
      (diagram labels: ReportingTemplate; ReportingRoot)
  12. Agenda
      • Who am I?
      • Why business reporting?
      • What is it about?
      • How was it achieved?
      • Demo
      • Q&A
  13. Reporting Considerations
      • The options
        – NoSQL
        – XML
        – Other
        – SQL…
      • Considerations:
        – Business needs to operate reporting
        – Knowledge and experience need to exist in organizations
        – Run in cooperation with existing reporting tooling
  14. Reporting database principles
      • Alfresco short-qname becomes column name
        – sys:node-dbid → sys_node_dbid (':' and '-' are not allowed in column/table names)
      • Multi-value properties are concatenated, comma separated.
      • Fixed (though configurable) default mapping of Alfresco types onto database types.
        – There are exceptions to the rule, therefore:
      • Possibility to override the default mapping on a per-property basis.
        – E.g. by default d:noderefs=VARCHAR(400), but
        – Someco_relatedProducts=VARCHAR(800)
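
The naming and mapping principles on slide 14 can be sketched as below. This is only an illustration, not the module's code: the function names are hypothetical, the type map is an assumption apart from the VARCHAR(400) and VARCHAR(800) values quoted on the slide, and the override is keyed here by the derived column name.

```python
import re

# Assumed default mapping of Alfresco data types onto SQL column types.
# Only the d:noderef entry is taken from the slide; the rest are placeholders.
DEFAULT_TYPE_MAP = {
    "d:text": "VARCHAR(500)",
    "d:int": "INTEGER",
    "d:datetime": "DATETIME",
    "d:noderef": "VARCHAR(400)",
}

# Per-property override, keyed by column name (example value from the slide).
OVERRIDES = {"someco_relatedProducts": "VARCHAR(800)"}


def column_name(short_qname):
    """Turn a short QName such as 'sys:node-dbid' into 'sys_node_dbid'."""
    return re.sub(r"[:\-]", "_", short_qname)


def column_type(short_qname, alfresco_type):
    """Use the per-property override if there is one, else the default."""
    return OVERRIDES.get(column_name(short_qname), DEFAULT_TYPE_MAP[alfresco_type])


def column_value(value):
    """Multi-value properties become one comma-separated string."""
    if isinstance(value, (list, tuple)):
        return ",".join(str(v) for v in value)
    return value
```

A comma-separated column keeps the table flat for report designers, at the cost of values that themselves contain commas becoming ambiguous.
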
  15. Reporting database principles
      • alfresco-global.properties settings:
        – ‘Blacklist’ properties to hide from reporting db
        – Configure to harvest WorkSpace and/or ArchiveSpace
      • Module accepts config override in shared/classes/alfresco/extension
      • Module harvests as System user. ‘All stuff is there’. Reporting people are responsible for behaving nicely.
  16. Design decisions
      • Scheduled harvesting versus real time
        – Performance impact (need policies/behaviours for everything)
        – Started as JavaScript API
        – Not all objects to harvest generate events (audit)
      • Scheduled execution versus user interaction
        – Started as JavaScript API (i.e. no UI)
        – Parameterized reports are a ‘recent’ development
        – UI-driven configuration is even more recent
        – Manual configuration within UI might be possible (within limits of the report template)
      • Reporting != auditing
  17. How it was achieved: Harvesting
      • Initial principles:
        – Metadata of all business objects (incl. customizations) (Aspects…)
        – Harvest only objects changed since last successful run
        – Process versions
        – JavaScript API to allow flexible execution
      • Expanded to
        – List of categories (tree-like structure)
        – Auditing framework
        – Users/Groups/SiteMembers
        – UI over JavaScript
  18. Limited number of search results
      • Problem: MaxSearchResults
          # The maximum time spent pruning results
          system.acl.maxPermissionCheckTimeMillis=10000
          # The maximum number of results to perform permission checks against
          system.acl.maxPermissionChecks=1000
      • Solution:
        – Limitation is there for a reason. Deal with it. (although technically (Java only?) you can work around it)
        – Search & sort by sys:node-dbid
        – Append to query:
          "AND @sys:node-dbid:[" + (last_dbid +1) + " TO MAX]";
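
The paging approach on slide 18 can be sketched as a loop that restarts the search just above the last database id seen. This is a minimal illustration under stated assumptions: `harvest_all` and `make_fake_search` are hypothetical names, and `search` stands in for whatever search call is actually used; it is assumed to return at most `limit` hits sorted ascending by sys:node-dbid.

```python
import re


def harvest_all(search, base_query, page_size=1000):
    """Yield every match of base_query, one capped result page at a time."""
    last_dbid = -1
    while True:
        # Same clause as on the slide: continue just above the last id seen.
        query = f"{base_query} AND @sys:node-dbid:[{last_dbid + 1} TO MAX]"
        page = search(query, sort_by="sys:node-dbid", limit=page_size)
        if not page:
            return
        for node in page:
            yield node
        last_dbid = page[-1]["sys:node-dbid"]


def make_fake_search(dbids):
    """Stand-in for a real search service, used only to exercise the loop."""
    def search(query, sort_by, limit):
        lower = int(re.search(r"\[(\d+) TO MAX\]", query).group(1))
        hits = sorted(d for d in dbids if d >= lower)
        return [{"sys:node-dbid": d} for d in hits[:limit]]
    return search
```

Because each page is bounded by the result cap, the loop never asks the repository for more than it is configured to return, which is the point of "deal with it".
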
  19. Model & Search: Nugget
      • Why is this feature hidden in the data modelling?
          <includedInSuperTypeQuery>false</includedInSuperTypeQuery>
      • Nice way of hiding custom sub-types from parent queries (especially config-like types)
  20. Aspects & Associations & Categories
      • Strong Alfresco features → flexible & powerful
        – Find business objects (by query)
        – For each business object:
          • Get all properties – push to array
            – Respect multi-value (becomes comma separated)
            – Resolve tags and categories into labels
          • Get all parent-child assocs – push to array
          • Get all source-target assocs – push to array
          • Derive some meaningful props – push to array (for example: site name, display path, size)
  21. Push to reporting database
      • Tables ‘create ${table} if not exist’
      • Named queries from searches
        – Named auditing applications
        – Category name
        – Predefined names for users, groups, sitegroups
      • For each batch of results,
        – Determine superset of columns (=properties + types)
        – ‘Create %{column} if not exist’
      • Insert batch into the table
        – If date-modified changed, insert new row.
        – Insert statement varies depending on number of aspects/assocs
        – Set validFrom/validTo/isLatest on current & previous version/row
      • One mechanism fits all (also users/groups/categories)
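
The push mechanism on slide 21 can be sketched with SQLite from the Python standard library. This is an assumption-laden illustration, not the module's implementation: the function names, the key columns `sys_node_uuid` and `cm_modified`, and the choice to close the previous row's validTo with the new row's modified date are all hypothetical.

```python
import sqlite3


def ensure_table(conn, table):
    """'Create table if not exists' with the bookkeeping columns only."""
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" ('
        "sys_node_uuid TEXT, cm_modified TEXT, "
        "validFrom TEXT, validTo TEXT, isLatest INTEGER)"
    )


def ensure_columns(conn, table, columns):
    """Add every column of the batch's superset that the table lacks."""
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    for name, sql_type in columns.items():
        if name not in existing:
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" {sql_type}')


def push_row(conn, table, row):
    """Insert a new version only if the modified date changed."""
    key = row["sys_node_uuid"]
    latest = conn.execute(
        f'SELECT cm_modified FROM "{table}" '
        "WHERE sys_node_uuid = ? AND isLatest = 1", (key,)
    ).fetchone()
    if latest and latest[0] == row["cm_modified"]:
        return False  # unchanged since the last harvest
    # Close the previous version before inserting the current one.
    conn.execute(
        f'UPDATE "{table}" SET validTo = ?, isLatest = 0 '
        "WHERE sys_node_uuid = ? AND isLatest = 1", (row["cm_modified"], key)
    )
    data = dict(row, validFrom=row["cm_modified"], validTo=None, isLatest=1)
    cols = ", ".join(f'"{c}"' for c in data)
    marks = ", ".join("?" for _ in data)
    conn.execute(f'INSERT INTO "{table}" ({cols}) VALUES ({marks})', list(data.values()))
    return True
```

Keeping old rows with validFrom/validTo lets a report answer "what did this look like last month", while `isLatest` keeps the common current-state query cheap.
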
  22. How it was achieved: Execution
      • Embed an existing reporting tool in Alfresco.
      • Business must be comfortable operating the reporting tool
      • Scheduled execution needs no UI. Administrators can configure, business can use.
      (diagram labels: zip-like subreports; sub-reports by relative path)
  23. Credentials
      • Does anyone embed Pentaho and use the Java API?
      • Username/password stored inside report
        – Update all reports when migrating to another source or from dev → test → prod
      • JNDI (delegate credentials to app server)
      • Report is self-contained (credentials/JNDI)
      • JNDI is the only enterprise solution; updating each and every report is not an option…
        – Requires additional config step in alfresco.xml
  24. Parameterization
      • Pentaho (and JasperReports) accept parameters to drive a report.
      • Current Alfresco ActionExecuter accepts up to 4 parameters per report
      • Used to generate site-based reports
      • Used to create a generic report, and make it specific (e.g. report Sites with non-internal SiteManagers)
  25. Execution Structure
      • Reporting Root(s)
        – Defines scope for contained containers/templates
        – Defines target queries
        – Enables/disables scheduled harvesting/execution
        – Execute all, harvest all
      • Reporting Container(s)
        – Contains reports scheduled at same frequency
        – Execute all Reporting Templates inside
      • Reporting Template(s)
        – Actual reporting templates (Pentaho’s prpt’s)
        – Enable/disable for automatic execution
        – Defines output path (by noderef or relative to ‘target’)
        – References target object from query in Reporting Root
  26. UI to tie it together
  27. Troubles along the way
      • Little knowledge available about Pentaho and credentials/authentication using the Java API
      • mltext-type fields (the name ‘Data Dictionary’ is not the same in other languages)
        – Forces me into ActionHandlers to fix Explorer UI,
        – Or into Share development (needs to be done at some point in time)
      • EagerContentCleaner cleans Alfresco’s temp folder. Very eagerly
  28. Troubles along the way
      • Max length of sum of column sizes (MySQL < 65,000 bytes if UTF-8)
        – Tweak default mapping (decrease the defaults)
        – Make exceptions by property QName (increase/decrease per prop)
      • Auditing framework uses a call-back mechanism different from other services
      • Module started as a JavaScript API
      • Documentation is ‘a lot of work’
      • Finalizing a (side) project is ‘a lot of work’
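
The row-size trouble on slide 28 can be checked up front with a rough budget calculation. The sketch below assumes MySQL's documented 65,535-byte row limit, up to 3 bytes per character for the utf8 character set, and a 1- or 2-byte length prefix per VARCHAR; it ignores other per-row overhead and non-VARCHAR columns, and the function names are hypothetical.

```python
MAX_ROW_BYTES = 65535   # MySQL row size limit
BYTES_PER_CHAR = 3      # worst case per character for MySQL's utf8


def varchar_bytes(length):
    """Worst-case storage of a VARCHAR(length) column, including its prefix."""
    data = length * BYTES_PER_CHAR
    return data + (1 if data <= 255 else 2)


def row_bytes(varchar_lengths):
    """Sum of worst-case sizes for a list of VARCHAR lengths."""
    return sum(varchar_bytes(n) for n in varchar_lengths)


def fits(varchar_lengths):
    """True if a table with these VARCHAR columns stays under the limit."""
    return row_bytes(varchar_lengths) <= MAX_ROW_BYTES
```

Under these assumptions a table of VARCHAR(400) columns runs out of room after 54 columns, which illustrates why decreasing the default mapping and overriding per property both matter.
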
  29. Challenges
      • How to detect changes in Categories/structures
        – Currently no incremental updates
      • How to detect changes in group structure and users
        – Currently no incremental updates
      • If there is no property yet, there is no column
        – Can be an issue when creating reports
        – Prepping the reporting database with empty columns
          • Not always possible → configurable?
  30. ToDo
      • Allow reporting database multi-vendor
        – MyBatis integration in progress
      • Allow multilingual Alfresco installs
        – mltext properties bite (Explorer UI)
      • UI to Share
        – Harvest & Execution in Admin Panel
        – Execute parameterized reports on demand?
      • Cron jobs cluster aware
      • Get rid of JavaScript history (harvesting)
        – Script *not* thread-safe, run max 1 instance!
      • Mavenize & include more unit tests
  31. Demo
  32. Main report: Select Site
  33. Main report: Select Site
      SELECT
        `site`.`site`,
        `site`.`st_siteVisibility`,
        `site`.`cm_title`,
        `site`.`cm_description`,
        `site`.`cm_owner`
      FROM
        `site`
      WHERE
        `site`.`isLatest` = true
        AND `site`.`site` = ${sitename}
  34. Sub report: Users per Role
  35. Sub report: Users per Role
      SELECT
        count(*) as amount,
        `siteperson`.`siteRole` as role
      FROM
        `siteperson`
      WHERE `siteperson`.`siteName` = ${sitename}
      GROUP BY `siteperson`.`siteRole`
  36. Sub report: Site members
  37. Sub report: Site members
      SELECT DISTINCT
        `siteperson`.`userName`,
        `person`.`cm_email`,
        `person`.`cm_mobile`,
        `person`.`cm_telephone`,
        `person`.`cm_firstName`,
        `person`.`cm_instantmsg`,
        `person`.`cm_lastName`
      FROM
        `siteperson` INNER JOIN `person` ON `person`.`cm_userName` = `siteperson`.`userName`
      WHERE
        `siteperson`.`siteName` = ${sitename}
      ORDER BY
        `person`.`cm_lastName` ASC,
        `person`.`cm_firstName` ASC
  38. Configure in Alfresco
  39. My Best Practices
      • Dashlets/Pages are for real-time information
        – E.g. workflow progress
      • Reporting is for insight that does not have to be real-time.
      • Reporting must be extensible by the customer
      • Design for Reporting
        – Have metadata available
        – Accept redundancy and one or two additional policies/behaviours
  40. Documented
      Wiki, Blog, YouTube
  41. Your reporting case…
      I would like to publish your reporting case on the wiki. And I have a few books to give away for ‘impressive’ contributions: [en] [nl]
  42. Q&A
  43. Alfresco Business Reporting
      Blog: http://tpeelen.wordpress.com
      Code & Wiki: https://code.google.com/p/alfresco-business-reporting
      YouTube: http://www.youtube.com/user/opensourceecm
      LinkedIn: http://nl.linkedin.com/in/tpeelen
      Twitter: @tpeelen
