Alfresco Business Reporting - Tech Talk Live 20130501



This is the slide deck used in Alfresco's Tech Talk Live, May 1, 2013. It featured my Alfresco add-on: Alfresco Business Reporting. The purpose is to explain the technical 'why' and 'how' of the add-on module, the challenges faced and the solutions designed.

Published in: Technology



  • 1. Tech Talk Live #65: Jeff Potts & Richard Esplin & Tjarda Peelen
  • 2. Agenda
      • Who am I?
      • Why business reporting?
      • What is it about?
      • How was it achieved?
      • Demo
      • Q&A
  • 3. Who is Tjarda?
      • Netherlands
      • Incentro (ECM, WCM, BI, Search, Advisory)
      • Started with Alfresco at v1.4 EE (privately at 1.2 CE)
      • Generic Java, Config, Architecture, (Pre)Sales
      • Document Management, Publishing, Government
  • 4. Incentro
      Information is the center of our approach
      Expertise
  • 5. Agenda
      • Who am I
      • Why business reporting?
      • What is it about?
      • How was it achieved?
      • Demo
      • Q&A
  • 6. Why Business Reporting
      The challenge:
      • Alfresco does ‘not really’ support reporting
      • Business has reporting needs
      • Reporting needs:
          – change over time
          – can be specific for each business/organization/dept.
  • 7. Why Business Reporting: My solution
      • Based on standard tooling (Pentaho Report Designer)
      • Scheduled execution (no UI for live configuration)
      • In a language a business user understands
      • Against Alfresco:
          – business objects (docs, folders, sites, users, audit)
          – metadata/properties
  • 8. Agenda
      • Who am I
      • Why business reporting?
      • What is it about?
      • How was it achieved?
      • Demo
      • Q&A
  • 9. What is it about
      • Harvesting: business-related objects + metadata
      • Execution
  • 10. What is it about - Harvesting
      # usage: key = tablename, value = Lucene query
      folder=TYPE:"cm:folder" AND NOT TYPE:"st:site" AND NOT TYPE:"dl:dataList" AND NOT TYPE:"bpm:package" AND NOT TYPE:"cm:systemfolder" AND NOT TYPE:"fm:forum"
      document=TYPE:"cm:content" AND NOT TYPE:"bpm:task" AND NOT TYPE:"dl:dataListItem" AND NOT TYPE:"ia:calendarEvent" AND NOT TYPE:"lnk:link" AND NOT TYPE:"cm:dictionaryModel" AND NOT ASPECT:"reporting:executionResult"
      calendar=TYPE:"ia:calendarEvent"
      forum=TYPE:"fm:forum"
      link=TYPE:"lnk:link"
      site=TYPE:"st:site"
      #datalist=TYPE:"dl:dataList"
      datalistitem=TYPE:"dl:dataListItem"
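
To illustrate the key = table name, value = Lucene query convention on slide 10, here is a minimal Python sketch of reading such a configuration. It is not the module's own loader (the module is Java/JavaScript); `parse_harvest_queries` and `SAMPLE` are hypothetical names, and the sketch ignores Java properties features such as backslash line continuation.

```python
def parse_harvest_queries(text):
    """Parse 'tablename=Lucene query' lines; skip blank lines and '#' comments."""
    queries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so a query may itself contain '='.
        table, _, query = line.partition("=")
        queries[table.strip()] = query.strip()
    return queries


# Shortened sample in the style of the slide (assumption: one entry per line).
SAMPLE = """
# usage: key = tablename, value = Lucene query
calendar=TYPE:"ia:calendarEvent"
site=TYPE:"st:site"
#datalist=TYPE:"dl:dataList"
"""

queries = parse_harvest_queries(SAMPLE)
for table, query in queries.items():
    print(table, "<-", query)
```

Each key would then name the reporting table that receives the results of its query; the commented-out `datalist` entry is skipped.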
  • 11. What is it about - Execution
      ReportingTemplate, ReportingRoot
  • 12. Agenda
      • Who am I
      • Why business reporting?
      • What is it about?
      • How was it achieved?
      • Demo
      • Q&A
  • 13. Reporting Considerations
      • The options
          – NoSQL
          – XML
          – Other
          – SQL…
      • Considerations:
          – Business needs to operate reporting
          – Knowledge and experience need to exist in organizations
          – Run in cooperation with existing reporting tooling
  • 14. Reporting database principles
      • Alfresco short-qname becomes column name
          – sys:node-dbid → sys_node_dbid (: and - are not allowed in column/table names)
      • Multi-value properties are concatenated, comma separated.
      • Fixed (though configurable) default mapping of Alfresco types onto database types.
          – There are exceptions to the rule, therefore:
      • Possibility to override default mapping on a per-property basis.
          – E.g. by default d:noderefs=VARCHAR(400) but
          – Someco_relatedProducts=VARCHAR(800)
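
The naming and type-mapping rules on slide 14 can be sketched in a few lines of Python. This is an illustration only, not the module's code: the function names are hypothetical, the `DEFAULT_TYPES` entries other than `d:noderefs` are assumed values, and the override key is written in lower case here.

```python
import re

# Assumed default mapping of Alfresco types onto database types (illustrative).
DEFAULT_TYPES = {
    "d:text": "VARCHAR(500)",
    "d:noderefs": "VARCHAR(400)",  # default quoted on the slide
}

# Per-property overrides, keyed by column name (example from the slide).
OVERRIDES = {
    "someco_relatedProducts": "VARCHAR(800)",
}


def to_column_name(short_qname):
    """Map an Alfresco short QName to a column name: ':' and '-' become '_'."""
    return re.sub(r"[:\-]", "_", short_qname)


def column_type(short_qname, alfresco_type, defaults=DEFAULT_TYPES, overrides=OVERRIDES):
    """Return the SQL type for a property; a per-property override wins."""
    column = to_column_name(short_qname)
    return overrides.get(column, defaults[alfresco_type])


print(to_column_name("sys:node-dbid"))
print(column_type("cm:references", "d:noderefs"))
print(column_type("someco:relatedProducts", "d:noderefs"))
```

Keeping the override table keyed by column name means one lookup decides the type, and properties without an override fall back to the default for their Alfresco type.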
  • 15. Reporting database principles
      • Settings:
          – ‘Blacklist’ properties to hide from reporting db
          – Configure to harvest WorkSpace and/or ArchiveSpace
      • Module accepts config override in shared/classes/alfresco/extension
      • Module harvests as System user. ‘All stuff is there’. Reporting people are responsible to behave nicely.
  • 16. Design decisions
      • Scheduled harvesting versus real time
          – Performance impact (need policies/behaviours for everything)
          – Started as JavaScript API
          – Not all objects to harvest generate events (audit)
      • Scheduled execution versus user interaction
          – Started as JavaScript API (i.e. no UI)
          – Parameterized reports are a ‘recent’ development
          – UI driven configuration is even more recent
          – Manual configuration within UI might be possible (within limits of report template)
      • Reporting != auditing
  • 17. How it was achieved: Harvesting
      • Initial principles:
          – Metadata of all business objects (incl. customizations) (Aspects…)
          – Harvest only changed objects since last successful run
          – Process versions
          – JavaScript API to allow flexible execution
      • Expanded to
          – List of categories (tree-like structure)
          – Auditing framework
          – Users/Groups/SiteMembers
          – UI over JavaScript
  • 18. Limited number of search results
      • Problem: MaxSearchResults
          # The maximum time spent pruning results
          system.acl.maxPermissionCheckTimeMillis=10000
          # The maximum number of results to perform permission checks against
          system.acl.maxPermissionChecks=1000
      • Solution:
          – Limitation is there for a reason. Deal with it. (although technically (Java only?) you can work around it)
          – Search & sort by sys:node-dbid
          – Append to query: "AND @sys:node-dbid:[" + (last_dbid +1) + " TO MAX]";
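
The paging approach on slide 18 (sort by `sys:node-dbid`, then restart the query just above the last id seen) can be sketched as follows. This is a sketch under assumptions: `search` is a hypothetical callable standing in for whatever search service is used, not an Alfresco API, and `make_fake_search` only exists to make the example runnable. The appended range clause is copied from the slide.

```python
import re


def harvest_all(base_query, search, page_size=1000):
    """Yield every hit by re-querying for dbids above the last one seen."""
    last_dbid = None
    while True:
        query = base_query
        if last_dbid is not None:
            # Range clause as shown on the slide, restarting after the last hit.
            query += " AND @sys:node-dbid:[" + str(last_dbid + 1) + " TO MAX]"
        page = search(query, sort_by="sys:node-dbid", limit=page_size)
        if not page:
            return
        for node in page:
            yield node
        last_dbid = page[-1]["sys:node-dbid"]


def make_fake_search(dbids):
    """Hypothetical in-memory stand-in for a search service capped at `limit` hits."""
    def search(query, sort_by, limit):
        match = re.search(r"node-dbid:\[(\d+) TO MAX\]", query)
        lower = int(match.group(1)) if match else 0
        hits = sorted(d for d in dbids if d >= lower)
        return [{sort_by: d} for d in hits[:limit]]
    return search


fake = make_fake_search(range(1, 2501))
harvested = list(harvest_all('TYPE:"cm:content"', fake, page_size=1000))
print(len(harvested))
```

Because every page is sorted by the same monotonically increasing id, no result is skipped or repeated even though each individual query stays within the configured limit.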
  • 19. Model & Search: Nugget
      • Why is this feature hidden in the data modelling?
          <includedInSuperTypeQuery>false</includedInSuperTypeQuery>
      • Nice way of hiding custom sub-types from parent queries (especially config-like types)
  • 20. Aspects & Associations & Categories
      • Strong Alfresco features → flexible & powerful
          – Find business objects (by query)
          – For each business object:
              • Get all properties – push to array
                  – Respect multi-value (becomes comma separated)
                  – Resolve tags and categories into labels
              • Get all parent-child assocs – push to array
              • Get all source-target assocs – push to array
              • Derive some meaningful props – push to array (for example: site name, display path, size)
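
As a rough sketch of the per-object steps on slide 20, the following Python flattens one business object into a single row. The input shape (three dicts for properties, associations and derived values) and all names are assumptions for illustration; tags are assumed to be already resolved into labels, and `someco:relatedProducts` is a hypothetical association.

```python
def column(qname):
    """Turn a short QName into a column name (':' and '-' are not allowed)."""
    return qname.replace(":", "_").replace("-", "_")


def flatten(value):
    """Multi-value properties become one comma-separated string."""
    if isinstance(value, (list, tuple)):
        return ",".join(str(v) for v in value)
    return value


def node_to_row(properties, assocs, derived):
    """Merge properties, associations and derived values into one flat row."""
    row = {}
    for source in (properties, assocs, derived):
        for qname, value in source.items():
            row[column(qname)] = flatten(value)
    return row


row = node_to_row(
    properties={"cm:name": "plan.doc", "cm:taggable": ["budget", "2013"]},
    assocs={"someco:relatedProducts": ["workspace://SpacesStore/hypothetical-id"]},
    derived={"site": "marketing", "size": 2048},
)
print(row)
```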
  • 21. Push to reporting database
      • Tables ‘create ${table} if not exist’
      • Named queries from searches
          – Named auditing applications
          – Category name
          – Predefined names for users, groups, sitegroups
      • For each batch of results,
          – Determine superset of columns (=properties + types)
          – ‘Create ${column} if not exist’
      • Insert batch into the table
          – If date-modified changed, insert new row.
          – Insert statement varies depending on number of aspects/assocs
          – Set validFrom/validTo/isLatest on current & previous version/row
      • One mechanism fits all (also users/groups/categories)
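
The push mechanism on slide 21 can be approximated with Python's built-in `sqlite3` module. This is a sketch, not the module's implementation: the deck targets MySQL, SQLite is swapped in here so the example runs anywhere, and the key columns `sys_node_uuid` and `cm_modified` are assumed names. Only `validFrom`, `validTo` and `isLatest` come from the slide.

```python
import sqlite3


def ensure_table(conn, table):
    """'create ${table} if not exist' with the bookkeeping columns."""
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" '
        "(sys_node_uuid TEXT, cm_modified TEXT, validFrom TEXT, validTo TEXT, isLatest INTEGER)"
    )


def ensure_columns(conn, table, columns):
    """'create ${column} if not exist' for the superset of columns in a batch."""
    existing = {info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')}
    for name, sql_type in columns.items():
        if name not in existing:
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" {sql_type}')


def insert_if_changed(conn, table, row):
    """Insert a new row only when the modified date differs from the latest row."""
    latest = conn.execute(
        f'SELECT cm_modified FROM "{table}" WHERE sys_node_uuid = ? AND isLatest = 1',
        (row["sys_node_uuid"],),
    ).fetchone()
    if latest and latest[0] == row["cm_modified"]:
        return False
    # Close the previous version, then insert the new one as the latest.
    conn.execute(
        f'UPDATE "{table}" SET isLatest = 0, validTo = ? WHERE sys_node_uuid = ? AND isLatest = 1',
        (row["cm_modified"], row["sys_node_uuid"]),
    )
    data = dict(row, validFrom=row["cm_modified"], validTo=None, isLatest=1)
    cols = ", ".join(f'"{c}"' for c in data)
    marks = ", ".join("?" for _ in data)
    conn.execute(f'INSERT INTO "{table}" ({cols}) VALUES ({marks})', list(data.values()))
    return True


conn = sqlite3.connect(":memory:")
ensure_table(conn, "document")
batch = [
    {"sys_node_uuid": "a1", "cm_modified": "2013-05-01", "cm_name": "plan.doc"},
    {"sys_node_uuid": "a1", "cm_modified": "2013-05-02", "cm_name": "plan-v2.doc"},
    {"sys_node_uuid": "a1", "cm_modified": "2013-05-02", "cm_name": "plan-v2.doc"},
]
# Superset of columns over the whole batch (every column typed TEXT for brevity).
ensure_columns(conn, "document", {key: "TEXT" for row in batch for key in row})
inserted = [insert_if_changed(conn, "document", row) for row in batch]
rows = conn.execute(
    'SELECT cm_name, validTo, isLatest FROM "document" ORDER BY validFrom'
).fetchall()
print(inserted, rows)
```

The third row in the batch is skipped because its modified date matches the latest stored version, which is how repeated harvesting runs avoid duplicating unchanged objects.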
  • 22. How it was achieved: Execution
      • Embed an existing reporting tool in Alfresco.
      • Business must be comfortable operating reporting tool
      • Scheduled execution needs no UI. Administrators can configure, business can subreports sub-reports by relative path
  • 23. Credentials
      • Does anyone embed Pentaho and use Java API?
      • Username/password stored inside report
          – Update all reports when migrating to other source or from dev → test → prod
      • JNDI (delegate credentials to app server)
      • Report is self contained (credentials/JNDI)
      • JNDI is the only enterprise solution, updating each and every report is not an option…
          – Requires additional config step in alfresco.xml
  • 24. Parameterization
      • Pentaho (and JasperReports) accept parameters to drive a report.
      • Current Alfresco ActionExecuter accepts up to 4 parameters per report
      • Used to generate site based reports
      • Used to create generic report, and make specific (e.g. report Sites with non-internal SiteManagers)
  • 25. Execution Structure
      • Reporting Root(s)
          – Defines scope for contained containers/templates
          – Defines target queries
          – Enables/disables scheduled harvesting/execution
          – Execute all, harvest all
      • Reporting Container(s)
          – Contains reports scheduled at same frequency
          – Execute all Reporting Templates inside
      • Reporting Template(s)
          – Actual reporting templates (Pentaho’s prpt’s)
          – Enable/disable for automatic execution
          – Defines output path (by noderef or relative to ‘target’)
          – References target object from query in Reporting Root
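
The Root / Container / Template hierarchy on slide 25 can be modelled as plain data classes. This is a minimal sketch with hypothetical field names (`execution_enabled`, `frequency`, `output_path`), meant only to show how the enable flags at root and template level combine when a scheduled run picks its templates.

```python
from dataclasses import dataclass, field


@dataclass
class ReportingTemplate:
    name: str               # e.g. a Pentaho .prpt file
    output_path: str        # noderef or path relative to the target
    enabled: bool = True    # enable/disable for automatic execution


@dataclass
class ReportingContainer:
    frequency: str          # all templates inside share this schedule
    templates: list = field(default_factory=list)


@dataclass
class ReportingRoot:
    execution_enabled: bool = True
    containers: list = field(default_factory=list)


def templates_to_run(root, frequency):
    """Names of enabled templates in containers scheduled at `frequency`."""
    if not root.execution_enabled:
        return []
    return [
        template.name
        for container in root.containers
        if container.frequency == frequency
        for template in container.templates
        if template.enabled
    ]


root = ReportingRoot(containers=[
    ReportingContainer("daily", [
        ReportingTemplate("site-members.prpt", "reports/members"),
        ReportingTemplate("draft.prpt", "reports/draft", enabled=False),
    ]),
    ReportingContainer("weekly", [
        ReportingTemplate("site-usage.prpt", "reports/usage"),
    ]),
])
print(templates_to_run(root, "daily"))
```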
  • 26. UI to tie it together
  • 27. Troubles along the way
      • Little knowledge available about Pentaho and credentials/authentication using Java API
      • mltext-type fields (the name ‘Data Dictionary’ is not the same in other languages)
          – Forces me into ActionHandlers to fix Explorer UI,
          – Or into Share development (needs to be done at one point in time)
      • EagerContentCleaner cleans Alfresco’s temp folder. Very eagerly
  • 28. Troubles along the way
      • Max length of sum of column sizes. (MySQL < 65,000 bytes if UTF-8)
          – Tweak default mapping (decrease the defaults)
          – Make exceptions by property QName (increase/decrease per prop)
      • Auditing framework uses call-back mechanism different from other services
      • Module started as a JavaScript API
      • Documentation is ‘a lot of work’
      • Finalizing a (side) project is ‘a lot of work’
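
To make the row-size problem on slide 28 concrete, here is a rough Python estimate of how many bytes a table of VARCHAR columns needs. It assumes MySQL's documented 65,535-byte row limit, up to 3 bytes per character for the `utf8` character set, and a 1- or 2-byte length prefix per VARCHAR; it ignores other per-row overhead, so treat the numbers as approximations. The column counts are made up.

```python
MYSQL_MAX_ROW_BYTES = 65535  # documented MySQL row-size limit


def varchar_bytes(chars, bytes_per_char=3):
    """Worst-case storage of VARCHAR(chars): data plus a 1- or 2-byte length prefix."""
    data = chars * bytes_per_char
    return data + (1 if data <= 255 else 2)


def row_bytes(varchar_lengths, bytes_per_char=3):
    """Sum the worst-case sizes of all VARCHAR columns in a table."""
    return sum(varchar_bytes(n, bytes_per_char) for n in varchar_lengths)


# Hypothetical table: 60 properties, all mapped to the same default length.
wide = row_bytes([400] * 60)
narrow = row_bytes([300] * 60)
print(wide, wide <= MYSQL_MAX_ROW_BYTES)
print(narrow, narrow <= MYSQL_MAX_ROW_BYTES)
```

With these assumed numbers, lowering the default from 400 to 300 characters brings the table back under the limit, which is the "tweak default mapping" option from the slide; per-property overrides then widen only the columns that need it.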
  • 29. Challenges
      • How to detect changes in Categories/structures
          – Currently no incremental updates
      • How to detect changes in group structure and users
          – Currently no incremental updates
      • If there is no property yet, there is no column
          – Can be an issue creating reports
          – Prepping the reporting database with empty columns
              • Not always possible → configurable?
  • 30. ToDo
      • Allow reporting database multi-vendor
          – MyBatis integration in progress
      • Allow multilingual Alfresco installs
          – mltext properties bite (Explorer UI)
      • UI to Share
          – Harvest & Execution in Admin Panel
          – Execute parameterized reports on demand?
      • Cron jobs cluster aware
      • Get rid of JavaScript history (harvesting)
          – Script *not* thread-safe, run max 1 instance!
      • Mavenize & include more unit tests
  • 31. Demo
  • 32. Main report: Select Site
  • 33. Main report: Select Site
      SELECT `site`.`site`, `site`.`st_siteVisibility`, `site`.`cm_title`, `site`.`cm_description`, `site`.`cm_owner`
      FROM `site`
      WHERE `site`.`isLatest` = true
        AND `site`.`site` = ${sitename}
  • 34. Sub report: Users per Role
  • 35. Sub report: Users per Role
      SELECT count(*) as amount, `siteperson`.`siteRole` as role
      FROM `siteperson`
      WHERE `siteperson`.`siteName` = ${sitename}
      GROUP BY `siteperson`.`siteRole`
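
The users-per-role query on slide 35 can be tried outside Pentaho with a small Python sketch. SQLite stands in for the reporting database here, a bound named parameter `:sitename` takes the place of Pentaho's `${sitename}`, an `ORDER BY` is added so the output order is stable, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE siteperson (siteName TEXT, userName TEXT, siteRole TEXT)")
conn.executemany(
    "INSERT INTO siteperson VALUES (?, ?, ?)",
    [
        ("marketing", "alice", "SiteManager"),
        ("marketing", "bob", "SiteConsumer"),
        ("marketing", "carol", "SiteConsumer"),
        ("finance", "dave", "SiteManager"),
    ],
)

USERS_PER_ROLE = """
SELECT count(*) AS amount, siteRole AS role
FROM siteperson
WHERE siteName = :sitename
GROUP BY siteRole
ORDER BY siteRole
"""


def users_per_role(conn, sitename):
    """Run the sub-report query for one site, binding the site name as a parameter."""
    return conn.execute(USERS_PER_ROLE, {"sitename": sitename}).fetchall()


result = users_per_role(conn, "marketing")
print(result)
```

Binding the site name as a parameter, rather than pasting it into the SQL text, is what lets one generic report definition serve every site.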
  • 36. Sub report: Site members
  • 37. Sub report: Site members
      SELECT DISTINCT `siteperson`.`userName`, `person`.`cm_email`, `person`.`cm_mobile`, `person`.`cm_telephone`, `person`.`cm_firstName`, `person`.`cm_instantmsg`, `person`.`cm_lastName`
      FROM `siteperson`
        INNER JOIN `person` ON `person`.`cm_userName` = `siteperson`.`userName`
      WHERE `siteperson`.`siteName` = ${sitename}
      ORDER BY `person`.`cm_lastName` ASC, `person`.`cm_firstName` ASC
  • 38. Configure in Alfresco
  • 39. My Best Practices
      • Dashlets/Pages are for real-time information
          – E.g. workflow progress
      • Reporting is for insight that does not have to be real-time.
      • Reporting must be extendible by the customer
      • Design for Reporting
          – Have metadata available
          – Accept redundancy and one or two additional policies/behaviours
  • 40. Documented
      Wiki, Blog, YouTube
  • 41. Your reporting case…
      I would like to publish your reporting case on the wiki.
      And I have a few books to give away for ‘impressive’ contributions: [en] [nl]
  • 42. Q&A
  • 43. Alfresco Business Reporting
      Blog: http://tpeelen.wordpress.com
      Code & Wiki: