(ATS6-PLAT02) Accelrys Catalog and Protocol Validation


Accelrys Catalog is a powerful new technology for creating an index of the protocols and components within your organization. You will learn about strategies for indexing and how search capabilities can be deployed to professional client and Web Port end users. You will also learn how to use this technology to find out about system usage to aid with system upgrades, server consolidations, and general system maintenance. The protocol validation capability in the admin portal allows administrators to create standard reports on server usage characteristics. You will learn how to report on violations of IT policies (e.g. around security), bad protocol authoring practices, or missing or incomplete protocol documentation. Developers will also learn how to extend and customize the rules used to create these reports.

Published in: Technology

  • 1. (ATS6-PLAT02) Accelrys Catalog and Protocol Validation
      Ton van Daelen, Director, Platform Product Management, ton.vandaelen@accelrys.com
      Dana Honeycutt, R&D
  • 2. The information on the roadmap and future software development efforts is intended to outline general product direction and should not be relied on in making a purchasing decision.
  • 3. Content: Accelrys Catalog and Protocol Validation
      • What is it?
      • Who is it for?
      • How does it work?
      • What’s behind it?
      • How does it really work?
      • How do I maintain it?
      • How do I troubleshoot?
      • Show me more details!
  • 4. The Size of the Challenge
      • 10-100 Pro Client users
      • 50-1000 Web users
      • 1-10 servers
      • → 5000+ protocols to be managed
  • 5. What is Accelrys Catalog?
      • A searchable index of the component and protocol database (XMLDB) on Accelrys Enterprise Platform (AEP) servers
      • A Google-like text search facility in the Pipeline Pilot client and Web Port
      • A query form and results browser in the Administration Portal
  • 6. Who is it for?
      • PP Pro Client user (personal productivity): search from Pro Client
        – Examples that use the ‘Http Connector’ component
        – PilotScript referencing ‘rsplit()’
        – Protocols using MAO data
      • Web Port user: search from Web Port
        – Web Port protocols containing specific terms or phrases
      • AEP administrator
        – Administer: generate index, set index update schedule
        – Structured search from Admin Portal: protocols not run in 6+ months, protocols by specific author, protocols with many versions
        – Validation reports: ‘canned’ reports about policy / best practice / security violations
      [Diagram labels: Catalog, Xml log, Xml]
  • 7. Demo: Pro Client Search
      • See how protocol authors can find components and protocols to speed up building of protocols and facilitate re-use
  • 8. How does it work? PP Client: text search
      [Screenshot: Search text, Search results]
  • 9. PP Client: usage search
  • 10. Demo: Web Port Search
      • See how Web Port users can quickly find the protocol they need to solve a particular problem
  • 11. Web Port: text search
  • 12. Admin Search
      • Administrators can search protocol databases for different servers
  • 13. Admin Use Cases
      • General queries. Find protocols:
        – with components that are deprecated (ad hoc / report)
        – not run in n days
        – not changed in n days
        – by client type (Pro Client, Web Port, web service, Notebook, Isentris, …)
        – with components with GUID x
        – with SQL components with a specific DSN
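The time-based use cases map naturally onto Solr range queries with date math. A minimal sketch, assuming the field names `lastrundate`, `lastsaveddate` and `type` from the Catalog Fields slide (verify them in schema.xml; slide 32 writes `lastrun`); the helper functions are hypothetical and not part of AEP. The deprecated-component, GUID and DSN cases need fields the slides do not list, so they are left out:

```python
def not_run_since(days: int) -> str:
    """Range query: last run at least `days` days ago (Solr date math on NOW)."""
    return f"lastrundate:[* TO NOW-{days}DAYS]"


def not_changed_since(days: int) -> str:
    """Range query: last saved at least `days` days ago."""
    return f"lastsaveddate:[* TO NOW-{days}DAYS]"


def all_of(*clauses: str) -> str:
    """Join clauses with uppercase AND, as the Catalog query syntax requires."""
    return " AND ".join(f"({clause})" for clause in clauses)


# Hypothetical admin query: protocols neither run nor changed in 180 days.
print(all_of("type:protocol", not_run_since(180), not_changed_since(180)))
```

A protocol that has never run presumably has no `lastrundate` value and so would not match the range; `runcount:0` is the query for that case.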
  • 14. Demo: Admin Portal Search
  • 15. Admin Portal: structured search
  • 16. How does it work?
      • The inner workings of Accelrys Catalog
  • 17. Introduction to Text Searching
      • Unstructured or minimally structured searches
        – Think “Google”
        – Keyword-based, non-relational; wide range of user input
        – Based on lookups using pre-built word (token) indexes
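A "pre-built word (token) index" is an inverted index: each token maps to the documents that contain it, so a keyword search becomes a few dictionary lookups and a set intersection. The toy sketch below illustrates the idea only; Solr/Lucene store far more (positions, frequencies, field boosts), and the sample documents are invented:

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query token (implicit AND)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits


docs = {
    "p1": "Read XML with the Http Connector",
    "p2": "Cluster molecules by fingerprint",
    "p3": "Http Connector example",
}
index = build_index(docs)
print(search(index, "http connector"))  # p1 and p3
print(search(index, "clusters"))        # empty: no stemming yet
```

The empty result for "clusters" is exactly the gap that stemming addresses.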
  • 18. Introduction to Text Searching (cont’d)
      • Strategies to make searches more effective
        – Stop word removal: and, the, by, for, of, …
        – Stemming: started → start, clusters → cluster, etc.
        – Synonym aliasing: oncology = cancer, MB = megabyte, etc. (supported but only minimally implemented; extensible)
        – Language-specific document and query processing (support for Asian languages)
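The three strategies can be chained into one analysis function that is applied to both documents and queries. In Solr these steps are analysis filters configured in the schema; the sketch below is only an illustration, and its suffix stripper is a deliberately crude toy swapped in for a real stemmer such as Porter. The stop word and synonym lists are small samples, not the Catalog's actual files:

```python
import re

STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "the", "to"}
SYNONYMS = {"oncology": "cancer", "mb": "megabyte"}


def crude_stem(token: str) -> str:
    """Strip one common English suffix; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    tokens = [SYNONYMS.get(t, t) for t in tokens]        # synonym aliasing
    return [crude_stem(t) for t in tokens]               # stemming


print(analyze("The clusters started by oncology"))  # ['cluster', 'start', 'cancer']
```

Because queries go through the same function, a search for "clusters" and a document saying "clustering" both reduce to "cluster" and match.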
  • 19. What’s behind it? Apache Solr
      • Open source text search server
      • Part of Apache Software Foundation
      • Uses and extends Lucene Java search library
      • Hosted by Tomcat in AEP
  • 20. Solr: Under the Hood…
      • Schema
        – XML specification of document fields and their types
        – Specifies how fields are tokenized and processed for indexing
      • Solr config file
        – XML specification of query and result set processing rules
        – E.g. field weights
      • Optional auxiliary files
        – Stop words, synonyms, protected words (unstemmed)
      • Host application container
        – For AEP this is Tomcat
  • 21. Tokenization and Filtering
      • Tokenization options in Solr
        – Break on whitespace
        – Break on all non-letter characters
        – Break on case changes (for CamelCase tokenization)
        – Break on character set changes (alphanum/ideographic/katakana)
      • Additional filters
        – Lowercase filter: converts all characters to lowercase
        – CJK bigram filter: outputs adjacent character pairs for Asian languages
        – Stem filter: applies stemming rules (many language-specific variants)
      • Field indexing and query processing use the same tokenization
        – Better search results may be obtained by using slightly different analysis for indexing versus querying
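To make the options concrete, here is a small tokenizer that breaks on non-alphanumeric characters and on character set changes, splits CamelCase words, lowercases them, and emits CJK bigrams. In AEP these steps are done by Solr tokenizers and filters configured in schema.xml; the regular expressions below are a stand-in chosen for illustration and do not reproduce Solr's exact rules:

```python
import re

# One match per run of CJK characters (hiragana, katakana, CJK unified ideographs)
# or per run of ASCII letters/digits; everything else acts as a break.
RUN = re.compile(r"(?P<cjk>[\u3040-\u30ff\u4e00-\u9fff]+)|(?P<word>[A-Za-z0-9]+)")
# Case-change split: "XMLReader" -> XML, Reader; "HttpConnector" -> Http, Connector.
CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def tokenize(text: str) -> list[str]:
    tokens = []
    for match in RUN.finditer(text):
        if match.group("cjk"):
            run = match.group("cjk")
            # CJK bigrams: adjacent character pairs; a lone character stays a unigram.
            tokens += [run] if len(run) == 1 else [run[i:i + 2] for i in range(len(run) - 1)]
        else:
            # Case-change split followed by a lowercase filter.
            tokens += [part.lower() for part in CAMEL.findall(match.group("word"))]
    return tokens


print(tokenize("HttpConnector rsplit() XMLReader"))
# ['http', 'connector', 'rsplit', 'xml', 'reader']
```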
  • 22. Customizing Solr
  • 23. Creating the Catalog Index
      • XML Database = Component/Protocol Database
      • For each item in XMLDB, an indexing protocol
        – reads the item from the database
        – creates data record properties corresponding to Solr fields
        – joins in statistics from the usage log
        – converts the data record to a JSON “document”
        – POSTs the document to Apache/Tomcat/Solr via HTTP
      • Weighting
        – Protocol name and description have higher weight
        – Proximity (query terms appearing close together) has higher weight
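The last three indexing steps can be sketched with the Python standard library. This is a hypothetical illustration, not the AEP indexing protocol: the record layout, the name-keyed usage dictionary and the core URL are assumptions, and the request targets Solr's generic JSON `/update` handler. The schema's unique-key field is not shown on the slides, so a real document would also need it:

```python
import json
import urllib.request


def join_usage(record: dict, usage_by_name: dict) -> dict:
    """Copy a catalog record and join in run statistics, keyed by protocol name."""
    doc = dict(record)
    stats = usage_by_name.get(record["name"], {})
    doc["runcount"] = stats.get("runcount", 0)
    if "lastrundate" in stats:
        doc["lastrundate"] = stats["lastrundate"]
    return doc


def build_update_request(core_url: str, docs: list[dict]) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST of JSON documents to Solr's /update handler."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        f"{core_url}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical record and usage statistics.
record = {"path": "Protocols/Examples/Cluster Molecules", "name": "Cluster Molecules",
          "type": "protocol", "author": "jdoe"}
usage = {"Cluster Molecules": {"runcount": 12, "lastrundate": "2013-05-01T00:00:00Z"}}
req = build_update_request("http://localhost:8983/solr/catalog", [join_usage(record, usage)])
# urllib.request.urlopen(req) would send it to a running Solr core.
```

`commit=true` keeps the sketch simple; committing once after a whole batch is usually cheaper than committing on every POST.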
  • 24. Some Catalog Fields (defined in schema)
      • name: protocol or component (or parameter) name
      • path: location in XMLDB
      • type: “component” or “protocol” (or “parameter”)
      • parameters: names of top-level parameters
      • author: user who created protocol/component
      • lastsaveddate: date protocol/component last changed
      • runcount: number of times protocol has been run
      • lastrundate: date protocol was last run
      • uses: list of components used by protocol
      • alltext: composite field for free text search
  • 25. Configuring Accelrys Catalog
      • Configuration (admin portal)
        – AEP servers to index
        – Indexing schedule
      • Note
        – Indexer runs as a scheduled service
        – Indexing takes ~1 to 3 minutes per 1000 XMLDB items
        – Two index copies; users can continue searching while the index is rebuilt
        – Tomcat and Solr are automatically installed and launched with Apache
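The rule of thumb scales linearly with the size of the XMLDB. For the 5000+ protocols from the "Size of the Challenge" slide it suggests roughly 5 to 15 minutes. Slide 28 reports anywhere from ~7 minutes to ~8 hours in actual tests, so real runs can fall well outside this range. The function name is hypothetical:

```python
def estimated_minutes(num_items: int, low: float = 1.0, high: float = 3.0) -> tuple[float, float]:
    """Range from the ~1 to 3 minutes per 1000 XMLDB items rule of thumb."""
    per_thousand = num_items / 1000
    return per_thousand * low, per_thousand * high


print(estimated_minutes(5000))  # (5.0, 15.0)
```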
  • 26. Limitations
      • Usage info depends on protocol name (“Protocol 1”!)
      • No indexing at runtime: it can take a day before the index is updated
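The first limitation can be shown with a toy example, assuming (as the slide implies) that usage statistics are matched to protocols by name only. The run records and paths below are invented; two different users' protocols both named "Protocol 1" collapse into a single count:

```python
import posixpath
from collections import Counter

# Hypothetical run records: the full XMLDB path of each protocol that was run.
runs = [
    "Protocols/Users/alice/Protocol 1",
    "Protocols/Users/bob/Protocol 1",
    "Protocols/Users/bob/Protocol 1",
]

by_path = Counter(runs)
by_name = Counter(posixpath.basename(path) for path in runs)  # what a name-keyed log sees

print(by_path["Protocols/Users/alice/Protocol 1"])  # 1
print(by_name["Protocol 1"])                         # 3: both protocols merged
```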
  • 27. Searching Remote Servers
      • Support for 8.0 and 8.5 servers
        – Not all XMLDB features supported
        – Supported for Admin Search, not Web Port or PP
      • Configure at Catalog Settings page
        – Remote server username must have admin privileges
  • 28. How do I troubleshoot (indexing)?
      • Status message in admin portal
        – A persistent error may indicate a need to adjust settings or insufficient user privileges for the remote server
        – Check server log files for details
      • Time to update the index can vary widely (from ~7 minutes to ~8 hours in actual tests)
        – Settings can be adjusted to trade off indexing time against parameter value search functionality
        – Server speed, server load, and the number and complexity of protocols in XMLDB all affect indexing time
  • 29. Global Properties page of Admin Portal
      • Only go to this page if there is a persistent problem with indexing
      • Set Package to Accelrys/Accelrys Catalog
      • Disabling parameter indexing in Admin Search reduces indexing time by 50+%
        – Set EnableParameterIndexing to “False”
      • “Chunk size” settings trade off memory against indexing speed
        – Decrease if indexer reports out-of-memory errors:
        – ParallelBatchSize: number of components/protocols processed by each sub-job of the indexer, to mitigate the memory footprint of the Component Reader (default: 40)
        – NumComponentsPerGroup: number of protocol/component documents sent to Solr in a single HTTP POST (default: 10)
        – NumParametersPerGroup: number of parameter documents sent to Solr in a single HTTP POST (default: 150). Only relevant if EnableParameterIndexing is “True”.
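To see how the two chunk-size settings interact, the sketch below splits a list of XMLDB items into sub-jobs of ParallelBatchSize items and then into POST groups of NumComponentsPerGroup documents. It assumes each sub-job forms its own groups (the slide does not say how groups are formed) and leaves out parameter documents and NumParametersPerGroup; the function names are hypothetical:

```python
def chunks(items: list, size: int) -> list[list]:
    """Split items into consecutive groups of at most `size` elements."""
    return [items[start:start + size] for start in range(0, len(items), size)]


def indexing_plan(items: list, parallel_batch_size: int = 40,
                  num_components_per_group: int = 10) -> list[list[list]]:
    """Sub-jobs of ParallelBatchSize items, each sending groups of NumComponentsPerGroup docs."""
    return [chunks(batch, num_components_per_group)
            for batch in chunks(items, parallel_batch_size)]


plan = indexing_plan(list(range(5000)))
print(len(plan), sum(len(job) for job in plan))  # 125 sub-jobs, 500 POSTs
```

Smaller values mean each sub-job holds fewer items and each POST body is smaller, at the cost of more sub-jobs and more HTTP round trips.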
  • 30. How do I troubleshoot (searching)?
      • If search results are not what you expect…
        – Use the Raw Query Output example protocol
          • Connect to Pro Client as admin user
          • Open Raw Query Output under Protocols/Utilities/…
          • Set Query Catalog parameters and run
          • Inspect output to see how Solr processed your query
        – To really dig deep: use the Solr admin page
          • In Firefox (not IE), go to http://aepserver:aepport/appcatalog/admin
          • Click on appcat1 or appcat2
          • Try the Query, Analysis, and Schema Browser tools
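Outside the admin page, Solr's standard `/select` handler accepts `debugQuery=true`, and the `debug` section of the response (for example `parsedquery`) shows how the query was rewritten. A sketch for building such a URL, assuming, from the admin page address above and not confirmed by the slides, that each core answers at `http://aepserver:aepport/appcatalog/appcat1` (replace host and port with real values); the helper name is hypothetical:

```python
from urllib.parse import urlencode


def debug_query_url(core_url: str, query: str, rows: int = 10) -> str:
    """Solr select URL with debugQuery on, so the response shows the parsed query."""
    params = {"q": query, "wt": "json", "debugQuery": "true", "rows": rows}
    return f"{core_url}/select?{urlencode(params)}"


print(debug_query_url("http://aepserver:aepport/appcatalog/appcat1", "MAO type:Component"))
```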
  • 31. Show me more details! Advanced search syntax
  • 32. More Example Queries
      • MAO type:Component
        – Any components referencing ‘MAO’
      • uses:"Xml Reader" AND NOT author:Accelrys
        – Components/protocols that have an Xml Reader and are not authored by Accelrys
      • lastrun:[* TO NOW-6MONTH]
        – Last run at least six months prior
      • runcount:0
        – Never been run
      • NOTES:
        – Field names are case-sensitive; field values are not.
        – Phrases require quoting (double quotes); single words do not.
        – Boolean and other reserved words must be uppercase.
        – Examples in help text
        – Definitive list in schema.xml (Solr admin page)
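When queries are assembled in code, the quoting rule is the easiest one to get wrong. A minimal helper, hypothetical and not an AEP API, that quotes multi-word values as phrases and escapes backslashes and double quotes; other Lucene special characters are deliberately not handled:

```python
from typing import Optional


def term(value: str, field: Optional[str] = None) -> str:
    """Build `field:value` (or a bare value), quoting multi-word values as a phrase."""
    # Escape backslashes and double quotes; other Lucene special characters
    # such as : + - ( ) are NOT handled in this sketch.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    if any(ch.isspace() for ch in value):
        escaped = f'"{escaped}"'
    return f"{field}:{escaped}" if field else escaped


# Rebuilding two of the slide's example queries (Boolean operators stay uppercase).
q1 = f'{term("MAO")} {term("Component", field="type")}'
q2 = f'{term("Xml Reader", field="uses")} AND NOT {term("Accelrys", field="author")}'
print(q1)  # MAO type:Component
print(q2)  # uses:"Xml Reader" AND NOT author:Accelrys
```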
  • 33. Relevant Components and Protocols
      • You may need admin privileges to see these
      • Database and Application Integration/Admin/Catalog/Utilities/Internals/Query Catalog
        – Use this for custom queries within a protocol
        – Supports faceted queries, exposes all schema fields
      • Protocols/Utilities/Accelrys/Administration/Catalog/Update Catalog Index
        – Main indexing protocol; normally launched by scheduler
        – Launches a separate remote job for each indexed server
      • Protocols/Utilities/Accelrys/Administration/Catalog/Utilities folder
        – Raw Query Output: demonstrates use of Query Catalog component; useful for debugging searches
        – Schedule Catalog Indexing: generates Catalog Settings admin page
  • 34. Protocol Validation
      • Automated reports to find issues with protocols within your user base
  • 35. Protocol Validation Use Cases …
      • Bad design practices. Find protocols that:
        – have shortcuts as copies
        – have saved checkpoints
        – store passwords
        – have components that are owner access only
        – don’t have top-level parameters (Web Port)
        – have components with absolute file paths
      • Bad documentation practices. Find protocols that:
        – don’t have help text (or have only the default help)
        – have components with missing captions
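The slides do not show how AEP's validation rules are written. As a hypothetical sketch of a rule-based checker, assuming a protocol is available as a plain dictionary (`help`, `parameters`, and `components` with `caption` and `params`, all invented for illustration), several checks above become small rule functions, and extending the report means appending a function to `RULES`:

```python
from pathlib import PurePosixPath, PureWindowsPath


def missing_help(protocol):
    if not protocol.get("help", "").strip():
        yield "protocol has no help text"


def missing_captions(protocol):
    for i, comp in enumerate(protocol.get("components", [])):
        if not comp.get("caption", "").strip():
            yield f"component {i} has no caption"


def stored_passwords(protocol):
    for i, comp in enumerate(protocol.get("components", [])):
        for name, value in comp.get("params", {}).items():
            if "password" in name.lower() and value:
                yield f"component {i}: parameter '{name}' stores a password"


def is_absolute_path(value) -> bool:
    """True for absolute POSIX paths or absolute Windows paths (drive plus root)."""
    return isinstance(value, str) and (
        PurePosixPath(value).is_absolute() or PureWindowsPath(value).is_absolute())


def absolute_paths(protocol):
    for i, comp in enumerate(protocol.get("components", [])):
        for name, value in comp.get("params", {}).items():
            if is_absolute_path(value):
                yield f"component {i}: parameter '{name}' uses absolute path {value}"


def no_top_level_parameters(protocol):
    if not protocol.get("parameters"):
        yield "protocol has no top-level parameters (Web Port)"


RULES = [missing_help, missing_captions, stored_passwords, absolute_paths,
         no_top_level_parameters]


def validate(protocol, rules=RULES):
    """Run every rule and collect its violation messages."""
    return [message for rule in rules for message in rule(protocol)]


protocol = {
    "name": "Load Data", "help": "", "parameters": [],
    "components": [
        {"caption": "Reader", "params": {"Source": "C:\\data\\in.sdf"}},
        {"caption": "", "params": {"DB Password": "s3cret"}},
    ],
}
for message in validate(protocol):
    print(message)
```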
  • 36. Protocol Validation (also new in 9.0)
  • 37. Analyze Validation report
      • Performs analysis of the protocol validation report created for the Validation Report page of the Administration Portal
  • 38. Links for Advanced Topics
      • Schema and tokenization
      • Solr query syntax and parsing [used by Accelrys Catalog]
      • Joins in Solr [how Search Catalog does parameter value searching]
      • Faceting
        – Not exposed in UI; use Query Catalog component
  • 39. Summary
      • Accelrys Catalog is a powerful search technology built into AEP
      • With Protocol Validation, this provides critical tools for administering enterprise deployments
      • Plan for 9.0 upgrade now
      • Relevant talks
        – (ATS6-DEV01) What’s new for Developers in AEP 9.0
        – (ATS6-Roadmap01) Platform Roadmap
        – (ATS6-PLAT07) Managing AEP in an enterprise environment