8. The Solution
• A custom Puppet Node Classifier (PNC) Ruby script on every compile master
• A custom External Node Classifier (ENC) JavaScript script that interfaces with the company's global CMDB tables and with our scoped application tables
• Multiple scoped Application Tables to store the Puppet Role and Profile data
• Standard REST API calls to communicate between the PNC and ENC
Richard Romanus
[Figure: ServiceNow (Application Tables and the ENC, in JavaScript) communicates over REST with the Puppet Master (the PNC, in Ruby).]
9. ServiceNow Puppet ENC: The Architecture
[Figure: when a server's Puppet Agent checks in, the PNC on the Puppet Master calls the ENC, which reads the Global Tables (Global Space) and the Scoped Application Tables (Scoped Application Space) in ServiceNow.]
PNC steps:
1. Check for Overrides
2. Get FACTs from PuppetDB
3. Get Puppet data from the Puppet Console
4. Get Business Data from the ENC
5. Merge values and return
Example PNC output for a server (Servername):
environment: production
classes:
  puppet_roles::common: {}
parameters:
  tmosn_role_type: database
  tmosn_business_app_name: Puppet Enterprise
  tmosn_sox: 'no'
  tmosn_pci: 'no'
  tmosn_usedfor: Development
  tmosn_install_status: installed
  tmosn_location: Seattle
10. The Architecture (ver 1.0)
[Figure: Version 1.0 architecture. Servers running the Puppet Agent check in to the Puppet Compile Masters (Puppet Cluster), which receive the latest code releases from the Code Base. On each compile master, the Puppet Node Classifier (PNC, Ruby) gathers server FACTs from the Puppet Database (which also holds last_puppet_run data), Puppet-specific classes from the Puppet Console, and business data from the ENC endpoint (GET) in ServiceNow. The ENC (JavaScript) reads the ENC Server, Role, and Environments tables (Scoped Application Space) and the Global Server, Business Data, and Relationship tables (Global Space).]
11. The Results
Pros:
• It worked at scale!
• Mostly Stable
• Good performance
Cons:
• Very limited
• No overriding
• No table ACLs
• No API
12. The Results
• Implemented Roles & Profiles
• The ServiceNow Global Business information
provided immediate default Role classification.
• However, limited access to the role field in ServiceNow limited its usability.
puppet_roles
|
+- manifests
   |
   +- apps
   |  |
   |  +- <BUSINESS_NAME#1>
   |  |  +- default.pp
   |  |  +- <role1>.pp
   |  |  +- <role2>.pp
   |  |
   |  +- <BUSINESS_NAME#2>
   |  |  +- default.pp
   |  |  +- <role1>.pp
   |  |
   |  +- . . .
   |
   +- common.pp
13. The Results
[Figure: the PNC sends a GET request to the ENC endpoint in ServiceNow, which resolves the server through the ENC tables below.]
Environments table:
  production
  test_env
  . . .
Roles table (Business Name | Role | Puppet Classes):
  puppet | compiler | puppet_roles::apps::puppet::compiler
  puppet | database | puppet_roles::apps::puppet::database
  splunk | master   | puppet_roles::apps::splunk::master
  splunk | database | puppet_roles::apps::splunk::database
  . . .
Servers table (Server Name | Env | Role | Business):
  serverabc | production | compiler | puppet
  serverxyz | production | database | puppet
  server123 | production | database | splunk
  server456 | test_env   | master   | splunk
  . . .
Example output for server123:
environment: production
classes:
  puppet_roles::apps::splunk::database: {}
parameters:
  tmosn_role_type: database
  tmosn_business_app_name: Splunk
  tmosn_sox: 'no'
  tmosn_pci: 'no'
  tmosn_usedfor: Development
  tmosn_install_status: installed
  tmosn_location: Seattle
puppet_roles
|
+- manifests
   |
   +- apps
   |  |
   |  +- puppet
   |  |  +- default.pp
   |  |  +- compiler.pp
   |  |  +- database.pp
   |  |
   |  +- splunk
   |  |  +- default.pp
   |  |  +- database.pp
   |  |  +- master.pp
   |  |
   |  +- . . .
   |
   +- common.pp
# Role class for Splunk database servers: the role only includes profiles.
class puppet_roles::apps::splunk::database {
  include puppet_profile::apps::splunk
}
16. Puppet ENC – Version 2.0
[Figure: Puppet ENC Version 2.0 links Configuration Management and ServiceNow. Business Data flows from ServiceNow to Puppet; Server Status Data (FACTs, Ops Server Check data, Puppet last run data) flows back to ServiceNow for the Automated Patching user interface and workflow.]
17. Version 2.0
Problem:
• Make automated patching more stable and reliable.
Solution:
• Updated the PNC and ENC to gather the Last Puppet Run data and Operations Server Check data and store them in the ENC tables
• Created a new ‘Puppet Patch Ready’ API to determine whether a server is ready for patching
• Created a wrapper script that runs the Operations Server Check script and saves the results to a FACT file.
18. The Architecture – Version 2.0
[Figure: Version 2.0 architecture. The Version 1.0 components remain, with these changes: the PNC (Ruby) now calls the ENC endpoint with POST, sending the Puppet last run data [new] and the Ops Server Check data [new], and reads a local Overrides file (Ruby). On each server, run_ops_chk_script.rb runs ops_server_chk.sh and saves the result as the tmo_ops_chk FACT in /etc/puppetlabs/facter/facts.d/tmo_ops_chk.json. In ServiceNow, the Automated Patching workflow calls the new Patch Ready endpoint. An API Web Interface (Sinatra) running in an External Container gives users access to the ENC.]
19. The Results
Pros:
• Gathers last Puppet Run data
• Gathers Ops Server Check status
• Provides an internal API for checking server status
• Provides an external API for users to interface with the ENC
• Provides a method for overriding
ENC values in the PNC
Cons:
• Still no table ACLs
• PNC becoming very customized
• Default Role entry is manual
• No auto deletion of decommissioned servers
• No caching of ServiceNow data
(Defaults can be dangerous!!!)
20. The Results
Time saved with Automated Patching:
20 min/server x 10k servers = 200k min (about 3.3k hours)
+3k hours of work saved per quarter
+13k hours of work saved per year
21. Timeline
• Release: Dec. 2018
• Provide Business Data in
ServiceNow to Puppet
• Implemented
Roles/Profiles
• Release: Aug. 2019
• Provide Server Status Data from Puppet to ServiceNow
• Provided a new
‘Patch Ready’ API for
checking server status
• Release: July 2020
• Update model for
gathering Business Data
from ServiceNow
• Improved the storing,
filtering, and formatting of
the Ops Server Check data
22. What’s Next – Ver 3.0
• Improved stability and performance of both ENC and PNC
• ACLs!!! (Finally)
• Auto default role entry for every Business Server type in the ENC Role table
• Store Operations ‘Server Check’ data into its own table
• Improved PNC handling of overrides
23. What’s Next
• Capture specified server FACTs and store them in ServiceNow.
• A User Interface (UI) page for users to access ENC settings/data (role, env, etc.)
• A UI page for users to see other server FACTs (software version, status, etc.)
24. Challenges
• Developing in ServiceNow (with JavaScript) for the first time
• Working with non-standard data and fields
• Providing some of the Puppet Console functionality with the ENC
Hello, my name is Richard Romanus.
I’m a Software Developer at T-Mobile, and I’m here today to talk about Puppet ENC – A ServiceNow Scoped Application.
A little about me…
I’m married with two wonderful children, and when I’m not spending time with them, I love Running, Hiking, and Biking.
I have an extensive Telecommunications background, from AirTouch Cellular, to Sprint, Ericsson, and now currently at T-Mobile.
I also graduated from UW in 2013 with a BA in CSSE.
I started learning Puppet when I began working at T-Mobile five years ago, and have been using it ever since.
At T-Mobile, like most big companies, we have a lot of data.
Unfortunately, not all of that data is centrally located.
In fact, in a lot of cases, our data is isolated into separate silos, or islands.
At T-Mobile, we are using Puppet Enterprise to manage a lot of our server configuration management, including building and patching of servers.
However, that configuration management was limited to the available configuration data, such as server FACTs and hard-coded Hiera or parameter values.
We also use ServiceNow as our primary CMDB (Configuration Management Database), which is where all our Business Data lives.
That includes information about a server such as compliance data, business information, and even its physical location.
However, we didn’t have a good method for retrieving that Business Data and using it for Configuration Management.
And that is where the idea for Puppet ENC came from.
The External Node Classifier is not a new concept for Puppet, but it requires you to write your own code to classify a server.
The ENC I created is made up of three primary parts:
The Puppet Node Classifier, or PNC, which is a Ruby script running on every compile master
The External Node Classifier, or ENC, which is a JavaScript script running on ServiceNow,
And the application tables in ServiceNow that store the values and data of interest.
I also use standard REST calls to communicate back and forth between the ENC and PNC.
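As background for the PNC: Puppet's exec node terminus (enabled with the node_terminus = exec and external_nodes settings in puppet.conf) runs a classifier script with the node's certname as its only argument and reads a YAML hash of environment, classes, and parameters from standard output. The Ruby sketch below shows that contract. It is a minimal, hypothetical skeleton, not the author's PNC: classify returns hard-coded sample data where the real script would query PuppetDB, the Puppet Console, and the ENC.

```ruby
#!/usr/bin/env ruby
# Hypothetical skeleton of an exec-style node classifier (not the actual PNC).
# Puppet runs it as: <script> <certname>, and reads YAML from stdout.
require 'yaml'

# Stand-in for the real lookups (PuppetDB, Puppet Console, ServiceNow ENC).
def classify(_certname)
  {
    'environment' => 'production',
    'classes'     => { 'puppet_roles::common' => {} },
    'parameters'  => { 'tmosn_role_type' => 'database' },
  }
end

# Serialize the node data in the YAML shape Puppet expects.
def node_yaml(certname)
  classify(certname).to_yaml
end

# Only print when invoked directly with a certname argument.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts node_yaml(ARGV[0])
end
```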
This diagram shows the basic layout of the PNC and ENC, and how they connect.
In ServiceNow, there is a Global Space which contains a lot of our corporate CMDB data, and therefore has restricted access.
However, ServiceNow also has Scoped Application Spaces, where authorized users and teams can create their own applications within ServiceNow, and that is where the Puppet ENC resides.
When a server’s Puppet Agent checks in to the Puppet Master, the master calls out to the PNC, which then gathers information about the server.
It gathers the FACTs from the PuppetDB, the Puppet Specific information from the Puppet Console, and then the server’s Business information from the ENC in ServiceNow.
The PNC then merges all of that data together and passes it back to the compile master for catalog compilation.
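The merge step could look something like the sketch below, assuming each source has already been fetched into a Hash with the same environment/classes/parameters keys. The precedence order (console, then ENC, then overrides), the class name example_console_class, and the sample values are assumptions; the FACT lookup from PuppetDB is left out because the talk does not say exactly how the PNC uses the facts.

```ruby
require 'yaml'

# Merge classification data from several sources; later sources win.
# classes and parameters are merged key by key; environment is replaced
# only when a later source actually sets it.
def merge_node_data(*sources)
  sources.compact.each_with_object(
    { 'environment' => nil, 'classes' => {}, 'parameters' => {} }
  ) do |src, acc|
    acc['environment'] = src['environment'] if src['environment']
    acc['classes'].merge!(src['classes'] || {})
    acc['parameters'].merge!(src['parameters'] || {})
  end
end

# Hypothetical inputs, listed from lowest to highest precedence.
console   = { 'classes' => { 'example_console_class' => {} } }
enc       = { 'environment' => 'production',
              'classes'     => { 'puppet_roles::common' => {} },
              'parameters'  => { 'tmosn_location' => 'Seattle' } }
overrides = { 'environment' => 'test_env' }

node = merge_node_data(console, enc, overrides)
puts node.to_yaml
```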
This is a more detailed view of the ENC and PNC with the other Puppet components included.
It also shows some of the primary Global and Scoped application tables the ENC uses within ServiceNow.
Unfortunately, in the first version of the ENC, a lot of the Scoped data had to be added to the tables manually.
For the most part, the results were good. It performed well and was stable, even at about a million check-ins a day.
However, I did find that by replacing the console with a custom ENC, I now needed a method for overriding the ENC values, like environment.
Before the ENC, we could override a server's environment with a FACT, using a rule set in the Puppet console, but with the ENC, that no longer worked.
Another issue I found was that, with no ACLs on the ENC tables or fields, and with no ENC API, it was hard to let anyone else have access to the tables.
For example, we never want a user to be able to delete the ‘Production’ environment from the environment table, since that is the primary environment most servers use.
One of the immediate benefits we gained from the ENC was the ability to use the Business data already stored in the CMDB to classify a server.
Even when a new server with no role comes online, it always has a Business group already associated with it in the CMDB.
This allowed me to create a default Puppet Role manifest for each Business group we interface with.
That also made it easier to create new Role manifests for other role types a group may have, like database, master or console.
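The default-role convention can be expressed as a small naming rule. In this hypothetical sketch, role_class builds a class name under puppet_roles::apps from the Business group and role, falling back to default when no role is set, and manifest_path shows where Puppet's autoloader expects that class to live in the module. The normalization of Business names (lowercase, underscores) is an assumption.

```ruby
# Build a role class name from a Business group and optional role,
# falling back to the group's default role (hypothetical convention).
def role_class(business, role = nil)
  group = business.to_s.strip.downcase.gsub(/[^a-z0-9_]+/, '_')
  name  = role.to_s.strip.empty? ? 'default' : role.to_s.strip.downcase
  "puppet_roles::apps::#{group}::#{name}"
end

# Puppet's autoloader looks for class a::b::c in a/manifests/b/c.pp,
# and for the module's main class in <module>/manifests/init.pp.
def manifest_path(klass)
  mod, *rest = klass.split('::')
  return File.join(mod, 'manifests', 'init.pp') if rest.empty?

  File.join(mod, 'manifests', *rest[0..-2], "#{rest[-1]}.pp")
end
```

For example, a new Splunk server with no role would get puppet_roles::apps::splunk::default, which lives in puppet_roles/manifests/apps/splunk/default.pp.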
However, as I mentioned before, with no ENC API and a lack of ACLs on the ServiceNow tables, the ability to update the ServiceNow fields was limited to my team.
This is a more detailed diagram showing how the ServiceNow Business Data and Role tie together with the Puppet Roles and Profiles.
When the PNC calls out to the ENC, it only passes the server’s certname. The endpoint API in ServiceNow looks up that name in the ENC Server table.
It then uses that data to look up the associated role in the ENC Role table.
From the Role table, it gets the associated Puppet class information and passes that back, with the environment and other business-related data, to the PNC.
The Puppet compile master then uses the provided class data to compile the catalog and passes it back to the Puppet agent on the server.
The Parameter data that the ENC passes back to the PNC is available to all puppet modules as top-level variables.
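The lookup chain described above can be modeled in a few lines. The real ENC is a JavaScript endpoint in ServiceNow; this Ruby model is only illustrative, with hypothetical in-memory hashes standing in for the ENC Server and Role tables and sample rows based on the server123 example.

```ruby
# Hypothetical stand-ins for the ENC Server and Role tables (sample rows only).
SERVERS = {
  'server123' => { env: 'production', business: 'splunk', app_name: 'Splunk', role: 'database' },
  'server456' => { env: 'test_env',   business: 'splunk', app_name: 'Splunk', role: 'master' },
}.freeze

ROLES = {
  %w[splunk database] => 'puppet_roles::apps::splunk::database',
  %w[splunk master]   => 'puppet_roles::apps::splunk::master',
}.freeze

# certname -> Server table row -> Role table row -> classes and parameters.
# Raises KeyError for an unknown server or role so the caller can decide what to do.
def classify_from_tables(certname)
  server = SERVERS.fetch(certname)
  klass  = ROLES.fetch([server[:business], server[:role]])
  {
    'environment' => server[:env],
    'classes'     => { klass => {} },
    'parameters'  => {
      'tmosn_role_type'         => server[:role],
      'tmosn_business_app_name' => server[:app_name],
    },
  }
end
```

Because ENC parameters become top-scope variables, a manifest could then read a value such as $::tmosn_business_app_name directly.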
Although Puppet ENC Version 1.0 worked, it was also very limited, because it used a one-way GET call to gather data from ServiceNow.
However, one of the critical roles Puppet has played at T-Mobile for some time now is the patching of our Linux and Unix servers.
Because scheduling and coordinating server patching is challenging, there has been a lot of development on an automated patching method.
Since ServiceNow is also where many of our UI request forms reside, that is also where the Automated patching UI and workflow reside.
So with the first version of the ENC, I solved the problem of getting Business Data to Puppet for building and configuring servers.
But now I had the opposite problem, where ServiceNow needed to know the state of the server to determine if it was really ready for patching.
However, when I realized that ServiceNow could easily use POST as well as GET, I saw no reason why I couldn’t make it a two-way connection and pass data BOTH ways.
That led to the Version 2.0 upgrade of the Puppet ENC and PNC, which now uses two-way communication to pass data in both directions.
With the goal of improving automated patching, we needed a method of checking the status of a server before putting it into the patching workflow.
Changing the REST call between Puppet and ServiceNow from GET to POST allowed me to pass the status of the server back to ServiceNow to be stored in a table.
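A POST from the PNC could be built with Ruby's standard Net::HTTP library, as in the sketch below. The endpoint URL, the basic-auth credentials, and the JSON field names are all hypothetical; the talk does not describe the ServiceNow API's actual path or payload.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Hypothetical endpoint; the real scoped API path is not given in the talk.
ENC_URL = URI('https://your-instance.service-now.com/api/x_hypothetical/puppet_enc/node')

# Build (but do not send) a POST that carries server status to the ENC;
# the ENC would reply with the node's classification data.
def build_enc_request(certname, last_run, ops_check, user:, password:)
  req = Net::HTTP::Post.new(ENC_URL)
  req.basic_auth(user, password)
  req['Content-Type'] = 'application/json'
  req['Accept']       = 'application/json'
  req.body = JSON.generate({
    'certname'         => certname,
    'last_puppet_run'  => last_run,
    'ops_server_check' => ops_check,
  })
  req
end

# Sending it would look like:
#   res = Net::HTTP.start(ENC_URL.host, ENC_URL.port, use_ssl: true) { |http| http.request(req) }
#   node = JSON.parse(res.body)
```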
I then created a new Endpoint API in ServiceNow that looks up the stored server status to determine if it is in a state ready for patching.
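A readiness check like the one the ‘Patch Ready’ endpoint performs might reduce to a few rules over the stored status. The real endpoint runs in ServiceNow and its criteria are not given in the talk, so the rules below (a recent, non-failed Puppet run and a passing ops check), the record layout, and the 24-hour limit are assumptions for illustration.

```ruby
require 'time'

# Hypothetical limit: the last Puppet run must be less than a day old.
MAX_RUN_AGE = 24 * 60 * 60 # seconds

# Returns [ready, reason] for the stored status record of one server.
def patch_readiness(status, now: Time.now)
  run = status['last_puppet_run']
  return [false, 'no Puppet run recorded'] unless run
  return [false, 'last Puppet run failed'] if run['status'] == 'failed'
  return [false, 'last Puppet run is too old'] if now - Time.iso8601(run['time']) > MAX_RUN_AGE
  return [false, 'ops server check failed'] unless status['ops_check_passed']

  [true, 'ready']
end
```

Returning a reason along with the flag lets the workflow tell the Operations team why a server was held back.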
I also had to create a Ruby wrapper script that runs the Operations team’s Server Check shell script and saves its output to a FACT file on the server.
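A wrapper of this kind could look like the sketch below. The script name ops_server_chk.sh and the fact file /etc/puppetlabs/facter/facts.d/tmo_ops_chk.json come from the Version 2.0 architecture slide; the JSON layout, the assumption that the script is on PATH, and the temp-file-then-rename write are illustrative choices. Facter reads JSON files in facts.d as external facts, so the top-level key becomes a structured tmo_ops_chk fact.

```ruby
require 'json'
require 'open3'
require 'fileutils'
require 'time'

FACT_FILE = '/etc/puppetlabs/facter/facts.d/tmo_ops_chk.json'

# Run the ops check (assumed to be on PATH) and save its result as an external fact.
def run_ops_check(command = ['ops_server_chk.sh'], fact_file: FACT_FILE)
  output, status = Open3.capture2e(*command) # combined stdout + stderr
  fact = {
    'tmo_ops_chk' => {
      'exit_code'  => status.exitstatus,
      'output'     => output,
      'checked_at' => Time.now.utc.iso8601,
    },
  }
  FileUtils.mkdir_p(File.dirname(fact_file))
  tmp = "#{fact_file}.tmp"
  File.write(tmp, JSON.pretty_generate(fact))
  File.rename(tmp, fact_file) # replace in one step so Facter never reads a partial file
  fact
end
```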
This is an overall architectural view of Version 2.0, which is the current running version.
As you can see, I changed the ENC endpoint to use POST and created a new ‘Patch Ready’ endpoint.
The ‘Patch Ready’ endpoint is called by the ServiceNow Automated Patching workflow whenever a user schedules a server for patching, and again before patching actually starts.
This gives our Operations team time to address server issues and, if necessary, switch to manual patching for that server.
I also added a new external API that gives users access to their ENC data. The external API runs in a container with restricted access to the ENC.
It lets users read their data and make minor changes, while still restricting access to any other ServiceNow data.
And, as I mentioned earlier, I also had to create a wrapper script that runs the Operations server check script and saves the output to a FACT file.
The results for Version 2.0 have been good.
It performs well and is stable, even at about 1.5 million check-ins a day.
There have been many new benefits from gathering and using the server data.
The external API, despite being very limited, has also made the ENC more usable by other teams, especially since I still don’t have the table fields locked down (with ACLs).
However, the latest ENC/PNC still has some cons: extensive customization of the PNC, no automatic creation of a ‘default’ role entry for each Business group in the ENC Role table, no automatic deletion of decommissioned servers, and no caching of the ServiceNow data.
As we have become more reliant on the ServiceNow data, having that data always available in our Puppet code is very important. Without it, we tend to fall back to ‘Default’ settings.
This may be acceptable in some situations, but in others it can be very dangerous; location information is one example.
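One way to avoid falling back to defaults would be a last-known-good cache on each compile master. This hypothetical sketch, which is not part of the current PNC, refreshes a per-node JSON file on every successful ENC call and returns the cached copy when the call fails; the cache directory is an assumption, and the caller supplies the real ENC request as a block.

```ruby
require 'json'
require 'fileutils'

# Hypothetical cache location on the compile master.
CACHE_DIR = '/opt/puppetlabs/pnc/cache'

# Yield to fetch fresh ENC data; cache it on success, and on failure
# return the last cached copy for this node (re-raising if there is none).
def enc_data_with_cache(certname, cache_dir: CACHE_DIR)
  path = File.join(cache_dir, "#{certname}.json")
  begin
    data = yield(certname)
  rescue StandardError
    return JSON.parse(File.read(path)) if File.exist?(path)

    raise
  end
  FileUtils.mkdir_p(cache_dir)
  File.write(path, JSON.generate(data))
  data
end
```

Re-raising when no cache exists keeps the failure visible instead of silently applying defaults.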
Despite the list of Cons, the benefits of the ENC can easily be seen with the time we save with Automated Patching.
The ‘Patch Ready’ API is a key part of ensuring our Automated Patching workflow runs stably and successfully, and the time saved by Automated Patching has been huge.
Taking into account the communication and coordination between teams, I estimate we save around 20 minutes per server with automated patching.
With about 10K servers using automated patching, that saves us over 3,300 hours per quarter, or about 13k hours per year.
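The arithmetic behind those figures, assuming each server is patched once per quarter:

```latex
\[
20\ \tfrac{\text{min}}{\text{server}} \times 10{,}000\ \text{servers}
  = 200{,}000\ \text{min} = \tfrac{200{,}000}{60}\ \text{h} \approx 3{,}333\ \text{h per quarter}
\]
\[
4\ \text{quarters} \times 3{,}333\ \text{h} \approx 13{,}333\ \text{h per year}
\]
```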
The first version of the ENC/PNC was released in December 2018. There were some minor releases after that to fix small issues and add some basic functionality.
But then Version 2.0 was released about 9 months later, in August of last year.
Version 3.0 of the ENC and PNC is currently in beta testing and is on schedule for release next month, in July 2020.
I rewrote most of the code for both parts; however, the overall architecture hasn’t changed much.
I had to make some table and API changes for a new Business data model that T-Mobile is switching to, since the Business data is at the heart of the ENC classification.
Because Version 3.0 is basically a total rewrite, it should be more stable and perform better.
Now that I have a better understanding of the internal interactions within ServiceNow and Puppet, I believe the new version should be more stable and better at handling non-standard data it encounters.
I also FINALLY added ACLs to some of the critical table fields, like Environment.
I also added automatic creation of the ‘default’ Role table entry for every server Business type that checks in.
At the request of our Operations team, I made the captured ‘Server Check’ data in ServiceNow more usable.
I moved the data to its own table that can be searched and sorted by Ops personnel to help identify common server failures.
In the current version, 2.0, it’s just a big block of string text that is not very usable.
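Turning that text block into rows could be a simple parsing step. The talk does not show the Server Check output format, so this sketch assumes a hypothetical "CHECK_NAME: STATUS message" line per check and produces one record per matching line, which could be stored as a row in the new table.

```ruby
# Hypothetical line format: "<check>: PASS|FAIL|WARN <optional message>".
CHECK_LINE = /\A(?<check>[\w.-]+):\s*(?<status>PASS|FAIL|WARN)\b\s*(?<message>.*)\z/

# Convert raw Server Check output into one searchable record per check;
# lines that do not match the assumed format are skipped.
def parse_ops_check(text, certname)
  text.each_line.map do |line|
    m = CHECK_LINE.match(line.strip)
    next unless m

    { 'server' => certname, 'check' => m[:check], 'status' => m[:status], 'message' => m[:message] }
  end.compact
end
```

Stored this way, Ops personnel could sort and filter by check and status to spot common failures across servers.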
In the PNC, I also replaced the Ruby Overrides file with a YAML configuration file that can store both PNC configuration data and overrides.
This makes the PNC more generic and less customized for each of our Puppet clusters.
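A combined configuration file might be laid out as below. The config/overrides key names, the settings shown, and the merge rule in apply_overrides are hypothetical; the sketch only illustrates loading both sections from one YAML file and applying a per-node override such as environment.

```ruby
require 'yaml'

# Hypothetical PNC configuration file with both settings and per-node overrides.
SAMPLE_CONFIG = <<~YAML
  config:
    enc_url: https://your-instance.service-now.com/api/x_hypothetical/puppet_enc/node
    timeout_seconds: 10
  overrides:
    server456:
      environment: test_env
YAML

# Return [config, overrides] from the YAML text.
def load_pnc_config(text)
  doc = YAML.safe_load(text) || {}
  [doc.fetch('config', {}), doc.fetch('overrides', {})]
end

# Apply a node's overrides on top of the ENC result; nested hashes merge key by key.
def apply_overrides(node, overrides, certname)
  node.merge(overrides.fetch(certname, {})) do |_key, old_val, new_val|
    old_val.is_a?(Hash) && new_val.is_a?(Hash) ? old_val.merge(new_val) : new_val
  end
end
```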
One of the major benefits of creating your own classifier is the ability to expand its functionality as you see fit.
Some features I’m looking at adding are…
Adding a new table for storing specified Puppet FACTs in ServiceNow that can be used in workflows or viewed by privileged users, for example, the application teams that own the servers.
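Selecting which FACTs to send could be driven by an allow-list. In this sketch the list itself is hypothetical (os.release.full, kernelrelease, and puppetversion are standard Facter facts used only as examples), and dotted names are followed into structured facts with Hash#dig.

```ruby
# Hypothetical allow-list of FACTs to store in ServiceNow.
FACTS_TO_STORE = %w[os.release.full kernelrelease puppetversion].freeze

# Keep only the allow-listed facts, following dotted paths into structured facts
# and skipping any fact the node does not report.
def selected_facts(facts, wanted = FACTS_TO_STORE)
  wanted.each_with_object({}) do |path, out|
    value = facts.dig(*path.split('.'))
    out[path] = value unless value.nil?
  end
end
```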
But ServiceNow is not JUST a CMDB; it is also where many teams go to request new features, user access, or even new hardware, like servers.
So, with the two-way communication between Puppet and ServiceNow, and with a common UI that many teams are accustomed to using, there are numerous opportunities for creating custom UIs for our data.
For example, it would be easy to create a UI that shows the server owner the current version and/or status of an application on their servers, if that data was already being stored as a FACT.
There was definitely no shortage of challenges on this project.
Developing in ServiceNow, for the first time, was a big first step. Understanding all the tables, fields, workflows, and relationships between everything was a struggle at first, and I’m still learning new things from time to time.
Since ServiceNow is our primary CMDB, there is A LOT of data in there and unfortunately not all fields and tables are standardized. Therefore, I’m often forced to work around some of the edge case data we retrieve.
As I mentioned earlier, using the ENC as our classifier removed some of the functionality from the Puppet console, for example, the ability to create override rules.
Thank you for attending my presentation.
Here is my contact information at T-Mobile.
Feel free to reach out to me with any future questions.