Hear directly from Kaplan’s Vice President of Business Systems and Architecture, who will discuss how he partnered with Datavail to augment his team using robust technical discovery, SQL Server Health Checks and assessments, knowledge transfer, runbook documentation, weekly meetings, status updates, and continuous service support and improvement. Hear about their challenges and see real-world examples of how to tackle them.
3. Explore everything PASS has to offer
• Free online webinar events
• Free 1-day local training events
• Local user groups around the world
• Online special interest user groups
• Business analytics training
• Get involved
• Free Online Resources
• Newsletters
PASS.org
4. Session evaluations
Your feedback is important and valuable.
Submit by 5pm Friday, November 16th to win prizes.
3 Ways to Access:
• Go to passSummit.com
• Download the GuideBook App and search: PASS Summit 2018
• Follow the QR code link displayed on session signage throughout the conference venue and in the program guide
5. Our Range of Data Services
Health Checks & Assessments
Project Services
Upgrades, Data Migration
OBIEE & Hyperion Consulting
24x7 In-Office Coverage
Operational Managed Services
Monitoring & Incident Response w/ SLAs
Service Requests (Patch, Modify, etc.)
Multi-factor Monitoring
Proactive Services (Health, Tune)
Structured Service Review
Data Development Services
Development, Tuning, Automation
Data Warehouse Build & Optimize
DevOps (Deploy & Automate)
6. Win Shoes
Fill out the session survey
for a chance to win an
exclusive pair of Datavail’s
running shoes!
7. Evan Krakauer
Director, Datavail
/evankrakauer
@ekrakauer
Account Management
With 20 years of account management
experience, Evan delivers a top-notch
customer experience to companies of all sizes.
Focus on SQL Server
Evan has spent most of his 8-year journey at
Datavail focusing on delivering a best-in-class
customer service experience to SQL Server
customers like Kaplan.
4 Years Serving Kaplan
Evan has served as the account management
lead to Kaplan for the past 4 years and
counting.
8. Luca Fagetti
Director of Database
Administration, Kaplan
/lucafagetti
18 Years of SQL Experience
From working with SQL Server as a DBA to
managing DBA teams, Luca has an expert-level
grasp of the technology.
Notable Accomplishment
Luca established a SQL 2000 replication
topology across 40 servers for DR and
maintenance purposes. Result: site fail-over in
< 8 minutes.
IT Background
Since 1982, Luca has been working with
technology, holding multiple dev roles in
multiple companies and industries in Europe
and the U.S.
9. JP Chen
Director and Practice Leader,
SQL Server Practice, Datavail
Enterprise SQL Support
SQL Performance Monitoring &
Tuning
SQL Server Blogs & Whitepapers
Both as a DBA and DBA Team Manager and
Director, JP brings a wealth of technical
certifications and hands-on experience to
every project.
Eager to share his knowledge with the larger
SQL Server community, JP is an avid blogger
and author, posting regular content on
Datavail.com.
/jp-chen-2167439
10. Agenda
• 7 Key Learnings and Best Practices in
Optimizing Database Administration
• Success Stories
• Who’s Datavail?
• Who’s Kaplan?
• Q&A
11. 7 Key Learnings and Best
Practices in Optimizing
Database Administration
12. Meet David, he’s a Senior SQL DBA
David is a Senior SQL DBA for a major retailing company based in NYC. David works hard.
13. Meet David, he’s a Senior SQL DBA
8:00am
Reviews the Start of Day reports and sets priorities for the activities he must do for the day.
9:00am to 10:00am
Resolves the job failures and disk space issues that happened overnight.
10:00am to 12:00pm
Attends the cloud migration project meeting and is made aware that he must complete the migration from his on-premises data center to Azure before the end of the year.
1:00pm
A business user complains that one of the major reporting applications is running slow. David joins the conference bridge to identify the performance problems.
2:00pm to 3:00pm
Identifies the missing indexes, starts testing in the development environment, and is excited to find a 93% performance gain. Then he opens a ticket to deploy the changes to the staging environment.
14. Second half of David’s day
3:00pm to 4:00pm
Completes the deployment of the new indexes to the staging environment and opens a ticket to release to production later in the evening, at 8:00pm, after everyone is off the system.
4:00pm to 5:00pm
David talks to his manager about the training plan for the Azure migration project, as he’s new to Azure but excited to learn more about it.
5:00pm to 6:00pm
Drives home.
6:00pm to 7:00pm
Dinner.
8:00pm to 9:00pm
Remotes in and releases the missing indexes to the Production SQL Server instance.
3:00am
Gets a call-out from the help desk because one of the Production SQL Servers has only 2% disk space left.
15. 7 Key Learnings and Best Practices in
Optimizing Database Administration
1. Technical Discovery
2. SQL Server Health Check and Assessment
3. Knowledge Transfer
4. Runbook Documentation
5. Communication and Escalation (C&E) Guide
6. Weekly Meetings
7. Continuous Service Support
and Improvement
16. Technical Discovery
• Get the inventory of the SQL Servers in
your network
• Organize SQL Servers by owner, location,
function or any other categories as needed
• Find out which servers host the most business-critical
databases and which have the most
transaction activity
• Use Microsoft Assessment and Planning
Toolkit (MAP)
• Sample MAP report
17. SQL Server Health Check and Assessment
• Assess the current health for the 5
critical areas: Availability, Recoverability,
Integrity, Performance, and Security
• Identify the settings and configurations
to change
• Develop the action plan
• Schedule monthly or quarterly
health checks
• Sample SQL Health Check and
Assessment report
18. Knowledge Transfer
• Identify and list the key skills and
domain knowledge that the support
team needs to know
• Schedule the training sessions to meet
the timeline required
• Plan the shadowing and reverse
shadowing
• Measure and evaluate the knowledge
transferred
• Sample of knowledge transfer checklist
19. Playbook Documentation
• Document the critical Standard Operating
Procedures (SOPs) and common
troubleshooting steps or workarounds
• Extend the knowledge to the support team
• Update and revise as processes change or
new, more efficient methods are discovered
• Playbook template
Q: What additional items would you suggest to add to the Playbook?
20. Communication and Escalation
(C&E) Guide
• List the key contacts’ email
addresses and phone numbers
• Document the Service Level
Agreements (SLAs)
• Clarify the communication
protocols and escalation steps
• Avoid surprises
• Sample of a C&E guide
21. Weekly Meetings
• What are the major accomplishments
since the last meeting?
• What are the escalations and open
items?
• What tasks are scheduled or planned
for the coming weeks?
• What are the recommendations?
• Sample of a 4UP report
Q: What’s in your meeting agenda?
22. Continuous Service Support and
Improvement
• Proactive monitoring and auto-ticket generation
• Assign ownership for the tickets
• Document troubleshooting steps and
solutions to reduce the efforts required
for future occurrences
• Align and re-align database services and
support to the changing business needs by
identifying and implementing improvements
26. Success Stories
• Visit to La Crosse, WI for an immediate support transition for KPE
27. Success Stories
• DMX migration
• 3-day non-stop migration involving both the Kaplan and Datavail support teams
• Migration completed successfully
28. Success Stories
• Unexpected loss of a disk drive due to SAN maintenance during the weekend
• Datavail bridges the gap
29. Success Stories
• On-site dedicated SQL DBA
• The “bridge” between the two organizations
• The “key” in the relationship with other IT and non-IT teams
31. Distribution of Technical Work & the Impact on Staff
[Diagram: time spent by senior people in an enterprise. Categories: Strategic / High Value (Architecture, Project Work, Planning & Strategy); Operations (Engineering, Production Support); Tier 2, Tier 3, Tier 3/4. Callouts: “NO TIME LEFT”, “NO TIME NEEDED”.]
Without 24x7 production support
• Doing low level work
• Being awakened for emergencies
• Frustration in both the staff and management
• Results in job dissatisfaction and turnover
With 24x7 production support
• Far better use of senior resources
• Senior staff do the work they were meant to do and want to do
• After hours production work essentially eliminated
Editor's Notes
Over 20 years of account management experience with clients of various sizes, ensuring they have a great customer experience.
Evan has been working at Datavail for 8 years with a strong focus on SQL Server and SQL development support.
Evan has been working with the Kaplan team since 2014.
•18 years as a SQL Server DBA and manager of DBA teams
•Major technical accomplishment with SQL Server: established a SQL 2000 replication topology across 40 servers (20 of which in two-way replication) for DR (quite useful during Hurricane Wilma in 2005) and maintenance purposes (intra-site fail-over in < 8 minutes)
•Previously covered multiple roles in IT (Development/Support/QA/Release Management) across industries, technologies, and countries in Europe and the US
•Kaplan Higher Education is now supporting the newly branded Purdue University Global (formerly known as Kaplan University). PG was created through an acquisition by Purdue University to extend its Higher Education offering to the adult population of Indiana as part of its charter. Kaplan Higher Education has developed a unique platform to provide Higher Education programs to adults and veterans
Let me tell you a story that many of us may be very familiar with, or may have lived through ourselves. David is a Senior SQL DBA for a major retailing company based in NYC. David works hard.
8:00am: Reviews the Start of Day reports and sets priorities for the activities he must do for the day.
9:00am to 10:00am: Resolves the job failures and disk space issues that happened overnight.
10:00am to 12:00pm: Attends the cloud migration project meeting and is made aware that he must complete the migration from his on-premises data center to Azure before the end of the year.
1:00pm: A business user complains that one of the major reporting applications is running slow. David joins the conference bridge to identify the performance problems.
2:00pm to 3:00pm: Identifies the missing indexes, starts testing in the development environment, and is excited to find a 93% performance gain. Then he opens a ticket to deploy the changes to the staging environment.
3:00pm to 4:00pm: Completes the deployment of the new indexes to the staging environment and opens a ticket to release to production later in the evening, at 8:00pm, after everyone is off the system.
4:00pm to 5:00pm: David talks to his manager about the training plan for the Azure migration project, as he’s new to Azure but excited to learn more about it.
5:00pm to 6:00pm: Drives home.
6:00pm to 7:00pm: Dinner.
8:00pm to 9:00pm: Remotes in and releases the missing indexes to the Production SQL Server instance.
3:00am: Gets a call-out from the help desk because one of the Production SQL Servers has only 2% disk space left.
Does all this sound familiar to you?
What if someone told you that you would have a team to help you on the Azure migration project and that you would not have to get up at 3:00am to triage the disk space issue? Wouldn’t that be nice?
Well, we are here today to share with you that all this is possible. We will share with you the strategies and best practices in optimizing database administration.
We will start off with the Technical Discovery process,
then move on to SQL Server Health Check and Assessment,
Knowledge Transfer,
Playbook Documentation,
the Communication and Escalation (C&E) Guide,
Weekly Meetings,
and finally Continuous Service Support and Improvement.
Please note that most, if not all, of these best practices and processes apply whether you have 1 DBA or 100 DBAs on your team, and whether you have 1 SQL Server instance or 1,000 in your database environment.
As we complete the review and discussion of each step, we will open up a brief 3 to 5 minute Q&A session. If you have questions that require more time to discuss, we ask that you please stay after the session so we can discuss further or follow up.
Let’s get started!
Question: If you were elected president of a newly discovered country, what would be one of the most critical first things you would do to be a successful leader? Let’s put aside the thought of declaring yourself king or queen for a second. Most likely, the most practical first thing to do is to establish a team of geographic experts and task the team with creating a map of the country, so that you know your country’s geographical areas and can use its resources efficiently in future planning.
In similar fashion, if you have been chosen as the guardian of the company’s data, the SQL Server Database Administrator (DBA), the very first thing you should do is find out what SQL Server instances exist, their functions, and their specifications, such as version, edition, service pack, clustering, collations, databases, data files, log files, and so on.
If your boss picked a random database and asked you to describe the application using the database and which department is using it, could you give your boss the answer right away?
If you have the database inventory and documentation, you will be able to provide the answer rather quickly.
In addition, the inventory will also help with SQL Server license tracking. It gives you a view of the SQL Server builds, versions, and editions in use, which is the data you need to check your licensing compliance.
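To make the inventory idea concrete, here is a minimal sketch in Python. It assumes the inventory has already been collected by a discovery tool and exported into simple records; the `DatabaseRecord` fields, the server names, and the helper functions are all hypothetical and only illustrate the two questions above: "who uses this database?" and "which versions and editions are we running?"

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseRecord:
    """One row of a hypothetical, already-collected inventory."""
    instance: str      # SQL Server instance name
    version: str       # e.g. "2016"
    edition: str       # e.g. "Enterprise"
    database: str
    application: str   # application that uses the database
    department: str    # business owner of the application


# Illustrative rows only; a real inventory would come from a discovery tool.
INVENTORY = [
    DatabaseRecord("SQLPROD01", "2016", "Enterprise", "Sales", "OrderEntry", "Retail"),
    DatabaseRecord("SQLPROD01", "2016", "Enterprise", "Reports", "FlashReport", "Finance"),
    DatabaseRecord("SQLDEV01", "2014", "Developer", "SalesDev", "OrderEntry", "Retail"),
]


def who_uses(inventory, database):
    """Return (application, department) pairs for a database name."""
    return [(r.application, r.department) for r in inventory if r.database == database]


def instances_by_build(inventory):
    """Count distinct instances per (version, edition) for a licensing review."""
    distinct = {(r.instance, r.version, r.edition) for r in inventory}
    return Counter((version, edition) for _, version, edition in distinct)


print(who_uses(INVENTORY, "Reports"))
print(instances_by_build(INVENTORY))
```

The point is not the code itself but the shape of the data: once every database is tied to an instance, an application, and a department, both the "random database" question and the licensing view become simple lookups.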
Going back to our story about you becoming the king or queen of a new country (I mean president). Wouldn’t you want to know the most critical areas of your country, the best locations to build bridges and collect tolls, and which areas have the most resources? In other words, you want a map that gives you a clear view of your land. Similarly, as a DBA, it’s paramount to have clear documentation of which servers host the most critical databases and which have the most transaction activity.
Furthermore, a good server inventory can help you with server and database decommissioning strategies. It helps your team decide which servers or databases to take offline, saving license costs and maintenance effort.
Question for the audience: Have you had the fun experience of deciding whether or not to decommission a server? You run a trace and see activity coming from the application servers. But when you ask around, no one is using the applications you identified. What do you do? Well, one of the CTOs we worked with asked us to shut the server off and, if no one complained for two weeks, decommission it. Take the full backups, script out the jobs, permissions, and all the server objects, and then shut it off.
There are a good number of free and commercial tools to help you take inventory of your SQL Servers. I will share an example using the Microsoft Assessment and Planning Toolkit (MAP). Please note that this is one option, not the only option. The idea is to make sure you document and organize your server inventory.
Just as we have an annual check-up with our doctors because it is important to know our health and well-being, we need SQL Server health checks and assessments.
Unlike my doctor’s recommendations about my health, such as reducing my consumption of Krispy Kreme doughnuts and single malt whiskies, adjustments to our SQL Server instances can actually get done if you plan them correctly and provide justifications to the server owners or stakeholders. You need to outline the steps and changes required, get the required approvals, test them on the non-production servers, and then roll them out to the production instances after approval.
Question for the audience: There are so many areas to check. Which ones should you focus on?
Well, DBAs are responsible for the databases’ availability, recoverability, integrity, performance, and security, and these are the items that need to be the focus of the health check and assessment.
Availability:
Check the SQL Server status: online or offline. If a person is no longer breathing, there’s no need to continue checking their health. The priority shifts to getting the person breathing again before anything else.
Check the SQL Server Agent status: online or offline. If it is offline, none of the jobs can run.
Recoverability:
Check the database backups, or the lack thereof.
Know where the backups are stored for retrieval purposes.
Integrity:
Check the results of the last DBCC CHECKDB run.
Review the SQL Server error log for database integrity issues.
Performance:
Check the performance configuration best practices:
Memory configuration: min and max server memory
Max degree of parallelism and cost threshold for parallelism
Enable Optimize for ad hoc workloads
Check tempdb configurations
Check database file placements
Check the database autogrowth setting. If it’s set to 1 MB for a 100 GB database, it is in your best interest to change it.
Check index fragmentation and missing indexes
Check the top wait stats
Top 10 longest-running queries
Top 10 queries that use the most I/O
Security:
Check for the Principle of Least Privilege (PoLP): the practice of limiting access rights for users to the bare minimum permissions they need to perform their work.
No one should have more permissions than their work requires. The same goes for applications and people.
Check the job ownership. Story: On a Monday morning, more than 10 jobs failed on the production server because the SQL DBA who had created the jobs had left the company the week before. He had happily retired. IT disabled his AD account on Monday morning, so the jobs he owned failed. Your boss goes into panic mode because the Monday “Flash Report” job has failed. What do you do? Knowing that the job only reads data and can be re-run at any time, you change the job owner to SA and then re-run the job. Problem solved.
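Two of the checks above, the autogrowth setting and job ownership, are easy to express as rules. The following Python sketch assumes the file sizes, growth increments, and job owners have already been collected from the instance; the `DataFile` and `AgentJob` records, the sample account name, and the 0.1% ratio are illustrative assumptions, not official thresholds.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    database: str
    size_mb: int
    growth_mb: int   # fixed autogrowth increment, in MB


@dataclass
class AgentJob:
    name: str
    owner: str


def tiny_autogrowth(files, min_ratio=0.001):
    """Flag databases whose growth increment is under min_ratio of file size.

    The 0.1% default is an illustrative threshold, not an official rule.
    """
    return [f.database for f in files if f.growth_mb < f.size_mb * min_ratio]


def jobs_owned_by_people(jobs, service_owners=("sa",)):
    """Flag jobs whose owner is not in the approved list of service owners."""
    approved = {owner.lower() for owner in service_owners}
    return [j.name for j in jobs if j.owner.lower() not in approved]


# Sample data: a 100 GB database growing 1 MB at a time, and a job owned
# by a (hypothetical) personal AD account.
files = [DataFile("Sales", 100 * 1024, 1), DataFile("Small", 512, 64)]
jobs = [AgentJob("Flash Report", r"CORP\retired.dba"), AgentJob("Backup", "sa")]

print(tiny_autogrowth(files))       # databases with a too-small growth increment
print(jobs_owned_by_people(jobs))   # jobs that would fail if the owner is disabled
```

Running a rule like `jobs_owned_by_people` as part of every health check would have caught the "Flash Report" problem before the Monday morning failure rather than after it.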
Please note that it is not good enough just to do the health check. You must identify the actionable items and prioritize them in order to improve your health. The same goes for your SQL Server health check. You need to identify the settings and configurations to change and their priorities.
I have a personal story to share with you. I have the same excuses many of us do about going to the gym. No time. Too tired. Football is on, and then Game of Thrones is on. And so on and so forth. So I bought a Bowflex Xtreme 2 SE Home Gym. Another problem solved, I supposed. Every Monday, I would check my weight and write down the number of pounds. Then I would work out once in a while throughout the week. When the next Monday came, I would check my weight and write down the number of pounds again. Let me tell you something. Many Mondays, I was tempted to call the scale manufacturer to file a complaint about the inaccuracies. But my wife assured me that there was nothing wrong with the machine. The problem was with the user.
Just as I write down the number of pounds every Monday and compare it to the previous week and month, I suggest you do the same for your SQL Server health checks and assessments. Baseline and benchmark. Compare the numbers before and after your server changes. That’s the only way to find out whether your changes brought improvements.
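As a rough illustration of "baseline and benchmark," the sketch below compares a set of metrics captured before a change with the same metrics captured after it. The metric names and values are made up for the example, and the function assumes every baseline value is non-zero.

```python
def compare_to_baseline(baseline, current, lower_is_better):
    """Return {metric: (percent_change, improved)} for each baseline metric.

    Assumes every baseline value is non-zero. Metrics named in
    lower_is_better improve when they go down; all others improve
    when they go up.
    """
    report = {}
    for name, before in baseline.items():
        after = current[name]
        change_pct = (after - before) / before * 100.0
        if name in lower_is_better:
            improved = after < before
        else:
            improved = after > before
        report[name] = (round(change_pct, 1), improved)
    return report


# Hypothetical measurements taken before and after deploying new indexes.
baseline = {"avg_query_ms": 400.0, "batch_requests_per_sec": 1000.0}
current = {"avg_query_ms": 100.0, "batch_requests_per_sec": 1200.0}

report = compare_to_baseline(baseline, current, lower_is_better={"avg_query_ms"})
for metric, (change_pct, improved) in report.items():
    print(f"{metric}: {change_pct:+.1f}% ({'improved' if improved else 'regressed'})")
```

Keeping the baseline from each health check means the next one always has something to be compared against, just like the weekly weigh-in.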
Unlike your annual health check, you need to check your mission-critical servers more frequently: every month or every quarter, depending on the criticality and the health concerns.
Here’s an example of the SQL Server health checks that our team has worked on. If we take a look at the “Executive Summary”, we see the status of each check: configured per best practice, not configured per best practice, or further review required.
“An investment in knowledge always pays the best interest.”
-- Benjamin Franklin
The objective of knowledge transfer in our case is not to replace someone who’s retiring, but to expand the support team’s knowledge of the environment so as to prevent or avoid 3:00am call-outs to the Subject Matter Experts (SMEs).
It’s not good enough to know how to do it yourself. You need to share your knowledge with the support team so that you have around-the-clock, 24x7 support, unless you really want to get call-outs at 3:00am. We have all been there, waking up at 3:00am to triage a priority 1 problem. Wouldn’t it be nice if we had a team of DBAs for whom it’s actually 12:30pm when it’s 3:00am for you? They can follow the steps outlined, triage the priority 1 problem, and keep you updated through tickets. You walk into your office at 8:00am with your coffee in hand, review what happened, and follow up if needed, rather than waking up at 3:00am.
Develop a plan of action to capture that critical knowledge and a plan of action to transfer it. Knowledge transfer is not standing around the water cooler or having a one-time meeting to chat about the role, functions, etc. It is a purposeful and ongoing strategy with measurable results.
One of my favorite movie quotes is in the movie “Speed” where Dennis Hopper tells Keanu Reeves, “Pop Quiz Hotshot…there’s a bomb on the bus. Once the speed goes up 50 miles per hour the bomb is armed and if it drops below 50 it blows up. What do you do? What do you do?”
What if, in the real world, one of your mission-critical production instances goes down due to a power outage and you need to bring up your Disaster Recovery (DR) instance to meet your SLA? You have your boss, your boss’s boss, and the CEO calling your desk for answers. What do you do? What do you do?
Well, “Life is not always a matter of holding good cards, but sometimes, playing a poor hand well.”
If you have documented the Disaster Recovery (DR) SOP in your Playbook, you can remain calm and tell them that you and your team have documented the disaster recovery processes, that management agreed to and signed off on them, and that you will now activate the Disaster Recovery SOP and provide updates every 30 minutes. Wouldn’t that be nice?
The Playbook is a dynamic, living document. As processes change, you need to update and revise it. As you and your team discover more efficient methods of solving a problem, update it.
Run “fire drills” on critical processes, such as DR testing, monthly or quarterly as agreed with your management team. All teams involved, all steps, and all responsibilities must be clarified ahead of time. This is not a game or a movie. You must have an answer to whatever “Pop Quiz” they throw at you, or you will need to get your resume updated when you fail it.
Imagine you are the SQL DBA. It’s 3:00am, and you have received an alert and a call-out that a mission-critical production SQL Server instance is down. You have also noticed multiple email alerts pointing to a storage failure. Do you know who to contact, and at what phone number, to get the storage team or the system admin team to help troubleshoot the problem? If not, we recommend you start creating a Communication and Escalation Guide. Even if you do know who to call, we still recommend that you document the contacts and phone numbers and update them as needed.
This clarifies and sets clear expectations. Whoever is on shift will not be confused about what to do or which number to call, and whoever gets the phone call cannot complain or be surprised by it.
Responsible parties and contacts change often, and so do their phone numbers. Review the Communication and Escalation guides monthly or quarterly and update them whenever needed.
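A C&E guide is essentially a small, ordered contact list with a review date attached. As a minimal sketch, assuming a hypothetical `Contact` record with placeholder names and phone numbers, it could be modeled in Python like this: one function returns who to call and in what order, and another flags entries that have not been verified within the review window (90 days here, to match a quarterly review).

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Contact:
    team: str
    level: int            # 1 = first call, 2 = next escalation, ...
    name: str
    phone: str
    last_verified: date   # when the entry was last confirmed correct


def escalation_path(guide, team):
    """Return the contacts for a team in the order they should be called."""
    return sorted((c for c in guide if c.team == team), key=lambda c: c.level)


def stale_contacts(guide, today, max_age_days=90):
    """Return contacts not verified within max_age_days (quarterly by default)."""
    cutoff = today - timedelta(days=max_age_days)
    return [c for c in guide if c.last_verified < cutoff]


# Placeholder entries; names and numbers are illustrative only.
guide = [
    Contact("storage", 2, "Storage Manager", "555-0102", date(2018, 10, 1)),
    Contact("storage", 1, "Storage On-Call", "555-0101", date(2018, 6, 1)),
    Contact("sysadmin", 1, "SysAdmin On-Call", "555-0103", date(2018, 10, 15)),
]

for contact in escalation_path(guide, "storage"):
    print(contact.level, contact.name, contact.phone)

for contact in stale_contacts(guide, today=date(2018, 11, 8)):
    print("Needs re-verification:", contact.name)
```

Whatever form the guide takes, the two properties worth preserving are the same: an unambiguous calling order per team, and a visible date that tells you when each entry was last checked.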
Someone once told me:
“Emails need to be similar to dentist visits:
Short
To the point
Only when necessary”
For meetings, I have created similar rules:
1. Know your agenda.
2. Stick to your agenda.
3. No agenda no meeting.
“If you don't know where you are going, you'll end up someplace else.”
-- Yogi Berra
For meetings, you need to know what you want to discuss or else you will waste everyone’s time.
For technical status update meetings, we recommend you cover these 4 major items:
Major accomplishments since the last meeting
Escalations and open items
Scheduled or planned tasks for the coming weeks
Recommendations
Capital One commercials always end with the line “What’s in your wallet?” Here, I will ask “What’s in your meeting agenda?”
We can’t realistically expect DBAs to keep their eyes on their monitors at all times, looking out for job failures, low disk space, or service interruptions. You need a monitoring tool to do that and to notify you when a deviation occurs or a threshold is breached. In addition, a ticket should be opened automatically and assigned to a DBA, who triages and follows up to make sure the issue is resolved, with all progress and updates documented in the ticket.
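The flow described here (threshold breached, ticket opened, owner assigned) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the disk readings are taken as already collected by a monitoring tool, the `Ticket` record and DBA names are hypothetical, the 10% threshold is arbitrary, and round-robin is just one simple way to assign ownership.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Ticket:
    server: str
    summary: str
    owner: str
    updates: list = field(default_factory=list)  # progress notes go here


def open_tickets(disk_free_pct, on_call, threshold_pct=10.0):
    """Open one ticket per server whose free disk space is below the threshold.

    Tickets are assigned round-robin across the on-call DBAs so that
    every ticket has exactly one owner.
    """
    assignees = cycle(on_call)
    tickets = []
    for server, free in sorted(disk_free_pct.items()):
        if free < threshold_pct:
            summary = f"Disk space low: {free:.0f}% free"
            tickets.append(Ticket(server, summary, next(assignees)))
    return tickets


# Hypothetical readings from a monitoring tool (percent of disk space free).
readings = {"SQLPROD01": 2.0, "SQLPROD02": 45.0, "SQLPROD03": 8.0}
tickets = open_tickets(readings, on_call=["dba_a", "dba_b"])

for ticket in tickets:
    print(ticket.server, ticket.owner, ticket.summary)
```

With something like this in place, the server at 2% free disk space becomes a ticket owned by whoever is on shift, rather than a 3:00am phone call to David.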
Once the problem is resolved, it is important that we create new knowledge base articles or update the playbook based on newly discovered, more efficient ways to resolve specific problems. Not all problems have solutions. In those cases, we need to agree on a workaround and document it so that the whole team understands the expectations for future troubleshooting.
Technology changes constantly and consistently. The only way to keep up is to constantly and consistently improve your processes and environment.
As per Charles Darwin "It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change."
We must also adapt to changing business requirements and constantly improve in order to survive, in addition to staying up to date in our support of our critical database environment.
Managing multiple divisions and technologies with a mix of FTEs and contractors to cover 24/7 operations
•Processes
•Documentation
•Clarity
•No assumptions
•Ticketing systems
•Alerts
•Escalations
DV, we have a problem…
•A 5-hour outage sparked the decision to get the heck off those servers and storage
•Migrate and upgrade 3 major clusters with 5 instances and 15 databases to new configurations and versions
•Seamless restart of operations, including 2 TB of data thru SQL replication to be rebuilt from scratch
•Just a weekend
•The “Oh no, dang” moment when you realize you lost critical pieces or a critical DB @ 3 AM Sunday morning…
•You know what to do, but you need fresh and skilled resources to support/validate (and reassure higher management about) the recovery process
•The “insurance” that you want when a hurricane strikes, even if only once every 10 years
•The main conversation is around the improved rapport with other teams, who are used to having a dedicated and familiar face during critical deployments and times.
•Shared + remote works when an organization is built around the same concept. Established organizations where FTE-only was the basic model have difficulty accepting shared resources.