2. DBA and performance tuning project
One of the biggest problems we face at Mimecast is system performance. As our Data
Warehouse grows rapidly and becomes more and more complex, it is difficult to
monitor and control performance. Some procedures took a long time to execute,
which increased the total time needed to refresh the data. We had errors caused by lack of
memory and by hard disk I/O contention (latches). Users reported that dashboards took a
very long time to load. The core problem was that we had almost no insight into the issues
we were facing in production. As database administration was part of my duties, I initiated a
performance tuning project to develop a set of procedures and extended events that
collect various system counters, including CPU, memory and disk usage, store this
information in a database, and analyse it later with SSRS reports and dashboards in
Tableau to get a complete performance picture of the SQL Server environment.
We found that the performance issues can be divided into three main groups:
1. Tableau performance
2. SQL Server performance
3. Network performance
3. Tableau performance
Tableau Data Extract project
Most of our dashboards are very complex and highly interactive, and all of them were live-connected to our SQL Server. As
Tableau is only as fast as its data source, we needed to decide whether a live connection is the best access method for our
dashboards, or whether to replace live connections with Tableau Data Extracts (TDE).
I developed a C# script that generates TDE files using the Tableau Data Extract API. The files are then published to the Tableau
server and all data sources using Tableau extracts are refreshed. This resulted in a significant improvement in dashboard load
time.
Tableau Availability project
Our Tableau data is replicated across different Tableau servers, but due to the high cost of the licence for Tableau's failover solution, we
needed an in-house, cost-effective custom failover solution, without compromising quality, that allows us to switch
between our Tableau servers automatically if one of them fails. This solution keeps our Tableau
data safe and our BI dashboards available even in the event of a major failure. We also manage Tableau backups,
making it easy for us to restore when needed, including point-in-time recovery.
The solution I developed includes the following components:
A nightly backup and maintenance job on the primary server
Restore of the primary's backup on the secondary server
A script that starts the secondary Tableau server when the primary goes offline or becomes unreachable
I suggested the following future developments for this subproject:
Create a landing page on the corporate portal/website from which all Tableau users are redirected to the active server.
Consider a replication solution for the PostgreSQL database underlying Tableau.
Introduce a VM availability check by pinging the remote Tableau server and running PowerShell
remote commands if the VM is not available.
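The health-check-and-switch logic suggested above can be sketched roughly as follows. This is a minimal sketch, not the actual PowerShell solution: the host name and start command are hypothetical placeholders, and a production version would add retries and alerting.

```python
import socket
import subprocess

PRIMARY_HOST = "tableau-primary.example.com"  # hypothetical host name
SECONDARY_START_CMD = ["powershell", "-Command", "Start-Service TableauServer"]  # placeholder

def is_reachable(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def failover_if_primary_down(primary_up: bool) -> str:
    """Decide which server should serve traffic based on the health check."""
    if primary_up:
        return "primary"
    # Primary unreachable: start the secondary Tableau server.
    # subprocess.run(SECONDARY_START_CMD, check=True)  # side effect, left commented here
    return "secondary"
```

In practice this would run on a schedule (e.g. a SQL Agent or cron job), with the reachability result feeding the failover decision.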
4. SQL Server performance
Suggestions I proposed for the three main bottleneck areas:
Execution plans
Optimise execution plans
Review usage of existing indexes and introduce new indexes
Introduce natively compiled procedures (CLR)
Memory
Use SSIS, as it manages its own memory more efficiently
Improve tempdb performance by storing it on flash storage and by using the 'sort in tempdb' index option
Introduce in-memory OLTP tables
I/O (latches)
Introduce partitioned views, with the underlying physical fact tables stored in different filegroups on different disks
As part of this project, I completed a subproject called SSIS Lineage. The result was a new version of the SSIS event handlers, which allows us to capture detailed
audit information about each package execution.
I also developed Business Intelligence documentation, which includes:
• DWH documentation (stage, DWH tables, views)
• Processing logic (stored procedures)
• SSAS documentation (cubes, measure groups, dimensions)
5. Integration solution
When I joined Mimecast, they needed to improve the integration stage of their BI system. Massive data requirements put
significant strain on our ability to load and process data quickly and created operational bottlenecks around data
loading and processing. As these operational challenges had a direct impact on the downstream Data Warehouse, we decided
to replace the existing integration approach with a new one based on vertical data partitioning schemes.
The solution I developed provides real-time integration with 305 tables of our source system, NetSuite. Apart from
partitioning, it uses MERGE statements, the Slowly Changing Dimension transformation and a SQL Agent job scheduled
in an indefinite loop.
By implementing the integration solution I developed, we made significant productivity gains in
data loading and maintenance, which allowed us to shorten the refresh time from 25 to 10 minutes. The scalability and
reliability of the integration solution enabled us to meet our data management challenges. The solution also keeps the
staging data available: stage tables are no longer locked for reading, because loading
data with the partition-switching technique is a metadata-only operation, which completes practically instantly.
Measures I suggested for future development to improve staging integration:
Set the SQL Server memory limit to unlimited (let SQL Server use all available memory)
Replace lookup tasks with MERGE statements based on the Kimball methodology
Redevelop existing SSIS packages to use a streaming approach instead of a memory-based one, by replacing data flow tasks
with ones that process the data flow row by row rather than loading the whole data set into the buffer
Replace the SQL Server Destination step in Data Flow tasks with the OLE DB Destination
Rewrite the stored procedure that loads fact tables into the DWH: remove the CurrentRecord column from the clustered index
and include it in a non-clustered index instead, to eliminate the 'Halloween effect', which significantly degraded
Data Warehouse performance
Use alternative methods to download data into the staging phase (RESTlets)
6. Data Warehouse Solutions - Service Delivery Dashboard
I was involved in several BI projects in which I developed Data Warehouse solutions to expose data (data marts) in Tableau
dashboards. Developing a solution for a dashboard involves preparing specific datasets that demonstrate specific
concepts. A Tableau developer then uses this dataset (data mart) to create proof-of-concept dashboards for
clients and management. The general requirement for the data marts I had to develop was to ensure the data is easy to use yet
functionally rich. To meet this requirement, a Business Intelligence developer needs to understand the complex
underlying data: although we have all the data from our source system in our Data Warehouse, turning this data into
meaningful business information presented challenges.
One of the projects I was involved in was the Service Delivery Dashboard. One challenge I faced during this project was
calculating the First Time Response metric - the number of business hours between the 'Case opened' time and the 'Response
sent from Mimecast' time. The metric should take into account the customer's local time, including daylight saving, weekends
and public holidays. However, the timestamps captured in our source system, NetSuite, are in GMT. I had to
develop a solution that converts GMT time to customer local time and calculates the number of business hours
between request and response.
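The core of the calculation can be sketched as follows. This is a minimal sketch under assumptions: the 09:00-17:00 business window is a hypothetical example, and the real solution also handled per-country daylight saving rules and territory-level holidays from the scraped reference data.

```python
from datetime import datetime, timedelta, date
from zoneinfo import ZoneInfo

BUSINESS_START, BUSINESS_END = 9, 17  # 09:00-17:00 local; assumed business window

def business_hours(opened_utc: datetime, responded_utc: datetime,
                   tz: str, holidays: set[date]) -> float:
    """Business hours between two UTC timestamps in the customer's local zone."""
    local = ZoneInfo(tz)
    start = opened_utc.astimezone(local)   # zoneinfo handles daylight saving
    end = responded_utc.astimezone(local)
    total = 0.0
    day = start.date()
    while day <= end.date():
        # Skip weekends and public holidays.
        if day.weekday() < 5 and day not in holidays:
            day_open = datetime(day.year, day.month, day.day, BUSINESS_START, tzinfo=local)
            day_close = datetime(day.year, day.month, day.day, BUSINESS_END, tzinfo=local)
            lo, hi = max(start, day_open), min(end, day_close)
            if hi > lo:
                total += (hi - lo).total_seconds() / 3600
        day += timedelta(days=1)
    return total
```

A case opened at 08:00 UTC and answered at 16:00 UTC on a London winter weekday would count 7 business hours (09:00 to 16:00 local).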
The solution I developed involves extracting data by web scraping various public websites that provide time zone,
daylight saving and public holiday information. I used the versatility of the Ruby language with the Nokogiri library and developed a
script that performs all the data collection tasks. I then import the data into the relational database with an SSIS C# script
component and perform various cleansing and conversion transformations.
Challenges I faced developing this solution:
• Some holidays are not nationwide and are observed only in selected states/territories.
• Customer location data could mostly be cleaned only manually.
• Information on daylight saving time was missing for some countries.
• I had to apply advanced business logic, such as using the secondary address when the registration address is not the actual
office address, for companies registered in areas with special tax regimes (BVI, Jersey, Cayman Islands, etc.).
7. Data Warehouse Solutions - Sales Enablement KPIs & Lead Indicators Dashboard
I selected and developed a unified view with all the Lead Indicators used in the Sales Dashboard.
The entities I suggested using as Lead Indicators:
Demos
Webexes
Pipeline movements
Targets
Meetings
Activities
CX Activities
Deal Registration
Incentives
Opportunities
Quota
RAMPACT
Sales Forecast
Sensitive Connects
Survey Pre Sales
The solution I developed included the following components:
stage and DWH tables
switching-partition infrastructure (function and scheme)
SSIS packages
stored procedures which implement the business logic and load transformed data into the data mart
8. Data Warehouse Solutions - GTM Dashboard (Go-To-Market Marketing Dashboard)
The goal of this dashboard is to provide multi-level visibility into the efficacy of Mimecast’s
field and group marketing efforts, and to attribute pipeline creation and won business to the
correct source. The GTM Dashboard enabled users to make informed decisions around marketing
campaigns and focus, as well as providing KPIs to track the efficiency and effectiveness of
Mimecast's marketing against agreed targets.
My role in this project was to develop a Data Warehouse solution, which consists of 2
sections:
Campaign Analysis
Pipeline Creation Waterfall
One of the challenges I faced during this project was to implement AMT (Attributable
Marketing Touch) model, which is an attribution model that allows a 90-day window after a
marketing campaign during which any pipeline creation is attributed to that campaign. There
is no weighting of deals between multiple campaigns, so single opportunities can be
attributed to multiple campaign touches if they all fall within the 90-day window. In addition,
marketing touches 7 days after opportunity creation are given credit due to current
process/timing issues with marketing data imports.
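The AMT crediting rule described above can be sketched as follows. This is a minimal sketch of the date logic only: field names are hypothetical, and the reading of the late-touch rule as "up to 7 days after opportunity creation" is an assumption.

```python
from datetime import date, timedelta

ATTRIBUTION_WINDOW = timedelta(days=90)  # touch -> pipeline creation window
GRACE_PERIOD = timedelta(days=7)         # late-arriving touches still credited (assumed reading)

def attributed_touches(opportunity_created: date, touches: list[date]) -> list[date]:
    """Return every marketing touch credited for this opportunity.

    There is no weighting: one opportunity can credit multiple touches."""
    credited = []
    for touch in touches:
        in_window = touch <= opportunity_created <= touch + ATTRIBUTION_WINDOW
        late_touch = opportunity_created < touch <= opportunity_created + GRACE_PERIOD
        if in_window or late_touch:
            credited.append(touch)
    return credited
```

For an opportunity created on 1 June, a touch from 10 March (83 days earlier) and a touch on 5 June (within the grace period) are both credited, while touches from 1 January or 15 June are not.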
In addition, I needed to make sure that marketing history data was de-duplicated by campaign
and date, so that multiple contacts from the same lead attending the same event on the same
day were counted as a single marketing touch.
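This de-duplication rule amounts to keeping one row per (lead, campaign, date) key; a minimal sketch, with hypothetical field names standing in for the actual warehouse columns:

```python
def dedupe_touches(rows: list[dict]) -> list[dict]:
    """Keep one marketing touch per (lead, campaign, date) combination."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["lead_id"], row["campaign_id"], row["touch_date"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```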
9. Data Warehouse Solutions - Opportunities Dashboard project
I developed a Data Warehouse solution which meets the business requirements for calculating
conversion rates (the sales funnel): for new business pipeline created in a user-selected time period,
conversion from Lead to Opportunity Creation, from Opportunity Creation to Qualification, and from
Qualified Opportunity to Won Business is calculated for each strand and overall.
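The funnel calculation reduces to stage-over-stage ratios; a minimal sketch (the stage names follow the text, the counts are illustrative):

```python
def conversion_rates(counts: dict[str, int]) -> dict[str, float]:
    """Stage-to-stage conversion rates along an ordered sales funnel."""
    stages = ["Lead", "Opportunity Created", "Qualified", "Won"]
    rates = {}
    for prev, cur in zip(stages, stages[1:]):
        rates[f"{prev} -> {cur}"] = counts[cur] / counts[prev] if counts[prev] else 0.0
    return rates
```

With 200 leads, 100 opportunities, 50 qualified and 10 won, the three rates come out as 0.5, 0.5 and 0.2.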
One of the biggest challenges I encountered during this project was developing a custom
integration solution based on NetSuite's saved search capabilities, using RESTlets, a web service
and XML processing in C# script components. To meet the user requirements for this
dashboard, I needed to bring in data joined by our source system, NetSuite. After profiling the data, I
realised that the NetSuite tables in our Data Warehouse could not be used, as they lack the necessary
keys. I developed a custom integration solution using so-called 'saved searches', which can be
called from a C# script component in SSIS; the result is then processed as XML and loaded into our
Data Warehouse.
Another challenge associated with this project was developing a stored procedure
implementing data processing based on very complex business logic.
For example, the Highest Active Status metric for Closed Lost Opportunities needs
to be the stage the opportunity was at prior to being lost.
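That rule can be sketched as follows. This is a minimal sketch: the stage names and their ordering are hypothetical, and the actual logic lives in a T-SQL stored procedure rather than Python.

```python
STAGE_ORDER = ["Lead", "Opportunity", "Qualified", "Proposal", "Closed Lost"]  # assumed ordering

def highest_active_status(stage_history: list[str]) -> str:
    """For a closed-lost opportunity, return the highest stage it held prior to being lost."""
    active = [s for s in stage_history if s != "Closed Lost"]
    # Rank stages by their position in the assumed funnel order.
    return max(active, key=STAGE_ORDER.index)
```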
Within this project, I also developed an Opportunities OLAP cube, which was used to create
dashboards in Tableau.
10. SAM integration and Data Warehouse solutions
The aim of the project was to make the best use of the company's existing operational data, held
in silos within SAM, a large internal operational PostgreSQL database containing detailed
network and operations records on delivered services (pre-Big Data). The idea of using operational
data for sales intelligence was not new: Mimecast had wrestled with this problem for almost ten years
and recognised the need to develop a BI solution to bring this data into the Data Warehouse.
I developed an integration and Data Warehouse solution that brings datasets from SAM into the Data
Warehouse and processes them for use as data sources in various Tableau dashboards. One of the
requirements was to ensure that the solution was scalable enough to incorporate further information
from the source system.
The challenges I faced during this project:
converting datetimes parsed from varchar into datetime format depending on the
server, as our development and production servers span different continents (US and UK date
formats)
working around buffer limits in SSIS when downloading large amounts of data, by running multiple
queries selected by id or timestamp
using open source PostgreSQL drivers in C# script components
using listeners/switches in a C# script component during the SAM migration, to download data
from the new schema once it is ready
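The buffer-limit workaround above amounts to splitting one large query into id ranges and issuing one query per range; a minimal sketch of how the range boundaries can be derived (the chunk size is illustrative):

```python
def chunk_ranges(min_id: int, max_id: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [min_id, max_id] into inclusive id ranges, one per downstream query."""
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk_size - 1, max_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges
```

Each (lo, hi) pair then parameterises a `WHERE id BETWEEN lo AND hi` query, keeping any single result set within the SSIS buffer limits.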
11. Integration solution for Corporate Goals Dashboard
I developed an integration and Data Warehouse solution that processes SAM data and
prepares it to be exposed to Tableau.
The challenges I faced during this project:
a complex integration solution to download nearly 'Big Data'-sized volumes (PostgreSQL) in the most
efficient way
complex processing logic: flattening, ranking and pivoting
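The ranking and pivoting steps can be shown in miniature. This is a minimal sketch with a hypothetical row shape; the actual transformations ran in the warehouse, not in Python.

```python
from collections import defaultdict

def rank_and_pivot(rows: list[dict]) -> dict[str, dict[str, int]]:
    """Pivot (entity, metric, value) rows into one mapping per entity,
    ranking each entity's metrics by value (1 = highest)."""
    by_entity = defaultdict(dict)
    for row in rows:
        by_entity[row["entity"]][row["metric"]] = row["value"]
    ranked = {}
    for entity, metrics in by_entity.items():
        order = sorted(metrics, key=metrics.get, reverse=True)
        ranked[entity] = {m: order.index(m) + 1 for m in order}
    return ranked
```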
12. Excel integration solution
I developed an integration solution which provides seamless integration of Excel files
containing various manual adjustments into the Data Warehouse.
The solution consists of the following steps:
Uploaded Excel files are checked for integrity.
A user is then emailed the status of the Excel file, indicating whether processing was successful. If
the integrity test failed, the email includes details about the errors - the column in the case of a data
type error, the name of the worksheet if it is missing, etc.
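The integrity check can be sketched as follows, over rows already parsed out of the workbook. This is a minimal sketch: the expected columns and types are hypothetical, and the real solution reads the Excel file itself and emails the result.

```python
EXPECTED = {"Account": str, "Adjustment": float}  # assumed template columns

def check_rows(rows: list[dict]) -> list[str]:
    """Return one human-readable error per integrity violation (empty if clean)."""
    errors = []
    for i, row in enumerate(rows, start=2):  # row 1 is the header row
        for column, expected_type in EXPECTED.items():
            if column not in row:
                errors.append(f"row {i}: missing column '{column}'")
            elif not isinstance(row[column], expected_type):
                errors.append(f"row {i}: column '{column}' has wrong data type")
    return errors
```

An empty result means the file passes; otherwise the error list becomes the body of the status email.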
The challenges I had during this project:
It is not possible to query Excel from T-SQL via a linked server with the 64-bit driver.
If a File Task is placed after a Data Flow using an Excel connection in a control flow, we get the
error "The file is used by another process". To tackle this issue, I separated the SSIS packages into
different SQL Agent job steps.
An additional data validation layer was introduced in the Excel template to eliminate possible
errors at the data entry level. The validation is based on conditional formatting using advanced
Excel formulas.
13. Merging (de-duplicating) project
I developed an integration and Data Warehouse solution as part of a merging project
which aimed to eliminate NetSuite duplicate records.
Challenge: developing a complex reconciliation stored procedure to detect any deleted,
non-active or invalid customer and contact records.
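At its core, the reconciliation compares key sets between the source and the warehouse; a minimal sketch of that comparison (the actual procedure is T-SQL and also applies validity rules):

```python
def reconcile(source_ids: set[int], warehouse_ids: set[int]) -> dict[str, set[int]]:
    """Flag records deleted in the source and records missing from the warehouse."""
    return {
        "deleted_in_source": warehouse_ids - source_ids,
        "missing_in_warehouse": source_ids - warehouse_ids,
    }
```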
Price report in SSRS
Challenge: the data needed flattening, and the report was based on multiple nested groups.