Webinar: If Your Data Could Talk, What Story Would It Tell? Would It Be a Documentary, a Thriller, or a Horror Story?
QueBIT Consulting
Slide 3
Agenda
Introduction to QueBIT
What’s your story?
Identifying Common Pitfalls
How to capture, process, and store data
How to rationalize key reference data
Reporting vs. Analytics
Use cases from the field
Q&A
Slide 4
Housekeeping
Today’s webinar is part of an advanced webinar series offered by QueBIT.
Our next webinar is scheduled for Thursday, December 13th at 2pm Eastern.
We will demonstrate PAx/PAW Tips and Tricks and the benefits of upgrading.
Register today by accessing the Events page on our website at
quebit.com/news-events
Miss a past webinar? No problem! Visit the Resources page on our website
//quebit.com/who-we-are/video-catalog/
Please type all questions in the Questions Pane located on the GoToWebinar toolbar.
As time permits, the questions will be addressed and answered at the end of
the webinar.
Slide 5
900+ Successful
Implementations
400+ Customers in
numerous industries
100+ Employees
across the US
Award-winning leaders in
analytics, 6 years in a row
QueBIT is the leading end-to-end analytics provider.
Slide 7
“84% of CEOs are concerned about the data
they’re basing their decisions on.”
- Forbes Insights and KPMG
Slide 8
Who are your target audiences
and
what is your message?
Government Agencies: SEC, IRS, EPA
Investors: Growth Expectations
Executives and Management: New Market Entry, Product Growth
Customers: Service Usage
Employees: Local Market Performance
Other Stakeholders
Slide 9
Common Pitfalls
• Your story is…unclear
• Your audience is…not well-defined
• Your data is…poorly structured
• You have no…authoritative source(s)
• Your solution is built on…“quick wins”
Slide 10
Your story is unclear
• Lack of requirements
• Uncommitted or no project sponsor
• Incomplete vision
Slide 12
Poorly Structured Data (a Frankenstein solution)
• Copy of transactional structures
• Little or no integration
• Structured when unstructured is needed
Slide 14
“Quick Wins”
• “Dirty” outlives “Quick”
• Cost to “fix it” later
• No plan for follow-up project
• “No Plan’s Land”
• Follow-up/through
Slide 15: Trusted Experts in Analytics
Business Sponsorship
Capture, Process, Store (ETL)
Rationalization
Reporting vs Analytics
Get these right and you can avoid the
pitfalls…
Slide 16
Business Sponsorship
• What are you building?
• Who is funding it?
• When do they need it and why?
• How often do they need it?
• Are they committed?
Slide 17
Capture, Process, Store
(a.k.a. ETL)
Capture
The right information at the right time
Process
The information is integrated
Store
The information is available and fit for purpose
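The three stages can be sketched as a minimal pipeline. This is an illustrative sketch only; the CSV source, the `customers` table, and the enterprise-ID scheme are hypothetical, with SQLite standing in for whatever database you actually store to:

```python
import csv
import sqlite3
from datetime import date

def capture(path):
    """Capture: pull raw records from a source extract (here, a CSV file)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def process(rows):
    """Process: integrate by assigning an enterprise-wide identifier
    and standardizing key fields."""
    for i, row in enumerate(rows, start=1):
        row["enterprise_id"] = f"CUST-{i:06d}"     # hypothetical ID scheme
        row["name"] = row["name"].strip().upper()  # standardized form
    return rows

def store(rows, db_path):
    """Store: land the integrated records where reporting tools can reach them."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS customers (
        enterprise_id TEXT PRIMARY KEY, name TEXT, loaded_on TEXT)""")
    con.executemany(
        "INSERT OR REPLACE INTO customers VALUES (?, ?, ?)",
        [(r["enterprise_id"], r["name"], date.today().isoformat()) for r in rows])
    con.commit()
    con.close()
```

The point of the separation is that each stage can change independently: a new source only touches capture, a new matching rule only touches process.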
Slide 18
Rationalizing key data
• Reference data
• Data Stewards
• Standardize before matching
• Not a one-time process
Slide 19
Reporting vs. Analytics
Reporting
• Descriptive
• Backward-looking
• Raises questions
• Data into information
• Reports and dashboards
• What is happening
Analytics
• Prescriptive
• Forward-looking
• Answers questions
• Information into actionable Insights
• Findings, predictions,
recommendations
• Why is it happening
Slide 20
Use Case 1
Situation:
• Company XYZ’s finance department is using flat files from multiple ERP
systems in order to create monthly financial statements.
• They want I.T. to build them a data warehouse by connecting to the ERP
systems directly and extracting the data into the database so that they can
have a one-stop shop for their reporting source.
Slide 21
Use Case 1
Concerns:
• Which ERP systems take priority in data accuracy in case of conflicts?
• Is connecting to the ERP systems directly the most efficient and effective
way to get the data into a database?
• Does the finance department realize that this is not necessarily a data
warehouse, but rather more of a data store that stores this finance data from
multiple sources?
Slide 22
Use Case 1
Resolution:
• Flat file extracts. These aren’t a problem if your update frequency
doesn’t need to be near real time.
• Set a longer-term goal to build out a data warehouse.
• This builds on the idea of using reporting software on top of the
data warehouse to build the finance reports.
Slide 23
Use Case 2
Situation:
• ABC Music Services collects royalties for song writers. They currently store
their music catalog and performance data in a legacy mainframe database.
• They would like to create a cloud-based portal to allow their writers to be
able to see which songs in their catalogs play when, by which service, by
geographic location. The music catalog does not change frequently. The
performance information is captured daily from many different sources in
many different formats.
Slide 24
Use Case 2
Recommendation:
Star-Schema in a relational database.
Why?
Relational database systems support star schemas well. The problem
statement describes the need to query by writer, by song, by time, by
which service, by geography. This is a natural fit for a star schema.
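As a hedged sketch of what that star schema might look like (table and column names are illustrative, with SQLite standing in for the actual relational database):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimension tables: slowly changing reference data
CREATE TABLE dim_song    (song_id INTEGER PRIMARY KEY, title TEXT, writer TEXT);
CREATE TABLE dim_service (service_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, play_date TEXT);
-- Fact table: one row per performance event, keyed to the dimensions
CREATE TABLE fact_performance (
    song_id    INTEGER REFERENCES dim_song(song_id),
    service_id INTEGER REFERENCES dim_service(service_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    region     TEXT,
    play_count INTEGER
);
""")

# A typical portal query: plays by writer, by service, by date
query = """
SELECT s.writer, sv.name, d.play_date, SUM(f.play_count)
FROM fact_performance f
JOIN dim_song    s  ON f.song_id    = s.song_id
JOIN dim_service sv ON f.service_id = sv.service_id
JOIN dim_date    d  ON f.date_id    = d.date_id
GROUP BY s.writer, sv.name, d.play_date
"""
```

Every question in the problem statement ("by writer, by song, by time, by service, by geography") maps to either a dimension table or an attribute on one, which is why the fit is natural.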
Slide 25
What we covered…
• What’s your story?
• Identifying Common Pitfalls
• Business sponsorship
• How to capture, process, and store data
• How to rationalize key reference data
• Reporting vs. Analytics
• Use cases from the field
Slide 26
Data. It’s what we do.
QueBIT designs and builds sustainable analytic solutions that enable you to
unravel the full power of your data. We work with you to understand your
business and technology needs and deliver short and long term projects. We
cater to companies of all sizes and industries and have solutions and offerings
that can work for you.
Data Focus Areas:
Data Integration & Warehousing
Data Governance
Data Quality
Data Architecture
Slide 27
Q&A
Submit questions by typing them in the Questions Pane
on the GoToWebinar toolbar
Visit our website for additional information www.quebit.com
Or email us at info@quebit.com
Thank you for attending and have a wonderful day!
Editor's Notes
If your data could talk, what story would it tell? Would it be a documentary, a thriller, or horror story?
QueBIT cares passionately about helping good companies become great, through our analytics solutions and service offerings.
We advise clients on analytics best practices and embed analytics into every day business processes. We also develop and implement our own analytics products and solutions, and resell analytics software which together with our own solutions, help us deliver results for our clients.
Client success is at the heart of our mission. We listen to our clients and intensely focus our efforts on solving their business challenges. We hire the best talent we can find, we listen to our employees, and tirelessly work to create a culture of openness, collaboration, fun and opportunity. We believe that passionate, motivated and talented people enable success for our clients.
And we pride ourselves on being the leading end-to-end business analytics provider.
Thanks Mike! Welcome to the QueBIT Data Management webinar: “What’s Your Story?” My name is Keith Hollen and I’ll be presenting this part of the webinar.
According to Forbes Insights and KPMG, “84% of CEOs are concerned about the data they’re basing their decisions on.” That’s a powerful statement. Only 16% of CEOs are comfortable with the data they use every day to run their businesses. With all the technology available to us today, you would think this wouldn’t be a problem. Yet we still find that companies are struggling to provide quality data in a reliable and timely manner. Today we’ll talk about why this happens and discuss some things you can do to help your CEOs be more comfortable with the information you provide.
When you think about what you do as an organization, what’s your story? Who is your target audience? You likely have multiple stories for different target audiences.
For instance, you probably report to one or more government agencies, such as the SEC, IRS, or EPA.
You have a story for your investors. For example, Growth Expectations.
Internally, you may have a story for your Executives, mid-level and line management about New Market entry and Product Growth.
What about your customers? Perhaps Service or Product usage
And your employees may be interested in local market performance.
Some, or all of these, are potential audiences for your story.
When you think of each of these, what do you want to say? Do you have the information necessary? Do you trust that information? Is it up-to-date?
In the end, to tell your story, you need the right information, in the right place, at the right time.
So how do you get there? During this webinar, we’ll discuss common pitfalls, key concepts you need to know to avoid these pitfalls, and we’ll follow up with some real-world use cases.
Let’s now talk about some common pitfalls which lead to untrusted data.
Your story is unclear,
your audience is not well-defined,
your data is poorly structured,
you have no, or multiple, authoritative sources,
your solution is built on “quick wins”.
You really don’t have a clear definition of the picture you’re trying to paint. This is usually caused by lack of requirements and/or uncommitted project sponsors
Perhaps you’re building a data warehouse with the “Build it and they will come” philosophy or you’re getting requirements second hand. This approach often results in a data repository that doesn’t get used.
Your data repository is a collection of copies of transactional system tables. This might be fine for operational reporting, but this isn’t a data warehouse pattern.
Data living in the same house is not a data warehouse. If it isn’t integrated, it’s just a bunch of data…
You also may be using the wrong structures for the type of reporting you’re doing.
You don’t have a single source of truth. Common in situations where multiple systems are used – ERP example.
We need it now and we’ll fix it later. A.K.A. “Quick and Dirty”.
The problem is that the dirty long outlives the quick and you never have time to go back and fix it.
However, we all run into this situation. Usually because someone with a higher pay grade has made a commitment. But it could also be a time-to-market issue.
So what do you do? Mitigate this by having a budget and a plan to fix this at a future time.
This way you can better manage expectations and your chances of actually fixing what you put in are higher than if you take a “someone else will fix it” approach.
We’ve identified some common pitfalls and some of the reasons they may occur. Now let’s talk about four things you need to do or understand to avoid or mitigate these pitfalls.
Business Sponsorship – Make sure you have it. Someone needs to provide the vision or the blueprint.
Capture, Process, Store (ETL) – Think through the process. Build for flexibility. Keep it as simple as possible.
Rationalization – Identify key concepts and unify them across systems.
Reporting vs Analytics – Understand the difference. They often require different structures.
This is the most important thing you need to have in order to build trusted data solutions.
Why do you need a business sponsor? Well, ask yourself this: “Do you understand the business case?” How can you build a solution without knowing the problem you’re trying to solve?
A business sponsor should be able to answer the questions:
What are you building?
Who is funding it?
When do they need it and why?
How often do they need it?
Are they committed?
Capture – This is where you make sure you have the right information at the right time.
Can the information be a day old or does it need to be current as of five minutes ago? Do you need history? How far back? Can you realistically meet the expectations? If not, should the project continue?
Process – This is where you make sure the information is integrated. Keep it simple, but build for flexibility. This is where it’s important to assign an enterprise-wide identifier to each concept. I’d recommend you create one even if you have a single system of record for a particular domain.
Store – At this stage, you need to make sure the information is available in the right format and fit for purpose; not all reporting and analysis methods use the same structures.
Reference Data
What do I mean by “Rationalizing Key Data?” We’re really talking about uniquely identifying reference data. Think of it this way: When you ask for information to make decisions, you’ll ask for something like “Show me sales for customer X by product, quarter, and sales person”. In this example, customer, product, quarter, and salesperson are all reference data. This information is captured as part of a sales transaction.
Transactions exist, or they don’t. You’re simply recording an event. But reference data is what ties it all together. If you don’t get this right, you won’t have true integrated reporting. Especially if you record sales across multiple systems. You need a way to uniquely identify reference information, such as customers, products, locations, and sales people, across all your systems if you want an accurate picture enterprise-wide.
Data Stewards
Who does this? Somebody has to be responsible for defining rules for rationalizing this data. This responsibility falls on the Data Stewards. Data stewards are the people who define which reference data is needed for a domain and in what format. They’ll define required key attributes such as contact methods and demographic information. They’ll also work with others to determine techniques for matching this information from different source systems.
So, who should fill the data steward role? It should generally be a business person close to, or responsible for, the entry point of the data. The person in this role should also be recognized as the authority on the reference data for their domain.
Once the matching techniques have been defined, much of this can be automated. However, this is not a one-time process and the matching techniques used should be reviewed periodically to validate effectiveness. Even with automation, not everything will be matched with a high level of confidence. Data stewards are responsible for managing the processes, and for the manual review of these edge cases.
Standardize before matching – Pick a technique for standardizing names. You can store the standardized name as additional information or have a reference to the standard form. You should do the same for addresses as well.
And remember, this is not a one-time process.
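A minimal sketch of “standardize before matching” in Python. The suffix list and rules shown are illustrative examples only; in practice the rules come from your data stewards:

```python
import re

# Example legal-suffix list; real rules come from the data stewards.
SUFFIXES = {"INC", "INCORPORATED", "CORP", "CORPORATION", "LLC", "LTD"}

def standardize(name):
    """Produce a standard form: uppercase, strip punctuation and legal suffixes.
    Store this alongside the raw value, or reference a shared standard form."""
    tokens = re.sub(r"[^\w\s]", "", name.upper()).split()
    tokens = [t for t in tokens if t not in SUFFIXES]
    return " ".join(tokens)

def match(name_a, name_b):
    """Match on the standardized forms, never on the raw strings."""
    return standardize(name_a) == standardize(name_b)
```

With this, “Acme, Inc.” and “ACME Corp” both standardize to “ACME” and match, even though the raw strings differ; edge cases that fail to match with confidence still go to a data steward for manual review.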
It’s important to understand the difference between reporting and analytics because they require different approaches to managing data. Reporting is stating known facts; primarily against structured data. Analytics derives new facts from both structured and unstructured data and is usually more fluid. The message here is don’t be too rigid in how you store your data.
So, what are the key differences between reporting and analytics?
Descriptive vs. Prescriptive.
Backward-looking vs. Forward-looking
Raises questions vs. Answers questions
Turns data into information vs. Information into actionable insights
Consist of reports and dashboards vs. Findings, predictions, and recommendations
What is happening vs. Why is it happening
Problem statement
How it was resolved
Our recommendation
Flat file extracts. These aren’t a problem if your update frequency doesn’t need to be near real time. You want to hit the source system as little as possible. In the past we’ve gotten at least 10x better performance by uploading extract files to blob storage and running bulk inserts to load the data. Bulk inserts don’t overwrite existing data so they can be used for periodic refreshes.
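A sketch of the append-only refresh pattern described above. The actual approach used blob storage and the database’s bulk-insert utilities; this illustration uses SQLite and hypothetical column names only to show the “insert without overwriting” behavior:

```python
import csv
import sqlite3

def bulk_load(db_path, extract_path):
    """Append-only load of a flat-file extract. Existing rows are left
    untouched, so repeated runs act as periodic refreshes, not overwrites."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS gl_extract (
        period TEXT, account TEXT, amount REAL, source_system TEXT)""")
    with open(extract_path, newline="") as f:
        rows = [(r["period"], r["account"], float(r["amount"]), r["source"])
                for r in csv.DictReader(f)]
    # Stand-in for a real bulk insert from blob storage
    con.executemany("INSERT INTO gl_extract VALUES (?, ?, ?, ?)", rows)
    con.commit()
    con.close()
```

Loading from staged files this way means the ERP source systems are only touched once per extract, which is where the performance gain comes from.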
Set a longer-term goal to build out a data warehouse. The effort going into building the data store may not be worth it if it’s just used as a convenient one-stop data source, as opposed to a source for an operational data store and eventually a data warehouse.
This builds on the idea of using reporting software on top of the data warehouse to build the finance report
Situation:
ABC Music Services collects royalties for song writers. They currently store their music catalog and performance data in a legacy mainframe database.
They would like to create a cloud-based portal to allow their writers to be able to see which songs in their catalogs play when, by which service, by geographic location. The music catalog does not change frequently. The performance information is captured daily from many different sources in many different formats.
How would you store the portal data to allow for fast processing by typical BI reporting tools?
Unstructured (big data solution)
Normalized tables in a relational database
Star Schema in a relational database
Call the Data Wrangler…
Recommendation:
Star-Schema in a relational database.
Why?
Relational database systems support star schemas well. The problem statement describes the need to query by writer, by song, by time, by which service, by geography. This is a natural fit for a star schema. Relational databases can also use techniques to optimize for fast reads.
Typical BI reporting tools support relational formats well. You could use Normalized tables, but normalization works best for transactional systems where you need fast inserts into tables that only allow valid data.
Many BI tools have connectors to big data platforms (such as Hadoop/Hive). My biggest concern here is performance. Hadoop is a platform originally designed for processing very large data files in batch mode.
The Data Wrangler would also be an excellent choice, but he is very busy and probably won’t be available…