Data is often the biggest challenge in self-service analytics. Learn how to efficiently access, prepare, and analyze large amounts of data, and blend data from multiple disparate sources in your analytics solution.
Learn more with the Gartner 2016 Critical Capabilities Report for BI and Analytics Platforms at https://goo.gl/IGNRO5.
5. #Logi16
Data is often the biggest challenge of self-service analytics
Preparing Data for Analytics is Hard
5 @soulety
6. #Logi16
The Data Problem in Self Service Analytics
Accessing Data (RDBMS, applications, files)
Data lives in different places: organizations outsource the applications that run their business (e.g. CRM, sales, marketing).
Half of organizations access external data sources*
*MQ Survey for BI and Analytic Platforms
7. #Logi16
The Data Problem in Self Service Analytics
Acquiring Data (RDBMS, applications, files)
Transactional systems are often not ready for analysis.
You need to blend data across sources to get a 360° view of the business.
8. #Logi16
The Data Problem in Self Service Analytics
Managing Data (RDBMS, applications, files)
Data needs to be refreshed and kept up to date for reporting.
Users need a performant experience when accessing and reporting on data.
10. Connect and acquire data, including files, databases, and cloud applications
Create, prepare, and manage dataviews for self-service analysis
Speed data prep with smart profiling, joining, and data enrichment
Accelerate performance for large data sets with a self-tuning, easy-to-maintain columnar data store
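The "smart profiling" step can be sketched in miniature. This is a pandas stand-in, not DataHub's actual profiler: it infers each column's type and collects basic quality statistics from raw data.

```python
# Conceptual sketch of smart profiling (a pandas stand-in, not
# DataHub's actual profiler): infer column types and basic stats.
import io
import pandas as pd

raw = io.StringIO(
    "id,signup_date,amount\n"
    "1,2016-01-05,120\n"
    "2,2016-02-11,75.5\n"
    "3,,40\n"
)
df = pd.read_csv(raw, parse_dates=["signup_date"])

# A minimal profile: inferred dtype, null count, distinct count per column
profile = {
    col: {
        "dtype": str(df[col].dtype),
        "nulls": int(df[col].isna().sum()),
        "distinct": int(df[col].nunique()),
    }
    for col in df.columns
}
print(profile)
```

A profiler like this is what lets a prep tool flag, for example, the missing `signup_date` before the data reaches a report.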
11. Connect (Data Connectors)
• Applications
• Databases
• Files
Author (Dataview Authoring)
• Joining objects
• Blending data sources
• Filter objects
Cache (Data Repository)
• Columnar store
• Self-tuning
• Scheduled refresh
Prepare (Data Enrichment)
• Smart profiling
• Calculated columns
• Multi-part text
… For Self-Service (Logi Integration)
• Element in Logi Studio
• Info, SSM, Discovery
• Columnar store for Vision
Create and Manage Dataviews
12. #Logi16
Primary DataHub Use Cases
1. Offload transactional systems that are not optimized for analysis: ensure the transactional system is not overloaded with analytical requests
2. Blend data from multiple sources: combine data from databases, applications, and files into a single dataview
3. Support application data sources not included with Logi Info: extend self-service analysis (SSM) to application sources
4. Create and manage dataviews for self-service analytics: a self-managed data repository that does not require DBAs to administer
14. #Logi16
Offload transactional systems from analytical requests
Transactional Application: data is optimized for transactions (inserts / updates)
Info Analytic Application: data is optimized for reporting and analysis
15. #Logi16
Offload transactional systems from analytical requests
Franchise Management Software: concerns about self-service reporting overloading the transactional system.
Healthcare Solutions: managed and self-service offerings that require isolation of the transactional system.
16. #Logi16
Primary DataHub Use Cases: 2. Blend data from multiple sources
17. #Logi16
Blend data from DBs, Cloud Applications, and Files
Sources: Sales & Marketing applications, Files (OFX), Databases, Finance / ERP
18. #Logi16
In-App Data Blending Solutions Are Limited
Example: Salesforce Connect
• Connects Salesforce data to external sources
• Recommended for big (external) datasets
• Follows security rules defined by the company
• Generates reports and charts from blended data
• External data can be used in formulas
19. #Logi16
Primary DataHub Use Cases: 3. Support application data sources not included with Logi Info
20. #Logi16
Extended Support for Application Sources in Info
Info Supported Sources
DataHub Supported Sources
21. #Logi16
Primary DataHub Use Cases: 4. Create and manage dataviews for self-service analytics
22. #Logi16
Self-managed data repository that does not require DBAs to administer
• No need to tune/index DB for self-service demands
• Minimal involvement from DBAs
• Faster deployment
24. #Logi16
Data Authoring in 5 Steps
1. Create a Source
2. Build your Dataview
3. Enrich your Dataview
4. Define a Data Refresh Schedule
5. Connect to Logi Info
25. #Logi16
1. Create a Source
Establish data connectivity
Applications, Databases, and Files (OFX)
26. #Logi16
2. Build a Dataview
Define and cache an optimized table that blends data across sources
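The blending step can be sketched in miniature. The following is a hypothetical pandas/SQLite stand-in for the idea (not DataHub's actual engine or API): rows from a database table are joined to a CRM file export to form one dataview-like table.

```python
# Conceptual sketch of blending (not DataHub's API): join a database
# table and a CSV file export into a single dataview-like table.
import io
import sqlite3
import pandas as pd

# Stand-in transactional database source
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, account TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Acme", 120.0), (2, "Globex", 75.5), (3, "Acme", 40.0)],
)
conn.commit()

# Stand-in CRM file export
crm_csv = io.StringIO("account,region\nAcme,East\nGlobex,West\n")

orders = pd.read_sql_query("SELECT * FROM orders", conn)
accounts = pd.read_csv(crm_csv)

# The blended "dataview": one table spanning both sources
dataview = orders.merge(accounts, on="account", how="left")
print(dataview)
```

The point of caching the result is that reports query this one pre-joined table instead of hitting both live sources on every request.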
27. #Logi16
3. Enrich your Dataview
Create calculated columns, adjust column names and types, etc.
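The enrichment step can be sketched the same way, again as a hedged pandas stand-in rather than DataHub itself: a column type adjustment, a rename, and a calculated column.

```python
# Conceptual sketch of dataview enrichment (a pandas stand-in):
# adjust a column type, rename a column, add a calculated column.
import pandas as pd

dv = pd.DataFrame({"amt": ["120", "75.5", "40"], "qty": [2, 1, 4]})

dv["amt"] = dv["amt"].astype(float)            # adjust column type
dv = dv.rename(columns={"amt": "amount"})      # adjust column name
dv["unit_price"] = dv["amount"] / dv["qty"]    # calculated column
print(dv)
```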
29. #Logi16
4. Schedule Data Cache Refresh
Full Replace or Incremental Append
[Diagram: cached dataview IDs 100–106 alongside source data IDs 100–103]
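The two refresh modes can be sketched in plain Python (a conceptual model, not DataHub's scheduler): a full replace rewrites the cache from the source, while an incremental append pulls only the rows beyond the cached high-water mark.

```python
# Conceptual sketch of the two cache refresh modes (not DataHub's
# scheduler): full replace vs. incremental append by ID high-water mark.

def full_replace(cache, source_rows):
    # Rewrite the entire cached dataview from the source
    return list(source_rows)

def incremental_append(cache, source_rows):
    # Append only rows whose ID is beyond the cached high-water mark
    high_water = max(cache) if cache else None
    new_rows = [r for r in source_rows if high_water is None or r > high_water]
    return cache + new_rows

cached_ids = [100, 101, 102, 103]
source_ids = [100, 101, 102, 103, 104, 105, 106]
refreshed = incremental_append(cached_ids, source_ids)
print(refreshed)  # only the three new IDs are appended
```

Incremental append keeps refresh cost proportional to new data, which matters once the cached dataview grows large; full replace stays the safe choice when source rows can change or disappear.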
30. #Logi16
5. Connect to Logi Info
Use dataviews for self-service reporting and custom Logi apps
Interactive Dashboards & Reports | Data Analysis | Sharing | Data Query Authoring | Discovery
33. Learn more with the Gartner 2016 Critical Capabilities Report for BI and Analytics Platforms
Editor's Notes
Good morning everyone, and thanks for taking the time to join us for this Data 101 session.
Let me first introduce myself:
I'm a Product Manager at Logi.
I've been part of the team for three years now, working on different projects.
Since last year I've been more focused on the data side,
which is great news for me because that's what this presentation is about, in a way.
Today we'll talk about the data problem
and how Logi approaches it with a product called DataHub.
As we go through the deck I think you'll probably relate to these challenges.
Hopefully by the end you'll leave with some tools to address them in your organization.
Housekeeping notes:
Q&A at the end.
Let's get started.
When users look at any type of analytics or BI solution, they tend to focus on the outcomes (reports and dashboards).
It typically starts with (LCD / responsive reports).
This is what users see, but for those of us working in analytics it's just the tip of the iceberg.
People tend to forget about the data, which is often the real challenge:
-- Where is the data coming from? Do I even have it?
Here at Logi, this is what we call the data problem.
Today, data lives everywhere.
Almost every business outsources applications that are critical to its daily operations.
Data lives in those apps, usually in different places and different formats.
Think of a marketing manager doing campaign analysis:
-- First, identify all the sources (Marketo, Eloqua, files, Salesforce).
-- That alone can be challenging.
So now you know where your data resides; next, you need to connect to it.
What you find is that these systems were designed as transactional applications, not for pulling data out.
Think about Salesforce: it ingests data fine, but it struggles with calculations and complex queries.
-- You may be familiar with these errors. I work in product: ask for top leads by sales rep and the query takes too long, so you're told to add filters.
-- Not blaming Salesforce; it's simply limited for analytical requests.
Some of these apps come with their own native analytical tools, but you get a fraction of the picture, not a 360° view of the business.
Now let's assume you have the data. What's next?
-- A performant experience for users, to drive adoption.
-- Data that is up to date and reliable.
Both are particularly challenging given that data is in the cloud, in your internal systems, and likely on your own computer.
At Logi we recognized these challenges that our customers were experiencing, and that's why we introduced Logi DataHub.
DataHub allows users to create, prepare, and manage dataviews for self-service analysis. It connects to and acquires data from diverse sources, such as databases, cloud applications, and files. It enables you to quickly prepare the data for analysis through smart profiling, blending of data, and data enrichment. And it offers high performance even with large data sets with a self-tuning columnar data store.
The data problem is not new, and there are tools out there that tackle it, but they do so in a very heavyweight, complex way, when roughly 80% of the use cases just need some data marts for basic reporting.
From a feature functionality standpoint, these are the components that make up DataHub.
You may be thinking: is DataHub right for me?
Let's explore the primary use cases for DataHub; hopefully they will help you answer that question.
I have my transactional app, but I don't want to serve my reporting needs from that same place.
When dealing with transactional apps or databases there are two main concerns:
1. Keep the systems isolated (don't impact performance).
2. Transactional systems are typically not designed to support analytical requests.
With DataHub you can point at your transactional system, bring the data over to a separate server, cache it, and use it for your reporting needs without compromising performance.
We found that this is probably the #1 use case we serve.
Many customers across different industries share similar concerns;
here are two examples.
Transactional Systems vs. Analytical Systems:
- Operational data | Consolidated data
- Single source | Multiple sources
- Fast inserts & updates | Fast reads / queries
- Standard, simple queries | Complex queries (aggregations, groupings)
- Snapshot of ongoing business processes | Multi-dimensional views of various business activities
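That contrast is also why an analytical cache favors a columnar layout. A toy illustration (conceptual only, not DataHub's storage engine): an aggregation over a column-oriented layout scans one contiguous array, while the row-oriented layout visits every field of every record.

```python
# Toy illustration (conceptual, not DataHub's storage engine):
# row-oriented vs. column-oriented layouts for an aggregation.

# Row-oriented: friendly to inserting/updating whole records
rows = [
    {"id": 1, "region": "East", "amount": 120.0},
    {"id": 2, "region": "West", "amount": 75.5},
    {"id": 3, "region": "East", "amount": 40.0},
]

# Column-oriented: one array per column, friendly to scans/aggregates
cols = {
    "id": [1, 2, 3],
    "region": ["East", "West", "East"],
    "amount": [120.0, 75.5, 40.0],
}

total_rowwise = sum(r["amount"] for r in rows)  # visits every record
total_colwise = sum(cols["amount"])             # scans one array
print(total_rowwise, total_colwise)
```

Both totals are of course equal; the difference in a real store is locality and compression, which is what makes complex queries over large data sets fast.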
My data is in multiple locations:
- I need to not only connect to it,
- but also consolidate it to satisfy my reporting needs.
Some apps have out-of-the-box solutions that try to address this problem -- for example, Salesforce -- but they have limitations.
Salesforce Connect:
- No external data is imported into your Salesforce org.
- External data is read in real time.
- Can connect to any source that supports OData 2.0.
- No need to write custom code.
- Good when external data changes frequently.
- Good when you only need real-time access to a small fraction of the external data.
- Not appropriate if you need to set up workflows and triggers on the external data.
ETL is a better choice when:
- You need the external data to follow the sharing rules defined by your org.
- You want to generate reports and charts from your data.
Also note that external data cannot be used in formula fields.
DataHub is an easy way for you to manage and author dataviews without needing to engage DBAs to maintain those views.
Why? Because DataHub comes with an embedded columnar store that is self-managed and self-tuning.
If we put DataHub in the context of the Logi platform, it adds a lot of flexibility in how you manage and feed data to your apps.
DataHub is an add-on.
Depending on the use case, there may be apps where it makes sense to keep a live connection to the source and leave the data in place,
but with DataHub you now have the flexibility to cache that data when you need to.